Technology: Data Warehose concept

RDBMS record-oriented structure

The record-based structure is the most common choice for data warehouse applications.

In this structure, data is stored in physical records using the common physical location of

data values as the logical connection across all data points of the individual record. For

example, to find a customer address, the system must first locate the customer using an

indexed value such as customer number, then scan across the record to the position for

address. As a result, the smallest addressable unit of storage is an entire record, and all

physical I/O functions will always move complete records or sets of complete records.

Since entire records are physically written and read as single units, selecting any field

requires reading the entire record, and changing a single value requires reading and

rewriting the entire record. If the new value requires more space than the original,

extensive additional I/O is needed. The modified record may be written into an overflow

area, forcing either multiple read operations at query time or reorganizing and rewriting the

previous area.

Columnar database column-based structure

The newer column structure has gained interest as the indexing and data transfer

problems associated with record structures have proved problematic for analytics

applications. The column-based DBMS stores all of the values from one column of a table

in a contiguous data set. This allows the reading and/or writing of parts of records. It

conserves I/O bandwidth by transferring only the values that may be used in the query.

Since most data warehouse applications use only a few columns from a table during a

typical single access, the resulting bandwidth savings can be substantial.

Writing new records requires updating each of the column sets that belong to the record.

Reading a complete record requires locating the correct value from each column set and

assembling those into the original record format. Therefore, full record operations show

little improvement over record-based systems and, in some cases, even can perform more

slowly.

Correlation DBMS value-based structure

The most recent innovation, value-based storage (VBS)—used in the correlation database

management system (CDBMS)—also avoids the indexing problems of record structures,

but in addition, it has exploratory and analytical features not available with either recordoriented

or column-based structures. The value-based model stores each unique value

once, then uses extended metadata to maintain the information needed to manage record

formatting. VBS minimizes bandwidth consumption, as the actual data value storage is so

small that the entire set of all data values can frequently stay in memory, affecting I/O

bandwidth only when the system is started.

Relational Database (RDBMS)

There are many commercial variations of record-based databases available. All share the

common characteristic of storing related values in a physically contiguous record. The

most commonly used form used for data warehousing and analytics is the RDBMS, which

has dominated the entire database market for years due to its strengths in processing and

storing transactions and in generating reports.

Columnar Database

A columnar database stores data much differently than does a RDBMS. In this structure,

data is stored in sets by column—that is, all of the values stored in column one are stored

in one set, all the values in column two in another set, and so on. In addition to the values,

the information needed to reinsert them into the proper position in the original record

format is stored with each set.

Correlation Database (CDBMS)

The newest entry in the data warehouse platform arena is the correlation database

(CDBMS) with its value-based storage (VBS) model. This structure stores data values

independently from the logical schema and uses extended metadata to maintain the

information needed to reconstruct the original records. VBS offers performance equal to or

better than the columnar structures, plus additional features not provided by either

RDBMSs or columnar databases. In VBS, each unique data value is stored only once,

making the database extremely compact and fast.

Although in my point of view relational databases (RDBMS) are the most common choice for data warehouse implementations, their record-based structure is far from ideal. There are many commercial variations of record-based databases available. All share the

common characteristic of storing related values in a physically contiguous record. The

most commonly used form used for data warehousing and analytics is the RDBMS, which

has dominated the entire database market for years due to its strengths in processing and

storing transactions and in generating reports.

As data volumes grow and users demand more sophisticated analytical capabilities, the deficiencies of the RDBMS to data storage become more conspicuous.

Conclusion

Despite their limitations, rrecord-based relational databases have long been the prevailing

data structure used in data warehouse systems. However, as data volumes have

increased and analytical needs have become more sophisticated, the shortcomings of the

RDBMS—most significantly, the design compromise required between optimizing for

performance versus query flexibility—have spurred the development of alternative data

structures.

Columnar databases offer faster query performance and require less disk storage space

than do RDBMSs, but they force their own compromise between optimizing for new record

insertion versus record selection and retrieval.

The newest data warehouse database, correlation DBMS, uses a value-based storage

(VBS) model that eliminates design compromises. The upfront physical design effort is

minimized since the CDBMS is optimized for both query performance and flexibility

automatically as data is loaded. The correlation DBMS provides load speeds comparable

to RDBMS and columnar systems, minimizes disk space requirements, optimizes query

performance, and enables unique types of queries—such as associative and incremental

queries—that are impractical or impossible with other data structures.

Technology

Popular Posts

Friday, October 29, 2010

Data Warehose concept

No comments:

Post a Comment