Popular Posts

Friday, October 29, 2010

Data Warehose concept

RDBMS record-oriented structure
The record-based structure is the most common choice for data warehouse applications.
In this structure, data is stored in physical records using the common physical location of
data values as the logical connection across all data points of the individual record. For
example, to find a customer address, the system must first locate the customer using an
indexed value such as customer number, then scan across the record to the position for
address. As a result, the smallest addressable unit of storage is an entire record, and all
physical I/O functions will always move complete records or sets of complete records.
Since entire records are physically written and read as single units, selecting any field
requires reading the entire record, and changing a single value requires reading and
rewriting the entire record. If the new value requires more space than the original,
extensive additional I/O is needed. The modified record may be written into an overflow
area, forcing either multiple read operations at query time or reorganizing and rewriting the
previous area.

Columnar database column-based structure
The newer column structure has gained interest as the indexing and data transfer
problems associated with record structures have proved problematic for analytics
applications. The column-based DBMS stores all of the values from one column of a table
in a contiguous data set. This allows the reading and/or writing of parts of records. It
conserves I/O bandwidth by transferring only the values that may be used in the query.
Since most data warehouse applications use only a few columns from a table during a
typical single access, the resulting bandwidth savings can be substantial.
Writing new records requires updating each of the column sets that belong to the record.
Reading a complete record requires locating the correct value from each column set and
assembling those into the original record format. Therefore, full record operations show
little improvement over record-based systems and, in some cases, even can perform more
slowly.

Correlation DBMS value-based structure
The most recent innovation, value-based storage (VBS)—used in the correlation database
management system (CDBMS)—also avoids the indexing problems of record structures,
but in addition, it has exploratory and analytical features not available with either recordoriented
or column-based structures. The value-based model stores each unique value
once, then uses extended metadata to maintain the information needed to manage record
formatting. VBS minimizes bandwidth consumption, as the actual data value storage is so
small that the entire set of all data values can frequently stay in memory, affecting I/O
bandwidth only when the system is started.


Relational Database (RDBMS)
There are many commercial variations of record-based databases available. All share the
common characteristic of storing related values in a physically contiguous record. The
most commonly used form used for data warehousing and analytics is the RDBMS, which
has dominated the entire database market for years due to its strengths in processing and
storing transactions and in generating reports.

Columnar Database
A columnar database stores data much differently than does a RDBMS. In this structure,
data is stored in sets by column—that is, all of the values stored in column one are stored
in one set, all the values in column two in another set, and so on. In addition to the values,
the information needed to reinsert them into the proper position in the original record
format is stored with each set.

Correlation Database (CDBMS)
The newest entry in the data warehouse platform arena is the correlation database
(CDBMS) with its value-based storage (VBS) model. This structure stores data values
independently from the logical schema and uses extended metadata to maintain the
information needed to reconstruct the original records. VBS offers performance equal to or
better than the columnar structures, plus additional features not provided by either
RDBMSs or columnar databases. In VBS, each unique data value is stored only once,
making the database extremely compact and fast.

Although in my point of view  relational databases (RDBMS) are the most common choice for data warehouse implementations, their record-based structure is far from ideal. There are many commercial variations of record-based databases available. All share the
common characteristic of storing related values in a physically contiguous record. The
most commonly used form used for data warehousing and analytics is the RDBMS, which
has dominated the entire database market for years due to its strengths in processing and
storing transactions and in generating reports.

 As data volumes grow and users demand more sophisticated analytical capabilities, the deficiencies of the RDBMS to data storage become more conspicuous.


Conclusion
Despite their limitations, rrecord-based relational databases have long been the prevailing
data structure used in data warehouse systems. However, as data volumes have
increased and analytical needs have become more sophisticated, the shortcomings of the
RDBMS—most significantly, the design compromise required between optimizing for
performance versus query flexibility—have spurred the development of alternative data
structures.
Columnar databases offer faster query performance and require less disk storage space
than do RDBMSs, but they force their own compromise between optimizing for new record
insertion versus record selection and retrieval.
The newest data warehouse database, correlation DBMS, uses a value-based storage
(VBS) model that eliminates design compromises. The upfront physical design effort is
minimized since the CDBMS is optimized for both query performance and flexibility
automatically as data is loaded. The correlation DBMS provides load speeds comparable
to RDBMS and columnar systems, minimizes disk space requirements, optimizes query
performance, and enables unique types of queries—such as associative and incremental
queries—that are impractical or impossible with other data structures.


No comments:

Post a Comment

minute workers Web Hosting