|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
![]() |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Definitions are from Wikipedia, the free encyclopedia What is a Data Warehouse? A data warehouse is the main repository of an organization's historical data, its corporate memory. It contains the raw material for management's decision support system. The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis, such as data mining, on the information without slowing down the operational systems. What is a Data Mart? A data mart (DM) is a specialized version of a data warehouse (DW). Like data warehouses, data marts contain a snapshot of operational data that helps business people to strategize based on analyses of past trends and experiences. The key difference is that the creation of a data mart is predicated on a specific, predefined need for a certain grouping and configuration of select data. A data mart configuration emphasizes easy access to relevant information. What is an Operational Data Store (ODS)? An operational data store (or "ODS") is a database designed to integrate data from multiple sources to facilitate operations, analysis and reporting. Because the data originates from multiple sources, the integration often involves cleaning, redundancy resolution and business rule enforcement. An ODS is usually designed to contain low level or atomic (indivisible) data such as transactions and prices as opposed to aggregated or summarized data such as net contributions. Aggregated data is usually stored in the Data warehouse. What are Star and Snowflake Schemas? The star schema (sometimes referenced as star join schema) is the simplest data warehouse schema, consisting of a single "fact table" with a compound primary key, with one segment for each "dimension" and with additional columns of additive, numeric facts. The name star schema is derived from the fact that the schema diagram is shaped like a star. The snowflake schema is a variation of the star schema used in a data warehouse. The snowflake and star schema are methods of storing data which are multidimensional in nature (i.e. which can be analyzed by any or all of a number of independent factors) in a relational database. The snowflake schema (sometimes called snowflake join schema) consists of one Fact table connected to many dimension tables, which can be connected to other dimension tables. A diagram of the schema resembles a snowflake due to a single center point branching out in many directions. The tables which describe the dimensions in the snowflake scheme are in Third normal form, whereas star schema dimensions are in Second normal form. What is a Fact? In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is often located at the center of a star schema, surrounded by dimension tables. Fact tables provide the (usually) additive values which act as independent variables by which dimensional attributes are analyzed. Fact tables are often defined by their grain. The grain of a fact table represents the most atomic level by which the facts may be defined. The grain of a SALES fact table might be stated as "Sales volume by Day by Product by Store". Each record in this fact table is therefore uniquely defined by a day, product and store. Other dimensions might be members of this fact table (such as location/region) but these add nothing to the uniqueness of the fact records. These "affiliate dimensions" allow for additional slices of the independent facts but generally provide insights at a higher level of aggregation (region is made up of many stores) What is a Dimension? In a data warehouse, a dimension is a data element that categorizes each item in a data set into non-overlapping regions. A data warehouse dimension provides the means to "slice and dice" data in a data warehouse. Dimensions provide structured labeling information to otherwise unordered numeric measures. For example, "Customer", "Date", and "Product" are all dimensions that could be applied meaningfully to a sales receipt. A dimensional data element is similar to a categorical variable in statistics. What is OLAP? Online Analytical Processing, or OLAP, is an approach to quickly providing
answers to analytical queries that are multidimensional in nature. OLAP
is part of the broader category business intelligence, which also includes
Extract transform load (ETL), relational reporting and data mining. The
typical applications of OLAP are in business reporting for sales, marketing,
management reporting, business process management (BPM), budgeting and
forecasting, financial reporting and similar areas. The term OLAP was
created as a slight modification of the traditional database term OLTP
(Online Transaction Processing). What is OLTP? Online transaction processing, or OLTP, refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing. The term is somewhat ambiguous; some understand a "transaction" in the context of computer or database transactions, while others (such as the Transaction Processing Performance Council) define it in terms of business or commercial transactions.[1] OLTP has also been used to refer to processing in which the system responds immediately to user requests. An automatic teller machine (ATM) for a bank is an example of a commercial transaction processing application. What is ETL? Extract, transform, and load (ETL) is a process in data warehousing that involves
ETL is important, as it is the way data actually gets loaded into the
warehouse. This article assumes that data is always loaded into a data
warehouse, whereas the term ETL can in fact refer to a process that loads
any database.
|
![]() |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||