Content area
Full text
Flattening is challenging but vital
In "SSIS Novices' Guide to Data Warehouses: Moving Data into the Data Warehouse" (January 2011, InstantDoc ID 128971), I showed you the basic structure of a data warehouse whose databases contain sets of tables that store raw, staged, and dimensionally modeled data. These tables are referred to as the Raw tables, Stage tables, and Dimensional tables, respectively. I also showed you how to create a SQL Server Integration Services (SSIS) package called the Raw package. This package is used to move a near exact copy of your source data from an external location (probably the transactional database and server) to the Raw tables.
The next step in building a data warehouse is moving the data from the Raw tables to the Stage tables, which is referred to as staging the data. During the staging step, many changes are made to both the data's structure and content. The changes arc driven by business rules and dimensional architectural needs.
When explaining the kind of changes that take place in the staging step, I like to refer to something called the Five F Words. Although some aspects of data warehousing can be frustrating, none of these F words will get you in trouble at work. In fact, only one has four letters, and that word is Flag
As Figure 1 shows, the Five F Words arc Flatten, Fix, Flag, Filter, and Figure. In this article, I'll cover Flatten. I like to start with this F word because flattening data is often the hardest concept for new warehouse designers to grasp, especially if they come from a transactional database design background.
Flattening IOI
Designing a transactional database is all about getting it into third normal form (3NF). Normalization becomes second nature to database developers, which is why data warehouse design often feels very foreign to database developers who are building their first data warehouse. Many tables within warehouses are denormalizcd or flattened when compared to their transactional system counterparts. Transactional systems need to load new data quickly and can return report data slowly, as reporting is a secondary, less crucial activity. Warehouses can load data slowly (often in the wee hours of the morning), but should return report and ad-hoc query data...





