Financial-oriented reporting warehouses may suffer from an inherent and pervasive lack of flexibility due to excess reliance on a star schema design in strategic repository areas. Localized reporting systems will not accomplish their stated objectives if one attempts to design and implement too many diverse types of reports (across business lines, market segments, organizational units, etc.) against a star schema-centric reporting database. Different groups of data analysts will always view an enterprise's data from dissimilar perspectives and with disparate conventions. They will find it cumbersome to share star schema data structures that attempt to satisfy too wide a reporting audience simultaneously. It is often best to load semi-normalized data into a "black box" data warehouse first, and then feed star or OLAP structures as data marts to satisfy the particular wishes of each group.
A given collection of data users will view information in a certain sequence and at a certain aggregation level; other groups will require different sequences, access paths, and sets of fundamental data. Requirement gaps between user groups will probably be large. Even though analytic knowledge workers toil away with data from the same transactional source systems of record, the semantics, names, and definitions of the same or similar data elements vary from one department or business unit to another, making a "one size fits all" dimensional reporting model difficult to navigate, especially without a unified meta data repository robustly in place. In addition, star schema tables sometimes share dimensions; this spreads out the fact tables, so the warehouse tends to capture only one or a handful of effective measures (such as unearned premium or specific notionals) per fact table. The main warehouse repository becomes more complex, with custom fact tables built per line of business instead of serving multiple organizational sections. Enterprise data audiences should be defined with as broad a brush as possible when constructing a main repository; it will be impossible to satisfy many different business units if the main repository is too customized.
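A minimal, hypothetical sketch of the fragmentation described above: two line-of-business fact tables share the same dimensions but each carries only a single effective measure. The table and column names are illustrative only and do not come from any real warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date   (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE dim_broker (broker_key INTEGER PRIMARY KEY, broker_name TEXT);

-- Custom fact table for a property line of business: effectively one measure.
CREATE TABLE fact_property_premium (
    date_key         INTEGER REFERENCES dim_date (date_key),
    broker_key       INTEGER REFERENCES dim_broker (broker_key),
    unearned_premium REAL
);

-- A near-duplicate fact table for a derivatives desk: another single measure.
CREATE TABLE fact_derivative_exposure (
    date_key        INTEGER REFERENCES dim_date (date_key),
    broker_key      INTEGER REFERENCES dim_broker (broker_key),
    notional_amount REAL
);
""")
```

Each fact table answers its own line of business well enough, but neither serves the enterprise as a whole.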
Before embarking on a dimensional design, the data modeler must thoroughly understand what issues, problems, and questions the model will solve (or facilitate) for each business constituency that will be using the repository, now or in the future. Data users seldom have a thorough grasp of their own reporting needs or analytical requirements; therefore, it is best to capture as much of the source system universe as possible in the main warehouse and let individual user groups set requirements for their respective data marts, with data ownership being the ultimate goal. We must first have broad consensus and agreement on the names, definitions, and business rules (calculations, aggregations, transformations) of data elements before we can populate proposed or current star schemas.
Dimensional models are usually better broken down according to localized needs and fed by a main quasi-normalized, self-contained warehouse that ports information out of the main repository and into dimensional marts, cubes, or XML mechanisms. The phased warehouse approach (a normalized warehouse feeding dimensional marts) also lends itself more readily, logically and physically, to n-tier architectures, with more options for separating data, presentation layers, procedure calls, business logic, transformations, and more. This ultimately empowers the knowledge users, who receive repository data incarnated in data marts along their lines of business, on demand. The star schema structure in itself does not provide the necessary information about the atomic-level data--the very foundation of the data warehouse--required for the fact tables within the star schema. Going to a star schema too early in the game will not work, as one size does not fit all. For large warehouse implementations, star schema design must occur along specified measures and dimensions, and may not work effectively across various lines of business--likewise for aggregations and roll-ups. Star schema models, OLAP and ROLAP cubes, and virtual star schemas built from relational views or object database extensions should occur downstream.
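The following is a hedged sketch of the phased approach, assuming a quasi-normalized warehouse layer and a single downstream mart; all table names and the premium roll-up are hypothetical placeholders, not a prescribed design.

```python
import sqlite3

wh = sqlite3.connect(":memory:")
wh.executescript("""
-- Normalized warehouse layer (atomic grain).
CREATE TABLE policy (policy_id INTEGER PRIMARY KEY, product_code TEXT, broker_id INTEGER);
CREATE TABLE transaction_detail (
    txn_id INTEGER PRIMARY KEY, policy_id INTEGER, txn_date TEXT, premium_amount REAL
);

-- Downstream mart: one small star scoped to a single business question.
CREATE TABLE mart_dim_product  (product_code TEXT PRIMARY KEY);
CREATE TABLE mart_fact_premium (product_code TEXT, txn_month TEXT, total_premium REAL);
""")

def refresh_premium_mart(conn: sqlite3.Connection) -> None:
    """Port data from the normalized layer into the dimensional mart on demand."""
    conn.execute("DELETE FROM mart_fact_premium")
    conn.execute("""
        INSERT INTO mart_fact_premium (product_code, txn_month, total_premium)
        SELECT p.product_code,
               substr(t.txn_date, 1, 7) AS txn_month,
               SUM(t.premium_amount)
        FROM transaction_detail t
        JOIN policy p ON p.policy_id = t.policy_id
        GROUP BY p.product_code, txn_month
    """)
    conn.commit()

refresh_premium_mart(wh)
```

A real implementation would, of course, push this through an ETL tool and a staging area rather than a single in-process function; the point is only that the star lives downstream of the normalized layer.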
Enterprise information requirements and organization modeling are usually portrayed--in addition to UML constructs such as State Transition Diagrams and Use Case Hierarchies--via normalized entity relationship models. These models help segue into and form the basis of a data warehouse's vital meta data constructs and repositories by defining and capturing a more atomic and transactional level of data. Communication between an organization's data warehouse team, its data stewardship team, and (source-system) business analysts will be more effective if tangible common ground exists in the form of an ERD-based model. The initial definition and implementation of a schema at such a diminutive data stratum is critical to the success of a main data warehouse or reporting repository. A main warehouse stratum founded on a star schema fabrication will not add value: if we fail to store incoming data at its lowest level of granularity in a normalized data layer, we may find ourselves re-extracting, re-scrubbing, and re-transforming source system data every time we need to add a new derived attribute to an operational or OLAP report. Such issues will be compounded with the introduction of historical and archiving requirements, which will pose new problems to the designers, DBAs, and developers. Implementing star structures in the wrong places can prevent you from easily getting back to the detailed nature of your business in a pinch.
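As a small illustration of the point about granularity, the sketch below assumes a hypothetical policy_detail table stored at the atomic grain; a newly requested derived attribute (a made-up "premium per day" figure) then becomes just another view over data we already hold, with no re-extraction from the source system.

```python
import sqlite3

conn = sqlite3.connect("warehouse.db")  # assumed normalized warehouse layer
conn.executescript("""
CREATE TABLE IF NOT EXISTS policy_detail (
    policy_id       INTEGER PRIMARY KEY,
    written_premium REAL,
    effective_date  TEXT,
    expiration_date TEXT
);

-- The new derived attribute is exposed as a view over the atomic rows;
-- nothing is re-extracted or re-scrubbed from the source system.
CREATE VIEW IF NOT EXISTS policy_premium_per_day AS
SELECT policy_id,
       written_premium /
           MAX(julianday(expiration_date) - julianday(effective_date), 1)
           AS premium_per_day
FROM policy_detail;
""")
```

If the atomic rows had not been retained, the same request would have meant another round trip through source extraction, scrubbing, and transformation.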
Furthermore, reorganizations, product introductions, new products, premium calculations, policy types, broker relationships, etc., are going to continually affect the warehouse. If the warehouse is going to stay current (and being current will be a big selling point of the warehouse), changes to the warehouse have to be made fast. A reporting or DSS warehouse/repository that is built on a logical and somewhat normalized data model, carefully abstracted from its source systems and properly insulated from downstream data marts (vis-à-vis staging areas), will be the most effective remedy. In other words: the normalized approach provides much greater capability and scalability, allowing any question to be asked of the data by all interested parties and the easy addition of more data in the future. If business requirements change or companies/business units merge, a heavily customized main data repository may have to be redesigned and/or readjusted to meet the new requirements. This can be very expensive and can also limit the future scalability and growth of the warehouse architecture. If the bulk of our dimensional, transactional, and historical data is already stored in a normalized layer, we can react more quickly to changing user requirements.
In contrast to a normalized main repository, data marts should, for the most part, be based on star schema models and contain data for specific subject areas. A good star schema will be scoped and processed along business or product lines. Data will be organized through multidimensional models that support OLAP analysis, representing a conceptualization of the data closest to the way in which a manager perceives the corporate reality. This is why we must build star schemas along business lines: with smaller star schemas and dimension models in data marts, we can focus on distinct and well-defined business issues. In some instances, we may even want to employ the oft-used technique of presenting duplicate dimensional panoramas by creating virtual star schema tables via relational database views. These views may be found in a de-normalized staging or reporting area, or hybrid relational/star-schema views could be used to feed OLAP data cubes as part of a load-extraction process for tools such as Cognos or Essbase.
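A minimal sketch of the virtual star schema technique mentioned above, assuming a hypothetical reporting area: relational views stand in for a fact table and a dimension, which a downstream cube load could read as if they were physical star tables. All object names are illustrative.

```python
import sqlite3

conn = sqlite3.connect("reporting_area.db")  # assumed de-normalized staging/reporting area
conn.executescript("""
CREATE TABLE IF NOT EXISTS policy (policy_id INTEGER PRIMARY KEY, product_code TEXT, region TEXT);
CREATE TABLE IF NOT EXISTS premium_txn (
    txn_id INTEGER PRIMARY KEY, policy_id INTEGER, txn_date TEXT, amount REAL
);

-- Virtual dimension: one row per product/region combination.
CREATE VIEW IF NOT EXISTS v_dim_product AS
SELECT DISTINCT product_code, region FROM policy;

-- Virtual fact: premium aggregated to the grain the cube expects.
CREATE VIEW IF NOT EXISTS v_fact_premium AS
SELECT p.product_code,
       p.region,
       substr(t.txn_date, 1, 7) AS txn_month,
       SUM(t.amount)            AS premium_amount
FROM premium_txn t
JOIN policy p ON p.policy_id = t.policy_id
GROUP BY p.product_code, p.region, txn_month;
""")

# A cube-load process would simply SELECT from v_fact_premium and v_dim_product.
```

Whether such views sit over a normalized layer or a lightly de-normalized reporting area is a local design choice; the point is that the star presentation is virtual and disposable.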
Some other problems with using dimensional models to form the basis of a main reporting repository for the enterprise include the following:
Data warehouses and data marts focus on structured data; however, many newer (B2B and CRM) projects and applications will have to operate on less structured, non-relational data (EBCDIC binary, flat files, documents). In today's distributed computing environments, it is likely that certain systems under your watch will have to feed data to and receive data from IT complexes outside that comfortable domain you have come to know as simply "in-house".
The ability of the main repository to accommodate XML configurations is paramount, and it justifies a normalized physical model. A relational repository will be better able to support and parse delimited XML feeds, extracts, and transaction logs. Data should be enabled so that it can be read out of the repository, brought into an XML transformation process, and written to an intranet portal or outside exchange--greatly expanding the reach of IT systems and customer/client/broker relationships. The same process can work in reverse, as XML data is sent to a warehouse staging area or an ETL tool's domain for extraction, cleansing, and transformation before taking residency in the repository. Whether feeding the repository from a legacy system or taking a feed from the warehouse, XML transformation engines can carry out transformations and resolve semantic and business differences that may have arisen as a result of CRM initiatives, business-to-business exchanges, or the integration of third-party systems.
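The outbound XML path described above might look something like the following sketch, which assumes a hypothetical, already-populated policy_detail table in the repository; rows are read relationally, wrapped in a simple XML document with Python's standard xml.etree.ElementTree, and written out for a portal or exchange to consume.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumes the normalized repository from the earlier sketches already holds
# a populated policy_detail table (hypothetical names throughout).
conn = sqlite3.connect("warehouse.db")
rows = conn.execute(
    "SELECT policy_id, written_premium, effective_date FROM policy_detail"
)

root = ET.Element("policyExtract")
for policy_id, written_premium, effective_date in rows:
    policy = ET.SubElement(root, "policy", id=str(policy_id))
    ET.SubElement(policy, "writtenPremium").text = f"{written_premium:.2f}"
    ET.SubElement(policy, "effectiveDate").text = effective_date

# Hand the document to a portal or outside exchange; the inbound path would
# parse received XML with ET.parse() and land the rows in a staging table.
ET.ElementTree(root).write("policy_extract.xml", encoding="utf-8", xml_declaration=True)
```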
For distributed information requirements, the entity relationship diagram and the star schema diagram need not compete with each other within the scope of a data warehouse or reporting repository project. The two are complementary and should work together cohesively. Each is valuable, but each must be used in different places and with different expectations. Subscribing to the Bill Inmon camp of data warehouse architecture, flexible warehouse design should spring forth from an organization's core data, initially modeled as closely to "third normal form" as possible. From this point, the seasoned architect may proceed to formulate an enterprise data road map, replete with star schema data marts, de-normalized staging areas, XQuery-enabled ODS structures, and other architecture morphologies as determined by the system's requirements and collected data elements. A supple and amenable multilevel architecture with detail data in the foundation, augmented by star joins and multidimensional modeling in associated data marts--each facilitating a different set of OLAP reports from a particular business perspective--should provide a flexible, maintainable, extensible, and friendly architecture for most firms.