Implementing a data warehouse is a massive effort that must be carefully planned and executed according to established methods. Below are the implementation issues, steps, and guidelines.
Key Implementation Issues in Data Warehousing
- Massive Effort – Implementing a data warehouse is generally a massive effort that must be planned and executed according to established methods.
- Operational Issues – Construction, administration, and quality control are the significant operational issues which arise with data warehousing.
- Challenging Considerations – Some of the important and challenging considerations while implementing data warehouse are: the design, construction, and implementation of the warehouse.
- Enterprise-Wide Complexity – The building of an enterprise-wide warehouse in a large organization is a major undertaking.
- Manual Data Risks – Manual Data Processing can risk the correctness of the data being entered.
- Administration Complexity – Administration of a data warehouse is an intensive enterprise, proportional to the complexity and size of the warehouse.
- Understanding Administration – The complex nature of the administration should be understood by an organization that attempts to administer a data warehouse.
- Flexibility & Analytics – There must be flexibility to accept and integrate analytics to streamline the business intelligence process.
- Updating Schemas – To handle the evolutions, acquisition component and the warehouse’s schema should be updated.
- Quality Control – A significant issue in data warehousing is the quality control of data. The major concerns are: quality and consistency of data.
- Consistency Challenge – Consistency remains a significant issue for the database administrator.
- Heterogeneous Data – One of the major challenges is melding data from heterogeneous and disparate sources. This includes differences in naming, domain definitions, and identification numbers.
- Interactions with Changes – The data warehouse administrator must consider the possible interactions with elements of the warehouse every time a source database changes.
- Accuracy of Data – There should be accuracy of data. The efficiency and working of a warehouse are only as good as the data that supports its operation.
- Usage Projections – Usage projections should be estimated conservatively prior to construction of the data warehouse and should be revised continually to reflect current requirements.
- Accommodating Changes – To accommodate addition and attrition of data sources, the warehouse should be designed. This also avoids a major redesign.
- Evolving Sources – Sources and source data will evolve, and the warehouse must accommodate such changes.
- Data Fitting Challenge – Another continual challenge is fitting the available source data into the data model of the warehouse. This is because requirements and capabilities of the warehouse will change over time as there will be continual rapid change in technology.
- Skills Requirement – A far broader skill set will be required for administration of data warehouse compared to traditional database administration.
- Management in Large Organizations – Managing the data warehouse in large organizations involves design of the management function and selection of the management team.
Data Warehousing Implementation Steps
- Requirements Analysis and Capacity Planning - Define enterprise needs, architectures, plan capacity, and pick hardware and software solutions. This step includes consultation with senior management and various stakeholders.
- Hardware Integration - Install and integrate servers, storage systems, and user software tools.
- Modeling - Develop the warehouse schema and views. Complex warehouses may require the use of modeling tools.
- Physical Modeling - Design the physical data warehouse organization, data location, data splitting, access strategies, and indexing.
- Data Sources - Identify and connect the information sources. This may involve gateways, ODBC drivers, or other wrappers.
- ETL (Extract, Transform, Load) - Define ETL tools, acquire them, and implement them. Tools may need customization to meet business needs.
- Populate the Data Warehouse - Test ETL tools with a staging area. Populate the warehouse based on schema and view definitions.
- End-User Applications - Design and implement applications required by end users.
- Roll-Out - After population and testing, roll out the warehouse system and applications to the user community.
Implementation Guidelines
- Building Increments - Start with data marts for specific projects. Later integrate into an enterprise data warehouse, allowing all departments to extract information from one system.
- Need a Champion - A data warehouse proposal must have a strong leader who researches expected costs and benefits. This person persuades colleagues and connects different units in the enterprise.
- Senior Management Support - Essential due to the resource-intensive nature and long-term commitment of data warehousing projects.
- Ensure Quality - Only clean, validated records should be loaded into the warehouse.
- Corporate Strategy Alignment - A warehouse project must align with corporate goals and objectives.
- Business Plan - Stakeholders must clearly understand costs, benefits, and the project plan. Prevents misinformation and keeps the project on track.
- Training - Users must be trained to effectively utilize and understand the warehouse.
- Adaptability - Flexibility must be built into the system to accommodate modifications as enterprise needs evolve.
- Joint Management - Both IT and business professionals must jointly manage the project. Ensures proper communication and alignment with enterprise goals.