Among all DSSs, data warehouse systems are perhaps the ones that have received the most attention from academic and industrial communities. Informally, data warehousing can be defined as follows:
Data warehousing is a set of strategies, techniques, and tools that allow knowledge workers—senior managers, directors, managers, and analysts—to conduct data analytics that aid in decision-making and the improvement of information resources.
The definition of data warehousing offered here is intended to be generic; it provides an overview of the process but does not include specifics. To fully comprehend the purpose and utility of data warehousing, you must first comprehend the requirements that led to its creation. R. Kimball neatly summarized a few claims typically submitted by end users of traditional information systems in 1996:
- “We have heaps of data, but we cannot access it!” This shows the frustration of those responsible for the future of their enterprises but have no technical tools to help them extract the required information in a proper format.
- “How can people playing the same role achieve substantially different results?” In midsize to large enterprises, many databases are usually available, each devoted to a specific business area. They are often stored on different logical and physical media that are not conceptually integrated. As a result, the results achieved in every business area are likely to be inconsistent.
- “We want to select, group, and manipulate data in every possible way!” Decision-making processes cannot always be planned before the decisions are made. End users need a tool that is user-friendly and flexible enough to conduct ad hoc analyses. They want to choose which new correlations they need to search for in real time as they analyse the information retrieved.
- “Show me just what matters!” Examining data at the maximum level of detail is not only useless for decision-making processes but also self-defeating because it does not allow users to focus their attention on meaningful information.
- “Everyone knows that some data is wrong!” An appreciable percentage of transactional data is not correct—or it is unavailable. It is clear that you cannot achieve good results if you base your analyses on incorrect or incomplete data.
Data Warehouse Administration
Data warehouse administration is frequently considered to be a subset of database management. Furthermore, there are certain parallels between databases and data warehouses. However, data warehouses have qualities that databases do not have.
For example, data warehouses employ software known as online analytical processing (OLAP) systems, whereas databases use online transaction processing (OLTP) systems. This may be the most fundamental distinction between the two:
- OLAP systems are data analysis tools that are best suited for data warehouses.
- OLTP systems are database-specific technologies that focus on transactions that accomplish data processing.
The function of managing and maintaining database management systems (DBMS) software is known as database administration. Mainstream DBMS technologies that require continual management include Oracle, IBM DB2, and Microsoft SQL Server.
Database administration refers to the entire collection of activities that a database administrator performs to ensure that a database is always available when needed. Other closely connected duties include database security, database monitoring and troubleshooting, and future growth planning.
Database administration is a critical activity in every organisation that relies on one or more databases.
What Does a Database Administrator Do?
The rising need for database administrators is being driven by technological improvements and the capacity to store mind-boggling volumes of data on discs and in the cloud for e-commerce. System DBA and Application DBA are two specialties that make use of database administration skills:
- System DBAs (conventional or general-purpose DBAs) are responsible for all physical and technological aspects of database management that keep systems functioning smoothly, efficiently, and without errors. Primary duties include: ➯Ensuring data security and avoiding data loss ➯Upgrading permissions ➯Applying software patches ➯Managing multiple applications on separate platforms ➯Merging existing databases ➯Developing and testing changes to database structures
- Application DBAs are experienced system DBAs who specialise in a certain application or group of applications, such as database design, functional competence, or specific server knowledge (e.g., Oracle).
- Oracle programme DBAs must have specialised knowledge of business administration and Oracle application server architectures. They must understand Oracle database design, the internal workings of the Oracle application server, and how to modify programs. Additionally, functional expertise in a specific industry or type of business is required.
Administration of Client-Server Systems
The DBA's role is to determine which database system and architecture is best for the firm. Key responsibilities include:
- Understanding business strategy and objectives
- Developing policies for database systems maintenance and management
- Ensuring system performance meets organizational needs
Advantages of client-server architecture include portability, scalability, and interconnectability across multiple network topologies. When using a Relational DBMS on client-server systems, DBAs must consider:
- Transaction control
- Performance
- Security
- Integrity
- Procedure logic
SQL has evolved into a standard data access language for client and server machines. While additional SQL extensions may be appealing, they may limit future application adaptability.
- A server instance is a running server. A database without an instance is inactive, and an instance without a database is useless.
Tools Used in DBA Administration
Data Dictionary
- Made up of tables and views
- Server looks up information about the data
- Often referred to as recursive requests
Stored Procedures
- Collections of SQL statements saved on the server
- Execute like ordinary programming procedures
- Reduce network traffic and improve server performance
Data Buffering
- Temporarily stores data in memory during query execution
- Reduces disc I/O and improves performance
- Shared buffering allows multiple clients to benefit
Asynchronous Data Writing
- Smooths out I/O peaks
- Scheduled to minimize performance reduction
Data Integrity Enforcement
- Centralized processing ensures consistent enforcement of database constraints
Concurrency Control Features
- Allows multiple clients to access the server while maintaining data integrity
- Includes locking, multi-versioning, and rollback mechanisms
Connectivity and Communication
- Ensures SQL and data transfer between clients and servers
- Supports multiple network protocols for multivendor environments
Client-Server Security
Server Safety
- Centralized access control
- Three degrees of security: ➯ Enrollment of users ➯ Privileges of access ➯ Allocation of resources
Client Safety
- Handles errors from unauthorized operations reported by the server
Network Safety
- Encryption techniques protect data during transmission
- Journaling and logging help detect security violations
Tuning the Database
DBAs must provide efficient service to users. Important areas for tuning include:
Tuning SQL
- Use indexes on primary and foreign keys
- Full table scans may be faster for queries retrieving >20% of rows
- Avoid constructs that prevent index usage
- Joins are generally faster than nested queries
- Use brief table aliases to boost performance
Tuning Disk I/O
- Minimize disk contention
- Distribute frequently accessed files across multiple disks
- Reserve free space in blocks (PCTFREE in Oracle)
- Minimize dynamic expansion
- Optimize DBWR (database writer) processes
Tuning Memory
- Prioritize memory access over disk access
- Key memory areas in Oracle: ➯ System Global Area (SGA) ➯ Caches (data dictionary cache, buffer cache) ➯ Program Global Area (PGA) ➯ User Global Area (UGA) ➯ Software code areas
Tuning Contention
- Contention occurs when multiple processes access the same resource
- Main Oracle areas of contention: ➯ Data blocks ➯ Rollback segments ➯ Redo log buffers
Tools to Assist Performance Tuning
Software Monitors
- Provide system statistics
- May add CPU overhead
Hardware Monitors
- Record data without affecting system performance
Other Performance Tools
- Describe the facilities: Analyses access paths without executing queries (Oracle and DB2)
- SQL*DBA monitoring: Monitors database activity (locking, I/O, statistics, tables)
- RUNSTATS (DB2): Computes statistics after data loads or updates