We all know that databases are designed to store and manage data. With so many options available, it can be hard to decide which database to use! Often the choice comes down to the database vendor or owner. Beyond that, we can select the right database for our needs by understanding the main database types: hierarchical, relational, network, and object-oriented.
What is a Hierarchical Database?
As the name suggests, the hierarchical database model is best suited to use cases where the data follows a concrete hierarchy, such as several individual employees reporting to a single department at a company.
The schema of a hierarchical database is defined by its tree-like organization: a root "parent" record links to one or more child records, and each child record may in turn link to child records of its own.
The hierarchical database structure dictates that, while a parent record can have several child records, each child record can have only one parent record. Data within records is stored in fields, and each field can contain only one value. Retrieving a record from a hierarchical database requires navigating the tree from the root node down the path of parent-child links that leads to that record.
Data is therefore stored in a hierarchical format, arranged logically from the top down. In a hierarchical database, data is grouped into records, and each record is made up of a series of segments.
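The two rules above (each child has exactly one parent, and a record is reached by following a path from the root) can be sketched as a small in-memory tree. This is only an illustration under simplifying assumptions, not how a real hierarchical DBMS stores data on disk; the `Segment` class and `find_by_path` function are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Segment:
    """A node (segment) in a hypothetical in-memory hierarchical database."""
    name: str
    fields: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    parent: Optional[Segment] = field(default=None, repr=False)

    def add_child(self, child: Segment) -> Segment:
        # Hierarchical rule: a child record may have only one parent.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has parent {child.parent.name}")
        child.parent = self
        self.children.append(child)
        return child


def find_by_path(root: Segment, path: list[str]) -> Optional[Segment]:
    """Follow a path of segment names from the root down to the target record."""
    if not path or path[0] != root.name:
        return None
    node = root
    for name in path[1:]:
        # Only the current node's children are examined at each level.
        node = next((c for c in node.children if c.name == name), None)
        if node is None:
            return None
    return node


if __name__ == "__main__":
    company = Segment("Company")
    sales = company.add_child(Segment("Sales"))
    sales.add_child(Segment("Alice", {"title": "Manager"}))
    print(find_by_path(company, ["Company", "Sales", "Alice"]).fields)  # {'title': 'Manager'}
```

Trying to attach the same employee segment to a second department raises an error, which is exactly the rigidity the model imposes.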
Primary, Secondary, and Tertiary Storage
- Primary Storage: Fastest media, but volatile (e.g., cache, main memory).
- Secondary Storage: Next level in the hierarchy; non-volatile with moderately fast access time; also called online storage (e.g., flash memory, magnetic disks).
- Tertiary Storage: Lowest level in the hierarchy; non-volatile with slow access time; also called offline storage (e.g., magnetic tape, optical storage).
Magnetic Disks
- Read-write head positioned very close to the platter surface (almost touching it) reads or writes magnetically encoded information.
- Surface of platter divided into circular tracks (over 16,000 tracks per platter on typical hard disks).
- Each track is divided into sectors (the smallest unit of data that can be read or written).
- Sector size is typically 512 bytes (many newer disks use 4,096-byte sectors).
- Typical sectors per track: 200 (on inner tracks) to 400 (on outer tracks).
To read/write a sector, the disk arm swings to position the head on the right track. The platter spins continually, and data is read/written as the sector passes under the head.
Head-disk assemblies: Multiple disk platters on a single spindle (typically 2 to 4), one head per platter, mounted on a common arm.
Cylinder i: Consists of the i-th track of all the platters.
Earlier generation disks were susceptible to head-crashes. The surface of earlier generation disks had metal-oxide coatings which would disintegrate on head crash and damage all data on disk.
Current generation disks are less susceptible to such disastrous failures, although individual sectors may get corrupted.
Disk controller: Interfaces between the computer system and the disk drive hardware.
- Accepts high-level commands to read or write a sector.
- Initiates actions such as moving the disk arm to the right track and actually reading or writing the data.
- Computes and attaches checksums to each sector to verify that data is read back correctly.
- If data is corrupted, with very high probability the stored checksum would not match the recomputed checksum.
- Ensures successful writing by reading back the sector after writing it.
- Performs remapping of bad sectors.
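The checksum steps above can be sketched in a few lines. This is an illustrative assumption, not what a drive does internally: real disk controllers use hardware error-correcting codes, and CRC-32 from Python's `zlib` stands in here only to show the "compute on write, recompute and compare on read" idea. `write_sector` and `read_sector` are hypothetical names.

```python
from __future__ import annotations

import zlib

SECTOR_SIZE = 512  # bytes; the typical sector size mentioned above


def write_sector(data: bytes) -> tuple[bytes, int]:
    """Pad data to a full sector and return it together with its checksum."""
    if len(data) > SECTOR_SIZE:
        raise ValueError("data does not fit in one sector")
    sector = data.ljust(SECTOR_SIZE, b"\x00")
    return sector, zlib.crc32(sector)


def read_sector(sector: bytes, stored_checksum: int) -> bytes:
    """Recompute the checksum on read and reject the sector if it does not match."""
    if zlib.crc32(sector) != stored_checksum:
        raise OSError("checksum mismatch: sector data is corrupted")
    return sector


if __name__ == "__main__":
    sector, checksum = write_sector(b"account balance: 100")
    read_sector(sector, checksum)  # checksum matches, so the data is returned
    damaged = bytes([sector[0] ^ 0x01]) + sector[1:]  # flip a single bit
    try:
        read_sector(damaged, checksum)
    except OSError as exc:
        print(exc)
```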
Optimization of Disk-Block Access
Block: A contiguous sequence of sectors from a single track. Data is transferred between disk and main memory in blocks.
- Block sizes range from 512 bytes to several kilobytes.
- Smaller blocks: More transfers from disk.
- Larger blocks: More space wasted due to partially filled blocks.
- Typical block sizes today range from 4 to 16 kilobytes.
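The block-size trade-off above is simple arithmetic: a file needs ceil(size / block size) transfers, and the unused tail of the last block is wasted. The sketch below assumes a single hypothetical 10,000-byte file and ignores every other cost (seek time, metadata); `block_stats` is a hypothetical name.

```python
from __future__ import annotations


def block_stats(file_size: int, block_size: int) -> tuple[int, int]:
    """Return (blocks to transfer, bytes wasted in the partially filled last block)."""
    blocks = (file_size + block_size - 1) // block_size  # ceiling division
    wasted = blocks * block_size - file_size
    return blocks, wasted


for size in (512, 4096, 16384):
    blocks, wasted = block_stats(10_000, size)  # a hypothetical 10,000-byte file
    print(f"{size:>6}-byte blocks: {blocks:>2} transfers, {wasted:>5} bytes wasted")
```

For this file, 512-byte blocks need 20 transfers but waste only 240 bytes, while a single 16 KB block needs one transfer but wastes 6,384 bytes.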
Disk-arm scheduling algorithms order pending accesses to tracks so that disk arm movement is minimized.
- Example: Elevator algorithm → move the disk arm in one direction (from outer to inner tracks or vice versa), processing the next request in that direction until no more requests remain in that direction, then reverse direction and repeat.
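The elevator algorithm described above can be sketched as follows. The track numbers are hypothetical, and this variant reverses at the last pending request rather than running to the edge of the disk, matching the description; `elevator_order` and `arm_movement` are hypothetical names.

```python
from __future__ import annotations


def elevator_order(requests: list[int], head: int, moving_up: bool = True) -> list[int]:
    """Order pending track requests with the elevator algorithm.

    The arm serves every request in its current direction, then reverses
    once no requests remain in that direction.
    """
    if moving_up:
        ahead = sorted(t for t in requests if t >= head)
        behind = sorted((t for t in requests if t < head), reverse=True)
    else:
        ahead = sorted((t for t in requests if t <= head), reverse=True)
        behind = sorted(t for t in requests if t > head)
    return ahead + behind


def arm_movement(order: list[int], head: int) -> int:
    """Total number of tracks crossed when serving requests in the given order."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


pending = [98, 183, 37, 122, 14, 124, 65, 67]  # hypothetical track numbers
order = elevator_order(pending, head=53)
print(order)                      # [65, 67, 98, 122, 124, 183, 37, 14]
print(arm_movement(order, 53))    # 299
print(arm_movement(pending, 53))  # 640 (serving requests in arrival order)
```

Sorting requests by direction cuts arm movement here from 640 tracks to 299 compared with serving them in arrival order.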
File Organization:
- Optimize block access time by organizing the blocks to correspond to how data will be accessed (e.g., store related information on the same or nearby cylinders).
- Files may get fragmented over time (e.g., if data is inserted to/deleted from the file or free blocks on disk are scattered).
- Sequential access to a fragmented file results in increased disk arm movement.
- Some systems have utilities to defragment the file system in order to speed up file access.
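The effect of fragmentation on sequential access, and what defragmenting buys, can be sketched by counting the tracks the arm crosses. This is a simplified model under stated assumptions: block placements are hypothetical, seek cost is taken as proportional to tracks crossed, and `sequential_read_movement` and `defragment` are hypothetical names.

```python
from __future__ import annotations


def sequential_read_movement(block_tracks: list[int], start_track: int = 0) -> int:
    """Tracks crossed by the arm when a file's blocks are read in file order."""
    movement, position = 0, start_track
    for track in block_tracks:
        movement += abs(track - position)
        position = track
    return movement


def defragment(num_blocks: int, first_track: int, blocks_per_track: int) -> list[int]:
    """Relocate a file's blocks onto consecutive tracks, filling each track in turn."""
    return [first_track + i // blocks_per_track for i in range(num_blocks)]


fragmented = [10, 250, 42, 900, 11, 600]  # hypothetical track of each block
compacted = defragment(len(fragmented), first_track=10, blocks_per_track=3)
print(compacted)                             # [10, 10, 10, 11, 11, 11]
print(sequential_read_movement(fragmented))  # 2794
print(sequential_read_movement(compacted))   # 11
```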
Advantages and Disadvantages of Hierarchical Databases
Advantages:
- Traversing a tree structure is simple and fast because of its one-to-many relationships.
- Several programming languages provide functionality to read tree structure databases.
- Easy to understand due to its one-to-many relationships.
Disadvantages:
- Rigid format of one-to-many relationships (a child cannot have more than one parent).
- Data that belongs under more than one parent must be duplicated under each parent, which adds redundant data.
- Moving one record from one level to another can be challenging.
Examples:
- Website navigation file or sitemap.
- A company organization chart.
What is a Relational Database?
A relational database organizes data into tables, which can be linked (or related) based on data common to each. This capability enables you to retrieve an entirely new table from data in one or more tables with a single query.
It also allows you and your business to better understand the relationships among all available data and gain new insights for making better decisions or identifying new opportunities.
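Retrieving a new table from two related tables with a single query can be sketched with SQLite through Python's built-in `sqlite3` module. The `department` and `employee` tables and their rows are hypothetical; any relational database would express the same join in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Engineering');
    INSERT INTO employee VALUES (10, 'Alice', 1), (11, 'Bob', 2), (12, 'Carol', 1);
""")

# One query relates the two tables on their common column (dept_id)
# and returns a new result table: each employee with a department name.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employee AS e
    JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.name
""").fetchall()
print(rows)  # [('Alice', 'Sales'), ('Bob', 'Engineering'), ('Carol', 'Sales')]
```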
Tables: Rows and Columns
- Tables can have hundreds, thousands, or even millions of rows of data.
- Rows are often called records.
- Columns are labeled with a descriptive name (e.g., “age”) and have a specific data type.
Example:
- A column called age may have a type of INTEGER (meaning it can only hold whole numbers).
- The schema of a table is defined by the set of columns and data types.
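A table's schema (its columns and their declared types) can be defined and then inspected. The sketch below uses SQLite via Python's `sqlite3`; the `person` table is hypothetical. Note that SQLite uses type affinity, so it does not strictly reject non-integer values in an INTEGER column unless the table is declared STRICT (SQLite 3.37+), whereas systems such as PostgreSQL do enforce the declared type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema: the set of columns and their declared data types.
conn.execute("CREATE TABLE person (name TEXT NOT NULL, age INTEGER)")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
schema = [(col[1], col[2]) for col in conn.execute("PRAGMA table_info(person)")]
print(schema)  # [('name', 'TEXT'), ('age', 'INTEGER')]

conn.execute("INSERT INTO person VALUES (?, ?)", ("Alice", 30))
row = conn.execute("SELECT name, age, typeof(age) FROM person").fetchone()
print(row)  # ('Alice', 30, 'integer')
```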
Difference Between Relational and Hierarchical Databases
- Simpler to use: Hierarchical databases use logical parent-child relationships and look simpler. Relational databases use tables and fields, and typically require a unique key for each record.
- Which is older? Hierarchical databases came before relational databases.
- Terminology: In hierarchical databases, units of data are called “segments.” In relational databases, they are called “fields.”
- Inheritance: In hierarchical databases, child nodes inherit properties of the parent. In relational databases, there is no inheritance.
- Data linking: Hierarchical → implicit parent-child links. Relational → explicit linking with primary and foreign keys.
- Use of keys: Relational databases rely on primary and foreign keys for unique identification and linking. Hierarchical databases locate data mainly by following parent-child paths rather than key relationships.
- Unique & duplicate data: Relational → easy to manage unique records. Hierarchical → needs extra processing to find duplicates.
- Data fetching: Hierarchical requires path traversal; relational allows flexible querying.
- Relationships: Hierarchical supports only one-to-many. Relational supports one-to-one, one-to-many, and many-to-many (the last through a junction table).
- Fields vs nodes: Relational → fields. Hierarchical → nodes/segments.
- Usage: Hierarchical → library systems, sitemaps. Relational → employee databases, goods/inventory.
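Two of the differences above, explicit linking through primary and foreign keys and support for many-to-many relationships, can be sketched together with a junction table. This uses SQLite via Python's `sqlite3`; the `student`, `course`, and `enrollment` tables and the helper functions are hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Junction table: each row links one student to one course, so a student
    -- can take many courses and a course can have many students.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO course  VALUES (100, 'Databases'), (200, 'Networks');
    INSERT INTO enrollment VALUES (1, 100), (1, 200), (2, 100);
""")


def courses_for(name: str) -> list[str]:
    """Follow the keys from a student through the junction table to courses."""
    rows = conn.execute("""
        SELECT c.title FROM student AS s
        JOIN enrollment AS e ON e.student_id = s.student_id
        JOIN course AS c ON c.course_id = e.course_id
        WHERE s.name = ? ORDER BY c.title""", (name,))
    return [title for (title,) in rows]


def students_in(title: str) -> list[str]:
    """Follow the keys from a course through the junction table to students."""
    rows = conn.execute("""
        SELECT s.name FROM course AS c
        JOIN enrollment AS e ON e.course_id = c.course_id
        JOIN student AS s ON s.student_id = e.student_id
        WHERE c.title = ? ORDER BY s.name""", (title,))
    return [name for (name,) in rows]


def try_enroll(student_id: int, course_id: int) -> bool:
    """Insert a link; return False if a key constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO enrollment VALUES (?, ?)", (student_id, course_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(courses_for("Alice"))      # ['Databases', 'Networks']
print(students_in("Databases"))  # ['Alice', 'Bob']
print(try_enroll(2, 999))        # False: course 999 does not exist
print(try_enroll(1, 100))        # False: Alice is already enrolled
```

A single-parent tree could only represent Alice's two courses by storing her record twice, which is the redundancy noted among the hierarchical model's disadvantages.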