Extended Hashing
Extended Hashing, often referred to as Extendible Hashing, is a dynamic hashing technique used to handle growing or shrinking datasets efficiently, especially in database systems and disk-based storage.
It is an improvement over static hashing, where the fixed table size leads to problems such as long overflow chains or excessive collisions as the dataset grows.
Why Use Extended Hashing?
- To avoid overflow buckets in static hashing.
- To allow the hash table to grow or shrink dynamically.
- To provide efficient access time even when the dataset size changes frequently.
Key Concepts of Extendible Hashing
1. Global Depth (GD)
- Number of bits considered from the hashed value to index into the directory.
- The directory size is 2^GD.
2. Local Depth (LD)
- Number of bits used by a specific bucket.
- Helps determine if a bucket can be split independently or if the directory must be doubled.
3. Directory
- A table (array) of size 2^GD that stores pointers to buckets.
- Can have multiple entries pointing to the same bucket.
4. Bucket
- Stores the actual records (keys or data).
- Has a fixed capacity (e.g., 2 or 3 records).
- When full, triggers a bucket split.
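Taken together, these pieces can be captured in a minimal structural sketch. The class names Bucket and Directory and the capacity of 2 are illustrative assumptions here, not a fixed interface:

```python
class Bucket:
    def __init__(self, local_depth, capacity=2):
        self.local_depth = local_depth   # LD: bits this bucket is keyed on
        self.capacity = capacity         # fixed capacity (here: 2 records)
        self.keys = []                   # the stored records


class Directory:
    def __init__(self, global_depth=1):
        self.global_depth = global_depth   # GD: bits used to index the table
        # 2^GD entries, each a reference to a bucket; several entries may
        # point to the same bucket when its local depth is below GD.
        self.entries = [Bucket(local_depth=global_depth)
                        for _ in range(2 ** global_depth)]
```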
Hash Function and Bit Extraction
A hash function generates a binary hash value. Only the last d bits of this value are used, where d is the local depth or the global depth (depending on the operation). The directory index is computed using:
index = hash(key) % 2^GD
(i.e., the last GD bits of the hash value)
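As a small illustrative sketch (the helper name bucket_index is made up here), the index computation over the low-order bits looks like this:

```python
def bucket_index(hash_value: int, depth: int) -> int:
    """Index taken from the low-order `depth` bits of the hash value."""
    return hash_value & ((1 << depth) - 1)   # same as hash_value % 2**depth

# With GD = 2, a key whose hash is 1010 keeps its last 2 bits (10):
print(bucket_index(0b1010, 2))   # 2
```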
Example of Extendible Hashing
Assume:
- Bucket size = 2
- Global Depth (GD) = 1 (initially)
- Directory size = 2 (2^1)
- Hash function = last few bits of binary key (simplified)
Step 1: Insert 4, 8, 10, 12
Assume hash(key) gives these binary representations.
4 → 100
8 → 1000
10 → 1010
12 → 1100
Only the last 1 bit (GD=1) is used:
- 4 → 0
- 8 → 0
- 10 → 0
- 12 → 0
All keys map to index 0 → Bucket 0
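A quick check of this mapping, using each key's own binary value as its hash (the simplified hash function assumed above):

```python
# Last GD = 1 bit of each key's binary value.
for key in (4, 8, 10, 12):
    print(key, "->", key % 2)
# 4 -> 0, 8 -> 0, 10 -> 0, 12 -> 0: every key lands in bucket 0
```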
Step 2: Bucket 0 overflows once it already holds 2 keys, so it must be split.
- Increase the local depth of Bucket 0 to 2
- Now look at the last 2 bits of the hash value for these keys:
- 4 → 00
- 8 → 00
- 10 → 10
- 12 → 00
Now we must increase global depth to match LD = 2 → Directory becomes size 4.
Index (2 bits): 00 01 10 11
Redistribute keys:
- Bucket 00: 4, 8, 12
- Bucket 10: 10
Bucket 00 still overflows (it now holds 4, 8, and 12 with a capacity of 2) → split again. This time:
- The local depth of bucket 00 becomes 3
- Because the local depth can never exceed the global depth, the global depth must also increase to 3 and the directory doubles to size 8
- Redistributing on the last 3 bits: bucket 000 gets 8, and bucket 100 gets 4 and 12
Continue this process as keys are inserted...
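The following is a compact sketch of the full insert / split / directory-doubling cycle. Names such as ExtendibleHashTable and _split are illustrative, a bucket capacity of 2 is assumed, and the key's own bits stand in for the hash value, as in this example:

```python
class Bucket:
    """A fixed-capacity bucket with its own local depth."""
    def __init__(self, local_depth, capacity=2):
        self.local_depth = local_depth
        self.capacity = capacity
        self.keys = []


class ExtendibleHashTable:
    def __init__(self, capacity=2):
        self.global_depth = 1
        # 2^GD directory entries; here each starts with its own bucket.
        self.directory = [Bucket(1, capacity), Bucket(1, capacity)]

    def _index(self, key):
        # Low-order global_depth bits of the (simplified) hash value.
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < bucket.capacity:
            bucket.keys.append(key)
            return
        # Overflow: double the directory first if the full bucket already
        # uses all global_depth bits, then split it and retry the insert.
        if bucket.local_depth == self.global_depth:
            self.directory += self.directory[:]   # new entries alias old buckets
            self.global_depth += 1
        self._split(bucket)
        self.insert(key)   # may trigger further splits

    def _split(self, bucket):
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth, bucket.capacity)
        distinguishing_bit = 1 << (bucket.local_depth - 1)
        # Re-point the directory entries whose index has the new bit set.
        for i, b in enumerate(self.directory):
            if b is bucket and (i & distinguishing_bit):
                self.directory[i] = new_bucket
        # Redistribute the old keys between the two buckets.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)


table = ExtendibleHashTable()
for key in (4, 8, 10, 12):
    table.insert(key)

print("global depth:", table.global_depth)   # 3
for i, b in enumerate(table.directory):
    print(format(i, "03b"), b.keys)          # e.g. 000 [8], 100 [4, 12]
```

Running this on the keys above reproduces the states described in the example; note how several directory entries share the same bucket whenever that bucket's local depth is smaller than the global depth.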
Time and Space Complexity
Here is the time and space complexity analysis for Extendible Hashing.
Time Complexity
| Operation | Average Case | Worst Case | Explanation |
|---|---|---|---|
| Search | O(1) | O(n) | Typically constant time: one directory lookup plus one bucket scan. The worst case arises when many keys share identical hash bits and overflow handling is required. |
| Insert | O(1) | O(n) | Usually fast; the worst case occurs when an insert triggers repeated bucket splits and a directory doubling, which copies all 2^GD entries. |
| Delete | O(1) | O(n) | Fast if no bucket merge or directory shrink is needed; worst when many directory entries must be adjusted. |
In practice, due to smart bucket splitting and hashing, insertions and lookups are nearly always O(1)
.
Space Complexity
| Component | Space | Explanation |
|---|---|---|
| Directory | O(2^GD) | The directory has 2^GD entries and doubles each time the global depth (GD) increases; each entry is a pointer to a bucket. |
| Buckets | O(n) | Each bucket stores a fixed number of records (e.g., 2 or 4), so the total number of buckets grows with n (the number of records). |
| Overall Space | O(n + 2^GD) | Total space = data + directory. |
Advantages of Extendible Hashing
Dynamic Growth
The hash table grows as needed by doubling the directory when necessary.
No need to predefine the maximum size of the hash table.
Avoids Overflow Chains
Unlike static hashing, it avoids long overflow chains by splitting buckets instead of appending to a linked list.
Efficient Search, Insert, and Delete
On average, operations take O(1) time, similar to static hashing, even as the dataset grows.
Buckets are accessed using a fixed number of hash bits.
Space-Efficient Bucket Splitting
Only the overflowing bucket is split, not the entire table, which saves space and time.
Supports Shrinking
The directory can shrink when many records are deleted, freeing memory (if implemented).
Good for Disk-Based Systems
Extendible hashing minimizes disk access by keeping the directory in main memory and storing only the buckets on disk.
Disadvantages of Extendible Hashing
Directory Doubling is Expensive
When the global depth increases, the entire directory size doubles, which can be expensive in terms of memory and performance.
High Memory Usage for Directory
The directory can become large, especially if many buckets have a low local depth and point to unique entries.
Complex Implementation
Requires careful handling of:
- Global and local depths
- Bucket splitting
- Directory updates
Overall, this makes it more complex than static hashing or simple chaining.
Pointer Overhead
Multiple directory entries may point to the same bucket, increasing pointer management overhead.
Poor Cache Locality
Due to indirection and scattered memory access (especially with large directories and many buckets), cache performance may degrade compared to array-based hashing.