Extended Hashing

Extended Hashing, often referred to as Extendible Hashing, is a dynamic hashing technique used to handle growing or shrinking datasets efficiently, especially in database systems and disk-based storage.

It is an improvement over static hashing, where the hash table size is fixed, which leads to problems such as overflow chains or excessive collisions as the dataset grows.

Why Use Extended Hashing?

  • To avoid overflow buckets in static hashing.
  • To allow the hash table to grow or shrink dynamically.
  • To provide efficient access time even when the dataset size changes frequently.

Key Concepts of Extendible Hashing

1. Global Depth (GD)

  • Number of bits considered from the hashed value to index into the directory.
  • The directory size is 2^GD.

2. Local Depth (LD)

  • Number of bits used by a specific bucket.
  • Helps determine if a bucket can be split independently or if the directory must be doubled.

3. Directory

  • A table (array) of size 2^GD that stores pointers to buckets.
  • Can have multiple entries pointing to the same bucket.

4. Bucket

  • Stores the actual records (keys or data).
  • Has a fixed capacity (e.g., 2 or 3 records).
  • When full, triggers a bucket split.
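The relationships among the directory, buckets, and the two depths can be sketched in a few lines of Python. The names below (Bucket, directory, global_depth) are illustrative assumptions, not part of any standard library:

```python
# Illustrative sketch of the structures above; names are assumptions,
# not a standard API.

class Bucket:
    def __init__(self, local_depth, capacity=2):
        self.local_depth = local_depth  # bits this bucket distinguishes
        self.capacity = capacity        # fixed record capacity
        self.keys = []                  # stored records

global_depth = 2
shared = Bucket(local_depth=1)  # LD < GD, so several entries share it
directory = [shared, Bucket(2), shared, Bucket(2)]  # size 2**GD = 4

assert len(directory) == 2 ** global_depth
assert directory[0] is directory[2]  # entries 00 and 10 point to one bucket
```

Note how a bucket whose local depth is below the global depth is reachable from more than one directory entry, which is exactly what makes independent bucket splits possible.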

Hash Function and Bit Extraction

A hash function generates a binary hash value. Only the last d bits of this value are used:

  • d = local depth or global depth (depending on the operation)
  • The directory index is computed as: index = hash(key) % 2^GD (i.e., the last GD bits)
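As a quick check of the index rule: masking off the last GD bits is the same as taking hash(key) mod 2^GD for non-negative hash values. The helper name dir_index below is hypothetical:

```python
def dir_index(hash_value, global_depth):
    """Directory index: the last GD bits of the hash value."""
    return hash_value & ((1 << global_depth) - 1)  # == hash_value % 2**global_depth

assert dir_index(0b1010, 1) == 0b0   # GD = 1: only the last bit
assert dir_index(0b1010, 2) == 0b10  # GD = 2: the last two bits
```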

Example of Extendible Hashing

Assume:

  • Bucket size = 2
  • Global Depth (GD) = 1 (initially)
  • Directory size = 2 (2^1)
  • Hash function = last few bits of binary key (simplified)

Step 1: Insert 4, 8, 10, 12

Assume hash(key) gives these binary representations.

4  → 100
8  → 1000
10 → 1010
12 → 1100

Only the last 1 bit (GD=1) is used:

  • 4 → 0
  • 8 → 0
  • 10 → 0
  • 12 → 0

All keys map to index 0 → Bucket 0
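Step 1 can be verified in a line of Python, using each key's own binary form as its hash value (the simplification the example assumes):

```python
# With GD = 1, the directory index is key % 2 (the last bit).
indices = {key: key % 2 for key in (4, 8, 10, 12)}
assert all(i == 0 for i in indices.values())  # every key lands in bucket 0
```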

Step 2: Bucket 0 overflows (its capacity is 2, so the third key cannot fit). Now split.

  • Increase local depth of Bucket 0 to 2
  • Now look at 2 bits of hash value for these keys:
    • 4 → 00
    • 8 → 00
    • 10 → 10
    • 12 → 00

Now we must increase global depth to match LD = 2 → Directory becomes size 4.

Index (2 bits): 00 01 10 11

Redistribute the keys (showing where all four land at GD = 2):

  • Bucket 00: 4, 8, 12
  • Bucket 10: 10

Bucket 00 still overflows (it cannot hold 4, 8, and 12) → split again. This time:

  • Bucket 00's local depth becomes 3
  • Since the local depth can never exceed the global depth, the directory must double again: GD = 3, directory size = 8
  • Using 3 bits: 8 → 000, 4 → 100, 12 → 100, so 8 stays in bucket 000 while 4 and 12 move to bucket 100

Continue this process as further keys are inserted...
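The whole walkthrough can be run end to end with a small insert-with-split routine. This is a minimal sketch assuming an identity hash, bucket capacity 2, and least-significant-bit indexing as in the example; the class and method names are illustrative, not a standard API:

```python
class Bucket:
    def __init__(self, local_depth, capacity=2):
        self.local_depth = local_depth
        self.capacity = capacity
        self.keys = []

class ExtendibleHash:
    def __init__(self, capacity=2):
        self.global_depth = 1
        self.directory = [Bucket(1, capacity), Bucket(1, capacity)]

    def _index(self, key):
        # Directory index = last GD bits of the (identity) hash.
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < bucket.capacity:
                bucket.keys.append(key)
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # LD may not exceed GD: double the directory first.
            self.directory += self.directory[:]
            self.global_depth += 1
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth, bucket.capacity)
        high_bit = 1 << (bucket.local_depth - 1)
        # Rewire the directory entries whose newly significant bit is 1.
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = new_bucket
        # Redistribute the old bucket's keys by that same bit.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            (new_bucket if k & high_bit else bucket).keys.append(k)

h = ExtendibleHash(capacity=2)
for key in (4, 8, 10, 12):
    h.insert(key)
# 4 and 12 end up sharing one bucket; 8 and 10 each sit in their own.
```

Only the overflowing bucket's keys are redistributed on each split; directory doubling merely copies pointers, which is the property the advantages below rely on.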


Time and Space Complexity

Here is the time and space complexity analysis for Extendible Hashing.

Time Complexity

  • Search: O(1) average, O(n) worst. Typically constant time; in the worst case (e.g., before or during a split) a full bucket may need to be scanned.
  • Insert: O(1) average, O(n) worst. Usually fast; the worst case occurs when an insert triggers multiple bucket splits and directory doubling.
  • Delete: O(1) average, O(n) worst. Fast when no merge or directory shrink is needed; worst when many entries must be adjusted.

In practice, due to smart bucket splitting and hashing, insertions and lookups are nearly always O(1).

Space Complexity

  • Directory: O(2^GD). The directory doubles each time the global depth (GD) increases; each entry is a pointer to a bucket.
  • Buckets: O(n). Each bucket stores a fixed number of records (e.g., 2 or 4), so the total number of buckets grows with n (the number of records).
  • Overall: O(n + 2^GD). Total space = data + directory.

Advantages of Extendible Hashing

Dynamic Growth

The hash table grows as needed by doubling the directory when necessary.

No need to predefine the maximum size of the hash table.

Avoids Overflow Chains

Unlike static hashing, it avoids long overflow chains by splitting buckets instead of appending to a linked list.

Efficient Search, Insert, and Delete

On average, operations take O(1) time, similar to static hashing, even as the dataset grows.

Buckets are accessed using a fixed number of hash bits.

Space-Efficient Bucket Splitting

Only the overflowing bucket is split, not the entire table, which saves space and time.

Supports Shrinking

The directory can shrink when many records are deleted, freeing memory (if implemented).

Good for Disk-Based Systems

Extendible hashing minimizes disk access by keeping the directory in main memory and storing only the buckets on disk.


Disadvantages of Extendible Hashing

Directory Doubling is Expensive

When the global depth increases, the entire directory size doubles, which can be expensive in terms of memory and performance.

High Memory Usage for Directory

The directory can become large: a single bucket with a high local depth forces an equally high global depth, leaving many directory entries pointing redundantly at the same buckets.

Complex Implementation

Requires careful handling of:

  • Global and local depths
  • Bucket splitting
  • Directory updates

Overall, it is more complex than static hashing or simple chaining.

Pointer Overhead

Multiple directory entries may point to the same bucket, increasing pointer management overhead.

Poor Cache Locality

Due to indirection and scattered memory access (especially with large directories and many buckets), cache performance may degrade compared to array-based hashing.