Rehashing

Rehashing is a technique used in hashing-based data structures like hash tables when the current hash table becomes too full or inefficient due to collisions or load factor increase.

It involves creating a new larger hash table and re-inserting all existing elements using a new hash function or the same function with a larger table size.

How Rehashing Works?

  1. Create a new hash table with a larger size, often double the current size or a prime number.
  2. Iterate through all elements in the current hash table.
  3. Re-insert each element into the new table using the hash function (updated for new table size).
  4. Update the reference of the old table to point to the new one.

Why Rehashing is Needed?

High Load Factor

Load factor = (Number of elements) / (Size of the hash table).

When it exceeds a certain threshold (typically 0.7–0.75), the chance of collision increases.

Too Many Collisions: Collisions slow down search, insert, and delete operations.

Performance Degradation: To maintain O(1) average-case time complexity, we need to limit the load factor.


When to Do Rehashing?

What is Load Factor?

Load Factor (α) is a measure that tells how full a hash table is.

Load factor = (Number of elements) / (Size of the hash table).

Where:

  • n = total number of keys inserted
  • m = total size (capacity) of the hash table

Why is Load Factor Important?

It directly affects the performance of hash table operations like insertion, search, and deletion.

Low Load Factor (e.g., < 0.5)

  • Few elements → few collisions → fast operations.

High Load Factor (e.g., > 0.75)

  • Table is nearly full → more collisions → slower operations.

Too High (α ≥ 1.0)

  • In open addressing, insertion may fail due to no available slots.

Threshold for Rehashing

Most implementations set a threshold load factor. When the current load factor crosses this threshold, rehashing is triggered.

StrategyTypical Load Factor Threshold
Open Addressing0.6 to 0.75
Separate ChainingCan go > 1.0 (no fixed limit)

Rehashing Strategy

  • Double the size (or next prime number).
  • Recalculate hash for each element using new size.
  • Reinsert into new hash table.

Example: Rehashing

Let's assume an initial hash table of size 5 using Linear Probing and a simple hash function.

hash(key) = key % table_size

Initial Insertion

Keys: 5, 15, 25, 35, 45

All collide at index 0 but are inserted via linear probing.

IndexKey
05
115
225
3
4

Load factor = 3/5 = 0.6 → Insert few more keys

Add: 35 → inserted at index 3

Add: 45 → inserted at index 4

Load factor = 5/5 = 1 → Rehashing needed

Rehashing Process

Create new table of size 10 (Double the old has table size).

Re-insert keys 5, 15, 25, 35, 45 using new hash function: key % 10

New Table:

IndexKey
0
1
2
3
4
55
615
725
835
945

Now the load factor = 5/10 = 0.5 → fewer collisions and better performance.


Time and Space Complexity of Rehashing

Best/Average Case (Single Rehashing Step): O(n), where n is the number of elements in the table.

Amortized Time Complexity for Insertions (Multiple Rehashing): O(1) per operation if rehashing is done infrequently.


Advantages of Rehashing

Maintains fast O(1) average time complexity for search, insert, and delete.

Reduces collisions.

Dynamically handles growing data sets.


Disadvantages of Rehashing

Costly Operation: Rehashing takes O(n) time, which may pause performance for a while.

Memory Overhead: Temporarily requires extra memory to hold both tables.

Implementation Complexity: Needs extra care in collision resolution and resizing logic.


Real-World Application of Rehashing

Hash Maps / Hash Tables in Java, C++, Python (dict in Python, HashMap in Java)

Databases for indexing

Caches like LRU Cache

Symbol Tables in Compilers


Numerical Question for Rehashing

  1. Use the hash function h(k) = k % m, where m = 5. Insert the following keys:
    25, 15, 35, 45, 55 Use linear probing for collision resolution.
    1. Show the state of the hash table after all insertions.
    2. What is the load factor?
    3. Should rehashing be triggered if threshold = 0.75?
  2. Use the hash function h(k) = k % 5. Insert the keys: 25, 15, 35, 45, 55
    Use linear probing.
    1. a) Show the table
    2. b) Load factor
    3. c) Rehashing?