Hashing Data Structure

Hashing is a well-known technique to search any particular element among several elements. It minimise the number of comparisons while performing the search.

  • Hashing is extremely efficient.
  • The time taken by it to perform the search does not depend upon the total number of elements.
  • It completes the search with constant time complexity O(1).

Hashing Mechanism

  • An array data structure called as Hash table is used to store the data items.
  • Based on the hash key value, data items are inserted into the hash table.

Hash Key value

  • Hash key value is a special value that serves as an index for a data item.
  • It indicates where the data item should be be stored in the hash table.
  • Hash key value is generated using a hash function

Hash Function

Hash function is a function that maps any big number or string to a small integer value.

Hash function takes the data item as an input and returns a small integer value as an output.

  • The small integer value is called as a hash value.
  • Hash value of the data item is then used as an index for storing it into the hash table.

Types of Hash Functions

There are various types of hash functions available such as

  1. Truncation Method
  2. Mid Square Hash Function
  3. Division Hash Function
  4. Folding Hash Function etc.

It depends on the user which hash function wants to use.

1. Truncation Method

The Truncation Method truncates a part of the given keys, depending upon the size of the hash table.

  • Choose the hash table size.
  • Then the respective right most or left most digits are truncated and used as hash code/value.

Example: 123,42,56

H(123) = 1

H(42) = 4

H(56) = 5

2. Mid Square Method

It is a Hash Function method where Square the given keys and take the respective middle digits from each squared value and use that as the hash value, for the respective keys.

Example: 123, 42, 56

H(123) = 1 (1232 = 15129)

H(42) = 7 (422 = 1764)

H(56) = 3 (562 = 3136)

3. Folding Method

Partition the key K into number of parts, like K1, K2,.... Kn, then add the parts together and ignore the carry and use it as the hash value.

Example: 123, 43, 56

H(123) = [ 1+2+3 = 6 ]

H(43) = [ 4+3 = 7 ]

H(56) = [ 5+6 = 11 ]

4. Division Method

Choose a number m, larger than the number of keys. The number m is usually chosen to be a prime number. The formula for the division method is,

Hash(key)= key % Tablesize

Tablesize : 10 20,21,24,26,32,34

H(20)= 20 % 10 = 0

H(21)= 21 % 10 = 1

H(24)=24 % 10 = 4

H(26)= 26 % 10 = 6

H(32) = 32 % 10 = 2

H(34) = 34%10 = 4 (34 IS COLLISION)

Characteristics of a Good Hash Function

A hash function is a method to convert a key (like a string, number, or object) into a fixed-size integer usually used as an index in a hash table.

A good hash function is critical to ensure that operations like insert, delete, and search in a hash table are fast and efficient (ideally O(1) time).

1. Uniform Distribution

The function should distribute keys evenly across all slots of the hash table.

This avoids clustering and reduces collisions.

Example: If your hash table has 10 buckets and you insert 100 keys, each bucket should ideally have around 10 keys.

2. Deterministic

For the same input key, it should always produce the same hash value.

If key = 25, then hash(25) should always return the same index.

3. Minimize Collisions

A good hash function should try to produce unique hash values for different keys as much as possible.

Fewer collisions mean faster operations.

4. Fast Computation

The hash function must be computationally efficient.

It should not take more time than the actual operation (search, insert, delete).

5. Hash Value Must Be in Range

The hash value should always be within the bounds of the hash table size.

hash(key) = key % table_size;

If the table size is 10, the hash must return values from 0 to 9.

6. Handles Different Types of Input

It should be able to handle integers, strings, or compound keys gracefully.

7. Less Clustering

Should avoid placing too many keys into neighboring slots.

This helps reduce primary clustering (especially in linear probing).

8. Should Use All Input Data

Every part of the key should contribute to the final hash value.

Ignoring part of the key can lead to more collisions.


Hash Table

A Hash Table is a data structure that stores key-value pairs. It uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.

It provides fast access to data using a key, typically in O(1) time for insert, delete, and search operations (in average case).

How Does a Hash Table Work?

  • A key is passed into a hash function.
  • The hash function returns an index.
  • The key-value pair is stored at that index in an array.

Advantages of Hash Tables

  • Fast access (O(1)) for search, insert, delete
  • Efficient for large datasets
  • Widely used in databases, compilers, caches

Disadvantages of Hash Tables

  • Collisions reduce performance
  • Not ordered (cannot retrieve data in sorted order)
  • Needs resizing if too full (rehashing)
  • Complex hash functions can increase computation time

Basic Operations on Hash Table

OperationDescriptionTime Complexity (Avg)
InsertAdd a key-value pairO(1)
SearchFind a value using its keyO(1)
DeleteRemove a key-value pairO(1)

Applications of Hashing

Data Retrieval in Hash Tables

Hash tables allow for constant time (O(1)) access to data. Common use in implementing maps, dictionaries, and sets in languages like C++, Java, Python, etc.

Example: unordered_map in C++, dict in Python.

Database Indexing

Databases use hashing for indexing to quickly locate records. Helps in implementing primary keys and index-based queries.

Password Storage

In security, passwords are stored as hashed values. Even if data is stolen, the actual passwords are not revealed.

Example: Using SHA-256 or bcrypt for storing password hashes.

Compiler Symbol Tables

Compilers use hash tables to store identifiers (variables, functions) for quick lookup. Speeds up parsing and semantic checking phases.

Caching

Hashing is used to store frequently accessed data in cache for quick retrieval.

Example: Browser caches, CPU caches, DNS caching.

Routing and Load Balancing

Hashing helps in distributing requests evenly across servers. Consistent hashing is used in distributed systems like CDNs and cloud infrastructure.

Blockchain and Cryptography

Blocks in blockchain are identified by their hashes. Cryptographic hash functions (e.g., SHA-256) are core to digital signatures and proof-of-work.

Detecting Duplicates

Hashing is used to check for duplicate files or data by comparing hash values. Common in backup systems, file systems, and data deduplication.

File Searching

Hash tables are used in file systems and applications to quickly locate files using filenames or metadata.

Memory Addressing

Hashing helps in allocating memory slots and managing page tables in operating systems.

Pattern Matching Algorithms

Algorithms like Rabin-Karp use hashing to efficiently search for patterns in strings.


Collision in Hashing

A collision occurs when two different keys are hashed to the same index in the hash table.

Why Do Collisions Happen?

  • The hash table is smaller than the total possible keys.
  • The hash function isn’t perfect and may produce the same index for multiple keys.
  • It's mathematically unavoidable due to the pigeonhole principle when key space > table size.

Collision Resolution Techniques

Collision Resolution Techniques are the techniques used for resolving or handling the collision.

Collision Resolution Techniques