Skip to content

Hashing

Introduction

Hashing is a technique that transforms data of any size into a fixed-size value — called a hash — using a special function. Think of it like a library's catalog system: instead of searching every shelf for a book, you use a catalog code that tells you exactly where to look. Hashing is the backbone of some of the most efficient data structures in computer science (hash tables, hash maps, hash sets) and is fundamental for solving problems that require fast lookups, duplicate detection, and data grouping.

Why Hashing Matters

Imagine you have a phone book with 10 million entries and you need to find someone's number. Searching one by one (linear search) could take 10 million steps. Even a clever approach like binary search needs the list sorted and takes about 23 steps. But with hashing, you can find the number in one step on average — constant time, O(1). That's the magic of hashing.

Core Concepts

What is a Hash Function?

A hash function takes an input (a key) and produces a number (the hash code). This number is then used as an index to store or retrieve data in an array-like structure.

A good hash function has these properties:

  • Deterministic: The same input always produces the same output.
  • Uniform distribution: It spreads keys evenly across available slots, minimizing clustering.
  • Fast to compute: The whole point is speed — a slow hash function defeats the purpose.

Here's a simple analogy: imagine assigning seats at a wedding. A hash function is like your seating algorithm — given a guest's name, it tells you which table they sit at. A bad algorithm puts everyone at table 3; a good one spreads guests evenly.

Hash Tables (Hash Maps)

A hash table is the data structure that uses hashing internally. It stores key-value pairs. When you want to store a value, the key is run through the hash function, which determines the index (bucket) where the value is placed.

Hash Collisions

A collision happens when two different keys produce the same hash index. This is inevitable — if you have more possible keys than array slots (and you usually do), some keys must share a slot. It's like the birthday paradox: in a room of just 23 people, there's a 50% chance two share a birthday.

The two most common strategies for handling collisions are:

1. Chaining (Separate Chaining)

Each bucket holds a list (or linked list). When a collision occurs, the new entry is simply added to the list at that bucket.

2. Open Addressing (Linear Probing)

When a collision occurs, you look for the next empty slot in the array. Variants include linear probing (check the next slot), quadratic probing (check slots at increasing distances), and double hashing (use a second hash function to determine the step size).

Load Factor and Resizing

The load factor is the ratio of stored entries to the total number of buckets:

Load Factor = number of entries / number of buckets

When the load factor exceeds a threshold (commonly 0.75), the hash table resizes — typically doubling the number of buckets and rehashing all existing entries. This keeps operations fast.

Time Complexity

OperationAverage CaseWorst Case
InsertO(1)O(n)
SearchO(1)O(n)
DeleteO(1)O(n)

The worst case happens when all keys collide into the same bucket, effectively turning the hash table into a linked list. In practice, with a good hash function and proper resizing, you almost always get O(1).

Implementation

Building a Hash Table from Scratch

Let's build a hash table with chaining to understand how it works under the hood:

javascript
class HashTable {
  constructor(size = 16) {
    this.buckets = new Array(size).fill(null).map(() => []);
    this.size = size;
    this.count = 0;
  }

  // Simple hash function
  _hash(key) {
    let hash = 0;
    const str = String(key);
    for (let i = 0; i < str.length; i++) {
      // Multiply by prime for better distribution
      hash = (hash * 31 + str.charCodeAt(i)) % this.size;
    }
    return hash;
  }

  // Insert or update a key-value pair
  set(key, value) {
    // Resize if load factor exceeds 0.75
    if (this.count / this.size > 0.75) {
      this._resize();
    }

    const index = this._hash(key);
    const bucket = this.buckets[index];

    // Check if key already exists — update if so
    for (let i = 0; i < bucket.length; i++) {
      if (bucket[i][0] === key) {
        bucket[i][1] = value;
        return;
      }
    }

    // Key doesn't exist — add new entry
    bucket.push([key, value]);
    this.count++;
  }

  // Retrieve value by key
  get(key) {
    const index = this._hash(key);
    const bucket = this.buckets[index];

    for (const [k, v] of bucket) {
      if (k === key) return v;
    }

    return undefined; // Key not found
  }

  // Delete a key-value pair
  delete(key) {
    const index = this._hash(key);
    const bucket = this.buckets[index];

    for (let i = 0; i < bucket.length; i++) {
      if (bucket[i][0] === key) {
        bucket.splice(i, 1);
        this.count--;
        return true;
      }
    }

    return false; // Key not found
  }

  // Check if key exists
  has(key) {
    return this.get(key) !== undefined;
  }

  // Resize the table when load factor is too high
  _resize() {
    const oldBuckets = this.buckets;
    this.size *= 2;
    this.buckets = new Array(this.size).fill(null).map(() => []);
    this.count = 0;

    for (const bucket of oldBuckets) {
      for (const [key, value] of bucket) {
        this.set(key, value);
      }
    }
  }
}

// --- Usage ---
const table = new HashTable();
table.set('alice', '555-0101');
table.set('bob', '555-0202');
table.set('charlie', '555-0303');

console.log(table.get('alice'));   // '555-0101'
console.log(table.get('bob'));     // '555-0202'
console.log(table.has('dave'));    // false

table.set('alice', '555-9999');    // Update existing key
console.log(table.get('alice'));   // '555-9999'

table.delete('bob');
console.log(table.get('bob'));     // undefined

Using JavaScript's Built-in Map and Object

In practice, you rarely build your own hash table. JavaScript gives you two hash-based structures:

javascript
// --- Using Map (preferred for most use cases) ---
const map = new Map();
map.set('name', 'Alice');
map.set(42, 'the answer');
map.set(true, 'yes');

console.log(map.get('name'));  // 'Alice'
console.log(map.size);         // 3
console.log(map.has(42));      // true

map.delete(true);

// Iterate over entries
for (const [key, value] of map) {
  console.log(`${key} => ${value}`);
}

// --- Using Object (works for string keys) ---
const obj = {};
obj['name'] = 'Bob';
obj['age'] = 30;

console.log(obj['name']);      // 'Bob'
console.log('age' in obj);     // true

delete obj['age'];

Common Problems Solved with Hashing

Hashing shines in several categories of problems. Let's walk through the most important ones.

1. Two Sum (The Classic)

Given an array of numbers and a target sum, find two numbers that add up to the target.

Without hashing: Check every pair — O(n²).
With hashing: One pass — O(n).

javascript
function twoSum(nums, target) {
  const seen = new Map(); // value → index

  for (let i = 0; i < nums.length; i++) {
    const complement = target - nums[i];

    if (seen.has(complement)) {
      return [seen.get(complement), i];
    }

    seen.set(nums[i], i);
  }

  return []; // No solution found
}

console.log(twoSum([2, 7, 11, 15], 9));  // [0, 1]
console.log(twoSum([3, 2, 4], 6));       // [1, 2]

2. Frequency Counting

Count how often each element appears — one of the most common interview patterns.

javascript
function frequencyCount(arr) {
  const freq = new Map();

  for (const item of arr) {
    freq.set(item, (freq.get(item) || 0) + 1);
  }

  return freq;
}

const fruits = ['apple', 'banana', 'apple', 'cherry', 'banana', 'apple'];
const counts = frequencyCount(fruits);

for (const [fruit, count] of counts) {
  console.log(`${fruit}: ${count}`);
}
// apple: 3
// banana: 2
// cherry: 1

3. Finding Duplicates

javascript
function findDuplicates(arr) {
  const seen = new Set();
  const duplicates = new Set();

  for (const item of arr) {
    if (seen.has(item)) {
      duplicates.add(item);
    }
    seen.add(item);
  }

  return [...duplicates];
}

console.log(findDuplicates([1, 3, 5, 3, 7, 1, 9])); // [3, 1]

4. Group Anagrams

Anagrams are words with the same letters in different order. We can use a sorted version of each word as a hash key:

javascript
function groupAnagrams(words) {
  const groups = new Map();

  for (const word of words) {
    // Sort the letters to create a canonical key
    const key = word.split('').sort().join('');

    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(word);
  }

  return [...groups.values()];
}

const result = groupAnagrams(['eat', 'tea', 'tan', 'ate', 'nat', 'bat']);
console.log(result);
// [ ['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat'] ]

5. First Non-Repeating Character

javascript
function firstUnique(str) {
  const freq = new Map();

  for (const char of str) {
    freq.set(char, (freq.get(char) || 0) + 1);
  }

  for (const char of str) {
    if (freq.get(char) === 1) {
      return char;
    }
  }

  return null;
}

console.log(firstUnique('aabccbdee')); // 'd'
console.log(firstUnique('aabbcc'));    // null

6. Subarray with Given Sum (Using Prefix Sum + Hash)

This is a powerful technique combining hashing with prefix sums:

javascript
function subarraySum(nums, target) {
  const prefixSums = new Map();
  prefixSums.set(0, 1); // Empty prefix has sum 0
  let currentSum = 0;
  let count = 0;

  for (const num of nums) {
    currentSum += num;

    // If (currentSum - target) was seen before,
    // then there's a subarray summing to target
    if (prefixSums.has(currentSum - target)) {
      count += prefixSums.get(currentSum - target);
    }

    prefixSums.set(currentSum, (prefixSums.get(currentSum) || 0) + 1);
  }

  return count;
}

console.log(subarraySum([1, 1, 1], 2));      // 2
console.log(subarraySum([1, 2, 3, -3, 4], 4)); // 3

Hash Functions in Depth

Common Hash Function Strategies

Rolling Hash (for String Matching)

A rolling hash lets you compute the hash of a sliding window in O(1) instead of recomputing from scratch. This is the core idea behind the Rabin-Karp string matching algorithm:

javascript
function rabinKarp(text, pattern) {
  const BASE = 31;
  const MOD = 1e9 + 7;
  const n = text.length;
  const m = pattern.length;
  const results = [];

  if (m > n) return results;

  // Compute hash of pattern and first window
  let patternHash = 0;
  let windowHash = 0;
  let basePow = 1; // BASE^(m-1) mod MOD

  for (let i = 0; i < m; i++) {
    patternHash = (patternHash * BASE + pattern.charCodeAt(i)) % MOD;
    windowHash = (windowHash * BASE + text.charCodeAt(i)) % MOD;
    if (i > 0) basePow = (basePow * BASE) % MOD;
  }

  for (let i = 0; i <= n - m; i++) {
    if (windowHash === patternHash) {
      // Verify character by character (hash collision possible)
      if (text.substring(i, i + m) === pattern) {
        results.push(i);
      }
    }

    // Slide the window: remove leftmost char, add next char
    if (i < n - m) {
      windowHash = (
        (windowHash - text.charCodeAt(i) * basePow % MOD + MOD) * BASE +
        text.charCodeAt(i + m)
      ) % MOD;
    }
  }

  return results;
}

console.log(rabinKarp('abcabcabc', 'abc')); // [0, 3, 6]

Hashing vs. Other Approaches

Cryptographic vs. Non-Cryptographic Hashing

It's important to distinguish between the two:

AspectNon-CryptographicCryptographic
PurposeFast lookups, data structuresSecurity, integrity
SpeedVery fastIntentionally slower
Examplesdjb2, MurmurHash, FNVSHA-256, bcrypt, MD5
Reversible?Not designed to prevent itComputationally infeasible
Collision resistanceAcceptableCritical

For data structures and algorithms, we use non-cryptographic hash functions. For password storage and data integrity, look into Cryptography.

Best Practices

  1. Choose the right data structure: Use Map over plain objects in JavaScript when keys can be non-strings, or when you need guaranteed insertion order.
  2. Understand your hash function: A poor hash function that clusters keys will degrade O(1) performance to O(n). Always use well-tested, established hash functions.
  3. Monitor load factor: If you're implementing your own hash table, resize at a load factor of 0.75 to maintain O(1) average performance.
  4. Use Sets for membership testing: When you only need to check whether something exists (no associated value), use Set instead of Map — it's cleaner and conveys intent.
  5. Combine hashing with other techniques: Many powerful solutions combine hashing with prefix sums, sliding windows, or sorting. The Two Sum and Subarray Sum problems above are great examples.
  6. Be aware of ordering: Hash tables don't preserve insertion order in all languages (JavaScript Map does, but Java HashMap does not). If order matters, consider a LinkedHashMap or sort after retrieval.
  7. Handle edge cases: Always account for empty inputs, missing keys, and potential hash collisions in your logic.
  8. Use prime numbers for table sizes: When implementing your own hash table, prime-sized tables reduce collision patterns caused by common key distributions.
  • Arrays — Hash tables are built on top of arrays; understanding array indexing is fundamental to understanding hashing.
  • Linked Lists — Chaining-based hash tables use linked lists to handle collisions.
  • Binary Search Trees — An alternative data structure for key-value lookups with guaranteed O(log n) and sorted output.
  • Tries — Specialized tree structure for string keys that can outperform hash maps for prefix-based queries.
  • Graphs — Adjacency list representations often use hash maps for fast neighbor lookups.
  • Sorting Algorithms — Some problems can be solved with either sorting or hashing; understanding both helps you pick the best approach.
  • Sliding Window — A pattern frequently combined with hash maps for substring and subarray problems.

Practice Problems

These problems will build your hashing intuition from beginner to advanced:

LeetCode

HackerRank

Start with the Two Sum and Contains Duplicate problems — they'll cement the fundamental pattern. Then move to Subarray Sum Equals K and Longest Consecutive Sequence to see how hashing enables elegant O(n) solutions to problems that seem to require O(n²) or sorting.