Appearance
Hashing
Introduction
Hashing is a technique that transforms data of any size into a fixed-size value — called a hash — using a special function. Think of it like a library's catalog system: instead of searching every shelf for a book, you use a catalog code that tells you exactly where to look. Hashing is the backbone of some of the most efficient data structures in computer science (hash tables, hash maps, hash sets) and is fundamental for solving problems that require fast lookups, duplicate detection, and data grouping.
Why Hashing Matters
Imagine you have a phone book with 10 million entries and you need to find someone's number. Searching one by one (linear search) could take 10 million steps. Even a clever approach like binary search needs the list sorted and takes about 23 steps. But with hashing, you can find the number in one step on average — constant time, O(1). That's the magic of hashing.
Core Concepts
What is a Hash Function?
A hash function takes an input (a key) and produces a number (the hash code). This number is then used as an index to store or retrieve data in an array-like structure.
A good hash function has these properties:
- Deterministic: The same input always produces the same output.
- Uniform distribution: It spreads keys evenly across available slots, minimizing clustering.
- Fast to compute: The whole point is speed — a slow hash function defeats the purpose.
Here's a simple analogy: imagine assigning seats at a wedding. A hash function is like your seating algorithm — given a guest's name, it tells you which table they sit at. A bad algorithm puts everyone at table 3; a good one spreads guests evenly.
Hash Tables (Hash Maps)
A hash table is the data structure that uses hashing internally. It stores key-value pairs. When you want to store a value, the key is run through the hash function, which determines the index (bucket) where the value is placed.
Hash Collisions
A collision happens when two different keys produce the same hash index. This is inevitable — if you have more possible keys than array slots (and you usually do), some keys must share a slot. It's like the birthday paradox: in a room of just 23 people, there's a 50% chance two share a birthday.
The two most common strategies for handling collisions are:
1. Chaining (Separate Chaining)
Each bucket holds a list (or linked list). When a collision occurs, the new entry is simply added to the list at that bucket.
2. Open Addressing (Linear Probing)
When a collision occurs, you look for the next empty slot in the array. Variants include linear probing (check the next slot), quadratic probing (check slots at increasing distances), and double hashing (use a second hash function to determine the step size).
Load Factor and Resizing
The load factor is the ratio of stored entries to the total number of buckets:
Load Factor = number of entries / number of buckets
When the load factor exceeds a threshold (commonly 0.75), the hash table resizes — typically doubling the number of buckets and rehashing all existing entries. This keeps operations fast.
Time Complexity
| Operation | Average Case | Worst Case |
|---|---|---|
| Insert | O(1) | O(n) |
| Search | O(1) | O(n) |
| Delete | O(1) | O(n) |
The worst case happens when all keys collide into the same bucket, effectively turning the hash table into a linked list. In practice, with a good hash function and proper resizing, you almost always get O(1).
Implementation
Building a Hash Table from Scratch
Let's build a hash table with chaining to understand how it works under the hood:
javascript
class HashTable {
constructor(size = 16) {
this.buckets = new Array(size).fill(null).map(() => []);
this.size = size;
this.count = 0;
}
// Simple hash function
_hash(key) {
let hash = 0;
const str = String(key);
for (let i = 0; i < str.length; i++) {
// Multiply by prime for better distribution
hash = (hash * 31 + str.charCodeAt(i)) % this.size;
}
return hash;
}
// Insert or update a key-value pair
set(key, value) {
// Resize if load factor exceeds 0.75
if (this.count / this.size > 0.75) {
this._resize();
}
const index = this._hash(key);
const bucket = this.buckets[index];
// Check if key already exists — update if so
for (let i = 0; i < bucket.length; i++) {
if (bucket[i][0] === key) {
bucket[i][1] = value;
return;
}
}
// Key doesn't exist — add new entry
bucket.push([key, value]);
this.count++;
}
// Retrieve value by key
get(key) {
const index = this._hash(key);
const bucket = this.buckets[index];
for (const [k, v] of bucket) {
if (k === key) return v;
}
return undefined; // Key not found
}
// Delete a key-value pair
delete(key) {
const index = this._hash(key);
const bucket = this.buckets[index];
for (let i = 0; i < bucket.length; i++) {
if (bucket[i][0] === key) {
bucket.splice(i, 1);
this.count--;
return true;
}
}
return false; // Key not found
}
// Check if key exists
has(key) {
return this.get(key) !== undefined;
}
// Resize the table when load factor is too high
_resize() {
const oldBuckets = this.buckets;
this.size *= 2;
this.buckets = new Array(this.size).fill(null).map(() => []);
this.count = 0;
for (const bucket of oldBuckets) {
for (const [key, value] of bucket) {
this.set(key, value);
}
}
}
}
// --- Usage ---
const table = new HashTable();
table.set('alice', '555-0101');
table.set('bob', '555-0202');
table.set('charlie', '555-0303');
console.log(table.get('alice')); // '555-0101'
console.log(table.get('bob')); // '555-0202'
console.log(table.has('dave')); // false
table.set('alice', '555-9999'); // Update existing key
console.log(table.get('alice')); // '555-9999'
table.delete('bob');
console.log(table.get('bob')); // undefinedUsing JavaScript's Built-in Map and Object
In practice, you rarely build your own hash table. JavaScript gives you two hash-based structures:
javascript
// --- Using Map (preferred for most use cases) ---
const map = new Map();
map.set('name', 'Alice');
map.set(42, 'the answer');
map.set(true, 'yes');
console.log(map.get('name')); // 'Alice'
console.log(map.size); // 3
console.log(map.has(42)); // true
map.delete(true);
// Iterate over entries
for (const [key, value] of map) {
console.log(`${key} => ${value}`);
}
// --- Using Object (works for string keys) ---
const obj = {};
obj['name'] = 'Bob';
obj['age'] = 30;
console.log(obj['name']); // 'Bob'
console.log('age' in obj); // true
delete obj['age'];Common Problems Solved with Hashing
Hashing shines in several categories of problems. Let's walk through the most important ones.
1. Two Sum (The Classic)
Given an array of numbers and a target sum, find two numbers that add up to the target.
Without hashing: Check every pair — O(n²).
With hashing: One pass — O(n).
javascript
function twoSum(nums, target) {
const seen = new Map(); // value → index
for (let i = 0; i < nums.length; i++) {
const complement = target - nums[i];
if (seen.has(complement)) {
return [seen.get(complement), i];
}
seen.set(nums[i], i);
}
return []; // No solution found
}
console.log(twoSum([2, 7, 11, 15], 9)); // [0, 1]
console.log(twoSum([3, 2, 4], 6)); // [1, 2]2. Frequency Counting
Count how often each element appears — one of the most common interview patterns.
javascript
function frequencyCount(arr) {
const freq = new Map();
for (const item of arr) {
freq.set(item, (freq.get(item) || 0) + 1);
}
return freq;
}
const fruits = ['apple', 'banana', 'apple', 'cherry', 'banana', 'apple'];
const counts = frequencyCount(fruits);
for (const [fruit, count] of counts) {
console.log(`${fruit}: ${count}`);
}
// apple: 3
// banana: 2
// cherry: 13. Finding Duplicates
javascript
function findDuplicates(arr) {
const seen = new Set();
const duplicates = new Set();
for (const item of arr) {
if (seen.has(item)) {
duplicates.add(item);
}
seen.add(item);
}
return [...duplicates];
}
console.log(findDuplicates([1, 3, 5, 3, 7, 1, 9])); // [3, 1]4. Group Anagrams
Anagrams are words with the same letters in different order. We can use a sorted version of each word as a hash key:
javascript
function groupAnagrams(words) {
const groups = new Map();
for (const word of words) {
// Sort the letters to create a canonical key
const key = word.split('').sort().join('');
if (!groups.has(key)) {
groups.set(key, []);
}
groups.get(key).push(word);
}
return [...groups.values()];
}
const result = groupAnagrams(['eat', 'tea', 'tan', 'ate', 'nat', 'bat']);
console.log(result);
// [ ['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat'] ]5. First Non-Repeating Character
javascript
function firstUnique(str) {
const freq = new Map();
for (const char of str) {
freq.set(char, (freq.get(char) || 0) + 1);
}
for (const char of str) {
if (freq.get(char) === 1) {
return char;
}
}
return null;
}
console.log(firstUnique('aabccbdee')); // 'd'
console.log(firstUnique('aabbcc')); // null6. Subarray with Given Sum (Using Prefix Sum + Hash)
This is a powerful technique combining hashing with prefix sums:
javascript
function subarraySum(nums, target) {
const prefixSums = new Map();
prefixSums.set(0, 1); // Empty prefix has sum 0
let currentSum = 0;
let count = 0;
for (const num of nums) {
currentSum += num;
// If (currentSum - target) was seen before,
// then there's a subarray summing to target
if (prefixSums.has(currentSum - target)) {
count += prefixSums.get(currentSum - target);
}
prefixSums.set(currentSum, (prefixSums.get(currentSum) || 0) + 1);
}
return count;
}
console.log(subarraySum([1, 1, 1], 2)); // 2
console.log(subarraySum([1, 2, 3, -3, 4], 4)); // 3Hash Functions in Depth
Common Hash Function Strategies
Rolling Hash (for String Matching)
A rolling hash lets you compute the hash of a sliding window in O(1) instead of recomputing from scratch. This is the core idea behind the Rabin-Karp string matching algorithm:
javascript
function rabinKarp(text, pattern) {
const BASE = 31;
const MOD = 1e9 + 7;
const n = text.length;
const m = pattern.length;
const results = [];
if (m > n) return results;
// Compute hash of pattern and first window
let patternHash = 0;
let windowHash = 0;
let basePow = 1; // BASE^(m-1) mod MOD
for (let i = 0; i < m; i++) {
patternHash = (patternHash * BASE + pattern.charCodeAt(i)) % MOD;
windowHash = (windowHash * BASE + text.charCodeAt(i)) % MOD;
if (i > 0) basePow = (basePow * BASE) % MOD;
}
for (let i = 0; i <= n - m; i++) {
if (windowHash === patternHash) {
// Verify character by character (hash collision possible)
if (text.substring(i, i + m) === pattern) {
results.push(i);
}
}
// Slide the window: remove leftmost char, add next char
if (i < n - m) {
windowHash = (
(windowHash - text.charCodeAt(i) * basePow % MOD + MOD) * BASE +
text.charCodeAt(i + m)
) % MOD;
}
}
return results;
}
console.log(rabinKarp('abcabcabc', 'abc')); // [0, 3, 6]Hashing vs. Other Approaches
Cryptographic vs. Non-Cryptographic Hashing
It's important to distinguish between the two:
| Aspect | Non-Cryptographic | Cryptographic |
|---|---|---|
| Purpose | Fast lookups, data structures | Security, integrity |
| Speed | Very fast | Intentionally slower |
| Examples | djb2, MurmurHash, FNV | SHA-256, bcrypt, MD5 |
| Reversible? | Not designed to prevent it | Computationally infeasible |
| Collision resistance | Acceptable | Critical |
For data structures and algorithms, we use non-cryptographic hash functions. For password storage and data integrity, look into Cryptography.
Best Practices
- Choose the right data structure: Use
Mapover plain objects in JavaScript when keys can be non-strings, or when you need guaranteed insertion order. - Understand your hash function: A poor hash function that clusters keys will degrade O(1) performance to O(n). Always use well-tested, established hash functions.
- Monitor load factor: If you're implementing your own hash table, resize at a load factor of 0.75 to maintain O(1) average performance.
- Use Sets for membership testing: When you only need to check whether something exists (no associated value), use
Setinstead ofMap— it's cleaner and conveys intent. - Combine hashing with other techniques: Many powerful solutions combine hashing with prefix sums, sliding windows, or sorting. The Two Sum and Subarray Sum problems above are great examples.
- Be aware of ordering: Hash tables don't preserve insertion order in all languages (JavaScript
Mapdoes, but JavaHashMapdoes not). If order matters, consider aLinkedHashMapor sort after retrieval. - Handle edge cases: Always account for empty inputs, missing keys, and potential hash collisions in your logic.
- Use prime numbers for table sizes: When implementing your own hash table, prime-sized tables reduce collision patterns caused by common key distributions.
Related Concepts
- Arrays — Hash tables are built on top of arrays; understanding array indexing is fundamental to understanding hashing.
- Linked Lists — Chaining-based hash tables use linked lists to handle collisions.
- Binary Search Trees — An alternative data structure for key-value lookups with guaranteed O(log n) and sorted output.
- Tries — Specialized tree structure for string keys that can outperform hash maps for prefix-based queries.
- Graphs — Adjacency list representations often use hash maps for fast neighbor lookups.
- Sorting Algorithms — Some problems can be solved with either sorting or hashing; understanding both helps you pick the best approach.
- Sliding Window — A pattern frequently combined with hash maps for substring and subarray problems.
Practice Problems
These problems will build your hashing intuition from beginner to advanced:
LeetCode
- Two Sum — The quintessential hashing problem.
- Valid Anagram — Frequency counting.
- Group Anagrams — Hash map grouping.
- Contains Duplicate — Set-based duplicate detection.
- First Unique Character in a String — Frequency map + scan.
- Subarray Sum Equals K — Prefix sum + hash map.
- Longest Consecutive Sequence — Set-based O(n) solution.
- LRU Cache — Combines hash map with doubly-linked list.
HackerRank
- Hash Tables: Ransom Note — Frequency counting basics.
- Sherlock and Anagrams — Substring hashing.
- Count Triplets — Multi-pass hash map reasoning.
- Frequency Queries — Dual hash maps for frequency tracking.
Start with the Two Sum and Contains Duplicate problems — they'll cement the fundamental pattern. Then move to Subarray Sum Equals K and Longest Consecutive Sequence to see how hashing enables elegant O(n) solutions to problems that seem to require O(n²) or sorting.