Random Hashing

ABSTRACT

Random Hashing is an Open Addressing collision resolution strategy that eliminates clustering by using a Pseudorandom Number Generator (PRNG) to determine the probe sequence. By seeding the PRNG with the key itself, the algorithm ensures that the “random” path is consistent and repeatable for every search.

1. The Concept: Using a Seeded Sequence

In Linear Probing, the sequence is always $in d e x + 1$ . In Double Hashing, it is $in d e x + H_{2} (k)$ . In Random Hashing, the sequence is generated by a PRNG.

For any given key $k$ , the PRNG will produce a sequence of numbers $r_{1}, r_{2}, r_{3} ...$ . The indices checked are simply these numbers modulo $M$ .

Why use the Key as a Seed?

A fundamental requirement of any Hash Table operation is that it must be deterministic.

If you insert “Apple” and the PRNG picks indices ${2, 5, 9}$ , then when you try to find “Apple” later, the PRNG must pick ${2, 5, 9}$ again.
If we used a truly random seed (like the current time), we would never be able to find the keys we inserted! By using the key as the seed, we guarantee the same “random” path every time.

2. Implementation Logic

The algorithm essentially treats the PRNG as a generator for the probe sequence.

insert_RandomHash(k):
    // Use the key as the seed to ensure determinism
    RNG = new PseudorandomNumberGenerator(seed = k)
    
    Loop infinitely:
        nextNumber = RNG.next()
        index = nextNumber % M
 
        if arr[index] == k:
            return false // Duplicate found
 
        else if arr[index] == NULL:
            arr[index] = k
            return true // Success!
 
        // If collision, the loop continues and the RNG 
        // provides the next "random" jump
        
        if all M locations probed:
            resize_and_rehash()

3. Random Hashing vs. Double Hashing

While both methods effectively solve the problem of Primary Clustering, they have different practical trade-offs:

Feature	Double Hashing	Random Hashing
Probe Logic	Arithmetic ( $H_{1} + i \cdot H_{2}$ )	PRNG Sequence
Complexity	Very Low (Fast math)	Medium (PRNG state updates)
Distribution	Excellent	Optimal (Approximates Uniform Hashing)
Practicality	Industry Standard	Less common due to PRNG overhead

4. The Performance Catch

In theory, Random Hashing is the “gold standard” for distribution because it most closely mimics Uniform Hashing (the theoretical ideal where every probe is independent).

However, in the real world, generating “high-quality” pseudorandom numbers involves complex bitwise operations and multiplications that can be significantly slower than the simple addition used in Double Hashing. Because Double Hashing is “good enough” to eliminate clustering, the extra CPU cycles required for Random Hashing are rarely justified.

Jason's Notebook

Explorer

Random Hashing

1. The Concept: Using a Seeded Sequence

Why use the Key as a Seed?

2. Implementation Logic

3. Random Hashing vs. Double Hashing

4. The Performance Catch

Graph View

Table of Contents

Backlinks