ABSTRACT
A Hash Table is an array-based data structure that leverages a hash function to map keys to specific array indices. Its defining characteristic is that its average-case performance is independent of the number of elements () it stores, allowing for operations in most practical scenarios.
1. The “Constant Time” Disclaimer
In computer science, we frequently label Hash Table operations as . However, it is important to understand what this measurement excludes:
- Ignoring the Hash Function: The designation refers to the array access after the hash value is calculated.
- The Cost of : For complex data types like strings or lists, a good hash function must iterate over all elements in the collection. Therefore, hashing a string of length is technically an operation.
- Why we say : We use because the time complexity does not grow as you add more items to the table. Unlike a Binary Search Tree (where search time increases as the tree gets taller), a Hash Table’s “jump” to an index remains constant regardless of the total number of entries.
2. Formal Definition
A Hash Table consists of:
- Backing Array: An array of size , where is the Capacity.
- Hash Function (): A function that maps a key to a valid index . A common simple function is .
The Core Operations (Simplified)
Below is the logic for a “collision-free” Hash Table (assuming every key maps to a unique index):
insert(key)
index = H(key)
if arr[index] is empty:
arr[index] = keyfind(key)
index = H(key)
return arr[index] == key3. The “Unordered” Property
A critical trade-off of the Hash Table is that it is unordered. Because hash functions often use complex math (like ) to randomize where keys land and avoid collisions, the physical order of elements in the array has no relationship to their actual values.
- Sorted Iteration: There is no efficient way to print a Hash Table in alphabetical or numerical order.
- C++ Implementation: The standard library calls this structure
unordered_setto explicitly remind programmers that the sequence of elements is not guaranteed.
// Example: Output order is non-deterministic
unordered_set<string> animals = {"Giraffe", "Polar Bear", "Toucan"};
for(auto s : animals) {
cout << s << endl; // Likely outputs: Toucan, Giraffe, Polar Bear
}4. The Challenge of Collisions
In any real-world Hash Table, the number of possible keys (e.g., all possible human names) is vastly larger than the table’s capacity (). This leads to collisions.
Collision: When two different keys, and , result in the same hash value: .
Collisions are the primary cause of performance degradation in Hash Tables. If many keys map to the same index, the performance can slide toward .
5. Summary Analysis
| Feature | Complexity (Average) | Logic |
|---|---|---|
| Find | Compute index Direct array access. | |
| Insert | Compute index Place in array. | |
| Remove | Compute index Clear array slot. | |
| Order | Unordered | Indices are randomized to reduce collisions. |