Array List
Summary Description
An ArrayList is an implementation of the List ADT built on a Dynamic Array. It acts as a wrapper around a standard fixed-size array, providing a mechanism for automatic resizing when the capacity is reached.
Key Features
- Random Access: Leveraging the underlying array’s properties, access to any element via its index is performed in O(1) time.
- Contiguity: Elements are stored in contiguous memory locations. There are no “gaps” or empty slots between the first and last element.
- Resizing Logic: When the array becomes full, the container typically allocates a new array (usually double the current size), copies all elements, and deallocates the old array.
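The resizing logic above can be sketched in a few lines. This is a minimal illustration, not a production container; the class name and the doubling growth factor are illustrative choices:

```python
class DynamicArray:
    """Minimal dynamic array: fixed-capacity backing store with doubling."""
    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._data = [None] * self._capacity  # simulates a fixed-size array

    def append(self, value):
        if self._size == self._capacity:       # full: allocate double, copy, swap
            new_data = [None] * (self._capacity * 2)
            for i in range(self._size):
                new_data[i] = self._data[i]
            self._data = new_data
            self._capacity *= 2
        self._data[self._size] = value
        self._size += 1

    def __getitem__(self, index):              # O(1) random access
        if not 0 <= index < self._size:
            raise IndexError(index)
        return self._data[index]
```

The copy step is O(n), but because it happens only on every doubling, appends are O(1) amortized.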
Complexity Analysis: Unsorted ArrayList
Performance Metrics
- Find: Best Case O(1), Average Case O(n), Worst Case O(n)
- Insert: Best Case O(1) (at the end, amortized), Average Case O(n), Worst Case O(n)
- Remove: Best Case O(1) (at the end), Average Case O(n), Worst Case O(n)
Mathematical Derivations
- Average Find: scanning for a uniformly random element visits (1/n) · Σᵢ₌₁ⁿ i = (n+1)/2 elements → O(n)
- Average Insert: inserting at a uniformly random index shifts n/2 elements on average → O(n)
- Average Remove: removing from a uniformly random index shifts (n−1)/2 elements on average → O(n)
Complexity Analysis: Sorted ArrayList
Performance Metrics
- Find: Best Case O(1), Average Case O(log n), Worst Case O(log n)
- Insert: Best Case O(log n), Average Case O(n), Worst Case O(n)
- Remove: Best Case O(log n), Average Case O(n), Worst Case O(n)
Key Differences
- Binary Search: Because the data is sorted and indexed, we can discard half the search space in each step, resulting in O(log n) performance for Find.
- Insertion/Removal: Even though we can find the target index in O(log n), we must still shift up to n elements to maintain contiguity, keeping the operation at O(n).
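The O(log n) find plus O(n) shift can be demonstrated with Python's standard `bisect` module (the helper function names are my own):

```python
import bisect

def sorted_insert(arr, value):
    """Insert into a sorted list: O(log n) binary search + O(n) shift."""
    index = bisect.bisect_left(arr, value)  # binary search for the slot
    arr.insert(index, value)                # list.insert shifts the tail over
    return index

def sorted_find(arr, value):
    """Binary search: returns the index of value, or -1 if absent."""
    index = bisect.bisect_left(arr, value)
    if index < len(arr) and arr[index] == value:
        return index
    return -1
```

`bisect_left` does the halving described above; the cost of `list.insert` is the element shifting that keeps the array contiguous.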
Space Complexity
The space complexity for both sorted and unsorted ArrayLists is O(n).
Memory Management
- Utilization: The actual memory allocated (the capacity C) is always greater than or equal to the number of elements (n).
- Resizing Bounds: In a typical doubling implementation, the array is at its most efficient just before resizing (where C = n) and its least efficient just after resizing (where C = 2n). In both cases, the relationship is linear, so space remains O(n).
Summary Comparison
Comparison Table
| Feature | Unsorted ArrayList | Sorted ArrayList |
|---|---|---|
| Primary Advantage | Fastest Insertions (at end) | Fastest Search (Binary Search) |
| Search Method | Linear Search | Binary Search |
| Index Access | O(1) | O(1) |
| Contiguous? | Yes | Yes |
Linked List
Summary Description
Linked Lists are linear data structures composed of nodes connected to one another via pointers. Unlike Array Lists, Linked Lists do not store elements in contiguous memory locations.
Structure and Access
- Global Pointers: We typically maintain access to two pointers: `head` (pointing to the first node) and `tail` (pointing to the last node).
- Singly-Linked List: Each node maintains a single `next` pointer that points to the subsequent node in the list.
- Doubly-Linked List: Each node maintains two pointers: a `next` pointer and a `previous` pointer, allowing for bidirectional traversal.
- Sequential Access: We do not have random access (indexing). To reach an inner node, we must traverse the list node-by-node starting from the `head` or `tail`.
Modification Logic
Once the target location is identified, the actual insertion or removal of a node is an O(1) operation. This is achieved by simply redirecting the pointers of the neighboring nodes. Consequently, the total time complexity for these operations is dominated by the time required to find the target position.
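The pointer redirection above can be sketched for a doubly-linked list (node fields and function names are illustrative):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

def insert_after(node, new_node):
    """O(1) insertion: only the neighbors' pointers change."""
    new_node.prev = node
    new_node.next = node.next
    if node.next is not None:
        node.next.prev = new_node
    node.next = new_node

def remove(node):
    """O(1) removal, given a direct reference to the node."""
    if node.prev is not None:
        node.prev.next = node.next
    if node.next is not None:
        node.next.prev = node.prev
```

No element shifting occurs, which is exactly why the find step dominates the total cost.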
Complexity Analysis
Performance Metrics
- Find: Best Case O(1), Average Case O(n), Worst Case O(n)
- Insert: Best Case O(1) (at the head or tail), Average Case O(n), Worst Case O(n)
- Remove: Best Case O(1) (at the head or tail), Average Case O(n), Worst Case O(n)
Mathematical Derivations
- Average Find (Singly-Linked): reaching a uniformly random node requires visiting (1/n) · Σᵢ₌₁ⁿ i = (n+1)/2 nodes → O(n)
- Average Find (Doubly-Linked): If we know the index and can choose to start from the `head` or the `tail` (whichever is closer), the average number of nodes visited drops to roughly n/4, which is still O(n).
Space Complexity
The space complexity for a Linked List is O(n).
Memory Allocation
- Node Overhead: Each of the n nodes requires space for the stored data plus the overhead of 1 or 2 pointers (for singly- or doubly-linked, respectively).
- Per-Node Cost: Since each node uses O(1) space for its pointers, the total space scales linearly with the number of elements.
Summary Comparison
Comparison Table
| Feature | Singly-Linked List | Doubly-Linked List |
|---|---|---|
| Pointers per Node | 1 (Next) | 2 (Next & Previous) |
| Traversal | Forward Only | Bidirectional |
| Memory Overhead | Lower | Higher |
| Find (Worst Case) | O(n) | O(n) |
Skip Lists
Summary Description
Skip Lists are probabilistic data structures that augment a standard Linked List with multiple layers of forward pointers. This multi-level hierarchy allows the search process to “skip” over large sections of the list, effectively mimicking the efficiency of Binary Search in a linked structure.
Structure and Probabilistic Height
- Layers and Height: Each node contains a variable number of forward pointers; the number of layers a node participates in is referred to as the node’s height.
- Max Height: We typically define a maximum height h, where h = O(log n) (e.g., h = ⌈log₂ n⌉ when p = 1/2).
- Coin Flipping: To maintain balance without complex reshuffling, node heights are determined randomly. A weighted coin with probability p (commonly p = 1/2) is flipped repeatedly; the height increases with each “heads” until the first “tails” or the maximum height is reached.
- Efficient Traversal: Searches begin at the highest level of the head node, moving forward as long as the next node’s value is less than the target, and dropping down a level when the next value is too large.
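The coin-flipping rule for assigning node heights is compact enough to sketch directly (function name and the injectable `rng` parameter are my own conventions):

```python
import random

def random_height(max_height, p=0.5, rng=random):
    """Flip a coin of bias p; grow the height until tails or the cap is hit."""
    height = 1
    while height < max_height and rng.random() < p:
        height += 1
    return height
```

With p = 1/2, about half the nodes end up with height 1, a quarter with height 2, and so on, which is what yields the expected logarithmic search paths.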
Modification Logic
Similar to a Linked List, the cost of insertion or removal is dominated by the search time. Once the correct position is located, updating the pointers across the node’s levels takes time proportional to the node’s height, which is expected O(1) since heights follow a geometric distribution.
Complexity Analysis
Performance Metrics
- Find: Best Case O(1), Average Case O(log n) (expected), Worst Case O(n)
- Insert: Best Case O(1), Average Case O(log n) (expected), Worst Case O(n)
- Remove: Best Case O(1), Average Case O(log n) (expected), Worst Case O(n)
Analysis Details
- Worst Case: In the highly unlikely event that all nodes are assigned the same height (e.g., height 1), the structure degenerates into a standard Singly-Linked List with O(n) performance.
- Average Case: Due to the probabilistic distribution of node heights, the expected search path length is logarithmic, providing O(log n) efficiency for finding, inserting, and removing elements.
- Best Case: The best case for a find occurs if the target is the very first element checked, giving O(1). For insertion and removal, even with a best-case find, we must still update the pointers across the node’s levels (an expected-constant number of them).
Space Complexity
The space complexity for a Skip List is generally discussed in terms of its expected and worst-case bounds.
- Expected Space: On average, the number of pointers is a constant multiple of n (with p = 1/2, about 2n pointers in expectation), resulting in an expected space complexity of O(n).
- Worst-Case Space: In the worst-case height distribution (every node reaching the maximum height h), the space complexity can reach O(n · h) = O(n log n), though the probability of this occurring is extremely low given the coin-flip logic.
Summary Comparison
Comparison Table
| Feature | Linked List | Skip List |
|---|---|---|
| Search Strategy | Linear | Multi-level “Skip” |
| Search Time (Average) | O(n) | O(log n) |
| Implementation Type | Deterministic | Probabilistic |
| Structure | Single level | Logarithmic levels |
Heap
Summary Description
A Heap is a specialized complete binary tree that satisfies the Heap Property. Because it is a complete tree (all levels are fully filled except possibly the last, which is filled from left to right), it maintains a balanced height of O(log n).
The Heap Property
- Priority: For any two nodes P and C, if P is the parent of C, then P must have a higher or equal priority than C.
- Min-Heap: The parent’s value is always less than or equal to its children’s values. The root is the minimum element.
- Max-Heap: The parent’s value is always greater than or equal to its children’s values. The root is the maximum element.
Array List Implementation
Heaps are most efficiently implemented using an Array List (Dynamic Array). This provides better cache locality and eliminates the need for explicit pointers. In an array-based heap starting at index 0:
- Root: Stored at index 0.
- Parent of node at i: Located at index ⌊(i − 1) / 2⌋.
- Left Child of node at i: Located at index 2i + 1.
- Right Child of node at i: Located at index 2i + 2.
Complexity Analysis
Performance Metrics
- Peek: Best/Average/Worst Case O(1). We simply access the element at index 0.
- Insert: Best Case O(1), Average/Worst Case O(log n).
- Pop (Extract): Best Case O(1), Average/Worst Case O(log n).
Operational Logic
- Insertion (Up-Heap/Bubble-up): The new element is placed at the first available spot at the end of the array to maintain the “complete tree” property, then compared with its parent and swapped upwards until the heap property is restored.
- Pop (Down-Heap/Trickle-down): The root is removed, and the last element in the array is moved to the root position. This element is then swapped downwards with its highest-priority child until the heap property is restored.
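A minimal min-heap sketch of both operations, using the index arithmetic from the section above (Python's standard `heapq` does the same thing; these function names are illustrative):

```python
def heap_push(heap, value):
    """Min-heap insert: place at the end, then bubble up. O(log n)."""
    heap.append(value)
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2                    # parent index formula
        if heap[parent] <= heap[i]:
            break                                 # heap property holds
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent

def heap_pop(heap):
    """Min-heap extract: move the last element to the root, trickle down."""
    top = heap[0]
    last = heap.pop()
    if heap:
        heap[0] = last
        i = 0
        while True:
            left, right = 2 * i + 1, 2 * i + 2   # child index formulas
            smallest = i
            if left < len(heap) and heap[left] < heap[smallest]:
                smallest = left
            if right < len(heap) and heap[right] < heap[smallest]:
                smallest = right
            if smallest == i:
                break
            heap[i], heap[smallest] = heap[smallest], heap[i]
            i = smallest
    return top
```

Repeatedly popping yields the elements in ascending order, which is the basis of heapsort.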
Space Complexity
The space complexity for a Heap is O(n).
- Storage efficiency: Since it is implemented as an Array List, it follows the space complexity of dynamic arrays. Because the tree is always complete, there are no null pointers or wasted internal nodes, making it very memory-efficient compared to a standard linked binary tree.
Summary Comparison
Comparison Table
| Feature | Heap | Sorted Array List |
|---|---|---|
| Find Min/Max | O(1) | O(1) |
| Insert | O(log n) | O(n) |
| Remove Min/Max | O(log n) | O(1) at one end, O(n) at the other |
| Structure | Complete Tree | Linear |
Binary Search Trees (BSTs)
Summary Description
A Binary Search Tree (BST) is a fundamental hierarchical data structure where each node follows a specific ordering rule: the value of any node is greater than all values in its left subtree and smaller than all values in its right subtree.
Specialized BST Variants
To solve the “degeneracy” problem (where a tree becomes a linear list), several variants exist:
- Randomized Search Trees (Treap, RST): A combination of a BST and a Heap. Each node has a key (following the BST property) and a random priority (following the Heap property).
- AVL Tree: A strictly self-balancing BST where the heights of the left and right subtrees of any node differ by at most one.
- Red-Black Tree: A self-balancing BST that uses “colors” (red/black) and specific structural rules to maintain balance. It is less strict than AVL but more efficient for insertions and deletions.
Complexity Analysis: Regular BST
Performance Metrics
- Find/Insert/Remove:
  - Worst Case: O(n) — Occurs if elements are inserted in sorted order, creating a “skewed” tree that resembles a Linked List.
  - Average Case: O(log n) — Assuming a random distribution of keys.
  - Best Case: O(1) — If the operation happens at the root node.
Space Complexity
- O(n): Each node stores data and three pointers (parent, left, right), requiring constant space per node.
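The BST ordering rule translates directly into code. A minimal sketch (parent pointers omitted for brevity; duplicate keys are simply ignored here, one of several common conventions):

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    """Walk down comparing keys; attach a new leaf. O(h), h = tree height."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root

def bst_find(root, key):
    """Iterative search: go left for smaller keys, right for larger."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None
```

Inserting keys in sorted order makes every `bst_insert` call recurse down one side only, producing exactly the degenerate chain described above.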
Complexity Analysis: Randomized Search Trees (Treap, RST)
Performance Metrics
- Worst Case: O(n) — High probability of being logarithmic, but theoretically could degenerate if keys and random priorities align poorly.
- Average Case: O(log n) — Random priorities act as a safeguard against skewed input.
Space Complexity
- O(n): Similar to a BST, with the addition of a priority value stored in each node.
Complexity Analysis: AVL Tree
Performance Metrics
- Find/Insert/Remove (Worst Case): O(log n) — The strict balance property guarantees logarithmic height.
- Comparison: Because AVL is strictly balanced, it is usually faster for find operations than Red-Black trees but requires more “rotations” during modifications.
Space Complexity
- O(n): Nodes often store an additional height (or balance factor) to manage balancing.
Complexity Analysis: Red-Black Tree
Performance Metrics
- Find/Insert/Remove (Worst Case): O(log n) — Balance is maintained such that the longest path to a leaf is no more than twice as long as the shortest.
- Comparison: Red-Black trees are generally faster for insert and remove operations because they require fewer structural adjustments (rotations) compared to AVL trees.
Space Complexity
- O(n): Nodes store a single bit of information for “color” (red or black).
Summary Comparison Table
| Feature | Regular BST | Randomized (Treap) | AVL Tree | Red-Black Tree |
|---|---|---|---|---|
| Balance Strategy | None | Probabilistic (Priorities) | Strict Height Factor | Color-based Rules |
| Search (Worst) | O(n) | O(n) | O(log n) | O(log n) |
| Modification (Worst) | O(n) | O(n) | O(log n) | O(log n) |
| Best Use Case | Small/Random Data | General Purpose | Search-Heavy Apps | Modification-Heavy Apps |
B-Tree and B+ Tree
Summary Description
B-Trees and B+ Trees are “fat” balanced search trees designed to handle large amounts of data, particularly for systems that read and write large blocks of data (like databases and filesystems). They minimize disk I/O by having a high branching factor, which keeps the tree very flat.
B-Tree Fundamentals
- Node Capacity: For a B-Tree of order b, every internal node must have at least ⌈b/2⌉ and at most b children (the root is exempt from the minimum).
- Sorted Order: Elements within a single node are kept in ascending order.
- Child Placement: Child pointers are positioned between elements, acting as dividers. For adjacent elements x and y, the subtree between them contains all values v such that x < v < y.
- Perfect Balance: All leaf nodes must reside on the same level.
B+ Tree Variant
A B+ Tree is an optimization where the internal nodes function only as a “road map” (containing search keys), while the actual data records are stored exclusively in the leaf nodes.
- Internal vs. Leaf: Internal nodes contain up to b children and b − 1 keys. Leaves contain the actual data records (up to a fixed per-leaf capacity).
- Search Keys: Internal nodes store keys to direct the search. In many implementations, the smallest data record in a subtree is used as the search key in the parent.
- Leaf Connectivity: Leaves are often linked together in a list to allow for efficient range queries (e.g., “find all records between 10 and 50”).
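The linked-leaf range query is worth a tiny sketch. This is a deliberately stripped-down model (a real B+ tree would first descend from the root to the starting leaf; here we scan from the leftmost leaf, and the `Leaf` class is a hypothetical stand-in):

```python
class Leaf:
    """A B+ tree leaf: sorted records plus a pointer to the next leaf."""
    def __init__(self, records):
        self.records = records   # sorted within the leaf
        self.next = None

def range_query(first_leaf, lo, hi):
    """Walk the linked leaf level, collecting records in [lo, hi]."""
    results = []
    leaf = first_leaf
    while leaf is not None:
        for r in leaf.records:
            if r > hi:
                return results   # records are globally sorted; stop early
            if r >= lo:
                results.append(r)
        leaf = leaf.next
    return results
```

Because the leaves form one globally sorted chain, a range query is a single linear walk; in a B-Tree, the same query would require repeatedly climbing and descending the tree.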
Complexity Analysis: B-Tree
Performance Metrics
- Find: Worst Case O(log n), treating the branching factor b as a constant. The high branching factor makes this very fast, since the height is only O(log_b n).
- Insert/Remove: Worst Case O(log n). This accounts for traversing the height (O(log_b n)) and potentially shifting up to b elements within a node to maintain sorted order during a split or merge.
- Best Case Find: O(1), if the target is in the root node.
Space Complexity
- O(n): Nodes are required to be at least half-full (at least ⌈b/2⌉ − 1 keys). Even in the worst-case scenario where every node is at minimum occupancy, the space remains linear.
Complexity Analysis: B+ Tree
Performance Metrics
- Find: Worst Case O(log n). You must travel to the leaf level (O(log_b n)) and then find the record within the leaf node (O(log b) with binary search, or O(b) with a linear scan).
- Insert/Remove: Worst Case O(log n), again treating b as a constant. This involves the search path plus the overhead of shifting keys in internal nodes and records in a leaf.
- Best Case Find: O(log n). Unlike a B-Tree, you must always travel to the leaf level to retrieve a record, even if the key is present in the root.
Space Complexity
- O(n): Because nodes must be at least 50% full, the overhead for internal search keys and pointers remains proportional to the number of data records.
Summary Comparison Table
| Feature | B-Tree | B+ Tree |
|---|---|---|
| Data Location | Any node (internal or leaf) | Only leaf nodes |
| Search Path | Can end at any level | Always ends at leaf level |
| Range Queries | Difficult (requires tree traversal) | Easy (leaves are linked) |
| Branching Factor | Lower (data takes up node space) | Higher (internal nodes are “slim”) |
Hash Table and Hash Map
Summary Description
A Hash Table is a data structure that maps keys to indices in an array using a hash function. A Hash Map extends this concept by storing (key, value) pairs. The primary goal of a Hash Table is to achieve near-constant time performance for core operations.
Key Concepts
- Hash Function: Converts a key into an integer. For a function h to be valid, if k₁ = k₂, then h(k₁) = h(k₂). A “good” function minimizes collisions (h(k₁) = h(k₂) when k₁ ≠ k₂).
- Load Factor (α): Defined as α = n / m, where n is the number of elements and m is the table capacity. Generally, α should stay below roughly 0.7 to maintain performance.
- Capacity: Often chosen as a prime number to improve the distribution of keys and reduce collision patterns.
Collision Resolution Strategies
- Open Addressing: All elements are stored within the array itself.
  - Linear Probing: If a collision occurs, check the next available slot ((h(k) + i) mod m, for i = 1, 2, …).
  - Double Hashing: Uses a second hash function to determine the “step size” for probing ((h₁(k) + i · h₂(k)) mod m).
  - Random Hashing: Uses a pseudorandom sequence seeded by the key to find available slots.
- Closed Addressing (Separate Chaining): Each array slot points to a separate data structure (like a Linked List or BST) that holds all keys hashing to that index.
- Cuckoo Hashing: Uses two hash functions and two possible slots per key. If a collision occurs, the new key “kicks out” the resident key, which then moves to its alternative slot.
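Linear probing is the simplest of these strategies to sketch. This toy map omits deletion and resizing entirely (both complicate open addressing in practice), and the class name is illustrative:

```python
class LinearProbingMap:
    """Open addressing with linear probing; no deletion or resizing (sketch)."""
    def __init__(self, capacity=11):           # small prime capacity
        self.capacity = capacity
        self.slots = [None] * capacity         # holds (key, value) pairs or None

    def _probe(self, key):
        """Step forward from h(k) until an empty slot or the key is found."""
        i = hash(key) % self.capacity
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % self.capacity        # linear step to the next slot
        return i

    def put(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def get(self, key):
        slot = self.slots[self._probe(key)]
        return slot[1] if slot is not None else None
```

Note that deletion cannot simply blank a slot: it would break probe chains for keys inserted past it, which is why real implementations use tombstones or rehashing.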
Complexity Analysis: Probing (Linear, Double, Random)
Performance Metrics
- Find/Insert/Remove:
  - Worst Case: O(n) — Occurs if every key hashes to the same index or causes a massive cluster, requiring a probe through all elements.
  - Average Case: O(1) — Provided the load factor is kept low and the hash function is uniform.
  - Best Case: O(1) — The key is found at the initial hashed index with no collisions.
Space Complexity
- O(n): The table size is maintained at a constant multiple of the number of elements.
Complexity Analysis: Separate Chaining
Performance Metrics
- Find/Remove:
  - Worst Case: O(n) — Occurs if all elements hash to the same bucket, turning the search into a linear scan of a Linked List.
  - Average Case: O(1) — Assuming a uniform distribution where each bucket has a small, constant number of elements.
- Insert: O(1) if prepending to a list; O(n) in the worst case if checking for duplicates is required.
Space Complexity
- O(n + m): Includes the array capacity m plus the space for each of the n nodes in the chains; with m kept proportional to n, this is O(n).
Complexity Analysis: Cuckoo Hashing
Performance Metrics
- Find/Remove:
  - Worst Case: O(1) — A key is guaranteed to be in one of only two specific locations. This makes Cuckoo Hashing excellent for applications requiring predictable lookup times.
- Insert:
  - Worst Case: O(n) — If a cycle of “kicking out” elements is detected, the table must be resized and rebuilt (rehashed).
  - Average Case: O(1) (expected).
Summary Comparison Table
| Feature | Linear Probing | Separate Chaining | Cuckoo Hashing |
|---|---|---|---|
| Collision Logic | Find next empty slot | Linked List per bucket | Kick out existing key |
| Find (Worst) | O(n) | O(n) | O(1) |
| Insert (Avg) | O(1) | O(1) | O(1) expected |
| Space Efficiency | High (in-place) | Lower (pointers) | High (in-place) |
| Load Factor Limit | Very sensitive (α ≲ 0.7 ideal) | Less sensitive | Sensitive (α < 0.5 ideal) |
Multiway Trie
Summary Description
A Multiway Trie is a specialized tree-based data structure used for retrieval, typically of strings over a defined alphabet Σ. Unlike other trees that store the entire key within a node, a Trie stores keys along the paths from the root, where each edge represents a single character.
Structure and Key Properties
- Edge Labeling: Each edge is labeled with a character from the alphabet Σ.
- Node Degree: Every node can have a maximum of |Σ| children, representing each possible character in the alphabet.
- Key Path: A key is defined by the concatenation of edge labels on the path from the root to a “word node” (a node marked as the end of a valid string).
- Parameters: We use k to represent the length of the longest key and n for the total number of elements stored.
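A minimal trie sketch using the array-of-size-|Σ| representation described above, assuming a lowercase a–z alphabet:

```python
ALPHABET_SIZE = 26  # assumption: lowercase a-z

class TrieNode:
    def __init__(self):
        self.children = [None] * ALPHABET_SIZE  # one slot per character
        self.is_word = False

def trie_insert(root, word):
    """O(k): walk/create one node per character, then mark the word node."""
    node = root
    for ch in word:
        i = ord(ch) - ord('a')
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.is_word = True

def trie_find(root, word):
    """O(k): each character is an O(1) array lookup."""
    node = root
    for ch in word:
        i = ord(ch) - ord('a')
        if node.children[i] is None:
            return False
        node = node.children[i]
    return node.is_word
```

The `is_word` flag is what distinguishes a stored key from a mere prefix of one.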
Complexity Analysis
Performance Metrics
- Find: Worst Case O(k). The algorithm traverses the tree character by character. Since each step involves looking up a pointer in an array of size |Σ|, each character is processed in O(1) time.
- Insert: Worst Case O(k). Similar to find, the algorithm follows the path of the string and creates new nodes/edges (O(1) each, ignoring the cost of allocating each child array) for any characters not already present.
- Remove: Worst Case O(k). After finding the “word node,” the “is-word” marker is removed. Depending on the implementation, the algorithm may also prune the branch upwards if the nodes are no longer needed.
Average and Best Case
- Average Case: O(k). While the formal complexity is O(k), the practical speed depends on the distribution of key lengths. If keys are distributed uniformly, the expected time scales with the average length of the strings.
- Best Case: O(1). For `Find` and `Remove`, if the first character of the query is not a child of the root, the operation terminates immediately.
Space Complexity
The space complexity of a Multiway Trie is often its primary drawback, as it prioritizes speed over memory efficiency.
- Worst-Case (Dense): O(|Σ|^k). If the Trie contains every possible string of length k, the structure grows exponentially with the alphabet size. Total nodes: Σᵢ₌₀ᵏ |Σ|ⁱ = (|Σ|^(k+1) − 1) / (|Σ| − 1)
- Worst-Case (Sparse): O(n · k · |Σ|). If all n words share no common prefixes, we have n separate paths of length k. Each node in these paths allocates an array of size |Σ| to store pointers for the alphabet, leading to high memory overhead.
[Image showing memory waste in a trie node with a large alphabet]
Summary Comparison Table
| Feature | Multiway Trie | Binary Search Tree (Strings) |
|---|---|---|
| Search Time | O(k) | O(k · log n) |
| Space Complexity | O(n · k · \|Σ\|) | O(n · k) |
| Prefix Matching | Excellent / Natural | Difficult |
| Alphabet Dependence | Highly Dependent | Independent |
Ternary Search Tree (TST)
Summary Description
A Ternary Search Tree is a hybrid data structure that combines the space efficiency of a Binary Search Tree with the prefix-searching capabilities of a Trie. Each node stores a single character and has at most three children.
Node Structure and Logic
- Left Child: Points to a node whose character value is less than the current node’s.
- Right Child: Points to a node whose character value is greater than the current node’s.
- Middle Child: Points to a node representing the next character in the current word string.
- Word Representation: A word is formed by collecting the characters of nodes where you take the “middle” path, ending at a designated “word node.”
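The three-way branching logic can be sketched directly (recursive style for brevity; node fields follow the description above):

```python
class TSTNode:
    def __init__(self, ch):
        self.ch = ch
        self.left = None     # characters less than ch
        self.mid = None      # next character of the word
        self.right = None    # characters greater than ch
        self.is_word = False

def tst_insert(node, word, i=0):
    """Branch left/right by character comparison, descend mid to advance."""
    ch = word[i]
    if node is None:
        node = TSTNode(ch)
    if ch < node.ch:
        node.left = tst_insert(node.left, word, i)
    elif ch > node.ch:
        node.right = tst_insert(node.right, word, i)
    elif i + 1 < len(word):
        node.mid = tst_insert(node.mid, word, i + 1)
    else:
        node.is_word = True
    return node

def tst_find(node, word, i=0):
    if node is None:
        return False
    ch = word[i]
    if ch < node.ch:
        return tst_find(node.left, word, i)
    if ch > node.ch:
        return tst_find(node.right, word, i)
    if i + 1 < len(word):
        return tst_find(node.mid, word, i + 1)
    return node.is_word
```

Note that only the middle edge consumes a character of the query; left/right edges are BST-style comparisons at the same character position.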
Complexity Analysis
Performance Metrics
- Worst Case (O(n)): Like a standard BST, if keys are inserted in alphabetical or reverse-alphabetical order, the tree can degenerate into a long, linear chain (essentially a Linked List).
- Average Case (O(log n)): In a well-balanced TST, the search time is logarithmic in the number of keys. While the alphabet size heavily impacts Multiway Tries, TSTs are much more resistant to large alphabets because they use BST logic at each character level.
- Best Case (O(k)): If the word you are looking for (length k) was the very first word inserted, you will follow a direct “middle-child” path with minimal left/right branching.
Space Complexity
- O(n · k): Each character of each unique word (n words of length up to k) typically requires one node.
- Efficiency: TSTs are significantly more space-efficient than Multiway Tries because they only allocate nodes for characters that actually exist in the dataset, rather than allocating a full |Σ|-sized array for every single node.
Summary Comparison Table
| Feature | Multiway Trie | Ternary Search Tree |
|---|---|---|
| Children per Node | Up to \|Σ\| | 3 |
| Search Time (Avg) | O(k) | O(log n) |
| Space Complexity | O(n · k · \|Σ\|) | O(n · k) |
| Alphabet Sensitivity | High | Low |
Summary of String Data Structures (Quick Reference)
| Data Structure | Search (Avg) | Space (Worst) | Best Use Case |
|---|---|---|---|
| Multiway Trie | O(k) | O(n · k · \|Σ\|) | High-speed prefix search with small alphabets |
| TST | O(log n) | O(n · k) | Large alphabets, limited memory |
| BST (Strings) | O(k · log n) | O(n · k) | Simple implementations, general purpose |
Disjoint Set (Union-Find)
Summary Description
The Disjoint Set ADT manages a collection of non-overlapping sets. It is defined by two primary operations: Union, which merges two sets into one, and Find, which determines which set a particular element belongs to (usually by returning a “representative” or sentinel node).
Up-Tree Implementation
Disjoint Sets are most efficiently implemented using Up-Trees, where each node points to its parent rather than its children. The root of each tree serves as the sentinel node for that set.
- Union-by-Size (Rank): To keep trees flat, the root of the smaller set is attached to the root of the larger set. This ensures the tree height stays logarithmic.
- Path Compression: During a `Find` operation, every node along the path to the root is reattached directly to the root. This drastically flattens the tree for future operations.
- Optimal Strategy: Because Path Compression frequently changes tree heights, making “Union-by-Height” difficult to track accurately, most implementations use Union-by-Size combined with Path Compression.
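Both optimizations fit in a short sketch (the `parent` array encodes the up-trees; a root is its own parent):

```python
class DisjointSet:
    """Up-tree forest with union-by-size and path compression."""
    def __init__(self, n):
        self.parent = list(range(n))   # each element starts as its own root
        self.size = [1] * n            # tree sizes, valid only at roots

    def find(self, x):
        root = x
        while self.parent[root] != root:       # walk up to the sentinel
            root = self.parent[root]
        while self.parent[x] != root:          # path compression pass
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False                       # already in the same set
        if self.size[ra] < self.size[rb]:      # attach smaller under larger
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True
```

`union` returning False when both elements already share a root is exactly the check used for cycle detection in Kruskal's algorithm.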
Complexity Analysis
Performance Metrics
- Find:
  - Worst Case: O(log n) (with Union-by-Size only).
  - Amortized: O(α(n)). The inverse Ackermann function α(n) grows so slowly that it is effectively constant for all practical values of n.
  - Best Case: O(1) (if the element is already the sentinel).
- Union:
  - Worst Case: O(log n) (dominated by the two `Find` operations required to locate the roots).
  - Amortized: O(α(n)).
  - Best Case: O(1) (if both elements provided are already roots).
Space Complexity
- O(n): Each of the n elements is stored exactly once, with each node requiring a single parent pointer (or an integer representing size if it is a root).
Summary Comparison Table (Updated)
| Data Structure | Search (Avg) | Space (Worst) | Best Use Case |
|---|---|---|---|
| Multiway Trie | O(k) | O(n · k · \|Σ\|) | High-speed prefix search with small alphabets |
| TST | O(log n) | O(n · k) | Memory-efficient prefix search for large alphabets |
| BST (Strings) | O(k · log n) | O(n · k) | Simple implementations, general purpose |
| Disjoint Set | O(α(n)) amortized | O(n) | Grouping elements and cycle detection in graphs |
Graphs and Graph Representation
Summary Description
A Graph is a mathematical structure consisting of a set of vertices (nodes) and a set of edges (connections). Graphs are used to model relationships between objects, such as social networks, road maps, or task dependencies.
Graph Classifications
- Directed vs. Undirected: In a directed graph, edges have a specific direction (from u to v). In an undirected graph, an edge between u and v is bidirectional.
- Weighted vs. Unweighted: Weighted graphs assign a numerical value (cost, distance, etc.) to each edge.
- Sparse vs. Dense: A sparse graph has few edges (|E| close to O(|V|)), while a dense graph has many edges (approaching the maximum possible, O(|V|²)).
- Simple Graphs: We typically disallow “parallel edges” (multiple edges between the same two nodes), ensuring |E| = O(|V|²).
Adjacency Matrix
An Adjacency Matrix is a 2D array where the entry at row i and column j indicates the presence (and weight) of an edge from vertex i to vertex j.
Performance Metrics
- Edge Lookup: O(1). You can instantly check if an edge exists by indexing the matrix.
- Iterating Outgoing Edges: O(|V|). To find the neighbors of v, you must scan the entire v-th row.
- Space Complexity: O(|V|²). This is constant regardless of how many edges actually exist, making it inefficient for sparse graphs.
Adjacency List
An Adjacency List is an array of pointers, where each pointer leads to a list (usually a Linked List or Dynamic Array) of that vertex’s neighbors.
Performance Metrics
- Edge Lookup: O(deg(v)), which is O(|E|) in the worst case. In a highly skewed graph where all edges originate from one node, you might have to scan the entire list of edges.
- Iterating Outgoing Edges: O(deg(v)). This is the most efficient way to traverse neighbors, as you only visit existing edges.
- Space Complexity: O(|V| + |E|). You only store the nodes and the edges that actually exist, making it the preferred choice for sparse graphs.
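An adjacency list is only a few lines in Python; a minimal sketch for a directed, unweighted graph (helper names are my own):

```python
from collections import defaultdict

def build_adjacency_list(edges):
    """Map each vertex to its list of outgoing neighbors."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)           # store only edges that actually exist
    return adj

def has_edge(adj, u, v):
    """Edge lookup is a scan of u's neighbor list: O(deg(u))."""
    return v in adj[u]
```

Swapping the inner `list` for a `set` trades O(deg(u)) lookups for O(1) expected lookups at the cost of extra memory, edging back toward matrix-like behavior.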
Summary Comparison Table
| Representation | Lookup Edge | Find Neighbors | Space Complexity | Best Use Case |
|---|---|---|---|---|
| Adjacency Matrix | O(1) | O(\|V\|) | O(\|V\|²) | Dense graphs |
| Adjacency List | O(deg(v)) | O(deg(v)) | O(\|V\| + \|E\|) | Sparse graphs |
Updated Master Comparison Table
| Data Structure | Search (Avg) | Space (Worst) | Best Use Case |
|---|---|---|---|
| Multiway Trie | O(k) | O(n · k · \|Σ\|) | High-speed prefix search with small alphabets |
| TST | O(log n) | O(n · k) | Memory-efficient prefix search for large alphabets |
| BST (Strings) | O(k · log n) | O(n · k) | Simple implementations, general purpose |
| Disjoint Set | O(α(n)) amortized | O(n) | Grouping elements and cycle detection |
| Adjacency List | O(deg(v)) edge lookup | O(\|V\| + \|E\|) | General graph algorithms (BFS, DFS, Dijkstra) |