Alexandre Lim

This book is a fantastic introduction to data structures and algorithms. I have already read it twice and often come back to it. You'll gain a strong foundation in your programming skills. The author did a great job at explaining the technical concepts. It's not an easy read, nor is it too complicated. I believe it's definitely worth investing time in reading the book.

Notes

The first step in writing fast code is to understand what data structures are and how different data structures can affect the speed of our code.

Data is a broad term that refers to all types of information. Data structures refer to how data is organized. But the organization of data doesn’t just matter for organization’s sake. It can significantly impact how fast your code runs.

The array is one of the most basic data structures in computer science. The size of an array is how many data elements the array holds. The index of an array is the number that identifies where a piece of data lives inside the array.

To understand the performance of any data structure, we need to analyze the common ways our code might interact with that data structure. Many data structures are used in four basic ways. These operations are:

Read
Search
Insert
Delete

When we measure how “fast” an operation takes, we do not refer to how fast the operation takes in terms of pure time, but instead in how many steps it takes.

A computer has immediate access to all of its memory addresses, but it has no idea offhand what values are contained at each memory address.

A basic search operation where the computer checks each cell one at a time—is known as linear search.

The efficiency of inserting a new piece of data into an array depends on where within the array you’re inserting it.

When allocating an array, the computer always keeps track of the array’s size.

The worst-case scenario for insertion into an array containing N elements is N + 1 steps. It’s an insertion at the beginning. It’s also the case with deletion, but it takes N steps.

A set is a data structure that does not allow duplicate values to be contained within it. Every insertion into a set first requires a search.

An algorithm is simply a set of instructions for completing a specific task.

The ordered array data structure requires that the values are always kept in order. While insertion is less efficient than a classic array, the ordered array shines when it comes to searching. We can use the binary search algorithm.

For linear search, each time we double the size of the array, we double the number of steps of our search. But in the case of binary search, we only add one more step. Keep in mind the trade-off: by using an ordered array, you have somewhat slower insertion, but a much faster search.

It’s important to realize that there usually isn’t a single data structure or algorithm that is perfect for every situation.

The key question is: if there are N data elements, how many steps will the algorithm take?

O(N) is known as having a linear time. O(1) algorithm can be referred to as having constant time.

The soul of Big O: how will an algorithm’s performance change as the data increases? It wants to tell you the story of how the number of steps increases as the data changes.

Big O Notation generally refers to the worst-case scenario unless specified otherwise.

O(log N) is the Big O way of describing an algorithm that increases one step each time the data is doubled. It means an algorithm takes as many steps as it takes to keep halving the data elements until we remain with 1.

Given an array of unsorted values, how can we sort them so that they end up in ascending order?

Bubble sort ⇒ in each pass-through, the highest unsorted value “bubbles” up to its correct position. Efficiency is O(N²), also referred to as quadratic time.
Selection sort ⇒ Efficiency is O(N²), but in reality, it’s twice as fast as Bubble sort.
Insertion sort ⇒ Efficiency is O(N²) with best case O(N)
Quicksort ⇒ Efficiency is O(N log N) with worst-case O(N²)

Often, when an algorithm nest one loop inside another, the algorithm is O(N²). Nested loops that result in O(N²) occur when each loop revolves around N.

Big O Notation only concerns itself with general categories of algorithm speeds. When two algorithms fall under the same classification of Big O, further analysis is required to determine which algorithm is faster.

All steps are significant. But when we express the steps in the Big O terms, we drop the constants and simplify the expression.

Big O Notation only takes into account the highest order of N when we have multiple orders added together.

Being able to discern best, average, and worst-case scenarios is a key skill in choosing the best algorithm for your needs.

Dealing with multiple datasets ⇒ possibility to express the time complexity as O(N * M).

With an algorithm of O(2ⁿ), each time we add one element of data, the algorithm doubles in steps.

A hash function must convert the same string to the same number every single time it’s applied.

A hash table’s efficiency depends on three factors:

How much data we’re storing in the hash table
How many cells are available in the hash table
Which hash function we’re using

A good hash table strikes a balance of avoiding collisions while not consuming lots of memory. The ideal load factor is 0.7 (7 elements / 10 cells).

Stacks and queues are elegant tools for handling temporary data. They have a special focus on the order in which the data is handled. They are known as abstract data types—it’s a kind of data structure that is a set of theoretical rules that revolves around some other built-in data structure. Abstract data types give us a new mental model for tackling problems.

Stack three constraints:

Data can be inserted only at the end of a stack.
Data can be deleted only at the end of a stack.
Only the last element of the stack can be read.

Queue three restrictions:

Data can be inserted only at the end of a queue.
Data can be deleted only from the front of the queue.
Only the element at the front of a queue can be read.

One type of problem in which recursion is a natural fit is when we need to delve into multiple layers of a problem without knowing how many layers there are. A second one is when we can make a calculation based on a subproblem of the problem at hand. A subproblem is a version of the very same problem applied to a smaller input.

The Top-Down thought process of recursion:

Imagine the function you’re writing had already been implemented by someone else.
Identify the subproblem of the problem.
See what happens when you call the function on the subproblem and go from there.

Using recursion doesn’t mean that you have to eliminate loops from your code. Note that recursion can slow down your code a lot.

O(N!) is known as factorial time.

Dynamic programming is the process of optimizing recursive problems that have overlapping subproblems. It’s typically accomplished with one of the two techniques: memoization or going bottom-up.

To partition an array is to take a random value from the array—which is then called the pivot—and make sure that every number that is less than the pivot ends up to the left of the pivot, and that every number greater than the pivot ends up to the right of the pivot.

The Quicksort algorithm is a combination of partitions and recursion. It works as follows:

Partition the array. The pivot is now in its proper place.
Treat the subarrays to the left and right of the pivot as their own arrays, and recursively repeat Steps 1 and 2. That means we’ll partition each subarray and end up with even smaller sub-arrays to the left and right of each subarray’s pivot. We then partition those sub-subarrays, and so on and so forth.
When we have a subarray that has zero or one element, that is our base case, and we do nothing.

Quickselect can be thought of as a hybrid of Quicksort and binary search. It can find a value without having to sort the entire array.

As of this writing, the fastest sorting algorithms we know of have O(N log N) speeds. There are a lot of algorithms that use sorting as a component of a larger process.

A linked list is a data structure that looks and acts similarly to an array. But instead of being a contiguous block of memory, that data is scattered in the computer’s memory. Connected data that is dispersed throughout memory are known as nodes. Each node comes with the memory address of the next node in the list. The first node is the head, and the last is the tail. When dealing with a linked list, we only have immediate access to its first node. Linked lists are an amazing data structure for moving through an entire list while making insertions or deletions.

A double linked list is O(1) for both inserting at the end and for deleting from the beginning. That’s what makes it a perfect fit for serving as the queue’s underlying data structure.

A binary tree is a tree where each node has zero, one, or two children. A binary search tree is a binary tree that also abides by the following rules:

Each node can have at most one left child and one right child.
A node’s left descendants can only contain values that are less than the node itself. Likewise, a node’s right descendants can only contain values greater than the node itself.

A binary search tree is significantly faster than an ordered array when inserting and deleting data. The process of visiting every node in a data structure is known as traversing the data structure.

A priority queue is a list whose deletions and access are like a classic queue but whose insertions are like an ordered array. Better than an ordered array, the heap data structure serves as a more efficient foundation. We’d rather use a data structure that is constantly very fast than a data structure than is sometimes extremely fast and sometimes slow.

The binary heap maintains the following conditions:

The value of each node must be greater (for max-heap) than each of its descendant nodes. This rule is known as the heap condition.
The tree must be complete. It is to ensure that the heap remains well-balanced.

The heap itself can be an abstract data type that can use an array under the hood. It’s because using an array to implement the heap solves the problem of the last node. A heap’s last node is the rightmost node at its bottom level.

Traits of an array-based heap:

To find the left child of any node ⇒ (index * 2) + 1
To find the right child of any node ⇒ (index * 2) + 2
To find a node’s parent ⇒ (index - 1) / 2

The trie (pronounced “try”) is a kind of tree that is ideal for text-based features such as autocomplete. It can have any number of child nodes.

A graph is a data structure that specializes in relationships. While all trees are graphs, not all graphs are trees. For a graph to be considered a tree, it cannot have cycles, and all nodes must be connected. In graph jargon, a node is a vertex, the lines between vertices are edges, and adjacent vertices are neighbors. A graph where all vertices are connected is said to be a connected graph. A specific sequence of edges to get from one vertex to another is called a path.

Two well-known approaches for graph search are depth-first search (DFS) and breadth-first search (BFS). BFS doesn’t use recursion but a queue. BFS is good for staying close, and DFS is ideal for moving far away quickly. Their efficiency is approximately O(V + E) in a worst-case scenario.

Another useful type of graph is the weighted graph. We can use Dijkstra’s algorithm to solve the shortest path problem.

The key question for space complexity: “if there are N data elements, how many units of memory will the algorithm consume? “ To describe space complexity, we’re only counting the new data the algorithm is generating. Note that a recursive function takes up a unit space for each recursive call it makes. Depending on your constraints, it’s always a tradeoff between time and space complexity.

A greedy algorithm is one that, in each step, chooses what appears to be the best option at that moment in time.

Last Updated

April 18th, 2022

A Common-Sense Guide to Data Structures and AlgorithmsBy Jay Wengrow

Notes

Last Updated