Complexity Analysis Of Linear And Binary Search Using Python

A comprehensive walkthrough

Tiago Miguel

The optimization of any product or service is a key step to ensure the best possible efficiency, which translates into producing the most output with the least amount of resources.

Code is versatile, allowing the coder to solve problems in various ways. However, not every way is the most efficient one. To demonstrate this, I’ll state a searching problem and solve it with both a linear and a binary search algorithm.

Linear search checks for the solution starting at the beginning and moving forward one position at a time, while binary search eliminates half of the searching space every time it evaluates an element.

After solving the problem, we’ll compare and analyse their complexity using Big O notation. This measuring tool uses the worst-case scenario to quantify complexity.

Stating the problem

We are going to use a simple rotating list problem. To solve it, the algorithms should be able to find how many times an ascending sorted list of integers was rotated.

Rotations example.

In the figure above we can clearly see the sorted list and its final arrangement after three rotations. In each rotation, the last number is popped out of the list and inserted at the first position.
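
As a quick illustration, a single rotation can be reproduced in Python like this (the list below is just an example, not one of the test cases):

```python
nums = [1, 2, 3, 4, 5]

# One rotation: pop the last number and insert it back at the front
nums.insert(0, nums.pop())

print(nums)  # [5, 1, 2, 3, 4]
```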

We’ll try to cover a considerable number of edge cases. Edge cases are combinations of values and rotations that require slightly different logic, which our algorithms should be able to handle.

  1. A list of size 10 rotated 3 times;
  2. A list that wasn’t rotated;
  3. A list rotated once;
  4. A list of size n rotated n-1 times;
  5. An empty list;
  6. A list of size 1;
  7. A list of size 15 with repeating values rotated 6 times;
  8. A list of size 15, where the greatest value is repeated, rotated once.

We’ll assign -1 as a hypothetical number of rotations for empty lists. These should be enough edge cases; let’s take a look at the lists:
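
A minimal sketch of how these test cases could be laid out is shown below; the key names (input, nums, output) match the description that follows, but the variable name tests and the concrete values are illustrative assumptions:

```python
# Hypothetical layout of the test cases described above
tests = [
    # 1. a list of size 10 rotated 3 times
    {'input': {'nums': [8, 9, 10, 1, 2, 3, 4, 5, 6, 7]}, 'output': 3},
    # 2. a list that wasn't rotated
    {'input': {'nums': [1, 2, 3, 4, 5]}, 'output': 0},
    # 3. a list rotated once
    {'input': {'nums': [5, 1, 2, 3, 4]}, 'output': 1},
    # 4. a list of size n rotated n-1 times
    {'input': {'nums': [2, 3, 4, 5, 1]}, 'output': 4},
    # 5. an empty list
    {'input': {'nums': []}, 'output': -1},
    # 6. a list of size 1
    {'input': {'nums': [7]}, 'output': 0},
    # 7. a list of size 15 with repeating values rotated 6 times
    {'input': {'nums': [5, 6, 6, 7, 7, 8, 1, 1, 2, 2, 3, 3, 4, 4, 5]}, 'output': 6},
    # 8. a list of size 15 where the greatest value is repeated, rotated once
    {'input': {'nums': [14, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]}, 'output': 1},
]
```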

Each list is wrapped in a dictionary, and the dictionaries are gathered in a list that is passed to a function that loops over each of them.

You can see that each dictionary has an input key, which contains another key, nums, whose value is the rotated list, and an output key containing the number of rotations we want the algorithms to find.

Linear search

The logic of the linear search algorithm is simple. It starts at the beginning of the list, counting the position, and at every iteration checks whether the next value is smaller. If so, because the list is sorted, it considers the current number to be the highest and returns the position.

We shouldn’t forget that Python starts counting at 0. So, to return the correct position value (by human standards), we should add 1 to it.
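
Put together, a straightforward implementation along these lines might look like the following sketch (count_rotations_linear is an assumed name, not necessarily the one used in the original notebook):

```python
def count_rotations_linear(nums):
    # By convention, an empty list gets the hypothetical value -1
    if len(nums) == 0:
        return -1

    position = 0
    # Walk the list until the next value is smaller than the current one;
    # the current value is then the largest, so the list was rotated position + 1 times
    while position < len(nums) - 1:
        if nums[position + 1] < nums[position]:
            return position + 1
        position += 1

    # No drop was found, so the list was never rotated
    return 0
```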

The figure below essentially represents what the code is doing. Each arrow means that the next value was checked. If it is greater, the code adds 1 to the position and jumps to the next cycle (green arrow). If it is smaller, the loop ends and returns the current position (red arrow).

Linear search iterates each position one by one.

Binary search

The major advantage of binary search over linear search is its greater efficiency. As mentioned above, the binary search algorithm checks the middle value and, based on some logic, drops half of the searching space and looks at the middle position of the remainder.

Because it doesn’t check each value one by one, it needs more complex logic. Before presenting the code, let me show you some examples of how the algorithm deals with the problem.

Comparison between first and last value to see if list is unrotated.

The blue numbers represent, on the left of the list, the position of the middle number and, inside the list, the middle number itself. The middle position is calculated by halving the current searching window with integer division, which rounds down.

The first thing the code does is check whether the list is empty. If it is, -1 is returned. If not, it checks whether the value in the lowest position is lower than the value in the highest position (straight green arrows). If so, the list is unrotated and the algorithm returns 0.

Dropping of the right half of the list.

If the first value is greater than the last, the code checks whether the value in the position next to the middle is smaller (red round arrow). Because it isn’t, hence the colour red, it then checks whether the value in the first position is greater than the middle value, which it is, represented by a green straight arrow.

This means that the greatest value is between the first and the middle position. The middle position becomes the new highest position and the right side of the array is dropped. A new middle position is then calculated.

The same loop is applied and the code finds the answer at the new middle position, because the following value is smaller (green round arrow). It returns the third position.

Dropping of the left half of the list.

If the following number isn’t smaller and the number in the lowest position isn’t greater than the middle number, the lowest position is replaced by the position after the middle and the left half is dropped.

Note that when the correct value is on the right side, the middle value is omitted, but when it is on the left side the middle value is kept. The reason for this is that the code needs the middle value to be followed by a smaller value in order to return the correct position.

Demonstration of why the middle position is kept when the right half is dropped.

Looking at the three examples above, we start to understand the logic behind the binary search algorithm. With the code presented below, can you see how these steps are integrated into it, and what other steps there are, if any?
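
Here is one possible way to write that logic down; it is a sketch based on the steps above, and count_rotations_binary is again an assumed name:

```python
def count_rotations_binary(nums):
    # Empty lists get the hypothetical value -1
    if len(nums) == 0:
        return -1

    lo, hi = 0, len(nums) - 1

    # First value smaller than the last: the list was never rotated
    if nums[lo] < nums[hi]:
        return 0

    while lo <= hi:
        mid = (lo + hi) // 2

        # The value after the middle is smaller, so the middle holds the
        # greatest value: the list was rotated mid + 1 times
        if mid + 1 <= hi and nums[mid + 1] < nums[mid]:
            return mid + 1
        # The greatest value lies between the lowest and the middle position:
        # drop the right half but keep the middle value
        elif nums[lo] > nums[mid]:
            hi = mid
        # Otherwise the greatest value lies after the middle:
        # drop the left half, middle included
        else:
            lo = mid + 1

    return 0
```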

You can see the body of the function is slightly more complicated. Just to be sure, let’s check that both algorithms return correct results.

Testing the algorithms

To test the algorithms, we define a function that accepts either search function and prints additional information such as the list being analysed, the predicted and true number of rotations, and the elapsed time.
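
One possible shape for such a helper is sketched below; it takes a single search function plus the list of test cases, and the name evaluate_search and the use of time.perf_counter are assumptions of this sketch:

```python
import time

def evaluate_search(search_fn, tests):
    """Run search_fn on every test case and print the result and runtime."""
    for test in tests:
        nums = test['input']['nums']
        expected = test['output']

        start = time.perf_counter()
        predicted = search_fn(nums)
        elapsed_ms = (time.perf_counter() - start) * 1000

        print(f"list: {nums}")
        print(f"predicted: {predicted}, expected: {expected}, "
              f"passed: {predicted == expected}, time: {elapsed_ms:.4f} ms")
```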

Let’s evaluate the linear search algorithm:
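
With the helper sketched above, this could be as simple as:

```python
evaluate_search(count_rotations_linear, tests)
```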

Let’s evaluate the binary search algorithm:
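
And likewise:

```python
evaluate_search(count_rotations_binary, tests)
```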

We can see that all edge cases were correctly evaluated by both search algorithms. Great; we can now go through the complexity analysis and what the Big O notation is.

Complexity analysis

We are using the Big O notation to quantify the complexity of both algorithms. Complexity is the amount of time and space an algorithm needs to analyse an input of size n [1].

Using the worst-case scenario means that the time complexity of an algorithm with many terms growing at different rates is calculated as if all of them grew at the highest rate [1].

For instance, for a polynomial function of the form f(n) = a + bn + cn² + … + dn^m, the Big O of f(n) is of order m: O(f(n)) = O(n^m) [2].

As for space complexity, it depends on the extra memory the algorithm needs to perform its task on an input array of size n [1] [2]. Both algorithms only use a handful of variables regardless of the input size, so their space complexity is O(1).

For linear search, time complexity is O(n) because in the worst case, the entire input array of size n is searched one by one until the algorithm finds the result.

For binary search, time complexity is different. At every iteration, the searching space decreases by half, that means that the time needed to analyse it also decreases by half.

Size decrease of an array in binary search.

Given that the final length of the array is 1 in the worst-case scenario, we can rearrange it to:

logarithmic complexity.
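
In symbols: after k iterations the searching space has shrunk from n to n/2^k. Setting n/2^k = 1 for the worst case gives 2^k = n, i.e. k = log₂(n), which is why the time complexity of binary search is O(log(n)).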

So, let’s say we would like to know how much more time a search algorithm with an input array of size 10 would take to come up with the correct answer if the array increased ten times in size:

Complexity comparison between O(n) and O(log(n)).
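
We can check this quickly in Python; this is only a back-of-the-envelope comparison of growth factors, not a benchmark:

```python
import math

n_old, n_new = 10, 100  # the array grows ten times in size

linear_factor = n_new / n_old                        # O(n): 10.0
binary_factor = math.log2(n_new) / math.log2(n_old)  # O(log(n)): ~2.0

print(linear_factor, binary_factor)
```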

We can see that the time needed by the linear search algorithm grows by a factor of 10, whilst the time for binary search only roughly doubles. This difference becomes powerful for big searching spaces.

Big O complexity types.

You can see that there are many complexity types, but I won’t go into much more detail about the others here. In the figure you can also visualise the difference in time complexity between the two algorithms we address here.

Testing the complexity

To get a better understanding of how much binary search improves on linear search, we are going to test the runtime of both algorithms on an unrotated array with a large size of 10 million values.
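
A minimal sketch of such a test, reusing the functions defined above (the exact timings will of course depend on the machine):

```python
import time

# An unrotated, already-sorted list of 10 million values:
# the worst case for linear search, handled immediately by binary search
big_list = list(range(10_000_000))

for fn in (count_rotations_linear, count_rotations_binary):
    start = time.perf_counter()
    result = fn(big_list)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__}: rotations={result}, time={elapsed:.6f} s")
```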

It is clear that the runtime is vastly decreased when using binary search on large arrays. The linear search algorithm took almost 330 thousand times longer to complete.

Closing thoughts

There is a major difference in the time taken for both algorithms to complete, which leads to the conclusion that, when applicable, binary search should be used instead of linear search.

Also, one may think that, since binary search uses slightly more complex logic, it would be slower than linear search, but that is only true for small searching spaces, and the overhead quickly becomes negligible for bigger arrays.

If possible, logarithmic Big O complexity should be preferred, as it will consume fewer computational resources and optimize our code.

Thank you for reading and I hope you learned something from this story!

References

[1] Introduction to Binary Search and Complexity Analysis with Python, by Aakash, jovian.ai

[2] Analysis of Algorithms, geeksforgeeks.org
