How to Build a Computer 11: The Binary Search Algorithm

We’re taking a break from the manufacturing process to cover some ideas in programming. Algorithms, what that means and why. Sounds fancy, doesn’t it? It ain’t as bad as it sounds. Let’s jump right in:

What’s An Algorithm?

It sure as heck ain’t Al Gore’s killer dance moves. The word “Algorithm” means something like a recipe. It’s a series of rules, and if you follow the rules you get the promised result. Computer jerks talk about them because you have to break your instructions down into very simple bits in order to make a machine understand what it needs to do. Let’s try a very simple algorithm.

The MakeStuffBigger algorithm; pass in stuff, get bigger stuff out.

Pass in some stuff
Add one to it.
Return your enbiggened stuff.

That’s all it takes. You MakeStuffBigger on 17 and you get 18 out of it. This is the sort of thing a third grader uses when he’s saying “nuh-uh infinity-plus-one times!”, a practice upon which I’ve discoursed previously.

You can see how the recipe comparison works; a recipe is an algorithm for making a food. Let’s take a look at that.

The PizzaMe algorithm; Pass in dough, sauce, cheese, and one or more toppings. Returns one Pizza

Roll out pizza dough so it’s flat.
Apply sauce.
Apply cheese.
Apply more cheese
Look, I’m from Wisconsin, you’re not getting out of this until there’s enough cheese on that pizza, get me?
For Each topping in [Set of Toppings] apply Topping to pizza.
Preheat oven to 400 degrees Freedom (254 Commie)
Once oven is at temperature insert pizza
Bake for 20 minutes
Remove pizza from oven.

Notice anything? The preheating instruction comes in at step 7. Odds are you were already thinking you should start the oven at step one so you don’t have to wait so long. The oven can be warming up while you’re mucking about with the cheese. That’s the thing people generally worry about when they’re talking about algorithms; which one is quicker. Both the original and revised PizzaMe algorithms will produce a Pizza, just one is quicker.

You could also apply a layer of abstraction to this:

EasyPizzaMe: Pass in Cash, returns one Pizza

Call for delivery

The PizzaMe algorithm is still happening, it’s just happening at the pizza joint. You’ve abstracted past the point where you have to worry about those details. Neat, huh?

Okay, but back to computers.

What’s a Binary Search?

What’s a search? You’re looking through a thing to find another thing. Let’s say you’re looking for my house in Eau Claire Wisconsin. Here’s one way you could go about it.

Find a house in Eau Claire. Knock on the door.
If I answer the door you’re done.
If I don’t then find another house and check that one.

Not terribly efficient, and that’s even assuming I’m sociable enough to answer random door-knockers. Eventually it’d get the job done though. You could make it a little more efficient (3. Ask that guy if he knows Hank) but it’s still slow. But hey, Eau Claire only has like 60,000 people, if you set a computer to do it it’ll be quick enough. It’s not you knocking on those doors after all.

Let’s modify the parameters. Supposing I lived in New York City. At ten million folks it’s much less likely that your computer will perform this search in a reasonable amount of time. (Reasonable defined as ‘I’m getting bored over here!’ It’s actually a pretty useful standard.) Now suppose you don’t actually need to get to my house, you just need my phone number. You could look in the phone book. (I don’t have a land line, but my NYC alter-ego does. That guy’s a dork.)

Simple phone book search: Pass in a name, get out a phone number

Look at the first entry.
If it’s Hank Rhody you’re done.
If it isn’t, look at the next entry.

Still going to take a while, even for a computer. That half second while it’s thinking is enough for me to refresh my Ricochet alerts and get distracte. But hey, phone books are alphabetical by last name. Let’s try again, using that information:

Open the book to the middle. Check any name on that page.
If it’s Hank, you’re done (lucky you!)
If it isn’t, check whether it’s alphabetically before or after it. Probably before; R is towards the end of the alphabet.
Stick your thumb in the book at that page. Open the phone book halfway between your thumb and the correct end of the book.
Repeat.

Each time you’re halving the amount of names you’d have to look through. Even if you’re doing it stupidly you’re going to get to my name after something like thirty checks. It’s a little more complicated than check each name in order but it’s much quicker.

That’s a binary search algorithm; “Binary” means you’re splitting the dictionary in two. It’s pretty quick; it’s possible to make quicker algorithms but they get more complex, and I’ve never needed them at the scale I work. I only got into binary searches because the “check each one” was taking too long. I had an Excel file with a big table (~250k lines, at that time) that I wanted to match times against. Here’s a function I wrote in VBA to do just that.

MatchTimeToTable: Stick in a time, returns the row of the table that contains that time.

“hi” is the highest number line you’re considering, “lo” is the lowest number. Naturally midPt is the middle-est.

Does it make sense? Probably not just from reading through it. I’ll work out an example for you. Let’s say I’m looking between rows 2 and 200,000 for a value. Eventually I’ll find it in the slot 31,337. Here’s a walk-through of what happens:

Calculate midPt (99,999).
Is it correct? No.
Is the guess too small? No.
Then the guess is too big; reset the top row to 99,998 (the next smallest row after the guess) and try again.

Calculate midPt (49,999). Too big again.
Reset and calculate midPt again. 24,999. Too small. Adjust the lower bound this time to 25,000, and calculate it from there.

Calculate midPt again (49,999 – 25,000)/2 + 25,000 = 37,498 (That’s a slightly unusual calculation for an average. The minus/plus bit is to avoid the number getting temporarily too big and breaking things.)

37,498 is too big again. Reset your upper bound and calculate it again. Now we’re checking between 25,000 and 37,497.
Repeating, we get approximately 31,250 (close!), 34,375 (further), 32,750, 32,000, 31,650, 31,450, 31,350, 31,300, 31,325, 31,337

Without cooking the example we found one value in 200,000 in only fourteen guesses. Lighting quick! Okay, I may have rounded for ease in the later calculations; I was doing this manually. You get the point though; having a clever algorithm makes doing your job immensely quicker. Quicker enough to make a difference? It helped quite a bit with my particular problem. But elsewhere?

Why’s an Algorithm?

Once upon a time I was studying thickness of a polyimide film on stainless steel. Had to do with process development. I had a measurement machine and I wasn’t sure I trusted the output of it’s program. Found a way to export a list of coordinates. Three hundred and sixty thousand lines of X, Y, and Z positions. Okay, I can make a visual map of that stuff in an Excel sheet. My first pass at an algorithm was stupid; with nested loops. It’d run through the whole list once for each item in the list. Worse than that; I was asking it to do calculations proportional to the cube of the lines in the table. At least 2,000,000,000,000,000 number crunches. That’s a number so large the national debt hasn’t even reached it yet. Even with Moore’s law at my back I wasn’t going to finish that calculation.

Second pass at the algorithm was less stupid. It only looped through the list once, which meant that I was only asking it to perform on the order of 360,000 calculations. That’s a number so small the federal government spends it without noticing. But I’m just a hedge programmer, playing in Excel while the real programmers are out writing code in hexadecimal. What does this kind of stuff look like in the real world?

Let’s say you’re looking for a unique phrase on every web page on the internet. You go to Google and plunk in the name Claymore J. Flapdoodle and see what it spits out. It tells me “About 16,100 results (0.55 seconds) “. Google has an algorithm that it uses to search the whole internet, categorize results, and show you the best ones, and do it all in half a second. That isn’t (just) rooms and rooms of computers. I don’t know how they do it, but it’s a heck of a lot more complex than a binary search. Google’s search algorithm is one of the most valuable bits of intellectual property in the world. Now you can understand why. The difference a good algorithm makes is the difference between half a second and half a century.

I hope you’ve enjoyed our little interlude into logic. I’ll get back to exposing next week (swearsies) once I’ve gathered some more pictures. Join us next week for “The Terrible Truth of the Autoexposer — Revealed!” or “Freaking Finally”.

This is part eleven of my ongoing series on How to Build a Computer, the “That’s not a Knife” way. You may find previous parts under the tag How to Build a Computer. This week’s post has been brought to you by Eggrolls Plus. The Plus is good too, but you’ll never know it because you’ll love our eggrolls so much. Eggrolls Plus!

[First] [Previous] [Next]

Published in Science & Technology

Tags: Algorithms, cheese, google, How to Build a Computer, Math, Pizza, Saturday Night Science, Wisconsin Life

There are 32 comments.

Become a member to join the conversation. Or sign in if you're already a member.

Contributor
Hank Rhody, Probably Mad
@HankRhody

Post author

8:28 PM EDT ⋅ Aug 15, 2018

dnewlander (View Comment):

LC (View Comment):

[…]

I’ve always found this visualization below to be interesting. Poor bubble sort.

I remember having to write all of those sorts in high school (in Pascal!) just to prove that quicksort was fastest (because we didn’t have YouTube in 1985).

By the time I was in college, they didn’t even bother with that exercise, merely telling new programmers, “Use Quicksort. It’s the fastest.”

Just so y’all know, I spent a meeting today working out how I’d do a VBA quicksort algorithm and sorting ten random numbers in my notebook. I’d call it a boring and unproductive meeting, but this one really wasn’t.
- #31

Member
Judge Mental
@JudgeMental

12:57 AM EDT ⋅ Sep 2, 2018

My post on The Texas Two-Step.
- #32

Conservative Conversation + Podcasts

There are 32 comments.

@