Weekend Geek: Strictly Speaking, One Immortal Monkey Would Suffice
This will be a recurring (maybe) series of posts on science, mathematics, and related topics. Apparently, there’s an appetite for this sort of thing at Ricochet. Who knew?
I can’t promise this will be a weekly feature, but call it one in a row so far. Some of you may be put off by math. I hope I can reel you in. Trying to discuss science without mathematics is a little like trying to discuss great works of art without seeing them or great musical works without hearing them. I’ll walk the reader through the math and make it as simple as it can be, but no simpler. Relax, you’ll love it.
Shall we begin?
Monkeying Around…
I’m sure we’ve all heard that “given enough monkeys with typewriters and enough time, they will eventually type the complete works of Shakespeare.” True or false? Not to get all Clintonian, but it depends on the meaning of the word “enough.” Some “scientists” have experimented with actual monkeys, only to find that they were more interested in breaking the typewriters and pooping on them. I certainly hope my tax dollars didn’t pay for that.
In another study, computer simulations were used to look for randomly generated sequences of 9 characters that would match something out of Shakespeare, so that they could be manually reassembled. Sorry, that’s cheating. First, computers are many orders of magnitude faster than live monkeys. Second, looking for short sequences in random order, then manually sorting and reassembling them, is vastly different from producing the complete works of Shakespeare, already arranged in the correct order, in one sequence.
Let’s forget about live, breathing, pooping monkeys for a bit. What we are really postulating is that a long enough sequence of random characters will contain the specific, non-random string we are looking for. This will be simpler if we start off small. Let’s assume we are typing only random digits and looking for the string “123” (just the numbers, not the punctuation). We might ask how long a sequence we would need to type to guarantee that string’s appearance. But it would be the wrong question. There is simply no finite length that would guarantee such a thing. The right question is framed in terms of probabilities.
Suppose you have a 1,000-digit sequence. What is the probability that the string “123” appears in it somewhere? There are 998 possible starting locations where it could appear. The probability that the string appears somewhere is equal to one minus the probability that it does not appear anywhere. That second probability, in turn, is equal to the probability that the string does not appear starting in location 1 times the probability that it does not appear starting in location 2, and so on, up to location 998. (Strictly speaking, multiplying these together treats the 998 locations as independent, which overlapping locations are not quite, but the approximation is very close.)
The formula is as follows, both symbolically and numerically (stay with me, you can do this):
L = Length of the sequence = 1,000 digits
N = Length of the desired string = 3 digits
K = Number of possible values of each digit = 10
p = Probability that one particular string out of the sequence matches the desired string
p = 1/K^N = 0.001 (I am using ^ to signal an exponent – Ricochet apparently doesn’t do superscripts)
p’ = Probability that one particular string out of the sequence does not match the desired string
p’ = 1-(1/K^N) = 0.999
P’ = Probability that no string out of the sequence of length L matches the desired string
P’ = [1-(1/K^N)]^(L-N+1) = 0.999^998 = 0.368432
P = Probability that at least one string out of the sequence of length L matches the desired string
P = 1 – [1-(1/K^N)]^(L-N+1) = 1 – 0.999^998 = 0.631568
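For readers who would rather check the arithmetic than trust it, here is a minimal Python sketch of the formula above. The function names are my own, not anything from the post. The second function is an extra sanity check under a stated assumption: because "123" cannot overlap with itself, the chance q(n) that n digits contain no "123" satisfies q(n) = q(n-1) - q(n-3)/1000, which gives the exact answer for this particular string without assuming the starting locations are independent.

```python
def approx_hit_probability(L, N, K):
    """P = 1 - (1 - 1/K**N) ** (L - N + 1), the formula from the post."""
    p = 1.0 / K**N
    return 1.0 - (1.0 - p) ** (L - N + 1)


def exact_hit_probability_123(L):
    """Exact chance that "123" appears somewhere in L random decimal digits.

    q[n] is the chance that n digits contain no "123". Since "123" cannot
    overlap itself, q[n] = q[n-1] - q[n-3] / 1000.
    """
    q = [1.0, 1.0, 1.0]  # 0, 1 or 2 digits are too short to contain "123"
    for n in range(3, L + 1):
        q.append(q[n - 1] - q[n - 3] / 1000.0)
    return 1.0 - q[L]


print(round(approx_hit_probability(1000, 3, 10), 6))  # 0.631568
print(round(exact_hit_probability_123(1000), 6))
```

The exact figure should come out slightly higher than the formula's (roughly 0.632), which suggests the independence shortcut costs very little here.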
Now, let’s go the other direction. Suppose that instead of postulating the length of the sequence, you postulate a probability that the string appears. To determine the minimum length of the sequence, you just solve the last of the above equations for L. I’ll do it by witchcraft, for now, to save you a bit of time:
L = K^N ln [1/(1-P)] + (N-1) where ln is of course the natural logarithm. This works if you plug in the above numbers:
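K^N ln [1/(1-P)] + (N-1) = 10^3 ln [1/(1 – 0.631568)] + (3-1) = 1000.5 (the extra .5 is not roundoff; it comes from the approximation ln(1-x) ≈ -x that the formula relies on, which is very accurate when 1/K^N is small).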
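Here is a small Python sketch comparing the closed form above with the result of solving the probability equation for L directly. Both function names are hypothetical. The direct solution is L = ln(1-P) / ln(1 - 1/K^N) + (N-1), and the closed form follows from it by replacing ln(1 - 1/K^N) with -1/K^N.

```python
import math


def approx_length(P, N, K):
    """Closed form from the post: L = K^N * ln(1/(1-P)) + (N-1)."""
    return K**N * math.log(1.0 / (1.0 - P)) + (N - 1)


def exact_length(P, N, K):
    """Solve P = 1 - (1 - 1/K^N)^(L-N+1) for L with no approximation."""
    return math.log(1.0 - P) / math.log1p(-1.0 / K**N) + (N - 1)


P = 1 - 0.999**998  # the probability computed earlier for L = 1,000
print(round(approx_length(P, 3, 10), 1))  # 1000.5
print(round(exact_length(P, 3, 10), 1))   # 1000.0
```

The gap between the two shrinks in relative terms as K^N grows, so for the Shakespeare-sized problem the closed form is more than good enough.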
Now, back to the original problem.
I found a file containing the complete works of Shakespeare on the Internet. Deleting the legal notices and other blurbs, Word counts 4,927,356 characters, including spaces. Old typewriters tended to have around 51 keys on the keyboard, counting the space bar, shift, shift lock, and return keys (for those with electric return). I have little basis for the following assumption other than gut feeling, but let’s say around 1% of the characters in Shakespeare require use of the shift or shift-lock key. Let’s assume the monkey is not going to hold down a shift key while typing other keys; we are assuming one key at a time, so a capital letter requires pressing the shift lock key, then the desired letter, then the shift key to release the lock. We therefore increase the number of characters by 2%: 1.02 x 4927356 = 5,025,903 keystrokes. How long would it take one monkey with a 51-key typewriter to have a 90% chance of producing the complete works of Shakespeare?
P = 0.90
N = 5,025,903 keystrokes in Shakespeare’s complete works
K = 51 keys to choose from on the keyboard
L = K^N ln [1/(1-P)] + (N-1) = 51^5,025,903 (2.30258) + 5,025,902.
The result L (the total number of keystrokes typed) can be expressed in scientific notation as approximately 2.71 x 10^8,582,082, an unimaginably large number (this was determined using logarithms, which are essential for numbers like this that are way too large to calculate directly).
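The logarithm trick can be sketched in a few lines of Python. This is my own illustration, with hypothetical function names, and it assumes the + (N-1) term can be dropped because it is negligible next to K^N. Instead of computing 51^5,025,903 directly, it adds up base-10 logarithms and then splits the total into a mantissa and an exponent.

```python
import math


def log10_keystrokes(P, N, K):
    """log10 of K^N * ln(1/(1-P)); the + (N-1) term is negligible here."""
    return N * math.log10(K) + math.log10(math.log(1.0 / (1.0 - P)))


def as_scientific(log10_value):
    """Split a base-10 logarithm into (mantissa, exponent)."""
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return mantissa, exponent


mantissa, exponent = as_scientific(log10_keystrokes(0.90, 5_025_903, 51))
print(f"{mantissa:.2f} x 10^{exponent:,}")
```

The exponent is just the whole-number part of the logarithm, and the leading digits come from raising 10 to the fractional part.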
Assuming our persistent primate can type 10 keystrokes per second nonstop (equivalent to about 120 words per minute), he can manage 864,000 keystrokes per day and 315,576,000 keystrokes per year, and will have a 90% chance of honoring William in just 8.6 x 10^8,582,073 years. Time to hire some extra help!
We’re a bit fuzzy on global monkey population numbers. We tend only to count the endangered species. But let’s assume a billion monkeys can be rounded up and put to work. All work tirelessly and ceaselessly, without pausing to eat, sleep, poop, or make new monkeys. So divide the required length of time by one billion. Now, only 8.6 x 10^8,582,064 years will be required. Most likely, within ten thousand (10^4) years the works of Shakespeare will have been forgotten; within one billion (10^9) years monkeys will be extinct; and within ten billion years (10^10) the earth will have been swallowed by the sun. Therefore, for all practical purposes, “enough” monkeys with “enough” time cannot possibly produce the works of Shakespeare, although Finnegans Wake wouldn’t take very long (read a few pages of it, you’ll see what I mean).
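Both year counts above can be reproduced with the same logarithm bookkeeping. The sketch below is my own illustration: the function name and the `monkeys` parameter are hypothetical, and it assumes a 365.25-day year, which is what the 315,576,000 keystrokes-per-year figure implies.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600


def log10_years(log10_total_keystrokes, keys_per_second=10.0, monkeys=1):
    """log10 of the years needed to type the given number of keystrokes."""
    keys_per_year = keys_per_second * SECONDS_PER_YEAR * monkeys
    return log10_total_keystrokes - math.log10(keys_per_year)


# Roughly 2.71 x 10^8,582,082 keystrokes, the total from the post
total = math.log10(2.71) + 8_582_082

for monkeys in (1, 1_000_000_000):
    value = log10_years(total, monkeys=monkeys)
    exponent = math.floor(value)
    mantissa = 10 ** (value - exponent)
    print(f"{monkeys:>13,} monkey(s): {mantissa:.1f} x 10^{exponent:,} years")
```

Hiring a billion monkeys only knocks nine off an exponent that runs to seven digits, which is the whole point.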
Ah, but what about infinite monkeys with infinite time? That’s different. Go back to the probability equation:
P = 1 – [1-(1/K^N)]^(L-N+1)
Notice that the quantity (1/K^N) is extremely small, but greater than zero. So the quantity [1-(1/K^N)] is less than one. As L approaches infinity (mathematicians never like to talk about a number “equaling” infinity), the quantity [1-(1/K^N)]^(L-N+1) approaches zero, so the probability P approaches 1, or certainty. Note that infinite monkeys are not required. One monkey with infinite time will reproduce the complete works of Shakespeare, as well as the complete works of Marcel Proust, and every other work of literature that has ever been written or could be written (although I wish the monkey wouldn’t bother with Naked Lunch), backwards and forwards, with and without every possible spelling error, in English and every other language that exists or doesn’t exist, within the limitations of the character set available on the monkey’s typewriter.
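To watch the probability creep toward 1, here is a short Python sketch using the small "123" example rather than Shakespeare (the Shakespeare-sized K^N is far too large for this direct approach). The function name is hypothetical, and the use of `log1p` and `expm1` is my own choice to keep precision when 1/K^N is tiny.

```python
import math


def hit_probability(L, N, K):
    """P = 1 - (1 - 1/K^N)^(L-N+1), evaluated stably for small 1/K^N."""
    windows = L - N + 1
    # log1p(-p) is ln(1 - p); -expm1(x) is 1 - e^x
    return -math.expm1(windows * math.log1p(-1.0 / K**N))


for L in (10**3, 10**4, 10**5):
    print(L, hit_probability(L, 3, 10))
```

Each tenfold increase in L drives P closer to 1, but for any finite L it never actually gets there, even once the printed value rounds to 1.0.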
If you have an infinite number of monkeys, you only need a finite period of time. The number of keystrokes required to produce the works of Shakespeare can be performed in 5.8 days, by my reckoning. If you set an infinite number of monkeys to the task, at least one of them will do just that.
Published in General
“Arthur!” he said, “this is fantastic! We’ve been picked up by a ship powered by the Infinite Improbability Drive! This is incredible! This is incredible! I heard rumors about it before! They were all officially denied, but they must have done it! They’ve built the Improbability Drive! Arthur, this is . . . Arthur? What’s happening?”
Arthur had jammed himself against the door to the cubicle, trying to hold it closed, but it was ill fitting. Tiny furry little hands were squeezing themselves through the cracks, their fingers were inkstained; tiny voices chattered insanely.
Arthur looked up.
“Ford!” he said, “there’s an infinite number of monkeys outside who want to talk to us about this script for Hamlet they’ve worked out.”
Right. If the sequence just keeps repeating the same finite sequence over and over to infinity, it is not random. There is a small, nonzero probability that a repeating string of finite length could be produced by a random process, but the probability approaches zero as the length of the string approaches infinity.
Of course, in reading Finnegans Wake, time decompresses to infinity with each additional character read.
But can you prove it?
Mathematicians. So easy to troll.
I concede the error. Because of the nature of the written work, it’s intuitively appealing to believe we should restrict ourselves to countable infinities only, but a monkey typing a truly infinite string of, say, 1s and 0s, could represent any location on a dyadic division of the unit interval, and the unit interval is uncountable.
It’s not trolling to drop a hint that prompts someone to actually think it through :-)
I am an engineer. I don’t prove stuff, I just solve the problem at hand. I don’t revel in complications, I try to cut through them.
One monkey, infinite keystrokes, starting at t=0: Just count the keystrokes.
One monkey, infinite keystrokes, no starting time (i.e., the monkey has always been typing): Start at an arbitrary keystroke and count in both directions alternately, i.e., 0, 1, -1, 2, -2, etc.
Infinite monkeys, infinite keystrokes, starting at t=0: Number the monkeys starting at 1, starting with any arbitrary monkey, and arrange them on an x-axis. Each monkey’s keystrokes starting at t=0 are arranged on the y-axis. Start counting at (M1,K1); then (M1,K2), (M2,K1); (M1,K3), (M2,K2), (M3,K1); and continue counting on line segments of slope -1 proceeding radially outward from the origin.
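The diagonal counting scheme in the third case can be sketched as a Python generator. This is only an illustration with a hypothetical function name. It walks the (monkey, keystroke) grid one diagonal at a time, so any given pair is reached after a finite number of steps.

```python
from itertools import count, islice


def diagonal_pairs():
    """Yield (monkey, keystroke) index pairs, both numbered from 1.

    Pairs are visited one diagonal at a time (monkey + keystroke constant),
    so every pair gets a finite position in the count.
    """
    for total in count(2):
        for monkey in range(1, total):
            yield monkey, total - monkey


print(list(islice(diagonal_pairs(), 6)))
# [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]
```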
Not a proof, just some examples of how the keystrokes can be counted. Am I oversimplifying?
Yes, but that’s a different problem.
Sorry! You’re right, whether the one string the everlasting monkey types is of countably infinite length is obvious: it is.
Whether there are uncountably many different strings the monkey could type if he didn’t stop as soon as he’d produced a play, but kept going forever is another question (and the answer to it is yes, as I gave). Apologies, still in a post-seizure haze over here. Hopefully it only takes a week or so (not the dreaded three months) to lift :-)
No problem. I admit I had to brush up on a few of these concepts, assuming I ever knew them in the first place. I am learning at least as much as my readers.
But in order for the monkey to truly create the works of Shakespeare he’d not only have to type it within an infinite string of characters but also extract that exact portion. And presumably force unwilling high school students to read it.
Even then, the characters would contain no meaning. And therefore not be an acceptable recreation.
Math that one out suckers!
Sheldon, Sheldon, is that you Sheldon?
By the way, this works for the creation of the world as well. Infinite time is a handy replacement for god. (Though perhaps it would be better to call it a handy refutation of the argument from design, which, despite a thorough refutation by David Hume in the 18th century, refuses to die.)
We’re going to go through all possible permutations of history as well …
I have always suspected that Paul Krugman’s columns are written by monkeys typing random characters.
I thought that this amounted to a disproof of Evolution, and evidence in favor of Design. You have shown that a string of characters on the order of the information contained in the shortest functional amount of DNA would require much, much more time than the four billion years that are alleged to have been available.
Would you mind walking us through the math on that?
This is a harder problem.
Since writing this post, I have realized that I have overestimated the length of time required to produce the works of Shakespeare, although I don’t think it will make a big difference as to the conclusion. I calculated the probability of producing one specific string of 4,927,356 characters, whereas there might be many different strings that would be recognizable as the works of Shakespeare (plays and sonnets in a different order, variations in spacing, etc.). Similarly, the probability of creating some intelligible piece of literature would be orders of magnitude greater, although still remote. When considering creation, rather than focusing on the probability of randomly generating some specific DNA string, the relevant problem would be randomly generating a string–any string–which would create a living organism of some kind. This is a very different problem.
What else besides DNA produces life? Is there any life without DNA? Any indication of some other possibility that would require less information?
It doesn’t matter how many faces are on the die; if time is infinite, each face comes up infinitely many times.
To be more concrete, suppose that out of all the possible universes that can develop, there’s a one-in-ten-to-the-googol chance of a universe developing in which life can spontaneously develop.
Well, if time is infinite, such a universe appears infinitely many times.
Was I unclear? There are many different DNA strings that code for various forms of life. The problem I originally solved would have been analogous to looking for just your DNA, for example.
I think Larry Niven said something to the effect “First posit infinity. After that, everything is easy.”
Yeah, I was thinking this might make an interesting line: “Infinity: charming dinner guest, insufferable roommate.” As in, it’s difficult to really live with its consequences.