Blog > The Binary “Look-and-Say” Sequence

The Binary “Look-and-Say” Sequence

November 7th, 2010

The look-and-say sequence (which I talked about here) is the sequence that you get by starting with the number 1 and constructing the next term in the sequence by “reading” the previous term. So 1 becomes “one one”, or 11. That becomes “two ones”, or 21. That becomes “one two, one one”, or 1211, and so on.

In this post, I am going to investigate the related binary version of the sequence, which starts off 1, 11 much like the regular sequence. But then when reading 11, we read it as “two ones”. Since two in binary is 10, the next term in the sequence is 101. When reading that term, we read it as “one one, one zero, one one”, so the next term is 111011. That term is read as “three ones, one zero, two ones”, and since three is 11 in binary and two is 10 in binary, the next term is 11110101, and so on. In this post we will answer two questions in particular about this sequence:

1) On average, how much longer is the (n+1)th term in the sequence than the nth term in the sequence?

2) On average, what is the ratio of the number of ones to the number of zeroes in the sequence?

Non-Interacting Subsequences

Much like the regular look-and-say sequence, we are able to study this sequence by constructing a “basis” of non-interacting subsequences that every term in the binary look-and-say sequence is made up of. Fortunately, constructing such a family of subsequences for the binary version of the look-and-say sequence is much simpler than it is for the decimal version of the sequence – here we only need ten different basic subsequences (whereas we needed 92 different subsequences for the regular look-and-say sequence!). These ten subsequences, and the subsequences they evolve into, are summarized in the following table.

# Subsequence Evolves Into
1 1 (2)
2 11 (3)(1)
3 10 (5)
4 110 (3)(4)
5 1110 (6)
6 11110 (7)(4)
7 100 (9)
8 1100 (3)(8)
9 11100 (10)
10 111100 (7)(8)

So for example, the first term in the sequence, 1, evolves into the subsequence (2), which is 11. That term then evolves into subsequence (3) followed by subsequence (1), or 101. That term then evolves into the subsequence (5) followed by the subsequence (2), or 111011, and so on. The reason that this representation of the sequence is useful is we can use it to describe the evolution of the binary look-and-say sequence entirely within a matrix T. In particular, we let T be the matrix with 1 in its (i,j) entry if the subsequence (i) appears in the evolution rule for subsequence (j), and 0 in its (i,j) entry otherwise:

Now if v is a 10-dimensional vector whose ith entry indicates how many times the subsequence (i) appears in a particular term of the binary look-and-say sequence, it follows that the entries of Tv tell us how many times each subsequence appears in the next term of the binary look-and-say sequence. So it follows from standard theory of linear homogeneous recurrence relations that we can now read off all of the long-term behaviour of the binary look-and-say sequence from the eigenvalues and eigenvectors of T.

Rate of Growth of the Sequence

The asymptotic rate of growth of the number of digits in the terms of the binary look-and-say sequence is simply the magnitude of the largest eigenvalue of the transition matrix T above. Using Maple it is simple to derive this value. If Ln is the number of digits in the nth term of the binary look-and-say sequence, then

This limit is approximately 1.465571, which means that the binary version of this sequence grows much faster than the decimal version of the sequence (recall that the growth rate of the number of digits of the regular look and say sequence is approximately 1.303577). This limit is also the unique real root of the cubic x3 – x2 – 1, which follows from the fact that the characteristic polynomial of T is

Ratio of Number of Ones to Zeroes

If we let Nn denote the number of ones in the nth term of the binary look-and-say sequence, and if we let Zn denote the number of zeroes in the nth term of the sequence, what is

In other words, what is the average ratio of ones to zeroes in this sequence? The following table shows the value of Nn/Zn for n = 3, 4, …, 25, which might give some intuition to the problem:

n Nn/Zn
3 2.000
4 5.000
5 3.000
6 2.000
7 2.000
8 2.000
9 1.786
10 1.762
11 1.742
12 1.717
13 1.691
14 1.690
15 1.680
16 1.676
17 1.672
18 1.671
19 1.669
20 1.668
21 1.667
22 1.667
23 1.666
24 1.666
25 1.666

Based on numerical estimates like those given in the table above, it has been conjectured that the limiting ratio is 5/3 (or some nearby value). We will now show that the limit does indeed exist, but its value is not 5/3 — it just happens to be really close to 5/3.

Much like the maximal eigenvalue of T tells us the overall growth rate of the sequence, the corresponding eigenvector tells us the distribution of the different subsequences that are present in the limit. Once we know the distribution of the individual subsequences, it is not difficult to find out the overall ratio of ones to zeroes by weighing the different subsequences appropriately. So our first step is to find the eigenvector corresponding to the maximal eigenvalue. To this end, it will be convenient to let

α is the same as in the previous section, and β is exactly the growth rate limit that we computed. Then the eigenvector corresponding to the maximal eigenvalue of T is:

What this means is that, in the limit, the fifth subsequence, 1110, is β times as frequently-occurring as the sixth subsequence, 11110 (for example). Now we just weigh each subsequence according to how many zeros and ones they contain, and we find the limiting ratio of ones to zeroes is

In particular, this ratio does not equal 5/3, but rather its decimal expansion begins 1.6657272222676… (which is less than 1/1000 away from 5/3).

Edit (September 14, 2022): Standupmaths has made a nice video about this same topic here.

  1. Sun Fangyan
    February 28th, 2012 at 12:15 | #1

    Nice job! It’s really a good idea to develope the ”look and say ” sequence and have differents versions! And the version on binary is much simpler than the old version as you say. Just a question that i’m not sure: are these 10 subsequences really a basis of a vectorial space of dimension 10 ? I have a little difficulty in identifing all these with what we learn in our maths lessons. Thanks a lot and sorry for the errors of orthaugraphe. I’m not a native English speaker. I’m Chinese and I study now in France.

  2. February 28th, 2012 at 16:11 | #2

    @Sun Fangyan – Nope, when I referred to those subsequences as “basic”, I didn’t mean to imply any relationship with vector bases — I’m using the term very informally. They are “basic” in that, once you understand how they evolve, you understand how *all* binary sequences evolve. However, the operator T is not invertible, so these subsequences are not a basis in the linear algebra sense.

  3. Sun Fangyan
    May 21st, 2012 at 10:55 | #3

    Still a question.For the evolution of the length of the sequence,why don’t you use the same kind of matrix for the matrix classique and the binary one.I mean, why don’t you ,for the coefficents of matrix T, use the division of the lengths*the time that it appears in the next sequence. Why are there 1 everywhere?

  4. Sun Fangyan
    May 21st, 2012 at 11:21 | #4

    so i use the same method of derivation of the conway sequence and i have P := (x^2-2)*(x^8-2*x^7+x^6-x^5+(34/25)*x^4-(9/25)*x^3)as characteristic polynomial,so that way i have 2^1/2 as grow race,around 1.412.
    so what’s the problem?

  5. May 21st, 2012 at 18:09 | #5

    @Sun Fangyan – You can do it either way, either with 1’s everywhere or with the ratio of the lengths. If you use the ratio of the lengths, then the transition matrix becomes the transpose of:

    T =
    0 2 0 0 0 0 0 0 0 0
    1/2 0 1 0 0 0 0 0 0 0
    0 0 0 0 2 0 0 0 0 0
    0 0 2/3 1 0 0 0 0 0 0
    0 0 0 0 0 5/4 0 0 0 0
    0 0 0 3/5 0 0 3/5 0 0 0
    0 0 0 0 0 0 0 0 5/3 0
    0 0 1/2 0 0 0 0 1 0 0
    0 0 0 0 0 0 0 0 0 6/5
    0 0 0 0 0 0 1/2 2/3 0 0

    The maximal eigenvalue of this matrix is 1.465571, just like the matrix presented in the post. I’m not sure why you’re getting a value of sqrt(2) — perhaps if you posted the matrix you’re using we could find out where the discrepancy lies.

  6. Sun Fangyan
    May 23rd, 2012 at 10:56 | #6

    Thanks a lot. I have found my error, I made a mistake in one of the coefficient.But why do the two methods give the same answer? And what’s the difference between them?

  7. Ørjan Johansen
    May 28th, 2012 at 20:29 | #7

    @Sun Fangyan
    Multiplying by the length quotients corresponds to changing the vector basis of the vector space – essentially a renaming of the vectors. This changes which vectors are eigenvectors, but not the eigenvalue.

  8. Jason T. Isaiah
    January 9th, 2013 at 02:59 | #8

    Hi, Can you site some properties of this binary look and say sequence. This variant is awesome and I would like to know more about it. Anyway, I’m planning to make a small research on this and want to know if you have any established properties on this binary look and say. I would be very happy if you can send information about it. Thanks and happy new year! Hope for a fast respond.

  9. January 10th, 2013 at 09:23 | #9

    @Jason T. Isaiah – Unfortunately, I don’t know much more about it than what’s in this blog post and on its OEIS page.

    Good luck with your project though, and let me know if you find out anything else interesting about the sequence!

  10. Jason T. Isaiah
    January 24th, 2013 at 13:03 | #10

    No worries Nathaniel. Thank you, your post is sure a head start for me. :))

    @Nathaniel

  11. Gail Sacatani
    August 14th, 2015 at 07:34 | #11

    What are the applications of the look and say sequence in real life?

  12. April 14th, 2020 at 21:58 | #12

    Cool stuff! I cited your work in my paper https://arxiv.org/abs/2004.06414/.

    I’m working on a similar recursion, where each term of the sequence is the binary description of the bit-wise compliment of the previous term.

  13. July 10th, 2022 at 05:37 | #13

    Thanks a lot. I have found my error, I made a mistake in one of the coefficient.But why do the two methods give the same answer? And what’s the difference between them?

  14. Pascal
    July 26th, 2022 at 06:03 | #14

    In the matrix representation, did you take into account that a vector in this basis does not contain an order, so (3)(1) = (1,0,1,0,0,0,0,0,0,0) = (1)(3). But (1)(3) is not (3)(1) ?! 110 is not 101
    So modelling it in these state space is ambigious?

  15. September 6th, 2022 at 05:03 | #15

    @Pascal – Yes, the matrix only tells us how many times each of the subsequences appears in the next term of the sequence, not the order that they appear in. Since we are only interested in the length of the terms of the sequence, this is OK.

  16. Patrick
    September 14th, 2022 at 17:57 | #16

    The sequences you used as your “basis” seems to be arbitrary.

    1. What do you mean by “non-interacting subsequences”?

    I guess something like “If (n) evolves into (a_1)…(a_k) then (m) can’t evolve into (a_σ(1))…(a_σ(k)) for any permutation σ”, because otherwise column n would be the same as column m. And (a_i) =/= (a_j) for any i =/= j as well, so that we only end up with a matrix of zeroes and ones.

    2. Why did you choose the subsenquences like this? And more importantly, how did you make sure that those ten are all you need?

    For instance, I could start off with 1. Which I set to be (1).
    (1) 1 –> 11 (2)
    Now I can’t use (1)(1) to denote 11 by the rule above. I set 11 to be (2)
    (2) 11 –> 101 (3)(1)
    I don’t have 10 anywhere, so I set 10 to be (3).
    (3) 10 –> 1110 (2)(3)
    Now I have (1), (2), (3) only evolving to sequences of build by themselves. At this point we need to think of sequences that are not generated yet. 111 for instance isn’t in the list yet. However, 111 can be denoted by (1)(2) or (2)(1) so it doesn’t need its own number. At this point we can start and try to put subsequences in front or behind each other to get new ones. We had (1), (2), (1)(2), (2)(1), (3)(1) so next would be (1)(3) which would be 110. Since we can’t use (1)(3) we need to denote this subsequence by (4).
    (4) 110 –> 10110 (3)(4)
    And I can go on with this with (3)(2) or 101 which I denote by (5) since (2)(3) is already taken.
    (5) 101 –> 111011 (1)(4)(2)
    But here we could also have used (2)(5)(1) instead or let (6) denote 1110.

    Those choices will lead to different matrices and different Eigenvalues. My guess is, that the formulas for α and β will just change accordingly, iff the subsequnces can generate all sequences. But how do we make sure, that tehy do?

  17. Mike Steffen
    September 16th, 2022 at 14:35 | #17

    @Patrick

    1) What are non-interacting sub-sequences?

    Very informal here, but a “non-interacting subsequence” is one that begins with a 1 (and is not preceded by another 1) and ends with a 0 (and is not followed by another 0). Following the “look and say” algorithm, these subsequences will not interact with the preceding or following sequences when they evolve. E.g.:

    … X X 0|1 X X X … X X X 0|1 X X …

    The sub-sequence s_n of interest is between the bars ||. Note that the preceding subsquence will evolve into a sequence that ends in a ‘0’ since the final ‘look and say’ will say the number of ending zeros. Our subsequence s_n will evolve into one that starts with a ‘1’ since s_n begins with a non-zero number of ones. s_n will evolve into one that ends with a ‘0’ since the final ‘look and say’ will again say the number of ending zeros. And finally, the following subsequence will also evolve into one that starts with a ‘1’. Hence, all sub-sequences fitting the definition above will evolve into sequences that also fit that definition.

    The starting sub-sequences have a slightly different explanation but they don’t cause a problem. The ending sub-sequences are special cases (hence why ‘1’ and ’11’ are in the table), but as you apply the algorithm starting at the base case, you can see that ‘1’ and ’11’ are the only required ending subsequenes that do not fit the definition above.

    2) How do you choose these specific subsequences?

    Follow the “look and say” algorithm starting on the special base case of ‘1’. This evolves into the second special block ’11’. This evolves into ‘101’, which can be broken up into the following subsequnces: ’10’ (non-interacting) and ‘1’. Note that ‘1’ is still a special base-case, but it always occurs at the end of our number, so the ‘1’ can not interact with a following sub-sequence. ’10’ evolves into ‘1110’, another non-interacting subsequence. ‘1110’ evolves into ‘11110’ – yet another non-interacting subsequence. ‘11110’ evolves into ‘100110’, which can be broken into ‘100’ and ‘110’, both new non-interacting subsequences. ‘110’ evolves into ‘10110’ comprised of ’10’ (we have seen this before) and itself. Continuing this, you will find 10 unique subsequences. All remaining evolutions will result in sequences that can be decomposed into previously seen non-interacting subsequences.

    **

    Came here from Standup Maths. I really enjoyed this since it’s right at the boundary of where I am comfortable with math. I had a “if I was clever enough, I could have come up with this solution” moment. Too bad I’m not more clever!

  1. May 22nd, 2020 at 16:40 | #1