Posts Tagged ‘Integer Sequences’

Counting and Solving Final Fantasy XIII-2's Clock Puzzles

February 6th, 2012

Final Fantasy XIII-2 is a role-playing game, released last week in North America, that contains an abundance of mini-games. One of the more interesting mini-games is the “clock puzzle”, which presents the user with N integers arranged in a circle, with each integer being from 1 to \lfloor N/2 \rfloor.

A challenging late-game clock puzzle with N = 12

The way the game works is as follows:

  1. The user may start by picking any of the N positions on the circle. Call the number in this position M.
  2. You now have the option of picking either the number M positions clockwise from your last choice, or M positions counter-clockwise from your last choice. Update the value of M to be the number in the new position that you chose.
  3. Repeat step 2 until you have performed it N-1 times.

You win the game if you choose each of the N positions exactly once, and you lose the game otherwise (if you are forced to choose the same position twice, or equivalently if there is a position that you have not chosen after performing step 2 a total of N-1 times). During the game, N ranges from 5 to 13, though N could theoretically be as large as we like.
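The rules above translate directly into a backtracking search. The following is a minimal Python sketch written for illustration (it is not the MATLAB script discussed later in this post, and the names `solve_clock` and `is_solution` are my own); positions are numbered 0 through N-1 starting from the top and going clockwise.

```python
def solve_clock(values, find_all=False):
    """Search for visiting orders that solve a clock puzzle.

    values[i] is the number at position i. Returns one order (a list of
    positions) or None; with find_all=True, returns a list of every order
    the search finds.
    """
    n = len(values)
    solutions = []

    def extend(path, visited):
        if len(path) == n:
            solutions.append(path[:])
            return
        here = path[-1]
        step = values[here]
        # A set merges the two moves when clockwise and counter-clockwise
        # land on the same position.
        for nxt in sorted({(here + step) % n, (here - step) % n}):
            if nxt in visited:
                continue
            path.append(nxt)
            visited.add(nxt)
            extend(path, visited)
            visited.discard(nxt)
            path.pop()
            if solutions and not find_all:
                return

    for start in range(n):
        extend([start], {start})
        if solutions and not find_all:
            break
    if find_all:
        return solutions
    return solutions[0] if solutions else None


def is_solution(values, order):
    """Check that order visits every position once using only legal moves."""
    n = len(values)
    if sorted(order) != list(range(n)):
        return False
    return all((b - a) % n in {values[a] % n, -values[a] % n}
               for a, b in zip(order, order[1:]))


print(solve_clock([3, 1, 3, 1, 2, 3]))  # [1, 0, 3, 4, 2, 5]
```

The search tries every starting position and undoes a move as soon as both neighbours have already been visited, which is plenty fast for the puzzle sizes that appear in the game.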

Example

To demonstrate the rules in action, consider the following simple example with N = 6 (I have labelled the six positions 0 to 5 in blue for easy reference):

If we start by choosing the 1 in position 1, then we have the option of choosing the 3 in either position 0 or 2. Let’s choose the 3 in position 0. Three moves either clockwise or counter-clockwise from here both give the 1 in position 3, so that is our only possible next choice. We continue on in this way, going through the N = 6 positions in the order 1, 0, 3, 4, 2, 5, as in the following image:

We have now selected each position exactly once, so we are done – we solved the puzzle! In fact, this is the unique solution for the given puzzle.

Counting Clock Puzzles

Let’s work on determining how many different clock puzzles there are of a given size. As mentioned earlier, a clock puzzle with N positions has an integer in the interval [1, \lfloor N/2 \rfloor] in each of the N positions. There are thus \lfloor N/2 \rfloor^N distinct clock puzzles with N positions, which grows very quickly with N – its values for N = 1, 2, 3, … are given by the sequence 0, 1, 1, 16, 32, 729, 2187, 65536, 262144, … (A206344 in the OEIS).

However, this rather crude count of the number of clock puzzles ignores the fact that some clock puzzles have no solution. To illustrate this fact, we present the following simple proposition:

Proposition. There are unsolvable clock puzzles with N positions if and only if N = 4 or N ≥ 6.

To prove this proposition, first note that the clock puzzles for N = 2 or N = 3 are trivially solvable, since each number in the puzzle is forced to be \lfloor N/2 \rfloor = 1. The 32 clock puzzles in the N = 5 case can all easily be shown to be solvable via computer brute force (does anyone have a simple or elegant argument for this case?).

In the N = 4 case, exactly 3 of the 16 clock puzzles are unsolvable:

To complete the proof, it suffices to demonstrate an unsolvable clock puzzle for each N ≥ 6. To this end, we begin by considering the following clock puzzle in the N = 6 case:

The above puzzle is unsolvable because the only way to reach position 0 is to select it first, but from there only one of positions 2 or 4 can be reached – not both. This example generalizes in a straightforward manner to any N ≥ 6 simply by adding more 1's to the bottom: it will still be necessary to choose position 0 first, and then it is impossible to reach both position 2 and position N-2 from there.

There doesn’t seem to be an elegant way to count the number of solvable clock puzzles with N positions (which is most likely related to the apparent difficulty of solving these puzzles, which will be discussed in the next section), so let’s count the number of solvable clock puzzles via brute force. Simply constructing each of the \lfloor N/2 \rfloor^N clock puzzles and determining which of them are solvable (via the MATLAB script linked at the end of this post) shows that the number of solvable clock puzzles for N = 1, 2, 3, … is given by the sequence 0, 1, 1, 13, 32, 507, 1998, 33136, 193995, … (A206345 in the OEIS).
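For readers who want to reproduce the brute-force count without MATLAB, here is a minimal Python sketch of the same idea (the function names are hypothetical, and this is not the script linked from this post): enumerate every assignment of values in [1, \lfloor N/2 \rfloor] and test each one for solvability with a depth-first search.

```python
from itertools import product


def is_solvable(values):
    """Return True if some visiting order covers every position exactly once."""
    n = len(values)

    def extend(pos, visited):
        if len(visited) == n:
            return True
        step = values[pos]
        for nxt in {(pos + step) % n, (pos - step) % n}:
            if nxt not in visited:
                visited.add(nxt)
                if extend(nxt, visited):
                    return True
                visited.discard(nxt)
        return False

    return any(extend(start, {start}) for start in range(n))


def count_solvable(n):
    """Count the solvable puzzles among all floor(n/2)^n puzzles of size n."""
    allowed = range(1, n // 2 + 1)
    return sum(is_solvable(puzzle) for puzzle in product(allowed, repeat=n))


print([count_solvable(n) for n in range(1, 5)])  # [0, 1, 1, 13]
```

The running time grows roughly like the number of puzzles, so this sketch is only practical for small N; larger values can be compared against the sequence quoted above.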

This count of puzzles is perhaps still unsatisfying, though, since it counts puzzles that are simply mirror images or rotations of each other multiple times. Again, there doesn’t seem to be an elegant counting argument for enumerating the solvable clock puzzles up to rotation and reflection, so we compute this sequence by brute force: 0, 1, 1, 4, 8, 72, 236, 3665, 19037, … (A206346 in the OEIS).

Solving Clock Puzzles

Clock puzzles are one of the most challenging parts of Final Fantasy XIII-2, and with good reason: they are a well-studied graph theory problem in disguise. We can consider each clock puzzle with N positions as a directed graph with N vertices. If position i contains the number M, then there is a directed edge going from vertex i to each of the vertices M positions clockwise and counter-clockwise from it. In other words, we consider a clock puzzle as a directed graph on N vertices, where the directed edges describe the valid moves around the circle.
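As a small illustration of this correspondence, the sketch below builds the adjacency lists for the earlier N = 6 example and checks a visiting order against them. It is a hedged example only: `clock_digraph` and `is_hamiltonian_path` are names I made up for this purpose.

```python
def clock_digraph(values):
    """Map each vertex to the sorted list of vertices it has an edge to."""
    n = len(values)
    return {i: sorted({(i + m) % n, (i - m) % n}) for i, m in enumerate(values)}


def is_hamiltonian_path(graph, order):
    """Check that order uses every vertex once and follows directed edges."""
    return (sorted(order) == sorted(graph)
            and all(b in graph[a] for a, b in zip(order, order[1:])))


graph = clock_digraph([3, 1, 3, 1, 2, 3])
print(graph)  # {0: [3], 1: [0, 2], 2: [5], 3: [2, 4], 4: [0, 2], 5: [2]}
```

In this particular graph no edge points at vertex 1, so any Hamiltonian path has to begin there.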

The directed graph corresponding to the earlier (solvable) N = 6 example

The problem of solving a clock puzzle is then exactly the problem of finding a directed Hamiltonian path on the associated graph. Because finding a directed Hamiltonian path in general is NP-hard, this seems to suggest that solving clock puzzles might be as well. There of course is the problem that the directed graphs relevant to this problem have very special structure – in particular, every vertex has outdegree ≤ 2, and the graph has a symmetry property that results from clockwise/counter-clockwise movement allowed in the clock puzzles.

The main result of [1] shows that the fact that the outdegree of each vertex is no larger than 2 is no real help: finding directed Hamiltonian paths is still NP-hard given such a promise. However, the symmetry condition seems more difficult to characterize in graph theoretic terms, and could potentially be exploited to produce a fast algorithm for solving these puzzles.

Regardless of the problem’s computational complexity, the puzzles found in the game are quite small (N ≤ 13), so they can be easily solved by brute force. Attached is a MATLAB script (solve_clock.m) that can be used to solve clock puzzles. The first input argument is a vector containing the numeric values in each of the positions, starting from the top and reading clockwise. By default, only one solution is computed. To compute all solutions, set the second (optional) input argument to 1.

The output of the script is either a vector of positions (labelled 0 through N-1, with 0 referring to the top position, 1 referring to one position clockwise from there, and so on) describing an order in which you can visit the positions to solve the puzzle, or 0 if there is no solution.

For example, the script can be used to find our solution to the N = 6 example provided earlier:

>> solve_clock([3,1,3,1,2,3])

ans =
    1 0 3 4 2 5

Similarly, the script can be used to find all four solutions to the puzzle in the screenshot at the very top of this post:

>> solve_clock([6,5,1,4,2,1,6,4,2,1,5,2], 1)

ans =
    3 7 11 9 10 5 4 2 1 8 6 0
    7 3 11 9 10 5 4 2 1 8 6 0
    9 10 5 4 2 3 7 11 1 8 6 0
    9 8 10 5 4 2 3 7 11 1 6 0

Download

References

  1. J. Plesnik. The NP-completeness of the Hamiltonian cycle problem in planar digraphs with degree bound two. Inform. Process. Lett., 8:199–201, 1979.

The Q-Toothpick Cellular Automaton

March 26th, 2011

The Q-toothpick cellular automaton (defined earlier this month by Omar E. Pol) is described by the following simple rules:

  1. On an infinite square grid, draw a quarter circle from one corner of a square to the opposite corner of that square:
  2. Call an endpoint of a quarter circle (or a “Q-toothpick”) exposed if it does not touch the endpoint of any other quarter circle.
  3. From each exposed endpoint, draw two more quarter circles, each of the same size as the first quarter circle you drew. Furthermore, the two quarter circles that you draw are the ones that can be drawn “smoothly” (without creating a 90° or 180° corner). Thus the next two generations of the automaton are (already-placed quarter circles are green, newly-added quarter circles are red):

The name “Q-toothpick” comes from its analogy to the more well-studied toothpick automaton (see Sloane’s A139250 and this paper), in which toothpicks (rather than quarter circles) are repeatedly placed on a grid where exposed ends of other toothpicks lie. In this post, we will examine how this automaton evolves over time, and in particular we will investigate the types of shapes that it produces.

Counting Q-Toothpicks

While the Q-toothpick automaton appears quite random and unpredictable for the first few generations, evolving past generation 6 or so reveals several patterns. The following image depicts the evolution of the automaton for its first 19 generations.

The first 19 generations of the Q-toothpick cellular automaton (red segments are pieces that are newly added in the current generation)

Perhaps the most notable pattern is that the grid is more or less filled up in an expanding square starting from the initial Q-toothpick. In fact, by inspecting generations 4, 6, 10, 18, we see that at generation 2^n + 2 (n = 1, 2, 3, …) the automaton has roughly filled in a square of side length 2^{n+1} + 1, and then evolution continues from there on out of the corners of that square. Also, the number of cells added (A187211) at these generations can now easily be computed:

A187211(2^n + 2) = 16 + 8(2^{n-1} - 1) for n ≥ 3.

Furthermore, the growth in the following generations repeats itself. In particular, we have:

A187211(2^n + 3) = 22 for n ≥ 1,
A187211(2^n + 4) = 40 for n ≥ 2,
A187211(2^n + 5) = 54 for n ≥ 2.

Similarly, for n ≥ 3, the four values A187211(2^n + 6) through A187211(2^n + 9) are constant in n (their values are 56, 70, 120, and 134). In general, for n ≥ k the 2^{k-1} values A187211(2^n + 2^{k-1} + 2) through A187211(2^n + 2^k + 1) are constant in n, though I am not aware of a general formula for what these constants are. If we ignore the first four generations and arrange the number of Q-toothpicks added in each generation in rows of length 2^n, we obtain a table that begins as follows:

22, 20
22, 40, 54, 40
22, 40, 54, 56, 70, 120, 134, 72
22, 40, 54, 56, 70, 120, 134, 88, 70, 120, 150, 168, 246, 360, 326, 136
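The patterns described above can be checked mechanically against the table. The following Python sketch is only a consistency check on the values printed here (it does not simulate the automaton), and it assumes that row n of the table holds generations 2^n + 3 through 2^{n+1} + 2; the helper name `added` is hypothetical.

```python
# Rows of the table above; row n (starting at n = 1) has length 2^n.
ROWS = [
    [22, 20],
    [22, 40, 54, 40],
    [22, 40, 54, 56, 70, 120, 134, 72],
    [22, 40, 54, 56, 70, 120, 134, 88, 70, 120, 150, 168, 246, 360, 326, 136],
]


def added(generation):
    """Look up A187211(generation) in the table (generations 5 to 34 only)."""
    for n, row in enumerate(ROWS, start=1):
        first = 2 ** n + 3
        if first <= generation < first + len(row):
            return row[generation - first]
    raise ValueError("generation is outside the range covered by the table")


# The last entry of each row is generation 2^n + 2 for the next value of n.
for n in (3, 4, 5):
    assert added(2 ** n + 2) == 16 + 8 * (2 ** (n - 1) - 1)

# The repeating values near the start of each row.
assert all(added(2 ** n + 3) == 22 for n in (1, 2, 3, 4))
assert all(added(2 ** n + 4) == 40 for n in (2, 3, 4))
assert all(added(2 ** n + 5) == 54 for n in (2, 3, 4))
for n in (3, 4):
    assert [added(2 ** n + k) for k in (6, 7, 8, 9)] == [56, 70, 120, 134]
```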

C scripts are provided at the end of this post for computing the values of A187210 and A187211 (and hence the values in the above table).

Shapes Traced Out by Q-Toothpicks

In the graphic above that depicts the initial 19 generations of the Q-toothpick automaton, several shapes are traced out, including circles, diamonds, hearts, and several nameless blobs:

By far the most common of these shapes are circles, diamonds and hearts. The fourth shape appears only on the diagonal and it’s not difficult to see that it forever will make up the entirety of the diagonal (with the exception of the circle in the center). The fifth and sixth objects are the first two members of an infinite family of objects that appear as the automaton evolves. The fifth object first appears in generation 9, and sixth object (which is basically two copies of the fifth object) first appears in generation 17. The following object, which is basically made up of two copies of the sixth object (i.e., four copies of the fifth object) first appears in generation 33:

In general, a new object of this type (made of 2^n copies of the fifth object above) first appears in generation 2^{n+3} + 1. In fact, these objects are the only ones that are traced out by this automaton. [Edit: this final claim is not true! See ebcube's great post that shows a double-heart shape in generation 31.]

Update [March 28, 2011]: I have added a script that counts the number of circles, diamonds, and hearts in the nth generation of the Q-toothpick automaton, and another script that computes Sloane’s A187212.

Download:

  • A187210.c – computes the total number of Q-toothpicks present in the nth generation
  • A187211.c – computes the number of Q-toothpicks added in the nth generation
  • A187212.c – computes the number of Q-toothpicks if we restrict them to the positive quadrant
  • count_shapes.c – computes the number of circles, diamonds, and hearts in the nth generation

The Maximum Score in the Game “Entanglement” is 9080

January 21st, 2011

Entanglement is a browser-based game that has gained a fair bit of popularity lately due to its recent inclusion in Google’s Chrome Web Store and Chrome 9. The way the game works is probably best understood by actually playing it, but here is my brief attempt:

  • You are given a hexagonal tile with six paths printed on it, with two path ends touching each side of the hexagon. One such tile is as follows:

  • You may rotate, but not move, the hexagon that has been provided to you.
  • Once you have selected an orientation of the hexagon, a path is traced along that hexagon, and you are provided a new hexagon that you may rotate at the end of your current path.
  • The goal of the game is to create the longest path possible without running into either the centre hexagon or the outer edge of the game board.

To make things a bit more interesting, the game was updated in November 2010 to include a new scoring system that gives you 1 + 2 + 3 + … + n (the nth triangular number) points on a turn if you extend the length of your path by n on that turn. This encourages clever moves that significantly extend the length of the path all at once. The question that I am going to answer today is what the maximum score in Entanglement is under this scoring system (inspired by this reddit thread).

On a Standard-Size Game Board

The standard Entanglement game board is made up of a hexagonal ring of 6 hexagons, surrounded by a hexagonal ring of 12 hexagons, surrounded by a hexagonal ring of 18 hexagons, for a total of 36 hexagons. In order to maximize our score, we want to maximize how much we increase the length of our path on our final move. Thus, we want to just extend our path by a length of one on each of our first 35 moves, and then score big on the 36th move.

Well, each hexagon that we lay has six paths on it, for a total of 6*36 = 216 paths on the board. 35 of those paths will be used up by our first 35 moves. It is not possible to use all of the remaining 181 paths, however, because many of them lead into the edge of the game board or the central hexagon, and connecting to such a path immediately ends the game. Because there are 12 path ends that touch the central hexagon and 84 path ends that touch the outer border, there must be at least (12+84)/2 – 1 = 47 unused paths on the game board (we divided by 2 because each unused path takes up two path ends and we subtracted 1 because one of the paths will be used by us).

Thus we can add a length of at most 181 – 47 = 134 to our path on the 36th and final move of the game, giving a total score of at most 35 (from the first 35 moves of the game) + 1 + 2 + 3 + … + 134 = 35 + 9045 = 9080. Not only is this an upper bound of the possible scores, but it is actually attainable, as demonstrated by the following optimal game board:

Paths in red are unused, the green line depicts the portion of the path laid by the first 35 moves of the game, and the blue line depicts the portion of the path (of length 134) gained on the 36th move. One fun property of the above game board is that it is actually completely “unentangled” – no paths cross over any other paths.

On a Larger or Smaller Game Board

Other than being a good size for playability purposes, there is no reason why we couldn’t play Entanglement on a game board of larger or smaller radius (by radius I mean the number of rings of hexagons around the central hexagon – the standard game board has a radius of 3). We will compute the maximum score simply by mimicking our previous analysis for the standard game board. If the board has radius n, then there are 6 + 12 + 18 + … + 6n = 3n(n+1) hexagons, each of which contains 6 paths. Thus there are 18n(n+1) lengths of path, 3n(n+1)-1 of which are used in the first 3n(n+1)-1 moves of the game, and we want to add as many as possible of the remaining 15n(n+1)+1 lengths of path in the final move of the game. There are 12 path ends that touch the central hexagon and 12 + 24n path ends that touch the outer edge of the game board. Thus there are at least (12 + 12 + 24n)/2 – 1 = 11 + 12n unused paths on the game board.

Tallying the numbers up, we see that on the final move, we can add at most 15n(n+1)+1 - (11 + 12n) = 15n^2 + 3n - 10 length of path. If T(n) = n(n+1)/2 is the nth triangular number, then we see that it’s not possible to obtain more than 3n(n+1)-1 + T(15n^2 + 3n - 10) = (225/2)n^4 + 45n^3 - 135n^2 - (51/2)n + 44 points. In fact, this score is obtainable via the exact same construction as the optimal board in the n = 3 case – just extend the (counter)clockwise rotation of the path in the obvious way. Thus, the maximum score for a game of Entanglement on a board of radius n for n = 1, 2, 3, … is given by the sequence 41, 1613, 9080, 29462, 72479, …
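The counting argument is easy to mis-transcribe, so here is a short Python sketch that repeats it step by step and compares the result with the quartic. The function names are mine, and the code only restates the bound derived above; it does not construct an optimal board.

```python
from fractions import Fraction


def triangular(k):
    """The kth triangular number 1 + 2 + ... + k."""
    return k * (k + 1) // 2


def max_score(radius):
    """Upper bound on the score for a board with the given number of rings."""
    hexagons = 3 * radius * (radius + 1)
    paths = 6 * hexagons
    setup_moves = hexagons - 1               # each extends the path by one
    border_ends = 12 + (12 + 24 * radius)    # centre hexagon plus outer edge
    unused = border_ends // 2 - 1
    final_gain = paths - setup_moves - unused
    return setup_moves + triangular(final_gain)


def closed_form(n):
    """The quartic from the text, evaluated exactly."""
    n = Fraction(n)
    return (Fraction(225, 2) * n ** 4 + 45 * n ** 3
            - 135 * n ** 2 - Fraction(51, 2) * n + 44)


print([max_score(n) for n in range(1, 6)])  # [41, 1613, 9080, 29462, 72479]
```

Using `Fraction` for the closed form avoids any floating-point rounding in the half-integer coefficients.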

Further Variants of the “Look-and-Say” Sequence

January 13th, 2011

In two previous posts, I explored Conway’s famous “look-and-say” sequence 1, 11, 21, 1211, 111221, 312211, …, obtained by repeatedly describing the sequence’s previous term, as well as a simple binary variant of the sequence. In this post I will use similar techniques to explore some further variations of the sequence – a version where each term in the sequence is read in ternary, and a related sequence where no digit larger than 2 may be used when describing its terms.

As with the regular look-and-say sequence, the way we will attack these sequences is by constructing a “periodic table” of elementary non-interacting subsequences that all terms in the sequence are made up of. Then standard recurrence relation techniques will allow us to determine the rate of growth of the length of the terms in the sequences as well as the limiting distribution of the different digits in the sequence.

The Ternary Look-and-Say Sequence

Since we have already looked at the regular (i.e., decimal) look-and-say sequence, which is equivalent to the base-4 version of the sequence since it never contains a digit of 4 or larger, and we have also looked at the binary version of the sequence, it makes sense to ask what happens in the intermediate case of the ternary (base-3) version of the sequence: 1, 11, 21, 1211, 111221, 1012211, … (see A001388).
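To make the base-dependence concrete, the following Python sketch generates look-and-say terms with the run lengths written in an arbitrary base. It is an illustrative helper (the names `to_base` and `next_term` are hypothetical), not code from the original posts.

```python
from itertools import groupby


def to_base(number, base):
    """Write a positive integer in the given base (2 to 10) as a string."""
    digits = ""
    while number:
        digits = str(number % base) + digits
        number //= base
    return digits or "0"


def next_term(term, base):
    """Describe each run of equal digits as <run length in base><digit>."""
    return "".join(to_base(len(list(run)), base) + digit
                   for digit, run in groupby(term))


def sequence(count, base):
    """Return the first `count` terms, starting from the term 1."""
    terms = ["1"]
    while len(terms) < count:
        terms.append(next_term(terms[-1], base))
    return terms


print(sequence(6, 3))  # ['1', '11', '21', '1211', '111221', '1012211']
```

The sixth term is where base 3 first differs from the decimal sequence: the run of three 1s is written as 10 followed by 1 rather than as 31.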

As always, we begin by listing the noninteracting subsequences that make this version of the sequence tick. Not surprisingly, it is more complicated than the corresponding table (of 10 subsequences) in the binary case, but not as complicated as the corresponding table (of 92 subsequences) in the decimal case.

# Subsequence Evolves Into
1 1 (3)
2 10 (5)
3 11 (19)
4 110 (21)
5 1110 (2)(4)
6 111210 (2)(8)
7 111221 (2)(16)
8 1121110 (22)(4)
9 112211 (23)
10 112221 (21)(20)
11 11222110 (21)(24)
12 1122211210 (21)(25)
13 1211 (7)
14 121110 (6)(4)
15 1221 (9)
16 12211 (10)
17 122110 (11)
18 1221121110 (12)(4)
19 21 (13)
20 211 (15)
21 2110 (17)
22 211210 (18)
23 212221 (14)(20)
24 22110 (26)
25 221121110 (27)(4)
26 222110 (2)(24)
27 22211210 (2)(25)

The (27×27) transition matrix for this evolution rule is included in the text file at the end of this post. Its characteristic polynomial is

The maximal eigenvalue of the transition matrix is thus the largest root of x^3 - x - 1, which is approximately 1.324718. It follows that the number of digits in the terms of this sequence grows on average by about 32.5% from one term to the next.

The Look-and-Say Sequence with Digits 1 and 2

Closely related to the ternary version of the sequence is the sequence obtained by reading the previous term in the sequence, but with the restriction that you can never use a number larger than 2 (see A110393). This sequence begins 1, 11, 21, 1211, 111221, 21112211, …, and the sixth term is obtained by reading the fifth term as “two ones, one one, two twos, one one”. Because only two different digits appear in this sequence, it is perhaps not surprising that its table of noninteracting subsequences is quite simple:

# Subsequence Evolves Into
1 1 (2)
2 11 (5)
3 111 (7)
4 1211 (3)(6)(1)
5 21 (4)
6 22 (6)
7 2111 (1)(6)(3)
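The table can be exercised numerically. The Python sketch below does two things under stated assumptions: it generates terms directly, assuming runs are read in chunks of at most two digits starting from the left (which reproduces "two ones, one one" for 111), and it iterates the evolution rules from the table on subsequence counts. The names `RULES`, `ELEMENTS`, `next_term` and `table_lengths` are hypothetical.

```python
from math import sqrt

# Evolution rules and subsequences copied from the table above.
RULES = {1: [2], 2: [5], 3: [7], 4: [3, 6, 1], 5: [4], 6: [6], 7: [1, 6, 3]}
ELEMENTS = {1: "1", 2: "11", 3: "111", 4: "1211", 5: "21", 6: "22", 7: "2111"}


def next_term(term):
    """Read a term aloud without ever using a count larger than 2."""
    out = []
    i = 0
    while i < len(term):
        # Take a chunk of two equal digits if possible, otherwise one digit.
        count = 2 if term[i:i + 2] == term[i] * 2 else 1
        out.append(str(count) + term[i])
        i += count
    return "".join(out)


def table_lengths(n_terms):
    """Digit counts of the first n_terms terms, using only the table."""
    counts = {k: 0 for k in RULES}
    counts[1] = 1  # the first term, 1, is subsequence (1)
    lengths = []
    for _ in range(n_terms):
        lengths.append(sum(c * len(ELEMENTS[k]) for k, c in counts.items()))
        new_counts = {k: 0 for k in RULES}
        for k, c in counts.items():
            for child in RULES[k]:
                new_counts[child] += c
        counts = new_counts
    return lengths


lengths = table_lengths(122)
# Compare terms two steps apart and take a square root, which smooths out
# any alternation between odd and even terms.
growth = sqrt(lengths[-1] / lengths[-3])
print(round(growth, 6), round(sqrt((1 + sqrt(5)) / 2), 6))
```

Working with integer counts keeps the iteration exact; only the final ratio is computed in floating point.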

The transition matrix associated with this evolution rule is

As before, the average rate of growth of the number of digits in the terms of this sequence is determined by the magnitude of the largest eigenvalue of this matrix. A simple calculation reveals that this eigenvalue is √φ = 1.272…, where φ = (1 + √5)/2 is the golden ratio. Furthermore, we can answer the question of how many 1s there are in the terms of this sequence compared to 2s by looking at the eigenvector corresponding to the maximal eigenvalue:

What this means is, for example, that the second elementary subsequence (11) occurs φ times as frequently as the fourth elementary subsequence (1211). By weighting the subsequences by the entries in this vector appropriately, we can calculate the limiting ratio of the number of ones to the number of twos as

Download: Transition matrices [plaintext file]

The Binary “Look-and-Say” Sequence

November 7th, 2010

The look-and-say sequence (which I talked about here) is the sequence that you get by starting with the number 1 and constructing the next term in the sequence by “reading” the previous term. So 1 becomes “one one”, or 11. That becomes “two ones”, or 21. That becomes “one two, one one”, or 1211, and so on.

In this post, I am going to investigate the related binary version of the sequence, which starts off 1, 11 much like the regular sequence. But then when reading 11, we read it as “two ones”. Since two in binary is 10, the next term in the sequence is 101. When reading that term, we read it as “one one, one zero, one one”, so the next term is 111011. That term is read as “three ones, one zero, two ones”, and since three is 11 in binary and two is 10 in binary, the next term is 11110101, and so on. In this post we will answer two questions in particular about this sequence:

1) On average, how much longer is the (n+1)th term in the sequence than the nth term in the sequence?

2) On average, what is the ratio of the number of ones to the number of zeroes in the sequence?

Non-Interacting Subsequences

Much like the regular look-and-say sequence, we are able to study this sequence by constructing a “basis” of non-interacting subsequences that every term in the binary look-and-say sequence is made up of. Fortunately, constructing such a family of subsequences for the binary version of the look-and-say sequence is much simpler than it is for the decimal version of the sequence – here we only need ten different basic subsequences (whereas we needed 92 different subsequences for the regular look-and-say sequence!). These ten subsequences, and the subsequences they evolve into, are summarized in the following table.

# Subsequence Evolves Into
1 1 (2)
2 11 (3)(1)
3 10 (5)
4 110 (3)(4)
5 1110 (6)
6 11110 (7)(4)
7 100 (9)
8 1100 (3)(8)
9 11100 (10)
10 111100 (7)(8)

So for example, the first term in the sequence, 1, evolves into the subsequence (2), which is 11. That term then evolves into subsequence (3) followed by subsequence (1), or 101. That term then evolves into the subsequence (5) followed by the subsequence (2), or 111011, and so on. The reason that this representation of the sequence is useful is we can use it to describe the evolution of the binary look-and-say sequence entirely within a matrix T. In particular, we let T be the matrix with 1 in its (i,j) entry if the subsequence (i) appears in the evolution rule for subsequence (j), and 0 in its (i,j) entry otherwise:

Now if v is a 10-dimensional vector whose ith entry indicates how many times the subsequence (i) appears in a particular term of the binary look-and-say sequence, it follows that the entries of Tv tell us how many times each subsequence appears in the next term of the binary look-and-say sequence. So it follows from standard theory of linear homogeneous recurrence relations that we can now read off all of the long-term behaviour of the binary look-and-say sequence from the eigenvalues and eigenvectors of T.
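Here is a minimal Python sketch of that bookkeeping. It builds T from the table using the convention just described, applies it repeatedly to the count vector of the first term, and reports the digit-count growth factor and the ratio of ones to zeroes. The names `RULES`, `ELEMENTS`, `transition_matrix` and `term_stats` are my own, and the matrix is rebuilt from the table rather than copied from the original figure.

```python
# Evolution rules and subsequences copied from the table above.
RULES = {1: [2], 2: [3, 1], 3: [5], 4: [3, 4], 5: [6],
         6: [7, 4], 7: [9], 8: [3, 8], 9: [10], 10: [7, 8]}
ELEMENTS = {1: "1", 2: "11", 3: "10", 4: "110", 5: "1110", 6: "11110",
            7: "100", 8: "1100", 9: "11100", 10: "111100"}


def transition_matrix():
    """T[i][j] = 1 if subsequence (i+1) appears in the rule for (j+1)."""
    size = len(RULES)
    T = [[0] * size for _ in range(size)]
    for j, children in RULES.items():
        for i in children:
            T[i - 1][j - 1] = 1
    return T


def term_stats(n_terms):
    """Return (ones, zeroes) for each of the first n_terms terms."""
    T = transition_matrix()
    v = [1] + [0] * (len(RULES) - 1)  # the first term is subsequence (1)
    stats = []
    for _ in range(n_terms):
        ones = sum(c * ELEMENTS[i + 1].count("1") for i, c in enumerate(v))
        zeroes = sum(c * ELEMENTS[i + 1].count("0") for i, c in enumerate(v))
        stats.append((ones, zeroes))
        v = [sum(t * x for t, x in zip(row, v)) for row in T]  # v <- T v
    return stats


stats = term_stats(81)
lengths = [ones + zeroes for ones, zeroes in stats]
print(lengths[80] / lengths[79])   # estimate of the growth factor
print(stats[80][0] / stats[80][1])  # estimate of the ones-to-zeroes ratio
```

Because the counts are Python integers, the vector stays exact no matter how many terms are iterated.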

Rate of Growth of the Sequence

The asymptotic rate of growth of the number of digits in the terms of the binary look-and-say sequence is simply the magnitude of the largest eigenvalue of the transition matrix T above. Using Maple it is simple to derive this value. If L_n is the number of digits in the nth term of the binary look-and-say sequence, then

This limit is approximately 1.465571, which means that the binary version of this sequence grows much faster than the decimal version of the sequence (recall that the growth rate of the number of digits of the regular look and say sequence is approximately 1.303577). This limit is also the unique real root of the cubic x^3 - x^2 - 1, which follows from the fact that the characteristic polynomial of T is

Ratio of Number of Ones to Zeroes

If we let N_n denote the number of ones in the nth term of the binary look-and-say sequence, and if we let Z_n denote the number of zeroes in the nth term of the sequence, what is \lim_{n \to \infty} N_n/Z_n?

In other words, what is the average ratio of ones to zeroes in this sequence? The following table shows the value of N_n/Z_n for n = 3, 4, …, 25, which might give some intuition to the problem:

n N_n/Z_n
3 2.000
4 5.000
5 3.000
6 2.000
7 2.000
8 2.000
9 1.786
10 1.762
11 1.742
12 1.717
13 1.691
14 1.690
15 1.680
16 1.676
17 1.672
18 1.671
19 1.669
20 1.668
21 1.667
22 1.667
23 1.666
24 1.666
25 1.666

Based on numerical estimates like those given in the table above, it has been conjectured that the limiting ratio is 5/3 (or some nearby value). We will now show that the limit does indeed exist, but its value is not 5/3 — it just happens to be really close to 5/3.

Much like the maximal eigenvalue of T tells us the overall growth rate of the sequence, the corresponding eigenvector tells us the distribution of the different subsequences that are present in the limit. Once we know the distribution of the individual subsequences, it is not difficult to find out the overall ratio of ones to zeroes by weighting the different subsequences appropriately. So our first step is to find the eigenvector corresponding to the maximal eigenvalue. To this end, it will be convenient to let

α is the same as in the previous section, and β is exactly the growth rate limit that we computed. Then the eigenvector corresponding to the maximal eigenvalue of T is:

What this means is that, in the limit, the fifth subsequence, 1110, occurs β times as frequently as the sixth subsequence, 11110 (for example). Now we just weight each subsequence according to how many zeroes and ones it contains, and we find the limiting ratio of ones to zeroes is

In particular, this ratio does not equal 5/3, but rather its decimal expansion begins 1.6657272222676… (which is less than 1/1000 away from 5/3).

A Derivation of Conway’s Degree-71 “Look-and-Say” Polynomial

October 31st, 2010

The look-and-say sequence is the sequence of numbers 1, 11, 21, 1211, 111221, 312211, …, in which each term is constructed by “reading” the previous term in the sequence. For example, the term 1 is read as “one 1”, which becomes the next term: 11. Then 11 is read as “two ones”, which becomes the next term: 21, and so on.

The remarkable thing about this sequence is that even though it seems at first glance to be quite arbitrary and non-mathematical, it has some interesting properties that were unearthed by John Conway. Most notably, he showed that the number of digits in each term of the sequence on average grows by about 30% from one term to the next. A bit more specifically, he showed that if L_n is the number of digits in the nth term in the sequence, then \lim_{n \to \infty} L_{n+1}/L_n = λ,

where λ is the unique positive real root of the following degree-71 polynomial:

In order to demystify this seemingly bizarre fact, in this post we will show where this polynomial comes from and prove that the above limit does indeed equal its largest root (which happens to be its one and only positive real root).

The Cosmological Theorem

What lets us formally study the look-and-say sequence is a rather ominous-sounding result known as the cosmological theorem, which says that the eighth term and every term after it in the sequence is made up of one or more of 92 “basic” non-interacting subsequences. These 92 basic subsequences are summarized in lexicographical order in the following table. The fourth column in the table says what other subsequence(s) the given subsequence evolves into. For example, the first subsequence, 1112, evolves into the 63rd subsequence: 3112. Similarly, the second subsequence, 1112133, evolves into the 64th subsequence followed by the 62nd subsequence: 31121123.

# Subsequence Length Evolves Into
1 1112 4 (63)
2 1112133 7 (64)(62)
3 111213322112 12 (65)
4 111213322113 12 (66)
5 1113 4 (68)
6 11131 5 (69)
7 111311222112 12 (84)(55)
8 111312 6 (70)
9 11131221 8 (71)
10 1113122112 10 (76)
11 1113122113 10 (77)
12 11131221131112 14 (82)
13 111312211312 12 (78)
14 11131221131211 14 (79)
15 111312211312113211 18 (80)
16 111312211312113221133211322112211213322112 42 (81)(29)(91)
17 111312211312113221133211322112211213322113 42 (81)(29)(90)
18 11131221131211322113322112 26 (81)(30)
19 11131221133112 14 (75)(29)(92)
20 1113122113322113111221131221 28 (75)(32)
21 11131221222112 14 (72)
22 111312212221121123222112 24 (73)
23 111312212221121123222113 24 (74)
24 11132 5 (83)
25 1113222 7 (86)
26 1113222112 10 (87)
27 1113222113 10 (88)
28 11133112 8 (89)(92)
29 12 2 (1)
30 123222112 9 (3)
31 123222113 9 (4)
32 12322211331222113112211 23 (2)(61)(29)(85)
33 13 2 (5)
34 131112 6 (28)
35 13112221133211322112211213322112 32 (24)(33)(61)(29)(91)
36 13112221133211322112211213322113 32 (24)(33)(61)(29)(90)
37 13122112 8 (7)
38 132 3 (8)
39 13211 5 (9)
40 132112 6 (10)
41 1321122112 10 (21)
42 132112211213322112 18 (22)
43 132112211213322113 18 (23)
44 132113 6 (11)
45 1321131112 10 (19)
46 13211312 8 (12)
47 1321132 7 (13)
48 13211321 8 (14)
49 132113212221 12 (15)
50 13211321222113222112 20 (18)
51 1321132122211322212221121123222112 34 (16)
52 1321132122211322212221121123222113 34 (17)
53 13211322211312113211 20 (20)
54 1321133112 10 (6)(61)(29)(92)
55 1322112 7 (26)
56 1322113 7 (27)
57 13221133112 11 (25)(29)(92)
58 1322113312211 13 (25)(29)(67)
59 132211331222113112211 21 (25)(29)(85)
60 13221133122211332 17 (25)(29)(68)(61)(29)(89)
61 22 2 (61)
62 3 1 (33)
63 3112 4 (40)
64 3112112 7 (41)
65 31121123222112 14 (42)
66 31121123222113 14 (43)
67 3112221 7 (38)(39)
68 3113 4 (44)
69 311311 6 (48)
70 31131112 8 (54)
71 3113112211 10 (49)
72 3113112211322112 16 (50)
73 3113112211322112211213322112 28 (51)
74 3113112211322112211213322113 28 (52)
75 311311222 9 (47)(38)
76 311311222112 12 (47)(55)
77 311311222113 12 (47)(56)
78 3113112221131112 16 (47)(57)
79 311311222113111221 18 (47)(58)
80 311311222113111221131221 24 (47)(59)
81 31131122211311122113222 23 (47)(60)
82 3113112221133112 16 (47)(33)(61)(29)(92)
83 311312 6 (45)
84 31132 5 (46)
85 311322113212221 15 (53)
86 311332 6 (38)(29)(89)
87 3113322112 10 (38)(30)
88 3113322113 10 (38)(31)
89 312 3 (34)
90 312211322212221121123222113 27 (36)
91 312211322212221121123222112 27 (35)
92 32112 5 (37)

The important thing about this particular basis of subsequences is that the evolution of any sequence made up of these subsequences is determined entirely by the evolution rule for the subsequences given in the final column of the above table. For example, the eighth term in the look-and-say sequence is 1113213211 = (24)(39). The subsequence (24) evolves into (83) and the subsequence (39) evolves into (9), so the ninth term in the look-and-say sequence is (83)(9), which is 31131211131221.

Computing the Number of Digits in Sequences

Since the evolution of every term in the look-and-say sequence after the eighth can be computed using the table above, we can easily compute the length of every term after the eighth as well. For example, the eighth term in the sequence evolves into (83)(9), so the number of digits of the ninth term in the sequence is 6 + 8 = 14. The subsequence (83) evolves into a subsequence with 10 digits, and (9) evolves into a subsequence with 10 digits, so the tenth term in the look-and-say sequence has 10 + 10 = 20 digits.
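The digit counts quoted above can be cross-checked by generating the terms directly, without using the table at all. The following is a minimal Python sketch for that purpose (the function names are hypothetical); it also prints the ratio of two consecutive lengths far along the sequence, which should be close to the roughly 30% growth described at the start of this post.

```python
from itertools import groupby


def look_and_say(term):
    """Describe each run of equal digits as <run length><digit>."""
    return "".join(str(len(list(run))) + digit for digit, run in groupby(term))


def first_terms(count):
    """Return the first `count` terms of the sequence, starting from 1."""
    terms = ["1"]
    while len(terms) < count:
        terms.append(look_and_say(terms[-1]))
    return terms


terms = first_terms(41)
print(terms[7])                          # eighth term: 1113213211
print(len(terms[8]), len(terms[9]))      # 14 20
print(len(terms[40]) / len(terms[39]))   # roughly 1.30
```

Direct generation gets slow quickly because the terms themselves grow exponentially, which is exactly why working with the 92 subsequence counts is the better tool for studying the limit.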

All of the information about how the lengths of the 92 subsequences change can be represented in a 92×92 matrix T. In particular, the matrix T has its (i,j) entry equal to C_{ij} × ℓ_i/ℓ_j, where C_{ij} is the number of times subsequence (i) appears in the evolution rule for subsequence (j) and ℓ_i is the length of subsequence (i). This matrix is represented in the following image – white squares represent zero entries in the matrix, and black squares represent the number 2, which is the largest value present in the matrix. Shades of grey represent non-zero numbers, with larger numbers being darker.

Then if we represent a term in the look-and-say sequence as a vector v with its ith entry being c_i × ℓ_i, where c_i is the number of times the subsequence (i) appears in that term, we find that the sum of the entries in v is the total length of that term of the look-and-say sequence. More important, however, is the fact that the sum of the entries in Tv is the length of the next term in the look-and-say sequence. The sum of the entries in T^2 v is the length of the term after that, and so on. So we have found a degree-92 recurrence relation for the length of terms in the look-and-say sequence, and the corresponding transition matrix is T.

Computing the Limit

It is a basic fact of linear homogeneous recurrence relations that a closed-form solution to the recurrence relation can be written down in terms of the eigenvalues of the transition matrix (see the linked Wikipedia page for specifics). As a corollary of this, the limiting ratio of terms in the sequence is equal to the spectral radius of the transition matrix. Fortunately, the transition matrix in this case is quite sparse, so its characteristic polynomial isn’t too difficult to compute:

Indeed, the degree-71 polynomial that λ is a root of is one of the factors of the characteristic polynomial of the transition matrix T. All that remains to do is to get MATLAB to compute the largest root of that polynomial (i.e., the spectral radius of T):

>> max(abs(eig(T)))

ans =
    1.303577269034287

The matrix T is attached below for those who would like to play with it. Something fun to think about: what do the rational eigenvalues (-1, 0, and 1) of T represent?

Download: Transition matrix [plaintext file]