## The Minimal Superpermutation Problem

Imagine that there is a TV series that you want to watch. The series consists of n episodes, with each episode on a single DVD. Unfortunately, however, the DVDs have become mixed up and the order of the episodes is in no way marked (and furthermore, the episodes of the TV show are not connected by any continuous storyline – there is no way to determine the order of the episodes just from watching them).

Suppose that you want to watch the episodes of the TV series, consecutively, in the correct order. The question is: how many episodes must you watch in order to do this?

To illustrate what we mean by this question, suppose for now that n = 2 (i.e., the show was so terrible that it was cancelled after only 2 episodes). If we arbitrarily label one of the episodes “1” and the other episode “2”, then we could watch the episodes in the order “1”, “2”, and then “1” again. Then, regardless of which episode is really the first episode, we’ve seen the two episodes consecutively in the correct order. Furthermore, this is clearly minimal – there is no way to watch fewer than 3 episodes while ensuring that you see both episodes in the correct order.

So what is the minimal number of episodes we must watch for a TV show consisting of n episodes? Somewhat surprisingly, no one knows. So let’s discuss what *is* known.

### Minimal Superpermutations

Rephrased a bit more mathematically, we are interested in finding a shortest possible string on the symbols “1”, “2”, …, “n” that contains every permutation of those symbols as a contiguous substring. We call a string that contains every permutation in this way a *superpermutation*, and one of minimal length is called a *minimal superpermutation*. Minimal superpermutations when n = 1, 2, 3, 4 are easily found via brute-force computer search, and are presented here:

n | Minimal Superpermutation | Length |
---|---|---|

1 | 1 | 1 |

2 | 121 | 3 |

3 | 123121321 | 9 |

4 | 123412314231243121342132413214321 | 33 |

By the time n = 5, the strings we are looking for are much too long to find via brute-force. However, the strings in the n ≤ 4 cases provide some insight that we can hope might generalize to larger n. For example, there is a natural construction that allows us to construct a short superpermutation on n+1 symbols from a short superpermutation on n symbols (which we will describe in the next section), and this construction gives the minimal superpermutations presented in the above table when n ≤ 4.

Similarly, the minimal superpermutations in the above table can be shown via brute-force to be unique (up the relabeling the characters – for example, we don’t count the string “213212312” as distinct from “123121321”, since they are related to each other simply by interchanging the roles of “1” and “2”). Are minimal superpermutations unique for all n?

### Minimal Length

A trivial lower bound on the length of a superpermutation on n symbols is n! + n – 1, since it must contain each of the n! permutations as a substring – the first permutation contributes a length of n characters to the string, and each of the remaining n! – 1 permutations contributes a length of at least 1 character more.

It is not difficult to improve this lower bound to n! + (n-1)! + n – 2 (I won’t provide a proof here, but the idea is to note that when building the superpermutation, you can not add more than n-1 permutations by appending just 1 character each to the string – you eventually have to add 2 or more characters to add a permutation that is not already present). In fact, this argument can be stretched further to show that n! + (n-1)! + (n-2)! + n – 3 is a lower bound as well (a rough proof is provided here). However, the same arguments do **not** seem to extend to lower bounds like n! + (n-1)! + (n-2)! + (n-3)! + n – 4 and so on.

There is also a trivial upper bound on the length of a minimal superpermutation: n×n!, since this is the length of the string obtained by writing out the n! permutations in order without overlapping. However, there is a well-known construction of small superpermutations that provides a much better upper bound, which we now describe.

Suppose we know a small superpermutation on n symbols (such as one of the superpermutations provided in the table in the previous section) and we want to construct a small superpermutation on n+1 symbols. To do so, simply replace each permutation in the n-symbol superpermutation by: (1) that permutation, (2) the symbol n+1, and (3) that permutation again. For example, if we start with the 2-symbol superpermutation “121”, we replace the permutation “12” by “12312” and we replace the permutation “21” by “21321”, which results in the 3-symbol superpermutation “123121321”. The procedure for constructing a 4-symbol superpermutation from this 3-symbol superpermutation is illustrated in the following diagram:

It is a straightforward inductive argument to show that the above method produces n-symbol superpermutations of length for all n. Although it has been conjectured that this superpermutation is minimal [1], this is only known to be true when n ≤ 4.

### Uniqueness

As a result of minimal superpermutations being unique when n ≤ 4, it has been conjectured that they are unique for all n [1]. However, it turns out that there are in fact *many* superpermutations of the conjectured minimal length – the main result of [2] shows that there are at least

distinct n-symbol superpermutations of the conjectured minimal length. For n ≤ 4, this formula gives the empty product (and thus a value of 1), which agrees with the fact that minimal superpermutations are unique in these cases. However, the number of distinct superpermutations then grows extremely quickly with n: for n = 5, 6, 7, 8, there are at least 2, 96, 8153726976, and approximately 3×10^{50} superpermutations of the conjectured minimal length. The 2 such superpermutations in the n = 5 case are as follows (each superpermutation has length 153 and is written on two lines):

12345123415234125341235412314523142531423514231542312453124351243152431254312

1345213425134215342135421324513241532413524132541321453214352143251432154321

and

12345123415234125341235412314523142531423514231542312453124351243152431254312

1354213524135214352134521325413251432513425132451321543215342153241532145321

Similarly, a text file containing all 96 known superpermutations of the expected minimal length 873 in the n = 6 case can be viewed here. It is unknown, however, whether or not these superpermutations are indeed minimal or if there are even more superpermutations of the conjectured minimal length.

**Update [Aug. 13, 2014]:** Ben Chaffin has shown that minimal superpermutations in the n = 5 case have length 153, and he has also shown that there are exactly 8 (not just 2) distinct minimal superpermutations in this case. See the write up here.

**IMPORTANT UPDATE [August 22, 2014]:** Robin Houston has disproved the minimal superpermutation conjecture for all n ≥ 6. See here.

**References**

- D. Ashlock and J. Tillotson. Construction of small superpermutations and minimal injective superstrings.
*Congressus Numerantium*, 93:91–98, 1993.

- N. Johnston. Non-uniqueness of minimal superpermutations.
*Discrete Mathematics*, 313:1553–1557, 2013.

**Other Random Links Related to This Problem**

- A180632 – the main OEIS entry for this problem
- Permutation Strings – a short note written by Jeffrey A. Barnett about this problem
- Generate sequence with all permutations – a stackoverflow post about this problem
- What is the shortest string that contains all permutations of an alphabet? – a mathexchange post about this problem
- The shortest string containing all permutations of n symbols – an (archived) XKCD forums post that I made about this problem a couple years ago

Possibly of interest:

http://www.site.uottawa.ca/~lucia/courses/5165-10/UniPermAppTheory.pdf

Hi,

What happens when the size of the symbol set and the resulting strings are of different lengths?

Specifically, when n is 0..9 and the length is 4, and in a normal pin code?

All door locks I’ve seen uses the last 4 entered digits, making it possible in theory to brute force it, hopefully in less than 10000 attempts. I’ve gotten a feeling it ought to be a solved problem already, I just don’t know what to search for.

Regards,

/Daniel

@Daniel – When the string length is strictly less than the number of symbols (for example, in the 10-symbol case of length 4 that you mentioned), the problem is indeed already solved (it was solved in the Ashlock-Tillotson paper [1] referenced in the blog post). In particular, if you have n symbols and are interested in permutations of length k (k < n) then the minimal superstring has length n!/(n-k)! + k - 1. This gives a length of 5043 in the n = 10, k = 4 case you asked about. Note that this only applies to the case where repeated digits in the code are not allowed (e.g., "1413" is not a valid code since it uses the digit "1" twice). If repeated digits are allowed, then the answer is given by de Bruijn sequences. In particular, the minimum length in this case is n^k + k - 1, which equals 10003 in the n = 10, k = 4 case.

Super, I’ll check those ones out!

I will sleep well tonight, I’ve been thinking about that de Brujin thing for way too many years. 🙂

/Daniel

121

1-2-1

123121321

123-121-321

123412314231243121342132413214321

1234-1231-4231-2431-2-1342-1324-1321-4321

My first observation is that they are palindromes.

It sounds like this is very close to the study of Debrujin sequences. Given an alphabet with A letters, a Debrujin sequence D(A,k) contains every possible subsequence (possibly with repetitions) of length k. The permutation restriction just says that you don’t want any repeated letters in your subsequences, which is a nice restriction to take.

Is there a requirement that the superpermutation must start on a 1? Because it would seem that if 121 is a valid superpermutation for n=2, then so too would 212.

@Pomax

I don’t think there is such a requirement.

Think about it like this: Using all permutations of ABC you can make a superpermutation: ABCABACBA

Now instead of having A=1, B=2, C=3… you could pick any permutation of ABC and 123, for example A=2, B=1, C=3. All permutations will result in a different valid superpermutation. Those superpermutations are in essence the same, so it is probably common to turn them into the lexicographically smallest form.

Hi Everyone!

I wasn’t aware of the Haruhi problem up to these days, when someone on 4chan gave a kind of solution.

I had a look on it and I’d like to post here my approach.

The shortest string containing all the permutations of n digits occurs when you are able to take advantage of the maximum overlap between each permutation, obviously.

Given a permutation of n digits, I can find at maximum (n – 1) different permutations that grant the maximum overlap (n – 1) with an overall length of (2n – 1) digits.

Than I have to lower down to an overlap of (n – 2) for finding a new different permutation that grants other (n – 1) different ones with maximum overlap (n – 1); this new series has length equal to 2n – 1 – (n – 2) (I’m subtracting n – 2 because of the overlap with the previous permutation series).

Lets call L a string of n permutations with overlap (n – 1) between them.

I can put in a row (n – 1) L strings with overlap between them equal to (n – 2) than I have to lower down to (n – 3) for being able to create a new block.

Here it follows the L string lengths for n = 5:

L1 => length 9 (2n – 1)

L2 => length 6 (2n – 1 – (n – 2) for the overlap with L1)

L3 => length 6

L4 => length 6

_______________________________________

L5 => length 7 (2n – 1 – (n – 3) for the overlap with L4)

L6 => length 6

L7 => length 6

L8 => length 6

_______________________________________

L9 => length 7 (2n – 1 – (n – 3) for the overlap with L8)

L10 => length 6

L11 => length 6

L12 => length 6

_______________________________________

L13 => length 7 (2n – 1 – (n – 3) for the overlap with L12)

L14 => length 6

L15 => length 6

L16 => length 6

_______________________________________

L17 => length 7 (2n – 1 – (n – 3) for the overlap with L16)

L18 => length 6

L19 => length 6

L20 => length 6

_______________________________________

L21 => length 7 (2n – 1 – (n – 3) for the overlap with L20)

L22 => length 6

L23 => length 6

L24 => length 6

Total = 152

Lower bound formula = n^2(n – 2)! + n – 3

Hope my post could be of interest.

Best regards.

Roberto, maybe obvious but n^2(n-2)! + n-3 == n! + (n-1)! + (n-2)! + n-3, which is the 4chan lower bound.

@Pomax

No, just redundant due to both comprising the same two permutations, just reordered e.g. 121 = 12 & 21 == 21 & 12 = 212

Nathaniel also mentions in the post: “Similarly, the minimal superpermutations in the above table can be shown via brute-force to be unique (up the relabeling the characters – for example, we don’t count the string “213212312” as distinct from “123121321”, since they are related to each other simply by interchanging the roles of “1” and “2”)”

Actually that is a wrong assumption. You’d have 4 x 10000 attempts = 40000.

Because that is the total string length you actually need to enter.

I was a bit confused at first too when I read Nathaniels answer. I thought: how can the De Bruijn sequence be larger (10003) than the amount of possible codes (10000)? Surely it must be at least equal if not less at best. But in fact you go from 40000 entered numbers to 10003 entered numbers. Quite the reduction.