Taylor Polynomials from Orthogonal Projections

July 23rd, 2018

One of the standard applications of orthogonal projections to function spaces comes in the form of Fourier series, where we use B = \{1,\cos(x),\sin(x),\cos(2x),\sin(2x),\ldots\} as a mutually orthogonal set of functions with respect to the usual inner product

\displaystyle\langle f, g \rangle = \int_{-\pi}^\pi f(x)g(x) \, \mathrm{d}x.

In particular, projecting a function onto the span of the first 2n+1 functions in the set B gives its order-n Fourier approximation:

\displaystyle f(x) \approx a_0 + \sum_{k=1}^n a_k \cos(kx) + \sum_{k=1}^n b_k \sin(kx),

where the coefficients a_0,a_1,a_2,\ldots,a_n and b_1,b_2,\ldots,b_n can be computed via inner products (i.e., integrals) in this space (see many other places on the web, like here and here, for more details).

A natural follow-up question is then whether or not we similarly get Taylor polynomials when we project a function down onto the span of the set of functions C = \{1,x,x^2,x^3,\ldots,x^n\}. More specifically, if we define \mathcal{C}[-1,1] to be the inner product space consisting of continuous functions on the interval [-1,1] with the standard inner product

\displaystyle\langle f, g \rangle = \int_{-1}^1 f(x)g(x) \, \mathrm{d}x,

and consider the orthogonal projection P : \mathcal{C}[-1,1] \rightarrow \mathcal{C}[-1,1] with range equal to \mathcal{P}_n[-1,1], the subspace of degree-n polynomials, is it the case that P(e^x) is the degree-n Taylor polynomial of e^x?

It does not take long to work through an example to see that, no, we do not get Taylor polynomials when we do this. For example, if we choose n = 2 then projecting the function e^x onto the  \mathcal{P}_2[-1,1] results in the function P(e^x) \approx 0.5367x^2 + 1.1036x + 0.9963, which is not its degree-2 Taylor polynomial p_2(x) = 0.5x^2 + x + 1. These various functions are plotted below (e^x is displayed in orange, and the two different approximating polynomials are displayed in blue):

Orthogonal projection versus Taylor polynomial

These plots illustrate why the orthogonal projection of e^x does not equal its Taylor polynomial: the orthogonal projection is designed to approximate e^x as well as possible on the whole interval [-1,1], whereas the Taylor polynomial is designed to approximate it as well as possible at x = 0 (while sacrificing precision near the endpoints of the interval, if necessary).

However, something interesting happens if we change the interval that the orthogonal projection acts on. In particular, if we let 0 < c \in \mathbb{R} be a scalar and instead consider the orthogonal projection P_c : \mathcal{C}[-c,c] \rightarrow \mathcal{C}[-c,c] with range equal to \mathcal{P}_2[-c,c], then a straightforward (but hideous) calculation shows that the best degree-2 polynomial approximation of e^x on the interval [-c,c] is

P_c(e^x) formula

While this by itself is an ugly mess, something interesting happens if we take the limit as c approaches 0:

\displaystyle\lim_{c\rightarrow 0^+} P_c(e^x) = \frac{1}{2}x^2 + x + 1,

which is exactly the Taylor polynomial of e^x. Intuitively, this makes sense and meshes with our earlier observations about Taylor polynomials approximating e^x at x = 0 as well as possible and orthogonal projections P_c(e^x) approximating e^x as well as possible on the interval [-c,c]. However, I am not aware of any proof that this happens in general (i.e., no matter what the degree of the polynomial is and no matter what (sufficiently nice) function is used in place of e^x), and I would love for a kind-hearted commenter to point me to a reference. [Update: user “jam11249” provided a sketch proof of this fact on reddit here.]

Here is an interactive Desmos plot that you can play around with to see the orthogonal projection approach the Taylor polynomial as c \rightarrow 0.

How to compute hard-to-compute matrix norms

January 11th, 2016

There are a wide variety of different norms of matrices and operators that are useful in many different contexts. Some matrix norms, such as the Schatten norms and Ky Fan norms, are easy to compute thanks to the singular value decomposition. However, the computation of many other norms, such as the induced p-norms (when p ≠ 1, 2, ∞), is NP-hard. In this post, we will look at a general method for getting quite good estimates of almost any matrix norm.

The basic idea is that every norm can be written as a maximization of a convex function over a convex set (in particular, every norm can be written as a maximization over the unit ball of the dual norm). However, this maximization is often difficult to deal with or solve analytically, so instead it can help to write the norm as a maximization over two or more simpler sets, each of which can be solved individually. To illustrate how this works, let’s start with the induced matrix norms.

Induced matrix norms

The induced p → q norm of a matrix B is defined as follows:

\displaystyle\|B\|_{p\rightarrow q}:=\max\big\{\|B\mathbf{x}\|_q : \|\mathbf{x}\|_p = 1\big\},


\displaystyle\|\mathbf{x}\|_p := \left(\sum_{i}|x_i|^p\right)^{1/p}

is the vector p-norm. There are three special cases of these norms that are easy to compute:

  1. When p = q = 2, this is the usual operator norm of B (i.e., its largest singular value).
  2. When p = q = 1, this is the maximum absolute column sum: \|B\|_{1\rightarrow 1} = \max_j\sum_i|b_{ij}|.
  3. When p = q = ∞, this is the maximum absolute row sum: \|B\|_{\infty\rightarrow \infty} = \max_i\sum_j|b_{ij}|.

However, outside of these three special cases (and some other special cases, such as when B only has real entries that are non-negative [1]), this norm is much messier. In general, its computation is NP-hard [2], so how can we get a good idea of its value? Well, we rewrite the norm as the following double maximization:

\displaystyle\|B\|_{p\rightarrow q}=\max\big\{|\mathbf{y}^*B\mathbf{x}| : \|\mathbf{x}\|_p = 1, \|\mathbf{y}\|_{q^\prime} = 1\big\},

where q^\prime is the positive real number such that 1/q + 1/q^\prime = 1 (and we take q^\prime = 1 if q = \infty, and vice-versa). The idea is then to maximize over \mathbf{x} and \mathbf{y} one at a time, alternately.

  1. Start by setting j = 1 and fixing a randomly-chosen vector \mathbf{x}_0, scaled so that \|\mathbf{x}_0\|_{p} = 1.
  2. Compute

    \max\big\{|\mathbf{y}^*B\mathbf{x}_{j-1}| : \|\mathbf{y}\|_{q^\prime} = 1\big\},

    keeping \mathbf{x}_{j-1} fixed, and let \mathbf{y}_j be the vector attaining this maximum. By Hölder’s inequality, we know that this maximum value is exactly equal to \|B\mathbf{x}_{j-1}\|_{q}. Furthermore, the equality condition of Hölder’s inequality tells us that the vector \mathbf{y}_j attaining this maximum is the one with complex phases that are the same as those of B\mathbf{x}_{j-1}, and whose magnitudes are such that |\mathbf{y}_j|^{q^\prime} is a multiple of |B\mathbf{x}_{j-1}|^q (here the notation |\cdot|^q means we take the absolute value and the q-th power of every entry of the vector).

  3. Compute

    \max\big\{|\mathbf{y}_j^*B\mathbf{x}| : \|\mathbf{x}\|_{p} = 1\big\},

    keeping \mathbf{y}_j fixed, and let \mathbf{x}_j be the vector attaining this maximum. By an argument almost identical to that of step 2, this maximum is equal to \|B^*\mathbf{y}_j\|_{p^\prime}, where p^\prime is the positive real number such that 1/p + 1/p^\prime = 1. Furthermore, the vector \mathbf{x}_j attaining this maximum is the one with complex phases that are the same as those of B^*\mathbf{y}_j, and whose magnitudes are such that |\mathbf{x}_j|^p is a multiple of |B^*\mathbf{y}_j|^{p^\prime}.

  4. Increment j by 1 and return to step 2. Repeat until negligible gains are made after each iteration.

This algorithm is extremely quick to run, since Hölder’s inequality tells us exactly how to solve each of the two maximizations separately, so we’re left only performing simple vector calculations at each step. The downside of this algorithm is that, even though it will always converge to some local maximum, it might converge to a value that is smaller than the true induced p → q norm. However, in practice this algorithm is fast enough that it can be run several thousand times with different (randomly-chosen) starting vectors \mathbf{x}_0 to get an extremely good idea of the value of \|B\|_{p\rightarrow q}.

It is worth noting that this algorithm is essentially the same as the one presented in [3], and reduces to the power method for finding the largest singular value when p = q = 2. This algorithm has been implemented in the QETLAB package for MATLAB as the InducedMatrixNorm function.

Induced Schatten superoperator norms

There is a natural family of induced norms on superoperators (i.e., linear maps \Phi : M_n \rightarrow M_n) as well. First, for a matrix X \in M_n, we define its Schatten p-norm to be the p-norm of its vector of singular values:

\|X\|_p := \left(\sum_{i=1}^n \sigma_i(X)^p\right)^{1/p}.

Three special cases of the Schatten p-norms include:

  • p = 1, which is often called the “trace norm” or “nuclear norm”,
  • p = 2, which is often called the “Frobenius norm” or “Hilbert–Schmidt norm”, and
  • p = ∞, which is the usual operator norm.

The Schatten norms themselves are easy to compute (since singular values are easy to compute), but their induced counter-parts are not.

Given a superoperator \Phi : M_n \rightarrow M_n, its induced Schatten p → q norm is defined as follows:

\|\Phi\|_{p\rightarrow q} := \max\big\{ \|\Phi(X)\|_q : \|X\|_p = 1 \big\}.

These induced Schatten norms were studied in some depth in [4], and crop up fairly frequently in quantum information theory (especially when p = q = 1) and operator theory (especially when p = q = ∞). The fact that they are NP-hard to compute in general is not surprising, since they reduce to the induced matrix norms (discussed earlier) in the case when \Phi only acts on the diagonal entries of X and just zeros out the off-diagonal entries. However, it seems likely that this norm’s computation is also difficult even in the special cases p = q = 1 and p = q = ∞ (however, it is straightforward to compute when p = q = 2).

Nevertheless, we can obtain good estimates of this norm’s value numerically using essentially the same method as discussed in the previous section. We start by rewriting the norm as a double maximization, where each maximization individually is easy to deal with:

\|\Phi\|_{p\rightarrow q} = \max\big\{ |\mathrm{Tr}(Y^*\Phi(X))| : \|X\|_p = 1, \|Y\|_{q^\prime} = 1\big\},

where q^\prime is again the positive real number (or infinity) satisfying 1/q + 1/q^\prime = 1. We now maximize over X and Y, one at a time, alternately, just as before:

  1. Start by setting j = 1 and fixing a randomly-chosen matrix X_0, scaled so that \|X_0\|_p = 1.
  2. Compute

    \max\big\{|\mathrm{Tr}(Y^*\Phi(X_{j-1})| : \|Y\|_{q^\prime} = 1\big\},

    keeping X_{j-1} fixed, and let Y_j be the matrix attaining this maximum. By the Hölder inequality for Schatten norms, we know that this maximum value is exactly equal to \|\Phi(X_{j-1})\|_{q}. Furthermore, the matrix Y_j attaining this maximum is the one with the same left and right singular vectors as \Phi(X_{j-1}), and whose singular values are such that there is a constant c so that \sigma_i(Y_j)^{q^\prime} = c\sigma_i(\Phi(X_{j-1}))^q for all i (i.e., the vector of singular values of Y_j, raised to the q^\prime power, is a multiple of the vector of singular values of \Phi(X_{j-1}), raised to the q power).

  3. Compute

    \max\big\{|\mathrm{Tr}(Y_j^*\Phi(X)| : \|X\|_{p} = 1\big\},

    keeping Y_j fixed, and let X_j be the matrix attaining this maximum. By essentially the same argument as in step 2, we know that this maximum value is exactly equal to \|\Phi^*(Y_j)\|_{p^\prime}, where \Phi^* is the map that is dual to \Phi in the Hilbert–Schmidt inner product. Furthermore, the matrix X_j attaining this maximum is the one with the same left and right singular vectors as \Phi^*(Y_j), and whose singular values are such that there is a constant c so that \sigma_i(X_j)^{p} = c\sigma_i(\Phi^*(Y_j))^{p^\prime} for all i.

  4. Increment j by 1 and return to step 2. Repeat until negligible gains are made after each iteration.

The above algorithm is almost identical to the algorithm presented for induced matrix norms, but with absolute values and complex phases of the vectors \mathbf{x} and \mathbf{y} replaced by the singular values and singular vectors of the matrices X and Y, respectively. The entire algorithm is still extremely quick to run, since each step just involves computing one singular value decomposition.

The downside of this algorithm, as with the induced matrix norm algorithm, is that we have no guarantee that this method will actually converge to the induced Schatten p → q norm; only that it will converge to some lower bound of it. However, the algorithm works pretty well in practice, and is fast enough that we can simply run it a few thousand times to get a very good idea of what the norm actually is. If you’re interested in making use of this algorithm, it has been implemented in QETLAB as the InducedSchattenNorm function.

Entanglement Norms

The central idea used for the previous two families of norms can also be used to get lower bounds on the following norm on M_m \otimes M_n that comes up from time to time when dealing with quantum entanglement:

\|X\|_{S(1)} := \max\Big\{\big|(\mathbf{v}\otimes\mathbf{w})^*X(\mathbf{x} \otimes \mathbf{y})\big| : \|\mathbf{v}\| = \|\mathbf{w}\| = \|\mathbf{x}\| = \|\mathbf{y}\| = 1\Big\}.

(As a side note: this norm, and some other ones like it, were the central focus on my thesis.) This norm is already written for us as a double maximization, so the idea presented in the previous two sections is somewhat clearer from the start: we fix randomly-generated vectors \mathbf{v} and \mathbf{x} and then maximize over all vectors \mathbf{w} and \mathbf{y}, which can be done simply by computing the left and right singular vectors associated with the maximum singular value of the operator

(\mathbf{v} \otimes I)^*X(\mathbf{x} \otimes I) \in M_n.

We then fix \mathbf{w} and \mathbf{y} as those singular vectors and then maximize over all vectors \mathbf{v} and \mathbf{x} (which is again a singular value problem), and we iterate back and forth until we converge to some value.

As with the previously-discussed norms, this algorithm always converges, and it converges to a lower bound of \|X\|_{S(1)}, but perhaps not its exact value. If you want to take this algorithm out for a spin, it has been implemented in QETLAB as the sk_iterate function.

It’s also worth mentioning that this algorithm generalizes straightforwardly in several different directions. For example, it can be used to find lower bounds on the norms \|\cdot\|_{S(k)} where we maximize on the left and right by pure states with Schmidt rank not larger than k rather than separable pure states, and it can be used to find lower bounds on the geometric measure of entanglement [5].


  1. D. Steinberg. Computation of matrix norms with applications to robust optimization. Research thesis. Technion – Israel University of Technology, 2005.
  2. J. M. Hendrickx and A. Olshevsky. Matrix p-norms are NP-hard to approximate if p ≠ 1,2,∞. 2009. E-print: arXiv:0908.1397
  3. D. W. Boyd. The power method for ℓp norms. Linear Algebra and Its Applications, 9:95–101, 1974.
  4. J. Watrous. Notes on super-operator norms induced by Schatten norms. Quantum Information & Computation, 5(1):58–68, 2005. E-print: arXiv:quant-ph/0411077
  5. T.-C. Wei and P. M. Goldbart. Geometric measure of entanglement and applications to bipartite and multipartite quantum states. Physical Review A, 68:042307, 2003. E-print: arXiv:quant-ph/0212030

Introducing QETLAB: A MATLAB toolbox for quantum entanglement

April 14th, 2015

After over two and a half years in various stages of development, I am happy to somewhat “officially” announce a MATLAB package that I have been developing: QETLAB (Quantum Entanglement Theory LABoratory). This announcement is completely arbitrary, since people started finding QETLAB via Google about a year ago, and a handful of papers have made use of it already, but I figured that I should at least acknowledge its existence myself at some point. I’ll no doubt be writing some posts in the near future that highlight some of its more advanced features, but I will give a brief run-down of what it’s about here.

The Basics

First off, QETLAB has a variety of functions for dealing with “simple” things like tensor products, Schmidt decompositions, random pure and mixed states, applying superoperators to quantum states, computing Choi matrices and Kraus operators, and so on, which are fairly standard daily tasks for quantum information theorists. These sorts of functions are somewhat standard, and are also found in a few other MATLAB packages (such as Toby Cubitt’s nice Quantinf package and Géza Tóth’s QUBIT4MATLAB package), so I won’t really spend any time discussing them here.

Mixed State Separability

The “motivating problem” for QETLAB is the separability problem, which asks us to (efficiently / operationally / practically) determine whether a given mixed quantum state is separable or entangled. The (by far) most well-known tool for this job is the positive partial transpose (PPT) criterion, which says that every separable state remains positive semidefinite when the partial transpose map is applied to it. However, this is just a quick-and-dirty one-way test, and going beyond it is much more difficult.

The QETLAB function that tries to solve this problem is the IsSeparable function, which goes through several separability criteria in an attempt to prove the given state separable or entangled, and provides a journal reference to the paper that contains the separability criteria that works (if one was found).

As an example, consider the “tiles” state, introduced in [1], which is an example of a quantum state that is entangled, but is not detected by the simple PPT test for entanglement. We can construct this state using QETLAB’s UPB function, which lets the user easily construct a wide variety of unextendible product bases, and then verify its entanglement as follows:

>> u = UPB('Tiles'); % generates the "Tiles" UPB
>> rho = eye(9) - u*u'; % rho is the projection onto the orthogonal complement of the UPB
>> rho = rho/trace(rho); % we are now done constructing the bound entangled state
>> IsSeparable(rho)
Determined to be entangled via the realignment criterion. Reference:
K. Chen and L.-A. Wu. A matrix realignment method for recognizing entanglement.
Quantum Inf. Comput., 3:193-202, 2003.
ans =

And of course more advanced tests for entanglement, such as those based on symmetric extensions, are also checked. Generally, quick and easy tests are done first, and slow but powerful tests are only performed if the script has difficulty finding an answer.

Alternatively, if you want to check individual tests for entanglement yourself, you can do that too, as there are stand-alone functions for the partial transpose, the realignment criterion, the Choi map (a specific positive map in 3-dimensional systems), symmetric extensions, and so on.

Symmetry of Subsystems

One problem that I’ve come across repeatedly in my work is the need for robust functions relating to permuting quantum systems that have been tensored together, and dealing with the symmetric and antisymmetric subspaces (and indeed, this type of thing is quite common in quantum information theory). Some very basic functionality of this type has been provided in other MATLAB packages, but it has never been as comprehensive as I would have liked. For example, QUBIT4MATLAB has a function that is capable of computing the symmetric projection on two systems, or on an arbitrary number of 2- or 3-dimensional systems, but not on an arbitrary number of systems of any dimension. QETLAB’s SymmetricProjection function fills this gap.

Similarly, there are functions for computing the antisymmetric projection, for permuting different subsystems, and for constructing the unitary swap operator that implements this permutation.

Nonlocality and Bell Inequalities

QETLAB also has a set of functions for dealing with quantum non-locality and Bell inequalities. For example, consider the CHSH inequality, which says that if \{A_1,A_2\} and \{B_1,B_2\} are \{-1,+1\}-valued measurement settings, then the following inequality holds in classical physics (where \langle \cdot \rangle denotes expectation):

\langle A_1B_1 \rangle + \langle A_1B_2 \rangle + \langle A_2B_1 \rangle - \langle A_2B_2 \rangle \leq 2.

However, in quantum-mechanical settings, this inequality can be violated, and the quantity on the left can take on a value as large as 2\sqrt{2} (this is Tsirelson’s bound). Finally, in no-signalling theories, the quantity on the left can take on a value as large as 4.

All three of these quantities can be easily computed in QETLAB via the BellInequalityMax function:

>> coeffs = [1 1;1 -1]; % coefficients of the terms <A_iB_j> in the Bell inequality
>> a_coe = [0 0]; % coefficients of <A_i> in the Bell inequality
>> b_coe = [0 0]; % coefficients of <B_i> in the Bell inequality
>> a_val = [-1 1]; % values of the A_i measurements
>> b_val = [-1 1]; % values of the B_i measurements
>> BellInequalityMax(coeffs, a_coe, b_coe, a_val, b_val, 'classical')
ans =
>> BellInequalityMax(coeffs, a_coe, b_coe, a_val, b_val, 'quantum')
ans =
>> BellInequalityMax(coeffs, a_coe, b_coe, a_val, b_val, 'nosignal')
ans =

The classical value of the Bell inequality is computed simply by brute force, and the no-signalling value is computed via a linear program. However, no reasonably efficient method is known for computing the quantum value of a Bell inequality, so this quantity is estimated using the NPA hierarchy [2]. Advanced users who want more control can specify which level of the NPA hierarchy to use, or even call the NPAHierarchy function directly themselves. There is also a closely-related function for computing the classical, quantum, or no-signalling value of a nonlocal game (in case you’re a computer scientist instead of a physicist).

Download and Documentation

QETLAB v0.8 is currently available at qetlab.com (where you will also find its documentation) and also on github. If you have any suggestions/comments/requests/anything, or if you have used QETLAB in your work, please let me know!


  1. C.H. Bennett, D.P. DiVincenzo, T. Mor, P.W. Shor, J.A. Smolin, and B.M. Terhal. Unextendible product bases and bound entanglement. Phys. Rev. Lett. 82, 5385–5388, 1999. E-print: arXiv:quant-ph/9808030
  2. M. Navascués, S. Pironio, and A. Acín. A convergent hierarchy of semidefinite programs characterizing the set of quantum correlations. New J. Phys., 10:073013, 2008. E-print: arXiv:0803.4290 [quant-ph]

“Obvious” Does Not Imply “True”: The Minimal Superpermutation Conjecture is False

August 22nd, 2014

One of my favourite examples of an “obvious” mathematical statement that is actually false is the “fact” that if R,S,T \subseteq \mathbb{R}^n are vector spaces then

\dim(R + S + T) = \dim(R) + \dim(S) + \dim(T) \\ {}_{{}_{{}_{.}}}\quad\quad\quad\quad\quad\quad\quad\quad - \dim(R\cap S) -\dim(R\cap T) -\dim(S\cap T) \\ {}_{{}_{{}_{.}}}\quad\quad\quad\quad\quad\quad\quad\quad+\dim(R\cap S\cap T).

The reason that the above statement seems so obvious is that the similar fact \dim(R + S) = \dim(R) + \dim(S) - \dim(R\cap S) does hold, so it’s very tempting to think  “inclusion-exclusion, yadda yadda, it’s simple enough to prove that it’s not worth writing down or working through the details”. However, it’s not true: a counterexample is provided by 3 distinct lines through the origin in \mathbb{R}^2.

There is another problem that I’ve been thinking about for quite some time that is also “obvious”: the minimal superpermutation conjecture. This conjecture was so obvious, in fact, that it appeared as a question in a national programming contest in 1998. Well, last night Robin Houston posted a note on the arXiv showing that, despite being obvious, the conjecture is false [1].


What is the shortest string that contains each permutation of “123” as a contiguous substring? It is straightforward to check that “123121321” contains each of “123”, “132”, “213”, “231”, “312”, and “321” as substrings (i.e., it is a superpermutation of 3 symbols), and it’s not difficult to argue (or use a computer search to show) that it is the shortest string with this property.

Well, we can repeat this question for any number of symbols. I won’t repeat all of the details (because I already wrote about the problem here), but there is a natural recursive construction that takes an (n-1)-symbol superpermutation of length L and spits out an n-symbol superpermutation of length L+n!. This immediately gives us an n-symbol superpermutation of length 1! + 2! + 3! + … + n! for all n. Importantly, it seemed like this construction was the best we could do: computer search verifies that these superpermutations are the smallest possible, and are even unique, for n ≤ 4.

Furthermore, it is not difficult to come up with some lower bounds on the length of superpermutations that seem to suggest that we have found the right answer. A trivial argument shows that an n-symbol superpermutation must have length at least (n-1) + n!, since we need n characters for the first permutation, and 1 additional character for each of the remaining n!-1  permutations. This argument can be refined to show that a superpermutation must actually have length at least (n-2) + (n-1)! + n!, since there is no way to pack the permutations tightly enough so that each one only uses 1 additional character (spend a few minutes trying to construct superpermutations by hand and you’ll see this for yourself). In fact, we can even refine this argument further (see a not-so-pretty proof sketch here) to show that n-symbol superpermutations must have length at least (n-3) + (n-2)! + (n-1)! + n!.

A-ha! A pattern has emerged – surely we can just keep refining this argument over and over again to eventually get a lower bound of 1! + 2! + 3! + … + n!, which shows that the superpermutations we already have are indeed minimal, right? Some variant of this line of thought seemed to be where almost everyone’s mind went when introduced to this problem, and it seemed fairly convincing: this argument is more or less contained within the answers when this question was posted on MathExchange and on StackOverflow (although the authors are usually careful to state that their method only appears to be optimal), and this problem was presented as a programming question in the 1998 Turkish Informatics Olympiad (see the resulting thread here). Furthermore, even on pages where this was acknowledged to be a difficult open problem, it was sometimes claimed that it had been proved for n ≤ 11 (example).

For the above reasons, it was a long time before I was even convinced that this problem was indeed unsolved – it seemed like people had solved this problem but just found it not worth the effort of writing up a full proof, or that people had found a simple way to tackle the problem for moderately large values of n like 10 or 11 that I couldn’t even dream of handling.

The Conjecture is False

It turns out that the minimal superpermutation conjecture is false for all n ≥ 6. That is, there exists a superpermutation of length strictly less than 1! + 2! + 3! + … + n! in all of these cases [1]. In particular, Robin Houston found the following 6-symbol superpermutation of length 872, which is smaller than the conjectured minimum length of 1! + 2! + … + 6! = 873:


So not only are congratulations due to Robin for settling the conjecture, but a big “thank you” are due to him as well for (hopefully) convincing everyone that this problem is not as easy as it appears upon first glance.


  1. R. Houston. Tackling the Minimal Superpermutation Problem. E-print: arXiv:1408.5108 [math.CO], 2014.

All Minimal Superpermutations on Five Symbols Have Been Found

August 13th, 2014

Recall from an earlier blog post that the minimal superpermutation problem asks for the shortest string on the symbols “1”, “2”, …, “n” that contains every permutation of those symbols as a contiguous substring. For example, “121” is a minimal superpermutation on the symbols “1” and “2”, since it contains both “12” and “21” as substrings, and there is no shorter string with this property.

Until now, the length of minimal superpermutations has only been known when n ≤ 4: they have length 1, 3, 9, and 33 in these cases, respectively. It has been conjectured that minimal superpermutations have length \sum_{k=1}^n k! for all n, and I am happy to announce that Ben Chaffin has proved this conjecture when n = 5. More specifically, he showed that minimal superpermutations in the n = 5 case have length 153, and there are exactly 8 such superpermutations (previously, it was only known that minimal superpermutations have either length 152 or 153 in this case, and there are at least 2 superpermutations of length 153).

The Eight Minimal Superpermutations

The eight superpermutations that Ben found are available here (they’re too large to include in the body of this post). Notice that the first superpermutation is the well-known “easy-to-construct” superpermutation described here, and the second superpermutation is the one that was found in [1]. The other six superpermutations are new.

One really interesting thing about the six new superpermutations is that they are the first known minimal superpermutations to break the “gap pattern” that previously-known constructions have. To explain what I mean by this, consider the minimal superpermutation “123121321” on three symbols. We can think about generating this superpermutation greedily: we start with “123”, then we append the character “1” to add the permutation “231” to the string, and then we append the character “2” to add the permutation “312” to the string. But now we are stuck: we have “12312”, and there is no way to append just one character to this string in such a way as to add another permutation to it: we have to append the two characters “13” to get the new permutation “213”.

This phenomenon seemed to be fairly general: in all known small superpermutations on n symbols, there was always a point (approximately halfway through the superpermutation) where n-2 consecutive characters were “wasted”: they did not add any new permutations themselves, but only “prepared” the next symbol to add a new permutation.

However, none of the six new minimal superpermutations have this property: they all never have more than 2 consecutive “wasted” characters, whereas the two previously-known superpermutations each have a run of n-2 = 3 consecutive “wasted” characters. Thus these six new superpermutations are really quite different from any superpermutations that we currently know and love.

How They Were Found

The idea of Ben’s search is to do a depth-first search on the placement of the “wasted” characters (recall that “wasted” characters were defined and discussed in the previous section). Since the shortest known superpermutation on 5 symbols has length 153, and there are 120 permutations of 5 symbols, and the first n-1 = 4 characters of the superpermutation must be wasted, we are left with the problem of trying to place 153 – 120 – 4 = 29 wasted characters. If we can find a superpermutation with only 28 wasted characters (other than the initial 4), then we’ve found a superpermutation of length 152; if we really need all 29 wasted characters, then minimal superpermutations have length 153.

So now we do the depth-first search:

  • Find (via brute-force) the maximum number of permutations that we can fit in a string if we are allowed only 1 wasted character: the answer is 10 permutations (for example, the string “123451234152341” does the job).
  • Now find the maximum number of permutations that we can fit in a string if we are allowed 2 wasted characters. To speed up the search, once we have found a string that contains some number (call it p) of permutations, we can ignore all other strings that use a wasted character before p-10 permutations, since we know from the previous bullet point that the second wasted character can add at most 10 more permutations, for a total of (p-10)+10 = p permutations.
  • We now repeat this process for higher and higher numbers of wasted characters: we find the maximum number of permutations that we can fit in a string with 3 wasted characters, using the results from the previous two bullets to speed up the search by ignoring strings that place 1 or 2 wasted characters too early.
  • Etc.

The results of this computation are summarized in the following table:

Wasted characters Maximum # of permutations
0 5
1 10
2 15
3 20
4 23
5 28
6 33
7 36
8 41
9 46
10 49
11 53
12 58
13 62
14 66
15 70
16 74
17 79
18 83
19 87
20 92
21 96
22 99
23 103
24 107
25 111
26 114
27 116
28 118
29 120

As we can see, it is not possible to place all 120 permutations in a string with 28 or fewer wasted characters, which proves that there is no superpermutation of length 152 in the n = 5 case. C code that computes the values in the above table is available here.

Update [August 18, 2014]: Robin Houston has found a superpermutation on 6 symbols of length 873 (i.e., the conjectured minimal length) with the interesting property that it never has more than one consecutive wasted character! The superpermutation is available here.

IMPORTANT UPDATE [August 22, 2014]: Robin Houston has gone one step further and disproved the minimal superpermutation conjecture for all n ≥ 6. See here.


  1. N. Johnston. Non-uniqueness of minimal superpermutations. Discrete Mathematics, 313:1553–1557, 2013.

What the Operator-Schmidt Decomposition Tells Us About Entanglement

June 27th, 2014

In quantum information theory, the well-known Schmidt decomposition theorem tells us that every pure quantum state |\phi\rangle\in\mathbb{C}^d\otimes\mathbb{C}^d can be written in the form

|\phi\rangle =\displaystyle\sum_{i=1}^d\gamma_i|a_i\rangle\otimes|b_i\rangle,

where each \gamma_i \geq 0 is a real scalar and the sets \{|a_i\rangle\} and \{|b_i\rangle\} form orthonormal bases of \mathbb{C}^d.

The Schmidt decomposition theorem isn’t anything fancy: it is just the singular value decomposition in disguise (the \gamma_i‘s are singular values of some matrix and the sets \{|a_i\rangle\} and \{|b_i\rangle\} are its left and right singular vectors). However, it tells us everything we could ever want to know about the entanglement of |\phi\rangle: it is entangled if and only if it has more than one non-zero \gamma_i, and in this case the question of “how much” entanglement is contained within |\phi\rangle is answered by a certain function of the \gamma_i‘s.

Well, we can find a similar decomposition of mixed quantum states. If \rho\in M_d\otimes M_d is a mixed quantum state then it can be written in its operator-Schmidt decomposition:

\rho=\displaystyle\sum_{i=1}^{d^2}\gamma_iA_i\otimes B_i,

where each \gamma_i \geq 0 is a real scalar and the sets \{A_i\} and \{B_i\} form orthonormal bases of Hermitian matrices in M_d (under the Hilbert–Schmidt inner product \langle A,B\rangle :=\mathrm{Tr}(A^\dagger B)).

Once again, we haven’t really done anything fancy: the operator-Schmidt decomposition is also just the singular value decomposition in disguise in almost the exact same way as the regular Schmidt decomposition. However, its relationship with entanglement of mixed states is much weaker (as we might expect from the fact that the singular value decomposition can be computed in polynomial time, but determining whether a mixed state is entangled or separable (i.e., not entangled) is expected to be hard [1]). In this post, we’ll investigate some cases when the operator-Schmidt decomposition does let us conclude that \rho is separable or entangled.

Proving a State is Entangled: The Realignment Criterion

One reasonably well-known method for proving that a mixed state is entangled is the realignment criterion [2,3]. What is slightly less well-known is that the realignment criterion can be phrased in terms of the coefficients \{\gamma_i\} in the operator-Schmidt decomposition of \rho.

Theorem 1 (realignment criterion). Let \rho\in M_d\otimes M_d have operator-Schmidt decomposition

\rho=\displaystyle\sum_{i=1}^{d^2}\gamma_iA_i\otimes B_i.

If \sum_{i=1}^{d^2}\gamma_i>1 then \rho is entangled.

Proof. The idea is to construct a specific entanglement witness that detects the entanglement in \rho. In particular, the entanglement witness that we will use is W:=I-\sum_{i=1}^{d^2}A_i\otimes B_i. To see that W is indeed an entanglement witness, we must show that (\langle a|\otimes\langle b|)W(|a\rangle\otimes|b\rangle)\geq 0 for all |a\rangle\in\mathbb{C}^m and |b\rangle\in\mathbb{C}^n. Well, some algebra shows that

(\langle a|\otimes\langle b|)W(|a\rangle\otimes|b\rangle) = 1 - \displaystyle\sum_{i=1}^{d^2}\langle a|A_i|a\rangle\langle b|B_i|b\rangle,

so it suffices to show that \sum_{i=1}^{d^2}\langle a|A_i|a\rangle\langle b|B_i|b\rangle\leq 1. To see this notice that

\displaystyle\sum_{i=1}^{d^2}\langle a|A_i|a\rangle\langle b|B_i|b\rangle\leq \sqrt{\sum_{i=1}^{d^2}(\langle a|A_i|a\rangle)^2}\sqrt{\sum_{i=1}^{d^2}(\langle b|B_i|b\rangle)^2}= 1,

where the inequality is the Cauchy–Schwarz inequality and the equality comes from the fact that the sets \{A_i\} and \{B_i\} are orthonormal bases, so \sum_{i=1}^{d^2}(\langle a|A_i|a\rangle)^2=\big\langle |a\rangle\langle a|,|a\rangle\langle a|\big\rangle=1 (and similarly for |b\rangle).

Now that we know that W is an entanglement witness, we must check that it detects the entanglement in \rho (that is, we want to show that \mathrm{Tr}(W\rho)<0). This is straightforward to show by making use of the fact that the sets \{A_i\} and \{B_i\} are orthonormal:

\mathrm{Tr}(W\rho) = \displaystyle\mathrm{Tr}\left( \Big(I - \sum_{i=1}^{d^2}A_i\otimes B_i\Big)\Big(\sum_{i=1}^{d^2}\gamma_iA_i\otimes B_i\Big)\right)=1-\mathrm{Tr}\left(\sum_{i,j=1}^{d^2}\gamma_iA_iA_j\otimes B_iB_j\right)

{}_{{}_{{}_{.}}} \phantom{\mathrm{Tr}(W\rho)}=\displaystyle 1-\sum_{i,j=1}^{d^2}\gamma_i\mathrm{Tr}(A_iA_j)\mathrm{Tr}(B_iB_j)=1-\sum_{i=1}^{d^2}\gamma_i<0.

It follows that \rho is entangled, which completes the proof.

A more popular formulation of the realignment criterion says that if we define the realignment map R by R(|i\rangle\langle j|\otimes|k\rangle\langle\ell|)=|i\rangle\langle k|\otimes|j\rangle\langle\ell| and extending by linearity, and let \|\cdot\|_{tr} denote the trace norm (i.e., the sum of the singular values), then \|R(\rho)\|_{tr}>1 implies that \rho is entangled. The equivalence of these two formulations of the realignment criterion comes from the fact that the singular values of R(\rho) are exactly the coefficients \{\gamma_i\} in its operator-Schmidt decomposition.

Proving a State is Entangled: Beyond the Realignment Criterion

We might naturally wonder whether we can prove that even more states are entangled based on their operator-Schmidt decomposition than those detected by the realignment criterion. The following theorem gives one sense in which the answer to this question is “no”: if we only look at “nice” functions of the coefficients \{\gamma_i\} then the realignment criterion gives the best method of entanglement detection possible.

Theorem 2. Let f : \mathbb{R}^{d^2}\rightarrow\mathbb{R}_{\geq 0} be a symmetric gauge function (i.e., a norm that is invariant under permutations and sign changes of the d^2 entries of the input vector). If we can conclude that \rho\in M_d\otimes M_d is entangled based on the value of f(\gamma_1,\gamma_2,\ldots,\gamma_{d^2}) then it must be the case that \sum_{i=1}^{d^2}\gamma_i>1.

Proof. Without loss of generality, we scale f so that f(1,0,0,\ldots,0) = 1. We first prove two facts about f.

Claim 1: f(\gamma_1,\gamma_2,\ldots,\gamma_{d^2})\geq \frac{1}{d} for all mixed states \rho\in M_d\otimes M_d. This follows from the fact that \max_i\gamma_i\geq\frac{1}{d} (which itself is kind of a pain to prove: it follows from the fact that the 1\rightarrow\infty Schatten norm of the realignment map R is d, but if anyone knows of a more direct and/or simpler way to prove this, I’d love to see it). If we assume without loss of generality that \gamma_1\geq\frac{1}{d} then

f(\gamma_1,\gamma_2,\ldots,\gamma_{d^2}) = \frac{1}{2}\big(f(\gamma_1,\gamma_2,\ldots,\gamma_{d^2})+f(\gamma_1,-\gamma_2,\ldots,-\gamma_{d^2})\big)

{}_{{}_{.}} \phantom{f(\gamma_1,\gamma_2,\ldots,\gamma_{d^2})} \geq f(\gamma_1,0,0,\ldots,0) = \gamma_1f(1,0,0,\ldots,0) = \gamma_1 \geq \frac{1}{d},

as desired.

Claim 2: There exists a separable state \rho\in M_d\otimes M_d for which f(\gamma_1,\gamma_2,\ldots,\gamma_{d^2}) equals any given value in the interval [\frac{1}{d},1]. To see why this is the case, first notice that there exists a separable state \rho\in M_d\otimes M_d with \gamma_1=1 and \gamma_i=0 for all i\geq 2: the state \rho=|0\rangle\langle 0|\otimes|0\rangle\langle 0| is one such example. Similarly, there is a separable state with \gamma_1=\frac{1}{d} and \gamma_i=0 for all i\geq 2: the state \rho=\frac{1}{d^2}I\otimes I is one such example. Furthermore, it is straightforward to interpolate between these two extremes to find separable states (even product states) with \gamma_i = 0 for all i\geq 2 and any value of \gamma_1 \in[\frac{1}{d},1]. For such states we have

f(\gamma_1,\gamma_2,\ldots,\gamma_{d^2}) = f(\gamma_1,0,0,\ldots,0)=\gamma_1f(1,0,0,\ldots,0)=\gamma_1,

which can take any value in the interval [\frac{1}{d},1] as claimed.

By combining claims 1 and 2, we see that we could only ever use the value of f(\gamma_1,\gamma_2,\ldots,\gamma_{d^2}) to conclude that \rho is entangled if f(\gamma_1,\gamma_2,\ldots,\gamma_{d^2})>1. However, in this case we have

\displaystyle\sum_{i=1}^{d^2}\gamma_i = f(\gamma_1,0,0,\ldots,0) + f(0,\gamma_2,0,\ldots,0) + \cdots + f(0,0,0,\ldots,\gamma_{d^2})

{}_{{}_{.}}\displaystyle\phantom{\sum_{i=1}^{d^2}\gamma_i}\geq f(\gamma_1,\gamma_2,\ldots,\gamma_{d^2}) >1,

which completes the proof.

Theorem 2 can be phrased naturally in terms of the other formulation of the realignment criterion as well: it says that there is no unitarily-invariant matrix norm \|\cdot\|_{u} with the property that we can use the value of \|R(\rho)\|_{u} to conclude that \rho is entangled, except in those cases where the trace norm (i.e., the realignment criterion) itself already tells us that \rho is entangled.

Nonetheless, we can certainly imagine using functions of the coefficients \{\gamma_i\} that are not symmetric gauge functions. Alternatively, we could take into account some (hopefully easily-computable) properties of the matrices \{A_i\} and \{B_i\}. One such method for detecting entanglement that depends on the coefficients \{\gamma_i\} and the trace of each \{A_i\} and \{B_i\} is as follows.

Theorem 3 [4,5]. Let \rho\in M_d\otimes M_d have operator-Schmidt decomposition

\rho=\displaystyle\sum_{i=1}^{d^2}\gamma_iA_i\otimes B_i.


\displaystyle\sum_{i=1}^{d^2}\big|\gamma_i - \gamma_i^2\mathrm{Tr}(A_i)\mathrm{Tr}(B_i)\big| + \frac{1}{2}\sum_{i=1}^{d^2}\gamma_i^2\big(\mathrm{Tr}(A_i)^2+\mathrm{Tr}(B_i)^2\big) > 1

then \rho is entangled.

I won’t prove Theorem 3 here, but I will note that it is strictly stronger than the realignment criterion, which can be seen by showing that the left-hand side of Theorem 3 is at least as large as the left-hand side of Theorem 1. To show this, observe that

\displaystyle\sum_{i=1}^{d^2}\big|\gamma_i - \gamma_i^2\mathrm{Tr}(A_i)\mathrm{Tr}(B_i)\big| \geq \sum_{i=1}^{d^2}\gamma_i - \sum_{i=1}^{d^2}\gamma_i^2\mathrm{Tr}(A_i)\mathrm{Tr}(B_i)


\displaystyle\frac{1}{2}\sum_{i=1}^{d^2}\gamma_i^2\big(\mathrm{Tr}(A_i)^2+\mathrm{Tr}(B_i)^2\big) - \sum_{i=1}^{d^2}\gamma_i^2\mathrm{Tr}(A_i)\mathrm{Tr}(B_i) = \frac{1}{2}\sum_{i=1}^{d^2}\big(\gamma_i\mathrm{Tr}(A_i)-\gamma_i\mathrm{Tr}(B_i)\big)^2,

which is nonnegative.

Proving a State is Separable

Much like we can use the operator-Schmidt decomposition to sometimes prove that a state is entangled, we can also use it to sometimes prove that a state is separable. To this end, we will use the operator-Schmidt rank of \rho, which is the number of non-zero coefficients \{\gamma_i\} in its operator-Schmidt decomposition. One trivial observation is as follows:

Proposition 4. If the operator-Schmidt rank of \rho is 1 then \rho is separable.

Proof. If the operator-Schmidt rank of \rho is 1 then we can write \rho=A\otimes B for some A,B\in M_d. Since \rho is positive semidefinite, it follows that either A and B are both positive semidefinite or both negative semidefinite. If they are both positive semidefinite, we are done. If they are both negative semidefinite then we can write \rho=(-A)\otimes(-B) and then we are done.

Somewhat surprisingly, however, we can go further than this: it turns out that all states with operator-Schmidt rank 2 are also separable, as was shown in [6].

Theorem 5 [6]. If the operator-Schmidt rank of \rho is 2 then \rho is separable.

Proof. If \rho has operator-Schmidt rank 2 then it can be written in the form \rho=A\otimes B + C\otimes D for some A,B,C,D\in M_d. Throughout this proof, we use the notation a:=\mathrm{Tr}(A), b:=\mathrm{Tr}(B), and so on.

Since \rho is positive semidefinite, so are each of its partial traces. Thus bA+dC and aB+cD are both positive semidefinite operators. It is then straightforward to verify that

(bA+dC)\otimes(aB+cD) + (cA-aC)\otimes(dB-bD)= A\otimes B + C\otimes D =\rho.

What is important here is that we have found a rank-2 tensor decomposition of \rho in which one of the terms is positive semidefinite. Now we define

X := \big((bA+dC)^{-1/2}\otimes(aB+cD)^{-1/2}\big) \rho \big((bA+dC)^{-1/2}\otimes(aB+cD)^{-1/2}\big)

and notice that X = I\otimes I + E\otimes F for some E,F\in M_d (in order to do this, we actually need the partial traces of \rho to be nonsingular, but this is easily taken care of by standard continuity arguments, so we’ll sweep it under the rug). Furthermore, X is also positive semidefinite, and it is separable if and only if \rho is separable. Since X is positive semidefinite, we know that 1 + \lambda_i(E)\lambda_j(F) \geq 0 for all eigenvalues \lambda_i(E) of E and \lambda_j(F) of F. If we absorb scalars between E and F so that \max_i\lambda_i(E) = 1 then this implies that \lambda_j(F) \geq -1 for all j. Thus I-E and I+F are both positive semidefinite. Furthermore, a straightforward calculation shows that

(I-E)\otimes I + E \otimes (I+F) = I\otimes I + E\otimes F = X.

We now play a similar game as before: we define a new matrix

Y := \big((I-E)^{-1/2}\otimes(I+F)^{-1/2}\big)X\big((I-E)^{-1/2}\otimes(I+F)^{-1/2}\big)

and notice that Y = I \otimes G + H \otimes I for some G,H\in M_d (similar to before, we note that there is a standard continuity argument that can be used to handle the fact that I-E and I+F might be singluar). The minimum eigenvalue of Y is then \lambda_{\textup{min}}(G)+\lambda_{\textup{min}}(H), which is non-negative as a result of Y being positive semidefinite. It then follows that

Y = I \otimes (G - \lambda_{\textup{min}}(G)I) + (H - \lambda_{\textup{min}}(H)I) \otimes I + \big(\lambda_{\textup{min}}(G)+\lambda_{\textup{min}}(H)\big)I \otimes I.

Since each term in the above decomposition is positive semidefinite, it follows that Y is separable, which implies that X is separable, which finally implies that \rho is separable.

In light of Theorem 6, it seems somewhat natural to ask how far we can push things: what values of the operator-Schmidt rank imply that a state is separable? Certainly we cannot expect all states with an operator-Schmidt rank of 4 to be separable, since every state in M_2 \otimes M_2 has operator-Schmidt rank 4 or less, and there are entangled states in this space (more concretely, it’s easy to check that the maximally-entangled pure state has operator-Schmidt rank 4).

This left the case of operator-Schmidt rank 3 open. Very recently, it was shown in [7] that a mixed state in M_2 \otimes M_d with operator-Schmidt rank 3 is indeed separable, yet there are entangled states with operator-Schmidt rank 3 in M_3 \otimes M_3.


  1. L. Gurvits. Classical deterministic complexity of Edmonds’ problem and quantum entanglement. In Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, pages 10–19, 2003. E-print: arXiv:quant-ph/0303055
  2. K. Chen and L.-A. Wu. A matrix realignment method for recognizing entanglement. Quantum Inf. Comput., 3:193–202, 2003. E-print: arXiv:quant-ph/0205017
  3. O. Rudolph. Some properties of the computable cross norm criterion for separability. Phys. Rev. A, 67:032312, 2003. E-print:  E-print: arXiv:quant-ph/0212047
  4. C.-J. Zhang, Y.-S. Zhang, S. Zhang, and G.-C. Guo. Entanglement detection beyond the cross-norm or realignment criterion. Phys. Rev. A, 77:060301(R), 2008. E-print: arXiv:0709.3766 [quant-ph]
  5. O. Gittsovich, O. Gühne, P. Hyllus, and J. Eisert. Unifying several separability conditions using the covariance matrix criterion. Phys. Rev. A, 78:052319, 2008. E-print: arXiv:0803.0757 [quant-ph]
  6. D. Cariello. Separability for weak irreducible matrices. E-print: arXiv:1311.7275 [quant-ph]
  7. D. Cariello. Does symmetry imply PPT property?. E-print: arXiv:1405.3634 [math-ph]