Norms and Dual Norms as Supremums and Infimums

May 26th, 2012

Let \mathcal{H} be a finite-dimensional Hilbert space over \mathbb{R} or \mathbb{C} (the fields of real and complex numbers, respectively). If we let \|\cdot\| be a norm on \mathcal{H} (not necessarily the norm induced by the inner product), then the dual norm of \|\cdot\| is defined by

\displaystyle\|\mathbf{v}\|^\circ := \sup_{\mathbf{w} \in \mathcal{H}}\Big\{ \big| \langle \mathbf{v}, \mathbf{w} \rangle \big| : \|\mathbf{w}\| \leq 1 \Big\}.

The double-dual of a norm is equal to itself (i.e., \|\cdot\|^{\circ\circ} = \|\cdot\|) and the norm induced by the inner product is the unique norm that is its own dual. Similarly, if \|\cdot\|_p is the vector p-norm, then \|\cdot\|_p^\circ = \|\cdot\|_q, where q satisfies 1/p + 1/q = 1.

In this post, we will demonstrate that \|\cdot\|^\circ has an equivalent characterization as an infimum, and we use this characterization to provide a simple derivation of several known (but perhaps not well-known) formulas for norms such as the operator norm of matrices.

For certain norms (such as the “separability norms” presented at the end of this post), this ability to write a norm as both an infimum and a supremum is useful because computation of the norm may be difficult. However, having these two different characterizations of a norm allows us to bound it both from above and from below.

The Dual Norm as an Infimum

Theorem 1. Let S \subseteq \mathcal{H} be a bounded set satisfying {\rm span}(S) = \mathcal{H} and define a norm \|\cdot\| by

\displaystyle\|\mathbf{v}\| := \sup_{\mathbf{w} \in S}\Big\{ \big| \langle \mathbf{v}, \mathbf{w} \rangle \big| \Big\}.

Then \|\cdot\|^\circ is given by

\displaystyle\|\mathbf{v}\|^\circ = \inf\Big\{ \sum_i |c_i| : \mathbf{v} = \sum_i c_i \mathbf{v}_i, \mathbf{v}_i \in S \ \forall \, i \Big\},

where the infimum is taken over all such decompositions of \mathbf{v}.

Before proving the result, we make two observations. Firstly, the quantity \|\cdot\| described by Theorem 1 really is a norm: boundedness of S ensures that the supremum is finite, and {\rm span}(S) = \mathcal{H} ensures that \|\mathbf{v}\| = 0 \implies \mathbf{v} = 0. Secondly, every norm on \mathcal{H} can be written in this way: we can always choose S to be the unit ball of the dual norm \|\cdot\|^\circ. However, there are times when other choices of S are more useful or enlightening (as we will see in the examples).

Proof of Theorem 1. Begin by noting that if \mathbf{w} \in S and \|\mathbf{v}\| \leq 1 then \big| \langle \mathbf{v}, \mathbf{w} \rangle \big| \leq 1. It follows that \|\mathbf{w}\|^{\circ} \leq 1 whenever \mathbf{w} \in S. In fact, we now show that \|\cdot\|^\circ is the largest norm on \mathcal{H} with this property. To this end, let \|\cdot\|_\prime be another norm satisfying \|\mathbf{w}\|_{\prime}^{\circ} \leq 1 whenever \mathbf{w} \in S. Then

\displaystyle \| \mathbf{v} \| = \sup_{\mathbf{w} \in S} \Big\{ \big| \langle \mathbf{w}, \mathbf{v} \rangle \big| \Big\} \leq \sup_{\mathbf{w}} \Big\{ \big| \langle \mathbf{w}, \mathbf{v} \rangle \big| : \|\mathbf{w}\|_{\prime}^{\circ} \leq 1 \Big\} = \|\mathbf{v}\|_\prime.

Thus  \| \cdot \| \leq \| \cdot \|_\prime, so by taking duals we see that \| \cdot \|^\circ \geq \| \cdot \|_\prime^\circ, as desired.

For the remainder of the proof, we denote the infimum in the statement of the theorem by \|\cdot\|_{{\rm inf}}. Our goal now is to show that: (1) \|\cdot\|_{{\rm inf}} is a norm, (2) \|\cdot\|_{{\rm inf}} satisfies \|\mathbf{w}\|_{{\rm inf}} \leq 1 whenever \mathbf{w} \in S, and (3) \|\cdot\|_{{\rm inf}} is the largest norm satisfying property (2). The fact that \|\cdot\|_{{\rm inf}} = \|\cdot\|^\circ will then follow from the first paragraph of this proof.

To see (1) (i.e., to prove that \|\cdot\|_{{\rm inf}} is a norm), we only prove the triangle inequality, since positive homogeneity and the fact that \|\mathbf{v}\|_{{\rm inf}} = 0 if and only if \mathbf{v} = 0 are both straightforward (try them yourself!). Fix \varepsilon > 0 and let \mathbf{v} = \sum_i c_i \mathbf{v}_i, \mathbf{w} = \sum_i d_i \mathbf{w}_i be decompositions of \mathbf{v}, \mathbf{w} with \mathbf{v}_i, \mathbf{w}_i \in S for all i, satisfying \sum_i |c_i| \leq \|\mathbf{v}\|_{{\rm inf}} + \varepsilon and \sum_i |d_i| \leq \|\mathbf{w}\|_{{\rm inf}} + \varepsilon. Then

\displaystyle \|\mathbf{v} + \mathbf{w}\|_{{\rm inf}} \leq \sum_i |c_i| + \sum_i |d_i| \leq \|\mathbf{v}\|_{{\rm inf}} + \|\mathbf{w}\|_{{\rm inf}} + 2\varepsilon.

Since \varepsilon > 0 was arbitrary, the triangle inequality follows, so \|\cdot\|_{{\rm inf}} is a norm.

To see (2) (i.e., to prove that \|\mathbf{v}\|_{{\rm inf}} \leq 1 whenever \mathbf{v} \in S), we simply write \mathbf{v} in its trivial decomposition \mathbf{v} = \mathbf{v}, which gives the single coefficient c_1 = 1, so \|\mathbf{v}\|_{{\rm inf}} \leq \sum_i c_i = c_1 = 1.

To see (3) (i.e., to prove that \|\cdot\|_{{\rm inf}} is the largest norm on \mathcal{H} satisfying condition (2)), begin by letting \|\cdot\|_\prime be any norm on \mathcal{H} with the property that \|\mathbf{v}\|_{\prime} \leq 1 for all \mathbf{v} \in S. Then using the triangle inequality for \|\cdot\|_\prime shows that if \mathbf{v} = \sum_i c_i \mathbf{v}_i is any decomposition of \mathbf{v} with \mathbf{v}_i \in S for all i, then

\displaystyle\|\mathbf{v}\|_\prime = \Big\|\sum_i c_i \mathbf{v}_i\Big\|_\prime \leq \sum_i |c_i| \|\mathbf{v}_i\|_\prime = \sum_i |c_i|.

Taking the infimum over all such decompositions of \mathbf{v} shows that \|\mathbf{v}\|_\prime \leq \|\mathbf{v}\|_{{\rm inf}}, which completes the proof.

The remainder of this post is devoted to investigating what Theorem 1 says about certain specific norms.

Injective and Projective Cross Norms

If we let \mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2, where \mathcal{H}_1 and \mathcal{H}_2 are themselves finite-dimensional Hilbert spaces, then one often considers the injective and projective cross norms on \mathcal{H}, defined respectively as follows:

\displaystyle \|\mathbf{v}\|_{I} := \sup\Big\{ \big| \langle \mathbf{v}, \mathbf{a} \otimes \mathbf{b} \rangle \big| : \|\mathbf{a}\| = \|\mathbf{b}\| = 1 \Big\} \text{ and}

\displaystyle \|\mathbf{v}\|_{P} := \inf\Big\{ \sum_i \| \mathbf{a}_i \| \| \mathbf{b}_i \| : \mathbf{v} = \sum_i \mathbf{a}_i \otimes \mathbf{b}_i \Big\},

where \|\cdot\| here refers to the norm induced by the inner product on \mathcal{H}_1 or \mathcal{H}_2. The fact that \|\cdot\|_{I} and \|\cdot\|_{P} are duals of each other is simply Theorem 1 in the case when S is the set of product vectors:

\displaystyle S = \big\{ \mathbf{a} \otimes \mathbf{b} : \|\mathbf{a}\| = \|\mathbf{b}\| = 1 \big\}.

In fact, the typical proof that the injective and projective cross norms are duals of each other is very similar to the proof of Theorem 1 provided above (see [1, Chapter 1]).

Maximum and Taxicab Norms

Use n to denote the dimension of \mathcal{H} and let \{\mathbf{e}_i\}_{i=1}^n be an orthonormal basis of \mathcal{H}. If we let S = \{\mathbf{e}_i\}_{i=1}^n then the norm \|\cdot\| in the statement of Theorem 1 is the maximum norm (i.e., the p = ∞ norm):

\displaystyle\|\mathbf{v}\|_\infty = \sup_i\Big\{\big|\langle \mathbf{v}, \mathbf{e}_i \rangle \big| \Big\} = \max \big\{ |v_1|,\ldots,|v_n|\big\},

where v_i = \langle \mathbf{v}, \mathbf{e}_i \rangle is the i-th coordinate of \mathbf{v} in the basis \{\mathbf{e}_i\}_{i=1}^n. The theorem then says that the dual of the maximum norm is

\displaystyle \|\mathbf{v}\|_\infty^\circ = \inf \Big\{ \sum_i |c_i| : \mathbf{v} = \sum_i c_i \mathbf{e}_i \Big\} = \sum_{i=1}^n |v_i|,

which is the taxicab norm (i.e., the p = 1 norm), as we expect.

Operator and Trace Norm of Matrices

If we let \mathcal{H} = M_n, the space of n \times n complex matrices with the Hilbert–Schmidt inner product

\displaystyle \big\langle A, B \big\rangle := {\rm Tr}(AB^*),

then it is well-known that the operator norm and the trace norm are dual to each other:

\displaystyle \big\| A \big\|_{op} := \sup_{\mathbf{v}}\Big\{ \big\|A\mathbf{v}\big\| : \|\mathbf{v}\| = 1 \Big\} \text{ and}

\displaystyle \big\| A \big\|_{op}^\circ = \big\|A\big\|_{tr} := \sup_{U}\Big\{ \big| {\rm Tr}(AU) \big| : U \in M_n \text{ is unitary} \Big\},

where \|\cdot\| is the Euclidean norm on \mathbb{C}^n. If we let S be the set of unitary matrices in M_n, then Theorem 1 provides the following alternate characterization of the operator norm:

Corollary 1. Let A \in M_n. Then

\displaystyle \big\|A\big\|_{op} = \inf\Big\{ \sum_i |c_i| : A = \sum_i c_i U_i \text{ and each } U_i \text{ is unitary} \Big\}.

As an application of Corollary 1, we are able to provide the following characterization of unitarily-invariant norms (i.e., norms \|\cdot\|_{\prime} with the property that \big\|UAV\big\|_{\prime} = \big\|A\big\|_{\prime} for all unitary matrices U, V \in M_n):

Corollary 2. Let \|\cdot\|_\prime be a norm on M_n. Then \|\cdot\|_\prime is unitarily-invariant if and only if

\displaystyle \big\|ABC\big\|_\prime \leq \big\|A\big\|_{op}\big\|B\big\|_{\prime}\big\|C\big\|_{op}

for all A, B, C \in M_n.

Proof of Corollary 2. The “if” direction is straightforward: if we let A and C be unitary, then

\displaystyle \big\|B\big\|_\prime = \big\|A^*ABCC^*\big\|_\prime \leq \big\|ABC\big\|_\prime \leq \big\|B\big\|_{\prime},

where we used the fact that \big\|A\big\|_{op} = \big\|C\big\|_{op} = 1. It follows that \big\|ABC\big\|_\prime = \big\|B\big\|_\prime, so \|\cdot\|_\prime is unitarily-invariant.

To see the “only if” direction, write A = \sum_i c_i U_i and C = \sum_i d_i V_i with each U_i and V_i unitary. Then

\displaystyle \big\|ABC\big\|_\prime = \Big\|\sum_{i,j}c_i d_j U_i B V_j\Big\|_\prime \leq \sum_{i,j} |c_i| |d_j| \big\|U_i B V_j\big\|_\prime = \sum_{i,j} |c_i| |d_j| \big\|B\big\|_\prime.

By taking the infimum over all decompositions of A and C of the given form and using Corollary 1, the result follows.

An alternate proof of Corollary 2, making use of some results on singular values, can be found in [2, Proposition IV.2.4].

Separability Norms

As our final (and least well-known) example, let \mathcal{H} = M_m \otimes M_n, again with the usual Hilbert–Schmidt inner product. If we let

\displaystyle S = \{ \mathbf{a}\mathbf{b}^* \otimes \mathbf{c}\mathbf{d}^* : \|\mathbf{a}\| = \|\mathbf{b}\| = \|\mathbf{c}\| = \|\mathbf{d}\| = 1 \},

where \|\cdot\| is the Euclidean norm on \mathbb{C}^m or \mathbb{C}^n, then Theorem 1 tells us that the following two norms are dual to each other:

\displaystyle \big\|A\big\|_s := \sup\Big\{ \big| (\mathbf{a}^* \otimes \mathbf{c}^*)A(\mathbf{b} \otimes \mathbf{d}) \big| : \|\mathbf{a}\| = \|\mathbf{b}\| = \|\mathbf{c}\| = \|\mathbf{d}\| = 1 \Big\} \text{ and}

\displaystyle \big\|A\big\|_s^\circ = \inf\Big\{ \sum_i \big\|A_i\big\|_{tr}\big\|B_i\big\|_{tr} : A = \sum_i A_i \otimes B_i \Big\}.

There’s actually a little bit of work to be done to show that \|\cdot\|_s^\circ has the given form, but it’s only a couple lines – consider it an exercise for the interested reader.

Both of these norms come up frequently when dealing with quantum entanglement. The norm \|\cdot\|_s^\circ was the subject of [3], where it was shown that a quantum state \rho is entangled if and only if \|\rho\|_s^\circ > 1 (I use the above duality relationship to provide an alternate proof of this fact in [4, Theorem 6.1.5]). On the other hand, the norm \|\cdot\|_s characterizes positive linear maps of matrices and was the subject of [5, 6].

References

  1. J. Diestel, J. H. Fourie, and J. Swart. The Metric Theory of Tensor Products: Grothendieck’s Résumé Revisited. American Mathematical Society, 2008. Chapter 1: pdf
  2. R. Bhatia. Matrix Analysis. Springer, 1997.
  3. O. Rudolph. A separability criterion for density operators. J. Phys. A: Math. Gen., 33:3951–3955, 2000. E-print: arXiv:quant-ph/0002026
  4. N. Johnston. Norms and Cones in the Theory of Quantum Entanglement. PhD thesis, University of Guelph, 2012.
  5. N. Johnston and D. W. Kribs. A Family of Norms With Applications in Quantum Information TheoryJournal of Mathematical Physics, 51:082202, 2010.
  6. N. Johnston and D. W. Kribs. A Family of Norms With Applications in Quantum Information Theory IIQuantum Information & Computation, 11(1 & 2):104–123, 2011.

Counting and Solving Final Fantasy XIII-2′s Clock Puzzles

February 6th, 2012

Final Fantasy XIII-2 is a role-playing game, released last week in North America, that contains an abundance of mini-games. One of the more interesting mini-games is the “clock puzzle”, which presents the user with N integers arranged in a circle, with each integer being from 1 to \lfloor N/2 \rfloor.

A challenging late-game clock puzzle with N = 12

The way the game works is as follows:

  1. The user may start by picking any of the N positions on the circle. Call the number in this position M.
  2. You now have the option of picking either the number M positions clockwise from your last choice, or M positions counter-clockwise from your last choice. Update the value of M to be the number in the new position that you chose.
  3. Repeat step 2 until you have performed it N-1 times.

You win the game if you choose each of the N positions exactly once, and you lose the game otherwise (if you are forced to choose the same position twice, or equivalently if there is a position that you have not chosen after performing step 2 a total of N-1 times). During the game, N ranges from 5 to 13, though N could theoretically be as large as we like.

Example

To demonstrate the rules in action, consider the following simple example with N = 6 (I have labelled the six positions 05 in blue for easy reference):

If we start by choosing the 1 in position 1, then we have the option of choosing the 3 in either position 0 or 2. Let’s choose the 3 in position 0. Three moves either clockwise or counter-clockwise from here both give the 1 in position 3, so that is our only possible next choice. We continue on in this way, going through the N = 6 positions in the order 103425, as in the following image:

We have now selected each position exactly once, so we are done – we solved the puzzle! In fact, this is the unique solution for the given puzzle.

Counting Clock Puzzles

Let’s work on determining how many different clock puzzles there are of a given size. As mentioned earlier, a clock puzzle with N positions has an integer in the interval [1, \lfloor N/2 \rfloor] in each of the  positions. There are thus \lfloor N/2 \rfloor^N distinct clock puzzles with N positions, which grows very quickly with N – its values for N = 1, 2, 3, … are given by the sequence 0, 1, 1, 16, 32, 729, 2187, 65536, 262144, … (A206344 in the OEIS).

However, this rather crude count of the number of clock puzzles ignores the fact that some clock puzzles have no solution. To illustrate this fact, we present the following simple proposition:

Proposition. There are unsolvable clock puzzles with N positions if and only if N = 4 or N ≥ 6.

To prove this proposition, first note that the clock puzzles for N = 2 or N = 3 are trivially solvable, since each number in the puzzle is forced to be \lfloor N/2 \rfloor = 1. The 32 clock puzzles in the N = 5 case can all easily be shown to be solvable via computer brute force (does anyone have a simple or elegant argument for this case?).

In the N = 4 case, exactly 3 of the 16 clock puzzles are unsolvable:

To complete the proof, it suffices to demonstrate an unsolvable clock puzzle for each N ≥ 6. To this end, we begin by considering the following clock puzzle in the N = 6 case:

The above puzzle is unsolvable because the only way to reach position 0 is to select it first, but from there only one of positions 2 or 4 can be reached – not both. This example generalizes in a straightforward manner to any N ≥ 6 simply by adding more 1′s to the bottom: it will still be necessary to choose position 0 first, and then it is impossible to reach both position 2 and position N-2 from there.

There doesn’t seem to be an elegant way to count the number of solvable clock puzzles with N positions (which is most likely related to the apparent difficulty of solving these puzzles, which will be discussed in the next section), so let’s count the number of solvable clock puzzles via brute force. Simply constructing each of the \lfloor N/2 \rfloor^N clock puzzles and determining which of them are solvable (via the MATLAB script linked at the end of this post) shows that the number of solvable clock puzzles for N = 1, 2, 3, … is given by the sequence 0, 1, 1, 13, 32, 507, 1998, 33136, 193995, … (A206345 in the OEIS).

This count of puzzles is perhaps still unsatisfying, though, since it counts puzzles that are simply mirror images or rotations of each other multiple times. Again, there doesn’t seem to be an elegant counting argument for enumerating the solvable clock puzzles up to rotation and reflection, so we compute this sequence by brute force: 0, 1, 1, 4, 8, 72, 236, 3665, 19037, … (A206346 in the OEIS).

Solving Clock Puzzles

Clock puzzles are one of the most challenging parts of Final Fantasy XIII-2, and with good reason: they are a well-studied graph theory problem in disguise. We can consider each clock puzzle with N positions as a directed graph with N vertices. If position N contains the number M, then there is a directed edge going from vertex N to the vertices M positions clockwise and counter-clockwise from it. In other words, we consider a clock puzzle as a directed graph on N vertices, where the directed edges describe the valid moves around the circle.

The directed graph corresponding to the earlier (solvable) N = 6 example

The problem of solving a clock puzzle is then exactly the problem of finding a directed Hamiltonian path on the associated graph. Because finding a directed Hamiltonian path in general is NP-hard, this seems to suggest that solving clock puzzles might be as well. There of course is the problem that the directed graphs relevant to this problem have very special structure – in particular, every vertex has outdegree ≤ 2, and the graph has a symmetry property that results from clockwise/counter-clockwise movement allowed in the clock puzzles.

The main result of [1] shows that the fact that the outdegree of each vertex is no larger than 2 is no real help: finding directed Hamiltonian paths is still NP-hard given such a promise. However, the symmetry condition seems more difficult to characterize in graph theoretic terms, and could potentially be exploited to produce a fast algorithm for solving these puzzles.

Regardless of the problem’s computational complexity, the puzzles found in the game are quite small (N ≤ 13), so they can be easily solved by brute force. Attached is a MATLAB script (solve_clock.m) that can be used to solve clock puzzles. The first input argument is a vector containing the numeric values in each of the positions, starting from the top and reading clockwise. By default, only one solution is computed. To compute all solutions, set the second (optional) input argument to 1.

The output of the script is either a vector of positions (labelled 0 through N-1, with 0 referring to the top position, 1 referring to one position clockwise from there, and so on) describing an order in which you can visit the positions to solve the puzzle, or 0 if there is no solution.

For example, the script can be used to find our solution to the N = 6 example provided earlier:

>> solve_clock([3,1,3,1,2,3])

ans =
    1 0 3 4 2 5

Similarly, the script can be used to find all four solutions [Update, October 1, 2013: Whoops, there are six solutions! See the comments.] to the puzzle in the screenshot at the very top of this post:

>> solve_clock([6,5,1,4,2,1,6,4,2,1,5,2], 1)

ans =
    3 7 11 9 10 5 4 2 1 8 6 0
    7 3 11 9 10 5 4 2 1 8 6 0
    9 10 5 4 2 3 7 11 1 8 6 0
    9 8 10 5 4 2 3 7 11 1 6 0

Download

References

  1. J. Plesnik. The NP-completeness of the Hamiltonian cycle problem in planar digraphs with degree bound two. Inform. Process. Lett., 8:199–201, 1979.

MATLAB Scripts for Computing Completely Bounded Norms via Semidefinite Programming

July 23rd, 2011

In operator theory, the completely bounded norm of a linear map on complex matrices \Phi : M_m \rightarrow M_n is defined by \|\Phi\|_{cb} := \sup_{k \geq 1} \| id_k \otimes \Phi \|, where \|\Phi\| is the usual norm on linear maps defined by \|\Phi\| := \sup_{X \in M_m} \{ \|\Phi(X)\| : \|X\| \leq 1\} and \|X\| is the operator norm of X [1]. The completely bounded norm is particularly useful when thinking of M_m and M_n as operator spaces.

The dual of the completely bounded norm is called the diamond norm, which plays an important role in quantum information theory, as it can be used to measure the distance between quantum channels. The diamond norm of \Phi is typically denoted \|\Phi\|_{\diamond}. For properties of the completely bounded and diamond norms, see [1,2,3].

A method for efficiently computing the completely bounded and diamond norms via semidefinite programming was recently presented in [4]. The purpose of this post is to provide MATLAB scripts that implement this algorithm and demonstrate its usage.

Download and Install

In order to make use of these scripts to compute the completely bounded or diamond norm, you must download and install two things: the SeDuMi semidefinite program solver and the MATLAB scripts themselves.

  1. SeDuMi – Please follow the instructions on the SeDuMi website to download and install it. If possible, you should install SeDuMi 1.1R3, not SeDuMi 1.21 or SeDuMi 1.3, since there is a bug with the newer versions when dealing with complex matrices.
  2. CB Norm MATLAB Package – Once SeDuMi is installed, download the CB norm MATLAB scripts, unzip them, and place them in your MATLAB scripts directory. The zip file contains 10 MATLAB scripts.

Once the scripts are installed, type “help CBNorm” or “help DiamondNorm” at the MATLAB prompt to learn how to use the CBNorm and DiamondNorm functions. Several usage examples are provided below.

Usage Examples

The representation of the linear map \Phi that the CBNorm and DiamondNorm functions take as input is a pair of arrays of its left- and right- generalized Choi-Kraus operators. That is, an array of operators \{A_i\} and \{B_i\} such that \Phi(X) = \sum_i A_i X B_i for all X.

Basic Examples

If we wanted to compute the completely bounded and diamond norms of the map

the MATLAB input and output would be as follows:

>> PhiA(:,:,1) = [1,1;1,0];
>> PhiA(:,:,2) = [1,0;1,2];
>> PhiB(:,:,1) = [1,0;0,1];
>> PhiB(:,:,2) = [1,2;1,1];
>> CBNorm(PhiA,PhiB)

ans =

    7.2684

>> DiamondNorm(PhiA,PhiB)

ans =

    7.4124

So we see that its completely bounded norm is 7.2684 and its diamond norm is 7.4124.

If we instead want to compute the completely bounded or diamond norm of a completely positive map, we only need to provide its Kraus operators – i.e., operators \{A_i\} such that \Phi(X) = \sum_i A_i X A_i^\dagger for all X. Furthermore, in this case semidefinite programming isn’t used at all, since [1, Proposition 3.6] tells us that \|\Phi\|_{cb} = \|\Phi(I)\| and \|\Phi\|_{\diamond} = \|\Phi^\dagger(I)\|, and computing \|\Phi(I)\| is trivial. The following example demonstrates the usage of these scripts in this case, via a completely positive map \Phi : M_3 \rightarrow M_2 with four (essentially random) Kraus operators:

>> PhiA(:,:,1) = [1 0 0;0 1 1];
>> PhiA(:,:,2) = [-3 0 1;5 1 1];
>> PhiA(:,:,3) = [0 2 0;0 0 0];
>> PhiA(:,:,4) = [1 1 3;0 2 0];
>> CBNorm(PhiA)

ans =

   42.0000

>> DiamondNorm(PhiA)

ans =

   38.7303

Transpose Map

Suppose we want to compute the completely bounded or diamond norm of the transpose map on M_n. A generalized Choi-Kraus representation is given by defining A_{ij} = B_{ij} = e_i e_j^\dagger, where \{e_i\} is the standard basis of \mathbb{C}^n (i.e., A_{ij} and B_{ij} are the operators with matrix representation in the standard basis with a one in the (i,j)-entry and zeroes elsewhere). It is known that the completely bounded and diamond norms of the n-dimensional transpose map are both equal to n, which can be verified in small dimensions as follows:

>> % 2-dimensional transpose
>> PhiA(:,:,1) = [1 0;0 0];
>> PhiA(:,:,2) = [0 1;0 0];
>> PhiA(:,:,3) = [0 0;1 0];
>> PhiA(:,:,4) = [0 0;0 1];
>> PhiB = PhiA;
>> CBNorm(PhiA,PhiB)

ans =

    2.0000

>> DiamondNorm(PhiA,PhiB)

ans =

    2.0000
>> % 3-dimensional transpose
>> I = eye(3);
>> for i=1:3
for j=1:3
PhiA(:,:,3*(i-1)+j) = I(:,i)*I(j,:);
end
end
>> PhiB = PhiA;
>> CBNorm(PhiA,PhiB)

ans =

    3.0000

>> DiamondNorm(PhiA,PhiB)

ans =

    3.0000

Difference of Unitary Channels

Now consider the map \Phi : M_2 \rightarrow M_2 defined by \Phi(X) = X - UXU^\dagger, where U is the following unitary matrix:

We know from [2, Theorem 12] that the CB norm and diamond norm of \Phi are both equal to the diameter of the smallest closed disc containing all of the eigenvalues of U. Because the eigenvalues of U are (1 \pm i)/\sqrt{2}, the smallest closed disc containing its eigenvalues has diameter \sqrt{2}, so \|\Phi\|_{cb} = \|\Phi\|_{\diamond} = \sqrt{2}. This result can be verified as follows:

>> PhiA(:,:,1) = [1 0;0 1];
>> PhiA(:,:,2) = [1 1;-1 1]/sqrt(2);
>> PhiB(:,:,1) = [1 0;0 1];
>> PhiB(:,:,2) = -[1 -1;1 1]/sqrt(2);
>> CBNorm(PhiA,PhiB)

ans =

    1.4142

>> DiamondNorm(PhiA,PhiB)

ans =

    1.4142

References

  1. V. I. Paulsen. Completely bounded maps and operator algebras. Cambridge University Press, 2003.
  2. N. Johnston, D. W. Kribs, and V. I. Paulsen. Computing stabilized norms for quantum operations via the theory of completely bounded maps. Quantum Inf. Comput., 9:16-35, 2009.
  3. J. Watrous. Theory of quantum information lecture notes.
  4. J. Watrous. Semidefinite programs for completely bounded norms. Theory Comput., 5:217–238, 2009.

Separability-Preserving Operators in Entanglement Theory

June 14th, 2011

One of the key concepts in quantum information theory is the difference between separable states and entangled states. A pure quantum state (that is, a unit vector) v ∈ CnCn is said to be separable if it can be written as v = a ⊗ b for some a,b ∈ Cn; otherwise v is called entangled. In this post we will investigate what operators preserve the set of separable pure states, as well as what operators entangle all separable pure states.

Separable Pure State Preservers and Entangling Gates

In the design of quantum algorithms, entangling gates play a very important role. Entangling gates are unitary operators that are able to generate entanglement. A bit more specifically, a unitary operator U ∈ Mn ⊗ Mn (where Mn is the space of n × n complex matrices) is called an entangling gate if there exists a separable pure state v = a ⊗ b ∈ CnCn such that Uv is entangled. Conversely, we will say that a unitary operator U preserves separability if Uv is separable whenever v is separable.

In order to answer the question of what unitaries preserve separability, it is instructive to consider some simple examples (this is often a useful way to formulate conjectures regarding preserver problems). For example, it is clear that if U = A ⊗ B for some unitary operators A, B ∈ Mn, then U preserves separability (because U(a ⊗ b) = Aa ⊗ Bb is separable). Another example of a unitary operator that preserves separability is the swap (or flip) operator S defined on separable states by S(a ⊗ b) = b ⊗ a (the action of S on the rest of CnCn is determined by extending linearly). It turns out that these are essentially the only operators that preserve separability [1,2,3]:

Theorem 1. Let U ∈ Mn ⊗ Mn be a unitary operator. Then U preserves separability (i.e., U is not an entangling gate) if and only if there exist unitary operators A, B ∈ Mn such that either U = A ⊗ B or U = S(A ⊗ B).

As we already saw, the “if” direction of the above result is trivial – the meat and potatoes of the theorem comes from the “only if” direction (as is typically the case with results about linear preservers). Theorem 1 was first proved in [1] essentially by case analysis and checking the action of a separability-preserving unitary on a basis of CnCn, and was subsequently re-proved using similar techniques (but with different motivations and connections) in [2]. The result was proved in [3] by using the vector-operator isomorphism and the fact that a linear map Φ : Mn → Mn preserves the set of rank-1 operators if and only if there exist A, B ∈ Mn such that either Φ(X) ≡ AXB or Φ(X) ≡ AXtB [4].

Theorem 1 also follows as a simple corollary of several related results that have recently been proved in [5,6]. A version of Theorem 1 for multipartite systems (i.e., systems that are the tensor product of more than two copies of Cn) can be found in [3] and [7].

Universal Entangling Gates

A universal entangling gate is, as its name suggests, a stronger form of an entangling gate – it is a unitary operator U such that U(a ⊗ b) is entangled for all a, b ∈ Cn (contrast this with entangling gates, which require only that U(a ⊗ b) is entangled for some a, b ∈ Cn). The structure of universal entangling gates is much less well-understood than that of entangling gates, though we can still at least say when they exist.

It is not difficult to convince yourself that universal entangling gates can’t exist in small dimensions. Let’s begin by supposing n = 2. The set of pure states in C2C2 can be regarded as a 7-dimensional real manifold (7 = 2 × (n × n) – 1, where we subtract one because pure states all have unit length), while the set of separable pure states in C2C2 can be regarded as a 5-dimensional real manifold (5 = (2 × n – 1) + (2 × n – 1) – 1, where the final one is subtracted because the overall phase of the first system relative to the second system is irrelevant). Thus, if U ∈ M2 ⊗ M2 were a universal entangler, it would have to send a 5-dimensional manifold into the 7 – 5 = 2 remaining dimensions of the space, which seems unlikely. Similarly, if n = 3 and U ∈ M3 ⊗ M3 were a universal entangler, it would have to send a 9-dimensional manifold into the 17 – 9 = 8 remaining dimensions of the space, which also seems unlikely.

Indeed, this type of argument was made rigorous via methods of algebraic geometry in [8], where the following result was proved:

Theorem 2. There exists a universal entangling gate in Mn ⊗ Mn if and only if n ≥ 4.

Despite knowing when universal entangling gates exist, we still don’t have a characterization of such operators, nor do we even have many explicit examples (does anyone have an explicit example for 3 ⊗ 4 or 4 ⊗ 4 systems?). Similar techniques to those used in the proof of Theorem 2 should also shed light on when universal entangling gates exist in multipartite systems Mn1 ⊗ Mn2 ⊗ … ⊗ Mnk, but to my knowledge this calculation has not been explicitly carried out.

References:

  1. M. Marcus and B. N. Moyls, Transformations on tensor product spaces. Pacific Journal of Mathematics 9, 1215–1221 (1959).
  2. F. Hulpke, U. V. Poulsen, A. Sanpera, A. Sen De, U. Sen, and M. Lewenstein, Unitarity as preservation of entropy and entanglement in quantum systems. Foundations of Physics 36, 477–499 (2006). E-print: arXiv:quant-ph/0407118
  3. N. Johnston, Characterizing Operations Preserving Separability Measures via Linear Preserver Problems. To appear in Linear and Multilinear Algebra (2011). E-print: arXiv:1008.3633 [quant-ph]
  4. L. Beasley, Linear operators on matrices: the invariance of rank k matrices. Linear Algebra and its Applications 107, 161–167 (1988).
  5. E. Alfsen and F. Shultz, Unique decompositions, faces, and automorphisms of separable states. Journal of Mathematical Physics 51, 052201 (2010). E-print: arXiv:0906.1761 [math.OA]
  6. S. Friedland, C.-K. Li, Y.-T. Poon, and N.-S. Sze, The automorphism group of separable states in quantum information theory. Journal of Mathematical Physics 52, 042203 (2011). E-print: arXiv:1012.4221 [quant-ph]
  7. R. Westwick, Transformations on tensor spaces. Pacific Journal of Mathematics 23, 613–620 (1967).
  8. J. Chen, R. Duan, Z. Ji, M. Ying, J. Yu, Existence of Universal Entangler. Journal of Mathematical Physics 49, 012103 (2008). E-print: arXiv:0704.1473 [quant-ph]

The Q-Toothpick Cellular Automaton

March 26th, 2011

The Q-toothpick cellular automaton (defined earlier this month by Omar E. Pol) is described by the following simple rules:

  1. On an infinite square grid, draw a quarter circle from one corner of a square to the opposite corner of that square:
  2. Call an endpoint of a quarter circle (or a “Q-toothpick”) exposed if it does not touch the endpoint of any other quarter circle.
  3. From each exposed endpoint, draw two more quarter circles, each of the same size as the first quarter circle you drew. Furthermore, the two quarter circles that you draw are the ones that can be drawn “smoothly” (without creating a 90° or 180° corner). Thus the next two generations of the automaton are (already-placed quarter circles are green, newly-added quarter circles are red):

The name “Q-toothpick” comes from its analogy to the more well-studied toothpick automaton (see Sloane’s A139250 and this paper), in which toothpicks (rather than quarter circles) are repeatedly placed on a grid where exposed ends of other toothpicks lie. In this post, we will examine how this automaton evolves over time, and in particular we will investigate the types of shapes that it produces.

Counting Q-Toothpicks

While the Q-toothpick automaton appears quite random and unpredictable for the first few generations, evolving past generation 6 or so reveals several patterns. The following image depicts the evolution of the automaton for its first 19 generations.

The first 19 generations of the Q-toothpick cellular automaton (red segments are pieces that are newly added in the current generation)

Perhaps the most notable pattern is that the grid is more or less filled up in an expanding square starting from the initial Q-toothpick. In fact, by inspecting generations 4, 6, 10, 18, we see that at generation 2n + 2 (n = 1, 2, 3, …) the automaton has roughly filled in a square of side length 2n+1 + 1, and then evolution continues from there on out of the corners of that square. Also, the number of cells added (A187211) at these generations can now easily be computed:

A187211(2n + 2) = 16 + 8(2n-1 – 1) for n ≥ 3.

Furthermore, the growth in the following generations repeats itself. In particular, we have:

A187211(2n + 3) = 22 for n ≥ 1,
A187211(2n + 4) = 40 for n ≥ 2,
A187211(2n + 5) = 54 for n ≥ 2.

Similarly, for n ≥ 3, the four values of A187211(2n + 6) through A187211(2n + 9) are similarly constant (their values are 56, 70, 120, and 134). In general, for n ≥ k the 2k-1 values of A187211(2n + 2k-1 + 2) through A187211(2n + 2k + 1) are constant in n, though I am not aware of a general formula for what these constants are. If we ignore the first four generations and arrange the number of Q-toothpicks added in each generation in rows of length 2n, we obtain a table that begins as follows:

22, 20
22, 40, 54, 40
22, 40, 54, 56, 70, 120, 134, 72
22, 40, 54, 56, 70, 120, 134, 88, 70, 120, 150, 168, 246, 360, 326, 136

C scripts are provided at the end of this post for computing the values of A187210 and A187211 (and hence the values in the above table).

Shapes Traced Out by Q-Toothpicks

In the graphic above that depicts the initial 19 generations of the Q-toothpick automaton, several shapes are traced out, including circles, diamonds, hearts, and several nameless blobs:

By far the most common of these shapes are circles, diamonds and hearts. The fourth shape appears only on the diagonal and it’s not difficult to see that it forever will make up the entirety of the diagonal (with the exception of the circle in the center). The fifth and sixth objects are the first two members of an infinite family of objects that appear as the automaton evolves. The fifth object first appears in generation 9, and sixth object (which is basically two copies of the fifth object) first appears in generation 17. The following object, which is basically made up of two copies of the sixth object (i.e., four copies of the fifth object) first appears in generation 33:

In general, a new object of this type (made of 2n copies of the fifth object above) first appears in generation 2n+3 + 1. In fact, these objects are the only ones that are traced out by this automaton. [Edit: this final claim is not true! See ebcube's great post that shows a double-heart shape in generation 31.]

Update [March 28, 2011]: I have added a script that counts the number of circles, diamonds, and hearts in the nth generation of the Q-toothpick automaton, and another script that computes Sloane’s A187212.

Download:

  • A187210.c – computes the total number of Q-toothpicks present in the nth generation
  • A187211.c – computes the number of Q-toothpicks added in the nth generation
  • A187212.c – computes the number of Q-toothpicks if we restrict them to the positive quadrant
  • count_shapes.c – computes the number of circles, diamonds, and hearts in the nth generation

The Maximum Score in the Game “Entanglement” is 9080

January 21st, 2011

Entanglement is a browser-based game that has gained a fair bit of popularity lately due to its recent inclusion in Google’s Chrome Web Store and Chrome 9. The way the game works is probably best understood by actually playing it, but here is my brief attempt:

  • You are given a hexagonal tile with six paths printed on it, with two path ends touching each side of the hexagon. One such tile is as follows:

  • You may rotate, but not move the hexagon that has been provided to you.
  • Once you have selected an orientation of the hexagon, a path is traced along that hexagon, and you are provided a new hexagon that you may rotate at the end of your current path.
  • The goal of the game is to create the longest path possible without running into either the centre hexagon or the outer edge of the game board.

To make things a bit more interesting, the game was updated in November 2010 to include a new scoring system that gives you 1 + 2 + 3 + … + n (the nth triangular number) points on a turn if you extend the length of your path by n on that turn. This encourages clever moves that significantly extend the length of the path all at once. The question that I am going to answer today is what the maximum score in Entanglement is under this scoring system (inspired by this reddit thread).

On a Standard-Size Game Board

The standard Entanglement game board is made up of a hexagonal ring of 6 hexagons, surrounded by a hexagonal ring of 12 hexagons, surrounded by a hexagonal ring of 18 hexagons, for a total of 36 hexagons. In order to maximize our score, we want to maximize how much we increase the length of our path on our final move. Thus, we want to just extend our path by a length of one on each of our first 35 moves, and then score big on the 36th move.

Well, each hexagon that we lay has six paths on it, for a total of 6*36 = 216 paths on the board. 35 of those paths will be used up by our first 35 moves. It is not possible to use all of the remaining 181 paths, however, because many of them lead into the edge of the game board or the central hexagon, and connecting to such a path immediately ends the game. Because there are 12 path ends that touch the central hexagon and 84 path ends that touch the outer border, there must be at least (12+84)/2 – 1 = 47 unused paths on the game board (we divided by 2 because each unused path takes up two path ends and we subtracted 1 because one of the paths will be used by us).

Thus we can add a length of at most 181 – 47 = 134 to our path on the 36th and final move of the game, giving a total score of at most 35 (from the first 35 moves of the game) + 1 + 2 + 3 + … + 134 = 35 + 9045 = 9080. Not only is this an upper bound of the possible scores, but it is actually attainable, as demonstrated by the following optimal game board:

Paths in red are unused, the green line depicts the portion of the path laid by the first 35 moves of the game, and the blue line depicts the portion of the path (of length 134) gained on the 36th move. One fun property of the above game board is that it is actually completely “unentangled” – no paths cross over any other paths.

On a Larger or Smaller Game Board

Other than being a good size for playability purposes, there is no reason why we couldn’t play Entanglement on a game board of larger or smaller radius (by radius I mean the number of rings of hexagons around the central hexagon – the standard game board has a radius of 3). We will compute the maximum score simply by mimicking our previous analysis for the standard game board. If the board has radius n, then there are 6 + 12 + 18 + … + 6n = 3n(n+1) hexagons, each of which contains 6 paths. Thus there are 18n(n+1) lengths of path, 3n(n+1)-1 of which are used in the first 3n(n+1)-1 moves of the game, and we want to add as many as possible of the remaining 15n(n+1)+1 lengths of path in the final move of the game. There are 12 path ends that touch the central hexagon and 12 + 24n path ends that touch the outer edge of the game board. Thus there are at least (12 + 12 + 24n)/2 – 1 = 11 + 12n unused paths on the game board.

Tallying the numbers up, we see that on the final move, we can add at most 15n(n+1)+1 – (11 + 12n) = 15n2 + 3n – 10 lengths of path. If T(n) = n(n+1)/2 is the nth triangular number, then we see that it’s not possible to obtain more than 3n(n+1)-1 + T(15n2 + 3n – 10) = (225/2)n4 + 45n3 – 135n2 – (51/2)n + 44 points. In fact, this score is obtainable via the exact same construction as the optimal board in the n = 3 case – just extend the (counter)clockwise rotation of the path in the obvious way. Thus, the maximum score for a game of Entanglement on a board of radius n for n = 1, 2, 3, … is given by the sequence 41, 1613, 9080, 29462, 72479, … (A180667 in the OEIS).