The latest xkcd:
I think that this is the first reference to eigenanalysis in a widely-read comic strip. (It's not a first for popular culture: Dudley Eigenvalue, DDS, was an important secondary character in Thomas Pynchon's novel V. But that's another story.)
As the wikipedia page on Eigenvalues and Eigenvectors explains,
The eigenvectors of a square matrix are the non-zero vectors which, after being multiplied by the matrix, remain proportional to the original vector (i.e. change only in magnitude, not in direction). For each eigenvector, the corresponding eigenvalue is the factor by which the eigenvector changes when multiplied by the matrix. The prefix eigen- is adopted from the German word "eigen" for "innate", "own". The eigenvectors are sometimes also called proper vectors, or characteristic vectors. Similarly, the eigenvalues are also known as proper values, or characteristic values.
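For readers who like to see the definition in action, here is a minimal numeric illustration (a made-up 2×2 example, not anything from the strip):

```python
import numpy as np

# A minimal numeric illustration of the definition quoted above:
# for this 2x2 matrix, the vector (1, 1) is an eigenvector with
# eigenvalue 3, since multiplication stretches it by a factor of 3
# without changing its direction.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, 1.0])

print(A @ v)   # prints [3. 3.]
```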
So why might the prince use eigenvectors to match shoes to their owners? This is a reference, I think, to a well-known pattern-recognition technique, in which eigenanalysis is used to create a lower-dimensional representation of the members of a set of high-dimensional objects like pictures of faces.
In the eigenface technique, we start with relatively high-resolution pictures of faces — say 320×240 = 76,800 pixels. We accumulate a collection of such pictures. If these are grey-scale pictures, we can think of each picture as a weighted sum of 76,800 "basis pictures", each of which has a zero value everywhere except at one of the pixel locations. The coefficients of this weighted sum are then just the picture's original pixel values, and the sequence of coefficients for a given picture is, by trivial definition, a vector in a 76,800-dimensional space. (This seems like a lot of work for no purpose, but the general approach turns out to be extraordinarily useful.)
We'd like to find a way to project these vectors into a much lower-dimensional subspace, such that as much as possible of the differences among the faces in our collection is preserved. With a bit of linear algebra, we can show that we can do this by choosing, as our new "basis pictures", the first few eigenvectors of the covariance matrix of our picture collection. Each of these new basis vectors will be a 76,800-dimensional vector, i.e. in this application a sort of a picture, and one of the pictures in our collection (or more important, a new picture) can now be represented as a linear combination (a weighted sum) of these basis vectors.
But these new basis vectors, instead of being blank everywhere but at one pixel location, as our original basis vectors were, have some distribution of non-zero values over the whole picture area, in some cases looking somewhat like fuzzy faces — hence the term "eigenfaces".
Furthermore, if we order the eigenvectors by the magnitude of their associated eigenvalues and keep just the first few, we can get a perhaps surprisingly good approximation to the original pictures by adding up a weighted combination of a small number of these "eigenfaces" — say something like 32 or 64 or 128. The weights involved are then a radically smaller representation of each picture — a vector with 128 elements rather than 76,800 elements.
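As a concrete sketch of the procedure just described, with tiny random arrays standing in for real face photographs (all names and sizes are invented for the example), the eigenface construction might look like this in Python/NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "face" collection: 50 images of 16x16 = 256 pixels each,
# flattened to rows of X. (Real eigenfaces would use something
# like 320x240 = 76,800-pixel images; these sizes are invented.)
n_images, n_pixels = 50, 256
X = rng.normal(size=(n_images, n_pixels))

# Center the collection and form the covariance matrix of the pixels.
mean_face = X.mean(axis=0)
Xc = X - mean_face
C = np.cov(Xc, rowvar=False)              # 256 x 256

# Eigendecomposition; eigh returns eigenvalues in ascending order,
# so reverse to put the largest-eigenvalue "eigenfaces" first.
evals, evecs = np.linalg.eigh(C)
order = np.argsort(evals)[::-1]
eigenfaces = evecs[:, order]              # columns are eigenfaces

# Keep just the first k eigenfaces, and represent each picture by
# its k coefficients (weights) in that reduced basis.
k = 32
basis = eigenfaces[:, :k]                 # 256 x k
weights = Xc @ basis                      # 50 x k compressed representation

# Reconstruct from the k weights; more eigenfaces -> smaller error.
recon = weights @ basis.T + mean_face
err32 = float(np.linalg.norm(X - recon))
```

Keeping more eigenfaces can only improve the reconstruction, since each larger basis contains the smaller ones.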
In this new space, it seems that simple measures of similarity between vectors — perhaps as simple as Euclidean distance, i.e. the square root of the sum of squared element-wise differences — give us a metric that does a rather good job of identifying faces. We take a new picture, calculate its representation in the "eigenface" space, and then compare it (in that space) to each of the pictures in our collection.
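The matching step can then be sketched as a nearest-neighbor search on the weight vectors; the gallery here is random stand-in data, not real eigenface coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend we already have a gallery of 50 pictures, each reduced
# to a 128-element weight vector in "eigenface" space (random
# stand-in numbers, purely for illustration).
gallery = rng.normal(size=(50, 128))

# A "new picture": person 17's gallery vector plus a little noise,
# standing in for a fresh photo of the same face.
probe = gallery[17] + 0.05 * rng.normal(size=128)

# Euclidean distance in the reduced space ...
dists = np.linalg.norm(gallery - probe, axis=1)

# ... and the closest gallery entry is our identification.
best = int(np.argmin(dists))
print(best)   # prints 17
```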
So the idea behind the first xkcd panel is that the prince would create, or have access to, some sort of biometric database of foot representations, and would use it to create a space of "eigenfeet" in which newly-encountered foot shapes could be compared against existing entries, as a suitably dream-nerdy translation of trying the abandoned slipper on every female foot in the kingdom.
What does this have to do with linguistics?
Well, the number of connections may be surprising to those who are not used to the unreasonable effectiveness of mathematics. I'll mention just three of the many ways that people have applied eigenanalysis (or closely related ideas) to speech and language.
The first one goes under the name of "latent semantic indexing" or "latent semantic analysis". In this approach, rather than finding a useful low-dimensional approximation to pictures, we find a useful low-dimensional approximation to documents (and also to words, which are, after all, just one-word documents). The details are somewhat different, involving singular value decomposition, which is a generalization of eigendecomposition to non-square matrices, but the basic idea is the same. You can follow the links to learn more.
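As a hedged illustration of the LSA idea, here is a toy term-document example in Python/NumPy (the counts and topic labels are invented):

```python
import numpy as np

# Toy term-document count matrix: rows are terms, columns are
# documents. Documents 0 and 1 are "about astronomy", 2 and 3
# "about cooking"; all counts are invented for the example.
terms = ["star", "planet", "telescope", "flour", "oven", "recipe"]
A = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [1, 2, 0, 1],
              [0, 0, 3, 2],
              [0, 1, 2, 3],
              [0, 0, 2, 2]], dtype=float)

# Singular value decomposition (the rectangular analogue of
# eigendecomposition), truncated to a rank-2 "latent" space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T    # one 2-d vector per document

# In the reduced space, documents on the same topic end up closer
# together than documents on different topics.
def dist(i, j):
    return float(np.linalg.norm(doc_vecs[i] - doc_vecs[j]))

print(dist(0, 1) < dist(0, 2))   # prints True
```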
The second one is the application of "functional principal components analysis" to tonal analysis. A number of people have tried a number of different ways to approach this in recent years. One interesting example is John A. D. Aston et al., "Linguistic pitch analysis using functional principal component mixed effect models", J. Royal Stat. Soc.: C, 59 (2): 297–317, 2010.
I've given some examples of how to apply these ideas to a database of Chinese sentences (taken from Jiahong Yuan's thesis data), in the background of a problem set found here. You can access the details — including data and code — on the linked page, but I'll give a taste of the results here, in the form of some fragments of Octave (or Matlab) code and a couple of graphs.
Despite enormous variation by speaker, by phrasal position, by tonal context, and by stress or focus status, the mean contours by tone-class are surprisingly well behaved. (In the code below, the variable f0data1 has 7101 rows (different syllables) and 30 columns (the time-normalized pitch values in each syllable); the variable f0inds1 has the same 7101 rows and its columns give the values of various features of each syllable, with the 10th column being the tone class.)
tonemeans = zeros(4,30);
for tone = 1:4
    tonemeans(tone,:) = mean(f0data1(f0inds1(:,10)==tone,:));
end
plot(tonemeans')
legend(' 1',' 2',' 3',' 4','location','northwest')
The first few eigenvectors of the covariance matrix look very much like orthogonal polynomials:
[V L] = eig(cov(f0data1));
% Note that eigenvectors are in columns of V,
% with eigenvalues ordered from smallest to largest
basis5 = V(:,30:-1:26)';
figure(1); plot(basis5');
title('Five eigenvectors')
We will find that these five eigenvectors do quite a good job of approximating (a sample of) the original contours:
for n = 1:6
    coeffs = basis5'\f0sample(n,:)';
    synth = basis5'*coeffs;
    subplot(2,3,n);
    plot(1:30,f0sample(n,:),'b',1:30,synth,'r');
end
Four components do nearly as good a job; three is still OK, and a two-eigentone representation is marginal. A representation in terms of just the first eigentone would be a slightly bent line (the first eigenvector, which is the roughly horizontal blue line in the picture above) moved up and down according to its coefficient. Adding the second eigentone (the falling dark-green line in that picture) lets us represent different degrees of overall rise or fall; and so on.
The last of my three samples of eigenlinguistics takes us into the area of spectral graph theory, which as wikipedia tells us is not the latest development from Ghostbusters' Laboratories, but rather
… is the study of properties of a graph in relationship to the characteristic polynomial, eigenvalues, and eigenvectors of matrices associated to the graph, such as its adjacency matrix or Laplacian matrix.
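To make the quoted definition a bit more concrete, here is a toy Python example (a graph invented for the purpose, unrelated to the work discussed below): the eigenvector belonging to the Laplacian's second-smallest eigenvalue, the "Fiedler vector", splits a small graph into its two natural clusters.

```python
import numpy as np

# A small undirected graph with two obvious communities: nodes 0-2
# densely connected, nodes 3-5 densely connected, and one "bridge"
# edge between them. The graph is invented purely for illustration.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Graph Laplacian L = D - A, where D is the diagonal degree matrix.
L = np.diag(A.sum(axis=1)) - A

# The eigenvector for the second-smallest eigenvalue (the "Fiedler
# vector") partitions the graph: its signs separate the two clusters.
evals, evecs = np.linalg.eigh(L)
fiedler = evecs[:, 1]
partition = fiedler > 0   # one cluster True, the other False
```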
Most of what little I know about this area I learned from Partha Niyogi, who used such ideas extensively in various areas of machine learning, including in an effort to add some mathematical substance to an idea of mine about the development of lexical consensus in a speech community. But the particular case that I'm going to cite here comes from Cleve Moler and Donald Morrison, "Singular Value Decomposition of Cryptograms", American Mathematical Monthly, 1983.
In this case, we're using singular value decomposition to find some simple approximations to the bigram statistics of the letters in (a collection of) documents, with the goal of figuring out which letters are vowels and which are consonants. The first singular vectors tell us something about the overall frequency of individual letters, in somewhat the same way as the first eigenvector of the syllabic pitch contours told us something about the overall pitch level. Because vowels and consonants tend to alternate, in a statistical sense, Moler and Morrison explain that the signs in the second singular vectors tend to solve "the cryptanalyst's problem", which here is just to distinguish vowels from consonants in a simple substitution cipher (a really easy problem for a cryptanalyst to have, but still…).
A sample application of this idea to the text of the U.S. constitution works reasonably well. For simplicity, we map all alphabetic characters to uppercase, and all punctuation and white-space characters to space, yielding this 27×27 bigram frequency table. Then a few lines of R are sufficient to yield this diagnosis, which does indeed separate vowels from consonants:
Vowels:
 [1,] "-0.27"  "0.161"  "A"
 [2,] "-0.304" "0.624"  "E"
 [3,] "-0.201" "0.207"  "I"
 [4,] "-0.263" "0.07"   "O"
 [5,] "-0.09"  "0.013"  "U"

Consonants:
 [1,]  "0.129" "-0.053" "B"
 [2,]  "0.143" "-0.124" "C"
 [3,]  "0.071" "-0.103" "D"
 [4,]  "0.011" "-0.126" "F"
 [5,]  "0.018" "-0.03"  "G"
 [6,]  "0.007" "-0.005" "J"
 [7,]  "0.014" "-0.005" "K"
 [8,]  "0.046" "-0.1"   "L"
 [9,]  "0.099" "-0.057" "M"
 [10,] "0.02"  "-0.056" "P"
 [11,] "0"     "-0.006" "Q"
 [12,] "0.183" "-0.194" "R"
 [13,] "0.188" "-0.195" "S"
 [14,] "0.603" "-0.216" "T"
 [15,] "0.116" "-0.05"  "V"
 [16,] "0.061" "-0.018" "W"
 [17,] "0.008" "-0.019" "X"
 [18,] "0.008" "-0.003" "Z"

Remaining symbols (H, N, Y, and space):
 [1,]  "0.432" "0.466"  "H"
 [2,] "-0.032" "-0.36"  "N"
 [3,]  "0.002" "0.014"  "Y"
 [4,] "-0.155" "-0.052" "_"
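The same computation can be sketched in Python (the post's version was in R; the sample text here is just the constitution's preamble, so the numbers will differ from the table above):

```python
import string
import numpy as np

# A small sample of English text (the post used the full text of the
# U.S. constitution; this is just its preamble, so the numbers below
# will differ from the table in the post).
text = """We the People of the United States, in Order to form a more
perfect Union, establish Justice, insure domestic Tranquility, provide
for the common defence, promote the general Welfare, and secure the
Blessings of Liberty to ourselves and our Posterity, do ordain and
establish this Constitution for the United States of America."""

# Map letters to uppercase and everything else to a single space.
cleaned = "".join(c.upper() if c.isalpha() else " " for c in text)
cleaned = " ".join(cleaned.split())        # collapse runs of spaces

# 27x27 bigram count matrix over A-Z plus space.
alphabet = string.ascii_uppercase + " "
idx = {c: i for i, c in enumerate(alphabet)}
B = np.zeros((27, 27))
for a, b in zip(cleaned, cleaned[1:]):
    B[idx[a], idx[b]] += 1

# SVD: the first singular vectors reflect overall letter frequency,
# and the signs in the second ones tend to separate vowels from
# consonants, because the two classes alternate statistically.
U, s, Vt = np.linalg.svd(B)
second = U[:, 1]
for letter in "AEIOUTRS":
    print(letter, round(float(second[idx[letter]]), 3))
```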
For a more extensive, more accessible (and more linguistic) discussion of graph partitioning and spectral graph theory as applied to quasi-phonological analysis, see John Goldsmith and Aris Xanthos, "Three methods for learning phonological categories", University of Chicago Department of Computer Science TR-2008-08.