# A Short Stint with Principal Component Analysis I get like, ~100 visitors daily, and I know it’s a lot for a person who’s just started, and I’m not even blogging for visitors, because this blog is basically my study notes, but I think, if you look hard enough, if you slant your eyes, you may find something of value in it. So why not introduce it to your friends? Buddies? Comrades? Homies? Compadres? Dusts? Rafighs? Ais? Do it!

I wanna be honest with you. One thing that I never understood is Principal Component Analysis. It’s very hard. At least to me. Because my college doesn’t offer any courses in the way of statistics (yes, I’m an Associates student!) I thought to myself, “Shieeet, I’m gonna read every resource there is. From Youtube, which I kinda feel like is the worst resource for learning – At least, sometimes – to a book about Computer Vision!” and ta-dah! Here we are. A book about Computer Vision is about to clear everything PCA. Let us commence learning! Put your learning hat and goggles on, and wear your diaphragm, because we’re gonna get screwed!

The following is based on Computer Vision, Principles, Algorithms, Applications, Learning by E. R. Davies.

One powerful way of approaching the task of data representation is Principal Component Analysis, or PCA. This involves finding the mean of a cluster of points in feature space, and then finding the principal axes of the cluster in the following way: First, an axis is found which passes through the mean position and which gives the maximum variance when the data are projected onto it. Then, a second such axis is found which maximizes variance in a directional normal to the first. This process is carried out until a total of N principal axes have been found for N-dimensional feature space. The process is illustrated in this figure:

In fact, the process is entirely mathematical and need not be undertaken in the strict sequence indicated above, it merely involves finding a set of orthogonal axes which diagonalize the covariance matrix. A matrix is diagonalizable if it has distinct eigenvalues in , the eigenspace. Eigenvalues are defined as (I know what eigenvalues are, this is just for people who are reading this post in the future and don’t wanna google): Where is the matrix, in this case, the covariance matrix, is the eigenvector, and is the eigenvalue.

The covariance matrix for the input population is defined as: where is the location of the th data point and is the mean of data points, and indicates expectation value for the underlying population. We can estimate from the equations: Where: As is real and symmetric, it is possible to diagonalize it using a suitable orthogonal transformation matrix , obtaining a set of orthonormal eigenvectors with eigenvalues given by: The vectors are derived from the original vectors by: And the inverse transformation needed to recover the original data vectors is: Here, we have recalled that for an orthogonal matrix In fact, it may be shown that is the matrix whose rows are formed from the eigenvectors of , and that the diagonalized covariance matrix is given by: So that: In what follows, we shall assume that eigenvalues have been placed in an ordered sequence, starting with the largest. In that case, represents the most characteristic of the set data points, with later eigenvalues representing successively less significant characteristics. We could even go so far as to say that, in some sense, represents the most interesting characteristic of the data, whereas would be devoid of interest. More practically, if we ignored we might not lose much information. And indeed the last few eigenvalues would frequently represent characteristics which are not statistically significant and are essentially noise. For these reasons, PCA is commonly used for reduction in dimensionality of the feature space to some lower value .

Well, this was a short blog post. I have understood the concept behind PCA, I just haven’t understood how I can calculate it. No worries! There’s always NumPy… Unless I’m in C++, then I’m screwed!