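PCA: Principal Component Analysis.
As the name suggests, PCA extracts the main influencing factors from multivariate data, revealing the underlying structure and simplifying a complex problem.
The goal of PCA is to project high-dimensional data into a lower-dimensional space by a linear transformation.
The projected values should be as spread out as possible, so that as much of the original information as possible is retained. Mathematically, this means maximizing the variance along each projected dimension.
The dimensions of the lower-dimensional space should be uncorrelated with one another. Mathematically, correlation is measured by covariance, so the covariance between any two projected dimensions should be zero.
Both the variance within a dimension and the covariance between dimensions can be written as inner products. It can then be shown that maximizing the variances while forcing the covariances between dimensions to zero is equivalent to diagonalizing the covariance matrix. For the full derivation and the underlying theory, see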
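To make the two criteria concrete, here is how variance and covariance look for attributes that have already been zero-centered. The symbols a and b (two attributes with m sample values a_i and b_i each) are notation introduced here for illustration; the 1/m normalization follows the convention used in the algorithm steps of this article.

$$
\mathrm{Var}(a) = \frac{1}{m}\sum_{i=1}^{m} a_i^{2},
\qquad
\mathrm{Cov}(a,b) = \frac{1}{m}\sum_{i=1}^{m} a_i b_i
$$

Both are inner products scaled by 1/m: the variance is the inner product of a with itself, and the covariance is the inner product of a with b.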
http://blog.codinglabs.org/articles/pca-tutorial.html
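A short sketch of why diagonalization gives the answer, written in the notation of this article (X is the zero-centered data with n rows and m columns, P is the projection matrix, Y = PX). The symbols D, E and \Lambda are introduced here for illustration only.

$$
D = \frac{1}{m} Y Y^{T}
  = \frac{1}{m} (PX)(PX)^{T}
  = P \left( \frac{1}{m} X X^{T} \right) P^{T}
  = P C P^{T}
$$

D is the covariance matrix of the projected data, so the two requirements say that D should be diagonal (zero covariance between dimensions) with its diagonal entries, the variances, as large as possible. Because C is real and symmetric, it can be written as

$$
C = E \Lambda E^{T}, \qquad E^{T} E = I
$$

where the columns of E are unit eigenvectors of C and \Lambda is the diagonal matrix of eigenvalues. Choosing P = E^{T} gives D = E^{T} C E = \Lambda. Sorting the eigenvalues in descending order and keeping only the first k rows of P keeps the k directions with the largest variance.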
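The algorithm proceeds as follows.
Suppose there are m samples of n-dimensional data that we want to reduce to k dimensions.
① Arrange the original data by columns into a matrix X with n rows and m columns.
② Zero-center each row of X (each row represents one attribute field) by subtracting that row's mean.
③ Compute the covariance matrix $C = \frac{1}{m} X X^{T}$, where $X^{T}$ is the transpose of X.
④ Compute the eigenvalues of the covariance matrix and their corresponding eigenvectors.
⑤ Stack the eigenvectors as the rows of a matrix, ordered from top to bottom by decreasing eigenvalue, and take the first k rows to form the matrix P.
⑥ Y = P X is the data reduced to k dimensions.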
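The steps above can be traced on a tiny case. The following is a minimal sketch, not the article's implementation: it uses made-up data with n = 2 attributes and m = 5 samples, plain arrays instead of EmguCV, and a closed-form eigen solution that only works for a 2 x 2 symmetric matrix (a general solver is needed for larger n). The module and variable names are hypothetical.

```vbnet
Imports System
Imports System.Globalization

Module PcaToyExample
    Sub Main()
        ' Step 1: n = 2 attributes (rows), m = 5 samples (columns). Made-up data.
        Dim X(,) As Double = {{1.0, 1.0, 2.0, 4.0, 2.0},
                              {1.0, 3.0, 3.0, 4.0, 4.0}}
        Dim n As Integer = X.GetLength(0)
        Dim m As Integer = X.GetLength(1)

        ' Step 2: zero-center each row by subtracting its mean.
        For i As Integer = 0 To n - 1
            Dim mean As Double = 0
            For j As Integer = 0 To m - 1
                mean += X(i, j)
            Next
            mean /= m
            For j As Integer = 0 To m - 1
                X(i, j) -= mean
            Next
        Next

        ' Step 3: covariance matrix C = (1/m) * X * Xt (n x n).
        Dim C(n - 1, n - 1) As Double
        For i As Integer = 0 To n - 1
            For j As Integer = 0 To n - 1
                Dim sum As Double = 0
                For s As Integer = 0 To m - 1
                    sum += X(i, s) * X(j, s)
                Next
                C(i, j) = sum / m
            Next
        Next

        ' Step 4: largest eigenvalue and its eigenvector, closed form for
        ' a symmetric 2 x 2 matrix [[a, b], [b, d]] only.
        Dim a As Double = C(0, 0)
        Dim b As Double = C(0, 1)
        Dim d As Double = C(1, 1)
        Dim lambda1 As Double = (a + d) / 2 + Math.Sqrt(((a - d) / 2) ^ 2 + b ^ 2)
        Dim vx As Double = b
        Dim vy As Double = lambda1 - a
        If b = 0 Then
            ' C is already diagonal: pick the axis with the larger variance.
            vx = If(a >= d, 1.0, 0.0)
            vy = If(a >= d, 0.0, 1.0)
        End If
        Dim norm As Double = Math.Sqrt(vx * vx + vy * vy)

        ' Step 5: k = 1, so P is the single unit eigenvector (one row).
        Dim P() As Double = {vx / norm, vy / norm}

        ' Step 6: Y = P * X, one value per sample.
        Dim Y(m - 1) As Double
        For j As Integer = 0 To m - 1
            Y(j) = P(0) * X(0, j) + P(1) * X(1, j)
        Next

        Dim text = Array.ConvertAll(Y, Function(v) v.ToString("F4", CultureInfo.InvariantCulture))
        Console.WriteLine(String.Join(", ", text))
        ' prints: -2.1213, -0.7071, 0.0000, 2.1213, -0.7071
    End Sub
End Module
```

For this data the covariance matrix is [[1.2, 0.8], [0.8, 1.2]] with eigenvalues 2 and 0.4, so projecting onto the first eigenvector (1/√2, 1/√2) keeps most of the variance in a single dimension.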
The following is an implementation in VB.NET with EmguCV 3.0.
The sample set X consists of 500 images of 29*29 pixels each.
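' Requires at the top of the file: Imports Emgu.CV and Imports Emgu.CV.Structure
Dim n = 29 * 29   ' dimensions per sample (pixels per image)
Dim m = 500       ' number of samples
' Note: here X is stored as m x n (one image per row), the transpose of the n x m layout in the steps above
Dim X As Emgu.CV.Matrix(Of Double) = New Matrix(Of Double)(m, n)
Dim fileCollection As Collections.ObjectModel.ReadOnlyCollection(Of String) = FileIO.FileSystem.GetFiles(Application.StartupPath & "\img")
' Assumes the img folder holds exactly m images of 29*29 pixels
For i = 0 To fileCollection.Count - 1
    Dim img As Image(Of Gray, Byte) = New Image(Of Gray, Byte)(fileCollection(i))
    Dim j As Integer = 0
    ' Flatten the image column by column into row i of X
    For x1 = 0 To img.Width - 1
        For y1 = 0 To img.Height - 1
            X.Data(i, j) = img.Data(y1, x1, 0)
            j = j + 1
        Next
    Next
Next
Compute the mean of each dimension and zero-center the data: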
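' Mean of each dimension (each column of X)
Dim Xxmean(n - 1) As Double
For j = 0 To n - 1
    For i = 0 To m - 1
        Xxmean(j) = Xxmean(j) + X.Data(i, j)
    Next
    ' The mean is the sum divided by the number of samples m (not m - 1)
    Xxmean(j) = Xxmean(j) / m
Next
' Zero-center each dimension
For j = 0 To n - 1
    For i = 0 To m - 1
        X.Data(i, j) = X.Data(i, j) - Xxmean(j)
    Next
Next
Compute the covariance matrix: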
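' Covariance matrix C (n x n). With one sample per row of X this is C = (1/m) * Xt * X,
' which equals (1/m) * X * Xt in the column layout used in the algorithm steps
Dim C As Emgu.CV.Matrix(Of Double) = New Matrix(Of Double)(n, n)
For i = 0 To n - 1
    For j = 0 To n - 1
        Dim sum As Double = 0
        For z = 0 To m - 1
            sum = sum + X.Data(z, i) * X.Data(z, j)
        Next
        C.Data(i, j) = sum / m
    Next
Next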
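Compute the eigenvalues and eigenvectors of the symmetric matrix:
' Eigenvectors, stored as rows
Dim eigenVector As Emgu.CV.Matrix(Of Double) = New Matrix(Of Double)(n, n)
' Eigenvalues, in descending order; row i of eigenVector corresponds to eigenvalue i
Dim eigenValues As Emgu.CV.Matrix(Of Double) = New Matrix(Of Double)(n, 1)
CvInvoke.Eigen(C, eigenValues, eigenVector)
With the eigenvalues in descending order, take the first k rows of the eigenvector matrix and multiply by the samples X to obtain the reduced data: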
' k is the number of dimensions after reduction
Dim k As Integer = 99
' The first k rows of eigenVector (the eigenvectors are stored as rows) form P, a k x n matrix
Dim P As Emgu.CV.Matrix(Of Double) = New Matrix(Of Double)(k, n)
For i = 0 To k - 1
    For j = 0 To n - 1
        P.Data(i, j) = eigenVector.Data(i, j)
    Next
Next
' Y = P * Xt is the k x m output; column j is sample j reduced to k dimensions
Dim Y As Emgu.CV.Matrix(Of Double) = New Matrix(Of Double)(k, m)
For i = 0 To k - 1
    For j = 0 To m - 1
        Dim sum As Double = 0.0
        For x1 = 0 To n - 1
            sum = sum + P.Data(i, x1) * X.Data(j, x1)
        Next
        Y.Data(i, j) = sum
    Next
Next
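The value k = 99 above is fixed by hand. One common alternative, sketched here as an assumption rather than as part of the original code, is to pick the smallest k whose leading eigenvalues account for a chosen share of the total variance. ChooseK and its ratio parameter are hypothetical names; the function takes a plain array, so the values of eigenValues would have to be copied into one first.

```vbnet
Imports System

Module ChooseKSketch
    ' Hypothetical helper: returns the smallest k such that the k largest
    ' eigenvalues sum to at least ratio * (sum of all eigenvalues).
    ' Expects the eigenvalues sorted in descending order.
    Function ChooseK(eigenValues() As Double, ratio As Double) As Integer
        Dim total As Double = 0
        For Each v As Double In eigenValues
            total += v
        Next
        Dim running As Double = 0
        For k As Integer = 1 To eigenValues.Length
            running += eigenValues(k - 1)
            If running >= ratio * total Then Return k
        Next
        Return eigenValues.Length
    End Function

    Sub Main()
        ' Made-up eigenvalues: the first three cover 9 / 10 = 90% of the variance.
        Dim values() As Double = {4.0, 3.0, 2.0, 1.0}
        Console.WriteLine(ChooseK(values, 0.85)) ' prints 3
    End Sub
End Module
```

This works because each eigenvalue is the variance of the data along its eigenvector, so the running sum measures how much of the total variance the first k projected dimensions retain.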