Note: this post reprints an English article related to "Linear Algebra · SVD". If anything looks wrong, please refer to the original.
Due to CSDN's length limit, the article is split into two parts; this is part 1.
Annoying Precision
“A good stock of examples, as large as possible, is indispensable for a thorough understanding of any concept, and when I want to learn something new, I make it my first job to build one.” – Paul Halmos
Singular value decomposition
March 13, 2017 by Qiaochu Yuan
As a warm-up to the subject of this blog post, consider the problem of how to classify $n \times m$ matrices $M \in \mathbb{R}^{n \times m}$ up to change of basis in both the source ($\mathbb{R}^{m}$) and the target ($\mathbb{R}^{n}$). In other words, the problem is to describe the equivalence classes of the equivalence relation on $n \times m$ matrices given by
$$M \sim N \Leftrightarrow M = P N Q^{-1}, \quad P \in GL_{n}(\mathbb{R}),\ Q \in GL_{m}(\mathbb{R}).$$
It turns out that the equivalence class of $M$ is completely determined by its rank $r = \mathrm{rank}(M)$. To prove this we construct some bases by induction. For starters, let $x_{1} \in \mathbb{R}^{m}$ be a vector such that $y_{1} = M x_{1} \neq 0$; this is always possible unless $M = 0$. Next, let $x_{2} \in \mathbb{R}^{m}$ be a vector such that $y_{2} = M x_{2}$ is linearly independent of $y_{1}$; this is always possible unless $\mathrm{rank}(M) = 1$.
Continuing in this way, we construct vectors $x_{1}, \dots, x_{r} \in \mathbb{R}^{m}$ such that the vectors $y_{1} = M x_{1}, \dots, y_{r} = M x_{r} \in \mathbb{R}^{n}$ are linearly independent, hence a basis of the column space of $M$. Next, we complete the $x_{i}$ and $y_{i}$ to bases of $\mathbb{R}^{m}$ and $\mathbb{R}^{n}$ in whatever manner we like (completing the $x_{i}$ by a basis of the kernel of $M$, say, so that the extra basis vectors are sent to zero). With respect to these bases, $M$ takes a very simple form: we have $M x_{i} = y_{i}$ if $1 \leq i \leq r$ and otherwise $M x_{i} = 0$. Hence, in these bases, $M$ is a block matrix where the top left block is an $r \times r$ identity matrix and the other blocks are zero.
Explicitly, this means we can write $M$ as a product
$$M = P D Q^{-1}, \quad P \in GL_{n}(\mathbb{R}),\ Q \in GL_{m}(\mathbb{R})$$
where $D$ has the block form above, the columns of $P$ are the basis of $\mathbb{R}^{n}$ we found by completing $M x_{1}, \cdots, M x_{r}$, and the columns of $Q$ are the basis of $\mathbb{R}^{m}$ we found by completing $x_{1}, \cdots, x_{r}$. This decomposition can be computed by row and column reduction on $M$, where the row operations we perform give $P$ and the column operations we perform give $Q$.
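To make this concrete, here is a minimal NumPy sketch (my own illustration, not part of the original post) that carries out the construction above on a random rank-2 matrix and checks that $P^{-1} M Q$ is the block-identity normal form $D$; the greedy column selection plays the role of the inductive choice of the $x_{i}$, and the kernel of $M$ supplies the completion of that basis.

```python
# Numerical check of the warm-up claim: any rank-r matrix M equals P D Q^{-1},
# where D has an r x r identity block in the top-left corner and zeros elsewhere.
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 4, 6, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))  # a rank-2 matrix

# Greedily pick standard basis vectors x_i whose images M x_i stay independent.
cols = []
for j in range(m):
    if np.linalg.matrix_rank(M[:, cols + [j]]) == len(cols) + 1:
        cols.append(j)
assert len(cols) == np.linalg.matrix_rank(M)

X = np.eye(m)[:, cols]                # the chosen x_i, as columns
Y = M @ X                             # y_i = M x_i, a basis of the column space

# Complete the x_i by a basis of ker(M) (so the extra basis vectors map to zero),
# and complete the y_i to a basis of R^n (here via QR).
kernel = np.linalg.svd(M)[2][len(cols):].T        # columns spanning ker(M)
Q = np.hstack([X, kernel])                        # basis of R^m
orth = np.linalg.qr(np.hstack([Y, np.eye(n)]))[0][:, len(cols):]
P = np.hstack([Y, orth])                          # basis of R^n extending the y_i

D = np.linalg.inv(P) @ M @ Q
print(np.round(D, 10))   # identity block in the top-left corner, zeros elsewhere
```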
Conceptually, the question we’ve asked is: what does a linear transformation $T : X \to Y$ between vector spaces “look like,” when we don’t restrict ourselves to picking a particular basis of $X$ or $Y$? The answer, stated in a basis-independent form, is the following. First, we can factor $T$ as a composite
$$X \stackrel{p}{\to} \mathrm{im}(T) \stackrel{i}{\to} Y$$
where $\mathrm{im}(T)$ is the image of $T$. Next, we can find direct sum decompositions $X \cong \mathrm{im}(T) \oplus X'$ and $Y \cong \mathrm{im}(T) \oplus Y'$ such that $p$ is the projection of $X$ onto its first factor and $i$ is the inclusion of the first factor into $Y$. Hence every linear transformation “looks like” a composite
$$\mathrm{im}(T) \oplus X' \stackrel{p_{\mathrm{im}(T)}}{\to} \mathrm{im}(T) \stackrel{i_{\mathrm{im}(T)}}{\to} \mathrm{im}(T) \oplus Y'$$
of a projection onto a direct summand and an inclusion of a direct summand. So the only basis-independent information contained in $T$ is the dimension of the image $\mathrm{im}(T)$, or equivalently the rank of $T$. (It’s worth considering the analogous question for functions between sets, whose answer is a bit more complicated.)
The actual problem this blog post is about is more interesting: it is to classify $n \times m$ matrices $M \in \mathbb{R}^{n \times m}$ up to orthogonal change of basis in both the source and the target. In other words, we now want to understand the equivalence classes of the equivalence relation given by
$$M \sim N \Leftrightarrow M = U N V^{-1}, \quad U \in O(n),\ V \in O(m)$$
Conceptually, we’re now asking: what does a linear transformation $T : X \to Y$ between finite-dimensional Hilbert spaces “look like”?
Inventing singular value decomposition
As before, we’ll answer this question by picking bases with respect to which $M$ is as easy to understand as possible, only this time we need to deal with the additional restriction of choosing orthonormal bases. We will follow roughly the same inductive strategy as before. For starters, we would like to pick a unit vector $v_{1} \in \mathbb{R}^{m}$, $\|v_{1}\| = 1$, such that $M v_{1} \neq 0$; this is possible unless $M$ is identically zero, in which case there’s not much to say. Now, there’s no guarantee that $M v_{1}$ will be a unit vector, but we can always use
$$u_{1} = \frac{M v_{1}}{\|M v_{1}\|} \in \mathbb{R}^{n}$$
as the beginning of an orthonormal basis of $\mathbb{R}^{n}$. The question remains which of the many possible values of $v_{1}$ to use. In the previous argument it didn’t matter because they were all related by change of coordinates, but now it very much does, because the length $\|M v_{1}\|$ may differ for different choices of $v_{1}$. A natural choice is to pick $v_{1}$ so that $\|M v_{1}\|$ is as large as possible (hence equal to the operator norm (https://en.wikipedia.org/wiki/Operator_norm) $\|M\|$ of $M$); writing $\sigma_{1} = \|M v_{1}\|$, we then have
$$M v_{1} = \sigma_{1} u_{1}, \quad \|v_{1}\| = \|u_{1}\| = 1$$
$\sigma_{1}$ is called the first singular value of $M$, $v_{1}$ is called its first right singular vector, and $u_{1}$ is called its first left singular vector. (The singular vectors aren’t unique in general, but we’ll ignore this for now.) To continue building orthonormal bases we need to find a unit vector
$$v_{2} \in \mathbb{R}^{m}, \quad \|v_{2}\| = 1, \quad \langle v_{1}, v_{2} \rangle = 0$$
orthogonal to $v_{1}$ such that $M v_{2}$ is linearly independent of $M v_{1}$; this is possible unless $\mathrm{rank}(M) = 1$, in which case we’re already done and $M$ is completely describable as $M_{1} = \sigma_{1} u_{1} \otimes v_{1}^{*}$; equivalently, in this case we have
$$M x = M_{1} x = \sigma_{1} u_{1} \langle v_{1}, x \rangle$$
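As a quick numerical aside (my own, using NumPy; none of this appears in the original post): the maximizing $v_{1}$ is the first right singular vector returned by a standard SVD routine, $\sigma_{1}$ equals the operator norm of $M$, and the rank-one matrix $\sigma_{1} u_{1} \otimes v_{1}^{*}$ acts by exactly the formula just displayed.

```python
# Sanity check: max of ||M v|| over unit vectors v is the operator norm of M,
# attained at the first right singular vector v_1, and the rank-one matrix
# sigma_1 u_1 v_1^T acts by x |-> sigma_1 u_1 <v_1, x>.
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(M)
v1, u1, sigma1 = Vt[0], U[:, 0], s[0]

# sigma_1 = ||M v_1|| = operator norm of M, and no random unit vector beats it.
assert np.isclose(np.linalg.norm(M @ v1), np.linalg.norm(M, 2))
samples = rng.standard_normal((3, 10_000))
samples /= np.linalg.norm(samples, axis=0)
assert np.linalg.norm(M @ samples, axis=0).max() <= sigma1 + 1e-9

# The rank-one term sigma_1 u_1 v_1^T acts by the displayed formula.
M1 = sigma1 * np.outer(u1, v1)
x = rng.standard_normal(3)
assert np.allclose(M1 @ x, sigma1 * u1 * np.dot(v1, x))
```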
We’ll pick $v_{2}$ using the same strategy as before: we want the value of $v_{2}$ such that $\|M v_{2}\|$ is as large as possible. Note that since $M_{1} v_{2} = 0$, this is equivalent to finding the value of $v_{2}$ such that $\|(M - M_{1}) v_{2}\|$ is as large as possible. Call this largest possible value $\sigma_{2}$ and write
$$M v_{2} = \sigma_{2} u_{2}, \quad \|v_{2}\| = \|u_{2}\| = 1$$
At this point we are in trouble unless $\langle u_{1}, u_{2} \rangle = 0$; if this weren’t the case then our strategy would fail to actually build an orthonormal basis of $\mathbb{R}^{n}$. Very importantly, this turns out to be the case.
Key lemma #1: Suppose $v_{1}$ is a unit vector maximizing $\|M v_{1}\|$. Let $v$ be a unit vector orthogonal to $v_{1}$. Then $M v$ is also orthogonal to $M v_{1}$.
Proof. Consider the function
$$f(t) = \| M(v_{1} \cos t + v \sin t) \|^{2}.$$
The vectors $v_{1} \cos t + v \sin t$ are all unit vectors since $v_{1}, v$ are orthonormal, so by construction (of $v_{1}$) this function is maximized when $t = 0$. In particular, its derivative at $t = 0$ is zero. On the other hand, we can expand $f(t)$ out using dot products as
$$f(t) = \|M v_{1}\|^{2} \cos^{2} t + 2 \langle M v_{1}, M v \rangle \cos t \sin t + \|M v\|^{2} \sin^{2} t.$$
Now we can compute the first-order Taylor series expansion of this function around $t = 0$, giving
$$f(t) = \|M v_{1}\|^{2} + 2 t \langle M v_{1}, M v \rangle + O(t^{2})$$
so setting the first derivative at $t = 0$ to zero gives $2 \langle M v_{1}, M v \rangle = 0$, as desired.
This is the technical heart of singular value decomposition, so it’s worth understanding in some detail. Michael Nielsen (http://cognitivemedium.com/emm/emm.html) has a very nice interactive demo/explanation of this. Geometrically, the points $M v_{1} \cos t + M v \sin t$ trace out an ellipse centered at the origin, and by hypothesis $M v_{1}$ describes the semimajor axis of the ellipse: the point furthest away from the origin. As we move away from $t = 0$, to first order we are moving slightly in the direction of $M v$, and so if $M v$ were not orthogonal to $M v_{1}$ it would be possible to move slightly further away from the origin than $M v_{1}$ by moving either in the positive or negative $t$ direction, depending on whether the angle between $M v_{1}$ and $M v$ is greater than or less than $90^{\circ}$. The only way to ensure that moving in the direction of $M v$ does not, to first order, get us further away from the origin is if $M v$ is orthogonal to $M v_{1}$.
Note that this gives a proof that the semiminor axis of an ellipse – the point closest to the origin – is always orthogonal to its semimajor axis. We can think of key lemma #1 above as more or less being equivalent to this fact, also known as the principal axis theorem (https://en.wikipedia.org/wiki/Principal_axis_theorem) in the plane, which is also closely related to but slightly weaker than the spectral theorem for symmetric matrices.
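Here is a small NumPy check of key lemma #1 (my own illustration, not from the post). It takes $v_{1}$ to be the first right singular vector, which maximizes $\|M v\|$ over unit vectors, and verifies that the image of a unit vector orthogonal to $v_{1}$ is orthogonal to $M v_{1}$:

```python
# Numerical check of key lemma #1: if v_1 maximizes ||M v|| over unit vectors
# and v is a unit vector orthogonal to v_1, then M v is orthogonal to M v_1.
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 3))

# v_1 maximizing ||M v|| is the first right singular vector.
v1 = np.linalg.svd(M)[2][0]

# Build a random unit vector orthogonal to v_1 by projecting out the v_1 component.
v = rng.standard_normal(3)
v -= np.dot(v, v1) * v1
v /= np.linalg.norm(v)

print(np.dot(v1, v))          # ~0 by construction
print(np.dot(M @ v1, M @ v))  # ~0: the content of the lemma
```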
Thanks to key lemma #1, we can continue our construction. With $r = \mathrm{rank}(M)$ as before, we inductively produce orthonormal vectors $v_{1}, \dots, v_{r} \in \mathbb{R}^{m}$ such that $\|M v_{i}\|$ is maximized subject to the condition that $\langle v_{i}, v_{j} \rangle = 0$ for all $j \leq i - 1$, and write
$$M v_{i} = \sigma_{i} u_{i}, \quad \|u_{i}\| = 1$$
where $\sigma_{i}$ is the maximum value of $\|M v\|$ over all vectors $v$ orthogonal to $v_{1}, \dots, v_{i-1}$; note that this implies that
$$\sigma_{1} \geq \sigma_{2} \geq \cdots \geq \sigma_{r} > 0.$$
The $\sigma_{i}$ are the singular values of $M$, the $v_{i}$ are its right singular vectors, and the $u_{i}$ are its left singular vectors. Repeated application of key lemma #1 shows that the $u_{i}$ are an orthonormal basis of the column space of $M$, so the construction stops here: $M$ is identically zero on the orthogonal complement of $\mathrm{span}(v_{1}, \dots, v_{r})$, because if it weren’t then it would take a nonzero value orthogonal to $\mathrm{span}(u_{1}, \dots, u_{r})$. This means we can write $M$ as a sum
$$M = \sum_{i=1}^{r} \sigma_{i}\, u_{i} \otimes v_{i}^{*}$$
This is one version of the Singular Value Decomposition (SVD for short) of $M$, and it has the benefit of being as close to unique as possible. A more familiar version of SVD is obtained by completing the $v_{i}$ and $u_{i}$ to orthonormal bases of $\mathbb{R}^{m}$ and $\mathbb{R}^{n}$ (necessarily highly non-unique in general). With respect to these bases, $M$ takes, similar to the warm-up, a block form where the top left block is the diagonal matrix with entries $\sigma_{1}, \dots, \sigma_{r}$ and the remaining blocks are zero. Hence we can write $M$ as a product
$$M = U \Sigma V^{T}, \quad U \in O(n),\ V \in O(m)$$
where $\Sigma$ has the above block form, $U$ has columns given by $u_{1}, \dots, u_{n}$, and $V$ has columns given by $v_{1}, \dots, v_{m}$.
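Both forms are easy to check numerically. The sketch below (my own illustration, not part of the original post) uses NumPy's `np.linalg.svd` to verify the rank-one-sum form and the $M = U \Sigma V^{T}$ form on a random matrix:

```python
# The two forms of the SVD derived above: the sum of rank-1 terms, and M = U Sigma V^T.
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 6
M = rng.standard_normal((n, m))

U, s, Vt = np.linalg.svd(M)          # U: n x n, s: min(n, m) singular values, Vt: m x m
r = np.sum(s > 1e-12)                # numerical rank

# M = sum_i sigma_i u_i v_i^*, summing over the r nonzero singular values.
M_sum = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(r))
assert np.allclose(M, M_sum)

# M = U Sigma V^T, with Sigma the n x m matrix carrying the sigma_i on its diagonal.
Sigma = np.zeros((n, m))
Sigma[:len(s), :len(s)] = np.diag(s)
assert np.allclose(M, U @ Sigma @ Vt)
```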
So, stepping back a bit: what have we learned about what a linear transformation $T : X \to Y$ between Hilbert spaces looks like? Up to orthogonal change of basis, we’ve learned that they all look like “weighted projections”: we are almost projecting onto the image as in the warm-up, except with weights given by the singular values $\sigma_{i}$ to account for changes in length. The only orthogonal-basis-independent information contained in a linear transformation turns out to be its singular values.
Looking for more analogies between singular value decomposition and the warm-up, we might think of the singular values as a quantitative refinement of the rank, since there are $r$ of them where $r$ is the rank, and if some of them are small then $T$ is close (in the operator norm) to a linear transformation having lower rank.
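To see the "close in the operator norm" statement concretely, here is a short NumPy sketch (my own, not from the post): dropping all but the $k$ largest singular values leaves a rank-$k$ matrix whose operator-norm distance from $M$ is exactly $\sigma_{k+1}$.

```python
# Truncating the SVD after k terms gives a rank-k matrix whose distance from M
# in the operator norm is the (k+1)-st singular value.
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(M)

k = 2
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]     # keep only the k largest singular values

print(np.linalg.matrix_rank(M_k))            # k
print(np.linalg.norm(M - M_k, 2), s[k])      # equal: the (k+1)-st singular value
```

(This is the content of the Eckart–Young theorem: the truncated sum is in fact the best rank-$k$ approximation of $M$ in the operator norm.)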
Geometrically, one way to describe the answer provided by singular value decomposition to the question “what does a linear transformation look like” is that the key to understanding $T$ is to understand what it does to the unit sphere of $X$. The image of the unit sphere is an $r$-dimensional ellipsoid, and its principal axes have directions given by the left singular vectors $u_{i}$ and lengths given by the singular values $\sigma_{i}$. The right singular vectors $v_{i}$ map to these principal axes.
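The $2 \times 2$ case already shows the whole picture. The NumPy sketch below (my own illustration) pushes the unit circle through a random matrix and checks that the farthest and nearest points of the image ellipse have lengths $\sigma_{1}$ and $\sigma_{2}$, and that each right singular vector $v_{i}$ lands exactly on the principal axis $\sigma_{i} u_{i}$:

```python
# The image of the unit circle under a 2 x 2 matrix is an ellipse whose principal
# semi-axes are sigma_i * u_i, and the right singular vectors map onto those axes.
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((2, 2))
U, s, Vt = np.linalg.svd(M)

theta = np.linspace(0, 2 * np.pi, 2000)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # points on the unit circle
image = M @ circle                                   # the image ellipse

# Farthest image point has length sigma_1 (semimajor axis), nearest has sigma_2.
lengths = np.linalg.norm(image, axis=0)
print(lengths.max(), s[0])
print(lengths.min(), s[1])

# The right singular vectors map exactly onto the principal axes sigma_i u_i.
for i in range(2):
    assert np.allclose(M @ Vt[i], s[i] * U[:, i])
```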
Due to CSDN's length limit, to be continued…
- Linear Algebra · SVD | Annoying Precision, part 2 (CSDN blog)
https://blog.csdn.net/u013669912/article/details/152056758
via:
- Singular value decomposition | Annoying Precision
https://qchu.wordpress.com/2017/03/13/singular-value-decomposition/