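Softmax Derivative
In practice, PyTorch performs backpropagation (BP) automatically; the derivation here is only to satisfy my own perfectionism.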
A
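Part A proves the softmax derivative and the softmax backpropagation step.
I originally meant to type the formulas out by hand but decided against it; the references section gives the formulas in LaTeX.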
A.1
Softmax derivative
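The derivation for this subsection is not reproduced in the text. A sketch of how it can go, assuming denominator layout and the definition \(\operatorname{softmax}(\vec{x})=\exp(\vec{x}) / (\vec{1}^{\top}\exp(\vec{x}))\), applies the product rule (B.15) together with derivations (1) and (2) from Part B:

\[
\begin{aligned}
\frac{\partial \operatorname{softmax}(\vec{x})}{\partial \vec{x}}
&=\frac{\partial}{\partial \vec{x}}\left(\frac{1}{\vec{1}^{\top} \exp (\vec{x})} \exp (\vec{x})\right) \\
&=\frac{1}{\vec{1}^{\top} \exp (\vec{x})} \frac{\partial \exp (\vec{x})}{\partial \vec{x}}+\frac{\partial}{\partial \vec{x}}\left(\frac{1}{\vec{1}^{\top} \exp (\vec{x})}\right) \exp (\vec{x})^{\top} \\
&=\frac{\operatorname{diag}(\exp (\vec{x}))}{\vec{1}^{\top} \exp (\vec{x})}-\frac{\exp (\vec{x}) \exp (\vec{x})^{\top}}{\left(\vec{1}^{\top} \exp (\vec{x})\right)^{2}} \\
&=\operatorname{diag}(\operatorname{softmax}(\vec{x}))-\operatorname{softmax}(\vec{x}) \operatorname{softmax}(\vec{x})^{\top}
\end{aligned}
\]

The second line is B.15 with \(y=1 / (\vec{1}^{\top}\exp(\vec{x}))\) and \(\vec{z}=\exp(\vec{x})\); the third line uses (1) for the first term and the chain rule with (2) for the second.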
A.2
Softmax gradient descent
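A minimal sketch of what one gradient-descent step through softmax can look like. The text does not name a loss, so this assumes a cross-entropy loss with a one-hot target, for which the gradient with respect to the logits is softmax(x) - y. The logits, target, and learning rate below are hypothetical values chosen only for illustration.

```python
import math

def softmax(x):
    """Softmax of a list of logits (max subtracted for numerical stability)."""
    m = max(x)
    e = [math.exp(t - m) for t in x]
    total = sum(e)
    return [t / total for t in e]

def cross_entropy(x, y):
    """Assumed loss: L = -sum_i y_i * log(softmax(x)_i)."""
    return -sum(yi * math.log(si) for yi, si in zip(y, softmax(x)))

def grad_logits(x, y):
    """Analytic gradient of the assumed loss: dL/dx = softmax(x) - y."""
    return [si - yi for si, yi in zip(softmax(x), y)]

x = [1.0, 2.0, 0.5]  # hypothetical logits
y = [0.0, 0.0, 1.0]  # hypothetical one-hot target
lr = 0.1             # hypothetical learning rate

# Check the analytic gradient against central differences.
eps = 1e-6
numeric = []
for i in range(len(x)):
    xp, xm = list(x), list(x)
    xp[i] += eps
    xm[i] -= eps
    numeric.append((cross_entropy(xp, y) - cross_entropy(xm, y)) / (2 * eps))
max_err = max(abs(a - b) for a, b in zip(grad_logits(x, y), numeric))

# One gradient-descent step applied directly to the logits.
loss_before = cross_entropy(x, y)
x_new = [xi - lr * gi for xi, gi in zip(x, grad_logits(x, y))]
loss_after = cross_entropy(x_new, y)
print(max_err, loss_before, loss_after)
```

In a real network the logits are produced by earlier layers, so this gradient would be propagated further back rather than applied to x directly.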
B
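Almost all of this is borrowed from others' work; the citations and references are listed here.
References:
- 矩阵求导术(下), 知乎 (zhihu.com)
- nndl
The theorems cited are B.15 and B.26: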
(B.15)
\[
\begin{aligned}
& \vec{x} \in \mathbb{R}^{M \times 1},\ y \in \mathbb{R},\ \vec{z} \in \mathbb{R}^{N \times 1}, \quad \text{then:} \\
& \frac{\partial y \vec{z}}{\partial \vec{x}}=y \frac{\partial \vec{z}}{\partial \vec{x}}+\frac{\partial y}{\partial \vec{x}} \cdot \vec{z}^{\top} \in \mathbb{R}^{M \times N}
\end{aligned}
\]
\[
\begin{aligned}
\text{[Proof]:}\quad d(y \vec{z}) &=d y \cdot \vec{z}+y \cdot d \vec{z} \\
&=\vec{z} \cdot d y+y \cdot d \vec{z} \\
&=\vec{z} \cdot\left(\frac{\partial y}{\partial \vec{x}}\right)^{\top} d \vec{x}+y \cdot\left(\frac{\partial \vec{z}}{\partial \vec{x}}\right)^{\top} d \vec{x} \\
\therefore \frac{\partial y \vec{z}}{\partial \vec{x}} &=y \cdot \frac{\partial \vec{z}}{\partial \vec{x}}+\frac{\partial y}{\partial \vec{x}} \cdot \vec{z}^{\top}
\end{aligned}
\]
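The product rule B.15 can be sanity-checked numerically. The sketch below compares both sides using central differences in denominator layout (entry (i, j) is the derivative of output j with respect to input i). The functions `y` and `z` are hypothetical test functions, not taken from the references.

```python
import math

def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian, denominator layout: J[i][j] = d f_j / d x_i."""
    rows = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        rows.append([(a - b) / (2 * eps) for a, b in zip(f(xp), f(xm))])
    return rows

# Hypothetical test functions with M = 2 inputs and N = 3 outputs.
def y(x):
    return x[0] * x[1]                     # scalar y(x)

def z(x):
    return [math.sin(x[0]), x[1] ** 2, x[0] + x[1]]  # vector z(x)

x = [0.7, -1.3]
M, N = 2, 3

# Left-hand side: Jacobian of the product y(x) * z(x), shape M x N.
lhs = jacobian(lambda v: [y(v) * zj for zj in z(v)], x)

# Right-hand side: y * dz/dx + (dy/dx) z^T.
dz = jacobian(z, x)                                     # M x N
dy = [row[0] for row in jacobian(lambda v: [y(v)], x)]  # length M
zx = z(x)
rhs = [[y(x) * dz[i][j] + dy[i] * zx[j] for j in range(N)] for i in range(M)]

max_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(M) for j in range(N))
print(max_err)
```

A small `max_err` only indicates agreement at this one point; it is a check on the formula's shape and signs, not a proof.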
(B.26)
\[
\begin{aligned}
& \vec{x} \in \mathbb{R}^{n}, \quad \vec{f}(\vec{x})=\left[f\left(x_1\right), f\left(x_2\right), \ldots, f\left(x_n\right)\right] \in \mathbb{R}^{n}, \quad \text{then:} \\
& \frac{\partial \vec{f}(\vec{x})}{\partial \vec{x}}=\operatorname{diag}\left(\vec{f}^{\prime}(\vec{x})\right)
\end{aligned}
\]
\[
\begin{aligned}
& \text{[Proof]: writing } f_j=f\left(x_j\right), \text{ so that } \frac{\partial f_j}{\partial x_i}=0 \text{ for } i \neq j, \\
& \frac{\partial \vec{f}(\vec{x})}{\partial \vec{x}}=\left[\begin{array}{cccc}
\frac{\partial f_1}{\partial x_1} & \frac{\partial f_2}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_1} \\
\vdots & \vdots & & \vdots \\
\frac{\partial f_1}{\partial x_n} & \frac{\partial f_2}{\partial x_n} & \cdots & \frac{\partial f_n}{\partial x_n}
\end{array}\right]=\left[\begin{array}{cccc}
f^{\prime}\left(x_1\right) & & & \\
& f^{\prime}\left(x_2\right) & & \\
& & \ddots & \\
& & & f^{\prime}\left(x_n\right)
\end{array}\right]=\operatorname{diag}\left(\vec{f}^{\prime}(\vec{x})\right)
\end{aligned}
\]
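As a quick numerical illustration of B.26, the sketch below builds the Jacobian of an element-wise function by central differences and compares it with diag(f'(x)). The choice of `tanh` as the element-wise function and the input values are assumptions made only for this example.

```python
import math

def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian, denominator layout: J[i][j] = d f_j / d x_i."""
    rows = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        rows.append([(a - b) / (2 * eps) for a, b in zip(f(xp), f(xm))])
    return rows

def f_prime(t):
    """Derivative of tanh: 1 - tanh(t)^2."""
    return 1.0 - math.tanh(t) ** 2

x = [0.5, -1.0, 2.0]  # hypothetical input
n = len(x)

# Jacobian of the element-wise map x -> [tanh(x_1), ..., tanh(x_n)].
J = jacobian(lambda v: [math.tanh(t) for t in v], x)

# Expected result from B.26: a diagonal matrix holding f'(x_i).
expected = [[f_prime(x[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]

max_err = max(abs(J[i][j] - expected[i][j]) for i in range(n) for j in range(n))
off_diag = max(abs(J[i][j]) for i in range(n) for j in range(n) if i != j)
print(max_err, off_diag)
```

The off-diagonal entries vanish because perturbing x_i leaves every f(x_j) with j ≠ i unchanged.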
Two derivations that Part A relies on:
(1)
\[
\begin{aligned}
& \vec{x} \in \mathbb{R}^{n}, \quad \exp (\vec{x})=\left[\begin{array}{c}
\exp \left(x_1\right) \\
\vdots \\
\exp \left(x_n\right)
\end{array}\right] \in \mathbb{R}^{n} \\
& \text{so the partial derivative is: } \frac{\partial \exp (\vec{x})}{\partial \vec{x}}=\left[\begin{array}{ccc}
\frac{\partial \exp \left(x_1\right)}{\partial x_1} & \cdots & \frac{\partial \exp \left(x_n\right)}{\partial x_1} \\
\vdots & & \vdots \\
\frac{\partial \exp \left(x_1\right)}{\partial x_n} & \cdots & \frac{\partial \exp \left(x_n\right)}{\partial x_n}
\end{array}\right]=\operatorname{diag}(\exp (\vec{x}))
\end{aligned}
\]
(2)
\[
\begin{aligned}
d\left(\vec{1}^{\top} \exp (\vec{x})\right) &=\vec{1}^{\top} d \exp (\vec{x}) \\
&=\vec{1}^{\top}\left(\exp ^{\prime}(\vec{x}) \odot d \vec{x}\right) \\
&=\left(\vec{1} \odot \exp ^{\prime}(\vec{x})\right)^{\top} d \vec{x} \\
\text{hence: } \frac{\partial \vec{1}^{\top} \exp (\vec{x})}{\partial \vec{x}} &=\vec{1} \odot \exp ^{\prime}(\vec{x})=\exp ^{\prime}(\vec{x})=\exp (\vec{x})
\end{aligned}
\]
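Derivation (2) and the softmax Jacobian that results from combining B.15 with (1) and (2) can both be checked numerically. The sketch below assumes the standard closed form diag(s) - s sᵀ with s = softmax(x), and uses a hypothetical input vector.

```python
import math

def softmax(x):
    """Softmax of a list of logits (max subtracted for numerical stability)."""
    m = max(x)
    e = [math.exp(t - m) for t in x]
    total = sum(e)
    return [t / total for t in e]

def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian, denominator layout: J[i][j] = d f_j / d x_i."""
    rows = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        rows.append([(a - b) / (2 * eps) for a, b in zip(f(xp), f(xm))])
    return rows

x = [1.0, 2.0, 0.5]  # hypothetical input
n = len(x)

# Derivation (2): the gradient of 1^T exp(x) should equal exp(x).
grad_sum = [row[0] for row in jacobian(lambda v: [sum(math.exp(t) for t in v)], x)]
err_sum = max(abs(g - math.exp(t)) for g, t in zip(grad_sum, x))

# Softmax Jacobian: compare finite differences with diag(s) - s s^T.
s = softmax(x)
J = jacobian(softmax, x)
closed = [[(s[i] if i == j else 0.0) - s[i] * s[j] for j in range(n)] for i in range(n)]
err_J = max(abs(J[i][j] - closed[i][j]) for i in range(n) for j in range(n))
print(err_sum, err_J)
```

Because diag(s) - s sᵀ is symmetric, the check does not distinguish numerator from denominator layout for softmax itself.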
C
My understanding may be one-sided.