Softmax is a multi-class classifier: it computes the probability that the object being predicted belongs to each class.
$$
y_i = S(\boldsymbol{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}, \quad i = 1, \dots, C
$$
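The definition above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `softmax` and the sample scores are assumptions, and subtracting `max(z)` before exponentiating is a common numerical-stability trick that is not part of the formula itself (the common factor cancels in the ratio, so the result is unchanged).

```python
import math

def softmax(z):
    """Return y_i = exp(z_i) / sum_j exp(z_j) for a list of scores z."""
    # Assumption (not in the formula): shift by max(z) so exp() cannot overflow.
    # exp(z_i - m) / sum_j exp(z_j - m) equals the original ratio.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for C = 3 classes.
y = softmax([1.0, 2.0, 3.0])
print(y)       # three probabilities, increasing with the score
print(sum(y))  # sums to 1 up to floating-point rounding
```

Larger scores map to larger probabilities, and the outputs always sum to 1, which is what allows them to be read as class probabilities.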
Here $\boldsymbol{z}$ is the input to softmax, with dimension $C$. In the computation graph between the variables, the gradients of $\boldsymbol{y}$, $\frac{\partial l}{\partial y_i},\ i = 1, \dots, C$, are known, and we need to compute the gradients of $\boldsymbol{z}$, $\frac{\partial l}{\partial z_j},\ j = 1, \dots, C$.
The computation graph shows that each component $z_j$ of $\boldsymbol{z}$ contributes to every component of $\boldsymbol{y}$, so by the chain rule:
$$
\frac{\partial l}{\partial z_j} = \sum_{i=1}^{C} \frac{\partial l}{\partial y_i} \frac{\partial y_i}{\partial z_j}
$$
Since $\frac{\partial l}{\partial y_i}$ is known, all that remains is to compute $\frac{\partial y_i}{\partial z_j}$.
For convenience, write $\sum_{j=1}^{C} e^{z_j}$ as $\sum_C$.
(1) When $i = j$, the quotient rule gives:
$$
\begin{aligned}
\frac{\partial y_i}{\partial z_j} &= \frac{e^{z_i}\sum_C - e^{z_i}e^{z_i}}{{\sum_C}^2} \\
&= \frac{e^{z_i}}{\sum_C} - \left(\frac{e^{z_i}}{\sum_C}\right)^2 \\
&= y_i - y_i^2 \\
&= y_i(1 - y_i)
\end{aligned}
$$
(2) When $i \neq j$, the numerator $e^{z_i}$ does not depend on $z_j$, so its derivative is zero:
$$
\begin{aligned}
\frac{\partial y_i}{\partial z_j} &= \frac{0 \cdot \sum_C - e^{z_i}e^{z_j}}{{\sum_C}^2} \\
&= -\frac{e^{z_i}}{\sum_C} \cdot \frac{e^{z_j}}{\sum_C} \\
&= -y_i y_j
\end{aligned}
$$
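Substituting the two cases into the chain-rule sum gives $\frac{\partial l}{\partial z_j} = y_j\left(\frac{\partial l}{\partial y_j} - \sum_{i=1}^{C} \frac{\partial l}{\partial y_i} y_i\right)$. The following sketch checks the two cases numerically and applies this combined form. It is a plain-Python illustration under stated assumptions, not a reference implementation: the helper names (`softmax_jacobian`, `softmax_backward`, `numeric_jacobian`), the sample input `z`, and the upstream gradient `grad_y` are all made up for the example.

```python
import math

def softmax(z):
    """y_i = exp(z_i) / sum_j exp(z_j), shifted by max(z) for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(y):
    """J[i][j] = dy_i/dz_j: y_i(1 - y_i) if i == j, else -y_i * y_j."""
    C = len(y)
    return [[y[i] * (1.0 - y[i]) if i == j else -y[i] * y[j]
             for j in range(C)] for i in range(C)]

def softmax_backward(y, grad_y):
    """dl/dz_j = y_j * (dl/dy_j - sum_i dl/dy_i * y_i)."""
    dot = sum(g * yi for g, yi in zip(grad_y, y))
    return [yj * (gj - dot) for yj, gj in zip(y, grad_y)]

def numeric_jacobian(z, eps=1e-6):
    """Central finite differences, used only to check the analytic result."""
    C = len(z)
    J = [[0.0] * C for _ in range(C)]
    for j in range(C):
        z_plus, z_minus = list(z), list(z)
        z_plus[j] += eps
        z_minus[j] -= eps
        y_plus, y_minus = softmax(z_plus), softmax(z_minus)
        for i in range(C):
            J[i][j] = (y_plus[i] - y_minus[i]) / (2 * eps)
    return J

# Hypothetical input and upstream gradient dl/dy, chosen arbitrarily.
z = [0.5, -1.0, 2.0]
grad_y = [0.1, -0.3, 0.7]

y = softmax(z)
J = softmax_jacobian(y)
J_num = numeric_jacobian(z)
max_err = max(abs(J[i][j] - J_num[i][j])
              for i in range(len(z)) for j in range(len(z)))
grad_z = softmax_backward(y, grad_y)

print("max |analytic - numeric| =", max_err)
print("dl/dz =", grad_z)
```

The combined form in `softmax_backward` avoids building the full $C \times C$ Jacobian, so the backward pass costs $O(C)$ rather than $O(C^2)$.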