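Softmax Regression (SR) can be viewed as the generalization of Logistic Regression (LR) to multi-class classification, where the class label y can take k ≥ 2 distinct values.
Suppose the sample set is: $\left \{ \left ( X^{(1)},y ^{(1)} \right ) ,\left ( X^{(2)},y ^{(2)} \right ),\left ( X^{(3)},y ^{(3)} \right ),\ldots,\left ( X^{(m)},y ^{(m)} \right )\right \}$
For the SR algorithm, the input features are $ X^{(i)} \in \mathbb{R}^{n+1}$ and the class labels are $y^{(i)} \in \{ 1,2,\ldots,k \}$. For each sample, the hypothesis function estimates the probability $P(y=j|X)$ of belonging to each class $j$. Specifically, the hypothesis function is: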
$h_{\theta}(X^{(i)}) =\begin{bmatrix}
P(y^{(i)}=1|X^{(i)};\theta)\\
P(y^{(i)}=2|X^{(i)};\theta)\\
\vdots\\
P(y^{(i)}=k|X^{(i)};\theta)
\end{bmatrix} = \frac{1}{\sum _{j=1}^{k}e^{\theta_j^TX^{(i)}}}\begin{bmatrix}
e^{\theta_1^TX^{(i)}}\\
e^{\theta_2^TX^{(i)}}\\
\vdots\\
e^{\theta_k^TX^{(i)}}
\end{bmatrix}$
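where $\theta$ denotes the collection of parameter vectors $\theta_1, \theta_2, \ldots, \theta_k$, with each $\theta_j \in \mathbb{R}^{n+1}$. The estimated probability that a sample belongs to class $j$ is therefore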
$P(y^{(i)}=j|X^{(i)};\theta) = \frac{e^{\theta_j^TX^{(i)}}}{\sum _{l=1}^{k}e^{\theta_l^TX^{(i)}}}$
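The probability formula above can be sketched in NumPy as follows. This is a minimal illustration rather than the implementation used later in this post: the function name `softmax_probs` and the layout of `theta` (one column per class) are assumptions made for the example. Subtracting the row maximum before exponentiating leaves the ratio unchanged but avoids overflow for large scores.

```python
import numpy as np

def softmax_probs(theta, X):
    """Return the m x k matrix whose (i, j) entry is P(y = j | X^(i); theta).

    theta: (n+1, k) array, column j holds theta_j (assumed layout).
    X:     (m, n+1) array, row i holds X^(i).
    """
    scores = X @ theta                                    # scores[i, j] = theta_j^T X^(i)
    scores = scores - scores.max(axis=1, keepdims=True)   # shift per row; the ratio is unchanged
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Made-up numbers, only to exercise the function.
X = np.array([[1.0, 2.0, -1.0],
              [1.0, 0.5, 3.0]])
theta = np.array([[0.1, -0.2, 0.0],
                  [0.4, 0.3, -0.5],
                  [-0.3, 0.2, 0.1]])
P = softmax_probs(theta, X)
print(P.sum(axis=1))   # each row sums to 1, up to rounding
```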
The loss function of SR is:
$J(\theta) = -\frac{1}{m} \left [\sum_{i=1}^{m} \sum_{j=1}^{k} I \{ y^{(i)}=j \} \log \frac{e^{\theta_j^TX^{(i)}}}{\sum _{l=1}^{k}e^{\theta_l^TX^{(i)}}} \right ]$
where $I(x) = \left\{\begin{matrix}
0 & \text{if}\;\;x = \text{false}\\
1 & \text{if}\;\;x = \text{true}
\end{matrix}\right.$ is the indicator function.
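As a rough sketch of how this loss can be evaluated, the indicator simply selects the log-probability of the true class for each sample. The function name `softmax_loss` is hypothetical, and the labels here are assumed to be zero-based (0 to k-1) so that they can index array columns directly, unlike the 1 to k convention in the formulas.

```python
import numpy as np

def softmax_loss(theta, X, y):
    """J(theta) for integer labels y in {0, ..., k-1} (zero-based, an assumption)."""
    m = X.shape[0]
    scores = X @ theta
    scores = scores - scores.max(axis=1, keepdims=True)
    # log of the softmax probabilities, computed without forming the ratio explicitly
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # I{y^(i) = j} keeps only the entry of the true class in each row
    return -log_probs[np.arange(m), y].mean()

X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([0, 2, 1])
theta = np.zeros((2, 3))
print(softmax_loss(theta, X, y))   # with theta = 0 every class has probability 1/3, so J = log(3)
```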
This loss function can be minimized with gradient descent.
First, compute the gradient with respect to the parameters:
$\frac{\partial J(\theta )}{\partial \theta _j} = -\frac{1}{m}\left [ \sum_{i=1}^{m}\triangledown _{\theta_j}\left \{ \sum_{c=1}^{k}I \{ y^{(i)}=c \} \log\frac{e^{\theta_c^TX^{(i)}}}{\sum _{l=1}^{k}e^{\theta_l^TX^{(i)}}} \right \} \right ]$
For a sample with $y^{(i)}=j$, the indicator keeps only the class-$j$ term, and the gradient of the expression in braces with respect to $\theta_j$ is $\left ( 1-\frac{e^{\theta_j^TX^{(i)}}}{\sum _{l=1}^{k}e^{\theta_l^TX^{(i)}}} \right )X^{(i)}$
For a sample with $y^{(i)}\neq j$, $\theta_j$ appears only in the denominator of the surviving term, and the gradient of the expression in braces with respect to $\theta_j$ is $\left (-\frac{e^{\theta_j^TX^{(i)}}}{\sum _{l=1}^{k}e^{\theta_l^TX^{(i)}}} \right )X^{(i)}$
Combining the two cases with the indicator function, the final result is:
$g(\theta_j) = \frac{\partial J(\theta )}{\partial \theta _j} = -\frac{1}{m}\sum_{i=1}^{m}\left [X^{(i)} \cdot \left ( I\left \{ y^{(i)}=j \right \}-P( y^{(i)}=j|X^{(i)};\theta) \right ) \right ]$
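One way to gain confidence in this gradient is to compare it against central finite differences of the loss. The sketch below does that on random data. The helper names `probs`, `loss` and `grad`, the zero-based labels, and the one-column-per-class layout of `theta` are all assumptions made for the example.

```python
import numpy as np

def probs(theta, X):
    s = X @ theta
    s = s - s.max(axis=1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=1, keepdims=True)

def loss(theta, X, y):
    m = X.shape[0]
    return -np.log(probs(theta, X)[np.arange(m), y]).mean()

def grad(theta, X, y):
    """Column j is g(theta_j) = -(1/m) * sum_i X^(i) * (I{y^(i)=j} - P(y^(i)=j | X^(i)))."""
    m, k = X.shape[0], theta.shape[1]
    indicator = np.eye(k)[y]              # indicator[i, j] = I{y^(i) = j}
    return -(X.T @ (indicator - probs(theta, X))) / m

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
y = rng.integers(0, 4, size=6)            # zero-based labels, k = 4
theta = rng.normal(size=(3, 4))

# Central finite differences, one parameter at a time.
eps = 1e-6
numeric = np.zeros_like(theta)
for a in range(theta.shape[0]):
    for b in range(theta.shape[1]):
        step = np.zeros_like(theta)
        step[a, b] = eps
        numeric[a, b] = (loss(theta + step, X, y) - loss(theta - step, X, y)) / (2 * eps)

max_diff = np.abs(numeric - grad(theta, X, y)).max()
print(max_diff)   # should be tiny if the analytic gradient is right
```

Because the indicator and the probabilities each sum to 1 over the classes, the gradient columns sum to zero, which gives a second quick sanity check.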
The iterative update rule of gradient descent is:
$\theta_j = \theta_j - \alpha \cdot g(\theta_j)$
Main Python code:
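import numpy as np

def gradientAscent(feature_data, label_data, k, maxCycle, alpha):
    '''
    Fit the Softmax model by gradient descent on the loss
    (equivalently, gradient ascent on the log-likelihood)
    :param feature_data: features, matrix of shape (m, n)
    :param label_data: labels, integer matrix of shape (m, 1) with values 0..k-1
    :param k: number of classes
    :param maxCycle: maximum number of iterations
    :param alpha: learning rate
    :return: weights, matrix of shape (n, k)
    '''
    m, n = np.shape(feature_data)
    weights = np.asmatrix(np.ones((n, k)))  # n*k weights in total (np.mat was removed in NumPy 2.0)
    i = 0
    while i < maxCycle:  # run exactly maxCycle iterations
        i += 1
        err = np.exp(feature_data * weights)  # e^(theta_j^T x^(i))
        if i % 100 == 0:
            print("\t-----iter:", i, ",cost:", cost(err, label_data))
        rowsum = -err.sum(axis=1)
        rowsum = rowsum.repeat(k, axis=1)
        err = err / rowsum  # -P(y^(i) = j | x^(i); theta)
        for x in range(m):
            err[x, label_data[x, 0]] += 1  # I(y^(i) = j) - P(y^(i) = j | x^(i); theta)
        weights = weights + (alpha / m) * feature_data.T * err  # theta_j - alpha * g(theta_j)
    return weights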
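def cost(err, label_data):
    '''
    Compute the value of the loss function
    :param err: values of exp(feature_data * weights), before normalization
    :param label_data: labels
    :return: sum_cost/m: value of the loss function
    '''
    m = np.shape(err)[0]
    sum_cost = 0.0
    for i in range(m):  # range instead of the Python 2 xrange
        if err[i, label_data[i, 0]] / np.sum(err[i, :]) > 0:
            sum_cost -= np.log(err[i, label_data[i, 0]] / np.sum(err[i, :]))
        else:
            sum_cost -= 0
    return sum_cost / m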
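Once a weight matrix has been learned, predicting a class only requires the largest score, since exponentiation and row normalization do not change which entry is largest. The helper `predict` below is hypothetical (it is not part of the original code), and the weights and features are made-up numbers, not trained values.

```python
import numpy as np

def predict(weights, feature_data):
    """Hypothetical helper: index of the most probable class for each row."""
    scores = np.asarray(feature_data) @ np.asarray(weights)   # scores[i, j] = theta_j^T x^(i)
    # exp() and the per-row normalization are monotone, so argmax over scores
    # equals argmax over the softmax probabilities
    return scores.argmax(axis=1)

# Made-up (n=2, k=3) weights in the same layout that gradientAscent returns.
weights = np.array([[1.0, 0.0, -1.0],
                    [0.0, 1.0,  0.0]])
feature_data = np.array([[ 2.0, 0.0],
                         [ 0.0, 2.0],
                         [-2.0, 1.0]])
print(predict(weights, feature_data))   # [0 1 2]
```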
Sklearn code:
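import numpy as np
from sklearn.linear_model import LogisticRegressionCV
# multi_class='multinomial' selects softmax regression; the parameter is deprecated since scikit-learn 1.5, where lbfgs already uses the multinomial loss for 3 or more classes
lr = LogisticRegressionCV(fit_intercept=True, Cs=np.logspace(-5, 1, 100),
                          multi_class='multinomial', penalty='l2', solver='lbfgs', max_iter=10000, cv=7)
re = lr.fit(X_train, Y_train)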
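The snippet above leaves `X_train` and `Y_train` undefined. As a self-contained sketch, the example below fits the same kind of model on the Iris dataset bundled with scikit-learn. The dataset, the train/test split, and the smaller `Cs` and `cv` values are assumptions chosen to keep the run short, not the setup used in this post. `multi_class` is left out so the code runs on releases where that parameter is deprecated; with the `lbfgs` solver and three classes the multinomial (softmax) loss is used.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split

# Iris: 150 samples, 4 features, 3 classes (an illustrative choice of data).
X, Y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0, stratify=Y)

# 10 candidate regularization strengths, 5-fold cross-validation.
lr = LogisticRegressionCV(Cs=10, cv=5, penalty='l2', solver='lbfgs', max_iter=1000)
lr.fit(X_train, Y_train)

proba = lr.predict_proba(X_test)     # one softmax probability row per test sample
print(proba.shape)                   # (number of test samples, 3)
print(lr.score(X_test, Y_test))      # accuracy on the held-out split
```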