Logistic regression: troubleshooting a failed regularization (working code at the end)
Date: 2023-07-09

Implementing regularized logistic regression in MATLAB

The theory:
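For reference, the code below is meant to implement the usual regularized cost and gradient. The notation here is the standard one and is an assumption on my part: $m$ training examples, $h_\theta(x) = g(\theta^T x)$ with $g$ the sigmoid, $x_0 = 1$, and the bias term $\theta_0$ left unregularized. MATLAB indexes from 1, so $\theta_0$ corresponds to `theta(1)`.

```latex
J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\Big[-y^{(i)}\log h_\theta(x^{(i)}) - \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big)\Big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2

\frac{\partial J}{\partial\theta_0} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)})-y^{(i)}\big)\,x_0^{(i)}

\frac{\partial J}{\partial\theta_j} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)})-y^{(i)}\big)\,x_j^{(i)} + \frac{\lambda}{m}\theta_j \qquad (j \ge 1)
```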

My code:

function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
% J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
% theta as the parameter for regularized logistic regression and the
% gradient of the cost w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta

h = sigmoid(X*theta);
theta2=[0;theta(2:end)];

J_partial = sum((-y).*log(h)+(y-1).*log(1-h))./m;
J_regularization= (lambda/(2*m)).*sum(theta2.^2);
J = J_partial+J_regularization;

grad_partial = sum((h-y).*X)/m;
grad_regularization = lambda.*theta2./m;
grad = grad_partial+grad_regularization;

% =============================================================

end

Output:

The highlighted values differ from the expected values shown below them.

Next, I tried deleting

+grad_regularization


Some of the results now match the expected values, and some do not.
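One way to see what is going on is to inspect the shapes involved. The sketch below uses made-up data: the 4-by-3 matrix `X`, `y`, `theta`, and `lambda` are all hypothetical, and `sigmoid` is defined inline rather than taken from the course files. It assumes MATLAB R2016b or later, or Octave, where adding a row vector to a column vector expands to a full matrix instead of raising an error.

```matlab
% Hypothetical toy data: m = 4 examples, n = 3 parameters (first column is the bias)
X = [1 0.5 1.2; 1 -0.3 0.8; 1 1.1 -0.4; 1 0.2 0.1];
y = [1; 0; 1; 0];
theta = ones(3, 1);
lambda = 10;
m = length(y);

sigmoid = @(z) 1 ./ (1 + exp(-z));   % stand-in for the course's sigmoid.m
h = sigmoid(X * theta);

% sum works down each column, so the result is a 1-by-3 ROW vector
grad_partial = sum((h - y) .* X) / m;

% the regularization term is a 3-by-1 COLUMN vector
grad_regularization = lambda .* [0; theta(2:end)] ./ m;

% row + column expands to a 3-by-3 matrix, not a 3-by-1 gradient
grad = grad_partial + grad_regularization;

disp(size(grad_partial));          % 1-by-3
disp(size(grad_regularization));   % 3-by-1
disp(size(grad));                  % 3-by-3
```

MATLAB stores matrices column by column, so indexing such a matrix with `grad(1:5)` reads down the first column: `grad_partial(1)` plus each regularization entry in turn. Without the regularization term, `grad` is simply the row vector `grad_partial`, which has the right values only when the regularization term would have been zero anyway.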

Trying an expert's code

%Hypotheses
hx = sigmoid(X * theta);
%%The cost without regularization
J_partial = (-y' * log(hx) - (1 - y)' * log(1 - hx)) ./ m;
%%Regularization Cost Added
J_regularization = (lambda/(2*m)) * sum(theta(2:end).^2);
%%Cost when we add regularization
J = J_partial + J_regularization;
%Grad without regularization
grad_partial = (1/m) * (X' * (hx -y));
%%Grad Cost Added
grad_regularization = (lambda/m) .* theta(2:end);
grad_regularization = [0; grad_regularization];
grad = grad_partial + grad_regularization;

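Complete success!? I don't get it...

Comparing the two, the differences between my code and the expert's are:

the theta vector used at the start, and the number of sum calls used when computing J(theta) and grad

So I tried using the same number of sum calls as the expert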

h = sigmoid(X*theta);
theta2=[0;theta(2:end)];

J_partial = (-y).*log(h)+(y-1).*log(1-h)./m;
J_regularization= (lambda/(2*m)).*sum(theta2.^2);
J = J_partial+J_regularization;

grad_partial = (h-y).*X/m;
grad_regularization = lambda.*theta2./m;
grad = grad_partial+grad_regularization;

Result: an "incompatible sizes" error

The documentation explains this error as follows
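As a rough illustration of when this error appears, the sketch below reproduces it on hypothetical toy data (made-up sizes and values, with `sigmoid` defined inline as a stand-in for the course helper). Without `sum`, `(h-y).*X/m` stays a 4-by-3 matrix. Adding a 3-by-1 column to it is not allowed, because the first dimensions (4 and 3) differ and neither of them is 1.

```matlab
% Hypothetical toy data: m = 4 examples, n = 3 parameters
X = [1 0.5 1.2; 1 -0.3 0.8; 1 1.1 -0.4; 1 0.2 0.1];
y = [1; 0; 1; 0];
theta = ones(3, 1);
lambda = 10;
m = length(y);

sigmoid = @(z) 1 ./ (1 + exp(-z));   % stand-in for the course's sigmoid.m
h = sigmoid(X * theta);

grad_partial = (h - y) .* X / m;                          % 4-by-3, one row per example
grad_regularization = lambda .* [0; theta(2:end)] ./ m;   % 3-by-1

failed = false;
try
    grad = grad_partial + grad_regularization;   % 4-by-3 plus 3-by-1
catch err
    failed = true;
    disp(err.message);   % exact wording differs between MATLAB versions and Octave
end
```

The same reasoning applies whether the zero is prepended to `theta` or to the regularization term: the column has 3 rows either way, while `grad_partial` has 4.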

At this point, the only option was to move one step closer to the expert's code!

h = sigmoid(X*theta);

J_partial = (-y).*log(h)+(y-1).*log(1-h)./m;
J_regularization= (lambda/(2*m)).*sum(theta(2:end).^2);
J = J_partial+J_regularization;

grad_partial = (h-y).*X/m;
grad_regularization = lambda.*theta(2:end)./m;
grad_regularization2=[0;grad_regularization];

grad = grad_partial+grad_regularization2;

Why is it still incompatible?

Where exactly is the problem?

Finally, I moved one more step toward the expert's code and changed (h-y).*X/m in grad_partial to (1/m) * (X' * (h -y))

h = sigmoid(X*theta);

J_partial = (1/m).*((-y).*log(h)+(y-1).*log(1-h));
J_regularization= (lambda/(2*m)).*sum(theta(2:end).^2);
J = J_partial+J_regularization;

grad_partial = (1/m) * (X' * (h -y));
grad_regularization = (lambda/m).*theta(2:end);
grad_regularization = [0; grad_regularization];
grad = grad_partial+ grad_regularization;

Much better!
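The matrix product and the `sum` form of the unregularized gradient compute the same numbers; they differ only in orientation. A quick check on hypothetical toy data (made-up values, with `sigmoid` defined inline as a stand-in for the course helper):

```matlab
% Hypothetical toy data: m = 4 examples, n = 3 parameters
X = [1 0.5 1.2; 1 -0.3 0.8; 1 1.1 -0.4; 1 0.2 0.1];
y = [1; 0; 1; 0];
theta = ones(3, 1);
m = length(y);

sigmoid = @(z) 1 ./ (1 + exp(-z));   % stand-in for the course's sigmoid.m
h = sigmoid(X * theta);

g_row = sum((h - y) .* X) / m;       % 1-by-3 row: sum collapses the rows
g_col = (1/m) * (X' * (h - y));      % 3-by-1 column: same shape as theta

% Transpose the row version and compare entry by entry
max_diff = max(abs(g_row' - g_col));

disp(size(g_row));   % 1-by-3
disp(size(g_col));   % 3-by-1
disp(max_diff);      % zero up to floating-point rounding
```

Because `X' * (h - y)` already has the shape of `theta`, a column of regularization terms can be added to it entry by entry. Transposing the `sum` version, `sum((h-y).*X)'/m`, would be another way to get the same column.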

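But wait, why does the output above have so many rows, and why are the values wrong? Apparently I can't rely entirely on the expert's code; I still have to fix my own!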

h = sigmoid(X*theta);

J_partial = (1/m).*sum((-y).*log(h)+(y-1).*log(1-h));
J_regularization= (lambda/(2*m)).*sum(theta(2:end).^2);
J = J_partial+J_regularization;

grad_partial = (1/m) * (X' * (h -y));
grad_regularization = (lambda/m).*theta(2:end);
grad_regularization = [0; grad_regularization];
grad = grad_partial+ grad_regularization;
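The only change from the previous attempt is the `sum` in `J_partial`. A small sketch on hypothetical toy data (not the course dataset; `sigmoid` is defined inline as a stand-in for the course helper) shows what that does to the shape of `J`:

```matlab
% Hypothetical toy data: m = 4 examples, n = 3 parameters
X = [1 0.5 1.2; 1 -0.3 0.8; 1 1.1 -0.4; 1 0.2 0.1];
y = [1; 0; 1; 0];
theta = ones(3, 1);
lambda = 10;
m = length(y);

sigmoid = @(z) 1 ./ (1 + exp(-z));   % stand-in for the course's sigmoid.m
h = sigmoid(X * theta);

J_reg = (lambda/(2*m)) * sum(theta(2:end).^2);   % scalar

% Without sum: one value per training example, so J is m-by-1
J_no_sum = (1/m) .* ((-y).*log(h) + (y-1).*log(1-h)) + J_reg;

% With sum: the per-example terms collapse into a single cost
J_with_sum = (1/m) .* sum((-y).*log(h) + (y-1).*log(1-h)) + J_reg;

disp(size(J_no_sum));     % 4-by-1
disp(size(J_with_sum));   % 1-by-1
```

Adding the scalar `J_reg` to an m-by-1 vector raises no error, which is why this mistake shows up as many rows of wrong numbers rather than as a size error.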

In the end, I got a satisfying answer

and

A summary of the problems I ran into

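01 Incompatible sizes: as the documentation explains, the rows and columns of the operands do not match.

(Fix: check whether a sum is present, check whether each intermediate result is a scalar or an array, move the scalar coefficient to the front, and change the order of the factors in the product, e.g. X' * (h - y).)

02 After adding grad_regularization, the last four of the first five gradient values came out wrong (and, oddly, all equal),

while removing it left values that were slightly off from the correct ones (the gap being the missing grad_regularization term).

I first took this to mean that grad_regularization itself was wrong. The formula was fine; the shapes were not. sum((h-y).*X)/m is a row vector, grad_regularization is a column vector, and adding the two expands into a full matrix (implicit expansion) instead of a vector. The first five displayed values are then the first entry of grad_partial plus each regularization entry in turn, which would explain why the last four came out equal whenever the corresponding theta entries are equal.

In my attempts, turning theta into a vector whose first element is 0 at the very start also made it easy to run into incompatible sizes.

The expert's code suggests handling the special case separately, namely:

When computing J(θ), do not build a zero-padded vector; use only the θ entries after the first one, theta(2:end), to produce the required scalar.

When computing grad, the output is itself a vector, so build a column vector containing 0 followed by the regularization terms of the remaining θ.

This both avoids the incompatibility and gives the correct result.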

The relevant part of the final code is as follows

h = sigmoid(X*theta);

J_partial = (1/m).*sum((-y).*log(h)+(y-1).*log(1-h));
J_regularization= (lambda/(2*m)).*sum(theta(2:end).^2);
J = J_partial+J_regularization;

grad_partial = (1/m) * (X' * (h -y));
grad_regularization = (lambda/m).*theta(2:end);
grad_regularization = [0; grad_regularization];
grad = grad_partial+ grad_regularization;
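To check a gradient like this without the course grader, one option is a finite-difference comparison. This is a generic technique, not part of the original exercise. The sketch below is only an illustration: the toy data, the step `epsilon`, the inline `sigmoid`, and the helper name `costFn` are all assumptions.

```matlab
% Hypothetical toy data: m = 4 examples, n = 3 parameters
X = [1 0.5 1.2; 1 -0.3 0.8; 1 1.1 -0.4; 1 0.2 0.1];
y = [1; 0; 1; 0];
theta = ones(3, 1);
lambda = 10;
m = length(y);

sigmoid = @(z) 1 ./ (1 + exp(-z));   % stand-in for the course's sigmoid.m

% Regularized cost as a function of theta only (same formula as the final code)
costFn = @(t) (1/m) * sum((-y).*log(sigmoid(X*t)) + (y-1).*log(1 - sigmoid(X*t))) ...
              + (lambda/(2*m)) * sum(t(2:end).^2);

% Analytic gradient, written as in the final code
h = sigmoid(X * theta);
grad = (1/m) * (X' * (h - y)) + [0; (lambda/m) .* theta(2:end)];

% Central-difference approximation, one parameter at a time
epsilon = 1e-4;
num_grad = zeros(size(theta));
for j = 1:numel(theta)
    step = zeros(size(theta));
    step(j) = epsilon;
    num_grad(j) = (costFn(theta + step) - costFn(theta - step)) / (2 * epsilon);
end

max_diff = max(abs(num_grad - grad));
disp(max_diff);   % should be very small if cost and gradient agree
```

A large `max_diff`, or a size error when subtracting `grad` from `num_grad`, would point to the kind of shape or formula mistake described above.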
