Key things to know about Ridge Regression:
It is a regularization technique used to address the problem of multicollinearity in linear regression models.
Multicollinearity occurs when independent variables are highly correlated, which increases the variance of the coefficient estimates.
Ridge adds a degree of “bias” to the coefficient estimates by imposing a penalty on the size of coefficients.
It works by adding a penalty proportional to the squared L2 norm of the coefficients to the least-squares loss that is minimized during fitting (the objective is written out after this list).
This shrinks the large coefficients and distributes the weight more evenly among correlated variables, improving generalization.
The amount of shrinkage is controlled by a hyperparameter alpha: higher alpha means more shrinkage of the coefficients towards zero (see the sketch at the end of this section).
It helps avoid overfitting and yields more stable, reliable estimates than ordinary least squares regression.
Coefficients are shrunk towards zero but never become exactly zero, so unlike LASSO, all variables are retained in the model.
It is commonly used when there are many correlated predictors and a stable set of coefficients with predictive power is needed.
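To make the penalty concrete, here is the ridge objective in its standard form, with β the coefficient vector and α the penalty strength (a standard formulation, not quoted from the points above):

```latex
\hat{\beta}_{\text{ridge}}
  = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_2^2
  = (X^\top X + \alpha I)^{-1} X^\top y
```

Ignoring the intercept, the closed form shows why ridge stabilizes the fit: adding αI to XᵀX keeps the matrix well-conditioned and invertible even when the predictors are highly correlated.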
So in summary, Ridge applies L2 regularization to linear models by imposing a penalty on large coefficients to address multicollinearity.
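Here is a minimal sketch of the shrinkage behavior, using scikit-learn's Ridge on synthetic data (the data-generating process, true coefficients, and alpha grid are illustrative assumptions, not taken from the points above):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Two nearly identical predictors: a textbook multicollinearity setup.
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 3 * x1 + 3 * x2 + rng.normal(size=n)

# As alpha grows, both coefficients shrink towards zero, and the
# weight is shared more evenly between the correlated predictors.
for alpha in [0.01, 1.0, 100.0]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>6}: coefficients={coef.round(3)}")
```

With near-duplicate predictors, an almost-unpenalized fit tends to split the weight erratically between x1 and x2, while larger alpha values pull both coefficients towards a shared, smaller value.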