Key things to know about Lasso (Least Absolute Shrinkage and Selection Operator) Regression:
Like Ridge, it is also a regularization technique used for linear regression models.
However, instead of penalizing the L2 norm of coefficients like Ridge, Lasso penalizes the L1 norm (absolute sum).
This induces sparsity by forcing some coefficients to become exactly zero, automatically performing variable selection.
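The mechanism behind this sparsity is the soft-thresholding operator, the proximal operator of the L1 penalty: any coefficient whose magnitude falls below the threshold is set to exactly zero rather than merely shrunk. A minimal numpy sketch (the example coefficient values are illustrative only):

```python
import numpy as np

def soft_threshold(z, lam):
    # L1 proximal operator: shrink each value toward zero by lam,
    # and clip anything with magnitude below lam to exactly 0.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

coeffs = np.array([3.0, -0.4, 0.15, -2.2])
print(soft_threshold(coeffs, 0.5))  # the two small coefficients become exactly 0
```

Ridge's L2 penalty, by contrast, rescales coefficients multiplicatively and never produces exact zeros.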
It selects a parsimonious model with fewer predictors than Ridge by driving unnecessary coefficients to zero.
Only the most informative predictors remain in the model; the least important ones are dropped, which improves interpretability.
The degree of sparsity is controlled by the regularization hyperparameter (lambda): larger values drive more coefficients to exactly zero.
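The lambda-sparsity relationship can be demonstrated with a small coordinate-descent Lasso written from scratch in numpy (a sketch, not a production solver; the synthetic data and lambda values are illustrative):

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            resid = y - X @ b + X[:, j] * b[j]   # residual excluding feature j
            rho = X[:, j] @ resid
            # Soft-threshold the univariate least-squares update.
            b[j] = np.sign(rho) * max(abs(rho) - n * lam, 0.0) / col_sq[j]
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_b = np.zeros(10)
true_b[:3] = [2.0, -3.0, 1.5]                    # sparse ground truth: 3 of 10 matter
y = X @ true_b + 0.1 * rng.normal(size=100)

for lam in (0.01, 0.1, 1.0):
    nnz = np.count_nonzero(lasso_cd(X, y, lam))
    print(f"lambda={lam}: {nnz} nonzero coefficients")
```

As lambda grows, the nonzero count shrinks toward (and eventually past) the three truly influential predictors.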
The Lasso objective is convex but not differentiable at zero (because of the absolute value), so unlike Ridge it has no closed-form solution and is typically fit with coordinate descent or proximal gradient methods.
Commonly used when the true underlying model is sparse, i.e. only a few predictors are genuinely influential.
Tends to give higher prediction accuracy than Ridge when the number of features is very large but only a few are truly relevant; when many features each carry a small effect, Ridge often predicts better.
In summary, Lasso shrinks irrelevant coefficients to exactly zero, performing embedded feature selection and yielding a simpler, more interpretable model.
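In practice this embedded feature selection is a one-liner with scikit-learn's `Lasso` estimator (a minimal sketch assuming scikit-learn is installed; the synthetic data, `alpha` value, and indices are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))
# Only the first two of eight features actually drive the response.
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

model = Lasso(alpha=0.5).fit(X, y)   # alpha is lambda in scikit-learn's API
selected = [j for j, c in enumerate(model.coef_) if c != 0.0]
print("nonzero coefficients at feature indices:", selected)
```

In real use, `alpha` is chosen by cross-validation (e.g. with `LassoCV`) rather than fixed by hand.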