Over-fitting and Under-fitting are common terms among people working in the Machine Learning and Data Science fields. In this article, we will look at these two terms, along with a few related ones, to understand them better.

Over-fitting occurs when the model has high variance and low bias.

Under-fitting occurs when the model has high bias and low variance.

**Bias**: In simple terms, bias is high when our model makes overly **simple** assumptions about the data.

**Variance**: In contrast to bias, variance is high when our model is too **complex** and fits the training data too closely.

As we can see in the above image, the first one is an example of high bias (under-fitting) and the last one is an example of high variance (over-fitting).

One way to detect under-fitting and over-fitting is to look at the error in the model's predictions.

**For the case of Under-fitting**: We have high error on the training data as well as on the testing data.

**In case of Over-fitting**: We have low error on the training data but high error on the testing data.
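The error check described above can be sketched in a few lines of plain Python. This is only an illustrative setup, not a method from the article: the noisy sine data, the `constant_model` (deliberately too simple) and the `memorizer` (deliberately too complex, it just returns the label of the closest training point) are all assumptions chosen to make the two cases visible.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

def make_data(n):
    """Draw n noisy samples of sin(x) on [0, 10] (synthetic, for illustration)."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [math.sin(x) + random.gauss(0, 0.3) for x in xs]
    return xs, ys

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(40)
test_x, test_y = make_data(40)

# Too simple (high bias): always predict the mean of the training targets.
mean_y = sum(train_y) / len(train_y)

def constant_model(x):
    return mean_y

# Too complex (high variance): memorize the training set and return the
# target of the closest training point, noise included.
def memorizer(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

errors = {}
for name, model in [("constant", constant_model), ("memorizer", memorizer)]:
    errors[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(name, "train MSE:", errors[name][0], "test MSE:", errors[name][1])
```

The constant model should show a similar, relatively large error on both sets, while the memorizer has zero training error but a clearly non-zero test error, which is the gap the over-fitting check looks for.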

Now that we know all the required terms, let's conclude by defining Under-fitting and Over-fitting again.

A model is said to be under-fit when it has high bias and low variance, which can be verified by checking that the model gives high error on both the training and test datasets.

On the other hand, a model is said to be over-fit if it has high variance and low bias. To verify this, check that the model has high accuracy (low error) on the training data but high error on the test data.

The **key difference between L1 and L2 regularization is the penalty term, that is, how the weights are used**: L2 uses the sum of the **squares** of the weights, while L1 uses the sum of the **absolute** values of the weights. Using these techniques, we can help avoid over-fitting.
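As a small worked example of the two penalty terms, the sketch below computes both for one weight vector. The function names and the sample weights are made up for illustration.

```python
def l1_penalty(weights):
    """L1 penalty: sum of the absolute values of the weights."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """L2 penalty: sum of the squares of the weights."""
    return sum(w * w for w in weights)

weights = [0.5, -2.0, 0.0, 3.0]  # hypothetical model weights

print(l1_penalty(weights))  # 5.5
print(l2_penalty(weights))  # 13.25
```

Note how the largest weight dominates the L2 penalty (9.0 out of 13.25), while the L1 penalty grows only linearly with each weight.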

In L1 Regularization, or Lasso Regression, the cost function is changed by adding an L1 penalty term to the error being minimized: **the sum of the absolute (mod) values of the weights, multiplied by lambda, is added to the sum of the squared differences between the actual value and the predicted value.**

Cost Function for Lasso Regression
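A common way to write this cost function is shown below. The notation is an assumption rather than the article's own: $n$ training samples with actual values $y_i$ and predictions $\hat{y}_i$, $p$ weights $w_j$, and a regularization strength $\lambda$.

```latex
J(w) = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} \lvert w_j \rvert
```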

In L2 Regularization, or Ridge Regression, the cost function is changed by adding an L2 penalty term to the error being minimized: **the sum of the squares of the weights, multiplied by lambda, is added to the sum of the squared differences between the actual value and the predicted value.**

Cost function for Ridge Regression
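A common way to write this cost function is shown below. The notation is an assumption rather than the article's own: $n$ training samples with actual values $y_i$ and predictions $\hat{y}_i$, $p$ weights $w_j$, and a regularization strength $\lambda$.

```latex
J(w) = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} w_j^2
```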

Lasso Regression is useful for feature selection because the L1 penalty can shrink some slopes (coefficients) to exactly zero during fitting. Features whose slopes end up at zero are effectively removed from the model, meaning they are less important to it.
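To see why the L1 penalty can zero out coefficients while the L2 penalty only shrinks them, consider a simplified case that the article does not cover: feature columns that are orthonormal, so each coefficient can be solved on its own. Under that assumption, and with the cost written as squared error plus `lam` times the penalty, the Lasso solution is a soft-threshold of the unregularized coefficient and the Ridge solution is a plain rescaling. The function names and the sample coefficients below are hypothetical.

```python
def lasso_coefficient(ols_coef, lam):
    """Minimizer of (w - ols_coef)**2 + lam * abs(w): soft-thresholding."""
    threshold = lam / 2.0
    if ols_coef > threshold:
        return ols_coef - threshold
    if ols_coef < -threshold:
        return ols_coef + threshold
    return 0.0  # small coefficients are set exactly to zero

def ridge_coefficient(ols_coef, lam):
    """Minimizer of (w - ols_coef)**2 + lam * w**2: proportional shrinkage."""
    return ols_coef / (1.0 + lam)

ols_coefs = [3.0, -0.2, 0.05, -1.5]  # hypothetical unregularized slopes
lam = 0.5

lasso = [lasso_coefficient(c, lam) for c in ols_coefs]
ridge = [ridge_coefficient(c, lam) for c in ols_coefs]

print(lasso)  # [2.75, 0.0, 0.0, -1.25]
print(ridge)  # every slope is smaller in magnitude, but none is exactly zero
```

In this sketch the two small slopes are dropped by Lasso and the two large ones are kept with reduced magnitude, which is the feature-selection behaviour described above.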

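It is very important to choose the right value of lambda: if it is too large, the model can under-fit, and if it is too small, the regularization has little effect and the model can still over-fit.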
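The effect of lambda can be illustrated with a tiny Ridge example that has a closed-form solution. The setup is an assumption made for illustration: a single slope with no intercept, and data that follows `y = 2x` exactly.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # y = 2x exactly, so the ideal slope is 2

def ridge_slope(xs, ys, lam):
    """Minimizer of sum((y - w*x)**2) + lam * w**2 for a single slope w."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + lam)

def mse(w, xs, ys):
    """Mean squared error of the line y = w*x on the data."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

for lam in [0.0, 30.0, 3000.0]:
    w = ridge_slope(xs, ys, lam)
    print("lambda:", lam, "slope:", w, "training MSE:", mse(w, xs, ys))
```

With `lam = 0.0` the slope is 2.0 and the training error is zero. With `lam = 30.0` the slope is halved to 1.0, and with `lam = 3000.0` it is pushed close to zero, so the model can no longer follow even the training data. That is under-fitting caused by too much regularization.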