Feature selection is an important step in building a good machine learning model. One technique that helps in selecting features is checking the correlation between the different features of a dataset. In this article, we will discuss the following:

  1. What is Pearson Correlation Coefficient?
  2. How to find the correlation between continuous variables in a dataset?
  3. How to visualise this correlation?

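The Pearson Correlation Coefficient, denoted r, measures the strength and direction of the linear relationship between two continuous variables. Its value lies between -1 and +1: values near +1 indicate a strong positive linear relationship, values near -1 a strong negative one, and values near 0 little or no linear relationship. For n paired observations (x_i, y_i) with means x̄ and ȳ, the correlation coefficient is given by:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
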
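A minimal sketch of the Pearson formula in plain Python (the sum of products of deviations from the means, divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` is a hypothetical helper for illustration, not the author's code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r of two equal-length sequences.

    Hypothetical helper: computes r directly from the definition.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of each variable's deviation from its mean
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator terms: sums of squared deviations for each variable
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (perfectly linear, increasing)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfectly linear, decreasing)
```

To check every pair of continuous features in a dataset, the same computation can be repeated for each pair of columns; in pandas, `DataFrame.corr()` computes this pairwise Pearson correlation matrix by default.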
We have all come across the terms outliers and skewness while cleaning or preprocessing data for a project. In this article, we will discuss the following points:

  1. What is skewness?
  2. How to identify skewness in data?
  3. Why is skewness a problem?
  4. How do we fix this skewness in Python?

Skewness is a measure of the asymmetry of a distribution of continuous data about its mean. Skewed data can be of two types: right-skewed data, also called positively skewed data, whose longer tail extends towards larger values, and left-skewed data, also called negatively skewed data, whose longer tail extends towards smaller values.
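
A minimal sketch, assuming skewness is measured with the Fisher-Pearson moment coefficient g1 = m3 / m2^1.5 (the sign of g1 gives the direction of the skew). The helper `skewness` and the sample data are hypothetical; `scipy.stats.skew` with its default `bias=True` computes the same quantity. The log transform at the end is one common remedy for right-skewed data, shown as an assumption rather than as the article's own method.

```python
import math


def skewness(data):
    """Fisher-Pearson moment coefficient of skewness (no small-sample correction).

    Hypothetical helper: g1 = m3 / m2**1.5, where m2 and m3 are the second and
    third central moments of the data.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


right = [1, 1, 2, 2, 2, 3, 3, 4, 6, 10]  # long tail towards larger values
left = [-v for v in right]                # mirror image: tail towards smaller values

print(skewness(right))  # positive -> right-skewed
print(skewness(left))   # negative -> left-skewed, same magnitude

# Assumed remedy for right skew: log(1 + x) compresses the long right tail
logged = [math.log1p(v) for v in right]
print(skewness(logged))  # still positive, but closer to 0 than skewness(right)
```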

Identification of skewness can be easily done…

Yashowardhan Shinde

Computer Science Undergraduate, Machine Learning and Deep Learning Enthusiast.
