5 MOST USED VISUALIZATIONS IN DATA SCIENCE ANALYSIS

Data visualization is a big part of a data scientist's job. Visualizing a dataset is one of the best ways to understand the trends in it, especially when we are dealing with large datasets.

Matplotlib and Seaborn are popular Python libraries that can be used to create data visualizations easily.

Visualizing a dataset helps us understand the trends, patterns and outliers within a large dataset. It also helps us identify correlations, or relationships, between the independent variables.

In this post I will share the 5 most used visualizations, along with some quick and easy functions for them using Python's Matplotlib and Seaborn.

1. Scatterplot: A scatterplot is used to identify outliers in a dataset. It is also great for showing the relationship between two variables, since you can see the raw distribution of the data directly.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# dataset = pd.read_csv(r"D:\titanic.csv")  # load your own data here; not used in this example
x = np.linspace(0, 10, 25)  # 25 evenly spaced points from 0 to 10
y = x * x * 8               # y grows with the square of x
plt.scatter(x, y)
plt.show()
Scatterplot
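
To make the point about outliers concrete, here is a minimal sketch. It assumes synthetic, roughly linear data with one injected outlier; the data, the 3-standard-deviation threshold on the residuals and all variable names are illustrative choices, not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + rng.normal(scale=1.0, size=x.size)  # roughly linear data
y[25] = 60.0  # inject one obvious outlier (hypothetical value)

# Fit a straight line and flag points that sit far away from it
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
is_outlier = np.abs(residuals) > 3 * residuals.std()

# Draw regular points and flagged points in different colors
fig, ax = plt.subplots()
ax.scatter(x[~is_outlier], y[~is_outlier], label="data")
ax.scatter(x[is_outlier], y[is_outlier], color="red", label="outlier")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```

Call plt.show() afterwards to display the figure. The flagged point stands apart from the rest of the cloud, which is exactly what makes scatterplots handy for spotting outliers by eye.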

2. Histogram: A histogram is a graphical display of numerical data in the form of upright bars with the area of each bar representing frequency. Histograms are useful for viewing or discovering the distribution of data points.

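import numpy as np
import matplotlib.pyplot as plt
# Sample data for illustration: 1000 draws from a standard normal distribution
values = np.random.default_rng(42).normal(size=1000)
plt.hist(values, bins=20)  # group the values into 20 bins and draw one bar per bin
plt.show()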
Histogram
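
The bars of a histogram are simply bin counts. The small sketch below, which assumes made-up exam scores and five equal-width bins (both are illustrative choices), prints the numbers behind the bars using NumPy's np.histogram.

```python
import numpy as np

# Hypothetical exam scores
scores = np.array([52, 55, 61, 64, 67, 68, 70, 71, 73, 75, 78, 82, 85, 91, 98])

# Count how many scores fall into each of five bins between 50 and 100
counts, edges = np.histogram(scores, bins=5, range=(50, 100))

# Print a text version of the histogram, one row per bin
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {'#' * int(count)}")
```

Passing the same bins and range arguments to plt.hist would draw these counts as bars.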

3. Barplot: Bar plots are most effective when you are visualizing categorical data that has few categories. Having too many categories makes the figure cluttered and hard to understand.

import numpy as np
import matplotlib.pyplot as plt
n = np.array([1, 2, 3, 4])
plt.bar(n, n**2, align='center')  # bar heights are the squares of n
plt.show()
Barplot
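
Since bar plots are meant for categorical data, here is a minimal sketch with an actual categorical column. The fruit names and the variable names are hypothetical and only serve the illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical categorical column
fruits = pd.Series(["apple", "banana", "apple", "cherry", "banana", "apple"])

# Count how often each category occurs (most frequent first)
counts = fruits.value_counts()

# One bar per category, with the count as the bar height
fig, ax = plt.subplots()
ax.bar(counts.index, counts.values)
ax.set_ylabel("count")
```

Call plt.show() afterwards to display the figure. With only three categories the chart stays readable; with dozens of categories the labels would start to overlap.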

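4. Box plot: Box plots are used to show overall patterns of response for a group. They provide a useful way to visualize the range, median, interquartile range, upper and lower quartiles and other characteristics of a large group.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline
# The line above is a Jupyter notebook magic; remove it in a plain Python script

np.random.seed(42)  # make the random data reproducible
# 4 rows of random values in four columns
df = pd.DataFrame(np.random.random(size=(4, 4)), columns=['A', 'B', 'X', 'Y'])
# pd.melt reshapes the frame to long format with "variable" and "value" columns
sns.boxplot(x="variable", y="value", data=pd.melt(df))
plt.show()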
Boxplot
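
To see which numbers a box plot is built from, the sketch below computes them by hand with NumPy. The sample values are made up, and the fences use the common 1.5 x IQR rule, which is also the default whisker setting in Matplotlib and Seaborn.

```python
import numpy as np

# Hypothetical sample with one unusually large value
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

# The box spans the lower quartile (q1) to the upper quartile (q3)
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1  # interquartile range

# Points beyond these fences are drawn as individual outlier markers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("q1:", q1, "median:", median, "q3:", q3)
print("outliers:", outliers)
```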

5. Heatmap: A heatmap is a graphical representation of data where each value in a matrix is represented as a color. It is mainly used to check correlations between independent features.

df_dict = {"District_No": [21, 27, 30, 31],
           "Year": [2000, 2001, 2002, 2003],
           "population": [10000, 8500, 35000, 12000],
           "age": [50, 80, 70, 100]}
df = pd.DataFrame(df_dict, index=[2, 4, 6, 8])
corrmat = df.corr()  # pairwise correlation between the columns
fig = plt.figure(figsize=(12, 9))
sns.heatmap(corrmat, vmax=.8)  # correlations of 0.8 and above share the top color
plt.show()
Heatmap
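
As a further illustration of checking correlations between features, here is a minimal sketch that assumes synthetic data in which column "b" is nearly a copy of column "a" and column "c" is unrelated noise. The column names, the sample size and the color settings are illustrative choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(1)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.1, size=200),  # nearly a copy of "a"
    "c": rng.normal(size=200),                 # unrelated noise
})
corr = df.corr()

# annot=True writes each correlation value into its cell;
# vmin/vmax pin the color scale to the full correlation range
fig, ax = plt.subplots()
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm", ax=ax)
```

Call plt.show() afterwards to display the figure. The "a"/"b" cell comes out strongly colored while the cells involving "c" stay close to neutral, so a redundant pair of features is easy to spot at a glance.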

New Media consultant || Machine Learning Engineer