5 MOST USED VISUALIZATIONS IN DATA SCIENCE ANALYSIS

Data visualization is a big part of a data scientist's job. Visualizing a dataset is one of the best ways to understand its trends, especially when we are dealing with large datasets.

Matplotlib and Seaborn are popular Python libraries that make it easy to create data visualizations.

Visualizing a dataset helps us understand the trends, patterns and outliers within it, even when it is large. It also helps us identify correlations, or relationships, between the independent variables.

In this post I will share five of the most used visualizations, along with quick functions for creating each of them in Python with Matplotlib and Seaborn.

1. Scatterplot: A scatter plot (`plt.scatter` in Matplotlib) draws each observation as a single point, which makes it useful for spotting outliers in a dataset. It is also great for showing the relationship between two variables, since you can see the raw distribution of the data directly.
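```python
import matplotlib.pyplot as plt
import numpy as np

# To plot your own data instead, load it with pandas, e.g.:
# import pandas as pd
# dataset = pd.read_csv(r"D:\titanic.csv")

# Synthetic data: y grows quadratically with x
x = np.linspace(0, 10, 25)
y = x * x * 8
plt.scatter(x, y)
plt.show()
```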
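To make the outlier point concrete, here is a minimal sketch on hypothetical synthetic data: two points are deliberately pushed away from an otherwise linear trend, and a simple rule colors them red. The rule (flag any point whose residual from a straight-line fit exceeds three standard deviations) is an assumption chosen for this example, not a method the post prescribes. The `matplotlib.use("Agg")` line only lets the script run without a display; drop it in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3 * x + rng.normal(0, 1, size=x.size)  # roughly linear relationship
y[[10, 40]] += 15  # inject two artificial outliers (hypothetical data)

# Fit a straight line and flag points whose residual is far from the fit
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
is_outlier = np.abs(residuals) > 3 * residuals.std()

fig, ax = plt.subplots()
ax.scatter(x[~is_outlier], y[~is_outlier], label="typical points")
ax.scatter(x[is_outlier], y[is_outlier], color="red", label="possible outliers")
ax.plot(x, slope * x + intercept, color="gray", linestyle="--", label="linear fit")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("scatter_outliers.png")
```

On real data the threshold is a judgment call; the scatter plot is what lets you check visually whether the flagged points really stand apart.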

2. Histogram: A histogram displays numerical data as upright bars, where each bar covers a range of values (a bin) and its area represents how many data points fall in that range. Histograms are useful for viewing or discovering the distribution of data points.

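```python
import matplotlib.pyplot as plt
import numpy as np

# 1,000 samples from a standard normal distribution; plt.hist groups them into bins
data = np.random.normal(loc=0, scale=1, size=1000)
plt.hist(data, bins=20, edgecolor="black")
plt.show()
```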
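The statement that bar area represents frequency can be checked directly. The sketch below, using a hypothetical `ages` sample, draws the same data twice: once with raw counts per bin, and once with `density=True`, where Matplotlib rescales the bars so that their total area is 1.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
ages = rng.normal(loc=35, scale=10, size=500)  # hypothetical "age" sample

fig, (ax_counts, ax_density) = plt.subplots(1, 2, figsize=(10, 4))

# Left: bar heights are raw counts per bin
counts, edges, _ = ax_counts.hist(ages, bins=20, edgecolor="black")
ax_counts.set_title("Counts per bin")

# Right: same bin edges, rescaled so the bar areas add up to 1
density, _, _ = ax_density.hist(ages, bins=edges, density=True, edgecolor="black")
ax_density.set_title("density=True")

for ax in (ax_counts, ax_density):
    ax.set_xlabel("age")

print(counts.sum())  # 500.0, every sample lands in exactly one bin
print((density * np.diff(edges)).sum())  # total bar area, approximately 1.0
```

Reusing `edges` for the second panel keeps the bins identical, so the two panels differ only in vertical scale.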

3. Barplot: Bar plots are most effective for visualizing categorical data with few categories. Too many categories clutter the figure and make it hard to read.

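```python
import matplotlib.pyplot as plt
import numpy as np

n = np.array([1, 2, 3, 4])
# One bar per x position, with height n**2, centered on each position
plt.bar(n, n**2, align="center")
plt.show()
```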
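When a categorical column has many levels, one common workaround for the clutter described above is to keep the most frequent categories and merge the rest into a single "Other" bar. This is a sketch with a hypothetical `colors` column and an arbitrary cutoff `top_n = 3`:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical categorical column with a few rare levels
colors = pd.Series(
    ["red"] * 6 + ["blue"] * 4 + ["green"] * 3 + ["yellow", "purple", "orange"],
    name="color",
)

top_n = 3  # arbitrary cutoff chosen for this example
counts = colors.value_counts()  # sorted from most to least frequent
other_total = counts.iloc[top_n:].sum()  # everything outside the top categories
plot_counts = pd.concat([counts.iloc[:top_n], pd.Series({"Other": other_total})])

fig, ax = plt.subplots()
ax.bar(list(plot_counts.index), plot_counts.values, color="steelblue")
ax.set_xlabel("color")
ax.set_ylabel("count")
```

The right cutoff depends on the data; the point is to keep the number of bars small enough to compare at a glance.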
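4. Boxplot: A box plot summarizes the distribution of each numerical variable with its median, quartiles and whiskers, and draws points beyond the whiskers individually as potential outliers. It is a compact way to compare several variables side by side.

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# %matplotlib inline  # Jupyter only: display plots inside the notebook

np.random.seed(42)
# 4x4 table of random values in [0, 1) with columns A, B, X, Y
df = pd.DataFrame(np.random.random(size=(4, 4)), columns=["A", "B", "X", "Y"])
# pd.melt reshapes the table into long "variable"/"value" columns: one box per column
sns.boxplot(x="variable", y="value", data=pd.melt(df))
plt.show()
```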
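The whiskers and outlier dots in a box plot come from the interquartile range (IQR). By default, Matplotlib and Seaborn let the whiskers reach at most 1.5 times the IQR beyond the quartiles and draw anything further out as an individual point. This sketch, on a hypothetical `values` array, computes that rule by hand and compares it with the points Matplotlib actually plots:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

values = np.array([2, 3, 3, 4, 4, 5, 5, 6, 30])  # hypothetical sample with one extreme value

# The 1.5 * IQR rule behind the default whiskers
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
manual_outliers = values[(values < lower_fence) | (values > upper_fence)]

fig, ax = plt.subplots()
box = ax.boxplot(values)  # default whis=1.5
plotted_outliers = box["fliers"][0].get_ydata()  # points drawn beyond the whiskers

print(q1, q3, manual_outliers)  # 3.0 5.0 [30]
```

Here the IQR is 2, so anything outside the range 0 to 8 is drawn as a separate point, which singles out the value 30.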
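5. Heatmap: A correlation heatmap shows the correlation coefficient of every pair of numerical variables as a colored grid, so strongly related variables stand out at a glance.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df_dict = {"District_No": [21, 27, 30, 31],
           "Year": [2000, 2001, 2002, 2003],
           "population": [10000, 8500, 35000, 12000],
           "age": [50, 80, 70, 100]}
df = pd.DataFrame(df_dict, index=[2, 4, 6, 8])

corrmat = df.corr()  # pairwise Pearson correlations between the columns
fig = plt.figure(figsize=(12, 9))
sns.heatmap(corrmat, vmax=.8)  # values of 0.8 and above all get the top color
plt.show()
```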
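A correlation heatmap is most useful when you also read off which pairs are strongest. The sketch below reuses the small District/Year/population/age table from the heatmap example, ranks every column pair by absolute correlation, and draws a Matplotlib-only heatmap with a symmetric color scale from -1 to 1. The symmetric scale is a deliberate change from `vmax=.8`, so that strong negative and strong positive correlations get equally intense colors. With only four rows, the coefficients are illustrative rather than meaningful.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from itertools import combinations

df = pd.DataFrame({
    "District_No": [21, 27, 30, 31],
    "Year": [2000, 2001, 2002, 2003],
    "population": [10000, 8500, 35000, 12000],
    "age": [50, 80, 70, 100],
})
corrmat = df.corr()  # Pearson correlation by default

# Rank every distinct pair of columns by absolute correlation
pairs = {(a, b): corrmat.loc[a, b] for a, b in combinations(corrmat.columns, 2)}
strongest = max(pairs, key=lambda p: abs(pairs[p]))

# Heatmap with a symmetric scale so -1 and +1 get equally strong colors
fig, ax = plt.subplots(figsize=(6, 5))
image = ax.imshow(corrmat.values, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corrmat.columns)))
ax.set_xticklabels(list(corrmat.columns), rotation=45, ha="right")
ax.set_yticks(range(len(corrmat.index)))
ax.set_yticklabels(list(corrmat.index))
for i in range(corrmat.shape[0]):
    for j in range(corrmat.shape[1]):
        # Write each coefficient inside its cell
        ax.text(j, i, f"{corrmat.iat[i, j]:.2f}", ha="center", va="center")
fig.colorbar(image, ax=ax)
fig.tight_layout()

print(strongest)  # ('District_No', 'Year')
```

If Seaborn is available, `sns.heatmap(corrmat, annot=True, vmin=-1, vmax=1, cmap="coolwarm")` produces a similar annotated grid in one call.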