Data Visualization in Python – Bar Charts and Pie Charts

Data visualization skills are a key part of data analytics and data science. In this tutorial we’ll cover two of the most commonly used chart types in Python: bar charts and pie charts. We’ll start with a quick introduction to data visualization in Python and then look at the Python functions for a range of bar and pie charts.

You can download the data files for this tutorial here.

What is Data Visualization in Python?

Data visualization is the visual representation of data in the form of graphs and plots. It is particularly useful because non-technical people often understand data and analysis presented visually much better than complicated numbers and tables.

Data visualization makes it easy to identify patterns and trends, and it helps us see how data is distributed and how variables are correlated, which can point to possible causal relationships worth investigating.

Principles of Data Visualization

Here are some important principles of data visualization that we should keep in mind when creating various charts and graphs.


Case Study

Let’s consider a case study to explain the various charts and graphs.

Consider a telecom service provider that has demographic and transactional information about its customers. We want to visualize the usage variables together with customer demographics in order to generate business insights.


There are 1000 customers in our sample. For each customer, age, gender and pincode are provided, along with the number of calls, the number of minutes spoken and the bill amount over a 6-month period.

Figure: Data snapshot

Bar Charts

Simple bar graphs are a very common type of graph in data visualization and are used to represent one variable. They consist of vertical or horizontal bars of uniform width whose height (or length) is proportional to the value of the variable for each group, so they are one-dimensional diagrams. The spacing between bars in a simple bar graph should be uniform. The height or length of a bar can represent, for example, the frequency, mean, total or percentage for each category/group of a variable.

Bar Charts in Python

We import pandas, matplotlib and seaborn libraries to construct a simple bar diagram.


To construct a simple bar diagram of the total number of calls for each age group, we first need to aggregate the data using the groupby() function.

Importing the Libraries

 import pandas as pd
 import matplotlib.pyplot as plt
 import seaborn as sns 

Importing Data

telecom = pd.read_csv("telecom.csv")

Aggregating Data

 telecom1 = telecom.groupby('Age_Group')['Calls'].sum()
 telecom1 

        

The output shows the total number of calls for each age group:

            Calls
 Age_Group
 18-30     943187
 30-45     798721
 >45       128870
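The printed table has a Calls column header, which is how pandas displays a one-column DataFrame; selecting the column with a single name, as in the code above, returns a Series instead. The sketch below uses a small hypothetical DataFrame in place of telecom.csv to show both forms. Either can be passed to plot.bar(); the one-column DataFrame additionally gets a legend.

```python
import pandas as pd

# Hypothetical toy data standing in for telecom.csv (not the tutorial's real data)
telecom = pd.DataFrame({
    "CustID": [1, 2, 3, 4, 5, 6],
    "Age_Group": ["18-30", "18-30", "30-45", "30-45", "30-45", ">45"],
    "Calls": [120, 200, 150, 50, 100, 80],
})

# A single column name gives a Series indexed by Age_Group
calls_series = telecom.groupby("Age_Group")["Calls"].sum()

# A list of column names gives a one-column DataFrame with a "Calls" header
calls_frame = telecom.groupby("Age_Group")[["Calls"]].sum()

print(calls_series)
print(calls_frame)
```

groupby() sorts the group keys, so the string labels come out as 18-30, 30-45, >45.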
    

Simple Bar Chart – Total calls for different age groups

We use plt.figure() to create a new figure, and the plot.bar() method of the aggregated data to draw the bar chart. The main arguments are:

title is a string argument that gives the plot a title.

color specifies the bar colour. It accepts colour names (such as 'darkorange'), hex strings (such as '#FF8C00') and other matplotlib colour specifications.

plt.xlabel() sets the x-axis label and plt.ylabel() sets the y-axis label.

plt.figure()
telecom1.plot.bar(title='Fig.No.1 : SIMPLE BAR CHART (Total Calls – Age Group)', color='darkorange')
plt.xlabel('Age Groups')
plt.ylabel('Total Calls')

The chart shows the total number of calls in each age group.

We can see that the 18-30 age group made somewhat more calls than the 30-45 group (about 943,000 versus 799,000) and far more than the over-45 group (about 129,000).
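As an alternative to plt.figure() followed by plt.xlabel() and plt.ylabel(), you can create the figure and axes explicitly with plt.subplots() and pass the axes to pandas through the ax argument, so it is unambiguous which chart each call modifies. This is a minimal sketch with illustrative totals (not the real telecom data); rot=0 keeps the age-group labels horizontal.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical totals per age group (illustrative numbers, not the tutorial's output)
telecom1 = pd.Series([320, 300, 80],
                     index=pd.Index(["18-30", "30-45", ">45"], name="Age_Group"),
                     name="Calls")

# Create the figure and axes explicitly, then tell pandas to draw on that axes
fig, ax = plt.subplots(figsize=(6, 4))
telecom1.plot.bar(ax=ax, title="Total Calls by Age Group", color="darkorange", rot=0)
ax.set_xlabel("Age Groups")
ax.set_ylabel("Total Calls")
fig.tight_layout()
# In a script, call plt.show() or fig.savefig("bar.png") to display or save the chart
```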


Simple Bar Chart – Mean calls for different age groups

To construct a simple bar diagram of the mean number of calls for each age group, the Python code stays the same; the only difference is that the data is aggregated with mean() instead of sum().

 telecom2 = telecom.groupby('Age_Group')['Calls'].mean()
 telecom2

                  Calls
 Age_Group
 18-30      1882.608782
 30-45      1866.170561
 >45        1815.070423

plt.figure()
telecom2.plot.bar(title='Fig.No.2 : SIMPLE BAR CHART (Mean Calls – Age Group)', color='darkorange')
plt.xlabel('Age Groups')
plt.ylabel('Mean Calls')

This chart shows the mean number of calls across age groups. Although total calls differ considerably between age groups, the average number of calls per customer is similar, ranging from about 1,815 to 1,883.
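The difference between Fig. 1 and Fig. 2 comes from group size: a group's total equals its mean multiplied by its number of customers. Using the customer counts from the pivot table in the stacked bar chart section (501, 428 and 71 customers), 1882.61 × 501 ≈ 943,187 and 1815.07 × 71 ≈ 128,870. The sketch below, on hypothetical toy data, computes sum, mean and count side by side so the relationship is visible in one table.

```python
import pandas as pd

# Hypothetical toy data: every customer makes about 100 calls,
# but the age groups contain different numbers of customers
telecom = pd.DataFrame({
    "Age_Group": ["18-30", "18-30", "18-30", "30-45", "30-45", ">45"],
    "Calls": [100, 110, 90, 105, 95, 100],
})

# Total, mean and number of customers per age group in one table
summary = telecom.groupby("Age_Group")["Calls"].agg(["sum", "mean", "count"])
print(summary)
```

Here every group has a mean of 100 calls, yet the totals differ by a factor of three purely because of the group sizes.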


Simple Bar Chart in Horizontal Orientation

Here we replace plot.bar() with plot.barh() to draw the bars horizontally. In a horizontal bar chart the values run along the x-axis, so the x label describes the plotted measure (total calls) and the y label the age groups.

plt.figure()
telecom1.plot.barh(title='Fig.No. 3: SIMPLE BAR CHART - HORIZONTAL', color='darkorange')
plt.xlabel('Total Calls')
plt.ylabel('Age Group')
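barh() draws the first category at the bottom of the chart. A common refinement, sketched here with illustrative totals rather than the tutorial's data, is to sort the Series first so that the largest bar ends up at the top.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical totals per age group (illustrative numbers only)
telecom1 = pd.Series({"18-30": 320, "30-45": 300, ">45": 80}, name="Calls")

# Ascending order: smallest bar at the bottom, largest bar at the top
ordered = telecom1.sort_values()

fig, ax = plt.subplots()
ordered.plot.barh(ax=ax, color="darkorange", title="Total Calls by Age Group")
ax.set_xlabel("Total Calls")  # values run along the x-axis in a horizontal chart
ax.set_ylabel("Age Group")
```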


Stacked Bar Chart in Python

We use the pivot_table() function to count customers by age group and gender. The index option specifies the rows of the table and the columns option specifies its columns. With values=['CustID'] and aggfunc='count', each cell holds the number of customer IDs for that combination of age group and gender. As before, plot.bar() draws the chart, this time with stacked=True.

telecom3 = pd.pivot_table(telecom, index=['Age_Group'], columns=['Gender'], values=['CustID'], aggfunc='count')
telecom3

           CustID
 Gender         F    M
 Age_Group
 18-30        256  245
 30-45        221  207
 >45           32   39

plt.figure()
telecom3.plot.bar(title='Fig.No. 4 : STACKED BAR CHART', stacked=True)
plt.xlabel('Age Group')
plt.ylabel('No.of Customers')

This chart splits the number of customers in each age group by gender.

It shows that although younger customers make up most of the data, each age group contains a roughly equal number of males and females.
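With values=['CustID'] written as a list, pivot_table() returns two-level column labels such as ('CustID', 'F'), and those tuples appear in the chart legend. The sketch below, on hypothetical toy data, shows that passing values as a plain string gives single-level F and M columns, and that pd.crosstab() is another way to count customers by two categories.

```python
import pandas as pd

# Hypothetical toy data (not the tutorial's telecom.csv)
telecom = pd.DataFrame({
    "CustID": [1, 2, 3, 4, 5, 6, 7],
    "Age_Group": ["18-30", "18-30", "18-30", "30-45", "30-45", ">45", ">45"],
    "Gender": ["F", "M", "F", "M", "F", "F", "M"],
})

# values given as a list -> two-level column labels such as ('CustID', 'F')
counts_list = pd.pivot_table(telecom, index=["Age_Group"], columns=["Gender"],
                             values=["CustID"], aggfunc="count")

# values given as a single string -> plain 'F' and 'M' columns,
# which produce simpler legend entries when the table is plotted
counts = pd.pivot_table(telecom, index="Age_Group", columns="Gender",
                        values="CustID", aggfunc="count")

# crosstab counts combinations of two categorical columns directly
counts_ct = pd.crosstab(telecom["Age_Group"], telecom["Gender"])
print(counts)
```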


Percentage Bar Chart in Python

Now let’s build a percentage bar chart in Python.

We first convert the counts into proportions by dividing each row of the pivot table by its row total with the div() function.

Note that the object telecom3 is used to obtain the proportions, which are stored in the object telecom4 and multiplied by 100 when plotted.

The plotting code is the same as for the stacked bar chart above; the only difference is that we plot percentages instead of counts.

 telecom4 = telecom3.div(telecom3.sum(axis=1).astype(float), axis=0)
 telecom4

              CustID
 Gender            F         M
 Age_Group
 18-30      0.510978  0.489022
 30-45      0.516355  0.483645
 >45        0.450704  0.549296

plt.figure()
(telecom4 * 100).plot.bar(title='Fig.No. 5 : PERCENTAGE BAR CHART', stacked=True)
plt.xlabel('Age Groups')
plt.ylabel('Customer %')

We can now see the gender-wise distribution of customers within each age group as percentages.

The data contains a roughly equal proportion of male and female customers in all three age groups (between about 45% and 55%). Because each bar is scaled to 100%, a percentage stacked chart removes the differences in group size and makes it easy to compare the gender split across age groups.
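The axis=0 argument matters here: by default div() aligns a Series with the DataFrame's columns, but the row totals are indexed by age group, so they must be aligned with the rows instead. A sketch with hypothetical counts (not the tutorial's), using the keyword form sum(axis=1):

```python
import pandas as pd

# Hypothetical customer counts by age group and gender (illustrative only)
telecom3 = pd.DataFrame(
    {"F": [2, 1, 1], "M": [2, 3, 1]},
    index=pd.Index(["18-30", "30-45", ">45"], name="Age_Group"),
)

# Row totals: number of customers in each age group
row_totals = telecom3.sum(axis=1)

# Divide each row by its own total; axis=0 matches row_totals to the row index
telecom4 = telecom3.div(row_totals, axis=0)

# Scale to percentages for plotting; every row now adds up to 100
print((telecom4 * 100).round(1))
```

If you start from the raw customer-level data, pd.crosstab(telecom['Age_Group'], telecom['Gender'], normalize='index') computes the same row proportions in one step.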


Multiple Bar Charts in Python

Let’s now move on to multiple bar diagrams.

We use pivot_table() to generate a cross table giving the total number of calls by age group and gender.

We then call the plot.bar() method of this DataFrame with the familiar title argument, and use plt.xlabel() and plt.ylabel() to label the axes. Because stacked is not set, the bars for each gender are drawn side by side.

telecom5 = pd.pivot_table(telecom, index=['Age_Group'], columns=['Gender'], values=['Calls'], aggfunc='sum')
telecom5

             Calls
 Gender          F       M
 Age_Group
 18-30      480235  462952
 30-45      408184  390537
 >45         58310   70560

plt.figure()
telecom5.plot.bar(title='Fig.No.6 : MULTIPLE BAR CHART (Total Calls - Gender & Age Group)')
plt.xlabel('Age Groups')
plt.ylabel('No. of Calls')

This is how our multiple bar diagram looks. There are two bars for each age group, one for females and the other for males. A multiple bar diagram is an alternative to a stacked bar chart: placing the bars side by side makes it easier to compare the values for each gender directly.
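The same cross table can also be built with groupby() followed by unstack(), which moves the Gender level of the index into columns. A sketch with hypothetical data (not the tutorial's); because stacked is left at its default (False), plot.bar() places one bar per gender side by side within each age group.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data (illustrative numbers only)
telecom = pd.DataFrame({
    "Age_Group": ["18-30", "18-30", "30-45", "30-45", ">45", ">45"],
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "Calls": [120, 200, 150, 150, 30, 50],
})

# Total calls per (age group, gender) pair, then spread Gender into columns
telecom5 = telecom.groupby(["Age_Group", "Gender"])["Calls"].sum().unstack("Gender")

# One bar per column inside each age group gives a grouped ("multiple") bar chart
fig, ax = plt.subplots()
telecom5.plot.bar(ax=ax, rot=0, title="Total Calls by Gender and Age Group")
ax.set_xlabel("Age Groups")
ax.set_ylabel("No. of Calls")
```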


Pie Charts in Python

Finally, let’s construct a pie chart in Python.

We use the groupby() function to obtain the total number of calls for each age group, and then convert the totals to percentages with the div() function. Next, the plot.pie() method draws a pie diagram, with arguments such as:

label, which provides a user-defined label for the plotted variable, displayed as the axis label next to the pie

title, which gives the title of the plot

autopct, which displays the percentage value on each slice

colormap, which sets the colours used for the slices

 telecom6 = telecom.groupby('Age_Group')['Calls'].sum()
 telecom6 = telecom6.div(telecom6.sum().astype(float)).round(2)*100
 telecom6

 Age_Group
 18-30    50.0
 30-45    43.0
 >45       7.0

telecom6.plot.pie(label='Age Groups', title="Fig.No. 7 : PIE CHART WITH PERCENTAGE", colormap='brg', autopct='%1.0f%%')

The pie diagram shows that 50% of calls are made by the 18-30 age group, 43% by the 30-45 group and only 7% by customers over 45.
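Converting to percentages before plotting is optional: a pie chart sizes each wedge by its share of the total, so plotting the raw totals gives the same slices, and autopct='%1.0f%%' formats each wedge's share as a whole-number percentage. A sketch with illustrative totals (not the tutorial's data):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical total calls per age group (illustrative numbers only)
totals = pd.Series({"18-30": 500, "30-45": 430, ">45": 70}, name="Calls")

# Shares computed by hand, for reference
shares = totals / totals.sum() * 100
print(shares)

# The raw totals are enough: each wedge is drawn in proportion to value / total,
# and autopct labels each wedge with its percentage rounded to a whole number
fig, ax = plt.subplots()
totals.plot.pie(ax=ax, autopct="%1.0f%%", title="Share of Total Calls by Age Group")
```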


Multiple Pie Charts in Python

To plot multiple pie charts, one per column of the table, include the argument subplots=True in the plot.pie() call.

telecom7 = pd.pivot_table(telecom, index=['Age_Group'], columns=['Gender'], values=['CustID'], aggfunc='count')
telecom7

           CustID
 Gender         F    M
 Age_Group
 18-30        256  245
 30-45        221  207
 >45           32   39

plt.figure()
telecom7.plot.pie(title='Fig.No. 8 : MULTIPLE PIE CHARTS', colors=['darkcyan', 'orange', 'yellowgreen'], autopct='%.1f%%', subplots=True)
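With subplots=True, pandas draws one pie per column and uses each column label as that pie's axis label; with values=['CustID'] written as a list, those labels are tuples such as ('CustID', 'F'). A sketch with hypothetical counts that uses single-level F and M columns, sets a readable title on each pie, and removes the repeated legend and axis labels:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical customer counts by age group and gender (illustrative only)
telecom7 = pd.DataFrame(
    {"F": [25, 22, 3], "M": [24, 21, 4]},
    index=pd.Index(["18-30", "30-45", ">45"], name="Age_Group"),
)

# subplots=True draws one pie per column (here F and M) and returns an array of Axes
axes = telecom7.plot.pie(subplots=True, figsize=(8, 4), legend=False,
                         autopct="%.1f%%",
                         colors=["darkcyan", "orange", "yellowgreen"])
for ax, gender in zip(axes, telecom7.columns):
    ax.set_title(f"Gender: {gender}")
    ax.set_ylabel("")  # drop the default per-pie axis label
```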

This tutorial lesson is taken from the Postgraduate Diploma in Data Science.