How do you graph multiple items in a dataframe on one graph using pandas and matplotlib.pyplot?
The dataframe I am trying to graph is below. I want to plot each fieldname as the legend item with x=year and y=value
The name of the dataframe is my_gross
                     fieldName  thisType         value  year
0   diluted_shares_outstanding      unit  9.637900e+07  2015
1   diluted_shares_outstanding      unit  8.777500e+07  2016
2   diluted_shares_outstanding      unit  8.556200e+07  2017
3   diluted_shares_outstanding      unit  8.353000e+07  2018
4   diluted_shares_outstanding      unit  7.771000e+07  2019
5   diluted_shares_outstanding      unit  7.292900e+07  2020
6                          eps     gross  7.360470e+08  2015
7                          eps     gross  7.285207e+08  2016
8                          eps     gross  8.944702e+08  2017
9                          eps     gross  1.298734e+09  2018
10                         eps     gross  1.451550e+09  2019
11                         eps     gross  1.259110e+09  2020
18               sales_revenue     gross  5.817000e+09  2015
19               sales_revenue     gross  5.762000e+09  2016
20               sales_revenue     gross  6.641000e+09  2017
21               sales_revenue     gross  8.047000e+09  2018
22               sales_revenue     gross  9.351000e+09  2019
23               sales_revenue     gross  8.530000e+09  2020
The following code is what I ran to create a graph, but I get undesired results.
for item in my_gross['fieldName']:
    plt.plot(my_gross['year'], my_gross['value'], label=item)
plt.legend()
plt.xticks(rotation=45)
plt.show()
Result: [image: undesired graph]
The result I am trying to get is similar to this graph: [image: desired graph]
Do I need to create a dictionary of unique values, do some sort of count, and then loop through that dictionary instead of the df itself?
Answer
The issue with your current code is that the loop iterates over every row of my_gross['fieldName'], and each iteration plots the entire year and value columns again, changing only the label. With 18 rows you get 18 identical, overlapping lines, each running through the points of all three fields, plus 18 legend entries. That is the undesired result.
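To make the diagnosis concrete, here is a minimal sketch that rebuilds the data from the question (the thisType column is left out for brevity) and compares the number of plt.plot calls the original loop makes with the number of distinct fields. The names years, values, n_calls and n_fields are illustrative and not part of the original code.

```python
import pandas as pd

# Data copied from the question; thisType is omitted for brevity.
years = [2015, 2016, 2017, 2018, 2019, 2020]
values = {
    "diluted_shares_outstanding": [9.637900e07, 8.777500e07, 8.556200e07,
                                   8.353000e07, 7.771000e07, 7.292900e07],
    "eps": [7.360470e08, 7.285207e08, 8.944702e08,
            1.298734e09, 1.451550e09, 1.259110e09],
    "sales_revenue": [5.817000e09, 5.762000e09, 6.641000e09,
                      8.047000e09, 9.351000e09, 8.530000e09],
}
my_gross = pd.DataFrame({
    "fieldName": [name for name in values for _ in years],
    "value": [v for series in values.values() for v in series],
    "year": years * len(values),
})

# The original loop runs once per row, not once per distinct field.
n_calls = len(my_gross["fieldName"])
n_fields = my_gross["fieldName"].nunique()
print(n_calls, n_fields)  # prints: 18 3
```

So the loop draws 18 lines where only 3 are wanted, and every one of them uses the full year and value columns.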
To get the desired result, where each fieldName is plotted as a separate line, split the dataframe by fieldName and plot each group separately. Here's a revised approach:
Steps:
- Group the data by fieldName: group the rows by fieldName so that each group holds the year and value pairs of a single field.
- Plot each group separately: for each group, call plt.plot() with year as x and value as y, setting label to the corresponding fieldName.
- Display the legend: call plt.legend() once. It builds the legend entries from the label passed to each plt.plot() call.
Here’s how you can modify your code:
import matplotlib.pyplot as plt

# Grouping the dataframe by fieldName and plotting each group
for field, group in my_gross.groupby('fieldName'):
    plt.plot(group['year'], group['value'], label=field)

# Adding the legend, rotating x-axis labels, and showing the plot
plt.legend()
plt.xticks(rotation=45)
plt.show()
Explanation:
- groupby('fieldName'): groups the rows by the fieldName column. Each group becomes a different line on the plot.
- group['year'] and group['value']: for each group, group['year'] provides the x-axis values (years) and group['value'] provides the y-axis values (the corresponding values for that field).
- label=field: labels the line in the legend with the fieldName.
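If it helps to see what the loop actually receives, the sketch below iterates over groupby('fieldName') on a trimmed copy of the question's data (2015 and 2016 only, thisType omitted) and records each group's key and years. The summary dictionary is a hypothetical helper used only for illustration.

```python
import pandas as pd

# Trimmed copy of the question's data (2015-2016 only), for illustration.
my_gross = pd.DataFrame({
    "fieldName": ["diluted_shares_outstanding"] * 2
                 + ["eps"] * 2
                 + ["sales_revenue"] * 2,
    "value": [9.637900e07, 8.777500e07,
              7.360470e08, 7.285207e08,
              5.817000e09, 5.762000e09],
    "year": [2015, 2016] * 3,
})

summary = {}
for field, group in my_gross.groupby("fieldName"):
    # field is the group key (a string); group is a DataFrame that
    # contains only the rows whose fieldName equals that key.
    summary[field] = group["year"].tolist()
    print(field, group.shape)
```

Each iteration yields one key and one sub-dataframe, so there is no need to build a dictionary of unique values by hand.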
Result:
This will create a plot where each fieldName appears as a separate line on the graph with the correct legend.
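As an alternative to the groupby loop (a different technique, not what the answer above uses), the data can be reshaped to wide form with DataFrame.pivot and plotted with the dataframe's own plot method, which draws one line per column. This is a sketch under assumptions: the data is rebuilt from the question without thisType, the Agg backend is selected only so the example runs without a display, and the names wide and ax are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import pandas as pd

# Data copied from the question; thisType is omitted for brevity.
years = [2015, 2016, 2017, 2018, 2019, 2020]
values = {
    "diluted_shares_outstanding": [9.637900e07, 8.777500e07, 8.556200e07,
                                   8.353000e07, 7.771000e07, 7.292900e07],
    "eps": [7.360470e08, 7.285207e08, 8.944702e08,
            1.298734e09, 1.451550e09, 1.259110e09],
    "sales_revenue": [5.817000e09, 5.762000e09, 6.641000e09,
                      8.047000e09, 9.351000e09, 8.530000e09],
}
my_gross = pd.DataFrame({
    "fieldName": [name for name in values for _ in years],
    "value": [v for series in values.values() for v in series],
    "year": years * len(values),
})

# One row per year, one column per fieldName.
wide = my_gross.pivot(index="year", columns="fieldName", values="value")

# DataFrame.plot draws one line per column and adds the legend itself.
ax = wide.plot(marker="o", rot=45)
ax.set_ylabel("value")
```

In an interactive session, call matplotlib.pyplot.show() afterwards to display the figure. pivot requires each (year, fieldName) pair to appear only once, which holds for the data shown in the question.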
Let me know if you need further adjustments!