How do you graph multiple items in a dataframe on one graph using pandas and matplotlib.pyplot?

ghz 5 days ago ⋅ 4 views

How do you graph multiple items in a dataframe on one graph using pandas and matplotlib.pyplot?

The dataframe I am trying to graph is below. I want each fieldName to be its own line and legend item, with x=year and y=value.

The name of the dataframe is my_gross.

                     fieldName thisType         value  year
0   diluted_shares_outstanding     unit  9.637900e+07  2015
1   diluted_shares_outstanding     unit  8.777500e+07  2016
2   diluted_shares_outstanding     unit  8.556200e+07  2017
3   diluted_shares_outstanding     unit  8.353000e+07  2018
4   diluted_shares_outstanding     unit  7.771000e+07  2019
5   diluted_shares_outstanding     unit  7.292900e+07  2020
6                          eps    gross  7.360470e+08  2015
7                          eps    gross  7.285207e+08  2016
8                          eps    gross  8.944702e+08  2017
9                          eps    gross  1.298734e+09  2018
10                         eps    gross  1.451550e+09  2019
11                         eps    gross  1.259110e+09  2020
18               sales_revenue    gross  5.817000e+09  2015
19               sales_revenue    gross  5.762000e+09  2016
20               sales_revenue    gross  6.641000e+09  2017
21               sales_revenue    gross  8.047000e+09  2018
22               sales_revenue    gross  9.351000e+09  2019
23               sales_revenue    gross  8.530000e+09  2020

The following code is what I ran to create a graph, but I get undesired results.

for item in my_gross['fieldName']:
    plt.plot(my_gross['year'], my_gross['value'],label=item)

plt.legend()
plt.xticks(rotation=45)
plt.show()

Result: [image: undesired graph]

The result I am trying to get is similar to this graph: [image: desired graph]

Do I need to create a dictionary for unique values and do some sort of count and then loop through that dictionary instead of the df itself?

Answer

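The issue with your current code is that the loop iterates over every row of the fieldName column, and each iteration plots the entire year and value columns, changing only the label. With the 18 rows shown, that draws 18 identical, overlapping lines, each one running through all three fields in sequence, and the legend repeats each name six times. That is the undesired result.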
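To make the diagnosis concrete, here is a minimal sketch that counts how many times the original loop would call plt.plot() versus how many distinct lines are actually wanted. The small dataframe is a stand-in built from a few rows of the question, not the full my_gross.

```python
import pandas as pd

# Stand-in for my_gross: two rows each for two of the fields in the question.
my_gross = pd.DataFrame({
    "fieldName": ["eps", "eps", "sales_revenue", "sales_revenue"],
    "value": [7.360470e+08, 7.285207e+08, 5.817000e+09, 5.762000e+09],
    "year": [2015, 2016, 2015, 2016],
})

# Iterating over the column visits every row, so each name comes up repeatedly.
calls_in_original_loop = len(list(my_gross["fieldName"]))

# Only the distinct names should each produce one line.
distinct_fields = my_gross["fieldName"].unique().tolist()

print(calls_in_original_loop)  # 4
print(distinct_fields)         # ['eps', 'sales_revenue']
```

On the full dataframe the same check would give 18 loop iterations against 3 distinct field names.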

To get the desired result where each fieldName is plotted as a separate line, you should first filter the dataframe based on the fieldName and plot each group separately. Here's a revised approach:

Steps:

  1. Group the data by fieldName: You can group the data by fieldName and then plot each group's year vs value.
  2. Plot each group separately: For each group, use the plot function to plot the year as x and value as y, while setting the label to the corresponding fieldName.
  3. Display the legend: Pass the label argument to plt.plot() for each line, then call plt.legend() to build the legend from those labels.

Here’s how you can modify your code:

import matplotlib.pyplot as plt

# Grouping the dataframe by fieldName and plotting each group
for field, group in my_gross.groupby('fieldName'):
    plt.plot(group['year'], group['value'], label=field)

# Adding the legend, rotating x-axis labels, and showing the plot
plt.legend()
plt.xticks(rotation=45)
plt.show()

Explanation:

  • groupby('fieldName'): This groups the rows by the fieldName column. Each group will represent a different line on the plot.
  • group['year'] and group['value']: For each group, group['year'] provides the x-axis values (years), and group['value'] provides the y-axis values (the corresponding values for each field).
  • label=field: This automatically labels the line in the legend with the fieldName.
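As an alternative to the explicit loop, the same plot can usually be produced by reshaping the data so that each fieldName becomes a column and letting pandas draw one line per column. This is a sketch under assumptions: the dataframe below is a small stand-in for my_gross, each (year, fieldName) pair is assumed to occur only once (pivot raises an error otherwise), and the Agg backend line is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for my_gross, using a few rows from the question.
my_gross = pd.DataFrame({
    "fieldName": ["eps", "eps", "sales_revenue", "sales_revenue"],
    "value": [7.360470e+08, 7.285207e+08, 5.817000e+09, 5.762000e+09],
    "year": [2015, 2016, 2015, 2016],
})

# Reshape from long to wide: one row per year, one column per fieldName.
wide = my_gross.pivot(index="year", columns="fieldName", values="value")

# DataFrame.plot draws one line per column and adds the legend itself.
ax = wide.plot(rot=45)
ax.set_ylabel("value")

# In an interactive session, plt.show() would display the figure here.
```

The loop version gives more control over each line; the pivot version is shorter when the default styling is acceptable.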

Result:

This will create a plot where each fieldName appears as a separate line on the graph with the correct legend.
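One way to confirm that claim without looking at the image is to inspect the axes after plotting. The sketch below repeats the groupby loop on a small stand-in dataframe (not the full my_gross) using an explicit Axes object, then lists the label of every line that was drawn. The Agg backend line is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for my_gross: two rows for each of the three fields in the question.
my_gross = pd.DataFrame({
    "fieldName": ["diluted_shares_outstanding"] * 2 + ["eps"] * 2 + ["sales_revenue"] * 2,
    "value": [9.637900e+07, 8.777500e+07,
              7.360470e+08, 7.285207e+08,
              5.817000e+09, 5.762000e+09],
    "year": [2015, 2016] * 3,
})

fig, ax = plt.subplots()

# One plot call per group, so one line per fieldName.
for field, group in my_gross.groupby("fieldName"):
    ax.plot(group["year"], group["value"], label=field)

ax.legend()

# Each drawn line carries the label it was given.
line_labels = [line.get_label() for line in ax.get_lines()]
print(line_labels)  # ['diluted_shares_outstanding', 'eps', 'sales_revenue']
```

The labels come out in alphabetical order because groupby sorts the group keys by default.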
