Color mapping of data on a date vs time plot

ghz yesterday ⋅ 5 views

I am trying to plot three variables x, y, z on a 2D plot, with x (date) on the x axis, y (time) on the y axis, and z (temperature) mapped to a color scale. I have the three variables in a pandas DataFrame and created an extra column with the date number so that matplotlib can work with it.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

data = pd.DataFrame()  # in practice, filled with the 'Date', 'Time' and 'Tgrad' columns
data['datenum'] = mdates.date2num(data['Date'])

Example:

            Date Time     Tgrad   datenum
0     2016-08-01   00 -0.841203  736177.0
1     2016-08-01   01 -0.629176  736177.0
2     2016-08-01   02 -0.623608  736177.0
3     2016-08-01   03 -0.615145  736177.0
4     2016-08-01   04 -0.726949  736177.0
5     2016-08-01   05 -0.788864  736177.0
6     2016-08-01   06 -0.794655  736177.0
7     2016-08-01   07 -0.775724  736177.0
8     2016-08-01   08 -0.677951  736177.0

But I have not been successful, I think because my input data has the wrong shape. I have tried something like this:

fig, ax = plt.subplots()
ax.imshow(data['Tgrad'], extent = [min(data['datenum']), max(data['datenum']),min(data['Time']), max(data['Time'])], cmap="autumn", aspect = "auto")
ax.xaxis_date()

But get a ValueError:

ValueError: setting an array element with a sequence

Is it necessary to have the data as numpy array or any other type? And how can I map the data once I have it in a different format?

Answer

The issue you're encountering comes from the shape and structure of your data. imshow in Matplotlib expects a 2D array of values (a grid of data points), but data['Tgrad'] is a single 1D column in long format, with one row per (datenum, Time) pair. The datenum values are already in a numeric format suitable for plotting dates, while Time holds the hours (which appear to be stored as strings such as '00' in your example). For imshow to work, you need to reshape the data into a 2D grid in which each cell corresponds to one (datenum, Time) pair.
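To make the shape difference concrete, here is a minimal sketch. The two-day sample values are made up for illustration and are not taken from your data; only the column names match your DataFrame.

```python
import pandas as pd

# Hypothetical sample: two days, three hours each, in the question's long layout
data = pd.DataFrame({
    "Date": pd.to_datetime(["2016-08-01"] * 3 + ["2016-08-02"] * 3),
    "Time": [0, 1, 2, 0, 1, 2],
    "Tgrad": [-0.84, -0.63, -0.62, -0.70, -0.55, -0.48],
})

# One value per row: a 1D array, which imshow cannot draw as an image
long_values = data["Tgrad"].to_numpy()
print(long_values.shape)  # (6,)

# One row per hour, one column per date: the 2D grid imshow expects
grid = data.pivot(index="Time", columns="Date", values="Tgrad")
print(grid.shape)  # (3, 2)
```

The pivoted DataFrame can be passed to imshow directly; converting it to a NumPy array first is optional.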

You can achieve this by converting your DataFrame into a 2D matrix of Tgrad values that correspond to specific datenum (date) and Time (hour) combinations. Here's how to do it:

Step-by-step solution:

  1. Pivot your DataFrame: You'll need to reshape your data into a grid where each column represents a specific date (datenum) and each row represents a specific time (Time). The values in this grid will be the Tgrad values.

  2. Plot the data using imshow: After reshaping the data into a 2D grid, you can use imshow to plot the Tgrad values. You'll also want to properly format the x and y axes to correspond to your dates and times.

Here’s how to implement this:

Code Example:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Sample data
data = pd.DataFrame({
    'Date': ['2016-08-01', '2016-08-01', '2016-08-01', '2016-08-01', '2016-08-01',
             '2016-08-01', '2016-08-01', '2016-08-01', '2016-08-01'],
    'Time': ['00', '01', '02', '03', '04', '05', '06', '07', '08'],
    'Tgrad': [-0.841203, -0.629176, -0.623608, -0.615145, -0.726949,
              -0.788864, -0.794655, -0.775724, -0.677951]
})

# Convert 'Date' to datetime and 'Time' to int
data['Date'] = pd.to_datetime(data['Date'])
data['Time'] = data['Time'].astype(int)

# Convert Date to datenumber
data['datenum'] = mdates.date2num(data['Date'])

# Pivot the DataFrame to create a grid of Tgrad values (dates as columns, times as rows)
pivoted_data = data.pivot(index='Time', columns='datenum', values='Tgrad')

# Create the plot
fig, ax = plt.subplots(figsize=(10, 6))

# Plot the data using imshow
cax = ax.imshow(pivoted_data, aspect='auto', cmap='autumn', interpolation='nearest')

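# Set the x and y ticks (one tick per column/row)
ax.set_xticks(range(len(pivoted_data.columns)))
# num2date returns timezone-aware datetimes, so format them as plain dates for the labels
ax.set_xticklabels([mdates.num2date(d).strftime('%Y-%m-%d') for d in pivoted_data.columns], rotation=45)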

ax.set_yticks(range(len(pivoted_data.index)))
ax.set_yticklabels(pivoted_data.index)

# Label the axes
ax.set_xlabel('Date')
ax.set_ylabel('Time (Hour)')

# Add a colorbar to map temperature gradient (Tgrad)
fig.colorbar(cax, label='Temperature Gradient (Tgrad)')

plt.tight_layout()
plt.show()

Explanation:

  1. Pivoting the Data:

    • The pivot function transforms the DataFrame into a 2D matrix in which the rows represent the time (hour of the day), the columns represent the datenum (one per date), and the cells hold the Tgrad (temperature gradient) values.

  2. Using imshow:

    • imshow then plots this matrix. Setting aspect='auto' lets the image stretch to fill the axes instead of forcing square pixels.

  3. Formatting the Axes:

    • The x-axis ticks are labeled with the actual dates by converting each datenum back to a date with mdates.num2date.
    • The y-axis is labeled with the times (hours), which are integers from the Time column.

  4. Color Mapping:

    • A colorbar is added to show which Tgrad values correspond to the colors in the plot.
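One caveat about the pivoting step: pivot raises a ValueError when the same (Time, datenum) pair occurs more than once. A possible workaround, sketched below on made-up values, is pivot_table, which aggregates duplicates. Using the mean is an assumption here; pick whichever aggregation suits your measurements.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: hour 0 on 2016-08-01 is recorded twice,
# and hour 1 on 2016-08-02 is missing
data = pd.DataFrame({
    "Date": pd.to_datetime(["2016-08-01", "2016-08-01", "2016-08-01", "2016-08-02"]),
    "Time": [0, 0, 1, 0],
    "Tgrad": [-0.80, -0.60, -0.63, -0.70],
})

# pivot() refuses duplicate (Time, Date) pairs
try:
    data.pivot(index="Time", columns="Date", values="Tgrad")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates duplicates instead (mean chosen as an assumption)
grid = data.pivot_table(index="Time", columns="Date", values="Tgrad", aggfunc="mean")
print(grid)
```

Combinations that are missing from the data end up as NaN in the grid, and imshow should leave those cells blank with its default settings.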

Result:

This will generate a 2D plot with dates on the x-axis, times (hours) on the y-axis, and the Tgrad values displayed using a color scale. The colorbar will indicate the values of the Tgrad variable, and the x-axis will display the actual dates.
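If you would rather keep a real date axis, as in your original attempt with extent and xaxis_date, something along these lines should work. This is only a sketch on made-up values, and it assumes the dates are consecutive days and the hours are evenly spaced, because imshow draws every cell with the same size.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample: three consecutive days, three hourly readings each
data = pd.DataFrame({
    "Date": pd.to_datetime(["2016-08-01"] * 3 + ["2016-08-02"] * 3 + ["2016-08-03"] * 3),
    "Time": [0, 1, 2] * 3,
    "Tgrad": [-0.84, -0.63, -0.62, -0.70, -0.55, -0.48, -0.91, -0.77, -0.66],
})
grid = data.pivot(index="Time", columns="Date", values="Tgrad")

# Numeric positions of the columns (dates) and rows (hours)
x = mdates.date2num(grid.columns.to_pydatetime())
hours = grid.index.to_numpy()

# Pad by half a cell so each cell is centered on its date and hour.
# Assumes a spacing of one day between columns and one hour between rows.
# imshow's default origin='upper' puts row 0 at the top, hence bottom > top.
extent = [x.min() - 0.5, x.max() + 0.5, hours.max() + 0.5, hours.min() - 0.5]

fig, ax = plt.subplots()
im = ax.imshow(grid.to_numpy(), extent=extent, aspect="auto",
               cmap="autumn", interpolation="nearest")
ax.xaxis_date()  # interpret x values as Matplotlib date numbers
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()
fig.colorbar(im, ax=ax, label="Tgrad")
```

With this variant the tick positions are chosen by Matplotlib's date locator, so there is no need to set the tick labels by hand. Call plt.show() or fig.savefig(...) to see the result.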

Key Points:

  • The pivoting step is crucial, as it reshapes your data from long format (1D) into wide format (2D).
  • The imshow function expects a 2D matrix, and you can then map the color scale (via cmap) to the values of Tgrad.
  • You can fine-tune the color mapping (for example through the cmap, vmin and vmax arguments of imshow) and the axis labels as needed.
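As one example of tuning the color mapping, the sketch below fixes the color limits symmetrically around zero and uses a diverging colormap. The grid values are made up, and the choice of "coolwarm" is only a suggestion for data that has both negative and positive gradients.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 3x2 grid of Tgrad values (rows: hours, columns: dates)
grid = np.array([[-0.84, -0.70],
                 [-0.63,  0.55],
                 [-0.62,  0.48]])

# Symmetric limits so the middle of the diverging colormap corresponds to zero
limit = np.abs(grid).max()

fig, ax = plt.subplots()
im = ax.imshow(grid, aspect="auto", cmap="coolwarm", vmin=-limit, vmax=limit)
fig.colorbar(im, ax=ax, label="Tgrad")
```

Fixing vmin and vmax also keeps the colors comparable between plots of different date ranges, since the scale no longer depends on each plot's own minimum and maximum.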