I have a DataFrame whose date column is stored as strings, like this:
>>> df2
       date     a    b
0  2020/1/1   8.0  5.0
1  2020/1/2  10.0  7.0
2  2020/1/3   6.0  1.0
3  2020/1/4   6.0  3.0
I want to use its 'date' column to generate a new, longer index by multiplying it by an array of repeat counts, like this:
>>> idx_list = [2,3,1,2]
>>> df2.date*idx_list
but I got an unexpected result:
>>> df2.date*idx_list
0 2020/1/12020/1/1
1 2020/1/22020/1/22020/1/2
2 2020/1/3
3 2020/1/42020/1/4
Is there a way to make the new index a sequence in which each date appears once per repeat, like this:
0 2020/1/1
1 2020/1/1
2 2020/1/2
3 2020/1/2
4 2020/1/2
5 2020/1/3
6 2020/1/4
7 2020/1/4
Answer
To build a sequence in which each date is repeated according to idx_list, you need a different approach. Multiplying a column of strings by integers repeats each string in place (the copies are concatenated into one longer string), so no new rows are created. Instead, you can use np.repeat() to repeat the values in the date column according to the corresponding counts in idx_list.
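To see why the multiplication misbehaves, here is a minimal sketch in plain Python. It assumes pandas applies the operator to each string element in turn, which is what the output in the question suggests; the names multiplied and expanded are only illustrative.

```python
dates = ["2020/1/1", "2020/1/2", "2020/1/3", "2020/1/4"]
idx_list = [2, 3, 1, 2]

# Elementwise "multiplication": str * int repeats the string in place,
# so the number of entries stays the same.
multiplied = [d * n for d, n in zip(dates, idx_list)]
print(multiplied[0])  # 2020/1/12020/1/1

# What the question actually wants: each value becomes n separate entries.
expanded = [d for d, n in zip(dates, idx_list) for _ in range(n)]
print(len(expanded))  # 8
```

The second list comprehension is the behaviour that np.repeat() provides for arrays.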
Here's a step-by-step solution:
import pandas as pd
import numpy as np

# Create your DataFrame
data = {
    'date': ['2020/1/1', '2020/1/2', '2020/1/3', '2020/1/4'],
    'a': [8.0, 10.0, 6.0, 6.0],
    'b': [5.0, 7.0, 1.0, 3.0]
}
df2 = pd.DataFrame(data)

# Convert 'date' column to datetime
df2['date'] = pd.to_datetime(df2['date'])

# Define the idx_list
idx_list = [2, 3, 1, 2]

# Use np.repeat() to repeat the dates according to idx_list
new_dates = np.repeat(df2['date'].values, idx_list)

# Create a new DataFrame with the new index
new_df = pd.DataFrame({
    'date': new_dates,
    'a': np.repeat(df2['a'].values, idx_list),
    'b': np.repeat(df2['b'].values, idx_list)
})
print(new_df)
Explanation:
- Convert date column to datetime: We first convert the date column to a datetime type, so the dates are handled properly.
- Use np.repeat(): The np.repeat() function repeats the date column values based on the counts specified in idx_list. For example, 2 means the date will be repeated twice, 3 means the date will be repeated three times, etc.
- Create a new DataFrame: We create a new DataFrame with the repeated dates, as well as the values in columns a and b, which are also repeated based on idx_list.
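If you'd rather not call np.repeat() once per column, one possible alternative is to repeat the row labels and select the rows with loc. This is a sketch that assumes df2 has the default 0..n-1 index; the name repeated is only illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({
    "date": pd.to_datetime(["2020/1/1", "2020/1/2", "2020/1/3", "2020/1/4"]),
    "a": [8.0, 10.0, 6.0, 6.0],
    "b": [5.0, 7.0, 1.0, 3.0],
})
idx_list = [2, 3, 1, 2]

# Repeat each row label by its count, select those rows (all columns at once),
# then renumber the result 0..7.
repeated = df2.loc[df2.index.repeat(idx_list)].reset_index(drop=True)
print(repeated)
```

This keeps every column in sync without listing them by hand, which may be more convenient when the DataFrame has many columns.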
Output:
        date     a    b
0 2020-01-01   8.0  5.0
1 2020-01-01   8.0  5.0
2 2020-01-02  10.0  7.0
3 2020-01-02  10.0  7.0
4 2020-01-02  10.0  7.0
5 2020-01-03   6.0  1.0
6 2020-01-04   6.0  3.0
7 2020-01-04   6.0  3.0
This will give you a new DataFrame in which the date column is repeated according to idx_list, and the a and b values are repeated along with it.
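If only the repeated date sequence from the question is needed, a shorter sketch with Series.repeat() may be enough. It leaves the dates as strings, on the assumption that the original 2020/1/1 formatting should be preserved; the names dates and new_index are only illustrative.

```python
import pandas as pd

dates = pd.Series(["2020/1/1", "2020/1/2", "2020/1/3", "2020/1/4"], name="date")
idx_list = [2, 3, 1, 2]

# Series.repeat() repeats each element by its count but keeps the original
# labels (0, 0, 1, 1, 1, ...), so reset the index to get 0..7.
new_index = dates.repeat(idx_list).reset_index(drop=True)
print(new_index)
```

The result can then be passed to pd.Index() or assigned as the index of another object of length 8.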