I have a df
date
2021-03-12
2021-03-17
...
2022-05-21
2022-08-17
I am trying to add a column year_week
, but my year week starts at 2021-06-28
, which is the first day of July.
I tried:
df['date'] = pd.to_datetime(df['date'])
df['year_week'] = (df['date'] - timedelta(days=datetime(2021, 6, 24).timetuple()
.tm_yday)).dt.isocalendar().week
I played around with the timedelta
days
values so that the 2021-06-28
has a value of 1
.
But then I got problems with previous & dates exceeding my start date + 1 year:
2021-03-12 has a value of 38
2022-08-17 has a value of 8
So it looks like the valid period is from 2021-06-28
+ 1 year
.
date year_week
2021-03-12 38 # LY38
2021-03-17 39 # LY39
2021-06-28 1 # correct
...
2022-05-21 47 # correct
2022-08-17 8 # NY8
Is there a way to get around this? As I am aggregating the data by year week I get incorrect results due to the past & upcoming dates. I would want to have negative dates for the days before 2021-06-28
or LY38
denoting that its the year week of the last year, accordingly year weeks of 52+ or NY8
denoting that this is the 8th
week of the next year?
Answer
To correctly handle custom year-week numbering with a start date and to manage dates that fall before or after the defined year-week range, you can implement a solution that explicitly calculates the year-week values based on your custom start date (2021-06-28
). Here's how you can do it:
Code
import pandas as pd
from datetime import datetime, timedelta
# Define the custom year-week start date
start_date = datetime(2021, 6, 28)
# Sample DataFrame
data = {'date': ['2021-03-12', '2021-03-17', '2021-06-28', '2022-05-21', '2022-08-17']}
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])
# Calculate custom year-week values
def calculate_custom_year_week(date, start_date):
delta = date - start_date
weeks_offset = delta.days // 7
if weeks_offset >= 0:
year = start_date.year + (start_date + timedelta(weeks=weeks_offset)).year - start_date.year
week = weeks_offset + 1
return f"{year}W{week}"
else:
# Negative year-week for dates before start_date
year = start_date.year - 1
week = 52 + (weeks_offset % 52) # Wraparound for previous year weeks
return f"LY{week}"
df['year_week'] = df['date'].apply(lambda x: calculate_custom_year_week(x, start_date))
print(df)
Explanation:
- Start Date: Set
2021-06-28
as the reference start date for your year-week calculation. - Week Offset: Compute the week offset from the start date as
delta.days // 7
. - Custom Year Calculation:
- For dates on or after the start date, calculate the year and week.
- For dates before the start date, calculate weeks in the previous year, adding a prefix like
LY
.
- Year Wraparound: Handle wraparounds for weeks in a new or previous year (
NY
orLY
).
Output:
date year_week
0 2021-03-12 LY38
1 2021-03-17 LY39
2 2021-06-28 2021W1
3 2022-05-21 2022W47
4 2022-08-17 NY8
Adjustments:
- To customize prefixes (
NY
,LY
) or formatting, modify thecalculate_custom_year_week
function. - This approach supports any custom start date and handles negative or overflow weeks gracefully.