Creating year week based on date with different start date

ghz 2days ago ⋅ 11 views

I have a df

date

2021-03-12
2021-03-17
...
2022-05-21
2022-08-17

I am trying to add a column year_week, but my year week starts at 2021-06-28, which is the first day of July.

I tried:

df['date'] = pd.to_datetime(df['date'])
df['year_week'] = (df['date'] - timedelta(days=datetime(2021, 6, 24).timetuple()
                               .tm_yday)).dt.isocalendar().week

I played around with the timedelta days values so that the 2021-06-28 has a value of 1.

But then I got problems with previous & dates exceeding my start date + 1 year:

2021-03-12 has a value of 38
2022-08-17 has a value of 8

So it looks like the valid period is from 2021-06-28 + 1 year.

date            year_week

2021-03-12      38  # LY38
2021-03-17      39  # LY39
2021-06-28      1   # correct
...
2022-05-21      47  # correct
2022-08-17      8   # NY8

Is there a way to get around this? As I am aggregating the data by year week I get incorrect results due to the past & upcoming dates. I would want to have negative dates for the days before 2021-06-28 or LY38 denoting that its the year week of the last year, accordingly year weeks of 52+ or NY8 denoting that this is the 8th week of the next year?

Answer

To correctly handle custom year-week numbering with a start date and to manage dates that fall before or after the defined year-week range, you can implement a solution that explicitly calculates the year-week values based on your custom start date (2021-06-28). Here's how you can do it:

Code

import pandas as pd
from datetime import datetime, timedelta

# Define the custom year-week start date
start_date = datetime(2021, 6, 28)

# Sample DataFrame
data = {'date': ['2021-03-12', '2021-03-17', '2021-06-28', '2022-05-21', '2022-08-17']}
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])

# Calculate custom year-week values
def calculate_custom_year_week(date, start_date):
    delta = date - start_date
    weeks_offset = delta.days // 7
    if weeks_offset >= 0:
        year = start_date.year + (start_date + timedelta(weeks=weeks_offset)).year - start_date.year
        week = weeks_offset + 1
        return f"{year}W{week}"
    else:
        # Negative year-week for dates before start_date
        year = start_date.year - 1
        week = 52 + (weeks_offset % 52)  # Wraparound for previous year weeks
        return f"LY{week}"

df['year_week'] = df['date'].apply(lambda x: calculate_custom_year_week(x, start_date))

print(df)

Explanation:

  1. Start Date: Set 2021-06-28 as the reference start date for your year-week calculation.
  2. Week Offset: Compute the week offset from the start date as delta.days // 7.
  3. Custom Year Calculation:
    • For dates on or after the start date, calculate the year and week.
    • For dates before the start date, calculate weeks in the previous year, adding a prefix like LY.
  4. Year Wraparound: Handle wraparounds for weeks in a new or previous year (NY or LY).

Output:

        date  year_week
0 2021-03-12      LY38
1 2021-03-17      LY39
2 2021-06-28     2021W1
3 2022-05-21     2022W47
4 2022-08-17      NY8

Adjustments:

  • To customize prefixes (NY, LY) or formatting, modify the calculate_custom_year_week function.
  • This approach supports any custom start date and handles negative or overflow weeks gracefully.