I have the following DataFrame:
start end days
0 2015-07-01 2015-07-07 (1, 2, 3, 4, 5, 6, 7)
1 2015-07-08 2015-07-14 (8, 9, 10, 11, 12, 13, 14)
2 2015-07-15 2015-07-21 (15, 16, 17, 18, 19, 20, 21)
3 2015-07-22 2015-07-28 (22, 23, 24, 25, 26, 27, 28)
4 2015-07-29 2015-08-04 (29, 30, 31, 1, 2, 3, 4)
5 2015-08-05 2015-08-11 (5, 6, 7, 8, 9, 10, 11)
6 2015-08-12 2015-08-18 (12, 13, 14, 15, 16, 17, 18)
7 2015-08-19 2015-08-25 (19, 20, 21, 22, 23, 24, 25)
8 2015-08-26 2015-09-01 (26, 27, 28, 29, 30, 31, 1)
9 2015-09-02 2015-09-08 (2, 3, 4, 5, 6, 7, 8)
10 2015-09-09 2015-09-15 (9, 10, 11, 12, 13, 14, 15)
11 2015-09-16 2015-09-22 (16, 17, 18, 19, 20, 21, 22)
12 2015-09-23 2015-09-29 (23, 24, 25, 26, 27, 28, 29)
I am interested in working with the days column containing tuples, using Pandas syntax for basic filtering does not appear to work:
df[4 in df['days'] == True]
I was hoping the above would filter the DataFrame to return the following rows, i.e. tuples containing 4:
start end days
0 2015-07-01 2015-07-07 (1, 2, 3, 4, 5, 6, 7)
4 2015-07-29 2015-08-04 (29, 30, 31, 1, 2, 3, 4)
9 2015-09-02 2015-09-08 (2, 3, 4, 5, 6, 7, 8)
Instead an empty DataFrame is returned.
I have also tried creating a new column to hold True/False values based on checking against an expression like so:
df['daysTF'] = 4 in df['days']
This returns the DataFrame with the 'daysTF' column set to True for all rows, instead of only True if 4 is contained within the tuple.
Answer
The issue you're encountering is due to the way Pandas interprets operations on columns that contain tuples. When you try to use basic filtering like df[4 in df['days'] == True]
, you're actually trying to check if the number 4
is in the entire df['days']
series, which does not work as expected.
To filter rows based on whether the number 4
is in the tuples in the days
column, you need to use a more explicit method to apply the check row by row. Here’s how you can achieve the desired filtering:
Solution:
You can apply a lambda function or use the apply
method to check if 4
is in each tuple of the days
column. Here's the correct approach:
import pandas as pd
# Sample DataFrame
data = {
'start': ['2015-07-01', '2015-07-08', '2015-07-15', '2015-07-22', '2015-07-29',
'2015-08-05', '2015-08-12', '2015-08-19', '2015-08-26', '2015-09-02',
'2015-09-09', '2015-09-16', '2015-09-23'],
'end': ['2015-07-07', '2015-07-14', '2015-07-21', '2015-07-28', '2015-08-04',
'2015-08-11', '2015-08-18', '2015-08-25', '2015-09-01', '2015-09-08',
'2015-09-15', '2015-09-22', '2015-09-29'],
'days': [
(1, 2, 3, 4, 5, 6, 7),
(8, 9, 10, 11, 12, 13, 14),
(15, 16, 17, 18, 19, 20, 21),
(22, 23, 24, 25, 26, 27, 28),
(29, 30, 31, 1, 2, 3, 4),
(5, 6, 7, 8, 9, 10, 11),
(12, 13, 14, 15, 16, 17, 18),
(19, 20, 21, 22, 23, 24, 25),
(26, 27, 28, 29, 30, 31, 1),
(2, 3, 4, 5, 6, 7, 8),
(9, 10, 11, 12, 13, 14, 15),
(16, 17, 18, 19, 20, 21, 22),
(23, 24, 25, 26, 27, 28, 29)
]
}
df = pd.DataFrame(data)
# Filter using apply and lambda function to check if 4 is in the tuple in the 'days' column
filtered_df = df[df['days'].apply(lambda x: 4 in x)]
print(filtered_df)
Explanation:
.apply(lambda x: 4 in x)
: This applies a function to each element (each tuple) in thedays
column. For each row, it checks whether the number4
is present in the tuple.df[df['days'].apply(lambda x: 4 in x)]
: This filters the DataFrame by keeping only the rows where the condition (4 being in the tuple) isTrue
.
Output:
start end days
0 2015-07-01 2015-07-07 (1, 2, 3, 4, 5, 6, 7)
4 2015-07-29 2015-08-04 (29, 30, 31, 1, 2, 3, 4)
9 2015-09-02 2015-09-08 (2, 3, 4, 5, 6, 7, 8)
Notes:
apply()
: This method allows you to apply a function to each row (or element) in a series. In this case, the function is checking if4
exists in the tuple.- You could replace
4
with any other number to filter based on different conditions.
Let me know if you have more questions!