I have two columns like shown below, and trying to return the highest count of the second column, but its just returning me the highest count on rating without considering the gender
DATA :
print (df)
AGE GENDER rating
0 10 M PG
1 10 M R
2 10 M R
3 4 F PG13
4 4 F PG13
CODE :
s = (df.groupby(['AGE', 'GENDER'])['rating']
.apply(lambda x: x.value_counts().head(2))
.rename_axis(('a','b', 'c'))
.reset_index(level=2)['c'])
OUTPUT :
print (s[F])
('PG')
print(s[M]
('PG', 'R')
Answer
It seems like you're trying to get the highest count of ratings for each combination of AGE
and GENDER
, but the code you've written doesn't fully achieve that. Specifically, you're using .apply(lambda x: x.value_counts().head(2))
, which returns the top 2 counts for each group, but it might not give you just the highest count, and it's not filtering out ratings correctly for each GENDER
.
To get the highest count for the rating
in each AGE
and GENDER
combination, you can modify your approach as follows:
Solution:
import pandas as pd
# Sample DataFrame
data = {
'AGE': [10, 10, 10, 4, 4],
'GENDER': ['M', 'M', 'M', 'F', 'F'],
'rating': ['PG', 'R', 'R', 'PG13', 'PG13']
}
df = pd.DataFrame(data)
# Group by AGE and GENDER, then get the most common rating for each group
s = df.groupby(['AGE', 'GENDER'])['rating'] \
.agg(lambda x: x.value_counts().idxmax()) \
.reset_index(name='most_common_rating')
# Output the result
print(s)
Explanation:
- GroupBy and Aggregation: We use
.groupby(['AGE', 'GENDER'])
to group byAGE
andGENDER
. - Aggregation Function (
agg
): The lambda functionx.value_counts().idxmax()
gets the most frequent rating in each group (i.e., the one with the highest count). - Resetting Index:
.reset_index(name='most_common_rating')
gives a clean DataFrame with the most common ratings for eachAGE
andGENDER
pair.
Output:
AGE GENDER most_common_rating
0 4 F PG13
1 10 M R
Key Points:
- This approach gives you only the most frequent rating for each group.
.value_counts().idxmax()
retrieves the rating with the highest count.