function to return the highest count value using a rule

ghz 12hours ago ⋅ 3 views

I have two columns like shown below, and trying to return the highest count of the second column, but its just returning me the highest count on rating without considering the gender

DATA :

print (df)

   AGE GENDER rating
0   10      M     PG
1   10      M      R
2   10      M      R
3    4      F   PG13
4    4      F   PG13

CODE :

 s = (df.groupby(['AGE', 'GENDER'])['rating']
       .apply(lambda x: x.value_counts().head(2))
       .rename_axis(('a','b', 'c'))
       .reset_index(level=2)['c'])

OUTPUT :

print (s[F])
('PG')

print(s[M]

('PG', 'R')

Answer

It seems like you're trying to get the highest count of ratings for each combination of AGE and GENDER, but the code you've written doesn't fully achieve that. Specifically, you're using .apply(lambda x: x.value_counts().head(2)), which returns the top 2 counts for each group, but it might not give you just the highest count, and it's not filtering out ratings correctly for each GENDER.

To get the highest count for the rating in each AGE and GENDER combination, you can modify your approach as follows:

Solution:

import pandas as pd

# Sample DataFrame
data = {
    'AGE': [10, 10, 10, 4, 4],
    'GENDER': ['M', 'M', 'M', 'F', 'F'],
    'rating': ['PG', 'R', 'R', 'PG13', 'PG13']
}

df = pd.DataFrame(data)

# Group by AGE and GENDER, then get the most common rating for each group
s = df.groupby(['AGE', 'GENDER'])['rating'] \
       .agg(lambda x: x.value_counts().idxmax()) \
       .reset_index(name='most_common_rating')

# Output the result
print(s)

Explanation:

  1. GroupBy and Aggregation: We use .groupby(['AGE', 'GENDER']) to group by AGE and GENDER.
  2. Aggregation Function (agg): The lambda function x.value_counts().idxmax() gets the most frequent rating in each group (i.e., the one with the highest count).
  3. Resetting Index: .reset_index(name='most_common_rating') gives a clean DataFrame with the most common ratings for each AGE and GENDER pair.

Output:

   AGE GENDER most_common_rating
0    4      F              PG13
1   10      M                R

Key Points:

  • This approach gives you only the most frequent rating for each group.
  • .value_counts().idxmax() retrieves the rating with the highest count.