In Pandas, the groupby() and agg() methods are typically used together: groupby() splits the data in a DataFrame into groups based on one or more columns, and agg() then performs aggregation operations on those groups.
After grouping the data using groupby(), you can use agg() to specify one or more functions to be applied to each group of the data. These functions can be built-in aggregation functions, such as mean(), sum(), min(), max(), etc., or custom functions defined by the user.
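As a minimal sketch of the two kinds of functions mentioned above, the following example aggregates a small, made-up DataFrame once with a built-in function referenced by name and once with a user-defined function. The sample data and the names sales and value_range are assumptions chosen for illustration only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
sales = pd.DataFrame({
    'team': ['a', 'a', 'b', 'b'],
    'score': [10, 20, 5, 8],
})

# Hypothetical custom aggregation: difference between the largest and smallest value
def value_range(series):
    return series.max() - series.min()

# Built-in aggregation referenced by its name
total = sales.groupby('team')['score'].agg('sum')

# User-defined aggregation passed as a callable
spread = sales.groupby('team')['score'].agg(value_range)

print(total)
print(spread)
```

Both calls return one value per group, indexed by the grouping column.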
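The agg() method applies the specified functions to each group and returns a new DataFrame with the aggregated data. By default, the grouping columns become the row index of the result; they remain regular columns when as_index=False is passed to groupby(). When several functions are requested per column, the columns of the result form a hierarchical index (MultiIndex), where the first level corresponds to the columns on which the aggregation was applied, and the second level corresponds to the names of the aggregation functions.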
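To see what the aggregated result looks like in practice, the short sketch below inspects the row index and the column labels of a grouped result. The DataFrame df_demo and its column names are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df_demo = pd.DataFrame({
    'key': ['x', 'x', 'y'],
    'val': [1, 2, 3],
})

# Default behaviour: the grouping column becomes the row index
result = df_demo.groupby('key').agg({'val': ['min', 'max']})
print(result.index.tolist())    # group keys
print(result.columns.tolist())  # (column, function) pairs

# With as_index=False the grouping column is kept as a regular column
flat = df_demo.groupby('key', as_index=False).agg({'val': ['min', 'max']})
print(flat.columns.tolist())
```

Inspecting the column labels this way is a quick check before assigning a flat list of new column names, since the list must match the number of columns exactly.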
Overall, groupby() and agg() methods are two powerful tools in Pandas for data grouping and aggregation operations, which can help users extract meaningful insights and information from their data.
Let’s demonstrate how to use the groupby and agg methods in Pandas to perform data aggregation and transformation operations on a DataFrame:
import pandas as pd
import numpy as np
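# Custom aggregation function: scale the values by 100, then take the mean
def transformed_mean(value):
    # Build a new Series rather than using `value *= 100`, which could modify the group's data in place
    return (value * 100).mean()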
# Create DataFrame
df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 3, 3],
    'Name': ['Danny', 'Adil', 'Andi', 'Mala', 'Zack', None],
    'Point': [20, 21, 30, 11, 10, np.nan],
    'Redeemed': [3, 5, 20, 0, 1, 1]
})
df['Name'] = df['Name'].astype(str)
# groupby() and agg()
df_grp = df.groupby(['ID'], as_index=False).agg({
    'Name': ['-'.join, 'sum', 'count', 'size'],
    'Point': ['min', 'max', 'mean', 'sum', 'std', 'count', 'size'],
    'Redeemed': ['sum', lambda x: (x * 2).sum(), transformed_mean]
})
# Rename columns to more descriptive names (one name per aggregated column, in order)
df_grp.columns = ['ID', 'Name_join', 'Name_sum', 'Name_count', 'Name_size',
                  'Point_min', 'Point_max', 'Point_mean', 'Point_sum', 'Point_std', 'Point_count', 'Point_size',
                  'Redeemed_sum', 'Redeemed_lambda', 'Redeemed_transformed_mean']
df_grp
Please refer to the following image for a description of the script:

Here’s a summary of what the script does:
- A DataFrame df is created from a Python dictionary with the columns ID, Name, Point, and Redeemed. The DataFrame contains some missing values, represented by None and np.nan.
- A custom function transformed_mean is defined, which multiplies the values it receives by 100 and then calculates their mean. It demonstrates how a user-defined function can be applied to each group.
- The groupby method is used to group the DataFrame df by the ID column. The agg method is then called on the grouped data to aggregate each column with the specified functions. The results are saved in a new DataFrame df_grp.
- The columns in df_grp are renamed using the columns attribute to create more descriptive names.
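As an alternative to renaming the columns afterwards, Pandas also supports named aggregation, where each output column is given its name directly in the agg call. The sketch below is not part of the original script; it uses a reduced, hypothetical DataFrame df_small with the same column names and assumes a Pandas version that supports this syntax (0.25 or later).

```python
import pandas as pd

# Reduced, hypothetical version of the example data (no missing values)
df_small = pd.DataFrame({
    'ID': [1, 1, 2, 2],
    'Point': [20, 21, 30, 11],
    'Redeemed': [3, 5, 20, 0],
})

# Named aggregation: output_name=(input_column, function)
df_named = df_small.groupby('ID', as_index=False).agg(
    Point_min=('Point', 'min'),
    Point_max=('Point', 'max'),
    Redeemed_sum=('Redeemed', 'sum'),
)

print(df_named)
```

This yields flat, descriptive column names in one step, so there is no list of names to keep in sync with the order of the aggregations.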
In this script, you have seen how to group and aggregate data in a Pandas DataFrame using the groupby and agg methods: an existing DataFrame is grouped by a particular column, agg applies a set of built-in and custom functions to each group, and the columns of the resulting DataFrame are renamed to make them more descriptive. Together, these methods let you summarize data in useful ways, making it easier to draw insights from complex datasets.