Question: Explain the purpose of the 'groupBy' operation in PySpark.
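Answer: 'groupBy' groups the rows of a DataFrame by the values of one or more columns. On its own it returns a GroupedData object rather than a result; you then apply an aggregation such as count, sum, avg, min, or max (directly or through 'agg') to get a DataFrame with one row per group.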

Example:

grouped_data = df.groupBy('Category').agg({'Price': 'mean'})
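
To make concrete what the line above computes, here is a minimal plain-Python sketch of the same group-then-average logic. It is an illustration of the semantics only, not how Spark executes the query; the helper name group_mean and the sample rows are hypothetical and stand in for the DataFrame df, whose contents the original example does not show.

```python
from collections import defaultdict


def group_mean(rows, key, value):
    """Group rows by `key` and return the mean of `value` for each group."""
    groups = defaultdict(list)
    for row in rows:
        # Collect every value that shares the same grouping key.
        groups[row[key]].append(row[value])
    # Reduce each group to a single number, as an aggregation does.
    return {k: sum(v) / len(v) for k, v in groups.items()}


# Hypothetical sample data standing in for the DataFrame `df`.
rows = [
    {"Category": "Books", "Price": 10.0},
    {"Category": "Books", "Price": 20.0},
    {"Category": "Toys", "Price": 8.0},
]

result = group_mean(rows, "Category", "Price")
print(result)
```

Each distinct Category ends up with exactly one averaged Price, which mirrors the one-row-per-group shape of the DataFrame that 'groupBy' followed by 'agg' returns.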

©2025 WithoutBook