Question: Explain the role of the 'broadcast' variable in PySpark.
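Answer: A broadcast variable, created with SparkContext.broadcast, caches a read-only value on each machine (executor) in the cluster. The value is shipped to each node once instead of with every task. This helps when many tasks read the same data, such as a lookup table. For DataFrames, the related function pyspark.sql.functions.broadcast marks a small DataFrame for broadcasting. Spark can then use a broadcast hash join, which copies the small table to every executor and joins the large table locally without shuffling it.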

Example:

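from pyspark.sql.functions import broadcast

# Assumes df1 (large) and df2 (small) are existing DataFrames that share a 'key' column.
# broadcast(df2) hints Spark to copy df2 to every executor and use a broadcast hash join.
result = df1.join(broadcast(df2), 'key')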
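The mechanism behind the join above can be sketched in plain Python. This is a simplified model, not Spark's implementation. The function name broadcast_hash_join and the representation of partitions as lists of dicts are hypothetical. The small table is hashed once (the "broadcast"), and every partition of the large table probes that same hash table independently, so the large table never has to be regrouped by key.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join every partition of a large table against one shared hash table.

    Simplified model (assumption): each partition is a list of dicts, and
    "broadcasting" means building the small-side hash table once and reusing it.
    """
    # Build side: hash the small table once (Spark would copy this to each executor)
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)

    result = []
    # Probe side: each partition is joined locally; no shuffle of the large table
    for partition in large_partitions:
        for row in partition:
            # .get avoids inserting empty entries into the defaultdict
            for match in table.get(row[key], []):
                result.append({**row, **match})
    return result


large = [
    [{"key": 1, "amount": 10}, {"key": 2, "amount": 20}],
    [{"key": 1, "amount": 5}, {"key": 3, "amount": 7}],
]
small = [{"key": 1, "name": "a"}, {"key": 2, "name": "b"}]
joined = broadcast_hash_join(large, small, "key")
print(joined)
```

The row with key 3 has no match and is dropped, as in an inner join. This approach pays off only when the small side fits comfortably in each executor's memory.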

©2025 WithoutBook