Question: What is the purpose of the 'cache' operation in PySpark?
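Answer: The 'cache' operation marks a DataFrame or RDD to be kept after it is first computed, so that later actions reuse the stored result instead of recomputing the whole lineage. This improves the performance of iterative algorithms and of repeated operations on the same data. Caching is lazy: nothing is stored until an action (such as count()) runs. By default, RDD.cache() keeps data in memory only, while DataFrame.cache() keeps data in memory and spills to disk when it does not fit.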

Example:

df.cache()

©2025 WithoutBook