Prepare Interview

Mock Exams

Make Homepage

Bookmark this page

Subscribe Email Address

Question: Explain the concept of 'checkpointing' in PySpark.
Answer: 'Checkpointing' is a mechanism in PySpark to truncate the lineage of a RDD or DataFrame by saving it to a reliable distributed file system.

Example:

spark.sparkContext.setCheckpointDir('hdfs://path/to/checkpoint')
df_checkpointed = df.checkpoint()
Is it helpful? Yes No

Most helpful rated by users:

©2025 WithoutBook