Question: Explain the concept of Resilient Distributed Datasets (RDD) in PySpark.
Answer: RDD (Resilient Distributed Dataset) is the fundamental data structure in PySpark: an immutable collection of objects partitioned across the nodes of a cluster. Operations on an RDD run in parallel over its partitions, and fault tolerance comes from lineage: if a partition is lost, Spark recomputes it from the transformations that produced it rather than replicating the data.

Example:

# 'spark' is an active SparkSession (created automatically in the PySpark shell)
data = [1, 2, 3, 4, 5]
rdd = spark.sparkContext.parallelize(data)

©2025 WithoutBook