Interview Questions and Answers
Freshers / Beginner level questions & answers
Ques 1. What is Sqoop?
Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Ques 2. What is the purpose of the --target-dir option in Sqoop import?
The --target-dir option specifies the HDFS directory where the imported data will be stored.
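As a minimal sketch, the import below writes its output files directly under the directory given to --target-dir. The JDBC URL, table name, and HDFS path are placeholder assumptions. The snippet only assembles and prints the command, so it can be read without a Sqoop installation.

```shell
# Placeholder connection string, table, and path (assumptions for illustration).
cmd="sqoop import"
cmd="$cmd --connect jdbc:mysql://localhost:3306/db"
cmd="$cmd --table mytable"
# Output files land directly in this HDFS directory.
cmd="$cmd --target-dir /user/hadoop/mytable"

# Print the command instead of running it; paste the output into a shell to execute it.
echo "$cmd"
```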
Ques 3. What is the purpose of the --warehouse-dir option in Sqoop?
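The --warehouse-dir option specifies the parent directory in HDFS for imports. Each imported table is written to a subdirectory named after the table, which makes the option convenient when importing several tables. It cannot be combined with --target-dir, which names the exact output directory instead.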
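With --warehouse-dir, Sqoop creates a subdirectory named after the table under the given parent directory. The sketch below prints such an import and the directory it would be expected to produce. The parent directory, table name, and JDBC URL are hypothetical values, and nothing is executed against a cluster.

```shell
# Hypothetical parent directory and table name (assumptions for illustration).
warehouse_dir="/user/hadoop/warehouse"
table="mytable"

cmd="sqoop import"
cmd="$cmd --connect jdbc:mysql://localhost:3306/db"
cmd="$cmd --table $table"
cmd="$cmd --warehouse-dir $warehouse_dir"

# The table's files go into <warehouse-dir>/<table>.
output_dir="$warehouse_dir/$table"

# Print the command and the expected location instead of running the import.
echo "$cmd"
echo "expected output directory: $output_dir"
```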
Ques 4. What is the purpose of the --update-key option in Sqoop export?
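The --update-key option names the column (or comma-separated columns) that identify existing rows during an export. Instead of INSERT statements, Sqoop generates UPDATE statements whose WHERE clause matches on those columns.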
Ques 5. Explain the purpose of the --as-textfile option in Sqoop.
The --as-textfile option stores imported data in HDFS as delimited text files. This is the default format when no --as-* option is given.
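A minimal sketch of a text-format import follows. The connection string, table, and target directory are placeholder assumptions, and the command is printed rather than executed. Replacing --as-textfile with --as-sequencefile, --as-avrodatafile, or --as-parquetfile selects one of the other formats Sqoop supports.

```shell
# Placeholder connection string, table, and path (assumptions for illustration).
cmd="sqoop import"
cmd="$cmd --connect jdbc:mysql://localhost:3306/db"
cmd="$cmd --table mytable"
# Plain delimited text output; also what Sqoop does when no format flag is given.
cmd="$cmd --as-textfile"
cmd="$cmd --target-dir /user/hadoop/mytable_text"

# Print the command instead of running it.
echo "$cmd"
```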
Ques 6. How can you perform a full load import in Sqoop?
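A full load is a plain import without the --incremental option: every row of the source table is imported on each run. The -m (or --num-mappers) option only controls parallelism. Setting it to 1 runs the import with a single mapper, which is needed when the table has no primary key and no --split-by column is given.
Example:
sqoop import --connect jdbc:mysql://localhost:3306/db --table mytable -m 1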
Ques 7. What is the purpose of the --null-string and --null-non-string options in Sqoop?
These options set the strings written in place of NULL values in the imported data: --null-string for string columns and --null-non-string for all other columns. Each option takes a value; if an option is not given, Sqoop writes the string null.
Example:
sqoop import --connect jdbc:mysql://localhost:3306/db --table mytable --null-string '\\N' --null-non-string '\\N'
Ques 8. What is the purpose of the --columns option in Sqoop?
The --columns option allows you to specify a comma-separated list of columns to import, excluding others from the source table.
Example:
sqoop import --connect jdbc:mysql://localhost:3306/db --table mytable --columns id,name
Ques 9. What is the purpose of the --fetch-size option in Sqoop?
The --fetch-size option specifies the number of rows to fetch in each round trip between Sqoop and the database during import.
Example:
sqoop import --connect jdbc:mysql://localhost:3306/db --table mytable --fetch-size 100
Intermediate / 1 to 5 years experienced level questions & answers
Ques 10. Explain the import command in Sqoop.
The import command in Sqoop is used to import data from a relational database into Hadoop.
Example:
sqoop import --connect jdbc:mysql://localhost:3306/db --table mytable --target-dir /user/hadoop/mytable
Ques 11. How can you perform an incremental import in Sqoop?
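Incremental imports use the --incremental option with a mode: append for tables that only gain new rows, or lastmodified for tables whose rows are updated and carry a timestamp column. The --check-column option names the column to examine, and --last-value gives the largest value already imported.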
Example:
sqoop import --connect jdbc:mysql://localhost:3306/db --table mytable --incremental append --check-column id --last-value 100
Ques 12. Explain the export command in Sqoop.
The export command in Sqoop is used to export data from Hadoop to a relational database.
Example:
sqoop export --connect jdbc:mysql://localhost:3306/db --table mytable --export-dir /user/hadoop/mytable
Ques 13. What is the metastore in Sqoop?
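The Sqoop metastore is a shared repository for saved jobs. It stores the job definitions created with sqoop job --create, including their connection parameters and, for incremental imports, the last imported value, so that multiple users can list and run the same jobs.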
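The sketch below shows how a saved job might be created in and run from a shared metastore with the sqoop job tool. The metastore host, job name, and database details are hypothetical, and the commands are only printed, not executed.

```shell
# Hypothetical metastore URL and job name (assumptions for illustration).
meta="jdbc:hsqldb:hsql://metastore.example.com:16000/sqoop"
job="myjob"

# Save an import definition in the metastore; everything after "--" is the job itself.
create_job="sqoop job --meta-connect $meta --create $job -- import --connect jdbc:mysql://localhost:3306/db --table mytable"

# Run the saved job later, possibly from another machine.
run_job="sqoop job --meta-connect $meta --exec $job"

# Print both commands instead of running them.
echo "$create_job"
echo "$run_job"
```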
Ques 14. Explain the purpose of the --map-column-java option in Sqoop.
The --map-column-java option allows you to specify how the columns from the database table should be mapped to Java types during import.
Example:
sqoop import --connect jdbc:mysql://localhost:3306/db --table mytable --map-column-java id=String,value=Double
Ques 15. What is the difference between the free-form query import and table-based import in Sqoop?
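A table-based import (--table) copies a table, optionally narrowed with --columns and --where. A free-form query import (--query) copies the result of an arbitrary SELECT statement, such as a join. The query must contain the $CONDITIONS placeholder, and the import needs --target-dir plus either --split-by or a single mapper (-m 1).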
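To make the contrast concrete, the sketch below prints one import of each kind. The table, columns, and paths are placeholder assumptions, and neither command is executed. Note the single quotes around the query, which stop the shell from expanding $CONDITIONS before Sqoop sees it.

```shell
# Placeholder connection string shared by both commands (assumption for illustration).
conn="--connect jdbc:mysql://localhost:3306/db"

# Table-based import: Sqoop builds the SELECT statement itself.
table_import="sqoop import $conn --table mytable"

# Free-form query import: the SELECT is supplied explicitly and must contain
# the literal $CONDITIONS token, which Sqoop replaces with each mapper's range.
query_import="sqoop import $conn --query 'SELECT id, name FROM mytable WHERE \$CONDITIONS' --split-by id --target-dir /user/hadoop/mytable_query"

# Print both commands instead of running them.
echo "$table_import"
echo "$query_import"
```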
Ques 16. Explain the purpose of the --merge-key option in Sqoop.
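The --merge-key option names the column that the sqoop merge tool uses as the record key. When a row in the newer dataset and a row in the older dataset share the same key value, the newer row is kept in the merged output.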
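A sketch of a merge invocation follows. The three HDFS paths, the jar file, and the class name are hypothetical; in practice the jar and class are the record class generated by an earlier import or by sqoop codegen. The command is printed rather than executed.

```shell
# Hypothetical dataset locations and generated record class (assumptions for illustration).
cmd="sqoop merge"
cmd="$cmd --new-data /user/hadoop/mytable_new"
cmd="$cmd --onto /user/hadoop/mytable_old"
cmd="$cmd --target-dir /user/hadoop/mytable_merged"
cmd="$cmd --jar-file mytable.jar"
cmd="$cmd --class-name mytable"
# Rows with the same id are treated as the same record; the newer one wins.
cmd="$cmd --merge-key id"

# Print the command instead of running it.
echo "$cmd"
```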
Ques 17. Explain the purpose of the --query option in Sqoop.
The --query option allows you to specify a SQL SELECT statement to retrieve data during Sqoop import.
Example:
sqoop import --connect jdbc:mysql://localhost:3306/db --query 'SELECT * FROM mytable WHERE $CONDITIONS' --split-by id --target-dir /user/hadoop/mytable
Ques 18. Explain the purpose of the --boundary-query option in Sqoop.
The --boundary-query option allows you to specify a SQL query that is used to determine the range of values for the splitting column.
Example:
sqoop import --connect jdbc:mysql://localhost:3306/db --table mytable --boundary-query 'SELECT MIN(id), MAX(id) FROM mytable'
Ques 20. How can you import data into Hive using Sqoop?
You can import data into Hive using Sqoop by specifying the --hive-import option along with the target Hive table using --hive-table.
Example:
sqoop import --connect jdbc:mysql://localhost:3306/db --table mytable --hive-import --hive-table myhivetable
Ques 21. Explain the purpose of the --hive-overwrite option in Sqoop.
The --hive-overwrite option in Sqoop is used to overwrite existing data in the Hive table during import.
Example:
sqoop import --connect jdbc:mysql://localhost:3306/db --table mytable --hive-import --hive-table myhivetable --hive-overwrite
Ques 22. How can you perform an export operation in Sqoop to update existing records?
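To update existing records during an export, use the --update-key option to name the column(s) that identify a row. With the default --update-mode updateonly, only rows that already exist are updated. Setting --update-mode allowinsert also inserts rows that do not exist yet (an upsert), where the database connector supports it.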
Example:
sqoop export --connect jdbc:mysql://localhost:3306/db --table mytable --update-key id --update-mode allowinsert --export-dir /user/hadoop/mytable
Ques 23. Explain the purpose of the --hcatalog-database option in Sqoop.
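The --hcatalog-database option specifies the HCatalog database that contains the target table. It is used together with --hcatalog-table, which is what enables an HCatalog import; if the database is omitted, the default database is used.
Example:
sqoop import --connect jdbc:mysql://localhost:3306/db --table mytable --hcatalog-database mydatabase --hcatalog-table mytable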
Experienced / Expert level questions & answers
Ques 24. Explain the purpose of the --direct option in Sqoop.
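The --direct option tells Sqoop to use a database-specific fast path instead of generic JDBC transfers, where the connector provides one. For MySQL, for example, direct mode uses the mysqldump and mysqlimport utilities. Data still moves between the database and HDFS; only the transfer mechanism changes.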
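As a sketch, a direct-mode MySQL import adds only the --direct flag to an ordinary import. The connection string and table are placeholder assumptions, and the command is printed rather than executed. For MySQL, direct mode relies on the mysqldump utility being available on the worker nodes.

```shell
# Placeholder connection string and table (assumptions for illustration).
cmd="sqoop import"
cmd="$cmd --connect jdbc:mysql://localhost:3306/db"
cmd="$cmd --table mytable"
# Ask the connector to use its native fast path instead of JDBC reads.
cmd="$cmd --direct"

# Print the command instead of running it.
echo "$cmd"
```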
Ques 25. Explain the purpose of the --direct-split-size option in Sqoop.
The --direct-split-size option splits the input stream every n bytes when importing in direct mode, so the import is written as several files of bounded size. It applies to imports only.
Example:
sqoop import --connect jdbc:mysql://localhost:3306/db --table mytable --direct --direct-split-size 1000000
Ques 26. What is the purpose of the --direct-import option in Sqoop?
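The Sqoop 1.4.x documentation does not list a --direct-import option. Direct-mode transfers are requested with the --direct option, which uses database-specific native tools rather than JDBC for both imports and exports where the connector supports it.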
Ques 27. What is the purpose of the --validate option in Sqoop?
The --validate option validates the copied data by comparing the row counts of the source and the target after the transfer. It supports single-table imports and exports.
Example:
sqoop import --connect jdbc:mysql://localhost:3306/db --table mytable --validate
Ques 30. Explain the purpose of the --autoreset-to-one-mapper option in Sqoop.
The --autoreset-to-one-mapper option makes the import fall back to a single mapper when the table has no primary key and no split column is provided. It cannot be combined with --split-by.
Example:
sqoop import --connect jdbc:mysql://localhost:3306/db --table mytable --autoreset-to-one-mapper