I'm trying to read a Parquet file in Spark and I have a question.
How are column types determined when loading a Parquet file with spark.read.parquet?
- 1. Parquet type INT32 -> Spark type IntegerType
- 2. Type inferred from the actual stored values -> Spark type IntegerType
Is there a fixed mapping table, as in 1? Or is the type inferred from the actual stored values, as in 2?
CodePudding user response:
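Spark reads the schema stored in the Parquet file and converts it to its internal representation (i.e., a StructType); it does not infer types from the stored values, so it works like your option 1. This information is a bit hard to find in the Spark docs, so I went through the code to find the mapping you are looking for here: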
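
To make the idea concrete, here is a minimal sketch in plain Python of what such a schema-driven mapping looks like. It is not Spark's code: the real conversion is done in Scala by Spark's Parquet schema converter and covers many more cases (decimals, timestamps stored as INT64, nested types). The names `PARQUET_TO_SPARK` and `spark_type_for` are hypothetical, and the entries assume Spark's default settings.

```python
# Illustrative lookup only; Spark performs this conversion in Scala, not with this dict.
# Keys are (Parquet physical type, optional logical annotation) taken from the file
# schema. The stored values are never inspected.
PARQUET_TO_SPARK = {
    ("BOOLEAN", None): "BooleanType",
    ("INT32", None): "IntegerType",
    ("INT32", "DATE"): "DateType",
    ("INT64", None): "LongType",
    ("INT96", None): "TimestampType",   # assumes INT96-as-timestamp is enabled (default)
    ("FLOAT", None): "FloatType",
    ("DOUBLE", None): "DoubleType",
    ("BINARY", None): "BinaryType",     # assumes binary-as-string is disabled (default)
    ("BINARY", "UTF8"): "StringType",
}


def spark_type_for(physical, annotation=None):
    """Hypothetical helper: pick the Spark type from schema metadata alone."""
    try:
        return PARQUET_TO_SPARK[(physical, annotation)]
    except KeyError:
        raise ValueError(f"not covered by this sketch: {physical} / {annotation}")


print(spark_type_for("INT32"))           # IntegerType
print(spark_type_for("BINARY", "UTF8"))  # StringType
```

The practical consequence is that a column declared INT64 in the file comes back as LongType even if every value would fit in an INT32. You can check what Spark derived for your own file with `spark.read.parquet(path).printSchema()`.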
