How does schema inference work in spark.read.parquet?

I'm trying to read a Parquet file in Spark and I have a question.

How is the column type determined when loading a Parquet file with spark.read.parquet?

1. Parquet type INT32 -> Spark type IntegerType
2. Type inferred from the actual stored values -> Spark type IntegerType

Is there a fixed type mapping, as in 1? Or is the type inferred from the actual stored values, as in 2?

User response:

It works like option 1. Spark does not look at the stored values: it reads the schema stored in the Parquet file's metadata and converts it to its internal representation (i.e., a StructType). This information is a bit hard to find in the Spark docs, so I went through the code to find the mapping you are looking for:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L197-L281
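
To make the idea of a fixed mapping concrete, here is a minimal sketch of such a lookup in plain Python. It is not Spark's code: the table is a hand-written subset based on my reading of the linked converter under default settings, and the names `PARQUET_TO_SPARK` and `spark_type_for` are hypothetical, chosen only for this illustration. The real converter also handles decimals, timestamps, nested types and several configuration flags.

```python
from typing import Optional

# Hand-written, partial illustration (NOT Spark's actual code).
# Key: (Parquet physical type, optional logical annotation).
# Value: name of the Spark SQL type it is assumed to map to by default.
PARQUET_TO_SPARK = {
    ("BOOLEAN", None): "BooleanType",
    ("INT32", None): "IntegerType",
    ("INT32", "INT_8"): "ByteType",
    ("INT32", "INT_16"): "ShortType",
    ("INT32", "DATE"): "DateType",
    ("INT64", None): "LongType",
    ("FLOAT", None): "FloatType",
    ("DOUBLE", None): "DoubleType",
    ("BINARY", None): "BinaryType",
    ("BINARY", "UTF8"): "StringType",
}


def spark_type_for(physical: str, annotation: Optional[str] = None) -> str:
    """Hypothetical helper: look up the Spark type name for a Parquet column.

    Only the declared Parquet type is consulted; stored values play no role.
    """
    try:
        return PARQUET_TO_SPARK[(physical, annotation)]
    except KeyError:
        raise ValueError(
            f"No entry in this sketch for {physical} with annotation {annotation}"
        )


if __name__ == "__main__":
    print(spark_type_for("INT32"))           # plain INT32 column
    print(spark_type_for("BINARY", "UTF8"))  # string column
```

To see what Spark actually produced for a given file, call printSchema() on the DataFrame returned by spark.read.parquet.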