Describe the bug, including details regarding any error messages, version, and platform.
Issue
I am saving a Parquet file with Spark, where one of the columns is a decimal. The physical type of this column becomes INT32 or INT64, depending on its precision. When I then read the file with AvroParquetReader, the column comes back as a plain long with the wrong value: for example, if the original value is 23.4, the read value is 234.
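For context, Parquet stores a DECIMAL as its unscaled integer together with a scale carried in the logical-type annotation, so 234 is simply 23.4 with the scale dropped; the reader is expected to reapply the scale. A minimal Java illustration of that relationship (an added sketch, not code from the report):

import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalScaleExample {
    public static void main(String[] args) {
        // DECIMAL(10,1) stores 23.4 as the unscaled integer 234 with scale 1.
        // A conforming reader must reapply the scale from the logical type:
        BigDecimal restored = new BigDecimal(BigInteger.valueOf(234L), 1);
        System.out.println(restored); // prints 23.4
    }
}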
Spark side
If I enable spark.sql.parquet.writeLegacyFormat in Spark (see SPARK-20297), Spark does not use INT32/INT64 as the physical type, and I can then read the Parquet file successfully. However, this is not the default option, and according to the decimal documentation in this repo, INT32/INT64 should be viable physical types.
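A sketch of the workaround, assuming a Java SparkSession (the same flag can be set from PySpark via spark.conf.set):

import org.apache.spark.sql.SparkSession;

public class LegacyFormatWorkaround {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("legacy-decimal-write")
                .getOrCreate();
        // With this flag on, Spark writes decimals in the legacy
        // (Hive-compatible) layout, FIXED_LEN_BYTE_ARRAY, instead of INT32/INT64.
        spark.conf().set("spark.sql.parquet.writeLegacyFormat", "true");
    }
}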
How to reproduce
Writing with Spark (version: 3.3.0)
# Build a DataFrame with a double column, add a DECIMAL(10,1) cast, and write it out
df_temp = spark.createDataFrame(
    [(120.321, "Alex"), (24.45, "John")],
    schema=["salary", "name"],
)
df_temp.createOrReplaceTempView("companyTable")
df = spark.sql("SELECT *, CAST(salary AS DECIMAL(10,1)) AS decimal_salary FROM companyTable")
df.show()
df.write.parquet("my_path")
Confirming the schema
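Presumably via df.printSchema(), which for the DataFrame above should print roughly (an assumption; the original schema output is not reproduced here):

root
 |-- salary: double (nullable = true)
 |-- name: string (nullable = true)
 |-- decimal_salary: decimal(10,1) (nullable = true)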
Running parquet-tools:
parquet-tools inspect github_example.parquet
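Per the Parquet DECIMAL specification, precision 10 exceeds the 9-digit limit of INT32, so the inspected schema should report the column with physical type INT64 and logical type DECIMAL(10,1), roughly:

optional int64 decimal_salary (DECIMAL(10,1));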
Reading with AvroParquetReader
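A minimal reading sketch, assuming the standard parquet-avro builder API with the default GenericRecord data model (the exact reader code is an assumption; file and column names follow the example above):

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

public class ReadDecimalExample {
    public static void main(String[] args) throws Exception {
        // github_example.parquet is the sample file attached below.
        try (ParquetReader<GenericRecord> reader =
                AvroParquetReader.<GenericRecord>builder(new Path("github_example.parquet")).build()) {
            GenericRecord record;
            while ((record = reader.read()) != null) {
                // decimal_salary comes back as a plain Long holding the
                // unscaled value, e.g. 234 instead of 23.4.
                System.out.println(record.get("decimal_salary"));
            }
        }
    }
}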
Dependencies
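The concrete dependency list is not given here; a typical minimal setup for the reader sketch above would be something like (artifact versions are illustrative):

<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-avro</artifactId>
  <version>1.12.3</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>3.3.4</version>
</dependency>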
Artifacts
github_example.parquet.zip
Component(s)
Avro