AnalysisException: Attribute name contains invalid character(s) issue #462

rambabu-posa · 2020-06-22T13:30:25Z

Exception in thread "main" org.apache.spark.sql.AnalysisException: Attribute name "Code région" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;

rambabu-posa · 2020-06-22T13:39:17Z

This is not limited to column names with special characters: an ordinary column name such as "Transaction description" fails with the same error.

In other words, if a Spark DataFrame is read from a CSV file whose column names contain spaces, the same AnalysisException is thrown.

Is there any way to fix this by automatically substituting a default character for the space, e.g.
"Transaction description"
to
"Transaction_description"

Or is there any other solution for this scenario, please?

brkyvz · 2020-06-22T15:49:21Z

You can use the withColumnRenamed function of DataFrames to rename your columns, or you can alias columns in your select statement with alias, e.g.

select(col("Transaction description").alias("Transaction_description"))
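For a DataFrame with many columns, the renaming suggested above can be applied to every column at once. The following is a minimal Python sketch, not an official Delta Lake or Spark utility: the helper names `sanitize_column_name` and `rename_invalid_columns` are hypothetical, the underscore replacement is just one possible convention, and `df` is assumed to be a PySpark DataFrame. The character set is taken from the exception message.

```python
import re

# Characters listed in the AnalysisException message: space , ; { } ( ) \n \t =
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def sanitize_column_name(name, replacement="_"):
    """Replace every rejected character in a column name with `replacement`."""
    return INVALID_CHARS.sub(replacement, name)


def rename_invalid_columns(df, replacement="_"):
    """Return a DataFrame whose column names contain none of the rejected characters.

    `df` is assumed to be a pyspark.sql.DataFrame; toDF(*names) renames all
    columns positionally in a single call.
    """
    new_names = [sanitize_column_name(c, replacement) for c in df.columns]
    # Two different source names can map to the same sanitized name; fail loudly.
    if len(set(new_names)) != len(new_names):
        raise ValueError(f"sanitized column names collide: {new_names}")
    return df.toDF(*new_names)


print(sanitize_column_name("Transaction description"))  # Transaction_description
print(sanitize_column_name("Code région"))              # Code_région
```

Calling `rename_invalid_columns(df)` just before the write keeps the original names everywhere else in the job. Non-ASCII letters such as "é" are left alone because the error message does not list them.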

rambabu-posa · 2020-06-22T16:01:47Z

Yes, we can do it programmatically. Is there any option to fix it without doing so?
I think we shouldn't need to rename columns: plain Spark works very well without doing so.

brkyvz · 2020-06-22T16:05:54Z

Parquet doesn't allow storing such column names. I'd say it's better engineering practice for you to follow some convention yourself and fix the names rather than having some system arbitrarily fix it for you.
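One way to enforce such a convention yourself is to check the column names before writing, so the offending names are reported up front. This is a minimal Python sketch under stated assumptions: `find_invalid_columns` is a hypothetical helper, not part of Spark or Delta Lake, and the character set is copied from the exception message in this issue.

```python
# Characters listed in the AnalysisException message.
INVALID_CHARS = " ,;{}()\n\t="


def find_invalid_columns(columns):
    """Map each offending column name to the sorted list of rejected characters it contains."""
    offenders = {}
    for name in columns:
        bad = sorted(set(name) & set(INVALID_CHARS))
        if bad:
            offenders[name] = bad
    return offenders


print(find_invalid_columns(["id", "Code région", "amount;usd"]))
# {'Code région': [' '], 'amount;usd': [';']}
```

With a PySpark DataFrame, the input would be `df.columns`; an empty result means no name contains a character from the list.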

rambabu-posa · 2020-06-23T11:00:15Z

I'm able to do the same operation using the DataFrame API, like
df.write.format("parquet").save("data/par_data")

So I would say it is a feature of Delta Lake that it enforces this good practice.

tdas · 2020-06-23T20:02:54Z

That has indeed been one of our core design principles: an opinionated view of how to manage data without shooting yourself in the foot.

zsxwing · 2021-04-07T03:46:13Z

Closing this. Invalid characters are disallowed by design.

surya1527 · 2022-02-22T07:44:40Z

Parquet doesn't allow storing such column names. I'd say it's better engineering practice for you to follow some convention yourself and fix the names rather than having some system arbitrarily fix it for you.

for c in df.columns:
    # Strip the ";" character from each column name
    df = df.withColumnRenamed(c, c.replace(";", ""))

That worked fine in my case.

Fixes delta-io#462

zsxwing closed this as completed Apr 7, 2021

olivertan1999 mentioned this issue Sep 21, 2022

Revenue take rate metrics MAST30034-Applied-Data-Science/generic-buy-now-pay-later-project-group-19#29

Merged

tdas pushed a commit to tdas/delta that referenced this issue May 31, 2023

Update Flink FAQ to remove outdated answer (delta-io#463)

743cf09

Fixes delta-io#462
