AnalysisException: Attribute name contains invalid character(s) issue #462

Closed
rambabu-posa opened this issue Jun 22, 2020 · 8 comments

Comments

@rambabu-posa

Exception in thread "main" org.apache.spark.sql.AnalysisException: Attribute name "Code région" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;

@rambabu-posa
Author

rambabu-posa commented Jun 22, 2020

This is not limited to column names with special characters: an ordinary column name such as "Transaction description" fails with the same error.

That means that if a Spark DataFrame is read from a CSV file whose column names contain spaces, it throws the same AnalysisException.
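
A minimal sketch of how the affected columns could be spotted before writing, assuming the rejected characters are exactly those listed in the error message above. The helper name `find_invalid_columns` and the sample header list are hypothetical; in practice the list would come from `df.columns`.

```python
# Characters named in the AnalysisException message: space , ; { } ( ) \n \t =
INVALID_CHARS = set(" ,;{}()\n\t=")


def find_invalid_columns(columns):
    """Return the column names that contain at least one rejected character."""
    return [name for name in columns if any(ch in INVALID_CHARS for ch in name)]


# Hypothetical header row, standing in for df.columns after reading a CSV.
columns = ["Code région", "Transaction description", "amount"]
print(find_invalid_columns(columns))
# ['Code région', 'Transaction description']
```

Note that the accented "é" is not in the rejected set; only the space makes "Code région" fail.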

Is there any way to fix this by default, for example by replacing the space with a default character, so that
"Transaction description"
becomes
"Transaction_description"?

Or is there any other way to handle this scenario?

@brkyvz
Collaborator

brkyvz commented Jun 22, 2020

You can use the withColumnRenamed function of DataFrames to rename your columns, or you can alias columns in your select statement with alias, e.g.

select(col("Transaction description").alias("Transaction_description"))
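
To apply the same idea to every column at once, one possible sketch is shown below. It assumes the character list from the error message and uses "_" as the replacement, as proposed in the question; `sanitize` and `rename_plan` are hypothetical helper names, not part of Spark or Delta Lake.

```python
import re

# Character class built from the list in the error message:
# space , ; { } ( ) newline tab =
_INVALID = re.compile(r"[ ,;{}()\n\t=]")


def sanitize(name, replacement="_"):
    """Replace every rejected character in a column name."""
    return _INVALID.sub(replacement, name)


def rename_plan(columns):
    """Pair each original name with its sanitized form, keeping column order."""
    return [(old, sanitize(old)) for old in columns]


# Hypothetical header row, standing in for df.columns.
plan = rename_plan(["Code région", "Transaction description", "amount"])
for old, new in plan:
    print(f"{old!r} -> {new!r}")
```

With a PySpark DataFrame `df`, the new names could then be applied in one step with `df.toDF(*[new for _, new in plan])`, or one at a time with `withColumnRenamed(old, new)`.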

@rambabu-posa
Author

Yes, we can do it programmatically. Is there any option to fix it without doing so?
I don't think we need to rename columns in Spark itself; it works fine without renaming.

@brkyvz
Collaborator

brkyvz commented Jun 22, 2020

Parquet doesn't allow storing such column names. I'd say it's better engineering practice for you to follow some convention yourself and fix the names rather than having some system arbitrarily fix it for you.

@rambabu-posa
Author

rambabu-posa commented Jun 23, 2020

I'm able to do the same operation using the DataFrame API, for example:
df.write.format("parquet").save("data/par_data")

I would say it's a feature of Delta Lake that it enforces this good practice.

@tdas
Contributor

tdas commented Jun 23, 2020

That has indeed been one of our core design principles - an opinionated view of how to manage data without shooting yourself in the foot.

@zsxwing
Member

zsxwing commented Apr 7, 2021

Closing this. Invalid characters are disallowed by design.

@zsxwing zsxwing closed this as completed Apr 7, 2021
@surya1527

Parquet doesn't allow storing such column names. I'd say it's better engineering practice for you to follow some convention yourself and fix the names rather than having some system arbitrarily fix it for you.

for c in df.columns:
    df = df.withColumnRenamed(c, c.replace(";", ""))

That worked fine in my case.
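
The loop above only strips ";". A variant that covers the whole character list from the error message might look like the sketch below. It also handles one case the simple loop does not: two different names that become identical after cleaning. The name `clean_names` and the `_2`, `_3` suffix scheme are assumptions made for illustration.

```python
# Translation table for the characters listed in the AnalysisException message.
_TABLE = str.maketrans({ch: "_" for ch in " ,;{}()\n\t="})


def clean_names(columns):
    """Sanitize names; append _2, _3, ... when two cleaned names collide."""
    seen = {}
    result = []
    for name in columns:
        base = name.translate(_TABLE)
        count = seen.get(base, 0) + 1
        seen[base] = count
        result.append(base if count == 1 else f"{base}_{count}")
    return result


# Hypothetical column list in which three names collapse to the same result.
print(clean_names(["a b", "a_b", "a;b", "total"]))
# ['a_b', 'a_b_2', 'a_b_3', 'total']
```

This sketch does not check whether a generated suffix such as "a_b_2" already exists as a separate column, so real data may need a stricter scheme. With PySpark, the result could be applied via `df.toDF(*clean_names(df.columns))`.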
