Add sparksql dialect #247
base: master
Conversation
Spark SQL is designed to be largely compatible with HiveQL. However, the `show tables` syntax in Spark SQL is slightly different. As such, this PR adds a new SparkSQL dialect based on the original HiveDialect, with `get_table_names` modified to accommodate the results returned by Spark SQL's `SHOW TABLES`.
Added SparkSQL support based on HiveDialect
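For context, here is a minimal sketch of the approach the description outlines, assuming PyHive's HiveDialect in pyhive.sqlalchemy_hive as the base class; it illustrates the idea rather than reproducing the PR's exact code. Spark SQL's `SHOW TABLES` returns (database, tableName, isTemporary) rows, so `get_table_names` picks the table-name column instead of the whole row.

```python
# Illustrative sketch only -- not the exact code added by this PR.
from pyhive.sqlalchemy_hive import HiveDialect


class SparkSQLDialect(HiveDialect):
    """Hive-based dialect whose table listing matches Spark SQL's SHOW TABLES output."""
    name = 'sparksql'

    def get_table_names(self, connection, schema=None, **kw):
        query = 'SHOW TABLES'
        if schema:
            query += ' IN ' + self.identifier_preparer.quote_identifier(schema)
        # Spark SQL returns (database, tableName, isTemporary) per row,
        # so return the tableName column rather than row[0] as HiveDialect does.
        return [row[1] for row in connection.execute(query)]
```

Registering the class (for example via SQLAlchemy's dialect registry or an entry point) would then let engines resolve the new dialect name; the class and dialect names above are illustrative, not the names settled in this PR.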
Codecov Report
@@             Coverage Diff             @@
##            master      #247      +/-  ##
==========================================
- Coverage    93.94%    91.12%    -2.82%
==========================================
  Files           14        15        +1
  Lines         1487      1533       +46
  Branches       159       169       +10
==========================================
  Hits          1397      1397
- Misses          64       108       +44
- Partials        26        28        +2
Continue to review full report at Codecov.
What is holding this PR up? @jingw - Is there something that we can do to move this PR along and make it part of the project, or does Spark SQL not fit the mission?
Many projects relying on PyHive are hitting issue #150. Is there any way we can get this PR merged?
+1. Any plans to get this merged?
Hadn't seen this PR before; looks nice. If someone could add unit tests, I'll be happy to merge and do another PyHive release.
@bkyryliuk it seems there have been efforts to get Spark SQL in for a long time, but many previous PRs have gone stale in the end. As potential problems are limited to Spark SQL only, in the interest of getting this functionality out there, I wonder if it would make sense to let this in without rigorous tests and add tests later if/when problems surface?
It would be quite challenging to maintain from our perspective, as we don't leverage Spark much. The Presto & Hive setup doesn't seem to be a very involved process:
@bkyryliuk I've tried to add some unit tests, but many of the tests done by sqlalchemy_test_case will fail due to the lack of support in Spark.
You can use the SQLAlchemy engine in those tests to add a pass for the unsupported functions.
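As a hedged illustration of that suggestion (the import path, class name, and test name below are assumptions for illustration, not code from this PR), the shared test case could be subclassed and the unsupported cases skipped explicitly:

```python
# Illustrative sketch only: skip shared SQLAlchemy tests that Spark SQL cannot pass.
import unittest

from pyhive.tests.sqlalchemy_test_case import SqlAlchemyTestCase


class SqlAlchemySparkSqlTest(SqlAlchemyTestCase, unittest.TestCase):
    # Engine/connection setup against a Spark Thrift server would go here.

    @unittest.skip('Spark SQL does not support the types this test exercises (SPARK-21529)')
    def test_reflect_select(self):
        pass
```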
So many people will be so happy if this is merged and released soon :)
@bkyryliuk I can help write some tests if necessary, I think this would be a really nice feature to have.
What's the status on this? Would be happy to help.
I applied your changes to my project, but had a little issue with a column containing […]. What I did to fix this was to change this line from […].
Any updates? Maybe we can help to get this through!
I have been watching this one but haven't seen any action...
Is there any progress with this? This is causing issues in related applications like Superset.
Any updates regarding this PR?
Hao Qin Tan seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
Hello mates,
Same here. This is particularly important when it comes to catalog metadata fetching in tools like Superset: currently we can't use physical references to the table, only virtual SQL queries, and metadata exploration using the UI is blocked.
This PR was based on #187 and only adds some fixes for PEP 8. No unit tests for this new dialect were added because many of the tests done by sqlalchemy_test_case will fail due to the lack of support for some types in Spark (SPARK-21529).