Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sparksql dialect #247

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

gmcoringa
Copy link

This PR was based on #187 and only add some fixes due PEP8.

No unit tests for this new dialect were added because many tests done by sqlalchemy_test_case will fail due the lack of support of some types by spark (SPARK-21529).

Hao Qin Tan and others added 3 commits September 11, 2018 09:28
SparkSQL is made to be almost compatible with HiveQL. However, the
`show tables` syntax for Spark SQL is slightly different. As such, a new
SparkSQL dialect is created based on the original HiveDialect with
`get_table_names` modified to accommodate the results returned from Spark SQL's
'SHOW TABLES'.
Added SparkSQL support based on HiveDialect
@codecov-io
Copy link

Codecov Report

Merging #247 into master will decrease coverage by 2.81%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #247      +/-   ##
==========================================
- Coverage   93.94%   91.12%   -2.82%     
==========================================
  Files          14       15       +1     
  Lines        1487     1533      +46     
  Branches      159      169      +10     
==========================================
  Hits         1397     1397              
- Misses         64      108      +44     
- Partials       26       28       +2
Impacted Files Coverage Δ
pyhive/sqlalchemy_sparksql.py 0% <0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d19cb0c...a07d74d. Read the comment docs.

leftjs added a commit to leftjs/PyHive that referenced this pull request Sep 19, 2019
@nchammas
Copy link

nchammas commented Oct 6, 2019

What is holding this PR up?

@jingw - Is there something that we can do to move this PR along and make it part of the project, or does Spark SQL not fit the mission?

@serialx
Copy link

serialx commented Oct 15, 2019

Many projects relying on PyHive experience problem #150 . Is there any way we can make this PR merged?

@prongs
Copy link

prongs commented May 8, 2020

+1. Any plans to get this merged?

@bkyryliuk
Copy link
Collaborator

bkyryliuk commented May 8, 2020

Haven't seen this PR, looks nice. If someone could add unit tests, will be happy to merge and do another pyhive release.

@villebro
Copy link

@bkyryliuk it seems there has been efforts to get Spark SQL in for a long time, but many previous PRs have gone stale in the end. As potential problems are limited to Spark SQL only, in the interest of getting this functionality out there, I wonder if it would make sense to let this in without rigorous tests, and add tests later if/when problems surface?

@bkyryliuk
Copy link
Collaborator

bkyryliuk commented May 12, 2020

It would be quite challenging to maintain from our prospective as we don't leverage spark much.
I am not looking for 100 % test coverage, but would prefer to have at least a smoke test.

Presto & hive setup doesn't seem to be very involved process:
https://github.com/dropbox/PyHive/blob/master/scripts/travis-install.sh
I assume spark would be somewhat similar

@gmcoringa
Copy link
Author

@bkyryliuk I've tried to add some unit tests, but many done by sqlalchemy_test_case will fail due the lack of support by spark.
It's possible to do some tests, but all tests done by sqlalchemy_test_case will be omitted.

@bkyryliuk
Copy link
Collaborator

you can use sqlalchemy engine in those test to do a pass for the not supported functions.
Superset has a good example: https://github.com/apache/incubator-superset/blob/903217f64d38b2083bb62a8a2b81686a607ba479/tests/sqllab_tests.py#L76

@ali-bahjati
Copy link

So many people will be so happy if this is merged and released soon :)

@villebro
Copy link

@bkyryliuk I can help write some tests if necessary, I think this would be a really nice feature to have.

@mickymiek
Copy link

What's the status on this? Would be happy to help.

@mickymiek
Copy link

mickymiek commented Dec 3, 2020

I applied your changes to my project, but had a little issue with column containing # Partitioning", "Not partitioned" and one being empty. I saw there was a filter on a similar column in sqlalchemy_hive.py. I guess this differs from one hive version to another (using 2.3.7 here).

What I did to fix this was to change this line from sqlalchemy_sparksql.py to rows = [column for column in connection.execute('DESCRIBE {}'.format(full_table)).fetchall() if column[0] not in {"# Partitioning", "Not partitioned", ""}].

@panoptikum
Copy link

Any updates. Maybe we can help to get this through!

@Data-drone
Copy link

I have been watching this one but haven't seen any action...

@tomkos
Copy link

tomkos commented Oct 18, 2021

Is there any progress with this? This is causing issues in related applications like Superset

@OmarRehan
Copy link

Any updates regarding this PR?

@CLAassistant
Copy link

CLAassistant commented Apr 16, 2022

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ gmcoringa
❌ Hao Qin Tan


Hao Qin Tan seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@jbguerraz
Copy link

Hello mates,
What could we do to move ahead with this one ? ready to help!

@pedrosalgadowork
Copy link

Same here, this is particularly important when it comes to catalog metadata fetch in tools like Superset, currently, we cant use physical references to the table, only virtual SQL queries, and metadata exploration using the UI is blocked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.