Add sparksql dialect #247

gmcoringa · 2018-10-18T18:11:23Z

This PR is based on #187 and only adds some PEP 8 fixes.

No unit tests were added for this new dialect, because many of the tests in sqlalchemy_test_case would fail due to Spark's lack of support for some types (SPARK-21529).

Spark SQL is designed to be almost compatible with HiveQL. However, the `show tables` syntax for Spark SQL is slightly different. As such, a new SparkSQL dialect is created based on the original HiveDialect, with `get_table_names` modified to accommodate the results returned by Spark SQL's `SHOW TABLES`.
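To illustrate the difference described above, here is a minimal sketch. It is not the code in this PR. It assumes Hive's `SHOW TABLES` returns single-column rows (`tab_name`), while Spark SQL returns three-column rows (`database`, `tableName`, `isTemporary`). The helper name `table_names_from_rows` and the sample rows are hypothetical.

```python
def table_names_from_rows(rows):
    """Extract table names from SHOW TABLES result rows.

    Assumption: Hive yields 1-column rows (tab_name,), while Spark SQL
    yields 3-column rows (database, tableName, isTemporary).
    """
    names = []
    for row in rows:
        if len(row) == 1:
            names.append(row[0])  # Hive-style row: the name is the only field
        else:
            names.append(row[1])  # Spark SQL-style row: the name is the second field
    return names


# Hypothetical result sets for illustration only.
hive_rows = [("events",), ("users",)]
spark_rows = [("default", "events", False), ("default", "users", False)]
```

Both sample result sets yield the same list of names, which is the behavior a dialect's `get_table_names` would need in order to work against either backend.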

Added SparkSQL support based on HiveDialect

codecov-io · 2018-10-18T18:19:24Z

Codecov Report

Merging #247 into master will decrease coverage by 2.81%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master     #247      +/-   ##
==========================================
- Coverage   93.94%   91.12%   -2.82%     
==========================================
  Files          14       15       +1     
  Lines        1487     1533      +46     
  Branches      159      169      +10     
==========================================
  Hits         1397     1397              
- Misses         64      108      +44     
- Partials       26       28       +2

Impacted Files	Coverage Δ
pyhive/sqlalchemy_sparksql.py	`0% <0%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d19cb0c...a07d74d. Read the comment docs.

nchammas · 2019-10-06T01:27:18Z

What is holding this PR up?

@jingw - Is there something that we can do to move this PR along and make it part of the project, or does Spark SQL not fit the mission?

serialx · 2019-10-15T12:05:30Z

Many projects that rely on PyHive run into issue #150. Is there any way we can get this PR merged?

prongs · 2020-05-08T07:50:55Z

+1. Any plans to get this merged?

bkyryliuk · 2020-05-08T17:35:08Z

I hadn't seen this PR; it looks nice. If someone could add unit tests, I will be happy to merge it and do another PyHive release.

villebro · 2020-05-12T05:35:58Z

@bkyryliuk it seems there have been efforts to get Spark SQL in for a long time, but many previous PRs have gone stale in the end. As potential problems are limited to Spark SQL only, in the interest of getting this functionality out there, I wonder if it would make sense to let this in without rigorous tests, and add tests later if/when problems surface?

bkyryliuk · 2020-05-12T16:21:23Z

It would be quite challenging to maintain from our perspective, as we don't use Spark much.
I am not looking for 100% test coverage, but would prefer to have at least a smoke test.

The Presto and Hive setup doesn't seem to be a very involved process:
https://github.com/dropbox/PyHive/blob/master/scripts/travis-install.sh
I assume Spark would be somewhat similar.

gmcoringa · 2020-05-12T17:19:55Z

@bkyryliuk I've tried to add some unit tests, but many of the tests in sqlalchemy_test_case fail due to the lack of support in Spark.
It's possible to write some tests, but all the tests from sqlalchemy_test_case would have to be omitted.

bkyryliuk · 2020-05-12T19:00:07Z

You can use the SQLAlchemy engine in those tests to pass over the functions that are not supported.
Superset has a good example: https://github.com/apache/incubator-superset/blob/903217f64d38b2083bb62a8a2b81686a607ba479/tests/sqllab_tests.py#L76

ali-bahjati · 2020-05-13T21:13:32Z

So many people will be so happy if this is merged and released soon :)

villebro · 2020-05-13T21:24:43Z

@bkyryliuk I can help write some tests if necessary. I think this would be a really nice feature to have.

mickymiek · 2020-11-27T15:58:27Z

What's the status on this? Would be happy to help.

mickymiek · 2020-12-03T13:58:33Z

I applied your changes to my project, but had a small issue with columns named "# Partitioning" and "Not partitioned", and one that was empty. I saw there is a filter on a similar column in sqlalchemy_hive.py. I guess this differs from one Hive version to another (I'm using 2.3.7 here).

To fix this, I changed this line in sqlalchemy_sparksql.py to `rows = [column for column in connection.execute('DESCRIBE {}'.format(full_table)).fetchall() if column[0] not in {"# Partitioning", "Not partitioned", ""}]`.
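The filter from the comment above can be tried in isolation with a small sketch. It assumes each `DESCRIBE` row is a tuple whose first field is the column name. The helper name `filter_describe_rows` and the sample rows are hypothetical, and the marker strings are only the ones reported in this comment; other Hive or Spark versions may emit different ones.

```python
# Marker values reported in the comment above; other versions may differ.
NON_COLUMN_MARKERS = {"# Partitioning", "Not partitioned", ""}


def filter_describe_rows(rows):
    """Keep only rows whose first field is not a known non-column marker."""
    return [row for row in rows if row[0] not in NON_COLUMN_MARKERS]


# Hypothetical DESCRIBE output: two real columns followed by trailer rows.
sample_rows = [
    ("id", "int", None),
    ("name", "string", None),
    ("", "", ""),
    ("# Partitioning", "", ""),
    ("Not partitioned", "", ""),
]
```

Keeping the markers in a named set makes it easier to extend the list if another version produces additional trailer rows.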

panoptikum · 2021-09-23T16:43:43Z

Any updates? Maybe we can help to get this through!

Data-drone · 2021-09-30T12:59:57Z

I have been watching this one but haven't seen any action...

tomkos · 2021-10-18T20:03:34Z

Is there any progress on this? It is causing issues in related applications like Superset.

OmarRehan · 2022-01-28T21:01:59Z

Any updates regarding this PR?

CLAassistant · 2022-04-16T21:46:21Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ gmcoringa
❌ Hao Qin Tan

Hao Qin Tan seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

jbguerraz · 2022-07-13T14:52:31Z

Hello mates,
What could we do to move ahead with this one? Ready to help!

pedrosalgadowork · 2023-05-15T10:52:18Z

Same here. This is particularly important for catalog metadata fetching in tools like Superset. Currently, we can't use physical references to tables, only virtual SQL queries, and metadata exploration through the UI is blocked.

Hao Qin Tan and others added 3 commits September 11, 2018 09:28

Merge pull request #1 from tanhaoqin/add-sparksql-dialect

1fb7c88

Added SparkSQL support based on HiveDialect

PEP 8 for sqlalchemy_sparksql

a07d74d

leftjs added a commit to leftjs/PyHive that referenced this pull request Sep 19, 2019

apply dropbox#247

1a1437d

codenamelxl mentioned this pull request Apr 22, 2021

SQL Lab does not show table names in the dropdown box apache/superset#10940

Closed

3 tasks

skrydal mentioned this pull request Mar 16, 2022

Add support for sparksql dialect acryldata/PyHive#4

Merged

jbguerraz mentioned this pull request Jul 13, 2022

Error loading tables from Hive2/SparkSQL apache/superset#16509

Closed
