[SPARK-50704][SQL] Support more pushdown functions for MySQL connector #49335

Open: wants to merge 6 commits into master

sunxiaoguang commented Dec 30, 2024

What changes were proposed in this pull request?

This PR adds pushdown support for the following functions in the MySQL connector:

ABS COALESCE GREATEST LEAST RAND LOG10 LOG2 LN EXP POWER SQRT SIN COS TAN COT ASIN ACOS ATAN
ATAN2 DEGREES RADIANS SIGN SUBSTRING UPPER LOWER SHA1 SHA2 MD5 CRC32 BIT_LENGTH CHAR_LENGTH CONCAT

Why are the changes needed?

Several Spark SQL functions have the same signature and similar semantics as built-in MySQL functions. This PR enables pushdown for these functions and adds integration tests to verify that they work for valid input types.
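To make the mechanism concrete, here is a minimal sketch of name-based pushdown. PushdownSketch and its methods are hypothetical and are not Spark's JdbcDialect API: the dialect keeps a set of function names (similar in spirit to the supportedSQLFunctions set quoted in the review) and only renders calls whose names are in that set. Returning None stands for "not pushed down", in which case Spark would evaluate the function itself.

```scala
// Hypothetical, simplified model of dialect-level function pushdown.
// The names here are illustrative and are not Spark's JdbcDialect API.
object PushdownSketch {
  // A subset of the functions listed in this PR, assumed to share the
  // same signature and similar semantics in Spark SQL and MySQL.
  val supportedFunctions: Set[String] = Set(
    "ABS", "COALESCE", "GREATEST", "LEAST", "LOG10", "SQRT",
    "SUBSTRING", "UPPER", "LOWER", "MD5", "SHA1", "SHA2", "CONCAT")

  // Case-insensitive membership check against the supported set.
  def isSupportedFunction(funcName: String): Boolean =
    supportedFunctions.contains(funcName.toUpperCase)

  // Renders the call as MySQL SQL text when the function is supported;
  // None means the call is not pushed down and Spark evaluates it instead.
  def compileCall(funcName: String, args: Seq[String]): Option[String] =
    if (isSupportedFunction(funcName))
      Some(s"${funcName.toUpperCase}(${args.mkString(", ")})")
    else None
}
```

For example, compileCall("sha2", Seq("`name`", "256")) yields Some("SHA2(`name`, 256)"), while a function name outside the set yields None.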

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Integration tests added.

Was this patch authored or co-authored using generative AI tooling?

No.

HyukjinKwon (Member)

cc @beliefer

// See https://dev.mysql.com/doc/refman/8.4/en/built-in-function-reference.html
// The functions listed here have the same signature and similar semantics,
// and can be supported with existing mechanism.
private val supportedSQLFunctions = Set(
Contributor

I think we can split this set of functions according to Spark's classification of functions, for example, ABS belongs to Mathematical Functions, COALESCE to Conditional Functions, LOWER to String Functions, and so on.

Author

Sure, let me fix it.

Author

It is fixed now, thanks for the recommendation. PTAL.
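A minimal sketch of the reviewer's suggestion, using hypothetical names (MySQLFunctionGroups and its per-category sets): one set per Spark function category, unioned into the lookup set the dialect consults. ABS, COALESCE and LOWER follow the categories named in the review comment; the placement of the remaining functions (for example GREATEST and LEAST under math, and MD5, SHA1, SHA2 and CRC32 in a separate hash group) is an assumption, not necessarily how the updated PR groups them.

```scala
// Hypothetical sketch: one set per (assumed) Spark function category,
// combined into the single set used for pushdown lookups.
object MySQLFunctionGroups {
  val mathFunctions: Set[String] = Set(
    "ABS", "GREATEST", "LEAST", "RAND", "LOG10", "LOG2", "LN", "EXP",
    "POWER", "SQRT", "SIN", "COS", "TAN", "COT", "ASIN", "ACOS", "ATAN",
    "ATAN2", "DEGREES", "RADIANS", "SIGN")

  val conditionalFunctions: Set[String] = Set("COALESCE")

  val stringFunctions: Set[String] = Set(
    "SUBSTRING", "UPPER", "LOWER", "BIT_LENGTH", "CHAR_LENGTH", "CONCAT")

  val hashFunctions: Set[String] = Set("SHA1", "SHA2", "MD5", "CRC32")

  // Union of all groups; this is what a dialect would check names against.
  val supportedSQLFunctions: Set[String] =
    mathFunctions ++ conditionalFunctions ++ stringFunctions ++ hashFunctions
}
```

Keeping the groups disjoint means the union contains exactly the 32 functions listed in the PR description, and a new category can be added without touching the existing ones.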

beliefer (Contributor) commented Jan 7, 2025

@HyukjinKwon Thank you for pinging me. I will take a look a little later.

sunxiaoguang force-pushed the more_mysql_pushdown_functions branch from 1a2f5f5 to 1c27e03 on January 8, 2025 11:53
sunxiaoguang requested a review from beliefer on January 10, 2025 02:17
@@ -374,6 +374,7 @@ abstract class JdbcDialect extends Serializable with Logging {
case dateValue: Date => "'" + dateValue + "'"
case dateValue: LocalDate => s"'${DateFormatter().format(dateValue)}'"
case arrayValue: Array[Any] => arrayValue.map(compileValue).mkString(", ")
case binaryValue: Array[Byte] => binaryValue.map("%02X".format(_)).mkString("X'", "", "'")
Contributor

Does this work for all databases?

Author

sunxiaoguang, Jan 10, 2025

This is actually a pre-existing bug.
Array[Byte] falls through to the wildcard case, so a binary literal ends up formatted as '[B@757d6814', which must be fixed regardless.

This fix formats binary literals in hex, which is the case at least for MySQL, PostgreSQL, MS SQL Server and Spark SQL. Databases that behave differently can implement their own formatting in their respective dialects.
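The bug and the fix can be reproduced outside Spark with a minimal sketch. BinaryLiteralSketch and its methods are hypothetical, handle only a few literal types, and use simplified string escaping (an assumption); the Array[Any] and Array[Byte] cases mirror the diff above. Because a JVM byte[] is not an Object[], an Array[Byte] value never matches the Array[Any] case, so without a dedicated case it reaches the catch-all and is rendered with the default array toString.

```scala
// Hypothetical, reduced model of literal compilation; not Spark's JdbcDialect.
object BinaryLiteralSketch {
  // Simplified quoting (assumption): wrap in single quotes, double any quote.
  private def quote(s: String): String = "'" + s.replace("'", "''") + "'"

  // Before the fix: no Array[Byte] case, so bytes hit the catch-all and
  // would later be interpolated as something like [B@757d6814.
  def compileValueBefore(value: Any): Any = value match {
    case s: String => quote(s)
    case arr: Array[Any] => arr.map(compileValueBefore).mkString(", ")
    case _ => value
  }

  // After the fix: bytes are rendered as a hex literal such as X'0AFF'.
  def compileValueAfter(value: Any): Any = value match {
    case s: String => quote(s)
    case arr: Array[Any] => arr.map(compileValueAfter).mkString(", ")
    case bytes: Array[Byte] => bytes.map("%02X".format(_)).mkString("X'", "", "'")
    case _ => value
  }
}
```

With this sketch, compileValueAfter(Array[Byte](10, -1)) returns X'0AFF', whereas compileValueBefore returns the array itself, whose string form starts with [B@.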

Contributor

If so, could we split this fix into another PR?

Author

sunxiaoguang, Jan 11, 2025

Then the tests would break, because some functions take binary data, e.g. md5, sha1 and sha2.
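To see why these tests depend on the literal fix, here is a hypothetical sketch (Md5PushdownSketch and its methods are illustrative names, not Spark code). It renders the MD5 call a pushed-down query would contain for a binary literal, and computes the digest MySQL's MD5() should return for the same bytes using java.security.MessageDigest, which a test could compare against the pushed-down result. The lower-case hex output assumes MySQL's documented format of a 32-hex-digit string.

```scala
import java.security.MessageDigest

// Hypothetical sketch: the SQL text for a pushed-down MD5 over binary data,
// plus a JVM-side expected value to compare against in a test.
object Md5PushdownSketch {
  // Same hex-literal formatting as the fix discussed above.
  def hexLiteral(bytes: Array[Byte]): String =
    bytes.map("%02X".format(_)).mkString("X'", "", "'")

  // SQL fragment that would be sent to MySQL (illustrative only).
  def md5Call(bytes: Array[Byte]): String = s"MD5(${hexLiteral(bytes)})"

  // Lower-case 32-digit hex digest, matching the assumed MySQL output format.
  def expectedMd5(bytes: Array[Byte]): String =
    MessageDigest.getInstance("MD5").digest(bytes).map("%02x".format(_)).mkString
}
```

For the bytes of "abc", md5Call produces MD5(X'616263'), and expectedMd5 produces 900150983cd24fb0d6963f7d28e17f72, the well-known MD5 digest of "abc".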

Author

sunxiaoguang, Jan 11, 2025

How about this: let's add a test for this type to V2JDBCTest.scala, so that the change is also tested on other databases.

Author

sunxiaoguang, Jan 11, 2025

I have created a PR to fix binary literals, PTAL: #49452

Contributor

beliefer, Jan 11, 2025

I meant fixing the bug in a dedicated PR; that makes it easier to review.

Author

I have also created a PR to fix cast types on MySQL, PTAL: #49453

Author

This PR has been refactored to depend on #49452 and #49453, PTAL.

We need to wait for these two prerequisites to be merged and then rebase onto the new base before the tests can pass.

Contributor

Let's fix that bug first, then move this one forward.

Signed-off-by: Xiaoguang Sun <[email protected]>
sunxiaoguang force-pushed the more_mysql_pushdown_functions branch from a9575ca to 604a1e3 on January 11, 2025 15:27
4 participants