[SPARK-50704][SQL] Support more pushdown functions for MySQL connector #49335
base: master
Conversation
cc @beliefer
// See https://dev.mysql.com/doc/refman/8.4/en/built-in-function-reference.html
// The functions listed here have the same signature and similar semantics,
// and can be supported with existing mechanism.
private val supportedSQLFunctions = Set(
I think we can split this set of functions according to Spark's classification of functions: for example, ABS belongs to Mathematical Functions, COALESCE to Conditional Functions, LOWER to String Functions, and so on.
Sure, let me fix it.
It is fixed now, thanks for the recommendation. PTAL.
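To illustrate the kind of split the reviewer suggests, here is a standalone sketch, grouped along Spark's own function categories. The object name and the particular function subset are illustrative, not the merged code:

```scala
object SupportedFunctionsSketch {
  // Mathematical Functions (per Spark's classification of functions)
  val mathFunctions: Set[String] = Set("ABS", "CEIL", "FLOOR", "ROUND")
  // String Functions
  val stringFunctions: Set[String] = Set("LOWER", "UPPER", "TRIM")
  // Conditional Functions
  val conditionalFunctions: Set[String] = Set("COALESCE")

  // The dialect's overall pushdown-supported set is the union of the
  // per-category sets, so each category stays easy to review and extend.
  val supportedSQLFunctions: Set[String] =
    mathFunctions ++ stringFunctions ++ conditionalFunctions
}
```

Keeping the categories as separate vals makes it obvious, at review time, which of Spark's function groups a newly added pushdown function belongs to.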
@HyukjinKwon Thank you for pinging me. I will take a look a little later.
Signed-off-by: Xiaoguang Sun <[email protected]>
Force-pushed from 1a2f5f5 to 1c27e03
sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala (outdated; resolved)
Signed-off-by: Xiaoguang Sun <[email protected]>
@@ -374,6 +374,7 @@ abstract class JdbcDialect extends Serializable with Logging {
    case dateValue: Date => "'" + dateValue + "'"
    case dateValue: LocalDate => s"'${DateFormatter().format(dateValue)}'"
    case arrayValue: Array[Any] => arrayValue.map(compileValue).mkString(", ")
    case binaryValue: Array[Byte] => binaryValue.map("%02X".format(_)).mkString("X'", "", "'")
Is this correct for all databases?
This is actually a pre-existing bug.
Array[Byte] was matched by the wildcard case, so a binary literal was eventually formatted as '[B@757d6814', which must be fixed anyway.
This fix formats binary literals in hex, which is valid at least for MySQL, PostgreSQL, MS SQL Server and Spark SQL. For databases that behave differently, we can override the formatting in their respective dialects.
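For context, a minimal standalone sketch (an illustrative stand-in, not the actual JdbcDialect code) of why the wildcard previously produced strings like '[B@757d6814' and how the added case fixes it. At runtime an Array[Byte] is a byte[], not an Object[], so it does not match the `Array[Any]` pattern and used to fall through to the default case:

```scala
object BinaryLiteralSketch {
  // Illustrative stand-in for JdbcDialect.compileValue.
  def compileValue(value: Any): String = value match {
    // Array[Byte] does NOT match this case (byte[] vs Object[] at runtime),
    // so before the fix it fell through to the default case below, yielding
    // Java's default toString, e.g. "[B@757d6814".
    case arrayValue: Array[Any] => arrayValue.map(compileValue).mkString(", ")
    // The fix: render binary data as a hex literal, e.g. X'CAFE', which
    // MySQL, PostgreSQL, MS SQL Server and Spark SQL all accept.
    case binaryValue: Array[Byte] =>
      binaryValue.map("%02X".format(_)).mkString("X'", "", "'")
    case other => other.toString
  }
}
```

Note that `"%02X".format(_)` on a Scala Byte delegates to java.util.Formatter, which converts a negative byte to its unsigned equivalent, so 0xCA.toByte (-54) still prints as "CA".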
If so, could we split this fix into another PR?
Then it will break the tests, as some of the functions take binary data, e.g. md5, sha1 and sha2.
How about this: let's add a test for this type to V2JDBCTest.scala; that way we are sure the change is tested on the other databases as well.
The PR to fix the binary literal is created, PTAL: #49452
I mean fix the bug with a dedicated PR. It helps to review it.
The PR to fix cast types on MySQL is created, PTAL: #49453
Let's fix that bug first, then promote this one.
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala (outdated; resolved)
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala (outdated; resolved)
...er-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala (resolved)
Signed-off-by: Xiaoguang Sun <[email protected]>
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala (outdated; resolved)
Signed-off-by: Xiaoguang Sun <[email protected]>
Depends on apache#49452 and apache#49453
Signed-off-by: Xiaoguang Sun <[email protected]>
Force-pushed from a9575ca to 604a1e3
What changes were proposed in this pull request?
This PR tries to implement more pushdown functions for MySQL connector.
Why are the changes needed?
There are Spark SQL functions that have the same signature and similar semantics in MySQL. This PR supports pushing these functions down to MySQL and adds integration tests to make sure the pushdown works for valid types.
Does this PR introduce any user-facing change?
'No'
How was this patch tested?
Integration tests added.
Was this patch authored or co-authored using generative AI tooling?
'No'