Commit 0bf605c

cloud-fan authored and gatorsmile committed
[SPARK-19292][SQL] filter with partition columns should be case-insensitive on Hive tables

## What changes were proposed in this pull request?

When we query a table with a filter on partitioned columns, we will push the partition filter to the metastore to get matched partitions directly. In `HiveExternalCatalog.listPartitionsByFilter`, we assume the column names in the partition filter are already normalized, so we don't need to consider case sensitivity there. However, `HiveTableScanExec` doesn't follow this assumption. This PR fixes it.

## How was this patch tested?

New regression test.

Author: Wenchen Fan <[email protected]>

Closes apache#16647 from cloud-fan/bug.
1 parent 148a84b commit 0bf605c
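To make the assumption in the description concrete, here is a hypothetical, dependency-free Scala sketch of the metastore-side contract: the lookup compares filter column names to partition column names exactly, so a filter written as `J` matches nothing when the stored partition column is `j`. The names below (`Partition`, the simplified `listPartitionsByFilter` signature) are illustrative stand-ins, not Spark's actual API.

```scala
// Illustrative model: a partition is just a map from partition column
// name to value (real Hive partitions carry far more metadata).
case class Partition(spec: Map[String, Int])

// Exact-name lookup, mirroring the assumption that callers pass
// already-normalized column names. A filter on "J" finds nothing when
// the stored column is "j" -- the failure mode this commit fixes upstream.
def listPartitionsByFilter(
    partitions: Seq[Partition],
    filterCol: String,
    value: Int): Seq[Partition] =
  partitions.filter(_.spec.get(filterCol).contains(value))

val parts = Seq(Partition(Map("j" -> 10)), Partition(Map("j" -> 20)))
listPartitionsByFilter(parts, "j", 10)  // matches one partition
listPartitionsByFilter(parts, "J", 10)  // matches none: case mismatch
```

This is why the scan operator, not the catalog, must normalize the filter's attribute names before pushing it down.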

File tree

3 files changed: +25 −2 lines changed

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala (+1 −1)

```diff
@@ -62,7 +62,7 @@ object FileSourceStrategy extends Strategy with Logging {
       val filterSet = ExpressionSet(filters)

       // The attribute name of predicate could be different than the one in schema in case of
-      // case insensitive, we should change them to match the one in schema, so we donot need to
+      // case insensitive, we should change them to match the one in schema, so we do not need to
       // worry about case sensitivity anymore.
       val normalizedFilters = filters.map { e =>
         e transform {
```

sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala (+11 −1)

```diff
@@ -146,9 +146,19 @@ case class HiveTableScanExec(
         hadoopReader.makeRDDForTable(relation.hiveQlTable)
       }
     } else {
+      // The attribute name of predicate could be different than the one in schema in case of
+      // case insensitive, we should change them to match the one in schema, so we do not need to
+      // worry about case sensitivity anymore.
+      val normalizedFilters = partitionPruningPred.map { e =>
+        e transform {
+          case a: AttributeReference =>
+            a.withName(relation.output.find(_.semanticEquals(a)).get.name)
+        }
+      }
+
       Utils.withDummyCallSite(sqlContext.sparkContext) {
         hadoopReader.makeRDDForPartitionedTable(
-          prunePartitions(relation.getHiveQlPartitions(partitionPruningPred)))
+          prunePartitions(relation.getHiveQlPartitions(normalizedFilters)))
       }
     }
     val numOutputRows = longMetric("numOutputRows")
```
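The added normalization can be sketched outside Spark with a toy expression tree. `Expr`, `Attribute`, and the simplified `transform` below are illustrative stand-ins for Catalyst's `Expression`, `AttributeReference`, and `transform`, and plain case-insensitive name matching stands in for `semanticEquals` under `spark.sql.caseSensitive=false` — a sketch of the idea, not the real API.

```scala
// Toy expression tree, loosely mirroring Catalyst's shape.
sealed trait Expr {
  // Bottom-up rewrite with a partial function, like Catalyst's `transform`.
  def transform(rule: PartialFunction[Expr, Expr]): Expr = {
    val rewritten = this match {
      case EqualTo(l, r) => EqualTo(l.transform(rule), r.transform(rule))
      case other => other
    }
    rule.applyOrElse(rewritten, identity[Expr])
  }
}
case class Attribute(name: String) extends Expr
case class Literal(value: Int) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr

// Rewrite every attribute in `filter` to carry the exact name used in
// `schema`, matching case-insensitively, so downstream exact-name
// lookups (e.g. partition pruning) work regardless of how the user
// capitalized the column.
def normalize(filter: Expr, schema: Seq[Attribute]): Expr =
  filter.transform {
    case a: Attribute =>
      schema.find(_.name.equalsIgnoreCase(a.name)).getOrElse(a)
  }

val schema = Seq(Attribute("i"), Attribute("j"))
val pred = Attribute("J")  // user wrote the partition column as "J"
normalize(EqualTo(pred, Literal(10)), schema)
// yields EqualTo(Attribute("j"), Literal(10))
```

The real patch does the same thing one level up: it rewrites each `AttributeReference` in `partitionPruningPred` to use the name from `relation.output` before handing the predicates to the metastore.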

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala (+13 −0)

```diff
@@ -2014,4 +2014,17 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
       )
     }
   }
+
+  test("SPARK-19292: filter with partition columns should be case-insensitive on Hive tables") {
+    withTable("tbl") {
+      withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {
+        sql("CREATE TABLE tbl(i int, j int) USING hive PARTITIONED BY (j)")
+        sql("INSERT INTO tbl PARTITION(j=10) SELECT 1")
+        checkAnswer(spark.table("tbl"), Row(1, 10))
+
+        checkAnswer(sql("SELECT i, j FROM tbl WHERE J=10"), Row(1, 10))
+        checkAnswer(spark.table("tbl").filter($"J" === 10), Row(1, 10))
+      }
+    }
+  }
 }
```
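Putting the two pieces together, what the regression test exercises can be sketched as a two-step pipeline (hypothetical helper, not Spark code): first rewrite the filter's column name to the schema's canonical case, then prune partitions by exact name match, as the metastore layer assumes.

```scala
// Step 1: normalize the filter column's case against the table schema.
// Step 2: exact-name pruning over partition specs (column -> value maps).
def prune(
    partitions: Seq[Map[String, Int]],
    schema: Seq[String],
    filterCol: String,
    value: Int): Seq[Map[String, Int]] = {
  val normalized = schema.find(_.equalsIgnoreCase(filterCol)).getOrElse(filterCol)
  partitions.filter(_.get(normalized).contains(value))
}

val parts = Seq(Map("j" -> 10), Map("j" -> 20))
prune(parts, Seq("i", "j"), "J", 10)  // the filter on "J" now prunes correctly
```

Without step 1 this reduces to the pre-fix behavior: `"J"` would miss the `"j"` partition entirely, which is exactly what `WHERE J=10` in the new test would have hit.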
