Commit cb3fa6c

sunchao authored and dbtsai committed
[SPARK-33212][BUILD] Move to shaded clients for Hadoop 3.x profile
### What changes were proposed in this pull request?

This switches Spark to use shaded Hadoop clients, namely hadoop-client-api and hadoop-client-runtime, for Hadoop 3.x. For Hadoop 2.7, we'll still use the same modules such as hadoop-client.

To keep hadoop-3.2 as the default Hadoop profile, this defines the following Maven properties:

```
hadoop-client-api.artifact
hadoop-client-runtime.artifact
hadoop-client-minicluster.artifact
```

which default to:

```
hadoop-client-api
hadoop-client-runtime
hadoop-client-minicluster
```

but all switch to `hadoop-client` when the Hadoop profile is hadoop-2.7. A side effect of this is that we'll import the same dependency multiple times. For this I have to disable the Maven enforcer rule `banDuplicatePomDependencyVersions`.

Besides the above, there are the following changes:
- explicitly add a few dependencies which were imported via transitive dependencies from Hadoop jars, but are removed from the shaded client jars.
- removed the use of `ProxyUriUtils.getPath` from `ApplicationMaster`, which is a server-side/private API.
- modified `IsolatedClientLoader` to exclude `hadoop-auth` jars when the Hadoop version is 3.x. This change should only matter when we're not sharing Hadoop classes with Spark (which is _mostly_ used in tests).

### Why are the changes needed?

This serves two purposes:
- to unblock Spark from upgrading to Hadoop 3.2.2/3.3.0+. The latest Hadoop versions have upgraded to Guava 27+, and in order to adopt them in Spark we'll need to resolve the Guava conflicts. This takes the approach of switching to the shaded client jars provided by Hadoop.
- to avoid pulling 3rd party dependencies from Hadoop and avoid potential future conflicts.

### Does this PR introduce _any_ user-facing change?

When people use Spark with the `hadoop-provided` option, they should make sure the class path contains the `hadoop-client-api` and `hadoop-client-runtime` jars. In addition, they may need to make sure these jars appear before other Hadoop jars on the class path. Otherwise, classes may be loaded from the other non-shaded Hadoop jars and cause potential conflicts.

### How was this patch tested?

Relying on existing tests.

Closes apache#29843 from sunchao/SPARK-29250.

Authored-by: Chao Sun <[email protected]>
Signed-off-by: DB Tsai <[email protected]>
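
As a rough illustration of the Maven properties described in the commit message, the root `pom.xml` (whose diff is not among the files shown here) might define them along these lines. This is a sketch under assumptions, not the commit's actual POM: the surrounding `<project>`, `<properties>`, and `<profiles>` layout is assumed, and only the property names, their default values, and the `hadoop-2.7` override come from the commit message.

```xml
<!-- Sketch only: a POM fragment, not a complete or validated pom.xml. -->
<project>
  <properties>
    <!-- Defaults (hadoop-3.2 profile): the shaded Hadoop client artifacts. -->
    <hadoop-client-api.artifact>hadoop-client-api</hadoop-client-api.artifact>
    <hadoop-client-runtime.artifact>hadoop-client-runtime</hadoop-client-runtime.artifact>
    <hadoop-client-minicluster.artifact>hadoop-client-minicluster</hadoop-client-minicluster.artifact>
  </properties>

  <profiles>
    <profile>
      <id>hadoop-2.7</id>
      <properties>
        <!-- Hadoop 2.7 has no shaded clients, so all three point at hadoop-client. -->
        <hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
        <hadoop-client-runtime.artifact>hadoop-client</hadoop-client-runtime.artifact>
        <hadoop-client-minicluster.artifact>hadoop-client</hadoop-client-minicluster.artifact>
      </properties>
    </profile>
  </profiles>
</project>
```

Module POMs then use `${hadoop-client-api.artifact}` and `${hadoop-client-runtime.artifact}` as artifact IDs. Under `hadoop-2.7` both resolve to `hadoop-client`, so the same dependency is declared twice, which is why the `banDuplicatePomDependencyVersions` enforcer rule has to be disabled.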
1 parent ba13b94 commit cb3fa6c

18 files changed, +186 -96 lines changed

common/network-yarn/pom.xml

+7 -1
@@ -65,7 +65,13 @@
     <!-- Provided dependencies -->
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
     </dependency>
     <dependency>
       <groupId>org.slf4j</groupId>

core/pom.xml

+15 -1
@@ -66,7 +66,13 @@
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
     </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
@@ -177,6 +183,14 @@
       <groupId>org.apache.commons</groupId>
       <artifactId>commons-text</artifactId>
     </dependency>
+    <dependency>
+      <groupId>commons-io</groupId>
+      <artifactId>commons-io</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>commons-collections</groupId>
+      <artifactId>commons-collections</artifactId>
+    </dependency>
     <dependency>
       <groupId>com.google.code.findbugs</groupId>
       <artifactId>jsr305</artifactId>

core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala

+5 -3
@@ -1182,10 +1182,12 @@ private[spark] object SparkSubmitUtils {
   def resolveDependencyPaths(
       artifacts: Array[AnyRef],
       cacheDirectory: File): String = {
-    artifacts.map { artifactInfo =>
-      val artifact = artifactInfo.asInstanceOf[Artifact].getModuleRevisionId
+    artifacts.map { ai =>
+      val artifactInfo = ai.asInstanceOf[Artifact]
+      val artifact = artifactInfo.getModuleRevisionId
+      val testSuffix = if (artifactInfo.getType == "test-jar") "-tests" else ""
       cacheDirectory.getAbsolutePath + File.separator +
-        s"${artifact.getOrganisation}_${artifact.getName}-${artifact.getRevision}.jar"
+        s"${artifact.getOrganisation}_${artifact.getName}-${artifact.getRevision}${testSuffix}.jar"
     }.mkString(",")
   }

dev/deps/spark-deps-hadoop-2.7-hive-2.3

+1 -2
@@ -127,7 +127,7 @@ javax.inject/1//javax.inject-1.jar
 javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar
 javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
 javolution/5.5.1//javolution-5.5.1.jar
-jaxb-api/2.2.2//jaxb-api-2.2.2.jar
+jaxb-api/2.2.11//jaxb-api-2.2.11.jar
 jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar
 jcl-over-slf4j/1.7.30//jcl-over-slf4j-1.7.30.jar
 jdo-api/3.0.1//jdo-api-3.0.1.jar
@@ -227,7 +227,6 @@ spire-macros_2.12/0.17.0-M1//spire-macros_2.12-0.17.0-M1.jar
 spire-platform_2.12/0.17.0-M1//spire-platform_2.12-0.17.0-M1.jar
 spire-util_2.12/0.17.0-M1//spire-util_2.12-0.17.0-M1.jar
 spire_2.12/0.17.0-M1//spire_2.12-0.17.0-M1.jar
-stax-api/1.0-2//stax-api-1.0-2.jar
 stax-api/1.0.1//stax-api-1.0.1.jar
 stream/2.9.6//stream-2.9.6.jar
 super-csv/2.2.0//super-csv-2.2.0.jar

dev/deps/spark-deps-hadoop-3.2-hive-2.3

+2 -50
@@ -3,14 +3,12 @@ JLargeArrays/1.5//JLargeArrays-1.5.jar
 JTransforms/3.1//JTransforms-3.1.jar
 RoaringBitmap/0.9.0//RoaringBitmap-0.9.0.jar
 ST4/4.0.4//ST4-4.0.4.jar
-accessors-smart/1.2//accessors-smart-1.2.jar
 activation/1.1.1//activation-1.1.1.jar
 aircompressor/0.10//aircompressor-0.10.jar
 algebra_2.12/2.0.0-M2//algebra_2.12-2.0.0-M2.jar
 antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar
 antlr4-runtime/4.7.1//antlr4-runtime-4.7.1.jar
 aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar
-aopalliance/1.0//aopalliance-1.0.jar
 arpack_combined_all/0.1//arpack_combined_all-0.1.jar
 arrow-format/1.0.1//arrow-format-1.0.1.jar
 arrow-memory-core/1.0.1//arrow-memory-core-1.0.1.jar
@@ -27,15 +25,12 @@ breeze_2.12/1.0//breeze_2.12-1.0.jar
 cats-kernel_2.12/2.0.0-M4//cats-kernel_2.12-2.0.0-M4.jar
 chill-java/0.9.5//chill-java-0.9.5.jar
 chill_2.12/0.9.5//chill_2.12-0.9.5.jar
-commons-beanutils/1.9.4//commons-beanutils-1.9.4.jar
 commons-cli/1.2//commons-cli-1.2.jar
 commons-codec/1.10//commons-codec-1.10.jar
 commons-collections/3.2.2//commons-collections-3.2.2.jar
 commons-compiler/3.0.16//commons-compiler-3.0.16.jar
 commons-compress/1.8.1//commons-compress-1.8.1.jar
-commons-configuration2/2.1.1//commons-configuration2-2.1.1.jar
 commons-crypto/1.0.0//commons-crypto-1.0.0.jar
-commons-daemon/1.0.13//commons-daemon-1.0.13.jar
 commons-dbcp/1.4//commons-dbcp-1.4.jar
 commons-httpclient/3.1//commons-httpclient-3.1.jar
 commons-io/2.5//commons-io-2.5.jar
@@ -55,30 +50,13 @@ datanucleus-api-jdo/4.2.4//datanucleus-api-jdo-4.2.4.jar
 datanucleus-core/4.1.17//datanucleus-core-4.1.17.jar
 datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar
 derby/10.12.1.1//derby-10.12.1.1.jar
-dnsjava/2.1.7//dnsjava-2.1.7.jar
 dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
-ehcache/3.3.1//ehcache-3.3.1.jar
 flatbuffers-java/1.9.0//flatbuffers-java-1.9.0.jar
 generex/1.0.2//generex-1.0.2.jar
-geronimo-jcache_1.0_spec/1.0-alpha-1//geronimo-jcache_1.0_spec-1.0-alpha-1.jar
 gson/2.2.4//gson-2.2.4.jar
 guava/14.0.1//guava-14.0.1.jar
-guice-servlet/4.0//guice-servlet-4.0.jar
-guice/4.0//guice-4.0.jar
-hadoop-annotations/3.2.0//hadoop-annotations-3.2.0.jar
-hadoop-auth/3.2.0//hadoop-auth-3.2.0.jar
-hadoop-client/3.2.0//hadoop-client-3.2.0.jar
-hadoop-common/3.2.0//hadoop-common-3.2.0.jar
-hadoop-hdfs-client/3.2.0//hadoop-hdfs-client-3.2.0.jar
-hadoop-mapreduce-client-common/3.2.0//hadoop-mapreduce-client-common-3.2.0.jar
-hadoop-mapreduce-client-core/3.2.0//hadoop-mapreduce-client-core-3.2.0.jar
-hadoop-mapreduce-client-jobclient/3.2.0//hadoop-mapreduce-client-jobclient-3.2.0.jar
-hadoop-yarn-api/3.2.0//hadoop-yarn-api-3.2.0.jar
-hadoop-yarn-client/3.2.0//hadoop-yarn-client-3.2.0.jar
-hadoop-yarn-common/3.2.0//hadoop-yarn-common-3.2.0.jar
-hadoop-yarn-registry/3.2.0//hadoop-yarn-registry-3.2.0.jar
-hadoop-yarn-server-common/3.2.0//hadoop-yarn-server-common-3.2.0.jar
-hadoop-yarn-server-web-proxy/3.2.0//hadoop-yarn-server-web-proxy-3.2.0.jar
+hadoop-client-api/3.2.0//hadoop-client-api-3.2.0.jar
+hadoop-client-runtime/3.2.0//hadoop-client-runtime-3.2.0.jar
 hive-beeline/2.3.7//hive-beeline-2.3.7.jar
 hive-cli/2.3.7//hive-cli-2.3.7.jar
 hive-common/2.3.7//hive-common-2.3.7.jar
@@ -108,8 +86,6 @@ jackson-core/2.10.0//jackson-core-2.10.0.jar
 jackson-databind/2.10.0//jackson-databind-2.10.0.jar
 jackson-dataformat-yaml/2.10.0//jackson-dataformat-yaml-2.10.0.jar
 jackson-datatype-jsr310/2.10.3//jackson-datatype-jsr310-2.10.3.jar
-jackson-jaxrs-base/2.9.5//jackson-jaxrs-base-2.9.5.jar
-jackson-jaxrs-json-provider/2.9.5//jackson-jaxrs-json-provider-2.9.5.jar
 jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar
 jackson-module-jaxb-annotations/2.10.0//jackson-module-jaxb-annotations-2.10.0.jar
 jackson-module-paranamer/2.10.0//jackson-module-paranamer-2.10.0.jar
@@ -122,13 +98,11 @@ jakarta.ws.rs-api/2.1.6//jakarta.ws.rs-api-2.1.6.jar
 jakarta.xml.bind-api/2.3.2//jakarta.xml.bind-api-2.3.2.jar
 janino/3.0.16//janino-3.0.16.jar
 javassist/3.25.0-GA//javassist-3.25.0-GA.jar
-javax.inject/1//javax.inject-1.jar
 javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar
 javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
 javolution/5.5.1//javolution-5.5.1.jar
 jaxb-api/2.2.11//jaxb-api-2.2.11.jar
 jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar
-jcip-annotations/1.0-1//jcip-annotations-1.0-1.jar
 jcl-over-slf4j/1.7.30//jcl-over-slf4j-1.7.30.jar
 jdo-api/3.0.1//jdo-api-3.0.1.jar
 jersey-client/2.30//jersey-client-2.30.jar
@@ -142,30 +116,14 @@ jline/2.14.6//jline-2.14.6.jar
 joda-time/2.10.5//joda-time-2.10.5.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
 jpam/1.1//jpam-1.1.jar
-json-smart/2.3//json-smart-2.3.jar
 json/1.8//json-1.8.jar
 json4s-ast_2.12/3.7.0-M5//json4s-ast_2.12-3.7.0-M5.jar
 json4s-core_2.12/3.7.0-M5//json4s-core_2.12-3.7.0-M5.jar
 json4s-jackson_2.12/3.7.0-M5//json4s-jackson_2.12-3.7.0-M5.jar
 json4s-scalap_2.12/3.7.0-M5//json4s-scalap_2.12-3.7.0-M5.jar
-jsp-api/2.1//jsp-api-2.1.jar
 jsr305/3.0.0//jsr305-3.0.0.jar
 jta/1.1//jta-1.1.jar
 jul-to-slf4j/1.7.30//jul-to-slf4j-1.7.30.jar
-kerb-admin/1.0.1//kerb-admin-1.0.1.jar
-kerb-client/1.0.1//kerb-client-1.0.1.jar
-kerb-common/1.0.1//kerb-common-1.0.1.jar
-kerb-core/1.0.1//kerb-core-1.0.1.jar
-kerb-crypto/1.0.1//kerb-crypto-1.0.1.jar
-kerb-identity/1.0.1//kerb-identity-1.0.1.jar
-kerb-server/1.0.1//kerb-server-1.0.1.jar
-kerb-simplekdc/1.0.1//kerb-simplekdc-1.0.1.jar
-kerb-util/1.0.1//kerb-util-1.0.1.jar
-kerby-asn1/1.0.1//kerby-asn1-1.0.1.jar
-kerby-config/1.0.1//kerby-config-1.0.1.jar
-kerby-pkix/1.0.1//kerby-pkix-1.0.1.jar
-kerby-util/1.0.1//kerby-util-1.0.1.jar
-kerby-xdr/1.0.1//kerby-xdr-1.0.1.jar
 kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar
 kubernetes-client/4.10.3//kubernetes-client-4.10.3.jar
 kubernetes-model-admissionregistration/4.10.3//kubernetes-model-admissionregistration-4.10.3.jar
@@ -203,9 +161,7 @@ metrics-json/4.1.1//metrics-json-4.1.1.jar
 metrics-jvm/4.1.1//metrics-jvm-4.1.1.jar
 minlog/1.3.0//minlog-1.3.0.jar
 netty-all/4.1.51.Final//netty-all-4.1.51.Final.jar
-nimbus-jose-jwt/4.41.1//nimbus-jose-jwt-4.41.1.jar
 objenesis/2.6//objenesis-2.6.jar
-okhttp/2.7.5//okhttp-2.7.5.jar
 okhttp/3.12.12//okhttp-3.12.12.jar
 okio/1.14.0//okio-1.14.0.jar
 opencsv/2.3//opencsv-2.3.jar
@@ -225,7 +181,6 @@ parquet-jackson/1.10.1//parquet-jackson-1.10.1.jar
 protobuf-java/2.5.0//protobuf-java-2.5.0.jar
 py4j/0.10.9//py4j-0.10.9.jar
 pyrolite/4.30//pyrolite-4.30.jar
-re2j/1.1//re2j-1.1.jar
 scala-collection-compat_2.12/2.1.1//scala-collection-compat_2.12-2.1.1.jar
 scala-compiler/2.12.10//scala-compiler-2.12.10.jar
 scala-library/2.12.10//scala-library-2.12.10.jar
@@ -243,15 +198,12 @@ spire-platform_2.12/0.17.0-M1//spire-platform_2.12-0.17.0-M1.jar
 spire-util_2.12/0.17.0-M1//spire-util_2.12-0.17.0-M1.jar
 spire_2.12/0.17.0-M1//spire_2.12-0.17.0-M1.jar
 stax-api/1.0.1//stax-api-1.0.1.jar
-stax2-api/3.1.4//stax2-api-3.1.4.jar
 stream/2.9.6//stream-2.9.6.jar
 super-csv/2.2.0//super-csv-2.2.0.jar
 threeten-extra/1.5.0//threeten-extra-1.5.0.jar
-token-provider/1.0.1//token-provider-1.0.1.jar
 transaction-api/1.1//transaction-api-1.1.jar
 univocity-parsers/2.9.0//univocity-parsers-2.9.0.jar
 velocity/1.5//velocity-1.5.jar
-woodstox-core/5.0.3//woodstox-core-5.0.3.jar
 xbean-asm7-shaded/4.15//xbean-asm7-shaded-4.15.jar
 xz/1.5//xz-1.5.jar
 zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar

external/kafka-0-10-assembly/pom.xml

+7 -1
@@ -71,9 +71,15 @@
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
+      <version>${hadoop.version}</version>
       <scope>provided</scope>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
     <dependency>
       <groupId>org.apache.avro</groupId>
       <artifactId>avro-mapred</artifactId>

external/kafka-0-10-sql/pom.xml

+4
@@ -79,6 +79,10 @@
       <artifactId>kafka-clients</artifactId>
       <version>${kafka.version}</version>
     </dependency>
+    <dependency>
+      <groupId>com.google.code.findbugs</groupId>
+      <artifactId>jsr305</artifactId>
+    </dependency>
     <dependency>
       <groupId>org.apache.commons</groupId>
       <artifactId>commons-pool2</artifactId>

external/kafka-0-10-token-provider/pom.xml

+5
@@ -58,6 +58,11 @@
       <artifactId>mockito-core</artifactId>
       <scope>test</scope>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <scope>${hadoop.deps.scope}</scope>
+    </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-tags_${scala.binary.version}</artifactId>

external/kinesis-asl-assembly/pom.xml

+7 -1
@@ -91,9 +91,15 @@
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
+      <version>${hadoop.version}</version>
       <scope>provided</scope>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
     <dependency>
       <groupId>org.apache.avro</groupId>
       <artifactId>avro-ipc</artifactId>

hadoop-cloud/pom.xml

+6 -1
@@ -58,10 +58,15 @@
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
       <version>${hadoop.version}</version>
       <scope>provided</scope>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
     <!--
     the AWS module pulls in jackson; its transitive dependencies can create
     intra-jackson-module version problems.

launcher/pom.xml

+8 -1
@@ -81,7 +81,14 @@
     <!-- Not needed by the test code, but referenced by SparkSubmit which is used by the tests. -->
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
       <scope>test</scope>
     </dependency>
   </dependencies>
