Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-33101][ML] Make LibSVM format propagate Hadoop config from DS …
…options to underlying HDFS file system ### What changes were proposed in this pull request? Propagate LibSVM options to Hadoop configs in the LibSVM datasource. ### Why are the changes needed? There is a bug that when running: ```scala spark.read.format("libsvm").options(conf).load(path) ``` The underlying file system will not receive the `conf` options. ### Does this PR introduce _any_ user-facing change? Yes. After the changes, for example, users should read files from Azure Data Lake successfully: ```scala def hadoopConf1() = Map[String, String]( s"fs.adl.oauth2.access.token.provider.type" -> "ClientCredential", s"fs.adl.oauth2.client.id" -> dbutils.secrets.get(scope = "...", key = "..."), s"fs.adl.oauth2.credential" -> dbutils.secrets.get(scope = "...", key = "..."), s"fs.adl.oauth2.refresh.url" -> s"https://login.microsoftonline.com/.../oauth2/token") val df = spark.read.format("libsvm").options(hadoopConf1).load("adl://....azuredatalakestore.net/foldersp1/...") ``` and not get the following exception because the settings above are not propagated to the filesystem: ```java java.lang.IllegalArgumentException: No value for fs.adl.oauth2.access.token.provider found in conf file. at ....adl.AdlFileSystem.getNonEmptyVal(AdlFileSystem.java:820) at ....adl.AdlFileSystem.getCustomAccessTokenProvider(AdlFileSystem.java:220) at ....adl.AdlFileSystem.getAccessTokenProvider(AdlFileSystem.java:257) at ....adl.AdlFileSystem.initialize(AdlFileSystem.java:164) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669) ``` ### How was this patch tested? Added UT to `LibSVMRelationSuite`. Closes apache#29984 from MaxGekk/ml-option-propagation. Authored-by: Max Gekk <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
- Loading branch information