wip: Support UDF for spark-rapids-tools #1347

Heatao · 2024-09-18T07:24:06Z

No description provided.

change path add description add description add timeout 20min add return map udf udf use java map instead of scala map add udf 2 make FileSourceScanExecParser factor always 1 rename Support opScore for bd 0325

tgravescs · 2024-09-18T16:03:14Z

Hello @Heatao, could you please file an issue and fill in the description with details on what feature is being added here?
I'd like to make sure we are on the same page about the feature and how it's being implemented.

amahussein

I assume that the title of the PR does not correctly reflect the changes here.
IIUC, this change aims to run the Qualification tool as a UDF from within Spark.
I am not sure why we would need to make that change in the tools core. Why not simply submit qualification as Spark jobs?

amahussein · 2024-09-18T17:07:10Z

core/pom.xml

+            <!--            <exclusions>-->
+            <!--                <exclusion>-->
+            <!--                    <groupId>org.codehaus.jackson</groupId>-->
+            <!--                    <artifactId>jackson-core-asl</artifactId>-->
+            <!--                </exclusion>-->
+            <!--                <exclusion>-->
+            <!--                    <groupId>org.codehaus.jackson</groupId>-->
+            <!--                    <artifactId>jackson-mapper-asl</artifactId>-->
+            <!--                </exclusion>-->
+            <!--                <exclusion>-->
+            <!--                    <groupId>org.codehaus.jackson</groupId>-->
+            <!--                    <artifactId>jackson-jaxrs</artifactId>-->
+            <!--                </exclusion>-->
+            <!--                <exclusion>-->
+            <!--                    <groupId>org.codehaus.jackson</groupId>-->
+            <!--                    <artifactId>jackson-xc</artifactId>-->
+            <!--                </exclusion>-->
+            <!--                <exclusion>-->
+            <!--                    <groupId>com.google.protobuf</groupId>-->
+            <!--                    <artifactId>protobuf-java</artifactId>-->
+            <!--                </exclusion>-->
+            <!--            </exclusions>-->


Remove the commented-out code.

amahussein · 2024-09-18T17:10:18Z

core/pom.xml

@@ -443,6 +458,7 @@
        <spark314.version>3.1.4-SNAPSHOT</spark314.version>
        <spark320.version>3.2.0</spark320.version>
        <spark321.version>3.2.1</spark321.version>
+        <bdspark321.version>3.2.1-bd1-SNAPSHOT</bdspark321.version>


Is this available in a public Maven repo?

amahussein · 2024-09-18T17:13:00Z

core/pom.xml

@@ -502,6 +518,7 @@
        <project.build.sourceEncoding>${platform-encoding}</project.build.sourceEncoding>
        <project.reporting.sourceEncoding>${platform-encoding}</project.reporting.sourceEncoding>
        <project.reporting.outputEncoding>${platform-encoding}</project.reporting.outputEncoding>
+        <hive.version>1.2.2-bd103</hive.version>


Is this available in a public Maven repo? And why do we need to define a custom Hive version for basic functionality? It will only run in a specific environment.

amahussein · 2024-09-18T17:16:05Z

core/src/main/resources/opScore-onprem-bytedance0325.csv

+RandomForestRegressor-pyspark,3.66
+RandomForestRegressor-scala,1
+XGBoost-pyspark,1
+XGBoost-scala,3.31


This file is going to be deprecated and removed anyway, since we rely on qualx for speedups.
FWIW, it is missing some expressions that were recently added to the qualification tool, namely:
RoundCeil,4
RoundFloor,4
BloomFilterMightContain,4
BloomFilterAggregate,4
EphemeralSubstring,4
KnownNullable,4
InSubqueryExec,4
AQEShuffleReadExec,4
CheckOverflowInTableInsert,4
ArrayFilter,1.5
BoundReference,1.5
HiveHash,1.5
MapFromArrays,1.5
DecimalSum,1.5
MaxBy,1.5
MinBy,1.5
ArrayJoin,1.5
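
One way to spot gaps like these is to compare the operator names in a score file against a list of expected names. The sketch below is only an illustration and is not part of the tools code base: `OpScoreCheck`, `parseNames`, and `missing` are hypothetical names, and it assumes each row has the `name,score` shape shown in the hunk above.

```scala
object OpScoreCheck {
  // Extracts operator names from "name,score" rows.
  // Blank lines and rows without a comma are skipped.
  def parseNames(lines: Seq[String]): Set[String] =
    lines.map(_.trim).filter(_.nonEmpty).flatMap { line =>
      line.split(",", 2) match {
        case Array(name, _) => Some(name.trim)
        case _              => None
      }
    }.toSet

  // Returns the expected names that do not appear in the score rows,
  // preserving the order of `expected`.
  def missing(expected: Seq[String], lines: Seq[String]): Seq[String] = {
    val present = parseNames(lines)
    expected.filterNot(present.contains)
  }
}
```

Calling `OpScoreCheck.missing` with the names listed above and the lines of the custom CSV would return the entries that still need scores.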

amahussein · 2024-09-18T17:16:58Z

core/src/main/scala/com/nvidia/spark/rapids/tool/udf/EstimateEventRapidsUDF.scala

+      None, None, List(eventPath), hadoopConf)
+    val filteredLogs = eventLogFsFiltered
+
+    // uiEnabled = false


Remove unnecessary code. uiEnabled has been removed from the code base.

amahussein · 2024-09-18T17:18:52Z

core/src/main/scala/com/nvidia/spark/rapids/tool/udf/EstimateEventRapidsUDF.scala

+    val filteredLogs = eventLogFsFiltered
+
+    // uiEnabled = false
+    val qual = new Qualification(


It is possible to call QualificationMain.mainInternal instead

amahussein · 2024-09-18T17:21:28Z

core/src/main/scala/com/nvidia/spark/rapids/tool/udf/EstimateEventRapidsUDF.scala

+    }
+  }
+
+  private def execMLPredict(applicationId: String, outputDirectory: String): String = {


The architecture of the tools is that the jar is executed by the Python wrapper.
Putting an exec in the jar to run the predict command means that the jar and the Python wrapper call each other, which is inconsistent and duplicates work.

amahussein · 2024-09-18T17:27:34Z

core/src/main/scala/org/apache/spark/sql/rapids/tool/profiling/ApplicationInfo.scala

-  metadata, metrics) {
+    val stageId: Option[Int],
+    planId: Int = 0) extends SparkPlanInfo(nodeName, simpleString, children,
+  metadata, metrics, planId) {


Does this extra argument break other environments?
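
One way to check this empirically is to inspect which constructors the parent class exposes on a given classpath. The following is a minimal sketch under stated assumptions: `StockPlanInfo` and `ForkPlanInfo` are hypothetical stand-ins (not the real Spark `SparkPlanInfo`), modelling a five-argument upstream constructor and a six-argument fork constructor that adds `planId`, and `CtorCheck` is an illustrative helper.

```scala
// Hypothetical stand-in for an upstream class with five constructor parameters.
class StockPlanInfo(
    val nodeName: String,
    val simpleString: String,
    val children: Seq[StockPlanInfo],
    val metadata: Map[String, String],
    val metrics: Seq[String])

// Hypothetical stand-in for a fork that adds a sixth parameter, planId.
class ForkPlanInfo(
    val nodeName: String,
    val simpleString: String,
    val children: Seq[ForkPlanInfo],
    val metadata: Map[String, String],
    val metrics: Seq[String],
    val planId: Int)

object CtorCheck {
  // True if the class has a public constructor taking exactly `arity` parameters.
  def hasPublicCtorWithArity(cls: Class[_], arity: Int): Boolean =
    cls.getConstructors.exists(_.getParameterCount == arity)
}
```

If the six-argument constructor is absent from the parent class on a given classpath, a subclass that passes `planId` to it would not compile against that version.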

nartal1 · 2024-09-18T18:55:16Z

core/src/main/scala/com/nvidia/spark/rapids/tool/udf/EstimateEventRapidsUDF.scala

@@ -0,0 +1,148 @@
+/*
+ * Copyright (c) 2021-2023, NVIDIA CORPORATION.


Nit: Update copyright year in this file and all other files where applicable.

wjxiz1992 · 2024-09-19T03:07:38Z

Some background:

The customer is running in their k8s cluster with a customized Docker container and cannot mount an external volume, so local files will be lost after the container is destroyed.
Their Spark is highly customized and is not any public community version; the same goes for some other dependencies.
The target is to get the "prediction speedup" produced by qualx for their 30,000+ Spark jobs.
Blocked by [FEA] Tool output file support for writing to HDFS file system #1348.

This is meant more as a description of what the customer is trying to do, so that we can understand and help accordingly.

tgravescs · 2024-09-19T13:23:38Z

> Target is to get the "prediction speedup" produced by qualx for their 30,000+ Spark jobs.

The Qual tool has been explicitly moving away from speedup numbers, so I think we want to be careful about adding any tools that emphasize them again. I realize it may be a better fit for some customers, but we need to figure out how to expose it to users if we choose to, or whether it's really a customer-specific tool.

hezhengjie added 13 commits September 14, 2024 15:44

Compatible with bd spark

5dcc95b

EstimateEventRapidsUDF v1

1e5a8b4

wip for new Qualification

7dfe11c

new version for rapids

1e12968

use local output for ml

cee6004

use local output for ml

096f5f6

try to fix loadClusterProps

83341b2

find prediction file

a52f14d

fix local dir

49fcc56

outputDirectory for tools should be hdfs

3347da9

outputDirectory for tools should be hdfs

5cc3ce2

outputDirectory for tools should be hdfs

22152dc

refact output dir

9ab61af

wjxiz1992 mentioned this pull request Sep 18, 2024

[FEA] Tool output file support for writing to HDFS file system #1348

Open

amahussein reviewed Sep 18, 2024

nartal1 reviewed Sep 18, 2024

hezhengjie added 4 commits September 20, 2024 19:05

Support ml

b1131da

Support ml

6349290

Support ml

83d3273

print log

2412631

amahussein marked this pull request as draft October 15, 2024 14:44
