Skip to content

Commit 4d3e13f

Browse files
Initial Commit
1 parent afa23cc commit 4d3e13f

10 files changed

+941036
-0
lines changed

19-ShuffleJoinDemo/SuffleJoinDemo.py

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
from pyspark.sql import SparkSession
2+
3+
from lib.logger import Log4j
4+
5+
if __name__ == "__main__":
6+
spark = SparkSession \
7+
.builder \
8+
.appName("Shuffle Join Demo") \
9+
.master("local[3]") \
10+
.getOrCreate()
11+
12+
logger = Log4j(spark)
13+
14+
flight_time_df1 = spark.read.json("data/d1/")
15+
flight_time_df2 = spark.read.json("data/d2/")
16+
17+
spark.conf.set("spark.sql.shuffle.partitions", 3)
18+
19+
join_expr = flight_time_df1.id == flight_time_df2.id
20+
join_df = flight_time_df1.join(flight_time_df2, join_expr, "inner")
21+
22+
join_df.foreach(lambda f: None)
23+
input("press a key to stop...")

19-ShuffleJoinDemo/data/d1/part-00000-00af64b6-7ef5-4909-8f82-b8897114efaf-c000.json

Lines changed: 172660 additions & 0 deletions
Large diffs are not rendered by default.

19-ShuffleJoinDemo/data/d1/part-00001-00af64b6-7ef5-4909-8f82-b8897114efaf-c000.json

Lines changed: 171572 additions & 0 deletions
Large diffs are not rendered by default.

0 commit comments

Comments
 (0)