A functional wrapper around Spark to make it work with ZIO
You can find the documentation of zio-spark here.
You can ask us (Dylan, Jonathan) for help if you want to use the library or have questions about it: https://calendly.com/zio-spark/help
If you want the latest stable version of this library, you can add it to your build using:

```scala
libraryDependencies += "io.univalence" %% "zio-spark" % "0.9.0"
```
If you want the latest snapshot (the version associated with the last commit on master), you can add it using:

```scala
resolvers += Resolver.sonatypeRepo("snapshots")
libraryDependencies += "io.univalence" %% "zio-spark" % "<SNAPSHOT-VERSION>"
```
You can find the latest snapshot version on the Sonatype Nexus repository manager.
zio-spark is compatible with Scala 2.11, 2.12, and 2.13. Spark is marked as Provided: you must add your own Spark version in build.sbt (as you usually would). We advise you to use the latest Spark version available for your Scala version.
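As a sketch, a Scala 2.13 build might declare the pair like this (the Spark version shown is illustrative; pick the one matching your cluster):

```scala
// build.sbt (Scala 2.13): zio-spark plus a user-chosen Spark version.
// Spark is marked Provided because the cluster supplies it at runtime;
// drop Provided if you run the application locally with `sbt run`.
libraryDependencies ++= Seq(
  "io.univalence"    %% "zio-spark" % "0.9.0",
  "org.apache.spark" %% "spark-sql" % "3.3.1" % Provided
)
```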
We have also worked to make zio-spark available for Scala 3, so that it works with zio-direct:
```scala
import zio.*
import zio.direct.*
import zio.spark.sql.*
// import for syntax + Spark encoders
import zio.spark.sql.implicits.*
import scala3encoders.given
// throws AnalysisException directly
import zio.spark.sql.TryAnalysis.syntax.throwAnalysisException

object Main extends ZIOAppDefault {
  val sparkSession = SparkSession.builder.master("local").asLayer

  override def run =
    defer {
      val readBuild: RIO[SparkSession, DataFrame] = SparkSession.read.text("./build.sbt")
      val text: Dataset[String]                   = readBuild.run.as[String]
      text.filter(_.contains("zio")).show(truncate = false).run
      Console.printLine("what a time to be alive!").run
    }.provideLayer(sparkSession)
}
```
build.sbt:

```scala
scalaVersion := "3.2.1"

libraryDependencies ++= Seq(
  "dev.zio" %% "zio" % "2.0.5",
  "dev.zio" % "zio-direct_3" % "1.0.0-RC1",
  "io.univalence" %% "zio-spark" % "0.9.0",
  ("org.apache.spark" %% "spark-sql" % "3.3.1" % Provided).cross(CrossVersion.for3Use2_13),
  "org.apache.hadoop" % "hadoop-client" % "3.3.1" % Provided,
  "dev.zio" %% "zio-test" % "2.0.5" % Test
)
```
There are many reasons why we decided to build this library, such as:
- allowing users to build Spark pipelines with ZIO easily.
- writing better Spark code: pure FP, more composable, more readable.
- stopping the propagation of implicit SparkSessions.
- improving performance in some cases.
- taking advantage of ZIO to retry our jobs and run them in parallel.
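As an illustration of the last point, here is a minimal sketch of how ZIO's generic combinators can drive Spark pipelines. The job, paths, and retry policy are hypothetical, not part of zio-spark's API; the sketch only assumes a zio-spark effect of type `RIO[SparkSession, Unit]`:

```scala
import zio.*
import zio.spark.sql.*

// Hypothetical job: read one text file and display its content.
// The function name and the file paths below are illustrative.
def processFile(path: String): RIO[SparkSession, Unit] =
  SparkSession.read.text(path).flatMap(df => df.show(truncate = false))

val paths = List("./a.txt", "./b.txt", "./c.txt")

// Retry each job up to 3 times on failure, and run the jobs in parallel.
// These are plain ZIO combinators, so they compose with any zio-spark effect.
val pipeline: RIO[SparkSession, Unit] =
  ZIO
    .foreachPar(paths)(path => processFile(path).retry(Schedule.recurs(3)))
    .unit
```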
Here are some alternative or related projects:
- ZparkIO, a framework for Spark and ZIO.
- iskra from VirtusLab, an interesting take on type safety for Spark, without compromises on performance.
- spark-scala3, one of our dependencies, which provides encoders for Spark in Scala 3.
Pull requests are welcome. We are open to organizing pair-programming sessions to tackle improvements. If you want to add new things to zio-spark, don't hesitate to open an issue!
You can also talk to us directly if you are interested in contributing: https://calendly.com/zio-spark/contribution.