Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.


Parquet4S

Parquet4s is a simple I/O library for Parquet that allows you to easily read and write Parquet files in Scala.

Use just a Scala case class to define the schema of your data. There is no need for Avro, Protobuf, Thrift, or other data serialisation systems. If you prefer not to use case classes, you can work with generic records instead.
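As an illustration, here is a minimal sketch of writing and reading a case class with the core module. The `User` class and file name are made up for the example, and the calls follow the parquet4s 2.x API (`ParquetWriter.of`, `ParquetReader.as`):

```scala
import com.github.mjakubowski84.parquet4s.{ParquetReader, ParquetWriter, Path}

// A plain case class doubles as the Parquet schema.
case class User(id: Int, name: String, active: Boolean)

object Example extends App {
  val users = Seq(User(1, "Ada", true), User(2, "Grace", false))

  // Write the records to a local Parquet file.
  ParquetWriter.of[User].writeAndClose(Path("users.parquet"), users)

  // Read them back; the returned iterable must be closed when done.
  val readBack = ParquetReader.as[User].read(Path("users.parquet"))
  try readBack.foreach(println)
  finally readBack.close()
}
```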

Compatible with files generated with Apache Spark. However, unlike in Spark, you do not have to start a cluster to perform I/O operations.

Based on the official Parquet library, Hadoop Client, and Shapeless (Shapeless is not used in the Scala 3 version).

As it is based on Hadoop Client, you can connect to any Hadoop-compatible storage like AWS S3 or Google Cloud Storage.
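For example, reads from S3 can be configured by passing a Hadoop `Configuration`. This is a sketch only: the bucket name is hypothetical, it assumes the `hadoop-aws` module is on the classpath, and it assumes parquet4s 2.x's `ParquetReader.Options(hadoopConf = ...)`:

```scala
import com.github.mjakubowski84.parquet4s.{ParquetReader, Path}
import org.apache.hadoop.conf.Configuration

case class User(id: Int, name: String)

// Credentials are read from the environment; requires hadoop-aws on the classpath.
val conf = new Configuration()
conf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
conf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

val users = ParquetReader
  .as[User]
  .options(ParquetReader.Options(hadoopConf = conf))
  .read(Path("s3a://my-bucket/users.parquet")) // hypothetical bucket
try users.foreach(println)
finally users.close()
```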

Integrations are available for Akka Streams, Pekko Streams, and FS2.
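As one example of those integrations, the FS2 module exposes Parquet files as streams. The sketch below assumes the `parquet4s-fs2` module and its `fromParquet` builder, with a hypothetical local file:

```scala
import cats.effect.{IO, IOApp}
import com.github.mjakubowski84.parquet4s.Path
import com.github.mjakubowski84.parquet4s.parquet.fromParquet

case class User(id: Int, name: String)

object Fs2Example extends IOApp.Simple {
  // Stream the records of a Parquet file and print each one.
  val run: IO[Unit] =
    fromParquet[IO]
      .as[User]
      .read(Path("users.parquet"))
      .evalMap(user => IO.println(user))
      .compile
      .drain
}
```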

Released for Scala 2.12.x, 2.13.x, and 3.3.x.

Documentation

Documentation is available here.

Contributing

Do you want to contribute? Please read the contribution guidelines.
