Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.


Parquet4S

Parquet4s is a simple I/O library for Parquet that allows you to easily read and write Parquet files in Scala.

Use just a Scala case class to define the schema of your data. There is no need for Avro, Protobuf, Thrift, or other data serialisation systems. If you prefer not to use case classes, you can work with generic records instead.
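As an illustration, here is a minimal sketch of writing and reading a case class with the core module. The `User` class and file name are made up for the example, and the calls follow the parquet4s 2.x API (`ParquetWriter.of`, `ParquetReader.as`):

```scala
import com.github.mjakubowski84.parquet4s.{ParquetReader, ParquetWriter, Path}

// A plain case class doubles as the Parquet schema.
case class User(id: Int, name: String, active: Boolean)

object Example extends App {
  val users = Seq(User(1, "Ada", true), User(2, "Grace", false))

  // Write the records to a local Parquet file.
  ParquetWriter.of[User].writeAndClose(Path("users.parquet"), users)

  // Read them back; the returned iterable must be closed when done.
  val readBack = ParquetReader.as[User].read(Path("users.parquet"))
  try readBack.foreach(println)
  finally readBack.close()
}
```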

Compatible with files generated with Apache Spark. However, unlike in Spark, you do not have to start a cluster to perform I/O operations.

Based on the official Parquet library, Hadoop Client, and Shapeless (Shapeless is not used in the Scala 3 version).

As it is based on Hadoop Client, you can connect to any Hadoop-compatible storage like AWS S3 or Google Cloud Storage.
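For example, reads from S3 can be configured by passing a Hadoop `Configuration`. This is a sketch only: the bucket name is hypothetical, it assumes the `hadoop-aws` module is on the classpath, and it assumes parquet4s 2.x's `ParquetReader.Options(hadoopConf = ...)`:

```scala
import com.github.mjakubowski84.parquet4s.{ParquetReader, Path}
import org.apache.hadoop.conf.Configuration

case class User(id: Int, name: String)

// Credentials are read from the environment; requires hadoop-aws on the classpath.
val conf = new Configuration()
conf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
conf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

val users = ParquetReader
  .as[User]
  .options(ParquetReader.Options(hadoopConf = conf))
  .read(Path("s3a://my-bucket/users.parquet")) // hypothetical bucket
try users.foreach(println)
finally users.close()
```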

Integrations are available for Akka Streams, Pekko Streams, and FS2.
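As one example of those integrations, the FS2 module exposes Parquet files as streams. The sketch below assumes the `parquet4s-fs2` module and its `fromParquet` builder, with a hypothetical local file:

```scala
import cats.effect.{IO, IOApp}
import com.github.mjakubowski84.parquet4s.Path
import com.github.mjakubowski84.parquet4s.parquet.fromParquet

case class User(id: Int, name: String)

object Fs2Example extends IOApp.Simple {
  // Stream the records of a Parquet file and print each one.
  val run: IO[Unit] =
    fromParquet[IO]
      .as[User]
      .read(Path("users.parquet"))
      .evalMap(user => IO.println(user))
      .compile
      .drain
}
```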

Released for Scala 2.12.x, 2.13.x, and 3.3.x.

Documentation

Documentation is available here.

Contributing

Do you want to contribute? Please read the contribution guidelines.
