This guide walks you through the process of using Spring Data Cassandra to build an application that stores data in and retrieves it from Apache Cassandra, a high-performance distributed database.
You will store and retrieve data from Apache Cassandra by using Spring Data Cassandra.
build.gradle
link:https://raw.githubusercontent.com/salmar/gs-accessing-data-cassandra/master/initial/build.gradle[role=include]
pom.xml
link:https://raw.githubusercontent.com/salmar/gs-accessing-data-cassandra/master/initial/pom.xml[role=include]
Before you can build the application, you need to set up a Cassandra database. Apache Cassandra is an open-source NoSQL data store optimized for fast reads and fast writes in large datasets. In the next subsections, you can choose between using DataStax Astra DB Cassandra-as-a-Service or running Cassandra locally in a Docker container. This guide describes using the free tier of DataStax Astra DB Cassandra-as-a-Service so that you can create and store data in your Cassandra database in a matter of minutes.
Add the following properties to your application.properties (src/main/resources/application.properties) to configure Spring Data Cassandra:
spring.data.cassandra.schema-action=CREATE_IF_NOT_EXISTS
spring.data.cassandra.request.timeout=10s
spring.data.cassandra.connection.connect-timeout=10s
spring.data.cassandra.connection.init-query-timeout=10s
The spring.data.cassandra.schema-action property defines the schema action to take at startup and can be none, create, create-if-not-exists, recreate, or recreate-drop-unused. We’ll be using create-if-not-exists to create the required schema. See the documentation for details.
Note
|
It is a good security practice to set this to none in production, to avoid the creation / recreation of the database at startup.
|
We also increase the default timeouts, which might be needed when first creating the schema or when using slow remote network connections.
To use a managed database, you can use the robust free tier of DataStax Astra DB Cassandra-as-a-Service. It scales to zero when unused. Follow the instructions in the following link to create a database and keyspace named spring_cassandra.
The Spring Boot Astra starter pulls in and autoconfigures all the required dependencies. Add it to your pom.xml:
<dependency>
<groupId>com.datastax.astra</groupId>
<artifactId>astra-spring-boot-starter</artifactId>
<version>0.1.13</version>
</dependency>
Note
|
For Gradle, add implementation 'com.datastax.astra:astra-spring-boot-starter:0.1.13' to your build.gradle
|
The Astra auto-configuration needs the following settings to connect to your cloud database:
- Define the credentials: client ID, client secret, and application token.
- Select your instance with the cloud region, database ID, and keyspace (spring_cassandra).
Add these extra properties to your application.properties (src/main/resources/application.properties) to configure Astra:
# Credentials to Astra DB
astra.client-id=<CLIENT_ID>
astra.client-secret=<CLIENT_SECRET>
astra.application-token=<APP_TOKEN>
# Select an Astra instance
astra.cloud-region=<DB_REGION>
astra.database-id=<DB_ID>
astra.keyspace=spring_cassandra
If you prefer to run Cassandra locally in a containerized environment, execute the following docker run command:
docker run -p 9042:9042 --rm --name cassandra -d cassandra:3.11
After the container is created, access the Cassandra query language shell:
docker exec -it cassandra cqlsh
And create a keyspace for the application:
CREATE KEYSPACE spring_cassandra WITH replication = {'class' : 'SimpleStrategy', 'replication_factor' : 1};
Now that you have your database running, configure Spring Data Cassandra to access your database.
Add the following properties to your application.properties (src/main/resources/application.properties) to connect to your local database:
spring.data.cassandra.local-datacenter=datacenter1
spring.data.cassandra.keyspace-name=spring_cassandra
Alternatively, for a convenient bundle of Cassandra and related Kubernetes ecosystem projects, you can spin up a single-node Cassandra cluster on K8ssandra in about 10 minutes.
In this example, you’ll define a Vet (Veterinarian) entity. The following listing shows the Vet class (in src/main/java/com/example/accessingdatacassandra/Vet.java):
link:complete/src/main/java/com/example/accessingdatacassandra/Vet.java[role=include]
The Vet class is annotated with @Table, which maps it to a Cassandra table. Each property is mapped to a column.
The class uses a simple @PrimaryKey of type UUID. Choosing the right primary key is essential because it determines the partition key and cannot be changed later.
Note
|
Why is it so important? The partition key not only defines data uniqueness but also controls data locality. When inserting data, the partition key is hashed and the result is used to choose the node that stores the data, so the data can always be found on that node. |
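To make the idea of hashing a key to pick a node more concrete, here is a minimal plain-Java sketch. The ToyPartitioner class and the node names are hypothetical and exist only for illustration. The modulo scheme is also a simplification: Cassandra's default partitioner (Murmur3Partitioner) maps hashes to token ranges owned by nodes rather than taking a remainder.

```java
import java.util.List;
import java.util.UUID;

// Toy model of partition-key routing; NOT Cassandra's real partitioner.
final class ToyPartitioner {
    private final List<String> nodes;

    ToyPartitioner(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = List.copyOf(nodes);
    }

    // Hash the partition key and map the hash onto one of the nodes.
    // The same key always yields the same node, which is what lets a
    // read go directly to the node that holds the row.
    String nodeFor(UUID partitionKey) {
        int token = partitionKey.hashCode();
        return nodes.get(Math.floorMod(token, nodes.size()));
    }
}
```

Calling nodeFor twice with the same UUID returns the same node name, which mirrors why a lookup by partition key does not need to contact every node.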
Cassandra data models are denormalized and do not rely on table joins as SQL/RDBMS data models do, which allows you to retrieve data much faster. For that reason, we have modeled our specialties as a Set<String>.
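As a rough illustration of what denormalization means for the object model, the following plain-Java sketch keeps the specialties inside the row itself. VetRow is a hypothetical stand-in, not the guide's Vet class: it omits the Spring Data Cassandra mapping annotations and includes only the fields named in this guide.

```java
import java.util.Set;
import java.util.UUID;

// Plain-Java stand-in for one denormalized row. The guide's real Vet class
// additionally carries Spring Data Cassandra mapping annotations.
final class VetRow {
    private final UUID id;
    private final String firstName;
    private final Set<String> specialties;

    VetRow(UUID id, String firstName, Set<String> specialties) {
        this.id = id;
        this.firstName = firstName;
        this.specialties = Set.copyOf(specialties); // immutable copy
    }

    UUID getId() {
        return id;
    }

    String getFirstName() {
        return firstName;
    }

    // Specialties travel with the row, so reading a vet needs no join
    // against a separate specialties table.
    Set<String> getSpecialties() {
        return specialties;
    }
}
```

In a normalized relational model, the specialties would typically live in their own table and be joined at query time. Here a single read returns the whole set.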
Spring Data Cassandra is focused on storing data in Apache Cassandra. But it inherits functionality from the Spring Data Commons project, including the ability to derive queries. Essentially, you need not learn the query language of Cassandra. Instead, you can write a handful of methods and let the queries be written for you.
To see how this works, create a repository interface that queries Vet entities, as the following listing (in src/main/java/com/example/accessingdatacassandra/VetRepository.java) shows:
link:complete/src/main/java/com/example/accessingdatacassandra/VetRepository.java[role=include]
VetRepository extends the CassandraRepository interface and specifies types for the generic type parameters for both the value and the key that the repository works with, that is, Vet and UUID, respectively. Out of the box, this interface comes with many operations, including basic CRUD (CREATE, READ, UPDATE, DELETE) and simple query (for example, findById(..)) data access operations. CassandraRepository doesn’t extend from PagingAndSortingRepository, because classic paging patterns using limit/offset are not applicable to Cassandra.
You can define other queries as needed by simply declaring their method signature. However, you can only perform queries that include the primary key. The method findByFirstName is a valid Spring Data method but won’t be allowed in Cassandra as firstName is not part of the primary key.
Note
|
Some generated methods in the repository might require a full table scan. One example is the findAll method, which requires querying all nodes in the cluster. Such queries are not recommended with large datasets as they can impact performance.
|
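The difference between a lookup by primary key and a query on another column can be sketched with a toy in-memory table. ToyVetTable and its methods are hypothetical names used only for this illustration. The sketch does not reflect how Cassandra stores data. It only counts how many rows each kind of query has to examine.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Toy in-memory table keyed by primary key; NOT how Cassandra stores data.
final class ToyVetTable {
    private final Map<UUID, String> firstNameById = new LinkedHashMap<>();
    private int rowsExamined; // rows touched by the most recent query

    void save(UUID id, String firstName) {
        firstNameById.put(id, firstName);
    }

    // Lookup by primary key goes straight to the matching row.
    Optional<String> findById(UUID id) {
        String name = firstNameById.get(id);
        rowsExamined = (name == null) ? 0 : 1;
        return Optional.ofNullable(name);
    }

    // Lookup by a non-key column must inspect every row (a full scan).
    List<UUID> findIdsByFirstName(String firstName) {
        List<UUID> matches = new ArrayList<>();
        rowsExamined = 0;
        for (Map.Entry<UUID, String> row : firstNameById.entrySet()) {
            rowsExamined++;
            if (row.getValue().equals(firstName)) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    int rowsExamined() {
        return rowsExamined;
    }
}
```

With three saved rows, findById examines one row while findIdsByFirstName examines all three. In a real cluster, the equivalent of that scan involves querying every node, which is why such queries are discouraged on large datasets.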
Define a bean of type CommandLineRunner and inject the VetRepository to set up some data and use its methods.
Spring Boot automatically handles those repositories as long as they are included in the same package (or a sub-package) of your @SpringBootApplication class. For more control over the registration process, you can use the @EnableCassandraRepositories annotation.
Note
|
By default, @EnableCassandraRepositories scans the current package for any interfaces
that extend one of Spring Data’s repository interfaces. You can use its
basePackageClasses=MyRepository.class to safely tell Spring Data Cassandra to scan a
different root package by type if your project layout has multiple projects and it does
not find your repositories.
|
Spring Data Cassandra uses the CassandraTemplate to execute the queries behind your find* methods. You can use the template yourself for more complex queries, but this guide does not cover that (see the Spring Data Cassandra Reference Guide[https://docs.spring.io/spring-data/cassandra/docs/current/reference/html/#reference]).
The following listing shows the finished AccessingDataCassandraApplication class (at src/main/java/com/example/accessingdatacassandra/AccessingDataCassandraApplication.java):
link:complete/src/main/java/com/example/accessingdatacassandra/AccessingDataCassandraApplication.java[role=include]
Congratulations! You’ve just developed a Spring application that uses Spring Data Cassandra to access distributed data.