This project demonstrates the data modelling process for Apache Cassandra, a NoSQL database. The database here backs a music streaming website. Cassandra is optimized for fast writes; to get fast reads as well, tables must be fully denormalized and designed with specific queries in mind. In other words, each table in the Cassandra database is modelled around the single query it serves, with data redundancy (overlap between tables) permitted. In some sense this is an easier process than relational modelling; the real challenge lies in how data is partitioned and sorted across nodes.
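As a minimal sketch of query-first modelling, assume a hypothetical query: "fetch the artist and song title for a given session and item in session". The table below (all names are illustrative, not the project's actual schema) is shaped around exactly that query:

```python
# The partition key (session_id) determines which node stores the row;
# the clustering column (item_in_session) sorts rows within a partition.
# Together they let the one intended query be answered with a single
# partition lookup. All identifiers here are hypothetical.
create_songs_by_session = """
CREATE TABLE IF NOT EXISTS songs_by_session (
    session_id      int,
    item_in_session int,
    artist          text,
    song_title      text,
    PRIMARY KEY (session_id, item_in_session)
)
"""
```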
The project is a Jupyter notebook that follows these steps:
- Walk through the folder containing the event data.
- Read in the CSV files, each of which contains individual events.
- Extract the relevant columns (values) from each CSV and write them into a new CSV that houses all events (see the file-crawl sketch after this list).
- Initiate the Cassandra cluster and define a keyspace (see the connection sketch below).
- Create all tables within this keyspace.
- Validate the tables by running the queries they were designed for.
- Drop the tables and close the connection (creation, validation, and teardown are sketched below).
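The crawl-and-consolidate step might look like the following sketch; the `event_data` directory name, the output file name, and the column names are assumptions, since the source does not name them.

```python
import glob
import os

import pandas as pd

# Collect every CSV file path under the (assumed) event_data directory.
file_paths = glob.glob(os.path.join(os.getcwd(), "event_data", "**", "*.csv"),
                       recursive=True)

# Read each event file and concatenate them into one DataFrame.
frames = [pd.read_csv(path) for path in file_paths]
events = pd.concat(frames, ignore_index=True)

# Keep only the relevant columns (names are hypothetical), drop rows
# without an artist, then write the consolidated CSV the tables load from.
columns = ["artist", "song", "length", "sessionId", "itemInSession"]
events = events[columns].dropna(subset=["artist"])
events.to_csv("event_datafile_new.csv", index=False)
```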
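Connecting to the cluster and defining the keyspace could be sketched as follows; the contact address, keyspace name, and replication settings are assumptions (SimpleStrategy with a replication factor of 1 is typical for a single-node, local setup).

```python
from cassandra.cluster import Cluster

# Connect to a Cassandra instance running locally (assumed address).
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Create a keyspace; SimpleStrategy with replication_factor 1 suits a
# single-node development cluster.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS music_streaming
    WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace("music_streaming")
```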
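Table creation, loading, validation, and teardown might then look like this sketch. It continues from the connection above and reuses the hypothetical `create_songs_by_session` statement and consolidated CSV from the earlier sketches; the query key values are arbitrary examples.

```python
import csv

# Create the query-specific table inside the keyspace.
session.execute(create_songs_by_session)

# Load the consolidated events into the table.
insert_cql = """
    INSERT INTO songs_by_session (session_id, item_in_session, artist, song_title)
    VALUES (%s, %s, %s, %s)
"""
with open("event_datafile_new.csv", newline="", encoding="utf8") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for artist, song, length, session_id, item_in_session in reader:
        session.execute(insert_cql,
                        (int(session_id), int(item_in_session), artist, song))

# Validate: run the exact query the table was designed for
# (the key values 338 and 4 are arbitrary examples).
rows = session.execute(
    "SELECT artist, song_title FROM songs_by_session "
    "WHERE session_id = %s AND item_in_session = %s",
    (338, 4),
)
for row in rows:
    print(row.artist, row.song_title)

# Teardown: drop the table and close the connection.
session.execute("DROP TABLE IF EXISTS songs_by_session")
session.shutdown()
cluster.shutdown()
```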
Tools used:
- The Python Cassandra driver, aptly named `cassandra` (installed as `cassandra-driver`)
- Pandas for data manipulation
- `os` and `glob` for crawling the file repository
A remaining to-do is to modularize the code.