This is a demo project that shows how to use Cloud Run, Stackdriver, Pub/Sub, Cloud Monitoring and BigQuery to stream events from your clients into BigQuery without the hassle of building a scalable infrastructure yourself. All of these services are serverless and have a free tier.
- Backend: Node.js, Express, dotenv, Docker, GCP SDKs and Bunyan
- HTTP benchmarking: wrk
- Cloud: gcloud commands and GCP console
Clients generate events and send them to the backend. The API that receives the events acts as the Producer:
it sends the events to a queue that stores them temporarily. The Producer
writes each event as a separate log message, and a sink in Stackdriver
later forwards these logs to a Pub/Sub topic, the events topic.
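To illustrate "each event as a log message", the sketch below builds one JSON log line per event and writes it to stdout, which Cloud Run forwards to the logging service. It uses only Node.js built-ins instead of Express and Bunyan, and the function name `eventToLogLine`, the field names, and the sample payload are assumptions for illustration, not the repository's actual code.

```javascript
// Hypothetical sketch: serialize one client event as a single JSON log line.
// The field names (`name`, `time`, `event`) are assumptions for illustration.
function eventToLogLine(event, logName = 'events-service') {
  if (event === null || typeof event !== 'object' || Array.isArray(event)) {
    throw new TypeError('event must be a plain object');
  }
  return JSON.stringify({
    name: logName,                  // assumed to match LOG_NAME
    time: new Date().toISOString(), // when the Producer received the event
    event,                          // the client payload, unchanged
  });
}

// One line per event on stdout, so the log sink can forward it as one message.
const line = eventToLogLine({ type: 'page_view', userId: 'u-123' });
process.stdout.write(line + '\n');
```

Keeping the whole event inside a single JSON line means the sink can forward it downstream without any parsing on the Producer side.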
Consumers are jobs that pull a large batch of events, run an ETL step on them and load the result into BigQuery.
The Consumer here is built as an API call and is triggered by a Pub/Sub message.
It takes around 9 minutes to do its work and returns a 200 success response once the job finishes.
To handle high load, you can run the job asynchronously.
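The transform step of such a job could look roughly like the sketch below. It assumes each pulled Pub/Sub message carries a log entry as base64-encoded JSON, with the client event stored under `jsonPayload.event`. That payload shape, the row columns, and the function name `messagesToRows` are assumptions for illustration; pulling from Pub/Sub and loading into BigQuery are left out.

```javascript
// Hypothetical sketch of the "T" in ETL: Pub/Sub messages -> table rows.
// Assumes msg.data is a base64 string containing a JSON log entry.
function messagesToRows(messages) {
  const rows = [];
  for (const msg of messages) {
    try {
      const entry = JSON.parse(Buffer.from(msg.data, 'base64').toString('utf8'));
      const event = entry.jsonPayload && entry.jsonPayload.event;
      if (!event) continue; // skip log lines that carry no event
      rows.push({
        insert_id: entry.insertId || null,    // assumed column names
        received_at: entry.timestamp || null,
        payload: JSON.stringify(event),
      });
    } catch (err) {
      // Malformed message: skip it so one bad entry does not fail the batch.
    }
  }
  return rows;
}

// Usage with one well-formed message and one that is not valid JSON.
const sample = {
  data: Buffer.from(JSON.stringify({
    insertId: 'abc',
    timestamp: '2020-01-01T00:00:00Z',
    jsonPayload: { event: { type: 'page_view' } },
  })).toString('base64'),
};
console.log(messagesToRows([sample, { data: 'bm90LWpzb24=' }]));
```

Skipping bad messages instead of throwing is a design choice for the sketch; a real job might route them to a separate table or topic instead.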
- A GCP account; you can get free credits when you sign up
- Install gcloud and log in to your account
- Create a new GCP project, or get the ID of an existing one
git clone [email protected]:omegaes/serverless-streaming.git
cd serverless-streaming/deploy
export PROJECT_ID=
export LOCATION=US
export REGION=us-east1
export SERVICE_ACCOUNT_ID=
export ROLE_ID=roleCloudRun
export SINK_NAME=events_sink
export PUBSUB_EVENTS_TOPIC=events_topic
export PUBSUB_EVENTS_SUB=events_subscription
export PUBSUB_INVOKE_TOPIC=invoke_topic
export PUBSUB_INVOKE_SUB=invoke_topic_subscription
export LOG_NAME=events-service
export$PROJECT_ID/topics/$PUBSUB_EVENTS_TOPIC
export ROLE=projects/$PROJECT_ID/roles/$ROLE_ID
export DATASET_NAME=test_data
export TABLE_NAME=events
./ #start building and deploying!
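Because the deployment depends on these variables, a small pre-flight check can catch an empty one before anything is created. The sketch below is not part of the repository: the function name `missingEnvVars` is hypothetical, and the list only mirrors the named exports above (the export whose variable name is missing is left out).

```javascript
// Hypothetical pre-flight check: report required variables that are unset or blank.
const REQUIRED_VARS = [
  'PROJECT_ID', 'LOCATION', 'REGION', 'SERVICE_ACCOUNT_ID', 'ROLE_ID',
  'SINK_NAME', 'PUBSUB_EVENTS_TOPIC', 'PUBSUB_EVENTS_SUB',
  'PUBSUB_INVOKE_TOPIC', 'PUBSUB_INVOKE_SUB', 'LOG_NAME',
  'ROLE', 'DATASET_NAME', 'TABLE_NAME',
];

function missingEnvVars(env = process.env, names = REQUIRED_VARS) {
  // A variable counts as missing when it is undefined, empty, or whitespace only.
  return names.filter((name) => !env[name] || env[name].trim() === '');
}

const missing = missingEnvVars();
if (missing.length > 0) {
  console.log(`Missing environment variables: ${missing.join(', ')}`);
} else {
  console.log('All required environment variables are set.');
}
```

With the exports above left as they are, `PROJECT_ID` and `SERVICE_ACCOUNT_ID` would be reported, since both are exported with empty values.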
- command .sh includes these functions:
  - gcloudConfig: configures your gcloud CLI with your project and preferences
  - buildProducerApiAndDeployToCloudRun: builds the Producer API from the api folder and deploys it to Cloud Run
  - buildConsumerApiAndDeployToCloudRun: builds the Consumer API from the job folder and deploys it to Cloud Run
  - createPubSubTopicsAndSubscriptions: creates the 2 topics and their subscriptions needed to run this demo
  - createLogsSink: creates a Stackdriver logs sink that redirects the Producer API's stdout to the Pub/Sub topic
  - createAlertPolicyAndNotificationChannel: creates the cloud scheduler and alert that trigger the Consumer API to process events and do the ETL work
  - testProducerAndConsumer: runs the demo by producing random events from client calls; you can then monitor BigQuery after the ETL has run successfully
  - cleanUpResources: deletes all resources created in this example
To understand how this example works, you can refer to my article on Medium: Stream Millions of events from your client to BigQuery in a Serverless way, Part #1.