- Get the latest Lambda binaries
- Read the beta release announcement
- Install AWS CDK Toolkit (cdk command)
npm install -g aws-cdk
- Ensure curl and make are installed
- To run the invocation example make commands, you will also need Python 3.10 or later and pip installed (see Python venv below).
For newly created AWS accounts, a conservative quota of 10 concurrent executions is applied to Lambda in each individual region. If that's the case, CDK won't be able to apply the reserved concurrency of the indexing Quickwit lambda. You can increase the quota without charge using the Service Quotas console.
Note: The request can take hours or even days to be processed.
The Python environment is configured using pipenv:
# Install pipenv if needed.
pip install --user pipenv
pipenv shell
pipenv install
Provided demonstration setups:
- HDFS example data: index the HDFS dataset by triggering the Quickwit Lambda manually.
- Mock Data generator: start a mock data generator Lambda that pushes mock JSON data to S3 every X minutes. Those files trigger the Quickwit indexer Lambda automatically.
The Makefile is a useful entrypoint that shows how the Lambda deployment can be used.
Configure your shell and AWS account:
# replace with your AWS account ID and preferred region
export CDK_ACCOUNT=123456789
export CDK_REGION=us-east-1
make bootstrap
Deploy, index and query the HDFS dataset:
make deploy-hdfs
make invoke-hdfs-indexer
make invoke-hdfs-searcher
Deploy the mock data generator and query the indexed data:
make deploy-mock-data
# wait a few minutes...
make invoke-mock-data-searcher
The following environment variables can be configured on the Lambda functions. Note that only a small subset of all Quickwit configurations is exposed, to simplify the setup and avoid unstable deployments.
Variable | Description | Default |
---|---|---|
QW_LAMBDA_INDEX_ID | the index this Lambda interacts with (one and only one) | required |
QW_LAMBDA_METASTORE_BUCKET | bucket name for metastore files | required |
QW_LAMBDA_INDEX_BUCKET | bucket name for split files | required |
QW_LAMBDA_OPENTELEMETRY_URL | HTTP OTEL tracing collector endpoint | none, OTEL disabled |
QW_LAMBDA_OPENTELEMETRY_AUTHORIZATION | Authorization header value for HTTP OTEL calls | none, OTEL disabled |
QW_LAMBDA_ENABLE_VERBOSE_JSON_LOGS | true to enable JSON logging of spans and logs in CloudWatch | false |
RUST_LOG | Rust logging config | info |
Tip
The Indexer Lambda's logging is quite verbose. To reduce the associated CloudWatch costs, you can disable some lower level logs by setting the RUST_LOG environment variable to info,quickwit_actors=warn, or disable INFO logs altogether by setting RUST_LOG=warn.
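As a small illustration of the RUST_LOG values mentioned in the tip, the sketch below assembles a filter string from a default level and optional per-target overrides. The helper name `rust_log` is hypothetical and not part of Quickwit or the CDK stacks; it only shows how the comma-separated directive format fits together.

```python
# Hypothetical helper (not part of Quickwit): builds a RUST_LOG filter string.
LEVELS = ("off", "error", "warn", "info", "debug", "trace")


def rust_log(default_level, overrides=None):
    """Join a default level and per-target overrides into one filter string."""
    overrides = overrides or {}
    for level in (default_level, *overrides.values()):
        if level not in LEVELS:
            raise ValueError(f"unknown log level: {level!r}")
    # The default level comes first, followed by `target=level` directives.
    parts = [default_level]
    parts += [f"{target}={level}" for target, level in overrides.items()]
    return ",".join(parts)


if __name__ == "__main__":
    # Quieter actor logs, as suggested in the tip.
    print(rust_log("info", {"quickwit_actors": "warn"}))
    # INFO logs disabled altogether.
    print(rust_log("warn"))
```

The resulting string would then be set as the RUST_LOG environment variable on the Lambda function.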
Indexer only:
Variable | Description | Default |
---|---|---|
QW_LAMBDA_INDEX_CONFIG_URI | location of the index configuration file, e.g. s3://mybucket/index-config.yaml | required |
QW_LAMBDA_DISABLE_MERGE | true to disable compaction merges | false |
QW_LAMBDA_DISABLE_JANITOR | true to disable retention enforcement and garbage collection | false |
QW_LAMBDA_MAX_CHECKPOINTS | maximum number of ingested file names to keep in source history | 100 |
Searcher only:
Variable | Description | Default |
---|---|---|
QW_LAMBDA_SEARCHER_METASTORE_POLLING_INTERVAL_SECONDS | refresh interval of the metastore | 60 |
QW_LAMBDA_PARTIAL_REQUEST_CACHE_CAPACITY | searcher.partial_request_cache_capacity node config | 64M |
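Because several of these variables are marked as required, it can help to check an environment mapping before deploying. The following is a minimal sketch, assuming the variable names from the tables above; the function `missing_required` and the role names "indexer" and "searcher" are hypothetical and not part of the Quickwit tooling.

```python
# Hypothetical pre-deployment check; not part of Quickwit or the CDK stacks.
# Variable names are taken from the tables above.
COMMON_REQUIRED = (
    "QW_LAMBDA_INDEX_ID",
    "QW_LAMBDA_METASTORE_BUCKET",
    "QW_LAMBDA_INDEX_BUCKET",
)
INDEXER_REQUIRED = ("QW_LAMBDA_INDEX_CONFIG_URI",)


def missing_required(env, role):
    """Return the required variables that are absent or empty for `role`."""
    if role not in ("indexer", "searcher"):
        raise ValueError(f"unknown role: {role!r}")
    required = COMMON_REQUIRED
    if role == "indexer":
        required = required + INDEXER_REQUIRED
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    env = {
        "QW_LAMBDA_INDEX_ID": "hdfs-logs",
        "QW_LAMBDA_METASTORE_BUCKET": "my-bucket",  # placeholder bucket name
        "QW_LAMBDA_INDEX_BUCKET": "my-bucket",      # placeholder bucket name
    }
    print(missing_required(env, "searcher"))
    print(missing_required(env, "indexer"))
```

Variables with defaults are deliberately left out of the check, since omitting them is valid.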
You can configure an HTTP API endpoint around the Quickwit Searcher Lambda. The mock data example stack shows such a configuration. The API Gateway is enabled when the SEARCHER_API_KEY environment variable is set:
SEARCHER_API_KEY=my-at-least-20-char-long-key make deploy-mock-data
Warning
The API key is stored in plain text in the CDK stack. For a real world deployment, the key should be fetched from something like AWS Secrets Manager.
Note that the response is always gzip-compressed, regardless of the Accept-Encoding request header:
curl -d '{"query":"quantity:>5", "max_hits": 10}' -H "Content-Type: application/json" -H "x-api-key: my-at-least-20-char-long-key" -X POST https://{api_id}.execute-api.{region}.amazonaws.com/api/v1/mock-sales/search --compressed
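The same request can be sketched in Python using only the standard library. Since urllib does not decompress responses automatically, the gzip body has to be decoded explicitly. The function names below are hypothetical, and the base URL and API key are placeholders to replace with your own values.

```python
import gzip
import json
import urllib.request


def build_search_request(base_url, index_id, api_key, query, max_hits=10):
    """Build the POST request that the curl command above sends."""
    body = json.dumps({"query": query, "max_hits": max_hits}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{index_id}/search",
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def decode_search_response(raw):
    """Decompress the body if it carries the gzip magic bytes, then parse JSON."""
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return json.loads(raw)


def search(base_url, index_id, api_key, query, max_hits=10):
    """Send the search request and return the decoded JSON response."""
    request = build_search_request(base_url, index_id, api_key, query, max_hits)
    with urllib.request.urlopen(request, timeout=30) as response:
        return decode_search_response(response.read())


if __name__ == "__main__":
    # Placeholder endpoint and key: substitute your API Gateway URL and key.
    result = search(
        "https://{api_id}.execute-api.{region}.amazonaws.com/api/v1",
        "mock-sales",
        "my-at-least-20-char-long-key",
        "quantity:>5",
    )
    print(json.dumps(result, indent=2))
```

Splitting request construction from response decoding keeps both parts easy to check without network access.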
- cdk ls: list all stacks in the app
- cdk synth: emits the synthesized CloudFormation template
- cdk deploy: deploy this stack to your default AWS account/region
- cdk diff: compare deployed stack with current state
- cdk docs: open CDK documentation
You can query and visualize the Quickwit Searcher Lambda from Grafana by using the Quickwit data source for Grafana.
- Set up HTTP API endpoint for Quickwit Searcher Lambda
- Install Quickwit data source plugin on Grafana
If you don't have a Grafana instance running yet, you can start one with the Quickwit plugin installed using Docker:
docker run -e GF_INSTALL_PLUGINS="quickwit-quickwit-datasource" -p 3000:3000 grafana/grafana
In the Connections > Data sources page, add a new Quickwit data source and configure the following settings:
Variable | Description | Example |
---|---|---|
HTTP URL | HTTP search endpoint for Quickwit Searcher Lambda | https://*******.execute-api.us-east-1.amazonaws.com/api/v1 |
Custom HTTP Headers | If you configure API Gateway to require an API key, set x-api-key HTTP Header | Header: x-api-key Value: API key value |
Index ID | Same as QW_LAMBDA_INDEX_ID | hdfs-logs |
After entering these values, click "Save & test". You can now query your Quickwit Lambda from Grafana!