Standalone Google Storage based BigQuery loader.
The BqTail command loader manages the ingestion process as a standalone process driven by data ingestion rules. For each source datafile an event is triggered to the local BqTail process. Since the BigQuery Load API only accepts URIs that are valid Google Cloud Storage locations, all data events also need to reference valid GCS locations.
A data event can be triggered directly to the bqtail process if the source URL is a valid Google Cloud Storage URL and the source path matches the rule's bucket and filter. Otherwise, all files are first copied from sourceURL to gs://${bucket}/$filterPath, and then an event is fired. $filterPath is derived from the source path when it matches the rule filter, or constructed from the rule prefix and pattern.
In direct eventing mode all source data files are governed by the BqTail ingestion rule. For example, if a rule uses a batching window, the datafile's last modification time is used to allocate the corresponding batch; likewise, when a rule uses a delete action on success, all matched source data files are deleted.
In non-direct mode, original data files are never deleted. To avoid processing the same file across separate bqtail command runs, you can use the -h or -X parameter to store all successfully processed files in a history file.
By default, only streaming mode stores the history file, under file:///${env.HOME}/.bqtail; otherwise an in-memory filesystem is used.
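For example, a non-direct ingestion of a local folder that stages files to a GCS bucket and records successfully processed files in a history file could look like the following sketch (the folder, bucket and table names are illustrative):
bqtail -s=/data/incoming -d='myProject:mydataset.mytable' -b=myGCSBucket -h=~/.bqtail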
OSX amd64:
wget https://github.com/viant/bqtail/releases/download/v2.10.3/bqtail_osx_amd64_2.10.3.tar.gz
tar -xvzf bqtail_osx_amd64_2.10.3.tar.gz
cp bqtail /usr/local/bin/
OSX arm64:
wget https://github.com/viant/bqtail/releases/download/v2.10.3/bqtail_osx_arm64_2.10.3.tar.gz
tar -xvzf bqtail_osx_arm64_2.10.3.tar.gz
cp bqtail /usr/local/bin/
Linux amd64:
wget https://github.com/viant/bqtail/releases/download/v2.10.3/bqtail_linux_amd64_2.10.3.tar.gz
tar -xvzf bqtail_linux_amd64_2.10.3.tar.gz
cp bqtail /usr/local/bin/
Linux arm64:
wget https://github.com/viant/bqtail/releases/download/v2.10.3/bqtail_linux_arm64_2.10.3.tar.gz
tar -xvzf bqtail_linux_arm64_2.10.3.tar.gz
cp bqtail /usr/local/bin/
Build from source:
git clone https://github.com/viant/bqtail.git
cd bqtail/cmd/bqtail
go build
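Then copy the resulting binary onto your PATH, as with the prebuilt archives:
cp bqtail /usr/local/bin/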
Make sure that you have a temp dataset in the project.
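If the temp dataset does not exist yet, one way to create it is with the bq CLI (the dataset and project names below are illustrative):
bq mk --dataset myProject:temp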
Data ingestion rule validation
To validate a rule, use the -V option.
bqtail -r='myRuleURL' -V -p=myProject
bqtail -s=mydatafile -d='myProject:mydataset.mytable' -V
bqtail -r=gs://MY_CONFIG_BUCKET/BqTail/Rules/sys/bqjob.yaml -V
Local data file ingestion
bqtail -s=mydatafile -d='myProject:mydataset.mytable' -b=myGCSBucket
Google storage file ingestion
The following command creates a default ingestion rule to ingest data directly from Google Storage:
bqtail -s=gs://myBucket/folder/mydatafile.csv -d='myProject:mydataset.mytable'
The command ingests data into the destination table and produces the following rule:
Async: true
Dest:
  Table: myProject:mydataset.mytable
  Transient:
    Alias: t
    Dataset: temp
    ProjectID: myProject
Info:
  LeadEngineer: awitas
  URL: mem://localhost/BqTail/config/rule/performance.yaml
  Workflow: rule
OnSuccess:
- Action: delete
  Request:
    URLs: $LoadURIs
When:
  Prefix: /folder/
You can save it as rule.yaml to extend or customize the rule, then ingest data with the updated rule:
bqtail -s=gs://myBucket/folder/mydatafile.csv -r=rule.yaml
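Before reusing a customized rule, you can validate it with the -V option described above (the project name is illustrative):
bqtail -r=rule.yaml -V -p=myProject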
Local data ingestion with data ingestion rule
bqtail -s=mydatafile -r='myRuleURL' -b=myGCSBucket
Local data files ingestion
bqtail -s=mylocaldatafolder -d='myProject:mydataset.mytable' -b=myGCSBucket
Local data files ingestion in batch with 120 sec window
bqtail -s=mylocaldatafolder -d='myProject:mydataset.mytable' -w=120 -b=myGCSBucket
Local data files streaming ingestion with rule
bqtail -s=mylocaldatafolder -r='myRuleURL' -X
Local data files ingestion in batch with a 120 sec window and processed file tracking
bqtail -s=mylocaldatafolder -d='myProject:mydataset.mytable' -w=120 -h=~/.bqtail
The BqTail client can use one of the following auth methods:
- With the BqTail BigQuery OAuth client (the default); no env setting is needed.
- With Google Service Account secrets:
export GOOGLE_APPLICATION_CREDENTIALS=myGoogle.secret
- With gsutil authentication
gcloud config set project my-project
gcloud auth login
export GCLOUD_AUTH=true
- With a custom BigQuery OAuth client, using the -c switch:
bqtail -c=pathTo/custom.json
where @pathTo/custom.json contains the OAuth client credentials:
{
  "Id": "xxxx.apps.googleusercontent.com",
  "Secret": "xxxxxx"
}
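The custom client can then be combined with any of the ingestion commands above, for example (file paths and names are illustrative):
bqtail -c=pathTo/custom.json -s=mydatafile -d='myProject:mydataset.mytable' -b=myGCSBucket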
Help:
bqtail -h