forked from Jamesyang2333/SAM
Merge branch 'main' of https://github.com/jamesyang2333/SAM into main
Showing 3 changed files with 32 additions and 5 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,35 @@ | ||
# SAM | ||
### Instruction | ||
We have provided pretrained models for both the Census and DMV datasets. To generate a database from the pretrained models using SAM, use the following commands. | ||
# SAM for single-relation database generation | ||
### Getting Started | ||
|
||
**Datasets** For single-relation databases, we conduct our experiments on two datasets, Census and DMV. The Census dataset is included at [`./datasets/census.csv`](./datasets/census.csv). You can download the DMV dataset by running the following script. | ||
``` | ||
bash scripts/download_dmv.sh | ||
``` | ||
**Pretrained Models** We have provided a pretrained model for each dataset. | ||
[`./models/census_pretrained.pt`](./models/census_pretrained.pt): Trained on 20,000 queries from the generated workload ([`./queries/census_21000.csv`](./queries/census_21000.csv)). | ||
|
||
[`./models/dmv_pretrained.pt`](./models/dmv_pretrained.pt): Trained on 20,000 queries from the generated workload ([`./queries/dmv_21000.csv`](./queries/dmv_21000.csv)). | ||
|
||
**Database Generation** To generate a database from a trained model using SAM, use the following commands. | ||
``` | ||
python gen_data_model.py --dataset census --residual --layers=2 --fc-hiddens=128 --direct-io --column-masking --glob census_pretrained.pt --save-name census | ||
python gen_data_model.py --dataset dmv --residual --layers=2 --fc-hiddens=128 --direct-io --column-masking --glob dmv_pretrained.pt --save-name dmv | ||
``` | ||
The generated relation is saved in `./generated_data_tables`. | ||
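The two generation commands above differ only in the dataset name, so they can be assembled by a small wrapper. The following is a minimal sketch, not part of SAM: `build_command` and `COMMON_FLAGS` are hypothetical names introduced here, the flags are copied verbatim from the commands above, and it assumes the script is run from the repository root with the pretrained model named `<dataset>_pretrained.pt`.

```python
import shlex
from typing import List

# Flags copied verbatim from the README commands; identical for both datasets.
COMMON_FLAGS = [
    "--residual",
    "--layers=2",
    "--fc-hiddens=128",
    "--direct-io",
    "--column-masking",
]


def build_command(dataset: str) -> List[str]:
    """Return the argv list for generating `dataset` (hypothetical helper)."""
    return [
        "python", "gen_data_model.py",
        "--dataset", dataset,
        *COMMON_FLAGS,
        # Assumption: the pretrained model follows the <dataset>_pretrained.pt pattern.
        "--glob", f"{dataset}_pretrained.pt",
        "--save-name", dataset,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be checked first.
    for name in ("census", "dmv"):
        print(shlex.join(build_command(name)))
```

To actually run a command rather than print it, the returned list can be passed to `subprocess.run(cmd, check=True)`.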
|
||
**Test the generated database** | ||
|
||
|
||
### SAM model training | ||
SAM uses [UAE-Q](https://github.com/pagegitss/UAE) to train a deep autoregressive model from query workloads. | ||
|
||
To train the model on the full MSCN dataset: | ||
``` | ||
python run_uae.py --run job-light-ranges-mscn-workload | ||
``` | ||
|
||
To test the model on sub-queries of JOB-light: | ||
``` | ||
python run_uae.py --run uae-job-light-ranges-reload | ||
``` | ||
|