Commit
Merge branch 'main' of https://github.com/jamesyang2333/SAM into main
Jamesyang2333 committed Jul 26, 2022
2 parents acc5dea + dcf883d commit 54a30cf
Showing 3 changed files with 32 additions and 5 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -26,7 +26,7 @@ bash scripts/download_imdb.sh

Generate the IMDB database using the pretrained model at [`./sam_multi/models/uaeq-mscn-400.pt`](./sam_multi/models/uaeq-mscn-400.pt). The model is trained from the first 400 queries in the MSCN workload. The generated data csv files are saved at `./sam_multi/generated_database/imdb`.
```
-python run_dbgen.py --run data-generation-job-light-MSCN-worklod
+python run_dbgen.py --run data-generation-job-light-mscn-worklod
```

To test the fidelity of the generated database, import the generated data files into a PostgreSQL database:
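
For example, a single generated table could be loaded as in the sketch below; the database name `imdb`, the table name `title`, and the CSV options are assumptions to adapt to the actual schema of the generated files.
```
# Hypothetical sketch: load one generated CSV into an existing PostgreSQL table.
# Assumes a database named imdb and a table "title" whose columns match the CSV;
# adjust the HEADER option to match the generated files.
psql -d imdb -c "\copy title FROM './sam_multi/generated_database/imdb/title.csv' WITH (FORMAT csv, HEADER true)"
```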
2 changes: 1 addition & 1 deletion sam_multi/README.md
@@ -13,7 +13,7 @@ bash scripts/download_imdb.sh

**Database Generation** To generate the database from the trained model using SAM, use the following command.
```
-python run_dbgen.py --run job-light-ranges-reload
+python run_dbgen.py --run data-generation-job-light-mscn-worklod
```
By default, this generates the database using the model [`./models/uaeq-mscn-400.pt`](./models/uaeq-mscn-400.pt). The generation process runs for 100 iterations. The generated data csv files are saved at `./generated_database/imdb`.
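
As an optional spot check (not part of the generation pipeline itself), the generated files can be listed and previewed from the shell:
```
# Illustrative only: confirm the generated CSVs exist and inspect their first rows.
ls ./generated_database/imdb
head -n 3 ./generated_database/imdb/*.csv
```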

33 changes: 30 additions & 3 deletions sam_single/README.md
@@ -1,8 +1,35 @@
# SAM
### Instruction
We have provided pretrained model for both Census and DMV dataset. To generate database from pretrained models using SAM, use the following commands.
# SAM for single-relation database generation
### Getting Started

**Datasets** For single-relation database generation, we conduct our experiments on two datasets, Census and DMV. Census is provided at [`./datasets/census.csv`](./datasets/census.csv). You can download the DMV dataset by running the following script.
```
bash scripts/download_dmv.sh
```
**Pretrained Models** We have provided a pretrained model for each dataset.
[`./models/census_pretrained.pt`](./models/census_pretrained.pt): Trained from 20000 queries in the generated workload ([`./queries/census_21000.csv`](./queries/census_21000.csv)).

[`./models/dmv_pretrained.pt`](./models/dmv_pretrained.pt): Trained from 20000 queries in the generated workload ([`./queries/dmv_21000.csv`](./queries/dmv_21000.csv)).

**Database Generation** To generate the databases from the pretrained models using SAM, use the following commands.
```
python gen_data_model.py --dataset census --residual --layers=2 --fc-hiddens=128 --direct-io --column-masking --glob census_pretrained.pt --save-name census
python gen_data_model.py --dataset dmv --residual --layers=2 --fc-hiddens=128 --direct-io --column-masking --glob dmv_pretrained.pt --save-name dmv
```
The generated relations are saved at `./generated_data_tables`.

**Test the generated database**
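
Below is a minimal sanity-check sketch, assuming fidelity is judged by comparing simple aggregates of the generated relation against the original data; the generated file name used here is a placeholder for whatever `gen_data_model.py` writes to `./generated_data_tables`.
```
# Hypothetical sanity check; replace census_generated.csv with the actual
# file written to ./generated_data_tables.
ORIG=./datasets/census.csv
GEN=./generated_data_tables/census_generated.csv

# Compare row counts of the original and generated relations.
wc -l "$ORIG" "$GEN"

# Compare the value distribution of the first column.
cut -d, -f1 "$ORIG" | sort | uniq -c | sort -rn | head
cut -d, -f1 "$GEN" | sort | uniq -c | sort -rn | head
```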


### SAM model training
SAM uses [UAE-Q](https://github.com/pagegitss/UAE) to train a deep autoregressive model from query workloads.

To train the model on the full MSCN workload:
```
python run_uae.py --run job-light-ranges-mscn-workload
```

To test the model on the sub-queries of JOB-light:
```
python run_uae.py --run uae-job-light-ranges-reload
```
