This is a repository for any code related to the DS4A Data Engineering Data Swan pipeline project.
- A `config.yaml` file must be added to the root folder to supply the credentials for a remote cloud object storage service. This is currently set up for AWS, with plans to make it more extensible.
- The `get_data.py` file currently contains the main code for moving data from CMS sources into object storage. This logic lives in the final for-loop of the file. The code will be refactored so that this script logic is separated from the getter/load functions.
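One possible shape for the planned refactor is sketched below, with the getter and loader as standalone functions and the loop moved into a separate runner. This is not the repository's actual code. It assumes that CMS files are fetched over HTTP with `urllib`, that uploads go through a boto3 S3 client, and that `config.yaml` holds AWS credentials, a bucket name, and a mapping of object keys to source URLs. All function names, the `InMemoryStore` helper, and the config key names are hypothetical.

```python
"""Hypothetical layout for a refactored get_data.py (a sketch, not the repo's code)."""
import urllib.request
from typing import Callable, List, Mapping, Protocol


class ObjectStore(Protocol):
    """Anything with an S3-style put_object method (a boto3 S3 client fits)."""

    def put_object(self, *, Bucket: str, Key: str, Body: bytes) -> object: ...


class InMemoryStore:
    """Stand-in for an S3 client, useful for local runs and tests."""

    def __init__(self) -> None:
        self.objects: dict = {}

    def put_object(self, *, Bucket: str, Key: str, Body: bytes) -> None:
        self.objects[(Bucket, Key)] = Body


def get_data(url: str, timeout: float = 60.0) -> bytes:
    """Getter: download one source file and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def load_data(store: ObjectStore, bucket: str, key: str, body: bytes) -> None:
    """Loader: write raw bytes to object storage under bucket/key."""
    store.put_object(Bucket=bucket, Key=key, Body=body)


def run_pipeline(
    sources: Mapping[str, str],
    store: ObjectStore,
    bucket: str,
    fetch: Callable[[str], bytes] = get_data,
) -> List[str]:
    """The former for-loop: fetch each source and load it; return the keys written."""
    loaded = []
    for key, url in sources.items():
        load_data(store, bucket, key, fetch(url))
        loaded.append(key)
    return loaded


def load_config(path: str = "config.yaml") -> dict:
    """Read config.yaml (requires PyYAML, imported lazily)."""
    import yaml

    with open(path) as f:
        return yaml.safe_load(f)


def main(config_path: str = "config.yaml") -> None:
    """Script entry point: the only place that reads config or touches AWS."""
    import boto3  # imported here so the functions above need no AWS SDK

    config = load_config(config_path)
    # Hypothetical config keys; adjust to match the real config.yaml.
    s3 = boto3.client(
        "s3",
        aws_access_key_id=config["aws_access_key_id"],
        aws_secret_access_key=config["aws_secret_access_key"],
        region_name=config.get("region_name"),
    )
    run_pipeline(config["sources"], s3, config["bucket"])
```

With this split, the script would end by calling `main()` inside an `if __name__ == "__main__":` guard. The getter and loader can also be exercised without AWS by passing an `InMemoryStore` and a fake `fetch` function. Supporting another cloud provider would then mostly mean constructing a different client in `main()`, assuming that client offers a compatible `put_object` method.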