Skip to content

Ray941216/data-engineer-exam

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data Engineer Exam

Environment

Before the test, please use virtalenv to build the environment. You can add the tool you want to use in requirements.txt.

1. Exam 1

You have 67 time series data in exam1/data/time-series.zip, some of them contain anomaly points and some are not.

Please finished algorithm.py that return 1 if the points your algorithm detect is anomaly else 0, you can use ANY methods to detect anomalies.

Evaluate

Execute the command below, the result will show on the exam1/result folder.

Anomaly points will be annotated as redpointing.

cd exam1
python detector.py

2. Exam 2

Use Pandas to transform data exam2/data/train_need_aggregate.csv and exam2/data/test_need_aggregate.csv from Figure 1 to Figure 2.

Figure 1

before

Figure 2

after

Evaluate

Please output two files train.csv and test.csv in exam2/result folder via the command below:

cd exam2
python main.py

3. Exam 3

  1. Use deep learning framework PyTorch to build an LSTM Model on model.py.

  2. Use the model you built in step 1, use the train file you aggregated in question Exam 2 to train a model. Please finish train.py and save the model weight on model folder. The train.py should support epochs arguments to determine how many epochs should model trains.

    cd exam3
    python train --epochs 10
  3. (bonus1) Finish the predict.py to load the model weight to predict the test file you aggregated in question Exam 2.

    3.1. (bonus2) Point out which time point is an anomaly. Add a column called anomaly, fill 1 if the point is anomaly else 0. Output the result call predict.csv to exam3/result folder.

    cd exam3
    python predict.py

About

Offline Exam for Data Engineer Candidates

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 92.0%
  • Python 8.0%