Edge Cloud servers Workload Dataset（ECWDataset）

In this GitHub repo, we provide several datasets that can be used for dynamic time-series problems. All datasets have been preprocessed and are stored as .npy files. The data ranges from 2022/08 to 2022/10.

Data background: ECW is a real-world dataset used in our study, One for All: Unified Workload Prediction for Dynamic Multi-tenant Edge Cloud Platforms. The full dataset encompasses diverse load logs (e.g., bandwidth for ECW, CPU, storage) and abundant cross-domain static data from a leading edge cloud service company that provides mature idle computing resource integration and edge cloud resource provisioning.

Note: For privacy reasons, the uploaded data has been normalized using the minimum and maximum values (min-max normalization).
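For reference, min-max normalization rescales values by the minimum and maximum of the data. The sketch below is only an illustration: the README does not state the target range or whether the scaling was applied per server, per feature, or globally, so the `min_max_normalize` helper, its `axis` argument, and the [0, 1] range are assumptions.

```python
import numpy as np

def min_max_normalize(x, axis=None, eps=1e-12):
    """Scale x into [0, 1] along `axis` (hypothetical helper, not the authors' code)."""
    x = np.asarray(x, dtype=float)
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    # eps guards against division by zero when a series is constant.
    return (x - lo) / np.maximum(hi - lo, eps)

raw = np.array([10.0, 20.0, 40.0])   # made-up bandwidth values
scaled = min_max_normalize(raw)      # smallest value maps to 0, largest to 1
```

Since only normalized values are published, predictions cannot be mapped back to absolute bandwidth unless the original minimum and maximum are known.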

Dataset list (updating)

ECW-08: A matrix of shape 797 $\times$ 720 $\times$ 16 that represents the upload bandwidth workload variation of 797 edge servers over 720 hours (24 $\times$ 30) in 2022/08. The last dimension, 16, is the feature dimension of the data, comprising 4 dynamic features and 12 cross-domain static content features.
ECW-09: A matrix of shape 1022 $\times$ 720 $\times$ 16 that represents the upload bandwidth workload variation of 1022 edge servers over 720 hours (24 $\times$ 30) in 2022/09.
ECW-min: Upload bandwidth workload logs at 5-minute granularity, which capture finer-grained load changes than ECW.
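A minimal loading sketch, assuming the files are plain numeric arrays readable with `np.load` and that the ECW-08 file is named `ECW_08.npy` as in the repository listing. The `load_ecw` helper is hypothetical, and the example writes a zero-filled stand-in file so that it runs without the real data.

```python
import os
import tempfile

import numpy as np

def load_ecw(path, expected_features=16):
    """Load an ECW .npy file and check it is (servers, hours, features)."""
    data = np.load(path)
    if data.ndim != 3 or data.shape[-1] != expected_features:
        raise ValueError(f"unexpected shape {data.shape}")
    return data

# Stand-in file with the ECW-08 shape; replace demo_path with the real file path.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "ECW_08.npy")
    np.save(demo_path, np.zeros((797, 720, 16), dtype=np.float32))
    ecw08 = load_ecw(demo_path)

n_servers, n_hours, n_features = ecw08.shape
```

If the real files turn out to store Python objects rather than numeric arrays, `np.load` would additionally need `allow_pickle=True`.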

The following datasets are recommended for testing.

ECW-App-Switch-sample: The workload series in which application switching occurred during 2022/08/25-2022/08/30. The exact time of each application switch can be easily detected by plotting the curve, as it is accompanied by a sudden change in the load pattern. Use this dataset to test a model's capability to handle abrupt changes in time-series patterns (extreme concept drift).
ECW-New-App: The workload series of applications that never appeared in ECW-08. Use this dataset to test a model's generalizability when faced with unknown time-series patterns.
ECW-New-Infras.: The workload series running on infrastructure that has never been present in ECW.
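Besides plotting the curve, a switch point can also be located numerically. The sketch below is a simple heuristic, not the authors' method: it picks the hour at which the mean load over the following window differs most from the mean over the preceding window. The `locate_switch` name, the 24-hour window, and the synthetic series are all assumptions for illustration.

```python
import numpy as np

def locate_switch(series, window=24):
    """Return the index where the mean of the next `window` points differs most
    from the mean of the previous `window` points (hypothetical heuristic)."""
    series = np.asarray(series, dtype=float)
    if len(series) < 2 * window:
        raise ValueError("series is shorter than two windows")
    # Prefix sums make every window mean a constant-time lookup.
    csum = np.concatenate(([0.0], np.cumsum(series)))
    idx = np.arange(window, len(series) - window + 1)
    before = (csum[idx] - csum[idx - window]) / window
    after = (csum[idx + window] - csum[idx]) / window
    return int(idx[np.argmax(np.abs(after - before))])

# Synthetic load: a low daily cycle that jumps to a higher level at hour 72.
t = np.arange(144)
load = 0.2 + 0.05 * np.sin(2 * np.pi * t / 24)
load[72:] += 0.5
switch_hour = locate_switch(load)
```

A window of one full day averages out the daily cycle, so the score reacts to level shifts rather than to ordinary intra-day variation. Switches that change the shape of the pattern without changing its mean would need a different statistic.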

If you use this dataset, please cite the work DynEformer @ KDD2023 [paper], [code] (the BibTeX entry is given below):

Shaoyuan Huang, Zheng Wang, Heng Zhang, Xiaofei Wang, Cheng Zhang, and Wenyu Wang. 2023. One for All: Unified Workload Prediction for Dynamic Multi-tenant Edge Cloud Platforms. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’23), August 6–10, 2023, Long Beach, CA, USA. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3580305.3599453

@inproceedings{10.1145/3580305.3599453,
author = {Huang, Shaoyuan and Wang, Zheng and Zhang, Heng and Wang, Xiaofei and Zhang, Cheng and Wang, Wenyu},
title = {One for All: Unified Workload Prediction for Dynamic Multi-Tenant Edge Cloud Platforms},
year = {2023},
isbn = {9798400701030},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3580305.3599453},
doi = {10.1145/3580305.3599453},
booktitle = {Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
pages = {788–797},
numpages = {10},
location = {Long Beach, CA, USA},
series = {KDD '23}
}

Why is Cross-domain Static Content Involved in ECW?

The cross-domain data points encompass 12 dimensions of static server hardware attributes, including maximum bandwidth, number of CPUs, location, and other infrastructure characteristics. This data is collected when edge servers join the Multi-tenant Edge Cloud Platforms (MT-ECP) or during subsequent updates. Incorporating cross-domain data aims to further enhance the model's robustness, as noted in previous research (Lim et al. [a]). Specifically, for workload prediction in MT-ECP, server features such as hardware attributes and geographical location significantly influence workload variations.

[a] Bryan Lim, et al. "Temporal Fusion Transformers for interpretable multi-horizon time series forecasting". International Journal of Forecasting, 2021
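Models that treat static covariates separately from the time-varying inputs need the two feature groups split apart. The sketch below assumes the 4 dynamic features occupy the first four columns and the 12 static features the remaining ones, and that static values repeat unchanged along the time axis. Since static attributes may be updated, the consistency check can legitimately fail on real data. `split_features` and `N_DYNAMIC` are hypothetical names.

```python
import numpy as np

N_DYNAMIC = 4  # assumed: the dynamic features are the first four columns

def split_features(data):
    """Split (N, T, 16) into dynamic (N, T, 4) and static (N, 12) parts."""
    dynamic = data[:, :, :N_DYNAMIC]
    static_full = data[:, :, N_DYNAMIC:]
    # Static attributes are expected to stay constant over time; verify first.
    if not np.allclose(static_full, static_full[:, :1, :]):
        raise ValueError("static columns vary over time; check the column order")
    return dynamic, static_full[:, 0, :]

# Synthetic stand-in: 5 servers, 48 hours, random dynamics, repeated statics.
rng = np.random.default_rng(0)
demo = np.concatenate(
    [rng.random((5, 48, 4)), np.repeat(rng.random((5, 1, 12)), 48, axis=1)],
    axis=2,
)
dynamic, static = split_features(demo)
```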

ECW and its derivative data:

ECW, ECW-App-Switch, ECW-New-App, and ECW-New-Infras cover all three types of dynamic MT-ECP behavior, as shown in the following figure. All datasets are shaped as N $\times$ T $\times$ F, where N is the number of edge servers, T is the series length (hourly granularity), and F is the number of feature dimensions (16 for all datasets).

Figure 1. Workloads under dynamic MT-ECP behaviors.
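Given the N $\times$ T $\times$ F layout, forecasting samples are commonly built with a sliding window over the time axis. The following is a minimal sketch under assumed settings: the 48-hour input length, the 24-hour horizon, and the `make_windows` helper are illustrative choices rather than values taken from the paper, and the target is assumed to be column 0.

```python
import numpy as np

def make_windows(data, input_len=48, horizon=24, target_col=0):
    """Slice (N, T, F) into inputs (S, input_len, F) and targets (S, horizon)."""
    n, t, f = data.shape
    if t < input_len + horizon:
        raise ValueError("series too short for the requested window sizes")
    starts = range(t - input_len - horizon + 1)
    # Stack windows per server, then merge the server and window axes.
    x = np.stack([data[:, s:s + input_len, :] for s in starts], axis=1)
    y = np.stack(
        [data[:, s + input_len:s + input_len + horizon, target_col] for s in starts],
        axis=1,
    )
    return x.reshape(-1, input_len, f), y.reshape(-1, horizon)

# Synthetic stand-in: 3 servers, 120 hours, 16 features.
demo = np.random.default_rng(0).random((3, 120, 16))
x, y = make_windows(demo)
```

Each server contributes T - input_len - horizon + 1 samples, so this toy array yields 3 $\times$ 49 = 147 windows.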

Specifically, each edge server's workload incorporates daily short-term periodic patterns and weekly long-term periodic patterns (where the load on the same day of the following week resembles the current day's load), in addition to certain trend patterns and irregular fluctuations. Due to the heterogeneity in server attributes and application patterns, these patterns vary across different load series. Compared to existing time-series prediction datasets (such as ETT and Azure), ECW exhibits more complex patterns and higher-frequency dynamics (application switch disruptions and new entities), significantly elevating the generalization requirements for models.
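One quick way to check for the daily and weekly periodicity described above is the autocorrelation at lags of 24 and 168 hours. The sketch below runs on a synthetic series, not on ECW data; the `autocorr` helper and the signal parameters are assumptions chosen only to make the effect visible.

```python
import numpy as np

def autocorr(series, lag):
    """Pearson correlation between the series and itself shifted by `lag`."""
    series = np.asarray(series, dtype=float)
    return float(np.corrcoef(series[:-lag], series[lag:])[0, 1])

# Synthetic hourly load: a daily cycle, a weaker weekly cycle, and noise.
rng = np.random.default_rng(0)
t = np.arange(720)
load = (
    np.sin(2 * np.pi * t / 24)
    + 0.5 * np.sin(2 * np.pi * t / 168)
    + 0.1 * rng.standard_normal(720)
)

daily = autocorr(load, 24)      # high when a daily pattern is present
weekly = autocorr(load, 168)    # high when a weekly pattern is present
off_cycle = autocorr(load, 12)  # half a day out of phase, so much lower
```

On a real series, the same call would be applied to the target feature of a single server, for example `data[i, :, 0]` if the target is stored in column 0.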

We use the .npy file format to save the data. An edge server workload demo of the ECW data is illustrated in Figure 2. The first line (16 columns) is the horizontal header and includes 'bw_upload', 'hour', 'day', 'week', 'province', 'bandwidth_type', 'nat_type', 'isp', 'billing_rule', 'upbandwidth', 'upbandwidth_base', 'cpu_num', 'memory_size', 'disk_size', 'test_sat', and 'loss_sat'. The detailed meaning of each column name is shown in Table 1.

Figure 2. Edge server workload demo.

Field	Description
bw_upload	upload bandwidth workload (target)
hour	record time (dynamic feature)
day	record time (dynamic feature)
week	record time (dynamic feature)
province	edge server location
bandwidth_type	quality assessment of the network
nat_type	NAT type
isp	ISP
billing_rule	type of billing
upbandwidth	total server bandwidth
upbandwidth_base	available server bandwidth
cpu_num	number of CPUs
memory_size	memory size
disk_size	disk size
test_sat	network pressure test quality
loss_sat	packet loss quality

Table 1. Description of each column.
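Because .npy arrays carry no column names, it can help to keep the field names next to the data in code. The sketch below assumes the last axis of each array follows the field order listed above; `COLUMNS`, `COL_INDEX`, and `select` are hypothetical names, and the demo array is synthetic.

```python
import numpy as np

# Field order as listed in Table 1 (assumed to match the last array axis).
COLUMNS = [
    "bw_upload", "hour", "day", "week",
    "province", "bandwidth_type", "nat_type", "isp", "billing_rule",
    "upbandwidth", "upbandwidth_base", "cpu_num", "memory_size",
    "disk_size", "test_sat", "loss_sat",
]
COL_INDEX = {name: i for i, name in enumerate(COLUMNS)}

def select(data, *names):
    """Return the requested named features along the last axis."""
    return data[..., [COL_INDEX[n] for n in names]]

# Synthetic stand-in: 2 servers, 24 hours, 16 features.
demo = np.zeros((2, 24, 16))
demo[..., COL_INDEX["cpu_num"]] = 8
target = select(demo, "bw_upload")             # shape (2, 24, 1)
hardware = select(demo, "cpu_num", "memory_size", "disk_size")
```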
