---

- title: 'Simple command line to prepare dataset manifest file'
+ title: 'Dataset Manifest'
linkTitle: 'Dataset manifest'
weight: 30
- description: This section on [GitHub](https://github.com/cvat-ai/cvat/tree/develop/utils/dataset_manifest)
+ description:

---

<!-- lint disable heading-style-->

- ### Steps before use
+ ## Overview

- When used separately from Computer Vision Annotation Tool(CVAT), the required dependencies must be installed
+ When we create a new task in CVAT, we need to specify where to get the input data from.
+ CVAT allows using different data sources, including local file uploads, a mounted
+ file share on the server, cloud storages and remote URLs. In some cases CVAT
+ needs to have extra information about the input data. This information can be provided
+ in dataset manifest files. They are mainly used when working with cloud storages to
+ reduce the amount of network traffic used and speed up the task creation process.
+ However, they can also be used in other cases, which will be explained below.

- #### Ubuntu:20.04
+ A dataset manifest file is a text file in the JSONL format. These files can be created
+ automatically with [the special command-line tool](https://github.com/opencv/cvat/tree/develop/utils/dataset_manifest),
+ or manually, following [the manifest file format specification](#file-format).
+
+ ## How and when to use manifest files
+
+ Manifest files can be used in the following cases:
+ - A video file or a set of images is used as the data source and
+   the caching mode is enabled. [Read more](/docs/manual/advanced/data_on_fly/)
+ - The data is located in a cloud storage. [Read more](/docs/manual/basics/cloud-storages/)
+
+ ## How to generate manifest files
+
+ CVAT provides a dedicated Python tool to generate manifest files.
+ The source code can be found [here](https://github.com/opencv/cvat/tree/develop/utils/dataset_manifest).
+
+ Using the tool is the recommended way to create manifest files for your data. The data must be
+ available locally to the tool to generate a manifest.
+
+ ### Usage
+
+ ```bash
+ usage: create.py [-h] [--force] [--output-dir .] source
+
+ positional arguments:
+   source                Source paths
+
+ optional arguments:
+   -h, --help            show this help message and exit
+   --force               Use this flag to prepare the manifest file for video data
+                         if by default the video does not meet the requirements
+                         and a manifest file is not prepared
+   --output-dir OUTPUT_DIR
+                         Directory where the manifest file will be saved
+ ```
+
+ ### Use the script from a Docker image
+
+ This is the recommended way to use the tool.
+
+ The script can be used from the `cvat/server` image:
+
+ ```bash
+ docker run -it --rm -u "$(id -u)":"$(id -g)" \
+   -v "${PWD}":"/local" \
+   --entrypoint python3 \
+   cvat/server \
+   utils/dataset_manifest/create.py --output-dir /local /local/<path/to/sources>
+ ```
+
+ Make sure to adapt the command to your file locations.
+
+ ### Use the script directly
+
+ #### Ubuntu 20.04

Install dependencies:

@@ -38,72 +98,102 @@ Create an environment and install the necessary python modules:
python3 -m venv .env
. .env/bin/activate
pip install -U pip
- pip install -r requirements.txt
- ```
-
- ### Using
-
- ```bash
- usage: python create.py [-h] [--force] [--output-dir .] source
-
- positional arguments:
-   source                Source paths
-
- optional arguments:
-   -h, --help            show this help message and exit
-   --force               Use this flag to prepare the manifest file for video data if by default the video does not meet the requirements
-                         and a manifest file is not prepared
-   --output-dir OUTPUT_DIR
-                         Directory where the manifest file will be saved
+ pip install -r utils/dataset_manifest/requirements.txt
```

- ### Alternative way to use with cvat/server
-
- ```bash
- docker run -it -u root --entrypoint bash -v /path/to/host/data/:/path/inside/container/:rw cvat/server -c "pip3 install -r utils/dataset_manifest/requirements.txt && python3 utils/dataset_manifest/create.py --output-dir /path/to/manifest/directory/ /path/to/data/"
- ```
+ > Please note that if used with video this way, the results may be different from what
+ the server would decode. It is related to the ffmpeg library version. For this reason,
+ using the Docker-based version of the tool is recommended.

- ### Examples of using
+ ### Examples

Create a dataset manifest in the current directory with video which contains enough keyframes:

```bash
- python create.py ~/Documents/video.mp4
+ python utils/dataset_manifest/create.py ~/Documents/video.mp4
```

Create a dataset manifest with video which does not contain enough keyframes:

```bash
- python create.py --force --output-dir ~/Documents ~/Documents/video.mp4
+ python utils/dataset_manifest/create.py --force --output-dir ~/Documents ~/Documents/video.mp4
```

Create a dataset manifest with images:

```bash
- python create.py --output-dir ~/Documents ~/Documents/images/
+ python utils/dataset_manifest/create.py --output-dir ~/Documents ~/Documents/images/
```

Create a dataset manifest with pattern (may be used `*`, `?`, `[]`):

```bash
- python create.py --output-dir ~/Documents "/home/${USER}/Documents/**/image*.jpeg"
+ python utils/dataset_manifest/create.py --output-dir ~/Documents "/home/${USER}/Documents/**/image*.jpeg"
```

- Create a dataset manifest with `cvat/server`:
+ Create a dataset manifest using the Docker image:

```bash
- docker run -it --entrypoint python3 -v ~/Documents/data/:${HOME}/manifest/:rw cvat/server
- utils/dataset_manifest/create.py --output-dir ~/manifest/ ~/manifest/images/
+ docker run -it --rm -u "$(id -u)":"$(id -g)" \
+   -v ~/Documents/data/:${HOME}/manifest/:rw \
+   --entrypoint python3 \
+   cvat/server \
+   utils/dataset_manifest/create.py --output-dir ~/manifest/ ~/manifest/images/
```

- ### Examples of generated `manifest.jsonl` files
+ ### File format
+
+ The dataset manifest files are text files in JSONL format. These files have 2 sub-formats:
+ _for video_ and _for images and 3d data_.
+
+ > Each top-level entry enclosed in curly braces must occupy exactly 1 line; empty lines are not allowed.
+ > The formatting in the descriptions below is only for demonstration.
+
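As a rough illustration of the one-entry-per-line rule, the following Python sketch checks that manifest text has no empty lines and that every line parses as a single JSON object. The helper name `check_manifest_lines` is hypothetical and is not part of the CVAT tool.

```python
import json


def check_manifest_lines(text):
    """Return a list of problems found in JSONL manifest text (hypothetical helper)."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Empty lines are not allowed anywhere in the file.
        if not line.strip():
            problems.append(f"line {lineno}: empty line")
            continue
        # Each line must be one complete JSON value ...
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: not valid JSON ({exc.msg})")
            continue
        # ... and that value must be an object (curly braces).
        if not isinstance(entry, dict):
            problems.append(f"line {lineno}: entry is not a JSON object")
    return problems


sample = '{"version":"1.0"}\n{"type":"images"}\n'
print(check_manifest_lines(sample))
```

An empty list means the text passed these structural checks only; the sketch does not validate field names or values.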
+ #### Dataset manifest for video

- A manifest file contains some intuitive information and some specific like:
+ The file describes a single video.

`pts` - time at which the frame should be shown to the user
- `checksum` - `md5` hash sum for the specific image/frame
+ `checksum` - `md5` hash sum for the specific decoded image/frame
+
+ ```json
+ { "version": <string, version id> }
+ { "type": "video" }
+ { "properties": {
+   "name": <string, filename>,
+   "resolution": [<int, width>, <int, height>],
+   "length": <int, frame count>
+ }}
+ {
+   "number": <int, frame number>,
+   "pts": <int, frame pts>,
+   "checksum": <string, md5 frame hash>
+ } (repeatable)
+ ```
+
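A minimal sketch of reading a video manifest laid out as described above, assuming the first three lines are the version, type, and properties entries and every following line is a frame entry. `read_video_manifest` is a hypothetical helper, not the CVAT implementation, and the sample values are invented for illustration.

```python
import json


def read_video_manifest(text):
    """Split video manifest text into (version, properties, frames)."""
    entries = [json.loads(line) for line in text.splitlines()]
    version, kind, props = entries[0], entries[1], entries[2]
    if kind.get("type") != "video":
        raise ValueError("not a video manifest")
    frames = entries[3:]
    for frame in frames:
        # Every frame entry is expected to carry these three keys.
        missing = {"number", "pts", "checksum"} - frame.keys()
        if missing:
            raise ValueError(f"frame entry is missing {sorted(missing)}")
    return version["version"], props["properties"], frames


# Sample values are invented for illustration only.
sample = "\n".join([
    '{"version":"1.0"}',
    '{"type":"video"}',
    '{"properties":{"name":"video.mp4","resolution":[1920,1080],"length":1}}',
    '{"number":0,"pts":0,"checksum":"0e72faf67e5218c70b506445ac91cdd7"}',
])
version, properties, frames = read_video_manifest(sample)
print(version, properties["resolution"], len(frames))
```

The sketch deliberately does not compare `length` with the number of frame entries, since the format description does not state that the two must match.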
+ #### Dataset manifest for images and other data types
+
+ The file describes an ordered set of images and 3d point clouds.
+
+ `name` - file basename and leading directories from the dataset root
+ `checksum` - `md5` hash sum for the specific decoded image/frame
+
+ ```json
+ { "version": <string, version id> }
+ { "type": "images" }
+ {
+   "name": <string, image filename>,
+   "extension": <string, . + file extension>,
+   "width": <int, width>,
+   "height": <int, height>,
+   "meta": <dict, optional>,
+   "checksum": <string, md5 hash, optional>
+ } (repeatable)
+ ```
+
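A hedged sketch of writing an images manifest by hand in the layout above, assuming the image sizes are already known and leaving out the optional `meta` and `checksum` fields. `images_manifest_lines` and the sample file name are hypothetical. The sketch also assumes that `name` holds the path without its extension, since the extension has its own field, and that paths use forward slashes.

```python
import json
from pathlib import PurePosixPath


def images_manifest_lines(images, version="1.0"):
    """Yield JSONL lines for (relative_path, width, height) tuples (hypothetical helper)."""
    # Compact separators keep each entry on one line without extra spaces.
    yield json.dumps({"version": version}, separators=(",", ":"))
    yield json.dumps({"type": "images"}, separators=(",", ":"))
    for rel_path, width, height in images:
        path = PurePosixPath(rel_path)
        entry = {
            # Assumption: path relative to the dataset root, extension removed.
            "name": str(path.with_suffix("")),
            # Dot plus file extension, as in the format description.
            "extension": path.suffix,
            "width": width,
            "height": height,
        }
        yield json.dumps(entry, separators=(",", ":"))


lines = list(images_manifest_lines([("images/image_1.jpeg", 1280, 720)]))
print("\n".join(lines))
```

For real datasets the command-line tool remains the recommended route; a hand-written file like this is mainly useful for understanding the format.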
+ ### Example files

- #### For a video
+ #### Manifest for a video

```json
{"version":"1.0"}
@@ -117,7 +207,7 @@ A manifest file contains some intuitive information and some specific like:
{"number":675,"pts":2430000,"checksum":"0e72faf67e5218c70b506445ac91cdd7"}
```

- #### For a dataset with images
+ #### Manifest for a dataset with images

```json
{"version":"1.0"}