- The video size should end up smaller after the encoding
- The encoding is reversible
- The compressed video should contain enough information to be recognisable
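Consecutive frames have temporal redundancy: pixels at the same position in consecutive frames are likely to hold similar values. This lets us store only the difference between frames and compress that, much like PNG encodes each pixel as a difference from its neighbours before compressing.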
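
To make the redundancy between consecutive frames concrete, here is a minimal sketch that stores a frame as a byte-wise difference from the previous one and compresses it with DEFLATE (the compression PNG also uses). The function names `frame_delta` and `apply_delta`, the 64x64 frame size, and the synthetic noise frames are assumptions made for illustration, not part of any real codec.

```python
import random
import zlib


def frame_delta(prev: bytes, curr: bytes) -> bytes:
    """Byte-wise difference between two equally sized frames, modulo 256."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same size")
    return bytes((c - p) % 256 for p, c in zip(prev, curr))


def apply_delta(prev: bytes, delta: bytes) -> bytes:
    """Rebuild the current frame from the previous frame and its delta."""
    return bytes((p + d) % 256 for p, d in zip(prev, delta))


# Hypothetical 64x64 single-channel frames. frame_a is pseudo-random noise,
# which is hard to compress on its own; frame_b differs only in row 10.
width, height = 64, 64
rng = random.Random(0)
frame_a = bytes(rng.randrange(256) for _ in range(width * height))
changed = bytearray(frame_a)
for x in range(width):
    changed[10 * width + x] = 255
frame_b = bytes(changed)

delta = frame_delta(frame_a, frame_b)
raw_size = len(zlib.compress(frame_b))    # compressing the frame directly
delta_size = len(zlib.compress(delta))    # compressing only what changed

print("compressed frame:", raw_size, "bytes")
print("compressed delta:", delta_size, "bytes")
```

Because the delta is almost entirely zeros, it compresses far better than the frame itself, and `apply_delta(frame_a, delta)` gives back `frame_b` exactly, so this step loses no information.
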
Each frame is made up of pixels, each of which has three attributes: r, g, and b. At one byte per attribute, this gives a total size of (width * height * 3) bytes per frame.
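
As a quick worked example of the size formula, the sketch below computes the raw size of one frame and of a short clip. The 1920x1080 resolution, the 30 frames per second, and the one-byte-per-channel default are assumed values chosen for illustration.

```python
def rgb_frame_size(width: int, height: int, bytes_per_channel: int = 1) -> int:
    """Raw size in bytes of one RGB frame: width * height * 3 channels."""
    return width * height * 3 * bytes_per_channel


def raw_video_size(width: int, height: int, fps: int, seconds: int) -> int:
    """Raw size in bytes of an uncompressed RGB clip."""
    return rgb_frame_size(width, height) * fps * seconds


frame_bytes = rgb_frame_size(1920, 1080)        # 6,220,800 bytes per frame
clip_bytes = raw_video_size(1920, 1080, 30, 1)  # 186,624,000 bytes per second

print(frame_bytes, clip_bytes)
```

At roughly 187 MB per second of uncompressed 1080p video under these assumptions, it is clear why the encoding has to shrink the data.
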
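We convert each frame to YUV420 format. For every block of 4 adjacent pixels (2x2), we average the chrominance, which reduces the chroma information stored per pixel to 0.25 of the original. Luminance is still stored for every pixel, so a frame takes (width * height * 1.5) bytes instead of (width * height * 3).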
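
The conversion can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes packed 8-bit RGB input, even frame dimensions, a planar output layout (Y plane, then U, then V), and the full-range BT.601 coefficients used by JPEG/JFIF. The function name `rgb_to_yuv420` is hypothetical.

```python
def rgb_to_yuv420(rgb: bytes, width: int, height: int) -> bytes:
    """Convert packed RGB (3 bytes per pixel) to planar YUV420.

    Assumes even width and height and full-range BT.601 coefficients.
    """
    if width % 2 or height % 2:
        raise ValueError("width and height must be even")
    if len(rgb) != width * height * 3:
        raise ValueError("unexpected RGB buffer size")

    def clamp(value: float) -> int:
        return max(0, min(255, int(round(value))))

    n = width * height
    y_plane = bytearray(n)
    u_full = [0.0] * n  # full-resolution chroma, averaged below
    v_full = [0.0] * n
    for i in range(n):
        r, g, b = rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]
        y_plane[i] = clamp(0.299 * r + 0.587 * g + 0.114 * b)
        u_full[i] = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
        v_full[i] = 0.5 * r - 0.418688 * g - 0.081312 * b + 128

    # Average chroma over each 2x2 block: one U and one V sample per 4 pixels.
    u_plane = bytearray()
    v_plane = bytearray()
    for by in range(0, height, 2):
        for bx in range(0, width, 2):
            block = (
                by * width + bx,
                by * width + bx + 1,
                (by + 1) * width + bx,
                (by + 1) * width + bx + 1,
            )
            u_plane.append(clamp(sum(u_full[i] for i in block) / 4))
            v_plane.append(clamp(sum(v_full[i] for i in block) / 4))

    return bytes(y_plane + u_plane + v_plane)


# A 4x4 mid-grey frame: 48 bytes of RGB become 24 bytes of YUV420.
grey = bytes([128, 128, 128]) * (4 * 4)
yuv = rgb_to_yuv420(grey, 4, 4)
print(len(grey), len(yuv))
```

The output holds `width * height` luma bytes plus two chroma planes of `width * height / 4` bytes each. Unlike a frame delta, the averaging step discards information, so the original chroma cannot be recovered exactly.
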
- https://github.com/kevmo314/codec-from-scratch
- ffmpeg blogs