"The maximum pts value in seconds is unknown." while streaming #691

Closed
FredrikNoren opened this issue May 20, 2025 · 4 comments
@FredrikNoren

🐛 Describe the bug

This might be me using torchcodec wrong, but I'm trying to stream video data using a websocket. This is my code:

from torchcodec.decoders import VideoDecoder


class WebsocketBuffer:
    """File-like wrapper that buffers binary websocket messages for the decoder."""

    def __init__(self, websocket):
        self.buffer = b""
        self.position = 0
        self.websocket = websocket

    def read_websocket_until(self, size: int):
        # Block on the websocket until at least `size` bytes are buffered
        # or the connection closes.
        for msg in self.websocket:
            msg_type = type(msg)
            msg_size = len(msg) if isinstance(msg, (bytes, bytearray)) else 0
            print(f"Got message of type {msg_type} with size {msg_size} bytes")
            if isinstance(msg, (bytes, bytearray)):
                self.buffer += msg
                if len(self.buffer) >= size:
                    break
            else:
                print("Got non‑binary message:", msg)

    def read(self, size: int) -> bytes:
        print(f"Reading {size} bytes from position {self.position}")
        # Only fetch more data when the request runs past what is buffered.
        if self.position + size > len(self.buffer):
            self.read_websocket_until(self.position + size)
        print(f"Buffer size: {len(self.buffer)}")
        data = self.buffer[self.position:self.position + size]
        # Advance by the number of bytes actually returned.
        self.position += len(data)
        return data

    def seek(self, offset: int, whence: int) -> int:
        print(f"Seeking to {offset} from {whence} (current position: {self.position})")
        if whence == 0:  # SEEK_SET
            self.position = offset
        elif whence == 1:  # SEEK_CUR
            self.position += offset
        elif whence == 2:  # SEEK_END: relative to the data received so far
            self.position = len(self.buffer) + offset

        # Ensure position is within bounds
        self.position = max(0, min(self.position, len(self.buffer)))
        return self.position


def handle(websocket):
    buffer = WebsocketBuffer(websocket)
    decoder = VideoDecoder(buffer, seek_mode="approximate") # type: ignore
    for i in range(1000):
        frame = decoder[i] # type: ignore
        print(f"Decoded frame {i} of size {frame.shape}")

Which gives me this output:

Reading 65536 bytes from position 0
Got message of type <class 'str'> with size 0 bytes
Got non‑binary message: Hello from the client!
Got message of type <class 'bytes'> with size 240285 bytes
Buffer size: 240285
Seeking to -1 from 2 (current position: 65536)
Seeking to 65536 from 0 (current position: 240284)
Reading 65536 bytes from position 65536
Buffer size: 240285
Reading 65536 bytes from position 131072
Buffer size: 240285
Seeking to -1 from 2 (current position: 196608)
Seeking to 196608 from 0 (current position: 240284)
Seeking to -1 from 2 (current position: 196608)
Seeking to 196608 from 0 (current position: 240284)
connection handler failed
Traceback (most recent call last):
  File "/Users/fredrik/medal/clip2actions/.venv/lib/python3.12/site-packages/websockets/sync/server.py", line 593, in conn_handler
    handler(connection)
  File "/Users/fredrik/medal/clip2actions/environments/geforce_now/server.py", line 113, in <lambda>
    lambda msg: handle(msg, on_frame),
                ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/fredrik/medal/clip2actions/environments/geforce_now/server.py", line 79, in handle
    decoder = VideoDecoder(buffer, seek_mode="approximate") # type: ignore
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/fredrik/medal/clip2actions/.venv/lib/python3.12/site-packages/torchcodec/decoders/_video_decoder.py", line 118, in __init__
    ) = _get_and_validate_stream_metadata(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/fredrik/medal/clip2actions/.venv/lib/python3.12/site-packages/torchcodec/decoders/_video_decoder.py", line 381, in _get_and_validate_stream_metadata
    raise ValueError(
ValueError: The maximum pts value in seconds is unknown. 
This should never happen. Please report an issue following the steps in
https://github.com/pytorch/torchcodec/issues/new?assignees=&labels=&projects=&template=bug-report.yml.

My first question is: have I implemented seek correctly? Seeking from the end in particular feels odd for a stream; right now it just seeks relative to whatever data has been received so far.
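
To make the concern about seeking from the end concrete, here is a minimal sketch (standard library only, no torchcodec or websocket involved). It shows that SEEK_END on a buffer that is still growing resolves to a different absolute offset each time more data arrives. `GrowingBuffer` and `feed` are hypothetical names used only for illustration, and the chunk sizes are made up.

```python
import os


class GrowingBuffer:
    """Hypothetical stand-in for a buffer fed by a live stream."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.position = 0

    def feed(self, chunk: bytes) -> None:
        # Simulates a new binary message arriving from the network.
        self.data += chunk

    def seek(self, offset: int, whence: int = os.SEEK_SET) -> int:
        if whence == os.SEEK_SET:
            target = offset
        elif whence == os.SEEK_CUR:
            target = self.position + offset
        elif whence == os.SEEK_END:
            # "End" only means "end of what has arrived so far".
            target = len(self.data) + offset
        else:
            raise ValueError(f"unsupported whence: {whence}")
        # Clamp to the currently buffered range.
        self.position = max(0, min(target, len(self.data)))
        return self.position


buf = GrowingBuffer()
buf.feed(b"\x00" * 100)
first_end = buf.seek(0, os.SEEK_END)
buf.feed(b"\x00" * 50)
second_end = buf.seek(0, os.SEEK_END)
print(first_end, second_end)
```

A caller that treats the seek-to-end result as the total size would see two different sizes here, so the operation is only well defined once the whole payload is available.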

Versions

I'm using torchcodec 0.4.0 and torch 2.7.0

$ uv run collect_env.py 
Collecting environment information...
PyTorch version: 2.7.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.6.1 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.26.6)
CMake version: version 3.31.6
Libc version: N/A

Python version: 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 10:07:17) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-14.6.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Max

Versions of relevant libraries:
[pip3] Could not collect
[conda] numpy                     2.1.1                    pypi_0    pypi
@scotts
Contributor

scotts commented May 20, 2025

@FredrikNoren, I can't comment on the correctness of your implementations of read and seek, since I'm not familiar with the expected behavior of websockets. I can, however, comment on the expectations of the VideoDecoder implementation. It assumes that the video it is decoding from has metadata which describes the length of the video. This metadata comes in two forms: the duration of the video in seconds (reported as end_stream_seconds) and the number of frames (reported as num_frames).

Currently, the implementation of VideoDecoder requires both values to be present in the metadata. The error is saying that it cannot find a value for end_stream_seconds, which, in approximate mode, means that there was no duration in your video's metadata. I suspect that num_frames is missing for you as well.
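
As a rough illustration of that requirement, the sketch below checks a metadata-like object for the two length fields named above. This is not torchcodec's internal validation: `missing_length_fields` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins with made-up values, not real torchcodec metadata.

```python
from types import SimpleNamespace


def missing_length_fields(metadata) -> list[str]:
    """Return the names of the length fields that are None or absent."""
    required = ("end_stream_seconds", "num_frames")
    return [name for name in required if getattr(metadata, name, None) is None]


# Stand-ins for stream metadata; these are NOT torchcodec objects.
finite_file = SimpleNamespace(end_stream_seconds=12.5, num_frames=375)
live_stream = SimpleNamespace(end_stream_seconds=None, num_frames=None)

print(missing_length_fields(finite_file))
print(missing_length_fields(live_stream))
```

An empty list means both fields are present; for the live-stream stand-in, both names come back as missing.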

Is the video you're decoding a live video stream, or is it a video file accessible from a websocket? I'm afraid that the VideoDecoder implementation currently does not work on a live video stream; we need to know the duration of the video and the number of frames.
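
For the second case (a finite video file delivered over a websocket rather than a live stream), one possible approach is to buffer the complete payload before decoding, so that the size and any seek from the end are stable. The sketch below assumes the connection can be iterated until it closes, as in the code above; `collect_binary_payload` is a hypothetical helper and the message list is fake data standing in for a real connection.

```python
import io
from typing import Iterable, Union

Message = Union[str, bytes, bytearray]


def collect_binary_payload(messages: Iterable[Message]) -> io.BytesIO:
    """Concatenate all binary messages into a seekable in-memory file."""
    payload = bytearray()
    for msg in messages:
        if isinstance(msg, (bytes, bytearray)):
            payload += msg
        # Text messages are ignored in this sketch.
    return io.BytesIO(bytes(payload))


# Fake message sequence standing in for a websocket connection.
fake_messages = ["Hello from the client!", b"abc", b"defgh"]
file_like = collect_binary_payload(fake_messages)

size = file_like.seek(0, io.SEEK_END)  # total size no longer changes
file_like.seek(0)
print(size, file_like.read(4))
```

The resulting `io.BytesIO` object supports `read` and `seek` over a fixed-size payload, which is the situation the decoder expects. It does not help with a live stream, where the payload never completes.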

@FredrikNoren
Author

@scotts It's a live video stream, with the mime type video/webm; codecs=vp9,opus. I can write it to a .webm file that I can open later, but when I do that I'm just appending to the file, so I assume the header doesn't say anything about the length of the clip in this case.

So "Streaming video" in the docs doesn't include live video; it's only a method for streaming fixed-size videos?

@scotts
Contributor

scotts commented May 22, 2025

@FredrikNoren, that is correct. I created issue #695 to track live stream decoding as a feature. Can you comment there more about your use-case? We'll also use that issue to track how popular the feature request is. Thank you!

@FredrikNoren
Author

OK, will do. I'll close this then.
