Audio encoding: support custom `num_channels` #693

NicolasHug · 2025-05-22T10:32:12Z

This PR adds a num_channels parameter to the audio encoder, which allows users to specify the number of channels of the encoded data (e.g. encode a stereo tensor into a mono file).

NicolasHug · 2025-05-22T10:36:28Z

src/torchcodec/encoders/_audio_encoder.py

    ) -> Tensor:
        return _core.encode_audio_to_tensor(
            wf=self._samples,
            sample_rate=self._sample_rate,
            format=format,
            bit_rate=bit_rate,
+            num_channels=num_channels,


There are no tests for the public API right now. I will soon migrate most of the existing encoder ops tests into testing the public Python APIs.

scotts · 2025-05-22T13:28:04Z

src/torchcodec/_core/Encoder.h

@@ -44,6 +50,9 @@ class AudioEncoder {
  UniqueAVCodecContext avCodecContext_;
  int streamIndex_;
  UniqueSwrContext swrContext_;
+  // TODO-ENCODING: desiredNumChannels should just be part of an options struct,
+  // see other TODO above.
+  int desiredNumChannels_ = -1;

  const torch::Tensor wf_;


I think a comment here explaining that wf stands for "waveform", and that it's the original audio data passed to us by the user, would be helpful. I know this is not directly related to the changes in this PR, but I keep having to remind myself of this fact.

I have a TODO somewhere to rename wf to samples, like in our Python API. That should make it more obvious.

scotts · 2025-05-22T13:29:44Z

src/torchcodec/_core/Encoder.cpp

@@ -228,7 +233,7 @@ void AudioEncoder::encode() {
  avFrame->format = AV_SAMPLE_FMT_FLTP;
  avFrame->sample_rate = avCodecContext_->sample_rate;
  avFrame->pts = 0;
-  setChannelLayout(avFrame, avCodecContext_);
+  setDefaultChannelLayout(avFrame, static_cast<int>(wf_.sizes()[0]));


I had to think about this for a few moments to convince myself it's correct, so it may be worth putting in a comment: the default channel layout should be the channel layout of the provided waveform. The desired channel layout only comes in if we need to do any conversions in the encoding inner loop.
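
The point above can be sketched in isolation. This is a minimal illustration, not the code from this PR: `ChannelPlan` and `planChannels` are hypothetical names, the waveform shape is assumed to be (num_channels, num_samples), and -1 is assumed to mean "no custom channel count requested", as the `desiredNumChannels_ = -1` default in Encoder.h suggests.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical summary of the channel counts involved in one encode call.
struct ChannelPlan {
  int inputNumChannels;   // always taken from the waveform's shape
  int outputNumChannels;  // what the encoded stream should carry
  bool needsConversion;   // true only when the two counts differ
};

// shape is assumed to be (num_channels, num_samples).
// desiredNumChannels == -1 is assumed to mean "keep the input channel count".
ChannelPlan planChannels(
    const std::vector<int64_t>& shape,
    int desiredNumChannels) {
  // The frame handed to the encoder loop describes the data as the user
  // provided it, so its layout comes from the waveform, not from the request.
  const int inputNumChannels = static_cast<int>(shape[0]);
  const int outputNumChannels =
      desiredNumChannels == -1 ? inputNumChannels : desiredNumChannels;
  return {
      inputNumChannels,
      outputNumChannels,
      inputNumChannels != outputNumChannels};
}
```

Under these assumptions, a stereo waveform with a requested count of 1 yields an input count of 2, an output count of 1, and a conversion step; with no request, both counts are 2 and no conversion is needed.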

scotts · 2025-05-22T13:40:45Z

src/torchcodec/_core/FFMPEGCommon.cpp

+    if (numChannels == avCodec.ch_layouts[i].nb_channels) {
+      return;
+    }
+  }


A comment here saying that we've now entered the error path might be helpful - I think this is less obvious because we're in an #if block.

scotts · 2025-05-22T13:54:09Z

src/torchcodec/_core/FFMPEGCommon.cpp

+    // eventually raise.
+    return;
+  }
+  for (auto i = 0; avCodec.ch_layouts[i].order != AV_CHANNEL_ORDER_UNSPEC;


I'm not sure that this is correct. I think it will often work, but the docs say that ch_layouts is: "Array of supported channel layouts, terminated with a zeroed layout."

That does imply that the terminating struct will have AV_CHANNEL_ORDER_UNSPEC for its order because that corresponds to 0 in the AVChannelOrder enum, but I think that's also a valid order for a real layout.

I think it may be safer to say avCodec.ch_layouts[i] != AVChannelLayout{0}. That's also not obvious, so we could define const auto emptyAVChannelLayout = AVChannelLayout{0} and use that in the comparison.

Good point - I can't create an AVChannelLayout{0} because of our no-permissive compilation flag. But checking for nb_channels == 0 should correctly indicate the end of the array. nb_channels should never be 0 for a valid layout.

Digging a little, just AVChannelLayout{} may work. That's a default initialized layout, but I'm not certain it's necessarily a zeroed layout. Yay C++. :) I think your solution of using nb_channels is probably best.
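
To make the difference between the two termination checks concrete, here is a small self-contained sketch. It does not use FFmpeg: `MockChannelLayout` and `MockChannelOrder` are hypothetical stand-ins that keep only the two fields discussed in this thread, and both function names are invented for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for AVChannelOrder / AVChannelLayout, reduced to the
// fields discussed above. These are NOT the real FFmpeg types.
enum MockChannelOrder { MOCK_ORDER_UNSPEC = 0, MOCK_ORDER_NATIVE = 1 };

struct MockChannelLayout {
  MockChannelOrder order;
  int nb_channels;
};

// Walks a zero-terminated array and stops on nb_channels == 0, which a valid
// layout is never expected to have.
std::vector<int> collectChannelCounts(const MockChannelLayout* layouts) {
  assert(layouts != nullptr);
  std::vector<int> counts;
  for (int i = 0; layouts[i].nb_channels != 0; ++i) {
    counts.push_back(layouts[i].nb_channels);
  }
  return counts;
}

// Same walk, but stops on order == UNSPEC, as in the first revision of the
// loop. A real layout whose order is unspecified ends the walk too early.
std::vector<int> collectChannelCountsByOrder(const MockChannelLayout* layouts) {
  assert(layouts != nullptr);
  std::vector<int> counts;
  for (int i = 0; layouts[i].order != MOCK_ORDER_UNSPEC; ++i) {
    counts.push_back(layouts[i].nb_channels);
  }
  return counts;
}
```

With an array holding a native mono layout, an unspecified-order stereo layout, a native 6-channel layout, and a zeroed terminator, the nb_channels check visits all three real entries, while the order check stops after the first one.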

scotts · 2025-05-22T13:56:39Z

src/torchcodec/_core/FFMPEGCommon.cpp

+      return;
+    }
+  }
+  std::stringstream supportedNumChannels;


Ditto about error path, partially so both arms have the same structure.
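
For reference, the shape being asked for (a success loop, then a clearly marked error path that lists the supported values) might look like the sketch below. It is an assumption-laden illustration rather than the PR's code: `validateNumChannels` here takes a plain vector instead of an AVCodec, `std::invalid_argument` stands in for whatever check macro the project uses, and the error message wording is invented.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: checks a requested channel count against the counts a
// codec advertises. An empty list is assumed to mean "codec advertises
// nothing", in which case validation is skipped and the encoder is left to
// raise later if the request is invalid.
void validateNumChannels(int numChannels, const std::vector<int>& supported) {
  if (supported.empty()) {
    return;
  }
  for (int n : supported) {
    if (n == numChannels) {
      return;
    }
  }
  // Error path: the requested count matched none of the supported ones.
  // Build a readable list of the supported counts for the message.
  std::stringstream supportedNumChannels;
  for (std::size_t i = 0; i < supported.size(); ++i) {
    if (i > 0) {
      supportedNumChannels << ", ";
    }
    supportedNumChannels << supported[i];
  }
  throw std::invalid_argument(
      "Desired number of channels (" + std::to_string(numChannels) +
      ") is not supported by the encoder. Supported numbers of channels: " +
      supportedNumChannels.str() + ".");
}
```

Keeping the "error path" comment at the same position in each arm of the #if block would let a reader see at a glance where the success case ends, whichever FFmpeg version is being compiled.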

scotts

Approved to unblock - let's make sure the validate function is correct before merging.

NicolasHug added 3 commits May 22, 2025 10:18

Add num_channels parameter to AudioEncoder

52d624b

Merge branch 'main' of github.com:pytorch/torchcodec into encoding_nu…

aad9c7d

…m_channels

Add validation for num_channels

2d76a7b

facebook-github-bot added the CLA Signed label May 22, 2025

NicolasHug commented May 22, 2025

Fix FFmpeg 5.X?

7d643f2

NicolasHug mentioned this pull request May 22, 2025

Migrate encoder tests to public Python APIs #694

Merged

scotts reviewed May 22, 2025

Add comments

dd85e96

scotts reviewed May 22, 2025

scotts approved these changes May 22, 2025

Better check for end of ch_layouts

b101939

NicolasHug merged commit b4e958f into pytorch:main May 22, 2025
31 checks passed

Dan-Flores mentioned this pull request May 27, 2025

Rename 'wf' to 'samples' in AudioEncoder #701

Merged
