Skip to content

Audio encoding: support custom num_channels #693

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 22, 2025

Conversation

NicolasHug
Copy link
Member

This PR adds a num_channels parameter to the audio encoder, which allows users to specify the number of channels of the encoded data (e.g. encode a stereo tensor into a mono file).

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label May 22, 2025
) -> Tensor:
return _core.encode_audio_to_tensor(
wf=self._samples,
sample_rate=self._sample_rate,
format=format,
bit_rate=bit_rate,
num_channels=num_channels,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no tests for the public API right now. I will soon migrate most of the existing encoder ops tests into testing the public Python APIs.

@@ -44,6 +50,9 @@ class AudioEncoder {
UniqueAVCodecContext avCodecContext_;
int streamIndex_;
UniqueSwrContext swrContext_;
// TODO-ENCODING: desiredNumChannels should just be part of an options struct,
// see other TODO above.
int desiredNumChannels_ = -1;

const torch::Tensor wf_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a comment here that wf stands for "wave form", and it's the original audio data passed to us by the user would be helpful. I know this is not directly related to the changes in this PR, but I keep having to remind myself of this fact.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a TODO somewhere to rename wf to samples, like in our Python API. That should make it more obvious

@@ -228,7 +233,7 @@ void AudioEncoder::encode() {
avFrame->format = AV_SAMPLE_FMT_FLTP;
avFrame->sample_rate = avCodecContext_->sample_rate;
avFrame->pts = 0;
setChannelLayout(avFrame, avCodecContext_);
setDefaultChannelLayout(avFrame, static_cast<int>(wf_.sizes()[0]));
Copy link
Contributor

@scotts scotts May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to think about this for a few moments to convince myself it's correct, so it may be worth putting in a comment: the default channel layout should be the channel layout of the provided waveform. The desired channel layout only comes in if we need to do any conversions in the encoding inner loop.

if (numChannels == avCodec.ch_layouts[i].nb_channels) {
return;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment here saying that we've now entered the error path might be helpful - I think this is less obvious because we're in an #if block.

// eventually raise.
return;
}
for (auto i = 0; avCodec.ch_layouts[i].order != AV_CHANNEL_ORDER_UNSPEC;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this is correct. I think it will often work, but the docs say that ch_layouts is:

Array of supported channel layouts, terminated with a zeroed layout.

That does imply that the terminating struct will have AV_CHANNEL_ORDER_UNSPEC for its order because that corresponds to 0 in the AVChannelOrder enum, but I think that's also a valid order for a real layout.

I think it may be safer to say avCodec.ch_layouts[i] != AVChannelLayout{0}. That's also not obvious, so we could define const auto emptyAVChannelLayout = AVChannelLayout{0} and use that in the comparison.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - I can't create a AVChannelLayout{0} because of our no-permissive compilation flag. But checking for nb_channels == 0 should correctly indicate the end of the array. nb_channels should never be 0 for a valid layout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Digging a little, just AVChannelLayout{} may work. That's a default initialized layout, but I'm not certain it's necessarily a zeroed layout. Yay C++. :) I think your solution of using nb_channels is probably best.

return;
}
}
std::stringstream supportedNumChannels;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto about error path, partially so both arms have the same structure.

Copy link
Contributor

@scotts scotts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved to unblock - let's make sure the validate function is correct before merging.

@NicolasHug NicolasHug merged commit b4e958f into pytorch:main May 22, 2025
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants