-
Notifications
You must be signed in to change notification settings - Fork 37
Audio encoding: support custom num_channels
#693
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
) -> Tensor: | ||
return _core.encode_audio_to_tensor( | ||
wf=self._samples, | ||
sample_rate=self._sample_rate, | ||
format=format, | ||
bit_rate=bit_rate, | ||
num_channels=num_channels, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no tests for the public API right now. I will soon migrate most of the existing encoder ops tests into testing the public Python APIs.
@@ -44,6 +50,9 @@ class AudioEncoder { | |||
UniqueAVCodecContext avCodecContext_; | |||
int streamIndex_; | |||
UniqueSwrContext swrContext_; | |||
// TODO-ENCODING: desiredNumChannels should just be part of an options struct, | |||
// see other TODO above. | |||
int desiredNumChannels_ = -1; | |||
|
|||
const torch::Tensor wf_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a comment here that wf
stands for "wave form", and it's the original audio data passed to us by the user would be helpful. I know this is not directly related to the changes in this PR, but I keep having to remind myself of this fact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a TODO somewhere to rename wf
to samples
, like in our Python API. That should make it more obvious
@@ -228,7 +233,7 @@ void AudioEncoder::encode() { | |||
avFrame->format = AV_SAMPLE_FMT_FLTP; | |||
avFrame->sample_rate = avCodecContext_->sample_rate; | |||
avFrame->pts = 0; | |||
setChannelLayout(avFrame, avCodecContext_); | |||
setDefaultChannelLayout(avFrame, static_cast<int>(wf_.sizes()[0])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to think about this for a few moments to convince myself it's correct, so it may be worth putting in a comment: the default channel layout should be the channel layout of the provided waveform. The desired channel layout only comes in if we need to do any conversions in the encoding inner loop.
if (numChannels == avCodec.ch_layouts[i].nb_channels) { | ||
return; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment here saying that we've now entered the error path might be helpful - I think this is less obvious because we're in an #if
block.
// eventually raise. | ||
return; | ||
} | ||
for (auto i = 0; avCodec.ch_layouts[i].order != AV_CHANNEL_ORDER_UNSPEC; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that this is correct. I think it will often work, but the docs say that ch_layouts
is:
Array of supported channel layouts, terminated with a zeroed layout.
That does imply that the terminating struct will have AV_CHANNEL_ORDER_UNSPEC
for its order because that corresponds to 0 in the AVChannelOrder
enum, but I think that's also a valid order for a real layout.
I think it may be safer to say avCodec.ch_layouts[i] != AVChannelLayout{0}
. That's also not obvious, so we could define const auto emptyAVChannelLayout = AVChannelLayout{0}
and use that in the comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - I can't create a AVChannelLayout{0}
because of our no-permissive compilation flag. But checking for nb_channels == 0
should correctly indicate the end of the array. nb_channels
should never be 0 for a valid layout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Digging a little, just AVChannelLayout{}
may work. That's a default initialized layout, but I'm not certain it's necessarily a zeroed layout. Yay C++. :) I think your solution of using nb_channels
is probably best.
return; | ||
} | ||
} | ||
std::stringstream supportedNumChannels; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto about error path, partially so both arms have the same structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved to unblock - let's make sure the validate function is correct before merging.
This PR adds a
num_channels
parameter to the audio encoder, which allows users to specify the number of channels of the encoded data (e.g. encode a stereo tensor into a mono file).