This issue was moved to a discussion.

Why perform speaker diarization at the end #1043

Closed
bofenghuang opened this issue Feb 13, 2025 · 0 comments

bofenghuang commented Feb 13, 2025

Hello @m-bain,

Thank you for this excellent project!

From what I understand, the current pipeline merges the results of speaker diarization and STT at the end, based on timestamps. I'm wondering why we don't replace VAD with speaker diarization and pass the per-speaker segments directly to Whisper (we would still need to ensure each segment is under 30s). Is it because we want to keep speaker diarization optional, or have benchmarks shown that the current approach performs better?
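
To make the two options concrete, here is a minimal sketch in plain Python. It is not WhisperX code: `Turn`, `Segment`, `assign_speakers`, and `split_turns` are hypothetical names used only for illustration. `assign_speakers` shows one way a timestamp-based merge at the end could work (each transcribed segment takes the speaker whose turn overlaps it most), and `split_turns` shows the alternative, where diarization turns are cut into chunks of at most 30 s before being sent to Whisper.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_CHUNK_S = 30.0  # assumed upper bound per chunk, matching the <30s constraint above


@dataclass
class Turn:
    """A diarization result: one speaker talking from start to end (seconds)."""
    start: float
    end: float
    speaker: str


@dataclass
class Segment:
    """An STT result: transcribed text with timestamps (seconds)."""
    start: float
    end: float
    text: str
    speaker: Optional[str] = None


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Segment]:
    """Option A (merge at the end): label each segment with the speaker
    whose turn overlaps it the most; leave speaker as None if nothing overlaps."""
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        seg.speaker = best_speaker
    return segments


def split_turns(turns: List[Turn], max_len: float = MAX_CHUNK_S) -> List[Turn]:
    """Option B (diarize first): cut every speaker turn into chunks of at
    most max_len seconds so each chunk could be transcribed on its own."""
    chunks: List[Turn] = []
    for turn in turns:
        start = turn.start
        while turn.end - start > max_len:
            chunks.append(Turn(start, start + max_len, turn.speaker))
            start += max_len
        chunks.append(Turn(start, turn.end, turn.speaker))
    return chunks


if __name__ == "__main__":
    turns = [Turn(0.0, 40.0, "SPEAKER_00"), Turn(40.0, 50.0, "SPEAKER_01")]
    segments = [Segment(1.0, 5.0, "hello"), Segment(41.0, 45.0, "hi there")]
    for seg in assign_speakers(segments, turns):
        print(seg.speaker, seg.text)
    for chunk in split_turns(turns):
        print(chunk.speaker, chunk.start, chunk.end)
```

Note that the fixed-length cut in `split_turns` can land in the middle of a word, which is one thing a diarization-first approach would have to handle.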

Repository owner locked and limited conversation to collaborators Feb 19, 2025
@Barabazs Barabazs converted this issue into discussion #1055 Feb 19, 2025
