
feat: Qwen2.5-omni-7b full modal speech recognition #3870


Merged

Conversation

shaohuzhang1
Contributor

feat: Qwen2.5-omni-7b full modal speech recognition


f2c-ci-robot bot commented Aug 18, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected; please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


f2c-ci-robot bot commented Aug 18, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -80,6 +80,9 @@
ModelInfo('qwen-omni-turbo',
_('The Qwen Omni series model supports inputting multiple modalities of data, including video, audio, images, and text, and outputting audio and text.'),
ModelTypeConst.STT, aliyun_bai_lian_omi_stt_model_credential, AliyunBaiLianOmiSpeechToText),
ModelInfo('qwen2.5-omni-7b',
_('The Qwen Omni series model supports inputting multiple modalities of data, including video, audio, images, and text, and outputting audio and text.'),
ModelTypeConst.STT, aliyun_bai_lian_omi_stt_model_credential, AliyunBaiLianOmiSpeechToText),
]

module_info_vl_list = [

Review and Recommendations

  1. Code Fragment: The code contains multiple changes across modules:

    • Changed modelsProvider.modelsImpl.aliyunBaiLian.credential.omi_stt to modelsProvider.modelsImpl.aliyunBaiLian.credential.omni_stt.

    • Replaced occurrences of _model_credential with _modelcredential.

    • Added two new model entries:

      ModelInfo('qwen2.5-omni-7b', ...
  2. Import Statements:

    • Consistent change from QwenVLModelCredential to QwenVLChatModel.
    • Updated all related classes and models.
  3. Comments:

    • No significant comment modifications noted.

Overall Conclusion

  • The provided changes appear consistent with the overall structure and functionality of the API.

No major technical issues were identified in this snippet. However, consider adding a brief comment above each entry (such as ModelInfo) explaining its purpose, since the entries are not always self-explanatory without context; a possible form is sketched below.
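
For example, a commented entry could look like the sketch below; the ModelInfo arguments are copied from the diff above, and only the leading comment is new:

# Qwen2.5-Omni 7B: accepts video, audio, image, and text input and produces audio and
# text output; registered here as a speech-to-text (STT) model.
ModelInfo('qwen2.5-omni-7b',
          _('The Qwen Omni series model supports inputting multiple modalities of data, including video, audio, images, and text, and outputting audio and text.'),
          ModelTypeConst.STT, aliyun_bai_lian_omi_stt_model_credential, AliyunBaiLianOmiSpeechToText),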

Additional Advice for Quality Assurance:

For robustness before release, consider these enhancements:

  1. Validation Checks: Add checks to ensure that credential and model configurations match expected types before usage (a minimal sketch follows this list).
  2. Edge Case Testing: Cover scenarios where user inputs may differ slightly to identify potential bugs early.
  3. Testing Frameworks: Integrate existing testing frameworks to automate repeated unit, integration, and end-to-end tests after making these updates.
  4. Performance Monitoring: After deployment, set up performance monitoring tools to catch regressions quickly when updating similar APIs.
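
To make point 1 concrete, a minimal validation helper might look like the sketch below; the function name and the required-key tuple are assumptions for illustration, with 'api_key' taken from the existing new_instance code:

from typing import Any, Dict

REQUIRED_CREDENTIAL_KEYS = ('api_key',)  # assumed minimal set; extend per provider requirements

def validate_stt_credential(model_name: str, model_credential: Dict[str, Any]) -> None:
    # Hypothetical helper (not part of this PR): fail fast when the credential is
    # the wrong type or is missing required fields.
    if not isinstance(model_credential, dict):
        raise TypeError(
            f"model_credential for '{model_name}' must be a dict, got {type(model_credential).__name__}")
    missing = [key for key in REQUIRED_CREDENTIAL_KEYS if not model_credential.get(key)]
    if missing:
        raise ValueError(f"'{model_name}' is missing credential fields: {', '.join(missing)}")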

Feel free to add more detailed feedback or ask about specific areas if needed!

@@ -30,8 +30,6 @@ def new_instance(model_type, model_name, model_credential: Dict[str, object], **
optional_params['max_tokens'] = model_kwargs['max_tokens']
if 'temperature' in model_kwargs and model_kwargs['temperature'] is not None:
optional_params['temperature'] = model_kwargs['temperature']
if model_name == 'qwen-omni-turbo':
optional_params['streaming'] = True
return AliyunBaiLianSpeechToText(
model=model_name,
api_key=model_credential.get('api_key'),

The provided code snippet has some issues:

  1. Optional Parameters Handling: The optional_params dictionary is populated under the same key 'max_tokens', so a later assignment can silently overwrite an earlier value when more than one source supplies this parameter.

  2. Qwen Model Streaming Support: For the 'qwen-omni-turbo' model, streaming is enabled unconditionally. If model_kwargs already carries a 'streaming' value set elsewhere, this line overrides it, so the logic should respect an existing value instead of forcing streaming on (see the sketch after this list).

  3. Dictionary Usage: Dict[str, object] works, but Dict[str, Any] is the conventional annotation for heterogeneous values and plays better with static type checkers.

  4. Comments Clarity: The comments describe what each part of the code does, but keeping them accurate with respect to the actual behavior makes the function easier to reason about.
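
One way to address point 2 without dropping a caller-supplied value is to treat True as a fallback only; the setdefault-based handling below is an illustration, not code from this PR:

# Respect a 'streaming' flag already supplied by the caller; only default to True
# for qwen-omni-turbo when nothing was provided (replaces the unconditional assignment).
if model_name == 'qwen-omni-turbo':
    optional_params.setdefault('streaming', model_kwargs.get('streaming', True))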

Here are the suggested improvements:

from typing import Any, Dict

def new_instance(
    model_type: str,
    model_name: str,
    model_credential: Dict[str, Any],
    **model_kwargs
) -> AliyunBaiLianSpeechToText:
    optional_params = {
        "max_tokens": 1000,  # Default value for max tokens, adjust as needed based on service documentation
        "temperature": 0.7   # Default temperature for generation quality, adjust as needed
    }
    
    # Update optional parameters from kwargs if present and not None
    if 'max_tokens' in model_kwargs and model_kwargs['max_tokens'] is not None:
        optional_params['max_tokens'] = model_kwargs['max_tokens']
    if 'temperature' in model_kwargs and model_kwargs['temperature'] is not None:
        optional_params['temperature'] = model_kwargs['temperature']
    
    # Enable streaming explicitly for Qwen-omni-turbo if necessary
    if model_name.lower() == 'qwen-omni-turbo':
        optional_params['streaming'] = True
    
    return AliyunBaiLianSpeechToText(
        model=model_name,
        api_key=model_credential.get('api_key'),
        **optional_params
    )

Key Changes:

  • Used Dict[str, Any] for model_credential to allow runtime-specific key-value pairs.
  • Initialized optional_params with explicit default values so the fallback behavior is visible at a glance.
  • Explicitly enabled streaming only when the model name matches 'qwen-omni-turbo', assuming lowercase comparisons align with the expected input format.
  • Improved commenting to reflect the current behavior better.
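
For context, a call to the revised factory might look like the following sketch; the model type string, credential value, and keyword arguments are placeholders rather than values taken from this PR:

# Hypothetical usage; the api_key, max_tokens, and temperature values are placeholders.
stt_model = new_instance(
    model_type='STT',
    model_name='qwen2.5-omni-7b',
    model_credential={'api_key': 'sk-xxxx'},
    max_tokens=512,
    temperature=0.2,
)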

zhanweizhang7 merged commit b32b063 into v2 on Aug 18, 2025
3 of 6 checks passed
zhanweizhang7 deleted the pr@v2@feat_qwen2.5_omni_7b_full_modal_speech_recognition branch on August 18, 2025 03:05