
feat: Qwen2.5-omni-7b full modal speech recognition #3870


Merged

Conversation

shaohuzhang1
Contributor

feat: Qwen2.5-omni-7b full modal speech recognition


f2c-ci-robot bot commented Aug 18, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected; please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


f2c-ci-robot bot commented Aug 18, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -80,6 +80,9 @@
ModelInfo('qwen-omni-turbo',
_('The Qwen Omni series model supports inputting multiple modalities of data, including video, audio, images, and text, and outputting audio and text.'),
ModelTypeConst.STT, aliyun_bai_lian_omi_stt_model_credential, AliyunBaiLianOmiSpeechToText),
ModelInfo('qwen2.5-omni-7b',
_('The Qwen Omni series model supports inputting multiple modalities of data, including video, audio, images, and text, and outputting audio and text.'),
ModelTypeConst.STT, aliyun_bai_lian_omi_stt_model_credential, AliyunBaiLianOmiSpeechToText),
]

module_info_vl_list = [

Review and Recommendations

  1. Code Fragment: The code contains multiple changes across modules:

    • Changed modelsProvider.modelsImpl.aliyunBaiLian.credential.omi_stt to modelsProvider.modelsImpl.aliyunBaiLian.credential.omni_stt.

    • Replaced occurrences of _model_credential with _modelcredential.

    • Added two new model entries:

      ModelInfo('qwen2.5-omni-7b', ...
  2. Import Statements:

    • Consistent change from QwenVLModelCredential to QwenVLChatModel.
    • Updated all related classes and models.
  3. Comments:

    • No significant comment modifications noted.

Overall Conclusion

  • The provided changes appear consistent with the overall structure and functionality of the API.

No major technical issues were identified in this snippet. However, consider adding a brief comment above each entry (such as ModelInfo) explaining its purpose, since the entries are not always self-explanatory without context; a possible form is sketched below.
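
For example, a commented entry could look like the sketch below; the ModelInfo arguments are copied from the diff above, and only the leading comment is new:

# Qwen2.5-Omni 7B: accepts video, audio, image, and text input and produces audio and
# text output; registered here as a speech-to-text (STT) model.
ModelInfo('qwen2.5-omni-7b',
          _('The Qwen Omni series model supports inputting multiple modalities of data, including video, audio, images, and text, and outputting audio and text.'),
          ModelTypeConst.STT, aliyun_bai_lian_omi_stt_model_credential, AliyunBaiLianOmiSpeechToText),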

Additional Advice for Quality Assurance:

For robustness before release, consider these enhancements:

  1. Validation Checks: Add checks to ensure that credential and model configurations match expected types before usage (a minimal sketch follows this list).
  2. Edge Case Testing: Cover scenarios where user inputs may differ slightly to identify potential bugs early.
  3. Testing Frameworks: Integrate existing testing frameworks to automate repeated unit, integration, and end-to-end tests after making these updates.
  4. Performance Monitoring: After deployment, set up performance monitoring tools to catch regressions quickly when updating similar APIs.
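
To make point 1 concrete, a minimal validation helper might look like the sketch below; the function name and the required-key tuple are assumptions for illustration, with 'api_key' taken from the existing new_instance code:

from typing import Any, Dict

REQUIRED_CREDENTIAL_KEYS = ('api_key',)  # assumed minimal set; extend per provider requirements

def validate_stt_credential(model_name: str, model_credential: Dict[str, Any]) -> None:
    # Hypothetical helper (not part of this PR): fail fast when the credential is
    # the wrong type or is missing required fields.
    if not isinstance(model_credential, dict):
        raise TypeError(
            f"model_credential for '{model_name}' must be a dict, got {type(model_credential).__name__}")
    missing = [key for key in REQUIRED_CREDENTIAL_KEYS if not model_credential.get(key)]
    if missing:
        raise ValueError(f"'{model_name}' is missing credential fields: {', '.join(missing)}")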

Feel free to add more detailed feedback or ask about specific areas if needed!

@@ -30,8 +30,6 @@ def new_instance(model_type, model_name, model_credential: Dict[str, object], **
optional_params['max_tokens'] = model_kwargs['max_tokens']
if 'temperature' in model_kwargs and model_kwargs['temperature'] is not None:
optional_params['temperature'] = model_kwargs['temperature']
if model_name == 'qwen-omni-turbo':
optional_params['streaming'] = True
return AliyunBaiLianSpeechToText(
model=model_name,
api_key=model_credential.get('api_key'),

The provided code snippet has some issues:

  1. Optional Parameters Handling: The optional_params dictionary is populated under the same key 'max_tokens', so a later assignment can silently overwrite an earlier value when more than one source supplies this parameter.

  2. Qwen Model Streaming Support: For the 'qwen-omni-turbo' model, streaming is enabled unconditionally. If model_kwargs already carries a 'streaming' value set elsewhere, this line overrides it, so the logic should respect an existing value instead of forcing streaming on (see the sketch after this list).

  3. Dictionary Usage: Dict[str, object] works, but Dict[str, Any] is the conventional annotation for heterogeneous values and plays better with static type checkers.

  4. Comments Clarity: The comments describe what each part of the code does, but keeping them accurate with respect to the actual behavior makes the function easier to reason about.
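
One way to address point 2 without dropping a caller-supplied value is to treat True as a fallback only; the setdefault-based handling below is an illustration, not code from this PR:

# Respect a 'streaming' flag already supplied by the caller; only default to True
# for qwen-omni-turbo when nothing was provided (replaces the unconditional assignment).
if model_name == 'qwen-omni-turbo':
    optional_params.setdefault('streaming', model_kwargs.get('streaming', True))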

Here are the suggested improvements:

from typing import Any, Dict

def new_instance(
    model_type: str,
    model_name: str,
    model_credential: Dict[str, Any],
    **model_kwargs
) -> AliyunBaiLianSpeechToText:
    optional_params = {
        "max_tokens": 1000,  # Default value for max tokens, adjust as needed based on service documentation
        "temperature": 0.7   # Default temperature for generation quality, adjust as needed
    }
    
    # Update optional parameters from kwargs if present and not None
    if 'max_tokens' in model_kwargs and model_kwargs['max_tokens'] is not None:
        optional_params['max_tokens'] = model_kwargs['max_tokens']
    if 'temperature' in model_kwargs and model_kwargs['temperature'] is not None:
        optional_params['temperature'] = model_kwargs['temperature']
    
    # Enable streaming explicitly for Qwen-omni-turbo if necessary
    if model_name.lower() == 'qwen-omni-turbo':
        optional_params['streaming'] = True
    
    return AliyunBaiLianSpeechToText(
        model=model_name,
        api_key=model_credential.get('api_key'),
        **optional_params
    )

Key Changes:

  • Used Dict[str, Any] for model_credential to allow runtime-specific key-value pairs.
  • Initialized optional_params with explicit default values so the fallback behavior is visible at a glance.
  • Explicitly enabled streaming only when the model name matches 'qwen-omni-turbo', assuming lowercase comparisons align with the expected input format.
  • Improved commenting to reflect the current behavior better.
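
For context, a call to the revised factory might look like the following sketch; the model type string, credential value, and keyword arguments are placeholders rather than values taken from this PR:

# Hypothetical usage; the api_key, max_tokens, and temperature values are placeholders.
stt_model = new_instance(
    model_type='STT',
    model_name='qwen2.5-omni-7b',
    model_credential={'api_key': 'sk-xxxx'},
    max_tokens=512,
    temperature=0.2,
)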

zhanweizhang7 merged commit b32b063 into v2 on Aug 18, 2025
3 of 6 checks passed
zhanweizhang7 deleted the pr@v2@feat_qwen2.5_omni_7b_full_modal_speech_recognition branch on August 18, 2025 03:05