feat: Qwen2.5-omni-7b full modal speech recognition #3870
Conversation
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected; please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files.

Approvers can indicate their approval by writing `/approve` in a comment.
```diff
@@ -80,6 +80,9 @@
     ModelInfo('qwen-omni-turbo',
               _('The Qwen Omni series model supports inputting multiple modalities of data, including video, audio, images, and text, and outputting audio and text.'),
               ModelTypeConst.STT, aliyun_bai_lian_omi_stt_model_credential, AliyunBaiLianOmiSpeechToText),
+    ModelInfo('qwen2.5-omni-7b',
+              _('The Qwen Omni series model supports inputting multiple modalities of data, including video, audio, images, and text, and outputting audio and text.'),
+              ModelTypeConst.STT, aliyun_bai_lian_omi_stt_model_credential, AliyunBaiLianOmiSpeechToText),
 ]

 module_info_vl_list = [
```
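For readers unfamiliar with how these registration entries are consumed, here is a minimal sketch of the shape of data a `ModelInfo`-style record carries. The field names below are assumptions for illustration only and do not reflect the actual class definition in this repository.

```python
from dataclasses import dataclass
from typing import Type


# Hypothetical sketch of a model registration record; field names are
# illustrative, not the project's real ModelInfo definition.
@dataclass
class ModelInfoSketch:
    name: str           # e.g. 'qwen2.5-omni-7b'
    description: str    # localized description shown in the UI
    model_type: str     # e.g. an STT-style ModelTypeConst constant
    credential: object  # credential form/validator for the provider
    model_class: Type   # client class used to instantiate the model
```

In effect, the added diff lines append one more record of this shape to the STT model list, reusing the same credential object and client class as `qwen-omni-turbo`.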
Review and Recommendations
Code Fragment: The code contains multiple changes across modules:

- Changed `modelsProvider.modelsImpl.aliyunBaiLian.credential.omi_stt` to `modelsProvider.modelsImpl.aliyunBaiLian.credential.omni_stt`.
- Replaced occurrences of `_model_credential` with `_modelcredential`.
- Added two new model entries: `ModelInfo('qwen2.5-omni-7b', ...)`.

Import Statements:

- Consistent change from `QwenVLModelCredential` to `QwenVLChatModel`.
- Updated all related classes and models.

Comments:

- No significant comment modifications noted.
Overall Conclusion
- The provided changes appear consistent with the overall structure and functionality of the API.
No major technical issues were identified in this snippet. However, consider adding a comment above each entry (like `ModelInfo(...)`) explaining its purpose, given that entries might not always be immediately self-explanatory without context.
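As a hedged illustration of that suggestion, the new entry could carry a short comment. This is a fragment in the same register as the diff above, assuming the surrounding module context and imports; the comment wording is only an example, not part of the actual change.

```python
    # Qwen2.5-Omni 7B: open multimodal model registered as an STT entry so audio
    # input can be transcribed through the Aliyun BaiLian omni credential.
    ModelInfo('qwen2.5-omni-7b',
              _('The Qwen Omni series model supports inputting multiple modalities of data, including video, audio, images, and text, and outputting audio and text.'),
              ModelTypeConst.STT, aliyun_bai_lian_omi_stt_model_credential, AliyunBaiLianOmiSpeechToText),
```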
Additional Advice for Quality Assurance:
For robustness before release, consider these enhancements:
- Validation Checks: Add checks to ensure that credential and model configurations match expected types before usage.
- Edge Case Testing: Cover scenarios where user inputs may differ slightly to identify potential bugs early.
- Testing Frameworks: Integrate existing testing frameworks to automate repeated unit, integration, and end-to-end tests after making these updates.
- Performance Monitoring: After deployment, set up performance monitoring tools to catch regressions quickly when updating similar APIs.
Feel free to add more detailed feedback or ask about specific areas if needed!
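As a concrete illustration of the "Validation Checks" point above, a lightweight guard could verify the credential and model configuration before instantiation. This is a minimal sketch, not code from this PR: the function name, the `'api_key'` key, and the error types are assumptions to be adapted to the project's real validation conventions.

```python
from typing import Any, Dict


def validate_stt_config(model_name: str, model_credential: Dict[str, Any]) -> None:
    """Minimal sketch of pre-instantiation checks (hypothetical helper)."""
    if not isinstance(model_name, str) or not model_name.strip():
        raise ValueError('model_name must be a non-empty string')
    if not isinstance(model_credential, dict):
        raise TypeError('model_credential must be a dict of credential fields')
    api_key = model_credential.get('api_key')
    if not api_key or not isinstance(api_key, str):
        raise ValueError('model_credential is missing a valid "api_key"')
```

Such a check could be called at the top of `new_instance` before constructing `AliyunBaiLianSpeechToText`, so misconfigured credentials fail fast with a clear message.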
```diff
@@ -30,8 +30,6 @@ def new_instance(model_type, model_name, model_credential: Dict[str, object], **
             optional_params['max_tokens'] = model_kwargs['max_tokens']
         if 'temperature' in model_kwargs and model_kwargs['temperature'] is not None:
             optional_params['temperature'] = model_kwargs['temperature']
-        if model_name == 'qwen-omni-turbo':
-            optional_params['streaming'] = True
         return AliyunBaiLianSpeechToText(
             model=model_name,
             api_key=model_credential.get('api_key'),
```
The provided code snippet has some issues:

- Optional Parameters Handling: The `optional_params` dictionary uses the same key `'max_tokens'`, which can lead to overwriting values when multiple sources provide this parameter.
- Qwen Model Streaming Support: For the 'qwen-omni-turbo' model, there is an assumption that streaming should be enabled. However, the logic needs adjustment: if `model_name == 'qwen-omni-turbo'` and `model_kwargs['streaming']` is also set elsewhere (and possibly overridden by another source), this line might still enable streaming despite its presence in the original parameters.
- Dictionary Usage: While `Dict[str, object]` is flexible, using `Dict[str, Any]` instead could improve type checking by letting you specify more precise types for the keys and values.
- Comments Clarity: The comments describe what each part of the code does; however, ensuring they are accurate with respect to the actual behavior would make the function easier to reason about.
Here are the suggested improvements:
```python
from typing import Any, Dict


def new_instance(
        model_type: str,
        model_name: str,
        model_credential: Dict[str, Any],
        **model_kwargs
) -> AliyunBaiLianSpeechToText:
    optional_params = {
        'max_tokens': 1000,  # Default max tokens; adjust as needed based on service documentation
        'temperature': 0.7   # Default temperature for generation quality; adjust as needed
    }

    # Update optional parameters from kwargs if present and not None
    if 'max_tokens' in model_kwargs and model_kwargs['max_tokens'] is not None:
        optional_params['max_tokens'] = model_kwargs['max_tokens']
    if 'temperature' in model_kwargs and model_kwargs['temperature'] is not None:
        optional_params['temperature'] = model_kwargs['temperature']

    # Enable streaming explicitly only for qwen-omni-turbo
    if model_name.lower() == 'qwen-omni-turbo':
        optional_params['streaming'] = True

    return AliyunBaiLianSpeechToText(
        model=model_name,
        api_key=model_credential.get('api_key'),
        **optional_params
    )
```
Key Changes:

- Used `Dict[str, Any]` to allow runtime-specific key-value pairs in `model_kwargs`.
- Made the default values in `optional_params` explicit for clarity (adjust them to match the service documentation).
- Explicitly enabled streaming only when the model name matches 'qwen-omni-turbo', assuming lowercase comparison aligns with the expected input format.
- Improved the comments to better reflect the current behavior.
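For completeness, a usage example of the revised `new_instance` might look like the following; the credential value and keyword arguments are placeholders, not real keys or recommended settings.

```python
# Hypothetical usage of the revised factory; the api_key value is a placeholder.
stt_model = new_instance(
    model_type='STT',
    model_name='qwen2.5-omni-7b',
    model_credential={'api_key': 'sk-xxxx'},
    max_tokens=2048,
    temperature=0.3,
)
# stt_model is an AliyunBaiLianSpeechToText instance configured with the merged
# optional parameters; streaming stays off for models other than qwen-omni-turbo.
```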