Azure Text-to-Speech Configuration
Configurable parameters:
Parameter Name | Parameter Description | Default Value |
---|---|---|
key | Azure Speech Services subscription key. For more information, refer to: Azure Speech Services Key | None |
service_region | Azure Speech Services region (for example, eastasia). For more information, refer to: Services Region | None |
voice_name | Name of the voice to use. For more information, refer to: Voice Name | zh-CN-YunyangNeural |
style | The voice-specific speaking style. You can express emotions like cheerfulness, empathy, and calmness. You can also optimize the voice for different scenarios like customer service, newscast, and voice assistant. If the style value is missing or invalid, the entire mstts:express-as element is ignored and the service uses the default neutral speech. | None. If this field is configured, SSML will be used to generate the speech. |
styledegree | The intensity of the speaking style. You can specify a stronger or softer style to make the speech more expressive or subdued. The range of accepted values is 0.01 to 2 inclusive. The default value is 1, which means the predefined style intensity. The minimum unit is 0.01, which results in a slight tendency for the target style. A value of 2 results in a doubling of the default style intensity. If the style degree is missing or isn't supported for your voice, this attribute is ignored. | 1. Takes effect only when style is set. |
role | The speaking role-play. The voice can imitate a different age and gender without changing the voice name. For example, a male voice can raise the pitch and change the intonation to imitate a female voice. If the role is missing or isn't supported for your voice, this attribute is ignored. | None. Takes effect only when style is set. |
dialect_name | Dialect to speak in. Before setting it, set voice_name to a voice that supports dialects, and make sure the Azure region supports the dialect. For Chinese dialects, the eastasia region is recommended. | None |
prosody_rate | Speech rate. Accepts one of ['x-slow', 'slow', 'medium', 'fast', 'x-fast'] or a number in the range [0, 3]. | 1 |
prosody_pitch | Pitch. Accepts one of ['x-low', 'low', 'medium', 'high', 'x-high'] or a number in the range [0.5, 1.5]. | 1 |
prosody_volume | Volume. Accepts one of ['silent', 'x-soft', 'soft', 'medium', 'x-loud'] or a number in the range [0, 1.5]. | 1 |
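As a rough illustration of how these parameters could fit together, the sketch below builds an Azure SSML document from a tts_config dictionary. The helper names build_ssml and prosody_value are hypothetical, not this project's API, and several behaviours are assumptions: a numeric prosody value m is treated as a multiplier and written as the relative percentage (m - 1) * 100%, the document's xml:lang is derived by dropping the last segment of voice_name, and SSML is always produced (the project may send plain text when style is unset). The element names (mstts:express-as, lang, prosody) follow Azure's SSML markup.

```python
from xml.sax.saxutils import escape, quoteattr

# Named levels accepted per the parameter table; anything else is parsed as a number.
NAMED_LEVELS = {
    "rate": {"x-slow", "slow", "medium", "fast", "x-fast"},
    "pitch": {"x-low", "low", "medium", "high", "x-high"},
    "volume": {"silent", "x-soft", "soft", "medium", "x-loud"},
}


def prosody_value(kind: str, value) -> str:
    """Pass a named level through; turn a numeric multiplier m into a relative percentage.

    Assumption (not from the project): 1 means "no change", so m -> (m - 1) * 100 %.
    """
    if isinstance(value, str) and value in NAMED_LEVELS[kind]:
        return value
    return f"{(float(value) - 1) * 100:+.0f}%"


def build_ssml(text: str, cfg: dict) -> str:
    """Hypothetical helper: wrap text in Azure SSML using the table's parameters."""
    voice = cfg.get("voice_name", "zh-CN-YunyangNeural")
    lang = voice.rsplit("-", 1)[0]  # assumption: "zh-CN-YunyangNeural" -> "zh-CN"

    # Innermost element: prosody, each attribute defaulting to 1 (no change).
    attrs = " ".join(
        f"{kind}={quoteattr(prosody_value(kind, cfg.get('prosody_' + kind, 1)))}"
        for kind in ("rate", "pitch", "volume")
    )
    body = f"<prosody {attrs}>{escape(text)}</prosody>"

    # dialect_name selects a secondary locale of a dialect-capable voice.
    if cfg.get("dialect_name"):
        body = f"<lang xml:lang={quoteattr(cfg['dialect_name'])}>{body}</lang>"

    # styledegree and role are only emitted when style is set.
    if cfg.get("style"):
        style_attrs = f"style={quoteattr(cfg['style'])}"
        if "styledegree" in cfg:
            style_attrs += f" styledegree={quoteattr(str(cfg['styledegree']))}"
        if cfg.get("role"):
            style_attrs += f" role={quoteattr(cfg['role'])}"
        body = f"<mstts:express-as {style_attrs}>{body}</mstts:express-as>"

    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )
```

For example, build_ssml("Hello", {"style": "cheerful", "prosody_rate": "fast"}) wraps the text in an mstts:express-as element, while a config without style produces only the voice and prosody elements, mirroring the rule that styledegree and role are ignored unless style is set.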
When voice_name is set to zh-CN-XiaoxiaoDialectsNeural, dialect_name can be set to one of the following dialects:
- Shandong Dialect: zh-CN-shandong
- Northeastern Mandarin: zh-CN-liaoning
- Sichuan Dialect: zh-CN-sichuan
- Taiwanese Mandarin: zh-TW
- Henan Dialect: zh-CN-henan
- Shaanxi Dialect: zh-CN-shaanxi
- Minnan Dialect: nan-CN
- Anhui Mandarin: zh-CN-anhui
- Gansu Dialect: zh-CN-gansu
- Hunan Mandarin: zh-CN-hunan
- Shanxi Dialect: zh-CN-shanxi
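The rules above (a dialect-capable voice, a supported dialect code, and eastasia recommended for Chinese dialects) can be checked before any request is sent. The following is a minimal sketch under stated assumptions: check_dialect, XIAOXIAO_DIALECTS and DIALECT_VOICE are hypothetical names, and the check only knows the single dialect voice listed above, so other Azure voices with dialect support would have to be added by hand.

```python
import warnings

# Dialect codes listed above for zh-CN-XiaoxiaoDialectsNeural (hypothetical table name).
XIAOXIAO_DIALECTS = {
    "zh-CN-shandong": "Shandong Dialect",
    "zh-CN-liaoning": "Northeastern Mandarin",
    "zh-CN-sichuan": "Sichuan Dialect",
    "zh-TW": "Taiwanese Mandarin",
    "zh-CN-henan": "Henan Dialect",
    "zh-CN-shaanxi": "Shaanxi Dialect",
    "nan-CN": "Minnan Dialect",
    "zh-CN-anhui": "Anhui Mandarin",
    "zh-CN-gansu": "Gansu Dialect",
    "zh-CN-hunan": "Hunan Mandarin",
    "zh-CN-shanxi": "Shanxi Dialect",
}
DIALECT_VOICE = "zh-CN-XiaoxiaoDialectsNeural"


def check_dialect(tts_config: dict) -> None:
    """Hypothetical pre-flight check for dialect_name; raises ValueError on a bad combination."""
    dialect = tts_config.get("dialect_name")
    if not dialect:
        return  # no dialect requested, nothing to check
    voice = tts_config.get("voice_name")
    if voice != DIALECT_VOICE:
        raise ValueError(f"dialect_name needs voice_name {DIALECT_VOICE!r}, got {voice!r}")
    if dialect not in XIAOXIAO_DIALECTS:
        raise ValueError(f"unknown dialect_name {dialect!r}; expected one of {sorted(XIAOXIAO_DIALECTS)}")
    if tts_config.get("service_region") != "eastasia":
        # eastasia is only a recommendation, so warn instead of failing.
        warnings.warn("the eastasia region is recommended for Chinese dialects")
```

Raising on the voice and dialect mismatches while only warning about the region mirrors the wording above: the first two are stated as requirements, the region as a recommendation.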
For details of the SSML voice settings, refer to the Azure documentation: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-voice
Configuration example (roles.json):
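{
  "1": {
    "start_text": "Hello, I'm Xiao Tutu! Is there anything I can help you with?",
    "prompt": "You play a child's little companion named Xiao Tutu. You are kind, speak in a lively and cute way, are full of love for the child, and often praise and encourage her. Give fun and creative answers in language a 5-year-old can easily understand. In every reply, ask for her opinion on the current topic to spark her thinking and curiosity. Now she has come to you and asked her first question: [Who are you?]",
    "tts_type": "azure-tts",
    "tts_config": {
      "key": "aaaaaaaaaaaa",
      "service_region": "eastasia",
      "voice_name": "zh-CN-YunyangNeural"
    }
  }
}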
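To show how a role entry like this might be consumed, here is a minimal sketch that reads one role's Azure settings from roles.json and fills in the defaults from the parameter table. load_tts_config, TTS_DEFAULTS and the validation rules are hypothetical; the project's real loader may apply defaults or report errors differently.

```python
import json
from pathlib import Path

# Defaults from the parameter table; key and service_region have no default.
TTS_DEFAULTS = {
    "voice_name": "zh-CN-YunyangNeural",
    "prosody_rate": 1,
    "prosody_pitch": 1,
    "prosody_volume": 1,
}
REQUIRED_KEYS = ("key", "service_region")


def load_tts_config(path: str, role_id: str) -> dict:
    """Hypothetical loader: return one role's Azure tts_config with table defaults filled in."""
    roles = json.loads(Path(path).read_text(encoding="utf-8"))
    role = roles[role_id]  # KeyError if the role id is not in the file
    if role.get("tts_type") != "azure-tts":
        raise ValueError(f"role {role_id!r} does not use azure-tts")
    # Values set in the file override the defaults.
    cfg = {**TTS_DEFAULTS, **role.get("tts_config", {})}
    missing = [name for name in REQUIRED_KEYS if not cfg.get(name)]
    if missing:
        raise ValueError(f"role {role_id!r} tts_config is missing {missing}")
    return cfg
```

For the example above, load_tts_config("roles.json", "1") would return the role's key, service_region and voice_name together with the three prosody defaults of 1.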