Baidu AI Cloud

Speech Technology

Baidu’s speech technology gives developers industry-leading capabilities such as speech-to-text, text-to-speech, and speech wake-up. Combined with NLP technology, it suits scenarios including speech input, speech search, video subtitling, audio content analysis, call centers, book narration, news broadcasting, and order broadcasting.

Product List

  • Short Speech-to-Text

    It converts speech shorter than 60 seconds into text. It suits mobile speech input, intelligent speech interaction, speech commands, and speech search.

  • Real-time Speech-to-Text

    It converts an audio stream into text and returns the start and end time of each sentence. It suits scenarios such as long-form speech input, audio and video subtitles, and meeting records.

  • Audio File Transcription

    It converts batches of uploaded audio files into text and returns the recognition results within 12 hours. It suits scenarios such as recording quality checks and audio content analysis.

  • Call Center Solution

    This end-to-end speech technology solution for call centers includes speech-to-text at an 8 kHz sampling rate and speech synthesis. It helps enterprises add speech capabilities to their call centers more efficiently.

  • Speech Self-training Platform

    Using domain-specific text from your business scenarios, it trains a language model with zero code. The trained model recognizes speech content more precisely, improving recognition accuracy in your business field.

  • Speech Wake-up

    It wakes a device or application on a specific speech command. You can customize several wake-up words, ensuring natural and smooth conversation in your application.

  • Online Text-to-Speech

    It offers highly human-like, smooth, and natural speech synthesis. It meets the speech broadcast requirements of reading applications, purchase order broadcasts, and intelligent hardware.

  • Offline Text-to-Speech

    Where internet access is weak or unavailable, it lets you perform speech broadcast on intelligent hardware devices. It synthesizes text into an audio file and delivers a stable, consistent, and natural speech synthesis experience.

  • Speech Translation

    By integrating high-precision speech-to-text, text translation, and text-to-speech, it provides developers with online real-time speech translation. It supports four languages: Chinese, English, Japanese, and Cantonese.
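
The product descriptions above mention two audio constraints: Short Speech-to-Text handles clips shorter than 60 seconds, and the Call Center Solution works with an 8 kHz sampling rate. A minimal local pre-check might look like the sketch below. It uses only Python's standard `wave` module and assumes the audio is an uncompressed WAV file; the function names are hypothetical and are not part of any Baidu SDK or API.

```python
import wave

# Limit taken from the Short Speech-to-Text description above.
MAX_SHORT_SPEECH_SECONDS = 60.0


def wav_info(path):
    """Return (duration_seconds, sample_rate_hz) for an uncompressed WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    return frames / rate, rate


def fits_short_speech(path, max_seconds=MAX_SHORT_SPEECH_SECONDS):
    """Return True if the clip is shorter than the short speech limit."""
    duration, _ = wav_info(path)
    return duration < max_seconds
```

A clip that fails this check would be a candidate for Real-time Speech-to-Text or Audio File Transcription instead.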

Application Scenarios

  • Speech Search

    It lets you enter search queries by speech. It applies to search scenarios such as web search, in-vehicle search, and mobile search, freeing your hands and making search more efficient. It suits many industries, including video websites, intelligent hardware, and mobile phone manufacturers.

  • Speech Command

    It lets you control and operate your device or software by speech, with no manual operation. It suits many fields, including intelligent hardware, in-vehicle systems, robots, mobile apps, and games.

  • Live Video Subtitle

    As a new feature for live video broadcasts, it transcribes the host's speech into on-screen subtitles and also lets you edit the subtitles.

  • Audio Content Analysis

    It converts audio recordings into text for continuous analysis and monitoring. This lets you identify risks and illegal content and uncover potential marketing opportunities.

  • Book Content Broadcast

    Text-to-speech technology gives reading apps the ability to read aloud, freeing users' hands and eyes. A range of distinctive voices gives every story a fitting tone, bringing users a more refined reading experience.

  • Purchase Order Broadcast

    It applies to scenarios such as car-hailing software, restaurant queue-number calling, and queuing software. Using text-to-speech, it broadcasts purchase orders so that users receive notifications promptly and conveniently.
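
For the Live Video Subtitle scenario, the per-sentence start and end times returned by Real-time Speech-to-Text map naturally onto a subtitle file. The sketch below turns such timings into SubRip (SRT) text. The `(start_ms, end_ms, text)` tuple shape is an assumption made for illustration; the actual response format of the service is not described on this page.

```python
def format_timestamp(ms):
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(sentences):
    """Build SRT text from (start_ms, end_ms, text) tuples (hypothetical shape)."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(sentences, start=1):
        # Each SRT cue: sequence number, time range, then the subtitle text.
        time_range = f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
        blocks.append(f"{index}\n{time_range}\n{text}\n")
    # Cues are separated by one blank line.
    return "\n".join(blocks)
```

Calling `to_srt([(0, 1500, "Hello"), (1500, 3000, "World")])` yields two numbered cues separated by a blank line.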


Special Advantages

Abundant Interface Features
Speech-to-text supports post-processing such as punctuation insertion, number format conversion, and timestamp processing. Text-to-speech lets you flexibly set the speed, pitch, and volume and annotate polyphonic characters, meeting personalized requirements.
Service Stability and High-Efficiency
It offers enterprise-level service stability: professional server clusters that handle large volumes of concurrent traffic efficiently and flexibly, and a 99.9% service stability guarantee.
Support Model Self-Training
Speech-to-text supports self-training of language models on the speech self-training platform. Upload domain-specific text from your business area, and zero-code training runs automatically. It can generally improve the recognition rate of business-domain words by 5% to 25%.
Several Calling Methods
It offers several calling methods, including a REST API, a WebSocket API, SDKs for Android, iOS, and Linux, and an offline text-to-speech SDK. These cover the requirements of different terminals.

Relevant Recommendations

  • Text Review

    Based on NLP technologies, it identifies text content involving pornography, terrorism, politics, malicious advertising, abuse, and illegal items. It lets you customize blacklists and whitelists and flexibly adjust the review strategy and strictness.

  • Video Content Review

    It intelligently reviews video content along several dimensions, including pornography, violence, terrorism, politics, advertising, and user-defined blacklist libraries. It helps you review the content on your platform.

  • Application Technology for Natural Language Processing

    Designed for multi-scenario technical applications, it offers NLP capabilities that can be applied directly to product strategy, helping your products better understand language and users.