Speech Technology

Baidu’s speech technology gives developers industry-leading capabilities such as speech-to-text, text-to-speech, and speech wake-up. Combined with NLP technology, it applies to many scenarios, including speech input, speech search, video subtitles, audio content analysis, call centers, book broadcasting, news broadcasting, and order broadcasting.

Product List

Short Speech-to-Text
It converts speech clips shorter than 60 seconds into text. It is suited to mobile speech input, intelligent speech interaction, speech commands, and speech search.
Real-time Speech-to-Text
It converts an audio stream into text and returns the start and end time of each sentence. It is suited to scenarios such as long-sentence speech input, audio and video subtitles, and meeting records.
Audio File Transcription
It converts audio files uploaded in batches into text and returns the recognition results within 12 hours. It is suited to scenarios such as recording quality inspection and audio content analysis.
Call Center Solution
This end-to-end speech technology solution for call centers includes speech-to-text at an 8 kHz sampling rate and speech synthesis. It helps enterprises integrate speech capabilities into their call centers more efficiently.
Speech Self-training Platform
Using domain-specific text from your business scenarios, it trains a language model with zero code. The trained model recognizes speech content more precisely, improving recognition accuracy in your business field.
Speech Wake-up
It supports wake-up by a specific speech command and lets you customize several wake-up words, ensuring natural and smooth conversation in your application.
Online Text-to-Speech
It offers highly human-like, smooth, and natural speech synthesis services. It meets the speech broadcast requirements of reading applications, purchase order broadcasts, and intelligent hardware.
Offline Text-to-Speech
In environments with weak or no internet access, it lets you perform speech broadcast on intelligent hardware devices. It synthesizes text into an audio file and gives you a stable, consistent, and natural speech synthesis experience.
Speech Translation
By integrating high-precision speech-to-text, text translation, and text-to-speech, it provides developers with online real-time speech translation. It supports four languages: Chinese, English, Japanese, and Cantonese.
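
The 60-second limit on Short Speech-to-Text and the 8 kHz sampling rate mentioned for the Call Center Solution can both be checked locally before any audio is submitted. The following is a minimal sketch that uses only Python's standard `wave` module. The function names (`describe_wav`, `fits_short_recognition`, `make_silent_wav`) are hypothetical helpers written for this illustration and are not part of any Baidu SDK. The sketch also assumes uncompressed WAV input, which this page does not specify.

```python
import io
import wave

# Limit quoted in the product list for Short Speech-to-Text.
SHORT_CLIP_LIMIT_S = 60.0


def describe_wav(source):
    """Return (duration_seconds, sample_rate_hz) for a WAV path or file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return duration, rate


def fits_short_recognition(source, limit_s=SHORT_CLIP_LIMIT_S):
    """True if the clip is strictly shorter than the limit (hypothetical helper)."""
    duration, _ = describe_wav(source)
    return duration < limit_s


def make_silent_wav(seconds, rate_hz=8000):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(seconds * rate_hz))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    clip = make_silent_wav(5)
    duration, rate = describe_wav(clip)
    print(f"{duration:.1f} s at {rate} Hz")
    print("short enough:", fits_short_recognition(make_silent_wav(5)))
```

A clip that fails this check would be a candidate for Real-time Speech-to-Text or Audio File Transcription instead.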

Application Scenarios

Speech Search
It lets you enter search queries by speech. It is used in search scenarios such as web search, in-vehicle search, and mobile search, freeing your hands and making search more efficient. It applies to many industries, including video websites, intelligent hardware, and mobile manufacturers.
Speech Command
It lets you control and operate your device or software by speech, without any manual operation. It applies to many fields, including intelligent hardware, in-vehicle systems, robots, mobile apps, and games.
Live Video Subtitle
As a new form of live video broadcasting, it transcribes the host's speech into on-screen subtitles, which you can also edit.
Audio Content Analysis
It converts audio recordings into text and performs continuous analysis and monitoring, so you can identify risks and illegal content and uncover potential marketing opportunities.
Book Content Broadcast
Text-to-speech technology gives reading apps the ability to read aloud, freeing users’ hands and eyes. A variety of distinctive voices give every story a fitting tone, bringing users a more refined reading experience.
Purchase Order Broadcast
It is used in scenarios such as car-hailing software, restaurant queue number calling, and queuing software. Using text-to-speech, it broadcasts purchase orders so that users receive notifications promptly and conveniently.

Special Advantages

Abundant Interface Features

Speech-to-text supports post-processing capabilities such as punctuation, number format conversion, and timestamp processing. Text-to-speech lets you flexibly set the speed, tone, and volume and mark up polyphonic characters to meet personalized requirements.

Service Stability and High-Efficiency

It features an enterprise-level stable service guarantee, professional server clusters that handle massive concurrent traffic efficiently and flexibly, and a 99.9% service stability guarantee.
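
To put the 99.9% figure in concrete terms, the short sketch below converts a percentage into a downtime budget. It rests on an assumption: this page does not define "service stability", so the sketch reads it as availability measured over a 30-day window. The function name `allowed_downtime_minutes` is hypothetical and chosen for this illustration only.

```python
def allowed_downtime_minutes(availability, period_days=30):
    """Downtime budget in minutes for a given availability fraction.

    Assumes "stability" means availability over `period_days` days,
    which is an interpretation, not something this page states.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_days * 24 * 60


if __name__ == "__main__":
    # 0.1% of a 30-day month (43,200 minutes) is about 43.2 minutes.
    print(round(allowed_downtime_minutes(0.999), 1))
```

Under that reading, 99.9% leaves roughly 43 minutes of unavailability per 30 days. A different measurement window or definition would change the number.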

Support Model Self-Training

Speech-to-text supports self-training of language models on the speech self-training platform. After you upload professional texts from your business area, the zero-code training runs automatically. Generally, it can improve the recognition rate of words in business fields by 5-25%.

Several Calling Methods

It offers several calling methods, including a REST API, a WebSocket API, SDKs for Android, iOS, and Linux, and an offline text-to-speech SDK, to meet the requirements of different terminals.

Relevant Recommendations

Text Review
Based on NLP technologies, it identifies text content involving pornography, terrorism, politics, malicious advertising, abuse, and illegal articles. It supports custom blacklists and whitelists and lets you flexibly adjust the review strategy and strictness.
Video Content Review
It intelligently reviews video content along several dimensions, including pornography, violence, terrorism, politics, advertising, and a user-defined blacklist library. It helps you review the content on your platform.
Application Technology for Natural Language Processing
Geared toward multi-scenario technical applications, it offers NLP capabilities that can be applied directly to product strategy, allowing your products to better understand language and users.
