Android Speech To Text Tutorial
Android comes with a built-in speech-to-text feature through which you can provide speech input to your app. With it you can add cool features to your app, such as voice navigation or filling in a form with voice input.
In the background, voice input works like this: the speech input is streamed to a server, where the audio is converted to text, and the resulting text is sent back to our app.
If you want to go the other way, i.e., convert text to speech, follow my previous tutorial, Android Text to Speech.
I have created a simple app to demonstrate this tutorial. Below is a screenshot of the app, which contains a simple button to invoke speech input and a TextView to display the converted speech text.
So let's start by creating a simple app.
English Voice Typing Keyboard
English Voice Typing Keyboard is a voice-to-text converter that instantly converts spoken words into text with high accuracy.
With the advancement of technology and the rapid pace of the world, the English Voice Typing Keyboard's voice-to-text feature will make your life easier. Voice-to-text apps can be a treat for busy professionals who don't even find time to have a conversation with their loved ones. Voice typing is actually a speech recognition tool that records, analyzes, and interprets the phrases and words you speak, converting your voice into words much faster than it would take you to type. This feature is useful for visually impaired people to take notes and convey their messages in the easiest way.

Voice typing in English will increase your confidence in speaking English: if you do not understand a phrase, word, or sentence, it will confirm it and give alternative suggestions. With each update, the app's developers try to innovate new core features. In addition to voice typing, it also has built-in aesthetic wallpapers, funky stickers, and cute emojis that will blow your mind. The application is very convenient to use while dealing with clients who do not speak the same language as you, and useful for those who have moved abroad for study or business purposes. Speechnotes is exemplary for codifying long notes, is a delight for students taking notes, and will save them in chats for later.
Say The Word To Trigger Speech Recognition Inside Your Application
Voice recognition has gained a lot of traction over the past few years. When building an app where you feel speech recognition would boost your user experience, you can either:
- Integrate the SpeechRecognizer API.
- Leverage Google Assistant.
Implementing SpeechRecognizer in your Android application is straightforward. I'll provide a detailed implementation later in the article.
However, we want continuous voice recognition. Unfortunately, the API doesn't provide a mechanism to trigger voice recognition with a keyword, even though all mainstream voice assistants are based on this pattern, whether it's "Ok Google" for Google Assistant, "Hey Siri" for iOS, or "Alexa" for Amazon devices.
For that, the second option should fit our needs. Sadly, Google Assistant remains a closed API and doesn't offer many possibilities. It provides App Actions, but you won't achieve continuous voice recognition with them.
I was excited when I first came across VoiceInteractionService. It seemed to do what I wanted with its AlwaysOnHotwordDetector. Unfortunately, it's tightly bound to Google Assistant, only letting you integrate a custom assistant.
So far, we can use SpeechRecognizer, but we still need to trigger speech recognition from a user interaction.
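As a reference point, a minimal sketch of triggering SpeechRecognizer from a user interaction might look like the following. Names such as VoiceActivity and startListening are illustrative, and a real app must also request the RECORD_AUDIO permission at runtime; this is a sketch, not a complete implementation.

```kotlin
import android.app.Activity
import android.content.Intent
import android.os.Bundle
import android.speech.RecognitionListener
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer

class VoiceActivity : Activity() {
    private lateinit var recognizer: SpeechRecognizer

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // Create the recognizer once and register a listener for results.
        recognizer = SpeechRecognizer.createSpeechRecognizer(this)
        recognizer.setRecognitionListener(object : RecognitionListener {
            override fun onResults(results: Bundle?) {
                val matches = results
                    ?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
                // matches?.firstOrNull() is the most likely transcription.
            }
            // Remaining callbacks left empty for brevity.
            override fun onReadyForSpeech(params: Bundle?) {}
            override fun onBeginningOfSpeech() {}
            override fun onRmsChanged(rmsdB: Float) {}
            override fun onBufferReceived(buffer: ByteArray?) {}
            override fun onEndOfSpeech() {}
            override fun onError(error: Int) {}
            override fun onPartialResults(partialResults: Bundle?) {}
            override fun onEvent(eventType: Int, params: Bundle?) {}
        })
    }

    // Call this from a button's OnClickListener (the user interaction).
    private fun startListening() {
        val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
            putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
        }
        recognizer.startListening(intent)
    }

    override fun onDestroy() {
        recognizer.destroy()
        super.onDestroy()
    }
}
```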
Also Check: Verizon Deals On Android Phones
How To Develop A Speech Recognizer In Android Without The Google API In Kotlin
This example demonstrates how to develop a speech recognizer in Android without the Google API in Kotlin.
Step 1 − Create a new project in Android Studio, go to File → New Project and fill in all the required details to create a new project.
Step 2 Add the following code to res/layout/activity_main.xml.
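The layout referenced in Step 2 is not reproduced above. A minimal sketch of what res/layout/activity_main.xml could contain, matching the app described earlier (a button to invoke speech input and a TextView for the result), is shown below; the view IDs and label text are illustrative assumptions.

```xml
<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:gravity="center"
    android:orientation="vertical">

    <!-- Button that starts speech recognition -->
    <Button
        android:id="@+id/btnSpeak"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Tap to Speak" />

    <!-- TextView that displays the converted speech text -->
    <TextView
        android:id="@+id/tvResult"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:paddingTop="16dp"
        android:text="" />
</LinearLayout>
```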
Google Docs Voice Typing
Google Docs has become an integral part of most content writers' lives, especially if they are already Google services users. So if you use Google products such as Gmail and Google Drive, and need a built-in, powerful, yet free dictation tool, consider using Google Docs or Google Slides and their Voice Typing tool. It enables you to type with your voice and use over 100 voice commands meant explicitly for editing and formatting your documents in any way you like, including making bullet points, changing the style of the text, and moving the cursor to different parts of the document.
To use Voice Typing in Google Docs, all you have to do is click the Tools menu, select Voice Typing, and then allow Google access to your laptop or PC's microphone.
Compatibility: Any Google Chrome compatible device
Don’t Miss: Android Text Messages On Pc
Windows Speech Recognition :
Windows Speech Recognition is good speech recognition software, especially because it is specifically designed to work with Windows, and it works best in its newest update with Windows 10. Most people reviewed it as good, not great, and claimed that it is on par with Google Docs Voice Typing, essentially a Windows equivalent at the same level.
The advantages specific to WSR are its computer automation and related features: because it is integrated into and designed for the Windows operating system, it has complete control over the computer and its features, such as sleep or shutdown options. In addition, it gives the user text-editing options, whereby any mistakes can be corrected there and then.
Some downsides, though, include the fact that it is not the most accurate voice recognition software on the market, as its accuracy is on the weaker side, and it cannot be used with other operating systems if need be.
Its unique selling point is that it can control the whole computer through the software options and can edit as you go. It is also free of cost, without additional charges, and works fine with Windows 10.
Learn Mobile Development And Start Your Free Trial Today
The Android Speech API provides recognition control, background services, intents, and support for multiple languages. It can look like a simple addition to your apps' user input, but it's a very powerful feature that makes them stand out. Imagine how helpful this feature can be for people with disabilities who struggle with a keyboard, or simply for those trying to find a way to increase productivity and improve their workflow.
Recommended Reading: News Update Apps For Android
Configuring The Android App
The Playchat sample app requires the microservice URL to enable speech translation features. To retrieve the microservice URL:
To configure the app to work with the microservice, open the app/src/main/res/values/speech_translation.xml file in the firebase-android-client repo and update the speechToSpeechEndpoint field with the microservice URL.
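As an illustration, the updated resource file could look like the following. Only the speechToSpeechEndpoint field name comes from the sample project; the exact resource structure and placeholder URL are assumptions.

```xml
<?xml version="1.0" encoding="utf-8"?>
<resources>
    <!-- Replace with the URL of your deployed speech-translation microservice -->
    <string name="speechToSpeechEndpoint">https://YOUR_MICROSERVICE_URL/</string>
</resources>
```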
What Can The Google Audio Transcription Api Do For Our Applications
Google's Audio Transcription API allows your program to use all the power of Google's huge computing resources to accurately transcribe speech found in audio files into text.
As Google's documentation says:
- Transcribe your content in real time or from stored files
- Deliver a better user experience in products through voice commands
- Gain insights from customer interactions to improve your service
How Can We Use The Audio Transcription API With Delphi?
Google Cloud services have positioned themselves as a must-have computing solution today. They allow us to easily use Google's well-designed AI solutions in our applications. Not only that, the prices are also reasonable, and you can start with zero payment by just adding your credit card.
But how about getting those amazing functions into our Delphi application? Some people think that Delphi is not the ideal language for working with these popular cloud computing APIs. That's not true. With the help of Delphi's huge community, it's simpler than you think.
Google Speech to Text and Text to Speech are two cloud computing functions that could be vital for some business applications. With the help of a few repositories by grijjy on GitHub, we can easily get those functionalities into our Delphi application. The repository actually dates from 2017, but surprisingly it works perfectly with newer versions of Delphi without many changes to the code.
Don’t Miss: How Do I Set Up Parental Controls On Android
Not Working On An Android Emulator
The above tip about getting it working on an Android device is also useful for emulators. Some users have reported seeing another error on Android emulators (sdk gphone x86): AUDIO_RECORD permissions were in the manifest, and mic permissions were also manually set in Android Settings. When running the sample app, Initialize works, but Start fails and the log looks as follows.
D/SpeechToTextPlugin: put partial
D/SpeechToTextPlugin: put languageTag
D/SpeechToTextPlugin: Error 9 after start at 35 1000.0 / -100.0
D/SpeechToTextPlugin: Cancel listening
It was resolved by opening the Google app, clicking the mic icon and granting it permissions, after which everything in the app worked.
This is the SO post that helped:
Ensure the app has the required permissions. The symptom of this is a permanent `error_audio_error` notification when starting a listen session. There's a Stack Overflow post that addresses it; here's the important excerpt:
You should go to system settings → Apps → Google app, then enable its microphone permission.
User-reported steps
From issue #298, this is the detailed set of steps that resolved their issue:
Top Text To Speech Apis
Text-to-speech services convert text into spoken-word audio. The technology is useful for providing content accessibility to people with visual impairments, reading impairments such as dyslexia, or speaking impairments, as well as for studying languages, playing video games, language translation, and other uses.
Developers wishing to enhance applications with TTS services can tap into APIs for help.
Also Check: How To Add Ads In Android App
Voice Recognition Apis For Longform And Offline Processing
4. IBM Watson
It's no secret we're generating, processing, and analyzing larger quantities of data than at any other time in history. Not all of that data is going to be clean and well-organized, especially if you're designing or developing an API. As API developers, it's our job to make sure that the data is organized and usable.
IBM Watson is perhaps one of the purest expressions of AI as a virtual assistant. It is very adept at processing natural language patterns, which is one of the holy grails for AI and machine learning developers.
The IBM Watson Speech to Text API is particularly robust in understanding context, relying on hypothesis generation and evaluation in its response formulation. It's also able to differentiate between multiple speakers, which makes it suitable for most transcription tasks. You can even set a number of filters: eliminating profanities, adding word confidence scores, and applying formatting options for speech-to-text applications.
IBM Watson offers three different interfaces for developers: a WebSocket interface, an HTTP REST interface, and an asynchronous HTTP interface.
As one of the best-developed machine learning APIs out there, IBM Watson isn't cheap. It is quick to get up and running, however, meaning you won't waste money on downtime or on hiring multiple developers just to get started. The peace of mind of a nearly plug-and-play speech-to-text API may be worth the cost of admission alone.
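As a rough sketch of the HTTP REST interface mentioned above, a synchronous recognize call from Kotlin might look like the following. The serviceUrl and apiKey values are placeholders you would obtain from your own IBM Cloud service credentials, and error handling and streaming are omitted; treat this as a sketch under those assumptions, not a definitive client.

```kotlin
import java.io.File
import java.net.HttpURLConnection
import java.net.URL
import java.util.Base64

// Posts a WAV file to the Watson Speech to Text /v1/recognize endpoint
// and returns the raw JSON response (which contains transcript alternatives).
fun transcribe(serviceUrl: String, apiKey: String, audioFile: File): String {
    val conn = URL("$serviceUrl/v1/recognize").openConnection() as HttpURLConnection
    conn.requestMethod = "POST"
    // IAM basic auth uses the literal user name "apikey".
    val auth = Base64.getEncoder().encodeToString("apikey:$apiKey".toByteArray())
    conn.setRequestProperty("Authorization", "Basic $auth")
    conn.setRequestProperty("Content-Type", "audio/wav")
    conn.doOutput = true
    audioFile.inputStream().use { it.copyTo(conn.outputStream) }
    return conn.inputStream.bufferedReader().readText()
}
```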
Best Speech To Text Software Faqs:
Is there speech to text on Microsoft Word?
Yes, dictation technology is available for Microsoft Word independently and as a part of Windows 10. Just press the Windows and H keys to launch the toolbar and start speaking. However, it is best to use the Microsoft Office speech-to-text tool, since it will work seamlessly with any Office product. Here's how you can activate the dictation feature if you are an Office 365 subscriber.
What is the best voice recognition software for Mac?
The best speech-to-text software for Mac systems is the built-in Apple Dictation feature. To use it, go to the Apple menu to activate it and enjoy.
Recommended Reading: What Is The Cheapest Android Phone To Buy
How To Code The Delphi Audio Transcription Project
Now let's move to the coding part. We use an instance of the TgoGoogle class to post data and get the response from the Speech-to-Text API. We need to set some parameters of TgoGoogle.
TgoGoogle.OAuthScope − the OAuth scope of the API we are going to use. This is the list of scopes for the different APIs.
We use https://www.googleapis.com/auth/cloud-platform in our application to access the Cloud Speech-to-Text API.
TgoGoogle.ServiceAccount − the ID of the service account we created on Google Cloud earlier.
TgoGoogle.PrivateKey − the private key we created earlier. It's a file with a PEM extension. In our demo application, we let the user browse for the file at runtime.
We can get the PEM file path using this simple code.
Configuring The Default Bucket On Cloud Storage For Firebase
The microservice uses the default Cloud Storage bucket in the Firebase project to store the translated audio files. You must enable read access for the user accounts that want to retrieve the audio files.
To enable read access, you need the Firebase user UID of the account. To retrieve the user UID:
To enable read access for the user account, you must create a storage security rule. To create a security rule:
In the Storage page, go to the Rules section and add the following rule inside the service firebase.storage section:
match /b/{bucket}/o {
  match /{allPaths=**} {
    allow read: if request.auth.uid == "ACCOUNT_USER_UID";
  }
}
Replace ACCOUNT_USER_UID with the user UID value obtained in the previous steps.
For more information, see Get Started with Storage Security Rules in the Firebase documentation.
Also Check: Google Play App For Android
Conversion Of The Speech
The text interpreted from the speech will be delivered within the Intent, which is returned when the activity has been completed and is accessed via GetStringArrayListExtra. This will return an IList<string>, of which an index can be used and displayed, depending on the number of languages requested in the caller intent. As with any list, though, it is worth checking to ensure that there is data to be displayed.
When listening for the return value of StartActivityForResult, the OnActivityResult method has to be overridden.
In the example below, textBox is a TextBox used for outputting what has been dictated. It could equally be used to pass the text to some form of interpreter, from which the application can compare the text and branch to another part of the application.
// VOICE is the request code used when the recognition activity was started.
protected override void OnActivityResult(int requestCode, Result resultCode, Intent data)
{
    if (requestCode == VOICE && resultCode == Result.Ok)
    {
        var matches = data.GetStringArrayListExtra(RecognizerIntent.ExtraResults);
        if (matches.Count != 0)
        {
            textBox.Text = matches[0];
        }
        else
        {
            textBox.Text = "No speech was recognised";
        }
    }
    base.OnActivityResult(requestCode, resultCode, data);
}
Responsivevoice Text To Speech Api
The ResponsiveVoice Text-To-Speech API is a cross-platform, HTML5-based library that supports 51 languages. It is open-sourced for non-commercial and non-profit use. It includes speech synthesis and speech recognition with lifelike digital human voices and is designed to voice-enable websites and applications.
Recommended Reading: Free Deer Hunting Apps For Android
Basic Voice Recognition Example
We are now ready to start a basic example utilizing the voice recognition feature. RecognizerIntent.ACTION_RECOGNIZE_SPEECH is the intent action defining the request. The only requirement is to specify RecognizerIntent.EXTRA_LANGUAGE_MODEL, which is assigned RecognizerIntent.LANGUAGE_MODEL_FREE_FORM in our case. If another language is needed, you can supply it via RecognizerIntent.EXTRA_LANGUAGE; otherwise, the recognizer will simply use the default locale. To make the example more interesting, we also use RecognizerIntent.EXTRA_PROMPT to display a question. Then, we can start the recognition intent.
Once the recognition results are returned, they are saved in the data bundle associated with RecognizerIntent.EXTRA_RESULTS. In this example, we basically check whether the answer contains the substring "Amazon". Depending on your voice input, it will respond with the appropriate message on screen. The code is implemented in Listing 3.
When the app is run, it will prompt you with the question message and a microphone icon waiting for you to say something, as in Figure 4. In Figure 5, I intentionally responded with "Google", which does not contain the substring "Amazon", and therefore the result message was displayed accordingly.
Listing 3: Basic Voice Recognition Example
Figure 4: Voice Recognition in Action
Figure 5: Voice Recognition Result
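A minimal Kotlin sketch of the flow described in this section follows. The request code, prompt text, and class name are illustrative assumptions rather than a reproduction of the original Listing 3.

```kotlin
import android.app.Activity
import android.content.Intent
import android.speech.RecognizerIntent
import android.widget.Toast

class RecognizeActivity : Activity() {
    private val requestSpeech = 100  // arbitrary request code

    // Launch the recognizer with a prompt question, as described above.
    private fun askQuestion() {
        val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
            putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
            putExtra(RecognizerIntent.EXTRA_PROMPT, "Which company owns Alexa?")
        }
        startActivityForResult(intent, requestSpeech)
    }

    // Check whether any returned result contains the substring "Amazon".
    override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
        if (requestCode == requestSpeech && resultCode == RESULT_OK) {
            val results = data?.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS)
            val correct = results?.any { it.contains("Amazon", ignoreCase = true) } == true
            Toast.makeText(this,
                if (correct) "Correct!" else "Try again",
                Toast.LENGTH_SHORT).show()
        }
        super.onActivityResult(requestCode, resultCode, data)
    }
}
```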
Oddcast Text To Speech Api
Recommended Reading: Truck Stop Apps For Android
Incorrect Swift Version Trying To Compile For iOS
/Users/markvandergon/flutter/.pub-cache/hosted/pub.dartlang.org/speech_to_text-1.1.0/ios/Classes/SwiftSpeechToTextPlugin.swift:224:44: error: value of type 'SwiftSpeechToTextPlugin' has no member 'AVAudioSession'
        rememberedAudioCategory = self.AVAudioSession.Category
                                  ~~~~ ^~~~~~~~~~~~~~
/Users/markvandergon/flutter/.pub-cache/hosted/pub.dartlang.org/speech_to_text-1.1.0/ios/Classes/SwiftSpeechToTextPlugin.swift:227:63: error: type 'Int' has no member 'notifyOthersOnDeactivation'
        try self.audioSession.setActive
This happens when the Swift language version is not set correctly. See this thread for help.
Create New Android Project
A new project will be created, and Gradle will resolve all the dependencies.
Don’t Miss: What’s The Best Android Tablet To Buy