“Ah Qiang, why has the handwriting board disappeared again?”
Recently, programmer Ah Qiang’s grandmother, always brave enough to try new things, got hooked on online shopping. After spending some effort getting familiar with the shopping app, she found that her supposedly smooth online shopping journey kept getting stuck at the item search. In the handwriting input step, a slip of the finger would switch her to an unfamiliar input method, or she would accidentally tap some cryptic command character on the interface… So Ah Qiang often receives calls for help from his grandmother.
In fact, it is not just shopping apps: the interaction design of most apps on today’s smartphones targets young users, and it is hard for the elderly to experience them and learn how to use them.
After patiently guiding his grandmother through the operation again and again, Ah Qiang, a seasoned coder, set a requirement for himself: improve her online shopping experience. Instead of making his grandmother adapt to the input method, make the input method cater to her preferences.
Manual input is error prone, so why not write a speech-to-text input method? As long as she taps the record button, her speech is recognized and entered as text in real time. Simple and fast; grandma will surely take to it!
Demo
Real-time speech recognition and audio file transcription have a wide range of usage scenarios:
1. Games: in an online match, real-time speech recognition lets you communicate with teammates without taking your hands off the controls, and without the awkwardness of an open mic.
2. Office: in the workplace, taking minutes of a long meeting by hand is inefficient, and details are easily missed. With audio file transcription, you can transcribe the meeting discussion and then organize and polish the text afterwards, getting twice the result with half the effort.
3. Learning: more and more teaching material is delivered as audio, and pausing to take notes while listening interrupts the learning rhythm and breaks the flow. With audio file transcription, you can review and organize the text after studying the material in one sitting, for a better learning experience.
How it works
Huawei Machine Learning services provide real-time speech recognition and audio file transcription capabilities. Real-time speech recognition converts short utterances (up to 60 seconds) into text in real time, with recognition accuracy above 95%. It currently supports Mandarin Chinese, English, mixed Chinese and English, French, German, Spanish, Italian, and Arabic.
- Supports real-time output of recognized text as you speak.
- Provides two modes: with a built-in pickup UI, and without a UI (see the sketch after this list).
- Supports endpoint detection to accurately locate the start and end of speech.
- Supports silence detection: no voice packets are sent for the silent parts of the audio.
- Supports intelligent conversion of number formats: for example, speaking "two zero two one" can be recognized as "2021".
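The development steps below use the plugin with the built-in pickup UI. For the no-UI mode, a minimal sketch based on the ML Kit MLAsrRecognizer API might look like the following; the exact listener callbacks and constants should be checked against the current SDK documentation:
// Create a recognizer without a pickup UI (a sketch; verify names against the SDK docs).
MLAsrRecognizer mSpeechRecognizer = MLAsrRecognizer.createAsrRecognizer(context);
mSpeechRecognizer.setAsrListener(new MLAsrListener() {
    @Override
    public void onResults(Bundle results) {
        // Final recognition result.
        String text = results.getString(MLAsrRecognizer.RESULTS_RECOGNIZED);
    }
    @Override
    public void onRecognizingResults(Bundle partialResults) {
        // Partial (streaming) result, delivered while the user is still speaking.
    }
    @Override
    public void onError(int error, String errorMessage) {
        // Handle recognition errors.
    }
    @Override
    public void onStartListening() { }
    @Override
    public void onStartingOfSpeech() { }
    @Override
    public void onVoiceDataReceived(byte[] data, float energy, Bundle bundle) { }
    @Override
    public void onState(int state, Bundle params) { }
});
// Configure the language and result mode, then start recognizing.
Intent recognizerIntent = new Intent(MLAsrConstants.ACTION_HMS_ASR_SPEECH)
        .putExtra(MLAsrConstants.LANGUAGE, "zh-CN")
        .putExtra(MLAsrConstants.FEATURE, MLAsrConstants.FEATURE_WORDFLUX);
mSpeechRecognizer.startRecognizing(recognizerIntent);
// Call mSpeechRecognizer.destroy() when recognition is no longer needed.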
Audio file transcription can convert audio files of up to 5 hours into text, adding punctuation to form reasonable, easy-to-read sentences. It can also generate text with timestamps, which makes it easy to build further features on top. The current version supports transcription in Chinese and English.
Development steps
1. Preparation before development
- Configure the Maven repository address of the HMS Core SDK and place the agconnect-services.json file in the app directory. Open the project-level build.gradle file in Android Studio and add the HUAWEI AGCP plugin and the Maven repository: configure the Maven repository address of the HMS Core SDK in allprojects > repositories and in buildscript > repositories. If you added the agconnect-services.json file to the app, also add the AGCP configuration in buildscript > dependencies:
buildscript {
    repositories {
        google()
        jcenter()
        maven { url 'https://developer.huawei.com/repo/' }
    }
    dependencies {
        classpath 'com.android.tools.build:gradle:3.5.4'
        classpath 'com.huawei.agconnect:agcp:1.4.1.300'
        // NOTE: Do not place your application dependencies here; they belong
        // in the individual module build.gradle files
    }
}
allprojects {
    repositories {
        google()
        jcenter()
        maven { url 'https://developer.huawei.com/repo/' }
    }
}
See Instructions for Using Cloud Authentication Information to set the authentication information of the application.
- Add the SDK compile dependencies:
dependencies {
    // Audio file transcription SDK.
    implementation 'com.huawei.hms:ml-computer-voice-aft:2.2.0.300'
    // Real-time speech recognition SDK.
    implementation 'com.huawei.hms:ml-computer-voice-asr:2.2.0.300'
    // Real-time speech recognition plugin.
    implementation 'com.huawei.hms:ml-computer-voice-asr-plugin:2.2.0.300'
    ...
}
apply plugin: 'com.huawei.agconnect' // HUAWEI agconnect Gradle plugin
- Configure the signing information in the app-level build.gradle and put the signing file (xxx.jks) in the app directory:
signingConfigs {
release {
storeFile file("xxx.jks")
keyAlias 'xxx'
keyPassword 'xxxxxx'
storePassword 'xxxxxx'
v1SigningEnabled true
v2SigningEnabled true
}
}
buildTypes {
release {
minifyEnabled false
proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
}
debug {
signingConfig signingConfigs.release
debuggable true
}
}
- Add permissions in AndroidManifest.xml:
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.ACCESS_WIFI_STATE" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<application
android:requestLegacyExternalStorage="true"
...
</application>
2. Integrate real-time speech recognition
- Request the recording permission dynamically:
if (ActivityCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
        != PackageManager.PERMISSION_GRANTED) {
    requestCameraPermission();
}

private void requestCameraPermission() {
    final String[] permissions = new String[]{Manifest.permission.RECORD_AUDIO};
    if (!ActivityCompat.shouldShowRequestPermissionRationale(this, Manifest.permission.RECORD_AUDIO)) {
        ActivityCompat.requestPermissions(this, permissions, Constants.AUDIO_PERMISSION_CODE);
        return;
    }
}
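The grant result arrives asynchronously. A minimal sketch of handling the callback, reusing the Constants.AUDIO_PERMISSION_CODE request code from the snippet above (how the app responds to each outcome is up to you):
@Override
public void onRequestPermissionsResult(int requestCode, @NonNull String[] permissions, @NonNull int[] grantResults) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults);
    if (requestCode == Constants.AUDIO_PERMISSION_CODE) {
        if (grantResults.length > 0 && grantResults[0] == PackageManager.PERMISSION_GRANTED) {
            // Permission granted: speech recognition can be started.
        } else {
            // Permission denied: voice input will not be available.
        }
    }
}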
- Create an Intent and set the real-time speech recognition parameters:
// Set the authentication information for your application.
MLApplication.getInstance().setApiKey(AGConnectServicesConfig.fromContext(this).getString("client/api_key"));
// Configure the recognition settings through an Intent.
Intent intentPlugin = new Intent(this, MLAsrCaptureActivity.class)
        // Set the recognition language to Chinese. If not set, English is recognized by default.
        // Supported values include "zh-CN" (Chinese) and "en-US" (English).
        .putExtra(MLAsrCaptureConstants.LANGUAGE, MLAsrConstants.LAN_ZH_CN)
        // Set whether the pickup UI displays the recognition result.
        .putExtra(MLAsrCaptureConstants.FEATURE, MLAsrCaptureConstants.FEATURE_WORDFLUX);
startActivityForResult(intentPlugin, 1);
- Override the onActivityResult method to handle the result returned by the speech recognition service:
@Override
protected void onActivityResult(int requestCode, int resultCode, @Nullable Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    String text = "";
    if (null == data) {
        addTagItem("Intent data is null.", true);
    }
    if (requestCode == 1) {
        if (data == null) {
            return;
        }
        Bundle bundle = data.getExtras();
        if (bundle == null) {
            return;
        }
        switch (resultCode) {
            // MLAsrCaptureConstants.ASR_SUCCESS indicates that recognition succeeded.
            case MLAsrCaptureConstants.ASR_SUCCESS:
                // Obtain the text recognized from the speech.
                if (bundle.containsKey(MLAsrCaptureConstants.ASR_RESULT)) {
                    text = bundle.getString(MLAsrCaptureConstants.ASR_RESULT);
                }
                if (text == null || "".equals(text)) {
                    text = "Result is null.";
                    Log.e(TAG, text);
                } else {
                    // Set the speech recognition result in the search box.
                    searchEdit.setText(text);
                    goSearch(text, true);
                }
                break;
            // MLAsrCaptureConstants.ASR_FAILURE indicates that recognition failed.
            case MLAsrCaptureConstants.ASR_FAILURE:
                // Check whether an error code is contained.
                if (bundle.containsKey(MLAsrCaptureConstants.ASR_ERROR_CODE)) {
                    text = text + bundle.getInt(MLAsrCaptureConstants.ASR_ERROR_CODE);
                    // Handle the error code.
                }
                // Check whether an error message is contained.
                if (bundle.containsKey(MLAsrCaptureConstants.ASR_ERROR_MESSAGE)) {
                    String errorMsg = bundle.getString(MLAsrCaptureConstants.ASR_ERROR_MESSAGE);
                    // Handle the error message.
                    if (errorMsg != null && !"".equals(errorMsg)) {
                        text = "[" + text + "]" + errorMsg;
                    }
                }
                // Check whether a sub-error code is contained.
                if (bundle.containsKey(MLAsrCaptureConstants.ASR_SUB_ERROR_CODE)) {
                    int subErrorCode = bundle.getInt(MLAsrCaptureConstants.ASR_SUB_ERROR_CODE);
                    // Handle the sub-error code.
                    text = "[" + text + "]" + subErrorCode;
                }
                Log.e(TAG, text);
                break;
            default:
                break;
        }
    }
}
3. Integrate audio file transcription
- Request the storage permissions dynamically:
private static final int REQUEST_EXTERNAL_STORAGE = 1;
private static final String[] PERMISSIONS_STORAGE = {
Manifest.permission.READ_EXTERNAL_STORAGE,
Manifest.permission.WRITE_EXTERNAL_STORAGE };
public static void verifyStoragePermissions(Activity activity) {
// Check if we have write permission
int permission = ActivityCompat.checkSelfPermission(activity,
Manifest.permission.WRITE_EXTERNAL_STORAGE);
if (permission != PackageManager.PERMISSION_GRANTED) {
// We don't have permission so prompt the user
ActivityCompat.requestPermissions(activity, PERMISSIONS_STORAGE,
REQUEST_EXTERNAL_STORAGE);
}
}
- Create an audio file transcription engine and initialize it, and create an audio file transcription configurator:
// Set the API key.
MLApplication.getInstance().setApiKey(AGConnectServicesConfig.fromContext(getApplication()).getString("client/api_key"));
MLRemoteAftSetting setting = new MLRemoteAftSetting.Factory()
        // Set the transcription language code, using the BCP 47 standard. Mandarin Chinese and English are currently supported.
        .setLanguageCode("zh")
        // Set whether to automatically add punctuation to the output text. Default is false.
        .enablePunctuation(true)
        // Set whether to output the time offset of each audio segment's text. Default is false.
        // (This parameter only needs to be set for audio shorter than 1 minute.)
        .enableWordTimeOffset(true)
        // Set whether to output the time offset of each sentence in the audio file. Default is false.
        .enableSentenceTimeOffset(true)
        .create();
// Create an audio file transcription engine.
MLRemoteAftEngine engine = MLRemoteAftEngine.getInstance();
engine.init(this);
// Pass the listener callback defined in the next step to the engine.
engine.setAftListener(aftListener);
- Create a listener callback to handle the transcription result. Short audio transcription, for audio files shorter than 1 minute:
private MLRemoteAftListener aftListener = new MLRemoteAftListener() {
    @Override
    public void onResult(String taskId, MLRemoteAftResult result, Object ext) {
        // Get the notification of the transcription result.
        if (result.isComplete()) {
            // Process the transcription result.
        }
    }

    @Override
    public void onError(String taskId, int errorCode, String message) {
        // Transcription error callback.
    }

    @Override
    public void onInitComplete(String taskId, Object ext) {
        // Reserved interface.
    }

    @Override
    public void onUploadProgress(String taskId, double progress, Object ext) {
        // Reserved interface.
    }

    @Override
    public void onEvent(String taskId, int eventId, Object ext) {
        // Reserved interface.
    }
};
Long audio transcription, for audio files longer than 1 minute:
private MLRemoteAftListener asrListener = new MLRemoteAftListener() {
    @Override
    public void onInitComplete(String taskId, Object ext) {
        Log.e(TAG, "MLAsrCallBack onInitComplete");
        // The engine is initialized; start the transcription task.
        start(taskId);
    }

    @Override
    public void onUploadProgress(String taskId, double progress, Object ext) {
        Log.e(TAG, "MLAsrCallBack onUploadProgress");
    }

    @Override
    public void onEvent(String taskId, int eventId, Object ext) {
        Log.e(TAG, "MLAsrCallBack onEvent " + eventId);
        if (MLAftEvents.UPLOADED_EVENT == eventId) {
            // The file has been uploaded; start querying the transcription result.
            startQueryResult(taskId);
        }
    }

    @Override
    public void onResult(String taskId, MLRemoteAftResult result, Object ext) {
        Log.e(TAG, "MLAsrCallBack onResult taskId is :" + taskId + " ");
        if (result != null) {
            Log.e(TAG, "MLAsrCallBack onResult isComplete: " + result.isComplete());
            if (result.isComplete()) {
                TimerTask timerTask = timerTaskMap.get(taskId);
                if (null != timerTask) {
                    timerTask.cancel();
                    timerTaskMap.remove(taskId);
                }
                if (result.getText() != null) {
                    Log.e(TAG, taskId + " MLAsrCallBack onResult result is : " + result.getText());
                    tvText.setText(result.getText());
                }
                List<MLRemoteAftResult.Segment> words = result.getWords();
                if (words != null && words.size() != 0) {
                    for (MLRemoteAftResult.Segment word : words) {
                        Log.e(TAG, "MLAsrCallBack word text is : " + word.getText() + ", startTime is : " + word.getStartTime() + ". endTime is : " + word.getEndTime());
                    }
                }
                List<MLRemoteAftResult.Segment> sentences = result.getSentences();
                if (sentences != null && sentences.size() != 0) {
                    for (MLRemoteAftResult.Segment sentence : sentences) {
                        Log.e(TAG, "MLAsrCallBack sentence text is : " + sentence.getText() + ", startTime is : " + sentence.getStartTime() + ". endTime is : " + sentence.getEndTime());
                    }
                }
            }
        }
    }

    @Override
    public void onError(String taskId, int errorCode, String message) {
        Log.i(TAG, "MLAsrCallBack onError : " + message + "errorCode, " + errorCode);
        switch (errorCode) {
            case MLAftErrors.ERR_AUDIO_FILE_NOTSUPPORTED:
                break;
        }
    }
};

// Start the transcription task.
private void start(String taskId) {
    Log.e(TAG, "start");
    engine.setAftListener(asrListener);
    engine.startTask(taskId);
}

// Poll for the transcription result.
private Map<String, TimerTask> timerTaskMap = new HashMap<>();

private void startQueryResult(final String taskId) {
    Timer mTimer = new Timer();
    TimerTask mTimerTask = new TimerTask() {
        @Override
        public void run() {
            getResult(taskId);
        }
    };
    // Query the result every 10 seconds, starting after a 5-second delay.
    mTimer.schedule(mTimerTask, 5000, 10000);
    timerTaskMap.put(taskId, mTimerTask);
}
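The polling task above calls a getResult helper that the original snippet does not define. A minimal sketch, assuming the engine's getLongAftResult query interface (the result is delivered asynchronously to asrListener.onResult):
private void getResult(String taskId) {
    Log.e(TAG, "getResult");
    // Query the long audio transcription result; the engine reports it back through asrListener.
    engine.getLongAftResult(taskId);
}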
- Get the audio file and upload it to the transcription engine:
// Get the URI of the audio file.
Uri uri = getFileUri();
// Get the duration of the audio file.
Long audioTime = getAudioFileTimeFromUri(uri);
// Check whether the audio is shorter than 60 seconds.
if (audioTime < 60000) {
    // shortRecognize is the short audio transcription interface; it only supports local audio shorter than 1 minute.
    this.taskId = this.engine.shortRecognize(uri, this.setting);
    Log.i(TAG, "Short audio transcription.");
} else {
    // longRecognize is the long audio transcription interface, for audio longer than 1 minute and shorter than 5 hours.
    this.taskId = this.engine.longRecognize(uri, this.setting);
    Log.i(TAG, "Long audio transcription.");
}

private Long getAudioFileTimeFromUri(Uri uri) {
    Long time = null;
    Cursor cursor = this.getContentResolver().query(uri, null, null, null, null);
    if (cursor != null) {
        cursor.moveToFirst();
        time = cursor.getLong(cursor.getColumnIndexOrThrow(MediaStore.Video.Media.DURATION));
    } else {
        MediaPlayer mediaPlayer = new MediaPlayer();
        try {
            mediaPlayer.setDataSource(String.valueOf(uri));
            mediaPlayer.prepare();
        } catch (IOException e) {
            Log.e(TAG, "Failed to read the file time.");
        }
        time = Long.valueOf(mediaPlayer.getDuration());
    }
    return time;
}
Visit the Huawei Developer Alliance website to learn more.
Get the development guidance documents.
Huawei Mobile Services open source repositories:
GitHub,
Gitee