I'm planning to transcribe speech in an unknown language, so I'm trying to detect the spoken language automatically by supplying multiple language codes. However, I can't find an option that tells me which language the transcription will actually be in.
I've looked through the developer page of the Speech-to-Text API, but I can't find a way to output the language code of the transcribed text.
Could anyone help me with this?
Thank you.
In general, the language code is returned with the results. For example, see the sample code here, which shows how to retrieve the language code from the results.
However, see the issue mentioned here: the language code is not always returned when multiple languages are specified. As reported in the comments, this is an issue with the Google Speech API, which is tracked here.
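For illustration, this is roughly what reading the detected language can look like on the client side. It is only a sketch: it assumes the JSON shape of the speech:recognize REST response (results[].languageCode, results[].alternatives[].transcript), and the Swift types below are placeholders rather than part of any official client.

import Foundation

// Minimal model of the parts of a recognize response we care about.
struct RecognizeResponse: Decodable {
    struct Alternative: Decodable {
        let transcript: String
    }
    struct Result: Decodable {
        let alternatives: [Alternative]
        // Present when the request supplied alternative language codes;
        // this is the language the service detected for this result.
        let languageCode: String?
    }
    let results: [Result]
}

// jsonData is assumed to be the raw body returned by a speech:recognize call.
func detectedLanguages(from jsonData: Data) throws -> [(language: String, transcript: String)] {
    let response = try JSONDecoder().decode(RecognizeResponse.self, from: jsonData)
    return response.results.compactMap { result -> (language: String, transcript: String)? in
        guard let language = result.languageCode,
              let best = result.alternatives.first else { return nil }
        return (language, best.transcript)
    }
}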
I'm trying to build a feature like the Google Translate app, where the user can download multiple languages and see translations in them.
More specifically, I need to implement offline language translation: a user writes some text and wants to translate it into another language (say Spanish or German) without an internet connection.
Is there any way to do that? I haven't been able to find anything about this. Please point me in the right direction if you know of something.
Thanks.
I have not come across any solution that provides this functionality out of the box, although, as with Google Translate, you do need to download the required language pack once; after that you can use it offline. Language packs can also be huge, so you definitely cannot keep all of them bundled in your application at once.
If this is your requirement, you can check out Google's ML Kit Translator for iOS. It is pretty neat, and so is the documentation.
https://developers.google.com/ml-kit/language/translation/ios
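As a rough sketch of what the on-device flow looks like with ML Kit (based on the linked documentation; the English-to-Spanish pair, the sample sentence, and the download conditions are just placeholders):

import MLKitTranslate

// Configure an English -> Spanish translator (placeholder language pair).
let options = TranslatorOptions(sourceLanguage: .english, targetLanguage: .spanish)
let translator = Translator.translator(options: options)

// Download the language model once (Wi-Fi only here); afterwards translation works offline.
let conditions = ModelDownloadConditions(allowsCellularAccess: false,
                                         allowsBackgroundDownloading: true)
translator.downloadModelIfNeeded(with: conditions) { error in
    guard error == nil else {
        print("Model download failed: \(error!)")
        return
    }
    // Translate entirely on-device once the model is available.
    translator.translate("Where is the train station?") { translatedText, error in
        if let translatedText = translatedText {
            print(translatedText)
        }
    }
}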
I've noticed that certain apps on Android (e.g. Gboard) support converting phrases such as 'poop emoji' into the actual emoji as part of speech recognition. I was wondering whether this is something that is supported through Google's Cloud Speech APIs that I could similarly use in my own applications?
In my initial scan of the API I can't see anything that might indicate a way to turn this on (e.g. RecognitionConfig et al. have no obvious toggles for it), and in some quick one-off tests in my own app the service didn't return emoji-fied results.
I've done a bunch of googling but found nothing so far.
Any insight here would be awesome, thanks!
Edit: Thanks to the answer below I have learned that this is currently not supported. I've gone to Google's issue tracker to request this feature. If anyone wishes to track the feature request, the link is:
https://issuetracker.google.com/u/1/issues/113978818
The Cloud Speech-to-Text API doesn't currently support recognition of emoji phrases. However, you can use the Send Feedback button located at the lower-left and upper-right corners of the service's public documentation, or take a look at the Issue Tracker tool if you want to raise a Speech API feature request to notify Google about this desired functionality.
Finally, you can refer to the Release Notes section of the Speech-to-Text API to keep track of the new features and functionality added to the service.
A simple search for "how Alexa works" yielded no results, so here it is.
If you go through the documentation for utterances, the need to exhaustively list out all possible variations seems ridiculous. For example, you need to list the following variations separately to support them:
what's my horoscope
what is my horoscope
what my horoscope is
Maybe I didn't interpret the documentation correctly, but I'm just curious as to where exactly the machine learning algorithms come in for identifying intents and skills.
Any pointers to helpful resources will be fine too.
Just pure pattern matching on the transcribed text. We are still in the 21st century...
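Purely as an illustration of what that amounts to (this is not Alexa's actual implementation, just a toy sketch of matching a transcript against explicitly listed sample utterances):

import Foundation

// Toy example: an intent only matches if the transcript equals one of the
// explicitly listed sample utterances, which is why every variation must be listed.
let horoscopeUtterances: Set<String> = [
    "what's my horoscope",
    "what is my horoscope",
    "what my horoscope is",
]

func intent(for transcript: String) -> String? {
    let normalized = transcript.lowercased()
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return horoscopeUtterances.contains(normalized) ? "GetHoroscopeIntent" : nil
}

print(intent(for: "What is my horoscope") ?? "no match")   // GetHoroscopeIntent
print(intent(for: "Tell me my horoscope") ?? "no match")   // no match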
There is a lot of mystery around CC608 usage under iOS.
Apple's UsingHLS documentation suggests declaring them in the manifest like this:
#EXT-X-MEDIA:TYPE=CLOSED-CAPTIONS,GROUP-ID="cc",NAME="CC1",LANGUAGE="en",DEFAULT=YES,AUTOSELECT=YES,INSTREAM-ID="CC1"
#EXT-X-MEDIA:TYPE=CLOSED-CAPTIONS,GROUP-ID="cc",NAME="CC2",LANGUAGE="sp",AUTOSELECT=YES,INSTREAM-ID="CC2"
#EXT-X-STREAM-INF:BANDWIDTH=1000000,SUBTITLES="subs",CLOSED-CAPTIONS="cc"
x.m3u8
But Apple's official sample stream does include CC608 captions embedded in the MPEG stream, and still they didn't list them in their manifest!
On that sample stream, I can turn CC608 on using closedCaptionDisplayEnabled=YES, but this method does not allow selection of a specific language.
In Apple's dev forum I found this question with a promising answer:
Are you still calling "player?.closedCaptionDisplayEnabled=true"?
There's no need to do that. If you author your HLS playlist properly
with the appropriate language tags, the user can enable captions in
the language of their choice, or disable them completely as well.
I have failed to find an API in iOS that will allow me to:
Read the list of available CC608 streams
Activate CC for a specific language
Would appreciate your help with this!
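For reference, the closest thing I could find is AVFoundation's media-selection API, sketched below (streamURL is a placeholder, and I have not confirmed that it surfaces embedded CC608 tracks that are not declared in the manifest):

import AVFoundation

let streamURL = URL(string: "https://example.com/master.m3u8")!   // placeholder
let asset = AVURLAsset(url: streamURL)
let playerItem = AVPlayerItem(asset: asset)

// List the "legible" options (subtitles and closed captions) the asset exposes.
if let group = asset.mediaSelectionGroup(forMediaCharacteristic: .legible) {
    for option in group.options {
        print(option.displayName, option.extendedLanguageTag ?? "und")
    }

    // Try to pick a Spanish caption track, if one is declared.
    let spanishOptions = AVMediaSelectionGroup.mediaSelectionOptions(
        from: group.options, with: Locale(identifier: "es"))
    if let spanish = spanishOptions.first {
        playerItem.select(spanish, in: group)
    }
}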
I've been researching this problem in various forums; I believe I'm getting close to a fix, so I decided to ask here for help and also to help anyone else who runs into this topic.
The problem involves the language of SKRouteAdvices. When retrieved through
SKRoutingService.sharedInstance().routeAdviceListWithDistanceFormat(.Metric)
an array of SKRouteAdvices is returned, but all of the advices are written in English: the voice is in Portuguese, yet .adviceInstruction is in English. I tried setting the advisorSettings (as I should anyway) and it didn't work; but, for some unknown reason, when I set the advisor to TTS instead of pre-recorded audio, the advices were written in Portuguese, though a weird TTS voice was used instead of the pre-recorded one, as expected, actually. Then, tired of trying to find an obvious fix, I decided to first retrieve the Portuguese advices, save them in an array, and then run it again as before to get the pre-recorded voice.
It turns out the framework has some hidden problem with this. I tried a couple of different ways to get there, and the best I achieved was the result I wanted but with about a 50% chance of a crash; I really don't know why, but sometimes it just crashed. So then I tried TTS again, this time attempting to play the pre-recorded voices based on the adviceInstruction property, but the instructions come in Portuguese while all the audio files are named in English, so that doesn't work either.
Summing up: I need the SKRouteAdvices for my route to come with Portuguese instructions and also with the pre-recorded voice. Any clue?
I gave up trying to find a native way to get this. I followed Sylvia's suggestion, but I had already tried that before; I managed to get the result I wanted by starting navigation twice. In the first pass I set the advisorType (in the SKAdvisorConfiguration of SKRoutingService.sharedInstance()) to .TextToSpeech, then I grab the Portuguese instructions, save them into an array, and proceed to the second step, where I repeat the route configuration and navigation with advisorType set to .AudioFiles.
With this strange combination I got what I wanted.
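Roughly, the two passes look like the sketch below. Only the identifiers already mentioned in this thread (SKRoutingService.sharedInstance(), advisorType, .TextToSpeech, .AudioFiles, routeAdviceListWithDistanceFormat, adviceInstruction) come from the SDK as quoted here; advisorConfigurationSettings, startNavigationWithSettings and SKNavigationSettings are assumptions about the surrounding setup, and the route-calculation delegate timing is omitted.

import SKMaps

// Pass 1: start with the TTS advisor so the text instructions come out localized.
func startFirstPass(navigationSettings: SKNavigationSettings) {
    let routing = SKRoutingService.sharedInstance()
    routing.advisorConfigurationSettings.advisorType = .TextToSpeech   // assumed settings property
    routing.startNavigationWithSettings(navigationSettings)            // assumed start call
}

// Once the route is ready (e.g. in the routing delegate callback), save the
// Portuguese instructions, then restart navigation with the pre-recorded audio.
func switchToSecondPass(navigationSettings: SKNavigationSettings) -> [String] {
    let routing = SKRoutingService.sharedInstance()
    var portugueseInstructions = [String]()
    if let advices = routing.routeAdviceListWithDistanceFormat(.Metric) as? [SKRouteAdvice] {
        for advice in advices {
            portugueseInstructions.append(advice.adviceInstruction)
        }
    }
    // Pass 2: pre-recorded audio files; display the saved Portuguese strings
    // instead of the (English) adviceInstruction values from this pass.
    routing.advisorConfigurationSettings.advisorType = .AudioFiles
    routing.startNavigationWithSettings(navigationSettings)
    return portugueseInstructions
}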
The text instructions are generated based on the config files (for full details see http://sdkblog.skobbler.com/advisor-support-text-to-speech-scout-audio/ and http://sdkblog.skobbler.com/advisor-support-text-to-speech-faq/)
The bottom line is that, due to how the audio files (.mp3) are linked together, the text advices generated when using the "audio" option will not be "human readable".
With TTS support the advices are meant to be read out by a voice, hence they are "human readable".
Right now you cannot have both "mp3" advices and human-readable text instructions at the same time.