I'm using Google Cloud Speech API with IBM Voice Gateway in order to interact with a VoiceBot through a phone.
If I say an identifier containing letters and numbers through the phone, Google Cloud Speech converts it into a string with spaces. For example, if I say "A1B2C3", it converts it into the string "a 1 b 2 c 3".
Is there a way to avoid these unnecessary spaces?
Thanks for your help!
Lucas
I don't see any way in which you can eliminate spaces from the API response. What you could do is experiment with the available features, as this is probably your best chance to get a recognition more similar to what you are looking for.
For example, you can provide some sample hint phrases that echo your use case, indicate that the audio is a phone call, or use an enhanced model (although for the latter to be available you first need to opt in to data logging).
Honestly though, in your case it might be better to post-process the returned string (e.g. with a simple "a 1 b 2 c 3".replace(' ', '')).
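A slightly more defensive version of that post-processing, sketched in Python. It removes any run of whitespace (not just single spaces). The function name is hypothetical, and upper-casing the result is an assumption that your identifiers are stored in upper case, like the spoken "A1B2C3".

```python
def compact_identifier(transcript: str) -> str:
    """Remove the whitespace the recognizer inserted between characters."""
    # str.split() with no argument splits on any run of whitespace (spaces, tabs, newlines).
    compact = "".join(transcript.split())
    # Assumption: identifiers are kept upper-case, as in the spoken "A1B2C3".
    return compact.upper()


print(compact_identifier("a 1 b 2 c 3"))  # A1B2C3
```

Apply this only to the turn where an identifier is expected, since it would also glue ordinary words together.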
Goal
The user should spell out their name one letter at a time, for example "s a r a h" / "ess ay are ay aych".
Lex should understand the letters and join them into a single string: "sarah".
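One possible direction, which the question does not say was tried, is to post-process the raw transcript yourself (for example, in a Lambda code hook that receives the text Lex heard). The sketch below assumes you can obtain that transcript as a string. The helper name and the letter-name table are hypothetical and deliberately incomplete; extend the table with whatever your callers say and the recognizer actually returns.

```python
# Hypothetical, incomplete table of spoken letter names.
LETTER_NAMES = {
    "ay": "a", "bee": "b", "be": "b", "see": "c", "sea": "c", "cee": "c",
    "dee": "d", "ee": "e", "ef": "f", "eff": "f", "gee": "g", "aitch": "h",
    "aych": "h", "haitch": "h", "eye": "i", "jay": "j", "kay": "k", "el": "l",
    "ell": "l", "em": "m", "en": "n", "oh": "o", "pee": "p", "cue": "q",
    "queue": "q", "are": "r", "ar": "r", "ess": "s", "es": "s", "tee": "t",
    "tea": "t", "you": "u", "vee": "v", "ex": "x", "why": "y", "wye": "y",
    "zee": "z", "zed": "z",
}


def join_spelled_letters(transcript):
    """Turn "s a r a h" or "ess ay are ay aych" into "sarah"; None if a token is unknown."""
    # Treat "a." style tokens like plain letters by turning periods into spaces.
    tokens = transcript.lower().replace(".", " ").split()
    letters = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        # "W" is spoken as two words, so look one token ahead.
        if token == "double" and i + 1 < len(tokens) and tokens[i + 1] in ("u", "you"):
            letters.append("w")
            i += 2
            continue
        if len(token) == 1 and token.isalpha():
            letters.append(token)
        elif token in LETTER_NAMES:
            letters.append(LETTER_NAMES[token])
        else:
            return None  # let the caller re-prompt instead of guessing
        i += 1
    return "".join(letters)


print(join_spelled_letters("ess ay are ay aych"))  # sarah
```

Returning None for an unrecognized token is a design choice: it lets the bot ask the caller to repeat rather than silently producing a wrong name.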
What I've tried
I'm using Amazon Connect (IVR over the phone; the user speaks into the phone to spell their name), which uses Lex to listen and convert the speech to text.
I've tried an "AlphaNumeric" slot, but it rarely works. I've also tried custom slot values for every letter (e.g. "a.", "b."), which also rarely works.
Has anyone dealt with this? I'd appreciate any direction or experience with handling spoken (not typed/chatbot) letter-by-letter input using Lex, and preferably also Connect.
Other research I've done
https://forums.developer.amazon.com/questions/9331/letter-of-the-alphabet-slot-type.html - this author apparently took the custom letter slot approach but doesn't confirm whether it worked overall. I've tried this and it isn't working, but maybe I'm missing something important.
https://forums.aws.amazon.com/thread.jspa?threadID=261741 - an AWS support thread, which isn't very helpful
If you click the + next to Slot types, you'll have the option to Extend slot type. Clicking it lets you create a new slot type with a regular expression based on your use case. For a first name, that might be [a-z]{2,25}. I think the maximum length for these is 30, so [a-z]{1,40} would fail.
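Before pasting a pattern into the console, you can sanity-check it locally. A small sketch using Python's re.fullmatch, which requires the whole value to match; it assumes Lex applies the pattern to the entire slot value in the same way, which the answer does not confirm. The function name is hypothetical.

```python
import re

# The first-name pattern suggested above: 2 to 25 lowercase ASCII letters.
FIRST_NAME = re.compile(r"[a-z]{2,25}")


def is_valid_first_name(value: str) -> bool:
    """True only if the entire value matches the pattern."""
    return FIRST_NAME.fullmatch(value) is not None


for candidate in ("sarah", "s", "sa rah", "a" * 26):
    print(candidate, is_valid_first_name(candidate))
```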
For those who are not familiar with homophones, here are some examples:
our & are
hi & high
to & too & two
While using the Speech API included with iOS, I am encountering situations where a user says one of these words but the API does not always return the word I want.
I looked into the alternativeSubstrings property, wondering whether it would help, but in my testing of the above words it always comes back empty.
I also looked into the Natural Language API, but could not find anything in there that looked useful.
I understand that as a user adds more words, the Speech API can begin to infer context and correct for these, but that does not help my use case: it will usually expect only one or two words, which limits how much context is available.
An example of contextual processing:
Using the words above on their own, I get these results:
are
hi
to
However, if I use them in the following sentence, you can see that the single-word results above would all be wrong:
I am too high for our ladder
Ideally, I would either get back a list such as [are, our], [to, too, two], [hi, high] for each transcription segment, or have a way to compare a string using a function that supports homophones.
An example of this would be:
if myDetectedWord == "to" then { ... }
where myDetectedWord can be any of [to, too, two], and the comparison would return true for each of them.
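A minimal sketch of such a homophone-aware comparison in Python, using a hand-written table built from the examples above. The group list and the sounds_like name are hypothetical; the function only knows the homophones you list, so a real application would need a larger table.

```python
# Homophone groups from the examples above; extend as needed.
HOMOPHONE_GROUPS = [
    {"are", "our"},
    {"to", "too", "two"},
    {"hi", "high"},
]

# Map each word to the group it belongs to, for O(1) lookups.
GROUP_OF = {word: group for group in HOMOPHONE_GROUPS for word in group}


def sounds_like(detected: str, target: str) -> bool:
    """True if detected equals target or is a listed homophone of it (case-insensitive)."""
    detected, target = detected.lower(), target.lower()
    if detected == target:
        return True
    group = GROUP_OF.get(target)
    return group is not None and detected in group


print(sounds_like("two", "to"))  # True
```

The pseudocode check then becomes `if sounds_like(myDetectedWord, "to")`.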
This is a common NLP dilemma, and I'm not sure what your desired output is in this application. If possible, you may want to design around this problem in your architecture; otherwise, it will turn into a real challenge.
That being said, if you really want to tackle it, I like this idea of yours:
string against a function
This might be more efficient and performance-friendly.
One way I'd like to solve this problem would be through regex processing instead of endless loops and arrays. You could prototype with loops and arrays first to see how it works, then switch to regular expressions to gain performance.
For instance, you could define fixed arrays in regular expressions and quickly check your string against them (word by word, maybe using back-references), adding as many boundaries to your expressions as you need.
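A sketch of the regex idea in Python: each homophone group becomes one compiled alternation wrapped in \b word boundaries, and the transcript is scanned once per group. It uses plain alternation rather than back-references, the group labels are arbitrary, and whether it actually outperforms a loop over arrays is something you would need to measure.

```python
import re

# Homophone groups from the question, keyed by an arbitrary label.
HOMOPHONES = {
    "to": ["to", "too", "two"],
    "are": ["are", "our"],
    "hi": ["hi", "high"],
}


def compile_group(words):
    # Longest alternatives first, each escaped, the whole thing bounded by \b.
    alternation = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


PATTERNS = {label: compile_group(words) for label, words in HOMOPHONES.items()}


def find_homophones(text):
    """Return sorted (offset, word as transcribed, group label) tuples."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), match.group(0), label))
    return sorted(hits)


print(find_homophones("I am too high for our ladder"))
```

For a single detected word, `PATTERNS["to"].fullmatch(word)` gives the yes/no check from the question.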
Your fixed arrays can also be designed around the probability of certain words occurring in certain parts of a string. For instance,
^I
vs
^eye
The probability of I being the first word is much higher than that of eye.
The probability of I anywhere in a string is also higher than that of eye.
You might want to weight words based on that.
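A sketch of position-based weighting in Python. The weights are made-up placeholders that only encode the two statements above (I is likelier than eye at the start of a string and elsewhere); in practice you would estimate them from your own transcripts. best_candidate is a hypothetical name.

```python
from typing import Sequence

# Illustrative numbers only, not measured values.
# Each entry: (weight when the word starts the string, weight anywhere else).
POSITION_WEIGHTS = {
    "i": (0.9, 0.6),
    "eye": (0.1, 0.4),
}


def best_candidate(candidates: Sequence[str], position: int) -> str:
    """Pick the homophone with the highest weight for its position in the string."""
    column = 0 if position == 0 else 1
    # Unknown words get weight 0, so any listed candidate beats them.
    return max(candidates, key=lambda w: POSITION_WEIGHTS.get(w.lower(), (0.0, 0.0))[column])


print(best_candidate(["eye", "I"], 0))  # I
```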
I'd say the key is to narrow your desired outputs down as much as possible (maybe even to around 100 words) to increase accuracy, if you want a working application.
Good project, though; I hope you enjoy the challenge.
I want to make custom slots that accept any entry as long as it follows a certain regex pattern, e.g. any number of letters or digits with no spaces in between. Is there a way to achieve this in Amazon Lex?
Also, if I want to collect a certain type of data, say email IDs, but let the user give any number of them (more than one), how do I do that?
I am new to Amazon Lex and any suggestions would be appreciated.
Create a slot in your intent in the Lex console, but do not mark it as required, and give it any slot type.
Then, in your Lambda code, first set the slot to null, then parse the inputText using a regex and assign the correct value to the slot.
This way both of your problems will be addressed.
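A minimal sketch of that Lambda in Python, assuming the Lex (V1) Lambda event format, where the raw user text arrives as inputTranscript and slots live under currentIntent, plus a hypothetical slot named emails. Because V1 slot values are strings, several email IDs are joined with commas. The email regex is deliberately simple, and over voice an address may be transcribed with words such as "at" and "dot", which this pattern does not handle.

```python
import re

EMAIL_SLOT = "emails"  # hypothetical slot name; use the one defined in your intent
# Deliberately simple pattern, not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def lambda_handler(event, context):
    """Dialog code hook: fill the slot from the raw text, then let Lex continue."""
    slots = event["currentIntent"]["slots"]
    # Start from an empty value, as the answer suggests.
    slots[EMAIL_SLOT] = None
    matches = EMAIL_RE.findall(event.get("inputTranscript", ""))
    if matches:
        # One string slot holding every address found, comma-separated.
        slots[EMAIL_SLOT] = ",".join(matches)
    return {
        "sessionAttributes": event.get("sessionAttributes") or {},
        "dialogAction": {"type": "Delegate", "slots": slots},
    }
```

The same structure works for the alphanumeric case; only the regex changes.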
Hope it helps. Let us know if you run into any problems.
I am trying to use ANTLR as a parser for my company's latest project. I have been unable to find any information on how to parse one number, say 0005039906179210835699175654, into multiple tokens (a 5-digit number, a 3-digit number, a 14-digit number, and a 6-digit number).
My current code produces this error:
line 1:1 no viable alternative at input '0005039906179210835699175654'
On another note, does anyone know how to get the name of a token from a listener? That's just a bonus question, I guess :) Thanks in advance to everyone who responds!
EDIT:
To clarify the whole problem, my company receives information in a legacy format from automated systems. This information must be parsed into POJOs for further processing. I am trying to use ANTLR as an easy, smooth, readable, and expandable solution to this. One example is this line:
U0005138606179090232769522950 0863832 18322862 0284785 3
It must be parsed into the sections U, 00051, 386, 06179090232769, 522950, 0863832, 18322862, 0284785, and 3. The sections separated by white space are obviously easy to parse, but I have been unable to find a way in ANTLR to parse the values that are not separated by white space. Any help would be appreciated, thanks!
EDIT2:
To be perfectly clear about why I'm using ANTLR instead of plain Java: my company receives messages in 5 legacy formats, and the system implemented to parse them must be easily expandable to accommodate more in the future. ANTLR is easy to read and understand. Plus, it will be easier to construct additional grammars and listeners than to maintain a random mess of Java.
EDIT3:
I thought of a solution, but it is pretty janky. My idea is to parse the 28-character number as one token, then split it in Java from a listener, since it is broken up the same way each time. I'll report back later today on whether I got it to work.
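The splitting step described above is plain fixed-width slicing. The asker did it in Java from a listener; here it is sketched in Python for brevity, with the widths 5, 3, 14 and 6 from the question. It also assumes, based only on the example line, that a record starts with a one-letter type code attached to the 28-digit block; the function names are hypothetical.

```python
WIDTHS = (5, 3, 14, 6)  # from the question; they sum to 28


def split_fixed(block, widths=WIDTHS):
    """Slice an undelimited block into consecutive fixed-width fields."""
    if len(block) != sum(widths):
        raise ValueError(f"expected {sum(widths)} characters, got {len(block)}")
    parts, pos = [], 0
    for width in widths:
        parts.append(block[pos:pos + width])
        pos += width
    return parts


def split_record(line):
    """Record type letter, the four fixed-width fields, then the space-separated fields."""
    head, *rest = line.split()
    return [head[0], *split_fixed(head[1:]), *rest]


print(split_record("U0005138606179090232769522950 0863832 18322862 0284785 3"))
```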
EDIT4, FINAL UPDATE:
I have chosen to go with the solution mentioned in EDIT3. It is not pretty, but it works and it is fast enough. Thank you very much to everyone who commented, shared ideas, and stimulated thought!
We need to parse a GS1 DataMatrix barcode that will be provided by another party. We know they are going to use GTIN (01), lot number (10), expiration date (17), and serial number (21). The problem is that the barcode reader outputs a string formatted like this: 01076123456789001710050310AC3453G321455777. Since there is no separator, and both the serial number and the lot number are variable-length according to the GS1 standard, we have trouble identifying the segments. My understanding is that the best way to parse it is to embed the parser in the scanning device rather than in the application, but we haven't planned any embedded software yet. How can I implement the parser? Any suggestions?
There should be an FNC1 character at the end of a variable-length field that is not filled to its maximum length, so an FNC1 will appear between the G3 and the 21.
FNC1 is invisible to humans, but scanners detect it and reproduce it in the string they report. Simply send the string directly to a text file and examine it with a hex viewer; the FNC1 should be obvious.
If you can, it might be a good idea to swap the order of the 21 and 10 fields, since you appear to be using a purely numeric value for 21. This would make the resulting barcode a little shorter.
One way to deal with this is to program the scanner to replace FNC1 with a space or another plain-text character before sending the data to your application. The scanner manufacturer usually provides a tool for producing programming barcodes that can perform simple substitutions in the scanner. Then you can parse the data without having to handle special characters.
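Once the separator is visible, parsing becomes a loop over Application Identifiers: fixed-length AIs consume a known number of characters, and variable-length AIs run to the next separator. Below is a Python sketch covering only the four AIs from the question. It assumes the scanner sends FNC1 as ASCII 29 (GS), the common convention, or as whatever character you programmed it to substitute, and that any symbology identifier prefix (such as ]d2) has already been stripped. The function name is hypothetical.

```python
GS = "\x1d"  # ASCII 29 group separator, the usual transmitted form of FNC1

# Only the AIs named in the question; lengths exclude the AI digits themselves.
FIXED_LENGTH = {"01": 14, "17": 6}   # GTIN, expiration date (YYMMDD)
VARIABLE_MAX = {"10": 20, "21": 20}  # lot/batch number, serial number


def parse_gs1(data: str, separator: str = GS) -> dict:
    """Split a GS1 element string into {AI: value}."""
    fields = {}
    pos = 0
    while pos < len(data):
        ai = data[pos:pos + 2]
        pos += 2
        if ai in FIXED_LENGTH:
            end = pos + FIXED_LENGTH[ai]
            if end > len(data):
                raise ValueError(f"AI ({ai}) is truncated")
            value = data[pos:end]
            pos = end
            # A separator after a fixed-length field is unnecessary but harmless; skip it.
            if data.startswith(separator, pos):
                pos += len(separator)
        elif ai in VARIABLE_MAX:
            # Runs to the next separator, or to the end if this is the last field.
            end = data.find(separator, pos)
            if end == -1:
                end = len(data)
            value = data[pos:end]
            if not value or len(value) > VARIABLE_MAX[ai]:
                raise ValueError(f"AI ({ai}) has invalid length {len(value)}")
            pos = end + len(separator) if end < len(data) else end
        else:
            raise ValueError(f"unsupported AI {ai!r} at position {pos - 2}")
        fields[ai] = value
    return fields
```

If the question's string is parsed without any separator, the lot number swallows "21455777", giving {"01": "07612345678900", "17": "100503", "10": "AC3453G321455777"}, which is exactly why the FNC1 matters. With a scanner programmed to send a space, call `parse_gs1(data, separator=" ")`.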