I would like to be able to dictate into my Delphi application using Microsoft Vista's speech recognition. However, when attempting to dictate into a TMemo, it simply does not work. I noticed that Firefox 3.0 has the same issue, and when Mozilla contacted Microsoft about the problem, they were told they needed to implement the Text Services Framework in their application.
I am wondering if there is any way to implement this in Delphi so that I can dictate into a TMemo or a TRichEdit. Searching Google for a solution didn't return any relevant results. Where would I start in finding a solution?
Edit: I found out that there is a way to enable speech recognition in all programs, even those that don't support it, simply by going to the options of Windows Speech Recognition and selecting Enable dictation everywhere. However, when you use this to dictate into an edit box that doesn't use the Text Services Framework, it always pops up the Alternates Panel, which displays the prompt "Say the number next to the item you want, followed by OK." While this may work for short sentences, it lacks many of the useful features, such as correcting or deleting a word. So I am still trying to figure out how to enable speech recognition without relying on the Enable dictation everywhere option.
Text to speech in Vista
Just tested it with a button, like the demo code on that page; it works fine in Vista SP1 with Delphi 2007. (Funny, I clicked the 'Vista' tag link and found it there...)
Related
I'm working on an application that requires the use of a text-to-speech synthesizer. Implementing this was rather simple on iOS using AVSpeechSynthesizer. However, when it comes to customizing synthesis, I was directed to documentation about speech synthesis for an OS X-only API, which allows you to input phoneme pairs in order to customize word pronunciation. Unfortunately, this interface is not available on iOS.
I was hoping someone might know of a similar library or plugin that might accomplish the same task. If you do, it would be much appreciated if you would lend a hand.
Thanks in advance!
AVSpeechSynthesizer on iOS is not capable (out of the box) of working with phonemes. NSSpeechSynthesizer is capable of it, but it is not available on iOS.
You could create an algorithm that produces short phonemes yourself, but it would be very difficult to make it sound good.
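To make the idea concrete, a naive dictionary-based grapheme-to-phoneme lookup can be sketched as follows. This is a platform-neutral Python illustration only; the tiny lexicon and per-letter fallback are invented for the example, and real synthesis quality depends on far more than a lookup table.

```python
# Naive grapheme-to-phoneme lookup: map each known word to a phoneme
# sequence, falling back to per-letter "phonemes" for unknown words.
# The lexicon here is illustrative, not a real pronunciation dictionary.

LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}
LETTERS = {c: c.upper() for c in "abcdefghijklmnopqrstuvwxyz"}

def to_phonemes(text):
    """Return a flat phoneme list for the given text."""
    out = []
    for word in text.lower().split():
        # Known words get their dictionary pronunciation; unknown
        # words degrade to one symbol per letter (the hard part in
        # practice, and the reason this sounds bad without real G2P).
        out.extend(LEXICON.get(word, [LETTERS.get(ch, ch) for ch in word]))
    return out

print(to_phonemes("hello world"))
# -> ['HH', 'AH', 'L', 'OW', 'W', 'ER', 'L', 'D']
```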
... allows you to input phoneme pairs in order to customize word pronunciation. Unfortunately, this interface is not available on iOS.
This kind of interface is definitely available on iOS: in your device settings (iOS 12), open General - Accessibility - Speech - Pronunciations, then:
Select the '+' icon to add a new phonetic element.
Name this new element in order to quickly find it later on.
Tap the microphone icon.
Vocalize an entire sentence or a single word.
Listen to the different system proposals.
Validate your choice with the 'OK' button or cancel to start over.
Tap the back button to confirm the newly created phonetic element.
Find all the generated elements in the Pronunciations page.
Following the steps above, you will be able to synthesize speech using phonemes for iOS.
Before I spend a lot of time (I guess) reading up on SpeechSynthesisUtterance, is this the way to go to add voice recognition to my Dart+Polymer web app?
In fact, all I need is dictation (to fill in a text box). Nothing particularly clever (on my part at least!).
Or [polymer] "is there an element for that?" ;-)
cheers
Steve
If it works in JS, you can make it work in Dart. There is no need for Polymer, but it will work with Polymer.dart as well.
Currently I am using OpenEars to detect a phrase, and it works pretty well, although I would like to recognize all words in the English language and add them to a text field. So I had two thoughts on how to approach this.
1) Somehow load the entire English dictionary into OpenEars. (I don't think this is a good idea, because they say it is meant for vocabularies of around 200-300 words or something like that.)
2) Activate the native iOS voice recognition without deploying the keyboard.
I'm leaning towards the second way if possible, because I love the live recognition in iOS 8; it works flawlessly for me.
How do I recognize all words using one of these two methods (or a better way, if you know of one)?
Thank you
The answer is that you can't do 1) or 2), at least not the way you want to. OpenEars won't handle the whole English dictionary, and you can't get iOS voice recognition without the keyboard widget. You might want to look into Dragon Dictation, which is the speech engine that Siri uses, or SILVIA. You'll have to pay for a license, though.
We're working on an app for blind and visually impaired users. We've been experimenting with a third-party library to get spoken user input and convert it to text, which we then parse as commands to control the app. The problem is that the word recognition is not very good, and certainly nowhere near as good as what iOS uses to get voice input in a text field.
I'd like to experiment with that, but our users are mostly unable to tap a text field, then hit the mic button on the popup keyboard, then hit the done button, or even dismiss any of it. I'm not even sure how they can deal with a single tap on the whole screen; it might be too difficult for some. So I'd like to automate that for them, but I don't see anything in the docs that indicates it is possible. So, is it even possible, and if so, what's the proper way to do it so that it passes verification?
The solution for you is to implement keyword spotting, so that speech recognition is activated by a keyword instead of a button tap. After that you can record commands/text and recognize them with any service you need. Something like the "OK Google" activation on the Moto X.
There are several keyword activation libraries for iOS; one possible solution is OpenEars, based on the open-source speech recognition library CMUSphinx. If you want to use Pocketsphinx directly, you can find the keyword activation implementation in the kws branch in Subversion (branches/kws).
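The keyword-activation flow described above can be sketched roughly as follows. This is a minimal Python mock of the control flow, not the real OpenEars or Pocketsphinx API: the "recognizer" is just an iterable of hypothesized words, and the activation phrase is invented for the example.

```python
# Sketch of keyword spotting: stay in a cheap listening loop until the
# activation keyword is heard, then hand everything that follows to the
# full command recognizer. `hypotheses` stands in for a real engine's
# stream of recognized words.

KEYWORD = "listen up"  # hypothetical activation phrase

def spot_and_collect(hypotheses, keyword=KEYWORD):
    """Return the words heard after the first occurrence of `keyword`."""
    kw_words = keyword.split()
    window = []          # sliding window of recent words
    collecting = False   # False = spotting mode, True = command mode
    command = []
    for word in hypotheses:
        if collecting:
            command.append(word)
            continue
        window.append(word)
        if window[-len(kw_words):] == kw_words:
            collecting = True  # keyword matched: switch to command mode
    return " ".join(command)

print(spot_and_collect(["hello", "listen", "up", "open", "mail"]))
# -> open mail
```

In a real app the two modes would typically map to two engine configurations: a small always-on keyword model, and the full dictation service started only after activation.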
The only way to get iOS-quality dictation is to sign up with Nuance yourself: http://dragonmobile.nuancemobiledeveloper.com/ - it's expensive, because it's the best. Presumably, Apple's contract prevents them from exposing an API.
The built-in iOS accessibility features allow immobilized users to access dictation (and other keyboard buttons) through tools like VoiceOver and AssistiveTouch. It may not be worth reinventing this if your users might already be familiar with these tools.
I bought a cheap RFID reader from eBay, just to play about with. There is no API; it just writes to stdin - that is to say, if you have Notepad open and tap an RFID tag on the reader, its ID number appears in the Notepad window.
I am looking around for a reasonably priced reader/writer with an actual API (any recommendations?).
Until then I need to knock together a quick demo using what I have, just to prove the concept.
How can I best intercept the input from the USB connection? (and is there a free VCL control to do this?)
I guess if I just have a modal form with an active control, then I can hook its OnChange event. But modal forms seem a bit rude. Maybe I can hook keyboard input, since the reader seems to be injecting what look like typed characters?
Any ideas? Please tell me if I am not explaining this clearly enough.
Thanks in advance for your help.
In the end, I just hooked the keyboard, rather than trying to intercept the USB. It works if I check that my application is active and pass on the keystrokes otherwise. My app doesn't have any keyboard input, just mouse clicks (and what I read from RFID is digits only), so I can still handle things like Alt+F4. Maybe not the perfect solution for everyone, but it's all that I could get to work.
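The filtering logic described here can be sketched as follows. This is a platform-neutral Python mock of the hook callback only; a real Delphi/Win32 version would sit behind something like SetWindowsHookEx, and the key names used here are invented for the example.

```python
# Sketch of the keyboard-wedge filtering described above: the RFID
# reader "types" its tag ID as digits followed by Enter. While our app
# is active we swallow digit keystrokes into a buffer and emit the tag
# on Enter; everything else (e.g. Alt+F4) passes through untouched.

class TagCollector:
    def __init__(self, on_tag):
        self.buffer = []
        self.on_tag = on_tag  # callback invoked with the completed tag ID

    def handle_key(self, key, app_active):
        """Return True if the keystroke was consumed, False to pass it on."""
        if not app_active:
            return False          # another app has focus: don't interfere
        if key.isdigit():
            self.buffer.append(key)
            return True           # swallow: part of a tag ID
        if key == "ENTER" and self.buffer:
            self.on_tag("".join(self.buffer))
            self.buffer.clear()
            return True           # swallow: tag complete
        return False              # non-digit keys (Alt+F4 etc.) pass through

tags = []
collector = TagCollector(tags.append)
for k in "0074192":
    collector.handle_key(k, app_active=True)
collector.handle_key("ENTER", app_active=True)
print(tags)  # -> ['0074192']
```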
Based on your description, it sounds like the RFID reader is providing a USB HID keyboard interface.
I don't know if there is anything similar in Delphi, but in libusb there is libusb_claim_interface, which requests that the OS hand control of the device over to your program.
A Delphi library for doing HID devices:
http://www.soft-gems.net/index.php?option=com_content&task=view&id=14&Itemid=33