I would like to build a simple reader app for the iPad 2 that would allow users to navigate/read via voice controls. The app would allow the user to enter a mode where the microphone was live and listened for predefined keywords like 'down', 'up', 'next', 'back', 'home', etc.
I don't want to reinvent the wheel on this so I'm just wondering first, if someone has done this already and if not, are there any good tutorials or SDKs available to help with recording someone's voice, and then comparing future output to see if it matches, or just dealing with the microphone in general?
Let's put aside that this is a fairly vaguely worded question for the moment.
If you are expecting to allow voice control in your app that somehow works throughout the entire device, it's just not possible. Your app would only work to control itself -- or at least itself and whatever external hooks you can normally get to the rest of the device, like, say, playing a song out of the user's iTunes library.
If you're planning on doing this in a jailbroken environment, then you should find some open-source library that does voice recognition -- if there are any -- and start from there. Be prepared for a very long haul, though.
Dragon Mobile SDK is what you're looking for.
http://dragonmobile.nuancemobiledeveloper.com/
There may be other voice recognition SDKs out there, but this is the only one I can think of off the top of my head.
There is a library called CMU Sphinx, and an iPhone version of it called PocketSphinx. See if it fits your needs.
I would like to build a simple reader app for the iPad 2 that would allow users to navigate/read via voice controls.
Voice Control, a feature introduced in iOS 13, fully meets your request because you can control your device and your app with your voice exactly as you would with touches.
It's also possible to define actions for specific words.
The feature is configured in the device settings under Accessibility - Voice Control.
If you need specific names for Voice Control to display and recognize for items in your app, use the accessibilityUserInputLabels property to define them.
That's definitely the built-in tool you need to reach your goal: no need to use an external library or SDK, everything is natively provided. ;o)
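As a rough illustration of the accessibilityUserInputLabels property mentioned above, here is a minimal sketch, assuming a UIKit app with a "next chapter" button. The button and the label strings are hypothetical; only the property itself comes from the answer.

```swift
import UIKit

// Hypothetical button used for illustration only.
let nextButton = UIButton(type: .system)
nextButton.setTitle("Next chapter", for: .normal)
nextButton.accessibilityLabel = "Next chapter"

// Alternative names a Voice Control user can say for this element,
// listed in order of preference. The property requires iOS 13 or later.
if #available(iOS 13.0, *) {
    nextButton.accessibilityUserInputLabels = ["Next", "Forward", "Next chapter"]
}
```

With labels like these, a user with Voice Control enabled should be able to say "Tap Next" to activate the button.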
Is there any way to implement an iOS app that has access to the screen (e.g. screen recording) also when it's backgrounded? Has anyone experience with this?
Apps like TeamViewer do this, but it's not clear to me if they went through a special process with Apple (e.g. a non-open API).
P.S. I am of course assuming that the user would have to explicitly accept this (e.g. like for system extensions on macOS), the goal here is not to make a malicious app but a remote-control tool.
The only way to record the screen in the background is by using the broadcast upload extension in ReplayKit 2. This WWDC talk goes into more detail on how to use this API: https://developer.apple.com/videos/play/wwdc2018/601/
Since it's not specifically designed for your use case, you will have to do some things differently, such as storing the frames locally in your App Group instead of uploading them.
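A minimal sketch of what that could look like, assuming a broadcast upload extension target that shares an App Group with the main app. The group identifier and file name are hypothetical, and a real extension would probably throttle how often it writes frames.

```swift
import ReplayKit
import CoreImage
import CoreMedia

// Entry point of a broadcast upload extension. ReplayKit calls
// processSampleBuffer for each captured buffer during a broadcast.
class SampleHandler: RPBroadcastSampleHandler {
    // Hypothetical App Group identifier; use the one configured for your targets.
    private let appGroupID = "group.com.example.screenshare"
    // Reuse one context; creating a CIContext per frame is expensive.
    private let ciContext = CIContext()

    override func processSampleBuffer(_ sampleBuffer: CMSampleBuffer,
                                      with sampleBufferType: RPSampleBufferType) {
        // Skip audio buffers; only video frames are stored here.
        guard sampleBufferType == .video,
              let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer),
              let container = FileManager.default.containerURL(
                  forSecurityApplicationGroupIdentifier: appGroupID)
        else { return }

        // Encode the frame as JPEG data.
        let image = CIImage(cvPixelBuffer: pixelBuffer)
        guard let jpeg = ciContext.jpegRepresentation(
            of: image, colorSpace: CGColorSpaceCreateDeviceRGB(), options: [:])
        else { return }

        // Overwrite a single file with the latest frame (hypothetical file name).
        let frameURL = container.appendingPathComponent("latest-frame.jpg")
        try? jpeg.write(to: frameURL, options: .atomic)
    }
}
```

The main app can then read the file from the same App Group container.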
Currently, I am developing an iOS app that triggers an event upon a voice command.
I saw a camera app where a user says "start recording," and the camera then switches to recording mode.
This is an in-app voice control capability, so I am thinking it is different from SiriKit or SpeechRecognizer, which I have already implemented.
How would I achieve it?
My question is NOT about voice dictation, where a user has to press a button to start dictating.
The app needs to passively wait for a keyword or intent, something like "myApp, start recording" or "myApp, stop recording", and then start or stop the corresponding function.
Thanks.
OpenEars : Free speech recognition and speech synthesis for the iPhone.
OpenEars makes it simple for you to add offline speech recognition in many languages and synthesized speech/TTS to your iPhone app quickly and easily. It lets everyone get the great results of using advanced speech app interface concepts.
Check out this link.
http://www.politepix.com/openears/
or
Building an iOS App like Siri
https://www.raywenderlich.com/60870/building-ios-app-like-siri
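Whichever recognizer you end up using, the app still has to turn the recognized text into an action. Here is a minimal sketch of that step, assuming the recognizer hands you a plain transcript string. The VoiceCommand enum, the parseCommand function, and the "myapp" wake word are hypothetical names for illustration, not part of OpenEars or any other SDK.

```swift
import Foundation

// Hypothetical command set for illustration.
enum VoiceCommand: Equatable {
    case startRecording
    case stopRecording
}

// Maps a recognized transcript to a command, but only when the wake word
// appears somewhere before the command phrase.
func parseCommand(from transcript: String, wakeWord: String = "myapp") -> VoiceCommand? {
    // Normalize: lowercase and drop punctuation so that
    // "myApp, start recording" becomes "myapp start recording".
    let allowed = CharacterSet.alphanumerics.union(.whitespaces)
    let cleaned = transcript.lowercased()
        .components(separatedBy: allowed.inverted)
        .joined()

    // Require the wake word, then look only at what follows it.
    guard let wakeRange = cleaned.range(of: wakeWord) else { return nil }
    let rest = cleaned[wakeRange.upperBound...]

    if rest.contains("start recording") { return .startRecording }
    if rest.contains("stop recording") { return .stopRecording }
    return nil
}
```

You would call parseCommand(from:) each time the recognizer delivers a new hypothesis and start or stop the recording when it returns a command. Requiring the wake word first keeps ordinary conversation from triggering an action.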
Thank you.
How would I achieve it?
iOS 13 introduced a feature called Voice Control that will allow you to reach your goal.
You can find useful information in the Customize Commands section, where all the vocal commands are listed (you can create a custom one as well).
For the camera example you mentioned, everything can be done by voice.
Voice Control can show the item names so that you know which vocal commands to use, and they can be hidden if you prefer (hide names).
Voice Control is a built-in feature you can use inside your apps as well.
The only thing you may have to do as a developer is adapt the accessibilityUserInputLabels property if you need specific names to be displayed for some items in your app.
If you're looking for voice commands on iOS without pressing a button, Voice Control is THE perfect candidate.
I have a question for iOS developers.
Does anybody know if the Apple iOS API allows adding new commands to the built-in iOS Voice Control engine? I noticed that Voice Control can control the Phone application using names and nicknames from the address book. It can also play music from the default iOS music player app. In my app, I would like to register new voice commands for this Voice Control engine and handle some actions based on the recognized commands. I searched the developer documentation but couldn't find anything like that. Am I missing something?
iOS 13 introduced a feature called Voice Control that may help you reach your goal:
I would like in my app to register new voice commands for this Voice Control engine and handle some actions based on recognized commands.
This is definitely possible thanks to the Customize Commands - Create New Command... menu.
If you need specific names for Voice Control to display and recognize for some items in your app, use the accessibilityUserInputLabels property to define them.
Following this approach, new voice commands can be created through that menu and then used with your app by the iOS Voice Control engine.
So far, iOS has not exposed any APIs related to voice. However, this is achievable using CMU Sphinx.
A big advantage of CMU Sphinx is that it works offline.
Can you use the system sounds in your iOS app? I'm looking to have the same list that is used in the default timer app (Marimba, Alarm, Doorbell etc).
The reason I'm asking is that Apple's own Multimedia docs say:
Note: System-supplied alert sounds and system-supplied user-interface sound effects are not available to your application. For example, using the kSystemSoundID_UserPreferredAlert constant as a parameter to the AudioServicesPlayAlertSound function will not play anything.
But then I came across this list of system sound IDs.
So can you access and use these sounds in your own apps and still pass Apple's review process? If not, are similar sounds available as open source?
Actually, if you import the AudioToolbox framework (AudioToolbox/AudioToolbox.h) in the header file for your view controller, you can play Apple system sounds without jailbreaking. For example, putting
AudioServicesPlaySystemSound(0x450);
under an IBAction will play the Apple 'click' sound on the execution of the action.
Also, to hear the system sounds referenced earlier, there is a great app available on GitHub that runs on your iPhone (not the iOS Simulator). The documentation lists the sounds but does not let you hear them, whereas the app lets you tap each sound to play it and then find the corresponding reference number.
No, you cannot access these sounds without jailbreaking. After a jailbreak, you can access them as shown below.
AudioServicesPlaySystemSound(1000);
I believe these files are copyrighted.
My company is planning to develop an iPad app that will rely heavily on video calls, and I am trying to decide which API/tools to use.
Our requirements are as follows:
High quality video with minimal delay or dropping of calls (provided users on each end have good connection)
Good security
Customizable UI; ability to move/resize video windows
Ability to switch between FaceTime and iSight cameras; user would typically use FaceTime but could turn iPad around to use iSight when better resolution is required
I am well aware that there is no FaceTime API. Is there some other WebRTC-like API/SDK I could use that meets these requirements? Based on my research so far, OpenTok (TokBox) seems like the best bet.
AddLive is another API, though at this point it seems incomplete. Its iOS SDK is missing files, at least as of today.