I'm currently trying to implement UI tests in my iOS app. An important feature of my app is the ability to let users scan QR codes so they can quickly retrieve the ID's of certain objects. I'm struggling to write tests for this however. What I would like to achieve is to supply mock images to the camera during UI testing so I can essentially simulate the scanning action. So far I haven't really been able to find anything that mentions whether it is even possible to supply image data to the camera, let alone how this would be implemented.
So my question is, is this even possible?
What is the best way, if any, to use Apple's new ARKit with multiple users/devices?
It seems that each devices gets its own scene understanding individually. My best guess so far is to use raw features points positions and try to match them across devices to glue together the different points of views since ARKit doesn't offer any absolute referential reference.
===Edit1, Things I've tried===
1) Feature points
I've played around and with the exposed raw features points and I'm now convinced that in their current state they are a dead end:
they are not raw feature points, they only expose positions but none of the attributes typically found in tracked feature points
their instantiation doesn't carry over from frame to frame, nor are the positions exactly the same
it often happens that reported feature points change by a lot when the camera input is almost not changing, with either a lot appearing or disappearing.
So overall I think it's unreasonable to try to use them in some meaningful way, not being able to make any kind of good point matching within one device, let alone several.
Alternative would to implement my own feature point detection and matching, but that'd be more replacing ARKit than leveraging it.
2) QR code
As #Rickster suggested, I've also tried identifying an easily identifiable object like a QR code and getting the relative referential change from that fixed point (see this question) It's a bit difficult and implied me using some openCV to estimate camera pose. But more importantly very limiting
As some newer answers have added, multiuser AR is a headline feature of ARKit 2 (aka ARKit on iOS 12). The WWDC18 talk on ARKit 2 has a nice overview, and Apple has two developer sample code projects to help you get started: a basic example that just gets 2+ devices into a shared experience, and SwiftShot, a real multiplayer game built for AR.
The major points:
ARWorldMap wraps up everything ARKit knows about the local environment into a serializable object, so you can save it for later or send it to another device. In the latter case, "relocalizing" to a world map saved by another device in the same local environment gives both devices the same frame of reference (world coordinate system).
Use the networking technology of your choice to send the ARWorldMap between devices: AirDrop, cloud shares, carrier pigeon, etc all work, but Apple's Multipeer Connectivity framework is one good, easy, and secure option, so it's what Apple uses in their example projects.
All of this gives you only the basis for creating a shared experience — multiple copies on your app on multiple devices all using a world coordinate system that lines up with the same real-world environment. That's all you need to get multiple users experiencing the same static AR content, but if you want them to interact in AR, you'll need to use your favorite networking technology some more.
Apple's basic multiuser AR demo shows encoding an ARAnchor
and sending it to peers, so that one user can tap to place a 3D
model in the world and all others can see it. The SwiftShot game example builds a whole networking protocol so that all users get the same gameplay actions (like firing slingshots at each other) and synchronized physics results (like blocks falling down after being struck). Both use Multipeer Connectivity.
(BTW, the second and third points above are where you get the "2 to 6" figure from #andy's answer — there's no limit on the ARKit side, because ARKit has no idea how many people may have received the world map you saved. However, Multipeer Connectivity has an 8 peer limit. And whatever game / app / experience you build on top of this may have latency / performance scaling issues as you add more peers, but that depends on your technology and design.)
Original answer below for historical interest...
This seems to be an area of active research in the iOS developer community — I met several teams trying to figure it out at WWDC last week, and nobody had even begun to crack it yet. So I'm not sure there's a "best way" yet, if even a feasible way at all.
Feature points are positioned relative to the session, and aren't individually identified, so I'd imagine correlating them between multiple users would be tricky.
The session alignment mode gravityAndHeading might prove helpful: that fixes all the directions to a (presumed/estimated to be) absolute reference frame, but positions are still relative to where the device was when the session started. If you could find a way to relate that position to something absolute — a lat/long, or an iBeacon maybe — and do so reliably, with enough precision... Well, then you'd not only have a reference frame that could be shared by multiple users, you'd also have the main ingredients for location based AR. (You know, like a floating virtual arrow that says turn right there to get to Gate A113 at the airport, or whatever.)
Another avenue I've heard discussed is image analysis. If you could place some real markers — easily machine recognizable things like QR codes — in view of multiple users, you could maybe use some form of object recognition or tracking (a ML model, perhaps?) to precisely identify the markers' positions and orientations relative to each user, and work back from there to calculate a shared frame of reference. Dunno how feasible that might be. (But if you go that route, or similar, note that ARKit exposes a pixel buffer for each captured camera frame.)
Good luck!
Now, after releasing ARKit 2.0 at WWDC 2018, it's possible to make games for 2....6 users.
For this, you need to use ARWorldMap class. By saving world maps and using them to start new sessions, your iOS application can now add new Augmented Reality capabilities: multiuser and persistent AR experiences.
AR Multiuser experiences. Now you may create a shared frame of a reference by sending archived ARWorldMap objects to a nearby iPhone or iPad. With several devices simultaneously tracking the same world map, you may build an experience where all users (up to 6) can share and see the same virtual 3D content (use Pixar's USDZ file format for 3D in Xcode 10+ and iOS 12+).
session.getCurrentWorldMap { worldMap, error in
guard let worldMap = worldMap else {
showAlert(error)
return
}
}
let configuration = ARWorldTrackingConfiguration()
configuration.initialWorldMap = worldMap
session.run(configuration)
AR Persistent experiences. If you save a world map and then your iOS application becomes inactive, you can easily restore it in the next launch of app and in the same physical environment. You can use ARAnchors from the resumed world map to place the same virtual 3D content (in USDZ or DAE format) at the same positions from the previous saved session.
Not bulletproof answers more like workarounds but maybe you'll find these helpful.
All assume the players are in the same place.
DIY ARKit sets up it's world coordinate system quickly after the AR session has been started. So if you can have all players, one after another, put and align their devices to the same physical location and let them start the session there, there you go. Imagine the inside edges of an L square ruler fixed to whatever available. Or any flat surface with a hole: hold phone agains surface looking through the hole with camera, (re)init session.
Medium Save the player aligning phone manually, instead detect a real world marker with image analysis just like #Rickster described.
Involved Train an Core ML model to recognize iPhones and iPads and their camera location. Like it's done with human face and eyes. Aggregate data on a server, then turn off ML to save power. Note: make sure your model is cover-proof. :)
I'm in the process of updating my game controller framework (https://github.com/robreuss/VirtualGameController) to support a shared controller capability, so all devices would receive input from the control elements on the screens of all devices. The purpose of this enhancement is to support ARKit-based multiplayer functionality. I'm assuming developers will use the first approach mentioned by diviaki, where the general positioning of the virtual space is defined by starting the session on each device from a common point in physical space, a shared reference, and specifically I have in mind being on opposite sides of a table. All the devices would launch the game at the same time and utilize a common coordinate space relative to physical size, and using the inputs from all the controllers, the game would remain theoretically in sync on all devices. Still testing. The obvious potential problem is latency or disruption in the network and the sync falls apart, and it would be difficult to recover except by restarting the game. The approach and framework may work for some types of games fairly well - for example, straightforward arcade-style games, but certainly not for many others - for example, any game with significant randomness that cannot be coordinated across devices.
This is a hugely difficult problem - the most prominent startup that is working on it is 6D.ai.
"Multiplayer AR" is the same problem as persistent SLAM, where you need to position yourself in a map that you may not have built yourself. It is the problem that most self driving car companies are actively working on.
I am working on augmented reality app. I have augmented a 3d model using open GL ES 2.0. Now, my problem is when I move device a 3d model should move according to device movement speed. Just like this app does : https://itunes.apple.com/us/app/augment/id506463171?l=en&ls=1&mt=8. I have used UIAccelerometer to achieve this. But, I am not able to do it.
Should I use UIAccelerometer to achieve it or any other framework?
It is complicated algorithm rather than just Accelerometer. You'd better use any third party frameworks, such as Vuforia, Metaio. That would save a lot of time.
Download and check a few samples apps. That is exactly what you want.
https://developer.vuforia.com/resources/sample-apps
You could use Unity3D to load your 3D model and export XCODE project. Or you could use open GL ES.
From your comment am I to understand that you want to have the model anchored at a real world location? If so, then the easiest way to do it is by giving your model a GPS location and reading the devices' GPS location. There is actually a lot of research going into the subject of positional tracking, but for now GPS is your best (and likely only) option without going into advanced positional tracking solutions.
Seeing as I can't add comments due to my account being too new. I'll also add a warning not to try to position the device using the accelerometer data. You'll get far too much error due to the double integration of acceleration to position (See Indoor Positioning System based on Gyroscope and Accelerometer).
I would definitely use Vuforia for this task.
Regarding your comment:
I am using Vuforia framework to augment 3d model in native iOS. It's okay. But, I want to
move 3d model when I move device. It is not provided in any sample code.
Well, it's not provided in any sample code, but that doesn't necessarily mean it's impossible or too difficult.
I would do it like this (working on Android, C++, but it must be very similar on iOS anyway):
locate your renderFrame function
simply do your translation before actual DrawElements:
QCARUtils::translatePoseMatrix(xMOV, yMOV, zMOV, &modelViewProjectionScaled.data[0]);
Where the data for the movement would be prepared by a function that reads them from the accelerometer as a time and acceleration...
What I actually find challenging is to find just the right calibration for a proper adjustment of the output from the sensor's API, which is a completely different and AR/Vuforia unrelated question. Here I guess you've got a huge advantage over Android devs regarding various devices...
I am trying to build an app that allows the user to record individual people speaking, and then save the recordings on the device and tag each record with the name of the person who spoke. Then there is the detection mode, in which i record someone and can tell whats his name if he is in the local database.
First of all - is this possible at all? I am very new to iOS development and not so familiar with the available APIs.
More importantly, which API should I use (ideally free) to correlate between the incoming voice and the records I have in the local db? This should behave something like Shazam, but much more simple since the database I am looking for a match against is much smaller.
If you're new to iOS development, I'd start with the core app to record the audio and let people manually choose a profile/name to attach it to and worry about the speaker recognition part later.
You obviously have two options for the recognition side of things: You can either tie in someone else's speech authentication/speaker recognition library (which will probably be in C or C++), or you can try to write your own.
How many people are going to use your app? You might be able to create something basic yourself: If it's the difference between a man and a woman you could probably figure that out by doing an FFT spectral analysis of the audio and figure out where the frequency peaks are. Obviously the frequencies used to enunciate different phonemes are going to vary somewhat, so solving the general case for two people who sound fairly similar is probably hard. You'll need to train the system with a bunch of text and build some kind of model of frequency distributions. You could try to do clustering or something, but you're going to run into a fair bit of maths fairly quickly (gaussian mixture models, et al). There are libraries/projects that'll do this. You might be able to port this from matlab, for example: https://github.com/codyaray/speaker-recognition
If you want to take something off-the-shelf, I'd go with a straight C library like mistral, as it should be relatively easy to call into from Objective-C.
The SpeakHere sample code should get you started for audio recording and playback.
Also, it may well take longer for the user to train your app to recognise them than it's worth in time-saving from just picking their name from a list. Unless you're intending their voice to be some kind of security passport type thing, it might just not be worth bothering with.
Back Story: I was approached to write an app, but iOS isn't something that have any experience with.
Short Description: Want an app for coverage map for use in an airplane while spraying.
Long Description: The customer has a some airplanes that he uses to spray chemicals on farm fields. They want a system to display a map of the area, a boundary of the field(s) that are to be sprayed on the current flight, and record the flight path of the airplane. The user interface needs to be extremely clean and simple because the user is going to be flying an airplane while using it. Dropbox will be used to transfer data between the airplane and the main office. Someone in the office will create a list of fields that need to be sprayed, and the boundary information of those fields are stored in a shape file format. Those shape files need to be read by the app and displayed over satellite imagery. The airplane already has a high accuracy GPS receiver on it that outputs NMEA position data at 10Hz or faster. The customer also wants to attach a pressure sensor to the spray circuit to monitor if it is dropping spray or not. That information needs to go to the app as well to paint the screen where the plane has already been. This will help the operator to eliminate overlap and skips.
As for getting the GPS position data and pressure data into an iPad, I'm guessing that 802.11 wireless is the simplest way, with that data being supplied in a TCP data stream. I can build a device that makes the data available as a TCP server on a 802.11 wireless network.
From there, I need an app on the iPad that connects to that server to get the data stream. That data gets parsed and turned in to a map.
I have experience with developing apps for Windows in VB.net and two apps for Android. How much difference is there with development concepts in iOS?
I see that iOS uses OpenGL for the graphics, which is ideal for a map. Can I easily access terrain data like is available in Google Earth?
Like dasdom i will encourage you to not begin with that complex project, perhaps divide the several goals in your requirements and make tiny apps for getting in tune with the iPhone SDK, also you have to learn Objective-C that implies that you are already good enough in C programming.
study this topics: Objective-C, iOS Memory Management, sockets, MapKit, Quartz and CoreGraphics, etc.
Or you can buy this excellent book from Aaron Hillegas:
"iPhone Programming: The Big Nerd Ranch Guide"
That book cover almost all topics to introduce your self in the iOS programming madness :)