In my iOS app I use a Google Maps view with a GMSTileLayer. The transparency renders correctly on a real device, but the background is white in the Simulator, so I can't take the screenshots for the App Store...
This is the code I use:
// Implement GMSTileURLConstructor
// Returns a tile URL based on the x, y, zoom coordinates
let urls: GMSTileURLConstructor = { (x, y, zoom) in
    let url = "http://wxs.ign.fr/[KEY]/geoportail/wmts?LAYER=TRANSPORTS.DRONES.RESTRICTIONS&FORMAT=image/png&SERVICE=WMTS&VERSION=1.0.0&REQUEST=GetTile&STYLE=normal&TILEMATRIXSET=PM&TILEMATRIX=\(zoom)&TILEROW=\(y)&TILECOL=\(x)"
    return URL(string: url)
}

// Create the GMSTileLayer
let layer = GMSURLTileLayer(urlConstructor: urls)
layer.zIndex = 0
layer.map = mapView

class TestTileLayer: GMSSyncTileLayer {
    // The SDK expects an optional UIImage here; returning nil lets the layer request the tile again later
    override func tileFor(x: UInt, y: UInt, zoom: UInt) -> UIImage? {
        let urlString = "http://wxs.ign.fr/[KEY]/geoportail/wmts?LAYER=TRANSPORTS.DRONES.RESTRICTIONS&FORMAT=image/png&SERVICE=WMTS&VERSION=1.0.0&REQUEST=GetTile&STYLE=normal&TILEMATRIXSET=PM&TILEMATRIX=\(zoom)&TILEROW=\(y)&TILECOL=\(x)"
        guard let url = URL(string: urlString),
              let data = try? Data(contentsOf: url) else {
            return nil
        }
        return UIImage(data: data)
    }
}
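For reference, the subclass is attached to the map the same way as the URL-based layer; a minimal sketch (not part of my original code):

// Attach the sync tile layer to the map, just like the URL tile layer
let syncLayer = TestTileLayer()
syncLayer.zIndex = 0
syncLayer.map = mapView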
I'm sure the image returned by the URL uses transparency, and I tried manually changing the white pixels to transparent, but it doesn't change anything...
This is a Simulator-only problem; it has been mentioned here. However, as @a.munzer suggested as a workaround, you can take a screenshot on a real device and just edit it; that shouldn't be much of a problem.
This is a long question so I wanted to put a TL;DR on top:
I want to track QR codes via one of two methods: image tracking by cropping them upon detection, or placing anchors with raycasting. Both of these methods fail when the phone is in portrait mode. The camera source is an ARSession; SceneKit and RealityKit are not used. There's only ARKit. What to do?
I am currently working on an application in Swift in which I try to render some stuff on a server, transmit the video to the iPhone, and display it on screen using an MTKView. I only needed a custom Metal shader to apply some complex calculations to the received frames, so I did not use SceneKit or RealityKit. I only have an ARSession from ARKit and a Metal view here, and up to this point everything works fine.
I am able to do image tracking at this point. However, I want to apply this behaviour to QR codes. What I want is to detect a QR code (multiple if possible) and then track it just like images. Since I don't have the QR codes as ARReferenceImages beforehand, as I would with normal image tracking, I was left with two options:
Option 1: Using raycast(_:) on ARSession
This is probably the right way to do it. However, for this I need to activate both plane detection options on the session's configuration, which then creates many anchors, and managing them alongside image tracking becomes harder. That is not the actual problem, though. The actual problem is that raycasting works as intended when the phone is in landscape mode, but when the phone goes into portrait mode it misses everything and the hit-test results come back empty, even if I pass the frame in the correct orientation. I am not using hitTest(_:) because it is deprecated.
I want to explain the "correct orientation" part here before going into the second option. The ARSession captures frames, and I can inspect each one through the session's didUpdate delegate function. When I read the pixel buffer out of the frame using frame.capturedImage and turn it into a CIImage, the image is always in landscape orientation (width > height), regardless of whether the phone is in portrait mode or not. So whenever I pass this image along, I use oriented(.right) for portrait and oriented(.up) for landscape. I got that idea from another question about the QR bounding box, and so far it is the best option (but not good enough). I also want to note that when I tried raycasting, I used the image size, not the screen size (screen size = my Metal view size, because it is fullscreen), since the image is actually larger than the screen. I can see this if I put a breakpoint and quick-look the CIImage created from the current camera frame.
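To make that concrete, the orientation handling boils down to something like this sketch (the interfaceOrientation parameter is just a stand-in for however you obtain the current UI orientation):

import ARKit
import CoreImage
import UIKit

// Minimal sketch: orient the captured frame before handing it to Vision.
// frame.capturedImage is always landscape, so portrait needs a 90° rotation;
// the landscape mapping may need .down instead of .up depending on which
// landscape orientation you support.
func orientedImage(from frame: ARFrame,
                   interfaceOrientation: UIInterfaceOrientation) -> CIImage {
    let image = CIImage(cvPixelBuffer: frame.capturedImage)
    switch interfaceOrientation {
    case .portrait:
        return image.oriented(.right)
    case .portraitUpsideDown:
        return image.oriented(.left)
    default: // landscape
        return image.oriented(.up)
    }
}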
Option 2: Cropping the QR and treating it as image tracking
This is another approach which I am currently working on. The algorithm is simple: check every frame with Vision. If there are detected QR codes, read their data first. If that data matches an existing QR, re-read it only if the newly cropped QR is larger than the existing one; otherwise do nothing. Then use this cropped QR image for tracking the QR code as an image. At this point we would already have the data, so no problems there.
However, I have tried many times to do the proper transformation explained here in the answer. Again, I think I am able to transform the normalized bounding box into a real rect that can correctly crop the image. Yet, as with raycasting, it works perfectly only if the phone is in landscape orientation. When in portrait it works well enough ONLY IF the phone is really close to the QR code and it is centered on the screen.
For related code, I have this in my View controller:
private var ciContext = CIContext(options: nil)
private var sequenceHandler: VNImageRequestHandler?
And then I have this code to extract QR codes from CIImage:
func extractQrCode(image: CIImage) -> [VNBarcodeObservation]? {
    self.sequenceHandler = VNImageRequestHandler(ciImage: image)

    let barcodeRequest = VNDetectBarcodesRequest()
    barcodeRequest.symbologies = [.QR]

    try? self.sequenceHandler?.perform([barcodeRequest])

    guard let results = barcodeRequest.results else {
        return nil
    }
    return results
}
And this is the delegate method that checks and operates on every frame (the code currently implements Option 2):
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    let rotImg = self.renderer?.getInterfaceOrientation() == .portrait
        ? CIImage(cvPixelBuffer: frame.capturedImage).oriented(.right)
        : CIImage(cvPixelBuffer: frame.capturedImage)

    if let barcodes = self.extractQrCode(image: rotImg) {
        for barcode in barcodes {
            guard let payload = barcode.payloadStringValue else { continue }

            let rect = VNImageRectForNormalizedRect(barcode.boundingBox.botToTop(),
                                                    Int(rotImg.extent.width),
                                                    Int(rotImg.extent.height))
            let existingQR = TrackedImagesManager.imagesToTrack.filter { $0.isQR && $0.QRData == payload }.first

            if (rect.size.width < 800 || rect.size.height < 800 || abs(rect.size.height - rect.size.width) > 32) && existingQR == nil {
                DispatchQueue.main.async {
                    self.showToastMessage(message: "Please get closer to the QR code and try centering it on your screen.",
                                          font: UIFont.systemFont(ofSize: 18),
                                          duration: 3)
                }
                continue
            } else if existingQR != nil {
                if rect.width > (existingQR?.originalImage?.size.width ?? 999) {
                    let croppedImg = rotImg.cropped(to: rect)
                    let croppedCgImage = self.ciContext.createCGImage(croppedImg, from: croppedImg.extent)!
                    let trackImg = UIImage(cgImage: croppedCgImage)
                    existingQR?.originalImage = trackImg
                    existingQR?.image = ARReferenceImage(croppedCgImage, orientation: .up, physicalWidth: 0.1)
                } else {
                    continue
                }
            } else if rect.width != 0 {
                let croppedImg = rotImg.cropped(to: rect)
                let croppedCgImage = self.ciContext.createCGImage(croppedImg, from: croppedImg.extent)!
                let trackImg = UIImage(cgImage: croppedCgImage)
                TrackedImagesManager.imagesToTrack.append(TrackedImage(id: 9, type: 1, image: ARReferenceImage(croppedCgImage, orientation: .up, physicalWidth: 0.1), originalImage: trackImg, isQR: true, QRData: payload))
                print("qr norm rect: \(barcode.boundingBox) \n qr rect: \(rect) \nqr data: \(payload) \nqr hittestres: ")
            }
        }
    }
}
Finally, for the transformation, I have this extension (tried various ways, this is the best so far):
extension CGRect {
    // Flip the Vision normalized rect (bottom-left origin) to a top-left origin
    func botToTop() -> CGRect {
        let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -1)
        return self.applying(transform)
    }
}
So for both options I need some advice to make things right. The Android side of the same feature is implemented as in Option 2, but Android returns a nicely cropped QR code upon detection, and we don't have that on iOS. What do I do now?
In my application, I used VNImageRequestHandler with a custom MLModel for object detection.
The app works fine with iOS versions before 14.5.
When iOS 14.5 came out, it broke everything.
Whenever try handler.perform([visionRequest]) throws an error (Error Domain=com.apple.vis Code=11 "encountered unknown exception" UserInfo={NSLocalizedDescription=encountered unknown exception}), the pixelBuffer memory is held and never released, which fills up the AVCaptureOutput buffers so no new frames arrive.
I had to change the code as below: by copying the pixelBuffer to another variable I solved the problem of new frames not arriving, but the memory leak still happens.
Because of the memory leak, the app crashes after some time.
Note that before iOS 14.5, detection worked perfectly and try handler.perform([visionRequest]) never threw any error.
Here is my code:
private func predictWithPixelBuffer(sampleBuffer: CMSampleBuffer) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        return
    }

    // Get additional info from the camera.
    var options: [VNImageOption: Any] = [:]
    if let cameraIntrinsicMatrix = CMGetAttachment(sampleBuffer, key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, attachmentModeOut: nil) {
        options[.cameraIntrinsics] = cameraIntrinsicMatrix
    }

    autoreleasepool {
        // iOS 14.5 bug: when the Vision request fails, the pixel buffer is never released,
        // so the AVCaptureOutput buffer pool fills up and no new frames are delivered.
        // As a temporary workaround, copy the pixel buffer and hand the copy to Vision.
        // This currently increases memory usage a lot; a better way is still needed.
        var clonePixelBuffer: CVPixelBuffer? = pixelBuffer.copy()
        let handler = VNImageRequestHandler(cvPixelBuffer: clonePixelBuffer!, orientation: orientation, options: options)
        print("[DEBUG] detecting...")
        do {
            try handler.perform([visionRequest])
        } catch {
            delegate?.detector(didOutputBoundingBox: [])
            failedCount += 1
            print("[DEBUG] detect failed \(failedCount)")
            print("Failed to perform Vision request: \(error)")
        }
        clonePixelBuffer = nil
    }
}
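Note that pixelBuffer.copy() above is not a CoreVideo/AVFoundation API; it is a custom extension. A minimal sketch of such a deep copy (my illustration; non-planar buffers only, buffer attributes omitted for brevity) could look like this:

import CoreVideo
import Foundation

extension CVPixelBuffer {
    // Deep-copies the pixel data into a newly allocated buffer.
    func copy() -> CVPixelBuffer {
        var copyOut: CVPixelBuffer?
        let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                         CVPixelBufferGetWidth(self),
                                         CVPixelBufferGetHeight(self),
                                         CVPixelBufferGetPixelFormatType(self),
                                         nil,
                                         &copyOut)
        guard status == kCVReturnSuccess, let copy = copyOut else {
            fatalError("Failed to allocate pixel buffer copy")
        }

        CVPixelBufferLockBaseAddress(self, .readOnly)
        CVPixelBufferLockBaseAddress(copy, [])
        defer {
            CVPixelBufferUnlockBaseAddress(copy, [])
            CVPixelBufferUnlockBaseAddress(self, .readOnly)
        }

        // Copy row by row because source and destination may use different bytes-per-row.
        if let src = CVPixelBufferGetBaseAddress(self),
           let dst = CVPixelBufferGetBaseAddress(copy) {
            let height = CVPixelBufferGetHeight(self)
            let srcBytesPerRow = CVPixelBufferGetBytesPerRow(self)
            let dstBytesPerRow = CVPixelBufferGetBytesPerRow(copy)
            for row in 0..<height {
                memcpy(dst + row * dstBytesPerRow,
                       src + row * srcBytesPerRow,
                       min(srcBytesPerRow, dstBytesPerRow))
            }
        }
        return copy
    }
}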
Has anyone experienced the same problem? If so, how did you fix it?
iOS 14.7 Beta available on the developer portal seems to have fixed this issue.
I have a partial fix for this using @Matthijs Hollemans' CoreMLHelpers library.
The model I use has 300 classes and 2363 anchors. I used a lot of the code Matthijs provided here to convert the model to MLModel.
In the last step a pipeline is built using the 3 sub models: raw_ssd_output, decoder, and nms. For this workaround you need to remove the nms model from the pipeline, and output raw_confidence and raw_coordinates.
In your app you need to add the code from CoreMLHelpers.
Then add this function to decode the output from your MLModel:
func decodeResults(results: [VNCoreMLFeatureValueObservation]) -> [BoundingBox] {
    let raw_confidence: MLMultiArray = results[0].featureValue.multiArrayValue!
    let raw_coordinates: MLMultiArray = results[1].featureValue.multiArrayValue!
    print(raw_confidence.shape, raw_coordinates.shape)

    var boxes = [BoundingBox]()
    let startDecoding = Date()
    for anchor in 0..<raw_confidence.shape[0].int32Value {
        var maxInd: Int = 0
        var maxConf: Float = 0
        for score in 0..<raw_confidence.shape[1].int32Value {
            let key = [anchor, score] as [NSNumber]
            let prob = raw_confidence[key].floatValue
            if prob > maxConf {
                maxInd = Int(score)
                maxConf = prob
            }
        }
        let y0 = raw_coordinates[[anchor, 0] as [NSNumber]].doubleValue
        let x0 = raw_coordinates[[anchor, 1] as [NSNumber]].doubleValue
        let y1 = raw_coordinates[[anchor, 2] as [NSNumber]].doubleValue
        let x1 = raw_coordinates[[anchor, 3] as [NSNumber]].doubleValue
        let width = x1 - x0
        let height = y1 - y0
        let x = x0 + width / 2
        let y = y0 + height / 2
        let rect = CGRect(x: x, y: y, width: width, height: height)
        let box = BoundingBox(classIndex: maxInd, score: maxConf, rect: rect)
        boxes.append(box)
    }
    let finishDecoding = Date()
    let keepIndices = nonMaxSuppressionMultiClass(numClasses: raw_confidence.shape[1].intValue, boundingBoxes: boxes, scoreThreshold: 0.5, iouThreshold: 0.6, maxPerClass: 5, maxTotal: 10)
    let finishNMS = Date()
    var keepBoxes = [BoundingBox]()
    for index in keepIndices {
        keepBoxes.append(boxes[index])
    }
    print("Time Decoding", finishDecoding.timeIntervalSince(startDecoding))
    print("Time Performing NMS", finishNMS.timeIntervalSince(finishDecoding))
    return keepBoxes
}
Then when you receive the results from Vision, you call the function like this:
if let rawResults = vnRequest.results as? [VNCoreMLFeatureValueObservation] {
    let boxes = self.decodeResults(results: rawResults)
    print(boxes)
}
This solution is slow because of the way I move the data around and formulate my list of BoundingBox types. It would be much more efficient to process the MLMultiArray data using underlying pointers, and maybe use Accelerate to find the maximum score and best class for each anchor box.
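To illustrate that idea, here is a rough sketch (not what I shipped) of how the per-anchor class search could be done through the MLMultiArray's raw data pointer with vDSP, assuming raw_confidence is Float32 with a contiguous [anchors, classes] layout:

import Accelerate
import CoreML

// Sketch only: find the best class and score for every anchor by reading the
// MLMultiArray's Float32 storage directly and letting vDSP do the max search.
func bestClassPerAnchor(_ rawConfidence: MLMultiArray) -> [(classIndex: Int, score: Float)] {
    precondition(rawConfidence.dataType == .float32 && rawConfidence.strides[1].intValue == 1,
                 "sketch assumes contiguous Float32 [anchors, classes] data")
    let anchors = rawConfidence.shape[0].intValue
    let classes = rawConfidence.shape[1].intValue
    let anchorStride = rawConfidence.strides[0].intValue
    let base = rawConfidence.dataPointer.bindMemory(to: Float.self, capacity: anchors * anchorStride)

    var best: [(classIndex: Int, score: Float)] = []
    best.reserveCapacity(anchors)
    for anchor in 0..<anchors {
        var maxValue: Float = 0
        var maxIndex: vDSP_Length = 0
        // vDSP_maxvi finds the maximum value and its index in one pass.
        vDSP_maxvi(base + anchor * anchorStride, 1, &maxValue, &maxIndex, vDSP_Length(classes))
        best.append((classIndex: Int(maxIndex), score: maxValue))
    }
    return best
}

The coordinate decoding and centre/size math could be vectorized the same way before running NMS.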
In my case it helped to disable the Neural Engine by forcing Core ML to run on the CPU and GPU only. This is often slower but doesn't throw the exception (at least in our case). In the end we implemented a policy to force some of our models not to run on the Neural Engine on certain iOS devices.
See MLModelConfiguration.computeUnits to constrain the hardware a Core ML model can use.
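A minimal sketch of that configuration (YourDetectionModel is a placeholder for the Xcode-generated model class):

import CoreML
import Vision

// Restrict Core ML to the CPU and GPU so the Neural Engine is never used.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU

// let coreMLModel = try YourDetectionModel(configuration: config)
// let visionModel = try VNCoreMLModel(for: coreMLModel.model)
// let visionRequest = VNCoreMLRequest(model: visionModel)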
A few things I want to establish first:
This works properly on multiple iPhones (iOS 10.3 & 11.x)
This works properly on any iPad simulator (iOS 11.x)
What I am left with is a situation where, when I run the following code (condensed from my application to remove unrelated code), I get an image that is upside down (landscape) or rotated 90 degrees (portrait). Viewing the video that is processed just prior to this step shows that it is properly oriented. All testing has been done on iOS 11.2.5.
* UPDATED *
I did some further testing and found a few more interesting items:
If the video was imported from a phone, or an external source, it is properly processed
If the video was recorded on the iPad in portrait orientation, then the reader extracts it rotated 90 degrees left
If the video was recorded on the iPad in landscape orientation, then the reader extracts it upside down
In the two scenarios above, UIImage reports an orientation of portrait
A condensed version of the code involved:
import UIKit
import AVFoundation

let asset = ...

let assetReader = try? AVAssetReader(asset: asset)

if let assetTrack = asset.tracks(withMediaType: .video).first,
   let assetReader = assetReader {

    let assetReaderOutputSettings = [
        kCVPixelBufferPixelFormatTypeKey as String: NSNumber(value: kCVPixelFormatType_32BGRA)
    ]
    let assetReaderOutput = AVAssetReaderTrackOutput(track: assetTrack,
                                                     outputSettings: assetReaderOutputSettings)
    assetReaderOutput.alwaysCopiesSampleData = false
    assetReader.add(assetReaderOutput)

    var images = [UIImage]()

    assetReader.startReading()

    var sample = assetReaderOutput.copyNextSampleBuffer()
    while sample != nil {
        if let image = sample?.uiImage { // uiImage is a custom CMSampleBuffer helper; the image is inverted here
            images.append(image)
        }
        sample = assetReaderOutput.copyNextSampleBuffer()
    }

    // Continue here with array of images...
}
After some exploration, I came across the following that allowed me to obtain the video's orientation from the AVAssetTrack:
let transform = assetTrack.preferredTransform
let radians = atan2(transform.b, transform.a)
Once I had that, I was able to convert it to degrees:
let degrees = (radians * 180.0) / .pi
Then, using a switch statement I could determine how to rotate the image:
switch Int(degrees) {
case -90, 90:
    // Rotate accordingly
    break
case 180:
    // Flip
    break
default:
    // Do nothing
    break
}
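Putting those pieces together, a hedged sketch of a helper along the same lines (the exact degree-to-orientation mapping may need adjusting for your recordings):

import AVFoundation
import UIKit

// Derive a UIImage.Orientation from the track's preferredTransform so the
// extracted frames can be created with the correct orientation up front.
func imageOrientation(for assetTrack: AVAssetTrack) -> UIImage.Orientation {
    let transform = assetTrack.preferredTransform
    let degrees = atan2(transform.b, transform.a) * 180.0 / .pi

    switch Int(degrees.rounded()) {
    case 90:
        return .right   // typically recorded in portrait
    case -90:
        return .left
    case 180, -180:
        return .down    // typically recorded "upside down" in landscape
    default:
        return .up
    }
}

// Usage when building each frame:
// let image = UIImage(cgImage: cgImage, scale: 1.0, orientation: imageOrientation(for: assetTrack))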
I have a map in my app, implemented with the Google Maps iOS SDK. Traffic works perfectly with the Google tiles (normal, hybrid, satellite, etc.), but nothing happens when I set my custom tiles. What can be wrong?
Code for the custom tiles (the tiles themselves load correctly):
let urls: GMSTileURLConstructor = { (x, y, zoom) in
    let url = "http://a.tile.openstreetmap.org/\(zoom)/\(x)/\(y).png"
    return URL(string: url)
}

let layer = GMSURLTileLayer(urlConstructor: urls)
layer.map = mapView
Code for toggling traffic:
func toggleShowTraffic(status: Bool) {
    mapView.isTrafficEnabled = status
}
Is there any way to automate SFSafariViewController? I like the Xcode 7 UI test feature, but it seems it does not support SFSafariViewController automation. Some of the UI flows I am testing require a web browser, so the app uses SFSafariViewController to make it safer compared to a web view.
If it's similar to launching extensions (currently broken with direct interactions), try tapping the screen at the point where the element you're looking for is:
Example of tapping an action sheet which launches an extension:
func tapElementInActionSheetByPosition(element: XCUIElement!) {
    let tableSize = app.tables.elementBoundByIndex(0).frame.size
    let elementFrame = element.frame
    // get the frame of the cancel button, because it has a real origin point
    let cancelY = app.buttons["Cancel"].frame.origin.y
    // 8 is the standard Apple margin between views
    let yCoordinate = cancelY - 8.0 - tableSize.height + elementFrame.midY
    // tap the button at its screen position, since tapping a button in the extension picker directly is currently broken
    app.coordinateWithNormalizedOffset(CGVectorMake(elementFrame.midX / tableSize.width, yCoordinate / app.frame.size.height)).tap()
}
Note: You must tap at the XCUIApplication query layer. Tapping the element by position doesn't work.
For now Xcode 9.3 has support for that, but it doesn't work properly because of an annoying Xcode bug.
In a test you can print app.webViews.buttons.debugDescription or app.webViews.textFields.debugDescription, and it prints the correct information, but after tap or typeText you get a crash.
As a workaround you can parse debugDescription, extract the coordinates, and tap by coordinate. For a text field you can insert text via the "Paste" menu.
private func coordinate(forWebViewElement element: XCUIElement) -> XCUICoordinate? {
    // parse the description to find the element's frame
    let descr = element.firstMatch.debugDescription
    guard let rangeOpen = descr.range(of: "{{", options: [.backwards]),
          let rangeClose = descr.range(of: "}}", options: [.backwards]) else {
        return nil
    }

    let frameStr = String(descr[rangeOpen.lowerBound..<rangeClose.upperBound])
    let rect = CGRectFromString(frameStr)

    // get the center of the rect
    let center = CGVector(dx: rect.midX, dy: rect.midY)
    let coordinate = XCUIApplication().coordinate(withNormalizedOffset: .zero).withOffset(center)
    return coordinate
}
func tap(onWebViewElement element: XCUIElement) {
    // Xcode has a bug, so we cannot directly access webViews' XCUIElements.
    // As a workaround we check debugDescription, find the frame, and tap by coordinate.
    let coord = coordinate(forWebViewElement: element)
    coord?.tap()
}
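And a hedged sketch of the "insert text via Paste" idea mentioned above, reusing the same coordinate(forWebViewElement:) helper (my own illustration, not copied from the gist):

import XCTest
import UIKit

// Put the text on the system pasteboard, focus the field, long-press to bring
// up the edit menu, then tap its "Paste" item.
func paste(text: String, intoWebViewElement element: XCUIElement) {
    guard let coord = coordinate(forWebViewElement: element) else { return }

    UIPasteboard.general.string = text
    coord.tap()                    // focus the text field
    coord.press(forDuration: 1.0)  // long press to show the edit menu

    let pasteMenuItem = XCUIApplication().menuItems["Paste"]
    if pasteMenuItem.waitForExistence(timeout: 2) {
        pasteMenuItem.tap()
    }
}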
An article about that: https://medium.com/@pilot34/work-with-sfsafariviewcontroller-or-wkwebview-in-xcode-ui-tests-8b14fd281a1f
Full code is here: https://gist.github.com/pilot34/09d692f74d4052670f3bae77dd745889