Number text recognition not highlighting/recognizing text - ios

I am following Apple's phone number recognition sample. Normally it draws a red outline around the recognized text. My version does not seem to recognize the text or draw the red outline, even though I used their code. The only difference is that my view controller class is called "TextScanViewController" where theirs is just "ViewController". I went through and made sure that every "ViewController" reference was changed to "TextScanViewController". Am I missing something else that I should change?
Here is what it should look like (using the original Apple project) compared to what mine is doing (there should be red outlines, but none appear even when the text is perfectly centered in the rectangle):
Should look like:
Looks like:
I am using five Swift files (PreviewView, TextScanViewController, VisionViewController, StringUtils, AppDelegate).
TextScanViewController:
import UIKit
import AVFoundation
import Vision
class TextScanViewController: UIViewController {
// MARK: - UI objects
@IBOutlet weak var previewView: PreviewView!
@IBOutlet weak var cutoutView: UIView!
@IBOutlet weak var numberView: UILabel!
var maskLayer = CAShapeLayer()
// Device orientation. Updated whenever the orientation changes to a
// different supported orientation.
var currentOrientation = UIDeviceOrientation.portrait
// MARK: - Capture related objects
private let captureSession = AVCaptureSession()
let captureSessionQueue = DispatchQueue(label: "com.example.apple-samplecode.CaptureSessionQueue")
var captureDevice: AVCaptureDevice?
var videoDataOutput = AVCaptureVideoDataOutput()
let videoDataOutputQueue = DispatchQueue(label: "com.example.apple-samplecode.VideoDataOutputQueue")
// MARK: - Region of interest (ROI) and text orientation
// Region of video data output buffer that recognition should be run on.
// Gets recalculated once the bounds of the preview layer are known.
var regionOfInterest = CGRect(x: 0, y: 0, width: 1, height: 1)
// Orientation of text to search for in the region of interest.
var textOrientation = CGImagePropertyOrientation.up
// MARK: - Coordinate transforms
var bufferAspectRatio: Double!
// Transform from UI orientation to buffer orientation.
var uiRotationTransform = CGAffineTransform.identity
// Transform bottom-left coordinates to top-left.
var bottomToTopTransform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -1)
// Transform coordinates in ROI to global coordinates (still normalized).
var roiToGlobalTransform = CGAffineTransform.identity
// Vision -> AVF coordinate transform.
var visionToAVFTransform = CGAffineTransform.identity
// MARK: - View controller methods
override func viewDidLoad() {
super.viewDidLoad()
// Set up preview view.
previewView.session = captureSession
// Set up cutout view.
cutoutView.backgroundColor = UIColor.gray.withAlphaComponent(0.5)
maskLayer.backgroundColor = UIColor.clear.cgColor
maskLayer.fillRule = .evenOdd
cutoutView.layer.mask = maskLayer
// Starting the capture session is a blocking call. Perform setup using
// a dedicated serial dispatch queue to prevent blocking the main thread.
captureSessionQueue.async {
self.setupCamera()
// Calculate region of interest now that the camera is setup.
DispatchQueue.main.async {
// Figure out initial ROI.
self.calculateRegionOfInterest()
}
}
}
override func viewWillTransition(to size: CGSize, with coordinator: UIViewControllerTransitionCoordinator) {
super.viewWillTransition(to: size, with: coordinator)
// Only change the current orientation if the new one is landscape or
// portrait. You can't really do anything about flat or unknown.
let deviceOrientation = UIDevice.current.orientation
if deviceOrientation.isPortrait || deviceOrientation.isLandscape {
currentOrientation = deviceOrientation
}
// Handle device orientation in the preview layer.
if let videoPreviewLayerConnection = previewView.videoPreviewLayer.connection {
if let newVideoOrientation = AVCaptureVideoOrientation(deviceOrientation: deviceOrientation) {
videoPreviewLayerConnection.videoOrientation = newVideoOrientation
}
}
// Orientation changed: figure out new region of interest (ROI).
calculateRegionOfInterest()
}
override func viewDidLayoutSubviews() {
super.viewDidLayoutSubviews()
updateCutout()
}
// MARK: - Setup
func calculateRegionOfInterest() {
// In landscape orientation the desired ROI is specified as the ratio of
// buffer width to height. When the UI is rotated to portrait, keep the
// vertical size the same (in buffer pixels). Also try to keep the
// horizontal size the same up to a maximum ratio.
let desiredHeightRatio = 0.15
let desiredWidthRatio = 0.6
let maxPortraitWidth = 0.8
// Figure out size of ROI.
let size: CGSize
if currentOrientation.isPortrait || currentOrientation == .unknown {
size = CGSize(width: min(desiredWidthRatio * bufferAspectRatio, maxPortraitWidth), height: desiredHeightRatio / bufferAspectRatio)
} else {
size = CGSize(width: desiredWidthRatio, height: desiredHeightRatio)
}
// Make it centered.
regionOfInterest.origin = CGPoint(x: (1 - size.width) / 2, y: (1 - size.height) / 2)
regionOfInterest.size = size
// ROI changed, update transform.
setupOrientationAndTransform()
// Update the cutout to match the new ROI.
DispatchQueue.main.async {
// Wait for the next run cycle before updating the cutout. This
// ensures that the preview layer already has its new orientation.
self.updateCutout()
}
}
func updateCutout() {
// Figure out where the cutout ends up in layer coordinates.
let roiRectTransform = bottomToTopTransform.concatenating(uiRotationTransform)
let cutout = previewView.videoPreviewLayer.layerRectConverted(fromMetadataOutputRect: regionOfInterest.applying(roiRectTransform))
// Create the mask.
let path = UIBezierPath(rect: cutoutView.frame)
path.append(UIBezierPath(rect: cutout))
maskLayer.path = path.cgPath
// Move the number view down to under cutout.
var numFrame = cutout
numFrame.origin.y += numFrame.size.height
numberView.frame = numFrame
}
func setupOrientationAndTransform() {
// Recalculate the affine transform between Vision coordinates and AVF coordinates.
// Compensate for region of interest.
let roi = regionOfInterest
roiToGlobalTransform = CGAffineTransform(translationX: roi.origin.x, y: roi.origin.y).scaledBy(x: roi.width, y: roi.height)
// Compensate for orientation (buffers always come in the same orientation).
switch currentOrientation {
case .landscapeLeft:
textOrientation = CGImagePropertyOrientation.up
uiRotationTransform = CGAffineTransform.identity
case .landscapeRight:
textOrientation = CGImagePropertyOrientation.down
uiRotationTransform = CGAffineTransform(translationX: 1, y: 1).rotated(by: CGFloat.pi)
case .portraitUpsideDown:
textOrientation = CGImagePropertyOrientation.left
uiRotationTransform = CGAffineTransform(translationX: 1, y: 0).rotated(by: CGFloat.pi / 2)
default: // We default everything else to .portraitUp
textOrientation = CGImagePropertyOrientation.right
uiRotationTransform = CGAffineTransform(translationX: 0, y: 1).rotated(by: -CGFloat.pi / 2)
}
// Full Vision ROI to AVF transform.
visionToAVFTransform = roiToGlobalTransform.concatenating(bottomToTopTransform).concatenating(uiRotationTransform)
}
func setupCamera() {
guard let captureDevice = AVCaptureDevice.default(.builtInWideAngleCamera, for: AVMediaType.video, position: .back) else {
print("Could not create capture device.")
return
}
self.captureDevice = captureDevice
// NOTE:
// Requesting 4k buffers allows recognition of smaller text but will
// consume more power. Use the smallest buffer size necessary to keep
// down battery usage.
if captureDevice.supportsSessionPreset(.hd4K3840x2160) {
captureSession.sessionPreset = AVCaptureSession.Preset.hd4K3840x2160
bufferAspectRatio = 3840.0 / 2160.0
} else {
captureSession.sessionPreset = AVCaptureSession.Preset.hd1920x1080
bufferAspectRatio = 1920.0 / 1080.0
}
guard let deviceInput = try? AVCaptureDeviceInput(device: captureDevice) else {
print("Could not create device input.")
return
}
if captureSession.canAddInput(deviceInput) {
captureSession.addInput(deviceInput)
}
// Configure video data output.
videoDataOutput.alwaysDiscardsLateVideoFrames = true
videoDataOutput.setSampleBufferDelegate(self, queue: videoDataOutputQueue)
videoDataOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_420YpCbCr8BiPlanarFullRange]
if captureSession.canAddOutput(videoDataOutput) {
captureSession.addOutput(videoDataOutput)
// NOTE:
// There is a trade-off to be made here. Enabling stabilization will
// give temporally more stable results and should help the recognizer
// converge. But if it's enabled the VideoDataOutput buffers don't
// match what's displayed on screen, which makes drawing bounding
// boxes very hard. Disable it in this app to allow drawing detected
// bounding boxes on screen.
videoDataOutput.connection(with: AVMediaType.video)?.preferredVideoStabilizationMode = .off
} else {
print("Could not add VDO output")
return
}
// Set zoom and autofocus to help focus on very small text.
do {
try captureDevice.lockForConfiguration()
captureDevice.videoZoomFactor = 2
captureDevice.autoFocusRangeRestriction = .near
captureDevice.unlockForConfiguration()
} catch {
print("Could not set zoom level due to error: \(error)")
return
}
captureSession.startRunning()
}
// MARK: - UI drawing and interaction
func showString(string: String) {
// Found a definite number.
// Stop the camera synchronously to ensure that no further buffers are
// received. Then update the number view asynchronously.
captureSessionQueue.sync {
self.captureSession.stopRunning()
DispatchQueue.main.async {
self.numberView.text = string
self.numberView.isHidden = false
}
}
}
@IBAction func handleTap(_ sender: UITapGestureRecognizer) {
captureSessionQueue.async {
if !self.captureSession.isRunning {
self.captureSession.startRunning()
}
DispatchQueue.main.async {
self.numberView.isHidden = true
}
}
}
}
// MARK: - AVCaptureVideoDataOutputSampleBufferDelegate
extension TextScanViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
// This is implemented in VisionViewController.
}
}
// MARK: - Utility extensions
extension AVCaptureVideoOrientation {
init?(deviceOrientation: UIDeviceOrientation) {
switch deviceOrientation {
case .portrait: self = .portrait
case .portraitUpsideDown: self = .portraitUpsideDown
case .landscapeLeft: self = .landscapeRight
case .landscapeRight: self = .landscapeLeft
default: return nil
}
}
}
PreviewView:
import Foundation
import UIKit
import AVFoundation
class PreviewView: UIView {
var videoPreviewLayer: AVCaptureVideoPreviewLayer {
guard let layer = layer as? AVCaptureVideoPreviewLayer else {
fatalError("Expected `AVCaptureVideoPreviewLayer` type for layer. Check PreviewView.layerClass implementation.")
}
return layer
}
var session: AVCaptureSession? {
get {
return videoPreviewLayer.session
}
set {
videoPreviewLayer.session = newValue
}
}
// MARK: UIView
override class var layerClass: AnyClass {
return AVCaptureVideoPreviewLayer.self
}
}
VisionViewController:
import UIKit
import AVFoundation
import Vision
class VisionViewController: TextScanViewController {
var request: VNRecognizeTextRequest!
// Temporal string tracker
let numberTracker = StringTracker()
override func viewDidLoad() {
// Set up vision request before letting TextScanViewController set up the camera
// so that it exists when the first buffer is received.
request = VNRecognizeTextRequest(completionHandler: recognizeTextHandler)
super.viewDidLoad()
}
// MARK: - Text recognition
// Vision recognition handler.
func recognizeTextHandler(request: VNRequest, error: Error?) {
var numbers = [String]()
var redBoxes = [CGRect]() // Shows all recognized text lines
var greenBoxes = [CGRect]() // Shows words that might be serials
guard let results = request.results as? [VNRecognizedTextObservation] else {
return
}
let maximumCandidates = 1
for visionResult in results {
guard let candidate = visionResult.topCandidates(maximumCandidates).first else { continue }
// Draw red boxes around any detected text, and green boxes around
// any detected phone numbers. The phone number may be a substring
// of the visionResult. If a substring, draw a green box around the
// number and a red box around the full string. If the number covers
// the full result only draw the green box.
var numberIsSubstring = true
if let result = candidate.string.extractPhoneNumber() {
let (range, number) = result
// Number may not cover full visionResult. Extract bounding box
// of substring.
if let box = try? candidate.boundingBox(for: range)?.boundingBox {
numbers.append(number)
greenBoxes.append(box)
numberIsSubstring = !(range.lowerBound == candidate.string.startIndex && range.upperBound == candidate.string.endIndex)
}
}
if numberIsSubstring {
redBoxes.append(visionResult.boundingBox)
}
}
// Log any found numbers.
numberTracker.logFrame(strings: numbers)
show(boxGroups: [(color: UIColor.red.cgColor, boxes: redBoxes), (color: UIColor.green.cgColor, boxes: greenBoxes)])
// Check if we have any temporally stable numbers.
if let sureNumber = numberTracker.getStableString() {
showString(string: sureNumber)
numberTracker.reset(string: sureNumber)
}
}
override func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
if let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) {
// Configure for running in real-time.
request.recognitionLevel = .fast
// Language correction won't help recognizing phone numbers. It also
// makes recognition slower.
request.usesLanguageCorrection = false
// Only run on the region of interest for maximum speed.
request.regionOfInterest = regionOfInterest
let requestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: textOrientation, options: [:])
do {
try requestHandler.perform([request])
} catch {
print(error)
}
}
}
// MARK: - Bounding box drawing
// Draw a box on screen. Must be called from main queue.
var boxLayer = [CAShapeLayer]()
func draw(rect: CGRect, color: CGColor) {
let layer = CAShapeLayer()
layer.opacity = 0.5
layer.borderColor = color
layer.borderWidth = 2
layer.frame = rect
boxLayer.append(layer)
previewView.videoPreviewLayer.insertSublayer(layer, at: 1)
}
// Remove all drawn boxes. Must be called on main queue.
func removeBoxes() {
for layer in boxLayer {
layer.removeFromSuperlayer()
}
boxLayer.removeAll()
}
typealias ColoredBoxGroup = (color: CGColor, boxes: [CGRect])
// Draws groups of colored boxes.
func show(boxGroups: [ColoredBoxGroup]) {
DispatchQueue.main.async {
let layer = self.previewView.videoPreviewLayer
self.removeBoxes()
for boxGroup in boxGroups {
let color = boxGroup.color
for box in boxGroup.boxes {
let rect = layer.layerRectConverted(fromMetadataOutputRect: box.applying(self.visionToAVFTransform))
self.draw(rect: rect, color: color)
}
}
}
}
}
StringUtils:
import Foundation
extension Character {
// Given a list of allowed characters, try to convert self to those in list
// if not already in it. This handles some common misclassifications for
// characters that are visually similar and can only be correctly recognized
// with more context and/or domain knowledge. Some examples (should be read
// in Menlo or some other font that has different symbols for all characters):
// 1 and l are the same character in Times New Roman
// I and l are the same character in Helvetica
// 0 and O are extremely similar in many fonts
// oO, wW, cC, sS, pP and others only differ by size in many fonts
func getSimilarCharacterIfNotIn(allowedChars: String) -> Character {
let conversionTable = [
"s": "S",
"S": "5",
"5": "S",
"o": "O",
"Q": "O",
"O": "0",
"0": "O",
"l": "I",
"I": "1",
"1": "I",
"B": "8",
"8": "B"
]
// Allow a maximum of two substitutions to handle 's' -> 'S' -> '5'.
let maxSubstitutions = 2
var current = String(self)
var counter = 0
while !allowedChars.contains(current) && counter < maxSubstitutions {
if let altChar = conversionTable[current] {
current = altChar
counter += 1
} else {
// Doesn't match anything in our table. Give up.
break
}
}
return current.first!
}
}
extension String {
// Extracts the first US-style phone number found in the string, returning
// the range of the number and the number itself as a tuple.
// Returns nil if no number is found.
func extractPhoneNumber() -> (Range<String.Index>, String)? {
// Do a first pass to find any substring that could be a US phone
// number. This will match the following common patterns and more:
// xxx-xxx-xxxx
// xxx xxx xxxx
// (xxx) xxx-xxxx
// (xxx)xxx-xxxx
// xxx.xxx.xxxx
// xxx xxx-xxxx
// xxx/xxx.xxxx
// +1-xxx-xxx-xxxx
// Note that this doesn't only look for digits since some digits look
// very similar to letters. This is handled later.
let pattern = #"""
(?x) # Verbose regex, allows comments
(?:\+1-?)? # Potential international prefix, may have -
[(]? # Potential opening (
\b(\w{3}) # Capture xxx
[)]? # Potential closing )
[\ -./]? # Potential separator
(\w{3}) # Capture xxx
[\ -./]? # Potential separator
(\w{4})\b # Capture xxxx
"""#
guard let range = self.range(of: pattern, options: .regularExpression, range: nil, locale: nil) else {
// No phone number found.
return nil
}
// Potential number found. Strip out punctuation, whitespace and country
// prefix.
var phoneNumberDigits = ""
let substring = String(self[range])
let nsrange = NSRange(substring.startIndex..., in: substring)
do {
// Extract the characters from the substring.
let regex = try NSRegularExpression(pattern: pattern, options: [])
if let match = regex.firstMatch(in: substring, options: [], range: nsrange) {
for rangeInd in 1 ..< match.numberOfRanges {
let range = match.range(at: rangeInd)
let matchString = (substring as NSString).substring(with: range)
phoneNumberDigits += matchString as String
}
}
} catch {
print("Error \(error) when creating pattern")
}
// Must be exactly 10 digits.
guard phoneNumberDigits.count == 10 else {
return nil
}
// Substitute commonly misrecognized characters, for example: 'S' -> '5' or 'l' -> '1'
var result = ""
let allowedChars = "0123456789"
for var char in phoneNumberDigits {
char = char.getSimilarCharacterIfNotIn(allowedChars: allowedChars)
guard allowedChars.contains(char) else {
return nil
}
result.append(char)
}
return (range, result)
}
}
class StringTracker {
var frameIndex: Int64 = 0
typealias StringObservation = (lastSeen: Int64, count: Int64)
// Dictionary of seen strings. Used to get stable recognition before
// displaying anything.
var seenStrings = [String: StringObservation]()
var bestCount = Int64(0)
var bestString = ""
func logFrame(strings: [String]) {
for string in strings {
if seenStrings[string] == nil {
seenStrings[string] = (lastSeen: Int64(0), count: Int64(-1))
}
seenStrings[string]?.lastSeen = frameIndex
seenStrings[string]?.count += 1
print("Seen \(string) \(seenStrings[string]?.count ?? 0) times")
}
var obsoleteStrings = [String]()
// Go through strings and prune any that have not been seen in a while.
// Also find the (non-pruned) string with the greatest count.
for (string, obs) in seenStrings {
// Remove previously seen text after 30 frames (~1s).
if obs.lastSeen < frameIndex - 30 {
obsoleteStrings.append(string)
}
// Find the string with the greatest count.
let count = obs.count
if !obsoleteStrings.contains(string) && count > bestCount {
bestCount = Int64(count)
bestString = string
}
}
// Remove old strings.
for string in obsoleteStrings {
seenStrings.removeValue(forKey: string)
}
frameIndex += 1
}
func getStableString() -> String? {
// Require the recognizer to see the same string at least 10 times.
if bestCount >= 10 {
return bestString
} else {
return nil
}
}
func reset(string: String) {
seenStrings.removeValue(forKey: string)
bestCount = 0
bestString = ""
}
}
AppDelegate:
import UIKit
@UIApplicationMain
class AppDelegate: UIResponder, UIApplicationDelegate {
var window: UIWindow?
}

I had set the wrong class on the view controller. Instead of TextScanViewController, it should have been set to VisionViewController, so a whole class was being skipped. I didn't realize how class inheritance works: VisionViewController subclasses TextScanViewController and overrides captureOutput, so the subclass has to be the one that gets instantiated. I have a lot to learn, but I'm learning a lot! :)
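The failure mode can be reproduced without UIKit. The following is a minimal sketch, using hypothetical class names (BaseScanner, RecognizingScanner) as stand-ins for TextScanViewController and VisionViewController: the base class implements the frame callback as a no-op and only the subclass overrides it, so whichever class actually gets instantiated decides whether any recognition happens.

```swift
// Hypothetical stand-ins for TextScanViewController / VisionViewController.
// No UIKit is involved; this only illustrates method overriding.
class BaseScanner {
    // Mirrors the empty captureOutput(_:didOutput:from:) in the base class.
    func handleFrame(_ frame: String) -> String? {
        return nil // The base class ignores every frame.
    }
}

class RecognizingScanner: BaseScanner {
    // Mirrors the override that actually runs the recognition request.
    override func handleFrame(_ frame: String) -> String? {
        return "recognized: \(frame)"
    }
}

// Base class instantiated: frames arrive, but nothing is ever recognized.
let wrong: BaseScanner = BaseScanner()
// Subclass instantiated: the override runs for every frame.
let right: BaseScanner = RecognizingScanner()

print(wrong.handleFrame("555-123-4567") ?? "nothing drawn")
print(right.handleFrame("555-123-4567") ?? "nothing drawn")
```

In the real project the equivalent of choosing between `wrong` and `right` is the custom class assigned to the view controller, which is why renaming references alone did not help.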

Related

ARKit: Tracking VisionCoreML detected object

I'm new to iOS and I am currently refactoring code I got from a tutorial on VisionCoreML and ARKit that adds a node to the detected object.
Currently, if I move the object, the node does not move and follow it. I can see that Apple's sample code for Recognizing Objects in Live Capture uses layers and repositions them each time Vision detects the object at a new position, which is what I was hoping to replicate with an ARObject.
Is there a way I can achieve this with ARKit?
Any help around this would be greatly appreciated.
Thanks.
EDIT: Working code with solution
@IBOutlet var sceneView: ARSCNView!
private var viewportSize: CGSize!
private var previousAnchor: ARAnchor?
private var trackingNode: SCNNode!
lazy var objectDetectionRequest: VNCoreMLRequest = {
do {
let model = try VNCoreMLModel(for: yolov5s(configuration: MLModelConfiguration()).model)
let request = VNCoreMLRequest(model: model) { [weak self] request, error in
self?.processDetections(for: request, error: error)
}
return request
} catch {
fatalError("Failed to load Vision ML model.")
}
}()
func renderer(_ renderer: SCNSceneRenderer, willRenderScene scene: SCNScene, atTime time: TimeInterval) {
guard let capturedImage = sceneView.session.currentFrame?.capturedImage
else { return }
let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: capturedImage, orientation: .leftMirrored, options: [:])
do {
try imageRequestHandler.perform([objectDetectionRequest])
} catch {
print("Failed to perform image request.")
}
}
func processDetections(for request: VNRequest, error: Error?) {
guard error == nil else {
print("Object detection error: \(error!.localizedDescription)")
return
}
guard let results = request.results else { return }
for case let objectObservation as VNRecognizedObjectObservation in results {
// Skip observations without any label instead of force-unwrapping.
guard let topLabelObservation = objectObservation.labels.first else { continue }
print("\(topLabelObservation.identifier) \(Int(topLabelObservation.confidence * 100))%")
guard recognisedObject(topLabelObservation.identifier) && topLabelObservation.confidence > 0.9
else { continue }
let rect = VNImageRectForNormalizedRect(
objectObservation.boundingBox,
Int(self.sceneView.bounds.width),
Int(self.sceneView.bounds.height))
let midPoint = CGPoint(x: rect.midX, y: rect.midY)
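// raycastQuery(from:allowing:alignment:) returns an optional; skip this
// observation if no query can be built rather than force-unwrapping.
guard let raycastQuery = self.sceneView.raycastQuery(from: midPoint,
allowing: .estimatedPlane,
alignment: .any) else { continue }
let raycastArray = self.sceneView.session.raycast(raycastQuery)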
guard let raycastResult = raycastArray.first else { return }
let position = SCNVector3(raycastResult.worldTransform.columns.3.x,
raycastResult.worldTransform.columns.3.y,
raycastResult.worldTransform.columns.3.z)
if let _ = trackingNode {
trackingNode!.worldPosition = position
} else {
trackingNode = createNode()
trackingNode!.worldPosition = position
self.sceneView.scene.rootNode.addChildNode(trackingNode!)
}
}
}
private func recognisedObject(_ identifier: String) -> Bool {
return identifier == "remote" || identifier == "mouse"
}
private func createNode() -> SCNNode {
let sphereNode = SCNNode(geometry: SCNSphere(radius: 0.01))
sphereNode.geometry?.firstMaterial?.diffuse.contents = UIColor.purple
return sphereNode
}
private func loadSession() {
let configuration = ARWorldTrackingConfiguration()
configuration.planeDetection = []
sceneView.session.run(configuration)
}
override func viewDidLoad() {
super.viewDidLoad()
sceneView.delegate = self
viewportSize = sceneView.frame.size
}
override func viewWillAppear(_ animated: Bool) {
super.viewWillAppear(animated)
loadSession()
}
override func viewWillDisappear(_ animated: Bool) {
super.viewWillDisappear(animated)
sceneView.session.pause()
}
To be honest, the technologies you're using here cannot do that out of the box. YOLO (and any other object detection model you might swap in for it) has no built-in concept of tracking the same object across video frames. These models look for objects in a 2D bitmap and return 2D bounding boxes for them. As either the camera or the object moves and you pass in the next capturedImage buffer, the model will give you a new bounding box in the correct position, but it has no way of knowing whether it's the same instance of the object detected in a previous frame.
To make this work, you'll need to do some post-processing of those Vision results to determine whether or not it's the same object, and if so, manually move the anchor/mesh to match the new position. If you're confident there should only be one object in view at any given time, then it's pretty straightforward. If there will be multiple objects, you're venturing into complex (but still achievable) territory.
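One common way to do that kind of post-processing is to compare each new bounding box with the previously tracked one using intersection over union (IoU). The sketch below is only an illustration of that idea, not something the question's code or Vision provides: `Box`, `iou`, `matchTrackedObject`, and the 0.3 threshold are all assumptions, and `Box` merely stands in for a normalized Vision bounding box.

```swift
// Hypothetical type: stands in for a normalized Vision bounding box.
struct Box {
    var x: Double, y: Double, width: Double, height: Double
    var area: Double { width * height }
}

// Intersection over union of two boxes; 0 when they do not overlap.
func iou(_ a: Box, _ b: Box) -> Double {
    let left = max(a.x, b.x)
    let right = min(a.x + a.width, b.x + b.width)
    let bottom = max(a.y, b.y)
    let top = min(a.y + a.height, b.y + b.height)
    guard right > left, top > bottom else { return 0 }
    let intersection = (right - left) * (top - bottom)
    return intersection / (a.area + b.area - intersection)
}

// Returns the detection that best overlaps the previously tracked box,
// or nil if nothing overlaps by at least `threshold` (an assumed value).
func matchTrackedObject(previous: Box, detections: [Box], threshold: Double = 0.3) -> Box? {
    let scored = detections.map { (box: $0, score: iou(previous, $0)) }
    guard let best = scored.max(by: { $0.score < $1.score }), best.score >= threshold else {
        return nil
    }
    return best.box
}

let previous = Box(x: 0.40, y: 0.40, width: 0.20, height: 0.20)
let detections = [
    Box(x: 0.05, y: 0.05, width: 0.10, height: 0.10), // unrelated object
    Box(x: 0.45, y: 0.42, width: 0.20, height: 0.20), // same object, moved slightly
]
if let match = matchTrackedObject(previous: previous, detections: detections) {
    print("Move the node to the box at (\(match.x), \(match.y))")
} else {
    print("Treat this as a new object")
}
```

With a single object in view, a match means the existing node can be repositioned; with several objects, each tracked box would need its own best match, which is where the extra complexity comes from.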
You could try to incorporate Vision tracking, which might work, though that would depend on the nature and behavior of the tracked object.
Also, sceneView.hitTest() is deprecated. You should probably port that over to use ARSession.raycast().

Show depth data with ARKit and MetalKit

I am a total beginner in Swift and iOS, and I am trying to:
1. Visualise the depth map on the phone screen, instead of the actual video recording.
2. Save both the RGB and depth data streams.
I am currently stuck on the first one. I am using ARKit 4 with MetalKit. It seems that I can get the depth data from the frame, but the visualisation I am rendering is really bad. Compared to the ARKit 4 video (https://youtu.be/SpZyxHkmfqE?t=1132, with timestamp), the quality of my depth map is really low, the colors are different, and distant objects are not shown at all (I do not mean really distant objects: even at ~1 m it already fails completely in a static indoor environment). Examples are at the bottom of the question.
My ViewController.swift:
import UIKit
import Metal
import MetalKit
import ARKit
extension MTKView : RenderDestinationProvider {
}
class ViewController: UIViewController, MTKViewDelegate, ARSessionDelegate {
var session: ARSession!
var configuration = ARWorldTrackingConfiguration()
var renderer: Renderer!
var depthBuffer: CVPixelBuffer!
var confidenceBuffer: CVPixelBuffer!
override func viewDidLoad() {
super.viewDidLoad()
// Set the view's delegate
session = ARSession()
session.delegate = self
// Set the view to use the default device
if let view = self.view as? MTKView {
view.device = MTLCreateSystemDefaultDevice()
view.backgroundColor = UIColor.clear
view.delegate = self
guard view.device != nil else {
print("Metal is not supported on this device")
return
}
// Configure the renderer to draw to the view
renderer = Renderer(session: session, metalDevice: view.device!, renderDestination: view)
renderer.drawRectResized(size: view.bounds.size)
}
//let tapGesture = UITapGestureRecognizer(target: self, action: #selector(ViewController.handleTap(gestureRecognize:)))
//view.addGestureRecognizer(tapGesture)
}
override func viewWillAppear(_ animated: Bool) {
super.viewWillAppear(animated)
// Create a session configuration
//let configuration = ARWorldTrackingConfiguration()
configuration.frameSemantics = .sceneDepth
// Run the view's session
session.run(configuration)
UIApplication.shared.isIdleTimerDisabled = true
}
override func viewWillDisappear(_ animated: Bool) {
super.viewWillDisappear(animated)
// Pause the view's session
session.pause()
}
/*@objc
func handleTap(gestureRecognize: UITapGestureRecognizer) {
// Create anchor using the camera's current position
if let currentFrame = session.currentFrame {
// Create a transform with a translation of 0.2 meters in front of the camera
var translation = matrix_identity_float4x4
translation.columns.3.z = -0.2
let transform = simd_mul(currentFrame.camera.transform, translation)
// Add a new anchor to the session
let anchor = ARAnchor(transform: transform)
session.add(anchor: anchor)
}
}
*/
// MARK: - MTKViewDelegate
// Called whenever view changes orientation or layout is changed
func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
renderer.drawRectResized(size: size)
}
// Called whenever the view needs to render
func draw(in view: MTKView) {
renderer.update()
}
// MARK: - ARSessionDelegate
func session(_ session: ARSession, didFailWithError error: Error) {
// Present an error message to the user
}
func sessionWasInterrupted(_ session: ARSession) {
// Inform the user that the session has been interrupted, for example, by presenting an overlay
}
func sessionInterruptionEnded(_ session: ARSession) {
// Reset tracking and/or remove existing anchors if consistent tracking is required
}
}
My Renderer.swift (only the modified functions, updateCapturedImageTextures(frame: ARFrame) and drawCapturedImage(renderEncoder: MTLRenderCommandEncoder)):
import Foundation
import Metal
import MetalKit
import ARKit
protocol RenderDestinationProvider {
var currentRenderPassDescriptor: MTLRenderPassDescriptor? { get }
var currentDrawable: CAMetalDrawable? { get }
var colorPixelFormat: MTLPixelFormat { get set }
var depthStencilPixelFormat: MTLPixelFormat { get set }
var sampleCount: Int { get set }
}
// The max number of command buffers in flight
let kMaxBuffersInFlight: Int = 3
// The max number of anchors our uniform buffer will hold
let kMaxAnchorInstanceCount: Int = 64
// The 256 byte aligned size of our uniform structures
let kAlignedSharedUniformsSize: Int = (MemoryLayout<SharedUniforms>.size & ~0xFF) + 0x100
let kAlignedInstanceUniformsSize: Int = ((MemoryLayout<InstanceUniforms>.size * kMaxAnchorInstanceCount) & ~0xFF) + 0x100
// Vertex data for an image plane
let kImagePlaneVertexData: [Float] = [
-1.0, -1.0, 0.0, 1.0,
1.0, -1.0, 1.0, 1.0,
-1.0, 1.0, 0.0, 0.0,
1.0, 1.0, 1.0, 0.0,
]
class Renderer {
let session: ARSession
let device: MTLDevice
let inFlightSemaphore = DispatchSemaphore(value: kMaxBuffersInFlight)
var renderDestination: RenderDestinationProvider
// Metal objects
var commandQueue: MTLCommandQueue!
var sharedUniformBuffer: MTLBuffer!
var anchorUniformBuffer: MTLBuffer!
var imagePlaneVertexBuffer: MTLBuffer!
var capturedImagePipelineState: MTLRenderPipelineState!
var capturedImageDepthState: MTLDepthStencilState!
var anchorPipelineState: MTLRenderPipelineState!
var anchorDepthState: MTLDepthStencilState!
var capturedImageTextureY: CVMetalTexture?
var capturedImageTextureCbCr: CVMetalTexture?
// Captured image texture cache
var capturedImageTextureCache: CVMetalTextureCache!
// Metal vertex descriptor specifying how vertices will be laid out for input into our
// anchor geometry render pipeline and how we'll layout our Model IO vertices
var geometryVertexDescriptor: MTLVertexDescriptor!
// MetalKit mesh containing vertex data and index buffer for our anchor geometry
var cubeMesh: MTKMesh!
// Used to determine _uniformBufferStride each frame.
// This is the current frame number modulo kMaxBuffersInFlight
var uniformBufferIndex: Int = 0
// Offset within _sharedUniformBuffer to set for the current frame
var sharedUniformBufferOffset: Int = 0
// Offset within _anchorUniformBuffer to set for the current frame
var anchorUniformBufferOffset: Int = 0
// Addresses to write shared uniforms to each frame
var sharedUniformBufferAddress: UnsafeMutableRawPointer!
// Addresses to write anchor uniforms to each frame
var anchorUniformBufferAddress: UnsafeMutableRawPointer!
// The number of anchor instances to render
var anchorInstanceCount: Int = 0
// The current viewport size
var viewportSize: CGSize = CGSize()
// Flag for viewport size changes
var viewportSizeDidChange: Bool = false
var depthTexture: CVMetalTexture?
var confidenceTexture: CVMetalTexture?
.......................................
func updateCapturedImageTextures(frame: ARFrame) {
// Create textures for the scene depth and confidence maps, then two
// textures (Y and CbCr) from the provided frame's captured image.
guard let depthData = frame.sceneDepth else { return }
var pixelBufferDepth: CVPixelBuffer!
pixelBufferDepth = depthData.depthMap
var texturePixelFormat: MTLPixelFormat!
setMTLPixelFormat(&texturePixelFormat, basedOn: pixelBufferDepth)
depthTexture = createTexture(fromPixelBuffer: pixelBufferDepth, pixelFormat: texturePixelFormat, planeIndex: 0)
pixelBufferDepth = depthData.confidenceMap
setMTLPixelFormat(&texturePixelFormat, basedOn: pixelBufferDepth)
confidenceTexture = createTexture(fromPixelBuffer: pixelBufferDepth, pixelFormat: texturePixelFormat, planeIndex: 0)
let pixelBuffer = frame.capturedImage
if (CVPixelBufferGetPlaneCount(pixelBuffer) < 2) {
return
}
capturedImageTextureY = createTexture(fromPixelBuffer: pixelBuffer, pixelFormat:.r8Unorm, planeIndex:0)
capturedImageTextureCbCr = createTexture(fromPixelBuffer: pixelBuffer, pixelFormat:.rg8Unorm, planeIndex:1)
}
func createTexture(fromPixelBuffer pixelBuffer: CVPixelBuffer, pixelFormat: MTLPixelFormat, planeIndex: Int) -> CVMetalTexture? {
let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, planeIndex)
let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, planeIndex)
var texture: CVMetalTexture? = nil
let status = CVMetalTextureCacheCreateTextureFromImage(nil, capturedImageTextureCache, pixelBuffer, nil, pixelFormat, width, height, planeIndex, &texture)
if status != kCVReturnSuccess {
texture = nil
}
return texture
}
func drawCapturedImage(renderEncoder: MTLRenderCommandEncoder) {
guard let textureY = capturedImageTextureY, let textureCbCr = capturedImageTextureCbCr, let depthTexture = depthTexture, let confidenceTexture = confidenceTexture else {
return
}
// Push a debug group allowing us to identify render commands in the GPU Frame Capture tool
renderEncoder.pushDebugGroup("DrawCapturedImage")
// Set render command encoder state
renderEncoder.setCullMode(.none)
renderEncoder.setRenderPipelineState(capturedImagePipelineState)
renderEncoder.setDepthStencilState(capturedImageDepthState)
// Set mesh's vertex buffers
renderEncoder.setVertexBuffer(imagePlaneVertexBuffer, offset: 0, index: Int(kBufferIndexMeshPositions.rawValue))
// Set any textures read/sampled from our render pipeline
//renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(textureY), index: Int(kTextureIndexY.rawValue))
//renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(textureCbCr), index: Int(kTextureIndexCbCr.rawValue))
renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(depthTexture), index: 2)
//renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(confidenceTexture), index: 3)
// Draw each submesh of our mesh
renderEncoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4)
renderEncoder.popDebugGroup()
}
}
Everything else is the same as in Xcode's default MetalKit template.
So, am I accessing the data in the wrong way? Do I have some configuration parameters wrong? Am I just rendering the depth map badly? Or does the sensor on the new iPhone really produce data this bad? (It does not look like it: I have managed to acquire decent 3D point clouds with some apps from the App Store, even at a distance of 3-4 meters.)
Update: I've figured out that the quality is better if I change renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(depthTexture), index: 2) to renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(depthTexture), index: 1). This is just a chance observation, though, because the documentation is... well, not very extensive. The rendered image is still green-to-white, while I want it to be either grayscale or colored like the RGB map shown in the referenced video (that would be perfect, but the grayscale version would be enough).

AVCaptureVideoDataOutputSampleBufferDelegate drop frames using CIFilters for video filtering

I have a very strange case where AVCaptureVideoDataOutputSampleBufferDelegate drops frames if I use 13 different filter chains. Let me explain.
I have a CameraController set up, nothing special. Here is my delegate method:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
if !paused {
if connection.output?.connection(with: .audio) == nil {
//capture video
// my try to avoid "Out of buffers error", no luck ;(
lastCapturedBuffer = nil
let err = CMSampleBufferCreateCopy(allocator: kCFAllocatorDefault, sampleBuffer: sampleBuffer, sampleBufferOut: &lastCapturedBuffer)
if err == noErr {
}
connection.videoOrientation = .portrait
// getting image
let pixelBuffer = CMSampleBufferGetImageBuffer(lastCapturedBuffer!)
// remove if any
CVPixelBufferLockBaseAddress(pixelBuffer!, CVPixelBufferLockFlags(rawValue: 0))
// captured - is just ciimage property
captured = CIImage(cvPixelBuffer: pixelBuffer!)
//remove if any
CVPixelBufferUnlockBaseAddress(pixelBuffer!,CVPixelBufferLockFlags(rawValue: 0))
//CVPixelBufferUnlockBaseAddress(pixelBuffer!, .readOnly)
// transform image to target resolution
let srcWidth = CGFloat(captured.extent.width)
let srcHeight = CGFloat(captured.extent.height)
let dstWidth: CGFloat = ConstantsManager.shared.k_video_width
let dstHeight: CGFloat = ConstantsManager.shared.k_video_height
let scaleX = dstWidth / srcWidth
let scaleY = dstHeight / srcHeight
var transform = CGAffineTransform.init(scaleX: scaleX, y: scaleY)
captured = captured.transformed(by: transform).cropped(to: CGRect(x: 0, y: 0, width: dstWidth, height: dstHeight))
// mirror for front camera
if front {
var t = CGAffineTransform.init(scaleX: -1, y: 1)
t = t.translatedBy(x: -ConstantsManager.shared.k_video_width, y: 0)
captured = captured.transformed(by: t)
}
// video capture logic
let writable = canWrite()
if writable,
sessionAtSourceTime == nil {
sessionAtSourceTime = CMSampleBufferGetPresentationTimeStamp(lastCapturedBuffer!)
videoWriter.startSession(atSourceTime: sessionAtSourceTime!)
}
if writable, (videoWriterInput.isReadyForMoreMediaData) {
videoWriterInput.append(lastCapturedBuffer!)
}
// apply effect in real time <- here is the problem. If I comment out the next line, the drops stop, but the effect is not applied
captured = FilterManager.shared.applyFilterForCamera(inputImage: captured)
// current frame in case user wants to save image as photo
self.capturedPhoto = captured
// sent frame to Camcoder view controller
self.delegate?.didCapturedFrame(frame: captured)
} else {
// capture sound
let writable = canWrite()
if writable, (audioWriterInput.isReadyForMoreMediaData) {
//print("write audio buffer")
audioWriterInput?.append(lastCapturedBuffer!)
}
}
} else {
// paused
}
}
I also implemented the didDrop delegate method. Here is how I find out why it drops frames:
func captureOutput(_ output: AVCaptureOutput, didDrop sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
print("did drop")
var mode: CMAttachmentMode = 0
let reason = CMGetAttachment(sampleBuffer, key: kCMSampleBufferAttachmentKey_DroppedFrameReason, attachmentModeOut: &mode)
print("reason \(String(describing: reason))") // Optional(OutOfBuffers)
}
So I did it like a pro and commented out parts of the code to find where the problem is. It is here:
captured = FilterManager.shared.applyFilterForCamera(inputImage: captured)
FilterManager is a singleton. Here is the function that gets called:
func applyFilterForCamera(inputImage: CIImage) -> CIImage {
return currentVsFilter!.apply(sourceImage: inputImage)
}
currentVsFilter is an object of type VSFilter. Here is an example of one:
import Foundation
import AVKit
class TestFilter: CustomFilter {
let _name = "Тестовый Фильтр"
let _displayName = "Test Filter"
var tempImage: CIImage?
var final: CGImage?
override func name() -> String {
return _name
}
override func displayName() -> String {
return _displayName
}
override init() {
super.init()
print("Test Filter init")
// setup my custom kernel filter
self.noise.type = GlitchFilter.GlitchType.allCases[2]
}
// this returns composition for playback using AVPlayer
override func composition(asset: AVAsset) -> AVMutableVideoComposition {
let composition = AVMutableVideoComposition(asset: asset, applyingCIFiltersWithHandler: { request in
let inputImage = request.sourceImage.cropped(to: request.sourceImage.extent)
DispatchQueue.global(qos: .userInitiated).async {
let output = self.apply(sourceImage: inputImage, forComposition: true)
request.finish(with: output, context: nil)
}
})
let size = FilterManager.shared.cropRectForOrientation().size
composition.renderSize = size
return composition
}
// this returns actual filtered CIImage, used for both AVPlayer composition and realtime camera
override func apply(sourceImage: CIImage, forComposition: Bool = false) -> CIImage {
// rendered text
tempImage = FilterManager.shared.textRenderedImage()
// some filters chained one by one
self.screenBlend?.setValue(tempImage, forKey: kCIInputImageKey)
self.screenBlend?.setValue(sourceImage, forKey: kCIInputBackgroundImageKey)
self.noise.inputImage = self.screenBlend?.outputImage
self.noise.inputAmount = CGFloat.random(in: 1.0...3.0)
// result
tempImage = self.noise.outputImage
// correct crop
let rect = forComposition ? FilterManager.shared.cropRectForOrientation() : FilterManager.shared.cropRect
final = self.context.createCGImage(tempImage!, from: rect!)
return CIImage(cgImage: final!)
}
}
And now the strangest thing: I have 30 VSFilters, and when I get to the 13th (switching one by one with a UIButton) I get an "out of buffers" error, this one:
kCMSampleBufferDroppedFrameReason_OutOfBuffers
What I tested:
I changed the order of the VSFilters in the filters array inside the FilterManager singleton: same result.
I tried switching from the first to the 12th one by one and then going back: this works, but as soon as I switch to the 13th (of 30, counting from 0), the bug appears.
It looks like it can handle only 12 VSFilter objects, as if it retains them somehow, or maybe it's related to threading. I don't know.
The app is made for iOS devices and was tested on an iPhone X running iOS 13.3.1.
It is a video editor app that applies different effects both to the live camera stream and to video files from the camera roll.
Maybe someone has experience with this?
Have a great day
Best, Victor
Edit 1. If I reinitialize the cameraController (AVCaptureSession, input/output devices), it works, but this is an ugly option and it adds lag when switching filters.
OK, so I finally won this battle. In case someone else gets this "OutOfBuffers" problem, here is my solution.
As far as I could figure out, CIFilter grabs the CVPixelBuffer and doesn't release it while filtering images. It kind of creates one huge buffer, I guess. The strange thing is that it doesn't create a memory leak, so I guess it doesn't grab a particular buffer but keeps a strong reference to it. As rumors (me) say, it can handle only 12 such references.
So my approach was to copy the CVPixelBuffer and work with the copy instead of the buffer I get in the AVCaptureVideoDataOutputSampleBufferDelegate didOutput method.
Here is my new code:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
if !paused {
//print("camera controller \(id) got frame")
if connection.output?.connection(with: .audio) == nil {
//capture video
connection.videoOrientation = .portrait
// getting image
guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
// this works!
let copyBuffer = pixelBuffer.copy()
// captured - is just ciimage property
captured = CIImage(cvPixelBuffer: copyBuffer)
//remove if any
// transform image to target resolution
let srcWidth = CGFloat(captured.extent.width)
let srcHeight = CGFloat(captured.extent.height)
let dstWidth: CGFloat = ConstantsManager.shared.k_video_width
let dstHeight: CGFloat = ConstantsManager.shared.k_video_height
let scaleX = dstWidth / srcWidth
let scaleY = dstHeight / srcHeight
var transform = CGAffineTransform.init(scaleX: scaleX, y: scaleY)
captured = captured.transformed(by: transform).cropped(to: CGRect(x: 0, y: 0, width: dstWidth, height: dstHeight))
// mirror for front camera
if front {
var t = CGAffineTransform.init(scaleX: -1, y: 1)
t = t.translatedBy(x: -ConstantsManager.shared.k_video_width, y: 0)
captured = captured.transformed(by: t)
}
// video capture logic
let writable = canWrite()
if writable,
sessionAtSourceTime == nil {
sessionAtSourceTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
videoWriter.startSession(atSourceTime: sessionAtSourceTime!)
}
if writable, (videoWriterInput.isReadyForMoreMediaData) {
videoWriterInput.append(sampleBuffer)
}
self.captured = FilterManager.shared.applyFilterForCamera(inputImage: self.captured)
// current frame in case user wants to save image as photo
self.capturedPhoto = captured
// sent frame to Camcoder view controller
self.delegate?.didCapturedFrame(frame: captured)
} else {
// capture sound
let writable = canWrite()
if writable, (audioWriterInput.isReadyForMoreMediaData) {
//print("write audio buffer")
audioWriterInput?.append(sampleBuffer)
}
}
} else {
// paused
//print("paused camera controller \(id)")
}
}
And here is the function that copies the buffer:
import CoreVideo
import Foundation

extension CVPixelBuffer {
    /// Returns a deep copy, so the buffer owned by the capture pipeline is not held on to.
    func copy() -> CVPixelBuffer {
        precondition(CFGetTypeID(self) == CVPixelBufferGetTypeID(), "copy() cannot be called on a non-CVPixelBuffer")
        var _copy: CVPixelBuffer?
        CVPixelBufferCreate(
            kCFAllocatorDefault,
            CVPixelBufferGetWidth(self),
            CVPixelBufferGetHeight(self),
            CVPixelBufferGetPixelFormatType(self),
            nil,
            &_copy)
        guard let copy = _copy else { fatalError("CVPixelBufferCreate failed") }
        CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags.readOnly)
        CVPixelBufferLockBaseAddress(copy, CVPixelBufferLockFlags(rawValue: 0))
        let copyBaseAddress = CVPixelBufferGetBaseAddress(copy)
        let currBaseAddress = CVPixelBufferGetBaseAddress(self)
        // Never copy more bytes than either buffer holds.
        // This still assumes that both buffers share the same memory layout.
        let byteCount = min(CVPixelBufferGetDataSize(copy), CVPixelBufferGetDataSize(self))
        memcpy(copyBaseAddress, currBaseAddress, byteCount)
        CVPixelBufferUnlockBaseAddress(copy, CVPixelBufferLockFlags(rawValue: 0))
        CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags.readOnly)
        return copy
    }
}
I used it as an extension of CVPixelBuffer.
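One caveat with a single memcpy over the whole buffer: it assumes the source and the copy use the same number of bytes per row. If the two buffers pad their rows differently, a row-by-row copy is the safer pattern. Below is a minimal sketch of that idea on plain byte arrays, so it runs without Core Video. copyRows and its parameter names are hypothetical, not part of any framework; with real pixel buffers the row pointers and strides would come from calls such as CVPixelBufferGetBaseAddressOfPlane and CVPixelBufferGetBytesPerRowOfPlane, once per plane.

```swift
/// Copies `height` rows of `rowBytes` meaningful bytes each from `src` into a
/// new array whose rows are `dstBytesPerRow` bytes long. Padding bytes stay zero.
func copyRows(src: [UInt8], srcBytesPerRow: Int, dstBytesPerRow: Int,
              rowBytes: Int, height: Int) -> [UInt8] {
    precondition(rowBytes <= srcBytesPerRow && rowBytes <= dstBytesPerRow)
    precondition(src.count >= srcBytesPerRow * height)
    var dst = [UInt8](repeating: 0, count: dstBytesPerRow * height)
    for row in 0..<height {
        let s = row * srcBytesPerRow   // start of this row in the source
        let d = row * dstBytesPerRow   // start of this row in the destination
        dst.replaceSubrange(d..<(d + rowBytes), with: src[s..<(s + rowBytes)])
    }
    return dst
}

// Two rows of 3 meaningful bytes; the source pads each row to 4 bytes, the copy to 5.
let source: [UInt8] = [1, 2, 3, 0, 4, 5, 6, 0]
let copied = copyRows(src: source, srcBytesPerRow: 4, dstBytesPerRow: 5,
                      rowBytes: 3, height: 2)
print(copied)
```

A plain memcpy of the whole source into the destination would have shifted the second row by one byte here, which is the kind of corruption the per-row loop avoids.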
I hope this helps anyone with a similar problem.
Best, Victor

Toggle flash in ios swift

I am building an image classifier app. On the camera screen I have a switch that I want to use to toggle the flash, so that the user can turn the flash on in low light.
Here is my code:
import UIKit
import AVFoundation
import Vision
// controlling the pace of the machine vision analysis
var lastAnalysis: TimeInterval = 0
var pace: TimeInterval = 0.33 // in seconds, classification will not repeat faster than this value
// performance tracking
let trackPerformance = false // use "true" for performance logging
var frameCount = 0
let framesPerSample = 10
var startDate = NSDate.timeIntervalSinceReferenceDate
var flash=0
class ImageDetectionViewController: UIViewController {
var callBackImageDetection :(State)->Void = { state in
}
@IBOutlet weak var previewView: UIView!
@IBOutlet weak var stackView: UIStackView!
@IBOutlet weak var lowerView: UIView!
@IBAction func swithch(_ sender: UISwitch) {
if(sender.isOn == true)
{
stopActiveSession();
let captureSession=AVCaptureSession()
let captureDevice: AVCaptureDevice?
setupCamera(flash: 1)
}
}
var previewLayer: AVCaptureVideoPreviewLayer!
let bubbleLayer = BubbleLayer(string: "")
let queue = DispatchQueue(label: "videoQueue")
var captureSession = AVCaptureSession()
var captureDevice: AVCaptureDevice?
let videoOutput = AVCaptureVideoDataOutput()
var unknownCounter = 0 // used to track how many unclassified images in a row
let confidence: Float = 0.8
// MARK: Load the Model
let targetImageSize = CGSize(width: 227, height: 227) // must match model data input
lazy var classificationRequest: [VNRequest] = {
do {
// Load the Custom Vision model.
// To add a new model, drag it to the Xcode project browser making sure that the "Target Membership" is checked.
// Then update the following line with the name of your new model.
// let model = try VNCoreMLModel(for: Fruit().model)
let model = try VNCoreMLModel(for: CodigocubeAI().model)
let classificationRequest = VNCoreMLRequest(model: model, completionHandler: self.handleClassification)
return [ classificationRequest ]
} catch {
fatalError("Can't load Vision ML model: \(error)")
}
}()
// MARK: Handle image classification results
func handleClassification(request: VNRequest, error: Error?) {
guard let observations = request.results as? [VNClassificationObservation]
else { fatalError("unexpected result type from VNCoreMLRequest") }
guard let best = observations.first else {
fatalError("classification didn't return any results")
}
// Use results to update user interface (includes basic filtering)
print("\(best.identifier): \(best.confidence)")
if best.identifier.starts(with: "Unknown") || best.confidence < confidence {
if self.unknownCounter < 3 { // a bit of a low-pass filter to avoid flickering
self.unknownCounter += 1
} else {
self.unknownCounter = 0
DispatchQueue.main.async {
self.bubbleLayer.string = nil
}
}
} else {
self.unknownCounter = 0
DispatchQueue.main.async {[weak self] in
guard let strongSelf = self
else
{
return
}
// Trimming labels because they sometimes have unexpected line endings which show up in the GUI
let identifierString = best.identifier.trimmingCharacters(in: CharacterSet.whitespacesAndNewlines)
strongSelf.bubbleLayer.string = identifierString
let state : State = strongSelf.getState(identifierStr: identifierString)
strongSelf.stopActiveSession()
strongSelf.navigationController?.popViewController(animated: true)
strongSelf.callBackImageDetection(state)
}
}
}
func getState(identifierStr:String)->State
{
var state :State = .none
if identifierStr == "entertainment"
{
state = .entertainment
}
else if identifierStr == "geography"
{
state = .geography
}
else if identifierStr == "history"
{
state = .history
}
else if identifierStr == "knowledge"
{
state = .education
}
else if identifierStr == "science"
{
state = .science
}
else if identifierStr == "sports"
{
state = .sports
}
else
{
state = .none
}
return state
}
// MARK: Lifecycle
override func viewDidLoad() {
super.viewDidLoad()
previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
previewView.layer.addSublayer(previewLayer)
}
override func viewDidAppear(_ animated: Bool) {
self.edgesForExtendedLayout = UIRectEdge.init(rawValue: 0)
bubbleLayer.opacity = 0.0
bubbleLayer.position.x = self.view.frame.width / 2.0
bubbleLayer.position.y = lowerView.frame.height / 2
lowerView.layer.addSublayer(bubbleLayer)
setupCamera(flash:2)
}
override func viewDidLayoutSubviews() {
super.viewDidLayoutSubviews()
previewLayer.frame = previewView.bounds;
}
// MARK: Camera handling
func setupCamera(flash :Int) {
let deviceDiscovery = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInWideAngleCamera], mediaType: .video, position: .back)
if let device = deviceDiscovery.devices.last {
if(flash == 1)
{
if (device.hasTorch) {
do {
try device.lockForConfiguration()
if (device.isTorchAvailable) {
do {
try device.setTorchModeOn(level:0.2 )
}
catch
{
print(error)
}
device.unlockForConfiguration()
}
}
catch
{
print(error)
}
}
}
captureDevice = device
beginSession()
}
}
func beginSession() {
do {
videoOutput.videoSettings = [((kCVPixelBufferPixelFormatTypeKey as NSString) as String) : (NSNumber(value: kCVPixelFormatType_32BGRA) as! UInt32)]
videoOutput.alwaysDiscardsLateVideoFrames = true
videoOutput.setSampleBufferDelegate(self, queue: queue)
captureSession.sessionPreset = .hd1920x1080
captureSession.addOutput(videoOutput)
let input = try AVCaptureDeviceInput(device: captureDevice!)
captureSession.addInput(input)
captureSession.startRunning()
} catch {
print("error connecting to capture device")
}
}
func stopActiveSession()
{
if captureSession.isRunning == true
{
captureSession.stopRunning()
}
}
override func viewWillDisappear(_ animated: Bool) {
self.stopActiveSession()
}
deinit {
print("deinit called")
}
}
// MARK: Video Data Delegate
extension ImageDetectionViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
// called for each frame of video
func captureOutput(_ captureOutput: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
let currentDate = NSDate.timeIntervalSinceReferenceDate
// control the pace of the machine vision to protect battery life
if currentDate - lastAnalysis >= pace {
lastAnalysis = currentDate
} else {
return // don't run the classifier more often than we need
}
// keep track of performance and log the frame rate
if trackPerformance {
frameCount = frameCount + 1
if frameCount % framesPerSample == 0 {
let diff = currentDate - startDate
if (diff > 0) {
if pace > 0.0 {
print("WARNING: Frame rate of image classification is being limited by \"pace\" setting. Set to 0.0 for fastest possible rate.")
}
print("\(String.localizedStringWithFormat("%0.2f", (diff/Double(framesPerSample))))s per frame (average)")
}
startDate = currentDate
}
}
// Crop and resize the image data.
// Note, this uses a Core Image pipeline that could be appended with other pre-processing.
// If we don't want to do anything custom, we can remove this step and let the Vision framework handle
// crop and resize as long as we are careful to pass the orientation properly.
guard let croppedBuffer = croppedSampleBuffer(sampleBuffer, targetSize: targetImageSize) else {
return
}
do {
let classifierRequestHandler = VNImageRequestHandler(cvPixelBuffer: croppedBuffer, options: [:])
try classifierRequestHandler.perform(classificationRequest)
} catch {
print(error)
}
}
}
let context = CIContext()
var rotateTransform: CGAffineTransform?
var scaleTransform: CGAffineTransform?
var cropTransform: CGAffineTransform?
var resultBuffer: CVPixelBuffer?
func croppedSampleBuffer(_ sampleBuffer: CMSampleBuffer, targetSize: CGSize) -> CVPixelBuffer? {
guard let imageBuffer: CVImageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
fatalError("Can't convert to CVImageBuffer.")
}
// Only doing these calculations once for efficiency.
// If the incoming images could change orientation or size during a session, this would need to be reset when that happens.
if rotateTransform == nil {
let imageSize = CVImageBufferGetEncodedSize(imageBuffer)
let rotatedSize = CGSize(width: imageSize.height, height: imageSize.width)
guard targetSize.width < rotatedSize.width, targetSize.height < rotatedSize.height else {
fatalError("Captured image is smaller than image size for model.")
}
let shorterSize = (rotatedSize.width < rotatedSize.height) ? rotatedSize.width : rotatedSize.height
rotateTransform = CGAffineTransform(translationX: imageSize.width / 2.0, y: imageSize.height / 2.0).rotated(by: -CGFloat.pi / 2.0).translatedBy(x: -imageSize.height / 2.0, y: -imageSize.width / 2.0)
let scale = targetSize.width / shorterSize
scaleTransform = CGAffineTransform(scaleX: scale, y: scale)
// Crop input image to output size
let xDiff = rotatedSize.width * scale - targetSize.width
let yDiff = rotatedSize.height * scale - targetSize.height
cropTransform = CGAffineTransform(translationX: xDiff/2.0, y: yDiff/2.0)
}
// Convert to CIImage because it is easier to manipulate
let ciImage = CIImage(cvImageBuffer: imageBuffer)
let rotated = ciImage.transformed(by: rotateTransform!)
let scaled = rotated.transformed(by: scaleTransform!)
let cropped = scaled.transformed(by: cropTransform!)
// Note that the above pipeline could be easily appended with other image manipulations.
// For example, to change the image contrast. It would be most efficient to handle all of
// the image manipulation in a single Core Image pipeline because it can be hardware optimized.
// Only need to create this buffer one time and then we can reuse it for every frame
if resultBuffer == nil {
let result = CVPixelBufferCreate(kCFAllocatorDefault, Int(targetSize.width), Int(targetSize.height), kCVPixelFormatType_32BGRA, nil, &resultBuffer)
guard result == kCVReturnSuccess else {
fatalError("Can't allocate pixel buffer.")
}
}
// Render the Core Image pipeline to the buffer
context.render(cropped, to: resultBuffer!)
// For debugging
// let image = imageBufferToUIImage(resultBuffer!)
// print(image.size) // set breakpoint to see image being provided to CoreML
return resultBuffer
}
// Only used for debugging.
// Turns an image buffer into a UIImage that is easier to display in the UI or debugger.
func imageBufferToUIImage(_ imageBuffer: CVImageBuffer) -> UIImage {
CVPixelBufferLockBaseAddress(imageBuffer, CVPixelBufferLockFlags(rawValue: 0))
let baseAddress = CVPixelBufferGetBaseAddress(imageBuffer)
let bytesPerRow = CVPixelBufferGetBytesPerRow(imageBuffer)
let width = CVPixelBufferGetWidth(imageBuffer)
let height = CVPixelBufferGetHeight(imageBuffer)
let colorSpace = CGColorSpaceCreateDeviceRGB()
let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.noneSkipFirst.rawValue | CGBitmapInfo.byteOrder32Little.rawValue)
let context = CGContext(data: baseAddress, width: width, height: height, bitsPerComponent: 8, bytesPerRow: bytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo.rawValue)
let quartzImage = context!.makeImage()
CVPixelBufferUnlockBaseAddress(imageBuffer, CVPixelBufferLockFlags(rawValue: 0))
let image = UIImage(cgImage: quartzImage!, scale: 1.0, orientation: .right)
return image
}
I am getting the error 'An AVCaptureOutput instance may not be added to more than one session'.
Now I want to let the user toggle the flash. How do I destroy the active camera session and open a new one with the flash on?
Can anyone help me, or suggest any other way to achieve this?
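One possible approach, sketched under assumptions: the torch can usually be switched on the capture device that is already in use, without stopping the session or calling setupCamera again. Calling beginSession() a second time adds videoOutput again, which appears to be what triggers the error quoted above. setTorch(on:level:for:) and clampedTorchLevel(_:) are hypothetical helper names; hasTorch, isTorchAvailable, lockForConfiguration(), setTorchModeOn(level:) and torchMode are AVCaptureDevice API. The lower bound of 0.01 is an arbitrary choice for this sketch.

```swift
#if os(iOS)
import AVFoundation
#endif

/// Hypothetical helper: keeps a requested torch brightness above 0 and at most 1,
/// the range the torch level is documented to use.
func clampedTorchLevel(_ level: Float) -> Float {
    return min(max(level, 0.01), 1.0)
}

#if os(iOS)
/// Turns the torch of an already configured capture device on or off
/// without touching the running AVCaptureSession.
func setTorch(on: Bool, level: Float = 0.2, for device: AVCaptureDevice) {
    guard device.hasTorch, device.isTorchAvailable else { return }
    do {
        try device.lockForConfiguration()
        // Always release the configuration lock, even if setting the torch throws.
        defer { device.unlockForConfiguration() }
        if on {
            try device.setTorchModeOn(level: clampedTorchLevel(level))
        } else {
            device.torchMode = .off
        }
    } catch {
        print("Could not configure torch: \(error)")
    }
}
#endif
```

In the switch action this could be called as: if let device = captureDevice { setTorch(on: sender.isOn, for: device) }. The session, its input and its output stay untouched.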

SpriteKit Shop Scene in game

Any idea how I could implement a shop in my SpriteKit game where users can buy different players with coins they have earned in the game? Are there any tutorials out there?
This is a multi-step project that took me about 500 lines of code (more without using .SKS). Here is the link to the finished project on GitHub: https://github.com/fluidityt/ShopScene
Note, I am using a macOS SpriteKit project because it launches much faster on my computer. Simply change mouseDown() to touchesBegan() to get this to run on iOS.
First edit your GameScene.sks to look like this: (saves a bunch of time coding labels)
Make sure that you name everything EXACTLY as shown, because we need these names to detect touches:
"entershop", "getcoins", "coinlabel", "levellabel"
This is the main "gameplay" scene and as you click coins++ you get levels and can move around. Clicking the shop will enter the shop.
Here is our GameScene.swift which matches this SKS:
import SpriteKit
class GameScene: SKScene {
let player = Player(costume: Costume.defaultCostume)
lazy var enterNode: SKLabelNode = { return (self.childNode(withName: "entershop") as! SKLabelNode) }()
lazy var coinNode: SKLabelNode = { return (self.childNode(withName: "getcoins" ) as! SKLabelNode) }()
lazy var coinLabel: SKLabelNode = { return (self.childNode(withName: "coinlabel") as! SKLabelNode) }()
lazy var levelLabel: SKLabelNode = { return (self.childNode(withName: "levellabel") as! SKLabelNode) }()
override func didMove(to view: SKView) {
player.name = "player"
if player.scene == nil { addChild(player) }
}
override func mouseDown(with event: NSEvent) {
let location = event.location(in: self)
if let name = atPoint(location).name {
switch name {
case "entershop": view!.presentScene(ShopScene(previousGameScene: self))
case "getcoins": player.getCoins(1)
default: ()
}
}
else {
player.run(.move(to: location, duration: 1))
}
}
override func update(_ currentTime: TimeInterval) {
func levelUp(_ level: Int) {
player.levelsCompleted = level
levelLabel.text = "Level: \(player.levelsCompleted)"
}
switch player.coins {
case 10: levelUp(2)
case 20: levelUp(3)
case 30: levelUp(4)
default: ()
}
}
};
Here you can see that we have a few other things going on not yet introduced: Player and Costume
Player is a spritenode subclass (it doubles as a data model and a UI element). Our player is just a colored square that gets moved around when you click the screen
The player wears something of Costume type, which is just a model that keeps track of data such as price, name, and the texture for the player to display.
Here is Costume.swift:
import SpriteKit
/// This is just a test helper; delete it once you have actual texture assets:
private func makeTestTexture() -> (SKTexture, SKTexture, SKTexture, SKTexture) {
func texit(_ sprite: SKSpriteNode) -> SKTexture { return SKView().texture(from: sprite)! }
let size = CGSize(width: 50, height: 50)
return (
texit(SKSpriteNode(color: .gray, size: size)),
texit(SKSpriteNode(color: .red, size: size)),
texit(SKSpriteNode(color: .blue, size: size)),
texit(SKSpriteNode(color: .green, size: size))
)
}
/// The items that are for sale in our shop:
struct Costume {
static var allCostumes: [Costume] = []
let name: String
let texture: SKTexture
let price: Int
init(name: String, texture: SKTexture, price: Int) { self.name = name; self.texture = texture; self.price = price
// This init simply adds all costumes to a master list for easy sorting later on.
Costume.allCostumes.append(self)
}
private static let (tex1, tex2, tex3, tex4) = makeTestTexture() // Just for testing; delete this when you have actual assets.
static let list = (
// Hard-code any new costumes you create here (this is a "master list" of costumes)
// (make sure all of your costumes have a unique name, or the program will not work properly)
gray: Costume(name: "Gray Shirt", texture: tex1 /*SKTexture(imageNamed: "grayshirt")*/, price: 0),
red: Costume(name: "Red Shirt", texture: tex2 /*SKTexture(imageNamed: "redshirt")*/, price: 5),
blue: Costume(name: "Blue Shirt", texture: tex3 /*SKTexture(imageNamed: "blueshirt")*/, price: 25),
green: Costume(name: "Green Shirt", texture: tex4 /*SKTexture(imageNamed: "greenshirt")*/, price: 50)
)
static let defaultCostume = list.gray
};
func == (lhs: Costume, rhs: Costume) -> Bool {
// The reason why you need unique names:
if lhs.name == rhs.name { return true }
else { return false }
}
The design of this struct is twofold: first, it is a blueprint for a Costume object (which holds the name, price, and texture of a costume), and second, it serves as a repository for all of your costumes via a hard-coded static master list property.
The function at the top, makeTestTexture(), is just an example for this project. I did this so that you can copy and paste the code instead of having to download image files.
Here is the Player.swift, which can wear the costumes in the list:
import SpriteKit
final class Player: SKSpriteNode {
var coins = 0
var costume: Costume
var levelsCompleted = 0
var ownedCostumes: [Costume] = [Costume.list.gray] // FIXME: This should be a Set, but too lazy to do Hashable.
init(costume: Costume) {
self.costume = costume
super.init(texture: costume.texture, color: .clear, size: costume.texture.size())
}
func getCoins(_ amount: Int) {
guard let scene = self.scene as? GameScene else { // This is very specific code just for this example.
fatalError("only call this func after scene has been set up")
}
coins += amount
scene.coinLabel.text = "Coins: \(coins)"
}
func loseCoins(_ amount: Int) {
guard let scene = self.scene as? GameScene else { // This is very specific code just for this example.
fatalError("only call this func after scene has been set up")
}
coins -= amount
scene.coinLabel.text = "Coins: \(coins)"
}
func hasCostume(_ costume: Costume) -> Bool {
if ownedCostumes.contains(where: {$0.name == costume.name}) { return true }
else { return false }
}
func getCostume(_ costume: Costume) {
if hasCostume(costume) { fatalError("trying to get costume already owned") }
else { ownedCostumes.append(costume) }
}
func wearCostume(_ costume: Costume) {
guard hasCostume(costume) else { fatalError("trying to wear a costume you don't own") }
self.costume = costume
self.texture = costume.texture
}
required init?(coder aDecoder: NSCoder) { fatalError() }
};
Player has a lot of functions, but they all could be handled elsewhere in the code. I just went for this design decision, but don't feel like you need to load up your classes with 2 line methods.
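The FIXME in Player mentions that ownedCostumes should really be a Set. A minimal sketch of what that could look like, assuming identity is the unique costume name (the same rule the == function for Costume uses): DemoCostume is a hypothetical stand-in without the SKTexture property, so the snippet runs outside SpriteKit.

```swift
/// Simplified stand-in for Costume (no texture) so the sketch runs anywhere.
struct DemoCostume: Hashable {
    let name: String
    let price: Int

    // Two costumes are the same item when their unique names match.
    static func == (lhs: DemoCostume, rhs: DemoCostume) -> Bool {
        return lhs.name == rhs.name
    }

    // Hash only what == compares, so equal costumes always hash equally.
    func hash(into hasher: inout Hasher) {
        hasher.combine(name)
    }
}

var owned: Set<DemoCostume> = [DemoCostume(name: "Gray Shirt", price: 0)]
let red = DemoCostume(name: "Red Shirt", price: 5)

// insert reports whether the element was newly added,
// which makes a duplicate purchase easy to detect.
let firstInsert = owned.insert(red).inserted
let secondInsert = owned.insert(red).inserted
print(firstInsert, secondInsert, owned.contains(red))
```

With a Set, hasCostume would reduce to a contains call, and getCostume could check the inserted flag instead of searching the array first.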
Now we are getting to the more nitty-gritty stuff, since we have set up our:
Base scene
Costume list
Player object
The last two things we really need are:
1. A shop model to keep track of inventory
2. A shop scene to display inventory, UI elements, and handle the logic of whether or not you can buy items
Here is Shop.swift:
/// Our model class to be used inside of our ShopScene:
final class Shop {
weak private(set) var scene: ShopScene! // The scene in which this shop will be called from.
var player: Player { return scene.player }
var availableCostumes: [Costume] = [Costume.list.red, Costume.list.blue] // (The green shirt won't become available until the player has cleared 2 levels).
// var soldCostumes: [Costume] = [Costume.defaultCostume] // Implement something with this if you want to exclude previously bought items from the store.
func canSellCostume(_ costume: Costume) -> Bool {
if player.coins < costume.price { return false }
else if player.hasCostume(costume) { return false }
else if player.costume == costume { return false }
else { return true }
}
/// Only call this after checking canSellCostume(), or you will likely have errors:
func sellCostume(_ costume: Costume) {
player.loseCoins(costume.price)
player.getCostume(costume)
player.wearCostume(costume)
}
func newCostumeBecomesAvailable(_ costume: Costume) {
if availableCostumes.contains(where: {$0.name == costume.name}) /*|| soldCostumes.contains(costume)*/ {
fatalError("trying to add a costume that is already available (or sold!)")
}
else { availableCostumes.append(costume) }
}
init(shopScene: ShopScene) {
self.scene = shopScene
}
deinit { print("shop: if you don't see this message when exiting shop then you have a retain cycle") }
};
The idea was to have the fourth costume become available only at a certain level. I ran out of time to implement this feature, but most of the supporting methods are there (you just need to implement the logic).
Also, Shop can pretty much just be a struct, but I feel that it's more flexible as a class for now.
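One way the missing unlock logic could be filled in is sketched below. This is not part of the original project: UnlockableCostume, UnlockShop, and the requiredLevels field are all hypothetical stand-ins, kept free of SpriteKit so the snippet runs on its own.

```swift
// Hypothetical stand-in for Costume with an assumed unlock requirement.
struct UnlockableCostume {
    let name: String
    let requiredLevels: Int   // Assumed field; the original Costume has no such property.
}

// Hypothetical stand-in for Shop that also tracks costumes not yet for sale.
final class UnlockShop {
    private(set) var availableCostumes: [UnlockableCostume]
    private var lockedCostumes: [UnlockableCostume]

    init(available: [UnlockableCostume], locked: [UnlockableCostume]) {
        self.availableCostumes = available
        self.lockedCostumes = locked
    }

    /// Moves every locked costume whose requirement is met into the available list.
    /// Unlocked costumes leave the locked list, so repeated calls never add duplicates.
    func unlockCostumes(levelsCompleted: Int) {
        let unlocked = lockedCostumes.filter { $0.requiredLevels <= levelsCompleted }
        lockedCostumes.removeAll { $0.requiredLevels <= levelsCompleted }
        availableCostumes.append(contentsOf: unlocked)
    }
}

let unlockShop = UnlockShop(
    available: [UnlockableCostume(name: "red", requiredLevels: 0),
                UnlockableCostume(name: "blue", requiredLevels: 0)],
    locked: [UnlockableCostume(name: "green", requiredLevels: 2)]
)

unlockShop.unlockCostumes(levelsCompleted: 1)
let countAfterOneLevel = unlockShop.availableCostumes.count

unlockShop.unlockCostumes(levelsCompleted: 2)
unlockShop.unlockCostumes(levelsCompleted: 2)   // Second call is a no-op.
let namesAfterTwoLevels = unlockShop.availableCostumes.map { $0.name }
```

In the real project, a call like this would presumably go wherever levelsCompleted is incremented, or in ShopScene's didMove(to:) before the costume nodes are built.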
Now, before jumping into ShopScene, our biggest file, let me tell you about a couple of design decisions.
First, I'm using node.name to handle touches / clicks. This lets me use the .SKS and the regular SKNode types quickly and easily. Normally, I like to subclass SKNodes and then override their own touchesBegan method to handle clicks. You can do it either way.
Now, in ShopScene the "buy" and "exit" buttons are just regular SKLabelNodes; but for the actual nodes that display the costumes, I have created a subclass called CostumeNode.
I made CostumeNode so that it could handle the nodes for displaying the costume's name and price, and do some animations. CostumeNode is just a visual element (unlike Player).
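The name-based click handling mentioned above boils down to mapping a node's name to an action. Here is a minimal sketch of just that mapping, with SpriteKit left out; ShopAction is a hypothetical type, and the strings are the node names that ShopScene assigns to its buttons.

```swift
// Hypothetical enum: one case per button that reacts to a click.
enum ShopAction {
    case buy
    case exit

    /// Returns nil for unnamed nodes and for names with no action attached.
    init?(nodeName: String?) {
        guard let nodeName = nodeName else { return nil }
        switch nodeName {
        case "buynode": self = .buy
        case "exitnode": self = .exit
        default: return nil
        }
    }
}

let buyAction = ShopAction(nodeName: "buynode")
let exitAction = ShopAction(nodeName: "exitnode")
let ignoredAction = ShopAction(nodeName: "coinnode")   // Named node, but not a button.
let unnamedAction = ShopAction(nodeName: nil)
```

In a scene you would pass atPoint(location).name into the initializer and switch on the result. The trade-off against subclassing is that the names are plain strings, so a typo fails silently instead of at compile time.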
Here is CostumeNode.swift:
/// Just a UI representation, does not manipulate any models.
final class CostumeNode: SKSpriteNode {
let costume: Costume
weak private(set) var player: Player!
private(set) var
backgroundNode = SKSpriteNode(),
nameNode = SKLabelNode(),
priceNode = SKLabelNode()
private func label(text: String, size: CGSize) -> SKLabelNode {
let label = SKLabelNode(text: text)
label.fontName = "Chalkduster"
// FIXME: deform label to fit size and offset
return label
}
init(costume: Costume, player: Player) {
func setupNodes(with size: CGSize) {
let circle = SKShapeNode(circleOfRadius: size.width)
circle.fillColor = .yellow
let bkg = SKSpriteNode(texture: SKView().texture(from: circle))
bkg.zPosition -= 1
let name = label(text: "\(costume.name)", size: size)
name.position.y = frame.maxY + name.frame.size.height
let price = label(text: "\(costume.price)", size: size)
price.position.y = frame.minY - price.frame.size.height
addChildrenBehind([bkg, name, price])
(backgroundNode, nameNode, priceNode) = (bkg, name, price)
}
self.player = player
self.costume = costume
let size = costume.texture.size()
super.init(texture: costume.texture, color: .clear, size: size)
name = costume.name // Name is needed for sorting and detecting touches.
setupNodes(with: size)
becomesUnselected()
}
private func setPriceText() { // Updates the color and text of price labels
func playerCanAfford() {
priceNode.text = "\(costume.price)"
priceNode.fontColor = .white
}
func playerCantAfford() {
priceNode.text = "\(costume.price)"
priceNode.fontColor = .red
}
func playerOwns() {
priceNode.text = ""
priceNode.fontColor = .white
}
if player.hasCostume(self.costume) { playerOwns() }
else if player.coins < self.costume.price { playerCantAfford() }
else if player.coins >= self.costume.price { playerCanAfford() }
else { fatalError() }
}
func becomesSelected() { // For animation / sound purposes (could also just be handled by the ShopScene).
backgroundNode.run(.fadeAlpha(to: 0.75, duration: 0.25))
setPriceText()
// insert sound if desired.
}
func becomesUnselected() {
backgroundNode.run(.fadeAlpha(to: 0, duration: 0.10))
setPriceText()
// insert sound if desired.
}
required init?(coder aDecoder: NSCoder) { fatalError() }
deinit { print("costumenode: if you don't see this then you have a retain cycle") }
};
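The three cases in setPriceText() (owned, can't afford, can afford) can also be modeled as a value, which removes the need for the unreachable else branch. This is only a sketch of that idea: PriceLabelState is a hypothetical type, and the color is reduced to a Bool so the snippet runs without SpriteKit.

```swift
// Hypothetical model of what the price label should show.
enum PriceLabelState: Equatable {
    case owned
    case affordable(price: Int)
    case tooExpensive(price: Int)

    init(price: Int, coins: Int, isOwned: Bool) {
        if isOwned {
            self = .owned
        } else if coins < price {
            self = .tooExpensive(price: price)
        } else {
            self = .affordable(price: price)
        }
    }

    /// Label text: empty for owned costumes, the price otherwise.
    var text: String {
        switch self {
        case .owned:
            return ""
        case .affordable(let price), .tooExpensive(let price):
            return "\(price)"
        }
    }

    /// True when the label should use the warning (red) color.
    var usesWarningColor: Bool {
        if case .tooExpensive = self { return true }
        return false
    }
}

let ownedState = PriceLabelState(price: 10, coins: 0, isOwned: true)
let poorState = PriceLabelState(price: 10, coins: 3, isOwned: false)
let richState = PriceLabelState(price: 10, coins: 10, isOwned: false)
```

CostumeNode could then set priceNode.text and priceNode.fontColor from a single switch over this value.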
Finally we have ShopScene, which is the behemoth file. It handles the data and logic for not only showing UI elements, but also for updating the Shop and Player models.
import SpriteKit
// Helpers:
extension SKNode {
func addChildren(_ nodes: [SKNode]) { for node in nodes { addChild(node) } }
func addChildrenBehind(_ nodes: [SKNode]) { for node in nodes {
node.zPosition -= 2
addChild(node)
}
}
}
func halfHeight(_ node: SKNode) -> CGFloat { return node.frame.size.height/2 }
func halfWidth (_ node: SKNode) -> CGFloat { return node.frame.size.width/2 }
// MARK: -
/// The scene in which we can interact with our shop and player:
class ShopScene: SKScene {
lazy private(set) var shop: Shop = { return Shop(shopScene: self) }()
let previousGameScene: GameScene
var player: Player { return self.previousGameScene.player } // The player is actually still in the other scene, not this one.
private var costumeNodes = [CostumeNode]() // All costume textures will be node-ified here.
lazy private(set) var selectedNode: CostumeNode? = {
return self.costumeNodes.first!
}()
private let
buyNode = SKLabelNode(fontNamed: "Chalkduster"),
coinNode = SKLabelNode(fontNamed: "Chalkduster"),
exitNode = SKLabelNode(fontNamed: "Chalkduster")
// MARK: - Node setup:
private func setUpNodes() {
buyNode.text = "Buy Costume"
buyNode.name = "buynode"
buyNode.position.y = frame.minY + halfHeight(buyNode)
coinNode.text = "Coins: \(player.coins)"
coinNode.name = "coinnode"
coinNode.position = CGPoint(x: frame.minX + halfWidth(coinNode), y: frame.minY + halfHeight(coinNode))
exitNode.text = "Leave Shop"
exitNode.name = "exitnode"
exitNode.position.y = frame.maxY - buyNode.frame.height
setupCostumeNodes: do {
guard Costume.allCostumes.count > 1 else {
fatalError("must have at least two costumes (for while loop)")
}
for costume in Costume.allCostumes {
costumeNodes.append(CostumeNode(costume: costume, player: player))
}
guard costumeNodes.count == Costume.allCostumes.count else {
fatalError("duplicate nodes found, or nodes are missing")
}
let offset = CGFloat(150)
func findStartingPosition(offset: CGFloat, yPos: CGFloat) -> CGPoint { // Find the correct position to have all costumes centered on screen.
let
count = CGFloat(costumeNodes.count),
totalOffsets = (count - 1) * offset,
textureWidth = Costume.list.gray.texture.size().width, // All textures must be same width for centering to work.
totalWidth = (textureWidth * count) + totalOffsets
let measurementNode = SKShapeNode(rectOf: CGSize(width: totalWidth, height: 0))
return CGPoint(x: measurementNode.frame.minX + textureWidth/2, y: yPos)
}
costumeNodes.first!.position = findStartingPosition(offset: offset, yPos: self.frame.midY)
var counter = 1
let finalIndex = costumeNodes.count - 1
// Place nodes from left to right:
while counter <= finalIndex {
let thisNode = costumeNodes[counter]
let prevNode = costumeNodes[counter - 1]
thisNode.position.x = prevNode.frame.maxX + halfWidth(thisNode) + offset
counter += 1
}
}
addChildren(costumeNodes)
addChildren([buyNode, coinNode, exitNode])
}
// MARK: - Init:
init(previousGameScene: GameScene) {
self.previousGameScene = previousGameScene
super.init(size: previousGameScene.size)
}
required init?(coder aDecoder: NSCoder) { fatalError("init(coder:) has not been implemented")}
deinit { print("shopscene: if you don't see this message when exiting shop then you have a retain cycle") }
// MARK: - Game loop:
override func didMove(to view: SKView) {
anchorPoint = CGPoint(x: 0.5, y: 0.5)
setUpNodes()
select(costumeNodes.first!) // Default selection.
for node in costumeNodes {
if node.costume == player.costume { select(node) }
}
}
// MARK: - Touch / Click handling:
private func unselect(_ costumeNode: CostumeNode) {
selectedNode = nil
costumeNode.becomesUnselected()
}
private func select(_ costumeNode: CostumeNode) {
unselect(selectedNode!)
selectedNode = costumeNode
costumeNode.becomesSelected()
if player.hasCostume(costumeNode.costume) { // Wear selected costume if owned.
player.wearCostume(costumeNode.costume) // Also updates the player's texture.
buyNode.text = "Bought Costume"
buyNode.alpha = 1
}
else if player.coins < costumeNode.costume.price { // Can't afford costume.
buyNode.text = "Buy Costume"
buyNode.alpha = 0.5
}
else { // Player can buy costume.
buyNode.text = "Buy Costume"
buyNode.alpha = 1
}
}
// I'm choosing to have the buttons activated by searching for name here. You can also
// subclass a node and have them do actions on their own when clicked.
override func mouseDown(with event: NSEvent) {
guard let selectedNode = selectedNode else { fatalError() }
let location = event.location(in: self)
let clickedNode = atPoint(location)
switch clickedNode {
// Clicked empty space:
case is ShopScene:
return
// Clicked Buy / Leave:
case is SKLabelNode:
if clickedNode.name == "exitnode" { view!.presentScene(previousGameScene) }
if clickedNode.name == "buynode" {
// guard let shop = shop else { fatalError("where did the shop go?") }
if shop.canSellCostume(selectedNode.costume) {
shop.sellCostume(selectedNode.costume)
coinNode.text = "Coins: \(player.coins)"
buyNode.text = "Bought"
}
}
// Clicked a costume:
case let clickedCostume as CostumeNode:
for node in costumeNodes {
if node.name == clickedCostume.name {
select(clickedCostume)
}
}
default: ()
}
}
};
There's a lot to digest here, but pretty much everything happens in mouseDown() (or touchesBegan for iOS). I had no need for update() or other every-frame methods.
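The centering arithmetic in findStartingPosition() and the while loop after it can be checked in isolation. The sketch below reproduces it with plain numbers: centeredXPositions is a hypothetical helper, the 150 spacing matches the offset used in the scene, and the item width of 50 and count of 4 are assumed values for illustration. It relies on the scene's anchorPoint being (0.5, 0.5), so that x = 0 is the middle of the screen.

```swift
/// Returns the x position of each item's center so that the whole row is centered on x = 0.
/// Assumes every item has the same width, as findStartingPosition() does.
func centeredXPositions(count: Int, itemWidth: Double, spacing: Double) -> [Double] {
    guard count > 0 else { return [] }
    // Total row width: all items plus the gaps between them.
    let totalWidth = itemWidth * Double(count) + spacing * Double(count - 1)
    // Center of the leftmost item.
    let firstX = -totalWidth / 2 + itemWidth / 2
    // Each following item sits one item width plus one gap to the right.
    return (0..<count).map { firstX + Double($0) * (itemWidth + spacing) }
}

let xPositions = centeredXPositions(count: 4, itemWidth: 50, spacing: 150)
// xPositions is [-300.0, -100.0, 100.0, 300.0]
```

The positions are symmetric around zero, which is what keeps the row centered no matter how many costumes are added.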
So how did I make this? The first step was planning, and I knew there were several design decisions to make (which may not have been the best ones).
I knew that I needed a certain set of data for my player and shop inventory, and that those two things would also need UI elements.
I chose to combine the data + UI for Player by making it an SKSpriteNode subclass.
For the shop, I knew that the data and UI elements would be pretty intense, so I separated them (Shop.swift handling the inventory, Costume.swift being a blueprint, and CostumeNode.swift handling most of the UI).
Then, I needed to link the data to the UI elements, which meant that I needed a lot of logic, so I decided to make a whole new scene to handle logic pertaining just to entering and interacting with the shop (it handles some graphics stuff too).
This all works together like this:
Player has a costume and coins
GameScene is where you collect new coins (and levels)
ShopScene handles most of the logic for determining which UI elements to display, while CostumeNode has the functions for animating the UI.
ShopScene also provides the logic for updating the Player's texture (costume) and coins through Shop.
Shop just manages the player inventory, and has the data with which to populate more CostumeNodes
When you are done with the shop, your GameScene instance is immediately resumed where you left off prior to entering
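The purchase path in the list above (ShopScene asks Shop, Shop updates Player) can be sketched without SpriteKit. This version is an alternative, not the project's code: it folds the check and the sale into one function that returns a result instead of calling fatalError. Buyer, PurchaseResult, and purchase(costumeNamed:price:for:) are hypothetical names, and the prices are made up.

```swift
// Hypothetical result type: why a purchase did or did not go through.
enum PurchaseResult: Equatable {
    case sold(coinsLeft: Int)
    case alreadyOwned
    case notEnoughCoins(missing: Int)
}

// Hypothetical stand-in for Player: just the data the shop needs.
struct Buyer {
    var coins: Int
    var ownedNames: Set<String>
}

/// Checks the rules and, if they pass, deducts the price and records ownership.
func purchase(costumeNamed name: String, price: Int, for buyer: inout Buyer) -> PurchaseResult {
    if buyer.ownedNames.contains(name) {
        return .alreadyOwned
    }
    guard buyer.coins >= price else {
        return .notEnoughCoins(missing: price - buyer.coins)
    }
    buyer.coins -= price
    buyer.ownedNames.insert(name)
    return .sold(coinsLeft: buyer.coins)
}

var buyer = Buyer(coins: 12, ownedNames: ["gray shirt"])
let firstTry = purchase(costumeNamed: "red shirt", price: 5, for: &buyer)
let secondTry = purchase(costumeNamed: "red shirt", price: 5, for: &buyer)
let thirdTry = purchase(costumeNamed: "blue shirt", price: 25, for: &buyer)
```

The scene can switch over the result to decide what the buy label should say, so a rejected purchase becomes a UI state rather than a crash.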
So the question you may have is, "how do I use this in my game??"
Well, you aren't going to be able to just copy and paste it. A lot of refactoring will likely be needed. The takeaway here is to learn the basic system of the different types of data, logic, and actions that you will need to create, present, and interact with a shop.
Here is the GitHub repo again:
https://github.com/fluidityt/ShopScene

Resources