Best way to very quickly match a needle against a large haystack

Best way to very quickly match a needle against a large haystack - ios

Think of an employee management system application, now add a biometric machine to it. Users can swipe their finger to clock in and out of the system as they enter and exit the building.
So far my application currently can successfully enroll fingerprint templates retrieved from my biometric machine and have it saved into the database in a blob/varchar field.
I have over 3000 registered users in this local database of mine on a local mac app that I created. Now I am implementing the matching algorithm, and wondering whether I should:
load all 3000 fingerprint templates from the local database - the haystack - into an NSMutableArray with one database call that would be done at application startup and then match the user's fingerprint template - needle - against the haystack for each user that enters or exits the building;
Make multiple SQL calls returning one element of the haystack at a time, compare it with the needle, and if they dont match, return another element from the haystack until they match. Or,
Grab portions of the haystack (say a 100 fingerprint templates at a time) and load that into an NSMutableArray and then match the needle against that haystack subset.
Which is best for a lightning fast approach?
The fingerprint SDK I am using atm, is meant to be lightning fast in terms of the fingerprint verification itself, but I need to handle the database side of things and managing the haystack and needle myself and would ideally like that to be fast too.
I am currently using SQLite, if I need adopt Cocoa Core Data to fulfil a better approach to what I wish to accomplish, then I am all ears and can adopt the system to provide a better user experience in terms of speed
Here's an example of what one fingerprint template data looks like in a base 64 encoded string format.

3000 records seems like a tiny haystack. How big is a template? I'd almost certainly just load it all up into memory if you need to do complicated comparisons provided the templates aren't outrageously large.
The various approaches don't sound difficult to implement. Which way are you doing it now, and what's the performance bottleneck? Start by implementing simply, and then see if there's a problem. Often there isn't, and you spend a lot of time making things complicated for nothing.

3800 base 64 chars is only 2500 bytes...
just test it, try strnstr (null chars will stop it from searching), I bet you can search that hundreds of times a second.
be sure to search decoded otherwise because of alignment your needle could look like lots of different things.

Related

I want to make a procedurally generated virtualized drive - how can I do this?

I've had the idea for a procedurally generated virtual drive bouncing around my head for a long time. It wouldn't really have any uses, and really it would just be for the memes, but I've finally settled down and decided to make it.
The idea is to make a fake drive that, whenever a program asks for a sector from, a bit of code will generate that sector on the fly (instead of reading it from a storage medium like a normal drive). Of course, writing to the disk would be impossible, but that's ok - I'm just doing this for fun.
Question is: how do I actually make this appear as a drive?
I should think there's a library out there somewhere that will let you directly, but I haven't been able to find it yet. I don't really know what keywords I should be searching for, either.
I have a lot of experience with Arduinos and hardware - would it be easier to simply connect the SPI pins to an SD card slot and get the Arduino to generate the "sectors"?
I'm thinking of using this to play around with file systems and ridiculously large files - after all, there is no limit to how big the drive can be (since it doesn't require any real memory), besides 32 bit or 64 bit limits, which could be a lot of fun - if only to pretend you have a zettabyte of disk space. Even though it would be read only, I'm curious how Notepad would handle a petabyte txt file.
If anyone has any ideas on how to do this, or knows some better keywords I can use to search for this, let me know!
(I am fairly fluent in Python, Arduino, and I can do a little C if I sit down with a coffee)

You didn't specify how you wanted to tackle this, but on Linux there's Fuse, the Filesystem in Userspace. It's a library that allows you to implement functions that handle the usual filestem operations, i.e. enumerating files and directories, opening files and reading/writing to them.
It also has a python wrapper and a tutorial.

Game Data Persistence Across Devices

I'm still in the very early stages of a game, and I am saving information using a KeyChain wrapper class someone made. I wanted to ask for some advice early on, since I have time to change my approach.
My game has the potential to persist a fair amount of data about the player and what they've done, such as:
how much gold the player has
what items they've acquired (you can get about 50 items)
what skills, spells, and abilities they've chosen for their character
what experience level they are, max health, stats, etc
The reason I decided to store this in KeyChain was that I was told it's encrypted and much more difficult to tamper with. I felt there were other solutions such as the ones below, but I wrote some potential reasons why that might not be good:
Make everything web-based, and stored in a database somewhere on my server - I want my game to be playable offline
Use a local database (FMDB, let's say) - I could use a tool to edit the values directly, and give myself more health, etc.
Use Core Data - Never used this before, not sure if this is the same ease of tampering as #3?
GameCenter - Never used this before, so not sure what the lift is
NSPreferences - Preferences are more easily tampered with (i've used a tool to change the values pretty quickly)
So I am not sure if i'm completely wrong above, but let's say there is some degree of truth there and KeyChain is a good approach. The problem is now what if I want to then somehow allow the player to pick up a new device and pick up where they left off? How on earth would I serialize all that data I'm saving in keychain? I don't mind creating a giant JSON document of the values, and sending it along somewhere (where? to GameCenter?)
Any advice / pointers in the right direction would be good, especially now since i'm in the early stages and can make changes to step back.
Thanks so much everyone appreciate your time!

A few thoughts based on lessons learned (usually, "the hard way") that may (or may not) be helpful. :)
I see three requirements in your post: offline play (requiring local storage), data security (which is a massive topic in and of itself) and synchronization.
Playable offline:
Thus you need some sort of local storage. Keychain, Core Data, SQL, NSPreferences are all options. I don't know the limitations of the Keychain, so not really sure how suitable it is for continuous read/write of large chunks of data.
Data security:
They keychain protects your secrets when you're not logged in, and partitions them between apps. https://developer.apple.com/library/content/documentation/Security/Conceptual/keychainServConcepts/02concepts/concepts.html and http://evgenii.com/blog/sharing-keychain-in-ios/ give more details. That should prevent other tools from modifying other app's content on non-jailbroken devices.
Core Data, SQL, et al will be inside your app's container, which makes it harder for other tools to access on non-jailbroken devices. There's a good description here: https://developer.apple.com/library/content/documentation/FileManagement/Conceptual/FileSystemProgrammingGuide/FileSystemOverview/FileSystemOverview.html
NSPreferences doesn't offer any security that I'm aware of. Plus, if they jailbreak the device, they basically have root access on a linux machine and can probably do anything they want.
In today's security world, the mantra is generally, "assume breach." As in, assume if they want in bad enough, they're going to find a way in. Thus you need to think about other layers of defense: partitioning your secrets so compromised secrets have limited value, encryption at rest and encryption in transit. Obfuscating the data before you write it/transmit is therefore another layer of defense (although they may still edit it and you'll have to handle potential garbage values).
You can chase security pretty far down the rabbit hole; you'll have to decide where the cost outweighs the benefit, and/or what risks you intend to defend against.
Synchronization across devices:
Thus you need a syncing solution. the iCloud Keychain syncs between devices, so if the keychain meets your storage needs, this probably will meet your sync needs, too. Again, I'm not sure if there are size or frequency-of-update constraints. I haven't used this, but this SO answer gives more info: https://stackoverflow.com/a/32606371/1641444. Based on Apple's docs (https://support.apple.com/en-us/HT204085) it looks like the user does have to enable syncing for this to work.
Otherwise, Apple provides GameCenter and CloudKit. Or you could explore 3rd party options. I've used GameCenter and CloudKit to sync data across devices. CloudKit wins over GameCenter, IMO, no contest.
With GameCenter, you gain multiplayer matchmaking and channels to share data between users. But, you have to adhere specifically to its structure which is IMO both extremely frustrating to work with and limited in functionality. Check the GameCenter and GKTurnBasedMatch tags here on SO for a taste of the problems.
CloudKit is a lower-level solution. It allows you to store a wide variety of data objects in iCloud and "subscribe" to change notifications. All data is contained within a container for the app, and has a "private" (user-specific) section and a public (shared by all users of the app) section. After watching the introductory WWDC videos on CloudKit, I was successfully sync'ing user settings between devices AND different users on different devices in no time. However, if you want some of the multiplayer features of GameCenter, you'll have to build the data model/subscriptions yourself. Since you support offline play, you'd need to save your data to a local storage solution and periodically upload it to iCloud for sync.
Conclusion (aka TL;DR)
So, my opinion is: none of the tools individually meet all three of your requirements, and you'll end up rolling your own solution for at least 1 req, regardless of which option you go for. In my multiplayer game, I'm trusting in Apple's filesystem (containers) to provide sufficient data security, I'm using a local storage option within the app's container, and I'm periodically writing NSData objects to cloudKit for synchronization across devices. I hit my frustration limit with GameCenter and pulled it from my app.

Star for a good question!!
I'm doing something similar with this. Right now, I'm using UserDefaults for primary storage. When I need to transfer data, I can save it to iCloudKit, or I could export it as a plist or json to use with AlamoFire, email, etc.
As for actually storing the data, I saw you mentioned CoreData.. that is kind of overkill! Look at NSCoder and NSKeyedArchiver.
Personally, I make my own save/load functions that manipulate dictionaries, then toss them into a UserDefaults key.
An example hierarchy of how I store dicts in UserDefaults (there are many ways to go about this)
Key:
Items->Equipment->Weapons->WeaponName
Value:
{ dictionary of data like AP, Cost, etc }
This to me is easier than using NSCoder, and certainly easier than CoreData!! The goal is to turn everything into a dictionary, which you can then easily put into UserDefaults, which allows you to easily create plists or save to the cloud.
With the above key system, you basically just do for loops to find what you need, parsing out each section.. so a function to display all equipment, you just load the entire UserDefaults.standard.dictionaryRepresentation, then search for keys that only have 'Items->Equipment' and so on.
Hope this helps!
I have some functions from my game I could share if you need more tips.
UPDATE:
If you are making this an online game, I would focus on learning the essentials of persistence and cloud usage first, then port it to a more secure platform that uses encryption.
Your best bet will be to create your own servers and transfer data that way. That will give you exactly what you want, when you want it, and with the security that you want.
If you want to use GameCenter, I would use GC as an in-between layer of something custom you created, so that way you can filter out the cheaters' scores / have more flexibility.

shazam for voice recognition on iphone

I am trying to build an app that allows the user to record individual people speaking, and then save the recordings on the device and tag each record with the name of the person who spoke. Then there is the detection mode, in which i record someone and can tell whats his name if he is in the local database.
First of all - is this possible at all? I am very new to iOS development and not so familiar with the available APIs.
More importantly, which API should I use (ideally free) to correlate between the incoming voice and the records I have in the local db? This should behave something like Shazam, but much more simple since the database I am looking for a match against is much smaller.

If you're new to iOS development, I'd start with the core app to record the audio and let people manually choose a profile/name to attach it to and worry about the speaker recognition part later.
You obviously have two options for the recognition side of things: You can either tie in someone else's speech authentication/speaker recognition library (which will probably be in C or C++), or you can try to write your own.
How many people are going to use your app? You might be able to create something basic yourself: If it's the difference between a man and a woman you could probably figure that out by doing an FFT spectral analysis of the audio and figure out where the frequency peaks are. Obviously the frequencies used to enunciate different phonemes are going to vary somewhat, so solving the general case for two people who sound fairly similar is probably hard. You'll need to train the system with a bunch of text and build some kind of model of frequency distributions. You could try to do clustering or something, but you're going to run into a fair bit of maths fairly quickly (gaussian mixture models, et al). There are libraries/projects that'll do this. You might be able to port this from matlab, for example: https://github.com/codyaray/speaker-recognition
If you want to take something off-the-shelf, I'd go with a straight C library like mistral, as it should be relatively easy to call into from Objective-C.
The SpeakHere sample code should get you started for audio recording and playback.
Also, it may well take longer for the user to train your app to recognise them than it's worth in time-saving from just picking their name from a list. Unless you're intending their voice to be some kind of security passport type thing, it might just not be worth bothering with.

Creating PDFs from iOS text fields

I'm working on the requirements & specifications for a new iOS app intended for use by certain professionals working "in the field". All day long for weeks on end, these folks have a sizable reporting burden to their superiors using standardized forms that track all different kinds of information. Traditionally, those forms are in PDF, and are simply printed and filled out in ink and then shared with the dozens to hundreds of others working the same operation. Sometimes they'll use a PDF with form fields so the data can be typed and then printed as part of the form. Either way, given their workflow, time and stress pressures, and other factors, it's not a very productive way to get the standardized reporting forms done.
The app we're spec'ing would offer an iOS (and Android, if possible -- but secondary or even tertiary requirement at this point) user interface for tracking the data they enter in the field, organizing it in a logical manner for each individual user, and with the press of a button, take all that data and automatically create a PDF file of it using the standardized form.
Of course, the forms are STRICTLY and rigidly standardized in this industry, and any deviation in format, structure, or presentation is simply not tolerable.
So I was approaching the project by thinking the app would maintain an internal repository of the original standardized forms from the accrediting organization, with each possible data area defined as a field. The app would:
open the necessary PDF form for the task at hand;
parse its dictionary to identity the specific data fields;
for every single field, identify the relevant data from the iOS app's own user interface and data tables, and assign that data to the corresponding field from the PDF/dictionary
export the PDF to a NEW PDF file, which the app would either email or store through iCloud, Dropbox, or some other form of file sharing.
The catch with #4 is that that PDF file must remain editable by standard PDF applications on Windows and Mac (Acrobat, Preview, etc.), so all the fields need to remain. And the PDF should be viewable just the same on either Windows or Mac.
Now, at NO time will the PDF (neither the original nor the exported final document) EVER need to be displayed inside the iOS app, nor would it make much sense to be able to do so.
I don't know if any of this is possible. This is our first iOS project, and we've been leaning towards building the app using Moai or Corona or some other framework to save development time and make porting across platforms easier. That said, if it cannot be done using Lua and one of these frameworks (I remain skeptical...they seem HIGHLY geared towards games), we're not opposed to doing it directly in Objective C and building an Android version some time down the road.
But either way, I'm at a loss in assessing whether this is even a practical undertaking. Our requirements are clear, and frankly if this can't be done, the project won't be pursued any further. But I could definitely use some help from you folks in identifying what my options are, whether I can do it in Lua, and what SDK(s) would be most useful in accomplishing this.

Based on what you've said, it seems that there is little reason to do the PDF-based part of the work on the mobile device itself since:
you don't need to display it on the ipad
you plan to email it or store it in the cloud
if you write this for iOS you will have to write again for Android as you've mentioned
Can you simplify the mobile part of your requirement by focusing on the data-collection and validation, then firing off to a server to do the document production? That will give you a lot more flexibility in the tools that you can use to merge the data into PDF docs. If so you could look at creating PDFs or populating the fields from code using something like iText (C# or Java). If you don't want to build your own back end server you could try something like Docmosis Cloud - but that might not allow you to get your precise layouts.
Certainly the catch you mentioned - needing to keep the PDFs editable with their fields is a significant gotcha in all cases. If you could convince the stakeholders that it is better to generate the final documents from your system (generate draft, review, update data, generate again etc) - rather than generating editable documents that you then lose control and tracability over, then you will be miles ahead.
Hope that helps.

Did you consider just generating a new pdf using an image of the form as the background to the pdf and just writing the user's data into the required areas over the form image. Would reduce the complexity of trying to parse the original form PDFs.

That's a point of worthwhile discussion, but one we don't have an ideal answer on. I tend to think of that as the almost perfect scenario -- it'd be considerably easier to develop. There are two key issues with this approach that have made us table it except as a very last resort:
The users of this product would be working in the field. That field could be quite literally anywhere--the streets of Manhattan, a disaster-stricken area with infrastructure that's been severely damaged or even destroyed, or the most war-ravaged third world country. If it were the streets of, say, Manhattan, there's no problem--their iOS or Android device will have 3G or Wi-Fi access just about anywhere they go. In the latter two scenarios (which are arguably more common in this industry), that connectivity may be very limited. The concern is whether the end user's ability to be productive or to see and share data with their colleagues will be too greatly restricted if they don't have a decent signal. To be fair though, even today they often aren't even using mobile devices, forcing them to go back to a headquarters type location or use radios to share information, effectively negating my point here. But if we're not going to significantly increase their productivity in the field, it just gives us pause to think through whether or not we have enough of a value proposition to ask them to fairly significantly change their methods of doing things.
To your latter point, no there's no convincing the stakeholders that this new system is the better approach. Even if there were, it would take years to do so. These forms are a part of a well-defined, decades-old standard used by literally thousands of organizations.

Need some input on how to build a large scale text replacement system

My Rails app deals a lot with data from third-party APIs (specifically UPS, FedEx, DHL, etc).
What I'd like to do is whenever that data comes in, replace certain phrases with customized phrases.
Example: "On FedEx vehicle for delivery" (which we get from the FedEx API), I'd like to replace with "Out for Delivery."
Is it best to replace the the text on its way in to the database? Or on output? (Talking from an end-user speed perspective)
I'm planning on storing these phrases in our database, so I'm assuming I'd just create a helper that pulls the phrases I want to replace and then run the strings through those using gsub and replace as necessary?
Any tips on making this efficient and easy to manage would be great.

For speed you should replace the phrases when they enter the database. If you do it on output you'll have to do it every time an user requests the data. It is quite obvious that doing it every time will put more load on the server.
You may, however, want to store the original phrases, in case you want to change the wording in the phrases you replace with.

Just a random idea, which might not be applicable depending on how your data is, but maybe you could leverage the i18n framework that's built into Rails for this. The original text could be viewed as a separate language called vendorspeak :-).

Categories

HOME

imagemagick

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart