OCR validations with Rails: building a business card scanner (ruby-on-rails)

My goal is to write a validation class for Rails that takes the OCR-recognised text from a business card, detects string snippets and assigns them to the correct attributes. I know this probably cannot be 100% perfect, but I want to get as close as possible. Here is my approach so far:
I scan business cards via the browser's navigator.mediaDevices API
I send the scanned image to a third-party API service called OCRSpace (a gem is available here: https://github.com/suyesh/ocr_space)
I then get an unformatted array of recognised text snippets back, for example:
result = [['John Doe'], ['+49 160 123456'], ['Mainstr. 45a'], ['12345 Berlin'], ['CEO'], ['johndoe@business-website.de'], ['www.business-website.de']]
I then iterate through the array and run some checks, for example:
- using the people library (https://github.com/mericson/people) to split the name into first name and last name (plus the title or middle names)
- using the phonelib library (https://github.com/daddyz/phonelib) to look up a valid phone number and format it as an international string
- doing a basic regex check on the email address and storing it
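As a rough illustration of the flatten-and-check step, here is a plain-Ruby sketch that uses only the standard library. The email check relies on URI::MailTo::EMAIL_REGEXP from Ruby's uri library; WEBSITE_REGEXP is an assumed, deliberately simple pattern that is not from the question and will miss many real-world URLs.

```ruby
require "uri"

# OCR output as shown in the question: an array of one-element arrays.
result = [['John Doe'], ['+49 160 123456'], ['Mainstr. 45a'], ['12345 Berlin'],
          ['CEO'], ['johndoe@business-website.de'], ['www.business-website.de']]

# Assumed, deliberately simple website pattern: optional scheme, optional "www.",
# then at least two dot-separated labels. It will miss many real-world URLs.
WEBSITE_REGEXP = %r{\A(?:https?://)?(?:www\.)?[a-z0-9-]+(?:\.[a-z0-9-]+)+/?\z}i

# Flatten to plain strings and strip stray OCR whitespace.
snippets = result.flatten.map(&:strip)

# URI::MailTo::EMAIL_REGEXP ships with Ruby's uri library and is anchored
# (\A...\z), so the whole snippet has to look like an address.
email   = snippets.find { |s| s.match?(URI::MailTo::EMAIL_REGEXP) }
website = snippets.find { |s| s.match?(WEBSITE_REGEXP) }

puts email   # → johndoe@business-website.de
puts website # → www.business-website.de
```

Whatever is left over after these checks (name, street, ZIP/city, job title) is where the harder heuristics come in.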
What I am still missing:
How can I work out which string is most likely the name? Right now I let the user choose it (in my example, the user marks "John Doe" as the name and the library does the rest). Wouldn't a regex run into conflicts, since strings like "Main Street" would also be recognized as a name?
How do I match a combination of ZIP code and city name with a regex? I'm not a regex expert; do you know any good resources that would help? I couldn't find any so far, apart from general regex checkers.
In general: do you like my approach, or is it way too complicated? And do you know any best practices that would work better?

Don't consider this a full answer, but it was too long for a comment.
Your approach seems OK, but I wouldn't use the OCR service, since there are alternatives; Tesseract is the best known.
If you do use it and all the results come back in a comparable format, it doesn't seem too difficult, since every piece of information has its own characteristics.
You can identify the name part because it won't contain digits, while most of the other fields (phone, street, ZIP code) do. It may also contain "Mr.", "Mrs." or similar, and it won't contain "Str.", "street" and so on. You could also use Google Maps to check that addresses are correct; there are Ruby gems for this, but I have no experience with them.
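A minimal sketch of the name heuristic described above, in plain Ruby. The hint lists (STREET_HINTS, TITLE_HINTS) and the helper names are hypothetical and would need extending for real cards. The result is a ranked guess to show the user for confirmation, not a final decision.

```ruby
# Hypothetical hint lists; extend them for the languages on your cards.
STREET_HINTS = /str\.|straße|strasse|\b(?:street|road|avenue)\b/i
TITLE_HINTS  = /\A(?:mr|mrs|ms|dr|prof)\.?\s/i

# True if the snippet could plausibly be a person's name.
def name_candidate?(snippet)
  return false if snippet.match?(/\d/)             # names rarely contain digits
  return false if snippet.include?("@")            # looks like an e-mail address
  return false if snippet.match?(/\.[a-z]{2,}\z/i) # ends like a domain (".de", ".com")
  return false if snippet.match?(STREET_HINTS)     # "Mainstr.", "Main Street"
  snippet.split.size >= 2                          # expect at least first + last name
end

# Keeps plausible names, moving ones that start with a title ("Mr.", "Dr.") to the front.
def rank_name_candidates(snippets)
  titled, plain = snippets.select { |s| name_candidate?(s) }
                          .partition { |s| s.match?(TITLE_HINTS) }
  titled + plain
end

snippets = ['John Doe', '+49 160 123456', 'Mainstr. 45a', '12345 Berlin', 'CEO',
            'johndoe@business-website.de', 'www.business-website.de', 'Main Street']
p rank_name_candidates(snippets) # → ["John Doe"]
```

Single-word lines such as "CEO" are dropped by the two-word rule; that rule will also drop single-word names, which is one reason to keep the user confirmation step.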
Your people gem could also help.
You could guess all of this, present the results on your webpage and let the user confirm or adjust them.
You could also match the ZIP/city combination with a regular expression by looking for a number and a string in either order, but you could also use a gem like ZipCodes to help.
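For the ZIP/city part, here is a sketch that tries both orders ("12345 Berlin" and "Berlin 12345"). It assumes German-style five-digit postcodes, as in the question's example; other countries need different patterns, which is where a postcode gem helps. parse_zip_city is a hypothetical helper name.

```ruby
# Assumes German-style five-digit postcodes. City names may contain letters,
# spaces, dots and hyphens, so "Frankfurt am Main" matches but "Halle (Saale)" does not.
ZIP_FIRST  = /\A(\d{5})\s+(\p{L}[\p{L} .\-]*)\z/
CITY_FIRST = /\A(\p{L}[\p{L} .\-]*?)\s+(\d{5})\z/

# Returns { zip:, city: } or nil if the snippet doesn't look like "ZIP City" / "City ZIP".
def parse_zip_city(snippet)
  text = snippet.strip
  if (m = ZIP_FIRST.match(text))
    { zip: m[1], city: m[2].strip }
  elsif (m = CITY_FIRST.match(text))
    { zip: m[2], city: m[1].strip }
  end
end

p parse_zip_city("12345 Berlin") # zip "12345", city "Berlin"
p parse_zip_city("Mainstr. 45a") # → nil
```

\p{L} matches any Unicode letter, so umlauts in names like "München" are accepted without listing them explicitly.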
Sorry, I don't have time right now to test any regular expressions, and I don't publish code without testing.
Hope this helps, good luck!

Related

ResearchKit: Validate email

I'm attempting to create a form step where one of the form step items is an email input. For this I want to validate the email against certain domains, namely:
@gmail.com, @icloud.com, @me.com
I can see we have an email answer format in the form of this:
ORKEmailAnswerFormat()
However I can't see anywhere in this type that allows me to apply a validation regex. Looking into this I see we have the following
ORKAnswerFormat.textAnswerFormatWithValidationRegex(validationRegex, invalidMessage)
I suppose this is my best option? If so, does anyone know a regex (my regex isn't the greatest!) in Swift that would handle the three domains stated above?
I have something like this (not the greatest, I know!):
[A-Z0-9a-z._%+-]+@gmail.com
[A-Z0-9a-z._%+-]+@(?:icloud|me|gmail)\.com
(or, if you don't care about capturing:)
[A-Z0-9a-z._%+-]+@(icloud|me|gmail)\.com
I made two modifications: I escaped the . (unescaped, it matches any character) and I made the other two domains alternatives.
I suggest that you convert the whole thing to lower case. I don't know Swift, but you may be able to use one of its functions for that, or the i modifier:
(?i)[0-9a-z._%+-]+@(icloud|me|gmail)\.com
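The same pattern, tried out in Ruby with one addition: it is anchored with \A and \z so that a string such as "john@gmail.com.example.org" is rejected. Whether ResearchKit's validation requires a full match is not stated here, so anchoring the Swift pattern too (with ^ and $) is a sensible precaution worth checking. allowed_email? is a hypothetical helper name, and the /i flag plays the role of the (?i) modifier.

```ruby
# Only addresses at icloud.com, me.com or gmail.com are accepted.
# \A and \z anchor the whole string; /i makes the match case-insensitive.
ALLOWED_EMAIL = /\A[0-9a-z._%+-]+@(?:icloud|me|gmail)\.com\z/i

def allowed_email?(address)
  ALLOWED_EMAIL.match?(address.strip)
end

p allowed_email?("John.Doe@GMAIL.com")         # → true
p allowed_email?("john@gmail.com.example.org") # → false (trailing text after .com)
p allowed_email?("john@hotmail.com")           # → false
```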

Take Action based on QR code Information of VCARD, iPhone

What is the best way to handle QR code information? A QR code can contain any information; for now I only want to handle:
1. URL: redirect to the Safari browser (this works fine)
2. vCard: open the contact book with the contact values. But I'm seeing that vCard key names are not unique (I'm not sure whether my QR codes are improper). Also, the QR value is a plain string, so how do I detect which value belongs to which address book key?
For example:
"BEGIN:VCARD
FN:Ashwin kanjariya
TEL:+999-999-9999
EMAIL:you@we.com
URL:http://www.youandme.com
N:kanjariya;ashwin
ADR:any address
ROLE:software developer
VERSION:3.0
END:VCARD"
So I'm not sure whether all vCard keys are universal. What is the best way to handle this?
I'd appreciate any suggestions that help me figure out vCard parsing.
Is CFDataCreate with ABPersonCreatePeopleInSourceWithVCardRepresentation the best way to go? (I need to support versions below iOS 9 as well.)
For example:
let vCardData = CFDataCreate(nil, UnsafePointer<UInt8>(vCard.bytes), vCard.length)
let defaultSource = ABAddressBookCopyDefaultSource(addressBook)
let vCardPeople = ABPersonCreatePeopleInSourceWithVCardRepresentation(defaultSource.takeUnretainedValue(), vCardData).takeRetainedValue() as NSArray
vCard has several versions with slightly different implementations. Keys don't have to be unique: a person can have multiple home or work phone numbers, for example. You should still be able to tell what is a phone number, and you can simply accept as many as your customer considers reasonable for their use case.
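To illustrate handling repeated keys, here is a minimal vCard parser sketch in Ruby that collects every value under its property name in an array. It only unfolds continuation lines and drops parameters such as TYPE=work; it does not handle quoted-printable values, escaped characters or group prefixes, so a full vCard library is preferable in production. parse_vcard is a hypothetical name, and the second TEL line in the usage example is made up to show a repeated key.

```ruby
# Parses vCard text into { "PROPERTY" => [value, value, ...] }.
# Repeated properties (several TEL or EMAIL lines) keep every value.
def parse_vcard(text)
  # Line folding (RFC 2426 / RFC 6350): a line break followed by a space or tab
  # continues the previous line, so remove both.
  unfolded = text.gsub(/\r?\n[ \t]/, "")
  props = {}
  unfolded.each_line do |raw|
    line = raw.chomp
    name_part, value = line.split(":", 2) # split once: values like URLs contain ":"
    next if value.nil? || name_part.empty? # skip blank or malformed lines
    # "TEL;TYPE=work" -> "TEL"; parameters are dropped in this sketch.
    name = name_part.split(";").first.upcase
    next if %w[BEGIN END].include?(name)
    (props[name] ||= []) << value
  end
  props
end

# Hypothetical card with two TEL entries.
card = "BEGIN:VCARD\nVERSION:3.0\nFN:Ashwin kanjariya\nTEL:+999-999-9999\n" \
       "TEL;TYPE=work:+888-888-8888\nURL:http://www.youandme.com\nEND:VCARD\n"
p parse_vcard(card)["TEL"] # → ["+999-999-9999", "+888-888-8888"]
```

Keeping arrays per key maps naturally onto the one-to-many storage suggested below.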
An extensive list of what you may find in vCards is here: https://en.wikipedia.org/wiki/VCard
If you want to make sure that all the data is stored, then you may have to implement lists, or in database terms, store items in different tables so that one to many relationships can be maintained for several items.
When designing a system that stores information about people, you may also want to keep in mind "Falsehoods Programmers Believe About Names".

How can I map/link/associate a UUID to a random hex number

Newbie here, wrapping my head 'round this stuff!
I'd like to use the hex number as my url (external identifier) and keep the uuid within the database for a ruby on rails application. Is this even possible?
Thanks a bunch
Many people advise against it, but yes, it is possible. It will need some code, and the solution depends on which version of Rails and which database you use, which is why I'm going to answer in a generic way.
You will want to have two different fields for the model: one for the external hex representation and another one for a separate UUID. Then, you can use the hex string to find instances in your controller actions, for example.
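A plain-Ruby sketch of the two-field idea, using SecureRandom from the standard library. Record and RecordStore are hypothetical stand-ins for an ActiveRecord model and its table. In Rails you would instead add a unique, indexed column (public_id here is an assumed name), look records up with find_by!(public_id: params[:id]), and override to_param on the model to return that column so that URL helpers use it.

```ruby
require "securerandom"

# Stand-in for a model with two columns:
#   uuid      - internal identifier kept in the database
#   public_id - random hex string used in URLs
Record = Struct.new(:uuid, :public_id, :name)

class RecordStore
  def initialize
    @by_public_id = {}
  end

  def create(name)
    record = Record.new(SecureRandom.uuid, unique_hex, name)
    @by_public_id[record.public_id] = record
    record
  end

  # Plays the role of Record.find_by!(public_id: params[:id]) in a controller;
  # raises KeyError when the id is unknown.
  def find_by_public_id(public_id)
    @by_public_id.fetch(public_id)
  end

  private

  # 8 random bytes -> 16 hex characters; retry on the (unlikely) collision.
  def unique_hex
    loop do
      hex = SecureRandom.hex(8)
      return hex unless @by_public_id.key?(hex)
    end
  end
end

store  = RecordStore.new
record = store.create("Example")
# The URL would use record.public_id; the database keeps record.uuid.
store.find_by_public_id(record.public_id) # returns the same record
```

In a real database the collision check belongs in a unique index rather than in application code.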
Please take a look at the following (they don't seem to have the two fields but will point you to the right direction anyway):
Problems setting a custom primary key in a Rails 4 migration
Change Primary Key Issue Rails 4.0
http://www.speakingcode.com/2013/12/07/gracefully-using-custom-primary-keys-in-rails-4-routes-controllers-models-associations-and-migrations/
And a longer post of a similar thing to do: http://ruby-journal.com/how-to-override-default-primary-key-id-in-rails/
Also, the FriendlyId gem might do what you want.

Rails to_s Mechanics

Hey guys, this has been tripping me up quite a bit. Here is the general problem:
I am writing an application that requires users to enter their summoner names from League of Legends. I do a fairly simple data scrape of a match and enter the data into my database. Unfortunately, I am getting errors when registering users whose names contain "special characters".
For this example I will use one problem user: RIÇK
As you can see, RICK != RIÇK. When I do the data scrape from the site, I get the correct value, which I push onto an array for later use.
Once I need the player names I pull from the array as follows: (player_names is the array)
@temp_player = User.find_by_username(player_names[i].to_s)
The problem is the users with any special characters are not being pulled. Should I not be using find_by? Is to_s changing my original values? I am really quite lost on what to do and would greatly appreciate any help / advice.
Thanks in advance,
Dan
I would like to thank Brian Kung for the link to joelonsoftware.com/articles/Unicode.html. It does a great job of covering the bare minimum a programmer truly needs to understand.
For my particular issue, I had used an HTML scraper to get the contents, but it kept HTML entities throughout the text. When I used these strings in my SQL lookups, it was obvious that nothing was being found. To fix this, I used the HTMLEntities gem to decode the text as soon as I originally put it into the array:
require 'rubygems' # only needed on Ruby 1.8; Ruby 1.9+ loads RubyGems automatically
require 'htmlentities'
coder = HTMLEntities.new
line = '&lt;'
player_names.push(coder.decode(line)) # pushes "<"
The Takeaway
When working with text and running into errors, I would strongly recommend tracing the strings you are working with back to their origin and understanding which encoding is used at each step. That way you can quickly find where things go wrong.
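To make the takeaway concrete, here is a small Ruby sketch of two ways the "same" name can fail an equality check, and therefore a database lookup: an undecoded HTML entity and a decomposed Unicode character. decode_numeric_entities is a hypothetical stand-in that only handles numeric entities; the HTMLEntities gem also decodes named ones such as &Ccedil;. String#unicode_normalize is part of core Ruby (2.2+).

```ruby
# Hypothetical helper: decodes numeric character references only
# (&#199; and &#xC7;). The HTMLEntities gem also handles named ones like &Ccedil;.
def decode_numeric_entities(text)
  text.gsub(/&#(x\h+|\d+);/i) do
    ref = Regexp.last_match(1)
    codepoint = ref.start_with?("x", "X") ? ref[1..-1].to_i(16) : ref.to_i
    codepoint.chr(Encoding::UTF_8)
  end
end

wanted  = "RI\u00C7K"  # "RIÇK" as stored in the users table
scraped = "RI&#199;K"  # the same name as an HTML scraper might return it

p(scraped == wanted)                          # → false
p(decode_numeric_entities(scraped) == wanted) # → true

# Visually identical text can also differ in code points:
# "C" followed by U+0327 COMBINING CEDILLA instead of the single "Ç".
decomposed = "RIC\u0327K"
p(decomposed == wanted)                         # → false
p(decomposed.unicode_normalize(:nfc) == wanted) # → true
```

Printing str.encoding and str.codepoints at each step is a quick way to see which of these cases you are in.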

How do I construct the cake when using Scalaxb to connect to a SOAP service?

I've read the documentation, but what I need to know is:
I'm not using a fictitious stock quote service (with an imaginary wsdl file). I'm using a different service with a different name.
Where, among the thousands and thousands of lines of code that have been generated, will I find the Scala trait(s) that I need to put together that correspond to this line in the documentation's example:
val service = (new stockquote.StockQuoteSoap12Bindings with scalaxb.SoapClients with scalaxb.DispatchHttpClients {}).service
Now, you might be thinking "Why not just search for Soap12Bindings in the generated code"? Good idea - but that turns up 0 results.
The example in the documentation is outdated, or too specific. (The documentation is also internally inconsistent and inconsistent with the actual filenames output with scalaxb.)
First, search for SoapBindings instead of Soap12Bindings to find the service-specific trait (the first trait).
Then, instead of scalaxb.SoapClients, use scalaxb.Soap11Clients.
