We have a system that allows you to scan your credit card on an MSR, and from the dump I pull the needed fields such as name/cc/exp. Recently we had to add support for international credit cards. For almost all of the cards provided, I was still able to pull the information, since they all seemed to follow a standard. One exception, however, was a Maestro card. Its format is completely different, and since I neither have one to verify the actual number on the card against the dumped data, nor have access to any other dumps, it's very hard for me to figure out the correct format. I also did some Google searching on extracting data from an MSR dump, with little luck.
Unlike almost all other cards, track 1 does not start with "%B" and track 2 does not start with ";". Both tracks do appear to end with "?" (based on analyzing the whole dump, not track by track). Track 3 appears to be empty, which is normal.
The whole dump seems to lack any name data and is basically in the format of:
###=###?
###=###=###==#=###?
Note that apart from the single #, the fields I marked with three #s were variable length.
Again, I only had access to one single dump, which for obvious reasons I cannot post here.
If anyone has some example code in any language, or can link me to some help, I'd really appreciate it.
Thanks in advance,
Anthony
Is it possible that the card you are testing is faulty, or simply a non-standard card that is generally not supported? Try checking track data from other Maestro cards before assuming your system is at fault.
I say this because ISO 7813, the governing standard for transaction cards, is pretty clear that track 2 data begins with the start sentinel ";" and that on all valid bank cards track 1 has a format code "B" following the start sentinel "%".
Check the standard carefully and make sure your system is parsing correctly:
http://www.gae.ucm.es/~padilla/extrawork/tracks.html
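For reference, here is a minimal Python sketch of parsing standard-conforming track 1 and track 2 data per that layout. If your Maestro dump really lacks the sentinels, these patterns simply won't match, which itself tells you the data is non-standard:

    import re

    # Track 1: %B<PAN>^<NAME>^<YYMM><service code><discretionary>?
    TRACK1 = re.compile(r"%B(?P<pan>\d{1,19})\^(?P<name>[^^]{2,26})\^(?P<exp>\d{4})[^?]*\?")
    # Track 2: ;<PAN>=<YYMM><service code><discretionary>?
    TRACK2 = re.compile(r";(?P<pan>\d{1,19})=(?P<exp>\d{4})[^?]*\?")

    def parse_tracks(dump):
        """Pull PAN, cardholder name and expiry (YYMM) from a raw MSR dump."""
        out = {}
        t1 = TRACK1.search(dump)
        if t1:
            out["track1"] = {"pan": t1.group("pan"),
                             "name": t1.group("name").strip(),
                             "exp": t1.group("exp")}
        t2 = TRACK2.search(dump)
        if t2:
            out["track2"] = {"pan": t2.group("pan"), "exp": t2.group("exp")}
        return out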
Does anyone know what ZCD may refer to? It is described as a segment with a link back to PreManage for the patient!
Can anyone please provide more details?
Z-segments (segments that begin with the letter "Z") are custom segments. They are not defined in the specifications and vary from vendor to vendor. A vendor may publish a document explaining a segment's usage; otherwise, the two connected parties should agree on its usage in advance by mutual understanding.
As these are custom, if there is no way to know what data they contain, it may be safe to ignore them, hoping the sender has not put critical data in them.
Please refer to this:
Z-segments can be inserted anywhere in the HL7 message. A popular approach is to place the Z-segment within a group of segments that contain similar information, such as insurance. Z-segments are also often placed at the end of the message. The advantage of doing so is that this placement prevents systems configured to parse “standard” HL7 format from requiring any configuration modifications in order to process the message. The application simply reads the segments in the order expected and then extracts the data from the Z-segment (if needed) via parser modifications.
Working with unexpected Z-segments
Sometimes systems may send unexpected Z-segments, whether or not they were part of the original specifications. Even if you are not interested in the data in the Z-segment, you may still (depending on its location) need to take the segment into account while testing and developing your interface.
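If you do need to parse around them, here is a minimal Python sketch, assuming the usual carriage-return segment separators and pipe-delimited fields:

    def iter_segments(message):
        """Yield (segment_id, fields) for each segment of an HL7 v2 message."""
        for seg in message.strip().split("\r"):
            fields = seg.split("|")
            yield fields[0], fields

    def handle_message(message):
        for seg_id, fields in iter_segments(message):
            if seg_id.startswith("Z"):
                # Custom segment (ZCD etc.): ignore it unless the sender's
                # spec tells you which fields matter.
                continue
            # ... normal PID/OBX/etc. handling goes here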
I'm rather new to Bluetooth Low Energy devices and, having recently purchased a bunch of trackers off Amazon, decided to write a little application to see what type of information I can get from them.
The trackers are from a Chinese company, and there isn't a ton of documentation around their advertisement data, so I'm going by best guess here.
What I've been able to achieve so far, through Flutter Reactive BLE, is to find the devices by their ID (filtering out additional noise I don't care about) and pull information like RSSI, name, and ID from them.
Now I want to interpret the manufacturerData object, screenshot attached of just one of them, and can't seem to get anything concrete from it.
I half assumed that reactive_ble would have stripped the leading bytes and supplied only the portions of the data object that are relevant to interpret; however, this does not seem to be the case.
My first instinct was to just convert this Uint8List to a String with utf8.decode(device.manufacturerData); however, this returns either a single-spaced string or nothing at all.
I've tried using ByteData with a start of 3 and end of 4, and that's not very helpful either.
Is there something I'm missing in its interpretation? I've read the Bluetooth spec, but as I don't come from a CompSci background it's rather foreign to me, so I would appreciate a layman's response.
The first 16 bits (little-endian) of the manufacturer data contain the manufacturer ID (the Bluetooth SIG's web site has a list). The layout of the rest of the bytes is totally up to the manufacturer. If you can't guess what they mean, you'll have to ask the manufacturer.
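For example, a minimal Python sketch of that split; everything after the two-byte company ID is treated as opaque payload, since its layout is vendor-defined:

    import struct

    def split_manufacturer_data(data):
        """Split BLE manufacturer-specific data into (company_id, payload)."""
        company_id = struct.unpack_from("<H", data, 0)[0]  # first 2 bytes, little-endian
        return company_id, data[2:]

    # Example bytes: 0x004C is Apple's assigned ID (look yours up on bluetooth.com)
    cid, payload = split_manufacturer_data(bytes([0x4C, 0x00, 0x02, 0x15]))
    print(hex(cid), payload.hex())  # 0x4c 0215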
Local travel cards in Saint Petersburg, Russia have huge ID numbers that aren't easy to read and type into a web page when topping up the card online. So I want to build a small app that takes a photo of a travel card and parses the number out.
The task is a bit easier than a free form recognition:
the card is a very well-known size
the ID numbers are a known size, are located in a very well-known location on the card, and are digits only, no letters (okay, there are two variations I think, and maybe they will add 1-2 more in the future)
even the font is known in advance
even the first several digits are the same for most of the cards (so far only two prefixes are used)
How would you do it? Are there any libraries tuned not for the general OCR, but for a "hinted" OCR like I need?
Best regards,
Artem.
P.S.
Actually a free/cheap web service for this task would also be good enough
Yes, Google has a library called Tesseract, and there is an iOS SDK on GitHub you can import into your application. The SDK has documentation that explains how to set it up in your app. It has methods that will return a string with the text of the card. BUT it will be ALL of the text from the card. So the best thing to do would be to:
1 "clip" the original image to extract a sub image that displays only the portion of the card you wish to get the numbers from.
2 Process this sub image through Tesseract to retrieve the string you are looking for.
3 Then parse through the string and pick out the data that you need.
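Here is a rough Python sketch of those three steps using the pytesseract wrapper; the crop box coordinates are hypothetical and would need to be measured on the real card:

    import re
    import pytesseract
    from PIL import Image

    card = Image.open("card_photo.png")
    # Hypothetical box around the id number: (left, top, right, bottom)
    id_region = card.crop((40, 300, 560, 360))

    # --psm 7: treat the region as a single text line; whitelist digits only
    raw = pytesseract.image_to_string(
        id_region, config="--psm 7 -c tessedit_char_whitelist=0123456789")

    match = re.search(r"\d{8,}", raw)  # adjust the length to the real id format
    card_id = match.group() if match else None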
But just be warned, it can be a bit quirky. This SDK tends to recognize words best from images that are scanned, not photographed; although it is an advanced piece of technology, it isn't perfect. So to get it to work as well as possible for you, try to get scanned copies of the originals.
Best of luck.
The ideal solution for you would have three components:
1) Detection of the card. This is useful because with detection, end users have a much easier time actually using the scanner: they can hold the phone above the card in an arbitrary orientation.
2) An accurate OCR component. Ideally one customizable for the exact font on the card and the exact position on the card.
3) A parsing mechanism. This would enable you to obtain the exact string written on the card without writing a huge amount of OCR parsing code.
The BlinkID SDK has all of this. It has a preset for detecting cards in the ID-1 format, an integrated OCR engine, and a RegexParser, where you can define the exact format of the text you're trying to extract from the document.
BlinkID was initially built for scanning ID documents which have very similar properties as the problem you're trying to solve.
Note. I'm one of the developers working on BlinkID.
I have a slightly unusual profanity-related question.
Now we're used to dealing with profanity-filtering of user-generated content — any method is imperfect, but products like CleanSpeak and WebPurify do a good-enough job.
The problem we have at the moment, though, is that we've been building an engine to run promotional-code–based competitions, that will be used internationally. We could do with checking that none of these codes is profane in Latin American Spanish or Malay (at least in the first instance), to make sure we don't send out a code that's equivalent to FUCK23 or PEN15 or something.
We've tried Googling around and asking people we know, but we can't find an easy way of getting hold of an es-419 or an ms profanity list to filter the codes against. As there are literally millions of codes per locale, we'd rather do an offline check than hit an API for each code (which would be expensive both in terms of bandwidth and usage fees).
I know this is a bit of a long shot, but does anyone know of a good source for profanity lists in different languages?
#disclaim: We know that no profanity filtering is perfect, that it's essentially futile with user-generated content and we have read SO #273516: How do you implement a good profanity filter? — that's not what we're asking.
Building or finding lists in other languages is extremely time-consuming and difficult (trust me, we've built many of them at Inversoft). You might be better off tweaking the code generators instead (from what I can tell, your codes are generated programmatically rather than by humans).
The best way to tweak a generator is to ensure that the codes can't easily form words based on the general use of consonants and vowels in most European languages. Things get a bit dicey in Polish and others, but it usually works.
Generally, most codes that start with a vowel are followed by another vowel or a non-joining consonant (like 'q' without a 'u'). If the code starts with a consonant then the next character is the same consonant or one that has a low probability of being used. For example, if you start with 's' then adding 'g' is a good choice.
You could also use Wiktionary or other similar sources (like Linux dictionary files) to take a statistical approach. By extracting the probability of characters appearing next to each other, you should be able to generate codes that are very unlikely to form words in any language.
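For instance, a rough Python sketch of that statistical idea; the wordlist path and frequency threshold are assumptions you'd tune:

    from collections import Counter

    # Learn which two-letter sequences are common in real words.
    words = open("/usr/share/dict/words").read().lower().split()
    bigrams = Counter(w[i:i+2] for w in words for i in range(len(w) - 1))
    COMMON = {bg for bg, n in bigrams.items() if n > 1000}  # threshold is a guess

    def looks_wordlike(code):
        code = code.lower()
        return any(code[i:i+2] in COMMON for i in range(len(code) - 1))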
However, if I misread your question and you aren't generating the codes programmatically, you can ignore my response completely. :)
I have had the same thoughts while trying to generate 6-character codes for a project I am doing.
I decided to reduce the likelihood of obviously profane codes, so I removed the vowels that I found in as many "bad" words as I could think of from my initial base-36 generation code, leaving me with something more like a base-28 system that does not include a, e, i, o, u, 1, or 0. The one and zero were removed to reduce confusion between those characters and I, L, and O in some fonts.
So far I have not seen a profane code generated, and even the reduced alphabet still yields hundreds of millions of unique 6-character combinations.
I cannot vouch for other languages, and had not even considered them...
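A minimal Python sketch of that reduced-alphabet generator (dropping a, e, i, o, u plus 0 and 1 leaves 29 of the original 36 symbols):

    import secrets

    ALPHABET = "23456789BCDFGHJKLMNPQRSTVWXYZ"  # base 36 minus a,e,i,o,u,0,1

    def make_code(length=6):
        return "".join(secrets.choice(ALPHABET) for _ in range(length))

    print(make_code())  # e.g. 'K7TMQ2'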
Are there any open-source or commercial libraries out there that can detect mailing addresses in text, just like Apple's Mail app underlines addresses on the Mac/iPhone?
I've been doing a little online research, and the ideas seem to be to use Google, regexes, or a full-on NLP package such as Stanford's NLP, which are usually pretty massive. I doubt the iPhone has a 500 MB NLP package in there, or connects to Google every time you read an email, which makes me believe there should be an easier way. Too bad UIDataDetectors is not open source.
I know this question has been asked before, but there were no conclusive answers, so here's my try.
As for Python, you can try Pyap:
https://pypi.python.org/pypi/pyap
It currently supports US and Canadian addresses.
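Basic usage, following the package's documented example (the sample sentence is just illustrative):

    import pyap

    text = "We moved to 128 E Beaumont St, Madison, WI 53703 last year."
    addresses = pyap.parse(text, country="US")
    for address in addresses:
        print(address)            # the full matched address
        print(address.as_dict())  # its parts (street number, state, ...)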
Parsing addresses isn't an exact science. At my office we have been dealing with address parsing for years, and the problem is that there aren't any rules about what constitutes a valid address. We use the USPS address database for cleaning addresses, which is actually pretty fast and far more accurate than anything we were able to build on our own: it gets us 98% accuracy, whereas before we got about 90% cleaned addresses.
The bigger problem with address parsing tends to be that people don't input the address the same way. The same address might appear in any of the following forms.
128 E Beaumont St
128 East Beaumont Street
128 E Bmt St
128 Beaumont Street
128 Highway 88
The third one looks totally wrong, but people will type that sometimes. Sometimes a street is also a highway. There are a bunch of possibilities. Just try to catch 90% and accept that that's as good as it gets for address parsing.
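To illustrate, here is a toy Python normalizer that expands a few common USPS directional and suffix abbreviations so variant spellings compare equal; real cleaning against the USPS database does far more than this:

    DIRECTIONS = {"e": "east", "w": "west", "n": "north", "s": "south"}
    SUFFIXES = {"st": "street", "ave": "avenue", "rd": "road", "hwy": "highway"}

    def normalize(address):
        tokens = address.lower().replace(",", " ").split()
        return " ".join(DIRECTIONS.get(t, SUFFIXES.get(t, t)) for t in tokens)

    print(normalize("128 E Beaumont St"))         # 128 east beaumont street
    print(normalize("128 East Beaumont Street"))  # 128 east beaumont street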
Extractiv provides commercial NLP, powered by Language Computer Corporation, that can parse entities and relations in either uploaded documents or web crawls; the former service uses a REST API. I dropped this URL in, and it extracts 4 of the 5 addresses. Note: having them strung together like that makes them especially difficult.
Search for "address" in this JSON output:
http://rest.extractiv.com/extractiv/?url=https://stackoverflow.com/questions/5099684/detect-parse-mailing-addresses-in-text&output_format=json
One of them:
{
  "id": 11,
  "len": 17,
  "offset": 1557,
  "text": "128 E Beaumont St",
  "type": "ADDRESS"
},
(Note: if you use the HTML output, which is more for demos, it filters out non-sentence content, which is why I showed the JSON instead).
Disclaimer: I work at Extractiv.
Update:
Extractiv is no more.
You can actually get extremely high accuracy, as Drew mentioned, by extracting the addresses and then comparing them against USPS data. Getting a DVD from the USPS yearly will certainly work but doesn't account for addresses that change. For that, you would want a more up-to-date version. The USPS publishes its updated address data (in a proprietary format) monthly, so that would be a good source of authoritative addresses.
On top of that, using an address validation service (after you extract the address data) will standardize the addresses for you and then check them for deliverability and/or vacancy status. As Drew mentioned, the same address can be written in many different ways that still work. However, the USPS will always use the standardized format.
In order to do what you are looking for programmatically, you'll definitely want an API, although list processing services are also available.
SmartyStreets has a free address validation API called LiveAddress that will standardize, verify, and then validate any US postal address. In the interest of full disclosure, I'm the founder of SmartyStreets.
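As a hedged illustration of calling such a validation API from code, here is a sketch in Python; the endpoint, parameters, and response shape below are placeholders, not LiveAddress's actual interface, so check the provider's documentation:

    import requests

    # Hypothetical endpoint and parameters, shown only to illustrate the flow.
    resp = requests.get(
        "https://api.example.com/street-address",
        params={"street": "128 E Beaumont St", "city": "Madison",
                "state": "WI", "auth-token": "YOUR_TOKEN"},
    )
    resp.raise_for_status()
    for candidate in resp.json():
        print(candidate)  # standardized, deliverable form of the address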