I have a magnetic card reader and I'm trying to parse the information in my ColdFusion site to send off to PayPal. However, I can't seem to get the 3 digit CV code on the back.
Here's the format I get:
%B4444555566667777^DOE/JOHN G^10051010101010106000000?;4444555566667777=10051010101010106000000?
So, from this I can get:
Name: John
Middle Name: G
Last Name: Doe
Card Number: 4444555566667777
Expiration: 05/10
Where can I get that three digit number that's on the back? Or do I need it? Is it a different number when you swipe the card?
Any insight would be appreciated. Thank you in advance!
The CV/CVV code on the back is not encoded in any way on the magnetic strip. It is printed on the card just to verify that the person using it has the actual physical card.
Since a magnetic strip is easily created if someone hacks a card database and gets the card numbers, the CVV code was added as an extra, non-electronic security measure. Since the CVV is not allowed to be stored, it will never be in the database.
The CVV code is not included in the magnetic data - it cannot be read by the reading device. If you need the CVV number, include a place for your users to enter it.
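For reference, the fields that can be read from the swipe in the question can be pulled out as in the sketch below. It is written in Python rather than ColdFusion, and the function name `parse_track1` is made up for illustration. It assumes the track 1 layout seen in the sample: `%B`, card number, `^`, `LAST/FIRST MIDDLE`, `^`, then expiration as YYMM followed by other data.

```python
def parse_track1(swipe):
    """Parse the track 1 part of a raw swipe (hypothetical helper)."""
    # Track 1 ends at the first "?"; track 2 (starting with ";") follows it.
    track1 = swipe.split("?")[0]
    if not track1.startswith("%B"):
        raise ValueError("not a format B track 1 string")
    card_number, name, rest = track1[2:].split("^")
    # The name is stored as LAST/FIRST MIDDLE.
    last, _, given = name.partition("/")
    first, _, middle = given.strip().partition(" ")
    return {
        "first_name": first,
        "middle_name": middle,
        "last_name": last,
        "card_number": card_number,
        # The first four characters after the name are the expiration as YYMM.
        "expiration": rest[2:4] + "/" + rest[0:2],
    }


sample = ("%B4444555566667777^DOE/JOHN G^10051010101010106000000?"
          ";4444555566667777=10051010101010106000000?")
card = parse_track1(sample)
print(card)
```

Nothing in the parsed result corresponds to the printed CVV, so it still has to be typed in by the user.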
I'm trying to retrieve contacts' phone numbers to be able to send SMS on behalf of a user. When the number includes the country code (i.e. starts with + or 00), I'm fine. However, when this is not the case, I'm trying to guess what the country code should be.
// contact is a CNContact with at least one phone number
contact.phoneNumbers[0].value.value(forKey: "countryCode") as? String
returns a country code like us or fr (even if it's not recommended to do so), but I've found it to be inaccurate at times. My guess is that Apple uses the user's locale. It even misclassifies numbers with an explicit country code. For instance, a number 00 54 ... is classified as us while it's from Argentina.
I can also use the user's current locale (NSLocale.currentLocale().objectForKey(NSLocaleCountryCode) as? String) and use that to fill in missing country codes. But it will obviously misclassify some numbers.
Is there a better, less error-prone way?
We want to identify the address fields in a document. To do so, we ran the document through OCR with Tesseract. From the Tesseract output, we want to check whether a string contains an address field or not. What is the right strategy to solve this problem?
It's not possible to solve this problem with regex because address fields differ across documents and countries.
We tried NLTK for classifying the words, but it does not work perfectly for address fields.
Required output:
I am staying at 234 23 Philadelphia - Contains address field <234 23 Philadelphia>
I am looking for a place to stay - Does not contain address
Please provide your suggestions for solving this problem.
As in many ML problems, there are multiple possible solutions, and the important part (and the one that commonly has the greater impact) is not which algorithm or model you use, but feature engineering, data preprocessing, standardization, and things like that. The first solution that comes to my mind (and it's just an idea; I would test it and see how it performs) is:
Get your training set examples and list the "N" most commonly used words across all examples (that's your vocabulary). Every word in this list is represented by a number (its list index).
Transform your training examples: read every training example and replace every word with its number in the vocabulary.
Finally, for every training example, create a feature vector of the same size as the vocabulary. For every word in the vocabulary, the feature vector holds 0 (the word doesn't exist in your example) or 1 (it exists), or the count of how many times the word appears (again, this is feature engineering).
Train multiple classifiers, varying algorithms, parameters, training set sizes, etc., and use cross-validation to choose your best model.
And from there, follow the standard ML workflow...
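The vocabulary and feature vector steps above can be sketched as follows. This is a minimal illustration, not a tuned pipeline: it assumes whitespace tokenization and lowercasing, and the function names `build_vocabulary` and `to_feature_vector` are made up for this example.

```python
from collections import Counter


def build_vocabulary(examples, n):
    """Map each of the n most common words to its list index."""
    counts = Counter(word for text in examples for word in text.lower().split())
    return {word: index for index, (word, _) in enumerate(counts.most_common(n))}


def to_feature_vector(text, vocabulary, binary=True):
    """Return a vector of vocabulary size holding 0/1 flags or word counts."""
    vector = [0] * len(vocabulary)
    for word in text.lower().split():
        index = vocabulary.get(word)
        if index is not None:
            vector[index] = 1 if binary else vector[index] + 1
    return vector


examples = [
    "I am staying at 234 23 Philadelphia",
    "I am looking for a place to stay",
]
vocabulary = build_vocabulary(examples, 5)
features = [to_feature_vector(text, vocabulary) for text in examples]
print(vocabulary)
print(features)
```

The resulting vectors, together with the yes/no labels, are what the classifiers would be trained on.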
If you are interested in just checking YES or NO and not in extracting the complete address, one simple solution is NER.
You can check whether the text contains a location or not.
For example:
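import nltk

# Assumes the NLTK data for the tokenizer, POS tagger and NE chunker
# has already been downloaded with nltk.download()
def check_location(text):
    for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(text))):
        # Only named-entity subtrees have a label; plain tokens are tuples
        if hasattr(chunk, "label"):
            if chunk.label() == "GPE" or chunk.label() == "GSP":
                return "True"
    return "False"

text = "I am staying at 234 23 Philadelphia."
print(text + " - " + check_location(text))
text = "I am looking for a place to stay."
print(text + " - " + check_location(text))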
Output:
# I am staying at 234 23 Philadelphia. - True
# I am looking for a place to stay. - False
If you want to extract the complete address as well, you will need to train your own model.
You can check: NER with NLTK, CRF++.
You're right. Using regex to find an address in a string is messy.
There are APIs that will attempt to extract addresses for you. These APIs are not guaranteed to extract addresses from every string, but they will do their best. One example of a street address extraction API is from SmartyStreets. Documentation here and demo here.
Something to consider is that even your example (I am staying at 234 23 Philadelphia) doesn't contain a full address. It's missing a state or ZIP code field. This makes it very difficult to programmatically determine whether there is an address. Once a state or ZIP code is added to that sample string (I am staying at 234 23 Philadelphia PA), it becomes much easier to programmatically determine whether the string contains an address.
Disclaimer: I work for SmartyStreets
A better method for this task could be as follows:
Train your own custom NER model (extending SpaCy's pre-trained model or building your own CRF++ / CRF-biLSTM model, if you have annotated data), or use a pre-trained model like SpaCy's large model or geopandas, etc.
Define a weighted score mechanism based on your problem statement.
For example, let's assume every address has 3 important components: an address, a telephone number and an email ID.
Text that has all three of them would get a score of 33.33% + 33.33% + 33.33% = 100%.
To identify whether it's an address field, you may take into account the percentage of SpaCy's location tags (GPE, FAC, LOC, etc.) out of the total tokens in the text, which gives a good estimate of how many location tags are present. Then run a regex for postal codes and match the found city names against the 3-4 words just before the found postal code. If there's an overlap, you have correctly identified a postal code and hence an address field (got your 33.33% score!).
For telephone numbers, certain checks and a regex could do it, but an important criterion is that these phone checks are performed only if an address field was located in the text above.
For emails/web addresses, again you could perform nominal regex checks, and finally add all 3 scores into a cumulative value.
An ideal address would get a score of 100, while one with a missing field would yield 66%, etc. The rest of the text would get a score of 0.
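The scoring idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the NER step is not implemented here and is replaced by a hypothetical `has_location` flag that you would compute from your own model, and the two regular expressions are deliberately loose placeholders rather than validated patterns.

```python
import re

# Loose placeholder patterns, assumed for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def address_score(text, has_location):
    """Score a text from 0 to 100 using three equally weighted components.

    has_location is a hypothetical input standing in for the NER and
    postal code check.
    """
    found = 0
    if has_location:
        found += 1
        # Phone checks run only when an address was located in the text.
        if PHONE_RE.search(text):
            found += 1
    if EMAIL_RE.search(text):
        found += 1
    return round(100 * found / 3, 2)


full = "234 Main St Philadelphia PA 19104, call 215 555 1212 or mail a@b.com"
print(address_score(full, has_location=True))
print(address_score("I am looking for a place to stay", has_location=False))
```

The equal weights follow the 33.33% split in the example; they could be adjusted to whatever matters most for your documents.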
Hope it helped! :)
Why do you say regular expressions won't work?
Basically, define all the different forms of address you might encounter in the form of regular expressions. Then, just match the expressions.
I will receive an 850 purchase order. In return, I need to generate and send a 997 response, which includes the ISA/GS number. Where and with whom do I register for this ISA ID?
Thanks in advance
EDI systems are typically limited in scope to be between a few or even just 2 different organizations. These organizations need to decide beforehand on how much of the full EDI specification they're going to use, and how they're going to specify IDs. See here.
Also, see here. From this it looks like DUNS numbers or variants on them are common choices for IDs.
So your organization and the others need to just figure out if you're going to use DUNS number or ad-hoc made up numbers or what.
Your 850 will have an ISA (interchange) and GS (group) identifier where you will be designated as the receiver. When you generate the 997, the IDs will be reversed so that you are the sender of the acknowledgement.
Back in the day, it was important to uniquely identify yourself. X12 handles this via a qualifier/ID pair. Let's say you want to use your phone number. Your ID would be 12 (qualifier) and then 5555551212 (your ID / phone number). You could make up something arbitrary like ZZ (qualifier: mutually defined) and ACMEWIDGETSCO. Again, it should be something unique and not already found on a VAN. A clash is probably less likely these days than it was 10 years ago, when everyone was predominantly using VANs.
Look at the below example. The IDs in this example are made up, but could be DUNS, HIN, Industry identifier, phone number, mutually defined, etc. Just for frame of reference, I used SENDER and RECEIVER.
ISA*00*          *00*          *ZZ*SENDER         *ZZ*RECEIVER       *150622*2131*U*00401*000000006*0*T*>~
GS*PO*SENDER*RECEIVER*20150622*2131*4*X*004010~
In other words, you don't need to register it with anyone. You just need to make sure it is unique on the networks you are trading on; that's really the important part. If you're using direct connections (AS2, FTP) to your partners, it won't matter as much, but the best practice is to give your company an ID that is somewhat unique (DUNS, phone number, arbitrary name). If you don't understand EDI, download EDI Notepad from Liaison; that should give you a better picture of how the data is described.
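The reversal of sender and receiver IDs for the 997 can be sketched as below. This is a simplified illustration: the function name `build_ack_envelope` is made up, `*` and `~` are assumed as element separator and segment terminator, and the date, time and control numbers are left unchanged even though a real acknowledgement would carry its own.

```python
def build_ack_envelope(isa, gs):
    """Swap sender and receiver in ISA/GS segments (hypothetical helper)."""
    isa_fields = isa.rstrip("~").split("*")
    gs_fields = gs.rstrip("~").split("*")
    # ISA05/ISA06 hold the sender qualifier and ID, ISA07/ISA08 the receiver's.
    isa_fields[5:7], isa_fields[7:9] = isa_fields[7:9], isa_fields[5:7]
    # GS01 is the functional identifier: FA marks a functional acknowledgement.
    gs_fields[1] = "FA"
    # GS02 is the application sender, GS03 the application receiver.
    gs_fields[2], gs_fields[3] = gs_fields[3], gs_fields[2]
    return "*".join(isa_fields) + "~", "*".join(gs_fields) + "~"


isa_850 = ("ISA*00*          *00*          *ZZ*SENDER         "
           "*ZZ*RECEIVER       *150622*2131*U*00401*000000006*0*T*>~")
gs_850 = "GS*PO*SENDER*RECEIVER*20150622*2131*4*X*004010~"
ack_isa, ack_gs = build_ack_envelope(isa_850, gs_850)
print(ack_isa)
print(ack_gs)
```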
I am preparing a dataset for my academic interests. The original dataset contains sensitive information from transactions, such as credit card number, customer email, client IP, origin country, etc. I have to obfuscate this sensitive information before it leaves my origin data source, and then store it for my analysis algorithms. Some of the fields in the data are categorical and would not be difficult to obfuscate. The problem lies with the non-categorical data fields: how should I best obfuscate them so that the underlying statistical characteristics of my data stay intact, but it is impossible (or at least mathematically hard) to revert to the original data?
EDIT: I am using Java as front-end to prepare the data. The prepared data would then be handled by Python for machine learning.
EDIT 2: To explain my scenario, as a followup from the comments. I have data fields like:
'CustomerEmail', 'OriginCountry', 'PaymentCurrency', 'CustomerContactEmail',
'CustomerIp', 'AccountHolderName', 'PaymentAmount', 'Network',
'AccountHolderName', 'CustomerAccountNumber', 'AccountExpiryMonth',
'AccountExpiryYear'
I have to obfuscate the data present in each of these fields (data samples). I plan to treat these fields as features (with the obfuscated data) and train my models against a binary class label (which I have for my training and test samples).
There is no general way to obfuscate non-categorical data, as any processing leads to a loss of information. The only thing you can do is list which type of information is the most important and design a transformation that preserves it. For example, if your data is Lat/Lng geo position tags, you could apply any kind of distance-preserving transformation, such as translations, rotations, etc. If that is not good enough, you can embed your data in a lower-dimensional space while preserving the pairwise distances (there are many such methods). In general, each type of non-categorical data requires different processing, and each destroys information. It is up to you to come up with the list of important properties and to find transformations that preserve them.
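As a small illustration of a distance-preserving transformation, the sketch below rotates and translates 2D points and checks that a pairwise distance is unchanged. It is written in Python since the prepared data is handled there. It treats the coordinates as points on a plane, which is a simplifying assumption for Lat/Lng data, and the angle and offsets are arbitrary example values that would have to be kept secret.

```python
import math


def rotate_translate(points, angle_rad, dx, dy):
    """Rotate points around the origin, then shift them by (dx, dy)."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    return [
        (x * cos_a - y * sin_a + dx, x * sin_a + y * cos_a + dy)
        for x, y in points
    ]


def distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


points = [(1.0, 2.0), (4.0, 6.0), (-3.0, 0.5)]
hidden = rotate_translate(points, angle_rad=1.234, dx=100.0, dy=-50.0)
print(distance(points[0], points[1]), distance(hidden[0], hidden[1]))
```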
I agree with @lejlot that there is no silver bullet for your problem. However, I believe this answer can get you started thinking about how to handle at least the numerical fields in your data set.
For the numerical fields, you can make use of the Java Random class and map a given number to another obfuscated value. The trick here is to make sure that you map the same numbers to the same new obfuscated value. As an example, consider your credit card data, and let's assume that each card number is 16 digits. You can load your credit card data into a Map and iterate over it, creating a new proxy for each number:
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// 16-digit card numbers do not fit in an Integer, so use Long
Map<Long, Long> ccData = new HashMap<Long, Long>();
// load your credit card data into the Map
// iterate over the Map and generate a pseudo-random proxy for each CC number
for (Map.Entry<Long, Long> entry : ccData.entrySet()) {
    // seeding with the card number maps equal numbers to equal proxies
    Random rand = new Random(entry.getKey());
    // reduce to a non-negative value of at most 16 digits
    long newNumber = Math.floorMod(rand.nextLong(), 10_000_000_000_000_000L);
    entry.setValue(newNumber);
}
After this, any time you need to use a credit card number, you would access it via ccData.get(num) to get the obfuscated value.
You can follow a similar plan for the IP addresses.
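For string fields such as IP addresses, the same "equal inputs map to equal outputs" property can also be obtained with a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library, which is a different technique from the seeded `Random` mapping above. The function name `pseudonymize` and the key value are assumptions for illustration, and the key would have to stay at the origin data source.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Map a string to a stable hex token using HMAC-SHA256."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Example key, assumed for illustration; use a randomly generated secret.
secret_key = b"replace-with-a-random-secret"
token_a = pseudonymize("192.168.0.1", secret_key)
token_b = pseudonymize("192.168.0.1", secret_key)
token_c = pseudonymize("192.168.0.2", secret_key)
print(token_a)
print(token_a == token_b, token_a == token_c)
```

The tokens keep equality between records but no numeric structure, so this suits fields used as identifiers rather than as quantities.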
In my BB app, I want to detect phone numbers and be able to call them. To do that, I have used ActiveRichTextField instead of LabelField. This field detects phone numbers fine in general, but it fails to detect some numbers, especially Australian ones. It detects Indian phone numbers perfectly well, but not Australian ones and some others. What I have done is posted below:
ActiveRichTextField descField;
if (isFocaseble) {
    descField = new ActiveRichTextField(replacedString.trim(),
            ActiveRichTextField.FIELD_LEFT |
            ActiveRichTextField.USE_ALL_WIDTH |
            ActiveRichTextField.FOCUSABLE);
}
I check here whether the field should be focusable, because only the numbers need to gain focus; there is other data that does not need focus. replacedString is the data coming from the web service. Below are screenshots that should give a clear idea of my problem.
(1) Below are Numbers of Australia:
(2) Numbers of Australia
(3) Numbers of India
Does anybody have any idea about this? Why am I not able to detect all of the numbers, and what am I missing?
Any sort of help would be appreciated.