Detecting a duplicate customer - duplicate-data

I have a bunch of customer data that is normalized into multiple tables. I want to decide on the best criteria for making a best guess that two customer records might be the same person. There needs to be a balance between minimizing the number of duplicates and minimizing false positives, which would interrupt users to ask about potential dupes.
I am looking at some combination of first/last name + phone number || email address.
The first question is, what is a good set of criteria for determining if a customer might be the same as another customer.
The second question is, for this specific application, I only want to detect duplicates for customers that have signed up within the last 2 months or so. Does this change the detection criteria at all?
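As a rough sketch of the rule described above (same name, plus a matching phone number or email address, limited to recent signups), something like the following could be a starting point. The `Customer` fields, the normalization rules, and the 60-day window are assumptions for illustration, not part of the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
import re


@dataclass
class Customer:
    # Hypothetical flattened view of the normalized tables.
    first_name: str
    last_name: str
    email: str
    phone: str
    signed_up: datetime


def norm_name(c):
    # Case-insensitive, whitespace-trimmed full name.
    return (c.first_name.strip().lower(), c.last_name.strip().lower())


def norm_phone(phone):
    # Keep digits only so "555-1212" and "(555) 1212" compare equal.
    return re.sub(r"\D", "", phone)


def norm_email(email):
    return email.strip().lower()


def might_be_same(a, b):
    """Same name AND (same phone OR same email). Empty values never match."""
    if norm_name(a) != norm_name(b):
        return False
    same_phone = bool(norm_phone(a.phone)) and norm_phone(a.phone) == norm_phone(b.phone)
    same_email = bool(norm_email(a.email)) and norm_email(a.email) == norm_email(b.email)
    return same_phone or same_email


def recent_duplicates(new, existing, now, window_days=60):
    """Existing customers who signed up inside the window and match `new`."""
    cutoff = now - timedelta(days=window_days)
    return [c for c in existing if c.signed_up >= cutoff and might_be_same(new, c)]
```

Restricting the comparison to the recent window keeps the candidate set small, which may justify a slightly looser match rule than you would use across the whole customer base.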

How would you go about asking a customer if they are the owner of a duplicate account?
"Hey Sam Jones, there is another Sam Jones that has an ip in your local area, his email is sam.jones#abc.com and your latest registration had an email of sam.jones#apple.com, are you the same guy/girl?"
If the above is even close to your scenario, then you would be leaking private information, i.e. the other Sam Jones's email address.
Typically you don't allow a customer to sign up with an email address that is already registered, and you also verify that the email address they do sign up with is valid. That way, if they sign up again with a mistyped email address, they can't validate it.

An important thing is to choose attributes that are unlikely to change. If you use something like telephone number or email address, you risk having duplicates any time someone changes ISPs or mobile phone providers.
If these customers are customers that have made purchases in the past, you can store a hash of their credit card number and a hash of their billing address. Whenever they make another purchase, hash their payment info and compare it to your database. (Notice I said to store a hash, NOT their actual payment info.)
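A minimal sketch of the hash-and-compare idea follows. It uses a keyed hash (HMAC-SHA-256) rather than a plain hash, which is my substitution: card numbers have so few possible values that an unkeyed hash could be brute-forced. The function names and the `secret_key` parameter are hypothetical, and the normalization is deliberately crude.

```python
import hashlib
import hmac


def fingerprint(value, secret_key):
    """Return a keyed SHA-256 digest of a normalized value."""
    # Strip everything but letters and digits so formatting differences
    # ("4111 1111 ..." vs "4111-1111-...") do not change the digest.
    normalized = "".join(ch for ch in value.lower() if ch.isalnum())
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def same_payer(stored_fingerprint, card_number, secret_key):
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(stored_fingerprint, fingerprint(card_number, secret_key))
```

The same `fingerprint` function could be applied to a billing address string, although real addresses would probably need more careful normalization than this.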

If this question is still of interest to you, please check this tool: https://sourceforge.net/projects/deduper/
I wrote this tool mainly for the purpose that you have mentioned in this question.

Related

Handle payment via bank transfer for Rails

I am building a Rails application for place booking. The app should be able to facilitate payment by bank transfer (not direct VISA/Mastercard payment). Basically, we let users know our bank account number. The user can then pay via iBanking or by going to an ATM or bank. Now, when we receive the payment, we need to know who the payment comes from and which booking it is for.
How are we supposed to know who sent it and which booking it is for, when there is no additional data in the transfer information other than the amount of money? I heard we can apply a unique cents identifier: for example, when the payment is $8, we make it $8.02 to link it to the user who sends it and the booking data.
Is that the best practice for linking the actual payment data and the booking data? If it is, is there any Ruby gem capable of generating the unique cents identifier? If not, is there a better approach?
Thank you for your assistance.
Bit vague, but a lot of companies that bill people, and allow the user to pay by bank transfer, require the user to put a specific reference number on the transaction, which ties the transaction back to that user's account.
It needs to be made obvious to the user (and it usually is) that if they fail to put in the right reference number then the payment won't be linked with them, and therefore won't show up as a credit on their account.
This doesn't feel like a particularly satisfactory system, as it puts the onus on the user to get it right or risk being charged extra for late payment, or have a hassle sorting it out. But, lots of successful companies seem to operate like this.
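The reference-number approach could be sketched roughly as below. This is an in-memory illustration only: `PaymentMatcher`, the 8-character reference format, and the reduced alphabet are assumptions, and a real application would persist the mapping alongside the booking record.

```python
import secrets

# Hypothetical alphabet: no 0/O or 1/I, to limit transcription mistakes.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def new_reference(length=8):
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


class PaymentMatcher:
    def __init__(self):
        self._by_reference = {}

    def issue(self, booking_id):
        """Create a unique reference the user must quote on the transfer."""
        ref = new_reference()
        while ref in self._by_reference:  # regenerate on the rare collision
            ref = new_reference()
        self._by_reference[ref] = booking_id
        return ref

    def match(self, transfer_reference):
        """Return the booking id for an incoming transfer's reference, or None."""
        cleaned = transfer_reference.strip().upper().replace(" ", "").replace("-", "")
        return self._by_reference.get(cleaned)
```

Transfers whose reference does not match anything (`match` returns `None`) would still need manual reconciliation, which is the hassle described above.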

How to share a single Twilio number between multiple clients?

I'm creating a twilio service with three actors:
The customer, a person who calls a company
The company, a company who forwards calls to the service-provider
The service-provider (that's us), an entity that services the customer on behalf of the company
Herein lies the catch: The service provider needs to be able to identify the company associated with the customer but it may only use a single phone number. We cannot use multiple phone numbers for cost reasons (the margins are that low). We cannot use the caller id because a single customer may be associated with multiple companies.
I am familiar with Twilio's ForwardedFrom field but as mentioned here it isn't always reliable. In fact, forwarding from my cell-phone carrier results in a null ForwardedFrom field.
How can we (reliably) identify the company who redirected a customer to us without using multiple phone numbers?
You can use the number + extension. http://www.twilio.com/docs/howto/ivrs-extensions
Perhaps you could build a sort of phone tree system, asking the caller the nature of their problem, which would be an indicator of the company their call is related to.
My guess is that you wouldn't want an outright "which company is your call related to?" question, because that would feel cheap to the customer. So, maybe you could formulate a question or series of questions that wouldn't be overtly asking which company they're calling about, but the answer(s) would clearly indicate on the service-provider end which company the call is about.
This could be further whittled down on the service-provider end by doing a company lookup based on the customer's calling number - if it matches a certain company (or set of companies), then that automatically limits the potential company they could be calling about.
Another possibility (if it fits in your use case) is some sort of we'll-call-you setup. Perhaps the customer could text/email requesting a call, and the information they'd provide in the text/email/online-form-submission would indicate the company they wish to speak about (again, you could use questions that aren't overtly "which company do you want a call from?").
Then again, if it's such a low-margin operation, maybe the companies are ok with a phone-tree style call-in number, where the customer needs to select a company they're calling about which is then indicated to the service-provider.
This doesn't seem to be possible at this time (2013). I will keep an eye out for new answers and will accept them if this becomes possible at a later time.
In general, I'd recommend separate numbers per customer but since you say that's not an option, here's another approach:
When the call comes into the Company, that individual leg gets a CallSid which is a unique identifier. When the call is forwarded to the Service Provider, that separate leg also gets a CallSid. Let's call them CallSidOne and CallSidTwo respectively.
If you then query based on CallSidTwo, you'll get back its instance properties as listed here:
http://www.twilio.com/docs/api/rest/call#instance-properties
The key property here is "parent_call_sid", which should be CallSidOne. Therefore, you can connect the two segments together. Then you can query on CallSidOne, which gives you the ability to track which customer called which company, and when.
Does that solve your problem?
~Twilio employee
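A hedged sketch of that lookup: `fetch_call` is a hypothetical callable that returns a call's instance properties as a dict (in practice it would wrap a request for the Call resource), and `company_by_number` is an assumed mapping from each company's number to its name. It only helps if the first leg is itself a call you can query and `parent_call_sid` is actually populated.

```python
def company_for_forwarded_call(call_sid, fetch_call, company_by_number):
    """Resolve the company from the parent leg of a forwarded call.

    Returns None when the call has no parent or the parent's "to"
    number is not a known company number.
    """
    parent_sid = fetch_call(call_sid).get("parent_call_sid")
    if not parent_sid:
        return None
    parent = fetch_call(parent_sid)
    # "to" on the first leg is the number the customer originally dialed.
    return company_by_number.get(parent.get("to"))
```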

How to ensure unique users registration

I wonder what is the best way to ensure unique users. I will send a common instruction to 100 people to visit my website. Once they come to my site, I'll need to allocate them to the North, South, East, and West regions, one after another. I also need to prevent one user from having many accounts. (The user may use another computer or their phone to access the site.)
What is the best way to do this in grails?
There is no way to be 100% sure that all users are unique, but you can get close if you gather and validate as many details as possible, such as:
email (you know), though it is easy to counterfeit
cell number (send a text message with a special code to confirm the number), but the user can also use a friend's number or buy a new one
ask for a scan of a personal ID, and/or
address verification (require a scan of a bill or other papers with the user's full name and address on it, or send a letter to this address with a special code to confirm)
Are you controlling who gets the initial instructions, like from a contact list or something? If so could include a "registration key" and only let a particular key register once, or only let those particular email addresses be used to register (once), or even create the users ahead of time and send them the instructions to login.
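The registration-key idea, combined with the round-robin region allocation from the question, might look something like this. It is shown in Python rather than Grails for brevity, and `Registrar`, its attributes, and the in-memory storage are all illustrative assumptions.

```python
import secrets
from itertools import cycle

REGIONS = ["North", "South", "East", "West"]


class Registrar:
    def __init__(self, invited_emails):
        # One single-use key per invited person, to be sent with the instructions.
        self.key_to_email = {secrets.token_urlsafe(16): e.lower() for e in invited_emails}
        self.used_keys = set()
        self._regions = cycle(REGIONS)
        self.region_of = {}

    def register(self, key):
        """Consume a key and return the assigned region; raise if invalid or reused."""
        if key not in self.key_to_email:
            raise ValueError("unknown registration key")
        if key in self.used_keys:
            raise ValueError("registration key already used")
        self.used_keys.add(key)
        email = self.key_to_email[key]
        # Regions are handed out in order of registration, one after another.
        self.region_of[email] = next(self._regions)
        return self.region_of[email]
```

This does not stop someone from forwarding their key to a friend, but it does cap the number of accounts at the number of keys issued.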

Fixing duplicate records in a rails app from an autocomplete form

I'm building a Rails 3.1 application that allows people to submit events. One of the fields for the event is a venue. On the create/edit form, the venue_name field has autocomplete functionality so it displays venues with a similar name, but the user is able to enter any name.
When the form is submitted, I'm using find_or_create_by_name when attaching the venue to the event model.
I'm doing this because it's not possible for us to maintain a complete list of venues and I don't want to prevent people from submitting an event because the venue isn't in the list.
The problem is that it's quite likely we'll get duplicates over time like "Venue Name" and "The Venue Name" or any number of other possibilities.
I was thinking that I probably just need to create an administrative tool that allows the admin to review recent venues and if he/she thinks they're duplicates to search/select a master record and have the duplicate record's association copied over to the master record and once successful to delete the duplicate record.
Is this a good approach? In terms of the data manipulation would it be best to handle this in a transaction? Would it be best to add this functionality in a sort of utility class - or directly in the Venue model?
Thanks for your time.
If I were going to put together a system like that, I'd probably try to find a unique identifier I could associate with each venue - perhaps an address or a phone number?
So, if I had "The Clubhouse" with a phone number 503-555-1212, and someone tried to input a new venue called "Clubhouse" with the phone number 503-555-1212, I might take them to an interstitial page where I ask them "Did you mean this location?"
Barring that, I might ask for a phone number or address first, then present a list of possible matches with the option to create a new venue.
Otherwise, you're introducing a lot of potential for error at the admin level, plus you run into a scalability problem. If your admin has to review 10 entries a month, maybe not so bad - but if your app takes off and that number goes to 1000, that becomes unmanageable fast!
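To make the "Did you mean this location?" check concrete, here is a small sketch of how candidate matches could be found before creating a new venue. The dict-based venue records and the normalization rules (digits-only phone numbers, dropping a leading "The") are assumptions chosen to fit the examples in this thread.

```python
import re


def normalize_phone(phone):
    # Digits only, so "503-555-1212" and "(503) 555-1212" compare equal.
    return re.sub(r"\D", "", phone)


def normalize_name(name):
    # Lowercase, strip punctuation, drop a leading "the", collapse spaces.
    name = re.sub(r"[^a-z0-9 ]", "", name.lower())
    name = re.sub(r"^the\s+", "", name.strip())
    return re.sub(r"\s+", " ", name).strip()


def possible_matches(new_name, new_phone, venues):
    """Return existing venues sharing the phone number or the normalized name."""
    phone = normalize_phone(new_phone)
    name = normalize_name(new_name)
    return [
        v for v in venues
        if (phone and normalize_phone(v["phone"]) == phone)
        or normalize_name(v["name"]) == name
    ]
```

If `possible_matches` returns anything, the form could show the interstitial page; otherwise it could go straight to creating the venue.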

Restricting multiple votes from the same person in a picture rating web application

I'm trying to write a web application in ASP.NET MVC that allows each user to vote for multiple pictures but does not allow them to vote multiple times for the same picture. Users are not authenticated. What should I save in the database or in cookies?
Store the votes in a database table with columns PictureId, UserId, Score, and add a composite unique constraint to the columns PictureId and UserId - this will ensure that there is only a single vote per user and picture.
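As an illustration of the composite unique constraint, here is a sketch using Python's built-in `sqlite3` module as a stand-in for whatever database the ASP.NET MVC application uses. The table and column names follow the answer; treating `user_id` as text (for example a cookie value) is an assumption.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE votes (
            picture_id INTEGER NOT NULL,
            user_id    TEXT    NOT NULL,
            score      INTEGER NOT NULL,
            UNIQUE (picture_id, user_id)
        )
        """
    )
    return conn


def cast_vote(conn, picture_id, user_id, score):
    """Insert a vote; return False if this user already voted on this picture."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO votes (picture_id, user_id, score) VALUES (?, ?, ?)",
                (picture_id, user_id, score),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a second vote for the same pair.
        return False
```

Letting the database enforce uniqueness avoids a race between checking for an existing vote and inserting a new one.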
With anonymous users, you have two options, neither of which is very good:
1) Track the user with a user id stored in a cookie. As long as the cookie persists, the user can't vote twice. However, they can delete or otherwise modify the cookie. They might have cookies turned off. They could have two different browsers open at the same time. Scripts for "cheating" (curl http://site/vote?score=5&pic_id=1) won't store a cookie anyway. Basically, you'll end up with people voting more than they should.
1.5 *
2) Track the user by IP address. This is essentially the opposite. Users can't vote twice, regardless of deleting cookies, switching browsers, etc. However, several people from the same household (using a DSL router) can only vote once combined. Many companies will similarly hide many users behind a single IP address. I think some ISPs do, too (AOL?). You'll end up with far fewer "votes" than legitimately should have been recorded.
So the question is do you want over or under votes? If you think cheating is likely, I'd go for #2. But if cheating is likely, that means there's an incentive. And if people realize their votes aren't counted (which they may not realize), they'll be unhappy.
After that, whether you store each vote as a row, or combine the votes into a single row (update pictures set num_votes = num_votes + 1, total_score = total_score + [submitted score]) is up to you.
1.5 The third option would be to record their vote and an email address, send them the email with a confirmation link and ask them to click it to record their vote. People can still cheat with fake email addresses, but it's not as likely as deleting a cookie.
Database records for unique, authenticated users, as Daniel Brückner suggests, have to be the way forward. Cookies are unreliable because, for example, they can be deleted or a user may use a different browser.
If your users are authenticated, then you can save UserIDs with Image votes.
If your users are anonymous, then systems tend to store their IP address with Image votes. It's not perfect and it's not foolproof, but it works in the majority of situations.
