Duplicate detection of customers - machine-learning

What algorithms currently exist for detecting duplicate accounts opened by the same customer?
This would be for customers who are trying to hide the fact that they opened two different accounts. For example, maybe they changed their name to a shorter version or used another email account in their second account.
What kind of algorithm is used for that? Not sure if something like MinHashing is a good idea here.
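As a rough illustration of the kind of comparison being asked about, here is a minimal Python sketch that scores two customer names. It is only a sketch under stated assumptions: the helper names (`normalize_name`, `char_shingles`, `jaccard`, `name_similarity`) are hypothetical, and the Jaccard similarity over character shingles is the quantity that MinHash approximates, computed exactly here because the inputs are tiny.

```python
import re
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Lowercase, drop everything except letters and spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z\s]", "", name.lower())
    return " ".join(cleaned.split())


def char_shingles(text: str, k: int = 3) -> set:
    """Return the set of overlapping k-character substrings of text."""
    if len(text) <= k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity; MinHash estimates this value for large sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def name_similarity(left: str, right: str) -> float:
    """Edit-based similarity ratio between two normalized names, in [0, 1]."""
    return SequenceMatcher(None, normalize_name(left), normalize_name(right)).ratio()


# Hypothetical example: a shortened first name on the second account.
score = name_similarity("Jonathan Smith", "Jon Smith")
overlap = jaccard(char_shingles("jonathan smith"), char_shingles("jon smith"))
```

A pair would typically be flagged for review only when several such signals agree, since a name score on its own says little about two unrelated people with similar names.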

Related

Detecting use of iOS's "Hide my Email" on website signup

Apple's latest changes which allow users to hide their IP, hide their email, etc. are creating problems for my web-based app (non-native) which relies upon these things to build a sense of who a person is.
In most situations, I can see why these are great "features" to have. In my use case, however, I have a voting platform that uses things like email address and IP to do a decent job of detecting duplicate or fraudulent votes (e.g., logins from other countries).
Now, before anyone says "These aren't foolproof ways of identifying a person" and derail my actual question: I know. I'm not looking for perfection, but these methodologies shed light on the 95%+ of people who might be trying to circumvent our voting system.
By putting the ability to circumvent these measures right in front of the user as a first-class feature, Apple shoots major holes in my existing strategy.
Is there a way to detect whether a user is utilizing these methods, so that I could prompt them to sign up without using these features?
I think it would be easily justifiable to explain that, due to the nature of the application being a voting website, the ability to create multiple aliases would directly undermine the purpose of the site.
Perhaps there is an email address pattern to look for (I know in my test cases, I was getting email addresses @icloud.com).
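A domain check along these lines could be sketched as follows in Python. The domain lists are assumptions rather than an official list: `privaterelay.appleid.com` is the relay domain used by Sign in with Apple, while `icloud.com` matches the addresses seen in the tests above but is also used by ordinary iCloud mailboxes, so it can only be treated as ambiguous. The function name `classify_email` is hypothetical.

```python
# Assumption: domains associated with Apple's email-hiding features.
RELAY_DOMAINS = {"privaterelay.appleid.com"}
# Assumption: shared with regular iCloud accounts, so not conclusive.
AMBIGUOUS_DOMAINS = {"icloud.com"}


def classify_email(address: str) -> str:
    """Return 'relay', 'ambiguous', or 'other' based on the domain alone."""
    _, _, domain = address.rpartition("@")
    domain = domain.strip().lower()
    if domain in RELAY_DOMAINS:
        return "relay"
    if domain in AMBIGUOUS_DOMAINS:
        return "ambiguous"
    return "other"
```

Because the domain alone cannot separate a generated alias from a real iCloud mailbox, a check like this is probably better used to trigger an extra verification step than to reject a signup outright.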
If there is no reasonable way, I need to rethink the entire process of identifying individuals and preventing aliases (phone / text confirmation, etc).

How do I achieve single sign on and data sharing across 2 rails apps?

I am looking to set up 2 rails apps (with the same tld) which have single sign on and share some user data. If I have railsapp.com I will have the second app set up as otherapp.railsapp.com or railsapp.com/otherapp. I will most likely have railsapp.com handle registration/login etc (open to suggestion if this is not the best solution).
So let's say I sign up, upload an avatar, and start accumulating user points on the main app. I can then browse to the other app, and my profile there has the correct avatar and points total. Is there an easy way to achieve this? Do the available SSO solutions create the user in the second app with the same user ID? If not, how are they tied together (i.e., how can I query the other app for the information I would like to share across the two: user points and avatar)? I was initially looking at sharing a database, or at least the user table, between the two apps, but I can't help thinking there must be an easier solution.
I think the simplest solution is if you set the cookie on the .railsapp.com domain, then it should be sent when you do requests to otherapp.railsapp.com or any other subdomain (just stressing that as it might be a security concern). Remember to mark the cookie as secure!
An extra bit you might need to make this work is to store authentication tokens in a database so they can be shared between the two apps.
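To make the shared-token idea concrete, here is a minimal, framework-agnostic sketch in Python using SQLite. It is not how Devise or any Rails library does it; the table name `auth_tokens` and the functions `create_schema`, `issue_token`, and `user_for_token` are hypothetical. The assumption is that the main app issues a token after login and sets it in the shared cookie, and the second app looks it up in the same database.

```python
import hashlib
import secrets
import sqlite3
from typing import Optional


def create_schema(db: sqlite3.Connection) -> None:
    """Create the hypothetical shared token table if it does not exist."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS auth_tokens ("
        "token_hash TEXT PRIMARY KEY, user_id INTEGER NOT NULL)"
    )


def issue_token(db: sqlite3.Connection, user_id: int) -> str:
    """Generate a random token, store only its SHA-256 hash, return the raw token."""
    token = secrets.token_urlsafe(32)
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    db.execute("INSERT INTO auth_tokens VALUES (?, ?)", (digest, user_id))
    db.commit()
    return token


def user_for_token(db: sqlite3.Connection, token: str) -> Optional[int]:
    """Return the user ID for a token from the cookie, or None if unknown."""
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    row = db.execute(
        "SELECT user_id FROM auth_tokens WHERE token_hash = ?", (digest,)
    ).fetchone()
    return row[0] if row else None
```

Storing only the hash means a leaked table does not directly expose usable session tokens; expiry and revocation are left out of this sketch.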
Disclaimer: I don't have much experience with Rails anymore, so I'm not sure if some of the frameworks like Devise can do something like this out of the box.
Edit: I got curious, and Google had the answer: http://codetheory.in/rails-devise-omniauth-sso/

Detecting a duplicate customer

I have a bunch of customer data that is normalized into multiple tables. I want to decide on the best criteria for making a best guess that two customers might be the same. There needs to be a balance between minimizing the number of duplicates and minimizing false positives, which would interrupt users to ask about potential dupes.
I am looking at some combination of first/last name + phone number || email address.
The first question is, what is a good set of criteria for determining if a customer might be the same as another customer.
The second question is, for this specific application, I only want to detect duplicates for customers that have signed up within the last 2 months or so. Does this change the detection criteria at all?
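The rule described above could be sketched like this in Python. It reads "first/last name + phone number || email address" as "same name AND (same phone OR same email)", which is an assumption about the intended precedence. The `Customer` fields, the helper names, and the 60-day window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Customer:
    first_name: str
    last_name: str
    email: str
    phone: str
    signed_up: datetime


def digits_only(phone: str) -> str:
    """Strip formatting so '(555) 1234' and '555-1234' compare equal."""
    return "".join(ch for ch in phone if ch.isdigit())


def is_possible_duplicate(a: Customer, b: Customer) -> bool:
    """Same name AND (same phone OR same email), ignoring case and spacing."""
    same_name = (
        a.first_name.strip().lower() == b.first_name.strip().lower()
        and a.last_name.strip().lower() == b.last_name.strip().lower()
    )
    phone_a, phone_b = digits_only(a.phone), digits_only(b.phone)
    same_phone = bool(phone_a) and phone_a == phone_b
    same_email = a.email.strip().lower() == b.email.strip().lower()
    return same_name and (same_phone or same_email)


def recent(customers, now: datetime, window_days: int = 60):
    """Keep only customers who signed up within the last window_days days."""
    cutoff = now - timedelta(days=window_days)
    return [c for c in customers if c.signed_up >= cutoff]
```

Restricting the comparison to recent signups mainly shrinks the number of pairs to check; the matching rule itself does not need to change.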
How would you go about asking a customer if they are the owner of a duplicate account?
"Hey Sam Jones, there is another Sam Jones that has an IP in your local area, his email is sam.jones@abc.com and your latest registration had an email of sam.jones@apple.com, are you the same guy/girl?"
If the above is even close to your scenario, then you would be leaking private information, i.e., the other Sam Jones's email address.
Typically you don't allow a customer to sign up with an email address that is already in use, and you verify that the email address they do sign up with is valid. That way, if they sign up again with a mistyped email, they can't validate it.
An important thing is to choose attributes that are unlikely to change. If you use something like telephone number or email address, you risk having duplicates any time someone changes ISPs or mobile phone providers.
If these customers have made purchases in the past, you can store a hash of their credit card number and a hash of their billing address. Whenever they make another purchase, hash their payment info and compare it to your database. (Notice I said to store a hash, NOT their actual payment info.)
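A minimal Python sketch of that comparison is below. Note one deliberate swap: instead of a plain hash it uses a keyed HMAC-SHA256, on the assumption that card numbers have too little entropy for an unkeyed hash to resist guessing. The function names and the normalization rules are hypothetical, and this is not a statement about what payment regulations allow.

```python
import hashlib
import hmac


def fingerprint(secret_key: bytes, value: str) -> str:
    """Keyed HMAC-SHA256 of a normalized value, returned as a hex string."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def card_fingerprint(secret_key: bytes, card_number: str) -> str:
    """Fingerprint a card number after stripping spaces and dashes."""
    digits = "".join(ch for ch in card_number if ch.isdigit())
    return fingerprint(secret_key, digits)


def address_fingerprint(secret_key: bytes, billing_address: str) -> str:
    """Fingerprint a billing address after lowercasing and collapsing whitespace."""
    normalized = " ".join(billing_address.lower().split())
    return fingerprint(secret_key, normalized)
```

Two purchases would then be linked when their stored fingerprints are equal, without the raw payment details ever being kept.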
If this question is still of interest to you, please check out this tool: https://sourceforge.net/projects/deduper/
I wrote it mainly for the purpose you mention in this question.

How does Path app know my phone number in the registration process

I started using Path, and noticed that in the registration process, they identified both my phone number and my email.
As far as I know, there is no way to get those values programmatically (without being rejected by Apple), so how does Path do it?
Moving my comments into an answer :)
As I've stated above, this is a duplicate of "How does Square's CardCase app automatically populate the user's details from the address book?"
Because Path asks for the first and last name beforehand, it's easy to search for the contact in the address book. Of course, one has to handle the cases where a) no contact or b) multiple contacts are found. In both of these cases I'd probably fall back to standard input fields, because you need those for the "no contact found" case anyway.
I don't know how common it is to have a contact under one's own name, but given that Path and other apps do it the same way, I suppose it's worth taking the risk :) AFAIK Mac OS X automatically creates a contact with my name in the Address Book, but I really can't recall whether iOS has the same behavior.

Handling user abuse in rails

I've been working on a web app that could be prone to user abuse, especially spam comments/accounts. I know that reCAPTCHA will take care of bots as far as fake users are concerned, but it won't do anything about users who create an account and somehow put their spam comments on autopilot (like I've seen on Twitter countless times).
The solution I've thought up is to let any user flag another user, and then have a list of flagged users (a boolean attribute) come up on a users index action accessible only to the admin. Flagged users then become candidates for banning (another boolean attribute) or unflagging. Banned users will still be able to access the site but will have greatly reduced privileges. For certain reasons, I don't want to delete users entirely.
However, when I thought of it, I realized that going through a list of flagged users to decide which ones should be banned or unflagged could be potentially very time consuming for an admin. Short of hiring someone to do the unflagging/banning of users, is there a more automated and elegant way to go about this?
I would create a table named abuses, containing both the reported user and the one that filed the report. Instead of the flagged boolean field, I suggest having a counter cache column such as "abuse_count". When this column reaches a predefined value, you could automatically "ban" the users.
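The counter-and-threshold idea can be sketched in a language-neutral way; the Python below is an in-memory stand-in for the abuses table, not Rails code. The class name `AbuseTracker`, the threshold of 5, and the choice to count each reporter only once per reported user are all assumptions added for illustration.

```python
BAN_THRESHOLD = 5  # hypothetical predefined value


class AbuseTracker:
    """In-memory stand-in for an abuses table plus an abuse_count column."""

    def __init__(self, threshold: int = BAN_THRESHOLD):
        self.threshold = threshold
        self.reports = {}    # reported user ID -> set of reporter IDs
        self.banned = set()  # user IDs that reached the threshold

    def report(self, reporter_id: int, reported_id: int) -> bool:
        """Record a report; return True if the reported user is now banned."""
        if reporter_id == reported_id:
            return False  # ignore self-reports
        reporters = self.reports.setdefault(reported_id, set())
        reporters.add(reporter_id)  # assumption: one report per reporter counts
        if len(reporters) >= self.threshold:
            self.banned.add(reported_id)
        return reported_id in self.banned

    def abuse_count(self, user_id: int) -> int:
        """Number of distinct users who have reported user_id."""
        return len(self.reports.get(user_id, set()))
```

Counting distinct reporters keeps a single user from banning someone by filing the same report repeatedly, which a raw counter would allow.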
Before "Web 2.0", web sites were moderated by administrators. Now, the goal is to get communities to moderate themselves. StackOverflow itself is a fantastic case study. The reputation system enables users to take on more "administrative" tasks as they prove themselves trustworthy. If you're allowing users to flag each other, you're already on this path. As for the details of the system (who can flag, unflag, and ban), I'd say you should look at various successful online communities (like StackOverflow) to see how they work, and how successful they are. In the end it will probably take some trial and error, since all communities differ.
If you want to write some code, you might create a script that looks for usage patterns typical of spammers (eg, same comment posted on multiple pages), though I think the goal should be to grow a community that does this for you. This may be more about planning than programming.
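One such usage pattern, the same comment posted on multiple pages, could be detected with a sketch like the following. It is written in Python for illustration; the tuple layout `(user_id, page_id, text)`, the function name, and the default of three pages are hypothetical.

```python
from collections import defaultdict


def find_repeated_comments(comments, min_pages: int = 3):
    """Find (user, text) pairs that appear on at least min_pages distinct pages.

    comments is an iterable of (user_id, page_id, text) tuples.
    Returns a dict mapping (user_id, normalized_text) to a sorted list of pages.
    """
    pages_by_key = defaultdict(set)
    for user_id, page_id, text in comments:
        # Normalize case and whitespace so trivial variations still match.
        normalized = " ".join(text.lower().split())
        pages_by_key[(user_id, normalized)].add(page_id)
    return {
        key: sorted(pages)
        for key, pages in pages_by_key.items()
        if len(pages) >= min_pages
    }
```

The result could feed the flagged-users list for human review instead of banning automatically, since legitimate users sometimes repeat short comments too.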
Some sophisticated spammers are happy to spend their time breaking your CAPTCHA if they feel that the reward is high enough. You should also consider a spam-filtering service such as Akismet, for which there's a great Rails plugin (https://github.com/joshfrench/rakismet).
There are other alternatives such as Defensio (https://github.com/thewebfellas/defensio-ruby), as well as a gem I once found that worked pretty well at detecting common blog spam, but I can't for the life of me find it any more.
