Associating source and search keywords with account creation - ruby-on-rails

As a part of the signup process for my online application, I'm thinking of tracking the source and/or search keywords used to get to my site. This would allow me to see what advertising is working and from where with a somewhat finer grain than Google Analytics would.
I assume I could set some kind of cookie with this information when people get to my site, but I'm not sure how I would go about getting it. Is it even possible?
I'm using Rails, but a language-independent solution (or even just pointers to where to find this information) would be appreciated!

Your best bet, IMO, would be to use JavaScript to look for a cookie named "origReferrer" or something like that; if that cookie doesn't exist, create one (with an expiry of ~24 hours) and fill it with the current referrer.
That way you'll have preserved the original referrer all the way from your user's first visit, and once your users have completed whatever steps you want them to complete (i.e. account creation) you can read that cookie back on the server and do whatever parsing/analyzing you want.
Andy Brice explains the technique in his blog post Cookie tracking for profit and pleasure.
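The cookie can just as easily be set and read on the Rails side; here is a rough sketch of both steps, assuming a UsersController signup action and a string column for the referrer (all names here are illustrative, not from the blog post):

# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  before_action :remember_original_referrer

  private

  # Mirror of the client-side idea: keep the first referrer we see for ~24 hours.
  def remember_original_referrer
    return if cookies[:origReferrer].present?
    cookies[:origReferrer] = { value: request.referer.to_s, expires: 24.hours.from_now }
  end
end

# app/controllers/users_controller.rb - read the cookie back when the account is created
class UsersController < ApplicationController
  def create
    @user = User.new(params.require(:user).permit(:email, :password))
    @user.original_referrer = cookies[:origReferrer] # assumes a string column on users
    @user.save
    # ...
  end
end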

Requesting input on conceptual ideas for disguising browser history

I am working with a Domestic Violence support organisation to build a website and have been asked to provide a "Quick Exit" function.
The purpose is to enable the user to exit the site quickly without closing the browser. I have seen such buttons on similar sites and the normal scenario is that they simply cause a Google search page to be shown. (easy but doesn't hide history)
I am looking for ideas to improve on this function to hide/disguise the history stored in the browser as this is currently a fairly significant flaw with the Quick Exit buttons I've seen to date.
I had a concept but I am looking for input on either fleshing out my concept, or other alternative directions to consider.
My concept was to have two domains: let's call them dv-site.com and decoy-site.com. The former is the source of domestic violence support information and the latter is some random content; it could be anything, let's just say weather information for the sake of the conversation.
If a user navigates directly to dv-site.com the server redirects to decoy-site.com but also attaches some session specific, or perhaps single use query string or similar.
decoy-site.com validates the query string and, if valid, loads dv-site.com within an iframe or something like that, so from the user's perspective they are just looking at dv-site.com, though the domain recorded in history is decoy-site.com.
Links within the iframe loaded site would similarly be redirected with the same or a new query string.
If a user were to click on the browser history and go directly to decoy-site.com, it would not be able to validate the query string and would just load the decoy site like a normal site, i.e. just showing the weather information that exists on that site.
Domestic violence is a serious systemic issue and I would love some input from anyone who has more technical knowledge than I do on fleshing out this concept.
Other aspects I am unsure of how to tackle:
ensuring that dv-site.com can get crawled and ranked by search engines, even though users are all redirected, as it is imperative that it appears in search results so it can be found
technical aspects of a redirect that does not appear in history.
I'm unsure if it's possible to do this without all content and engagement being attributed to the decoy-site.
For the redirect, I believe HTTP redirects do not get stored in history, so you can use a 302 redirect for that. HTTP also has a Set-Cookie header that lets you record a cookie; coupled with the redirect, you can give the decoy site access without recording the hop in history. Then delete the cookie.
As far as PageRank goes, you could add a line to robots.txt as described here (the last point) to force the bot to crawl using a query parameter. Then, on the backend, return the dv site only if that parameter is passed and redirect otherwise. If Googlebot removes query params when publishing, it will work out; otherwise, it might fail.
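A rough Rails sketch of that flow, using the single-use query-string token from the question rather than a cookie (a cookie set by dv-site.com would not be readable on decoy-site.com); every name here is hypothetical:

# dv-site.com: issue a short-lived, single-use token and answer with a 302,
# which is not added to the browser history.
class ExitRedirectController < ApplicationController
  def show
    token = SecureRandom.hex(16)
    Rails.cache.write("decoy_token/#{token}", true, expires_in: 5.minutes)
    redirect_to "https://decoy-site.com/?t=#{token}", status: :found # 302 Found
  end
end

# decoy-site.com: consume the token; without a valid one, just render the decoy content.
# Assumes both apps share the same cache/token store.
class DecoyController < ApplicationController
  def show
    # Rails.cache.delete returns true only if the token existed, so it can be spent once.
    @show_real_site = params[:t].present? && Rails.cache.delete("decoy_token/#{params[:t]}")
    # The view frames dv-site.com when @show_real_site is true, plain weather content otherwise.
  end
end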
Best of luck.

Azure - App Insights - how to track the logged-in Username in Auth Id?

What is the best-supported approach for tracking logged-in Usernames/Ids in App Insights telemetry?
A User with Username "JonTester1" said some Pages he visited 4 hours ago were really slow. How can I see everything JonTester1 did in App Insights to troubleshoot/know which pages he's referring to?
Seems like User Id in App Insights is some Azure-generated anonymized thing like u7gbh that Azure ties to its own idea of the same user (through a cookie?). It doesn't know about our app's usernames at all.
I've also seen a separate field in App Insights called Auth Id (or user_AuthenticatedId in some spots), which looks to sometimes have the actual username e.g. "JonTester1" filled in - but not always... And while I don't see any mention of this field in the docs, it seems promising. How is our app's code/config supposed to be setting that Auth Id to make sure every App Insights log/telemetry has it set?
Relevant MS docs:
https://learn.microsoft.com/en-us/azure/azure-monitor/app/usage-send-user-context
This looks to just copy one library Telemetry object's User Id into another... no mention of our custom, helpful Username/Id anywhere... and most in-the-wild examples I see don't actually look like this, including MS docs' own examples in the 3rd link below; they instead hard-code newing up a TelemetryClient()
https://learn.microsoft.com/en-us/azure/azure-monitor/app/website-monitoring No mention of consistently tracking a custom Username/Id
https://learn.microsoft.com/en-us/azure/azure-monitor/app/api-custom-events-metrics#authenticated-users Shows some different helpful pieces, but still no full example. E.g. it says that with only the setAuth... JS function call on the page (still no full example of working client-side JS that tracks the User), you don't need any server-side code for it to track a custom User Id across both client-side and server-side telemetry sent to Azure... yet it also shows explicit code to new up a TelemetryClient() server-side to track User Id (in Global.asax.cs, or where?)... so do you need both?
Similar SO questions, but don't connect the dots/show a full solution:
Azure Insights telemetry not showing Auth ID on all transactions
Application Insights - Tracking user and session across schemas
How is Application insight tracking the User_Id?
Display user ID in the metrics of application Insight
I'm hoping this question and its answers can get this more ironed out; hopefully doing a better job of documenting it than the relevant MS docs...
The first link in your question lists the answer. What it shows you is how to write a custom telemetry initializer. Such an initializer lets you add or overwrite properties that will be sent along with any telemetry that is sent to App Insights.
Once you add it to the configuration, either in code or in the config file (see the docs mentioned earlier in the answer), it will do its work without you needing to create special instances of TelemetryClient. That is why this part of your question does not make sense to me:
[…] and most in-the-wild examples I see don't actually look like this, including MS docs' own examples in the 3rd link below; they instead hard-code newing up a TelemetryClient()
You can either overwrite the value of UserId or overwrite AuthenticatedUserId in your initializer. You can modify the code given in the docs like this:
// Inside the Initialize(ITelemetry telemetry) method of your custom ITelemetryInitializer:
if (requestTelemetry != null && !string.IsNullOrEmpty(requestTelemetry.Context.User.Id) &&
    (string.IsNullOrEmpty(telemetry.Context.User.Id) || string.IsNullOrEmpty(telemetry.Context.Session.Id)))
{
    // Stamp the signed-in username onto the Application Insights telemetry item.
    telemetry.Context.User.AuthenticatedUserId = HttpContext.Current.User.Identity.Name;
}
You can then see the Auth Id and User Id by going to your AI resource -> Search and clicking an item. Make sure to press "Show All" first, otherwise the field is not displayed.
Auth Id in the screenshot below is set to the user id from the database in our example:
We access the server from Azure Functions as well, so we also set the user id server-side, since there is no client involved in such scenarios.
There is no harm in setting it in both places, JavaScript and server-side via an initializer. That way you cover all scenarios.
You can also manually set the user id in App Insights by calling:
appInsights.setAuthenticatedUserContext(userId);
See App Insights Authenticated users

How do I hide my API calls / routes from users of my Rails app?

I'm writing an app that makes some calls to my API that have restrictions. If users were to figure out what these URL routes were, and the proper parameters and how to specify them, then they could exploit it, right?
For example if casting a vote on something and I only want users to be able to cast one vote, a user knowing the route:
get '/castvote/' => 'votemanager#castvote'
could be problematic, could it not? Is it easy to figure out these API routes?
Does anyone know any ways to remove the possibility of this happening?
There is no way to hide AJAX calls - if nothing else, one just needs to open Developer Tools - Network panel and simply see what was sent. Everything on the client side is an open book, if you just know how to read it.
Instead, do validation on the server side: in your example, record the votes and the users that cast them; if a vote was already recorded by that user, don't let them do it again.
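In Rails that server-side check might look roughly like this (the Vote/Poll model names and the votes association are assumptions):

class Vote < ApplicationRecord
  belongs_to :user
  belongs_to :poll
  # One vote per user per poll, enforced in the model.
  validates :user_id, uniqueness: { scope: :poll_id, message: "has already voted" }
end

class VotesController < ApplicationController
  def create
    vote = current_user.votes.build(poll_id: params[:poll_id]) # assumes has_many :votes on User
    if vote.save
      head :created
    else
      head :unprocessable_entity # a second vote from the same user is rejected
    end
  end
end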
Your API should have authorization built into it. Only authorized users with specific access scopes should be allowed to consume your API. Check out the Doorkeeper and cancancan gems provided by the Rails community.
As others have said, adding access_token/username/password authorisation is a good place to start. Also, if your application should only allow one vote per user, then this should be validated by your application logic on the server.
This is a broader problem. There's no way to stop users from figuring out how voting works and trying to game it but there are different techniques used to make it harder. I list some solutions from least to most effective here:
Using a nonce or proof of work; in the case of Rails this is implemented through the authenticity token for non-GET requests. This will require the user to at least load the page before voting, therefore limiting scripted replay attacks
Recording the IP address or other identifiable information (i.e. browser fingerprinting). This will limit the number of votes from a single device
Requiring signup. This is what other answers suggest
Requiring third-party login (i.e. Facebook, Twitter)
Require payment to cast a vote (like in tv talent shows)
None of those methods is perfect and you can quickly come up with ways to trick any of them.
The real question is what your threat model is and how hard you want to make it for users to cast fake votes. In my practical experience, requiring third-party login ensures most votes are valid in typical use cases.
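Whichever combination you pick, if voting requires signup it is also worth backing the one-vote rule with a database constraint, so concurrent or replayed requests cannot slip a duplicate in. A sketch (table and column names are assumptions):

class AddOneVotePerUserIndex < ActiveRecord::Migration[7.0]
  def change
    # The unique index rejects a duplicate vote even if two requests race.
    add_index :votes, [:user_id, :poll_id], unique: true
  end
end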

Rails - Tracking Referrals to Conversions

We just launched and are looking to better understand where the users who are converting to registered users are actually coming from. We can see our traffic sources and referrals via Google Analytics and our other web statistics programs, but in volume, it's difficult to tie these specifically to which users in our database have converted and from where.
We have several "goals" set up in Google Analytics to better help track conversions, but what are others doing to associate user signups with inbound traffic sources?
One thought we've been kicking around: capturing the referral on the first page load and passing it along in the session into the registration form, where you store it in the user record.
Any other solutions that are working successfully for you?
Thanks!
Indeed, I would suggest storing the referrer in the user record. Then you can write some code to sensibly draw additional data out of the URL. For instance, you could parse Google URLs to determine the keywords used to discover your site. And your code could detect things like referrals from ad runs, specific SEO campaigns you're running, or partner deals you have going.
It would be beneficial to spend some time building an admin-only page to visualize these conversions to help you better learn what is working and what isn't. And when things are going well, such a page is encouraging for the whole team!
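As a rough sketch of that parsing step (plain Ruby; note that modern HTTPS Google referrers no longer carry the query, so this mainly applies to older-style or other referrers):

require "uri"
require "cgi"

# Pull a search query out of a stored referrer URL, e.g. the "q" parameter Google used to send.
def search_keywords_from(referrer)
  uri = URI.parse(referrer.to_s)
  return nil unless uri.query
  # CGI.parse returns arrays of values keyed by parameter name.
  CGI.parse(uri.query)["q"].first
rescue URI::InvalidURIError
  nil
end

search_keywords_from("http://www.google.com/search?q=rails+referral+tracking")
# => "rails referral tracking"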
Capturing the referral is a good start. You should capture it in a persistent cookie instead of the session, so that if the user returns tomorrow you still have the same referral information.
I've created a gem to automate tracking and saving referral info. See https://github.com/holli/referer_tracking for more info.
Some notes when designing tracking (I've tried to catch these with the gem already)
It might be better to save tracking data to a separate table, so that when you delete a user account you won't delete the information about how that account was created. That way you can still answer questions like "where do the bogus user accounts come from?"
Also save the cookies to the db. If you are using Google Analytics you can parse Google's cookies to get additional information about the visitor, like the number of visits or campaign information.
It's also good to save user agents etc. to be able to differentiate between mobile and desktop browsers.
In the end it's good to visualize the information and conversions, but in the beginning it's hard to know what data you want to visualize and how. So try to capture as much data as possible and then later decide how to crunch it with scripts.
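A rough sketch along these lines (names are illustrative and this is not the referer_tracking gem's actual API):

# Capture on the first page load into persistent cookies.
class ApplicationController < ActionController::Base
  before_action :capture_referral

  private

  def capture_referral
    return if cookies[:first_referrer].present?
    cookies[:first_referrer] = { value: request.referer.to_s, expires: 1.year.from_now }
    cookies[:first_landing]  = { value: request.original_url, expires: 1.year.from_now }
  end
end

# Store the data in a separate table rather than on the user row,
# so it survives even if the account is later deleted.
class ReferralTracking < ApplicationRecord
  belongs_to :user, optional: true
end

# Inside the signup action:
ReferralTracking.create!(
  user:         @user,
  referrer:     cookies[:first_referrer],
  landing_page: cookies[:first_landing],
  user_agent:   request.user_agent
)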

Prevent bot from crawling certain areas of site

I don't know much about SEO and how web spiders work, so forgive my ignorance here. I'm creating a site (using ASP.NET MVC) which has areas that display information retrieved from the database. The data is unique to the user, so there's no real server-side output caching going on. However, since the data can contain things the user may not wish to have displayed in search engine results, I'd like to prevent any spiders from accessing the search results page. Are there any special actions I should take to ensure that the search result directory isn't crawled? Also, would a spider even crawl a page that's dynamically generated, and would any actions preventing certain directories from being searched mess up my search engine rankings?
edit: I should add, I'm reading up on robots.txt protocol, but it relies on co-operation from the web crawler. However, I'd also like to prevent any data-mining users who will ignore the robots.txt file.
I appreciate any help!
You can prevent some malicious clients from hitting your server too heavily by implementing throttling on the server. "Sorry, your IP has made too many requests to this server in the past few minutes. Please try again later." In practice, though, assume that you can't stop a truly malicious user from bypassing any throttling mechanisms that you put in place.
Given that, here's the more important question:
Are you comfortable with the information that you're making available for all the world to see? Are your users comfortable with this?
If the answer to those questions is no, then you should be ensuring that only authorized users are able to see the sensitive information. If the information isn't particularly sensitive but you don't want clients crawling it, throttling is probably a good alternative. Is it even likely that you're going to be crawled anyway? If not, robots.txt should be just fine.
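In a Rack-based app (the question is ASP.NET MVC, so this only illustrates the idea) throttling by IP can be a few lines with the Rack::Attack gem; the limit here is arbitrary:

# config/initializers/rack_attack.rb
Rack::Attack.throttle("requests by ip", limit: 100, period: 60) do |request|
  request.ip # over 100 requests in 60 seconds from one IP gets a "retry later" response
end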
It seems like you have two issues.
The first is a concern about certain data appearing in search results; the second is about malicious or unscrupulous users harvesting user-related data.
The first issue will be covered by appropriate use of a robots.txt file as all the big search engines honour this.
The second issue seems more to do with data privacy. The first question which immediately springs to mind is: If there is user information which people may not want displayed, why are you making it available at all?
What is the privacy policy for such data?
Do users have the ability to control what information is made available?
If the information is potentially sensitive but important to the system could it be restricted so it is only available to logged in users?
Check out the Robots exclusion standard. It's a text file that you put on your site that tells a bot what it can and can't index. You will also want to address what happens if a bot doesn't honour the robots.txt file.
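For example, a minimal robots.txt that asks compliant crawlers to stay out of a hypothetical /search directory:

User-agent: *
Disallow: /search/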
Use a robots.txt file as mentioned. If that is not enough, then you can:
Block unknown user agents - hard to maintain, and easy for a bot to forge a browser's user agent (although most legitimate bots won't)
Block unknown IP addresses - not useful for a public site
Require logins
Throttle user connections - tricky to tune, and you will still be disclosing information.
Perhaps use a combination. Either way it is a trade-off: if the public can browse to it, so can a bot. Be sure you don't block & alienate people in your attempts to block bots.
A few options:
force the user to login to view the content
add a CAPTCHA page before the content
embed content in Flash
load dynamically with JavaScript
