Custom user tracking or 3rd party service for page referral analytics - ruby-on-rails

The question I'm trying to answer for a set of users is how other users end up on their page. There are about 5 different ways a user can end up on your page. For example, they could have searched your name, clicked a link from a newsfeed or received an e-mail with a link to your page.
What is the best way to track these events? I'm initially inclined to create a table to track them: each link would send an async event to the server to be added to the table. However, I'm also aware that there are many tracking services out there, such as Google Analytics and Mixpanel. I've looked at their docs briefly and they don't seem to fit my needs.
Am I missing something? Is it worth it to create a "custom" event tracking system to accomplish this?

It is not worth creating your own service. Plus, you cannot add async tracking links to search engine result pages or e-mails (that would require tracking code that you cannot inject into search engines, and that would not be executed in mail clients).
Web analytics software tracks traffic sources by analyzing the incoming traffic via its HTTP headers. If a referrer is set, the traffic will be attributed to, well, the referring site, unless the referrer is on a list of known search engines, in which case it will be attributed to organic search traffic, and so on.
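As a rough illustration, here is a minimal Ruby sketch of that kind of referrer-based classification (the search engine list and method name are made up for the example, not taken from any library):

require 'uri'

# Hypothetical, deliberately tiny list of search engine hosts.
SEARCH_ENGINES = /(^|\.)(google|bing|yahoo|duckduckgo)\./i

def classify_source(referrer)
  return :direct if referrer.nil? || referrer.empty?
  host = URI.parse(referrer).host.to_s
  return :organic_search if host =~ SEARCH_ENGINES
  :referral
rescue URI::InvalidURIError
  :unknown
end

classify_source(nil)                         # => :direct
classify_source("https://www.google.com/")   # => :organic_search
classify_source("https://news.example.com/") # => :referral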
In most systems you can customize source attribution by adding query parameters to the URL (obviously this will not work with search engines and the like, since you cannot add parameters to organic search results). For example, with Google Analytics you can add custom campaign parameters to email links or advertising campaigns. If people click on those links, the parameter values will be sent to GA and the source/medium/campaign information will be set accordingly (e.g. traffic from web mail clients would usually be attributed as a referrer, but campaign parameters allow you to attribute the link to your mail campaigns).
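For instance, a newsletter link tagged with GA's campaign parameters might look like this (the domain and parameter values are made-up examples):

https://www.example.com/profile/alice?utm_source=newsletter&utm_medium=email&utm_campaign=april_digest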
There might be reasons to create your own system, but channel attribution is not one of them; GA and every other system I know of have this thoroughly covered.

Related

Google+ Authorship: #REL, GET Parameters and Redirects

I recently decided to start to take advantage of rich snippets to improve my personal website's content for the search engines and, IMHO most importantly, the site readers – hi, Mam! ;-). One of these is Google Authorship. Personally, I think the idea behind Google Authorship is a sound one: it helps to bring a sense of identity, personality and – arguably, most importantly – credibility to what is still largely an anonymous web.
Normally, I would link my article to Google Authorship using the following line of HTML:
<A REL="author" HREF="https://plus.google.com/112431363835029530079?rel=author">Jordan Clark</A>
However, in the instance of a website that publishes articles written by multiple authors, manually entering each author's Google+ UID string starts to become a tiresome process.
Is it valid to do the following:
(a) Link to the author like so, using the script "author.php" (or other type of server-side script).
<A REL="author" HREF="/author.php?by=Alice&rel=author/[UID]?rel=author">Alice</A>
(b) The "author.php" script simply does a quick check for Alice's (or whoever's) user ID string provided by Google, and then uses a simple HTTP redirect header to pass this data to Google.
What I would like to know is:
Is it okay to use a local script to redirect to your Google+ user profile? (i.e. will it affect the PageRank of already indexed pages or have any other unforeseen negative effects on new and indexed pages?)
Why do I not see more people linking with Google’s “prettified” version:
http://profiles.google.com/clarky.y2k?rel=author
Are there any drawbacks to using the “prettified” version of this method?
Ideally, I would like to use the intermediate PHP script, as I have already described above (see part 1). However, any tips, suggestions or other ways you may have implemented on your websites are very welcome!
For item (1), you can maintain your own app's profiles (author.php in your case) for your authors. On your own app's profile page (author.php), you would add a link from that page to Google and specify the rel="me" attribute on that link. So Alice's profile page might say something like "Find Alice on Google+".
This indirect authorship linking is supported. You also will need the link from Alice's Google+ profile that lists her as a contributor to your site. Once the linking is set up in both directions, authorship can start to show up. Authorship won't always display in all cases and can take some time to start appearing, as Google would need to reindex your pages.
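To make the two-way linking concrete, here is a minimal sketch (URLs and names are placeholders; and note the update at the end of this answer – Google has since dropped rel="author"):

<!-- On the article page: link to your own author page -->
By <a rel="author" href="/author.php?by=Alice">Alice</a>

<!-- On the author page (author.php): link out to the Google+ profile -->
Find Alice on <a rel="me" href="https://plus.google.com/112431363835029530079">Google+</a>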
For item (2), I don't think the profiles URL will enable authorship. Some people use that URL as a vanity URL, but as far as I know it isn't supported for use with things like authorship, badges, etc.
You should test if your redirects are followed using the Rich Snippets Testing Tool: http://www.google.com/webmasters/tools/richsnippets
rel="author" is no longer supported.

Avoid robots going into a www.domain.com/this_is_a_hash when link posted to Twitter, Facebook

I'm building a service where people get notified (by mail) when someone follows a link with the format www.domain.com/this_is_a_hash. The people that use this service can share this link in different places like Twitter, Tumblr, Facebook and more...
The main problem I'm having is that as soon as the link is shared on any of these platforms, a lot of requests to www.domain.com/this_is_a_hash come to my server. The problem with this is that each time one of these requests hits my server, a notification is sent to the owner of the this_is_a_hash, and of course this is not what I want. I just want to get notifications when real people visit this resource.
I found a very interesting article here that talks about the huge amount of requests a server receives when posting to Twitter...
So what I need is to prevent search engines from hitting the "resource" URL... the www.mydomain.com/this_is_a_hash
Any idea? I'm using rails 3.
Thanks!
If you don’t want these pages to be indexed by search engines, you could use a robots.txt to block these URLs.
User-agent: *
Disallow: /
(That would block all URLs for all user-agents. You may want to add a folder to block only those URLs inside of it. Or you could add the forbidden URLs dynamically as they get created, however, some bots might cache the robots.txt for some time so they might not recognize that a new URL should be blocked, too.)
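For example, if the hash URLs all lived under a common path prefix (say /h/ – a hypothetical choice), the robots.txt could block only those:

User-agent: *
Disallow: /h/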
It would, of course, only hold back those bots that are polite enough to follow the rules of your robots.txt.
If your users would copy&paste HTML, you could make use of the nofollow link relationship type:
<a rel="nofollow" href="http://www.domain.com/this_is_a_hash">cute cat</a>
However, this would not be very effective, as even some of those search engines that support this link type still visit the pages.
Alternatively, you could require JavaScript to be able to click the link, but that’s not very elegant, of course.
But I assume they only copy&paste the plain URL, so this wouldn’t work anyway.
So the only chance you have is to decide if it’s a bot or a human after the link got clicked.
You could check for user-agents. You could analyze the behaviour on the page (e.g. how long it takes for the first click). Or, if it’s really important to you, you could force the users to enter a CAPTCHA to be able to see the page content at all. Of course you can never catch all bots with such methods.
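If you go the user-agent route, a minimal Rails sketch might look like this (the controller, model, finder and notification method are hypothetical, and the bot pattern is illustrative, not exhaustive):

class HashesController < ApplicationController
  # Common substrings found in crawler user-agents.
  BOT_PATTERN = /bot|crawler|spider|slurp|facebookexternalhit|twitterbot/i

  def show
    @resource = Resource.find_by_token!(params[:id])
    # Only notify the owner when the visitor does not look like a bot.
    notify_owner(@resource) unless bot_request?
  end

  private

  def bot_request?
    request.user_agent.to_s =~ BOT_PATTERN
  end
end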
You could use analytics on the pages, like Piwik. They try to differentiate users from bots, so that only users show up in the statistics. I’m sure most analytics tools provide an API that would allow sending out mails for each registered visit.

Tracking users' clicks and page visits in Rails

I would like to monitor users' page visits and clicks in my Rails app to make recommendations. My questions are:
Is there a Rails gem for this, or is Google Analytics the standard? If the latter is true, then how should I link a page visit to a particular user profile?
It is typical in Rails to have a section in application.html.erb, which is shared for all pages. If I add the Google Analytics pageview tracking code to application.html.erb, will it be able to track all individual pages?
There are other ways, but the vast majority probably use Google Analytics. Several gems exist that help you integrate with GA to get at the data. See here: https://www.ruby-toolbox.com/categories/Web_Analytics.
Based on your first question, it seems you may want more insight than GA can provide. I've used ClickTale (http://www.clicktale.com) and Woopra (http://www.woopra.com) before, to good effect. This article lists several other alternatives, too - notice the high marks for Clicky: http://imimpact.com/web-stats-alternatives-to-google-analytics/.
Google Analytics (and almost all of these others) will take care of your second question automatically whenever the user loads a new page, since it is keyed by URL. That means that, although you put the GA script code in a single place, each unique page is tracked individually.
If you have AJAX requests that change the page without changing the URL, you'll need to dig into the GA script API. Essentially you'll need to push a new URL (possibly with a # in it) whenever you want to track an AJAX-driven link/button click. See here: http://davidwalsh.name/ajax-analytics
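With the legacy ga.js snippet, such a virtual pageview is a one-liner (the path here is a made-up example):

<script>
  // Report a virtual pageview so GA counts the AJAX-driven state change.
  _gaq.push(['_trackPageview', '/profile/alice#photos']);
</script>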
I am biased, but I would recommend checking out impressionist, if you need to integrate the page views into the app in real-time. With analytics you will always have some lag time and you are also relying on an external dependency. Impressionist is good if you need this kind of control, but if you are just looking for simple metrics and don't need to pull them into the app, then analytics is probably the way to go.
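For reference, wiring up impressionist is only a few lines (the model and controller names here are examples):

# app/models/profile.rb
class Profile < ActiveRecord::Base
  is_impressionable
end

# app/controllers/profiles_controller.rb
class ProfilesController < ApplicationController
  impressionist actions: [:show]   # log an impression on each view

  def show
    @profile = Profile.find(params[:id])
    @view_count = @profile.impressionist_count
  end
end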
Check out Ahoy, at https://github.com/ankane/ahoy. With just a few lines of code in your app, you can track page views and tie them to user accounts.
You can further customize Ahoy to track custom events, on both the client (with JavaScript) and the server.
Ahoy does not depend on any third-party services.
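A minimal server-side sketch (the controller, model and event name are examples): visits are recorded automatically and tied to the signed-in user, and custom events go through ahoy.track:

class ProfilesController < ApplicationController
  def show
    @profile = Profile.find(params[:id])
    # Records an Ahoy event linked to the current visit (and user, if signed in).
    ahoy.track "Viewed profile", profile_id: @profile.id
  end
end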

How do Bit Torrent search engines work?

For a normal search engine, I understand that it regularly travels across the internet to gather web page information, and sometimes site owners can voluntarily submit their latest updates to the engines. But how about BitTorrent search engines? These torrents cannot simply be found by viewing web pages. So how do they work? Do users submit them?
A publisher submits their torrent to a tracker, and then distributes a link to the file on that tracker. Users in turn use that file to connect to the specified tracker and download the content; the tracker gives them a list of peers who are sharing that file. The torrent search sites just list what trackers are available and what files can be found on which trackers, all of which is submitted by publishers.
However, I think this may be better suited to something like Super User rather than Stack Overflow...
No, users do not submit torrents. For our torrent search site http://tornado.li/, we created robots that scan all the added torrent sites for new torrents and add them to the database. The whole process is fully automated; only in this way is it possible to offer a good choice of torrents.

Find the number of times a tweet has been viewed

There have been quite a few start-ups pertaining to analyzing Twitter data. There is CrowdBooster, and there is Klout, which uses Twitter data to tell users their true reach.
I have got the following two questions:
1) Is there a way to find out who has viewed one's tweet, or the number of people that have viewed a tweet? CrowdBooster claims to tell you how many impressions one received per tweet. How do they do it?
2) Thousands and thousands of links are shared each day on Twitter. Can we find out which user has clicked the link in a tweet?
I have looked through the Twitter API and some of the companies that have licensed Twitter's Firehose, but have not found anything that meets my needs.
Also, to give you a short answer to your 2nd question, now that we've established that view analysis is impossible: can you find out which user has clicked on that link? Absolutely. It depends on what you're talking about: the user who has clicked on the link, or the user that has the link in their Twitter stream. Both are possible.
In the case of A, you would get the referring user's IP address. Methods vary depending on language.
But what I think you're asking for is scenario B, finding out which user has the link in their Twitter stream. This can be done by querying the link; the API response you get can include tweet entities, which will list all this information out for you and more. Open up a firehose with your link and watch what comes in.
https://dev.twitter.com/docs/streaming-api/methods
1) Is there a way to find out who has viewed one's tweet, or the number of people that have viewed a tweet? CrowdBooster claims to tell you how many impressions one received per tweet. How do they do it?
No, in the case of a view this would be impossible. The tweet impression can happen in multiple silos: on the website, in a widget, in a mobile app. You can imagine that it's simply not possible to count the impressions of a tweet for this reason, and because, unlike a click, there is no "I viewed this tweet" identifier sent when a view happens. I spent a great deal of time researching a way to get the tweet impressions even based on a similarly clicked link, and this is not possible either. (Edit: it's sort of possible; see the last paragraph.) This brings us to question 2.
2) Thousands and thousands of links are shared each day on Twitter. Can we find out which user has clicked the link in a tweet?
Yes, what these websites are mainly doing is analyzing links that you process through their website. If you can put a unique hash marker on a link, then analysis becomes possible. Without a unique hash marker, Twitter will treat two copies of the same link in exactly the same way, even when it shortens your link with its custom t.co wrapper.
This means the only reliable way to do tweet analysis is to include a unique link marker code in your tweet and analyze the fact that somebody who hit your server clicked on that link.
There is a somewhat hidden Twitter API feature that helps you understand how popular a particular link is: the link count API, http://urls.api.twitter.com/1/urls/count.json?url=
Something really outside of the box you can do, if you're set on analyzing multiple versions of exactly the same link without using markers and if you're also using the Streaming API (firehose), would be to analyze the tweet views (using the link count API) on similar links that hit your server. The link that got the +1 boost in views is the one that hit your server. But that's about the extent of the creative analysis you can get with your tweets; as mentioned, links are the only thing you're really able to analyze when it comes to Twitter.
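For what it's worth, querying that count endpoint (which is undocumented, so it may change or disappear) is only a few lines of Ruby:

require 'net/http'
require 'json'
require 'uri'

uri = URI("http://urls.api.twitter.com/1/urls/count.json?url=http://example.com/")
data = JSON.parse(Net::HTTP.get(uri))
puts data["count"]   # the share count Twitter reports for this URL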
1) Is there a way to find out who has viewed one's tweet, or the number of people that have viewed a tweet? CrowdBooster claims to tell you how many impressions one received per tweet. How do they do it?
Yes, sign up for Twitter Analytics at https://analytics.twitter.com (a free service provided by Twitter) and you can see how many people viewed (impressions) each tweet, and totals for specific dates or a date range.
2) Thousands and thousands of links are shared each day on Twitter. Can we find out which user has clicked the link in a tweet?
Yes, you can do this. Using a URL shortening service like Bitly.com, you can track how many clicks you had from Twitter (only give out that Bitly link on Twitter to do this). But if you want more in-depth information, you may need to create your own tracking software, as I don't know of any available. To do that, you would have the tracking software track the link, read the referer header, and see if it's from Twitter (or better yet, just give out a unique URL for your tweets); then you would need to use the Twitter API to find out the handle (username) of the visitor who clicked your link. Lastly, store this information in a database so you can review who clicked what link.
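A rough Rails sketch of that redirect-and-log approach (the models, fields and route are all hypothetical):

class ClicksController < ApplicationController
  def show
    link = TrackedLink.find_by_token!(params[:token])
    # Log the click before sending the visitor on to the real URL.
    Click.create!(
      tracked_link: link,
      referer:      request.referer,   # e.g. https://t.co/... for Twitter clicks
      user_agent:   request.user_agent,
      from_twitter: request.referer.to_s.include?("t.co")
    )
    redirect_to link.target_url
  end
end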
