How does Buzzfeed's "Pound" work?

A week ago Buzzfeed announced a new viral traffic tracking tool called "Pound" (Process for Optimizing and Understanding Network Diffusion). Whereas marketers and webmasters are currently used to seeing social traffic in aggregate buckets per source, Pound promises to help us visualize the actual person-to-person sharing of content and the traffic resulting from each step... sorta. Apparently the tool can't (or opts not to) match individual users to their corresponding node in the network:
Pound does not store usernames or any personally identifiable information (PII) with the share events. Each node in the sharing graph is anonymous. We are not able to figure out who a user is by looking at the graph data.
Interesting. I assume Buzzfeed is keeping this anonymous to preempt complaints when the company uses Pound to sell ads. More interesting is the hint the Buzzfeed engineers provide as to how the tool works:
Pound data is collected based on an oscillating, anonymous hash in a sharer’s URL as a UTM code.
How might this work? Does the UTM code mutate every time a link is shared or reshared? I don't understand how that would be possible. And if that's not how it works, how else might this functionality be implemented?
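Purely as speculation about the mechanism hinted at above (none of the names below come from Buzzfeed; mint_share_hash, record_visit, and the in-memory dictionaries are hypothetical), here is a minimal Python sketch of one way an "oscillating" share hash could work: every incoming visit carries the sharer's hash in its UTM code, and every outgoing share mints a fresh anonymous hash, so the server only ever stores parent-to-child edges between hashes and never any PII:

import secrets

# Hypothetical in-memory stores; a real system would use a database.
edges = {}   # child_hash -> parent_hash (None for an organic, unshared visit)
counts = {}  # hash -> number of page views attributed to that share

def mint_share_hash(parent_hash=None):
    """Create a fresh anonymous hash for a new share event and
    remember which earlier share (if any) it descended from."""
    child_hash = secrets.token_urlsafe(8)
    edges[child_hash] = parent_hash
    return child_hash

def record_visit(incoming_hash):
    """Called on each page view that arrives with ?utm_campaign=<hash>."""
    if incoming_hash is not None:
        counts[incoming_hash] = counts.get(incoming_hash, 0) + 1

def share_url(article_url, incoming_hash=None):
    """Build the URL placed behind the share button: the hash 'oscillates'
    because each new sharer is handed a hash of their own."""
    new_hash = mint_share_hash(parent_hash=incoming_hash)
    return f"{article_url}?utm_campaign={new_hash}"

# Example: A shares the article organically, B clicks A's link and reshares it.
a_link = share_url("https://example.com/article")
a_hash = a_link.split("=")[1]
record_visit(a_hash)                                   # B's click on A's link
b_link = share_url("https://example.com/article", incoming_hash=a_hash)

With edges populated this way, the whole sharing cascade can be reconstructed as an anonymous tree, which would match Buzzfeed's claim that no node can be traced back to a user.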

Related

Can a group of 3 researchers share/pool Twitter API tokens to accelerate/improve data collection on a sentiment analysis project?

Our group is working on a sentiment analysis research project. We are trying to use the Twitter API to collect tweets. Our target dataset involves a lot of query terms and filters. However, since each of us has a developer account, we were wondering if we can pool API access tokens to accelerate the data collection. For example, we would make an app that lets us define a configuration file containing a list of our access tokens, which the app would cycle through when searching for tweets. This app would be run on our local computers. Since the app uses our individual access tokens, we believe that we are not actually bypassing or changing any Twitter limit, as the record is kept for each access token. Are there any legal or technical problems that may arise from this methodology? Thank you! =D
Here is pseudocode for what we are trying to do (a rough timing calculation follows the list):
1. Define a list of search terms such as 'apple', 'banana' and 'oranges' (we have 100 of these search terms; we are okay with the 100 limit).
2. Define a list of frequent emotional adjectives such as 'happy', 'sad', 'crazy', etc. (we have 100 of these), selected using TF-IDF.
3. Take the product of the search terms and emotional adjectives. In total we have 10,000 query terms, and from the rate-limit rules we computed that we would need at least 55 15-minute windows at 180 requests per window. 55 * 15 = 825 minutes, or ~14 hours, to collect this amount of tweets.
4. We are thinking of improving the data collection by pooling access tokens so that we can trim the collection time from ~14 hours down to ~4 hours, e.g. by dividing the query items into subsets and letting each access token work on one subset.
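To make the arithmetic in steps 3 and 4 concrete, here is a small Python sketch of the timing estimate, assuming the standard search rate limit of 180 requests per token per 15-minute window (whether pooling is permitted is a separate question, answered below):

import math

QUERIES = 100 * 100          # 100 search terms x 100 adjectives = 10,000 queries
REQS_PER_WINDOW = 180        # search API limit per token per 15-minute window
WINDOW_MINUTES = 15

def collection_hours(num_tokens):
    """Estimate wall-clock hours to issue all queries, splitting them
    evenly across num_tokens access tokens working in parallel."""
    queries_per_token = math.ceil(QUERIES / num_tokens)
    windows = math.ceil(queries_per_token / REQS_PER_WINDOW)
    return windows * WINDOW_MINUTES / 60

print(collection_hours(1))   # 14.0 hours with a single token
print(collection_hours(3))   # 4.75 hours with three pooled tokens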
We are pushing for this because, if it is possible and permitted, it would be more efficient, and it might help future research as well.
The question is: are we actually breaking any Twitter rules or policies by doing this? By each of the three of us sharing an access token and creating apps named as clones of the research project, we believe we are also giving something up, namely the headroom for one more app that we fully control.
I can't find a specific Twitter rule about this so far. Our concern is that we will publish a paper along with the app we program and use, for documentation purposes. Disclaimer: only the app's source code will be published, not the dataset, because of Twitter's explicit rules about datasets.
This is absolutely not allowed under the Twitter Developer Policy and Agreement.
Twitter developer policy 5a:
Do not do any of the following:
Use a single application API key for multiple use cases or multiple application API keys for the same use case.
Feel free to check with Twitter directly via the developer forums. StackOverflow is not really the best place for this question since it is not specifically a coding question.

Geolocation: Is it possible to get latitude and longitude from an address and store them locally in my database?

I want to be able to run queries locally that compare the latitude and longitude of locations, so I can query the addresses I've captured by distance.
I found a free database that has this information for zip codes, but I want it for more specific addresses. I've looked at Google's geocoding service, and it appears to be against the TOS to store these values in my database or to use them for anything other than working with Google Maps. (If somebody has looked deeper into this and I'm incorrect, let me know.)
Am I likely to find any (free or paid) service that will let me store these lat/lon values locally? The number of addresses I need is currently pretty small, but if my site becomes popular it could expand quite a bit over time to a large number. I just need to get the coordinates of each address entered once, though.
This question hasn't received enough attention...
You're correct -- it can't be done with Google's service and still conform to the TOS. Cheers to you for honestly seeking to comply with the TOS.
I work at a company called SmartyStreets where we process and verify addresses -- and geocode them, too. Google's terms don't allow you to store the data returned from the API, and there are pretty strict usage limits before they throttle or cut off your access.
Screen scraping presents many challenges and problems which are both technical and ethical, and I don't suppose I'll get into them here. The Microsoft library linked to by Giorgio is for .NET only.
If you're still serious about doing this, we have a service called LiveAddress which is accessible from any platform or language. It's a RESTful API which can be called using GET or POST for example, and the output is JSON which is easy to parse in pretty much every common language/platform.
Our terms allow you to store the data you collect as long as you don't re-manufacture our product or build your own database in an attempt to duplicate ours (or something of the like). For what you've described, though, it shouldn't be a problem.
Let me know if you have further questions about address geocoding; I'll be happy to help.
By the way, there's some sample code at our GitHub repo: https://github.com/smartystreets/LiveAddressSamples
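For illustration only, here is a minimal Python sketch of calling a REST geocoding endpoint with GET and parsing the JSON response. The URL and parameter names below are placeholders rather than the actual LiveAddress API; check the documentation or the samples repo above for the real endpoint and authentication details.

import requests

# Placeholder endpoint and parameters; the real service's API will differ.
GEOCODE_URL = "https://api.example.com/geocode"

def geocode(street, city, state, api_key):
    """Send one address to a REST geocoding service and return (lat, lon)."""
    resp = requests.get(
        GEOCODE_URL,
        params={"street": street, "city": city, "state": state, "key": api_key},
        timeout=10,
    )
    resp.raise_for_status()
    best_match = resp.json()[0]      # assume a JSON array, best candidate first
    return best_match["latitude"], best_match["longitude"]

# The returned coordinates could then be stored locally for distance queries,
# subject to whatever the provider's terms allow.
lat, lon = geocode("1 Main St", "Syracuse", "NY", api_key="YOUR_KEY")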
You could use a screen scraper with http://www.zip-info.com/cgi-local/zipsrch.exe?ll=ll&zip=13206&Go=Go if you just need to get them once.
Microsoft also provides this service. Check whether it can help you: http://msdn.microsoft.com/en-us/library/cc966913.aspx

About data mining by using twitter data

I plan to write a thesis about using sentiment information to enhance the predictive power of a financial trading model for currencies.
The sentiment data would be Twitter posts containing some keyword, like "EUR.USD". I will then filter for sentiment words to identify the sentiment. Simple idea. Then we try to see whether there is any relation between the degree of sentiment and the movement of EUR.USD.
My big concern is the Twitter data. As we all know, Twitter limits access to historical data; you can only search back about 5 days. That is not enough, since our strategy is based on daily sentiment.
I noticed that Google has a fantastic timeline feature for Twitter updates: http://www.readwriteweb.com/archives/googles_twitter_timeline_lets_you_explore_the_past.php
But first of all, I am in Switzerland, and it seems I don't have this function on my version of Google, which is smart enough to identify my location and may block some US-only features like this. Secondly, even if I could see the fancy interactive Google timeline control in my Firefox, how could I extract the data from my queries and save it? Does Google supply such an API?
The Google service you mentioned was shut down recently, so you won't be able to use it. (http://www.searchenginejournal.com/google-realtime-shuts-down-as-twitter-deal-expires/31007/)
If you need a longer timespan of data to analyze I see the following options:
pay for historical data :) (https://dev.twitter.com/docs/twitter-data-providers)
if you don't want to pay, you need to fetch tweets containing EUR/USD or whatever else you are interested in (you could use the streaming API for this; see the sketch below) and store them somehow. Run this collector for a while (if possible) and you'll have more than just 5 days of data.
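A minimal sketch of such a collector, assuming tweepy's older Stream/StreamListener interface (both the Twitter API and tweepy have changed over time, so treat this as illustrative rather than definitive):

import json
import tweepy

# Fill in your own app credentials from dev.twitter.com.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

class CurrencyListener(tweepy.StreamListener):
    """Append every matching tweet to a local file for later analysis."""
    def on_status(self, status):
        record = {"created_at": str(status.created_at), "text": status.text}
        with open("eurusd_tweets.jsonl", "a") as f:
            f.write(json.dumps(record) + "\n")

    def on_error(self, status_code):
        return False  # disconnect on errors such as 420 (rate limited)

stream = tweepy.Stream(auth=auth, listener=CurrencyListener())
# Track keyword variants; leave this running to accumulate history day by day.
stream.filter(track=["EURUSD", "EUR/USD", "EUR.USD"])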

Reverse geocoding services

I'm working on a project that returns information based on the user's location. I also want to display the user's town in text (no map) so they can change it if it's not accurate.
If things go well I hope this will be more than a small experiment, so can anyone recommend a good reverse geocoding service with the least restrictions? I notice that Google/Yahoo have a limit to the number of daily queries along with other usage terms. I basically need to take latitude and longitude and convert them to a city/town (which I presume cannot be done using the HTML5 Geolocation API).
Geocoda just launched a geocoding and spatial database service and offers up to 1K queries a month free, with paid plans starting at $49 for 25,000 queries/month. SimpleGeo just closed their Context API so you may want to look at Geocoda or other alternatives.
You're correct, the browser geolocation API only provides coordinates.
I use SimpleGeo a lot and recommend them. They offer 10K queries a day free, then 0.25 USD per 1K calls after that. Their Context API is what you're going to want; it pretty much does what it says on the tin. It works server-side and client-side (without requiring you to draw a map, as Google does).
GeoNames can also do this and allows up to 30K "credits" a day; different queries expend different credit amounts. The free service has highly variable performance, while the paid service is more consistent. I've used them in the past, but don't use them much anymore because of the difficulty of automatically dealing with their data, which is more "pure" but less meaningful to most people.
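For illustration, here is a small Python sketch of reverse geocoding a coordinate pair to a town name with GeoNames' findNearbyPlaceNameJSON endpoint (response fields and credit accounting may have changed, so double-check the GeoNames docs; the username below is a placeholder for your own free account):

import requests

def town_from_coords(lat, lng, username="YOUR_GEONAMES_USERNAME"):
    """Return a nearby place name for the given coordinates using GeoNames."""
    resp = requests.get(
        "http://api.geonames.org/findNearbyPlaceNameJSON",
        params={"lat": lat, "lng": lng, "username": username},
        timeout=10,
    )
    resp.raise_for_status()
    places = resp.json().get("geonames", [])
    if not places:
        return None
    nearest = places[0]
    return f"{nearest['name']}, {nearest['countryName']}"

# e.g. coordinates obtained client-side from the HTML5 Geolocation API:
print(town_from_coords(51.5074, -0.1278))   # expected to be near London, United Kingdom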

Using ATOM, RSS or another syndication feed for paid content

I work for a publishing house and we're discussing different ways to sell our content over digital channels.
Besides the web, we're closely watching the development of content publishing on tablets (e.g. iPad) and smartphones (e.g. iPhone). Right now, it looks like there are four different approaches:
Conventional publishing houses release apps like The Daily, Wired or Time Magazine. Personally I call them Print-Content-Meets-Offline-Website magazines. Very nice to look at, but slow, very heavy in data size, and often inconsistent on the usability side. Besides that, these magazines don't co-exist well in a world where Facebook and Twitter are where users spend most of their time and share content.
Plain and stupid PDF. More or less lightweight, but as interactive and shareable as a granite block. A model mostly used by conventional publishers and apps like Zinio.
Websites with customized views for different devices (like Die Zeit's tablet-enhanced website). Lightweight, but (at least until now) not able to really exploit a hardware platform as a native app can.
Apps like Flipboard, Reeder or Zite go a different way: relying on Twitter, Facebook and/or syndication feeds like RSS and Atom, they give the user a very personalized way to consume news and media. Besides that, the data behind them is as lightweight as possible, and the architecture for distributing the data is fast and has proven reliable for years.
Personally, I think #4 is the way to go. Unfortunately, the apps mentioned only distribute free content, and as a publishing house we're also interested in distributing paid content.
I did some research, googled around, and came to the conclusion that there is no standardized way to protect and sell individual articles in a syndication feed.
My question:
Do you have any hints or ideas how this could be implemented in a platform-agnostic way? Or is there an existing solution I just haven't found yet?
Update:
This article explains exactly what we're looking for:
"What publishers and developers need is a standard API that enables distribution of content for authorized purposes, monitors its use, offers standard advertising units and subscription requirements, and provides a way to share revenues."
Just brainstorming, so take it for what it's worth:
Feed readers can't handle purchases, but most of them at least let you authenticate to feeds, right? If your feed required authentication, you would be able to tie the retrieval of Atom entries to a given user account. The retrieval could check the user account against purchased articles and make sure those entries were populated with the full paid content.
For unpurchased content, the feed gets populated with a link that takes you to a Buy The Article page. You adjust that user account, and the next time the feed is updated, it shows the full content. You could even offer "article tracks" or something like that, where someone can buy everything written by a given author or everything matching some search criteria. You could adjust rates accordingly.
You also want to be able to allow people to refer articles to others via social media sites and blogs and so forth. To facilitate this, the article URLs (and the atom entry ids) would need to be the same whether they are purchased or not. Only the content of the feed changes depending on the status of the account accessing the feed.
The trick, it seems to me, is providing enough enticement to get people to create an account. Presumably, you'd need interesting things to read and probably some percentage of it free so that it leaves people wanting more.
Another problem is preventing redistribution of paid content to free channels. I don't know that there is a way to completely prevent this. You'd need to monitor the usage of your feeds by account to look for access anomalies, but it's a hard problem.
Solution we're currently following:
We'll use the same Atom feed for paid and free content. A paid content entry in the feed will have no content (besides title, summary, etc.). If a user chooses to buy that content, the missing content is fetched from a webservice and inserted into the feed.
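Here is a minimal Python sketch of how that per-account feed generation could look. Everything in it (the has_purchased() lookup, the entry fields) is hypothetical and just illustrates serving the same entry IDs and URLs to everyone while only including full content for purchasers:

from xml.sax.saxutils import escape

def has_purchased(account, article_id):
    """Hypothetical lookup against the purchase records for this account."""
    return article_id in account.get("purchases", set())

def render_entry(article, account):
    """Render one Atom entry: same id and link for everyone, but the <content>
    element is only included if this account has bought the article."""
    lines = [
        "<entry>",
        f"  <id>{escape(article['id'])}</id>",
        f"  <title>{escape(article['title'])}</title>",
        f'  <link href="{escape(article["url"])}"/>',
        f"  <summary>{escape(article['summary'])}</summary>",
    ]
    if has_purchased(account, article["id"]):
        lines.append(f'  <content type="html">{escape(article["body"])}</content>')
    else:
        # Unpurchased: point the reader at a buy-this-article page instead.
        lines.append(f'  <link rel="alternate" href="{escape(article["url"])}?buy=1"/>')
    lines.append("</entry>")
    return "\n".join(lines)

article = {"id": "tag:example.com,2011:article-42", "title": "Sample article",
           "url": "https://example.com/articles/42", "summary": "Teaser text...",
           "body": "<p>Full article body...</p>"}
print(render_entry(article, account={"purchases": {"tag:example.com,2011:article-42"}}))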
Downside: The buying-process is not implemented in any existing feedreader.
Anyone got a better idea?
I was looking for something else, but I came across the Flattr RSS plugin for WordPress.
I didn't have time to look through it, but maybe you can find some useful ideas in it.

Resources