Comparing fan overlap between two Facebook pages - analysis

What's the best way to compare the fan overlap between two pages? For instance, if I were analyzing Coke and Pepsi, how could I find the:
1) Number of fans that like both pages
2) The demographics of the fans that like both pages
3) The names of fans that like both pages (although I would expect that this is protected for privacy purposes)
Thanks.
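To be concrete about what I'd compute if the fan lists were obtainable (as far as I know, the Graph API doesn't expose a page's individual fans, so the lists below are hypothetical), the overlap itself is just a set intersection. A minimal Ruby sketch:

    # Hypothetical fan lists as arrays of user IDs.
    coke_fans  = %w[101 102 103 104]
    pepsi_fans = %w[103 104 105]

    both = coke_fans & pepsi_fans            # Array#& is set intersection
    puts "Fans of both pages: #{both.size}"  # => 2

    # With per-user demographic records, the demographics of the
    # overlap are just a lookup over the intersected IDs.
    demographics = {
      "103" => { age: 24, country: "US" },
      "104" => { age: 31, country: "UK" },
    }
    both.each { |id| p demographics[id] }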

Related

When searching a list of specific sites in Google's Programmable Search, YouTube dominates the results. Any way to balance it among the site list?

I've given Google's Programmable Search a list of sites; for now I'm starting with:
But when I search any query, I get pretty much only YouTube results, which I believe is because YouTube has the highest page rankings.
Is there any way to customize this to be more balanced among the different sites?
See below for more detail on what I'm thinking.
Possible Solution #1: Round-Robin
Display the top result from the first site, then the top result from the next site, and so on until every site has had its top result displayed, then start over with each site's second result.
Possible Solution #2: Site-to-Page Rank Ratio
Let's pretend that YouTube's site "rating" is a 10, and Reddit's is a 5. Now, in a list of search results, let's say that youtube.com/some-result has a rating of 8, and reddit.com/some-other-result has a rating of 6. In this case, the reddit.com result should display first, even though the youtube.com result has a higher absolute ranking, because the Reddit page's page-to-site ratio is higher (6/5 = 1.2 vs. 8/10 = 0.8).
I have no idea if #2 would even be possible, but maybe it can serve as an illustration of what I'm looking for. I'd be plenty happy with a simple #1 round-robin approach.
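For reference, a minimal Ruby sketch of #1, assuming you fetch a ranked result list per site yourself (e.g. one site-restricted query per site) rather than relying on Programmable Search's combined ordering; results_by_site is hypothetical data:

    # One ranked result list per site, best result first (hypothetical data).
    results_by_site = {
      "youtube.com" => ["yt1", "yt2", "yt3"],
      "reddit.com"  => ["rd1", "rd2"],
      "github.com"  => ["gh1"],
    }

    # Round-robin: take the top remaining result from each site in turn
    # until every list is exhausted.
    interleaved = []
    queues = results_by_site.values.map(&:dup)
    until queues.all?(&:empty?)
      queues.each { |q| interleaved << q.shift unless q.empty? }
    end
    p interleaved  # => ["yt1", "rd1", "gh1", "yt2", "rd2", "yt3"]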

Rails: How to query records in different languages

I have a Rails inventory app that is available to global users, allowing them to enter their own inventory information and query that of others. For example:
a British person in London adds 10 units of "bicycle" to the inventory table
a Japanese person adds 2 units of 自転車 (bicycle in Japanese)
a Vietnamese person adds 5 units of xe đạp (bicycle in Vietnamese)
The British person can query 'bicycle' and it will output all bicycles in the system (17 units) and can show the details of each in their original language, without the users classifying them beforehand. Likewise, the Japanese person can query '自転車', which will show all bicycles.
How can this be done?
The globalize gem requires users to manually translate each record, so it isn't the right fit here. I've heard about machine learning and deep learning, but I don't know if they're the right solution for this.
If Stack Overflow is not the right place to ask this, where should I ask? Quora does not allow long questions.
Machine learning does not seem like a proper solution in this context: it's a complex field, and without prior experience it would take a long time to learn enough to apply it to a real-life problem.
Here are a few solutions you could implement today, as long as you understand the requirements and the upsides/downsides of each; you will have to weigh those yourself.
Since I don't have enough information about your system, I'll generalize it to something likely.
Solutions:
1. Define a limited number of items for your system, like Bike, and add them to a config file or an items database, each item having its unique ID. When users add something, they have to select from your list. Have an "Other" item as a catch-all, and maybe provide a note field so users can add anything that identifies the item.
2. Similar to the above, but give users a way to add new items to the system: you start with, say, 10 standard items, and every user can add items to the site (subject to moderation) that other users then have access to.
3. Have a solid search system in place, like Elasticsearch (or anything else). When a user creates an item, index it in the language it was entered in, then use the Google Translation API (or another translation service) to translate it into all the languages you need and index those for search as well; see the sketch after this list.
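For solution 3, a rough sketch of the indexing step, assuming the elasticsearch Ruby gem; the "inventory" index name and the translate helper (standing in for whatever translation service you pick) are assumptions, not a definitive implementation:

    require "elasticsearch"

    client = Elasticsearch::Client.new(url: "http://localhost:9200")

    TARGET_LANGS = %w[en ja vi]

    # Stub: replace with a real call to a translation service
    # (e.g. Google Cloud Translation). Echoes the input for now.
    def translate(text, to:)
      text
    end

    # Index the item once, with its name in every target language.
    def index_item(client, item)
      names = TARGET_LANGS.to_h do |lang|
        [lang, lang == item[:lang] ? item[:name] : translate(item[:name], to: lang)]
      end
      client.index(
        index: "inventory",
        id:    item[:id],
        body:  { name: names, units: item[:units], source_lang: item[:lang] }
      )
    end

    index_item(client, { id: 1, name: "bicycle", lang: "en", units: 10 })

    # A query in any language then matches the corresponding sub-field:
    client.search(index: "inventory",
                  body: { query: { match: { "name.ja" => "自転車" } } })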
I think solution 1 is the best if you are able to implement it, followed by solution 2.

Typical crawling depth by search engines

When a site is crawled by a search engine (Google, Bing, etc.), what is the typical maximum depth the engine will crawl into the site? By depth, I mean the number of hops from the homepage.
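To make "depth" concrete: I mean the hop count in a breadth-first crawl starting at the homepage. A minimal Ruby sketch, where extract_links is a hypothetical stand-in for fetching a page and parsing its outgoing links:

    require "set"

    # Stub: replace with "fetch the page and parse its <a href> links".
    def extract_links(url)
      []
    end

    # Breadth-first crawl, counting hops from the homepage.
    def crawl(homepage, max_depth:)
      seen  = Set[homepage]
      queue = [[homepage, 0]]
      until queue.empty?
        url, depth = queue.shift
        next if depth >= max_depth
        extract_links(url).each do |link|
          next unless seen.add?(link)  # add? returns nil if already present
          queue << [link, depth + 1]
        end
      end
      seen
    end

    crawl("https://example.com/", max_depth: 10)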
Thanks,
It depends on the overall rank of your site, and the rank of incoming links, especially if they aren't pointing at your homepage.
Crawlers for smaller search engines like blekko aren't going to go that far away from landing-points of external links, unless your overall site is awesome or you have lots of links from awesome sites. We save our crawling and indexing energy for stuff with higher rank, so if our estimate is that a page will have poor rank, we won't bother.
Google's crawler might crawl quite a distance even if you only have a poor inlink profile - but even they know about 10x more URLs than they actually crawl.
If you want to crawl the whole web, then a depth of 19 is enough, because the whole web is reachable within about 19 hops. But if you want to crawl a specific domain or country, then a depth of 10 is quite enough.
I found this information in a paper that was used in developing Mercator.
Thanks,
Mohiul Alam Prince

Using ATOM, RSS or another syndication feed for paid content

I work for a publishing house and we're discussing different ways to sell our content over digital channels.
Besides the web, we're closely watching the development of content publishing on tablets (e.g. iPad) and smartphones (e.g. iPhone). Right now, it looks like there are four different approaches:
Conventional publishing houses release apps like The Daily, Wired or Time Magazine. Personally, I call them Print-Content-Meets-Offline-Website magazines: very nice to look at, but slow, very heavy in data size, and often inconsistent on the usability side. Besides that, these magazines don't co-exist well in a world where Facebook and Twitter are where users spend most of their time and share content.
Plain and stupid PDF. More or less lightweight, but as interactive and shareable as a granite block. A model mostly used by conventional publishers and apps like Zinio.
Websites with customized views for different devices (like Die Zeit's tablet-enhanced website). Lightweight, but (at least until now) not able to really exploit a hardware platform as a native app can.
Apps like Flipboard, Reeder or Zite go a different way: relying on Twitter, Facebook and/or syndication feeds like RSS and Atom, they give the user a very personalized way to consume news and media. Besides that, the data behind them is as lightweight as possible, and the architecture that distributes it is fast and has proven reliable for years.
Personally, I think #4 is the way to go. Unluckily the mentioned Apps only distribute free content and as a publishing house we're also interested in distributing paid content.
I did some research, googled around, and came to the conclusion that there is no standardized way to protect and sell individual articles in a syndication feed.
My question:
Do you have any hints or ideas how this could be implemented in a platform-agnostic way? Or is there an existing solution I just haven't found yet?
Update:
This article explains exactly what we're looking for:
"What publishers and developers need is
a standard API that enables
distribution of content for authorized
purposes, monitors its use, offers
standard advertising units and
subscription requirements, and
provides a way to share revenues."
Just brainstorming, so take it for what it's worth:
Feed readers can't do buying, but most of them at least let you authenticate to feeds, right? If your feed required authentication, you could tie the retrieval of Atom entries to a given user account. The retrieval could check the user account against purchased articles and make sure those entries were populated with the full paid content.
For unpurchased content, the feed gets populated with a link that takes you to a Buy The Article page. You adjust that user account, and the next time the feed is updated, it shows the full content. You could even offer "article tracks" or something like that, where someone can buy everything written by a given author or everything matching some search criteria, and adjust rates accordingly.
You also want to allow people to refer articles to others via social media sites, blogs and so forth. To facilitate this, the article URLs (and the Atom entry IDs) would need to be the same whether they are purchased or not. Only the content of the feed changes, depending on the status of the account accessing it.
The trick, it seems to me, is providing enough enticement to get people to create an account. Presumably, you'd need interesting things to read and probably some percentage of it free so that it leaves people wanting more.
Another problem is preventing redistribution of paid content to free channels. I don't know that there is a way to completely prevent this. You'd need to monitor the usage of your feeds by account to look for access anomalies, but it's a hard problem.
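A minimal sketch of that per-account feed generation in Ruby, using the builder gem; the Article fields, the purchased? lookup, and the /buy URL pattern are all assumptions about your data model, not a definitive implementation:

    require "builder"

    Article = Struct.new(:id, :title, :summary, :body, :url, keyword_init: true)

    # Stand-in for your account/purchase lookup.
    def purchased?(account, article)
      account[:purchases].include?(article.id)
    end

    # Entry IDs and URLs are identical for every account; only the
    # content varies with purchase status.
    def feed_for(account, articles)
      xml = Builder::XmlMarkup.new(indent: 2)
      xml.feed(xmlns: "http://www.w3.org/2005/Atom") do
        xml.title "Paid feed"
        articles.each do |a|
          xml.entry do
            xml.id a.id
            xml.title a.title
            xml.link(href: a.url)
            if purchased?(account, a)
              xml.content a.body, type: "html"
            else
              xml.summary a.summary
              # Unpurchased: point at the Buy The Article page instead.
              xml.link(href: "#{a.url}/buy", title: "Buy this article")
            end
          end
        end
      end
      xml.target!
    end

    account  = { purchases: ["a1"] }  # hypothetical account record
    articles = [Article.new(id: "a1", title: "First", summary: "Teaser",
                            body: "<p>Full text</p>", url: "https://example.com/a1")]
    puts feed_for(account, articles)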
Solution we're currently following:
We'll use the same Atom feed for paid and free content. A paid-content entry in the feed will have no content (besides title, summary, etc.). If a user chooses to buy that content, the missing content is fetched from a web service and inserted into the feed.
Downside: the buying process is not implemented in any existing feed reader.
Anyone got a better idea?
I was looking for something else, but I came across the Flattr RSS plugin for WordPress.
I haven't had time to look through it, but maybe you can find some useful ideas in it.

Contest ranking question - how to rank entries in multiple categories?

I'm currently developing a video contest web application using Ruby on Rails. It integrates closely with YouTube, which it uses for submitting videos, comments, average rating, and popularity stats. The application will also count Twitter and (possibly) Facebook mentions, and count the number of times visitors have clicked an "Add This" social network button.
Instead of direct voting it will use each video's YouTube rating and social media presence to pick a winner.
My question is: What is the fairest method for ranking the entries?
My basic idea is to find each video's rank in each category separately by sorting the results of an ActiveRecord query, then compute the average of these per-category ranks and use that as the video's master rank. Then I'd sort all the entries by this master rank, with the lowest number coming first. Is this a fair way to rank the contest entries?
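For concreteness, a minimal Ruby sketch of that averaging, with hypothetical per-category scores standing in for the YouTube and social-media numbers:

    # Hypothetical per-category scores for each video (higher is better).
    scores = {
      "video_a" => { rating: 4.5, views: 1200, mentions: 30 },
      "video_b" => { rating: 4.8, views:  900, mentions: 45 },
      "video_c" => { rating: 4.1, views: 2000, mentions: 10 },
    }
    categories = %i[rating views mentions]

    # Rank within each category (1 = best), then average the ranks.
    ranks = Hash.new { |h, k| h[k] = [] }
    categories.each do |cat|
      scores.keys.sort_by { |v| -scores[v][cat] }
            .each_with_index { |v, i| ranks[v] << i + 1 }
    end

    master = ranks.transform_values { |rs| rs.sum.to_f / rs.size }
    p master.sort_by { |_, avg| avg }
    # => video_b first (avg ~1.67), then video_a (2.0), then video_c (~2.33)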
Shouldn't the contest organizer be telling you how they want everything rated?
I would personally count the number of YouTube submissions, add up the score for each, then divide by the number to get the average score, and then supplement that somehow with social media mentions, but it is up to them to tell you which should carry more weight. They have to understand that you can design the app to do whatever they want, but they are in charge of letting you know precisely what they want. That sort of decision should not be left up to the designer. Let them wrestle with it in committee for a bit; don't sweat the actual algorithm until they come up with the answer for you.
It depends on what you're trying to accomplish. However, it seems to me that the social media score is pointless. The net result is that someone bothered to watch and/or rate the video on YouTube. Those scores alone should tell you if someone is "doing a good job" on the social media front.
