I'm building a portal that lists certain products and automatically gets the prices from the product pages of the listed vendors. To get the URL for the product page on a vendor's website, I've been using the Google search API and it's been working great - the first result is invariably the product's page. However, now I'm getting errors saying that Google has blocked my website (actually my development machine's IP) from the API because I've been making automated requests such as scraping (the only item that applies).
Fine, Google can go jump off a cliff, but... how do product portals generally get URLs for their products? I can enter the URLs manually, but that can be a problem if the vendor's website changes the URL scheme somehow. I obviously need an automated way to do this.
I'm making no more than 50-60 requests per day so I don't get what Google wants. Do they want money?
First, they want you to use one of their APIs, not scrape their web page directly. Their custom search API is documented here. Once you register they'll give you an API key. You can get results in JSON format by requesting
https://www.googleapis.com/customsearch/v1?q=SEARCH_TERMS&key=YOUR_KEY
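For example, a minimal Python sketch of that request (note the JSON API also expects a cx parameter identifying your custom search engine; the key, cx value and query below are placeholders):

import requests

API_KEY = "YOUR_KEY"          # from the Google API console
SEARCH_ENGINE_ID = "YOUR_CX"  # the custom search engine ID ("cx"), also required by the JSON API

def first_result_url(query):
    """Return the link of the top search result, or None if there are no hits."""
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"q": query, "key": API_KEY, "cx": SEARCH_ENGINE_ID},
    )
    resp.raise_for_status()
    items = resp.json().get("items", [])
    return items[0]["link"] if items else None

# e.g. look up a vendor's product page (illustrative query)
print(first_result_url("site:vendor.example some product name"))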
Second, they do like money, but you might be okay. You're allowed 100 searches per day for free; beyond that you're going to be charged $5 per thousand searches.
I've got a bunch of free online HTML, CSS, and JS tutorials under my belt and I want to try using them to make a browser extension. But I want to make sure that the data I want to use is actually accessible before getting started.
My goal is to make a browser extension for twitter.com that shows the number of impressions of any tweet next to the likes, retweets, and replies. My basic idea is to get the status URL of any given tweet, poll the Twitter API for the number of impressions of that tweet, store that in a variable, and then use CSS to display a little eye icon and the number stored in the impressions variable.
I know that I can find the number of impressions of all of my own tweets, both through Twitter Analytics and also just by going to my profile page and clicking the little bar chart icon next to views, retweets, etc. But I'm not clear on whether I can do that for other people's tweets via Twitter's API or anything else. Is that possible?
For the record, I'm not too concerned about the varying definition of "impression," since it will be consistently applied across all tweets and I'm mostly interested in giving users a comparison between tweets. This is part of a research project to see how this might change how people engage with social media if they know how many views a given post has. If there's a simpler way to go about that using existing platforms, I'm open to suggestions.
Thanks for the advice!
No, impressions data is private. If you are authenticated to the Twitter API then you can use the new Twitter Developer Labs Tweets API to get private metrics like impressions, but you cannot get that for other people's Tweets. Also, the Twitter API does not support CORS, so I don't think you'll be successful trying to use it from a browser extension.
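For completeness, if you only need impressions for your own tweets, a rough Python sketch along these lines would work (this uses the newer v2 tweets endpoint rather than the Labs one, and assumes you have user-context OAuth 1.0a credentials for your own account; the credentials and tweet ID are placeholders):

import requests
from requests_oauthlib import OAuth1

# user-context credentials for your own account (placeholders)
auth = OAuth1("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

resp = requests.get(
    "https://api.twitter.com/2/tweets/1234567890123456789",  # must be one of *your* tweets
    params={"tweet.fields": "non_public_metrics"},            # includes impression_count
    auth=auth,
)
resp.raise_for_status()
print(resp.json()["data"]["non_public_metrics"]["impression_count"])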
I have a range of Google docs that are publicly viewable, but I would like to get some information about how often they are being viewed. I understand that there used to be a way of doing this with Google Analytics, but now that has been removed.
It seems to me that I have two main options, one of which is to make all my doc links point to a page which redirects according to a query string parameter, e.g.:
http://myurl.net?page=1 # Sends you to one page and logs the visit
http://myurl.net?page=2 # Sends you to another page and logs the visit
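Roughly what I have in mind for that logging page, sketched as a tiny Flask app (the target doc URLs are placeholders, and the redirect call is the part I'm unsure about):

from flask import Flask, request, redirect

app = Flask(__name__)

# page number -> actual Google Doc URL (placeholders)
TARGETS = {
    "1": "https://docs.google.com/document/d/DOC_ID_1/view",
    "2": "https://docs.google.com/document/d/DOC_ID_2/view",
}

@app.route("/")
def track_and_redirect():
    page = request.args.get("page")
    app.logger.info("doc %s viewed", page)   # or write to a database / log file
    return redirect(TARGETS.get(page, "https://docs.google.com/"))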
Or alternatively, I could try to embed some code in each doc that makes a call back to the server with its page number. But I don't know if this is possible.
The first option looks like it should be fairly easy, but I don't see how to redirect the client.
Could anyone give me some ideas about how to do this? It seems it would be useful for quite a lot of people.
Many thanks.
Justin.
The question I'm trying to answer for a set of users is how other users end up on their page. There are about 5 different ways a user can end up on your page. For example, they could have searched your name, clicked a link from a newsfeed or received an e-mail with a link to your page.
What is the best way to accomplish tracking these events? I'm initially inclined to create a table to track this. Each link would send an async event to the server to be added to the table. However, I'm also aware that there are many tracking services out there such as Google Analytics and Mixpanel. I've looked at their docs briefly and they don't seem to fit my need.
Am I missing something? Is it worth it to create a "custom" event tracking system to accomplish this?
It is not worth creating your own service. Besides, you cannot add async events to search engine result pages or emails (that would require tracking code, which you cannot inject into search results and which would not be executed in mail clients).
Web analytics software tracks traffic sources by analyzing the HTTP headers of incoming requests. If a referrer is set, the traffic is attributed to, well, the referring site - unless the referrer is on a list of known search engines, in which case it is attributed to organic search traffic, and so on.
In most systems you can customize source attribution by adding query parameters to the URL (obviously this will not work with search engines and the like, since you cannot add parameters to organic search results). For example, with Google Analytics you can add custom campaign parameters to email links or advertising campaigns. If people click on those links, the parameter values are sent to GA and the source/medium/campaign information is set accordingly (e.g. traffic from web mail clients would usually be attributed as a referrer, but campaign parameters allow you to attribute the link to your mail campaigns).
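For example, a link in an email newsletter might be tagged like this (the parameter values are up to you):

https://www.example.com/some-page?utm_source=newsletter&utm_medium=email&utm_campaign=spring_mailing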
There might be reasons to create your own system, but channel attribution is not one of them; GA and every other system I know of has this thoroughly covered.
I am new to Twitter and need some tips.
I need to display a tweet feed from multiple users on a webpage.
The first thing I stumbled upon is Embedded Timelines. It allows you to display tweets from a list of users, but the gotcha is that the list has to be maintained on Twitter's side (i.e. I cannot just specify @qwe and @asd on my side and get a timeline without adding those users to a list on Twitter's side).
The thing is that the list of users that should be included in the timeline is dynamic, and managing those lists through the Twitter API will probably be painful. Not to mention that my website will probably generate tons of those lists, and I feel that I will violate some API quotas sooner or later.
So, my question is - am I stuck with using Embedded Timelines that refer to a user list on Twitter's side and managing those lists through, say, the Twitter REST API, or is there a simpler way to do what I want?
It's pretty simple to display tweets for multiple users.
Links to start with
This post explains some of the search queries you can make
This post is a simple library to make requests to the twitter API that 'just works'
Your Query
Okay, so you want multiple users. The endpoint you're looking at using is the search/tweets one: https://api.twitter.com/1.1/search/tweets.json.
The query string uses the from: operator, and you can combine multiple from: clauses with OR (a space acts as an implicit AND).
An example query for the GET request:
?q=from:user1+OR+from:user2
Read more about the search API queries here.
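Putting that together, a quick Python sketch of the call (assuming an app-only bearer token; the usernames are placeholders):

import requests

BEARER_TOKEN = "YOUR_BEARER_TOKEN"   # app-only auth token for your Twitter app

resp = requests.get(
    "https://api.twitter.com/1.1/search/tweets.json",
    params={"q": "from:user1 OR from:user2", "count": 20},
    headers={"Authorization": "Bearer " + BEARER_TOKEN},
)
resp.raise_for_status()
for tweet in resp.json()["statuses"]:
    print(tweet["user"]["screen_name"], tweet["text"])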
Your "over-the-quote" issue
This is something you're going to need to figure out yourself. Depending on the number of requests you expect to make and the Twitter-imposed limits, you could cache results (or save them when you hit your limit) and only pull from the cache while you're rate-limited - something like the sketch below.
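A rough sketch of that caching idea (the TTL is arbitrary; fetch would be the search call from the sketch above):

import time

CACHE_TTL = 300   # seconds - arbitrary; tune it to stay under Twitter's rate limits
_cache = {}       # query -> (fetched_at, tweets)

def cached_search(query, fetch):
    """Return cached tweets for `query` if still fresh, otherwise call fetch(query) and cache the result."""
    now = time.time()
    if query in _cache and now - _cache[query][0] < CACHE_TTL:
        return _cache[query][1]
    tweets = fetch(query)
    _cache[query] = (now, tweets)
    return tweets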
Recently, search engines have been able to index dynamic content on social networking sites. I would like to understand how this is done. Are there static pages created by a site like Facebook that update semi-frequently? Does Google attempt to store every possible user name?
As I understand it, a page like www.facebook.com/username is not an actual file stored on disk but is shorthand for a query (something like select * from users where username = '...') whose result is rendered into the page. How does Google know about every user? This gets even more complicated when things like tweets are involved.
EDIT: I guess I didn't really ask what I wanted to know about. Do I need to be as big as Twitter or Facebook in order for Google to make special ways to crawl my site? Will Google automatically find my users' profiles if I allow anyone to view them? If not, what do I have to do to make that work?
In the case of tweets in particular, Google isn't 'crawling' for them in the traditional sense; they've integrated with Twitter to provide the search results in real-time.
In the more general case of your question, dynamic content is not new to Facebook or Twitter, though it may seem to be. Google crawls a URL; the URL provides HTML data; Google indexes it. Whether it's a dynamic query that's rendering the page, or whether it's a cache of static HTML, makes little difference to the indexing process in theory. In practice, there's a lot more to it (see Michael B's comment below.)
And see Vartec's succinct post on how Google might find all those public Facebook profiles without actually logging in and poking around FB.
OK, that was vastly oversimplified, but let's see what else people have to say..
As far as I know Google isn't able to read and store the actual contents of profiles, because the Google bot doesn't have a Facebook account, and it would be a huge privacy breach.
The bot works by hitting facebook.com and then following every link it can find. Whatever content it sees on the page it hits, it stores. So even if it follows a dynamic url like www.facebook.com/username, it will just remember whatever it saw when it went there. Hopefully in that particular case, it isn't all the private data of said user.
Additionally, Facebook can and does provide special instructions (e.g. via robots.txt) that search bots can follow, so that Google results don't include a bunch of login pages.
There are also other ways for the bot to discover those pages: profiles can be linked to from outside the site, and the site may provide a sitemap listing them.
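For example, a minimal sitemap entry for a public profile page might look like this (the URL is a placeholder):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/username</loc>
    <changefreq>weekly</changefreq>
  </url>
</urlset>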