I am using Twython with the streaming API to retrieve specific tweets. There is a limit of 1% of the total volume of tweets that can be gathered. If a stream reaches this limit (https://dev.twitter.com/discussions/2907), there should be a message in the JSON stream like
{"limit":{"track":1234}}
I tried to reach the limit by tracking every tweet that contains 'the', or every tweet in the whole world with location='-180.,-90.,180,90.', but I never received this limit/track message.
Could you tell me if I can have access to it with Twython?
Best regards,
F.
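For reference, here is a minimal sketch of where a limit notice would surface with Twython's TwythonStreamer, assuming notices are passed through to on_success like any other stream message (credentials are placeholders, and whether a notice ever arrives depends on actually exceeding the 1% cap):

from twython import TwythonStreamer

class LimitAwareStreamer(TwythonStreamer):
    def on_success(self, data):
        # A limit notice arrives as its own JSON object, e.g. {"limit": {"track": 1234}}
        if 'limit' in data:
            print('Undelivered tweets so far:', data['limit'].get('track'))
        elif 'text' in data:
            print(data['text'])  # a normal tweet

    def on_error(self, status_code, data):
        print('Stream error:', status_code)

# Placeholder credentials
APP_KEY, APP_SECRET = 'app-key', 'app-secret'
OAUTH_TOKEN, OAUTH_TOKEN_SECRET = 'oauth-token', 'oauth-token-secret'

stream = LimitAwareStreamer(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
stream.statuses.filter(track='the')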
I have a requirement to fetch the response times and the counts of 2xx, 4xx and 5xx requests from access logs and store them in InfluxDB (for graphing and alerting purposes).
I know I can use Telegraf to parse the log files and keep sending data to InfluxDB, and then get these counts by running queries on that data.
But that way I will be sending a lot of data points to the InfluxDB server.
What I am trying to find out is whether there is any way to send only processed data to InfluxDB, such as requests/sec and 2xx/4xx/5xx requests/sec.
I have been reading various threads and blog posts, but couldn't find anything that matches.
Any help would be appreciated.
Thanks.
"c:\Program Files (x86)\Log Parser 2.2\LogParser.exe"
SELECT sc-status, COUNT(*) FROM [logfile]
GROUP BY sc-status
You can add arithmetic to the SELECT statement to calculate per-second rates.
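If you would rather pre-aggregate yourself and ship only the summary, a minimal sketch along these lines might work, assuming the influxdb Python client and a combined-format access log (the file path, measurement and field names are illustrative):

from collections import Counter
from influxdb import InfluxDBClient

# Tally status classes (2xx/4xx/5xx/...) from a combined-format access log
counts = Counter()
with open('access.log') as logfile:            # path is a placeholder
    for line in logfile:
        parts = line.split('"')                # the status code follows the quoted request
        try:
            status = int(parts[2].split()[0])
        except (IndexError, ValueError):
            continue
        counts['%dxx' % (status // 100)] += 1

interval_seconds = 60.0                        # the window these counts were collected over
if counts:
    client = InfluxDBClient(host='localhost', port=8086, database='weblogs')
    point = {
        'measurement': 'http_requests',        # illustrative measurement name
        'fields': {cls: cnt / interval_seconds for cls, cnt in counts.items()},
    }
    client.write_points([point])               # one aggregated point instead of one per log line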
I get that there is a cost incurred when I use the YouTube API service, but what I would like to know is whether the cost is per request or not.
For example, when I query the meta data of 3 videos, would the cost be tripled for that one request, or would the cost be the same as if I query the meta data for 1 video?
I assume you are talking about the quota for the YouTube API v3. I suggest you visit this link, a quota calculator:
This tool lets you estimate the quota cost for an API query. All API requests, including invalid requests, incur a quota cost of at least one point.
https://developers.google.com/youtube/v3/determine_quota_cost
would the cost be tripled for that one request, or would the cost be the same as if I query the meta data for 1 video?
We can assume "the cost would be the same as if I query the meta data for 1 video", because they speak about a "request", like this:
GET https://www.googleapis.com/youtube/v3/videos?part=snippet COST 3
A request for multiple videos looks like this:
GET https://www.googleapis.com/youtube/v3/videos?part=snippet&id=zMbIipvQL0c%2CLOcKckBLouM&key={YOUR_API_KEY}
Which is also one request, so it's also a cost of 3!
The real deal is when you have multiple pages:
Note: If your application calls a method, such as search.list, that returns multiple pages of results, each request to retrieve an additional page of results will incur the estimated quota cost.
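To illustrate with a rough sketch (plain HTTP via the requests library; the key is a placeholder): asking for both IDs from the example above in one videos.list call is still a single request, so it is billed once, not once per video:

import requests

API_KEY = 'YOUR_API_KEY'                       # placeholder
video_ids = ['zMbIipvQL0c', 'LOcKckBLouM']     # the two IDs from the example above

resp = requests.get(
    'https://www.googleapis.com/youtube/v3/videos',
    params={'part': 'snippet', 'id': ','.join(video_ids), 'key': API_KEY},
)
resp.raise_for_status()
print(len(resp.json().get('items', [])), 'videos returned by one request (cost 3)')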
I am processing Twitter tweets using twitter4j (configured via twitter4j.properties) through Storm bolts. My topology looks like:
TopologyBuilder builder = new TopologyBuilder();
// Kafka spout feeding raw tweets into the topology
builder.setSpout("KafkaSpout", new KafkaSpout(kafkaConfig), 2).setNumTasks(4);
// Pre-processing bolt
builder.setBolt("Preprocesing", new preprocessBolt2(), 2)
        .setNumTasks(4).shuffleGrouping("KafkaSpout");
// Average-scoring bolt: looks up each user's CSV in S3 and scores every tweet
builder.setBolt("AvgScoreAnalysis", new AvgScoringBolt(), 4)
        .setNumTasks(8)
        .fieldsGrouping("Preprocesing", new Fields("tweetId"));
// Writes the scored tweets to a local file
builder.setBolt("PrinterBolt", new LocalFile(), 6).setNumTasks(4)
        .shuffleGrouping("AvgScoreAnalysis");
The KafkaSpout reads tweets and sends them to a bolt for pre-processing. My problem is in the AvgScoringBolt, where I call S3: I have a CSV for each user and compute a score per user for every single tweet. Since I have 100 users, the bolt has to compute an average score for each tweet across all of the users' CSVs in S3. This is pretty slow; how can I improve the performance of this bolt? Also, there are many duplicates in the output file; how can I remove them?
Calling S3 from the AvgScoringBolt is not a great idea if you want high performance: unless you're filtering the tweets by some criteria, there is no way to make a connection to S3 for every tweet and still process thousands of tweets per second.
Since there are only 100 users, maybe you could download the users' CSVs when the application starts, do the computation inside the bolt without connecting to S3 (using the downloaded CSVs), and periodically upload the updated CSVs back to S3 to keep a loose synchronization. I don't know if this scenario fits your requirements.
I am trying to create a Yahoo Pipe that ideally takes all tweets tweeted at any point in time and filters them down by a number of attributes, then displays the filtered feed.
Basically in order this is what I want to happen:
Get a feed of all tweets at any one time.
Filter tweets by geolocation origin, e.g. the UK,
Filter by a number of different combinations of keywords.
Output as an RSS feed (though this isn't really the crucial stage as Yahoo Pipes takes care of this anyway)
Disclaimer: of course I understand that there are limits to the number of tweets that could come through etc., but I would like to cast the input net as wide as possible.
I have managed to get stages 3 & 4 working correctly, and for the time being I am not really worrying about step 2 (although if you have any suggestions I am all ears), but stage 1 is where I am struggling. What I have attempted is using a Fetch Feed module with the URL http://search.twitter.com/search.atom?q=lang:en, but it seems that this only pulls 15 tweets. Is there any way that I can pull more than 15 tweets every time the pipe is run? Otherwise I think this may all be in vain.
FYI, here is the link to the pipe as it stands - http://pipes.yahoo.com/ludus247/182ef4a83885698428d57865da5cf85b
Thanks in advance!
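One thing worth trying, as a hedged sketch: the old Search API accepted rpp (results per page, up to 100) and page parameters, so widening the URL before handing it to the Fetch Feed module may get you past the 15-tweet default. For instance, counting what comes back with feedparser:

import feedparser

# rpp raises the page size (max 100); page walks through additional pages
url = 'http://search.twitter.com/search.atom?q=lang:en&rpp=100&page=1'
feed = feedparser.parse(url)
print(len(feed.entries), 'tweets returned for this page')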
I am currently trying to pull data about videos from a YouTube user upload feed. This feed contains all of the videos uploaded by a certain user, and is accessed from the API by a request to:
http://gdata.youtube.com/feeds/api/users/USERNAME/uploads
Where USERNAME is the name of the YouTube user who owns the feed.
However, I have encountered problems when trying to access feeds which are longer than 1000 videos. Since each request to the API can return at most 50 items, I am iterating through the feed using max-results and start-index as follows:
http://gdata.youtube.com/feeds/api/users/USERNAME/uploads?start-index=1&max-results=50&orderby=published
http://gdata.youtube.com/feeds/api/users/USERNAME/uploads?start-index=51&max-results=50&orderby=published
And so on, incrementing start-index by 50 on each call. This works perfectly up until:
http://gdata.youtube.com/feeds/api/users/USERNAME/uploads?start-index=1001&max-results=50&orderby=published
At which point I receive a 400 error informing me that 'You cannot request beyond item 1000.' This confused me, as I assumed that the query would have only returned 50 videos: 1001-1050 in order of most recently published. Having looked through the documentation, I discovered this:
Limits on result counts and accessible results
...
For any given query, you will not be able to retrieve more than 1,000
results even if there are more than that. The API will return an error
if you try to retrieve greater than 1,000 results. Thus, the API will
return an error if you set the start-index query parameter to a value
of 1001 or greater. It will also return an error if the sum of the
start-index and max-results parameters is greater than 1,001.
For example, if you set the start-index parameter value to 1000, then
you must set the max-results parameter value to 1, and if you set the
start-index parameter value to 980, then you must set the max-results
parameter value to 21 or less.
I am at a loss as to how to access a generic user's 1001st most recent upload and beyond in a consistent fashion, since such videos cannot be reached using only max-results and start-index. Does anyone have any useful suggestions for how to avoid this problem? I hope that I've outlined the difficulty clearly!
Getting all the videos for a given account is supported, but you need to make sure that your request for the uploads feed is going against the backend database and not the search index. Because you're including orderby=published in your request URL, you're going against the search index. Search index feeds are limited to 1000 entries.
Get rid of the orderby=published and you'll get the data you're looking for. The default ordering of the uploads feed is reverse-chronological anyway.
This is a particularly easy mistake to make, and we have a blog post up explaining it in more detail:
http://apiblog.youtube.com/2012/03/keeping-things-fresh.html
The nice thing is that this is something that will no longer be a problem in version 3 of the API.
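As a rough sketch of the fix (assuming the GData feed's JSON output via alt=json; USERNAME is a placeholder), paging through the uploads feed without orderby=published looks like this:

import requests

USERNAME = 'USERNAME'        # placeholder
start_index = 1
while True:
    resp = requests.get(
        'http://gdata.youtube.com/feeds/api/users/%s/uploads' % USERNAME,
        params={'start-index': start_index, 'max-results': 50, 'alt': 'json'},
    )
    resp.raise_for_status()
    entries = resp.json().get('feed', {}).get('entry', [])
    if not entries:
        break                # past the last page of uploads
    for entry in entries:
        print(entry['title']['$t'])
    start_index += 50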