Is there any limit to the number of rows returned by the API? - YouTube

I am making a bulk call with 30 posts and daily data for all of them. Are there any limits to the number of rows that will be returned by the API?
I am having problems getting the results.
Can anyone please help?

YouTube doesn't return any rows ... it's not relational data. That may sound like a pedantic thing to point out, but it's crucial for the next point: the API will return 50 videos at a time, along with tokens to fetch more results for the same query, up to a total of about 500. Because the data isn't relational, you can't just "select all rows" that match certain criteria. Rather, the API probabilistically determines relevance to your search parameters, and after about 500 results the algorithms don't have enough certainty to treat additional results as relevant.
So in your case, where you can change the date as needed (to allow the algorithms to be more specific), you'll want to make a series of calls, perhaps one day at a time. Since you have to paginate anyway to get more than 50 results, it's probably not much more expensive in terms of network bandwidth.
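To make the pagination concrete, here is a rough Python sketch using the google-api-python-client (the API key, query, and date window are placeholders, and the roughly-500-results ceiling still applies to each query):

# Page through search.list results 50 at a time until no nextPageToken is
# returned (roughly 500 results per query). Key, query and dates are placeholders.
from googleapiclient.discovery import build

youtube = build("youtube", "v3", developerKey="YOUR_API_KEY")   # placeholder key

def search_window(query, published_after, published_before):
    """Collect video ids for one date window, one 50-item page at a time."""
    video_ids = []
    request_params = {
        "part": "id",
        "q": query,
        "type": "video",
        "order": "date",
        "publishedAfter": published_after,    # e.g. "2023-01-01T00:00:00Z"
        "publishedBefore": published_before,  # e.g. "2023-01-08T00:00:00Z"
        "maxResults": 50,
    }
    while True:
        response = youtube.search().list(**request_params).execute()
        video_ids += [item["id"]["videoId"] for item in response.get("items", [])]
        token = response.get("nextPageToken")
        if not token:                          # no further pages for this window
            break
        request_params["pageToken"] = token
    return video_ids

Calling search_window once per day (or week) and merging the returned ids gives you the series of calls described above.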

Related

Google Ads API: getMetrics doesn't get results for all passed keywords

I'm migrating from the old Google AdWords API to the new Google Ads API, using Google's PHP SDK.
This is the use case where I'm stuck:
I feed a set of keywords (paginated into keyword plans of 10k each) to generateHistoricalMetrics($keywordPlanResource) and collect the results.
To do so I followed the instructions at https://developers.google.com/google-ads/api/docs/keyword-planning/generate-historical-metrics and, especially, https://developers.google.com/google-ads/api/docs/keyword-planning/generate-historical-metrics#mapping_to_the_ui, using KeywordPlanAdGroupKeywords (with a single ad group) and not passing a specific date range for now, relying on the default value.
I also had to filter some of my keywords because of KEYWORD_HAS_INVALID_CHARS and KEYWORD_TEXT_TOO_LONG errors, but all the errors I'm aware of are gone now.
Now I found out that the KeywordPlanHistoricalMetrics object does not contain any keyword id (of the form customers//keywordPlanAdGroupKeywords/), so I have to rely on the ordering being correct. That seems OK, since the original ordering of the keywords appears to be preserved in the results, as described at https://developers.google.com/protocol-buffers/docs/encoding#optional
But I still have the problem that
count($keywordPlanServiceClient->generateHistoricalMetrics($keywordPlanResource)->getMetrics()) is lower than count($passedKeywords), where each of $passedKeywords was passed to
new KeywordPlanAdGroupKeyword([
    'text' => $passedKeyword,
    'match_type' => KeywordMatchType::EXACT,
    'keyword_plan_ad_group' => $planAdGroupResource
]);
So I have two questions here:
Why does getMetrics() not yield the same number of results as the number of passed keywords?
How can I debug this? Say I want to know which keywords were left out, either to provide more information at this point or just to skip them and let my customer know that those particular keywords were not queried. How do I do that when, although I have a keyword id for every passed keyword, I cannot match the returned metrics back to them, because the KeywordPlanHistoricalMetrics object does not contain any keyword id?
Detail: While testing I found that reducing the number of queried keywords also reduces the share of lost keyword data:
10k queried keywords - 4.72% loss
5k - 2.12%
2.5k - 0.78%
1.25k - 0.43%
625 - 0.30%
500 - 0.24%
250 - 0.03%
200 - 0.03%
But I can't imagine that keywords should be queried one by one.
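To illustrate the matching problem, here is a rough Python sketch (just to show the logic; my actual code uses the PHP SDK) of how I would pair the returned metrics with the passed keywords in smaller batches, assuming the ordering really is preserved. The fetch callable is a hypothetical stand-in for the generateHistoricalMetrics() call:

# Pair passed keywords with returned metrics by position, batch by batch, and
# flag batches whose counts differ. `fetch` is a hypothetical stand-in for a
# wrapper around KeywordPlanService::generateHistoricalMetrics().
def collect_metrics(keywords, fetch, batch_size=250):
    matched = {}
    unmatched = []
    for start in range(0, len(keywords), batch_size):
        batch = keywords[start:start + batch_size]
        metrics = fetch(batch)
        if len(metrics) == len(batch):
            # counts agree, so positional pairing is safe if ordering is preserved
            matched.update(zip(batch, metrics))
        else:
            # counts differ: there is no keyword id to tell which ones were
            # dropped, so keep the whole batch for inspection or a retry
            unmatched.extend(batch)
    return matched, unmatched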

Can I search for channels which contain at least one video?

I am searching for channels against the YouTube API v3:
/search?part=id&type=channel&q=searchquery
Problem: It also returns channels which contain zero (0) videos.
I can see two possible approaches:
Use the &order=videoCount parameter, which sorts the returned channels in descending order by the number of videos they contain.
Run a second query to search for videos in each channel, limiting the number of returned videos to one with &maxResults=1, and parse the result.
I don't want to take approach 1, as it changes the default &order=relevance, which I believe is fine-tuned by YouTube to return the most relevant channels first [still, the result may contain channels without videos, which makes YouTube's relevance calculation questionable].
Approach 2 causes a lot of additional load, so it also isn't something that I'd want to do.
Is there a way to solve this in an elegant way that I don't see right now?
Approach 2 makes sense, and assuming you'll use paging, you can run the second query as needed for each page instead of running it against the whole result set. That should be fine.
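For illustration, a rough Python sketch of approach 2 with the google-api-python-client (the API key and query are placeholders; this only processes the first page of channel results):

# For each channel on the current result page, probe for a single video and
# keep only channels that return at least one item. Key and query are placeholders.
from googleapiclient.discovery import build

youtube = build("youtube", "v3", developerKey="YOUR_API_KEY")   # placeholder key

def channels_with_videos(query):
    channel_page = youtube.search().list(
        part="id", type="channel", q=query, maxResults=50
    ).execute()
    non_empty = []
    for item in channel_page.get("items", []):
        channel_id = item["id"]["channelId"]
        # one cheap probe per channel: ask for a single video in that channel
        probe = youtube.search().list(
            part="id", type="video", channelId=channel_id, maxResults=1
        ).execute()
        if probe.get("items"):
            non_empty.append(channel_id)
    return non_empty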

Why is the number of next page tokens limited?

Through a script I collect the sequence of videos that search.list returns, with maxResults set to 50. The total number of items is large, but there are not enough next page tokens to retrieve all the desired results. Is there any way to get all the returned items, or is this a YouTube restriction?
Thank you.
No, retrieving the results of a search is limited in size.
The total number of results you are allowed to retrieve seems to have been reduced to 500 (in the past the limit was 1000). The API does not allow you to retrieve more for a single query. To get more, try running a number of queries with different parameters, such as publishedAfter, publishedBefore, order, type, or videoCategoryId, or vary the query terms, and keep track of the distinct video ids that are returned.
See for a reference:
https://code.google.com/p/gdata-issues/issues/detail?id=4282
BTW. "totalResults" is an estimation and its value can change on the next page call.
See: YouTube API v3 totalResults field is returning 1 000 000 when it shoudn't
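A rough Python sketch of that workaround, assuming the google-api-python-client (the API key, query, and date windows are placeholders; per-window pagination is omitted here, but works as in the first answer above):

# Repeat the same query over several date windows and keep only the distinct
# video ids across all of them. Key, query and windows are placeholders.
from googleapiclient.discovery import build

youtube = build("youtube", "v3", developerKey="YOUR_API_KEY")   # placeholder key

windows = [
    ("2023-01-01T00:00:00Z", "2023-02-01T00:00:00Z"),
    ("2023-02-01T00:00:00Z", "2023-03-01T00:00:00Z"),
]

seen = set()
for published_after, published_before in windows:
    response = youtube.search().list(
        part="id",
        q="my query",                       # placeholder query
        type="video",
        publishedAfter=published_after,
        publishedBefore=published_before,
        maxResults=50,
    ).execute()
    for item in response.get("items", []):
        seen.add(item["id"]["videoId"])     # duplicates across windows collapse here

print(len(seen), "distinct video ids collected")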

Twitter API Search - how to check query char length?

I keep getting a Twitter 403 error because the search query has too many characters.
search = Twitter::Search.new.contains("#{self.home_team.long_name} #{self.away_team.long_name} win OR lose")
search.language("en").no_retweets.since_date(since)
How do I count how many characters the query is, so I can skip running the search in that case? For example, this one is 156 chars:
GET https://search.twitter.com/search.json?q=Middle%20Tennessee%20State%20Blue%20Raiders%20Florida%20International%20Golden%20Panthers%20win%20OR%20lose%20-rt%20-from%3Aespn&&lang=en&since=2011-02-09: 403: Sorry, your query is too complex. Please reduce complexity and try again.
I don't think the issue is your query length; it's the query complexity. Try reducing the complexity, or the number of results the query can match, with a more restrictive or simpler query. This usually happens with geographically based queries, such as near, where the radius is too large and results in a very computationally complex or large query.
If you still want to count the query length, you can get a good estimate by adding up the sizes of your parameter keys and values, like this:
search = Twitter::Search.new.contains(....).language('en')
search.query.inject(0){ |sum, obj| sum += obj[0].size+obj[1].size+1 }
That should give you the combined length of the keys, values, and separators in your query.
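If you'd rather measure the encoded length of what is actually sent, here is a rough Python sketch of the idea (the parameter values are taken from the example URL above, and the 140-character threshold is just an assumed cut-off for illustration, not a documented Twitter limit):

# Build the encoded query string up front and measure it before deciding
# whether to run the search. Values and the threshold are placeholders.
from urllib.parse import urlencode, quote

params = {
    "q": "Middle Tennessee State Blue Raiders Florida International "
         "Golden Panthers win OR lose -rt -from:espn",
    "lang": "en",
    "since": "2011-02-09",
}

query_string = urlencode(params, quote_via=quote)   # approximate encoded length
print(len(query_string), query_string)

if len(params["q"]) > 140:   # assumed threshold, for illustration only
    print("skipping search: query looks too long")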

How do I handle large amounts of logfile data for display in dynamic charts?

I have a lot of logfile data that I want to display dynamic graphs from, for basically arbitrary time periods, optionally filtered or aggregated by different columns (that I could pregenerate). I'm wondering about the best way to store the data in a database and access it for displaying charts, when:
the time resolution should be variable from one second to a year
there are entries that span several 'time buckets', e.g. a connection might have been open for a few days and I want to count and display the user for every hour she was connected, not just in the hour 'slot' the connection was created or finished
Are there best practices, or tools/plugins for rails that help handle this kind and amount of data? Are there maybe database engines specifically tailored towards this, or having helpful functions (e.g. CouchDB indexes)?
EDIT: I'm looking for a scalable way to handle this data and access pattern. Things we considered: running a query for each bucket and merging in the app - probably way too slow; GROUP BY timestamp/granularity - does not count connections correctly; preprocessing the data into rows at the smallest granularity and downsampling at query time - probably the best way.
I think you can use MySQL timestamps for this.
The way I solved it in the end was to pre-process the data into per-minute buckets, so there's one row for every event and minute. That makes selecting easy and fast enough, and it yields correct results. To get a different granularity, you can do integer arithmetic on the timestamp columns: select floor(timestamp/factor)*factor and group by floor(timestamp/factor)*factor.
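For illustration, a rough Python sketch of the same idea: expand each connection into one row per minute it was open, then re-group with floor(timestamp/factor)*factor for coarser buckets (the connection data below is made up):

# Pre-process connections into per-minute rows, then regroup into one-hour
# buckets via integer arithmetic on the epoch timestamp. Sample data is made up.
from collections import defaultdict

connections = [
    # (user, opened_at, closed_at) as epoch seconds
    ("alice", 1_700_000_000, 1_700_000_200),
    ("bob",   1_700_000_100, 1_700_007_400),
]

# one row per user and minute the connection was open
per_minute = set()
for user, opened, closed in connections:
    minute = (opened // 60) * 60
    while minute <= closed:
        per_minute.add((minute, user))
        minute += 60

# coarser granularity: floor(timestamp / factor) * factor
factor = 3600                      # one-hour buckets
per_bucket = defaultdict(set)
for minute, user in per_minute:
    per_bucket[(minute // factor) * factor].add(user)

for bucket, users in sorted(per_bucket.items()):
    print(bucket, len(users), "distinct user(s) connected")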

Resources