Nondeterministic search results with most viewed - youtube-api

I would like to use the YouTube Data API's search endpoint to retrieve the most viewed videos on YouTube. However, for some reason, my results are missing some videos.
Here is the API call: https://www.googleapis.com/youtube/v3/search?part=snippet&key={YOUR-API-KEY}&alt=json&type=video&order=viewCount&maxResults=50
The first 13 results returned by the API are:
9bZkp7q19f0
RgKAFK5djSk
fRh_vgS2dFE
OPf0YbXqDm0
e-ORhEE9VVg
KYniUCGPGLs
YQHsXMglC9A
nfWlot6h_JM
NUsoVlDFqZg
HP-MbfHFUqs
CevxZvSJLk8
7PCkvCPvDXk
0KSOMA3QBU0
These should be the 13 most viewed videos on YouTube at the time I queried it.
However, looking at this YouTube-made playlist:
https://www.youtube.com/playlist?list=PLirAqAtl_h2r5g8xGajEwdXd3x1sZh8hC
I can see that the video YqeW9_5kURI, which has 1.7 billion views, should appear in 9th position in the list returned by the API, but it doesn't. In fact, it never appears among the 500 (max) videos returned by the API.
UPDATE
Since results change every day, I did more comprehensive tests with the search API.
Here is the result of the API call I mentioned above from yesterday, Dec. 13th (first 10 results):
9bZkp7q19f0
RgKAFK5djSk
fRh_vgS2dFE
OPf0YbXqDm0
e-ORhEE9VVg
KYniUCGPGLs
YQHsXMglC9A
nfWlot6h_JM
NUsoVlDFqZg
HP-MbfHFUqs
And here is the result obtained today, Dec. 14th (again, the first 10 results):
9bZkp7q19f0
RgKAFK5djSk
fRh_vgS2dFE
KYniUCGPGLs
nfWlot6h_JM
NUsoVlDFqZg
YqeW9_5kURI
HP-MbfHFUqs
CevxZvSJLk8
09R8_2nJtjg
I am absolutely sure that the API call did not change between these two dates; the code is exactly the same.
First anomaly: video 7 in the second call (YqeW9_5kURI) does not appear in the first call (this was my original post's example), although its 1.7 billion views were certainly not accumulated overnight.
Second anomaly: videos 4, 5, and 7 in the first call (OPf0YbXqDm0, e-ORhEE9VVg, YQHsXMglC9A) do not appear in the second call, although they are still available on YouTube and still have more views than, for instance, video 5 (nfWlot6h_JM).
These anomalies repeat very often over larger sets of results.
To sum up, the search API does not seem to yield deterministic results with no query string and viewCount order. Is this expected behaviour?
Or can you help me figure out what could be the reason for this?
Thanks in advance for your help, any pointers will be greatly appreciated.

Try including a filtering parameter like chart, on the videos.list endpoint. The chart parameter identifies the chart that you want to retrieve; mostPopular returns the most popular videos for the specified content region and video category.
If successful, this method returns a response body with the following structure:
{
  "kind": "youtube#videoListResponse",
  "etag": etag,
  "nextPageToken": string,
  "prevPageToken": string,
  "pageInfo": {
    "totalResults": integer,
    "resultsPerPage": integer
  },
  "items": [
    video Resource
  ]
}
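For reference, a minimal Python sketch of that suggestion using google-api-python-client; the API key and regionCode values are placeholders, not from the original post:

from googleapiclient.discovery import build

API_KEY = 'YOUR-API-KEY'  # placeholder
youtube = build('youtube', 'v3', developerKey=API_KEY)

# chart=mostPopular is a filter on the videos.list endpoint, so this call
# goes to /videos rather than /search; the statistics part carries view counts.
response = youtube.videos().list(
    part='snippet,statistics',
    chart='mostPopular',
    regionCode='US',  # the chart is computed per content region
    maxResults=50,
).execute()

for item in response['items']:
    print(item['id'], item['statistics']['viewCount'], item['snippet']['title'])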

Related

DynamoDB Timeseries: Querying large timespans of data

I have a simple timeseries table:
{
  "n": "EXAMPLE",    # Name, Hash Key
  "t": 1640893628,   # Unix Timestamp, Range Key
  "v": 10            # Value being stored
}
Every 15 minutes I will poll data and insert it into the table. If I want to query values over a 24-hour period, this works well - that equates to a total of 96 records.
Now, say I want to query a larger timespan - 1 or 2 years. That is tens of thousands of records, which (in my opinion) is impractical to query regularly. It would require multiple queries to retrieve larger time ranges, which would hurt response times as well as being much more costly.
I have thought of a couple of potential solutions to this problem:
1. Replicate data in another table, with larger increments. A table with a single record every 6 hours, for example.
2. Have another table to store common query results, such as records for "EXAMPLE" for the past week, month, and year. I would periodically update records in the new table to hold every Nth record from the main table (a total of 100). Something like:
{
  "n": "EXAMPLE#WEEKLY",
  "v": [
    {
      "t": 1640893628,
      "v": 10
    },
    {
      "t": 1640993628,
      "v": 15
    },
    ... 98 more.
  ]
}
I believe #2 is a solid approach. It seems to me like this would be a common enough problem, so I would love to hear about how other people have approached it.
More options present themselves if you can convert your Unix timestamps into ISO 8601-style strings like 2021-12-31T09:27:58+00:00.
If so, DynamoDB's begins_with key condition expression lets us query for discrete calendar time buckets. December 2021, for example, is queryable using n = id1 AND begins_with(t, "2021-12"). The same works for days and hours. We can take this one step further by adding other periods in indexes.
Some rolling windows are possible, too: n = id1 AND t > [24 hours ago] gives us the last 24 hours. (Both patterns are sketched in code after the table below.)
n (PK)   t (SK)             hour_bucket (LSI1 SK)   week (LSI2 SK)
id1      2021-12-31T10:45   2021-12-31T09-12        2021-52
id1      2021-12-31T13:00   2021-12-31T13-15        2021-52
id1      2022-06-01T22:00   2022-06-01T22-24        2022-01
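As an illustration, a hedged boto3 sketch of both query shapes; the table name and the single-table layout above are assumptions:

import boto3
from boto3.dynamodb.conditions import Key
from datetime import datetime, timedelta, timezone

table = boto3.resource('dynamodb').Table('timeseries')  # hypothetical table name

# Calendar bucket: all of December 2021 for id1, in a single query.
december = table.query(
    KeyConditionExpression=Key('n').eq('id1') & Key('t').begins_with('2021-12')
)

# Rolling window: everything for id1 in the last 24 hours, via string
# comparison on the ISO 8601 sort key.
cutoff = (datetime.now(timezone.utc) - timedelta(hours=24)).strftime('%Y-%m-%dT%H:%M')
last_24h = table.query(
    KeyConditionExpression=Key('n').eq('id1') & Key('t').gt(cutoff)
)

print(len(december['Items']), len(last_24h['Items']))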
If you are looking for arbitrary time-series queries, you might consider Athena, as the other answer suggested, or AWS's serverless Timestream, which is a "purpose-built time series database that makes it easy to store and analyze trillions of time series data points per day."
You could export the table to Amazon S3 and run Amazon Athena on the exported data. Here’s a blog post describing the process: https://aws.amazon.com/blogs/aws/new-export-amazon-dynamodb-table-data-to-data-lake-amazon-s3/

Daily views of every video in the channel measured from the release date

I am trying to get the number of views of every video in the channel, measured from each video's release date for 90 days. I would like something like this:
ID, Release date, 0, 1, 2, 3, 4, ...... 90
Video1, 2020-04-03, 100, 40, 20, 10, ....., 0
Video2, 2020-06-03, 100, 40, 20, 10, ....., 0
...
Is there a way to achieve this?
You can achieve this by using a combination of YouTube Data API and YouTube Analytics API requests.
Firstly, query the Data API to retrieve all videos of a channel from the search endpoint.
Set the query parameters to:
part:snippet
channelId:CHANNEL_ID
You will get back a list of videos with their details, including the videoId and publishedAt values. This is what you will need for the second query.
Secondly, query the Analytics API to retrieve view stats of the videos as a video report (the report differs for content owners and plain channel owners).
Set the query parameters to:
dimensions:video,day
filters:video==comma-separated list of video IDs
metrics:views
startDate:earliest published date of all videos
endDate:latest published date of all videos + 90 days
ids:refer to the docs for your owner type
The scope you need for this query is:
yt-analytics.readonly
This query will return views for each video, aggregated by day, over the date span you provided. After you get the response, you can keep just the first 90 days for each video.
Alternatively, you can write a separate query for each video with the specified start date and end date, getting a result that does not need to be filtered.
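A rough sketch of that two-step flow, assuming an already-authorized credentials object (the OAuth dance and search pagination are omitted, and the channel ID and dates are placeholders):

from googleapiclient.discovery import build

creds = ...  # google.oauth2 credentials carrying the yt-analytics.readonly scope

data_api = build('youtube', 'v3', credentials=creds)
analytics = build('youtubeAnalytics', 'v2', credentials=creds)

# Step 1: video IDs and publish dates from the Data API search endpoint.
search = data_api.search().list(
    part='snippet', channelId='CHANNEL_ID', type='video', maxResults=50
).execute()
published = {item['id']['videoId']: item['snippet']['publishedAt']
             for item in search['items']}

# Step 2: per-video, per-day views from the Analytics API.
report = analytics.reports().query(
    ids='channel==MINE',
    dimensions='video,day',
    metrics='views',
    filters='video==' + ','.join(published),
    startDate=min(published.values())[:10],  # earliest publish date
    endDate='2020-12-31',                    # latest publish date + 90 days
).execute()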
Using the Advanced tab in YouTube Studio > Analytics, you can download your statistics as a CSV file or open them in Google Sheets.
That's the quickest way possible.
Hope it helps :)

Get YouTube trending videos for a specific day and region

The specific information that I want to get is the list of videos that were most viewed in South Korea on Apr 1st, 2020. It would be awesome if I could also get the statistics of each video (such as the number of views, likes, dislikes, and comments).
I tried some coding in Python using the YouTube API, but the results seem very different from what I expected. (The titles of some videos in the results are written in Arabic or Russian even though their regionCode is KR; I have no idea what's happening.) The following is my code. Any comments would help. Thx!!
from datetime import datetime

from apiclient.discovery import build

api_key = " "
youtube = build('youtube', 'v3', developerKey=api_key)

start_time = datetime(year=2020, month=4, day=1).strftime('%Y-%m-%dT%H:%M:%SZ')
end_time = datetime(year=2020, month=4, day=2).strftime('%Y-%m-%dT%H:%M:%SZ')

res = youtube.search().list(
    part='snippet',
    maxResults='50',
    regionCode='KR',
    order='viewCount',
    type='video',
    publishedAfter=start_time,
    publishedBefore=end_time,
).execute()

for item in res['items']:
    print(item['snippet']['title'], item['snippet']['publishedAt'])
The Search.list endpoint's docs say:
regionCode string
The regionCode parameter instructs the API to return search results for videos that can be viewed in the specified country. The parameter value is an ISO 3166-1 alpha-2 country code.
This means that filtering a search result set by regionCode=KR produces a list of videos that are allowed to be viewed in the KR region, regardless of whether the respective videos were actually viewed from within that region.
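If you also need the view counts, likes, and comment counts, one hedged follow-up (reusing the youtube and res objects from the question's code) is a second call to videos.list, since search results carry no statistics part:

# Collect the IDs from the search response, then fetch their statistics.
video_ids = [item['id']['videoId'] for item in res['items']]

stats = youtube.videos().list(
    part='statistics',
    id=','.join(video_ids),
).execute()

for item in stats['items']:
    print(item['id'], item['statistics']['viewCount'])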

How to write points into InfluxDB 0.8 with time in seconds

I would like to write points into an InfluxDB 0.8 database, with time values given in seconds, over HTTP. Here's a sample point in JSON format:
[
  {
    "points": [
      [
        1435692857.0,
        897
      ]
    ],
    "name": "some_series",
    "columns": [
      "time",
      "value"
    ]
  }
]
The documentation is unclear about what format the time values should be in (nanoseconds or milliseconds?) and how to tell InfluxDB what to expect. Currently I'm using a query parameter: precision=s
That seems to work fine: the server returns HTTP status code 200 as expected. When querying the database through Influx's admin interface using select * from some_series, the data points in the table are returned with the expected timestamps. On the graph, however, the time axis is indexed with fractions of seconds, and queries like select * from some_series where time > now() - 1h don't yield any results.
I assume that there is something wrong with the timestamps. I tried multiplying my values by 1000, but then nothing gets inserted into the database, with no visible errors.
What's the problem?
By default, supplied timestamps are assumed to be in milliseconds. I think your writes are defaulting to milliseconds because the query string parameter should be time_precision=s, not precision=s.
See the details under "Time Precision on Written Data" on https://influxdb.com/docs/v0.8/api/reading_and_writing_data.html.
I also think the time value should be an integer rather than a float. I'm not sure how to explain the other behaviors, where the timestamp seems to be the right date and multiplying by 1000 doesn't solve the issue, but I wonder if it's related to writing floats.
Please contact the InfluxDB support team at support@influxdb.com for further assistance.
I found the solution! The problem was only partly with the precision. Your answer was correct: the query parameter is called time_precision, and I should post integers instead of floats. Which was probably the first thing I attempted, with no results...
However, due to some time zone problems, my time values were in the future relative to server time, and by default, any select statement includes a where time < now() clause. So values were in fact written into the database, but not displayed because of that hidden where clause. The solution was to tell the database to return "future" values, too:
select value from some_series where time < now() + 1h
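For completeness, a small sketch of the corrected write against the 0.8 HTTP API, per the discussion above (host, database name, and credentials are placeholders):

import requests

points = [{
    'name': 'some_series',
    'columns': ['time', 'value'],
    'points': [[1435692857, 897]],  # integer seconds, not a float
}]

resp = requests.post(
    'http://localhost:8086/db/mydb/series',
    params={'u': 'root', 'p': 'root', 'time_precision': 's'},
    json=points,
)
print(resp.status_code)  # 200 on success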

Issue with YQL (Yahoo APIs) using OAuth

I am using oauth_util.rb (https://gist.github.com/383159) and my YQL query is
"select * from search.termextract where context=\"#{text}\""
This works when text is a short string, but fails for longer ones with the following error:
RuntimeError (Please provide valid credentials. OAuth oauth_problem="signature_invalid", realm="yahooapis.com" for text [ Virdhawal Khade Wins Historic Medal | Sports News Bangalore November 16, 2010 -19-year-old Indian swimmer and GoSports Foundation awardee, Virdhawal Khade, has made history at the Guangzhou Games by clinching the Bronze Medal in the Men's 50m Butterfly event. ... Starting the Finals in fifth place, Veer's performance was nothing short of astonishing, as he finished with his season best time of 24.31 seconds. He finished 0.65 seconds shy of first placed 27-year-old Zhou Jiawei, the top ranked Chinese Swimmer who is also ... ]):
Thanks in advance.
Got it... I needed to use URI.encode to encode the URI instead of CGI::escape.
