YouTube Analytics API - Demographics Minimum Views Seem to Be Required for Query - youtube-api

I've been working with the demographics queries available in the YouTube Analytics API and was hoping you could shed some light on the following situation:
If I ask for demographics for a channel for a day where the channel
had around 2000 views or less, I get no rows returned.
If I ask for 3 such days in separate queries, I still get no demographic data
returned.
But if I ask for all 3 days in a single query that spans the days, I do get demographic data
So it seems there is an imposed minimum number of views that a query needs to cover before the API will return demographic data, even when other methods show that some demographic data is available. Am I understanding this correctly? Is the API supposed to behave this way?
[Update: I originally stated that the Analytics dashboard was not matching the API on this, when in fact they return the same results, so I've updated the title and the description.]

I can confirm that this is the intended behavior of both the API and the YouTube Analytics web interface.

This makes sense from a statistical point of view. Most statistics have a margin of error, and a bigger sample helps reduce it. If there isn't enough data, the analytics wouldn't be accurate. It would be possible to show those analytics for smaller numbers if they also included their margin of error, but if the margin of error is too high the data would be useless. However, it should be possible to extrapolate the data for day "N" using the demographics for the ranges (N-3, N) and (N-3, N-1), plus the total views for those ranges and for day "N" (provided demographic data is returned for both ranges), by subtracting the demographics of the 3-day range from those of the 4-day range.
For example (assuming 10k views/day), if the 4-day range has 75% green viewers out of 40,000, you have 30,000 green (lizard-people) viewers in the past 4 days. If the 3-day range (the 3 days right before day "N") has 87.5% green viewers out of 30,000 total, you have 26,250 green viewers in that range. Subtracting the 3-day count from the 4-day count leaves 3,750 green viewers for day "N", and since day "N" has 10k views, that means 37.5% of those views are from green lizard-people.
It should work, but I wouldn't trust that data to have a reliable margin of error. (please don't use this if it's for something important)
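The subtraction can be sketched in Python. This is only an illustration using the numbers from the example; the function name `share_for_last_day` and its parameters are made up here and are not part of the YouTube Analytics API.

```python
def share_for_last_day(long_share, long_views, short_share, short_views):
    """Estimate a group's share of views on the one day that the longer
    range covers and the shorter range does not."""
    day_views = long_views - short_views
    if day_views <= 0:
        raise ValueError("the longer range must contain more views")
    # Convert each percentage back into an absolute viewer count.
    group_long = long_share * long_views
    group_short = short_share * short_views
    return (group_long - group_short) / day_views


# Numbers from the example: 75% of 40,000 over 4 days, 87.5% of 30,000 over 3 days.
estimate = share_for_last_day(0.75, 40_000, 0.875, 30_000)
print(f"{estimate:.1%}")  # 37.5%
```

Because the reported percentages are themselves rounded estimates, any error in them is magnified when divided by the much smaller single-day view count.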

Related

Google Sheets: API To Get Crypto Prices and History?

As GOOGLEFINANCE() seems very limited in the cryptocurrencies it supports, are there any (free?) APIs that I can use to get data from?
Although I use GF() for ETH and BTC, I'm specifically looking for Price and Historical Closing Prices on ADA (Cardano).
I've searched the forum for suggestions, there aren't many and most are old. Binance's API seemed OK, but it gives prices in USDT instead of USD.
If anyone is interested, I found an API that offers a free key, although limited in the number of daily calls you can make: CoinAPI.
It seems very powerful, with quotes available in most currencies. So far, I've managed to get a current price:
Formulas shown in brown.
(1) shows the raw data returned: two rows delimited by semicolons.
(2) wraps a QUERY() around IMPORTDATA(), using offset 1 plus param 0, to not return the header row, then wraps all that in SPLIT() to separate the delimited text into columns.
(3) wraps (2) with INDEX() so I can get just the Price in the 4th col.
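For comparison, the same three steps sketched in Python: drop the header row, split on semicolons, take the fourth column. The sample text and its field names are hypothetical stand-ins, since the actual raw response is not reproduced here.

```python
def parse_rows(raw, delimiter=";"):
    """Split delimited text into rows of fields, dropping the header row."""
    lines = [line for line in raw.splitlines() if line.strip()]
    return [line.split(delimiter) for line in lines[1:]]


# Hypothetical two-row response: a header row followed by one data row.
sample = "time;base;quote;rate\n2021-09-04T00:00:00Z;ADA;USD;2.95"
rows = parse_rows(sample)
price = float(rows[0][3])  # 4th column, as in step (3)
print(price)
```

The same function returns one list per line when the response has several data rows, which is the multi-row case.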
As this value will not automatically update like GOOGLEFINANCE(), I think I'll need to set a Trigger to do that.
I've also retrieved historical data, but I've yet to figure out how to split multiple rows of delimited text from the IMPORTDATA() function.
[Edit] See the solution to splitting multiple rows by #player0 at https://stackoverflow.com/a/69055990/190925.

using sum and countifs to get a percentage of 'yes'es across multiple columns by month and team - is there a simpler way?

I've been asked to create a summary for some google form responses, and though I have a working solution, I can't help but feel there must be a more elegant one.
The form collects data related to case checking - every month each team (there's 100+ teams) has to check a certain number of cases based on how many staff are in their team, and enter the results for each case they've checked in the google form. The team that have set this up want me to summarise the data by team, month, and section of the form (preliminary questions, case recording, outcomes, etc). There are 8 sections on the live form, ranging from 1-13 questions, all with Yes/No/NA/blank answers.
(honestly, it's not how I'd have approached setting all this up, but that is out of my hands!)
So they're essentially looking for a live monthly summary with team names down the side, section names along the top, and a %age completed that will keep up with entries as they come in (where we can also use importrange and query to pull the relevant bits into other google sheet summaries, as and when needed).
What I've currently got is this:
=iferror(sum(countifs('Form Responses'!$B:$B,$A3,'Form
Responses'!$F:$F,"Yes",'Form Responses'!$E:$E,">="&$B$1,'Form
Responses'!$E:$E,"<"&edate($B$1,1)),countifs('Form
Responses'!$B:$B,$A3,'Form Responses'!$G:$G,"Yes",'Form
Responses'!$E:$E,">="&$B$1,'Form
Responses'!$E:$E,"<"&edate($B$1,1)),countifs('Form
Responses'!$B:$B,$A3,'Form Responses'!$H:$H,"Yes",'Form
Responses'!$E:$E,">="&$B$1,'Form
Responses'!$E:$E,"<"&edate($B$1,1)),countifs('Form
Responses'!$B:$B,$A3,'Form Responses'!$I:$I,"Yes",'Form
Responses'!$E:$E,">="&$B$1,'Form
Responses'!$E:$E,"<"&edate($B$1,1)),countifs('Form
Responses'!$B:$B,$A3,'Form Responses'!$J:$J,"Yes",'Form
Responses'!$E:$E,">="&$B$1,'Form
Responses'!$E:$E,"<"&edate($B$1,1)),countifs('Form
Responses'!$B:$B,$A3,'Form Responses'!$K:$K,"Yes",'Form
Responses'!$E:$E,">="&$B$1,'Form
Responses'!$E:$E,"<"&edate($B$1,1)))/(countifs('Form
Responses'!$B:$B,$A3,'Form Responses'!$E:$E,">="&$B$1,'Form
Responses'!$E:$E,"<"&edate($B$1,1))*6),0)
It works, but it feels like a bit of a brute-force-and-ignorance solution. I've tried countifs with an array, I've looked at a pivot table but I can't get the section groups, and I've had a play with query but I can't figure out how to ask it to count all Yeses in multiple columns at once.
Is there a more elegant solution, or do I have to resign myself to setting up the next financial year's summaries like this?
Edit:
You can use plain array boolean multiplication to achieve the count, as TRUE values are converted to 1s and FALSE values are converted to 0s:
=TO_PERCENT(ARRAYFORMULA(
SUM((f!F1:K="Yes")*(f!E1:E>=B1)*(f!E1:E<EDATE(B1,1))*(f!B:B=A3))/
SUM(6*(f!E1:E>=B1)*(f!E1:E<EDATE(B1,1))*(f!B:B=A3))
)
)
Renamed Form Responses to f
Numerator: SUM of
Question filter (f!F:K =Yes) and
Month filter (f!E:E is within month of B1) and
Team filter(B:B = A3)
Denominator: 6 times the SUM of
Month filter (f!E:E is within month of B1) and
Team filter(B:B = A3)
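For readers who find the mask logic easier to follow outside a spreadsheet, here is a rough Python equivalent. The sample rows and the name `yes_percentage` are invented for illustration; it assumes each response is a (team, date, answers) tuple with six answers per row.

```python
from datetime import date


def yes_percentage(rows, team, month_start, month_end, questions=6):
    """Share of "Yes" answers for one team within [month_start, month_end)."""
    yes_count = 0
    row_count = 0
    for row_team, row_date, answers in rows:
        # Team filter AND month filter: both must hold for the row to count.
        in_scope = (row_team == team) and (month_start <= row_date < month_end)
        if in_scope:
            row_count += 1
            # True counts as 1 and False as 0, just like in the formula.
            yes_count += sum(answer == "Yes" for answer in answers)
    # Denominator mirrors the formula: 6 questions per matching row.
    return yes_count / (questions * row_count) if row_count else 0.0


# Hypothetical form responses.
rows = [
    ("Team A", date(2021, 4, 3), ["Yes", "Yes", "No", "NA", "Yes", ""]),
    ("Team A", date(2021, 4, 20), ["Yes", "Yes", "Yes", "Yes", "Yes", "Yes"]),
    ("Team A", date(2021, 5, 1), ["No", "No", "No", "No", "No", "No"]),
    ("Team B", date(2021, 4, 9), ["Yes", "No", "No", "No", "No", "No"]),
]
result = yes_percentage(rows, "Team A", date(2021, 4, 1), date(2021, 5, 1))
print(result)  # 0.75
```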
On this sample sheet that you provided you'll notice two new tabs. MK.Retab and MK.Summary.
On MK.Retab is a single formula in A2 that "re-tabulates" all of your survey data into a format that is much easier to analyze going forward. That tab can be "hidden" on your real project. It will continue to build the 6 column dataset forever. It would be a sort of "back end" sheet, only used to supply data to any further downstream analysis.
On MK.Summary is a single formula in cell A1 that queries that dataset from MK.Retab and shows the percentage of Yeses by month, by section, by team, in a format similar to what you proposed. I coded it to display the most recent month at the left, immediately to the right of the team names, and to push historical data off to the right. Even though people are often used to seeing time go from left to right, I find the opposite order nice because it saves you from having to scroll sideways to see the most recent data. It is very simple to change should you want to: just remove the "desc" from the "order by" clause of the query string.
I find this kind of two-step solution useful for problems like yours, because while the summary might not be exactly what you want, it's always easier to build formulas and analyses off of the data as laid out in the MK.Retab sheet.
As for the formula in MK.Retab, it is based on a method that I came up with a while back that constructs a large vlookup where the [search key] is actually a sequence of decimal numbers that is built by counting the number of rows in your real data set and multiplying by the number of columns of data that need to be repeated for each row. I built a demo some time ago that I'm happy to share with folks if you want to understand better how it works.
You said that your goal was to understand the formulas so that you could modify them going forward as needed. I'm not sure how easy that will be to do, but I can try my best to answer any questions you might have about the method or the solution generally.
What I can tell you is that some of the formulas are more complicated than they need to be because you just used Q1, Q2, Q3, etc. instead of the actual questions. If you had a list of the questions asked somewhere (on some other tab, say), along with what you wanted to name their corresponding "sections", it would make the formula significantly less complicated. As it stands, I had to use the appearance of the word "Comments" in row 1 to tell where one section ended and the next began. The upside of that decision, though, is that the formula I wrote is infinitely expandable to the right. That is, if you were to add another 100 columns' worth of questions and answers to the sample set here, the formula would be able to handle that and break it out, so long as the word "Comments" appeared between each section.
Hope all this helps.

Google Spreadsheets: Sum over multiple criteria constrained by timeframe

Hello Everyone (this is all in Google Spreadsheets),
I'm trying to make a report where I have to sum the product of the number of Apples and Bananas bought respectively within a certain time frame by different people. The price of the goods differs, depending on who is buying them. The people who buy it do so at different times and purchase a different number of items. The formula should be extendable to include additional people in the future.
For details see this Google Spreadsheet.
I would like to get the calculation without needing steps in-between. If it makes any difference, the number of items bought on specific dates are actually in different worksheets, so they're not on the same page as in the example. I named the ranges accordingly (even though I believe/hope it makes little difference in terms of what formula to use).
Finally, if it were possible to use one formula for the total expenditures, instead of the sum over the cells above that would be grand.
I use the DATEVALUE, because otherwise I wouldn't be able to find the first and the last date of the calendar weeks. There is a dedicated DATEVALUE column in every worksheet. (Additionally, I don't have to deal with the intricacies of the date format, which gets me every time.)
I hope I didn't miss an answer to my problem and provided enough information. I can't get my head around it, I am really looking forward to your answers.
Thank you everyone :)
Greg
P.S. A picture of the sheet, if required: Apples, Bananas & €
Credit to Sennsei from the Google Docs Help Forum (Link). I quote:
I won't be surprised if this isn't the best way to go about this, but regardless, here's my take on solving your problem. Result is based on this modified worksheet.
Apples:
=IFERROR(SUM(ARRAYFORMULA(ARRAYFORMULA(VLOOKUP(FILTER('Prices/Amounts'!$J$4:$J,'Prices/Amounts'!$K$4:$K>=B4,'Prices/Amounts'!$K$4:$K<=B5),FILTER('Prices/Amounts'!$J$4:$L,'Prices/Amounts'!$K$4:$K>=B4,'Prices/Amounts'!$K$4:$K<=B5),3,0))*ARRAYFORMULA(VLOOKUP(FILTER('Prices/Amounts'!$J$4:$J,'Prices/Amounts'!$K$4:$K>=B4,'Prices/Amounts'!$K$4:$K<=B5), 'Prices/Amounts'!$B$3:$D,3,0)))),0)
Bananas:
=IFERROR(SUM(ARRAYFORMULA(ARRAYFORMULA(VLOOKUP(FILTER('Prices/Amounts'!$F$4:$F,'Prices/Amounts'!$G$4:$G>=B4,'Prices/Amounts'!$G$4:$G<=B5),FILTER('Prices/Amounts'!$F$4:$H,'Prices/Amounts'!$G$4:$G>=B4,'Prices/Amounts'!$G$4:$G<=B5),3,0))*ARRAYFORMULA(VLOOKUP(FILTER('Prices/Amounts'!$F$4:$F,'Prices/Amounts'!$G$4:$G>=B4,'Prices/Amounts'!$G$4:$G<=B5), 'Prices/Amounts'!$B$3:$D,2,0)))),0)
Expenditure:
=B7+B8
The B4's and B5's refer to the date constraints. Since the formulae contain $ signs to keep the referenced ranges fixed, they can be dragged across to apply to other weeks without any further editing. As a plus, these formulae allow a sheet to be infinitely expandable!
Sennsei
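The idea behind the formulae (filter purchases to the date window, look up each buyer's price, multiply, then sum) can be sketched in Python. All names and figures below are hypothetical and do not come from the linked worksheet.

```python
from datetime import date


def total_spend(purchases, prices, start, end):
    """Sum price * quantity for purchases dated within [start, end], inclusive."""
    return sum(
        prices[buyer] * quantity
        for buyer, day, quantity in purchases
        if start <= day <= end
    )


# Hypothetical data: per-person apple prices and dated purchases.
apple_prices = {"Ann": 2, "Bob": 3}
apple_purchases = [
    ("Ann", date(2013, 1, 7), 4),
    ("Bob", date(2013, 1, 9), 2),
    ("Ann", date(2013, 1, 15), 10),  # falls outside the week queried below
]
week_total = total_spend(
    apple_purchases, apple_prices, date(2013, 1, 7), date(2013, 1, 13)
)
print(week_total)  # 14
```

Adding a new person only requires a new entry in the price table, which matches the requirement that the report be extendable.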

Average wins per player

I'm not really good with Java, even less so with Sheets, and I need help with this:
I want to create a list of each player's average wins, using a list of results that includes several other players:
Example (I want to get the average on the right):
Conceptually this would be "for each player, check whether the player matches and whether they won (ratio 1:1), then continue until there are no more games (or the end of the array)".
It's for a team game and we use Google Sheets a lot for it; I wanted some stats too.
JavaScript != Java.
Additionally, there's no JavaScript involved here if you're just using Sheets.
=AVERAGE(COUNTIF(A2:A7, "Win")/COUNTA(A2:A7))
Steps for understanding:
COUNTIF all cells in a range containing the text "Win".
COUNTA all cells in the same range, regardless of what they contain.
Divide the first count by the second to get the win ratio. The outer AVERAGE receives only that single value, so it returns it unchanged.
A2:A7 is just an example and should be replaced with whatever range your RESULT column takes up.
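The "for each player" loop described in the question can be sketched in Python as follows. The player names and the (player, result) layout are assumptions made for illustration, since the example sheet is not shown.

```python
from collections import defaultdict


def win_rates(games):
    """Map each player to wins divided by games played."""
    wins = defaultdict(int)
    played = defaultdict(int)
    for player, result in games:
        played[player] += 1
        if result == "Win":
            wins[player] += 1
    return {player: wins[player] / played[player] for player in played}


# Hypothetical results list: one (player, result) pair per game.
games = [
    ("Ana", "Win"),
    ("Ben", "Loss"),
    ("Ana", "Loss"),
    ("Ben", "Loss"),
    ("Ana", "Win"),
]
rates = win_rates(games)
print(rates)
```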

Is there any limit to the number of rows returned by API?

I am making a bulk call for 30 posts, with daily data for all of them. Are there any limits to the number of rows that will be returned by the API?
I am having problems getting the results.
Can anyone please help?
YouTube doesn't return any rows ... it's not relational data. That may sound like a pedantic thing to point out, but it's crucial for this next point: the API will return 50 videos at a time, along with tokens to get more results based on the same query, up to a total of 500 ... because the data isn't relational, you can't just "select all rows" that match certain criteria. Rather, it is probabilistically determining relevance to your search parameters, and after about 500 results the algorithms don't have enough certainty to make additional results relevant.
So in your case, where you can change the date as needed (to allow the algorithms to be more specific), you'll want to do a series of calls; perhaps one at a time (since you have to paginate anyway to get more than 50 results, it's probably not that much more expensive in terms of network bandwidth).
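A token-following loop of the kind described might look like the sketch below. `fetch_page` is a hypothetical callable standing in for the real API request; it is assumed to take a page token (or None for the first page) and return the page's items plus the next token.

```python
def fetch_all(fetch_page, max_results=500):
    """Follow page tokens until results run out or max_results is reached."""
    items, token = [], None
    while len(items) < max_results:
        page_items, token = fetch_page(token)
        items.extend(page_items)
        if token is None or not page_items:
            break
    return items[:max_results]


# Hypothetical stand-in for an API call: 120 items served 50 at a time.
def fake_fetch_page(token):
    start = token or 0
    data = list(range(120))
    chunk = data[start:start + 50]
    next_token = start + 50 if start + 50 < len(data) else None
    return chunk, next_token


results = fetch_all(fake_fetch_page)
print(len(results))  # 120
```

Running one such loop per date range keeps each query specific, in line with the suggestion to make a series of calls.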
