Import site-specific data - google-sheets

The data on the page is delivered as follows:
https://int.soccerway.com/international/europe/uefa-champions-league/20192020/group-stage/r54142/
1 - Below each match time is a link to the match.
2 - I would like to import all data at once.
3 - The result I seek would be as follows:
4 - I can import each piece separately, but because they are separate formulas it takes a long time. I would like a way to import everything at once, with a single formula if possible.
5 - The XPaths are:
"//*[@class='date no-repetition']"
"//*[@class='score-time status']/a"
"//*[@class='score-time status']/a/@href"
6 - An important detail: I specified 'score-time status' because some games appear as 'score-time score', and those cannot be imported.
7 - Another complication: the time comes with spaces around the colon, so for it I use =SUBSTITUTE(," ","")
Is there any way to do what I want?
I've tried using ={;;} to import the data, but I can't combine more than two =IMPORTXML() calls.
I also tried =IMPORTHTML(), but it can't fetch the link below each match time, and the date appears for only one of the games...

How about this answer? I think that there are several answers for your situation. So please think of this as just one of several possible answers.
xpath:
Unfortunately, I couldn't find a single XPath that directly retrieves all 3 values in your question. So this answer uses the following XPaths.
Date: //td[@class='date no-repetition']/span
Time: //td[@class='score-time status']/a/span
URL: //td[@class='score-time status']/a/@href
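To see what these three expressions select, here is a minimal Python sketch that runs equivalent lookups against a tiny fragment. The markup in SAMPLE and the example.com links are invented for illustration and may not match the real page. The standard-library parser cannot select attributes with /@href, so the sketch reads the attribute with get() instead.

```python
import xml.etree.ElementTree as ET

# Invented markup for illustration only; the real page may be structured differently.
SAMPLE = """
<table>
  <tr>
    <td class="date no-repetition"><span>17/09/19</span></td>
    <td class="score-time status"><a href="example.com/matches/1/"><span>18 : 55</span></a></td>
  </tr>
  <tr>
    <td class="date no-repetition"><span>18/09/19</span></td>
    <td class="score-time status"><a href="example.com/matches/3/"><span>21 : 00</span></a></td>
  </tr>
</table>
"""


def extract_columns(markup):
    """Return (dates, times, urls), one list per spreadsheet column."""
    root = ET.fromstring(markup)
    # Date: //td[@class='date no-repetition']/span
    dates = [s.text for s in root.findall(".//td[@class='date no-repetition']/span")]
    # Time: //td[@class='score-time status']/a/span, with the spaces around ":" removed
    times = [s.text.replace(" ", "")
             for s in root.findall(".//td[@class='score-time status']/a/span")]
    # URL: //td[@class='score-time status']/a/@href, prefixed like the formula does
    urls = ["https://" + a.get("href")
            for a in root.findall(".//td[@class='score-time status']/a")]
    return dates, times, urls


dates, times, urls = extract_columns(SAMPLE)
for row in zip(dates, times, urls):
    print(row)
```

Each list corresponds to one column, which is why the three results can be placed side by side only when they have the same number of rows.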
Sample formula:
=ARRAYFORMULA({IMPORTXML(A1,"//td[@class='date no-repetition']/span"),IMPORTXML(A1,"//td[@class='score-time status']/a/span"),"https://"&IMPORTXML(A1,"//td[@class='score-time status']/a/@href")})
In this formula, the URL https://int.soccerway.com/international/europe/uefa-champions-league/20192020/group-stage/r54142/ is in cell "A1".
The 3 retrieved values are placed in columns "A", "B" and "C".
Result:
Note:
In the above case, I think the time zone might be that of the location from which IMPORTXML retrieves the values.
If you want to use the time zone of your own Spreadsheet instead, how about the following sample formula?
=ARRAYFORMULA({IMPORTXML(A1,"//td[@class='date no-repetition']/span/@data-value")/86400+DATE(1970,1,1),IMPORTXML(A1,"//td[@class='date no-repetition']/span/@data-value")/86400+DATE(1970,1,1),"https://"&IMPORTXML(A1,"//td[@class='score-time status']/a/@href")})
In this case, please set the date and time format for columns "A" and "B".
In the above formula, the date and time are retrieved as Unix time. This value is converted to a serial number, so the converted value can be used as a date and time in the Spreadsheet.
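The arithmetic behind /86400+DATE(1970,1,1) can be sketched in Python as follows. The function name unix_to_serial is hypothetical, and the sketch assumes the usual spreadsheet convention that serial number 0 is 1899-12-30. It treats the value as UTC; any offset for a local time zone would have to be added separately.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400
# Assumption: serial number 0 corresponds to 1899-12-30, as in Google Sheets.
SHEETS_EPOCH = datetime(1899, 12, 30, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Serial number of 1970-01-01, i.e. what DATE(1970,1,1) evaluates to.
DATE_1970_SERIAL = (UNIX_EPOCH - SHEETS_EPOCH).days


def unix_to_serial(unix_seconds):
    """Convert Unix seconds to a spreadsheet serial number (days, with a fractional time part)."""
    return unix_seconds / SECONDS_PER_DAY + DATE_1970_SERIAL


print(DATE_1970_SERIAL)       # serial for midnight, 1970-01-01
print(unix_to_serial(43200))  # noon on the same day adds half a day
```

The integer part of the serial number is the date and the fractional part is the time of day, which is why the same value can be formatted as a date in one column and as a time in another.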
References:
IMPORTXML
ARRAYFORMULA
If this was not the direction you want, I apologize.

Related

Google Sheet - Sort - Missing data and not working as expected

I have a very simple request: I just want to sort my sheet by 3 columns: Date, Type and Course.
Some rows are missing when I run the formula for a single day. Sorting by date and the other columns does not work either: the data is not sorted. I tried converting to plain text and other solutions such as =SORT(ARCHIVAGE!B7:J;ARCHIVAGE!B7:B;true;ARCHIVAGE!D7:D;true;ARCHIVAGE!E7:E;true)
and =SORT(ARCHIVAGE!B7:J;1;true;3;4;true), but without success.
Here is a sample of the file covering a few days.
Thanks for your help on this.
j.
This formula is in cell A1 of a new tab called MK_Idea:
=QUERY(ARCHIVAGE!B:G;"where D<>'' order by B,D,F")
read about query() here.

Google Sheets – Query for a date returns no values

I'm trying to filter a list from another sheet by the dates of the entries, and it simply doesn't work:
=QUERY(Import!A:Z;"select A,T where T >= date '2021-08-27'";0)
When I remove the date part it works fine, as expected for filtering by text. I need the ability to filter by exact dates though, because I would like to add some more complex filters. When I set the last part of the function to a 0 instead of a 1 it shows only the first entry.
The source column is set to the correct date format. The data is pulled from another document using the IMPORTRANGE() function (I don't see how this should make any difference though).
I feel like I'm missing something simple here and would be glad if someone could point me in the right direction!
Check whether all cells in your date column are formatted as dates. I had missing values entered as "='---", and the query filtering by date returned nothing. Changing the missing values to "=NA()" did the job.
Try this:
=QUERY(Import!A:Z;"select A,T where T >= date '"&TEXT("2021-08-27";"yyyy-mm-dd")&"'")

Google sheets Combining Query with Today() [duplicate]

I would like to use the today function in a query. Right now I have to manually change the date each morning, which is time consuming. The query is:
=QUERY(StageTracking!A:W, "SELECT C where A =date'2021-05-13'")
When I try
=QUERY(StageTracking!A:W, "SELECT C where A =today()")
I get a #VALUE error.
I know it's just a syntax thing I'm not catching but I have tried many variations on the line above.
Let me offer another (perhaps simpler) option, given what I can tell from your post info.
Add a header in the top cell of your results column and put the following formula into the second cell of that otherwise empty column:
=FILTER(StageTracking!C2:C,StageTracking!A2:A=TODAY())
ADDENDUM (after seeing the actual sheet):
This is an excellent case in point of why it is always most efficient and effective to share a link to a sheet, since your formula attempts as originally posted (and mine as posted above) would not work with your actual layout and goal.
I've added a new sheet ("Erik Help").
First, I un-merged Rows 2-8 and simply increased the height of Row 2. There was no reason to merge those rows; and merging nearly always causes issues, especially in ranges where formulas or reference ranges are involved.
Next, I deleted your original A2 formula (=QUERY(StageTracking!A1:W1000,"select C where A = '06/23/2021'",1)) and replaced it with the simple =StageTracking!C1, which accomplishes the same thing. Again, I'm not sure what led to the long formula, but it was unnecessary.
I then deleted all of your individual erroneous formulas from B2:K2 and replaced them with one formula in B2:
=FILTER(FILTER(StageTracking!E2:W,StageTracking!A2:A=TODAY()),ISODD(COLUMN(StageTracking!E1:W1)))
This formula first creates a FILTERed array of everything from E2:W where A2:A = TODAY(). Then a second FILTER is applied to bring in only the odd columns.
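The two nested FILTER steps can be sketched in Python like this. The function name, the sample rows and the sample date are all hypothetical; column numbers are 1-based as in the sheet, so E is 5 and W is 23.

```python
from datetime import date


def todays_odd_columns(rows, today, first_col=5, last_col=23):
    """Keep rows whose column A equals `today`, then keep only odd-numbered columns E..W."""
    picked = []
    for row in rows:
        # Inner FILTER: StageTracking!A2:A = TODAY()
        if row[0] != today:
            continue
        # Outer FILTER: ISODD(COLUMN(E1:W1)) keeps columns 5, 7, ..., 23
        picked.append([row[col - 1]
                       for col in range(first_col, last_col + 1)
                       if col % 2 == 1])
    return picked


# Hypothetical data: column A holds the date, every other cell holds its own column number.
TODAY = date(2021, 6, 23)
SAMPLE_ROWS = [
    [TODAY] + list(range(2, 24)),
    [date(2021, 6, 22)] + list(range(2, 24)),
]

picked = todays_odd_columns(SAMPLE_ROWS, TODAY)
print(picked)
```

With columns E through W, the odd-column test keeps ten columns, which matches the ten-cell range B2:K2 that the single formula fills.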
NOTE: currently, while the formula is working, you will see no results in B2:K2 because you don't have any data for TODAY in your StageTracking sheet yet. Once you add data for today's date, you will see the formula populate B2:K2. (Or, you can simply add -1 after TODAY() in the current formula if you want to see the results from "yesterday" temporarily, in order to be sure the formula is, in fact, working.)
Try this:
=QUERY(StageTracking!A1:W1000,"select C where A = '06/23/2021'",1)
or
=QUERY(StageTracking!A1:W1000,"select C where A = date '"&TEXT(TODAY(),"yyyy-mm-dd")&"'",1)
Take a few minutes to review the scalar functions supported in the QUERY() function.
https://developers.google.com/chart/interactive/docs/querylanguage#scalar_functions
You can use YEAR(), MONTH(), DAY() or NOW(). NOW() is a complete timestamp including the time, so that would require more effort.
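The second formula above works because the date is formatted outside the query string and concatenated in as a date literal. A minimal Python sketch of the same string-building step, with a hypothetical helper name build_query:

```python
from datetime import date


def build_query(day):
    """Build the query text with the date written as a yyyy-mm-dd literal."""
    # Mirrors: "select C where A = date '" & TEXT(TODAY(),"yyyy-mm-dd") & "'"
    return "select C where A = date '" + day.strftime("%Y-%m-%d") + "'"


query = build_query(date(2021, 5, 13))
print(query)

# Using the current day instead of a fixed one:
print(build_query(date.today()))
```

The query text only ever contains a finished literal such as date '2021-05-13', so nothing inside the quotes needs to be evaluated as a spreadsheet function.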

SUMIFS and/or QUERY inside ARRAYFORMULA

Google spreadsheet sample: https://docs.google.com/spreadsheets/d/1MdRjm5QmKY_vaah9c3GrvH6dDOBCQX_zvCubvN0akmk/edit?usp=sharing
I'm trying to get the sum of all values for each ID. The values I'm trying to add up are in the Source tab, while the calculations are done in the Output tab. My desired values are based on 2 things: ID and Date. The ID is supposed to match and the Date is supposed to be in February. I first tried a SUMIF that only matches the ID, and it worked using this formula: =ARRAYFORMULA(IF(A2:A="",, SUMIF(Source!A:A,A2:A,Source!B:B)))
But when I add the 2nd criterion and use a SUMIFS function, it only outputs a result for the first ID. Here is the SUMIFS formula I used: =ARRAYFORMULA(SUMIFS(Source!B2:B,Source!A2:A,A2:A,Source!C2:C,">="&DATE(2021,2,1),Source!C2:C,"<="&DATE(2021,2,28)))
I tried using QUERY, as some of the answers I found online suggested, but it also outputs only the first row. Here is the QUERY formula I used: =ARRAYFORMULA(QUERY(Source!A2:C,"select sum(B) where A = '"&Output!A2:A&"' and C >= date '"&TEXT(DATEVALUE("2/1/2021"),"yyyy-mm-dd")&"' and C <= date '"&TEXT(DATEVALUE("2/28/2021"),"yyyy-mm-dd")&"' label sum(B) '' "))
I know this is possible by making a temporary query/filter that includes only the desired dates and then using SUMIF on it, but I will need a monthly total, and making 12 of these temporary filters/queries would take up a lot of space since we have a lot of data, so I want to avoid this option if possible. Is there a better fix for this situation?
Solved by Astrotia - =arrayformula(sumif(I3:I20&month(K3:K20), A2:A6&2, J3:J20))
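The accepted formula turns two criteria into one by matching on a combined key of ID and month. A rough Python sketch of that idea follows; the function name and sample rows are hypothetical. It uses an (ID, month) tuple as the key, which avoids the ambiguity that plain text concatenation could cause (ID "1" with month 12 and ID "11" with month 2 would both concatenate to "112"). Like the formula, it looks only at the month and ignores the year.

```python
from collections import defaultdict
from datetime import date


def sum_by_id_for_month(source_rows, ids, month):
    """Sum values per ID, counting only rows whose date falls in `month`."""
    totals = defaultdict(int)
    for row_id, value, day in source_rows:
        # Combined key, the counterpart of I3:I20 & month(K3:K20) in the formula
        totals[(row_id, day.month)] += value
    # One result per requested ID, the counterpart of A2:A6 & 2
    return [totals.get((wanted, month), 0) for wanted in ids]


# Hypothetical source data: (ID, value, date)
SOURCE = [
    ("A1", 10, date(2021, 2, 3)),
    ("A1", 5, date(2021, 3, 1)),
    ("B2", 7, date(2021, 2, 28)),
    ("A1", 2, date(2021, 2, 14)),
]

february_totals = sum_by_id_for_month(SOURCE, ["A1", "B2", "C3"], 2)
print(february_totals)
```

Changing the month argument gives the total for another month without building a separate filtered copy of the data.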

How to get child nodes through importxml xpath query?

I'm trying to get the separate <td>'s of a <tr> that I'm importing through IMPORTXML to show up separately in Google Sheets.
This formula should get my match data based on the match ID and player ID I provide. I feel that simply adding /* or /td to the end of the XPath should work, but that's where my knowledge ends.
I tried adding /*, /td and other suffixes to the end of the XPath query, but none of them seem to work.
I even disabled JavaScript and inspected the website again, but to no avail.
FORMULA:
=IMPORTXML("https://www.dotabuff.com/matches/5011379854";"//tr[contains(@class,'9764136')]")
Also tried:
=IMPORTXML("https://www.dotabuff.com/matches/5011379854";"//td[parent::tr[contains(@class,'9764136')]]")
Which only gives the first of all the <td>'s and not the rest.
Current output is all mushed together:
"19LemthTop (Off)ZeusCoreTop (Off) Roaminglost27108.7k127933650626.5k-183-/-5m7m21m31m"
The output that I want is separate <td> on separate lines:
"19
LemthTop (Off)ZeusCoreTop (Off) Roaminglost
2
7
10
8.7k
127
9
336
506
26.5k
-
183
-/-
5m7m21m31m"
Issue and workaround:
Although I tried to parse the values of each row, unfortunately it seems that the td elements of a row cannot be retrieved as separate cells using an XPath with IMPORTXML. Fortunately, each table can be retrieved with IMPORTHTML, and each tab can also be accessed. Using them, how about the following workaround?
Retrieve a table from the URL using IMPORTHTML.
Retrieve the row that includes the name corresponding to 9764136 using a query.
Modified formula:
=TRANSPOSE(SPLIT(TEXTJOIN("#",TRUE,QUERY(IMPORTHTML(A1,"table",1), "where Col4 contains '"&IMPORTXML(A1,"//a[contains(@href,'9764136')]")&"'", 0)),"#",TRUE,TRUE))
The URL https://www.dotabuff.com/matches/5011379854 is in cell "A1".
After the table is retrieved, the row is extracted from it by the query.
The important point of this workaround is the methodology. I think there are various formulas for retrieving the value, so please think of the above sample formula as just one of them.
Result:
Note:
If you use the above formula for other URLs, an error might occur. Please be careful about this.
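The methodology of the workaround (take the whole table, keep the row whose fourth column contains the player name, drop the empty cells, and list the rest vertically) can be sketched in Python as follows. The function name and the sample table are hypothetical and much smaller than the real page. The sketch does not reproduce every detail of SPLIT, such as splitting cells that themselves contain the delimiter.

```python
def matching_cells_as_column(table, name, name_col=3):
    """Return the non-empty cells of every row whose name column contains `name`."""
    cells = []
    for row in table:
        # QUERY(..., "where Col4 contains '<name>'"): Col4 is index 3 here
        if name in str(row[name_col]):
            # TEXTJOIN(..., TRUE, ...) skips empty cells; TRANSPOSE lists them vertically
            cells.extend(str(cell) for cell in row if cell not in ("", None))
    return cells


# Hypothetical table rows for illustration only.
SAMPLE_TABLE = [
    ["19", "", "", "Lemth Zeus", "2", "7", "10"],
    ["20", "", "", "Other Axe", "1", "3", "4"],
]

column = matching_cells_as_column(SAMPLE_TABLE, "Lemth")
for cell in column:
    print(cell)
```

Each printed line corresponds to one cell of the matching row, which is the one-value-per-line layout the question asks for.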
References:
IMPORTHTML
IMPORTXML
TEXTJOIN
SPLIT
TRANSPOSE

Resources