Formatting IMPORTXML XPath query into readable data for Google Sheets

I am importing XML data to a google sheet which has the following structure:
Edit: the URL is https://sonicstate.com/news/tools/revive_stats/client_camp_3785.xml
<data>
<campaignId>15802</campaignId>
<campaignName>Some name</campaignName>
<startDate>
<year>2021</year>
<month>12</month>
<day>02</day>
</startDate>
<endDate>
<year>2021</year>
<month>12</month>
<day>13</day>
</endDate>
</data>
<data>
.... another record
</data>
I want the results imported as one row per record: the start and end dates each concatenated into a single value, and the other values in their own cells, so the rows are manageable and I can then query them.
For example:
15802 | Some name | 2021/12/02 | 2021/12/13
15803 | Another name | 2021/11/30 | 2021/12/04
I have tried:
IMPORTXML("myurl" , "//data/campaignId | //data/campaignName | //data/startDate/year | //data/startDate/month|//data/startDate/day")
But each value is returned on its own row, with year, month, and day in separate cells. For example:
15802
Some Name
Year | Month | Day
15802
Another Name
Year | Month | Day
etc
I also tried:
IMPORTXML("myurl" , "concat(//data/campaignId , //data/campaignName , //data/startDate/year,'/', //data/startDate/month,'/',//data/startDate/day)")
But that returns only a single record. I'm struggling to find the right terms to search for what I am trying to achieve. Also, IMPORTXML in Sheets uses XPath 1.0, which limits the functions available.

try:
=INDEX(SUBSTITUTE(TRIM(SPLIT(FLATTEN(SPLIT(QUERY(FLATTEN(QUERY(TRANSPOSE(
IFERROR(IF(0=MOD(ROW(A:A)-1, 5)-{0, 1, 4, 3, 2}, {"♦","","","",""}&TEXT(
IMPORTXML(A1, "//data/campaignId|//data/campaignName|//data/startDate/year|//data/startDate/month|//data/startDate/day"),
{"#","#","#","#","#"})&{"♠","♠","","♣","♣"}, ))),,9^9)),,9^9), "♦")), "♠")), "♣ ", "/"))
Shorter formula:
=INDEX(SUBSTITUTE(TRIM(SPLIT(FLATTEN(SPLIT(QUERY(IFNA(
CHOOSE(MATCH(MOD(SEQUENCE(999)-1, 5), {0, 1}), "♦", )&
IMPORTXML(A1, "//data/campaignId|//data/campaignName|//data/startDate/year|//data/startDate/month|//data/startDate/day")&
CHOOSE(MATCH(MOD(SEQUENCE(999)-1, 5), {0, 1, 2, 3, 4}), "♠","♠","♣","♣","")),,9^9), "♦")), "♠")), "♣ ", "/"))

Try this:
in A1: your URL
in A2: your function, unchanged
in B1: 5, since you request five values per record
in B2:
=ARRAYFORMULA(VLOOKUP(SEQUENCE(ROUNDUP(COUNTA(A2:A)/B1),B1,ROW(A2)),{ROW(A2:A),A2:A},2,0))
https://docs.google.com/spreadsheets/d/1nFcPgXgRc11-WWICG8Y8KEsB4qAoOt1lIbptHzXdC4M/edit?usp=sharing
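The B2 formula builds a grid of row numbers with SEQUENCE (ROUNDUP(COUNTA(A2:A)/B1) rows by B1 columns, starting at ROW(A2)) and looks each number up in column A, so the single column is laid out row by row in groups of B1. A rough Python equivalent, only to illustrate the indexing; the sample values come from the question, and `None` stands in for the #N/A cells that the exact-match VLOOKUP would produce past the last value:

```python
import math


def reshape(column, n):
    """Lay out a flat column row by row, n cells per row.

    Mirrors SEQUENCE(ROUNDUP(COUNTA(col)/n), n, ...) + VLOOKUP: cell (r, c)
    takes item r*n + c; positions past the end become None (#N/A in Sheets).
    """
    rows = math.ceil(len(column) / n)
    return [
        [column[r * n + c] if r * n + c < len(column) else None for c in range(n)]
        for r in range(rows)
    ]


# Values from the question, as IMPORTXML returns them: one per row.
flat = ["15802", "Some name", "2021", "12", "02",
        "15803", "Another name", "2021", "11", "30"]
for row in reshape(flat, 5):
    print(row)
```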

try:
=INDEX({IMPORTXML(A1, "//data/campaignId"),
IMPORTXML(A1, "//data/campaignName"),
IMPORTXML(A1, "//data/startDate/year")&"/"&
IMPORTXML(A1, "//data/startDate/month")&"/"&
IMPORTXML(A1, "//data/startDate/day")})
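This version runs one IMPORTXML per field and places the results side by side, which lines the columns up row for row only if every <data> record contains exactly one of each element. A Python sketch of the same pairing, using the year/month/day order the question asks for; the length check is my addition, not something the formula does:

```python
def combine(ids, names, years, months, days):
    """Pair the i-th result of each separate query into one output row."""
    columns = [ids, names, years, months, days]
    # If any record is missing an element, that column is shorter and every
    # later row would be misaligned, so refuse instead of silently zipping.
    if len({len(col) for col in columns}) != 1:
        raise ValueError("queries returned different numbers of nodes")
    return [[i, n, f"{y}/{m}/{d}"]
            for i, n, y, m, d in zip(ids, names, years, months, days)]


rows = combine(["15802", "15803"], ["Some name", "Another name"],
               ["2021", "2021"], ["12", "11"], ["02", "30"])
for row in rows:
    print(" | ".join(row))
```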

Related

Use google query as an array formula to sum values in a column based on information in another column

I have two tabs to work with. The first tab, market_pull, has a function that pulls information from EVE Online's ESI and sorts it into several columns:
duration | is_buy_order | issued | location_id | min_volume | order_id | price | range | type_id | volume_remain | volume_total
The second tab, bulk_market_data, sorts the information into several columns:
Citadel ID | Item Id | Item Name | Volume Ea | Qty Available | Lowest Price | Total Volume | Jita Sell | ISK Difference | % Difference
I need help with the bulk_market_data tab. For each Item Id in bulk_market_data, I need to find the rows in market_pull whose type_id matches and return the MIN value of the price column across those rows.
I need to do essentially the same thing for volume: for each Item Id, return the SUM of volume_remain across all matching rows in market_pull.
I'm using array formulas because the bulk_market_data tab has about 10,000 rows, and when I had a formula in every row of every column the sheet slowed down drastically. Thank you for your time. Here is a sample spreadsheet with the concept.
use in F4:
=ARRAYFORMULA(IFNA(VLOOKUP(C4:C, QUERY({market_pull!C4:C, market_pull!J4:K},
"select Col2,sum(Col3) where Col2 is not null and Col1 = FALSE group by Col2"), 2, 0)))
use in G4:
=ARRAYFORMULA(IF(C4:C="",,IFNA(VLOOKUP(C4:C, SORT(QUERY({market_pull!C4:J},
"select Col8,Col6 where Col1 = FALSE"), 1, 1, 2, 1), 2, 0), 0)))
use in H4:
=ARRAYFORMULA(IF(C4:C="",,ROUNDUP(IF(
E4:E1004*F4:F1004=0,,E4:E1004*F4:F1004), 1)))
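Both lookups are per-key aggregations: F4 groups the orders by type ID and sums the volume, while G4 sorts by type ID and then price ascending, so VLOOKUP's first exact match is the lowest price. The same idea in Python, as a sketch only: the order rows below are made up, and reading the formulas' `Col1 = FALSE` test as "is_buy_order is FALSE" (sell orders only) is my assumption:

```python
from collections import defaultdict

# Hypothetical market_pull rows: (is_buy_order, type_id, price, volume_remain).
orders = [
    (False, 34, 5.10, 1000),
    (False, 34, 4.95, 250),
    (True, 34, 4.00, 5000),   # buy order: excluded, as with Col1 = FALSE
    (False, 35, 9.80, 40),
]


def summarize(orders):
    """Per type_id over sell orders: (MIN price, SUM volume_remain)."""
    min_price = {}
    total_volume = defaultdict(int)
    for is_buy, type_id, price, volume in orders:
        if is_buy:
            continue
        min_price[type_id] = min(price, min_price.get(type_id, price))
        total_volume[type_id] += volume
    return {t: (min_price[t], total_volume[t]) for t in min_price}


item_ids = [34, 35, 36]  # hypothetical Item Id column of bulk_market_data
summary = summarize(orders)
for item in item_ids:
    print(item, summary.get(item))  # None plays the role of IFNA's blank
```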

count empty cells until the next filled cell

I have a table similar to the one below and would like to automate the calculation of the Sum column. The number of rows per day varies. I'm looking for a way to find the number of empty cells in the Date column after the current row; that number could then be used to fill the Sum column.
Regardless of the solution below, is there a way to count the number of empty cells between the dates?
Date      |Value|Sum
----------+-----+---
16/07/2020|    2|  5
          |    3|
17/07/2020|    2| 10
          |    3|
          |    5|
18/07/2020|    2| 11
          |    3|
          |    5|
          |    1|
If your data starts in row 1, use:
=ARRAYFORMULA(IF(A:A="",,VLOOKUP(A:A, QUERY({VLOOKUP(ROW(A:A),
FILTER({ROW(A:A), A:A}, A:A<>""), 2), B:B},
"select Col1,sum(Col2) group by Col1"), 2, 0)))
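For the count per date (the dated row plus the empty date cells below it, as long as every row has a Value), use:
=ARRAYFORMULA(IFNA(VLOOKUP(A:A, QUERY(IF(B:B="",,VLOOKUP(ROW(A:A),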
IF(A:A<>"", {ROW(A:A), A:A}), 2, 1)),
"select Col1,count(Col1)
where Col1 is not null
group by Col1
label count(Col1)''"), 2, 0)))
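Both formulas rely on the same trick: VLOOKUP(ROW(A:A), ..., 2, 1) with approximate match returns, for every row, the most recent non-empty date at or above it, which effectively fills the Date column down. Once each row carries its date, summing or counting per date is an ordinary group-by. A Python sketch of that fill-down, using the sample table from the question:

```python
def fill_down_totals(rows):
    """rows: (date, value) pairs where date is '' on continuation rows.

    Fills each blank date with the last date seen above it, then returns
    {date: sum of values} and {date: number of blank date cells after it}.
    """
    sums, blanks = {}, {}
    current = None
    for date, value in rows:
        if date:
            current = date
            sums.setdefault(current, 0)
            blanks.setdefault(current, 0)
        elif current is None:
            continue  # rows above the first date have nothing to fill from
        else:
            blanks[current] += 1
        sums[current] += value
    return sums, blanks


table = [("16/07/2020", 2), ("", 3),
         ("17/07/2020", 2), ("", 3), ("", 5),
         ("18/07/2020", 2), ("", 3), ("", 5), ("", 1)]
sums, blanks = fill_down_totals(table)
print(sums)
print(blanks)
```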
I don't think you need the answer to your first question to calculate the Sum column.
With column C entirely blank, try this in C1:
=ARRAYFORMULA({"Sum";IF(A2:A="",,VLOOKUP(A2:A,QUERY({VLOOKUP(ROW(A2:A),FILTER({ROW(A2:A),A2:A},A2:A<>""),2),B2:B},"Select Col1,SUM(Col2) group by Col1"),2,0))})
If that doesn't work, it might be easier to demonstrate the idea on a sample sheet.
André, try this:
1.) Delete C:C entirely (including the header).
2.) Place the following formula into cell C1:
=ArrayFormula({"Sum";IF(A2:A="","",VLOOKUP(A2:A,QUERY({VLOOKUP(ROW(A2:A),FILTER({ROW(A2:A),A2:A},A2:A<>"",B2:B<>""),2,TRUE),B2:B},"Select Col1, SUM(Col2) Group By Col1"),2,FALSE))})
UPDATE:
The example in your post has headers, so the formula I suggested accounted for them. Your actual sample sheet has no headers, unlike the original post, so use this version instead:
=ArrayFormula(IF(A:A="","",VLOOKUP(A:A,QUERY({VLOOKUP(ROW(A:A),FILTER({ROW(A:A),A:A},A:A<>"",B:B<>""),2,TRUE),B:B},"Select Col1, SUM(Col2) Group By Col1"),2,FALSE)))

In Google Sheets How do I get a sum of comma separated values in rows against an ID

I have a table that maps IDs to names:
Sheet 1
A | B
1 | Joe
12 | Dave
23 | Pete
I then have a table showing which people (by ID) were present at each event:
Sheet 2
A | B
boston | 1
florida | 1,12
nyc | 12,23
In a third sheet, for appearances, I want to achieve the following:
Sheet 3
A | B (Appearances)
Joe | 2
Dave | 2
Pete | 1
I can get this to work when only one person makes an appearance, with something like =COUNTIF(appearances!A:A, INDEX(name_db!$A$2:$A$1000, MATCH($A11, name_db!$B$2:$B$1000, 0)))
But as soon as I add a comma-separated value, it all goes wrong.
I've tried looking into VLOOKUP and similar functions but can't quite figure it out.
Any pointers on where to look would be much appreciated.
=ARRAYFORMULA(IFERROR(VLOOKUP(G:G, QUERY(VLOOKUP(TRANSPOSE(SPLIT(CONCATENATE(
IF(IFERROR(SPLIT(E:E, ","))<>"", "♦"&SPLIT(E:E, ","), )), "♦")), A:B, 2, 0),
"select Col1,count(Col1) group by Col1 label count(Col1)''", 0), 2, 0)))
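The formula splits every cell of the event column on commas, stacks the pieces into one column, turns each ID into a name with VLOOKUP against the ID table, and counts each name with QUERY's group by. The same pipeline in Python on the question's sample data; stripping spaces around IDs and skipping unknown IDs are my own choices:

```python
from collections import Counter

id_to_name = {"1": "Joe", "12": "Dave", "23": "Pete"}             # Sheet 1
events = [("boston", "1"), ("florida", "1,12"), ("nyc", "12,23")]  # Sheet 2


def count_appearances(id_to_name, events):
    """Split each comma-separated ID list, map IDs to names, count per name."""
    counts = Counter()
    for _place, ids in events:
        for raw_id in ids.split(","):
            name = id_to_name.get(raw_id.strip())
            if name is not None:  # IDs missing from Sheet 1 are skipped
                counts[name] += 1
    return counts


for name, n in count_appearances(id_to_name, events).items():
    print(name, n)
```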
This works for me:
=ARRAYFORMULA(IFERROR(SUM(SPLIT(D2;","))))

Google Sheets formula with variable number of disjoint columns

In the table I have multiple disjoint columns with similar meaning: “is this row interesting?”
I want to create an array formula to get all interesting rows. How can I approach this?
Example table:
Obj id | Case 1 data | Case 1 interesting? | Case 2 data | Case 2 interesting?
1 | … | YES | … | NO
2 | … | NO | … | NO
3 | … | NO | … | YES
4 | … | NO | … | NO
5 | … | YES | … | YES
6 | … | NO | … | NO
The actual table is split into several sheets with different subsets of ids on each sheet.
My current approach is stuck because INDIRECT does not accept arrays or ranges. I first search for my columns with FILTER(COLUMN(A1:1), REGEXMATCH(A1:1, "interesting")), then convert the column numbers to ranges, but when I feed the result to INDIRECT it returns only the first column.
The desired formula would output an array of unique object ids where each row is interesting for at least one case.
UPDATE: here is a test table for this problem. There are three sheets: the students' data with IDs, and two programs. Each program has several exams (how many is not known beforehand). The desired formula would output an array of unique student IDs with at least one Passed exam (in the test sheet: 1, 3, 4, 6).
={"Passing"; ARRAYFORMULA(UNIQUE(QUERY({
IF(IFERROR(REGEXEXTRACT(REGEXREPLACE(TRIM(TRANSPOSE(QUERY(
TRANSPOSE('Program 1'!A1:Z), , 999^99))), "Pass", "♠"), "♠"))="♠", 'Program 1'!A1:A, );
IF(IFERROR(REGEXEXTRACT(REGEXREPLACE(TRIM(TRANSPOSE(QUERY(
TRANSPOSE('Program 2'!A1:Z), , 999^99))), "Pass", "♠"), "♠"))="♠", 'Program 2'!A1:A, )},
"where Col1 is not null order by Col1", 0)))}
If you want to look each ID up with VLOOKUP and mark it PASS or FAIL:
=ARRAYFORMULA(IF(LEN(A2:A), IF(IFERROR(VLOOKUP(A2:A, UNIQUE(QUERY({
IF(IFERROR(REGEXEXTRACT(REGEXREPLACE(TRIM(TRANSPOSE(QUERY(
TRANSPOSE('Program 1'!A1:Z), , 999^99))), "Pass", "♠"), "♠"))="♠", 'Program 1'!A1:A, );
IF(IFERROR(REGEXEXTRACT(REGEXREPLACE(TRIM(TRANSPOSE(QUERY(
TRANSPOSE('Program 2'!A1:Z), , 999^99))), "Pass", "♠"), "♠"))="♠", 'Program 2'!A1:A, )},
"where Col1 is not null", 0)), 1, 0))<>"", "PASS", "FAIL"), ))
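The first formula joins each row of a program sheet into one string (the TRANSPOSE/QUERY/TRANSPOSE trick), keeps the ID in column A when that string contains "Pass", stacks both programs, and returns the unique IDs in order. A Python sketch of that logic; the two program tables below are hypothetical stand-ins for the test sheet, which is not reproduced here:

```python
def passing_ids(*program_sheets):
    """Return sorted unique student IDs (column A) whose row mentions 'Pass'
    in any exam cell, across any number of program sheets."""
    ids = set()
    for sheet in program_sheets:
        for row in sheet:
            # Substring test, so both "Pass" and "Passed" count.
            if row and any("Pass" in str(cell) for cell in row[1:]):
                ids.add(row[0])
    return sorted(ids)


# Hypothetical program sheets: header row, then [id, exam results...].
program_1 = [["ID", "Exam 1", "Exam 2"],
             [1, "Passed", "Failed"], [2, "Failed", "Failed"], [3, "Failed", "Passed"]]
program_2 = [["ID", "Exam A"],
             [4, "Passed"], [6, "Passed"], [2, "Failed"]]
print(passing_ids(program_1, program_2))
```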

Auto-filling empty cells with Google Sheets' query

I have a table with data that I want to run a QUERY from.
In the output tab I need just one column from the data tab. The output tab also has three extra columns that do not exist in the data tab, and I need those filled automatically based on conditions, preferably by the QUERY itself.
I am using a simple QUERY formula to load the data that I have in the source tab to the output tab.
=QUERY('Source'!$A$1:$X, "SELECT A WHERE F IS NOT NULL", 1)
The catch is that the output sheet cannot contain any formula other than the QUERY itself, because some issues arise when a Google Sheet containing formulas is exported to .CSV.
Whether or not that is true, these are the rules.
This is the output that I need to have:
+---------+------------+-------------------+-----------------+
| Country | Researched | Status | Reason |
+---------+------------+-------------------+-----------------+
| UK | TRUE | In Progress | |
+---------+------------+-------------------+-----------------+
| US | TRUE | Unable to Proceed | Not a UK member |
+---------+------------+-------------------+-----------------+
Column 1 is what the QUERY extracts from the source.
Columns 2 to 4 are the ones that I need to create with the QUERY.
The value of each cell in those columns depends on column 1, except for column 2, which needs to have the value "TRUE" for every record.
Is it possible to implement multiple conditions in the QUERY itself, that will fill the empty columns in the output tab, based on conditions?
=QUERY({QUERY(Source!$A$1:$X,
"select A, 'TRUE'
where F is not null
label 'TRUE' 'Researched'", 1),
QUERY(ARRAYFORMULA(IFERROR(VLOOKUP(
QUERY(Source!$A$2:$X,
"select A
where F is not null", 0),
{"UK", "In Progress", ""}, {2, 3}, 0),
{"Unable to Proceed", "Not a UK member"})),
"select *
label Col1 'Status', Col2 'Reason'", 0)}, , 0)
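The inner VLOOKUP treats the inline array {"UK", "In Progress", ""} as a one-row lookup table, and IFERROR supplies the fallback pair for any country that is not found. In Python that is a dictionary lookup with a default; the rules below are read off the expected output in the question, and the input list is assumed to be already filtered the way `where F is not null` filters it:

```python
HEADER = ["Country", "Researched", "Status", "Reason"]
RULES = {"UK": ("In Progress", "")}                  # the {"UK", "In Progress", ""} row
FALLBACK = ("Unable to Proceed", "Not a UK member")  # the IFERROR value


def build_output(countries):
    """Country, a constant TRUE, then Status/Reason looked up per country."""
    return [HEADER] + [[c, "TRUE", *RULES.get(c, FALLBACK)] for c in countries]


for row in build_output(["UK", "US"]):
    print(row)
```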
