How to use REGEXREPLACE in Google Sheets? - google-sheets

I import data via IMPORTXML and get it in the form shown in columns A:B.
How can I use REGEXREPLACE to bring it into the form I showed in columns D:F?
For the price I tried
ARRAYFORMULA(--REGEXREPLACE(IMPORTXML(I2;"//span[@class='tooltip_3']");"\D";)),
but in this form I get extra zeros at the end.
https://docs.google.com/spreadsheets/d/1nx4m_qMRA5Fm1lr18p2Mvt6Ds96_Kay-zJKGVZn2Uaw/edit?usp=sharing

try:
=ARRAYFORMULA({
SPLIT(IMPORTXML("https://www.rsi-llc.ru/catalog/195/";"//h2[@class='product-item__title']"); " | ";) \
REGEXEXTRACT(SUBSTITUTE(IMPORTXML("https://www.rsi-llc.ru/catalog/195/";"//span[@class='tooltip_3']"); " "; ); "\d+")*1;
SPLIT(IMPORTXML("https://www.rsi-llc.ru/catalog/100/";"//h2[@class='product-item__title']"); " | ";) \
REGEXEXTRACT(SUBSTITUTE(IMPORTXML("https://www.rsi-llc.ru/catalog/100/";"//span[@class='tooltip_3']"); " "; ); "\d+")*1})
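
Why the extra zeros appear, and why extracting the first digit run avoids them, can be sketched in Python. This is only an illustration: the sample price text and the function names are assumptions, not values taken from the page.

```python
import re

def digits_only(text: str) -> int:
    # Mirrors REGEXREPLACE(text; "\D"; ""): every non-digit is dropped,
    # so the digits after the decimal separator are glued onto the price.
    return int(re.sub(r"\D", "", text))

def first_number(text: str) -> int:
    # Mirrors REGEXEXTRACT(SUBSTITUTE(text; " "; ""); "\d+")*1:
    # remove the spaces used as thousands separators, then keep only
    # the first run of digits.
    return int(re.search(r"\d+", text.replace(" ", "")).group())

sample = "12 345.00 руб."  # hypothetical price text
print(digits_only(sample))   # 1234500
print(first_number(sample))  # 12345
```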

Related

Google sheets - Combine rows from one column, until a specific character is detected

Let's say I have column A:
A
"|---|----|--|-|----|-------|--
--|----|--|"
|-----| | | | |------
----------|--------------
-----|----| |"
And I want to make it look like this:
B
|---|----|--|-|----|-------|----|----|--|"
|-----| | | | |-----------------------------------|----| | |"
So B is the correct form, where " marks the end of each wanted row and the beginning of the next, but my data is split into smaller rows as in column A. How can I convert column A to look like B? Is there a formula that combines everything into one row until it sees ", then starts a new row and repeats the process until it reaches the bottom of the column? I tried functions like JOIN, CONCAT and TEXTJOIN but couldn't make them work. I also looked for options in Notepad and tried Excel, but couldn't find a solution. I'm open to any ideas.
Use string manipulation, like this:
=arrayformula(
transpose(
split(
join("", A2:A),
"""", false, true
)
)
& """"
)
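
The same join-then-split logic, sketched in Python for readers who want to see each step. The function name and the shortened sample rows are assumptions made up for the illustration.

```python
def merge_rows(rows: list[str]) -> list[str]:
    # join("", A2:A): glue all fragments into one long string.
    joined = "".join(rows)
    # split(..., """", false, true): cut at every quote, drop empty pieces.
    pieces = [piece for piece in joined.split('"') if piece]
    # & """": put the closing quote back on each rebuilt row.
    return [piece + '"' for piece in pieces]

# Simplified stand-in for column A
column_a = ['"|--|-', '-|--|"', '|---| |', '--|"']
for row in merge_rows(column_a):
    print(row)
```

Note that a quote at the very start of the column is dropped, since only the closing quote is re-appended to each row.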

Converting full names to 'Surname, First name' format on Google Spreadsheets

I have a Google Spreadsheet that record some author names like this:
A
A. Dagliati
A. Zambelli
A.H.M. ter Hofstede
Agnes Bates Koschmider
Ágnes Vathy-Fogarassy
Ahmed B. Najjar
Ala Norani
I want column B to receive some formula such that B will display the last name, a comma, and the first/middle name, like this:
A B
A. Dagliati Dagliati, A.
A. Zambelli Zambelli, A.
A.H.M. ter Hofstede Hofstede, A.H.M. ter
Agnes Bates Koschmider Koschmider, Agnes Bates
Ágnes Vathy-Fogarassy Vathy-Fogarassy, Ágnes
Ahmed B. Najjar Najjar, Ahmed B.
Ala Norani Norani, Ala
How can I do that?
Try this formula on row 2 of your sheet, with an empty column below it.
=ArrayFormula(IF(LEN(A2:A),REGEXEXTRACT(A2:A,".+\s(.+)") &", " & LEFT(A2:A,LEN(A2:A)-LEN(REGEXEXTRACT(A2:A,".+\s(.+)") )),""))
=CONCAT(RIGHT(A1,LEN(A1)-FIND("#",SUBSTITUTE(A1," ","#",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))),1)), CONCAT(", ", LEFT(A1, FIND("#",SUBSTITUTE(A1," ","#",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))),1))))
Basically, we cut the text at the last " " (space), take the part after it (the surname), append a comma, and then add the part before it.
Your best bet is probably to use regular expression replacement. It's not pretty, but it's the easiest way to search for the delimiter and perform the replacement using capture groups. I am still working on how to properly support cases where there is a single name like "A". For now it assumes it is the surname. Here is the formula:
=REGEXREPLACE(A1,"(.*?)([^ ]+)$", "$2, $1")
The first group matches any characters non-greedily (.*?, see the RE2 docs), which lets the second capture group take everything from the end of the string back to the last delimiter, a space in this case. Because the regular expression uses capture groups, we can reference them in the replacement string with the $1 and $2 placeholders.
The output is as desired.
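
For anyone who wants to try the pattern outside Sheets, here is a rough Python equivalent. Python's re module is not RE2, but this particular pattern should behave the same way; the function name and the final strip() are my additions.

```python
import re

def swap_with_regex(name: str) -> str:
    # The lazy first group takes as little as possible; the second group
    # must reach the end of the string without crossing a space, so it
    # ends up holding the last word.
    swapped = re.sub(r"(.*?)([^ ]+)$", r"\2, \1", name)
    # The first group keeps the space that preceded the surname, trim it.
    return swapped.strip()

print(swap_with_regex("A.H.M. ter Hofstede"))    # Hofstede, A.H.M. ter
print(swap_with_regex("Ágnes Vathy-Fogarassy"))  # Vathy-Fogarassy, Ágnes
```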
You could use a combination of:
SPLIT()
COLUMNS()
INDEX()
ARRAY_CONSTRAIN()
JOIN()
=JOIN(
", ",
INDEX(SPLIT(A2, " "), 1, COLUMNS(SPLIT(A2, " "))),
JOIN(" ", ARRAY_CONSTRAIN(SPLIT(A2, " "), 1, COLUMNS(SPLIT(A2, " "))-1))
)
Get the last name:
  Split up the name by space: SPLIT(A2, " ")
  Get the number of names: COLUMNS(SPLIT(A2, " "))
  Select the last name: INDEX(SPLIT(A2, " "), 1, COLUMNS(SPLIT(A2, " ")))
Get the other names:
  Split up the name by space again: SPLIT(A2, " ")
  Get all names except the last name: ARRAY_CONSTRAIN(SPLIT(A2, " "), 1, COLUMNS(SPLIT(A2, " "))-1)
  Join them back together: JOIN(" ", ARRAY_CONSTRAIN(SPLIT(A2, " "), 1, COLUMNS(SPLIT(A2, " "))-1))
Join in the desired order:
  JOIN(", ", LAST_NAME_FORMULA, OTHER_NAMES_FORMULA)
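
The split / index / join steps above map almost one-to-one onto list operations. A minimal Python sketch, with a made-up function name, just to show the flow:

```python
def reorder_name(name: str) -> str:
    parts = name.split(" ")            # SPLIT(A2, " ")
    last_name = parts[-1]              # INDEX(..., 1, COLUMNS(...))
    other_names = " ".join(parts[:-1]) # JOIN(" ", ARRAY_CONSTRAIN(..., COLUMNS(...)-1))
    return ", ".join([last_name, other_names])  # JOIN(", ", last, others)

print(reorder_name("Ahmed B. Najjar"))         # Najjar, Ahmed B.
print(reorder_name("Agnes Bates Koschmider"))  # Koschmider, Agnes Bates
```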

Problem with query using Join if value of Join contains '

I need to extract all rows from another sheet (Incidents!A2:P) into my sheet if column E of the row contains any word from a list (B2:B).
I didn't want to ask for everything, so I had a look myself and managed to write the following formula:
=query(Incidents!A2:P;"Select * where E matches " & "'" & (join("|";B2:B)) & "'")
It works perfectly fine ... except if one of the words of my list contains the ' character.
Example :
A B
2 let go
3 let's go
4 lets go
My formula works if there is no ' in row 3.
I think the ' is interpreted as a syntax character. Does anyone have a solution or an idea?
Thanks
Oliver
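Try delimiting the pattern with double quotes instead of single quotes (inside the formula string, each double quote is written as a doubled double quote):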
=query(Incidents!A2:P;"Select * where E matches " & """" & (join("|";B2:B)) & """")
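
To see why the apostrophe breaks the original version, it helps to look at the query text that actually gets built. A small Python sketch of the string concatenation (the function name is made up for the illustration):

```python
def build_query(words: list[str], quote: str) -> str:
    # Same concatenation as in the formula: delimiter + joined words + delimiter.
    return "Select * where E matches " + quote + "|".join(words) + quote

words = ["let go", "let's go", "lets go"]

single = build_query(words, "'")
double = build_query(words, '"')

print(single)  # Select * where E matches 'let go|let's go|lets go'
print(double)  # Select * where E matches "let go|let's go|lets go"

# With single-quote delimiters the apostrophe in "let's" closes the string
# early, leaving an odd number of delimiters in the query text.
print(single.count("'"))  # 3
print(double.count('"'))  # 2
```

The same problem would presumably come back if one of the words contained a double quote, so this works as long as the list has no " in it.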

Easiest way to query multiple sheets that are named by year, for all time?

I use a big nasty formula to query multiple sheets in the same Google Sheet document named according to year (2020, 2019, 2018, etc...) to sum up a total value. Because I need to query a filtered range in a complex way, I've figured out the best way to do this without running into other troubleshooting issues is to SUM multiple queries like so:
=SUM(
IFERROR(QUERY(FILTER({EOMONTH(INDIRECT("'"&TO_TEXT(YEAR(TODAY()))&"'!A1:A"&ROWS(INDIRECT("'"&TO_TEXT(YEAR(TODAY()))&"'!K2:K"))), 0),INDIRECT("'"&TO_TEXT(YEAR(TODAY()))&"'!K2:K")}, [filter conditions]), "select Col2 label Col2' ' ")),
IFERROR(QUERY(FILTER({EOMONTH(INDIRECT("'"&TO_TEXT(YEAR(TODAY()-365))&"'!A1:A"&ROWS(INDIRECT("'"&TO_TEXT(YEAR(TODAY()-365))&"'!K2:K"))),0),INDIRECT("'"&TO_TEXT(YEAR(TODAY()-365))&"'!K2:K")}, [filter conditions]),
"select Col2 label Col2' ' "))
)
For some context, you can see the much larger IF formula that this SUM is meant to be nested into, in the "Example Matrix" tab of the sheet. My focus for this question is on the INDIRECT references, which I have been using to dynamically reference the most current year's sheet and the previous year's sheet.
The problem is, if I want to keep doing this for every sheet as the years go on, I have to manually add a whole other query into my SUM using INDIRECT("'"&TO_TEXT(YEAR(TODAY()-730))&"'!K2:K") and INDIRECT("'"&TO_TEXT(YEAR(TODAY()-1095))&"'!K2:K") and so on, and that is just not an option considering how many of them I would need to add to multiple formulas in multiple sheets.
Is there any way I can adapt this for simplicity or perhaps make it into a script to accomplish summing queries for all sheets that are named by year for all time?
Here's a copy of my Example Sheet: https://docs.google.com/spreadsheets/d/1b29gyEgCDwor_KJ6ACP2rxdvauOzacDI9FL2K-jgg5E/edit#gid=1652431688
Thank you, any help is appreciated.
Usually an array formula would be a way to go in such case, but INDIRECT does not work inside array formulas.
There are also a few approaches that use scripting.
Here I will describe another approach: formula generation. We'll get a string with the formula and manually place it in a cell. It would be nice to put it in an inverted FORMULATEXT function, but unfortunately there is no such function at the moment, so we'll just paste it manually.
Step 1
Set the year limits (sheet names) in some cells. The first year of the period will be in K22, and the last will be in M22.
I set the period from 2005 to 2040.
All the year numbers are easily generated with SEQUENCE. If the sheets had arbitrary names, a manually entered range of those names would have been needed.
Step 2
Write a formula generator for what you need. It just builds a string, and that string contains the formula you would normally type by hand. It is not hard, but there is a lot of repetition, and it would be tedious to write manually.
Here is the generator:
=ARRAYFORMULA(
"=SUM(
FILTER(
{
" & JOIN(
";" & CHAR(10) & " ",
"IFERROR('" & SEQUENCE(M22 - K22 + 1, 1, K22, 1) & "'!D2:D, 0)"
) & "
},
ISNUMBER(
{
" & JOIN(
";" & CHAR(10) & " ",
"IFERROR('" & SEQUENCE(M22 - K22 + 1, 1, K22, 1) & "'!D2:D, 0)"
) & "
}
),
REGEXMATCH(
{
" & JOIN(
";" & CHAR(10) & " ",
"IFERROR('" & SEQUENCE(M22 - K22 + 1, 1, K22, 1) & "'!A2:A, 0)"
) & "
},
""(?i)^TOTAL$""
),
REGEXMATCH(
{
" & JOIN(
";" & CHAR(10) & " ",
"IFERROR('" & SEQUENCE(M22 - K22 + 1, 1, K22, 1) & "'!C2:C, 0)"
) & "
},
""(?i)^"" & IF(F19 = ""Condition 1 Count"", ""Condition 1"", ""Condition 2"") & ""$""
)
)
)"
)
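
If the string-building is hard to follow in formula form, here is the same idea as a reduced Python sketch. It only covers the value column and the TOTAL condition, emits everything on one line, and the function names are made up; it is meant to show the generation technique, not to replace the generator above.

```python
def stacked(first_year: int, last_year: int, column: str) -> str:
    # One IFERROR(...) per year sheet, stacked vertically with ";".
    refs = [
        f"IFERROR('{year}'!{column}2:{column}, 0)"
        for year in range(first_year, last_year + 1)
    ]
    return "{" + "; ".join(refs) + "}"

def build_formula(first_year: int, last_year: int) -> str:
    values = stacked(first_year, last_year, "D")
    labels = stacked(first_year, last_year, "A")
    return (
        "=SUM(FILTER("
        + values
        + ", ISNUMBER(" + values + ")"
        + ', REGEXMATCH(' + labels + ', "(?i)^TOTAL$")'
        + "))"
    )

print(build_formula(2005, 2007))
```

The printed text is what you would paste into a cell by hand, exactly as with the Sheets generator.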
Compared to the original formula, the resulting formula is heavily changed and simplified. For example, there is no actual need for INDIRECT with this approach, EOMONTH isn't used anywhere, and so on.
Step 3
Copy that result as text, remove the enclosing quotes, and replace doubled double quotes with single double quotes: "" -> ".
Now we have the formula to paste somewhere, just as if we had typed it manually. Here is a part of it:
=SUM(
FILTER(
{
IFERROR('2005'!F2:F, 0);
IFERROR('2006'!F2:F, 0);
...
IFERROR('2039'!F2:F, 0);
IFERROR('2040'!F2:F, 0)
},
ISNUMBER(
{
IFERROR('2005'!F2:F, 0);
IFERROR('2006'!F2:F, 0);
...
IFERROR('2039'!F2:F, 0);
IFERROR('2040'!F2:F, 0)
}
),
REGEXMATCH(
{
IFERROR('2005'!C2:C, 0);
IFERROR('2006'!C2:C, 0);
...
IFERROR('2039'!C2:C, 0);
IFERROR('2040'!C2:C, 0)
},
"(?i)^TOTAL$"
),
REGEXMATCH(
{
IFERROR('2005'!E2:E, 0);
IFERROR('2006'!E2:E, 0);
...
IFERROR('2039'!E2:E, 0);
IFERROR('2040'!E2:E, 0)
},
"(?i)^" & IF(F19 = "Condition 1 Count", "Condition 1", "Condition 2") & "$"
)
)
)
Step 4
Manually place this resulting formula into some cell.
It does what it is supposed to do: the dropdown reference works and non-existing sheets are tolerated.
There is no 2021 sheet, for example, but when it is created there will be no need to change the formula; data from the new sheet will be used.
You'll need to repeat the process in two cases: the formula needs a change in logic, or it is almost 2040 and you want to add another 50 years to the period. Even then, regenerating the formula is faster than manually editing the resulting monster.
A few notes on the original formula:
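YEAR(TODAY() - 365) ➡ YEAR(TODAY()) - 1. With your approach there will be an error because of leap years: a multiple of 365 days falls short of a whole number of calendar years whenever the span includes a leap day. How often it happens depends on the number of years, but it shows up in the last days of a year. For example, on 31 December 2020, TODAY() - 365 is still in 2020.
"select Col2 label Col2' ' " ➡ "select Col2 label Col2 ''". Do you really need a column with a header name ' ' (just a space)? I'm guessing it was meant to be blank.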
No need for TO_TEXT.
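
The leap-year note is easy to check with a few lines of Python. The function names are made up; the dates are just one case where the two calculations disagree.

```python
from datetime import date, timedelta

def year_by_days(today: date, years_back: int) -> int:
    # Equivalent of YEAR(TODAY() - 365 * n)
    return (today - timedelta(days=365 * years_back)).year

def year_by_subtraction(today: date, years_back: int) -> int:
    # Equivalent of YEAR(TODAY()) - n
    return today.year - years_back

day = date(2020, 12, 31)  # last day of a leap year
print(year_by_days(day, 1))         # 2020, still the current year
print(year_by_subtraction(day, 1))  # 2019
```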

Replace comma chars within importdata function

I'm importing a .csv file with the IMPORTDATA function. The separator is ; and the decimal character is , so Google Sheets automatically splits the text into columns at every comma. I guess this is the expected behavior of IMPORTDATA, but as a result my file is not parsed correctly.
I've tried using SUBSTITUTE to replace , with . but I guess the text-to-columns split is applied inside IMPORTDATA itself.
=ARRAYFORMULA(SPLIT(SUBSTITUTE(IMPORTDATA("https://drive.google.com/uc?export=download&id=1hosZrfgrKnJJgXkgmPZSKdFoYV_AxKJS"), ",", "."), ";"))
Is there any way to import a CSV with ; as a separator and , as a decimal symbol using a single formula?
I've seen solutions using multiple sheets but I'd like to keep it simple.
=ARRAYFORMULA(SPLIT(SUBSTITUTE(TRIM(TRANSPOSE(QUERY(TRANSPOSE(IMPORTDATA(
"https://drive.google.com/uc?export=download&id=1hosZrfgrKnJJgXkgmPZSKdFoYV_AxKJS")), ,
999^99))), " ", "."), ";"))
To compensate for values that themselves contain spaces:
=ARRAYFORMULA(SUBSTITUTE(SPLIT(SUBSTITUTE(TRIM(TRANSPOSE(QUERY(TRANSPOSE(SUBSTITUTE(
IMPORTDATA("https://drive.google.com/uc?export=download&id=1hosZrfgrKnJJgXkgmPZSKdFoYV_AxKJS"),
" ", "♠")), , 999^99))), " ", "."), ";"), "♠", " "))

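What the second formula does to each imported row can be traced step by step in Python. This is a simplified simulation under my own assumptions: IMPORTDATA is modelled as a plain split on commas, cells stay as text instead of being converted to numbers, and the function name and sample line are made up.

```python
PLACEHOLDER = "♠"

def rebuild_rows(lines: list[str]) -> list[list[str]]:
    result = []
    for line in lines:
        # What IMPORTDATA hands over: the line already cut at every comma.
        fragments = line.split(",")
        # SUBSTITUTE(..., " ", "♠"): protect spaces that belong to the data.
        fragments = [f.replace(" ", PLACEHOLDER) for f in fragments]
        # QUERY(TRANSPOSE(...), , 999^99) + TRIM: glue fragments with spaces.
        joined = " ".join(fragments).strip()
        # SUBSTITUTE(..., " ", "."): each glue space marks a former decimal comma.
        restored = joined.replace(" ", ".")
        # SPLIT(..., ";"): now split on the real separator.
        cells = restored.split(";")
        # SUBSTITUTE(..., "♠", " "): bring the original spaces back.
        result.append([c.replace(PLACEHOLDER, " ") for c in cells])
    return result

print(rebuild_rows(["New York;1,5;2,25"]))
# [['New York', '1.5', '2.25']]
```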