Converting full names to 'Surname, First name' format in Google Sheets

I have a Google spreadsheet that records some author names like this:
A
A. Dagliati
A. Zambelli
A.H.M. ter Hofstede
Agnes Bates Koschmider
Ágnes Vathy-Fogarassy
Ahmed B. Najjar
Ala Norani
I want a formula in column B that displays the last name, a comma, and then the first and middle names, like this:
A | B
A. Dagliati | Dagliati, A.
A. Zambelli | Zambelli, A.
A.H.M. ter Hofstede | Hofstede, A.H.M. ter
Agnes Bates Koschmider | Koschmider, Agnes Bates
Ágnes Vathy-Fogarassy | Vathy-Fogarassy, Ágnes
Ahmed B. Najjar | Najjar, Ahmed B.
Ala Norani | Norani, Ala
How can I do that?

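Try this formula in row 2 of your sheet, making sure the cells below it are empty so the array result can expand. TRIM drops the trailing space that LEFT would otherwise keep:
=ArrayFormula(IF(LEN(A2:A),REGEXEXTRACT(A2:A,".+\s(.+)") &", " & TRIM(LEFT(A2:A,LEN(A2:A)-LEN(REGEXEXTRACT(A2:A,".+\s(.+)")))),""))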

=CONCAT(RIGHT(A1,LEN(A1)-FIND("#",SUBSTITUTE(A1," ","#",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))),1)), CONCAT(", ", LEFT(A1, FIND("#",SUBSTITUTE(A1," ","#",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))),1))))
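This works by replacing the last space with "#" (SUBSTITUTE with an instance number equal to the number of spaces) and finding its position. Everything after that position is the surname; we append ", " and then everything from the beginning up to that space.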
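For illustration, the same last-space idea can be sketched in Python (the function name `swap_at_last_space` is hypothetical). `str.rfind` plays the role of the SUBSTITUTE/FIND trick, and because the slice stops before the space, no trailing space is carried into the first names. Returning a single-word name unchanged is an assumption; the formula does not define that case.

```python
def swap_at_last_space(name: str) -> str:
    """Turn 'First Middle Last' into 'Last, First Middle'."""
    # Position of the last space, like FIND("#", SUBSTITUTE(A1, " ", "#", n)).
    i = name.rfind(" ")
    if i == -1:
        # Assumption: a single word such as "A" is returned unchanged.
        return name
    # Text after the last space is the surname; text before it is the rest.
    return name[i + 1:] + ", " + name[:i]


print(swap_at_last_space("A.H.M. ter Hofstede"))  # Hofstede, A.H.M. ter
```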

Your best bet is probably to use regular expression replacement. It's not pretty, but it's the easiest way to search for the delimiter and perform the replacement using capture groups. I am still working on how to properly support cases where there is a single name like "A". For now it assumes it is the surname. Here is the formula:
=REGEXREPLACE(A1,"(.*?)\s*([^ ]+)$", "$2, $1")
The first capture group matches any characters non-greedily (.*?; see the RE2 syntax documentation), which lets the second capture group take all the characters after the last space, up to the end of the string. The \s* between the groups consumes that space, so it ends up in neither group and the result has no trailing space. Since we are using capture groups in the regular expression, we can reference them in the replacement string with the $1 and $2 placeholders.
The output is as desired.
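Python's `re` module uses the same leftmost, non-greedy matching as RE2 for this pattern, so the capture-group swap can be sketched with `re.sub` (the name `swap_with_regex` is hypothetical). The pattern places `\s*` between the groups so the separating space lands in neither group; Python writes the `$2, $1` references as `\2, \1`.

```python
import re

# Group 1: shortest possible prefix; then optional whitespace;
# group 2: the last run of non-space characters, anchored at the end.
PATTERN = re.compile(r"(.*?)\s*([^ ]+)$")


def swap_with_regex(name: str) -> str:
    # Same swap as REGEXREPLACE(A1, ..., "$2, $1"); count=1 makes one replacement.
    return PATTERN.sub(r"\2, \1", name, count=1)


for n in ["A. Dagliati", "Agnes Bates Koschmider", "A"]:
    print(repr(swap_with_regex(n)))
```

As in the answer, a single name such as "A" is treated as the surname and becomes "A, ".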

You could use a combination of:
SPLIT()
COLUMNS()
INDEX()
ARRAY_CONSTRAIN()
JOIN()
=JOIN(
", ",
INDEX(SPLIT(A2, " "), 1, COLUMNS(SPLIT(A2, " "))),
JOIN(" ", ARRAY_CONSTRAIN(SPLIT(A2, " "), 1, COLUMNS(SPLIT(A2, " "))-1))
)
Get the last name
Split the name on spaces: SPLIT(A2, " ")
Get the number of names: COLUMNS(SPLIT(A2, " "))
Select only the last name: INDEX(SPLIT(A2, " "), 1, COLUMNS(SPLIT(A2, " ")))
Get the other names
Split the name on spaces again: SPLIT(A2, " ")
Get all names except the last one: ARRAY_CONSTRAIN(SPLIT(A2, " "), 1, COLUMNS(SPLIT(A2, " "))-1)
Join them back together: JOIN(" ", ARRAY_CONSTRAIN(SPLIT(A2, " "), 1, COLUMNS(SPLIT(A2, " "))-1))
Join in the desired order
JOIN(", ", LAST_NAME_FORMULA, OTHER_NAMES_FORMULA)
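The split, select, and join steps above map directly onto Python list operations; here is a minimal sketch (the name `swap_with_split` is hypothetical). `str.split()` with no argument skips runs of spaces, which roughly matches SPLIT dropping empty pieces, although it also splits on tabs and other whitespace. An empty name is not handled: `parts[-1]` would raise IndexError.

```python
def swap_with_split(name: str) -> str:
    parts = name.split()               # SPLIT(A2, " ")
    last = parts[-1]                   # INDEX(..., 1, COLUMNS(...))
    others = " ".join(parts[:-1])      # JOIN(" ", ARRAY_CONSTRAIN(..., COLUMNS(...)-1))
    return ", ".join([last, others])   # JOIN(", ", last, others)


print(swap_with_split("Ahmed B. Najjar"))  # Najjar, Ahmed B.
```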

Related

Google Sheets: Remove comma between specific cells in a TEXTJOIN

I'm using an ARRAYFORMULA / TEXTJOIN formula in Google Sheets to pull selected data together to make a single line of code arranged in a specific way for my project.
The result needs to both INCLUDE commas in the first half and EXCLUDE commas towards the end of the same formula.
I'm currently using ", " as the delimiter of my TEXTJOIN, which places a comma between each cell. However, I also need the last few cells (in this case I9, O5, O6, O7, O8) to have no commas between them.
Is there a way to do this?
Thank you in advance!
Here is a demo of what I'm working on:
https://docs.google.com/spreadsheets/d/1gTQiNKy4c376FuIWQQAomlJ6J1utCOjuq6JzRplTSu4/edit?usp=sharing
Option 01
=TEXTJOIN(", ",1,
TEXTJOIN(", ",1,C5:F8),
TEXTJOIN(", ",1,C3,I5))&", "&
TEXTJOIN(", ",1,I6:L9)&" "&
TEXTJOIN(" ",1,O5:O8)
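The idea behind Option 01 (one delimiter for most cells, a different one for the last few) can be sketched in Python. The function name and the sample cell values are hypothetical; skipping empty strings mirrors the `1` (ignore_empty) argument of TEXTJOIN.

```python
def join_with_two_separators(comma_cells, space_cells):
    # TEXTJOIN(", ", 1, ...): skip empty cells, separate the rest with ", ".
    head = ", ".join(c for c in comma_cells if c)
    # TEXTJOIN(" ", 1, ...): the trailing cells are separated by spaces only.
    tail = " ".join(c for c in space_cells if c)
    # Glue the two parts with a single space, like the & " " & in the formula.
    return head + " " + tail


print(join_with_two_separators(["red", "", "blue"], ["--ar", "16:9"]))
# red, blue --ar 16:9
```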
Option 02
Use this formula to remove the last set of commas:
=REGEXREPLACE(B12,
REGEXEXTRACT(B12&"", " --ar.+?(,.+)"),
REGEXREPLACE(REGEXEXTRACT(B12&"", " --ar.+?(,.+)"), ",", ""))
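Option 02 can be sketched in Python as follows (the function name and sample text are hypothetical). One design difference: the sheet formula passes the extracted text back to REGEXREPLACE as a pattern, so characters such as "." or "(" in it are read as regex syntax; the sketch instead splices the string at the match position, which avoids that.

```python
import re


def drop_trailing_commas(text: str) -> str:
    # Same pattern as the REGEXEXTRACT: from the first comma after " --ar" to the end.
    m = re.search(r" --ar.+?(,.+)", text)
    if m is None:
        return text  # no commas after " --ar": leave the text unchanged
    tail = m.group(1)
    # Remove commas only inside that tail and keep everything before it as-is.
    return text[: m.start(1)] + tail.replace(",", "")


print(drop_trailing_commas("signal code: a, b, c --ar 16:9, x, y"))
# signal code: a, b, c --ar 16:9 x y
```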
Try this simpler formula (based on your formula)
=INDEX(concatenate("signal code: ",
TEXTJOIN(", ",1,C5:C8,C3,I5:I8) & " "
& TEXTJOIN(" ", 1,I9, O5, O6, O7, O8)))
I got the answer from Reddit:
If your formula is:
=ARRAYFORMULA(concatenate("signal code: ",TEXTJOIN(", ",TRUE,
$C$5:$F$5,$C$6:$F$6,$C$7:$F$7,$C$8:$F$8,$C3,$I5,$I6:L6,
$I7:L7,$I8:L8,$I9,$O5, O6, O7, O8)))
Then just change the separator (in this example, a space) for those few:
=ARRAYFORMULA(concatenate("signal code: ",TEXTJOIN(", ",TRUE,
$C$5:$F$5,$C$6:$F$6,$C$7:$F$7,$C$8:$F$8,$C3,$I5,$I6:L6,$I7:L7,
$I8:L8)& " " & TEXTJOIN(" ", TRUE, $I9, $O5, O6, O7, O8)))

How can I COUNTUNIQUE only the first word when the formula doesn't accept Æ, Ø, Å?

In this sheet I'm trying to count different destinations. It seems to be problematic when it comes to Æ, Ø and Å.
The formula
=COUNTA(B3:B12) & " travels to " & COUNTUNIQUE(ARRAYFORMULA(TRIM(IF(LEN(B3:B12);IFERROR(REGEXEXTRACT(LOWER(B3:B12);"^([\w\-]+)");B3:B12);))))&" different countries"
How can I make the formula accept these characters?
The TRIM(IF(LEN(B3:B14);IFERROR( part of your formula is unnecessary.
The formula can be shortened to:
=COUNTA(B3:B14) & " travels to " & COUNTUNIQUE(ARRAYFORMULA(REGEXEXTRACT(LOWER(B3:B14);"\w+")))&" different countries"
EDIT:
If there may be blanks in the data, TRIM(IF(LEN(B3:B14); is still unnecessary, but REGEXEXTRACT should be wrapped in IFERROR, so the formula becomes:
=COUNTA(B3:B14) & " travels to " & COUNTUNIQUE(ARRAYFORMULA(IFERROR(REGEXEXTRACT(LOWER(B3:B14);"\w+");)))&" different countries"
EDIT 2:
Regarding your question about countries with two-word names, the following formula uses "[^,\d(]*" to match everything up to the first comma, digit, or opening parenthesis, then uses TRIM to remove trailing spaces, so it can return the full name.
=COUNTA(B3:B16) & " travels to " & COUNTUNIQUE(ARRAYFORMULA(IFERROR(TRIM(REGEXEXTRACT(LOWER(B3:B16);"[^,\d(]*"));)))&" different countries"
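The EDIT 2 logic can be checked in Python with made-up destination strings (the sample data and the function name are hypothetical). The character class `[^,\d(]` only excludes commas, digits and "(", so Æ, Ø and Å pass through untouched. Skipping blank cells before counting is an assumption about the data.

```python
import re


def count_destinations(cells):
    # COUNTA: only non-empty cells count as travels.
    filled = [c for c in cells if c.strip()]
    countries = set()
    for cell in filled:
        # "[^,\d(]*" takes everything before the first comma, digit or "(";
        # it can match the empty string, so its first match starts at position 0,
        # which is exactly what re.match looks for.
        name = re.match(r"[^,\d(]*", cell.lower()).group(0).strip()
        if name:
            countries.add(name)
    # COUNTUNIQUE over the trimmed, lower-cased names.
    return f"{len(filled)} travels to {len(countries)} different countries"


trips = ["Danmark, København", "Færøerne (2019)", "danmark 2020",
         "Østrig", "New Zealand, Auckland", ""]
print(count_destinations(trips))  # 5 travels to 4 different countries
```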

Problem with query using Join if value of Join contains '

I needed to extract all rows from another sheet (Incidents!A2:P) into my sheet if column E contains any of the words in a list (B2:B).
I didn't want to ask you for everything, so I looked into it myself and managed to write the following formula:
=query(Incidents!A2:P;"Select * where E matches " & "'" & (join("|";B2:B)) & "'")
It works perfectly fine ... except if one of the words of my list contains the ' character.
Example :
A B
2 let go
3 let's go
4 lets go
My formula works as long as there's no ' in row 3.
I think the ' is being interpreted as a programming symbol (the end of the string in the query). Does anyone have a solution or an idea?
Thanks
Oliver
Try delimiting the regex with double quotes instead of single quotes. Inside a Sheets string, """" is a string containing a single double-quote character:
=query(Incidents!A2:P;"Select * where E matches " & """" & (join("|";B2:B)) & """")
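The query string that this formula produces can be sketched in Python to show what QUERY actually receives (the function name is hypothetical; the words come from the example above). Presumably, a word that itself contains a double quote would break this version in the same way an apostrophe broke the original.

```python
def build_matches_query(words):
    # JOIN("|"; B2:B): the words become alternatives of one regex.
    pattern = "|".join(words)
    # Delimit the regex with double quotes instead of single quotes, so the
    # apostrophe in "let's go" no longer ends the string literal.
    return 'Select * where E matches "' + pattern + '"'


print(build_matches_query(["let go", "let's go", "lets go"]))
# Select * where E matches "let go|let's go|lets go"
```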

Easiest way to query multiple sheets that are named by year, for all time?

I use a big nasty formula to query multiple sheets in the same Google Sheet document named according to year (2020, 2019, 2018, etc...) to sum up a total value. Because I need to query a filtered range in a complex way, I've figured out the best way to do this without running into other troubleshooting issues is to SUM multiple queries like so:
=SUM(
IFERROR(QUERY(FILTER({EOMONTH(INDIRECT("'"&TO_TEXT(YEAR(TODAY()))&"'!A1:A"&ROWS(INDIRECT("'"&TO_TEXT(YEAR(TODAY()))&"'!K2:K"))), 0),INDIRECT("'"&TO_TEXT(YEAR(TODAY()))&"'!K2:K")}, [filter conditions]), "select Col2 label Col2' ' ")),
IFERROR(QUERY(FILTER({EOMONTH(INDIRECT("'"&TO_TEXT(YEAR(TODAY()-365))&"'!A1:A"&ROWS(INDIRECT("'"&TO_TEXT(YEAR(TODAY()-365))&"'!K2:K"))),0),INDIRECT("'"&TO_TEXT(YEAR(TODAY()-365))&"'!K2:K")}, [filter conditions]),
"select Col2 label Col2' ' "))
)
For some context, you can see the much larger IF formula that this SUM is meant to be nested into, in the "Example Matrix" tab of the sheet. My focus for this question is on the INDIRECT references, which I have been using to dynamically reference the most current year's sheet and the previous year's sheet.
The problem is, if I want to keep doing this for every sheet as the years go on, I have to manually add a whole other query into my SUM using INDIRECT("'"&TO_TEXT(YEAR(TODAY()-730))&"'!K2:K") and INDIRECT("'"&TO_TEXT(YEAR(TODAY()-1095))&"'!K2:K") and so on, and that is just not an option considering how many of them I would need to add to multiple formulas in multiple sheets.
Is there any way I can adapt this for simplicity or perhaps make it into a script to accomplish summing queries for all sheets that are named by year for all time?
Here's a copy of my Example Sheet: https://docs.google.com/spreadsheets/d/1b29gyEgCDwor_KJ6ACP2rxdvauOzacDI9FL2K-jgg5E/edit#gid=1652431688
Thank you, any help is appreciated.
Usually an array formula would be the way to go in such a case, but INDIRECT does not work inside array formulas.
There are a few approaches that use scripting instead.
Here I will describe another approach: formula generation. We'll build a string containing the formula and manually place it in a cell. It would be nice to pass that string to an inverse of FORMULATEXT, but unfortunately no such function exists at the moment, so we'll just paste it manually.
Step 1
Set the year limits (sheet names) in some cells. The first year of the period will be in K22, and the last will be in M22.
I set the period from 2005 to 2040.
All the year numbers can easily be generated with SEQUENCE. With arbitrary sheet names, you would instead need a manually maintained range of those names.
Step 2
Write a generator for the formula you need. We just generate a string here; that string is the formula you would normally type by hand. It is not hard, but there is a lot of repetition, and it would be tedious to write manually.
Here is the generator:
=ARRAYFORMULA(
"=SUM(
FILTER(
{
" & JOIN(
";" & CHAR(10) & " ",
"IFERROR('" & SEQUENCE(M22 - K22 + 1, 1, K22, 1) & "'!D2:D, 0)"
) & "
},
ISNUMBER(
{
" & JOIN(
";" & CHAR(10) & " ",
"IFERROR('" & SEQUENCE(M22 - K22 + 1, 1, K22, 1) & "'!D2:D, 0)"
) & "
}
),
REGEXMATCH(
{
" & JOIN(
";" & CHAR(10) & " ",
"IFERROR('" & SEQUENCE(M22 - K22 + 1, 1, K22, 1) & "'!A2:A, 0)"
) & "
},
""(?i)^TOTAL$""
),
REGEXMATCH(
{
" & JOIN(
";" & CHAR(10) & " ",
"IFERROR('" & SEQUENCE(M22 - K22 + 1, 1, K22, 1) & "'!C2:C, 0)"
) & "
},
""(?i)^"" & IF(F19 = ""Condition 1 Count"", ""Condition 1"", ""Condition 2"") & ""$""
)
)
)"
)
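The same generation step can be sketched outside the sheet, for example in Python (the function names are hypothetical). Because Python writes the quotes directly, the "" → " clean-up from Step 3 is not needed. The column letters default to the ones used in the generator above (D, A, C); the sample result further down uses F, C and E, so pass whichever letters match your sheet's layout.

```python
def stack(first: int, last: int, col: str) -> str:
    # One IFERROR('YYYY'!X2:X, 0) per year, joined with ";" so the ranges are
    # stacked vertically inside { }, like the JOIN(";" & CHAR(10) & " ", ...) part.
    return ";\n ".join(
        f"IFERROR('{year}'!{col}2:{col}, 0)" for year in range(first, last + 1)
    )


def build_formula(first: int, last: int,
                  value_col: str = "D", label_col: str = "A",
                  cond_col: str = "C") -> str:
    values = stack(first, last, value_col)
    labels = stack(first, last, label_col)
    conds = stack(first, last, cond_col)
    condition = ('"(?i)^" & IF(F19 = "Condition 1 Count", '
                 '"Condition 1", "Condition 2") & "$"')
    return (
        "=SUM(\n FILTER(\n {\n " + values + "\n },\n"
        " ISNUMBER(\n {\n " + values + "\n }\n ),\n"
        " REGEXMATCH(\n {\n " + labels + "\n },\n"
        ' "(?i)^TOTAL$"\n ),\n'
        " REGEXMATCH(\n {\n " + conds + "\n },\n"
        " " + condition + "\n )\n )\n)"
    )


# Paste the printed text into a cell, as in Step 4.
print(build_formula(2005, 2040))
```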
Compared to the original, the resulting formula is heavily changed and simplified. For example, INDIRECT is not needed with this approach, and EOMONTH was not actually used anywhere.
Step 3
Copy the result as text, remove the enclosing quotes, and replace doubled double quotes with single ones: "" -> ".
Now we have a formula we can paste anywhere, just as if we had typed it manually. Here is a part of it:
=SUM(
FILTER(
{
IFERROR('2005'!F2:F, 0);
IFERROR('2006'!F2:F, 0);
...
IFERROR('2039'!F2:F, 0);
IFERROR('2040'!F2:F, 0)
},
ISNUMBER(
{
IFERROR('2005'!F2:F, 0);
IFERROR('2006'!F2:F, 0);
...
IFERROR('2039'!F2:F, 0);
IFERROR('2040'!F2:F, 0)
}
),
REGEXMATCH(
{
IFERROR('2005'!C2:C, 0);
IFERROR('2006'!C2:C, 0);
...
IFERROR('2039'!C2:C, 0);
IFERROR('2040'!C2:C, 0)
},
"(?i)^TOTAL$"
),
REGEXMATCH(
{
IFERROR('2005'!E2:E, 0);
IFERROR('2006'!E2:E, 0);
...
IFERROR('2039'!E2:E, 0);
IFERROR('2040'!E2:E, 0)
},
"(?i)^" & IF(F19 = "Condition 1 Count", "Condition 1", "Condition 2") & "$"
)
)
)
Step 4
Manually place this resulting formula into some cell.
It does what it is supposed to do: the dropdown reference works, and non-existent sheets are tolerated.
For example, there is no 2021 sheet yet, but once it is created there will be no need to change the formula; data from the new sheet will be picked up automatically.
You'll need to repeat the process in two cases: when the formula's logic needs to change, or when it is almost 2040 and you want to extend the period by another 50 years. Still, regenerating is faster than editing the resulting monster by hand.
A few notes on the original formula:
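YEAR(TODAY() - 365) ➡ YEAR(TODAY()) - 1. With your approach there will be errors because of leap years: every February 29 inside the subtracted span makes the result land one day later than the same calendar date, so near the end of a year it can fall into the wrong year. For example, on 31 December 2020, TODAY() - 365 is 1 January 2020, not a date in 2019. How often it happens depends on the number of years, but in the last days of a year it will show up for sure.
"select Col2 label Col2' ' " ➡ "select Col2 label Col2 ''". Do you really need a column with a header name ' ' (just a space)? I'm guessing it was meant to be blank.
There is no need for TO_TEXT: the & operator already converts the year number to text.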
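The leap-year drift described in the first note can be checked with Python's `datetime` (the function names are hypothetical; they just mirror the two sheet expressions).

```python
from datetime import date, timedelta


def year_via_days(today: date, years_back: int) -> int:
    # Mirrors YEAR(TODAY() - 365 * years_back) from the original formula.
    return (today - timedelta(days=365 * years_back)).year


def year_via_arithmetic(today: date, years_back: int) -> int:
    # Mirrors the suggested YEAR(TODAY()) - years_back.
    return today.year - years_back


for d, k in [(date(2020, 12, 31), 1), (date(2021, 12, 31), 2), (date(2021, 6, 1), 1)]:
    print(d, k, year_via_days(d, k), year_via_arithmetic(d, k))
```

On 31 December 2020 the day-based version gives 2020 instead of 2019, while in mid-year both versions agree.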

In an array, return all values except one

I would like to use a formula to capitalize just the first letter of an array of words. Sometimes the array might have just 1 word, and sometimes 2, 3, 4 or more words. The source is dynamic, so I need my formula to be flexible. I know about Proper(text), but that capitalizes every word.
For example, in cell A1 I might have the text "aidan is a good boy,"
or I might just have "hi,"
or maybe it will say, "drive in your own lane please!"
My formula over in B1 needs a result of "Aidan is a good boy,"; "Hi,"; or "Drive in your own lane please!"
I wish I could say, B1: =Proper(index(split(A1, " "), 1)) & " " & lower(index(split(A1, " "), *everything except 1*)), but I don't know how to fill in the *everything except 1* part of the formula.
Please try:
=REPLACE(A1,1,1,UPPER(left(A1)))
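What REPLACE(A1,1,1,UPPER(LEFT(A1))) does can be sketched in Python (the function name is hypothetical). Note that Python's built-in `str.capitalize()` is not equivalent, because it also lower-cases the rest of the text; the formula leaves the rest exactly as it is.

```python
def capitalize_first(text: str) -> str:
    # Replace the first character with its upper-case form and keep the rest
    # unchanged; slicing with [:1] also copes with an empty string.
    return text[:1].upper() + text[1:]


print(capitalize_first("drive in your own lane please!"))  # Drive in your own lane please!
```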
