How to remove a piece of text from a cell? - google-sheets

I'm trying to remove a piece of text (Performance) from a column in a Google spreadsheet that contains values like "XX Performance", where XX is a number like 89. I'm using:
=REGEXREPLACE(D:D, " Performance "," - ")
But no love...

Try this:
=ArrayFormula(IF(D2:D="",, REGEXEXTRACT(D2:D, "[0-9]+")))

You can use the expression \D+:
\D matches any character that's not a digit (equivalent to [^0-9])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed
The formula will be like:
=REGEXREPLACE(D:D, "\D+","")
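To see what the \D+ replacement does outside of Sheets, here is a minimal Python sketch. The function name and the sample value are assumptions for illustration only, and Python's re module merely stands in for the regex engine used by REGEXREPLACE.

```python
import re


def strip_non_digits(cell: str) -> str:
    # \D+ matches every run of one or more non-digit characters;
    # replacing each run with an empty string leaves only the digits.
    return re.sub(r"\D+", "", cell)


print(strip_non_digits("89 Performance"))  # 89
```

Note that the result is still text; in Sheets you would wrap the formula in VALUE() if you need a number.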
UPDATE
I did put it in another column otherwise it creates a circular dependency. The data is imported via API from another app.
Then you will need to keep the imported data in another sheet or in a hidden column, and apply the regex in the column where you want the final result.

Related

Google Sheets - Iterate through text string, removing characters from end until a vlookup match

A user pastes in a value to see if there is a full or partial match. I need to do a vlookup and keep removing characters until there is a match. A full match of something like test1.test2.test3 is no problem because it's a full match to my list. But if someone pastes in something like test1.test2.test3.test4, I need to remove a character one at a time from the end until there is a match. So in this example, it would match test1.test2.test3 and return that result.
Conceptually I see this as a for loop that counts the characters using len, using left to remove the number of characters from the end based on the current iteration, and doing vlookups until returning the value when true. But I'm not sure how to do this in Google Sheets.
This formula will give you the matching value that was found in the data (e.g. test1.test2.test3):
=FILTER([column_with_data], REGEXMATCH([cell_with_pasted_value_to_look], [column_with_data]))
This formula will give you the matching data and the cell reference where it was found (e.g. test1.test2.test3 # $A$4):
=FILTER([column_with_data], REGEXMATCH([cell_with_pasted_value_to_look], [column_with_data]))&" # "&CELL("address",INDEX([column_with_data],MATCH(FILTER([column_with_data], REGEXMATCH([cell_with_pasted_value_to_look], [column_with_data])),[column_with_data],0),1))
Copy and paste either of the formulas above next to the cell where users paste the value to look up. Then replace the two placeholders in square brackets [ ] with the proper references in your sheet:
replace [column_with_data] with the range of the column containing all the stored data (e.g. A1:A)
replace [cell_with_pasted_value_to_look] with the absolute ($col$row) reference of the cell where users paste the value to look up (e.g. $B$1)
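For comparison, the loop described in the question (drop one character from the end until a lookup succeeds) can be sketched in Python as below. This is not what the FILTER/REGEXMATCH formulas do internally; it is only an illustration of the asker's idea, and the function name and sample data are assumptions.

```python
from typing import Iterable, Optional


def longest_prefix_match(pasted: str, stored: Iterable[str]) -> Optional[str]:
    known = set(stored)
    # Try the full string first, then remove one character from the end
    # on each iteration until the remaining prefix is a stored value.
    for end in range(len(pasted), 0, -1):
        candidate = pasted[:end]
        if candidate in known:
            return candidate
    return None  # nothing matched, not even a single character


data = ["test1", "test1.test2.test3", "other.value"]
print(longest_prefix_match("test1.test2.test3.test4", data))  # test1.test2.test3
```

Because the loop starts from the longest candidate, it returns the longest stored value that the pasted text begins with.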
Would it be a problem to download the data from Google sheets, transform the file type to use the for loop in another software, and re-upload? I think your idea for a for loop would work.
It might be quicker if this is a long term project, but not so great if the client is continually monitoring/uploading.

ArrayFormula, RegexExtract, and Join in Google Sheets

I have a data set populated with emails. I would like to extract the surnames from the emails in each cell and join them all into one single cell, with a separator or delimiter between the groups of emails obtained from each cell.
Here is the data set:
A | B
john.smith#gmail.com, jane.doe#gmail.com | UPDATE
john.smith#gmail.com | CLOSE
And here is the formula to extract
=ARRAYFORMULA(
PROPER(
REGEXEXTRACT(
A:A,
REGEXREPLACE(
A:A,
"(\w+)#","($1)#"
)
)
)
)
This initially yields the following:
C | D
Smith | Doe
Smith |
I would like to use JOIN() inside the ARRAYFORMULA(), but it doesn't work the way I expected: it returns an error saying that it only accepts one row or one column of data. My initial understanding of ARRAYFORMULA() was that it iterates over the data, so I thought it would JOIN() the first row and then move on to the next element/row, but I guess it doesn't work that way. I can use FLATTEN(), but I want delimiters or separators between the row elements. I need help obtaining my intended final result, which looks like this:
UPDATE:
Smith
Doe
CLOSE:
Smith
All of this is located in one cell, C1. UPDATE and CLOSE come from column B.
EDIT: I would like to clarify that the email entries in column A are dynamic and there may be more than two.
I think this will work:
=arrayformula(flatten(if(A2:A<>"",regexreplace(trim(split(B2:B&":"&char(9999)&regexreplace(Proper(A2:A),"#[\w\.]+,\ ?|#.*",char(9999)&" "),char(9999))),".*\.",),)))
NOTES:
Proper(A2:A) changes the capitalisation.
The regexreplace "#[\w\.]+,\ ?|#.*" finds:
the # symbol...
then any number of A-Z, a-z, 0-9, _ [using \w] or . [using \.]
then a comma
then, optionally, a space [using \ followed by a space; the optional bit is ?]
or [using |], the # symbol followed by any number of characters [using .*]
The match is replaced with a character that you wouldn't expect to find in your text, char(9999), which is a pencil icon, plus a trailing space (used later on, when the flatten keeps a gap between lines). The purpose is to keep all of the 'name.surname' and 'nameonly' values in front of any # symbol and separate them with char(9999).
Then, in front of the regexreplace, is B2:B&":"&char(9999)& which prepends the value from column B, the : character and char(9999).
The split() function then separates them into columns. Trim() is used to get rid of spaces in front of names that don't contain a period.
The next regexreplace() function deletes everything up to and including the last . to keep the surname, or the name when there is no period.
The if(A2:A<>"" only processes rows where there is a value in col A. The arrayformula() function is required to cascade the formula down the sheet.
I didn't output the results in a single cell, but it looks like you've sorted that with textjoin.
Here's my version of getting the results into a single cell.
=arrayformula(textjoin(char(10),1,if(A2:A<>"",REGEXREPLACE(B2:B&":"&char(10)&regexreplace(Proper(A2:A),"#[\w\.]+,\ ?|#.*",char(10)),".*\.",),)))
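The intended transformation can also be written out step by step. The following Python sketch is only an illustration of the logic (label from column B, then one surname per line), not a translation of any of the formulas; the function names are hypothetical, the sample rows come from the question, and str.title() is used as an approximation of PROPER.

```python
from typing import Iterable, List, Tuple


def surnames(emails: str) -> List[str]:
    result = []
    for address in emails.split(","):
        # Keep the part before the '#' separator used in the sample data.
        local_part = address.strip().split("#", 1)[0]
        # Keep what follows the last '.', or the whole name if there is none.
        result.append(local_part.rsplit(".", 1)[-1].title())
    return result


def one_cell(rows: Iterable[Tuple[str, str]]) -> str:
    lines = []
    for emails, label in rows:
        if not emails:  # skip rows with nothing in column A
            continue
        lines.append(label + ":")
        lines.extend(surnames(emails))
    return "\n".join(lines)  # "\n" plays the role of char(10)


rows = [
    ("john.smith#gmail.com, jane.doe#gmail.com", "UPDATE"),
    ("john.smith#gmail.com", "CLOSE"),
]
print(one_cell(rows))
```

With the two sample rows this prints UPDATE:, Smith, Doe, CLOSE:, Smith on separate lines, which is the layout requested for C1.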
Assuming that your A:A cells will always contain only contiguous email addresses separated by commas, you could place this in, say, C1 (being sure that both Columns C and D are otherwise empty beforehand):
=TRANSPOSE(FILTER({B:B,IFERROR(REGEXEXTRACT(SPLIT(PROPER(A:A),","),"([^\.]+)#"))},A:A<>""))
If this produces the desired result and you'd like to know how it works, report back. The first step, however, is to make sure it works as expected.
use:
=INDEX(REGEXREPLACE(TRIM(QUERY(FLATTEN(QUERY(TRANSPOSE({{B1; IF(B2:B="",,"×"&B2:B)},
PROPER(REGEXEXTRACT(A:A, REGEXREPLACE(A:A, "(\w+)#", "($1)#")))})
,,9^9)),,9^9)), " |×", CHAR(10)))

How to assign a unique ID to a google form input?

Google Forms - I have set up a Google Form and I want to assign a unique ID to each of the completed incoming form inputs. My intention is to use the unique ID as an input for another Google Form I have created, so that I can link the two completed forms. Is there an easier way to do this?
I'm not a programmer but I have programming resources available to me if needed.
I was also banging my head at this and finally found a solution.
I compose a 6-digit number that gets generated automatically for every row and is composed of:
3 digits of the row number - that gives the uniqueness (you can use more if you expect more than 998 responses), concatenated with
3 digits of the timestamp converted to a number - that prevents guessing the number
Follow these instructions:
1. Create an additional column in the spreadsheet linked to your form; let's call it "Unique ID".
2. Row number 1 should be populated with column titles automatically.
3. In row number 2, under column "Unique ID", add the following formula:
=arrayformula( if( len(A2:A), "" & text(row(A2:A) - row(A2) + 2, "000") & RIGHT(VALUE(A2:A), 3), iferror(1/0) ) )
Note: An array formula applies automatically to the entire column.
4. Make sure you never delete that row, even if you clear up all the results from the form.
5. Once a new submission is populated, its "Unique ID" will appear automatically.
Formula explanation:
Column A should normally hold the timestamp. If the timestamp is not empty, then this gives the row number: row(A2:A) - row(A2) + 2
Using TEXT with the "000" format, I pad it to a 3-digit number.
Then I concatenate it with the timestamp converted to a number using VALUE and trim it to the three right-most digits using RIGHT
Voila! A number that is both unique and hard-to-guess (as the submitter has no access to the timestamp).
If you would like more confidence, obviously you could use more digits for each of the parts.
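The composition of the ID can be sketched in Python as follows. This is a hedged illustration, not the formula itself: the function name is hypothetical, and the timestamp is passed in as the text of its numeric value (an assumed example), because the exact digits Sheets produces for VALUE(timestamp) depend on the timestamp.

```python
def unique_id(row_number: int, timestamp_value: str) -> str:
    # TEXT(row, "000"): pad the row number with leading zeros to 3 digits.
    row_part = f"{row_number:03d}"
    # RIGHT(VALUE(timestamp), 3): last three characters of the numeric value.
    time_part = timestamp_value[-3:]
    return row_part + time_part


print(unique_id(2, "44197.5234"))  # 002234
```

As in the formula, the last three characters are taken as-is, so a value with few fractional digits can contribute a decimal separator to the ID.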
You can apply unique ID numbers using an arrayformula next to the form data. In row 1 of the first empty column to the right, you can use something like
=arrayformula(if(row(A1:A)=1,"UNIQUE ID",if(len(A1:A)>0,98+row(A1:A),iferror(1/0))))
A few comments on the explanation provided by @Ying, which is very good and which I will try to expand.
> Column A should normally hold the timestamp.
In my case, it is date+time stamp.
> 4. Make sure you never delete that row, even if you clear up all the results from the form
That issue can easily be avoided by placing the formula in the header like this
={"calculated_id";arrayformula( if( len(C2:C); "" & text(row(C2:C) - row(C2) + 2; "000") & RIGHT(VALUE(C2:C); 3); iferror(1/0) ) )}
This formula provides a string for one cell and a formula for the next one, which happens to be an array formula that covers all the cells below.
Note: Depending on your language settings you may need to use ";" or "," as separator among parameters.
> 5. Once a new submission is populated, its "Unique ID" will appear automatically
Issue
And here is the issue I see with this solution.
If the Google Form allows respondents to edit their responses, the date+time stamp will change, and so will the calculated_id.
A workaround is to have two columns: calculated_id and static_id.
static_id takes whatever is in calculated_id only if it has no data itself; otherwise it stays as it is.
That way we have an ID that will not change no matter how many updates the response goes through.
The short formula for static_id is
=IF(AND(IFERROR(K2)<>0;K2<>"");K2;L2)
The long one is
={"static_id";ArrayFormula(IF(AND(IFERROR(M2:M)<>0;M2:M<>"");M2:M;L2:L))}
M or K -> static_id
L -> calculated_id
Remember to put this last one on the header of the column. I tend to change the color to purple when it has a formula behind, so I don't mess with it by mistake.
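The rule behind static_id ("keep my own value if I already have one, otherwise copy calculated_id") can be sketched in Python as below. The function name and sample values are assumptions; the sketch only mirrors the condition in the short formula and says nothing about how Sheets evaluates the self-reference.

```python
def next_static_id(current_static, calculated):
    # Mirrors IF(AND(static <> 0; static <> ""); static; calculated):
    # an existing, non-empty static value always wins.
    if current_static not in ("", None, 0):
        return current_static
    return calculated


print(next_static_id("", "002234"))        # 002234 (first time: copy it)
print(next_static_id("002234", "002999"))  # 002234 (later edits: unchanged)
```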
Extra info.
The numeric value from the date/time stamp differs when it comes from both or just one. Here are some examples.
Note that the number of digits on the fractional part differ quite a lot depending on the case.

Small in arrayformula (Google Spreadsheet)

I have 5 columns of numbers that I want to sort per row into another set of columns. I figured I need to use small() (e.g. small(a2:e2,1) for f2; small(a2:e2,2) for g2 and so on). Is there a way to iterate this for the next rows, if possible using only native Google Spreadsheet formulas?
Thanks in advance
I was able to make a temporary work around, but I had to use 3 cheat columns. It looks ok for now but I imagine it will be troublesome for really huge numbers.
Here's a sample sheet for reference: https://docs.google.com/spreadsheets/d/1MQTP2XkRsPRAnPQ5wLhkR8JoNVY6YOExVlOkkX8UeRs/edit#gid=0
The original data are in A3:E
The first cheat column (G3:G) simply creates a column of numbers from 1 to the largest number found in the source data. 1-9 is changed to 01-09 for easier searching. "#" is then added at the end; this will come in handy later:
Cheat Column 1 =filter(if(row(A:A)=max(A:E)+1,"#",text(row(A:A),"00")),row(A:A)<=max(A:E)+1)
The second cheat column (H3:H) combines each row into a string separated by "-" with a "#" marker:
Cheat Column 2 =filter(text(A3:A,"00")&"-"&text(B3:B,"00")&"-"&text(C3:C,"00")&"-"&text(D3:D,"00")&"-"&text(E3:E,"00")&"#",A3:A<>"")
The last cheat column (I3:I) sorts each line (from cheat column 2) by finding each number from cheat column 1, from 01 up to the max number, then the "#" char (this ensures that each line will still have the # end marker). "Find" will return the "position" of each number, or an error if it's not found. By using "if", we can make "find" return the actual number or "" instead.
=filter(arrayformula(if(iferror(find(transpose(filter(G3:G,G3:G<>"")),H3:H),""), transpose(filter(G3:G,G3:G<>"")),"")),A3:A<>"")
The formula above creates as many columns as there are numbers from cheat column 1. To prevent this, a "-" is added to each number then "Concatenate" is used to combine everything into one massive string with each set separated by "#". The string is then split using the "#" marker.
Cheat Column 3 =transpose(split(concatenate(filter(arrayformula(if(iferror(find(transpose(filter(G3:G,G3:G<>"")),H3:H),""),"-"&transpose(filter(G3:G,G3:G<>"")),"")),A3:A<>"")),"#"))
Each number is then separated into each corresponding column by using mid().
Small 1 =filter(mid(I3:I,2,2)*1,A3:A<>"")
Small 2 =filter(mid(I3:I,5,2)*1,A3:A<>"")
Small 3 =filter(mid(I3:I,8,2)*1,A3:A<>"")
Small 4 =filter(mid(I3:I,11,2)*1,A3:A<>"")
Small 5 =filter(mid(I3:I,14,2)*1,A3:A<>"")
Note that the formula above is only for numbers 1-99. For larger numbers, the Text() formulas should have more zeroes to correspond to the number of digits of the biggest number. The Mid() formulas should also be adjusted accordingly.
I would like to stress that I am very far from being a spreadsheet expert and that this solution is very "unoptimized". It requires several cheat columns; with the first one even having more rows than the original data. If anyone can help me get rid of the cheat columns (or at least the first one) I will be very grateful.
How about using SMALL like you mentioned in your question?
=small($A3:$E3,column()-columns($A3:$G3))
You will need to change the ranges accordingly. The last reference, $G3, is the cell just before the first cell where the formula is placed.
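What the SMALL-based approach computes, row by row, can be sketched in Python as follows. The helper names and the sample rows are assumptions for illustration; small() below imitates SMALL(range, k) with a 1-based k.

```python
from typing import List, Sequence


def small(values: Sequence[float], k: int) -> float:
    # Like SMALL(range, k): the k-th smallest value, with k starting at 1.
    return sorted(values)[k - 1]


def sort_each_row(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    # One output column per k, which is what filling the formula across
    # five columns and down every row produces.
    return [[small(row, k) for k in range(1, len(row) + 1)] for row in rows]


rows = [[5, 3, 9, 1, 7], [12, 4, 4, 20, 8]]
print(sort_each_row(rows))  # [[1, 3, 5, 7, 9], [4, 4, 8, 12, 20]]
```

Unlike the string-based cheat columns, this keeps duplicate values within a row.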

extract number from cell in openoffice calc

I have a column in open office like this:
abc-23
abc-32
abc-1
Now, I need to get only the sum of the numbers 23, 32 and 1 using a formula and regular expressions in calc.
How do I do that?
I tried
=SUMIF(F7:F16,"([:digit:].)$")
But somehow this does not work.
Starting with LibreOffice 6.4, you can use the newly added REGEX function to generically extract all numbers from a cell / text using a regular expression:
=REGEX(A1;"[^[:digit:]]";"";"g")
Replace A1 with the cell-reference you want to extract numbers from.
Explanation of REGEX function arguments:
Arguments are separated by a semicolon ;
A1: Value to extract numbers from. Can be a cell-reference (like A1) or a quoted text value (like "123abc"). The following regular expression will be applied to this cell / text.
"[^[:digit:]]": Match every character which is not a decimal digit. See also list of regular expressions in LibreOffice
The outer square brackets [] encapsulate the list of characters to search for
^ adds a NOT, meaning that every character not included in the search list is matched
[:digit:] represents any decimal digit
"": replace matching characters (every non-digit) with nothing = remove them
"g": replace all matches (don't stop after the first non-digit character)
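To check the idea end to end (strip every non-digit, then add up what is left), here is a small Python sketch. The function names are hypothetical, the sample values are the ones from the question, and Python's re module stands in for Calc's REGEX function.

```python
import re
from typing import Iterable


def digits_only(text: str) -> str:
    # Same idea as REGEX(A1;"[^[:digit:]]";"";"g"): drop every non-digit.
    return re.sub(r"[^0-9]", "", text)


def sum_embedded_numbers(cells: Iterable[str]) -> int:
    # Cells without any digit are skipped instead of raising an error.
    return sum(int(d) for d in map(digits_only, cells) if d)


print(sum_embedded_numbers(["abc-23", "abc-32", "abc-1"]))  # 56
```

In Calc itself, the extracted text still has to be converted to a number (for example with VALUE) before it can be summed.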
Unfortunately Libre-Office only supports regex in find/replace and in search.
If this is a once-only deal, I would copy column A to column B, then use [data] [text to columns] on B with - as the separator, leaving you with all the text in column B and the numbers in column C.
Alternatively, you could use =VALUE(RIGHT(A1;LEN(A1)-FIND("-";A1))) in column B, then sum column B.
I think this is not exactly what you want, but maybe it can help you or others.
It is all about substrings (the MID function in Calc):
First: Choose your cell (for example, one with "abc-23" as its content).
Second: Enter the start position ("british" with start position 4 gives "tish").
After that: To return all the remaining text, you can pass the LEN function (length) of your cell ("abc-23") as the length parameter.
Code now looks like this:
D15="abc-23"
=MID(D15; 5; LEN(D15))
And the output is: 23
When you edit the number (23 in this example), there is no problem. However, if you change anything before it (the text "abc-"), the approach breaks, because the start position is hard-coded as 5.
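The difference between a fixed start position and a delimiter-based split can be sketched in Python. Both helpers are hypothetical: mid() imitates Calc's 1-based MID, and after_last_dash() is an alternative that is not part of the answer above.

```python
def mid(text: str, start: int, length: int) -> str:
    # Calc's MID counts positions from 1; Python slices count from 0.
    return text[start - 1:start - 1 + length]


def after_last_dash(text: str) -> str:
    # Does not depend on how long the prefix before the dash is.
    return text.rsplit("-", 1)[-1]


print(mid("abc-23", 5, len("abc-23")))  # 23
print(mid("abcdef-23", 5, 9))           # ef-23 (fixed position breaks)
print(after_last_dash("abcdef-23"))     # 23
```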
Paste the string in a cell, open the search and replace dialog (ctrl + f), mark "regular expression" under the extended search options, search for ([\s,0-9])([^0-9\s])+ and replace it with $1.
Adjust the regex to your needs.
I didn't figure out how to do this in OpenOffice/LibreOffice directly. After frustrations in searching online and trying various formulas, I realised my sheet was a simple CSV format, so I opened it up in vim and used vim's built-in sed-like feature to find/replace the text in vim command mode:
:%s/abc-//g
This only worked for me because there were no other columns with this matching text. If there are other columns with the same text, then the solution would be a bit more complex.
If your sheet is not a CSV, you could copy the column out to a text file and use vim to find/replace, and then paste the data back into the spreadsheet. For me, this was a lot less frustrating than trying to figure this out in LibreOffice...
I won't bother with a solution without knowing if there really is interest, but, you could write a macro to do this. Extract all the numbers and then implement the sum by checking for contained numbers in the text.
