I have a large list of people where each person has a line like this.
Bill Gates, IT Manager, Microsoft, <https://www.linkedin.com/in/williamhgates>
I want to extract the company name in a specific cell. In this example, it would be Microsoft, which is between the second and third delimiters (in this case, the delimiter is ", "). How can I do this?
Right now I'm using the split method (=SPLIT(A2, ", ",false)). But it gives me four different cells with information. I would like a command only to output the company in one cell. Can anyone help? I have tried different things, but I can't seem to find anything that works.
Maybe some regex can do it, but I'm not into regex.
Short answer
Use INDEX and SPLIT to get the value between two separators. For the company, which is the third item in the sample line:
=INDEX(SPLIT(A1,", ",FALSE),3)
Explanation
SPLIT returns a 1 x n array.
The first argument of INDEX could be a range or an array.
The second and third arguments of INDEX are optional. If the first parameter is an array that has only one row or one column, it will assume that the second argument corresponds to the larger side of the array, so there is no need to use the third argument.
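For readers who think more easily in code, here is a minimal Python sketch of the same split-then-pick idea. The function name `field` is made up for illustration, and positions are 1-based only to mirror how INDEX counts.

```python
def field(line: str, position: int, delimiter: str = ", ") -> str:
    """Return the 1-based `position`-th piece of `line`."""
    # Like SPLIT with split_by_each = FALSE, the whole delimiter is one separator.
    parts = line.split(delimiter)
    # INDEX counts from 1, Python lists count from 0.
    return parts[position - 1]


line = "Bill Gates, IT Manager, Microsoft, <https://www.linkedin.com/in/williamhgates>"
print(field(line, 3))  # the company is the third piece of the sample line
```

In the sample line, position 1 is the name, 2 the job title, 3 the company and 4 the link.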
A bit nasty, but this formula works, assuming data in cell D3.
=MID(D3,FIND(",",D3,FIND(",",D3)+1)+2,FIND(",",D3,FIND(",",D3,FIND(",",D3)+1)+1)-FIND(",",D3,FIND(",",D3)+1)-2)
Broken down, this is what it does:
Take a substring of D3 =MID(D3
starting two characters after the 2nd comma FIND(",",D3,FIND(",",D3)+1)+2
and the number of characters between the 2nd and 3rd comma, excluding spaces FIND(",",D3,FIND(",",D3,FIND(",",D3)+1)+1)-FIND(",",D3,FIND(",",D3)+1)-2)
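The breakdown above can be sketched in Python, where each nested FIND becomes a named variable. This is only an illustration of the logic (the function name is hypothetical), and note that Python positions are 0-based while FIND is 1-based.

```python
def between_second_and_third_comma(text: str) -> str:
    first = text.find(",")               # FIND(",", D3)
    second = text.find(",", first + 1)   # FIND(",", D3, first + 1)
    third = text.find(",", second + 1)   # FIND(",", D3, second + 1)
    start = second + 2                   # skip the 2nd comma and the space after it
    # The slice length is third - second - 2, the same count MID is given.
    return text[start:third]


d3 = "Bill Gates, IT Manager, Microsoft, <https://www.linkedin.com/in/williamhgates>"
print(between_second_and_third_comma(d3))
```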
I'll add my favourite, ArrayFormula, which you could use to expand the list automatically without dragging the formula down. Assumptions:
you have a list with data in range "A1:A20"
all data have the same syntax "...,Company Name, <..."
In this case you could use ArrayFormula, pasted in cell B1:
=ArrayFormula(REGEXEXTRACT(A1:A20,", ([^,]+), <"))
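If regex is unfamiliar, this small Python sketch applies the same pattern to a list of lines, which is roughly what ArrayFormula does over the range. The function name and the use of None for "no match" are my own choices; in Sheets a non-matching row shows an error instead.

```python
import re

# Same pattern as in the formula: comma-space, then anything without a comma
# (captured), then comma-space and the opening "<" of the link.
PATTERN = re.compile(r", ([^,]+), <")


def companies(lines):
    """Return the captured company for each line, or None when nothing matches."""
    result = []
    for line in lines:
        match = PATTERN.search(line)
        result.append(match.group(1) if match else None)
    return result


rows = [
    "Bill Gates, IT Manager, Microsoft, <https://www.linkedin.com/in/williamhgates>",
    "a line in some other format",
]
print(companies(rows))
```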
If your data doesn't always look like "...,Company Name, <..." or you wish to get a different output, use this formula in cell B1:
=QUERY(QUERY(TRANSPOSE(SPLIT(JOIN(", ",A1:A20),", ",0)),"offset 2"),"skipping 4")
in this formula:
change 2 in offset 2 to 0, 1, 2, 3 to get name, position, company, link
in skipping 4, 4 is the number of items per line.
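As a rough picture of what offset and skipping do, here is a Python sketch: all lines are joined and split into one long list, then every n-th piece is taken starting at the chosen offset. The second sample line is invented, and the sketch assumes every line has the same number of items.

```python
def column_of_field(lines, offset, delimiter=", "):
    """Pick the piece at `offset` (0-based) from every line."""
    # JOIN all lines with the delimiter, then SPLIT into one flat list.
    pieces = delimiter.join(lines).split(delimiter)
    # Items per line = number of commas in the first line + 1.
    items_per_line = lines[0].count(",") + 1
    # "offset N" drops the first N pieces, "skipping M" keeps every M-th one.
    return pieces[offset::items_per_line]


lines = [
    "Bill Gates, IT Manager, Microsoft, <https://www.linkedin.com/in/williamhgates>",
    "Jane Doe, Engineer, Example Corp, <https://example.com/jane>",  # made-up row
]
print(column_of_field(lines, 2))
```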
Number of items can be counted by formula:
=len(A1)-len(SUBSTITUTE(A1,",",""))+1
and final formula is:
=QUERY(QUERY(TRANSPOSE(SPLIT(JOIN(", ",A1:A20),", ",0)),"offset 2"),
"skipping "&len(A1)-len(SUBSTITUTE(A1,",",""))+1)
Sample data here.
In my sheet, I mark header rows in the A-column. If all rows between a given header row and the next one are marked as "Ignore" in the B-column, then I'd like that header row to be formatted in a different color.
How do you build a formula that can check whether the string "Ignore" appears on all of an unknown number of rows between two A-column cells with a given string?
Checking for an unknown number of rows is beyond my skillset in formula-building.
EDIT:
I've added a few new conditions that make this slightly more complicated.
A top header row, which should be ignored.
Some rows in column A have data in non-header rows. So, the dynamic range has to check for the exact string that marks a header row and how many rows it takes before that string repeats in the column.
Some B-column rows are blank. Blank doesn't mean "Ignore", so if all B-column rows beneath a header are blank, the header shouldn't have the special format.
try:
=(NOT(REGEXMATCH(ROW($A1)&"", INDEX(TEXTJOIN("|", 1, "×",
IFERROR(SORT(UNIQUE(FILTER(VLOOKUP(ROW($A1:$A),
IF($A1:$A<>"", {ROW($A1:$A), ROW($A1:$A)}), 2, 1),
$B1:$B<>"Ignore", $B1:$B<>"")), 1, 0)))))))*($A1<>"")
update:
=NOT(REGEXMATCH(ROW($A2)&"", "^"&TEXTJOIN("$|^", 1, "×",
IFERROR(SORT(UNIQUE(FILTER(IFNA(VLOOKUP(IF(($A2:$A<>"")*($A2:$A<>"*"),, ROW($A2:$A)),
IF(($A2:$A<>"")*($A2:$A<>"*"), {ROW($A2:$A), ROW($A2:$A)}), 2, 1)),
$B2:$B<>"ignore", $C2:$C<>"")), 1, 0)))&"$"))*($A2<>"")*($A2<>"*")
step-by-step formula explanation
This is essentially the same as Player(), only with a slightly shorter formula.
=if(A1<>"",len(SUBSTITUTE(TEXTJOIN("",,B2:INDEX(B:B,MATCH(true,isblank(B2:B),0)+row()-1,1)),"Ignore",""))=0,"")
Explanation of Dynamic Range
The hardest part of this is matching the groups of values in column B. To do this, I built the range with an INDEX function on the right-hand side of the colon. So just as one would write B2:B3, one can write B2:INDEX(...).
To get the lower position, I match the first blank cell (note that ="" won't work). This gives the distance from the cell the function is being called from. We then add the row it's being called from and subtract one, as we don't want the blank cell but the one above it. Combining these, INDEX(B:B,MATCH(true,isblank(B2:B),0)+row()-1,1) gets the dynamic lower bound.
After that, there are a variety of ways to solve it. I used TEXTJOIN and SUBSTITUTE and checked for a length of zero, but there are lots of other ways.
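To make the dynamic range idea concrete, here is a small Python sketch of the same steps: walk down column B from the row under the header until the first blank, join what was found, strip out "Ignore" and check that nothing is left. The function name and the sample column are invented for illustration.

```python
def all_ignored(column_b, header_index):
    """column_b holds cell values ("" for blank); header_index is 0-based."""
    start = header_index + 1
    end = start
    # Find the first blank cell below the header (the MATCH on ISBLANK).
    while end < len(column_b) and column_b[end] != "":
        end += 1
    block = column_b[start:end]   # the rows above that blank cell
    joined = "".join(block)       # TEXTJOIN with an empty delimiter
    # SUBSTITUTE "Ignore" away; a length of zero means nothing else was there.
    # Note: an empty block also yields True in this sketch.
    return len(joined.replace("Ignore", "")) == 0


column_b = ["", "Ignore", "Ignore", "", "", "Ignore", "Keep", ""]
print(all_ignored(column_b, 0), all_ignored(column_b, 4))
```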
Paste this formula in C1 to get a helper column that can be hidden.
=AND( A1<>"", LOWER(B2)= "ignore")
Paste this formula in conditional formatting and set "Apply to range" to A1:A1000; take a look at the Example Sheet.
=$B:$B="Ignore"
I have a list of partner name codes, delimited by spaces, like the one shown below.
I have another table (E:F) from which I have to map them to show the partner names, as in column C, but I am not able to work out how to make it happen.
I have tried this formula, which brings back only one partner name; when there are multiple codes, the others do not show up. Do I need to add another function like TEXTJOIN, or what am I doing wrong here?
=IFERROR(VLOOKUP(IFERROR(REGEXEXTRACT(A2,JOIN("|",FILTER($E$2:$E,$E$2:$E<>""))),""),$E$2:$F,2,0),"")
Link To GS
See my sheet ("Erik Help"). The following formula is in cell B1:
=ArrayFormula({"PARTNER NAMES";IF(A2:A="",,REGEXREPLACE(TRIM(TRANSPOSE(QUERY(TRANSPOSE(IFERROR(VLOOKUP(SPLIT(A2:A," ",0,1),D:E,2,FALSE)&",")),,COLUMNS(SPLIT(A2:A," ",0,1))))),",$",""))})
This one formula produces the header (which you can change within the formula itself as you like) and all results for all rows.
IF(A2:A="",,...) means if a cell in Col A is blank, then the result in the same row of Col B will also be blank (i.e., null).
SPLIT (the first time in the formula) will split the Col-A values at the spaces.
VLOOKUP will try to find each split value in the D:E list. If found, the full name will replace the initials. If not found, IFERROR will return null.
You will see &",". That is appending a comma to any full names that are returned.
TRANSPOSE(QUERY(TRANSPOSE...),,COLUMNS())) is what many call "QUERY Smash." It basically flips the remaining results of the VLOOKUP into columns instead of rows, turns everything into headers (to get them in one cell per column) and then flips them back to row orientation.
TRIM gets rid of spaces where no names were found in the full list.
REGEXREPLACE(... ,",$","") replaces any final comma that has no name after it with null.
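The steps above boil down to "split, look up each piece, join what was found". A minimal Python sketch of that logic follows; the codes and partner names in `lookup` are invented, since the real table lives in the linked sheet.

```python
def partner_names(codes_cell, lookup):
    """Map space-separated codes to full names, joined with ', '."""
    if codes_cell == "":
        return ""                      # blank in, blank out (the IF part)
    codes = codes_cell.split()         # SPLIT at the spaces
    # VLOOKUP each code; codes that are not in the table are skipped (IFERROR).
    names = [lookup[code] for code in codes if code in lookup]
    # Joining with ", " leaves no trailing comma to clean up afterwards.
    return ", ".join(names)


lookup = {"AB": "Alpha Beta Ltd", "CD": "Charlie Delta Inc"}  # hypothetical table
print(partner_names("AB XX CD", lookup))
```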
I have a data set in which emails are populated. I would like to list all the surnames extracted from the emails in each cell, all joined into one single cell, but with a separator or delimiter between the surnames obtained from each cell.
Here is the data set:
A | B
john.smith#gmail.com, jane.doe#gmail.com | UPDATE
john.smith#gmail.com | CLOSE
And here is the formula to extract
=ARRAYFORMULA(
PROPER(
REGEXEXTRACT(
A:A,
REGEXREPLACE(
A:A,
"(\w+)#","($1)#"
)
)
)
)
This initially yields the following:
C | D
Smith | Doe
Smith |
I would like to use JOIN() inside the ARRAYFORMULA() but it is not working as I seem to think it would since it outputs an error that it only accepts one row or one column of data. My initial understanding of ARRAYFORMULA() is that it iterates through the course of the data, so I thought it will JOIN() first, and then move on to the next element/row but I guess it doesn't work that way. I can use FLATTEN() but I want to have delimiters or separators in between the row elements. I need help in obtaining my intended final result which will look like this:
UPDATE:
Smith
Doe
CLOSE:
Smith
All are located in one cell, C1. UPDATE and CLOSE are from column B.
EDIT: I would like to clarify that the email entries in column A are dynamic and maybe more than two.
I think this will work:
=arrayformula(flatten(if(A2:A<>"",regexreplace(trim(split(B2:B&":"&char(9999)&regexreplace(Proper(A2:A),"#[\w\.]+,\ ?|#.*",char(9999)&" "),char(9999))),".*\.",),)))
NOTES:
Proper(A2:A) changes the capitalisation.
The regexreplace "#[\w\.]+,\ ?|#.*" finds:
# symbol...
then any number of A-Z, a-z, 0-9, _ [using \w] or . [using \.]
then a comma
then 'optionally' a space \ [the optional bit is ?]
or [using |], the # symbol then any number of characters [using .*]
The result is replaced with a character that you won't expect to have in your text - char(9999) which is a pencil icon, and a trailing space (used later on when the flatten keeps a gap between lines). The purpose is to get all of the 'name.surname' and 'nameonly' values in front of any # symbol and separate them with char(9999).
Then in front of the regexreplace is B2:B&":"&char(9999)& which prepends the value from column B, the : character and char(9999).
The split() function then separates them into columns. Trim() is used to get rid of spaces in front of names that don't contain a . character.
The next regexreplace() function deletes anything before, and including, the last . so that only the surname (or the name, when there is no .) remains.
The if(A2:A<>"" only processes rows where there is a value in col A. The arrayformula() function is required to cascade the formula down the sheet.
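Here is a rough Python walk-through of the same steps for one row, in case it helps to see the intermediate values. It is a sketch, not a translation: `str.title()` stands in for PROPER, the function name is made up, and empty pieces are dropped explicitly at the end.

```python
import re

SEP = "\u270f"  # the same character that CHAR(9999) produces


def label_and_surnames(emails_cell, label):
    """Return [label + ':', surname, surname, ...] for one row."""
    if emails_cell == "":
        return []                                  # only process non-empty rows
    proper = emails_cell.title()                   # stand-in for PROPER
    # Replace "#domain, " or "#domain" with the separator plus a space.
    marked = re.sub(r"#[\w.]+, ?|#.*", SEP + " ", proper)
    pieces = (label + ":" + SEP + marked).split(SEP)
    # Trim, then delete everything up to and including the last dot.
    cleaned = [re.sub(r".*\.", "", piece.strip()) for piece in pieces]
    return [piece for piece in cleaned if piece]   # drop leftover empty pieces


print(label_and_surnames("john.smith#gmail.com, jane.doe#gmail.com", "UPDATE"))
print(label_and_surnames("john.smith#gmail.com", "CLOSE"))
```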
I didn't output the results in a single cell, but it looks like you've sorted that with textjoin.
Here's my version of getting the results into a single cell.
=arrayformula(textjoin(char(10),1,if(A2:A<>"",REGEXREPLACE(B2:B&":"&char(10)&regexreplace(Proper(A2:A),"#[\w\.]+,\ ?|#.*",char(10)),".*\.",),)))
Assuming that your A:A cells will always contain only contiguous email addresses separated by commas, you could place this in, say, C1 (being sure that both Columns C and D are otherwise empty beforehand):
=TRANSPOSE(FILTER({B:B,IFERROR(REGEXEXTRACT(SPLIT(PROPER(A:A),","),"([^\.]+)#"))},A:A<>""))
If this produces the desired result and you'd like to know how it works, report back. The first step, however, is to make sure it works as expected.
use:
=INDEX(REGEXREPLACE(TRIM(QUERY(FLATTEN(QUERY(TRANSPOSE({{B1; IF(B2:B="",,"×"&B2:B)},
PROPER(REGEXEXTRACT(A:A, REGEXREPLACE(A:A, "(\w+)#", "($1)#")))})
,,9^9)),,9^9)), " |×", CHAR(10)))
I watched a tutorial where the author uses an IF statement along with the ARRAYFORMULA function to add a title row to a column of data. Links are given to the docs; however, for an example of how to use ARRAYFORMULA see this answer.
An example can be seen below:
I was able to populate the C column by placing the following formula in C1:
=ARRAYFORMULA(if(row(A:A) = 1, "spent", B:B - A:A))
I'm confused about the syntax. I understand that X:X references the entire X column but I don't understand how it's being used to check if we're at cell A1 in one context and then being used to apply mass formulas in another context.
How does the above line work?
Can you illustrate with some examples?
It sounds to me that the information you learned led you to expect that row(A:A)=1 translates to row A1?
It works a little differently than that. Inside ARRAYFORMULA, row(A:A) evaluates to the row number of every row in the column, so the test row(A:A)=1 is true only on row 1, where the formula writes "spent"; on every other row it subtracts A from B.
My suggestion:
use a literal array to make your header, then use the if(arrayformula) to only populate rows with values, for aesthetics:
Example:
={"Spent";arrayformula(if(isnumber(A2:A),B2:B-A2:A,))}
Explanation:
The {} allow you to build a literal array, and using a semicolon instead of a comma stacks the cells vertically. After that, we check whether there is a number in column A; if so, subtract A from B, else leave it blank.
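As an illustration of what that formula builds, here is a minimal Python sketch: a header stacked on top of a computed column. The function name and the sample numbers are invented, and None stands in for an empty cell.

```python
def spent_column(col_a, col_b):
    """Return the header followed by B - A for every row where A is a number."""
    body = [
        b - a if isinstance(a, (int, float)) else ""   # ISNUMBER check, else blank
        for a, b in zip(col_a, col_b)
    ]
    return ["Spent"] + body                            # like {"Spent"; ...}


col_a = [10, None, 3]    # values from A2 downwards (made up)
col_b = [25, None, 8]    # values from B2 downwards (made up)
print(spent_column(col_a, col_b))
```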
Why not just put the column title directly in the first-row cell, and start the array formula from the 2nd row, using the A2:A, B2:B syntax?
If something does not have to be in a formula, better put it directly on the cell - simpler for others to understand what's going on, and the formula will be simpler.
If you put the array formula in line 2, and someone sorts the data, then the arrayformula will move. If it is in the header line, this is less likely to happen.
You can also use the IFS function to achieve a similar effect to the array,
=arrayformula(ifs(row(A1:A)=1,"Spent",A1:A="",,True,B1:B-A1:A))
Here the first condition checks the row number, and if it is row ONE, then inserts a Column Header.
The Second condition - A1:A="",, - ensures that blank lines are ignored.
The Third condition True (ELSE) performs the calculation.
This method also allows for different calculations to be performed on different rows depending on requirements.
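Since the question asked for an illustration, here is a small Python sketch of how the three conditions are evaluated row by row, in order. The sample values are invented; the point is that the row number, not the cell content, decides where the header goes.

```python
def spent_with_ifs(col_a, col_b):
    """Evaluate the conditions once per row, first match wins (like IFS)."""
    out = []
    for row, (a, b) in enumerate(zip(col_a, col_b), start=1):
        if row == 1:          # ROW(...) = 1: write the column header
            out.append("Spent")
        elif a == "":         # blank in column A: leave the result blank
            out.append("")
        else:                 # TRUE acts as the "else" branch
            out.append(b - a)
    return out


col_a = ["start", 10, "", 3]   # row 1 holds a header in this made-up sheet
col_b = ["end", 25, "", 8]
print(spent_with_ifs(col_a, col_b))
```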
Here is a screen shot of some data:
I would like to build a new column that contains each string in column A repeated the total number of times it occurs.
So entry "Too expensive" would be on 26 rows then under that would start "Don't want it" taking up 6 rows, then "too expensive" (different since lower case) would take up another 6 + 5 from row 14.
So just a new column that is each string the number of times it appears. Inverse pivot tabling, if you will.
How would I do that? I tried playing with rept() but that put everything in one cell.
It looks like most likely you first need a helper column to basically unique the values so in column C you would put :
=UNIQUE(A:A)
and for the sake of explanation, if you want to see how it breaks down, in column D you can put
=sum(FILTER(B2:B,exact(C2,A2:A)))
The reason for using EXACT is that otherwise the match won't be case sensitive.
Once you have your final number for the REPT function, you construct your repeated value with a delimiter:
=rept(C2&";",D2)
This helps to split them out properly into a column later. If you REPT the value without the semicolon, you will see the same result you're describing up top, where they are all mashed together.
Currently at this point this is what you would see:
To save some space I nest the sum filter into the rept function so I can remove column D:
=REPT(C2&";",sum(FILTER(B2:B,exact(C2,A2:A))))
I then join all those and split them out one last time using the ; as a delimiter:
=TRANSPOSE(SPLIT(JOIN(";",D2:D4),";"))
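The whole chain (unique, case-sensitive sum, repeat, split) can be pictured with this short Python sketch. The function name and the small counts are made up; the real numbers are in the screenshot.

```python
def expand(labels, counts):
    """Repeat each distinct label as many times as its counts add up to."""
    totals = {}
    for label, count in zip(labels, counts):
        # Dict keys are case-sensitive, like EXACT, and keep first-seen order,
        # like UNIQUE.
        totals[label] = totals.get(label, 0) + count
    # One list entry per repetition, which is what REPT plus SPLIT produce.
    return [label for label, total in totals.items() for _ in range(total)]


labels = ["Too expensive", "Don't want it", "too expensive", "too expensive"]
counts = [3, 2, 1, 2]
print(expand(labels, counts))
```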
Alternatively, see if this works:
=ArrayFormula(trim(transpose(split(query(rept(A2:A&char(10),B2:B),,50000), char(10)))))