Split function treating dashes oddly - google-sheets

Why does =SPLIT("1,2-5,4", ",")
equal
1 42040 4
instead of
1 2-5 4 ?
I have all of the cells formatted as plain text.

REGEXEXTRACT should give you the desired output. Try:
=ArrayFormula(regexextract("1,2-5,4", {"^(\d+),",",(.+),",",(\d+)$"}))

To complement JPV's answer.
You can use:
=REGEXEXTRACT(A1,"(.*?),(.*?),(.*)")
which is "hard-coded" to splitting exactly 3 elements (as JPV's is). To give more flexibility, you can use something like:
=REGEXEXTRACT(A1&REPT(",",10),REPT("(.*?),",10))
which is limited to a maximum of 10 elements (that number can be changed to suit). However, it will output an array that is always that maximum number of elements long (padded out with blank cells). You could use QUERY or FILTER to filter out those blank cells, though the formula will become a little convoluted.
Alternatively, you can "code" your string such that automatic date coercion is avoided, and then "uncode" it after the SPLIT:
=ArrayFormula(SUBSTITUTE(SPLIT(SUBSTITUTE(A1,"-","x"),","),"x","-"))
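To illustrate the coercion outside of Sheets, here is a rough Python sketch (not a Sheets formula). It assumes the usual Sheets date serial epoch of 1899-12-30, which suggests that 42040 is what "2-5" becomes once it is read as a date (5 February 2015). The function names and the "x" placeholder are only for illustration; the second function mirrors the code/uncode trick above.

```python
from datetime import date, timedelta

# Assumption: Google Sheets counts date serials from 1899-12-30 (serial 0).
SHEETS_EPOCH = date(1899, 12, 30)


def serial_to_date(serial: int) -> date:
    """Convert a Sheets-style date serial number to a calendar date."""
    return SHEETS_EPOCH + timedelta(days=serial)


def split_without_coercion(text: str, sep: str = ",") -> list[str]:
    """Mimic SUBSTITUTE -> SPLIT -> SUBSTITUTE: hide dashes, split, restore."""
    placeholder = "x"  # hypothetical placeholder; must not occur in the data
    encoded = text.replace("-", placeholder)
    return [part.replace(placeholder, "-") for part in encoded.split(sep)]


print(serial_to_date(42040))              # 2015-02-05
print(split_without_coercion("1,2-5,4"))  # ['1', '2-5', '4']
```

As with the formula, the placeholder has to be a character that never appears in the real data, otherwise the "uncode" step will corrupt it.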

With 1,2-5,4 in a cell, this can be split as required with Data > Split text into columns.

Related

Add leading zeros in cells with non digits

I have two cells that need to be merged into one, but the formatting needs to be updated. In one cell there are category IDs and in the other cell are subcategory IDs. None of them are formatted to have leading zeros, which isn't too big of a deal because I can just use the number format for that. The problem is that the subcategories sometimes have a letter at the end. Number format completely ignores these and just leaves them be. Is there a formula/combination of formulas I can use to get data to look like this?
Category  Subcat  Output
1         1       01-01
4         12      04-12
21        1       21-01
21        1b      21-01b
EDIT
Sorry about the previous answer. This one is correct:
=ArrayFormula(IF(A2:A<>"", IF(LEN(A2:A)=1,"0"&A2:A,A2:A)&"-"&IF(REGEXMATCH(B2:B&"","^\d{1}$|^\d{1}\D")=TRUE,"0"&B2:B,B2:B),""))
WRONG (do not use this; it fails to pad subcategories such as 1b, whose length is 2):
=ArrayFormula(IF(A2:A<>"", IF(LEN(A2:A)=1,"0"&A2:A,A2:A)&"-"&IF(LEN(B2:B)=1,"0"&B2:B,B2:B),""))
try:
=INDEX(IFNA(TEXT(A1:A, "00-")&
TEXT(REGEXEXTRACT(B1:B&"", "\d+"), "00")&
IFERROR(REGEXEXTRACT(B1:B, "\D+"))))
Try:
=IF(ISNUMBER(A2), TEXT(A2,"00"), TEXT(LEFT(A2,LEN(A2)-1),"00")&RIGHT(A2))&"-"&IF(ISNUMBER(B2), TEXT(B2,"00"), TEXT(LEFT(B2,LEN(B2)-1),"00")&RIGHT(B2))
And just paste it in every cell in column C
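For anyone who wants to check the padding rule outside the spreadsheet, here is a minimal Python sketch of the same idea (not one of the formulas above): pad the leading digits to two places and keep any trailing non-digit suffix. The names pad_id and combine are made up for this example.

```python
import re


def pad_id(value) -> str:
    """Zero-pad the leading digits to width 2, keeping any letter suffix."""
    match = re.fullmatch(r"(\d+)(\D*)", str(value))
    if match is None:
        # Not in the "digits + optional suffix" shape; leave it untouched.
        return str(value)
    digits, suffix = match.groups()
    return digits.zfill(2) + suffix


def combine(category, subcat) -> str:
    """Join the padded category and subcategory with a dash."""
    return f"{pad_id(category)}-{pad_id(subcat)}"


for cat, sub in [(1, 1), (4, 12), (21, 1), (21, "1b")]:
    print(combine(cat, sub))
```

This should reproduce the Output column from the question (01-01, 04-12, 21-01, 21-01b).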

Google Sheets: Sum if value is round number

I'm sure there's a pretty simple solution, but I can't get to it.
I'm trying to sum a list of numbers, but only the values that are round numbers/whole integers.
E.g. column A:
1
1.5
3
2.4
2
sum of the whole numbers
1 + 3 + 2 = 6
Any hint?
Let's suppose your numbers list begins in A2 and runs downward (i.e., A2:A). You can use this:
=SUM(FILTER(A2:A,A2:A=INT(A2:A)))
In plain English, this reads as follows: "Sum only those numbers in A2:A where the original value is the same as the integer-only portion of that value."
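The same filter-then-sum logic, written as a small Python sketch for comparison (the function name is just for illustration):

```python
def sum_whole_numbers(values):
    """Sum only the values whose integer part equals the value itself."""
    return sum(v for v in values if v == int(v))


print(sum_whole_numbers([1, 1.5, 3, 2.4, 2]))  # 6
```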
Try:
=sumproduct((A1:A5)*(A1:A5=int(A1:A5)))
This will also work in Excel.
Use dot detection:
=INDEX(SUM(IF(REGEXMATCH(""&A:A, "\."),,A:A)))
Another solution, you can use SEARCH to search for the decimal point:
=SUM(A1:A)-SUM(FILTER(A1:A,SEARCH(".",TO_TEXT(A1:A))))
or =SUM(FILTER(A1:A,NOT(ISNUMBER(SEARCH(".",A1:A)))))
as JvdV mentioned in his comment.
Either try QUERY():
=SUM(QUERY(A:A,"where A matches '\d+'"))
Or FILTER():
=SUM(FILTER(A:A,MOD(A:A,1)=0))
Note: This 1st option makes use of the possibility to use a regular expression inside the "where" clause of QUERY(). Use =SUM(QUERY(A:A,"where A matches '-?\d+'")) if you want to account for positive and negative integers.

How do you sort an alpha-numeric list in Excel?

I'm trying to sort a list of documents, but I'm having an issue with the documents that have a letter as a suffix.
Whenever we amend a document we add a letter to the end of the number, but when I sort by number in Excel it sorts like this:
1
2
3
10
11
1606
1603D
1605B
1606A
1606C
1610A
1623A
20A
220B
390A
399A
402A
415A
450A
488A
557B
How can I make it sort in order of document number and amendment?
Like so:
1
2
3
10
11
1603D
1605B
1606
1606A
1606C
1610A
1623A
20A
220B
390A
399A
402A
415A
450A
488A
557B
As long as you have a mix of text and number, you won't be able to use Excel's built-in sort to achieve the result you describe.
If you append a letter to a number you effectively change the data type from number to text. Text will always be sorted after any number, hence the number 1606 comes before the text 1606A.
You could try to make all values real text, maybe indicate levels by appending digits with dots, like this:
1.
1.0.
1.1.
1.6.0.3.D
1.6.0.5.B
1.6.0.6.
1.6.0.6.A
1.6.0.6.C
1.6.1.0.A
1.6.2.3.A
2.
2.0.A.
2.2.0.B.
3.
3.9.0.A.
3.9.9.A.
4.0.2.A.
4.1.5.A.
4.5.0.A.
4.8.8.A.
5.5.7.B.
But even that does not give you the sort order you describe as the desired result.
Your desired sort order will be hard to achieve even if all values are text, or if you replace the A, B, C with a decimal .1, .2, .3. -- It's really hard to understand why 20 would come after 1623.
The solution I found was to add a column, and copy this formula into each cell:
=IF(ISNUMBER(--RIGHT(A2)),A2,LEFT(A2,LEN(A2)-1))
The formula removes the trailing letter from each value; you can then sort your sheet using the new column of clean numbers.
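As a rough illustration of the helper-column idea, here is a Python sketch that sorts by the numeric part first and the amendment letter second. Note that this gives a strictly numeric order (so 20A lands before 1603D), which is not literally the order listed in the question. The name doc_key and the sample list are assumptions for the example.

```python
import re


def doc_key(doc_id: str):
    """Split '1606A' into (1606, 'A') so numbers sort numerically."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", doc_id)
    if match is None:
        raise ValueError(f"unexpected document id: {doc_id!r}")
    number, suffix = match.groups()
    return (int(number), suffix)


docs = ["1606A", "20A", "1606", "3", "1603D", "10", "1606C"]
print(sorted(docs, key=doc_key))
# ['3', '10', '20A', '1603D', '1606', '1606A', '1606C']
```

Because an empty suffix sorts before any letter, the unamended document (1606) comes ahead of its amendments (1606A, 1606C).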

Get data between number two and three delimiter

I have a large list of people where each person has a line like this.
Bill Gates, IT Manager, Microsoft, <https://www.linkedin.com/in/williamhgates>
I want to extract the company name in a specific cell. In this example, it would be Microsoft, which is between the second and third delimiters (in this case, the delimiter is ", "). How can I do this?
Right now I'm using the split method (=SPLIT(A2, ", ",false)). But it gives me four different cells with information. I would like a command only to output the company in one cell. Can anyone help? I have tried different things, but I can't seem to find anything that works.
Maybe some regex can do it, but I'm not into regex.
Short answer
Use INDEX and SPLIT to get the value between two separators. The company is the third item, so for example:
=INDEX(SPLIT(A1,", ",FALSE),3)
Explanation
SPLIT returns a 1 x n array.
The first argument of INDEX could be a range or an array.
The second and third arguments of INDEX are optional. If the first parameter is an array that has only one row or one column, it will assume that the second argument corresponds to the larger side of the array, so there is no need to use the third argument.
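In Python terms, the split-then-index idea looks roughly like this (a sketch only; nth_field is a made-up name, and it uses 1-based positions to match INDEX). With the sample line, the company is the third field:

```python
def nth_field(line: str, n: int, sep: str = ", ") -> str:
    """Return the n-th field (1-based, like INDEX) after splitting on sep."""
    return line.split(sep)[n - 1]


line = "Bill Gates, IT Manager, Microsoft, <https://www.linkedin.com/in/williamhgates>"
print(nth_field(line, 3))  # Microsoft
```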
A bit nasty, but this formula works, assuming data in cell D3.
=MID(D3,FIND(",",D3,FIND(",",D3)+1)+2,FIND(",",D3,FIND(",",D3,FIND(",",D3)+1)+1)-FIND(",",D3,FIND(",",D3)+1)-2)
Broken down, this is what it does:
Take a substring of D3: =MID(D3
starting two characters after the 2nd comma: FIND(",",D3,FIND(",",D3)+1)+2
with a length equal to the number of characters between the 2nd and 3rd commas, excluding the space: FIND(",",D3,FIND(",",D3,FIND(",",D3)+1)+1)-FIND(",",D3,FIND(",",D3)+1)-2)
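The nested FIND calls are easier to follow when each comma position gets its own name. Here is a Python sketch of the same steps, assuming the text contains at least three commas and each comma is followed by one space. Python's str.find is 0-based where FIND is 1-based, but the offsets work out the same way.

```python
def between_second_and_third_comma(text: str) -> str:
    """Extract the text between the 2nd and 3rd commas, skipping ', '."""
    first = text.find(",")
    second = text.find(",", first + 1)   # like FIND(",",D3,FIND(",",D3)+1)
    third = text.find(",", second + 1)   # the next comma after the second
    # Start two characters after the 2nd comma (skip the comma and the space).
    return text[second + 2:third]


line = "Bill Gates, IT Manager, Microsoft, <https://www.linkedin.com/in/williamhgates>"
print(between_second_and_third_comma(line))  # Microsoft
```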
I'll add my favourite ArrayFormula, which you could use to expand the list automatically without dragging the formula down. Assumptions:
you have a list with data in range "A1:A20"
all data have the same syntax "...,Company Name, <..."
In this case you could use ArrayFormula, pasted in cell B1:
=ArrayFormula(REGEXEXTRACT(A1:A20,", ([^,]+), <"))
If your data doesn't always look like "...,Company Name, <..." or you wish to get a different output, use this formula in cell B1:
=QUERY(QUERY(TRANSPOSE(SPLIT(JOIN(", ",A1:A20),", ",0)),"offset 2"),"skipping 4")
In this formula:
change 2 in offset 2 to 0, 1, 2, 3 to get name, position, company, link
the 4 in skipping 4 is the number of items per row.
The number of items can be counted with this formula:
=len(A1)-len(SUBSTITUTE(A1,",",""))+1
and the final formula is:
=QUERY(QUERY(TRANSPOSE(SPLIT(JOIN(", ",A1:A20),", ",0)),"offset 2"),
"skipping "&len(A1)-len(SUBSTITUTE(A1,",",""))+1)

Countif with len in Google Spreadsheet

I have a column XXX like this :
XXX
A
Aruin
Avolyn
B
Batracia
Buna
...
I would like to count a cell only if the string in the cell has a length > 1.
How to do that?
I'm trying :
COUNTIF(XXX1:XXX30, LEN(...) > 1)
But what should I write instead of ... ?
Thank you in advance.
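For ranges that contain strings, I have used a formula like below, which counts any value that starts with one character (the ?) followed by 0 or more characters (the *). I haven't tested on ranges that contain numbers. Note that this counts every non-empty string; to require a length greater than 1, the pattern "=??*" demands at least two characters.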
=COUNTIF(range,"=?*")
To do this in one cell, without needing to create a separate column or use arrayformula{}, you can use sumproduct.
=SUMPRODUCT(LEN(XXX1:XXX30)>1)
If you have an array of True/False values then you can use -- to force them to be converted to numeric values like this:
=SUMPRODUCT(--(LEN(XXX1:XXX30)>1))
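The "sum of True/False values" idea is the same one Python uses when it adds up booleans, so a rough sketch of the count (with a made-up function name and the sample column from the question) could be:

```python
def count_longer_than(cells, min_len=1):
    """Count cells whose string length exceeds min_len (True counts as 1)."""
    return sum(len(cell) > min_len for cell in cells)


column = ["A", "Aruin", "Avolyn", "B", "Batracia", "Buna"]
print(count_longer_than(column))  # 4
```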
Credit to #greg who posted this in the comments - I think it is arguably the best answer and should be displayed as such. SUMPRODUCT is a powerful function that can often be used to get around shortcomings in COUNTIF-type formulae.
Create another list using an =ARRAYFORMULA(len(XXX1:XXX30)>1) and then do a COUNTIF based on that new list: =countif(XXY1:XXY30,true()).
A simple formula that works for my needs is =ROWS(FILTER(range,LEN(range)>X))
The Google Sheets criteria syntax seems inconsistent, because the expression that works fine with FILTER() gives an erroneous zero result with COUNTIF().
Here's a demo worksheet
Another approach is to use the QUERY function.
This way you can write a simple SQL like statement to achieve this.
For example:
=QUERY(XXX1:XXX30,"SELECT COUNT(X) WHERE X MATCHES '.{1,}'")
To explain the MATCHES criteria:
It is a regex that matches every cell that contains 1 or more characters.
The . operator matches any character.
The {1,} qualifies that you only want to match cells that have at least 1 character in them. To count only cells longer than one character, as the question asks, use {2,} instead.
Here is a link to another SO question that describes this method.

Resources