How would I extract the last part of the string with a regex? The last part of the string will always be a 3-digit number.
"/gdc/md/vin06hdpq8442qocut9aoih8q5j5k43u/obj/185"
Something like: /\d+$/ should get all digits at the end of the string
Just slice the last 3 characters:
str = "/gdc/md/vin06hdpq8442qocut9aoih8q5j5k43u/obj/185"
p str[-3,3] # => "185"
There are a number of different ways you can accomplish this.
You can use regular expressions to match either all trailing digits (/\d+$/), or just the last three trailing digits (/\d{3}$/), depending on what behaviour you want in case the string for some reason has more digits than you expected:
str.match(/\d+$/)[0]
#=> "185"
str.match(/\d{3}$/)[0]
#=> "185"
Another option is to split the string into an array, using / as the separator, and then grab the last element (which will contain everything past the last /).
str.split("/").last
#=> "185"
Or you can use the fact that substrings can be accessed using indices, much like arrays, and use it to grab the last three digits:
str[-3, 3]
#=> "185"
Unless you're doing this thousands of times inside a loop, any performance difference will be insignificant, so you can go for the option that offers the most robustness and legibility.
Note that all four approaches return a string, so if you intend to use this number as an integer, you'll first need to convert it using #to_i.
Or a combination of both with str[/\d{3}$/]
or maybe str[-3..-1]
So many ways to skin that string :)
I need a Google Sheet function that will return the position of the last instance of a particular character. Basically, FIND, but starting on the right.
For example, for the data set below, I need to return the position of the last dash.
ABC-DEF-GHI = 8
ABCD-EF-GH-IJK = 11
AB-C-DE-FGH-I-JK = 14
Thanks!
I don't know where to start. MID might work, but the file names are of different lengths and different formats. The files just generally end with - ***.png, and I need the part marked by the asterisks. The string I need is also of variable length and can contain spaces (the string is the name of the student).
Here's a possible solution:
=len(regexextract(A1,".*-"))
It's essentially extracting everything up to the last dash and taking the length of the resulting string.
For the whole array, try:
=INDEX(LEN(REGEXEXTRACT(A1:A3; "(.*-)")))
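If a script is an option, the same "position of the last dash" idea can be sketched in JavaScript. This is only an illustration: lastPosition is a hypothetical helper name, not a built-in Sheets function, and it assumes you want a 1-based position like FIND returns, with 0 meaning "not found".

```javascript
// Hypothetical helper, not a built-in Sheets function.
// Returns the 1-based position of the last occurrence of `ch` in `text`,
// or 0 when `ch` does not occur at all.
function lastPosition(text, ch) {
  // lastIndexOf is 0-based and returns -1 when nothing is found,
  // so adding 1 gives a FIND-style position (or 0 for "not found").
  return String(text).lastIndexOf(ch) + 1;
}

console.log(lastPosition("ABC-DEF-GHI", "-"));      // 8
console.log(lastPosition("ABCD-EF-GH-IJK", "-"));   // 11
console.log(lastPosition("AB-C-DE-FGH-I-JK", "-")); // 14
```

Unlike the REGEXEXTRACT formula, which errors when there is no dash, this sketch returns 0 in that case.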
I'm sure there's a pretty simple solution, but I can't get to it.
I'm trying to sum a list of numbers, but only the values that are round numbers/whole integers.
E.g., column A:
1
1.5
3
2.4
2
sum of the whole numbers
1 + 3 + 2 = 6
Any hint?
Let's suppose your numbers list begins in A2 and runs downward (i.e., A2:A). You can use this:
=SUM(FILTER(A2:A,A2:A=INT(A2:A)))
In plain English, this reads as follows: "Sum only those numbers in A2:A where the original value is the same as the integer-only portion of that value."
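The same "keep only values equal to their integer part" test can be sketched in JavaScript, for example as the basis of an Apps Script custom function. The name sumWholeNumbers is made up for illustration, and the sketch assumes a range arrives as a two-dimensional array (as it does for custom functions) and that non-numeric cells should simply be ignored.

```javascript
// Hypothetical helper: sums only the whole numbers in a range.
function sumWholeNumbers(values) {
  // Accept a single value, a flat array, or a 2D range-style array.
  const cells = Array.isArray(values) ? values.flat() : [values];
  return cells
    // Keep real numbers whose value has no fractional part.
    .filter((v) => typeof v === "number" && Number.isInteger(v))
    .reduce((total, v) => total + v, 0);
}

// The column from the question, shaped like a one-column range.
console.log(sumWholeNumbers([[1], [1.5], [3], [2.4], [2]])); // 6
```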
Try:
=sumproduct((A1:A5)*(A1:A5=int(A1:A5)))
This will also work in Excel.
Use dot detection:
=INDEX(SUM(IF(REGEXMATCH(""&A:A, "\."),,A:A)))
Another solution, you can use SEARCH to search for the decimal point:
=SUM(A1:A)-SUM(FILTER(A1:A,SEARCH(".",TO_TEXT(A1:A))))
or =SUM(FILTER(A1:A,NOT(ISNUMBER(SEARCH(".",A1:A)))))
as JvdV mentioned in his comment.
Either try QUERY():
=SUM(QUERY(A:A,"where A matches '\d+'"))
Or FILTER():
=SUM(FILTER(A:A,MOD(A:A,1)=0))
Note: The first option uses the ability to put a regular expression inside the "where" clause of QUERY(). Use =SUM(QUERY(A:A,"where A matches '-?\d+'")) if you want to account for both positive and negative integers.
I am using the following formula to extract the substring venue01 from column C. The problem is that when the string in column C is shorter, it only extracts the value 1. I need it to extract everything straight after the - (dash), no matter the length of the text in column C.
={"VenueID";ARRAYFORMULA(IF(ISBLANK(A2:A),"",RIGHT(C2:C,SEARCH("-",C2:C)-21)))}
There is a much simpler solution using regular expressions.
=REGEXEXTRACT(A1,".*-(.*)")
In case you are not familiar with regular expressions, this means: capture every character ((.*)) after a dash (-). Because the leading .* is greedy, the dash that gets matched is the last one in the string.
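To see how the greedy match behaves, here is a small JavaScript sketch using the same pattern. The function name textAfterLastDash and the sample strings are invented for illustration. It returns null when there is no dash at all, where REGEXEXTRACT would give an error instead.

```javascript
// Hypothetical helper mirroring the pattern ".*-(.*)".
function textAfterLastDash(text) {
  // The leading .* is greedy, so it swallows everything up to the LAST dash;
  // the capture group then holds whatever follows that dash.
  const match = /.*-(.*)/.exec(String(text));
  return match ? match[1] : null;
}

console.log(textAfterLastDash("london-hall-venue01")); // "venue01"
console.log(textAfterLastDash("x-1"));                 // "1"
console.log(textAfterLastDash("nodash"));              // null
```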
To answer bomberjacket's question in the comments on Raserhin's answer:
To select the part of the string before the "-":
=REGEXEXTRACT(A1,"(.*)-.*")
Adding to your original formula: I think it may work if you use RIGHT and, inside it, reverse the order of the string with ARRAYFORMULA.
=Right(A1,FIND("-",JOIN("",ARRAYFORMULA(MID(A1,LEN(A1)-ROW(INDIRECT("1:"&LEN(A1)))+1,1))))-1)
It takes X characters from the right side of the string. X is found by reversing the text and then finding the position of the first dash "-" in the reversed text. The -1 at the end accounts for the dash itself; without it, the dash would show up in the extracted string.
The regex in the other answer works great too. With this formula, however, you can control how many characters to over- or under-trim, e.g. if there is a space after the dash and you would like to always account for one more character.
I have string ( "-" ) delimited data (alphanumeric, variable length) in a single column which has the format...
Column A
"aaa-bbb-ccc"
"ddd-bbb-eee"
"aaa-fff-ggg"
I have been able to use array_constrain() to return partial data elsewhere but this is N elements from the beginning of the array to the end of the array so I could have...
aaa (num_cols = 1)
aaa-bbb (num_cols = 2)
aaa-bbb-ccc (num_cols = 3)
I am looking to get the last N elements from the split data so...
bbb-ccc OR
ccc
num_cols only goes from the beginning of the array to the back of the array so that's no good for my scenario.
This answer to a similar question suggests using regexextract() to retrieve the last value. If my regex-fu were stronger, perhaps I could make that work with a little nudge in the right direction.
I know that index() can be used but that only returns one element at a time from the split data so that gets messy / inefficient.
So the question is does anybody know how to return the last N elements from an array? Ordinary sorting wouldn't work but reversing the array could work.
There's no reverse function. I see two ways:
write a small function in a script to reverse the array
use an ugly, huge formula
Huge formula sample
=JOIN("-",QUERY({ArrayFormula(ROW(INDIRECT("A1:A"&COUNTA(SPLIT(A1,"-"))))),TRANSPOSE(SPLIT(A1,"-"))},"select Col2 order by Col1 desc limit 2 "))
Change limit 2 at the end of the formula to get the last N elements.
Script sample
Try this:
function reverseLine(line) {
  // line is a 2D range; reverse the cells of its first row
  return line[0].reverse();
}
Use as custom formula: =reverseLine(B1:D1)
for line: aaa bbb ccc returns:
ccc
bbb
aaa
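If you go the script route anyway, reversing may not be needed at all: you can split the text and slice from the end. The sketch below is one possible take, not the answerer's function. The name lastParts and its parameters are made up for illustration, and it assumes the input is a single text value such as aaa-bbb-ccc.

```javascript
// Hypothetical helper: returns the last n separator-delimited parts of text.
function lastParts(text, n, separator = "-") {
  // slice(-0) would return the whole array, so handle n <= 0 explicitly.
  if (n <= 0) return "";
  return String(text)
    .split(separator)   // "aaa-bbb-ccc" -> ["aaa", "bbb", "ccc"]
    .slice(-n)          // keep only the last n parts
    .join(separator);   // glue them back together
}

console.log(lastParts("aaa-bbb-ccc", 2)); // "bbb-ccc"
console.log(lastParts("aaa-bbb-ccc", 1)); // "ccc"
```

Asking for more parts than exist just returns the whole string.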
Provided the variable length comes exclusively from the number of sets of the form -aaa appended to the first three characters, those three characters and as many further sets as desired can be stripped off the front with:
=replace(A1,1,find("-",A1)*B1,"")
where A1 contains the likes of aaa-bbb-ccc and B1 the number of sets additional to the first three characters to be removed, plus one.
Why does =SPLIT("1,2-5,4", ",")
equal
1 42040 4
instead of
1 2-5 4 ?
I have all of the cells formatted as plain text.
REGEXEXTRACT should give you the desired output. Try:
=ArrayFormula(regexextract("1,2-5,4", {"^(\d+),",",(.+),",",(\d+)$"}))
To complement JPV's answer.
You can use:
=REGEXEXTRACT(A1,"(.*?),(.*?),(.*)")
which is "hard-coded" to splitting exactly 3 elements (as JPV's is). To give more flexibility, you can use something like:
=REGEXEXTRACT(A1&REPT(",",10),REPT("(.*?),",10))
which is limited to a maximum of 10 elements (that number can be changed to suit). However, it will output an array that is always that maximum number of elements long (padded out with blank cells). You could use QUERY or FILTER to filter out those blank cells - the formula will become a little convoluted.
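To make the padding trick easier to follow, here is a rough JavaScript sketch of the same idea. The name splitUpTo is invented for illustration, and JavaScript regexes stand in for the Sheets ones. Appending max commas guarantees the repeated "(.*?)," pattern can always match, and unused groups come back as empty strings.

```javascript
// Hypothetical helper illustrating REGEXEXTRACT(A1&REPT(",",10),REPT("(.*?),",10)).
function splitUpTo(text, max = 10) {
  // Pad with `max` commas so every one of the `max` groups finds its comma.
  const padded = String(text) + ",".repeat(max);
  const pattern = new RegExp("(.*?),".repeat(max));
  // Drop element 0 (the whole match) and keep only the captured groups.
  return pattern.exec(padded).slice(1);
}

const groups = splitUpTo("1,2-5,4");
console.log(groups.length);                  // 10
console.log(groups.filter((g) => g !== "")); // [ '1', '2-5', '4' ]
```

Note that filtering out the blanks this way would also drop any element that was genuinely empty in the original string.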
Alternatively, you can "code" your string such that automatic date coercion is avoided, and then "uncode" it after the SPLIT:
=ArrayFormula(SUBSTITUTE(SPLIT(SUBSTITUTE(A1,"-","x"),","),"x","-"))
With 1,2-5,4 in a cell, this can be split as required with Data > Split text into columns...