I have a 2D array, let us call it myData[][], whose second column holds the domain names of some email addresses. I decided to use ArrayLib to search the second column for a specific domain:
ArrayLib.indexOf(myData, 1, domain)
Here is where I found an issue. In the myData array, one of the domains looks like this: "ewmining.com" (pay attention to the w).
While searching for "e.mining.com" (notice the first dot), indexOf() actually returned the row containing "ewmining.com".
This is what is in the array: "ewmining.com"
This is what is in the search string: "e.mining.com"
It seems that ArrayLib treats the dot as "any character". Is this the intended behavior? Is there a way to turn it off and search for an exact match?
I really need help on this issue.
Thanks in advance for your help.
The dot usually means "any character" in regular expressions. I am not familiar with ArrayLib, but maybe you should look for a way to turn off regular expressions when searching. Otherwise you might have to escape the dot, for example by searching for e[.]mining[.]com (a dot inside a character class is literal) or e\.mining\.com.
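To see the difference the answer describes, here is a minimal Python sketch using the standard `re` module rather than ArrayLib, whose matching rules are not shown here; it assumes ArrayLib's search behaves like a regex search, as the question's result suggests. `my_data`, `find_row_regex` and `find_row_exact` are hypothetical names for illustration.

```python
import re

# Hypothetical sample data: column 1 (0-based) holds the email domain.
my_data = [
    ["alice", "ewmining.com"],
    ["bob", "example.org"],
]

def find_row_regex(rows, col, pattern):
    """Return the index of the first row whose cell contains a regex match, or -1."""
    for i, row in enumerate(rows):
        if re.search(pattern, row[col]):
            return i
    return -1

def find_row_exact(rows, col, value):
    """Return the index of the first row whose cell equals value exactly, or -1."""
    for i, row in enumerate(rows):
        if row[col] == value:
            return i
    return -1

# Unescaped: "." matches any character, so "e.mining.com" matches "ewmining.com".
print(find_row_regex(my_data, 1, "e.mining.com"))             # 0
# re.escape turns each "." into an escaped dot, so only a literal dot matches.
print(find_row_regex(my_data, 1, re.escape("e.mining.com")))  # -1
# The character-class form from the answer: [.] is also a literal dot.
print(find_row_regex(my_data, 1, "e[.]mining[.]com"))         # -1
# Plain equality needs no escaping at all.
print(find_row_exact(my_data, 1, "e.mining.com"))             # -1
```

`re.escape` is the usual way to turn arbitrary input into a literal pattern; a plain equality test avoids regex semantics entirely.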
I've been searching for a formula in Google Sheets that removes everything in a URL before the first /. For example:
www.example.com/example/
www.example.com/example/example1
www.example.com/example/example1/example2
to:
/example/
/example/example1
/example/example1/example2
Any help would be greatly appreciated!
Say your URL is in A1. Then
=regexreplace(A1,"[\w.-]*\.com","")
works as long as the site portion of your URLs always ends with .com.
When that is not true, it's a bit more cumbersome:
=substitute(A1,regexextract(A1,"[\w.-]*"),"")
Quickly on why they work: [\w.-] means any character from A-Z, a-z, 0-9, _, . or -, and * means zero or more of the preceding item. regexreplace() replaces every match, which is why the pattern needs the literal .com: without it, regexreplace() would also strip the path segments, so it doesn't work for your problem. regexextract() returns only the first match, which is the part of the URL that you don't want, so we can use substitute() to get rid of it.
If you are new to regular expressions, I think this example is simple enough to get a little bit into them. For more complex scenarios, regular expressions will save us tons of work.
In your current problem, since you only need to find the first occurrence of / with no more complex pattern, we can just search for it with find(), and thus
=right(A1,len(A1)-find("/",A1)+1)
also works; the +1 keeps the leading / that your expected output starts with.
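For readers who want to check the logic outside Sheets, here is a minimal Python sketch of the three strategies (find and slice, regex removal of a .com host, extract then remove), written so that the result keeps the leading / as in the question's expected output. The function names are hypothetical, and Python's `re` is not the RE2 engine Sheets uses, although these simple patterns behave the same in both for ASCII input.

```python
import re

urls = [
    "www.example.com/example/",
    "www.example.com/example/example1",
    "www.example.com/example/example1/example2",
]

def path_by_find(url):
    # Find the first "/" and keep everything from it onwards (including the "/").
    i = url.find("/")
    return url[i:] if i != -1 else ""  # no "/" means there is no path

def path_by_regex_com(url):
    # Remove the host, assumed to be a run of word chars, dots and hyphens ending
    # in ".com"; re.sub replaces every match, so ".com" must not appear in the path.
    return re.sub(r"[\w.-]*\.com", "", url)

def path_by_extract(url):
    # re.search with * always matches at position 0 (possibly empty), so this is
    # the host; remove that text once, leaving the path.
    host = re.search(r"[\w.-]*", url).group(0)
    return url.replace(host, "", 1)

for u in urls:
    print(path_by_find(u), path_by_regex_com(u), path_by_extract(u))
```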
Try this.
Put your input in column A, then paste this in B2:
=IF(A2="",,RIGHT(REGEXEXTRACT(A2,".com\/.+"),LEN(REGEXEXTRACT(A2,".com\/.+"))-4))
The -4 drops the leading ".com" from the extracted text, leaving the path with its leading /. You can drag it down without any problem.
Hope this answers your question.
I have the following extract. My aim is to extract the value 7.4e-07 after the symbol DAN. My usual go-to formula (using MID and FIND) can't work here because DAN is surrounded by double quotes ("), which confuses the formula.
{"data":{"log":{"address":[{"balances":[{"currency":{"address":"example1","symbol":"ROB"},"value":0.0},{"currency":{"address":"example2","symbol":"DAN"},"value":7.4e-07},{"currency":{"address":"example3","symbol":"COLIN"},"value":0.0},{"currency":{"address":"example4","symbol":"BOB"},"value":0.0},{"currency":{"address":"example5","symbol":"PAUL"},"value":13426.64}}}
I will always need to find the number shown in 'value' after DAN. However, all the surrounding data will change, so it cannot be used in the search formula.
Any help would be appreciated.
To extract the number you want, you can combine REGEXEXTRACT, SPLIT and INDEX. Here is the formula; accept if it helps :)
=index(split(REGEXEXTRACT(A1,"\""DAN\""},\""value\"":[\d.a-zA-Z-]+"),":"),0,2)
This is the regex I used to extract the value including the beginning text
"DAN"},"value":[\d.a-zA-Z-]+
The regex returns "DAN"},"value":7.4e-07; SPLIT on ":" separates off the value, and INDEX(..., 0, 2) picks the second column.
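The REGEXEXTRACT + SPLIT + INDEX idea translates directly. A minimal Python sketch, assuming the text is a single string like the extract in the question (which is truncated, so a JSON parser would reject it); `value_after_symbol` is a hypothetical helper.

```python
import re

# The extract from the question; it is truncated JSON, so it is searched as text.
data = (
    '{"data":{"log":{"address":[{"balances":['
    '{"currency":{"address":"example1","symbol":"ROB"},"value":0.0},'
    '{"currency":{"address":"example2","symbol":"DAN"},"value":7.4e-07},'
    '{"currency":{"address":"example3","symbol":"COLIN"},"value":0.0},'
    '{"currency":{"address":"example4","symbol":"BOB"},"value":0.0},'
    '{"currency":{"address":"example5","symbol":"PAUL"},"value":13426.64}}}'
)

def value_after_symbol(text, symbol):
    # Same shape as the answer's regex: "SYMBOL"},"value": followed by number characters.
    m = re.search(r'"' + re.escape(symbol) + r'"\},"value":[\d.a-zA-Z-]+', text)
    if m is None:
        return None
    # Like SPLIT(..., ":") + INDEX(..., 0, 2): take the piece after the colon.
    return float(m.group(0).split(":")[1])

print(value_after_symbol(data, "DAN"))  # 7.4e-07
```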
You could try an arrayformula to work down the sheet, extracting all values after 'DAN' (the eleven dots match the "},"value": text between DAN and the number):
=arrayformula(regexreplace(A1:A,".*(DAN...........)([\w\.\-]*)(\}.*)","$2"))
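The same capture-group idea, applied row by row, might look like this in Python. `dan_values` is a hypothetical helper, and Python writes the backreference as `\2` where REGEXREPLACE uses `$2`.

```python
import re

# The answer's pattern: DAN plus 11 arbitrary characters ("},"value":), then the
# number captured in group 2, then the closing brace and the rest of the text.
PATTERN = re.compile(r'.*(DAN...........)([\w\.\-]*)(\}.*)')

def dan_values(rows):
    # Like ARRAYFORMULA(REGEXREPLACE(A1:A, ..., "$2")): each matching row is
    # replaced by group 2; rows with no match come back unchanged.
    return [PATTERN.sub(r'\2', row) for row in rows]

rows = [
    '{"currency":{"address":"example2","symbol":"DAN"},"value":7.4e-07},'
    '{"currency":{"address":"example3","symbol":"COLIN"},"value":0.0}',
    'no symbol here',
]
print(dan_values(rows))  # ['7.4e-07', 'no symbol here']
```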
I have a Unicode string UniStr.
I also have a MAP of { UniCodeChar : otherMappedStrs }.
I need the 'otherMappedStrs' version of UniStr.
E.g.: UniStr = 'ABC', MAP = { 'A':'233','B':'#$','C':'9ij' }, Result = '233#$9ij'
I have come up with the formula below, which works:
=ArrayFormula(JOIN("",VLOOKUP(REGEXEXTRACT(A1,REPT("(.)",LEN(A1))),MapRange,2,FALSE)))
The MAP is a whole character set (40 chars), so it is quite large.
I need to use this function in multiple spreadsheets. How can I embed the MAP in the formula for portability?
Is there a better way to iterate over a string in a formula than the REGEXEXTRACT method? That method has a limitation for long strings.
I also tested the formula below. The problem here is that it gives 2 results (one per element of the arrays passed to SUBSTITUTE); with 3 substitutions it gives three results. Can this be resolved?
=ArrayFormula(SUBSTITUTE(A1,{"s","i"},{"#","#"}))
EDIT:
@Tom's first solution appears best for my case: (1) REGEX has an upper limit on search criteria, which does not affect your solution; (2) it feels fast (I did not do empirical testing); (3) I believe it is a better way to iterate over string characters (you answered my Q2, thanks).
I digress here, but I wish Google would introduce named formulas or formula aliases; in this case, something like the hypothetical below. I have sent feedback along those lines many times. Nothing :(
MyFormula($str) == ArrayFormula(join(,vlookup(mid($str,row(indirect("1:"&len($str))),1), { "A","233";"B","#$";"C","9ij" },2,false)))
Not sure how long you want your strings to be, but the more traditional
=ArrayFormula(join(,vlookup(mid(A1,row(indirect("1:"&len(A1))),1), { "A","233";"B","#$";"C","9ij" },2,false)))
seems a bit more robust for long strings.
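The MID/ROW/INDIRECT trick just visits each character in turn. A minimal Python sketch of the same lookup-and-join, with a hypothetical `CHAR_MAP` holding only the three pairs from the question's example rather than the full 40-character set:

```python
# Hypothetical map holding only the question's three example pairs.
CHAR_MAP = {"A": "233", "B": "#$", "C": "9ij"}

def map_string(uni_str, char_map):
    # Visit each character in order (what MID(A1, ROW(INDIRECT("1:"&LEN(A1))), 1)
    # produces), look it up (VLOOKUP), and concatenate the results (JOIN).
    return "".join(char_map[ch] for ch in uni_str)

print(map_string("ABC", CHAR_MAP))  # 233#$9ij
```

Note that a Python dict lookup is case-sensitive, unlike VLOOKUP, and raises KeyError for an unmapped character where VLOOKUP would return #N/A.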
For a more radical idea, supposing the maximum length of your otherMappedStrs is 3 characters, you could try:
=ArrayFormula(join(,trim(mid("233 #$9ij",find(mid(A1,row(indirect("1:"&len(A1))),1), "ABC")*3-2,3))))
where I have put a space in before #$ to pad it out to 3 characters.
Incidentally, the original VLOOKUP is not case-sensitive, whereas FIND is. If you want the VLOOKUP behaviour, use SEARCH instead of FIND.
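The fixed-width idea works because entry i of the padded string starts at 1-based position 3i - 2, which is what FIND(...)*3-2 computes. A minimal Python sketch using the same padded string and key order; `PACKED`, `KEYS`, `WIDTH` and `map_string_packed` are hypothetical names:

```python
PACKED = "233 #$9ij"  # each mapped string padded to WIDTH characters
KEYS = "ABC"          # KEYS[i] maps to PACKED[i*WIDTH:(i+1)*WIDTH]
WIDTH = 3

def map_string_packed(uni_str):
    out = []
    for ch in uni_str:
        pos = KEYS.find(ch)  # 0-based and case-sensitive, like FIND; -1 if absent
        if pos == -1:
            raise ValueError(f"no mapping for {ch!r}")
        # pos*WIDTH is the 0-based form of the formula's FIND(...)*3-2;
        # strip() removes the padding, as TRIM does in the formula.
        out.append(PACKED[pos * WIDTH:(pos + 1) * WIDTH].strip())
    return "".join(out)

print(map_string_packed("ABC"))  # 233#$9ij
```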
You seem to have several different Qs, but considering only portability, perhaps something like the following would help:
=join(,switch(arrayformula(regexextract(A1&"",rept("(.)",len(A1)))),"A",233,"B","#$","C","9ij"))
extended with 37 more pairs.
I would like to match strings/characters that are not surrounded by a well-defined string-wrapper. In this case the wrapper is '#L#' on the left of the string and '#R#' on the right of the string.
With the following string for example:
This is a #L#string#R# and it's #L#good or ok#R# to change characters in the next string
I would like to be able to search for any sequence of characters and change the matches on a case-by-case basis. For example:
Searching for "in" would match twice: the word 'in', and the 'in' contained within the last word, 'string'.
Searching for "g" should find it within the word 'change' and in the final word 'string' (but not in the first occurrence of 'string', which is inside the wrapper).
I'm somewhat familiar with how lookahead works, in the sense that it checks for a match without including the matched text in the result.
Unfortunately, I can't get my head around how to do it.
I've also been playing with this at http://regexpal.com/ but can't seem to find anything that works. Examples I've found for iOS are problematic, so perhaps the JavaScript tester is a tiny bit different.
I took some guidance from a previous question I asked, which seemed to be almost the same but sufficiently different to mean I couldn't work out how to reuse it:
Replacing 'non-tagged' content in a web page
Any ideas?
First match all the #L# to #R# blocks, then use the alternation operator | to match the string in from the remaining text. To tell the two kinds of match apart, put in inside a capturing group: only matches where group 1 is set are occurrences outside the wrappers.
#L#.*?#R#|(in)
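A minimal Python sketch of the alternation approach, assuming Python's `re` (which, like JavaScript, supports the non-greedy `.*?` used here). `replace_outside_wrappers` is a hypothetical helper that keeps the wrapped blocks unchanged and replaces only the captured occurrences:

```python
import re

TEXT = ("This is a #L#string#R# and it's #L#good or ok#R# "
        "to change characters in the next string")

def replace_outside_wrappers(text, needle, replacement):
    # Branch 1 consumes a whole #L#...#R# block (non-greedy, so it stops at the
    # first #R#); branch 2 captures the needle in group 1. Because blocks are
    # consumed whole, group 1 is only ever set for occurrences outside them.
    pattern = re.compile(r"#L#.*?#R#|(" + re.escape(needle) + r")")

    def repl(m):
        # Put wrapped blocks back unchanged; swap in the replacement for the needle.
        return replacement if m.group(1) is not None else m.group(0)

    return pattern.sub(repl, text)

print(replace_outside_wrappers(TEXT, "in", "IN"))
print(replace_outside_wrappers(TEXT, "g", "G"))
```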
OR
Use a negative lookahead assertion. This matches the substring in only if it is not followed by a run of characters that contains no #L# or #R# and then ends in #R#. In other words, an in is skipped when the next tag after it is a closing #R#, so this matches every in that is not inside a #L#...#R# block.
in(?!(?:(?!#[RL]#).)*#R#)
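The lookahead version needs no callback, since the pattern itself only matches outside the wrappers. A minimal Python sketch (the helper name `outside_pattern` is hypothetical):

```python
import re

TEXT = ("This is a #L#string#R# and it's #L#good or ok#R# "
        "to change characters in the next string")

def outside_pattern(needle):
    # The needle, followed by a negative lookahead that fails the match when the
    # first #L#/#R# tag after the needle is a closing #R#, i.e. when the needle
    # sits inside a #L#...#R# block.
    return re.compile(re.escape(needle) + r"(?!(?:(?!#[RL]#).)*#R#)")

print(outside_pattern("in").findall(TEXT))  # ['in', 'in']
print(outside_pattern("in").sub("IN", TEXT))
```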
I have a problem that requires me to write a regex that finds a line containing exactly 3 groups of characters (they could be words or numbers) that ends with another specific word. The way I had in mind was to find a pattern that ends in a space, and look for it 3 times. Assuming this is the correct way to go about it, I do not know how to match a space, but I thought it would look like .*"find a space"{3} endword$. Is this the way it would be done? Even if it is not, how do you match a space? Any suggestions?
Assuming that by three groups you would accept any run of non-space characters, you could write:
/^\s*(?:\S+\s+){3}endword$/
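The initial caret anchors the match at the start of the line, so there are exactly 3 non-space groups before endword. To answer the side question: \s matches a space (or any other whitespace character, such as a tab), and \S matches any non-whitespace character.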
Of course you need to consider whether things like control characters could appear, and adjust accordingly.
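A quick Python check of the pattern, assuming each line is tested on its own so that ^ and $ refer to that line; `matching_lines` is a hypothetical helper:

```python
import re

# Python spelling of the answer's /^\s*(?:\S+\s+){3}endword$/
THREE_THEN_END = re.compile(r"^\s*(?:\S+\s+){3}endword$")

def matching_lines(text):
    # Test each line separately, so ^ and $ anchor to that line's start and end.
    return [line for line in text.splitlines() if THREE_THEN_END.search(line)]

sample = "\n".join([
    "one two three endword",  # three groups: matches
    "one two endword",        # only two groups
    "a b c d endword",        # four groups
    "  x1 y2 z3 endword",     # leading whitespace is allowed by the leading \s*
    "foo bar baz endwordx",   # the line must end right after endword
])
print(matching_lines(sample))  # ['one two three endword', '  x1 y2 z3 endword']
```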
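Depending on your flavor, something like the below would do it:
^\W*(?:\w+\W+){3}endword$
This counts runs of word characters (\w+) separated by non-word characters (\W+), so each group already ends at a word boundary; \b itself is zero-width and cannot be quantified (\b+ is rejected as "nothing to repeat" in JavaScript and Python). Because \w excludes punctuation, a token like 7.4 counts as two groups, and the details may differ in your specific implementation, especially if you're using something old like grep.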