Fail to concat a few substring parts using regex - google-sheets

I have this string:
<IMG SRC="https://ad.net/ddm/trackimp/N347.15BE.COM/B24.28;dc_trk_aid=48;dc_trk_cid=141;ord=%%TS%%;dc_lat=;dc_rdid=;ltd=?" BORDER="0" HEIGHT="1" WIDTH="1" ALT="Advertisement">
which I want to turn into this string:
https://ad.net/ddm/trackimp/N347.15BE.COM/B24.28;dc_trk_aid=48;dc_trk_cid=141;ord=[ts];dc_lat=;dc_rdid=;ltd=?"
I have tried this formula, but it doesn't work as I expect:
=REGEXREPLACE(K36,"(.+)(http.*?_ord=)(.+)(;ltd=?)(.+)", "$2[ts]$4")

This is a modification of Aresvik's answer. You could get [ts] by using lower, as Aresvik mentioned. But if you want the other parts left as is (not lowercased), combine it with regexextract and lowercase only the text between the pair of %%.
This replaces [$2] with "["&lower(regexextract(K36,"\%\%(.*)\%\%"))&"]"
Modified Formula:
=regexreplace(regexreplace(K36,"<IMG SRC=""(.*)\ BORDER=.*","$1"),"(\%\%)(.*)(\%\%)","["&lower(regexextract(K36,"\%\%(.*)\%\%"))&"]")
If you want the trailing " removed, use this formula instead:
=regexreplace(regexreplace(K36,"<IMG SRC=""(.*)"" BORDER=.*","$1"),"(\%\%)(.*)(\%\%)","["&lower(regexextract(K36,"\%\%(.*)\%\%"))&"]")

For the first bit, try:
=regexreplace(K36,"<IMG SRC=""(.*)\ BORDER=.*","$1")
For the %%TS%% try:
=regexreplace(regexreplace(K36,"<IMG SRC=""(.*)\ BORDER=.*","$1"),"(\%\%)(.*)(\%\%)","[$2]")
I'm not sure whether [TS] is OK or whether you want [ts]. You could wrap lower() around everything, though that lowercases the rest of the URL as well.
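To sanity-check the two regex steps outside of Sheets, here is a minimal sketch in Ruby. It is not part of either answer: the method name tracking_url is made up for illustration, and it assumes the SRC attribute is immediately followed by " BORDER= as in the sample string.

```ruby
# Hypothetical helper (not from the answers above): pull the URL out of the
# IMG tag, then turn the %%TS%% placeholder into a lowercased token like [ts].
def tracking_url(tag)
  # Step 1: capture everything between SRC=" and the quote before BORDER=.
  url = tag[/<IMG SRC="(.*)" BORDER=/, 1]
  return nil if url.nil? # the tag did not have the expected shape

  # Step 2: replace %%NAME%% with [name]; only the placeholder is lowercased,
  # the rest of the URL keeps its original case.
  url.sub(/%%(.*?)%%/) { "[#{Regexp.last_match(1).downcase}]" }
end

tag = '<IMG SRC="https://ad.net/ddm/trackimp/N347.15BE.COM/B24.28;dc_trk_aid=48;dc_trk_cid=141;ord=%%TS%%;dc_lat=;dc_rdid=;ltd=?" BORDER="0" HEIGHT="1" WIDTH="1" ALT="Advertisement">'
puts tracking_url(tag)
```

This mirrors the variant that drops the trailing quote. Sheets uses RE2 and Ruby uses Onigmo, but the constructs used here behave the same in both.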

Grep with BBEdit

I have one large text file with a lot of the following patterns:
because of,this
or that,has
or,not
Of course I want to change the following
because of, this
or that, has
or, not
To make myself clear: I would like to insert a space after each ,
How can I do that with BBEdit Find/Replace/Grep?
Find works ok with
[\,](\w)
but I can't figure out the corresponding part for replace.
Find: (\,)(\w)
Replace: \1 \2
Note: You need to hit the spacebar between \1 and \2. Works on my computer with BBEdit v13.
The capture group complicates matters.
A pattern like Letter,Letter can also be found with ,\b.
\b in a regular expression means "word boundary", that is, the beginning or end of a word.
The replacement is then a comma followed by a single space.
You could just replace all commas with a comma-space and then replace all space-space with space.
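For comparison, the two grep approaches above can be sketched in Ruby, whose regex syntax for these constructs matches BBEdit's. This is only an illustration under that assumption; the method names are made up, and BBEdit itself is driven from its Find dialog rather than from code.

```ruby
# Both substitutions insert a space after a comma that is directly followed
# by a word character; commas already followed by a space are left alone.

# Hypothetical name: capture the character after the comma and put it back.
def space_after_commas_capture(text)
  text.gsub(/,(\w)/, ', \1')
end

# Hypothetical name: ,\b matches a comma that sits right before a word
# character, so nothing needs to be captured.
def space_after_commas_boundary(text)
  text.gsub(/,\b/, ', ')
end

sample = "because of,this\nor that,has\nor,not"
puts space_after_commas_capture(sample)
puts space_after_commas_boundary(sample)
```

Both calls print the three lines with a space after each comma.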

Rails strip all except numbers commas and decimal points

Hi I've been struggling with this for the last hour and am no closer. How exactly do I strip everything except numbers, commas and decimal points from a rails string? The closest I have so far is:-
rate = rate.gsub!(/[^0-9]/i, '')
This strips everything but the numbers. When I try to add commas to the expression, everything gets stripped. I got the above from somewhere else, and as far as I can gather:
^ = not
Everything to the left of the comma gets replaced by what's in the '' on the right
No idea what the /i does
I'm very new to gsub. Does anyone know of a good tutorial on building expressions?
Thanks
Try:
rate = rate.gsub(/[^0-9,\.]/, '')
Basically, ^ means "not" when it is the first character inside the character class brackets [], so you can just add the comma and the period to the list. Outside a character class the period is a special character that means "match any character" and has to be escaped with a backslash; inside the brackets it is already literal, so the backslash is harmless but optional.
Also, be aware of whether you are using gsub or gsub!
gsub! has the bang, so it edits the instance of the string you're passing in, rather than returning another one. It also returns nil when nothing was replaced, so don't rely on its return value.
So if using gsub! it would be:
rate.gsub!(/[^0-9,\.]/, '')
And rate would be altered.
If you do not want to alter the original variable, then you can use the version without the bang (and assign it to a different var):
cleaned_rate = rate.gsub(/[^0-9,\.]/, '')
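A small sketch of the difference between the two methods, including the nil that gsub! returns when there was nothing to strip. The constant and method names (NON_NUMERIC, clean_rate) and the sample strings are made up for illustration.

```ruby
# Hypothetical constant: matches every character that is NOT a digit,
# a comma or a period.
NON_NUMERIC = /[^0-9,.]/

# Hypothetical helper: non-destructive, returns a new string.
def clean_rate(rate)
  rate.gsub(NON_NUMERIC, '')
end

original = '$1,234.56 per hour'
cleaned = clean_rate(original)
puts cleaned   # the stripped copy
puts original  # still the full original text

# gsub! changes the receiver in place.
mutable = original.dup
mutable.gsub!(NON_NUMERIC, '')
puts mutable

# When nothing matches, gsub! returns nil, so `x = x.gsub!(...)` can
# silently turn x into nil.
already_clean = '42'.dup
p already_clean.gsub!(NON_NUMERIC, '')
```

This is why assigning the result of gsub! back to the same variable, as in the question, is risky.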
I'd just google for tutorials. I haven't used one. Regexes are a LOT of time and trial and error (and table-flipping).
This is a cool tool to use with a mini cheat-sheet on it for ruby that allows you to quickly edit and test your expression:
http://rubular.com/
You can just add the comma and period in the square-bracketed expression:
rate.gsub(/[^0-9,.]/, '')
You don't need the i for case-insensitivity for numbers and symbols.
There's lots of info on regular expressions, regex, etc. Maybe search for those instead of gsub.
You can use this:
rate = rate.gsub(/[^0-9.,]/, '')
Also check this out to learn more about regular expressions:
http://www.regexr.com/

Isolating/removing Characters from string using rails

I am using ruby on rails
I have
article.id = 509969989168Q000475601
I would like the output to be
article.id = 68Q000475601
Basically, I want to get rid of everything before the 68Q.
The numbers in front of the 68Q can be of various lengths.
Is there a way to remove everything up to "68Q"?
It will always be 68Q, and Q is always the only letter.
Or is there a way to say: remove all characters up to the 2 digits before "Q"?
I'd use:
article.id[/68Q.*/]
Which will return everything from 68Q to the end of the string.
article.id.match(/68Q.+\z/)[0]
You can do this easily with the split method:
'68Q' + article.id.split('68Q')[1]
This splits the string into an array based on the delimiter you give it, then takes the second element of that array. For what it's worth though, @theTinMan's solution is far more elegant.
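The approaches above can be compared side by side in a short sketch. It assumes the id is available as a String (a numeric Rails primary key would not contain a Q); the method names are made up, and the \d{2}Q variant is one possible reading of the "2 digits before Q" part of the question.

```ruby
# Hypothetical helper: everything from the literal 68Q onwards, or nil if
# 68Q is absent.
def from_68q(id)
  id[/68Q.*/]
end

# Hypothetical helper: everything from the two digits that precede the
# first Q, whatever those digits are.
def from_two_digits_before_q(id)
  id[/\d{2}Q.*/]
end

# Hypothetical helper: split-based variant that returns nil instead of
# raising when 68Q is missing.
def from_68q_split(id)
  _head, tail = id.split('68Q', 2)
  tail.nil? ? nil : '68Q' + tail
end

id = '509969989168Q000475601'
puts from_68q(id)
puts from_two_digits_before_q(id)
puts from_68q_split(id)
```

All three print 68Q000475601 for the sample id. Note that match(...)[0] raises NoMethodError when there is no match, whereas the slice form simply returns nil.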

Regex in Ruby not working

I have a string from which I want to extract a certain part:
Original String: /abc/d7_t/g-12/jkl/m-n3/pqr/stu/vwx
Result Desired: /abc/d7_t/g-12/jkl/
The length of the string can vary. It contains letters, numbers, underscores and hyphens. I basically want to cut the string after the 5th "/".
I tried a few regex, but it seems there is some mistake with the format.
If a non-regexp approach is acceptable, how about this:
s.split('/').take(n).join('/')+'/'
Where s is your string (in your case: /abc/d7_t/g-12/jkl/m-n3/pqr/stu/vwx) and n is the number of slashes to keep. Wrapped in a method:
def cut_after(s, n)
  s.split('/').take(n).join('/') + '/'
end
Then
cut_after("/abc/d7_t/g-12/jkl/m-n3/pqr/stu/vwx", 5)
should work. Not as compact as a regexp, but some people may find it clearer.
The regexp would be: %r(/(?:[^/]+/){4}). Note that it is a good idea in this case to use the %r literal version to avoid escaping slashes. Unescaped slashes are likely the cause of your format errors.
Match a "/" followed by any sequence of characters other than "/", four times, and then the closing "/":
(\/[^\/]+){4}\/
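As a sketch of how the regex approach could be wrapped up, here is a hypothetical helper (cut_after_regex is not from any of the answers) that builds the pattern for an arbitrary slash count. It assumes the string starts with "/" and anchors the match at the start of the string.

```ruby
# Hypothetical helper: return the prefix of s up to and including the n-th
# "/", or nil when s has fewer than n slash-terminated segments.
def cut_after_regex(s, n)
  # The leading "/" counts as the first slash, so n - 1 "segment/" groups
  # follow it.
  pattern = Regexp.new("\\A/(?:[^/]+/){#{n - 1}}")
  s[pattern]
end

path = '/abc/d7_t/g-12/jkl/m-n3/pqr/stu/vwx'
puts cut_after_regex(path, 5)
p cut_after_regex('/a/b', 5)
```

For the sample path this prints /abc/d7_t/g-12/jkl/. Unlike the split-based version, which returns whatever segments exist plus a trailing slash, this one returns nil for a path that is too short.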

Regular expression in Ruby

Could anybody help me make a proper regular expression from a bunch of text in Ruby. I tried a lot but I don't know how to handle variable length titles.
The string will be of format <sometext>title:"<actual_title>"<sometext>. I want to extract actual_title from this string.
I tried /title:"."/ but it doesn't find any matches, since it expects the closing quotation mark exactly one character after the opening one. I couldn't figure out how to make it match a string of variable length. Any help is appreciated. Thanks.
. matches any single character. Putting + after a character will match one or more of those characters. So .+ will match one or more characters of any sort. Also, you should put a question mark after it so that it matches the first closing-quotation mark it comes across. So:
/title:"(.+?)"/
The parentheses are necessary if you want to extract the title text that it matched out of there.
/title:"([^"]*)"/
The parentheses create a capturing group. Inside is first a character class. The ^ means it's negated, so it matches any character that's not a ". The * means 0 or more. You can change it to one or more by using + instead of *.
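A minimal sketch comparing the lazy pattern, its greedy counterpart and the negated character class. The method names and sample strings are invented for illustration.

```ruby
# Hypothetical helpers; each returns the first capture group or nil.
def title_lazy(text)
  text[/title:"(.+?)"/, 1]
end

def title_greedy(text)
  text[/title:"(.+)"/, 1]
end

def title_negated(text)
  text[/title:"([^"]*)"/, 1]
end

line = 'id:7 title:"My Title" author:"Someone"'
puts title_lazy(line)     # stops at the first closing quote
puts title_greedy(line)   # runs on to the last quote on the line
puts title_negated(line)  # also stops at the first closing quote

# . does not match a newline by default, while [^"] does.
wrapped = "title:\"line one\nline two\""
p title_lazy(wrapped)
p title_negated(wrapped)
```

On the wrapped string the lazy version returns nil, while the negated class returns both lines.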
I like /title:"(.+?)"/ because of its use of lazy matching to stop the .+ consuming all text until the last " on the line is found.
It won't work if the string wraps lines or includes escaped quotes.
In programming languages where you want to be able to include the string delimiter inside a string, you usually provide an 'escape' character or sequence.
If your escape character was \ then you could write something like this...
/title:"((?:\\"|[^"])+)"/
A railroad diagram of this expression shows the order in which things are parsed. Imagine you are a train starting at the left: you consume title:" and then \" if you can; if you can't, you consume any character that is not a ". The preferred path is to loop and try again; when you can't, you have to consume a " to finish.
I made the diagram with https://regexper.com/#%2Ftitle%3A%22((%3F%3A%5C%5C%22%7C%5B%5E%22%5D)%2B)%22%2F
There is now a plugin for the Atom text editor that does this too.
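To see the escape-aware pattern at work, here is a small sketch assuming \ is the escape character, as in the answer above. The method names and the sample string are made up.

```ruby
# Hypothetical helper: capture the title, treating \" as part of the title
# rather than as the closing quote.
def title_with_escapes(text)
  text[/title:"((?:\\"|[^"])+)"/, 1]
end

# Hypothetical helper: the plain lazy pattern, for comparison.
def title_lazy_plain(text)
  text[/title:"(.+?)"/, 1]
end

# In a single-quoted Ruby literal, \" is a backslash followed by a quote.
sample = 'x title:"say \"hi\" now" y'
puts title_with_escapes(sample)
puts title_lazy_plain(sample)
```

The escape-aware version returns the whole title including the two escaped quotes, while the lazy version stops at the first quote and returns only say followed by a backslash.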
