I am trying to get rid of shortcodes inside a Google Sheet column. I have many items such as [spacer type="1" height="20"][spacer] or [FinalTilesGallery id="37"], and I would just like to remove them. Is there any simple way to do it?
Thanks !
For in-place replacement, the quick option would be to use the Find and Replace dialog (Ctrl + H) with Search Using Regular Expressions turned on, which is more powerful than your standard Find and Replace.
Find: \[.*?\] - Match anything within an open-bracket up to the very next close-bracket. This should work assuming you have no nested brackets, e.g. [[no][no]].
If you do have nested brackets, change this to \[[^\[\]]*\] and keep clicking Replace All until all the codes are gone.
Replace: Nothing.
Replace All. If you don't want to affect other sheets that may be in your document, make sure you select the right range to work with, too.
This just erases everything within the brackets.
If you want to erase any redundant spaces left by this, simply Find and Replace again (with Regular Expressions) on " +" (a space followed by a plus), which matches one or more spaces, and replace with " " (a single space).
E.g.:
string [] [] string2 -> string   string2 (three spaces) after the shortcode replacement.
After replacing spaces, it will become string string2.
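For readers who would rather script the same two-step cleanup outside the Find and Replace dialog, here is a minimal sketch in Python. It assumes the cell values have already been read into plain strings; the function name strip_shortcodes is hypothetical and not part of Google Sheets.

```python
import re

def strip_shortcodes(text: str) -> str:
    """Delete [bracketed] shortcodes, then collapse runs of spaces."""
    # Lazy match: from "[" up to the very next "]".
    without_codes = re.sub(r"\[.*?\]", "", text)
    # " +" matches one or more spaces; each run becomes a single space.
    return re.sub(r" +", " ", without_codes)

print(strip_shortcodes("string [] [] string2"))
```

The two re.sub calls correspond to the two Replace All passes described above.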
Let's say your original strings are in the range A2:A. Place the following into B2 of an otherwise completely empty Column B (or the second cell of any other empty column):
=ArrayFormula(IF(A2:A="",,TRIM(REGEXREPLACE(A2:A,"\[[^\[\]]+\]",""))))
I can't see your data, so I don't know what kind of information is between these shortcodes. If you find that this leaves you with concatenated pieces of data where there should be spaces between them, replace the above with this version:
=ArrayFormula(IF(A2:A="",,TRIM(REGEXREPLACE(SUBSTITUTE(SUBSTITUTE(A2:A,"["," ["),"]","] "),"\[[^\[\]]+\]",""))))
I can't teach regular expression language here. But I will note that, since square brackets have specific meaning within regex, your literal square brackets must be indicated with the escape character: the backslash.
Here is the regex expression alone:
\[[^\[\]]+\]
The opening \[ and the closing \], then, reference your actual opening and closing bracket sets. If we remove those, we have this left:
[^\[\]]+
Again, you see the escaped opening and closing square brackets, which I'll replace with the word these:
[^these]+
What remains there are opening and closing brackets with regex meaning, i.e., "anything in this group." And the circumflex symbol ^ as the first character within this set of square brackets means "anything except." The + symbol means "in any string length of one or more characters."
So that whole regex expression then reads: "A literal open square bracket, followed by one or more characters that are anything except square brackets, ending with a literal closing square bracket."
And we are REGEXREPLACE-ing any instance of that with "" (i.e., nothing).
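The same pattern can be tried outside Sheets. The sketch below is an illustration in Python, not the formula itself: it applies the negated-class regex repeatedly (so nested brackets disappear from the inside out, as suggested earlier for Replace All) and then approximates TRIM by collapsing spaces. The name remove_shortcodes is an assumption made for this example.

```python
import re

# Literal "[", one or more characters that are not brackets, literal "]".
SHORTCODE = re.compile(r"\[[^\[\]]+\]")

def remove_shortcodes(text: str) -> str:
    """Strip shortcodes, repeating until nothing changes (handles nesting)."""
    previous = None
    while previous != text:
        previous = text
        text = SHORTCODE.sub("", text)
    # Rough stand-in for TRIM: collapse space runs, drop leading/trailing spaces.
    return re.sub(r" +", " ", text).strip()

print(remove_shortcodes('Hello [spacer type="1" height="20"][spacer] world'))
print(remove_shortcodes("a [outer [inner] x] b"))
```

Because the pattern uses + rather than *, an empty pair of brackets ([]) is left untouched; switch to * if those should go too.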
I have a plain text file with one string per line. I'd like to identify any instances where a string contains a value outside of a restricted character set. In this particular instance, if the string contains any character outside of the set "[THADGRC.SMBN-WVKY]" I want to retain it and pass it along to a new file.
For example, let's say the original file "mystrings.txt" contained the following data:
THADGRC.SMBN-WVKY
YKVW-NBMS.CRGDHAT
THADGRC.SMBN-WVKYI
My intention is to retain only the third sequence, because it contains a character outside of the allowed set (I) in this case.
It doesn't matter how many times, or in what order, an allowed character is present - all I care about is if a character exists in that string outside of the allowed set.
Originally I tried:
cat mystrings.txt | grep -v [THADGRC\.SMBN-WVKY] > badstrings.txt
but of course the third string contains those allowed characters in addition to the non-allowed character, so this search ended up producing no "offending" strings.
Last thing: I'm not sure what characters outside of the allowed set might exist in this text file. It would be great to know ahead of time to just search for anything with an "I", but I don't actually know this ahead of time.
So the question: is there a way to use grep (or another tool, say awk?) to pass in a restricted list of characters, and flag any instances where a string contains any number of characters outside of that set?
Thanks for your consideration
I think that your problem is N-W. This doesn't match "N", "-" and "W", it matches a range from "N" to "W". You should move "-" to the end of the character class, or escape it. I suggest changing to:
grep '[^THADGRC.SMBNWVKY-]' mystrings.txt
Also, note that "." doesn't have to be escaped when it's inside a character class.
Your attempt says "remove any lines which contain one of these characters at least once". But you want "print any lines which contain at least one character not in this set."
(Also, quote your regular expressions, and lose the useless cat.)
grep '[^-THADGRC.SMBNWVKY]' mystrings.txt > badstrings.txt
I moved the dash to the beginning of the character class on the assumption that you want a literal dash, not the regex range N-W (i.e. N, O, P, Q, R, S, T, U, V, W).
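If grep is not available, the same negated character class works in other tools. The following is a rough Python sketch, assuming the lines have already been read without their trailing newlines (a newline would otherwise count as a disallowed character). The helper names offending_lines and disallowed_chars are made up for this example; the second one addresses the point that the stray characters are not known in advance.

```python
import re

# Leading "-" is a literal dash; "^" negates the whole class.
DISALLOWED = re.compile(r"[^-THADGRC.SMBNWVKY]")

def offending_lines(lines):
    """Return the lines containing at least one character outside the set."""
    return [line for line in lines if DISALLOWED.search(line)]

def disallowed_chars(lines):
    """Return the sorted set of unexpected characters found in the lines."""
    return sorted({ch for line in lines for ch in DISALLOWED.findall(line)})

lines = ["THADGRC.SMBN-WVKY", "YKVW-NBMS.CRGDHAT", "THADGRC.SMBN-WVKYI"]
print(offending_lines(lines))
print(disallowed_chars(lines))
```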
I have some strings, each containing a sentence, and I need to split each one into substrings of at most 40 characters.
But I don't want to split the sentence in the middle of a word.
I tried the .gsub function; it returns at most 40 characters and avoids cutting the string in the middle of a word, but it returns only the first occurrence.
sentence[0..40].gsub(/\s\w+$/,'')
I tried with split, but I can only select the first 40 characters, and it splits in the middle of a word...
sentence.split(...){40}
My string is "Sure, we will show ourselves only when we know the east door has been opened.".
The string output I want is
["Sure, we will show ourselves only when we","know the east door has been opened."]
Do you have a solution? Thanks
Your first attempt:
sentence[0..40].gsub(/\s\w+$/,'')
almost works, but it has one fatal flaw. You are splitting on the number of characters before cutting off the last word. This means you have no way of knowing whether the bit being trimmed off was a whole word, or a partial word.
Because of this, your code will always cut off the last word.
I would solve the problem as follows:
sentence[/\A.{0,39}[a-z]\b/mi]
\A is an anchor to fix the regex to the start of the string.
.{0,39}[a-z] matches on 1 to 40 characters, where the last character must be a letter. This is to prevent the last selected character from being punctuation or space. (Is that desired behaviour? Your question didn't really specify. Feel free to tweak/remove that [a-z] part, e.g. [a-z.] to match a full stop, if desired.)
\b is a word boundary look-around. It is a zero-width matcher, on beginning/end of words.
The /mi modifiers make the match case-insensitive (so [a-z] also covers A-Z) and multi-line (in Ruby, /m lets . match newlines as well).
One very minor note is that because this regex is matching 1 to 40 characters (rather than zero), it is possible to get a null result. (Although this is seemingly very unlikely, since you'd need a 1-word, 41+ letter string!!) To account for this edge case, call .to_s on the result if needed.
Update: Thank you for the improved edit to your question, providing a concrete example of an input/result. This makes it much clearer what you are asking for, as the original post was somewhat ambiguous.
You could solve this with something like the following:
sentence.scan(/.{0,39}[a-z.!?,;](?:\b|$)/mi)
String#scan returns an array of strings that match the pattern - so you can then re-join these strings to reconstruct the original.
Again, I have added a few more characters (!?,;) to the list of "final characters in the substring". Feel free to tweak this as desired.
(?:\b|$) means "either a word boundary, or the end of the line". This fixes the issue of the result not including the final . in the substrings. Note that I have used a non-capture group (?:) to prevent the result of scan from changing.
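To experiment with the same idea outside Ruby, here is a hedged Python sketch. It assumes re.findall behaves like String#scan for a pattern with no capturing groups, and uses re.S as the counterpart of Ruby's /m. The function name chunk_sentence and the final strip() call (which removes the space that starts every chunk after the first) are additions for this example.

```python
import re

# Up to 39 characters, then a letter or closing punctuation,
# then a word boundary or the end of the string.
CHUNK = re.compile(r".{0,39}[a-z.!?,;](?:\b|$)", re.I | re.S)

def chunk_sentence(sentence: str) -> list:
    """Split a sentence into pieces of at most 40 characters on word boundaries."""
    return [piece.strip() for piece in CHUNK.findall(sentence)]

sentence = ("Sure, we will show ourselves only when we know "
            "the east door has been opened.")
for piece in chunk_sentence(sentence):
    print(len(piece), piece)
```

Note that "Sure, we will show ourselves only when we" is 41 characters long, so with a strict 40-character limit the first chunk ends at "when".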
I have the following regex handy to match all the lines containing a console.log() or alert() call in any JavaScript file opened in an editor supporting PCRE.
^.*\b(console\.log|alert)\b.*$
But I encounter many files containing window.alert() lines for alerting important messages, I don't want to remove/replace them.
So the question is how to regex-match (with a single-line regex, without needing to run it repeatedly) all the lines containing console.log() or alert() but not containing the word window. Also, how do I escape round brackets (parentheses), which seem to be unescapable by \, to make them part of the string literal?
I tried the following regex, but in vain:
^.*\b(console\.log|alert)((?!window).)*\b.*$
You should use a negative lookahead, like this:
^(?!.*window\.).*\b(console\.log|alert)\b.*$
The negative lookahead will assert that it is impossible to match if the string window. is present anywhere on the line.
As for the parentheses, you can escape them with backslashes. However, because the pattern has a word boundary (\b) at that point, it will generally not match if you put the escaped parentheses directly before that \b, because parentheses are not word characters.
The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a "word boundary". This match is zero-length.
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
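The lookahead pattern can be checked quickly in Python, whose re module supports the same syntax for this case. This is only a sketch: the sample source text and the helper matching_lines are invented for illustration, and re.MULTILINE is used so that ^ and $ anchor at every line. The second pattern shows one way to include a literal opening parenthesis: put \( after the function name, in place of the trailing \b.

```python
import re

# Reject any line containing "window.", then require console.log or alert.
DEBUG_CALL = re.compile(
    r"^(?!.*window\.).*\b(console\.log|alert)\b.*$", re.MULTILINE)

# Variant with an escaped "(" directly after the function name.
DEBUG_CALL_PAREN = re.compile(
    r"^(?!.*window\.).*\b(?:console\.log|alert)\(.*$", re.MULTILINE)

def matching_lines(pattern, text):
    """Return every full line of text matched by the pattern."""
    return [m.group(0) for m in pattern.finditer(text)]

source = "\n".join([
    'console.log("debug");',
    'window.alert("Important!");',
    'alert("temporary");',
    'var x = 1;',
])

print(matching_lines(DEBUG_CALL, source))
print(matching_lines(DEBUG_CALL_PAREN, source))
```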
I have been coding a program in Lua that automatically formats IRC logs from a roleplay. In the roleplay logs there is a specific guideline for "Out of character" conversation, which we use double parentheses for. For example: ((<Things unrelated to roleplay go here>)). I have been trying to have my program remove text between double brackets (and including both brackets). The code is:
ofile = io.open("Output.txt", "w")
rfile = io.open("Input.txt", "r")
p = rfile:read("*all")
w = string.gsub(p, "%(%(.*?%)%)", "")
ofile:write(w)
The pattern here is "%(%(.*?%)%)". I've tried multiple variations of the pattern, all of them fruitless:
1. %(%(.*?%)%) --Wouldn't do anything.
2. %(%(.*%)%) --Would remove *everything* after the first OOC message.
Then, my friend told me that prepending the brackets with percentages wouldn't work, and that I had to use backslashes to 'escape' the parentheses.
3. \(\(.*\)\) --resulted in the output file being completely empty.
4. (\(\(.*\)\)) --Same result as above.
5. (\(\(.*?\)\) --would for some reason, remove large parts of the text for no apparent reason.
6. \(\(.*?\)\) --would just remove all the text except for the last line.
The short, absolute question:
What pattern would I need to use to remove all text between double parentheses, and remove the double parentheses themselves too?
Your friend is thinking of regular expressions. Lua patterns are similar, but different. % is the correct escape character.
Your pattern should be %(%(.-%)%). The - is similar to * in that it matches any number of the preceding sequence, but while * tries to match as many characters as it can (it's greedy), - matches the least amount of characters possible (it's non-greedy). It won't go overboard and match extra double-close-parenthesis.
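For comparison, the greedy versus non-greedy difference can be seen in a regex engine as well. The sketch below uses Python rather than Lua, purely as an illustration: in Python's re module the lazy quantifier is written .*? (the counterpart of Lua's .-), and parentheses are escaped with a backslash instead of %. The sample log line is made up.

```python
import re

# Lazy: stop at the first "))" after each "((". DOTALL lets "." span newlines.
OOC_LAZY = re.compile(r"\(\(.*?\)\)", re.DOTALL)
# Greedy: run on to the last "))" in the whole text.
OOC_GREEDY = re.compile(r"\(\(.*\)\)", re.DOTALL)

log = "Alice draws her sword. ((brb, phone)) Bob nods. ((ok)) The door opens."

lazy_result = OOC_LAZY.sub("", log)
greedy_result = OOC_GREEDY.sub("", log)

print(lazy_result)
print(greedy_result)
```

The greedy version also swallows "Bob nods.", which matches the "removes everything after the first OOC message" symptom described in the question.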
Could anybody help me write a proper regular expression to extract a value from a bunch of text in Ruby? I tried a lot, but I don't know how to handle variable-length titles.
The string will be of format <sometext>title:"<actual_title>"<sometext>. I want to extract actual_title from this string.
I tried /title:"."/ but it doesn't find any matches, because it expects the closing quotation mark exactly one character after the opening one. I couldn't figure out how to make it accept a variable-length string. Any help is appreciated. Thanks.
. matches any single character. Putting + after a character will match one or more of those characters. So .+ will match one or more characters of any sort. Also, you should put a question mark after it so that it matches the first closing-quotation mark it comes across. So:
/title:"(.+?)"/
The parentheses are necessary if you want to extract the title text that it matched out of there.
/title:"([^"]*)"/
The parentheses create a capturing group. Inside is first a character class. The ^ means it's negated, so it matches any character that's not a ". The * means 0 or more. You can change it to one or more by using + instead of *.
I like /title:"(.+?)"/ because of its use of lazy matching to stop the .+ from consuming all text until the last " on the line is found.
It won't work if the string wraps lines or includes escaped quotes.
In programming languages where you want to be able to include the string delimiter inside a string, you usually provide an 'escape' character or sequence.
If your escape character was \ then you could write something like this...
/title:"((?:\\"|[^"])+)"/
This is a railroad diagram. Railroad diagrams show you what order things are parsed... imagine you are a train starting at the left. You consume title:" then \" if you can.. if you can't then you consume not a ". The > means this path is preferred... so you try to loop... if you can't you have to consume a '"' to finish.
I made this with https://regexper.com/#%2Ftitle%3A%22((%3F%3A%5C%5C%22%7C%5B%5E%22%5D)%2B)%22%2F
but there is now a plugin for Atom text editor too that does this.
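The patterns from these answers can be compared side by side. Here is a small sketch in Python (the regex syntax is the same as in Ruby for these cases), assuming backslash is the escape character as in the last answer; the sample strings and the helper extract_title are invented for the example.

```python
import re

# Negated class: everything up to the next double quote.
SIMPLE = re.compile(r'title:"([^"]*)"')
# Escape-aware: prefer \" as a unit, otherwise any character that is not a quote.
ESCAPED = re.compile(r'title:"((?:\\"|[^"])+)"')

def extract_title(pattern, text):
    """Return the captured title, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None

plain = 'id:7 title:"Hello World" year:1999'
tricky = r'id:8 title:"The \"Best\" Movie" year:2001'

print(extract_title(SIMPLE, plain))
print(extract_title(SIMPLE, tricky))
print(extract_title(ESCAPED, tricky))
```

On the string with escaped quotes, the simple pattern stops at the first \" while the escape-aware one returns the whole title, backslashes included.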