Removing lines that begin with > in a rails string - ruby-on-rails

I'm trying to remove any lines that begin with the character '>' in a long string (i.e. replies to an email).
In PHP I'd iterate over each line with an if statement; on Linux I'd try sed or awk.
What's the most elegant Rails approach?

You can try this:
your_string.gsub(/^\>.+\n/,'')

Your question is implying that the input is one string, containing multiple lines.
Do you want the output to be just one string with multiple lines as well? I'm assuming yes.
Either using String and Array operations:
str.lines.reject{|x| x =~ /^>/}.join # this will return a new string, without those ">" lines
or using Regular Expressions:
str.gsub(/^>.+\n*/, '')
Better Solution:
You will need to use non-greedy multi-line matching mode for your Regular Expression:
str.gsub(/^>.*?$\n*/m, '') # by using gsub!() you can modify the string in place
^> matches your ">" character at the start of a line
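.*?$ matches any characters after the start character until the end of the line (non-greedy)
\n* matches the newline character(s) that follow, if any (you want to remove those as well)
the "m" at the end of the regular expression turns on Ruby's multi-line mode, in which . also matches newline characters; that is why the match has to be non-greedy, so that it stops at the first end of line. In Ruby, ^ and $ match at the start and end of every line regardless of this flag.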

It should work as you expect:
your_string.lines.to_a.reject{|line| line[0] == '>'}.join
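
As a rough comparison of the approaches above, here is a minimal sketch in plain Ruby. The sample text and the method names strip_quoted_lines and strip_quoted_lines_regex are made up for illustration. The regex variant uses .* rather than .+ so that a line consisting of only ">" is removed as well.

```ruby
# Hypothetical sample input: an email body containing quoted reply lines.
EMAIL_BODY = "Thanks for the update.\n> old line one\n>\n> old line two\nSee you Monday.\n"

# Line-based approach: drop every line whose first character is '>'.
def strip_quoted_lines(text)
  text.lines.reject { |line| line.start_with?('>') }.join
end

# Regex approach: in Ruby, ^ matches at the start of every line, and
# . does not match a newline unless the /m flag is given.
def strip_quoted_lines_regex(text)
  text.gsub(/^>.*\n?/, '')
end

puts strip_quoted_lines(EMAIL_BODY)
puts strip_quoted_lines_regex(EMAIL_BODY) == strip_quoted_lines(EMAIL_BODY)
```

Both methods return a new string; the original is left untouched.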

Related

Ruby Convert string into underscore, avoid the "/" in the resulting string

I have a name spaced class..
"CommonCar::RedTrunk"
I need to convert it to an underscored string "common_car_red_trunk", but when I use
"CommonCar::RedTrunk".underscore, I get "common_car/red_trunk" instead.
Is there another method to accomplish what I need?
Solutions:
"CommonCar::RedTrunk".gsub(':', '').underscore
or:
"CommonCar::RedTrunk".sub('::', '').underscore
or:
"CommonCar::RedTrunk".tr(':', '').underscore
Alternate:
Or turn any of these around and do the underscore() first, followed by whatever method you want to use to replace "/" with "_".
Explanation:
While all of these methods look basically the same, there are subtle differences that can be very impactful.
In short:
gsub() – replaces every occurrence of the pattern, which can be a regex or (as here) a plain string that is matched literally; it therefore finds each ":" and replaces it with "".
sub() – works like gsub(), except that it only replaces the first occurrence (the "g" in gsub() means "global"). This is why "::" had to be used with that method; otherwise a single ":" would have been left behind. Keep in mind that this only works with a single level of namespacing: "CommonCar::RedTrunk::BigWheels" would be transformed to "CommonCarRedTrunk::BigWheels".
tr() – treats its string parameters as sets of single-character replacements. In this case, because we're only replacing a single character, it works identically to gsub(). However, if you wanted to replace "on" with "EX", for example, gsub("on", "EX") would produce "CommEXCar::RedTrunk" while tr("on", "EX") would produce "CEmmEXCar::RedTruXk".
Docs:
https://apidock.com/ruby/String/gsub
https://apidock.com/ruby/String/sub
https://apidock.com/ruby/String/tr
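
The differences between gsub(), sub() and tr() described above can be checked in plain Ruby without Rails. This is only a small sketch; the constant names are invented for the example, and underscore() is left out because it comes from ActiveSupport.

```ruby
NESTED = "CommonCar::RedTrunk::BigWheels"

GSUB_RESULT = NESTED.gsub(':', '')   # removes every ':'
SUB_RESULT  = NESTED.sub('::', '')   # removes only the first '::'
TR_RESULT   = NESTED.tr(':', '')     # an empty replacement set deletes the characters

# tr maps characters one-to-one (o -> E, n -> X);
# gsub replaces the whole substring "on".
TR_MAPPED   = "CommonCar::RedTrunk".tr('on', 'EX')
GSUB_MAPPED = "CommonCar::RedTrunk".gsub('on', 'EX')

puts GSUB_RESULT, SUB_RESULT, TR_RESULT, TR_MAPPED, GSUB_MAPPED
```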
This is a pure-Ruby solution.
r = /(?<=[a-z])(?=[A-Z])|::/
"CommonCar::RedTrunk".gsub(r, '_').downcase
#=> "common_car_red_trunk"
See (the first form of) String#gsub and String#downcase.
The regular expression can be made self-documenting by writing it in free-spacing mode:
r = /
(?<=[a-z]) # assert that the previous character is lower-case
(?=[A-Z]) # assert that the following character is upper-case
| # or
:: # match '::'
/x # free-spacing regex definition mode
(?<=[a-z]) is a positive lookbehind; (?=[A-Z]) is a positive lookahead.
Note that /(?<=[a-z])(?=[A-Z])/ matches an empty ("zero-width") string. r matches, for example, the empty string between 'Common' and 'Car', because it is preceded by a lower-case letter and followed by an upper-case letter.
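
One way to reuse the regular expression above is to wrap it in a small helper. The name underscore_namespace is hypothetical (it is not part of Ruby or Rails), and the sketch assumes plain CamelCase names: a run of capitals such as "HTTPServer" is not split, because the lookbehind requires a lower-case letter.

```ruby
# Hypothetical helper, not part of Ruby or Rails.
# Inserts '_' between a lower-case and an upper-case letter,
# replaces '::' with '_', then downcases the result.
def underscore_namespace(name)
  name.gsub(/(?<=[a-z])(?=[A-Z])|::/, '_').downcase
end

puts underscore_namespace("CommonCar::RedTrunk")
puts underscore_namespace("CommonCar::RedTrunk::BigWheels")
puts underscore_namespace("HTTPServer") # runs of capitals stay together
```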
I don't know Rails but I'm guessing you could write
"CommonCar::RedTrunk".delete(':').underscore

Validate name to have no tabs or backslashes - Rails [duplicate]

I need a regular expression able to match everything but a string starting with a specific pattern (specifically index.php and what follows, like index.php?id=2342343).
Regex: match everything but:
a string starting with a specific pattern (e.g. any - empty, too - string not starting with foo):
Lookahead-based solution for NFAs:
^(?!foo).*$
^(?!foo)
Negated character class based solution for regex engines not supporting lookarounds:
^(([^f].{2}|.[^o].|.{2}[^o]).*|.{0,2})$
^([^f].{2}|.[^o].|.{2}[^o])|^.{0,2}$
a string ending with a specific pattern (say, no world. at the end):
Lookbehind-based solution:
(?<!world\.)$
^.*(?<!world\.)$
Lookahead solution:
^(?!.*world\.$).*
^(?!.*world\.$)
POSIX workaround:
^(.*([^w].{5}|.[^o].{4}|.{2}[^r].{3}|.{3}[^l].{2}|.{4}[^d].|.{5}[^.])|.{0,5})$
([^w].{5}|.[^o].{4}|.{2}[^r].{3}|.{3}[^l].{2}|.{4}[^d].|.{5}[^.]$|^.{0,5})$
a string containing specific text (say, not match a string having foo):
Lookaround-based solution:
^(?!.*foo)
^(?!.*foo).*$
POSIX workaround:
Use the online regex generator at www.formauri.es/personal/pgimeno/misc/non-match-regex
a string containing specific character (say, avoid matching a string having a | symbol):
^[^|]*$
a string equal to some string (say, not equal to foo):
Lookaround-based:
^(?!foo$)
^(?!foo$).*$
POSIX:
^(.{0,2}|.{4,}|[^f]..|.[^o].|..[^o])$
a sequence of characters:
PCRE (match any text but cat): /cat(*SKIP)(*FAIL)|[^c]*(?:c(?!at)[^c]*)*/i or /cat(*SKIP)(*FAIL)|(?:(?!cat).)+/is
Other engines allowing lookarounds: (cat)|[^c]*(?:c(?!at)[^c]*)* (or (?s)(cat)|(?:(?!cat).)*, or (cat)|[^c]+(?:c(?!at)[^c]*)*|(?:c(?!at)[^c]*)+[^c]*) and then check with language means: if Group 1 matched, it is not what we need, else, grab the match value if not empty
a certain single character or a set of characters:
Use a negated character class: [^a-z]+ (any char other than a lowercase ASCII letter)
Matching any char(s) but |: [^|]+
Demo note: the newline \n is used inside negated character classes in demos to avoid match overflow to the neighboring line(s). They are not necessary when testing individual strings.
Anchor note: In many languages, use \A to define the unambiguous start of string, and \z (in Python, it is \Z, in JavaScript, $ is OK) to define the very end of the string.
Dot note: In many flavors (but not POSIX, TRE, TCL), . matches any char but a newline char. Make sure you use a corresponding DOTALL modifier (/s in PCRE/Boost/.NET/Python/Java and /m in Ruby) for the . to match any char including a newline.
Backslash note: In languages where you have to declare patterns with C strings allowing escape sequences (like \n for a newline), you need to double the backslashes escaping special characters so that the engine could treat them as literal characters (e.g. in Java, world\. will be declared as "world\\.", or use a character class: "world[.]"). Use raw string literals (Python r'\bworld\b'), C# verbatim string literals @"world\.", or slashy strings/regex literal notations like /world\./.
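
To make the anchor and dot notes concrete for Ruby, here is a short sketch. The sample strings and constant names are invented for the example.

```ruby
TEXT = "foo\nbar"

# ^ matches at the start of every line in Ruby; \A only at the start of the string.
LINE_ANCHOR   = TEXT.match?(/^bar/)
STRING_ANCHOR = TEXT.match?(/\Abar/)

# . skips newlines by default; the /m flag lets it match them.
DOT_DEFAULT = "a\nb".match?(/a.b/)
DOT_ALL     = "a\nb".match?(/a.b/m)

# \z is the very end of the string; \Z also allows one trailing newline.
END_STRICT = "foo\n".match?(/foo\z/)
END_LOOSE  = "foo\n".match?(/foo\Z/)

p [LINE_ANCHOR, STRING_ANCHOR, DOT_DEFAULT, DOT_ALL, END_STRICT, END_LOOSE]
```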
You could use a negative lookahead from the start, e.g., ^(?!foo).*$ shouldn't match anything starting with foo.
You can put a ^ in the beginning of a character set to match anything but those characters.
[^=]*
will match everything but =
Just match /^index\.php/, and then reject whatever matches it.
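
Both suggestions (a negative lookahead, or matching the prefix and rejecting the hits) can be sketched in Ruby as follows. The list of paths and the names PATHS, not_index_php? and without_index_php are made up for illustration.

```ruby
# Hypothetical sample inputs.
PATHS = ["index.php", "index.php?id=2342343", "index.html", "about.php", ""]

# Negative lookahead: succeeds only when the string does not start with index.php.
def not_index_php?(path)
  path.match?(/\A(?!index\.php)/)
end

# Alternative: match the unwanted prefix and reject whatever matches it.
def without_index_php(paths)
  paths.reject { |path| path.match?(/\Aindex\.php/) }
end

p PATHS.select { |path| not_index_php?(path) }
p without_index_php(PATHS)
```

The two approaches select the same strings; the second avoids lookarounds entirely, which helps with engines that do not support them.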
In Python:
>>> import re
>>> p='^(?!index\.php\?[0-9]+).*$'
>>> s1='index.php?12345'
>>> re.match(p,s1)
>>> s2='index.html?12345'
>>> re.match(p,s2)
<_sre.SRE_Match object at 0xb7d65fa8>
Came across this thread after a long search. I had this problem for multiple searches and replace of some occurrences. But the pattern I used was matching till the end. Example below
import re
text = "start![image]xxx(xx.png) yyy xx![image]xxx(xxx.png) end"
replaced_text = re.sub(r'!\[image\](.*)\(.*\.png\)', '*', text)
print(replaced_text)
gave
start* end
Basically, the regex was matching from the first ![image] to the last .png, swallowing the middle yyy
I used the method posted above (https://stackoverflow.com/a/17761124/429476, by Firish) to stop the match from running across occurrences. Here spaces are excluded from the match, since the words are separated by spaces.
replaced_text = re.sub(r'!\[image\]([^ ]*)\([^ ]*\.png\)', '*', text)
and got what I wanted
start* yyy xx* end
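
For comparison, the same greedy-match problem and fix can be reproduced in Ruby with String#gsub. The method names are invented for this sketch, and the third variant, using lazy quantifiers, is an alternative that does not come from the answer above.

```ruby
IMAGE_TEXT = "start![image]xxx(xx.png) yyy xx![image]xxx(xxx.png) end"

# Greedy .* runs from the first ![image] to the last .png)
def replace_greedy(text)
  text.gsub(/!\[image\](.*)\(.*\.png\)/, '*')
end

# Excluding spaces keeps each match inside one space-separated word.
def replace_no_spaces(text)
  text.gsub(/!\[image\][^ ]*\([^ ]*\.png\)/, '*')
end

# Alternative (an assumption, not from the answer): lazy quantifiers
# stop at the first possible ( and .png) instead of the last.
def replace_lazy(text)
  text.gsub(/!\[image\].*?\(.*?\.png\)/, '*')
end

puts replace_greedy(IMAGE_TEXT)
puts replace_no_spaces(IMAGE_TEXT)
puts replace_lazy(IMAGE_TEXT)
```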

string regex on ruby on rails

How can I make sure my string is in a format like this:
locker_number=3,email=ucup#gmail.com,mobile_phone=091332771331,firstname=ucup
I want the string to consist of "key=value" pairs separated by commas.
How can I write a regex to check this in Ruby?
This regex will find what you're after.
\w+=.*?(,|$)
If you want to capture each pairing use
(\w+)=(.*?)(?:,|$)
http://rubular.com/r/A2ernIzQkq
The \w+ is one or more occurrences of a word character: a-z, A-Z, 0-9, or an underscore. The .*? is everything until the first , or the end of the string ($). The pipe means "or", and the ?: tells the regex not to capture that part of the expression.
Per your comment it would be used in Ruby as such,
(/\w+=.*?(,|$)/ =~ my_string) == 0
You can use a regex like this:
\w+=.*?(,|$)
You can use this code:
"<your string>" =~ /\w+=.*?(,|$)/
What about something like this? It's picky about the last element not ending with ,. But it doesn't enforce the need for no commas in the key or no equals in the value.
'locker_number=3,email=ucup#gmail.com,mobile_phone=091332771331,firstname=ucup' =~ /^([^=]+=[^,]+,)*([^=]+=[^,]+)$/
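
The patterns above mostly check that a key=value pair occurs somewhere (or at the start), not that the whole string has that shape. A stricter sketch, under the assumption that keys are word characters and values contain neither "," nor "=", could look like this. The names KEY_VALUE_LIST, key_value_list? and parse_key_values are hypothetical.

```ruby
# Assumption: keys are \w+, values are non-empty and contain no ',' or '='.
KEY_VALUE_LIST = /\A\w+=[^,=]+(?:,\w+=[^,=]+)*\z/

# True only when the entire string is a comma-separated list of key=value pairs.
def key_value_list?(string)
  string.match?(KEY_VALUE_LIST)
end

# Splits a valid string into a Hash; returns nil for invalid input.
def parse_key_values(string)
  return nil unless key_value_list?(string)
  string.split(',').map { |pair| pair.split('=', 2) }.to_h
end

input = "locker_number=3,email=ucup#gmail.com,mobile_phone=091332771331,firstname=ucup"
p key_value_list?(input)
p parse_key_values(input)
p key_value_list?("locker_number=3,")
```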

Pattern match dropping new lines characters

How to extract the values from a csv like string dropping the new lines characters (\r\n or \n) with a pattern.
A line looks like:
1.1;2.2;Example, 3
Notice there are only 3 values and the separator is ;. The problem I'm having is to come up with a pattern that reads the values while dropping the new line characters (the file comes from a windows machine so it has \r\n, reading it from a linux and would like to be independent from the new line character used).
My simple example right now is:
s = "1.1;2.2;Example, 3\r\n";
p = "(.-);(.-);(.-)";
a, b, c = string.match(s, p);
print(c:byte(1, -1));
The two last characters printed by the code above are the \r\n.
The problem is that both \r and \n are detected by the %c and %s classes (control characters and space characters), as shown by this code:
s = "a\r";
print(s:match("%c"));
print(s:match("%s"));
print(s:match("%d"));
So, is it possible to leave the newline characters out of the match? (It should not be assumed that the last two characters will be newline characters.)
The third value may contain spaces, punctuation and alphanumeric characters, and since \r\n are detected as space characters, a pattern like "(.-);(.-);([%w%s%c]-).*" does not work.
Your pattern
p = "(.-);(.-);(.-)";
does not work: the third field is always empty because .- matches as little as possible. You need to anchor it at the end of the string, but then the third field will contain trailing newline chars:
p = "(.-);(.-);(.-)$";
So, just stop at the first trailing newline char. This also anchors the last match. Try this pattern instead:
p = "(.-);(.-);(.-)[\r\n]";
If trailing newline chars are optional, try this pattern:
p = "(.-);(.-);(.-)[\r\n]*$";
Without any Lua experience, I found a naive solution:
clean_CR = s:gsub("\r","");
clean_NL = clean_CR:gsub("\n","");
With POSIX regex syntax I'd use
^([^;]*);([^;]*);([^\n\r]*).*$
.. with "\n" and "\r" possibly included as "^M", "^#" (control/unicode characters) .. depending on your editor.

string format checking (with partly random string)

I would like to use regular expression to check if my string have the format like following:
mc_834faisd88979asdfas8897asff8790ds_oa_ids
mc_834fappsd58979asdfas8897asdf879ds_oa_ids
mc_834faispd8fs9asaas4897asdsaf879ds_oa_ids
mc_834faisd8dfa979asdfaspo97asf879ds_dv_ids
mc_834faisd111979asdfas88mp7asf879ds_dv_ids
mc_834fais00979asdfas8897asf87ggg9ds_dv_ids
The format is like mc_<random string>_oa_ids or mc_<random string>_dv_ids. How can I check if my string is in either of these two formats? And please explain the regular expression. Thank you.
That is, the string starts with mc_, ends with _oa_ids or _dv_ids, and has some random string in the middle.
P.S. The random string consists of alphabetic letters and numbers.
What I tried (I have no clue how to check the random string):
/^mc_834faisd88979asdfas8897asff8790ds$_os_ids/
Try this.
^mc_[0-9a-z]+_(dv|oa)_ids$
^ matches at the start of the line the regex pattern is applied to.
[0-9a-z] matches lower-case alphabetic and numeric chars.
+ means that there should be one or more chars from this set.
(dv|oa) matches dv or oa.
$ matches at the end of the string the regex pattern is applied to; it also matches before the very last line break if the string ends with a line break.
Give /\Amc_\w*_(oa|dv)_ids\z/ a try. \A is the beginning of the string, \z the end. \w* matches zero or more letters, digits and underscores, and (oa|dv) is either oa or dv.
A nice and simple way to test Ruby Regexps is Rubular; you might want to have a look at it.
This should work
/mc_834([a-z,0-9]*)_(oa|dv)_ids/g
Example: http://regexr.com?2v9q7
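
Putting the answers together, a check in Ruby might look like the sketch below. It assumes the random part consists of lower-case letters and digits only (as in the samples) and uses \A and \z so that a trailing line break is not accepted. MC_ID and mc_id? are hypothetical names.

```ruby
# Assumption: the middle part is one or more lower-case letters or digits.
MC_ID = /\Amc_[0-9a-z]+_(oa|dv)_ids\z/

# True when the whole string has the form mc_<random string>_oa_ids or ..._dv_ids.
def mc_id?(string)
  string.match?(MC_ID)
end

p mc_id?("mc_834faisd88979asdfas8897asff8790ds_oa_ids")
p mc_id?("mc_834fais00979asdfas8897asf87ggg9ds_dv_ids")
p mc_id?("mc_834faisd88979asdfas8897asff8790ds_os_ids")
```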
