Regex get from character to whitespace - ruby-on-rails

I'm trying to pull the username from a post in rails. I thought the best way to do this would be using regex and pull from the # to the next whitespace character which would give me the username.
E.g., in the string:
'#stackoverflow is good for help'
I would be able to pull from the # to the next whitespace character giving me the string 'stackoverflow'
My regex skills are a little lacking so any help would be appreciated.
Thanks.

You can use \S to match any non-whitespace character, for example:
(?<=#)\S*
Will match any sequence of zero or more non-whitespace characters which appear immediately after a # character. The (?<=…) creates a lookbehind assertion, so the # will not be included in the match.
Alternatively, you could use:
#(\S*)
This will match a #, followed by zero or more non-whitespace characters, captured in group 1.

How about this:
regex = /#(\S*)/
Here \S matches any non-whitespace character.
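Both patterns above can be tried in plain Ruby. This is a minimal sketch; the method names are made up for illustration, and it assumes the username runs up to the next whitespace character, as in the question.

```ruby
# Lookbehind version: the match itself excludes the '#'.
def username_via_lookbehind(post)
  post[/(?<=#)\S*/]
end

# Capture-group version: the second argument asks for group 1.
def username_via_capture(post)
  post[/#(\S*)/, 1]
end

# scan collects every mention; with one capture group it returns
# an array of one-element arrays, hence the flatten.
def all_usernames(post)
  post.scan(/#(\S+)/).flatten
end

post = '#stackoverflow is good for help'
puts username_via_lookbehind(post)
puts username_via_capture(post)
p all_usernames('hi #alice and #bob')
```

Note that \S also accepts punctuation, so a trailing comma or period would end up in the captured name.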

Related

How can I construct a regular expression to account for non-consecutive characters?

I'm currently using this regex for my names: \A^[a-zA-Z'.,\s-]*\z. However, I don't want an apostrophe, period, comma, whitespace character, or hyphen to be directly followed by another one of these. How can I do this?
The significant part would be (?:[a-zA-Z]|['.,\s-](?!['.,\s-])).
Meaning:
(?:
[a-zA-Z] # letters
| # or
['.,\s-] # any of these
(?!['.,\s-]) # but not followed by another of these
)
But, in this case:
Guedes, Washington
------^^----------
This would invalidate the name (the comma is followed by a space), so you may want to remove \s from the negative lookahead.
Hope it helps.
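Put together, the fragment above might be wrapped in anchors and a quantifier like this. It is a sketch only: the constant and method names are hypothetical, and the "relaxed" variant is just one reading of the suggestion to drop \s from the lookahead.

```ruby
# Strict: a separator may never be followed by another separator.
STRICT_NAME = /\A(?:[a-zA-Z]|['.,\s-](?!['.,\s-]))*\z/

# Relaxed: \s removed from the lookahead, so ", " is accepted.
# Side effect: two whitespace characters in a row are accepted too.
RELAXED_NAME = /\A(?:[a-zA-Z]|['.,\s-](?!['.,-]))*\z/

def strict_name?(name)
  name.match?(STRICT_NAME)
end

def relaxed_name?(name)
  name.match?(RELAXED_NAME)
end

p strict_name?("O'Neil")
p strict_name?("Mary--Ann")
p strict_name?("Guedes, Washington")
p relaxed_name?("Guedes, Washington")
```

Because of the * quantifier, both variants also accept the empty string; use + instead if at least one character is required.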
How about this (a string of letters, optionally ending with one of those terminator characters):
\A^[a-zA-Z]*['.,\s-]?\z

What does this pattern ^[%w-.]+$ mean in Lua?

Just came across this pattern, which I really don't understand:
^[%w-.]+$
And could you give me some examples to match this expression?
Valid in Lua, where %w is (almost) the equivalent of \w in other languages
^[%w-.]+$ means match a string that is entirely composed of alphanumeric characters (letters and digits), dashes or dots.
Explanation
The ^ anchor asserts that we are at the beginning of the string
The character class [%w-.] matches one character that is a letter or digit (the meaning of %w), or a dash, or a period. This is roughly the equivalent of [\w-.] in JavaScript, except that \w also matches the underscore
The + quantifier matches such a character one or more times
The $ anchor asserts that we are at the end of the string
Reference
Lua Patterns
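To see which strings such a pattern accepts, here is a rough Ruby counterpart. It is not Lua code: it assumes %w means ASCII letters and digits (in Lua the exact set depends on the locale), and the names are invented for this sketch.

```ruby
# Approximate Ruby equivalent of the Lua pattern ^[%w-.]+$
# Assumption: %w is taken to mean ASCII letters and digits only.
LUA_LIKE = /\A[a-zA-Z0-9.-]+\z/

def lua_like_match?(str)
  str.match?(LUA_LIKE)
end

# Accepted: only letters, digits, dashes and dots.
p lua_like_match?('foo-bar.baz')
p lua_like_match?('2014-07.log')

# Rejected: underscore, space, or an empty string.
p lua_like_match?('foo_bar')
p lua_like_match?('foo bar')
p lua_like_match?('')
```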
Actually, it will match nothing, because there is an error: w- is the start of a character range, and that range is out of order. So it should be %w\- instead.
^[%w\-.]+$
Means:
^ assert position at start of the string
[%w\-.]+ match a single character present in the list below
+ Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
%w a single character in the list %w literally (case sensitive)
\- matches the character - literally
. the literal character .
$ assert position at end of the string
Edit
As the OP changed the question and the tags, this answer no longer fits as a proper answer. It is a POSIX-based answer.
As @zx81 pointed out in a comment:
%w is Lua's rough counterpart of \w: it matches any alphanumeric character (but, unlike \w, not "_")

Difference between \b and \s in Regular Expression

I was learning regular expressions in iOS and saw this tutorial: http://www.raywenderlich.com/30288/nsregularexpression-tutorial-and-cheat-sheet
It reads like this for \b:
\b matches word boundary characters such as spaces and punctuation. to\b will match the "to" in "to the moon" and "to!", but it will not match "tomorrow". \b is handy for "whole word" type matching.
and \s:
\s matches whitespace characters such as spaces, tabs, and newlines. hello\s will match "hello " in "Well, hello there!".
I have two questions on this:
1) what is the difference between \s and \b? when to use which?
2) \b is handy for "whole word" type matching -> Don't understand the meaning..
Need some guidance on these two.
\b Boundary characters
\b matches the boundary itself but not the boundary character (like a comma or period). It has no length of its own, but it can be used to find, for example, an e at the end of a word.
For example, in the sentence: "Hello there, this is one test. Testing"
The regex e\b will match an e if it's at the end of a word (followed by a word boundary): here, the final e in "there" and the e in "one". The e in "test" and "Testing" doesn't match, since it is not followed by a boundary.
\s Whitespace
\s on the other hand matches the actual white space characters (like spaces and tabs). In the same sentence it will match all the spaces between the words.
Edit
Since \b doesn't make much sense alone, I showed how to use it as e\b (above). The OP asked (in a comment) what e\s would match compared to e\b, to better explain the difference between \b and \s.
In the same string there is only one match for e\s (the "e " of "one "), while there were two matches for e\b, since the comma after "there" is not whitespace. Note that the e\s match includes the whitespace, whereas the e\b match doesn't.
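The comparison can be reproduced in Ruby with the same sentence. This is a small sketch with invented method names; String#scan simply returns every match.

```ruby
# e followed by a word boundary (zero-width, nothing extra is consumed).
def e_before_boundary(text)
  text.scan(/e\b/)
end

# e followed by a whitespace character (the whitespace is part of the match).
def e_before_whitespace(text)
  text.scan(/e\s/)
end

sentence = 'Hello there, this is one test. Testing'
p e_before_boundary(sentence)    # matches in "there" and "one"
p e_before_whitespace(sentence)  # match in "one " only, space included
```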
\b matches a word boundary. It is a zero-width assertion, meaning it does not match a character; it matches a position where a certain condition is true.
\b is related to \w. \w defines "word characters": letters, digits and underscores. So \b matches on a change from a word character to a non-word character, or the other way round. That means it matches at the start and end of a word, but not the character before or after the word.
\s is a predefined character class that matches any whitespace character.
See and try out what \bFoo\b matches here on Regexr
See and try out what \sFoo\s matches here on Regexr
\b is zero-width. That is, it doesn't actually match any character. Meanwhile, \s does match a character. This is an important distinction for capturing and more complicated regular expressions.
For example, say you're trying to match numbers that begin with multiple zeros, like 007 or 000101101. You might try:
0+\d*
But see, that would also match 1007 and 101000101101! So then, you might try:
\s0+\d*
But see how that wouldn't match a 007 at the beginning of the string (because there's no space character)? Using \b allows you to get the "whole word (or number)":
\b0+\d*
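Here is how the three attempts behave on one sample string in Ruby. The sample text and method names are made up for this sketch.

```ruby
SAMPLE = '007 1007 000101101'

# No anchor at all: also picks up the "007" inside "1007".
def zeros_anywhere(text)
  text.scan(/0+\d*/)
end

# Requires a preceding whitespace character: misses "007" at the
# start of the string and includes the space in the match.
def zeros_after_space(text)
  text.scan(/\s0+\d*/)
end

# Word boundary: only numbers that really start with zeros.
def zeros_at_boundary(text)
  text.scan(/\b0+\d*/)
end

p zeros_anywhere(SAMPLE)
p zeros_after_space(SAMPLE)
p zeros_at_boundary(SAMPLE)
```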
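\b matches the position between a word character (letter, digit or underscore) and a non-word character, without including any character in the match.
\s matches only white space.
For example:
\b would be found between a letter or digit and any of these: "!?,.##$%^&*()+ " (but not "_", which counts as a word character).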
$text = "Hello, Yo! moo .";
$regex = "~o\b~";
^---Will match all three o's.
$text = "Hello, Yo! moo .";
$regex = "~o\s~";
^---Will only match the 'o' in 'moo'.

How to update this REGEX to make sure string does not have _(underscore) at the end or beginning

This is the regular expression I have. I need to make sure that the string does not start or end with an underscore; an underscore may appear in between.
/^[a-zA-Z0-9_.-]+$/
I have tried
(?!_)
But doesn't seem to work
Allowed strings:
abcd
abcd_123
Not allowed strings:
abcd_
_abcd_123
Not too hard!
/^[^_].*[^_]$/
"Any character except an underscore at the start of the line (^[^_]), then any characters (.*), then any character except an underscore before the end of the line ([^_]$)."
This does require at least two characters to validate the string. If you want to allow one character lines:
/^[^_](.*[^_]|)$/
"Anything except an underscore to start the line, and then either some characters plus a non-underscore character before end-of-line, or just an immediate end-of-line."
You could approach this in the inverse way,
Check all those that do match starting and ending underscores like this:
/^_|_$/
^_ #starts with underscore
| #OR
_$ #ends with underscore
And then eliminate those that match. The above regexp is much easier to read.
Check : http://www.rubular.com/r/H3Axvol13b
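A sketch of this two-step check in Ruby: first the allowed character set from the question, then the "eliminate" step. The names are hypothetical, and \A and \z are used instead of ^ and $ so that the whole string is tested rather than a single line.

```ruby
# Character set from the question.
ALLOWED_CHARS = /\A[a-zA-Z0-9_.-]+\z/

# Inverse check: starts with an underscore OR ends with one.
EDGE_UNDERSCORE = /\A_|_\z/

def acceptable?(str)
  str.match?(ALLOWED_CHARS) && !str.match?(EDGE_UNDERSCORE)
end

%w[abcd abcd_123 abcd_ _abcd_123].each do |candidate|
  puts "#{candidate}: #{acceptable?(candidate)}"
end
```

Keeping the two conditions separate makes each regex trivial to read, at the cost of two passes over the string.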
Or you can try the longer regex:
/^[a-zA-Z0-9.-][a-zA-Z0-9_.-]*[a-zA-Z0-9.-]$|^[a-zA-Z0-9.-]+$|^[a-zA-Z0-9.-][a-zA-Z0-9.-]$/
^[a-zA-Z0-9.-] #starts with a-z, or A-Z, or 0-9, or . -
[a-zA-Z0-9_.-]* #anything that can occur and the underscore
[a-zA-Z0-9.-]$ #ends with a-z, or A-Z, or 0-9, or . -
| #OR
^[a-zA-Z0-9.-]+$ #for words without any underscore (including one-letter words)
| #OR
^[a-zA-Z0-9.-][a-zA-Z0-9.-]$ #for two letter words
Check: http://www.rubular.com/r/FdtCqW6haG
Try this:
/^[a-zA-Z0-9.-][a-zA-Z0-9_.-]+[a-zA-Z0-9.-]$/
Description:
In the first section, [a-zA-Z0-9.-], the regex only allows lower and upper case letters, digits, a dot or a hyphen.
In the next section, [a-zA-Z0-9_.-]+, the regex looks for one or more characters that are lower or upper case letters, digits, a dot, a hyphen or an underscore.
The last part, [a-zA-Z0-9.-], is the same as the first part and prevents the input from ending with an underscore. Note that, as written, the pattern requires at least three characters.
Try this:
Recently had the same concern and this is how I did it.
// '"^[a-zA-Z0-9_.-]*$"' → Alphanumeric and 「.」「_」「-」
// "^[^_].*[^_]$" → Reject start and end of string if contains 「_」
// (?=) REGEX AND operator
SLUG_REGEX = '"(?=^[a-zA-Z0-9_.-]*$)(?=^[^_].*[^_]$)"';
I used this snippet for my Laravel validation, so you may need to adapt it to your code, for example by changing the " delimiters to /, as in the question and the other answers.
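The same "two lookaheads act as AND" idea can be sketched in Ruby. This is an adaptation rather than the Laravel snippet itself: the names are invented, and \A and \z replace ^ and $ so the conditions apply to the whole string.

```ruby
# Both lookaheads start at the beginning of the string and must
# both succeed, which makes them behave like a logical AND.
SLUG = /\A(?=[a-zA-Z0-9_.-]*\z)(?=[^_].*[^_]\z)/

def slug?(str)
  str.match?(SLUG)
end

p slug?('abcd')
p slug?('abcd_123')
p slug?('abcd_')
p slug?('_abcd_123')
p slug?('a') # the second condition needs at least two characters
```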

username regex in rails

I am trying to find a regex to limit what a person can use for a username on my site. I don't need to have it check to see how many characters there are in it, as another validation does this. Basically all I need to make it do is make sure that it allows: letters (capital and lowercase) numbers, dashes and underscores.
I came across this: /^[-a-z]+$/i
But it doesn't seem to allow numbers.
What am I missing?
The regex you're looking for is
/\A[a-z0-9\-_]+\z/i
Meaning one or more characters from the range a-z, the range 0-9, - (which needs to be escaped with a backslash here because it is not at the start or end of the class) and _, case insensitive (the i modifier).
Use
/\A[\w-]+\z/
\w is shorthand for letters, digits and underscore.
\A matches at the start of the string, \z matches at the end of the string. These tokens are called anchors, and Ruby is a bit special with regard to them: most regex engines use ^ and $ as start/end-of-string anchors by default, whereas in Ruby they always match at the start/end of each line (which matters if you're working with multiline strings). Therefore, it's safer (as @JustMichael pointed out) to use \A and \z, because there is no such ambiguity.
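The difference between the two kinds of anchors shows up as soon as the input contains a newline. A minimal sketch with hypothetical names:

```ruby
# Whole-string anchors: the entire value must consist of allowed characters.
USERNAME = /\A[\w-]+\z/

# Line anchors: in Ruby, one valid line anywhere is enough to match.
LINE_ANCHORED = /^[\w-]+$/

def valid_username?(name)
  name.match?(USERNAME)
end

p valid_username?('John_Doe-99')
p valid_username?('john doe')

# A multiline value slips past ^...$ but not past \A...\z.
sneaky = "ok\n<script>"
p sneaky.match?(LINE_ANCHORED)
p valid_username?(sneaky)
```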
Your regular expression contains a character class [-a-z] that allows the characters - (dash) and a through z. In order to expand the range of characters allowed by this character class, you will need to add more characters within the [].
Please see Character Classes or Character Sets for further information and examples.
