Words including given string in Ruby - ruby-on-rails

I'm writing a small Rails API application, and I need to analyze a string to find the words that contain a given substring. For example:
Assuming my source text is hello mr one two three four nine nineteen and I want to check for occurrences of on, it will produce one, and if I check for occurrences of ne t in the same string, the result will be one two.
I know there is an ugly way with substrings, counting positions and parsing the string that way, but I think it can be solved with a regex scan.
Please say if you need any additional information, thanks.

▶ str = 'hello mr one two three four nine nineteen'
#⇒ "hello mr one two three four nine nineteen"
▶ re = ->(pattern) { /\p{L}*#{pattern}\p{L}*/ }
▶ str[re.('ne t')]
#⇒ "one two"
▶ str[re.('on')]
#⇒ "one"
The matcher \p{L} is generally better than \w and, especially, \S, because it matches all Unicode letters and nothing else.
To match accented letters written with combining marks as well (e.g. a decomposed ï in “naïve”), extend the left and right matchers with \p{M}, which covers every combining mark:
▶ re = ->(pattern) { /[\p{L}\p{M}]*#{pattern}[\p{L}\p{M}]*/ }
Please note that the code above returns only the first match. To return all matches, use String#scan instead of String#[]:
▶ str.scan re.('ni')
#⇒ ["nine", "nineteen"]
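If the fragment comes from user input, interpolating it straight into the regex lets characters such as . or * act as metacharacters. A minimal sketch of one way to guard against that, assuming the fragment should always be treated as literal text; the helper name words_including is hypothetical and not part of the answer above:

```ruby
# Hypothetical helper: return every run of words that contains the fragment,
# treating the fragment as literal text rather than as a regex.
def words_including(text, fragment)
  # Regexp.escape neutralises metacharacters such as "." or "*".
  pattern = /[\p{L}\p{M}]*#{Regexp.escape(fragment)}[\p{L}\p{M}]*/
  text.scan(pattern)
end

str = 'hello mr one two three four nine nineteen'
p words_including(str, 'on')    # ["one"]
p words_including(str, 'ne t')  # ["one two"]
p words_including(str, 'ni')    # ["nine", "nineteen"]
```

Without the escape, a fragment like a.b would also match axb.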

Use a regular expression:
search = "on"
/\s([^\s]*#{search}[^\s]*)\s/.match("hello mr one two three four nine nineteen")[1]
# returns "one"
search = "ne t"
/\s([^\s]*#{search}[^\s]*)\s/.match("hello mr one two three four nine nineteen")[1]
# returns "one two"
The way it works: it finds the substring you are looking for and captures any additional non-whitespace characters attached to either end, stopping at the first whitespace on both sides. Because the pattern requires whitespace on both sides, it will not match a word at the very start or end of the string.

Related

How to find a word in a single long string?

I want to be able to copy and paste a large string of words from, say, a text document, where the words are separated by spaces and line breaks rather than commas. Then I want to be able to take out each word individually and put the words in a table, for example:
input:
please i need help
output:
{1, "please"},
{2, "i"},
{3, "need"},
{4, "help"}
(I will have the table already made, with the second column set to something like " ".)
I haven't tried anything yet, as nothing has come to mind. All I could think of was using gsub to turn spaces into commas and finding a solution from there, but I don't think that would work out so well.
Your delimiters are spaces ( ), commas (,) and newlines (\n, sometimes \r\n or \r, the latter very rarely). You now want to find words delimited by these delimiters. A word is a sequence of one or more non-delimiter characters. This trivially translates to a Lua pattern which can be fed into gmatch. Paired with a loop & inserting the matches in a table you get the following:
local words = {}
for word in input:gmatch"[^ ,\r\n]+" do
table.insert(words, word)
end
If you know that your words are going to be in your locale-specific character set (usually ASCII or extended ASCII), you can use Lua's %w character class to match sequences of alphanumeric characters:
local words = {}
for word in input:gmatch"%w+" do
table.insert(words, word)
end
Note: The resulting table will be in "list" form:
{
[1] = "first",
[2] = "second",
[3] = "third",
}
(for which {"first", "second", "third"} would be shorthand)
I don't see any good reasons for the table format you have described, but it can be trivially created by inserting tables instead of strings into the list.

How to remove from string before __

I am building a Rails 5.2 app.
In this app I got outputs from different suppliers (I am building a webshop).
The name of the shipping provider is in this format:
dhl_freight__233433
It could also be in this format:
postal__US-320202
How can I remove everything before (and including) the __, so that all that remains is what comes after the __, for example 233433?
Perhaps some sort of RegEx.
A very simple approach would be to use String#split and then pick the second part that is the last part in this example:
"dhl_freight__233433".split('__').last
#=> "233433"
"postal__US-320202".split('__').last
#=> "US-320202"
You can use a very simple Regexp and ask the resulting MatchData for its post_match part:
p "dhl_freight__233433".match(/__/).post_match
# another (magic) way to access the post_match part:
p $'
Postscript: I learnt something from this question myself: you don't even have to use a Regexp for this to work. Just "asddfg__qwer".match("__").post_match does the trick (it converts the string to a regexp for you).
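One thing to watch with match(...).post_match is that match returns nil when the separator is missing, so the call raises NoMethodError. A minimal sketch of an alternative using String#partition, which never raises; the helper name after_separator and the choice to return nil when no separator is found are assumptions for illustration:

```ruby
# Hypothetical helper: return the text after the first occurrence of the
# separator, or nil when the separator does not occur at all.
def after_separator(name, separator = '__')
  # partition returns [before, separator, after]; when the separator is
  # missing it returns [name, "", ""].
  _before, found, after = name.partition(separator)
  found.empty? ? nil : after
end

p after_separator("dhl_freight__233433")  # "233433"
p after_separator("postal__US-320202")    # "US-320202"
p after_separator("no_separator_here")    # nil
```

Note that partition cuts at the first separator, whereas split('__').last keeps only what follows the last one; the two differ for input such as "a__b__c".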
r = /[^_]+\z/
"dhl_freight__233433"[r] #=> "233433"
"postal__US-320202"[r] #=> "US-320202"
The regular expression matches one or more characters other than an underscore, followed by the end of the string (\z). The ^ at the beginning of the character class reads, "other than any of the characters that follow".
See String#[].
This assumes that the last underscore is preceded by an underscore. If the last underscore is not preceded by an underscore, in which case there should be no match, add a positive lookbehind:
r = /(?<=__)[^_]+\z/
This requires the match to be preceded by two underscores.
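To make the difference between the two patterns concrete, here is a small sketch that compares them on an input with a double underscore and one with only a single underscore. The variable names loose and strict are just labels chosen for this example:

```ruby
loose  = /[^_]+\z/          # anything after the last underscore
strict = /(?<=__)[^_]+\z/   # same, but only when preceded by two underscores

p "dhl_freight__233433"[loose]   # "233433"
p "dhl_freight__233433"[strict]  # "233433"

# With a single underscore the loose pattern still matches,
# while the strict one returns nil.
p "dhl_freight_233433"[loose]    # "233433"
p "dhl_freight_233433"[strict]   # nil
```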
There are many ways in Ruby to extract numbers from a string, assuming that is what you are trying to do. Here are some of them.
Ref- http://www.ruby-forum.com/topic/125709
line.delete("^0-9")
line.scan(/\d/).join('')
line.tr("^0-9", '')
Of the above, delete is the fastest way to strip everything but the digits from a string.
All of the above extract the digits from the string and join them. For a string like "String-with-67829___numbers-09764", the output would be "6782909764".
If you want the numbers split up instead, like this: ["67829", "09764"]
line.split(/[^\d]/).reject { |c| c.empty? }
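As a possible shortcut for the split-and-reject step, scanning for runs of digits gives the same grouping directly. A small sketch using the example string from above:

```ruby
line = "String-with-67829___numbers-09764"

# All digits joined into one string.
p line.delete("^0-9")   # "6782909764"

# Each run of consecutive digits as its own element.
p line.scan(/\d+/)      # ["67829", "09764"]

# Same result as the split-based version.
p line.split(/[^\d]/).reject { |c| c.empty? }  # ["67829", "09764"]
```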
Hope these answers help you! Happy coding :-)

Ruby .scan method returns empty using regex

So given a string like this "\"turkey AND ham\" NOT \"roast beef\"" I need to get an array with the inner strings like so: ["turkey AND ham", "roast beef"] and eliminate OR's, AND's and NOT's that may or may not be there.
With the help of Rubular I came up with this regex /\\["']([^"']*)\\["']/
which returns the following 2 groups:
Match 1
1. turkey AND ham
Match 2
1. roast beef
However, when I use it with .scan I keep getting an empty array.
I looked at other SO posts on this, but I cannot figure out where I am going wrong.
Here is the result from my rails console:
=> q = "\"turkey and ham\" OR \"roast beef\""
=> q.scan(/\\["']([^"']*)\\["']/)
=> []
Expectation:
["turkey AND ham", "roast beef"]
I shall also mention I suck at regex.
When the regex used with scan contains a capture group (@davidhu2000's approach), one generally can use lookarounds [1] instead. It's just a matter of personal preference. To allow for double-quoted strings that contain either single- or (escaped) double-quoted strings, you could use the following regex.
r = /
(?<=") # match a double quote in a positive lookbehind
[^"]+ # match one or more characters that are not double-quotes
(?=") # match a double quote in a positive lookahead
| # or
(?<=') # match a single quote in a positive lookbehind
[^']+ # match one or more characters that are not single-quotes
(?=') # match a single quote in a positive lookahead
/x # free-spacing regex definition mode
"\"turkey AND ham\" NOT 'roast beef'".scan(r)
#=> ["turkey AND ham", "roast beef"]
Since '"turkey AND ham" NOT "roast beef"' #=> "\"turkey AND ham\" NOT \"roast beef\"" (i.e., that is how the single-quoted string is stored), it is not an additional case we need to deal with.
[1] For anyone in the audience who still considers regular expressions to be black magic: there are four kinds of lookarounds (positive and negative lookbehinds and lookaheads), as elaborated in the doc for Regexp. They are sometimes called "zero-width" matches, as they are not part of the matched text.
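One caveat worth checking before relying on the lookaround version: because lookarounds do not consume the quote characters, the text between two double-quoted phrases is itself surrounded by double quotes and is matched too. The sketch below shows this on the question's all-double-quoted input, followed by one possible alternative that consumes the quotes using a backreference. The pattern named paired is an illustration, not part of the answer above:

```ruby
lookaround = /(?<=")[^"]+(?=")|(?<=')[^']+(?=')/
mixed      = "\"turkey AND ham\" NOT 'roast beef'"
all_double = "\"turkey AND ham\" NOT \"roast beef\""

p mixed.scan(lookaround)       # ["turkey AND ham", "roast beef"]
p all_double.scan(lookaround)  # ["turkey AND ham", " NOT ", "roast beef"]

# Alternative sketch: capture the opening quote, match lazily up to the
# same quote character (\1), and keep only the inner text (group 2).
paired = /(["'])(.*?)\1/
p all_double.scan(paired).map(&:last)  # ["turkey AND ham", "roast beef"]
p mixed.scan(paired).map(&:last)       # ["turkey AND ham", "roast beef"]
```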
Your regex is trying to match a literal \, which won't match anything in the string: the \ in the string literal is only there to escape the double quote and is not part of the string itself.
So if you remove the \\ from your regex:
res = q.scan(/["']([^"']*)["']/)
This will return a 2d array
res = [["turkey and ham"], ["roast beef"]]
Each inner array is all the matching groups from the regex, so if you have two capture groups in your regex, you will see two items in the inner array.
If you want a flat array, you can call the flatten method on the result.
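Put together, the capture-group approach with flatten could look like the following sketch, using the string from the question's console session:

```ruby
q = "\"turkey and ham\" OR \"roast beef\""

# With one capture group, scan returns one single-element array per match.
nested = q.scan(/["']([^"']*)["']/)
p nested          # [["turkey and ham"], ["roast beef"]]

# flatten collapses the nested arrays into a plain list of strings.
flat = nested.flatten
p flat            # ["turkey and ham", "roast beef"]
```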

Full name regex in Ruby

I know there are lots of similar questions, but I couldn't find my case anywhere.
I'm trying to write a Full Name RegEx in Ruby on Rails user model.
It should validate that first name and last name are filled with one whitespace. Both of the names should contain at least 2 characters (ex: Li Ma).
As a bonus, though not strictly necessary, I would like to trim the whitespace down to one character in case the user mistypes and enters more than one whitespace (ex: Li    Ma will be trimmed to Li Ma).
Currently I'm validating it like that (Warning: It might be incorrect):
validates :name,
presence: true,
length: {
maximum: 64,
minimum: 5,
message: 'must be a minimum: 5 letters and a maximum: 64 letters'},
format: {
# Full Name RegEx
with: /[\w\-\']+([\s]+[\w\-\']){1}/
}
This works for me, but doesn't check for a minimum of 2 characters in each name (ex: Peter P is currently accepted). It also accepts multiple whitespaces, which is not good (ex: Peter    P).
I know that this problem of identifying names is very culture-centric and it might be not a proper way to validate full name (maybe there are people with one character name), but this is currently a requirement.
I don't want to split this field to 2 different fields First name and Last name as it will complicate user interface.
You could match the following regex:
/([\w\-\']{2,})([\s]+)([\w\-\']{2,})/
and replace with: (assuming it supports capturing groups)
'\1 \3' or $1 $3 whatever the syntax is:
It gets rid of extra whitespaces and only keeps one, as you wanted.
Demo: http://regex101.com/r/oQ6aO7
result = subject.gsub(/\A(?=[\w' -]{5,64})([\w'-]{2,})([\s]{1})\s*?([\w'-]{2,})\Z/, '\1\2\3')
http://regex101.com/r/dT1fJ4
Assert position at the beginning of the string «\A»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=[\w' -]{5,64})»
Match a single character present in the list below «[\w' -]{5,64}»
Between 5 and 64 times, as many times as possible, giving back as needed (greedy) «{5,64}»
A word character (letters, digits, and underscores) «\w»
The character “'” «'»
The character “ ” « »
The character “-” «-»
Match the regular expression below and capture its match into backreference number 1 «([\w'-]{2,})»
Match a single character present in the list below «[\w'-]{2,}»
Between 2 and unlimited times, as many times as possible, giving back as needed (greedy) «{2,}»
A word character (letters, digits, and underscores) «\w»
The character “'” «'»
The character “-” «-»
Match the regular expression below and capture its match into backreference number 2 «([\s]{1})»
Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «[\s]{1}»
Exactly 1 times «{1}»
Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the regular expression below and capture its match into backreference number 3 «([\w'-]{2,})»
Match a single character present in the list below «[\w'-]{2,}»
Between 2 and unlimited times, as many times as possible, giving back as needed (greedy) «{2,}»
A word character (letters, digits, and underscores) «\w»
The character “'” «'»
The character “-” «-»
Assert position at the end of the string (or before the line break at the end of the string, if any) «\Z»
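An alternative to doing everything in one regex is to normalize the whitespace first and validate afterwards. The following is a minimal plain-Ruby sketch of that idea; the constant FULL_NAME and the helper normalize_full_name are hypothetical names, and the 5 to 64 length limits are assumed to stay in the model's length validation rather than in the pattern:

```ruby
# Two name parts of at least 2 characters each, separated by one space.
FULL_NAME = /\A[\w'-]{2,} [\w'-]{2,}\z/

# Hypothetical helper: collapse whitespace, then return the cleaned name
# if it is valid, or nil otherwise.
def normalize_full_name(raw)
  cleaned = raw.strip.gsub(/\s+/, ' ')
  cleaned if cleaned.match?(FULL_NAME)
end

p normalize_full_name("Li    Ma")   # "Li Ma"
p normalize_full_name("  Li Ma ")   # "Li Ma"
p normalize_full_name("Peter P")    # nil
```

In a Rails model the normalization would typically run before validation, with FULL_NAME passed to the format validator.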

Best way to count words in a string in Ruby?

Is there anything better than string.scan(/(\w|-)+/).size (the - is so, e.g., "one-way street" counts as 2 words instead of 3)?
string.split.size
Edited to explain multiple spaces
From the Ruby String Documentation page
split(pattern=$;, [limit]) → anArray
Divides str into substrings based on a delimiter, returning an array
of these substrings.
If pattern is a String, then its contents are used as the delimiter
when splitting str. If pattern is a single space, str is split on
whitespace, with leading whitespace and runs of contiguous whitespace
characters ignored.
If pattern is a Regexp, str is divided where the pattern matches.
Whenever the pattern matches a zero-length string, str is split into
individual characters. If pattern contains groups, the respective
matches will be returned in the array as well.
If pattern is omitted, the value of $; is used. If $; is nil (which is
the default), str is split on whitespace as if ' ' were specified.
If the limit parameter is omitted, trailing null fields are
suppressed. If limit is a positive number, at most that number of
fields will be returned (if limit is 1, the entire string is returned
as the only entry in an array). If negative, there is no limit to the
number of fields returned, and trailing null fields are not
suppressed.
" now's the time".split #=> ["now's", "the", "time"]
While that is the current version of ruby as of this edit, I learned on 1.7 (IIRC), where that also worked. I just tested it on 1.8.3.
I know this is an old question, but this might be useful to someone else looking for something more sophisticated than string.split. I wrote the words_counted gem to solve this particular problem, since defining words is pretty tricky.
The gem lets you define your own custom criteria, or use the out of the box regexp, which is pretty handy for most use cases. You can pre-filter words with a variety of options, including a string, lambda, array, or another regexp.
counter = WordsCounted::Counter.new("Hello, Renée! 123")
counter.word_count #=> 2
counter.words #=> ["Hello", "Renée"]
# filter the word "hello"
counter = WordsCounted::Counter.new("Hello, Renée!", reject: "Hello")
counter.word_count #=> 1
counter.words #=> ["Renée"]
# Count numbers only
counter = WordsCounted::Counter.new("Hello, Renée! 123", regexp: /[0-9]/)
counter.word_count #=> 1
counter.words #=> ["123"]
The gem provides a bunch more useful methods.
If the 'word' in this case can be described as an alphanumeric sequence which can include '-' then the following solution may be appropriate (assuming that everything that doesn't match the 'word' pattern is a separator):
>> 'one-way street'.split(/[^-a-zA-Z]/).size
=> 2
>> 'one-way street'.split(/[^-a-zA-Z]/).each { |m| puts m }
one-way
street
=> ["one-way", "street"]
However, there are some other symbols that can be included in the regex - for example, ' to support the words like "it's".
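As a sketch of that extension, the word pattern can be matched directly with scan instead of splitting on its complement. The character class below (Unicode letters, digits, apostrophe and hyphen) is one assumed definition of a word, and count_words is a hypothetical helper name:

```ruby
# Hypothetical helper: count runs of letters, digits, apostrophes and hyphens.
def count_words(text)
  text.scan(/[\p{L}\p{N}'-]+/).size
end

p count_words("one-way street")                     # 2
p count_words("it's a one-way  street, isn't it?")  # 6
p count_words("")                                   # 0
```

A free-standing hyphen, as in "a - b", still counts as a word under this definition, so the pattern would need tightening if that matters.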
This is pretty simplistic but does the job if you are typing words with spaces in between. It ends up counting numbers as well but I'm sure you could edit the code to not count numbers.
puts "enter a sentence to find its word length: "
word = gets
word = word.chomp
splits = word.split(" ")
target = splits.length.to_s
puts "your sentence is " + target + " words long"
The best way to do this is to use the split method.
split divides a string into substrings based on a delimiter, returning an array of the substrings.
split takes two parameters: pattern and limit.
pattern is the delimiter on which the string is split into an array.
limit caps the number of elements in the resulting array.
For more details, refer to the Ruby String documentation.
str = "This is a string"
str.split(' ').size
#output: 4
The above code splits the string wherever it finds whitespace, so the size of the resulting array is the number of words in the string.
The above solution is wrong. Consider the following (note the two spaces):
"one-way  street"
You will get
["one-way", "", "street"]
Use
'one-way  street'.gsub(/[^-a-zA-Z]/, ' ').split.size
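The empty string appears when splitting on a single-character regex, because each of the two spaces is treated as its own delimiter. A short sketch comparing the variants on an input with a double space:

```ruby
text = 'one-way  street'   # two spaces between the words

# One delimiter per non-word character: the second space yields "".
p text.split(/[^-a-zA-Z]/)                    # ["one-way", "", "street"]

# Treating a run of delimiters as one avoids the empty element.
p text.split(/[^-a-zA-Z]+/)                   # ["one-way", "street"]

# split with no argument (or ' ') already collapses whitespace runs.
p text.split                                  # ["one-way", "street"]
p text.gsub(/[^-a-zA-Z]/, ' ').split.size     # 2
```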
This splits words only on ASCII whitespace chars:
p " some word\nother\tword|word".strip.split(/\s+/).size #=> 4
