Using Ruby, how can I convert a string to an array when there may be variable whitespace between terms? - ruby-on-rails

Imagine I have a string like this: "hello:world, foo:bar,biz:baz, last:term "
And I want to convert it to an array ["hello:world", "foo:bar", "biz:baz", "last:term"]
Essentially I want to split by comma, but also by a variable amount of whitespace. I could do the split and then go through each term and strip whitespace from either side, but I'm hoping there is a simpler way - maybe using Regexp? (I'm very unfamiliar with how to use Regexp). I'm using Ruby on Rails.

You can use scan with a Regexp:
string = "hello:world, foo:bar,biz:baz, last:term "
string.scan(/[^\s,]+/)
#=> ["hello:world", "foo:bar", "biz:baz", "last:term"]
Or you could use split to split the string at each comma and then strip to remove the unwanted whitespace.
string = "hello:world, foo:bar,biz:baz, last:term "
string.split(',').map(&:strip)
#=> ["hello:world", "foo:bar", "biz:baz", "last:term"]
I would probably prefer the second version because it is easier to read and understand. Additionally, I wouldn't be surprised if the simple string methods in the second version performed better on small strings, because regexps are relatively expensive and usually only worth it for more complex or bigger tasks.
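A minimal sketch comparing the two answers, with each wrapped in a method (the method names are illustrative, not from the answers). On the question's input they agree. They diverge on a hypothetical term containing an internal space, because scan treats every whitespace run as a separator while split/strip only trims around the commas.

```ruby
# Approach 1: collect every run of characters that is neither whitespace nor a comma.
def terms_via_scan(str)
  str.scan(/[^\s,]+/)
end

# Approach 2: split on commas, then trim leading/trailing whitespace from each piece.
def terms_via_split(str)
  str.split(',').map(&:strip)
end

input = "hello:world, foo:bar,biz:baz, last:term "
p terms_via_scan(input)   # => ["hello:world", "foo:bar", "biz:baz", "last:term"]
p terms_via_split(input)  # => ["hello:world", "foo:bar", "biz:baz", "last:term"]

# The approaches differ when a term itself contains a space.
spaced = "new york:usa, foo:bar"
p terms_via_scan(spaced)  # => ["new", "york:usa", "foo:bar"]
p terms_via_split(spaced) # => ["new york:usa", "foo:bar"]
```

If your terms can never contain spaces, either version works; if they can, the split-and-strip version is the one that keeps them intact.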

Related

Escape quote in Dart Regex

I'm trying to use the regex /^[a-zA-Z0-9_$&+:;=?##|'<>.^*()%!-]+$/ with Dart's RegExp. I've seen that you can use raw strings, so I've put the above between r'' like this:
r'^[a-zA-Z0-9_$&+:;=?##|'<>.^*()%!-]+$' but the ' is messing it up. How do I tell Dart this is a special character?
EDIT
I tried this, but it doesn't seem to work:
static final RegExp _usernameRegExp = RegExp(
r"^[a-zA-Z0-9_$&+:;=?##|'<>.^*()%!-]+$",
);
So I have a TextField with a text controller for a username, and a method like this:
static bool isValidUsername(String username) {
return (_usernameRegExp.hasMatch(username));
}
I pass the controller.text as the username.
I also have a getter:
bool get isUserNameValid => (Validators.isValidUsername(userNameTextController.text.trim()));
I can type all the given characters into the text box, but not the ' character.
Your RegExp source contains ', so you can't use ' as the string delimiter without allowing escapes. It also contains $, so you want to avoid allowing escapes (in a non-raw Dart string, $ starts an interpolation).
You can use " as delimiter instead, so a raw string like r"...".
However, Dart also has "multi-line strings" which are delimited by """ or '''. They can, but do not have to, contain newlines. You can use those for strings containing both ' and ". That allows r'''...'''.
And you can obviously also use escapes for all characters that mean something in a string literal.
So, for your code, that would be one of:
r'''^[\w$&+:;=?##|'<>.^*()%!-]+$'''
r"^[\w$&+:;=?##|'<>.^*()%!-]+$"
'^[\\w\$&+:;=?##|\'<>.^*()%!-]+\$'
(I changed a-zA-Z0-9_ to \w, because that's precisely what \w means. The $ has to stay in the class explicitly, since \w does not include it.)
In practice, I always use raw strings for regexps. It's far too easy, and far too dangerous, to forget to escape a backslash, so use one of the first two options.
I'd probably escape the - too, making it [....\-] instead of relying on its position to make it non-significant in the character class. That's a fragile design that breaks if you add one more character at the end of the character class instead of adding it before the -. It's less fragile if you escape the -.
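Dart's RegExp uses JavaScript regex semantics, and two character-class details discussed here behave the same way in Ruby, the language used in the rest of this page: \w covers only letters, digits and underscore (so it does not match $), and an unescaped - turns into a range once another character is added after it. A small Ruby sketch, with illustrative constant names:

```ruby
# \w covers ASCII letters, digits and underscore by default, not $.
p "$".match?(/\w/)  # => false
p "_".match?(/\w/)  # => true

# A trailing "-" is literal: this class matches only "!" or "-".
TRAILING_DASH = /\A[!-]\z/
# Appending "#" after the unescaped "-" turns it into the range "!".."#",
# which also covers the double quote (0x22) between "!" (0x21) and "#" (0x23).
RANGE_DASH = /\A[!-#]\z/
# Escaping the "-" keeps it literal no matter what follows it.
ESCAPED_DASH = /\A[!\-#]\z/

p '"'.match?(TRAILING_DASH)  # => false
p '"'.match?(RANGE_DASH)     # => true
p '"'.match?(ESCAPED_DASH)   # => false
p '-'.match?(ESCAPED_DASH)   # => true
```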

Remove quotes from string built from an array

I have user-controlled input like this (the length and number of items may change):
str = "['honda', 'toyota', 'lexus']"
I would like to convert this into an array, but I'm struggling to find the best way to do so. eval() does exactly what I need, but it is not very elegant and is dangerous in this case, since it's user-controlled input.
Another way is:
str[1..-2].split(',').collect { |car| car.strip.tr("'", '') }
=> ["honda", "toyota", "lexus"]
But this is also not very elegant. Any suggestions that are more 'Rubyish'?
You could use a regular expression:
# match (non-greedily) the text between single quotes that is followed by a comma or `]`
# capture each word as a group, and don't capture `,` or `]`
str.scan(/'(.+?)'(?:,|\])/).flatten
Or JSON.parse (after converting the single quotes, since JSON only allows double-quoted strings; outside Rails you also need require 'json'):
JSON.parse( str.tr("'", '"') )
JSON.parse probably has a small edge over the regexp in terms of performance, but if you're expecting your users to do single quote escaping, then that tr is going to mess things up. In this case, I'd stick with the regexp.
The JSON.parse looks more correct, but here is another alternative:
str.split(/[[:punct:] ]+/).drop(1)
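For comparison, here are the three answers wrapped in methods (the method names and the second input are hypothetical). All three agree on the original input, but the [[:punct:]] split also breaks values apart at punctuation such as a hyphen, while the scan and JSON versions keep them whole. Outside Rails, JSON needs an explicit require.

```ruby
require 'json'

# Capture the text between each pair of single quotes that is followed by "," or "]".
def cars_via_scan(str)
  str.scan(/'(.+?)'(?:,|\])/).flatten
end

# Swap single quotes for double quotes so the string becomes valid JSON, then parse it.
# This breaks if a value itself contains a quote character.
def cars_via_json(str)
  JSON.parse(str.tr("'", '"'))
end

# Split on runs of punctuation/spaces; drop the empty string produced by the leading "['".
def cars_via_punct_split(str)
  str.split(/[[:punct:] ]+/).drop(1)
end

input = "['honda', 'toyota', 'lexus']"
p cars_via_scan(input)        # => ["honda", "toyota", "lexus"]
p cars_via_json(input)        # => ["honda", "toyota", "lexus"]
p cars_via_punct_split(input) # => ["honda", "toyota", "lexus"]

hyphenated = "['mercedes-benz', 'bmw']"
p cars_via_scan(hyphenated)        # => ["mercedes-benz", "bmw"]
p cars_via_json(hyphenated)        # => ["mercedes-benz", "bmw"]
p cars_via_punct_split(hyphenated) # => ["mercedes", "benz", "bmw"]
```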

Rails: Given a String, check if an Array (of strings) contains a substring of String

Is there a more Railsy way to do this (without explicit regex, perhaps?):
array_o_strings = ["some strings", "I'd like", "to parse"]
string = "like to parse"
re = Regexp.union(array_o_strings.map { |i| Regexp.new(i) })
string =~ re
Just pining for magical Rails methods.
There's really nothing wrong with using a regular expression here if that's your intent. It's generally more efficient to use one of those than to go through the trouble of comparing arrays.
It's worth noting you don't have to do that much work to get this:
re = Regexp.union(array_o_strings)
That automatically escapes those strings and compiles them into a single regular expression. Test with strings containing * and ? to be sure.
One note to add on style is that the =~ operator is a hold-over from Perl. It's preferable to use string.match(re) to make it clear what's going on there.
How big is the array? It may be worth comparing the speed using a regex vs checking each element. If the array is sorted shortest to longest that would help when checking one by one as you're more likely to find a match first.
In any event, this is one way:
array_o_strings.any?{|e| string.index(e) }
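A sketch of both checks (the method names and the "a.c" entry are hypothetical). Regexp.union escapes plain strings, so each entry is matched literally; the question's version, which wraps each string in Regexp.new first, treats entries as patterns, so a metacharacter such as . matches any character.

```ruby
# One regex built from literal strings: Regexp.union escapes each plain string.
def contains_any_regex?(string, candidates)
  string.match?(Regexp.union(candidates))
end

# The same check with no regex at all: does any candidate occur as a substring?
def contains_any_plain?(string, candidates)
  candidates.any? { |c| string.include?(c) }
end

array_o_strings = ["some strings", "I'd like", "to parse"]
string = "like to parse"
p contains_any_regex?(string, array_o_strings)  # => true
p contains_any_plain?(string, array_o_strings)  # => true

# A hypothetical entry with a regex metacharacter shows why escaping matters.
entries = ["a.c"]
p "abc".match?(Regexp.union(entries))                            # => false ("." is literal)
p "abc".match?(Regexp.union(entries.map { |e| Regexp.new(e) }))  # => true ("." matches any character)
```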

Rails strip all except numbers commas and decimal points

Hi, I've been struggling with this for the last hour and am no closer. How exactly do I strip everything except numbers, commas and decimal points from a string in Rails? The closest I have so far is:
rate = rate.gsub!(/[^0-9]/i, '')
This strips everything but the numbers. When I try to add commas to the expression, everything gets stripped. I got the above from somewhere else, and as far as I can gather:
^ = not
Everything to the left of the comma gets replaced by what's in the '' on the right
No idea what the /i does
I'm very new to gsub. Does anyone know of a good tutorial on building expressions?
Thanks
Try:
rate = rate.gsub(/[^0-9,\.]/, '')
Basically, you know the ^ means "not" when it's inside the character class brackets [] you're using, so you can just add the comma to the list. Outside a character class, the period is a special character that means "match anything", so it's often escaped with a backslash; inside [] it's already literal, so the escape is optional but harmless.
Also, be aware of whether you are using gsub or gsub!.
gsub! has the bang, so it modifies the string it's called on in place rather than returning a new one. It also returns nil if nothing was replaced, so don't assign its result back to the variable.
So if using gsub! it would be:
rate.gsub!(/[^0-9,\.]/, '')
And rate would be altered.
If you do not want to alter the original variable, then you can use the version without the bang (and assign it to a different var):
cleaned_rate = rate.gsub(/[^0-9,\.]/, '')
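A small sketch on a made-up input (the clean_rate method name is illustrative). It shows that the non-bang gsub returns a new string and leaves the original alone, that escaping the period inside [] makes no difference, and that gsub! returns nil when it has nothing to replace, which is why assigning its result (rate = rate.gsub!(...)) can leave you with nil.

```ruby
# Keep only digits, commas and periods; returns a new string.
def clean_rate(rate)
  rate.gsub(/[^0-9,.]/, '')
end

rate = "$1,234.50 USD"
cleaned = clean_rate(rate)
p cleaned  # => "1,234.50"
p rate     # => "$1,234.50 USD" (unchanged)

# Inside [], "." is already literal, so escaping it gives the same result.
p rate.gsub(/[^0-9,\.]/, '') == cleaned  # => true

# gsub! modifies its receiver in place when something changed...
copy = rate.dup
copy.gsub!(/[^0-9,.]/, '')
p copy  # => "1,234.50"

# ...but returns nil when nothing was replaced, so `x = x.gsub!(...)` can set x to nil.
p cleaned.dup.gsub!(/[^0-9,.]/, '')  # => nil
```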
I'd just google for tutorials. I haven't used one. Regexes are a LOT of time and trial and error (and table-flipping).
This is a cool tool to use with a mini cheat-sheet on it for ruby that allows you to quickly edit and test your expression:
http://rubular.com/
You can just add the comma and period in the square-bracketed expression:
rate.gsub(/[^0-9,.]/, '')
You don't need the i for case-insensitivity for numbers and symbols.
There's lots of info on regular expressions, regex, etc. Maybe search for those instead of gsub.
You can use this:
rate = rate.gsub(/[^0-9\.\,]/, '')
(Ruby has no /g regex flag; gsub already replaces every match.)
Also check this out to learn more about regular expressions:
http://www.regexr.com/

Regular expression in Ruby

Could anybody help me write a proper regular expression for a bunch of text in Ruby? I've tried a lot, but I don't know how to handle variable-length titles.
The string will be of format <sometext>title:"<actual_title>"<sometext>. I want to extract actual_title from this string.
I tried /title:"."/ but it doesn't find any matches, because it expects the closing quotation mark exactly one character after the opening one. I couldn't figure out how to make it match a string of variable length. Any help is appreciated. Thanks.
. matches any single character. Putting + after a character will match one or more of those characters, so .+ will match one or more characters of any sort. Also, you should put a question mark after it to make it lazy, so that it stops at the first closing quotation mark it comes across. So:
/title:"(.+?)"/
The parentheses are necessary if you want to extract the title text that it matched out of there.
/title:"([^"]*)"/
The parentheses create a capturing group. Inside is first a character class. The ^ means it's negated, so it matches any character that's not a ". The * means 0 or more. You can change it to one or more by using + instead of *.
I like /title:"(.+?)"/ because of its use of lazy matching, which stops the .+ from consuming all text up to the last " on the line.
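To see what the lazy ? buys you, here is a sketch on a made-up line with two quoted values (the constant names are illustrative). String#[] with a regexp and a group index returns just the captured text. The greedy version runs on to the last quote on the line, while the lazy and negated-class versions both stop at the first closing quote.

```ruby
GREEDY_TITLE  = /title:"(.+)"/     # .+ runs on to the LAST quote on the line
LAZY_TITLE    = /title:"(.+?)"/    # .+? stops at the FIRST closing quote
NEGATED_TITLE = /title:"([^"]*)"/  # [^"]* can never step over a quote

text = 'id:7 title:"Moby Dick" author:"Herman Melville"'

# text[regexp, 1] returns capture group 1, or nil if there is no match.
p text[GREEDY_TITLE, 1]   # => "Moby Dick\" author:\"Herman Melville"
p text[LAZY_TITLE, 1]     # => "Moby Dick"
p text[NEGATED_TITLE, 1]  # => "Moby Dick"
```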
It won't work if the string wraps lines or includes escaped quotes.
In programming languages where you want to be able to include the string delimiter inside a string, you usually provide an 'escape' character or sequence.
If your escape character were \, you could write something like this:
/title:"((?:\\"|[^"])+)"/
A railroad diagram of this regex shows the order in which things are parsed: imagine you are a train starting at the left. You consume title:", then \" if you can; if you can't, you consume any character that is not a ". The > marks the preferred path, so you try to loop; when you can't, you have to consume a " to finish.
I made the diagram with https://regexper.com/#%2Ftitle%3A%22((%3F%3A%5C%5C%22%7C%5B%5E%22%5D)%2B)%22%2F
There is now also a plugin for the Atom text editor that does this.
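A quick check of the escape-aware pattern, assuming (as the answer does) that backslash is the escape character. The sample text is made up, and it is written as a single-quoted Ruby literal so that \" stays a real backslash followed by a quote.

```ruby
# Each repetition consumes either an escaped quote (\") or any non-quote character.
ESCAPE_AWARE_TITLE = /title:"((?:\\"|[^"])+)"/

# Single-quoted literal: \" here is a backslash followed by a quote.
text = 'title:"The \"Best\" Years" year:"1946"'

# The lazy pattern stops at the first quote, even an escaped one.
puts text[/title:"(.+?)"/, 1]  # prints: The \

raw = text[ESCAPE_AWARE_TITLE, 1]
puts raw                       # prints: The \"Best\" Years

# Optionally turn the escaped quotes back into plain quotes.
puts raw.gsub('\"', '"')       # prints: The "Best" Years
```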

Resources