Regular Expression Attack Vector? - ruby-on-rails

How does one "parameterize" variable input into a Regex in Ruby? For example, I'm doing the following:
q = params[:q]
all_values.collect { | col | [col.name] if col.name =~ /(\W|^)#{q}/i }.compact
Since it (#{q}) is a variable from an untrusted source (the query string), I have to assume it could be an attack vector. Any best practices here?

Try Regexp.escape:
>> Regexp.escape('foo\bar\baz$+')
=> "foo\\\\bar\\\\baz\\$\\+"
So your code would look something like:
q = params[:q]
re = Regexp.escape(q)
all_values.collect { | col | [col.name] if col.name =~ /(\W|^)#{re}/i }.compact

So, do you want the user to be able to provide an arbitrary regular expression, or just some literal text that surely can be quoted? If the user can provide both a regular expression and the text it will be attempted matched against, it's not hard to do a DoS-attack by providing an expression with exponential runtime.

Related

Changing text based on the final letter of user name using regular expression

I am looking to change the ending of the user name based on the use case (in the language system will operate, names ends depending on how it is used).
So need to define all endings of names and define the replacement for them.
Was suggested to use .gsub regular expression to search and replace in a string:
Changing text based on the final letter of user name
"name surname".gsub(/e\b/, 'ai')
this will replace e with ai, so "name surname = namai surnamai".
How can it be used for more options like: "e = ai, us = mi, i = as" on the same record?
thanks
You can use String#gsub with block. Docs say:
In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.
So you can use a regex with concatenation of all substrings to be replaced and then replace it in the block, e.g. using a hash that maps matches to replacements.
Full example:
replacements = {'e'=>'ai', 'us'=>'mi', 'i' => 'as'}
['surname', 'surnamus', 'surnami'].map do |s|
s.gsub(/(e|us|i)$/){|p| replacements[p] }
end
#Sundeep makes an important observation in a comment on the question. If, for example, the substitutions were give by the following hash:
g = {'e'=>'ai', 's'=>'es', 'us'=>'mi', 'i' => 'as'}
#=> {"e"=>"ai", "s"=>"es", "us"=>"mi", "i"=>"as"}
'surnamus' would be converted (incorrectly) to 'surnamues' merely because 's'=>'es' precedes 'us'=>'mi' in g. That situation may not exist at present, but it may be prudent to allow for it in future, particularly because it is so simple to do so:
h = g.sort_by { |k,_| -k.size }.to_h
#=> {"us"=>"mi", "e"=>"ai", "s"=>"es", "i"=>"as"}
arr = ['surname', 'surnamus', 'surnami', 'surnamo']
The substitutions can be done using the form of String##sub that employs a hash as its second argument.
r = /#{Regexp.union(h.keys)}\z/
#=> /(?-mix:us|e|s|i)\z/i
arr.map { |s| s.sub(r,h) }
#=> ["surnamai", "surnammi", "surnamas", "surnamo"]
See also Regexp::union.
Incidentally, though key-insertion order has been guaranteed for hashes since Ruby v1.9, there is a continuing debate as to whether that property should be made use of in Ruby code, mainly because there was no concept of key order when hashes were first used in computer programs. This answer provides a good example of the benefit of exploiting key order.

Ruby on Rails: Checking for valid regex does not work properly, high false rate

In my application I've got a procedure which should check if an input is valid or not. You can set up a regex for this input.
But in my case it returns false instead of true. And I can't find the problem.
My code looks like this:
gaps.each_index do | i |
if gaps[i].first.starts_with? "~"
# regular expression
begin
regex = gaps[i].first[1..-1]
# a pipe is used to seperate the regex from the solution-string
if regex.include? "|"
puts "REGEX FOUND ------------------------------------------"
regex = regex.split("|")[0..-2].join("|")
end
reg = Regexp.new(regex, true)
unless reg.match(data[i])
puts "REGEX WRONGGGG -------------------"
#wrong_indexes << i
end
rescue
end
else
# normal string
if data[i].nil? || data[i].strip != gaps[i].first.strip
#wrong_indexes << i
end
end
An example would be:
[[~berlin|berlin]]
The left one before the pipe is the regex and the right one next to the pipe is the correct solution.
This easy input should return true, but it doesn't.
Does anyone see the problem?
Thank you all
EDIT
Somewhere in this lines must be the problem:
if regex.include? "|"
puts "REGEX FOUND ------------------------------------------"
regex = regex.split("|")[0..-2].join("|")
end
reg = Regexp.new(regex, true)
unless reg.match(data[i])
Update: Result without ~
The whole point is that you are initializing regex using the Regexp constructor
Constructs a new regular expression from pattern, which can be either a String or a Regexp (in which case that regexp’s options are propagated, and new options may not be specified (a change as of Ruby 1.8).
However, when you pass the regex (obtained with regex.split("|")[0..-2].join("|")) to the constructor, it is a string, and reg = Regexp.new(regex, true) is getting ~berlin (or /berlin/i) as a literal string pattern. Thus, it actually is searching for something you do not expect.
See, regex= "[[/berlin/i|berlin]]" only finds a *literal /berlin/i text (see demo).
Also, you need to get the pattern from the [[...]], so strip these brackets with regex = regex.gsub(/\A\[+|\]+\z/, '').split("|")[0..-2].join("|").
Note you do not need to specify the case insensitive options, since you already pass true as the second parameter to Regexp.new, it is already case-insensitive.
If you are performing whole word lookup, add word boundaries: regex= "[[\\bberlin\\b|berlin]]" (see demo).

ActiveSupport::Inflector::camelize - help in understanding regex

Short version:
I am having a rather hard time understanding two rather complex regular expressions in the ActiveSupport::Inflector::camelize method.
This is the definition of the camelize method:
def camelize(term, uppercase_first_letter = true)
string = term.to_s
if uppercase_first_letter
string = string.sub(/^[a-z\d]*/) { inflections.acronyms[$&] || $&.capitalize }
else
string = string.sub(/^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/) { $&.downcase }
end
string.gsub(/(?:_|(\/))([a-z\d]*)/i) { "#{$1}#{inflections.acronyms[$2] || $2.capitalize}" }.gsub('/', '::')
end
I have some difficulty understanding:
string = string.sub(/^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/) { $&.downcase }
and:
string.gsub(/(?:_|(\/))([a-z\d]*)/i) { "#{$1}#{inflections.acronyms[$2] || $2.capitalize}" }.gsub('/', '::')
Please explain to me what they mean. Thank you.
Long version
This shows me trying to understand the regex and how I interpret them to mean. It would be very helpful if you could go through this and correct my mistakes.
For the first regex
string = string.sub(/^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/) { $&.downcase }
Based on what I am seeing, inflections.acronym_regex is from the Inflections class in the ActiveSupport::Inflector module, and in the initialize method of the Inflections class,
def initialize
#plurals, #singulars, #uncountables, #humans, #acronyms, #acronym_regex = [], [], [], [], {}, /(?=a)b/
end
acronym_regex is assigned /(?=a)b/. From what I understand from http://www.ruby-doc.org/core-2.0.0/Regexp.html#class-Regexp-label-Anchors ,
(?=pat) - Positive lookahead assertion: ensures that the following characters match pat, but doesn't include those characters in the matched text
So /(?=a)b/ ensures that character a is inside the text, but we dont include character a inside the matched text, and what immediately follows character a must be character b. In other words, "abc" would match this regex, but "bbc" would not match this regex, and the matched text for "abc" would be "b" (instead of "ab").
So combining the value of inflections.acronym_regex into this regex /^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/, I do not know which of the following two regex results:
A. /^(?:/(?=a)b/(?=\b|[A-Z_])|\w)/
B. /^(?:(?=a)b(?=\b|[A-Z_])|\w)/
although I am thinking it is B. From what I understand, (?: provides grouping without capturing, (?= means positive lookahead assertion, \b matches word boundaries when outside brackets and matches backspace when inside brackets. So in english terms, regex B, when matching against a text, will find a string that begins with an a character, followed by a b character, and one of (1. backspace [whatever that may mean] 2. any uppercase character or underscore 3. any english alphabetic character, digit, or underscore).
However, I find it strange that passing upper_case_first_letter = false to the camelize function should cause it to match a string starting with the characters ab, given that that does not seem to be how the camelize function behaves.
For the second regex
string.gsub(/(?:_|(\/))([a-z\d]*)/i) { "#{$1}#{inflections.acronyms[$2] || $2.capitalize}" }.gsub('/', '::')
The regex is:
/(?:_|(\/))([a-z\d]*)/i
I am guessing that this regex will match a substring that starts with either an _ or /, followed by 0 or more (upper or lowercase english alpabetic characters or digit). Furthermore, for the first group (?:_|(\/)), whether we match the _ or /, the ([a-z\d]*) capturing group will always be regarded as the second group. I do understand the part where the block tries to look up inflections.acronyms[$2] and on failure, does $2.captitalize.
Since (?: means grouping without capturing, what is the value of $1 when we match _ ? Is it still _ ? And for the .gsub('/', '::') portion, I am guessing that it gets applied for each match in the initial gsub, instead of being applied to the overall string after the outer gsub call is done?
Apologies for the really long post. Please point out my errors in understanding the 2 regular expressions, or explain them in a better way if you can do it.
Thank you.
However, I find it strange that passing upper_case_first_letter =
false to the camelize function should cause it to match a string
starting with the characters ab, given that that does not seem to be
how the camelize function behaves.
?: acts like a . here and does match the string (ie. single character) but there is no grouping, therefore the match is in $&.
Since (?: means grouping without capturing, what is the value of $1
when we match _ ? Is it still _ ?
It's nil since there is no capturing. The value is in $2
And for the .gsub('/', '::') portion, I am guessing that it gets
applied for each match in the initial gsub, instead of being applied
to the overall string after the outer gsub call is done?
It's applied to the overall result as gsub with block returns a string and the gsub('/', '::') is outside of a block.

string format check

Suppose I have string variables like following:
s1="10$"
s2="10$ I am a student"
s3="10$Good"
s4="10$ Nice weekend!"
As you see above, s2 and s4 have white space(s) after 10$ .
Generally, I would like to have a way to check if a string start with 10$ and have white-space(s) after 10$ . For example, The rule should find s2 and s4 in my above case. how to define such rule to check if a string start with '10$' and have white space(s) after?
What I mean is something like s2.RULE? should return true or false to tell if it is the matched string.
---------- update -------------------
please also tell the solution if 10# is used instead of 10$
You can do this using Regular Expressions (Ruby has Perl-style regular expressions, to be exact).
# For ease of demonstration, I've moved your strings into an array
strings = [
"10$",
"10$ I am a student",
"10$Good",
"10$ Nice weekend!"
]
p strings.find_all { |s| s =~ /\A10\$[ \t]+/ }
The regular expression breaks down like this:
The / at the beginning and the end tell Ruby that everything in between is part of the regular expression
\A matches the beginning of a string
The 10 is matched verbatim
\$ means to match a $ verbatim. We need to escape it since $ has a special meaning in regular expressions.
[ \t]+ means "match at least one blank and/or tab"
So this regular expressions says "Match every string that starts with 10$ followed by at least one blank or tab character". Using the =~ you can test strings in Ruby against this expression. =~ will return a non-nil value, which evaluates to true if used in a conditional like if.
Edit: Updated white space matching as per Asmageddon's suggestion.
this works:
"10$ " =~ /^10\$ +/
and returns either nil when false or 0 when true. Thanks to Ruby's rule, you can use it directly.
Use a regular expression like this one:
/10\$\s+/
EDIT
If you use =~ for matching, note that
The =~ operator returns the character position in the string of the
start of the match
So it might return 0 to denote a match. Only a return of nil means no match.
See for example http://www.regular-expressions.info/ruby.html on a regular expression tutorial for ruby.
If you want to proceed to cases with $ and # then try this regular expression:
/^10[\$#] +/

<< and >> symbols in Erlang

First of all, I'm an Erlang rookie here. I need to interface with a MySQL database and I found the erlang-mysql-driver. I'm trying that out, and am a little confused by some of the syntax.
I can get a row of data from the database with this (greatly oversimplified for brevity here):
Result = mysql:fetch(P1, ["SELECT column1, column2 FROM table1 WHERE column2='", Key, "'"]),
case Result of
{data, Data} ->
case mysql:get_result_rows(Data) of
[] -> not_found;
Res ->
%% Now 'Res' has the row
So now here is an example of what `Res' has:
[[<<"value from column1">>, <<"value from column2">>]]
I get that it's a list of records. In this case, the query returned 1 row of 2 columns.
My question is:
What do the << and >> symbols mean? And what is the best (Erlang-recommended) syntax for turning a list like this into a records which I have defined like:
-record(
my_record,
{
column1 = ""
,column2 = ""
}
).
Just a small note: the results are not bit string comprehensions per see, they are just bit strings. However you can use bit string comprehensions to produce a sequence of bit strings (which is described above with the generators and that), much like lists and lists comprehensions.
you can use erlang:binary_to_list/1 and erlang:list_to_binary/1 to convert between binary and strings (lists).
The reason the mysql driver returns bit strings is probably because they are much faster to manipulate.
In your specific example, you can do the conversion by matching on the returned column values, and then creating a new record like this:
case mysql:get_result_rows(Data) of
[] ->
not_found;
[[Col1, Col2]] ->
#my_record{column1 = Col1, column2 = Col2}
end
These are bit string comprehensions.
Bit string comprehensions are analogous to List Comprehensions. They are used to generate bit strings efficiently and succinctly.
Bit string comprehensions are written with the following syntax:
<< BitString || Qualifier1,...,QualifierN >>
BitString is a bit string expression, and each Qualifier is either a generator, a bit string generator or a filter.
• A generator is written as:
Pattern <- ListExpr.
ListExpr must be an expression which evaluates to a list of terms.
• A bit string generator is written as:
BitstringPattern <= BitStringExpr.
BitStringExpr must be an expression which evaluates to a bitstring.
• A filter is an expression which evaluates to true or false.
The variables in the generator patterns shadow variables in the function clause surrounding the bit string comprehensions.
A bit string comprehension returns a bit string, which is created by concatenating the results of evaluating BitString for each combination of bit string generator elements for which all filters are true.
Example:
1> << << (X*2) >> ||
<<X>> <= << 1,2,3 >> >>.
<<2,4,6>>

Resources