I have a long list of information stored in a variable and I need to run some regex expressions against that variable and get various pieces of information from what is found.
How can you store the line that matches a regex expression in a variable?
How can you get the line number of the line that matches a regex expression?
Here is an example of what I'm talking about.
body = "service timestamps log datetime msec localtime show-timezone
service password-encryption
!
hostname switch01
!
boot-start-marker"
If I search for the line that contains "hostname" I need the line number, in this case it would be 4. I also need to store the line "hostname switch01" as another variable.
Any ideas?
Thanks!
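First you'd want to convert the string to lines: body.split("\n") (note the double quotes; in Ruby '\n' in single quotes is a literal backslash followed by n, not a newline). Then you want to add line numbers to the lines: .each_with_index. Then you want to select the matching lines: .select {|line, line_nr| line =~ your_regex }. Putting it all together:
body.split("\n").each_with_index
.select {|line, line_nr| line =~ your_regex }
.map {|line, line_nr| line_nr }
This will give you the zero-based indices of all the lines matching your_regex. Add 1 to get line numbers that start at 1, and drop the final .map if you also want to keep the matching lines themselves.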
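Applied to the sample text from the question, the idea could look roughly like this. It is only a sketch: the heredoc stands in for the real body variable, and with_index(1) is used so the numbering starts at 1, as the question expects.

```ruby
# Sample text from the question; the heredoc keeps real newlines.
body = <<~TEXT
  service timestamps log datetime msec localtime show-timezone
  service password-encryption
  !
  hostname switch01
  !
  boot-start-marker
TEXT

# each_line yields every line; with_index(1) numbers them starting at 1.
# find returns the first [line, number] pair whose line matches, or nil.
match = body.each_line(chomp: true).with_index(1).find { |line, _nr| line =~ /hostname/ }

# Destructure the pair into two separate variables (both nil if nothing matched).
matched_line, line_number = match
puts "#{line_number}: #{matched_line}" if match
```

Use select instead of find if more than one line can match and you need all of them.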
Let's say you have an object file that provides a #lines method:
lines = file.lines.each_with_index.select {|line, i| line =~ /regex/ }
If you already have a list of lines you can leave out the call to #lines. If you have a string you can use string.split("\n").
This will result in the variable lines containing an array of 2-element arrays with the line that matched your RegEx and the index of the line in the original file.
Breakdown
file.lines gets the lines - of course the other methods I mentioned might also apply here for you. We then add the index to each element with #each_with_index, because you want to store these as well. This has the same effect as #map.with_index {|e, i| [e, i]}, i.e. map every element to [element, index]. We then use the #select method to get all lines that do match your RegEx (FYI, =~ is the matching operator in Ruby, Perl and other languages - in case you didn't already know). We're done after that, but you might need to further transform the data so you can process it.
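To make the shape of the result concrete, here is a small sketch on a made-up array of lines (the sample data is hypothetical, not from the question):

```ruby
lines = ["alpha", "beta 1", "gamma", "beta 2"] # hypothetical sample data

# each_with_index pairs every element with its zero-based index.
pairs = lines.each_with_index.to_a

# map.with_index builds the same pairs explicitly.
same_pairs = lines.map.with_index { |e, i| [e, i] }

# select keeps only the pairs whose line matches the pattern.
matches = lines.each_with_index.select { |line, _i| line =~ /beta/ }

# Each entry is a [line, index] pair that can be destructured in a block.
matches.each { |line, i| puts "index #{i}: #{line}" }
```

Note that the indices are zero-based, so add 1 if you want conventional line numbers.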
Related
I am trying to apply the following regex to one of my views:
^([^\s]+)\s+
This is to remove any string of consecutive non-whitespace characters including any white space characters that follow from the start of the line (remove everything except the first word). I have input it on Rubular and it works.
I was wondering how I would be able to apply it to my rails project. Would I create a rails helper method? So far I have tested it in irb and it is not returning the right value:
I would like to know how I can fix my method and if making it a helper method is the right approach. Thank you very much for your help guys!
The =~ operator matches the regular expression against a string. It returns the offset of the match within the string if one is found, otherwise nil. It does not return the matched text.
You could use String#match instead and work with the MatchData it returns,
like
str.match(/^([^\s]+)\s+/)
Or skip the regex for readability: split the string on whitespace, which returns an array of the words, and take the first one, like:
str.split(' ').first
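A short sketch of these options side by side, using a made-up input string. Since the question can be read either as "keep the first word" or "remove the first word", both are shown; \S is used here as a shorthand for [^\s].

```ruby
str = "first second third" # hypothetical input

# String#match returns a MatchData object (or nil); [1] is the first capture group.
md = str.match(/^(\S+)\s+/)
first_word = md && md[1]

# String#[] with a regex returns the matched text directly.
first_word_alt = str[/^\S+/]

# The non-regex variant from above.
first_word_split = str.split(' ').first

# Removing the first word and its trailing whitespace instead.
rest = str.sub(/^\S+\s+/, "")
```

Whether this belongs in a Rails helper depends on where it is used; a plain method like this works the same either way.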
I have a test_list.txt file containing lines of file names. Each file name contains the date when the file was created. Here's what it looks like:
test_list.txt:
UTF_06012018_SAMPLE_Control.xlsx
UTF_06022018_SAMPLE_Control.xlsx
UTF_06092018_SAMPLE_Control.xlsx
UTF_06022018_SAMPLE_Control.xlsx
UTF_06082018_SAMPLE_Control.xlsx
UTF_06032018_SAMPLE_Demand.xlsx
UTF_06092018_SAMPLE_Demand.xlsx
UTF_06122018_SAMPLE_Demand.xlsx
UTF_06032018_SAMPLE_Control.xlsx
UTF_06022018_SAMPLE_Demand.xlsx
The date in the file name is in the format mmddyyyy. Also, there are files which were created on the same date. What I'm trying to do is to print the line that matches the regex expression for the dates and sort them alphabetically by date.
Here's my code so far:
path = Dir.glob('/path/to/my/file/*.txt').first
regex = /(\d{1,2}\d{1,2}\d{4})/
samplefile = File.open(path)
string = File.read(samplefile)
string.scan(regex).each do|x|
sorted = x.sort_by { |s| s.scan(/\d+/).first.to_i }
puts sorted
end
However, what my code does is it only prints the dates, not the entire line. To add to that, it doesn't even sort them alphabetically. How to tweak it and make it do as I intend to?
You may use
string.scan(/^([^_]*_(\d++)(.*))/).sort_by { |m,n,z| [n.to_i,z] }.collect{ |m,n,z| m}.join("\n")
See the Ruby demo.
The regex will extract all lines into a three element array with the following values: whole line, the date string, and the string after the date. Then, .sort_by { |m,n,z| [n.to_i,z] } will sort by the date string first, and then by the substring after the date. The .collect{ |m,n,z| m} will only keep the first value of the array elements and .join("\n") will re-build the resulting string.
Note that [n.to_i,z] only sorts correctly while all dates fall in the same year, because the digits are in mmddyyyy order. To be safe, parse the date string first and use [Date.strptime(n,"%m%d%Y"),z] (add require 'date').
Regex details
^ - start of a line
([^_]*_(\d++)(.*)) - Group 1 (m): the whole line meeting the following patterns:
[^_]* - zero or more chars other than _
_ - an underscore
(\d++) - Group 2 (n): 1+ digits, a possessive match
(.*) - Group 3 (z): the rest of the line.
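Here is how the date-parsing variant might look on a few of the file names from the question. This is a sketch: the names are joined into a string to stand in for the contents read from test_list.txt, and the "%m%d%Y" format assumes the mmddyyyy layout the question describes.

```ruby
require 'date'

# A few file names from the question, joined to mimic the file's contents.
names = %w[
  UTF_06092018_SAMPLE_Control.xlsx
  UTF_06012018_SAMPLE_Control.xlsx
  UTF_06032018_SAMPLE_Demand.xlsx
  UTF_06032018_SAMPLE_Control.xlsx
]
string = names.join("\n")

# scan yields [whole line, date digits, rest] for each line.
# Sort by the parsed date first, then by the text after the date.
sorted = string.scan(/^([^_]*_(\d++)(.*))/)
               .sort_by { |_m, n, z| [Date.strptime(n, "%m%d%Y"), z] }
               .map { |m, _n, _z| m }

puts sorted
```

Comparing Date objects rather than integers keeps the order correct even when the list spans more than one year.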
I am trying to use word wrap such that it can wrap a single long string of text within an array on multiple lines.
I found the link below to be somewhat helpful:
http://apidock.com/rails/ActionView/Helpers/TextHelper/word_wrap
I copied the regex listed in the link above, which matches a single long string within an array:
def breaking_wrap_wrap(msg, col = 80)
msg.gsub(/(.{1,#{col}})( +|$\n?)|(.{1,#{col}})/,"\\1\\3\n")
end
However, I'm not sure how to fix the regex to match the string within the array:
undefined method `gsub' for ["adsgagsdgds"]:Array
I couldn't get it to work in Rubular or another gsub testing tool, so was wondering if anyone knew how to match the array above and break the line if a single string in the array is greater than 80 characters?
Thanks!
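The error message suggests the regex itself is not the problem: gsub is a String method and is being called on an Array. One possible fix, sketched here under that assumption, is to leave the regex alone and apply the method to each element with map. The method name breaking_wrap and the column width of 5 are chosen only for the demonstration.

```ruby
# Same regex as in the question; it expects a String, not an Array.
def breaking_wrap(msg, col = 80)
  msg.gsub(/(.{1,#{col}})( +|$\n?)|(.{1,#{col}})/, "\\1\\3\n")
end

messages = ["adsgagsdgds"] # the array from the error message

# Wrap every string in the array individually (col = 5 just to force a break).
wrapped = messages.map { |m| breaking_wrap(m, 5) }

puts wrapped
```

If the array always holds exactly one string, calling breaking_wrap(messages.first) would work as well.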
I am parsing a YAML file and searching for specific values. After the search matches, I want to get the line number and print it. I managed to do exactly that, but the major problem is that while parsing the YAML file using YAML.load, the blank lines are ignored.
I can count the rest of the lines using keys, i.e. 1 key per line, but I am unable to count blank lines. Please help, I've been stuck with this for a few days now.
this is how my code looks like:
hash = YAML.load(IO.read(File.join(File.dirname(__FILE__), 'en.yml')))
def recursive_hash_to_yml_string(input, hash, depth = 0)
  hash.keys.each do |search|
    @count = @count + 1
    if hash[search].is_a?(String) && hash[search] == input
      @yml_array.push(@count)
    elsif hash[search].is_a?(Hash)
      recursive_hash_to_yml_string(input, hash[search], depth + 1)
    end
  end
end
I agree with @Wukerplank - parsing a file should ignore blank lines. You might want to think about finding the line number using a different approach.
Perhaps you don't need to parse the YAML at all. If you are just searching the file for some matching text and returning the line number, maybe you'd manage better reading the file line by line with File.foreach (or each_line on an open file).
You could iterate over each line in the file until you found a match and then do something with the line number.
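A minimal sketch of that approach follows. The method name matching_line_numbers and the sample YAML content are made up for illustration; blank lines are counted because the file is read as plain text rather than parsed.

```ruby
require 'tempfile'

# Returns the 1-based numbers of all lines in the file at `path` that match `pattern`.
def matching_line_numbers(path, pattern)
  File.foreach(path).with_index(1)
      .select { |line, _nr| line =~ pattern }
      .map { |_line, nr| nr }
end

# Demo with a throwaway file that contains blank lines.
Tempfile.create(['en', '.yml']) do |f|
  f.write("en:\n\n  greeting: Hello\n\n  farewell: Goodbye\n")
  f.flush
  p matching_line_numbers(f.path, /Goodbye/)
end
```

The trade-off is that this matches raw text, so a value that also appears inside a key or a comment would be reported too.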
I have the string "(1,2,3,4,5,6),(1,2,3)" and I would like to change it to "('1','2','3','4','5','6'),('1','2','3')" - replace all parts that match /([^,)("])/ with '$1', '$2', etc.
"(1,2,3,4,5,6),(1,2,3)".gsub(/([^,)("]\w*)/,"'\\1'")
gsub is a "global replace" method in String class. It finds all occurrences of given regular expression and replaces them with the string given as the second parameter (as opposed to sub which replaces first occurrence only). That string can contain references to groups marked with () in the regexp. First group is \1, second is \2, and so on.
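A quick sketch of the difference between gsub and sub on the string from the question (the variable names are only for illustration):

```ruby
input = "(1,2,3,4,5,6),(1,2,3)"

# gsub replaces every match; \\1 refers to the first capture group.
quoted_all = input.gsub(/([^,)("]\w*)/, "'\\1'")

# sub replaces only the first match.
quoted_first = input.sub(/([^,)("]\w*)/, "'\\1'")

puts quoted_all
puts quoted_first
```

In a double-quoted replacement string the backslash has to be doubled ("\\1"); in a single-quoted one, '\1' is enough.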
Try
mystring.gsub(/([\w.]+)/, '\'\1\'')
This will replace numbers (ints/floats) and words with their "quote-surrounded" selves while leaving punctuation (except the dot) alone.
UPDATED: I think you want to search for this
(([^,)("])+)
And replace it with this
'$1'
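This looks for one or more characters other than a comma, a parenthesis, or a double quote, and captures the run in group 1 because of the outer parentheses. The replacement then wraps whatever was captured in single quotes. Note that in Ruby's gsub the backreference in the replacement string is written \1 rather than $1.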