Ruby Parsing an array of string - ruby-on-rails

This is some simple code to parse a single string in Ruby:
str = "Amdh#34HB!x"
length = str.length
upper = str.scan(/[A-Z]/).count #count upper chars
lower = str.scan(/[a-z]/).count #count lower chars
digit = str.scan(/[0-9]/).count #count digits
special = str.scan(/[^a-z0-9]/i).count #count special chars
result = "Length : " + length.to_s + " Upper : " + upper.to_s + " Lower : " + lower.to_s + " Digit : " + digit.to_s + " Special : " + special.to_s
The result is given as "Length : 11 Upper : 3 Lower : 4 Digit : 2 Special : 2"
I want to do the same thing for an array of strings:
array = ["Amdh#34HB!x", "AzErtY45", "#1A3bhk2"]
so that I get the same details as above for each element of the array.
Example :
array[0] = "Length : 11 Upper : 3 Lower : 4 Digit : 2 Special : 2"
array[1] = "Length : 8 Upper : 3 Lower : 3 Digit : 2 Special : 0"
array[2] = "Length : 8 Upper : 1 Lower : 3 Digit : 3 Special : 1"
....
I know the answer probably just involves the each method, but I couldn't find the right way to do it.
The code above is not optimised; any suggestions for improving it are welcome!

You need only make a single pass through the string to obtain the needed counts.
def obtain_counts(str)
str.each_char.with_object(Hash.new(0)) do |c,h|
h[case(c)
when /[[:upper:]]/ then :upper
when /[[:lower:]]/ then :lower
when /[[:digit:]]/ then :digit
else :special
end] += 1
end
end
def construct_array(arr)
arr.map! { |str|
"Length : #{str.length} Upper : %d Lower : %d Digit : %d Special : %d" %
obtain_counts(str).values_at(:upper, :lower, :digit, :special) }
end
array = ["Amdh#34HB!x", "AzErtY45", "#1A3bhk2"]
construct_array(array)
#=> ["Length : 11 Upper : 3 Lower : 4 Digit : 2 Special : 2",
# "Length : 8 Upper : 3 Lower : 3 Digit : 2 Special : 0",
# "Length : 8 Upper : 1 Lower : 3 Digit : 3 Special : 1"]
array
#=> ["Length : 11 Upper : 3 Lower : 4 Digit : 2 Special : 2",
# "Length : 8 Upper : 3 Lower : 3 Digit : 2 Special : 0",
# "Length : 8 Upper : 1 Lower : 3 Digit : 3 Special : 1"]
Note that
["Amdh#34HB!x", "AzErtY45", "#1A3bhk2"].map { |str| obtain_counts(str) }
#=> [{:upper=>3, :lower=>4, :special=>2, :digit=>2},
# {:upper=>3, :lower=>3, :digit=>2},
# {:special=>1, :digit=>3, :upper=>1, :lower=>3}]
Notice that the second hash in this array has no key :special (because the second string contained no special characters). That explains why, in obtain_counts, we need Hash.new(0) (empty hash with default 0), rather than simply {}.
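The effect of the default value can be seen in isolation. The snippet below is only an illustrative sketch (the names plain and counter are made up for this example): a plain {} returns nil for a missing key, so += 1 on it would fail, whereas Hash.new(0) returns 0, which is also why values_at yields 0 for a key that was never stored.

```ruby
plain   = {}
counter = Hash.new(0)

plain[:special]      # nil, so plain[:special] += 1 would raise NoMethodError
counter[:special]    # 0, even though the key has never been stored

counter[:upper] += 1 # works: 0 + 1 is stored under :upper

counts = counter.values_at(:upper, :special)
p counts                  # [1, 0] (the default fills in the missing key)
p counter.key?(:special)  # false (reading the default does not store the key)
```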

You can use a simple map to do this:
array = ["Amdh#34HB!x", "AzErtY45", "#1A3bhk2"]
result = array.map do |str|
length = str.length
upper = str.scan(/[A-Z]/).count #count upper chars
lower = str.scan(/[a-z]/).count #count lower chars
digit = str.scan(/[0-9]/).count #count digits
special = str.scan(/[^a-z0-9]/i).count #count special chars
{length: length, upper: upper, lower: lower, digit: digit, special: special}
end
[117] pry(main)> result
=> [{:length=>11, :upper=>3, :lower=>4, :digit=>2, :special=>2},
{:length=>8, :upper=>3, :lower=>3, :digit=>2, :special=>0},
{:length=>8, :upper=>1, :lower=>3, :digit=>3, :special=>1}]

I guess you want to do this:
array = ["Amdh#34HB!x", "AzErtY45", "#1A3bhk2"]
result = []
array.each do |str|
length = str.length
upper = str.scan(/[A-Z]/).count #count upper chars
lower = str.scan(/[a-z]/).count #count lower chars
digit = str.scan(/[0-9]/).count #count digits
special = str.scan(/[^a-z0-9]/i).count #count special chars
result << "Length : " + length.to_s + " Upper : " + upper.to_s + " Lower : " + lower.to_s + " Digit : " + digit.to_s + " Special : " + special.to_s
end
puts result

Try this solution:
def string_check(str)
result = str.scan(/([A-Z])|([a-z])|([0-9])|([^a-z0-9])/).inject([0,0,0,0]) do
|sum,arr| sum.map.with_index{|e,i| e+(arr[i].nil? ? 0: 1) if !arr.nil?}
end
"Length : #{str.size} Upper : #{result[0]} Lower : #{result[1]} Digit : #{result[2]} Special : #{result[3]}"
end
array = ["Amdh#34HB!x", "AzErtY45", "#1A3bhk2"]
array.each {|s| puts string_check(s)}
outputs:
Length : 11 Upper : 3 Lower : 4 Digit : 2 Special : 2
Length : 8 Upper : 3 Lower : 3 Digit : 2 Special : 0
Length : 8 Upper : 1 Lower : 3 Digit : 3 Special : 1

More DRY solution:
array = ["Amdh#34HB!x", "AzErtY45", "#1A3bhk2"]
formats = { Upper: /[A-Z]/,
Lower: /[a-z]/,
Digit: /[0-9]/,
Special: /[^a-z0-9]/i }
array.map do |e|
"Length: #{e.length}, " +
formats.map {|k, v| "#{k}: #{e.scan(v).count}" }.join(', ')
end
#=> ["Length: 11, Upper: 3, Lower: 4, Digit: 2, Special: 2",
# "Length: 8, Upper: 3, Lower: 3, Digit: 2, Special: 0",
# "Length: 8, Upper: 1, Lower: 3, Digit: 3, Special: 1"]

Here's a start to help you move into a more OO Ruby script:
class Foo
attr_reader :str, :str_len, :upper, :lower, :digit, :punctuation
def initialize(str)
@str = str
@str_len = str.length
@upper, @lower, @digit, @punctuation = %w[upper lower digit punct].map { |re| str.scan(/[[:#{re}:]]/).count }
end
def to_s
('"%s": ' % str) +
[:str_len, :upper, :lower, :digit, :punctuation].map { |s|
'%s: %s' % [s.to_s.upcase, instance_variable_get("@#{s}")]
}.join(' ')
end
end
array = ["Amdh#34HB!x", "AzErtY45", "#1A3bhk2"].map { |s| Foo.new(s) }
puts array.map(&:to_s)
Which, when run, outputs:
"Amdh#34HB!x": STR_LEN: 11 UPPER: 3 LOWER: 4 DIGIT: 2 PUNCTUATION: 2
"AzErtY45": STR_LEN: 8 UPPER: 3 LOWER: 3 DIGIT: 2 PUNCTUATION: 0
"#1A3bhk2": STR_LEN: 8 UPPER: 1 LOWER: 3 DIGIT: 3 PUNCTUATION: 1
The regular expression classes like [[:upper:]] are POSIX definitions, which help relieve some of the visual noise of a traditional expression's classes. See the Regexp documentation for more information.
It can be DRYer but that's an exercise for the student. You should be able to coerce this into something closer to what you want.
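One practical difference between the POSIX classes and the ASCII ranges used in the question can be sketched as follows. This is only an illustration with made-up sample strings, and it assumes UTF-8 strings: [[:upper:]] also matches non-ASCII uppercase letters, and [[:punct:]] does not count a space, whereas the question's [^a-z0-9] pattern counts everything that is not an ASCII letter or digit.

```ruby
word = "\u00C9clair"   # "Éclair", written with an escape so the string is UTF-8

ascii_upper = word.scan(/[A-Z]/).count        # "É" is outside the range A-Z
posix_upper = word.scan(/[[:upper:]]/).count  # the POSIX class also covers "É"

spaced = "a b"
special_count = spaced.scan(/[^a-z0-9]/i).count   # the space counts as "special"
punct_count   = spaced.scan(/[[:punct:]]/).count  # the space is not punctuation

p [ascii_upper, posix_upper]    # [0, 1]
p [special_count, punct_count]  # [1, 0]
```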

Related

Parsing and count words from multi lines text in Lua

Say I have the multi-lines text:
str = [[
The lazy dog sleeping on the yard.
While a lazy old man smoking.
The yard never green again.
]]
I can split it into words using:
for w in str:gmatch("%S+") do print(w) end
But how can I get results like this, for example:
The = 3 words, line 1,3
Lazy = 2 words, line 1,2
Dog = 1 word, line 1
..and so on?
Thank you
You can split the text into lines with gmatch, the same way you already split it into words.
The pattern would be something like "[^\n]+", and the code something like this:
local str = [[
The lazy dog sleeping on the yard.
While a lazy old man smoking.
The yard never green again.
]]
local words = {}
local lines = {}
local line_count = 0
for l in str:gmatch("[^\n]+") do
line_count = line_count + 1
for w in l:gmatch("[^%s%p]+") do
w = w:lower()
words[w] = words[w] and words[w] + 1 or 1
lines[w] = lines[w] or {}
if lines[w][#lines[w]] ~= line_count then
lines[w][#lines[w] + 1] = line_count
end
end
end
for w, count in pairs(words) do
local the_lines = ""
for _,line in ipairs(lines[w]) do
the_lines = the_lines .. line .. ','
end
--The = 3 words, line 1,3
print(w .." = " .. count .. " words , lines " .. the_lines)
end
The full output is below. Note that I also changed the pattern you used for capturing the words to "[^%s%p]+"; I did this to remove the . that was otherwise attached to smoking, again, and yard.
smoking = 1 words , lines 2,
while = 1 words , lines 2,
green = 1 words , lines 3,
never = 1 words , lines 3,
on = 1 words , lines 1,
lazy = 2 words , lines 1,2,
the = 3 words , lines 1,3,
again = 1 words , lines 3,
man = 1 words , lines 2,
yard = 2 words , lines 1,3,
dog = 1 words , lines 1,
old = 1 words , lines 2,
a = 1 words , lines 2,
sleeping = 1 words , lines 1,

How to get each individual digit of a given number in Basic?

I have a program downloaded from the internet and need to print each digit of a three-digit number. For example:
Input: 123
Expected Output:
1
2
3
I have 598
Need to Get:
5
9
8
I tried using this formula, but it fails when the division produces a decimal part:
FIRST_DIGIT = (number mod 1000) / 100
SECOND_DIGIT = (number mod 100) / 10
THIRD_DIGIT = (number mod 10)
Where number is the example above, here is the calculation:
FIRST_DIGIT = (598 mod 1000) / 100 = 5.98 <== FAILED... I need to get 5, but my program shows 0 because of the decimal point
SECOND_DIGIT = (598 mod 100) / 10 = 9.8 <== FAILED... I need to get 9, but my program shows 0 because of the decimal point
THIRD_DIGIT = (598 mod 10) = 8 <== CORRECT... the program outputs 8 and this digit is correct.
So my question is: is there a simpler or more efficient way to get each digit of a number without the decimal part? I don't want to round to the nearest number, because that sometimes fails when the fractional part is larger than .5.
Thanks
The simplest solution is to use integer division (\) instead of floating point division (/).
If you replace each one of your examples with the backslash (\) instead of forward slash (/) they will return integer values.
FIRST_DIGIT = (598 mod 1000) \ 100 = 5
SECOND_DIGIT = (598 mod 100) \ 10 = 9
THIRD_DIGIT = (598 mod 10) = 8
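For comparison, the same arithmetic can be sketched in Ruby rather than BASIC (the choice of Ruby is an assumption of this example; the question does not name a dialect). In Ruby, / applied to two integers already performs integer division, so it plays the role of \ above, while dividing by a float reproduces the unwanted decimal result.

```ruby
number = 598

first_digit  = (number % 1000) / 100  # integer division drops the .98
second_digit = (number % 100) / 10    # integer division drops the .8
third_digit  = number % 10

float_result = (number % 1000) / 100.0  # floating-point division keeps the fraction

puts first_digit, second_digit, third_digit  # prints 5, 9 and 8 on separate lines
puts float_result                            # prints 5.98
```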
You don't have to do any fancy integer calculations as long as you pull it apart from a string:
INPUT X
X$ = STR$(X)
FOR Z = 1 TO LEN(X$)
PRINT MID$(X$, Z, 1)
NEXT
Then, for example, you could act upon each string element:
INPUT X
X$ = STR$(X)
FOR Z = 1 TO LEN(X$)
Q = VAL(MID$(X$, Z, 1))
N = N + 1
PRINT "Digit"; N; " equals"; Q
NEXT
Additionally, you could tear apart the string character by character:
INPUT X
X$ = STR$(X)
FOR Z = 1 TO LEN(X$)
SELECT CASE MID$(X$, Z, 1)
CASE " ", ".", "+", "-", "E", "D"
' special char
CASE ELSE
Q = VAL(MID$(X$, Z, 1))
N = N + 1
PRINT "Digit"; N; " equals"; Q
END SELECT
NEXT
I'm no expert in Basic, but it looks like you have to convert the floating-point number to an integer. A quick Google search told me that you can use Int(floating_point_number) to convert a float to an integer.
So
Int((number mod 100)/ 10)
should probably be the one you are looking for.
And, finally, all string elements could be parsed:
INPUT X
X$ = STR$(X)
PRINT X$
FOR Z = 1 TO LEN(X$)
SELECT CASE MID$(X$, Z, 1)
CASE " "
' nul
CASE "E", "D"
Exponent = -1
CASE "."
Decimal = -1
CASE "+"
UnaryPlus = -1
CASE "-"
UnaryNegative = -1
CASE ELSE
Q = VAL(MID$(X$, Z, 1))
N = N + 1
PRINT "Digit"; N; " equals"; Q
END SELECT
NEXT
IF Exponent THEN PRINT "There was an exponent."
IF Decimal THEN PRINT "There was a decimal."
IF UnaryPlus THEN PRINT "There was a plus sign."
IF UnaryNegative THEN PRINT "There was a negative sign."

Doesn't remove duplicate elements from array

I am trying to read user input into an array and remove the duplicate elements, but the result is weird. I don't have to use uniq or any other Ruby method. Here is my code:
digits = []
digits = gets.chomp.to_i
k= digits & digits
puts k
input - 1 2 3 4 1 2 3
Required output- 1 2 3 4
Getting output 1
gets.chomp returns the string "1 2 3 4 1 2 3".
Then you call to_i on that string:
"1 2 3 4 1 2 3".to_i => 1
Consequently, 1 & 1 => 1
You should do this:
digits = gets.chomp.split(' ').map(&:to_i)
k = digits & digits
puts k
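If the intent is to avoid uniq and similar helpers altogether, a hand-written loop is one option. The sketch below is an illustration only: it uses a fixed string in place of gets so that it runs on its own, and the names input and unique are made up for this example.

```ruby
input = "1 2 3 4 1 2 3"           # stands in for gets.chomp

first_only = input.to_i            # to_i stops parsing at the first space

digits = input.split.map(&:to_i)   # one integer per whitespace-separated token

# Keep the first occurrence of each value, preserving order.
unique = []
digits.each do |d|
  unique << d unless unique.include?(d)
end

p first_only           # 1
puts unique.join(' ')  # prints "1 2 3 4"
```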

Using Regex and ruby regular expressions to find values

So I'm currently trying to sort values from a file. I'm stuck on finding the first attribute, and am not sure why. I'm new to regex and Ruby, so I'm not sure how to approach the problem. I'm trying to find the values of a, b, c, d, e, which are all positive numbers.
Here's what the line will look like
length=<a> begin=(<b>,<c>) end=(<d>,<e>)
Here's what I'm using to find the values
current_line = file.gets
if current_line == nil then return end
while current_line = file.gets do
if line =~ /length=<(\d+)> begin=((\d+),(\d+)) end=((\d+),(\d+))/
length, begin_x, begin_y, end_x, end_y = $1, $2, $3, $4, $5
puts("length:" + length.to_s + " begin:" + begin_x.to_s + "," + begin_y.to_s + " end:" + end_x.to_s + "," + end_y.to_s)
end
end
For some reason it never prints anything, so I'm assuming it never finds a match.
Sample input
length=4 begin=(0,0) end=(3,0)
A line with 2 integers followed by 0-4 decimals separated by commas.
So it could be any of these:
2 4 1.3434324,3.543243,4.525324
1 2
18 3.3213,9.3233,1.12231,2.5435
7 9 2.2,1.899990
0 3 2.323
Here is your regex:
r = /length=<(\d+)> begin=((\d+),(\d+)) end=((\d+),(\d+))/
str.scan(r)
#=> []
First, we need to escape the parentheses:
r = /length=<(\d+)> begin=\((\d+),(\d+)\) end=\((\d+),(\d+)\)/
Next, add the missing < and > after "begin" and "end".
r = /length=<(\d+)> begin=\(<(\d+)>,<(\d+)>\) end=\(<(\d+)>,<(\d+)>\)/
Now let's try it:
str = "length=<4779> begin=(<21>,<47>) end=(<356>,<17>)"
but first, let's set the mood
str.scan(r)
#=> [["4779", "21", "47", "356", "17"]]
Success!
Lastly (though probably not necessary), we might replace the single spaces with \s+, which permits one or more whitespace characters:
r = /length=<(\d+)>\s+begin=\(<(\d+)>,<(\d+)>\)\s+end=\(<(\d+)>,<(\d+)>\)/
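As a rough sketch of how a pattern with \s+ in both positions might be used on a single line, the example below calls match and converts the captures to integers. It assumes, as the test string in this answer does, that the angle brackets appear literally in the data; the sample line and its extra spaces are made up, and the variable names follow the question.

```ruby
r = /length=<(\d+)>\s+begin=\(<(\d+)>,<(\d+)>\)\s+end=\(<(\d+)>,<(\d+)>\)/

# Hypothetical input line with irregular spacing between the fields.
line = "length=<4779>  begin=(<21>,<47>)   end=(<356>,<17>)"

if (m = r.match(line))
  # captures returns the five groups as strings; convert them to integers.
  length, begin_x, begin_y, end_x, end_y = m.captures.map(&:to_i)
  puts "length:#{length} begin:#{begin_x},#{begin_y} end:#{end_x},#{end_y}"
end
# prints "length:4779 begin:21,47 end:356,17"
```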
Addendum
The OP has asked how this would be modified if some of the numeric values were floats. I do not understand precisely what has been requested, but the following could be modified as required. I've assumed all the numbers are non-negative. I've also illustrated one way to "build" a regex, using Regexp#new.
s1 = '<(\d+(?:\.\d+)?)>' # note single quotes
#=> "<(\\d+(?:\\.\\d+)?)>"
s2 = "=\\(#{s1},#{s1}\\)"
#=> "=\\(<(\\d+(?:\\.\\d+)?)>,<(\\d+(?:\\.\\d+)?)>\\)"
r = Regexp.new("length=#{s1} begin#{s2} end#{s2}")
#=> /length=<(\d+(?:\.\d+)?)> begin=\(<(\d+(?:\.\d+)?)>,<(\d+(?:\.\d+)?)>\) end=\(<(\d+(?:\.\d+)?)>,<(\d+(?:\.\d+)?)>\)/
str = "length=<47.79> begin=(<21>,<4.7>) end=(<0.356>,<17.999>)"
str.scan(r)
#=> [["47.79", "21", "4.7", "0.356", "17.999"]]
Sample input:
length=4 begin=(0,0) end=(3,0)
data.txt:
length=3 begin=(0,0) end=(3,0)
length=4 begin=(0,1) end=(0,5)
length=2 begin=(1,3) end=(1,5)
Try this:
require 'pp'
Line = Struct.new(
:length,
:begin_x,
:begin_y,
:end_x,
:end_y,
)
lines = []
IO.foreach('data.txt') do |line|
numbers = []
line.scan(/\d+/) do |match|
numbers << match.to_i
end
lines << Line.new(*numbers)
end
pp lines
puts lines[-1].begin_x
--output:--
[#<struct Line length=3, begin_x=0, begin_y=0, end_x=3, end_y=0>,
#<struct Line length=4, begin_x=0, begin_y=1, end_x=0, end_y=5>,
#<struct Line length=2, begin_x=1, begin_y=3, end_x=1, end_y=5>]
1
With this data.txt:
2 4 1.3434324,3.543243,4.525324
1 2
18 3.3213,9.3233,1.12231,2.5435
7 9 2.2,1.899990
0 3 2.323
Try this:
require 'pp'
data = []
IO.foreach('data.txt') do |line|
pieces = line.split
csv_numbers = pieces[-1]
next if not csv_numbers.index('.') #skip the case where there are no floats on a line
floats = csv_numbers.split(',')
data << floats.map(&:to_f)
end
pp data
--output:--
[[1.3434324, 3.543243, 4.525324],
[3.3213, 9.3233, 1.12231, 2.5435],
[2.2, 1.89999],
[2.323]]

How do I truncate a string and return both first and last part without a center part?

Let string_a = "I'm a very long long long string"
Is there a method in Ruby that truncates a string something like this?:
magic_method(string_a) # => "I'm a ver...ng string"
In Rails I can do: truncate(string_a) but only the first part is returned.
You can try the String#scan method with a regular expression to break the string where you wish. Try something like:
def magic_method(string_x)
splits = string_x.scan(/(.{0,9})(.*?)(.{0,9}$)/)[0]
splits[1] = "..." if splits[1].length > 3
splits.join
end
Explanation:
Suppose string_a = "I'm a very long long long string"; string_b = "A small test string"; and string_c = "tiny". First, split the string into 3 groups.
The first group (.{0,9}) tries to capture the first 9 or fewer characters. Ex. "I'm a ver" for string_a; "A small t" for string_b and "tiny" for string_c.
The last group (.{0,9}$) tries to capture the last 9 or fewer characters, up to the end of the string ($). Ex. "ng string" for string_a; "st string" for string_b and "" for string_c.
The middle group (.*?) tries to capture what is left over. This only works because of the ?, which makes this part of the regular expression non-greedy (otherwise it would consume the rest of the string, leaving nothing for the last group). Ex. "y long long lo" for string_a, "e" for string_b and "" for string_c.
Then we check whether the middle group is longer than 3 characters; if so, we replace it with "...". This only happens for string_a. We don't want to make string_b longer by replacing "e" with "...", which would result in "A small t...st string".
Finally, join the groups (array elements) into a single string with Array#join.
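The three groups described above can be inspected directly. The following sketch applies the same regular expression to the three example strings and prints the captured groups; the names pattern, groups_a, groups_b and groups_c are introduced only for this illustration.

```ruby
pattern = /(.{0,9})(.*?)(.{0,9}$)/

# scan returns one array of groups per match; the first match is the one we want.
groups_a = "I'm a very long long long string".scan(pattern)[0]
groups_b = "A small test string".scan(pattern)[0]
groups_c = "tiny".scan(pattern)[0]

p groups_a  # ["I'm a ver", "y long long lo", "ng string"]
p groups_b  # ["A small t", "e", "st string"]
p groups_c  # ["tiny", "", ""]
```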
Why not try something like:
s[0..l] + "..." + s[s.length-l..s.length]
with l * 2 + 3 being the desired length, so perhaps a method something like this
def middle_truncate(s,length=20,ellipsis="...")
return s if s.length <= length + ellipsis.length
return s[0..length] if length < ellipsis.length
len = (length - ellipsis.length)/2
s_len = len - length % 2
s[0..s_len] + ellipsis + s[s.length-len..s.length]
end
s ="I'm a very long long long string"
puts middle_truncate(s, 100)
puts middle_truncate(s, 21)
puts middle_truncate(s, 11)
puts middle_truncate(s, 5)
puts middle_truncate(s*2, 45)
puts middle_truncate(s)
puts middle_truncate(s,2)
outputs
I'm a very long long long string
I'm a ver...ng string
I'm ...ring
I...g
I'm a very long long ...long long long string
I'm a ver...g string
I'm
This will do the trick:
ELLIPSIS = "..."
DEFAULT_MAX_LEN = 15
def trunc_to_len(string, len = DEFAULT_MAX_LEN)
return "" if len <= 0
return "." * len if len < ELLIPSIS.length + 2
return string if string.length <= len
# make room for "..."
half = (len - ELLIPSIS.length) / 2
string[0,half] + ELLIPSIS + string[-half,half]
end
These examples:
puts trunc_to_len("I'm a very long long long string", 100)
puts trunc_to_len("I'm a very long long long string", 21)
puts trunc_to_len("I'm a very long long long string", 11)
puts trunc_to_len("I'm a very long long long string", 5)
puts trunc_to_len("I'm a very long long long string")
puts trunc_to_len("I'm a very long long long string",2)
output:
I'm a very long long long string
I'm a ver...ng string
I'm ...ring
I...g
I'm a ...string
..
str = "abcdefghijklmnopqrstuvwxyz"
str[9..-10] = "..."
p str #=> "abcdefghi...rstuvwxyz"
def trunc_to_len(s, n)
if n < s.length
i, j = n.divmod 2
s[0..i+j-1] + "..." + s[-i..-1]
else
s
end
end
