I have built a small Ruby program that collects table data from my own PDF bank statements. It scans each PDF statement for tables and then filters the rows for transactional line-item patterns.
Everything is working well, and I have managed to collect the line items as an array of string arrays. An array of keyed objects would be better, but that is tricky given the format of the statements.
The issue is that the line items have different lengths, so it is hard to know which positions hold the values I want to map.
For example:
["Transaction 1", "1.00"]
["Transaction 2", "Hello World", "3.00"]
["Transaction 3", "Hello World", "feeffe", "5.00"]
["Transaction 4", "Hello World", "feeffe", "5.00", "12.00"]
["Transaction 5", "Hello World # 10.00", "feeffe", "10.00", "12.00"]
The line items normally contain between 2 and 5 elements.
Is there an efficient/accurate way to map the above to:
{ description: "Transaction 1", amt: "1.00"}
{ description: "Transaction 2 - Hello World", amt: "3.00"}
{ description: "Transaction 3 - Hello World - feeffe", amt: "5.00"}
{ description: "Transaction 4 - Hello World - feeffe", amt: "5.00"}
{ description: "Transaction 5 - Hello World # 10.00 - feeffe", amt: "10.00"}
Or is the only way to write IF conditions that look at the array length and make a "best guess" effort?
If you have
row = ["Transaction 2", "Hello World", "3.00"]
you can do
{ description: row[0..-2].join(' - '), amt: row[-1] }
How you iterate over the rows is up to you, so the surrounding logic will vary.
Update:
Given the condition added to the question later, a row can have length 5, in which case the actual amount is the second-to-last value.
data = (row.length == 5) ? [row[0..-3], row[-2]] : [row[0..-2], row[-1]]
{ description: data[0].join(' - '), amt: data[1] }
Assume your transaction is in a variable tr, i.e.
tr=["Transaction 5", "Hello World", "feeffe", "10.00", "12.00"]
I would first separate this into the strings that look like an amount and those that don't:
amounts,texts= tr.partition {|el| /^\d+[.]\d{2}/ =~ el}
Here you can check !amounts.empty? to guard against a transaction without an amount. Now your hash could be
{
transaction_name: texts.first,
transaction_text: "#{texts[1]}#{amounts.size > 1 ? %( # #{amounts.first}) : ''}#{texts.size > 2 ? %( - #{texts.last}) : ''}",
amt: amounts.last
}
Try this regex:
"\K[^",\]]+
If the number of items always determines the index of the amount element, you can do something like:
input = [
["Transaction 1", "1.00"],
["Transaction 2", "Hello World", "3.00"],
["Transaction 3", "Hello World", "feeffe", "5.00"],
["Transaction 4", "Hello World", "feeffe", "5.00", "12.00"],
["Transaction 5", "Hello World # 10.00", "feeffe", "10.00", "12.00"]
]
ROW_LENGTH_TO_AMOUNT_INDEX = {
2 => 1,
3 => 2,
4 => 3,
5 => 3,
}
def map(transactions)
transactions.map do |row|
amount_index = ROW_LENGTH_TO_AMOUNT_INDEX[row.length]
{
description: row[0],
amt: row[amount_index]
}
end
end
p map(input)
[{:description=>"Transaction 1", :amt=>"1.00"}, {:description=>"Transaction 2", :amt=>"3.00"}, {:description=>"Transaction 3", :amt=>"5.00"}, {:description=>"Transaction 4", :amt=>"5.00"}, {:description=>"Transaction 5", :amt=>"10.00"}]
Or, perhaps something like this?
MAPPERS = {
2 => lambda { |row| { description: row[0], amt: row[1]} },
3 => lambda { |row| { description: row[0], amt: row[2]} },
4 => lambda { |row| { description: row[0], amt: row[3]} },
5 => lambda { |row| { description: row[0], amt: row[3]} }
}
def map(transactions)
transactions.map do |row|
MAPPERS[row.length].call(row)
end
end
arr = [["Transaction 1", "1.00"],
["Transaction 2", "Hello World", "3.00"],
["Transaction 3", "Hello World", "feeffe", "5.00"]]
arr.map {|*first, last| { description: first.join(' - '), amt: last } }
#=> [{:description=>"Transaction 1", :amt=>"1.00"},
# {:description=>"Transaction 2 - Hello World", :amt=>"3.00"},
# {:description=>"Transaction 3 - Hello World - feeffe", :amt=>"5.00"}]
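The ideas in the answers above can be combined into one rule that does not depend on the row length: treat the first cell that looks like an amount as the amount, and join everything before it into the description. This is only a sketch. The pattern `AMOUNT_PATTERN` and the helper name `to_line_item` are made up for illustration, and it assumes amounts always have exactly two decimals and that any cell after the first amount can be ignored.

```ruby
# Assumption: an amount is a cell consisting only of digits, a dot, and two decimals.
AMOUNT_PATTERN = /\A\d+\.\d{2}\z/

# Hypothetical helper: cells before the first amount-like cell form the
# description; that cell is the amount. Any trailing cells are ignored.
def to_line_item(row)
  amount_index = row.index { |cell| cell.match?(AMOUNT_PATTERN) }
  return nil if amount_index.nil? # no amount found; the caller decides what to do

  { description: row[0...amount_index].join(' - '), amt: row[amount_index] }
end

rows = [
  ["Transaction 1", "1.00"],
  ["Transaction 2", "Hello World", "3.00"],
  ["Transaction 3", "Hello World", "feeffe", "5.00"],
  ["Transaction 4", "Hello World", "feeffe", "5.00", "12.00"],
  ["Transaction 5", "Hello World # 10.00", "feeffe", "10.00", "12.00"]
]

line_items = rows.map { |row| to_line_item(row) }
line_items.each { |item| p item }
```

Because the pattern is anchored at both ends, a cell such as "Hello World # 10.00" stays part of the description instead of being mistaken for the amount.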
If I receive an input such as:
up 1, down 2, down 3, left 5
And I save this as an array, it will give me
["up 1", " down 2", " down 3", " left 5"]
But I need to delete the spaces before down and left!
Any ideas?
Since you only want to remove the whitespace in front of certain strings, use lstrip:
Returns a copy of str with leading whitespace removed.
arr = ["up 1", " down 2", " down 3", " left 5"]
arr.map(&:lstrip)
# => ["up 1", "down 2", "down 3", "left 5"]
You can use String#strip:
array = ["up 1", " down 2", " down 3", " left 5"]
p array.map &:strip
# ["up 1", "down 2", "down 3", "left 5"]
strip returns a copy of str with leading and trailing whitespace removed; lstrip and rstrip do the same for only the left or the right side, respectively:
p ' ayayayayay '.strip # "ayayayayay"
p ' ayayayayay '.lstrip # "ayayayayay "
p ' ayayayayay '.rstrip # " ayayayayay"
Just a thought...
In the off case that you "receive an input" as a string such as:
str = "up 1, down 2, down 3, left 5"
You could do:
str.gsub(', ', ',').split(',')
Which gives:
=> ["up 1", "down 2", "down 3", "left 5"]
OR, if you're not a moron (like me), you could do:
str.split(', ')
As Sebastian (very politely) points out.
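If the spacing around the commas is not guaranteed to be exactly one space, splitting on the bare comma and stripping each piece is a bit more forgiving. A minimal sketch, with an input string made up for illustration:

```ruby
# Hypothetical input with inconsistent spacing around the commas.
str = "up 1,down 2,   down 3 , left 5"

# Split on the comma only, then trim whitespace on both sides of each piece.
commands = str.split(',').map(&:strip)
p commands
```

With regular ", " separators this gives the same result as str.split(', ').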
How about using squish to remove the whitespace? Note that squish comes from ActiveSupport (Rails), not core Ruby.
["up 1"," down 2", "down 3"," left 5"].map(&:squish)
I need to delete the spaces before down and left!
Answering the exact question stated:
["up 1"," down 2"," down 3"," left 5"].
map { |e| e.gsub(/\s+(?=down|left)/, '') }
#⇒ ["up 1", "down 2", "down 3", "left 5"]
I have an array in which each array item is a hash with date values, as shown in my example below. In actuality, it is longer and there are about 20 dates per item instead of 3. What I need to do is get the date interval values for each item (that is, how many days between each date value), and their intervals' medians. My code is as follows:
require 'csv'
require 'date'
dateArray = [{:date_one => "May 1", :date_two =>"May 5", :date_three => " "}, {:date_one => "May 10", :date_two =>"May 10", :date_three => "May 20"}, {:date_one => "May 6", :date_two =>"May 11", :date_three => "May 12"}]
public
def median
sorted = self.sort
len = sorted.length
return (sorted[(len - 1) / 2] + sorted[len / 2]) / 2.0
end
puts dateIntervals = dateArray.map{|h| (DateTime.parse(h[:date_two]) - DateTime.parse(h[:date_one])).to_i}
puts "\nMedian: "
puts dateIntervals.median
Which returns these date interval values and this median:
4
0
5
Median: 4
However, some of these items' values are empty, as in the first item, in its :date_three value. If I try to run the same equations for the :date_three to :date_two values, as follows, it will throw an error because the last :date_three value is empty.
It's okay that I can't get that interval, but I would still need the date intervals for the next two items (which would be 10 and 1).
How can I skip over intervals that return errors when I try to run them?
I would recommend adding helper functions that can deal with the types of inputs you're expecting. For instance:
def date_diff(date_one, date_two)
return nil if date_one.nil? || date_two.nil?
(date_one - date_two).to_i
end
def str_to_date(input_string)
DateTime.parse(input_string)
rescue
nil
end
dateArray.map{|h| date_diff(str_to_date(h[:date_three]), str_to_date(h[:date_two])) }
=> [nil, 10, 1]
dateArray.map{|h| date_diff(str_to_date(h[:date_three]), str_to_date(h[:date_two])) }.compact.median
=> 5.5
The bonus here is that you can then add unit tests for the individual components so that you can easily test edge cases (nil dates, empty string dates, etc).
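In your map block, you can add a check to make sure none of the values are blank. Define the helper before the map call and have it return an explicit boolean (an h.each loop returns the hash itself, which is always truthy, when no value is blank):
def any_blank?(h)
# true when any value is empty or whitespace-only
h.values.any? { |v| v.strip.empty? }
end
# rows with a blank value yield nil; call .compact before taking the median
dateIntervals = dateArray.map{ |h|
(DateTime.parse(h[:date_two]) - DateTime.parse(h[:date_one])).to_i unless any_blank?(h)
}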
I would first filter out the blank values (strings that are empty or consist entirely of whitespace), then compare the remaining values using your existing code. I added a loop that compares each value in the sequence to the next one.
dateArray = [
{ date_one: "May 1", date_two: "May 5", date_three: " ", date_four: "" },
{ date_one: "May 10", date_two: "May 10", date_three: "May 20" }
]
intervals = dateArray.map do |hash|
filtered = hash.values.reject { |str| str =~ /^\s*$/ }
(0...filtered.size-1).map { |idx| (DateTime.parse(filtered[idx+1]) - DateTime.parse(filtered[idx])).to_i }
end
# => [[4], [0, 10]]
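Putting the pieces from these answers together, one possible shape is a parser that returns nil for unparseable strings, an interval function that skips any pair with a missing date, and a standalone median. This is a sketch under assumptions: the helper names (parse_date, intervals, median) are made up here, Date is used instead of DateTime since only whole days matter, and a pair is skipped when either side is blank rather than bridging over the gap.

```ruby
require 'date'

# Hypothetical helper: nil instead of an exception for blank or invalid input.
def parse_date(str)
  Date.parse(str)
rescue ArgumentError, TypeError
  nil
end

# Day differences between consecutive values; pairs with a missing date are skipped.
def intervals(hash)
  dates = hash.values.map { |value| parse_date(value) }
  dates.each_cons(2).map { |a, b| (b - a).to_i if a && b }.compact
end

# Median as a plain function, so Object does not need to be patched.
def median(values)
  return nil if values.empty?

  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

date_array = [
  { date_one: "May 1",  date_two: "May 5",  date_three: " " },
  { date_one: "May 10", date_two: "May 10", date_three: "May 20" },
  { date_one: "May 6",  date_two: "May 11", date_three: "May 12" }
]

per_item = date_array.map { |h| intervals(h) }
p per_item
p median(per_item.flatten)
```

Keeping the parsing, the interval calculation, and the median separate also makes each piece easy to test on its own.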
I have an array that behaves like a multidimensional array through spaces, like:
"roles"=>["1 editor 0", "1 editor 1", "2 editor 0", "2 editor 1", "14 editor 0", "15 editor 0"], "commit"=>"Give Access", "id"=>"3"}
Each array value represents [category_id, user.title, checked_boolean], and comes from this form:
<%= hidden_field_tag "roles[]", [c.id, "editor", 0] %>
<%= check_box_tag "roles[]", [c.id, "editor", 1 ], !!checked %>
which I process using splits:
params[:roles].each do |role|
cat_id = role.split(" ")[0]
title = role.split(" ")[1]
checked_boolean = role.split(" ")[2]
end
Given the array at the top, you can see that "Category 1" and "Category 2" are checked, while "Category 14" and "Category 15" are not.
I would like to compare the values of the given array, and if both 1 and 0 exist for a given category_id, get rid of the value with "checked_boolean = 0". This way, if the boolean is 1, I can check whether the Role already exists and create it if not. And if it is 0, I can check whether the Role exists and delete it if it does.
How would I be able to do this? I thought of doing something like params[:roles].uniq but didn't know how to apply uniq only to the first part of the split.
Or is there a better way of posting the "unchecks" in Rails? I've found solutions for processing the uncheck action for simple checkboxes that pass in either true or false, but my case is different because it needs to pass in true/false in addition to the User.Title.
Let's say params[:roles] is:
["1 editor 0", "1 editor 1", "2 editor 0", "2 editor 1", "14 editor 0", "15 editor 0"]
An example of the conversion and filtering:
roles = params[:roles].map {| role | role.split " " }
filtered = roles.select do| role |
next true if role[ 2 ].to_i == 1
count = roles.reduce( 0 ) {| count, r | r[ 0 ] == role[ 0 ] && count + 1 || count}
count == 1
end
# => [["1", "editor", "1"], ["2", "editor", "1"], ["14", "editor", "0"], ["15", "editor", "0"]]
select returns a new, filtered array (shown above); the original params[:roles] and the intermediate roles array produced by map are left untouched, so you can still use them.
Finally, you can convert the result back to strings:
filtered.map {| role | role.join( ' ' ) }
=> ["1 editor 1", "2 editor 1", "14 editor 0", "15 editor 0"]
majioa's solution is certainly more terse and a better use of the language's features, but here is my take on it with a more language-agnostic approach. I have only just started learning Ruby, so I used this as an opportunity to learn, but it does solve your problem.
my_array = ["1 editor 0", "1 editor 0", "1 editor 1", "2 editor 0",
"2 editor 1", "14 editor 0", "15 editor 0"]
puts "My array before:"
puts my_array.inspect
# As we're nesting a loop inside another for each loop
# we can't delete from the same array without confusing the
# iterator of the outside loop. Instead we'll delete at the end.
role_to_del = Array.new
my_array.each do |role|
cat_id, checked_boolean = role.split(" ")[0], role.split(" ")[2]
if checked_boolean == "1"
# Search through the array and mark the roles for deletion if
# the category id's match and the found role's checked status
# doesn't equal 1.
my_array.each do |s_role|
s_cat_id = s_role.split(" ")[0]
if s_cat_id != cat_id
next
else
s_checked_boolean = s_role.split(" ")[2]
role_to_del.push s_role if s_checked_boolean != "1"
end
end
end
end
# Delete all redundant roles
role_to_del.each { |role| my_array.delete role }
puts "My array after:"
puts my_array.inspect
Output:
My array before:
["1 editor 0", "1 editor 0", "1 editor 1", "2 editor 0", "2 editor 1", "14 editor 0",
"15 editor 0"]
My array after:
["1 editor 1", "2 editor 1", "14 editor 0", "15 editor 0"]
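Another way to express the same filtering is to group the entries by category and title and keep, for each group, the entry with the highest checked value. This is a sketch, not the approach of either answer above; it assumes every string has exactly three space-separated fields and that the checked field is "0" or "1".

```ruby
roles = ["1 editor 0", "1 editor 1", "2 editor 0", "2 editor 1", "14 editor 0", "15 editor 0"]

deduped = roles
  .map(&:split)                                    # "1 editor 0" -> ["1", "editor", "0"]
  .group_by { |id, title, _checked| [id, title] }  # one group per category/title pair
  .map { |_key, entries| entries.max_by { |_id, _title, checked| checked.to_i } }
  .map { |entry| entry.join(' ') }                 # back to the original string form

p deduped
```

Since Ruby hashes preserve insertion order, the groups come out in the order in which each category first appears in the input.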
This is my code for calculating word frequency:
word_arr= ["I", "received", "this", "in", "email", "and", "found", "it", "a", "good", "read", "to", "share......", "Yes,", "Dr", "M.", "Bakri", "Musa", "seems", "to", "know", "what", "is", "happening", "in", "Malaysia.", "Some", "of", "you", "may", "know.", "He", "is", "a", "Malay", "extra horny", "horny nor", "nor their", "their babes", "babes are", "are extra", "extra SEXY..", "SEXY.. .", ". .", ". .It's", ".It's because", "because their", "their CONDOMS", "CONDOMS are", "are Made", "Made In", "In China........;)", "China........;) &&"]
arr_stop_kwd=["a","and"]
frequencies = Hash.new(0)
word_arr.each { |word|
if !arr_stop_kwd.include?(word.downcase) && !word.match('&&')
frequencies["#{word.downcase}"] += 1
end
}
With 100k words this takes 9.03 seconds, which is too much. Is there a faster way to calculate this?
Thanks in advance.
Take a look at the Facets gem.
You can do something like this using its frequency method:
require 'facets'
frequencies = (word_arr-arr_stop_kwd).frequency
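Note that the stop words can be subtracted from word_arr directly; see the Array documentation. Unlike the original loop, this version does not downcase the words or skip entries containing "&&".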
words.delete_if do |x|
x == ("a"||"for"||"to"||"and")
end
words is an array with many words. My code is deleting "a" but not deleting "for", "to" or "and".
Maybe this will help you:
words.delete_if do |x|
%w(a for to and).include?(x)
end
Just do
words - ["a", "for", "to", "and"]
Example
words = %w(this is a just test data for array - method and nothing)
=> ["this", "is", "a", "just", "test", "data", "for", "array", "-", "method", "and", "nothing"]
words = words - ["a", "for", "to", "and"]
=> ["this", "is", "just", "test", "data", "array", "-", "method", "nothing"]
If you run "a" || "b" in irb you will always get "a", because "a" is truthy and || returns its first truthy operand.
In your case "a"||"for"||"to"||"and" always evaluates to "a", regardless of the other values in the array.
So this is my alternative solution to your question:
w = %w{a for to and}
words.reject! { |x| w.include?(x) }
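The behaviour of || described above is easy to check directly. The following sketch, with a made-up word list, shows both the short-circuit result and a non-mutating variant of the fix:

```ruby
# || returns its first truthy operand, so the whole expression is just "a".
chosen = ("a" || "for" || "to" || "and")
p chosen

STOP_WORDS = %w[a for to and].freeze
words = %w[this is a test for array and nothing]

# reject returns a new array and leaves words untouched;
# use reject! or delete_if to modify the array in place.
kept = words.reject { |word| STOP_WORDS.include?(word) }
p kept
```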