Cache a complex calculation in Rails 3 model - ruby-on-rails

I'm new to Ruby/Rails, so this is possibly (hopefully) a simple question that I just don't know the answer to.
I am implementing an accounting/billing system in Rails, and I'm trying to keep track of the running balance after each transaction in order to display it in the view as below:
Date    Description       Charges($)   Credits($)   Balance($)
Mar 2   Activity C        $4.00                     -$7.50
Feb 25  Payment for Jan                $8.00        -$3.50
Feb 23  Activity B        $1.50                     -$11.50
Feb 20  Activity A        $2.00                     -$10.00
Each transaction (also known as a line item) is stored in the database with all the values above (Date, Description, Amount) except for the Balance. I can't store the balance for each transaction in the database because it may change if something happens to an earlier transaction (for example, a payment that was posted may subsequently fail). So I need to calculate it on the fly for each line item, and the Balance of a line item depends on the Balance of the line item before it (i.e. Balance = Balance of previous line item + amount of this line item).
So here's my question. My current (inept) way of doing it is that in my LineItem model I have a balance method which looks like this:
def balance
  prev_balance = 0
  # get the previous line item's balance if it exists
  last_line_item = Billing::LineItem.get_last_line_item_for_a_ledger(self.issue_date, self.ledger_item_id)
  if last_line_item
    prev_balance = last_line_item.balance
    # ... some other stuff ...
  end
  prev_balance + (-1 * net_amount) # net_amount is the amount for the current line item
end
This is super costly and my view takes forever to load, since I'm calculating the previous line item's balance again and again and again. What's a better way to do this?

You're basically paying a price for not wanting to store the balance in each transaction. You could optimize your database with indices, use caches, etc., but fundamentally you'll run into the problem that calculating a balance takes a long time if you have lots of transactions.
Keep in mind that you'll continue to get new transactions, and your problem will thus get worse over time.
You could consider several design alternatives. First, as Douglas Lise mentioned, you could store the balance in each transaction. If an earlier-dated transaction comes in, you may have to update several transactions dated after it. However, this has an upper bound (depending on how old a transaction you are willing to accept), so it has reasonable worst-case behavior.
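A minimal sketch of that first alternative, assuming a balance column is added to the line items table and that ledger_item_id and issue_date (from your existing method) identify and order the items; update_column (Rails 3.1+) is used so the callback does not re-trigger itself. This is a sketch, not a drop-in implementation:

# Hypothetical sketch: store the running balance on each line item and
# rebuild it for every later item whenever an earlier one changes.
class Billing::LineItem < ActiveRecord::Base
  after_save :rebuild_later_balances

  private

  def rebuild_later_balances
    running = previous_balance
    Billing::LineItem.
      where(ledger_item_id: ledger_item_id).
      where('issue_date >= ?', issue_date).
      order(:issue_date).each do |item|
        running -= item.net_amount            # same sign convention as the question
        item.update_column(:balance, running) # skips callbacks, so no recursion
      end
  end

  def previous_balance
    prev = Billing::LineItem.
      where(ledger_item_id: ledger_item_id).
      where('issue_date < ?', issue_date).
      order(:issue_date).last
    prev ? prev.balance : 0
  end
end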
Alternatively, you can do a reconciliation step. Every month you "close the books" on transactions older than X weeks, and after reconciliation you store the balance you calculated. In def balance you then use your existing logic, but start from the "balance as of the previous reconciliation". This again provides a reasonable and predictable worst-case scenario.
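A rough sketch of that reconciliation variant, assuming a hypothetical Reconciliation model that stores a closing balance per ledger as of a closed_on date:

# Hypothetical sketch: only line items since the last reconciliation are walked.
def balance
  recon = Reconciliation.
    where(ledger_item_id: ledger_item_id).
    where('closed_on <= ?', issue_date).
    order(:closed_on).last
  starting = recon ? recon.balance : 0

  items = Billing::LineItem.
    where(ledger_item_id: ledger_item_id).
    where('issue_date <= ?', issue_date)
  items = items.where('issue_date > ?', recon.closed_on) if recon

  starting - items.sum(:net_amount) # same sign convention as the question
end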

Related

How can I seed a counter instrument when migrating towards OTel counters?

Imagine a business metric that is currently being tracked by an in-house tool somewhere, let's call it "hats sold": whenever an order is processed in my application, the number of hats sold in the order is persisted alongside the date of the event. This data is later aggregated in a custom UI so that business can tell at any given time how our sales are going using graphs and totals.
For this example, consider that as of today, this "hats sold" value is sitting at 420 total, and that this was reached in 10 orders of 42 hats each: one per month.
Now imagine I want to move away from the in-house solution, and into a custom OpenTelemetry metric, using a COUNTER instrument to keep historical track of the "hats sold" value. If I just change my code to stop tracking the value manually and instead push increments to a Counter instance, this counter will start counting from 0 again on whatever monitoring tool I'm using: this only makes sense, since these would be the first emissions of the counter from the monitoring tool's perspective.
How do I ensure that, when migrating to the OpenTelemetry approach, my counter starts counting from 420 (the value I had prior to the migration) instead of from 0? And what about my historical sales data: how would I "migrate" that when converting the logic to be OpenTelemetry-based? In other words, how do I "seed" historical counter data?
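For context, pushing increments to a Counter in Ruby would look roughly like the following. The OpenTelemetry metrics pieces for Ruby are still experimental, so the gem and method names here are assumptions to double-check against the current docs; the backend only ever sees these increments, which is why the pre-migration total of 420 is not reflected:

require 'opentelemetry/sdk'
require 'opentelemetry-metrics-sdk' # experimental metrics support (assumed gem name)

OpenTelemetry::SDK.configure
meter = OpenTelemetry.meter_provider.meter('hat-shop')
hats_sold = meter.create_counter('hats_sold', unit: '{hat}', description: 'Total hats sold')

# Called whenever an order is processed (42 hats per order in the example above).
hats_sold.add(42, attributes: { 'order.channel' => 'web' })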

Modelling recurrent items (expenses) as records with Rails

I am writing what could be defined as an accountancy/invoicing app using Rails 5. I am in need of implementing a section that predicts the company's cashflow in the future. So far I've got the following:
Actual bank movements and balances (in the past), imported from the bank
Future invoices (income) which are expected to be paid on a certain date
Future one-time expenses which are expected to be paid on a certain date
Using these three sets of data, I can calculate, for any given date in the future, the sum of the last known bank balance, plus all the future invoice values coming IN, minus all the future expenses going OUT, so I get, theoretically, the expected balance of the company for any given date.
My doubt arises when it comes to recurrent expenses (or potentially incomes). Given that all of the items I mentioned before (bank movements, invoices and expenses) are actual ActiveRecord records stored in my database, I'm not sure about how to treat the recurrent expenses, for example:
Let's imagine I want to enter a known future recurrent paycheck of a certain employee, which is $2000 every first day of the month.
1- Should I generate the next X entries at some point and treat them as normal future expenses (each with its own ID, date and amount)?
2- The other option I've thought of is having some kind of "declaration" of the nature of the recurring expense, as in "it's $2000 every day 1 of the month until -forever-", similarly to a cronjob (see the sketch after this question). But, if I were to take this approach, I'd like to have an ActiveRecord-like interface, so that I can do something like:
cashflow = []
last_movement = BankMovement.last
value = last_movement.balance
(last_movement.date..(last_movement.date + 12.months)).each do |day|
  value += Invoice.pending.expected_on(day).sum(:gross_amount)
  value -= Expense.pending.expected_on(day).sum(:gross_amount)
  value -= RecurringExpense.expected_on(day).sum(:gross_amount)
  cashflow.push({ date: day, balance: value })
end
This feels almost right, but I'm not sure how to link the actual expense, when it arrives, with the recurring/calculated one. How can I then change the date if the expense gets paid the day after it was supposed to be? I need to have an actual record of each one of those, at least once they are "consolidated".
I'm not really sure if I've been clear enough about my problem here, so should anyone have some spare time and want to help me out, please feel free to ask for any extra relevant info. I'd really appreciate some help, especially if we can find a way of doing this "the Rails way"!
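A minimal sketch of option 2, assuming a hypothetical RecurringExpense model that stores the rule ("$2000 every day 1 of the month") and can be consolidated into a real Expense once it is actually paid; all model and column names are illustrative:

# Hypothetical sketch; column and model names are illustrative.
class RecurringExpense < ApplicationRecord
  # columns: description, gross_amount (decimal), day_of_month (integer),
  #          starts_on (date), ends_on (date, nullable)

  # Recurring expenses expected to occur on the given date.
  def self.expected_on(day)
    where(day_of_month: day.day).
      where('starts_on <= ?', day).
      where('ends_on IS NULL OR ends_on >= ?', day)
  end

  # Turn the projection into a concrete Expense when it is paid, so the
  # actual record (with its real payment date) replaces the calculated one.
  def consolidate!(paid_on)
    Expense.create!(description: description,
                    gross_amount: gross_amount,
                    due_on: paid_on)
  end
end

With something like this the cashflow loop above can stay as it is, and consolidating a recurring expense on its actual payment date covers the "paid a day late" case.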

Handling default values in window aggregations

I have an aggregation that looks at a sliding 30-day window (1 day period) of customer purchases, keyed by customer id, with the value being the purchase amount. I sum up the values by key, thus getting the aggregate purchase amount for each customer during the last 30 days. I store this number in a customer record in an external database.
My question is this: if a customer hasn't purchased anything in the last 30 days, how do I automatically reset the customer record to a default value, in this case zero? I'd prefer to keep all my logic in Dataflow and avoid doing too much work, since this will need to scale quite a bit. I'm basically looking for a way to automatically get a key-value for each key that was not in the current window but was in the last, and the value being a potentially configurable default.
Trying to answer my own question, but hoping for feedback as to whether this solution would scale:
I've thought about having a step after the initial window-and-sum. This transform would receive (customerId, purchaseSum) elements once a day, as each result of the 30-day window sum becomes available. Since these elements are timestamped (with the timestamp of the most recent input element, I believe), I can re-window them. If I create a two-day window with a one-day period, I would then be able to group by key and process (customerId, [purchaseSumA, purchaseSumB]) for customers that had a purchase both in the last 30 days and in the last 31 days. In that case, I emit purchaseSumB. However, if there's only one element in the list, and the timestamp indicates that the purchase was made 31 days ago, I can assume that there have been no purchases from the customer since, and I need to emit (customerId, 0). Does that make sense?
Is it an option to slightly amend the database schema? I suppose now you have something like
(customer_id int, purchases_last_month int)
Instead how about
(customer_id int, last_purchase datetime, purchases_last_month int)
where this time last_purchase is the time of the last purchase made by this customer, and purchases_last_month refers to purchases made in the month before the last one? Then, in your DoFn that writes to the database, you'd make a conditional update (merge/upsert) that updates both last_purchase and purchases_last_month with the values from the current window, but only if last_purchase is increasing. This way you can deal with windows being processed out of order or in parallel, at the cost of a slight increase in complexity in client queries (which you can address by adding a view on top of the table).

Prepping Data For Usage Clustering

Dataset: I'm given the number of minutes individual customers use a product each day and am trying to cluster this data in order to find common usage patterns.
My question: How can I format the data so that, for example, a power user with high levels of use for a year looks the same as a different power user who has only been able to use the device for a month before I ended data collection?
So far I've turned each customer into an array where each cell is the number of minutes used that day. This array starts when the user first uses the product and ends after the user's first year of use. All entries in the cells must be double values (e.g. 200.0 minutes used) for the clustering model. I've considered setting all cells/days after the last day of data collection to either -1.0 or NULL. Is either of these a valid approach? If not, what would you suggest?
For the problem where you want both users to look the same (one that used the product a lot every day for a year, and the other that used it a lot but only for one month), create a new feature whose value is:
avg_usage per time_bin
time_bin can be a month, a day or another time bin which best fits your needs.
This way, a user who uses the product, let's say, 200 minutes per day for one year will get:
200 * 30 * 12 / 12 = 6000 minutes per month
and the other user, who joined just last month with the exact same usage, will also get:
200 * 30 * 1 / 1 = 6000 minutes per month.
This way it doesn't matter when a user started using the product; the only thing that matters is the usage rate.
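As a rough illustration of that normalization (hypothetical helper, monthly bins of 30 days):

# Average minutes per month, counting only the months a user was observed.
def average_minutes_per_month(daily_minutes)
  observed_months = (daily_minutes.length / 30.0).ceil
  daily_minutes.sum / observed_months.to_f
end

average_minutes_per_month(Array.new(360, 200.0)) # year-long user  => 6000.0
average_minutes_per_month(Array.new(30, 200.0))  # one-month user  => 6000.0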
An important thing to take into consideration is that products may be forgotten for some time. For example, if I'm away on vacation, the days I didn't use my computer don't necessarily say anything about my general usage of the product. So, based on your data, product, and intuition, you might consider removing gaps like the one I mentioned and not taking them into account in the calculation.
The amount of time a user has had your product could be a signal of something, but if they only started recently and are still using it today, that is something you need to take into consideration, and for that, this average-binning technique may help.

Algorithm for tracking changes in value over time

I am writing a rails app that deals with product inventory. I would like to include the following features, and am struggling with developing an efficient algorithm:
View stock history (how many were in stock on each date)
Quantity removed from warehouse, and quantity added to warehouse over specific periods of time
Amount of time the product was out of stock in any given period
My questions are as follows:
What is the best way of tracking changes? In addition to my Products table, should I create another table called HistoricProductQuantities, and insert a new record each time there is a change in the quantity?
What number should I track? The historic stock quantity (i.e. 50 in stock on this day, 24 in stock on that day), or the CHANGE in stock quantity, i.e. -5 (5 sold) or 15 (15 added to inventory)? Or do I track both in separate tables?
Thanks for your help.
First of all, I recommend implementing date dimensions in your application, as it seems like you will be doing a lot of time-related calculations. Search for "date dimension" as the details are beyond the scope of your question; that said, I believe it will be of great benefit for your app to implement and use them.
As far as your direct questions go:
What is the best way of tracking changes? In addition to my Products table, should I create another table called HistoricProductQuantities, and insert a new record each time there is a change in the quantity?
Yes, you could do this. I would probably call it HistoricProductSnapshot and keep track of the product activity in there on a daily basis. With this information, as well as date dimensions, you could do calculations such as "how many of Product X did we have 5 days ago, or a month ago?" and so on.
What number should I track? The historic stock quantity (i.e. 50 in stock on this day, 24 in stock on that day), or the CHANGE in stock quantity i.e. -5 (5 sold) or 15 (15 added to inventory)? Or do I track both in separate tables?
I do not have experience writing inventory control software, but I believe that with the snapshot table I mentioned in the question above, you would only have to keep track of quantities per day. The change in product counts could then be calculated from your snapshot table. You could, for example, have a function that outputs the stock amounts over a given time range as an array. Example: from March 1 to March 7 the stock amounts for Product Y were [45, 40, 39, 27, 22, 45, 44].
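A rough sketch of that snapshot table and range lookup (model and column names are illustrative):

# Hypothetical sketch: one row per product per day holding that day's stock.
class HistoricProductSnapshot < ActiveRecord::Base
  # columns: product_id (integer), taken_on (date), quantity (integer)
  belongs_to :product

  # e.g. quantities_for(product_y, march_1, march_7) # => [45, 40, 39, 27, 22, 45, 44]
  def self.quantities_for(product, from, to)
    where(product_id: product.id, taken_on: from..to).
      order(:taken_on).
      pluck(:quantity)
  end
end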
Hope that helps. As I said, I am not a product inventory guy, but I have worked with point-of-sale systems, and the approach above should give you a good enough start for what you are trying to do.
This gem could be useful for tracking changes in models: https://github.com/collectiveidea/audited
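Basic usage of that gem is roughly a one-line macro in the model, after which every change is recorded in an audits table:

# Gemfile
gem 'audited'
# (run the gem's install generator to create the audits table)

# Model: create/update/destroy are recorded automatically.
class Product < ActiveRecord::Base
  audited
end

product.update!(quantity: 45)
product.audits.last.audited_changes # => { "quantity" => [50, 45] }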
Keep the data raw. I would personally create a new data entry every day, recording how many items you have in stock that day. Or you can make the interval much shorter, such as every 12 hours.
For our particular use case:
We had a table called Days, which had a many-to-many relationship with products, and each "relationship" row had a quantity value (to keep track of the quantity of each product per day). Additionally, each of those rows had a one-to-many relationship with transactions, which held the time of each transaction and the remaining stock.
I would personally advise you to use the stock quantity as the raw data, as it will let you derive things such as how many items were removed during a certain transaction, and when an item went out of stock and came back into stock, all from the data. When you have data you need to perform statistical calculations on, it's best to store it as raw values (the quantity of the item).
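A rough sketch of the layout described above (model and column names are illustrative):

# Hypothetical sketch of the Days <-> Products layout.
class Day < ActiveRecord::Base            # one row per calendar day
  has_many :stock_levels
  has_many :products, through: :stock_levels
end

class StockLevel < ActiveRecord::Base     # join row: quantity of a product on a day
  belongs_to :day
  belongs_to :product
  has_many :stock_transactions            # movements during that day
end

class StockTransaction < ActiveRecord::Base
  # columns: stock_level_id, occurred_at (datetime), change (integer), remaining (integer)
  belongs_to :stock_level
end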

Resources