Modelling recurrent items (expenses) as records with Rails - ruby-on-rails

I am writing what could be defined as an accountancy/invoicing app using Rails 5. I am in need of implementing a section that predicts the company's cashflow in the future. So far I've got the following:
Actual bank movements and balances (in the past), imported from the bank
Future invoices (income) which are expected to be paid on a certain date
Future one-time expenses which are expected to be paid on a certain date
Using these three sets of data, I can calculate, for any given date in the future, the sum of: the last known bank balance, plus all the future invoices values coming IN, minus all the future expenses going OUT, so I get, theoretically, the expected balance of the company for any given date.
My doubt arises when it comes to recurrent expenses (or potentially incomes). Given that all of the items I mentioned before (bank movements, invoices and expenses) are actual ActiveRecord records stored in my database, I'm not sure about how to treat the recurrent expenses, for example:
Let's imagine I want to enter a known future recurrent paycheck of a certain employee, which is $2000 every first day of the month.
1- Should I generate at some point the next X entries and treat them as normal future expenses (each with its own ID, date and amount)?
2- The other option I've thought of is having some kind of "declaration" of the nature of the recurrent expense, as in "it's $2000 on day 1 of every month until -forever-", similar to a cronjob. But if I were to take this approach, I'd like to have an ActiveRecord-like interface, so that I can do something like:
cashflow = []
last_movement = BankMovement.last
value = last_movement.balance
(last_movement.date..(last_movement.date + 12.months)).each do |day|
  value += Invoice.pending.expected_on(day).sum(:gross_amount)
  value -= Expense.pending.expected_on(day).sum(:gross_amount)
  value -= RecurringExpense.expected_on(day).sum(:gross_amount)
  cashflow.push({ date: day, balance: value })
end
This feels almost right, but I'm not sure how to link the actual expense, when it arrives, with the recurrent/calculated one. How can I then change the date if the expense gets paid the day after it was supposed to be? I need to have an actual record of each one of those, at least once they are "consolidated".
I'm not sure I was clear enough about my problem, so if anyone has some spare time to help me out, please feel free to ask for any extra relevant info. I'd really appreciate some help, especially if we can find a way of doing this "the Rails way"!
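One way to picture option 2 together with the "consolidated" records is sketched below in plain Ruby, without ActiveRecord. All names here (RecurringRule, Consolidated, recurring_total_on) are hypothetical, and the rule only handles "day N of every month"; days 29 to 31 would need extra care in short months. The idea is that a computed occurrence counts until a real record replaces it, and that real record remembers both the scheduled date and the date it was actually paid.

```ruby
require "date"

# Hypothetical stand-in for a RecurringExpense record: a rule
# ("amount on a given day of every month") instead of one row per payment.
RecurringRule = Struct.new(:name, :amount, :day_of_month, :starts_on, :ends_on) do
  # True when the rule produces a payment on +date+.
  def expected_on?(date)
    return false if date < starts_on
    return false if ends_on && date > ends_on
    date.day == day_of_month
  end

  # All payment dates that fall inside +range+ (a Range of Dates).
  def occurrences(range)
    range.select { |date| expected_on?(date) }
  end
end

# Hypothetical "consolidated" occurrence: a real record created once a payment
# is confirmed, remembering which rule and scheduled date it replaces.
Consolidated = Struct.new(:rule, :scheduled_on, :paid_on, :amount)

# Amount leaving the account on +date+: consolidated records count on the day
# they were paid, and suppress the computed occurrence they replace.
def recurring_total_on(date, rules, consolidated)
  actual = consolidated.select { |c| c.paid_on == date }.sum(&:amount)
  pending_rules = rules.select do |rule|
    rule.expected_on?(date) &&
      consolidated.none? { |c| c.rule == rule && c.scheduled_on == date }
  end
  actual + pending_rules.sum(&:amount)
end

salary = RecurringRule.new("Paycheck", 2000, 1, Date.new(2024, 1, 1), nil)
# The March payment actually went out one day late.
late = Consolidated.new(salary, Date.new(2024, 3, 1), Date.new(2024, 3, 2), 2000)

puts salary.occurrences(Date.new(2024, 1, 1)..Date.new(2024, 3, 31)).map(&:to_s).inspect
puts recurring_total_on(Date.new(2024, 3, 1), [salary], [late]) # scheduled day, moved away
puts recurring_total_on(Date.new(2024, 3, 2), [salary], [late]) # actual payment day
```

In a Rails app the same split would probably map to two models, with the consolidated one holding a foreign key to the rule, but that mapping is an assumption rather than something the question settles.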

Related

better design for fact table where each row has a Start & End Date

My fact table contains details for clients who attend a course.
To ensure I can get a list of clients registered on any particular day, I have not related the date dimension to the fact table.
Instead I created a measure that does basic between logic (where startDate <= selectedDate && endDate >= selectedDate).
This allows me to find all clients registered on one single selected day.
There are a few drawbacks to this, however:
- I have to ensure the report user only selects a single day, i.e. they cannot select a date range.
- I can't easily do counts for SamePeriodLastMonth or SamePeriodLastYear.
Is there a better design I should consider that will still allow me to see counts of registered clients on any given day, along with allowing me to use SamePeriodLastMonth/Year functionality?
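To make the goal concrete, here is a small plain-Ruby sketch (not a cube or measure definition) of what a per-day expansion of the start/end rows would provide: one count per calendar day, so a date range and a "same days last month" comparison both become simple lookups. The Registration struct and both helper names are hypothetical.

```ruby
require "date"

# Hypothetical fact rows: one per client registration, with start and end dates.
Registration = Struct.new(:client_id, :start_date, :end_date)

# The "between" logic from the measure, applied to a single day.
def registered_on(registrations, day)
  registrations.count { |r| r.start_date <= day && day <= r.end_date }
end

# One count per day in +range+, the shape a daily snapshot keyed by a date
# dimension would give.
def daily_counts(registrations, range)
  range.map { |day| [day, registered_on(registrations, day)] }.to_h
end

rows = [
  Registration.new(1, Date.new(2024, 1, 10), Date.new(2024, 3, 15)),
  Registration.new(2, Date.new(2024, 2, 1),  Date.new(2024, 2, 20)),
]

range = Date.new(2024, 3, 1)..Date.new(2024, 3, 3)
current  = daily_counts(rows, range)
# Same days one month earlier, using Date#prev_month.
previous = range.map { |day| [day, registered_on(rows, day.prev_month)] }.to_h

current.each { |day, count| puts "#{day}: #{count} now, #{previous[day]} a month earlier" }
```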
Would you mind uploading the structure of your fact and dim tables?
Just a thought bubble: if you would like to measure counts for a program over calendar years, I believe you would definitely need to create a Date dimension. Also depending on your reporting needs you might want to consider whether you need an Accumulating Snapshot Fact table.
Please find further details on this:
http://www.kimballgroup.com/2012/05/design-tip-145-time-stamping-accumulating-snapshot-fact-tables/
Cheers
Nithin

Handling default values in window aggregations

I have an aggregation that looks at a sliding 30-day window (1 day period) of customer purchases, keyed by customer id, with the value being the purchase amount. I sum up the values by key, thus getting the aggregate purchase amount for each customer during the last 30 days. I store this number in a customer record in an external database.
My question is this: if a customer hasn't purchased anything in the last 30 days, how do I automatically reset the customer record to a default value, in this case zero? I'd prefer to keep all my logic in Dataflow and avoid doing too much work, since this will need to scale quite a bit. I'm basically looking for a way to automatically get a key-value for each key that was not in the current window but was in the last, and the value being a potentially configurable default.
Trying to answer my own question, but hoping for feedback as to whether this solution would scale:
I've thought about having a step after the initial window-and-sum. This transform would receive (customerId, purchaseSum) elements once a day, as the result of the 30-day window sum is made available. Since these elements are timestamped (with the timestamp of the most recent input element, I believe) I can re-window them. If I create a two-day window with a one-day period, I would then be able to group by key and process (customerId, [purchaseSumA, purchaseSumB]) for customers that had a purchase both in the last 30 days and in the last 31 days. In this case, I emit purchaseSumB. However, if there's only one element in the list, and the timestamp indicates that the purchase was made 31 days ago, I can assume that the customer has made no purchases since, and I need to emit (customerId, 0). Does that make sense?
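The decision made for each key in that second step can be sketched in plain Ruby. This is not Dataflow code and it simplifies the timestamp check: each sum is tagged with an integer day index (an assumption), and the function name is hypothetical.

```ruby
# Decides what to write for one customer, given the daily 30-day sums that
# landed in the two-day window. Each entry is [day_index, purchase_sum].
def value_to_emit(sums_in_window, today, default = 0)
  latest_day, latest_sum = sums_in_window.max_by { |day, _sum| day }
  # A sum was produced today: the customer still has purchases in the window.
  return latest_sum if latest_day == today
  # Only an older sum (or nothing) is present: reset to the default.
  default
end

puts value_to_emit([[9, 120], [10, 80]], 10) # both days present, newest wins
puts value_to_emit([[9, 120]], 10)           # nothing today, reset to 0
```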
Is it an option to slightly amend the database schema? I suppose now you have something like
`(customer_id int, purchases_last_month int)`
Instead how about
`(customer_id int, last_purchase datetime, purchases_last_month int)`
where this time last_purchase is the time of the last purchase made by this customer, and purchases_last_month refers to purchases made in the month before the last one? Then in your DoFn that writes to the database, you'd be making a conditional update (merge/upsert) that updates both last_purchase and purchases_last_month with the values from the current window, but only if last_purchase is increasing. This way you can deal with windows being processed out-of-order or in parallel, at the cost of slight increase in complexity in client queries (which you can address by adding a view on top of the table).
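The conditional update can be illustrated with an in-memory stand-in for the table. In practice this would be a single conditional UPDATE or MERGE run by the database; the CustomerStore class, its method names, and the fixed 30-day cutoff below are assumptions made for the sketch.

```ruby
# In-memory stand-in for the customer table.
class CustomerStore
  Row = Struct.new(:last_purchase, :purchases_last_month)

  def initialize
    @rows = {}
  end

  # Writes the window result only when last_purchase moves forward, so windows
  # processed late or out of order cannot overwrite fresher data.
  def upsert(customer_id, last_purchase, purchases_last_month)
    current = @rows[customer_id]
    return false if current && current.last_purchase >= last_purchase
    @rows[customer_id] = Row.new(last_purchase, purchases_last_month)
    true
  end

  # What a view on top of the table could expose: the stored sum while the
  # last purchase is recent enough, otherwise the default.
  def current_total(customer_id, now, window_seconds = 30 * 24 * 3600, default = 0)
    row = @rows[customer_id]
    return default if row.nil? || now - row.last_purchase > window_seconds
    row.purchases_last_month
  end
end

store = CustomerStore.new
store.upsert(1, Time.utc(2024, 3, 10), 150)
store.upsert(1, Time.utc(2024, 3, 5), 90)        # older window, ignored
puts store.current_total(1, Time.utc(2024, 3, 20)) # still inside 30 days
puts store.current_total(1, Time.utc(2024, 5, 1))  # stale, falls back to 0
```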

Prepping Data For Usage Clustering

Dataset: I'm given the number of minutes individual customers use a product each day and am trying to cluster this data in order to find common usage patterns.
My question: How can I format the data so that, for example, a power user with high levels of use for a year looks the same as a different power user who has only been able to use the device for a month before I ended data collection?
So far I've turned each customer into an array where each cell is the number of minutes used that day. This array starts when the user first uses the product and ends after the user's first year of use. All entries in the cells must be double values (e.x. 200.0 minutes used) for the clustering model. I've considered either setting all cells/days after the last day of data collection to either -1.0 or NULL. Are either of these a valid approach? If not what would you suggest?
For the problem where you want both users to look the same (one who used the product a lot every day for a year, and another who used it a lot for one month), create a new entry whose values are:
avg_usage per time_bin
time_bin can be a month, a day or another time bin which best fits your needs.
This way, a user who uses a product for, let's say, 200 minutes per day for one year will get:
200 * 30 * 12 / 12 = 6000 minutes per month
and the other user, who joined just last month, will get the same figure with the exact same usage:
200 * 30 * 1 / 1 = 6000 minutes per month.
This way, it doesn't matter when you started using the product; the only thing that matters is the usage rate.
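The same arithmetic as a short Ruby sketch, assuming one Float per day since first use and a 30-day bin as an approximation of a month. The function name is hypothetical, and dropping an incomplete trailing bin is a design choice of this sketch, not part of the suggestion above.

```ruby
# Average minutes per bin, computed only over the full bins a user was
# observed in. +daily_minutes+ holds one Float per day since first use.
def average_per_bin(daily_minutes, bin_size = 30)
  bins = daily_minutes.each_slice(bin_size).select { |bin| bin.size == bin_size }
  return nil if bins.empty? # not enough data for even one bin
  bins.sum { |bin| bin.sum } / bins.size
end

year_user  = Array.new(360, 200.0) # twelve full 30-day bins
month_user = Array.new(30, 200.0)  # one full 30-day bin

puts average_per_bin(year_user)  # 6000.0
puts average_per_bin(month_user) # 6000.0
```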
An important thing to take into consideration is that products may go unused for some time. For example, if I'm away on vacation, the days I didn't use my computer (maybe) have no bearing on my general usage of it. So, based on your data, product and intuition, you might consider removing gaps like that one and leaving them out of the calculation.
The length of time a user has used your product could be a signal of something, but if he only started some time ago and is still using it today, that may be something you need to take into consideration, and for that the average binning technique may help.

Algorithm for tracking changes in value over time

I am writing a rails app that deals with product inventory. I would like to include the following features, and am struggling with developing an efficient algorithm:
View stock history (how many were in stock on each date)
Quantity removed from warehouse, and quantity added to warehouse over specific periods of time
Amount of time the product was out of stock in any given period
My questions are as follows:
What is the best way of tracking changes? In addition to my Products table, should I create another table called HistoricProductQuantities, and insert a new record each time there is a change in the quantity?
What number should I track? The historic stock quantity (i.e. 50 in stock on this day, 24 in stock on that day), or the CHANGE in stock quantity i.e. -5 (5 sold) or 15 (15 added to inventory)? Or do I track both in separate tables?
Thanks for your help.
First of all I recommend implementing Date Dimensions on your application, as it seems like you will be doing a lot of Time related calculations. Search on Google for date dimensions as it's beyond the scope of your questions. That said, I believe it will be of great benefit for your app to implement and use date dimensions.
As far as your direct questions go:
What is the best way of tracking changes? In addition to my Products table, should I create another table called HistoricProductQuantities, and insert a new record each time there is a change in the quantity?
Yes, you could do this. I would probably call it HistoricProductSnapshot and keep track of the product activity in there on a daily basis. With this information as well as time dimensions you could do calculations such as "how many of Product X did we have 5 days ago, or a month ago", etc.
What number should I track? The historic stock quantity (i.e. 50 in stock on this day, 24 in stock on that day), or the CHANGE in stock quantity i.e. -5 (5 sold) or 15 (15 added to inventory)? Or do I track both in separate tables?
I do not have experience writing inventory control software, but I believe that with the snapshot table I mentioned in the answer above you would only have to keep track of quantities per day. The change in product counts could then be calculated from your snapshot table. You could, for example, have a function that outputs the product amounts in a given time range as an array. Example: from March 1 to March 7 these were the stock amounts for Product Y: [45,40,39,27,22,45,44].
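Deriving the changes from such an array is a one-liner with each_cons; the sketch below uses the stock amounts from the example, and the function name is hypothetical. Note that daily snapshots only give the net change per day, so a day with 10 sold and 10 restocked would show as 0.

```ruby
# Day-over-day change for a list of daily stock quantities.
def daily_changes(quantities)
  quantities.each_cons(2).map { |previous, current| current - previous }
end

stock   = [45, 40, 39, 27, 22, 45, 44] # March 1 to March 7
changes = daily_changes(stock)

removed = changes.select(&:negative?).sum.abs # net units taken out
added   = changes.select(&:positive?).sum     # net units put in

puts changes.inspect # [-5, -1, -12, -5, 23, -1]
puts "removed: #{removed}, added: #{added}" # removed: 24, added: 23
```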
Hope that helps. As I said, I am not a product inventory guy, but I have worked with point-of-sale systems and the procedure above should give you a good enough start for what you are trying to do.
This gem could be useful for tracking changes in models: https://github.com/collectiveidea/audited
Keep the data raw. I would personally create a new data entry every day, recording how many items you have in stock per day. Or you can make the interval much shorter, such as every 12 hours.
For our particular use case:
We had a table called Days, which had a many to many relationship with products, and each "relationship" will have a value called quantity (to keep track of quantity of product per day). Additionally per relationship, we had another value for the relationship with transactions (a one to many relationship) that has the entries for the time of transaction and remaining stocks.
I would personally advise you to use the quantity of stock as the raw data, as it will enable you to derive things such as how many items were removed during a certain transaction, when the item went out of stock and when it came back in stock, all from that data. When you have data on which you need to perform statistical calculations, it's best to store it as raw values (the quantity of the item).
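As an illustration of reading out-of-stock time from raw quantities, here is a minimal sketch that assumes one snapshot per day stored as a Date => quantity hash (a hypothetical shape, not the schema described above). Days without a snapshot are treated as unknown and are not counted.

```ruby
require "date"

# Days inside +range+ on which the recorded quantity was zero.
def out_of_stock_days(snapshots, range)
  range.select { |day| snapshots[day] == 0 }
end

snapshots = {
  Date.new(2024, 3, 1) => 3,
  Date.new(2024, 3, 2) => 0,
  Date.new(2024, 3, 3) => 0,
  Date.new(2024, 3, 4) => 12,
}

empty_days = out_of_stock_days(snapshots, Date.new(2024, 3, 1)..Date.new(2024, 3, 7))
puts empty_days.map(&:to_s).inspect # ["2024-03-02", "2024-03-03"]
puts empty_days.size                # 2
```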

How would you build this daily class schedule?

What I want to do is very simple but I'm trying to find the best or most elegant way to do this. The Rails application I'm building now will have a schedule of daily classes. For each class the fields relevant to this question are:
Day of the week
Starting time
Ending time
A single entry could be something such as:
day of week: Wednesday
starting time: 10:00 am
ending time: Noon
Also I must mention that it's a bi-lingual Rails 2.2 app and I'm using the native i18n Rails feature. I actually have several questions.
Regarding the day of the week, should I create an extra table with list of days, or is there a built-in way to create that list on the fly? Keep in mind these days of the week will have to be rendered in English or Spanish in the schedule view depending on the locale variable.
While querying the schedule I will need to group and order the results by weekday, from Monday to Sunday, and of course order the classes within each day by starting time.
Regarding the starting time and ending time of each class would you use datetime fields or integer fields? If the latter how would you implement this exactly?
Looking forward to reading the different suggestions you guys will come up with.
I would just store the day of the week as an integer. 0 => Monday ... 6 => Sunday (or any way you want. ie. 0 => Sunday). Then store the start time and end time as Time.
That would make grouping really easy. All you would have to do is sort by the day of the week and the start time.
You can display this in multiple ways, but here is what I would do.
Have functions like: @sunday_classes = DailyClass.find_sunday_classes that return all the classes for Sunday sorted by start time. Then repeat for each day.
def self.find_sunday_classes
  # 6 => Sunday under the 0 => Monday mapping above
  find_all_by_day_of_week(6, :order => 'start_time')
end
Note: find_by probably should have id at the end but that's just preference in how you want to name the column.
If you want the full week then call all seven from the controller and loop through them in the view. You could even create detail pages for each day.
Translation is the only tricky part. You can create a helper function that takes an integer and returns the text for the appropriate day of the week based on the locale.
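Such a helper could look roughly like this in plain Ruby. The DAY_NAMES table and the day_name function are hypothetical and use the 0 => Monday mapping from above; in a Rails app the names would more likely live in the locale files (Rails' own locale data keeps a similar list under the date.day_names key, starting with Sunday).

```ruby
# Day names per locale, indexed with 0 => Monday ... 6 => Sunday.
DAY_NAMES = {
  en: %w[Monday Tuesday Wednesday Thursday Friday Saturday Sunday],
  es: %w[lunes martes miércoles jueves viernes sábado domingo],
}.freeze

# Returns the name of the given day for the given locale.
# Raises KeyError or IndexError for an unknown locale or day number.
def day_name(day_of_week, locale = :en)
  DAY_NAMES.fetch(locale).fetch(day_of_week)
end

puts day_name(2)      # Wednesday
puts day_name(2, :es) # miércoles
```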
That's very basic. Nothing complicated.
If your data is a Time then I would store that as a Time - otherwise you will always have to convert it out of the database when you do date and time related operations on it. The day is redundant data, as it will be part of the time object.
This should mean that you don't need to store a list of days.
If t is a time then
t.strftime('%A')
will always give you the day as a string in English. This could then be translated by i18n as required.
So you only need to store starting time and ending time, or starting time and duration. Both should be equivalent. I would be tempted to store ending time myself, in case you need to do data manipulations on ending times, which therefore won't have to be calculated.
I think most of the rest of what you describe should also fall out of storing time data as instances of Time.
Ordering by week day and time will just be a matter of ordering by your time column. i.e.
DailyClass.find(:all, :conditions => ['whatever'], :order => 'starting_time')
Grouping by day is a little more tricky. However this is an excellent post on how to group by week. Grouping by day will be analogous.
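For small result sets, grouping in Ruby after loading the records is one option. The sketch below uses a hypothetical DailyClass struct in place of the model and relies on Time#wday (0 = Sunday) shifted so that Monday sorts first.

```ruby
# Hypothetical stand-in for the model: a name and a starting Time.
DailyClass = Struct.new(:name, :starting_time)

# Returns { "Monday" => [...], ... } ordered Monday to Sunday, with the
# classes inside each day ordered by starting time.
def schedule_by_day(classes)
  classes
    .sort_by { |c| [(c.starting_time.wday - 1) % 7, c.starting_time.hour, c.starting_time.min] }
    .group_by { |c| c.starting_time.strftime("%A") }
end

classes = [
  DailyClass.new("Yoga",    Time.utc(2024, 3, 6, 10, 0)),  # Wednesday
  DailyClass.new("Spin",    Time.utc(2024, 3, 4, 18, 0)),  # Monday
  DailyClass.new("Pilates", Time.utc(2024, 3, 4, 9, 0)),   # Monday
  DailyClass.new("Dance",   Time.utc(2024, 3, 10, 11, 0)), # Sunday
]

schedule_by_day(classes).each do |day, day_classes|
  puts "#{day}: #{day_classes.map(&:name).join(', ')}"
end
```

The English day names used as keys can then be passed through the translation layer at render time.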
If you are dealing with non-trivial volumes of data, it may be better to do it in the database, with a find_by_sql and that may depend on your database's time and date functionality, but again storing the data as a Time will also help you here. For example in Postgresql (which I use), getting the week of a class is
date_trunc('week', starting_time)
which you can use in a Group By clause, or as a value to use in some loop logic in rails.
Re days-of-week, if you need to have e.g. classes that meet 09:00-10:00 on MWF, then you could either use a separate table for days a class meets (keyed by both class ID and DOW) or be evil (i.e. non-normalized) and keep the equivalent of an array of DOW in each class. The classic argument is this:
The separate table can be indexed in a way to support either class-oriented or DOW-oriented selects, but takes a bit more glue to put the entire picture together for a class.
The array-of-DOW is simpler to visualize for beginning programmers and slightly simpler to code about, but means that reasoning about DOW requires looking at all classes.
If this is only for your personal class schedule, do what gets you the value you're looking for, and live with the consequences; if you're trying to build a real system for multiple users, I'd go with a separate table. All those normalization rules are there for a reason.
As far as (human-readable) DOW names, that's a presentation-layer issue, and shouldn't be in the core concept of DOW. (Suppose you decided to move to Montreal, and needed French? That should be another "face" and not a change to the core implementation.)
As for starting/ending times, again the issue is your requirements. If all classes begin and end at hour (x:00) boundaries, you could certainly use 0..23 as the hours of the day. But then your life would be miserable as soon as you had to accommodate that 45-minute seminar. As the old commercial said, "Pay me now or pay me later."
One approach would be to define your own ClassTime concept and partition all reasoning about times to that class. It could start with a simplistic representation (integral hours 0..23, or integral minutes after midnight 0..1439) and then "grow" as needed.
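A minimal version of such a ClassTime, using minutes after midnight as the internal representation, might look like this. The method names are illustrative assumptions; only the class name and the 0..1439 representation come from the paragraph above.

```ruby
# A time of day stored as integral minutes after midnight (0..1439).
class ClassTime
  include Comparable
  attr_reader :minutes

  def initialize(hour, minute = 0)
    raise ArgumentError, "hour out of range" unless (0..23).cover?(hour)
    raise ArgumentError, "minute out of range" unless (0..59).cover?(minute)
    @minutes = hour * 60 + minute
  end

  # Comparable uses this for <, >, ==, between?, and sorting.
  def <=>(other)
    minutes <=> other.minutes
  end

  # Length in minutes of a class that starts now and ends at +other+.
  def minutes_until(other)
    other.minutes - minutes
  end

  def to_s
    format("%02d:%02d", minutes / 60, minutes % 60)
  end
end

starts = ClassTime.new(10)
ends   = ClassTime.new(10, 45)
puts "#{starts} to #{ends} (#{starts.minutes_until(ends)} minutes)"
```

Because all reasoning about times goes through this one class, the 45-minute seminar only requires the minute argument, not a schema change elsewhere.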
