Modelling indefinitely-recurring tasks in a schedule (calendar-like rails app) - ruby-on-rails

This has been quite a stumbling block. Warning: most of what follows is not a question but an explanation of what I came up with. My question is: do you have a better way to do this? Is there some common technique for this that I'm not familiar with? It seems like this should be a trivial problem.
So you have a Task model. You can create tasks, complete them, destroy them. Then you have recurring tasks. A recurring task is just like a regular task, but it has a recurrence rule attached to it. However, tasks can recur indefinitely: you can go a year ahead in the schedule, and you should see the task show up.
So when a user creates a recurring task, you don't want to build thousands of tasks for a hundred years into the future and save them to the database, right? So I started thinking: how do you create them?
One way would be to create them as you view your schedule. So, when the user moves a month ahead, any recurring tasks for that month are created. Of course, that means you can no longer simply work with database records of tasks. Every SELECT operation on tasks you ever do has to be in the context of a particular date range, so that recurring tasks in that date range get persisted first. This is a maintenance and performance burden, but doable.
Alright, but how about the original task? Every recurrent task gets associated with the recurrence rule that created it, and every recurrence rule needs to know the original task that started the recurrence. The latter is important, because you need to clone the original task into new dates as the user browses their schedule. I guess doable too.
But what happens if the original task is updated? It means that now, as we browse the schedule, we will be creating recurring tasks cloned off of the modified task. That's undesirable. All the implicitly persisted recurring tasks should show up the way the original task looked when the recurrence was added. So we need to store a copy of the original task separately, and clone from that, in order for recurrence to work.
However, when the user navigates the tasks in the schedule, how do we know whether a new recurring task needs to be created at a particular point? We ask the recurrence rule: "hey, should I persist a task for this day?" and it says yes or no. If there is already a task for this recurrence for this day, we don't create one. All nice, except a user should also be able to simply delete one of the recurring tasks that has been automatically persisted. In that case, following our logic, the system will re-create the task that has been deleted. Not good. So it means we need to keep storing the task, but mark it as a deleted task for this recurrence. Meh.
As I said in the beginning, I want to know if somebody else tackled this problem and can provide architectural advice here. Does it have to be this messy? Is there anything more elegant I'm missing?
Update: Since this question is hard to answer perfectly, I will approve the most helpful insight into design/architecture, which has the best helpfulness/trade-offs ratio for this type of problem. It does not have to encompass all the details.

I know this is an old question but I'm just starting to look into this for my own application and I found this paper by Martin Fowler illuminating: Recurring Events for Calendars
The main takeaway for me was using what he calls "temporal expressions" to figure out if a booking falls on a certain date range instead of trying to insert an infinite number of events (or in your case tasks) into the database.
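To make the idea concrete, here is a minimal plain-Ruby sketch of a temporal expression. It is not taken from the paper or from any gem: `WeeklyRule`, `occurs_on?`, and `occurrences_between` are hypothetical names, and the rule only handles a simple weekly recurrence with a list of excluded dates. The point is that occurrences are computed for whatever date range is asked for, so nothing needs to be stored per occurrence.

```ruby
require "date"

# Hypothetical weekly rule: recurs every 7 days from start_date,
# except on explicitly excluded dates (e.g. a single deleted occurrence).
class WeeklyRule
  def initialize(start_date, exceptions: [])
    @start_date = start_date
    @exceptions = exceptions
  end

  # True if the rule produces an occurrence on the given date.
  def occurs_on?(date)
    date >= @start_date &&
      ((date - @start_date) % 7).zero? &&
      !@exceptions.include?(date)
  end

  # Computes the occurrences inside a date range on demand; nothing is stored.
  def occurrences_between(from, to)
    (from..to).select { |date| occurs_on?(date) }
  end
end

rule = WeeklyRule.new(Date.new(2024, 1, 1), exceptions: [Date.new(2024, 1, 15)])
january = rule.occurrences_between(Date.new(2024, 1, 1), Date.new(2024, 1, 31))
puts january.map(&:to_s).inspect
```

Here the excluded date plays the role of the "deleted occurrence" from the question: January 15 is skipped while the other Mondays in the range are still produced.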
Practically, for your use case, this might mean that you store the Task with a "temporal expression" property called schedule. The ice_cube recurrence gem has the ability to serialize itself into an active record property like so:
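    class Task < ActiveRecord::Base
      include IceCube
      # Persist the schedule as a serialized hash in the schedule column.
      serialize :schedule, Hash
      # Store the hash form of an IceCube::Schedule.
      def schedule=(new_schedule)
        write_attribute(:schedule, new_schedule.to_hash)
      end
      # Rebuild the IceCube::Schedule from the stored hash.
      def schedule
        Schedule.from_hash(read_attribute(:schedule))
      end
    end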
ice_cube seems really flexible and even allows you to specify exceptions to the recurrence rules. (Say you want to delete just one occurrence of the task, but not all of them.)
The problem is that you can't really query the database for a task that falls in a specific range of dates, because you've only stored the rule for making tasks, not the tasks themselves. For my case, I'm thinking about adding a property like "next_recurrence_date" which will be used to do some basic sorting/filtering. You could even use that to throw a task on a queue to have something done on the next recurring date. (Like check if that date has passed and then regenerate it. You could even store an "archived" version of the task once its next recurring date passes.)
This fixes your issue with "what if the task is updated" since tasks aren't ever persisted until they're in the past.
Anyway, I hope that is helpful to someone trying to think this through for their own app.

Having done a calendar-like component for an internal social networking app, here's my approach to that problem.
Tiny bit of background: I needed to book boardrooms for meetings for the entire company. Every boardroom needed to be booked either as a one-off or on a recurring basis. As you've found out, it's the recurrence rules that kill you. The additional twist to my problem was that there could be conflicts, i.e. two people could try to book the same boardroom for the same date and time.
I split my models into Boardroom (obviously) and Event (which is the booking associated to a User). I think there was a join model, as well, but it's been a while. When a User would try to book a boardroom, this is the process taken:
1. Attempt to book on the first available date (done through the calendar UI by the user similar to how Google Calendar creates events)
2. If it's a one-off, you're done
3. If it's a recurring event, try to immediately book the next 6 events based on the rule given (weekly, bi-weekly, monthly); If it fails, due to conflict, book the ones you can, e-mail the conflicts to the user
4. Book for the next year or up to the date the recurrence is ending in a background job; Follow the conflict resolution rule from #3
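The "book what you can, report the conflicts" step can be sketched roughly like this. It is a plain-Ruby illustration, not the original app's code: `Boardroom`, `book`, and `book_recurring` are hypothetical names, bookings are whole dates rather than time slots, and an in-memory set stands in for the bookings table.

```ruby
require "date"
require "set"

# Hypothetical in-memory stand-in for the bookings table: one set of taken dates.
class Boardroom
  def initialize
    @booked = Set.new
  end

  # Books the date and returns true, or returns false on a conflict.
  def book(date)
    return false if @booked.include?(date)

    @booked << date
    true
  end
end

# Tries to book `count` occurrences spaced `interval_days` apart.
# Returns [booked_dates, conflicting_dates] so conflicts can be sent to the user.
def book_recurring(room, first_date, interval_days:, count:)
  dates = Array.new(count) { |i| first_date + i * interval_days }
  dates.partition { |date| room.book(date) }
end

room = Boardroom.new
room.book(Date.new(2024, 3, 18)) # someone else already holds this date

booked, conflicts = book_recurring(room, Date.new(2024, 3, 4), interval_days: 7, count: 6)
puts "booked: #{booked.size}, conflicts: #{conflicts.map(&:to_s).inspect}"
```

In a real app the conflict check and the insert would have to happen atomically (for example behind a unique index), otherwise two users could still grab the same slot at the same moment.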
When resolving the conflicts, the user had the option of either resolving them on a case-by-case basis or moving the remaining bookings to the new, available date and time.
If the user updated the original booking (e.g. changed the time and date), they had the option of updating only that one or every following recurrence. If the latter was selected, steps 3 and 4 were re-invoked after the deletion of the existing events.
If this sounds a lot like Google Calendar, then you've fully understood my approach :)
Hope this helps.

I personally think that in Python (which I know well) and in Ruby (which I know less well, but it is also a dynamic language, so I think the concepts map 1:1), you should be using generators. How's that for a minimalistic answer? When you generate your UI, you pass in a reference to the generator, and it generates the objects you need as they are requested.
As an interface, it has next-item and previous-item methods, and acts a bit like a cursor that can wade forward and backward through the various iterations. It is, in effect, a piece of code masquerading as an infinite series (array) without using infinite memory.
Why do you need to proliferate objects? What you really need are virtual data display controls (for the web or desktop) also known as "paging" I think, in web contexts, and you can think of your schedule as an infinite generated-on-demand spreadsheet, with no top row, and no bottom row. The only values you need to be able to calculate (calculate, not store) are the ones that appear right now, as visible to the user.
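In Ruby, the closest built-in equivalent of a generator is `Enumerator`, optionally combined with `lazy`. The sketch below is only an illustration under simple assumptions: `occurrences` is a hypothetical helper, the recurrence is a fixed number of days, and the enumerator only moves forward (Ruby's `Enumerator` has `next` and `peek` but no "previous", so paging backward would mean recomputing from a known date).

```ruby
require "date"

# Infinite series of dates, one every `interval_days`, produced only on demand.
def occurrences(start_date, interval_days)
  Enumerator.new do |yielder|
    date = start_date
    loop do
      yielder << date
      date += interval_days
    end
  end
end

series = occurrences(Date.new(2024, 1, 1), 7)

# Only the window the user is looking at is ever computed.
visible = series.lazy
                .drop_while { |date| date < Date.new(2025, 1, 1) }
                .first(3)
puts visible.map(&:to_s).inspect
```

Jumping a year ahead costs a few dozen iterations here and no storage at all, which is the "calculate, not store" idea from the answer above.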

Related

Rails finding all user-scheduled events for a particular time?

I've got an app that allows users to schedule tasks to run whenever they desire. (A todo list with recurring items.)
I need to somehow re-trigger these events to show up again each time their schedule comes up by updating an attribute on the object - it may also send a notification to the user.
My plan for this was to have a cron job that runs at a short interval (every minute or every hour). In that job, it would find all of the items whose schedules match the current time or that should have been updated since the last run. However, short of iterating through every item, I don't see a quick way of querying for those objects.
Using Ice Cube I can very easily and cleanly save schedules in my database, but I don't see a method of finding all events that match a particular point in time.
I know once I find the item I can run occurring_between? or occurring_at? to find if I should run it, but that requires pulling every single item into memory and manually checking it, which is not very scalable.
Is there a way I'm missing, or are there other suggestions for accomplishing what I'm trying to do here? It's still pretty early, so I'm not attached to Ice Cube or any of the current implementations.
After some more thought, I'm not seeing any way to do this, so I've come up with a little hack that I'll do instead:
On the item object, I'll have 2 additional attributes. One will be the schedule which is fed to Ice Cube to generate all of the dates/times to recur at. The next will be next_occurrence, which I'll set on create and each time the item is renewed.
Then in the worker, I'll query for all items that have a next_occurrence in the past and process them, resetting the next_occurrence to be the next time the schedule is to occur.
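A rough plain-Ruby sketch of that worker loop follows. Everything in it is a simplifying assumption: `Item` and `process_due` are hypothetical names, an array stands in for the table, and a fixed interval in seconds stands in for the stored Ice Cube schedule. With a real table, the `select` would become a query on an indexed next_occurrence column.

```ruby
# Hypothetical item: `interval` (seconds) stands in for the stored recurrence rule.
Item = Struct.new(:name, :interval, :next_occurrence)

# Picks the items whose next_occurrence has passed and advances each one
# to its first occurrence strictly after `now`. Returns the items that were due.
def process_due(items, now)
  due = items.select { |item| item.next_occurrence <= now }
  due.each do |item|
    item.next_occurrence += item.interval while item.next_occurrence <= now
  end
  due
end

now = Time.utc(2024, 1, 1, 12, 0, 0)
items = [
  Item.new("water plants", 3600, Time.utc(2024, 1, 1, 9, 30, 0)),
  Item.new("pay rent", 86_400, Time.utc(2024, 1, 2, 0, 0, 0))
]

due = process_due(items, now)
puts due.map(&:name).inspect
puts items.first.next_occurrence
```

Advancing in a loop rather than by a single interval means an item that was missed for several runs is triggered once and then rescheduled into the future, instead of firing once per missed occurrence.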
I'll leave this answer unmarked for a bit in case anybody has a better solution.

perform some logic before object is loaded from database

The situation:
The magazine accepts submissions. Once you submit, an editor will schedule your submission for review. Once it has been reviewed, you are no longer allowed to edit it.
So, I have submissions in various states: "draft", "queued", "reviewed", etc. Most of the switches into these various states are triggered by some action, e.g., a submission becomes queued when an editor schedules it. Easy peasy. However, the switch into the "reviewed" state is not triggered by any action; it just happens after a certain datetime has passed.
I have two thoughts on how to accomplish this:
1. Run a daily/hourly cron job to check up on all the queued submissions and switch them to reviewed if necessary. I dislike this because:
   - I would prefer it to be hourly, so that I can edit my submission up to three hours before a meeting starts
   - Hourly cron jobs cost money on Heroku, and this application will either never make money or won't make money for months and months to come
2. Somehow construct a before_load ActiveRecord callback that will perform some logic on submissions each time they are loaded. "Queued? No? Nevermind. Otherwise, switch it to 'Reviewed' if its meeting is less than three hours away."
I wanted to get people's input on the second idea.
Is that an atrociously smelly way to accomplish this?
If so, can you suggest an awesomer third way?
If 'no' to both of the above, can you give tips on how to perform such logic each time a record is loaded from the database? I would need to always perform some logic before doing a select from the submissions table (which is gearing up to be the most-queried table in the app...)
If there's no good way to accomplish Option Two (or, I hope!, Option Three), I will resort to Option One with a daily cron job. Being able to edit up to a day before a meeting will just have to suffice.
Maybe use after_find, although your performance will sort of suck. The same goes for anything as crazy as a before_load hook. That said, money might be more important than performance; if so, I would go with after_find.
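One possible variation, sketched here in plain Ruby rather than ActiveRecord, is to not rewrite the row on load at all and instead derive the effective state whenever it is read. This is an assumption-laden illustration, not a drop-in model: `Submission`, `stored_state`, `meeting_at`, and `REVIEW_CUTOFF` are hypothetical names, and the three-hour cutoff is taken from the question.

```ruby
REVIEW_CUTOFF = 3 * 60 * 60 # three hours, in seconds

# Hypothetical plain-Ruby stand-in for the submission record.
Submission = Struct.new(:stored_state, :meeting_at) do
  # Derives the effective state at read time instead of updating the row.
  def state(now = Time.now)
    if stored_state == "queued" && now >= meeting_at - REVIEW_CUTOFF
      "reviewed"
    else
      stored_state
    end
  end

  # Edits are allowed until the submission counts as reviewed.
  def editable?(now = Time.now)
    state(now) != "reviewed"
  end
end

meeting = Time.utc(2024, 5, 1, 15, 0, 0)
submission = Submission.new("queued", meeting)

puts submission.state(Time.utc(2024, 5, 1, 11, 0, 0)) # four hours before the meeting
puts submission.state(Time.utc(2024, 5, 1, 13, 0, 0)) # two hours before the meeting
```

The trade-off is that queries filtering on the stored column alone would still see "queued", so any database-level filter would need to include the time comparison as well.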

Recommendations on handling object status fields in rails apps: store versus calculate?

I have a rails app that tracks membership cardholders, and needs to report on a cardholder's status. The status is defined - by business rule - as being either "in good standing," "in arrears," or "canceled," depending on whether the cardholder's most recent invoice has been paid.
Invoices are sent 30 days in advance, so a customer who has just been invoiced is still in good standing, one who is 20 days past the payment due date is in arrears, and a member who fails to pay his invoice more than 30 days after it is due would be canceled.
I'm looking for advice on whether it would be better to store the cardholder's current status as a field at the customer level (and deal with the potential update anomalies resulting from potential updates of invoice records without updating the corresponding cardholder's record), or whether it makes more sense to simply calculate the current cardholder status based on data in the database every time the status is requested (which could place a lot of load on the database and slow down the app).
Recommendations? Or other ideas I haven't thought of?
One important constraint: while it's unlikely that anyone will modify the database directly, there's always that possibility, so I need to try to put some safeguards in place to prevent the various database records from becoming out of sync with each other.
The storage of calculated data in your database is generally an optimisation. I would suggest that you calculate the value on every request and then monitor the performance of your application. If the fact that this data is not stored becomes an issue for you, that is the time to refactor and store the value within the database.
Storing calculated values, particularly those that can affect multiple tables, is generally a bad idea for the reasons that you have mentioned.
When/if you do refactor and store the value in the DB then you probably want a batch job that checks the value for data integrity on a regular basis.
The simplest approach would be to calculate the current cardholder status based on data in the database every time the status is requested. That way you have no duplication of data, and therefore no potential problems with the duplicates becoming out of step.
If, and only if, your measurements show that this calculation is causing a significant slowdown, then you can think about caching the value.
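As a sketch of the "calculate on request" option, the business rule from the question could be written as a small pure function. The names (`Invoice`, `cardholder_status`, `CANCEL_AFTER_DAYS`) are hypothetical, and one detail is an assumption not stated in the question: a paid invoice always means "in good standing", even if it was paid late.

```ruby
require "date"

# Hypothetical invoice: due date plus the date it was paid (nil while unpaid).
Invoice = Struct.new(:due_on, :paid_on)

CANCEL_AFTER_DAYS = 30

# Derives the status from the most recent invoice each time it is requested.
# Assumption: a paid invoice restores good standing regardless of when it was paid.
def cardholder_status(latest_invoice, today)
  return "in good standing" if latest_invoice.paid_on || today <= latest_invoice.due_on

  days_overdue = (today - latest_invoice.due_on).to_i
  days_overdue > CANCEL_AFTER_DAYS ? "canceled" : "in arrears"
end

due = Date.new(2024, 6, 1)
puts cardholder_status(Invoice.new(due, nil), Date.new(2024, 5, 10)) # invoiced, not yet due
puts cardholder_status(Invoice.new(due, nil), Date.new(2024, 6, 21)) # 20 days overdue
puts cardholder_status(Invoice.new(due, nil), Date.new(2024, 7, 15)) # 44 days overdue
```

Because the function only needs the latest invoice and today's date, the same code could later back a cached column or an integrity-checking batch job without changing the rule itself.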
Recently I had a similar decision to make, and I decided to store the status as a field in the database. I wanted to reduce SQL queries, and it looks simpler. I chose to do it that way because I very often need to get this status, and calculating it is (at least in my case) a bit complicated.
A possible problem is that it can get out of sync, so I added some after_save and after_destroy callbacks to the child model to keep it synchronized. Of course, if somebody modified the database in a different way, it would cause problems.
You can write a simple rake task that checks all statuses and, if needed, corrects them. You can run it from cron so you don't have to worry about it.

When I have required model relationships, how do I guard against errors?

I have an application with a lot of database relationships that depend on each other to successfully operate the application. The hinge in the application is a model called the Schedule, but the schedule will pull Blocks, an Employee, a JobTitle, and an Assignment (in addition to that, every Block will pull an assignment from the database along with it as well) to assemble an employee's schedule throughout the day.
When I built the app, I put a lot of emphasis on validations that would ensure that all of the pieces had to be in place before everything was saved to the database. This has worked out fantastically so far, and the app has been live and pounded on for almost 6 months, serving approximately 150,000 requests a month with no hiccups or errors. Until last week.
Last week, while someone was altering a schedule, it looks like the database erred, and a Schedule was saved to the database with its Assignment missing. Because the association is called in every view, whenever this schedule was loaded from the database, the application would throw a NoMethodError for calling a method on nil.
When designing an application in the way that I describe, do you guard against a possible failure on the part of the database/validations? And if so, how do you programmatically defend against it? Do you check every relationship to make sure that it is not nil before sending it to the view?
I know this question is awash in generality, and if I can be more specific in what I mean, please let me know in the comments.
I would recommend adding database-enforced foreign key constraints and wrapping important groups of operations into transactions.
If there is a foreign key relationship between Schedule and Assignment somewhere, a database-enforced foreign key constraint would have prevented the errant insert. Additionally, if you wrap the particular action in a transaction, you can be sure that the entire stream of inserts/updates/deletes either succeeds as a whole or fails as a whole, reverting to a clean state.
In addition to your validations, and adding some database constraints as mentioned in other answers, you might also run a background job that periodically sweeps the database looking for orphans.
When it finds one, it cleans it up (if possible), or deletes it, or just marks it inactive and sends you email so you can look at it later. Depending on the amount and nature of your data, once a minute, once an hour, once a day...
That way, if bad data does get in despite whatever safeguards you have in place, you'll know about it sooner rather than later.
I'll argue the non-conventional wisdom on this. The constraints you describe don't belong in the database; they belong in your OO code. And it's not true that "the database erred"; it's unquestionably true that the application is what inserted improperly validated data.
When you start expecting the database to carry the burden of these checks, you're putting business rules into the schema. At a minimum, this makes it a lot harder to write unit tests (which is where you should probably have caught this in the first place; but now is your chance to add another test.)
Ideally, you should be able to replace the RDBMS with some other generic data store and still have all the functional logic properly active and unchanged in the appropriate other places. The UI shouldn't be talking to the DAL much less dealing with database exceptions directly.
You can add the additional database constraints if you want, but it should be strictly as a backup. As you can see, handling database structural errors gracefully (especially if the UI is involved) is a lot harder.
If it's something that must be true in order for the app to function, that's really what assert()s are for. I've barely ever used Ruby, but I imagine it must have that concept. Use them to enforce preconditions in various places throughout your code. That combined with sanitizing and validating your external (user) inputs should be enough to protect you. I think if something goes wrong after that amount of checking, your app is righteously allowed to crash (in a controlled manner, of course).
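For what it's worth, core Ruby has no built-in assert outside of test frameworks such as Minitest; the usual idiom for a precondition is to raise an exception. A minimal sketch, with hypothetical names (`InvariantError`, `ensure_complete!`) and a Struct standing in for the Schedule model from the question:

```ruby
# Raised when a record that must be complete is not.
class InvariantError < StandardError; end

# Hypothetical stand-in for the Schedule model from the question.
Schedule = Struct.new(:employee, :assignment)

# Fails fast with a descriptive message instead of a NoMethodError deep in a view.
def ensure_complete!(schedule)
  missing = schedule.to_h.select { |_name, value| value.nil? }.keys
  raise InvariantError, "schedule is missing: #{missing.join(', ')}" unless missing.empty?

  schedule
end

begin
  ensure_complete!(Schedule.new("Alice", nil))
rescue InvariantError => e
  puts e.message
end
```

Calling a check like this at the boundary where the record is loaded gives one controlled failure point, rather than nil checks scattered through every view.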
I doubt the problem you're experiencing is a bug in your database. More likely there's some edge case in your validations that you've overlooked.

Letting something happen at a certain time with Rails

Like with browser games. User constructs building, and a timer is set for a specific date/time to finish the construction and spawn the building.
I imagined having something like a daemon, but how would that work? To me it seems that spinning + polling is not the way to go. I looked at async_observer, but is that a good fit for something like this?
If you only need the event to be visible to the owning player, then the model can report its updated status on demand and we're done, move along, there's nothing to see here.
If, on the other hand, it needs to be visible to anyone from the time of its scheduled creation, then the problem is a little more interesting.
I'd say you need two things. A queue into which you can put timed events (a database table would do nicely) and a background process, either running continuously or restarted frequently, that pulls events scheduled to occur since the last execution (or those that are imminent, I suppose) and actions them.
Looking at the list of options on the Rails wiki, it appears that there is no One True Solution yet. Let's hope that one of them fits the bill.
I just did exactly this thing for a PBBG I'm working on (Big Villain, you can see the work in progress at MadGamesLab.com). Anyway, I went with a commands table, where each user command generates exactly one entry, and an events table with one or more entries per command (linking back to the command). A secondary daemon, started with script/runner, polls the events table periodically and runs events whose time has passed.
So far it seems to work quite well. Unless I see some problem when I throw a large number of users at it, I'm not planning to change it.
To a certain extent it depends on how much logic is on your front end and how much is in your model. If you know how much time will elapse before something happens, you can keep most of the logic on the front end.
I would use your model to determine the state of things, and on a particular request you can check to see whether it is built or not. I don't see why you would need a background worker for this.
I would use AJAX to start a timer (see Periodical Executor) for updating your UI. On the model side, just keep track of the created_at column for your building and only allow it to be used if its construction time has elapsed. That way you don't have to take a trip to your db every few seconds to see if your building is done.
