Dataflow Sliding Window vs Global window with trigger usecase?

I am designing a basket abandonment system for an e-commerce company. The system will send a message to a user when the following rules are met:
The user has not interacted with the site for 30 minutes.
The user has added more than $50 worth of products to the basket.
The user has not yet completed a transaction.
I use Google Cloud Dataflow to process the data and decide whether a message should be sent. I see a couple of options:
Use a sliding window with a duration of 30 minutes.
Use a global window with a time-based trigger with a delay of 30 minutes.
I think a sliding window might work here. But my question is: can this use case also be solved with a global window, a processing-time trigger, and a delay?
As far as I understand triggers from the Apache Beam documentation:
Triggers allow Beam to emit early results, before a given window is closed. For example, emitting after a certain amount of time elapses, or after a certain number of elements arrives.
Triggers allow processing late data by triggering after the event time watermark passes the end of the window.
So, for my use case and given the trigger concepts above, I don't think a trigger can fire after a set delay for each and every user. (The documentation says a trigger can emit after a certain number of elements arrives, but I am not sure whether that number could be 1.) Can you confirm?

Both proposed options, (1) sliding windows and (2) a global window, are incorrect.
Sliding windows are incorrect because, assuming one key per user, a message would be sent 30 minutes after the user first started browsing, even if they are still browsing.
A global window is incorrect because it would cause messages to be sent every 30 minutes to all users, regardless of where they are in their current session.
Even fixed windows would be incorrect here because, assuming one key per user, a message would be sent every 30 minutes.
The correct answer is to use a session window with a gap duration of 30 minutes.
This works because a message is sent per user only after that user has been inactive for 30 minutes.
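To make the session-window behaviour concrete, here is a minimal plain-Python sketch. It is not Beam code; it only imitates what a 30-minute session gap does for a single user's events. The `(timestamp, kind, amount)` tuple layout, the event kinds `"add"` and `"purchase"`, and the function names are all assumptions made for illustration. In a real pipeline the runner would do the grouping once session windowing is applied per user key.

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=30)   # session gap duration from the question
MIN_BASKET = 50.0             # basket threshold from the question


def split_sessions(events, gap=GAP):
    """Group one user's (timestamp, kind, amount) events into sessions.

    A new session starts whenever the pause since the previous event
    is at least `gap`, which imitates how a session window closes.
    """
    sessions = []
    for event in sorted(events, key=lambda e: e[0]):
        if sessions and event[0] - sessions[-1][-1][0] < gap:
            sessions[-1].append(event)
        else:
            sessions.append([event])
    return sessions


def is_abandoned(session):
    """True if the session added more than MIN_BASKET and never purchased."""
    total = sum(amount for _, kind, amount in session if kind == "add")
    purchased = any(kind == "purchase" for _, kind, _ in session)
    return total > MIN_BASKET and not purchased


def abandoned_sessions(events, now: datetime, gap=GAP):
    """Return sessions that are closed (idle for at least `gap`) and abandoned."""
    return [
        s for s in split_sessions(events, gap)
        if now - s[-1][0] >= gap and is_abandoned(s)
    ]
```

Note that the session only tells you the user went quiet; the basket total and the missing purchase still have to be checked on the session's contents, as `is_abandoned` does here.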

From what you described, I think a sliding window is the correct approach, and I don't think you can solve this with a trigger plus a delay. If event-time sliding windowing makes sense for your business logic, try it first; that's what it's for.
My understanding is that while you can use a trigger to produce early results, it is not guaranteed to fire at a specific (server/processing) time or after an exact number of elements (received so far for the window). The trigger condition allows the runner to emit the window contents, but it doesn't force it to do so.
With event time this makes sense: it doesn't matter when the event arrives or when the trigger fires, because an element whose timestamp falls within a window is assigned to that window no matter when it arrives. When the trigger fires for the window, the element is guaranteed to be in that window if it has arrived.
With processing time you can't do this. If an event arrives late, it is accounted for at arrival time and is emitted the next time the trigger fires. And because the trigger doesn't guarantee the exact moment it fires, you can end up with unexpected data in unexpected panes. Processing-time triggers are useful for getting early results in general, but I am not sure you can reason about windowing based on them.
Also, a trigger delay only postpones a firing (e.g. if it was supposed to fire at 12:00 pm, it now fires at 12:05 pm); it doesn't let you reliably stagger multiple firings so that they go off at specific intervals.
You can look at the design doc for triggers here: https://s.apache.org/beam-triggers , and possibly lateness doc may be relevant as well: https://s.apache.org/beam-lateness
Other docs can be found here, if you are interested: https://beam.apache.org/contribute/design-documents/ .
Update:
Rui pointed out that this use case can be more complicated and is probably not easily solvable with sliding windows. It may be worth looking into session windows, or manual logic on top of keys + state + timers.

I found the Apache Beam documentation on state [1] and timers [2], which should be able to handle this use case without a processing-time trigger in the global window.
Assume the incoming data are events describing users' actions, and that each event (action) can be keyed by user_id.
The nice property of state and timers is that they are scoped per key and window. So you can accumulate state for each user_id; here the state is the amount of money in the cart. A timer can be set the first time the amount in the cart exceeds $50, and it can be reset whenever the user performs another shopping action within 30 minutes of processing time.
Assume transaction completion is also an event keyed by user_id. When a transaction completion event is seen, the timer can be deleted [3].
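The idea can be sketched in plain Python to show the bookkeeping involved. This is only a simulation of per-key state plus one resettable timer, not the Beam state/timer API; the class `BasketTracker`, its method names, the event kinds, and the use of plain seconds for time are all hypothetical choices for illustration.

```python
INACTIVITY_SECONDS = 30 * 60  # 30 minutes without activity
MIN_BASKET = 50.0             # basket threshold from the question


class BasketTracker:
    """Per-user state (basket total) plus one resettable timer (a deadline)."""

    def __init__(self):
        self.totals = {}     # user_id -> amount currently in the basket
        self.deadlines = {}  # user_id -> time at which the timer should fire

    def on_event(self, user_id, kind, amount, now):
        if kind == "purchase":
            # Transaction completed: clear the state and delete the timer.
            self.totals.pop(user_id, None)
            self.deadlines.pop(user_id, None)
            return
        if kind == "add":
            self.totals[user_id] = self.totals.get(user_id, 0.0) + amount
        # Any activity (re)sets the timer once the basket exceeds the threshold.
        if self.totals.get(user_id, 0.0) > MIN_BASKET:
            self.deadlines[user_id] = now + INACTIVITY_SECONDS

    def fire_due(self, now):
        """Return the users whose timer has expired and clear those timers."""
        due = sorted(u for u, d in self.deadlines.items() if d <= now)
        for user_id in due:
            del self.deadlines[user_id]
        return due
```

Each user returned by `fire_due` would receive the abandonment message. In a runner, the equivalent of `fire_due` is the timer callback invoked for that key.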
update:
This idea lives entirely in the processing-time domain, so it will produce false-alarm messages whenever data arrives late. The question is then how to move it to the event-time domain to reduce false alarms. One possibility is an event-time timer [4], although I am not yet clear on what exactly an event-time timer means.
[1] https://beam.apache.org/blog/2017/02/13/stateful-processing.html
[2] https://docs.google.com/document/d/1zf9TxIOsZf_fz86TGaiAQqdNI5OO7Sc6qFsxZlBAMiA/edit#
[3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/state/Timers.java#L45
[4] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/state/TimeDomain.java#L33

Related

Rails finding all user-scheduled events for a particular time?

I've got an app that allows users to schedule tasks to run whenever they desire. (A todo list with recurring items.)
I need to somehow re-trigger these events to show up again each time their schedule comes up by updating an attribute on the object - it may also send a notification to the user.
My plan was to have a cron job that runs at a short interval (every minute or every hour). In that job, it would find all of the items whose schedules match the current time or that should have been updated since the last run. However, short of iterating through every item, I don't see a quick way of querying for those objects.
Using Ice Cube I can very easily and cleanly save schedules in my database, but I don't see a method for finding all events that match a particular point in time.
I know that once I have an item I can call occurring_between? or occurring_at? to find out whether I should run it, but that requires pulling every single item into memory and checking it manually, which is not very scalable.
Is there a way I'm missing, or are there other suggestions for accomplishing what I'm trying to do here? It's still pretty early, so I'm not attached to Ice Cube or any of the current implementations.

After some more thought, I'm not seeing any way to do this, so I've come up with a little hack that I'll use instead:
On the item object, I'll have 2 additional attributes. One will be the schedule which is fed to Ice Cube to generate all of the dates/times to recur at. The next will be next_occurrence, which I'll set on create and each time the item is renewed.
Then in the worker, I'll query for all items that have a next_occurrence in the past and process them, resetting the next_occurrence to be the next time the schedule is to occur.
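A rough sketch of that worker query, written in Python with the standard `sqlite3` module rather than Rails. The table layout and the `interval_seconds` column are assumptions: they stand in for the stored Ice Cube schedule, which in the real app would be asked for its next occurrence instead.

```python
import sqlite3


def create_schema(conn):
    conn.execute(
        "CREATE TABLE items ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " interval_seconds INTEGER NOT NULL,"  # stand-in for the schedule
        " next_occurrence INTEGER NOT NULL)"   # Unix timestamp
    )
    # The index lets the worker find due items without scanning every row.
    conn.execute("CREATE INDEX idx_items_next ON items (next_occurrence)")


def process_due_items(conn, now):
    """Process every item whose next_occurrence has passed, then reschedule it."""
    due = conn.execute(
        "SELECT id, name, interval_seconds, next_occurrence FROM items"
        " WHERE next_occurrence <= ? ORDER BY next_occurrence",
        (now,),
    ).fetchall()
    processed = []
    for item_id, name, interval, next_at in due:
        processed.append(name)  # placeholder for the real renewal/notification
        # Jump past any occurrences missed while the worker was not running.
        # Assumes interval is a positive number of seconds.
        missed = (now - next_at) // interval + 1
        next_at += missed * interval
        conn.execute(
            "UPDATE items SET next_occurrence = ? WHERE id = ?",
            (next_at, item_id),
        )
    conn.commit()
    return processed
```

The point of the design is that the worker only ever compares one indexed column against the current time, so no schedule has to be loaded and evaluated for items that are not due.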
I'll leave this answer unmarked for a bit in case anybody has a better solution.

How to monitor a long process in a JSF 2/Primefaces/JBoss application

I have a JSF 2/Primefaces/JBoss application that has to run some long term processes from time to time. Specification says that once the process is started, its progress has to be monitored, showing the occurrence of intermediary phases of it.
The total count of steps is retrieved at the beginning, and a progress bar has to show how the process is advancing along with an estimate of the remaining time, based on statistics calculated during execution. It is not enough to show that something is happening; the page must show how much of the process remains to be done.
Even if the user closes the page, the process has to continue, and if the user comes back to the page, it has to show the current progress of the process. If the process has already finished, some information should be presented, such as the number of objects processed, the total time taken, and other statistics, like the object that took longest to process.
How to accomplish that in a JSF 2/Primefaces/JBoss application?

Your question can be broken down into two main problems.
1. How to log and track the status of a job.
2. How to present this information in the UI.
Depending on your requirements and your JBoss version, you may want to consider using a managed thread. You must decide how you will track the process steps. You could log each completed step in a database or keep it in memory. How will errors be handled? What if the process does not complete? Once you have these back end design decisions completed and implemented you just need to figure out what you want the UI to be like.
As mentioned in the comments section, PrimeFaces offers a couple of different options, such as polling or server-sent events. If you use polling, leaving the page and coming back is already handled, because the view simply presents the job's state at whatever point in time it is requested. You would then just need to refresh the view on an interval.
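As a language-neutral illustration of the tracking half, here is a small Python sketch of an in-memory progress record that a polling endpoint could read. It is not JSF or PrimeFaces code; the class `JobProgress`, its fields, and the simple average-based estimate are assumptions for the example. A real implementation would keep one such record per job somewhere that outlives the page (application scope or a database).

```python
import time


class JobProgress:
    """Tracks completed steps and derives percent, ETA, and the slowest step."""

    def __init__(self, total_steps, clock=time.monotonic):
        self.total_steps = total_steps
        self.completed = 0
        self.slowest = None  # (label, seconds) of the longest step so far
        self._clock = clock
        self._started = clock()
        self._last = self._started

    def step_done(self, label):
        """Called by the worker each time one step finishes."""
        now = self._clock()
        duration = now - self._last
        self._last = now
        self.completed += 1
        if self.slowest is None or duration > self.slowest[1]:
            self.slowest = (label, duration)

    def snapshot(self):
        """What a polling UI would render; safe to call at any time."""
        elapsed = self._last - self._started
        remaining = self.total_steps - self.completed
        # Estimate: average time per finished step times the steps left.
        eta = (elapsed / self.completed) * remaining if self.completed else None
        percent = (
            100.0 * self.completed / self.total_steps if self.total_steps else 100.0
        )
        return {
            "completed": self.completed,
            "total": self.total_steps,
            "percent": percent,
            "elapsed_seconds": elapsed,
            "eta_seconds": eta,
            "slowest": self.slowest,
        }
```

Because `snapshot` only reads stored values, it returns the same kind of answer whether the page was open the whole time or the user has just come back.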

Can a Passbook pass be relevant for multiple days?

I'm trying to use relevantDate to refine when the passes for my app are shown.
The relevantDate options I know about are: specify a start and end time on a single day, or specify a single time which seems to show the pass in the lock screen for about a four-hour window in either direction (!), at least for the "generic" pass type.
It seems like there ought to be a way to specify, e.g., for a coupon, that it should be shown on the lock screen when they're in the store for the next two weeks, at which point it should expire. Is there? If so, what is it?

Sorry, there's not currently a way to do this.
Relevance is a cooperative effort between you and the system. Your pass gives a point in time (the relevant date) and/or a point in space (the relevant locations). There's no API to provide a duration or a region. The system decides what radius to use around that location and what window of time to use around the date. There is some documentation for the relevant locations in the Passbook guide but the time window is not documented. In practice, the time window is on the order of 4-8 hours, depending on the pass style.
You should go on Apple's bug report page and file an enhancement request, describing how it would add value to your coupons to be relevant for multiple days.

Although not quite what you are looking for, you could send a push update to remove the locations after the promotion period ends.
If you have a promotion that is valid in a particular store, you could use locations with relevantText appropriate for the promotion period, e.g. "20% Off, Ends Jan 20". Once the promotion ends, you send a push with no locations (or replace it with a new offer).
The relevantDate key is not supported by the coupon or storeCard pass types, and there is no way to specify a custom lock screen message for a time-based alert, so personally I prefer to use location alerts whenever a location is known. The exception would be when it makes sense to remind the user a few hours beforehand (e.g. for a dental appointment or a scheduled personal training session).

How to find out when user-created calendar events are about to start [Rails]

I'm building an online calendar in Ruby on Rails that needs to send out email notifications whenever a user-created event is about to start or finish (i.e. you get a reminder when a meeting is 5 minutes away). What's the best way of figuring out when an event is about to start? Would there be a cron task that checks through all events to find the ones starting within a certain threshold (e.g. 5 minutes)? A cron task seems inefficient to me, so I'm wondering what might be a better solution. My events are stored in a MySQL database. There must be a design pattern for this... I'm just at a loss for what to search for.
Any help would be greatly appreciated. Thanks.

In all likelihood you will implement some background queuing mechanism to actually deliver the notifications; at the very least you should certainly be considering this approach.
Assuming this, why not create your delayed notification jobs at event creation time, to be delivered when the associated event is starting or finishing? The background queue, which is already waking up periodically to look for work, will pick these up and run them.
However adopting this approach requires you to consider the following (at least):
Removing queued notification job if the associated event is removed
Amending the notification job if the associated event is amended (say a new time)
Ensuring that the polling resolution of the queuing system does not allow notifications to be delivered so late as to be useless.
If you haven't picked a queuing solution for your application you should consider these options
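The three considerations in this answer (remove, amend, deliver on time) can be illustrated with a toy in-memory queue in Python. This is only a sketch of the bookkeeping, not any particular Rails queuing library; `NotificationQueue` and its methods are hypothetical names, and times are plain numbers.

```python
import heapq
import itertools


class NotificationQueue:
    """Delayed jobs keyed by event, with cancel and amend support."""

    def __init__(self):
        self._heap = []      # entries of (run_at, sequence, event_id)
        self._current = {}   # event_id -> sequence number of the live job
        self._seq = itertools.count()

    def schedule(self, event_id, run_at):
        """Create the job for an event, or amend it if one already exists."""
        seq = next(self._seq)
        self._current[event_id] = seq  # any older heap entry is now stale
        heapq.heappush(self._heap, (run_at, seq, event_id))

    def cancel(self, event_id):
        """Called when the associated event is removed."""
        self._current.pop(event_id, None)

    def pop_due(self, now):
        """Return event ids whose notification time has arrived, in order."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, seq, event_id = heapq.heappop(self._heap)
            # Skip entries that were cancelled or replaced by a newer job.
            if self._current.get(event_id) == seq:
                del self._current[event_id]
                due.append(event_id)
        return due
```

How often `pop_due` is called is the polling resolution mentioned above: a notification can be late by up to one polling interval.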

Letting something happen at a certain time with Rails

Like with browser games. User constructs building, and a timer is set for a specific date/time to finish the construction and spawn the building.
I imagined having something like a daemon, but how would that work? To me it seems that spinning + polling is not the way to go. I looked at async_observer, but is that a good fit for something like this?

If you only need the event to be visible to the owning player, then the model can report its updated status on demand and we're done, move along, there's nothing to see here.
If, on the other hand, it needs to be visible to anyone from the time of its scheduled creation, then the problem is a little more interesting.
I'd say you need two things. A queue into which you can put timed events (a database table would do nicely) and a background process, either running continuously or restarted frequently, that pulls events scheduled to occur since the last execution (or those that are imminent, I suppose) and actions them.
Looking at the list of options on the Rails wiki, it appears that there is no One True Solution yet. Let's hope that one of them fits the bill.

I just did exactly this thing for a PBBG I'm working on (Big Villain, you can see the work in progress at MadGamesLab.com). Anyway, I went with a commands table where user commands each generated exactly one entry and an events table with one or more entries per command (linking back to the command). A secondary daemon run using script/runner to get it started polls the event table periodically and runs events whose time has passed.
So far it seems to work quite well. Unless I see a problem when I throw a large number of users at it, I'm not planning to change it.

To a certain extent it depends on how much logic is on your front end and how much is in your model. If you know how much time will elapse before something happens, you can keep most of the logic on the front end.
I would use your model to determine the state of things, and on a particular request you can check whether the building is finished or not. I don't see why you would need a background worker for this.

I would use AJAX to start a timer (see Periodical Executor) for updating your UI. On the model side, just keep track of the created_at column for your building and only allow it to be used if its construction time has elapsed. That way you don't have to take a trip to your db every few seconds to see if your building is done.
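The "derive the state on demand" idea can be shown with a short Python sketch. It is not Rails code; the `Building` class and its method names are made up for illustration, and only `created_at` comes from the answer above.

```python
from datetime import datetime, timedelta


class Building:
    """A building whose status is computed from created_at, never stored."""

    def __init__(self, name, created_at: datetime, construction_time: timedelta):
        self.name = name
        self.created_at = created_at
        self.construction_time = construction_time

    @property
    def finished_at(self):
        return self.created_at + self.construction_time

    def is_built(self, now):
        """True once the construction time has elapsed."""
        return now >= self.finished_at

    def remaining(self, now):
        """Time left for a front-end countdown; never negative."""
        return max(timedelta(0), self.finished_at - now)
```

Nothing has to run at the moment construction ends: the front-end timer counts down from `remaining`, and the server simply re-checks `is_built` whenever the building is used.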
