Custom function to calculate on all events in an Esper window

I am wondering what the best or most idiomatic way is to expose all events in a window to a custom function. The following example is modeled on the stock-price examples used in the Esper online documentation.
Suppose we have the following Esper query:
select avg(price), custom_function(price) from OrderEvent#unique(symbol)
The avg(price) part returns the average of the most recent price for each symbol. Suppose we want custom_function to work in a similar manner, but it needs complex logic: it must iterate over every value in the window each time the result is needed (e.g., some outlier-detection methods need such an algorithm).
To be clear, I'm requiring that the algorithm look something like:
custom_function(window):
    for each event in window:
        update calculation
and there is no clever way to update the calculation as events enter or leave the window.
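To make the shape of such a full-window computation concrete, here is a small sketch in Ruby (for illustration only; an Esper UDF itself would be a Java method). The median-absolute-deviation rule, the method names median and count_outliers, and the default threshold k = 3.0 are assumptions chosen for the example. Each call rescans every price in the window rather than updating a running state.

```ruby
# Hypothetical full-window computation: every call rescans all prices.

# Median of a non-empty array of numbers.
def median(values)
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Counts prices farther than k * MAD (median absolute deviation) from the
# window median. The MAD rule and k = 3.0 are illustrative assumptions.
def count_outliers(prices, k = 3.0)
  return 0 if prices.empty?

  m = median(prices)
  mad = median(prices.map { |p| (p - m).abs })
  return 0 if mad.zero?

  prices.count { |p| (p - m).abs > k * mad }
end

puts count_outliers([10, 11, 9, 10, 12, 50]) # prints 1
```

Because the median of the deviations depends on every value at once, recomputing from the full window is the simplest correct approach here.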
A custom aggregation function could achieve this by adding events to, and removing them from, its own collection as they enter and leave the window, but this becomes problematic when primitive types are used. It also feels wasteful: presumably Esper already holds the collection of events in the window, so we would prefer not to duplicate it.
Esper docs mention many ways to customize things; see this Solution Pattern, for example. The docs also mention that the 'pull API' can iterate over all events in a window.
What approaches are considered best to solve this type of problem?

For access to all events at the same time use window(*) or prevwindow(*).
select MyLibrary.computeSomething(window(*)) from ...
The computeSomething is a public static method of class MyLibrary (or define a UDF).
To process events one at a time, you could use an enumeration method. The aggregate method takes an initial value and an accumulator lambda. There is also an extension API for extending the built-in enumeration methods that you could use.
select window(*).aggregate(..., (value, eventitem) => ...) from ...
See the Esper documentation for window(*), for the aggregate enumeration method, and for the enumeration method extension API.
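The aggregate enumeration method is essentially a left fold: start from an initial value and apply the accumulator to each event in turn. As an analogy only (this is Ruby, not Esper EPL), the same shape is Ruby's Enumerable#inject; the OrderEvent struct and the [sum, count] accumulator below are assumptions for illustration.

```ruby
# Hypothetical stand-in for the OrderEvent type from the question.
OrderEvent = Struct.new(:symbol, :price)

# Left fold over the window: the initial value is [sum, count] and the block
# plays the role of the (value, eventitem) => ... accumulator lambda.
def fold_prices(events)
  events.inject([0.0, 0]) { |(sum, count), event| [sum + event.price, count + 1] }
end

window = [
  OrderEvent.new("IBM", 10.0),
  OrderEvent.new("MSFT", 20.0),
  OrderEvent.new("GE", 30.0)
]
sum, count = fold_prices(window)
puts sum / count # mean price over the window
```

Any per-event logic that only needs the running state fits this pattern; logic that needs all values at once fits the window(*) approach instead.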

Related

Does OData v4 support aggregation on date values?

I am looking for an OData query syntax that can express SUM(DATEDIFF(minute, StartDate, EndDate)), which we do in SQL Server. Is it possible to do this with OData v4?
I tried the aggregate function but was not able to use the sum operator on the Duration type. Any ideas?
You can't execute a query like that directly against a standards-compliant v4 service, because the built-in aggregates all operate on single existing fields; for instance, there is no support for creating a new arbitrary column to project the results into. This is mainly because such a column would be undefined. By restricting the specification to only the columns that are predefined on the resource itself, we get a strong level of certainty about the structure of the data that will be returned.
If you are the author of the API, there are three common approaches that can achieve a query similar to your request.
Define a Custom Data Aggregate. This is far more involved than necessary, but it means you could define the aggregate once and use it in many resource queries.
Only pursue this solution if you truly need to reuse the same aggregate on multiple resources.
Define a Custom Function to compute the result of all or some elements in your query.
Think of a Function as similar to a SQL view: it is really just a way of expressing a custom query and a custom response object that is associated with a resource.
It is common to use Functions to apply complex filter conditions that still return the resource that they are bound to, but you can return an entirely different structure of data if you want.
Exploit Open Types. This can sometimes be more effort than you expect, but it is manageable if there are only a small number of common transformations you want to apply to the resource and project as discrete properties alongside the standard resource definition.
In your case you could project DateDiff(minute, StartDate, EndDate) into its own discrete column, perhaps called Minutes or Duration. Then you could $apply a simple SUM across this new field.
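A minimal Ruby sketch of what such a projected column would contain, assuming rows with start and end times; the Row struct, the minutes method, and the Minutes/TotalMinutes names are hypothetical. It computes elapsed whole minutes, which is not identical to SQL Server's DATEDIFF(minute, ...): DATEDIFF counts minute boundaries crossed, so the two can differ by one.

```ruby
# Hypothetical row of the resource, with StartDate/EndDate as Time values.
Row = Struct.new(:start_date, :end_date) do
  # Elapsed whole minutes; what a projected "Minutes" property could expose.
  # (Time - Time yields seconds as a Float.)
  def minutes
    ((end_date - start_date) / 60).floor
  end
end

rows = [
  Row.new(Time.utc(2020, 1, 1, 9, 0), Time.utc(2020, 1, 1, 9, 30)),
  Row.new(Time.utc(2020, 1, 1, 10, 0), Time.utc(2020, 1, 1, 11, 15))
]

# Server-side equivalent of a client request such as
#   $apply=aggregate(Minutes with sum as TotalMinutes)
total_minutes = rows.sum(&:minutes)
puts total_minutes
```

Once the value exists as a real property, the standard sum aggregation applies to it like any other numeric field.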
Exposing a custom Function is usually the least-effort approach: you are not constrained by the shape of the result at all, and it can be maintained in relative isolation from the main resource. As with Open Types, a useful property of Functions is that the caller can still apply OData aggregates to the result of the Function.
If the original post is updated with more detailed code examples, I can elaborate on the function implementation; as it stands, I hope this information sets you on the right path.

Is there a method to split a Map based on the FIRST node to satisfy a predicate?

There is a function called Map.partition that splits the map into 2 maps with one containing ALL elements that satisfy the predicate. The predicate takes a key and a value as arguments and examines each element of the map to determine which result map it belongs to.
My requirement is a special case of this. I have a map and I want to split it into two maps based on whether the key is greater than or less than some value.
This would be much more efficient, as you only have to search the tree until the output of the predicate changes. The current implementation would be O(n), and what I am looking for would be O(log(n)). This should be straightforward for a custom tree implementation, but I would prefer to use the built-in collections if I can before I roll my own.
The documentation for the F# Maps can be found in the following link: https://msdn.microsoft.com/en-us/visualfsharpdocs/conceptual/collections.map-module-%5Bfsharp%5D
Although the operation you want could be implemented in O(log(n)), it is not implemented in this module. The best option would be to use partition (which is O(n), as you said) or to implement your own version of Map. You could also search GitHub for code that implements a red-black tree and add your own custom method for this operation.
Short Answer: No, there is not a method to split a Map based on the first node to satisfy a predicate.
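If you do roll your own, the core idea can be sketched in Ruby with a sorted key array standing in for the tree (Ruby for illustration only; this is not an F# API). The SortedKeyMap class and its split_at method are hypothetical. Only locating the split point is O(log n) here, via binary search; copying the two halves into new hashes is still O(n), so an O(log n) split overall needs a persistent balanced tree that supports splitting.

```ruby
# Hypothetical sorted-map wrapper: keys are kept in a sorted array so the
# split point can be found by binary search.
class SortedKeyMap
  def initialize(hash)
    @hash = hash
    @keys = hash.keys.sort
  end

  # Index of the first key >= pivot, found in O(log n) with Array#bsearch_index
  # (find-minimum mode); falls back to length when every key is below pivot.
  def split_index(pivot)
    @keys.bsearch_index { |k| k >= pivot } || @keys.length
  end

  # Returns [entries with key < pivot, entries with key >= pivot] as two hashes.
  # Building the new hashes is O(n); only the search is O(log n).
  def split_at(pivot)
    i = split_index(pivot)
    [@hash.slice(*@keys.take(i)), @hash.slice(*@keys.drop(i))]
  end
end

below, above = SortedKeyMap.new({ 1 => "a", 5 => "b", 3 => "c", 9 => "d" }).split_at(4)
p below
p above
```

Here below holds keys 1 and 3, and above holds keys 5 and 9.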

How to write filtering query with graphql?

Currently we are using the graphql/graphql-ruby library. I have written a few queries and mutations as per our requirements.
I have the following use case and I am not sure how to implement it.
I already have a query/endpoint named allManagers, which returns all manager details.
Today I got a requirement to implement another query that returns all managers filtered by region.
I see two options for handling this scenario:
1. Create an optional region argument and, inside the query, check whether the region was passed; if it was, filter by region.
2. Use something like https://www.howtographql.com/graphql-ruby/7-filtering/
Which approach is the correct one?
It looks like you can accomplish this with either approach. Option #2 looks a bit more complicated, but it may be more extensible if you end up adding many different types of filters.
Are you going to be asked to select multiple regions? Or to exclude regions (any region except North America)? Those are the types of questions you want to be thinking about when choosing an approach.
Sounds like a good conversation to have with a coworker :)
I'd probably opt to start with a simple approach and then change it out for a more complex one when the simple solution isn't solving all of my needs any more.
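Option 1 keeps the resolver small. Below is a plain-Ruby sketch of the resolver logic only (no graphql-ruby dependency, so it runs on its own); the Manager struct, the sample data, and Types::ManagerType are hypothetical. In graphql-ruby's class-based API, a method like this would typically back a field declared with an optional argument, as shown in the comment.

```ruby
# Hypothetical stand-in for the Manager model and its data source.
Manager = Struct.new(:name, :region)

MANAGERS = [
  Manager.new("Ana", "EMEA"),
  Manager.new("Bo", "APAC"),
  Manager.new("Cy", "EMEA")
].freeze

# Resolver logic for option 1. In graphql-ruby this would typically back:
#   field :all_managers, [Types::ManagerType], null: false do
#     argument :region, String, required: false
#   end
# When region is omitted, every manager is returned.
def all_managers(region: nil)
  return MANAGERS if region.nil?

  MANAGERS.select { |manager| manager.region == region }
end

puts all_managers(region: "EMEA").map(&:name).inspect
```

If multiple regions are needed later, the argument could become a list (for example a regions argument of type [String]) without renaming the field.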

Ruby event triggers

I am looking for a data structure or design pattern that works with Rails and Active Record to provide configurable events and event triggers based on real-time events.
While this is not the end usage for this sort of system, I believe the following example demonstrates what I am trying to do. Similar to a log monitoring system like Splunk, I want to create a system where I can take some attribute from an object, compare it to a desired value, and perform an action if the evaluation is true.
Is there a library to do this, or would something need to be built from scratch? The way I was thinking of designing this is as follows:
I would have an Actor (not in the traditional concurrency sense) that would house the attributes I want to compare against. Then I would have a Trigger model with a pointer to the actor_id, an attribute (e.g., count), a comparator (<, <=, ==, >=, >), a value, and an action_id. The action_id would point to an object with a perform method that houses the code to run when the trigger fires.
So in the end the trigger would evaluate to something like:
action.perform if actor.send(attribute) comparator value
Another option, possibly a more standard one, seems to be developing a DSL (e.g., FQL for Facebook). Would this be a better and more flexible approach?
Is this something a library can handle, or, if not, is this a decent structure for a system like the one I am proposing?
EDIT: It looks like a DSL might be the most flexible way to go (see tutorials on writing DSLs in Ruby).
If I understand the question correctly, what you have written is nearly correct as it stands. You can send comparison operators as symbols, like so:
comparator = :>
action.perform if actor.send(attribute).send(comparator, value)
# action will perform if actor's attribute is greater than value
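Building on that answer, here is a Ruby sketch of the Trigger model evaluated end to end. Trigger, Actor, RecordingAction, and the error_count attribute are hypothetical stand-ins for the ActiveRecord models in the question. Because the comparator and attribute would come from stored configuration, the sketch uses public_send and whitelists comparators rather than sending an arbitrary symbol.

```ruby
# Comparators a stored trigger may use; anything else is rejected so that
# configuration data cannot invoke arbitrary methods.
ALLOWED_COMPARATORS = %i[< <= == >= >].freeze

# Hypothetical stand-in for the Trigger model.
Trigger = Struct.new(:actor, :attribute, :comparator, :value, :action) do
  # Runs the action if actor.attribute <comparator> value; returns true if fired.
  def fire_if_matched
    op = comparator.to_sym
    raise ArgumentError, "unsupported comparator: #{comparator}" unless ALLOWED_COMPARATORS.include?(op)

    return false unless actor.public_send(attribute).public_send(op, value)

    action.perform
    true
  end
end

# Hypothetical actor and action used to exercise the trigger.
Actor = Struct.new(:error_count)

class RecordingAction
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def perform
    @calls += 1
  end
end

action = RecordingAction.new
Trigger.new(Actor.new(12), :error_count, ">", 10, action).fire_if_matched
Trigger.new(Actor.new(3), :error_count, ">", 10, action).fire_if_matched
puts action.calls # prints 1
```

In a real app the attribute name deserves the same whitelisting as the comparator.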

How best to do a codebase-wide find only on *displayed* instances of a string in Rails?

In several places in a rather complex Rails app there are references to a particular kind of object; let's call them "apples". I'd like to change all of these user-facing references from "apples" to "oranges". This would be simple enough, except that I'd like to retain Apple as class, so I don't want to touch the myriad methods, variables, symbols, etc. that use the word "apple".
There are several orders of magnitude more instances of apple in the code proper than there are user-facing instances of "apple". My question is: How can I zero in on the relatively few displayed instances? Is there a way to perform a search on all and only what is displayed by a browser?
Unless you've taken a disciplined approach to separate your language from your code, such as using localization files, then no, there's no easy way to find instances of displayed text. How is a search supposed to differentiate between 'apple' used as a type column and 'apple' inserted into a page?
This is why you might want to take an approach where you don't embed language in your controllers and models. Instead you could create a helper method to describe them for you:
You have <%= pluralize_model(@apple, 10) %> left.
That method, if constructed properly, would render '10 apples' or whatever term you'd like to use for that type of object.
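A minimal plain-Ruby sketch of such a helper, assuming a lookup table that maps model class names to the user-facing singular and plural terms; pluralize_model and DISPLAY_NAMES are hypothetical. In a Rails app, the more idiomatic home for these terms is the locale file (for example an activerecord.models.apple entry with one/other keys, read via Apple.model_name.human(count: n)), so a search for displayed text only has to look there.

```ruby
# Hypothetical table of user-facing terms, keyed by model class name.
DISPLAY_NAMES = { "Apple" => %w[orange oranges] }.freeze

# Stand-in for the existing model class, which keeps its name.
class Apple; end

# Renders "<count> <term>" using the display term rather than the class name.
def pluralize_model(record, count)
  singular, plural = DISPLAY_NAMES.fetch(record.class.name)
  "#{count} #{count == 1 ? singular : plural}"
end

puts pluralize_model(Apple.new, 10) # prints "10 oranges"
```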
