I am looking for OData query syntax that can do the equivalent of SUM(DATEDIFF(minute, StartDate, EndDate)), which we do in SQL Server. Is it possible to do such things using OData v4?
I tried the aggregate function, but I am not able to use the sum operator on the duration type. Any ideas?
You can't execute a query like that directly in a standards-compliant v4 service, because the built-in aggregates all operate on single fields; there is no support for creating an arbitrary new column to project the results into, mainly because such a column would be undefined. By restricting the specification to columns that are pre-defined in the resource itself, we get a strong level of certainty about the structure of the data that will be returned.
If you are the author of the API, there are three common approaches that can achieve a query similar to your request.
Define a Custom Data Aggregate. This is far more involved than is necessary here, but it means you can define the aggregate once and use it in many resource queries. Only research this solution if you truly need to reuse the same aggregate on multiple resources.
Define a Custom Function to compute the result of all or some elements in your query.
Think of a Function as similar to a SQL view; it is really just a way of expressing a custom query, with a custom response object, that is associated with a resource.
It is common to use Functions to apply complex filter conditions that still return the resource that they are bound to, but you can return an entirely different structure of data if you want.
Exploit Open Types. This can sometimes be more effort than you expect, but it is manageable if there are only a small number of common transformations you want to apply to the resource, projecting their results as discrete properties in addition to the standard resource definition.
In your case you could project DateDiff(minute, StartDate, EndDate) into its own discrete column, perhaps called Minutes or Duration, and then $apply a simple sum across this new field (sketched at the end of this answer).
Exposing a custom Function is usually the least-effort approach: you are not constrained by the shape of the result at all, and it can be maintained in relative isolation from the main resource. As with Open Types, the useful thing about Functions is that the caller can still apply OData aggregates to the result of the Function.
If the original post is updated with some more detailed code examples, I can elaborate on the Function implementation; in the meantime I hope this information sets you on the right path.
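For illustration, assuming the events live in an entity set called Events and the computed difference is exposed as a property called Minutes (via an Open Type or a pre-computed column), the sum becomes a standard aggregate:

GET /Events?$apply=aggregate(Minutes with sum as TotalMinutes)

If you go the Function route instead, and the Function is declared composable and returns the projected rows, the caller can still append the same aggregate:

GET /Events/Namespace.WithDuration()?$apply=aggregate(Minutes with sum as TotalMinutes)

Here Events, Minutes, Namespace.WithDuration and TotalMinutes are all placeholder names for the sake of the example.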
Related
I have a list of IPs that I want to filter out of many queries that I have in Sumo Logic. Is there a way to store that list of IPs somewhere so it can be referenced, instead of copy-pasting it into every query?
For example, in a perfect world it would be nice to define a list of things like:
things=foo,bar,baz
And then in another query reference it:
where mything IN things
Right now I'm just copying/pasting. I think there may be a way to do this by setting up a custom data source and putting the IPs in there, but that seems like a very roundabout way of doing it, and it wouldn't help to reuse parts of a query that aren't data (e.g. reusing statements). Also, their template feature is about parameterizing a query, not reuse across many queries.
Yes. There's a notion of Lookup Tables in Sumo Logic. Consult:
https://help.sumologic.com/docs/search/lookup-tables/create-lookup-table/
for details.
It allows you to store some values (either manually once, or in a scheduled way as the result of some query) with the | save operator.
You can then refer to these values using | lookup, which is conceptually similar to SQL's JOIN.
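A rough sketch of the flow (the paths, field names and queries are made up, so check the docs above for the exact syntax): one scheduled search persists the values, and any other query can then join against them.

_sourceCategory=firewall | count by malicious_ip | save path://"/Library/Users/you@example.com/lookups/malicious_ips"

_sourceCategory=web | lookup malicious_ip from path://"/Library/Users/you@example.com/lookups/malicious_ips" on src_ip = malicious_ip | where isNull(malicious_ip)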
Disclaimer: I am currently employed by Sumo Logic.
I'm struggling to find a real-world example of how to use Google Cloud Dataflow combiners to run a common ETL task which aggregates records on multiple keys (e.g. Date, Location) and sums values over different measures (e.g. GrossValue, NetValue, Quantity). I can only find examples with a typical Key/Value (e.g. Day/Value) aggregation. Any hints on how this is done with the Python SDK would be appreciated.
I'm not 100% sure I understand your question. Do you have separate elements whose data you are trying to join together (in which case you may wish to use CoGroupByKey), or does a single element have multiple fields?
Hope some of this info helps.
I would suggest looking at windowing, which will allow you to subdivide a PCollection according to the timestamps of its individual elements. If you want to see all the events for a particular day, this may be useful. See the Python examples of windowing; you may want to window across a day's worth of data. This link is useful as well for understanding how you can use GroupByKey in different ways.
Another option is to determine what date each of your elements belongs to and use GroupByKey, keying each element with "[location][date][other]". You may need to do something like this if you want to join the data based on multiple fields.
See this GroupByKey example, but change the key to use your multiple fields concatenated.
Here is an example of reducing with a custom combiner. You can add logic there to do a custom aggregation over multiple different measures.
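To make the multi-key, multi-measure case concrete, here is a minimal sketch with the Beam Python SDK (the record fields and values are invented for the example): key each record by (Date, Location) and sum the three measures with a custom CombineFn.

import apache_beam as beam

# Invented sample records with multiple key fields and multiple measures.
records = [
    {'Date': '2016-01-01', 'Location': 'NY', 'GrossValue': 100.0, 'NetValue': 80.0, 'Quantity': 2},
    {'Date': '2016-01-01', 'Location': 'NY', 'GrossValue': 50.0, 'NetValue': 40.0, 'Quantity': 1},
    {'Date': '2016-01-02', 'Location': 'LA', 'GrossValue': 75.0, 'NetValue': 60.0, 'Quantity': 3},
]

class SumMeasures(beam.CombineFn):
    # Element-wise sum of (GrossValue, NetValue, Quantity) tuples.
    def create_accumulator(self):
        return (0.0, 0.0, 0)
    def add_input(self, acc, values):
        return tuple(a + v for a, v in zip(acc, values))
    def merge_accumulators(self, accumulators):
        return tuple(sum(parts) for parts in zip(*accumulators))
    def extract_output(self, acc):
        return acc

with beam.Pipeline() as p:
    (p
     | beam.Create(records)
     | beam.Map(lambda r: ((r['Date'], r['Location']),
                           (r['GrossValue'], r['NetValue'], r['Quantity'])))
     | beam.CombinePerKey(SumMeasures())
     | beam.Map(print))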
There are several possible ways I can think of to store and then query temporal data in Neo4j. Looking at an example of being able to search for recurring events and any exceptions, I can see two possibilities:
One easy option would be to create a node for each occurrence of the event. Whilst it is easy to construct a Cypher query to find all events on a day, in a range, etc., this could create a lot of unnecessary nodes. It would also make it very easy to change individual events' times, locations, etc., because there is already a node holding the basic information.
The second option is to store the recurrence temporal pattern as a property of the event node. This would greatly reduce the number of nodes within the graph. When searching for events on a specific date or within a range, all nodes that meet the start/end date (plus any other) criteria could be returned to the client. It then boils down to iterating through the results to pluck out the subset whose temporal pattern gives a date within the search range, then comparing that to any exceptions and merging (or ignoring) the results as necessary (this could probably be partially achieved when pulling the initial result set as part of the query).
Whilst the second option is the one I would choose currently, it seems quite inefficient in that it processes the data twice, albeit a smaller subset the second time. Even a plugin to Neo4j would probably result in two passes through the data, but the processing would be done on the database server rather than the requesting client.
What I would like to know is whether it is possible to use Cypher or Neo4j to do this processing as part of the initial query?
Whilst I'm not 100% sure I understand your requirement, I'd have a look at this blog post; perhaps you'll find a bit of inspiration there: http://graphaware.com/neo4j/2014/08/20/graphaware-neo4j-timetree.html
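As a very rough illustration of the two-pass idea described in the question (the label and property names are invented), Cypher can at least do the coarse start/end range filter server-side, leaving only the recurrence-pattern and exception checks for the client or a plugin:

MATCH (e:Event)
WHERE e.startDate <= {rangeEnd} AND e.endDate >= {rangeStart}
RETURN e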
I am using the Neo4j .NET Client's ExecuteGetCypherResults to run Cypher. It expects everything to come back in a single column. I have a simple class, JobType, which contains a list of JobSpecialties. In the database this is modeled as the Types having a relationship to the Specialties.
I need a Cypher query that returns the results as such, in a single column, with the related Specialties as a child property of the Type node. I would expect the query to look like this:
start s=node:node_auto_index(StartType='JobTypes')
match s-[:starts]->t, t-[:SubTypes]->ts
return {Id: t.Id, Name: t.Name, JobSpecialties: ts}
But this doesn't work. I can't figure out from the docs if this is even possible. If there is a better way to get the result back to the .Net client, I am open to suggestions.
start s=node:node_auto_index(StartType='JobTypes')
match s-[:SubTypes]->js
return s.Id, s.Name, js;
ExecuteGetCypherResults does support multiple columns, you just need to kick our deserializer into a different mode. This is an implementation detail generally hidden behind our higher level APIs, which is why this isn't obvious.
When you call new CypherQuery, pass CypherResultMode.Projection instead of CypherResultMode.Set.
I actually can't remember why we have this. Sometime, I'll need to dig through the lower levels and try and kill it. Pull requests welcomed. :)
We always prefer people to use the higher-level APIs though (but we recognise there are some limitations).
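A minimal sketch of the projection mode in use, mirroring the query from the question (the result class, aliases and exact CypherQuery constructor overload are illustrative and vary between Neo4jClient versions):

using System.Collections.Generic;
using Neo4jClient;
using Neo4jClient.Cypher;

public class JobTypeRow
{
    public long Id { get; set; }
    public string Name { get; set; }
    // One specialty per returned row; group the rows client-side if you need a list per type.
    public Node<JobSpecialty> JobSpecialties { get; set; }
}

var query = new CypherQuery(
    "start s=node:node_auto_index(StartType='JobTypes') " +
    "match s-[:starts]->t, t-[:SubTypes]->ts " +
    "return t.Id as Id, t.Name as Name, ts as JobSpecialties",
    new Dictionary<string, object>(),
    CypherResultMode.Projection);

// rawClient is the same IRawGraphClient you are already calling ExecuteGetCypherResults on.
var rows = rawClient.ExecuteGetCypherResults<JobTypeRow>(query);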
It sounds like the .NET client needs some updating for Cypher. Cypher doesn't support building maps on the fly yet, although it is already on the feature request list...
You can create an array with your results (but as of 1.9.M04, they need to be the same type to be merged into the array):
http://console.neo4j.org/r/xo7voi
I've actually submitted a pull request (through back channels, since it broke some unit tests) to fix that (so you can have multiple types in an array built on the fly), but I think there are some concerns about whether merging different types is a good idea.
https://github.com/wfreeman/neo4j/commit/ca457ace0df4732376833b8694e4affac4143244
Update: this will be fixed in 1.9.M05/1.9.GA. Now you can build an array with mixed types:
http://console.neo4j.org/r/vm4f83
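For instance, once that lands, the query from the question can return one mixed-type array per row (a sketch in 1.9 syntax, mirroring the original match):

start s=node:node_auto_index(StartType='JobTypes')
match s-[:starts]->t, t-[:SubTypes]->ts
return [t.Id, t.Name, ts]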
My objective:
I have built a working controller action in MVC which takes user input for various filter criteria and, using PredicateBuilder (part of LinqKit - sorry, I'm not allowed enough links yet) builds the appropriate LINQ query to return rows from a "master" table in SQL with a couple hundred thousand records. My implementation of the predicates is totally inelegant, as I'm new to a lot of this, and under a very tight deadline, but it did make life easier. The page operates perfectly as-is.
To this, I need to add a full-text search filter. Understanding the way LINQ translates Contains into LIKE '%...%', and using the advice in Simon Blog: LINQ-to-SQL - Enabling Full-Text Searching, I've already prepared table-valued functions in SQL Server to run FREETEXT queries on the relevant columns. I have four functions, to match the query against four separate tables.
My approach:
At the moment, I'm building the predicates (I'll spare you the details) for the initial IQueryable data object and running a LINQ query to apply them, like so:
var MyData = DB.Master_Items.Where(outer);
Then, I'm attempting to further filter MyData on the Keys returned by my full-text search functions:
var FTS_Matches_Subtable_1 = (from tbl in DB.Subtable_1
join fts in DB.udf_Subtable_1_FTSearch(KeywordTerms)
on tbl.ID equals fts.ID
select tbl.ForeignKey);
... I have 4 of those sets of matches which I've tried to use to filter my original dataset in several ways with no success. For instance:
MyNewData = MyData.Where(d => FTS_Matches_Subtable_1.Contains(d.Key) ||
FTS_Matches_Subtable_2.Contains(d.Key) ||
FTS_Matches_Subtable_3.Contains(d.Key) ||
FTS_Matches_Subtable_4.Contains(d.Key));
I just get the error: The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Too many parameters were provided in this RPC request. The maximum is 2100.
I get that it's because I'm trying to pass a relatively large set of data into the Contains function and LINQ is converting each record into a separate parameter, exceeding the limit.
I just don't know how to get around it.
I found another post, "linq expression to return property value", which seemed SO promising. I tried ifwdev's solution (the 2nd highest-ranked answer): using LinqKit to build an extension that will break up the queries into manageable chunks. But I can't figure out how to implement it. Out of my depth right now, maybe?
Is there another approach that I'm missing? Some simpler way to accomplish this that I've overlooked?
Sorry for the long post. But thank you for any help you can provide!
This is a perfect time to go back to raw ADO.NET.
Twisting things around just to use LINQ to SQL will probably take just as long as writing the query and the hydration by hand.
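A rough sketch of that route, reusing the table-valued functions named in the question (the connection string, variable names and column list are guesses); the keyword terms travel as a single parameter, so the 2100-parameter limit never comes into play:

using System.Data.SqlClient;

var sql = @"
SELECT m.*
FROM   Master_Items m
WHERE  EXISTS (SELECT 1
               FROM   Subtable_1 t1
               JOIN   dbo.udf_Subtable_1_FTSearch(@terms) f1 ON f1.ID = t1.ID
               WHERE  t1.ForeignKey = m.[Key])
   -- OR EXISTS (...) repeated for the other three subtable functions
";

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(sql, conn))
{
    cmd.Parameters.AddWithValue("@terms", KeywordTerms);
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // hydrate your model from the reader columns here
        }
    }
}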