Based on this talk: https://www.youtube.com/watch?v=srfaKA2wJ0s
I would like to implement an analytics/time-series query in GraphQL, like this:
query {
  sales(date: { start: "2017-01-01", end: "2018-01-01" }) {
    revenue(stat: mean)
    daily: interval(by: day) {
      date
      revenue
    }
  }
}
revenue(stat: mean) is an aggregation based on one statistic (mean in this case)
daily is a list of data points by hour/day/month
How can I implement this in a performant way using MongoDB, or PostgreSQL/MySQL?
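For reference, here is a rough sketch of the kind of MongoDB aggregations I imagine the two child fields would each map to (the sales collection and its date and revenue fields are just assumptions, and an index on date would be needed to keep the $match fast):

// revenue(stat: mean) over the selected range
db.sales.aggregate([
  { $match: { date: { $gte: ISODate("2017-01-01"), $lt: ISODate("2018-01-01") } } },
  { $group: { _id: null, revenue: { $avg: "$revenue" } } }
])

// daily: interval(by: day) over the same range
db.sales.aggregate([
  { $match: { date: { $gte: ISODate("2017-01-01"), $lt: ISODate("2018-01-01") } } },
  { $group: {
      _id: { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
      revenue: { $sum: "$revenue" }
  } },
  { $sort: { _id: 1 } }
])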
Having given this a bit more thought...
I'd have sales(date: { start: "2017-01-01", end: "2018-01-01" }) resolve to an object that looks like this:
{
  __type: 'SalesDateSelection',
  start: '2017-01-01',
  end: '2018-01-01',
}
This is free because it's just a dumb object that holds the arguments it was created with; no expensive data access happens here.
Each of the child fields (revenue(stat: mean) and interval(by: day)) can then be resolved using a combination of their own arguments and the data on this parent object. That would presumably mean two database queries?
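To make that concrete, a minimal resolver sketch (Apollo-style resolver map; fetchMeanRevenue and fetchRevenueByInterval are hypothetical data-access helpers, not real library calls) might look like:

const resolvers = {
  Query: {
    // Cheap parent object: it just holds the arguments, no data access yet.
    sales: (root, { date }) => ({
      __type: 'SalesDateSelection',
      start: date.start,
      end: date.end,
    }),
  },
  SalesDateSelection: {
    // Each aggregate field issues its own query, built from the parent's date range.
    revenue: (parent, { stat }) => fetchMeanRevenue(parent.start, parent.end, stat),
    interval: (parent, { by }) => fetchRevenueByInterval(parent.start, parent.end, by),
  },
};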
Related
I have a Rails app that connects to Elasticsearch. The user can define conditions like (at least), (at most), and so on, to compare and find a specific result. Everything is fine, but now I have to add "not equal" to my list of comparison operators, and even calculate "not equal" for all of the conditions described before. For example, if the user wants to search for "not equal at least", I want to calculate the result based on the user's query and then reverse it. I know it's impossible to reverse the query response in Elasticsearch, but can Elasticsearch calculate "not equal" by itself? Assume the user wants to know the count of a specific event that did not happen at most 1 time in the last 30 days. I know Elasticsearch supports not-equal queries, but only in bool and query-string queries; in my case I have to use an aggregation with a terms query inside it. The aggregation looks like this:
body_data = {
  group_by_profile_id: {
    terms: {
      field: "profile_id.keyword",
      min_doc_count: @count_first,
      size: 10000
    },
    aggs: {
      filter: {
        bucket_selector: {
          buckets_path: {
            docCount: "_count"
          },
          script: "params.docCount #{@operator} #{@counts.last}"
        }
      }
    }
  }
}
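For clarity, this is the raw request body that hash produces once rendered to JSON, with illustrative placeholder values substituted for the instance variables (@count_first = 1, @operator = "!=", @counts.last = 3 are just examples, not my real values):

{
  "group_by_profile_id": {
    "terms": {
      "field": "profile_id.keyword",
      "min_doc_count": 1,
      "size": 10000
    },
    "aggs": {
      "filter": {
        "bucket_selector": {
          "buckets_path": { "docCount": "_count" },
          "script": "params.docCount != 3"
        }
      }
    }
  }
}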
Does anyone know how I can handle a not-equal condition in an aggregation with a terms query?
I work with R2DBC and I need to execute a query which returns a Flux of my
entities; after that I need to convert these entities to DTOs, but to create a DTO I need to make another query to the database for each entity, which returns some extra info from other tables. For example:
This code doesn't work when the total number of IDs exceeds 512:
orderRepository.findByIds(listIds).flatMap { order ->
    eventRepository.findByOrderId(order.id).map { events ->
        entityToDtoMapper.map(order, events, OrderWithEventsDto::class.java)
    }
}
concatMap doesn't help.
But this code works:
orderRepository.findByIds(listIds).collectList().flatMapMany { orders ->
    Flux.fromIterable(orders)
}.flatMap { order ->
    eventRepository.findByOrderId(order.id).collectList().flatMapMany { events ->
        Flux.fromIterable(events)
    }.map { events ->
        entityToDtoMapper.map(order, events, OrderWithEventsDto::class.java)
    }
}
I think there’s a better solution to this problem. How am I supposed to do these queries right?
I have an Item model which has an attribute category, and I want the item count grouped by category. I wrote a map-reduce for this functionality and it was working fine. I recently wrote a script to create 5000 items, and now I realize my map-reduce only gives the result for the last 80 records. The following is the code for the map-reduce:
map = %Q{
  function() {
    emit({}, {category: this.category});
  }
}

reduce = %Q{
  function(key, values) {
    var category_count = {};
    values.forEach(function(value) {
      if (category_count.hasOwnProperty(value.category))
        category_count[value.category]++;
      else
        category_count[value.category] = 1;
    });
    return category_count;
  }
}

Item.map_reduce(map, reduce).out(inline: true).first.try(:[], "value")
After researching a bit, I discovered that MongoDB invokes the reduce function multiple times. How can I achieve the functionality I intended?
There is a rule you must follow when writing map-reduce code in MongoDB (a few rules, actually). One is that the emit (which emits key/value pairs) must have the same format for the value that your reduce function will return.
If you emit(this.key, this.value) then reduce must return the exact same type that this.value has. If you emit({},1) then reduce must return a number. If you emit({},{category: this.category}) then reduce must return the document of format {category:"string"} (assuming category is a string).
That clearly can't be what you want, since you want totals. So let's look at what reduce is returning and work out from that what you should be emitting.
It looks like at the end you want to accumulate a document where there is a keyname for each category and its value is a number representing the number of its occurrences. Something like:
{category_name1:total, category_name2:total}
If that's the case, then the map function needs to emit a document keyed by the actual category name, i.e. emit({}, {<category value>: 1}), in which case your reduce will need to add up the numbers for each key corresponding to a category.
Here is what the map should look like:
map = function() {
    var category = {};
    category[this.category] = 1;
    emit({}, category);
}
And here is the correct corresponding reduce:
reduce = function(key, values) {
    var category_count = {};
    values.forEach(function(value) {
        for (var cat in value) {
            if (!category_count.hasOwnProperty(cat)) category_count[cat] = 0;
            category_count[cat] += value[cat];
        }
    });
    return category_count;
}
Note that it satisfies two other requirements for MapReduce - it works correctly if the reduce function is never called (which will be the case if there is only one document in your collection) and it will work correctly if the reduce function gets called multiple times (which is what's happening when you have more than 100 documents).
A more conventional way to do this would be to emit the category name as the key and the number as the value. This simplifies both map and reduce:
map = function() {
    emit(this.category, 1);
}
reduce = function(key, values) {
    var count = 0;
    values.forEach(function(val) {
        count += val;
    });
    return count;
}
This will sum the number of times each category appears. This also satisfies requirements for MapReduce - it works correctly if the reduce function is never called (which will be the case for any category that only appears once) and it will work correctly if the reduce function gets called multiple times (which will happen if any category appears more than 100 times).
As others pointed out, the aggregation framework makes the same exercise much simpler:
db.collection.aggregate({ $group: { _id: "$category", count: { $sum: 1 } } })
Note that this matches the output format of the second mapReduce I showed, not your original format where category names are keys. However, the aggregation framework will always be significantly faster than MapReduce.
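If you do want the original shape back (a single document keyed by category name), a small sketch of reshaping the aggregation output in a reasonably recent mongo shell (assuming the collection is called items) could be:

var category_count = {};
db.items.aggregate([
    { $group: { _id: "$category", count: { $sum: 1 } } }
]).forEach(function(doc) {
    category_count[doc._id] = doc.count;
});
// category_count now looks like { categoryA: 12, categoryB: 3 }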
I agree with Neil Lunn's comment.
From the info provided, if you are on MongoDB version 2.2 or greater you can use the aggregation framework instead of map-reduce:
db.items.aggregate([
    { $group: { _id: '$category', category_count: { $sum: 1 } } }
])
This is a lot simpler and more performant (see Map/Reduce vs. Aggregation Framework).
Essentially, I'm storing a directed graph of entities in CouchDB, and I need to be able to find edges going IN and OUT of any given entity.
SETUP:
The way the data is being stored right now is as follows. Each document represents a RELATION between two entities:
doc: {
    entity1: { name: '' ... },
    entity2: { name: '' ... }
    ...
}
I have a view which does a bunch of emits, two of which emit documents keyed on their entity1 component and on their entity2 component, so something like:
function(doc) {
    emit(['entity1', doc.entity1.name]);
    emit(['entity2', doc.entity2.name]);
}
Edges are directed, and go from entity1 to entity2. So if I want to find edges going out of an entity, I just query the first emit; if I want edges going into an entity, I query the second emit.
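Concretely (assuming the view above is saved as, say, _design/graph/_view/edges; the design doc and view names are just placeholders), those two lookups are plain key queries:
edges going out of entity "A": key=["entity1", "A"]
edges going into entity "A": key=["entity2", "A"]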
PROBLEM:
The problem is that I also need to capture edges both going INTO and OUT OF entities. Is there a way I can group or reduce these two emits into a single bi-directional set of [x] UNIQUE pairs?
Is there a better way of organizing my view to promote this action?
It might be preferable to just create a second view. But there's nothing stopping you from cramming all sorts of different data into the same view like so:
function(doc) {
    if (doc.entity1.name == doc.entity2.name) {
        emit(['self-ref', doc.entity1.name], 1);
    }
    emit(['both', [doc.entity1.name, doc.entity2.name]], 1);
    emit(['either', [doc.entity1.name, "out"]], 1);
    emit(['either', [doc.entity2.name, "in"]], 1);
    emit(['out', doc.entity1.name], 1);
    emit(['in', doc.entity2.name], 1);
}
Then you could easily do the following:
find all the self-refs:
startkey=["self-ref"]&endkey=["self-ref", {}]
find all of the edges (incoming or outgoing) for a particular node:
startkey=["either", [nodeName]]&endkey=["either", [nodeName, {}]]
if you don't reduce this, then you'll still be preserving "in" vs "out" in the key. If you never need to query for all nodes with incoming or outgoing edges, then you can replace the last two emits with the "either" emits.
find all of the edges from node1 -> node2:
key=["both", [node1, node2]]
as well as your original queries for incoming or outgoing for a particular node.
I'd recommend benchmarking your application's typical use cases before choosing between this combined view approach or a multi-view approach.
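If what you're ultimately after is counts of unique pairs rather than the documents themselves, note that every value emitted above is 1, so you could add the built-in _count (or _sum) reduce to this same view (passing reduce=false for the plain lookups above) and query it grouped. A sketch, not something I've benchmarked:
count of edges for every unique (entity1, entity2) pair: group=true&startkey=["both"]&endkey=["both", {}]
total number of edges touching a node, in either direction: startkey=["either", [nodeName]]&endkey=["either", [nodeName, {}]] with reduce on and grouping off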
I have a domain class Schedule with a property 'days' holding comma-separated values like '2,5,6,8,9'.
class Schedule {
    String days
    ...
}

Schedule schedule1 = new Schedule(days: '2,5,6,8,9')
schedule1.save()

Schedule schedule2 = new Schedule(days: '1,5,9,13')
schedule2.save()
I need to get the list of schedules having any day from a given list, say [2, 8, 11].
Output: [schedule1]
How do I write the criteria query or HQL for this? We can prefix and suffix the days with commas, like ',2,5,6,8,9,', if that helps.
Thanks,
I hope you have a good reason for such denormalization; otherwise it would be better to save the list to a child table.
As it stands, querying is complicated. Something like:
def days = [2, 8, 11]
// note: check for empty days
Schedule.withCriteria {
    or {
        days.each { day ->
            like('days', "$day,%")   // starts with "$day,"
            like('days', "%,$day,%")
            like('days', "%,$day")   // ends with ",$day"
        }
    }
}
In MySQL there is a SET datatype and a FIND_IN_SET function, but I've never used them with Grails. Some databases support the standard SQL:2003 ARRAY datatype for storing arrays in a field; it's possible to map those using Hibernate user types (which are supported in Grails).
If you are using MySQL, a FIND_IN_SET query should work with the Criteria API's sqlRestriction:
http://grails.org/doc/latest/api/grails/orm/HibernateCriteriaBuilder.html#sqlRestriction(java.lang.String)
Using SET + FIND_IN_SET makes the queries a bit more efficient than LIKE queries, if you care about performance and have a real requirement for this denormalization.