REST API pull query with GROUP BY clause - ksqlDB

I need advice or some ideas for my case.
Here is my pull query:
{
  "ksql": "SELECT rtdate as trans_date , trb102 as account_number, count() as freq_trans_today FROM trx_log WHERE rtdate = '2022-12-29' GROUP BY rtdate , trb102 having count() >= 3;",
  "streamsProperties": {}
}
But I got an error:
{
  "#type": "statement_error",
  "error_code": 40001,
  "message": "Pull queries don't support GROUP BY clauses. See Queries - ksqlDB Documentation for more info.\nAdd EMIT CHANGES if you intended to issue a push query.\nStatement: SELECT rtdate as trans_date , trb102 as account_number, count() as freq_trans_today FROM trx_log WHERE rtdate = '2022-12-29' GROUP BY rtdate , trb102 having count() >= 3;",
  "entities":
}
When I add EMIT CHANGES, the request keeps listening and never returns a value.

Related

Grails pagination set maximum records

When using pagination in groovy I want to have a max of 10 per page but also I want the query to return after I hit 500 records.
So let's say with my criteria there are 10,000 records that match, I want the PagedResultList to return 10 and the results.totalCount = 500 NOT 10,000.
I've been trying to do this with maxResults(500) in the criteria but have been unsuccessful. Can you not use maxResults when also using max?
Passing max as an argument to list, as in list(max: 10), returns a PagedResultList, which exposes totalCount. Calling maxResults inside the criteria closure returns a plain ArrayList with your objects.
The totalCount comes from a second query that counts the objects matching your criteria; it does not actually fetch all 10k objects.
So I'm not sure how to solve the issue you describe: if the query does match 10k rows in your db, that will be your total count. You could add a check so that if totalCount is > 500, you display 500.
But if your concern is GORM is pulling back 10k objects, it is not. See the first example http://docs.grails.org/3.2.10/ref/Domain%20Classes/createCriteria.html for more info.
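The "display 500 if totalCount is > 500" check is independent of GORM, so here is a minimal sketch of it in Python. The names MAX_TOTAL, PAGE_SIZE, capped_total and page_window are hypothetical and only illustrate the arithmetic; they are not part of Grails.

```python
# Hypothetical limits taken from the question: 10 rows per page, 500 rows overall.
MAX_TOTAL = 500
PAGE_SIZE = 10


def capped_total(total_count, cap=MAX_TOTAL):
    """Total to show in the pager: the real count, but never more than the cap."""
    return min(total_count, cap)


def page_window(offset, page_size=PAGE_SIZE, cap=MAX_TOTAL):
    """Return (offset, max) for the query, or None once the offset is past the cap."""
    if offset >= cap:
        return None
    # Shrink the last page so it never reaches beyond the cap.
    return offset, min(page_size, cap - offset)


print(capped_total(10_000))  # total reported to the pager
print(page_window(495))      # last, shortened page
```

The resulting offset and max would then be passed to the paged query, and the capped total used wherever the pager reads totalCount.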
A very simple controller to show the behavior:
def index() {
    println "max:"
    def q1 = Foo.createCriteria().list(max: 10) { }
    println q1.class.name
    println q1.totalCount
    println "maxResults:"
    def q2 = Foo.createCriteria().list {
        maxResults(10)
    }
    println q2.class.name
    // println q2.size() // totalCount doesn't exist on an ArrayList
}
output:
max:
Hibernate: select this_.id as id1_0_0_, this_.version as version2_0_0_,
this_.name as name3_0_0_ from foo this_ limit ?
grails.orm.PagedResultList
Hibernate: select count(*) as y0_ from foo this_
2
maxResults:
Hibernate: select this_.id as id1_0_0_, this_.version as version2_0_0_,
this_.name as name3_0_0_ from foo this_ limit ?
java.util.ArrayList
You can also see that the generated SQL is the same either way, and that with max a second query is sent for the totalCount.
Hope that helps.

Grails executeQuery subquery

I'm using Grails 2.4.5.
I want to retrieve the top 3 customers based on how many contracts they have.
I'm trying to execute this code:
def customers = Customer.executeQuery("Select cu, (Select count(*) from Contract co where co = cu.contract) from Customer cu",
[max: 3])
and it returns this error
left and right hand sides of a binary logic operator were incompatibile [com.cms.Contract : java.util.Set(com.cms.Customer.contract)]
I understand that the types of co and cu.contract are not the same, but I don't get why. Can someone help me understand how Grails' executeQuery works? This is the only framework I have used that has static query execution but still needs to follow a certain format.
What I really want to do is generate a query like this:
Select * from Customer cu order by (Select count(*) from Contract co where co.id = cu.id)
You could try a Criteria query with projections:
def results = Customer.createCriteria().list() {
    createAlias( 'contracts', 'contractalias' )
    projections {
        groupProperty( 'contractalias.contract' )
        count( 'contractalias.contract', 'contractCount' )
    }
    maxResults( 3 )
    order 'contractCount', 'desc'
}
I'm not 100% sure of your field names, so I had to guess them in the query above.
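To make the intended result concrete, here is a small sketch of "top 3 customers by number of contracts" as plain SQL, run through Python's sqlite3 module. The customer and contract tables, their columns, and the sample rows are all assumptions made for illustration; your real schema will differ.

```python
import sqlite3

# In-memory database with a hypothetical one-to-many customer -> contract schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE contract (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id)
);
INSERT INTO customer VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO contract (customer_id) VALUES (1), (1), (1), (2), (2), (3), (3), (3), (3);
""")

# Count contracts per customer, order by that count, keep the first three.
# LEFT JOIN keeps customers that have no contracts (their count is 0).
top3 = conn.execute("""
SELECT cu.id, cu.name, COUNT(co.id) AS contract_count
FROM customer cu
LEFT JOIN contract co ON co.customer_id = cu.id
GROUP BY cu.id, cu.name
ORDER BY contract_count DESC, cu.id
LIMIT 3
""").fetchall()

for row in top3:
    print(row)
```

Comparing this SQL with what Hibernate logs for the criteria query is a quick way to check that the grouping and ordering are what you expect.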
It's often useful to turn on SQL logging when trying out these queries, e.g. by adding the following to the development dataSource configuration:
development {
    dataSource {
        ...
        logSql = true
    }
}

Getting Conditional Count in Join with Laravel Query Builder

I am trying to achieve the following with Laravel Query builder.
I have a table called deals . Below is the basic schema
id
deal_id
merchant_id
status
deal_text
timestamps
I also have another table called merchants whose schema is
id
merchant_id
merchant_name
about
timestamps
Currently I am getting deals using the following query
$deals = DB::table('deals')
    ->join('merchants', 'deals.merchant_id', '=', 'merchants.merchant_id')
    ->where('merchant_url_text', $merchant_url_text)
    ->get();
Since only 1 merchant is associated with a deal, I am getting deals and related merchant info with the query.
Now I have a 3rd table called tbl_deal_votes. Its schema looks like
id
deal_id
vote (1 if voted up, 0 if voted down)
timestamps
What I want to do is join this 3rd table (on deal_id) to my existing query and be able to also get the upvotes and down votes each deal has received.
To do this in a single query you'll probably need to use SQL subqueries, which don't seem to have good fluent query support in Laravel 4/5. Since you're not using Eloquent objects, the raw SQL is probably easiest to read. (Note that the example below ignores your deals.deal_id and merchants.merchant_id columns, which can likely be dropped. Instead it just uses your deals.id and merchants.id fields by convention.)
$deals = DB::select(
    DB::raw('
        SELECT
            deals.id AS deal_id,
            deals.status,
            deals.deal_text,
            merchants.id AS merchant_id,
            merchants.merchant_name,
            merchants.about,
            COALESCE(tbl_upvotes.upvotes_count, 0) AS upvotes_count,
            COALESCE(tbl_downvotes.downvotes_count, 0) AS downvotes_count
        FROM
            deals
            JOIN merchants ON (merchants.id = deals.merchant_id)
            LEFT JOIN (
                SELECT deal_id, count(*) AS upvotes_count
                FROM tbl_deal_votes
                WHERE vote = 1
                GROUP BY deal_id
            ) tbl_upvotes ON (tbl_upvotes.deal_id = deals.id)
            LEFT JOIN (
                SELECT deal_id, count(*) AS downvotes_count
                FROM tbl_deal_votes
                WHERE vote = 0
                GROUP BY deal_id
            ) tbl_downvotes ON (tbl_downvotes.deal_id = deals.id)
    ')
);
If you'd prefer to use fluent, this should work:
$upvotes_subquery = '
    SELECT deal_id, count(*) AS upvotes_count
    FROM tbl_deal_votes
    WHERE vote = 1
    GROUP BY deal_id';
$downvotes_subquery = '
    SELECT deal_id, count(*) AS downvotes_count
    FROM tbl_deal_votes
    WHERE vote = 0
    GROUP BY deal_id';
$deals = DB::table('deals')
    ->select([
        DB::raw('deals.id AS deal_id'),
        'deals.status',
        'deals.deal_text',
        DB::raw('merchants.id AS merchant_id'),
        'merchants.merchant_name',
        'merchants.about',
        DB::raw('COALESCE(tbl_upvotes.upvotes_count, 0) AS upvotes_count'),
        DB::raw('COALESCE(tbl_downvotes.downvotes_count, 0) AS downvotes_count')
    ])
    ->join('merchants', 'merchants.id', '=', 'deals.merchant_id')
    ->leftJoin(DB::raw('(' . $upvotes_subquery . ') tbl_upvotes'), function($join) {
        $join->on('tbl_upvotes.deal_id', '=', 'deals.id');
    })
    ->leftJoin(DB::raw('(' . $downvotes_subquery . ') tbl_downvotes'), function($join) {
        $join->on('tbl_downvotes.deal_id', '=', 'deals.id');
    })
    ->get();
A few notes about the fluent query:
Used the DB::raw() method to rename a few selected columns. Otherwise, there would have been a conflict between deals.id and merchants.id in the results.
Used COALESCE to default null votes to 0.
Split the subqueries into separate PHP strings to improve readability.
Used left joins for the subqueries so deals with no upvotes/downvotes still show up.
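To see how the two LEFT JOIN subqueries and COALESCE behave on real rows, here is a self-contained sketch using Python's sqlite3 module in place of Laravel and MySQL. The table and column names follow the answer's convention of joining on deals.id and merchants.id; the sample data is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE merchants (id INTEGER PRIMARY KEY, merchant_name TEXT);
CREATE TABLE deals (id INTEGER PRIMARY KEY, merchant_id INTEGER, deal_text TEXT);
CREATE TABLE tbl_deal_votes (id INTEGER PRIMARY KEY, deal_id INTEGER, vote INTEGER);
INSERT INTO merchants VALUES (1, 'Acme');
INSERT INTO deals VALUES (10, 1, 'ten percent off'), (11, 1, 'free shipping');
-- Deal 10 has two upvotes and one downvote; deal 11 has no votes at all.
INSERT INTO tbl_deal_votes (deal_id, vote) VALUES (10, 1), (10, 1), (10, 0);
""")

rows = conn.execute("""
SELECT deals.id,
       COALESCE(tbl_upvotes.upvotes_count, 0) AS upvotes_count,
       COALESCE(tbl_downvotes.downvotes_count, 0) AS downvotes_count
FROM deals
JOIN merchants ON merchants.id = deals.merchant_id
LEFT JOIN (
    SELECT deal_id, COUNT(*) AS upvotes_count
    FROM tbl_deal_votes
    WHERE vote = 1
    GROUP BY deal_id
) tbl_upvotes ON tbl_upvotes.deal_id = deals.id
LEFT JOIN (
    SELECT deal_id, COUNT(*) AS downvotes_count
    FROM tbl_deal_votes
    WHERE vote = 0
    GROUP BY deal_id
) tbl_downvotes ON tbl_downvotes.deal_id = deals.id
ORDER BY deals.id
""").fetchall()

for row in rows:
    print(row)
```

Deal 11 still appears, with both counts at 0, because the subqueries are left-joined and COALESCE replaces the resulting NULLs.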

Solving a PG::GroupingError: ERROR

The following code gets all the residences that have all the amenities listed in id_list. It works without a problem with SQLite but raises an error with PostgreSQL:
id_list = [48, 49]
Residence.joins(:listed_amenities).
  where(listed_amenities: {amenity_id: id_list}).
  references(:listed_amenities).
  group(:residence_id).
  having("count(*) = ?", id_list.size)
The error on the PostgreSQL version:
What do I have to change to make it work with PostgreSQL?
A few things:
references should only be used with includes; it tells ActiveRecord to perform a join, so it's redundant when using an explicit joins.
You need to fully qualify the argument to group, i.e. group('residences.id').
For example,
id_list = [48, 49]
Residence.joins(:listed_amenities).
  where(listed_amenities: { amenity_id: id_list }).
  group('residences.id').
  having('COUNT(*) = ?', id_list.size)
The query that the Ruby code expands to selects all fields from the residences table:
SELECT "residences".*
FROM "residences"
INNER JOIN "listed_amenities"
    ON "listed_amenities"."residence_id" = "residences"."id"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residences"."id" ASC
LIMIT 1;
From the Postgres manual: "When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or if the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column."
You'll need to either group by all fields that aggregate functions aren't applied to, or do this differently. From the query, it looks like you only need to scan the listed_amenities table to get the residence ID you're looking for:
SELECT "residence_id"
FROM "listed_amenities"
WHERE "listed_amenities"."amenity_id" IN (48,49)
GROUP BY "residence_id"
HAVING count(*) = 2
ORDER BY "residence_id" ASC
LIMIT 1
And then fetch your residence data with that ID. Or, in one query:
SELECT "residences".*
FROM "residences"
WHERE "id" IN (SELECT "residence_id"
               FROM "listed_amenities"
               WHERE "listed_amenities"."amenity_id" IN (48,49)
               GROUP BY "residence_id"
               HAVING count(*) = 2
               ORDER BY "residence_id" ASC
               LIMIT 1
);
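As a quick check of the "has every amenity in the list" logic, here is a sketch using Python's sqlite3 module with invented sample data. It assumes each (residence_id, amenity_id) pair appears at most once in listed_amenities; with duplicates, count(*) would overcount and COUNT(DISTINCT amenity_id) would be the safer test. The ORDER BY and LIMIT 1 from the queries above are left out so that every matching residence is returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE residences (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE listed_amenities (residence_id INTEGER, amenity_id INTEGER);
INSERT INTO residences VALUES (1, 'Loft'), (2, 'Cabin'), (3, 'Villa');
-- Loft and Villa have both amenities 48 and 49; Cabin only has 48.
INSERT INTO listed_amenities VALUES (1, 48), (1, 49), (2, 48), (3, 48), (3, 49), (3, 50);
""")

id_list = [48, 49]
placeholders = ",".join("?" for _ in id_list)

# Group only inside the subquery, so the outer SELECT never mixes
# ungrouped columns with aggregates.
sql = f"""
SELECT residences.id, residences.name
FROM residences
WHERE residences.id IN (
    SELECT residence_id
    FROM listed_amenities
    WHERE amenity_id IN ({placeholders})
    GROUP BY residence_id
    HAVING COUNT(*) = ?
)
ORDER BY residences.id
"""
matches = conn.execute(sql, (*id_list, len(id_list))).fetchall()
print(matches)
```

Because the outer query has no GROUP BY, selecting residences.* there is valid in PostgreSQL as well.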

PSQL group by vs. aggregate speed

So, the general question is, what's faster, taking an aggregate of a field or having extra expressions in the GROUP BY clause. Here are the two queries.
Query 1 (extra expressions in GROUP BY):
SELECT sum(subquery.what_i_want)
FROM (
    SELECT table_1.some_id,
        (
            CASE WHEN some_date_field IS NOT NULL
            THEN
                FLOOR(((some_date_field - current_date)::numeric / 7) + 1) * MAX(some_other_integer)
            ELSE
                some_integer * MAX(some_other_integer)
            END
        ) what_i_want
    FROM table_1
    JOIN table_2 on table_1.some_id = table_2.id
    WHERE ((some_date_field IS NOT NULL AND some_date_field > current_date) OR some_integer > 0) -- per the data and what i want, one of these will always be true
    GROUP BY some_id_1, some_date_field, some_integer
) subquery
Query 2 (using an aggregate function; the choice of function is arbitrary, because in this dataset every record has the same value for the table 2 fields in question):
SELECT sum(subquery.what_i_want)
FROM (
    SELECT table_1.some_id,
        (
            CASE WHEN MAX(some_date_field) IS NOT NULL
            THEN
                FLOOR(((MAX(some_date_field) - current_date)::numeric / 7) + 1) * MAX(some_other_integer)
            ELSE
                MAX(some_integer) * MAX(some_other_integer)
            END
        ) what_i_want
    FROM table_1
    JOIN table_2 on table_1.some_id = table_2.id
    WHERE ((some_date_field IS NOT NULL AND some_date_field > current_date) OR some_integer > 0) -- per the data and what i want, one of these will always be true
    GROUP BY some_id_1
) subquery
As far as I can tell, psql doesn't provide good benchmarking tools. \timing on only times one query at a time, so running a benchmark with enough trials for meaningful results is... tedious at best.
For the record, I did do this at about n=50 and saw the aggregate method (Query 2) run faster on average, but with a p-value of ~0.13, so not quite conclusive.
'sup with that?
The general answer: they should be roughly the same. Wrapping a field in a function can make the planner hit or miss a function-based index, but that applies to ordinary functions rather than aggregate functions, and it matters more in the WHERE clause than in the column list. This is only speculation, though.
What you should use for analyzing execution is EXPLAIN ANALYZE. In the plan you see not only the scan types but also the number of iterations, the cost, and the time of each individual operation. And of course you can use it from psql.
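For the "enough trials" part of the question, repeated timing is easy to script outside psql. The sketch below uses Python's sqlite3 module purely as a stand-in for PostgreSQL, with a simplified, invented table t and without the date branch of the CASE expression, so the numbers say nothing about the real queries. It only shows the shape of such a harness; against PostgreSQL, EXPLAIN ANALYZE remains the tool to reach for.

```python
import sqlite3
import statistics
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (some_id INTEGER, some_integer INTEGER, some_other_integer INTEGER)"
)
# Every row of a given some_id carries the same some_integer, as in the question's dataset.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(i % 100, i % 100 + 1, 2) for i in range(5000)],
)

# Variant 1: extra expression in GROUP BY.
QUERY_1 = """
SELECT SUM(w) FROM (
    SELECT some_integer * MAX(some_other_integer) AS w
    FROM t GROUP BY some_id, some_integer
)
"""
# Variant 2: arbitrary aggregate instead of the extra GROUP BY expression.
QUERY_2 = """
SELECT SUM(w) FROM (
    SELECT MAX(some_integer) * MAX(some_other_integer) AS w
    FROM t GROUP BY some_id
)
"""


def time_query(sql, trials=20):
    """Run sql `trials` times; return the last result and the per-run durations."""
    timings = []
    result = None
    for _ in range(trials):
        start = time.perf_counter()
        result = conn.execute(sql).fetchone()[0]
        timings.append(time.perf_counter() - start)
    return result, timings


result_1, timings_1 = time_query(QUERY_1)
result_2, timings_2 = time_query(QUERY_2)
print("query 1 mean seconds:", statistics.mean(timings_1))
print("query 2 mean seconds:", statistics.mean(timings_2))
```

Checking that both variants return the same result before comparing timings guards against benchmarking two queries that are not actually equivalent.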

Resources