How do I know the query_group of a query that was run? - psql

I need help figuring out the query_group of a query that was run on Redshift. I have set a query_group in the WLM config and want to make sure the query is being executed from that query group.

query_group is part of the WLM (workload management) configuration, which lets you manage how queries are routed through queues on a Redshift cluster. To use a query_group, you have to set up your own queue with a query_group name (label) in advance, through the AWS console ([Amazon Redshift] -> [Parameter Groups] -> select parameter group -> [WLM]) or the CLI.
Here is an example, snipped from the Redshift docs:
set query_group to 'Monday';
select * from category limit 1;
...
reset query_group;
You have to set the query_group before starting the query you want to assign to the specific queue, and reset the query_group after it finishes.
You can then track the queries of a query_group as follows; 'label' is the name of the query_group:
select query, pid, substring, elapsed, label
from svl_qlog where label ='Monday'
order by query;
query | pid | substring | elapsed | label
------+------+------------------------------------+-----------+--------
789 | 6084 | select * from category limit 1; | 65468 | Monday
790 | 6084 | select query, trim(label) from ... | 1260327 | Monday
791 | 6084 | select * from svl_qlog where .. | 2293547 | Monday
792 | 6084 | select count(*) from bigsales; | 108235617 | Monday
...
This document is a good introduction to how WLM works and how to use it:
http://docs.aws.amazon.com/redshift/latest/dg/cm-c-implementing-workload-management.html
This page documents query_group itself:
http://docs.aws.amazon.com/redshift/latest/dg/r_query_group.html
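If you also want to verify which WLM queue (service class) a labeled query actually ran in, you can join svl_qlog against the stl_wlm_query system table. This is a sketch beyond the original answer; timings in stl_wlm_query are in microseconds, and user-defined queues start at service class 6 in a default configuration:
-- Map labeled queries to the WLM service class (queue) they ran in.
select q.query, trim(q.label) as label, w.service_class,
       w.total_queue_time / 1000000.0 as queue_seconds,
       w.total_exec_time / 1000000.0 as exec_seconds
from svl_qlog q
join stl_wlm_query w on w.query = q.query
where q.label = 'Monday'
order by q.query;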

Related

How to link jobs on coordinator and workers on a Citus database on PostgreSQL 12

I have the Citus extension on a PostgreSQL server, and I want to see the statistics from pg_stat_statements of each worker through the coordinator node. However, there is no column to match the tables from the coordinator and the workers. Does anybody know how I can do that?
I am also interested in how the queryId is computed by PostgreSQL.
So the pg_stat_statements tables on the coordinator would show something like:
userid | dbid | queryid | query | other statistics related columns
1 | 2 | 123 | SELECT * FROM a; | ...
While the pg_stat_statements tables on the worker would show something like:
userid | dbid | queryid | query | other statistics related columns
1 | 2 | 456 | SELECT * FROM a_shard1; | ...
1 | 2 | 789 | SELECT * FROM a_shard2; | ...
You can match the table names on the workers (shards) to the distributed tables on the coordinator with the help of the pg_dist_partition and pg_dist_shard_placement tables. For matching the stats, you can check the citus_stat_statements view.
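For example, on the coordinator you could start from something like the following. This is a sketch, not part of the original answer; citus_stat_statements requires pg_stat_statements to be loaded, and its availability depends on your Citus version and edition:
-- Per-query stats on the coordinator, annotated with the distribution
-- column value (partition_key) where Citus could determine it.
SELECT queryid, query, executor, partition_key, calls
FROM citus_stat_statements
ORDER BY calls DESC
LIMIT 10;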
(I cannot reply to the answer above, so I am adding my answer here.)
You can use the query below to list the locations of the shards of a specific table on a specific worker node (see the last three filters in the WHERE clause):
SELECT pg_dist_shard.shardid, pg_dist_node.nodename, pg_dist_node.nodeport
FROM pg_dist_shard, pg_dist_placement, pg_dist_node
WHERE pg_dist_shard.shardid = pg_dist_placement.shardid AND
      pg_dist_placement.groupid = pg_dist_node.groupid AND
      logicalrelid = '<distributedTableName>'::regclass AND
      pg_dist_node.nodename = '<nodeName>' AND
      pg_dist_node.nodeport = '<nodePort>';
Then you can execute the query below on the worker node of interest to see what Citus executes for a specific shard on that worker node:
SELECT * FROM pg_stat_statements WHERE query LIKE '%_<shardId>%';

What is the best way to attach a running total to selected row data?

I have a table that looks like this:
Created at | Amount | Register Name
--------------+---------+-----------------
01/01/2019... | -150.01 | Front
01/01/2019... | 38.10 | Back
What is the best way to attach an ascending-by-date running total to each record, computed separately for each register name? I can do this in Ruby, but doing it in the database will be much faster, as this is a web application.
The application is a Rails application running Postgres 10, although the answer can be Rails-agnostic of course.
Use the aggregate sum() as a window function, e.g.:
with my_table (created_at, amount, register_name) as (
values
('2019-01-01', -150.01, 'Front'),
('2019-01-01', 38.10, 'Back'),
('2019-01-02', -150.01, 'Front'),
('2019-01-02', 38.10, 'Back')
)
select
created_at, amount, register_name,
sum(amount) over (partition by register_name order by created_at)
from my_table
order by created_at, register_name;
created_at | amount | register_name | sum
------------+---------+---------------+---------
2019-01-01 | 38.10 | Back | 38.10
2019-01-01 | -150.01 | Front | -150.01
2019-01-02 | 38.10 | Back | 76.20
2019-01-02 | -150.01 | Front | -300.02
(4 rows)
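One caveat beyond the original answer: if two rows for the same register share the same created_at, the window function treats them as peers and assigns them the same running total. If the real table has a unique id column (as a Rails schema presumably does), adding it to the ordering makes the total strictly cumulative. A sketch, assuming that column is named id:
select
  created_at, amount, register_name,
  sum(amount) over (partition by register_name
                    order by created_at, id) as running_total
from my_table
order by created_at, register_name;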

Restrict Sumo Logic search to one timeslice bucket

I have logs pushed to Sumo Logic once every day, but other co-workers have the ability to force a push to update statistics. This causes an issue where some Sumo Logic searches find and return double (or more) what is expected, because they match more than one message within the allocated time range.
I am wondering if there is some way I can use timeslice so that I am only looking at the last set of results within a 24h period.
My search, which works when there is only one log in 24h:
| json field=_raw "Policy"
| count by policy
| sort by _count
What I am trying to achieve:
| json field=_raw "Policy"
| timeslice 1m
| where last(_timeslice)
| count by policy
| sort by _count
Found a solution; not sure if it is optimal.
| json field=_raw "Policy"
| timeslice 1m
| count by policy , _timeslice
| filter _timeslice in (sort by _timeslice desc | limit 1)
| sort by _count
| fields policy, _count
If I'm understanding your question right, I think you could try something with the accum operator:
*
| json field=_raw "Policy"
| timeslice 1m
| count by _timeslice, policy
| 1 as rank
| accum rank by _timeslice
| where _accum = 1
This would be similar to doing a window partition in SQL to get rid of duplicates.
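For reference, the SQL window-partition analogue the answer alludes to would look something like this. The logs table and its event_time and policy columns are assumptions for illustration, not part of the original question:
-- Count per policy per one-minute bucket, then keep the first row of each
-- bucket, like | 1 as rank | accum rank by _timeslice | where _accum = 1.
select bucket, policy, cnt
from (
  select date_trunc('minute', event_time) as bucket,
         policy,
         count(*) as cnt,
         row_number() over (partition by date_trunc('minute', event_time)
                            order by policy) as rn
  from logs
  group by 1, 2
) t
where rn = 1;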

Heroku Postgres performance monitoring with pg_stat_statements

I've been trying to troubleshoot some recurring H12/H13 errors on Heroku. After exhausting everything I can find on Google/Heroku/Stack Overflow, I'm now checking whether some long-running database queries are causing the problem, on the advice of Heroku support.
Update: I'm on a production Crane instance. Per the accepted answer below, it appears you cannot do this on Heroku. The best I've been able to do is filter them out with the SQL below:
SELECT u.usename, (total_time / 1000 / 60) as total_minutes,
(total_time/calls) as average_time, query
FROM pg_stat_statements p
JOIN pg_user u ON (u.usesysid = p.userid)
WHERE query != '<insufficient privilege>'
ORDER BY 2 DESC
LIMIT 10;
I'm trying to use Craig Kerstiens' very useful post,
http://www.craigkerstiens.com/2013/01/10/more-on-postgres-performance/, but I'm running into some permission issues.
When I query the pg_stat_statements view, I get "<insufficient privilege>" for some of the longer-running queries, and it doesn't appear that Heroku lets you change user permissions.
Does anyone know how I can change permissions to see these queries on Heroku?
heroku pg:psql --remote production
psql (9.2.2, server 9.2.4)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
Type "help" for help.
d4k2qvm4tmu579=> SELECT
d4k2qvm4tmu579-> (total_time / 1000 / 60) as total_minutes,
d4k2qvm4tmu579-> (total_time/calls) as average_time,
d4k2qvm4tmu579-> query
d4k2qvm4tmu579-> FROM pg_stat_statements
d4k2qvm4tmu579-> ORDER BY 1 DESC
d4k2qvm4tmu579-> LIMIT 10;
total_minutes | average_time | query
------------------+-------------------+--------------------------
121.755079699998 | 11.7572250919775 | <insufficient privilege>
17.9371053166656 | 1.73208859315089 | <insufficient privilege>
13.8710526000023 | 1.33945202190106 | <insufficient privilege>
6.98494270000089 | 0.674497883626922 | <insufficient privilege>
6.75377774999972 | 0.652175543095124 | <insufficient privilege>
6.55192439999995 | 0.632683664174224 | <insufficient privilege>
3.84014626666634 | 1.12786802880252 | <insufficient privilege>
3.40574066666667 | 1399.61945205479 | <insufficient privilege>
3.16332020000008 | 0.929081204384053 | <insufficient privilege>
2.30192519999944 | 0.222284382614463 | <insufficient privilege>
(10 rows)
I can't answer your question directly, but maybe take a look at the pg-extras plugin, which brings a lot of this goodness directly to the Heroku CLI and returns data :)
https://github.com/heroku/heroku-pg-extras
You need to be running a production-level instance of Heroku Postgres in order to use pg_stat_statements. Even then, it will only be able to show you stats for queries run by your app (or any client using the Heroku-supplied credentials). You won't be able to see queries run by superusers (postgres, collectd). Production plans are Crane and up (I believe).
You can see the username by joining in pg_user:
SELECT u.usename, (total_time / 1000 / 60) as total_minutes,
(total_time/calls) as average_time, query
FROM pg_stat_statements p
JOIN pg_user u ON (u.usesysid = p.userid) ORDER BY 2 DESC LIMIT 10;
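If you want to measure from a clean slate after making changes, pg_stat_statements also provides a reset function. This is an addition to the original answer, and whether the Heroku-supplied credentials are permitted to call it may depend on your plan:
-- Discard all statistics gathered so far by pg_stat_statements.
SELECT pg_stat_statements_reset();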

select distinct records based on one field while keeping other fields intact

I've got a table like this:
table: searches
+------------------------------+
| id | address | date |
+------------------------------+
| 1 | 123 foo st | 03/01/13 |
| 2 | 123 foo st | 03/02/13 |
| 3 | 456 foo st | 03/02/13 |
| 4 | 567 foo st | 03/01/13 |
| 5 | 456 foo st | 03/01/13 |
| 6 | 567 foo st | 03/01/13 |
+------------------------------+
And want a result set like this:
+------------------------------+
| id | address | date |
+------------------------------+
| 2 | 123 foo st | 03/02/13 |
| 3 | 456 foo st | 03/02/13 |
| 4 | 567 foo st | 03/01/13 |
+------------------------------+
But ActiveRecord seems unable to achieve this result. Here's what I'm trying:
Model has a 'most_recent' scope: scope :most_recent, order('date_searched DESC')
Model.most_recent.uniq returns the full set (SELECT DISTINCT "searches".* FROM "searches" ORDER BY date DESC) -- obviously the query is not going to do what I want, but neither is selecting only one column. I need all columns, but only rows where the address is unique in the result set.
I could do something like Model.select('distinct(address), date, id'), but that feels...wrong.
You could do a
select max(id), address, max(date) as latest
from searches
group by address
order by latest desc
According to sqlfiddle, that does exactly what I think you want.
It's not quite the same as your required output, which doesn't seem to care about which ID is returned. Still, the query needs to specify something, which is done here with the max aggregate function.
I don't think you'll have any luck with ActiveRecord's autogenerated query methods for this case, so just add your own query method using that SQL to your model class. It's completely standard SQL that will also run on basically any other RDBMS.
Edit: One big weakness of this query is that it doesn't necessarily return actual records. If the highest ID for a given address doesn't correlate with the highest date for that address, the resulting "record" will be different from any actually stored in the DB. Depending on the use case, that may or may not matter. For MySQL, simply changing max(id) to id would fix that problem, but IIRC Oracle has a problem with that.
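Since the question is about Postgres, it's worth noting (as an addition to the answer above) that the Postgres-specific DISTINCT ON avoids that weakness entirely, because it returns whole, real rows; the trade-off is that it isn't portable to other RDBMSs:
-- One actual row per address: the one with the latest date.
SELECT DISTINCT ON (address) id, address, date
FROM searches
ORDER BY address, date DESC;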
To show unique addresses:
Searches.group(:address)
Then you can select columns if you want, though on Postgres any ungrouped column must be aggregated:
Searches.group(:address).select('address, max(id) AS id, max(date) AS date')
