Heroku Postgres performance monitoring with pg_stat_statements - ruby-on-rails

I've been trying to troubleshoot some recurring H12/H13 errors on Heroku. After exhausting everything I can find on Google/Heroku/Stack Overflow, I'm now checking, on the advice of Heroku support, whether some long-running database queries are causing the problem.
Update: I'm on a production Crane instance. Per the accepted answer below, it appears you cannot do this on Heroku. The best I've been able to do is filter the privileged queries out with the SQL below:
SELECT u.usename, (total_time / 1000 / 60) as total_minutes,
(total_time/calls) as average_time, query
FROM pg_stat_statements p
JOIN pg_user u ON (u.usesysid = p.userid)
WHERE query != '<insufficient privilege>'
ORDER BY 2 DESC
LIMIT 10;
I'm trying to use Craig Kerstiens' very useful post,
http://www.craigkerstiens.com/2013/01/10/more-on-postgres-performance/ but I'm running into some permission issues.
When I query the pg_stat_statements view I get "<insufficient privilege>" for some of the longer-running queries, and it doesn't appear that Heroku lets you change user permissions.
Does anyone know how I can change permissions to see these queries on Heroku?
heroku pg:psql --remote production
psql (9.2.2, server 9.2.4)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
Type "help" for help.
d4k2qvm4tmu579=> SELECT
d4k2qvm4tmu579-> (total_time / 1000 / 60) as total_minutes,
d4k2qvm4tmu579-> (total_time/calls) as average_time,
d4k2qvm4tmu579-> query
d4k2qvm4tmu579-> FROM pg_stat_statements
d4k2qvm4tmu579-> ORDER BY 1 DESC
d4k2qvm4tmu579-> LIMIT 10;
total_minutes | average_time | query
------------------+-------------------+--------------------------
121.755079699998 | 11.7572250919775 | <insufficient privilege>
17.9371053166656 | 1.73208859315089 | <insufficient privilege>
13.8710526000023 | 1.33945202190106 | <insufficient privilege>
6.98494270000089 | 0.674497883626922 | <insufficient privilege>
6.75377774999972 | 0.652175543095124 | <insufficient privilege>
6.55192439999995 | 0.632683664174224 | <insufficient privilege>
3.84014626666634 | 1.12786802880252 | <insufficient privilege>
3.40574066666667 | 1399.61945205479 | <insufficient privilege>
3.16332020000008 | 0.929081204384053 | <insufficient privilege>
2.30192519999944 | 0.222284382614463 | <insufficient privilege>
(10 rows)

I can't answer your question directly, but maybe take a look at the pg-extras plugin, which brings a lot of this goodness directly to the Heroku CLI and returns the data for you :)
https://github.com/heroku/heroku-pg-extras
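For example (if I remember the plugin's command names correctly), pg:outliers surfaces the most time-consuming queries from pg_stat_statements without you having to write the SQL yourself:
heroku pg:outliers --remote production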

You need to be running a production-level instance of Heroku Postgres in order to use pg_stat_statements. Even then, it will only be able to show you stats for queries run by your app (or any client using the Heroku-supplied credentials). You won't be able to see queries run by superusers (postgres, collectd). Production plans are Crane and up (I believe).
You can see the username by joining in pg_user:
SELECT u.usename, (total_time / 1000 / 60) as total_minutes,
(total_time/calls) as average_time, query
FROM pg_stat_statements p
JOIN pg_user u ON (u.usesysid = p.userid) ORDER BY 2 DESC LIMIT 10;

Related

Using Crosstab to Generate Data for Charts

I'm trying to make an efficient query to create a view that will contain counts of successful logins per day, broken down by type of user, with each user counted at most once per day.
I have 3 tables involved in this query: one that contains all successful login attempts, one for standard user accounts, and one for admin user accounts. All user_id values are unique across the entire database, so no user account shares a user_id with an admin account:
TABLE 1: user_account
user_id | username
---------|----------
1 | user1
2 | user2
TABLE 2: admin_account
user_id | username
---------|----------
6 | admin6
7 | admin7
TABLE 3: successful_logins
user_id | timestamp
---------|------------------------------
1 | 2022-01-23 14:39:12.63798-07
1 | 2022-01-28 11:16:45.63798-07
1 | 2022-01-28 01:53:51.63798-07
2 | 2022-01-28 15:19:21.63798-07
6 | 2022-01-28 09:42:36.63798-07
2 | 2022-01-23 03:46:21.63798-07
7 | 2022-01-28 19:52:16.63798-07
2 | 2022-01-29 23:12:41.63798-07
2 | 2022-01-29 18:50:10.63798-07
The resulting view I would like to generate would contain the following information from the above 3 tables:
VIEW: login_counts
date_of_login | successful_user_logins | successful_admin_logins
---------------|------------------------|-------------------------
2022-01-23 | 1 | 1
2022-01-28 | 2 | 2
2022-01-29 | 1 | 0
I'm currently reading up on how crosstabs work but having trouble figuring out how to write the query based on my table setups.
I actually was able to get the values I needed by using the following query:
SELECT
to_char(s.timestamp, 'YYYY-MM-DD') AS login_date,
count(distinct u.user_id) AS successful_user_logins,
count(distinct a.user_id) AS successful_admin_logins
FROM successful_logins s
LEFT JOIN user_account u ON u.user_id= s.user_id
LEFT JOIN admin_account a ON a.user_id= s.user_id
GROUP BY login_date
However, I was told it would be even quicker using crosstabs, especially considering the successful_logins table contains millions of records. So I'm trying to also create a version of the query using crosstabs and then compare both execution times.
Any help would be greatly appreciated. Thanks!
Turns out it isn't possible to do what I was asking about using crosstabs, so the original query I have will have to do.
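For reference, turning that working query into the view from the question is just a matter of wrapping it in CREATE VIEW (a minimal sketch reusing the query above, with the alias renamed to match the desired view columns):
CREATE VIEW login_counts AS
SELECT
to_char(s.timestamp, 'YYYY-MM-DD') AS date_of_login,
count(distinct u.user_id) AS successful_user_logins,
count(distinct a.user_id) AS successful_admin_logins
FROM successful_logins s
LEFT JOIN user_account u ON u.user_id = s.user_id
LEFT JOIN admin_account a ON a.user_id = s.user_id
GROUP BY date_of_login;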

How to link jobs on coordinator and workers on a Citus database on PostgreSQL 12

I have the Citus extension on a PostgreSQL server, and I want to see the pg_stat_statements statistics of each worker through the coordinator node. However, there is no column to match the tables on the coordinator with those on the workers. Does anybody know how I can do that?
I am also interested in how the queryId is computed by PostgreSQL.
So the pg_stat_statements table on the coordinator would show something like:
userid | dbid | queryid | query | other statistics related columns
1 | 2 | 123 | SELECT * FROM a; | ...
While the pg_stat_statements table on a worker would show something like:
userid | dbid | queryid | query | other statistics related columns
1 | 2 | 456 | SELECT * FROM a_shard1; | ...
1 | 2 | 789 | SELECT * FROM a_shard2; | ...
You can match the table names on the workers (shards) to the distributed tables on the coordinator with the help of the pg_dist_partition and pg_dist_shard_placement tables. For matching the stats, you can check the citus_stat_statements view.
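For example, a query along these lines should give you per-statement stats on the coordinator together with the partition key they belong to (a minimal sketch; the exact columns available may vary by Citus version):
SELECT queryid, query, partition_key, calls
FROM citus_stat_statements
ORDER BY calls DESC
LIMIT 10;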
(I can't reply to the answer above, so I'm adding my answer here.)
You can use the query below to list the locations of the shards of a specific table on a specific worker node (see the last three filters in the WHERE clause).
SELECT pg_dist_shard.shardid, pg_dist_node.nodename, pg_dist_node.nodeport
FROM pg_dist_shard, pg_dist_placement, pg_dist_node
WHERE pg_dist_placement.groupid = pg_dist_node.groupid AND
logicalrelid = '<distributedTableName>'::regclass AND
pg_dist_node.nodename = '<nodeName>' AND
pg_dist_node.nodeport = '<nodePort>';
Then you can execute the query below on the worker node of interest to see what Citus executes for a specific shard on that node:
SELECT * FROM pg_stat_statements WHERE query LIKE '%_<shardId>%';

What is the best way to attach a running total to selected row data?

I have a table that looks like this:
Created at | Amount | Register Name
--------------+---------+-----------------
01/01/2019... | -150.01 | Front
01/01/2019... | 38.10 | Back
What is the best way to attach an ascending-by-date running total to each record, computed only over records with the same register name? I can do this in Ruby, but doing it in the database will be much faster, as this is a web application.
The application is a Rails application running Postgres 10, although the answer can be Rails-agnostic of course.
Use the aggregate sum() as a window function, e.g.:
with my_table (created_at, amount, register_name) as (
values
('2019-01-01', -150.01, 'Front'),
('2019-01-01', 38.10, 'Back'),
('2019-01-02', -150.01, 'Front'),
('2019-01-02', 38.10, 'Back')
)
select
created_at, amount, register_name,
sum(amount) over (partition by register_name order by created_at)
from my_table
order by created_at, register_name;
created_at | amount | register_name | sum
------------+---------+---------------+---------
2019-01-01 | 38.10 | Back | 38.10
2019-01-01 | -150.01 | Front | -150.01
2019-01-02 | 38.10 | Back | 76.20
2019-01-02 | -150.01 | Front | -300.02
(4 rows)
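Applied to your table it is the same window expression (a sketch; I'm assuming the table is named register_entries and the columns are created_at, amount and register_name as shown in the question):
select created_at, amount, register_name,
sum(amount) over (partition by register_name order by created_at) as running_total
from register_entries
order by created_at, register_name;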

Thinking Sphinx group by, with distinct count

I have the following manual Sphinx query (via the MySQL client) that produces the correct results, and I would like to call it through Thinking Sphinx from Rails. For the life of me, I am struggling with how to make a 'distinct' query work in Thinking Sphinx.
mysql> select merchant_name, count (distinct part_number) from product_core group by merchant_name;
+-----------------------+-----------------------------------------+
| merchant_name | count (distinct part_number) |
+-----------------------+-----------------------------------------+
| 1962041491 | 1 |
| 3208850848 | 1 |
| 1043652526 | 48754 |
| 770188128 | 1 |
| 374573991 | 34113 |
+-----------------------+-----------------------------------------+
Please note: this MySQL query is against Sphinx, NOT MySQL. I use the MySQL client to connect to Sphinx, as: mysql -h 127.0.0.1 -P 9306. This works well for debugging/development. My actual db is Postgres.
Given this, and to add more context, I am attempting to combine a group_by in Thinking Sphinx with a count('DISTINCT' ...).
So, this query works:
Product.search group_by: :merchant_name
... and, this query works:
Product.count ('DISTINCT part_number')
... but, this combined query throws an error:
Product.search group_by: :merchant_name, count ('DISTINCT part_number')
SyntaxError: (irb):90: syntax error, unexpected ( arg, expecting keyword_do or '{' or '('
...merchant_name, count ('DISTINCT part_num...
Both merchant_name and part_number are defined as attributes.
Environment:
Sphinx 2.2.10-id64-release (2c212e0)
thinking-sphinx 3.1.4
rails 4.2.4
postgres (PostgreSQL) 9.3.4
I have also tried using Facets, but to no avail:
Product.search group_by: :merchant_name, facets: :part_number
Product.facets :part_number, group_by: :merchant_name
For additional information, and to see if this could be accomplished through a Thinking Sphinx call, here is a basic example. I have one product table (and associated index) that lists both merchants and their products (I agree, it could be normalized, but it's coming in from a data feed, and Sphinx can handle it as is):
+-----------------+-------------------+
| merchant | product |
+-----------------+-------------------+
| Best Buy | Android phone |
| Best Buy | Android phone |
| Best Buy | Android phone |
| Best Buy | iPhone |
| Amazon | Android phone |
| Amazon | iPhone |
| Amazon | iPhone |
| Amazon | iPhone |
| Amazon | Onkyo Receiver |
+-----------------+-------------------+
With Thinking Sphinx, I want to: a) group the rows by merchant, and b) create a “distinct” product count for each group.
The above example should give the following result:
+-----------------+------------------------+
| merchant | count(DISTINCT product) |
+-----------------+------------------------+
| Best Buy | 2 |
| Amazon | 3 |
+-----------------+------------------------+
You're not going to be able to run this query through a model's search call, because that's set up to always return instances of a model, whereas what you want here is raw results. The following code should do the trick:
ThinkingSphinx::Connection.take do |connection|
result = connection.execute <<-SQL
SELECT merchant_name, COUNT(distinct part_number)
FROM product_core
GROUP BY merchant_name
SQL
result.to_a
end
Or, I think this will work to go through a normal search call:
Product.search(
select: "merchant_name, COUNT(distinct part_number) AS count",
group_by: :merchant_name,
middleware: ThinkingSphinx::Middlewares::RAW_ONLY
)

How do I know the query_group of a query which was run?

I need help figuring out the query_group of a query that was run on Redshift. I have set a query_group in the WLM config and want to make sure the query is getting executed from that query group.
query_group is part of the WLM (workload management) configuration, which lets you control how queries are routed to queues on the Redshift cluster. To use a query_group, you have to set up your own queue with a query_group name (label) in advance, through the AWS console ([Amazon Redshift] -> [Parameter Groups] -> select the parameter group -> [WLM]) or the CLI.
Here is an example snippet from the Redshift docs.
set query_group to 'Monday';
select * from category limit 1;
...
reset query_group;
You have to set the query_group before running the query you want to assign to the specific queue, and reset the query_group after it finishes.
You can track the queries of a query_group as follows ('label' is the query_group name):
select query, pid, substring, elapsed, label
from svl_qlog where label ='Monday'
order by query;
query | pid | substring | elapsed | label
------+------+------------------------------------+-----------+--------
789 | 6084 | select * from category limit 1; | 65468 | Monday
790 | 6084 | select query, trim(label) from ... | 1260327 | Monday
791 | 6084 | select * from svl_qlog where .. | 2293547 | Monday
792 | 6084 | select count(*) from bigsales; | 108235617 | Monday
...
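If you also want to confirm which WLM queue (service class) a given query actually ran in, you can look it up in stl_wlm_query using the query id from svl_qlog (a sketch, using query 789 from the output above):
select query, service_class, total_queue_time, total_exec_time
from stl_wlm_query
where query = 789;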
This document is a good resource for understanding how WLM works and how to use it:
http://docs.aws.amazon.com/redshift/latest/dg/cm-c-implementing-workload-management.html
This link is about query_group.
http://docs.aws.amazon.com/redshift/latest/dg/r_query_group.html
