Informix Query Tuning

I have a table called lead with about 500,000 records, and we need to execute the following query:
SELECT skip 300000 first 75 *
FROM lead
WHERE ((enrollment_period IS NULL) OR
(enrollment_period IN ('FT2015','F16','SUM2016','FALL2016','FALL2017','SP17')))
ORDER BY created_on DESC
The lead table has the id column as its primary key, and thus has a clustered index on that column. The query was taking about 12-13 minutes. When I added a non-clustered index on the created_on and enrollment_period columns, it came down to 4-5 minutes. When I then moved the clustered index from the id column to this new index, execution time came down further, to about 50 seconds.
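For reference, the changes described above amount to something like the following Informix DDL. The index name is hypothetical, and putting created_on first is one plausible reading of the description:
-- Composite index covering the sort column and the filter column:
CREATE INDEX ix_lead_created_enroll ON lead (created_on DESC, enrollment_period);
-- Make it the clustering index in place of the primary key's
-- (an existing cluster index must first be altered TO NOT CLUSTER):
ALTER INDEX ix_lead_created_enroll TO CLUSTER;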
Is there any further scope for optimizing this query?
More generally, is there any other change that would make it execute faster?
Thanks in advance,
Manohar

Related

How does sqlite select the index when querying records?

Background 
I am an iOS developer. Our project uses CoreData, which stores its data on disk in an sqlite database. A few days ago one of our users reported that the interface was not smooth in certain cases when using version 2.9.9 of our app. After some investigation we found that this was due to poor efficiency when querying records from sqlite. After updating to the latest version, 3.0.6, the issue disappeared.
Analysis
(1) When querying records from sqlite, the SQL query is
'SELECT * FROM ZAPIOBJECT WHERE ZAPIOBJECTID = "xxx" AND Z_ENT == 34'
In version 2.9.9 of our app, the schema of the table 'ZAPIOBJECT' shows
'CREATE INDEX ZAPIOBJECT_Z_ENT_INDEX ON ZAPIOBJECT (Z_ENT);'
'CREATE INDEX ZAPIOBJECT_ZAPIOBJECTID_INDEX ON ZAPIOBJECT (ZAPIOBJECTID);'
and the query plan shows
'0 0 0 SEARCH TABLE ZAPIOBJECT AS t0 USING INDEX ZAPIOBJECT_Z_ENT_INDEX (Z_ENT=?)'
which uses the less efficient index 'Z_ENT' (cost ~4 s for 1 row).
(2) In the version 3.0.6 of our app, the SQL query is the same:
'SELECT * FROM ZAPIOBJECT WHERE ZAPIOBJECTID = "xxx" AND Z_ENT == 34'
but the schema of the table 'ZAPIOBJECT' shows:
'CREATE INDEX ZAPIOBJECT_Z_ENT_INDEX ON ZAPIOBJECT (Z_ENT);'
'CREATE INDEX Z_APIObject_apiObjectID ON ZAPIOBJECT (ZAPIOBJECTID COLLATE BINARY ASC);'
and the query plan shows
'0 0 0 SEARCH TABLE ZAPIOBJECT AS t0 USING INDEX Z_APIObject_apiObjectID (ZAPIOBJECTID=?)'
which uses the more efficient index 'ZAPIOBJECTID' (cost ~0.03 s for 1 row).
(3) The total number of records in the table 'ZAPIOBJECT' is about 130,000. The index 'ZAPIOBJECTID', whose distinct count is more than 90,000, was created by us, while the index 'Z_ENT', whose distinct count is only 20, was created by CoreData.
(4) The sqlite version is the same in both versions of our app: 3.8.8.3.
Questions
(1) How does sqlite select an index when querying records? From the Query Planning document I learned that sqlite selects the best algorithm by itself; however, in our case the choice of index leads to an obvious difference in efficiency. Does the difference in how 'ZAPIOBJECTID' was created in the two versions of our app lead sqlite to adopt different indexes?
(2) It seems that only users on system versions lower than iOS 11 have this issue. How can we solve the problem for them? Can we designate 'ZAPIOBJECTID' as the index to use through the CoreData API?
SQLite uses the index that results in the lowest number of estimated I/O operations.
The details of that estimation change in every version.
See the Checklist For Avoiding Or Fixing Query Planner Problems.
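If you can influence the SQL at all (CoreData normally generates it for you, so this may not apply directly), two options from that checklist look like this in plain sqlite, as a sketch:
-- 1. Run ANALYZE so the planner has real statistics about index selectivity:
ANALYZE;
-- 2. Disqualify the low-selectivity Z_ENT index with a unary "+",
--    which prevents that term from driving an index lookup:
SELECT * FROM ZAPIOBJECT WHERE ZAPIOBJECTID = 'xxx' AND +Z_ENT = 34;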

Postgres Common Table Expression query with Ruby on Rails

I'm trying to find the best way to do a Postgres query with Common Table Expressions in a Rails app, knowing that apparently ActiveRecord doesn't support CTEs.
I have a table called user_activity_transitions which contains a series of records of a user activity being started and stopped (each row refers to a change of state: e.g started or stopped).
One user_activity_id may have many started-stopped pairs, each pair spanning two different rows.
It's also possible that there is only a "started" row, if the activity is currently going on and hasn't been stopped yet. The sort_key starts at 0 with the first ever state and increments by 10 for each state change.
id  to_state  sort_key  user_activity_id  created_at
1   started   0         18                2014-11-15 16:56:00
2   stopped   10        18                2014-11-15 16:57:00
3   started   20        18                2014-11-15 16:58:00
4   stopped   30        18                2014-11-15 16:59:00
5   started   40        18                2014-11-15 17:00:00
What I want is the following output, grouping couples of started-stopped together to be able to calculate duration etc.
user_activity_id  started_created_at   stopped_created_at
18                2014-11-15 16:56:00  2014-11-15 16:57:00
18                2014-11-15 16:58:00  2014-11-15 16:59:00
18                2014-11-15 17:00:00  null
The way the table is implemented makes this query harder to write, but it is much more flexible for future changes (e.g. new intermediary states), so it's not going to be revised.
My Postgres query (and the associated code in Rails):
query = <<-SQL
  with started as (
    select
      id,
      sort_key,
      user_activity_id,
      created_at as started_created_at
    from user_activity_transitions
    where sort_key % 4 = 0
  ), stopped as (
    select
      id,
      sort_key - 10 as sort_key2,
      user_activity_id,
      created_at as stopped_created_at
    from user_activity_transitions
    where sort_key % 4 = 2
  )
  select
    started.user_activity_id as user_activity_id,
    started.started_created_at as started_created_at,
    stopped.stopped_created_at as stopped_created_at
  from started
  left join stopped
    on stopped.sort_key2 = started.sort_key
    and stopped.user_activity_id = started.user_activity_id
SQL
results = ActiveRecord::Base.connection.execute(query)
What it does is "trick" SQL into joining two consecutive rows based on a modulus check on the sort key: since sort keys increment by 10, "started" rows (0, 20, 40, ...) satisfy sort_key % 4 = 0, while "stopped" rows (10, 30, ...) satisfy sort_key % 4 = 2.
The query works fine, but using this raw AR call annoys me, especially since what connection.execute returns is quite messy: I basically need to loop through the results and put them into the right hash.
2 questions:
Is there a way to get rid of the CTE and run the same query using Rails magic?
If not, is there a better way to get the results I want in a nice-looking hash?
Bear in mind that I'm quite new to Rails and not a query expert so there might be an obvious improvement...
Thanks a lot!
While Rails does not directly support CTEs, you can emulate a single CTE and still take advantage of ActiveRecord. Instead of a CTE, use a from subquery.
Thing
  .from(
    # Using a subquery in place of a single CTE
    Thing
      .select(
        '*',
        %{row_number() over(
            partition by this, that
            order by created_at desc
          ) as rank}
      ),
    :things
  )
  .where(rank: 1)
This is not exactly the same as, but equivalent to...
with ranked_things as (
  select
    *,
    row_number() over(
      partition by this, that
      order by created_at desc
    ) as rank
  from things
)
select *
from ranked_things
where rank = 1
I'm trying to find the best way to do a Postgres query with Common Table Expressions in a Rails app, knowing that apparently ActiveRecord doesn't support CTEs.
As far as I know, ActiveRecord doesn't support CTEs. Arel, which AR uses under the hood, supports them, but they're not exposed through AR's interface.
Is there a way to get rid of the CTE and run the same query using Rails magic?
Not really. You could write it through AR's APIs, but you'd just end up writing the same SQL split across a few method calls, as the sketch below illustrates.
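For instance, the two CTEs in your query can be inlined as a single self-join, which is roughly what any AR translation would generate anyway (a sketch with the same logic as your query):
SELECT started.user_activity_id,
       started.created_at AS started_created_at,
       stopped.created_at AS stopped_created_at
FROM user_activity_transitions AS started
LEFT JOIN user_activity_transitions AS stopped
       ON stopped.user_activity_id = started.user_activity_id
      AND stopped.sort_key = started.sort_key + 10
WHERE started.sort_key % 4 = 0;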
If not, is there a better way to get the results I want in a nice-looking hash?
I tried to run the query and I'm getting the following which seems nice enough to me. Are you getting a different result?
[
{"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 16:56:00", "stopped_created_at"=>"2014-11-15 16:57:00"},
{"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 16:58:00", "stopped_created_at"=>"2014-11-15 16:59:00"},
{"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 17:00:00", "stopped_created_at"=>nil}
]
I assume you have a model called UserActivityTransition you use for manipulating the data. You can use the model to get the results as well.
results = UserActivityTransition.find_by_sql(query)
results.size # => 3
results.first.started_created_at # => 2014-11-15 16:56:00 UTC
Note that these "virtual" attributes will not be visible when inspecting the result but they're there.

Only return one record per hour over a time period in Rails

I have written a Rails 4 app that accepts and plots sensor data. Sometimes there are 10 points per hour (but this number is not fixed). I'm plotting the data and doing a simple query of Points.all to get all the data points.
In order to reduce the query size, I would like to only return one record per hour. It doesn't matter which record is returned. The first record each hour using the created_at field would be fine.
How do I construct a query to do this?
You can take the first one, but maybe the average value is better. All you need to do is group by hour. I'm not 100% sure about the sqlite syntax, but something along these lines:
connection.execute("SELECT AVG(READING_VALUE) FROM POINTS GROUP BY STRFTIME('%Y%m%d%H0', CREATED_AT)")
Inspired by this answer, here is an alternative which retrieves the latest record in each hour (if you don't want to average):
Point.from(
  Point.select("max(unix_timestamp(created_at)) as max_timestamp")
       .group("HOUR(created_at)") # subquery
)
.joins("INNER JOIN points ON subquery.max_timestamp = unix_timestamp(created_at)")
This will result in the following query:
SELECT `points`.*
FROM (
SELECT max(unix_timestamp(created_at)) as max_timestamp
FROM `points`
GROUP BY HOUR(created_at)
) subquery
INNER JOIN points ON subquery.max_timestamp = unix_timestamp(created_at)
You can also use MIN instead to get the first record of each hour, as in the sketch below.
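Note that unix_timestamp and HOUR above are MySQL functions; if the database is sqlite, the MIN variant could look like this in plain SQL (a sketch, assuming the table is named points):
-- Pick the earliest reading in each hour, then join back for the full rows.
SELECT p.*
FROM points AS p
INNER JOIN (
    SELECT MIN(created_at) AS first_created_at
    FROM points
    GROUP BY STRFTIME('%Y%m%d%H', created_at)
) AS sub ON sub.first_created_at = p.created_at;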

How can I speed up or optimize this SQLite query for iOS?

I have a pretty simple DB structure: a single table with 12 columns, most of them varchar(<50), and about 8,500 rows.
When I perform the following query on an iPhone 4, I've been averaging 2.5-3 seconds for results:
SELECT * FROM names ORDER BY name COLLATE NOCASE ASC LIMIT 20
It doesn't seem like this sort of query should be so slow. Interestingly, the same query from the same app runs about 1.5 seconds faster on a 2nd-gen iPod. That part is beyond me.
I have other queries that have the same issue:
SELECT * FROM names WHERE SEX = ?1 AND ORIGIN = ?2 ORDER BY name COLLATE NOCASE ASC LIMIT 20
and
SELECT * FROM names WHERE name LIKE ?3 AND SEX = ?1 AND ORIGIN = ?2 ORDER BY name COLLATE NOCASE ASC LIMIT 20
etc.
I've added an index on the SQLite db: CREATE INDEX names_idx ON names (name, origin, sex, meaning) where name, origin, sex and meaning are the columns I tend to query against with WHERE and LIKE operators.
Any thoughts on improving the performance of these searches, or is this about as fast as it gets?
The index CREATE INDEX names_idx ON names (name, origin, sex, meaning) can only be used, I believe, if your query constrains a leftmost prefix of those columns; a query that filters only on origin or sex, for example, can't use it.
Going on your first query: SELECT * FROM names ORDER BY name COLLATE NOCASE ASC LIMIT 20 - I would suggest adding an index on name, just by itself, i.e. CREATE INDEX names_idx1 ON names (name). That should in theory speed up that query.
If you want other indexes with combined columns for other common queries, fair enough, and it may improve query speed, but remember it'll increase your database size.
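One refinement of that single-column suggestion: since the query sorts with COLLATE NOCASE, the index should be declared with the same collation, or sqlite will still have to sort. A sketch (index name hypothetical):
CREATE INDEX names_name_nocase_idx ON names (name COLLATE NOCASE);
-- Verify the plan no longer needs a temporary B-tree for the ORDER BY:
EXPLAIN QUERY PLAN
SELECT * FROM names ORDER BY name COLLATE NOCASE ASC LIMIT 20;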
What is the most-used search criterion? If you search by name, for example, you could create more tables according to the name's initial: a table for names starting with "A", etc. The same for gender. This would improve your search performance in some cases.

How to efficiently search for last record matching a condition in Rails and PostgreSQL?

Suppose you want to find the last record entered into the database (highest ID) matching a string: Model.where(:name => 'Joe'). There are 100,000+ records. There are many matches (say thousands).
What is the most efficient way to do this? Does PostgreSQL need to find all the records, or can it just find the last one? Is this a particularly slow query?
Working in Rails 3.0.7, Ruby 1.9.2 and PostgreSQL 8.3.
The important part here is to have a matching index. You can try this small test setup:
Create schema x for testing:
-- DROP SCHEMA x CASCADE; -- to wipe it all for a retest or when done.
CREATE SCHEMA x;
CREATE TABLE x.tbl(id serial, name text);
Insert 10000 random rows:
INSERT INTO x.tbl(name) SELECT 'x' || generate_series(1,10000);
Insert another 10000 rows with repeating names:
INSERT INTO x.tbl(name) SELECT 'y' || generate_series(1,10000)%20;
Delete random 10% to make it more real life:
DELETE FROM x.tbl WHERE random() < 0.1;
ANALYZE x.tbl;
Query can look like this:
SELECT *
FROM x.tbl
WHERE name = 'y17'
ORDER BY id DESC
LIMIT 1;
--> Total runtime: 5.535 ms
CREATE INDEX tbl_name_idx on x.tbl(name);
--> Total runtime: 1.228 ms
DROP INDEX x.tbl_name_idx;
CREATE INDEX tbl_name_id_idx on x.tbl(name, id);
--> Total runtime: 0.053 ms
DROP INDEX x.tbl_name_id_idx;
CREATE INDEX tbl_name_id_idx on x.tbl(name, id DESC);
--> Total runtime: 0.048 ms
DROP INDEX x.tbl_name_id_idx;
CREATE INDEX tbl_name_idx on x.tbl(name);
CLUSTER x.tbl using tbl_name_idx;
--> Total runtime: 1.144 ms
DROP INDEX x.tbl_name_idx;
CREATE INDEX tbl_name_id_idx on x.tbl(name, id DESC);
CLUSTER x.tbl using tbl_name_id_idx;
--> Total runtime: 0.047 ms
Conclusion
With a fitting index, the query performs more than 100x faster.
Top performer is a multicolumn index with the filter column first and the sort column last.
Matching sort order in the index helps a little in this case.
Clustering helps with the simple index, because many rows still have to be read from the table, and after clustering these are found in adjacent blocks. It doesn't help with the multicolumn index in this case, because only one record has to be fetched from the table.
Read more about multicolumn indexes in the manual.
All of these effects grow with the size of the table. 10000 rows of two tiny columns is just a very small test case.
You can put the query together in Rails and the ORM will write the proper SQL:
Model.where(:name=>"Joe").order('created_at DESC').first
This should not result in retrieving all Model records, nor even a table scan.
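Per the conclusions in the first answer, the matching index for that Rails query would put the filter column first and the sort column last; something like this, with hypothetical table and index names:
CREATE INDEX models_name_created_at_idx ON models (name, created_at DESC);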
This is probably the easiest:
SELECT [columns] FROM [table] WHERE [criteria] ORDER BY [id column] DESC LIMIT 1
Note: Indexing is important here. A huge DB will be slow to search no matter how you do it if you're not indexing the right way.
