Background
I am an iOS developer, and our project uses Core Data, which stores its data on disk in an SQLite database. A few days ago one of our users reported that the interface was not smooth in certain cases when using version 2.9.9 of our app. After some effort we traced the problem to inefficient queries against SQLite. After updating to the latest version, 3.0.6, the issue disappeared.
Analysis
(1) When querying records from SQLite, the SQL query is
'SELECT * FROM ZAPIOBJECT WHERE ZAPIOBJECTID = "xxx" AND Z_ENT == 34'
In version 2.9.9 of our app, the schema of the table 'ZAPIOBJECT' in the SQLite database shows
'CREATE INDEX ZAPIOBJECT_Z_ENT_INDEX ON ZAPIOBJECT (Z_ENT);'
'CREATE INDEX ZAPIOBJECT_ZAPIOBJECTID_INDEX ON ZAPIOBJECT (ZAPIOBJECTID);'
and the query plan shows
'0 0 0 SEARCH TABLE ZAPIOBJECT AS t0 USING INDEX ZAPIOBJECT_Z_ENT_INDEX (Z_ENT=?)'
which uses the less efficient index on Z_ENT (~4 s to return 1 row).
(2) In version 3.0.6 of our app, the SQL query is the same:
'SELECT * FROM ZAPIOBJECT WHERE ZAPIOBJECTID = "xxx" AND Z_ENT == 34'
but the schema of the table 'ZAPIOBJECT' shows:
'CREATE INDEX ZAPIOBJECT_Z_ENT_INDEX ON ZAPIOBJECT (Z_ENT);'
'CREATE INDEX Z_APIObject_apiObjectID ON ZAPIOBJECT (ZAPIOBJECTID COLLATE BINARY ASC);'
and the query plan shows
'0 0 0 SEARCH TABLE ZAPIOBJECT AS t0 USING INDEX Z_APIObject_apiObjectID (ZAPIOBJECTID=?)'
which uses the more efficient index on ZAPIOBJECTID (~0.03 s to return 1 row).
(3) The table 'ZAPIOBJECT' contains about 130,000 records. The index on ZAPIOBJECTID, whose distinct count is more than 90,000, was created by us, while the index on Z_ENT, whose distinct count is only 20, was created by Core Data.
(4) Both versions of our app embed the same SQLite version, 3.8.8.3.
Questions
(1) How does SQLite choose an index when querying records? From the Query Planning document I learned that SQLite selects the best algorithm by itself, yet in our case the choice of index makes an obvious difference in efficiency. Does the difference in how the index on ZAPIOBJECTID is created in the two versions of our app lead SQLite to adopt a different index?
(2) It seems that only users on system versions lower than iOS 11 hit this issue, so how can we solve the problem for them? Can we designate ZAPIOBJECTID as the index to use through the Core Data API?
SQLite uses the index that results in the lowest number of estimated I/O operations.
The details of that estimation change in every version.
See the Checklist For Avoiding Or Fixing Query Planner Problems.
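As a concrete illustration of that checklist, here is a sketch of the standard SQLite-level knobs for steering the planner; whether any of them can be injected through Core Data is a separate question, so treat these as statements to test in the sqlite3 shell:

-- Run ANALYZE so the planner can see that Z_ENT has only ~20 distinct values:
ANALYZE;

-- Or disqualify the low-selectivity term with the unary "+" operator:
SELECT * FROM ZAPIOBJECT WHERE ZAPIOBJECTID = 'xxx' AND +Z_ENT == 34;

-- Or name the index explicitly; the query then errors out if the index
-- is missing or unusable:
SELECT * FROM ZAPIOBJECT INDEXED BY ZAPIOBJECT_ZAPIOBJECTID_INDEX
WHERE ZAPIOBJECTID = 'xxx' AND Z_ENT == 34;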
Related
I have very minimal knowledge of writing dynamic queries. As part of an implementation, I need to write a DB2 DELETE query that deletes rows and also returns the count of rows affected.
This query will be put in a DB2 stored procedure, where I will have this count as an OUT parameter.
I tried the following, which returns the count but doesn't delete the rows.
SELECT COUNT(STUDENT_ID) AS DELETE
FROM STUDENT
WHERE STUDENT_LOCATION = 'TNAGAR'
AND DATE(JOINING_DATE) < CURRENT DATE - 120 MONTHS;
This could be achieved using two individual queries, i.e. one SELECT and one DELETE, but I am looking for a single query that does both.
If you are using SQL PL procedures in Db2, you can use the GET DIAGNOSTICS statement to return the number of rows affected by a previous insert/update/delete. See the documentation at this page.
Example:
declare v_rows_affected integer default 0;
...
DELETE FROM ...
get diagnostics v_rows_affected = row_count;
If you are using a programming language other than SQL PL, with access to the SQLCA, then this information is also present in a part of the SQLCA (specifically SQLERRD(3)).
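Putting the pieces together for the table in the question, a minimal SQL PL sketch could look like this (the procedure name and OUT parameter name are invented for illustration; run it with a non-semicolon statement terminator such as @):

CREATE OR REPLACE PROCEDURE DELETE_OLD_STUDENTS (OUT P_ROWS_DELETED INTEGER)
LANGUAGE SQL
BEGIN
  -- Delete first, then read the affected-row count from the diagnostics area.
  DELETE FROM STUDENT
  WHERE STUDENT_LOCATION = 'TNAGAR'
    AND DATE(JOINING_DATE) < CURRENT DATE - 120 MONTHS;
  GET DIAGNOSTICS P_ROWS_DELETED = ROW_COUNT;
END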
I have a Rails application that holds user data (in an aptly named user_data object). I want to display a summary table that shows me the count of total users and the count of users who are still active (status = 'Active'), created each month for the past 12 months.
In SQL against my Postgres database, I can get the result I want with the following query (the date I use in there is calculated by the application, so you can ignore that aspect):
SELECT total.creation_month,
total.user_count AS total_count,
active.user_count AS active_count
FROM
(SELECT date_trunc('month',"creationDate") AS creation_month,
COUNT("userId") AS user_count
FROM user_data
WHERE "creationDate" >= to_date('2015 12 21', 'YYYY MM DD')
GROUP BY creation_month) AS total
LEFT JOIN
(SELECT date_trunc('month',"creationDate") AS creation_month,
COUNT("userId") AS user_count
FROM user_data
WHERE "creationDate" >= to_date('2015 12 21', 'YYYY MM DD')
AND status = 'Active'
GROUP BY creation_month) AS active
ON total.creation_month = active.creation_month
ORDER BY creation_month ASC
How do I write this query with ActiveRecord?
I previously had just the total user count grouped by month in my display, but I am struggling with how to add in the additional column of active user counts.
My application is on Ruby 2.1.4 and Rails 4.1.6.
I gave up on trying to do this the ActiveRecord way. Instead I just constructed my query as a string and passed the string into
ActiveRecord::Base.connection.execute(sql_string)
This had the side effect that my result set came out as an array instead of a set of objects. So getting at the values went from a syntax (where user_data is the name assigned to a single record from the result set) like
user_data.total_count
to
user_data['total_count']
But that's a minor issue. Not worth the hassle.
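If you do go the raw-SQL route, the string itself can also be simplified: conditional aggregation computes both counts in one pass over user_data, with no self-join. This is a sketch assuming the same columns as the query above:

SELECT date_trunc('month', "creationDate") AS creation_month,
       COUNT("userId") AS total_count,
       COUNT(CASE WHEN status = 'Active' THEN "userId" END) AS active_count
FROM user_data
WHERE "creationDate" >= to_date('2015 12 21', 'YYYY MM DD')
GROUP BY creation_month
ORDER BY creation_month ASC;

It works because COUNT ignores NULLs, and the CASE expression yields NULL for rows that are not 'Active'.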
I have the following validation:
validates :username, uniqueness: { case_sensitive: false }
Which causes the following query to be run painfully slow:
5,510 ms
SELECT ? AS one FROM "users" WHERE (LOWER("users"."username") = LOWER(?) AND "users"."id" != ?) LIMIT ?
Explain plan
Limit (cost=0.03..4.03 rows=1 width=0)
  -> Index Scan using idx_users_lower_username on users (cost=0.03..4.03 rows=1 width=0)
       Index Cond: ?
       Filter: ?
The index was created in my structure.sql using CREATE INDEX idx_users_lower_username ON users USING btree (lower((username)::text)); See my question How to create index on LOWER("users"."username") in Rails (using postgres) for more on this.
This is using the index I set and still takes over 5 seconds? What's wrong here?
There are several different, interrelated things going on here. Exactly how you carry out the changes depends on how you manage changes to your database structure. The most common way is to use Rails migrations, but your linked question suggests you're not doing that. So I'll speak mostly in SQL, and you can adapt that to your method.
Use a sargable WHERE clause
Your WHERE clause isn't sargable. That means it's written in a way that prevents the dbms from using an index. To create an index PostgreSQL can use here . . .
create index on "users" (lower("username") varchar_pattern_ops);
Now queries on lowercased usernames can use that index.
explain analyze
select *
from users
where lower(username) = lower('9LCDgRHk7kIXehk6LESDqHBJCt9wmA');
It might appear as if PostgreSQL must lowercase every username in the table, but its query planner is smart enough to see that the expression lower(username) is itself indexed. PostgreSQL uses an index scan.
"Index Scan using users_lower_idx on users (cost=0.43..8.45 rows=1 width=35) (actual time=0.034..0.035 rows=1 loops=1)"
" Index Cond: (lower((username)::text) = 'b0sa9malg7yt1shssajrynqhiddm5d'::text)"
"Total runtime: 0.058 ms"
This table has a million rows of random-ish data; the query returns very, very quickly. It's just about equally fast with the additional condition on "id", but the LIMIT clause slows it down a lot. "Slows it down a lot" doesn't mean it's slow; it still returns in less than 0.1 ms.
Also, here the varchar_pattern_ops lets queries that use the LIKE operator use the index.
explain analyze
select *
from users
where lower(username) like 'b%'
"Bitmap Heap Scan on users (cost=1075.12..9875.78 rows=30303 width=35) (actual time=10.217..91.030 rows=31785 loops=1)"
" Filter: (lower((username)::text) ~~ 'b%'::text)"
" -> Bitmap Index Scan on users_lower_idx (cost=0.00..1067.54 rows=31111 width=0) (actual time=8.648..8.648 rows=31785 loops=1)"
" Index Cond: ((lower((username)::text) ~>=~ 'b'::text) AND (lower((username)::text) ~<~ 'c'::text))"
"Total runtime: 93.541 ms"
Only 94 ms to select and return 30k rows from a million.
Queries on very small tables might use a sequential scan even though there's a usable index. I wouldn't worry about that if I were you.
Enforce uniqueness in the database
If you're expecting any bursts of traffic, you should enforce uniqueness in the database. I do this all the time, regardless of any expectations (guesses) about traffic.
The RailsGuides Active Record Validations includes this slightly misleading or confusing paragraph about the "uniqueness" helper.
This helper validates that the attribute's value is unique right before the object gets saved. It does not create a uniqueness constraint in the database, so it may happen that two different database connections create two records with the same value for a column that you intend to be unique. To avoid that, you must create a unique index on both columns in your database. See the MySQL manual for more details about multiple column indexes.
It clearly says that, in fact, it doesn't guarantee uniqueness. The misleading part is about creating a unique index on "both columns". If you want "username" to be unique, you need to declare a unique constraint on the column "username".
alter table "users"
add constraint constraint_name unique (username);
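Note that a plain unique constraint on username is case-sensitive, so it would still let 'Joe' and 'joe' coexist. To have the database enforce the same case-insensitive rule as your validation, one option (a sketch; the index name is invented) is a unique index on the lowercased expression:

create unique index users_username_lower_key on "users" (lower(username));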
Case-sensitivity
In SQL databases, case-sensitivity is determined by collation. Collation is part of the SQL standards.
In PostgreSQL, you can set collation at the database level, at the column level, at the index level, and at the query level. Values come from the locales the operating system exposes at the time you create a new database cluster using initdb.
On Linux systems, you probably have no case-insensitive collations. That's one reason we have to jump through rather more hoops than people who target SQL Server and Oracle.
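If you want to see which collations your cluster actually picked up, you can list them from the catalog (a read-only query; pg_collation exists since PostgreSQL 9.1):

SELECT collname, collcollate FROM pg_collation ORDER BY collname;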
Try running the query in psql using EXPLAIN ANALYZE, so you can make sure Postgres itself is fine, because apparently the index and the query are right.
If it is fast in psql, then the problem is in your Rails code.
This query against a table of 3k records gave this result (on my local dev machine):
app=# explain analyze SELECT id AS one FROM "users" WHERE (LOWER(email) = LOWER('marcus@marcus.marcus') AND "users"."id" != 2000);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on users (cost=4.43..58.06 rows=19 width=4) (actual time=0.101..0.101 rows=0 loops=1)
Recheck Cond: (lower((email)::text) = 'marcus@marcus.marcus'::text)
Filter: (id <> 2000)
-> Bitmap Index Scan on users_lower_idx (cost=0.00..4.43 rows=19 width=0) (actual time=0.097..0.097 rows=0 loops=1)
Index Cond: (lower((email)::text) = 'marcus@marcus.marcus'::text)
Total runtime: 0.144 ms
(6 rows)
I am using an sqlite3 database in my project, and I can retrieve data from the database using the following query: select * from tablename.
But I want to fetch records from the database a hundred at a time: as I scroll the UITableView, I want to fetch the next 100 records each time.
I have tried the following:
SELECT * FROM mytable ORDER BY record_date DESC LIMIT 100; - It retrieves only the first 100 records. When I scroll the table, I want to fetch the next 100 records and show them.
Is it possible to do this?
Please guide me.
You could simply use the OFFSET clause, but this would still force the database to compute all the records that you're skipping over, so it would become inefficient for a larger table.
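For reference, the OFFSET form would look like this (sketched against the mytable/record_date names from the question); the further you scroll, the more skipped rows SQLite has to walk past:

SELECT * FROM mytable ORDER BY record_date DESC LIMIT 100 OFFSET 100; -- page 2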
What you should do instead is save the last record_date value of the previous page, and continue from there:
SELECT *
FROM MyTable
WHERE record_date < ?
ORDER BY record_date DESC
LIMIT 100
See https://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor for details.
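One caveat worth adding (my note, not from the linked page): if record_date values can repeat, rows that share the boundary value may be skipped. A common fix is a unique tiebreaker such as SQLite's implicit rowid; here the ?1/?2 placeholders stand for the last row's date and rowid:

SELECT *
FROM MyTable
WHERE record_date < ?1
   OR (record_date = ?1 AND rowid < ?2)
ORDER BY record_date DESC, rowid DESC
LIMIT 100;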
Suppose you want to find the last record entered into the database (highest ID) matching a string: Model.where(:name => 'Joe'). There are 100,000+ records. There are many matches (say thousands).
What is the most efficient way to do this? Does PostgreSQL need to find all the records, or can it just find the last one? Is this a particularly slow query?
Working in Rails 3.0.7, Ruby 1.9.2 and PostgreSQL 8.3.
The important part here is to have a matching index. You can try this small test setup:
Create schema x for testing:
-- DROP SCHEMA x CASCADE; -- to wipe it all for a retest or when done.
CREATE SCHEMA x;
CREATE TABLE x.tbl(id serial, name text);
Insert 10000 random rows:
INSERT INTO x.tbl(name) SELECT 'x' || generate_series(1,10000);
Insert another 10000 rows with repeating names:
INSERT INTO x.tbl(name) SELECT 'y' || generate_series(1,10000)%20;
Delete a random 10% to make it more realistic:
DELETE FROM x.tbl WHERE random() < 0.1;
ANALYZE x.tbl;
The query can look like this:
SELECT *
FROM x.tbl
WHERE name = 'y17'
ORDER BY id DESC
LIMIT 1;
--> Total runtime: 5.535 ms
CREATE INDEX tbl_name_idx on x.tbl(name);
--> Total runtime: 1.228 ms
DROP INDEX x.tbl_name_idx;
CREATE INDEX tbl_name_id_idx on x.tbl(name, id);
--> Total runtime: 0.053 ms
DROP INDEX x.tbl_name_id_idx;
CREATE INDEX tbl_name_id_idx on x.tbl(name, id DESC);
--> Total runtime: 0.048 ms
DROP INDEX x.tbl_name_id_idx;
CREATE INDEX tbl_name_idx on x.tbl(name);
CLUSTER x.tbl using tbl_name_idx;
--> Total runtime: 1.144 ms
DROP INDEX x.tbl_name_idx;
CREATE INDEX tbl_name_id_idx on x.tbl(name, id DESC);
CLUSTER x.tbl using tbl_name_id_idx;
--> Total runtime: 0.047 ms
Conclusion
With a fitting index, the query performs more than 100x faster.
Top performer is a multicolumn index with the filter column first and the sort column last.
Matching sort order in the index helps a little in this case.
Clustering helps with the simple index, because many rows still have to be read from the table, and these can be found in adjacent blocks after clustering. It doesn't help with the multicolumn index in this case, because only one record has to be fetched from the table.
Read more about multicolumn indexes in the manual.
All of these effects grow with the size of the table. 10000 rows of two tiny columns is just a very small test case.
You can put the query together in Rails and the ORM will write the proper SQL:
Model.where(:name=>"Joe").order('created_at DESC').first
This should not result in retrieving all Model records, nor even a table scan.
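For reference, the SQL this generates is essentially the following (modulo identifier quoting, and assuming the default models table name):

SELECT * FROM models WHERE name = 'Joe' ORDER BY created_at DESC LIMIT 1;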
This is probably the easiest:
SELECT [columns] FROM [table] WHERE [criteria] ORDER BY [id column] DESC LIMIT 1
Note: Indexing is important here. A huge DB will be slow to search no matter how you do it if you're not indexing the right way.
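Instantiated for the example in the question (again assuming a models table with the conventional id primary key), that template becomes the query below, which the (name, id DESC) index from the test setup above serves directly:

SELECT * FROM models WHERE name = 'Joe' ORDER BY id DESC LIMIT 1;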