I'm working on an enterprise sales app for iPad that uses SQLite as its internal database, and a strange behaviour recently showed up.
I have a huge table that is filled with information from several other tables (sort of like a "materialized view"), which can contain over 2 million rows depending on how the user is set up. When the user wants to search for an item, the app performs a query on this huge table against an indexed column, plus other columns that are used as filters and/or metadata. I'll post the query and the basic idea below. This query usually returns in 2~3 seconds on an iPad 4th gen, no more than that, and this is just fine. The table is dropped, re-created and refilled every time the user taps a button to synchronize their data with our server.
However, recently the same query on the same table (with no relevant changes at all) randomly started to take 40~50 seconds. If you do the same thing later, on the same device, with the same filters (or even with different filters!), the same query on the same table takes 2~3 seconds again. I haven't found any specific situation that causes this slowdown, and the app is the only one running at the time. The devices are not the problem either: we've seen this happen on at least 5 different iPads, one an iPad 3 and the others iPad 4th gens.
I don't think it is some sort of caching, since the app does not cache anything, and the times are rather random. Sometimes the query takes 40 seconds 10 times in a row, then suddenly it starts taking only 2 seconds again, and the same thing the other way around. The only thing that is clear to me is that the slowdown only occurs after intensive use (1~2 days of work using the app), so I'm also having trouble reproducing this behaviour while debugging on the iPad I have with me.
What I've tried (a sketch of how I run these commands follows the list):
Attach Instruments to the process and check what resources are being used during the slowdown. The app makes INTENSIVE use of the iPad's 'disk' (flash memory) the whole time. I don't have the trace available to analyse again right now, but I think the CPU usage was around 30%. RAM usage is stable at 90~100MB, which is normal for our app.
Run VACUUM on the db - reduced a database I had as an example by ~50MB; it went from ~600MB to ~550MB.
Run ANALYZE on the db - didn't see any improvements.
Run REINDEX on the db - seems to help a little, but it's not solving the problem.
Kill the process and start over - nothing changes
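For anyone wanting to reproduce those steps, this is roughly how the maintenance commands can be issued through the raw SQLite C API (a minimal sketch in Swift, assuming an already-open database handle; our real app goes through its wrapper class):

import SQLite3

// Runs the maintenance commands from the list above against an open handle.
// `db` is assumed to be a valid connection obtained from sqlite3_open().
func runMaintenance(on db: OpaquePointer?) {
    for sql in ["VACUUM;", "ANALYZE;", "REINDEX;"] {
        var errMsg: UnsafeMutablePointer<CChar>?
        // sqlite3_exec is enough here: no result rows, no bound parameters.
        if sqlite3_exec(db, sql, nil, nil, &errMsg) != SQLITE_OK {
            print("\(sql) failed: \(errMsg.map { String(cString: $0) } ?? "unknown error")")
            sqlite3_free(errMsg)
        }
    }
}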
The huge table is constructed as follows, and does NOT have any foreign keys or any other constraints:
CREATE TABLE FMV_CATALOG(
UNIQUE_ID TEXT,
PRODUCT_ID INTEGER,
<bunch of metadata/filtered columns - total of 20 columns>
);
And the query that is made to find the products is:
SELECT
PRODUCT_ID
,UNIQUE_ID
<all other required columns, ~20 columns>
FROM
FMV_CATALOG
WHERE
UNIQUE_ID = '<some id>_<other id>'
AND PRODUCT_NAME LIKE '%iPhone%'
<and other optional, rarely used, filters.>
I'm totally out of ideas, so any help will be appreciated.
Thanks!
UPDATE (more info):
Important information that I forgot to mention (Rob reminded me of it): my database connection is always open; it is closed only when the user logs out. We noticed a huge performance improvement in all parts of the app when we kept the connection open, since we have hundreds of small queries that are executed in other situations (but not while browsing/searching the products catalog).
The query used to create the index is below:
CREATE INDEX IDX_FMV_CATALOG ON FMV_CATALOG(UNIQUE_ID);
Also, even though the column is named UNIQUE_ID, it is not unique. It was supposed to be originally, but now it is repeated N times. I know this is wrong, and we'll change it ASAP.
This "UNIQUE_ID" (which is not really unique) is filled by joining the IDs of two other tables. This way, our "materialized view" removes the need of at least three joins when the user searches on our catalog, which improved our query times from ~20 seconds to ~2 seconds.
We don't call the sqlite3 API directly for our queries; we developed a wrapper class around it and have been using it for at least 2 years now. This is the first time we've ever been in this situation, but then again, it's the first time we're handling this much data.
A couple of thoughts:
You're not showing us the creation of any index on FMV_CATALOG. If nothing else, if UNIQUE_ID is, as the name suggests, unique, then I'd be inclined to define the table with a PRIMARY KEY:
CREATE TABLE FMV_CATALOG(
UNIQUE_ID TEXT PRIMARY KEY,
PRODUCT_ID INTEGER,
<bunch of metadata/filtered columns - total of 20 columns>
);
You should try using the SQLite EXPLAIN QUERY PLAN command to look at the query's plan and make sure it's availing itself of your index. Do this as it is, then again with the PRIMARY KEY (and, if that still doesn't do it, with an index on the fields in your WHERE clause), and make sure the final query is definitely using your index.
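For example, something along these lines prints the plan SQLite picks (a sketch via the C API in Swift; the human-readable detail is the fourth column of the EXPLAIN QUERY PLAN output, and you want to see "SEARCH ... USING INDEX" rather than "SCAN TABLE"):

import SQLite3

// Prints the plan SQLite chooses for the catalog query.
func printQueryPlan(on db: OpaquePointer?) {
    let sql = """
    EXPLAIN QUERY PLAN
    SELECT PRODUCT_ID, UNIQUE_ID FROM FMV_CATALOG
    WHERE UNIQUE_ID = ? AND PRODUCT_NAME LIKE '%iPhone%';
    """
    var stmt: OpaquePointer?
    guard sqlite3_prepare_v2(db, sql, -1, &stmt, nil) == SQLITE_OK else { return }
    defer { sqlite3_finalize(stmt) }
    while sqlite3_step(stmt) == SQLITE_ROW {
        if let detail = sqlite3_column_text(stmt, 3) {
            print(String(cString: detail))  // e.g. "SEARCH TABLE FMV_CATALOG USING INDEX ..."
        }
    }
}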
I'm not sure why, if you have the unique id, you're also looking at the other fields. If adding the primary key (and possibly other indexes) doesn't solve the problem, I might try retrieving the record based upon the unique id alone, and then checking for conformance with your other parameters in code. I don't believe you'll need to do this, but it's a worst-case scenario.
In terms of why it slows down, that's harder to guess without seeing the code (which I'm sure is too complicated to share in a simple S.O. question). I could imagine strange behavior if, for example, you fail to call sqlite3_finalize after one of your sqlite3_prepare_v2 calls, or if you accidentally failed to close the database and then opened it again elsewhere. I could also imagine performance issues arising if the sequence of sqlite3 calls wasn't precisely right. Using something like FMDB can minimize the chance of those sorts of issues occurring (as well as simplifying your SQLite code). Or, if that's too radical a step, try writing your own macros that make the SQLite calls but also log the fact that you've called each sqlite3 function, then pore through that log and double-check the sequence of your calls.
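For reference, the lifecycle you want looks something like this (a minimal sketch; the point is that every successful sqlite3_prepare_v2 is paired with exactly one sqlite3_finalize, even on early exits):

import SQLite3

// SQLITE_TRANSIENT tells SQLite to make its own copy of bound text.
private let SQLITE_TRANSIENT = unsafeBitCast(-1, to: sqlite3_destructor_type.self)

// Fetches one product; note the prepare/bind/step/finalize pairing.
func findProduct(in db: OpaquePointer?, uniqueID: String) -> Int64? {
    let sql = "SELECT PRODUCT_ID FROM FMV_CATALOG WHERE UNIQUE_ID = ?;"
    var stmt: OpaquePointer?
    guard sqlite3_prepare_v2(db, sql, -1, &stmt, nil) == SQLITE_OK else { return nil }
    defer { sqlite3_finalize(stmt) }  // runs on every path out of this function
    sqlite3_bind_text(stmt, 1, uniqueID, -1, SQLITE_TRANSIENT)
    guard sqlite3_step(stmt) == SQLITE_ROW else { return nil }
    return sqlite3_column_int64(stmt, 0)
}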
The only other thing I can suggest is trying to construct a simplified project that reproduces the aberrant behavior. Tracking down a Heisenbug can be infuriating: unless you can consistently reproduce it, it's very hard to track down.
Related
The code I'm working on makes heavy use of TFDMemTables, and of clones of those tables created with CloneCursor.
Sometimes, under specific conditions which I am unable to identify, the source table and its clone become out of sync: the data between them may differ, as may the record count.
Calling Refresh on the cloned table puts things back in order.
From my understanding, CloneCursor is used to address the same underlying memory where the data is stored, meaning alterations to the underlying data through either of the two pointers should be reflected in the other table, while still allowing the user to have separate filtering/record positioning per "view". So how can it possibly go out of sync?
I built a small simulator, where I can insert / delete / filter records in either the table or its clone, and observe the impact on the other one. Changes were reflected correctly.
Another downside of Refresh is that it slows the execution tremendously, if overused.
Has anyone faced similar issues or found explanations / documentation regarding this matter?
Edit:
To clarify what I mean by "out of sync": reading a value from the table using FieldByName will return X prior to Refresh, and Y post-Refresh. I was not able to reproduce this behavior in the simulator mentioned above.
Before you link me to a duplicate, please read what I'm asking.
I'm building an app which basically has a list of about 5000 teams. These teams are fairly static (they don't change very often), but I would like to observe any time one is changed, as it's essential that it gets updated in the app ASAP.
If I include dbTeams.ref.observe(.childAdded, with: {}), it runs each time the app starts, loading all 5000 records despite having them in persistent storage already (I have enabled persistence).
Now the documentation says this will happen, I know, but with 5000 records (and potentially way more in the future), I can't have this happen.
My options so far (from what I've found and tried) are:
Add a timestamp to each record and create a custom query to call .childAdded only for records after the last timestamp (see the sketch after this list)... This is inefficient. Storing a timestamp for soccer teams which will hardly ever change is silly. It also means keeping a copy of the last time it was checked.
Create a sub-list within the Teams list. This too is silly as you may as well call .value and get the whole bunch of data in one go.
Just live with it... Fine - until it scales to tens of thousands of records. Not clever either.
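For what it's worth, the sketch for option 1 would look something like this (assuming each team node carries a "lastModified" timestamp and that the last-seen value is persisted under a "lastTeamSync" key; both names are made up):

import FirebaseDatabase

// Option 1 sketch: only children written after the last sync are delivered.
let lastSync = UserDefaults.standard.double(forKey: "lastTeamSync")  // hypothetical key

dbTeams.ref
    .queryOrdered(byChild: "lastModified")        // hypothetical timestamp field
    .queryStarting(atValue: lastSync + 1)
    .observe(.childAdded) { snapshot in
        print("team changed: \(snapshot.key)")
        if let ts = snapshot.childSnapshot(forPath: "lastModified").value as? Double,
           ts > UserDefaults.standard.double(forKey: "lastTeamSync") {
            UserDefaults.standard.set(ts, forKey: "lastTeamSync")
        }
    }

It still has the bookkeeping downsides described above, but it caps the initial flood of .childAdded events at the records that actually changed since the last run.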
It just seems weird that all the other event listeners only fire when they are "supposed to" except this one.
Any help would be appreciated - how do I achieve what I need?
I'm using the latest ASP.NET MVC and Entity Framework (MVC 5.2.2, EF 6.1.2), and the latest Glimpse. I'm working on improving query times when eagerly loading an entity with several nested child objects, and have reduced the number of queries by using .Include("Object.Child") to bring in navigation properties. At first, I thought I was getting a good result, seeing the "Total query execution time" in the SQL tab of Glimpse reduce significantly. Yet the "Total connection open time" stays high, and is very long for the resulting combined mega-query. See screenshot below.
I'm wondering if anyone can help me understand the difference between the two durations. Glimpse says my command takes <100 ms, but that the SQL connection is open for >5 seconds. The query in this case is really messy, with lots of joins etc., but it's not clear where the time goes if the query itself really finishes in 100 ms.
Note: I've seen the answer about why two durations here, but it doesn't explain the nature of each.
Thanks for asking the question. The timer for the connection duration starts when the connection is opened and stops when it is closed. To work this out further: how are you using your context/connection? Are you sharing it, keeping it around, etc.?
After further testing, I think I've figured out what was happening. I saw another question which suggested that the .Include() approach to eager loading hierarchical entities in Entity Framework can result in complex queries with many joins and duplication of data in the result set. I had a long XML string as one of my properties, so if it was being duplicated many times in the result set, it would take a long time to return/process.
As a test, I cleared the data-heavy field and re-ran the query, getting a far shorter "connection" duration (the one listed on the right in Glimpse). It went from 9 seconds to under 200 ms total. Based on this, I assume the data size was the culprit, and I've learned my lesson about using large data properties this way.
I'd still be interested to know whether Glimpse could show you the raw data being returned from a query, or even show the size in bytes, along with the record count. This would have likely made this problem evident.
A little late to this question, but I encountered the same problem and was also trying to understand the disparity between my query execution time and connection open time.
FWIW, I discovered that I was passing an enumerable to the view in my view model, rather than a concrete list. The view was thus triggering evaluation of the query, prolonging the amount of time the connection remained open. By passing lists of the items (calling .ToList() on the enumerable), I drastically reduced the amount of time for which the connection remained open.
Backstory
This afternoon, I replied to a text from my girlfriend, then apparently neglected to sleep my phone before putting it back in my pocket. When I pulled it back out a few minutes later, my phone had decided to hit "Edit->Clear All" on the conversation, vaporizing two years and two phones' worth of SMS history with her. While I have a backup of the phone, it's close to three weeks old at this point, and there's enough solid discussion there that I'd like to reconstruct it; I've already grabbed a copy of sms.db, but I think the method I used vacuumed the file, so there are no soft-deleted texts in it.
Meat of the Question
I have a three-week-old backup of my sms.db, and access to an up-to-date copy of her sms.db. I'd like to:
export the texts she has but I don't (easy, at least to CSV)
change the "perspective" info (the address field and the sent/received/deleted/unknown field), keeping the timestamp and text
import/merge these new entries into my old sms.db backup
merge this updated backup with my current sms.db (optional/there seems to be an online utility for that)
I don't really know SQL, but I'd be willing to learn. The problem, from what I understand, is that the tables within sms.db have become more interdependent over the OS's lifespan, and the triggers now call C functions that don't exist outside the phone, so it's not a simple matter of calling a single trigger on multiple entries. Does anyone know of any ways to work around this complexity, or, even better, any utilities that have already figured out how to import individual entries into sms.db?
Edit:
I've been examining sms.db, and from what I can tell, the relationships are pretty straightforward (a sketch of the resulting merge idea follows this list):
for message, I mostly need to make sure that the ROWIDs of any added messages are higher than the current highest ROWID
msg_group holds the message:ROWID of the last message for each contact; I can look up the correct address within group_member; group_member:group_id corresponds to msg_group:ROWID
msg_group has a hash column; this will probably be the hardest thing to update, since I'm not immediately sure what it's hashing, or what hash to use
sqlite_sequence doesn't seem like it's quite up-to-date; its entries seem to all be smaller than the actual ROWIDs, but I assume this means I won't have to mess with it very much.
I'm not really sure that I'll be able to change msg_pieces at all: it's the table in charge of handling the multiple parts of an MMS message.
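Based on those findings, the core of the merge might look something like the sketch below. This is untested, and everything in it is an assumption to verify against the actual databases: the column list, the flags convention (I believe the old schema used 2 = received, 3 = sent), the placeholder phone numbers, and the ROWID offset. The triggers would also need to be dropped or disabled first, since (as noted above) they call functions that only exist on the phone, and msg_group/group_member would still need fixing up afterwards.

import SQLite3

// Sketch: copy her messages into my backup, flipping the "perspective".
// `db` is my backup sms.db, already opened with sqlite3_open().
func mergeMessages(into db: OpaquePointer?) {
    let sql = """
    ATTACH DATABASE '/path/to/her-sms.db' AS hers;
    INSERT INTO message (ROWID, address, date, text, flags)
        SELECT m.ROWID + 1000000,              -- offset above my current MAX(ROWID)
               '+15551230000',                 -- her number, from my perspective
               m.date,
               m.text,
               CASE m.flags WHEN 2 THEN 3      -- received by her = sent by me
                            WHEN 3 THEN 2      -- sent by her = received by me
                            ELSE m.flags END
        FROM hers.message m
        WHERE m.address = '+15559990000';      -- my number, in her database
    DETACH DATABASE hers;
    """
    if sqlite3_exec(db, sql, nil, nil, nil) != SQLITE_OK {
        print("merge failed: \(String(cString: sqlite3_errmsg(db)))")
    }
}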
Hey, did you get this sorted out? If you haven't, I suggest taking a look at http://smsmerge.homedns.org/
I have been in a similar position to yours, but I was lucky and had a more recent backup than that.
Let me know if you need a hand with it
I've been working on an iPad app and all is working fine besides SQLite performance. This app needs to handle a lot of data.
At the moment I'm having 2 issues. The first is when populating the database: the current test is 710 records, each with 20 columns, and the app can't handle it. This is the main issue; I'm not sure it would ever need to process more than this amount, or even anywhere near it, but it's what I'm aiming for. My thought is: is SQLite even enough to handle this much data on an iPad?
The second is when pulling data from the database to populate a table view: each row calls for 4 records, and the time it takes to fetch them all causes the table to lag slightly while scrolling. Could I get away with processing the queries on a separate thread? I have tried something similar to this, but had no luck.
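Roughly the sort of thing I mean (a sketch: a single serial queue owns all database access, and fetchItems(for:) is a hypothetical stand-in for the 4 lookups each row needs):

import Foundation

// One serial queue owns the sqlite3 connection, so it is never
// touched from two threads at once.
let databaseQueue = DispatchQueue(label: "com.example.app.database")

// Stand-in for the 4 record lookups each table view row needs.
func fetchItems(for row: Int) -> [String] {
    return []  // ... run the sqlite3 queries here ...
}

func loadRow(_ row: Int, completion: @escaping ([String]) -> Void) {
    databaseQueue.async {
        let items = fetchItems(for: row)      // slow part, off the main thread
        DispatchQueue.main.async {
            completion(items)                 // UI work back on the main thread
        }
    }
}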
Any help would be amazing, thanks a lot.
In my past project experience, I have seen that an index on a table can slow down inserts. I dropped the index just before the bulk insert, inserted the records, and re-created the index afterwards - I saw a significant difference. Hope this helps.
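A sketch of that pattern (the table and index names are made up; I've also wrapped the inserts in a single transaction, which tends to matter even more on a device, since otherwise every INSERT commits separately):

import SQLite3

private let SQLITE_TRANSIENT = unsafeBitCast(-1, to: sqlite3_destructor_type.self)

// Bulk load with the index dropped and one transaction around all rows.
// ITEMS and IDX_ITEMS_NAME are hypothetical names.
func bulkInsert(into db: OpaquePointer?, names: [String]) {
    sqlite3_exec(db, "DROP INDEX IF EXISTS IDX_ITEMS_NAME;", nil, nil, nil)
    sqlite3_exec(db, "BEGIN TRANSACTION;", nil, nil, nil)

    var stmt: OpaquePointer?
    sqlite3_prepare_v2(db, "INSERT INTO ITEMS (NAME) VALUES (?);", -1, &stmt, nil)
    for name in names {
        sqlite3_bind_text(stmt, 1, name, -1, SQLITE_TRANSIENT)
        sqlite3_step(stmt)
        sqlite3_reset(stmt)       // reuse the one prepared statement for every row
    }
    sqlite3_finalize(stmt)

    sqlite3_exec(db, "COMMIT;", nil, nil, nil)
    sqlite3_exec(db, "CREATE INDEX IDX_ITEMS_NAME ON ITEMS(NAME);", nil, nil, nil)
}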