SQLite iPad performance problems during mass insert and select - ios

I've been working on an iPad app, and everything works fine except SQLite performance. The app needs to handle a lot of data.
At the moment I have two issues. The first is populating the database. The current test is 710 records, each with 20 columns, and the app can't handle that. This is the main issue: I'm not sure the app would ever need to process more than this amount, or even anywhere near it, but it's what I'm aiming for. My question is whether SQLite is even capable of handling this much data on an iPad.
The second is pulling data from the database to populate a table view. Each row requires 4 records, and the time it takes to fetch them makes the table lag slightly while scrolling. Could I get away with running the queries on a separate thread? I have tried something similar, but had no luck.
Any help would be amazing, thanks a lot.

In my past project experience, I have seen that indexes on a table slow down inserts. I dropped the index just before the bulk insert, inserted the records, and then recreated the index, and I saw a significant difference. Hope this helps.
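The drop-insert-recreate sequence described above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module rather than the iOS C API; the table `records`, the index `idx_records_name`, and the helper `bulk_insert` are hypothetical names. Wrapping the inserts in a single transaction is an extra assumption not mentioned in the answer, added because SQLite otherwise commits each INSERT separately.

```python
import sqlite3

def bulk_insert(conn, rows):
    """Drop the index, insert all rows in one transaction, then rebuild the index."""
    # Hypothetical index name; dropping it avoids index maintenance on every insert.
    conn.execute("DROP INDEX IF EXISTS idx_records_name")
    # "with conn" commits once at the end, so the whole batch is one transaction.
    with conn:
        conn.executemany(
            "INSERT INTO records (name, value) VALUES (?, ?)", rows
        )
    # Rebuild the index once, after all rows are in place.
    conn.execute("CREATE INDEX idx_records_name ON records(name)")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT, value INTEGER)"
)
conn.execute("CREATE INDEX idx_records_name ON records(name)")

# 710 rows, matching the size of the test described in the question.
rows = [(f"item{i}", i) for i in range(710)]
bulk_insert(conn, rows)

count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(count)
```

The same three SQL steps should carry over to a C or Objective-C wrapper; only the binding code differs.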

Related

Autosave to Firestore lagging

I am autosaving to Firestore every time the user modifies anything in the text field. My code is:
self.collection.document(noteID!).updateData(["note.title": noteTitle.text, "note.lastUpdatedTimestamp": time])
It has started lagging very heavily all of a sudden. Is there a way to batch writes within a document to avoid this heavy lagging?
There is nothing terribly inefficient about that line of code that could be improved. Splitting this up into an update per field would not help. It's more likely the case that your network connection is slower than normal, or that somehow one of the fields is much longer than usual, which increases the time it takes to update.

Delphi TFDMemTable, CloneCursor and source table out of sync, unless Refresh is called

The code I'm working on makes heavy use of TFDMemTables, and clones of those tables created with CloneCursor.
Sometimes, under specific conditions which I am unable to identify, the source table and its clone become out of sync: the data between them may be different, the record count as well.
Calling Refresh on the cloned table puts things back in order.
From my understanding, CloneCursor is used to address the same underlying memory where the data is stored, meaning alterations to the underlying data through either of the two pointers should be reflected in the other table, while still allowing the user to have separate filters and record positioning per "view". So how can they possibly go out of sync?
I built a small simulator, where I can insert / delete / filter records in either the table or its clone, and observe the impact on the other one. Changes were reflected correctly.
Another downside of Refresh is that it slows the execution tremendously, if overused.
Has anyone faced similar issues or found explanations / documentation regarding this matter?
Edit:
To clarify what I mean by "out of sync": reading a value from the table using FieldByName returns X prior to Refresh, and Y after Refresh. I was not able to reproduce this behavior in the simulator mentioned above.

Sqlite randomly slowing down on simple (but big) table on iOS

I'm working on an enterprise sales app, for iPad, that uses Sqlite as its internal database, and a strange behaviour recently showed up.
I have a huge table that is filled with information from several other tables (sort of like a "materialized view"), which can contain over 2 million rows, depending on how the user is set up. When the user wants to search for an item, the app performs a query on this huge table that has an indexed column and on other columns that are used as filters and/or metadata. I'll post the query and the basic idea below. Anyway, this query usually returns in 2~3 seconds on an iPad 4th gen, no more than that, and this is just fine. This table is dropped, re-created and filled every time the user taps a button to synchronize his data with our server.
However, recently the same query in the same table (with no relevant changes at all), randomly started to take 40~50 seconds. If you do the same thing later, on the same device, with the same filters (or even changing the filters!), the same query on the same table takes the 2~3 seconds again. I haven't found any specific situation that causes this slowdown, the app is the only one running at that time. The device is not the problem, we've seen this happen on at least 5 different iPads, one is an iPad 3 and the others are iPads 4th gen.
I don't think it is some sort of caching, since the app does not cache anything, and these times are rather random. Sometimes the query takes 40 seconds 10 times in a row, then suddenly it takes only 2 seconds again, and vice versa. The only thing that is clear to me is that this slowdown only occurs after intensive use (1 - 2 days of work using the app), so I'm also having trouble reproducing this behaviour while debugging on the iPad I have with me.
What I've tried:
Attach Instruments to the process and check what resources are being used during the slowdown. The app does INTENSIVE use of the iPad's 'disk' (flash memory) during the whole time. I don't have the example to analyse it again now, but I think the CPU usage was around 30%. The RAM usage is stable at 90~100MB, which is normal for our app.
Run VACUUM on the db; - reduced ~50MB on a database I had as example. Went from ~600MB to ~550MB.
Run ANALYZE on the db; - didn't see any improvements
Run REINDEX on the db; - seems to be helping a little, but it's not solving the problem.
Kill the process and start over - nothing changes
The huge table is constructed as follows, and does NOT have any foreign keys or any other constraints:
CREATE TABLE FMV_CATALOG(
UNIQUE_ID TEXT,
PRODUCT_ID INTEGER,
<bunch of metadata/filtered columns - total of 20 columns>
);
And the query that is made to find the products is:
SELECT
PRODUCT_ID
,UNIQUE_ID
<all other required columns, ~20 columns>
FROM
FMV_CATALOG
WHERE
UNIQUE_ID = '<some id>_<other id>'
AND PRODUCT_NAME LIKE '%iPhone%'
<and other optional, rarely used, filters.>
I'm totally out of ideas, so any help will be appreciated.
Thanks!
UPDATE (more info):
Important information that I forgot to mention (Rob reminded me of it): my database connection is always open, and it is closed only when the user logs out. We noticed a huge performance improvement in all parts of the app when we kept the connection open, since we have hundreds of small queries that are executed in other situations (but not while browsing/searching the product catalog).
The query used to create the index is below:
CREATE INDEX IDX_MV_CATALOG ON MV_CATALOG(UNIQUE_ID);
Also, even though the column is named UNIQUE_ID, it is not unique. It was supposed to be originally, but now it is repeated N times. I know this is wrong, we'll change that ASAP.
This "UNIQUE_ID" (which is not really unique) is filled by joining the IDs of two other tables. This way, our "materialized view" removes the need of at least three joins when the user searches on our catalog, which improved our query times from ~20 seconds to ~2 seconds.
We don't call the sqlite3 API directly in our queries; we have developed a wrapper class around it and have been using it for at least 2 years now. This is the first time we've been in this situation, but then again it's the first time we're handling so much data.
A couple of thoughts:
You're not showing us the creation of any index on FMV_CATALOG. If nothing else, if UNIQUE_ID is, as the name suggests, unique, then I'd be inclined to define the table with a PRIMARY KEY:
CREATE TABLE FMV_CATALOG(
UNIQUE_ID TEXT PRIMARY KEY,
PRODUCT_ID INTEGER,
<bunch of metadata/filtered columns - total of 20 columns>
);
You should try using the SQLite EXPLAIN QUERY PLAN command to look at the query and look at its plan and make sure it's availing itself of your index. Do this as it is, and then again with PRIMARY KEY (and perhaps if that still doesn't do it, an index on the fields in your WHERE clause), and make sure the final query is definitely using your index.
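As a rough sketch of that check, the snippet below runs EXPLAIN QUERY PLAN on a simplified version of the question's query before and after creating an index. It uses Python's built-in sqlite3 module for convenience; the reduced three-column table, the index name `IDX_FMV_CATALOG`, and the sample parameter values are assumptions made for illustration. The exact wording of the plan text varies between SQLite versions, so no expected output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the table from the question (only three of ~20 columns).
conn.execute(
    "CREATE TABLE FMV_CATALOG (UNIQUE_ID TEXT, PRODUCT_ID INTEGER, PRODUCT_NAME TEXT)"
)

query = (
    "SELECT PRODUCT_ID, UNIQUE_ID FROM FMV_CATALOG "
    "WHERE UNIQUE_ID = ? AND PRODUCT_NAME LIKE ?"
)
params = ("12_34", "%iPhone%")  # made-up sample values

def plan(conn):
    """Return the human-readable plan steps for the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
    # The last column of each row holds the plan description.
    return [row[-1] for row in rows]

before = plan(conn)  # expected to describe a full table scan
conn.execute("CREATE INDEX IDX_FMV_CATALOG ON FMV_CATALOG(UNIQUE_ID)")
after = plan(conn)   # expected to mention the index

print(before)
print(after)
```

If the second plan still reports a scan rather than a search using the index, the query is not benefiting from it.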
I'm not sure why, if you have the unique id, you're also filtering on the other fields. If adding the primary key (and possibly other indexes) doesn't solve the problem, I might try just retrieving the record based upon the unique id, and then check for conformance with your other parameters in code. I don't believe you need to do this, but it's a worst case scenario.
As for why it slows down, it's harder to guess what's going on without seeing the code (which I'm sure is too complicated to share in a simple S.O. question). I could imagine strange behavior if, for example, you fail to call sqlite3_finalize after one of your sqlite3_prepare_v2 statements, or if you accidentally failed to close the database and then opened it again elsewhere. I could also imagine performance issues arising if the sequence of sqlite3 calls isn't precisely right. Use of something like FMDB can minimize the chance of those sorts of issues occurring (as well as simplifying your SQLite code). Or, if that's too radical a step, try writing your own macros that call the SQLite functions but also log the fact that you've called them, then pore over that log and double-check the sequence of your SQLite calls.
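The logging idea can be approximated without hand-written macros. The sketch below is a Python analogue, not the answer's macro approach: it uses `set_trace_callback` from the built-in sqlite3 module to record every statement SQLite executes. In the C API the comparable hook is `sqlite3_trace_v2`. The table `t` and the list `log` are hypothetical.

```python
import sqlite3

log = []  # collects every SQL statement the connection executes, in order

conn = sqlite3.connect(":memory:")
# The callback receives the text of each statement as it runs.
conn.set_trace_callback(log.append)

conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()
values = conn.execute("SELECT x FROM t").fetchall()

# Review the sequence afterwards, e.g. to spot unexpected or repeated calls.
for statement in log:
    print(statement)
```

Reading such a log from a slow session next to one from a fast session may show whether the two actually issue the same statements.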
The only thing I can suggest is whether you can construct a simplified project that can reproduce the aberrant behavior. Tracking down a Heisenbug can be infuriating: Unless you can consistently reproduce the bug, it's hard to track down.

In iOS, what's the fastest way to load a sorted list of contacts?

When implementing a view similar to an ABPeoplePickerNavigationController, I'm not able to sort the list very quickly. The native contacts app loads the list instantly. I'm dealing with 4000+ contacts, so keeping load times down is important. I can't use the ABPeoplePickerNavigationController because I need to do a lot of custom UI work.
I was using ABAddressBookCopyArrayOfAllPeople, then placing the people in UILocalizedIndexedCollation sections using sectionForObject and then sorting the sections using sortedArrayFromArray. For 4000 contacts, my time was about 8 seconds.
I then switched to using ABAddressBookCopyArrayOfAllSources and for every source, ABAddressBookCopyArrayOfAllPeopleInSourceWithSortOrdering and just appending each source's contacts to the unsorted array, then using the same UILocalizedIndexedCollation technique. This dropped the time down to about 5 seconds, I guess since the sections were mostly sorted already.
Is there any way to improve on this? Any techniques I'm not aware of? Can I somehow load an ABPeoplePickerNavigationController data source without the view and use that? Or is there a faster sorting method?
Thanks very much.
Is there any way to improve on this?
It may help to realize that you'll never need to display 4000 names immediately. All you really need is the names for whatever section the user is looking at, and you can probably find and sort those names much more quickly. Let's say the user is viewing the A section initially, so maybe you use a predicate to pick out names starting with A. Out of 4000 names, perhaps 400 start with A, and that'll cut your sorting time down to a fraction of a second. You can continue fetching and sorting sections in the background.
The point is that it doesn't really matter how long it takes to sort all the records. What matters is how long it takes to put the records that the user wants on the screen.
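A minimal sketch of that idea, in Python for brevity: filter out only the names in the section being viewed, then sort that much smaller list. The function `names_for_section` and the sample names are hypothetical, and the plain case-insensitive comparison is an assumption that ignores the locale-aware ordering UILocalizedIndexedCollation provides.

```python
def names_for_section(names, letter):
    """Return only the names in one index section, sorted case-insensitively."""
    prefix = letter.casefold()
    # Filter first, so the sort only touches the names for this section.
    section = [n for n in names if n.casefold().startswith(prefix)]
    return sorted(section, key=str.casefold)

contacts = ["Zoe", "adam", "Bob", "Alice", "Aaron", "Carol"]

# Only the section the user is looking at is needed for the first screen.
first_screen = names_for_section(contacts, "A")
print(first_screen)  # ['Aaron', 'adam', 'Alice']
```

The remaining sections can then be computed the same way in the background while the first one is already on screen.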
Where are those contacts stored? Are they in CoreData?
I would use CoreData together with an NSFetchedResultsController, which handles everything so that you don't have to worry about loading times: the NSFetchedResultsController loads only as many contacts as are needed for the currently visible cells of your tableView.
I hope this helps you
Linard

How do I overcome poor SSIS debugging performance?

I’m using SSIS to synchronize data between two databases. I’ve used SSIS and DTS in the past, but I generally write an application for things of this nature (I’m a coder and it just comes easier to me).
In my package I use a SQL Task that returns about 15,000 rows. I’ve hooked that up to a Foreach Container, and within that I assign the resultset column values to variables, and then map those variables to parameters that are fed to another SQL Task.
The problem I’m having is with debugging, and not just more complicated debugging like breakpoints and evaluating values at runtime. I simply mean that if I run this with debugging rather than without, it takes hours to complete.
I ended up rewriting the process in Delphi, and the following is what I came up with:
Full Push of Data:
This pulls 15,000 rows, updates a destination table for each row, then pulls 11,000 rows and updates a destination table for each row.
Debugging:
Delphi App: 139s
SSIS: 4 hours, 46 minutes
Not Debugging:
Delphi App: 132s
SSIS: 384s
Update of Data:
This pulls 3,000 rows, but no updates are needed or made to the destination table. It then pulls 11,000 rows but, again, no updates are needed or made to the destination table.
Debugging:
Delphi App: 42s
SSIS: 1 hour, 10 minutes
Not Debugging:
Delphi App: 34s
SSIS: 205s
The odd thing is, I get the feeling that most of this time spent debugging is just updating UI elements in Visual Studio. If I watch the progress tab, a node is added to a tree for each iteration (thousands total), and this gets slower and slower as the process goes on. Trying to stop debugging usually doesn’t work, as Visual Studio seems caught in a loop updating the UI. If I check the profiler for SQL Server no actual work is being done. I'm not sure if the machine matters, but it should be more than up to the job (quad core, 4 gig of ram, 512 mb video card).
Is this sort of behavior normal? As I’ve said I’m a coder by trade, so I have no problem writing an app for this sort of thing (in fact it takes much less time for me to code an application than “draw” it in SSIS, but I figure that margin will shrink with more work done in SSIS), but I’m trying to figure out where something like SSIS and DTS would fit into my toolbox. So far nothing about it has really impressed me. Maybe I’m misusing or abusing SSIS in some way?
Any help would be greatly appreciated, thanks in advance!
SSIS control flow and loops are not very high performance, and are not designed for processing this amount of data. This is especially true during debugging: before and after each task execution, the debugger sends notifications to the designer process, which updates the colors of the shapes, and this can be slow.
You could get much better performance using data flow. Data flow does not operate with single rows, it works with buffers of rows - much faster, and the debugger is only notified about beginning/end of the buffers - so its impact is less noticeable.
SSIS is not designed to do a foreach like that. If you are doing something for each incoming row, you probably want to read the rows into a data flow and then use a lookup or merge join to determine whether to do an INSERT (these happen in bulk) or use a database command object for multiple SQL UPDATE commands (a better-performing option is to batch these into a staging table and do a single UPDATE).
In another typical sync situation, you read all the data into a staging table, and do a SQL Server UPDATE on the existing rows (INNER JOIN) and INSERT on the new rows (LEFT JOIN, rhs IS NULL). There is also the possibility of using linked servers, but joins over that can be slow, since all (or a lot of) the data may have to come across the network.
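The staging-table pattern just described can be sketched as below. This is an illustration run against SQLite through Python's built-in sqlite3 module, not SQL Server: the tables `target` and `staging` and their contents are made up, and the UPDATE uses a correlated subquery in place of T-SQL's UPDATE ... FROM ... INNER JOIN so that it runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE target  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE staging (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO target  VALUES (1, 'old one'), (2, 'old two');
INSERT INTO staging VALUES (2, 'new two'), (3, 'new three');
""")

with conn:  # both set-based statements commit together
    # Update rows that exist on both sides (the INNER JOIN case).
    conn.execute("""
        UPDATE target
        SET name = (SELECT s.name FROM staging s WHERE s.id = target.id)
        WHERE id IN (SELECT id FROM staging)
    """)
    # Insert rows present only in staging (LEFT JOIN where the right side is NULL).
    conn.execute("""
        INSERT INTO target (id, name)
        SELECT s.id, s.name
        FROM staging s
        LEFT JOIN target t ON t.id = s.id
        WHERE t.id IS NULL
    """)

result = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
print(result)  # [(1, 'old one'), (2, 'new two'), (3, 'new three')]
```

Two set-based statements replace thousands of per-row commands, which is the reason this approach avoids the per-iteration overhead of a Foreach loop.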
I have SSIS packages that regularly import 24 million rows, including handling data conversion and validation and slowly changing dimensions using the TableDifference component, and they perform relatively quickly for that amount of data compared with a separate client program.
I have noticed this behavior as well. I had an SSIS package for moves that processed somewhere in the neighborhood of 3 million entries; it was not possible to debug, as it would run for about 3-4 days.
SSIS is still the way I did it; I just don't "debug" with SSIS. I run the packages without debugging when working with the full datasets. If I must debug, I use very small datasets.
