I'm trying to determine if my sqlite access to a database is thread-safe on iOS. I'm writing a non App Store app (or possibly a launch daemon), so Apple's approval isn't an issue. The database in question is the built-in sms.db, so for sure the OS is also accessing this database for reading and writing. I only want to be able to safely read it.
I've read this about reading from multiple processes with sqlite:
Multiple processes can have the same database open at the same time.
Multiple processes can be doing a SELECT at the same time. But only
one process can be making changes to the database at any moment in
time, however.
I understand that thread-safety can be compiled out of sqlite, and that sqlite3_threadsafe() can be used to test for this. Running this on iOS 5.0.1
int safe = sqlite3_threadsafe();
yields a result of 2. According to this, that means mutex locking is available. But, that doesn't necessarily mean it's in use.
I'm not entirely clear on whether thread-safety is dynamically enabled on a per connection, per database, or global basis.
I have also read this. It looks like sqlite3_config() can be used to enable safe multi-threading, but of course, I have no control, or visibility into how the OS itself may have used this call (do I?). If I were to make that call again in my app, would it make it safe to read the database, or would it only deconflict concurrent access for multiple threads in my app that used the same sqlite3 database handle?
Anyway, my question is ...
can I safely read this database that's also accessed by iOS, and if so, how?
I've never used SQLite, but I've spent a decent amount of time reading its docs because I plan on using it in the future (and the docs are interesting). I'd say that thread safety is independent of whether multiple processes can access the same database file at once. SQLite, regardless of what threading mode it is in, will lock the database file, so that multiple processes can read from the database at once but only one can write.
Thread safety only affects how your process can use SQLite. Without any thread safety, you can only call SQLite functions from one thread. But it should still, say, take an EXCLUSIVE lock before writing, so that other processes can't corrupt the database file. Thread safety just protects data in your process's memory from getting corrupted if you use multiple threads. So I don't think you ever need to worry about what another process (in this case iOS) is doing with an SQLite database.
Edit: To clarify, any time you write to the database, including a plain INSERT/UPDATE/DELETE, it will automatically take an EXCLUSIVE lock, write to the database, then release the lock. (And it actually takes a SHARED lock, then a RESERVED lock, then a PENDING lock, then an EXCLUSIVE lock before writing.) By default, if the database is already locked (say from another process), then SQLite will return SQLITE_BUSY without waiting. You can call sqlite3_busy_timeout() to tell it to wait longer.
I don't think any of this is news to you, but a few thoughts:
In terms of enabling multi-threading (either serialized or multi-threaded), the general counsel is that one can invoke sqlite3_config() (but you may have to do a shutdown first as suggested in the docs or as discussed on SO here) to enable the sort of multi-threading you want. That may be of diminished usefulness here, though, where you have no control over what sort of access iOS is requesting of sqlite and/or this database.
Thus, I would have thought that, from an academic perspective, it would not be safe to read this system database (because as you say, you have no assurance of what the OS is doing). But I wouldn't be surprised if iOS is opening the database using whatever the default mode is, so from a more pragmatic perspective, you might be fine.
Clearly, for most users concerned about multi-threaded access within a single app, the best counsel would be to bypass the sqlite3_config() silliness and just simply ensure coordinated access through your own GCD serial queue (i.e., have a dedicated queue through which all database interactions go through, gracefully eliminating the multi-thread issue altogether). Sadly, that's not an option here because you're trying to coordinate database interaction with iOS itself.
Related
I think this is a popular antipattern that happens either standalone, for example activeJob local task with async, or coming from controllers, because then the strategy of the server must be taken into account.
My question is, what cautions should one take in the code when forking inside a thread (think inside of a ActiveJob task) and then even threading it?
The main worries I have seen online are:
Needs to lose and reopen the database connections after the fork. It seems that nowadays activeRecord takes care of it, doesn't it?
Access to the common Logger could be complicated. Somehow it seems to work.
Concurrent was expected to be problematic too but current versions are patched to detect that a fork has happened and threads are dead. Still it seems that one needs to make sure of doing, at the end of the forked process, a fine shutdown of any Rails::Concurrent pool that could have active or pending jobs. I think that it is enough
ActiveJob::Base.queue_adapter.shutdown
but perhaps it could miss some tasks that have not started or tasks under other Concurrent queue. In fact I think it already happens if one uses Concurrent::Future in a controller managed by the puma webserver. Generically I try to insert
Concurrent::global_io_executor.shutdown
Concurrent::global_io_executor.wait_for_termination
Extra problems I have found are resource-related: the postgres server is not ready to manage so many connections by default. Perhaps it could be sensible to reduce the size of the connection pool before the fork. And the inotify watcher gem also exhausts resource, when launched in development. Production is fine in both cases.
TL;DR; - I'm against doing it but many of us do it anyway and ignore the fact that it's unsafe... things break too rarely.
It is a simple fact that calling fork in a multi-threaded process may cause the new child to crash / deadlock / spin and may also cause other (harder to isolate) bugs.
This has nothing to do with Ruby, this is related to the locking mechanisms that safeguard critical sections and core process functionality such opening/closing files, allocating memory and any user created mutex / spinlock, etc'.
Why is it risky?
When calling fork the new process inherits all the state of the previous process but only the thread that called fork (all other threads do not exist in the new process).
This means that if any of the other threads was inside a critical section (i.e., allocating memory, opening a file, etc'), that critical section would remain locked for the lifetime of the new process, possibly causing deadlocks or unexpected errors.
Why do we ignore it?
In practical terms, the risk of something seriously breaking is often very low and most developers hadn't both encounter the issue and recognized its cause. Open files can be manually (if not automatically) closed, which leaves us mostly with the question of critical sections.
We can often reset our own critical sections which leaves mostly the system's critical sections...
The system's core critical sections that can be effected by fork are not that many. The main one is the memory allocator which can hardly ever break. Often the malloc implementation has multiple "arenas", each with its own critical section and it would be a long-shot to hit the system's underlying page allocation (i.e., mmap).
So is it safe?
No. Things still break, they just break rarely and when they do it isn't always obvious. Also, a parent process can sometime catch some of these errors and retry / recuperate and there are other ways to handle the risks.
Should I do it?
I wouldn't recommend to do it, but it depends. If you can handle an error, sure, go ahead. If not, that's a no.
Anyway, it's usually much better to use IPC to forward a message to a background process so that process perform any required fork / task.
The pattern can occur naturally when a Rails controller is combined with a webserver. The situation is different depending if the webserver is threaded, forked or evented, but the final conclusion is the same; that it is safe.
Fork + fork and thread + fork should not present problems of multiple access to the database or multiple execution of the same code, as only the current thread is active in the children.
Event + fork could be a source of troubles if the event machine is still active in the forked thread. Fortunately most designs generate a separate thread for the control of the event loop.
I have an app that uses Realm as a staging database. It receives information from a bluetooth device, processes it, and sends the processed result to a server.
The incoming data from bluetooth gets stored in a Realm table (table1). A separate thread reads data from the Realm database, processes it, and stores it into a second table (table2) for uploading to a server. When it pulls this data and successfully processed it, it deletes it from table1.
The third thread pulls data from table2, and when it successfully sends, removes it from table2.
I'm using a database here in case, for whatever reason, the app is killed - data won't be lost... it will just resume where it left off when the app is restarted. But as you can see, the database is not something that hangs around (it's not like an address book or something... it is just temporary staging)
What I notice is that no matter what the heck I do, the realm database file just increases in size over time. I'll end up with a database that if I open it, will have one record in it, but the database file on disk could be 10s of MB in size if the app is running long enough.
Data is being processed on different background queues so as to not block any UX (one of the reasons I'm using Realm instead of CoreData). But I'm using things like autoreleasepools and the invalidate command to avoid objects that are read from having copies made (as suggested by many realm questions/answers)
What gives? I know I don't have a code sample here, but this just seems like a basic garbage collection problem in Realm. I've seen other questions related to this where people are like "why is my database so huge", and the answers suggest doing things like "writeCopyToPath", but that feels like an incredible hack, and regardless, it would be very difficult - this app is meant to be constantly connected and monitoring a bluetooth device, so to do this, it would mean stopping, making sure all threads that might alter the database are quiesced, doing the copy to compact the db, and then starting everything back up again. That just seems nonsensical to me. I might interrupt user operations for example. I don't want a user to not be able to do something because I decided it was time to do database maintenance.
I feel like I'm either missing some incredibly fundamental point in how to make Realm not keep junk around, or Realm is just the completely wrong solution for my problem. I've never seen this problem with databases - adding and deleting lots of records... quickly... seems like something a database should just be able to do without exploding in size.
Are you making sure that the background thread is not holding on to old versions of the Realm, preventing the space from being reused?
Quote from the docs (https://realm.io/docs/swift/latest/#seeing-changes-from-other-threads):
If a thread has no runloop (which is generally the case in a background thread), then Realm.refresh() must be called manually in order to advance the transaction to the most recent state.
Failing to refresh Realms on a regular basis could lead to some transaction versions becoming “pinned”, preventing Realm from reusing the disk space used by that version, leading to larger file sizes.
The documentation for App Extension under "Sharing Data with Your Containing App" uses NSUserDefaults to do so, and write a bit further that
"to avoid data corruption, you must synchronize data accesses. Use Core Data, SQLite, or >Posix locks to help coordinate data access in a shared container."
But when I look documentation for NSUserDefaults says
"The NSUserDefaults class is thread-safe."
So do I need to use some kind of lock when using NSUserDefaults between my extension and container app or not?
Thread safety refers to the ability to change in-memory data structures from one thread in a way that doesn't damage the ability of other threads to also view or change those structures. When you use NSUserDefaults to share data between an app extension and its containing app, you're not sharing in-memory data between multiple threads, you're sharing on-disk data between multiple processes, so discussions of thread safety do not apply.
The documentation for NSUserDefaults synchronize doesn't say for sure, but one can almost certainly assume that it uses an atomic file write — that is, there's no danger of one process reading a file that's been partially written by another process. If you're concerned about race conditions or other timing issues between when your app writes defaults and your extension reads them (or vice versa), just be sure to synchronize immediately after important writes and immediately before important reads.
The comment about data corruption applies to plain file read/write operations — naively reading or writing a file in two processes can result in data corruption, because one process might read a partially written file or partially overwrite file contents. If you're doing your own file I/O directly, you need some sort of coordination mechanism (like NSFileCoordinator, but beware that only works correctly between iOS apps/extensions in iOS 8.2 and newer). Or you can use higher level utilities that do their own coordination, like CFPreferences/NSUserDefaults, SQLite, Core Data, or Posix file locks.
TLDR: Yes, you can safely use NSUserDefaults to share between an extension and its containing app. Just follow the recommendations in Apple's app extensions guide.
The documentation isn't overly clear, as it uses the NSUserDefaults as the main example of one way to share data but also covers other options without much of a pause. You should be safe enough to use NSUserDefaults without attempting to get a lock first, I've been building a Today extension using it and I've had no issues with data corruption. I am calling synchronize after each write though, just to ensure the data is immediately stored.
I am not sure if it is thread safe across extensions because of the following quote from docs:
When you set a default value, it’s changed synchronously within your process, and asynchronously to persistent storage and other processes.
In other words, it seems to indicate that it's thread-safe within your process, but NOT across processes (ex. extensions).
It could be that calling synchronize fixes this, but docs say:
this method is unnecessary and shouldn't be used
I recently modified my iOS app to enable serialized mode for both a database encrypted using SQLCipher and a non-encrypted database (also SQLite). I also maintain a static sqlite3 connection for each database, and each is only opened once (by simply checking for null values) and shared throughout the lifetime of the app.
The app is required to have a sync-like behavior which will download a ton of records from a remote database at regular intervals using a soap request and update the contents of the local encrypted database. Of course, the person using the app may or may not be updating or reading from the database, depending on what they're doing, so I made the changes mentioned in the above paragraph.
When doing short term testing, there doesn't appear to be any issue with how things work, and I've yet experience any problem.
However, some users are reporting that they've lost access to the encrypted database, and I'm trying to figure out why.
My thoughts are as follows: Methods written by another developer declared all sqlite3_stmt's to be static (I believe this code was in the problematic release). In the past I've noticed crashes when two threads using a particular method run simultaneously. One thread finalizes, modifies or replaces a sqlite3_stmt while another thread is using it. A crash doesn't always occur because he has wrapped most of his SQLite code in try/catch blocks. If it's true that SQLite uses prepare and finalize to implement locking, could the orphaning of sqlite3_stmt's which occurs due to their static nature in this context be putting the database into an inoperable state? For example, when a statement acquires an exclusive lock after being stepped is replaced by an assignment in the same method running in another thread?
I realize that this doesn't necessarily mean that the database will become permanently unusable, but, consider this scenario:
At some point during the app's lifetime it will re-key the encrypted database and that key is stored in another database. Suppose that it successfully re-keys the encrypted database, but then the new key is not stored in the other database because of what I mentioned above.
Provided that the database hasn't become corrupted at some point (I'm not really counting on this being the case), this is the only explanation I can come up with for why the user may not be able to use the encrypted database after restarting the iOS app, seeing as the app would be the only one to access the database file.
Being that I can't recreate this issue, I can only speculate about what the reasoning might be. What thoughts do you have? Does this seem like a plausible scenario for something that happens rarely? Do you have another idea of something to look into?
If the database is rekeyed, and the key for the database is not successfully stored in the other database, then it could certainly cause the problem.
I've been reading up on SQLite3 included in the iOS firmware which might serve my needs for the app i'm writiung.
What I can't figure out is if it is persistent or goes away like some objects do.
For example if I do sqlite3_open() which appears to be a C function rather than an Objective-C object, if I open this at the start of my application, will it stay persistent until I close it no matter how many views I push/pop all over the place.
Obviously that would depend on where I put it but if I was doing a universal app and had some central functions for loading / saving data which were common to both iPhone/iPad, if, in my didFinishLoading: I put a call to open the SQLite database and then called various exec's of queries, would it remain persistent throughout the lifecycle of the application.
or
Am I better off opening and closing as needed, i'm coming from a PHP background so i'd normally open a database at the start of the script and then run many queries and then finally close it before browser output.
From the 1,000,000th i've learned over the last few months about iOS programming, I think the latter might be the better way as there's possibility of app exit prematurely or it going to background.
I'd just like a second opinion on my thinking please.
I dont know directly, but I think you are right - you only need to open it once at the start of your app.
Looking at sqlitepersistentobjects, an ORM framework for iOS, it only opens the DB when its first used, and never closes it except when there is a problem opening it :)
Single opened sqlite database used throughout the app from different places in your app is fine.
You are using word "persistent" which is confusing. What you mean is "reuse of single connection, for executing different statements in the app, possibly from different threads". Persistence has completely different meaning in context of databases - it means that the requested modification of data has been safely stored to media (disk, flash drive) and the device can even unexpectedly shut down without affecting written data.
It's recommended to keep running sqlite statements from a single, dedicated thread.
It's not recommended to connect to sqlite database from different processes for and executing parallel modifications.
A good alternative solution is to use sqlite async extension which sends all writes to a dedicated, background thread.
You can check out https://github.com/mirek/CoreSQLite3 framework if you want to use custom built (newer version) of sqlite.