On the iOS filesystem, is there a way to optimize file access performance by using a tiered directory structure vs. a flat directory structure?
Specifically, my app has Objects that each contain a number of images and data files. A user could create thousands of these Objects and I need to optimize access to one image for ~100 arbitrary Objects at a time.
In this situation, how should I organize files on the filesystem? Would a tiered directory structure be faster than a flat one? And if so, how should I structure the tiered system (i.e. how many tiers, and how many subdirectories / files per tier)?
THANKS!
Well, first of all, you might as well try it with a flat structure and see whether it is actually slow. Apple may well have optimized how files are looked up, and you might not need to worry about this at all. You can probably build out the whole app, measure how quickly it loads, and see if that meets your requirements.
If you do need to speed it up, I would suggest building some sort of structure based on the file names. For example, you could have one folder for all items beginning with the letter 'a', another for 'b', and so on. That splits everything into 26 folders, which should significantly decrease the number of items in each. Depending on how you name the files, you might want a different scheme so that each folder ends up with a similar number of items.
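As a rough illustration of that bucketing idea (sketched in Python for brevity, since it's the scheme that matters rather than the API; on iOS the same logic maps onto FileManager calls, and the root path and bucketing rule here are just assumptions):

```python
import os

def bucketed_path(root, filename):
    """Place a file into a subdirectory named after its first character.

    'root' and the one-character bucket are hypothetical choices; adjust the
    rule (first two characters, a hash prefix, etc.) so the buckets end up
    roughly the same size for your naming scheme.
    """
    bucket = filename[0].lower()
    directory = os.path.join(root, bucket)
    os.makedirs(directory, exist_ok=True)   # create the bucket on first use
    return os.path.join(directory, filename)

# Example: "apple.png" lands in <root>/a/apple.png
print(bucketed_path("/tmp/objects", "apple.png"))
```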
If you are using Core Data, you could always just enable the Allows External Storage option on the attribute in your data model and let the system decide where the data should go.
That would be a decent first step to see if the performance is ok.
I am working on automating the translation workflow and improving the localization process of a Rails website as a whole. I am using SimpleBackend, so only YAML files are used for storing translations.
The current locales directory consists of folders, then sub-folders (in some cases), and those sub-folders contain the yml files. I am considering integrating the project with a third-party tool like Transifex for translation management, so using a single YAML file for each language may be better for managing the workflow.
If someone could highlight the pros and cons of both structures, it would really help me decide whether I should switch from the nested file structure to the single-file pattern. Also, the project is an open-source project with active contributors, so I am thinking about a long-term solution.
Thanks!
I think whatever tools you use to make the process flow smoothly factor a lot into this decision. You should explore exactly how Transifex wants the output to be structured, try to keep your current input structure, and give that a shot before making a decision.
However, in my opinion, for a large app with a lot of translatable text, my preference would be multiple YAML files for your default locale and one or two consolidated YAML files for each foreign translation. If there isn't a lot of translatable text in your app, maybe a single file is fine, but given it's already split up, there's a good chance staying split is the better choice. On a team with many contributors, a single file can become a very high-churn file (often with a lot of merge conflicts) that everyone changes all the time.
Splitting into separate files lets you logically separate text to match a domain in your app, such as a separate YAML file for mailers (or even each mailer) and one for each domain (or controller). Either way, it puts you in control of your organization strategy.
However, in my opinion there isn't a lot of value in having your foreign translations mirror that structure. The systems I have experience with (not Transifex) generate your foreign translation files for you, so you just need to sync with the web interface and commit the results.
I am building an iOS application that will randomly generate sentences (think Mad Libs), where the data used for generation is spread across multiple tables. This will be used to generate scenarios for training lifeguards. Each table contains an item name, the words that will be used when it is selected, and different values that determine what can go together.
Using two of the 10 tables shown above, the application may pick a location of Deep Water. It then needs to pick an appropriate activity for being in the water, such as Breath holding, but not Running.
I have been looking at Core Data for storage, but that seems aimed more at data that the user changes often, and my users would never change the stored data. I do want to be able to update the tables myself fairly easily. What would be the optimal way to do this? The options I can think of are:
Some kind of SQL DB, though again my tables aren't changing and don't really have relationships.
2-D arrays written into the source code. Not pretty to work with or read, but my knowledge of regex makes converting from TSV to array fairly easy.
TSV files attached to the project. Better organized in itself, but it would take some research to figure out how to access them (see the sketch after this list).
Some other method Apple has that I do not know about.
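For the TSV option, the parsing step is only a few lines. Here is a minimal sketch (in Python rather than Swift, purely to illustrate the idea; the file name and column layout are made up):

```python
import csv

# Hypothetical file: each row is an item name, the display words, and compatibility tags.
with open("locations.tsv", newline="") as f:
    rows = [row for row in csv.reader(f, delimiter="\t")]

header, records = rows[0], rows[1:]
# Build a list of dicts keyed by the header row, e.g. {"name": "Deep Water", ...}
table = [dict(zip(header, row)) for row in records]
print(table[0])
```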
Is it recommended to create a folder for each user to store their uploaded images, or should I just create one image folder and put everything inside?
Note: this is for a web application such as eBay, where each user will not upload a lot of images, but there might be millions of users.
There are several considerations when deciding whether to create a folder for each user or just one big image folder containing all images.
What is important? Ease of maintenance? Performance? Any resource limitations?
Will you ever need to persist those images?
What are the operations you will normally perform on the images?
How often?
Having a folder per user is more organized, but you may need to maintain millions of folders, and that will require more resources.
If you put everything in one single folder, it will be difficult to maintain unless the images follow a naming convention based on the user id or name.
Before you decide which way to go, you need to look at the overall operations.
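One common middle ground between "one folder per user" and "one giant folder" is to shard by a hash of the user id, so no single directory grows unbounded. A minimal sketch, assuming a numeric user id and a two-level layout (both are assumptions, not requirements):

```python
import hashlib
import os

def image_path(root, user_id, filename):
    """Derive a sharded path such as <root>/<xx>/<yy>/<user_id>/<filename>.

    The two-level, 256 x 256 layout is an assumption; it keeps each
    directory small even with millions of users.
    """
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], str(user_id), filename)

print(image_path("/var/images", 123456, "avatar.jpg"))
# e.g. /var/images/<xx>/<yy>/123456/avatar.jpg
```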
I need to traverse all the files in the folder structure of the directories the app accesses on shared servers. With the inclusion of static libraries I am able to access various servers and the files shared on them. The list of all servers is stored in an NSArray.
I need to traverse all folders shared by each server to store all files in a container. I have used recursion, but that has a huge impact on performance as the number of folders and subfolders grows.
Can anyone suggest an algorithm or logic to traverse the directory structure?
Kindly refer to the illustration below to get an idea of the structure.
One possibility could be using threads, but how do I divide the logic of iterating over all folders for files so that threads can work on them in parallel?
Being a mobile app, I don't have the luxury of memory.
Remark: "I don't have a luxury of memory." - ask an engineer who worked in the 70's. He will say that the 1GB of RAM you have in your iPhone is more than enough.
To the point: are you sure that it is indeed the recursion itself that has such a great impact on performance? Of course, there are algorithms for traversing a tree data structure (such as a directory in the filesystem) without recursion, using an explicit stack, but that really is painful.
Instead, make sure you obtain only the necessary information, so do not, for example, get all the attributes and hard link count and birthday and... and... of a file if all you need is its full path.
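For what it's worth, the explicit-stack version is not too painful. A minimal sketch (in Python, with os.scandir standing in for whatever listing call your library actually exposes):

```python
import os

def list_files(root):
    """Iteratively collect file paths under 'root' using an explicit stack."""
    files, stack = [], [root]
    while stack:
        directory = stack.pop()
        try:
            entries = list(os.scandir(directory))
        except OSError:
            continue  # unreadable directory: skip it rather than abort
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                stack.append(entry.path)   # descend into it later
            else:
                files.append(entry.path)   # keep only the full path, nothing else
    return files
```

Note that the loop only ever asks for the entry type and the path, which is exactly the "obtain only the necessary information" point above.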
I have an HDF5 file with a one-dimensional (N x 1) dataset of compound elements - actually it's a time series. The data is first collected offline into the HDF5 file and then analyzed. During analysis, most of the data turns out to be uninteresting, and only some parts are interesting. Since the datasets can be quite big, I would like to get rid of the uninteresting elements while keeping the interesting ones. For instance, keep elements 0-100, 200-300, and 350-400 of a 500-element dataset, and dump the rest. But how?
Does anybody have experience with how to accomplish this in HDF5? Apparently it could be done in several ways, at least:
(Obvious solution), create a new fresh file and write the necessary data there, element by element. Then delete the old file.
Or, into the old file, create a new fresh dataset, write the necessary data there, unlink the old dataset using H5Gunlink(), and get rid of the unclaimed free space by running the file through h5repack.
Or, move the interesting elements within the existing dataset towards the start (e.g. move elements 200-300 to positions 101-201 and elements 350-400 to positions 202-252). Then call H5Dset_extent() to reduce the size of the dataset. Then maybe run through h5repack to release the free space.
Since the files can be quite big even when the uninteresting elements have been removed, I'd rather not rewrite them (it would take a long time), but it seems to be required to actually release the free space. Any hints from HDF5 experts?
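For concreteness, here is roughly what the first approach would look like using the Python h5py bindings rather than the C API (the dataset and file names are made up); the same steps map onto H5Dcreate/H5Dwrite if you stay in C:

```python
import h5py
import numpy as np

keep = [(0, 101), (200, 301), (350, 401)]   # half-open index ranges to keep

with h5py.File("old.h5", "r") as src, h5py.File("new.h5", "w") as dst:
    ds = src["timeseries"]                  # hypothetical dataset name
    parts = [ds[a:b] for a, b in keep]      # hyperslab reads, not element by element
    dst.create_dataset("timeseries", data=np.concatenate(parts))
# afterwards, delete old.h5; no h5repack is needed since new.h5 was written fresh
```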
HDF5 (at least the version I am used to, 1.6.9) does not allow deletion. Actually, it does, but it does not free the used space, with the result that you still have a huge file. As you said, you can use h5repack, but it's a waste of time and resources.
Something that you can do is to have a lateral dataset containing a boolean value, telling you which values are "alive" and which ones have been removed. This does not make the file smaller, but at least it gives you a fast way to perform deletion.
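A minimal sketch of that "alive" mask idea, again assuming h5py and made-up names:

```python
import h5py
import numpy as np

with h5py.File("data.h5", "a") as f:
    n = f["timeseries"].shape[0]
    if "alive" not in f:
        # parallel boolean dataset; initially every element is alive
        f.create_dataset("alive", data=np.ones(n, dtype=bool))
    f["alive"][101:200] = False   # "delete" elements without rewriting the file
```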
An alternative is to define a slab on your array, copy the relevant data, and then delete the old array; or always access the data through the slab and redefine it as you need (I've never done it, though, so I'm not sure it's possible, but it should be).
Finally, you can use the HDF5 mounting strategy to keep your datasets in an "attached" HDF5 file that you mount on your root file. When you want to delete the stuff, copy the interesting data into another mounted file, unmount the old file and remove it, then remount the new file in the proper place. This solution can be messy (as you have multiple files around), but it allows you to free space and to operate only on subparts of your data tree, instead of using repack.
Copying the data or using h5repack as you have described are the two usual ways of 'shrinking' the data in an HDF5 file, unfortunately.
The problem, as you may have guessed, is that an HDF5 file has a complicated internal structure (the file format is here, for anyone who is curious), so deleting and shrinking things just leaves holes in an identical-sized file. Recent versions of the HDF5 library can track the freed space and re-use it, but your use case doesn't seem to be able to take advantage of that.
As the other answer has mentioned, you might be able to use external links or the virtual dataset feature to construct HDF5 files that are more amenable to the sort of manipulation you would be doing, but I suspect that you'll still be copying a lot of data, and this would definitely add complexity and file management overhead.
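For reference, an external link in h5py looks like this (file and path names are hypothetical); the data stays in the attached file, so dropping a chunk of it is just a matter of deleting that file and re-creating the link:

```python
import h5py

with h5py.File("root.h5", "a") as f:
    grp = f.require_group("segments")
    # expose /data from segment_001.h5 as /segments/001 in the root file
    grp["001"] = h5py.ExternalLink("segment_001.h5", "/data")
```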
H5Gunlink() has been deprecated, by the way. H5Ldelete() is the preferred replacement.