I need to traverse all the files in the folder structure of a directory that the app accesses on shared servers. By including static libraries, I'm able to access various servers and the files shared on them. The list of all servers is stored in an NSArray.
I need to traverse all folders shared by a server and store all the files in a container. I have used recursion, but that has a huge impact on performance as the number of folders and subfolders increases.
Can anyone suggest an algorithm or logic to traverse the directory structure?
Kindly refer to the illustration below to get an idea of the structure.
One possibility could be to use threads, but how do I divide the logic that iterates over all folders so that threads can work on them in parallel?
Being a mobile app I don't have a luxury of memory.
Remark: "I don't have a luxury of memory." - ask an engineer who worked in the 70's. He will say that the 1GB of RAM you have in your iPhone is more than enough.
To the point: are you sure that it is indeed the recursion itself that has such a great impact on performance? Of course, there are algorithms for traversing a tree data structure (such as a directory in the filesystem) without recursion, using an explicit stack, but that really is painful.
Instead, make sure you obtain only the necessary information, so do not, for example, get all the attributes and hard link count and birthday and... and... of a file if all you need is its full path.
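For reference, the non-recursive traversal mentioned above could look roughly like the sketch below. It is written in Python against the local filesystem purely for illustration; the function name `list_files` is made up, and in the app the directory listing would come from whatever call the server library provides. It keeps an explicit stack of directories still to visit and stores only each file's full path.

```python
import os

def list_files(root):
    """Collect the full paths of all regular files under root without recursion."""
    files = []
    pending = [root]  # explicit stack of directories still to visit
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # follow_symlinks=False avoids looping through symlinked directories
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    files.append(entry.path)  # keep only the path, nothing else
    return files
```

The stack holds only directory paths that have not been listed yet, so its size does not depend on the call depth. The same loop could be split across threads by letting several workers pop from a shared queue instead of a local list.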
I am working on automating the translation workflow and improving the localization process as a whole for a Rails website. I am using SimpleBackend, so only YAML files are used for storing translations.
The current locales directory consists of folders, then subfolders (in some cases), and those subfolders contain yml files. I am considering integrating the project with a third-party tool like Transifex for translation management, so using a single YAML file for each language may be better for managing the workflow.
If someone can highlight the pros and cons of both structures, it would really help me decide whether I should switch from the nested file structure to a single-file pattern. Also, the project is open source with active contributors, so I am looking for a long-term solution.
Thanks!
I think whatever tools you are using to make the process flow smoothly factors a lot in this decision. You should explore how exactly Transifex wants things to be structured in output, and try to keep your current input structure, and give that a shot before making a decision.
However, in my opinion, for a large app with a lot of translatable text, my preference would be to allow for multiple yaml files in your default locale, and one or two consolidated yaml files for each foreign translation. If there isn't a lot of translatable text in your app, maybe a single file is fine for you, but given it's already split up, there's a good chance that's the better choice. On a team with many contributors you can end up with a very high churn file (maybe with a lot of merge conflicts) that everyone changes all the time.
Splitting into separate files lets you logically separate out text to match a domain in your app, like a separate yaml file for mailers (or even each mailer), and one for each domain (or controller). Either way, it puts you in control of your organization strategy.
However, there isn't a lot of value, IMO, in separating your foreign translations to mirror that structure. The systems I have experience with (not Transifex) generate your foreign translation files for you, so you just need to sync with the web interface and commit the results.
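To make the "split source files, consolidated output" idea concrete, here is a rough sketch in Python of merging several locale trees into one. It assumes the YAML files have already been parsed into nested dictionaries (parsing is left out because it needs a YAML library). The helper names `deep_merge` and `consolidate` and the sample keys are invented for the example.

```python
def deep_merge(base, extra):
    """Return a new dict with extra merged into base, recursing into nested dicts."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # later files win on conflicting leaf keys
    return merged

def consolidate(parsed_files):
    """Merge a list of already-parsed locale trees into one tree."""
    result = {}
    for tree in parsed_files:
        result = deep_merge(result, tree)
    return result

# Hypothetical content of two split files for the default locale.
mailers = {"en": {"mailers": {"welcome": {"subject": "Welcome"}}}}
users = {"en": {"users": {"index": {"title": "Users"}}}}
single_tree = consolidate([mailers, users])
```

Dumping `single_tree` back to YAML would give the single per-language file, while contributors keep editing the smaller split files.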
I have created a backtracking algorithm, but after a while the program runs out of memory because the number of results is so huge. So I am trying to find a way to store the resulting data tree on the filesystem rather than in memory/RAM.
I am looking for a convenient way to do that, with as few I/O operations as possible but also moderate RAM usage (max ≈2GB).
One way could be to store each node in a single file, which would probably lead to billions of small files. Or I could store each level of the tree in a single file, but then those files can grow very large. If those files grow too large, their content won't fit into RAM when reading the data, which brings me back to the original problem.
Would it be a good idea to have separate files for nodes and for links?
On the iOS filesystem, is there a way to optimize file access performance by using a tiered directory structure vs. a flat directory structure?
Specifically, my app has Objects that each contain a number of images and data files. A user could create thousands of these Objects and I need to optimize access to one image for ~100 arbitrary Objects at a time.
In this situation, how should I organize files on the filesystem? Would a tiered directory structure be faster than a flat one? And if so, how should I structure the tiered system (i.e. how many tiers, and how many subdirectories / files per tier)?
THANKS!
Well, first of all, you might as well try it with a flat structure to see whether it is slow or not. Perhaps Apple has put in code to optimize how files are found and you don't even need to worry about this. You can probably build out the whole app and just test how quickly it loads and see if that meets your requirements.
If you need to speed it up, I would suggest trying some sort of structure based on the name of the file. You could have a folder that holds all of the items beginning with the letter 'a', another for 'b', and so on. This would split it into 26 folders, which should significantly decrease the number of items in each. Depending on how you name the files, you might want a different scheme so that each of the folders has a similar number of items in it.
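One possible "different scheme" is to bucket by a hash of the name rather than by its first letter, which tends to spread items evenly regardless of naming. The sketch below is in Python for illustration only. The function name `bucketed_path`, the choice of SHA-1, and the two-level, two-character layout are all assumptions rather than anything iOS prescribes.

```python
import hashlib
import os

def bucketed_path(root, name, levels=2, width=2):
    """Build root/<h0>/<h1>/name, where h0 and h1 are slices of a hash of name."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    # Each level uses `width` hex characters, i.e. 16**width possible folders.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, name)

# Hypothetical usage: where would the image for one Object live?
example = bucketed_path("objects", "photo-42.jpg")
```

With the defaults this gives 256 folders per level. The path is computed from the name alone, so finding a file never requires listing a directory.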
If you are using Core Data, you could always just enable the Allows External Storage option in the attribute of your model and let the system decide where it should go.
That would be a decent first step to see if the performance is ok.
I am developing a web application to accept a bunch of text and attachments (1 or more) via email, web and other methods.
I am planning to build a single interface, mostly a web service to accept this content.
What design considerations should I make?
I am building the app using ASP.NET MVC 2.
Should the attachments be saved to disk or in the database?
Should the unified single interface be a web service?
Pros and cons to using web services to upload files
As with any acceptance of files, I'd be checking them for viruses or the like. I'm very nervous about files transmitted from the internet.
I always like putting my files in a database because I find it neater. I hate having files spread over the network, with folders needing rights etc. I know there are people who prefer it the other way, so I guess my answer also depends on personal preference.
I like the DB approach because I can more easily tie files to records and do searches. If you have a file system, you still need to store info about the file, plus the extra work of storing it.
Then, if you need to move files around, you possibly also need to modify references in the database.
Then again, you need to allocate enough space for the database to grow, and perhaps cater for multiple databases as storage runs out.
So I guess if you're downloading large files then yeah, maybe I can see the point of a file system, as it's easier to grow. If you have small text files, then maybe a database will work.
I have an HDF5 file with a one-dimensional (N x 1) dataset of compound elements; it's actually a time series. The data is first collected offline into the HDF5 file and then analyzed. During analysis most of the data turns out to be uninteresting, and only some parts of it are interesting. Since the datasets can be quite big, I would like to get rid of the uninteresting elements while keeping the interesting ones. For instance, keep elements 0-100, 200-300 and 350-400 of a 500-element dataset and dump the rest. But how?
Does anybody have experience with how to accomplish this with HDF5? Apparently it could be done in several ways, at least:
(Obvious solution), create a new fresh file and write the necessary data there, element by element. Then delete the old file.
Or, into the old file, create a new fresh dataset, write the necessary data there, unlink the old dataset using H5Gunlink(), and get rid of the unclaimed free space by running the file through h5repack.
Or, move the interesting elements within the existing dataset towards the start (e.g. move elements 200-300 to positions 101-201 and elements 350-400 to positions 202-252). Then call H5Dset_extent() to reduce the size of the dataset. Then maybe run through h5repack to release the free space.
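For the third option, the index bookkeeping can be sketched as below (Python, no HDF5 calls). The function name `compaction_plan` is made up for illustration, and ranges are inclusive as in the example above. It only computes where each kept range would land and the final length that would be passed to H5Dset_extent(). The actual reads and writes are not shown.

```python
def compaction_plan(keep_ranges):
    """Plan moves for sorted, non-overlapping, inclusive (start, end) ranges.

    Returns (moves, new_length), where each move is
    (src_start, src_end, dst_start, dst_end), all inclusive.
    """
    moves = []
    next_free = 0  # first position not yet occupied by kept data
    for start, end in keep_ranges:
        count = end - start + 1
        moves.append((start, end, next_free, next_free + count - 1))
        next_free += count
    return moves, next_free

moves, new_length = compaction_plan([(0, 100), (200, 300), (350, 400)])
# moves is [(0, 100, 0, 100), (200, 300, 101, 201), (350, 400, 202, 252)]
# new_length is 253
```

The ranges are processed in ascending order and each destination starts at or before its source, so a move never overwrites kept data that has not been moved yet.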
Since the files can be quite big even when the uninteresting elements have been removed, I'd rather not rewrite them (it would take a long time), but it seems to be required to actually release the free space. Any hints from HDF5 experts?
HDF5 (at least the version I am used to, 1.6.9) does not allow deletion. Actually, it does, but it does not free the used space, with the result that you still have a huge file. As you said, you can use h5repack, but it's a waste of time and resources.
Something that you can do is to have a lateral dataset containing a boolean value, telling you which values are "alive" and which ones have been removed. This does not make the file smaller, but at least it gives you a fast way to perform deletion.
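The idea can be sketched in plain Python as follows, with in-memory lists standing in for the two HDF5 datasets. The class name `MaskedSeries` and its methods are invented for the example and do not belong to any HDF5 API.

```python
class MaskedSeries:
    """Time series plus a parallel 'alive' mask; deleting only flips mask entries."""

    def __init__(self, values):
        self.values = list(values)                      # stands in for the data dataset
        self.alive = bytearray([1]) * len(self.values)  # stands in for the boolean dataset

    def delete_range(self, start, end):
        """Mark the inclusive range [start, end] as removed without moving any data."""
        for i in range(start, end + 1):
            self.alive[i] = 0

    def live_items(self):
        """Return (index, value) pairs for elements still marked alive."""
        return [(i, v) for i, v in enumerate(self.values) if self.alive[i]]

series = MaskedSeries(range(10))
series.delete_range(3, 6)
```

After the call, `series.values` still holds all ten elements and only the mask has changed, which is why deletion is fast but the file does not shrink.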
An alternative is to define a hyperslab on your array and copy the relevant data, then delete the old array; or always access the data through the hyperslab and redefine it as you need (I've never done it, though, so I'm not sure if it's possible, but it should be).
Finally, you can use the hdf5 mounting strategy to have your datasets in an "attached" hdf5 file you mount on your root hdf5. When you want to delete the stuff, copy the interesting data in another mounted file, unmount the old file and remove it, then remount the new file in the proper place. This solution can be messy (as you have multiple files around) but it allows you to free space and to operate only on subparts of your data tree, instead of using the repack.
Copying the data or using h5repack as you have described are the two usual ways of 'shrinking' the data in an HDF5 file, unfortunately.
The problem, as you may have guessed, is that an HDF5 file has a complicated internal structure (the file format is here, for anyone who is curious), so deleting and shrinking things just leaves holes in an identical-sized file. Recent versions of the HDF5 library can track the freed space and re-use it, but your use case doesn't seem to be able to take advantage of that.
As the other answer has mentioned, you might be able to use external links or the virtual dataset feature to construct HDF5 files that were more amenable to the sort of manipulation you would be doing, but I suspect that you'll still be copying a lot of data and this would definitely add additional complexity and file management overhead.
H5Gunlink() has been deprecated, by the way. H5Ldelete() is the preferred replacement.