Ideal method for storing hierarchical data in HDF5

Hello Oracles of StackOverflow,
This is the first time I've managed to ask a question on Stack Overflow, so feel free to throw your cabbages at me (or correct the way I should be asking my question).
I have this problem: I'm using HDF5 to store massive quantities of cookie information.
My data is structured in the following way:
CookieID -> Event -> Key_value Pair
There are multiple events for each cookieID, but only one key_value pair per event.
I'd like to know the best way to store this in HDF5.
Currently, I'm storing each cookie as a separate table within a group in the HDF5 file, using the cookieID as the name of the table. Unfortunately for me, with 10,000,000 cookies, HDF5 (or specifically PyTables) doesn't approve of this type of storage.
Specifically, it throws this error:
``/CookieData`` is exceeding the recommended maximum number of children (16384)
I'm wondering if you could recommend the best way of storing this information.
Should I create a flat table? Should I keep this method? Is there something else I can do?
Help is appreciated. Thanks for reading.

Several hours of research later, I've discovered that what I was attempting to do was categorically impossible.
The following link explains why HDF5 can't be used with variable-length nested children.
I've decided to go with a flat file for the time being, and I hope it will be more efficient than a database store. The drawback of a flat file is that I have to repeat values in the file (the cookie ID on every event row) that otherwise wouldn't need to be duplicated.
If anyone else has any better ideas it would be appreciated.
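A minimal, stdlib-only sketch of the flat layout described in the answer: one row per event, with the cookie ID repeated on every row, sorted by cookie ID so that all events for one cookie sit next to each other and can be found with a binary search. The field names and sample data are hypothetical. In PyTables the same layout would be a single `Table` created with `create_table`, with the `cookie_id` column indexed via `Column.create_index()` and queried with `Table.read_where()`; this sketch only shows the row layout and lookup logic, not the HDF5 calls.

```python
from bisect import bisect_left
from typing import Dict, List, NamedTuple, Tuple


class EventRow(NamedTuple):
    """One row of the flat table; cookie_id is repeated on every event row."""
    cookie_id: int
    event: str
    key: str
    value: str


def build_flat_table(cookies: Dict[int, List[Tuple[str, str, str]]]) -> List[EventRow]:
    """Flatten {cookie_id: [(event, key, value), ...]} into rows sorted by cookie_id."""
    rows = [
        EventRow(cookie_id, event, key, value)
        for cookie_id, events in cookies.items()
        for event, key, value in events
    ]
    rows.sort()  # tuples sort by cookie_id first, so each cookie's rows are contiguous
    return rows


def events_for(rows: List[EventRow], cookie_id: int) -> List[EventRow]:
    """Binary-search the sorted rows for all events of one cookie."""
    start = bisect_left(rows, (cookie_id,))
    end = bisect_left(rows, (cookie_id + 1,))
    return rows[start:end]
```

A single table of this shape is one child node instead of one per cookie, so it stays under the 16384-children limit no matter how many cookies are stored.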

Related

Trying to make a search engine for issues

Our company has a lot of issue data stored in a database. We want to create a search engine so that people can check how issues were dealt with previously. We cannot use any third-party API because the data is sensitive and we want to keep it in-house. Right now the approach is as follows:
Clean up the data and then use Doc2Vec to represent each issue as a vector.
Find the 5 closest issues using some distance metric.
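Step 2 can be sketched as a brute-force cosine-similarity search. The vectors here are toy stand-ins for whatever Doc2Vec produces, and the function names are hypothetical. If the model is gensim's `Doc2Vec`, its `infer_vector()` and `model.dv.most_similar()` (gensim 4.x naming) cover the same ground.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_issues(
    query: Sequence[float],
    issue_vectors: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k issue IDs whose vectors are most similar to the query, best first."""
    scored = ((cosine_similarity(query, vec), issue_id) for issue_id, vec in issue_vectors.items())
    return [(issue_id, score) for score, issue_id in heapq.nlargest(k, scored)]
```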
The problem is that the results are not at all useful. Most of the data consists of one-line issue descriptions, with spelling mistakes, stack traces and other noise.
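One possible form of the cleanup step for this kind of noise, assuming the stack traces are Java-style `at package.Class.method(File.java:NN)` frames: drop those lines, lowercase the rest, and split into alphanumeric tokens before training. This is only a sketch with hypothetical names; it does nothing about spelling mistakes.

```python
import re
from typing import List

# Assumption: stack traces are Java-style frames such as
# "    at com.example.Order.save(Order.java:42)".
STACK_FRAME = re.compile(r"^\s*at\s+[\w$.<>]+\(.*\)\s*$")
TOKEN = re.compile(r"[a-z0-9]+")


def clean_issue(text: str) -> List[str]:
    """Drop stack-frame lines, lowercase the rest, and split into alphanumeric tokens."""
    kept = [line for line in text.splitlines() if not STACK_FRAME.match(line)]
    return TOKEN.findall("\n".join(kept).lower())
```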
Is this the right approach, or should we switch to something else?
Right now we are testing on 200K records.
Thanks for the help.

How to store a dynamic list of sequence numbers in iOS

In my iOS app, I need to keep track of which sequence numbers have already been received from the server and which sequence numbers need to be retrieved. I want to be able to store this in case the app terminates or crashes.
I am trying to decide which storage method I should use: Core Data, a plist, etc.
The list of sequence numbers is dynamic and can change a lot. Any pointers on how to decide on storage will be greatly appreciated.
Without more exact details in your question it is hard to give you an accurate answer. However, what can be provided is some insights on the benefits / downfalls of using the storage systems listed above.
I would stay away from using a plist since your data is dynamic and can change a lot. Every time you save to a plist you will need to overwrite the entire file. This means to change a single value you must retrieve all values, make a single change, and save all values back to the plist. This isn't a modular way of doing such saves and can become problematic if you have a lot of information that is changing and needs to be saved all the time. On the up side - setting up a plist save / read write structure is very easy and fast.
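To make the whole-file rewrite concrete, here is a sketch using Python's standard `plistlib`, which reads and writes the same property list format; the iOS APIs differ, but the access pattern is the same. The file layout (a dict holding a `received` array) is hypothetical.

```python
import plistlib
from pathlib import Path
from typing import List


def add_sequence_number(path: Path, seq: int) -> List[int]:
    """Record one received sequence number in a plist file.

    Even to add a single value, the whole file is read into memory,
    modified, and then written back out in full.
    """
    if path.exists():
        with path.open("rb") as f:
            data = plistlib.load(f)
    else:
        data = {"received": []}  # hypothetical layout: one sorted array
    if seq not in data["received"]:
        data["received"].append(seq)
        data["received"].sort()
    with path.open("wb") as f:
        plistlib.dump(data, f)  # rewrites the entire file, not just the new value
    return data["received"]
```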
NSUserDefaults should be used for just that: saving user settings and preferences. It is really easy to use, but may become very problematic in the long run if your data is very large. Values returned from NSUserDefaults are also immutable, which may or may not be a problem for your needs.
Core Data may be overkill for what you're doing, unless your list of sequence numbers is very large. Personally, I would go with Core Data, knowing how well it handles dynamic values and how fast it saves objects compared to a plist or NSUserDefaults. The downside is that Core Data has a bit of a learning curve; unless you have used it before, it is easy to go down the wrong path with it.
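For contrast with the plist approach, a store backed by SQLite (one of the persistent store types Core Data can use) can update one row at a time. The sketch below uses Python's `sqlite3` directly rather than Core Data, assumes sequence numbers start at 1, and uses hypothetical table and function names. It also shows how the numbers still to be retrieved fall out of the stored set.

```python
import sqlite3
from typing import List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store with one row per received sequence number."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS received (seq INTEGER PRIMARY KEY)")
    return conn


def mark_received(conn: sqlite3.Connection, seq: int) -> None:
    """Insert a single row; the other stored numbers are not rewritten."""
    with conn:  # commits on success
        conn.execute("INSERT OR IGNORE INTO received (seq) VALUES (?)", (seq,))


def missing(conn: sqlite3.Connection, up_to: int) -> List[int]:
    """Sequence numbers 1..up_to that have not been received yet."""
    have = {row[0] for row in conn.execute("SELECT seq FROM received WHERE seq <= ?", (up_to,))}
    return [n for n in range(1, up_to + 1) if n not in have]
```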
As far as pointers on which storage option to use, do some research. Make a list of pros and cons of each storage option. Ask yourself how big your data may get, and what is the best solution. You already know the data is dynamic and may change a lot. Look at the performance of each storage solution.
Here are some helpful reading material links straight from Apple:
Plist
NSUserDefaults
CoreData

Best way to store Trivia game data?

I'm creating an iOS trivia game that will have between 1,000 - 10,000 questions in it. Each question will have only two possible answers, so the amount of data per question will be very small.
I'm wondering if I should use Core Data to store the questions or if I can use a large dictionary that I populate when the app loads up?
Would either of those choices work or is there a better solution I haven't considered?
The 'best' way to store these questions depends heavily on your internal data structures, memory usage and source data structures.
How do you receive the questions? If they are in XML, you might like to preserve that structure and implement an XML parser. If they are in Excel format, export to CSV and read from that. If they are JSON, load them into an NSDictionary.
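For the JSON case, a sketch of loading the file into a dictionary keyed by question ID, the Python counterpart of loading into an NSDictionary. The per-question schema (`id`, `text`, `answers`, `correct`) is a hypothetical example, with a check that each question has exactly two answers.

```python
import json
from typing import Dict


def load_questions(raw_json: str) -> Dict[int, dict]:
    """Parse a JSON array of questions into a dict keyed by question ID.

    Hypothetical schema per question: {"id": int, "text": str,
    "answers": [str, str], "correct": 0 or 1}.
    """
    by_id: Dict[int, dict] = {}
    for question in json.loads(raw_json):
        if len(question["answers"]) != 2:
            raise ValueError(f"question {question['id']} must have exactly two answers")
        by_id[question["id"]] = question
    return by_id
```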
If you want to add these into Core Data or sqlite and the source questions are in a different format, you will have to write a parser and importer. Then, if you update questions you will have to create a merge policy etc.
Personally, if you can keep the original format of the data without complicating code/exceeding memory I would keep it simple and go for that - that way, you can replace the source file and it will just work.

Best way for storing 100+ questions and answers

I am designing an exam practice app with the following format: a single question is displayed at the top, and the user ranks each of the answer sections A-E from 1 to 5 (using a scroll view).
Here is an image:
Each question therefore has 5 parts. I am unsure of what is the best way to store the questions and answers. I read something about plists. Would that be the way to do it? If so, could you recommend any tutorials with images?
Just to clarify, the labels A-E are where the text for the subsections will go and the user will have to rank the appropriateness for each of these.
Thank you!
Few options here:
1. Core Data
Prepopulate a .sqlite file with the questions and include it with the app. Keep track of the user's progress, attempts and whatever other stats you need.
This approach also gives you the ability to label questions by topic (or any other criteria) and present the questions the user needs or has failed most often.
2. Get data from server
A bit more complicated, but it offers more benefits. With this approach you would be getting questions in JSON format.
The benefit of this approach is that you can add any number of questions and tests without resubmitting your app.
3. Store as text/plist with app
Yes, you can also store your questions as text in plist or json format and as app loads populate core data or keep it in memory to display. The latter approach, however, would offer the least amount of benefit and flexibility to the user.
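Options 2 and 3 both come down to a JSON description of each question. One hypothetical shape for the five-part format is sketched below, together with validation of the parts and of the user's 1-5 rankings; the field and function names are assumptions, not a fixed format.

```python
import json
from typing import Dict

PART_LABELS = ("A", "B", "C", "D", "E")

# Hypothetical JSON shape:
# {"id": 1, "stem": "...", "parts": [{"label": "A", "text": "..."}, ...]}


def parse_question(raw: str) -> dict:
    """Parse one question and check that it has exactly the parts A-E, in order."""
    question = json.loads(raw)
    labels = tuple(part["label"] for part in question["parts"])
    if labels != PART_LABELS:
        raise ValueError(f"expected parts {PART_LABELS}, got {labels}")
    return question


def validate_ranking(ranking: Dict[str, int]) -> None:
    """Check that the user gave every part A-E a rank between 1 and 5."""
    if set(ranking) != set(PART_LABELS):
        raise ValueError("every part A-E needs a rank")
    for label, rank in ranking.items():
        if not 1 <= rank <= 5:
            raise ValueError(f"rank for part {label} must be between 1 and 5, got {rank}")
```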
I suggest using SQLite. See Ray Wenderlich's Tutorial on SQLite for iOS
And there is obviously a less suitable but easy way: using property lists as key-value pairs, which I do not recommend.
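A minimal schema sketch for the SQLite route, with hypothetical table and column names: questions, their five parts, and the user's 1-5 score per part, with the allowed ranges enforced by CHECK constraints. The database would be built and prepopulated once and then shipped with the app.

```python
import sqlite3

# Hypothetical schema: one row per question, five parts (A-E) per question,
# and the user's 1-5 score for each part.
SCHEMA = """
CREATE TABLE question (
    id   INTEGER PRIMARY KEY,
    stem TEXT NOT NULL
);
CREATE TABLE part (
    question_id INTEGER NOT NULL REFERENCES question(id),
    label       TEXT    NOT NULL CHECK (label IN ('A', 'B', 'C', 'D', 'E')),
    body        TEXT    NOT NULL,
    PRIMARY KEY (question_id, label)
);
CREATE TABLE user_score (
    question_id INTEGER NOT NULL,
    label       TEXT    NOT NULL,
    score       INTEGER NOT NULL CHECK (score BETWEEN 1 AND 5),
    PRIMARY KEY (question_id, label),
    FOREIGN KEY (question_id, label) REFERENCES part (question_id, label)
);
"""


def build_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the schema; a shipped, prepopulated file would be opened instead."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    return conn
```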

Design pattern for Core Data default entries

I'm planning a new iOS app and am not sure of the best way to set up a starting list for a tableView on the app's first launch. The app uses Core Data (more precisely, MagicalRecord). Should I use some kind of (p)list (dictionary) that gets imported, or should I just hard code the default entries, the same way as when the user adds something through a form? Thank you!
There are plenty of arguments for and against each of the different ways of doing this.
Personally I prefer just to hard code all the default entries (if there aren't thousands of them obviously).
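A sketch of the hard-coded approach, using Python's `sqlite3` as a stand-in for the Core Data store (this is not MagicalRecord API): the defaults live in code and are inserted once, guarded by a flag so they are not re-added if the user later deletes them all. The entry names and the flag table are hypothetical; in an iOS app such a flag could live in NSUserDefaults instead.

```python
import sqlite3
from typing import List

# Hypothetical default entries, hard-coded as the answer suggests.
DEFAULT_ITEMS: List[str] = ["Groceries", "Work", "Ideas"]


def seed_defaults_once(conn: sqlite3.Connection) -> bool:
    """Insert the default entries on first start only; return True if they were added."""
    conn.execute("CREATE TABLE IF NOT EXISTS item (name TEXT NOT NULL)")
    conn.execute("CREATE TABLE IF NOT EXISTS app_flag (name TEXT PRIMARY KEY)")
    if conn.execute("SELECT 1 FROM app_flag WHERE name = 'defaults_seeded'").fetchone():
        return False  # already seeded once, even if the user has since deleted the entries
    with conn:  # both inserts commit together or not at all
        conn.executemany("INSERT INTO item (name) VALUES (?)", [(n,) for n in DEFAULT_ITEMS])
        conn.execute("INSERT INTO app_flag (name) VALUES ('defaults_seeded')")
    return True
```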
