How is data stored in an Operational Data Store? - system-design

Recently I came across an ODS (Operational Data Store) at work, and there are a few points I haven't got clear about it.
Is an ODS a SQL or a NoSQL type of store?
Can an ODS extract data from both SQL and NoSQL databases? If so, how are both stored in the ODS at the same time?
Does an ODS hold real-time data or near-real-time data?
Is the main purpose of an ODS to act as a sink for multiple databases and to run some analytics over the data?

Related

Storing Statistics Data in InfluxDB

We want to analyze the usage of our application and therefore want to store the usage data in an InfluxDB database. We want to store data like Session Start Time, Browser, Browser Version, OS, Language, Available Languages, etc.
We then want to know, e.g., what the top 5 browsers are (with percent of sessions or percent of users), or which OS is used most often.
How would I store this data in InfluxDB in order to be able to get reports like those described above, or are there better databases for storing such data?
As far as I can tell, you are not dealing with time-series data; your data is relational in structure. You should use one of the relational databases like MySQL or PostgreSQL.
You can go ahead with InfluxDB only if you are dealing with regular or irregular time-series data.
Yes, you can use InfluxDB for this. You'd want to store this information as events in the database.
So when a user browses to your application, that would be an event and you'd store something like:
user_events,browser=Chrome,country=UK,os=Linux,language=en_GB,url=/home some_field=1 TIMESTAMP
(Note that tag values in line protocol are written without quotes; quote characters would be stored as part of the value.)
This is Line Protocol:
https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_tutorial/
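The measurement/tags/fields layout above can be sketched in Python. The `to_line_protocol` helper below is a hypothetical illustration (the official InfluxDB client libraries do this encoding for you in practice), but it shows the shape of a line-protocol entry: unquoted tag values, then fields, then a nanosecond timestamp.

```python
from typing import Dict

def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, float], timestamp_ns: int) -> str:
    """Build one InfluxDB line-protocol entry.

    Tag values are written unquoted; commas, spaces, and equals signs
    inside tag keys/values must be backslash-escaped per the protocol.
    """
    def esc(v: str) -> str:
        return v.replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")

    # Sorting tags by key is a common convention (it helps server-side caching).
    tag_part = ",".join(f"{esc(k)}={esc(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"

line = to_line_protocol(
    "user_events",
    {"browser": "Chrome", "country": "UK", "os": "Linux",
     "language": "en_GB", "url": "/home"},
    {"some_field": 1},
    1546300800000000000,
)
```

Once events are stored this way, the "top 5 browsers" report becomes a `GROUP BY` over the `browser` tag on the query side.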

Is it okay if an Operational Data Store holds data for a rolling time period?

We are planning to build an Operational Data Store for front-end users' data-extraction requirements.
As far as I know Kimball's approach to building an ODS/DW, it should hold data for the complete time period, not a rolling time period.
The reason being, there could be a need to extract older data from the ODS/DW.
So I need your thoughts on this. How should I approach it?
I would create a snapshot table that holds the values for the rolling period for each day, and filter on the client side which snapshot to display.
Once the period is over, the final values can be stored in the permanent data mart.
Kimball's approach for a data warehouse would be to load transactional data into the data warehouse if you can, because transactional data is more flexible in terms of how it can be rolled up. Certainly at the ODS stage you wouldn't want to 'pre-aggregate' your data if there could be a need to get hold of older data.
If you store both the transactional data and then pre-aggregated versions of the data (in aggregate fact tables, with indexes/views or with a cube, or just filtering on the report side as the other answer suggests), you can get the best of both worlds.
(Note: Kimball's approach in fact does not require an ODS: they're fine if you want to build one, but their focus is on the dimensionally modelled data warehouse.)
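The "best of both worlds" idea can be sketched roughly as follows; the `sales_fact` and `sales_monthly` table names and the data are made up for illustration, and SQLite is used only for brevity (a real ODS would sit on a server database):

```python
import sqlite3

# Keep the transactional grain: one row per event, never pre-aggregated.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_fact (sale_date TEXT, amount REAL)")
con.executemany("INSERT INTO sales_fact VALUES (?, ?)", [
    ("2023-01-01", 10.0), ("2023-01-02", 20.0),
    ("2023-02-01", 5.0),  ("2023-02-02", 15.0),
])

# Rolling-period extraction for the front end: filter, don't truncate.
rolling = con.execute(
    "SELECT SUM(amount) FROM sales_fact WHERE sale_date >= ?",
    ("2023-02-01",),
).fetchone()[0]

# Once a period closes, persist the final values in an aggregate table
# (the permanent data mart), while the transactional history stays intact.
con.execute("CREATE TABLE sales_monthly AS "
            "SELECT substr(sale_date, 1, 7) AS month, SUM(amount) AS total "
            "FROM sales_fact GROUP BY month")
```

Because the transactional rows are never discarded, older data can always be re-extracted or re-aggregated at a different grain later.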

Using Core Data to store large numbers of objects

I am somewhat new to Core Data and have a general question.
In my current project, users can access data reported by various sensors in each county of my state. Each sensor is represented in a table view which gathers its data from a web service call. Calling the web service could take some time since this app may be used in rural areas with slow wireless connectivity. Furthermore, users will typically only need data from one or two of the state's 55 counties. Each county could have anywhere from 15 to 500 items returned by the web service. Since the sensor names and locations change rarely, I would like the app to cache the data from the web service call to make gathering the list of sensors locations faster (and offer a refresh button for cases where something has changed). The app already uses Core Data to store bookmarked sensor locations, so it is already set up in the app.
My issue is whether to use Core Data to cache the list of sensors, or to use a separate SQLite data store. Since there is already a data model in place, I could simply add another entity to the model. However, I am concerned about whether this would introduce unnecessary overhead, or maybe none at all.
Being new to Core Data, it appears that all that is really happening is that objects are serialized and their properties added as fields in a SQLite DB managed by Core Data. If this is the case, it seems there really would not be any overhead from using the Core Data store already in place.
Can anyone help clear this up for me? Thanks!
it appears that all that is really happening is that objects are serialized and their properties added as fields in a SQLite DB managed by Core Data
You are right about that. Core Data does a lot more, but that's the basic functionality (if you tell it to use a SQLite store, which is what most people do).
As for the number of records you want to store in Core Data, that shouldn't be a problem. I'm working on a Core Data app right now that stores over 20,000 records, and I still get very fast fetch times, e.g. for auto-completion while typing.
Core Data definitely adds some overhead, but if you only have a few entities and relationships, and are not creating/modifying objects in more than one context, it is negligible.
Being new to Core Data, it appears that all that is really happening is that objects are serialized and their properties added as fields in a SQLite DB managed by Core Data. If this is the case, it seems there really would not be any overhead from using the Core Data store already in place.
That's not always the case. Core Data hides its storage implementation from the developer. It is sometimes a SQL database, but in other cases it can be a different storage back end. If you need a comprehensive guide to Core Data, I recommend this objc.io article.
As @CouchDeveloper noted, Core Data is disk-I/O- and CPU-bound. If you notice performance hits, move the work onto a background context (yes, this is a pretty big headache), but it will always be faster than the average network.

How is data loaded from a remote database stored locally (in memory)

What is the best way for data loaded from a remote database to be stored locally on iOS? (You don't need to provide any code; I just want to know the best way conceptually.)
For instance, take Twitter for iOS as an example. When it loads the tweets, does it just pull the tweet data from the remote database and store them in a local database on the iPhone? Or would it be better if the data is just stored locally as an array of objects or something similar?
See, I'm figuring that to be able to use/display the data from the remote database (for instance, in a dynamic table view), the data would have to be stored as objects anyway, so maybe they should just be stored as objects in an array. However, when researching this, I saw a lot of articles about local databases, so I was thinking maybe it's more efficient to load the remote data as a table, store it in a local database, and use the data directly from the local table for display, or something similar.
Which one would require more overhead: storing the data as an array of Tweet objects or as a local database of tweets?
What do you think would be the best way of storing remote data locally (in memory) for an app that loads data similarly to how Twitter for iOS does?
I suppose this raises a prerequisite question: when data from a remote database is downloaded, is it usually loaded as a database table (result set) and therefore stored as one?
Thanks!
While it's very easy to put the fetched data right into an array and use it from there, you would likely benefit from using a local database, for two main reasons: scalability and persistence.
If you are hoping to download and display a large amount of data, it may be unsafe to try to store it in memory all at once. It would be more scalable to download whatever data you need, store it in a local database, and then fetch only the relevant objects you need to display.
If you download the data and only store it in an array, that data will have to be re-fetched from the remote database and re-parsed on next load of your app/view controller/etc before anything can be displayed. Instead, create a local database in which to store the downloaded data, allowing it to be readily available to display on next load while new data is fetched from your remote source.
While there is some initial overhead incurred in creating your database, the scalability and persistence it provides are more than enough to warrant it. Storing the state of a remote database in a local database is an extremely common practice.
However, if you do not mind the user having to wait for this data to be fetched on every load, or the amount of data being fetched is relatively low, or the data is incredibly simple, you will save time and effort by skipping the local database.
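The download-then-cache pattern described above can be sketched as follows; the `tweet` table and the payload are made up for illustration (on iOS the same idea would usually be expressed through Core Data or a SQLite wrapper), and SQLite stands in for the local store:

```python
import sqlite3

# Hypothetical payload standing in for a parsed network response (e.g. JSON).
fetched_tweets = [
    (1, "alice", "hello world"),
    (2, "bob", "second tweet"),
]

con = sqlite3.connect(":memory:")  # on a device this would be a file on disk
con.execute("CREATE TABLE IF NOT EXISTS tweet "
            "(id INTEGER PRIMARY KEY, author TEXT, body TEXT)")

# Upsert the downloaded rows so a refresh updates the cache in place.
con.executemany("INSERT OR REPLACE INTO tweet VALUES (?, ?, ?)", fetched_tweets)
con.commit()

# The UI can then display straight from the cache, fetching only the page of
# rows it needs rather than holding everything in an in-memory array.
page = con.execute("SELECT author, body FROM tweet ORDER BY id LIMIT 20").fetchall()
```

On next launch the cached rows are available immediately, while fresh data is fetched from the remote source in the background.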

Migrating user data stored in sqlite to core data in app upgrade

I'm new to Stack Overflow and have been programming for only a year, so still a newbie with this stuff (i.e., please bear with me!). I'm upgrading an old app created in Xcode 3 that uses an SQLite database to store user-generated data. The upgraded app needs to work with iCloud, so I've decided to switch to Core Data (because SQLite on its own can't be synced with iCloud). I have no problem implementing the Core Data structures, but my problem is how to allow users to retain their existing data.
I have already looked at topics along these lines: How to import a pre-existing sqlite file into Core Data? and Need To Populate Core Data From SQLite Database, and while I can create a utility app to import existing data, this is of no use because there is no existing data shipped with the initial app bundle; the data is generated by the user.
I'd really appreciate any help I can get on this; I have totally tied myself in a knot over it! Maybe I'm better off avoiding Core Data altogether and finding another solution to save the SQLite data in iCloud?
Thanks in advance!
You can use Core Data. Create your Core Data model and stack; your migration process then consists of querying chunks of data from your SQLite database and inserting them into the Core Data database.
Let's say you have three tables in your SQLite database: Person, Vehicle, and Property. You will create the equivalent entities in your Core Data model, then query information from these tables and insert it into Core Data. You need to get familiar with how to insert data into a Core Data database, and how to use it in a multithreaded environment (to avoid locking the application during the migration process).
Here's a few things to keep in mind:
Try to save in chunks. Do not load everything into memory, but also avoid making numerous save calls on your NSManagedObjectContext instance: every save call on the context is I/O. Balance I/O and memory depending on your data usage.
Index the properties on your existing entities that can be indexed. This will help you build relationships (if any).
Ask the system for time before starting the migration. Use UIApplication's beginBackgroundTaskWithExpirationHandler: to be sure you have enough time from the system to finish the migration (in case the user sends the application to the background).
Do not bother cleaning the SQLite database; just delete the database file when you are done with it.
These are some of the things I can advise for now. If I think of more, I will edit this post. Best of luck.
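The chunked-save advice can be illustrated in a platform-neutral way. In this sketch two SQLite connections stand in for the legacy user database and the new Core Data store, and a commit is issued once per batch (the table, names, and batch size are all illustrative; in the real app the inner loop would create NSManagedObject instances and call save on the context per chunk):

```python
import sqlite3

BATCH = 500  # tune: balance memory use against the I/O cost of each save

old = sqlite3.connect(":memory:")   # stands in for the legacy SQLite database
old.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
old.executemany("INSERT INTO person VALUES (?, ?)",
                [(i, f"user{i}") for i in range(1, 1201)])

new = sqlite3.connect(":memory:")   # stands in for the Core Data store
new.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")

cur = old.execute("SELECT id, name FROM person ORDER BY id")
while True:
    rows = cur.fetchmany(BATCH)     # never load the whole table into memory
    if not rows:
        break
    new.executemany("INSERT INTO person VALUES (?, ?)", rows)
    new.commit()                    # one save per chunk, not per row

migrated = new.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

Running the migration this way keeps peak memory bounded by the batch size while avoiding a separate I/O-heavy save for every record.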
