If each DynamoDB item has a unique schema, how can I query it? - ios

I'm using DynamoDB to store my data. Each item has a name (primary key) and then unique attributes. How can I query by primary key if the iOS DynamoDB SDK wants me to specify a model class (but each item is unique)? For example, I want to just input the name (primary key), and the results will tell me what attributes that item has. Looking at AWS's DynamoDB sample for iOS, you have to specify what these attributes are prior to the query, which I do not want to do. Is that the only way?

The examples you were looking at are for the DynamoDB Object Mapper, which is just one of the abstractions you can use to work with DynamoDB. In fact, it is a fairly high-level one, and it is convenient when all items share a limited set of known attributes.
But underneath, DynamoDB is a document database that only requires items to have a key (which may optionally be composed of a partition key and a sort key); other than that, you can definitely store and query items that each have a different set of attributes.
Please have a look at the DynamoDB low-level API (http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.LowLevelAPI.html), which supports querying items by key and then iterating over each item's attributes. In fact, items are treated as a key-value map where the key is the attribute name and the value is whatever you want to store for each attribute.
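For illustration, here is a rough Swift sketch of that low-level approach with the AWS iOS SDK. The table name "Items", the string key attribute "name", and the assumption that a default AWSServiceConfiguration is already registered are all placeholders for your own setup:

```swift
import AWSDynamoDB

// Sketch: fetch one item by its primary key with the low-level client
// (no model class), then iterate whatever attributes it happens to have.
func fetchItem(named name: String) {
    guard let keyValue = AWSDynamoDBAttributeValue(),
          let input = AWSDynamoDBGetItemInput() else { return }

    keyValue.s = name
    input.tableName = "Items"        // placeholder table name
    input.key = ["name": keyValue]   // placeholder partition key attribute

    AWSDynamoDB.default().getItem(input).continueWith { task -> Any? in
        if let error = task.error {
            print("GetItem failed: \(error)")
        } else if let item = task.result?.item {
            // The item comes back as [String: AWSDynamoDBAttributeValue],
            // so the attribute names don't need to be known ahead of time.
            for (attributeName, value) in item {
                print(attributeName, value.s ?? value.n ?? "(non-scalar value)")
            }
        }
        return nil
    }
}
```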

Related

How to deal with a nested hash with dynamic keys (PostgreSQL JSONB) with the help of Cube.js?

I am quite a newbie to Cube.js. I have been trying to integrate Cube.js analytics functionality with my Ruby on Rails app. The database is PostgreSQL. In the database, there is a column called answers_json with the jsonb data type, which contains a nested hash. An example of the data in that column is:
**answers_json:**
"question_weights_calc"=>
{"314"=>{"329"=>1.5, "331"=>4.5, "332"=>1.5, "333"=>3.0},
"315"=>{"334"=>1.5, "335"=>4.5, "336"=>1.5, "337"=>3.0},
"316"=>{"338"=>1.5, "339"=>3.0}}
There are many more keys in the same column with the same hash structure as shown above. I posted the specific part because I would be dealing with this part only. I need assistance with accessing the values in the hash. The column has a nested hash. In the example above, the keys "314", "315" and "316" are Category IDs. The keys associated with Category ID "314" are "329", "331", "332", "333", which are Question IDs. Each category will have multiple questions. For different records, the category and question IDs will be dynamic. For example, for another record, the Category ID and the Question IDs associated with that category will be different. I need to access the values associated with the Question ID keys. For example, to access the value "1.5" I need to do this in my schema file:
**sql: `(answers_json -> 'question_weights_calc' -> '314' ->> '329')`**
But the issue here is that those IDs will be dynamic for different records in the database. Instead of "314" and "329", they can be some other numbers. Adding a different record's JSON here for clarification:
**answers_json:**
"question_weights_calc"=>{"129"=>{"273"=>6.0, "275"=>15.0, "277"=>8.0}, "252"=>{"279"=>3.0, "281"=>8.0, "283"=>3.0}}}
How can I know and access those dynamic IDs and their values, since I also need to perform mathematical operations on the values? Thanks!
As a general rule, it's difficult to run SQL-based reporting on highly dynamic JSON data. Postgres does have some useful functions for dealing with JSON, and you might be able to use json_each or json_object_keys plus a few joins to get there, but it's quite likely that the performance and maintainability of such a query would be difficult, to say the least 😅 Cube.js ultimately executes SQL queries, so if you do go the above route, the query should be easily transferable to a Cube.js schema.
Another approach would be to create a separate data processing pipeline that collects all the JSON data and flattens it into a single table. The pipeline should then store this data back in your database of choice, from where you could then use Cube.js to query it.

Is there a way to connect multiple keys to the same value in Core Data?

I have a problem where I want to search for data in an application's database managed by Core Data. The problem is that the key I would be using to query the database may have several similar ways of being written. For example, I want to access the same data element named "tomato" with either the key "tomato" or "tomatoes". All other data fields would be the same. Does Core Data offer any built-in functionality to create aliases for a key so that a single element can be accessed by multiple keys?
I tried adding duplicate elements that differ only by the "name" attribute, but I do not want to do this for every entry, as it would require my database to use at least twice as much space.
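Purely as an illustration of a modelling shape that avoids duplicating the full records, here is a hedged Swift sketch. It assumes a hypothetical model with a "Food" entity that has a to-many "aliases" relationship to a small "Alias" entity (one row per spelling, with an inverse to-one "food" relationship); whether this fits your app is a judgment call:

```swift
import CoreData

// Hypothetical model: "Food" <-->> "Alias", where Alias has a single "name"
// attribute ("tomato", "tomatoes", ...). Looking an item up by any of its
// spellings is then a fetch on Alias rather than on Food.
func findFood(named query: String, in context: NSManagedObjectContext) throws -> NSManagedObject? {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Alias")
    request.predicate = NSPredicate(format: "name ==[c] %@", query)
    request.fetchLimit = 1

    // Only the lightweight Alias rows are repeated per spelling; the Food
    // record and all its other attributes exist exactly once.
    let alias = try context.fetch(request).first
    return alias?.value(forKey: "food") as? NSManagedObject
}
```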

Does it make sense to have a primary key attribute in Core Data in iOS 9?

In iOS 9, I can specify certain attributes in an entity as unique constraints to prevent managed objects with the same values for those constraints from being created.
Each NSManagedObject has its own objectID, but it is maintained internally by Core Data and cannot be set as a unique constraint in the model.
Based on that notion, does it make sense to include a "primary key" attribute for all entities in Core Data and specify that primary key as a unique constraint if I don't want duplicate data?
If you have a meaningful way to source and populate that key, and the elimination of duplicates means something to you, then yes.
If you don't have a source of a meaningful value for the key, like a server-generated value, then all you're doing is adding a requirement that you need to find the key first, and you'd be doing that anyway to avoid duplication. So adding it without 'external' support generally won't help.
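For illustration of the server-generated case, a minimal Swift sketch, assuming a hypothetical "Item" entity whose "remoteID" attribute has been declared as a uniqueness constraint in the model editor (iOS 9+); a merge policy then turns a would-be duplicate into an update on save:

```swift
import CoreData

// Sketch only: assumes an "Item" entity with "remoteID" declared as a
// uniqueness constraint in the .xcdatamodeld editor.
func upsertItem(remoteID: String, name: String, in context: NSManagedObjectContext) throws {
    // Without a merge policy, saving a second object with the same remoteID
    // throws a constraint-violation error; with this policy, the new values win.
    context.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy

    let item = NSEntityDescription.insertNewObject(forEntityName: "Item", into: context)
    item.setValue(remoteID, forKey: "remoteID")
    item.setValue(name, forKey: "name")

    try context.save()
}
```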

How to mark data as demo data in SQL database

We have Accounts, Deals, Contacts, Tasks and some other objects in the database. When a new organisation signs up, we want to set up some of these objects as "Demo Data", which they can view/edit and delete as they wish.
We also want to give the user the option to delete all demo data so we need to be able to quickly identify it.
Here are two possible ways of doing this:
Have a "IsDemoData" field on all the above objects : This would mean that the field would need to be added if new types of demo data become required. Also, it would increase database size as IsDemoData would be redundant for any record that is not demo data.
Have a DemoDataLookup table with TableName and ID columns: the ID here would not be a strong foreign key but a theoretical foreign key to a record in the table named by TableName.
Which of these is better, and is there a better normalised solution?
As a DBA, I think I'd rather see demo data isolated in a schema named "demo".
This is simple with some SQL database management systems, not so simple with others. In PostgreSQL, for example, you can write all your SQL with unqualified names, and put the "demo" schema first in the schema search path. When your clients no longer want the demo data, just drop the demo schema.

How to create a fact table using natural keys

We've got a data warehouse design with four dimension tables and one fact table:
dimUser: id, email, firstName, lastName
dimAddress: id, city
dimLanguage: id, language
dimDate: id, startDate, endDate
factStatistic: id, dimUserId, dimAddressId, dimLanguageId, dimDate, loginCount, pageCalledCount
Our problem is: We want to build the fact table which includes calculating the statistics (depending on userId, date range) and filling the foreign keys.
But we don't know how, because we don't understand how to use natural keys (which seems to be the solution to our problem according to the literature we read).
I believe a natural key would be the userId, which is needed in all ETL jobs which calculate the dimension data.
But there are many difficulties:
in the ETL jobs' load() step, we do bulk inserts with INSERT IGNORE INTO to remove duplicates => we don't know which surrogate keys were generated
if we create metadata (including a set of dimension_name, surrogate_key, natural_key), this will not work because of the duplicate elimination
The problem seems to be the duplicate elimination strategy. Is there a better approach?
We are using MySQL 5.1, if it makes any difference.
If your fact table is tracking logins and page calls per user, then you should have a set of source tables which track these things, and that is where you'll load your fact table data from. I would probably build the fact table at the grain of one row per user / login date - or even lower, to persist atomic data if at all possible.
Here you would then have a fact table with two dimensions - User and Date. You can persist address and language as dimensions on the fact as well, but these are really just attributes of the user.
Your dimensions should have surrogate keys, but also should have the source "business" or "natural" key available - either as an attribute on the dimension itself, or through a mapping table as your colleague suggested. It's not "wrong" to use a mapping table - it does make things easier when there are multiple sources.
If you store the business keys in a mapping table, or in the dimension as an attribute, then for each row to load into the fact it's a simple lookup (usually via a join) against the dim or mapping table to get the surrogate key for the user (and then from the user to get the user's "current" address / language to persist on the fact). The date dimension usually has a surrogate key stored in a YYYYMMDD or other "natural" format; you can just generate this from the date information on the source record that you're loading into the fact.
Do not force everything into a single query; try to load the data in separate queries and combine the data in some provider...
