YapDatabase only uses a single table to store data

I am looking for a key-value store database on iOS. It should be based on SQLite, so YapDatabase seems to be a good choice.
But I find that YapDatabase only uses a single table. To quote the documentation, "The main database table is named "database2"": CREATE TABLE "database2" ("rowid" INTEGER PRIMARY KEY, "collection" CHAR NOT NULL, "key" CHAR NOT NULL, "data" BLOB, "metadata" BLOB ). So I am concerned about storing different types of objects in the same column.
For example, I plan to use YapDatabase for my chat app, storing each message as |collection|key|object|metadata|. Each message has a unique id, which will be used as the key; the message content is normally an NSString, which will be used as the object; a timestamp along with some other data is used as the metadata. Just like the YapDatabase author answered here.
Sometimes pictures will be sent. The images are small, normally a couple hundred KB each. I don't want to store them as files; I believe storing them as blobs is appropriate.
But if I use YapDatabase, they are stored in the same table as my normal text messages. Then how can I run a query like, say, finding all my text messages?
Is my concern valid (storing different types of objects in the same column)? Do I need to store them in separate tables? If so, how? If not, how do I find all my text messages easily?

The whole point of a key/value store is that it uses only the keys to identify the values. So if you want to store messages and pictures in the same store, you must ensure that they have different keys.
If you want to store data in separate tables, use a database that actually has tables, like SQLite.
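For illustration only (this is not YapDatabase API, just a query against the underlying table quoted in the question): if you give text messages and images distinct collections, say "textMessages" and "images", then "find all my text messages" becomes a simple filter on the collection column. The collection names here are assumptions.
-- Sketch only: assumes text messages are stored in a "textMessages" collection.
SELECT "key", "data", "metadata"
FROM "database2"
WHERE "collection" = 'textMessages';
In practice you would do the equivalent through YapDatabase's per-collection enumeration rather than raw SQL; the point is simply that separating the two types by collection (or by key prefix) keeps the single-table layout from being a problem.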

Related

How to deal with a nested hash with dynamic keys (PostgreSQL JSONB) with the help of Cube.js?

I am quite a newbie to Cube.js. I have been trying to integrate Cube.js analytics functionality with my Ruby on Rails app. The database is PostgreSQL. In the database, there is a column called answers_json with the jsonb data type which contains a nested hash. An example of the data in that column is:
**answers_json:**
"question_weights_calc"=>
{"314"=>{"329"=>1.5, "331"=>4.5, "332"=>1.5, "333"=>3.0},
"315"=>{"334"=>1.5, "335"=>4.5, "336"=>1.5, "337"=>3.0},
"316"=>{"338"=>1.5, "339"=>3.0}}
There are many more keys in the same column with the same hash structure as shown above. I posted only the specific part because I would be dealing with this part only. I need assistance with accessing the values in the hash. The column has a nested hash. In the example above, the keys "314", "315" and "316" are Category IDs. The keys associated with Category ID "314" are "329", "331", "332" and "333", which are Question IDs. Each category will have multiple questions. For different records, the category and question IDs will be dynamic. For example, for another record, the Category ID and the Question IDs associated with that category will be different. I need to access the values associated with the question id keys. For example, to access the value "1.5" I need to do this in my schema file:
**sql: `(answers_json -> 'question_weights_calc' -> '314' ->> '329')`**
But the issue here is that those ids will be dynamic for different records in the database. Instead of "314" and "329", they can be some other numbers. Adding a different record's JSON here for clarification:
**answers_json:**
"question_weights_calc"=>{"129"=>{"273"=>6.0, "275"=>15.0, "277"=>8.0}, "252"=>{"279"=>3.0, "281"=>8.0, "283"=>3.0}}}
How can I know and access those dynamic IDs and their values, given that I also need to perform mathematical operations on the values? Thanks!
As a general rule, it's difficult to run SQL-based reporting on highly dynamic JSON data. Postgres does have some useful functions for dealing with JSON, and you might be able to use jsonb_each or jsonb_object_keys plus a few joins to get there, but it's quite likely that the performance and maintainability of such a query would be difficult, to say the least 😅 Cube.js ultimately executes SQL queries, so if you do go the above route, the query should be easily transferable to a Cube.js schema.
Another approach would be to create a separate data processing pipeline that collects all the JSON data and flattens it into a single table. The pipeline should then store this data back in your database of choice, from where you could then use Cube.js to query it.
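As a hedged sketch of the first route (the table name answers and its id column are assumptions, not taken from the question), jsonb_each can unnest the two levels of dynamic keys into ordinary rows:
-- Sketch only: "answers" and its "id" column are assumed names.
SELECT a.id,
       cat.key          AS category_id,
       q.key            AS question_id,
       q.value::numeric AS weight
FROM answers AS a,
     jsonb_each(a.answers_json -> 'question_weights_calc') AS cat,
     jsonb_each_text(cat.value) AS q;
Each result row carries one (category, question, weight) triple, so sums or averages over the weights become ordinary aggregates, and the query could be dropped into a Cube.js cube's sql property or materialized by the separate pipeline described above.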

Is there a way to connect multiple keys to the same value in Core Data?

I have a problem where I want to search for data in an application's database managed by Core Data. The problem is that the key I would use to query the database may have several similar ways of being written. For example, I want to access the same data element named "tomato" with either the key "tomato" or "tomatoes". All other data fields would be the same. Does Core Data offer any built-in functionality to create aliases for a key so that a single element can be accessed by multiple keys?
I tried adding duplicate elements that differ only by the "name" attribute, but I do not want to do this for every entry, as it would require my database to use at least twice as much space.

How to mark data as demo data in SQL database

We have Accounts, Deals, Contacts, Tasks and some other objects in the database. When a new organisation signs up, we want to set up some of these objects as "Demo Data" which they can view/edit and delete as they wish.
We also want to give the user the option to delete all demo data so we need to be able to quickly identify it.
Here are two possible ways of doing this:
Have a "IsDemoData" field on all the above objects : This would mean that the field would need to be added if new types of demo data become required. Also, it would increase database size as IsDemoData would be redundant for any record that is not demo data.
Have a DemoDataLookup table with TableName and ID. The ID here would not be a strong foreign key but a theoretical foreign key to a record in the table stated by table name.
Which of these is better and is there a better normalised solution.
As a DBA, I think I'd rather see demo data isolated in a schema named "demo".
This is simple with some SQL database management systems, not so simple with others. In PostgreSQL, for example, you can write all your SQL with unqualified names, and put the "demo" schema first in the schema search path. When your clients no longer want the demo data, just drop the demo schema.
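A minimal sketch of that approach in PostgreSQL (the accounts table is an assumed example, and the demo rows would be seeded by your own setup code):
-- Sketch only: "accounts" is an assumed table name.
CREATE SCHEMA demo;
CREATE TABLE demo.accounts (LIKE public.accounts INCLUDING ALL);
-- ...create and populate the other demo tables the same way...

-- Resolve unqualified names against the demo schema first:
SET search_path TO demo, public;

-- When the customer deletes their demo data:
DROP SCHEMA demo CASCADE;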

How to create a fact table using natural keys

We've got a data warehouse design with four dimension tables and one fact table:
dimUser: id, email, firstName, lastName
dimAddress: id, city
dimLanguage: id, language
dimDate: id, startDate, endDate
factStatistic: id, dimUserId, dimAddressId, dimLanguageId, dimDate, loginCount, pageCalledCount
Our problem is: we want to build the fact table, which involves calculating the statistics (depending on userId and date range) and filling in the foreign keys.
But we don't know how, because we don't understand how to use natural keys (which seem to be the solution to our problem, according to the literature we have read).
I believe a natural key would be the userId, which is needed in all ETL jobs that calculate the dimension data.
But there are many difficulties:
in the ETL jobs' load() step, we do bulk inserts with INSERT IGNORE INTO to remove duplicates => we don't know which surrogate keys were generated
if we create metadata (a set of dimension_name, surrogate_key, natural_key), this will not work because of the duplicate elimination
The problem seems to be the duplicate elimination strategy. Is there a better approach?
We are using MySQL 5.1, if it makes any difference.
If your fact table is tracking logins and page calls per user, then you should have a set of source tables which track these things, and that is where you'll load your fact table data from. I would probably build the fact table at the grain of one row per user / login date - or even lower, to persist atomic data if at all possible.
Here you would then have a fact table with two dimensions - User and Date. You can persist address and language as dimensions on the fact as well, but these are really just attributes of user.
Your dimensions should have surrogate keys, but also should have the source "business" or "natural" key available - either as an attribute on the dimension itself, or through a mapping table as your colleague suggested. It's not "wrong" to use a mapping table - it does make things easier when there are multiple sources.
If you store the business keys on a mapping table, or in the dimension as an attribute, then for each row to load into the fact, it's a simple lookup (usually via a join) against the dim or mapping table to get the surrogate key for the user (and then from the user to get the user's "current" address / language to persist on the fact). The date dimension usually has a surrogate key stored in a YYYYMMDD or other "natural" format - you can just generate this from the date information on the source record that you're loading into the fact.
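A rough sketch of that lookup join (MySQL-flavoured; the staging table stg_logins, the natural-key column userId on dimUser, and the currentAddressId / currentLanguageId attributes are all assumptions, not part of the schema above):
-- Sketch only: stg_logins(userId, loginDate, loginCount, pageCalledCount) is an assumed staging table.
INSERT INTO factStatistic (dimUserId, dimAddressId, dimLanguageId, dimDate, loginCount, pageCalledCount)
SELECT u.id,
       u.currentAddressId,   -- assumed attribute on dimUser
       u.currentLanguageId,  -- assumed attribute on dimUser
       d.id,
       s.loginCount,
       s.pageCalledCount
FROM stg_logins AS s
JOIN dimUser AS u ON u.userId = s.userId                           -- natural-key lookup resolves the surrogate key
JOIN dimDate AS d ON s.loginDate BETWEEN d.startDate AND d.endDate;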
Don't force everything into a single query; try to load the data in separate queries and combine it in some provider layer...

How can I add an index on an attr_encrypted db field?

I have
attr_accessible :access_token
attr_encrypted :access_token, key: ENV['ENCRYPTION_KEY']
and I'm doing some User.find_by_access_token calls, so I'd like to index the field in the db.
However, no access_token column exists, only encrypted_access_token.
Does indexing this do the same thing as indexing any other field?
The whole point of saving encrypted data is to prevent the cleartext from showing. Obviously you cannot search for it, or the whole concept would be flawed.
You can index the encrypted token with a plain index and search the table with the encrypted token - for which you obviously need the encryption key.
CREATE INDEX tbl_encrypted_access_token_idx ON tbl(encrypted_access_token);
SELECT *
FROM tbl
WHERE encrypted_access_token = <encrypted_token>;
If all your tokens can be decrypted with an IMMUTABLE Postgres function, you could use an index on an expression:
CREATE INDEX tbl_decrypted_token_idx
ON tbl(decrypt_func(encrypted_access_token));
SELECT *
FROM tbl
WHERE decrypt_func(encrypted_access_token) = <access_token>;
Note that the expression has to match the expression in the index for the index to be useful.
But that would pose a security hazard on multiple levels.
I Googled and found a reference to Fast Search on Encrypted Field. A comment mentioned "deidentifying the data". Seems to be an accepted method.
Here's how I envision it working. In this example I separate the patient name from the rest of the Patient record.
Patient Row: [id=1, name_and_link=9843565346598789, …]
Patient_name Row: [id=1, name=”John”, patient_link=786345786375657]
The name_and_link field is an encrypted copy of two fields: the name and a link to the Patient_name. Having the name in both tables is redundant. I suggest it to provide faster access (no need to read from the Patient_name table). Also allows recreating the Patient_name table if necessary (e.g. if the two tables become out of sync).
The Patient_name table contains the unencrypted copy of the name value. The name row can be indexed for fast access. To search by name, locate matching names in the Patient_name table and then use the encrypted links back to the Patient table.
Note: In the example above I show long numbers as sample encrypted data. It's actually worse in real life. Depending on the encryption method, the minimum length of an encrypted value is about 67 bytes using Postgres' pgp_sym_encrypt() function. That means encrypting the letter "x" turns into 67 bytes. And I'm proposing two encrypted fields for each de-id'd field. That's why I suggest encrypting the name and the link together (as JSON tuple?) in the Patient table. Cuts the space overhead in half versus encrypting two fields separately.
Note: This requires breaking some fields into pieces. (e.g. phone numbers, SSN, addresses). Each part would need to be stored in a separate table. Even the street portion of an address would have to be further subdivided. This is becoming complicated. I'd like to see Postgres automate this.
Note: Just controlling access to the password is a tough issue itself.
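A minimal sketch of that layout in PostgreSQL with pgcrypto (table and column names are purely illustrative, and 'encryption-key' stands in for however you actually manage the key):
-- Sketch only: illustrative schema for the de-identified layout described above.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE patient (
    id            bigserial PRIMARY KEY,
    name_and_link bytea NOT NULL      -- pgp_sym_encrypt of the name plus a link back to patient_name
);

CREATE TABLE patient_name (
    id           bigserial PRIMARY KEY,
    name         text NOT NULL,       -- cleartext copy, safe to index
    patient_link bytea NOT NULL       -- encrypted pointer back to patient.id
);

CREATE INDEX patient_name_name_idx ON patient_name (name);

-- Search by name: match in patient_name, then decrypt the link to reach the patient row.
SELECT pgp_sym_decrypt(patient_link, 'encryption-key')::bigint AS patient_id
FROM patient_name
WHERE name = 'John';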
