How to mark data as demo data in SQL database - normalization

We have Accounts, Deals, Contacts, Tasks and some other objects in the database. When a new organisation signs up, we want to set up some of these objects as "Demo Data" which they can view, edit and delete as they wish.
We also want to give the user the option to delete all demo data so we need to be able to quickly identify it.
Here are two possible ways of doing this:
Have a "IsDemoData" field on all the above objects : This would mean that the field would need to be added if new types of demo data become required. Also, it would increase database size as IsDemoData would be redundant for any record that is not demo data.
Have a DemoDataLookup table with TableName and ID. The ID here would not be an enforced foreign key but a "theoretical" foreign key to a record in the table named by TableName.
Which of these is better, and is there a better normalised solution?
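For reference, a rough sketch of the two proposed designs in Postgres-flavoured DDL (the object tables and any column names beyond IsDemoData, TableName and ID are illustrative):

```sql
-- Option 1: a flag column on every object table
ALTER TABLE Accounts ADD COLUMN IsDemoData boolean NOT NULL DEFAULT false;
ALTER TABLE Deals    ADD COLUMN IsDemoData boolean NOT NULL DEFAULT false;
-- ...repeated for Contacts, Tasks, and any future object types

-- Option 2: a single lookup table pointing at demo rows by table name + id
CREATE TABLE DemoDataLookup (
    TableName varchar(128) NOT NULL,
    ID        integer      NOT NULL,  -- "theoretical" FK, not enforced by the database
    PRIMARY KEY (TableName, ID)
);
```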

As a DBA, I think I'd rather see demo data isolated in a schema named "demo".
This is simple with some SQL database management systems, not so simple with others. In PostgreSQL, for example, you can write all your SQL with unqualified names, and put the "demo" schema first in the schema search path. When your clients no longer want the demo data, just drop the demo schema.
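A minimal sketch of that approach in PostgreSQL, assuming object tables named after the ones in the question:

```sql
-- Demo rows live in their own schema, with the same table shapes as production.
CREATE SCHEMA demo;
CREATE TABLE demo.accounts (LIKE public.accounts INCLUDING ALL);
CREATE TABLE demo.contacts (LIKE public.contacts INCLUDING ALL);
-- ...and so on for deals, tasks, etc.

-- With the demo schema first in the search path, unqualified names resolve to it:
SET search_path TO demo, public;

-- When the client no longer wants the demo data:
DROP SCHEMA demo CASCADE;
```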


Json Column in Existing Postgres Table vs New Table

I have a "catalog" that I am trying to display information on. This information will be pulled from a few different tables that a user will be able to set a preference to hide a record from the respective table on their "catalog". I am running a Postgres database
So, my question is:
Would it be better (performance-wise) to create a new table (table_a_to_catalog) that stores the table_a_id and the catalog_id for each record from table_a that the user wants to hide on that catalog, then have another table (table_b_to_catalog) to hold that connection, and so on?
OR
Would it be better to store the hide preference as a json value in the record of the catalog? So it would be something like {"table_a" => [id1, id2, id3], "table_b" => [id1, id2, id3]}
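For concreteness, a sketch of both options as Postgres DDL, reusing the hypothetical names from the question (jsonb assumes Postgres 9.4+; plain json works on older versions):

```sql
-- Option 1: one join table per source table
CREATE TABLE table_a_to_catalog (
    table_a_id integer NOT NULL REFERENCES table_a (id),
    catalog_id integer NOT NULL REFERENCES catalog (id),
    PRIMARY KEY (table_a_id, catalog_id)
);
-- ...and similarly table_b_to_catalog, table_c_to_catalog, ...

-- Option 2: a JSON column on the catalog row itself
ALTER TABLE catalog ADD COLUMN hidden_records jsonb NOT NULL DEFAULT '{}';
-- UPDATE catalog
-- SET hidden_records = '{"table_a": [1, 2, 3], "table_b": [4, 5]}'
-- WHERE id = 1;
```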
It really depends on the use case for this catalog. If the information is read-only and you are running a job once a day to update the catalog, then JSON would be better. However, if you want to update information on the catalog live and allow it to be editable, then having a separate table would be best.
As for personal preference, I think keeping the data in a table allows more flexibility when you want to use the data for other features.
Having very large tables negatively impacts performance. Keeping "hide" view data in a Postgres table means having a DB row for each hidden entry in each catalog. Each client application will need to filter that table for information relevant to its user, and with many users this could take considerable time.
If one simply adds a field to the user table containing an hstore, JSON or CSV of view data (e.g. hide preferences), that will reduce the initial load time marginally. JSON would make more sense if "hiding" means simply not displaying it client-side, whereas hstore makes more sense if you wish to not send the data to the client to begin with.
I say marginally because many other factors (caching) will impact performance more than this. You may want to look into using Redis for the application runtime and Postgres for data warehousing.

iOS Parse table structure

I am creating an application which requires the user to register. All data entered by the user will be stored in a table called "customer". Part of the information being collected is the address, but I don't want to congest the table structure and would like to store the address as an object (city, address, post code, etc.).
What's the best practice: create an address table and reference it through a foreign key in the customer table, or store the customer's address as an object inside the customer table?
I am not sure how Parse fully functions, so I'm looking for your experience in the answer.
Thanks
I faced this exact problem a few months ago, and solved it by having a pointer in the customer object structure to the additional data. Note that if you do this, you'll need to make sure to include the pointed-to field in future customer queries, or the data won't be fetched.
Retrospectively, I'm not sure I'd recommend splitting the objects up. It does create a more normalised data structure, but Parse fights against this in several ways:
You have to remember to include the pointed-to field in all future queries. This is a pain.
You can only follow pointers up to a certain depth within a query (I think 3?)
Parse charges you by the database access, so the extra queries that normalised data requires can be an issue.
Parse doesn't really support atomic operations or transactional queries, so it's easy to get your data into an inconsistent state if you're not careful about when you save. For example, you update your customer record, go to change the address record, and have the second query fail. Now you're in a "half updated state", and without transaction rollback, you'll have to fix it yourself (and you might not even know it's broken!).
Overall, were I to use Parse again (unlikely), I'd probably stick with giant denormalised objects.
Here is a solution for handling the two tables with the help of a userId.
Note: you are creating a REGISTRATION table and filling in some of its data from your own code. You can then create another table for Address. When you create the Address table, the question is how you manage the two tables. It's simple: use the same user id in both the REGISTRATION and ADDRESS tables. With that unique "userid" you can then, as your requirements demand, look up the details from both tables and merge them.
Hope this resolves your problem.

How to create a fact table using natural keys

We've got a data warehouse design with four dimension tables and one fact table:
dimUser: id, email, firstName, lastName
dimAddress: id, city
dimLanguage: id, language
dimDate: id, startDate, endDate
factStatistic: id, dimUserId, dimAddressId, dimLanguageId, dimDate, loginCount, pageCalledCount
Our problem is: we want to build the fact table, which involves calculating the statistics (per userId and date range) and filling in the foreign keys.
But we don't know how, because we don't understand how to use natural keys (which seems to be the solution to our problem according to the literature we read).
I believe a natural key would be the userId, which is needed in all ETL jobs which calculate the dimension data.
But there are many difficulties:
in the ETL jobs load(), we do bulk inserts with INSERT IGNORE INTO to remove duplicates => we don't know the surrogate keys which were generated
if we create meta data (including a set of dimension_name, surrogate_key, natural_key) this will not work because of the duplicate elimination
The problem seems to be the duplicate elimination strategy. Is there a better approach?
We are using MySQL 5.1, if it makes any difference.
If your fact table is tracking logins and page calls per user, then you should have a set of source tables which track these things, which is where you'll load your fact table data from. I would probably build the fact table at the grain of one row per user / login date, or even lower, to persist atomic data if at all possible.
Here you would then have a fact table with two dimensions - User and Date. You can persist address and language as dimensions on the fact as well, but these are really just attributes of user.
Your dimensions should have surrogate keys, but also should have the source "business" or "natural" key available - either as an attribute on the dimension itself, or through a mapping table as your colleague suggested. It's not "wrong" to use a mapping table - it does make things easier when there are multiple sources.
If you store the business keys in a mapping table, or in the dimension as an attribute, then for each row to load into the fact it's a simple lookup (usually via a join) against the dim or mapping table to get the surrogate key for the user (and then from the user to get the user's "current" address / language to persist on the fact). The date dimension usually has a surrogate key stored in a YYYYMMDD or other "natural" format - you can just generate this from the date information on the source record that you're loading into the fact.
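As a sketch, assuming the business key is kept on dimUser and the source data sits in a staging table (staging_statistics, sourceUserId and activity_date are made-up names), the fact load then looks roughly like this in MySQL:

```sql
-- Resolve the user's surrogate key by joining staging data to dimUser on the
-- business (natural) key, and the date key by joining to dimDate's range.
-- Address/language lookups would be additional joins of the same kind.
INSERT INTO factStatistic (dimUserId, dimDate, loginCount, pageCalledCount)
SELECT
    du.id,
    dd.id,
    s.login_count,
    s.page_called_count
FROM staging_statistics s
JOIN dimUser du ON du.sourceUserId = s.user_id
JOIN dimDate dd ON s.activity_date BETWEEN dd.startDate AND dd.endDate;
```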
Don't force it into a single query; try to load the data in separate queries and combine the data in some provider...

rails create table in db dynamically

Normally, to create/alter a table in the database I use migrations (manually running rake db:migrate) and then in my code I use ActiveRecord. This is very cool, as I don't have to worry about the representation of the data in the db or about the specific kind of db (sqlserver, pg or other).
But now a customer wants to be able to create "things" on the fly himself: say, he starts selling computers, so he wants an interface where he can dynamically create an object "computer" with properties like "Name, RAM, HD, ...". It seems quite natural to create a separate table in the db with all these fields. But how can I do that in RoR and keep all the nice things about ActiveRecord?
Please suggest.
The usual way is to do exactly the opposite:
Have a table for object types
Have a table for field names for each object type
Have a very big table with all the custom attributes for each object of any type
This is called EAV (the Entity-attribute-value model, see http://en.wikipedia.org/wiki/Entity-attribute-value_model). And it scales pretty badly.
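For illustration, a minimal EAV schema of the kind described above, in Postgres-style DDL (all names hypothetical; a Rails migration would generate the equivalent):

```sql
-- One row per user-defined object type ("computer", "monitor", ...)
CREATE TABLE object_types (
    id   serial PRIMARY KEY,
    name varchar(255) NOT NULL
);

-- One row per field defined for a type ("Name", "RAM", "HD", ...)
CREATE TABLE object_fields (
    id             serial PRIMARY KEY,
    object_type_id integer NOT NULL REFERENCES object_types (id),
    name           varchar(255) NOT NULL
);

-- The very big table: one row per attribute value of every object
CREATE TABLE object_values (
    id              serial PRIMARY KEY,
    object_id       integer NOT NULL,
    object_field_id integer NOT NULL REFERENCES object_fields (id),
    value           text
);
```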
Alternatively, you can use a serialized store text column instead of the big EAV table (see http://api.rubyonrails.org/classes/ActiveRecord/Store.html), so you don't have to make those difficult attribute retrievals typical of EAV. You still need to store the "object type" definitions somewhere, so that the expected fields etc. are available when building forms and tables.
The problem with this approach is that you are not able to query (where/join/select) on those attributes because they are not columns. There are a number of solutions to that:
Don't do filtering on those attributes (meh...)
Have an external search server that's able to do faceted search
(as #Amar correctly says) Use a document database
Use PostgreSQL with an hstore column instead of a simple serialized column (a sketch follows this list).
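As a rough sketch of the hstore option, assuming a hypothetical products table with a properties column:

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE products (
    id         serial PRIMARY KEY,
    name       varchar(255) NOT NULL,
    properties hstore NOT NULL DEFAULT ''
);

-- Unlike a plain serialized text column, hstore keys can be queried and indexed:
SELECT * FROM products WHERE properties -> 'ram' = '16GB';
CREATE INDEX idx_products_properties ON products USING gin (properties);
```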
A NoSQL database (a document database such as MongoDB or CouchDB) can be the best fit for this, or use Redis. In my opinion you could also use the vertical table concept; try running the Rails 2.x demo application for MySQL.
You can try MongoDB; check whether it fits your needs.

How to design DynamoDB tables while keeping a relation between two entities

Hi, I am new to DynamoDB and, to my knowledge, it is a non-relational db, i.e. we can't join tables. My question is how to design the table structure. Please clarify with the following example.
I have the following tables:
1) users - user_id, username, password, email, phone number, role
2) roles - id, name [i.e. admin, supervisor, etc.]
a) My first question: is there any provision to set auto-increment for the user_id field?
b) Is setting user_id as the primary key the correct way?
c) Is this the correct method to store a user's role in DynamoDB, i.e. a roles table containing id and title, with the role id stored in the user table?
e) Is it possible to retrieve the data of both tables along with each user? I am using Rails 3 and the aws-sdk gem.
Any reply will be very helpful to me as a new DynamoDB user.
Typically with NoSQL-style databases you would provide the unique identifier, rather than having an auto-increment PK field do that for you. This usually means you would have a GUID as the key for each User record.
As far as the user roles, there are many ways to accomplish this and each has benefits and problems:
One simple way would be to add a "Role" attribute to the Users table and have one entry per role for that user. Then you could grab the User and you would have all the roles in one query. DynamoDB allows attributes to have multiple values, so one attribute can have one value per role.
If you need to be able to query users in a particular role (ie. "Give me all the Users who are Supervisors") then you will be doing a table scan in DynamoDB, which can be an expensive operation. But, if your number of users is reasonably small, and if the need to do this kind of lookup is infrequent, this still may be acceptable for your application.
If you really need to do this expensive type of lookup often, then you will need to create a new table something like "RolesWithUsers" having one record per Role, with the userIds of the users in the role record. For most applications I'd advise against doing something like this, because now you have two tables representing one fact: what role does a particular user have. So, delete or update needs to be done in two places each time. Not impossible to do, but it takes more vigilance and testing to be sure your application doesn't get wrong data. The other disadvantage of this approach is that you need two queries to get the information, which may be more expensive than the table scan, again, depending on the quantity of records.
Another option that makes sense for this specific use case would be to use SimpleDB. It has better querying capability (all attributes are indexed by default), and a single table with roles as a multi-valued attribute is going to be a much better solution than DynamoDB in this case.
Hope this helps!
We have a similar situation and we simply use two DBs, a relational and a NoSQL (Dynamo). For a "User" object, everything that is tied to other things, such as roles, projects, skills, etc, that goes in relational, and everything about the user (attributes, etc) goes in Dynamo. If we need to add new attributes to the user, that is fine, since NoSQL doesn't care about those attributes. The rule of thumb is if we only need something on that object page (that is, we don't need to associate with other objects), then we put in Dynamo. Otherwise, it goes in relational.
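As a sketch of the relational side of such a hybrid (all names assumed), with the DynamoDB item for each user keyed by the same user_id and holding the free-form attributes:

```sql
-- Relational side: anything that relates users to other entities
CREATE TABLE users (
    user_id varchar(36)  PRIMARY KEY,   -- e.g. a GUID, reused as the DynamoDB key
    email   varchar(255) NOT NULL
);

CREATE TABLE roles (
    id   serial PRIMARY KEY,
    name varchar(64) NOT NULL           -- admin, supervisor, ...
);

CREATE TABLE user_roles (
    user_id varchar(36) NOT NULL REFERENCES users (user_id),
    role_id integer     NOT NULL REFERENCES roles (id),
    PRIMARY KEY (user_id, role_id)
);
```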
Using a table scan on the NoSQL DB is not really an option after you cross even a small threshold (up to that point, you can just use an in memory DB anyway).
