I have found some tutorials, but they still leave me with questions.
Let's take a classic example of 2 tables, one for customer details and one for order details.
The customers table in the database has:
an autoincrementing integer customer_id as primary key
a text field for customer name
a text field for contact details
And the orders table has:
an integer customer_id which is a foreign key referencing the customers table
some other stuff, such a reference to a bunch of item numbers
an integer order_value to store the cash value of the order
I need two dataset components, two queries and a connection.
So far, so good? Or did I miss something already?
Now, the tutorials say that I have to set the MasterSource of the datasource which corresponds to the DB grid showing the orders table to be the datasource which corresponds to the customers table, and the MasterFields, in this case, to customer_id.
Anything else? Should I for instance set the Detailfields of the query of the datasource which corresponds to the customers table to customer_id?
Should I use the properties, or a parameterized query?
Ok, at this point, we have followed the classic tutorials and can scroll through the customers DB grid and see all orders for the current customer shown in the orders DB grid. When the user clicks the customers DB grid I have to Close(); then Open(); the orders query to refresh its corresponding DB grid.
However, those tutorials always seem to posit a static database with existing contents which never change.
When I asked another question, I gave an example where I was using a Command to INSERT INTO orders... and was told that that is A Bad Thing and I should:
OrdersQuery.Append();
OrdersQuery.FieldByName('customer_id').Value := [some value];
OrdersQuery.FieldByName('item_numbers').Value := [some value];
OrdersQuery.FieldByName('order_value').Value := [some value];
OrdersQuery.Post();
Is that correct?
I ask because it seems to me that a Command puts data in and a query should only take it out, but I can see that a command doesn't have linkage to the DB grid via a datasource's query.
Is this a matter of choice, or must the query be used?
If so, it seems that I can't use even simple SQL functions such as SUM, MIN, AVG, MAX in the query and have to move those into my code.
If I must use the query, how do I implement SQL UPDATE and DROP?
And, finally, can I have a Master/Detail/Detail relationship?
Let's say I want a 3rd DB grid, which shows the total and average of all orders for a customer. It gets its data from the orders table (but can't use SUM and AVG), which is updated each time the user selects a different customer, thus giving a Master/Detail/Detail relationship. Do I just set that up as two Master/Detail relationships? I.e., the DB grid, datasource and query for the total and average orders refer only to orders and have no reference to customers, even if they do use customer_id?
Thanks in advance for any help and clarification. I hope that this question will become a reference for others in the future (so, feel free to edit it).
TLDR: In the SQL world, Master/Detail is an archaism.
When some people say "Master Detail" they aren't going to go all the way down the rabbit hole. Your question suggests you do want to. I'd like to share a few things that I think are helpful, but I don't see that anyone can really answer your questions completely.
A minimal implementation of master detail, for any two datasets, for some people's purposes, is nothing more than an event handler firing when the currently selected row in the master table changes. This row is then used to filter the rows in the detail table dataset, so that only the rows that match the primary key of the master row are visible. This is done for you, if you configure it properly, in most of the TTable-like objects in Delphi's VCL, but even Datasets that do not explicitly support master/detail configurations can be made to function this way, if you are willing to write a few event handlers, and filter data.
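A minimal sketch of that idea, written in Python rather than Delphi purely for illustration. The class, the handler name and the sample rows are all hypothetical; the point is only that "master detail" can be reduced to one handler that re-filters the detail rows whenever the selected master row changes.

```python
# Hypothetical in-memory datasets; names and rows are illustrative only.
customers = [
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "Bob"},
]
orders = [
    {"customer_id": 1, "order_value": 120},
    {"customer_id": 1, "order_value": 80},
    {"customer_id": 2, "order_value": 50},
]


class MasterDetailLink:
    """Keeps a filtered view of detail rows in step with the selected master row."""

    def __init__(self, master_rows, detail_rows, key):
        self.master_rows = master_rows
        self.detail_rows = detail_rows
        self.key = key
        self.visible_details = []

    def on_master_row_changed(self, index):
        # Event handler: called whenever the selected master row changes.
        master_key = self.master_rows[index][self.key]
        self.visible_details = [
            row for row in self.detail_rows if row[self.key] == master_key
        ]


link = MasterDetailLink(customers, orders, "customer_id")
link.on_master_row_changed(0)  # user selects the first customer
first_customer_orders = list(link.visible_details)
link.on_master_row_changed(1)  # user selects the second customer
second_customer_orders = list(link.visible_details)
```

In a real application the handler would be wired to whatever "row changed" event the grid or dataset exposes.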
At one of my former employers, someone had invented a Master Detail controller component, which worked along with a little-known variant of ADO components for Delphi known as Kamiak. It had some properties which people who are only familiar with the BDE-TTable era concept of master detail would not have expected. It was a very clever bit of work, with the following features:
You could create an ADO recordset and hold it in memory, and then as a batch, write a series of detail rows, all at once, if and only if the master row was to be stored to the disk.
You could nest these master-detail relationships to almost arbitrary depths, so you could have master, detail and sub-detail records. Batch updates were used for UPDATES, to answer that part of your question. To handle updates you need to either roll your own ORM or Recordset layer, or use a pre-built caching/recordset layer. There are many options, from ADO, to the various ORM-like components for Delphi, or even something involving client-datasets or a briefcase model with data pumps.
You could modify and post data into an in-memory staging area, and flush all the master and detail rows at once, or abandon them. This allowed a nearly object-relational level of persistence management.
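To make the staging idea concrete, here is a rough sketch in Python with SQLite. It is not how the component described above worked internally (that is not known here); the StagingArea class, its method names and the table layout are assumptions made for illustration. It buffers one master row and its detail rows in memory, then either writes them in a single transaction or abandons them.

```python
import sqlite3


class StagingArea:
    """Hypothetical in-memory buffer for one master row and its detail rows."""

    def __init__(self):
        self.master = None
        self.details = []

    def stage_master(self, customer_id, name):
        self.master = (customer_id, name)

    def stage_detail(self, order_value):
        self.details.append(order_value)

    def abandon(self):
        # Discard everything staged so far; the database is never touched.
        self.master = None
        self.details = []

    def flush(self, conn):
        # Write master and details in one transaction: all rows or none.
        if self.master is None:
            return False
        customer_id, name = self.master
        try:
            with conn:  # commits on success, rolls back on an exception
                conn.execute(
                    "INSERT INTO customers (customer_id, name) VALUES (?, ?)",
                    (customer_id, name),
                )
                conn.executemany(
                    "INSERT INTO orders (customer_id, order_value) VALUES (?, ?)",
                    [(customer_id, value) for value in self.details],
                )
        except sqlite3.Error:
            return False
        self.abandon()  # clear the buffer once the rows are stored
        return True


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        customer_id INTEGER NOT NULL,
        order_value INTEGER NOT NULL
    );
""")

good = StagingArea()
good.stage_master(1, "Alice")
good.stage_detail(120)
good.stage_detail(80)
good_flushed = good.flush(conn)

bad = StagingArea()
bad.stage_master(2, "Bob")
bad.stage_detail(None)  # violates NOT NULL, so the whole batch is rolled back
bad_flushed = bad.flush(conn)

customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The failed batch leaves no half-written master row behind, which is the property the staging approach is after.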
As lovely as the roll-your-own-ORM approach seems above, it was not without its dark side. Strange bugs in the system led me to never want to use such an approach again. I do not wish to overstate things, but can I humbly suggest that there is such a thing as going too far down the master-detail rabbit-hole? Don't go there. Or if you do, realize that you're really building a mini ORM, and be prepared to do the work, which should include a pretty solid set of unit tests and integration tests. Even then, be aware that you might discover some pretty strange corner cases, and might find that a few really wicked bugs are lurking in your beautiful ORM/MasterDetail thing.
As far as inserts go, that of course depends on whether you are a builder or a user. A person who is content to build atop whatever Table classes are in the VCL, and who never wants to dirty their hands with SQL, is going to think your approach is wrong-headed if you are not afraid of SQL. I wonder how that person is going to deal with auto-assigned identity primary keys, though. I store a person record in a table, and immediately I need to fetch back that person's newly assigned ID, which is an integer. I then use that integer primary key to associate my detail rows with the master row, so the detail rows refer to the master row's ID as a foreign key, because my SQL database is nicely constructed, with referential integrity constraints. Because I've thought about all this in advance and don't want to do it over and over again, I eventually get from here to building an object-relational-mapping framework. I hope you can see how your many questions have many possible answers, answers which have led to hundreds or millions of possible approaches, and there is no one right one. I happen to be a disbeliever in ORMs, and I think the safe place to get off this crazy train is before you get on it. I hand code my SQL, and I hand code my business objects, and I don't use any fancy Master Detail or ORM stuff. You, however, may choose to do as you like.
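The identity-key round trip described above can be sketched with hand-written SQL. This uses Python and SQLite only as a stand-in; the person and phone tables are hypothetical, and other databases expose the new key differently (SQLite's driver happens to offer it as lastrowid).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY AUTOINCREMENT,
        name      TEXT NOT NULL
    );
    CREATE TABLE phone (
        person_id INTEGER NOT NULL REFERENCES person(person_id),
        number    TEXT NOT NULL
    );
""")

# Store the master row, then fetch back the key the database assigned to it.
cur = conn.execute("INSERT INTO person (name) VALUES (?)", ("Alice",))
person_id = cur.lastrowid

# Detail rows carry that key as their foreign key.
conn.execute(
    "INSERT INTO phone (person_id, number) VALUES (?, ?)",
    (person_id, "555-0100"),
)

# A detail row pointing at a master row that does not exist is rejected.
try:
    conn.execute(
        "INSERT INTO phone (person_id, number) VALUES (?, ?)",
        (person_id + 1000, "555-0199"),
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

conn.commit()
phone_count = conn.execute("SELECT COUNT(*) FROM phone").fetchone()[0]
```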
What I would have implemented as "master detail" in the BDE/dBase/flat-file era, I now simply implement as a query for master rows and a second query for detail rows. When the master row changes, I refresh the detail rows query, and I do not use the MasterSource or related Master/Detail properties in the TTable-objects at all.
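As a rough sketch of that two-query approach (Python and SQLite for illustration; the function names and sample data are hypothetical), the detail query is simply re-run with the selected customer's key. A third, aggregate query can be refreshed the same way, which is one way to get the "total and average" grid from the question while still using SUM and AVG in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, order_value INTEGER);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 120), (1, 80), (2, 50);
""")


def fetch_orders(conn, customer_id):
    # Detail query: parameterized on the currently selected master key.
    return conn.execute(
        "SELECT order_value FROM orders WHERE customer_id = ? ORDER BY order_value",
        (customer_id,),
    ).fetchall()


def fetch_order_summary(conn, customer_id):
    # Second detail query: aggregates are computed by the database, not in code.
    return conn.execute(
        "SELECT SUM(order_value), AVG(order_value) FROM orders WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()


def on_customer_selected(conn, customer_id):
    # Called when the master row changes; both detail views are refreshed.
    return {
        "orders": fetch_orders(conn, customer_id),
        "summary": fetch_order_summary(conn, customer_id),
    }


alice_view = on_customer_selected(conn, 1)
bob_view = on_customer_selected(conn, 2)
```

Both detail queries depend only on customer_id, so this is two independent master/detail links rather than a chained master/detail/detail.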
I am working on a design for an HR data mart using the Kimball approach outlined in 'The Data Warehouse Toolkit'.
As per the Kimball design, I was planning to have a time-stamped, slowly-changing dimension to track employee profile changes (to support point-in-time analysis of employee state) and a head-count periodic snapshot fact table to support measures of new hires, leavers, leave taken, salary paid etc.
The problem I've encountered is that, in some cases, our employees can be assigned to multiple roles/jobs and each one needs to be tracked separately (i.e. the grain of my facts has to be at job-level, not employee level).
How might the Kimball design be adapted to fit a scenario where employee and role/job form a hierarchy like this? Ideally, I want to avoid duplicating employee profile data (address, demographics etc) for each role/job an employee is assigned to, but does this mean I need to snow-flake the dimension?
Options I've been considering include the below - I'd be interested in any thoughts or suggestions the community has on this so all input is welcome!
1) (see attached, design 1) A snowflake-style approach with an employee table which has a 1-to-many link to a role table, which, in turn, has a 1-to-many link with the fact table. The advantage here is a clean employee dimension, but I don't want to introduce unnecessary complexity. Is there any reason why I shouldn't link both dimensions directly to the fact table? The snowflake designs I've seen don't seem to do this.
2) (see attached, design 2) A combined Employee/Role dimension where each employee has a record for each assigned role but only one of them is flagged as 'Primary Role'. Point-in-time queries on the dimension can be performed by constraining on the 'Primary Role' flag.
Anything that occurred is an event and can be a fact. When you look at relationships between data, you also need to ask whether the data value describes the entity (dim) or something that happened to/with the entity (fact). Everything can be a dim or a fact (sometimes both).
A job describes an event that happened to the employee. You should have a fact employeejob that relates to the Dim employee and Dim job (as well as your date dimensions). This will then allow you to break down absences by employee and job. Your dim job would really just be job title, pay grades, etc. The fact would contain effective dates. Research factless fact tables.
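A minimal sketch of such a factless fact table, using Python and SQLite for illustration. The table and column names (dim_employee, dim_job, fact_employee_job, effective_from, effective_to) and the sample rows are assumptions, not part of the original design; date keys are simplified to ISO date strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_employee (employee_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_job (job_key INTEGER PRIMARY KEY, title TEXT, pay_grade TEXT);
    -- Factless fact: no measures, just "this employee held this job in this period".
    CREATE TABLE fact_employee_job (
        employee_key   INTEGER REFERENCES dim_employee(employee_key),
        job_key        INTEGER REFERENCES dim_job(job_key),
        effective_from TEXT NOT NULL,
        effective_to   TEXT            -- NULL means the assignment is still open
    );
    INSERT INTO dim_employee VALUES (1, 'Alice');
    INSERT INTO dim_job VALUES (1, 'Analyst', 'G5'), (2, 'Trainer', 'G4');
    INSERT INTO fact_employee_job VALUES
        (1, 1, '2020-01-01', NULL),
        (1, 2, '2021-06-01', '2021-12-31');
""")


def jobs_on(conn, employee_key, as_of):
    # Point-in-time lookup: every job the employee held on the given date.
    rows = conn.execute(
        """
        SELECT j.title
        FROM fact_employee_job AS f
        JOIN dim_job AS j ON j.job_key = f.job_key
        WHERE f.employee_key = ?
          AND f.effective_from <= ?
          AND (f.effective_to IS NULL OR f.effective_to >= ?)
        ORDER BY j.title
        """,
        (employee_key, as_of, as_of),
    ).fetchall()
    return [title for (title,) in rows]


jobs_mid_2021 = jobs_on(conn, 1, "2021-07-15")
jobs_2022 = jobs_on(conn, 1, "2022-03-01")
```

The employee profile lives once in dim_employee, however many jobs the employee holds.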
Note that your vacancy reference would be part of a separate fact (when/where did you post it, how many applicants are all measurable facts about the vacancy). This may also be an example of a degenerate dimension.
I'm not fond of your monthly fact. I think that should just be some calculated measures built on fact absence and fact employeejob. When those events are put up against your dimensions, you can break them down by date, job type, manager, etc.
I'm building a DW just like the one from AdventureWorks. I have one fact table called FactSales, and there's a table in the database called SalesReason that tells us the reason why a certain customer buys our product.
The thing is there are two types of customers, the resellers and the online customers, and only the online customers have a sales reason linked to them.
First of all, can I have two dimension tables pointing to the same FK in the fact? Like in my case, Sk_OnlineCustomer and SK_Resseler both point to FK_Customer. Their ID numbers don't overlap.
And Second,
Should I build a reason dimension, link it to the fact and have a FK that most of the times is null or with a "dummy reason"?
Should I just put the reason in the fact sales without it being a key, just like a technical description that is nullable?
Should I divide the fact into two fact tables, with one for the resellers and one for the online customers? But even in that case, I would have some customers that don't give a reason, so the fk_reason would be null in some of its appearances in the new fact_Online_Customer.
In a solution I saw from the AdventureWorks tutorial, a new fact table called fact_reason is created. It links the factSales with a DimReason.
That looks like a good solution, but I don't know how it works, because I never learned in my classes that I could link a fact to a fact, so I wouldn't be able to justify my choice to my teacher.
If you could explain it I would appreciate it.
Thanks!
Please find my comments for your questions:
First of all, can I have two dimension tables pointing to the same FK in the fact? Like in my case, Sk_OnlineCustomer and SK_Resseler both point to FK_Customer. Their ID numbers don't overlap.
Yes. The dimension in this case would be Dim_Customer (for example), and this could be a role-playing dimension. You can expose reporting views to separate the online customers from the reseller customers.
And Second, Should I build a reason dimension, link it to the fact and have a FK that most of the times is null or with a "dummy reason"?
Yes, it would make sense to build a reason dimension. With it you can tag a fact record with its reason.
Should I divide the fact into two fact tables, with one for the resellers and one for the online customers? But even in that case, I would have some customers that don't give a reason, so the fk_reason would be null in some of its appearances in the new fact_Online_Customer.
I would suggest you keep one fact, as your business activity is sales; you can add context to it (online or reseller) using your dimensions. If you prefer, you can have a separate Dim_Sales dimension to hold the sales type and other details of the sale which you cannot include in the fact.
To summarise, you would probably be well served by the following design:
Fact_Sales linked to
Dim_Customer
Dim_Sales
Dim_Reason (this could also be folded into Dim_Sales)
Dim_Date (always include a date dimension when you build a DWH solution)
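A small sketch of that layout, using Python and SQLite for illustration. It assumes one reading of the "dummy reason" idea from the question: Dim_Reason carries a 'Not applicable' row with key 0, so Fact_Sales never needs a NULL reason key. The column names, the customer_type attribute and the sample rows are hypothetical, and Dim_Sales and Dim_Date are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,
        name          TEXT,
        customer_type TEXT              -- 'online' or 'reseller'
    );
    CREATE TABLE dim_reason (reason_key INTEGER PRIMARY KEY, reason_name TEXT);
    CREATE TABLE fact_sales (
        customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
        reason_key   INTEGER NOT NULL REFERENCES dim_reason(reason_key),
        amount       INTEGER NOT NULL
    );
    -- Key 0 is the dummy row used when no reason was recorded.
    INSERT INTO dim_reason VALUES (0, 'Not applicable'), (1, 'Price'), (2, 'Promotion');
    INSERT INTO dim_customer VALUES (1, 'Alice', 'online'), (2, 'Bike Shop', 'reseller');
    INSERT INTO fact_sales VALUES (1, 1, 100), (1, 0, 40), (2, 0, 500);
""")

sales_by_type_and_reason = conn.execute(
    """
    SELECT c.customer_type, r.reason_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    JOIN dim_reason   AS r ON r.reason_key = f.reason_key
    GROUP BY c.customer_type, r.reason_name
    ORDER BY c.customer_type, r.reason_name
    """
).fetchall()
```

Because every fact row joins to some reason row, inner joins do not silently drop reseller sales from reports.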
Hope that helps...
I'm struggling to understand the best way to model a particular scenario for a data warehouse.
I have a Person dimension, and a Tenancy dimension. A person could be on 0, 1 or (rarely) multiple tenancies at any one time, and will often have a succession of tenancies over time. A tenancy could have one or more people associated with it. The people associated with a tenancy can change over time, and tenancies generally last for many years.
One option is to add tenancy reference, start and end dates to the Person Dimension as type 2 SCD columns. This would work well as long as I ignore the possibility of multiple concurrent tenancies for a person. However, I have other areas of the data warehouse where I am facing a similar design issue and ignoring multiple relationships is not a possibility.
Another option is to model the relationship as an accumulating snapshot fact table. I'm not sure how well this would work in practice though as I could only link it to one version of a Person and Tenancy (both of which will have type 2 SCD columns) and that would seem to make it impossible to produce current or historical reports that link people and tenancies together.
Are there any recommended ways of modelling this type of relationship?
Edit based on the patient answer and comments given by SQL.Injection
I've produced a basic model showing the model as described by SQL.Injection.
I've moved tenancy start/end dates to the 'junk' dimension (Dim.Tenancy) and added Person tenancy start/end dates to the fact table as I felt that was a more accurate way to describe the relationship.
However, now that I see it visually I don't think that this is fundamentally any different from the model that I started with, other than the fact table is a periodic snapshot rather than an accumulating snapshot. It certainly seems to suffer from the same flaw that whenever I update a type 2 slowly changing attribute in any of the dimensions it is not reflected in the fact.
In order to make this work to reflect current changes and also allow historical reporting it seems that I will have to add a row to the fact table every time a SCD2 change occurs on any of the dimensions. Then, in order to prevent over-counting by joining to multiple versions of the same entity I will also need to add new versions of the other related dimensions so that I have new keys to join on.
I need to think about this some more. I'm beginning to think that the database model is right and that it's my understanding of how the model will be used that is wrong.
In the meantime any comments or suggestions are welcome!
Your problem is similar to sales transactions with multiple items. The difference is that a transaction usually has multiple items, while your tenancy fact usually has a single person (the tenant).
Your hydra is born because you are trying to model the tenancy as a dimension, when you should be modeling it as a fact.
The reason why I think you have a tenancy dimension is that somewhere you have a fact rent. To model the fact rent, consider using the same approach I stated above: if two people are tenants of the same property, two fact records should be inserted each month:
1) And now comes some magic (that is no magic at all): split the value of the rent by the number of tenants and store it in the fact
2) store also the full value of the rent (you don't know how the data scientist is going to use the data)
3) check 1) with the business users (I mean the people who build the risk models); there might be some advanced rule on how to do the splitting (a similar thing happens when the cost of shipping is to be divided across multiple item lines of the same order -- it might not be uniformly distributed)
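Steps 1) and 2) can be sketched as a small function that emits one fact row per tenant per month. This is Python for illustration only; the function name and row layout are hypothetical, and the even split is exactly the assumption that step 3) says to verify with the business.

```python
from decimal import Decimal, ROUND_HALF_UP


def rent_fact_rows(tenancy_id, month, tenant_ids, full_rent):
    """Build one rent fact row per tenant, carrying both the split and the full rent."""
    full = Decimal(full_rent)
    # Assumption: an even split. Rounding to cents means the shares may not
    # add up exactly to the full rent when it does not divide evenly.
    share = (full / len(tenant_ids)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return [
        {
            "tenancy_id": tenancy_id,
            "person_id": person_id,
            "month": month,
            "rent_share": share,
            "rent_full": full,
        }
        for person_id in tenant_ids
    ]


rows = rent_fact_rows("T1", "2024-01", ["P1", "P2"], "900.00")
```

Summing rent_share across rows gives the rent per tenancy without double counting, while rent_full stays available on every row.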
I am seeking a TDBTree component that is very versatile, and I would like to hear some recommendations. I am specifically seeking one that would show a master record and "n" number of linked table records (I mean records from various tables). For example, the TDBTree would be hooked to a master table, Detail table 1, an Additional table, etc.
Master Table Record
Detail Table 1 Record
Detail Table 1 Record
Detail Table 1 Record
Additional Table Record
Additional Table Record
I am not sure if this is possible or not. This is why I am inquiring. Thanks for any recommendations you may be able to provide.
An example would be
Master Checks
Check Details
Account Record
Bank Record
Look at Developer Express controls. They have something like what you're looking for. They have both a grid that can show details "in line", and some db-aware trees with many capabilities. IMHO, if you're displaying that kind of data, their Master-Detail grid is better than any tree, since you're going to show different data in each detail.
I know this isn't DB aware, but if you're open to alternatives then VirtualStringTree is a very good option. I use this tree component to display most of my DB data to the user. It offers a very flexible and speedy tree/grid for any data, and it is very easy to handle DB updating in the many events it provides for you.
I'm programming a website that allows users to post classified ads with detailed fields for different types of items they are selling. However, I have a question about the best database schema.
The site features many categories (eg. Cars, Computers, Cameras) and each category of ads have their own distinct fields. For example, Cars have attributes such as number of doors, make, model, and horsepower while Computers have attributes such as CPU, RAM, Motherboard Model, etc.
Now since they are all listings, I was thinking of a polymorphic approach, creating a parent LISTINGS table and a different child table for each of the different categories (COMPUTERS, CARS, CAMERAS). Each child table will have a listing_id that will link back to the LISTINGS TABLE. So when a listing is fetched, it would fetch a row from LISTINGS joined by the linked row in the associated child table.
LISTINGS
-listing_id
-user_id
-email_address
-date_created
-description
CARS
-car_id
-listing_id
-make
-model
-num_doors
-horsepower
COMPUTERS
-computer_id
-listing_id
-cpu
-ram
-motherboard_model
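The fetch described above, a LISTINGS row joined to its row in the child table, might look roughly like this. The sketch uses Python and SQLite for illustration; the column types and the sample row are assumptions, and only the CARS child table is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be read by column name
conn.executescript("""
    CREATE TABLE listings (
        listing_id    INTEGER PRIMARY KEY,
        user_id       INTEGER,
        email_address TEXT,
        date_created  TEXT,
        description   TEXT
    );
    CREATE TABLE cars (
        car_id     INTEGER PRIMARY KEY,
        listing_id INTEGER NOT NULL REFERENCES listings(listing_id),
        make       TEXT,
        model      TEXT,
        num_doors  INTEGER,
        horsepower INTEGER
    );
    INSERT INTO listings VALUES (1, 10, 'seller@example.com', '2024-01-05', 'Low mileage');
    INSERT INTO cars VALUES (1, 1, 'Honda', 'Civic', 4, 158);
""")


def fetch_car_listing(conn, listing_id):
    # Shared columns come from LISTINGS, category-specific ones from CARS.
    row = conn.execute(
        """
        SELECT l.listing_id, l.description, c.make, c.model, c.num_doors, c.horsepower
        FROM listings AS l
        JOIN cars AS c ON c.listing_id = l.listing_id
        WHERE l.listing_id = ?
        """,
        (listing_id,),
    ).fetchone()
    return dict(row) if row is not None else None


car_listing = fetch_car_listing(conn, 1)
```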
Now, is this schema a good design pattern or are there better ways to do this?
I considered single inheritance but quickly brushed off the thought because the table will get too large too quickly, but then another dilemma came to mind - if the user does a global search on all the listings, then that means I will have to query each child table separately. What happens if I have over 100 different categories, wouldn't it be inefficient?
I also thought of another approach where there is a master table (meta table) that defines the fields in each category and a field table that stores the field values of each listing, but would that go against database normalization?
How would sites like Kijiji do it?
Your database design is fine. No reason to change what you've got. I've seen the search done a few ways. One is to have your search stored procedure join all the tables you need to search across and index the columns to be searched. The second way I've seen it done which worked pretty well was to have a table that is only used for search which gets a copy of whatever fields that need to be searched. Then you would put triggers on those fields and update the search table.
They both have drawbacks but I preferred the first to the second.
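For reference, the second approach (a search-only table kept current by triggers) can be sketched as follows. This uses Python and SQLite for illustration; the listing_search table, the trigger names and the choice of make and model as the searchable text are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cars (listing_id INTEGER PRIMARY KEY, make TEXT, model TEXT);
    -- Search-only table: holds a copy of whatever text should be searchable.
    CREATE TABLE listing_search (listing_id INTEGER PRIMARY KEY, search_text TEXT);

    CREATE TRIGGER cars_after_insert AFTER INSERT ON cars
    BEGIN
        INSERT INTO listing_search (listing_id, search_text)
        VALUES (NEW.listing_id, NEW.make || ' ' || NEW.model);
    END;

    CREATE TRIGGER cars_after_update AFTER UPDATE ON cars
    BEGIN
        UPDATE listing_search
        SET search_text = NEW.make || ' ' || NEW.model
        WHERE listing_id = NEW.listing_id;
    END;
""")


def search(conn, term):
    # One query against one table, however many category tables exist.
    rows = conn.execute(
        "SELECT listing_id FROM listing_search WHERE search_text LIKE ? ORDER BY listing_id",
        (f"%{term}%",),
    ).fetchall()
    return [listing_id for (listing_id,) in rows]


conn.execute("INSERT INTO cars VALUES (1, 'Honda', 'Civic')")
hits_before_update = search(conn, "civic")
conn.execute("UPDATE cars SET model = 'Accord' WHERE listing_id = 1")
hits_after_update = search(conn, "accord")
stale_hits = search(conn, "civic")
```

Each category table would need its own pair of triggers, which is part of the maintenance cost of this approach.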
EDIT
You need the following tables.
Categories
- Id
- Description
CategoriesListingsXref
- CategoryId
- ListingId
With this cross reference model you can join all your listings for a given category during search. Then add a little dynamic sql (because it's easier to understand) and build up your query to include the field(s) you want to search against and call execute on your query.
That's it.
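A rough sketch of the cross-reference join with a dynamically built query, in Python and SQLite for illustration. The whitelist of searchable fields, the function name and the sample data are assumptions. The column name has to be spliced into the SQL text because it cannot be bound as a parameter, so it is checked against the whitelist first; the values are still passed as parameters.

```python
import sqlite3

# Hypothetical whitelist of columns that may be spliced into the query text.
SEARCHABLE_FIELDS = {"description", "email_address"}


def search_listings(conn, category_id, field, term):
    if field not in SEARCHABLE_FIELDS:
        raise ValueError(f"unsupported search field: {field}")
    sql = (
        "SELECT l.listing_id FROM listings AS l "
        "JOIN categories_listings_xref AS x ON x.listing_id = l.listing_id "
        f"WHERE x.category_id = ? AND l.{field} LIKE ? "
        "ORDER BY l.listing_id"
    )
    return [row[0] for row in conn.execute(sql, (category_id, f"%{term}%"))]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE listings (
        listing_id    INTEGER PRIMARY KEY,
        email_address TEXT,
        description   TEXT
    );
    CREATE TABLE categories_listings_xref (category_id INTEGER, listing_id INTEGER);
    INSERT INTO categories VALUES (1, 'Cars'), (2, 'Computers');
    INSERT INTO listings VALUES
        (1, 'a@example.com', 'Red hatchback, low mileage'),
        (2, 'b@example.com', 'Gaming laptop'),
        (3, 'c@example.com', 'Blue sedan');
    INSERT INTO categories_listings_xref VALUES (1, 1), (2, 2), (1, 3);
""")

sedans = search_listings(conn, 1, "description", "sedan")
cars_by_seller = search_listings(conn, 1, "email_address", "a@example.com")
```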
EDIT 2
This seems to be a bigger discussion than we can fit in these comment boxes. But anything we would discuss can be understood by reading the following post.
http://www.sommarskog.se/dyn-search-2008.html
It is really complete and shows you more than one way of doing it, with pros and cons.
Good luck.
I think the design you have chosen will be good for the scenario you just described. Though I'm not sure if the sub class tables should have their own ID. Since a CAR is a Listing, it makes sense that the values are from the same "domain".
In the typical classified ads site, the data for an ad is written once and then is basically read-only. You can exploit this and store the data in a second set of tables that are more optimized for searching in just the way you want the users to search. Also, the search problem only really exists for a "general" search. Once the user picks a certain type of ad, you can switch to the sub class tables in order to do more advanced search (RAM > 4gb, cpu = overpowered).