I'm a new user to QLIK, scripting & overall beginner. I am looking for any help or recommendations to deal with my tables below. Just trying to create a good model to link my tables.
Created a sample here
file The original 3 tables are different qvd files
Transactions table has multiple columns and the main ones are TxnID, SourcePartyTypeID, DestPartyTypeID, SourcePartyType, DestinationPartyType, ConductorID.
Customers Table - CustName, CustID etc.
Accounts Table - AcctID, AcctNum, PrimaryActID etc.
With transactions it can relate to multiple CustID's/AcctID's which are linked by the Dest/SourcePartyIDs. Also the transaction has a source/destination party type field where A = Accounts, C = Customers & some NULLs.
I have read a lot on data models and a link table for star schema or join is recommended but I am unsure how to code this because these are also based on the Source/DestinationType fields (Transactions Table) where A = Accounts & C = Customers. Have tried to code but not successful.
I'm unsure how to join based on SourceType/DestinationType = Accounts or Customers. Link table or ApplyMap() with a WHERE clause?? Any suggestions
Hopefully your introduction to Qlik is still a positive one! There are a lot of resources to help you develop your Qlik scripting capabilities including:
Qlik Continuous Classroom (https://learning.qlik.com)
Qlik Community (https://community.qlik.com)
Qlik Product Documentation (https://help.qlik.com)
In terms of your sample data question. If you are creating a Qlik Sense app you can use the Qlik Data Manager to link your data.
This is excellent because not only will it try and analyse your data and make useful suggestions to link fields, it will also build the script which you can then review and use as a basis for developing your own understanding further.
Looking at your sample data, one option might be a simple key field between a couple of the tables. Here is one example of how this could work.
Rod
[Transactions]:
Load
// User generated fields
AutoNumberHash256 ( [DestPartyID], [SoucePartyID] ) As _keyAccount,
// Fields in source data
[TxnID],
[TxnNum],
[ConductorID],
[SourcePartyType],
[SoucePartyID] As [CustID],
[DestPartyType],
[DestPartyID],
[etc...]
From [lib://AttachedFiles/TablesExamples.xlsx]
(ooxml, embedded labels, table is Transactions);
[Customers]:
Load
// User generated fields
// Fields in source data
[CustID],
[CustFirstName],
[CustLastName]
From [lib://AttachedFiles/TablesExamples.xlsx]
(ooxml, embedded labels, table is Customers);
[Accounts]:
Load
// User generated fields
AutoNumberHash256 ( [AcctID], [PrimaryAcctID] ) As _keyAccount,
// Fields in source data
[AcctID],
[AcctNum],
[PrimaryAcctID],
[AcctName]
From [lib://AttachedFiles/TablesExamples.xlsx]
(ooxml, embedded labels, table is Accounts);
Related
I have a database of economists in Microsoft Access, and i need to transfer it to Neoj.
Keep in mind that all these economists have also been teachers.
So, I have a table - Economist - where i store all economists data, with an ID "codecon" for each economist. Then, i have a table - University - where i store information about Universities, with an ID "coduni" for each University.
As last i have a table - Subject - with subjects informations, with an ID "codsubj" for each subject.
Now, in Access i have another table - Teaching, where i use the previous IDs to say that "Economist codecon teach Subject codsubj in University coduni.
How can i create this type of link in Neo4j, where i can only have relationships between TWO nodes?
Any help would be great. Thanks.
You will want to think about what your graph data model is going to look like. After that, you might want to use a tool to model it. Finally, you'll want to load the data in. You can use LOAD CSV if you've got the data exported out of Access as csv files, APOC has an XLS importer.
I think you'll find this following starter guide - going from relational to graph helpful. I would also suggest you start with exporting your tables to begin with to CSV.
For data modelling, I find arrows helpful.
I am developing a BI system for our company, from scratch, and currently, I am designing a data warehouse. I am completely new to this so there are many things that I don't really understand, so I need to hear some more insights into this.
My problems are:
1) In our source system, there are tables called "Booking" and "BookingAccess". Booking table holds the data of a booking, such as check-in time and check-out time, booking date, booking number, gross amount of that booking.
Whereas in BookingAccess, it holds foreign keys related to the booking, such as bookerID, customerID, processID, hotelID, paymentproviderID and a current status of that booking. Booking and BookingAccess has a 1:1 relation ship.
Our source system is about checking the validity of those bookings, these bookings are not ours. We receive these booking information from other sources, outsource the above process for them. The gross amount is just an information of that booking that we need to validate, their are not parts of our business. The current status of a booking which is hold in the BookingAccess table is the current status of that booking in our system, which can be "Processing" or "Finshed".
From what I read from Ralph Kimball, in this situation, the "Booking" is the Dimension table, and the BookingAccess should be the fact. I feel that the BookingAccess is some what a [Accumulating Snapshot table], in which I should track the time when a booking is "Processing", and when a booking is "Finshed".
Do I get it right?
2) In "Booking" table, there is also a foreign key called "ImportID". This key links to a table called "Import". This "Import" table hold history records of files (these file contain bookings which will be written to the "Booking" table) which were imported to our system, including attributes such as file name, imported date, total booking imported...
From my point of view, this is clearly a fact table.
But the problem is that, the "Import" table and the "Booking" table has a relationship of one to many (1 ImportID in "Import" table can have 1, 2 or more records which have a same ImportID in "Booking" table). This is against the idea of fact tables which insists that the relationship between Fact and Dimension must be many-to-one, which fact is always in the many side.
So what approach should I use to solve this case? I'm thinking of using bridge tables to solve this problem. But I don't know if this is a good practice, as there are a lot of record in the "Import" table, so I will have to create a big bridge table just to covers all of this.
3) Should I separate a table (from source system) which contains a mix of relationships and information to a fact table containing only relationships, and dimension table containing only information? (For example, a table called "Customer" in source system. This table contains some things like customer name, customer address and customertype id, customer parentID....)
I am asking this because I feel that if I use BI tools to analyze things (for example, analyzing the number of customers which has customertypeid = 1), I feel it's some what weird if there are no fact tables involved in.
Or should I treat it as a mere dimension table and use snowflake-schema? But this will lead to a mix of Star-Schema and snowflake-schema in our Data Warehouse. Is this normal? I have read some official sources (most likely Oracle) stating that one should try to avoid using and mixing snowflake-schema as much as possible. But some sources like Microsoft say that this is very normal. Even the Advanture Work Data Warehouse sample database uses this kind of approach.
Or should I de-normalize every relation in that "Customer" table? But I don't think this is a good approach as it will make the Customer contain a lot of columns, and it will be very hard to track the history of every row in the "DIM_Customer" table. For example, if any change occur in any relation of "Customer" table, the whole "DIM_Customer" table will need to be updated.
I still have a lot of question regarding to Data Warehouse. I am working with it nearly alone, without any help or consultant. So pardon me if I made any kind of inconveniences or mistakes.
In my CouchDB database, I have the following models (implemented as documents in the database with different type fields):
Team: name, id (has many matches, has many fans)
Match: name, team_a, team_b, time (has many teams, has many tweets)
Fan: team_id (has many tweets)
Tweet: time, sentiment, fan_id
I want to average the tweet sentiment for each team. If I were using SQL I'd do it like this:
SELECT avg(sentiment)
FROM team
JOIN match on team.id = match.team_a OR team.id = match.team_b
JOIN fan on fan.team = team.id
JOIN tweet on (tweet.time BETWEEN match.time AND match.time + interval '1 hour') AND tweet.user = fan.id
GROUP BY team.id
However in CouchDB you can at best do 1 join in a view function, as explained in the docs (by emitting the join field as the key).
How can this be better modelled in CouchDB to allow for this query to work? I don't really want to denormalise too much, but I guess I will if I have to?
It's a bit complex, but I use what I call "tertiary indexes". The goal is to be able to write a view that is applied to another view. Unfortunately, the only way to do this is to use a view to write data to a secondary database and then have another view that works on that database. Doing this requires an outside process - I use a script that listens to the _changes feed of the primary database, and then updates the relevant documents in the secondary database when something changes.
So in your example your secondary database could consist of a single document for each team with all of the (or the latest) match/fan/tweet data in that one document. Then you write a view that extracts the sentiment (or whatever) from that secondary database.
We haave Accounts, Deals, Contacts, Tasks and some other objects in the database. When a new organisation we want to set up some of these objects as "Demo Data" which they can view/edit and delete as they wish.
We also want to give the user the option to delete all demo data so we need to be able to quickly identify it.
Here are two possible ways of doing this:
Have a "IsDemoData" field on all the above objects : This would mean that the field would need to be added if new types of demo data become required. Also, it would increase database size as IsDemoData would be redundant for any record that is not demo data.
Have a DemoDataLookup table with TableName and ID. The ID here would not be a strong foreign key but a theoretical foreign key to a record in the table stated by table name.
Which of these is better and is there a better normalised solution.
As a DBA, I think I'd rather see demo data isolated in a schema named "demo".
This is simple with some SQL database management systems, not so simple with others. In PostgreSQL, for example, you can write all your SQL with unqualified names, and put the "demo" schema first in the schema search path. When your clients no longer want the demo data, just drop the demo schema.
I'm programming a website that allows users to post classified ads with detailed fields for different types of items they are selling. However, I have a question about the best database schema.
The site features many categories (eg. Cars, Computers, Cameras) and each category of ads have their own distinct fields. For example, Cars have attributes such as number of doors, make, model, and horsepower while Computers have attributes such as CPU, RAM, Motherboard Model, etc.
Now since they are all listings, I was thinking of a polymorphic approach, creating a parent LISTINGS table and a different child table for each of the different categories (COMPUTERS, CARS, CAMERAS). Each child table will have a listing_id that will link back to the LISTINGS TABLE. So when a listing is fetched, it would fetch a row from LISTINGS joined by the linked row in the associated child table.
LISTINGS
-listing_id
-user_id
-email_address
-date_created
-description
CARS
-car_id
-listing_id
-make
-model
-num_doors
-horsepower
COMPUTERS
-computer_id
-listing_id
-cpu
-ram
-motherboard_model
Now, is this schema a good design pattern or are there better ways to do this?
I considered single inheritance but quickly brushed off the thought because the table will get too large too quickly, but then another dilemma came to mind - if the user does a global search on all the listings, then that means I will have to query each child table separately. What happens if I have over 100 different categories, wouldn't it be inefficient?
I also thought of another approach where there is a master table (meta table) that defines the fields in each category and a field table that stores the field values of each listing, but would that go against database normalization?
How would sites like Kijiji do it?
Your database design is fine. No reason to change what you've got. I've seen the search done a few ways. One is to have your search stored procedure join all the tables you need to search across and index the columns to be searched. The second way I've seen it done which worked pretty well was to have a table that is only used for search which gets a copy of whatever fields that need to be searched. Then you would put triggers on those fields and update the search table.
They both have drawbacks but I preferred the first to the second.
EDIT
You need the following tables.
Categories
- Id
- Description
CategoriesListingsXref
- CategoryId
- ListingId
With this cross reference model you can join all your listings for a given category during search. Then add a little dynamic sql (because it's easier to understand) and build up your query to include the field(s) you want to search against and call execute on your query.
That's it.
EDIT 2
This seems to be a little bigger discussion that we can fin in these comment boxes. But, anything we would discuss can be understood by reading the following post.
http://www.sommarskog.se/dyn-search-2008.html
It is really complete and shows you more than 1 way of doing it with pro's and cons.
Good luck.
I think the design you have chosen will be good for the scenario you just described. Though I'm not sure if the sub class tables should have their own ID. Since a CAR is a Listing, it makes sense that the values are from the same "domain".
In the typical classified ads site, the data for an ad is written once and then is basically read-only. You can exploit this and store the data in a second set of tables that are more optimized for searching in just the way you want the users to search. Also, the search problem only really exists for a "general" search. Once the user picks a certain type of ad, you can switch to the sub class tables in order to do more advanced search (RAM > 4gb, cpu = overpowered).