I hesitate whether I should add a Country_Dimension or not since I already have a Customer_Dimension which contain some redundant fields such as:
continent_name
country_name
postcode_#
Those could be two completely different things. A country name present in your Customer_Dimension will probably represent the country of an address your customer is using. Most likely the country he's living (or has lived) in. It can change over time since customers can switch addresses.
A dimension representing Countries will do exactly that, it represents countries. I think you first have to decide what the use of your dimension should be.
Related
I'm trying to create a ER diagram of a simple retail chain type database model. You have your customer, the various stores, inventory etc.
My first question is, how to represent a customer placing an order in a store. If the customer is a discount card holder, the company has their name, address etc, so I can have a cardHolder entity connect to item and store with an order relationship. But how do I represent an order being placed by a customer who is not really an entity in the database?
Secondly, how are conditional... stuff represented in ER diagrams, e.g. in a car dealership, a customer may choose one or more optional extra when buying a car. I would think that there is a Car entity with the relevant attributes and the options as a multi-valued attribute, but how do you represent a user picking those options (I.e. order table shows the car ordered, extras chosen and the added cost of extras) in the order relationship?
First, do you really need to model customers as distinct entities, or do you just need order, payment and delivery details? Many retail systems don't track individual customers. If you need to, you can have a customer table with a surrogate key and unique constraints on identifying attributes like SSN or discount card number (even if those attributes are optional). It's generally hard to prevent duplication in customer tables since there's no ideal natural key for people, so consider whether this is really required.
How to model optional extras depends on what they depends on. Some extras might be make or model-specific, e.g. the choice of certain colors or manual/automatic transmission. Extended warranties might be available across the board.
Here's an example of car-specific optional extras:
car (car_id PK, make, model, color, vin, price, ...)
car_extras (extra_id PK, car_id FK, option_name, price)
order (order_id PK, date_time, car_id FK, customer_id FK, payment_id FK, discount)
order_extras (order_id PK/FK, car_id FK, extra_id PK/FK)
I excluded price totals since those can be calculated via aggregate queries.
In my example, order_extras.car_id is redundant, but supports better integrity via the use of composite FK constraints (i.e. (order_id, car_id) references the corresponding columns in order, and (car_id, extra_id) references the corresponding columns in car_optional_extras to prevent invalid extras from being linked to an order).
Here's an ER diagram for the tables above:
First, as per your thought you can definitely have two kinds of customers. Discount card holders whose details are present with the company and new customers whose details aren't available with the company.
There are three possible ways to achieve what you are trying,
1) Have two different order table in the system(which I personally wouldn't suggest)
2) Have a single Order table in the system and getting the details of those who are a discount card holder.
3) Insert a row in the discount card holder table for new/unregistered customers having only one order table in the system.
Having a single order table would make the system standardized and would be more convenient while performing many other operations.
Secondly, to solve your concern, you need to follow normalization. It will reduce the current problem faced and will also make the system redundant free and will make the entities light weighted which will directly impact on the performance when you grow large.
The extra chosen items can be listed in the order against the customer by adding it at the time of generating a bill using foreign key. Dealing with keys will result in fast and robust results instead of storing redundant/repeating details at various places.
By following normalization, the problem can be handled by applying foreign keys wherever you want to refer data to avoid problems or errors.
Preferably NF 4 would be better. Have a look at the following link for getting started with normalization.
http://www.w3schools.in/dbms/database-normalization/
I'm a building a registration form for my website(it is using Neo4j) and need to populate the country, state and city field. All these fields are inter-linked i.e depending on country, state field will be set and depending on state city will be set. I'm trying to figure out what's the best approach to model this using Neo4j. Do I need to create nodes for each country, state and city, and then create relationships between all of them? For instance, Detroit - belongs to - Michigan - belongs to - United States. What would be the best approach to handle this in Neo4j? Are there any examples to look at ? Would it be efficient to do this in Neo4j ? Or is it better to use a document based DB for that such as MongoDB?
I don't see any reason you can't do what you suggested, creating nodes for City, State, Country and wiring them up (I'm planning on doing this exact same thing with my upcoming project). This also lets you reuse those nodes in other parts of your graph, potentially allowing you to make interesting queries using common locations at faster speeds than property comparisons.
If I understand your requirements correctly, you'll have dropdowns or autocomplete fields or similar to drill down to each level (populate dropdown with countries -> populate next dropdown with states in the selected country -> populate last drop down with cities in the selected state). Just add indexes on identifier or abbreviation for quick node lookup and you're good, it should work quite fast.
If you're adding zip codes in there, that could be tricky, as you can't really model it in the same way. You'll have one-to-many relationships from both state and city to zip, and unless I'm mistaken there are a few interesting zips which can span more than a single state and/or city. Some other factors that can complicate things include 5 vs 9 digit zips (or more for other countries), and handling of zip-equivalents in other countries, as they may adhere to different logic.
Im new to Rails and I'm in the middle of sketching up an ERD for my new app. A Yelp-sort of app, where a Client is sorted by price.
So I want one Client to have many priceranges - One Client can both have pricerange $ and Pricerange $$$$ for example. The priceranges are:
$ - $$ - $$$ - $$$$ - $$$$$
How would this look in a table? Would I create a table called PriceRange with Range1, Range2, Range3, Range4, Range5 to be booleans?
Doesn't the PriceRange-table need any foreign/primary keys?
PriceRange
Range1 (Boolean)
Range2 (Boolean)
Range3 (Boolean)
Range4 (Boolean)
Range5 (Boolean)
Look, I'm Brazilian and I'm not very knowledgeable about yelp applications. I do not quite know what it is, but from what I saw, they are systems to assess/measure/evaluate (perhaps the translation is wrong here for you) things, in this case, companies, right?
Following this logic, let's think...
By the description of your problem (context), you have clients (companies), and they can have price ranges, correct? If:
A price interval is represented by textual names, such as "$", "$$",
and so on,
and the same price range may have (numeric) values for different companies,
And the same price range (type) can be (or not) assigned to different
companies,
Then here is what we have:
By decomposing this conceptual model, you would end up with three tables:
Companies
Price Ranges
Price Ranges from Companies
The primary keys of Company and Price Ranges will be passed to Price Ranges from Companies as foreign keys. You can use them as a composite primary key, or use a surrogate key. If using a surrogate key, you will permit/allow a company to have the same kind of price range more than once, which I believe is not the case.
Let's look at another situation, if things are simpler as:
If there is no need to store prices,
and an company may have or not one or more price ranges represented by "$", "$$", and so on,
Then here is what we have:
Similarly, we'll have the same 3 tables. Likewise, you still must pass the primary keys of Companies and Price Ranges to Price Ranges from Companies as foreign keys.
So I want one Client to have many priceranges - One Client can both
have pricerange $ and Pricerange $$$$ for example
Notice how N-N relationships allow us to create optional relationships between entities. This will allow a company to have zero, one, two, (etc.) or all price ranges defined. Again, so that is not allowed a company to have a price range more than once, set the foreign keys as composite primary key in Price Ranges from Companies.
If you have any questions or anything I explained has nothing to do with your context, please do not hesitate to comment.
EDIT
Is the Price ranges from companies what is called a Joint table?
Yes. There are also other terms used, some in different areas of computer science, such as Link Table, or Intermediate Table.
Actually we do not have a table here in the diagram, but an entity. In the Conceptual Model there are no tables, but entities and relationships. Be careful with this terminology when developing the Conceptual Model, or else you may get confused (I say this from experience).
However, yes, once decomposed, we will have a table from this relationship. When decomposed, N-N relationships will always become tables, no exception. Differently, 1-1 and 1-N (or N-1) relationships do not become tables. These tables with these special names (Join/Link/Intermediate Tables) serves to associate records from different tables, hence the name.
And is it necessary to have a column called Price Range Id? I mean
what is it there for?
At where? If you say at the Price Ranges entity, it is rather necessary. Must We not identify records in a table in some way? Here I set what is called a Surrogate Key. If on the other hand, you have a column with unique values for each record in the table, you can also use this column. I highly recommend that you consider the use of surrogate keys. Read the link I gave you.
In the Conceptual Model, we have to define the properties and also the primary keys. During the phase of the conceptual model, natural attributes of entities can become primary keys if you so desire. In this case, we have what is called a Natural Key.
If on the other hand you refer to Price Ranges from Companies entity, so the question is another ("And is it necessary to have a column called Price Range Id?"). Here we have a table with two columns, as I told you. The two are foreign keys. You need it so you can relate rows from the two tables... I think you were not referring to that, is not it? If so, no problem, you can comment and ask more questions. I do not care to answer. To be honest, I did not quite understand your question.
EDIT 2
So that Company 28 can be identified in the Price Ranges (for instance
ID 40) Which would make it easier to call out the price ranges it has?
Maybe my English is not very good, but it seems to me that you have a beginner's doubt/question in relation to the concept of tables and relationships between them. If not that, I apologize because maybe I did not understand. But let's see...
The tables in a database have rows / records. Each line has its own data. Even with this, each line / record needs to be differentiated and identified somehow. That is why we attach to each line an identifier, known as the primary key (this, and this). In summary, the primary key is how we identify, differentiate, separate and organize different records.
Even if all records have different values, you must select a field (column) that represents the primary key of the table. By obligation, every record MUST have a primary key. Although you can choose which field is a primary key, you are allowed to choose one or more fields to serve as the primary key. When this happens, that is, when more than one field participates/serves as the primary key, we have a table with something called Composite Primary Key. Similarly, it has the ability to identify records. Note that, because of that, primary key values must be unique, otherwise you may have 2 identical records.
This is the basic concept so that we can relate tables to each other, in case, records/rows of tables together. If we have a Company identified by the ID 28 (a line/record), and we want to relate it to a Price Range identified by the ID 40, then we need to store somewhere that relationship (28 <--> 40). This is where the role of intermediate/link/join tables comes in (but only to relationships N-N! For 1-N or N-1 relationships it works similarly, but not identical).
My original question was whether it was necessary, and why a company
ID had to link up with a price range ID at all.
With this table storing records which relates to other records (for their primary keys), we can perform a SQL join operation (If you have questions about this, see this image). Depending on how you perform this operation, you'll get:
All companies that have Price Ranges.
All companies that do not have Price Ranges.
All the Price Ranges of a given company.
All companies that have or not a X Price Range.
All price ranges that are given or not to companies.
...
Anyway, you get all this because of the established relationship.
If it could just be taken out and then the table of price ranges would
only involve Pricerange1-5.
This sentence I did not understand. What should be taken out? Could you please explain this sentence better?
I have different providers which passes me an excel with different cities, in each city they use some special code for their operations and more data useful to my business.
The problem is that I have a mess with all these cities:
I have my own cities in my database, around 9000 records.
Provider A gives me his excel or webservice to get around 6000.
Provider B gives me another 5000.
Provider C ... etc
Some of the cities given by my providers are already in my database and I only have to update the required data I need.
Otherwise, I have to insert that new city in my database.
And this, each time a provider gives me an update of these cities.
Well, the main problem is that I call a city differently from them, and they differently from each other... how to know if I already have that city or I have to create a new one since we use different names?
The way I see it, I only can achieve it manually. Comparing their cities with mines.
Of course, it's too much work so I made my own script, and implementing the levehnstein function for the database, I can automatically see the more coincident ones and select them by a click. The script does the rest (updates their special operation code for that city into my corresponding city stored in my database).
Even with it, I still feel like I'm missing something. If there was an unicode for those cities this would be much easier and automatic, but I don't have any code which identifies these cities more than my table identifier. Same for my providers, despite some of the use to provide me the postal code among the cities their provide, but not all.
Is there any better solution than mine for this? Any universal code that you usually use or any other aproatch?
Edit:
Well, each city belongs to a country. Of course, I'm considering that.
In my city table I have an Id for each destination, and then a column for the operation code of each provider (I know, this could be better represented with a relationship more), plus country code, zip, url for seo...
Respecting the solution mentioned by MagnusL, creating a Synonyms table, why would I need to store the synonyms? Regarding the script you mentioned with levehnstein and human interaction, that's exactly what I'm currently doing:
With each record provided by a provider and my destinations table. Given a provider city record, I'm showing the more coincident ones from my table.
But before this, I automatically link all those which are coincident in zip code and country.
It's a lot of work for updating my providers special operation code for each city. I am just curious about how people deal with this problem, I'm sure a lot of developers have to face this at some point.
If it is important that the cities are correctly matched, I would guess you must have some manual steps in your process. If you include names of smaller towns you will some day encounter that the same name could actually be two different places in two different countries. (Try Munich on Google Maps and you get one in Germany and one in North Dakota.)
A somewhat complicated, but I guess future proof, workflow is to use id numbers in place of city names in your main data table. Then set up a locations table with those id numbers as primary keys and your preferred name of the city followed by as many meta data columns as required for country code, zip code, WGS84 coordinates, continent name, whatever. Add another table for city name synonyms, with just id numbers and names (without UNIQUE constraint on the id column).
Let your import script try to match the city with help from as many meta data as possible (probably different meta data from different providers), together with the Levehnstein algorithm you mentioned, and let it be clever enough to ask for human interaction in those cases where no one or more than one city are matched. It can of course show you the closest possible guesses, so you can pick the right one and have it stored in the synonym table.
(Yes, it is a lot of coding to get there. If you find it worth it or not depends on how often you do these updates.)
Tip: Wikipedia has articles with different names on cities, i.e. https://en.wikipedia.org/wiki/List_of_names_of_European_cities_in_different_languages
What if you used an extra table for name translation?
IE, the table would have 2 columns: column A the name you use, column B, the name a provider uses. You might need to do adapt this table manually, to look like:
Bruxelles:Brussels
Bruxelles:Brussel
Bruxelles:Bruxelles
While importing, for the name of the city you would then use
select A where B = Brussels
In your agglomerated database, names would then be consistent.
I am working on web mapping application and i am facing issue, scenario is that.
User can post address and address can be in any format like
Street, City,State, Country or Country Street State City
I have mentioned just two format but it would be in any format.
My task is extract City Name, Street, Country from address, problem is that multiple city name, street may be exist so how can i do this.
I have all information about locations in database like city,country,street,area code.
I don't believe there is an easy way to do what you want here. It seems the user can give the data in such a way that it is basically free form and there is no way to distinguish from the input data what is a street name vs what is a city name ect. Unless you enforce some sort of format nothing is going to work every time.
A different approach may be to take the input remove such things as "St" and "Street" ect and then search the database for each of the given names against city, street and county ect. From the results you will probably be able to determine what is the most likely address and get the user to confirm.
A lot of government websites appear to use the approach I have just given you above when registering for things. (i.e. Voting) It is not perfect however.