Normalize the following scenario - normalization

A college keeps details about a student and the various modules that the student has studied. These details comprise Registration Number, Name, Address, Tutor Number, Tutor Name, Diploma Code, Diploma Name and repeating fields for Module Code, Module Name and Result. Normalize the relation.

There are quite a few resources, like this and this - once you understand it, it's simple... think in terms of sets of data and you won't go far wrong.
Hint - most tables are obvious (the answers are in the question); you will also need a table that links two of them together.
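To make the hint concrete, here is a minimal sketch of one possible decomposition using Python's built-in sqlite3 module. The table and column names are my own, and it assumes each student has exactly one tutor and one diploma, which the scenario implies but does not state outright.

```python
import sqlite3

# Table and column names are illustrative, not given in the question.
# Assumes each student has one tutor and is enrolled on one diploma.
SCHEMA = """
CREATE TABLE tutor   (tutor_number TEXT PRIMARY KEY, tutor_name   TEXT NOT NULL);
CREATE TABLE diploma (diploma_code TEXT PRIMARY KEY, diploma_name TEXT NOT NULL);
CREATE TABLE module  (module_code  TEXT PRIMARY KEY, module_name  TEXT NOT NULL);
CREATE TABLE student (
    registration_number TEXT PRIMARY KEY,
    name         TEXT NOT NULL,
    address      TEXT,
    tutor_number TEXT REFERENCES tutor(tutor_number),
    diploma_code TEXT REFERENCES diploma(diploma_code)
);
-- Linking table: one row per (student, module) pair, carrying the result.
CREATE TABLE student_module (
    registration_number TEXT REFERENCES student(registration_number),
    module_code         TEXT REFERENCES module(module_code),
    result              TEXT,
    PRIMARY KEY (registration_number, module_code)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO tutor VALUES ('T1', 'Dr Smith')")
conn.execute("INSERT INTO diploma VALUES ('D1', 'Computing')")
conn.executemany("INSERT INTO module VALUES (?, ?)",
                 [("M1", "Databases"), ("M2", "Networks")])
conn.execute("INSERT INTO student VALUES ('R1', 'Ann', '1 High St', 'T1', 'D1')")
conn.executemany("INSERT INTO student_module VALUES (?, ?, ?)",
                 [("R1", "M1", "Pass"), ("R1", "M2", "Merit")])

# The repeating group is now just rows in student_module.
rows = conn.execute("""
    SELECT s.name, m.module_name, sm.result
    FROM student s
    JOIN student_module sm ON sm.registration_number = s.registration_number
    JOIN module m ON m.module_code = sm.module_code
    ORDER BY m.module_code
""").fetchall()
print(rows)
```

The linking table is where the result lives, because a result belongs to a student and a module together, not to either one alone.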

Related

Identifying and relating cities from different sources

I have different providers, each of which sends me an Excel file with different cities. For each city they use a special code for their operations, plus more data useful to my business.
The problem is that I have a mess with all these cities:
I have my own cities in my database, around 9000 records.
Provider A gives me his excel or webservice to get around 6000.
Provider B gives me another 5000.
Provider C ... etc
Some of the cities given by my providers are already in my database and I only have to update the required data I need.
Otherwise, I have to insert that new city in my database.
And this, each time a provider gives me an update of these cities.
Well, the main problem is that I name a city differently from them, and they name it differently from each other... how do I know whether I already have that city or have to create a new one, since we use different names?
The way I see it, I can only achieve this manually, comparing their cities with mine.
Of course, that's too much work, so I wrote my own script. By implementing the Levenshtein function in the database, I can automatically see the closest matches and select one with a click. The script does the rest (it writes the provider's special operation code for that city into my corresponding city stored in my database).
Even with it, I still feel like I'm missing something. If there were a universal code for those cities this would be much easier and automatic, but I don't have any code that identifies these cities other than my table identifier. The same goes for my providers, although some of them, but not all, provide the postal code along with their cities.
Is there any better solution than mine for this? Any universal code that you usually use, or any other approach?
Edit:
Well, each city belongs to a country. Of course, I'm considering that.
In my city table I have an Id for each destination, and then a column for the operation code of each provider (I know, this could be better represented with one more relationship), plus country code, zip, url for seo...
Regarding the solution mentioned by MagnusL, creating a Synonyms table: why would I need to store the synonyms? Regarding the script you mentioned with Levenshtein and human interaction, that's exactly what I'm currently doing:
I compare each record from a provider against my destinations table. Given a provider city record, I show the closest matches from my table.
But before this, I automatically link all those that match on zip code and country.
It's a lot of work to update my providers' special operation code for each city. I am just curious about how people deal with this problem; I'm sure a lot of developers have to face this at some point.
If it is important that the cities are correctly matched, I would guess you must have some manual steps in your process. If you include names of smaller towns you will some day encounter that the same name could actually be two different places in two different countries. (Try Munich on Google Maps and you get one in Germany and one in North Dakota.)
A somewhat complicated, but I guess future proof, workflow is to use id numbers in place of city names in your main data table. Then set up a locations table with those id numbers as primary keys and your preferred name of the city followed by as many meta data columns as required for country code, zip code, WGS84 coordinates, continent name, whatever. Add another table for city name synonyms, with just id numbers and names (without UNIQUE constraint on the id column).
Let your import script try to match the city with help from as much meta data as possible (probably different meta data from different providers), together with the Levenshtein algorithm you mentioned, and let it be clever enough to ask for human interaction in those cases where no city, or more than one city, is matched. It can of course show you the closest possible guesses, so you can pick the right one and have it stored in the synonym table.
(Yes, it is a lot of coding to get there. If you find it worth it or not depends on how often you do these updates.)
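A rough sketch of that import step in Python, under my own assumptions: locations and synonyms are held in memory instead of in tables, country code is the only meta data used, and the distance threshold of 2 is arbitrary. The function names are made up for the example.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def match_city(name, country, locations, synonyms, max_distance=2):
    """Return ("matched", id) or ("ask", [candidate ids, closest first])."""
    wanted = name.casefold()
    # Every known spelling (preferred name or synonym) within that country.
    spellings = [(cid, pref) for cid, (pref, cc) in locations.items() if cc == country]
    spellings += [(cid, syn) for cid, syn in synonyms if locations[cid][1] == country]

    exact = {cid for cid, spelling in spellings if spelling.casefold() == wanted}
    if len(exact) == 1:
        return "matched", exact.pop()

    # Zero or several exact hits: hand the closest guesses to a human.
    scored = sorted((levenshtein(wanted, s.casefold()), cid) for cid, s in spellings)
    guesses = []
    for distance, cid in scored:
        if distance <= max_distance and cid not in guesses:
            guesses.append(cid)
    return "ask", guesses


# Hypothetical data: id -> (preferred name, country code), plus (id, synonym) rows.
locations = {1: ("Munich", "DE"), 2: ("Munich", "US"), 3: ("Brussels", "BE")}
synonyms = [(1, "München"), (3, "Bruxelles")]

print(match_city("München", "DE", locations, synonyms))
print(match_city("Brusels", "BE", locations, synonyms))

# Once a human confirms the guess, store the spelling so the next import is automatic.
synonyms.append((3, "Brusels"))
print(match_city("Brusels", "BE", locations, synonyms))
```

This is also the answer to "why store the synonyms?": each manual decision is made once, and the next update from the same provider matches without any clicking.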
Tip: Wikipedia has articles listing the different names of cities, e.g. https://en.wikipedia.org/wiki/List_of_names_of_European_cities_in_different_languages
What if you used an extra table for name translation?
I.e., the table would have 2 columns: column A, the name you use, and column B, the name a provider uses. You might need to maintain this table manually, so that it looks like:
Bruxelles:Brussels
Bruxelles:Brussel
Bruxelles:Bruxelles
While importing, for the name of the city you would then use (calling the extra table translations, say)
select A from translations where B = 'Brussels'
In your agglomerated database, names would then be consistent.
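Here is a small sketch of that lookup with sqlite3. The table name city_translation, its column names and the helper my_name_for are invented for the example. A name that is not in the table comes back as None, which is the point where you would fall back to manual matching.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column A is my_name, column B is provider_name (both names are illustrative).
conn.execute("""
    CREATE TABLE city_translation (
        my_name       TEXT NOT NULL,
        provider_name TEXT NOT NULL UNIQUE
    )
""")
conn.executemany(
    "INSERT INTO city_translation VALUES (?, ?)",
    [("Bruxelles", "Brussels"), ("Bruxelles", "Brussel"), ("Bruxelles", "Bruxelles")],
)


def my_name_for(conn, provider_name):
    """Translate a provider's city name into the name used in my own database."""
    row = conn.execute(
        "SELECT my_name FROM city_translation WHERE provider_name = ?",
        (provider_name,),
    ).fetchone()
    return row[0] if row else None


print(my_name_for(conn, "Brussel"))
print(my_name_for(conn, "Brüssel"))
```

If two countries share a city name, the UNIQUE constraint on provider_name alone is too strict, and a country column would have to become part of the key.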

Performance issues with complex nested RoR reservation system

I'm designing a Ruby on Rails reservation system for our small tour agency. It needs to accommodate a number of things, and the table structure is becoming quite complex.
Has anyone encountered a similar problem before? What sort of issues might I come up against? And are performance/ validation likely to become issues?
In simple terms, I have a customer table, and a reservations table. When a customer contacts us with an enquiry, a reservation is set up, and related information added (e.g., paid/ invoiced, transport required, hotel required, etc).
So far so good, but this is where it gets complex. Under each reservation, a customer can book different packages (e.g. day trip, long tour, training course). These are sufficiently different, require specific information, and are limited in number, such that I feel they should each have a different model.
Also, a customer may have several people in his party. This would result in links between the customer table and the reservation table, as well as between the customer table and the package tables.
So, if customer A were to make a booking for a long trip for customers A,B and C, and a training course for customer B, it would look something like this.
CUSTOMERS TABLE
CustomerA
CustomerB
CustomerC
CustomerD
CustomerE
etc
RESERVATIONS TABLE
1. CustomerA
LONG TRIP BOOKINGS
CustomerA - Reservation_ID 1
CustomerB - Reservation_ID 1
CustomerC - Reservation_ID 1
TRAINING COURSE BOOKINGS
CustomerB - Reservation_ID 1
This is a very simplified example, and omits some detail. For example, there would be a model containing details of training courses, a model containing details of long trips, a model containing long trip schedules, etc. But this detail shouldn't affect my question.
What I'd like to know is:
1) Are there any issues I should be aware of in linking the customer table to the reservations model, as well as to bookings models nested under reservations?
2) Is this the best approach if I need to handle information about the reservation itself (including invoicing), as well as about the specific package bookings?
On the one hand this approach seems to be complex, but on the other, simplifying everything into a single package model does not appear to provide enough flexibility.
Please let me know if I haven't explained this issue very clearly, I'm happy to provide more information. Grateful for any ideas, suggestions or comments that would help me think through this rather complex database design.
Many thanks!
I have built a large reservation system for travel operators and wholesalers, and I can tell you that it isn't easy. The kinds of product booked look similar, yet there are still large differences between them. Also, date sensitivity is a big difference from other systems.
1) In respect to 'customers' I have typically used different models for representing different concepts. You really have:
a. Person / Company paying for the booking
b. Contact person for emergencies
c. People travelling
a & b seem like the same, but if you have an agent booking, then you might want to separate them.
I typically use a 'customer' table for a, some simple contact fields for b, and a 'passengers' table for c. These could be set up as different associations to the same model, but I think they are different enough that I tend to separate them - perhaps using a common address/contact model.
2) I think this is fine, but it depends on your needs. If you are building up itineraries for a traveller, then it makes sense to set up 'passengers' on the 'reservation', and then give each itinerary item links to the passengers travelling on/using that item.
This is more complicated, and you must be careful to track dependencies, but the alternative is to not track passenger names, and simply assign quantities to each item (1xAdult, 2xChildren). This latter method is great for small bookings, so it depends on whether your bookings are simple or typically built up of longer itineraries.
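As an illustration of the passengers-on-the-reservation idea, here is a plain Python sketch rather than Rails models. The class and method names are invented. It uses the booking from the question (a long trip for A, B and C, and a training course for B), and the dependency it guards is that an item may only name passengers who are on the reservation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ItineraryItem:
    product: str                 # e.g. "long trip", "training course"
    passenger_names: List[str]   # who is travelling on / using this item


@dataclass
class Reservation:
    customer: str                                    # person or company paying
    passengers: List[str] = field(default_factory=list)
    items: List[ItineraryItem] = field(default_factory=list)

    def add_item(self, product: str, passenger_names: List[str]) -> None:
        # Dependency check: an item can only reference this reservation's passengers.
        unknown = [p for p in passenger_names if p not in self.passengers]
        if unknown:
            raise ValueError(f"not on this reservation: {unknown}")
        self.items.append(ItineraryItem(product, list(passenger_names)))

    def quantities(self) -> Dict[str, int]:
        # The simpler "quantities only" view can be derived from the named links.
        return {item.product: len(item.passenger_names) for item in self.items}


reservation = Reservation("CustomerA", passengers=["CustomerA", "CustomerB", "CustomerC"])
reservation.add_item("long trip", ["CustomerA", "CustomerB", "CustomerC"])
reservation.add_item("training course", ["CustomerB"])
print(reservation.quantities())
```

Going the other way is not possible: if you store only quantities, you cannot recover who was on which item later.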
other) In addition, with respect to different models for different product types, this can work well. However, there tends to be a lot of crossover, so some kind of common 'resource' model might be better -- or some other means of capturing common behaviour.
If I haven't answered your questions, please do ask more specific database design questions, or I can add more detail about specific examples of what I've found works well.
Good luck with the design!

Rails model structures

I saw some threads on this already on Stack, but wanted a little more clarification.
I have seen many apps where there is a product model and a category model. This is a has_and_belongs_to_many association, or a has_many :through association.
I have also seen many apps where there is a user model and an email_address model. Email_address belongs to user, but user can have many email addresses.
My question is, would there ever be a situation where you can lump all the email addresses or categories into the user and product models, respectively? So in your user model, you will have email_one, email_two, etc?
What are the pros and cons of breaking it into different models? Thanks.
If the attribute is simple, it's almost certainly best to keep it in a single model - you can even serialize the attribute so that it takes, for example, an array of email_addresses. BUT (big but) you may well want to add a lot more information to an email address - which one is the primary one, when it was last profiled, when an email was last sent to it, etc. This of course is much easier to handle if you have a separate email address model. So perhaps the question is really 'when should I use serialized attributes?'. My own answer would be 'only if I'm sure that I am storing something in that field that I never want to add further attributes to'. Usually that means it is something pretty peripheral to the main application, and about which no-one cares very much ...
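To show the trade-off outside Rails, here is a sketch in Python with sqlite3, storing the same addresses both ways. The table and column names (emails_json, is_primary) are invented for the example. The serialized column is fine for "give me the list", but "which one is primary?" needs the separate table.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    emails_json TEXT            -- option 1: serialized list inside the user row
);
CREATE TABLE email_addresses (  -- option 2: one row per address
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    address    TEXT NOT NULL,
    is_primary INTEGER NOT NULL DEFAULT 0
);
""")

addresses = ["ann@example.com", "ann@work.example"]
conn.execute("INSERT INTO users VALUES (1, 'Ann', ?)", (json.dumps(addresses),))
conn.executemany(
    "INSERT INTO email_addresses (user_id, address, is_primary) VALUES (?, ?, ?)",
    [(1, "ann@example.com", 0), (1, "ann@work.example", 1)],
)

# Option 1: easy to read back as a whole, but there is nowhere to hang extra attributes.
serialized = json.loads(
    conn.execute("SELECT emails_json FROM users WHERE id = 1").fetchone()[0]
)

# Option 2: extra attributes are ordinary columns and can be queried directly.
primary = conn.execute(
    "SELECT address FROM email_addresses WHERE user_id = 1 AND is_primary = 1"
).fetchone()[0]

print(serialized, primary)
```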

Tables representing an enumerated list of codes in Rails?

I've looked through similar questions but I'm still a little perplexed about something that seems to be a simple case in Rails.
Let's say I have a model class called Employee. One attribute of an employee is their security_clearance, which is a string that can be None, Some, or Full. No other values are valid. In other applications I'd probably represent this as an Employees table that has a foreign key to the SecurityClearances table, which has exactly three rows. The SecurityClearances table has columns entitled code (e.g. SEC_CLEARANCE_NONE, SEC_CLEARANCE_SOME, ...) and value ("None", "Some", "Full").
How do I want to do this in Rails? Do I want has_one :security_clearance on Employee and belongs_to :employee on SecurityClearance? That doesn't seem to be quite right.
It seems nonoptimal to type out the string literals of None, Some, and Full everywhere, especially since the values to be displayed could change (for example, perhaps the string for the Some code will be changed to low clearance instead).
Update:
Now that I think about this some more, don't I really just want a belongs_to :security_clearance on Employee? That would do the trick, right? Employees need to know what their security clearance levels are, but security clearance levels have no tie to a particular employee.
Take a look at this plugin: http://github.com/mlightner/enumerations_mixin/tree/master
It allows you to define this like has_enumerated :security_clearance, besides also caching the SecurityClearance model, etc.
Without the plugin, though, you're right about the relationships.
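For reference, the underlying table layout (the foreign key sits on employees, which is what belongs_to :security_clearance on Employee expresses) can be sketched with Python's sqlite3 like this. The table and column names follow Rails conventions but are my own guess, and the third code name is assumed by analogy with the two given in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE security_clearances (
    id    INTEGER PRIMARY KEY,
    code  TEXT NOT NULL UNIQUE,
    value TEXT NOT NULL
);
CREATE TABLE employees (
    id                    INTEGER PRIMARY KEY,
    name                  TEXT NOT NULL,
    security_clearance_id INTEGER NOT NULL REFERENCES security_clearances(id)
);
-- SEC_CLEARANCE_FULL is an assumed name; the question only spells out the first two.
INSERT INTO security_clearances (code, value) VALUES
    ('SEC_CLEARANCE_NONE', 'None'),
    ('SEC_CLEARANCE_SOME', 'Some'),
    ('SEC_CLEARANCE_FULL', 'Full');
""")


def clearance_label(conn, employee_id):
    """Display string for an employee's clearance, looked up via the foreign key."""
    return conn.execute(
        """SELECT sc.value FROM employees e
           JOIN security_clearances sc ON sc.id = e.security_clearance_id
           WHERE e.id = ?""",
        (employee_id,),
    ).fetchone()[0]


# Code refers to the stable code, never to the display string.
conn.execute(
    """INSERT INTO employees (name, security_clearance_id)
       SELECT 'Ann', id FROM security_clearances WHERE code = 'SEC_CLEARANCE_SOME'"""
)
label_before = clearance_label(conn, 1)

# Changing the display string touches one row; employees are unaffected.
conn.execute(
    "UPDATE security_clearances SET value = 'Low clearance' "
    "WHERE code = 'SEC_CLEARANCE_SOME'"
)
label_after = clearance_label(conn, 1)

try:
    conn.execute("INSERT INTO employees (name, security_clearance_id) VALUES ('Bob', 99)")
    invalid_rejected = False
except sqlite3.IntegrityError:
    invalid_rejected = True

print(label_before, label_after, invalid_rejected)
```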
Also check out the Enum Fields plugin from the GiraffeSoft folks:
http://giraffesoft.ca/blog/2009/02/17/floss-week-day-2-the-enum-field-rails-plugin.html

Rails best practices - How do you handle unique users that may have identical records?

How do you handle real name conflicts? Is there an established best practice or UI design pattern for disambiguating records like this? If authors can have many articles but more than one author can possibly have the same name how would you enable users to select the author they actually want when creating articles?
I can't dictate the author names be unique. The authors may have some other information that could individuate them (their articles or other optional fields).
To make this clearer - users are not authors. Users are people entering information about authors and articles. The only guaranteed information present for an author is the author's name. Other details are optional.
So if a user is creating a new record for an article they will have to either select or create an author for the many-to-many relationship between authors & articles.
With unambiguous Rails examples such as the blog post category dropdown, like Ryan Bates uses in his Railscasts, it is easy to create or update. If it exists, link the blog post to it; if it doesn't, then create it and link the blog post to it.
My case is much messier. "If it exists" isn't that meaningful, but I don't want to create a separate author entry for every article the author writes.
Presumably you have a key that means you know which user authored which records, so it comes down to how you can best disambiguate them for your users.
Perhaps you need to ask your authors for a brief summary of themselves in their profile that you can use to disambiguate them on their terms. Alternatively depending on the type of article you might choose to describe them in terms of geography ("John Biggs, Florida", "John Biggs, California" ) or perhaps by the subject areas they choose to write about: "John Biggs, Java Expert", "John Biggs, Indonesia Specialist" and so on.
You could even just have "John Biggs (1)", "John Biggs (2)" and so on. I seem to recall this works alright for IMDB, who are a good example of a site that has had to sort this problem.
The important thing for usability where these types of things are concerned is consistency: you need to always identify your authors in the same way, so you don't have both "John Biggs, Florida" and "John Biggs (2)", and you need to make sure that the identity you give to an author doesn't change once it is set up. "John Biggs (2)" should never become "John Biggs (5)", so that whenever your users see the disambiguated name they can identify it as the same person who had that name previously.
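One way to get that stability is to store the number when the author is created, instead of computing it from the author's position in a list. A minimal Python sketch, with invented class names and an in-memory dictionary standing in for the database:

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Author:
    id: int
    name: str
    number: int  # assigned once at creation, stored, never recomputed

    @property
    def label(self):
        return f"{self.name} ({self.number})"


class AuthorRegistry:
    def __init__(self):
        self._ids = count(1)
        self._last_number = {}  # name -> highest number ever handed out
        self.authors = {}       # id -> Author

    def add(self, name):
        number = self._last_number.get(name, 0) + 1
        self._last_number[name] = number
        author = Author(next(self._ids), name, number)
        self.authors[author.id] = author
        return author

    def remove(self, author_id):
        # Removing an author leaves everyone else's number untouched.
        del self.authors[author_id]


registry = AuthorRegistry()
first = registry.add("John Biggs")
second = registry.add("John Biggs")
third = registry.add("John Biggs")
registry.remove(first.id)
fourth = registry.add("John Biggs")
print(second.label, fourth.label)
```

Because the counter only ever goes up, a removed author's number is not reused, so an old label can never end up pointing at a different person.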
One thing that worked for me on a past project is to have a text box in which users can type in the author's name. As they type, I update a div with possible matches - similar to Stack Overflow when you type a tag in the ignored or interesting box.
Users can click on a name in the div which opens the record in a new window - new window has a button, "select this author," which takes you back to the original page with that author in the textfield as Author Name (id).
If they submit the form with an ambiguous name, we have an extra step where we display matches, and they choose which one they mean.
I imagine you'd want something a little more streamlined if this is a data-entry type application, but on that project adding an author was an infrequent operation.
Several things to think about:
Can you filter by subject matter first?
For instance, if John Jones (1) writes articles about genetics and John Jones (2) writes articles about computer networking, by having the user select the general subject matter first, you may be able to filter out many of the less applicable possible duplicate names.
(I would however have a button to see the unfiltered list, because sometimes people write articles in a new subject matter.) If you don't want to limit the choices, perhaps a sort by subject matter or location could make it easier to find the right one.
When you show the list of possible duplicate names, show general information about the author including address and university affiliation and possibly the name of one article. Have a button to click on to show existing articles for any one of them. That way if you know the John Jones you want is located in FL, you only need to check out the three in FL for articles, not all 37 John Joneses who wrote genetics articles.
Be aware that users are often lazy, they would rather just insert a new name than choose from a long list of existing names. So make it harder to insert a new name than to pick one. They have to go through the pick process first before they can enter a new name. We have an application which doesn't even show the button to add a new person until after you have done a search. Since names can have variations consider if you want to use fuzzy logic for your search. You might want to display J. Jones, Johnny Jones and Jon Jones as well as John Jones in your pick results.
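A sketch of the "search before you may add" flow with a fuzzy name match, using difflib from the Python standard library as a stand-in for whatever fuzzy search your database offers. The class name, the 0.6 cutoff and the sample data are my own assumptions.

```python
import difflib


def find_similar_authors(query, authors, cutoff=0.6):
    """Return (id, name) records whose name is close to the query string."""
    names = sorted({name.casefold() for _, name in authors})
    hits = difflib.get_close_matches(query.casefold(), names, n=10, cutoff=cutoff)
    return [(aid, name) for aid, name in authors if name.casefold() in hits]


class AuthorPicker:
    """Only allows creating a new author after a search has been run."""

    def __init__(self, authors):
        self.authors = list(authors)
        self.searched = False

    def search(self, query):
        self.searched = True
        return find_similar_authors(query, self.authors)

    def create(self, name):
        if not self.searched:
            raise RuntimeError("search the existing authors first")
        new_id = max((aid for aid, _ in self.authors), default=0) + 1
        self.authors.append((new_id, name))
        return new_id


# Hypothetical records; two distinct authors share the exact same name.
authors = [
    (1, "John Jones"), (2, "John Jones"), (3, "J. Jones"),
    (4, "Johnny Jones"), (5, "Jon Jones"), (6, "Mary Smith"),
]
picker = AuthorPicker(authors)
matches = picker.search("John Jones")
print(matches)
```

The search returns records rather than bare names, so both people called John Jones show up as separate choices alongside the spelling variants.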
Now a lot of this depends on how much knowledge your users have about the author ahead of time. If they know nothing beyond the name, they have no basis to judge between the 37 John Jones you have in the database. In this case it might be better just to accept the duplicates and return results based on a filtering by keywords or whatever you are storing about the article. Is it really necessary to make sure that the articles are ascribed to the correct John Jones, if you really know nothing about the author other than his name? Are you more concerned with the subject matter and name of the article or with having a list of all articles written by John Jones from UVA who is a professor of Political Science?
You don't! Names are a bad method of identification as you're finding out. You have a number of methods around this:
Add some form of unique identifier; with normal users this would be a username that you check for uniqueness. In your case, the "name (1)" method described above might have to do, if you really have no information other than the name.
An alternative would be to use multiple attributes to make a composite key (e.g. name + dob).
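For what it's worth, a composite key like that can be sketched as a UNIQUE constraint over both columns, here with Python's sqlite3. The authors table and its columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE authors (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        dob  TEXT,               -- ISO date string; optional in this sketch
        UNIQUE (name, dob)       -- the composite key: same name AND same dob
    )
""")
# Same name, different dates of birth: both rows are accepted.
conn.execute("INSERT INTO authors (name, dob) VALUES ('John Jones', '1960-05-01')")
conn.execute("INSERT INTO authors (name, dob) VALUES ('John Jones', '1975-11-23')")

try:
    # Same name and same date of birth: rejected as a duplicate.
    conn.execute("INSERT INTO authors (name, dob) VALUES ('John Jones', '1960-05-01')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

author_count = conn.execute("SELECT COUNT(*) FROM authors").fetchone()[0]
print(duplicate_rejected, author_count)
```

One caveat: SQLite treats NULLs as distinct in a UNIQUE constraint, so authors with no dob recorded are not deduplicated by this key. That matters when the name is the only guaranteed field.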

Resources