How to model country, state and cities using Neo4j - neo4j

I'm a building a registration form for my website(it is using Neo4j) and need to populate the country, state and city field. All these fields are inter-linked i.e depending on country, state field will be set and depending on state city will be set. I'm trying to figure out what's the best approach to model this using Neo4j. Do I need to create nodes for each country, state and city, and then create relationships between all of them? For instance, Detroit - belongs to - Michigan - belongs to - United States. What would be the best approach to handle this in Neo4j? Are there any examples to look at ? Would it be efficient to do this in Neo4j ? Or is it better to use a document based DB for that such as MongoDB?

I don't see any reason you can't do what you suggested, creating nodes for City, State, Country and wiring them up (I'm planning on doing this exact same thing with my upcoming project). This also lets you reuse those nodes in other parts of your graph, potentially allowing you to make interesting queries using common locations at faster speeds than property comparisons.
If I understand your requirements correctly, you'll have dropdowns or autocomplete fields or similar to drill down to each level (populate dropdown with countries -> populate next dropdown with states in the selected country -> populate last drop down with cities in the selected state). Just add indexes on identifier or abbreviation for quick node lookup and you're good, it should work quite fast.
If you're adding zip codes in there, that could be tricky, as you can't really model it in the same way. You'll have one-to-many relationships from both state and city to zip, and unless I'm mistaken there are a few interesting zips which can span more than a single state and/or city. Some other factors that can complicate things include 5 vs 9 digit zips (or more for other countries), and handling of zip-equivalents in other countries, as they may adhere to different logic.

Related

Best Way to Store Contextual Attributes in Core Data?

I am using Core Data to store objects. What is the most efficient possibility for me (i.e. best execution efficiency, least code required, greatest simplicity and greatest compatibility with existing functions/libraries/frameworks) to store different attribute values for each object depending on the context, knowing that the contexts cannot be pre-defined, will be legion and constantly edited by the user?
Example:
An Object is a Person (Potentially =Employer / =Employee)
Each person works for several other persons and has different titles in relation to their work relationships, and their title may change from one year to another (in case this detail matters: each person may also concomitantly employ one or several other persons, which is why a person is an employee but potentially also an employer)
So one attribute of my object would be “Title vs Employer vs Year Ended”
The best I could do with my current knowledge is save all three elements together as a string which would be an attribute value assigned to each object, and constantly parse that string to be able to use it, but this has the following (HUGE) disadvantages:
(1) Unduly Slowed Execution & Increased Energy Use. Using this contextual attribute is at the very core of my prospective App´s core function (so it would literally be used 10-100 times every minute). Having to constantly parse this information to be able to use it adds undue processing that I’d very much like to avoid
(2) Undue Coding Overhead. Saving this contextual attribute as a string will unduly make additional coding for me necessary each time I’ll use this central information (i.e. very often).
(3) Undue Complexity & Potential Incompatibility. It will also add undue complexity and by departing from the expected practice it will escape the advantages of Core Data.
What would be the most efficient way to achieve my intended purpose without the aforementioned disadvantages?
Taking your example, one option is to create an Employment entity, with attributes for the title and yearEnded and two (to-one) relationships to Person. One relationship represents the employer and the other represents the employee.
The inverse relationships are in both cases to-many. One represents the employments where the Person is the employee (so you might name it employmentsTaken) and the other relationship represents the employments where the Person is the Employer (so you might name it employmentsGiven).
Generalising, this is the solution recommended by Apple for many-many relationships which have attributes (see "Modelling a relationship based on its semantics" in their documentation).
Whether that will address all of the concerns listed in your question, I leave to your experimentation: if things are changing 10-100 times a minute, the overhead of fetch requests and creating/updating/deleting the intermediate (Employment) entity might be worse than your string representation.

Best Practice: Managing string list with entity framework and migration

I've been searching for the best practice for managing/storing a list of strings in the database optional with entity framework, but migration should be supported.
E.g.
I have a list of city-names which I like to store in a new table. This table contains all available citys in my project.
I've a property City in a Address class, which address one of the citys in my City-Table.
1. Is it better to set a reference to the entry in the City-Table or apply and store the value in the Address - Table ?
2. What's the best practice for creating the City-Table? Generating a class City in my Model seems to be a little bit too much overhead, but where can I manage/create it if I like to reference the entries in the Address?
It all comes down to what your application will be able to do. If you are planning to do some operations with cities, it would be wise to have them as separeted entities. Otherwise, you would have to query the Address table, and group the cities (which will offer some troubles, like people entering "Moron" and "Morón" as cities, for example).

Identifying and relating cities from different sources

I have different providers which passes me an excel with different cities, in each city they use some special code for their operations and more data useful to my business.
The problem is that I have a mess with all these cities:
I have my own cities in my database, around 9000 records.
Provider A gives me his excel or webservice to get around 6000.
Provider B gives me another 5000.
Provider C ... etc
Some of the cities given by my providers are already in my database and I only have to update the required data I need.
Otherwise, I have to insert that new city in my database.
And this, each time a provider gives me an update of these cities.
Well, the main problem is that I call a city differently from them, and they differently from each other... how to know if I already have that city or I have to create a new one since we use different names?
The way I see it, I only can achieve it manually. Comparing their cities with mines.
Of course, it's too much work so I made my own script, and implementing the levehnstein function for the database, I can automatically see the more coincident ones and select them by a click. The script does the rest (updates their special operation code for that city into my corresponding city stored in my database).
Even with it, I still feel like I'm missing something. If there was an unicode for those cities this would be much easier and automatic, but I don't have any code which identifies these cities more than my table identifier. Same for my providers, despite some of the use to provide me the postal code among the cities their provide, but not all.
Is there any better solution than mine for this? Any universal code that you usually use or any other aproatch?
Edit:
Well, each city belongs to a country. Of course, I'm considering that.
In my city table I have an Id for each destination, and then a column for the operation code of each provider (I know, this could be better represented with a relationship more), plus country code, zip, url for seo...
Respecting the solution mentioned by MagnusL, creating a Synonyms table, why would I need to store the synonyms? Regarding the script you mentioned with levehnstein and human interaction, that's exactly what I'm currently doing:
With each record provided by a provider and my destinations table. Given a provider city record, I'm showing the more coincident ones from my table.
But before this, I automatically link all those which are coincident in zip code and country.
It's a lot of work for updating my providers special operation code for each city. I am just curious about how people deal with this problem, I'm sure a lot of developers have to face this at some point.
If it is important that the cities are correctly matched, I would guess you must have some manual steps in your process. If you include names of smaller towns you will some day encounter that the same name could actually be two different places in two different countries. (Try Munich on Google Maps and you get one in Germany and one in North Dakota.)
A somewhat complicated, but I guess future proof, workflow is to use id numbers in place of city names in your main data table. Then set up a locations table with those id numbers as primary keys and your preferred name of the city followed by as many meta data columns as required for country code, zip code, WGS84 coordinates, continent name, whatever. Add another table for city name synonyms, with just id numbers and names (without UNIQUE constraint on the id column).
Let your import script try to match the city with help from as many meta data as possible (probably different meta data from different providers), together with the Levehnstein algorithm you mentioned, and let it be clever enough to ask for human interaction in those cases where no one or more than one city are matched. It can of course show you the closest possible guesses, so you can pick the right one and have it stored in the synonym table.
(Yes, it is a lot of coding to get there. If you find it worth it or not depends on how often you do these updates.)
Tip: Wikipedia has articles with different names on cities, i.e. https://en.wikipedia.org/wiki/List_of_names_of_European_cities_in_different_languages
What if you used an extra table for name translation?
IE, the table would have 2 columns: column A the name you use, column B, the name a provider uses. You might need to do adapt this table manually, to look like:
Bruxelles:Brussels
Bruxelles:Brussel
Bruxelles:Bruxelles
While importing, for the name of the city you would then use
select A where B = Brussels
In your agglomerated database, names would then be consistent.

Database table design

I have a table of property listings. I need to add cities to these listings. Is it best practice to split a list of cities into it's own table?
I would like the user when adding a new property to be able to select from a list of cities.
By the way this is a Rails project.
A cities lookup table makes sense in this case.
This will also allow you to add more information for each city in the future, if needed.
If there is only one city per property, there is nothing terribly wrong with putting it in the properties table. If there are more, there is no good choice but to use a cities table.
Alternatively, if you want to pick the cities from a drop down list with no additions allowed, having a cities table may be a good idea. If you do that then you probably want to store the cityid not the city name in the property table. That way when someone changes the name of a city (which admittedly probably doesn't happen very often) you only have to change one record. Of course if you do have a cities table, you must have a foreign key and make sure city_id is indexed in the properties table to maintain your data integrity.
Yes. It is generally a best practice to normalize your database schema such that you are not repeating the same city names in multiple property listing records in your property listing table.
There are cases where you would want to denormalize for performance reasons. I would not consider your case to be one of these cases until it proves itself to be (i.e. table reads become very slow.) Even then, there are optimizations you could undertake prior to denormalizing your schema.

Single Inheritance or Polymorphic?

I'm programming a website that allows users to post classified ads with detailed fields for different types of items they are selling. However, I have a question about the best database schema.
The site features many categories (eg. Cars, Computers, Cameras) and each category of ads have their own distinct fields. For example, Cars have attributes such as number of doors, make, model, and horsepower while Computers have attributes such as CPU, RAM, Motherboard Model, etc.
Now since they are all listings, I was thinking of a polymorphic approach, creating a parent LISTINGS table and a different child table for each of the different categories (COMPUTERS, CARS, CAMERAS). Each child table will have a listing_id that will link back to the LISTINGS TABLE. So when a listing is fetched, it would fetch a row from LISTINGS joined by the linked row in the associated child table.
LISTINGS
-listing_id
-user_id
-email_address
-date_created
-description
CARS
-car_id
-listing_id
-make
-model
-num_doors
-horsepower
COMPUTERS
-computer_id
-listing_id
-cpu
-ram
-motherboard_model
Now, is this schema a good design pattern or are there better ways to do this?
I considered single inheritance but quickly brushed off the thought because the table will get too large too quickly, but then another dilemma came to mind - if the user does a global search on all the listings, then that means I will have to query each child table separately. What happens if I have over 100 different categories, wouldn't it be inefficient?
I also thought of another approach where there is a master table (meta table) that defines the fields in each category and a field table that stores the field values of each listing, but would that go against database normalization?
How would sites like Kijiji do it?
Your database design is fine. No reason to change what you've got. I've seen the search done a few ways. One is to have your search stored procedure join all the tables you need to search across and index the columns to be searched. The second way I've seen it done which worked pretty well was to have a table that is only used for search which gets a copy of whatever fields that need to be searched. Then you would put triggers on those fields and update the search table.
They both have drawbacks but I preferred the first to the second.
EDIT
You need the following tables.
Categories
- Id
- Description
CategoriesListingsXref
- CategoryId
- ListingId
With this cross reference model you can join all your listings for a given category during search. Then add a little dynamic sql (because it's easier to understand) and build up your query to include the field(s) you want to search against and call execute on your query.
That's it.
EDIT 2
This seems to be a little bigger discussion that we can fin in these comment boxes. But, anything we would discuss can be understood by reading the following post.
http://www.sommarskog.se/dyn-search-2008.html
It is really complete and shows you more than 1 way of doing it with pro's and cons.
Good luck.
I think the design you have chosen will be good for the scenario you just described. Though I'm not sure if the sub class tables should have their own ID. Since a CAR is a Listing, it makes sense that the values are from the same "domain".
In the typical classified ads site, the data for an ad is written once and then is basically read-only. You can exploit this and store the data in a second set of tables that are more optimized for searching in just the way you want the users to search. Also, the search problem only really exists for a "general" search. Once the user picks a certain type of ad, you can switch to the sub class tables in order to do more advanced search (RAM > 4gb, cpu = overpowered).

Resources