How to get only cities in geonames - geonames

I want to get cities from geonames dump.
I tried to use cities1000 dump, but it includes districts of cities too. For example there is Bronx in this dump, but the city is only New York.
I tried to select cities from allCountries dump using feature code. But there is no city code. The city can be PPL, PPLA or other.
What is the algorithm to get the cities only?

You have to filter the entries by "PPL" feature code.
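A minimal sketch of that filter, assuming the standard tab-separated GeoNames dump layout where the feature class is the seventh column and the feature code the eighth. The function name and the sample rows are invented for illustration. The accepted codes are a parameter, because the question notes that cities also show up as PPLA and similar codes.

```python
import csv
import io

# Column positions in the tab-separated GeoNames dump (0-based).
NAME, FEATURE_CLASS, FEATURE_CODE = 1, 6, 7


def filter_by_feature_code(lines, codes=frozenset({"PPL"})):
    """Yield names of rows whose feature class is P and whose code is in `codes`."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= FEATURE_CODE:
            continue  # skip malformed rows
        if row[FEATURE_CLASS] == "P" and row[FEATURE_CODE] in codes:
            yield row[NAME]


# Invented sample rows in the dump layout (ids and coordinates are placeholders).
SAMPLE = (
    "1\tExampleville\tExampleville\t\t0.0\t0.0\tP\tPPL\tUS\n"
    "2\tExample District\tExample District\t\t0.0\t0.0\tP\tPPLX\tUS\n"
    "3\tExample Lake\tExample Lake\t\t0.0\t0.0\tH\tLK\tUS\n"
)

cities = list(filter_by_feature_code(io.StringIO(SAMPLE)))
```

For a real dump you would pass an open file instead of the sample, e.g. open("allCountries.txt", encoding="utf-8", newline=""). Rows coded PPLX (section of populated place) are dropped by the default filter.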

Related

How to model country, state and cities using Neo4j

I'm building a registration form for my website (it uses Neo4j) and need to populate the country, state and city fields. These fields are interlinked, i.e. the state options depend on the selected country and the city options depend on the selected state. I'm trying to figure out the best approach to model this using Neo4j. Do I need to create nodes for each country, state and city, and then create relationships between all of them? For instance, Detroit - belongs to - Michigan - belongs to - United States. What would be the best approach to handle this in Neo4j? Are there any examples to look at? Would it be efficient to do this in Neo4j, or is it better to use a document-based DB such as MongoDB?
I don't see any reason you can't do what you suggested, creating nodes for City, State, Country and wiring them up (I'm planning on doing this exact same thing with my upcoming project). This also lets you reuse those nodes in other parts of your graph, potentially allowing you to make interesting queries using common locations at faster speeds than property comparisons.
If I understand your requirements correctly, you'll have dropdowns or autocomplete fields or similar to drill down to each level (populate dropdown with countries -> populate next dropdown with states in the selected country -> populate last dropdown with cities in the selected state). Just add indexes on an identifier or abbreviation for quick node lookup and it should work quite fast.
If you're adding zip codes in there, that could be tricky, as you can't really model it in the same way. You'll have one-to-many relationships from both state and city to zip, and unless I'm mistaken there are a few interesting zips which can span more than a single state and/or city. Some other factors that can complicate things include 5 vs 9 digit zips (or more for other countries), and handling of zip-equivalents in other countries, as they may adhere to different logic.
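To make the drill-down described above concrete, here is a small in-memory sketch in Python. It is not Neo4j code. The LocationGraph class and its method names are hypothetical stand-ins for Country, State and City nodes joined by "belongs to" relationships, and it keys nodes by name only to stay short (a real model would use an identifier, as suggested above).

```python
from collections import defaultdict


class LocationGraph:
    """In-memory stand-in for Country/State/City nodes with 'belongs to' edges."""

    def __init__(self):
        self.labels = {}                   # node name -> "Country" | "State" | "City"
        self.children = defaultdict(list)  # parent name -> names that belong to it

    def add(self, name, label, parent=None):
        self.labels[name] = label
        if parent is not None:
            self.children[parent].append(name)

    def countries(self):
        """Entries for the first dropdown."""
        return sorted(n for n, label in self.labels.items() if label == "Country")

    def children_of(self, parent):
        """Entries for the next dropdown once `parent` has been selected."""
        return sorted(self.children.get(parent, []))


graph = LocationGraph()
graph.add("United States", "Country")
graph.add("Michigan", "State", parent="United States")
graph.add("Detroit", "City", parent="Michigan")
```

Each dropdown is then one lookup: graph.countries() for the first level and graph.children_of(selection) for every level below it.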

Can I Order Results Using Two Columns from Different Tables?

I have a Rails application featuring a city in the US. I'm working on a database process that will feature businesses that pay to be on the website. The goal is to feature businesses within an hour's drive of the city in order to make visitors aware of what is available. I want to group the businesses by city, so that the businesses in the featured city are listed first, followed by the businesses from the next closest city, and so on. The cities should be ordered by distance, and the businesses within each city group by business name.
I have two tables that I want to join in order to accomplish this.
city (has_many :businesses) - name, distance
business (belongs_to :city) - name, city_id, other columns
I know I can do something like the statement below that should only show data where business rows exist for a city row.
@businesses = City.order("distance ASC").joins('JOIN businesses ON businesses.city_id = cities.id')
I would like to also order by businesses.name. I've seen an example that references columns from two tables in a single ORDER BY clause:
ORDER BY a.Date, p.title
Can I add code to my existing statement to order businesses by name or will I have to embed SQL code to do this? I have seen examples with other databases doing this but either the answer is not Rails specific or not using PostgreSQL.
After lots more research I was finally able to get this working the way I wanted to.
Using .joins(:businesses) did not yield anything useful because the result only included the columns for City (aka BusinessCity) and no columns for Business. I found that you have to use .pluck or .select to get access to the columns from the table you are joining. This is something I did not want to do because I foresee more columns being added in the future.
I ended up making Business the main table instead of BusinessCity as my starting point, since I was listing data from Business on my view as stated in my initial question. When I did this I could not use the .joins(:business_cities) clause because it said the relation did not exist, so I went back to the string JOIN I had originally started with, this time with Business as the main table.
I came up with the following statement, which provides all the columns from both tables ordered by distance on the BusinessCity table and name on the Business table. I was also able to add .where clauses as needed to accommodate the search functionality on my view.
@businesses = Business.joins("JOIN business_cities ON business_cities.id = businesses.business_city_id").order("business_cities.distance, businesses.name")
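For reference, the ordering in that statement can be tried outside Rails. The sketch below runs the equivalent SQL from Python against an in-memory SQLite database, which stands in for PostgreSQL here. The table and column names follow the statement above. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business_cities (id INTEGER PRIMARY KEY, name TEXT, distance REAL);
    CREATE TABLE businesses (
        id INTEGER PRIMARY KEY,
        name TEXT,
        business_city_id INTEGER REFERENCES business_cities(id)
    );
    -- Invented sample data: one nearby city and one farther away.
    INSERT INTO business_cities VALUES (1, 'Near City', 0), (2, 'Far City', 30);
    INSERT INTO businesses VALUES
        (1, 'Zeta Diner', 1), (2, 'Alpha Books', 2), (3, 'Beta Cafe', 1);
""")

# Order by a column of the joined table first, then by a column of the main table.
rows = conn.execute("""
    SELECT business_cities.name, businesses.name
    FROM businesses
    JOIN business_cities ON business_cities.id = businesses.business_city_id
    ORDER BY business_cities.distance, businesses.name
""").fetchall()
```

The businesses of the closest city come first, sorted by name, followed by those of the next city.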

Identifying and relating cities from different sources

I have several providers, each of which sends me an Excel file with a list of cities. For each city they use their own special code for their operations, plus other data useful to my business.
The problem is that I have a mess with all these cities:
I have my own cities in my database, around 9000 records.
Provider A gives me an Excel file or a web service with around 6000 cities.
Provider B gives me another 5000.
Provider C ... etc
Some of the cities given by my providers are already in my database and I only have to update the required data I need.
Otherwise, I have to insert that new city in my database.
And this, each time a provider gives me an update of these cities.
Well, the main problem is that I name a city differently from them, and they name it differently from each other. How can I know whether I already have that city or have to create a new one, since we use different names?
The way I see it, I can only achieve this manually, comparing their cities with mine.
Of course, that is too much work, so I wrote my own script. Using a Levenshtein function in the database, I can automatically see the closest matches and select one with a click. The script does the rest (it stores the provider's special operation code for that city on my corresponding city record).
Even so, I still feel like I'm missing something. If there were a universal code for those cities this would be much easier and could be automated, but I have no code that identifies these cities other than my table identifier. The same goes for my providers, although some of them, but not all, include the postal code with the cities they provide.
Is there any better solution than mine for this? Is there a universal code that you usually use, or any other approach?
Edit:
Well, each city belongs to a country. Of course, I'm considering that.
In my city table I have an ID for each destination, then a column for each provider's operation code (I know, this could be better represented with one more relationship), plus country code, zip, URL for SEO...
Regarding the solution mentioned by MagnusL, creating a synonyms table: why would I need to store the synonyms? As for the script you mentioned with Levenshtein and human interaction, that's exactly what I'm currently doing:
For each record from a provider, I show the closest matches from my destinations table.
But before this, I automatically link all those that match on zip code and country.
It's a lot of work to update my providers' special operation codes for each city. I am just curious about how people deal with this problem; I'm sure a lot of developers have to face this at some point.
If it is important that the cities are correctly matched, I would guess you must have some manual steps in your process. If you include names of smaller towns you will some day encounter that the same name could actually be two different places in two different countries. (Try Munich on Google Maps and you get one in Germany and one in North Dakota.)
A somewhat complicated, but I guess future-proof, workflow is to use id numbers in place of city names in your main data table. Then set up a locations table with those id numbers as primary keys and your preferred name of the city, followed by as many metadata columns as required for country code, zip code, WGS84 coordinates, continent name, whatever. Add another table for city name synonyms, with just id numbers and names (without a UNIQUE constraint on the id column).
Let your import script try to match the city using as much metadata as possible (probably different metadata from different providers), together with the Levenshtein algorithm you mentioned, and let it be clever enough to ask for human interaction in those cases where no city or more than one city is matched. It can of course show you the closest possible guesses, so you can pick the right one and have it stored in the synonym table.
(Yes, it is a lot of coding to get there. If you find it worth it or not depends on how often you do these updates.)
Tip: Wikipedia has articles listing the different names of cities, e.g. https://en.wikipedia.org/wiki/List_of_names_of_European_cities_in_different_languages
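The import workflow described in this answer could look roughly like the sketch below. It is only an illustration under assumptions: the record layout (dicts with name, country and optional zip), the function names, the threshold of 2 edits and the sample data are all made up, and the synonym table is modelled as a plain dict keyed by (lowercased name, country).

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def match_city(provider_row, locations, synonyms, max_distance=2):
    """Return ("matched", id), ("ambiguous", [ids]) or ("unknown", [])."""
    name = provider_row["name"].strip().lower()
    country = provider_row["country"]

    # 1. Exact hit in the synonym table: (synonym, country) -> location id.
    if (name, country) in synonyms:
        return "matched", synonyms[(name, country)]

    # 2. Same country and same zip code, when the provider supplies a zip.
    candidates = [loc for loc in locations if loc["country"] == country]
    zip_code = provider_row.get("zip")
    if zip_code:
        by_zip = [loc["id"] for loc in candidates if loc.get("zip") == zip_code]
        if len(by_zip) == 1:
            return "matched", by_zip[0]

    # 3. Fuzzy name match within the country; anything but one hit needs a human.
    close = [loc["id"] for loc in candidates
             if levenshtein(name, loc["name"].lower()) <= max_distance]
    if len(close) == 1:
        return "matched", close[0]
    return ("ambiguous", close) if close else ("unknown", [])


# Illustrative data only.
locations = [
    {"id": 1, "name": "Munich", "country": "DE", "zip": "80331"},
    {"id": 2, "name": "Munich", "country": "US"},
    {"id": 3, "name": "Halle", "country": "DE"},
    {"id": 4, "name": "Celle", "country": "DE"},
]
synonyms = {("münchen", "DE"): 1}
```

The "ambiguous" and "unknown" results are the cases to hand to a person. Once the right city is picked, the provider's spelling can be added to the synonyms so that the next import matches it in step 1.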
What if you used an extra table for name translation?
I.e., the table would have two columns: column A with the name you use and column B with the name a provider uses. You would probably need to maintain this table manually, so that it looks like:
Bruxelles:Brussels
Bruxelles:Brussel
Bruxelles:Bruxelles
While importing, you would then look up the name of the city with a query like this (translations stands for whatever you call that table):
SELECT A FROM translations WHERE B = 'Brussels'
In your agglomerated database, names would then be consistent.
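A small sketch of that lookup, using Python's sqlite3 module with an in-memory database. The table name translations, the UNIQUE constraint on the provider name and the helper canonical_name are assumptions made for the example. Columns a and b correspond to columns A and B above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# a = the name we use, b = the name a provider uses (assumed unique here).
conn.execute("CREATE TABLE translations (a TEXT NOT NULL, b TEXT NOT NULL UNIQUE)")
conn.executemany(
    "INSERT INTO translations (a, b) VALUES (?, ?)",
    [("Bruxelles", "Brussels"), ("Bruxelles", "Brussel"), ("Bruxelles", "Bruxelles")],
)


def canonical_name(provider_name):
    """Return our name for a provider's city name, or None if it is not mapped yet."""
    row = conn.execute(
        "SELECT a FROM translations WHERE b = ?", (provider_name,)
    ).fetchone()
    return row[0] if row else None
```

A None result marks a provider name that still has to be added to the table by hand.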

How can I extract a location from text

I am working on a web mapping application and I am facing the following issue.
A user can post an address, and the address can be in any format, like
Street, City, State, Country or Country Street State City
I have mentioned just two formats, but it could be in any format.
My task is to extract the city name, street and country from the address. The problem is that multiple city names or street names may match, so how can I do this?
I have all the information about locations in a database: city, country, street, area code.
I don't believe there is an easy way to do what you want here. It seems the user can give the data in a basically free-form way, so there is no way to tell from the input alone what is a street name and what is a city name, etc. Unless you enforce some sort of format, nothing is going to work every time.
A different approach may be to take the input, remove such things as "St" and "Street", etc., and then search the database for each of the given names against city, street and county, etc. From the results you will probably be able to determine the most likely address and get the user to confirm it.
A lot of government websites appear to use the approach I have just given you above when registering for things. (i.e. Voting) It is not perfect however.
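The approach above could be sketched as follows. The gazetteer contents, the list of street words and the function name are hypothetical. In practice the membership checks would be queries against the location database. Names that match more than one kind are reported under each, so the user can confirm.

```python
import re

# Hypothetical gazetteer contents; in practice these would be database lookups.
KNOWN = {
    "city": {"london", "paris"},
    "country": {"united kingdom", "france"},
    "street": {"baker", "rivoli"},
}
STREET_WORDS = {"st", "street", "rd", "road", "ave", "avenue"}


def classify_address(text, max_words=3):
    """Tag known names in free-form text, trying longer phrases first."""
    words = [w for w in re.findall(r"[^\W_]+", text.lower())
             if w not in STREET_WORDS]
    found = {kind: [] for kind in KNOWN}
    i = 0
    while i < len(words):
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            kinds = [kind for kind, names in KNOWN.items() if phrase in names]
            if kinds:
                for kind in kinds:
                    found[kind].append(phrase)
                i += n  # consume the matched phrase
                break
        else:
            i += 1  # no known name starts at this word; skip it
    return found
```

Because it scans word sequences rather than relying on commas, the same function handles both "Street, City, State, Country" and "Country Street State City" style input, as long as the names exist in the database.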

Location based information

I would like to show information on my website based on the user's geography. In my current design I would not want the user to enter their location/zip code.
Using the IP address I can find the user's location, but how do I leverage this information to show relevant events/information from surrounding cities/towns?
Thanks
Based on IPs, you only get limited accuracy when determining location. You should have an option that lets them enter their city/state or zip code.
Once you have long/lat, you just need to run a query to find records in your database a specific distance from that long/lat.
PHP/MySQL: Select locations close to a given location from DB
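As an illustration of the distance query mentioned above, here is a minimal Python sketch that filters records by great-circle distance using the haversine formula. The record layout, the function names and the sample events are assumptions, and the coordinates are approximate. With a large table you would normally push this filter into the database query instead.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(records, lat, lon, radius_km):
    """Return (distance_km, record) pairs inside the radius, nearest first."""
    hits = [(haversine_km(lat, lon, r["lat"], r["lon"]), r) for r in records]
    return sorted((h for h in hits if h[0] <= radius_km), key=lambda h: h[0])


# Illustrative events with approximate city-centre coordinates.
EVENTS = [
    {"name": "New York event", "lat": 40.71, "lon": -74.01},
    {"name": "San Diego event", "lat": 32.72, "lon": -117.16},
    {"name": "Los Angeles event", "lat": 34.05, "lon": -118.24},
]

nearby = [r["name"] for _, r in within_radius(EVENTS, 34.05, -118.24, 200)]
```

Here nearby holds the events within 200 km of the Los Angeles coordinates, ordered from nearest to farthest.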
http://www.ipinfodb.com/ip_location_api.php
This will send you an XML response with the pertinent information.
Let's say you can get their city from the IP address. You would need a database of cities with IDs that other database entries refer to. Like:
database table cities
---------------------
ID  City
1   Los Angeles
database table restaurants
--------------------------
ID  city_id  name
1   1        Big Al's
Then you could search for restaurants that have the city_id of the city you got from their IP.
There are so many different approaches to relational databases. This is just one small example.
