Querying Geonames for Continent Code - geonames

I'm looking to find out what endpoint and what parameters I'd have to pass to do a search for a continent and obtain results that include the continent code.
For example, if I said to it "North America", I'd like to be able to see in the results "NA".

I know this isn't exactly what you're asking, but it might help lead you towards it. To find only continents, call setFeatureCode("CONT") on the search criteria; to see the continent code when you access each Toponym in the search results, you also need setStyle(Style.FULL).
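As a rough sketch, the same request can be expressed as a plain HTTP call to the GeoNames search web service. The snippet below only builds the URL and does not send it. The `searchJSON` endpoint, the `featureCode` and `style` parameter names, and the `your_username` account are assumptions that are not part of the answer above, so check them against the GeoNames documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint of the GeoNames search web service (JSON flavour).
GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def continent_search_url(name, username):
    """Build a search URL restricted to continents, with full-detail results."""
    params = {
        "q": name,              # free-text search term, e.g. "North America"
        "featureCode": "CONT",  # only return continent features
        "style": "FULL",        # full style, so continent details are included
        "maxRows": 10,
        "username": username,   # placeholder: a registered GeoNames account name
    }
    return GEONAMES_SEARCH + "?" + urlencode(params)


print(continent_search_url("North America", "your_username"))
```

With the full style, each result should carry a continent code field that you can read from the decoded JSON.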

Related

Foursquare API - Search By Address

I am using the venues/search API in my app and I am getting some strange results: https://developer.foursquare.com/docs/venues/search.
If I send in the query "1 Irving", as if a user is searching for an address, the list of results returned by Foursquare contains irrelevant venues. From looking at the documentation, I would guess that this is because the "query" parameter of the API is only searching against venue names, and not addresses.
If that is the case, does anyone know if there is any way to get the API to search against address information also? It seems the Foursquare and Swarm apps both do this when searching, as the results for "1 Irving" are much more relevant when I try there.
Edit: including screenshot from Foursquare app
In the venues/search documentation it mentions that if you set intent=match you can include an address parameter.
Finds venues that are nearly-exact matches for the given
parameters. This intent is highly sensitive to the provided location.
We recommend using this intent only when trying to correlate an
existing place database with Foursquare's. The results will be sorted
best match first, taking distance and spelling mistakes/variations
into account.
query and ll are the only required parameters for this intent, but
matching also supports phone, address, city, state, zip, and twitter.
There's no specified format for these parameters—we do our best to
normalize them and drop them from the search if unsuccessful.
However, you still need to provide a query parameter, which would be the venue name. I don't believe an endpoint exists to look up a venue with nothing but an address.
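A minimal sketch of such a request, assuming the v2 endpoint URL and the usual `client_id`, `client_secret` and `v` credential parameters (none of which appear in the quoted documentation). All values are placeholders, and the snippet only builds the URL.

```python
from urllib.parse import urlencode

# Assumed base URL for the venues/search endpoint.
SEARCH_URL = "https://api.foursquare.com/v2/venues/search"


def match_search_url(query, ll, address, client_id, client_secret, version="20160101"):
    """Build a venues/search URL that uses intent=match with an address hint."""
    params = {
        "intent": "match",   # nearly-exact matching, sensitive to location
        "query": query,      # still required: the venue name
        "ll": ll,            # still required: "latitude,longitude"
        "address": address,  # optional matching hint
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # placeholder API version date
    }
    return SEARCH_URL + "?" + urlencode(params)


print(match_search_url("Some Venue", "40.73,-73.99", "1 Irving", "ID", "SECRET"))
```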
Just from taking a quick look at the docs, it seems like you can just pass the location in as the near object rather than the query object. The docs specifically say "A search term to be applied against venue names."

How to implement fuzzy search

I'm using the Neo4j 3 REST API, and I have a node named customer with properties such as name. I need to search customers by name so that, for example, the input "joan" also returns the name "john". How can I implement a fuzzy search that gives me these results?
Thanks in advance
First off, I want to make sure you know that Neo4j 3.x is currently in beta and isn't considered stable yet.
You have two options to implement a fuzzy search in Neo4j. You can use the legacy indexes to implement Lucene-based indexing. That should provide anything that Lucene can do, though you'd probably need to do a bit more work. You can also implement your own unmanaged extension, which will allow you to use Lucene a bit more directly.
Perhaps the easier alternative is to use Elasticsearch with Neo4j and have Elasticsearch do your full-text indexing. You might take a look at the Neo4j and ElasticSearch page on neo4j.com. It links to a GitHub repository with a Neo4j plugin that automagically updates Elasticsearch with data from Neo4j and provides an endpoint for querying your graph fuzzily. There is also a video tutorial on how to do this.
You can try the Soundex approach described at https://neo4j.com/developer/kb/how-to-perform-a-soundex-search/, which will work in this case. With an ordinary search, the input Joan will not return John unless you give just jo as input, in which case you will get both. To get what you are expecting, you will have to use the Soundex search.
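To illustrate why Soundex helps here, below is a standalone Python sketch of the classic American Soundex code. It is not the Neo4j procedure from the linked article, only a demonstration that "john" and "joan" reduce to the same four-character key.

```python
def soundex(name):
    """Return the 4-character American Soundex code for a name."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()
    prev = codes.get(letters[0], "")  # code of the previous significant letter
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = codes.get(ch, "")      # vowels map to "" and reset prev
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]       # pad or truncate to 4 characters


print(soundex("john"), soundex("joan"))
```

Both names produce the same code, so a lookup on the Soundex key finds one from the other.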
Stepping back a little, what is the problem you are trying to solve with fuzzy matching?
My experience has been that misspellings and typos are far less common than you might think, and humans prefer exact matches whenever possible. If there is no exact match (often just missing a space between words), that's a good time to use a spellchecker, and that's where the fuzzy matching should kick in.
In addition, your example would match "joan" to "john", but some synonyms like "joanie" would be more useful. If you have a big corpus of content to work with, you may be able to extract some relationships, using fuzzy & machine learning to identify "joanne" and "joni" as possible synonyms and then submit that to a human curator. "Jon" looks like a related name but it's not, while "jo" and even "nonie" may or may not be nicknames in these groupings.
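The "exact match first, fuzzy only as a fallback" idea can be sketched with Python's standard `difflib` module. The function name `lookup` and the 0.6 cutoff are illustrative choices, not something prescribed by the answer.

```python
import difflib


def lookup(name, known_names, cutoff=0.6):
    """Return exact matches if any; otherwise fall back to close matches."""
    lowered = {n.lower(): n for n in known_names}
    key = name.lower()
    if key in lowered:
        return [lowered[key]]  # exact hit: do not fuzz at all
    # No exact hit: suggest up to three similar names above the cutoff.
    close = difflib.get_close_matches(key, list(lowered), n=3, cutoff=cutoff)
    return [lowered[c] for c in close]


print(lookup("John", ["John", "Joan"]))
print(lookup("joan", ["John", "Mary"]))
```

A curated synonym list, as described above, could be consulted between the exact lookup and the fuzzy fallback.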

Parsing Wikipedia countries, regions, cities

Is it possible to get a list of all Wikipedia countries, regions and cities with relations between them? I couldn't find any API appropriate for this task.
What would be the easiest way to parse all the information I need?
PS: I know that there are other data sources I can get this information from. But I am interested in Wikipedia...
[2020 update] this is now best done using the Wikidata Query Service, you can run super specific queries with a bit of SPARQL, example: Find all countries and their label. See Wikidata Query Help
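A minimal sketch of the "all countries and their label" query, assuming the public endpoint at `https://query.wikidata.org/sparql` and its `format=json` parameter. The snippet only builds the request URL; fetching it and handling rate limits is left out.

```python
from urllib.parse import urlencode

# Assumed public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Every item that is an instance of (P31) country (Q6256), with an English label.
COUNTRIES_QUERY = """
SELECT ?country ?countryLabel WHERE {
  ?country wdt:P31 wd:Q6256 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""


def sparql_url(query):
    """Build a GET URL that asks the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


print(sparql_url(COUNTRIES_QUERY))
```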
It might be a bit tedious to get the whole graph but you can get most of the data from the experimental/non-official Wikidata Query API.
I suggest the following workflow:
Go to an instance of the kind of entities you want to work with, say Estonia (Q191) and look for its instance of (P31) properties, you will find: country, sovereign state, member of the UN, member of the EU, etc.
Use the Wikidata Query API claim command to output every entity that has the chosen P31 property. Let's try with country (Q6256):
http://wdq.wmflabs.org/api?q=claim[31:6256]
It outputs an array of numeric ids: that's your countries! (notice that the result is still incomplete as there are only 141 items found: either countries are missing from Wikidata, or, as suggested by Nemo in comments, some countries are to be found in country (Q6256) subclasses(P279))
You may want more than ids though, so you can ask Wikidata Official API for entities data:
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q16&format=json&props=labels|claims&languages=en|fr
(here Canada(Q16) data, in json, with only claims and labels data, in English and French. Look at the documentation to adapt parameters to your needs)
You can query multiple entities at a time, with a limit of 50, as follows:
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q16|Q17|Q20|Q27|Q28|Q29|Q30|Q31|Q32|Q33|Q34|Q35|Q36|Q37|Q38|Q39|Q40|Q41|Q43|Q45|Q77|Q79|Q96|Q114&format=json&props=labels|claims&languages=en|fr
From each country's data, you could look for entities registered as administrative subdivisions (P150) and repeat on those new entities.
Alternatively, you can get the whole tree of administrative subdivisions with the tree command. For instance, for France (Q142) that would be http://wdq.wmflabs.org/api?q=tree[142][150] Tadaaa, 36994 items! But that's way harder to refine, given the different kinds of subdivision you can encounter from one country to another. And avoid running this kind of query from a browser; it might crash.
You now just have to find cities by countries by refining this last query with the claim command, and the appropriate sub-class(P279) of municipality(Q15284) entity (all available here): for France, that's commune (Q484170), so your request looks like
http://wdq.wmflabs.org/api?q=tree[142][150] AND claim[31:484170]
then repeat for all the countries: have fun!
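The batching step in the workflow above (at most 50 ids per wbgetentities call) can be sketched as follows. The helper names are hypothetical, and the snippet only builds the URLs from a list of numeric ids such as the one returned by the claim command.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def chunks(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def entity_urls(numeric_ids, languages=("en", "fr")):
    """Build one wbgetentities URL per batch of up to 50 entity ids."""
    qids = ["Q%d" % n for n in numeric_ids]
    urls = []
    for batch in chunks(qids):
        params = {
            "action": "wbgetentities",
            "ids": "|".join(batch),
            "format": "json",
            "props": "labels|claims",
            "languages": "|".join(languages),
        }
        urls.append(API + "?" + urlencode(params))
    return urls


for url in entity_urls([16, 17, 20]):
    print(url)
```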
You should go with Wikidata and/or dbpedia.
Personally I'd start with Wikidata as it's directly using MediaWiki, with the same API so you can use similar code. I would use pywikibot to get started. Like that you can still request pages from Wikipedia where that makes sense (e.g. list pages or categories).
Here's a nice overview of ways to access Wikidata

What's the best way to lookup the US county a US city resides in?

I'm looking for the best/easiest way to programmatically grab the name of the US county a given US city resides in. It doesn't seem there's a straightforward API available for such a (seemingly simple) task?
You can download a freely-available database of county/city/zip code info such as this one:
http://www.unitedstateszipcodes.org/zip-code-database/ (no need to register or pay)
Import it whole, or a subsection of it, into a local, persistent data store (such as a database) and query it whenever you need to look up a city's county
Note: County info has disappeared from the originally-linked .csv file since this answer was posted.
This link no longer contains county information: http://federalgovernmentzipcodes.us/free-zipcode-database.csv
1) Cities span counties
2) Zips span both cities and counties, not even on the same lines
Any solution that uses zip as an intermediary is going to corrupt your data (and no, "zip+4" won't usually fix it). You will find that a city-to-zip-to-county data map (#2) has a larger number of city-to-county matches than the more accurate model (#1)--these are all bad matches.
What you're looking for is free census data. The Federal Information Processing Standards (FIPS) dataset you need is called "2010 ANSI Codes for Places": https://www.census.gov/geographies/reference-files/time-series/geo/name-lookup-tables.2010.html
Census "places" are the "cities" for our question. These files map "places" to one or more county.
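As a sketch of how such a file could be loaded, the snippet below assumes a pipe-delimited layout with `STATE`, `PLACENAME` and `COUNTY` columns, where `COUNTY` may hold several comma-separated names. The column names and the single sample row are illustrative assumptions, so check them against the header of the file you actually download.

```python
import csv
import io

# Illustrative sample in the assumed layout; not copied from the real file.
SAMPLE = """STATE|STATEFP|PLACEFP|PLACENAME|TYPE|FUNCSTAT|COUNTY
GA|13|04000|Atlanta city|Incorporated Place|A|DeKalb County, Fulton County
"""


def load_place_counties(text):
    """Map (place name, state) to the list of counties the place touches."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    mapping = {}
    for row in reader:
        counties = [c.strip() for c in row["COUNTY"].split(",") if c.strip()]
        mapping[(row["PLACENAME"], row["STATE"])] = counties
    return mapping


places = load_place_counties(SAMPLE)
print(places[("Atlanta city", "GA")])
```

Keeping a list per place preserves the one-to-many relationship between cities and counties instead of forcing a single answer.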
It will not be easy to use geospatial functions for this task because of the odd polygon shapes of counties and the point locations of cities.
Your best bet is to reference a database of cities and their respective counties, though I don't know where you could find one.
Maybe Texas publishes one?
CommonDataHub doesn't contain this information.
Here is a bit of code to programmatically grab the name of a US county given a single US city/state using the Google Maps API. This code is slow/inefficient and does not have any error handling. However, it has worked reliably for me to match counties with a list of ~1,000 cities.
# Set up the Google Maps API client
import googlemaps
google_maps = googlemaps.Client(key='API_KEY_GOES_HERE')
# String of city/state
address_string = 'Atlanta, GA'
# Geocode
location = google_maps.geocode(address_string)
# Loop through the first dictionary within `location` and find the address component that contains the 'administrative_area_level_2' designator, which is the county level
target_string = 'administrative_area_level_2'
county_name = None  # Some locations might not contain the expected information
for item in location[0]['address_components']:
    if target_string in item['types']:  # Match target_string
        county_name = item['long_name']  # Or 'short_name'
        break  # Break out once county is located
This produces:
>>> county_name
Fulton County
Caveats:
code will break if google_maps.geocode() is not passed a valid address
certain addresses will not return data corresponding to 'administrative_area_level_2'
this does not solve the problem of US cities that span multiple counties. Instead, I think the API simply returns the county associated with the single latitude/longitude associated with address_string
The quickest and least invasive way might be to use a JSON/XML request to a free geolocation API (easily found on Google). That way you don't need to create or host your own database.

is there an algorithm to find out which words in a search-string belong together?

I was thinking about text-driven search by user input.
Often you are searching a database of addresses, where you can find customers and so on.
Does anybody have an idea how to find out which of the typed words is the name, which is the street name, and which is the company name?
Secondly, if the name is a double name like "Lee Harvey", how can I find out that the two words Lee and Harvey belong together?
The same problem arises with company names like "frank the baker inc."...
Is there any algorithm or best-practice strategy?
Thanks for links, tutorials, scripts and all other help ;-)
What you basically want is a search engine :) Here are the basic steps you need to follow -
You need to create an 'Inverted Index' of the content you want to be searched on.
The index is a 'name'=>'value' pair. You can have this pair in whichever way you want (tuned according to your data & needs).
Eg. for your problem of double names, you could split all your names into single words & index it like so -
'lee'=>'lee harvey'
'harvey'=>'lee harvey'
...
this way when anyone searches for 'lee' they get 'lee harvey'. There are other better approaches to this called "n-gram" indexing. Check it out...
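A minimal sketch of the word-level inverted index described above, using hypothetical helper names. Each word maps to the set of full names that contain it, and a multi-word query keeps only the names that contain every word.

```python
from collections import defaultdict


def build_index(names):
    """Map each lowercase word to the set of full names containing it."""
    index = defaultdict(set)
    for name in names:
        for word in name.lower().split():
            index[word].add(name)
    return index


def search(index, query):
    """Return the names that contain every word of the query."""
    hits = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*hits) if hits else set()


index = build_index(["Lee Harvey", "Frank the Baker Inc."])
print(search(index, "lee"))
print(search(index, "harvey lee"))
```

An n-gram index follows the same pattern, with character n-grams as keys instead of whole words.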
You could possibly build indexes of names, addresses, emails etc. & when the user types a query, check it against all your indexes with the approach suggested above. After you get the results, merge them. Maybe you could introduce the notion of rank so that you can sort your results & show the latest or most relevant ones at the top. For this you need to figure out a way to score your terms...
Don't worry about it; just perform a full-text search. Then check, for each result item, which field contains the search terms. You may also display items in separate lists (terms found in name, terms found in address). The only difficulty is that if John Smith lives on John Smith Street, you must decide which list or lists the result item belongs to.
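The post-search check described here can be sketched as follows. The record layout and the function name are made up for illustration, and the match is a plain case-insensitive substring test.

```python
def matching_fields(record, terms):
    """Return, per field, the search terms found in that field's text."""
    terms = [t.lower() for t in terms]
    found = {}
    for field, value in record.items():
        text = value.lower()
        matched = [t for t in terms if t in text]
        if matched:
            found[field] = matched
    return found


record = {
    "name": "John Smith",
    "address": "12 John Smith Street",
    "company": "Frank the Baker Inc.",
}
print(matching_fields(record, ["john", "smith"]))
```

For this record both the name and the address field match, which is exactly the ambiguous case where the item may need to appear in both lists.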
