Web access dataset with IP addresses - geolocation

I'm looking for a dataset that has information about web page access based on IP (could be anonymized) or geolocation. I noticed that neither the Stack Overflow nor the Wikipedia open-source datasets include this information.
Specifically, I'm looking for how many reads/writes are done on a web page, broken down by geolocation.
Is there any public dataset that provides this information?

Related

How can I build a corpus from unstructured text (pdf, txt, html) and train IBM Watson? And then ask questions via API calls?

I want to train a machine learning system such as IBM Watson using some unstructured PDF, TXT and HTML data, and then ask questions and get answers via API calls. How can I achieve that: with GUI-based training or API-based training? From Bluemix, it is hard to decide which service is best for this requirement. Can you please suggest the best options?
Retrieve and Rank: Retrieve and Rank can surface the most relevant information from a collection of documents. For example, using R&R, an experienced technician can quickly find solutions in dense product manuals, and a contact center agent can quickly find answers to improve average call handle times. The Retrieve and Rank service works "out of the box," but can also be customized to improve the results.
Discovery Service: Extract value from unstructured data by converting, normalizing and enriching it. Use a simplified query language to explore that data or to quickly tap into pre-enriched datasets like the Discovery News collection.
I would recommend Watson Discovery (https://www.ibm.com/watson/services/discovery) for your purpose.
It's very complete and supports many features in both GUI and API. It supports questions in natural language or in query format.
Its documentation is here: https://console.bluemix.net/docs/services/discovery/getting-started.html#getting-started-with-the-api
If you create a free instance of Watson Discovery, you can test its API here: https://watson-api-explorer.mybluemix.net/apis/discovery-v1
There are examples of each API call here: https://www.ibm.com/watson/developercloud/discovery/api/v1/
There is also a demo and the corresponding code here:
https://discovery-news-demo.mybluemix.net/?cm_mc_uid=30407807098515090430617&cm_mc_sid_50200000=1509636542&cm_mc_sid_52640000=1509636542
and
https://github.com/watson-developer-cloud/discovery-nodejs
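As a rough sketch of the API route, the snippet below builds a Discovery v1 query URL that asks a natural-language question against a collection you have already created and filled with your PDF/TXT/HTML documents. The endpoint path and the `natural_language_query` and `version` parameters follow the v1 API reference linked above as I understand it; the base URL, environment and collection IDs and the version date are placeholders to replace with values from your own instance, and authentication is left out.

```python
from urllib.parse import urlencode


def discovery_query_url(base_url: str, environment_id: str, collection_id: str,
                        question: str, version: str = "2017-11-07") -> str:
    """Build a Discovery v1 query URL for a natural-language question.

    base_url, environment_id, collection_id and the version date are
    placeholders (assumptions): take the real values from your own instance.
    """
    path = (f"{base_url.rstrip('/')}/v1/environments/{environment_id}"
            f"/collections/{collection_id}/query")
    params = {"version": version, "natural_language_query": question}
    return f"{path}?{urlencode(params)}"
```

Sending a GET to this URL with your service credentials should return the matching passages as JSON; the API explorer linked above is a convenient place to confirm the exact parameters for your service version.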

Geolocation: is it possible to get latitude and longitude from an address and store them locally in my database?

I want to be able to run queries locally comparing latitude and longitude of locations so I can run queries for certain addresses I've captured based on distance.
I found a free database that has this information for ZIP codes, but I want this information for more specific addresses. I've looked at Google's geolocation service, and it appears to be against the TOS to store these values in my database or to use them for anything other than doing stuff with Google Maps. (If somebody has looked deeper into this and I'm incorrect, let me know.)
Am I likely to find any (free or paid) service that will let me store these lat/lon values locally? The number of addresses I need is currently pretty small, but if my site becomes popular it could grow quite a bit over time. I only need to get the coordinates of each address once, though.
You're correct: it can't be done with Google's service and still conform to the TOS. Cheers to you for honestly seeking to comply with the TOS.
I work at a company called SmartyStreets, where we process and verify addresses, and geocode them, too. Google's terms don't allow you to store the data returned from the API, and there are pretty strict usage limits before they throttle or cut off your access.
Screen scraping presents many challenges and problems, both technical and ethical, and I won't get into them here. The Microsoft library linked to by Giorgio is for .NET only.
If you're still serious about doing this, we have a service called LiveAddress which is accessible from any platform or language. It's a RESTful API that can be called using GET or POST, for example, and the output is JSON, which is easy to parse in pretty much every common language/platform.
Our terms allow you to store the data you collect as long as you don't re-manufacture our product or build your own database in an attempt to duplicate ours (or something of the like). For what you've described, though, it shouldn't be a problem.
Let me know if you have further questions about address geocoding; I'll be happy to help.
By the way, there's some sample code at our GitHub repo: https://github.com/smartystreets/LiveAddressSamples
You could use a screen scraper on http://www.zip-info.com/cgi-local/zipsrch.exe?ll=ll&zip=13206&Go=Go if you only need to get them once.
Microsoft also provides this service. Check whether this can help you: http://msdn.microsoft.com/en-us/library/cc966913.aspx
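Once each address has been geocoded by a provider whose terms permit storage, the distance queries from the question can run entirely locally. The following is a minimal sketch, assuming an SQLite table named `addresses` with `address`, `lat` and `lon` columns (all names are illustrative, not from any of the services above): an indexed bounding-box prefilter in SQL, followed by an exact haversine check on a spherical Earth of radius 6371 km.

```python
import math
import sqlite3

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def create_store(rows):
    """Create an in-memory table of already-geocoded addresses (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE addresses (address TEXT, lat REAL, lon REAL)")
    conn.execute("CREATE INDEX idx_addresses_lat_lon ON addresses (lat, lon)")
    conn.executemany("INSERT INTO addresses VALUES (?, ?, ?)", rows)
    return conn


def addresses_within(conn, lat, lon, radius_km):
    """Return (address, distance_km) pairs within radius_km of (lat, lon), nearest first."""
    # Deliberately generous bounding box (~111 km per degree of latitude) so the
    # SQL prefilter keeps every candidate for ordinary radii.
    # Does not handle boxes that cross the 180 degree meridian or reach a pole.
    dlat = radius_km / 111.0
    dlon = radius_km / (111.0 * max(math.cos(math.radians(lat)), 1e-6))
    rows = conn.execute(
        "SELECT address, lat, lon FROM addresses "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?",
        (lat - dlat, lat + dlat, lon - dlon, lon + dlon),
    )
    # Exact distance check on the (small) candidate set.
    hits = [(addr, haversine_km(lat, lon, a_lat, a_lon)) for addr, a_lat, a_lon in rows]
    return sorted((h for h in hits if h[1] <= radius_km), key=lambda h: h[1])
```

For example, after `conn = create_store([("Office", 40.7128, -74.0060)])`, calling `addresses_within(conn, 40.73, -73.99, 5)` returns the stored addresses within 5 km of that point, nearest first. The bounding box lets the database use its index, so only a handful of rows ever reach the trigonometry.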

Where are AWS data center locations listed?

I'm failing at Google search today. Is there a page that lists geolocations of the various Amazon AWS server farms?
I want to use this data to pick the appropriate farm for a client on a web app, CDN-style.
(This isn't programming, but it's for the purpose of programming, and I thought it would be useful to have this question answered for public consumption.)
Very late to the party, but in case anyone else has the same question, we've done some work to map AWS regional data centers by the fastest connection to each country (and state in the US):
http://www.turnkeylinux.org/blog/aws-datacenters
The code used to do this has been open-sourced:
https://github.com/turnkeylinux/aws-datacenters
Mashup of associations and worldwide underwater cables:
Regarding the GeoIP implementation, see:
http://www.turnkeylinux.org/blog/geoip-amazon-regions
Each service's detail page (e.g. http://aws.amazon.com/ec2/, http://aws.amazon.com/cloudfront/) provides this information.
WikiLeaks - AmazonAtlas
The locations of AWS's confidential data centers have been leaked:
11 October 2018, WikiLeaks publishes a "Highly Confidential" internal document from the cloud computing provider Amazon. The document from late 2015 lists the addresses and some operational details of over one hundred data centers spread across fifteen cities in nine countries
https://wikileaks.org/amazon-atlas/
https://wikileaks.org/amazon-atlas/map/
datacenters.com
The best reference I found for tracking down the individual data center locations is datacenters.com's locations page: https://www.datacenters.com/locations
From there you can see locations and details of individual data centers like AWS Ashburn
domenech's Google Maps overlay
The other reference I came across is this blog post from domenech: Amazon Web Services Google Maps: World Domination Map
Direct link to the map here
An updated list here:
https://gist.github.com/atyachin/a011edf76df66c5aa1eac0cdca412ea9
Compiled from various sources including those mentioned in other answers.
Coordinates represent the location of a specific datacenter in the zone. Most availability-zones have multiple datacenters so there's no single coordinate for each AZ.
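Once you have coordinates for each region and a location for the client (for example from a GeoIP lookup, as in the TurnKey post above), picking the closest region is a nearest-neighbour search. The sketch below uses straight great-circle distance as a simple stand-in; the TurnKey mapping is based on connection speed instead, and network latency does not always follow geography. The region table is a hypothetical placeholder with approximate metro-area coordinates, not official AWS data.

```python
import math

# Illustrative, approximate metro-area coordinates (an assumption, not official
# AWS data): replace with values from a source such as the gist above.
REGIONS = {
    "us-east-1": (38.9, -77.4),       # Northern Virginia
    "us-west-2": (45.8, -119.7),      # Oregon
    "eu-west-1": (53.3, -6.3),        # Dublin
    "ap-northeast-1": (35.7, 139.7),  # Tokyo
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_region(client_lat, client_lon, regions=REGIONS):
    """Return the region code geographically closest to the client."""
    return min(regions, key=lambda r: haversine_km(client_lat, client_lon, *regions[r]))
```

For instance, a client located in London maps to `eu-west-1` with this table. If you later measure real latency per region, swapping the key function for a latency lookup keeps the rest unchanged.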

Good examples of MapServer / OpenLayers

I want to convince some clients to use MapServer and OpenLayers. Can anyone suggest attractive websites that show off the possibilities?
The clients will be impressed by:
A density map (otherwise known as a heat map, colour-shaded grid coverage, contour plot...).
The ability for the user to download the underlying data for the density map, restricted to the area being viewed, in some format such as netCDF.
Standard OpenLayers stuff. Zooming, panning, scale bar, overview map...
Different base layers. Could be WMS, Google, Bing...
Searching for a placename, map is panned to display the place.
Exposing the heatmap data for other people to use in mashups as WMS or WCS
MapServer.org is back up, but demo.mapserver.org seems to be down right now :( From memory, though, their examples didn't have the "wow" factor. The OpenLayers examples each demonstrate only one or two features; I want something that wows the clients by showing all the capabilities in one example.
PS: If you have good examples that use other open-source tools, by all means post them. But JavaScript only, please: the customer says no rich client.
EDIT: Come on, StackOverflow, someone must have an example that uses a density map?? I'm even offering a bounty now...
Note: this answer is no longer relevant. The open-source maps have since been replaced with a commercial alternative by a different company.
http://maps.seai.ie/wind/ - mapping onshore and offshore wind speeds and farms in Ireland
http://maps.seai.ie/geothermal/ - mapping geothermal temperatures in Ireland, and borehole data
uses WMS services (and TileCache) for all the layers, so they can be accessed by other GIS clients (once I've set up the metadata, etc.)
has a variety of different base maps to choose from
built using MapFish / ExtJS
has drop down gazetteers for County and Townland (an Irish administrative unit)
all the basic map navigation tools and a simple info tool
right click on a layer to set transparency
uses the open-source MapServer back end, plus SQL Server 2008
The systems (and a third more complex Bioenergy Intranet system) got a mention here: http://www.geoconnexion.com/uploads/renewableenergy_intv9i4.pdf
http://haiticrisismap.org/ (OpenLayers + GeoExt)
Would it be possible to create a template map for the client with a bunch of data on it (census, socio-economic) and some simple fake buffers?
Maybe have a look at the HeatMapAPI for Google Maps (not sure you'll wow the client with that though).
Another density map: http://maps.glassfish.org/server/ (showing the use of GlassFish around the world).
We're using the OpenLayers heatmap layer, mostly because (for us) it handles large data volumes better than the Google Maps version (your mileage may vary).
http://www.patrick-wied.at/static/heatmapjs/demo/maps_heatmap_layer/openlayers.php
By large data volumes, I mean location datasets with 100K+ rows.
It also works nicely as an ASPX page with dynamic realtime data retrieval from an SQL Server database. I've used a stored procedure to pre-process the data into the array format, grouped by Latitude & Longitude.
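The stored procedure itself isn't shown, but the grouping step it describes can be sketched in Python: snap each point to a small grid by rounding its coordinates, count points per cell, and emit a maximum plus a list of weighted points. The rounding precision and the output key names (`max`, `data`, `lat`, `lon`, `count`) are assumptions; match them to the data format your version of the heatmap layer expects.

```python
from collections import Counter


def heatmap_payload(points, precision=3):
    """Group (lat, lon) points into weighted grid cells for a heatmap layer.

    precision is the number of decimal places kept (3 is roughly 100 m cells).
    The output key names are an assumption; rename them for your heatmap library.
    """
    counts = Counter((round(lat, precision), round(lon, precision)) for lat, lon in points)
    data = [{"lat": lat, "lon": lon, "count": n} for (lat, lon), n in sorted(counts.items())]
    return {"max": max(counts.values(), default=0), "data": data}
```

Serializing the result with `json.dumps` gives a payload the page can hand to the heatmap layer. Doing the aggregation server-side (as the stored procedure does) keeps the browser from having to process 100K+ raw rows.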
For those that need a translation table to convert their UK Postcodes into Latitude & Longitude, here's a good source:
http://www.doogal.co.uk/UKPostcodes.php
The OneGeology Portal (http://portal.onegeology.org/OnegeologyGlobal/) has been online for about 10 years, currently running OpenLayers 2, with an OpenLayers 3 version in development.
The portal attempts to create a geological map of the world by pulling together disparate OGC services provided by data suppliers (mostly Geological Surveys) from across the globe. The portal provides access to data from WMS, WFS (simple and complex feature), and WCS. The portal uses CSW to help manage which functionality is available to a user, and provides the ability to style WMS layers through the application of custom SLD. Map contexts can be saved, shared and loaded using WMC.
There is a gazetteer to help you zoom to a location of choice, the ability to change projections, and scales, and the ability to create a KML file to allow the service to be used in Google Earth. Transparency can be changed on all layers.
There are currently 353 layers.
When the OneGeology project started, all documentation was geared to the support of services provided by MapServer, and many of the services in the portal are MapServer services. However, because the portal utilises open standards, any software that can provide services to those standards can be included.
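Because these are OGC services, any client can fetch a map image with a plain HTTP GET, which is also what makes the layers reusable in other people's mashups. A minimal sketch of building a WMS 1.1.1 GetMap request follows; the endpoint URL and layer names are hypothetical. It appends with `&` when the base URL already has a query string, since MapServer CGI URLs often carry a `map=` parameter. Version 1.1.1 is used to sidestep the changes in 1.3.0, which renames `SRS` to `CRS` and uses latitude/longitude axis order for EPSG:4326.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layers, bbox, width=800, height=600,
                   srs="EPSG:4326", fmt="image/png", transparent=True):
    """Build a WMS 1.1.1 GetMap request URL.

    bbox is (minx, miny, maxx, maxy) in the units of `srs`
    (lon/lat order for EPSG:4326 in WMS 1.1.1).
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty means the default style for each layer
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
        "TRANSPARENT": "TRUE" if transparent else "FALSE",
    }
    # MapServer CGI endpoints often already contain "?map=/path/to/file.map".
    sep = "&" if "?" in base_url else "?"
    return base_url + sep + urlencode(params)
```

For example, `wms_getmap_url("https://example.org/cgi-bin/mapserv?map=/maps/demo.map", ["geology"], (-11.0, 51.0, -5.0, 55.5))` produces a URL that can be opened directly in a browser to check the service before wiring it into OpenLayers.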
This is an example of a classified grid generated in MapServer and displayed by OpenLayers: https://maps.greenwoodmap.com/sublette/mapserver/map#zcr=1/2690000/1170000/0&lyrs=slopesZ,townlim,ownership,roads. The raw, unclassified slope data can also be queried by map click.

Finding City and Zip Code for a Location

Given a latitude and longitude, what is the easiest way to find the name of the city and the US ZIP code of that location?
(This is similar to https://stackoverflow.com/questions/23572/latitude-longitude-database, except I want to convert in the opposite direction.)
Related question: Get street address at lat/long pair
Any of the online services mentioned, and their competitors, offer "reverse geocoding", which does what you ask: convert lon/lat coordinates into a street address of some sort.
If you only need the zip codes and/or cities, then I would obtain the Zip Code database and urban area database from the US Census Bureau which is FREE (paid for by your tax dollars). http://www.census.gov/geo/www/cob/zt_metadata.html.
From there, you can either come up with your own search algorithm for the spatial data or make use of a spatial database such as Microsoft SQL Server, PostGIS, Oracle Spatial, ArcSDE, etc.
Update: The 2010 Census data can be found at:
http://www2.census.gov/census_2010/
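If you go the do-it-yourself route, the core of the search is a point-in-polygon test against each ZIP code boundary. Below is a minimal ray-casting sketch, assuming the boundaries have already been loaded from the Census files (by whatever means you prefer) into a hypothetical dict mapping each ZIP code to a single outer ring of (lon, lat) vertices. Real ZIP code areas can be multipart and have holes, and a linear scan over every polygon will be slow; a spatial database or index, as suggested above, handles both.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the polygon ring [(lon, lat), ...]?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through the point count;
        # this also guarantees y1 != y2 below, so there is no division by zero.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


def find_zip(lon, lat, zip_polygons):
    """Return the first ZIP code whose polygon contains the point, else None.

    zip_polygons is a hypothetical {zip_code: [(lon, lat), ...]} mapping.
    """
    for zip_code, ring in zip_polygons.items():
        if point_in_polygon(lon, lat, ring):
            return zip_code
    return None
```

Note the (lon, lat) argument order, which matches the x/y order used by shapefiles; mixing it up with (lat, lon) is an easy mistake to make.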
This is the web service to call.
http://developer.yahoo.com/search/local/V2/localSearch.html
This site has OK web services, but not exactly what you're asking for here.
http://www.usps.com/webtools/
You have two main options:
Use a Reverse Geocoding Service
Google's can only be used in conjunction with an embedded Google Map on the same page, so I don't recommend it unless that is what you are doing.
Yahoo has a good one, see http://developer.yahoo.com/search/local/V3/localSearch.html
I've not used OpenStreetMap's. Their maps look very detailed and thorough, and are always getting better, but I'd be worried about latency and reliability, and whether their address data is complete (address data is not directly visible on a map, and OpenStreetMap is primarily an interactive map).
Use a Map of the ZIP Codes
The US Census publishes a map of US ZIP codes here. They build this from their smallest statistical unit, a Census Block, which corresponds to a city block in most cases. For each block, they find what ZIP code is most common on that block (most blocks have only one ZIP code, but blocks near the border between ZIP codes might have more than one). They then aggregate all the blocks with a given ZIP code into a single area called a Zip Code Tabulation Area. They publish a map of those areas in ESRI shapefile format.
I know about this because I wrote a Java Library and web service that (among other things) uses this map to return the ZIP code for a given latitude and longitude. It is a commercial product, so it won't be for everyone, but it is fast, easy to use, and solves this specific problem without an API. You can read about this product here:
http://askgeo.com/database/UsZcta2010
And about all of our geographic offerings here:
http://askgeo.com
Unlike reverse geocoding solutions, which are only available as Web APIs because running your own service would be extremely difficult, you can run this library on your own server and not depend on an external resource.
If your call volume to the service gets too high, you should definitely consider getting your own set of postal data. In most cases, that will provide all of the information you need, and there are plenty of DB tools for indexing location data (e.g. PostGIS for PostgreSQL).
You can buy a fairly inexpensive subscription to zipcodes with lat and long info here: http://www.zipcodedownload.com/
Or use Google's reverse geocoding:
http://maps.google.com/maps/geo?output=xml&q={0},{1}&key={2}&sensor=true&oe=utf8
where {0} is the latitude, {1} is the longitude and {2} is your API key.
GeoNames has an extensive set of web services that can handle this (among others):
http://www.geonames.org/export/web-services.html#findNearbyPostalCodes
http://www.geonames.org/export/web-services.html#findNearbyPlaceName
Another reverse geocoding provider that hasn't been listed here yet is OpenStreetMap: you can use their Nominatim search service.
OSM has the (potentially?) added bonus of being entirely user-editable (wiki-like) and thus having a very liberal licensing scheme for all this data. Think of it as open-source map data.
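A minimal sketch of a Nominatim reverse lookup that extracts the city and ZIP code follows. It uses the public `/reverse` endpoint with `format=jsonv2`; reading the city from the `city`, `town`, `village` or `hamlet` keys and the ZIP from `postcode` inside the `address` object reflects Nominatim's output as I understand it, so check it against the responses you actually get. The public server's usage policy asks for an identifying User-Agent and no more than about one request per second; the `user_agent` string is yours to supply.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

NOMINATIM_REVERSE = "https://nominatim.openstreetmap.org/reverse"


def reverse_url(lat, lon):
    """Build the reverse-geocoding request URL for one coordinate pair."""
    return NOMINATIM_REVERSE + "?" + urlencode(
        {"format": "jsonv2", "lat": lat, "lon": lon, "addressdetails": 1})


def city_and_zip(result):
    """Pick the city and postcode out of a parsed reverse-geocoding response."""
    address = result.get("address", {})
    # Smaller places are typically tagged town/village/hamlet rather than city.
    city = next((address[k] for k in ("city", "town", "village", "hamlet") if k in address), None)
    return city, address.get("postcode")


def reverse_geocode(lat, lon, user_agent):
    """Query the public server (network call); returns (city, postcode)."""
    # Usage policy: identify your application and stay around 1 request/second.
    req = Request(reverse_url(lat, lon), headers={"User-Agent": user_agent})
    with urlopen(req, timeout=10) as resp:
        return city_and_zip(json.load(resp))
```

Keeping the parsing in `city_and_zip` separate from the HTTP call makes it easy to test against saved responses, and to point `NOMINATIM_REVERSE` at your own Nominatim instance if volume grows.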

Resources