Does anyone know if I can download Wikipedia text data with the navigation popups included?
I know there are existing text dump datasets [1,2], but as far as I can tell, they do not include the navigation popup text.
[1] https://dumps.wikimedia.org/backup-index.html
[2] https://www.kiwix.org/en/download/
If you want to download the text inside the navigation popup, you can use the following URL:
https://en.wikipedia.org/api/rest_v1/page/summary/<search_term>
where <search_term> is the term you want to use.
For example, this URL returns the summary of Colombia that is shown in the navigation popup window:
https://en.wikipedia.org/api/rest_v1/page/summary/Colombia
Result:
{"type":"standard","title":"Colombia","displaytitle":"<span class=\"mw-page-title-main\">Colombia</span>","namespace":{"id":0,"text":""},"wikibase_item":"Q739","titles":{"canonical":"Colombia","normalized":"Colombia","display":"<span class=\"mw-page-title-main\">Colombia</span>"},"pageid":5222,"thumbnail":{"source":"https://upload.wikimedia.org/wikipedia/commons/thumb/2/21/Flag_of_Colombia.svg/320px-Flag_of_Colombia.svg.png","width":320,"height":213},"originalimage":{"source":"https://upload.wikimedia.org/wikipedia/commons/thumb/2/21/Flag_of_Colombia.svg/900px-Flag_of_Colombia.svg.png","width":900,"height":600},"lang":"en","dir":"ltr","revision":"1130831378","tid":"2a3b2e60-89de-11ed-8f85-3d0f6ce6a5cb","timestamp":"2023-01-01T05:02:29Z","description":"Country in South America","description_source":"local","coordinates":{"lat":4,"lon":-72},"content_urls":{"desktop":{"page":"https://en.wikipedia.org/wiki/Colombia","revisions":"https://en.wikipedia.org/wiki/Colombia?action=history","edit":"https://en.wikipedia.org/wiki/Colombia?action=edit","talk":"https://en.wikipedia.org/wiki/Talk:Colombia"},"mobile":{"page":"https://en.m.wikipedia.org/wiki/Colombia","revisions":"https://en.m.wikipedia.org/wiki/Special:History/Colombia","edit":"https://en.m.wikipedia.org/wiki/Colombia?action=edit","talk":"https://en.m.wikipedia.org/wiki/Talk:Colombia"}},"extract":"Colombia, officially the Republic of Colombia, is a country in South America with insular regions in North America—near Nicaragua's Caribbean coast—as well as in the Pacific Ocean. The Colombian mainland is bordered by the Caribbean Sea to the north, Venezuela to the east and northeast, Brazil to the southeast, Ecuador and Peru to the south and southwest, the Pacific Ocean to the west, and Panama to the northwest. Colombia is divided into 32 departments and the Capital District of Bogotá, the country's largest city. It covers an area of 1,141,748 square kilometers, and has a population of 52 million. 
Colombia's cultural heritage—including language, religion, cuisine, and art—reflects its history as a Spanish colony, fusing cultural elements brought by immigration from Europe and the Middle East, with those brought by enslaved Africans, as well as with those of the various Indigenous civilizations that predate colonization. Spanish is the official state language, although English and 64 other languages are recognized regional languages.","extract_html":"<p><b>Colombia</b>, officially the <b>Republic of Colombia</b>, is a country in South America with insular regions in North America—near Nicaragua's Caribbean coast—as well as in the Pacific Ocean. The Colombian mainland is bordered by the Caribbean Sea to the north, Venezuela to the east and northeast, Brazil to the southeast, Ecuador and Peru to the south and southwest, the Pacific Ocean to the west, and Panama to the northwest. Colombia is divided into 32 departments and the Capital District of Bogotá, the country's largest city. It covers an area of 1,141,748 square kilometers, and has a population of 52 million. Colombia's cultural heritage—including language, religion, cuisine, and art—reflects its history as a Spanish colony, fusing cultural elements brought by immigration from Europe and the Middle East, with those brought by enslaved Africans, as well as with those of the various Indigenous civilizations that predate colonization. Spanish is the official state language, although English and 64 other languages are recognized regional languages.</p>"}
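To collect the popup text for many pages, the request above can be scripted. The following is only a minimal sketch using the Python standard library: the helper names and the User-Agent string are made up for illustration, and it assumes the popup text is the "extract" field seen in the result above.

```python
import json
import urllib.parse
import urllib.request

SUMMARY_ENDPOINT = "https://en.wikipedia.org/api/rest_v1/page/summary/"


def summary_url(title):
    """Build the summary URL for a page title (spaces become underscores)."""
    return SUMMARY_ENDPOINT + urllib.parse.quote(title.replace(" ", "_"), safe="")


def extract_from_response(body):
    """Return the plain-text 'extract' field from a summary JSON body."""
    return json.loads(body).get("extract", "")


def fetch_popup_text(title, user_agent="popup-text-example/0.1 (hypothetical contact)"):
    """Download the summary for one title and return its extract text."""
    request = urllib.request.Request(
        summary_url(title), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_from_response(response.read().decode("utf-8"))
```

Calling fetch_popup_text("Colombia") should return the same text as the "extract" field above. For a large number of titles, it is worth throttling the requests.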
Related
I am trying to find a way to automatically split addresses into their separate components using Python.
Below is some sample data:
Full Address | Street Number | Street | City | State | Zip Code
661 Camel Back Road Tulsa Oklahoma 74120 | 661 | Camel Back Road | Tulsa | Oklahoma |
68 Gnatty Creek Road Roslyn New York 11576 | 68 | Gnatty Creek Road | Roslyn | New York |
1 Raccoon Run Seattle Washington 98119 | 1 | Raccoon Run | Seattle | Washington |
616 Friendship Lane Santa Clara California 95054 | 616 | Friendship Lane | Santa Clara | California | 95054
3878 Grand Avenue Maitland Florida 32751 | 3878 | Grand Avenue | Maitland | Florida | 32751
The above data is a representation of what I am trying to achieve.
On the left is my input address, and on the right is the result after it has been split automatically.
The problem, which this oversimplified example does not show, is that the input addresses don't always come in the same order and may include components such as building names.
My options so far are the following:
REGEX
MACHINE LEARNING MODEL
The REGEX option is familiar, but it will still be largely inaccurate. I need this solution to be as accurate as possible.
The MACHINE LEARNING MODEL option is more difficult in that I am not aware of any model or framework capable of classifying multiple categories at once.
Can anyone help?
So far I haven't really started on the REGEX, as I anticipate major gaps in the capturing groups.
I think the only way to do this and get a fairly accurate result is to get the list of zip codes, for instance from here:
https://www.zipcode.com.ng/2022/06/list-of-5-digit-zip-codes-united-states.html?m=1
and a list of US cities.
Then you can match the zip code, state and city to the lists.
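A rough sketch of this list-matching idea is shown here. The STATES and CITIES sets are tiny stand-ins for the real lists, the function names are made up for illustration, and the code assumes the "number, street, city, state, zip" order of the sample data, so it would need extending for addresses in other orders or with building names.

```python
import re

# Tiny illustrative lookup tables; the real lists would be far larger.
STATES = {"Oklahoma", "New York", "Washington", "California", "Florida"}
CITIES = {"Tulsa", "Roslyn", "Seattle", "Santa Clara", "Maitland"}


def pop_known_suffix(tokens, known, max_words=3):
    """Remove and return the longest known name that ends the token list."""
    for n in range(min(max_words, len(tokens)), 0, -1):
        candidate = " ".join(tokens[-n:])
        if candidate in known:
            del tokens[-n:]
            return candidate
    return None


def split_address(address):
    """Split an address by peeling off zip, state and city from the right."""
    tokens = address.split()
    parts = {"street_number": None, "street": None,
             "city": None, "state": None, "zip_code": None}
    if tokens and re.fullmatch(r"\d{5}", tokens[-1]):
        parts["zip_code"] = tokens.pop()
    parts["state"] = pop_known_suffix(tokens, STATES)
    parts["city"] = pop_known_suffix(tokens, CITIES)
    if tokens and tokens[0].isdigit():
        parts["street_number"] = tokens.pop(0)
    parts["street"] = " ".join(tokens) or None  # whatever is left over
    return parts
```

Matching multi-word names from the right is what lets "Santa Clara" and "New York" come out as single components without a regex for every case.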
Some of the cities I was looking for in California (San Jose, Berkeley, Palo Alto, Cupertino, Davis, Mountain View, Pasadena, Sunnyvale, Irvine, Livermore, Edwards, Whitmore, Loma Linda, Stanford, Redwood City, El Segundo, Moss Landing, Marina Del Rey, etc.) are not highlighted in Highmaps, as can be seen at the link below:
http://www.highcharts.com/maps/demo#countries/us/us-ca-all
Is there any solution or support for this problem?
If you are after California's cities in GeoJSON, this is where you can find them.
The only thing I need to mention is that your file and my file use two different coordinate systems. Yours is EPSG: 102243, mine is EPSG: 4326.
If you have any trouble with these coordinate systems, you can always convert them from one to the other using GeoConverter. Just make sure to expand the Advanced Options menu so you can set the input EPSG and the output EPSG.
As part of a developer challenge, I am trying to determine the land mass closest to a given coordinate. Obviously, if the point is on land, I use reverse geocoding and can get details. The problem is that if the point is in a body of water, especially oceans, it often won't return anything (Google, Nokia, Bing). I'd like to know that a point 3 miles off the coast of California is 3 miles from USA, or x miles from Japan, y miles from South Korea when a point is reasonably near more than one country. Is there any service that provides this information?
Take a KML file of the world's Marine Regions.
Simplify it down to a minimum number of rough polygons.
Take your location: does it lie within one of the polygons?
If your location lies within a polygon, then it is at sea. Iterate through the points on the inner and outer boundaries to find the nearest one using the haversine formula. This will be the nearest point on land.
If your location does not lie within a polygon, you are already on land, so do a direct reverse geocode.
Just imagine the world is a bit like the board from the game Diplomacy.
Now coalesce the sea areas into larger polygons with holes for islands.
If you're not at sea, you must be on land, right?
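The two geometric steps can be sketched as follows. This is only an illustration under simplifying assumptions: a sea polygon is a plain list of (lat, lon) vertices, the point-in-polygon test treats coordinates as planar (fine for rough polygons away from the poles and the antimeridian), and only the boundary vertices are considered, not points along the edges. All function names are made up.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def point_in_polygon(lat, lon, polygon):
    """Ray-casting test against a list of (lat, lon) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat_a, lon_a = polygon[i]
        lat_b, lon_b = polygon[(i + 1) % n]
        if (lat_a > lat) != (lat_b > lat):  # edge straddles the point's latitude
            lon_cross = lon_a + (lat - lat_a) * (lon_b - lon_a) / (lat_b - lat_a)
            if lon < lon_cross:
                inside = not inside
    return inside


def nearest_boundary_point(lat, lon, polygon):
    """Return the boundary vertex closest to the given location."""
    return min(polygon, key=lambda p: haversine_km(lat, lon, p[0], p[1]))
```

If point_in_polygon is true for a sea polygon, nearest_boundary_point gives the candidate coastline vertex, which can then be reverse geocoded to get the country.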
Check the older post "Verify if a point is Land or Water in Google Maps", in particular the answer about the Koordinates Vector JSON Query service.
Is there a service that provides latitude and longitude for UK phone numbers?
For example:
Query: 0141 574 xxx, Returns: (55.8659829, -4.2602205) [Glasgow City Centre]
Allow me to stress that I am not looking for a reverse directory enquiries service. I am more interested in the 'local area', for things like weather by phone or "Where's my nearest Pizza Shop?"
If this service doesn't exist your suggestions on how to implement it or where to get data from would also be incredibly useful.
I am aware that Ofcom provides a list of area codes with a place name [1] suitable for geolocation, but I have my concerns about resolution. I see this as a particular problem in smaller towns and rural areas where an area code will cover a large geographical area.
Second Example:
Area Code: 01555, Ofcom: Lanark
However:
01555 860xxx is Crossford (4 miles W of Lanark)
01555 77xxxx is Carluke (5 miles NW)
01555 89xxxx is Lesmahagow (5 miles SW)
01555 840xxx is Carnwath (7 miles NE)
Therefore 01555 covers roughly 80 sq miles. That's not particularly local.
[1] Ofcom Area Code Tool: http://www.ofcom.org.uk/consumer/2009/09/telephone-area-codes-tool/
You can get a reasonable location for numbers allocated to BT.
The "L" digits map to a particular exchange within that area:
(02X) LLLL XXXX (2+8)
(011X) LLL XXXX (3+7)
(01X1) LLL XXXX (3+7)
(01XXX) LLXXXX (4+6)
(01XXX) LLXXX (4+5)
(01XXXX) LXXXX (5+5)
(01XXXX) LXXX (5+4)
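As a small illustration of the formats above, the exchange-identifying "L" digits can be pulled out once the area code is known. This sketch assumes the caller already knows where the area code ends (telling 01XXX from 01XXXX needs a list of area codes), and the names are made up.

```python
# Number of leading local digits ("L" digits) that identify the exchange,
# keyed by the length of the area code including its leading 0.
# 02X -> LLLL, 011X / 01X1 -> LLL, 01XXX -> LL, 01XXXX -> L
EXCHANGE_DIGITS = {3: 4, 4: 3, 5: 2, 6: 1}


def exchange_prefix(area_code, local_number):
    """Return (area code, exchange digits) for a UK geographic number."""
    code = "".join(ch for ch in area_code if ch.isdigit())
    local = "".join(ch for ch in local_number if ch.isdigit())
    count = EXCHANGE_DIGITS.get(len(code))
    if count is None:
        raise ValueError("unexpected area code length: " + area_code)
    return code, local[:count]
```

The returned pair could then be used as a key into a table of exchange locations, subject to the caveats that follow.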
For cable providers (especially those using fibre optic delivery), there is sometimes only one exchange per area code and therefore the numbers in each LL range cover the entire area code.
For numbers allocated to other providers there's a similar problem. Additionally, those numbers may be allocated as VoIP and in use in another area or even in a completely different country. For non-BT numbers, location data cannot be relied on.
For people who have moved and kept their number, location data will also be inaccurate.
That said, CodeLook does a reasonable job of showing the right data: http://www.telecom-tariffs.co.uk/codelook.htm
You may have a problem in that not all number ranges after the area code are geographic. Some have been block-allocated to cable providers. I know my own number has belonged both to me and to a person who lived about 5 miles northeast of my current location. The link: we belong to the same cable provider.
What sort of telephone numbers are they? If they are businesses, what do you think of searching for the whole number using, say, Google's API, and lifting the actual address from the page? I know it's harder to do than that; I'm just exploring some possibilities.
I'm working on a GeoTargeting application. I'm curious whether the longitude and latitude of a point on the earth can change.
If you know the exact position of the Statue of Liberty, how sure is it that its longitude and latitude will stay the same?
Do they change according to the season or time of year, or slowly over time?
Wikipedia to the rescue:
The surface layer of the Earth, the lithosphere, is broken up into several tectonic plates. Each plate moves in a different direction, at speeds of about 50 to 100 mm per year. As a result, for example, the longitudinal difference between a point on the equator in Uganda (on the African Plate) and a point on the equator in Ecuador (on the South American Plate) is increasing by about 0.0014 arcseconds per year.
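To put the 0.0014 arcseconds in more familiar units, here is a quick back-of-the-envelope conversion. The circumference figure is an approximate value for the Earth's equator and the function name is made up.

```python
EQUATORIAL_CIRCUMFERENCE_M = 40_075_017  # approximate equatorial circumference
ARCSECONDS_PER_CIRCLE = 360 * 3600


def arcseconds_to_mm_at_equator(arcseconds):
    """Convert a longitude difference at the equator to millimetres."""
    metres = arcseconds * EQUATORIAL_CIRCUMFERENCE_M / ARCSECONDS_PER_CIRCLE
    return metres * 1000


drift_mm_per_year = arcseconds_to_mm_at_equator(0.0014)
print(round(drift_mm_per_year, 1))  # roughly 43 mm per year
```

So the drift is on the order of a few centimetres per year, which only matters for a GeoTargeting application over very long time spans.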
It depends on the map projection variables (the datum) you use. Currently WGS-84 is the most widely used.
The same point can have different coordinates depending on those variables. They do not differ a lot; as I remember it, the difference between EUR-50 (or something like that) and WGS-84 was at most 50 meters or so.
You're tangentially referring to geodesy, which is the science of modelling (representing) the shape of the earth. So while a physical location may not change, the datum (model) used by a geodetic coordinate system can change. Fortunately, this does not happen frequently.
In North America, NAD83 is the most widely used datum; it replaced NAD27.
Did I mention that Geographic Information Systems (GIS) was my foray into software development?
Yes. Zip codes get split all the time, and doing so would move the center of the zip code to a new location.
47.554 always equals 47.554.
But if the shape of the earth changes, or you use a different method of calculation (there are plenty), or the input data changes in precision, or your compiler treats floating point differently, you'll end up with a different long/lat.