Applying a geohash_decode function to a column in a dataframe - machine-learning

I encoded my geographical coordinates as geohashes (geohash_encode). My aim is to calculate distances with some level of accuracy. I am now trying to decode the geohashes back to geographical coordinates (geohash_decode), but I have failed to come up with a function that can do that for a whole column in a dataframe.

Assuming:
you are asking about Python (apologies if this was an R, Scala or other dataframe question; you didn't specify)
you have a Python pandas DataFrame object df
df has a column named geohash containing your geohashes
you have the geohash2 library installed and imported (this may work with other Geohash libraries...)
you want to overwrite df with a new DataFrame containing all the old data plus the new latitude and longitude columns
The following should work:
import geohash2
import pandas as pd

def gh_decode(gh):
    lat, lon = geohash2.decode(gh)  # decode() may return strings, so convert to float
    return pd.Series({"latitude": float(lat), "longitude": float(lon)})

df = df.join(df["geohash"].apply(gh_decode))
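Since the goal is a distance "with some level of accuracy", it may help to see how much precision a geohash carries. The following is a minimal pure-Python sketch of the standard geohash decoding scheme (alternating longitude/latitude bisection over base-32 characters). It is not geohash2's implementation, and the name decode_with_error is made up for this illustration. It returns the centre of the geohash cell plus the half-width of the cell in each direction.

```python
# Standard geohash base-32 alphabet (no a, i, l, o)
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def decode_with_error(gh):
    """Return (lat, lon, lat_err, lon_err) for a geohash string.

    lat/lon are the centre of the cell; the errors are half the cell size.
    Hypothetical helper for illustration, not part of any library.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in gh.lower():
        value = _BASE32.index(ch)  # raises ValueError on an invalid character
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    lat = (lat_lo + lat_hi) / 2
    lon = (lon_lo + lon_hi) / 2
    return lat, lon, (lat_hi - lat_lo) / 2, (lon_hi - lon_lo) / 2

print(decode_with_error("ezs42"))
```

For a 5-character geohash such as "ezs42" the error margins come out at about 0.022 degrees in each direction, so any distance computed from the decoded points can be off by a couple of kilometres. Longer geohashes shrink the cell and the uncertainty.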

Related

mean and standard deviation in timeseries

I have a financial time series and I want to build a new dataset from it. I want to take every 20 data points (rows) and replace them with a single data point like this:
[mean of those 20 data points, standard deviation of those 20 data points].
I think I may need a Gaussian model for the variation or the standard deviation.
I use Python 3.
In my dataset the first column is the index (number of days) and the second column is the close price.
I do not know how to write the code that takes every 20 data points and replaces them with the values described above.
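If the data points are stored in a dataframe, say df, with the default integer index (0, 1, 2, ...), you can group them with groupby using integer division, so that rows 0-19 fall into group 0, rows 20-39 into group 1, and so on:
df.groupby(df.index // 20)
(In Python 3, plain division, df.index / 20, would give every row its own fractional key, so each group would contain a single row.)
You can then compute the mean and standard deviation of each group as follows, and concatenate the two results if you need to:
df.groupby(df.index // 20).mean()
df.groupby(df.index // 20).std()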
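To check what the grouping is expected to produce, here is a minimal plain-Python sketch of the same idea using only the standard library. The function name summarize_blocks and the price values are made up for illustration. statistics.stdev is the sample standard deviation, which matches the default of the pandas std() method.

```python
from statistics import mean, stdev

def summarize_blocks(prices, size=20):
    """Return one (mean, sample standard deviation) pair per block of `size` values."""
    summary = []
    for start in range(0, len(prices), size):
        block = prices[start:start + size]
        # stdev needs at least two values; a trailing block of one gets NaN
        spread = stdev(block) if len(block) > 1 else float("nan")
        summary.append((mean(block), spread))
    return summary

prices = [100.0 + day for day in range(40)]  # made-up close prices
print(summarize_blocks(prices))
```

With 40 input rows this yields two pairs, one per block of 20. If the series length is not a multiple of 20, the last block is simply shorter.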

Seaborn PairGrid for time series data, with color code given by time

In pandas, when you have time series data nicely put in a dataframe (meaning you have a datetime index and, say, two columns) you can plot the relation between the two columns on a scatter plot, and get the color code to represent the time:
df.plot.scatter('col1','col2',c=df.index)
How can you achieve this in a seaborn.PairGrid, with the off-diagonal elements given by regplot()?
I came up with this:
import matplotlib.pyplot as plt
import seaborn as sns

def make_regplot(*args, **kwargs):
    sns.regplot(*args, **kwargs, scatter=False)
    plt.scatter(*args, c=df.index)

g = sns.PairGrid(df)
g = g.map_diag(plt.hist)
g = g.map_offdiag(make_regplot)
I was wondering if there was a nicer solution, also in view of possible issues like this one https://github.com/mwaskom/seaborn/issues/1079.

Trying to create an SPSS Modeler Stream to calculate minimum distance between many geographic lat-long points and Points of Interest

I have a list of hundreds of thousands of addresses with their latitude/longitude data.
In a second table, I have the Latitude/Longitude of hundreds of gas stations.
I need to derive a table with the lat/long of the house, the distance to THE CLOSEST gas station, and the name of the gas station.
I'm trying to create a stream for SPSS Modeler (I'm using version 18.2).
In Excel it was fairly simple (see the example below). Sheet one holds the houses with their lat/long, sheet two the gas station data, sheet three shows how I did it in Excel for a limited number of points, and sheet four holds the resulting table. Basically, in Excel I took the lat/long of each house, calculated the distance to every gas station, and kept the smallest one.
I'd appreciate any directions or ideas on how to build an SPSS Modeler stream that does something similar.
You can download my Excel sample here: https://github.com/schapsis/calculating-distance-question
Please see the solution posted at https://developer.ibm.com/answers/questions/512605/how-to-calculate-distance-between-many-geographic/
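Outside SPSS Modeler, the brute-force approach described in the question (compute the distance from each house to every gas station and keep the smallest) can be sketched in Python as below. This is only an illustration under assumptions: the haversine great-circle formula on a sphere of radius 6371 km stands in for whatever distance formula the spreadsheet uses, and the station names and coordinates are invented.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearest_station(house, stations):
    """house: (lat, lon); stations: list of (name, lat, lon).

    Returns (name, distance_km) for the closest station.
    """
    lat, lon = house
    candidates = (
        (name, haversine_km(lat, lon, s_lat, s_lon))
        for name, s_lat, s_lon in stations
    )
    return min(candidates, key=lambda pair: pair[1])

# Invented sample data
stations = [("Station A", 40.0, -75.0), ("Station B", 41.0, -75.0)]
house = (40.1, -75.0)
print(nearest_station(house, stations))
```

The work grows with the number of houses times the number of stations, which is the same cross-join the Excel sheet performs; with hundreds of stations that is usually still manageable.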

Splitting a city into zones, is it a good idea?

I'm working on a geolocation-based personal project where I'd like to fetch suppliers based on the user's latitude and longitude. The catch is that suppliers have variable supply radii: some supply only within 5 km of their location, while others may supply across the entire city.
The general way to go about this is to calculate, for each supplier, the distance between the supplier and the user. If it is less than or equal to its supply radius, display that supplier in the results.
But this might be very slow, so I thought I'd split the city into four zones (picking four latitude and longitude values from Google Maps for North, East, West and South). Whenever a supplier is added, I'll do the math and store in the database the zones it can supply. Then, whenever I get the user's latitude and longitude, I'd determine the zone, fetch the suppliers that can supply that zone, do the distance calculation and filter them. This way I run the calculation on fewer suppliers instead of the entire list.
But is it a good idea or can I do better ?
If you are using PostgreSQL/PostGIS, you can make use of spatial indexes and then run ST_DWithin(geom1, geom2, distance) queries (see the ST_DWithin docs). The spatial index partitions the space for you, which makes this kind of query very efficient and saves you from having to come up with a spatial partitioning scheme of your own.
Another option is the <-> operator, which is very efficient with a spatial index and is used in the ORDER BY clause to get the k things nearest to some point x (k-nearest-neighbour search); see the <-> operator docs. One caveat: for this operator to use the index properly, the point you are searching from needs to be a constant, as it sounds like it would be in your case.

How to find GPS-coordinates within a bounding circle

I have a series of GPS coordinates in decimal degrees, multiplied by 1,000,000. For example, a latitude of 51.1 and a longitude of 4.1 would be saved as Y 51100000 and X 4100000. These coordinates are saved in an SQLite 3 database.
Using Ruby 1.9.2 and Rails 3.0.8, I need to be able to get all records that are within a certain radius of a certain center point. For instance, given a center point of latitude 51 and longitude 4, I need to find all records within a 10 kilometer radius.
This article explains pretty well how to perform a query to get those records, but SQLite does not seem to support the mathematical functions that are used: http://www.movable-type.co.uk/scripts/latlong-db.html
Is there any other way I would be able to retrieve the proper records from the database that does not involve iterating through the entire table?
Thanks!
Unless you MUST use SQLite, try it with Sphinx:
I would convert the GPS coordinates to lat/lng coordinates and then use Sphinx (there is a thinking-sphinx gem for Rails: http://freelancing-god.github.com/ts/en/). With Sphinx you can search for points within a given circle using Sphinx's function: #geodist
You can see a good example of how to do it here: http://joeyschoblaska.com/blog_posts/220-thinking-sphinx-searching-by-location-and-keyword
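If SQLite has to stay, one possible workaround is to compute a bounding box around the circle in application code, so that the query itself only needs plain comparisons on the stored integer columns. The sketch below is in Python to match the other examples on this page; the same arithmetic is available in Ruby's Math module. The function name bounding_box and the 6371 km Earth radius are assumptions for illustration, and the sketch ignores the poles and the 180-degree meridian.

```python
from math import asin, ceil, cos, degrees, floor, radians, sin

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation
SCALE = 1_000_000         # coordinates are stored as degrees * 1,000,000

def bounding_box(lat_deg, lon_deg, radius_km):
    """Return (y_min, y_max, x_min, x_max) in stored integer units.

    The box encloses the circle of radius_km around the centre point.
    Not valid near the poles or across the 180-degree meridian.
    """
    angular = radius_km / EARTH_RADIUS_KM  # radius as an angle in radians
    dlat = degrees(angular)
    # Longitude lines converge away from the equator, so the box gets wider
    dlon = degrees(asin(sin(angular) / cos(radians(lat_deg))))
    return (
        floor((lat_deg - dlat) * SCALE),
        ceil((lat_deg + dlat) * SCALE),
        floor((lon_deg - dlon) * SCALE),
        ceil((lon_deg + dlon) * SCALE),
    )

# Centre point and radius from the question: latitude 51, longitude 4, 10 km
print(bounding_box(51.0, 4.0, 10.0))
```

The four values can be bound into a condition of the form Y BETWEEN y_min AND y_max AND X BETWEEN x_min AND x_max, which can use ordinary indexes on those columns. Rows in the corners of the box lie outside the circle, so an exact distance check on the (much smaller) result set is still needed in application code.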
