Machine Learning with APIs and storing results

I have a (commercial) machine learning project where I want to enrich existing information with data from different API sources (like Google Places, Facebook Graph), save it in a database, and analyse the combined data with machine learning algorithms (e.g. perform clustering).
But according to the terms of e.g. Google Places, caching and storing data is not allowed:
3.2.4 (a) No Scraping. Customer will not extract, export, or otherwise scrape Google Maps Content for use outside the Services. For example, Customer will not: (i) pre-fetch, index, store, reshare, or rehost Google Maps Content outside the services; (ii) bulk download Google Maps tiles, Street View images, geocodes, directions, distance matrix results, roads information, places information, elevation values, and time zone details; (iii) copy and save business names, addresses, or user reviews; or (iv) use Google Maps Content with text-to-speech services.
https://cloud.google.com/maps-platform/terms/#3-license
3.2.4 (b) No Caching. Customer will not cache Google Maps Content except as expressly permitted under the Maps Service Specific Terms
https://cloud.google.com/maps-platform/terms/#3-license
5.4 Caching. Customer can temporarily cache latitude (lat) and longitude (lng) values from the Places API for up to 30 consecutive calendar days, after which Customer must delete the cached latitude and longitude values. Customer can cache Places API Place ID (place_id) values, in accordance with the Places API Policies.
https://cloud.google.com/maps-platform/terms/maps-service-terms/
How did you proceed in similar projects? Is what I want to do in this project really caching, or is there a subtle difference? I made a request to Google, but they couldn't answer this question on the telephone. And I need to store the data somewhere to be able to run the machine learning algorithms.
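For concreteness, the kind of storage term 5.4 seems to permit would look something like a TTL-enforced cache that keeps place_id values but purges lat/lng after 30 days. A minimal SQLite sketch (the table and column names are mine, and whether this actually satisfies the terms is exactly the open question):

import sqlite3
import time

THIRTY_DAYS = 30 * 24 * 60 * 60  # term 5.4's caching window, in seconds

conn = sqlite3.connect("places_cache.db")
conn.execute("""CREATE TABLE IF NOT EXISTS place_cache (
    place_id   TEXT PRIMARY KEY,  -- cacheable per the Places API Policies
    lat        REAL,              -- must be deleted after 30 days (term 5.4)
    lng        REAL,
    fetched_at INTEGER            -- Unix timestamp of the API call
)""")

def purge_expired(conn):
    # Blank out lat/lng values older than 30 consecutive calendar days,
    # but keep the place_id rows themselves.
    cutoff = int(time.time()) - THIRTY_DAYS
    conn.execute(
        "UPDATE place_cache SET lat = NULL, lng = NULL WHERE fetched_at < ?",
        (cutoff,),
    )
    conn.commit()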

Related

Geolocation of twitter users from tweets

I would like to retrieve the GPS longitude and latitude coordinates of Twitter users from their posts. I need highly granular geolocations, so I want to collect tweets whose location is automatically recorded by Twitter through the GPS and not self-reported by the user.
Previously, Twitter provided this access through the tweepy.Stream class, for example:
import tweepy

# Bounding box for the continental US: [west_lng, south_lat, east_lng, north_lat]
LOCATION = [-124.7771694, 24.520833, -66.947028, 49.384472]

class MyStreamListener(tweepy.Stream):
    pass  # on_status() and other handler methods would go here

stream = MyStreamListener(apikey, apikeysecret, accesstoken, accesstokensecret)
stream.filter(locations=LOCATION)
However, as stated in the tweepy documentation, new Twitter apps cannot use the Stream class beyond April 29, 2022:
New Twitter Developer Apps created on or after April 29, 2022 will not be able to gain access to v1.1 statuses/sample and v1.1 statuses/filter, the Twitter API v1.1 endpoints that Stream uses. Twitter API v2 can be used instead with StreamingClient.
Unfortunately, StreamingClient does not provide the locations parameter in its filter() method or in any other method of the class.
Does that mean Twitter has stopped providing this metadata to researchers?
First of all, addressing this comment from your original question...
so want to collect tweets whose location is automatically recorded by Twitter through the GPS and not self-reported by the user.
Twitter does not automatically record user location at all. It is entirely down to a user to choose to add location data to a Tweet. A relatively small proportion of Tweets carry location information, and far fewer of those carry specific GPS information since the option to add information at that level was removed from the Twitter app several years ago.
To get into the details...
Streaming works differently in the modern version of the Twitter API.
In v1.1 you would pass parameters such as track and locations to a single API endpoint (statuses/filter) and then listen as matches came in.
In v2 of the API, you open a connection (via StreamingClient if you are using Tweepy) and you create rules via a separate endpoint (StreamRule in Tweepy). These rules can contain a number of operators:
value (str | None) – The rule text. If you are using a Standard
Project at the Basic access level, you can use the basic set of
operators, can submit up to 25 concurrent rules, and can submit rules
up to 512 characters long. If you are using an Academic Research
Project at the Basic access level, you can use all available
operators, can submit up to 1,000 concurrent rules, and can submit
rules up to 1,024 characters long.
You'll need to refer to the Twitter documentation on creating search queries for rules. At a high level, you can use the has:geo operator to find Tweets with geo information, and potentially the place: operator to narrow down to areas. That's how you can use the current API to filter based on location. Note that available operators may vary based on your Twitter API access level.
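As an illustration, here is a minimal sketch of a v2 rule-based stream in Tweepy. It assumes your access level permits the has:geo and place_country: operators; BEARER_TOKEN is a placeholder:

import tweepy

BEARER_TOKEN = "your-bearer-token"  # placeholder

class GeoStream(tweepy.StreamingClient):
    def on_tweet(self, tweet):
        # tweet.geo is populated because we request the geo field below.
        print(tweet.id, tweet.geo)

client = GeoStream(BEARER_TOKEN)
# Rules replace the v1.1 filter parameters; this one matches geo-tagged
# Tweets placed in the US (operator availability depends on access level).
client.add_rules(tweepy.StreamRule("has:geo place_country:US"))
client.filter(tweet_fields=["geo"], expansions=["geo.place_id"])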

Tools for estimating nearby locations without calling an external API (e.g. Google Maps)

In my Rails application I have a model Location that holds an address and its corresponding latitude and longitude.
On the main page, a user can search for a location from which they want to find the nearest places in the locations table.
I use the JavaScript Google API for geocoding on the client side. Since the limit is 25,000 requests per day per user, I guess I don't need to worry about it, because no one will search for a location that many times. I use Ajax to send the geocoded latitude and longitude from the client to the server.
Now I'm on the server side and have a reference point and a table of locations. But at this point, I guess I cannot use the geocoder gem, which uses the Google API, to find nearby locations, because on the server side there is a limit of 2,500 requests per day and I expect to exceed it (and I don't want to pay either).
What tool can I use to easily return nearby locations without calling an external API? I know there are other APIs, but all of them have limitations or obligations, and since this is more of a mathematical calculation, I can't see a reason to involve an API at this point.
Assuming that you're using a relational database, you can use Elasticsearch to index your Locations and search through them using the Geo Distance Range Query.
Another similar option would be to use a Mongo collection and take advantage of Mongo's $near operator (see the sketch below).
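For example, a minimal pymongo sketch of the $near approach (the database, collection, and field names are hypothetical):

from pymongo import MongoClient, GEOSPHERE

db = MongoClient().myapp
# A 2dsphere index is required for $near queries on GeoJSON points.
db.locations.create_index([("coords", GEOSPHERE)])

ref_lng, ref_lat = -73.99, 40.73  # example reference point from the user

# The 10 locations nearest to the reference point, within 5 km.
nearby = db.locations.find({
    "coords": {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": [ref_lng, ref_lat]},
            "$maxDistance": 5000,  # metres
        }
    }
}).limit(10)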
Edit:
There are some questions related to yours that can be useful if you're using MySQL/PostgreSQL:
Find closest 10 cities with MySQL using latitude and longitude?
postgres longitude longitude query
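And if you'd rather treat it as the pure mathematical calculation you mention, the great-circle (haversine) distance is easy to compute yourself; a minimal sketch (in Python for illustration):

from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lng1, lat2, lng2):
    # Great-circle distance between two (lat, lng) points, in kilometres.
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    dlat, dlng = lat2 - lat1, lng2 - lng1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlng / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km = mean Earth radius

# Nearest 10 stored locations to a reference point:
# sorted(locations, key=lambda l: haversine_km(ref_lat, ref_lng, l.lat, l.lng))[:10]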

Which Maps API should I use?

I am creating a web page that includes maps for my software engineering thesis. The page will include the following features:
Show a specific location and save it to a database;
Show different roads on the same map and save them to a database;
Get the nearest road that passes by a specific location pointed to by a user - a small search function;
Users might be allowed to create different roads, which can be saved in a database.
The thing is that the service (API) used should be free. For this reason, we might not be using Google Maps.
We are using Java for the Model Classes.
Which maps API can I use?
How can I ask it which of the roads is nearest to a certain point (location) on the map?
The Google Maps API is free as long as you are not using more than a certain amount of traffic. If it's for a class project it should be fine, but if that project turned into a commercial site, it would become expensive.

Geocoding on the fly vs database lookup

I'm starting a new rails project that integrates closely with Google Maps. When a user searches for a city, I'm trying to decide whether to geocode the address on the fly (using Google's Geocoding API) or to look up the city in a database pre-populated with lat/long. After I have the lat/long I will plot it on Google Maps.
Which do you think would perform better? With the database lookup, the table would have to be pretty large to account for all the cities I would need, and I would have to fall back on the geocoding API anyway for any cities that I don't have in my database.
I wasn't sure if there is a common practice to this or not. I don't need a user's specific location, but just a city they are searching for.
The size of the table is no problem, as long as you index on the city name.
Indexed database queries outperform web API access by far.
Another point is that you have better control over the found data. For example, if you find more than one matching city, you can offer a choice of your DB entries, while Google sometimes returns none, or some random (or at least unexpected) search result.
That is why I had to switch to a DB-search-first strategy in one of my projects: Google sometimes didn't find my customers' addresses but something totally different (e.g. small villages with the same name as the expected bigger one).
Why not do both?
Have the address's geocoded information in your database as an "Address Cache" and then call the Google Maps Geocoding API only if the address doesn't already exist in your database. That's the approach I used in my Google Maps to SugarCRM integration. It works well. BTW, the Google Maps Geocoding API is impressively fast, so users rarely notice. However, there is a limit of 2,500 requests per day, and it's also throttled to about 10 requests per second. So, considering those limits, I think a combined database/geocode approach is much better in the long run.
https://github.com/jjwdesign/JJWDesign-Google-Maps
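A minimal sketch of that cache-first pattern (SQLite here; the table schema is illustrative, while the endpoint is the standard Google Geocoding API URL):

import sqlite3
import requests

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

conn = sqlite3.connect("geocode_cache.db")
conn.execute("""CREATE TABLE IF NOT EXISTS geocode_cache (
    address TEXT PRIMARY KEY, lat REAL, lng REAL
)""")

def google_geocode(address, api_key):
    # One call to the Google Geocoding API; raises on HTTP errors.
    resp = requests.get(GEOCODE_URL, params={"address": address, "key": api_key})
    resp.raise_for_status()
    loc = resp.json()["results"][0]["geometry"]["location"]
    return loc["lat"], loc["lng"]

def lookup(conn, address, api_key):
    # 1. Try the local "Address Cache" first (an indexed lookup).
    row = conn.execute(
        "SELECT lat, lng FROM geocode_cache WHERE address = ?", (address,)
    ).fetchone()
    if row:
        return row
    # 2. Cache miss: geocode once, then store the result for next time.
    lat, lng = google_geocode(address, api_key)
    conn.execute(
        "INSERT INTO geocode_cache (address, lat, lng) VALUES (?, ?, ?)",
        (address, lat, lng),
    )
    conn.commit()
    return lat, lng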

Building a Bing Store Locator

I've tried Google, but I'm not smart enough to build my own Google store locator. Everything is either old or in PHP, and I use Python. Even GAE uses Python, but I can't find any tutorials for Python at all. I even had a bounty here on Stack Overflow for resources on a Python store locator, and the only answer pointed to a post from 2008 that was marked "deprecated" on the post itself.
So I decided to give Bing a try, and it has more beginner-friendly options, such as "enter address here", after which it's listed in the app. The only problem is that everything points me to the Bing Spatial Data Services, and it sounds like they want to charge an arm and a leg.
Also, if you know of any, are there any good tutorials on building a Bing store locator? Google search has come up empty for me, but they could always be hiding them. Thanks.
If you have fewer than 50 locations, you can use the Bing Spatial Data Services under the free terms of use. If your application is a public-facing web site, which most store locators are, you can also generate 125,000 transactions against Bing Maps for free per year. Whether this is enough depends on the number of stores and customers you have. The Bing Spatial Data Services are a really good option, as you simply upload your data and they expose it as a spatial REST service for you, which you can access directly from JavaScript without the need for any server-side code. Here is an example of how to query a data source in the Bing Spatial Data Services: http://www.bingmapsportal.com/ISDK/AjaxV7#SpatialDataServices1
If your application has a lot more volume, then you would need a Bing Maps license. The cost of a license varies depending on the number of transactions your application will use. With a Bing Maps license, data sources can have up to 600,000 locations in a single data source, and each Bing Maps account is allowed up to 25 data sources.
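Since the asker uses Python, here is a hedged sketch of querying an uploaded data source from the server side instead of JavaScript. The access ID, data source and entity names are placeholders, and the URL pattern and OData-style response shape are assumptions based on the Spatial Data Services query API:

import requests

# Placeholders from your uploaded data source and Bing Maps account.
ACCESS_ID = "your-access-id"
DATA_SOURCE = "YourStores"
ENTITY_TYPE = "Store"
BING_MAPS_KEY = "your-key"

url = (f"https://spatial.virtualearth.net/REST/v1/data/"
       f"{ACCESS_ID}/{DATA_SOURCE}/{ENTITY_TYPE}")
params = {
    "spatialFilter": "nearby(47.64,-122.13,10)",  # lat, lng, radius in km
    "$format": "json",
    "$top": 10,
    "key": BING_MAPS_KEY,
}
resp = requests.get(url, params=params)
resp.raise_for_status()
stores = resp.json()["d"]["results"]  # assumed OData-style response body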
