Point in polygon based search vs geo hash based search - geolocation

I'm looking for some advice.
I'm developing a system with geographic triggers, these enable my device to perform certain actions depending on where it is. The triggers are contained within polygons that are stored in my database I've explored multiple options to get this working, however, I'm not very familiar with geo-spacial systems.
An option would be to use the current location of the device and query the DB directly to give me all the polygons that contain that point, thus, all the triggers since they are linked together. A potential problem with this approach, I think, would be the possible amount of polygons stored, and the frequency of the queries, since this system serves multiple devices simultaneously and each one of them polls every few seconds.
An other option I'm exploring is to encode the polygons to an array of geo-hashes and then attach the trigger to each one of them.
Green is the geohashes that the trigger will be attached to, yellow are areas that need to be recalculated with a higher precision. The idea is to encode the polygon in the most efficient way down to X precision.
An other optimization I came up with is to only store the intersection of polygons with roads since these devices are only use in motor vehicles.
Doing this enable the device to work offline performing it's own encoding and lookup, with a potential disadvantage being that the device will have to implement logic to stay up-to-date with triggers added or removed ( potentially every 24 hours )
I'm looking for the most efficient way to implement this given some constrains such as:
Potentially unreliable networks ( the device has LTE connectivity )
Limited processing power, the devices for now are based on a raspberry pi 3 Compute module, however, they perform other tasks such as image processing.
Limited storage, since they store videos and images.
Potential large amount of triggers/polygons
Potential large amount of devices.
Any thoughts are greatly appreciated.

Related

Storing any number series data in a time-series database

I would like to make use of time-series database InfluxDb to store data points indexed by another number instead of time which every data point is stored against. So I can take advantage all the features for a series of datapoints against this number..
For example I have a rocket doing multiple launches on which I have several sensors recording temperature, air pressure, fuel level &c. And I want to graph these datapoints against elevation not time..
I realise I could store elevation itself against time then from the time for say a temperature reading work out the elevation and project the results - but that working out would lose the performance characteristics of just querying the datapoints indexed by elevation. Also third party tools which use the time-series database won't be able to simply get these datapoints against elevation as opposed to time to graph them out, e.g. Grafana, without me putting something in-between to marry the data up..
One idea I had was to have a fake time where meters = seconds and store against this, then I would need make that a composite with something else to differentiate rocket launches, e.g. increment year by 1 starting at year 0.. So I don't see every launch starting at the same elevation and can separate the "number-series" from each other - I guess I would have that problem anyway and the proper way to that would be through tags..
What makes you believe that this approach would be more efficient than storing the elevation jointly with your other sensor data? Fetching data is pretty cheap so the performance gain might be very light compared to the augmented complexity of your keys. Not to mention that you would still need to have the time make part of your elevation-timestamp, otherwise you will end up with duplicate pseudo timestamps and therefore incomplete data as most time series databases do not allow multiple values at the same timestamp for a given series.
I would encourage you to also have a look at other time series databases which include elevation as part of their standard data model. Check out Warp 10 for that matter (std disclaimer, I am the co-founder of SenX, maker of Warp 10).

Analyzing Sensor Data stored in cassandra and draw graphs

I'm collecting data from different sensors and write them to a Cassandra database.
The Sensor-ID accts as a partition key, the timestamp of the sensors data as clustering column. Additionally a value of the sensor is stored.
Each sensor collects something about 30000 to 60000 values a day.
The simplest thing I wane do is draw a graph showing this data. This is not a problem for a few hours but when showing a week or even a longer range, all the data has to be loaded into the backend (a rails application) for further processing. This isn't really fast with my test dataset and won't be faster in production I think.
So my question is, how to speed this up. I thought about pre-processing the data directly in the database but it seems, that Cassandra isn't able to do such things.
For a graph with a width of 1000px it isn't interesting to draw ten thousands of points - so it would be interesting to gather only relevant, pre-aggregated data from the database.
For example, when showing the data for a whole day in a graph with a width of 1000px, it would be enough to take 1000 average values (this would be an average clustered by 86seconds - 60*60*24 / 1000).
Is this a good approach? Or are there other techniques fasten this up? How would I handle this with database? Create a second Table and store some average values? But the resolution of the graph may change...
Other approaches would be drawing mean values by day, week, month and so on. Maybe vor this a second table could do a good job!
Cassandra is all about letting you write and read your data quickly. Think of it as just a data store. It can't (really) do any processing on that data.
If you want to do operations on it, then you are going to need to put the data into something else. Storm is quite popular for building computation clusters for processing data from Cassandra, but without knowing exactly the scale you need to operate at, then that may be overkill.
Another option which might suit you is to aggregate data on the way in, or perhaps in nightly jobs. This is how OLAP is often done with other technologies. This can work if you know in advance what you need to aggregate. You could build your sets into hourly, daily, whatever, then pull a smaller amount of data into Rails for graphing (and possibly aggregate it even further to exactly meet the desired graph requirements).
For the purposes of storing, aggregating, and graphing your sensor data, you might consider RRDtool which does basically everything you describe. Its main limitation is it does not store raw data, but instead stores aggregated, interpolated values. (If you need the raw data, you can still use Cassandra for that.)
AndySavage is onto something here when it comes to precomputing aggregate values. This does require you to understand in advance the sorts of metrics you'd like to see from the sensor values generally.
You correctly identify the limitation of a graph in informing the viewer. Questions you need to ask really fall into areas such as:
When you aggregate are you interested in the mean, median, spread of the values?
What's the biggest aggregation that you're interested in?
What's the goal of the data visualisation - is it really necessary to be looking at a whole year of data?
Are outliers the important part of the dataset?
Each of these questions will lead you down a different path with visualisation and the application itself too.
Once you know what you're wanting to do, an ETL process harnessing some form of analytical processing will be needed. This is where the Hadoop world would be useful investigating.
Regarding your decision to use Cassandra as your timeseries historian, how is that working for you? I'm looking at technical solutions for a similar requirement at the moment and it's one of the options on the table.

People counting using OpenCV

I'm starting a search to implement a system that must count people flow of some place.
The final idea is to have something like http://www.youtube.com/watch?v=u7N1MCBRdl0 . I'm working with OpenCv to start creating it, I'm reading and studying about. But I'd like to know if some one can give me some hints of source code exemples, articles and anything elese that can make me get faster on my deal.
I started with blobtrack.exe sample to study, but I got not good results.
Tks in advice.
Blob detection is the correct way to do this, as long as you choose good threshold values and your lighting is even and consistent; but the real problem here is writing a tracking algorithm that can keep track of multiple blobs, being resistant to dropped frames. Basically you want to be able to assign persistent IDs to each blob over multiple frames, keeping in mind that due to changing lighting conditions and due to people walking very close together and/or crossing paths, the blobs may drop out for several frames, split, and/or merge.
To do this 'properly' you'd want a fuzzy ID assignment algorithm that is resistant to dropped frames (ie blob ID remains, and ideally predicts motion, if the blob drops out for a frame or two). You'd probably also want to keep a history of ID merges and splits, so that if two IDs merge to one, and then the one splits to two, you can re-assign the individual merged IDs to the resulting two blobs.
In my experience the openFrameworks openCv basic example is a good starting point.
I'll not put this as the right answer.
It is just an option for those who are able to read in Portugues or can use a translator. It's my graduation project and there is the explanation of a option to count people in it.
Limitations:
It's do not behave well on envirionaments that change so much the background light.
It must be configured for each location that you will use it.
Advantages:
It's fast!
I used OpenCV to do the basic features as, capture screen, go trough the pixels, etc. But the algorithm to count people was done by my self.
You can check it on this paper
Final opinion about this project: It's not prepared to go alive, to became a product. But it works very well as base for study.

Finding out distance between router and receiver?

A general question: is it possible to retrieve information about how far away e.g. a computer is from a wifi-router. For instance I want to get data on my computer if I'm 10 meters away from my home-wifispot or 2 meters.
Any idea if that is even possible?
Edit: How about bluetooth? Is it possible to get information about how far away bluetooth-connected devices are one from another?
I would recommend a measuring line or just good-old-fashioned guesstimating.
There is no "simple" way to do it (complex ways may involve building "accurate" signal maps ahead of time or trying to fit a better equation which is still subject to anumber of the limitations with the naive rule) and the rule of thumb "1/r^2" is just that -- a general rule of thumb. On the other hand, perhaps there is some existing software that will show you your RSS strength and make the task feel accomplished :-)
You will find useful links if you google for "RSS signal distance". This kind of task seems quite a common topic in academia w.r.t. small wireless devices ("motes") as well and there have been some interesting approaches to this problem such as the fitting of secondary low-frequency acoustic sensors.
You can query the signal strength which is some kind of indication of distance and obstructions and a few other factors all rolled into one measure. With just plain wifi though this isn't possible directly.
Try measuring the response time of the router to pings, with the data rate set to constant to avoid that effecting the response time. Take lots of samples and remove outliers to reduce errors, but you will still have a substantial quantization error. Subtract the latency of the router and computer, divide by 6 then multiply by the speed of light and hopefully you will have the distance to a resolution of a few metres.

What is Scaling?

I always get this argument against RoR that it dont scale but I never get any appropriate answer wtf it really means? So here is a novice asking, what the hell is this " scaling " and how you measure it?
What the hell is this "scaling"...
As a general term, scalability means the responsiveness of a project to different kinds of demand. A project that scales well is one that doesn't have any trouble keeping up with requests for more of its services -- or, at the least, doesn't have to start turning away requests because it can't handle them.
It's often the case that simply increasing the size of a problem by an order of magnitude or two exposes weaknesses in the strategies that were used to solve it. When such weaknesses are exposed, it might be said that the solution to the problem doesn't "scale well".
For example, bogo sort is easy to implement, but as soon as you're sorting more than a handful of things, it starts taking a very long time to get the answer you want. It would be fair to say that bogo sort doesn't scale well.
... and how you measure it?
That's a harder question to answer. In general, there aren't units associated with scalability; statements like "that system is N times as scalable as this one is" at best would be an apples-to-oranges comparison.
Scalability is most frequently measured by seeing how well a system stands up to different kinds of demand in test conditions. People might say a system scales well if, over a wide range of demand of different kinds, it can keep up. This is especially true if it stands up to demand that it doesn't currently experience, but might be expected to if there's a sudden surge in popularity. (Think of the Slashdot/Digg/Reddit effects.)
Scaling or scalability refers to how a project can grow or expand to respond to the demand:
http://en.wikipedia.org/wiki/Scalability
Scalability has a wide variety of uses as indicated by Wikipedia:
Scalability can be measured in various dimensions, such as:
Load scalability: The ability for a distributed system to easily
expand and contract its resource pool
to accommodate heavier or lighter
loads. Alternatively, the ease with
which a system or component can be
modified, added, or removed, to
accommodate changing load.
Geographic scalability: The ability to maintain performance,
usefulness, or usability regardless of
expansion from concentration in a
local area to a more distributed
geographic pattern.
Administrative scalability: The ability for an increasing number of
organizations to easily share a single
distributed system.
Functional scalability: The ability to enhance the system by
adding new functionality at minimal
effort.
In one area where I work we are concerned with the performance of high-throughput and parallel computing as the number of processors is increased.
More generally it is often found that increasing the problem by (say) one or two orders of magnitude throws up a completely new set of challenges which are not easily predictable from the smaller system
It is a term for expressing the ability of a system to keep its performance as it grows over time.
Ideally what you want, is a system to reach linear scalability. It means that by adding new units of resources, the system equally grows in its ability to perform.
For example: It means, that when three webapp servers can handle a thousand concurrent users, that by adding three more servers, it can handle double the amount, two thousand concurrent users in this case and no less.
If a system does not have the property of linear scalability, there is a point where adding more resources, e.g. hardware, will not bring any additional benefit, performance, for instance, converges to zero: As more and more servers are put to the task. In the above example, the additional benefit of each new server becomes smaller and smaller until it reaches zero.
Thus, scalability is the factor that tells you what you get as output from a given input. It's value range lies between 0 and positive infinity, in theory. In practice, anything equal to 1 is most desirable...
Scalability refers to the ability for a system to accomodate a changing number of users. This can be an increasing or decreasing number of users as we now try to plan our systems around cloud computing and rented computing time.
Think about what is involved in making an order entry system designed for 1000 reps scale to accomodate 100,000 reps. What hardware needs to be added? What about the databases? In a nutshell, this is scalability.
Scalability of an application refers to how it is able to perform as the load on the application changes. This is often affected by the number of connected users, amount of data in a database, etc.
It is the ability for a system to accept an increased workload, more functionality, changing database, ... without impacting the original design or system.

Resources