Is spacial search in P2P network possible? - geolocation

I want to build a Javascript/HTML5 geolocation based social network and I wonder the best choice of possible architectures. Client-server can be simple to develop but drawback is the system ressources that could be very high, especially because the application must manage moves (worst case: a user that is in a car must see others users that are around him in cars).
Basicaly, in a client-server architecture, server tasks will be :
collects and stores latitude and longitude of the users (could have thousands of them)
makes geo distance search for that user (to get the list of users present around him in a radius)
builds and sends to the client an XML file with position of the users in the list
These 3 operation must be done periodically, every 3 or 5 seconds because I want a "live" map that shows users in the list moving in their environnement (city, town).
All these 3 points could be optimized :
client send his position when moving of 10 meters to reduce amount of data to process
"spherical rectangle" search in MyISAM table with spatial index (use of MBRContains) to off load MySQL database.
common output file : the XML that is sent can be the same if 2 users are located in a radius of x meters (the 2 users are close each-other).
It is hard to make load estimation at this stage but I think client-server architecture is not appropriate for that type of application and peer2peer could be a nice answer if 2 clients could communicate when they are near each other.
My point is:
Is there any methode to make possible a client to blind search other clients that are located in a certain radius without the help of a central server ? (it is possible with UDP broadcast :-)
edit : Correction. UDP Brodcast allow a client to poll a machine wherever it is, in certain range or IP address.
Thank you for your help,
Florent

You will have to have central peers/servers, because you need to centralize some information to be able to perform you functionalities.
I would go for the following:
Assign square miles (or whatever size you want) to specific servers.
Have devices send a 'I am here' message with their coordinates to some dispatcher that will forward these to the correct square mile server for handling.
Have servers register when a device enters a square mile they manage. This could be a central map to make sure a device is registered to one and only one square.
Forward this message to all other devices in the square.
And/or make sure you include to which square this message is intended and make sure the devices checks it before displays it to the user.
Tune the size of the square and the rate of 'I am here' message. That's it.

The answer actually depends on many things so I'll help out with basic strategy. To understand things out you'll need to understand how does Kademlia works (Kademlia is a DHT P2P network that stores information).
In Kademlia at first startup each node picks random ID which is a 160 bit number that represents point in a space of all possible 160 bit IDs.
The ID of the information that needs to be stored is obtained with SHA-1 function (it receives arbitrary string, and outputs 160 bit number that is treated like ID of the information that needs to be stored)
After that you have the ID of the information, you publish it, the information is physically stored on a node that has it's ID close to information ID.
(The illustration is taken from here)
The information is queried via it's ID. Both the information lookups or node lookups takes O(log(N)) hops to obtain the required information. The "XOR" metric is used in Kademlia (in your case it can be ordinary Euclidian metric).
Each node maintains an array of buckets, each bucket contains addresses of nodes that are appropriate to the current bucket. The appropriate'ness is a measure of how close the IDs are. consider example:
0 160
Node 1 ID: 1101000101011111101110101001010...
Node 2 ID: 1101011101011111101110101001010...
Node 3 ID: 1101000101011001101110101001010...
After applying XOR metric to Nodes #1,2 i.e (computing the number that represents the virtual distance between these nodes) we get:
index - 012345678901234
xor - 000001100000000... (the difference is in 5-th msb bit)
order - msb lsb
After applying Xor metric to Nodes #1,3 we get:
index - 012345678901234
xor - 000000000000011... (the difference is in 13-th msb bit)
order - msb lsb
Apparently Node 1 is closer to Node 3 since it has difference in less significant bits than the distance from Node 1 to Node 2. And therefore from a point of view of a Node 1, it's neighbor Node 3 goes to 13-th bucket(higher index means closer IDs), and Node 2 goes to to 5-th bucket which contains a group of nodes that are 5 MSB radixes away from a current node ID.
Such data structure allows each node to know it's surroundings in variety of 160 levels of distances.
Back to your example, to allow efficient geospacial queries you'll need to replace Kademlias XOR metric with ordinary Euclidian metric. In this case you will have your ID's as a 3D or 2D vectors, and unfortunately due to fact that Euclidian metric results with floating point numbers which are not directly suitable for this type of algorithm so you will need to convert them to a discrete binary numbers somehow in a way similar to what XOR function does. After that, finding node's neighboring nodes is a trivial task.
Hope this helps. Oh by the way look to HyperDex, new searchable distributed datastore closely tied to euclidian metric, might help...

Related

Shortest path in games (StarCraft example)

In games like StarCraft you can have up to 200 units (for player) in a map.
There are small but also big maps.
When you for example grab 50 units and tell them to go to the other side of the map some algorithm kicks in and they find path through the obsticles (river, hills, rocks and other).
My question is do you know how the game doesnt slow down because you have 50 paths to calculate. In the meantime other things happens like drones collecting minerals buildinds are made and so on. And if the map is big it should be harder and slower.
So even if the algorithm is good it will take some time for 100 units.
Do you know how this works maybe the algorithm is similar to other games.
As i said when you tell units to move you did not see any delay for calculating the path - they start to run to the destination immediately.
The question is how they make the units go through the shortest path but fast.
There is no delay in most of the games (StarCraft, WarCraft and so on)
Thank you.
I guess it just needs to subdivide the problem and memoize the results. Example: 2 units. Unit1 goes from A to C but the shortest path goes through B. Unit2 goes from B to C.
B to C only needs to be calculated once and can be reused by both.
See https://en.m.wikipedia.org/wiki/Dynamic_programming
In this wikipedia page it specifically mentions dijkstra's algorithm for path finding that works by subdividing the problem and store results to be reused.
There is also a pretty good looking alternative here http://www.gamasutra.com/blogs/TylerGlaiel/20121007/178966/Some_experiments_in_pathfinding__AI.php where it takes into account dynamic stuff like obstacles and still performs very well (video demo: https://www.youtube.com/watch?v=z4W1zSOLr_g).
Another interesting technique, does a completely different approach:
Calculate the shortest path from the goal position to every point on the map: see the full explanation here: https://www.youtube.com/watch?v=Bspb9g9nTto - although this one is inefficient for large maps
First of all 100 units is not such a large number, pathfinding is fast enough on modern computers that it is not a big resource sink. Even on older games, optimizations are made to make it even faster, and you can see that unit will sometimes get lost or stuck, which shouldn't really happen with a general algorithm like A*.
If the map does not change map, you can preprocess it to build a set of nodes representing regions of the map. For example, if the map is two islands connected by a narrow bridge, there would be three "regions" - island 1, island 2, bridge. In reality you would probably do this with some graph algorithm, not manually. For instance:
Score every tile with distance to nearest impassable tile.
Put all adjacent tiles with score above the threshold in the same region.
When done, gradually expand outwards from all regions to encompass low-score tiles as well.
Make a new graph where each region-region intersection is a node, and calculate shortest paths between them.
Then your pathfinding algorithm becomes two stage:
Find which region the unit is in.
Find which region the target is in.
If different regions, calculate shortest path to target region first using the region graph from above.
Once in the same region, calculate path normally on the tile grid.
When moving between distant locations, this should be much faster because you are now searching through a handful of nodes (on the region graph) plus a relatively small number of tiles, instead of the hundreds of tiles that comprise those regions. For example, if we have 3 islands A, B, C with bridges 1 and 2 connecting A-B and B-C respectively, then units moving from A to C don't really need to search all of B every time, they only care about shortest way from bridge 1 to bridge 2. If you have a lot of islands this can really speed things up.
Of course the problem is that regions may change due to, for instance, buildings blocking a path or units temporarily obstructing a passageway. The solution to this is up to your imagination. You could try to carefully update the region graph every time the map is altered, if the map is rarely altered in your game. Or you could just let units naively trust the region graph until they bump into an obstacle. With some games you can see particularly bad cases of the latter because a unit will continue running towards a valley even after it's been walled off, and only after hitting the wall it will turn back and go around. I think the original Starcraft had this issue when units block a narrow path. They would try to take a really long detour instead of waiting for the crowd to free up a bridge.
There's also algorithms that accomplish analogous optimizations without explicitly building the region graph, for instance JPS works roughly this way.

Lowest value from 2 payloads in Node-Red

I have a IoT system in home and two temperature sensors.
One of the sensors could work in some hours in direct sun.
The real temperature is always the lowest value, so sometimes temp1, sometimes temp2.
What I want to achieve is:
read the temperature from sensors1 (via MQTT)
read the temperature from sensors2 (via MQTT)
compare values
find the lowest one and send in via MQTT
go back to reading in loop
For this example I can simulate readings with injection nodes
How to do that? I am new in Node-Red, have tried but without success.
Here is my flow:
[{"id":"fa6372cc.47f92","type":"tab","label":"Flow 8","disabled":false,"info":""},{"id":"5ac90e03.22da3","type":"join","z":"fa6372cc.47f92","name":"","mode":"custom","build":"object","property":"payload","propertyType":"msg","key":"topic","joiner":"","joinerType":"str","accumulate":true,"timeout":"","count":"2","reduceRight":false,"reduceExp":"","reduceInit":"","reduceInitType":"","reduceFixup":"","x":990,"y":340,"wires":[["f09774bf.3c8428","a197b84d.6a7338"]]},{"id":"f09774bf.3c8428","type":"debug","z":"fa6372cc.47f92","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","x":1130,"y":340,"wires":[]},{"id":"43900e79.98cd8","type":"change","z":"fa6372cc.47f92","name":"set payload value","rules":[{"t":"set","p":"payload","pt":"msg","to":"req.params.value","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":790,"y":340,"wires":[["5ac90e03.22da3"]]},{"id":"b71d9143.c03bd","type":"change","z":"fa6372cc.47f92","name":"set topic temp1","rules":[{"t":"set","p":"topic","pt":"msg","to":"temp1","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":560,"y":320,"wires":[["43900e79.98cd8"]]},{"id":"e87114aa.6cd1","type":"change","z":"fa6372cc.47f92","name":"set topic temp2","rules":[{"t":"set","p":"topic","pt":"msg","to":"temp2","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":560,"y":360,"wires":[["43900e79.98cd8"]]},{"id":"783c47fd.8dd58","type":"inject","z":"fa6372cc.47f92","name":"temp source 2","topic":"","payload":"12","payloadType":"num","repeat":"3","crontab":"","once":false,"onceDelay":"1.5","x":380,"y":360,"wires":[["e87114aa.6cd1"]]},{"id":"271dedab.aaa7b2","type":"inject","z":"fa6372cc.47f92","name":"temp source 1","topic":"","payload":"10","payloadType":"num","repeat":"2","crontab":"","once":false,"onceDelay":"1","x":380,"y":320,"wires":[["b71d9143.c03bd"]]},{"id":"a197b84d.6a7338","type":"mqtt out","z":"fa6372cc.47f92","name":"temperature","topic":"domoticz/in","qos":"","retain":"","broker":"7e3561ec.acad","x":1150,"y":280,"wires":[]},{"id":"7e3561ec.acad","type":"mqtt-broker","z":"","name":"Domoticz","broker":"192.168.6.11","port":"8084","clientid":"","usetls":false,"compatmode":true,"keepalive":"60","cleansession":true,"birthTopic":"","birthQos":"0","birthRetain":"false","birthPayload":"","closeTopic":"","closeRetain":"false","closePayload":"","willTopic":"","willQos":"0","willRetain":"false","willPayload":""}]
One way to do it would be like this:
This is storing the two temps in flow variables - the first flow initially sets them to a high number so the "min" in "choose lower value" will later work. In this case I've used a change node setting the payload to the JSONata of
$min([$flowContext("temp1"), $flowContext("temp2")])
but there's a few ways you could choose to do it.
Here is the code to try:
[{"id":"6bc2755e.9feb9c","type":"debug","z":"f454a93f.0e89d8","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","x":990,"y":340,"wires":[]},{"id":"38bd03eb.f7d06c","type":"change","z":"f454a93f.0e89d8","name":"choose lower value","rules":[{"t":"set","p":"payload","pt":"msg","to":"$min([$flowContext(\"temp1\"), $flowContext(\"temp2\")])\t","tot":"jsonata"}],"action":"","property":"","from":"","to":"","reg":false,"x":790,"y":340,"wires":[["6bc2755e.9feb9c"]]},{"id":"9066677f.eb0358","type":"change","z":"f454a93f.0e89d8","name":"store temp1","rules":[{"t":"set","p":"temp1","pt":"flow","to":"payload","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":550,"y":320,"wires":[["38bd03eb.f7d06c"]]},{"id":"a70c9b2a.e7db58","type":"change","z":"f454a93f.0e89d8","name":"store temp2","rules":[{"t":"set","p":"temp2","pt":"flow","to":"payload","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":550,"y":360,"wires":[["38bd03eb.f7d06c"]]},{"id":"4bd27616.d022c8","type":"inject","z":"f454a93f.0e89d8","name":"temp source 2","topic":"","payload":"12","payloadType":"num","repeat":"","crontab":"","once":false,"onceDelay":"1.5","x":370,"y":360,"wires":[["a70c9b2a.e7db58"]]},{"id":"7378dd4f.3825b4","type":"inject","z":"f454a93f.0e89d8","name":"temp source 1","topic":"","payload":"10","payloadType":"num","repeat":"","crontab":"","once":false,"onceDelay":"1","x":370,"y":320,"wires":[["9066677f.eb0358"]]},{"id":"314eb0ec.85211","type":"inject","z":"f454a93f.0e89d8","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":true,"onceDelay":0.1,"x":370,"y":260,"wires":[["688646b.138a6b8"]]},{"id":"688646b.138a6b8","type":"change","z":"f454a93f.0e89d8","name":"set to high","rules":[{"t":"set","p":"temp1","pt":"flow","to":"999","tot":"num"},{"t":"set","p":"temp2","pt":"flow","to":"999","tot":"num"}],"action":"","property":"","from":"","to":"","reg":false,"x":550,"y":260,"wires":[[]]}]

Cluster Analysis for crowds of people

I have location data from a large number of users (hundreds of thousands). I store the current position and a few historical data points (minute data going back one hour).
How would I go about detecting crowds that gather around natural events like birthday parties etc.? Even smaller crowds (let's say starting from 5 people) should be detected.
The algorithm needs to work in almost real time (or at least once a minute) to detect crowds as they happen.
I have looked into many cluster analysis algorithms, but most of them seem like a bad choice. They either take too long (I have seen O(n^3) and O(2^n)) or need to know how many clusters there are beforehand.
Can someone help me? Thank you!
Let each user be it's own cluster. When she gets within distance R to another user form a new cluster and separate again when the person leaves. You have your event when:
Number of people is greater than N
They are in the same place for the timer greater than T
The party is not moving (might indicate a public transport)
It's not located in public service buildings (hospital, school etc.)
(good number of other conditions)
One minute is plenty of time to get it done even on hundreds of thousands of people. In naive implementation it would be O(n^2), but mind there is no point in comparing location of each individual, only those in close neighbourhood. In first approximation you can divide the "world" into sectors, which also makes it easy to make the task parallel - and in turn easily scale. More users? Just add a few more nodes and downscale.
One idea would be to think in terms of 'mass' and centre of gravity. First of all, do not mark something as event until the mass is not greater than e.g. 15 units. Sure, location is imprecise, but in case of events it should average around centre of the event. If your cluster grows in any direction without adding substantial mass, then most likely it isn't right. Look at methods like DBSCAN (density-based clustering), good inspiration can be also taken from physical systems, even Ising model (here you think in terms of temperature and "flipping" someone to join the crowd)ale at time of limited activity.
How to avoid "single-linkage problem" mentioned by author in comments? One idea would be to think in terms of 'mass' and centre of gravity. First of all, do not mark something as event until the mass is not greater than e.g. 15 units. Sure, location is imprecise, but in case of events it should average around centre of the event. If your cluster grows in any direction without adding substantial mass, then most likely it isn't right. Look at methods like DBSCAN (density-based clustering), good inspiration can be also taken from physical systems, even Ising model (here you think in terms of temperature and "flipping" someone to join the crowd). It is not a novel problem and I am sure there are papers that cover it (partially), e.g. Is There a Crowd? Experiences in Using Density-Based Clustering and Outlier Detection.
There is little use in doing a full clustering.
Just uses good database index.
Keep a database of the current positions.
Whenever you get a new coordinate, query the database with the desired radius, say 50 meters. A good index will do this in O(log n) for a small radius. If you get enough results, this may be an event, or someone joining an ongoing event.

How do systems typically map an 997 or 999 acknowledgement back to the originating ISA?

The implementation guides (and most web resources I can find) describe the GS06 and ST02 Control Numbers as being unique only within the Interchange they are contained in. So when we build our GS and ST segments we just start the control numbers at 1 and increment as we add more Functional Groups and/or Transaction Sets. The ISA13 control numbers we generate are always unique.
The dilemma is when we receive a 999 acknowledgment; it does not include any reference to the ISA control number that it's responding to. So we have no way to find the correct originating Functional Group in our records.
This seems like a problem that anyone receiving functional acknowledgements would face, but clearly lots of systems and companies handle it, so what is the typical practice to reconcile 997s or 999s? I think we must be missing something in our reading of the guides.
GS06 and ST02 only have to be unique within the interchange, but if you use an ID that's truly unique for each one (not just within the message), then you can skip right to the proper transaction set or functional group, not just the right message.
I typically have GS start at 1 and increment the same way that you do, but the ST02 I keep unique (to the extent allowed by the 9 character limit).
GS06 is supposed to be globally unique, not only within the interchange. This is from X12-6
In order to provide sufficient discrimination for the acknowledgment
process to operate reliably and to ensure that audit trails are
unambiguous, the combination of Functional ID Code (GS01), Application
Sender's ID (GS02), Application Receiver's ID (GS03), and Functional
Group Control Numbers (GS06, GE02) shall by themselves be unique
within a reasonably extended time frame whose boundaries shall be
defined by trading partner agreement. Because at some point it may be
necessary to reuse a sequence of control numbers, the Functional Group
Date and Time may serve as an additional discriminant only to
differentiate functional group identity over the longest possible time
frame.

Entity resolution for venues and other geo locations

Say I want to build a check-in aggregator that counts visits across platforms, so that I can know for a given place how many people have checked in there on Foursquare, Gowalla, BrightKite, etc. Is there a good library or set of tools I can use out of the box to associate the venue entries in each service with a unique place identifier of my own?
I basically want a function that can map from a pair of (placename, address, lat/long) tuples to [0,1) confidence that they refer to the same real-world location.
Someone must have done this already, but my google-fu is weak.
Yes, you can submit the two addresses using geocoder.net (assuming you're a .Net developer, you didn't say). It provides a common interface for address verification and geocoding, so you can be reasonably sure that one address equals another.
If you can't get them to standardize and match, you can compare their distances and assume they are the same place if they are below a certain threshold away from each other.
I'm pessimist that there is such a tool already accessible.
A good solution to match pairs based on the entity resolution literature would be to
get the placenames, define and use a good distance function on them (eg. edit distance),
get the address, standardize (eg. with the mentioned geocoder.net tools), and also define distance between them,
get the coordinates and get a distance (this is easy: there are lots of libraries and tools for geographic distance calculations, and that seems to be a good metric),
turn the distances to probabilities ("what is the probability of such a distance, if we suppose these are the same places")(not straightforward),
and combine the probabilities (not straightforward also).
Then maybe a closure-like algorithm (close the set according to merging pairs above a given probability treshold) also can help to find all the matchings (for example when different names accumulate for a given venue).
It wouldn't be a bad tool or service however.

Resources