What is the criteria that maze is valid - breadth-first-search

I know that valid maze needs to have one entry point and one exit point.
bfs traverse per level from level 0 , level 1 and so on and then we marked true if it's already visited till the queue is empty.
Assuming '#' represents a wall and '.' represents a path how do I tell to computer that the maze is connected or not ?
for example this one is connected
# . #
# . #
# . #
i've been surfing online looking for explanation but it's just not enough,

This is what BFS or other path-finding algorithms do.
It searches for a path - a continuous connected non-wall nodes, . in the example you gave.
To be able to do so, the algorithm need to know who are the neighbors of each node.
This information is typically provided to the algorithm by an adjacency matrix, or a neighbors list.

Related

Efficiently Finding all paths between 2 nodes in a directed graph - RGL Gem

I am struggling to find 1 efficient algorithm which will give me all possible paths between 2 nodes in a directed graph.
I found RGL gem, fastest so far in terms of calculations. I am able to find the shortest path using the Dijkstras Shortest Path Algorithm from the gem.
I googled, inspite of getting many solutions (ruby/non-ruby), either couldn't convert the code or the code is taking forever to calculate (inefficient).
I am here primarily if someone can suggest to find all paths using/tweaking various algorithms from RGL gem itself (if possible) or some other efficient way.
Input of directed graph can be an array of arrays..
[[1,2], [2,3], ..]
P.S. : Just to avoid negative votes/comments, unfortunately I don't have inefficient code snippet to show as I discarded it days ago and didn't save it anywhere for the record or reproduce here.
The main problem is that the number of paths between two nodes grows exponentially in the number of overall nodes. Thus any algorithm finding all paths between two nodes, will be very slow on larger graphs.
Example:
As an example imagine a grid of n x n nodes each connected to their 4 neighbors. Now you want to find all paths from the bottom left node to the top right node. Even when you only allow for moves to the right (r) and moves up (u) your resulting paths can be described by any string of length 2n with equal number of (r)'s and (u)'s. This will give you "2n choose n" number of possible paths (ignoring other moves and cycles)

Algorithm to map out a closed maze and remember how it looks for future use

I'm working on a project where I'll have an agent in a random maze, and this maze does not have an exit. The goal would be for the agent to explore the maze and 'remember' how it looks. After some time I'll spawn an item at a random location and the agent will be notified only if it has mapped out that given area. The agent will use the map it has generated to determine the shortest path to the item.
I know of maze algorithms like A*, but these algorithms require a start and end position for the traversal to stop. These algorithms don't 'remember' how the maze looks they just determine the shortest path between two points. Since the maze is closed there is no end position. My initial idea was to have the agent travel randomly and fill in a 2D array of how the map looks, this just seems inefficient to me. Any ideas would be great.
So you will have two steps, exploration and traversing.
Suppose you have explored the maze completely, then when the item appears, you can just use A* with goal being the item.
To explore the map and store it, you can create a data structure appropriate for the map. For example, if the connecting paths don't matter and only the conjunctions do, then just create a Node class where each node has a list of connected nodes. Finally, you can start a breadth-first search or depth-first search to explore the whole map, while storing the info in the aforementioned data structure.
Depending on the actual map, either exploration algorithms might be more effective. I'd start with depth-first though, since that sounds similar to our human approach to mazes - always turn in the same direction at an intersection! (Good that dfs takes care of circular paths!)

How to encode dependency path as a feature for classification?

I am trying to implement relation extraction between verb pairs. I want to use dependency path from one verb to the other as a feature for my classifier (predicts if relation X exists or not). But I am not sure how to encode the dependency path as a feature. Following are some example dependency paths, as space separated relation annotations from StanfordCoreNLP Collapsed Dependencies:
nsubj acl nmod:from acl nmod:by conj:and
nsubj nmod:into
nsubj acl:relcl advmod nmod:of
It is important to keep in mind that these path are of variable length and a relation could reappear without any restriction.
Two compromising ways of encoding this feature that come to my mind are:
1) Ignore the sequence, and just have one feature for each relation with its value being the number of times it appears in the path
2) Have a sliding window of length n, and have one feature for each possible pair of relations with the value being the number of times those two relations appeared consecutively. I suppose this is how one encodes n-grams. However, the number of possible relations is 50, which means I cannot really go with this approach.
Any suggestions are welcomed.
We had a project that built a classifier based off of dependency paths. I asked the group member who developed the system, and he said:
indicator feature for the whole path
So if you have the training data point (verb1 -e1-> w1 -e2-> w2 -e3-> w3 -e4-> verb2, relation1) the feature would be (e1-e2-e3-e4)
And he also did ngram sequences, so for that same data point, you would also have (e1), (e2), (e3), (e4), (e1-e2), (e2-e3), (e3-e4), (e1-e2-e3), (e2-e3-e4)
He also recommended collapsing appositive edges to make the paths smaller.
Also, I should note that he developed a set of high precision rules for each relation, and used this to create a large set of training data.

A* circular path finding algorithm with restrictions

I have a road map represented as a directed graph of junctions and links leading from one junction to another, each link is weighted with it's own traversal time (the time it takes to cross the link) and im asked to find an algorithm to get from junction A to junction B and back from junction B to junction A so that the total path cost (in time) takes no longer than 10% more time than the optimal path cost (that is the path cost returned by A* algorithm) while keeping the time overlaps of the path to B and the path from B to a minimum, that is if t(x,y) represents the time to cross link (x,y) i need to bring to minimum the sum of t(x,y) + t(y,x) for the links that overlap.
the algorithm should be optimal for the problem at hand and complete (it should also be efficient) and probably use some variants of A* like A*epsilon and the likes...
does anyone have a clue how to go about this problem?
i was thinking of representing the states of this problem as (junction,flag) where flag indicates whether the current node is a part of a path that already passed junction B and the goal state is (A,True) and then using A*epsilon on this... but i don't know how to take into account the time overlap issue.. i guess what im suggesting is not the way im intended to solve this.
any help would be greatly appreciated :)

Is spacial search in P2P network possible?

I want to build a Javascript/HTML5 geolocation based social network and I wonder the best choice of possible architectures. Client-server can be simple to develop but drawback is the system ressources that could be very high, especially because the application must manage moves (worst case: a user that is in a car must see others users that are around him in cars).
Basicaly, in a client-server architecture, server tasks will be :
collects and stores latitude and longitude of the users (could have thousands of them)
makes geo distance search for that user (to get the list of users present around him in a radius)
builds and sends to the client an XML file with position of the users in the list
These 3 operation must be done periodically, every 3 or 5 seconds because I want a "live" map that shows users in the list moving in their environnement (city, town).
All these 3 points could be optimized :
client send his position when moving of 10 meters to reduce amount of data to process
"spherical rectangle" search in MyISAM table with spatial index (use of MBRContains) to off load MySQL database.
common output file : the XML that is sent can be the same if 2 users are located in a radius of x meters (the 2 users are close each-other).
It is hard to make load estimation at this stage but I think client-server architecture is not appropriate for that type of application and peer2peer could be a nice answer if 2 clients could communicate when they are near each other.
My point is:
Is there any methode to make possible a client to blind search other clients that are located in a certain radius without the help of a central server ? (it is possible with UDP broadcast :-)
edit : Correction. UDP Brodcast allow a client to poll a machine wherever it is, in certain range or IP address.
Thank you for your help,
Florent
You will have to have central peers/servers, because you need to centralize some information to be able to perform you functionalities.
I would go for the following:
Assign square miles (or whatever size you want) to specific servers.
Have devices send a 'I am here' message with their coordinates to some dispatcher that will forward these to the correct square mile server for handling.
Have servers register when a device enters a square mile they manage. This could be a central map to make sure a device is registered to one and only one square.
Forward this message to all other devices in the square.
And/or make sure you include to which square this message is intended and make sure the devices checks it before displays it to the user.
Tune the size of the square and the rate of 'I am here' message. That's it.
The answer actually depends on many things so I'll help out with basic strategy. To understand things out you'll need to understand how does Kademlia works (Kademlia is a DHT P2P network that stores information).
In Kademlia at first startup each node picks random ID which is a 160 bit number that represents point in a space of all possible 160 bit IDs.
The ID of the information that needs to be stored is obtained with SHA-1 function (it receives arbitrary string, and outputs 160 bit number that is treated like ID of the information that needs to be stored)
After that you have the ID of the information, you publish it, the information is physically stored on a node that has it's ID close to information ID.
(The illustration is taken from here)
The information is queried via it's ID. Both the information lookups or node lookups takes O(log(N)) hops to obtain the required information. The "XOR" metric is used in Kademlia (in your case it can be ordinary Euclidian metric).
Each node maintains an array of buckets, each bucket contains addresses of nodes that are appropriate to the current bucket. The appropriate'ness is a measure of how close the IDs are. consider example:
0 160
Node 1 ID: 1101000101011111101110101001010...
Node 2 ID: 1101011101011111101110101001010...
Node 3 ID: 1101000101011001101110101001010...
After applying XOR metric to Nodes #1,2 i.e (computing the number that represents the virtual distance between these nodes) we get:
index - 012345678901234
xor - 000001100000000... (the difference is in 5-th msb bit)
order - msb lsb
After applying Xor metric to Nodes #1,3 we get:
index - 012345678901234
xor - 000000000000011... (the difference is in 13-th msb bit)
order - msb lsb
Apparently Node 1 is closer to Node 3 since it has difference in less significant bits than the distance from Node 1 to Node 2. And therefore from a point of view of a Node 1, it's neighbor Node 3 goes to 13-th bucket(higher index means closer IDs), and Node 2 goes to to 5-th bucket which contains a group of nodes that are 5 MSB radixes away from a current node ID.
Such data structure allows each node to know it's surroundings in variety of 160 levels of distances.
Back to your example, to allow efficient geospacial queries you'll need to replace Kademlias XOR metric with ordinary Euclidian metric. In this case you will have your ID's as a 3D or 2D vectors, and unfortunately due to fact that Euclidian metric results with floating point numbers which are not directly suitable for this type of algorithm so you will need to convert them to a discrete binary numbers somehow in a way similar to what XOR function does. After that, finding node's neighboring nodes is a trivial task.
Hope this helps. Oh by the way look to HyperDex, new searchable distributed datastore closely tied to euclidian metric, might help...

Resources