In Bayesian networks, what does it mean that a node is "instantiated"?

I am trying to follow these slides on Bayesian networks.
Can anybody explain to me what it means for a node in a Bayesian network to be "instantiated"?

It means the node is created. Spawned. Brought into existence. If B isn't represented by an instance (roughly: B does not exist), then the path is different than if B exists (is instantiated).
You can get evidence either by instantiating a node (in which case its truth value is known) or by arriving at this node from some other node. So either the node is instantiated and you get evidence from its truth value, or it is not and you get evidence from the flow.

Instantiating a node in a Bayesian network is different from instantiation in object-oriented programming. A node is instantiated when its value is known through observing what it represents. If it is not instantiated, then its value can be updated through Bayesian inference.
Consider the example A -> B -> C. Assuming the nodes are Boolean (either true or false), if you instantiate C (e.g., C = true) then the values of B and A will be updated using Bayesian inference. However, if B is also instantiated, then it d-separates A and C, so instantiating C will not update A. The rules of d-separation depend on the type of node configuration, so instantiating a node may either d-separate or d-connect nodes.
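To make the A -> B -> C example concrete, here is a small runnable sketch using the pgmpy library (pgmpy is my choice, not part of the answer, and the probability numbers are made up). It shows that evidence on C updates the belief about A, but once B is instantiated, C no longer matters:

from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Chain A -> B -> C with Boolean nodes
model = BayesianNetwork([('A', 'B'), ('B', 'C')])
model.add_cpds(
    TabularCPD('A', 2, [[0.7], [0.3]]),
    TabularCPD('B', 2, [[0.8, 0.3], [0.2, 0.7]],
               evidence=['A'], evidence_card=[2]),
    TabularCPD('C', 2, [[0.9, 0.4], [0.1, 0.6]],
               evidence=['B'], evidence_card=[2]),
)
model.check_model()

infer = VariableElimination(model)
# Instantiating C updates the belief about A:
print(infer.query(['A'], evidence={'C': 1}))
# But with B also instantiated, A and C are d-separated:
# P(A | B=1, C=1) is the same as P(A | B=1).
print(infer.query(['A'], evidence={'B': 1, 'C': 1}))
print(infer.query(['A'], evidence={'B': 1}))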

Related

torch.no_grad() and detach() combined

I encountered many code fragments like the following for choosing an action, which include a mix of torch.no_grad and detach (where actor is some actor network and SomeDistribution is your preferred distribution class), and I'm wondering whether they make sense:
def f():
    with torch.no_grad():
        x = actor(observation)
        dist = SomeDistribution(x)
        sample = dist.sample()
    return sample.detach()
Isn't the use of detach in the return statement unnecessary, since x already has requires_grad set to False, so all computations using x should already be detached from the graph? Or do the computations after the torch.no_grad wrapper somehow end up on the graph again, so that we need to detach them once more at the end (in which case it seems to me that no_grad would be unnecessary)?
Also, if I'm right, I suppose that instead of omitting detach one could also omit torch.no_grad and end up with the same functionality but worse performance, so torch.no_grad is to be preferred?
Whether it is redundant depends on the internals of actor and SomeDistribution. In general, there are three cases I can think of where detach would be necessary in this code, numbered below. Since you've already observed that x has requires_grad set to False, cases 2 and 3 don't apply to your specific case.
1. If SomeDistribution has internal parameters (leaf tensors with requires_grad=True), then dist.sample() may result in a computation graph connecting sample to those parameters. Without detaching, that computation graph, including those parameters, would be unnecessarily kept in memory after returning.
2. The default behavior within a torch.no_grad context is to return the result of tensor operations with requires_grad set to False. However, if actor(observation) for some reason explicitly sets requires_grad of its return value to True before returning, then a computation graph may be created that connects x to sample. Without detaching, that computation graph, including x, would be unnecessarily kept in memory after returning.
3. This one seems even more unlikely, but if actor(observation) actually just returns a reference to observation, and observation.requires_grad is True, then a computation graph all the way from observation to sample may be constructed during dist.sample().
As for the suggestion of removing the no_grad context in lieu of detach: this may result in the construction of a computation graph connecting observation (if it requires gradients) and/or the parameters of the distribution (if it has any) to x. The graph would be discarded after detach, but it takes time and memory to create, so there may be a performance penalty.
In conclusion, it's safer to do both no_grad and detach, though the necessity of either depends on the details of the distribution and actor.
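If you want to verify this in your own code, here is a minimal runnable sketch (a toy linear actor and a Categorical distribution of my choosing, not the ones from the question) that inspects whether a tensor is attached to the autograd graph:

import torch

actor = torch.nn.Linear(4, 2)        # toy stand-in for the real actor
observation = torch.randn(4)

with torch.no_grad():
    x = actor(observation)
    dist = torch.distributions.Categorical(logits=x)
    sample = dist.sample()

# Inside no_grad no graph is recorded, so both checks come back "detached":
print(x.requires_grad, x.grad_fn)    # False None
print(sample.requires_grad)          # False
out = sample.detach()                # harmless but redundant in this case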

How to find a function that fits a given data set?

The search algorithm is a breadth-first search. I'm not sure how to store terms from an equation in an open list. The function f(x) has the form ax^e1 + bx^e2 + cx^e3 + k, where a, b, c are coefficients and k is a constant. All exponents, coefficients, and constants are integers between 0 and 5.
Initial state: the problem-solving process should start from any single term among ax^e1, bx^e2, cx^e3, and k.
The algorithm gradually expands the number of terms at each level of the search.
I am not sure how to add the terms to an equation from the open queue. That is the question.
The general problem that you are dealing with belongs to the area of regression analysis, and several techniques are available to find a function that fits a given data set, including the popular least-squares method for finding the line of best fit (a brief starting point is the related page on Wikipedia, but if you want to go deeper into this topic, you should look at the research papers out there).
If you want to stick with the breadth-first search algorithm, although this kind of approach is not common for such a problem, you first need to define all the elements of a search problem, namely (for more information, see Chapter 3 of Russell and Norvig, Artificial Intelligence: A Modern Approach):
Initial state: Some initial values for the different terms.
Actions: in your case, a change to one or more of the terms. Note that you should discretize the changes in the values.
Transition function: function that determines the new states given a state and an action.
Goal test: a check to recognize whether a state is a goal state or not, and so to terminate the search. There are different ways to define this test in a regression problem. One way is to set a threshold for the sum of the square errors.
Step cost: the cost of an action. In such an abstract problem, you can probably use the unweighted distance from the initial state on the search graph (i.e., each action costs 1).
Note that you should carefully think about these elements, as, for example, they determine how efficient your search would be or whether you will have cycles in the search graph.
After you have defined all of the elements of the search problem, you basically have to implement:
A node structure that contains information about the parent, the state, and the current cost;
A function to expand a given node, returning its successor nodes (according to the transition function, the actions, and the step cost);
The goal test;
The actual search algorithm. At the beginning, the queue contains only the node with the initial state; afterwards, it is updated with the successor nodes (see the sketch below).
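Here is a minimal runnable sketch of this idea. The state encoding and all names are my own assumptions: a state is the tuple of integer values fixed so far, and the goal test uses an SSE threshold, as suggested above:

from collections import deque

def f(params, x):
    a, e1, b, e2, c, e3, k = params
    return a * x**e1 + b * x**e2 + c * x**e3 + k

def sse(params, data):
    return sum((f(params, x) - y) ** 2 for x, y in data)

def bfs_fit(data, threshold=0, domain=range(6)):
    frontier = deque([()])            # the open list, seeded with the empty state
    while frontier:
        state = frontier.popleft()
        if len(state) == 7:           # all of a, e1, b, e2, c, e3, k are fixed
            if sse(state, data) <= threshold:
                return state          # goal test passed
            continue
        for v in domain:              # expand: fix the next value to 0..5
            frontier.append(state + (v,))
    return None

# Hypothetical usage: data generated from 2x^2 + 3
data = [(x, 2 * x**2 + 3) for x in range(-3, 4)]
print(bfs_fit(data))                  # e.g. (0, 0, 0, 0, 2, 2, 3)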

ELKI hierarchical clustering - "mrg_" Cluster object

I'm using ELKI's SimplifiedHierarchyExtraction with AnderbergHierarchicalClustering, LatLngDistanceFunction, and minClSize = 100.
I saw that besides the "clu_" clusters there are also 2-3 "mrg_" clusters which have some DBIDs, but fewer of them than minClSize.
My question is: what is the best way to handle these "mrg_" clusters?
Passing their DBIDs to one of their "clu_" children?
Taking them as a cluster although they are under the minClSize?
Simply ignoring them?
This is a hierarchical result.
You need to include all child clusters in a cluster.
So the mrg_ cluster has some (potentially zero) new objects, plus all the objects in its child clusters. In particular, it can have more than one child cluster (that is why it is called a merge).
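ELKI itself is Java, but the idea can be sketched language-agnostically. Here it is in Python with hypothetical attribute names (own_ids for the cluster's directly assigned DBIDs, children for its child clusters; these are not ELKI API names):

def collect_members(cluster):
    # A merge cluster's full membership is its own DBIDs plus the
    # recursively collected DBIDs of all of its child clusters.
    members = set(cluster.own_ids)
    for child in cluster.children:
        members |= collect_members(child)
    return members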

Memory allocation for a tree and its subtree

This is on a Linux system.
If I have two binary trees, tree A has millions of nodes, while tree B has only a few hundred nodes.
I want to check if B is a subtree of A.
One solution I am thinking is, say, A uses 50Mb of the memory, and the addresses are contiguous, while B uses 1Kb. If B is part of A, the addresses of B would be a subset of A's addresses (I guess?).
So can I use tree A’s memory address range and B’s memory address range to determine if B is a subtree of A?
UPDATE:
I think that if we are using static memory allocation, and there is a node in A at the same address as the root of B, then when we find that node we can probably determine that B is a subtree of A.
You cannot use the memory addresses of A and B to check whether B is a subtree of A: allocators give no guarantee about how node addresses relate to tree structure, so A's nodes need not occupy a contiguous range, and B's nodes need not lie inside it.
An alternative is to generate a hash of the B tree. Then do a depth first traversal of A, generating the hash of the subtrees of A. If the hash of a subtree of A matches the hash of B, then verify that B matches that particular A subtree.
See this for generating a hash from a tree.
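A minimal sketch of that approach (all helper names are my own; a Merkle-style hash is computed bottom-up, with an explicit equality check to guard against hash collisions; for millions of nodes you would want an iterative traversal instead of recursion):

class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def index_hashes(root, table):
    # Post-order: a subtree's hash combines the node value with the
    # hashes of its children, and is recorded in table for lookup.
    if root is None:
        return hash(('nil',))
    h = hash((root.val,
              index_hashes(root.left, table),
              index_hashes(root.right, table)))
    table.setdefault(h, []).append(root)
    return h

def same_tree(a, b):
    # Verification step, since different trees can share a hash.
    if a is None or b is None:
        return a is b
    return a.val == b.val and same_tree(a.left, b.left) and same_tree(a.right, b.right)

def is_subtree(a, b):
    table = {}
    index_hashes(a, table)
    hb = index_hashes(b, {})
    return any(same_tree(cand, b) for cand in table.get(hb, []))

# Hypothetical usage
a = Node(1, Node(2, Node(4)), Node(3))
print(is_subtree(a, Node(2, Node(4))))   # True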

Determining if an Oz variable is bound?

Is there a safe way to ask whether a single-assignment variable in Oz is bound or not?
Using an unassigned dataflow variable in a way that requires its value will cause the program to wait until a value is assigned. In a sequential environment, this means the program hangs. Assigning a different value to an already-bound variable will cause the program to fail. So both ways "tell" me whether the variable was bound, but not in a safe way.
I'm looking for some function "Bound" where
local X Y=1 Xbound Ybound in
   Xbound={Bound? X}
   Ybound={Bound? Y}
end
gives false and true for Xbound and Ybound respectively.
My use case involves processing a list where values are added incrementally, with the last value always being unbound. I want to use the last bound item (the one before the unbound one), and I'm trying to work within the Oz paradigm with the fewest concepts added (so no mutable variables or exceptions).
You can check whether a variable is bound with the function IsDet.
See here: http://mozart.github.io/mozart-v1/doc-1.4.0/base/node4.html (also works in Mozart 1.3.0)
A word of caution: if you are using multiple threads, this opens the door for race conditions.
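For example, a minimal sketch for the Mozart interactive environment (Show and IsDet are standard library calls; the expected output is noted in the comments):

declare X Y
Y = 1
{Show {IsDet X}}  % false: X is still unbound
{Show {IsDet Y}}  % true: Y is bound to 1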
