Non-binary decision tree to binary decision tree (machine learning)

This is a homework question, so a yes/no plus a few comments would be appreciated!
Prove: an arbitrary (non-binary) decision tree can be converted to an equivalent binary decision tree.
My answer:
Every decision can be expressed using only binary decisions, hence so can the decision tree.
I don't know the formal proof. I could argue with entropy (gain, actually): for the binary node the gain would be E(S) - E(L) - E(R), while for the original multiway node it is E(S) - E(Y|X=t1) - E(Y|X=t2) - and so on.
But I don't know how to state it formally!

You can give a constructive proof of something like this, demonstrating how to convert an arbitrary decision tree into a binary decision tree.
Imagine that you are sitting at node A, and you have a choice of traversing to B, C, or D based on whether your example satisfies requirement B, C, or D. If this is a proper decision tree, B, C, and D are mutually exclusive and cover all cases.
A -> B
  -> C
  -> D
Since they're mutually exclusive, you could imagine splitting your tree into a binary decision: B or not B; on the not B branch, we know that either C or D has to be true, since B, C, and D were mutually exclusive and cover all cases. In other words:
A -> B
  -> ~B
     ---> C
     ---> D
Then you can copy whatever was going to follow B onto the branch that follows B, performing the same simplification recursively. The same goes for C and D.
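To make the construction concrete, here is a minimal Python sketch (the Node class and its field names are my own invention; any n-ary tree representation would do). It peels each n-ary split into a chain of binary yes/no tests, exactly as described above.

class Node:
    def __init__(self, label, children=None):
        self.label = label              # the test at this node (or the outcome, at a leaf)
        self.children = children or []  # list of (branch_condition, subtree) pairs

def to_binary(node):
    # Leaves have no children and are kept as-is.
    if not node.children:
        return node
    # Peel off the first branch: "condition holds" vs. "condition doesn't".
    (cond, subtree), rest = node.children[0], node.children[1:]
    yes = to_binary(subtree)
    if len(rest) == 1:
        # Only one alternative remains, so ~cond implies it directly.
        no = to_binary(rest[0][1])
    else:
        # Recurse on the remaining mutually exclusive branches.
        no = to_binary(Node(node.label, rest))
    return Node(cond, [("yes", yes), ("no", no)])

Each call replaces one k-way split with k-1 binary splits, so the result is an equivalent binary tree.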

Related

I'm trying to solve a DFA problem

I have to construct the union L1 ∪ L2 and the intersection L1 ∩ L2.
You can run through the formal Cartesian product machine construction to algorithmically derive automata for the intersection and union of L1 and L2. However, since these languages are so easy, it might be simpler to give the languages and just write down a DFA for each one.
L1 is the language of all strings of as and bs with at least one a. L2 is the language of all strings of as and bs with at least two bs.
To accept the intersection of L1 and L2, we need to see at least one a and two bs. Below, we have six states:
q0, the initial state, where we need one a and two bs
q1, where we still need two bs
q2, where we still need one b
q3, where we need no more (accepting state)
q4, where we still need one a and one b
q5, where we still need one a
---> q0 -a-> q1 -b-> q2 -b-> q3
        -b-> q4 -a-> q2
             -b-> q5 -a-> q3
(where transitions are missing, they are self loops)
Note that there are six states: this is the same as if we had done the Cartesian product machine construction on the original DFAs of two and three states, respectively.
For union, we can use the exact same DFA and change the set of accepting states to {q1, q3, q5}. This captures the fact that we now accept when either condition is true: q1 and q5 are the states where exactly one of the two conditions has become satisfied, and q3 is where both have.
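As a quick sanity check (the dictionary encoding below is my own, not part of the question), the DFA can be simulated directly:

delta = {
    ('q0', 'a'): 'q1', ('q0', 'b'): 'q4',
    ('q1', 'a'): 'q1', ('q1', 'b'): 'q2',
    ('q2', 'a'): 'q2', ('q2', 'b'): 'q3',
    ('q3', 'a'): 'q3', ('q3', 'b'): 'q3',
    ('q4', 'a'): 'q2', ('q4', 'b'): 'q5',
    ('q5', 'a'): 'q3', ('q5', 'b'): 'q5',
}

def accepts(s, accepting):
    state = 'q0'
    for ch in s:
        state = delta[(state, ch)]
    return state in accepting

# Intersection: accepting state {q3} (at least one a AND at least two bs).
assert accepts('abb', {'q3'}) and not accepts('bb', {'q3'})
# Union: same DFA, accepting states {q1, q3, q5}.
assert accepts('a', {'q1', 'q3', 'q5'}) and not accepts('b', {'q1', 'q3', 'q5'})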

Finding an equivalent LR grammar for the same number of "a" and "b" grammar?

I can't seem to find an equivalent LR grammar for:
S → aSbS | bSaS | ε
which I think recognizes strings with the same number of 'a's as 'b's.
What would be a workaround for this? Is it possible to find an LR grammar for this?
Thanks in advance!
EDIT:
I have found what I think is an equivalent grammar but I haven't been able to prove it.
I think I need to prove that the original grammar generates the language above, and then prove that the same language is generated by the following equivalent grammar. But I am not sure how to do either. How should I proceed?
S → aBS | bAS | ε
B → b | aBB
A → a | bAA
Thanks in advance...
PS: I have already proven that this new grammar is LL(1), SLR(1), LR(1) and LALR(1).
Unless a grammar is directly related to another grammar -- for example, through standard transformations such as normalization, null-production elimination, and so on -- proving that two grammars derive the same language is very difficult without knowing what the language is. It is usually easier to prove (independently) that each grammar derives the language.
The first grammar you provide:
S → aSbS | bSaS | ε
does in fact derive the language of all strings over the alphabet {a, b}* where the number of as is the same as the number of bs. We can prove that in two parts: first, that every sentence recognized by the grammar has that property, and second that every sentence which has that property can be derived by that grammar. Both proofs proceed by induction.
For the forward proof, we proceed by induction on the length of the derivation. Suppose we have some derivation S → α → β → … → ω where all the Greek letters represent sequences of non-terminals and terminals.
If the length of the derivation is exactly zero, so that it starts and ends with S, then there are no terminals in the derived form, so it's clear that it has the same number of as and bs. (Base step)
Now for the induction step. Suppose that every derivation of length i is known to end with a derived sentence which has the same number of as and bs. We want to prove from that premise that every derivation of length i+1 ends with a sentence which has the same number of as and bs. But that is also clear: each of the three possible production steps preserves the balance, since aSbS and bSaS each add exactly one a and one b, and ε adds neither.
Now, let's look at the opposite direction: every sentence with the same number of as and bs can be derived from that grammar. We'll do this by strong induction on the length of the string. Our induction premise is that for every j ≤ i, every sentence with exactly j as and j bs has a derivation from S; from this we show that every sentence with exactly i+1 as and i+1 bs has one as well. (Here we are only considering sentences consisting only of terminals.)
Consider such a sentence. It either starts with an a or a b. Suppose it starts with an a: then there is at least one b in the sentence such that the prefix ending with that b has the same number of each terminal. (Think of the string as a walk along a square grid: every a moves diagonally up and right one unit, and every b moves diagonally down and right. Since the endpoint is at exactly the same height as the starting point and there are no wormholes in the grid, once we ascend we must sooner or later descend back to the starting height, and that first happens at a prefix ending in b.) So the interior of that prefix (everything except the a at the beginning and the b at the end) is balanced, as is the remainder of the string. Both of those are shorter, so by the induction hypothesis they can be derived from S. Making those substitutions, we get aSbS, which can be derived from S. An identical argument applies to strings starting with b. Again, the base step is trivial.
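If you'd like empirical reassurance before writing the proof out, here is a brute-force check (not a proof; the code is just a sketch of mine): generate every terminal string the grammar derives up to a length bound, and compare against the set of balanced strings of those lengths.

from itertools import product

def derivable(max_len):
    # All terminal strings of length <= max_len derivable from S,
    # found by always expanding the leftmost S in a sentential form.
    results, queue, seen = set(), ['S'], set()
    while queue:
        form = queue.pop()
        if form in seen:
            continue
        seen.add(form)
        if 'S' not in form:
            results.add(form)
            continue
        i = form.index('S')
        for rhs in ('aSbS', 'bSaS', ''):
            new = form[:i] + rhs + form[i+1:]
            # Terminals never disappear, so overlong forms are dead ends.
            if len(new.replace('S', '')) <= max_len:
                queue.append(new)
    return results

balanced = {''.join(w) for n in range(7) for w in product('ab', repeat=n)
            if w.count('a') == w.count('b')}
assert derivable(6) == balanced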
So that's basically the proof procedure you'll need to adapt for your grammar.
Good luck.
By the way, this sort of question can also be posed on cs.stackexchange.com or math.stackexchange.com, where the MathJax is available. MathJax makes writing out mathematical proofs much less tedious, so you may well find that you'll get more readable answers there.

When does the hypothesis space contain the target concept

What does it mean when it is written that the hypothesis space contains the target concept?
If possible, please include an example.
TLDR: It means you can learn with zero error.
Here is an example of what it means. Suppose a concept: f(a,b,c,d) = a & b & (!c | !d) (the inputs are in the boolean domain).
In an ML task this concept is usually represented by data, so you are given a dataset:
a | b | c | d | f
--+---+---+---+---
T | T | T | T | F
T | T | T | F | T
T | T | F | T | T
... etc ...
And your hypothesis space is decision trees. In this case your hypothesis space contains the target concept, because a decision tree can represent this formula exactly (in more than one way, in fact).
It can be proven that any binary formula (concept) can be learned as a decision tree. Thus general binary formulas are a subset of decision trees. That means that when you know the concept is a binary formula (and you may not even know that), you will be able to learn it with a decision tree (given enough examples) with zero error.
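As a quick illustration (scikit-learn and the 0/1 encoding are my choice here, not something from the question), you can fit a decision tree on the full truth table of the concept above and observe zero error:

from itertools import product
from sklearn.tree import DecisionTreeClassifier

X = list(product([0, 1], repeat=4))                          # all 16 inputs
y = [int(a and b and (not c or not d)) for a, b, c, d in X]  # the target concept
clf = DecisionTreeClassifier().fit(X, y)
assert clf.score(X, y) == 1.0   # the hypothesis space contains the concept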
On the other hand, if you want to learn the example concept by monotone conjunctions, you can't do it, because binary formulas are not a subset of monotone conjunctions.
(By subsets, I mean in terms of the sets of representable concepts. From the subset relation, you can make statements about whether the hypothesis space contains the target concept.)
A monotone conjunction is a conjunction in which no variable is negated. If you take several of them and output true whenever any one of the conjunctions is true, you get monotone DNF, which is a subset of DNF where you cannot use negations.
Some concepts can be learned by monotone conjunctions, but you cannot learn a general binary formula concept that way. That means you will not be able to learn with zero error; general binary formulas are not a subset of monotone conjunctions.
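For contrast, here is a sketch of the classic elimination algorithm for learning a monotone conjunction (the function name and data are made up): start with all variables and drop every variable that is 0 in some positive example.

def learn_monotone_conjunction(examples, n_vars):
    # examples: list of (x, label), where x is a 0/1 tuple of length n_vars.
    kept = set(range(n_vars))
    for x, label in examples:
        if label == 1:
            # A variable that is 0 in a positive example can't be in the conjunction.
            kept -= {i for i in kept if x[i] == 0}
    return kept   # hypothesis: AND of x[i] for i in kept

data = [((1, 1, 0, 1), 1), ((1, 1, 1, 0), 1), ((0, 1, 1, 1), 0)]
print(learn_monotone_conjunction(data, 4))   # {0, 1}, i.e. a & b

No matter how many examples you feed it, this learner can only ever output a monotone conjunction, which is why it cannot reach zero error on a concept like the one above.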
Here is a nice PDF from Princeton on basics of ML: http://www.cs.princeton.edu/courses/archive/spr06/cos511/scribe_notes/0209.pdf

confusion about apprenticeship learning algorithm step

I've been following the paper here http://ai.stanford.edu/~ang/papers/icml04-apprentice.pdf but cannot figure out what operation the division symbol in section 3.1 indicates. All of the mu vectors have the same dimensionality; how are we supposed to perform division with them?
It looks like a typical division of numbers. You have something of the form
A^T B
-----
C^T D
where A, B, C and D are vectors, thus A^T B is a number (it is just a dot product) and so is C^T D.
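In code (a trivial numpy illustration with made-up vectors):

import numpy as np

A, B, C, D = (np.random.rand(5) for _ in range(4))
ratio = (A @ B) / (C @ D)   # (A^T B) / (C^T D): an ordinary scalar division
print(ratio)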

How to parse a context-sensitive grammar?

A CSG is similar to a CFG, but the symbol you reduce to can be multiple symbols.
So, can I just use a CFG parser to parse a CSG, allowing a reduction to produce multiple terminals or non-terminals?
Like
1. S → a b c
2. S → a S B c
3. c B → W B
4. W B → W X
5. W X → B X
6. B X → B c
7. b B → b b
When we meet W X, can we just reduce W X to W B?
When we meet W B, can we just reduce W B to c B?
So if a CSG parser can be based on a CFG parser, it's not hard to write, is that true?
But when I checked Wikipedia, it said that to parse a CSG we should use a linear bounded automaton.
What is a linear bounded automaton?
Context-sensitive grammars are non-deterministic, so you cannot assume that a reduction will take place just because the right-hand side of a production happens to be visible at some point in a derivation.
LBAs (linear-bounded automata) are also non-deterministic, so they are not really a practical algorithm. (You can simulate one with backtracking, but there is no convenient bound on the amount of time it might take to perform a parse.) The fact that they are acceptors for CSGs is interesting for parsing theory but not really for parsing practice.
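To make the backtracking idea concrete, here is a brute-force recognizer (my own encoding, a sketch rather than a practical parser) for the grammar in the question. Since every production there is noncontracting, no sentential form in a derivation of w can be longer than w, which bounds the search; that length bound is essentially what "linear bounded" refers to.

RULES = [('S', 'abc'), ('S', 'aSBc'),
         ('cB', 'WB'), ('WB', 'WX'),
         ('WX', 'BX'), ('BX', 'Bc'),
         ('bB', 'bb')]

def derives(w):
    frontier, seen = {'S'}, set()
    while frontier:
        form = frontier.pop()
        if form == w:
            return True
        if form in seen or len(form) > len(w):
            continue   # noncontracting: overlong forms are dead ends
        seen.add(form)
        for lhs, rhs in RULES:
            i = form.find(lhs)
            while i != -1:   # try the rule at every position it matches
                frontier.add(form[:i] + rhs + form[i + len(lhs):])
                i = form.find(lhs, i + 1)
    return False

print(derives('aabbcc'))   # True: this grammar generates a^n b^n c^n
print(derives('aabbc'))    # False

Note how the recognizer has to try every applicable rewrite at every position: that is the non-determinism at work, and it is why this approach has no convenient running-time bound.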
Just as with CFGs, there are different classes of CSGs. Some restricted subclasses of CSGs are easier to parse (CFGs are one subclass, for example), but I don't believe there has been much investigation into practical uses; in practice, CSGs are hard to write, and there is no obvious analog of a parse tree which can be constructed from a derivation.
For more reading, you could start with the wikipedia entry on LBAs and continue by following its references. Good luck.
