I created a dictionary for two languages (English and Persian) in a single file, like this:
بگو B E G U
خزنده KH A Z A N D E
قدت GH A D E T
چنده CH A N D E
قد GH A D
من M A N
شب SH A B
hi H AA Y
hello H E L L O
how H O V
are AA R
you Y U
what V AA T
is I Z
your Y O R
name N E Y M
old O L D
where V E R
from F E R AA M
I used http://www.speech.cs.cmu.edu/tools/lmtool-new.html to build the language model. Then I tried to train an acoustic model with that language model and test it.
It works well for Persian speech but not for English words. After some trial and error I found that the problem is my phoneset. I used my own phoneset, as you can see above, but it seems pocketsphinx doesn't accept this phoneset for English words and only accepts its own phoneset for English!
So I want to know: did I identify the problem correctly? Should I use the pocketsphinx phoneset for my Persian words as well? Where can I find its complete phoneset, and a guide on how to use it for Persian words?
You have to build a new acoustic model with a joined phoneset, that is, one phoneset that covers both languages.
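As a starting point for that, it can help to list every phone symbol the combined dictionary actually uses. The following is only a minimal Python sketch: the function name phones_in_dictionary is hypothetical, and it assumes each dictionary line is a word followed by whitespace-separated phones, as in the question.

```python
def phones_in_dictionary(lines):
    """Collect the set of phone symbols used by a pronunciation dictionary.

    Assumes each line is: WORD PHONE PHONE ... (whitespace separated).
    """
    phones = set()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines or words without a pronunciation
        phones.update(parts[1:])
    return phones


# A few entries taken from the dictionary in the question.
sample = ["بگو B E G U", "hi H AA Y", "hello H E L L O"]
joined = sorted(phones_in_dictionary(sample))
print(joined)
```

The resulting list is the set of symbols a jointly trained model would need to cover for these entries.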
I am wondering what is the best way to support different versions of a language in my grammar.
I am working on modifying an existing grammar for a language and there is a new version of the language, introducing new keywords and additional syntax I should be able to parse. However, existing codebase written in the language can already use these new keywords as identifiers for example, so I have to make this extension optional.
So my question is: what is the preferred way to write conditional lexer and parser rules, based on a boolean value? Semantic predicates came to my mind, but I am relatively new to antlr and I'm not sure if it is a good idea to use them for such a purpose.
I had very good success with semantic predicates in the MySQL grammar, to support various MySQL versions. This includes new features, removed features and features that were valid only for a certain MySQL version range. Additionally, you can use the semantic predicates to tell the user in which version a specific syntax would be valid. But you have to parse the predicates yourself for that.
As an example, in this line a new import statement is conditionally added:
simpleStatement:
    // DDL
    ...
    | {serverVersion >= 80000}? importStatement
I have a field serverVersion in my common recognizer class from which both generated lexer and parser classes derive. This field is set with a valid version, right before the parsing process is triggered.
You can also guard keywords in the lexer with this approach, as shown in this and the surrounding lines of the MySQL lexer:
MASTER_SYMBOL: M A S T E R;
MASTER_TLS_VERSION_SYMBOL: M A S T E R '_' T L S '_' V E R S I O N {serverVersion >= 50713}?;
MASTER_USER_SYMBOL: M A S T E R '_' U S E R;
MASTER_HEARTBEAT_PERIOD_SYMBOL: M A S T E R '_' H E A R T B E A T '_' P E R I O D;
MATCH_SYMBOL: M A T C H; // SQL-2003-R
MAX_CONNECTIONS_PER_HOUR_SYMBOL: M A X '_' C O N N E C T I O N S '_' P E R '_' H O U R;
MAX_QUERIES_PER_HOUR_SYMBOL: M A X '_' Q U E R I E S '_' P E R '_' H O U R;
MAX_ROWS_SYMBOL: M A X '_' R O W S;
MAX_SIZE_SYMBOL: M A X '_' S I Z E;
MAX_STATEMENT_TIME_SYMBOL:
    M A X '_' S T A T E M E N T '_' T I M E {50704 < serverVersion && serverVersion < 50708}?
;
MAX_SYMBOL: M A X { setType(determineFunction(MAX_SYMBOL)); }; // SQL-2003-N
MAX_UPDATES_PER_HOUR_SYMBOL: M A X '_' U P D A T E S '_' P E R '_' H O U R;
MAX_USER_CONNECTIONS_SYMBOL: M A X '_' U S E R '_' C O N N E C T I O N S;
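To show the gating logic on its own, outside of ANTLR, here is a minimal Python sketch. It is not how the generated lexer works internally. KEYWORDS and classify are hypothetical names, the version bounds are copied from the rules above, and it assumes that a word whose guard fails is simply treated as an ordinary identifier.

```python
# Each keyword maps to a guard that decides, from the server version,
# whether the word is a keyword at all (hypothetical table for illustration).
KEYWORDS = {
    "MASTER": lambda v: True,
    "MASTER_TLS_VERSION": lambda v: v >= 50713,
    "MAX_STATEMENT_TIME": lambda v: 50704 < v < 50708,
}


def classify(word, server_version):
    """Return a token name for word, given the configured server version."""
    guard = KEYWORDS.get(word.upper())
    if guard is not None and guard(server_version):
        return word.upper() + "_SYMBOL"
    # Guard failed or unknown word: assume it falls back to an identifier.
    return "IDENTIFIER"


print(classify("master_tls_version", 50712))
print(classify("master_tls_version", 50713))
```

With this shape, old code that uses a newer keyword as an identifier keeps working as long as the configured version is older than the keyword.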
There are two approaches you can take:
1. Use something like semantic predicates to decide which parts of the input are parsed with the new grammar and which with the old one. Consider this only if the additional syntax is not valid under the earlier version of the grammar and the interpretation of previously valid expressions does not change.
Example: extending an integer calculator to support floats.
1.0 is invalid under the earlier grammar, and the new grammar does not change the semantics of calculations on 1 (an integer).
This condition is harder to meet than it may seem: the details can be quite nuanced, particularly if the grammar or its new versions are complex.
2. Keep two versions of the lexer/parser and switch between them independently, as @lex-li suggests. This is the safe path, because you never have to deal with the new syntax changing the semantics of old expressions.
I have written a parser in Haskell, which parses formulas in the form of string inputs and produces a Haskell data type defined by the BNF below.
formula ::= true
| false
| var
| formula & formula
| ∀ var . formula
| (formula)
var ::= letter { letter | digit }*
Now I would like to create an instance of Show so that I can nicely print the formulas defined by my types (I don't want to use deriving (Show)). My question is: how do I define my function so that it can tell when parentheses are necessary? I want neither too many nor too few parentheses.
For example, given the formula ∀ X . (X & Y) & (∀ Y . Y) & false which, when parsed, produces the data structure
And (And (Forall "X" (And (Var "X") (Var "Y"))) (Forall "Y" (Var "Y"))) False
we have
Too few parentheses: ∀ X . X & Y & ∀ Y . Y & false
Too many parentheses: (∀ X . (((X) & (Y)))) & (∀ Y . (Y)) & (false)
Just right: ∀ X . (X & Y) & (∀ Y . Y) & false
Is there a way to determine how many parentheses are necessary so that the semantics is never ambiguous? I appreciate any feedback.
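Untested sketch:
instance Show Formula where
  -- True and False here are Formula's own constructors, as in the question
  showsPrec _ True = showString "True"
  showsPrec _ False = showString "False"
  showsPrec p (And f1 f2) = showParen (p > 5) $
    showsPrec 5 f1 . showString " & " . showsPrec 5 f2
  showsPrec p (Forall x f) = showParen (p > 8) $
    showString ("forall " ++ x ++ " . ") . showsPrec 8 f
  ...
(Each clause has to return a ShowS, i.e. a function String -> String, so the pieces are built with showString and composed with (.) rather than returned as bare strings.)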
Above, the integer p represents the precedence of the context where we are showing the current formula. For example, if we are showing f inside f & ... then p will have the precedence level of &.
If we need to print a symbol in a context which has higher precedence, we need to add parentheses. E.g. if f is a | b we can't write a | b & ..., otherwise it is interpreted as a | (b & ...). We need to put parentheses around a | b. This is done by the showParen (p > ...).
When we recurse, we pass the precedence level of the symbol at hand to the subterms.
Above, I chose the precedence levels arbitrarily. Adjust them to your taste. You should also check that the levels you choose play well with the standard libraries. E.g. printing Just someFormula should not generate things like Just a & b, but should add parentheses.
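The same idea can be tried out quickly in another language. Below is a minimal Python sketch of precedence-driven parenthesisation. The class names, the show function and the Const wrapper are my own choices for illustration, and the levels 5 and 8 are just the arbitrary ones used above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: bool


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Forall:
    var: str
    body: object


AND_PREC = 5      # arbitrary levels, same as in the answer
FORALL_PREC = 8


def paren(needed, text):
    """Wrap text in parentheses only when the context requires it."""
    return f"({text})" if needed else text


def show(f, prec=0):
    """Render f; prec is the precedence of the surrounding context."""
    if isinstance(f, Const):
        return "true" if f.value else "false"
    if isinstance(f, Var):
        return f.name
    if isinstance(f, And):
        text = f"{show(f.left, AND_PREC)} & {show(f.right, AND_PREC)}"
        return paren(prec > AND_PREC, text)
    if isinstance(f, Forall):
        text = f"forall {f.var} . {show(f.body, FORALL_PREC)}"
        return paren(prec > FORALL_PREC, text)
    raise TypeError(f"not a formula: {f!r}")


formula = And(
    And(Forall("X", And(Var("X"), Var("Y"))), Forall("Y", Var("Y"))),
    Const(False),
)
print(show(formula))  # forall X . (X & Y) & forall Y . Y & false
```

With these levels the quantifier binds tighter than &, so the second forall needs no parentheses. If your parser treats the quantifier as extending as far right as possible, give Forall a level lower than And instead.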
I have been trying to solve these two questions, but haven't had much luck.
Question 1: Show that the decomposition rule:
A → BC implies A → B and A → C,
is a sound rule, namely, that the functional dependencies A → B and A → C
are logically implied by the functional dependency A → BC.
Question 2: Let F be the following collection of functional dependencies
for relation schema R = (A, B, C, D, E):
D → A
BA → C
C → E
E → DB .
a) Compute the closure F+ of F.
b) What are the candidate keys for R? List all of them.
c) List the dependencies in the canonical cover of the above set of
dependencies F (in other words, compute Fc, as we have seen in class).
Any input will be helpful.
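For parts a) and b), the usual tool is the attribute closure: start from a set of attributes and keep applying dependencies until nothing new is added. Here is a minimal Python sketch under the assumption that attributes are single letters and each dependency is a (left, right) pair of strings. The names closure and candidate_keys are mine, and the brute-force key search is only practical for small schemas like this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys


# F from Question 2: D -> A, BA -> C, C -> E, E -> DB
F = [("D", "A"), ("BA", "C"), ("C", "E"), ("E", "DB")]
keys = candidate_keys("ABCDE", F)
print(sorted("".join(sorted(k)) for k in keys))  # ['AB', 'BD', 'C', 'E']
```

For example, the closure of D is only {A, D}, so D alone is not a key, while BD reaches A, then C, then E.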
I am confused about the dependency-preserving property of database relations (tables). Do we only have to look at the initial FD set, or at something else? I tried to solve some problems on this subject. The questions before this one all supported my initial assumption: 'look at the given FD set; if you don't lose any of them in your new set of relations, the decomposition is dependency preserving'.
But when I come to this question I am confused.
Consider the relation R = (A B C D E F G H) and the following FD set:
FD1 E -> D
FD2 B, E -> C G
FD3 D, G -> E
FD4 C -> A B
FD5 E, G -> C
FD6 A, E -> B D
FD7 C, E, D -> G
FD8 A, G -> E
These are the given relations
R1 (E F G H)
R2 (A B E G)
R3 (C D E G)
R4 (A B C)
The answer key says that this decomposition is dependency preserving. By my reasoning we lose FD2, so it should not be dependency preserving.
I need an expert to clarify this concept for me.
This question was part of a homework assignment. I wasn't sure whether my reasoning was right when I did the homework.
In my answer I wrote:
This decomposition is not dependency preserving because in this decomposition we lose the FD DF --> BC.
My database teacher accepted this as a correct answer, but I wanted to clarify the subject here as well.
Ferda
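The decomposition is dependency preserving, because FD2 BE->CG can be recovered from dependencies that hold within single relations: BE->G holds in R2 and EG->C holds in R3.
So the closure of BE under the projected dependencies still contains C and G. An FD does not have to fit inside one relation; it only has to be implied by the FDs that do.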
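This check can be automated with the standard textbook test: for an FD X -> Y, start with X and repeatedly add whatever the closure yields inside each relation, then see whether Y was reached. The sketch below is a minimal Python version, assuming single-letter attributes. The function names are mine, and the FDs and relations are the ones from the question.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(fd, relations, fds):
    """True if fd follows from the FDs projected onto the relations."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for rel in relations:
            rel = set(rel)
            # What can be derived using only attributes of this relation.
            gained = closure(result & rel, fds) & rel
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


fds = [
    ("E", "D"), ("BE", "CG"), ("DG", "E"), ("C", "AB"),
    ("EG", "C"), ("AE", "BD"), ("CDE", "G"), ("AG", "E"),
]
relations = ["EFGH", "ABEG", "CDEG", "ABC"]

print(is_preserved(("BE", "CG"), relations, fds))          # True
print(all(is_preserved(fd, relations, fds) for fd in fds))  # True
```

For FD2, the run goes through R2 first (BE gives A, B, E, G there) and then R3 (EG gives C and D), which matches the argument above.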
What is the easiest way for Example1 to be converted to Example2 (I would be doing this with much longer lists)? Column C and D shall be associated to Col B for the output of Example2. This is not just to make Col B replicate Col A, although that is part of the solution. Thank you in advance!
Example1:
Col A Col B Col C Col D
a e d c
l l o a
e x g t
x a s s
Example2:
Col A Col B Col C Col D
a a s s
l l o a
e e d c
x x g t
It is not totally clear what you want to achieve and what the data qualities are, so a few assumptions:
all items in Col A are also in Col B
items in Col A are unique
Consider the following screenshot. Column A has been copied into column F. The formula in G1 is
=INDEX(B$1:B$4,MATCH($F1,$B$1:$B$4,0))
Copy the formula across to I1, then copy G1:I1 down.
If that does not do what you need, please edit your question, add a better data sample and more explanation.
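If you ever need the same lookup outside of Excel, the logic is small. Here is a minimal Python sketch under the same two assumptions as above (every Col A item appears in Col B, and items are unique). The name align_rows is hypothetical, and the data is the sample from the question.

```python
def align_rows(col_a, rows):
    """Reorder (B, C, D) rows so that each row's B value follows col_a."""
    by_key = {}
    for row in rows:
        # Like MATCH(..., 0), keep the first row found for a given B value.
        by_key.setdefault(row[0], row)
    return [(key, *by_key[key]) for key in col_a]


col_a = ["a", "l", "e", "x"]
rows = [("e", "d", "c"), ("l", "o", "a"), ("x", "g", "t"), ("a", "s", "s")]

for line in align_rows(col_a, rows):
    print(*line)
```

A Col A item that is missing from Col B raises a KeyError here, where the worksheet formula would show #N/A.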