Explain output of lowerquality/gentle - machine-learning

I used lowerquality/gentle (Gentle) to get phonemes for a lip-reading AI project.
Can anyone explain what "start offset" and "end offset" are?

Related

My two-tailed significance is not showing in the paired-samples t-test result in SPSS

I am trying to do a paired-samples t-test analysis in SPSS, but the column that should hold the two-tailed significance is split in two: "1 sided p" and "2 sided p". I do not know how to interpret this result. Please help me out.
I want to either get the result I am looking for, which is the two-tailed significance, or understand how to interpret the results I am getting, which are "1 sided p" and "2 sided p". I'm not permitted to add images yet, so here is a link to the report:
C:\Users\User\Documents\paired sample t test stack.png
Thank you in advance.
I tried running the test about four times with different variables, and tried clicking on other options before running the analysis, but the result is the same.
Unfortunately I don't have SPSS anymore and cannot see your link, but I found a YouTube video that shows the output of a paired-samples t-test in SPSS. Here is what it shows; I have highlighted what I suspect your interpretation issue is:
Basically, SPSS gives you the results of both a one-tailed and a two-tailed test by default, without saying which one is "correct" ("one-sided" and "two-sided" mean one-tailed and two-tailed, by the way). If you are interested in testing whether there is a significant difference in either direction (two-tailed), use only the two-tailed p-value. So in your case, just ignore the "one-sided" p-value and use the "two-sided" p-value instead.
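To see how the two columns relate, here is the standard paired t-test arithmetic. This is a sketch, assuming SPSS computes its one-sided value in the direction of the observed mean difference; with differences $d_i = x_i - y_i$ for $n$ pairs, mean $\bar d$ and standard deviation $s_d$:

$$
t = \frac{\bar d}{s_d / \sqrt{n}}, \qquad \nu = n - 1
$$

$$
p_{\text{two-sided}} = 2\,P\left(T_\nu \ge |t|\right), \qquad
p_{\text{one-sided}} = P\left(T_\nu \ge |t|\right) = \tfrac{1}{2}\,p_{\text{two-sided}}
$$

So the one-sided value is simply half the two-sided one. For example, a two-sided p of 0.08 gives a one-sided p of 0.04, which is why reading the wrong column can make a non-significant result (at $\alpha = 0.05$) look significant.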

Compare speech and train a model with limited data

I'm trying to do a small machine learning project, but I don't know much about the field.
Let's say I have 50 people saying the sentence "Hi, how are you".
Then I want to train a model that checks what I say:
"Hi, how are you" => Good
"Hi, bow are you" => Wrong
Note that I don't care about the meaning of the sentence or whether the
words are correct. I just want to check that what I'm saying is the
same as what the other 50 people said.
Note also that the spoken language I'm going to use isn't English.
What's the easiest way to achieve this?
Use speech-to-text and then just compare (I do have the text as well). But since the sentences are not in English, open-source speech-to-text might not be very good.
Use something like Kaldi to train the model?
Any other way?
If it is just one sentence, I guess a small number of recordings will be sufficient for your project. You can record different people saying "hi how are you" and prepare your data according to this tutorial:
kaldi-asr data prep tutorial
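The data preparation step boils down to a few plain-text files in a data directory. Below is a minimal sketch that builds Kaldi's `wav.scp`, `text`, `utt2spk` and `spk2utt` files from a list of recordings; the speaker IDs, paths and helper names are hypothetical, and it does not cover the dictionary (lexicon) files, which the tutorial also describes. Transcripts should use the words that appear in your lexicon.

```python
import os
from typing import Dict, List, Tuple

# (speaker_id, wav_path, transcript); IDs and paths are hypothetical examples
Recording = Tuple[str, str, str]


def kaldi_data_files(recordings: List[Recording]) -> Dict[str, List[str]]:
    """Build the lines of Kaldi's wav.scp, text, utt2spk and spk2utt files."""
    rows = []
    for speaker, wav_path, transcript in recordings:
        # Prefix utterance IDs with the speaker ID so that sorting by utterance
        # also groups utterances by speaker, which Kaldi expects.
        stem = os.path.splitext(os.path.basename(wav_path))[0]
        rows.append((f"{speaker}-{stem}", speaker, wav_path, transcript))
    # Kaldi wants these files sorted in C-locale order; Python's default
    # string sort matches that for plain ASCII IDs.
    rows.sort()
    by_speaker: Dict[str, List[str]] = {}
    for utt, speaker, _, _ in rows:
        by_speaker.setdefault(speaker, []).append(utt)
    return {
        # Without a segments file, the recording ID equals the utterance ID.
        "wav.scp": [f"{utt} {path}" for utt, _, path, _ in rows],
        "text": [f"{utt} {words}" for utt, _, _, words in rows],
        "utt2spk": [f"{utt} {spk}" for utt, spk, _, _ in rows],
        "spk2utt": [f"{spk} {' '.join(utts)}" for spk, utts in sorted(by_speaker.items())],
    }


def write_kaldi_data_dir(data_dir: str, recordings: List[Recording]) -> None:
    """Write the files above into data_dir (e.g. a hypothetical data/train)."""
    os.makedirs(data_dir, exist_ok=True)
    for name, lines in kaldi_data_files(recordings).items():
        with open(os.path.join(data_dir, name), "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")
```

Kaldi also ships `utils/utt2spk_to_spk2utt.pl` to derive `spk2utt` from `utt2spk`; generating it here just keeps the sketch self-contained.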
Then, once your data and dictionary are ready, you can use a Kaldi recipe to do the rest of the work for you. After training, test your model by decoding some new audio and check whether it has a low WER (word error rate). If your WER is low enough, you can use this model for your project.
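Kaldi's scoring scripts report WER for you; the sketch below only illustrates what the number means and how it could drive the Good/Wrong decision from the question. WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The `is_match` helper and its zero-tolerance default are assumptions for illustration.

```python
import re
from typing import List


def _words(text: str) -> List[str]:
    # Lowercase and drop punctuation; \w is Unicode-aware, so this also
    # works for non-English words.
    return re.findall(r"\w+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = _words(reference), _words(hypothesis)
    # d[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


def is_match(reference: str, hypothesis: str, max_wer: float = 0.0) -> bool:
    """Hypothetical Good/Wrong rule: accept only if WER is within max_wer."""
    return word_error_rate(reference, hypothesis) <= max_wer
```

With the question's example, "Hi, bow are you" against the reference "Hi, how are you" has one substitution out of four words, so its WER is 0.25 and `is_match` returns `False`.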

Deep Learning Book: What's the meaning of Pa in this section?

I'm reading the Deep Learning Book and am puzzled by this "undefined identifier" (the Pa in the image, line 4). It appears on page 208. Can you tell me what Pa() means? Even just a hint would help, so that I can search for it on Google. Thanks a lot!
Link to original image (I'm not allowed to post images directly)
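It means "parents": Pa(u) is the set of parent nodes of u in the computational graph. The feed-forward computation needs the values of a node's parents (the previous nodes) before it can compute that node.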
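In the book's notation, each node u^(i) is computed by applying its operation to the values of Pa(u^(i)). Below is a minimal sketch of that forward pass, not the book's code: it assumes nodes are numbered so that every parent has a smaller index than its child, and the example graph for y = tanh(x1 * x2 + x1) is made up for illustration.

```python
import math
from typing import Callable, Dict, List


def forward(
    x: List[float],
    parents: Dict[int, List[int]],
    ops: Dict[int, Callable[[List[float]], float]],
) -> Dict[int, float]:
    """Evaluate a computational graph.

    Nodes 1..len(x) are the inputs; every later node i is computed as
    ops[i](values of Pa(i)). Nodes must be numbered so that parents come
    before their children.
    """
    u: Dict[int, float] = {i: v for i, v in enumerate(x, start=1)}
    for i in sorted(parents):
        if any(j >= i for j in parents[i]):
            raise ValueError(f"node {i} has a parent that is not computed yet")
        # Gather the values of Pa(u_i); they already exist because
        # parents have smaller indices and were computed first.
        args = [u[j] for j in parents[i]]
        u[i] = ops[i](args)
    return u


# Hypothetical graph for y = tanh(x1 * x2 + x1), with inputs u_1 = x1, u_2 = x2
PARENTS = {3: [1, 2], 4: [3, 1], 5: [4]}
OPS = {
    3: lambda a: a[0] * a[1],      # u_3 = u_1 * u_2
    4: lambda a: a[0] + a[1],      # u_4 = u_3 + u_1
    5: lambda a: math.tanh(a[0]),  # u_5 = tanh(u_4)
}
```

For inputs [2.0, 3.0], node 3 needs both inputs (its parents), node 4 needs node 3 and input 1, and node 5 needs node 4, giving u_3 = 6.0, u_4 = 8.0 and u_5 = tanh(8.0).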

Parsing a phrase

I am trying to make an algorithm to parse (I don't know if this is the correct word) a question and get the correct answer to it.
Example
If someone asks "What is the Sun?", the correct answer would be "Is a Star".
This would be obtained from a list of phrases such as this:
"Is a Star"
"Is hot and bright"
"I don't know"
etc.
Now, I would like to know where I can get information about this.
I think the main problem here is how to make the program understand that "sun" is a star, and how to get the most accurate answer about it, because "Is hot and bright" is also a valid answer.
Thanks
This is a problem from the machine learning branch of artificial intelligence.
You can't just parse a few phrases if you want to write a good algorithm; it is not as simple as it seems.
What you are describing is essentially your own application like http://www.cleverbot.com.
I think you need to read and learn more about machine learning.

Finding meaningful sub-sentences from a sentence

Is there a way to find all the sub-sentences of a sentence that are still meaningful and contain at least one subject, verb, and predicate/object?
For example, if we have a sentence like "I am going to do a seminar on NLP at SXSW in Austin next month". We can extract the following meaningful sub-sentences from this sentence: "I am going to do a seminar", "I am going to do a seminar on NLP", "I am going to do a seminar on NLP at SXSW", "I am going to do a seminar at SXSW", "I am going to do a seminar in Austin", "I am going to do a seminar on NLP next month", etc.
Please note that there are no deduced sentences here (e.g. "There will be an NLP seminar at SXSW next month": although this is true, we don't need it as part of this problem). All generated sentences are strictly part of the given sentence.
How can we approach solving this problem? I was thinking of creating annotated training data that has a set of legal sub-sentences for each sentence in the training set, and then writing some supervised learning algorithm(s) to generate a model.
I am quite new to NLP and Machine Learning, so it would be great if you guys could suggest some ways to solve this problem.
You can use the dependency parser provided by Stanford CoreNLP.
The collapsed dependency output for your sentence looks like this:
nsubj(going-3, I-1)
xsubj(do-5, I-1)
aux(going-3, am-2)
root(ROOT-0, going-3)
aux(do-5, to-4)
xcomp(going-3, do-5)
det(seminar-7, a-6)
dobj(do-5, seminar-7)
prep_on(seminar-7, NLP-9)
prep_at(do-5, SXSW-11)
prep_in(do-5, Austin-13)
amod(month-15, next-14)
tmod(do-5, month-15)
The last five dependencies in this output are optional: you can remove one or more parts that are not essential to your sentence.
Most of these optional parts are prepositional or modifier relations, e.g. prep_in, prep_on, advmod, tmod, etc. See the Stanford typed dependencies manual.
For example, if you remove the modifiers (amod and tmod) from the output, you get:
I am going to do a seminar on NLP at SXSW in Austin.
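The removal idea can be automated by dropping every subset of optional dependencies. Below is a minimal sketch using the dependency list above as hard-coded input (it does not call CoreNLP). Assumptions: prep_* and tmod edges are treated as removable, removing an edge drops the dependent's whole subtree (so amod(month, next) disappears together with tmod rather than independently, which would leave "in Austin month"), and for collapsed prep_* edges the word right before the phrase is taken to be the folded preposition.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

SENTENCE = "I am going to do a seminar on NLP at SXSW in Austin next month"

# Collapsed dependencies from the output above as (relation, head, dependent),
# using 1-based token indices; 0 is ROOT.
DEPS: List[Tuple[str, int, int]] = [
    ("nsubj", 3, 1), ("xsubj", 5, 1), ("aux", 3, 2), ("root", 0, 3),
    ("aux", 5, 4), ("xcomp", 3, 5), ("det", 7, 6), ("dobj", 5, 7),
    ("prep_on", 7, 9), ("prep_at", 5, 11), ("prep_in", 5, 13),
    ("amod", 15, 14), ("tmod", 5, 15),
]


def is_optional(rel: str) -> bool:
    # Assumption: prepositional phrases and temporal modifiers are removable.
    return rel.startswith("prep_") or rel == "tmod"


def subtree(node: int, children: Dict[int, List[int]]) -> Set[int]:
    nodes = {node}
    for child in children.get(node, []):
        nodes |= subtree(child, children)
    return nodes


def sub_sentences(sentence: str, deps: List[Tuple[str, int, int]]) -> List[str]:
    words = sentence.split()
    children: Dict[int, List[int]] = {}
    for rel, head, dep in deps:
        if rel != "xsubj":  # xsubj gives "I" a second head; skip it to keep a tree
            children.setdefault(head, []).append(dep)
    # Token indices that removing each optional edge would drop
    removable: List[Set[int]] = []
    for rel, _, dep in deps:
        if is_optional(rel):
            span = subtree(dep, children)
            if rel.startswith("prep_"):
                # Collapsed deps fold the preposition into the label, so also
                # drop the word right before the phrase (e.g. "on" before "NLP").
                span.add(min(span) - 1)
            removable.append(span)
    results: List[str] = []
    for k in range(len(removable) + 1):
        for chosen in combinations(removable, k):
            dropped = set().union(*chosen)
            kept = [w for i, w in enumerate(words, start=1) if i not in dropped]
            results.append(" ".join(kept))
    return results
```

With four removable edges this yields 16 candidates, from the full sentence down to "I am going to do a seminar", including the examples from the question such as "I am going to do a seminar on NLP next month".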
There's a paper titled "Using Discourse Commitments to Recognize Textual Entailment" by Hickl et al. that discusses the extraction of discourse commitments (sub-sentences). The paper describes their algorithm, which at some level operates on rules. They used it for RTE (recognizing textual entailment), and there may be some minimal deduction in the output. Text simplification may be a related area to look at.
The following paper, http://www.mpi-inf.mpg.de/~rgemulla/publications/delcorro13clausie.pdf, processes the dependencies from the Stanford parser and constructs simple clauses (text simplification).
See the online demo: https://d5gate.ag5.mpi-sb.mpg.de/ClausIEGate/ClausIEGate
One approach would be to use a parser, such as one based on a PCFG. Trying to just train a model to detect "sub-sentences" is likely to suffer from data sparsity. Also, I doubt that you could write down a really clean and unambiguous definition of a sub-sentence, and if you can't define it, you can't get annotators to annotate it.

Resources