I have been trying to create a game using OpenEars. I am using the dynamic language model method, but the performance is not up to the mark: recognition accuracy is very low. Is there any advantage to using a static language model? Is there any other way to improve speech recognition?
OpenEars developer here. There are a few things that can result in sub-par recognition. Here are the really big ones:
• Testing recognition using the Simulator rather than a real device.
• Having misspelled words in your language model (this is a big one that accounts for a very large number of reported issues: if the word is misspelled, how will its correct pronunciation be found or derived and entered in the phonetic dictionary? It can't be, and then correct pronunciations get false negatives).
• Having extraneous punctuation in your language model. Check this by taking a look at the .arpa file contents and the .dic file contents and seeing whether the entries in each match each other (see the sketch after this list).
• Having a native-speaker accent which is very different from the US accents the acoustic model is trained with, or having a non-native-speaker accent (this isn't fair, but it's reality).
• Having the language model largely consist of non-English words such as non-English last names, non-English street names, or intentionally misspelled band/startup names, since all pronunciation ends up being estimated.
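To run that .arpa/.dic consistency check quickly, here is a minimal sketch in Python, assuming the standard ARPA format and a PocketSphinx-style .dic file; the file names are hypothetical:

```python
# Minimal sketch (not OpenEars code) of a consistency check between a
# generated .arpa language model and its .dic phonetic dictionary.
# File names are hypothetical examples.
import re

def arpa_vocab(path):
    """Collect the words listed in the \\1-grams: section of an ARPA file."""
    words, in_unigrams = set(), False
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line.startswith("\\1-grams:"):
                in_unigrams = True
                continue
            if line.startswith("\\"):  # next section, e.g. \2-grams: or \end\
                in_unigrams = False
            if in_unigrams and line:
                parts = line.split()  # log-prob, word, optional backoff
                if len(parts) >= 2:
                    words.add(parts[1])
    return words

def dic_vocab(path):
    """Collect head words from a .dic file, folding WORD(2)-style variants."""
    words = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                head = line.split()[0]
                words.add(re.sub(r"\(\d+\)$", "", head))
    return words

lm = arpa_vocab("MyLanguageModel.arpa")
dic = dic_vocab("MyLanguageModel.dic")
markers = {"<s>", "</s>", "<UNK>"}
print("In .arpa but not .dic:", sorted(lm - dic - markers))
print("In .dic but not .arpa:", sorted(dic - lm - markers))
```

Any word showing up in only one of the two files is a likely source of the punctuation or misspelling problems above.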
But static language models versus dynamic language models have never been a big consideration for accuracy levels. If you'd like to troubleshoot this with me further I'd recommend that you visit the OpenEars support forums where I'd be happy to help with that, since Stack Overflow is not intended for ongoing back-and-forth troubleshooting processes and this is probably one of those.
I'm studying software engineering and taking an Artificial Intelligence course this semester, and I need to submit a project at the end of the semester (in 3 months). So my question is: which project is recommended for me, a voice expression AI project or a face expression AI project?
VOICE EXPRESSION: This software will listen to the user's conversations throughout the day and then, at the end of the day, show how many bad words the user spoke, for how long the user was hyper, for how long the user was angry, etc.
FACE EXPRESSION: This software will monitor the computer screen (for example, a teacher delivering a lecture on Zoom with the students' cameras turned on) and tell the user (the teacher) who is taking an interest in the lecture, who is confused, who wants to ask a question, etc.
So, if I'm a beginner in AI, which of these two projects should I choose? Or should I pick an easier project other than these two?
In principle, voice analysis seems to me easier than face analysis. To begin with, there is only one dimension, rather than two, and it would probably be easier to recognise words in a stream of sound than faces in a stream of images. However, I have a background in phonetics/signal processing, so sounds do look easier to me than images. If you've done image processing before, that might be better suited for you.
The key for a good project should not necessarily be how easy or hard it is, but whether it is something you are (a) interested in and (b) capable of achieving, and that (c) relates to the course.
Also, be clear about what you want to achieve and how easy that is to determine: matching the sound pattern of a word is something much more objective than trying to identify if someone is bored or wants to ask a question based on facial expressions.
I have a weird requirement where all the data to be displayed in the app comes from a remote server. The data consists of simple text, numbers, dates, and prices. Had all this data been static, the task would have been simple, but here the problem is that the data is dynamic (coming from the server) and the app also has to be localised in at least 20 languages. The biggest challenge is converting the price values, at runtime, into the currency selected in the user's device settings.
For currency conversion, you can use the Yahoo API. Examples can be seen in this StackOverflow question.
http://finance.yahoo.com/currency-converter/#from=USD;to=EUR;amt=1
The following URL format can be used to fetch conversion rates in different formats:
http://download.finance.yahoo.com/d/quotes.csv?s=AUDUSD=X&f=nl1d1t1
Substitute quotes.csv with the appropriate format and the parameters with the required currency codes.
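For instance, a minimal Python sketch of fetching and applying a rate from that CSV endpoint (assuming it responds with the name,rate,date,time columns that f=nl1d1t1 requests) might look like this:

```python
# Minimal sketch: fetch a conversion rate from the Yahoo CSV endpoint
# shown above, assuming it responds in the name,rate,date,time column
# order implied by f=nl1d1t1.
import csv
import urllib.request

def yahoo_rate(pair):
    url = ("http://download.finance.yahoo.com/d/quotes.csv"
           "?s={}=X&f=nl1d1t1".format(pair))
    with urllib.request.urlopen(url) as resp:
        row = next(csv.reader(resp.read().decode("utf-8").splitlines()))
    name, rate, date, time = row
    return float(rate)

price_usd = 9.99
print("Price in EUR: {:.2f}".format(price_usd * yahoo_rate("USDEUR")))
```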
As for localizing the text coming from the server, that's a much more difficult problem. I think it would be very difficult to translate it "on the fly". The correct solution is to include the language in the request and have the server return any text in the requested language.
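As a sketch of that approach (the endpoint and parameter names here are hypothetical), the client would send the device language with each request:

```python
# Hypothetical sketch: ask the server for text in the device language.
# The endpoint URL is made up for illustration; some APIs take a
# ?lang= query parameter instead of the Accept-Language header.
import json
import urllib.request

def fetch_items(lang):
    req = urllib.request.Request(
        "https://api.example.com/items",
        headers={"Accept-Language": lang},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

items = fetch_items("de")  # server returns strings already translated
```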
I guess another approach you could take (which may not be feasible depending on how many different strings you are dealing with) is to enumerate all the possible strings returned from the server, then take the more traditional approach of having those strings translated and included in your app.
I have a few thousand sentences of varying length. The statements come in many forms, ranging from a 3-character reply to a 4000-character reply with a lot of code snippets. The code snippets can be in any language.
How do I recognise comments which are questions (i.e. are interrogative) and do not have code snippets? The comments need not be in question form or have a strict structural form.
The app is built on Ruby on Rails 3.
Some example sentences:
1: How to solve segmentation fault? #valid
2: You'll have to use the BigInteger #invalid
3: some tips to remove runtime error #invalid
4: :disappointed: :disappointed: Okay #invalid (contains smilies)
5: In which category this problem fall? Graph Theory? #valid
This is an example of a text classification problem, which is generally solved by generating some features and applying a machine-learning classification algorithm to them.
For your particular case, question detection is a well-studied area. One of the simplest possible approaches is a heuristic using regular expressions.
The following solution is taken from this paper:
A sentence is detected as a question if it fulfills any of the following:
• It ends with a question mark, and is not a URL.
• It contains a phrase that begins with words that fit an interrogative question pattern. This is a generalization of 5W-1H question words. For example, the second phrase of “When you are free, can you give me a call” is a strong indicator that the sentence is a question.
• It fits the pattern of common questions that are not in the interrogative form. For instance, “Let me know when you will be free” is one such question.
More complex solutions are also described, and you can find them in the mentioned paper or by googling "question detection algorithm".
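As a minimal sketch of that heuristic (shown in Python; the regular expressions port directly to your Rails app, and the word lists are illustrative rather than taken from the paper):

```python
# Minimal sketch of the three rules quoted above. The word lists are
# illustrative; they are not taken from the paper.
import re

INTERROGATIVE = r"(who|whom|whose|what|when|where|why|which|how|can|could|would|should|do|does|did|is|are|will)"
NON_INTERROGATIVE_PATTERNS = [
    r"\blet me know\b",
    r"\bi wonder\b",
    r"\bany (tips|ideas|suggestions)\b",
]

def looks_like_question(sentence):
    s = sentence.strip().lower()
    # Rule 1: ends with a question mark and is not a URL.
    if s.endswith("?") and not re.match(r"https?://\S+$", s):
        return True
    # Rule 2: a phrase begins with an interrogative (5W-1H-style) word.
    if re.search(r"(^|[,;]\s*)" + INTERROGATIVE + r"\b", s):
        return True
    # Rule 3: common question patterns that are not interrogative in form.
    return any(re.search(p, s) for p in NON_INTERROGATIVE_PATTERNS)

print(looks_like_question("How to solve segmentation fault?"))   # True
print(looks_like_question("You'll have to use the BigInteger"))  # False
```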
For code snippet detection there are existing solutions that detect the programming language, as mentioned in the comments. One example is http://www.rubyinside.com/sourceclassifier-identifying-programming-languages-quickly-1431.html
These can probably be adapted to detect whether a specific sample is code or not. Alternatively, you can train a simple Naive Bayes classifier using one of the existing libraries.
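For example, a minimal sketch of such a classifier with scikit-learn (the training samples are toy placeholders; in practice you would label real comments from your data):

```python
# Minimal sketch of a Naive Bayes "code vs. not code" classifier using
# scikit-learn. The four training samples are toy placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

samples = [
    ("for (int i = 0; i < n; i++) { sum += a[i]; }", "code"),
    ("def foo(x): return x * 2", "code"),
    ("How to solve segmentation fault?", "text"),
    ("Okay, that makes sense, thanks!", "text"),
]
texts, labels = zip(*samples)

# Character n-grams pick up code syntax (braces, semicolons, operators)
# better than word tokens do.
clf = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(2, 4)),
    MultinomialNB(),
)
clf.fit(texts, labels)
print(clf.predict(["int main() { return 0; }"]))  # expected: ['code']
```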
Text classification is one way of doing it, but for that you would need a good amount of sample data to train your model so it can detect your patterns accurately.
You can also parse these sentences to get parts of speech (POS) and then look for words like who, which, how, when, etc. to detect questions.
There is a Ruby binding to the Stanford NLP POS tagger which you can use:
https://github.com/tiendung/ruby-nlp
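As a minimal sketch of the POS-based check (shown here with Python's NLTK; the idea carries over directly to the Ruby binding above):

```python
# Minimal sketch: flag sentences containing WH-word POS tags.
# Requires one-time downloads: nltk.download("punkt") and
# nltk.download("averaged_perceptron_tagger").
import nltk

WH_TAGS = {"WDT", "WP", "WP$", "WRB"}  # which / who / whose / how-when-where-why

def has_question_word(sentence):
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    return any(tag in WH_TAGS for _, tag in tagged)

print(has_question_word("In which category does this problem fall?"))  # True
print(has_question_word("You'll have to use the BigInteger"))          # False
```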
Looking for APIs, methods, research, etc. on the subject of deciding whether a tweet (a string, really) conveys a mood of danger.
For example:
Danger: "this house across the street is on fire!!
Not danger: "this girl is on fire! love this song"
There is little research done on the particular problem of detecting danger, but there are a few research papers describing methods to detect natural hazards. Your example is reminiscent of the title of one of them: Finding Fires with Twitter. Another paper that you may find useful is Emergency Situation Awareness: Twitter Case Studies.
In general, however, the best approach to solve such a problem is through supervised classification, very similar to how sentiment analysis is (or rather, was, because there are more sophisticated machine learning paradigms like Deep Learning being applied nowadays) done.
The essence is to label documents (in your case, tweets) as "danger" or "not danger". This labeling is done by human experts. Ideally, they should be well versed in the language and the domain, so native English speakers who know the colloquialisms of Twitter would be perfect annotators for this task.
Once an adequate number of documents have been labeled, the baseline (i.e. the basic approach) is usually achieved by creating n-gram word vectors as feature vectors and running an SVM. If you are not aware of machine learning details, please read up on them before doing this.
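For concreteness, here is a minimal sketch of that baseline with scikit-learn (the two training tweets stand in for a human-labeled corpus):

```python
# Minimal sketch of the n-gram + SVM baseline with scikit-learn. The
# two training tweets are toy placeholders for a human-labeled corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

tweets = [
    "this house across the street is on fire!!",
    "this girl is on fire! love this song",
]
labels = ["danger", "not_danger"]

clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # unigram + bigram word features
    LinearSVC(),
)
clf.fit(tweets, labels)
print(clf.predict(["the building next door is on fire"]))
```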
I just came up with an idea that I want to develop into an application to distinguish/auto-detect the voices of different people.
Sample use case: after training with Obama's and Romney's data, the application would be able to detect whenever either one speaks again (not necessarily the same content as in the training data).
I am wondering if there are any existing research on this. (I don't know how to search for this. I tried a couple keywords and got no significant results.)
If not, what is a good way to start? How should I choose features, data representation, models, etc.?
Thanks!
I found Speaker recognition on Wikipedia which in turn linked to An overview of text-independent speaker recognition: From features to supervectors (Kinnunen, Li, 2010).
From the abstract of the paper:
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods.