We are facing the issue that wit.ai recognizes almost every number as a location, sometimes even as a DateTime, but almost never as a number. We tried to teach it that 1 is a number, 2 is a number, and so on, but it doesn't seem to pick that up; see the screenshot below:
Are we doing something wrong?
I tried a new bot app and just added one sentence with the wit/number entity to the sample database, and it works okay for me, as you can see:
Maybe your problem is with other samples that make the tagging confusing for the algorithm behind wit.ai (probably a CRF- or RNN-based model). I suggest you add many more samples containing the wit/number entity and check whether it works.
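If it helps to sanity-check what the app currently extracts, you can also query it through wit.ai's HTTP /message endpoint. A minimal sketch in Python (the token is a placeholder and the utterance is just an example):

import requests

resp = requests.get(
    "https://api.wit.ai/message",
    params={"q": "I need 2 tickets"},  # test utterance containing a number
    headers={"Authorization": "Bearer YOUR_SERVER_ACCESS_TOKEN"},  # placeholder token
)
print(resp.json())  # inspect whether wit/number, a location, or a datetime was matched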
I am fairly new to Power Apps and am trying to make a batch data entry form.
I am prototyping this now, and while I think it should work in theory, I keep running into technical errors.
The data source I'm using is Google Sheets. For prototyping purposes, there are three columns: item_id, item, and recorded_value.
The app pulls a list of standard values into a gallery, where the input values can then be selected.
The approach I have taken is to create a gallery whose rows are added to a collection using the code below:
// Snapshot every gallery row into a local collection.
// Filter(Gallery1.AllItems, true) is effectively a no-op and returns all rows.
ClearCollect(
    collection,
    ForAll(
        Filter(Gallery1.AllItems, true),
        {
            item: t_item.Text,
            item_id: t_item_id.Text,
            recorded_value: t_recorded_value.Text
        }
    )
)
This is then uploaded to Google Sheets; I have found "success" using the two methods below:
// Method 1: patch each collected row into the data source one at a time
ForAll(
    collection,
    Patch(
        records,
        Defaults(records),
        { item: item, item_id: item_id, recorded_value: recorded_value }
    )
)
or
// Method 2: append the whole collection in a single call
Collect(records, collection)
Overall, I am seeing two main issues in testing:
The initial 'collect' seems to fail to capture items on occasion. I don't know if it is cache related or what, but unless I scroll all the way down it seems to leave some fields blank (maybe not an issue in real use, but it seems odd).
Uploading records can take excruciatingly long. While initially it was crashing outright because of issue 1, I have found that it will sometimes get to, say, item 85, sit there for a minute or so, and then work through the rest of the list. For just 99 items, the upload takes several minutes.
Ultimately, I am looking for a better approach to what I am doing. I basically want to take a maximum of 99 rows and paste them onto the table, but right now it feels really inefficient because of the looping nature of the function. I am not sure whether this is more of a Power Apps or a Google Sheets issue, but any advice would be appreciated.
From everything I could research, batch uploading records like this is going to be time-consuming nearly any way you approach it.
I was able to come up with a workaround, however, which more or less eliminates the problem.
Instead of uploading each record individually, I concatenate all records in the collection into a single cell through a variable, using delimiters to differentiate the rows/columns (set the variable with the Concat function, then Patch the variable to the data source).
This method allows all of the data to be stored nearly instantaneously.
After that I am just going to perform some basic ETL in Python to transform the data into a more standard format and load it into SQL Server, which is fairly trivial to do.
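For reference, a minimal sketch of that ETL step in Python. The delimiters ("|" between columns, ";" between rows), the DSN, and the table name are placeholders for whatever you actually chose:

import pyodbc  # assumes an ODBC connection to the SQL Server instance

blob = "1|widget|42;2|gadget|17"  # the single concatenated cell read back from Google Sheets
rows = [r.split("|") for r in blob.split(";") if r]  # split back into rows and columns

conn = pyodbc.connect("DSN=records_db")  # placeholder DSN
cur = conn.cursor()
cur.executemany(
    "INSERT INTO records (item_id, item, recorded_value) VALUES (?, ?, ?)",
    rows,
)
conn.commit()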
I recommend that others looking to take a 'batch insert' approach try something similar, as it now takes users essentially a second to load records rather than several minutes.
I read somewhere that the added matrix S of 1/n entries, together with the fudge factor 0.15 that Google uses, is just not accurate and merely exists to solve a different problem.
On the other hand, I have read elsewhere that it does have a meaning: it models random jumps. We first ask whether a surfer wants to keep clicking or not. According to what I read, the meaning is: 85% continue to click, 15% don't.
My question is: this is maybe fine for the first click, but how does it work in later iterations? How can anyone land on a random page? Isn't the whole assumption of PageRank that every page is reached through links from other pages?
If I can just land on a page without coming from somewhere else, then the ranking isn't accurate at all.
But most importantly, I don't understand what the added 1/n matrix means. If I am on a page, I can only click the links I see. What does it mean to say that I can go somewhere else?
If they mean that I just do a Google search again, then why not call it a second chain? Why include it in the first one?
Also, is it a 15% chance that I randomly jump, or a 15% chance that I stop surfing? (Or are those the same thing?)
And to my first question: is it an inaccurate fudge factor made to solve other problems, or does it really mean something, as said above, and deserve to be included on its own merit?
"Random jumps" could correspond to lots of things:
Entering an address in URL bar
Visiting a "Favorite" link
Visiting a home page (or any one of the links on it!)
Visiting a link from a content aggregator / social media
People do actually do these things when browsing online; going to a random page in your index is a very crude approximation of this behavior.
If you're Google or some other entity with lots of surfing/tracking data, you can actually measure the probabilities with which people "jump into" particular websites and build a better model! The random-jump probabilities don't need to be totally uniform; they just need to be non-zero for every website.
Random jumps are also the simplest way to ensure the matrix (and the corresponding chain) is ergodic, which makes it easier to analyze and guarantees convergence.
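To make that concrete, here is a minimal sketch in Python (the 4-page link matrix is made up; NumPy assumed). The Google matrix is G = 0.85*A + 0.15*S, where A encodes the actual links and S is the all-1/n random-jump matrix:

import numpy as np

d = 0.85                      # probability of following a link ("continue clicking")
A = np.array([                # made-up link matrix: column j holds the
    [0.0, 0.5, 0.0, 0.0],     # out-link probabilities of page j (columns sum to 1)
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 1.0],
    [0.0, 0.0, 0.5, 0.0],
])
n = A.shape[0]
S = np.full((n, n), 1.0 / n)  # uniform random-jump matrix: every entry is 1/n
G = d * A + (1 - d) * S       # every entry of G is now non-zero, so the chain is ergodic

r = np.full(n, 1.0 / n)       # start from the uniform distribution
for _ in range(100):          # power iteration; ergodicity guarantees it converges
    r = G @ r
print(r)                      # steady-state PageRank scores (they sum to 1)

Without the (1 - d) * S term, the iteration can get stuck: dead-end pages leak probability, link cycles trap it, and convergence to a unique ranking is no longer guaranteed.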
Our company has a lot of issue records stored in a database. We want to create a search engine so that people can check how similar issues were previously dealt with. We cannot use any third-party API, as the data is sensitive and we want to keep everything in house. Right now the approach is as follows:
Clean up the data and then use Doc2Vec to represent each issue as a vector.
Find the 5 closest issues using some distance metric (roughly the pipeline sketched below).
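A minimal sketch of that pipeline, assuming gensim's Doc2Vec (the issue texts are illustrative):

from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess

issues = ["app crashes on login", "timeout when saving report"]  # cleaned issue texts
docs = [TaggedDocument(words=simple_preprocess(t), tags=[i]) for i, t in enumerate(issues)]

model = Doc2Vec(documents=docs, vector_size=100, min_count=1, epochs=40)

query_vec = model.infer_vector(simple_preprocess("login crash"))
print(model.dv.most_similar([query_vec], topn=5))  # the 5 closest issues by cosine similarity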
The problem is that the results are not at all useful. Most of the data consists of one-liners and short issue descriptions, and there are spelling mistakes, stack traces, and other noise mixed in.
Is this the right approach, or should we switch to something else?
Right now we are testing on 200K records.
Thanks for the help.
I'm experimenting with Vuforia. It's going pretty well so far.
Previously I've had the ImageTarget demo working with my own targets, so I know I can get this to work for my own purposes. I also realise targets should have a good "star rating" so that Vuforia can successfully track them.
However, the following experiment is confusing me:
I create my own target database using the Target Manager, with one target, which shows up with a ZERO star rating. I know Vuforia likes high star ratings, but bear with me. As I expected, the ImageTargets app does not seem to recognize my target image. No surprises there, really, given the zero-star rating.
However, if I instead run the UserDefinedTargets demo and take a "live" image of the same target, Vuforia is perfectly able to track it!
Can anyone explain why this might be the case and how I can fix the problem?
Ideally, I would like to use ImageTargets as this allows me to load in databases as I please.
Alternatively, I would like to be able to store a database captured within the UserDefinedTargets app which I can reuse at a later stage.
Overall, I'd like to know why using the Target Manager doesn't work, but using the UserDefinedTarget app does work, and how I might be able to fix the problem.
Rather than add this to the question, which is already quite lengthy, I thought it better to put it as an answer, although I'm open to other comments and answers!
I think the UserDefinedTarget app may recognize the images "better" because directly after the user defined target image is taken, the camera (i.e. mobile phone) is in the correct position already. This does not, however, explain the excellent "re-recognition" rate, i.e. if the camera is moved away from the target and then brought back over the target, the UserDefinedTargets app recognizes the target instantly every time.
Hmmm...
I am searching for an algorithm to determine whether realtime audio input matches one of 144 given (and comfortably distinct) phoneme pairs.
Preferably the lowest level that does the job.
I'm developing radical / experimental musical training software for iPhone / iPad.
My musical system comprises 12 consonant phonemes and 12 vowel phonemes, demonstrated here. That makes 144 possible phoneme pairs. The student has to sing the correct phoneme pair 'laa duu bee' etc in response to visual stimulus.
I have done a lot of research into this; it looks like my best bet may be to use one of the iOS Sphinx wrappers (iPhone App › Add voice recognition? is the best source of information I have found). However, I can't see how I would adapt such a package. Can anyone with experience using one of these technologies give a basic rundown of the steps that would be required?
Would training by the user be necessary? I would have thought not, as it is such an elementary task compared with full language models of thousands of words and a far larger and more subtle phoneme base. However, it would be acceptable (though not ideal) to have the user train 12 phoneme pairs: { consonant1+vowel1, consonant2+vowel2, ..., consonant12+vowel12 }. The full 144 would be too burdensome.
Is there a simpler approach? I feel like using a fully featured continuous speech recogniser is using a sledgehammer to crack a nut. It would be far more elegant to use the minimum technology that would solve the problem.
So really I'm hunting for any open source software that recognises phonemes.
PS: I need a solution which runs pretty much in real time, so even as the student is singing the note, the stimulus first blinks to show that the phoneme pair was picked up, and then glows to show whether the note pitch is correct.
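To illustrate the kind of minimal approach I have in mind, here is a rough offline sketch (not iOS code; librosa and the per-pair template recordings are my assumptions): extract MFCC frames per utterance and pick the nearest of the 144 templates by dynamic time warping.

import numpy as np
import librosa  # assumed here for feature extraction; on iOS this would be done natively

def mfcc_frames(path):
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # shape: (frames, 13)

def dtw_distance(a, b):
    # plain dynamic time warping over Euclidean frame-to-frame distances
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[len(a), len(b)]

# templates: one clean recording per phoneme pair, e.g. {"laa": mfcc_frames("laa.wav"), ...}
def classify(utterance_path, templates):
    feats = mfcc_frames(utterance_path)
    return min(templates, key=lambda pair: dtw_distance(feats, templates[pair]))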
If you are looking for a phone-level open source recogniser, then I would recommend HTK. Very good documentation is available with this tool in the form of the HTK Book. It also contains an entire chapter dedicated to building a phone level real-time speech recogniser. From your problem statement above, it seems to me like you might be able to re-work that example into your own solution. Possible pitfalls:
Since you want to build a phone-level recogniser, the amount of data needed to train the phone models would be very large. Also, your training database should be balanced in terms of the distribution of the phones.
Building a speaker-independent system would require data from more than one speaker. And lots of that too.
Since this is open source, you should also check the licensing info for any additional details about shipping the code. A good alternative would be to use the on-phone recorder and send the recorded waveform over a data channel to a server for the recognition, pretty much like what Google does.
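The client half of that server-side alternative is tiny; a hedged sketch (the endpoint URL and response shape are made up):

import requests

with open("take.wav", "rb") as f:  # the waveform recorded on the phone
    resp = requests.post(
        "https://example.com/recognize",  # hypothetical recognition endpoint
        files={"audio": ("take.wav", f, "audio/wav")},
    )
print(resp.json())  # e.g. {"phoneme_pair": "laa"} (the response shape is made up)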
I have a little bit of experience with this type of signal processing, and I would say that this is probably not the type of finite question that can be answered definitively.
One thing worth noting is that although you may restrict the phonemes you are interested in, the possibility space remains the same (i.e. infinite-ish). User training might help the algorithms along a bit, but useful training takes quite a bit of time and it seems you are averse to too much of that.
Using Sphinx is probably a great start on this problem. I haven't gotten very far in the library myself, but my guess is that you'll be working with its source code yourself to get exactly what you want. (Hooray for open source!)
...using a sledgehammer to crack a nut.
I wouldn't label your problem a nut, I'd say it's more like a beast. It may be a different beast than natural language speech recognition, but it is still a beast.
All the best with your problem solving.
Not sure if this would help: check out OpenEars' LanguageModelGenerator. OpenEars uses Sphinx and other libraries.
http://www.hfink.eu/matchbox
This page links to both YouTube video demo and github source.
I'm guessing it would still be a lot of work to mould it into the shape I'm after, but it also definitely does do a lot of the work.