Google Cloud Speech - Last Names from Speech Context Phrases Lowercased

I'm passing in a list of names to the Google Cloud Speech API like this:
speech_contexts: [{ phrases: [ "Bob Smith" ] }]
Google correctly identifies the names (like "Bob Smith") and inserts them into the text. However, they all appear in the transcription with the last name lowercased: "Bob smith". Why doesn't the name appear in the transcription in the format it was passed to the API?

The phrase hints parameter is used by the Speech API to boost the recognition and transcription tasks, improving the probability that such words or phrases will be recognized; however, it can't guarantee that the transcribed text will match the hints exactly.
I think you should try sending these words separately in order to validate whether the service can transcribe them as specified in the phrase hints, and also set the maxAlternatives property, which retrieves multiple alternative results that you can filter based on the confidence value.
In case none of these options are helpful, you can use the Issue Tracker tool to raise and/or track a Speech API feature request in order to improve this service functionality.
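Since the hints can't guarantee casing, another workaround (my own post-processing sketch, not part of the API) is to restore the original casing of any hint phrase that appears in the returned transcript:

```python
import re

def restore_phrase_casing(transcript, phrases):
    """Replace case-insensitive occurrences of each hint phrase
    with the phrase's original casing."""
    for phrase in phrases:
        transcript = re.sub(re.escape(phrase), phrase, transcript,
                            flags=re.IGNORECASE)
    return transcript

print(restore_phrase_casing("I spoke with Bob smith today", ["Bob Smith"]))
# I spoke with Bob Smith today
```

This only fixes casing for phrases you already passed as hints, which is exactly the situation described in the question.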

Microsoft Graph API Bulk Edit of Noncontiguous Rows in Excel

I have a need to be able to edit multiple (10-20) noncontiguous rows in an Excel table via the Microsoft Graph API. My application receives a list of 10-20 strings as input. It then needs to find the rows of data associated with those strings (they are all in the same column) and update each row (in a separate column) with different values. I am able to update the rows using individual PATCH requests that specify the specific row index to update; however, sending 10-20 separate HTTP requests is not acceptable for performance reasons.
Here is what I have tried so far:
JSON batching. I created a JSON batch request where each request in the batch updates a row of data at a specific row index. However, only a few of the calls actually succeed while the rest of them fail due to being unable to acquire a lock to edit the Excel document. Using the dependsOn feature in JSON batching fixed the issue, but performance was hardly better than sending the update requests separately.
Concurrent PATCH requests. If I use multiple threads to make the PATCH requests concurrently, I run into the same issue as above: a few of them succeed while the others fail, as they cannot acquire a lock to edit the Excel document.
Filtering/sorting the table in order to perform a range update on the specific rows currently visible. I was able to apply a table filter using the Microsoft Graph API; however, it appears that you can only define two criteria to filter on, and I need to be able to filter the data on 10-20 different values. Thus it does not seem like I will be able to accomplish this using a range update, since I cannot filter on enough values at the same time, and the rows cannot be sorted in such a way that would leave them all in a contiguous block.
Is there any feature in the Microsoft Graph API I am not aware of that would enable me to do what I am proposing? Or any other idea/approach I am not thinking of? I would think that making bulk edits to noncontiguous rows in a range/table would be a common problem. I have searched through the API documentation/forums/etc. and cannot seem to find anything else that would help.
Any help/information in the right direction would be greatly appreciated!
After much trial and error I was able to solve my problem using filtering. I stumbled across this readme on filter apply: https://github.com/microsoftgraph/microsoft-graph-docs/blob/master/api-reference/v1.0/api/filter_apply.md which has an example request body of:
{
  "criteria": {
    "criterion1": "criterion1-value",
    "criterion2": "criterion2-value",
    "color": "color-value",
    "operator": {
    },
    "icon": {
      "set": "set-value",
      "index": 99
    },
    "dynamicCriteria": "dynamicCriteria-value",
    "values": {
    },
    "filterOn": "filterOn-value"
  }
}
Although this didn't help me immediately, it got me thinking in the right direction. I was unable to find any more documentation about how the request format works but I started playing with the request body until finally I got something working. I changed "values" to an array of String and "filterOn" to "values". Now rather than being limited to criterion1 and criterion2 I can filter on whatever values I pass in the "values" array.
{
  "criteria": {
    "values": [
      "1",
      "2",
      "3",
      "4",
      "5"
    ],
    "filterOn": "values"
  }
}
After applying the filter I retrieve the visibleView range, which I discovered here: https://developer.microsoft.com/en-us/excel/blogs/additions-to-excel-rest-api-on-microsoft-graph/, like this:
/workbook/tables('tableName')/range/visibleView?$select=values
Lastly, I perform a bulk edit on the visibleView range with a PATCH request like this:
/workbook/tables('tableName')/range/visibleView
and a request body with a "values" array that matches the number of columns/rows I am updating.
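To make the filter step concrete, here is a small helper (my own sketch; the body shape follows the working request above, and the endpoint path in the comment is an assumption based on the linked readme) that builds the filter body for an arbitrary list of values:

```python
def build_filter_body(values):
    """Build the workbook filter/apply request body that filters a
    table column on an arbitrary list of values (coerced to strings)."""
    return {"criteria": {"values": [str(v) for v in values],
                         "filterOn": "values"}}

# The body would then be POSTed to something like:
#   .../workbook/tables('tableName')/columns('columnName')/filter/apply
body = build_filter_body([1, 2, 3, 4, 5])
print(body)
# {'criteria': {'values': ['1', '2', '3', '4', '5'], 'filterOn': 'values'}}
```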
Unfortunately this simple task was made difficult by a lack of Microsoft Graph API documentation, but hopefully this information helps someone else.

How do I check whether a given string is a valid geographical location or not?

I have a list of strings (noun phrases) and I want to filter out all valid geographical locations from them. Most of these (unwanted location names) are country or city or state names. What would be a way to do this? Is there any open-source lookup table available which contains all country, states, cities of the world?
Example desired output:
TREC4: false, Vienna: true, Ministry: false, IBM: false, Montreal: true, Singapore: true
Unlike this post: Verify user input location string is a valid geographic location?
I have a large number of strings like these (~0.7 million), so the Google geolocation API is probably not an option for me.
You can use GeoPlanet data by Yahoo, or GeoNames data by geonames.org.
Here is a link to the GeoPlanet TSV file containing 5 million geographical places of the world:
https://developer.yahoo.com/geo/geoplanet/data/
Moreover, the GeoPlanet data will give you the type (city, country, suburb, etc.) of each geographical place, along with a unique ID.
https://developer.yahoo.com/geo/geoplanet/guide/concepts.html
You can do a lowercase, sanitized (e.g. removing special characters and other anomalies) match of your needle string against the names present in this data.
If you do not want full file scans, it will be beneficial to first process this data into a fast lookup database like MongoDB or Redis.
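As a sketch of that lookup approach (assuming you have loaded name/type pairs from the GeoPlanet or GeoNames dump), a sanitized in-memory dictionary already goes a long way before reaching for Redis or MongoDB:

```python
import re

def sanitize(name):
    """Lowercase and strip everything except letters, digits and spaces."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()

def build_gazetteer(rows):
    """rows: iterable of (place_name, place_type) pairs from the dump."""
    return {sanitize(name): ptype for name, ptype in rows}

def is_location(s, gazetteer):
    return sanitize(s) in gazetteer

# Tiny stand-in for the real 5-million-row file:
gaz = build_gazetteer([("Vienna", "Town"), ("Montreal", "Town"),
                       ("Singapore", "Country")])
print(is_location("Vienna", gaz))   # True
print(is_location("IBM", gaz))      # False
```

At ~0.7 million needle strings, an in-memory set/dict lookup is O(1) per string, so the whole batch is a single pass over the input.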
I can suggest the following three options:
a) Using the Alchemy API: http://www.alchemyapi.com/
If you try their demo, places like France or Honolulu are given the entity type Country or City.
b) Using TAGME: http://tagme.di.unipi.it/
TAGME connects every entity in a given text to the corresponding Wikipedia page. Crawl the Wikipedia page, check the infobox, and filter.
c) Using Wikipedia Miner: I was unable to find relevant links for this. However, it also works like TAGME.
I suggest you try all three and do majority voting for each instance.

Tweepy: Search in stream api

In the Tweepy API for Twitter I know we can search tweets with:
api.search(q="a and b")
Here it will search for both a and b appearing anywhere in the status, in any order. However, I need to do the same with the Tweepy streaming API. Is there any way to do that?
I know there is track field -
stream.filter(track=['a','b'])
But this would return statuses containing either a or b; I need both keywords present, in any order.
We could also search for only a and then manually filter for statuses that contain b, but then we would be discarding a huge number of tweets, as the streaming API gives only 1% of all tweets.
Yes, this can be done easily. Looking at the docs for the Twitter API track parameter:
A comma-separated list of phrases which will be used to determine what Tweets will be delivered on the stream. A phrase may be one or more terms separated by spaces, and a phrase will match if all of the terms in the phrase are present in the Tweet, regardless of order and ignoring case. By this model, you can think of commas as logical ORs, while spaces are equivalent to logical ANDs. For example, ‘the twitter’ is (the AND twitter), and ‘the,twitter’ is (the OR twitter).
By this logic, to filter by a and b:
stream.filter(track=['a b'])
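The quoted AND/OR semantics are easy to check locally; this little predicate (my own approximation of Twitter's matching, ignoring its extra tokenization rules) mimics how a track list is evaluated:

```python
def track_matches(track, text):
    """True if any comma-separated phrase has all of its space-separated
    terms present in the tweet text (order-independent, case-insensitive)."""
    words = set(text.lower().split())
    return any(all(term in words for term in phrase.lower().split())
               for phrase in track)

print(track_matches(['a b'], "b comes before a here"))  # True: both terms present
print(track_matches(['a b'], "only a appears"))         # False: b is missing
print(track_matches(['a', 'b'], "only a appears"))      # True: comma means OR
```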

Google Prediction API: Having the output as a list that has a variable size

I want to train a model that will allow me to generate a LIST of tags related to a certain text; my output list will have a variable size depending on the context. In the examples that I found, the model always returns one output.
I am wondering if the Google Prediction API can help me and if there are any examples.
It seems this is what you will get when using a CATEGORICAL rather than a REGRESSION model. From the Google documentation: https://developers.google.com/prediction/docs/developer-guide#examples
When you submit a query, the API tries to find the category that most closely describes your query. In the Hello World example, the query "mucho bueno" returns the result
...Most likely: Spanish, Full results:
[
  {"label": "French", "score": -46.33},
  {"label": "Spanish", "score": -16.33},
  {"label": "English", "score": -33.08}
]
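To turn that full-results list into the variable-size tag list the question asks for, one option (my own post-processing sketch, with an assumed score threshold) is to keep every label whose score clears the threshold:

```python
def tags_above(results, threshold):
    """Return labels sorted by score (best first), keeping those at or
    above the threshold, so the output length varies with the input."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [r["label"] for r in ranked if r["score"] >= threshold]

results = [{"label": "French", "score": -46.33},
           {"label": "Spanish", "score": -16.33},
           {"label": "English", "score": -33.08}]
print(tags_above(results, -40))  # ['Spanish', 'English']
```

Choosing the threshold is up to you; a tighter threshold yields fewer, more confident tags.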

User input parsing - city / state / zipcode / country

I'm looking for advice on parsing input from a user in multiple combinations of City / State / Zip Code / Country.
A common example would be what Google maps does.
Some examples of input would be:
"City, State, Country"
"City, Country"
"City, Zip Code, Country"
"City, State, Zip Code"
"Zip Code"
What would be an efficient and correct way to parse this input from a user?
If you are aware of any example implementations please share :)
The first step would be to break up the text into individual tokens using spaces or commas as the delimiting characters. For scalability, you can then hand each token to a thread or server (if using a Map-Reducer like architecture) to figure out what each token is. For instance,
If we have numbers in the pattern, then it's probably a zip code.
Is the item in the list of known states?
Countries are also fairly easy to handle like states, there's a limited number.
What order are the tokens in compared to the common ways of writing an address? Most input will probably follow the local post office custom for address formats.
Once you have the individual token results, you can glue the parts back together to get a full address. In the cases where there are questions, you can prompt the user what they really meant (like Google maps) and add that information to a learned list.
The easiest method to add that support to an application, assuming you're not trying to build a map system, is to query Google or Yahoo and ask them to parse the data for you.
I am myself very fascinated with how Google handles that. I do not remember seeing anything similar anywhere else.
I believe you would try to separate the input string into words using various delimiters: space, comma, semicolon, etc. Then you have several combinations. For each combination, you take each word and match it against a country, city, town, and postal code database. Then you define some metric on how to evaluate the group match result for each combination. There should also be cross rules, like: if the postal code does not match well, but country, city, and town match well and in combination refer to a valid address, then the metric yields a high mark.
It is surely difficult and not an evening code exercise. It also requires strong computational resources: a shared host would probably crack under just 10 requests, but a data center could serve it well.
Not sure if there is an example implementation. Many geographical services are offered on paid basis. Something that sophisticated as GoogleMaps would likely cost a fortune.
Correct me if I'm wrong.
I found a simple PHP implementation
http://www.eotz.com/2008/07/parsing-location-string-php/
Yahoo seems to have a webservice that offers the functionality (sort of)
http://developer.yahoo.com/geo/placemaker/
Openstreetmap seems to offer the same search functionality on its homepage
http://www.openstreetmap.org/
Assuming you're only dealing with those four fields (City, Zip, State, Country), there are finite values for all fields except City, and even that, if you have a big city list, is also finite. So just split on commas, then check each piece against each field list.
Assuming we're talking US addresses:
Zip is most obvious, so check for that first.
State has 50x2 options (California or CA), check that next.
Country has ~190x2 options, depending on how encompassing you want to be (US, United States, USA).
Whatever is left over is probably your City.
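That elimination order can be sketched roughly like this (my own illustration; the state and country sets are abbreviated stand-ins for the real lists):

```python
import re

US_STATES = {"CA", "CALIFORNIA", "NY", "NEW YORK", "TX", "TEXAS"}  # abbreviated
COUNTRIES = {"US", "USA", "UNITED STATES"}                          # abbreviated

def classify(token):
    t = token.strip()
    if re.fullmatch(r"\d{5}(-\d{4})?", t):   # zip is most obvious, check first
        return "zip"
    if t.upper() in US_STATES:               # then state
        return "state"
    if t.upper() in COUNTRIES:               # then country
        return "country"
    return "city"                            # whatever is left over

def parse_address(text):
    return [(tok.strip(), classify(tok)) for tok in text.split(",") if tok.strip()]

print(parse_address("Austin, TX, 78701, USA"))
# [('Austin', 'city'), ('TX', 'state'), ('78701', 'zip'), ('USA', 'country')]
```

Splitting on commas only (rather than spaces) keeps multi-word cities like "New York" intact.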
As far as efficiency goes, it might make sense to check a handful of 'standard' formats first, like Dan suggests.
