Using the Google Assistant SDK with raw text as user input - google-assistant-sdk

I am currently working on a project that uses the Google Assistant SDK with Python. I have it working with direct audio listening, but I want to know if there is a way to use it with raw text input instead of listening to audio.

This is, apparently, a common request - but there is no way to do it yet. (Given this is still an early Developer Preview, and there have been many requests for this, we can hope they'll deliver it as part of a forthcoming update.)
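If you want to experiment in the meantime, the sketch below shows roughly what a text-based request could look like against the gRPC embedded assistant service. To be clear, this is speculative: the text_query field and the proto names used here are not part of the current preview, so treat every name as a placeholder for whatever a future update actually ships.

    # Hypothetical sketch only: assumes a future revision of the service
    # exposes a text_query-style field on the request config.
    import json

    import google.auth.transport.grpc
    import google.auth.transport.requests
    import google.oauth2.credentials
    from google.assistant.embedded.v1alpha2 import (
        embedded_assistant_pb2,
        embedded_assistant_pb2_grpc,
    )

    # Reuse OAuth2 credentials previously saved by the SDK's auth helper.
    with open('credentials.json') as f:
        credentials = google.oauth2.credentials.Credentials(token=None, **json.load(f))
    http_request = google.auth.transport.requests.Request()
    credentials.refresh(http_request)

    channel = google.auth.transport.grpc.secure_authorized_channel(
        credentials, http_request, 'embeddedassistant.googleapis.com')
    assistant = embedded_assistant_pb2_grpc.EmbeddedAssistantStub(channel)

    def text_assist(query):
        config = embedded_assistant_pb2.AssistConfig(
            audio_out_config=embedded_assistant_pb2.AudioOutConfig(
                encoding='LINEAR16', sample_rate_hertz=16000, volume_percentage=0),
            dialog_state_in=embedded_assistant_pb2.DialogStateIn(
                language_code='en-US', conversation_state=b''),
            device_config=embedded_assistant_pb2.DeviceConfig(
                device_id='my-device-id',            # placeholder
                device_model_id='my-device-model'),  # placeholder
            text_query=query,  # the hypothetical raw-text input field
        )
        # Assist is a bidirectional stream; send a single config-only request
        # and read back the assistant's display text.
        request = embedded_assistant_pb2.AssistRequest(config=config)
        for resp in assistant.Assist(iter([request])):
            if resp.dialog_state_out.supplemental_display_text:
                return resp.dialog_state_out.supplemental_display_text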

Related

How to design a multi platform video conference/chat app?

I am a developer who is still learning. I want to design an app that allows multiple people to have video conferences/chats simultaneously, something like Zoom. I know I can design native apps specific to Android as well as iOS, but I am still learning Android development and have no idea about iOS code. I searched and found that we can build hybrid apps with React, Node.js, or Angular.js that work across different platforms. But as I'm a newbie, I need suggestions as well as guidance. What I'm expecting in my app is the following:
- Should support all video resolutions and audio qualities, and should work in both low and high network scenarios
- Should be light on power/processor usage
- Should not have any external hardware dependency
- Should work on any device
- Should have a chat option during the conference, even in multi-person conferences
- Should offer sign-in and non-sign-in options to join a conference
- Can have a browser- and/or app-based interface
- Should have encrypted network communication
- Should have an audio/video recording feature
- Should have screen/file sharing capabilities
- Should allow audio-to-closed-captioning during chat (multilingual)
- Should be able to host multiple concurrent conferences with multiple participants in each conference
I know it's a tedious task to involve everything I discussed, but I need guidance on how to do this.
I have already described my expectations, so now I want to know what steps to take: how and where to start, which language/library to choose, and whether a hybrid app is a good idea or whether I should go for native apps. As I said earlier, I am a learner, so I am going to learn whatever it takes to get my project done, whether that's React or Node or Angular or whatever the experienced developers here suggest. I know my question may look broad or even vague, but I am asking because I see Stack Overflow as a group of supportive, accomplished coders. I hope you will help me get my project done. Thank you!
OK then, you have a lot of work to do. I will point you to some references that should give you a good start, and I will try to keep this as short as possible.
As you mentioned, WebRTC is the way to go.
With WebRTC, you can add real-time communication capabilities to your application that works on top of an open standard. It supports video, voice, and generic data to be sent between peers, allowing developers to build powerful voice- and video-communication solutions. The technology is available on all modern browsers as well as on native clients for all major platforms.
This blog explains how WebRTC functions in detail: https://medium.com/@anto.christo.20/understanding-web-real-time-communication-webrtc-d4cec5a43f2f
This blog explains how to build peer-to-peer video calling in Android: https://medium.com/@anto.christo.20/understanding-web-real-time-communication-webrtc-d4cec5a43f2f
https://webrtc.org/ also contains a lot of head-start material, including sample code.
Once you have done this you can add other features on top of it.
Now, this will take care of peer-to-peer, but if you want to build multi-user functionality from scratch there is some extra work required, as mentioned in this answer: how to build multi-user video chatting web app using webRTC, node.js and socket.io
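Whichever stack you choose, the one piece you will always write yourself is the signaling server that shuttles SDP offers/answers and ICE candidates between peers; WebRTC deliberately leaves that part unspecified. As a rough illustration of how small that relay can be, here is a sketch using the python-socketio and aiohttp packages (the event names 'offer', 'answer', and 'ice-candidate' are just conventions your client code would have to match; this is a sketch, not a production server):

    # Minimal WebRTC signaling relay: it never touches audio or video,
    # it only forwards session descriptions and ICE candidates between
    # connected peers so they can negotiate a direct connection.
    import socketio
    from aiohttp import web

    sio = socketio.AsyncServer(cors_allowed_origins='*')
    app = web.Application()
    sio.attach(app)

    @sio.event
    async def connect(sid, environ):
        print('peer connected:', sid)

    # Forward each signaling message to every peer except the sender.
    @sio.on('offer')
    async def offer(sid, data):
        await sio.emit('offer', data, skip_sid=sid)

    @sio.on('answer')
    async def answer(sid, data):
        await sio.emit('answer', data, skip_sid=sid)

    @sio.on('ice-candidate')
    async def ice_candidate(sid, data):
        await sio.emit('ice-candidate', data, skip_sid=sid)

    if __name__ == '__main__':
        web.run_app(app, port=5000)

For two peers this broadcast approach is enough; for multi-user rooms you would scope the forwarding to a room per conference, and past a handful of participants you would move from a full mesh to an SFU (selective forwarding unit), which is exactly the extra work the linked answer describes.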

Emoji support in Google's Cloud Speech API

I've noticed that certain apps on Android (e.g. Gboard) support translating phrases such as 'poop emoji' into the actual emoji as part of speech recognition. I was wondering if this is something supported through Google's Cloud Speech APIs that I could similarly use in my own applications?
In my initial scan of the API I can't see anything that might indicate a way to turn this on (e.g. RecognitionConfig et al. have no obvious toggles for it), and in some quick one-off tests in my own app I wasn't getting emoji-fied results from the service.
I've done a bunch of googling but found nothing so far.
Any insight here would be awesome, thanks!
-edit- Thanks to the answer below I have learned this currently is not supported. I've gone to Google's issue tracker to request this feature. If anyone wishes to track the feature request the link is:
https://issuetracker.google.com/u/1/issues/113978818
The Cloud Speech-to-Text API doesn't currently support emoji phrase recognition. However, you can use the Send Feedback button located at the lower left and upper right corners of the service's public documentation, or take a look at the Issue Tracker tool if you want to raise a Speech API feature request to notify Google of this desired functionality.
Finally, you can refer to the Release Notes section of the Speech-to-Text API documentation to keep track of new features and functionality added to the service.
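In the meantime, a workable approach is to post-process the transcript yourself: run normal recognition and substitute spoken emoji phrases afterwards. Below is a minimal sketch using the google-cloud-speech client library; the phrase table is just an illustrative stub you would extend.

    # Sketch: recognize speech normally, then map spoken emoji phrases
    # to the actual characters in a post-processing pass.
    from google.cloud import speech

    # Illustrative stub -- extend with whatever phrases you care about.
    EMOJI_PHRASES = {
        'poop emoji': '\U0001F4A9',
        'heart emoji': '\u2764\uFE0F',
        'thumbs up emoji': '\U0001F44D',
    }

    def emojify(transcript):
        for phrase, emoji in EMOJI_PHRASES.items():
            transcript = transcript.replace(phrase, emoji)
        return transcript

    def transcribe(path):
        client = speech.SpeechClient()
        with open(path, 'rb') as f:
            audio = speech.RecognitionAudio(content=f.read())
        config = speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=16000,
            language_code='en-US',
            # Bias recognition toward the phrases we plan to substitute.
            speech_contexts=[speech.SpeechContext(phrases=list(EMOJI_PHRASES))],
        )
        response = client.recognize(config=config, audio=audio)
        return [emojify(r.alternatives[0].transcript) for r in response.results]

It won't match Gboard's built-in behavior exactly, but it covers the common "<word> emoji" pattern without waiting on the feature request.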

What algorithm does youtube use to generate a transcript for videos?

I am looking into developing an application that transcribes an audio file for me and then gives me a document with the words or phrases and the times they were spoken, just like YouTube does. I could just upload files to YouTube and then grab the transcript, but I want something that works offline. Can anyone help? Where should I start?
Not sure about YouTube, but I would start with the Google Cloud Speech API, and if you're not happy with it, then I'd go through these 5 as well.
Also, bear in mind that Chrome has the Web Speech API built in (and most likely Firefox has something similar, but I never had a need to explore that), so if what you're doing is for the web, you should check that out too.
Let us know if this helped.
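One concrete starting point: the Cloud Speech API can return per-word timestamps, which gets you most of the way to a YouTube-style transcript (words plus the times they were spoken). A minimal sketch, assuming the google-cloud-speech client library; note this is a network service, so it won't satisfy a strictly offline requirement:

    # Sketch: transcription with per-word timing via the Cloud Speech API.
    from google.cloud import speech

    def transcript_with_times(path):
        client = speech.SpeechClient()
        with open(path, 'rb') as f:
            audio = speech.RecognitionAudio(content=f.read())
        config = speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=16000,
            language_code='en-US',
            enable_word_time_offsets=True,  # request per-word timing
        )
        response = client.recognize(config=config, audio=audio)
        for result in response.results:
            for word in result.alternatives[0].words:
                # start_time/end_time behave like timedelta durations
                print('%7.2fs  %s' % (word.start_time.total_seconds(), word.word))

For a truly offline setup you would swap in a local engine (e.g. CMU Sphinx) and extract word timings from its decoder output instead; the overall shape of the program stays the same.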

"Now Playing..." track info on blackberry smartphone

I'm developing a Java-based app for BlackBerry OS 7. One possible feature would be showing the name of the currently playing audio file. The question: is it possible to get this information from the system using only the Java API? Is it possible at all to get access to this kind of information?
I don't suppose there is any straightforward way to get this information, at least not any that is documented. BlackBerry's Java API is somewhat silly in that sense, unfortunately.
If you want, you can probably do some ad-hockery to attempt to get that info, but it won't be painless. One example would be scraping the UI elements of the "Now Playing" application that runs in the background during playback for any changes and reflecting those in your app, but then again, that won't be exactly elegant.

Microsoft/Ford Sync SDK

Just got a car with the Microsoft Sync system in it. I did a quick search online and was curious whether anyone is aware of an SDK that may exist, sample open-source add-on applications, etc.
Thanks in advance.
UPDATE:
Looks like Ford has finally released their SDK:
https://developer.ford.com/
Ford has a website, the SYNC Mobile Application Developer Network, but the SYNC SDK does not look to be available yet (their site mentions possibly later this year). It appears they are still working on the API before releasing it. All they are offering now is a way to register to be notified of new info as it becomes available.
From their About page:
Ford is hard at work developing an API to allow developers to integrate their Smartphone applications with SYNC. The Developer Program website will educate developers about the Ford SYNC platform and how to interact with it via the API. There will be a full set of documentation, example applications, reference libraries, and even a developer forum so you can reach out to the community for quick help.
With the available SYNC APIs, mobile application developers will be able to do some of the following:
- Create a voice UI for your application using the in-vehicle speech recognition system.
- Write information to the radio head display or in-vehicle touchscreen.
- Speak text using the text-to-speech engine.
- Use the in-vehicle menu system to provide commands or options for your mobile application.
- Get button presses from the radio and steering wheel controls.
- Receive vehicle data (speed, GPS location, fuel economy, etc.).
The official API and full website launch is targeted for later this year.
It looks like the SDK is coming very soon. The story was just posted on Engadget:
http://www.engadget.com/2009/12/18/ford-to-give-sync-some-app-store-flavor-opening-api-to-devs-in/
Now just imagine what you could do with access to your automobile functionality!
I don't think there is any. It's a closed ecosystem.
http://www.autoblog.com/2009/01/09/ces-2009-sync-could-one-day-add-app-development-like-iphone/
From what I understand it is based on the CE 6.0 platform using Windows Automotive 4.1, but I could be wrong. We really need a forum to get this going. Hurry up, Ford! Release the SDK!
Given the way Windows Automotive works, there are only two ways of putting a ROM on SYNC: using JTAG to load your own custom bootloader (forget it), or going through the USB port, which requires knowing how to sign the file so SYNC will accept it as an OEM ROM. So at this point, even if you managed to build your own custom ROM with Microsoft eMbedded Visual C++, you would still have no way to get it onto the device.
BTW, the SDK they are talking about releasing will only be for developing apps for AppLink (not for modifying the OS). However, to upload apps we might be able to figure out how to sign the .bin file so SYNC accepts a ROM over USB.
Then again, this is just my understanding... I am no great developer or anything.
Ford launched the SDK at CES; check it out: http://techcrunch.com/2013/01/08/ford-launches-its-openxc-sdk-and-hardware-specs-to-let-developers-access-its-cars-sensors-and-metrics/
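For completeness, OpenXC also ships client libraries. Below is a rough sketch of reading live vehicle data with the openxc-python package; the UsbVehicleInterface class and callback signature follow its documentation, but treat the details as approximate rather than gospel.

    # Sketch: stream vehicle measurements (speed, GPS, fuel, ...) from an
    # OpenXC vehicle interface over USB, assuming the openxc-python package.
    from openxc.interface import UsbVehicleInterface

    def receive(message, **kwargs):
        # Messages arrive as dicts, e.g. {'name': 'vehicle_speed', 'value': 42.0}
        if message.get('name') == 'vehicle_speed':
            print('speed: %.1f km/h' % message['value'])

    vi = UsbVehicleInterface(callback=receive)
    vi.start()  # runs as a background thread; join() to block on it
    vi.join()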
