Google speech-to-text output json file as input to text-to-speech API? - google-cloud-speech

I have an audio file containing the monologue for a video. I don't like the person's voice, so I wanted to convert it to a Google Cloud Text-to-Speech voice, specifically the en-GB female voice.
I was able to create the speech-to-text JSON file using the API, but its output format isn't compatible with the input format expected by the Text-to-Speech API.
Is there a way to bridge the output from Google's speech-to-text engine as input to their text-to-speech engine?

Apparently there is no way to do this automatically.
One option is to write a small script that takes the output of the Speech-to-Text API and reformats it as input for the Text-to-Speech API. That is worthwhile if you plan to perform this operation regularly; otherwise, the easiest approach is simply copying the transcript out of the Speech-to-Text response and pasting it into the Text-to-Speech request.
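A minimal sketch of such a bridging script, assuming the standard Speech-to-Text JSON shape (`results[].alternatives[0].transcript`) and the `google-cloud-texttospeech` client library (the output filename and the lazy-import pattern are my own choices, not from the original answer):

```python
import json

def extract_transcript(stt_json: str) -> str:
    """Join the top alternative of each result from a Speech-to-Text
    JSON response into one plain-text monologue."""
    response = json.loads(stt_json)
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            parts.append(alternatives[0].get("transcript", "").strip())
    return " ".join(p for p in parts if p)

def synthesize(text: str, out_path: str = "out.mp3") -> None:
    """Feed the extracted text to Text-to-Speech as an en-GB female
    voice (requires the google-cloud-texttospeech package and
    Google Cloud credentials)."""
    from google.cloud import texttospeech  # not stdlib; imported lazily
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(
            language_code="en-GB",
            ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(out_path, "wb") as f:
        f.write(response.audio_content)
```

Usage would be `synthesize(extract_transcript(open("stt_output.json").read()))`. Note that long transcripts may exceed the per-request input limit, in which case you would need to split the text into chunks.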

Related

How to use API instead of using Google BigQuery Data Transfer Service?

I am trying to create a BigQuery Data Transfer config for Google AdWords through the API, using a programming language (Python or Java). I looked at the documentation for the BigQuery Data Transfer API, but could not find a clear process for this; maybe I did not understand it properly. Can anyone help me understand how to use the API to get daily analytics data from YouTube, instead of paying YouTube to use their BigQuery Data Transfer?
You need to get started with the AdWords API:
https://developers.google.com/adwords/api/docs/guides/first-api-call
Refer to the Getting Started section of the Python client library README file to download and install the AdWords API client library for Python.
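If you do want to create the transfer config programmatically rather than through the UI, a sketch along these lines should work, assuming the `google-cloud-bigquery-datatransfer` client library; the `"adwords"` data source id and the `customer_id` parameter name are assumptions based on the Data Transfer documentation, so verify them against `list_data_sources` for your project:

```python
def build_adwords_transfer_config(customer_id: str, dataset: str) -> dict:
    """Build the transfer-config payload as a plain dict; the client
    library accepts dicts in place of TransferConfig messages."""
    return {
        "destination_dataset_id": dataset,
        "display_name": "AdWords daily transfer",
        "data_source_id": "adwords",            # assumed data source id
        "params": {"customer_id": customer_id},  # assumed param name
    }

def create_transfer(project_id: str, config: dict):
    """Create the config via the Data Transfer API (requires the
    google-cloud-bigquery-datatransfer package and credentials)."""
    from google.cloud import bigquery_datatransfer_v1  # not stdlib
    client = bigquery_datatransfer_v1.DataTransferServiceClient()
    return client.create_transfer_config(
        parent=f"projects/{project_id}",
        transfer_config=config,
    )
```

Once created, the transfer runs on the data source's own schedule and lands the daily data in the destination dataset.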

What are the endpoints for the new Microsoft speech service WebSocket APIs?

I want to use the new MS Speech Translation API, but I am working with Go, so there is no SDK. I have a WebSockets implementation for the previous Translator Speech API, so raw WebSockets are no issue.
The documentation states that it is using WebSockets, but I was unable to find the endpoints in the documentation. Does anyone know what are the WS endpoints and their path/header parameters?
EDIT:
The documentation also says: "If you already have code that uses Bing Speech or Translator Speech via WebSockets, you can update it to use the Speech service. The WebSocket protocols are compatible, only the endpoints are different." But the new endpoints are missing.
After digging into the binaries of client SDKs I have found the Speech Translate API to be wss://<REGION>.s2s.speech.microsoft.com/speech/translation/cognitiveservices/v1
Another problem is that the WebSocket protocol is NOT compatible, despite what the documentation says. The good news is that, after some experiments, I found that the new Speech Translation WS API uses the same protocol as the old Bing Speech WS API, except for the URL query parameters. The Bing Speech API has a language parameter, while the Speech Translation preview API has from, to, voice and features. The from and to parameters work as expected; you can even send multiple languages in to (comma-separated, in which case TTS is unavailable). I have not tried voice. The features parameter appears to do nothing: partial results, timing info and TTS are always present.
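Assembling the endpoint described above can be sketched as follows; the region and language values here are illustrative placeholders, and only the host, path, and parameter names come from the answer itself:

```python
from urllib.parse import urlencode

def speech_translate_url(region: str, src: str, targets: list,
                         voice: str = "") -> str:
    """Build the (preview) Speech Translation WebSocket URL with the
    from/to/voice query parameters observed in the answer above."""
    params = {"from": src, "to": ",".join(targets)}
    if voice:
        params["voice"] = voice
    return (
        f"wss://{region}.s2s.speech.microsoft.com"
        f"/speech/translation/cognitiveservices/v1?{urlencode(params)}"
    )
```

You would then open this URL with your existing raw-WebSocket code, adding the usual subscription-key authentication header. As noted, since this is a preview API the shape may change at any time.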
The responses are also different, but similar to Bing Speech: they have headers, and there are several different JSON payloads. Just observe the raw strings.
As this is a preview API it can change at any time.
There haven't been substantial changes in the WebSocket protocol, so the old documentation should be reasonably accurate.
The Microsoft Cognitive Services Speech SDK doesn't support Go yet; it is on the roadmap, but it will not happen this calendar year.
thx
Wolfgang

What technology should I use for streaming tweets and analysis?

I need to stream live tweets from the Twitter API and then analyse them. Should I use Kafka to get the tweets, Spark Streaming directly, or both?
You can use Kafka Connect to ingest tweets, and then Kafka Streams or KSQL to analyse them. Check out this article, which describes exactly this setup.
Depending on your language of choice, I would use one of the libraries listed here: https://developer.twitter.com/en/docs/developer-utilities/twitter-libraries. Whichever you choose, you will be using statuses/filter in the Twitter API, so get familiar with the docs here: https://developer.twitter.com/en/docs/tweets/filter-realtime/api-reference/post-statuses-filter.html
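The producer side of this pipeline can be sketched as below, assuming the `kafka-python` package and a broker at `localhost:9092`; the tweet dicts could come from any of the client libraries above via statuses/filter, and the topic name and field selection are my own choices:

```python
import json

def tweet_to_record(tweet: dict):
    """Map a tweet (as a dict from statuses/filter) to a Kafka
    key/value pair: key = tweet id, value = JSON bytes."""
    key = str(tweet["id"]).encode("utf-8")
    value = json.dumps({
        "id": tweet["id"],
        "text": tweet["text"],
        "user": tweet.get("user", {}).get("screen_name"),
    }).encode("utf-8")
    return key, value

def stream_to_kafka(tweet_stream, topic="tweets",
                    bootstrap="localhost:9092"):
    """Forward an iterable of tweet dicts into a Kafka topic
    (requires the kafka-python package and a running broker)."""
    from kafka import KafkaProducer  # not stdlib; imported lazily
    producer = KafkaProducer(bootstrap_servers=bootstrap)
    for tweet in tweet_stream:
        key, value = tweet_to_record(tweet)
        producer.send(topic, key=key, value=value)
    producer.flush()
```

Keying by tweet id keeps retries idempotent downstream; a Kafka Streams or Spark Streaming job can then consume the `tweets` topic for the analysis step.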

iOS StreamingKit Framework Encryption

I am considering using the following framework to stream audio in an iOS app. The GitHub page mentions that it allows encryption/decryption, but I cannot find any documentation on it!
Has anyone been able to achieve this and if so, how?
https://github.com/tumtumtum/StreamingKit
The only encryption-related code that I can see is the ability to stream from an https URL and the ability to decode a variety of encoded audio types. What additional capabilities were you looking for?

Use Google Gears geolocation from a python app

I'd like to use the Google geolocation API in my app, written in Python. My problem is that Google provides a JSON interface (easily useable from Python) but from http://code.google.com/p/gears/wiki/GeolocationAPI I see that the API "is published to allow developers to provide their own network location server for use through the Gears API. Google's network location server is only to be used through the Gears API. See section 5.3 of the Gears Terms of Service at [address]."
It is a very strange thing: there is a very cool JSON API, but I cannot use it directly; I have to go through Google Gears instead. But how can I do that from a Python app?
For example, I see that the geolocation service in Firefox calls the JSON API directly. Why is Firefox able to do that?
Thanks,
Alessio Palmero Aprosio
Google has deprecated Gears entirely, as the geolocation feature is now standard in modern browsers (for certain values of "standard").
The pylocation module may provide the information you need. It can output the geolocation data in text, JSON, or XML.
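For reference, the network protocol behind the Gears geolocation API was itself a simple JSON POST, so the interaction can be sketched with the stdlib alone. The field names below follow the old Gears network-protocol spec as I recall it, and the endpoint shown has long been retired along with Gears, so treat both as historical placeholders for whatever location server you point it at:

```python
import json

def build_location_request(wifi_aps):
    """Build a request body in the shape of the old Gears geolocation
    network protocol; wifi_aps is a list of (mac, rssi) pairs."""
    return {
        "version": "1.1.0",
        "request_address": True,
        "wifi_towers": [
            {"mac_address": mac, "signal_strength": rssi}
            for mac, rssi in wifi_aps
        ],
    }

def locate(body, url="https://www.google.com/loc/json"):
    """POST the request to a location server (the default URL is the
    historical, now-retired Gears endpoint; substitute your own)."""
    from urllib.request import Request, urlopen
    req = Request(url, data=json.dumps(body).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)
```

A successful response carried a `location` object with `latitude`, `longitude`, and an `accuracy` radius in metres.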