Local file for Google Speech - google-cloud-speech

I followed this page:
https://cloud.google.com/speech/docs/getting-started
and I could reach the end of it without problems.
In the example though, the file
'uri':'gs://cloud-samples-tests/speech/brooklyn.flac'
is processed.
What if I want to process a local file? In case this is not possible, how can I upload my .flac via command line?
Thanks

You can now process a local file by specifying a local path instead of the Google Storage URI:
gcloud ml speech recognize '/Users/xxx/cloud-samples-tests/speech/brooklyn.flac' \
    --language-code='en-US'
You can send this command by using the gcloud tool (https://cloud.google.com/speech-to-text/docs/quickstart-gcloud).
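If you are using the Python client library rather than gcloud, a minimal sketch along these lines should also work for a local file (the path and sample rate below are placeholders, not values from the question):

# Minimal sketch: read a local FLAC file and send its bytes inline.
from google.cloud import speech

client = speech.SpeechClient()

with open("/path/to/brooklyn.flac", "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=16000,  # placeholder; match your file
    language_code="en-US",
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)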

Solution found:
I created my own bucket (my_bucket_test), and I upload the file there via:
gsutil cp speech.flac gs://my_bucket_test
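Once the file is in the bucket, the request can point at its gs:// URI instead of inline bytes. A minimal sketch with the Python client (the bucket name is taken from the command above; the sample rate is a placeholder):

# Minimal sketch: reference the object uploaded with gsutil by its gs:// URI.
from google.cloud import speech

client = speech.SpeechClient()

audio = speech.RecognitionAudio(uri="gs://my_bucket_test/speech.flac")
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=16000,  # placeholder; match your file
    language_code="en-US",
)

# long_running_recognize is typically used for longer audio stored in GCS.
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=90)
for result in response.results:
    print(result.alternatives[0].transcript)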

If you don't want to create a bucket (it costs extra time and money), you can stream the local file instead. The following code is copied directly from the Google Cloud docs:
def transcribe_streaming(stream_file):
    """Streams transcription of the given audio file."""
    import io
    from google.cloud import speech

    client = speech.SpeechClient()

    with io.open(stream_file, "rb") as audio_file:
        content = audio_file.read()

    # In practice, stream should be a generator yielding chunks of audio data.
    stream = [content]
    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk) for chunk in stream
    )

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    streaming_config = speech.StreamingRecognitionConfig(config=config)

    # streaming_recognize returns a generator.
    responses = client.streaming_recognize(
        config=streaming_config,
        requests=requests,
    )

    for response in responses:
        # Once the transcription has settled, the first result will contain the
        # is_final result. The other results will be for subsequent portions of
        # the audio.
        for result in response.results:
            print("Finished: {}".format(result.is_final))
            print("Stability: {}".format(result.stability))
            alternatives = result.alternatives
            # The alternatives are ordered from most likely to least.
            for alternative in alternatives:
                print("Confidence: {}".format(alternative.confidence))
                print(u"Transcript: {}".format(alternative.transcript))
Here is the URL in case the package's function names change over time: https://cloud.google.com/speech-to-text/docs/streaming-recognize
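For reference, a call to the function above might look like the line below. Note that the config in the sample expects 16 kHz LINEAR16 audio, so for a FLAC file like the one from the question the encoding field would need to be changed:

# Hypothetical local path; the config above assumes 16 kHz LINEAR16 audio.
transcribe_streaming("/path/to/audio.raw")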

Related

FTP To GCS Operator on Google Composer

I see there is no support for an FTP-to-Google-Cloud-Storage operator, only one from SFTP to GCS.
I am trying to create a custom one using the following function, but I am getting a 404 error.
Is it possible to use the upload_from_file function with the temporary file opened by the with statement?
with tempfile.NamedTemporaryFile(mode='rb') as file:
    filename = file.name
    self.ftp_hook.retrieve_file(
        remote_full_path=remote_full_path, local_full_path_or_buffer=filename
    )
    file_path = os.path.join(destination_object)
    self.log.info(file_path)
    blob_file = gcs_bucket.blob(file_path)
    blob_file.upload_from_file(file)
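A minimal sketch of one way this could look, assuming ftp_hook, gcs_bucket, remote_full_path and destination_object are already set up exactly as in the snippet above. Rewinding the handle before the upload, or uploading by filename, sidesteps any question about the file position after the FTP download; the 404 itself is more likely related to the bucket or object path than to the file handle:

import tempfile

# Sketch only: ftp_hook, gcs_bucket, remote_full_path and destination_object
# are assumed to exist as in the question.
with tempfile.NamedTemporaryFile(mode='rb') as tmp:
    # Let the FTP hook write the remote file to the temp file's path.
    ftp_hook.retrieve_file(
        remote_full_path=remote_full_path,
        local_full_path_or_buffer=tmp.name,
    )
    # Rewind so reads start at the beginning of the downloaded data.
    tmp.seek(0)
    blob = gcs_bucket.blob(destination_object)
    # Stream from the open handle...
    blob.upload_from_file(tmp)
    # ...or, equivalently, upload by path:
    # blob.upload_from_filename(tmp.name)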

How to process video with OpenCV2 in Google Cloud function?

Starting point:
There is video called myVideo.mp4 in a folder (/1_original_videos) in a Bucket called myBucket in Google Cloud Storage.
myBucket
  --> /1_original_videos
        --> myVideo.mp4
Goal:
The goal is to take this video, split it into chunks in a Cloud Function called myCloudFunction, and save the chunks in a subfolder called chunks in myBucket. Dividing the video into chunks is not a problem; the problem is reading the video.
myCloudFunction must be triggered with an HTTP trigger.
                  _______________
myVideo.mp4 ---->|myCloudFunction|----> chunk0.mp4, chunk1.mp4, chunk2.mp4, ... , chunkN-1.mp4
                  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
                          ^
                          |
                          |
                          |
                     HTTP trigger
If the video were on my local computer, in order to read it, the following would be enough:
import cv2
cap = cv2.VideoCapture("/some/path/in/my/local/computer/myVideo.mp4")
Attempts:
Path with authenticated URL:
import cv2
cap = cv2.VideoCapture("https://storage.cloud.google.com/myBucket/1_original_videos/myVideo.mp4")
When testing this approach, this is the resulting message (see complete code below):
"File Cannot be Opened"
Complete code:
import cv2

def video2chunks(request):
    # Request:
    REQUEST_JSON = request.get_json()
    # If the HTTP request contains a key called "start" (e.g. '{"start":"whatever"}'):
    if REQUEST_JSON and 'start' in REQUEST_JSON:
        try:
            # Create VideoCapture object:
            cap = cv2.VideoCapture("https://storage.cloud.google.com/myBucket/1_original_videos/myVideo.mp4")
            # If no VideoCapture object is created:
            if not cap.isOpened():
                message = "File Cannot be Opened"
            # If a VideoCapture object is created, compute some of the video parameters:
            else:
                fps = int(cap.get(cv2.CAP_PROP_FPS))
                size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
                fourcc = int(cv2.VideoWriter_fourcc('X', 'V', 'I', 'D'))  # XVID codec
                message = "Video downloaded successfully. Some params are: "
                message += "FPS= " + str(fps) + " | size= " + str(size)
        except Exception as e:
            message = str(e)
    else:
        message = "You did not provide a key called start "
    return message
I have been trying to find examples or a better way to do this in a Cloud Function but so far have been unsuccessful. Any alternatives would also be very much appreciated.
I'm not aware of whether the cv2 library supports reading directly from Cloud Storage in some way. Nonetheless, as Christoph points out, you may download the file, process it, and upload the results. The code will be essentially the same as when running locally.
One thing to note is that Cloud Functions offer a temporary directory (/tmp), which is where I chose to store the files. However, it's important to know that any file stored there actually consumes part of your function's RAM, so the allocated function memory should be sized accordingly. You may also notice that the temp files are deleted before exiting the function; this is just a best practice in Cloud Functions.
import cv2
import os
from google.cloud import storage

def myfunc(request):
    # Substitute the variables below for whatever suits your needs
    # BUCKET_ID :: The bucket ID
    # INPUT_IMAGE_GCS :: Path to the GCS object (the video)
    # OUTPUT_IMAGE_PATH :: GCS path where the resulting image is saved

    # Read video and save it to the /tmp directory
    bucket = storage.Client().bucket(BUCKET_ID)
    blob = bucket.blob(INPUT_IMAGE_GCS)
    blob.download_to_filename('/tmp/video.mp4')

    # Video processing stuff
    vidcap = cv2.VideoCapture('/tmp/video.mp4')
    success, image = vidcap.read()
    cv2.imwrite("/tmp/frame.jpg", image)

    # Save results to GCS
    img_blob = bucket.blob(OUTPUT_IMAGE_PATH)
    img_blob.upload_from_filename('/tmp/frame.jpg')

    # Delete tmp resources to free memory
    os.remove('/tmp/video.mp4')
    os.remove('/tmp/frame.jpg')
    return '', 200

How to get a bitmap image in ruby?

The Google Vision API requires a bitmap sent as an argument. I am trying to convert a PNG from a URL into a bitmap to pass to the Google API:
require "google/cloud/vision"
PROJECT_ID = Rails.application.secrets["project_id"]
KEY_FILE = "#{Rails.root}/#{Rails.application.secrets["key_file"]}"
google_vision = Google::Cloud::Vision.new project: PROJECT_ID, keyfile: KEY_FILE
img = open("https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png").read
image = google_vision.image img
ArgumentError: string contains null byte
This is the relevant source code from the gem:
def self.from_source source, vision = nil
  if source.respond_to?(:read) && source.respond_to?(:rewind)
    return from_io(source, vision)
  end
  # Convert Storage::File objects to the URL
  source = source.to_gs_url if source.respond_to? :to_gs_url
  # Everything should be a string from now on
  source = String source
  # Create an Image from a HTTP/HTTPS URL or Google Storage URL.
  return from_url(source, vision) if url? source
  # Create an image from a file on the filesystem
  if File.file? source
    unless File.readable? source
      fail ArgumentError, "Cannot read #{source}"
    end
    return from_io(File.open(source, "rb"), vision)
  end
  fail ArgumentError, "Unable to convert #{source} to an Image"
end
https://github.com/GoogleCloudPlatform/google-cloud-ruby
Why is it telling me string contains null byte? How can I get a bitmap in ruby?
According to the documentation (which, to be fair, is not exactly easy to find without digging into the source code), Google::Cloud::Vision#image doesn't want the raw image bytes; it wants a path or URL of some sort:
Use Vision::Project#image to create images for the Cloud Vision service.
You can provide a file path:
[...]
Or any publicly-accessible image HTTP/HTTPS URL:
[...]
Or, you can initialize the image with a Google Cloud Storage URI:
So you'd want to say something like:
image = google_vision.image "https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png"
instead of reading the image data yourself.
Instead of using write, you want to use IO.copy_stream, as it streams the download straight to the file system rather than reading the whole file into memory and then writing it:
require 'open-uri'
require 'tempfile'
uri = URI("https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png")
tmp_img = Tempfile.new(uri.path.split('/').last)
IO.copy_stream(open(uri), tmp_img)
Note that you don't need to set the 'r:BINARY' flag as the bytes are just streamed without actually reading the file.
You can then use the file like this:
require "google/cloud/vision"
# Use fetch as it raises an error if the key is not present
PROJECT_ID = Rails.application.secrets.fetch("project_id")
# Rails.root is a Pathname object so use `.join` to construct paths
KEY_FILE = Rails.root.join(Rails.application.secrets.fetch("key_file"))
google_vision = Google::Cloud::Vision.new(
  project: PROJECT_ID,
  keyfile: KEY_FILE
)
image = google_vision.image(File.absolute_path(tmp_img))
When you are done you clean up by calling tmp_img.unlink.
Remember to read things in binary format:
open("https://www.google.com/..._272x92dp.png",'r:BINARY').read
If you forget this, it might try to open the file as UTF-8 textual data, which would cause lots of problems.

Downloading a YouTube video through Wget

I am trying to download YouTube videos through Wget. The first thing necessary is to capture the URL of the actual video resource. Suppose I want to download this video: video. Opening up the page in the Firebug console reveals something like this:
The link I have encircled looks like the link to the resource, for there we see only the video: http://www.youtube.com/v/r-KBncrOggI?version=3&autohide=1. However, when I try to download this resource with Wget, only a 4 KB file named r-KBncrOggI#version=3&autohide=1 gets stored on my hard drive, nothing else. What should I do to get the actual video?
And secondly, is there a way to capture different resources for videos of different resolutions, like 360px, 480px, etc.?
Here is one VERY simplified, yet functional version of the youtube-download utility I cited in my other answer:
#!/usr/bin/env perl
use strict;
use warnings;

# CPAN modules we depend on
use JSON::XS;
use LWP::UserAgent;
use URI::Escape;

# Initialize the User Agent
# YouTube servers are weird, so *don't* parse headers!
my $ua = LWP::UserAgent->new(parse_head => 0);

# fetch video page or abort
my $res = $ua->get($ARGV[0]);
die "bad HTTP response" unless $res->is_success;

# scrape video metadata
if ($res->content =~ /\byt\.playerConfig\s*=\s*({.+?});/sx) {
    # parse as JSON or abort
    my $json = eval { decode_json $1 };
    die "bad JSON: $1" if $@;

    # inside the JSON 'args' property, there's an encoded
    # url_encoded_fmt_stream_map property which points
    # to stream URLs and signatures
    while ($json->{args}{url_encoded_fmt_stream_map} =~ /\burl=(http.+?)&sig=([0-9A-F\.]+)/gx) {
        # decode URL and attach signature
        my $url = uri_unescape($1) . "&signature=$2";
        print $url, "\n";
    }
}
Usage example (it returns several URLs to streams with different encoding/quality):
$ perl youtube.pl http://www.youtube.com/watch?v=r-KBncrOggI | head -n 1
http://r19---sn-bg07sner.c.youtube.com/videoplayback?fexp=923014%2C916623%2C920704%2C912806%2C922403%2C922405%2C929901%2C913605%2C925710%2C929104%2C929110%2C908493%2C920201%2C913302%2C919009%2C911116%2C926403%2C910221%2C901451&ms=au&mv=m&mt=1357996514&cp=U0hUTVBNUF9FUUNONF9IR1RCOk01RjRyaG4wTHdQ&id=afe2819dcace8202&ratebypass=yes&key=yt1&newshard=yes&expire=1358022107&ip=201.52.68.216&ipbits=8&upn=m-kyX9-4Tgc&sparams=cp%2Cid%2Cip%2Cipbits%2Citag%2Cratebypass%2Csource%2Cupn%2Cexpire&itag=44&sver=3&source=youtube,quality=large&signature=A1E7E91DD087067ED59101EF2AE421A3503C7FED.87CBE6AE7FB8D9E2B67FEFA9449D0FA769AEA739
I'm afraid it's not that easy to get the right link for the video resource.
The link you got, http://www.youtube.com/v/r-KBncrOggI?version=3&autohide=1, points to the player rather than the video itself. There is one Perl utility, youtube-download, which is well-maintained and does the trick. This is how to get the HQ version (magic fmt=18) of that video:
stas@Stanislaws-MacBook-Pro:~$ youtube-download -o "{title}.{suffix}" --fmt 18 r-KBncrOggI
--> Working on r-KBncrOggI
Downloading `Sourav Ganguly in Farhan Akhtar's Show - Oye! It's Friday!.mp4`
75161060/75161060 (100.00%)
Download successful!
stas@Stanislaws-MacBook-Pro:~$
There might be better command-line YouTube Downloaders around. But sorry, one doesn't simply download a video using Firebug and wget any more :(
The only way I know to capture that URL manually is by watching the active downloads of the browser:
The largest data chunks are the video data, so you can copy their URL:
http://s.youtube.com/s?lact=111116&uga=m30&volume=4.513679238953965&sd=BBE62AA4AHH1357937949850490&rendering=accelerated&fs=0&decoding=software&nsivbblmax=679542.000&hcbt=105.345&sendtmp=1&fmt=35&w=640&vtmp=1&referrer=None&hl=en_US&nsivbblmin=486355.000&nsivbblmean=603805.166&md=1&plid=AATTCZEEeM825vCx&ns=yt&ptk=youtube_none&csipt=watch7&rt=110.904&tsphab=1&nsiabblmax=129097.000&tspne=0&tpmt=110&nsiabblmin=123113.000&tspfdt=436&hbd=30900552&et=110.146&hbt=30.770&st=70.213&cfps=25&cr=BR&h=480&screenw=1440&nsiabblmean=125949.872&cpn=JlqV9j_oE1jzk7Zc&nsivbblc=343&nsiabblc=343&docid=r-KBncrOggI&len=1302.676&screenh=900&abd=1&pixel_ratio=1&bc=26131333&playerw=854&idpj=0&hcbd=25408143&playerh=510&ldpj=0&fexp=920704,919009,922403,916709,912806,929110,928008,920201,901451,909708,913605,925710,916623,929104,913302,910221,911116,914093,922405,929901&scoville=1&el=detailpage&bd=6676317&nsidf=1&vid=Yfg8gnutZoTD4G5SVKCxpsPvirbqG7pvR&bt=40.333&mos=0&vq=auto
However, for a large video, this will only return part of the stream unless you figure out which URL query parameter controls the range of the stream to be downloaded and adjust it.
A bonus: everything changes periodically as YouTube is constantly evolving. So don't do this manually unless you crave pain.

Load or Stress Testing Tool with URL Import Functionality

Can someone recommend a load testing tool which allows you to either:
a. replay an IIS (7) log(s) to simulate a real live site daily run;
b. import a CSV or equivalent list of URLS so we can achieve a similar thing as above but at a URL level;
c. .net API so I can create simple tests easily from my list of URLS is also a good way to go.
I do not really want to record my tests.
I think I can do b) with WAPT, but I would need to create an XML file manually. That's not too much grief, but I am wondering if any tools cover these scenarios out of the box.
Visual Studio Test Edition would require some code to parse the file into a suitable test run.
It is a great load testing solution.
Our load testing service lets you write a very simple script using JavaScript to pull data out of a CSV file and then fetch those URLs. For example, the following code would pluck 10 random URLs from the CSV file and fetch them as part of a single session:
var c = browserMob.openHttpClient();
var csv = browserMob.getCSV("urls.csv");

browserMob.beginTransaction();
for (var i = 0; i < 10; i++) {
    browserMob.beginStep("Step 1");
    var url = csv.random().get("url");
    c.get(url);
    browserMob.endStep();
}
browserMob.endTransaction();
The CSV file itself needs to be a normal CSV file with the first row containing a header named "url". This script would be run repeatedly for each virtual user participating in a load test.
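For illustration, a CSV along these lines (the file name and URLs here are made up) matches that format:

url
http://www.example.com/
http://www.example.com/products
http://www.example.com/contact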
We have support for the so-called 'uri-format' in our open-source tool, Yandex.Tank. You simply put all your URIs into a file, one URI per line, then specify the headers in your load.ini like this:
[phantom]
address=example.org
rps_schedule=line(1, 1600, 2m)
headers = [Host: mts-maps.yandex.ru]
[Connection: close] [Bloody: yes]
ammo_file = ammo.uri
ammo.uri:
/
/index.html
/1/example.html
/2/example.html
