Starting point:
There is video called myVideo.mp4 in a folder (/1_original_videos) in a Bucket called myBucket in Google Cloud Storage.
myBucket
-->/1_original_video
-->myVideo.mp4
Goal:
The goal is to take this video, split it into chunks in a Cloud Function myCloudFunction and save the chunks in a subfolder called chunks in myBucket. The part of dividing into chunks is not a problem. The problem is reading the video.
myCloudFunction must be triggered with an HTTP trigger.
_______________
myVideo.mp4 ---->|myCloudFunction|----> chunk0.mp4, chunk1.mp4, chunk2.mp4, ... , chunkN-1.mp4
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
^
|
|
|
HTTP trigger
If the video were on my local computer, in order to read it, the following would be enough:
import cv2
cap = cv2.VideoCapture("/some/path/in/my/local/computer/myVideo.mp4")
Attempts:
Path with authenticated URL:
import cv2
cap = cv2.VideoCapture("https://storage.cloud.google.com/myBucket/1_original_videos/myVideo.mp4")
When testing this approach, this is the resulting message (see complete code below):
"File Cannot be Opened"
Complete code:
import cv2
def video2chunks(request):
# Request:
REQUEST_JSON = request.get_json()
#If the HTTP contains a key called "start" (e.g. "{"start":"whatever"}"):
if REQUEST_JSON and 'start' in REQUEST_JSON:
try:
# Create VideoCapture object:
cap = cv2.VideoCapture("https://storage.cloud.google.com/myBucket/1_original_videos/myVideo.mp4")
# If no VideoCapture object is created:
if not cap.isOpened():
message = "File Cannot be Opened"
# If a Videocapture object is created, compute some of the video parameters:
else:
fps = int(cap.get(cv2.CAP_PROP_FPS))
size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
fourcc = int(cv2.VideoWriter_fourcc('X','V','I','D')) # XVID codecs
message = "Video downloaded successfully. Some params are: "
message += "FPS= " + str(fps) + " | size= " + str(size)
except Exception as e:
message = str(e)
else:
message = "You did not provide a key called start "
return message
I have been trying to find examples or a better way to do this in a Cloud Function but so far have been unsuccessful. Any alternatives would also be very much appreciated.
I'm not aware whether the cv2 library supports reading directly from Cloud Storage in some way. Nonetheless as Christoph points out you may download the file, process it and upload the results. The code will be essentially the same as running locally.
One thing to note is that Cloud Functions offer a temporal directory which is the way I chose to store the image. However it's important to know that any file stored there is actually consuming part of your function RAM, so the allocated function memory should be sized accordingly. Also you may notice the temp files are deleted before exiting the function, this is just a best practice in Cloud Functions.
import cv2
import os
from google.cloud import storage
def myfunc(request):
# Substitute the variables below for whatever suits your needs
# BUCKET_ID :: The bucket ID
# INPUT_IMAGE_GCS :: Path to GCS object
# OUTPUT_IMAGE_PATH :: Path to save the resulting image/video
# Read video and save to /tmp directory
bucket = storage.Client().bucket(BUCKET_ID)
blob = bucket.blob(INPUT_IMAGE_GCS)
blob.download_to_filename('/tmp/video.mp4')
# Video processing stuff
vidcap = cv2.VideoCapture('/tmp/video.mp4')
success, image = vidcap.read()
cv2.imwrite("/tmp/frame.jpg", image)
# Save results to GCS
img_blob = bucket.blob('potato/frame.jpg')
img_blob.upload_from_filename(OUTPUT_IMAGE_PATH)
# Delete tmp resources to free memory
os.remove('/tmp/video.mp4')
os.remove('/tmp/frame.jpg')
return '', 200
Related
I'm trying to create a dataflow pipeline and deploy it in Cloud environment. I have a code like below, which is trying to read files from a specific folder within a GCS bucket :
def read_from_bucket():
file_paths=[]
client = storage.Client()
bucket = client.bucket("bucket_name")
blobs_specific = list(bucket.list_blobs(prefix="test_folder/"))
for blob in blobs_specific:
file_paths.append(blob)
return file_paths
The above function returns a list of file paths present in that GCS folder.
Now this list is sent to the below code, which will filter the files as per their extension and store it in respective folders in the GCS bucket.
def write_to_bucket():
client = storage.Client()
for blob in client.list_blobs("dataplus-temp"):
source_bucket = client.bucket("source_bucket")
source_blob = source_bucket.blob(blob.name)
file_extension = pathlib.Path(blob.name).suffix
if file_extension == ".json":
destination_bucket=client.bucket("destination_bucket")
new_blob = source_bucket.copy_blob(source_blob,destination_bucket,'source_bucket/JSON/{}'.format(source_blob))
elif file_extension == ".txt":
destination_bucket=client.bucket("destination_bucket")
new_blob = source_bucket.copy_blob(source_blob,destination_bucket,'Text/{}'.format(source_blob))
I have to execute the above implementation using dataflow pipeline , in such a way that the file path has go to dataflow pipeline and it should get stored in the respective folders. I created a dataflow pipeline like below, but not sure whether I used right Ptrasformations.
pipe= beam.Pipeline()(
pipe
|"Read data from bucket" >> beam.ParDo(read_from_bucket)
|"Write files to folders" >> beam.ParDo(write_to_bucket)
)
pipe.run()
executed like :
python .\\filename.py
--region asia-east1
--runner DataflowRunner
--project proejct_name
--temp_location gs://bucket_name/tmp
--job_name test
I'm getting below error after execution:
return inputs[0].windowing
AttributeError: 'PBegin' object has no attribute 'windowing'
I have checked apache-beam documentation, but somehow unable to understand. I'm new to apache-beam, just started as a beginner, please consider if this question is silly.
Kindly help me in how to solve this.
To solve your issue you have to use existing Beam IO to read and write to Cloud Storage, example :
import apache_beam as beam
from apache_beam.io import ReadFromText, WriteToText
def run():
pipeline_options = PipelineOptions()
with beam.Pipeline(options=pipeline_options) as p:
(
p
| "Read data from bucket" >> ReadFromText("gs://your_input_bucket/your_folder/*")
| "Write files to folders" >> WriteToText("gs://your_dest_bucket")
)
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
run()
I am trying to use Haar cascade classifier for object detection.I have copied a code for haar cascade algorithm but its not working.It's giving error as
unknown url type: '//drive.google.com/drive/folders/11XfAPOgFv7qJbdUdPpHKy8pt6aItGvyg'
even though this link is working.
import urllib.request, urllib.error, urllib.parse
import cv2
import os
def store_raw_images():
neg_images_link = '//drive.google.com/drive/folders/11XfAPOgFv7qJbdUdPpHKy8pt6aItGvyg'
neg_image_urls = urllib.request.urlopen(neg_images_link).read().decode()
pic_num = 1
if not os.path.exists('neg'):
os.makedirs('neg')
for i in neg_image_urls.split('\n'):
try:
print(i)
urllib.request.urlretrieve(i, "neg/"+str(pic_num)+".jpg")
img = cv2.imread("neg/"+str(pic_num)+".jpg",cv2.IMREAD_GRAYSCALE)
# should be larger than samples / pos pic (so we can place our image on it)
resized_image = cv2.resize(img, (100, 100))
cv2.imwrite("neg/"+str(pic_num)+".jpg",resized_image)
pic_num += 1
except Exception as e:
print(str(e))
store_raw_images()
I am expecting output as set of negative images for creating dataset module for object detection.
I think the missing "https:" at the start of the url is the causing the specific error.
Furthermore, you cannot just load a drive folder when it is not shared (you should use the drive link) and event then it is not optimal, you have to parse the html response and it may not even work.
I strongly suggest you to use a normal HTTP server or the Google Drive python API.
The google vision API requires a bitmap sent as an argument. I am trying to convert a png from a URL to a bitmap to pass to the google api:
require "google/cloud/vision"
PROJECT_ID = Rails.application.secrets["project_id"]
KEY_FILE = "#{Rails.root}/#{Rails.application.secrets["key_file"]}"
google_vision = Google::Cloud::Vision.new project: PROJECT_ID, keyfile: KEY_FILE
img = open("https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png").read
image = google_vision.image img
ArgumentError: string contains null byte
This is the source code processing of the gem:
def self.from_source source, vision = nil
if source.respond_to?(:read) && source.respond_to?(:rewind)
return from_io(source, vision)
end
# Convert Storage::File objects to the URL
source = source.to_gs_url if source.respond_to? :to_gs_url
# Everything should be a string from now on
source = String source
# Create an Image from a HTTP/HTTPS URL or Google Storage URL.
return from_url(source, vision) if url? source
# Create an image from a file on the filesystem
if File.file? source
unless File.readable? source
fail ArgumentError, "Cannot read #{source}"
end
return from_io(File.open(source, "rb"), vision)
end
fail ArgumentError, "Unable to convert #{source} to an Image"
end
https://github.com/GoogleCloudPlatform/google-cloud-ruby
Why is it telling me string contains null byte? How can I get a bitmap in ruby?
According to the documentation (which, to be fair, is not exactly easy to find without digging into the source code), Google::Cloud::Vision#image doesn't want the raw image bytes, it wants a path or URL of some sort:
Use Vision::Project#image to create images for the Cloud Vision service.
You can provide a file path:
[...]
Or any publicly-accessible image HTTP/HTTPS URL:
[...]
Or, you can initialize the image with a Google Cloud Storage URI:
So you'd want to say something like:
image = google_vision.image "https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png"
instead of reading the image data yourself.
Instead of using write you want to use IO.copy_stream as it streams the download straight to the file system instead of reading the whole file into memory and then writing it:
require 'open-uri'
require 'tempfile'
uri = URI("https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png")
tmp_img = Tempfile.new(uri.path.split('/').last)
IO.copy_stream(open(uri), tmp_img)
Note that you don't need to set the 'r:BINARY' flag as the bytes are just streamed without actually reading the file.
You can then use the file by:
require "google/cloud/vision"
# Use fetch as it raises an error if the key is not present
PROJECT_ID = Rails.application.secrets.fetch("project_id")
# Rails.root is a Pathname object so use `.join` to construct paths
KEY_FILE = Rails.root.join(Rails.application.secrets.fetch("key_file"))
google_vision = Google::Cloud::Vision.new(
project: PROJECT_ID,
keyfile: KEY_FILE
)
image = google_vision.image(File.absolute_path(tmp_img))
When you are done you clean up by calling tmp_img.unlink.
Remember to read things in binary format:
open("https://www.google.com/..._272x92dp.png",'r:BINARY').read
If you forget this it might try and open it as UTF-8 textual data which would cause lots of problems.
I followed this page:
https://cloud.google.com/speech/docs/getting-started
and I could reach the end of it without problems.
In the example though, the file
'uri':'gs://cloud-samples-tests/speech/brooklyn.flac'
is processed.
What if I want to process a local file? In case this is not possible, how can I upload my .flac via command line?
Thanks
You're now able to process a local file by specifying a local path instead of the google storage one:
gcloud ml speech recognize '/Users/xxx/cloud-samples-tests/speech/brooklyn.flac' \ --language-code='en-US'
You can send this command by using the gcloud tool (https://cloud.google.com/speech-to-text/docs/quickstart-gcloud).
Solution found:
I created my own bucket (my_bucket_test), and I upload the file there via:
gsutil cp speech.flac gs://my_bucket_test
If you don't want to create a bucket (costs extra time and money) - you can stream the local files. The following code is copied directly from the Google cloud docs:
def transcribe_streaming(stream_file):
"""Streams transcription of the given audio file."""
import io
from google.cloud import speech
client = speech.SpeechClient()
with io.open(stream_file, "rb") as audio_file:
content = audio_file.read()
# In practice, stream should be a generator yielding chunks of audio data.
stream = [content]
requests = (
speech.StreamingRecognizeRequest(audio_content=chunk) for chunk in stream
)
config = speech.RecognitionConfig(
encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz=16000,
language_code="en-US",
)
streaming_config = speech.StreamingRecognitionConfig(config=config)
# streaming_recognize returns a generator.
responses = client.streaming_recognize(
config=streaming_config,
requests=requests,
)
for response in responses:
# Once the transcription has settled, the first result will contain the
# is_final result. The other results will be for subsequent portions of
# the audio.
for result in response.results:
print("Finished: {}".format(result.is_final))
print("Stability: {}".format(result.stability))
alternatives = result.alternatives
# The alternatives are ordered from most likely to least.
for alternative in alternatives:
print("Confidence: {}".format(alternative.confidence))
print(u"Transcript: {}".format(alternative.transcript))
Here is the URL incase the package's function names are edited over time: https://cloud.google.com/speech-to-text/docs/streaming-recognize
The code below resizes my image. But I am not sure how to write it out to a temp file or blob so I can upload it to s3.
origImage = MiniMagick::Image.open(myPhoto.tempfile.path)
origImage.resize "200x200"
thumbKey = "tiny-#{key}"
obj = bucket.objects[thumbKey].write(:file => origImage.write("tiny.jpg"))
I can upload the original file just fine to s3 with the below command:
obj = bucket.objects[key].write('data')
obj.write(:file => myPhoto.tempfile)
I think I want to create a temp file, read the image file into it and upload that:
thumbFile = Tempfile.new('temp')
thumbFile.write(origImage.read)
obj = bucket.objects[thumbKey].write(:file => thumbFile)
but the origImage class doesn't have a read command.
UPDATE: I was reading the source code and found this out about the write command
# Writes the temporary file out to either a file location (by passing in a String) or by
# passing in a Stream that you can #write(chunk) to repeatedly
#
# #param output_to [IOStream, String] Some kind of stream object that needs to be read or a file path as a String
# #return [IOStream, Boolean] If you pass in a file location [String] then you get a success boolean. If its a stream, you get it back.
# Writes the temporary image that we are using for processing to the output path
And the s3 api docs say you can stream the content using a code block like:
obj.write do |buffer, bytes|
# writing fewer than the requested number of bytes to the buffer
# will cause write to stop yielding to the block
end
How do I change my code so
origImage.write(s3stream here)
http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/S3Object.html
UPDATE 2
This code successfully uploads the thumbnail file to s3. But I would still love to know how to stream it up. It would be much more efficient I think.
#resize image and upload a thumbnail
smallImage = MiniMagick::Image.open(myPhoto.tempfile.path)
smallImage.resize "200x200"
thumbKey = "tiny-#{key}"
newFile = Tempfile.new("tempimage")
smallImage.write(newFile.path)
obj = bucket.objects[thumbKey].write('data')
obj.write(:file => newFile)
smallImage.to_blob ?
below code copy from https://github.com/probablycorey/mini_magick/blob/master/lib/mini_magick.rb
# Gives you raw image data back
# #return [String] binary string
def to_blob
f = File.new #path
f.binmode
f.read
ensure
f.close if f
end
Have you looked into the paperclip gem? The gem offers direct compatibility to s3 and works great.