I'm using these precompiled binaries of pyaudio with WASAPI support. I want to play a WAV file via WASAPI. I found the index of the default output device for this API:
import pyaudio
p = pyaudio.PyAudio()
print p.get_host_api_info_by_index(3)
>>{'index': 3, 'name': u'Windows WASAPI', 'defaultOutputDevice': 11L, 'type': 13L, 'deviceCount': 3L, 'defaultInputDevice': 12L, 'structVersion': 1L}
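(The host API index can also be found programmatically instead of hardcoding 3; a minimal sketch using only the standard PyAudio host-API calls:)
import pyaudio

p = pyaudio.PyAudio()
# Scan all host APIs and pick out WASAPI
for i in range(p.get_host_api_count()):
    info = p.get_host_api_info_by_index(i)
    if info['name'] == 'Windows WASAPI':
        print 'host API index:', info['index']
        print 'default output device:', info['defaultOutputDevice']
p.terminate()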
Then I play a wav file via this device:
import pyaudio
import wave
CHUNK = 1024
wf = wave.open('test.wav', 'rb')
# instantiate PyAudio (1)
p = pyaudio.PyAudio()
# open stream (2)
stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output_device_index=11,
                output=True)
# read data
data = wf.readframes(CHUNK)
# play stream (3)
while data != '':
    stream.write(data)
    data = wf.readframes(CHUNK)
# stop stream (4)
stream.stop_stream()
stream.close()
# close PyAudio (5)
p.terminate()
While the file is playing I can still hear other sounds in the system, but in WASAPI exclusive mode all other sounds should be blocked. So how do I enable WASAPI exclusive mode in pyaudio?
We need to change the pyaudio sources, specifically _portaudiomodule.c.
Include pa_win_wasapi.h:
#include "pa_win_wasapi.h"
Replace this line:
outputParameters->hostApiSpecificStreamInfo = NULL;
with this:
struct PaWasapiStreamInfo wasapiInfo;
wasapiInfo.size = sizeof(PaWasapiStreamInfo);
wasapiInfo.hostApiType = paWASAPI;
wasapiInfo.version = 1;
wasapiInfo.flags = (paWinWasapiExclusive|paWinWasapiThreadPriority);
wasapiInfo.threadPriority = eThreadPriorityProAudio;
outputParameters->hostApiSpecificStreamInfo = (&wasapiInfo);
Now we need to compile pyaudio:
Place the portaudio directory inside the pyaudio directory under the name portaudio-v19; the name is important.
Install MinGW/MSYS: we need gcc, make and the MSYS console.
In the MSYS console, cd to portaudio-v19 and run:
./configure --with-winapi=wasapi --enable-shared=no
make
cd ..
Change these lines in setup.py:
external_libraries += ['winmm']
extra_link_args += ['-lwinmm']
to these:
external_libraries += ["winmm","ole32","uuid"]
extra_link_args += ["-lwinmm","-lole32","-luuid"]
Finally, build and install:
python setup.py build --static-link -cmingw32
python setup.py install --skip-build
That's all. Now pyaudio is able to play sound in WASAPI exclusive mode.
Starting point:
There is a video called myVideo.mp4 in a folder (/1_original_videos) in a bucket called myBucket in Google Cloud Storage.
myBucket
-->/1_original_videos
   -->myVideo.mp4
Goal:
The goal is to take this video, split it into chunks in a Cloud Function (myCloudFunction) and save the chunks in a subfolder called chunks in myBucket. Splitting into chunks is not the problem; the problem is reading the video.
myCloudFunction must be triggered with an HTTP trigger.
                  _________________
myVideo.mp4 ---->| myCloudFunction |----> chunk0.mp4, chunk1.mp4, chunk2.mp4, ..., chunkN-1.mp4
                  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
                          ^
                          |
                    HTTP trigger
If the video were on my local computer, in order to read it, the following would be enough:
import cv2
cap = cv2.VideoCapture("/some/path/in/my/local/computer/myVideo.mp4")
Attempts:
Path with authenticated URL:
import cv2
cap = cv2.VideoCapture("https://storage.cloud.google.com/myBucket/1_original_videos/myVideo.mp4")
When testing this approach, this is the resulting message (see complete code below):
"File Cannot be Opened"
Complete code:
import cv2

def video2chunks(request):
    # Request:
    REQUEST_JSON = request.get_json()
    # If the HTTP request contains a key called "start" (e.g. {"start":"whatever"}):
    if REQUEST_JSON and 'start' in REQUEST_JSON:
        try:
            # Create VideoCapture object:
            cap = cv2.VideoCapture("https://storage.cloud.google.com/myBucket/1_original_videos/myVideo.mp4")
            # If no VideoCapture object is created:
            if not cap.isOpened():
                message = "File Cannot be Opened"
            # If a VideoCapture object is created, compute some of the video parameters:
            else:
                fps = int(cap.get(cv2.CAP_PROP_FPS))
                size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
                fourcc = int(cv2.VideoWriter_fourcc('X', 'V', 'I', 'D'))  # XVID codec
                message = "Video downloaded successfully. Some params are: "
                message += "FPS= " + str(fps) + " | size= " + str(size)
        except Exception as e:
            message = str(e)
    else:
        message = "You did not provide a key called start "
    return message
I have been trying to find examples or a better way to do this in a Cloud Function but so far have been unsuccessful. Any alternatives would also be very much appreciated.
I'm not aware of the cv2 library supporting reads directly from Cloud Storage. Nonetheless, as Christoph points out, you may download the file, process it and upload the results. The code will be essentially the same as when running locally.
One thing to note is that Cloud Functions offer a temporary directory, which is where I chose to store the file. However, it's important to know that any file stored there actually consumes part of your function's RAM, so the allocated function memory should be sized accordingly. You may also notice that the temp files are deleted before exiting the function; this is a best practice in Cloud Functions.
import cv2
import os
from google.cloud import storage

def myfunc(request):
    # Substitute the variables below for whatever suits your needs
    # BUCKET_ID :: The bucket ID
    # INPUT_IMAGE_GCS :: Path to the GCS object
    # OUTPUT_IMAGE_PATH :: GCS path to save the resulting image/video

    # Read video and save to /tmp directory
    bucket = storage.Client().bucket(BUCKET_ID)
    blob = bucket.blob(INPUT_IMAGE_GCS)
    blob.download_to_filename('/tmp/video.mp4')

    # Video processing stuff
    vidcap = cv2.VideoCapture('/tmp/video.mp4')
    success, image = vidcap.read()
    cv2.imwrite('/tmp/frame.jpg', image)

    # Save results to GCS
    img_blob = bucket.blob(OUTPUT_IMAGE_PATH)
    img_blob.upload_from_filename('/tmp/frame.jpg')

    # Delete tmp resources to free memory
    os.remove('/tmp/video.mp4')
    os.remove('/tmp/frame.jpg')
    return '', 200
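Since the original goal was splitting the video into chunks rather than extracting a frame, here is a minimal sketch of that step under the same download-to-/tmp approach. The chunk length in frames, the mp4v codec and the output names are illustrative assumptions, not anything prescribed by Cloud Functions or OpenCV:
import cv2

def split_into_chunks(local_path, frames_per_chunk=250):
    # Read the downloaded video from /tmp and write fixed-length chunks
    cap = cv2.VideoCapture(local_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    chunk_paths, writer, frame_idx = [], None, 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if frame_idx % frames_per_chunk == 0:
            # Start a new chunk file every frames_per_chunk frames
            if writer is not None:
                writer.release()
            chunk_path = '/tmp/chunk%d.mp4' % (frame_idx // frames_per_chunk)
            writer = cv2.VideoWriter(chunk_path, fourcc, fps, size)
            chunk_paths.append(chunk_path)
        writer.write(frame)
        frame_idx += 1
    if writer is not None:
        writer.release()
    cap.release()
    return chunk_paths
Each returned path can then be uploaded to the chunks/ subfolder with bucket.blob(...).upload_from_filename(...) and removed from /tmp, exactly as in the frame example above.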
I am trying to transcribe audio from a stream using this tutorial (section, "Performing streaming speech recognition on a local file"): https://cloud.google.com/speech-to-text/docs/streaming-recognize
The file is an M3U file, so I am trying to use the RecognitionConfig.AudioEncoding.MP3 option, but the MP3 attribute is being rejected. When I try to autocomplete the option, MP3 does not appear either.
The documentation shows that the MP3 attribute is only available in version v1beta1 (https://cloud.google.com/text-to-speech/docs/reference/rpc/google.cloud.texttospeech.v1beta1#google.cloud.texttospeech.v1beta1.AudioEncoding), and I have run the pip upgrade.
Is there something else I need to do to install v1beta1?
Note that the second link you shared, regarding v1beta1, is for the Text-to-Speech API, which is the reverse direction of the examples you are following (Speech-to-Text API).
In that case, to use RecognitionConfig.AudioEncoding.MP3, you'll need to use the v1p1beta1 version instead. No changes are needed to the pip command (pip install --upgrade google-cloud-speech) but you need to import the right version (speech_v1p1beta1) in your Python code:
# [START speech_transcribe_streaming]
def transcribe_streaming(stream_file):
    """Streams transcription of the given audio file."""
    import io
    from google.cloud import speech_v1p1beta1
    from google.cloud.speech_v1p1beta1 import enums
    from google.cloud.speech_v1p1beta1 import types

    client = speech_v1p1beta1.SpeechClient()
And now you can use the MP3 encoding:
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.MP3,
        sample_rate_hertz=16000,
        language_code='en-US')
    streaming_config = types.StreamingRecognitionConfig(config=config)
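The rest of the function is the base streaming sample unchanged, sketched here for completeness (this assumes the pre-2.0 google-cloud-speech client that the imports above come from):
    with io.open(stream_file, 'rb') as audio_file:
        content = audio_file.read()

    # In practice, stream should be a generator yielding chunks of audio data.
    stream = [content]
    requests = (types.StreamingRecognizeRequest(audio_content=chunk)
                for chunk in stream)

    # streaming_recognize returns a generator of responses.
    responses = client.streaming_recognize(streaming_config, requests)
    for response in responses:
        for result in response.results:
            print('Finished: {}'.format(result.is_final))
            print('Stability: {}'.format(result.stability))
            for alternative in result.alternatives:
                print('Confidence: {}'.format(alternative.confidence))
                print('Transcript: {}'.format(alternative.transcript))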
Full code here but it's just the base example with the previous changes.
Tested with an MP3 sample:
$ python mp3.py sample.mp3
Finished: True
Stability: 0.0
Confidence: 0.9875912666320801
Transcript: I'm sorry Dave I'm afraid I can't do that
I am trying to read the following video, downloaded from http://www.sample-videos.com/,
specifically http://www.sample-videos.com/video/mp4/720/big_buck_bunny_720p_5mb.mp4.
Here is my code:
import cv2

cap = cv2.VideoCapture('big_buck_bunny_720p_5mb.mp4')
if cap.isOpened() == False:
    print("Error opening video stream or file")

count = 0
while cap.isOpened():
    # Capture frame by frame:
    ret, frame = cap.read()
    if ret == True:
        # Display the resulting frame
        cv2.imshow('Frame', frame)
        cv2.imwrite("frame%d.jpg" % count, frame)
        count += 1
        print(count)
However I get "Error opening video stream or file" at cap = cv2.VideoCapture('big_buck_bunny_720p_5mb.mp4'),
and ret always equals False.
My OpenCV version is 3.1.0.
There may be one of the following issues with your machine:
the video path is not configured correctly
you lack permission to access the file
an additional codec needs to be installed
You might have installed OpenCV, but there are some prerequisites that need to be installed before reading an .mp4 video file with OpenCV.
You can verify this by reading an .avi file and an .mp4 file:
it will likely read the .avi file but not the .mp4 file.
To read an .mp4 file, install an ffmpeg package compiled with the H.264 codec:
H.264/MPEG-4 Part 10, or AVC (Advanced Video Coding), is a standard for video compression, and is currently one of the most commonly used formats for the recording, compression, and distribution of high-definition video.
Ref: https://www.debiantutorials.com/how-to-install-ffmpeg-with-h-264mpeg-4-avc/
A few suggestions to make sure all prerequisites are available:
1. Check whether an ffmpeg package compiled with H.264 is already installed on the machine, using the command below:
ffmpeg -version
2. Installing OpenCV through Anaconda reduces the effort of getting an ffmpeg package compiled with H.264.
3. Make sure that the user created on the machine has enough privileges to read and write in the specific application-related directories.
a. Check the read and write permissions using one of the commands below:
ls -ld <folder-path>
or
namei -mo <folder-path>
b. Alter the access rights based on the user privilege required (sudo access is needed; otherwise engage an admin to alter the permissions), e.g.:
sudo chmod -R 740 <folder-path>   [recursive: rwx for user, r for group]
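As an additional sanity check (this uses only the standard OpenCV API), you can inspect whether your cv2 build was compiled with FFmpeg support at all:
import cv2

# Print only the FFMPEG line of the build configuration ("YES" or "NO")
for line in cv2.getBuildInformation().splitlines():
    if 'FFMPEG' in line:
        print(line)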
I followed this page:
https://cloud.google.com/speech/docs/getting-started
and I could reach the end of it without problems.
In the example though, the file
'uri':'gs://cloud-samples-tests/speech/brooklyn.flac'
is processed.
What if I want to process a local file? In case this is not possible, how can I upload my .flac via command line?
Thanks
You're now able to process a local file by specifying a local path instead of the Google Storage one:
gcloud ml speech recognize '/Users/xxx/cloud-samples-tests/speech/brooklyn.flac' \
    --language-code='en-US'
You can send this command by using the gcloud tool (https://cloud.google.com/speech-to-text/docs/quickstart-gcloud).
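If you'd rather stay in Python, the client library can also read a local file by sending its bytes inline instead of a gs:// URI. A minimal sketch with the google-cloud-speech client (the path and sample rate here are placeholders):
from google.cloud import speech

client = speech.SpeechClient()

# Read the local file and send its content inline
with open('/path/to/brooklyn.flac', 'rb') as f:
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=16000,
    language_code='en-US',
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)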
Solution found:
I created my own bucket (my_bucket_test) and uploaded the file there via:
gsutil cp speech.flac gs://my_bucket_test
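(Equivalently, the upload can be done from Python with the google-cloud-storage client; same bucket and file names as above:)
from google.cloud import storage

# Upload speech.flac to gs://my_bucket_test/speech.flac
client = storage.Client()
bucket = client.bucket('my_bucket_test')
bucket.blob('speech.flac').upload_from_filename('speech.flac')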
If you don't want to create a bucket (which costs extra time and money), you can stream the local file. The following code is copied directly from the Google Cloud docs:
def transcribe_streaming(stream_file):
    """Streams transcription of the given audio file."""
    import io
    from google.cloud import speech

    client = speech.SpeechClient()

    with io.open(stream_file, "rb") as audio_file:
        content = audio_file.read()

    # In practice, stream should be a generator yielding chunks of audio data.
    stream = [content]

    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk) for chunk in stream
    )

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    streaming_config = speech.StreamingRecognitionConfig(config=config)

    # streaming_recognize returns a generator.
    responses = client.streaming_recognize(
        config=streaming_config,
        requests=requests,
    )

    for response in responses:
        # Once the transcription has settled, the first result will contain the
        # is_final result. The other results will be for subsequent portions of
        # the audio.
        for result in response.results:
            print("Finished: {}".format(result.is_final))
            print("Stability: {}".format(result.stability))

            alternatives = result.alternatives
            # The alternatives are ordered from most likely to least.
            for alternative in alternatives:
                print("Confidence: {}".format(alternative.confidence))
                print(u"Transcript: {}".format(alternative.transcript))
Here is the URL in case the package's function names are edited over time: https://cloud.google.com/speech-to-text/docs/streaming-recognize
This is a beginner question, since I am new to iOS (I started it today), so please pardon my ignorance and lack of iOS knowledge.
After building and successfully using FFmpeg for Android, I wanted to do the same for iOS.
So I built FFmpeg successfully for iOS by following this link, but after all that pain I am confused about how to use FFmpeg on iOS. I mean, how can I pass command-line arguments to the libffmpeg.a file?
I am assuming that there must be a way to run the .a file as an executable, pass command-line arguments and hope for FFmpeg to do the magic; I did the same on Android and it worked beautifully.
I am also aware that I can use the ffmpeg.c file and its main method, but the question remains: how do I pass those command-line arguments?
Is there something I am supposed to be aware of here? Is what I am doing correct, or am I falling short in my approach?
I wanted to mix two audio files, so the command for doing that would be ffmpeg -i firstSound.wav -i secondSound.wav -filter_complex amix=inputs=2:duration=longest finalOutput.wav. How do I do the same on iOS?
Can someone please shed some light on this?
You don't pass arguments to a .a file, as it's a library. It's something you build your application with, giving you access to the functions provided by the ffmpeg library. I'm not sure what the state of play with Android is, but it's likely that build is generating a command-line executable instead.
Have a look at the ffmpeg documentation; there's probably a way to do what you want with the library, however building and running ffmpeg as a standalone, pass-in-arguments binary is unlikely.
You can do it in your main.c. Of course you wouldn't hardcode the args; these are just for illustration.
I assume you're using ffmpeg for playback since you're playing with iFrameExtractor. What is the actual goal of what you're trying to do?
/* Called from the main */
int main(int argc, char **argv)
{
    int flags, i;

    /*
    argv[1] = "-fs";
    argv[2] = "-skipframe";
    argv[3] = "30";
    argv[4] = "-fast";
    argv[5] = "-sync";
    argv[6] = "video";
    argv[7] = "-drp";
    argv[8] = "-skipidct";
    argv[9] = "10";
    argv[10] = "-skiploop";
    argv[11] = "50";
    argv[12] = "-threads";
    argv[13] = "5";
    //argv[14] = "-an";
    argv[15] = "http://172.16.1.33:63478/hulu-f4fa0821-767a-490a-8cb5-f03788760e31/1-hulu-f4fa0821-767a-490a-8cb5-f03788760e31.mpg";
    argc += 14;
    */

    /* register all codecs, demux and protocols */
    avcodec_register_all();
    avdevice_register_all();
    av_register_all();

    parse_options(argc, argv, options, opt_input_file);

    /* ... */
}