python opencv create image from bytearray

python opencv create image from bytearray - opencv

I am capturing video from a Ricoh Theta V camera. It delivers the video as Motion JPEG (MJPEG). To get the video you have to do an HTTP POST alas which means I cannot use the cv2.VideoCapture(url) feature.
So the way to do this per numerous posts on the web and SO is something like this:
bytes = bytes()
while True:
bytes += stream.read(1024)
a = bytes.find(b'\xff\xd8')
b = bytes.find(b'\xff\xd9')
if a != -1 and b != -1:
jpg = bytes[a:b+2]
bytes = bytes[b+2:]
i = cv2.imdecode(np.fromstring(jpg, dtype=np.uint8), cv2.IMREAD_COLOR)
cv2.imshow('i', i)
if cv2.waitKey(1) == 27:
exit(0)
That actually works, except it is slow. I'm processing a 1920x1080 jpeg stream. on a Mac Book Pro running OSX 10.12.6. The call to imdecode takes approx 425000 microseconds to process each image
Any idea how to do this without imdecode or make imdecode faster? I'd like it to work at 60FPS with HD video (at least).
I'm using Python3.7 and OpenCV4.

Updated Again
I looked into JPEG decoding from the memory buffer using PyTurboJPEG, the code goes like this to compare with OpenCV's imdecode():
#!/usr/bin/env python3
import cv2
from turbojpeg import TurboJPEG, TJPF_GRAY, TJSAMP_GRAY
# Load image into memory
r = open('image.jpg','rb').read()
inp = np.asarray(bytearray(r), dtype=np.uint8)
# Decode JPEG from memory into Numpy array using OpenCV
i0 = cv2.imdecode(inp, cv2.IMREAD_COLOR)
# Use default library installation
jpeg = TurboJPEG()
# Decode JPEG from memory using turbojpeg
i1 = jpeg.decode(r)
cv2.imshow('Decoded with TurboJPEG', i1)
cv2.waitKey(0)
And the answer is that TurboJPEG is 7x faster! That is 4.6ms versus 32.2ms.
In [18]: %timeit i0 = cv2.imdecode(inp, cv2.IMREAD_COLOR)
32.2 ms ± 346 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [19]: %timeit i1 = jpeg.decode(r)
4.63 ms ± 55.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Kudos to #Nuzhny for spotting it first!
Updated Answer
I have been doing some further benchmarks on this and was unable to verify your claim that it is faster to save an image to disk and read it with imread() than it is to use imdecode() from memory. Here is how I tested in IPython:
import cv2
# First use 'imread()'
%timeit i1 = cv2.imread('image.jpg', cv2.IMREAD_COLOR)
116 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Now prepare the exact same image in memory
r = open('image.jpg','rb').read()
inp = np.asarray(bytearray(r), dtype=np.uint8)
# And try again with 'imdecode()'
%timeit i0 = cv2.imdecode(inp, cv2.IMREAD_COLOR)
113 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
So, I find imdecode() around 3% faster than imread() on my machine. Even if I include the np.asarray() into the timing, it is still quicker from memory than disk - and I have seriously fast 3GB/s NVME disks on my machine...
Original Answer
I haven't tested this but it seems to me that you are doing this in a loop:
read 1k bytes
append it to a buffer
look for JPEG SOI marker (0xffdb)
look for JPEG EOI marker (0xffd9)
if you have found both the start and the end of a JPEG frame, decode it
1) Now, most JPEG images with any interesting content I have seen are between 30kB to 300kB so you are going to do 30-300 append operations on a buffer. I don't know much abut Python but I guess that may cause a re-allocation of memory, which I guess may be slow.
2) Next you are going to look for the SOI marker in the first 1kB, then again in the first 2kB, then again in the first 3kB, then again in the first 4kB - even if you have already found it!
3) Likewise, you are going to look for the EOI marker in the first 1kB, the first 2kB...
So, I would suggest you try:
1) allocating a bigger buffer at the start and acquiring directly into it at the appropriate offset
2) not searching for the SOI marker if you have already found it - e.g. set it to -1 at the start of each frame and only try and find it if it is still -1
3) only look for the EOI marker in the new data on each iteration, not in all the data you have already searched on previous iterations
4) furthermore, actually, don't bother looking for the EOI marker unless you have already found the SOI marker, because the end of a frame without the corresponding start is no use to you anyway - it is incomplete.
I may be wrong in my assumptions, (I have been before!) but at least if they are public someone cleverer than me can check them!!!

I recommend to use turbo-jpeg. It has a python API: PyTurboJPEG.

Related

Python parallelization for code to combine multiple images

I am new to Python and am trying to parallelize a program that I somehow pieced together from the internet. The program reads all image files (usually multiple series of images such as abc001,abc002...abc015 and xyz001,xyz002....xyz015) in a specific folder and then combines images in a specified range. Most times, the number of files exceeds 10000, and my latest case requires me to combine 24000 images. Could someone help me with:
Taking 2 sets of images from different directories. Currently I have to move these images into 1 directory and then work in said directory.
Reading only specified files. Currently my program reads all files, saves names in an array (I think it's an array. Could be a directory also) and then uses only the images required to combine. If I specify a range of files, it still checks against all files in the directory and takes a lot of time.
Parallel Processing - I work with usually 10k files or sometimes more. These are images saved from the fluid simulations that I run at specific times. Currently, I save about 2k files at a time in separate folders and run the program to combine these 2000 files at one time. And then I copy all the output files to a separate folder to keep them together. It would be great if I could use all 16 cores on the processor to combine all files in 1 go.
Image series 1 is like so.
Consider it to be a series of photos of the cat walking towards the camera. Each frame is is suffixed with 001,002,...,n.
Image series 1 is like so.
Consider it to be a series of photos of the cat's expression changing with each frame. Each frame is is suffixed with 001,002,...,n.
The code currently combines each frame from set1 and set2 to provide output.png as shown in the link here.
import sys
import os
from PIL import Image
keywords=input('Enter initial characters of image series 1 [Ex:Scalar_ , VoF_Scene_]:\n')
keywords2=input('Enter initial characters of image series 2 [Ex:Scalar_ , VoF_Scene_]:\n')
directory = input('Enter correct folder name where images are present :\n') # FOLDER WHERE IMAGES ARE LOCATED
result1 = {}
result2={}
name_count1=0
name_count2=0
for filename in os.listdir(directory):
if keywords in filename:
name_count1 +=1
result1[name_count1] = os.path.join(directory, filename)
if keywords2 in filename:
name_count2 +=1
result2[name_count2] = os.path.join(directory, filename)
num1=input('Enter initial number of series:\n')
num2=input('Enter final number of series:\n')
num1=int(num1)
num2=int(num2)
if name_count1==(num2-num1+1):
a1=1
a2=name_count1
elif name_count2==(num2-num1+1):
a1=1
a2=name_count2
else:
a1=num1
a2=num2+1
for x in range(a1,a2):
y=format(x,'05') # '05' signifies number of digits in the series of file name Ex: [Scalar_scene_1_00345.png --> 5 digits], [Temperature_section_2_951.jpg --> 3 digits]. Change accordingly
y=str(y)
for comparison_name1 in result1:
for comparison_name2 in result2:
test1=result1[comparison_name1]
test2=result2[comparison_name2]
if y in test1 and y in test2:
a=test1
b=test2
test=[a,b]
images = [Image.open(x) for x in test]
widths, heights = zip(*(i.size for i in images))
total_width = sum(widths)
max_height = max(heights)
new_im = Image.new('RGB', (total_width, max_height))
x_offset = 0
for im in images:
new_im.paste(im, (x_offset,0))
x_offset += im.size[0]
output_name='output'+y+'.png'
new_im.save(os.path.join(directory, output_name))

I did a Python version as well, it's not quite as fast but it is maybe closer to your heart :-)
#!/usr/bin/env python3
import cv2
import numpy as np
from multiprocessing import Pool
def doOne(params):
"""Append the two input images side-by-side to output the third."""
imA = cv2.imread(params[0], cv2.IMREAD_UNCHANGED)
imB = cv2.imread(params[1], cv2.IMREAD_UNCHANGED)
res = np.hstack((imA, imB))
cv2.imwrite(params[2], res)
if __name__ == '__main__':
# Build the list of jobs - each entry is a tuple with 2 input filenames and an output filename
jobList = []
for i in range(1000):
# Horizontally append a-XXXXX.png to b-XXXXX.png to make c-XXXXX.png
jobList.append( (f'a-{i:05d}.png', f'b-{i:05d}.png', f'c-{i:05d}.png') )
# Make a pool of processes - 1 per CPU core
with Pool() as pool:
# Map the list of jobs to the pool of processes
pool.map(doOne, jobList)

You can do this a little quicker with libvips. To join two images left-right, enter:
vips join left.png out.png result.png horizontal
To test, I made 200 pairs of 1200x800 PNGs like this:
for i in {1..200}; do cp x.png left$i.png; cp x.png right$i.png; done
Then tried a benchmark:
time parallel vips join left{}.png right{}.png result{}.png horizontal ::: {1..200}
real 0m42.662s
user 2m35.983s
sys 0m6.446s
With imagemagick on the same laptop I see:
time parallel convert left{}.png right{}.png +append result{}.png ::: {1..200}
real 0m55.088s
user 3m24.556s
sys 0m6.400s

You can do that much faster without Python, and using multi-processing with ImageMagick or libvips.
The first part is all setup:
Make 20 images, called a-000.png ... a-019.png that go from red to blue:
convert -size 64x64 xc:red xc:blue -morph 18 a-%03d.png
Make 20 images, called b-000.png ... b-019.png that go from yellow to magenta:
convert -size 64x64 xc:yellow xc:magenta -morph 18 b-%03d.png
Now append them side-by-side into c-000.png ... c-019.png
for ((f=0;f<20;f++))
do
z=$(printf "%03d" $f)
convert a-${z}.png b-${z}.png +append c-${z}.png
done
Those images look like this:
If that looks good, you can do them all in parallel with GNU Parallel:
parallel convert a-{}.png b-{}.png +append c-{}.png ::: {1..19}
Benchmark
I did a quick benchmark and made 20,000 images a-00000.png...a-019999.png and another 20,000 images b-00000.png...b-019999.png with each image 1200x800 pixels. Then I ran the following command to append each pair horizontally and write 20,000 output images c-00000.png...c-019999.png:
seq -f "%05g" 0 19999 | parallel --eta convert a-{}.png b-{}.png +append c-{}.png
and that takes 16 minutes on my MacBook Pro with all 12 CPU cores pegged at 100% throughout. Note that you can:
add spacers between the images,
write annotation onto the images,
add borders,
resize
if you wish and do lots of other processing - this is just a simple example.
Note also that you can get even quicker times - in the region of 10-12 minutes if you accept JPEG instead of PNG as the output format.

OpenCV 4.1.1.26 reports 90000.0 fps for a 25fps RTSP stream

I have an RTP/RTSP stream that's running at 25fps, as verified by ffprobe -i <URI>. Also, VLC plays back the RTSP stream at a real-time rate, but doesn't show me the FPS in the Media Information window.
However, when I use OpenCV 4.1.1.26 to retrieve the input stream's frame rate, it is giving me a response of 90000.0.
Question: How can I use OpenCV to probe for the correct frame rate of the RTSP stream? What would cause it to report 90000.0 instead of 25?
Here's my Python function to retrieve the frame rate:
import cv2
vid : cv2.VideoCapture = cv2.VideoCapture('rtsp://192.168.1.10/cam1/mpeg4')
def get_framerate(video: cv2.VideoCapture):
fps = video.get(cv2.CAP_PROP_FPS)
print('FPS is {0}'.format(fps))
get_framerate(vid)
MacOS Catalina
Python 3.7.4

I hope this helps you somehow. It is a simple calculator that takes cont captures and measure the beginning and the ending time. Then with the rule of three, i converted it to fps.
Related to you second question i read here that it could be due to bad installation. Also, you can check that your camera is working properly by printing ret variable. If it is true then you should be able to see the fps, if it is false then you can have an unpredictable result.
cv2.imshow() and key = cv2.waitKey(1) should be commented as it adds ping/delay resulting in bad measurement.
I post this as a comment because i do not have enough reputation points.
img = cv2.VideoCapture('rtsp://192.168.1.10/cam1/mpeg4')
while True:
if cont == 50:
a = datetime.now() - start
b = (a.seconds * 10e6 + a.microseconds)
print((a.seconds * 10e6 + a.microseconds), "fps = ", (50 * 10e6)/ b)
break
ret, frame = img.read()
# Comment for best test
cv2.imshow('fer', frame)
key = cv2.waitKey(1)
if key == ord('q'):
break
cont+=1
img.release()
cv2.destroyAllWindows()`

How to skip to a specific frame in a given spectrogram file

I'm encountering problems skipping ahead to a specific frame of a melspec feature set found here. The aim of getting features from the feature set is to analyse the difference in beats per second (BPS) so that i can match up the BPS of two tracks in order to mix between the two tracks or warp the timing of the track to synchronise the two pieces of music together. The feature set does specify the following:
Pre-extracted in the "feature" directory are space-delimited floating-point ASCII matrices:
beat_synchronus: one beat-synchronus vector per line
non-beat-synchronus: 512-sample hop frames # 22050Hz sample rate, one vector per line one vector per line:"
I'm not quite sure how to interpret this - is the melspec beat or non beat synchronous and how does that work in regards to delimitating frames?
I've got as far as working out the frame duration thanks to this answer but I don't know how to apply the knowledge gained from the frame duration to the task of navigating to a specific timecode or frame. The closest I've got is working out the offset divided by the frame to work out how many frames need to be skipped to get to the offset (1 second into the track for example gives 2583 frames). However, the file is not demarcated into lines and as far as I can tell is just a continuous list of entries. This leads to the question of what the size is of a given frame is (if that's the right terminology) is it the case that it is 2383 entries to the second need to be skipped to get to the right entry or is it the case that each frame has a specific number of entries and I need to skip 2583 frames of size x? what is size x (512?)?
I've been able to open the file for melspec but for the melspec file there are no delimiters between entries. It is instead a continuous list of entries.
The code I have so far is as follows to work out the duration of a frame, and therefore the number of frames in an offset track to be skipped. However this does not indicate the size of a given frame and how to access that from the file for the melspec.
spectrogram is the file_path for a given feature set. The offset is the time in seconds offset from the start of a track.
def skipToFrame(spectrogram, offset):
SAMPLE_RATE =22050
HOP_LENGTH = 512
#work out the duration of each frame.
FRAME_TIME = HOP_LENGTH/SAMPLE_RATE
# work out how many frames are in the offset period (e.g 1 second).
SHIFT_FRAMES = offset/FRAME_TIME
# readlines of file so that offset is applied.
with open(spectrogram) as feature_set:
indices = int(SHIFT_FRAMES)
for line in feature_set:
print(line)
feature_set.close()
This gives a list of 10 lines of results, which do not seem to be naturally delimited by line.

The sample file you are referring is a matrix of 128 x 7392 values.
To better understand the format of this file, you may look at the extractFeatures.py script used to extract the features. You may notice that the melspec feature is described as "non-beat-synchronus" and computed using librosa.feature.melspectrogram, using mostly default arguments and producing an output S of n_mel rows by t columns.
To figure out the value of n_mel you need to look at librosa.filters.mel, which indicates a default value of 128. The number of frames t on the other hand is computed internally in librosa.util.frame as 1 + int((len(y) - frame_length) / hop_length), where the frame_length uses the default value 2048 and the hop_length uses the default value 512.
To summarize, the 128 rows correspond to the 128 MEL-frequency bins, and the 7392 columns correspond to time frames.
You could thus use the following to extract the column of interest:
def skipToFrame(spectrogram, offset):
SAMPLE_RATE =22050
HOP_LENGTH = 512
#work out the duration of each frame.
FRAME_TIME = HOP_LENGTH/SAMPLE_RATE
# work out how many frames are in the offset period (e.g 1 second).
SHIFT_FRAMES = offset/FRAME_TIME
# readlines of file so that offset is applied.
with open(spectrogram) as feature_set:
indices = int(SHIFT_FRAMES)
for line in feature_set:
print(line.split(" ")[indices])
feature_set.close()
Using numpy you could also read the entire spectrogram and address a specific column:
import numpy as np
def skipToFrame(spectrogram, offset):
SAMPLE_RATE =22050
HOP_LENGTH = 512
#work out the duration of each frame.
FRAME_TIME = HOP_LENGTH/SAMPLE_RATE
# work out how many frames are in the offset period (e.g 1 second).
SHIFT_FRAMES = offset/FRAME_TIME
data = np.loadtxt(spectrogram)
column = int(SHIFT_FRAMES)
print(data[:,column])
Going back on the fact that the feature extraction was done using librosa, you may also consider using librosa.core.time_to_frames instead of manually computing the frame number:
def skipToFrame(spectrogram, offset):
SHIFT_FRAMES = librosa.core.time_to_frames(offset, sr=22050, hop_length=512, n_fft=2048)
...
On a final note, you should be aware that each of these time frame uses 2048 samples, but they overlap such that each successive frame advances 512 sample relative to the previous sample. So the frames cover the following time intervals:
frame # | start (s) | end (s)
================================
1 | 0.000 | 0.093
2 | 0.023 | 0.116
3 | 0.046 | 0.139
...
41 | 0.929 | 1.022
42 | 0.952 | 1.045
...
7392 | 171.619 | 171.712

GPU vs CPU end to end latency for dynamic image resizing

I have currently used OpenCV and ImageMagick for some throughput benchmarking and I am not finding working with GPU to be much faster than CPUs. Our usecase on site is to resize dynamically to the size requested from a master copy based on a service call and trying to evaluate if having GPU makes sense to resize per service call dynamically.
Sharing the code I wrote for OpenCV. I am running the following function for all the images stored in a folder serially and Ultimately I am running N such processes to achieve X number of image resizes.I want to understand if my approach is incorrect to evaluate or if the usecase doesn't fit typical GPU usecases. And what exactly might be limiting GPU performance. I am not even maximizing the utilization to anywhere close to 100%
resizeGPU.cpp:
{
cv::Mat::setDefaultAllocator(cv::cuda::HostMem::getAllocator (cv::cuda::HostMem::AllocType::PAGE_LOCKED));
auto t_start = std::chrono::high_resolution_clock::now();
Mat src = imread(input_file,CV_LOAD_IMAGE_COLOR);
auto t_end_read = std::chrono::high_resolution_clock::now();
if(!src.data){
std::cout<<"Image Not Found: "<< input_file << std::endl;
return;
}
cuda::GpuMat d_src;
d_src.upload(src,stream);
auto t_end_h2d = std::chrono::high_resolution_clock::now();
cuda::GpuMat d_dst;
cuda::resize(d_src, d_dst, Size(400, 400),0,0, CV_INTER_AREA,stream);
auto t_end_resize = std::chrono::high_resolution_clock::now();
Mat dst;
d_dst.download(dst,stream);
auto t_end_d2h = std::chrono::high_resolution_clock::now();
std::cout<<"read,"<<std::chrono::duration<double, std::milli>(t_end_read-t_start).count()<<",host2device,"<<std::chrono::duration<double, std::milli>(t_end_h2d-t_end_read).count()
<<",resize,"<<std::chrono::duration<double, std::milli>(t_end_resize-t_end_h2d).count()
<<",device2host,"<<std::chrono::duration<double, std::milli>(t_end_d2h-t_end_resize).count()
<<",total,"<<std::chrono::duration<double, std::milli>(t_end_d2h-t_start).count()<<endl;
}
resizeCPU.cpp:
auto t_start = std::chrono::high_resolution_clock::now();
Mat src = imread(input_file,CV_LOAD_IMAGE_COLOR);
auto t_end_read = std::chrono::high_resolution_clock::now();
if(!src.data){
std::cout<<"Image Not Found: "<< input_file << std::endl;
return;
}
Mat dst;
resize(src, dst, Size(400, 400),0,0, CV_INTER_AREA);
auto t_end_resize = std::chrono::high_resolution_clock::now();
std::cout<<"read,"<<std::chrono::duration<double, std::milli>(t_end_read-t_start).count()<<",resize,"<<std::chrono::duration<double, std::milli>(t_end_resize-t_end_read).count()
<<",total,"<<std::chrono::duration<double, std::milli>(t_end_resize-t_start).count()<<endl;
Compiling : g++ -std=c++11 resizeCPU.cpp -o resizeCPU pkg-config --cflags --libs opencv
I am running each program N number of times controlled by following code : runMultipleGPU.sh
#!/bin/bash
echo $1
START=1
END=$1
for (( c=$START; c<=$END; c++ ))
do
./resizeGPU "$c" &#>/dev/null #&disown;
done
wait
echo All done
Run : ./runMultipleGPU.sh
Those timers around lead to following aggregate data
No_processes resizeCPU resizeGPU memcpyGPU totalresizeGPU
1 1.51 0.55 2.13 2.68
10 5.67 0.37 2.43 2.80
15 6.35 2.30 12.45 14.75
20 6.30 2.05 10.56 12.61
30 8.09 4.57 23.97 28.55
No of images run per process : 267
Average size of the image: 624Kb
According to data above, as we increase the number of processes(leading to increased number of simultaneous resizes) the resize perform
ance(which includes actual resize + host to device and device to host copy) increases significantly on GPU vs CPU.
Similar results after using ImageMagick which uses OpenCL beneath
Code :
setenv("MAGICK_OCL_DEVICE","OFF",1); //Turn in ON to use GPU acceleration
Image image;
auto t_start_read = std::chrono::high_resolution_clock::now();
image.read( full_path );
auto t_end_read = std::chrono::high_resolution_clock::now();
image.resize( Geometry(400,400) );
auto t_end_resize = std::chrono::high_resolution_clock::now();
Results :
No_procs resizeCPU resizeGPU
1 63.23 8.54
10 76.16 31.04
15 76.56 50.79
20 76.58 71.68
30 86.29 140.17
Test Machine configuration:
4 GPU (Tesla P100) - but test only utilizes 1 GPU
64 CPU cores (over Intel Xeon 2680 v4 CPU )
OpenCV version : 3.4.0
ImageMagick version : 6.9.9-26 Q16 x86_64 2018-01-17
Cuda Toolkit : 9.0

Highly propable this is too late to help you. However for people looking at this answer this is my suggestion to improve performance. The way you are setting pinned memory does not give you the boost you are looking for.
This is: Using
//method 1
cv::Mat::setDefaultAllocator(cv::cuda::HostMem::getAllocator(cv::cuda::HostMem::AllocType::PAGE_LOCKED));
In the comments of this discussion. Somebody suggested doing as you. The person answering said that it was slower. I was timing the implementation of a sobel derivatives close to the one in Coldvision.io Sobel The main steps are:
Read color image
gaussian blurring of the color image using a
radius of 3 and a delta of 1;
grayscale conversion;
computing the x and y gradiants
merging them into the final output image.
Instead I implemented a version swaping the order of step 2 and 3. Converting to gray scale first and then denoising the result by passing a gaussian.
I was running openCV 3.4 in windows 10. Cuda 9.0. My CPU is an i7-6820HQ. GPU is a Quadro M1200.
I try your method and this one:
//Method 2
//allocate pinned memory
cv::cuda::HostMem memory(siz.height, siz.width, CV_8U, cv::cuda::HostMem::PAGE_LOCKED);
//Read input image from the disk
Mat input = imread(input_file, CV_LOAD_IMAGE_COLOR);
if (input.empty())
{
std::cout << "Image Not Found: " << input_file << std::endl;
return;
}
input.copyTo(memory);
// copy the input image from CPU to GPU memory
cuda::GpuMat gpuInput;
cv::cuda::Stream stream;
gpuInput.upload(memory, stream);
//Do your processing...
//allocate pinned memory for output
cv::cuda::HostMem outMemory(siz.height, siz.width, CV_8U, cv::cuda::HostMem::PAGE_LOCKED);
gpuOutput.download(outMemory, stream);
cv::Mat output = outMemory.createMatHeader();
I calculated the gain as: (t1-t2)/t1*100. Where t1 is the time running the code normally. t2 running it using pinned memory. The negative values is when the method is slower than running in non-pinned memory.
image size Gain % Method 1 Gain % Method 2
800x600 2.9 8.2
1280x1024 2.5 15.3
1600x1200 0.2 7.0
2048x1536 -2.3 14.6
4096x3072 -1.0 17.2

openCV RunningAvg implementation

I am writing a small script (in Python) that generates and updates a running average of a camera feed. When I call cv.RunningAvg it returns:
cv2.error: func != 0
Where am I stumbling in implementing cv.RunningAvg? Script follows:
import cv
feed = cv.CaptureFromCAM(0)
frame = cv.QueryFrame(feed)
moving_average = cv.QueryFrame(feed)
cv.NamedWindow('live', cv.CV_WINDOW_AUTOSIZE)
def loop():
frame = cv.QueryFrame(feed)
cv.ShowImage('live', frame)
c = cv.WaitKey(10)
cv.RunningAvg(frame, moving_average, 0.020, None)
while True:
loop()

I am not sure about the error, but check out the documentation for cv.RunningAvg
It says destination should be 32 or 64-bit floating point.
So I made a small correction in your code and it works. I created a 32-bit floating point image to store running average values, then another 8 bit image so that I can show running average image :
import cv2.cv as cv
feed = cv.CaptureFromCAM(0)
frame = cv.QueryFrame(feed)
moving_average = cv.CreateImage(cv.GetSize(frame),32,3) # image to store running avg
avg_show = cv.CreateImage(cv.GetSize(frame),8,3) # image to show running avg
def loop():
frame = cv.QueryFrame(feed)
c = cv.WaitKey(10)
cv.RunningAvg(frame, moving_average, 0.1, None)
cv.ConvertScaleAbs(moving_average,avg_show) # converting back to 8-bit to show
cv.ShowImage('live', frame)
cv.ShowImage('avg',avg_show)
while True:
loop()
cv.DestroyAllWindows()
Now see the result :
At a particular instant, I saved a frame and its corresponding running average frame.
Original frame :
You can see the obstacle (my hand) blocks the objects in behind.
Now running average frame :
It almost removed my hand and shows objects in background.
That is how it is a good tool for background subtraction.
One more example from a typical traffic video :
You can see more details and samples here : http://opencvpython.blogspot.com/2012/07/background-extraction-using-running.html

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart