Picasso producing OutOfMemoryError - picasso

We are using Picasso to load all images in our app, from small avatars to large full-screen images, and we are getting 1 of these errors for every 10 daily active users. The Picasso cache is filling up, but our understanding is that it should maintain itself.
Our logs indicate these errors are occurring most often when loading the large full screen images (1080x1920) and large avatars (720x720) on high end devices (Galaxy S4), but occasionally on small avatars (135x135).
com.couchsurfing.mobile.data.PicassoException: Error while loading image with Picasso
at com.couchsurfing.mobile.data.DataModule$1.onImageLoadFailed(DataModule.java:158)
at com.squareup.picasso.Picasso.complete(Picasso.java:374)
at com.squareup.picasso.Picasso$1.handleMessage(Picasso.java:97)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:137)
at android.app.ActivityThread.main(ActivityThread.java:5419)
at java.lang.reflect.Method.invokeNative(Method.java)
at java.lang.reflect.Method.invoke(Method.java:525)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:1187)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1003)
at dalvik.system.NativeStart.main(NativeStart.java)
Caused by: java.lang.RuntimeException: ===============BEGIN PICASSO STATS ===============
Memory Cache Stats
Max Cache Size: 19173961
Cache Size: 17988408
Cache % Full: 94
Cache Hits: 228
Cache Misses: 244
Network Stats
Download Count: 131
Total Download Size: 3375735
Average Download Size: 25768
Bitmap Stats
Total Bitmaps Decoded: 206
Total Bitmap Size: 144932008
Total Transformed Bitmaps: 160
Total Transformed Bitmap Size: 40233240
Average Bitmap Size: 703553
Average Transformed Bitmap Size: 195306
===============END PICASSO STATS ===============
at com.squareup.picasso.BitmapHunter.run(BitmapHunter.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:390)
at java.util.concurrent.FutureTask.run(FutureTask.java:234)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1080)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:573)
at java.lang.Thread.run(Thread.java:841)
at com.squareup.picasso.Utils$PicassoThread.run(Utils.java:394)
Caused by: java.lang.OutOfMemoryError
at android.graphics.BitmapFactory.nativeDecodeStream(BitmapFactory.java)
at android.graphics.BitmapFactory.decodeStream(BitmapFactory.java:623)
at com.squareup.picasso.NetworkBitmapHunter.decodeStream(NetworkBitmapHunter.java:118)
at com.squareup.picasso.NetworkBitmapHunter.decode(NetworkBitmapHunter.java:72)
at com.squareup.picasso.BitmapHunter.hunt(BitmapHunter.java:144)
at com.squareup.picasso.BitmapHunter.run(BitmapHunter.java:101)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:390)
at java.util.concurrent.FutureTask.run(FutureTask.java:234)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1080)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:573)
at java.lang.Thread.run(Thread.java:841)
at com.squareup.picasso.Utils$PicassoThread.run(Utils.java:394)

I've had the same issue, and one temporary workaround is:
<application
...
android:largeHeap="true">
For now, I haven't found another solution (maybe I haven't searched enough), but many people run into OOM errors with Picasso.

Are you calling fit() when loading images? I suspect this was my issue: I assumed Picasso downsampled images automatically, but you still need to tell it to do so.
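fit() (or an explicit resize()) lets Picasso decode a bitmap sized to the target view instead of the full-resolution image. The rough Python sketch below illustrates why that matters; it is not Picasso's code. It assumes 4 bytes per pixel (Android's ARGB_8888) and uses the power-of-two inSampleSize calculation from Android's bitmap-loading guidance; Picasso's own sample-size calculation may differ, and the helper names and the 270x480 target view are hypothetical.

```python
def bitmap_bytes(width, height, bytes_per_pixel=4):
    """Memory for a decoded bitmap; 4 bytes/pixel corresponds to ARGB_8888."""
    return width * height * bytes_per_pixel


def in_sample_size(src_w, src_h, req_w, req_h):
    """Largest power-of-two sample size that keeps both decoded
    dimensions at least as large as the requested (target view) size."""
    sample = 1
    if src_h > req_h or src_w > req_w:
        half_h, half_w = src_h // 2, src_w // 2
        while half_h // sample >= req_h and half_w // sample >= req_w:
            sample *= 2
    return sample


if __name__ == "__main__":
    full = bitmap_bytes(1080, 1920)             # full-screen image, no downsampling
    s = in_sample_size(1080, 1920, 270, 480)    # hypothetical 270x480 target view
    small = bitmap_bytes(1080 // s, 1920 // s)  # what a sampled decode would need
    print(full, s, small)
```

For scale, one undownsampled 1080x1920 bitmap needs 8,294,400 bytes, so the Max Cache Size in the stats above (19,173,961 bytes) holds only about two of them.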


Change video stream resolution in YoloV4 demo

Here's what is shown when loading the live stream demo for YOLOv4:
Webcam index: 2
[ WARN:0] global ../modules/videoio/src/cap_gstreamer.cpp (935) open OpenCV | GStreamer warning: Cannot query video position: status=0, value=-1, duration=-1
Video stream: 2304 x 1536
Objects:
Then it starts detecting objects at 2 fps.
How do I change the video stream resolution to 1080p or 720p? The frame rate is very slow, and lowering the resolution appears to be the fix.
I can't find the setting in the Makefile or the cfg folder. Any thoughts? Is this an OpenCV problem?
Thanks!
cfg settings:
[net]
batch=64
subdivisions=8
# Training
#width=512
#height=512
width=320
height=320
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.0013
burn_in=1000
max_batches = 500500
policy=steps
steps=400000,450000
scales=.1,.1
I tried the built-in camera and my phone (as an IP camera) and got 1080p on both, with smooth results. I couldn't find anywhere to change the webcam settings, which are stuck at 2304x1536. Where would the camera settings be located?
After searching around for a solution to this issue myself I finally found it!
In the darknet/src/ folder is a file named "image_opencv.cpp". At lines 597 and 598 you will find the following two commented-out commands:
//cap->set(CV_CAP_PROP_FRAME_WIDTH, 1280);
//cap->set(CV_CAP_PROP_FRAME_HEIGHT, 960);
After uncommenting these commands, many more errors showed up. This is because YOLOv4 (and my install) uses OpenCV 4.1.1, which names these constants differently (cv::CAP_PROP_* instead of CV_CAP_PROP_*). Your resolution should change to 1920x1080 if you replace the two aforementioned commands with these:
cap->set(cv::CAP_PROP_FRAME_WIDTH, 1920);
cap->set(cv::CAP_PROP_FRAME_HEIGHT, 1080);
Note that the comment slashes have been removed to activate the commands.

Handling Xarray/Dask Memory

I'm trying to use Xarray and Dask to open a multi-file dataset. However, I'm running into memory errors.
I have files that are typically this shape:
xr.open_dataset("/work/ba0989/a270077/coupled_ice_paper/model_data/coupled/LIG_coupled/outdata/fesom//LIG_coupled_fesom_thetao_19680101.nc")
<xarray.Dataset>
Dimensions: (depth: 46, nodes_2d: 126859, time: 366)
Coordinates:
* time (time) datetime64[ns] 1968-01-02 1968-01-03 ... 1969-01-01
* depth (depth) float64 -0.0 10.0 20.0 30.0 ... 5.4e+03 5.65e+03 5.9e+03
Dimensions without coordinates: nodes_2d
Data variables:
thetao (time, depth, nodes_3d) float32 ...
Attributes:
output_schedule: unit: d first: 1 rate: 1
30 files --> 41.5 GB
I also can set up a dask.distributed Client object:
Client()
<Client: 'tcp://127.0.0.1:43229' processes=8 threads=48, memory=68.72 GB>
So I suppose there is enough memory for the data to be loaded. However, when I then run xr.open_mfdataset, I very often get these sorts of warnings:
distributed.worker - WARNING - Memory use is high but worker has no data to store to disk. Perhaps some other process is leaking memory? Process memory: 8.25 GB -- Worker memory limit: 8.59 GB
I guess there is something I can do with the chunks argument?
Any help would be very appreciated; unfortunately I'm not sure where to begin trying. I could, in principle, open just the first file (they will always have the same shape) to figure out how to ideally rechunk the files.
Thanks!
Paul
Examples of the chunks and parallel keywords for the opening functions, which control how you use dask, can be found in this doc section.
That should be all you need!
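As a rough check on where the memory goes, the Python sketch below computes the uncompressed size of thetao per chunk. It assumes float32 (4 bytes per value) and that the node dimension has the 126,859 entries listed under Dimensions (the listing calls it nodes_2d while the variable shows nodes_3d); the helper names and the 200 MB per-chunk budget are hypothetical.

```python
import math


def chunk_nbytes(chunk_shape, itemsize=4):
    """Bytes in one chunk; itemsize=4 matches a float32 variable."""
    return math.prod(chunk_shape) * itemsize


def time_steps_per_chunk(n_time, per_step_shape, budget_bytes, itemsize=4):
    """Largest number of time steps per chunk that stays within budget_bytes."""
    step = chunk_nbytes(per_step_shape, itemsize)
    if step > budget_bytes:
        raise ValueError("one time step alone exceeds the budget; chunk another dimension")
    return min(n_time, budget_bytes // step)


if __name__ == "__main__":
    # One chunk per file: (time, depth, nodes) = (366, 46, 126859)
    whole_file = chunk_nbytes((366, 46, 126859))
    # Hypothetical 200 MB budget per chunk, chunking along time only
    steps = time_steps_per_chunk(366, (46, 126859), 200_000_000)
    print(whole_file, steps)
    # The result could then be used as, e.g.
    # xr.open_mfdataset(paths, chunks={"time": steps}, parallel=True)
```

Under those assumptions one whole file is about 8.5 GB uncompressed, essentially the 8.59 GB per-worker limit in the warning. The xarray documentation for open_mfdataset says that, by default, chunks are chosen to load entire input files into memory at once, so passing a smaller chunk along time (here 8 steps, about 187 MB) is a reasonable first thing to try; the best chunk shape also depends on how the data is reduced afterwards.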

python opencv create image from bytearray

I am capturing video from a Ricoh Theta V camera. It delivers the video as Motion JPEG (MJPEG). To get the video you have to do an HTTP POST, which, alas, means I cannot use the cv2.VideoCapture(url) feature.
So the way to do this per numerous posts on the web and SO is something like this:
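import cv2
import numpy as np

# 'stream' is the file-like body of the HTTP POST response (not shown here)
buf = b''
while True:
    buf += stream.read(1024)
    a = buf.find(b'\xff\xd8')     # JPEG SOI (start of image) marker
    b = buf.find(b'\xff\xd9', a)  # JPEG EOI (end of image) marker, searched after SOI
    if a != -1 and b != -1:
        jpg = buf[a:b+2]
        buf = buf[b+2:]
        i = cv2.imdecode(np.frombuffer(jpg, dtype=np.uint8), cv2.IMREAD_COLOR)
        cv2.imshow('i', i)
        if cv2.waitKey(1) == 27:  # Esc key quits
            exit(0)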
That actually works, except it is slow. I'm processing a 1920x1080 JPEG stream on a MacBook Pro running OS X 10.12.6. The call to imdecode takes approximately 425,000 microseconds to process each image.
Any idea how to do this without imdecode, or how to make imdecode faster? I'd like it to work at 60 FPS with HD video (at least).
I'm using Python 3.7 and OpenCV 4.
Updated Again
I looked into JPEG decoding from the memory buffer using PyTurboJPEG, the code goes like this to compare with OpenCV's imdecode():
#!/usr/bin/env python3
import cv2
import numpy as np
from turbojpeg import TurboJPEG, TJPF_GRAY, TJSAMP_GRAY
# Load image into memory
r = open('image.jpg','rb').read()
inp = np.asarray(bytearray(r), dtype=np.uint8)
# Decode JPEG from memory into Numpy array using OpenCV
i0 = cv2.imdecode(inp, cv2.IMREAD_COLOR)
# Use default library installation
jpeg = TurboJPEG()
# Decode JPEG from memory using turbojpeg
i1 = jpeg.decode(r)
cv2.imshow('Decoded with TurboJPEG', i1)
cv2.waitKey(0)
And the answer is that TurboJPEG is 7x faster! That is 4.6ms versus 32.2ms.
In [18]: %timeit i0 = cv2.imdecode(inp, cv2.IMREAD_COLOR)
32.2 ms ± 346 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [19]: %timeit i1 = jpeg.decode(r)
4.63 ms ± 55.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Kudos to @Nuzhny for spotting it first!
Updated Answer
I have been doing some further benchmarks on this and was unable to verify your claim that it is faster to save an image to disk and read it with imread() than it is to use imdecode() from memory. Here is how I tested in IPython:
import cv2
import numpy as np
# First use 'imread()'
%timeit i1 = cv2.imread('image.jpg', cv2.IMREAD_COLOR)
116 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Now prepare the exact same image in memory
r = open('image.jpg','rb').read()
inp = np.asarray(bytearray(r), dtype=np.uint8)
# And try again with 'imdecode()'
%timeit i0 = cv2.imdecode(inp, cv2.IMREAD_COLOR)
113 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
So, I find imdecode() around 3% faster than imread() on my machine. Even if I include the np.asarray() in the timing, it is still quicker from memory than from disk - and I have seriously fast 3 GB/s NVMe disks on my machine...
Original Answer
I haven't tested this but it seems to me that you are doing this in a loop:
read 1k bytes
append it to a buffer
look for JPEG SOI marker (0xffd8)
look for JPEG EOI marker (0xffd9)
if you have found both the start and the end of a JPEG frame, decode it
1) Now, most JPEG images with any interesting content I have seen are between 30 kB and 300 kB, so you are going to do 30-300 append operations on a buffer. I don't know much about Python, but I guess each append may cause a reallocation of memory, which may be slow.
2) Next you are going to look for the SOI marker in the first 1kB, then again in the first 2kB, then again in the first 3kB, then again in the first 4kB - even if you have already found it!
3) Likewise, you are going to look for the EOI marker in the first 1kB, the first 2kB...
So, I would suggest you try:
1) allocating a bigger buffer at the start and acquiring directly into it at the appropriate offset
2) not searching for the SOI marker if you have already found it - e.g. set it to -1 at the start of each frame and only try to find it if it is still -1
3) only look for the EOI marker in the new data on each iteration, not in all the data you have already searched on previous iterations
4) furthermore, actually, don't bother looking for the EOI marker unless you have already found the SOI marker, because the end of a frame without the corresponding start is no use to you anyway - it is incomplete.
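Suggestions 2) to 4), plus a growable bytearray in the spirit of 1), can be sketched in Python as a generator. This is an illustration under assumptions, not tested against the Theta V: iter_jpeg_frames is a hypothetical name, and stream is assumed to be any file-like object with read(n), such as the HTTP response body. Like the original loop, it treats the first EOI after an SOI as the end of the frame, so a JPEG carrying an embedded EXIF thumbnail would be cut short.

```python
import io

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def iter_jpeg_frames(stream, read_size=1024):
    """Yield each complete JPEG (SOI..EOI inclusive) found in an MJPEG byte stream.

    - bytearray += extends the buffer in place
    - once SOI is found its position is remembered, not searched for again
    - EOI is only searched for after SOI, and only in bytes not yet scanned
      (minus one byte, since a 2-byte marker can straddle two reads)
    """
    buf = bytearray()
    soi = -1     # position of SOI in buf, or -1 if not found yet
    scanned = 0  # buf[:scanned] has already been searched
    while True:
        chunk = stream.read(read_size)
        if not chunk:
            return  # end of stream; any partial frame is dropped
        buf += chunk
        while True:
            if soi == -1:
                soi = buf.find(SOI, max(scanned - 1, 0))
                if soi == -1:
                    scanned = len(buf)
                    break
                scanned = soi + 2
            eoi = buf.find(EOI, max(scanned - 1, soi + 2))
            if eoi == -1:
                scanned = len(buf)
                break
            yield bytes(buf[soi:eoi + 2])
            del buf[:eoi + 2]  # drop the frame just returned (and junk before it)
            soi, scanned = -1, 0


if __name__ == "__main__":
    fake = io.BytesIO(b"--boundary\r\n" + SOI + b"frame-1" + EOI
                      + b"\r\n--boundary\r\n" + SOI + b"frame-2" + EOI)
    for jpg in iter_jpeg_frames(fake, read_size=4):
        print(jpg)
```

Each yielded frame can then be decoded as before with cv2.imdecode(np.frombuffer(jpg, dtype=np.uint8), cv2.IMREAD_COLOR), or with TurboJPEG's decode().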
I may be wrong in my assumptions, (I have been before!) but at least if they are public someone cleverer than me can check them!!!
I recommend using libjpeg-turbo. It has a Python API: PyTurboJPEG.

ImageMagick memory usage

I have 100 PNG files, each 8250x4090 pixels. I need to append them with ImageMagick into one big PNG file (82500x40900) so that I have 10 rows and 10 columns. I know what the command must look like, but I get these errors:
convert.exe: unable to extend cache `C:\Row_345.png': No space left on device # error/cache.c/OpenPixelCache/3689.
convert.exe: Memory allocation failed `C:\Row_345.png' # error/png.c/WriteOnePNGImage/8725.
First question: How much space is needed (approximately)? I have 8 GB of RAM and 30 GB of free SSD space, and it wasn't enough. The pictures have polygons and lines in up to 5 different colors. (The biggest PNG is 300 KB.)
Second question: Is there a smarter way to do this so that it won't use that much space?
ImageMagick needs 8 bytes per pixel if you are using a Q16 build. A Q8 build only needs 4 bytes per pixel.
82500 * 40900 * 8 = about 27Gbytes
82500 * 40900 * 4 = about 13.5 Gbytes
The size of the PNG is irrelevant; ImageMagick stores them uncompressed.
Possibly ImageMagick is trying to hold two copies -- your 100 small images plus the large result. It may be that you'll have enough memory plus disk to run your conversion with ImageMagick-Q8.
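The arithmetic above can be extended to the intermediate images in a quick Python sketch. It uses the 8 (Q16) and 4 (Q8) bytes-per-pixel figures from this answer; real pixel-cache overhead may differ by ImageMagick version and channel count, and the helper name is hypothetical.

```python
# Pixel-cache bytes per pixel quoted above: Q16 builds 8, Q8 builds 4.
BYTES_PER_PIXEL = {"Q16": 8, "Q8": 4}


def pixel_cache_bytes(width, height, quantum="Q16"):
    """Approximate uncompressed pixel-cache size of one image."""
    return width * height * BYTES_PER_PIXEL[quantum]


if __name__ == "__main__":
    sizes = {
        "one input tile": (8250, 4090),
        "one row of 10 tiles": (82500, 4090),
        "full 10x10 mosaic": (82500, 40900),
    }
    for name, (w, h) in sizes.items():
        for q in ("Q16", "Q8"):
            print(f"{name:20} {q:4} {pixel_cache_bytes(w, h, q) / 1e9:6.2f} GB")
```

A single row strip needs about 2.7 GB with Q16, which is why building rows first is much lighter; the later steps that stack rows still produce an image growing toward the full 82500x40900 size, so they approach the full figure again.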
Try doing just a single row of 10 at a time, ten times - so you get 10 rows of 10. Then do row1 plus row2. Then rows 1&2 plus row 3.
convert 1.png 2.png 3.png ... +append row1.png
convert 11.png 12.png 13.png ... +append row2.png
...
convert 91.png 92.png 93.png ... +append row10.png
Then
convert row1.png row2.png -append row1and2.png
You can add -debug cache to your ImageMagick convert command like this:
convert -debug cache 1.png 2.png 3.png ... +append row1.png
You can also look at your resource settings as to what is available to ImageMagick like this:
identify -list resource
File Area Memory Map Disk Thread Time
-------------------------------------------------------------------------------
768 1.0386GB 3.8692GiB 7.7384GiB unlimited 4 unlimited
And set resource limits like this:
convert -limit memory 32MiB ...

Improve performance with libpng

I have a microcontroller with an LCD display. I need to display several PNG images. Since the microcontroller's performance is limited, displaying an image takes too long.
I ran benchmarks and found that most of the time is spent in libpng, not in accessing the display memory or the storage where the (compressed) file is located.
I can manipulate the PNG files before transferring them to the microcontroller.
The data is actually read inside the callback function registered with png_set_read_fn.
Edit:
The pictures are encoded with 8 bits per color plus transparency resulting in 32 bits per pixel. But most of the pictures have gray colors.
Here is the sequence of functions that I use to convert:
png_ptr = png_create_read_struct(PNG_LIBPNG_VER_STRING, 0, show_png_error, show_png_warn);
info_ptr = png_create_info_struct(png_ptr);
end_info = png_create_info_struct(png_ptr);
png_set_user_limits(png_ptr, MAX_X, MAX_Y);
png_set_read_fn(png_ptr, 0, &read_callback);
png_set_sig_bytes(png_ptr, 0);
png_read_info(png_ptr, info_ptr);
/* transformations must be requested before png_read_update_info() */
png_set_invert_alpha(png_ptr);
png_read_update_info(png_ptr, info_ptr);
result->image = malloc(required_size);
height = png_get_image_height(png_ptr, info_ptr);
png_bytep *row_pointers = malloc(sizeof(png_bytep) * height);
for (i = 0; i < height; ++i)
    row_pointers[i] = result->image + (i * png_get_rowbytes(png_ptr, info_ptr));
png_read_image(png_ptr, row_pointers);
png_read_end(png_ptr, end_info);
free(row_pointers);
png_destroy_read_struct(&png_ptr, &info_ptr, &end_info);
What parameters should be considered to get the fastest decompression?
It depends upon the nature of the images.
For photos, pngcrush method 12 (filter type 1, zlib strategy 2, zlib level 2) works well. For images with 256 or fewer colors, method 7 (filter type 0, zlib level 9, zlib strategy 0) works well.
Method 12 also happens to be a very fast compressor but as I understand it, that does not matter to you. zlib strategy 2 is Huffman-only compression so the result is the same for any non-zero zlib compression level.
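These are write-side settings, so they belong in the code that creates the PNG files before they are transferred to the microcontroller, not in the reading code above. To obtain the same behavior as pngcrush method 7, use
png_set_compression_level(png_ptr, 9);
png_set_compression_strategy(png_ptr, 0); /* Z_DEFAULT_STRATEGY */
png_set_filter(png_ptr, PNG_FILTER_TYPE_BASE, PNG_FILTER_NONE);
and to get pngcrush method 12 behavior,
png_set_compression_level(png_ptr, 2);
png_set_compression_strategy(png_ptr, 2); /* Z_HUFFMAN_ONLY */
png_set_filter(png_ptr, PNG_FILTER_TYPE_BASE, PNG_FILTER_SUB);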
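To see what the two pngcrush settings mean below the API level, here is a small Python sketch. The image and helper names are hypothetical. It builds a mostly-gray RGBA image and prepares it with filter type 0, level 9 and the default strategy (method 7), and with filter type 1 (Sub), level 2 and Z_HUFFMAN_ONLY (method 12), then checks that the Sub filter round-trips. Reversing a filter is extra per-byte work for the decoder, which filter type 0 avoids; which option is actually faster on your microcontroller can only be settled by benchmarking there.

```python
import zlib


def gray_rgba_row(width, y):
    """One scanline of a hypothetical mostly-gray 8-bit RGBA image."""
    row = bytearray()
    for x in range(width):
        v = (x * 4 + y) & 0xFF
        row += bytes((v, v, v, 255))
    return bytes(row)


def no_filter(rows):
    """PNG filter type 0 (None): each scanline is stored as-is after a 0 byte."""
    return b"".join(b"\x00" + row for row in rows)


def sub_filter(rows, bpp):
    """PNG filter type 1 (Sub): store each byte minus the byte bpp positions
    to its left (mod 256), after a filter-type byte of 1."""
    out = bytearray()
    for row in rows:
        out.append(1)
        for x, value in enumerate(row):
            left = row[x - bpp] if x >= bpp else 0
            out.append((value - left) & 0xFF)
    return bytes(out)


def unfilter_sub(data, row_len, bpp):
    """Undo the Sub filter; a decoder has to do this for every byte."""
    rows, stride = [], row_len + 1
    for start in range(0, len(data), stride):
        if data[start] != 1:
            raise ValueError("expected a Sub-filtered scanline")
        filtered = data[start + 1:start + stride]
        raw = bytearray(row_len)
        for x in range(row_len):
            left = raw[x - bpp] if x >= bpp else 0
            raw[x] = (filtered[x] + left) & 0xFF
        rows.append(bytes(raw))
    return rows


def deflate(data, level, strategy):
    """zlib stream with explicit level and strategy (memLevel 8 is zlib's default)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, zlib.MAX_WBITS, 8, strategy)
    return comp.compress(data) + comp.flush()


if __name__ == "__main__":
    width, height, bpp = 64, 16, 4
    rows = [gray_rgba_row(width, y) for y in range(height)]
    # "method 7": no filter, level 9, default strategy
    m7 = deflate(no_filter(rows), 9, zlib.Z_DEFAULT_STRATEGY)
    # "method 12": Sub filter, level 2, Huffman-only strategy
    m12 = deflate(sub_filter(rows, bpp), 2, zlib.Z_HUFFMAN_ONLY)
    print("method 7:", len(m7), "bytes; method 12:", len(m12), "bytes")
    assert unfilter_sub(zlib.decompress(m12), width * bpp, bpp) == rows
```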
