Golang async face detection - opencv

I'm using an OpenCV binding library for Go and trying to asynchronously detect objects in 10 images, but I keep getting a panic. Detecting only 4 images never fails.
var wg sync.WaitGroup
for j := 0; j < 10; j++ {
	wg.Add(1)
	go func(i int) {
		image := opencv.LoadImage(strconv.Itoa(i) + ".jpg")
		defer image.Release()
		faces := cascade.DetectObjects(image)
		fmt.Println((len(faces) > 0))
		wg.Done()
	}(j)
}
wg.Wait()
I'm fairly new to OpenCV and Go and trying to figure out where the problem lies. I'm guessing some resource is being exhausted, but which one?

Each time you call DetectObjects, the underlying OpenCV implementation builds a tree of classifiers and stores it inside the cascade. You can see part of the handling of these chunks of memory at https://github.com/Itseez/opencv/blob/master/modules/objdetect/src/haar.cpp around line 2002.
Your original code had only one cascade, as a global. Each goroutine calling DetectObjects used the same root cascade. Each new image would free the old memory and rebuild a new tree; eventually the goroutines would stomp on each other's memory and cause a dereference through 0, causing the panic.
Moving the allocation of the cascade inside the goroutine allocates a new one for each DetectObjects call, so they do not share any memory.
The fact that it never happened on 4 images but failed on 5 is the nature of computing: you got lucky with 4 images and never saw the problem, and you always saw it on 5 images because exactly the same thing happened each time (regardless of concurrency).
Repeating the same image multiple times doesn't cause the cascade tree to be rebuilt. If the image didn't change, why redo the work? It's an optimization in OpenCV for handling multiple image frames.

The problem seemed to be having the cascade as a global variable.
Once I moved
cascade := opencv.LoadHaarClassifierCascade("haarcascade_frontalface_alt.xml")
into the goroutine all was fine.
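For reference, a minimal sketch of the fixed loop, using only the binding calls already shown in the question and answers above (the nil check is the one suggested in the next answer):

var wg sync.WaitGroup
for j := 0; j < 10; j++ {
	wg.Add(1)
	go func(i int) {
		defer wg.Done()
		// Each goroutine gets its own cascade, so no two DetectObjects
		// calls ever share (or free) the same classifier tree.
		cascade := opencv.LoadHaarClassifierCascade("haarcascade_frontalface_alt.xml")
		image := opencv.LoadImage(strconv.Itoa(i) + ".jpg")
		if image == nil {
			return // nothing was loaded, nothing to release or detect
		}
		defer image.Release()
		faces := cascade.DetectObjects(image)
		fmt.Println(len(faces) > 0)
	}(j)
}
wg.Wait()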

You are not handling a nil image:
image := opencv.LoadImage(strconv.Itoa(i) + ".jpg")
if image == nil {
	// handle the error, then return so you don't call Release or DetectObjects on nil
	return
}

Related

OpenCV CUDA API very slow at the first call

I am using cuda::resize to resize a vector of images (in GpuMat)
It shows that the first call takes ~15ms and the rest take only ~0.3ms, so I want to ask if there are ways to shorten the time of the first call.
Here is my code (simplified):
for (int i = 0; i < num_images; ++i)
{
    full_img = v_GpuMat[i].clone(); // v_GpuMat is a vector of images in cuda::GpuMat
    seam_scale = 0.4377;
    cuda::resize(full_img, img, Size(), seam_scale, seam_scale, INTER_LINEAR);
}
Thank you very much.
CUDA device memory allocations and copying data between device and host are very slow. Try to allocate memory and load data outside the main loop. Cloning a matrix allocates new device memory each time; copying data into a preallocated matrix instead of cloning should speed up your code.
After checking the result in the NVIDIA Visual Profiler, I found it is cudaLaunchKernel that takes ~20ms, and only the first time it is called.
If you have to run a continuous process like mine, one solution can be a dry run before you process your own tasks. For this sample, making a warm-up cuda::resize call outside the loop makes the loop itself much faster.

Neo4J huge performance degradation after records added to spatial layer

So I have around 70 million spatial records that I want to add to the spatial layer. (I've tested with a small set and everything went smoothly; queries return the same results as PostGIS and the layer operations seem fine.)
But when I try to add all the spatial records to the database, performance degrades rapidly, to the point that it gets really slow at around 5 million records (around 2h running time) and hangs at ~7.7 million (8 hours elapsed).
Since the spatial index is an R-tree that uses the graph structure to construct itself, I am wondering why it degrades as the number of records increases.
R-tree insertions are O(n) if I'm not mistaken, and that's why I'm concerned that the rearranging of bounding boxes and the non-leaf tree nodes are what is causing the addToLayer process to get slower over time.
Currently I'm adding nodes to the layer like this (lots of hardcoded stuff, since I'm trying to figure out the problem before worrying about patterns and code style):
Transaction tx = database.beginTx();
try {
    ResourceIterable<Node> layerNodes = GlobalGraphOperations.at(database).getAllNodesWithLabel(label);
    long i = 0L;
    for (Node node : layerNodes) {
        Transaction tx2 = database.beginTx();
        try {
            layer.add(node);
            i++;
            if (i % commitInterval == 0) {
                log("indexing (" + i + " nodes added) ... time in seconds: "
                        + (1.0 * (System.currentTimeMillis() - startTime) / 1000));
            }
            tx2.success();
        } finally {
            tx2.close();
        }
    }
    tx.success();
} finally {
    tx.close();
}
Any thoughts? Any ideas of how performance could be improved?
PS: using the Java API
Neo4j 2.1.2, Spatial 0.13
Core i5 3570k @ 4.5GHz, 32GB RAM
dedicated 2TB 7200rpm hard drive for the database (no OS, no virtual memory files, only the data itself)
PS2: all geometries are LineStrings (if that's important :P); they represent streets, roads, etc.
PS3: the nodes are already in the database; I only need to add them to the Layer so that I can perform spatial queries. The bbox and wkb attributes are OK, tested and working for a small set.
Thank you in advance
After altering and running the code again (which takes 5 hours just to insert the points into the database, no layer involved), this happened. I will try to increase the JVM heap and the embedded graph memory parameters.
indexing (4020000 nodes added) ... time in seconds: 8557.361
Exception in thread "main" org.neo4j.graphdb.TransactionFailureException: Unable to commit transaction
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:140)
at gis.CataImporter.addDataToLayer(CataImporter.java:263)
at Neo4JLoadData.addDataToLayer(Neo4JLoadData.java:138)
at Neo4JLoadData.main(Neo4JLoadData.java:86)
Caused by: javax.transaction.SystemException: Kernel has encountered some problem, please perform neccesary action (tx recovery/restart)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at org.neo4j.kernel.impl.transaction.KernelHealth.assertHealthy(KernelHealth.java:61)
at org.neo4j.kernel.impl.transaction.TxManager.assertTmOk(TxManager.java:339)
at org.neo4j.kernel.impl.transaction.TxManager.getTransaction(TxManager.java:725)
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:119)
... 3 more
Caused by: javax.transaction.xa.XAException
at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:560)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:448)
at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:385)
at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:123)
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:124)
at gis.CataImporter.addDataToLayer(CataImporter.java:256)
... 2 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.neo4j.kernel.impl.nioneo.store.DynamicRecord.clone(DynamicRecord.java:179)
at org.neo4j.kernel.impl.nioneo.store.PropertyBlock.clone(PropertyBlock.java:215)
at org.neo4j.kernel.impl.nioneo.store.PropertyRecord.clone(PropertyRecord.java:221)
at org.neo4j.kernel.impl.nioneo.xa.Loaders$2.clone(Loaders.java:118)
at org.neo4j.kernel.impl.nioneo.xa.Loaders$2.clone(Loaders.java:81)
at org.neo4j.kernel.impl.nioneo.xa.RecordChanges$RecordChange.ensureHasBeforeRecordImage(RecordChanges.java:217)
at org.neo4j.kernel.impl.nioneo.xa.RecordChanges$RecordChange.prepareForChange(RecordChanges.java:162)
at org.neo4j.kernel.impl.nioneo.xa.RecordChanges$RecordChange.forChangingData(RecordChanges.java:157)
at org.neo4j.kernel.impl.nioneo.xa.PropertyCreator.primitiveChangeProperty(PropertyCreator.java:64)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransactionContext.primitiveChangeProperty(NeoStoreTransactionContext.java:125)
at org.neo4j.kernel.impl.nioneo.xa.NeoStoreTransaction.nodeChangeProperty(NeoStoreTransaction.java:1244)
at org.neo4j.kernel.impl.persistence.PersistenceManager.nodeChangeProperty(PersistenceManager.java:119)
at org.neo4j.kernel.impl.api.KernelTransactionImplementation$1.visitNodePropertyChanges(KernelTransactionImplementation.java:344)
at org.neo4j.kernel.impl.api.state.TxStateImpl$6.visitPropertyChanges(TxStateImpl.java:238)
at org.neo4j.kernel.impl.api.state.PropertyContainerState.accept(PropertyContainerState.java:187)
at org.neo4j.kernel.impl.api.state.NodeState.accept(NodeState.java:148)
at org.neo4j.kernel.impl.api.state.TxStateImpl.accept(TxStateImpl.java:160)
at org.neo4j.kernel.impl.api.KernelTransactionImplementation.createTransactionCommands(KernelTransactionImplementation.java:332)
at org.neo4j.kernel.impl.api.KernelTransactionImplementation.prepare(KernelTransactionImplementation.java:123)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.prepareKernelTx(XaResourceManager.java:900)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commit(XaResourceManager.java:510)
at org.neo4j.kernel.impl.transaction.xaframework.XaResourceHelpImpl.commit(XaResourceHelpImpl.java:64)
at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:548)
... 7 more
28/07 -> Increasing memory did not help. Now I'm testing some modifications in the RTreeIndex and LayerRTreeIndex (what exactly does the field maxNodeReferences do?).
// Constructor
public LayerRTreeIndex(GraphDatabaseService database, Layer layer) {
    this(database, layer, 100);
}

public LayerRTreeIndex(GraphDatabaseService database, Layer layer, int maxNodeReferences) {
    super(database, layer.getLayerNode(), layer.getGeometryEncoder(), maxNodeReferences);
    this.layer = layer;
}
It is hardcoded to 100, and changing its value changes when (in terms of the number of nodes added) my addToLayer method crashes with an OutOfMemoryError. If I'm not mistaken, changing that field's value increases or decreases the tree's width and depth (100 being wider than 50, and 50 being deeper than 100).
To summarize the progress so far:
Incorrect usage of transactions corrected by @Jim
Memory heap increased to 27GB following @Peter's advice
3 spatial layers to go, but now the problem gets real because they're the big ones.
Did some memory profiling while adding nodes to the spatial layer and I found some interesting points.
Memory and GC profiling: http://postimg.org/gallery/biffn9zq/
The type that uses the most memory throughout the entire process is byte[], which I can only assume belongs to the geometries' wkb properties (either the geometry itself or the R-tree's bbox).
With that in mind, I also noticed (you can check in the new profiling images) that the amount of heap space used never goes below the 18GB mark.
According to this question ("are java primitives garbage collected"), primitive types in Java are raw data, are therefore not subject to garbage collection, and are only freed from the method's stack when the method returns (so maybe when I create a new spatial layer, all those wkb byte arrays will remain in memory until I manually close the layer object).
Does that make any sense? Isn't there a better way to manage memory resources so that the layer doesn't keep unused, old data loaded?
Catacavaco,
You are doing each add as a separate transaction. To make use of your commitInterval, you need to change your code to something like this.
Transaction tx = database.beginTx();
try {
    ResourceIterable<Node> layerNodes = GlobalGraphOperations.at(database).getAllNodesWithLabel(label);
    long i = 0L;
    for (Node node : layerNodes) {
        layer.add(node);
        i++;
        if (i % commitInterval == 0) {
            tx.success();
            tx.close();
            log("indexing (" + i + " nodes added) ... time in seconds: "
                    + (1.0 * (System.currentTimeMillis() - startTime) / 1000));
            tx = database.beginTx();
        }
    }
    tx.success();
} finally {
    tx.close();
}
See if this helps.
Grace and peace,
Jim
Looking at the error java.lang.OutOfMemoryError: GC overhead limit exceeded, there might be some excessive object creation going on. From your profiling results it doesn't look like it; could you double-check?
Finally solved the problem with three fixes:
setting cache_type=none
increasing the heap size for the neostore low-level graph engine, and
setting use_memory_mapped_buffers=true so that memory management is done by the OS and not the slowish JVM
That way, my custom batch insertion into the spatial layers went much faster, and without any errors/exceptions.
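For reference, the first and third of those fixes are plain settings in neo4j.properties (2.1-era setting names, as used with the embedded Java API); the heap size is a JVM flag such as -Xmx27g rather than a Neo4j setting:

cache_type=none
use_memory_mapped_buffers=true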
Thanks for all the help provided. I guess my answer is just a combination of all the tips people provided here. Thanks very much!

Leaking memory in detectMultiScale when detecting faces

I'm using the following function (successfully) to detect faces using OpenCV in iOS, but it seems to leak 4-5MB of memory every second according to Instruments.
The function is called regularly from processFrame().
By a process of elimination, it's the line that calls detectMultiScale on the face_cascade that's causing the problem.
I've tried surrounding sections with an autoreleasepool (as I've had issues before with releasing memory on non-UI threads when doing video processing), but that didn't make a difference.
I've also tried forcing the faces vector to release its memory, but again to no avail.
Does anyone have any ideas?
- (bool)detectAndDisplay:(Mat)frame
{
    BOOL bFaceFound = false;
    vector<cv::Rect> faces;
    Mat frame_gray;

    cvtColor(frame, frame_gray, CV_BGRA2GRAY);
    equalizeHist(frame_gray, frame_gray);

    // the following line leaks 5Mb of memory per second
    face_cascade.detectMultiScale(frame_gray, faces, 1.1, 2, 0 | CV_HAAR_SCALE_IMAGE, cv::Size(100, 100));

    for (unsigned int i = 0; i < faces.size(); ++i)
    {
        rectangle(frame, cv::Point(faces[i].x, faces[i].y),
                  cv::Point(faces[i].x + faces[i].width, faces[i].y + faces[i].height),
                  cv::Scalar(0, 255, 255));
        bFaceFound = true;
    }
    return bFaceFound;
}
I am using the same source code as you and suffering from exactly the same problem - memory leakage. The only differences are: I use Qt5 for Windows and I am loading separate .jpg images (thousands of them, actually). I have tried the same techniques to prevent the crashes, but in vain. I wonder whether you have already solved the problem?
A similar issue is described here (bold paragraph, at the very bottom of the page); however, that is a write-up for the previous version of the OpenCV interface. The author says:
The above code (function detect and draw) has a serious memory leak when run in an infinite for loop for real time face detection.
My humble guess is that the leakage is caused by badly handled resources inside the detectMultiScale() method. I have not checked it yet, but the cvHaarDetectObjects() method explained here might be a better solution (although using an old version of OpenCV is probably not the best idea).
Combined with a suggestion from the previous link (add this line at the end of the operations: cvReleaseMemStorage(&storage)), the leakage should be plugged.
Writing this post made me want to try it out, so I will let you know as soon as I know whether this works or not.
EDIT: My guess was probably wrong. Try simply clearing the 'faces' vector after every detection instead of releasing its memory. I have been running the script for quite a while now, a few hundred faces detected, and still no sign of problems.
EDIT 2: Yep, this is it. Just add faces.clear() after every detection. Everything will work just fine. Cheers.

Go: extensive memory usage when reusing map keys

As part of my Go tutorial, I'm writing a simple program counting words across multiple files. I have a few goroutines for processing files and creating a map[string]int telling how many occurrences of each word have been found. Each map is then sent to a reducing goroutine, which aggregates the values into a single map. Sounds pretty straightforward and looks like a perfect (map-reduce) task for Go!
I have around 10k documents with 1.6 million unique words. What I found is that memory usage grows fast and constantly while the code runs, and I run out of memory about halfway through processing (12GB box, 7GB free). So yes, it uses gigabytes for this small data set!
Trying to figure out where the problem lies, I found that the reducer collecting and aggregating the data is to blame. Here is the code:
func reduceWords(input chan map[string]int, output chan int) {
	total := make(map[string]int)
	for wordMap := range input {
		for w, c := range wordMap {
			total[w] += c
		}
	}
	output <- len(total)
}
If I remove the map from the sample above, the memory stays within reasonable limits (a few hundred megabytes). What I found, though, is that taking a copy of the string also solves the problem, i.e. the following sample doesn't eat up my memory:
func reduceWords(input chan map[string]int, output chan int) {
	total := make(map[string]int)
	for wordMap := range input {
		for w, c := range wordMap {
			copyW := make([]byte, len(w)) // <-- will put a copy here!
			copy(copyW, w)
			total[string(copyW)] += c
		}
	}
	output <- len(total)
}
Is it possible that a wordMap instance is not being destroyed after every iteration when I use the value directly? (As a C++ programmer I have limited intuition when it comes to GC.) Is this desirable behaviour? Am I doing something wrong? Should I be disappointed with Go, or rather with myself?
Thanks!
What does your code look like that turns files into strings? I would look for a problem there. If you are converting large blocks (whole files maybe?) to strings, and then slicing those into words, then you are pinning the entire block if you save any one word. Try keeping the blocks as []byte, slicing those into words, and then converting words to the string type individually.
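A minimal sketch of that idea, using the os and bytes packages, assuming whitespace-separated words and Go 1.16+ for os.ReadFile (countWords is a hypothetical helper, not from the question):

func countWords(path string) (map[string]int, error) {
	data, err := os.ReadFile(path) // keep the whole file as []byte, not string
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, word := range bytes.Fields(data) { // each word is a sub-slice of data
		// string(word) copies just the word's bytes, so the map key
		// no longer pins the whole file buffer.
		counts[string(word)]++
	}
	return counts, nil
}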

Can immutable be a memory hog?

Let's say we have a memory-intensive class like an Image, with chainable methods like Resize() and ConvertTo().
If this class is immutable, won't it take a huge amount of memory when I start doing things like i.Resize(500, 800).Rotate(90).ConvertTo(Gif), compared to a mutable one which modifies itself? How to handle a situation like this in a functional language?
If this class is immutable, won't it take a huge amount of memory?
Typically your memory requirements for that single object might double, because you might have an "old copy" and a "new copy" live at once. So you can view this phenomenon, over the lifetime of the program, as having one more large object allocated than you might in a typical imperative program. (Objects that aren't being "worked on" just sit there, with the same memory requirements as in any other language.)
How to handle a situation like this in a functional language?
Do absolutely nothing. Or more accurately, allocate new objects in good health.
If you are using an implementation designed for functional programming, the allocator and garbage collector are almost certainly tuned for high allocation rates, and everything will be fine. If you have the misfortune to try to run functional code on the JVM, well, performance won't be quite as good as with a bespoke implementation, but for most programs it will still be fine.
Can you provide more detail?
Sure. I'm going to take an exceptionally simple example: 1000x1000 greyscale image with 8 bits per pixel, rotated 180 degrees. Here's what we know:
To represent the image in memory requires 1MB.
If the image is mutable, it's possible to rotate 180 degrees by doing an update in place. The amount of temporary space needed is enough to hold one pixel. You write a doubly nested loop that amounts to
for (i in columns) do
    for (j in first half of rows) do {
        pixel temp := a[i, j];
        a[i, j] := a[width-i, height-j];
        a[width-i, height-j] := temp
    }
If the image is immutable, it's required to create an entire new image, and temporarily you have to hang onto the old image. The code is something like this:
new_a = Image.tabulate (width, height) (\ x y -> a[width-x, height-y])
The tabulate function allocates an entire, immutable 2D array and initializes its contents. During this operation, the old image is temporarily occupying memory. But when tabulate completes, the old image a should no longer be used, and its memory is now free (which is to say, eligible for recycling by the garbage collector). The amount of temporary space required, then, is enough to hold one image.
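A rough Go rendering of the same tabulate idea, purely as an illustration (the image is a hypothetical [][]uint8 of greyscale pixels):

// rotate180 builds a brand-new image; the old one becomes garbage
// as soon as the caller stops referring to it.
func rotate180(a [][]uint8) [][]uint8 {
	height := len(a)
	width := len(a[0])
	b := make([][]uint8, height)
	for y := range b {
		b[y] = make([]uint8, width)
		for x := range b[y] {
			b[y][x] = a[height-1-y][width-1-x]
		}
	}
	return b
}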
While the rotation is going on, there's no need to have copies of objects of other classes; the temporary space is needed only for the image being rotated.
N.B. For other operations such as rescaling or rotating a (non-square) image by 90 degrees, it is quite likely that even when images are mutable, a temporary copy of the entire image is going to be necessary, because the dimensions change. On the other hand, colorspace transformations and other computations which are done pixel by pixel can be done using mutation with a very small temporary space.
Yes. Immutability is a component of the eternal time-space tradeoff in computing: you sacrifice memory in exchange for the increased processing speed you gain in parallelism by foregoing locks and other concurrent access control measures.
Functional languages typically handle operations of this nature by chunking them into very fine grains. Your Image class doesn't actually hold the logical data bits of the image; rather, it uses pointers or references to much smaller immutable data segments which contain the image data. When operations need to be performed on the image data, the smaller segments are cloned and mutated, and a new copy of the Image is returned with updated references -- most of which point to data which has not been copied or changed and has remained intact.
This is one reason why functional design requires a different fundamental thought process from imperative design. Not only are algorithms themselves laid out very differently, but data storage and structures need to be laid out differently as well to account for the memory overhead of copying.
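As a toy sketch of that layout (hypothetical Go types, not from any library): the image is a slice of pointers to row chunks, and "modifying" one pixel copies only the affected row, while every other row stays shared with the old value.

type row []uint8 // one chunk: a single row of greyscale pixels

// Image is an immutable value: a slice of pointers to shared rows.
type Image struct {
	rows []*row
}

// SetPixel returns a new Image. Only the touched row is copied;
// all the other row chunks are shared with the receiver.
func (img Image) SetPixel(x, y int, v uint8) Image {
	newRows := make([]*row, len(img.rows))
	copy(newRows, img.rows) // copies the row pointers, not the pixel data

	newRow := make(row, len(*img.rows[y]))
	copy(newRow, *img.rows[y]) // copy just the one row being changed
	newRow[x] = v
	newRows[y] = &newRow

	return Image{rows: newRows}
}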
In some cases, immutability forces you to clone the object and allocate more memory. That memory isn't necessarily occupied for long, because older copies can be discarded. For example, the CLR garbage collector deals with this situation quite well, so it isn't (usually) a big deal.
However, chaining operations doesn't actually mean cloning the object. This is certainly the case for functional lists: when you use them in the typical way, you only need to allocate a memory cell for a single element (when prepending an element to the front of the list), as sketched below.
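A minimal sketch of such a persistent list in Go terms (hypothetical types, just to make the sharing visible): Cons allocates exactly one new cell, and the new list shares its entire tail with the old one.

// List is a persistent singly linked list of ints.
type List struct {
	head int
	tail *List // shared with every list built on top of this one; never mutated
}

// Cons returns a new list whose tail is the old list. Nothing is copied.
func Cons(x int, tail *List) *List {
	return &List{head: x, tail: tail}
}

Building zs := Cons(1, Cons(2, Cons(3, nil))) costs three small allocations, and any list that later prepends to zs reuses all three cells.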
Your example with image processing can also be implemented in a more efficient way. I'll use C# syntax to keep the code easy to understand without knowing any FP (but it would look better in a typical functional language). Instead of actually cloning the image, you could just store the operations that you want to perform on it. For example, something like this:
class Image {
    Bitmap source;
    FileFormat format;
    float newWidth, newHeight;
    float rotation;

    // Public constructor to load the image from a file
    public Image(string sourceFile) {
        this.source = Bitmap.FromFile(sourceFile);
        this.newWidth = this.source.Width;
        this.newHeight = this.source.Height;
    }

    // Private constructor used by the 'cloning' methods
    private Image(Bitmap s, float w, float h, float r, FileFormat fmt) {
        source = s; newWidth = w; newHeight = h;
        rotation = r; format = fmt;
    }

    // Methods that can be used for creating modified clones of
    // the 'Image' value using method chaining - these methods only
    // store operations that we need to do later
    public Image Rotate(float r) {
        return new Image(source, newWidth, newHeight, rotation + r, format);
    }

    public Image Resize(float w, float h) {
        return new Image(source, w, h, rotation, format);
    }

    public Image ConvertTo(FileFormat fmt) {
        return new Image(source, newWidth, newHeight, rotation, fmt);
    }

    public void SaveFile(string f) {
        // process all the stored operations here and save the image
    }
}
The class doesn't actually create a clone of the entire bitmap each time you invoke a method. It only keeps track of what needs to be done later, when you'll finally try to save the image. In the following example, the underlying Bitmap would be created only once:
var i = new Image("file.jpg");
i.Resize(500, 800).Rotate(90).ConvertTo(Gif).SaveFile("fileNew.gif");
In summary, the code looks like you're cloning the object, and you do create a new copy of the Image class each time you call some operation. However, that doesn't mean the operation is memory-expensive - this can be hidden in the functional library, which can be implemented in all sorts of ways (while still preserving the important referential transparency).
It depends on the type of data structures used and their application in a given program. In general, immutability does not have to be overly expensive on memory.
You may have noticed that the persistent data structures used in functional programs tend to eschew arrays. This is because persistent data structures typically reuse most of their components when they are "modified". (They are not really modified, of course; a new data structure is returned, but the old one remains just as it was.) See this picture to get an idea of how structure sharing can work. In general, tree structures are favoured, because a new immutable tree can be created out of an old immutable tree by rewriting only the path from the root to the node in question. Everything else can be reused, making the process efficient in both time and memory.
Regarding your example, there are several ways to solve the problem other than copying a whole massive array (which would indeed be horribly inefficient). My preferred solution would be to use a tree of array chunks to represent the image, allowing for relatively little copying on updates. Note an additional advantage: we can store multiple versions of our data at relatively small cost.
I don't mean to argue that immutability is always and everywhere the answer -- the truth and righteousness of functional programming should be tempered with pragmatism, after all.
Yes, one of the disadvantages of using immutable objects is that they tend to hog memory. One thing that comes to mind is something similar to lazy evaluation (essentially copy-on-write): when a new copy is requested, provide a reference, and only when the user makes a change initialize the new copy of the object.
Short, tangential answer: in the FP languages I'm familiar with (Scala, Erlang, Clojure, F#), and for the usual data structures (arrays, lists, vectors, tuples), you need to understand shallow vs. deep copies and how they are implemented:
e.g.
Scala, clone() object vs. copy constructor
Does Scala AnyRef.clone perform a shallow or deep copy?
Erlang: message passing a shallow-copied data structure can blow up a process:
http://groups.google.com/group/erlang-programming/msg/bb39d1a147f72800
