Modifying Caffe to accept 16-bit data inside lmdb

I'm trying to make some modifications to Caffe so that it accepts my 16-bit data.
I succeeded in creating an lmdb dataset filled with 16-bit unsigned, unencoded 256x256 images instead of the usual Caffe 8-bit unsigned data, saved as a "string" just like the usual 8-bit lmdb that can be created with the image_convert or DIGITS utilities.
I changed the io.py functions array_to_datum and datum_to_array in order to create this lmdb with 16-bit "string" data.
Now, if I use this lmdb in Caffe (with just 4 classes), the network runs but doesn't converge. I strongly suspect it is not reading my data properly.
The problem is, the io.py functions array_to_blobproto and blobproto_to_array don't seem to make any distinction about the internal data contents, and I cannot find the code I should modify to deal with 16 bits.
Could anyone give me a hint as to where to work on this?
Edit:
Messing around in the code, I think one possibility would be to create a new data layer or a new image data layer if I wanted to work directly on the PNGs without going through lmdb. But modifying that C++ code is not a trivial task for me, especially since I cannot easily follow the data flow inside the code. I see that new layers can be written in Python. Do you think a new input data layer could work nicely, or would it slow down the CNN's performance?

I don't know much about converting and adapting the caffe/lmdb interface, but it seems like a very risky path to take if you are not 100% certain of what you are doing.
For instance, you changed the io functions in the Python interface, but I don't think Caffe uses this interface when running from the command line (e.g., $CAFFE_ROOT/build/tools/caffe train ...). Have you looked into the C++ io functions in the io.cpp file?
I would strongly suggest an alternative path: use hdf5 binary inputs instead of lmdb.
You can convert your 16-bit images to float32 and store them in hdf5 files, then feed them to Caffe via an "HDF5Data" layer.
Read more about hdf5 and caffe in this thread.
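To make the hdf5 route concrete, here is a minimal sketch, assuming your 16-bit data lives in grayscale PNG files; the file names and labels are placeholders:

```python
import h5py
import numpy as np
from PIL import Image

# Hypothetical inputs: 16-bit PNG paths and their integer class labels.
image_paths = ["img_0001.png", "img_0002.png"]
labels = [0, 3]

# Caffe expects N x C x H x W; convert uint16 -> float32 (here scaled to [0, 1]).
data = np.zeros((len(image_paths), 1, 256, 256), dtype=np.float32)
for i, path in enumerate(image_paths):
    img = np.array(Image.open(path), dtype=np.uint16)
    data[i, 0] = img.astype(np.float32) / 65535.0

with h5py.File("train_data.h5", "w") as f:
    f.create_dataset("data", data=data)
    f.create_dataset("label", data=np.array(labels, dtype=np.float32))

# The HDF5Data layer reads a text file listing one .h5 file per line.
with open("train_h5_list.txt", "w") as f:
    f.write("train_data.h5\n")
```

In the prototxt, the "HDF5Data" layer's source parameter would then point to train_h5_list.txt, and its top blob names must match the dataset names ("data" and "label").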

Related

How to do K-Means clustering of raw PDF data

I want to cluster PDF documents based on their structure, not only their text content.
The main problem with the text-only approach is that it loses the information about whether a document has a PDF form structure, is just a plain document, or contains pictures.
This information is most important for our further processing.
My main goal now is to be able to classify a document mainly by its structure, not only by its text content.
The documents to classify are stored in a SQL database as byte[] (varbinary), so my idea is to use this raw data for classification, without prior text conversion.
If I look at the hex output of this data, I can see repeating structures which seem to correspond to the different document classes I want to separate.
As a first impression, you can see some similar byte patterns in my attached screenshot.
So my idea is to train a K-Means model on, e.g., the hex output string.
In the next step I would try to find the best number of clusters with the elbow method, which should be around 350 - 500.
The size of the PDF data varies between 20 kB and 5 MB, mostly around 150 kB. I have over 30k documents to train the model.
When I research this, the results are sparse. I only found this article, which makes me unsure about the best way to solve my task:
https://www.ibm.com/support/pages/clustering-binary-data-k-means-should-be-avoided
My questions are:
Is K-Means the best algorithm for my goal?
What method would you recommend?
How to normalize or transform the data for the best results?
As Ian said in the comments, using the raw data seems to be a bad idea.
With further research I found that the best solution is to first read the structure of the PDF file, e.g. with an approach like this:
https://github.com/Uzi-Granot/PdfFileAnaylyzer
I normalized and clustered the data with this information, which gave me good results.
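As a rough sketch of that pipeline, assuming a hypothetical extract_structure_features() helper that turns each parsed PDF into a fixed-length numeric vector (e.g. counts of objects, form fields, and images), the clustering and elbow search could look like this:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical: `documents` is a list of parsed PDFs and
# extract_structure_features() returns a fixed-length numeric vector per document.
X = np.array([extract_structure_features(doc) for doc in documents])

# Normalize so no single feature dominates the Euclidean distance.
X = StandardScaler().fit_transform(X)

# Elbow method: compute inertia for a range of k and look for the "knee".
inertias = []
for k in range(50, 501, 50):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append((k, km.inertia_))
```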

When the label dimension is too big and you want an alternative to one-hot encoding

I am a beginner learning machine learning.
I am trying to build a model (an FNN), and it has too many output labels to use one-hot encoding.
Could you help me?
I want to solve this problem:
The label data is for fruits:
Type (Apple, Grapes, Peach), Quality(Good, Normal, Bad), Price(Expensive, Normal, Cheap), Size(Big, Normal, Small)
So if I use one-hot encoding, the data size goes up to 3*3*3*3 = 81.
I think the label data looks like a sequence of 4 one-hot encodings.
Is there any way to make the label data low-dimensional, rather than an 81-dimensional one-hot encoding?
I think binary encoding could also be used, but I am aware of some shortcomings of using binary encoding in a NN.
Thanks :D
If you one-hot encode your 4 variables you will have 3+3+3+3 = 12 variables, not 81.
The idea is that you need to create a binary variable for every category in a categorical feature, not one for every possible combination of categories across the four features.
Nevertheless, other possible approaches are Numerical Encoding, Binary Encoding (as you mentioned), or Frequency Encoding (replace every category with its frequency in the dataset). The results often depend on the problem, so try different approaches and see what best fits yours!
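A quick sketch to show the 12-column result, using a toy label frame (values are just for illustration):

```python
import pandas as pd

# Toy label frame with the four categorical targets.
labels = pd.DataFrame({
    "Type":    ["Apple", "Grapes", "Peach"],
    "Quality": ["Good", "Normal", "Bad"],
    "Price":   ["Expensive", "Cheap", "Normal"],
    "Size":    ["Big", "Small", "Normal"],
})

# One-hot encode each feature independently: 3 + 3 + 3 + 3 = 12 columns.
encoded = pd.get_dummies(labels)
print(encoded.shape)  # (3, 12)
```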
But even if you use one-hot encoding, as @DavideDn pointed out, you will have 12 features, not 81, which isn't a concerning number.
However, if the number were indeed 81, you could still use dimensionality reduction techniques (like Principal Component Analysis) to solve the problem.
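For illustration, a minimal PCA sketch on a hypothetical (n_samples, 81) label matrix (random data here just to stand in for the 81-dimensional encoding):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical 81-dimensional binary label matrix.
Y = np.random.randint(0, 2, size=(1000, 81)).astype(float)

# Project down to 12 components and check how much variance is retained.
pca = PCA(n_components=12)
Y_reduced = pca.fit_transform(Y)
print(Y_reduced.shape, pca.explained_variance_ratio_.sum())
```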

How are Whole Slide Images handled in Deep Learning

I was planning on doing some classification/segmentation on whole slide images. Since the images are huge, I was wondering about the methods that can be applied to process them. So far I've come across techniques that split the image into multiple parts, process those parts, and combine the results. However, I would like to know whether that is a good approach and learn about other, better approaches. Any reference to existing literature would be of great help.
pyvips has a feature for generating patches from slide images efficiently.
This benchmark shows how it works. It can generate about 25,000 64x64 patches a second in the 8 basic orientations from an SVS file:
https://github.com/libvips/pyvips/issues/100#issuecomment-493960943
It's handy for training. I don't know how that compares to the other patch generation systems people use.
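As a minimal sketch of patch extraction with pyvips (the file name and patch grid are placeholders, and a real pipeline would usually keep patches in memory rather than writing thousands of files):

```python
import pyvips

# "slide.svs" is a placeholder path; pyvips loads SVS via its OpenSlide loader.
slide = pyvips.Image.new_from_file("slide.svs")

patch_size = 64
for y in range(0, slide.height - patch_size, patch_size):
    for x in range(0, slide.width - patch_size, patch_size):
        patch = slide.crop(x, y, patch_size, patch_size)
        patch.write_to_file(f"patch_{x}_{y}.png")
```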
To read these images, the standard library is OpenSlide [https://openslide.org/api/python/]. With OpenSlide you can read, e.g., patches or thumbnails.
For basic image processing operations like filtering, "libvips" and its Python binding "pyvips" are quick and convenient to use [https://libvips.github.io/pyvips/vimage.html].
If you need to pass data (like random patches) to a machine learning model, I would personally suggest "PyDmed". When training, e.g., a classifier or a generative model, the loading speed of "PyDmed" is suitable for feeding batches of data to GPU(s).
Here is the link to PyDmed public repo:
https://github.com/amirakbarnejad/PyDmed
Here is the link to PyDmed quick start:
https://amirakbarnejad.github.io/Tutorial/tutorial_section1.html
As akbarnejad mentioned, my preference is to use OpenSlide.
I usually end up writing bespoke dataloaders to feed into PyTorch models. They use openslide to first do some simple segmentation on a low-resolution (thumbnail) image of the slide, using various thresholds, to get patch coordinates, and then pull out the relevant patches of tissue for feeding into the training model.
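A rough sketch of that workflow (the slide path, thumbnail size, intensity threshold, and patch size are all arbitrary assumptions):

```python
import numpy as np
import openslide

slide = openslide.OpenSlide("slide.svs")  # placeholder path

# 1. Low-resolution thumbnail for a crude tissue mask (simple intensity threshold).
thumb = np.array(slide.get_thumbnail((512, 512)).convert("L"))
tissue = thumb < 200  # assumption: tissue is darker than the white background

# 2. Map thumbnail coordinates back to level-0 coordinates of candidate patches.
scale_x = slide.dimensions[0] / thumb.shape[1]
scale_y = slide.dimensions[1] / thumb.shape[0]
coords = [(int(x * scale_x), int(y * scale_y))
          for y, x in zip(*np.nonzero(tissue))]

# 3. Pull full-resolution patches for training.
patch = slide.read_region(coords[0], 0, (256, 256)).convert("RGB")
```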
There are a few good examples of this, and tools that try to make it simpler, for both PyTorch and Keras:
PyTorch: wsi-preprocessing
Keras: Deeplearning-digital-pathology
Both: deep-openslide

Analysis of 3d image data as 2d

I have a TIFF image of size around ~10 GB. I need to perform object classification or pixel classification in this image. The image data has zyx dimension order. My voxel size is x=0.6, y=0.6 and z=1.2. Z is the depth of the object. My RAM cannot hold the whole image.
If I classify the pixels in each Z plane separately and then merge the results to get the final shape and volume of the object,
would I lose any information, and would my final shape or volume of the object be wrong?
@ankit agrawal, you have probably found the answer already, but my advice would definitely not be that you simply need more memory.
I have had a similar problem, and if anyone else comes across it, the options below will help.
Options
The answer about splitting into just z planes is correct: you could lose information in the z direction. The idea isn't a bad one, but you could instead take Regions of Interest (ROIs) / split your image into chunks so that they become more manageable; say, split into x/2, y/2 and z/2. Then you get a bunch of chunks that can be held in memory, and you can stack the data back up later.
Use the library Dask (https://dask.org/); it handles all of this for you. It's designed for parallelism and can be scaled on a single computer or a cluster. The dask.array module lets you create lots of chunked numpy arrays. Even better, use dask-image (there is a link to it in the Dask docs), a wrapper around dask.array and many scipy.ndimage functions. Lastly, when the file is split appropriately, the computation can be faster because of the parallelism. Not always, but I have easily worked with 20 GB datasets on a laptop with 16 GB of RAM. The files were 8-bit, and many libraries and functions upcast to float, blowing your memory up; this approach lets you keep a handle on it all. If you stick to the core functions it will work fine; it gets harder when you work with mapped blocks. A minimal sketch follows below.
If you still have this issue.
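Here is the sketch mentioned above, assuming a placeholder file name, chunk shape, filter parameters, and intensity threshold:

```python
from dask_image.imread import imread
from dask_image import ndfilters

# Lazily load the large TIFF; nothing is read into RAM yet.
stack = imread("volume.tif")          # placeholder path, shape (z, y, x)

# Re-chunk into blocks that comfortably fit in memory.
stack = stack.rechunk((32, 512, 512))

# Example chunked operation: a 3D Gaussian filter followed by a threshold.
smoothed = ndfilters.gaussian_filter(stack, sigma=(1, 2, 2))
mask = smoothed > 500                 # arbitrary intensity threshold

# Only now does Dask stream chunks through memory and compute the result.
voxel_count = mask.sum().compute()
```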
The issue with doing a classification in each z-plane individually is that you might not be able to classify objects with that sort of restricted information.
You can think of it the same way as a 2D face detection problem where you try to detect the face in each row individually: that is probably not going to be very robust, and you will lose valuable spatial information. In the end you'll probably have no detections to merge.
Solution proposal:
My advice would be to increase the size of your voxels until the volume can be processed by your processing unit, i.e., decrease the resolution of your data, and do a classification with a low confidence threshold. Then come back and do another classification on the volumes that contain detections, this time aiming for a higher confidence threshold. This can be done iteratively as needed.
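A minimal sketch of the coarse pass, assuming the TIFF is stored contiguously so it can be memory-mapped (the path is a placeholder):

```python
import numpy as np
import tifffile

# Memory-map the big TIFF so the full volume never has to sit in RAM.
volume = tifffile.memmap("volume.tif")   # shape (z, y, x)

# Coarse pass: halve the resolution in every axis by simple striding,
# roughly doubling the voxel size; only this reduced copy is loaded into memory.
coarse = np.asarray(volume[::2, ::2, ::2])

# ...run the low-confidence classifier on `coarse`, then revisit only the
# sub-volumes with detections at full resolution.
```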
I think breaking the image along any (x/y/z) plane kind of defeats the point of the voxel concept, because the representation of a three-dimensional object is flattened and you lose the spatial relational data.
I think a couple of options are:
Use a distributed computing cluster, like Hadoop.
Look into storing that image in a geospatial database like GeoMesa, so it may be queried efficiently, then you can just hold in memory what you need to train locally.
10GB isn't so large, so perhaps upgrade your memory capacity?

caffe LayerSetUp and Reshape?

I am reading caffe source code now.
I am confused about the LayerSetUp and Reshape methods.
Some layers have both of these methods, others have one or none...
Why? Can anyone explain this to me?
LayerSetUp is called once when loading the net. Its aim is to
(a) Verify the layer has exactly the right number of input/output blobs
(b) Read the parameters of the layer from the prototxt
(c) Initialize internal parameters
On the other hand, Reshape is used to allocate memory for parameters and output blobs, and it can be called even after the net was set up. For instance, it is common for detection networks to change the input shape, thus Reshape-ing all subsequent blobs.
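To make the split concrete, here is an illustrative sketch of a Python layer (the kind mentioned in the first question above), where the same two responsibilities appear as setup and reshape; the layer itself and its parameter are made up for the example:

```python
import caffe

class ScaleLayer(caffe.Layer):
    """Illustrative layer that multiplies its input by a constant."""

    def setup(self, bottom, top):
        # Called once when the net is loaded:
        # verify blob counts and read the layer parameters.
        if len(bottom) != 1 or len(top) != 1:
            raise Exception("ScaleLayer needs exactly one bottom and one top")
        self.scale = float(self.param_str) if self.param_str else 1.0

    def reshape(self, bottom, top):
        # Called whenever input shapes (may) change:
        # allocate the output blob to match the input.
        top[0].reshape(*bottom[0].data.shape)

    def forward(self, bottom, top):
        top[0].data[...] = self.scale * bottom[0].data

    def backward(self, top, propagate_down, bottom):
        if propagate_down[0]:
            bottom[0].diff[...] = self.scale * top[0].diff
```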
