I know that Caffe won't let you use an HDF5 data file larger than 2GB. I have a large dataset, so I split it into 5 chunks of <2GB each and listed the five files in a 'train.txt' file. How can I define this in the "HDF5Data" layer of my network prototxt file? Just listing all of them as top generates an error. Any small example of this? Thanks!
Cheers
You should have a text file 'train.txt' with the following content:
/path/to/first.h5
/path/to/second.h5
/path/to/third.h5
/path/to/fourth.h5
/path/to/fifth.h5
Then, as the source of the "HDF5Data" layer, you should give only 'train.txt':
layer {
  type: "HDF5Data"
  name: "data"
  # put your "top"(s) here; each top must match a dataset name inside the .h5 files
  hdf5_data_param {
    source: "/path/to/train.txt" # only the list file goes here.
    batch_size: 32               # hdf5_data_param also needs a batch_size
  }
  include { phase: TRAIN }
}
As you can see, the individual .h5 files ('/path/to/first.h5', etc.) are not listed explicitly in the train.prototxt, only in train.txt.
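For completeness, here is a minimal sketch of how one of those chunks could be written with h5py (shapes and values are placeholders; the dataset names inside each .h5 file must match the "top" names of the layer, e.g. 'data' and 'label'):

import h5py
import numpy as np

# Placeholder chunk: 1000 images of 3x224x224 with integer labels.
X = np.random.rand(1000, 3, 224, 224).astype(np.float32)
y = np.random.randint(0, 10, size=(1000,)).astype(np.float32)

with h5py.File('/path/to/first.h5', 'w') as f:
    # Dataset names must match the "top" names of the HDF5Data layer.
    f.create_dataset('data', data=X)
    f.create_dataset('label', data=y)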
This is a bit of an abstract question.
I have a group of 28x28 px images from certain people, and I would like to label that data with each person who wrote it. How would I go about labeling it for training and testing? This is my first neural network, and I'm having difficulty finding any tutorials that suit my particular need. It feels like most datasets, like MNIST/EMNIST, are already labeled.
Some more info: I'm using Python 3, and Keras with a TensorFlow backend.
I am assuming that you know who wrote each image. Then this is a matter of associating that information (the class label) with each image. There are several ways of doing this. Two common approaches are:
Folder structure
Make a folder for each class (person), and put the images inside.
Folder contents:
john/01.png
john/02.png
jane/03.png
susan/...
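With this layout, Keras can infer the labels from the folder names. A minimal sketch using ImageDataGenerator.flow_from_directory (the root folder name 'dataset/' and the parameters are just placeholders):

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255)

# Each subfolder ("john", "jane", ...) becomes one class.
train_gen = datagen.flow_from_directory(
    'dataset/',               # hypothetical root folder containing the per-person folders
    target_size=(28, 28),
    color_mode='grayscale',
    class_mode='categorical',
    batch_size=32)

# model.fit_generator(train_gen, ...)  # then train as usual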
CSV file
In this case the images can all be in one folder, and a dedicated Comma-Separated-Values (CSV) file is used to contain the label for each image.
Folder contents:
dataset.csv
images/01.png
images/02.png
images/03.png
images/....
dataset.csv contents:
filename,person
images/01.png,john
images/02.png,john
images/03.png,jane
...
The CSV approach is nice if you have additional data about each file that you want to store: for instance, relevant metadata such as who recorded the file, when it was recorded, with what kind of equipment, at which location, etc.
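For the CSV approach, a minimal sketch of loading the files and labels into arrays with pandas and Keras utilities (file names and sizes are placeholders; in older Keras versions load_img takes grayscale=True instead of color_mode):

import numpy as np
import pandas as pd
from keras.preprocessing.image import load_img, img_to_array

df = pd.read_csv('dataset.csv')

# Map each person name to an integer class index.
classes = sorted(df['person'].unique())
class_to_idx = {name: i for i, name in enumerate(classes)}

X = np.stack([img_to_array(load_img(f, color_mode='grayscale', target_size=(28, 28)))
              for f in df['filename']])
y = np.array([class_to_idx[p] for p in df['person']])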
Combinations of the two are also possible, of course.
Limited by my device, I could only produce several .h5 files (each with the same format, with shape [idx, 1, 224, 224]) for a huge dataset (>100GB), and now I'm unsure how to combine these files into a single one for further training in PyTorch.
In h5py, groups and files support copy(), which can be used to move groups (including the root group) and their contents between files.
See the docs here (scroll down a bit to find copy()):
http://docs.h5py.org/en/latest/high/group.html
The HDF5 distribution also includes a command-line tool called h5copy that can be used to move things around, and the C API has an H5Ocopy() function.
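A minimal sketch of merging the chunks with copy() (file names are placeholders; here each source file is copied into its own group so dataset names don't clash):

import h5py

sources = ['part1.h5', 'part2.h5', 'part3.h5']  # your separate chunk files

with h5py.File('combined.h5', 'w') as dst:
    for i, path in enumerate(sources):
        with h5py.File(path, 'r') as src:
            grp = dst.create_group('part%d' % i)
            # Copy every top-level object from the source file into the group.
            for name in src:
                src.copy(name, grp)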
I'm using caffe-tensorflow to convert a model.
I'm getting the error:
"Multiple top nodes are not supported"
I do have some layers inside the prototxt with multiple top nodes:
layer {
  name: "slice2"
  type: "Slice"
  bottom: "conv2"
  top: "slice2_1"
  top: "slice2_2"
  slice_param {
    slice_dim: 1
  }
}
Is there a way for me to do the conversion? (with or without caffe-tensorflow)
Thank you
You can't do this in caffe-tensorflow, which only supports single top (output) nodes (see the comments in graph.py).
Your alternatives are:
clone the GitHub code and add the functionality
implement your model directly in TensorFlow code (see the sketch below).
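If you go the second route, note that Caffe's "Slice" layer with slice_dim: 1 splits a blob along the channel axis; a rough TensorFlow 1.x equivalent is tf.split. A sketch, assuming NCHW layout and two equal slices as in the layer above (the shape is a placeholder):

import tensorflow as tf

# conv2 stands in for the output of the preceding layer, shaped [N, C, H, W].
conv2 = tf.placeholder(tf.float32, shape=[None, 64, 28, 28])

# Split the channel axis into two halves, mirroring the two "top"s
# of the Caffe Slice layer.
slice2_1, slice2_2 = tf.split(conv2, num_or_size_splits=2, axis=1)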
I know the first step is to create two file lists with the corresponding labels, one for the training and one for the test set. Suppose the former is called train.txt and the latter val.txt. The paths in these file lists should be relative. The labels should start at 0 and look similar to this:
relative/path/img1.jpg 0
relative/path/img2.jpg 0
relative/path/img3.jpg 1
relative/path/img4.jpg 1
relative/path/img5.jpg 2
For each of these two sets, we will create a separate LevelDB. Is this formatted as a text file? I thought I would create a directory with several subdirectories for each of my classes. Do I manually have to create a text file?
Please see this tutorial on how to use convert_imageset to build LevelDB or LMDB datasets for Caffe training.
As you can see from these instructions, it does not matter how you arrange the image files on disk (same folder/different folders...) as long as the paths in your 'train.txt'/'val.txt' files are correct relative to the '/path/to/jpegs/' argument. However, if you want to use the convert_imageset tool, you will have to create a text file listing all the images you want to use.
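A typical invocation looks something like this (flags and paths are illustrative; run convert_imageset --help, or check the tutorial, for the exact options in your build):

convert_imageset --backend=lmdb --shuffle /path/to/jpegs/ train.txt train_lmdb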
I have a .xcf file with 20+ layers that I use for making a sprite file.
I'd like to save all these layers to separate files with only the content and size of each layer.
I found this script for gimp: https://github.com/jiilee/gimp
Unfortunately that script creates files with the full size of the image and not the size of each layer.
To give an example:
An image that is 700px wide and 400px high.
A layer is placed at x: 100px, y: 29px, with width: 72px and height: 21px.
The script I found makes a file that is 700px x 400px, and not, as I need, 72px x 21px.
Is it possible to do this automatically?
Yes, but I'd recommend the use of Python as a scripting language in this case, rather than script-fu.
For something like this you would not even need a plug-in; you can just type something like the following in Filters -> Python-Fu -> Console:
>>> img = gimp.image_list()[0]
>>> img.layers[0].name
'C'
>>> folder = "/tmp/"
>>> for layer in img.layers:
...     name = folder + layer.name + ".png"
...     pdb.gimp_file_save(img, layer, name, name)
...
>>>
Done! This snippet assumes a single image is open in GIMP (with the [0] it simply takes the last opened image in the list).
The call to export an image takes a drawable (which can be a layer, a mask, or another item) as input, and saves the file, guessing the type from the filename extension, with default parameters.
And, of course, you can improve on it by creating a more elaborate naming scheme for the files, and so on. To turn this core into an automated python-fu plug-in, you just have to put it inside a function in a .py file in your personal plug-in directory (on Unix/Mac OS/Linux you also have to set it as executable), and make calls to the register and main functions of GIMP-fu in the same file.
See the example at http://www.ibm.com/developerworks/opensource/library/os-autogimp/ for how to arrange the file and make these calls (you can go straight to Listing 6).
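As a rough skeleton of that pattern (the procedure name, menu path, dates and defaults below are placeholders, not from the article), the plug-in file could look like:

#!/usr/bin/env python
from gimpfu import *

def export_layers(img, drawable, folder):
    # Same idea as the console snippet: save each layer as a separate PNG.
    for layer in img.layers:
        name = folder + "/" + layer.name + ".png"
        pdb.gimp_file_save(img, layer, name, name)

register(
    "python_fu_export_layers_sketch",        # hypothetical procedure name
    "Export each layer to a separate PNG",
    "Saves every layer of the active image as its own PNG file",
    "you", "you", "2015",
    "Export layers (sketch)...",
    "*",
    [
        (PF_IMAGE, "img", "Input image", None),
        (PF_DRAWABLE, "drawable", "Input drawable", None),
        (PF_DIRNAME, "folder", "Output folder", "/tmp"),
    ],
    [],
    export_layers,
    menu="<Image>/Filters")

main()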