Image Preprocessing in Machine Learning - machine-learning

So I recently took a deep dive into machine learning using Keras and TensorFlow. I have been working on a dataset for skin cancer detection; I have all the images in a separate folder, and together with them came two separate CSV files: hmnist_8_8_L (which has 64 columns, which I guess is an 8 by 8 pixel representation) and hmnist_8_8_RGB (which has 194 columns that I don't know how they got).
My worry is that perhaps I didn't get a clear understanding of how these two files were arrived at. How did hmnist_8_8_RGB.csv get 194 columns out of a single image?

Looking into the data (https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000#hmnist_8_8_RGB.csv) I see that the file contains 192 pixel columns plus the label.
So the column count in your hmnist_8_8_RGB.csv should be 193, not 194.
Since the images used for this file are in RGB, you have 8 x 8 x 3 = 192 pixel columns (pixels x pixels x color channels). The last column is the label category.
Please note that in hmnist_8_8_L the last column is also dedicated to the label.
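If it helps, here is a minimal sketch of how a single row of hmnist_8_8_RGB.csv can be reshaped back into an 8x8 RGB image. It assumes pandas and NumPy are available and that the pixel columns come first with the label last; the file path is just a placeholder.
import numpy as np
import pandas as pd

df = pd.read_csv('hmnist_8_8_RGB.csv')   # adjust the path to wherever the CSV lives

row = df.iloc[0]                  # first image in the file
pixels = row.values[:-1]          # the 192 pixel columns
label = row.values[-1]            # the label category in the last column

image = pixels.reshape(8, 8, 3).astype(np.uint8)   # height x width x RGB channels
print(image.shape, label)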
As a marginal note, in future try to give more context in your question. For example, a link to the dataset would have been appreciated.

Related

What's the dataset type in the TensorFlow Object Detection API?

I am trying to do my own object detection using my own dataset. I started my first machine learning program from the Google TensorFlow Object Detection API; the link is here: eager_few_shot_od_training_tf2_colab.ipynb
In the colab tutorial, the author uses JavaScript to label the images, and the result looks like this:
gt_boxes = [
np.array([[0.436, 0.591, 0.629, 0.712]], dtype=np.float32),
np.array([[0.539, 0.583, 0.73, 0.71]], dtype=np.float32),
np.array([[0.464, 0.414, 0.626, 0.548]], dtype=np.float32),
np.array([[0.313, 0.308, 0.648, 0.526]], dtype=np.float32),
np.array([[0.256, 0.444, 0.484, 0.629]], dtype=np.float32)
]
When I run my own program, I use labelImg instead of the JavaScript tool, but the resulting dataset is not compatible.
Now I have two questions. First, what is the dataset type in the colab tutorial: COCO, YOLO, VOC, or something else? Second, how do I convert between labelImg data and the colab tutorial's data? My goal is to label data with labelImg and then substitute it into the colab tutorial.
The "data type" are just ratio values based on the height and width of the image. So the coordinates are just ratio values for where to start and end the bounding box. Since each image is going to be preprocessed, that is, it's dimensions are changed when fed into the model (batch,height,width,channel) the bounding box coordinates must have the correct ratio as the image might change dimensions from it's original size.
Like for the example, the model expects images to be 640x640. So if you provide an image of 800x600 it has to be resized. Now if the model gave back the coordinates [100,100,150,150] for an 640x640, clearly that would not be the same for 800x600 images.
However, to get this data format you should use the PascalVOC format when labeling with labelImg.
The typical way to do this is to create TFRecord files and decode them in your training script in order to create datasets. However, you are free to choose whatever method you like to build a TensorFlow dataset for training your model.
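As a rough sketch (not the tutorial's own code), this is how you could parse a PascalVOC XML file written by labelImg and turn its pixel-coordinate boxes into normalized arrays like the colab's gt_boxes, assuming the usual TF Object Detection API ordering of [ymin, xmin, ymax, xmax]; the file names are placeholders.
import numpy as np
import xml.etree.ElementTree as ET

def voc_to_gt_boxes(xml_path):
    # Convert one labelImg PascalVOC annotation into normalized box coordinates.
    root = ET.parse(xml_path).getroot()
    height = float(root.find('size/height').text)
    width = float(root.find('size/width').text)
    boxes = []
    for obj in root.findall('object'):
        bb = obj.find('bndbox')
        xmin = float(bb.find('xmin').text)
        ymin = float(bb.find('ymin').text)
        xmax = float(bb.find('xmax').text)
        ymax = float(bb.find('ymax').text)
        # Divide by the image size so the values become ratios in [0, 1].
        boxes.append([ymin / height, xmin / width, ymax / height, xmax / width])
    return np.array(boxes, dtype=np.float32)

gt_boxes = [voc_to_gt_boxes(p) for p in ['img1.xml', 'img2.xml']]   # one XML per training image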
Hope this answered your questions.

Training on a big set of high-res pictures with turicreate out of memory

I'm trying to use Turicreate to train a model on some 150 pictures that are pretty high-res (4 MB each, 3000x5000).
I'm running
model = tc.object_detector.create(train_data, max_iterations=10)
and after a while I'm getting a 'low virtual memory' warning, and right after that my computer gets stuck.
I was wondering what the best practice is here for training on such a set of pictures.
Full code that I'm using:
import turicreate as tc
data = tc.SFrame('annotations.sframe')              # images plus bounding-box annotations
train_data, test_data = data.random_split(0.8)      # 80/20 train/test split
model = tc.object_detector.create(train_data, max_iterations=10)
predictions = model.predict(test_data)
metrics = model.evaluate(test_data)
model.save('mymodel.model')
model.export_coreml('MyCustomObjectDetector.mlmodel')   # Core ML export for iOS
Normally you'd want to reduce the batch size, i.e. how large a portion of the training data is used for one iteration. Apparently that's not easy to tweak yet in Turicreate, so it looks like the program is using the full dataset for one epoch. Ideally you'd use a smaller portion, for example 32 or 64 images. There's some discussion about the topic on GitHub, and apparently batch size as a public parameter might be coming in a future release.
3000 x 5000 is also fairly large for this kind of work. You will probably want to downsize the images, e.g. with bicubic interpolation as implemented in, for example, SciPy. Depending on the kind of images you are working with, even a factor of 10 along each dimension might not be too much shrinkage.
Reduce the size of the dataset images, for example to width 400 and height 300, and bump max_iterations up to at least 1000.
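A minimal downsizing sketch, assuming Pillow is installed and that the folder names are placeholders; note that the bounding-box annotations in the SFrame would have to be rescaled by the same factors.
from pathlib import Path
from PIL import Image

src = Path('images_full')
dst = Path('images_small')
dst.mkdir(exist_ok=True)

for path in src.glob('*.jpg'):
    img = Image.open(path)
    img = img.resize((400, 300), Image.BICUBIC)   # bicubic downsampling to 400x300
    img.save(dst / path.name)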

How could I reverse the effect of kernel Fisher?

I have used kernel Fisher discriminant analysis in my project and it worked just great, but my problem arises from the fact that when I mapped my dataset using kernel functions, all the data and also all the eigenvalues and eigenvectors live in that space, and for testing new samples I face some problems. Let me explain with an example. When I have, say, 50 samples with 10 features describing each sample, my data matrix is 50 by 10, and mapping it with the kernel function results in a 50 by 50 matrix in the new feature space, so the eigenvectors (W in FDA) are also in that 50-dimensional space. Now, for testing new samples, each a vector with 10 elements as its features, the mapped data matrix will be 10 by 10 and is not in the 50-dimensional space, so I can't project it onto W to determine which class it belongs to. Please help me, what should I do?
You are not supposed to map the testing points against themselves but against the training set. This is why kernel methods (especially non-sparse ones) in general do not scale well: you have to keep the whole training set around. Thus you obtain your projection through K(TEST_SAMPLES, TRAINING_SET), which is 10x50 and can be used in your 50-dimensional space.
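A small NumPy sketch of that idea, assuming an RBF kernel and using randomly generated stand-ins for the training data, the test data, and the discriminant directions W learned on the 50 training samples:
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    # K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

X_train = np.random.rand(50, 10)   # 50 training samples, 10 features each
X_test = np.random.rand(10, 10)    # 10 new samples to classify
W = np.random.rand(50, 2)          # discriminant directions in the 50-D kernel space

K_test = rbf_kernel(X_test, X_train)   # shape (10, 50): test points against the TRAINING set
projections = K_test @ W               # shape (10, 2): these projections can now be classified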

IJG library or Windows photo viewer DQT generation

The question is: how does Windows Photo Viewer generate DQ (discrete quantization) tables? And/or how does any editor or application that uses the IJG library generate DQ tables? I'm trying to find out the algorithm for recomputing these tables when an image is resaved, and the parameters used to compute them.
The IJG library uses a baseline set of quantization tables based on the sample ones in the JPEG standard (or at least it used to). It then uses the "quality" parameter to scale the values in those tables. Off the top of my head, something like a quality setting of 50 uses the unmodified table; quality values higher than that scale the values smaller, and lower quality values scale them larger.
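For reference, a Python sketch of the scaling scheme libjpeg applies in jpeg_quality_scaling / jpeg_add_quant_table; the sample row is the first row of the standard's example luminance table, and the exact clamping details may differ between versions.
def jpeg_scale_quant_table(base_table, quality):
    # Map quality 1..100 to a percentage scale factor, then scale each table entry.
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - quality * 2
    return [max(1, min(255, (v * scale + 50) // 100)) for v in base_table]

row = [16, 11, 10, 16, 24, 40, 51, 61]        # first row of the sample luminance table
print(jpeg_scale_quant_table(row, 50))        # unchanged (scale factor 100)
print(jpeg_scale_quant_table(row, 75))        # roughly halved -> finer quantization
print(jpeg_scale_quant_table(row, 25))        # roughly doubled -> coarser quantization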

How are matrices stored in memory?

Note: this may be more related to computer organization than software; I'm not sure.
I'm trying to understand something related to data compression, say for JPEG photos. Essentially a very dense matrix is converted (via discrete cosine transforms) into a much sparser matrix. Supposedly it is this sparse matrix that is stored. Take a look at this link:
http://en.wikipedia.org/wiki/JPEG
Compare the original 8x8 sub-block image example to matrix "B", which is transformed to have overall lower-magnitude values and many more zeros throughout. How is matrix B stored such that it saves much more memory than the original matrix?
The original matrix clearly needs 8x8 (number of entries) x 8 bits/entry, since values can range from 0 to 255. OK, so I think it's pretty clear we need 64 bytes of memory for this. Matrix B, on the other hand, hmmm. The best-case scenario I can think of is that values range from -26 to +5, so at most an entry (like -26) needs 6 bits (5 bits to represent 26, plus 1 sign bit, I guess). So then you could store 8x8x6 bits = 48 bytes.
The other possibility I see is that the matrix is stored in "zig-zag" order from the top left. Then we can specify a start and an end address and just keep storing along the diagonals until we're only left with zeros. Say it's a 32-bit machine; then two addresses (start + end) take 8 bytes; for the other non-zero entries at 6 bits each, say, we have to go along almost all the top diagonals to store a sum of 28 elements. In total this scheme would take about 29 bytes.
To summarize my question: if JPEG and other image encoders claim to save space by using algorithms to make the image matrix less dense, how is this extra space actually realized on my hard disk?
Cheers
The DCT needs to be accompanied by other compression schemes that take advantage of the zeros and other highly frequent symbols. A simple example is run-length encoding.
JPEG uses a variant of Huffman coding.
As it says under "Entropy coding", a zig-zag pattern is used together with RLE, which already reduces the size in many cases. However, as far as I know the DCT doesn't give a sparse matrix per se; rather, it concentrates the energy so that, after quantization, the matrix has much lower entropy. This is also the point where the compression becomes lossy: the input matrix is transformed with the DCT, then the values are quantized, and then Huffman coding is applied.
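To make that pipeline concrete, here is a small sketch (assuming NumPy and SciPy are available; the flat quantization table is a stand-in, not the standard's) of applying the DCT and quantization to one smooth 8x8 block:
import numpy as np
from scipy.fftpack import dct

def dct2(block):
    # 2-D DCT-II with orthonormal scaling, applied along both axes
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

block = np.add.outer(np.arange(8.0), np.arange(8.0)) * 10 - 128   # smooth gradient, level-shifted
coeffs = dct2(block)
quant_table = np.full((8, 8), 16.0)          # stand-in quantization table
quantized = np.round(coeffs / quant_table)   # the rounding here is the lossy step
print(int((quantized == 0).sum()), 'of 64 coefficients are zero after quantization')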
The simplest compression would take advantage of repeated sequences of symbols (zeros). A matrix in memory may look like this (suppose in the decimal system):
0000000000000100000000000210000000000004301000300000000004
After compression it may look like this
(0,13)1(0,11)21(0,12)43010003(0,11)4
(Symbol,Count)...
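A toy version of that idea in Python (illustrative only, not the actual JPEG entropy coder), replacing runs of zeros longer than a small threshold with (0, count) pairs:
def rle_zeros(symbols, min_run=4):
    # Collapse runs of '0' of length >= min_run into '(0,count)'; keep everything else literal.
    out = []
    run = 0
    def flush():
        nonlocal run
        out.append(f'(0,{run})' if run >= min_run else '0' * run)
        run = 0
    for s in symbols:
        if s == '0':
            run += 1
        else:
            flush()
            out.append(s)
    flush()
    return ''.join(out)

print(rle_zeros('0000000000000100000000000210000000000004301000300000000004'))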
As I understand it, JPEG doesn't only compress, it also drops data. After the 8x8 block is transformed to the frequency domain, the insignificant (high-frequency) data is dropped, which means it effectively only has to keep the significant 6x6 or even 4x4 portion. That is how it can achieve a higher compression rate than lossless methods (like GIF).
