randomly selection of images from File - image-processing

I have a file that contains a 400 images. What I want is to separate this file into two files: train_images and test_images.
The train_images should contains 150 images selected randomly, and all these images must be different from each other. Then, the test_images should also contains 150 images selected randomly, and should be different from each other, even from the images selected in the file train_images.
I begin by writing a code that aims to select a random number of images from a Faces file and put them on train_images file. I need your help in order to respond to my behavior described above.
clear all;
close all;
clc;
Train_images='train_faces';
mkdir(Train_images);
ImageFiles = dir('Faces');
totalNumberOfImages = length(ImageFiles)-1;
scrambledList = randperm(totalNumberOfImages);
numberIWantToUse = 150;
loop_counter = 1;
for index = scrambledList(1:numberIWantToUse)
baseFileName = ImageFiles(index).name;
str = fullfile('faces', baseFileName); % Better than STRCAT
face = imread(str);
imwrite( face, fullfile(Train_images, ['hello' num2str(index) '.jpg']));
loop_counter = loop_counter + 1;
end
Any help will be very appreciated.

Your code looks good to me. When you implement the test, you can re-run the scrambledList = randperm(totalNumberOfImages); then select the first 150 elements in scrambledList as you did in training process.
You can also directly re-initialize the loop:
for index = scrambledList(numberIWantToUse+1 : 2*numberIWantToUse)
... % same thing you wrote in your training loop
end
with this approach, your test sample will be completely different from the training sample.

Supposing that you have the Bioinformatics Toolbox, you can use crossvalind using the parameter HoldOut:
This is an example. trainand test are logical arrays, so you can use findto get the actual indexes:
ImageFiles = dir('Faces');
ImageFilesIndexes = ones(1,length(ImageFiles )) %Use a numeric array instead the char array
proportion = 150/400; %Testing set
[train,test] = crossvalind('holdout',ImageFilesIndexes,proportion );
training_files = ImageFiles(train); %250 files: It is better to use more data to train
testing_files = ImageFiles(test); %150 files
%Then do whatever you like with the files
Other possibilities are dividerand ( Neural Network Toolbox) and cvpartition (Statistics Toolbox)

Related

Load and merge many files from S3 using Dask

I have about 1m "result" files in S3 bucket which I want to process. Each result file should be merge with additional columns from an associated "context" file, which I have about 50k of (i.e. each context is associated with about 20 results)
Processing it serially is slow so I am using dask to parallelize some of the work.
In my serial code, I just load everything up-front and merge them, e.g.
contexts_map = {get_context_id(ctx_file): load_context(ctx_file) for ctx_file in ctx_files}
data = []
for result_file in result_files:
ctx_id, res_id = get_context_and_res_id(result_file)
ctx = contexts_map[ctx_id]
data.append(process_result(ctx))
df = pd.DataFrame(data)
Initially I thought to divide the data and process in batches using dask (i.e. run the above in parallel on several batches) but then I read about dask bag and dask dataframe from_delayed and thought to use it. What I have:
delayed_get_context = delayed(get_context)
# load the contexts
ctx_map = {}
for ctx_file in ctx_files:
ctx_id = get_context_id(ctx_file)
ctx_map[ctx_file] = delayed_get_context(ctx_item)
# process the contexts
delayed_get_context_stats = delayed(get_context_stats)
ctx_stat_map = {ctx_id: delayed_get_context_stats(ctx) for ctx_id, ctx in ctx_map}
# the main bag of result files to process
res_bag = db.from_sequence(res_items, npartitions=num_workers * 2)
# prepare a list of corresponding delayed per results
# the order in this list corresponds to order of res_bag
res_context_list = [
ctx_stat_map[get_context_and_res_id(item)[0]] for item in res_items
]
# then create a bag from that list
ctx_bag = db.from_sequence(res_context_list, npartitions=num_workers * 2)
# create delays for the results
delayed_extract = delayed(extract_stats)
# from what I understand, if one of the arguments is also a bug
# it is distributed in accordance to the "main" bag
results = res_bag.map(delayed_extract, ctx_stats=ctx_bag)
df = ddf.from_delayed(results)
df = df.compute()
df.to_csv("results.csv")
This create a computation graph similar to the following:
When I run this on a subset (as in the image above) it works ok. Running the code on 1m items, I don't see anything happen (maybe didn't wait enough for it to finish building the graph and moving things around?)
With that, does the code above makes sense? Should I have done it another way?
One of the things I am "afraid" of with the above implementation is that there's a lot of data movement.
I could potentially spend some time up-front to arrange context+results and then treat that as the "unit-of-work" and maybe get better results?
Any feedback here would be appreciated - is there a better approach?
And another question - what number of partitions I should use? I saw in the docs it will default to about 100, but is there some rule of thumb to use here?

options for saving xarray dataset with to_netcdf

I would like to add units, long_name, and maybe a description to a variable while using the to_netcdf command. Let me know if you know how.
Here is my code that work:
filename = path+'file.nc'
ds = xr.Dataset({'sla': (('time_counter','x', 'y'), SLA)}, coords={'time_counter':time_counter,'nav_lon':(('x','y'),lon),'nav_lat':(('x','y'),lat)})
ds.to_netcdf(filename, 'w')
Supplementary informations if you want to use this:
'sla' is the name I give while saving the variable SLA
SLA has 3 dimensions; I give them the names 'time_counter', 'x', and 'y'
I defined coordinates, one of which ('time_counter') is directly a dimension of SLA, but also it is possible to have a coordinate with multiple dimensions (e.g., 'nav_lon' and 'nav_lat' have 2 dimensions.
Here is the link that explain the function: http://xarray.pydata.org/en/stable/generated/xarray.Dataset.to_netcdf.html
You can set the attributes of each variable before saving the Dataset to NetCDF, for example (after creating your ds):
ds['sla'].attrs = {'units': 'something'}
After the to_netcdf() step I get (part of the ncdump -h):
double sla(time_counter, x, y) ;
...
sla:units = "something" ;

Lua random image

I have a Lua script I am using in a tabletop game and basically you have a "token" that represents a creature. When it dies, it overlays an image (which I have indicated in an .xml script) with an image of like a blood splat, or tombstone etc.
How do I make it so it would randomize which image gets overlayed?
The Script is here.
The lines below (178-184) are the main section that tells it "put image X over the token". I want it to randomize between say, 5 different images..
if not widgetDeathIndicator then
widgetDeathIndicator = tokenCT.addBitmapWidget("token_dead");
widgetDeathIndicator.setBitmap("token_dead");
widgetDeathIndicator.setName("deathindicator");
widgetDeathIndicator.setTooltipText(sName .. " has fallen, as if dead.");
widgetDeathIndicator.setSize(nWidth-20, nHeight-20);
end
token_dead is the name of the current image being used, which in the .xml directs to a .png
Yes, you can use math.random for this.
local images = {
'token_dead',
'another_image_name',
'yet_another_image_name',
}
local image = images[math.random(#images)]
math.random(n) will return a pseudo-random integer between 1 and n, so if you pass in #images (the length of the images table) you will get a valid pseudo-random table index for images.
To get better randomness you should set math.randomseed before you call math.random. (If you don't set it, then math.random will return the same sequence of "random" numbers each time.)

xtable with latex math symbols in the table

I'm trying to write this table with checkboxes using the xtable package. It would seem that the checkboxes that I've chosen are what is throwing the error. I'm just at a loss how to fix it.
library(xtable)
## I want to create a table with the names of some people in two columns
nStu = 10
## Create fake names
names = character(nStu)
for(i in 1:nStu){
names[i] = paste(LETTERS[i],rep(letters[i],5),sep='',collapse='')
}
## put check boxes behind each of the names
squares = rep('$ \\square $',nStu)
## Build the table
rosterTab = data.frame('Name'=names,'Mostly'=squares,'Sometimes'=squares, stringsAsFactors = FALSE)
## Now chop it in half and paste the halves together. (Yes, if nStu is odd, this will have to be fixed)
lTab = nStu%/%2
aTab = rosterTab[1:lTab, ]
bTab = rosterTab[(lTab+1):nStu, ]
outTab = cbind(aTab,bTab)
## Everything before this point runs fine.
outTab.tab = xtable(outTab,label=FALSE)
align(outTab.tab) = 'llcc||lcc'
print(outTab.tab, include.rownames=FALSE, sanitize.text.function = function(x){x})
The error message that I'm getting is:
Error in as.string(y) : Cannot coerce argument to a string
This error goes away if I use:
squares = rep('aaa',nStu)
Ideally, I want to get the names from a csv file (which I can do quite easily), and will use knitr to write this into a LaTeX document. (I want to do this for a bunch of input files, so automating this task seems useful to me.)
Here are some other ideas that I've considered:
A $\LaTeX$ only solution. Note, that there is one other difficulty (other than the squares not being in the input file), and that is that I would need to do some text manipulation on the strings in the input file.
Replacing the \square symbol with some other object that looks like a checkbox, that (through knitr) I can send to $\LaTeX$.

Gain information about objects in an image

I have a binary image which contains several separated regions. I want to put a threshold on the Area (number of pixels) that these regions occupy, in the way that: a region would be omitted if it has fewer pixels than the threshold. I already have tried these codes (using bwconncomp):
[...]
% let's assume threshold = 50
CC = bwconncomp(my_image);
L = labelmatrix(CC);
A = cell( size(CC.PixelIdxList,1) , size(CC.PixelIdxList,2) );
A = CC.PixelIdxList;
for column = 1 : size(CC.PixelIdxList,2)
if numel(CC.PixelIdxList{column}) < 50, A{column} = 0;
end
end
But at this point I don't know how to convert cell C back to the shape of my image and then show it! Are there any tricks to do that?
Is there any easier and straighter way to gain information about objects in an image than this one I used in here?
I also need to know length and width of these objects. These objects do not necessarily have any specific geometrical shape!
Thanks
Since no one took the effort to answer my question in here, I found it somewhere else. Now I'm coppying it to here, just in case if anyone novice like me might need to know that.
In order to know length and width of objects in an image:
labeledImage = bwlabel(my_image, 8);
regioninfo = regionprops(labeledImage , 'MajorAxisLength', 'MinorAxisLength');
lengths = [regioninfo.MajorAxisLength]; %array
widths = [regioninfo.MinorAxisLength]; %array

Resources