How to get annotations data from Open Images Dataset V6?

I used images and annotation data from the Open Images Dataset V6.
I was able to retrieve the images, but not the annotation information.
Can you please tell me what to do?
Current status
I ran the code on Google Colaboratory, following the FiftyOne demonstration.
I was able to run it up to the following point:
dataset = foz.load_zoo_dataset(
    "open-images-v6",
    split="validation",
    label_types=["detections", "classifications"],
    classes=["Bottle"],
    max_samples=250,
    seed=51,
    shuffle=True,
    dataset_name="open-images-sample-mix-data",
)
person_subset = foz.load_zoo_dataset(
    "open-images-v6",
    split="validation",
    label_types=["detections", "classifications"],
    classes=["Person"],
    max_samples=250,
    seed=51,
    shuffle=True,
    dataset_name="Person-subset",
)
can_subset = foz.load_zoo_dataset(
    "open-images-v6",
    split="validation",
    label_types=["detections", "classifications"],
    classes={'Tin can'},
    max_samples=250,
    seed=51,
    shuffle=True,
    dataset_name="Tin_can-subset",
)
box_subset = foz.load_zoo_dataset(
    "open-images-v6",
    split="validation",
    label_types=["detections", "classifications"],
    classes=["Box"],
    max_samples=250,
    seed=51,
    shuffle=True,
    dataset_name="Box-subset",
)
_ = dataset.merge_samples(person_subset)
_ = dataset.merge_samples(box_subset)
_ = dataset.merge_samples(can_subset)
From here, I want to open the detections.csv file and write the coordinate information into a generated text file, but the file is too large to open.
Could you please tell me how to get the coordinate information?

The base Open Images annotation CSV files are quite large. The best way to access the bounding box coordinates is to iterate over the FiftyOne dataset directly and read the coordinates from the FiftyOne Detection label objects:
bboxes = []
for sample in dataset:
    for detection in sample.detections.detections:
        bbox = detection.bounding_box
        bboxes.append(bbox)
In this loop, you can also access other information to store in your text file, like the sample IDs and classification annotations. While the loop is the most flexible way to get whatever information you want from the dataset, if you just need the bounding box coordinates, the most efficient way to get them is dataset.values():
bboxes = dataset.values("detections.detections.bounding_box")
Either way, you would then write these lists of box coordinates to a text file programmatically.
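For example, here is a minimal sketch of that last step (the output filename bboxes.txt and the CSV-style row layout are my own choices, not part of the original answer):

# Assumes the merged `dataset` from above; writes one row per detection
rows = []
for sample in dataset:
    if sample.detections is None:
        continue
    for detection in sample.detections.detections:
        # Relative coordinates: [top-left-x, top-left-y, width, height]
        x, y, w, h = detection.bounding_box
        rows.append(f"{sample.id},{detection.label},{x},{y},{w},{h}")

with open("bboxes.txt", "w") as f:
    f.write("\n".join(rows) + "\n")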

Related

Creating a new dataset of hidden state probabilities using an HMM results in different shapes after each run

I'm trying to create a new dataset of hidden state probabilities using a hidden Markov model. Everything works, except that each run produces different values (sometimes the same values) for hidden_states_train and hidden_states_test, which results in different column sizes in the column stacks / a feature mismatch, e.g. New dataset size (15261, 197) (5087, 194), New dataset size (15261, 197) (5087, 197), etc.
I can't figure out why this happens each time I run the code. I tried to give the same number of samples for both X_train_st and X_test_st, but it keeps happening. If I use a smaller range for n_comp, e.g. for n_comp in range(1,6), it often results in the same shapes.
Can someone shed some light to what's going on and a possible fix, please?
newX = X_train_st
newXtest = X_test_st
for n_comp in range(1,16):
    print("fitting to HMM and decoding %d ..." % n_comp, end="")
    modelHMM = GaussianHMM(n_components=n_comp, covariance_type="diag").fit(X_train_st)
    hidden_states_train = to_categorical(modelHMM.predict(X_train_st))
    hidden_states_test = to_categorical(modelHMM.predict(X_test_st))
    print("done")
    newX = np.column_stack((newX,hidden_states_train))
    newXtest = np.column_stack((newXtest,hidden_states_test))
    print('New dataset size',newX.shape,newXtest.shape)
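One likely cause (my assumption, not stated in the question): to_categorical infers the number of columns from the largest state index it sees, so if the fitted HMM never predicts some states for one of the splits, the two one-hot encodings end up with different widths. Passing num_classes explicitly pins both to n_comp columns; a minimal sketch of the two lines inside the loop:

# from tensorflow.keras.utils import to_categorical  (or keras.utils, depending on your setup)
# Pin the one-hot width so train and test always get the same number of columns
hidden_states_train = to_categorical(modelHMM.predict(X_train_st), num_classes=n_comp)
hidden_states_test = to_categorical(modelHMM.predict(X_test_st), num_classes=n_comp)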

Image preprocessing - Train and test image data are not being read in corresponding order

I am trying to train a CNN model for an image processing problem. I am facing a major issue in the preprocessing stage: the training datasets train_rain and train_no_rain are not in the order I wish them to be. This affects the performance of my model, as it is important for the model to see an image with rain streaks and then the same image without them.
Any solutions to this issue?
Here are the samples of what I am trying to imply -
Say after reading the datasets as shown below:
path_1 = "gdrive/My Drive/Rain100H/train/rainy"
train_rain = []
no_train_rain = 0
gauss_img = []
for img in glob.glob(path_1+"/*.png"):
im = cv.imread(img)
im = cv.resize(im,(128,128))
#Gaussian Blur
im_gb = cv.GaussianBlur(im,(5,5),0)
gauss_img.append(im_gb)
cv.waitKey()
no_train_rain+=1
train_rain.append(im)
train_no_rain = []
no_train_no_rain = 0
path_2 = "gdrive/My Drive/Rain100H/train/no rain"
for img in glob.glob(path_2+"/*.png"):
im = cv.imread(img)
im = cv.resize(im,(128,128))
cv.waitKey()
no_train_no_rain+=1
train_no_rain.append(im)
Now I want to view the first images from train_rain and train_no_rain after converting them to numpy arrays, and I did that using this:
import matplotlib.pyplot as plt
# first image from train_rain
plt.imshow(train_rain[1])
# first image from train_no_rain
plt.imshow(train_no_rain[1])
But ideally, the first image in train_no_rain should be the no-rain version of that same image.
PS: The datasets have all the images beforehand, it's just that they are not being read in a particular order.
Any sort of help would be much appreciated :)
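A minimal sketch of one possible fix (an assumption on my part, since the folder contents aren't shown: it relies on glob.glob returning files in arbitrary order and on matching rainy/no-rain images sharing the same filenames) is to sort the file lists before reading:

import glob
import cv2 as cv

def load_images(path):
    # Sorting the filenames makes both folders come back in the same, deterministic order
    images = []
    for img_path in sorted(glob.glob(path + "/*.png")):
        im = cv.imread(img_path)
        im = cv.resize(im, (128, 128))
        images.append(im)
    return images

train_rain = load_images("gdrive/My Drive/Rain100H/train/rainy")
train_no_rain = load_images("gdrive/My Drive/Rain100H/train/no rain")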

Open Cascade Write glTF Writer

Open Cascade has glTF writer in their current development branch - RWGltf_CafWriter
I am trying to convert STP to glTF using it and got starting point from this question - Any Open source Libraries to Convert STEP files to glTF file format?
It looks doable, but I am new to Open Cascade technology and have a few questions:
While calculating triangulation for shapes using BRepMesh_IncrementalMesh, it needs line deflection and angle deflection, what are these and what should be its values?
RWGltf_CafWriter requires TDocStd_Document and TDF_LabelSequence, how do we get these from Shapes?
Thank You
While calculating triangulation for shapes using BRepMesh_IncrementalMesh,
it needs line deflection and angle deflection,
what are these and what should be its values?
Deflection parameters define the mesh quality. Within a specific domain / algorithm, you probably know in advance the applicable deviation of your geometry (like no more than 1 mm). However, in the context of visualization and an arbitrary CAD model, the linear deflection is usually defined relative to the bounding box of the document.
RWGltf_CafWriter requires TDocStd_Document and TDF_LabelSequence, how do we get these from Shapes?
TDocStd_Document is an XDE document supported by various file format translators, including STEP and glTF. If at that point you have a single TopoDS_Shape from a STEP file, then you probably used the simplified STEP translator STEPControl_Reader. To preserve the structure of the original document, it is better to use STEPCAFControl_Reader, which fills in an XDE document.
Within an XDE document, shapes (and not only shapes) are stored as Labels, so a TDF_LabelSequence collection is used to pass around information like the sequence of root shapes (model tree roots in the document), which are called Free Shapes:
// read / create / fill in the document
Handle(TDocStd_Document) theXdeDoc; // created in advance
STEPCAFControl_Reader aStepReader;
if (aStepReader.ReadFile ("myStep.stp") != IFSelect_RetDone) { /* parse error */ }
if (!aStepReader.Transfer (theXdeDoc)) { /* translation error */ }
...
// collect document roots into temporary compound
Handle(XCAFDoc_ShapeTool) aShapeTool = XCAFDoc_DocumentTool::ShapeTool (theXdeDoc->Main());
TDF_LabelSequence aRootLabels;
aShapeTool->GetFreeShapes (aRootLabels);
TopoDS_Compound aCompound;
BRep_Builder aBuildTool;
aBuildTool.MakeCompound (aCompound);
for (TDF_LabelSequence::Iterator aRootIter (aRootLabels); aRootIter.More(); aRootIter.Next())
{
    const TDF_Label& aRootLabel = aRootIter.Value();
    TopoDS_Shape aRootShape;
    if (XCAFDoc_ShapeTool::GetShape (aRootLabel, aRootShape))
    {
        aBuildTool.Add (aCompound, aRootShape);
    }
}
// perform meshing
Handle(Prs3d_Drawer) aDrawer = new Prs3d_Drawer(); // holds visualization defaults
BRepMesh_IncrementalMesh anAlgo;
anAlgo.ChangeParameters().Deflection = Prs3d::GetDeflection (aCompound, aDrawer);
anAlgo.ChangeParameters().Angle      = 20.0 * M_PI / 180.0; // 20 degrees
anAlgo.ChangeParameters().InParallel = true;
anAlgo.SetShape (aCompound);
anAlgo.Perform();
...
// write or export the document
TColStd_IndexedDataMapOfStringString aMetadata;
RWGltf_CafWriter aGltfWriter ("exported.glb", true);
// STEP reader translates into mm units by default
aGltfWriter.ChangeCoordinateSystemConverter().SetInputLengthUnit (0.001);
aGltfWriter.ChangeCoordinateSystemConverter().SetInputCoordinateSystem (RWMesh_CoordinateSystem_Zup);
if (!aGltfWriter.Perform (theXdeDoc, aMetadata, Handle(Message_ProgressIndicator)())) { /* export error */ }
In Draw Harness the conversion may look like this (the source code of these commands can be used as a helpful reference for working code using the related OCCT algorithms):
pload XDE OCAF VISUALIZATION MODELING
# read STEP file into XDE document
ReadStep D myStep.stp
# display the document in 3D viewer (will also compute default triangulation)
vinit
XDisplay -dispMode 1 D
vfit
# export XDE document into glTF file
WriteGltf D myGltf.glb

How to get features from several layers using c++ in caffe

How can I get both the 4096 dim feature layer and the 1000 dim class layer in caffe after one forward pass using C++?
I tried to look it up in extract_features.cpp but it uses some weird datum object, so I cannot really understand how it works.
So far I have simply been cropping my prototxt files up to the layer that I wanted to extract and used:
[...]
net->ForwardPrefilled();
Blob<float> *output_layer = net->output_blobs()[0];
const float *begin = output_layer->cpu_data();
const float *end = begin + output_layer->channels();
return vector<float>(begin, end);
but that does not work if I want to extract two specific layers (e.g. "prob" and "fc7") simultaneously.
Update
The simple workflow of extract_features.cpp (assuming you have a shared_ptr<Net<float> > net object in C++):
1. Perform a net forward pass to process the input: net->Forward().
In this step, there is a Data layer in the net that reads the input images. So if, in your own app/code, you want to read an image into a cv::Mat and feed it into the net, you can write code like:
// for data preprocess
shared_ptr<caffe::DataTransformer<float> > data_transformer;
caffe::TransformationParameter trans_para;
// set mean
trans_para.set_mean_file("/path/to/image_mean.binaryproto");
// set crop size, e.g. here we crop to 227x227
trans_para.set_crop_size(227);
// instantiate a DataTransformer using trans_para for image preprocess
data_transformer.reset(new caffe::DataTransformer<float>(trans_para, caffe::TEST));
const std::vector<caffe::Blob<float> *> net_input = net->input_blobs();
// maybe you need to resize image before this step
data_transformer->Transform(image, *net_input[0]);
net->Forward();
And the net.prototxt should have an Input layer as the first layer, e.g. this deploy.prototxt.
2. Get the feature blobs according to their names: const boost::shared_ptr<Blob<Dtype> > feature_blob = net->blob_by_name(blob_names[i]).
3. Extract the feature data from the blob into a structure you want, e.g. an array; a simple sample can be:
count = feature_blob->channels() * feature_blob->height() * feature_blob->width();
float* feature_array = new float[count];
// feature data generated from the nth input image within a batch
const float* feature_blob_data = feature_blob->cpu_data() + feature_blob->offset(n);
memcpy(feature_array, feature_blob_data, count * sizeof(float));
... // other operations
delete [] feature_array;
Note that the data stored from feature_blob_data is in row-major order.
The usage of extract_features.cpp for your task should be like this:
path/to/extract_features your_pretrained_model.caffemodel \
net.prototxt 4096_dim_feature_blob_name,1000_dim_class_feature_blob_name \
saved_4096_dim_feature_database,saved_1000_dim_class_feature_database \
num_mini_batches(times for forward pass) lmdb(or leveldb) GPU(or CPU)
The net.prototxt should contain a data layer that can read the input image data.
When run, it will first read the image data via the data layer within net.prototxt, perform num_mini_batches forward passes, extract the data of the two feature blobs 4096_dim_feature_blob_name and 1000_dim_class_feature_blob_name into structures of type Datum, and then serialize them into the databases saved_4096_dim_feature_database and saved_1000_dim_class_feature_database, which are of type lmdb or leveldb.
When finished, you can read the saved feature data back from saved_4096_dim_feature_database and saved_1000_dim_class_feature_database using a data layer in a net.prototxt, respectively.
BTW, Datum is a structure that can store at most 4D data as well as the data's shape, label information, etc. It is defined in caffe.proto, generated using Google protobuf, and is convenient for data interchange between Caffe and databases like LMDB and LevelDB.

blockproc in MATLAB with two output variables

I have the following problem. I have to compute dense SIFT interest points in a very high-resolution image (182 MP). When I run the code on the full image, MATLAB always closes suddenly. So I decided to run the code on image patches.
the code
I tried to use blockproc in MATLAB to call the C++ function that performs the dense SIFT interest point detection this way:
fun = @(block_struct) denseSIFT(block_struct.data, options);
[dsift , infodsift] = blockproc(ndvi,[1000 1000],fun);
where dsift is the sift descriptors (vectors) and infodsift has the information of the interest points, such as the x and y coordinates.
the problem
The problem is that blockproc only allows one output, but I want both outputs. MATLAB gives the following error when I run the code:
Error using blockproc
Too many output arguments.
Is there a way for me to do this?
Would it be a problem for you to "hard code" a version of blockproc?
Assuming for a moment that you can divide your image into NxM smaller images, you could loop around as follows:
bigImage = someFunction();
sz = size(bigImage);
smallSize = sz ./ [N M];
dsift = cell(N,M);
infodsift = cell(N,M);
for ii = 1:N
    for jj = 1:M
        smallImage = bigImage((ii-1)*smallSize(1) + (1:smallSize(1)), (jj-1)*smallSize(2) + (1:smallSize(2)));
        [dsift{ii,jj}, infodsift{ii,jj}] = denseSIFT(smallImage, options);
    end
end
The results will then be in the two cell arrays. No real need to pre-allocate, but it's tidier if you do. If the individual matrices are the same size, you can convert into a single large matrix with
dsiftFull = cell2mat(dsift);
Almost magic. This won't work if your matrices are different sizes - but then, if they are, I'm not sure you would even want to put them all in a single one (unless you decide to horzcat them).
If you do decide you want a list of "all the columns as a giant matrix", then you can do
giantMatrix = [dsift{:}];
This will return a matrix with (in your example) 128 rows, and as many columns as there were "interest points" found. It's shorthand for
giantMatrix = [dsift{1,1} dsift{2,1} dsift{3,1} ... dsift{N,M}];
