I'm trying to make custom point geometry with colors in QtQuick3D. When I use only the position semantic (setting only XYZ coordinates as vertex data), everything works. But now I would like to use the RGB colors from a laser scanner. I've written something like this, but nothing appears on the screen. What am I doing wrong? The "RGB" values are normalized (from 0 to 1) and the "A" values are all ones.
Sample data:
[[-9.66899963e+02  4.84399994e+02 -1.10599991e+02  2.35294119e-01  2.54901975e-01  1.56862751e-01  1.00000000e+00]
 [-5.90700012e+02 -1.01000000e+02 -2.39400009e+02  4.11764711e-01  4.66666669e-01  2.47058824e-01  1.00000000e+00]
 [-5.92099976e+02 -1.00899994e+02 -2.39199997e+02  4.50980395e-01  4.94117647e-01  2.66666681e-01  1.00000000e+00]
 [-5.92099976e+02 -1.01500000e+02 -2.39199997e+02  4.43137258e-01  4.90196079e-01  2.66666681e-01  1.00000000e+00]
 [-6.40899963e+02 -1.00899994e+02 -2.08800003e+02  3.52941185e-01  3.92156869e-01  2.19607845e-01  1.00000000e+00]
 [-6.40799988e+02 -1.01500000e+02 -2.08699982e+02  3.76470596e-01  4.07843143e-01  2.15686277e-01  1.00000000e+00]
 [-6.03599976e+02 -1.62100006e+02 -2.71199982e+02  1.64705887e-01  1.52941182e-01  9.80392173e-02  1.00000000e+00]
 [-6.22200012e+02 -1.51400009e+02 -2.94600006e+02  1.96078435e-01  1.92156866e-01  1.17647059e-01  1.00000000e+00]
 [-6.08099976e+02 -1.62500000e+02 -2.85700012e+02  1.41176477e-01  1.45098045e-01  9.41176489e-02  1.00000000e+00]
 [-6.22400024e+02 -1.51199997e+02 -2.93899994e+02  1.96078435e-01  1.92156866e-01  1.17647059e-01  1.00000000e+00]]
CustomGeometry updateData method:
def updateData(self):
    self.clear()

    stride = 7      # floats per vertex: x y z r g b a
    FLOAT_SIZE = 4  # bytes per 32-bit float

    mins = np.amin(self.xyzrgba, axis=0)
    maxs = np.amax(self.xyzrgba, axis=0)

    self.setPrimitiveType(QQuick3DGeometry.PrimitiveType.Points)
    self.setStride(stride * FLOAT_SIZE)
    self.setBounds(QVector3D(mins[0], mins[1], mins[2]),
                   QVector3D(maxs[0], maxs[1], maxs[2]))

    # positions: 3 floats at offset 0; colors: 4 floats at offset 12
    self.addAttribute(QQuick3DGeometry.Attribute.PositionSemantic,
                      0, QQuick3DGeometry.Attribute.F32Type)
    self.addAttribute(QQuick3DGeometry.Attribute.ColorSemantic,
                      3 * FLOAT_SIZE, QQuick3DGeometry.Attribute.F32Type)

    self.setVertexData(self.xyzrgba.tobytes())
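One thing worth double-checking (an assumption on my part, since the array's dtype isn't shown): F32Type expects 32-bit floats, but NumPy arrays default to float64, in which case tobytes() emits 8 bytes per component and the stride and offsets above no longer match the data. A minimal packing sketch:

import numpy as np

# hypothetical point; real data comes from the laser scanner
xyzrgba = np.array([[-966.9, 484.4, -110.6, 0.235, 0.255, 0.157, 1.0]])

# force 32-bit floats and a C-contiguous layout so that tobytes()
# matches stride = 7 * 4 bytes with F32Type attributes
vertex_data = np.ascontiguousarray(xyzrgba, dtype=np.float32).tobytes()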
QML:
Model {
    id: modelPCL
    visible: cbPCL.checked
    geometry: PointCloudGeometry { }
    objectName: "PointCloud"
    materials: DefaultMaterial {}
    pickable: true
    property bool isPicked: false
}
I want to optimize the size of my vertex buffer. Currently my VBO layout is:
x y | r g b a
It's consumed by the shader like this:
struct VertexInput {
    @location(0) position: vec2<f32>,
    @location(1) color: vec4<f32>,
}
And I'm storing the meshes in one buffer like this: |Mesh1|Mesh2|LargeMesh3|, because my meshes are dynamic. Everything is rendered in one draw call (this seems to be called draw call batching).
I want to reduce the data sent to the GPU by setting a color per mesh instead of per vertex, and every mesh has a different color. How can I achieve it?
I'm drawing strokes:
I achieved it, with @trojanfoe's help, using multiple draw calls.
I created a second vertex buffer with stepMode: 'instance' and passed the per-mesh colors through it.
Layout:
vertex: {
    module: this.shaderModule,
    entryPoint: 'vertex',
    buffers: [
        {
            arrayStride: 2 * VBO_ARRAY.BYTES_PER_ELEMENT,
            stepMode: 'vertex',
            attributes: [
                {
                    format: 'float32x2',
                    offset: 0,
                    shaderLocation: 0,
                },
            ],
        },
        {
            arrayStride: 4 * VBO_ARRAY.BYTES_PER_ELEMENT,
            stepMode: 'instance',
            attributes: [
                {
                    format: 'float32x4',
                    offset: 0,
                    shaderLocation: 1,
                },
            ],
        },
    ],
}
Added to renderPass:
pass.setVertexBuffer(0, this.vbo.buffer)
pass.setVertexBuffer(1, this.clrbo.buffer)
And used it in the shader as-is:
struct VertexInput {
    @location(0) position: vec2<f32>,
    @location(1) color: vec4<f32>,
}
struct VSOutput {
    @builtin(position) position: vec4<f32>,
    @location(0) color: vec4<f32>,
}
@vertex
fn vertex(vert: VertexInput) -> VSOutput {
    var out: VSOutput;
    out.color = vert.color;
    ....
    return out;
}
@fragment
fn fragment(in: VSOutput) -> @location(0) vec4<f32> {
    return in.color;
}
However, I'm not sure this will work with multiple meshes merged into one buffer and rendered with a single draw call.
I have a collection of points in a tiling 2D space (I believe it's called toroidal geometry/space), and I want to find their mean:
The basic approach would be to just take their mean 'locally', treating the space as non-tiling. Looking at the example, I'd guess that would be somewhere around the middle. However, looking at the extended example, I'd say the middle is probably one of the worst representations of the data.
I'd say the objective is to find a location where the total variation from the mean is at a minimum.
One potential method would be to try all combinations of tile assignments: each of the n points can sit in any of the 9 neighbouring copies of the tile, and we pick the assignment with the lowest variance. But that becomes extremely inefficient very quickly:
O(9^n)
I believe it could be made more efficient by treating x and y independently (each coordinate has only 3 candidate offsets), but that would only reduce it to O(3^n) per axis, so still not manageable.
Perhaps hill-climbing might work: start from a random point, calculate the variance, then make some random adjustments and test again, reverting if the variance increases. I then repeat this until I reach a seemingly optimal value.
Is there a better method? Or maybe some sort of heuristic 'good enough' method?
As I understand it, you are trying to find the center of mass.
To do so (for each tile) you take the sum of the positions, each multiplied by its weight (assume the weight is 1, since these are identical points positioned differently), then divide by the sum of the weights (which is just the number of points when every weight is 1). The formula:
$$G(x, y) = \frac{\sum_{i=1}^{n} m_i \cdot p_i(x_i, y_i)}{\sum_{i=1}^{n} m_i}$$
In our example:
$$G(x, y) = \frac{\sum_{i=1}^{n} p_i(x_i, y_i)}{n}$$
Here is a Python implementation:
# assuming l is a list of 2D tuples: [(x1, y1), (x2, y2), ...]
def findCenter(l):
    cx, cy = 0, 0
    for p in l:
        cx += p[0]
        cy += p[1]
    # the result is a tuple of floats; if you need an integer result
    # use: return (cx // len(l), cy // len(l))
    # (the parentheses are optional; the caller can unpack the tuple
    # into 2 separate values)
    return (cx / len(l), cy / len(l))
As for multiple tiles, you can calculate the center for each tile and then the center of those centers, or treat all points from all tiles as points of one big tile and calculate the center of all of them.
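A quick usage sketch (the points here are made up):

points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
print(findCenter(points))  # (1.0, 2.0)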
I created a custom layer in Python so that I can feed the data directly, but I noticed it runs extremely slowly and GPU usage is at most 1% (the memory is allocated: when I run the script it allocates 2100 MB of VRAM, and terminating the training frees around 1 GB).
I'm not sure if this is expected behavior or if I'm doing something wrong.
Here is the script I wrote (based on this former PR):
import json
import caffe
import numpy as np
from random import shuffle
from PIL import Image


class MyDataLayer(caffe.Layer):
    """
    This is a simple data layer for training a network on CIFAR10.
    """

    def setup(self, bottom, top):
        self.top_names = ['data', 'label']

        # === Read input parameters ===
        # param_str is JSON (see the prototxt), so parse it directly
        params = json.loads(self.param_str)
        # Check the parameters for validity.
        check_params(params)

        # store input as class variables
        self.batch_size = params['batch_size']

        # Create a batch loader to load the images.
        self.batch_loader = BatchLoader(params, None)

        # === reshape tops ===
        # since we use a fixed input image size, we can shape the data layer
        # once. Else, we'd have to do it in the reshape call.
        top[0].reshape(self.batch_size, 3, params['im_height'], params['im_width'])
        # this is for our label; since we only have one label we set this to 1
        top[1].reshape(self.batch_size, 1)

        print_info("MyDataLayer", params)

    def forward(self, bottom, top):
        """
        Load data.
        """
        for itt in range(self.batch_size):
            # Use the batch loader to load the next image.
            im, label = self.batch_loader.load_next_image()
            # Add directly to the caffe data layer
            top[0].data[itt, ...] = im
            top[1].data[itt, ...] = label

    def reshape(self, bottom, top):
        """
        There is no need to reshape the data, since the input is of fixed size
        (rows and columns)
        """
        pass

    def backward(self, top, propagate_down, bottom):
        """
        This layer does not back-propagate.
        """
        pass


class BatchLoader(object):
    """
    This class abstracts away the loading of images.
    Images can either be loaded singly or in a batch. The latter is used by
    the asynchronous data layer to preload batches while other processing is
    performed.
    The label file has one "<filename> <label>" pair per line, e.g.:
        png_data_batch_1/leptodactylus_pentadactylus_s_000004.png 6
        png_data_batch_1/camion_s_000148.png 9
        png_data_batch_1/tipper_truck_s_001250.png 9
    """

    def __init__(self, params, result):
        self.result = result
        self.batch_size = params['batch_size']
        self.image_root = params['image_root']
        self.im_shape = [params['im_height'], params['im_width']]
        # get the list of all image filenames along with their labels
        self.image_labels = params['label']
        self.imagelist = [line.rstrip('\n\r') for line in open(self.image_labels)]
        self._cur = 0  # current image
        # this class does some simple data manipulations
        self.transformer = SimpleTransformer()
        print("BatchLoader initialized with {} images".format(len(self.imagelist)))

    def load_next_image(self):
        """
        Load the next image in a batch.
        """
        # Did we finish an epoch?
        if self._cur == len(self.imagelist):
            self._cur = 0
            shuffle(self.imagelist)

        # split "path/to/image.png 6" into filename and label
        image_and_label = self.imagelist[self._cur]
        image_file_name, label = image_and_label.rsplit(' ', 1)
        # load the image
        im = np.asarray(Image.open(self.image_root + '/' + image_file_name))
        # im = scipy.misc.imresize(im, self.im_shape)  # resize

        # do a simple horizontal flip as data augmentation
        flip = np.random.choice(2) * 2 - 1
        im = im[:, ::flip, :]

        # Prepare the ground truth. Caffe converts a scalar class label
        # internally, so the plain label number is all we need
        # (no one-hot encoding required).
        self._cur += 1
        return self.transformer.preprocess(im), int(label)


def check_params(params):
    """
    A utility function to check the parameters for the data layers.
    """
    required = ['batch_size', 'image_root', 'im_width', 'im_height', 'label']
    for r in required:
        assert r in params.keys(), 'Params must include {}'.format(r)


def print_info(name, params):
    """
    Output some info regarding the class.
    """
    print("{} initialized with bs: {}, im_shape: ({}, {}), image_root: {}, label file: {}.".format(
        name,
        params['batch_size'],
        params['im_height'],
        params['im_width'],
        params['image_root'],
        params['label']))


class SimpleTransformer:
    """
    SimpleTransformer is a simple class for preprocessing and deprocessing
    images for caffe.
    """

    def __init__(self, mean=[125.30, 123.05, 114.06]):
        self.mean = np.array(mean, dtype=np.float32)
        self.scale = 1.0

    def set_mean(self, mean):
        """
        Set the mean to subtract for centering the data.
        """
        self.mean = mean

    def set_scale(self, scale):
        """
        Set the data scaling.
        """
        self.scale = scale

    def preprocess(self, im):
        """
        preprocess() emulates the pre-processing occurring in the vgg16
        caffe prototxt.
        """
        im = np.float32(im)
        im = im[:, :, ::-1]  # change to BGR
        im -= self.mean
        im *= self.scale
        im = im.transpose((2, 0, 1))
        return im

    def deprocess(self, im):
        """
        inverse of preprocess()
        """
        im = im.transpose(1, 2, 0)
        im /= self.scale
        im += self.mean
        im = im[:, :, ::-1]  # change to RGB
        return np.uint8(im)
And in my train_test.prototxt file I have :
name: "CIFAR10_SimpleTest_PythonLayer"
layer {
name: 'MyPythonLayer'
type: 'Python'
top: 'data'
top: 'label'
include {
phase: TRAIN
}
python_param {
#the python script filename
module: 'mypythonlayer'
#the class name
layer: 'MyDataLayer'
#needed parameters in json
param_str: '{"phase":"TRAIN", "batch_size":10, "im_height":32, "im_width":32, "image_root": "G:/Caffe/examples/cifar10/testbed/Train and Test using Pycaffe", "label": "G:/Caffe/examples/cifar10/testbed/Train and Test using Pycaffe/train_cifar10.txt"}'
}
}
layer {
name: 'MyPythonLayer'
type: 'Python'
top: 'data'
top: 'label'
include {
phase: TEST
}
python_param {
#the python script filename
module: 'mypythonlayer'
#the class name
layer: 'MyDataLayer'
#needed parameters in json
param_str: '{"phase":"TEST", "batch_size":10, "im_height":32, "im_width":32, "image_root": "G:/Caffe/examples/cifar10/testbed/Train and Test using Pycaffe", "label": "G:/Caffe/examples/cifar10/testbed/Train and Test using Pycaffe/test_cifar10.txt"}'
}
}
What's wrong here?
Your data layer is not efficient enough and it takes most of the training time (you should try caffe time ... to get more detailed profiling). At each forward pass you are waiting for the python layer to read batch_size images from disk one after the other. This can take forever.
You should consider using multiprocessing to perform the reading in the background while the net is processing the previous batches; this should give you good CPU/GPU utilization.
See this example for a multiprocessing python data layer.
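For illustration only (the names below are mine, not from the linked example), the core of such a prefetching loader can be as small as a queue fed by a worker process:

import multiprocessing as mp

def _prefetch_worker(load_batch, queue):
    # runs in a separate process; put() blocks when the queue is full,
    # so at most `queue_size` batches are held in memory at once
    while True:
        queue.put(load_batch())

class PrefetchingLoader(object):
    """Wraps a picklable `load_batch` callable that returns one
    (data, label) batch, and preloads batches in the background."""
    def __init__(self, load_batch, queue_size=4):
        self._queue = mp.Queue(maxsize=queue_size)
        self._proc = mp.Process(target=_prefetch_worker,
                                args=(load_batch, self._queue))
        self._proc.daemon = True  # terminates with the parent
        self._proc.start()

    def next_batch(self):
        # near-instant whenever the worker keeps up with the GPU
        return self._queue.get()

forward() would then just dequeue with next_batch() and copy the arrays into top[0]/top[1] instead of reading images from disk itself.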
Python layers are executed on the CPU, not the GPU, so training is slow because data has to keep moving between the CPU and GPU. That's also why you see low GPU usage: the GPU is waiting on the CPU to execute the python layer.
The documentation for the TensorFlow operation tf.shape says:
This operation returns a 1-D integer tensor representing the shape of input.
However, when I call
features = {
    'k_mask': tf.VarLenFeature(tf.int64),
    'features': tf.VarLenFeature(tf.int64),
    'labels': tf.FixedLenFeature([3], tf.int64),
    'k_ids': tf.VarLenFeature(tf.int64)
}
parsed_features = tf.parse_single_example(example_proto, features)
features_index = tf.sparse_tensor_to_dense(parsed_features['features'])
print(sess.run(tf.shape(features_index)))
I get the result [[59]], which is a 2-D integer tensor. features_index prints as
[[ 6217 5882 17223 17235 6008 3580 17233 6038 16340 6116 5458 5747
5957 5755 17238 5745 6030 6078 5786 4373 5888 16284 3574 3569
5811 6117 5748 17228 5810 5833 5823 5885 5986 6034 5756 6105
5832 6199 6087 5744 6037 5933 6095 5785 16290 6124 3559 5787
6111 3570 6109 17322 3840 5962 3566 16950 6006 3584 6011]]
I thought this was a normal [1, 59] tensor, so I tried the following code:
v1 = tf.constant([[4,3,1,7]])
print(sess.run(v1)) # [[4 3 1 7]]
print(sess.run(tf.shape(v1))) # [1 4]
It looks as expected.
I want to transform features_index to shape [59, 1]. Does anyone know why the returned value is 2-D, and how to convert the tensor?
Finally solved it as follows:
features_index = tf.sparse_tensor_to_dense(parsed_features['features'])
indices = tf.reshape(features_index, [tf.shape(features_index)[0], -1])
which gives shape(indices) == [59, 1].
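Equivalently, since the dense tensor has shape [1, 59], a single reshape with an inferred first dimension does the same thing (continuing from the snippet above):

indices = tf.reshape(features_index, [-1, 1])  # [1, 59] -> [59, 1]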
The Metal Performance Shaders framework provides support for building your own convolutional neural nets. When creating, for instance, an MPSCNNConvolution, it requires a 4D weight tensor as an init parameter, represented as a 1D float pointer.
init(device: MTLDevice,
convolutionDescriptor: MPSCNNConvolutionDescriptor,
kernelWeights: UnsafePointer<Float>,
biasTerms: UnsafePointer<Float>?,
flags: MPSCNNConvolutionFlags)
The documentation has this to say about the 4D tensor
The layout of the filter weight is arranged so that it can be
reinterpreted as a 4D tensor (array)
weight[outputChannels][kernelHeight][kernelWidth][inputChannels/groups]
Unfortunately that information doesn't really tell me how to arrange a 4D array into a one-dimensional Float pointer.
I tried ordering the weights the way the BNNS counterpart requires them, but without luck.
How do I properly represent the 4D tensor (array) as a 1D Float pointer (array)?
PS: I tried arranging it like a C array and getting the pointer to the flat array, but it didn't work.
UPDATE
@RhythmicFistman: That's how I stored it in a plain array, which I can convert to an UnsafePointer<Float> (but it doesn't work):
var output = Array<Float>(repeating: 0, count: weights.count)
for o in 0..<outputChannels {
    for ky in 0..<kernelHeight {
        for kx in 0..<kernelWidth {
            for i in 0..<inputChannels {
                let offset = ((o * kernelHeight + ky) * kernelWidth + kx) * inputChannels + i
                output[offset] = ...
            }
        }
    }
}
OK, so I figured it out. Here are the two Python functions I use to reorder my convolution and fully-connected matrices:
# shape required for MPSCNN: [oC kH kW iC]
# TensorFlow order is:       [kH kW iC oC]
def convshape(a):
    a = np.swapaxes(a, 2, 3)
    a = np.swapaxes(a, 1, 2)
    a = np.swapaxes(a, 0, 1)
    return a

# fully connected only requires an x/y swap
def fullshape(a):
    a = np.swapaxes(a, 0, 1)
    return a
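If I'm reading the swaps right, the three swapaxes calls compose into a single transpose, so a one-call equivalent (same axis algebra, plus a copy to get a contiguous buffer) would be:

import numpy as np

# [kH, kW, iC, oC] -> [oC, kH, kW, iC]
def convshape_one_call(a):
    return np.ascontiguousarray(np.transpose(a, (3, 0, 1, 2)))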
This is something I recently had to do for Caffe weights, so I can provide the Swift implementation for how I reordered those. The following function takes a Float array of Caffe weights for a convolution (in [c_o][c_i][h][w] order) and reorders it into what Metal expects ([c_o][h][w][c_i] order):
public func convertCaffeWeightsToMPS(_ weights: [Float], kernelSize: (width: Int, height: Int), inputChannels: Int, outputChannels: Int, groups: Int) -> [Float] {
    var weightArray: [Float] = Array(repeating: 0.0, count: weights.count)
    var outputIndex = 0

    let groupedInputChannels = inputChannels / groups
    let outputChannelWidth = groupedInputChannels * kernelSize.width * kernelSize.height

    // MPS ordering: [c_o][h][w][c_i]
    for outputChannel in 0..<outputChannels {
        for heightInKernel in 0..<kernelSize.height {
            for widthInKernel in 0..<kernelSize.width {
                for inputChannel in 0..<groupedInputChannels {
                    // Caffe ordering: [c_o][c_i][h][w]
                    let calculatedIndex = outputChannel * outputChannelWidth + inputChannel * kernelSize.width * kernelSize.height + heightInKernel * kernelSize.width + widthInKernel
                    weightArray[outputIndex] = weights[calculatedIndex]
                    outputIndex += 1
                }
            }
        }
    }

    return weightArray
}
Based on my layer visualization, this seems to generate the correct convolution results (matching those produced by Caffe). I believe it also properly takes grouping into account, but I need to verify that.
Tensorflow has a different ordering than Caffe, but you should be able to change the math in the inner part of the loop to account for that.
The documentation here assumes some expertise in C. In that context, a[x][y][z] is typically collapsed into a 1-D array when x, y and z are constants known at compile time. When this happens, the z component varies most quickly, followed by y, followed by x (outside in).
If we have a[2][2][2], it is collapsed to 1D as:
{ a[0][0][0], a[0][0][1], a[0][1][0], a[0][1][1],
a[1][0][0], a[1][0][1], a[1][1][0], a[1][1][1] }
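The same collapse is easy to check with NumPy, which is row-major (C-order) by default:

import numpy as np

a = np.arange(8).reshape(2, 2, 2)  # a[x][y][z]
print(a.ravel())                   # [0 1 2 3 4 5 6 7]: z varies fastest
print(a[1, 0, 1] == a.ravel()[(1*2 + 0)*2 + 1])  # True: offset = (x*2 + y)*2 + z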
I think TensorFlow already has a convenient method for this task:
tf.transpose(aWeightTensor, perm=[3, 0, 1, 2])
Full documentation: https://www.tensorflow.org/api_docs/python/tf/transpose
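As a sketch (assuming eager-mode TensorFlow and a made-up 5x5x3x32 kernel), the transpose followed by flattening yields the 1-D buffer MPS expects:

import numpy as np
import tensorflow as tf

w_tf = tf.random.normal([5, 5, 3, 32])         # [kH, kW, iC, oC]
w_mps = tf.transpose(w_tf, perm=[3, 0, 1, 2])  # [oC, kH, kW, iC]
flat = np.ascontiguousarray(w_mps.numpy(), dtype=np.float32).ravel()
# `flat` is the contiguous float32 array to pass as kernelWeights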