Is it possible to use an INT8 input layer for TensorRT?

I want to have the input layer as an 8-bit integer, to avoid the int -> float conversion on the CPU:
ITensor* data = network->addInput(
    m_InputBlobName.c_str(), nvinfer1::DataType::kINT8,
    DimsCHW{static_cast<int>(m_InputC), static_cast<int>(m_InputH),
            static_cast<int>(m_InputW)});
but it gives me this error message:
[E] [TRT] Parameter check failed at: ../builder/Network.cpp::addInput::466, condition: type != DataType::kINT8
Is it possible to make this work, or is INT8 only intended to be used for approximate calculations?

I found the Python API description of addInput:
add_input()
addInput(const char *name, DataType type, Dims dimensions)=0 -> ITensor *
Add an input tensor to the network.
The name of the input tensor is used to find the index into the buffer array for an engine built from the network.
Parameters:
name – The name of the tensor.
type – The type of the data held in the tensor.
dimensions – The dimensions of the tensor.
Only DataType::kFLOAT, DataType::kHALF and DataType::kINT32 are valid input tensor types. The volume of the dimensions, including the maximum batch size, must be less than 2^30 elements.
Returns: The new tensor or None if there is an error.
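Per that excerpt, the workaround is to declare the input as one of the supported types and feed float data; with INT8 mode enabled, quantization then happens inside the engine rather than at the input. A minimal sketch with the TensorRT Python API (the tensor name and shape here are illustrative assumptions, not taken from the question):
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()

# addInput only accepts kFLOAT, kHALF or kINT32, so declare the input as
# float32; INT8 is applied internally via calibration, not at the input.
data = network.add_input("data", trt.float32, (3, 224, 224))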

Related

Getting the error "dtw() got an unexpected keyword argument 'dist'" while calculating dtw of 2 voice samples

I am getting the error "dtw() got an unexpected keyword argument 'dist'" while I'm trying to calculate the dtw of 2 wav files. I can't figure out why or what to do to fix it. I am attaching the code below.
import librosa
import librosa.display
y1, sr1 = librosa.load('sample_data/Abir_Arshad_22.wav')
y2, sr2 = librosa.load('sample_data/Abir_Arshad_22.wav')
%pylab inline
subplot(1, 2, 1)
mfcc1 = librosa.feature.mfcc(y1, sr1)
librosa.display.specshow(mfcc1)
subplot(1, 2, 2)
mfcc2 = librosa.feature.mfcc(y2, sr2)
librosa.display.specshow(mfcc2)
from dtw import dtw
from numpy.linalg import norm
dist, cost, acc_cost, path = dtw(mfcc1.T, mfcc2.T, dist=lambda x, y: norm(x - y, ord=1))
print ('Normalized distance between the two sounds:', dist)
The error occurs in the second-to-last line.
The error message is straightforward. Let's read the docs of the method you are calling:
https://dynamictimewarping.github.io/py-api/html/api/dtw.dtw.html#dtw.dtw
The dtw function has the following parameters:
x – query vector or local cost matrix
y – reference vector, unused if x given as cost matrix
dist_method – pointwise (local) distance function to use
step_pattern – a stepPattern object describing the local warping steps allowed with their cost (see [stepPattern()])
window_type – windowing function. Character: “none”, “itakura”, “sakoechiba”, “slantedband”, or a function (see details)
open_begin, open_end – perform open-ended alignments
keep_internals – preserve the cumulative cost matrix, inputs, and other internal structures
distance_only – only compute distance (no backtrack, faster)
You are trying to pass an argument named dist, and that argument simply is not known to this implementation; its pointwise distance parameter is called dist_method. Removing the offending argument avoids the TypeError:
alignment = dtw(mfcc1.T, mfcc2.T)
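Note that, per the linked docs, this dtw returns an alignment object rather than a tuple, so the distance is read from an attribute. A sketch under that assumption, keeping the original L1 metric (dist_method accepts a callable, which is forwarded to scipy's cdist):
from dtw import dtw
from numpy.linalg import norm

# Pass the pointwise metric under its dtw-python name, dist_method.
alignment = dtw(mfcc1.T, mfcc2.T, dist_method=lambda x, y: norm(x - y, ord=1))
print('Normalized distance between the two sounds:', alignment.normalizedDistance)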

How are floating-point pixel values converted to integer values?

How do image libraries (such as PIL, OpenCV, etc.) convert floating-point values to integer pixel values?
For example
import numpy as np
from PIL import Image

# Creates a random image and saves it in a file
def get_random_img(m=0, s=1, fname='temp.png'):
    im = m + s * np.random.randn(60, 60, 3)  # e.g. min: -3.8947058634971179, max: 3.6822041760496904
    print(im[0, 0])  # e.g. array([ 0.36234732, 0.96987366, 0.08343])
    imp = Image.fromarray(im, 'RGB')  # (*)
    print(np.array(imp)[0, 0])  # [140, 74, 217]
    imp.save(fname)
    return im, imp
For the above method, example values are shown in the comments (randomly produced). My question is: how does the line marked (*) convert an ndarray (whose values can range from minus infinity to plus infinity) to pixel values between 0 and 255?
I tried to investigate the PIL.Image.fromarray method and eventually ended up at line #798, d.decode(data), within the PIL.Image.Image.frombytes method. I could not find the implementation of the decode method, so I was unable to learn what computation goes on behind the conversion.
My initial thought was that maybe the method maps the minimum value in the array to 0 and the maximum to 255, scaling everything else accordingly. But upon investigation, I found that's not what happens. Moreover, how does it handle arrays whose values range between 0 and 1, or any other range?
Some libraries assume that floating-point pixel values are between 0 and 1, and will linearly map that range to 0 and 255 when casting to 8-bit unsigned integer. Some others will find the minimum and maximum values and map those to 0 and 255. You should always explicitly do this conversion if you want to be sure of what happened to your data.
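For example, a minimal numpy sketch of doing the conversion explicitly (img here is a hypothetical stand-in for your float image):
import numpy as np

img = np.random.randn(60, 60, 3)  # stand-in for your float image

# If the float data is known to lie in [0, 1]:
img_u8 = np.clip(img * 255.0, 0, 255).astype(np.uint8)

# Or map the observed min/max onto the full 8-bit range:
img_u8 = ((img - img.min()) / (img.max() - img.min()) * 255).astype(np.uint8)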
In general, a pixel does not need to be 8-bit unsigned integer. A pixel can have any numerical type. Usually a pixel intensity represents an amount of light, or a density of some sort, but this is not always the case. Any physical quantity can be sampled in 2 or more dimensions. The range of meaningful values thus depends on what is imaged. Negative values are often also meaningful.
Many cameras have 8-bit precision when converting light intensity to a digital number. Likewise, displays typically have an 8-bit intensity range. This is the reason many image file formats store only 8-bit unsigned integer data. However, some cameras have 12 bits or more, and some processes derive pixel data at a higher precision that one does not want to quantize away. Therefore formats such as TIFF and ICS will allow you to save images in just about any numeric format you can think of.
I'm afraid it has done nothing anywhere near as clever as you hoped! It has merely interpreted the first byte of the first float as a uint8, then the second byte as another uint8...
import numpy as np
from PIL import Image

# Generate repeatable random data, so other folks get the same results
np.random.seed(42)

# Make a single RGB pixel
im = np.random.randn(1, 1, 3)

# Print the floating point values - not that we are interested in them
print(im)
# OUTPUT: [[[ 0.49671415 -0.1382643 0.64768854]]]

# Save that pixel to a file so we can dump it
im.tofile('array.bin')

# Now make a PIL Image from it and print the uint8 RGB values
imp = Image.fromarray(im, 'RGB')
print(imp.getpixel((0,0)))
# OUTPUT: (124, 48, 169)
So, PIL has interpreted our data as RGB=124/48/169
Now look at the hex we dumped. It is 24 bytes long, i.e. 3 float64 (8-byte) values, one for red, one for green and one for blue for the 1 pixel in our image:
xxd array.bin
Output
00000000: 7c30 a928 2aca df3f 2a05 de05 a5b2 c1bf |0.(*..?*.......
00000010: 685e 2450 ddb9 e43f h^$P...?
And the first byte (7c) has become 124, the second byte (30) has become 48 and the third byte (a9) has become 169.
TLDR; PIL has merely taken the first byte of the first float as the Red uint8 channel of the first pixel, then the second byte of the first float as the Green uint8 channel of the first pixel and the third byte of the first float as the Blue uint8 channel of the first pixel.
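A quick numpy check of that claim, reusing the seeded im from the snippet above:
# The first three raw bytes of the float64 buffer are exactly the RGB
# values PIL reported for this pixel.
raw = im.tobytes()
print(raw[0], raw[1], raw[2])
# OUTPUT: 124 48 169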

nclass * nsamples in random forest is higher than the largest number supported by Integer type

I'm running random forest on a dataset with nsample = 20,379,102 and 5 features. The target feature is categorical with nclass = 107,189 levels. I receive the following error:
Error in integer(nclass * nsample) : vector size cannot be NA
Calls: randomForest ... randomForest.formula -> randomForest.default -> integer
In addition: Warning message:
In nclass * nsample : NAs produced by integer overflow
Execution halted
Obviously nclass * nsample in the randomForest source code is allocated as an integer, and in my problem the product (107,189 × 20,379,102 ≈ 2.18e12) is bigger than the largest number R's integer type supports (2^31 - 1 = 2,147,483,647).
I thought about training on several parts of the data and combining the models; however, the largest dataset I could train on contains nsample = 175,545 and nclass = 1,257, which is a very small portion of the data.
Can you suggest any way around this integer limitation?
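For scale, a quick check of the product against R's integer limit (a sketch in Python; the sizes are the ones from the question, and 2^31 - 1 is R's .Machine$integer.max):
nclass, nsample = 107_189, 20_379_102   # sizes from the question
r_integer_max = 2**31 - 1               # R's .Machine$integer.max

print(nclass * nsample)                 # about 2.18e12
print(nclass * nsample > r_integer_max) # True: the product overflows R's integer type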

Convert indices in longtensor format to binary selection mask in torch

I have a LongTensor which contains all the indices I want from another tensor. How can I convert this LongTensor into a ByteTensor that can be used as a selection mask?
Assume,
th> imageLabels:size()
17549
3
[torch.LongStorage of size 2]
[0.0001s]
th> indices
1
22
32
[torch.LongTensor of size 3]
I need a way to access imageLabels using [index] notation so that I can change some values in imageLabels in-place.
Is there any way to do this? As far as I understand from the docs, the :index and :narrow operations return a completely new Tensor.
Correct: :index returns a new tensor with its own storage, while :narrow returns a new tensor that shares the original storage, as stated in the docs: "For methods narrow, select and sub the returned tensor shares the same Storage as the original".
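The copy-versus-view distinction can be sketched in PyTorch for illustration (the question itself is Lua Torch; the semantics shown here are PyTorch's):
import torch

t = torch.arange(6)
v = t.narrow(0, 0, 3)                            # narrow returns a view sharing storage
c = t.index_select(0, torch.tensor([0, 1, 2]))   # index_select returns a copy

v[0] = 99
print(t[0])  # tensor(99): writing through the view changed the original
print(c[0])  # tensor(0): the copy is unaffected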
I ended up using indexFill.
targetTensor:indexFill(1, indices, 0)
The first argument is the dimension, indices is the LongTensor containing all the indices we are interested in, and 0 is the value to fill in (it can be any number).
Hope this helps. It's all in the docs; we just have to read them patiently.
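For completeness, the mask construction the question asked for, sketched in PyTorch (sizes taken from the session above; Lua's 1-based indices become 0-based, and imageLabels stands in for the real tensor):
import torch

n = 17549                                  # number of rows in imageLabels
idx = torch.tensor([1, 22, 32]) - 1        # convert 1-based Lua indices to 0-based

mask = torch.zeros(n, dtype=torch.uint8)   # ByteTensor-style mask of zeros
mask[idx] = 1                              # mark the wanted rows
# selected = imageLabels[mask.bool()]      # boolean selection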

vDSP_fft_zrip understanding the transformed DSPSplitComplex content

Assuming "A" is a real vector packed (with vDSP_ctoz) in the proper way, doing:
vDSP_fft_zrip(setupReal, &A, 1, FFT_LENGTH_LOG2, FFT_FORWARD);
will transform my real content to its frequency representation.
What do the following values then represent?
A.realp[0];
A.imagp[0];
A.realp[i];
A.imagp[i];
A.realp[N-1];
A.imagp[N-1];
What I'm actually wondering is where the DC and Nyquist components are stored. Also, is A.imagp[j] the imaginary part of the complex value whose real part is A.realp[j]?
Let H be the vector that is the mathematical result of the FFT, so that H[k] is the kth element of the vector. H[0] is the DC component, and H[N/2] is the Nyquist component. Then:
A.realp[0] contains H[0].
A.imagp[0] contains H[N/2].
For 0 < k < N/2, A.realp[k] and A.imagp[k] combined contain H[k]. Specifically, A.realp[k] contains the real part of H[k], and A.imagp[k] contains the imaginary part of H[k]. Equivalently, H[k] = A.realp[k] + i • A.imagp[k].
Some documentation about the vDSP FFTs is here.
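The packing can be illustrated with numpy's full FFT. This is a sketch of the layout described above, not a call into vDSP, and it ignores vDSP's own scaling conventions:
import numpy as np

N = 8
x = np.random.randn(N)
H = np.fft.fft(x)  # full complex FFT: H[0] is DC, H[N//2] is Nyquist

# vDSP_fft_zrip packs the N/2 + 1 unique values of a real FFT into N/2
# complex slots:
realp = np.empty(N // 2)
imagp = np.empty(N // 2)
realp[0] = H[0].real          # DC (purely real)
imagp[0] = H[N // 2].real     # Nyquist (purely real), stored in imagp[0]
realp[1:] = H[1:N // 2].real  # real parts of H[1] .. H[N/2 - 1]
imagp[1:] = H[1:N // 2].imag  # imaginary parts of H[1] .. H[N/2 - 1]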
