I'm trying to use a WebGL shader to visualize some audio data. I have an array of numbers which represents a normalized .wav file:
"demo": {"duration": 0.021111111111111112, "samplerate": 44100, "subsample": 100, "data":
[-0.018585205078125, -0.05145263671875, 0.0645751953125, -0.059326171875, 0.006072998046875,
-0.0294189453125, 0.04620361328125, 0.0694580078125, -0.0849609375, -0.053253173828125,
-0.133697509765625, 0.002166748046875, 0.110931396484375, 0.052337646484375, 0.1214599609375,
-0.19488525390625, 0.00970458984375, 0.0145263671875, -0.01446533203125, 0.12530517578125,
-0.115997314453125, 0.010589599609375, -0.127838134765625, 0.0775146484375, -0.0048828125,
0.001007080078125, -0.164337158203125, -0.146270751953125, 0.077545166015625, -0.012725830078125,
0.087158203125, -0.130462646484375, 0.088287353515625, -0.02996826171875, 0.156280517578125,
0.0230712890625, 0.199920654296875, -0.062164306640625, -0.166107177734375, 0.04888916015625,
-0.00384521484375, 0.1611328125, -0.153961181640625, -0.164947509765625, 0.03314208984375,
0.098052978515625, 0.042083740234375, 0.1318359375, -0.2388916015625, 0.100006103515625,
0.04754638671875, 0.009674072265625, 0.1630859375, -0.161834716796875, 0.005584716796875,
-0.126953125, 0.04388427734375, 0.048095703125, 0.13763427734375, -0.148406982421875,
-0.250274658203125, 0.04815673828125, 0.087371826171875, 0.0931396484375, -0.02789306640625,
-0.282073974609375, 0.134063720703125, 0.14483642578125, -0.0025634765625, 0.206756591796875,
-0.350555419921875, 0.19439697265625, -0.004638671875, 0.03741455078125, 0.203338623046875,
-0.222137451171875, 0.04315185546875, -0.19219970703125, 0.10284423828125, 0.069976806640625,
0.062530517578125, -0.0782470703125, -0.22076416015625, 0.13287353515625, 0.031341552734375,
0.08673095703125]}
I've been looking at this example of using audio data in a shader via a Uint8Array(numPoints), and I'm wondering:
Can my normalized waveform data be converted to a Uint8Array?
Are Uint8Arrays the only kind of array you can pass into a shader?
My actual waveforms contain a lot of data and are roughly 1MB each. I'm wondering if this is an unreasonable amount of data to try to pass into the shader via a texture, and if there might be a way for me to "subsample" the data so that fewer points are needed?
While you could certainly convert your normalized data to unsigned bytes, you can just use floating point textures via the OES_texture_float extension (not needed with WebGL2 contexts). 1MB of data is not that much, considering that a single RGBA FullHD framebuffer is already ~8.3MB and the minimum supported maximum texture size is 4096 (a 4096x4096 RGBA texture is ~67.1MB). Whether you can sparsely sample the data depends on your requirements / use case.
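For example, a minimal sketch of uploading the samples as a float texture in WebGL1 could look like this (assuming gl is your context and demo is the parsed JSON object from the question; a real ~1MB waveform would need to be wrapped across several rows, since texture width is capped at MAX_TEXTURE_SIZE):
const ext = gl.getExtension('OES_texture_float'); // not needed on WebGL2
if (!ext) {
  alert('floating point textures not supported');
}
const waveform = new Float32Array(demo.data); // your normalized samples
const tex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, tex);
// one float sample per texel, one texel tall
gl.texImage2D(gl.TEXTURE_2D, 0, gl.LUMINANCE, waveform.length, 1, 0,
              gl.LUMINANCE, gl.FLOAT, waveform);
// float textures are not filterable without OES_texture_float_linear
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);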
Related
Hi, I am trying to program an app that displays simple 3D models in iOS using Xcode, and I have run into a small problem that I cannot find a solution to in Apple's documentation or in any forum I have looked in. I have a big array of vertices for triangles in 3 dimensions that I want to transform into world space during the rendering process in Metal. I read in an article online that, in order to tell Metal to have the graphics processor transform the vertices during rendering, you need to put this matrix in a Metal buffer and then tell the rendering process to use this buffer with the matrix in it, in this line of code:
renderEncoder.setVertexBuffer(ROTMATRIX, offset: 0, index: 1)
if "ROTMATRIX" is the name of the metal buffer that contains the models rotation matrix. The problem is that I do not know how to put the matrix inside this buffer. I constructed a matrix for the model called MODMAT like this:
var A = simd_float4(1, 0, 0, 0)
var B = simd_float4(0, 0, 0, 0)
var C = simd_float4(0, 0, 1, 0)
var D = simd_float4(0, 0, 0, 1)
var MODMAT = float4x4([A, B, C, D])
I tried to put the matrix MODMAT into ROTMATRIX with this line of code:
ROTMATRIX.contents().copyMemory(from: MODMAT, byteCount: 64)
But the compiler in Xcode says "Cannot convert value of type 'float4x4' (aka 'simd_float4x4') to expected argument type 'UnsafeRawPointer'". So I need to provide an unsafe raw pointer to the matrix MODMAT. Is it possible to create this kind of pointer to a matrix in Swift, and if not, how should I modify ROTMATRIX correctly?
Best Regards Simon
contents returns an UnsafeMutableRawPointer. You can use either storeBytes(of:toByteOffset:as:) or storeBytes(of:as:) to store a simd_float4x4 to this pointer. In fact, you can use this to store any value of a trivial (basically, values that can be copied bit for bit without any refcounting and so on) type.
Refer to the documentation pages for UnsafeMutableRawPointer and contents.
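For example, a minimal sketch could look like this (assuming device is your MTLDevice and ROTMATRIX only needs to hold a single matrix; the identity values are just illustrative):
import Metal
import simd

let A = simd_float4(1, 0, 0, 0)
let B = simd_float4(0, 1, 0, 0)
let C = simd_float4(0, 0, 1, 0)
let D = simd_float4(0, 0, 0, 1)
var MODMAT = float4x4([A, B, C, D])

// 64 bytes: enough room for one simd_float4x4
let ROTMATRIX = device.makeBuffer(length: MemoryLayout<simd_float4x4>.stride,
                                  options: .storageModeShared)!

// storeBytes copies the matrix's raw bytes into the buffer's contents;
// this is allowed because simd_float4x4 is a trivial type.
ROTMATRIX.contents().storeBytes(of: MODMAT, as: simd_float4x4.self)

// Alternatively, copyMemory(from:byteCount:) works if you hand it a pointer:
withUnsafeBytes(of: &MODMAT) { raw in
    ROTMATRIX.contents().copyMemory(from: raw.baseAddress!, byteCount: raw.count)
}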
I'm creating a 3D texture in WebGL2 with the gl.texImage3D command:
gl.bindTexture(gl.TEXTURE_3D, texture);
const level = 0;
const internalFormat = gl.R32F;
const width = 512;
const height = 512;
const depth = 512;
const border = 0;
const format = gl.RED;
const type = gl.FLOAT;
const data = imagesDataArray;
gl.texImage3D(gl.TEXTURE_3D, level, internalFormat, width, height, depth,
              border, format, type, data);
It seems that a size of 512x512x512 with 32F values is somewhat of a deal-breaker, since Chrome (running on a laptop with 8 GB RAM) crashes when uploading a 3D texture of this size, but not always. Using a texture of, say, 512x512x256 seems to always work on my laptop.
Is there any way to tell in advance the maximum size of 3D texture that the GPU can accommodate in WebGL2?
Best regards
Unfortunately no, there isn't a way to tell how much space there is.
You can query the largest dimensions the GPU can handle, but you can't query the amount of memory it has available, just like you can't query how much memory is available to JavaScript.
That said, 512*512*512 * 1 (R) * 4 (32F) bytes is at least 0.5 gig. Does your laptop's GPU have 0.5 gig? You probably need at least 1 gig of GPU memory to use 0.5 gig as a texture, since the OS needs space for your apps' windows, etc.
Browsers also put different limits on how much memory you can use.
Some things to check.
How much GPU memory do you have?
If it's not more than 0.5 gig, you're out of luck.
If your GPU has more than 0.5 gig, try a different browser.
Firefox probably has different limits than Chrome.
Can you create the texture at all?
Use gl.texStorage3D and then call gl.getError. Do you get an out-of-memory error, or does it crash right there?
If gl.texStorage3D does not crash, can you upload a little data at a time with gl.texSubImage3D?
I suspect this won't work even if gl.texStorage3D does work, because the browser will still have to allocate 0.5 gig to clear out your texture. If it does work, this points to another issue: to upload a texture you need 3x-4x the memory, at least in Chrome.
There's your data in JavaScript:
data = new Float32Array(size);
That data gets sent to the GPU process:
gl.texSubImage3D (or any other texSub or texImage command)
The GPU process sends that data to the driver:
glTexSubImage3D(...) in C++
Whether the driver needs 1 or 2 copies I have no idea. It's possible it keeps a copy in RAM and uploads one to the GPU. It keeps the copy so it can re-upload the data if it needs to swap it out to make room for something else. Whether or not this happens is up to the driver.
Also note that, while I don't think this is the issue, the driver is allowed to expand the texture to RGBA32F, which would need 2 gig. It's probably not doing this, but I know in the past certain formats were emulated.
Note: texImage potentially takes more memory than texStorage because the semantics of texImage mean the driver can't actually make the texture until just before you draw, since it has no idea whether you're going to add mip levels later. With texStorage, on the other hand, you tell the driver the exact size and number of mips up front, so it needs no intermediate storage.
function main() {
  const gl = document.createElement('canvas').getContext('webgl2');
  if (!gl) {
    return alert('need webgl2');
  }
  const tex = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_3D, tex);
  gl.texStorage3D(gl.TEXTURE_3D, 1, gl.R32F, 512, 512, 512);
  log('texStorage3D:', glEnumToString(gl, gl.getError()));
  const data = new Float32Array(512 * 512 * 512);
  for (let depth = 0; depth < 512; ++depth) {
    gl.texSubImage3D(gl.TEXTURE_3D, 0, 0, 0, depth, 512, 512, 1, gl.RED, gl.FLOAT, data, 512 * 512 * depth);
  }
  log('texSubImage3D:', glEnumToString(gl, gl.getError()));
}
main();

function glEnumToString(gl, value) {
  return Object.keys(WebGL2RenderingContext.prototype)
    .filter(k => gl[k] === value)
    .join(' | ');
}

function log(...args) {
  const elem = document.createElement('pre');
  elem.textContent = [...args].join(' ');
  document.body.appendChild(elem);
}
I am currently working on replicating YOLOv2 (not tiny) on iOS (Swift 4) using MPS.
A problem is that it is hard for me to implement the space_to_depth function (https://www.tensorflow.org/api_docs/python/tf/space_to_depth) and the concatenation of two convolution results (13x13x256 + 13x13x1024 -> 13x13x1280). Could you give me some advice on implementing these parts? My code is below.
...
let conv19 = MPSCNNConvolutionNode(source: conv18.resultImage,
weights: DataSource("conv19", 3, 3, 1024, 1024))
let conv20 = MPSCNNConvolutionNode(source: conv19.resultImage,
weights: DataSource("conv20", 3, 3, 1024, 1024))
let conv21 = MPSCNNConvolutionNode(source: conv13.resultImage,
weights: DataSource("conv21", 1, 1, 512, 64))
/*****
1. space_to_depth with conv21
2. concatenate the result of conv20(13x13x1024) to the result of 1 (13x13x256)
I need your help to implement this part!
******/
I believe space_to_depth can be expressed in the form of a convolution:
For instance, for an input with dimensions [1,2,2,1], use 4 stride-2 convolution kernels that each output one number to one channel, i.e. [[1,0],[0,0]], [[0,1],[0,0]], [[0,0],[1,0]], [[0,0],[0,1]]. This moves every input number from the spatial dimensions to the depth dimension.
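As a tiny plain-Swift illustration of that equivalence (no MPS involved, purely to show the idea), applying those four kernels to a 2x2 input yields one value per channel, which is exactly space_to_depth with block size 2:
let input: [[Float]] = [[1, 2],
                        [3, 4]]
let kernels: [[[Float]]] = [
    [[1, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[0, 0], [1, 0]],
    [[0, 0], [0, 1]],
]
// Each kernel "picks out" one spatial position and writes it to its own channel.
let channels = kernels.map { kernel -> Float in
    var sum: Float = 0
    for (kernelRow, inputRow) in zip(kernel, input) {
        for (k, x) in zip(kernelRow, inputRow) {
            sum += k * x
        }
    }
    return sum
}
print(channels) // [1.0, 2.0, 3.0, 4.0] -> a single 1x1x4 output "pixel"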
MPS actually has a concat node. See here: https://developer.apple.com/documentation/metalperformanceshaders/mpsnnconcatenationnode
You can use it like this:
concatNode = [[MPSNNConcatenationNode alloc] initWithSources:@[layerA.resultImage, layerB.resultImage]];
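In Swift, the same node can be created like this (layerA / layerB being your two source nodes, e.g. conv20 and the space_to_depth result):
let concatNode = MPSNNConcatenationNode(sources: [layerA.resultImage, layerB.resultImage])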
If you are working with the high level interface and the MPSNNGraph, you should just use a MPSNNConcatenationNode, as described by Tianyu Liu above.
If you are working with the low level interface, manhandling the MPSKernels around yourself, then this is done by:
Create a 1280 channel destination image to hold the result
Run the first filter as normal to produce the first 256 channels of the result
Run the second filter to produce the remaining channels, with the destinationFeatureChannelOffset set to 256.
That should be enough in all cases, except when the data is not the product of a MPSKernel. In that case, you'll need to copy it in yourself or use something like a linear neuron (a=1,b=0) to do it.
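A rough sketch of that low-level path (assuming device, commandBuffer, source images inputA / inputB, and two already-configured MPSCNNConvolution kernels convA / convB exist elsewhere; all names here are illustrative):
import MetalPerformanceShaders

// Destination image with room for all 1280 feature channels.
let descriptor = MPSImageDescriptor(channelFormat: .float16,
                                    width: 13, height: 13, featureChannels: 1280)
let concatenated = MPSImage(device: device, imageDescriptor: descriptor)

// First filter writes channels 0...255.
convA.destinationFeatureChannelOffset = 0
convA.encode(commandBuffer: commandBuffer,
             sourceImage: inputA, destinationImage: concatenated)

// Second filter writes channels 256...1279 of the same image.
convB.destinationFeatureChannelOffset = 256
convB.encode(commandBuffer: commandBuffer,
             sourceImage: inputB, destinationImage: concatenated)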
Hi, I'm using the program Visual Structure From Motion (VisualSFM) to recover the structure of a 3D scene. However, I've already computed my features and descriptors, so I want to use them in Visual Structure From Motion. I've read that the file which contains the descriptor information should have the following layout:
[Header][Location Data][Descriptor Data][EOF]
[Header] = int[5] = {name, version, npoint, 5, 128};
name = ('S'+ ('I'<<8)+('F'<<16)+('T'<<24));
version = ('V'+('4'<<8)+('.'<<16)+('0'<<24)); or ('V'+('5'<<8)+('.'<<16)+('0'<<24)) if containing color info
npoint = number of features.
[Location Data] is a npoint x 5 float matrix and each row is [x, y, color, scale, orientation].
Write color by casting the float to unsigned char[4]
scale & orientation are only used for visualization, so you can simply write 0 for them
Sort features in the order of decreasing importance, since VisualSFM may use only part of those features.
VisualSFM sorts the features in the order of decreasing scales.
[Descriptor Data] is a npoint x 128 unsigned char matrix. Note the feature descriptors are normalized to 512.
[EOF] int eof_marker = (0xff+('E'<<8)+('O'<<16)+('F'<<24));
Has someone written a concrete example of this file? It should be generated automatically by my application.
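For reference, here is a minimal C++ sketch (not from the original post) of a writer for the layout described above, assuming the location rows and the npoint x 128 descriptor bytes, already normalized to 512, are in memory:
#include <cstdio>
#include <vector>

struct LocationRow { float x, y, color, scale, orientation; };

void writeSiftFile( const char* path,
                    const std::vector<LocationRow>& points,          // npoint rows
                    const std::vector<unsigned char>& descriptors )  // npoint * 128 bytes
{
    std::FILE* f = std::fopen( path, "wb" );
    if ( !f ) return;

    const int npoint = static_cast<int>( points.size() );
    const int header[5] = {
        ('S' + ('I' << 8) + ('F' << 16) + ('T' << 24)),   // name
        ('V' + ('4' << 8) + ('.' << 16) + ('0' << 24)),    // version; use '5' instead of '4' if you store real color info
        npoint,
        5,                                                 // floats per location row
        128                                                // bytes per descriptor
    };
    std::fwrite( header, sizeof(int), 5, f );

    // [Location Data]: npoint x 5 floats; the color float's bytes hold the packed
    // unsigned char[4] color, and scale/orientation may simply be 0.
    std::fwrite( points.data(), sizeof(LocationRow), points.size(), f );

    // [Descriptor Data]: npoint x 128 unsigned chars, normalized to 512.
    std::fwrite( descriptors.data(), 1, descriptors.size(), f );

    // [EOF]
    const int eof_marker = (0xff + ('E' << 8) + ('O' << 16) + ('F' << 24));
    std::fwrite( &eof_marker, sizeof(int), 1, f );

    std::fclose( f );
}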
I'm looking for a way to get the treble and bass data from a song for some increment of time (say 0.1 seconds), in the range of 0.0 to 1.0. I've googled around but haven't been able to find anything remotely close to what I'm looking for. Ultimately I want to be able to represent the treble and bass level while the song is playing.
Thanks!
It's reasonably easy. You need to perform an FFT and then sum up the bins that interest you. How you select them will depend largely on the sampling rate of your audio.
You then need to choose an appropriate FFT order to get good information in the frequency bins returned.
So if you do an order 8 FFT you will need 256 samples. This will return you 128 complex pairs.
Next you need to convert these to magnitudes. This is actually quite simple: if you are using std::complex you can simply call std::abs on the complex number and you will have its magnitude (sqrt( r^2 + i^2 )).
Interestingly, at this point there is something called Parseval's theorem. This theorem states that, after performing a Fourier transform, the sum of the squared bin magnitudes is equal to the sum of the squares of the input samples (up to a normalization factor).
This means that, to get the amplitude of a specific set of bins, you can simply add their squared magnitudes together, divide by the number of them, and then take the square root to get the RMS amplitude of those bins.
So where does this leave you?
Well from here you need to figure out which bins you are adding together.
A treble tone is defined as above 2000Hz.
A bass tone is below 300Hz (if my memory serves me correctly).
Mids are between 300Hz and 2kHz.
Now suppose your sample rate is 8kHz. The Nyquist rate says that the highest frequency you can represent in 8kHz sampling is 4kHz. Each bin thus represents 4000/128 or 31.25Hz.
So the first 10 bins (up to 312.5Hz) are used for the bass frequencies, bins 10 to 63 represent the mids, and bins 64 to 127 are the trebles.
You can then calculate the RMS value of each band as described above.
RMS values can be converted to dBFS values by computing 20.0f * log10f( rmsVal ). This will give you a value from 0dB (maximum amplitude) down to -infinity dB (minimum amplitude). Be aware that amplitudes do not range from -1 to 1.
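As a rough sketch of those two steps (not from the original answer; bins is assumed to already hold the complex FFT output for the band you care about):
#include <cmath>
#include <complex>
#include <vector>

// RMS of a band: average the bin powers (|X|^2) and take the square root,
// per Parseval's theorem relating bin power to the mean square of the input.
float rmsOfBins( const std::vector< std::complex< float > >& bins )
{
    float sum = 0.0f;
    for ( const auto& bin : bins )
        sum += std::norm( bin );   // |X|^2 = r^2 + i^2
    return bins.empty() ? 0.0f : std::sqrt( sum / bins.size() );
}

// Convert an RMS amplitude to dBFS: 0 dB at full scale, -infinity at silence.
float rmsToDbfs( float rmsVal )
{
    return 20.0f * std::log10( rmsVal );
}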
To help you along, here is a bit of my C++ based FFT class for iPhone (which uses vDSP under the hood):
MacOSFFT::MacOSFFT( unsigned int fftOrder ) :
    BaseFFT( fftOrder )
{
    mFFTSetup = (void*)vDSP_create_fftsetup( mFFTOrder, 0 );
    mImagBuffer.resize( 1 << mFFTOrder );
    mRealBufferOut.resize( 1 << mFFTOrder );
    mImagBufferOut.resize( 1 << mFFTOrder );
}

MacOSFFT::~MacOSFFT()
{
    vDSP_destroy_fftsetup( (FFTSetup)mFFTSetup );
}

bool MacOSFFT::ForwardFFT( std::vector< std::complex< float > >& outVec, const std::vector< float >& inVec )
{
    return ForwardFFT( &outVec.front(), &inVec.front(), inVec.size() );
}

bool MacOSFFT::ForwardFFT( std::complex< float >* pOut, const float* pIn, unsigned int num )
{
    // Bring in a pre-allocated imaginary buffer that is initialised to 0.
    DSPSplitComplex dspscIn;
    dspscIn.realp = (float*)pIn;
    dspscIn.imagp = &mImagBuffer.front();

    DSPSplitComplex dspscOut;
    dspscOut.realp = &mRealBufferOut.front();
    dspscOut.imagp = &mImagBufferOut.front();

    vDSP_fft_zop( (FFTSetup)mFFTSetup, &dspscIn, 1, &dspscOut, 1, mFFTOrder, kFFTDirection_Forward );
    vDSP_ztoc( &dspscOut, 1, (DSPComplex*)pOut, 1, num );
    return true;
}
It seems that you're looking for Fast Fourier Transform sample code.
It is quite a large topic to cover in an answer.
The tools you will need are already built into iOS: the vDSP API.
This should help you: vDSP Programming Guide
And there is also FFT sample code available.
You might also want to check out iPhoneFFT. Though that code is slightly outdated, it can help you understand the processes "under the hood".
Refer to the aurioTouch2 example from Apple - it has everything from frequency analysis to UI representation of what you want.