Using ICA over MIT BIH NST dataset - machine-learning

In this dataset, I wanted to use signals with same unit and different SNR as input signals in ICA, i.e.
ica_input = np.array([ record_118e_6(MLII),
record_118e00(MLII),
record_118e06(MLII),
record_118e12(MLII),
record_118e18(MLII)
])
Is this a correct input to ICA?
Can I here consider the signal with different SNR linear mix of noises and true signal?

According to this article (which states how the dataset is generated by nst), The above input channels are linear mix of noise and clean signal and from plotting one can see that this data is clearly non gaussian hence ICA can be used in this case. Please correct me if I am wrong.

Related

How do I perform a differentiable operation selection in TensorFlow?

I am trying to produce a mathematical operation selection nn model, which is based on the scalar input. The operation is selected based on the softmax result which is produce by the nn. Then this operation has to be applied to the scalar input in order to produce the final output. So far I’ve come up with applying argmax and onehot on the softmax output in order to produce a mask which then is applied on the concated values matrix from all the possible operations to be performed (as show in the pseudo code below). The issue is that neither argmax nor onehot appears to be differentiable. I am new to this, so any would be highly appreciated. Thanks in advance.
#perform softmax
logits = tf.matmul(current_input, W) + b
softmax = tf.nn.softmax(logits)
#perform all possible operations on the input
op_1_val = tf_op_1(current_input)
op_2_val = tf_op_2(current_input)
op_3_val = tf_op_2(current_input)
values = tf.concat([op_1_val, op_2_val, op_3_val], 1)
#create a mask
argmax = tf.argmax(softmax, 1)
mask = tf.one_hot(argmax, num_of_operations)
#produce the input, by masking out those operation results which have not been selected
output = values * mask
I believe that this is not possible. This is similar to Hard Attention described in this paper. Hard attention is used in Image captioning to allow the model to focus only on a certain part of the image at each step. Hard attention is not differentiable but there are 2 ways to go around this:
1- Use Reinforcement Learning (RL): RL is made to train models that makes decisions. Even though, the loss function won't back-propagate any gradients to the softmax used for the decision, you can use RL techniques to optimize the decision. For a simplified example, you can consider the loss as penalty, and send to the node, with the maximum value in the softmax layer, a policy gradient proportional to the penalty in order to decrease the score of the decision if it was bad (results in a high loss).
2- Use something like soft attention: instead of picking only one operation, mix them with weights based on the softmax. so instead of:
output = values * mask
Use:
output = values * softmax
Now, the operations will converge down to zero based on how much the softmax will not select them. This is easier to train compared to RL but it won't work if you must completely remove the non-selected operations from the final result (set them to zero completely).
This is another answer that talks about Hard and Soft attention that you may find helpful: https://stackoverflow.com/a/35852153/6938290

Should I normalize my features before throwing them into RNN?

I am playing some demos about recurrent neural network.
I noticed that the scale of my data in each column differs a lot. So I am considering to do some preprocess work before I throw data batches into my RNN. The close column is the target I want to predict in the future.
open high low volume price_change p_change ma5 ma10 \
0 20.64 20.64 20.37 163623.62 -0.08 -0.39 20.772 20.721
1 20.92 20.92 20.60 218505.95 -0.30 -1.43 20.780 20.718
2 21.00 21.15 20.72 269101.41 -0.08 -0.38 20.812 20.755
3 20.70 21.57 20.70 645855.38 0.32 1.55 20.782 20.788
4 20.60 20.70 20.20 458860.16 0.10 0.48 20.694 20.806
ma20 v_ma5 v_ma10 v_ma20 close
0 20.954 351189.30 388345.91 394078.37 20.56
1 20.990 373384.46 403747.59 411728.38 20.64
2 21.022 392464.55 405000.55 426124.42 20.94
3 21.054 445386.85 403945.59 473166.37 21.02
4 21.038 486615.13 378825.52 461835.35 20.70
My question is, is preprocessing the data with, say StandardScaler in sklearn necessary in my case? And why?
(You are welcome to edit my question)
It will be beneficial to normalize your training data. Having different features with widely different scales fed to your model will cause the network to weight the features not equally. This can cause a falsely prioritisation of some features over the others in the representation.
Despite that the whole discussion on data preprocessing is controversial either on when exactly it is necessary and how to correctly normalize the data for each given model and application domain there is a general consensus in Machine Learning that running a Mean subtraction as well as a general Normalization preprocessing step is helpful.
In the case of Mean subtraction, the mean of every individual feature is being subtracted from the data which can be interpreted as centering the data around the origin from a geometric point of view. This is true for every dimensionality.
Normalizing the data after the Mean subtraction step results in a normalization of the data dimensionality to approximately the same scale. Note that the different features will loose any prioritization over each other after this step as mentioned above. If you have good reasons to think that the different scales in your features bear important information that the network may need to truly understand the underlying patterns in your dataset, then a normalization will be harmful. A standard approach would be to scale the inputs to have mean of 0 and a variance of 1.
Further preprocessing operations may be helpful in specific cases such as performing PCA or Whitening on your data. Look into the awesome notes of CS231n (Setting up the data and the model) for further reference on these topics as well as for a more detailed explenation of the topics above.
Definetly yes. Most of neural networks work best with data beetwen 0-1 or -1 to 1(depends on output function). Also when some inputs are higher then others network will "think" they are more important. This can make learning very long. Network must first lower weights in this inputs.
I found this https://arxiv.org/abs/1510.01378
If you normalize it may improve convergence so you will get lower training times.

How are the following types of neural network-like techniques called?

The neural network applications I've seen always learn the weights of their inputs and use fixed "hidden layers".
But I'm wondering about the following techniques:
1) fixed inputs, but the hidden layers are no longer fixed, in the sense that the functions of the input they compute can be tweaked (learned)
2) fixed inputs, but the hidden layers are no longer fixed, in the sense that although they have clusters which compute fixed functions (multiplication, addition, etc... just like ALUs in a CPU or GPU) of their inputs, the weights of the connections between them and between them and the input can be learned (this should in some ways be equivalent to 1) )
These could be used to model systems for which we know the inputs and the output but not how the input is turned into the output (figuring out what is inside a "black box"). Do such techniques exist and if so, what are they called?
For part (1) of your question, there are a couple of relatively recent techniques that come to mind.
The first one is a type of feedforward layer called "maxout" which computes a piecewise linear output function of its inputs.
Consider a traditional neural network unit with d inputs and a linear transfer function. We can describe the output of this unit as a function of its input z (a vector with d elements) as g(z) = w z, where w is a vector with d weight values.
In a maxout unit, the output of the unit is described as
g(z) = max_k w_k z
where w_k is a vector with d weight values, and there are k such weight vectors [w_1 ... w_k] per unit. Each of the weight vectors in the maxout unit computes some linear function of the input, and the max combines all of these linear functions into a single, convex, piecewise linear function. The individual weight vectors can be learned by the network, so that in effect each linear transform learns to model a specific part of the input (z) space.
You can read more about maxout networks at http://arxiv.org/abs/1302.4389.
The second technique that has recently been developed is the "parametric relu" unit. In this type of unit, all neurons in a network layer compute an output g(z) = max(0, w z) + a min(w z, 0), as compared to the more traditional rectified linear unit, which computes g(z) = max(0, w z). The parameter a is shared across all neurons in a layer in the network and is learned along with the weight vector w.
The prelu technique is described by http://arxiv.org/abs/1502.01852.
Maxout units have been shown to work well for a number of image classification tasks, particularly when combined with dropout to prevent overtraining. It's unclear whether the parametric relu units are extremely useful in modeling images, but the prelu paper gets really great results on what has for a while been considered the benchmark task in image classification.

N step fft in D language

I am using fft function from std.numeric
Complex!double[] resultfft = fft(timeDomainAmplitudeVal);
The parameter timeDomainAmplitudeVal is audio amplitude data. Sample rate 44100 hz and there is 131072(2^16) samples
I am seeing that resultfft has the same size as timeDomainAmplitudeVal(131072) which does not fits my project(also makes no sense) . I need to be able to divide FFT to N equally spaced frequencies. And I need this N to be defined by me .
Is there anyway to implement this with std.numeric.fft or can you have any advices for fft library?
Ps: I will be glad to hear if some DSP libraries exist also
That's just how Fourier transforms work in the practical number-crunching world. Give S samples of signal, get S amplitudes. (Ignoring issues with complex numbers and symmetries.)
If you want N amplitudes, you'll have to interpolate the S-points amplitudes you get from FFT. Your biggest decision is to choose between linear, cubic, truncated sinc, etc.
Altnernative: resample the original audio signal to have your desired N samples in the same overall time interval. Then FFT it.
take a look at pfft, a fast FFT written in D.
http://jerro.github.io/pfft/doc/pfft.pfft.html
or numpy & Pyd
http://docs.scipy.org/doc/numpy/reference/routines.fft.html
http://pyd.dsource.org/
HTH
This is absolutely normal that the FFT gives the same data length.
Here some C++ code to perform windows FFT analysis with overlap and optional "zero-phase" ordering. http://pastebin.com/4YKgbed1
What do FFT coefficients mean?
Question: "OK so I've done the FFT and I'm said I can recover the original signal. Now, what are these coefficients."
Answer: "You can think of coefficient i as representing the phase and amplitude of frequencies from SR*i/(2*N) to SR*(i+1)/(2*N). This is a helpful metaphor. But a more accurate view is that coefficient i is the contribution of a sine of frequency SR*i/(2*N) in a reconstruction of the original input chunk."

HMM - Training data and format

I'm wanting to implement an HMM (Hidden Markov Model) in order to identify particular words. So far, I have managed to extract the Coefficients (MFCC) of the signal and wondered if this is ok data in order to train the HMM?
Also, is the format (below) correct for training the HMM?
The format:
Foreach sample, there are a sequence of MFCC Coefficients, I have provided two of these samples as an example...
-13.8033 0.645476 3.2174 -0.625136 -0.470134 -2.96368 0.701151 0.464246 1.1898 -1.88515 0.0805242 0.311573 0.732487
-19.4252 -5.65454 0.853437 0.317219 0.146167 -1.93742 0.381944 -2.01793 -0.561144 -0.896783 -0.105491 -1.06504 -0.797318
Hope someone can help :)
You can have two approaches.
One is doing vector quantization on those vectors in order to convert the continuos MFCC vectors into discretes observations for the HMM.
Other is perform the training in HMM using a continuos approach.
You can see more in this thread:
Simple speech recognition from scratch

Resources