tf.nn.embedding_lookup with float input? - machine-learning

I would like to implement an embedding table with float inputs instead of int32 or int64.
The reason is that instead of words, as in a simple RNN, I would like to use percentages.
For example, in the case of a recipe, I may have 1000 or 3000 ingredients, but any single recipe has at most 80.
The ingredients will be represented as percentages, for example: ingredient1=0.2, ingredient2=0.8, etc.
My problem is that TensorFlow forces me to use integers for my embedding table:
TypeError: Value passed to parameter 'indices' has DataType float32 not in list of allowed values: int32, int64
Any suggestions?
I appreciate your feedback.
Example of the embedding lookup:
inputs = tf.placeholder(tf.float32, shape=[None, ninp], name="x")
n_vocab = len(int_to_vocab)
n_embedding = 200  # Number of embedding features
with train_graph.as_default():
    embedding = tf.Variable(tf.random_uniform((n_vocab, n_embedding), -1, 1))
    embed = tf.nn.embedding_lookup(embedding, inputs)
The error is caused by the tf.float32 dtype in this line:
inputs = tf.placeholder(tf.float32, shape=[None, ninp], name="x")
I have thought of an algorithm that could work using loops, but I was wondering if there is a more direct solution.
Thanks!

tf.nn.embedding_lookup cannot accept float input, because the point of this function is to select the embeddings at the specified rows.
Example:
Here there are 5 words and 5 three-dimensional embedding vectors, and the operation returns the row at index 3 (with 0-indexing). This is equivalent to this line in TensorFlow:
embed = tf.nn.embedding_lookup(embed_matrix, [3])
You can't look up a floating-point index such as 0.2 or 0.8, because the matrix has no row 0.2 or row 0.8. I highly recommend this post by Chris McCormick about word2vec.
What you describe sounds more like a softmax loss function, which outputs a probability distribution over the target classes.
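If the percentages are meant as weights on ingredients rather than as indices, one possible workaround is to keep integer ingredient ids for the lookup and scale the selected rows by the float percentages. This is an assumption about what the asker wants, not something the answer above proposes. A minimal NumPy sketch, with made-up table sizes and the hypothetical names `ingredient_ids` and `percentages`:

```python
import numpy as np

rng = np.random.default_rng(0)
n_vocab, n_embedding = 1000, 8            # illustrative sizes, not from the question
embedding = rng.uniform(-1.0, 1.0, size=(n_vocab, n_embedding))

# One recipe: integer ids select rows, floats say how much of each ingredient.
ingredient_ids = np.array([12, 407])      # hypothetical ingredient indices
percentages = np.array([0.2, 0.8])        # shares of the recipe

rows = embedding[ingredient_ids]          # row selection, which is what a lookup does
recipe_vector = (percentages[:, None] * rows).sum(axis=0)
print(recipe_vector.shape)                # (8,)
```

In TensorFlow the same idea would presumably be tf.nn.embedding_lookup on the integer ids followed by a weighted sum over the selected rows.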

Related

Creating a ML algorithm where the train data does not have same number of columns in all records

So I have the following training data (no header, explanation below):
[1.3264,1.3264,1.3263,1.32632]
[2.32598,2.3256,2.3257,2.326,2.3256,2.3257,2.32566]
[10.3215,10.3215,10.3214,10.3214,10.3214,10.32124]
It does not have a header because all elements except the last one in each array are inputs, and the last one is the result/output.
Taking the first example: 1.3264, 1.3264, 1.3263 are the inputs I want to give to the algorithm, and 1.32632 is the outcome/result.
All of these are historical values that should lead to pattern recognition.
I would like to give some test data to the algorithm and have it give me an outcome/result based on the pattern it identified.
In all the examples I have looked at with ML and sklearn, I have never seen one where records of the same type of data have different numbers of entries. They all seem to have the same number of columns and different types of inputs, whereas mine is always the same type of input.
You can try two different approaches:
Extract features from your variable-length data so that every row ends up with a fixed number of features. After that you can use any algorithm from sklearn or other packages. Feature extraction is a highly domain-specific process that requires knowing what the data actually represents. For example, you could try features like these:
import numpy as np

def extract_features_one_row(row):
    y = row[-1]                # the last element is the target
    arr = np.array(row[:-1])   # everything before it is input
    features = [
        np.mean(arr),
        np.sum(arr),
        np.median(arr),
        np.std(arr),
        np.percentile(arr, 5),
        np.percentile(arr, 95),
        np.percentile(arr, 25),
        np.percentile(arr, 75),
        (arr[1:] > arr[:-1]).sum(),  # number of increasing pairs
        (arr > arr.mean()).sum(),    # number of elements > mean value
        # extract trends, number of modes, etc
    ]
    return features, y

data = [
    [1.3264, 1.3264, 1.3263, 1.32632],
    [2.32598, 2.3256, 2.3257, 2.326, 2.3256, 2.3257, 2.32566],
    [10.3215, 10.3215, 10.3214, 10.3214, 10.3214, 10.32124],
]
X, y = zip(*[extract_features_one_row(row) for row in data])
X = np.array(X)  # (3, 10)
print(X.shape, y)
Now X has the same number of columns for every row.
Use an ML algorithm that supports variable-length data: recurrent neural networks, transformers, or convolutional networks with padding.
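For the second approach, sequence models usually still need the batch stored as a rectangular array, so a common preparation step is to pad the shorter rows and keep a mask of which entries are real. A minimal sketch on the data from the question; zero as the padding value and the names `padded` and `mask` are choices made for this example only:

```python
import numpy as np

rows = [
    [1.3264, 1.3264, 1.3263, 1.32632],
    [2.32598, 2.3256, 2.3257, 2.326, 2.3256, 2.3257, 2.32566],
    [10.3215, 10.3215, 10.3214, 10.3214, 10.3214, 10.32124],
]

inputs = [row[:-1] for row in rows]            # everything except the last value
targets = np.array([row[-1] for row in rows])  # the last value of each row

max_len = max(len(seq) for seq in inputs)
padded = np.zeros((len(inputs), max_len))
mask = np.zeros((len(inputs), max_len), dtype=bool)
for i, seq in enumerate(inputs):
    padded[i, :len(seq)] = seq                 # left-aligned, zeros fill the rest
    mask[i, :len(seq)] = True                  # True marks real values, False marks padding

print(padded.shape, mask.sum(axis=1))          # (3, 6) [3 6 5]
```

The mask lets a model or loss function ignore the padded positions instead of treating the zeros as data.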

Trying to do PCA analysis on interest rate swaps data (multivariate time series)

I have a data set with 20 non-overlapping different swap rates (spot1y, 1y1y, 2y1y, 3y1y, 4y1y, 5y2y, 7y3y, 10y2y, 12y3y...) over the past year.
I want to use PCA / multiregression and look at residuals in order to determine which sectors on the curve are cheap/rich. Has anyone had experience with this? I've done PCA but not for time series. I'd ideally like to model something similar to the first figure here but in USD.
https://plus.credit-suisse.com/rpc4/ravDocView?docid=kv66a7
Thanks!
Here are some broad strokes that can help answer your question. Also, that's a neat analysis from CS :)
Let's be pythonistas and use NumPy. You can imagine your dataset as a 20x261 array of floats. The first place to start is creating the array. Suppose you have a CSV file storing the raw data persistently. Then a reasonable first step to load the data would be something as simple as:
import numpy
x = numpy.loadtxt("path/to/my/file", delimiter=",")
The object x is our raw time series matrix, and we verify that x.shape == (20, 261). The next step is to transform this array into its covariance matrix. Whether or not it has already been done on the raw data, the first step is centering each time series on its mean, like this:
x_centered = x - x.mean(axis=1, keepdims=True)
The purpose of this step is to help simplify any necessary rescaling, and it is a very good habit that usually shouldn't be skipped. The call to x.mean uses the parameters axis and keepdims to make sure each row (e.g. the time series for spot1y, ...) is centered on its own mean value.
The next steps are to square and scale x to produce a swap rate covariance array. With 2-dimensional arrays like x, there are two ways to square it-- one that leads to a 261x261 array and another that leads to a 20x20 array. It's the second array we are interested in, and the squaring procedure that will work for our purposes is:
x_centered_squared = numpy.matmul(x_centered, x_centered.transpose())
Then, to scale, one can choose between 1/261 and 1/(261-1) depending on the statistical context, which looks like this:
x_covariance = x_centered_squared * (1/261)
The array x_covariance has an entry for how each swap rate changes with itself, and changes with any one of the other swap rates. In linear-algebraic terms, it is a symmetric operator that characterizes the spread of each swap rate.
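As a sanity check on the centering, squaring and scaling steps, the hand-built matrix can be compared against numpy.cov. The sketch below uses random numbers as a stand-in for the swap rate data, so only the shapes and the agreement between the two computations are meaningful:

```python
import numpy

rng = numpy.random.default_rng(0)
x = rng.normal(size=(20, 261))            # synthetic stand-in for the swap rate data

x_centered = x - x.mean(axis=1, keepdims=True)
x_covariance = numpy.matmul(x_centered, x_centered.transpose()) * (1 / 261)

# numpy.cov treats rows as variables; bias=True selects the 1/N scaling.
reference = numpy.cov(x, bias=True)
print(x_covariance.shape, numpy.allclose(x_covariance, reference))  # (20, 20) True
```

With the 1/(261-1) scaling, the matching call would be numpy.cov(x) with its default settings.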
Linear algebra also tells us that this array can be decomposed into its associated eigen-spectrum, with elements in this spectrum being scalar-vector pairs, or eigenvalue-eigenvector pairs. In the analysis you shared, x_covariance's eigenvalues are plotted in exhibit two as percent variance explained. To produce the data for a plot like exhibit two (which you will always want to furnish to the readers of your PCA), you simply divide each eigenvalue by the sum of all of them, then multiply each by 100.0. Due to the convenient properties of x_covariance, a suitable way to compute its spectrum is like this:
vals, vects = numpy.linalg.eigh(x_covariance)
Because x_covariance is symmetric, numpy.linalg.eigh applies; it returns the eigenvalues in ascending order and the eigenvectors as the columns of vects.
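The percent-variance-explained figures described above can be computed in a couple of lines. This sketch again uses synthetic data in place of the real swap rates, and it calls numpy.linalg.eigh, the routine for symmetric matrices, which returns eigenvalues in ascending order:

```python
import numpy

rng = numpy.random.default_rng(0)
x = rng.normal(size=(20, 261))                   # synthetic stand-in for the swap rate data
x_centered = x - x.mean(axis=1, keepdims=True)
x_covariance = numpy.matmul(x_centered, x_centered.transpose()) / 261

vals, vects = numpy.linalg.eigh(x_covariance)    # ascending eigenvalues, eigenvectors in columns
vals = vals[::-1]                                # reorder so the largest comes first
percent_explained = 100.0 * vals / vals.sum()
print(percent_explained[:3])                     # share of variance of the top three components
```

On real swap data the first few entries would be expected to dominate; on this random stand-in they are all of similar size.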
We are now in a position to talk about residuals! Here is their definition (with our namespace): residuals_ij = x_ij − reconstructed_ij; i = 1:20; j = 1:261. Thus for every datum in x, there is a corresponding residual, and to find them, we need to recover the reconstructed_ij array. We can do this column-by-column, operating on each x_i with a change of basis operator to produce each reconstructed_i, each of which can be viewed as coordinates in a proper subspace of the original or raw basis. The analysis describes a modified Gram-Schmidt approach to compute the change of basis operator we need, which ensures this proper subspace's basis is an orthogonal set.
What we are going to do in the approach is take the eigenvectors corresponding to the three largest eigenvalues and transform them into three mutually orthogonal vectors, u, v, w (named so that they do not overwrite the raw data array x). Keep in mind that numpy stores the eigenvectors as the columns of vects, and that the eigenvectors of a symmetric matrix are already orthogonal up to rounding, so this step mostly acts as a numerical safeguard. Search the web for active discussions and questions geared toward developing the Gram-Schmidt process for all sorts of practical applications, but for simplicity let's follow the analysis by hand:
top = vects[:, numpy.argsort(vals)[::-1][:3]]  # eigenvectors of the three largest eigenvalues
u = top[:, 0]
uu = numpy.dot(u, u)
v = top[:, 1] - (numpy.dot(u, top[:, 1]) / uu) * u
vv = numpy.dot(v, v)
w = top[:, 2] - (numpy.dot(u, top[:, 2]) / uu) * u - (numpy.dot(v, top[:, 2]) / vv) * v
It's reasonable to implement normalization before or after this step, which should be informed by the data of course. The reconstruction below needs unit-length vectors, so they are normalized when the basis is assembled.
Now with the raw data, we implicitly made the assumption that the basis is standard, so we need a map between {e1, e2, ..., e20} and {u, v, w}, which is given by
ch_of_basis = numpy.array([b / numpy.linalg.norm(b) for b in (u, v, w)]).transpose()  # shape (20, 3)
This can be used to compute each reconstructed_i, like this:
reconstructed = []
for measurement in x.transpose():
    coords = numpy.dot(ch_of_basis.transpose(), measurement)  # coordinates in {u, v, w}
    reconstructed.append(numpy.dot(ch_of_basis, coords))      # back in the standard basis
reconstructed = numpy.array(reconstructed).transpose()
And then you get the residuals by subtraction:
residuals = x - reconstructed
This flow obviously might need further tuning, but it's the gist of how to compute all the residuals. To get that periodic bar plot, take the average of each row in residuals.
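Put together, the whole flow fits in a few lines. The sketch below is a condensed version under two assumptions: the data is synthetic, and the orthogonalization step is skipped because numpy.linalg.eigh already returns orthonormal eigenvectors. The last line checks that the residuals have no component left along the three retained directions.

```python
import numpy

rng = numpy.random.default_rng(0)
x = rng.normal(size=(20, 261))                   # synthetic stand-in, not real swap data

x_centered = x - x.mean(axis=1, keepdims=True)
x_covariance = numpy.matmul(x_centered, x_centered.transpose()) / 261
vals, vects = numpy.linalg.eigh(x_covariance)
basis = vects[:, numpy.argsort(vals)[::-1][:3]]  # (20, 3), orthonormal columns

# Project onto the three retained directions, then map back to the standard basis.
reconstructed = numpy.matmul(basis, numpy.matmul(basis.transpose(), x))
residuals = x - reconstructed

average_residual = residuals.mean(axis=1)        # one value per swap rate, for the bar plot
print(average_residual.shape)                    # (20,)
print(numpy.allclose(numpy.matmul(basis.transpose(), residuals), 0.0))  # True
```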

Could the inputs of the Perceptron training algorithm have different types?

Do the inputs of the Perceptron training algorithm have to be of the same type?
I.e., could one input have a boolean type and another input have an integer type?
They cannot be arbitrary. Look at the calculation steps and termination condition (convergence criterion):
Update:
y[j](t) = f[w(t) ⋅ x[j] ]
= f[w[0](t) x[j,0] + w[1](t) x[j,1] + ⋯ + w[n](t) x[j,n] ]
Convergence:
error = sum(abs(d[j] - y[j](t) )for all j)
error / j_max < epsilon
This requires that you have at least a partially-ordered data type with defined dot-product with your weight type, (usually multiplication with the weight type and addition on the product type), subtraction for the error computation, and some valid convergence value epsilon.
I strongly recommend that you stick with real (float or double) weights. Your input could be of another type if you're properly formal about the operations, but it's a bit tricky in practice: can you define that dot-product on your input and weight vector? For string input, what is
"hello" ⋅ [0.66, 0.21, -1.13]
More generally, how do you store an evaluation vector in your perceptron and then measure how well it matches the input? How do you adjust the vector in back-propagation?
If you can manage those, you can handle your input type.
Personally, I recommend that you stick with the first suggestion: map your inputs to numbers if you can.
It depends on which library you are using to build it. In general, all neural networks have just one type of input: real numbers. Nothing else is supported from a theoretical point of view. What libraries do under the hood is convert any other data type to numeric input. Strings are converted via dictionaries, booleans are mapped to 0 and 1 (or -1 and 1), etc.
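Both answers come down to mapping every input to a number before the dot product is taken. A minimal perceptron sketch along those lines, with one boolean and one integer input per sample; the data, the step activation with threshold 0 and the learning rate of 1 are all choices made for this illustration:

```python
import numpy as np

# Hypothetical samples: (boolean flag, integer count) -> class label 0 or 1.
samples = [(True, 3), (False, 1), (True, 0), (False, 4)]
labels = np.array([1, 0, 0, 1])

# Map every input to float so the dot product with the weights is defined.
X = np.array([[float(flag), float(count)] for flag, count in samples])
X = np.hstack([np.ones((len(X), 1)), X])    # constant 1 column acts as the bias input

w = np.zeros(X.shape[1])
for _ in range(100):                         # cap on the number of passes over the data
    errors = 0
    for x_j, d_j in zip(X, labels):
        y_j = 1 if np.dot(w, x_j) > 0 else 0
        w += (d_j - y_j) * x_j               # perceptron update with learning rate 1
        errors += abs(d_j - y_j)
    if errors == 0:                          # every sample classified correctly
        break

predictions = (X @ w > 0).astype(int)
print(predictions)
```

The loop stops early here because this small data set is linearly separable; on data that is not, it would simply run until the cap.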

Having trouble creating my Neural Network inputs

I'm currently working on a neural network that should take N parameters as input. Each parameter can take one of M different discrete values, let's say {A,B,C,…,M}. It also has a discrete number of outputs.
How can I create my inputs from this situation? Should I have N×M inputs (having 0 or 1 as value), or should I think of a different approach?
You can either have NxM boolean inputs or have N inputs where each one is a float that goes from 0 to 1. In the latter case the float values would be: {A/M, B/M, C/M, ... 1}. For example if you have 4 inputs each one with discrete values: {1,2,3,4} then you can change the domain values to {0.25 , 0.50 , 0.75 , 1.00}.
Actually there are a lot of ways to encode your inputs, but I have found better results when my inputs lie in the domain [0,1] (as there are some ML functions that expect that).
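The two encodings from the answer can be sketched as follows for one sample with N = 3 parameters and M = 4 possible values. The sample values are made up, and the values are assumed to be the integers 1 to M:

```python
import numpy as np

M = 4                                        # number of discrete values per parameter
sample = np.array([1, 4, 2])                 # N = 3 parameters, values in {1, ..., M}

# Option 1: N x M inputs, one 0/1 indicator per (parameter, value) pair.
one_hot = np.zeros((len(sample), M))
one_hot[np.arange(len(sample)), sample - 1] = 1.0
flat_inputs = one_hot.ravel()                # vector of length N * M
print(flat_inputs)

# Option 2: N float inputs, each value divided by M so it lands in (0, 1].
scaled = sample / M
print(scaled)
```

One design point to weigh: the scaled variant implies an ordering and a distance between the values, while the 0/1 variant treats them as unrelated categories.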

Encog query classification

I'm trying to process this dataset using Encog. In order to do so, I combined the outputs into one (I can't seem to figure out how to use multiple expected outputs, even though I unsuccessfully tried to manually train a NN with 4 output neurons) with the values: "disease1", "disease2", "none" and "both".
Starting from there, I used the analyst wizard on the CSV, and the automatic process trained a NN with the expected outputs. A peek at the file:
"field:1","field:2","field:3","field:4","field:5","field:6","field:7","Output:field:7"
40.5,yes,yes,yes,yes,no,both,both
41.2,no,yes,yes,no,yes,second,second
Now my problem is: how do I query it? I tried with classification, but as far as I've understood, the result only gives me the values {0,1,2}, so there are two classes which I can't differentiate (both are 0).
This same problem applies to the Iris example presented in the wiki. Also, how does Encog extrapolate from the output neuron values to the 0/1/2 results?
Edit: the solution I have found was to use a separate network for disease 1 and disease 2, but I really would like to know if it was possible to combine those into one.
You are correct that you will need to combine the output columns into a single value. Encog analyst will only classify to a single output column. That output column can have many different values, so normalizing the two output columns to none, first, second, both will work. If you use the underlying neural networks directly, you could actually train for two outputs, each doing an independent classification. But for this discussion I will assume we are dealing with the analyst.
Are you querying the network using the workbench, or in code? By default Encog analyst encodes to the neural network using equilateral encoding. This results in a number of output neurons equal to n-1, where n is the number of classes. If you choose one-of-n encoding in the analyst wizard, then the regular classify method on the BasicNetwork will work, as it is only designed for one-of-n.
If you would like to query (in code) using equilateral, then you can use a method similar to the following. I am adding this to the next version of Encog.
/**
 * Used to classify a neural network that has been encoded using equilateral encoding.
 * This is the default for the Encog analyst. Equilateral encoding uses an output count
 * equal to the number of classes minus one.
 * @param input The input to the neural network.
 * @param high The high value of the activation range, usually 1.
 * @param low The low end of the normalization range, usually -1 or 0.
 * @return The class that the input belongs to.
 */
public int classifyEquilateral(final MLData input, double high, double low) {
    MLData result = this.compute(input);
    Equilateral eq = new Equilateral(getOutputCount() + 1, high, low);
    return eq.decode(result.getData());
}
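On the question of how output neuron values turn into a class index: with one-of-n encoding the usual rule is to pick the neuron with the largest activation, and equilateral encoding is typically decoded by picking the class whose code vector lies closest to the network output. The Python sketch below illustrates both rules in general terms. It is not Encog's code, and the code vectors in it are invented for the example rather than the values Encog generates.

```python
import numpy as np

def classify_one_of_n(outputs):
    """One-of-n: the class is the index of the most active output neuron."""
    return int(np.argmax(outputs))

def classify_nearest_code(outputs, class_codes):
    """Pick the class whose code vector (one row per class) is closest to the outputs."""
    distances = np.linalg.norm(class_codes - np.asarray(outputs), axis=1)
    return int(np.argmin(distances))

print(classify_one_of_n([0.1, 0.7, 0.2, 0.05]))   # 1

# Hypothetical code vectors for 3 classes in 2 dimensions (not Encog's actual values).
codes = np.array([[-1.0, -1.0], [1.0, -1.0], [0.0, 1.0]])
print(classify_nearest_code([0.2, 0.8], codes))   # 2
```

This also suggests why four classes trained with equilateral encoding appear as only {0,1,2} when queried with a one-of-n rule: the network has three output neurons, so taking the largest one can only ever name three classes.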
