Convert dgeMatrix for downstream tasks - text2vec

I am trying to cluster sentence embeddings based on a GloVe model from text2vec. I generated the embeddings like so (I create the iterator, vocabulary, etc. in the standard way).
# create document term matrix
dtm = create_dtm(it, vectorizer)
# assign the word embeddings
common_terms = intersect(colnames(dtm), rownames(word_vectors) )
# normalise
dtm_averaged <- text2vec::normalize(dtm[, common_terms], "l1")
# compute average sentence embeddings
sentence_vectors = dtm_averaged %*% word_vectors[common_terms, ]
The resulting object is of class dgeMatrix, which, as I understand it, is equivalent to a dense matrix. Since dgeMatrix isn't accepted by many downstream tools, I would like to convert it. The object, however, is 6 GB in size, and I have problems converting it to a data frame or even a text file for further processing.
Ideally, I'd use this matrix in Spark for further analysis such as k-means clustering. My question is: what would be the best strategy to use the matrix for downstream tasks?
a) Convert it to the matrix class or a data frame?
b) Write the matrix to a file?
c) Something completely different?
I run the models on Google Cloud on a machine with 32 GB of RAM and 28 CPUs.
Thanks for your help.
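For illustration, here is a minimal, hedged sketch of option (b) followed by k-means in Spark: it assumes the sentence vectors have already been written out as a headerless CSV (for example via as.matrix plus data.table::fwrite, possibly in row chunks to stay within memory); the file name and the choice of k are made up for the example.
# PySpark sketch: load exported sentence vectors from CSV and cluster them.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("sentence-kmeans").getOrCreate()

# Each row of the (hypothetical) CSV is one sentence embedding.
df = spark.read.csv("sentence_vectors.csv", inferSchema=True, header=False)
features = VectorAssembler(inputCols=df.columns, outputCol="features").transform(df)

kmeans = KMeans(k=20, seed=1, featuresCol="features")
model = kmeans.fit(features)
clustered = model.transform(features)  # adds a "prediction" column with cluster ids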

Related

How to use DeepQLearning in Julia for very large states?

I would like to use the DeepQLearning.jl package from https://github.com/JuliaPOMDP/DeepQLearning.jl. In order to do so, we have to do something similar to the following:
using DeepQLearning
using POMDPs
using Flux
using POMDPModels
using POMDPSimulators
using POMDPPolicies
# load MDP model from POMDPModels or define your own!
mdp = SimpleGridWorld();
# Define the Q network (see Flux.jl documentation)
# the gridworld state is represented by a 2 dimensional vector.
model = Chain(Dense(2, 32), Dense(32, length(actions(mdp))))
exploration = EpsGreedyPolicy(mdp, LinearDecaySchedule(start=1.0, stop=0.01, steps=10000/2))
solver = DeepQLearningSolver(qnetwork=model, max_steps=10000,
                             exploration_policy=exploration,
                             learning_rate=0.005, log_freq=500,
                             recurrence=false, double_q=true,
                             dueling=true, prioritized_replay=true)
policy = solve(solver, mdp)
sim = RolloutSimulator(max_steps=30)
r_tot = simulate(sim, mdp, policy)
println("Total discounted reward for 1 simulation: $r_tot")
In the line mdp = SimpleGridWorld(), we create the MDP. When I tried to create my own MDP, I ran into the problem of a very large state space. A state in my MDP is a vector in {1, 2, ..., m}^n for some m and n, so when defining the function POMDPs.states(mdp::myMDP) I realized that I would have to iterate over all of the states, of which there are m^n.
Am I using the package in the wrong way? Or must we enumerate the states even when there are exponentially many of them? If the latter, then what is the point of using deep Q-learning? I thought deep Q-learning was supposed to help precisely when the action and state spaces are very large.
DeepQLearning does not require enumerating the state space and can handle continuous-space problems.
DeepQLearning.jl only uses the generative interface of POMDPs.jl. As such, you do not need to implement the states function, only gen and initialstate (see the link on how to implement the generative interface).
However, because of the discrete-action nature of DQN, you also need POMDPs.actions(mdp::YourMDP), which should return an iterator over the action space.
By making those modifications to your implementation you should be able to use the solver.
The neural network in DQN takes as input a vector representation of the state. If your state is an m-dimensional vector, the neural network input will be of size m. The output size of the network will be equal to the number of actions in your model.
In the case of the grid world example, the input size of the Flux model is 2 (x, y positions) and the output size is length(actions(mdp))=4.

How to Decompose and Visualise Slope Component in Tensorflow Probability

I'm running TensorFlow 2.1 and TensorFlow Probability 0.9. I have fit a structural time series model with a seasonal component. I am using code from the TensorFlow Probability structural time series example on the TensorFlow GitHub.
In the example there is a great plot where the decomposition is visualised:
# Get the distributions over component outputs from the posterior marginals on
# training data, and from the forecast model.
component_dists = sts.decompose_by_component(
    demand_model,
    observed_time_series=demand_training_data,
    parameter_samples=q_samples_demand_)
forecast_component_dists = sts.decompose_forecast_by_component(
    demand_model,
    forecast_dist=demand_forecast_dist,
    parameter_samples=q_samples_demand_)
demand_component_means_, demand_component_stddevs_ = (
    {k.name: c.mean() for k, c in component_dists.items()},
    {k.name: c.stddev() for k, c in component_dists.items()})
(
    demand_forecast_component_means_,
    demand_forecast_component_stddevs_
) = (
    {k.name: c.mean() for k, c in forecast_component_dists.items()},
    {k.name: c.stddev() for k, c in forecast_component_dists.items()}
)
When using a trend component, is it possible to decompose and visualise both:
trend/_level_scale & trend/_slope_scale
I have tried many permutations to extract the nested element of the trend component with no luck.
Thanks for your time in advance.
We didn't write a separate STS interface for this, but you can access the posterior on latent states (in this case, both the level and slope) by directly querying the underlying state-space model for its marginal means and covariances:
ssm = model.make_state_space_model(
    num_timesteps=num_timesteps,
    param_vals=parameter_samples)
posterior_means, posterior_covs = (
    ssm.posterior_marginals(observed_time_series))
You should also be able to draw samples from the joint posterior by running ssm.posterior_sample(observed_time_series, num_samples).
It looks like there's currently a glitch when drawing posterior samples from a model with no batch shape (Could not find valid device for node. Node:{{node Reshape}}): while we fix that, it should work to add an artificial batch dimension as a workaround:
ssm.posterior_sample(observed_time_series[tf.newaxis, ...], num_samples).
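As a rough illustration of what you can then do with those marginals, here is a hedged sketch of pulling out and plotting the level and slope trajectories. It assumes the trend component contributes the first two latent dimensions in the order (level, slope) and that posterior_means has shape [num_parameter_draws, num_timesteps, latent_size]; check ssm.latent_size and your component ordering before relying on these indices.
import matplotlib.pyplot as plt

# Average over the parameter draws to get one trajectory per latent dimension.
mean_latents = posterior_means.numpy().mean(axis=0)

# Assumption: latent dims 0 and 1 belong to the trend component (level, slope).
level, slope = mean_latents[:, 0], mean_latents[:, 1]

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(level); ax1.set_title("posterior mean level")
ax2.plot(slope); ax2.set_title("posterior mean slope")
plt.show()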

Feed a complex-valued image into Neural network (tensorflow)

I'm working on a project which tries to "learn" a relationship between a set of around 10k complex-valued input images (amplitude/phase; real/imag) and a real-valued output vector with 48 entries. This output vector is not a set of labels, but a set of numbers which represents the best parameters for optimizing the visual impression of the given complex-valued image. These parameters are generated by an algorithm. It's possible that there is some noise in the data (coming from the images and from the algorithm which generates the parameter vector).
Those parameters more or less depend on the FFT (fast Fourier transform) of the input image. Therefore I was thinking of feeding the network (5 hidden layers, but the architecture shouldn't matter right now) with a 1D-reshaped version of FFT(complexImage); some pseudocode:
% discretize spectrum
obj_ft = fftshift(fft2(object));
obj_real_2d = real(obj_ft);
obj_imag_2d = imag(obj_ft);
% convert the 2D matrices into 1D rows
obj_real_1d = reshape(obj_real_2d, 1, []);
obj_imag_1d = reshape(obj_imag_2d, 1, []);
% concatenate the real and imaginary parts into one real-valued row per image
obj_complx_1d(index, :) = [obj_real_1d obj_imag_1d];
opt_param_1D(index, :) = get_opt_param(object);
I was wondering if there is a better approach for feeding complex-valued images into a deep network. I'd like to avoid the use of complex gradients, because they shouldn't really be necessary; I "just" want a "black box" which outputs the optimized parameters when given a new image.
TensorFlow gets obj_complx_1d as the input and the vector opt_param_1D as the output for training.
There are several ways you can treat complex signals as input.
Use a transform to make them into 'images'. Short-time Fourier transforms are used to make spectrograms, which are 2D, with the x-axis being time and the y-axis frequency. If you have complex input data, you may choose to simply look at the magnitude spectrum or the power spectral density of your transformed data.
Something else that I've seen in practice is to treat the in-phase and quadrature (real/imaginary) channels separately in the early layers of the network, and operate across both in higher layers. In the early layers, your network will learn the characteristics of each channel; in the higher layers it will learn the relationship between the I/Q channels (a minimal sketch of this follows at the end of this answer).
These guys do a lot with complex signals and neural nets. In particular, check out 'Convolutional Radio Modulation Recognition Networks':
https://radioml.com/research/
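Here is a hedged TensorFlow/Keras sketch of that two-branch idea. The layer sizes, image dimensions, and the 48-unit output (matching the question's parameter vector) are all illustrative assumptions, not something prescribed by the answer:
import tensorflow as tf

WIDTH, HEIGHT = 64, 64  # assumed image size

# Separate inputs for the real (I) and imaginary (Q) channels.
i_in = tf.keras.Input(shape=(WIDTH, HEIGHT, 1), name="real_part")
q_in = tf.keras.Input(shape=(WIDTH, HEIGHT, 1), name="imag_part")

# Early layers operate on each channel independently.
i_feat = tf.keras.layers.Conv2D(16, 3, activation="relu")(i_in)
q_feat = tf.keras.layers.Conv2D(16, 3, activation="relu")(q_in)

# Later layers operate across both channels.
x = tf.keras.layers.Concatenate()([i_feat, q_feat])
x = tf.keras.layers.Conv2D(32, 3, activation="relu")(x)
x = tf.keras.layers.Flatten()(x)
out = tf.keras.layers.Dense(48)(x)  # 48 real-valued output parameters

model = tf.keras.Model(inputs=[i_in, q_in], outputs=out)
model.compile(optimizer="adam", loss="mse")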
The simplest way to feed complex-valued numbers without using complex gradients in your model is to use a different representation of the complex values. The two main ways are:
Magnitude/Angle components
Real/Imaginary components
I'll show this idea using magnitude/angle components. Assume you have a 2D numpy array img representing an image, with shape = (WIDTH, HEIGHT):
import numpy as np
kSpace = np.fft.ifftshift(np.fft.fft2(img))
This gives you a 2D complex array. You can then transform it into a two-channel real-valued array:
data = np.dstack((np.abs(kSpace), np.angle(kSpace)))
This will be a numpy array with shape = (WIDTH, HEIGHT, 2), representing one complex-valued image. For a set of images, stack them together to get an array of shape = (NUM_IMAGES, WIDTH, HEIGHT, 2), for example as in the short sketch below.
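A small sketch of building that batch array; the input list is filled with random dummy images purely for illustration:
import numpy as np

images = [np.random.rand(64, 64) for _ in range(10)]  # dummy stand-ins for real images
kspaces = [np.fft.ifftshift(np.fft.fft2(img)) for img in images]
data = np.stack([np.dstack((np.abs(k), np.angle(k))) for k in kspaces])
print(data.shape)  # (10, 64, 64, 2), i.e. (NUM_IMAGES, WIDTH, HEIGHT, 2)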
I made a simple example of using TensorFlow to learn a Fourier transform with a simple neural network. You can find this example at https://github.com/michaelmendoza/learning-tensorflow

How to construct a character based seq2seq model in tensorflow

What changes are required to the existing seq2seq model in TensorFlow so that I can use character units rather than the existing word units for the seq2seq task? And would this be a good configuration for a predictive text application?
The following function signatures may need modification for this task:
def embedding_rnn_seq2seq(encoder_inputs, decoder_inputs, cell,
                          num_encoder_symbols, num_decoder_symbols,
                          output_projection=None, feed_previous=False,
                          dtype=dtypes.float32, scope=None):
Apart from the reduced input/output vocabulary, what other parameter changes would be required to implement such a character-level seq2seq model?
I think you could use the existing seq2seq model in TensorFlow without any code changes for character-based units if you prepare your input data files by whitespace-separating your training examples, like this:
The quick brown fox.
Becomes:
T h e _SPACE_ q u i c k _SPACE_ b r o w n _SPACE_ f o x .
Then your vocabulary naturally becomes characters not words.
You can experiment with vocabulary sizes and embedding size, eliminate the embedding layer, etc., to see what works best for your data.
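A tiny preprocessing sketch of that idea; the _SPACE_ token matches the example above, while the function name is made up for illustration:
def to_char_tokens(line, space_token="_SPACE_"):
    # Turn a sentence into whitespace-separated character tokens.
    return " ".join(space_token if ch == " " else ch for ch in line)

print(to_char_tokens("The quick brown fox."))
# T h e _SPACE_ q u i c k _SPACE_ b r o w n _SPACE_ f o x .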

Translating a TensorFlow LSTM into synapticjs

I'm working on implementing an interface between a TensorFlow basic LSTM that has already been trained and a JavaScript version that can be run in the browser. The problem is that in all of the literature I've read, LSTMs are modeled as mini-networks (using only connections, nodes, and gates), while TensorFlow seems to have a lot more going on.
The two questions that I have are:
Can the TensorFlow model be easily translated into a more conventional neural network structure?
Is there a practical way to map the trainable variables that TensorFlow gives you to this structure?
I can get the 'trainable variables' out of TensorFlow; the issue is that they appear to have only one bias value per LSTM node, whereas most of the models I've seen would include several biases: for the memory cell, the inputs, and the output.
Internally, the LSTMCell class stores the LSTM weights as one big matrix instead of 8 smaller ones for efficiency. It is quite easy to divide it horizontally and vertically to get to the more conventional representation. However, it might be easier and more efficient if your library does a similar optimization.
Here is the relevant piece of code of the BasicLSTMCell:
concat = linear([inputs, h], 4 * self._num_units, True)
# i = input_gate, j = new_input, f = forget_gate, o = output_gate
i, j, f, o = array_ops.split(1, 4, concat)
The linear function does a matrix multiplication to transform the concatenated input and previous h state into 4 matrices of shape [batch_size, self._num_units]. This linear transformation uses the single weight matrix and bias variable that you're referring to in the question. The result is then split into the different gates used by the LSTM transformation.
If you'd like to explicitly get the transformations for each gate, you can split that matrix and bias into 4 blocks. It is also quite easy to implement it from scratch using 4 or 8 linear transformations.
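For illustration, here is a hedged numpy sketch of that split. It assumes the exported kernel has shape [input_size + num_units, 4 * num_units] and the gate order i, j, f, o shown above; the sizes and random values are placeholders for the variables you would export from TensorFlow:
import numpy as np

# Placeholder sizes and variables; in practice these come from the trained cell.
input_size, num_units = 10, 32
kernel = np.random.randn(input_size + num_units, 4 * num_units)
bias = np.zeros(4 * num_units)

# Split vertically: rows that multiply the input x vs. the previous state h.
W_x, W_h = kernel[:input_size, :], kernel[input_size:, :]

# Split horizontally into the four gate blocks (order i, j, f, o as above).
W_xi, W_xj, W_xf, W_xo = np.split(W_x, 4, axis=1)
W_hi, W_hj, W_hf, W_ho = np.split(W_h, 4, axis=1)
b_i, b_j, b_f, b_o = np.split(bias, 4)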
