MCMC Trace Plot in Edward

I'm using a Dirichlet Process Mixture Model (DPMM) to infer cluster assignments and cluster parameters on a synthetic dataset using Edward based on the following community post. I'm using GPU-accelerated Metropolis Hastings to learn the posterior distribution over model parameters. For example, for cluster means, we have:
D = 2  # dimension of the data
K = 5  # cluster truncation
T = 10000  # number of samples
mu = Normal(loc=tf.zeros(D), scale=tf.ones(D), sample_shape=K)
qmu = Normal(loc=tf.zeros(D), scale=tf.ones(D), sample_shape=K)  # posterior
gmu = Normal(loc=tf.zeros(D), scale=tf.ones(D), sample_shape=K)  # proposal
inference = ed.MetropolisHastings(
    latent_vars={mu: qmu, ...},
    proposal_vars={mu: gmu, ...},
    data={x: x_data})
I'm interested in generating a trace plot to visualize samples from the posterior distribution qmu. I'm looking for something similar to PyMC's pm.traceplot().
How do I generate a trace plot in Edward?

For an Empirical distribution used in sampling, we can access the sampled values as follows:
thin=4
burnin=2000
qmu_trace = qmu.params[burnin::thin].eval()
We can then plot the trace and compute histogram and auto-correlation as usual.
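As a rough sketch of that last step, the snippet below draws a trace, a histogram and an autocorrelation plot for each coordinate of each cluster mean. It assumes the trace is a NumPy array of shape [num_samples, K, D]; the random array stands in for qmu.params[burnin::thin].eval(), and the helper names autocorr and plot_trace are made up for illustration (they are not part of Edward).

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt


def autocorr(x, max_lag):
    """Sample autocorrelation of a 1-D chain for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])


def plot_trace(trace, name="mu", max_lag=50):
    """Plot trace, histogram and autocorrelation for an array [num_samples, K, D]."""
    num_samples, K, D = trace.shape
    fig, axes = plt.subplots(K * D, 3, figsize=(12, 2 * K * D), squeeze=False)
    for k in range(K):
        for d in range(D):
            chain = trace[:, k, d]
            row = axes[k * D + d]
            row[0].plot(chain, lw=0.5)
            row[0].set_ylabel(f"{name}[{k},{d}]")
            row[1].hist(chain, bins=30)
            row[2].bar(np.arange(max_lag + 1), autocorr(chain, max_lag))
    axes[0, 0].set_title("trace")
    axes[0, 1].set_title("histogram")
    axes[0, 2].set_title("autocorrelation")
    fig.tight_layout()
    return fig


# Stand-in for qmu.params[burnin::thin].eval(): random draws with K=5, D=2.
rng = np.random.default_rng(0)
qmu_trace = rng.normal(size=(2000, 5, 2))
fig = plot_trace(qmu_trace)
```

In a notebook the figure displays directly; in a script, fig.savefig(...) writes it to disk.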

When applying MinMax scaling, does each column need to be treated independently for train and test?

I have 8 columns in my dataframe: 5 are features and the other 3 are targets. After following the process below, the results are not good. Can anyone give feedback on the steps I followed?
Here I define two MinMax scalers, one for the feature columns and one for the target columns. Once the model has predicted values, I apply the inverse scaling to the predicted targets to obtain the results.
#smoothening and minMax scaling
smoother=tsmoothie.KalmanSmoother(component='level_trend', component_noise={'level':0.1, 'trend':0.1})
scaler_features = MinMaxScaler(feature_range=(0,1))
scaler_targets = MinMaxScaler(feature_range=(0,1))
#setting up features and targets from the df
df_norm_feature = scaler_features.fit_transform(raw_df.iloc[:,:5])
df_norm_target = scaler_targets.fit_transform(raw_df.iloc[:,5:])
#smoothening features and targets
smoother.smooth(df_norm_feature)
smoothed_features = smoother.smooth_data
smoother.smooth(df_norm_target)
smoothed_targets = smoother.smooth_data
#split into train test and train the data, and prepare the model on train.
#for reverse transformation I am using the following code.
test_resultsForAll = model.predict(test_data)
transformed_test_resultsForAll = scaler_targets.inverse_transform(test_resultsForAll)
However, the results obtained with this method are not good. Are there mistakes in the order of the steps, or do I need to perform MinMax scaling and smoothing on the whole dataset at once?
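For comparison, here is a minimal sketch of one common ordering, which is not taken from the original thread: split first, fit each scaler on the training rows only, then reuse the fitted scalers on the test rows. MinMaxScaler already scales every column independently, so separate scalers for features and targets are only needed so that the targets can be inverse-transformed on their own. The data is synthetic, the smoothing and the model are left out, and the "predictions" are a placeholder.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for raw_df: 5 feature columns followed by 3 target columns.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 8))
X, y = data[:, :5], data[:, 5:]

# shuffle=False keeps the row order, which matters for time-ordered data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False)

# Fit the scalers on the training split only, so that no test-set
# statistics leak into the preprocessing.
scaler_features = MinMaxScaler(feature_range=(0, 1)).fit(X_train)
scaler_targets = MinMaxScaler(feature_range=(0, 1)).fit(y_train)

X_train_s = scaler_features.transform(X_train)
X_test_s = scaler_features.transform(X_test)  # may fall slightly outside [0, 1]
y_train_s = scaler_targets.transform(y_train)

# ... fit a model on (X_train_s, y_train_s) and predict on X_test_s ...
# Placeholder for the model output: the scaled true targets.
pred_s = scaler_targets.transform(y_test)

# Map predictions back to the original target units.
pred = scaler_targets.inverse_transform(pred_s)
```

Any smoothing step would follow the same rule under this approach: estimate it on the training rows and only apply it to the test rows.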

MLJ: selecting rows and columns for training in evaluate

I want to implement a kernel ridge regression that also works within MLJ. Moreover, I want to have the option to use either feature vectors or a predefined kernel matrix as in Python sklearn.
When I run this code
using MLJ, LinearAlgebra
import MLJBase, MLJModelInterface
const MMI = MLJModelInterface
MMI.@mlj_model mutable struct KRRModel <: MLJModelInterface.Deterministic
    mu::Float64 = 1::(_ > 0)
    kernel::String = "linear"
end
function MMI.fit(m::KRRModel, verbosity::Int, K, y)
    K = MLJBase.matrix(K)
    fitresult = inv(K + m.mu*I)*y
    cache = nothing
    report = nothing
    return (fitresult, cache, report)
end
N = 10
K = randn(N,N)
K = K*K
a = randn(N)
y = K*a + 0.2*randn(N)
m = KRRModel()
kregressor = machine(m,K,y)
cv = CV(; nfolds=6, shuffle=nothing, rng=nothing)
evaluate!(kregressor, resampling=cv, measure=rms, verbosity=1)
the evaluate! function evaluates the machine on different subsets of rows of K. Due to the Representer Theorem, a kernel ridge regression has a number of nonzero coefficients equal to the number of samples. Hence, a reduced size matrix K[train_rows,train_rows] can be used instead of K[train_rows,:].
To indicate that I'm using a kernel matrix, I'd set m.kernel = "". How do I make evaluate! select the columns as well as the rows, so that it forms a smaller matrix when m.kernel = ""?
This is my first time using MLJ and I'd like to make as few modifications as possible.
Quoting the answer I got on the Julia Discourse from @ablaom:
The intended use of evaluate! is to estimate the generalisation error
associated with some supervised learning model, by subsampling
observations, as in cross-validation, a common use-case. I’m afraid
there is no natural way for evaluate! to do feature subsampling.
https://alan-turing-institute.github.io/MLJ.jl/dev/evaluating_model_performance/
FYI: There is a version of kernel regression implementing the MLJ
model interface, namely kernel partial least squares regression from
the package lalvim/PartialLeastSquaresRegressor.jl (an implementation
of a partial least squares regressor).
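Since evaluate! only subsamples rows, one fallback is to run the resampling loop by hand. The sketch below shows the slicing the question describes, in plain NumPy rather than MLJ: for each fold it fits on K[train_rows, train_rows] and predicts with K[test_rows, train_rows]. The function names (krr_fit, krr_predict, cv_precomputed) are hypothetical, and the code illustrates the idea only; it is not an MLJ-compatible implementation.

```python
import numpy as np


def krr_fit(K_train, y_train, mu=1.0):
    """Dual coefficients alpha = (K_train + mu * I)^{-1} y_train."""
    n = K_train.shape[0]
    return np.linalg.solve(K_train + mu * np.eye(n), y_train)


def krr_predict(K_test_train, alpha):
    """Predictions from the kernel between test rows and training rows."""
    return K_test_train @ alpha


def cv_precomputed(K, y, nfolds=5, mu=1.0):
    """Unshuffled k-fold CV on a precomputed kernel matrix; returns per-fold RMSE."""
    n = K.shape[0]
    folds = np.array_split(np.arange(n), nfolds)
    rmses = []
    for test_rows in folds:
        train_rows = np.setdiff1d(np.arange(n), test_rows)
        K_tt = K[np.ix_(train_rows, train_rows)]  # rows AND columns restricted
        K_st = K[np.ix_(test_rows, train_rows)]   # test rows vs. training columns
        alpha = krr_fit(K_tt, y[train_rows], mu)
        resid = krr_predict(K_st, alpha) - y[test_rows]
        rmses.append(np.sqrt(np.mean(resid ** 2)))
    return np.array(rmses)


# Synthetic data in the spirit of the question (A @ A.T is symmetric PSD).
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 30))
K = A @ A.T
y = K @ rng.normal(size=30) + 0.2 * rng.normal(size=30)
scores = cv_precomputed(K, y, nfolds=6)
```

Using np.linalg.solve instead of an explicit inverse is a numerical-stability choice and does not change the estimator.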

How to Decompose and Visualise Slope Component in Tensorflow Probability

I'm running tensorflow 2.1 and tensorflow_probability 0.9. I have fit a Structural Time Series model with a seasonal component. I am using code from the TensorFlow Probability Structural Time Series example on the TensorFlow GitHub.
In the example there is a great plot where the decomposition is visualised:
# Get the distributions over component outputs from the posterior marginals on
# training data, and from the forecast model.
component_dists = sts.decompose_by_component(
    demand_model,
    observed_time_series=demand_training_data,
    parameter_samples=q_samples_demand_)
forecast_component_dists = sts.decompose_forecast_by_component(
    demand_model,
    forecast_dist=demand_forecast_dist,
    parameter_samples=q_samples_demand_)
demand_component_means_, demand_component_stddevs_ = (
    {k.name: c.mean() for k, c in component_dists.items()},
    {k.name: c.stddev() for k, c in component_dists.items()})
(
    demand_forecast_component_means_,
    demand_forecast_component_stddevs_
) = (
    {k.name: c.mean() for k, c in forecast_component_dists.items()},
    {k.name: c.stddev() for k, c in forecast_component_dists.items()}
)
When using a trend component, is it possible to decompose and visualise both:
trend/_level_scale & trend/_slope_scale
I have tried many permutations to extract the nested element of the trend component with no luck.
Thanks for your time in advance.
We didn't write a separate STS interface for this, but you can access the posterior on latent states (in this case, both the level and slope) by directly querying the underlying state-space model for its marginal means and covariances:
ssm = model.make_state_space_model(
    num_timesteps=num_timesteps,
    param_vals=parameter_samples)
posterior_means, posterior_covs = (
    ssm.posterior_marginals(observed_time_series))
You should also be able to draw samples from the joint posterior by running ssm.posterior_sample(observed_time_series, num_samples).
It looks like there's currently a glitch when drawing posterior samples from a model with no batch shape (Could not find valid device for node. Node:{{node Reshape}}): while we fix that, it should work to add an artificial batch dimension as a workaround:
ssm.posterior_sample(observed_time_series[tf.newaxis, ...], num_samples).
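Once the marginals are converted to NumPy arrays, the level and slope can be sliced out of the latent dimension. The sketch below rests on two assumptions that should be checked against your own model: the trend is a LocalLinearTrend whose latent state is ordered [level, slope], and the trend is the first component, so its two dimensions come first in the concatenated latent state. The helper split_trend and the random arrays are illustrative stand-ins, not TFP API.

```python
import numpy as np


def split_trend(posterior_means, posterior_covs, offset=0):
    """Return (mean, stddev) pairs for level and slope.

    posterior_means: [..., num_timesteps, latent_size]
    posterior_covs:  [..., num_timesteps, latent_size, latent_size]
    offset: index of the first trend dimension in the latent state
            (assumed 0, i.e. the trend is the first component).
    """
    means = np.asarray(posterior_means)
    covs = np.asarray(posterior_covs)
    # Marginal standard deviations are the square roots of the diagonal.
    stddevs = np.sqrt(np.diagonal(covs, axis1=-2, axis2=-1))
    level = (means[..., offset], stddevs[..., offset])
    slope = (means[..., offset + 1], stddevs[..., offset + 1])
    return level, slope


# Synthetic stand-ins for the arrays returned by ssm.posterior_marginals(...).
num_timesteps, latent_size = 48, 6
rng = np.random.default_rng(0)
means = rng.normal(size=(num_timesteps, latent_size))
factors = rng.normal(size=(num_timesteps, latent_size, latent_size))
covs = factors @ np.swapaxes(factors, -1, -2)  # positive semi-definite

(level_mean, level_sd), (slope_mean, slope_sd) = split_trend(means, covs)
```

The resulting means and standard deviations can be plotted in the same way as the per-component means and stddevs in the example notebook.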

How to check deep embedded clustering on new data?

I'm using DEC from mxnet (https://github.com/apache/incubator-mxnet/tree/master/example/deep-embedded-clustering)
While it defaults to run on the MNIST, I have changed the datasource to several hundreds of documents (which should be perfectly fine, given that mxnet can work with the Reuters dataset)
The question: after training the DEC model in MXNet, how can I use it on new, unseen data? It gives me a different prediction each time!
Here is the code for collecting the dataset:
vectorizer = TfidfVectorizer(dtype=np.float64, stop_words='english', max_features=2000, norm='l2', sublinear_tf=True).fit(training)
X = vectorizer.transform(training)
X = np.asarray(X.todense()) # * np.sqrt(X.shape[1])
Y = np.asarray(labels)
Here is the code for prediction:
def predict(self, TrainX, X, update_interval=None):
    N = TrainX.shape[0]
    if not update_interval:
        update_interval = N
    batch_size = 256
    test_iter = mx.io.NDArrayIter({'data': TrainX}, batch_size=batch_size, shuffle=False,
                                  last_batch_handle='pad')
    args = {k: mx.nd.array(v.asnumpy(), ctx=self.xpu) for k, v in self.args.items()}
    z = list(model.extract_feature(self.feature, args, None, test_iter, N, self.xpu).values())[0]
    kmeans = KMeans(self.num_centers, n_init=20)
    kmeans.fit(z)
    args['dec_mu'][:] = kmeans.cluster_centers_
    print(args)
    sample_iter = mx.io.NDArrayIter({'data': X})
    z = list(model.extract_feature(self.feature, args, None, sample_iter, N, self.xpu).values())[0]
    p = np.zeros((z.shape[0], self.num_centers))
    self.dec_op.forward([z, args['dec_mu'].asnumpy()], [p])
    print(p)
    y_pred = p.argmax(axis=1)
    self.y_pred = y_pred
    return y_pred
Explanation: I thought I also need to pass a sample of the data I trained the system with. That is why you see both TrainX and X there.
Any help is greatly appreciated.
Clustering methods (by themselves) don't provide a method for labelling samples that weren't included in the calculation for deriving the clusters. You could re-run the clustering algorithm with the new samples, but the clusters are likely to change and be given different cluster labels due to different random initializations. So this is probably why you're seeing different predictions each time.
One option is to use the cluster labels from the clustering method in a supervised way, to predict the cluster labels for new samples. You could find the closest cluster center to your new sample (in the feature space) and use that as the cluster label, but this ignores the shape of the clusters. A better solution would be to train a classification model to predict the cluster labels for new samples given the previously clustered data. Success of these methods will depend on the quality of your clustering (i.e. the feature space used, separability of clusters, etc).
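The nearest-center option can be sketched in a few lines of NumPy. The sketch assumes the cluster centers are kept fixed after training (for example the learned dec_mu values, saved once) instead of being re-estimated by a fresh KMeans fit on every call, and that z_new holds the embedded features of the new samples. The function name and the toy arrays are illustrative only.

```python
import numpy as np


def assign_to_nearest_center(z_new, centers):
    """Label each embedded sample with the index of its closest center.

    z_new:   [num_samples, dim] embeddings of the new data
    centers: [num_centers, dim] cluster centers saved after training
    """
    # Squared Euclidean distance between every sample and every center.
    d2 = ((z_new[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


# Toy example with three fixed centers in a 2-D embedding space.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
z_new = np.array([[0.2, -0.1], [4.8, 5.3], [0.1, 4.0]])
labels = assign_to_nearest_center(z_new, centers)
```

Because the centers no longer change between calls, the same input always maps to the same label, which removes the run-to-run variation caused by random KMeans initialization.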

Feature Vectors in Radial Basis Function Network

I am trying to use an RBFNN for point-cloud-to-surface reconstruction, but I can't work out what my feature vectors would be.
Can anyone please help me understand this?
The goal is to get to this: [image: reconstructed surface]
From inputs like this: [image: input point cloud]
An RBF network essentially involves fitting data with a linear combination of functions that obey a set of core properties -- chief among these is radial symmetry. The parameters of each of these functions are learned by incremental adjustment based on errors generated through repeated presentation of inputs.
If I understand (it's been a very long time since I used one of these networks), your question pertains to preprocessing of the data in the point cloud. I believe that each of the points in your point cloud should serve as one input. If I understand properly, the features are your three dimensions, and as such each point can already be considered a "feature vector."
You have other choices that remain, namely the number of radial basis neurons in your hidden layer, and the radial basis functions to use (a Gaussian is a popular first choice). The training of the network and the surface reconstruction can be done in a number of ways but I believe this is beyond the scope of the question.
I don't know if it will help, but here's a simple python implementation of an RBF network performing function approximation, with one-dimensional inputs:
import numpy as np
import matplotlib.pyplot as plt
def fit_me(x):
    return (x-2) * (2*x+1) / (1+x**2)
def rbf(x, mu, sigma=1.5):
    return np.exp(-(x-mu)**2 / (2*sigma**2))
# Core parameters including number of training
# and testing points, minimum and maximum x values
# for training and testing points, and the number
# of rbf (hidden) nodes to use
num_points = 100 # number of inputs (each 1D)
num_rbfs = 20 # number of centers (must be an int for np.linspace)
x_min = -5
x_max = 10
# Training data, evenly spaced points
x_train = np.linspace(x_min, x_max, num_points)
y_train = fit_me(x_train)
# Testing data, more evenly spaced points
x_test = np.linspace(x_min, x_max, num_points*3)
y_test = fit_me(x_test)
# Centers of each of the rbf nodes
centers = np.linspace(-5, 10, num_rbfs)
# Everything is in place to train the network
# and attempt to approximate the function 'fit_me'.
# Start by creating a matrix G in which each row
# corresponds to an x value within the domain and each
# column i contains the values of rbf_i(x).
center_cols, x_rows = np.meshgrid(centers, x_train)
G = rbf(center_cols, x_rows)
plt.plot(G)
plt.title('Radial Basis Functions')
plt.show()
# Simple training in this case: use pseudoinverse to get weights
weights = np.dot(np.linalg.pinv(G), y_train)
# To test, create meshgrid for test points
center_cols, x_rows = np.meshgrid(centers, x_test)
G_test = rbf(center_cols, x_rows)
# apply weights to G_test
y_predict = np.dot(G_test, weights)
plt.plot(y_predict)
plt.title('Predicted function')
plt.show()
error = y_predict - y_test
plt.plot(error)
plt.title('Function approximation error')
plt.show()
First, you can explore the way in which inputs are provided to the network and how the RBF nodes are used. This should extend to 2D inputs in a straightforward way, though training may get a bit more involved.
To do proper surface reconstruction you'll likely need a representation of the surface that is altogether different than the representation of the function that's learned here. Not sure how to take this last step.
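To make the extension to multi-dimensional inputs concrete, here is a rough sketch in which the only change is that the scalar difference (x - mu) becomes a Euclidean distance between a point and a center. The target function, the grid of centers and the value of sigma are arbitrary choices for illustration, and the result is still function approximation over 2D inputs, not a full surface reconstruction.

```python
import numpy as np


def rbf_design_matrix(X, centers, sigma=1.5):
    """G[i, j] = exp(-||X[i] - centers[j]||^2 / (2 * sigma^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))


def target(X):
    """Arbitrary smooth function of two inputs to approximate."""
    return np.sin(X[:, 0]) * np.cos(X[:, 1])


rng = np.random.default_rng(0)

# Each training point is a 2-D feature vector.
X_train = rng.uniform(-3, 3, size=(400, 2))
y_train = target(X_train)

# Centers on a regular 7 x 7 grid covering the input domain.
g = np.linspace(-3, 3, 7)
centers = np.array([[a, b] for a in g for b in g])

# Same training rule as in the 1-D example: pseudoinverse of the design matrix.
G = rbf_design_matrix(X_train, centers, sigma=1.0)
weights = np.linalg.pinv(G) @ y_train

# Evaluate on fresh points drawn from the same domain.
X_test = rng.uniform(-3, 3, size=(200, 2))
y_pred = rbf_design_matrix(X_test, centers, sigma=1.0) @ weights
rmse = np.sqrt(np.mean((y_pred - target(X_test)) ** 2))
```

For a 3D point cloud the same function works unchanged with X of shape [num_points, 3]; what to use as the regression target is the part that depends on the chosen surface representation.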
