How do I apply multiple linear transformations? (manim)

I'm trying to use LinearTransformationScene's apply_matrix multiple times:
from manim import *

class LT(LinearTransformationScene):
    def __init__(self, **kwargs):
        super().__init__(
            show_coordinates=True,
            leave_ghost_vectors=True,
            **kwargs,
        )

    def construct(self):
        P = [[1, 1], [1, -1]]
        D = [[2, 0], [0, 0.5]]
        P_inv = [[0.5, 0.5], [0.5, -0.5]]  # inverse of P
        self.apply_matrix(P)
        self.wait()
        self.apply_matrix(D)
        self.wait()
        self.apply_matrix(P_inv)
        self.wait()
But I get this error: submobjects must be of type VMobject.
I'm hoping to create an animation that:
Applies the matrix P
Pauses briefly
Applies another matrix D
Pauses briefly again
And finally, applies the inverse of P, P_inv.
How do I accomplish this? There were similar questions posted, but no one posted about this specific error.

These specialized Scene classes are unfortunately not very well maintained; this is a known issue. There is a simple workaround, though: after calling self.apply_matrix, add
self.moving_mobjects = []
and the next application of apply_matrix will work as intended again.
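Putting the workaround into the question's scene, a minimal sketch (untested against the latest manim release, but this is the pattern):

from manim import *

class MultipleTransforms(LinearTransformationScene):
    def __init__(self, **kwargs):
        super().__init__(
            show_coordinates=True,
            leave_ghost_vectors=True,
            **kwargs,
        )

    def construct(self):
        P = [[1, 1], [1, -1]]
        D = [[2, 0], [0, 0.5]]
        P_inv = [[0.5, 0.5], [0.5, -0.5]]  # inverse of P
        for M in (P, D, P_inv):
            self.apply_matrix(M)
            self.wait()
            # Workaround: clear the stale moving-mobject references so the
            # next apply_matrix call builds its animation from scratch.
            self.moving_mobjects = []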

Related

OperatorNotAllowedInGraphError: Iterating over a symbolic `tf.Tensor` is not allowed when using a dataset with tuples

I am trying to create my own transformer with TensorFlow, and of course I want to train it. For that purpose I use a Dataset to handle my data. The data is created by a code snippet from the TensorFlow dataset.from_tensor_slices() method documentation. Nevertheless, tensorflow is giving me the following error when I call the fit() method:
"OperatorNotAllowedInGraphError: Iterating over a symbolic tf.Tensor is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature."
Here is the code that I am using:
import numpy as np
import tensorflow as tf

batched_features = tf.constant([[[1, 3], [2, 3]],
                                [[2, 1], [1, 2]],
                                [[3, 3], [3, 2]]], shape=(3, 2, 2))
batched_labels = tf.constant([['A', 'A'],
                              ['B', 'B'],
                              ['A', 'B']], shape=(3, 2, 1))
dataset = tf.data.Dataset.from_tensor_slices((batched_features, batched_labels))
dataset = dataset.batch(1)
for element in dataset.as_numpy_iterator():
    print(element)
class MyTransformer(tf.keras.Model):
    def __init__(self):
        super().__init__()

    def call(self, inputs, training):
        print(type(inputs))
        feature, label = inputs
        return feature

model = MyTransformer()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=[tf.keras.metrics.BinaryAccuracy(),
                       tf.keras.metrics.FalseNegatives()])
model.fit(dataset, batch_size=1, epochs=1)
The code is reduced significantly just for the purpose of reproducing the issue.
I've tried passing the data as a dictionary instead of a tuple, and a couple more things, but nothing worked. It seems that I am missing something.

Counting on NumberLine

I've been searching everywhere and I can't seem to find how one counts (with "bouncing curved arrows") on a NumberLine. How to do so?
This is my desired output: (image of a number line with bouncing curved arrows counting upward)
I'm still quite new to manim. Thank you in advance.
You can use CurvedArrow for that.
Example:
class CountingNumberLine(Scene):
    def construct(self):
        n = NumberLine(x_range=[-5, 5, 1], include_numbers=True)
        self.add(n)
        for i in range(3):
            self.add(CurvedArrow(n.number_to_point(i), n.number_to_point(i + 1), angle=-TAU / 4))
Result: (image of the three arrows hopping along the number line)
Note that if you want an arrowhead only on the last count (like in your image), you can use ArcBetweenPoints instead of CurvedArrow for the earlier hops.
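A minimal sketch of that variation (assuming ArcBetweenPoints takes the same start/end/angle arguments as CurvedArrow, which it does in current manim):

class CountingWithFinalArrow(Scene):
    def construct(self):
        n = NumberLine(x_range=[-5, 5, 1], include_numbers=True)
        self.add(n)
        for i in range(3):
            # Plain arcs for the first hops, an arrowhead only on the last one.
            hop = CurvedArrow if i == 2 else ArcBetweenPoints
            self.add(hop(n.number_to_point(i), n.number_to_point(i + 1), angle=-TAU / 4))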

How does correlation work for an even-sized filter in this example?

(I know a question like this exists, but I wanted help with a specific example)
If the linear filter has even dimensions, how is the "center" defined? i.e. in the following scenario:
filter = np.array([[a, b],
                   [c, d]])
and the image was:
image = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]])
what would be the result of correlation of the image with the linear filter?
Which element of an even-sized filter is considered the origin is an arbitrary choice; each implementation makes its own. That said, a and d are the two most likely choices, since they treat the two image dimensions the same way.
For example, MATLAB's imfilter (which implements correlation, not convolution) does the following:
f = [1,2;4,8];
img = [0,1,0;1,1,1;0,1,0];
imfilter(img,f,'same')

ans =
    14    13     4
    11     7     1
     2     1     0
meaning that a is the origin of the kernel in this case. Other implementations might make a different choice.
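For comparison, scipy.ndimage.correlate exposes the origin explicitly, so you can check what a given implementation does. A small sketch (my recollection is that scipy's default center for a 2x2 filter is element d, and that origin=-1 shifts it to a, matching imfilter, but print the outputs rather than trusting that from memory):

import numpy as np
from scipy import ndimage

f = np.array([[1, 2],
              [4, 8]])
img = np.array([[0, 1, 0],
                [1, 1, 1],
                [0, 1, 0]])

# scipy's default origin choice, with zero padding to match imfilter above
print(ndimage.correlate(img, f, mode='constant', cval=0))
# Shift the origin by -1 in each axis; this should reproduce imfilter's choice of a
print(ndimage.correlate(img, f, mode='constant', cval=0, origin=-1))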

Constructing discrete table-based CPDs in tensorflow-probability?

I'm trying to construct the simplest example of a Bayesian network with several discrete random variables and conditional probabilities (the "Student Network" from Koller's book, see [1]).
I managed to build this network using pymc3, although it is a bit unwieldy. In particular, creating the CPDs is not that straightforward in pymc3, see the snippet below:
import pymc3 as pm
import theano.tensor
...

with pm.Model() as basic_model:
    # parameters for categorical are indexed as [0, 1, 2, ...]
    difficulty = pm.Categorical(name='difficulty', p=[0.6, 0.4])
    intelligence = pm.Categorical(name='intelligence', p=[0.7, 0.3])
    grade = pm.Categorical(name='grade',
                           p=pm.math.switch(
                               theano.tensor.eq(intelligence, 0),
                               pm.math.switch(
                                   theano.tensor.eq(difficulty, 0),
                                   [0.3, 0.4, 0.3],    # I=0, D=0
                                   [0.05, 0.25, 0.7]   # I=0, D=1
                               ),
                               pm.math.switch(
                                   theano.tensor.eq(difficulty, 0),
                                   [0.9, 0.08, 0.02],  # I=1, D=0
                                   [0.5, 0.3, 0.2]     # I=1, D=1
                               )
                           ))
    letter = pm.Categorical(name='letter', p=pm.math.switch(
        ...
But I have no idea how to build this network using tensorflow-probability (versions: tfp-nightly==0.7.0.dev20190517, tf-nightly-2.0-preview==2.0.0.dev20190517)
For the unconditioned binary variables, one can use a categorical distribution, such as
from tensorflow_probability import distributions as tfd
from tensorflow_probability import edward2 as ed

difficulty = ed.RandomVariable(
    tfd.Categorical(
        probs=[0.6, 0.4],
        name='difficulty'
    )
)
But how to construct the CPDs?
There are a few classes/methods in tensorflow-probability that might be relevant (in tensorflow_probability/python/distributions/deterministic.py, or the deprecated ConditionalDistribution), but the documentation is rather sparse (one needs a deep understanding of tfp).
--- Updated question ---
Chris' answer is a good starting point. However, things are still a bit unclear even for a very simple two-variable model.
This works nicely:
jdn = tfd.JointDistributionNamed(dict(
    # note: Categorical's first positional argument is logits, so pass probs by name
    dist_x=tfd.Categorical(probs=[0.2, 0.8], validate_args=True),
    dist_y=lambda dist_x: tfd.Bernoulli(probs=tf.gather([0.1, 0.9], indices=dist_x),
                                        validate_args=True)
))
print(jdn.sample(10))
but this one fails
jdn = tfd.JointDistributionNamed(dict(
    dist_x=tfd.Categorical(probs=[0.2, 0.8], validate_args=True),
    dist_y=lambda dist_x: tfd.Categorical(probs=tf.gather_nd([[0.1, 0.9], [0.5, 0.5]],
                                                             indices=[dist_x]))
))
print(jdn.sample(10))
(I'm trying to model categorical explicitly in the second example just for learning purposes)
--- Update: solved ---
Obviously, the last example wrongly used tf.gather_nd instead of tf.gather, as we only want to select the first or the second row based on the dist_x outcome. This code works now:
jdn = tfd.JointDistributionNamed(dict(
    dist_x=tfd.Categorical(probs=[0.2, 0.8], validate_args=True),
    dist_y=lambda dist_x: tfd.Categorical(probs=tf.gather([[0.1, 0.9], [0.5, 0.5]],
                                                          indices=[dist_x]))
))
print(jdn.sample(10))
The tricky thing about this, and presumably the reason it's subtler than expected in PyMC, is -- as with almost everything in vectorized programming -- handling shapes.
In TF/TFP, the (IMO) nicest way to solve this is with one of the new TFP JointDistribution{Sequential,Named,Coroutine} classes. These let you naturally represent hierarchical PGM models, and then sample from them, evaluate log probs, etc.
I whipped up a colab notebook demoing all 3 approaches, for the full student network: https://colab.research.google.com/drive/1D2VZ3OE6tp5pHTsnOAf_7nZZZ74GTeex
Note the crucial use of tf.gather and tf.gather_nd to manage the vectorization of the various binary and categorical switching.
Have a look and let me know if you have any questions!
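For a concrete flavor of what the notebook does, here is a minimal sketch (not the notebook's exact code) of the grade CPD from the pymc3 snippet above, written as a JointDistributionNamed with tf.gather_nd indexing a 2x2x3 CPT by (intelligence, difficulty):

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# CPT from the pymc3 example: grade_probs[i, d] = P(grade | I=i, D=d)
grade_probs = tf.constant([
    [[0.3, 0.4, 0.3],     # I=0, D=0
     [0.05, 0.25, 0.7]],  # I=0, D=1
    [[0.9, 0.08, 0.02],   # I=1, D=0
     [0.5, 0.3, 0.2]],    # I=1, D=1
])

student = tfd.JointDistributionNamed(dict(
    difficulty=tfd.Categorical(probs=[0.6, 0.4]),
    intelligence=tfd.Categorical(probs=[0.7, 0.3]),
    # Stack the two parent samples into index pairs, then pick the matching
    # CPT row; this vectorizes over any sample shape.
    grade=lambda intelligence, difficulty: tfd.Categorical(
        probs=tf.gather_nd(grade_probs,
                           tf.stack([intelligence, difficulty], axis=-1))),
))

print(student.sample(5))

The same gather pattern extends to the letter CPD that the truncated pymc3 snippet starts to define.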

Julia ReverseDiff: how to take a gradient w.r.t. only a subset of inputs?

In my data flow, I'm querying a small subset of a database, using those results to construct about a dozen arrays, and then, given some parameter values, computing a likelihood value, then repeating for another subset of the database. I want to compute the gradient of the likelihood function with respect to the parameters but not the data. But ReverseDiff computes the gradient with respect to all inputs. How can I get around this? Specifically, how can I construct a ReverseDiff tape that differentiates only with respect to the parameters?
TL;DR: How to marry stochastic gradient descent and ReverseDiff? (I'm not wedded to using ReverseDiff. It just seemed like the right tool for the job.)
It seems like this has to be a common coding pattern; it's used all the time in my field. But I'm missing something. Julia's scoping rules seem to undermine the scoped/anonymous-function approach, and ReverseDiff holds on to the original data values in the generated tape instead of using the mutated values.
Some sample code of things that don't work:
using ReverseDiff
using Base.Test

mutable struct data
    X::Array{Float64, 2}
end

const D = data(zeros(Float64, 2, 2))

# baseline known data to compare against
function f1(params)
    X = float.([1 2; 3 4])
    f2(params, X)
end

# X is data, want derivative wrt params only
function f2(params, X)
    sum(params[1]' * X[:, 1] - (params[1] .* params[2])' * X[:, 2].^2)
end

# store data of interest in D.X so that we can call just f2(params) and get our
# gradient
f2(params) = f2(params, D.X)

# use an inner function and swap out Z's data
function scope_test()
    function f2_only_params(params)
        f2(params, Z)
    end
    Z = float.([6 7; 1 3])
    f2_tape = ReverseDiff.GradientTape(f2_only_params, [1., 2.])
    Z[:] = float.([1 2; 3 4])
    grad = ReverseDiff.gradient!(f2_tape, [3., 4.])
    return grad
end

function struct_test()
    D.X[:] = float.([6 7; 1 3])
    f2_tape = ReverseDiff.GradientTape(f2, [1., 2.])
    D.X[:] = float.([1 2; 3 4])
    grad = ReverseDiff.gradient!(f2_tape, [3., 4.])
    return grad
end

function struct_test2()
    D.X[:] = float.([1 2; 3 4])
    f2_tape = ReverseDiff.GradientTape(f2, [3., 4.])
    D.X[:] = float.([1 2; 3 4])
    grad = ReverseDiff.gradient!(f2_tape, [3., 4.])
    return grad
end

D.X[:] = float.([1 2; 3 4])

@test f1([3., 4.]) == f2([3., 4.], D.X)
@test f1([3., 4.]) == f2([3., 4.])

f1_tape = ReverseDiff.GradientTape(f1, [3., 4.])
f1_grad = ReverseDiff.gradient!(f1_tape, [3., 4.])

# fails! uses the Z values from when the tape was recorded
@test scope_test() == f1_grad
# fails, uses the D.X values from when the tape was recorded
@test struct_test() == f1_grad
# succeeds, so, not completely random
@test struct_test2() == f1_grad
This is currently not possible (sadly). There is a GitHub issue describing the two workarounds:
https://github.com/JuliaDiff/ReverseDiff.jl/issues/36
either do not use a prerecorded tape,
or differentiate with respect to all arguments and ignore the gradient for some of the input parameters.
I had the same issue, and I used the grad function of Knet instead. It supports differentiation with respect to only one argument, but this argument can be quite flexible (e.g. an array of arrays, or a dict of arrays).
Thanks Alex, your answer got me 90% of the way there. AutoGrad (what Knet is using at the time of writing) does provide a very nice interface that I think is natural for most users. However, it turns out that using anonymous functions with ReverseDiff is faster than the approach taken by AutoGrad, for reasons I don't quite understand.
If you follow the chain of issues referenced in what you linked, this seems to be what the ReverseDiff/ForwardDiff folks want people doing:
ReverseDiff.gradient(p -> f(p, non_differentiated_data), params)
Certainly disappointing that we can't get a precompiled tape for this incredibly common usage scenario; maybe future work will change things. But this seems to be where things stand now.
Some references for those interested in further reading:
https://github.com/JuliaDiff/ForwardDiff.jl/issues/77
https://github.com/JuliaDiff/ForwardDiff.jl/issues/32
https://github.com/JuliaDiff/ForwardDiff.jl/pull/182
