I am following the Pyro introductory tutorial in forecasting, and trying to access the learned parameters after training the model, I get different results using different access methods for some of them (while getting identical results for others).
Here is the stripped-down reproducible code from the tutorial:
import torch
import pyro
import pyro.distributions as dist
from pyro.contrib.examples.bart import load_bart_od
from pyro.contrib.forecast import ForecastingModel, Forecaster
pyro.enable_validation(True)
pyro.clear_param_store()
pyro.__version__
# '1.3.1'
torch.__version__
# '1.5.0+cu101'
# import & prepare the data
dataset = load_bart_od()
T, O, D = dataset["counts"].shape
data = dataset["counts"][:T // (24 * 7) * 24 * 7].reshape(T // (24 * 7), -1).sum(-1).log()
data = data.unsqueeze(-1)
T0 = 0 # begining
T2 = data.size(-2) # end
T1 = T2 - 52 # train/test split
# define the model class
class Model1(ForecastingModel):
def model(self, zero_data, covariates):
data_dim = zero_data.size(-1)
feature_dim = covariates.size(-1)
bias = pyro.sample("bias", dist.Normal(0, 10).expand([data_dim]).to_event(1))
weight = pyro.sample("weight", dist.Normal(0, 0.1).expand([feature_dim]).to_event(1))
prediction = bias + (weight * covariates).sum(-1, keepdim=True)
assert prediction.shape[-2:] == zero_data.shape
noise_scale = pyro.sample("noise_scale", dist.LogNormal(-5, 5).expand([1]).to_event(1))
noise_dist = dist.Normal(0, noise_scale)
self.predict(noise_dist, prediction)
# fit the model
pyro.set_rng_seed(1)
pyro.clear_param_store()
time = torch.arange(float(T2)) / 365
covariates = torch.stack([time], dim=-1)
forecaster = Forecaster(Model1(), data[:T1], covariates[:T1], learning_rate=0.1)
So far so good; now, I want to inspect the learned latent parameters stored in Paramstore. Seems there are more than one ways to do this; using the get_all_param_names() method:
for name in pyro.get_param_store().get_all_param_names():
print(name, pyro.param(name).data.numpy())
I get
AutoNormal.locs.bias [14.585433]
AutoNormal.scales.bias [0.00631594]
AutoNormal.locs.weight [0.11947815]
AutoNormal.scales.weight [0.00922901]
AutoNormal.locs.noise_scale [-2.0719821]
AutoNormal.scales.noise_scale [0.03469057]
But using the named_parameters() method:
pyro.get_param_store().named_parameters()
gives the same values for the location (locs) parameters, but different values for all scales ones:
dict_items([
('AutoNormal.locs.bias', Parameter containing: tensor([14.5854], requires_grad=True)),
('AutoNormal.scales.bias', Parameter containing: tensor([-5.0647], requires_grad=True)),
('AutoNormal.locs.weight', Parameter containing: tensor([0.1195], requires_grad=True)),
('AutoNormal.scales.weight', Parameter containing: tensor([-4.6854], requires_grad=True)),
('AutoNormal.locs.noise_scale', Parameter containing: tensor([-2.0720], requires_grad=True)),
('AutoNormal.scales.noise_scale', Parameter containing: tensor([-3.3613], requires_grad=True))
])
How is this possible? According to the documentation, Paramstore is a simple key-value store; and there are only these six keys in it:
pyro.get_param_store().get_all_param_names() # .keys() method gives identical result
# result
dict_keys([
'AutoNormal.locs.bias',
'AutoNormal.scales.bias',
'AutoNormal.locs.weight',
'AutoNormal.scales.weight',
'AutoNormal.locs.noise_scale',
'AutoNormal.scales.noise_scale'])
so, there is no way that one method access one set of items and the other a different one.
Am I missing something here?
pyro.param() returns transformed parameters in this case to the positive reals for scales.
Here is the situation, as revealed in the Github thread I opened in parallel with this question...
Paramstore is no more just a simple key-value store - it also performs constraint transformations; quoting a Pyro developer from the above link:
here's some historical background. The ParamStore was originally just a key-value store. Then we added support for constrained parameters; this introduced a new layer of separation between user-facing constrained values and internal unconstrained values. We created a new dict-like user-facing interface that exposed only constrained values, but to keep backwards compatibility with old code we kept the old interface around. The two interfaces are distinguished in the source files [...] but as you observe it looks like we forgot to mark the old interface as DEPRECATED.
I guess in clarifying docs we should:
clarify that the ParamStore is no longer a simple key-value store
but also performs constraint transforms;
mark all "old" style interface methods as DEPRECATED;
remove "old" style interface usage from examples and tutorials.
As a consequence, it turns out that, while pyro.param() returns the results in the constrained (user-facing) space, the older method named_parameters() returns the unconstrained (i.e. for internal use only) values, hence the apparent discrepancy.
It's not difficult to verify indeed that the scales values returned by the two methods above are related by a logarithmic transformation:
import numpy as np
items = list(pyro.get_param_store().named_parameters()) # unconstrained space
i = 0
for name in pyro.get_param_store().keys():
if 'scales' in name:
temp = np.log(
pyro.param(name).item() # constrained space
)
print(temp, items[i][1][0].item() , np.allclose(temp, items[i][1][0].item()))
i+=1
# result:
-5.027793402915326 -5.0277934074401855 True
-4.600319371162187 -4.6003193855285645 True
-3.3920585732532835 -3.3920586109161377 True
Why does this discrepancy affect only scales parameters? That's because scales (i.e. essentially variances) are by definition constrained to be positive; that doesn't hold for locs (i.e. means), which are not constrained, hence the two representations coincide for them.
As a result of the question above, a new bullet has now been added in the Paramstore documentation, giving a relevant hint:
in general parameters are associated with both constrained and unconstrained values. for example, under the hood a parameter that is constrained to be positive is represented as an unconstrained tensor in log space.
as well as in the documentation of the named_parameters() method of the old interface:
Note that, in the event the parameter is constrained, unconstrained_value is in the unconstrained space implicitly used by the constraint.
Related
I am looking for a way to describe the constraints of the Direct Collocation method in pydrake.
I got the robot model from my own URDF by using FindResource as this(l11-16).
Then, I tried to make some functions which calculate the positions of the joints as swing_foot_height(q) of this.
However there is a problem.
It is maybe a type error.
I defined q as following
robot = MultibodyPlant(time_step=0.0)
scene_graph = SceneGraph()
robot.RegisterAsSourceForSceneGraph(scene_graph)
file_name = FindResource("models/robot.urdf")
Parser(robot).AddModelFromFile(file_name)
robot.Finalize()
context = robot.CreateDefaultContext()
dircol = DirectCollocation(
robot,
context,
...(Omission)...
input_port_index=robot.get_actuation_input_port().get_index())
x = dircol.state()
nq = biped_robot.num_positions()
q = x[0:nq]
Then, I used this q for the function like swing_foot_height(q).
The error is like
SetPositions(): incompatible function arguments. The following argument types are supported:
...
q: numpy.ndarray[numpy.float64[m, 1]]
...
Invoked with:
...
array([Variable('x(0)', Continuous), ... Variable('x(9)', Continuous)],dtype=object)
Are there some way to avoid this error?
Right. In the compass gait notebook that you cited, there was an important line:
# overwrite MultibodyPlant with its autodiff copy
compass_gait = compass_gait.ToAutoDiffXd()
so that multibody plant that was being used in the constraint is actually an AutoDiffXd version of the plant.
The littledog notebook has more examples of this, with a more robust implementation that works for both float and autodiff constraint evaluations.
As far as I understand this, trajectory optimization with DirectCollocation converts the data type of the decision variables (in your case, x and q) to AutoDiffXd type. That is the type you're seeing here in the "Invoked with" error message. This is the type used for automatic differentiation which is used for finding the gradients for the optimization solver.
You'll need to convert back to float to use the SetPositions() function.
I want to create a custom aggregator where the state is the unique client state of each client. To initialize I can define client states as usual, and then use federated_collect to place #SERVER placement since thats what initialize_fn() wants. I can also do the same for creating new_state in the next_fn(). The problem is once I don't know how I can "broadcast" these states back into clients. Normally federated_broadcast takes say A#SERVER and then makes copies of it equal to number of clients. so for two clients it would be {A}#CLIENTS lets say (A A). What I want is to have AB#SERVER turning into (A B).
I am currently defining client states outside the aggregation process, and then passing these in run_one_round of iterative process. use federated_collect to collect these states from measurements of aggregator, and then unstack it outside. So from the outside of the federated computations it looks like
server_state, train_metrics , client_states, aggregation_state = iterative_process.next(
server_state, sampled_train_data, client_states, aggregation_state)
client_states = [x for x in client_states]
In TFF
output = aggregation_process.next(aggregation_state, client_outputs.weights_delta, client_states)
new_aggregation_state = output.state
round_model_delta = output.result
new_client_states = output.measurements
in aggregator
measurements = tff.federated_collect(new_client_states)
return tff.templates.MeasuredProcessOutput(
state=new_state, result= round_model_delta, measurements=measurements)
But I am trying to define and handle these client states completely inside the aggregator so that I can plug this aggregator into tff.learning.build_federated_averaging_process like
iterative_process = tff.learning.build_federated_averaging_process(
model_fn,
client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0),
model_update_aggregation_factory=my_aggregation_factory)
Is that possible? if so how?
tff.federated_collect is likely not the desired tool in this situation, and it will be removed in future versions of TFF (see commit #030a406).
Alternatively, A tff.federated_computation can both take #CLIENTS parameters as input, and return #CLIENT placed values as output. Instead of collecting all the values on the server first (implying that the system is communicating the states); it maybe best to leave the values on the clients.
When executing TFF in a simulation environment (e.g. invoking a tff.Computation in a Colab notebook) a T#CLIENT placed value will be returned as a list of T objects; one for each client. This can later be used as a parameter to future tff.Computation invocation.
Example:
#tff.tf_computation(tf.int32)
def sqrt(value):
return tf.math.sqrt(tf.cast(value, tf.float32))
#tff.federated_computation(tff.types.at_clients(tf.int32))
def federated_sqrt(values):
return tff.federated_map(sqrt, values)
client_values = [1,2,3,4]
federated_sqrt(client_values)
>>> [<tf.Tensor: shape=(), dtype=float32, numpy=1.0>,
<tf.Tensor: shape=(), dtype=float32, numpy=1.4142135>,
<tf.Tensor: shape=(), dtype=float32, numpy=1.7320508>,
<tf.Tensor: shape=(), dtype=float32, numpy=2.0>]
Important caveat: the order of inputs and outputs is not necessarily guaranteed to be the same. An example of how to index and track states across invocations can be found in the tensorflow_federated/python/examples/stateful_clients/ directory inside the repository.
this is a three part question on the use of the Python API to Z3 (Z3Py).
I thought I knew the difference between a constant and a variable but apparently not. I was thinking I could declare a sort and instantiate a variable of that sort as follows:
Node, (a1,a2,a3) = EnumSort('Node', ['a1','a2','a3'])
n1 = Node('n1') # c.f. x = Int('x')
But python throws an exception saying that you can't "call Node". The only thing that seems to work is to declare n1 a constant
Node, (a1,a2,a3) = EnumSort('Node', ['a1','a2','a3'])
n1 = Const('n1',Node)
but I'm baffled at this since I would think that a1,a2,a3 are the constants. Perhaps n1 is a symbolic constant, but how would I declare an actual variable?
How to create a constant set? I tried starting with an empty set and adding to it but that doesn't work
Node, (a1,a2,a3) = EnumSort('Node', ['a1','a2','a3'])
n1 = Const('n1',Node)
nodes = EmptySet(Node)
SetAdd(nodes, a1) #<-- want to create a set {a1}
solve([IsMember(n1,nodes)])
But this doesn't work Z3 returns no solution. On the other hand replacing the 3rd line with
nodes = Const('nodes',SetSort(Node))
is now too permissive, allowing Z3 to interpret nodes as any set of nodes that's needed to satisfy the formula. How do I create just the set {a1}?
Is there an easy way to create pairs, other than having to go through the datatype declaration which seems a bit cumbersome? eg
Edge = Datatype('Edge')
Edge.declare('pr', ('fst', Node), ('snd',Node))
Edge.create()
edge1 = Edge.pr(a1,a2)
Declaring Enums
Const is the right way to declare as you found out. It's a bit misleading indeed, but it is actually how all symbolic variables are created. For instance, you can say:
a = Const('a', IntSort())
and that would be equivalent to saying
a = Int('a')
It's just that the latter looks nicer, but in fact it's merely a function z3 folks defined that sort of does what the former does. If you like that syntax, you can do the following:
NodeSort, (a1,a2,a3) = EnumSort('Node', ['a1','a2','a3'])
def Node(nm):
return Const(nm, NodeSort)
Now you can say:
n1 = Node ('n1')
which is what you intended I suppose.
Inserting to sets
You're on the right track; but keep in mind that the function SetAdd does not modify the set argument. It just creates a new one. So, simply give it a name and use it like this:
emptyNodes = EmptySet(Node)
myNodes = SetAdd(emptyNodes, a1)
solve([IsMember(n1,myNodes)])
Or, you can simply substitute:
mySet = SetAdd(SetAdd(EmptySet(Node), a1), a2)
which would create the set {a1, a2}.
As a rule of thumb, the API tries to be always functional, i.e., no destructive updates to existing variables, but you instead create new values out of old.
Working with pairs
That's the only way. But nothing is stopping you from defining your own functions to simplify this task, just like we did with the Node function in the first part. After all, z3py is essentially Python library and z3 folks did a lot of work to make it nicer, but you also have the entire power of Python to simplify your life. In fact, many other interfaces to z3 from other languages (Scala, Haskell, O'Caml etc.) precisely do that to provide a much easier to work with API using the features of their respective host languages.
I have a SOAP node, that retrieve information from a URL in a tree structure.
Then i have a compute node to define each environment variable to each namespace variable of the SOAP retrieve.
And finally, i have a mapping node, to move the content to my message assembly structure in XML.
The error its giving me it's this (IN THE COMPUTE NODE):
I have a structure like this:
ListDocs
Description
DocType
ListTypes
Attribute
Lenght
Description
Nature
Required
ListDocs
Description
DocType
ListTypes
Attribute
Lenght
Description
Nature
Required
ListDocs
Description
DocType
ListTypes
Attribute
Lenght
Description
Nature
Required
The problem is that, when i do the definition of the variables, I do it like the code below, in the COMPUTE NODE:
WHILE I < InputRoot.SOAP.Body.ns:obterTiposDocProcessosResponse.ns:return.ns75:processo.ns75:listaTiposDocumentos
DO
SET Environment.Variables.XMLMessage.return.process.listDocs.description = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs.ns75:description;
SET Environment.Variables.XMLMessage.return.process.listDocs.tipoDocumento = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs.ns75:DocType;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.attribute = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs.ns75:listTypes.ns75:atribbute;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.lenght = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs.ns75:listTypes.ns75:lenght;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.description = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs.ns75:listTypes.ns75:description;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.nature = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs.ns75:listTypes.ns75:nature;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.required = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs.ns75:listTypes.ns75:required;
SET I = I+1;
END WHILE;
BUT, in my XML final structure, it only prints the values of my first listDocs, and i want to print all of my listDocs structures.
NOTE: WITH THE WHILE LIKE THIS, IT DOESN'T EVEN WORK. I HAVE TO REMOVE THE WHILE TO PRINT THE FIRST listDocs like i said Above.
Any help?
I NEED HELP TO LOOP THE STRUCTURES, WITH A WHILE OR SOMETHING.
You should try to use the following synthax :
DECLARE I INTEGER 1;
DECLARE J INTEGER;
J = CARDINALITY(InputRoot.SOAP.Body.ns:obterTiposDocProcessosResponse.ns:return.ns75:processo.ns75:listaTiposDocumentos[])
WHILE I <= J DO
SET Environment.Variables.XMLMessage.return.process.listDocs.description = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:description;
....
END WHILE;
You only missed the CARDINALITY function to get the number of elements, and also the [] to define the table, and then using this [I] while accessing the elements
Note : in my sample above, the environment will be overridden at each iteration of the loop, so only the last record will be printed. You can use the [I] in the output as well if you want to construct a table in output, or you can use the following code to push each message to the output terminal (this means you have one message in input, and 3 message coming out of the output terminal)
PROPAGATE TO TERMINAL 'Out';
So for example, based on your code, if you want to generate 3 messages based on your input containing multiple element :
DECLARE I INTEGER 1;
DECLARE J INTEGER;
J = CARDINALITY(InputRoot.SOAP.Body.ns:obterTiposDocProcessosResponse.ns:return.ns75:processo.ns75:listaTiposDocumentos[])
WHILE I <= J DO
SET Environment.Variables.XMLMessage.return.process.listDocs.description = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:description;
SET Environment.Variables.XMLMessage.return.process.listDocs.tipoDocumento = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:DocType;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.attribute = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:listTypes.ns75:atribbute;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.lenght = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:listTypes.ns75:lenght;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.description = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:listTypes.ns75:description;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.nature = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:listTypes.ns75:nature;
SET Environment.Variables.XMLMessage.return.process.listDocs.listTypes.required = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:listTypes.ns75:required;
PROPAGATE TO TERMINAL 'Out';
END WHILE;
RETURN FALSE;
For your global information, the RETURN TRUE is the instruction "pushing" the message built in the ESQL code to the output terminal. If you use PROPAGATE instruction (same effect), you should RETURN FALSE to avoid sending an empty message after looping on your records. Another way to do it is to propagate on another terminal (i.e : 'out1'), and keep the return true. In this case, you would have all you records coming out from the out1 terminal, and a message going out of the output temrinal (due to the return true) once all the messages have been propagated (this might be useful in many situations)
So the key to understanding IIB and ESQL is that you are looking at in memory Trees built from nodes.
Each Node has pointers/REFERENCEs to PARENT, NEXTSIBLING, PREVSIBLING, FIRSTCHILD and LASTCHILD Nodes.
Nodes also have FIELDNAME, FIELDNAMESPACE, FIELDTYPE and FIELDVALUE attributes.
And last but not least that you are building Output Trees by navigating Input Trees. The Environment Tree, which you are using, is a special long lasting Tree that you can both read from and write to.
So in your code InputRoot.SOAP.Body.ns75:processo.ns75:listDocs can be thought of as shorthand for instructions to navigate to the ns75:listDocs Node. The dots '.' tell ESQL interpreter the name of the child Node of the current Node. If you were telling someone how to navigate the Nodes it would go something like this.
Start at InputRoot. InputRoot is a special Node that is automatically available to you in your ESQL Modules code.
Navigate to the first child Node of InputRoot that has the name SOAP
Navigate to the first child Node of SOAP that has the name Body
Navigate to the first child Node of Body that has the name listDocs and is in the ns75 namespace.
In the absence of a subscript ESQL assumes you want the first Node that matches the specified name ns75:listDocs and ns75:listDocs[1] both refer to the same Node.
This explains what was happening in your code. You were always navigating to the same listDocs[1] node in the InputRoot and Environment Trees.
#Jerem's code improves on what you were doing by at least navigating across the listDocs nodes in the Input tree.
For each iteration of the loop the subscript [I] gets incremented and thus it chooses a different listDocs Node. The listDocs Nodes are siblings and thus the code will access the first, second and third instance of the listDocs Nodes.
InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[1] <-- Iteration I=1
InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[2] <-- Iteration I=2
InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[3] <-- Iteration I=3
To correct #Jerem's answer you'd need to use subscripts on the lefthand side of the statement as well. Picking the description field as an example you'd need to change your code as follows.
SET Environment.Variables.XMLMessage.return.process.listDocs[I].listTypes.description = InputRoot.SOAP.Body.ns75:processo.ns75:listDocs[I].ns75:listTypes.ns75:description;
Using subscripts is regarded as a performance no no. Imagine you had 10,000 listDocs this would result in each and every iteration of the loop walking down the tree over the InputRoot, SOAP, Body, ns75:processo Nodes and then across the listDocs sibling nodes until it found the ns75:listDocs[I] Node.
This means by the time we get round to processing ns75:listDocs[10000] it will have had to repetetively walked over all the other listDocs Nodes time and time again, In fact we can calculate it would have walked over (4 x 10,000) + ((10,000 x (10,000 + 1)) / 2) = 50,045,000 Nodes
So it's REFERENCE's to the rescue and also the answer to your question. Try a loop like this.
DECLARE ns75 NAMESPACE 'http://something.or.other.from.your.wsdl';
DECLARE InListDocsRef REFERENCE TO
InputRoot.SOAP.Body.ns75:processo.ns75:listDocs;
WHILE LASTMOVE(InListDocsRef) DO
DECLARE EnvListDocsRef REFERENCE TO Environment;
CREATE LASTCHILD OF Environment.Variables.XMLMessage.return.process AS EnvListDocsRef NAME 'listDocs';
SET EnvListDocsRef.description = InListDocsRef.ns75:description;
SET EnvListDocsRef.tipoDocumento = InListDocsRef.ns75:DocType;
SET EnvListDocsRef.listTypes.attribute = InListDocsRef.ns75:listTypes.ns75:atribbute;
SET EnvListDocsRef.listTypes.lenght = InListDocsRef.ns75:listTypes.ns75:lenght;
SET EnvListDocsRef.listTypes.description = InListDocsRef.ns75:listTypes.ns75:description;
SET EnvListDocsRef.listTypes.nature = InListDocsRef.ns75:listTypes.ns75:nature;
SET EnvListDocsRef.listTypes.required = InListDocsRef.ns75:listTypes.ns75:required;
MOVE InListDocsRef NEXTSIBLING REPEAT NAME;
END WHILE;
The code above only walks over 4 + 10,000 Nodes i.e. 10 thousand Nodes vs 50 million Nodes.
A couple of other useful things to know about setting references are:
To point to the last element you can use a subscript of [<]. So to point to the last ListItem in the aggregate MyList you would code Environment.MyList.ListItem[<]
You can use an asterisk * to set a reference to an element in the tree that you don't know the name of e.g. Environment.MyAggregate.* points to the first child of MyAggregate regardless of it's name.
You can also use asterisks * to choose an element irregardless of it's namespace InListDocsRef.*:listTypes.*:description
For anonymous namespaced elements use *:* but be very careful * and *:* are not the same thing the first means no namespace any element and the second means any namespace any element.
To process lists in reverse combine the [<] subscript with the PREVIOUSSIBLING option of MOVE.
So a chunk of code for reversing a list might go something like:
DECLARE MyReverseListItemWalkingRef REFERENCE TO Environment.MyList.ListItem[<];
WHILE LASTMOVE(MyReverseListItemWalkingRef) DO
CREATE LASTCHILD OF OuputRoot.ReversedList.Item NAME 'Description' VALUE MyReverseListItemWalkingRef.Desc;
MOVE MyReverseListItemWalkingRef PREVIOUSSIBLING REPEAT NAME;
END WHILE;
Learn how to use REFERENCES they are extremely powerful and one of your simplest options when it comes to performance.
When creating a graph of calculations using delayed I'm trying to assign names so that if I visualize the graph it's readable. However, for delayed variables that are dependent on functions the name parameter doesn't seem to affect the key. Here's a toy example:
def calc_avg(a, b):
return pd.concat([a, b], axis=1).mean(axis=1)
def calc_ratio(a, b):
return a / b
a = delayed(pd.Series(np.random.rand(10)), name='a')
b = delayed(pd.Series(np.random.rand(10)), name='b')
c = delayed(pd.Series(np.random.rand(10)), name='c')
x = delayed(calc_avg, name='avg_result')(a,b)
y = delayed(calc_ratio, name='ratio_result')(x,c)
y.visualize()
You can see the visualization here (I can't embed images), but rather than seeing 'avg_result' I see 'calc_avg-#0' and rather than see 'ratio_result' I see 'calc_ratio-#1'. If I look at x.key or y.key they do not match the names that I provided. Is this the expected behavior?
The key of a dask result needs to be unique for every combination of the function that was delayed, and the inputs you give it. What you see above is the expected behaviour: you are naming the function, but a call with different inputs would expect a different output, so the key must be different.
You can specify the key you'd like associated not when you define the delayed function, but when you call it:
x = delayed(calc_avg)(a, b, dask_key_name='avg_result')
y = delayed(calc_ratio)(x, c, dask_key_name='ratio_result')