Understanding State Groups in Context in Drake - drake

I've created a diagram with an LQR controller, a MultibodyPlant, a SceneGraph, and a PlanarSceneGraphVisualizer.
While trying to run this simulation, I set random initial conditions using context.SetDiscreteState(randInitState). However, this gives me the following error:
RuntimeError: Context::SetDiscreteState(): expected exactly 1 discrete state group but there were 2 groups. Use the other signature if you have multiple groups.
Indeed, when I check the number of groups using context.num_discrete_state_groups(), it returns 2. So I have to specify the group index when setting the state, using context.SetDiscreteState(0, randInitState). This works, but I don't know exactly why. I understand that I have to select the correct group to set the state for, but what exactly is a group here? In the cartpole example given here, the context was set using context.SetContinuousState(UprightState() + 0.1 * np.random.randn(4,)) without specifying any group.
Are groups only valid for discrete systems? The Context documentation talks about groups but doesn't define them.
Is there a place that defines what a group is when setting up a Drake simulation with multiple systems inside a diagram, and how can I check the group index of a system?

We would typically recommend that you use a workflow that sets the context using a subsystem interface. E.g.
plant_context = plant.GetMyMutableContextFromRoot(context)
plant_context.SetContinuousState(...)
Figuring out the discrete index of a state group for a DiagramContext might be possible, but it's certainly not typical.
You might find it helpful to print the context. In pydrake, you can actually just call print(context), and you will see the different elements and where they are coming from.
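To see why the one-argument overload complains, here is a pure-Python toy model (not pydrake; every class and method name below is hypothetical) of the idea that a diagram-level context exposes the discrete state groups of all its subsystems as one flat, numbered list. The split of one group from "plant" and one from "other_system" is only an illustrative assumption; print(context) on your real diagram shows which subsystem actually owns each group.

```python
class ToyLeafContext:
    """Hypothetical stand-in for one subsystem's context."""

    def __init__(self, name, groups):
        self.name = name
        # Each group is a fixed-size list of discrete state values.
        self.groups = [list(g) for g in groups]


class ToyDiagramContext:
    """Hypothetical stand-in for a diagram context that flattens subsystem groups."""

    def __init__(self, subcontexts):
        # Global group index -> (owning subcontext, group index within that subsystem).
        self._index = [(ctx, i) for ctx in subcontexts for i in range(len(ctx.groups))]

    def num_discrete_state_groups(self):
        return len(self._index)

    def describe(self):
        # Answers "which subsystem owns global group i?"
        return [(i, ctx.name, local) for i, (ctx, local) in enumerate(self._index)]

    def SetDiscreteState(self, *args):
        if len(args) == 1:
            # The one-argument form is only unambiguous when there is a single group.
            n = self.num_discrete_state_groups()
            if n != 1:
                raise RuntimeError(
                    f"expected exactly 1 discrete state group but there were {n} groups")
            group_index, value = 0, args[0]
        else:
            group_index, value = args
        ctx, local = self._index[group_index]
        if len(value) != len(ctx.groups[local]):
            raise ValueError(f"size mismatch for group {group_index}")
        ctx.groups[local] = list(value)


plant = ToyLeafContext("plant", [[0.0] * 4])
other = ToyLeafContext("other_system", [[0.0] * 2])
diagram = ToyDiagramContext([plant, other])
```

In this model, writing through the subsystem's own context (the GetMyMutableContextFromRoot workflow above) sidesteps the flattened index entirely, which is why that route is the more robust one.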

Related

Can I assume the running order of LeafSystem's CalcOutput function?

I am working on a LeafSystem like this:
from pydrake.systems.framework import BasicVector, LeafSystem

class exampleLeafSystem(LeafSystem):
    def __init__(self, plant):
        LeafSystem.__init__(self)  # pydrake requires the base-class constructor call
        self._plant = plant
        self._plant_context = plant.CreateDefaultContext()
        self.DeclareVectorInputPort("q_v", BasicVector(6))
        self.DeclareVectorOutputPort("tau", BasicVector(3), self.TauCalcOutput)
        self.DeclareVectorOutputPort("xe", BasicVector(3), self.xeCalcOutput)

    def TauCalcOutput(self, context, output):
        q_v = self.get_input_port(0).Eval(context)
        self._plant.SetPositionsAndVelocities(self._plant_context, q_v)
        # Do some calculation with self._plant and self._plant_context to get the output
        output.SetFromVector(tau)

    def xeCalcOutput(self, context, output):
        q_v = self.get_input_port(0).Eval(context)
        self._plant.SetPositionsAndVelocities(self._plant_context, q_v)
        # Do some calculation with self._plant and self._plant_context to get the output
        output.SetFromVector(xe)
In the two methods TauCalcOutput and xeCalcOutput, I first need to update the MultibodyPlant's state and then do the calculation to compute the output. However, since I do not know the order in which the two CalcOutput methods are called, for both methods to use the newest state information I have to write
q_v = self.get_input_port(0).Eval(context)
self._plant.SetPositionsAndVelocities(self._plant_context, q_v)
in both methods, which seems a bit redundant. If I could assume an order in which the CalcOutput methods run, for example that TauCalcOutput always runs first, then I would only need
q_v = self.get_input_port(0).Eval(context)
self._plant.SetPositionsAndVelocities(self._plant_context, q_v)
in the TauCalcOutput method, without worrying that the xeCalcOutput method would use MultibodyPlant state information that is one step stale.
So my question is: is there a specific order in which the CalcOutput methods are called?
No. The contract is that output ports can be evaluated in any order at any time, and their calc methods are only called when the port is evaluated (e.g. by a downstream system). The order in which they are called depends on the other systems in the Diagram; they might not all get called during a single simulation step (e.g. if one output is consumed by a discrete-time system with time step 0.1 and another by one with time step 0.2), or they may not be called at all if they are not connected. Users can even call get_output_port().Eval() manually.
For a general approach to avoiding duplicate computation in output ports, you should store the result of the shared computation in the Context, either as state or as a "cache entry". See DeclareCacheEntry for more details.
For this workflow specifically, perhaps the simplest solution is to check whether the positions and velocities are already set to the same values, to avoid repeating the kinematics evaluation, as in this example:
https://github.com/RobotLocomotion/drake/blob/6e6e37ffa677362245773f13c0628f0042b47414/multibody/inverse_kinematics/kinematic_constraint_utilities.cc#L47-L54
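The check-before-set idea can be sketched in pure Python as a small guard that only redoes the expensive state update when the incoming q_v differs from the last one it applied. The class, the callbacks, and the `applied` list are hypothetical stand-ins; in pydrake the analogous guard would compare plant.GetPositionsAndVelocities(self._plant_context) with q_v before calling SetPositionsAndVelocities. Either CalcOutput method can call the guard, so the evaluation order stops mattering.

```python
class StateUpdateGuard:
    """Hypothetical helper: skip a costly state update if the input hasn't changed."""

    def __init__(self, apply_update):
        self._apply_update = apply_update  # e.g. sets positions/velocities on a plant
        self._last = None
        self.num_updates = 0

    def ensure(self, q_v):
        key = tuple(q_v)  # exact element-wise comparison with the last applied input
        if key != self._last:
            self._apply_update(key)
            self._last = key
            self.num_updates += 1


applied = []  # records every state update that was actually performed
guard = StateUpdateGuard(applied.append)


def tau_calc_output(q_v):
    guard.ensure(q_v)
    return applied[-1]  # stand-in for the real tau computation


def xe_calc_output(q_v):
    guard.ensure(q_v)
    return applied[-1]  # stand-in for the real xe computation
```

Calling the two outputs in either order with the same input performs the update once; a new input triggers exactly one more update.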

Updating LeafSystem discrete state before publishing output

I have a LeafSystem (controller) with two output ports, each of which depend on the solution to the same MathematicalProgram. My initial idea was to solve the program and store the solution as a discrete state which the output port callbacks can access and copy appropriately.
My interpretation of the documentation (https://drake.mit.edu/doxygen_cxx/group__discrete__systems.html), and what I see when implementing this, is that the output callbacks see the discrete state from before the PerStepDiscreteUpdateEvent is applied.
Now for my questions -
Is this behavior that I've described above consistent with how the Simulator handles update events or am I missing something there?
Is there a way to update the discrete state before the output calculation and have the updated state be used in the output?
Is there a different design that would be more appropriate here?
The simple solution to your problem is a cache entry.
Declare a cache entry whose calc function does your MathematicalProgram work and stores the results. When either output port is evaluated, its calc function "Eval"s the cache entry and draws whatever data it needs from the stored result. Then, no matter which port is evaluated first, the second one benefits from the pre-computation (until something the cache entry depends on changes and it is recomputed).
You can look at the cache entry notes for more detail.
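Here is a pure-Python toy of the cache-entry pattern, assuming the expensive solve depends on a single input value. The cached value lives in the context and is discarded whenever that input changes, and both output-port callbacks go through the same eval. ToyContext, ToyCacheEntry, ToyController, and the fake solve are hypothetical stand-ins; in Drake itself you would use DeclareCacheEntry and let the framework track the dependencies.

```python
class ToyContext:
    """Hypothetical context: holds one input value plus cached results."""

    def __init__(self, u):
        self._u = u
        self.cache = {}  # cache-entry name -> stored value

    @property
    def u(self):
        return self._u

    def set_input(self, u):
        self._u = u
        self.cache.clear()  # anything computed from the old input is now stale


class ToyCacheEntry:
    def __init__(self, name, calc):
        self._name = name
        self._calc = calc

    def eval(self, context):
        # Compute on first use for the current input, then reuse the stored value.
        if self._name not in context.cache:
            context.cache[self._name] = self._calc(context)
        return context.cache[self._name]


class ToyController:
    def __init__(self):
        self.num_solves = 0
        self._solution = ToyCacheEntry("solution", self._solve)

    def _solve(self, context):
        # Stand-in for building and solving the MathematicalProgram.
        self.num_solves += 1
        return {"a": 2 * context.u, "b": context.u + 1}

    def calc_output_a(self, context):
        return self._solution.eval(context)["a"]

    def calc_output_b(self, context):
        return self._solution.eval(context)["b"]
```

Because the result is stored per context and invalidated when its prerequisite changes, it doesn't matter which port is evaluated first, and no discrete update event is needed just to share the solution between ports.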

Question about SPSS Modeler (there is an obstacle to making the stream run automatically)

I have an SPSS Modeler stream that is used and updated every week to generate a certain dataset. The raw data for this stream is also renewed weekly.
Part of this stream is a chain of nodes that has to be modified and updated manually every week. The sequence is: Type Node => Restructure Node => Aggregate Node
To simplify the explanation of these nodes' roles, I drew an image of them below.
Because the raw data changes every week, the range of Unit values varies: sometimes more than 6 (maybe 100), sometimes fewer than 6 (maybe 3). That is why, until now, somebody has had to modify and update that chain of nodes every week. *The Unit value has an upper limit (300 for now).
Now, however, we want this stream to run automatically, without any manual customization of those nodes, and still work correctly. Any help would be appreciated, thanks!
To automate this, I suggest using a Global node combined with a CLEM script in the stream's Execution tab (the default script). I have a stream that calculates the first and last dates, and those variables are used to rename files at the end of execution. I think you could use something similar, as follows:
1) Create Derive nodes that compute the unit values used in the weekly stream.
2) Save this information in a table named 'count_variable'.
3) Use a Global node named Global with an expression similar to this:
@GLOBAL_MAX(variable created in (2)) (this only records the number of variables: step 2 created a table with a single value, so GLOBAL_MAX just returns the number of variables).
4) The script in the Execution tab will be similar to this:
execute count_variable
var tabledata
var fn
set tabledata = count_variable.output
set count_variable = value tabledata at 1 1
execute Global
5) You can now use the number of variables simply by referencing the already created "count_variable".
It's not easy to explain just by typing, but I hope this helps.
Please upvote this answer if it was relevant.
I think there is a better, simpler, and more effective (yet risky, due to the node's requirements on the input data) solution to your problem: the Transpose node, which does exactly that, pivoting your table. It is only available from version 18.1 on. Here's an example:
https://developer.ibm.com/answers/questions/389161/how-does-new-feature-partial-transpose-work-in-sps/
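Whichever SPSS route you take, the underlying operation is a restructure plus aggregate whose set of output columns is derived from this week's data instead of being hard-coded. Here is a plain-Python sketch of that idea, not SPSS code. It assumes hypothetical field names id, unit, and value, integer units starting at 1, and a sum aggregation, since the original image with the real fields isn't available; the max() call plays the role GLOBAL_MAX plays in the first answer.

```python
from collections import defaultdict

MAX_UNIT = 300  # upper limit on the Unit value mentioned in the question


def pivot_by_unit(rows):
    """Restructure (one column per Unit) and aggregate (sum per id) in one pass."""
    # Derive the number of columns from the data, like a GLOBAL_MAX over Unit.
    max_unit = max(row["unit"] for row in rows)
    if max_unit > MAX_UNIT:
        raise ValueError(f"unit {max_unit} exceeds the limit of {MAX_UNIT}")
    columns = [f"value_{u}" for u in range(1, max_unit + 1)]

    # Every id gets every column, defaulting to 0 when a unit never appears for it.
    wide = defaultdict(lambda: dict.fromkeys(columns, 0))
    for row in rows:
        wide[row["id"]][f"value_{row['unit']}"] += row["value"]
    return columns, dict(wide)
```

A week with units up to 3 yields three columns; a week with units up to 100 yields a hundred, with no manual edits in between.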

Stream de-duplication on Dataflow | Running services on Dataflow services

I want to de-dupe a stream of data based on an ID in a windowed fashion. Each record in the stream carries a set of IDs, and we want to remove records with matching IDs within N-hour time windows. A straightforward approach is to use an external key-value store (Bigtable or something similar) where we look up keys and write them if required, but our QPS is so large that maintaining such a service would be pretty hard. The alternative approach I came up with was to group by key within a time window, so that all data for a user within a window falls into the same group, and then, in each group, use a separate key-store service to look up duplicates by key. So I have a few questions about this approach:
[1] If I run a GroupBy transform, is there any guarantee that each group will be processed on the same worker? If so, we can group by the userid and then, within each group, compare the sessionid for each user.
[2] If that is feasible, my next question is whether we can run other services on each of the worker machines that run the job. In the example above, I would like a local Redis instance that each group can use to look up or write an ID.
The idea seems to go against what Dataflow is designed for, but I believe such use cases should be common, so if there is a better model for this problem, I'd like to hear it. We essentially want to avoid external lookups as much as possible, given the volume of data we have.
1) In the Dataflow model, there is no guarantee that the same machine will see all the groups across windows for the key. Imagine that a VM dies or new VMs are added and work is split across them for scaling.
2) You're welcome to run other services on the Dataflow VMs, since they are general-purpose machines, but note that you will have to contend with the resource requirements of the other applications on the host, which could cause out-of-memory issues.
Note that you may want to take a look at RemoveDuplicates and use it if it fits your use case.
It also seems like you might want to be using session windows to dedupe elements. You would call:
PCollection<T> pc = ...;
PCollection<T> windowed_pc = pc.apply(
    Window.<T>into(Sessions.withGapDuration(Duration.standardHours(N))));
Each new element keeps extending the length of the window, so the session won't close until the gap elapses with no new element for that key. You can also apply an AfterWatermark trigger with an early firing after a count of 1 (AfterPane.elementCountAtLeast(1)) ahead of a downstream GroupByKey. The trigger would then fire as soon as it could, which would be once it has seen at least one element, and then once more when the session closes. After the GroupByKey, you would have a DoFn that filters out any pane that isn't an early firing, based upon the pane information ([3], [4]).
DoFn(T -> KV<session key, T>)
|
\|/
Window.into(Session window)
|
\|/
Group by key
|
\|/
DoFn(Filter based upon pane information)
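To check the intended semantics on a small sample, here is a plain-Python batch simulation (not Dataflow code; the function name is hypothetical) of per-key session windows with a given gap, keeping only the first element of each session, which is what the filtering DoFn is meant to pass through. It assumes every element is visible before any decision is made, so it models the final merged sessions rather than the behavior of individual early firings in a streaming run.

```python
from collections import defaultdict


def first_per_session(events, gap_seconds):
    """events: iterable of (key, event_time_seconds).

    Each element opens a window [t, t + gap); overlapping windows for the same
    key merge into one session. Returns the first element of each session.
    """
    times_by_key = defaultdict(list)
    for key, t in events:
        times_by_key[key].append(t)

    kept = []
    for key, times in times_by_key.items():
        session_end = None
        for t in sorted(times):
            if session_end is None or t >= session_end:
                kept.append((key, t))  # gap elapsed: this element starts a new session
            # else: t falls inside the current session, so it is a duplicate
            session_end = t + gap_seconds  # every element extends its session
    return sorted(kept, key=lambda kt: (kt[1], kt[0]))
```

For key "a" with events at 0, 5, 14 and 30 and a gap of 10, the first three chain into one session and only the events at 0 and 30 survive.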
It is somewhat unclear from your description; can you provide more details?
Sorry for not being clear. I gave the setup you mentioned a try, except for the early and late firings part, and it is working on smaller samples. I have a couple of follow up questions, related to scaling this up. Also, I was hoping I could give you more information on what the exact scenario is.
So, we have an incoming data stream, each item of which can be uniquely identified by its fields. We also know that duplicates occur pretty far apart, and for now we care about those within a 6-hour window. Regarding the volume of data, we have at least 100K events every second, spanning a million different users, so within this 6-hour window we could get a few billion events into the pipeline.
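For scale, the stated rate over one window works out as follows, taking the 100K events per second as a steady rate:

```latex
10^{5}\ \tfrac{\text{events}}{\text{s}} \times 6\ \text{h} \times 3600\ \tfrac{\text{s}}{\text{h}}
  = 10^{5} \times 21\,600\ \text{events}
  = 2.16 \times 10^{9}\ \text{events}
```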
Given this background, my questions are:
[1] For the sessioning to happen by key, I should run it on something like
PCollection<KV<key, T>> windowed_pc = pc.apply(
    Window.<KV<key, T>>into(Sessions.withGapDuration(Duration.standardHours(6))));
where key is a combination of the 3 IDs I mentioned earlier. Based on the definition of Sessions, only if I run it on this KV would I be able to manage sessions per key. This would mean that Dataflow has a very large number of open sessions at any given time, waiting for them to close, and I was worried whether it would scale or run into bottlenecks.
[2] Once I perform sessioning as above, I have already removed the duplicates based on the firings, since I only care about the first firing in each session, which already eliminates duplicates. So I no longer need the RemoveDuplicates transform, which I found is a composition of the WithKeys, Combine.PerKey, and Values transforms, in that order, essentially performing the same operation. Is this the right assumption to make?
[3] If the solution in [1] is going to be a problem, the alternative is to reduce the sessioning key to just user-id and session-id, ignoring the sequence-id, and then run RemoveDuplicates by sequence-id on each resulting window. This might reduce the number of open sessions but would still leave a lot of them (#users * #sessions per user), which can easily run into millions. FWIW, I don't think we can session by user-id alone, since then a session might never close, as different sessions for the same user could keep coming in, and determining the session gap in that scenario becomes infeasible.
Hope my problem is a little clearer this time. Please let me know whether any of my approaches make the best use of Dataflow, or if I am missing something.
Thanks
I tried out this solution at a larger scale and as long as I provide sufficient workers and disks, the pipeline scales well although I am seeing a different problem now.
After this sessionization, I run a Combine.perKey on the key and then a ParDo that looks at c.pane().getTiming() and rejects anything other than an EARLY firing. I tried counting both EARLY and ON_TIME firings in this ParDo, and it looks like the on-time panes are actually deduped more precisely than the early ones. That is, the early firings still contain some duplicates, whereas there are fewer on-time firings, with more duplicates removed. Is there any reason this could happen? Also, is my approach to deduping with a Combine + ParDo the right one, or could I do something better?
events.apply(
    WithKeys.<String, EventInfo>of(new SerializableFunction<EventInfo, String>() {
        @Override
        public String apply(EventInfo input) {
            return input.getUniqueKey();
        }
    })
)
.apply(
    Window.named("sessioner").<KV<String, EventInfo>>into(
        Sessions.withGapDuration(mSessionGap)
    )
    .triggering(
        AfterWatermark.pastEndOfWindow()
            .withEarlyFirings(AfterPane.elementCountAtLeast(1))
    )
    .withAllowedLateness(Duration.ZERO)
    .accumulatingFiredPanes()
);

How do systems typically map a 997 or 999 acknowledgment back to the originating ISA?

The implementation guides (and most web resources I can find) describe the GS06 and ST02 Control Numbers as being unique only within the Interchange they are contained in. So when we build our GS and ST segments we just start the control numbers at 1 and increment as we add more Functional Groups and/or Transaction Sets. The ISA13 control numbers we generate are always unique.
The dilemma is that when we receive a 999 acknowledgment, it does not include any reference to the ISA control number it's responding to, so we have no way to find the correct originating Functional Group in our records.
This seems like a problem that anyone receiving functional acknowledgements would face, but clearly lots of systems and companies handle it, so what is the typical practice to reconcile 997s or 999s? I think we must be missing something in our reading of the guides.
GS06 and ST02 only have to be unique within the interchange, but if you use an ID that's truly unique for each one (not just within the message), then you can skip right to the proper transaction set or functional group, not just the right message.
I typically have GS start at 1 and increment the same way that you do, but the ST02 I keep unique (to the extent allowed by the 9 character limit).
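A minimal sketch of keeping ST02 unique across interchanges rather than restarting at 1 in each one: a single counter that wraps only after it runs out of the 9-character space. The class name is hypothetical, and a real system would persist the counter (and guard it against concurrent use) between runs; the zero-padding to four digits assumes the 4 to 9 character length of ST02.

```python
class St02Counter:
    """Hypothetical generator of ST02 values that stay unique across interchanges."""

    MAX_VALUE = 999_999_999  # largest value that fits in 9 characters

    def __init__(self, next_value=1):
        # In production this would be loaded from, and saved to, durable storage.
        self._next = next_value

    def next_st02(self):
        value = self._next
        # Wrap around only after exhausting the 9-digit space.
        self._next = 1 if value >= self.MAX_VALUE else value + 1
        return f"{value:04d}"  # pad to the 4-character minimum length
```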
GS06 is supposed to be globally unique, not only within the interchange. This is from X12.6:
In order to provide sufficient discrimination for the acknowledgment
process to operate reliably and to ensure that audit trails are
unambiguous, the combination of Functional ID Code (GS01), Application
Sender's ID (GS02), Application Receiver's ID (GS03), and Functional
Group Control Numbers (GS06, GE02) shall by themselves be unique
within a reasonably extended time frame whose boundaries shall be
defined by trading partner agreement. Because at some point it may be
necessary to reuse a sequence of control numbers, the Functional Group
Date and Time may serve as an additional discriminant only to
differentiate functional group identity over the longest possible time
frame.
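Following the rule quoted above, you can key your outbound records on (GS01, GS02, GS03, GS06) and match an incoming 997/999 through its AK1 segment, where AK101 echoes the original GS01 and AK102 echoes the original GS06. Below is a minimal sketch with hypothetical class and method names. It assumes the acknowledgment's GS02/GS03 are your original GS03/GS02 swapped (confirm that against your trading partner agreement), and it compares GS06/AK102 as integers so leading zeros don't cause misses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupKey:
    functional_id: str   # GS01 of the outbound group (echoed back in AK101)
    sender_id: str       # GS02 of the outbound group
    receiver_id: str     # GS03 of the outbound group
    control_number: int  # GS06 of the outbound group (echoed back in AK102)


class OutboundGroupRegistry:
    """Hypothetical lookup from functional-group identity to the sending ISA13."""

    def __init__(self):
        self._isa13_by_group = {}

    def record(self, gs01, gs02, gs03, gs06, isa13):
        key = GroupKey(gs01, gs02, gs03, int(gs06))
        if key in self._isa13_by_group:
            # X12.6 expects this combination to be unique within the agreed time frame.
            raise ValueError(f"duplicate functional group identity: {key}")
        self._isa13_by_group[key] = isa13

    def match_ack(self, ack_gs02, ack_gs03, ak101, ak102):
        # The ack travels the other way, so its sender is our original receiver.
        key = GroupKey(ak101, ack_gs03, ack_gs02, int(ak102))
        return self._isa13_by_group.get(key)
```

Restarting GS06 at 1 per interchange only works with this scheme while the combination stays unique; two groups with GS06 = 1 sent to the same partner would collide, which is exactly what X12.6 rules out.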
