Canonical term for "simultaneous data-latching" - buffer

I have a module (in my case on an FPGA) where several input values (registers) are updated sequentially (if at all), but are all copied in parallel in a single, atomic step to guarantee coherency during the following data processing steps.
Is there a common term for this?
"Double-Buffering" comes close, but usually refers to two buffers in parallel that are swapped, while I'm actually copying them "forward".
"Simultaneous data-latching" is what I'm using at the moment, but it sounds unfamiliar...
Illustration
           store_1            store_all             do_stuff
              |                   |                     |
            +-+-+               +-+-+                 +-+-+
value_1 --> |Reg| --- int_1 --> |Reg| --> input_1 --> | M |
            +---+               +---+                 | o |
                                                      | d |
           store_2                                    | u | --> outputs
              |                                       | l |
            +-+-+               +-+-+                 | e |
value_2 --> |Reg| --- int_2 --> |Reg| --> input_2 --> |   |
            +---+               +---+                 +---+

              __  ____________________
value_1       __><____________________
store_1       ______|_________________
              _______  _______________
input_1       _______><_______________
              __________  ____________
value_2       __________><____________
store_2       ______________|_________
              _______________  _______
input_2       _______________><_______
store_all     ____________________|___
              _____________________  _
out_1         _____________________><_
              _____________________  _
out_2         _____________________><_
do_stuff      _______________________|
Here store_1, store_2 and store_all are write-enable signals for their registers; all registers are synchronously clocked with the same clock.

Looks like basic shift-registers to me.
This type of arrangement might be called retiming registers if their presence is purely to help ease the routing pressure to 'Module'. Or perhaps pipelining registers if they are inserted to maintain some specific alignment between 'value_1/value_2' and 'outputs', say if the latency through Module changed at some point.
If the control signals store_1 / store_2 / store_all are in fact clocks, they may be called synchronisation stages, although the design has problems in this case.
Your question mixes words that are tangentially related to digital logic timing, without being specific.

Related

Which Starspace training mode to use for multi-level embeddings

I am using the StarSpace embedding framework for the first time and am unclear on the "modes" that it provides for training and the differences between them.
The options are:
wordspace
sentencespace
articlespace
tagspace
docspace
pagespace
entityrelationspace/graphspace
Let's say I have a dataset that looks like this:
| Author | City | Tweet_ID | Tweet_contents |
|:-------|:-------|:----------|:-----------------------------------|
| A | NYC | 1 | "This is usually a short sentence" |
| A | LONDON | 2 | "Another short sentence" |
| B | PARIS | 3 | "Check out this cool track" |
| B | BERLIN | 4 | "I like turtles" |
| C | PARIS | 5 | "It was a dark and stormy night" |
| ... | ... | ... | ... |
(In reality, my dataset is not language data and looks nothing like this, but this example demonstrates the point well enough.)
I would like to simultaneously create embeddings from scratch (not using pre-existing embeddings at any point) for each of the following:
Authors
Cities
Tweets/Sentences/Documents (e.g. 1, 2, 3, 4, 5, etc.)
Words (e.g. 'This', 'is', 'usually', ..., 'stormy', 'night', etc.)
Even after reading the documentation, it doesn't seem clear which 'mode' of StarSpace training I should be using.
If anyone could help me understand how to interpret the modes to help select the appropriate one, that would be much appreciated.
I would also like to know if there are conditions under which the embeddings generated using one of the modes above would in some way be equivalent to the embeddings built using a different mode (ignoring the fact that the embeddings would differ because of the non-deterministic nature of the process).
Thank you

Mapping timeseries+static information into an ML model (XGBoost)

So let's say I have multiple problems, where one problem has two input DataFrames:
Input:
One constant stream of data (e.g. from a sensor); in a second step, multiple streams from multiple sensors.
> df_prob1_stream1
 timestamp                  | ident | measure1 | measure2 | total_amount |
----------------------------+-------+----------+----------+--------------+
 2019-09-16 20:00:10.053174 | A     | 0.380    | 0.08     | 2952618      |
 2019-09-16 20:00:00.080592 | A     | 0.300    | 0.11     | 2982228      |
... (1 million more rows - until a pre-defined ts) ...
One static DataFrame of information, keyed by a unique identifier called ident, which needs to be joined to the ident column in each df_probX_streamX so that the system can recognize that this data is related.
> df_global
 ident | some1 | some2 | some3 |
-------+-------+-------+-------+
 A     | LARGE | 8137  | 1     |
 B     | SMALL | 1234  | 2     |
Output:
A binary classifier [0,1]
So how can I suitably train XGBoost to make the best use of one time-series DataFrame in combination with one static DataFrame (containing additional context information) for one problem? Any help would be appreciated.
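One common pattern, shown below as a minimal sketch (not from the question itself; the aggregation choices, the label y, and the XGBoost parameters are assumptions), is to build per-ident features from the time series, join the static frame on ident, and train the classifier on the combined table:

# Minimal sketch: join static context onto per-ident time-series features and
# train a binary XGBoost classifier. Column names follow the frames above;
# the label y and the feature engineering are hypothetical.
import pandas as pd
import xgboost as xgb

# Aggregate the time-series stream into one feature row per ident.
ts_features = (
    df_prob1_stream1
    .groupby("ident")[["measure1", "measure2", "total_amount"]]
    .agg(["mean", "std", "min", "max"])
)
ts_features.columns = ["_".join(c) for c in ts_features.columns]

# Join the static information on the shared ident key (one-hot encode categoricals like some1).
static = pd.get_dummies(df_global.set_index("ident"))
X = ts_features.join(static, how="inner")

# y: one binary label per ident (assumed to exist; not shown in the question).
model = xgb.XGBClassifier(n_estimators=200, max_depth=4)
model.fit(X, y.loc[X.index])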

Visualising the motion of multiple robots

I am trying to add multiple robot instances and visualise their motions. I tried a couple of ways to do this and ran into errors/issues. They are as follows:
I tried adding another model instance after the system is created.
parsers::urdf::AddModelInstanceFromUrdfFileToWorld(
    FindResourceOrThrow("path/CompassGait.urdf"),
    multibody::joints::kRollPitchYaw, tree.get());
parsers::urdf::AddModelInstanceFromUrdfFileToWorld(
    FindResourceOrThrow("path/quadrotor.urdf"),
    multibody::joints::kRollPitchYaw, tree.get());
As expected, there are two robots visible in the visualiser and there are 26 output ports in the system. But I am unable to visualise the required motion of the quadrotor. It seems to be following the x, y, z and roll, pitch, yaw derivatives given as an input for the compass gait's output port. Is this expected behaviour?
I experience a similar thing when I add two compass gait models and try to make them follow the same path, even though I give ports 14-27 the same inputs as I give to ports 0-13. The second robot is stuck near the origin while the first one moves fine as expected, without any issues.
I need some help, or maybe some examples, to get a better idea of how to visualise the motion of multiple robots.
[Updated] Please see note at the bottom.
drake::systems::DrakeVisualizer (which I assume you're using to publish your visualization messages) was designed to be connected to the state_output_port of drake::systems::RigidBodyPlant. According to the documentation for RigidBodyPlant,
The state vector, x, consists of generalized positions followed by generalized velocities.
That is, the generalized positions for all model instances come before the generalized velocities. When working with RigidBodyPlant and DrakeVisualizer this ordering is handled automatically.
From your question, however, I gather that you have separate, specialized systems for your quadrotor and compass-gait models (as per their respective examples). Each of these systems outputs its state as [q, v], where q is the generalized position vector and v is the generalized velocity vector. In that case you will need to use drake::systems::Demultiplexer and drake::systems::Multiplexer to split the outputs of the quadrotor and compass-gait systems and reassemble them in the required order:
+---------------+   +-------------+    q     +-------------+
|               |   |             +--------->+             |
| Compass-gait  +-->+Demultiplexer|          |             |
|               |   |             +----+  v  |             |
+---------------+   +-------------+    |     |             |
                                       |  +->+             |   +-----------------+
                                       |  |  |             |   |                 |
                                       |  |  | Multiplexer +-->+ DrakeVisualizer |
                                       | q|  |             |   |                 |
                                       +---->+             |   +-----------------+
+---------------+   +-------------+       |  |             |
|               |   |             +-------+  |             |
| Quadrotor     +-->+Demultiplexer|          |             |
|               |   |             +--------->+             |
+---------------+   +-------------+    v     +-------------+
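For concreteness, here is a minimal sketch of that wiring using the pydrake bindings rather than the C++ from the question. The ConstantVectorSource stand-ins and the state sizes (4 for the compass gait, 12 for the quadrotor) are assumptions; the real plant outputs and the visualizer system would take their places:

# Minimal pydrake sketch of the demultiplex/re-multiplex wiring described above.
import numpy as np
from pydrake.systems.framework import DiagramBuilder
from pydrake.systems.primitives import ConstantVectorSource, Demultiplexer, Multiplexer

builder = DiagramBuilder()

# Stand-ins for the plant state outputs, each formatted as [q, v].
compass_state = builder.AddSystem(ConstantVectorSource(np.zeros(4)))   # assumed 2 q + 2 v
quad_state = builder.AddSystem(ConstantVectorSource(np.zeros(12)))     # assumed 6 q + 6 v

# Split each state vector into its q and v halves.
compass_demux = builder.AddSystem(Demultiplexer(4, 2))
quad_demux = builder.AddSystem(Demultiplexer(12, 6))
builder.Connect(compass_state.get_output_port(0), compass_demux.get_input_port(0))
builder.Connect(quad_state.get_output_port(0), quad_demux.get_input_port(0))

# Reassemble as [q_compass, q_quad, v_compass, v_quad]: all positions first,
# then all velocities, which is the ordering the visualizer expects.
mux = builder.AddSystem(Multiplexer([2, 6, 2, 6]))
builder.Connect(compass_demux.get_output_port(0), mux.get_input_port(0))  # q compass
builder.Connect(quad_demux.get_output_port(0), mux.get_input_port(1))     # q quadrotor
builder.Connect(compass_demux.get_output_port(1), mux.get_input_port(2))  # v compass
builder.Connect(quad_demux.get_output_port(1), mux.get_input_port(3))     # v quadrotor

# mux's single output would then be connected to the visualizer's input port.
diagram = builder.Build()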
Note: RigidBodyPlant and associated classes are being replaced by drake::multibody::MultibodyPlant and drake::geometry::SceneGraph. See run_quadrotor_lqr.cc for an example of using these new tools with a specialized plant.

Slowly increasing memory usage of Dask Scheduler

I'm running a test:
import time
from random import randint
from distributed import Client

client = Client('127.0.0.1:8786')

def x(i):
    return {}

while True:
    start = time.time()
    a = client.submit(x, randint(0, 1000000))  # submit x with a random argument
    res = a.result()
    del a
    end = time.time()
    print("Ran on %s with res %s" % (end - start, res))

client.shutdown()
del client
I used it (with more code) to get an estimate of my queries' performance, but for this example I've removed everything I could think of.
The above code leaks roughly 0.1 MB per second, which I would guesstimate to roughly 0.3MB per 1000 calls.
Am I doing something wrong in my code?
My Python debugging skills are a bit rusty (and by a bit I mean I last used objgraph on Orbited (the precursor to websockets) in 2009, https://pypi.python.org/pypi/orbited), but from what I can see, checking the number of references before and after:
Counting objects in the scheduler, before and after using objgraph.show_most_common_types()
| What        | Before | After | Diff  |
|-------------|--------|-------|-------|
| function    | 33318  | 33399 | 81    |
| dict        | 17988  | 18277 | 289   |
| tuple       | 16439  | 28062 | 11623 |
| list        | 10926  | 11257 | 331   |
| OrderedDict | N/A    | 7168  | 7168  |
It's not a huge amount of RAM in any case, but digging deeper I found that scheduler._transition_counter is 11453 and scheduler.transition_log is filled with:
('x-25ca747a80f8057c081bf1bca6ddd481', 'released', 'waiting',
OrderedDict([('x-25ca747a80f8057c081bf1bca6ddd481', 'processing')]), 4121),
('x-25ca747a80f8057c081bf1bca6ddd481', 'waiting', 'processing', {}, 4122),
('x-25cb592650bd793a4123f2df39a54e29', 'memory', 'released', OrderedDict(), 4123),
('x-25cb592650bd793a4123f2df39a54e29', 'released', 'forgotten', {}, 4124),
('x-25ca747a80f8057c081bf1bca6ddd481', 'processing', 'memory', OrderedDict(), 4125),
('x-b6621de1a823857d2f206fbe8afbeb46', 'released', 'waiting', OrderedDict([('x-b6621de1a823857d2f206fbe8afbeb46', 'processing')]), 4126)
First error on my part
Which of course led me to realise that my first error was not configuring transition-log-length.
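For reference, this can be set programmatically; the dotted key path below is an assumption (it has moved between releases, and older versions read it from a YAML config file instead):

import dask

# Assumed key path; check your distributed version's configuration reference.
dask.config.set({"distributed.scheduler.transition-log-length": 10})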
After setting configuration transition-log-length to 10:
| What           | Before | After | Diff |
|----------------|--------|-------|------|
| function       | 33323  | 33336 | 13   |
| dict           | 17987  | 18120 | 133  |
| tuple          | 16530  | 16342 | -188 |
| list           | 10928  | 11136 | 208  |
| _lru_list_elem | N/A    | 5609  | 5609 |
A quick Google search found that _lru_list_elem is created by functools.lru_cache, which in turn is invoked in key_split (in distributed/utils.py); that is an LRU cache of up to 100,000 items.
Second try
Based on the code it appears as if Dask should climb up to roughly 100k _lru_list_elem.
After running my script again and watching the memory it climbs quite fast up until I approach 100k _lru_list_elem, afterwards it stops climbing almost entirely.
This appears to be the case, since it pretty much flat-lines after 100k
So no leak, but it was fun to get my hands dirty with the Dask source code and Python memory profilers.
For diagnostic, logging, and performance reasons the Dask scheduler keeps records on many of its interactions with workers and clients in fixed-sized deques. These records do accumulate, but only to a finite extent.
We also try to ensure that we don't keep around anything that would be too large.
Seeing memory use climb up until a nice round number like what you've seen and then stay steady seems to be consistent with this.
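As a small plain-Python illustration of that bounded-history behaviour (not Dask internals):

from collections import deque

# A deque with maxlen discards the oldest entries once full, so memory stops
# growing after the cap is reached, much like the scheduler's fixed-size logs.
log = deque(maxlen=10)
for i in range(1000):
    log.append(i)
print(len(log))   # 10
print(list(log))  # [990, 991, ..., 999]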

Transitioning to previous state

I am designing a state machine and have one specific state that I can enter from two different states... I am not sure how to go back to the previous state... or am I modelling it wrong?
to illustrate :
   | state   | trigger | nextstate
   -------------------------------
1. | initial | evtX    | A
2. | initial | evtY    | B
3. | B       | evtX    | A
4. | A       | evtZ    | ????
The last row is where I am having trouble. I need to transition to the initial state if A was arrived at from the transition in row number 1, and I need to transition to state B if A was arrived at from the transition in row number 3.
How can I model this better?
In fact, you have two different A states:
   | state   | trigger | nextstate
   -------------------------------
1. | initial | evtX    | A1
2. | initial | evtY    | B
3. | B       | evtX    | A2
4. | A1      | evtZ    | initial
5. | A2      | evtZ    | B
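For illustration, here is a minimal sketch of that split-state table as a dictionary-driven state machine (state and trigger names are taken from the table; the code itself is not part of the original answer):

# Transition table: (current state, trigger) -> next state.
TRANSITIONS = {
    ("initial", "evtX"): "A1",
    ("initial", "evtY"): "B",
    ("B", "evtX"): "A2",
    ("A1", "evtZ"): "initial",
    ("A2", "evtZ"): "B",
}

def step(state, trigger):
    return TRANSITIONS[(state, trigger)]

state = "initial"
for trigger in ("evtY", "evtX", "evtZ"):  # initial -> B -> A2 -> B
    state = step(state, trigger)
print(state)  # B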
If you want something more powerful, try Harel/UML statecharts (which have "superstates, orthogonal regions, and activities as part of a state" [1]). You might have a look at SCXML as well [2]. I don't know any of them though.
[1] http://en.wikipedia.org/wiki/Harel_statechart#Harel_statechart
[2] http://en.wikipedia.org/wiki/SCXML
