I'm trying to establish reliable communication between two XBee modules. I've achieved what I want, but there's a problem I couldn't figure out.
There are two parameters in X-CTU that confuse me:
JV - Channel Verification
NW - Network Watchdog Timeout
I've read all the instructions in the datasheet and googled many times, but I couldn't find a useful answer. What is my problem? I'm working through some worst cases. For now, the worst case for me is the router starting a long time before the coordinator. I've solved this problem by setting JV = 0 and NW = 1. But if I set JV = 1 and NW = 1, and I start the coordinator a long time (for example, half an hour) after the router, the router does not try to find the coordinator. Why is this happening? I can't understand what's going on in the XBee.
The JV parameter will cause an XBee Router node to search for its Coordinator at power-up (and only at power-up!). If it does not find the Coordinator, it will leave the current network and channel it's on and go looking for another Coordinator it can join.
The NW parameter is like the JV parameter but more aggressive: it will cause the XBee Router to leave the network if it doesn't hear from the Coordinator within NW * 3 minutes.
If you have set JV=1 and NW=10 on your XBee Router, then the XBee will search for its Coordinator on power-up and check that it has spoken with the Coordinator at least once in the last 30 minutes. If it either can't find the Coordinator or hasn't spoken to it, the Router will leave the network and look for another network to join.
If your XBee Router is unable to find the Coordinator any longer, have a look at the Coordinator's NJ parameter. NJ must be set to 0xFF (255) to always allow joining. Otherwise, if NJ < 255, the Coordinator disables joining NJ seconds after starting its network. For example, if you set NJ to 60, then Routers will only be able to join the network during the first 60 seconds after the Coordinator has created it. This parameter is normally set for security reasons, where the network has been deployed and you want to make absolutely sure that nobody else can join.
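For completeness, here is a minimal sketch of setting those parameters from code rather than X-CTU. This assumes Digi's XBee Java Library; the port name, baud rate, and values are illustrative:

import com.digi.xbee.api.XBeeDevice;
import com.digi.xbee.api.exceptions.XBeeException;

public class RouterSetup {
    public static void main(String[] args) throws XBeeException {
        // illustrative serial port and baud rate
        XBeeDevice router = new XBeeDevice("/dev/ttyUSB0", 9600);
        router.open();
        router.setParameter("JV", new byte[] { 0x01 }); // verify Coordinator at power-up
        router.setParameter("NW", new byte[] { 0x0A }); // watchdog: 10 * 3 = 30 minutes
        router.writeChanges();                          // persist across power cycles
        router.close();
    }
}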
If all else fails, feel free to ask your question in the Digi XBee ZB forum: http://forums.digi.com/support/forum/listthreads?forum=5
I am learning about DHTs and I came upon shortcuts. It seems they are used to make routing faster and to skip going directly back down the chain for better performance. What I don't understand is: suppose we have a circular DHT made up of 100 servers/nodes/HTs. Some key lookup arrives at server/node 10 and must be forwarded to server/node 76. When the destination is reached and the value is retrieved, couldn't I just provide the IP of the requester (server 10) so the value is sent directly back to 10? That seems to make shortcuts useless.
Thank you in advance.
Edit: useless for returning the value, that is, not for getting to it.
You're assuming a circular network layout and forwarding-based routing, both of which only apply to a subset of DHTs.
Anyway, the forward path would still go through all the nodes, any of which might be down or have transient network problems. As the number of hops goes up, so does the cumulative error probability. Additionally, it increases latency, which matters on a global scale because simple DHT routing algorithms, at least, don't account for physical proximity.
For the return path it can also matter whether reachability is asymmetric, e.g. due to firewalls.
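To make the forward-path argument concrete, here is a minimal sketch (the ring size and node numbers are from your example; everything else is illustrative) comparing plain successor forwarding with Chord-style power-of-two shortcuts. The shortcuts cut the hop count, and with it the cumulative failure probability and latency, from O(n) to O(log n):

public class DhtHops {
    static final int RING = 100;

    // every node only knows its successor: one hop per node in between
    static int linearHops(int from, int to) {
        return Math.floorMod(to - from, RING);
    }

    // every node also keeps "fingers" at power-of-two distances (Chord-style)
    static int fingerHops(int from, int to) {
        int hops = 0, cur = from;
        while (cur != to) {
            int dist = Math.floorMod(to - cur, RING);
            cur = Math.floorMod(cur + Integer.highestOneBit(dist), RING); // biggest shortcut <= dist
            hops++;
        }
        return hops;
    }

    public static void main(String[] args) {
        System.out.println("linear 10 -> 76: " + linearHops(10, 76) + " hops"); // 66
        System.out.println("finger 10 -> 76: " + fingerHops(10, 76) + " hops"); // 2
    }
}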
According to the NVMe specification, the BAR has tail and head fields for each queue. For example:
Submission Queue y Tail Doorbell (SQyTDBL):
Start: 1000h + (2y * (4 << CAP.DSTRD))
End: 1003h + (2y * (4 << CAP.DSTRD))
Submission Queue y Head Doorbell (SQyHDBL):
Start: 1000h + ((2y + 1) * (4 << CAP.DSTRD))
End: 1003h + ((2y + 1) * (4 << CAP.DSTRD))
Are these the queues themselves or mere pointers? Is this correct? If they are the queues, I would assume DSTRD indicates the maximum length of all queues.
Moreover, the specification talks about two optional regions: Host Memory Buffer (HMB) and Controller Memory Buffer (CMB).
HMB: a region within the host's DRAM (PCIe root)
CMB: a region within the NVMe controller's DRAM (inside the SSD)
If both are optional, where are the queues located then? Since an endpoint PCIe device only exposes BARs and PCI headers, I don't see any other place they might live, other than a BAR.
Sorry, but I am doing this from memory; I have implemented an FPGA NVMe host, so hopefully my memory will be enough to answer your questions and more. If I get something wrong, though, at least you know why. I'll be providing reference sections from the specification, which you can find here: https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf Also, as a note before I really answer your question, I want to clarify some confusion: understanding the spec takes some time, and I honestly recommend reading it bottom to top; the last few sections give context for the first few, as strange as that sounds.
These are the submission and completion queue doorbells, specifically the submission queue tail and completion queue head, respectively (Section 3.1). More on this later; I just wanted to correct the misconception that you, as the host, access the submission queue head: you do not, only the controller (traditionally the drive) does. A simple reminder: a submission is you asking the drive to do something; a completion is the drive telling you how it went. Read Section 7.2 for more info.
Before you can send anything to these queues, you must first set up said queues. At baseline these queues do not exist in the system; you must use the admin queue to create them. The admin queue base addresses are set through two registers in BAR0:
28h-2Fh: ASQ (Admin Submission Queue Base Address)
30h-37h: ACQ (Admin Completion Queue Base Address)
Your statement about DSTRD is a big misunderstanding. This field comes from the capabilities register (offset 0x0, Section 3.1.1). It is the controller (drive) telling you the "doorbell stride", i.e. how many bytes sit between consecutive doorbell registers. I've never seen a drive report anything but 0 for this value, since, well, why would you want to leave dead space between doorbell registers?
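To make the doorbell math concrete, here is a minimal sketch, straight from the formulas quoted in the question, that computes where queue y's two doorbells live in BAR0 for a given DSTRD:

public class NvmeDoorbells {
    // Section 3.1: SQyTDBL = 1000h + (2y) * (4 << CAP.DSTRD)
    static long sqTailDoorbell(int y, int dstrd) {
        return 0x1000L + (2L * y) * (4L << dstrd);
    }
    // Section 3.1: CQyHDBL = 1000h + (2y + 1) * (4 << CAP.DSTRD)
    static long cqHeadDoorbell(int y, int dstrd) {
        return 0x1000L + (2L * y + 1) * (4L << dstrd);
    }

    public static void main(String[] args) {
        // with the usual DSTRD = 0, queue 1's doorbells land at 0x1008 and 0x100C
        System.out.printf("SQ1TDBL = 0x%X%n", sqTailDoorbell(1, 0));
        System.out.printf("CQ1HDBL = 0x%X%n", cqHeadDoorbell(1, 0));
    }
}

Note that DSTRD only spaces the doorbells out; it says nothing about queue lengths, which you choose yourself when you create each queue.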
Please be careful with the size of your writes: in my experience most NVMe drives require you to send writes of at least two dwords (8 bytes), even if you only intend to send one dword of data. Just a note.
On to actually helping you use this thing as a host: please reference Section 7.6.1 to find the initialization sequence. Notice how you must set up multiple registers, read certain parameters, and do other such things.
Assuming you or someone else has done the initialization, let me now answer the core of your question: how to use these queues. The thing is, this answer spans many sections of the spec and is the core of it, so I am going to break it down as best I can for a simple write command (there is a short code sketch of the doorbell flow after the steps). Please note you CANNOT write until you have first created the queues using the admin queues, which use different opcodes from a different section of the spec; sorry, I cannot write all of that out.
STEPS TO WRITING DATA TO AN NVMe DRIVE.
When creating the submission queue you specify the size of that specific queue, i.e. the number of commands that can be placed in it at one time for processing. Along with this you specify the queue base address. For this example, let's assume you set the base address to 0x1000_0000 and the size to 16 (0x10). Figure 105 lets us know that every submission queue entry is 64 bytes (0x40), so queue entry 0 is at 0x1000_0000, entry 1 at 0x1000_0040, entry 2 at 0x1000_0080, and so on for our 16 entries; then it wraps around.
You first stage the data to be written. Let's say you were given 512 bytes (0x200) of data to write; for simplicity, you place that data at 0x2000_0000 - 0x2000_0200.
You create the submission queue command. This is not a simple process, and I'm not going to document all of it for you, but understand that you should be referencing Figure 104, Figure 346, and Section 6.15. Even that is not enough: you will also need to understand PRP vs. SGL and decide which you are using (PRP is easier to start with), as well as NLB (number of logical blocks), which determines your write size. With NVMe you do not specify writes in bytes but in logical blocks, whose size is specified by the controller (drive); it may implement multiple LBA sizes, but this is up to the drive, not you as the host, so you just pick from what it supports (Section 5.15.2.1, Figure 245). Look at Identify Namespace to tell you the LBA (logical block address) size; this will lead you down a rabbit hole to determine the actual size, but that's OK, the info is there.
OK, so you finished this mess and have created the submission command. Let's assume two commands have already been submitted and completed on this queue (at start this will be 0; I'm picking 2 just to make the example clearer). What you now need to do is place this command at 0x1000_0080 (entry slot 2).
Now let's assume this is queue 1 (in the equations you posted, the queue number is the y value; note that queue 0 is the admin queue). What you need to do is poke the controller's submission queue tail doorbell with the new tail value, i.e. the index one past your last loaded entry (thus you can queue up several commands at once and only tell the drive when you are ready). In this case, with entries 0 and 1 already consumed and your new command in slot 2, the new tail is 3, so you write the value 3 to register 0x1008 (queue 1's tail doorbell with DSTRD = 0).
At this point the drive will go "aha, the host has told me there are new commands to fetch". The controller will go to queue base address + command size * 2 and fetch 64 bytes of data, aka one command (address 0x1000_0080). The controller will decode this command as a write, which means the controller (drive) must read data from host memory and store it where it was told to. Your write command tells the drive to go to address 0x2000_0000 and read 512 bytes of data, and it will, as you can see if you scope the PCIe bus. The drive will then fill out a completion queue entry (16 bytes, specified in Section 4.6) and place it in the completion queue address you specified at queue creation (plus 0x20, since this completion lands in slot 2). Then the controller will generate an MSI-X interrupt.
At this point you must go to wherever the completion queue was placed and read the response to check the status; if you queued multiple submissions, also check the SQID/CID fields to see which command finished, since jobs can finish out of order. You then must write the new head index to the completion queue head doorbell (0x100C) to indicate that you have retrieved the completion entry (success or failure). Notice that you never interact with the submission queue head (that's up to the controller, since only it knows when a submission queue entry has been processed), and only the controller advances the completion queue tail, since only it can create new entries.
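Here is the promised sketch of the doorbell flow from the steps above. It assumes DSTRD = 0 and queue 1, and bar0/hostMem are hypothetical handles standing in for a mapped BAR0 and DMA-visible host memory (in real life you would be behind VFIO or a kernel driver); treat it as an outline of the steps, not a working driver:

import java.nio.ByteBuffer;

public class NvmeWriteFlow {
    static final int SQ_SIZE = 16;   // entries, chosen at queue creation
    static final int SQE_BYTES = 64; // Figure 105: submission entry size
    static final int CQE_BYTES = 16; // Section 4.6: completion entry size

    ByteBuffer bar0;    // hypothetical mapping of the controller's BAR0
    ByteBuffer hostMem; // hypothetical DMA-visible host memory holding the queues

    int sqTail = 2; // two commands already submitted in the running example

    void submitWrite(byte[] sqe) { // sqe = the 64-byte command built per Figure 104/346
        // place the command in the next free slot (slot 2 here, i.e. base + 0x80)
        int slotOffset = SQE_BYTES * (sqTail % SQ_SIZE);
        hostMem.position(slotOffset);
        hostMem.put(sqe, 0, SQE_BYTES);
        // advance the tail and ring queue 1's tail doorbell (0x1008)
        sqTail = (sqTail + 1) % SQ_SIZE;
        bar0.putInt(0x1008, sqTail);
    }

    void reapCompletion(int cqHead) {
        // after the MSI-X interrupt: read the 16-byte completion entry at cqHead,
        // check status and SQID/CID, then ring the completion head doorbell (0x100C)
        bar0.putInt(0x100C, (cqHead + 1) % SQ_SIZE);
    }
}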
I'm sorry this is so long and not well formatted, but hopefully you now have a slightly better understanding of NVMe; it's a bit of a mess at first, but once you get it, it all makes sense. Just remember that my example assumed you had already created a queue, which at baseline doesn't exist. First you need to set up the admin submission and completion queues (0x28 and 0x30), which have queue ID 0, so their tail/head doorbells are at addresses 0x1000 and 0x1004 respectively. You then must reference Section 5 to find the opcodes to make stuff happen, but I have faith you can figure it out from what I've given you. If you have any more questions, put a comment down and I'll see what I can do.
According to the streaming example at http://orientdb.com/docs/3.0.x/java/Java-Query-API.html, we can use the Orient result set streaming API as follows:
ODatabaseDocument db;
...
String statement = "SELECT FROM V WHERE name = ? and surname = ?";
OResultSet rs = db.query(statement, "John", "Smith");
rs.stream().forEach(x -> System.out.println(x.getProperty("age")));
rs.close();
This is fine but too trivial - what if we need to keep the rs/stream around? We can't very well close the result set, because we want to reuse the stream on a subsequent user request in a web application, say (in scenarios such as paging).
But regarding keeping streams "alive", the Orient user guide says:
OResultSet is implemented as a paginated structure, that holds some iterators open during the iteration. This is true both in remote and in embedded usage.
You should always invoke OResultSet.close() at the end of the execution, to free resources.
OResultSet instances are automatically closed when you close the ODatabase that returned them.
It is important to always close result sets, even when they are converted to streams (after the stream is consumed).
Are there any best practices around this? As far as I can tell, we would need to:
1) Keep the Orient database connection open until the user's "paging" session is done (which could be, say, 5-10 minutes). Only when the user says "done" can we close the result set and the database connection. The Orient database connection (and whatever stream it generated) thus becomes "private" to a single application user. Moreover, since every user request can be served on a different thread, the said database connection would need to be made active on the current thread before using it.
2) Use the Java Stream API to navigate through arbitrary subsets of the "arbitrarily" large result set. How would memory usage be handled by the underlying Orient stream implementation? What determines the memory usage of keeping a single rs/stream around for a while? What happens when we have thousands of open rs/streams, especially if each user has their own "private" rs/stream they're looking at?
3) If a given Orient database connection can only be used by a single thread at a time (an Orient requirement), how do we handle multiple users with their own custom long-lived rs/streams/connections? Does this mean that if we have 1000 clients using their own private rs/stream (which they hang on to for, say, 5 minutes), then we have to keep 1000 database connections open (i.e. one per user/rs)? What are the limits around this? This style is obviously quite different from the more typical execute-query/close-rs pattern for quick, stateless user interactions (naive paging that re-executes the query every time for a given range, which can get expensive).
P.S. I realize that once we get a Java stream, we pretty much just use the Java API itself, so I suppose JOOQ streaming usage (for example) would be pretty similar to Orient streaming usage once you get into the Stream interfaces. I'm not familiar with the Java Streams API, but I suppose "How to paginate a list of objects in Java 8?" is a good place to start?
My conclusion is that streaming works well when scrolling through a large result set without consuming a large amount of memory and without re-executing offset/limit queries (similar to forward-only scrolling over JDBC result sets). A typical use case is an export scenario.
For forward and backward paging, in Orient at least, you likely need one or more indexed properties and range queries over them; make sure the index is an SB-tree so that it supports range queries.
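A minimal sketch of that idea, using the same query API as the snippet above (the property name, page size, and session handling are illustrative): each page is keyed on the indexed property and is a short-lived result set that can be closed right away, so no connection or stream has to outlive the request:

// fetch one page, keyed on an indexed property; 'lastSeen' is kept in the user's session
String pageQuery = "SELECT FROM V WHERE name > ? ORDER BY name LIMIT 20";
String lastSeen = ""; // empty marker means "first page"
try (OResultSet page = db.query(pageQuery, lastSeen)) {
    while (page.hasNext()) {
        OResult row = page.next();
        lastSeen = row.getProperty("name"); // remember where this page ended
        Object age = row.getProperty("age");
        System.out.println(age);
    }
}

Going backward is the same query with < and a descending sort; only the small lastSeen marker is stateful, not the result set.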
FYI, Solr has a cursor mechanism that works pretty well for forward pagination over sorted results - and if you keep some simple state markers on the client, you can also go back to results already encountered. "Go to" random pages is not supported in Solr cursors, but you can always re-sort/filter on some other criteria to move "useful" results to the top of the result set instead of deep paging (https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html)
I have a strange finding about the heartbeat protocol in CANopen. Maybe somebody else has seen something like this, and maybe it is even supposed to work this way... Anyway, here's what it's about:
In CANopen there are two timeout-based life-guarding mechanisms: the first is node guarding, which I will not mention further, since it's considered old news.
The other one is called heartbeat. It is pretty simple: any participant on the network sends a regular message stating its node ID and its state. The frequency is defined by object 0x1017sub0 and is called the heartbeat producer time. If it is set to zero, no heartbeat is sent.
Any other participant can then define a number of nodes it wants to monitor on the network, plus the maximum time there may be between two consecutive heartbeat messages. This information is stored in object 0x1016sub1..n as 32-bit entries, one for each node this particular node wants to listen to.
Each entry consists of the node ID (bits 22 to 16) and the mentioned maximum time that may elapse between heartbeats, called the heartbeat consumer time (bits 15 to 0). Again, if an entry is zero, it is ignored.
As you may have gathered, there is no distinction between the network master (node ID 1) and slaves (node IDs 2 to 127).
So much for the theory; now for my problem:
I configure one of the slave nodes in my network as a heartbeat consumer for the master, so there's an entry in object 0x1016sub1 that looks like this: 0x000107D0, meaning that no more than two seconds may elapse between heartbeat messages from the master.
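For reference, here is a minimal sketch (plain bit arithmetic, not tied to any CANopen stack) of how that 0x1016sub1 value decomposes into node ID and consumer time:

public class HeartbeatEntry {
    // 0x1016 consumer entry: node ID in bits 22..16, time in milliseconds in bits 15..0
    static int encode(int nodeId, int timeMs) {
        return ((nodeId & 0x7F) << 16) | (timeMs & 0xFFFF);
    }

    public static void main(String[] args) {
        int entry = 0x000107D0;
        System.out.println("node ID: " + ((entry >> 16) & 0x7F)); // 1, the master
        System.out.println("time: " + (entry & 0xFFFF) + " ms");  // 2000 ms = 2 s
        System.out.println(encode(1, 2000) == entry);             // true
    }
}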
I have observed how this works in two setups: if I send a master heartbeat for a while and then stop, the node either returns to pre-operational mode or sends an appropriate emergency message.
If I don't send any master heartbeat at all, I would expect that after I start the node (send it into operational mode), it takes at most two seconds for it to either return to pre-operational mode or send an appropriate emergency message, or perhaps even both. But in the two setups I tried, nothing happened: if I never send any heartbeat, the node never expects one and just keeps on running.
The two setups are very different from each other; I am not sure whether they might use the same CANopen stack library.
Is there an explanation?
If you read the CANopen User Manual, section 1.3.1.6, page 39, you will notice that the heartbeat consumer is first activated upon receiving a heartbeat from the producer. I would assume, then, that since in your example the first heartbeat is never sent, the consumer is never activated.
While reading the ERTS user's guide, I found this section:
The only signal ordering guarantee given is the following. If an entity sends multiple signals to the same destination entity, the order will be preserved. That is, if A sends a signal S1 to B, and later sends the signal S2 to B, S1 is guaranteed not to arrive after S2.
I've also happened across this while googling during further research:
Erlang Reference Manual, 13.5:
Message sending is asynchronous and safe, the message is guaranteed to eventually reach the recipient, provided that the recipient exists.
That seems very vague, and I'd like to know what guarantees I can rely on in the following scenario:
A,B are processes on two different nodes.
Assume A does not crash and B was a valid node at some point.
A and B monitor each other.
A sends messages M1,M2,M3 to B
In the above scenario, is it possible that B receives M1 and M3 (M2 is dropped), without any sort of 'DOWN'/'EXIT'/heartbeat timeout being received at A?
There are no guarantees other than the ordering guarantee. Note that by default you don't even know who the sender is, unless the sender encodes this in the message.
Your example could happen:
A sends M1 and M2
B receives M1
The node on which B resides gets disconnected
The node on which B resides comes up again
A sends M3 to B
B receives M3
M2 can be lost on the network link in this scenario. It is highly unlikely, but it can happen. The usual trick is to have some notion of such errors, either by having a timeout trigger or by monitoring the node or pid that is the recipient of the message.
Updated scenario:
In the updated scenario, provided I read it correctly, A would get a 'DOWN'-style message at some point, and likewise it would get a message telling it that the node is up again, if it monitors the node.
Though often, such things are better modeled using an idempotent protocol if at all possible.
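Erlang specifics aside, the monitoring/timeout trick usually boils down to numbering messages so the receiver can detect the gap itself and the sender can safely retransmit. A minimal sketch of the receiving side (all names illustrative):

// receiver-side gap detection: if M2 is lost, M3's sequence number exposes the hole
public class GapDetector {
    private long expectedSeq = 1;

    // returns true if the message may be delivered; false if it must be dropped
    boolean onMessage(long seq) {
        if (seq == expectedSeq) {
            expectedSeq++;         // in-order message: deliver it
            return true;
        }
        if (seq > expectedSeq) {   // a gap: expectedSeq..seq-1 went missing
            System.out.println("lost " + expectedSeq + ".." + (seq - 1) + ", ask for retransmit");
        }
        return false;              // duplicate or out-of-order: drop, keeping delivery idempotent
    }
}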
Reading through the Erlang mailing list and the academic FAQ, it seems there are a few guarantees provided by the ERTS implementation; however, I was not able to determine whether they are also guaranteed at the language/specification level.
If you assume TCP is "reliable", then the current implementation guarantees that
given that A and B are processes on different nodes (and hosts) and A monitors B, then if A sends to B and A doesn't crash, any message delivery failure* between the two nodes, or any host/node/process failure on B, will lead to A getting a 'DOWN' message (or 'EXIT' in the case of links). [See 1 and 2]
*From what I have read in the mailing-list thread, this property is almost entirely based on the fact that TCP is used, so "message delivery failure" means any situation where TCP decides that a failure has occurred or that the connection needs to be closed.
The academic FAQ talks about this as if it were also a language/specification-level guarantee; however, I couldn't yet find anything to back that up.