We were recently reading the BEAM book as part of our reading group. In Chapter 7 there is an allusion to the ref trick/Synchronous Call Trick in Erlang.
Unfortunately, the book is incomplete, and after discussion we were unable to figure out what the ref trick was.
When performing a receive, the BEAM VM scans the mailbox in order to find the first suitable (matching) message, and blocks the process if it does not find any.
The 'trick' is that, since a freshly created reference cannot already be in the mailbox, there is no need to scan the whole mailbox when executing a receive that matches on {Reference, Term}: only messages that arrived after Reference was created need to be examined.
That's the meaning of the following phrase:
The compiler recognizes code that uses a newly created reference (ref) in a receive (see [ref_trick_code]), and emits code to avoid the long inbox scan since the new ref can not already be in the inbox.
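For instance, here is a minimal sketch of the synchronous-call pattern this optimization targets (the function name and message shapes are illustrative, not taken from the book):

```erlang
%% Illustrative synchronous-call sketch; names are made up for the example.
call(Server, Request) ->
    Ref = make_ref(),                  % a fresh ref cannot already be in our mailbox
    Server ! {self(), Ref, Request},
    receive
        {Ref, Reply} ->                % the mailbox scan can start at the point
            Reply                      % where make_ref/0 ran, skipping older messages
    end.
```

Because the compiler sees the make_ref/0 result flow directly into the receive pattern, it can mark the mailbox position at creation time and start matching from there.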
I'm writing a data pipeline using Reactor and Reactor Kafka, and I use Spring's Message<> to store the ReceiverOffset of each ReceiverRecord in the headers, so that I can call ReceiverOffset.acknowledge() when processing finishes. I also have the out-of-order commit feature enabled.
When processing an event fails, I want to be able to log the error, write to another topic that represents all the failure events, and commit to the source topic. I'm currently solving that by returning Either<Message<Error>, Message<myPojo>> from each processing stage; that way the stream is not stopped by exceptions, and I'm able to keep the original event headers and eventually commit the failed messages at the bottom of the pipeline.
The problem is that each step of the pipeline gets Either<> as input and needs to filter out the previous errors and apply its logic only to the Either.right values, which can be cumbersome, especially when working with buffers, where the operator gets List<Either<>> as input. So I would like to keep my business pipeline clean and receive only Message<MyPojo> as input, while still not missing errors that need to be handled.
I read that sending those error messages to another channel or stream is a solution for this.
Spring Integration uses that pattern for error handling, and I also read an article (link to article) that solves this problem in Akka Streams using divertTo():
I couldn't find documentation or code examples showing how to implement this in Reactor. Is there any way to use Spring Integration's error channel with Reactor, or are there any other ideas for implementing this?
I'm not familiar with Reactor per se, but you can keep the stream linear. The trick, since Vavr's Either is right-biased, is to use flatMap, which takes a function from Message<MyPojo> to Either<Message<Error>, Message<MyPojo>>. If the Either coming in is a right (i.e. a Message<MyPojo>), the function gets invoked; otherwise the left just gets passed through.
// Apologies if the Java is atrocious... haven't written Java since pre-Java 8
incomingEither.flatMap(
    myPojoMessage -> ... // compute a new Either<Message<Error>, Message<MyPojo>>
)
Presumably at some point you want to do something (publish to a dead-letter topic, tickle metrics, whatever) with the Message<Error> case, so for that, orElseRun will come in handy.
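To make the flow concrete, here is a runnable sketch of the pattern. To keep it dependency-free it uses a minimal hand-rolled right-biased Either; with Vavr on the classpath you would use io.vavr.control.Either directly, which behaves the same way for flatMap and orElseRun. The stage names and the "boom" failure value are invented for the example:

```java
import java.util.function.Consumer;
import java.util.function.Function;

// Minimal right-biased Either stand-in so the snippet runs without dependencies.
abstract class Either<L, R> {
    static <L, R> Either<L, R> left(L v)  { return new Left<>(v); }
    static <L, R> Either<L, R> right(R v) { return new Right<>(v); }

    // Only fires on a Right; a Left short-circuits and passes through unchanged.
    abstract <R2> Either<L, R2> flatMap(Function<? super R, Either<L, R2>> f);

    // Only fires on a Left (e.g. publish to a dead-letter topic, bump a metric).
    abstract Either<L, R> orElseRun(Consumer<? super L> action);

    abstract boolean isRight();

    static final class Left<L, R> extends Either<L, R> {
        final L value;
        Left(L value) { this.value = value; }
        @SuppressWarnings("unchecked")
        <R2> Either<L, R2> flatMap(Function<? super R, Either<L, R2>> f) {
            return (Either<L, R2>) this;        // error flows past later stages
        }
        Either<L, R> orElseRun(Consumer<? super L> action) {
            action.accept(value);
            return this;
        }
        boolean isRight() { return false; }
    }

    static final class Right<L, R> extends Either<L, R> {
        final R value;
        Right(R value) { this.value = value; }
        <R2> Either<L, R2> flatMap(Function<? super R, Either<L, R2>> f) {
            return f.apply(value);              // success: run the next stage
        }
        Either<L, R> orElseRun(Consumer<? super L> action) { return this; }
        boolean isRight() { return true; }
    }
}

public class PipelineSketch {
    public static void main(String[] args) {
        // Two hypothetical stages; "boom" stands in for an event that fails.
        Function<String, Either<String, String>> parse =
            s -> s.equals("boom") ? Either.<String, String>left("parse failed: " + s)
                                  : Either.<String, String>right(s.toUpperCase());
        Function<String, Either<String, String>> enrich =
            s -> Either.<String, String>right(s + "!");

        Either<String, String> ok = Either.<String, String>right("event")
            .flatMap(parse)
            .flatMap(enrich);                   // stays a Right

        Either<String, String> failed = Either.<String, String>right("boom")
            .flatMap(parse)
            .flatMap(enrich);                   // enrich never runs on a Left

        failed.orElseRun(err -> System.out.println("dead-letter: " + err));
        System.out.println(ok.isRight() + " " + failed.isRight());
    }
}
```

The business stages stay plain functions over the payload, and the flatMap chain threads the error case past them without any per-stage filtering.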
Message sending in Erlang is asynchronous, meaning that a send expression such as PidB ! msg, evaluated by a process PidA, immediately yields the result msg without blocking the sender. Naturally, its side effect is that of sending msg to PidB.
Since this mode of message passing provides no message delivery guarantees, a sender that needs to know whether a message was actually delivered must ascertain this itself, by asking the recipient to confirm accordingly. Of course, such confirmation might not always be required.
This holds true in both the local and distributed cases: in the latter scenario, the sender cannot simply assume that the remote node is always available; in the local scenario, where processes live on the same Erlang node, a process may send a message to a non-existent process.
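Such a confirmation round trip can be sketched as follows (the {ack, Ref} protocol and the timeout-based failure handling are illustrative assumptions, not a standard API):

```erlang
%% Illustrative delivery-confirmation sketch; the ack protocol is an assumption.
send_and_confirm(Pid, Msg, Timeout) ->
    Ref = make_ref(),
    Pid ! {self(), Ref, Msg},          % '!' returns immediately, with no guarantee
    receive
        {ack, Ref} -> ok               % receiver explicitly confirmed delivery
    after Timeout ->
        {error, no_ack}                % dead process, unreachable node, or just slow
    end.
```

In practice, a monitor (erlang:monitor/2) is often combined with such a timeout so that a dead or non-existent recipient is detected immediately rather than after the timeout expires.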
I am curious as to how the side-effect portion of !, i.e., message sending, works at the VM level when the sender and recipient processes live on the same node. In particular, I would like to know whether the sending operation completes before returning. By completes, I mean to say that, for the specific case of local processes, the sender: (i) acquires a lock on the message queue of the recipient, (ii) writes the message directly into its queue, (iii) releases the lock and, (iv) finally returns.
I came across this post, which I did not fully understand, although it seems to indicate that this could be the case.
Erik Stenman's The Beam Book, which explains many implementation details of the Erlang VM, answers your question in great detail in its "Lock Free Message Passing" section. The full answer is too long to copy here, but the short answer to your question is that yes, the sending process completely copies its message to a memory area accessible to the receiver. If you consult the book you'll find that it's more complicated than steps i-iv you describe in your question due to issues such as different send flags, whether locks are already taken by other processes, multiple memory areas, and the state of the receiving process.
I'm facing a situation where I have multiple robots, most running full ROS stacks (complete with Master), and I'd like to selectively route some topics through another messaging framework to the other robots (some of which are not running ROS).
The naive way to do this works: set up a node that subscribes to the ROS topics in question and sends them over the network, after which another node republishes them (if it's ROS). Great, but it seems odd to have to do this much serializing. Right now the message goes from its message type to the ROS serialization, back to the message type, then to a different serialization format (currently Pickle), across the network, then back to the message type, then to the ROS serialization, and finally back to the message type.
So the question is, can I simplify this? How can I operate on the ROS-serialized data (i.e. subscribe without rospy automagically deserializing for me)? http://wiki.ros.org/rospy/Overview/Publishers%20and%20Subscribers suggests that I can access the connection information as a dict of strings, which may be half of the solution, but how can the other end take the connection information and republish the message without first deserializing and then immediately reserializing it?
Edit: I just found https://gist.github.com/wkentaro/2cd56593107c158e2e02 , which seems to solve half of this. It uses AnyMsg to avoid deserializing on the ROS subscriber side, but then when it republishes it still deserializes and immediately reserializes the message. Is what I'm asking impossible?
Just to close the loop on this, it turns out you can publish AnyMsgs, it's just that the linked examples chose not to.
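Along those lines, here is a minimal relay sketch that never deserializes (the topic names '/in' and '/out' are placeholders, and this assumes a running ROS1 master; treat it as a sketch rather than tested production code):

```python
#!/usr/bin/env python
# Raw topic relay sketch: republish the serialized bytes without deserializing.
# Topic names '/in' and '/out' are placeholders for this example.
import rospy
from rospy.msg import AnyMsg

pub = None

def callback(msg):
    global pub
    if pub is None:
        # The AnyMsg carries the raw serialized buffer; republishing the AnyMsg
        # re-sends those bytes verbatim, with no deserialize/reserialize step.
        pub = rospy.Publisher('/out', AnyMsg, queue_size=10)
    pub.publish(msg)

rospy.init_node('raw_relay')
rospy.Subscriber('/in', AnyMsg, callback)
rospy.spin()
```
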
I've been toying with Azure Data Lake Store, and in the documentation Microsoft claims that the system is optimized for low-latency small writes to files. Testing this out, I tried to perform a large number of writes from parallel tasks to a single file, but this method fails in most cases, returning a Bad Request. This link https://issues.apache.org/jira/secure/attachment/12445209/appendDesign3.pdf shows that HDFS isn't made to handle concurrent appends to a single file, so I tried a second time using the ConcurrentAppendAsync method found in the API; although that method doesn't crash, my file is never modified on the store.
What you have found out is correct about how parallel writes will work. I am assuming you have already read the documentation of ConcurrentAppendAsync.
So, in your case, did you use the same file for the WebHDFS write test and for ConcurrentAppendAsync? If so, ConcurrentAppendAsync will not work, as mentioned in the documentation, but you should have gotten an error in that case.
In any case, let us know what happened and we can investigate further.
Thanks,
Sachin Sheth
Program Manager - Azure Data Lake
I heard that tagging messages with references in Erlang, as shown here (see the part about references), will prevent the process from scanning the whole message queue when using "receive". Is that true?
Yes, see OTP-8623 in http://www.erlang.org/doc/apps/erts/notes.html#id65167