Is data representation typically part of a "distributed application middleware"? - middleware

I am currently building a lightweight application layer which provides distributed services to applications of a specific type. This layer provides synchronization and data transmission services to applications that use it via an API. I therefore classify this software as "middleware", since it bridges communication among heterogeneous distributed applications of a specific type. However, my software does not cover data representation. It "only" delivers messages to other applications in a synchronized manner, but does not specify what messages look like or how they should be parsed/read/interpreted. Instead, the developer decides which message format to use, e.g. JSON, XML, Protobuf, etc. The applications are most of the time governed by one developer party. Now, my question is whether this is a serious missing feature for the software to be classified as a "distributed application middleware". The aim of the software is to glue together heterogeneous software applications whose type cannot be compared to conventional software and which therefore need specific kinds of services (which prevents the user from "simply" using CORBA, etc.).
Thanks a lot!

Even though you leave the concrete message format open, you still have specified what formats (JSON, XML) can be used (whether hardcoded or by other means). Therefore in my opinion you have specified data representation.
If your software is modular in adding new formats, then that modularity itself is a feature (and not a lack of a feature).
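As a rough sketch of what "leaving data representation to the developer" can look like in practice, the layer only moves bytes and the application plugs in whatever format it has agreed on. The codec registry below is hypothetical, not part of any existing middleware:

    import json
    from typing import Any, Callable, Dict, Tuple

    # Hypothetical codec registry: the middleware itself only moves bytes around;
    # applications register whatever encode/decode pair (JSON, XML, Protobuf, ...)
    # they have agreed on.
    _codecs: Dict[str, Tuple[Callable[[Any], bytes], Callable[[bytes], Any]]] = {}

    def register_codec(name, encode_fn, decode_fn):
        _codecs[name] = (encode_fn, decode_fn)

    def encode(name, message):
        return _codecs[name][0](message)

    def decode(name, payload):
        return _codecs[name][1](payload)

    # The application, not the middleware, decides on the representation:
    register_codec("json",
                   lambda msg: json.dumps(msg).encode("utf-8"),
                   lambda raw: json.loads(raw.decode("utf-8")))

    payload = encode("json", {"sensor": 42, "value": 21.5})
    print(decode("json", payload))  # {'sensor': 42, 'value': 21.5}

Whether such a registry ships with a JSON codec by default or leaves it entirely to the application is exactly the modularity point made above.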

Related

When is it justified to use OPC UA, and OPC UA architectures over MQTT

I am new to OPC UA and I would like to clarify some doubts I have about it, which are as follows:
– In what situations is the use of OPC UA justified?
– How do OPC UA architectures work over MQTT?
If there is any document that explains these two points, I would be grateful.
OPC UA is probably the de-facto standard for industrial M2M communication and it is very important in the context of Industrie 4.0.
Let's say you have an industrial machine (like a PLC) that manages some others, such as sensors. With OPC UA you can model data into the PLC (which becomes an OPC UA server) using an information model (object-structured and hierarchical, with concepts similar to UML) built according to the rules defined by the OPC UA standard (https://opcfoundation.org/developer-tools/specifications-unified-architecture/part-3-address-space-model/). So the PLC first gathers data from these sensors using a specific industrial protocol, then models the data that is considered relevant in its address space.
You can also build an OPC UA server on the sensors themselves: imagine a temperature or humidity sensor in which you model not only the measured value, but also the manufacturer, the engineering unit (Fahrenheit or Celsius, for instance), and so on. You can also add methods to a server and associate specific actions with them, for example turning a specific functionality on or off when certain conditions occur. For all specifications you can look at https://opcfoundation.org/developer-tools/specifications-unified-architecture, where, after signing up, you can download the specifications in detail. Another good piece of documentation that I found is http://documentation.unified-automation.com/uasdkcpp/1.6.1/html/index.html, where the main concepts are explained.
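To make that concrete, here is a minimal, hedged sketch of such a sensor server using the python-opcua package (the endpoint, namespace URI and node names are invented for the example; a real deployment would typically use a vendor SDK and a richer information model):

    import time
    from opcua import Server  # assumes the python-opcua package is installed

    server = Server()
    server.set_endpoint("opc.tcp://0.0.0.0:4840/sensor/")          # invented endpoint
    idx = server.register_namespace("http://example.org/sensors")  # invented namespace URI

    # Model the temperature sensor: the measured value plus some metadata
    objects = server.get_objects_node()
    sensor = objects.add_object(idx, "TemperatureSensor")
    temperature = sensor.add_variable(idx, "Temperature", 21.5)
    sensor.add_variable(idx, "EngineeringUnit", "Celsius")
    sensor.add_variable(idx, "Manufacturer", "ExampleCorp")
    temperature.set_writable()  # allow clients to write the value as well

    server.start()
    try:
        while True:
            # pretend a new reading arrives from the underlying field protocol
            temperature.set_value(temperature.get_value() + 0.1)
            time.sleep(1)
    finally:
        server.stop()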
Once you have defined your OPC UA servers with an information model in their address space, you can start interacting with other industrial machinery in a standardized way. That machinery could be MES or HMI applications, and they have to be OPC UA clients. They can query the OPC UA servers mentioned above, browsing their address space, reading values, calling methods, and monitoring interesting variables or events (by subscribing to them, so that the server sends a notification when a change occurs). The main advantage is that all these operations are performed via standardized messages: if you want to write data you send a WriteRequest message, if you want to read the client sends a ReadRequest, and so on. Since everything is standardized (from data types to message serialization), all clients can understand the structure of OPC UA servers (even if they come from different manufacturers). Without that, every manufacturer could define services or variables in its own way, and you would have to build your application (say, an HMI) to fit that particular vendor's APIs or conventions.
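And a matching client-side sketch, again assuming python-opcua and the invented server above (namespace index 2 is assumed); reading, writing and subscribing here are mapped by the library onto the standardized Read/Write/CreateSubscription services mentioned above:

    import time
    from opcua import Client  # assumes the python-opcua package is installed

    class ChangeHandler:
        # called by the library whenever a monitored item changes
        def datachange_notification(self, node, val, data):
            print("new value:", val)

    client = Client("opc.tcp://localhost:4840/sensor/")  # endpoint of the sketch server above
    client.connect()
    try:
        root = client.get_root_node()
        # browse path into the server's address space (namespace index 2 assumed)
        temperature = root.get_child(["0:Objects", "2:TemperatureSensor", "2:Temperature"])

        print("current value:", temperature.get_value())        # Read service
        temperature.set_value(22.0)                              # Write service

        sub = client.create_subscription(500, ChangeHandler())   # CreateSubscription service
        sub.subscribe_data_change(temperature)                   # monitor the variable
        time.sleep(5)                                            # receive a few notifications
    finally:
        client.disconnect()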
Regarding OPC UA over MQTT, you can find some useful information in OPC UA protocol vs MQTT protocol. As I said before, OPC UA has the advantage of defining a structured, standard information model accessible via standard services, so MQTT is only one part of the whole.
Another good reference for understanding information models in OPC UA servers is OPC Unified Architecture.

What is the difference between data integration software and an ESB?

I have been working on a project which collects data from various third-party data sources and mines it into our data stores (DI). We have been using Pentaho for this.
I want to know if this can also be done with an ESB (Camel or Mule)?
And what other features does an ESB bring that DI does not offer?
I have read lots of articles on both ESB and DI, but none of them were able to resolve this query. I have also read about Mule data connectors for third-party data sources.
DI (Data Integration, not 'dependency injection') or ETL approaches tend to be long-running, batch-style jobs for moving data from System A to System B. The ESB or lightweight-integration approach generally breaks the task up into smaller pieces (blocks of data, or a single event per data item) and allows other systems to subscribe to the data stream, generally over an enterprise messaging system, without having to impact System A, System B, or the existing code. This also means that there is no human dependency requirement in the project plan: if System C comes along, it does not necessarily require resources from the System B team to access the data stream.
There are suitable use cases for both in any given environment. However, in my experience (and Big Data/MDM best practices tend to agree), if you have an originating stream of data, some other system will want to access that stream at some point as well. If the ability to access the data stream without having to change existing code, systems, or other teams within your organization sounds useful in your use case, then it would be a good idea to design for that up front and go with the ESB approach. This allows new interested consumers to come in without having to rewrite the process used by the existing systems. ESB/lightweight integration systems tend to support that design pattern more efficiently than DI/ETL tools.
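As a toy illustration of that decoupling, here is a sketch using ZeroMQ PUB/SUB via pyzmq (topic name and addresses invented): System A publishes one event per record, and a later System C can subscribe without System A or System B changing at all.

    import json
    import time
    import zmq  # assumes pyzmq

    ctx = zmq.Context.instance()

    # System A: publishes one small event per record instead of a nightly batch
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://*:5556")

    # System B (or, later, System C): subscribes without touching System A's code
    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://localhost:5556")
    sub.setsockopt(zmq.SUBSCRIBE, b"customers")
    time.sleep(0.5)  # crude wait for the subscription to propagate (slow-joiner issue)

    pub.send_multipart([b"customers", json.dumps({"id": 1, "name": "Alice"}).encode("utf-8")])

    topic, payload = sub.recv_multipart()
    print(topic, json.loads(payload))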
Some random thoughts:
ESBs support the 'one bad record' problem by allowing you to route it to an error queue so a human can look at it and then republish it
ETL/DI tend to have a straight-line, happy-path speed advantage
ETL/DI start getting complicated once you go past the simple point-to-point integration use case
IMHO: ESBs are better at supporting versioning of data sets, services and data models
ETL/DI tend to have more mature UIs for non-technical users to perform data mapping tasks
ESBs are really strong at supporting runtime decoupling of systems. If System B is down, the data just sits in a queue until it comes back up. No long-running blocking thread or risk of having to restart a job
ESBs have a slightly steeper ramp-up curve
ETL/DI generally leads to ESB eventually (most vendors offer both a DI and an ESB product)

Are the HL7-FHIR, HL7 CDA, CIMI, openEHR and ISO13606 approaches aiming to solve the same health data exchange problems?

FHIR, CDA, 13606, CIMI, and openEHR all offer partial and overlapping approaches to 'solving health data exchange problems'. They each have strengths and weaknesses, and they can work together as well as overlap with each other.
FHIR is an API exchange spec that's easy to adopt
CDA is a document format that's widely supported
CIMI is a community defining formal semantic models for content
openEHR provides agreed semantic models and an application infrastructure
13606 is for EHR extract exchange
CIMI is clearly an initiative to think about the content of archetypes and recurring patterns
FHIR is a specification for APIs, including a limited set of content models
openEHR is a community and an open-source specification with a standardised Reference Model, Archetype Object Model, Data types and Termlist
CEN/ISO 13606 is a community using its formal, public(?), CEN- and ISO-standardised Reference Model, Archetype Object Model, Data types and Termlist
The scopes of all of them overlap. The biggest overlap is between openEHR and 13606, and to a lesser extent with CIMI.
Two-level Modeling Paradigm: CIMI, openEHR and 13606 have a lot of interaction and all adhere to the two-level modeling paradigm.
Archetypes can be used by FHIR. CIMI is creating archetypes, as are the openEHR and 13606 communities.
I see a future for 13606 in the context of cloud EHRs, where the exact location of data is not always known; what matters is how to get access to it.
13606 can provide a standard for interfacing with the cloud, and provide features such as queries and requests for detailed information, instead of pre-cooked general-purpose message formats like patient summaries, etc.
Erik, you write: "I do not understand why you call the openEHR specification proprietary (it is CC-BY-ND licenced and freely available online) and you call the ISO 13606 more open (it is copyrighted and behind a paywall)"
The point is that in the case of an ISO standard, third parties should not claim IP. You must pay for the information, and you may not distribute the copyrighted text, but you may use the information without the risk of being confronted with excessive claims afterwards.
There is a policy on ISO deliverables regarding patents, which provides assurance against having to deal with excessive patent claims afterwards. For more information see:
http://isotc.iso.org/livelink/livelink/fetch/2000/2122/3770791/Common_Policy.htm?nodeid=6344764&vernum=-2
In summary: there can be IP claims on ISO deliverables, apart from the copyrighted text itself, but those claims must be handled in a non-discriminatory and reasonable way. So no excessive claims are possible.
In a patent-related legal case, the judge will find it important that the deliverable was published at ISO.
Two equitable defenses to patent infringement that may arise from a patent owner's delay in taking action are laches and equitable estoppel. Long delays give rise to a presumption that the delay is unreasonable, inexcusable, and prejudicial. This is certainly true when it concerns an ISO deliverable.
No such policy exists in the case of CC-BY-ND licenced work. That work gives no guarantee at all; the user of CC-BY-ND licenced work is not safe from claims.
Therefore it is important that AOM 2.0 be submitted to ISO. It can only be submitted to ISO in the context of the 13606 renewal. That is why the openEHR community, for its own sake, must work on a Reference Model-agnostic standard in all its parts, to help convince the ISO 13606 renewal committee to implement it.
AOM1.4 has been an ISO standard for years, so we can be pretty sure that there is no hidden IP on that.
I would say the only standard whose aim is NOT to solve data exchange problems is openEHR.
openEHR defines a complete EHR platform architecture to manage clinical data structure definitions (archetypes, templates), including constraints and terminology/translations; manage clinical information (canonical information model); access clinical information (the standard query language AQL); define rules for clinical decision support (the standard rule language GDL); and it defines a service model (a REST API is close to being approved).
So openEHR tries to solve all the interoperability problems that come before any data exchange but that are needed for exchanged data to be interpreted and used correctly; in short, openEHR enables interoperability but doesn't define how data is technically exchanged.

Servant and objects - relation

I have read a lot about servants and objects used in technologies such as ICE or CORBA. There are a lot of resources where I can read something like this:
One servant can handle multiple objects (for resource saving).
One object can be handled by multiple servants (for reliability).
Could somebody give me a real-life example for these two statements?
If I am not mistaken, this term was coined by Douglas Schmidt in his paper describing the Common Object Request Broker Architecture.
Here is a direct quote of a few definitions:
Object -- This is a CORBA programming entity that consists of an identity, an interface, and an implementation, which is known as a Servant.
Servant -- This is an implementation programming language entity that defines the operations that support a CORBA IDL interface. Servants can be written in a variety of languages, including C, C++, Java, Smalltalk, and Ada.
CORBA IDL stubs and skeletons -- CORBA IDL stubs and skeletons serve as the "glue" between the client and server applications, respectively, and the ORB.
ORB Interface -- An ORB is a logical entity that may be implemented in various ways (such as one or more processes or a set of libraries). To decouple applications from implementation details, the CORBA specification defines an abstract interface for an ORB. This interface provides various helper functions such as converting object references to strings and vice versa, and creating argument lists for requests made through the dynamic invocation interface described below.
CORBA
The Common Object Request Broker Architecture (CORBA) is a standard defined by the Object Management Group (OMG) designed to facilitate the communication of systems that are deployed on diverse platforms. CORBA enables collaboration between systems on different operating systems, programming languages, and computing hardware
So there are clients, servers, client and server proxies, and the ORB core. Client and server use proxies to communicate via the ORB core, which provides a mechanism for transparently communicating client requests to target object implementations. From the client's perspective, this makes calls on remote objects look as if the objects were in the local address space, and therefore simplifies the design of clients in a distributed environment.
Given all the above, a servant is an implementation that is the invocation target for remote client calls, and it abstracts the remote objects that are the actual targets.
As for your question, one servant can handle calls to multiple distributed objects which are encapsulated by the servant. Note that the client doesn't access these objects directly but goes through the servant.
One servant for multiple objects: for example, in a bank each bank account is an object, but you don't want to keep a servant in memory for every bank account, so you have one servant for all bank accounts.
One object handled by multiple servants is used for things like load balancing and fault tolerance. The client doesn't know which servant the call actually executes on.
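A plain-Python sketch of the bank example (the pattern only, not a real CORBA or ICE API): one servant instance in memory serving many account objects, dispatching on the object id that arrives with each request.

    # Illustration of the pattern only: one servant handling calls for many
    # "account" objects, keyed by the object id supplied with each request.
    # The dict stands in for the bank's database.

    class AccountServant:
        def __init__(self, storage):
            self._storage = storage  # balances keyed by object id

        def balance(self, object_id):
            return self._storage[object_id]

        def deposit(self, object_id, amount):
            self._storage[object_id] += amount

    accounts = {"acct-001": 100.0, "acct-002": 250.0}
    servant = AccountServant(accounts)   # a single servant in memory...
    servant.deposit("acct-001", 50.0)    # ...serving requests for any account object
    print(servant.balance("acct-001"))   # 150.0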

Middleware to build data-gathering and monitoring for a distributed system [closed]

I am currently looking for a good middleware to build a solution for a monitoring and maintenance system. We are tasked with the challenge of monitoring, gathering data from, and maintaining a distributed system consisting of up to 10,000 individual nodes.
The system is clustered into groups of 5-20 nodes. Each group produces data (as a team) by processing incoming sensor data. Each group has a dedicated node (blue boxes) acting as a facade/proxy for the group, exposing data and state from the group to the outside world. These clusters are geographically separated and may connect to the outside world over different networks (one may run over fiber, another over 3G/Satellite). It is likely we will experience both shorter (seconds/minutes) and longer (hours) outages. The data is persisted by each cluster locally.
This data needs to be collected (continuously and reliably) by external and centralized server(s) (green boxes) for further processing, analysis and viewing by various clients (orange boxes). Also, we need to monitor the state of all nodes through each group's proxy node. It is not required to monitor each node directly, even though it would be good if the middleware could support that (handling heartbeat/state messages from ~10,000 nodes). In case of proxy failure, other methods are available to pinpoint individual nodes.
Furthermore, we need to be able to interact with each node to tweak settings etc. but that seems to be more easily solved since that is mostly manually handled per-node when needed. Some batch tweaking may be needed, but all-in-all it looks like a standard RPC situation (Web Service or alike). Of course, if the middleware can handle this too, via some Request/Response mechanism that would be a plus.
Requirements:
1000+ nodes publishing/offering continuous data
Data needs to be reliably (in some way) and continuously gathered to one or more servers. This will likely be built on top of the middleware using some kind of explicit request/response to ask for lost data. If this could be handled automatically by the middleware this is of course a plus.
More than one server/subscriber needs to be able to be connected to the same data producer/publisher and receive the same data
Data rate is at most in the range of 10-20 messages per second per group
Message sizes range from maybe ~100 bytes to 4-5 kbytes
Nodes range from embedded constrained systems to normal COTS Linux/Windows boxes
Nodes generally use C/C++, servers and clients generally C++/C#
Nodes should (preferably) not need to install additional SW or servers, i.e. one dedicated broker or extra service per node is expensive
Security will be message-based, i.e. no transport security needed
We are looking for a solution that can handle the communication between primarily proxy nodes (blue) and servers (green) for the data publishing/polling/downloading and from clients (orange) to individual nodes (RPC style) for tweaking settings.
There seem to be a lot of discussions and recommendations for the reverse situation, distributing data from server(s) to many clients, but it has been harder to find information related to the situation described here. The general solution seems to be to use SNMP, Nagios, Ganglia, etc. to monitor and modify large numbers of nodes, but the tricky part for us is the data gathering.
We have briefly looked at solutions like DDS, ZeroMQ, RabbitMQ (broker needed on all nodes?), SNMP, various monitoring tools, Web Services (JSON-RPC, REST/Protocol Buffers) etc.
So, do you have any recommendations for an easy-to-use, robust, stable, light, cross-platform, cross-language middleware (or other) solution that would fit the bill? As simple as possible but not simpler.
Disclosure: I am a long-time DDS specialist/enthusiast and I work for one of the DDS vendors.
Good DDS implementations will provide you with what you are looking for. Collection of data and monitoring of nodes is a traditional use-case for DDS and should be its sweet spot. Interacting with nodes and tweaking them is possible as well, for example by using so-called content filters to send data to a particular node. This assumes that you have a means to uniquely identify each node in the system, for example by means of a string or integer ID.
Because of the hierarchical nature of the system and its sheer (potential) size, you will probably have to introduce some routing mechanisms to forward data between clusters. Some DDS implementations can provide generic services for that. Bridging to other technologies, like DBMS or web-interfaces, is often supported as well.
Especially if you have multicast at your disposal, discovery of all participants in the system can be done automatically and will require minimal configuration. This is not required though.
To me, it looks like your system is complicated enough to require customization. I do not believe that any solution will "fit the bill easily", especially if your system needs to be fault-tolerant and robust. Most of all, you need to be aware of your requirements. A few words about DDS in the context of the ones you have mentioned:
1000+ nodes publishing/offering continuous data
This is a big number, but should be possible, especially since you have the option to take advantage of the data-partitioning features supported by DDS.
Data needs to be reliably (in some way) and continuously gathered to one or more servers. This will likely be built on top of the middleware using some kind of explicit request/response to ask for lost data. If this could be handled automatically by the middleware this is of course a plus.
DDS supports a rich set of so-called Quality of Service (QoS) settings specifying how the infrastructure should treat the data it is distributing. These are name-value pairs set by the developer. Reliability and data availability are among the supported QoS policies. This should take care of your requirement automatically.
More than one server/subscriber needs to be able to be connected to the same data producer/publisher and receive the same data
One-to-many or many-to-many distribution is a common use-case.
Data rate is at most in the range of 10-20 messages per second per group
Adding up to a total maximum of 20,000 messages per second is doable, especially if data-flows are partitioned.
Message sizes range from maybe ~100 bytes to 4-5 kbytes
As long as messages do not get excessively large, the number of messages is typically more limiting than the total amount of kbytes transported over the wire -- unless large messages are of very complicated structure.
Nodes range from embedded constrained systems to normal COTS Linux/Windows boxes
Some DDS implementations support a large range of OS/platform combinations, which can be mixed in a system.
Nodes generally use C/C++, servers and clients generally C++/C#
These are typically supported and can be mixed in a system.
Nodes should (preferably) not need to install additional SW or servers, i.e. one dedicated broker or extra service per node is expensive
Such options are available, but the need for extra services depends on the DDS implementation and the features you want to use.
Security will be message-based, i.e. no transport security needed
That certainly makes life easier for you -- but not so much for those who have to implement that protection at the message level. DDS Security is one of the newer standards in the DDS ecosystem that provides a comprehensive security model transparent to the application.
Seems ZeroMQ will fit the bill easily, with no central infrastructure to manage. Since your monitoring servers are fixed, it's really quite a simple problem to solve. This section in the 0MQ Guide may help:
http://zguide.zeromq.org/page:all#Distributed-Logging-and-Monitoring
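As a rough sketch of how simple that collection path can be with pyzmq (addresses and message fields are invented; real code would add reconnection handling and persistence): each proxy node PUSHes its data and heartbeats to the fixed monitoring server, which PULLs everything from one well-known endpoint.

    import zmq  # assumes pyzmq

    ctx = zmq.Context.instance()

    # Monitoring server (green box): one well-known PULL endpoint, no broker needed
    collector = ctx.socket(zmq.PULL)
    collector.bind("tcp://*:5557")

    # Proxy node (blue box): connects out and pushes its group's data and heartbeats
    node = ctx.socket(zmq.PUSH)
    node.connect("tcp://localhost:5557")
    node.send_json({"group": "cluster-17", "type": "heartbeat", "ok": True})
    node.send_json({"group": "cluster-17", "type": "data", "values": [1.2, 3.4]})

    for _ in range(2):
        print("received:", collector.recv_json())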
You mention "reliability", but could you specify the actual set of failures you want to recover from? If you are using TCP, then the network is by definition "reliable" already.

Resources