Exposing a library via ZeroMQ

I want to know the best way to expose a library via ZeroMQ. Say I install a machine learning library (mll) on one machine, and I have a ZeroMQ broker running on another. If I have a ZeroMQ client that needs to call functions within the mll, how can it do so via the broker?
I would like to know the steps needed to make this work for libraries in a generic way.

Basically you need to have a "listener" that picks up data from ZMQ and feeds it to your machine-learning backend code, then transmits the results back to the requestor.
There are a lot of design choices to be made, such as what format to use to serialize data between client and server (JSON? YAML? Pickle? Thrift? ...), and how to encode requests and request options. But all things considered, this is a pretty straightforward use of ZMQ.
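As a minimal sketch of such a listener, assuming JSON as the wire format and a request shape of {"func": ..., "args": [...], "kwargs": {...}} (both are illustrative choices, not something ZMQ prescribes), the dispatch step can be kept separate from the transport. The REGISTRY contents, the port 5555 and the function names below are hypothetical; serve() needs the pyzmq package.

```python
import json
import math

# Hypothetical registry: only functions listed here can be called remotely.
# In practice these would be the entry points of the wrapped library.
REGISTRY = {
    "sqrt": math.sqrt,
    "mean": lambda values: sum(values) / len(values),
}


def handle_request(raw: bytes) -> bytes:
    """Decode one JSON request, call the named function, encode the reply."""
    try:
        request = json.loads(raw.decode("utf-8"))
        func = REGISTRY[request["func"]]
        result = func(*request.get("args", []), **request.get("kwargs", {}))
        reply = {"ok": True, "result": result}
    except Exception as exc:  # report failures to the caller instead of crashing
        reply = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return json.dumps(reply).encode("utf-8")


def serve(endpoint: str = "tcp://*:5555") -> None:
    """Blocking request/reply loop over a ZMQ REP socket (assumed endpoint)."""
    import zmq  # imported lazily so handle_request works without pyzmq

    context = zmq.Context()
    socket = context.socket(zmq.REP)
    socket.bind(endpoint)
    while True:
        socket.send(handle_request(socket.recv()))
```

Because the registry is an explicit whitelist, exposing another library generically mostly means adding entries to it. If a separate broker sits between client and listener, the listener would typically connect to the broker rather than bind its own endpoint.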
The problem comes when you want a more feature-rich, complete, and robust design: things like multi-threaded or multi-process servers, multi-machine scalability, secure user/request authentication and authorization, job reporting and dashboards, or job checkpointing. All those "extras" are common "network job scheduler" or "(enterprise) message broker" functions that tend to come out of the box with packages like Celery or RQ.
If you don't want to go the full "message broker middleware" route, you might start by examining others' designs for lightweight ZMQ-based job brokers, such as this one from Jeff Knupp.

Related

Monitor the number of requests openstack4j makes

Jenkins's openstack-plugin uses openstack4j for talking to an OpenStack cloud. I'm looking for a way to monitor the number of HTTP(S) API calls openstack4j makes, from a client-side perspective.
Some possible things to know:
Can Jenkins tell me that? (Although I believe openstack4j makes the HTTP(S) calls independently.)
It's running inside a container; are there HTTPS call monitoring tools that I could use at that level?
Regarding your questions:
I don't think Jenkins can do this monitoring for you; in the end, it's just a big, distributed job scheduler and runner. Unless a plugin has been written specifically for this, it can't. You'd have to write one yourself.
Regarding the monitoring, there's a bunch of questions to answer, actually:
Do you want just a Java based solution?
Surprisingly, I couldn't find anything Java based. The standard Java Management Extensions (JMX) apparently do not have direct support for investigating a process's open network connections.
If it doesn't have to be Java-specific, you could use tcpdump or tshark to analyze the traffic, as long as you know where the calls go, for example.
Another generic Linux based alternative is to launch the process through strace. You might need to make some adjustments for Java.
Is the connection HTTP or HTTPS (it matters a lot)?
For HTTPS, one option would be to man-in-the-middle the HTTPS connection with some sort of proxy. Then you can just check the logs of the proxy for the connections.

What is the recommended solution for monitoring heterogeneous infrastructure?

I am looking for a monitoring tool for the following use cases:
Collect basic metrics about virtual machines (CPU usage, memory usage, I/O, available space)
Extract metrics from SQL Server (probably by running some queries)
Extract information about processing from an external service, i.e. how many processes are currently running and for how long. I am thinking about writing Python scripts, but don't know how to combine them with a monitoring tool
Have the ability to plot charts and manage alerts; it would be nice to be able to send not only mail but also messages to Slack/MS Teams.
I was thinking about Prometheus, because it has wmi_exporter, node_exporter, sql exporter, and Alertmanager with the possibility to send notifications to multiple destinations, but I don't know what to do with this external service and the Python scripts.
Any suggestions?
Prometheus can definitely do what you say you need done. Some of it may not be trivial, but you can definitely fill in the blanks yourself.
E.g. you can get machine metrics basically out of the box by firing up a node_exporter and having it scraped by Prometheus, but I don't think it has e.g. information on all running processes. The latter might require you to write an agent/exporter: a simple web server that exposes metrics on /metrics; there exists a Python client library to help with that. Or have said processes (assuming they're your code) push metrics to a Pushgateway instead, if they're short-lived batch jobs.
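To make the "simple web server that exposes metrics on /metrics" idea concrete, here is a rough sketch using only the Python standard library. The metric names, the collect_job_stats() placeholder and port 8000 are all made up for illustration; in practice the Python client library mentioned above would normally take care of the formatting.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def collect_job_stats() -> dict:
    """Placeholder for the query against the external service (hypothetical)."""
    return {"running": 3, "oldest_age_seconds": 125.0}


def render_metrics(stats: dict) -> str:
    """Render the stats as two gauges in the Prometheus text exposition format."""
    lines = [
        "# HELP external_jobs_running Number of jobs currently running.",
        "# TYPE external_jobs_running gauge",
        f"external_jobs_running {stats['running']}",
        "# HELP external_jobs_oldest_age_seconds Age of the longest-running job.",
        "# TYPE external_jobs_oldest_age_seconds gauge",
        f"external_jobs_oldest_age_seconds {stats['oldest_age_seconds']}",
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect_job_stats()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve /metrics until interrupted (port is an arbitrary choice)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and adding the host and port as a scrape target would let Prometheus pull these values on its usual schedule; the Python script that talks to the external service goes where collect_job_stats() is.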
Oh, and for charts/dashboards you probably want Grafana, as Prometheus' abilities in that area are rather limited and Grafana integrates rather well with Prometheus.

Should an iPhone app communicate directly with a Cassandra backend?

Obviously there are multiple steps and phases of implementing such a thing.
I was thinking I would eventually have a web server that takes HTTP JSON requests from the iOS app, queries the Cassandra backend, and sends the results back. I could still load balance and all that fancy stuff, provide a logical layer on the server side, and keep the client app lightweight.
I'm not sure I understand how Cassandra clients fit in, though. It seems like the Cassandra Objective-C client could eliminate the need for the above approach.
I saw another question and answer, but it wasn't clear, perhaps because it varies with the need.
An iPhone app should not directly connect to a Cassandra backend or any other DB store.
First of all, talking to a database often requires adapting a very specific binary protocol (for Cassandra in particular, binary CQL or Thrift). Writing an adapter that would let your Objective-C app communicate in this binary protocol is a major piece of work, and could easily cost more than the rest of your app in effort. If you hide the DB behind a web-server, however, you will be able to select from a variety of existing adapters available in different server-side languages, meaning that you don't need to redo all that low-level work. You'll only be responsible for a relatively small piece of server-side code that would translate your REST queries and forward them to one of the Cassandra adapters (which expose easy-to-use interfaces).
Secondly, if you wanted to connect to a remote database from the phone, your database server would have to open its ports to the internet at large, which is a very bad security practice, even if you use SSL and user credentials. Again, if you hide behind a web server, you will be putting in a layer of technology that has evolved for decades to remain secure on the public internet.
Finally, having your phone talk to Cassandra directly is a poor architectural pattern. When you write apps that communicate on the internet, you want them to know as little as possible about each other, only how to talk to each other (preferably in a standard protocol). That way you can replace or upgrade individual components while keeping everything else the same. This may not sound like a lot, but is actually the main reason why phones, or web browsers, don't directly talk to databases. (If this setup were a good idea in principle, the first two problems could be easily solved given enough engineering effort.)
The approach you first suggested with JSON and the web server is the only correct way to go.
Use something like a RESTful API; there are many reasons for that.
If your servers' IP addresses change, you have to update all clients. If you add more nodes, you have to update all clients. If you upgrade Cassandra and some functions change, your clients will break and you have to update all of them.
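A rough sketch of that thin server-side layer, with the HTTP framework left out: the app only ever sees a URL and a JSON body, and the store behind it can change freely. InMemoryStore is a stand-in for a real server-side Cassandra adapter, and the /users/<id> route and all names here are hypothetical.

```python
import json


class InMemoryStore:
    """Stand-in for a server-side Cassandra adapter (hypothetical interface)."""

    def __init__(self):
        self._users = {"42": {"id": "42", "name": "Ada"}}

    def get_user(self, user_id):
        return self._users.get(user_id)


def handle_get(path: str, store) -> tuple:
    """Map 'GET /users/<id>' to a store lookup and return (status, JSON body)."""
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "users":
        return 404, json.dumps({"error": "unknown resource"})
    user = store.get_user(parts[1])
    if user is None:
        return 404, json.dumps({"error": "user not found"})
    return 200, json.dumps(user)
```

Swapping InMemoryStore for an object backed by a real driver, moving nodes, or upgrading the database would change only the server; the request and response shapes the phone depends on stay the same.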

Why or when should I use message queues such as RabbitMQ, ZeroMQ in Erlang?

Hello awesome Erlang community!
I'm making a little project that contains a Client and a Backend. (Complicated.. right?) :)
I'm making it in Erlang.
The client and backend will be two separate processes and I'm wondering if I would need to (or should I) use some sort of message queue to get them to interact?
I know I can get them to interact using their PIDs and send messages using the "!" operator.
I guess what I'm trying to say is I'm struggling with finding an answer for this question:
"Why or when should I use message queues such as RabbitMQ, ZeroMQ in Erlang"?
You want to use a messaging library when you need something that the native message passing facility won't provide.
These include:
If you need to guarantee that your messages are processed at least once, exactly once etc. (i.e. transaction)
If your system load is such that it would be convenient if you could hold your messages on disk instead of memory (persistence)
You need other bells and whistles like security, interop with other systems, complex messaging pattern (routing) etc.
I would go for a messaging component when you need to decouple the different layers of your system. A messaging component also lets you apply different integration patterns to your messages/requests, such as topic, fanout, or routing based on headers.
A messaging system is also used for scalability purposes, so you can have multiple instances of the same process running simultaneously, consuming from the same queue.
The last thing I want to mention is that RabbitMQ is a message broker but ZeroMQ is not; it is a messaging library.
If you can sacrifice reliability for performance, use ZeroMQ.
If you need reliability (message persistence, etc.) and can give up some performance, use a brokered solution like RabbitMQ.

What is the most common approach for designing large scale server programs?

Ok I know this is pretty broad, but let me narrow it down a bit. I've done a little bit of client-server programming but nothing that would need to handle more than just a couple clients at a time. So I was wondering design-wise what the most mainstream approach to these servers is. And if people could reference either tutorials, books, or ebooks.
Haha ok, that didn't really narrow it down. I guess what I'm looking for is a simple but literal example of how the server-side program is set up.
The way I see it: client sends command: server receives command and puts into queue, server has either a single dedicated thread or a thread pool that constantly polls this queue, then sends the appropriate response back to the client. Is non-blocking I/O often used?
I suppose just tutorials, time and practice are really what I need.
*EDIT: Thanks for your responses! Here is a little more of what I'm trying to do I suppose.
This is mainly for the purpose of learning so I'd rather steer away from use of frameworks or libraries as much as I can. Take for example this somewhat made up idea:
There is a client program that does some function and constantly streams the output to a server (there can be many of these clients). The server then creates statistics and stores most of the data. And let's say there is an admin client that can log into the server; if any clients are streaming data to the server, it in turn would stream that data to each of the connected admin clients.
This is how I envision the server program logic:
The server would have three threads for managing incoming connections (one for each port it listens on), each spawning a thread to manage every connection:
1) ClientConnection, which would basically just receive output, which we'll just say is text
2) AdminConnection, which would be for sending commands between server and admin client
3) AdminDataConnection, which would basically be for streaming client output to the admin client
When data comes in from a client, the server parses what is relevant and puts that data in a queue, let's say adminDataQueue. In turn, there is a thread that watches this queue and every 200ms (or whatever) checks whether there is data; if there is, it cycles through the AdminDataConnections and sends it to each.
Now for the AdminConnection, this would be for any commands or direct requests of data. So you could request for statistics, the server-side would receive the command for statistics then send a command saying incoming statistics, then immediately after that send a statistics object or data.
As for the AdminDataConnection, it is just the output from the clients with maybe a few simple commands intertwined.
Aside from the bandwidth concerns and the logical problem of all the client data being funneled together to each of the admin clients, what sort of problems would arise from this design due to scaling issues (again neglecting bandwidth between clients and server, and between admin clients and server)?
There are a couple of basic approaches to doing this.
Worker threads or processes. Apache does this in most of its multiprocessing modes. In some versions of this, a thread or process is spawned for each request when the request arrives; in other versions, there's a pool of waiting threads which are assigned work as it arrives (avoiding the fork/thread create overhead when the request arrives).
Asynchronous (non-blocking) I/O and an event loop. This is basically using the UNIX select call (although FreeBSD and Linux provide more optimized alternatives, kqueue and epoll respectively). lighttpd uses this approach and is able to achieve very high scalability, but any in-server computation blocks all other requests. Concurrent dynamic request handling is passed on to separate processes (via CGI) or waiting processes (via FastCGI or its equivalent).
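For the first approach, a pool of waiting worker threads fed from a queue might look roughly like this. It is a learning sketch using only the standard library, not how Apache does it; run_pool and its parameters are names chosen for the example.

```python
import queue
import threading


def run_pool(requests, handler, num_workers=4):
    """Process requests with a fixed pool of worker threads; results keep input order."""
    tasks = queue.Queue()
    results = [None] * len(requests)

    def worker():
        while True:
            item = tasks.get()  # blocks until work arrives, no busy polling
            if item is None:    # sentinel: no more work for this worker
                break
            index, request = item
            results[index] = handler(request)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for item in enumerate(requests):
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results
```

For example, run_pool(["a", "b", "c"], str.upper, num_workers=2) returns ["A", "B", "C"]. Note that the workers block on the queue rather than checking it on a timer, which avoids both wasted wakeups and added latency.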
I don't have any particular references handy to point you to, but looking at the web sites of open source projects that use the different approaches for information on their design wouldn't be a bad start.
In my experience, building a worker thread/process setup is easier when working from the ground up. If you have a good asynchronous framework that integrates fully with your other communications tasks (such as database queries), however, it can be very powerful and frees you from some (but not all) thread locking concerns. If you're working in Python, Twisted is one such framework. I've also been using Lwt for OCaml lately with good success.
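As a framework-free illustration of the event-loop approach described above, the standard library's selectors module wraps select, epoll or kqueue depending on the platform. This is a simplified sketch: the upper-casing "protocol" and the function names are invented for the example, and a real server would buffer outgoing data instead of calling sendall on a non-blocking socket.

```python
import selectors
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a non-blocking listening socket (port 0 lets the OS pick one)."""
    listener = socket.socket()
    listener.bind((host, port))
    listener.listen()
    listener.setblocking(False)
    return listener


def run_once(selector, listener, timeout=1.0):
    """Handle every socket that is ready: accept new clients or echo data upper-cased."""
    for key, _ in selector.select(timeout):
        sock = key.fileobj
        if sock is listener:
            conn, _ = listener.accept()
            conn.setblocking(False)
            selector.register(conn, selectors.EVENT_READ)
        else:
            data = sock.recv(4096)
            if data:
                sock.sendall(data.upper())  # simplification: assumes the write fits
            else:  # empty read means the peer closed the connection
                selector.unregister(sock)
                sock.close()
```

A server would register the listener with selectors.DefaultSelector() and call run_once in a loop. One thread serves every connection, which is why any slow computation inside the loop stalls all clients.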

Resources