We have a sensor that publishes data to a TCP socket.
How could we create an ingest rule in CrateDB to read from that specific socket?
Do we need to use MQTT to read data from the Socket and then publish it as topics so a CrateDB ingest rule can read it?
Such approach sounds inefficient. We would like to populate the table with data directly from the TCP socket. Is that possible?
No this is currently not possible inside CrateDB. Also please aware that the MQTT implementation at CrateDB is deprecated and will be removed in future versions, https://crate.io/docs/crate/reference/en/latest/admin/ingestion/sources/mqtt.html.
Main reasons are that the current implementation was very limited (e.g only implements MQTT Quality of Service (QoS) level one), using a dedicated MQTT ingest service is much more flexible in rule definition and protocol support, etc...
Related
I am quite confused about MQTT clustering, it doesn't seem to be part of the MQTT protocol and I was wondering if each MQTT broker implementation has its own way to implement it. Also, do you know which kind of information are shared between cluster nodes? It seems like it retains information related to the session related to pub/sub but not the messages, is that correct? Thanks!
No, there is nothing in the MQTT protocol about clustering brokers. There is support for bridging topics between 2 brokers, but this is purely at the message level, it carries no information about clients or sessions.
Any clustering is implemented independently by a given broker, what information shared would also be dependent on the that implementation. But would need to include the following:
Client Session information, including subscriptions
Message
Information about which messages have been delivered to what clients
I have a large amount of streaming IoT sensor data and I'm using QuestDB as a time series database. I'm curious whether it's better to send this over Postgres or via Influx line protocol if I have a lot of measurements. I would like to use Python for this, if possible, but it depends which performs better.
InfluxDB line protocol is designed for high-throughput ingestion. This interface is for write operations only, so you will still need to use PostgreSQL wire protocol or REST API for querying.
It's recommended then for your IoT / high-throughput scenario to use InfluxDB line protocol as this is the most efficient. There are Python examples for simple socket writes over TCP and UDP on the official documentation.
Just asking one silly question, hope someone can answer this.
I'm bit confused regarding MQTT broker. Basically, the confusion is, there are so many things being used for data storing, transfer and processing (like Flume, HDInsight, Spark etc). So, when and why I need to use one MQTT broker?
If I would like to use Windows 10 IoT application with HiveMQ, from where can I get the details? how to use it? How I get benefit out of this MQTT broker? Can I not send data from my IoT application directly using Azure or HDFS? So, how MQTT broker fits into it or helping me to achieve something?
I'm new to all these and tried to find some tutorials, however, I'm not getting anything proper. Please explain it in more details or give some tutorials for this?
MQTT is a client-server protocol for pub-sub based transport that has a comparatively small overhead, and thus applicable to mobile and IoT applications (unlike Flume, etc.). The MQTT broker is basically a server that handles messaging to/from MQTT clients and among them. The functionality pretty much stops at the transport layer, even though various MQTT add-ons exist.
If you are looking to implement a solution that would reliably transfer data from your IoT devices to the back-end system for processing, I would suggest you take a look into Kaa open-source IoT platform. It goes much further than MQTT by providing not only the transport layer, suitable for low-power IoT devices, but also a solid chunk of the application level logic (including the object bindings for your application-level data structures, temporary data persistence, etc.).
Here is a link to a webinar that explains how to build a scalable IoT analytics system with Kaa and Spark in less than an hour.
This is an architectural choice. IoT applications are possible without MQTT but there are some advantages when using MQTT. If you are completely new to MQTT, take a look at this in-depth MQTT series: http://forkbomb-blog.de/2015/all-you-need-to-know-about-mqtt
Basically the main architectural advantage is publish / subscribe designed for low-latency, high throughput (mobile) communication with minimal protocol overhead (which is important if bandwidth is at a premium). You can completely decouple consumers and producers.
HDFS is the (distributed) Hadoop file system and is the foundation for Map / Reduce processing. It is not comparable to a MQTT broker. The MQTT broker could write to the HDFS, though (in case of HiveMQ with a custom plugin).
Basically MQTT is a protocol while the products you are mentioning are, well, products which solve completely different problems:
Flume is basically used for log aggregation at scale. You won't use MQTT for that, at least there is not too much advantage because this is typically done in backend applications.
Spark and Hadoop shine at Big Data crunching. They are a framework and not a ready to use solution. They are not really comparable to MQTT. Often MQTT brokers like HiveMQ are used in conjunction with these, Spark / Hadoop for data processing and HiveMQ for communication.
I hope this helps you getting started. Best would be to read about typical use cases of all these technologies, this is a bit too broad for a single SO answer.
MQTT is a data transport, so the usual thing I have to compare it with is HTTP. HTTP has two important characteristics, a) It goes from one point to another, b) It is request/response, so only one end can start a data transfer. MQTT connects many end points to many end points, and either end can start a data transfer. So, if you have just one device and only one service or person that will ever access it, and only by polling, then HTTP is great. MQTT means many devices can post data to many services or people, AND the other way around. Your question assumes that your data is always going to land up in some sort of data store, but many interactions are about events and responding to them immediately, like ringing a doorbell, or lowering the landing gear. In these cases you will often want to both record the data, and have an immediate action occur, like your phone making a doorbell noise.
Finally, you send data to MQTT semantically, rather than by IP address.
This means that your services subscribes to /mikeshouse/doorbell rather than polling 192.168.22.4, which is a huge gain once you have a number of devices.
I'm glossing over their documentation here :
http://www.rubydoc.info/github/github/statsd-ruby/Statsd
And there's methods for recording data, but I can't seem to find anything about retrieving recorded data. I'm adopting a projecting with an existing statsd addition. It's host is likely a defunct URL. Perhaps, is the host where those stats are recorded?
The statsd server implementations that Mircea links just take care of receiving, aggregating metrics and publishing them to a backend service. Etsy's statsd definition (bold is mine):
A network daemon that runs on the Node.js platform and listens for
statistics, like counters and timers, sent over UDP or TCP and sends
aggregates to one or more pluggable backend services (e.g.,
Graphite).
To retrieve the recorded data you have to query the backend. Check the list of available backends. The most common one is Graphite.
See also this question: How does StatsD store its data?
There are 2 parts to statsd: a client and a server.
What you're looking at is the client part. You will not see functionality related to retrieving the data as it's not there - it normally is on the server side.
Here is a list of statsd server implementations:
http://www.joemiller.me/2011/09/21/list-of-statsd-server-implementations/
Research and pick one that fits your needs.
Statsd originally started at etsy: https://github.com/etsy/statsd/wiki
I want to implement a demo application to listen data via TCP/IP.
Data Transmitter will transmit a series or ASCII char or a series of string all the time. It feeds data into TCP/IP address (eg. 127.0.0.1:22) This could be a GPS transmitter.
I want to implement a demo application for receiving data by clicking the start button and listening to the data via TCP/IP and display it accordingly.
Correct me if I am wrong, I don't think I can use Server/Client server for this purpose. I tried to create a client application with TIdTcpClient, it receives only one time data. I don't think Indy has a TCP listening component.
Thanks in advance.
If you wanna monitor network comunications between some device and some other program on your computer using of TIdTCPServer won't work. Why? Once Indy will read network data it will mark it as processed and delete it from network buffer. So that data probably won't even reach to the other program on your computer. Workaround for this is that you design your application to actually work similar as network bridge. Your application listens to the data on one port and then forwards that data on another port on which the other program is listening. But the main problem is that you have to make this to work both ways.
What you need is somekind of a component which is able to peek at the network data but don't interact with it. This is usually done on driver level.
Now if it is not abolutely necessary to have such functionality in your own software but you are only interested in getting the data I recomend you try Wireshark (http://www.wireshark.org/). Wireshark is a verry powerfull freware software which alows you to monitor all netwrok traffic on basically all protocols without causing any interuptions. In order for this software to work it instals special driver which serves for intercepting the network data.
Maybe you would want to use same driver in your application if this functionality needs to be in your application.
Based on your diagram I think that your implementation could also be based on a message-oriented middleware, using a message broker which receives the GPS transmitter or other data.
The message broker would then store the data internally and forward it to all interested clients which are connected. A typical messaging pattern in this case is a "Topic", which broadcasts the messages similar to a radio station.
So the middleware will ensure that the information will be collected (optionally also persisted to disk) and then guarantees the delivery to the receivers. This can be done even in a way where receivers which have been off-line for a while still receive the GPS messages created while they where not listening ('retroactive consumers').
There are many popular free open source message brokers, and most of them also can be used with Delphi.