Kafka sink to InfluxDB - MQTT

I'm trying to get data from my Kafka topic into InfluxDB using the Confluent/Kafka stack. At the moment, the messages in the topic have the form {"tag1":"123","tag2":"456"} (I have relatively good control over the message format; I chose the JSON above, but could include a timestamp etc. if necessary).
Ideally, I would like to add many tags without needing to specify a schema/column names in the future.
I followed https://docs.confluent.io/kafka-connect-influxdb/current/influx-db-sink-connector/index.html (the "Schemaless JSON tags example") as this matches my use case quite closely. The "key" of each message is currently just the MQTT topic name (the topic's source is an MQTT connector), so I set "key.converter" to the StringConverter (instead of the JsonConverter used in the example).
Other examples I've seen online seem to suggest that a schema needs to be set, which I'd like to avoid. I'm using InfluxDB v1.8, with everything on Docker and maintained in Portainer.
I cannot seem to start the connector, and no data ever moves across.
Below is the config for my InfluxDBSink Connector:
{
  "name": "InfluxDBSinkKafka",
  "config": {
    "key.converter.schemas.enable": "false",
    "value.converter.schemas.enable": "false",
    "name": "InfluxDBSinkKafka",
    "connector.class": "io.confluent.influxdb.InfluxDBSinkConnector",
    "tasks.max": "1",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "topics": "KAFKATOPIC1",
    "influxdb.url": "http://URL:PORT",
    "influxdb.db": "tagdata",
    "measurement.name.format": "${topic}"
  }
}
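For what it's worth, the same config can also be submitted and inspected through the Connect REST API rather than the UI; a minimal sketch, assuming Connect's REST interface is exposed on localhost:8083 and the JSON above is saved as influxdb-sink.json:

import json
import requests

# assumption: Kafka Connect's REST API is reachable on localhost:8083
connect = "http://localhost:8083"

# create the connector from the JSON shown above
config = json.load(open("influxdb-sink.json"))
requests.post(f"{connect}/connectors", json=config)

# check the connector and task state
print(requests.get(f"{connect}/connectors/InfluxDBSinkKafka/status").json())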
The connector fails, and each time I click "start" (the play button) the following pops up in the connect container's logs:
[2022-03-22 15:46:52,562] INFO [Worker clientId=connect-1, groupId=compose-connect-group]
Connector InfluxDBSinkKafka target state change (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
[2022-03-22 15:46:52,562] INFO Setting connector InfluxDBSinkKafka state to STARTED (org.apache.kafka.connect.runtime.Worker)
[2022-03-22 15:46:52,562] INFO SinkConnectorConfig values:
config.action.reload = restart
connector.class = io.confluent.influxdb.InfluxDBSinkConnector
errors.deadletterqueue.context.headers.enable = false
errors.deadletterqueue.topic.name =
errors.deadletterqueue.topic.replication.factor = 3
errors.log.enable = false
errors.log.include.messages = false
errors.retry.delay.max.ms = 60000
errors.retry.timeout = 0
errors.tolerance = none
header.converter = null
key.converter = class org.apache.kafka.connect.storage.StringConverter
name = InfluxDBSinkKafka
predicates = []
tasks.max = 1
topics = [KAFKATOPIC1]
topics.regex =
transforms = []
value.converter = class org.apache.kafka.connect.json.JsonConverter
(org.apache.kafka.connect.runtime.SinkConnectorConfig)
[2022-03-22 15:46:52,563] INFO EnrichedConnectorConfig values:
config.action.reload = restart
connector.class = io.confluent.influxdb.InfluxDBSinkConnector
errors.deadletterqueue.context.headers.enable = false
errors.deadletterqueue.topic.name =
errors.deadletterqueue.topic.replication.factor = 3
errors.log.enable = false
errors.log.include.messages = false
errors.retry.delay.max.ms = 60000
errors.retry.timeout = 0
errors.tolerance = none
header.converter = null
key.converter = class org.apache.kafka.connect.storage.StringConverter
name = InfluxDBSinkKafka
predicates = []
tasks.max = 1
topics = [KAFKATOPIC1]
topics.regex =
transforms = []
value.converter = class org.apache.kafka.connect.json.JsonConverter
(org.apache.kafka.connect.runtime.ConnectorConfig$EnrichedConnectorConfig)
I am feeling a little out of my depth and would appreciate any and all help.

The trick here is getting the data into Kafka in the right format in the first place. My MQTT source connector needed its value converter set to ByteArray, with a schema URL and schema = true; the InfluxDB sink then started working once I used the JsonConverter with schema = false. This is deceptive because the message queue looks the same with different value converters on the MQTT source connector, so it took a while to figure out that this was the problem.
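For illustration only, the relevant part of the MQTT source connector config ended up looking roughly like the sketch below; the connector name, broker URI, and MQTT topic are placeholders, and the property names are from the Confluent MQTT source connector as I remember them, so double-check them against the docs rather than copy-pasting:

{
  "name": "MQTTSourceKafka",
  "config": {
    "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
    "mqtt.server.uri": "tcp://BROKER:1883",
    "mqtt.topics": "MQTTTOPIC1",
    "kafka.topic": "KAFKATOPIC1",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter"
  }
}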
After getting this working, and realising the Confluent stack was perhaps a little overkill for this task, I went with the (much) easier route of pushing MQTT directly to Telegraf and having Telegraf push into InfluxDB. I would recommend this.
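For reference, the Telegraf route only needs a couple of config blocks; this is a sketch, with the broker address and MQTT topic as placeholders:

[[inputs.mqtt_consumer]]
  servers = ["tcp://BROKER:1883"]   # placeholder broker address
  topics = ["MQTTTOPIC1"]           # placeholder MQTT topic
  data_format = "json"              # parse the {"tag1":"123", ...} payloads

[[outputs.influxdb]]
  urls = ["http://URL:PORT"]        # same InfluxDB URL as in the sink config
  database = "tagdata"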

Related

How can I maximize throughput in Docker and Akka HTTP?

I am building a specific jig for performance measurement. I have a load generator, boom (https://github.com/rakyll/boom). With this I can generate a pretty decent amount of load.
I also have a Docker image containing nginx as a load balancer, and two Akka-HTTP based REST servers. These do nothing except count hits (they always just return 200).
Running one of these servers stand-alone (outside Docker) I have been able to get 1000 hits/second. Not sure if that's good or not. In this Docker configuration that figure drops to about 220 hits/second. I was expecting something closer to 2000 hits/second, and higher would be even better; I'd be happy if I could find a way to get 3-4K hits/sec with this arrangement.
I often get an error message like this:
[9549] Get http://192.168.99.100:9090/dispatcher?reply_to=foo: dial tcp 192.168.99.100:9090: socket: too many open files
I tried running my Docker container with --ulimit nofile=2048, but that didn't help. My application.conf for Akka is merely:
akka {
  loglevel = "ERROR"
  stdout-loglevel = "ERROR"
  http.host-connection-pool.max-open-requests = 512
}
The server code:
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.StatusCodes
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer
import scala.concurrent.duration._

object Main extends App {
  implicit val system = ActorSystem()
  implicit val mat = ActorMaterializer()
  import system.dispatcher // execution context for scheduleOnce

  println(":: Starting Simulator on port " + args(0))
  Http().bindAndHandle(route, java.net.InetAddress.getLoopbackAddress.getHostAddress, args(0).toInt)

  var hits = 0
  var isTiming = false
  var numSec = 1

  lazy val route =
    get {
      path("dispatcher") {
        // count a hit while a timing window is open
        if (isTiming) hits += 1
        complete(StatusCodes.OK)
      } ~
      path("startTiming" / IntNumber) { sec =>
        // open a timing window of `sec` seconds
        isTiming = true
        hits = 0
        numSec = sec
        val timeUnit = FiniteDuration(sec, SECONDS)
        system.scheduler.scheduleOnce(timeUnit) { isTiming = false }
        complete(StatusCodes.OK)
      } ~
      path("tps") {
        // report hits/second, doubled to account for the second (unmeasured) server
        val tps = hits / numSec * 2
        complete(s"""${args(0)}: TPS-$tps\n""")
      }
    }
}
Theory of operation: start traffic flowing, then call the /startTiming/10 endpoint (for a 10-second capture on one of the two servers). After 10 seconds, call /tps a couple of times and the timing node will return approximately hits/second (x2).
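A small driver for that sequence might look like the sketch below (this assumes the Python requests library and that one of the two backends is reachable directly; the host and port are just illustrative):

import time
import requests

backend = "http://192.168.99.100:9091"      # hypothetical: one Akka server reached directly

requests.get(f"{backend}/startTiming/10")   # open a 10-second timing window
time.sleep(11)                              # keep boom running against the nginx front end meanwhile
print(requests.get(f"{backend}/tps").text)  # approx. hits/second across both servers (x2)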
Any idea how I can get more performance out of this?

Erlang supervisor dynamic change to restart intensity

My question is, can one modify the restart intensity thresholds of an already running supervisor, apart from in a release upgrade scenario, and if so, how?
It's never come up before, but I'm running a supervisor with initially no children, so that another process starts children by way of supervisor:start_child/2; my supervisor's init/1 looks like this:
init([]) ->
    RestartSt = {simple_one_for_one, 10, 10},
    ChSpec = [{foo, {foo,start_link,[]}, transient, 1000, worker, [foo]}],
    {ok, {RestartSt, ChSpec}}.
At the time of supervisor start, the likely number of children is unknown; certainly it could vary dramatically from 10, to 10,000, or more.
A restart intensity of say 20 is generous enough for 10 children, but for say 10,000 children I would like to be able to increase it... and decrease it as the number of children drops due to normal terminations.
There's no API for doing this, so I believe you're stuck with the upgrade approach unless you want to propose a new API for this to the OTP team by submitting a pull request providing a complete patch with code changes, new tests, and documentation changes.
There's also a really dirty hack way of doing this that involves manipulating internal supervisor state, and so it's absolutely not something I would recommend for a production system but I think it's still interesting to look at. A supervisor stores restart intensity in its internal loop state. You can see this state by calling sys:get_state/1,2 on a supervisor process. For example, here's the state of a supervisor in the Yaws web server:
1> rr(supervisor).
[child,state]
2> sys:get_state(yaws_sup).
#state{name = {local,yaws_sup},
strategy = one_for_all,
children = [#child{pid = <0.67.0>,name = yaws_sup_restarts,
mfargs = {yaws_sup_restarts,start_link,[]},
restart_type = transient,shutdown = infinity,
child_type = supervisor,
modules = [yaws_sup_restarts]},
#child{pid = <0.42.0>,name = yaws_server,
mfargs = {yaws_server,start_link,
[{env,true,false,false,false,false,false,"default"}]},
restart_type = permanent,shutdown = 120000,
child_type = worker,
modules = [yaws_server]},
#child{pid = <0.39.0>,name = yaws_trace,
mfargs = {yaws_trace,start_link,[]},
restart_type = permanent,shutdown = 5000,
child_type = worker,
modules = [yaws_trace]},
#child{pid = <0.36.0>,name = yaws_log,
mfargs = {yaws_log,start_link,[]},
restart_type = permanent,shutdown = 5000,
child_type = worker,
modules = [yaws_log]}],
dynamics = undefined,intensity = 0,period = 1,restarts = [],
module = yaws_sup,args = []}
The initial rr command retrieves the record definitions from supervisor so we can see the field names when we get the state from yaws_sup; otherwise we would just get a tuple full of anonymous values.
The retrieved state shows the intensity in this case to be 0. We can change it using sys:replace_state/2,3:
3> sys:replace_state(yaws_sup, fun(S) -> S#state{intensity=2} end).
#state{name = {local,yaws_sup},
strategy = one_for_all,
children = [#child{pid = <0.67.0>,name = yaws_sup_restarts,
mfargs = {yaws_sup_restarts,start_link,[]},
restart_type = transient,shutdown = infinity,
child_type = supervisor,
modules = [yaws_sup_restarts]},
#child{pid = <0.42.0>,name = yaws_server,
mfargs = {yaws_server,start_link,
[{env,true,false,false,false,false,false,"default"}]},
restart_type = permanent,shutdown = 120000,
child_type = worker,
modules = [yaws_server]},
#child{pid = <0.39.0>,name = yaws_trace,
mfargs = {yaws_trace,start_link,[]},
restart_type = permanent,shutdown = 5000,
child_type = worker,
modules = [yaws_trace]},
#child{pid = <0.36.0>,name = yaws_log,
mfargs = {yaws_log,start_link,[]},
restart_type = permanent,shutdown = 5000,
child_type = worker,
modules = [yaws_log]}],
dynamics = undefined,intensity = 2,period = 1,restarts = [],
module = yaws_sup,args = []}
The second argument to sys:replace_state/2 is a fun that takes the state record and returns it with its intensity field changed to 2. The sys:replace_state/2,3 functions return the new state, and as you can see near the end of the result here, intensity is now 2 instead of 0.
As the sys:replace_state/2,3 documentation explains, these functions are intended only for debugging purposes, so using them to do this in a production system is definitely not something I recommend. The second argument to replace_state here shows that this approach requires knowledge of the details of the internal state record of supervisor, which we obtained here via the rr shell command, so if that record ever changes, this code may stop working. Even more fragile would be treating the supervisor state record as a tuple and counting on the intensity field to be in a particular tuple position so you can change its value. Therefore, if you really want this functionality of changing a supervisor's restart intensity, you're best off in the long run proposing to the OTP team that it be added; if you're going to take that route, I recommend first proposing the idea on the erlang-questions mailing list to gauge interest.
One solution would be to nest your supervisors. But the main question is what you want to achieve with these restart intensities. The intensity at which you want the supervisor to give up should be an indication of something very wrong, e.g. a needed resource unexpectedly not being available.

Flume Multiplexing not working

I have configured my Flume agent as below. Somehow, the agent doesn't run properly; it just hangs without any errors. Is there any problem with the configuration below?
FYI: I have a file named "country" with the header hard-coded as state.
#Define sources, sink and channels
foo.sources = s1
foo.channels = chn-az chn-oth
foo.sinks = sink-az sink-oth
#
### # # Define a source on agent and connect to channel memory-channel.
foo.sources.s1.type = exec
foo.sources.s1.command = cat /home/hadoop/flume/country.txt
foo.sources.s1.batchSize = 1
foo.sources.s1.channels = chn-ca chn-oth
#selector configuration
foo.sources.s1.selector.type = multiplexing
foo.sources.s1.selector.header = state
foo.sources.s1.selector.mapping.AZ = chn-az
foo.sources.s1.selector.default = chn-oth
#
#
### Define a memory channel on agent called memory-channel.
foo.channels.chn-az.type = memory
foo.channels.chn-oth.type = memory
#
#
##Define sinks that outputs to hdfs.
foo.sinks.sink-az.channel = chn-az
foo.sinks.sink-az.type = hdfs
foo.sinks.sink-az.hdfs.path = hdfs://master:9099/user/hadoop/flume
foo.sinks.sink-az.hdfs.filePrefix = statefilter
foo.sinks.sink-az.hdfs.fileType = DataStream
foo.sinks.sink-az.hdfs.writeFormat = Text
foo.sinks.sink-az.batchSize = 1
foo.sinks.sink-az.rollInterval = 0
#
foo.sinks.sink-oth.channel = chn-oth
foo.sinks.sink-oth.type = hdfs
foo.sinks.sink-oth.hdfs.path = hdfs://master:9099/user/hadoop/flume
foo.sinks.sink-oth.hdfs.filePrefix = statefilter
foo.sinks.sink-oth.hdfs.fileType = DataStream
foo.sinks.sink-oth.batchSize = 1
foo.sinks.sink-oth.rollInterval = 0
Thanks,
Vinoth
Regarding the channels list configured at the source:
foo.sources.s1.channels = chn-ca chn-oth
I think chn-ca should be chn-az.
Nevertheless, I think such a configuration will never work since the "state" header used by the selector is not created by any Flume component. You must introduce an interceptor for that, typically the Regex Extractor Interceptor.
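A minimal sketch of such an interceptor on the source is below; the regex assumes the state code is the first whitespace-delimited field of each line in country.txt, so adjust it to the real line format:

# hypothetical interceptor: extract the first field of each line into a "state" header
foo.sources.s1.interceptors = i1
foo.sources.s1.interceptors.i1.type = regex_extractor
foo.sources.s1.interceptors.i1.regex = ^(\\w+)
foo.sources.s1.interceptors.i1.serializers = ser1
foo.sources.s1.interceptors.i1.serializers.ser1.name = state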

FedEx error: Special service invalid 8200

I'm trying to connect to the FedEx International Web Services API for the Ship Service.
I'm using the v13 WSDL.
Below is my SOAP content:
<ns:ProcessShipmentRequest xmlns:ns="http://fedex.com/ws/ship/v13">
<ns:WebAuthenticationDetail>
<ns:UserCredential>
<ns:Key>GHmnVXAyWqWUemqD</ns:Key>
<ns:Password>1VYHs6O1vhKA3xPVAExhx1vUB</ns:Password>
</ns:UserCredential>
</ns:WebAuthenticationDetail>
<ns:ClientDetail>
<ns:AccountNumber>510087666</ns:AccountNumber>
<ns:MeterNumber>100115929</ns:MeterNumber>
</ns:ClientDetail>
<ns:Version>
<ns:ServiceId>ship</ns:ServiceId>
<ns:Major>13</ns:Major>
<ns:Intermediate>0</ns:Intermediate>
<ns:Minor>0</ns:Minor>
</ns:Version>
<ns:RequestedShipment>
<ns:ShipTimestamp>2015-04-17T12:51:03.404660Z</ns:ShipTimestamp>
<ns:DropoffType>REGULAR_PICKUP</ns:DropoffType>
<ns:ServiceType>INTERNATIONAL_PRIORITY</ns:ServiceType>
<ns:PackagingType>FEDEX_25KG_BOX</ns:PackagingType>
<ns:TotalWeight>
<ns:Units>LB</ns:Units>
<ns:Value>40.0</ns:Value>
</ns:TotalWeight>
<ns:Shipper>
<ns:Contact>
<ns:CompanyName>Your Company</ns:CompanyName>
<ns:PhoneNumber>4354454365746</ns:PhoneNumber>
</ns:Contact>
<ns:Address>
<ns:StreetLines>3121 W Government Way</ns:StreetLines>
<ns:StreetLines>Seattle</ns:StreetLines>
<ns:City>Seattle</ns:City>
<ns:StateOrProvinceCode>WA</ns:StateOrProvinceCode>
<ns:PostalCode>98199</ns:PostalCode>
<ns:CountryCode>US</ns:CountryCode>
<ns:CountryName>United States</ns:CountryName>
<ns:Residential>false</ns:Residential>
</ns:Address>
</ns:Shipper>
<ns:Recipient>
<ns:Contact>
<ns:PersonName>Agrolait</ns:PersonName>
<ns:PhoneNumber>3210588558</ns:PhoneNumber>
</ns:Contact>
<ns:Address>
<ns:StreetLines>1010 EASY ST</ns:StreetLines>
<ns:StreetLines>Apt# 11</ns:StreetLines>
<ns:City>OTTAWA</ns:City>
<ns:StateOrProvinceCode>ON</ns:StateOrProvinceCode>
<ns:PostalCode>K1A0B1</ns:PostalCode>
<ns:CountryCode>CA</ns:CountryCode>
<ns:Residential>false</ns:Residential>
</ns:Address>
</ns:Recipient>
<ns:ShippingChargesPayment>
<ns:PaymentType>SENDER</ns:PaymentType>
<ns:Payor>
<ns:ResponsibleParty>
<ns:AccountNumber>510087666</ns:AccountNumber>
<ns:Contact>
<ns:PersonName>Your Company</ns:PersonName>
<ns:CompanyName>Your Company</ns:CompanyName>
<ns:PhoneNumber>4354454365746</ns:PhoneNumber>
</ns:Contact>
<ns:Address>
<ns:CountryCode>US</ns:CountryCode>
</ns:Address>
</ns:ResponsibleParty>
</ns:Payor>
</ns:ShippingChargesPayment>
<ns:CustomsClearanceDetail>
<ns:ClearanceBrokerage>BROKER_UNASSIGNED</ns:ClearanceBrokerage>
<ns:DutiesPayment>
<ns:PaymentType>SENDER</ns:PaymentType>
<ns:Payor>
<ns:ResponsibleParty>
<ns:AccountNumber>510087666</ns:AccountNumber>
<ns:Contact>
<ns:PersonName>Your Company</ns:PersonName>
<ns:CompanyName>Your Company</ns:CompanyName>
<ns:PhoneNumber>4354454365746</ns:PhoneNumber>
</ns:Contact>
</ns:ResponsibleParty>
</ns:Payor>
</ns:DutiesPayment>
<ns:DocumentContent>NON_DOCUMENTS</ns:DocumentContent>
<ns:CustomsValue>
<ns:Currency>USD</ns:Currency>
<ns:Amount>100.0</ns:Amount>
</ns:CustomsValue>
<ns:FreightOnValue>OWN_RISK</ns:FreightOnValue>
<ns:CommercialInvoice>
<ns:TaxesOrMiscellaneousChargeType>TAXES</ns:TaxesOrMiscellaneousChargeType>
<ns:Purpose>SOLD</ns:Purpose>
<ns:TermsOfSale>FOB_OR_FCA</ns:TermsOfSale>
</ns:CommercialInvoice>
<ns:Commodities>
<ns:NumberOfPieces>1</ns:NumberOfPieces>
<ns:Description>[CARD] Graphics Card</ns:Description>
<ns:CountryOfManufacture>US</ns:CountryOfManufacture>
<ns:Weight>
<ns:Units>LB</ns:Units>
<ns:Value>10.0</ns:Value>
</ns:Weight>
<ns:Quantity>1</ns:Quantity>
<ns:QuantityUnits>Unit(s)</ns:QuantityUnits>
<ns:UnitPrice>
<ns:Currency>USD</ns:Currency>
<ns:Amount>100.0</ns:Amount>
</ns:UnitPrice>
<ns:CustomsValue>
<ns:Currency>USD</ns:Currency>
<ns:Amount>100.0</ns:Amount>
</ns:CustomsValue>
</ns:Commodities>
</ns:CustomsClearanceDetail>
<ns:LabelSpecification>
<ns:LabelFormatType>COMMON2D</ns:LabelFormatType>
<ns:ImageType>PNG</ns:ImageType>
<ns:LabelStockType>PAPER_4X6</ns:LabelStockType>
<ns:LabelPrintingOrientation>BOTTOM_EDGE_OF_TEXT_FIRST</ns:LabelPrintingOrientation>
</ns:LabelSpecification>
<ns:ShippingDocumentSpecification>
<ns:ShippingDocumentTypes>CERTIFICATE_OF_ORIGIN</ns:ShippingDocumentTypes>
</ns:ShippingDocumentSpecification>
<ns:RateRequestTypes>ACCOUNT</ns:RateRequestTypes>
<ns:EdtRequestType>ALL</ns:EdtRequestType>
<ns:PackageCount>1</ns:PackageCount>
<ns:RequestedPackageLineItems>
<ns:SequenceNumber>1</ns:SequenceNumber>
<ns:Weight>
<ns:Units>LB</ns:Units>
<ns:Value>40.0</ns:Value>
</ns:Weight>
<ns:Dimensions>
<ns:Length>17</ns:Length>
<ns:Width>12</ns:Width>
<ns:Height>3</ns:Height>
<ns:Units>IN</ns:Units>
</ns:Dimensions>
<ns:PhysicalPackaging>BOX</ns:PhysicalPackaging>
</ns:RequestedPackageLineItems>
</ns:RequestedShipment>
</ns:ProcessShipmentRequest>
When I send the request I get the below response:
(reply){
HighestSeverity = "ERROR"
Notifications[] =
(Notification){
Severity = "ERROR"
Source = "ship"
Code = "8200"
Message = "Special service is invalid."
LocalizedMessage = "Special service is invalid."
},
(Notification){
Severity = "WARNING"
Source = "ship"
Code = "7037"
Message = "Harmonized code is missing or invalid for commodity (COMMODITY_INDEX}; estimated duties and taxes were not returned."
LocalizedMessage = "Harmonized code is missing or invalid for commodity (COMMODITY_INDEX}; estimated duties and taxes were not returned."
MessageParameters[] =
(NotificationParameter){
Id = "COMMODITY_INDEX"
Value = "1"
},
},
Version =
(VersionId){
ServiceId = "ship"
Major = 13
Intermediate = 0
Minor = 0
}
}
The warning I can fix, but the "Special service is invalid" error persists. Please let me know if I'm passing some wrong value or if I'm missing some value.
Note: I tried adding special services like COD, but the issue is the same.
Since you don't have a special service defined, this error is really odd. You need to email websupport at fedex dot com for resolution. FedEx error responses are terrible.

Graphite carbon-relay not working

I have two Graphite setups and I am trying to relay traffic between the two, but somehow the carbon-relay is not working.
My cache runs on 2003/2004 and the relay on 2013/2014.
The following are the configurations:
#carbon file
[cache:b]
LINE_RECEIVER_PORT = 2003
PICKLE_RECEIVER_PORT = 2004
CACHE_QUERY_PORT = 7012
[relay]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2013
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2014
RELAY_METHOD = rules
REPLICATION_FACTOR = 1
DESTINATIONS = 127.0.0.1:2003:a, aa.bb.cc.dd:2003:b
#relay-rules file
[default]
default = true
destinations = 127.0.0.1:2003:a, aa.bb.cc.dd:2003:b
Any pointers will be helpful
As part of a recent project at work, I figured out that the carbon daemons use the pickle protocol when sending data to their destinations.
So the destinations of the carbon-relay should point at the carbon-cache's pickle receiver port instead.
#carbon.conf
....
[relay]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2013
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2014
RELAY_METHOD = rules
REPLICATION_FACTOR = 1
DESTINATIONS = 127.0.0.1:2004:a, aa.bb.cc.dd:2004:b
Also modify relay-rules.conf with the same destinations specified in carbon.conf:
#relay-rules.conf
.....
[default]
default = true
destinations = 127.0.0.1:2004:a, aa.bb.cc.dd:2004:b
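Once both daemons are restarted, a quick way to sanity-check the relay is to push a test metric at its line receiver port (2013 above) using Graphite's plaintext protocol and then look for it under both caches; a rough sketch, assuming the relay is on localhost and the metric name is arbitrary:

import socket
import time

# plaintext protocol: "<metric path> <value> <unix timestamp>\n" sent to the relay's LINE_RECEIVER_PORT
with socket.create_connection(("127.0.0.1", 2013)) as sock:
    sock.sendall(f"test.relay.check 42 {int(time.time())}\n".encode())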
