I'm running Filebeat 7.14.0 to ingest NetFlow data, which is then stored in Elasticsearch and viewed in Kibana. When I run filebeat -e, I see logs generated by Filebeat every 30s.
I'm trying to understand the stats more. For example, I see
"input":{"netflow":{"flows":1234,"packets":{"dropped":2345,"received":12345}}}}
But each NetFlow packet contains about 10 NetFlow records, so when I receive 12345 packets I would expect 123450 flows, yet the stats show only 1234 flows. Does that mean I'm missing a lot of flows?
For a better understanding of the logs, enable debug logging and add:
logging.selectors: ["input"]
This will show you per-second stats in the logs.
Grep for "stats" when checking the log to find those lines easily.
They will show you flows and packets per second.
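As a minimal sketch (logging.level and logging.selectors are the standard Filebeat logging options; the "input" selector name is taken from the answer above):
# filebeat.yml
logging.level: debug
logging.selectors: ["input"]
Then, since you already run filebeat -e, the stats lines can be filtered with:
filebeat -e 2>&1 | grep "stats"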
We're using Contentful to manage CMS content. When you save content in Contentful it sends webhooks for a service we've set up on Cloud Run, which in turn ensures the updated content is built and deployed.
This setup has previously been such that the Cloud Run service was limited to a maximum of 1 container, with an 80 concurrent requests limit. That should be plenty for the few webhooks we get occasionally.
Now, when debugging complaints about content not being updated, I bumped into a very persistent and irritating issue: Google Cloud Run does not try to process the 2 webhooks sent by Contentful, but instead responds to one of them with status 429 and "Rate exceeded." in the response body.
This response does not come from our backend; in the Cloud Run Logs tab I can see the message generated by Google: The request was aborted because there was no available instance.
I've tried:
Increasing the number of processes on the container from 1 to 2 (should not be necessary due to the use of an async framework)
Increasing the number of containers from 1 to 2
The issue persists for the webhooks from Contentful.
If I try making requests from my local machine with hey, which defaults to 200 requests with 50 concurrency, they all go through without any 429 status codes returned.
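Roughly like this (the flags just spell out hey's defaults):
hey -n 200 -c 50 https://[redacted].a.run.app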
What is going on that generates 429 status codes when a specific client - in this case Contentful - makes ONLY 2 requests in quick succession? How do we disable or bypass this behavior?
gcloud run services describe <name> gives me these details of the deployment:
+ Service [redacted] in region europe-north1
URL: https://[redacted].a.run.app
Ingress: all
Traffic:
100% LATEST (currently [redacted])
Last updated on 2021-01-19T13:48:46.172388Z by [redacted]:
Revision [redacted]
Image: eu.gcr.io/[redacted]/[redacted]:c0a2e7a6-56d5-4f6f-b241-1dd9ed96dd30
Port: 8080
Memory: 256Mi
CPU: 1000m
Service account: [redacted]-compute@developer.gserviceaccount.com
Env vars:
WEB_CONCURRENCY 2
Concurrency: 80
Max Instances: 2
Timeout: 300s
This is more speculation than an answer, but I would try re-deploying your Cloud Run service with min-instances set to 1 (or more).
Here is why.
In the Cloud Run troubleshooting docs they write (emphasis mine):
This error can also be caused by a sudden increase in traffic, a long container startup time or a long request processing time.
Your Cloud Run service receives webhook events from a CMS (Contentful). And, as you wrote, these updates are rather sporadic. So I think that your situation could be the same as the one described in this comment on Medium:
I tested “max-instances: 2” and the conclusion is I got 429 — Rate exceeded responses from Google frontend proxy because no container was running. It seems that a very low count of instances will deregister your service completely from the load-balancer until a second request was made.
If Google Cloud did indeed de-register your Cloud Run service completely because it was not receiving any traffic, re-deploying the service with at least one container instance could fix your issue. Another way would be to call your Cloud Run service every once in a while just to keep it "warm".
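As a sketch, something along these lines should set the minimum (the service name is a placeholder; depending on your gcloud version this may still require the beta track):
gcloud run services update [SERVICE_NAME] \
  --region europe-north1 \
  --min-instances 1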
I am using Azure Kubernetes Service (AKS) and I'm having a huge problem detecting why pods (of a specific type) aren't starting. The only thing that happens is that when new pods start, the health check times out and AKS silently goes back to the old deployed services that worked. I have added a lot of trace output in the service to detect where it fails (e.g. whether external calls are blocked), and I have a global try/catch in Program.cs, but no information comes out. AKS listens on stdout, grabs the logs there and pushes them to an external tool. I have tried to increase the values for when the health check should start, etc., as below, but with no result:
livenessProbe:
.
.
initialDelaySeconds: 60
periodSeconds: 10
readinessProbe:
.
.
initialDelaySeconds: 50
periodSeconds: 15
When running the service locally it's up in 10-15 seconds.
Obviously things seem to fail before the service is started, or something is timing out, and I'm wondering:
Can I fetch logs or monitor what's happening and why pods are so slow to start in AKS?
Is it possible to monitor what comes out on stdout on a virtual machine that belongs to the AKS cluster?
It feels like I have tested everything, but I can't find any reason why the health monitoring is refusing requests.
Thanks!
If you enabled Azure Monitor for Containers when you created your cluster, the logs of your application will be pushed to a Log Analytics workspace, into the ContainerLog table. If Azure Monitor is not enabled, you can use kubectl to see what is output to stdout and stderr with the following command:
kubectl logs {pod-name} -n {namespace}
You can also check the Kubernetes events; you'll see events saying that the probes failed, if this is really the problem:
kubectl get events -n {namespace}
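As a further sketch, describing the pod shows the probe failure messages and container state, and --previous fetches the logs of the last restarted container:
kubectl describe pod {pod-name} -n {namespace}
kubectl logs {pod-name} -n {namespace} --previous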
I am using Docker logs with the standard driver to show dummy data from a .NET Core web app via docker logs [options]
More specifically, I am using ILogger<>
logger.LogDebug("Hello from web app");
The output is:
2020-02-03T15:29:02.378269378Z dbug: WebAppLogger.Startup[0]
2020-02-03T15:29:02.378353836Z Hello from Configure
Eventually I want to use Filebeat in conjunction with ELK to ship these logs, but the examples I have found look more like this:
11/June/2019:00:10:45 + 0000 DEBUG "This is a generic log message example"
Mine seems to go onto 2 lines, with two slightly different timestamps (milliseconds apart). Is there a more apt logging library for this, or is it something I am doing wrong?
I have source and sink connectors installed using Confluent and they are working fine. But when I look at the Docker logs using
docker logs -f container-name
the output is something like this
[2018-09-19 09:44:06,571] INFO WorkerSourceTask{id=mariadb-source-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2018-09-19 09:44:06,571] INFO WorkerSourceTask{id=mariadb-source-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2018-09-19 09:44:06,573] INFO WorkerSourceTask{id=mariadb-source-0} Finished commitOffsets successfully in 2 ms (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2018-09-19 09:44:16,194] INFO WorkerSinkTask{id=oracle-sink-0} Committing offsets asynchronously using sequence number 1077: {faheemuserdbtest-0=OffsetAndMetadata{offset=7, metadata=''}} (org.apache.kafka.connect.runtime.WorkerSinkTask)
[2018-09-19 09:44:16,574] INFO WorkerSourceTask{id=mariadb-source-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2018-09-19 09:44:16,574] INFO WorkerSourceTask{id=mariadb-source-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)
But it doesn't show the actual data going through the topics. Is there a way I can print that data in the logs? I'm moving these logs to a Kibana dashboard.
Yes, I can read data from the Kafka topic, but that is not my scenario.
Depending on the connector, if you enable TRACE logging in the Connect Log4j properties, you can see the messages.
If you are using Confluent's Docker images, there is a CONNECT_LOG4J_LOGGERS environment variable (among other CONNECT_LOG4J_* variables) for controlling that.
If you want the actual JDBC data in Elasticsearch, though, you'd typically install the Elasticsearch sink connector rather than parse it out of those logs.
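As a rough sketch, in the Connect container's environment (the logger names below are simply the worker task classes that appear in your log output; which loggers actually print record contents at TRACE depends on the connector):
CONNECT_LOG4J_LOGGERS: "org.apache.kafka.connect.runtime.WorkerSourceTask=TRACE,org.apache.kafka.connect.runtime.WorkerSinkTask=TRACE"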
No, you can't see that data in the logs.
The connectors don't print the actual data being copied around. If you have such a requirement, you would probably have to change the logging in the source and sink connector source code and customize it to your needs.
I'm using Hadoop from Docker Swarm with 1 namenode and 3 datanodes (on 3 physical machines).
I'm also using Kafka and Kafka Connect + the HDFS connector to write messages into HDFS in Parquet format.
I'm able to write data to HDFS using HDFS clients (hdfs put).
But when Kafka Connect is writing messages, it works at the very beginning, then it fails with this error:
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.0.0.8:50010]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1533)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1309)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1262)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)
[2018-05-23 10:30:10,125] INFO Abandoning BP-468254989-172.17.0.2-1527063205150:blk_1073741825_1001 (org.apache.hadoop.hdfs.DFSClient:1265)
[2018-05-23 10:30:10,148] INFO Excluding datanode DatanodeInfoWithStorage[10.0.0.8:50010,DS-cd1c0b17-bebb-4379-a5e8-5de7ff7a7064,DISK] (org.apache.hadoop.hdfs.DFSClient:1269)
[2018-05-23 10:31:10,203] INFO Exception in createBlockOutputStream (org.apache.hadoop.hdfs.DFSClient:1368)
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.0.0.9:50010]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1533)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1309)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1262)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)
And then the datanodes are not reachable anymore for the process:
[2018-05-23 10:32:10,316] WARN DataStreamer Exception (org.apache.hadoop.hdfs.DFSClient:557)
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /topics/+tmp/test_hdfs/year=2018/month=05/day=23/hour=08/60e75c4c-9129-454f-aa87-6c3461b54445_tmp.parquet could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1733)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2496)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:828)
But if I look into the Hadoop web admin console, all the nodes seem to be up and OK.
I've checked hdfs-site.xml, and the "dfs.client.use.datanode.hostname" setting is set to true on both the namenode and the datanodes. All IPs in the Hadoop configuration files are defined as 0.0.0.0 addresses.
I've tried to format the namenode too, but the error happened again.
Could the problem be that Kafka is writing too fast to HDFS, so it overwhelms it? That would be weird, as I've tried the same configuration on a smaller cluster and it worked well even with a high throughput of Kafka messages.
Do you have any other ideas about the origin of this problem?
Thanks
dfs.client.use.datanode.hostname=true has to be configured on the client side as well, and, following your log stack trace:
java.nio.channels.SocketChannel[connection-pending remote=/10.0.0.9:50010]
I guess 10.0.0.9 refers to a private network IP; thus, it seems that the property is not set for your client in hdfs-client.xml.
You can find more detail here.
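For reference, a minimal sketch of the client-side setting (assuming the Connect worker picks up hdfs-site.xml / hdfs-client.xml from the directory referenced by the connector's hadoop.conf.dir):
<configuration>
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
  </property>
</configuration>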