I am using Scaffeine in my project (https://github.com/blemale/scaffeine), a Scala wrapper for Caffeine (https://github.com/ben-manes/caffeine). I also have a prometheus JMX collector embedded in my metrics API (https://github.com/Segence/kamon-jmx-collector).
However, when I launch my application, I can't see any MBeans for Caffeine in VisualVM.
Also, when looking at the Caffeine project, I found that in the caffeine/jcache/src/main/resources/reference.conf there is a config for JMX monitoring:
monitoring {
# If cache statistics should be recorded and externalized
statistics = false
# If the configuration should be externalized
management = false
}
Both are set to false. Is there a way to configure Caffeine so that it exposes MBeans to JMX?
Thanks Ben Manes,
This would be the answer according to prometheus:
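import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import io.prometheus.client.cache.caffeine.CacheMetricsCollector;

// Register the collector once, then add each stats-enabled cache under a label
CacheMetricsCollector cacheMetrics = new CacheMetricsCollector().register();
Cache<String, String> cache = Caffeine.newBuilder().recordStats().build();
cacheMetrics.addCache("myCacheLabel", cache);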
We are trying to reuse the logback.xml from our GCP Cloud Run services, which gives us great filtering features. For Cloud Run, our logback.xml contains this:
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
<layout class="com.orderlyhealth.api.logging.logback.GCPCloudLoggingJSONLayout">
<pattern>${CONSOLE_PATTERN}</pattern>
</layout>
</encoder>
</appender>
Our GCPCloudLoggingJSONLayout does a great job of setting everything we need, such as clientId and customerRequestId, so we can filter across many microservices by customer or by customer request. We currently lose this in Dataflow, though. We tried adding logback.xml to src/main/resources, and deploying the project does seem to use it in the shell, like so:
{"message":"[main][-][:] o.a.b.r.d.DataflowRunner Template successfully created.\n",
"logger":"org.apache.beam.runners.dataflow.DataflowRunner",
"transactionId":null,"socket":null,"clntSocket":null,
"version":null,
"timestamp":{"seconds":1619694798,"nanos":4000000},
"thread":"main",
"severity":"INFO",
"instanceId":null,
"headers":{},
"messageInfo":{"message":"Message short enough. Displayed top level"}
}
In Dataflow itself, though, we currently see a different log format, which is not nearly as useful for tracing a customer request through our systems.
Thanks for any ideas on modifying Dataflow logging.
I don't think you can change how Dataflow logs to Cloud logging.
Instead, you can change how/what you log and let Dataflow pass them through to cloud logging. See Logging pipeline messages.
Or you can use cloud logging client libraries in your pipeline directly: https://cloud.google.com/logging/docs/reference/libraries.
Please take a look at How to override Google DataFlow logging with logback? for the latest version of this answer
I copied the current answer there to make it easier for folks who want to look:
Dataflow relies on using java.util.logging (aka JUL) as the logging backend for SLF4J and adds various bridges ensuring that logs from other libraries are output as well. With this kind of setup, we are limited to adding any additional details to the log message itself only.
This also applies to any runner executing a portable job since the container with the SDK harness has a similar logging configuration. For example Dataflow Runner V2.
To do this we want to create a custom formatter to apply to the root JUL logger. For example:
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;

public class CustomFormatter extends SimpleFormatter {
  @Override
  public String formatMessage(LogRecord record) {
    // Implement whatever logic is needed to add details to the message portion of the log statement
    return super.formatMessage(record);
  }
}
And then during start-up of the worker we need to update the root logger to use this formatter. We can achieve this using a JvmInitializer and implement the beforeProcessing method like so:
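import com.google.auto.service.AutoService;
import java.util.logging.Handler;
import java.util.logging.LogManager;
import java.util.logging.Logger;
import org.apache.beam.sdk.harness.JvmInitializer;
import org.apache.beam.sdk.options.PipelineOptions;

@AutoService(JvmInitializer.class)
public class LoggerInitializer implements JvmInitializer {
  @Override
  public void beforeProcessing(PipelineOptions options) {
    // Replace the formatter on every handler attached to the root JUL logger
    LogManager logManager = LogManager.getLogManager();
    Logger rootLogger = logManager.getLogger("");
    for (Handler handler : rootLogger.getHandlers()) {
      handler.setFormatter(new CustomFormatter());
    }
  }
}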
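To make the formatter idea above more concrete, here is a minimal sketch that uses only java.util.logging, so it can be tried outside Dataflow. The class name RequestIdFormatter, the REQUEST_ID thread-local, and the "[requestId=...]" prefix are all assumptions made for illustration; how the request id actually reaches the worker thread depends on your pipeline and is not covered here.

```java
import java.util.logging.Handler;
import java.util.logging.LogManager;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Hypothetical formatter: prepends a customer request id to the message text.
class RequestIdFormatter extends SimpleFormatter {
    // Hypothetical per-thread context; a real pipeline has to populate this itself.
    static final ThreadLocal<String> REQUEST_ID = ThreadLocal.withInitial(() -> "-");

    @Override
    public String formatMessage(LogRecord record) {
        // Only the message portion is changed; the rest of the record is untouched.
        return "[requestId=" + REQUEST_ID.get() + "] " + super.formatMessage(record);
    }

    // Installs the formatter on every handler currently attached to the root JUL logger.
    static void install() {
        Logger root = LogManager.getLogManager().getLogger("");
        for (Handler handler : root.getHandlers()) {
            handler.setFormatter(new RequestIdFormatter());
        }
    }
}
```

In a Beam worker, the install() logic would be the body of beforeProcessing. Because the id ends up inside the message text, filtering on it in Cloud Logging would have to be done with a text match rather than on a structured field.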
In the Flume agent I am collecting events from Kafka topics and I need to insert them into ES. However, I need to run a digestion step in the sink first, so I need to write a custom sink that passes the data from the agent's channel to a Java digestion module (which I have already written).
Can anyone share a template of a custom sink that I can use as a reference? Flume's official website doesn't say much about this topic:
A custom sink’s class and its dependencies must be included in the agent’s classpath when starting the Flume agent. The type of the custom sink is its FQCN.
https://flume.apache.org/FlumeUserGuide.html#custom-sink
And once the custom sink is ready, how could I link the following three files to make the agent work:
custom sink
ingestion jar (java module to perform the ingestion process)
FlumeAgent.properties
Thank you for any feedback. I will keep adding information as soon as I progress in this task.
I assume you are trying to use Flume to receive events from Kafka (source) and forward them to ES (sink), applying some data processing logic you already have.
With this understanding, I would suggest looking into Flume interceptors, which are responsible for altering/filtering events on the fly before they are sent to the sink.
So all your business logic to alter the events can be implemented as a custom interceptor, which is configured on the Flume source.
For reference, you can check out the source code of the native interceptors already available. This should give you an idea of the Flume interceptor framework.
Here is the ES Sink source code
Sample Flume config
a1.sources = kafkaSource
a1.sinks = ES_Sink
a1.channels = channel1
a1.sources.kafkaSource.interceptors = i1
a1.sources.kafkaSource.interceptors.i1.type = org.apache.flume.interceptor.<Custom_Interceptor_name>$Builder
a1.sinks.ES_Sink.channel = channel1
a1.sinks.ES_Sink.type = elasticsearch
a1.sinks.ES_Sink.hostNames = 127.0.0.1:9200
I am fairly new to Apache Flume. I have configured a single-tier agent with a load-balancing sink group, and I would like to know how I can test the sink group's load balancing. Any ideas?
You can define two different sinks and list them in a sink group as below:
agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = HDFS1 HDFS2
agent1.sinkgroups.g1.processor.type = load_balance
agent1.sinkgroups.g1.processor.backoff = true
agent1.sinkgroups.g1.processor.selector = round_robin
Here both of them are HDFS sinks.
You can set the processor selector (round_robin [default], random, or a custom selector), which defines how the load is balanced between the two sinks.
When you run the agent, you can see that two different sets of data are stored in the two respective HDFS paths (sinks).
The other two optional parameters are backoff and selector.maxTimeOut.
You can refer to the Flume 1.6.0 User Guide for more info.
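To get a feel for what to expect when testing, here is a rough model of round-robin selection in plain Java. It is not Flume's implementation; the class RoundRobinSelector and its methods are hypothetical, and the failure handling is a simplification (a failed sink is skipped indefinitely, whereas Flume's backoff only blacklists it for a limited time).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of round-robin sink selection; not Flume's own code.
class RoundRobinSelector {
    private final List<String> sinks;
    private final Set<String> failed = new HashSet<>();
    private int next = 0;

    RoundRobinSelector(List<String> sinks) {
        this.sinks = new ArrayList<>(sinks);
    }

    // Marks a sink as unavailable so that selection skips it.
    void markFailed(String sink) {
        failed.add(sink);
    }

    // Returns the next available sink in rotation, or null if every sink has failed.
    String select() {
        for (int i = 0; i < sinks.size(); i++) {
            String candidate = sinks.get(next);
            next = (next + 1) % sinks.size();
            if (!failed.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }
}
```

With two healthy sinks the model alternates HDFS1, HDFS2, HDFS1, and so on, so in a real test you would expect roughly half of the events under each HDFS path. Stopping one sink's target and checking that all events land in the other path is one way to exercise the failure case.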
Hi guys,
I have run into a problem. I use log4j and Apache Flume to collect logs. The application logs remotely through log4j, configured like this:
log4j.appender.flume=org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname=192.168.152.49
log4j.appender.flume.Port=44446
log4j.appender.flume.layout=org.apache.log4j.PatternLayout
while the Flume configuration looks like this:
a1.sources.r1.type=avro
a1.sources.r1.bind=192.168.152.49
a1.sources.r1.port=44446
It works! But when Flume is shut down, the application that uses log4j can no longer print logs.
Can anybody tell me how to fix this problem?
It depends on how you want to handle Flume being down. With the regular Log4jAppender, you can enable unsafe mode which will log the error in the log4j LogLog, but otherwise fail silently. To do that you can set log4j.appender.flume.UnsafeMode = true. You can see an example here:
https://github.com/kite-sdk/kite-examples/blob/master/logging/src/main/resources/log4j.properties#L20
With unsafe enabled, any events you log while Flume is down will be lost.
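As a rough illustration of that trade-off, the sketch below models an appender that either propagates a delivery failure or swallows it. This is not the Flume Log4jAppender source; RemoteAppender, Transport, and droppedCount are hypothetical names used only to show the behaviour described above.

```java
// Hypothetical stand-in for an appender that forwards events to a remote agent.
class RemoteAppender {
    // Hypothetical transport abstraction; send() fails when the agent is down.
    interface Transport {
        void send(String event) throws Exception;
    }

    private final Transport transport;
    private final boolean unsafeMode;
    private int dropped = 0;

    RemoteAppender(Transport transport, boolean unsafeMode) {
        this.transport = transport;
        this.unsafeMode = unsafeMode;
    }

    // Forwards the event; in unsafe mode a failure is reported and the event is dropped.
    void append(String event) {
        try {
            transport.send(event);
        } catch (Exception e) {
            if (unsafeMode) {
                dropped++;
                System.err.println("Dropping event, agent unreachable: " + e.getMessage());
                return;
            }
            // Without unsafe mode the failure reaches the calling application.
            throw new IllegalStateException("Could not deliver log event", e);
        }
    }

    int droppedCount() {
        return dropped;
    }
}
```

The design choice is the same as in the answer: with unsafe mode the application keeps running but silently loses events, and without it the application sees the failure.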
If you want to be able to point to multiple Flume agents and have it balance the load between them as well as fail over if one of them goes down, you can use the LoadBalancingLog4jAppender instead. The docs here should help:
http://flume.apache.org/FlumeUserGuide.html#load-balancing-log4j-appender
I am trying to use Flume with the Twitter Stream API and index the tweets into my Elasticsearch. I set up my flume.conf to use com.cloudera.flume.source.TwitterSource as the Twitter source (with my dev tokens) and I use the default elasticsearch sink.
I am able to get the tweets (I also save them into HDFS, and when I open the file I can see the tweets), but when I search in Elasticsearch, I get this response:
{
_index: twitter-2014-02-14
_type: tweet-rt
_id: ilL5ZrBRSlqrZcsVUbnO-g
_version: 1
_score: 1
_source: {
@message: org.elasticsearch.common.xcontent.XContentBuilder@12da4409
@timestamp: 2014-02-14T10:16:13.000Z
@fields: {
timestamp: 1392372973000
}
}
Here is an example of my Flume config.
# - ElasticSearch Sink
TwitterAgent.sinks.ES.type = elasticsearch
TwitterAgent.sinks.ES.channel = FileChannel
TwitterAgent.sinks.ES.hostNames = 192.168.10.100:9300
TwitterAgent.sinks.ES.indexName = twitter
TwitterAgent.sinks.ES.indexType = tweet-rt
TwitterAgent.sinks.ES.clusterName = testou
Do I have to add something else? I don't understand why ES cannot deserialize my tweet.
Any ideas?
Thank you.
This is weird. It's doing some form of identityHashCode on the XContentBuilder to get that message and it should not.
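The stored message has the shape that Java's default Object.toString() produces: the class name followed by the hash code in hexadecimal. The sketch below reproduces that effect with a hypothetical PayloadBuilder class (not the Elasticsearch XContentBuilder) to show the difference between converting the builder object to a string and rendering its content.

```java
// Hypothetical stand-in for a content builder that does not override toString().
class PayloadBuilder {
    private final StringBuilder json = new StringBuilder();

    // Adds a string field; no escaping is done, this is only a minimal sketch.
    PayloadBuilder field(String name, String value) {
        if (json.length() > 0) {
            json.append(',');
        }
        json.append('"').append(name).append("\":\"").append(value).append('"');
        return this;
    }

    // Explicitly renders the accumulated content.
    String render() {
        return "{" + json + "}";
    }
}

class ToStringDemo {
    // What ends up stored if the builder object itself is converted to a string:
    // Object.toString() yields "<class name>@<hash code in hex>".
    static String implicit(PayloadBuilder builder) {
        return String.valueOf(builder);
    }

    // What ends up stored if the builder's content is rendered first.
    static String explicit(PayloadBuilder builder) {
        return builder.render();
    }
}
```

Calling ToStringDemo.implicit(...) gives a value like the one in the question, while ToStringDemo.explicit(...) gives the actual JSON content.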
I think I'd recommend clearing out Flume and re-installing. I'd be concerned about classpath and JAR dependency issues.
What version of Flume?
For others who come across this error: this is a bug in the Flume Elasticsearch sink, which has since been fixed. See https://issues.apache.org/jira/browse/FLUME-2126
If you are on a Flume version earlier than 1.6, you may want to cherry-pick this patch and build against your version.