Bounding off-heap memory consumption in ksqlDB

I'm trying to somehow bound the off-heap memory consumption of the ksqlDB server. I found an article about this: https://www.confluent.io/blog/bounding-ksqldb-memory-usage. As a first approach, I set KSQL_OPTS to:
-Dksql.streams.rocksdb.config.setter=io.confluent.ksql.rocksdb.KsqlBoundedMemoryRocksDBConfigSetter
but I got:
[2020-11-16 07:25:15,848] ERROR Failed to start KSQL (io.confluent.ksql.rest.server.KsqlServerMain:60)
org.apache.kafka.common.config.ConfigException: Invalid value io.confluent.ksql.rocksdb.KsqlBoundedMemoryRocksDBConfigSetter for configuration rocksdb.config.setter: Class io.confluent.ksql.rocksdb.KsqlBoundedMemoryRocksDBConfigSetter could not be found.
at org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:728)
at io.confluent.ksql.config.ConfigItem$Resolved.parseValue(ConfigItem.java:125)
at io.confluent.ksql.util.KsqlConfig.lambda$resolveStreamsConfig$2(KsqlConfig.java:693)
at java.base/java.util.Optional.map(Optional.java:265)
at io.confluent.ksql.util.KsqlConfig.resolveStreamsConfig(KsqlConfig.java:693)
at io.confluent.ksql.util.KsqlConfig.lambda$applyStreamsConfig$0(KsqlConfig.java:675)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1746)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at io.confluent.ksql.util.KsqlConfig.applyStreamsConfig(KsqlConfig.java:678)
at io.confluent.ksql.util.KsqlConfig.buildStreamingConfig(KsqlConfig.java:701)
at io.confluent.ksql.util.KsqlConfig.<init>(KsqlConfig.java:738)
at io.confluent.ksql.util.KsqlConfig.<init>(KsqlConfig.java:708)
at io.confluent.ksql.rest.server.KsqlServerMain.main(KsqlServerMain.java:50)
I also tried setting this parameter in ksqldb-server.properties, but the result was the same. I'm using the Docker image confluentinc/ksqldb-server:0.9.0. Is this the correct way to do it, or am I missing something?

The ksqldb-rocksdb-config-setter jar isn't included in the Docker image. You can pull the jar in from Confluent's Maven repository.
See: https://github.com/confluentinc/ksql/issues/6644
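As a rough sketch of how that might look when extending the 0.9.0 image (the Maven coordinates, version, and target directory below are assumptions; verify them against Confluent's Maven repository and against the directories the image actually puts on the classpath):

FROM confluentinc/ksqldb-server:0.9.0
# Assumption: these Maven coordinates/version match the ksqldb-rocksdb-config-setter
# artifact in Confluent's repository; adjust them to your ksqlDB version.
ADD https://packages.confluent.io/maven/io/confluent/ksql/ksqldb-rocksdb-config-setter/0.9.0/ksqldb-rocksdb-config-setter-0.9.0.jar /usr/share/java/ksqldb-server/
# Assumption: /usr/share/java/ksqldb-server/ is already on the server's classpath in this image.
# Point ksqlDB at the config setter class that previously failed to load.
ENV KSQL_OPTS="-Dksql.streams.rocksdb.config.setter=io.confluent.ksql.rocksdb.KsqlBoundedMemoryRocksDBConfigSetter"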

Related

Upgrading Docker Artifactory Pro 6.X to 7.X

I have Artifactory Pro v6.9.0 running with Docker Compose. In my compose file I have two services:
image: docker.bintray.io/jfrog/artifactory-pro:6.9.0
image: docker.io/library/postgres:9.6.11
I was able to upgrade to 6.23.13 without any problem just by changing the version of the image.
When I try the same thing with any 7.X version (after upgrading to at least 6.10, as the docs say), I get errors.
For example, trying 7.21.3, I get these warnings:
2022-08-01T08:41:30.343L [tomct] [WARNING] [ ] [org.apache.catalina.startup.HostConfig] [org.apache.catalina.startup.HostConfig deployDescriptor] - A docBase [/opt/jfrog/artifactory/app/artifactory/tomcat/webapps/access.war] inside the host appBase has been specified, and will be ignored
2022-08-01T08:41:30.343L [tomct] [WARNING] [ ] [org.apache.catalina.startup.HostConfig] [org.apache.catalina.startup.HostConfig deployDescriptor] - A docBase [/opt/jfrog/artifactory/app/artifactory/tomcat/webapps/artifactory.war] inside the host appBase has been specified, and will be ignored
...
2022-08-01T08:41:37.344Z [jfrt ] [WARN ] [ce1b2553475da56b] [c.z.h.p.ProxyConnection:182 ] [ocalhost-startStop-2] - HikariCP Main - Connection org.apache.derby.impl.jdbc.EmbedConnection#1597179442 (XID = 24), (SESSIONID = 3), (DATABASE = {db.home}), (DRDAID = null) marked as broken because of SQLSTATE(0A000), ErrorCode(20000)
java.sql.SQLFeatureNotSupportedException: Feature not implemented: No details.
and these errors:
08:41:34,803 |-ERROR in ch.qos.logback.core.joran.action.AppenderAction - Could not create an Appender of type [org.artifactory.usage.appender.UsageTrafficTimeBasedRollingFileAppender]. ch.qos.logback.core.util.DynamicClassLoadingException: Failed to instantiate type org.artifactory.usage.appender.UsageTrafficTimeBasedRollingFileAppender
...
2022-08-01T08:41:37.113Z [jfrt ] [ERROR] [ce1b2553475da56b] [d.d.l.DbDistributeLocksDao:506] [ocalhost-startStop-2] - Unable to detect database version Unable to get connection from unique lock data source
2022-08-01T08:41:37.353Z [jfrt ] [ERROR] [ce1b2553475da56b] [tifactoryHomeConfigListener:55] [ocalhost-startStop-2] - Failed initializing Home. Caught exception:
java.lang.IllegalStateException: Could not find database table: db_properties
After reading the docs I still don't clearly understand whether I have to download the new docker-compose package from JFrog. I tried, but once there, config.sh asks for an external database and there is no question about reusing the existing image directory.
Thanks for the help.
Not sure if this is still relevant for you, but I had the same problem. As far as I can see, there are new DB connection environment variables that must be set. The error message is just a symptom: without the new DB connection environment variables, Artifactory tries to use an embedded database (Apache Derby). Something like this is needed to fix it (in the Docker Compose config of the Artifactory container):
environment:
- JF_SHARED_DATABASE_TYPE=postgresql
- JF_SHARED_DATABASE_USERNAME=${POSTGRESQL_USERNAME}
- JF_SHARED_DATABASE_PASSWORD=${POSTGRESQL_PASSWORD}
- JF_SHARED_DATABASE_URL=jdbc:postgresql://postgresql:5432/artifactory
- JF_SHARED_DATABASE_DRIVER=org.postgresql.Driver
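In context, the two services in the compose file would then look roughly like this (the image reference, tag, service names, and variable names are placeholders; keep your own values):

services:
  postgresql:
    image: docker.io/library/postgres:9.6.11
    environment:
      # Standard postgres image variables; the database name must match the JDBC URL below.
      - POSTGRES_DB=artifactory
      - POSTGRES_USER=${POSTGRESQL_USERNAME}
      - POSTGRES_PASSWORD=${POSTGRESQL_PASSWORD}
  artifactory:
    # Placeholder: use the registry/tag you are actually upgrading to.
    image: docker.bintray.io/jfrog/artifactory-pro:7.21.3
    depends_on:
      - postgresql
    environment:
      - JF_SHARED_DATABASE_TYPE=postgresql
      - JF_SHARED_DATABASE_USERNAME=${POSTGRESQL_USERNAME}
      - JF_SHARED_DATABASE_PASSWORD=${POSTGRESQL_PASSWORD}
      - JF_SHARED_DATABASE_URL=jdbc:postgresql://postgresql:5432/artifactory
      - JF_SHARED_DATABASE_DRIVER=org.postgresql.Driver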

Flink consumer job is unable to connect to Kafka from Docker

I have a simple streaming Flink Scala job that connects to a Kafka topic and maps its org.apache.avro.generic.GenericRecord messages into JSON strings.
When it runs in IntelliJ it ingests the topic fine and prints out the JSON.
When I run it in docker-compose I get the following exception:
com.esotericsoftware.kryo.KryoException: Error constructing instance of class: org.apache.avro.Schema$LockableArrayList
Serialization trace:
types (org.apache.avro.Schema$UnionSchema)
schema (org.apache.avro.Schema$Field)
fieldMap (org.apache.avro.Schema$RecordSchema)
schema (org.apache.avro.generic.GenericData$Record)
at com.twitter.chill.Instantiators$$anon$1.newInstance(KryoBase.scala:136)
at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1061)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.create(CollectionSerializer.java:89)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:93)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:143)
at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:21)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:679)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:528)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:657)
at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(KryoSerializer.java:262)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.copy(TupleSerializer.java:111)
at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.copy(TupleSerializer.java:37)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:635)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:612)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:592)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:727)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:705)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.emitRecord(KafkaFetcher.java:185)
at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:150)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:715)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:208)
Caused by: java.lang.IllegalAccessException: Class com.twitter.chill.Instantiators$ can not access a member of class org.apache.avro.Schema$LockableArrayList with modifiers "public"
at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:102)
at java.lang.reflect.AccessibleObject.slowCheckMemberAccess(AccessibleObject.java:296)
at java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:288)
at java.lang.reflect.Constructor.newInstance(Constructor.java:413)
at com.twitter.chill.Instantiators$.$anonfun$normalJava$1(KryoBase.scala:170)
at com.twitter.chill.Instantiators$$anon$1.newInstance(KryoBase.scala:133)
... 37 more
I tried forcing Avro serialization with:
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
env.getConfig.disableForceKryo()
env.getConfig.enableForceAvro()
but got the same error.
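(For illustration, not from the original post: one commonly suggested way to keep GenericRecord out of Kryo entirely is to hand Flink explicit Avro type information. This is only a sketch; it assumes flink-avro is on the classpath, and readerSchemaJson, kafkaSource, and env stand in for the job's existing values.)

import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo
import org.apache.flink.streaming.api.scala._

// Assumption: readerSchemaJson holds the Avro reader schema, and kafkaSource is
// the job's existing FlinkKafkaConsumer[GenericRecord].
val schema: Schema = new Schema.Parser().parse(readerSchemaJson)

// Passing explicit Avro type information makes Flink serialize these records
// with the Avro serializer instead of falling back to Kryo.
val records: DataStream[GenericRecord] =
  env.addSource(kafkaSource)(new GenericRecordAvroTypeInfo(schema))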
Based on [this][1] I'm marking all Flink-related dependencies as "provided", with no luck.
What could be the difference between running the job in the IDE and in Docker?
How can I fix the job so it can read the Kafka topic from Docker?
How should I set up Docker for this?
How can I handle the Kryo/Avro serialization issue?
SS
[1]: http://www.alternatestack.com/development/com-esotericsoftware-kryo-kryoexception-unusual-solution-upgrading-flink/

Enable local cache on EMQX

I see in the documentation at https://docs.emqx.io/broker/v3/en/guide.html#emq-x-bridge-cache-configuration that you can enable a file-based cache for when the network fails, but EMQX does not seem to be doing this for me right now.
When I set the parameter on EMQX 3.0.0.0, for example, it fails on start and the log file says that it is not declared:
You've tried to set bridge.xxx.queue.replayq_seg_bytes, but there is no setting with that name.
2020-03-03T19:43:22.777171+03:00 [error] Did you mean one of these?
2020-03-03T19:43:22.962094+03:00 [error] bridge.$name.mqueue_type
2020-03-03T19:43:22.962572+03:00 [error] bridge.$name.clean_start
2020-03-03T19:43:22.962760+03:00 [error] bridge.$name.start_type
2020-03-03T19:43:23.102793+03:00 [error] Error generating configuration in phase transform_datatypes
2020-03-03T19:43:23.103040+03:00 [error] Conf file attempted to set unknown variable: bridge.aps.queue.replayq_seg_bytes
Do you know if it's a problem with my version of EMQX, or possibly a problem with the syntax?
Thanks in advance.
Greetings
It's a syntax error.
bridge.xxx.queue.replayq_seg_bytes
It means: set the queue.replayq_seg_bytes config for the xxx bridge.
bridge.mqtt.xxx.address = 127.0.0.1:1883
Does that exist in your config? By the way, EMQ X v4.0.6 is recommended.
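For example, in EMQ X 4.x the MQTT bridge settings normally live in etc/plugins/emqx_bridge_mqtt.conf and look roughly like this (the bridge name aws and the values below are placeholders; check the config template shipped with your version for the exact keys):

bridge.mqtt.aws.address = 127.0.0.1:1883
bridge.mqtt.aws.clientid = bridge_aws
bridge.mqtt.aws.forwards = sensor/#
## On-disk queue so messages are cached while the remote broker is unreachable
bridge.mqtt.aws.queue.replayq_dir = data/replayq/emqx_aws_bridge/
bridge.mqtt.aws.queue.replayq_seg_bytes = 10MB
bridge.mqtt.aws.queue.max_total_size = 5GB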

Flume NullPointerExceptions on checkpoint

I've set up a file-to-file source/sink, just as a test of basic Flume functionality.
I'm currently using the "exec" source, with the command being "tail -F mytmpfile".
In my script, I continuously echo "....." >> mytmpfile, so that the tail command constitutes a stream.
However, I've started seeing the following exception in the flume logs:
java.lang.IllegalStateException: Channel closed [channel=c1]. Due to
java.lang.NullPointerException: null
at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:353)
at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:122)
at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:183)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.NullPointerException
at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:895)
at org.apache.flume.channel.file.Log.replay(Log.java:406)
at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:303)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:236)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
... 1 more
Any thoughts on where this NullPointerException is coming from? From scanning the code it appears that it may be related to a missing folder or directory, but I can't find the exact line on the GitHub branches.
This is using apache-flume-1.3.1.23-...
In the past I've had problems with file channels, and they've normally boiled down to two problems:
1) If you're running multiple agents on the same box, make sure you configure them with separate dataDirs and checkpointDir (see the sketch after this list).
2) On Linux boxes, check that your tmpfs isn't near its capacity. If it's getting full, Flume will complain. Try stopping the Flume agent, unmounting tmpfs, enlarging it, remounting, and restarting the agent.
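As a sketch of point 1, assuming two hypothetical agents named agent1 and agent2 on the same box, each file channel gets its own, non-overlapping directories in the agents' properties files:

# agent1's file channel
agent1.channels.c1.type = file
agent1.channels.c1.checkpointDir = /var/lib/flume/agent1/checkpoint
agent1.channels.c1.dataDirs = /var/lib/flume/agent1/data

# agent2's file channel uses its own directories, so the two agents never share state
agent2.channels.c1.type = file
agent2.channels.c1.checkpointDir = /var/lib/flume/agent2/checkpoint
agent2.channels.c1.dataDirs = /var/lib/flume/agent2/data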

Why does Hadoop only launch a local job by default?

I have written my own Hadoop program and I can run it in pseudo-distributed mode on my laptop. However, when I put the program on a cluster that can run the Hadoop example jar, it launches a local job by default even though I specify HDFS file paths. The output is below; any suggestions?
./hadoop -jar MyRandomForest_oob_distance.jar hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1
12/03/16 16:21:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/16 16:21:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/16 16:21:25 INFO mapred.JobClient: Running job: job_local_0001
12/03/16 16:21:25 INFO mapred.MapTask: io.sort.mb = 100
12/03/16 16:21:25 INFO mapred.MapTask: data buffer = 79691776/99614720
12/03/16 16:21:25 INFO mapred.MapTask: record buffer = 262144/327680
12/03/16 16:21:25 WARN mapred.LocalJobRunner: job_local_0001
java.io.FileNotFoundException: File /user/randomforest/input/genotype1.txt does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
at Data.Data.loadData(Data.java:103)
at MapReduce.DearMapper.loadData(DearMapper.java:261)
at MapReduce.DearMapper.setup(DearMapper.java:332)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/03/16 16:21:26 INFO mapred.JobClient: map 0% reduce 0%
12/03/16 16:21:26 INFO mapred.JobClient: Job complete: job_local_0001
12/03/16 16:21:26 INFO mapred.JobClient: Counters: 0
Total Running time is: 1 secs
LocalJobRunner has been chosen because your configuration most probably has the mapred.job.tracker property set to local, or not set at all (in which case the default is local). To check, go to <wherever you extracted/installed Hadoop>/etc/hadoop/ and see if the file mapred-site.xml exists (for me it did not; there was a file called mapred-site.xml.template). In that file (create it if it doesn't exist), make sure it has the following property:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
See the source for org.apache.hadoop.mapred.JobClient.init(JobConf)
What is the value of this configuration property in the hadoop configuration on the machine you are submitting this from? Also confirm that the hadoop executable you are running references this configuration (and that you don't have 2+ installations configured differently) - type which hadoop and trace any symlinks you come across.
Alternatively you can override this when you submit your job, if you know the JobTracker host and port number using the -jt option:
hadoop jar MyRandomForest_oob_distance.jar -jt hostname:port hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1
If you're using Hadoop 2 and your job is running locally instead of on the cluster, ensure that you have set up mapred-site.xml to contain the mapreduce.framework.name property with a value of yarn. You also need to set up an aux-service in yarn-site.xml.
Check out the Cloudera Hadoop 2 operator migration blog for more information.
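The aux-service part of yarn-site.xml typically looks like this (these are the standard Hadoop 2 property names; adjust paths and values to your distribution):

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>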
I had the same problem: every MapReduce v2 (MRv2) / YARN task ran only with the mapred.LocalJobRunner:
INFO mapred.LocalJobRunner: Starting task: attempt_local284299729_0001_m_000000_0
The ResourceManager and NodeManagers were accessible, and mapreduce.framework.name was set to yarn.
Setting the HADOOP_MAPRED_HOME before executing the job fixed the problem for me.
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
cheers
dan
