With Spring Boot + OpenTelemetry, the stack trace for every exception runs into hundreds of lines. Apart from the top few, most of the lines are usually not useful for troubleshooting the issue. The default truncation can solve the large-event problem, but because the cut-off point is arbitrary we may miss some key data.
Our requirement is to produce something of this sort in the JsonTemplateLayout context:
java.lang.Exception: top-level-exception
at MyClass.methodA(MyClass.java:25)
<15 more frames, maximum>
...truncated...
Caused by: java.lang.RuntimeException: root-cause-exception
at SomeOtherClass.methodB(SomeOtherClass.java:55)
<15 more frames, maximum>
...truncated...
This way we don't lose the cause chain to truncation, but at the same time we don't keep more than N frames at each level of the chain (N = 16 in the example above).
If there were a way to pass in a custom resolver for the stack trace this could work, but I couldn't find one.
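For illustration, here is a plain-JDK sketch of the per-cause truncation we are after; it could run before the throwable reaches the layout. The class name and the 16-frame limit are arbitrary choices of ours:

import java.util.Arrays;

public final class StackTraceTrimmer {
    private static final int MAX_FRAMES_PER_CAUSE = 16;

    private StackTraceTrimmer() {}

    // Truncates every throwable in the cause chain to at most MAX_FRAMES_PER_CAUSE frames.
    public static Throwable trim(Throwable root) {
        for (Throwable t = root; t != null; t = t.getCause()) {
            StackTraceElement[] frames = t.getStackTrace();
            if (frames.length > MAX_FRAMES_PER_CAUSE) {
                t.setStackTrace(Arrays.copyOf(frames, MAX_FRAMES_PER_CAUSE));
            }
        }
        return root;
    }
}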
I have a Spring application which builds a reactive pipeline as follows:
buildPipeline() // returns a Flux based on changeStreamEvents or Kafka receives
    .bufferTimeout(capacity, Duration.ofSeconds(1))
    .flatMap(r -> {
        Element x = r.get(r.size() - 1);
        // some processing on the element and the batch obtained
        return process(x, r); // placeholder for the actual processing, which must return a Publisher
    })
    .doOnError(e -> log.info("error occurred: " + e.toString()))
    .subscribe();
However, I see my application intermittently throwing the error below:
java.lang.IllegalArgumentException: 3.9 While the Subscription is not cancelled, Subscription.request(long n) MUST throw a java.lang.IllegalArgumentException if argument <= 0
at com.mongodb.reactivestreams.client.internal.ObservableToPublisher$1$1.request(ObservableToPublisher.java:43)
at reactor.core.publisher.FluxMap$MapSubscriber.request(FluxMap.java:155)
at reactor.core.publisher.FluxBufferTimeout$BufferTimeoutSubscriber.requestMore(FluxBufferTimeout.java:317)
I'm not able to determine what is wrong, and why the stream is terminating with this error.
Any help would be highly appreciated.
The application started throwing this error after I added bufferTimeout to introduce batching. Before that, I had never encountered this exception.
I'm also not sure how to replicate the issue, as it does not occur locally or in UAT, only in the application's production environment.
Any leads would be helpful.
Thanks!
Try adding an onBackpressureBuffer(), so that in case of low demand this operator buffers the incoming elements and emits them in a controlled way.
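A minimal sketch of where it could sit in the pipeline from the question (the buffer size and the process call are illustrative placeholders, not part of the original code):

buildPipeline()
    // Buffer elements when downstream demand is low instead of letting an
    // invalid request(n <= 0) reach the upstream publisher.
    .onBackpressureBuffer(10_000)
    .bufferTimeout(capacity, Duration.ofSeconds(1))
    .flatMap(r -> process(r)) // hypothetical batch processing returning a Publisher
    .doOnError(e -> log.info("error occurred: " + e))
    .subscribe();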
Processing streaming events and writing files into hourly buckets is a challenge because of windowing: some events from the incoming hour can end up in previous buckets, and so on.
I've been digging into Apache Beam and its triggers, but I'm struggling to trigger based on timestamps. Here is my current setup:
Window.<GenericRecord>into(FixedWindows.of(Duration.standardMinutes(1)))
.triggering(AfterProcessingTime
.pastFirstElementInPane()
.plusDelayOf(Duration.standardSeconds(1)))
.withAllowedLateness(Duration.ZERO)
.discardingFiredPanes()
This is what I've been doing so far: triggering one-minute windows regardless of the timestamp. However, I would like to use the timestamp contained in the object so that a window fires only for the elements that belong to it.
Window.<GenericRecord>into(FixedWindows.of(Duration.standardMinutes(1)))
.triggering(AfterWatermark
.pastEndOfWindow())
.withAllowedLateness(Duration.ZERO)
.discardingFiredPanes()
The objects I'm dealing with have a timestamp field; however, it is a long, not an Instant.
"{ \"name\": \"timestamp\", \"type\": \"long\", \"logicalType\": \"timestamp-micros\" },"
Keeping that long field in my POJO class triggers nothing, but if I swap it for an Instant and recreate the object accordingly, the following error is thrown whenever a Pub/Sub message is read.
Caused by: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to java.lang.Long
I've also been thinking of creating a kind of wrapper class around GenericRecord that contains a timestamp, but I would only need the GenericRecord part of it once it's ready to be written to .parquet with FileIO.
Which other ways do I have to use watermark triggers?
EDIT: After @Anton's comments, I've tried the following.
.apply("Apply timestamps", WithTimestamps.of(
(SerializableFunction<GenericRecord, Instant>) item -> new Instant(Long.valueOf(item.get("timestamp").toString())))
.withAllowedTimestampSkew(Duration.standardSeconds(30)))
Even though it has been deprecated, this seems to pass through the pipeline, but it is still not written (it's still getting discarded before writing, for some reason, by the trigger shown earlier?).
I also tried the other approach mentioned, using outputWithTimestamp, but due to the delay it prints the following error:
Caused by: java.lang.IllegalArgumentException: Cannot output with timestamp 2019-06-12T18:59:58.609Z. Output timestamps must be no earlier than the timestamp of the current input (2019-06-12T18:59:59.848Z) minus the allowed skew (0 milliseconds). See the DoFn#getAllowedTimestampSkew() Javadoc for details on changing the allowed skew.
I have a Cloud Dataflow pipeline in which I alter the original timestamp of each event in order to simulate real-world scenarios of events arriving late. However, it appears I'm dropping some percentage of my events on each run of the pipeline. Inside my DoFn I use the following code to change the timestamp:
Instant newTimestamp = originalTimestamp.minus(Duration.standardMinutes(RANDOM.nextInt(15)));
c.outputWithTimestamp(KV.of(Integer.toString(RANDOM.nextInt(100)), element), newTimestamp);
The problem is most likely caused by your DoFn step outputting a timestamp that is earlier than the timestamp that was received by the processing step minus the allowed timestamp skew. The exception that would be thrown can be found here in the code:
https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/util/DoFnRunnerBase.java#L493
This behavior is documented with regard to using outputWithTimestamp here:
https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/DoFn.Context#outputWithTimestamp-OutputT-org.joda.time.Instant-
While you could override the getAllowedTimestampSkew function, it is also documented that this might cause unpredictable issues with the watermark calculations, so it should only be used without windowing/grouping.
https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/DoFn#getAllowedTimestampSkew--
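For illustration, a sketch of a DoFn written against the older Dataflow SDK referenced above; the class name, element types, and constants are placeholders, and the 15-minute skew simply matches the shift used in the question:

import java.util.Random;

import org.joda.time.Duration;
import org.joda.time.Instant;

import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.values.KV;

// Shifts each element's timestamp up to 15 minutes into the past and declares a
// matching allowed skew so that outputWithTimestamp does not throw.
class ShiftTimestampFn extends DoFn<KV<String, String>, KV<String, String>> {
    private static final Duration MAX_SHIFT = Duration.standardMinutes(15);
    private static final Random RANDOM = new Random();

    @Override
    public Duration getAllowedTimestampSkew() {
        // The default skew is zero, which triggers the IllegalArgumentException above.
        return MAX_SHIFT;
    }

    @Override
    public void processElement(ProcessContext c) {
        Instant newTimestamp = c.timestamp()
                .minus(Duration.standardMinutes(RANDOM.nextInt(15)));
        c.outputWithTimestamp(c.element(), newTimestamp);
    }
}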
I'm getting the following exception when running the pipeline locally. There is no exception when submitting for cloud execution.
Thanks,
Genady
INFO: Executing pipeline using the DirectPipelineRunner.
Exception in thread "main" java.lang.IllegalStateException: no evaluator registered for GroupedValues [GroupedValues]
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.visitTransform(DirectPipelineRunner.java:606)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:200)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:196)
at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:109)
at com.google.cloud.dataflow.sdk.Pipeline.traverseTopologically(Pipeline.java:204)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.run(DirectPipelineRunner.java:583)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:327)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:70)
at app.Main.main(Main.java:124)
The code outline is basically this:
PCollection<KV<MyKey, Iterable<MyValue>>> groupedByMyKey = ...
PCollection<KV<MyKey, MyAggregated>> aggregated = groupedByMyKey.apply(
Combine.<MyKey, MyValue, MyAggregated>groupedValues(new Aggregator()));
The Aggregator class extends CombineFn<MyValue, List<MyValue>, MyAggregated>.
Can you share a code snippet that triggers this? GroupedValues is a PTransform that is often used within various combining transforms, so it might be from using something like Min, Max, etc.
The error means that the DirectPipelineRunner doesn't know how to evaluate a GroupedValues. However, that's unexpected, since that should have been expanded into a ParDo before execution.
I found the reason for this behaviour.
I was using a command-line argument to run it in remote mode (--runner=BlockingDataflowPipelineRunner) and then forced it to run locally with:
PipelineRunner<?> runner = DirectPipelineRunner.fromOptions(options);
runner.run(p);
After removing these lines and just using the --runner=DirectPipelineRunner argument, it worked as expected.
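For completeness, a sketch of letting the options choose the runner rather than constructing one directly (the exact options class is an assumption; anything extending PipelineOptions works):

// Build options from the command line and let them select the runner,
// e.g. --runner=DirectPipelineRunner or --runner=BlockingDataflowPipelineRunner.
DataflowPipelineOptions options = PipelineOptionsFactory
        .fromArgs(args)
        .withValidation()
        .as(DataflowPipelineOptions.class);

Pipeline p = Pipeline.create(options);
// ... build the pipeline ...
p.run(); // the runner configured via the options executes the pipeline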
I've recently started working with Neo4J and so far I haven't been able to find an answer to the problems I'm having, in particular with the server. I'm using version 1.8.1 and running the server as a service on Windows, not embedded. The graph I have has around 7m nodes and nearly 11m relationships.
With small queries, and multiples of them, things run nicely. However, when I try to pull back more complex queries, potentially thousands of rows, things go sour. If I'm using the console, I get nothing, and then after a few minutes or more the word undefined appears (it's trying to do something in JavaScript, but I'm not sure what). If I'm using Neo4jClient in .NET, it times out (I'm working through a WCF service), and I suspect my problems are server-side.
Here is a sample cypher query that has caused me problems in the console:
start begin = node:idx(ID="1234")
MATCH begin-[r1?:RELATED_TO]-n1-[r2?:RELATED_TO]-n2-[r3?:RELATED_TO]-n3-[r4?:RELATED_TO]-n4
RETURN begin.Title?, r1.RelationType?, n1.Title?, r2.RelationType?, n2.Title?, r3.RelationType?, n3.Title?, r4.RelationType?, n4.Title?;
I've looked through the logs and I'm receiving the following severe error:
SEVERE: The exception contained within MappableContainerException could not be mapped to a response, re-throwing to the HTTP container
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuffer.append(Unknown Source)
at java.io.StringWriter.write(Unknown Source)
at java.io.PrintWriter.newLine(Unknown Source)
at java.io.PrintWriter.println(Unknown Source)
at java.io.PrintWriter.println(Unknown Source)
at org.neo4j.cypher.PipeExecutionResult$$anonfun$dumpToString$1.apply(PipeExecutionResult.scala:96)
at org.neo4j.cypher.PipeExecutionResult$$anonfun$dumpToString$1.apply(PipeExecutionResult.scala:96)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
at scala.collection.immutable.List.foreach(List.scala:45)
at org.neo4j.cypher.PipeExecutionResult.dumpToString(PipeExecutionResult.scala:96)
at org.neo4j.cypher.PipeExecutionResult.dumpToString(PipeExecutionResult.scala:124)
at org.neo4j.cypher.javacompat.ExecutionResult.toString(ExecutionResult.java:90)
at org.neo4j.shell.kernel.apps.Start.exec(Start.java:72)
at org.neo4j.shell.kernel.apps.ReadOnlyGraphDatabaseApp.execute(ReadOnlyGraphDatabaseApp.java:32)
at org.neo4j.shell.impl.AbstractAppServer.interpretLine(AbstractAppServer.java:127)
at org.neo4j.shell.kernel.GraphDatabaseShellServer.interpretLine(GraphDatabaseShellServer.java:92)
at org.neo4j.shell.impl.AbstractClient.evaluate(AbstractClient.java:130)
at org.neo4j.shell.impl.AbstractClient.evaluate(AbstractClient.java:114)
at org.neo4j.server.webadmin.rest.ShellSession.evaluate(ShellSession.java:96)
at org.neo4j.server.webadmin.rest.ConsoleService.exec(ConsoleService.java:123)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
Taking an educated guess from the stack trace, is it that I'm pulling back too many records, since it's running out of memory whilst expanding the StringBuffer?
I've wondered whether GC could be playing a part, so I got hold of GCViewer. It doesn't seem to be garbage collection; I can add a screenshot from GCViewer if you think it would be useful.
I've allocated the JVM anywhere between the default value and 8 GB of memory. Here are some of my settings from my configuration files (I'll try to include only the relevant ones). Let me know if you need any more.
Neo4J.properties
# Default values for the low-level graph engine
use_memory_mapped_buffers=false
# Keep logical logs, helps debugging but uses more disk space, enabled for legacy reasons
keep_logical_logs=true
Neo4J-server.properties
# HTTP logging is disabled. HTTP logging can be enabled by setting this property to 'true'.
org.neo4j.server.http.log.enabled=false
Neo4J-Wrapper.conf (possibly inexpertly slotted together)
# Uncomment the following line to enable garbage collection logging
wrapper.java.additional.4=-Xloggc:data/log/neo4j-gc.log
# Setting a different Garbage Collector as recommended by Neo4J
wrapper.java.additional.5=-XX:+UseConcMarkSweepGC
# other beneficial settings that should boost performance
wrapper.java.additional.6=-d64
wrapper.java.additional.7=-server
wrapper.java.additional.8=-Xss1024k
# Initial Java Heap Size (in MB)
wrapper.java.initmemory=1024
# Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=8000
Any help would be gratefully appreciated.
Your query is simply too complex. When you have such a large graph, then to be sure you won't reach your heap memory limit, you must have appropriate memory allocated. You might want to play a little bit with this configuration (see the I/O examples in the Neo4j docs).
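For instance, the 1.x memory-mapped store settings in neo4j.properties look like the following; the sizes here are only illustrative, not recommendations for your data set:

use_memory_mapped_buffers=true
neostore.nodestore.db.mapped_memory=300M
neostore.relationshipstore.db.mapped_memory=2G
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=500M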
However, your query could be simplified to this:
start begin = node:idx(ID="1234")
MATCH p=begin-[r1:RELATED_TO*0..4]-n4
RETURN p
Craig, the problem is that you're using the Neo4j shell, which is just an ops tool: it collects the data in memory before sending it back, and it was never meant to handle huge result sets.
You probably want to run your queries directly against the HTTP endpoint with streaming enabled (the X-Stream: true HTTP header); then you won't have that problem any more.
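In case it helps, a minimal sketch of calling the Cypher REST endpoint with streaming enabled; the query is the simplified one from above, while the URL, class name, and everything else are assumptions to adjust for your setup:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class StreamingCypherExample {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:7474/db/data/cypher"); // adjust host/port
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Accept", "application/json");
        // Ask the server to stream the result instead of assembling it in memory first.
        conn.setRequestProperty("X-Stream", "true");

        String body = "{\"query\": \"start begin = node:idx(ID='1234') "
                + "MATCH p=begin-[:RELATED_TO*0..4]-n4 RETURN p\", \"params\": {}}";
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }

        System.out.println("HTTP " + conn.getResponseCode());
        // Read conn.getInputStream() incrementally to consume the streamed rows.
    }
}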