The reactor documentation states the following:
Nothing happens until you subscribe
If that was true, why do I see a java.lang.NullPointerException when I run the following code snippet, which has a reactor chain without a subscription?
#Test
void test() {
String a = null;
Flux.just(a.toLowerCase())
.doOnNext(System.out::println);
}
Deepak,
Nothing happens means the data will not be flowing through the chain of your functions to your consumers until a subscription happens.
You're getting NPE because Java tries to compute the value which is given to a hot operator just() on the Flux definition step.
You can also convert just() to a cold operator using defer() so you will receive NPE only after a subscription happened:
public Flux<String> test() {
String a = null;
return Flux.defer(() -> Flux.just(a.toLowerCase()))
.doOnNext(System.out::println);
}
Please, read more about hot vs hold operators.
Update:
Small example of cold and hot publishers. Each time new subscription happens cold publisher's body is recalculated. Meanwhile, just() is only producing time that was calculated only once at definition time.
Mono<Date> currentTime = Mono.just(Calendar.getInstance().getTime());
Mono<Date> realCurrentTime = Mono.defer(() -> Mono.just(Calendar.getInstance().getTime()));
// 1 sec sleep
Thread.sleep(1000);
currentTime.subscribe(time -> System.out.println("Current Time " + time.getTime()));
realCurrentTime.subscribe(time -> System.out.println("Real current Time " + time.getTime()));
Thread.sleep(2000);
currentTime.subscribe(time -> System.out.println("Current Time " + time.getTime()));
realCurrentTime.subscribe(time -> System.out.println("Real current Time " + time.getTime()));
The output is:
Current Time 1583788755759
Real current Time 1583788756826
Current Time 1583788755759
Real current Time 1583788758833
Related
For some time have been using create an EmitterProcessor with built in sink as follows:
EmitterProcessor<String> emitter = EmitterProcessor.create();
FluxSink<String> sink = emitter.sink(FluxSink.OverflowStrategy.LATEST);
The sink publishes using a Flux .from command
Flux<String> out = Flux
.from(emitter
.log(log.getName()));
and the sink can be passed around, and populated with strings, simply using the next instruction.
Now we see that EmitterProcessor is deprecated.
It's all replaced with Sinks.many() like this
Many<String> sink = Sinks.many().unicast().onBackpressureBuffer();
but how to use that to publish from?
The answer was casting the Sinks.many() to asFlux()
Flux<String> out = Flux
.from(sink.asFlux()
.log(log.getName()));
Also using that for cancel and termination of the flux
sink.asFlux().doOnCancel(() -> {
cancelSink(id, request);
});
/* Handle errors, eviction, expiration */
sink.asFlux().doOnTerminate(() -> {
disposeSink(id);
});
UPDATE The cancel and terminate don't appear to work per this question
In Reactor Netty, when sending data to TCP channel via out.send(publisher), one would expect any publisher to work. However, if instead of a simple immediate Flux we use a more complex one with delayed elements, then it stops working properly.
For example, if we take this hello world TCP echo server, it works as expected:
import reactor.core.publisher.Flux;
import reactor.netty.DisposableServer;
import reactor.netty.tcp.TcpServer;
import java.time.Duration;
public class Reactor1 {
public static void main(String[] args) throws Exception {
DisposableServer server = TcpServer.create()
.port(3344)
.handle((in, out) -> in
.receive()
.asString()
.flatMap(s ->
out.sendString(Flux.just(s.toUpperCase()))
))
.bind()
.block();
server.channel().closeFuture().sync();
}
}
However, if we change out.sendString to
out.sendString(Flux.just(s.toUpperCase()).delayElements(Duration.ofSeconds(1)))
then we would expect that for each received item an output will be produced with one second delay.
However, the way server behaves is that if it receives multiple items during the interval, it will produce output only for the first item. For example, below we type aa and bb during the first second, but only AA gets produced as output (after one second):
$ nc localhost 3344
aa
bb
AA <after one second>
Then, if we later type additional line, we get output (after one second) but from the previous input:
cc
BB <after one second>
Any ideas how to make send() work as expected with a delayed Flux?
I think you shouldn't recreate publisher for the out.sendString(...)
This works:
DisposableServer server = TcpServer.create()
.port(3344)
.handle((in, out) -> out
.options(NettyPipeline.SendOptions::flushOnEach)
.sendString(in.receive()
.asString()
.map(String::toUpperCase)
.delayElements(Duration.ofSeconds(1))))
.bind()
.block();
server.channel().closeFuture().sync();
Try to use concatMap. This works:
DisposableServer server = TcpServer.create()
.port(3344)
.handle((in, out) -> in
.receive()
.asString()
.concatMap(s ->
out.sendString(Flux.just(s.toUpperCase())
.delayElements(Duration.ofSeconds(1)))
))
.bind()
.block();
server.channel().closeFuture().sync();
Delaying on the incoming traffic
DisposableServer server = TcpServer.create()
.port(3344)
.handle((in, out) -> in
.receive()
.asString()
.timestamp()
.delayElements(Duration.ofSeconds(1))
.concatMap(tuple2 ->
out.sendString(
Flux.just(tuple2.getT2().toUpperCase() +
" " +
(System.currentTimeMillis() - tuple2.getT1())
))
))
.bind()
.block();
Is it normal that in Quartz, for the JMX Attribute CurrentlyExecutingJobs=> [item] => jobRunTime always is "-1" while it is currently running, or is there some setting in Quartz to ensure the jobRunTime is updated appropriately?
(confirmed via jconsole, Mission Control, and jmx code)
Usecase is to track/monitor long-running jobs, and thought jobRunTime would be the appropriate path. The alternative path is "fireTime" + CURRENT_NOW calculation, but wanted to avoid extra calculation if it was already occurring somewhere.
After chasing this around, this particular value is not updated without it being manually set. Reviewing tools that monitor Quartz jobs, such as Javamelody, they have to calculate it every time too:
elapsedTime = System.currentTimeMillis()- quartzAdapter.getContextFireTime(jobExecutionContext).getTime();
If you want to manually update the jobruntime value for long-running jobs to check the value rather than calculating it outside, you have to change every job you have to support this feature. Here is a rough example that can be modified for your needs sourced from: https://github.com/dhartford/quartz-snippets/blob/master/update_jobruntime_timer_innerclass
/**
* inner class to handle scheduled updates of the Quartz jobruntime attribute
*/
class UpdateJobTimer extends TimerTask{
private JobExecutionContextImpl jec;
/* usage example, such as at the start of the execute method of the Job interface:
* Timer timer = new Timer();
* //update every 10 seconds (in milliseconds), whatever poll timing you want
* timer.schedule(new UpdateJobTimer(jec), 0, 10000);
* ...
* timer.cancel(); //do cleanup in all appropriate spots
*/
UpdateJobTimer(JobExecutionContextImpl jec){
this.jec = jec;
}
#Override
public void run() {
long runtimeinms = jec.getFireTime().getTime() - new java.util.Date().getTime();
jec.setJobRunTime(runtimeinms);
System.out.println("DEBUG TIMERTASK on JOB: " + jec.getJobDetail().getKey().getName() + " triggered [" + jec.getFireTime() + "] updated [" + new java.util.Date() + "]" );
}
}`
I have a Pub/Sub topic + subscription and want to consume and aggregate the unbounded data from the subscription in a Dataflow. I use a fixed window and write the aggregates to BigQuery.
Reading and writing (without windowing and aggregation) works fine. But when I pipe the data into a fixed window (to count the elements in each window) the window is never triggered. And thus the aggregates are not written.
Here is my word publisher (it uses kinglear.txt from the examples as input file):
public static class AddCurrentTimestampFn extends DoFn<String, String> {
#ProcessElement public void processElement(ProcessContext c) {
c.outputWithTimestamp(c.element(), new Instant(System.currentTimeMillis()));
}
}
public static class ExtractWordsFn extends DoFn<String, String> {
#ProcessElement public void processElement(ProcessContext c) {
String[] words = c.element().split("[^a-zA-Z']+");
for (String word:words){ if(!word.isEmpty()){ c.output(word); }}
}
}
// main:
Pipeline p = Pipeline.create(o); // 'o' are the pipeline options
p.apply("ReadLines", TextIO.Read.from(o.getInputFile()))
.apply("Lines2Words", ParDo.of(new ExtractWordsFn()))
.apply("AddTimestampFn", ParDo.of(new AddCurrentTimestampFn()))
.apply("WriteTopic", PubsubIO.Write.topic(o.getTopic()));
p.run();
Here is my windowed word counter:
Pipeline p = Pipeline.create(o); // 'o' are the pipeline options
BigQueryIO.Write.Bound tablePipe = BigQueryIO.Write.to(o.getTable(o))
.withSchema(o.getSchema())
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND);
Window.Bound<String> w = Window
.<String>into(FixedWindows.of(Duration.standardSeconds(1)));
p.apply("ReadTopic", PubsubIO.Read.subscription(o.getSubscription()))
.apply("FixedWindow", w)
.apply("CountWords", Count.<String>perElement())
.apply("CreateRows", ParDo.of(new WordCountToRowFn()))
.apply("WriteRows", tablePipe);
p.run();
The above subscriber will not work, since the window does not seem to trigger using the default trigger. However, if I manually define a trigger the code works and the counts are written to BigQuery.
Window.Bound<String> w = Window.<String>into(FixedWindows.of(Duration.standardSeconds(1)))
.triggering(AfterProcessingTime
.pastFirstElementInPane()
.plusDelayOf(Duration.standardSeconds(1)))
.withAllowedLateness(Duration.ZERO)
.discardingFiredPanes();
I like to avoid specifying custom triggers if possible.
Questions:
Why does my solution not work with Dataflow's default trigger?
How do I have to change my publisher or subscriber to trigger windows using the default trigger?
How are you determining the trigger never fires?
Your PubSubIO.Write and PubSubIO.Read transforms should both specify a timestamp label using withTimestampLabel, otherwise the timestamps you've added will not be written to PubSub and the publish times will be used.
Either way, the input watermark of the pipeline will be derived from the timestamps of the elements waiting in PubSub. Once all inputs have been processed, it will stay back for a few minutes (in case there was a delay in the publisher) before advancing to real time.
What you are likely seeing is that all the elements are published in the same ~1 second window (since the input file is pretty small). These are all read and processed relatively quickly, but the 1-second window they are put in will not trigger until after the input watermark has advanced, indicating that all data in that 1-second window has been consumed.
This won't happen until several minutes, which may make it look like the trigger isn't working. The trigger you wrote fired after 1 second of processing time, which would fire much earlier, but there is no guarantee all the data has been processed.
Steps to get better behavior from the default trigger:
Use withTimestampLabel on both the write and read pubsub steps.
Have the publisher spread the timestamps out further (eg., run for several minutes and spread the timestamps out across that range)
We have a Grails project that runs behind a load balancer. There are three instances of the Grails application running on the server (using separate Tomcat instances). Each instance has its own searchable index. Because the indexes are separate, the automatic update is not enough keeping the index consistent between the application instances. Because of this we have disabled the searchable index mirroring and updates to the index are done manually in a scheduled quartz job. According to our understanding no other part of the application should modify the index.
The quartz job runs once a minute and it checks from the database which rows have been updated by the application, and re-indexes those objects. The job also checks if the same job is already running so it doesn’t do any concurrent indexing. The application runs fine for few hours after the startup and then suddenly when the job is starting, LockObtainFailedException is thrown:
22.10.2012 11:20:40 [xxxx.ReindexJob] ERROR Could not update searchable index, class org.compass.core.engine.SearchEngineException:
Failed to open writer for sub index [product]; nested exception is
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
out:
SimpleFSLock#/home/xxx/tomcat/searchable-index/index/product/lucene-a7bbc72a49512284f5ac54f5d7d32849-write.lock
According to the log the last time the job was executed, re-indexing was done without any errors and the job finished successfully. Still, this time the re-index operation throws the locking exception, as if the previous operation was unfinished and the lock had not been released. The lock will not be released until the application is restarted.
We tried to solve the problem by manually opening the locked index, which causes the following error to be printed to the log:
22.10.2012 11:21:30 [manager.IndexWritersManager ] ERROR Illegal state, marking an index writer as open, while another is marked as
open for sub index [product]
After this the job seems to be working correctly and doesn’t become stuck in a locked state again. However this causes the application to constantly use 100 % of the CPU resource. Below is a shortened version of the quartz job code.
Any help would be appreciated to solve the problem, thanks in advance.
class ReindexJob {
def compass
...
static Calendar lastIndexed
static triggers = {
// Every day every minute (at xx:xx:30), start delay 2 min
// cronExpression: "s m h D M W [Y]"
cron name: "ReindexTrigger", cronExpression: "30 * * * * ?", startDelay: 120000
}
def execute() {
if (ConcurrencyHelper.isLocked(ConcurrencyHelper.Locks.LUCENE_INDEX)) {
log.error("Search index has been locked, not doing anything.")
return
}
try {
boolean acquiredLock = ConcurrencyHelper.lock(ConcurrencyHelper.Locks.LUCENE_INDEX, "ReindexJob")
if (!acquiredLock) {
log.warn("Could not lock search index, not doing anything.")
return
}
Calendar reindexDate = lastIndexed
Calendar newReindexDate = Calendar.instance
if (!reindexDate) {
reindexDate = Calendar.instance
reindexDate.add(Calendar.MINUTE, -3)
lastIndexed = reindexDate
}
log.debug("+++ Starting ReindexJob, last indexed ${TextHelper.formatDate("yyyy-MM-dd HH:mm:ss", reindexDate.time)} +++")
Long start = System.currentTimeMillis()
String reindexMessage = ""
// Retrieve the ids of products that have been modified since the job last ran
String productQuery = "select p.id from Product ..."
List<Long> productIds = Product.executeQuery(productQuery, ["lastIndexedDate": reindexDate.time, "lastIndexedCalendar": reindexDate])
if (productIds) {
reindexMessage += "Found ${productIds.size()} product(s) to reindex. "
final int BATCH_SIZE = 10
Long time = TimeHelper.timer {
for (int inserted = 0; inserted < productIds.size(); inserted += BATCH_SIZE) {
log.debug("Indexing from ${inserted + 1} to ${Math.min(inserted + BATCH_SIZE, productIds.size())}: ${productIds.subList(inserted, Math.min(inserted + BATCH_SIZE, productIds.size()))}")
Product.reindex(productIds.subList(inserted, Math.min(inserted + BATCH_SIZE, productIds.size())))
Thread.sleep(250)
}
}
reindexMessage += " (${time / 1000} s). "
} else {
reindexMessage += "No products to reindex. "
}
log.debug(reindexMessage)
// Re-index brands
Brand.reindex()
lastIndexed = newReindexDate
log.debug("+++ Finished ReindexJob (${(System.currentTimeMillis() - start) / 1000} s) +++")
} catch (Exception e) {
log.error("Could not update searchable index, ${e.class}: ${e.message}")
if (e instanceof org.apache.lucene.store.LockObtainFailedException || e instanceof org.compass.core.engine.SearchEngineException) {
log.info("This is a Lucene index locking exception.")
for (String subIndex in compass.searchEngineIndexManager.getSubIndexes()) {
if (compass.searchEngineIndexManager.isLocked(subIndex)) {
log.info("Releasing Lucene index lock for sub index ${subIndex}")
compass.searchEngineIndexManager.releaseLock(subIndex)
}
}
}
} finally {
ConcurrencyHelper.unlock(ConcurrencyHelper.Locks.LUCENE_INDEX, "ReindexJob")
}
}
}
Based on JMX CPU samples, it seems that Compass is doing some scheduling behind the scenes. From 1 minute CPU samples it seems like there are few things different when normal and 100% CPU instances are compared:
org.apache.lucene.index.IndexWriter.doWait() is using most of the CPU time.
Compass Scheduled Executor Thread is shown in the thread list, this was not seen in a normal situation.
One Compass Executor Thread is doing commitMerge, in a normal situation none of these threads was doing commitMerge.
You can try increasing the 'compass.transaction.lockTimeout' setting. The default is 10 (seconds).
Another option is to disable concurrency in Compass and make it synchronous. This is controlled with the 'compass.transaction.processor.read_committed.concurrentOperations': 'false' setting. You might also have to set 'compass.transaction.processor' to 'read_committed'
These are the compass settings we are currently using:
compassSettings = [
'compass.engine.optimizer.schedule.period': '300',
'compass.engine.mergeFactor':'1000',
'compass.engine.maxBufferedDocs':'1000',
'compass.engine.ramBufferSize': '128',
'compass.engine.useCompoundFile': 'false',
'compass.transaction.processor': 'read_committed',
'compass.transaction.processor.read_committed.concurrentOperations': 'false',
'compass.transaction.lockTimeout': '30',
'compass.transaction.lockPollInterval': '500',
'compass.transaction.readCommitted.translog.connection': 'ram://'
]
This has concurrency switched off. You can make it asynchronous by changing the 'compass.transaction.processor.read_committed.concurrentOperations' setting to 'true'. (or removing the entry).
Compass configuration reference:
http://static.compassframework.org/docs/latest/core-configuration.html
Documentation for the concurrency of read_committed processor:
http://www.compass-project.org/docs/latest/reference/html/core-searchengine.html#core-searchengine-transaction-read_committed
If you want to keep async operations, you can also control the number of threads it uses. Using compass.transaction.processor.read_committed.concurrencyLevel=1 setting would allow asynchronous operations but just use one thread (the default is 5 threads). There are also the compass.transaction.processor.read_committed.backlog and compass.transaction.processor.read_committed.addTimeout settings.
I hope this helps.