Why does the MongoDB source always return the entire collection instead of only newly added documents? - spring-cloud-dataflow

This is the DSL used to create a MongoDB source that ingests into a log sink:
stream create --name mongodb7hdfs7 --definition
"mongodb120
--database=sourcedb
--host=172.20.74.91
--fixed-delay=5
--cron='*/10 * * * * *'
--initial-delay=5
--max-messages=1
--collection=sourceCol | log" --deploy
When monitoring the log, we always see the entire collection ingested into the sink: every 10 seconds, all of the documents in the collection are printed again. What we expect is for only newly added documents to be ingested, so that only the latest added document is printed.
Can someone help on this? Much appreciated!
BTW, the http source only ever ingests newly added data into the sink.

Related

Launching composed task built by DSL from stream application

Every example I've seen (the task-launcher sink and the triggertask source) shows how to launch a task defined by its uri attribute.
My task definitions look like this:
sampleTask <t2: timestamp || t1: timestamp>
sampleTask-t1 timestamp
sampleTask-t2 timestamp
sampleTaskRunner composed-task-runner --graph=sampleTask
My question is: how do I launch the composed task runner (sampleTaskRunner, defined by DSL) from a stream application?
Thanks
UPDATE
I ended up with the solution below, which triggers the task using the SCDF REST API:
composedTask definition:
<timestamp || mySampleTask>
Stream definition:
http | httpclient | log
Deployment properties:
app.http.port=81
app.httpclient.body=name=composedTask&arguments=--increment-instance-enabled=true
app.httpclient.http-method=POST
app.httpclient.url=http://localhost:9393/tasks/executions
app.httpclient.headers-expression={'Content-Type':'application/x-www-form-urlencoded'}
Though it's easy to implement an http sink component, it would be great if the stream application starters provided one out of the box.
Another concern I have is discovering the SCDF REST URL when deployed in a distributed environment.
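For anyone wanting to make the same call outside of a stream, here is a rough Java sketch of the REST request the httpclient app sends. The endpoint, task name, and arguments come from the deployment properties above; the class name and the use of plain HttpURLConnection are just illustrative, not part of SCDF:
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
public class LaunchComposedTask {
    public static void main(String[] args) throws Exception {
        // Endpoint taken from app.httpclient.url (SCDF server assumed at localhost:9393)
        URL url = new URL("http://localhost:9393/tasks/executions");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        // Form body matching app.httpclient.body: task name plus its command-line arguments
        String body = "name=composedTask&arguments=--increment-instance-enabled=true";
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        // A 2xx response means SCDF accepted the task execution request
        System.out.println("HTTP status: " + conn.getResponseCode());
    }
}
The same request can of course be issued with curl or any other HTTP client; the httpclient processor in the stream is essentially doing this for every message it receives.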
Here's a quick take from one of the SCDF's R&D team members (Glenn Renfro).
stream create foozer --definition "trigger --fixed-delay=5 | tasklaunchrequest-transform --uri=maven://org.springframework.cloud.task.app:composedtaskrunner-task:1.1.0.BUILD-SNAPSHOT --command-line-arguments='--graph=sampleTask-t1||sampleTask-t2 --increment-instance-enabled=true --spring.datasource.url=jdbc:mariadb://localhost:3306/test --spring.datasource.username=root --spring.datasource.password=password --spring.datasource.driverClassName=org.mariadb.jdbc.Driver' | task-launcher-local" --deploy
In the foozer stream definition,
1) "trigger" source happens to trigger an upstream event every 5s
2) "tasklaunchrequest-transform" processor takes a few arguments; more specifically, it uses "composedtaskrunner-task:1.1.0.BUILD-SNAPSHOT" to launch a composed-task graph (i.e., sampleTask-t1||sampleTask-t2)
3) Pay attention to --increment-instance-enabled. This was recently added to the CTR application; it provides the ability to re-launch a composed task on a recurring cadence.
4) Since the CTR and SCDF must share the same database, we are also passing the datasource properties as command-line args. (The SCDF server is already started with the same datasource credentials.)
Hope this helps.
Lastly, we will add a sample to the reference guide via: spring-cloud/spring-cloud-dataflow#1780

Aggregate-counter on an existing stream

I'm trying to create an aggregate counter for various streams I have set up. In Spring XD it would look like this: "tap:stream:MyCustomStream > aggregate-counter".
In Spring Cloud Data Flow, so far I have done ":MyKafkaTopic > aggregate-counter", which seems to create a Kafka consumer and read the payload to determine a count of events on the topic. I'd like to be able to tap any stream, not just a Kafka topic, e.g. a stream defined as "MyApp1 | MyApp2" --name MyCustomStream.
The provided example "stream create --definition ":mainstream.http > counter" --name tap_at_http --deploy" essentially assumes mainstream.http is a Kafka topic (or RabbitMQ topic).
Anyone done this before?
Going by your example,
stream create foo --definition "MyApp1 | MyApp2"
If you'd like to TAP the foo stream at the producer (MyApp1) level, your TAP stream would look like the following.
stream create bar --definition ":foo.MyApp1 > MyApp3"
You're just pointing at the producer in the stream where you'd like to TAP to get a copy of the same data. The format is :<streamName>.<label/appName>. You could use "labels" instead of app names, too. Please review the reference guide for more details.
The provided example "stream create --definition ":mainstream.http > counter" --name tap_at_http --deploy" essentially assumes mainstream.http is a Kafka topic (or RabbitMQ topic).
In this case, mainstream is the stream name and you're TAP'ing at http source application, which equates to :mainstream.http.
This is analogous to tap:stream:foo in Spring XD. By default, Spring XD assumes the producer (the source) when only the stream name is given; you'd have to specify the app explicitly when you TAP at a processor, though.
In SCDF, we require it explicitly, which makes the definition more descriptive and the DSL easier to follow.

Change trace log format in emqtt message broker

I am using the emqtt message broker for MQTT.
I am not an Erlang developer and have zero knowledge of the language.
I chose this Erlang-based broker after searching many open-source brokers online and reading suggestions from people about the advantages of an Erlang-based server.
Now I am somewhat stuck with the output of the emqttd_cli trace command.
It is not JSON, and if I use a Perl parser to convert it to JSON I get delayed output.
I want to know in which file I can change the trace log output format.
I looked at the trace code of the broker and found the file src/emqttd_protocol.erl. An exported function named trace/3 has the code that you need.
The second argument of this function, named Packet, holds the information for data received and sent via the broker. You can fetch the required fields from it and format them however you want them printed.
Edit: sample modified code added.
trace(recv, Packet, ProtoState) ->
    PacketHeader = Packet#mqtt_packet.header,
    HostInfo = esockd_net:format(ProtoState#proto_state.peername),
    %% PacketInfo = {ClientId, Username, ClientIP, ClientPort, Payload, QoS, Retain}
    PacketInfo = {ProtoState#proto_state.client_id, ProtoState#proto_state.username,
                  lists:nth(1, HostInfo), lists:nth(3, HostInfo),
                  Packet#mqtt_packet.payload, PacketHeader#mqtt_packet_header.qos,
                  PacketHeader#mqtt_packet_header.retain},
    %% ~p prints the whole tuple as an Erlang term; ~s only works for strings/binaries
    ?LOG(info, "Data Received ~p", [PacketInfo], ProtoState);

Write to the system's standard error in Progress

I am writing a small program in Progress that needs to write an error message to the system's standard error. What ways, ideally simple ones, can I use to print to standard error?
I am using OpenEdge 11.3.
When on Windows (10.2B+) you can use .NET:
System.Console:Error:WriteLine("This is an error message").
together with
prowin32 2> stderr.out
Progress doesn't provide a way to write to stderr - the easiest way I can think of is to output-through an external program that takes stdin and echoes it to stderr.
You could look into LOG-MANAGER:WRITE-MESSAGE. It won't log to standard output or standard error, but to a client-specific log. This log should be monitored in any case (specifically if the client is an application server).
From the documentation:
For an interactive or batch client, the WRITE-MESSAGE( ) method writes the log entries to the log file specified by the LOGFILE-NAME attribute or the Client Logging (-clientlog) startup parameter. For WebSpeed agents and AppServer servers, the WRITE-MESSAGE() method writes the log entries to the server log file. For DataServers, the WRITE-MESSAGE() method writes the log entries to the log file specified by the DataServer Logging (-dslog) startup parameter.
LOG-MANAGER:WRITE-MESSAGE("Got here, x=" + STRING(x), "DEBUG1").
Will write this in the log:
[04/12/05#13:19:19.742-0500] P-003616 T-001984 1 4GL DEBUG1 Got here, x=5
There are quite a lot of options regarding the LOG-MANAGER system, what messages to display, where the file is placed, etc.
There is no easy way, but in Unixen you can always do something like this using OUTPUT THROUGH (untested):
output through "cat >&2" no-echo unbuffered.
Alternatively -- and this is tested -- if you just want error messages from a batch-mode program to go to standard out then
output through "tee" ...
...definitely works.

Neo4j: Java API IndexHits<Node>.size() is 0

I'm trying to use the Java API for Neo4j but I seem to be stuck at IndexHits. If I query the DB with Cypher using
START n=node:types(type="Process") RETURN n;
I get all 2087 nodes of type "Process".
In my application I have the following lines
Index<Node> nodeIndex = db.index().forNodes("types");
IndexHits<Node> hits = nodeIndex.get("type", "Process");
System.out.println("Node index size: " + hits.size());
which leads my console to spit out a value of 0. Here, db is of course an instance of GraphDatabaseService.
I expected an object that included all 2087 nodes. What am I doing wrong?
The .size() question is just the prelude to my iterator
for(Node process : hits) { ... }
but that does not do much when hits.size() == 0. According to http://api.neo4j.org/1.9.2/org/neo4j/graphdb/index/IndexHits.html this should be possible, provided there is something in hits.
Thanks in advance for your help.
I figured it out. Man, I feel so embarrassed...
It so happens that I had set DB_PATH to my default data folder, whereas the default storage folder is actually that data folder plus graph.db. When I ran the code with the corrected DB_PATH, I got an error saying that a lock file was in place because the Neo4j server was running. After shutting the server down, it worked perfectly.
So, if you happen to see the following error, just stop the server and run the code again:
Caused by: org.neo4j.kernel.StoreLockException: Could not create lock file
at org.neo4j.kernel.StoreLocker.checkLock(StoreLocker.java:74)
at org.neo4j.kernel.StoreLockerLifecycleAdapter.start(StoreLockerLifecycleAdapter.java:40)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:491)
I found on several forums that you cannot run the Neo4j server and use the embedded Java API against the same database at the same time.
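For completeness, here is a minimal sketch of the embedded setup that works once the server is stopped. DB_PATH and the class name are placeholders, and the calls are the same 1.9-era API used in the question:
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.index.Index;
import org.neo4j.graphdb.index.IndexHits;
public class ProcessNodeLookup {
    // Must point at the store directory itself, i.e. <data folder>/graph.db (placeholder path)
    private static final String DB_PATH = "/path/to/data/graph.db";
    public static void main(String[] args) {
        GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase(DB_PATH);
        try {
            Index<Node> nodeIndex = db.index().forNodes("types");
            IndexHits<Node> hits = nodeIndex.get("type", "Process");
            try {
                System.out.println("Node index size: " + hits.size());
                for (Node process : hits) {
                    // work with each "Process" node here
                }
            } finally {
                hits.close(); // release the index hits when done iterating
            }
        } finally {
            db.shutdown(); // releases the store lock
        }
    }
}
Both the server and the embedded API take an exclusive lock on the store (the lock file from the exception above), which is why only one of them can have the database open at a time.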
