Is there a way to show a specific job's log events by their event IDs?
For example, you can run:
./dsjob -lognewest Project Job
This returns the latest event ID, for example:
Newest id = 3270
You can limit the number of log entries returned with:
./dsjob -logsum -max 30 Project Job
How can I show only the messages for event IDs within a given range, say between ID N and ID M?
If that is not possible, is there a way to check how many times a job ran on a specific date?
Consider the notion of an input stream of intertwined records representing a user interaction (for example a product purchase). Imagine we receive records that indicate a user has placed a product in their shopping basket. At some time later, they perform a check-out ... or ... abandon their cart.
I thus receive a stream of records such as:
Transaction: 123, Added item A to basket
Transaction: 123, Added item B to basket
...
Transaction: 123, Checked out basket
My goal is to output from the pipeline the aggregate of the transaction. For example, given the above, I want to output:
Transaction 123, Items A, B, ... Sale completed
or if no check-out occurs within 24 hours from the last event:
Transaction 123, Items A, B, ... Sale abandoned
... and this is where I'm stuck. I feel that there is some way to think about this story from an Apache Beam pipeline perspective but I'm afraid I'm at a loss on where to begin. I'm thinking that I somehow want to window the records by both transaction and termination and only emit a batch for processing when either an end of transaction record is received or some time interval has elapsed since the last record seen.
Data-based window markers inherently assume an ordering of the data, which Beam does not guarantee. The scenario above assumes that the checkout event arrives after all the add-to-cart events.
However, you can approximate a solution by combining State with a Session window:
PCollection-RawEvents: Read raw events
PCollection-1: PCollection-RawEvents -> Apply 24 hour SessionWindow to all events.
PCollection-Checkout: PCollection-1 -> Push all the elements for a key in BagState. Read back the state and publish the event Transaction 123, Items A, B, ... Sale completed when you get checkout event Transaction: 123, Checked out basket.
PCollection-Abandon: PCollection-1 -> GroupByKey -> Publish Transaction 123, Items A, B, ... Sale abandoned if Transaction: 123, Checked out basket is not present.
PCollection-Unified: Flatten (PCollection-Checkout, PCollection-Abandon)
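The grouping logic behind these steps can be sketched in plain Python. This is only an illustration of the intended outcome, not Beam code: the function name `summarize`, the tuple layout of the events, and the `now` argument are assumptions made for the example, and a checkout anywhere in a transaction is treated as completing it regardless of arrival order.

```python
from collections import defaultdict
from datetime import datetime, timedelta

GAP = timedelta(hours=24)  # inactivity gap taken from the question


def summarize(events, now):
    """events: iterable of (timestamp, transaction_id, kind, item) tuples,
    where kind is "add" or "checkout" (hypothetical layout)."""
    by_txn = defaultdict(list)
    for ts, txn, kind, item in events:
        by_txn[txn].append((ts, kind, item))

    results = []
    for txn, recs in by_txn.items():
        recs.sort(key=lambda r: r[0])  # order each transaction by event time
        items = [item for _, kind, item in recs if kind == "add"]
        if any(kind == "checkout" for _, kind, _ in recs):
            results.append((txn, items, "Sale completed"))
        elif now - recs[-1][0] >= GAP:
            results.append((txn, items, "Sale abandoned"))
        # otherwise the session is still open, so nothing is emitted yet
    return results
```

In a real pipeline the session window and per-key state would play the role of `by_txn`, and the window's expiry would replace the explicit comparison against `now`.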
I have a list of email addresses in SPSS. I'm trying to write syntax to count how many times each email address appears.
For instance:
In my desired output, if johndoe@aol.com appears in the data 3 times, I want all instances of his email to show a 3 in my new column.
I know I can write syntax to have it count sequentially (i.e. johndoe@aol.com will be assigned 1 the first time, then 2, then 3), but this is not what I want.
Thanks!
Steps to do this:
Sort cases by email.
Get the counts using the Aggregate command.
Use the Identify Duplicate Cases command to generate an indicator of whether a given email is the first of its kind in the file.
Keep only the first case for each email, dropping the cases that aren't the first with that particular email.
All four of those commands are in the Data menu in the GUI. Syntax to do the whole thing:
SORT CASES BY Email.
*This will create a new variable N_EMAIL with the counts. It will appear for every case.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/PRESORTED
/BREAK=Email
/N_EMAIL=N.
*Now we generate a "PrimaryFirst" indicator showing whether a given case is the first instance of its email.
MATCH FILES
/FILE=*
/BY Email
/FIRST=PrimaryFirst
/LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE MatchSequence=MatchSequence+1.
END IF.
LEAVE MatchSequence.
FORMATS MatchSequence (f7).
COMPUTE InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
/FILE=*
/DROP=PrimaryLast InDupGrp MatchSequence.
EXECUTE.
*Filter out duplicate cases.
SELECT IF PrimaryFirst = 1.
EXECUTE.
*Final cleanup.
DELETE VARIABLES PrimaryFirst.
Just run this:
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=EmailAddress /num_instances=N.
A new column will appear in the dataset called num_instances (you can of course select another name) which will have the desired count appear in all instances of each Email address.
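For readers more familiar with general-purpose code, the effect of AGGREGATE with MODE=ADDVARIABLES can be sketched in Python. This is only an analogy for what the new column contains; the function name `add_counts` and the sample addresses are made up for the example.

```python
from collections import Counter


def add_counts(emails):
    """Pair every email with the total number of times it occurs,
    so each row carries the full count rather than a running one."""
    counts = Counter(emails)
    return [(email, counts[email]) for email in emails]


rows = add_counts([
    "johndoe@aol.com",
    "jane@example.com",
    "johndoe@aol.com",
    "johndoe@aol.com",
])
# every johndoe@aol.com row is paired with 3, the jane@example.com row with 1
```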
My relationships look like this:
A-[:CHATS_WITH]->B - denotes that the user has sent at least one message to the other user
Then the messages:
A-[:FROM]->message-[:SENT_TO]->B
and vice versa
B-[:FROM]->message-[:SENT_TO]->A
and so on.
Now I would like to select all users a given user chats with, together with the latest message between the two.
So far I have managed to get all messages between two users with this query:
MATCH (me:user)-[:CHATS_WITH]->(other:user) WHERE me.nick = 'bazo'
WITH me, other
MATCH me-[:FROM|:SENT_TO]-(m:message)-[:FROM|:SENT_TO]-other
RETURN other,m ORDER BY m.timestamp DESC
How can I return just the latest message for each conversation?
Taking what you already have, do you just want to add LIMIT 1 to the end of the query?
The preferred way in a graph store is to manually manage a linked list that models the interaction stream, in which case you'd just select the head or tail of the list. This plays to the graph's strengths (traversal) rather than reading data out of every message node.
EDIT - Last message to each distinct contact.
I think you'll have to collect all the messages into an ordered collection and then return the head, but this sounds like it could get very slow if you have many friends or messages.
MATCH (me:user)-[:CHATS_WITH]->(other:user) WHERE me.nick = 'bazo'
WITH me, other
MATCH (me)-[:FROM|:SENT_TO]-(m:message)-[:FROM|:SENT_TO]-(other)
WITH other, m
ORDER BY m.timestamp DESC
RETURN other, HEAD(COLLECT(m))
See: Neo Linked Lists and Neo Modelling a Newsfeed.
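The "order descending, collect, take the head" idea in the query above can be illustrated outside Cypher with a small Python sketch. The function name `latest_per_contact` and the tuple layout are assumptions made for the example; it also shows why the approach touches every message.

```python
def latest_per_contact(messages):
    """messages: iterable of (other_nick, timestamp, text) tuples.
    Returns the newest (timestamp, text) for each contact."""
    latest = {}
    # Sort newest first, then keep only the first message seen per contact,
    # which mirrors ORDER BY ... DESC followed by HEAD(COLLECT(m)).
    for other, ts, text in sorted(messages, key=lambda m: m[1], reverse=True):
        latest.setdefault(other, (ts, text))
    return latest
```

Because every message is sorted before the head is taken, the cost grows with the total number of messages, which is the slowdown the linked-list model avoids.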
I have a basic Esper query as follows:
@Name("MyTestQuery")
@Description("My First Test Query")
select sum(qty), venue
from MyTestWindow
group by venue
The query seems to duplicate the results of my sum i.e. if I send in a qty of 10 my query will fire multiple times and output:
10, 20, 30, 40
However, if I remove the group by function then it just outputs 10.
Is anyone able to advise why this might happen?
Typically you need to qualify the stream name (MyTestWindow) with a window, for example "from MyTestWindow.win:time(1 sec)". Esper offers many window types, so select the one appropriate for your application.
For example:
select sum(qty), venue
from MyTestWindow.win:time_batch(1 sec)
group by venue
having sum(qty) is not null
You can run a simple test of this at http://esper-epl-tryout.appspot.com/epltryout/mainform.html
The best way of doing a group by is to trigger an artificial "event" after sending in all events. This way you can fully control what you want to output and not let Esper's engine run in real time.
You might have to use the "distinct" feature in select to avoid duplicates. Esper can sometimes create duplicate events when you aren't using trigger variables, so distinct will allow you to get rid of unwanted events.
You can use win:time_batch to collect the events of a specified time interval into one update, and the coalesce function to handle null values:
select venue, sum(coalesce(qty, 0))
from MyTestWindow.win:time_batch(1 sec)
group by venue
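To make the intent of the batched query concrete, here is a rough Python sketch of summing per venue within fixed one-second buckets, treating a missing qty as 0. It is a conceptual illustration only and not Esper's actual batching semantics: the function name `batch_sums`, the event tuple layout, and the alignment of buckets to time zero are all assumptions.

```python
from collections import defaultdict


def batch_sums(events, batch_seconds=1):
    """events: iterable of (time_in_seconds, venue, qty_or_None) tuples.
    Returns {batch_index: {venue: total_qty}}."""
    batches = defaultdict(lambda: defaultdict(int))
    for t, venue, qty in events:
        batch = int(t // batch_seconds)  # which bucket the event falls into
        # Same role as coalesce(qty, 0): a missing quantity counts as zero.
        batches[batch][venue] += qty if qty is not None else 0
    return {b: dict(v) for b, v in sorted(batches.items())}
```

Each bucket produces one total per venue, rather than a new running total every time an event arrives.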
I need to model a forum with Neo4j. I have "forums" nodes which have messages and, optionally, these messages have replies: forum-->message-->reply
The cypher query I am using to retrieve the messages of a forum and their replies is:
start forum=node({forumId}) match forum-[*1..]->msg
where (msg.parent=0 and msg.ts<={ts} or msg.parent<>0)
return msg ORDER BY msg.ts DESC limit 10
This query retrieves the messages with time<=ts and all their replies (a message has parent=0 and a reply has parent<>0)
My problem is that I need to retrieve pages of 10 messages (limit 10) independently of the number of replies.
For example, if I had 20 messages and the first one with 100 replies, it would only return 10 rows: the first message and 9 replies but I need the first 10 messages and the 100 replies of the first one.
How can I limit the result based on the number of messages and not their replies?
The ts property is indexed, but is this query efficient when mixing it with other where clauses?
Do you know a better way to model this kind of forum with Neo?
Supposing you switch to labels and avoid IDs (as they can be recycled and therefore are not stable identifiers):
MATCH (forum:FORUM)<--(message:MESSAGE {parent:0})
WHERE forum.name = '%s' // where %s identifies the forum in a *stable* way
WITH message // applying LIMIT at this point restricts it to the main messages only
ORDER BY message.ts DESC
LIMIT 10
OPTIONAL MATCH (message)<-[:REPLIES_TO]-(replies)
RETURN message, replies
The only important change here is to split the reply and message matching in two sub-queries, so that the LIMIT clause applies to the first subquery only.
However, you need to link the relevant replies to the matched main messages in the second subquery (I introduced a fictional relationship REPLIES_TO to link replies to messages).
When you need to fetch pages 2, 3, 4 and so on, you need an extra parameter: the smallest message timestamp of the previous page (call it previous_timestamp), since the messages are ordered newest first.
The first sub-query's WHERE clause becomes:
WHERE forum.name = '%s' AND message.ts < previous_timestamp
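The same two-step idea (limit the top-level messages first, then attach all their replies) can be sketched in Python to check the paging behaviour. The function name `page_of_messages`, the tuple layouts, and the `before_ts` parameter are assumptions for the example; it assumes newest-first ordering, so the next page contains messages older than the last one shown.

```python
def page_of_messages(messages, replies, before_ts=None, page_size=10):
    """messages: list of (msg_id, ts); replies: list of (reply_id, parent_id).
    Returns up to page_size messages, newest first, each with all its replies."""
    # Step 1: pick the page of top-level messages only.
    candidates = [m for m in messages if before_ts is None or m[1] < before_ts]
    page = sorted(candidates, key=lambda m: m[1], reverse=True)[:page_size]

    # Step 2: attach every reply to its parent, without any limit.
    by_parent = {}
    for reply_id, parent_id in replies:
        by_parent.setdefault(parent_id, []).append(reply_id)
    return [(msg_id, ts, by_parent.get(msg_id, [])) for msg_id, ts in page]
```

To fetch the following page, pass the timestamp of the last message of the current page as `before_ts`.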