I have a service that accepts and processes tasks. A Task has a status: queued, running, failed, cancelled or finished. Once in a while the service spits out a log entry with the json, like this:
2021-09-09 00:30:46,742 [Timer-0] INFO - { "env": "test_environment", "capacity": 10, "available_ec2": 10, "failed_ec2": 0, "running_tasks": 0, "queued_tasks": 0, "finished_tasks": 0, "failed_tasks": 0, "cancelled_tasks": 3,"queue_wait_minutes" : { "max": 0, "mean": -318990, "max_started": 0, "mean_started": -29715 },"processing_time": {"max": 0, "mean": 0} }
I would like to plot a pie chart showing the breakdown of tasks by status ("running_tasks", "queued_tasks", "finished_tasks", "failed_tasks", "cancelled_tasks" in the JSON message). So far I have failed, because I cannot work out how to construct a table from such a message. Any clues would be highly appreciated. Thanks in advance!
Try something like this. Basically, you have to de-transpose the data. I hope this makes sense!
...
| parse field=some_log_line "INFO - *" as jsonMessage
| json field=jsonMessage "running_tasks"
| json field=jsonMessage "queued_tasks"
| json field=jsonMessage "finished_tasks"
| "running_tasks,queued_tasks,finished_tasks," as message_keys
| parse regex field=message_keys "(?<message_key>.*?)," multi
| if (message_key="running_tasks", running_tasks, 0) as message_value
| if (message_key="queued_tasks", queued_tasks, message_value) as message_value
| if (message_key="finished_tasks", finished_tasks, message_value) as message_value
| fields message_key, message_value
| max(message_value) by message_key
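The de-transposing idea can also be pictured outside Sumo Logic. The following is a minimal Python sketch, not a translation of the query; the helper name detranspose and the sample record are made up for illustration. It turns one wide record (one field per status) into one row per status, which is the shape a pie chart needs.

```python
# Hypothetical helper: turn one wide record into (key, value) rows.
STATUS_KEYS = ["running_tasks", "queued_tasks", "finished_tasks"]

def detranspose(record, keys=STATUS_KEYS):
    """Return one (message_key, message_value) row per status field."""
    return [(key, record.get(key, 0)) for key in keys]

# Sample record shaped like the parsed JSON message (values are illustrative).
record = {"running_tasks": 2, "queued_tasks": 1, "finished_tasks": 7, "capacity": 10}
for message_key, message_value in detranspose(record):
    print(message_key, message_value)
```

Fields that are not listed in STATUS_KEYS (such as capacity) are simply ignored, and a missing status falls back to 0.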
First of all, Sumo Logic supports parsing JSON into fields. In your example the whole line is not JSON, only the part after "-", so you can add this to your query:
...
| parse "INFO - *" as jsonMessage
| json auto
Then, you can use running_tasks, queued_tasks, etc. as ordinary fields, e.g.
...
| timeslice 1m
| max(running_tasks), max(queued_tasks) by _timeslice
Disclaimer: I am currently employed by Sumo Logic.
Below is a pure Python solution that will help you plot the data.
The output (entries) is a dict where the key is the time stamp and the value is a dict that contains the interesting info. log_lines holds a collection of log messages and is used as the input.
import json
import pprint
log_lines = [
'2021-09-09 00:30:46,742 [Timer-0] INFO - { "env": "test_environment", "capacity": 10, "available_ec2": 10, "failed_ec2": 0, "running_tasks": 2, "queued_tasks": 0, "finished_tasks": 0, "failed_tasks": 0, "cancelled_tasks": 3,"queue_wait_minutes" : { "max": 0, "mean": -318990, "max_started": 0, "mean_started": -29715 },"processing_time": {"max": 0, "mean": 0} }',
'2021-09-09 00:31:46,742 [Timer-0] INFO - { "env": "test_environment", "capacity": 10, "available_ec2": 10, "failed_ec2": 0, "running_tasks": 5, "queued_tasks": 0, "finished_tasks": 0, "failed_tasks": 0, "cancelled_tasks": 3,"queue_wait_minutes" : { "max": 0, "mean": -318990, "max_started": 0, "mean_started": -29715 },"processing_time": {"max": 0, "mean": 0} }'
]
entries = dict()
for line in log_lines:
    date = line[:line.find('[') - 1]
    data = json.loads(line[line.find('{'):])
    sub_set = {k: data.get(k,0) for k in
               ["running_tasks", "queued_tasks", "finished_tasks", "failed_tasks", "cancelled_tasks"]}
    entries[date] = sub_set
pprint.pprint(entries)
output
{'2021-09-09 00:30:46,742': {'cancelled_tasks': 3,
                             'failed_tasks': 0,
                             'finished_tasks': 0,
                             'queued_tasks': 0,
                             'running_tasks': 2},
 '2021-09-09 00:31:46,742': {'cancelled_tasks': 3,
                             'failed_tasks': 0,
                             'finished_tasks': 0,
                             'queued_tasks': 0,
                             'running_tasks': 5}}
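To get from a dict of this shape to pie-chart input, one option is to pick the most recent timestamp and convert its counts to fractions. This is only a sketch: the function name latest_shares and the sample dict are illustrative, and it assumes every key follows the timestamp format shown in the log line. The resulting fractions could then be handed to a plotting call such as matplotlib's pyplot.pie.

```python
from datetime import datetime

def latest_shares(entries):
    """Return each non-zero status share (0..1) from the most recent entry."""
    # Keys look like '2021-09-09 00:31:46,742' (comma before the milliseconds).
    latest = max(entries, key=lambda ts: datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f"))
    counts = entries[latest]
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing to plot
    return {status: n / total for status, n in counts.items() if n > 0}

# Illustrative input with the same structure as the output above.
sample = {
    "2021-09-09 00:30:46,742": {"running_tasks": 2, "queued_tasks": 0, "cancelled_tasks": 3},
    "2021-09-09 00:31:46,742": {"running_tasks": 5, "queued_tasks": 0, "cancelled_tasks": 3},
}
print(latest_shares(sample))
```

Statuses with a zero count are dropped so they do not produce empty pie slices.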
I'm following the multiple choice QA tutorial and trying to modify it slightly to fit my data. My data is exactly the same, except that I have 5 labels instead of 4:
# original data:
from datasets import load_dataset
swag = load_dataset("swag", "regular")
set(swag["train"]['label'])
>>> {0, 1, 2, 3}
# my data:
set(train_dataset["train"]['label'])
>>> {0, 1, 2, 3, 4}
I'm running the code in the tutorial and getting the error:
nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [9,0,0] Assertion `t >= 0 && t < n_classes` failed.
I found from here and here that this can be caused by target values that are out of bounds, which can happen when using nn.CrossEntropyLoss, which expects a torch.LongTensor with values in the range [0, nb_classes-1].
I will not copy the entire script from the tutorial since it's in the link above, but I found that the error can be replicated by modifying the DataCollatorForMultipleChoice function by adding an extra label as follows:
import random
@dataclass
class DataCollatorForMultipleChoice:
    """
    Data collator that will dynamically pad the inputs for multiple choice received.
    """
    tokenizer: PreTrainedTokenizerBase
    padding: Union[bool, str, PaddingStrategy] = True
    max_length: Optional[int] = None
    pad_to_multiple_of: Optional[int] = None
    def __call__(self, features):
        label_name = "label" if "label" in features[0].keys() else "labels"
        labels = [feature.pop(label_name) for feature in features]
        labels = [random.choice(range(5)) for _ in range(16)] #<<<---ADDING EXTRA LABEL HERE. INSTEAD OF 0-3 THIS IS BETWEEN 0-4
        print(len(labels))
        print(labels)
        batch_size = len(features)
        num_choices = len(features[0]["input_ids"])
        flattened_features = [
            [{k: v[i] for k, v in feature.items()} for i in range(num_choices)] for feature in features
        ]
        flattened_features = sum(flattened_features, [])
        batch = self.tokenizer.pad(
            flattened_features,
            padding=self.padding,
            max_length=self.max_length,
            pad_to_multiple_of=self.pad_to_multiple_of,
            return_tensors="pt",
        )
        batch = {k: v.view(batch_size, num_choices, -1) for k, v in batch.items()}
        batch["labels"] = torch.tensor(labels, dtype=torch.int64)
        return batch
Then when I run the trainer I get:
16 # batch
[0, 0, 2, 1, 1, 1, 0, 4, 0, 4, 3, 0, 0, 0, 1, 1] # labels
... nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [7,0,0] Assertion `t >= 0 && t < n_classes` failed.
I tried changing the number of labels in the model:
# original:
# model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased")
# my modification:
model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased", num_labels=5)
but I got the same error.
The script runs just fine with my data if I modify the added line from above to
labels = [random.choice(range(4)) for _ in range(16)] # note that now it's from 0-3 and not from 0-4
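One way to check for out-of-range targets before training is sketched below in plain Python. It rests on an assumption that is not confirmed anywhere in this post: that the number of classes the loss sees equals the number of candidate choices per example (num_choices in the collator), rather than the num_labels argument. The helper name find_bad_labels and the toy features are hypothetical.

```python
def find_bad_labels(features, label_name="label"):
    """Return (index, label, num_choices) for every label outside [0, num_choices - 1]."""
    bad = []
    for idx, feature in enumerate(features):
        # Assumption: "input_ids" holds one token list per candidate choice.
        num_choices = len(feature["input_ids"])
        label = feature[label_name]
        if not 0 <= label < num_choices:
            bad.append((idx, label, num_choices))
    return bad

# Toy features: four candidate choices each, so valid labels are 0..3.
features = [
    {"input_ids": [[101], [102], [103], [104]], "label": 3},
    {"input_ids": [[101], [102], [103], [104]], "label": 4},
]
print(find_bad_labels(features))
```

If this reports any rows for the real dataset, the examples would need a fifth candidate choice (or the labels would need remapping) before the label value 4 can be a valid target.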
I'm trying to create a binary perceptron classifier in SAS to develop my SAS skills. The data has been cleaned and split into training and test sets. I expanded the label vector into a table of seven identical columns, one per weight, to make the calculations more straightforward; given my limited experience, this seemed a usable method. Anyway, I run the following:
PROC IML;
W = {0, 0, 0, 0, 0, 0, 0};
USE Work.X_train;
XVarNames = {"Pclass" "Sex" "Age" "FamSize" "EmbC" "EmbQ" "EmbS"};
READ ALL VAR XVarNames INTO X_trn;
USE Work.y_train;
YVarNames = {"S1" "S2" "S3" "S4" "S5" "S6" "S7"};
READ ALL VAR YVarNames INTO y_trn;
DO i = 1 to 668;
IF W`*X_trn[i] > 0 THEN Z = {1, 1, 1, 1, 1, 1, 1};
ELSE Z = {0, 0, 0, 0, 0, 0, 0};
W = W+(y_trn[i]`-Z)#X_trn[i]`;
END;
PRINT W;
RUN;
and the result is a column vector with seven entries each having value -2.373. The particular value isn't important, but clearly, a weight vector that is comprised of identical values is not useful. The question then is, what error in the code am I making that is producing this result?
My intuition is that something with how I am trying to call each row of observations for X_trn and y_trn into the equation is resulting in this error. Otherwise, it might be due to the matrix arithmetic in the W = line, but the orientation of all of the vectors seems to be appropriate.
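For comparison, here is a minimal reference version of the same update rule in Python, with a scalar label per observation instead of a seven-column label table. It is only a sketch for cross-checking results on a small sample; the function name and the toy data are made up. One detail that may be relevant to the row-selection suspicion: as far as I know, in SAS/IML a single subscript such as X_trn[i] selects one element (counted in row-major order), whereas X_trn[i,] selects the whole of row i.

```python
def train_perceptron(rows, labels, epochs=1):
    """Perceptron rule: w <- w + (y - z) * x, with z = 1 if w.x > 0 else 0."""
    w = [0.0] * len(rows[0])
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x))
            z = 1 if activation > 0 else 0
            w = [wi + (y - z) * xi for wi, xi in zip(w, x)]
    return w

# Toy data: two observations with two features each (illustrative only).
rows = [[1.0, 2.0], [2.0, -1.0]]
labels = [1, 0]
print(train_perceptron(rows, labels))
```

Because each weight is updated with its own feature value, the weights only end up identical when the feature values themselves are identical.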
I can't find any info about this error on google, so I'm posting here to see if anyone knows.
Basically, my code has a snippet that looks something like this:
int rc = pthread_cond_timedwait(&cond, &mutex, &ts);
if ( (0 != rc) && (ETIMEDOUT != rc)) {
    assert(false); // This should not happen.
}
Occasionally, my program will crash and the corefile will show that rc = 454.
454 does not map to any of the error codes in errno.h. In addition, looking at the list of possible return values that can be given by pthread_cond_timedwait(), none of them resemble 454.
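The errno lookup can be repeated quickly from Python, whose errno module exposes the error table of the platform the interpreter runs on. That table may differ from the one on the FreeBSD machine in question, so treat this as a rough cross-check only; the helper name describe_rc is made up.

```python
import errno
import os

def describe_rc(rc):
    """Return 'NAME: message' for a known errno value, else a marker string."""
    name = errno.errorcode.get(rc)
    if name is None:
        return "unknown error code %d" % rc
    return "%s: %s" % (name, os.strerror(rc))

print(describe_rc(errno.ETIMEDOUT))
print(describe_rc(454))
```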
I've looked into the parameters passed in, but I don't really know how to interpret them or where I would be able to learn how.
(gdb) p *mutex
$20 = {m_lock = {m_owner = 100179, m_flags = 0, m_ceilings = {0, 0}, m_spare = {0, 0, 0, 0}}, m_type = PTHREAD_MUTEX_ERRORCHECK, m_owner = 0x80a004c00, m_count = 0, m_refcount = 0, m_spinloops = 0, m_yieldloops = 0, m_qe = {tqe_next = 0x0, tqe_prev = 0x80a004f10}}
(gdb) p *cond
$21 = {c_lock = {m_owner = 0, m_flags = 0, m_ceilings = {0, 0}, m_spare = {0, 0, 0, 0}}, c_kerncv = {c_has_waiters = 1, c_flags = 0, c_spare = {0, 0}}, c_pshared = 0, c_clockid = 0}
(gdb) p ts
$22 = {tv_sec = 1400543215, tv_nsec = 0}
The internals of "cond" look suspicious to me but, as I mentioned, I have no way to be sure.
Since it's FreeBSD, we can look at the source to see where you're getting the mysterious 454 return from. Using the source archives at fxr.watson.org, I searched for the symbol pthread_cond_timedwait and the only credible references are in the GLIBC27 code, so we'll look there.
In source file pthread_cond_timedwait.c, we see the function __pthread_cond_timedwait. There are three returns; the first is EINVAL, the second is the return from __pthread_mutex_unlock_usercnt and the third is the return from __pthread_mutex_cond_lock. Now that I've given you the tools to find the answer, you can go chase down the rest of the answer yourself. The 454 must have come from one of the unlock or lock calls.
The versioned_symbol macro at the bottom of the source file is what makes the local __pthread_cond_timedwait call the global pthread_cond_timedwait function.
I'm trying to use Pybrain to predict sequences of characters belonging to the Reber grammar.
Concretely, I generate strings using the Reber grammar graph (see http://www.felixgers.de/papers/phd.pdf, page 22). An example of such a string is BPVVE. I want my neural network to learn the underlying rules of the grammar. For each of these strings I create a sequence that typically looks like this:
[B, T, S, X, P, V, E,] , [B, T, S, X, P, V, E,]
B -> value = [1, 0, 0, 0, 0, 0, 0,] , target = [0, 0, 0, 0, 1, 0, 0,]
P -> value = [0, 0, 0, 0, 1, 0, 0,] , target = [0, 0, 0, 0, 0, 1, 0,]
V -> value = [0, 0, 0, 0, 0, 1, 0,] , target = [0, 0, 0, 0, 0, 1, 0,]
V -> value = [0, 0, 0, 0, 0, 1, 0,] , target = [0, 0, 0, 0, 0, 0, 1,]
E -> E is ignored for now because it marks the end
As you can see, the value is just a 7-d vector representing the current letter, and the target is the next letter in the Reber word.
Here is the code I'm trying to run :
#!/usr/bin/python
import reberGrammar as reber
import random as rnd
from pylab import *
from pybrain.supervised import RPropMinusTrainer
from pybrain.supervised import BackpropTrainer
from pybrain.datasets import SequenceClassificationDataSet
from pybrain.structure.modules import LSTMLayer, SoftmaxLayer
from pybrain.tools.validation import testOnSequenceData
from pybrain.tools.shortcuts import buildNetwork
def reberToListInt(word): #e.g. "BPVVE" -> [0,4,3,3,5]
    out = [None]*len(word)
    for i,l in enumerate(word):
        if l == 'B':
            out[i] = 0
        elif l == 'T':
            out[i] = 1
        elif l == 'S':
            out[i] = 2
        elif l == 'V':
            out[i] = 3
        elif l == 'P':
            out[i] = 4
        elif l == 'E':
            out[i] = 5
        else :
            out[i] = 6
    return out
def buildReberDataSet(numSample):
    """Generate a 7 class dataset"""
    reberLexicon = reber.ReberGrammarLexicon(numSample)
    DS = SequenceClassificationDataSet(7, 7, nb_classes=7)
    for rw in reberLexicon.lexicon:
        DS.newSequence()
        rw2 = reberToListInt(rw)
        for i in range(len(rw2)-1): #inserting one letter at a time
            inpt = outpt = [0.0]*7
            inpt[rw2[i]]=1.0
            outpt[rw2[i+1]]=1.0
            DS.addSample(inpt,outpt)
    return DS
def printDataSet(DS, numLines): #just to print some stat
    print "\t############"
    print "Number of sequences: ",DS.getNumSequences()
    print "Input and output dimensions: ", DS.indim,"\t", DS.outdim
    print "\n"
    for i in range(numLines):
        for inp, target in DS.getSequenceIterator(i):
            print inp,
        print "\n"
    print "\t#############"
'''Dataset creation / split into training and test sets'''
fullDS = buildReberDataSet(700)
tstdata, trndata = fullDS.splitWithProportion( 0.25 )
trndata._convertToOneOfMany( bounds=[0.,1.])
tstdata._convertToOneOfMany( bounds=[0.,1.])
#printDataSet(trndata,2)
'''Network setup / training'''
rnn = buildNetwork( trndata.indim, 7, trndata.outdim, hiddenclass=LSTMLayer, outclass=SoftmaxLayer, outputbias=False, recurrent=True)
trainer = RPropMinusTrainer( rnn, dataset=trndata, verbose=True )
#trainer = BackpropTrainer( rnn, dataset=trndata, verbose=True, momentum=0.9, learningrate=0.5 )
trainError=[]
testError =[]
#errors = trainer.trainUntilConvergence()
for i in range(9):
    trainer.trainEpochs( 2 )
    trainError.append(100. * (1.0-testOnSequenceData(rnn, trndata)))
    testError.append(100. * (1.0-testOnSequenceData(rnn, tstdata)))
    print "train error: %5.2f%%" % trainError[i], ", test error: %5.2f%%" % testError[i]
plot(trainError)
hold(True)
plot(testError)
show()
I fail to train this net. The errors fluctuate a lot and there is no real convergence. I would really appreciate some advice on this.
Here is the code I'm using to generate Reber strings :
#!/usr/bin/python
import random as rnd
class ReberGrammarLexicon(object):
    lexicon = set() #contain Reber words
    graph = [ [(1,'T'), (5,'P')], \
              [(1, 'S'), (2, 'X')], \
              [(3,'S') ,(5, 'X')], \
              [(6, 'E')], \
              [(3, 'V'),(2, 'P')], \
              [(4, 'V'), (5, 'T')] ] #store the graph
    def __init__(self, num, maxSize = 1000): #fill Lexicon with num words
        self.maxSize = maxSize
        if maxSize < 5:
            raise NameError('maxSize too small, require maxSize > 4')
        while len(self.lexicon) < num:
            word = self.generateWord()
            if word != None:
                self.lexicon.add(word)
    def generateWord(self): #generate one word
        c = 2
        currentEdge = 0
        word = 'B'
        while c <= self.maxSize:
            inc = rnd.randint(0,len(self.graph[currentEdge])-1)
            nextEdge = self.graph[currentEdge][inc][0]
            word += self.graph[currentEdge][inc][1]
            currentEdge = nextEdge
            if currentEdge == 6 :
                break
            c+=1
        if c > self.maxSize :
            return None
        return word
Thanks,
Best
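One thing that may be worth checking (a guess, not a confirmed diagnosis) is the line inpt = outpt = [0.0]*7 in buildReberDataSet. In Python a chained assignment binds both names to the same list object, so writing the target also changes the input. The small sketch below demonstrates the effect and one possible alternative; the helper name one_hot is made up.

```python
# Chained assignment: both names refer to the SAME list object.
inpt = outpt = [0.0] * 7
inpt[0] = 1.0    # meant to mark the current letter
outpt[4] = 1.0   # meant to mark the next letter
print(inpt is outpt)
print(inpt)      # the input now carries the target's 1.0 as well

def one_hot(index, size=7):
    """Return a fresh list with 1.0 at `index` and 0.0 elsewhere."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

# Two independent lists: changing one no longer affects the other.
value, target = one_hot(0), one_hot(4)
print(value)
print(target)
```

If the dataset was built with aliased lists, every sample would have an identical input and target containing up to two ones, which could plausibly explain training that never converges.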
I have quite old corporate C parser code that was generated by an ancient Yacc and uses the yyact, yypact, yypgo, yyr1, yyr2, yytoks, yyexca, yychk, yydef tables (but no yyreds); the original grammar source is lost. That legacy code needs revamping, but I cannot afford to recode it from scratch.
Could it be possible to mechanically retrieve / regenerate the parsing rules by deduction of the parsing tables in order to reconstruct the grammar?
Example with a little expression parser that I can process with the same ancient Yacc:
yytabelem yyexca[] ={
-1, 1,
0, -1,
-2, 0,
-1, 21,
261, 0,
-2, 8,
};
yytabelem yyact[]={
13, 9, 10, 11, 12, 23, 8, 22, 13, 9,
10, 11, 12, 9, 10, 11, 12, 1, 2, 11,
12, 6, 7, 4, 3, 0, 16, 5, 0, 14,
15, 0, 0, 0, 17, 18, 19, 20, 21, 0,
0, 24 };
yytabelem yypact[]={
-248, -1000, -236, -261, -236, -236, -1000, -1000, -248, -236,
-236, -236, -236, -236, -253, -1000, -263, -245, -245, -1000,
-1000, -249, -1000, -248, -1000 };
yytabelem yypgo[]={
0, 17, 24 };
yytabelem yyr1[]={
0, 1, 1, 1, 2, 2, 2, 2, 2, 2,
2, 2, 2 };
yytabelem yyr2[]={
0, 8, 12, 0, 6, 6, 6, 6, 6, 6,
4, 2, 2 };
yytabelem yychk[]={
-1000, -1, 266, -2, 259, 263, 257, 258, 267, 262,
263, 264, 265, 261, -2, -2, -1, -2, -2, -2,
-2, -2, 260, 268, -1 };
yytabelem yydef[]={
3, -2, 0, 0, 0, 0, 11, 12, 3, 0,
0, 0, 0, 0, 0, 10, 1, 4, 5, 6,
7, -2, 9, 3, 2 };
yytoktype yytoks[] =
{
"NAME", 257,
"NUMBER", 258,
"LPAREN", 259,
"RPAREN", 260,
"EQUAL", 261,
"PLUS", 262,
"MINUS", 263,
"TIMES", 264,
"DIVIDE", 265,
"IF", 266,
"THEN", 267,
"ELSE", 268,
"LOW", 269,
"UMINUS", 270,
"-unknown-", -1 /* ends search */
};
/* am getting this table in my example,
but it is not present in the studied parser :^( */
char * yyreds[] =
{
"-no such reduction-",
"stmt : IF exp THEN stmt",
"stmt : IF exp THEN stmt ELSE stmt",
"stmt : /* empty */",
"exp : exp PLUS exp",
"exp : exp MINUS exp",
"exp : exp TIMES exp",
"exp : exp DIVIDE exp",
"exp : exp EQUAL exp",
"exp : LPAREN exp RPAREN",
"exp : MINUS exp",
"exp : NAME",
"exp : NUMBER",
};
I am looking to retrieve
stmt : IF exp THEN stmt
| IF exp THEN stmt ELSE stmt
| /*others*/
;
exp : exp PLUS exp
| exp MINUS exp
| exp TIMES exp
| exp DIVIDE exp
| exp EQUAL exp
| LPAREN exp RPAREN
| MINUS exp
| NAME
| NUMBER
;
Edit: I have stripped down the generated parser of my example for clarity, but to help with analysis I have published the whole generated code as a gist. Please note that for some unknown reason there is no yyreds table in the parser I am trying to study / change. I suppose it would not have been fun :^S
An interesting problem. Just from matching the tables to the grammar, it seems that yyr1 and yyr2 give you the "outline" of the rules -- yyr1 is the symbol on the left side of each rule, while yyr2 is 2x the number of symbols on the right side. You also have the names of all the terminals in a convenient table. But the names of the non-terminals are lost.
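The outline step can be sketched in a few lines of Python using the yyr1 and yyr2 values from the example tables. The function name rule_outlines and the S1/S2 placeholder names are made up, and the "half of yyr2" rule is the observation stated above rather than something taken from Yacc documentation.

```python
# Tables copied from the example parser in the question.
yyr1 = [0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
yyr2 = [0, 8, 12, 0, 6, 6, 6, 6, 6, 6, 4, 2, 2]

def rule_outlines(yyr1, yyr2):
    """Return (rule_number, lhs_symbol, rhs_length) per rule; entry 0 is a placeholder."""
    return [(rule, yyr1[rule], yyr2[rule] // 2) for rule in range(1, len(yyr1))]

for rule, lhs, rhs_len in rule_outlines(yyr1, yyr2):
    rhs = " ".join(["??"] * rhs_len) if rhs_len else "/* empty */"
    print("rule %d: S%d : %s" % (rule, lhs, rhs))
```

The lengths this prints agree with the yyreds table of the example: four symbols for rule 1, six for rule 2, none for rule 3, and so on.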
To figure out which symbols go on the rhs of each rule, you'll need to reconstruct the state machine from the tables, which likely involves reading and understanding the code in the y.tab.c file that actually does the parsing. Some of the tables (looks like yypact, yychk and yydef) are indexed by state number. It seems likely that yyact is indexed by yypact[state] + token. But those are only guesses. You need to look at the parsing code and understand how it's using the tables to encode possible shifts, reduces, and gotos.
Once you have the state machine, you can backtrack from the states containing reductions of specific rules through the states that have shifts and gotos of that rule. A shift into a reduction state means the last symbol on the rhs of that rule is the token shifted. A goto into a reduction state means the last symbol on the rhs is the symbol for the goto. The second-to-last symbol comes from the shift/goto to the state that does the shift/goto to the reduction state, and so on.
edit
As I surmised, yypact is the 'primary action' for a state. If the value is YYFLAG (-1000), this is a reduce-only state (no shifts). Otherwise it is a potential shift state and yyact[yypact[state] + token] gives you the potential state to shift to. If yypact[state] + token is out of range for the yyact table, or the token doesn't match the entry symbol for that state, then there's no shift on that token.
yychk is the entry symbol for each state -- a positive number means you shift to that state on that token, while a negative means you goto that state on that non-terminal.
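The shift lookup described above can be written as a small Python function over the example tables. This is a sketch of the description in this answer, not of the original Yacc driver code; the function name shift_target is made up.

```python
YYFLAG = -1000  # marks a reduce-only state in yypact

# Tables copied from the example parser in the question.
yyact = [13, 9, 10, 11, 12, 23, 8, 22, 13, 9,
         10, 11, 12, 9, 10, 11, 12, 1, 2, 11,
         12, 6, 7, 4, 3, 0, 16, 5, 0, 14,
         15, 0, 0, 0, 17, 18, 19, 20, 21, 0,
         0, 24]
yypact = [-248, -1000, -236, -261, -236, -236, -1000, -1000, -248, -236,
          -236, -236, -236, -236, -253, -1000, -263, -245, -245, -1000,
          -1000, -249, -1000, -248, -1000]
yychk = [-1000, -1, 266, -2, 259, 263, 257, 258, 267, 262,
         263, 264, 265, 261, -2, -2, -1, -2, -2, -2,
         -2, -2, 260, 268, -1]

def shift_target(state, token):
    """Return the state shifted to on `token`, or None if there is no shift."""
    base = yypact[state]
    if base == YYFLAG:
        return None              # reduce-only state
    idx = base + token
    if not 0 <= idx < len(yyact):
        return None              # out of range for yyact
    target = yyact[idx]
    if yychk[target] != token:
        return None              # entry symbol of the target does not match
    return target

print(shift_target(3, 267))  # state 3 on THEN
print(shift_target(0, 266))  # state 0 on IF
```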
yydef is the reduction for that state -- a positive number means reduce that rule, while 0 means no reduction, and -2 means two or more possible reductions. yyexca is the table of reductions for those states with more than one reduction. The pair -1 state means the following entries are for the given state; following pairs of token rule mean that for lookahead token it should reduce rule. A -2 for token is a wildcard (end of the list), while a 0 for the rule means no rule to reduce (an error instead), and -1 means accept the input.
The yypgo table is the gotos for a symbol -- you go to state yyact[yypgo[sym] + state + 1] if that's in range for yyact and yyact[yypgo[sym]] otherwise.
So to reconstruct rules, look at the yydef and yyexca tables to see which states reduce each rule, and go backwards to see how the state is reached.
For example, rule #1. From the yyr1 and yyr2 tables, we know it's of the form S1: X X X X -- non-terminal #1 on the lhs and 4 symbols on the rhs. It's reduced in state 16 (from the yydef table), and the accessing symbol for state 16 (from yychk) is -1. So it's:
S1: ?? ?? ?? S1
You get into state 16 from yyact[26], and yypgo[1] == 17, so that means the goto is coming from state 8 (26 == yypgo[1] + 8 + 1). The accessing symbol of state 8 is 267 (THEN) so now we have:
S1: ?? ?? THEN S1
You get into state 8 from yyact[6], so the previous state has yypact[state] == -261 which is state 3. yychk[3] == -2, so we have:
S1: ?? S2 THEN S1
You get into state 3 from yyact[24], and yypgo[2] == 24 so any state might goto 3 here. So we're now kind of stuck for this rule; to figure out what the first symbol is, we need to work our way forward from state 0 (the start state) to reconstruct the state machine.
edit
This code will decode the state machine from the table format above and print out all the shift/reduce/goto actions in each state:
#define ALEN(A) (sizeof(A)/sizeof(A[0]))
for (int state = 0; state < ALEN(yypact); state++) {
    printf("state %d:\n", state);
    for (int i = 0; i < ALEN(yyact); i++) {
        int sym = yychk[yyact[i]];
        if (sym > 0 && i == yypact[state] + sym)
            printf("\ttoken %d shift state %d\n", sym, yyact[i]);
        if (sym < 0 && -sym < ALEN(yypgo) &&
            (i == yypgo[-sym] || i == yypgo[-sym] + state + 1))
            printf("\tsymbol %d goto state %d\n", -sym, yyact[i]);
    }
    if (yydef[state] > 0)
        printf("\tdefault reduce rule %d\n", yydef[state]);
    if (yydef[state] < 0) {
        for (int i = 0; i < ALEN(yyexca); i+= 2) {
            if (yyexca[i] == -1 && yyexca[i+1] == state) {
                for (int j = i+2; j < ALEN(yyexca) && yyexca[j] != -1; j += 2) {
                    if (yyexca[j] < 0) printf ("\tdefault ");
                    else printf("\ttoken %d ", yyexca[j]);
                    if (yyexca[j+1] < 0) printf("accept\n");
                    else if(yyexca[j+1] == 0) printf("error\n");
                    else printf("reduce rule %d\n", yyexca[j+1]);
                }
            }
        }
    }
}
It will produce output like:
state 0:
    symbol 1 goto state 1
    token 266 shift state 2
    symbol 2 goto state 3
    default reduce rule 3
state 1:
    symbol 1 goto state 1
    symbol 2 goto state 3
    token 0 accept
    default error
state 2:
    symbol 1 goto state 1
    token 257 shift state 6
    token 258 shift state 7
    token 259 shift state 4
    symbol 2 goto state 3
    token 263 shift state 5
state 3:
    token 261 shift state 13
    token 262 shift state 9
    token 263 shift state 10
    token 264 shift state 11
    token 265 shift state 12
    token 267 shift state 8
    symbol 1 goto state 1
    symbol 2 goto state 3
..etc
which should be helpful for reconstructing the grammar.