I want to split an input parameter, inputDetails, into its individual units. I'm using tokenize to do this. Here is my code:
Groovy Code:
def inputDetails = "1234-a0-12;1111-b0-34";
def cDesc = inputDetails.tokenize(";");
for (int i=0; i<cDesc.size(); ++i)
{
    def cVer = cDesc.get(i);
    def cNum = cVer.tokenize("-");
    def a = cNum.get(0);
    def b = cNum.get(1);
    def c = cNum.get(2);
    println (" DEBUG : Input details are, ${a} : ${b} : ${c} \n");
}
Output:
DEBUG : Input details are, 1234 : a0 : 12
DEBUG : Input details are, 1111 : b0 : 34
This output is correct and expected. But if I change the first line of the Groovy code to the following:
def inputDetails = "1234-a0-12;1111-b0";
I get the following error message:
java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
at java_util_List$get$6.call(Unknown Source)
at Script1.run(Script1.groovy:9)
How can I fix it so that no IndexOutOfBoundsException is thrown and both inputs, 1234-a0-12;1111-b0-34 and 1234-a0-12;1111-b0, are supported?
You can use Groovy's multiple assignment feature to safely grab 3 values from the second tokenization. Consider the following example:
def inputDetails = "1234-a0-12;1111-b0-34"
def cDesc = inputDetails.tokenize(";")
cDesc.each { part ->
    def (p1, p2, p3) = part.tokenize('-')
    println "DEBUG: Input details are, ${p1} : ${p2} : ${p3}"
}
Output:
DEBUG: Input details are, 1234 : a0 : 12
DEBUG: Input details are, 1111 : b0 : 34
The advantage of this approach is that it throws neither IndexOutOfBoundsException nor NullPointerException: a variable with no matching token is simply set to null. If we change the first line to
def inputDetails = "1234-a0-12;1111-b0"
the result is:
DEBUG: Input details are, 1234 : a0 : 12
DEBUG: Input details are, 1111 : b0 : null
You can split the string into a 2D list by further splitting on '-':
def inputDetails = "1234-a0-12;1111-b0-34"
def elements = inputDetails.split(';').collect{it.split('-')}
elements is a List of String arrays (List<String[]>), because split returns an array. When printed, it yields:
[[1234, a0, 12], [1111, b0, 34]]
This gives you more flexibility than hard-coding array indexes.
And with "1234-a0-12;1111-b0", it's split into [[1234, a0, 12], [1111, b0]]
I'm trying to parse matrices written this way: [[1,2];[3,2];[3,4]]
So, the matrix syntax is of the form: [[A_{0,0}, A_{0,1}, ...]; [A_{1,0}, A_{1,1}, ...]; ...]
The semicolon separates the rows of a matrix, so it is absent from a matrix that has only one row. Likewise, the comma separates the columns, so it is absent from a matrix that has only one column.
Now my question is: how can I parse a matrix of variable size? I guess with a recursive rule in PLY yacc, but how? Every attempt led to an infinite recursion.
This is the error I get:
WARNING: Symbol 'primary' is unreachable
WARNING: Symbol 'expr' is unreachable
ERROR: Infinite recursion detected for symbol 'primary'
ERROR: Infinite recursion detected for symbol 'expr'
When I try this kind of code:
def p_test(t):
    '''primary : constant
               | LPAREN expr RPAREN'''
    print('yo')
    t[0] = 1
def p_expression_matrice(t):
    '''expr : primary
            | primary '+' primary'''
    print('hey')
    t[0] = 1
(This is just a first attempt to understand how to write recursion in yacc, not even an answer to my real problem.)
This is my lexer:
from global_variables import tokens
import ply.lex as lex
t_PLUS = r'\+'
t_MINUS = r'\-'
t_TIMES = r'\*'
t_DIVIDE = r'\/'
t_MODULO = r'\%'
t_EQUALS = r'\='
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_LBRACK = r'\['
t_RBRACK = r'\]'
t_SEMICOLON = r'\;'
t_COMMA = r'\,'
t_POWER = r'\^'
t_QUESTION = r'\?'
t_NAME = r'[a-zA-Z]{2,}|[a-hj-zA-HJ-Z]' # all words (only letters) except the word 'i' alone
t_COMMAND = r'![\x00-\x7F]*' # all unicode characters after '!'
def t_NUMBER(t):
    r'\d+(\.\d+)?'
    try:
        t.value = int(t.value)
    except:
        t.value = float(t.value)
    return t
def t_IMAGINE(t):
    r'i'
    t.value = 1j
    return t
t_ignore = " \t"
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)
lexer = lex.lex()
After several attempts, I got this:
def p_test3(t):
    '''expression : expression SEMICOLON expression'''
    t[0] = []
    t[0].append(t[1])
    t[0].append(t[3])
def p_test2(t):
    '''expression : LBRACK expression RBRACK'''
    t[0] = t[2]
def p_test(t):
    '''expression : expression COMMA NUMBER'''
    t[0] = []
    try:
        for i in t[1]:
            t[0].append(i)
    except:
        t[0].append(t[1])
    t[0].append(t[3])
This allows me to parse [[1,2,3,4,5];[5,4,3,2,1]] and get [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]], stored as a list of lists.
As a next step, I will store many of these in a dict and keep going. Thanks for the support, @rici :)
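For comparison, the same matrix syntax can also be parsed without yacc. The following is only a minimal sketch of a hand-written recursive-descent parser, not the PLY grammar above. The names tokenize, expect, parse_number, parse_row and parse_matrix are hypothetical, and it assumes entries are plain non-negative integers or decimals, as in the lexer's NUMBER rule. Each row and the list of rows is handled by a loop over "item (separator item)*", which is the iterative form of the recursive rule.

```python
import re

# One token per match: a number, or one of the punctuation marks [ ] ; ,
TOKEN_RE = re.compile(r"\s*(\d+(?:\.\d+)?|[\[\];,])")


def tokenize(text):
    """Split the input into number and punctuation tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError("illegal input at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def expect(tokens, i, symbol):
    """Check that token i is the given symbol and return the next index."""
    if i >= len(tokens) or tokens[i] != symbol:
        raise SyntaxError("expected %r at token %d" % (symbol, i))
    return i + 1


def parse_number(tokens, i):
    """NUMBER -> (int or float, next index)."""
    if i >= len(tokens) or tokens[i] in "[];,":
        raise SyntaxError("expected a number at token %d" % i)
    tok = tokens[i]
    return (float(tok) if "." in tok else int(tok)), i + 1


def parse_row(tokens, i):
    """row : '[' NUMBER (',' NUMBER)* ']' -> (list of numbers, next index)."""
    i = expect(tokens, i, "[")
    row = []
    while True:
        value, i = parse_number(tokens, i)
        row.append(value)
        if i < len(tokens) and tokens[i] == ",":
            i += 1  # another column follows
        else:
            break
    return row, expect(tokens, i, "]")


def parse_matrix(text):
    """matrix : '[' row (';' row)* ']' -> list of rows."""
    tokens = tokenize(text)
    i = expect(tokens, 0, "[")
    rows = []
    while True:
        row, i = parse_row(tokens, i)
        rows.append(row)
        if i < len(tokens) and tokens[i] == ";":
            i += 1  # another row follows
        else:
            break
    i = expect(tokens, i, "]")
    if i != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return rows


print(parse_matrix("[[1,2,3,4,5];[5,4,3,2,1]]"))
```

Because rows and matrices are separate rules here, a one-row matrix (no semicolon) and a one-column matrix (no comma) need no special handling.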
I have the following snippet of DXL code.
I would like to copy the object IDs of the objects that match filter f3.
I don't know what I am doing wrong: it gives me the IDs of all the objects.
string Id
int x=0;
int y=0;
Id = o."SourceID"
Filter f0 = hasNoLinks(linkFilterIncoming, "*")
Filter f1=attribute "_TraceTo" == "System"
Filter f2 = attribute "Object Type" == "requirement"
Filter f3 = f1&&f2&&f0
addFilter(m,f3,x,y)
print x ":\t" fullName(module(m)) "\n"
wOutKLHUntraced << Id "\t" fullName(module(m)) "\n"
First, you need to add the statement filtering on after adding the filter, so that the filter is applied. Then the filtered objects will be the only ones visible.
Then, you set "Id" way too early in the script. At line 4, "o" is set to
some object, I don't know which one, but certainly not the result of
your filter. Instead, after the statement filtering on, add statements
Object o = first m // the first object that is now visible
Id = o."SourceID"
My script runs fine, but gives different results. I am running it in a for loop over about 30 modules.
Am I setting the filters wrong somewhere?
Stream TbdUntraced;
string s
string d
Object o
string trac
int numReqs = 0;
string IdNum
string untraced
int x=0;
int y=0;
int a =0;
for o in m do
{
    ensureInLinkedModulesLoaded(o,S_SATISFIES );
    s = o."Object Type"
    string Id
    string Topic
    Topic = o."_Topic"
    numReqs++;
    Filter f0 = hasNoLinks(linkFilterIncoming, "*")
    Filter f1 = contains(attribute "_TraceTo", "TBD", false)
    Filter f2 = attribute "Object Type" == "requirement"
    Filter f3 = attribute "MMS5-Autoliv_Supplier_Status" == "agreed"
    Filter f4 = attribute "MMS5-Autoliv_Supplier_Status" == "partly agreed"
    Filter f7 = f0&&f2&&(f3||f4)&&f1
    addFilter(m,f7,x,y)
    filtering on
    d = o."MMS5-Autoliv_OEM_Status"
    Id = o."SourceID"
    Topic = o."_Topic"
    print x ":\t" name(module(m)) "\n"
    TbdUntraced << Id "\t" Topic "\t"name(module(m)) "\n"
}
We are trying to use Learning2Search from vowpal-wabbit for NER.
We are using the ATIS dataset.
In ATIS there are 127 entities (including the Others category).
The training set has 4978 sentences and the test set has 893.
However, when we run it on the test set, it maps everything to either class 1 (Airline name) or class 2 (Airport code),
which is weird.
We tried another dataset (https://github.com/glample/tagger/tree/master/dataset) and saw the same behavior.
It looks like I am not using it the right way. Any pointers would be a great help.
Code snippet:
with open("/tweetsdb/ner/datasets/atis.pkl") as f:
    train, test, dicts = cPickle.load(f)
idx2words = {v: k for k, v in dicts['words2idx'].iteritems()}
idx2labels = {v: k for k, v in dicts['labels2idx'].iteritems()}
idx2tables = {v: k for k, v in dicts['tables2idx'].iteritems()}
#Convert the dataset into a format compatible with Vowpal Wabbit
training_set = []
for i in xrange(len(train[0])):
    zip_label_ent_idx = zip(train[2][i], train[0][i])
    label_ent_actual = [(int(i[0]), idx2words[i[1]]) for i in zip_label_ent_idx]
    training_set.append(label_ent_actual)
# Do like wise to get test chunk
class SequenceLabeler(pyvw.SearchTask):
    def __init__(self, vw, sch, num_actions):
        pyvw.SearchTask.__init__(self, vw, sch, num_actions)
        sch.set_options( sch.AUTO_HAMMING_LOSS | sch.AUTO_CONDITION_FEATURES )
    def _run(self, sentence):
        output = []
        for n in range(len(sentence)):
            pos,word = sentence[n]
            with self.vw.example({'w': [word]}) as ex:
                pred = self.sch.predict(examples=ex, my_tag=n+1, oracle=pos, condition=[(n,'p'), (n-1, 'q')])
                output.append(pred)
        return output
vw = pyvw.vw("--search 3 --search_task hook --ring_size 1024")
Code for training the model:
#Training
sequenceLabeler = vw.init_search_task(SequenceLabeler)
for i in xrange(3):
    sequenceLabeler.learn(training_set[:10])
Code for Prediction:
pred = []
for i in random.sample(xrange(len(test_set)), 10):
    test_example = [ (999, word[1]) for word in test_set[i] ]
    test_labels = [ label[0] for label in test_set[i] ]
    print 'input sentence:', ' '.join([word[1] for word in test_set[i]])
    print 'actual labels:', ' '.join([str(label) for label in test_labels])
    print 'predicted labels:', ' '.join([str(pred) for pred in sequenceLabeler.predict(test_example)])
To see the full code, please refer to this notebook:
https://github.com/nsanthanam/ner/blob/master/vowpal_wabbit_atis.ipynb
I am also new to this algorithm, but I did some pilot studies recently.
As for your problem, the answer is that you set the wrong parameter in
vw = pyvw.vw("--search 3 --search_task hook --ring_size 1024")
Here, --search should be set to 127, so that vw uses all 127 of your tags:
vw = pyvw.vw("--search 127 --search_task hook --ring_size 1024")
Also, my feeling is that vw doesn't work really well with so many tags. I might be wrong, please let me know your result :)
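One way to avoid hard-coding that number is to derive it from the training data. This is a minimal sketch, assuming each sentence is a list of (label, word) pairs as in the question's training_set and that labels are positive integers. The helper name search_args and the toy data are hypothetical, and the sketch is written for Python 3, whereas the question uses Python 2.

```python
def search_args(training_set, ring_size=1024):
    """Build the vw argument string with --search set to the largest label id."""
    num_actions = max(
        label for sentence in training_set for label, _word in sentence
    )
    return "--search %d --search_task hook --ring_size %d" % (num_actions, ring_size)


# Toy data in the same (label, word) shape as the question's training_set.
toy_training_set = [
    [(1, "show"), (1, "flights"), (3, "boston")],
    [(2, "to"), (5, "denver")],
]
print(search_args(toy_training_set))
```

If some labels never occur in the training sentences, the size of dicts['labels2idx'] may be a safer source for the count than the data itself.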
I'm currently trying to define the registers of the architecture I work with via TableGen. There are supposed to be 2 computation blocks, XR and YR, and a pseudoblock XYR referring to them. For example, XYR3 is a vector pseudoregister comprising XR3 and YR3.
// Classes for registers of my namespace.
class TigerSHARCReg<bits<5> num, string n, list<string> altNames = []> :
Register<n, altNames>
{
field bits<5> Num = num;
let Namespace = "TigerSHARC";
}
class TigerSHARCVReg<bits<5> num, string n, list<TigerSHARCReg> subregs, list<SubRegIndex> indices = []> :
RegisterWithSubRegs<n, subregs>
{
field bits<5> Num = num;
let Namespace = "TigerSHARC";
let SubRegIndices = indices;
}
class TigerSHARCSubRegIndex<int size, int offset> : SubRegIndex<size, offset>
{
let Namespace = "TigerSHARC";
}
// === === ===
// XR registers and XR register class
foreach num = 0-31 in
def XR#num : TigerSHARCReg<num, "XR"#num>;
def XR : RegisterClass<"TigerSHARC", [i32, f32], 32,
(sequence "XR%u", 0, 31)>;
// YR registers and YR register class
foreach num = 0-31 in
def YR#num : TigerSHARCReg<num, "YR"#num>;
def YR : RegisterClass<"TigerSHARC", [i32, f32], 32,
(sequence "YR%u", 0, 31)>;
// There are only two subregisters in each XYR
def XYRsub0 : TigerSHARCSubRegIndex<1, 0>;
def XYRsub1 : TigerSHARCSubRegIndex<1, 0>;
// XYR registers and XYR register class
foreach num = 0-31 in
def XYR#num : TigerSHARCVReg<0, "XYR0", [XR#num, YR#num], [XYRsub0, XYRsub1]>;
def XYR : RegisterClass<"TigerSHARC", [v2i32], 32, (sequence "XYR%u", 0, 31)>;
The problem is in these lines:
foreach num = 0-31 in
def XYR#num : TigerSHARCVReg<0, "XYR0", [XR#num, YR#num], [XYRsub0, XYRsub1]>;
"#" concatenates only strings, so [XR#num, YR#num] is incorrect notation. I've tried XR[num], but it doesn't seem to work either.
Is there a way to refer to an existing register in a loop?
Also, am I even doing it right?
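It looks like, instead of [XR#num, YR#num], one should use [!cast< MyTypeReg >("XR"#n), !cast< MyTypeReg >("YR"#n)], where MyTypeReg stands for the register class (TigerSHARCReg in the question) and n for the loop variable (num in the question). !cast< Type >(a) looks up the record named by the string a in the symbol table.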
I need to parse some text (the output of an svn command) in order to retrieve a number (the svn revision).
This is my code. Note that I need to retrieve the whole output stream as text to do other operations.
def proc = cmdLine.execute() // Call *execute* on the string
proc.waitFor() // Wait for the command to finish
def output = proc.in.text
//other stuff happening here
output.eachLine {
    line ->
    def revisionPrefix = "Last Changed Rev: "
    if (line.startsWith(revisionPrefix)) res = new Integer(line.substring(revisionPrefix.length()).trim())
}
This code works fine, but since I'm still a novice in Groovy, I'm wondering whether there is a more idiomatic way that avoids the ugly if...
Example of svn output (but of course the problem is more general):
Path: .
Working Copy Root Path: /svn
URL: svn+ssh://svn.company.com/opt/svnserve/repos/project/trunk
Repository Root: svn+ssh://svn.company.com/opt/svnserve/repos
Repository UUID: 516c549e-805d-4d3d-bafa-98aea39579ae
Revision: 25447
Node Kind: directory
Schedule: normal
Last Changed Author: ubi
Last Changed Rev: 25362
Last Changed Date: 2012-11-22 10:27:00 +0000 (Thu, 22 Nov 2012)
I took inspiration from the answer below and solved it using find(). My solution is:
def revisionPrefix = "Last Changed Rev: "
def line = output.readLines().find { line -> line.startsWith(revisionPrefix) }
def res = new Integer(line?.substring(revisionPrefix.length())?.trim()?:"0")
3 lines, no if, very clean
One possible alternative is:
def output = cmdLine.execute().text
Integer res = output.readLines().findResult { line ->
    (line =~ /^Last Changed Rev: (\d+)$/).with { m ->
        if( m.matches() ) {
            m[ 0 ][ 1 ] as Integer
        }
    }
}
I'm not sure whether it's better or not. I'm sure others will have different alternatives.
Edit:
Also, beware of using proc.text. If your process outputs a lot of data, you could end up blocking when the input stream's buffer gets full...
Here is a heavily commented alternative, using consumeProcessOutput:
// Run the command
String output = cmdLine.execute().with { proc ->
    // Then, with a StringWriter
    new StringWriter().with { sw ->
        // Consume the output of the process
        proc.consumeProcessOutput( sw, System.err )
        // Make sure we worked
        assert proc.waitFor() == 0
        // Return the output (goes into `output` var)
        sw.toString()
    }
}
// Extract the version by looking through all the lines
Integer version = output.readLines().findResult { line ->
    // Pass the line through a regular expression
    (line =~ /Last Changed Rev: (\d+)/).with { m ->
        // And if it matches
        if( m.matches() ) {
            // Return the \d+ part as an Integer
            m[ 0 ][ 1 ] as Integer
        }
    }
}