I'm discovering the pyparsing module, which is really cool. I'm trying to parse a set of simple boolean expressions in which some identifiers (foo, bar, bla and zoo) are compared to numeric values. The parser is used to check that a user's expression is correct, but I would also like to get the names of the identifiers used in the expression (i.e. which combination of foo, bar, bla and zoo was used). I can't find a simple way to do it.
In the example below, foo and bar are used in the expression. But how can I get this information?
from pyparsing import (oneOf, Group, Regex, operatorPrecedence, opAssoc,
                       Literal, Word, nums, Combine, Optional,
                       CaselessLiteral, alphanums, quotedString, Forward)
lparen = Literal("(")
rparen = Literal(")")
and_operator = CaselessLiteral("and")
or_operator = CaselessLiteral("or")
comparison_operator = oneOf(['==','!=','>','>=','<', '<='])
point = Literal('.')
e = CaselessLiteral('E')
plusorminus = Literal('+') | Literal('-')
number = Word(nums)
integer = Combine( Optional(plusorminus) + number )
float_nb = Combine(integer +
                   Optional(point + Optional(number)) +
                   Optional(e + integer))
value = float_nb.setResultsName('value')  # use setResultsName() rather than assigning .resultsName directly
identifier = oneOf(['foo', 'bar', 'bla', 'zoo'], caseless=False).setResultsName('key')
group_1 = Group(identifier + comparison_operator + value)
group_2 = Group(value + comparison_operator + identifier)
comparison = group_1 | group_2
boolean_expr = operatorPrecedence(
comparison,
[(and_operator, 2, opAssoc.LEFT),
(or_operator, 2, opAssoc.LEFT)])
boolean_expr_par = "(" + boolean_expr + ")"
expression = Forward()
expression << (boolean_expr | boolean_expr_par)  # parentheses needed: << binds tighter than |
exp = expression.parseString('2.5 > foo and (3 < bar or (foo > 10 and bar < 3)) ' , parseAll=True)
# Now how can I get the 'identifiers used in exp' ?
I had a similar problem.
I used the setParseAction method on your identifier parser to install a function that records each occurrence of a token matching identifier. Then I print the recorded items.
I declare:
idSet = set()

def recordID(tokens):
    idSet.add(tokens[0])
I modify your 'identifier' parser as follows:
identifier = oneOf(['foo','bar', 'bla', 'zoo'], caseless=False).setParseAction(recordID)
At the end of your script I print idSet:
exp = expression.parseString('2.5 > foo and (3 < bar or (foo > 10 and bar < 3)) ' , parseAll=True)
print(idSet)
It gives the following outcome:
{'foo', 'bar'}
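If you would rather not rely on a module-level set, a variant (just a sketch built on the same grammar) is to scan the raw expression with the identifier parser alone and collect the matches:

from pyparsing import oneOf

identifier = oneOf(['foo', 'bar', 'bla', 'zoo'])
expr_text = '2.5 > foo and (3 < bar or (foo > 10 and bar < 3))'
# scanString yields (tokens, start, end) for each match in the input.
# Note that oneOf also matches substrings (e.g. the 'foo' in 'food'),
# so add word-boundary handling if your identifiers can occur inside words.
used = {t[0] for t, s, e in identifier.scanString(expr_text)}
print(used)  # {'foo', 'bar'}

This skips validation entirely, so you would still run parseString first to check that the expression is well-formed.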
Preface: this may be a stupid, uninformed question.
I have a grammar I wrote with the pyparsing library (and the help of Stack Overflow posts) that parses nested expressions with parentheses, curly brackets, and square brackets. I'm curious what the productions in a grammar table would look like for it. I was wondering if there is a way to automatically generate this for an arbitrary pyparsing context-free grammar.
For reference, the pyparsing grammar is defined here:
def parse_nestings(string, only_curl=False):
    r"""
    References:
        http://stackoverflow.com/questions/4801403/pyparsing-nested-mutiple-opener-clo

    CommandLine:
        python -m utool.util_gridsearch parse_nestings:1 --show

    Example:
        >>> from utool.util_gridsearch import *  # NOQA
        >>> import utool as ut
        >>> string = r'lambda u: sign(u) * abs(u)**3.0 * greater(u, 0)'
        >>> parsed_blocks = parse_nestings(string)
        >>> recombined = recombine_nestings(parsed_blocks)
        >>> print('PARSED_BLOCKS = ' + ut.repr3(parsed_blocks, nl=1))
        >>> print('recombined = %r' % (recombined,))
        >>> print('orig = %r' % (string,))
        PARSED_BLOCKS = [
            ('nonNested', 'lambda u: sign'),
            ('paren', [('ITEM', '('), ('nonNested', 'u'), ('ITEM', ')')]),
            ('nonNested', '* abs'),
            ('paren', [('ITEM', '('), ('nonNested', 'u'), ('ITEM', ')')]),
            ('nonNested', '**3.0 * greater'),
            ('paren', [('ITEM', '('), ('nonNested', 'u, 0'), ('ITEM', ')')]),
        ]

    Example:
        >>> from utool.util_gridsearch import *  # NOQA
        >>> import utool as ut
        >>> string = r'\chapter{Identification \textbf{foobar} workflow}\label{chap:application}'
        >>> parsed_blocks = parse_nestings(string)
        >>> print('PARSED_BLOCKS = ' + ut.repr3(parsed_blocks, nl=1))
        PARSED_BLOCKS = [
            ('nonNested', '\\chapter'),
            ('curl', [('ITEM', '{'), ('nonNested', 'Identification \\textbf'), ('curl', [('ITEM', '{'), ('nonNested', 'foobar'), ('ITEM', '}')]), ('nonNested', 'workflow'), ('ITEM', '}')]),
            ('nonNested', '\\label'),
            ('curl', [('ITEM', '{'), ('nonNested', 'chap:application'), ('ITEM', '}')]),
        ]
    """
    import utool as ut  # NOQA
    import pyparsing as pp

    def as_tagged(parent, doctag=None):
        """Returns the parse results as a tagged (name, children) tree. Tags are
        created for tokens and lists that have defined results names."""
        namedItems = dict((v[1], k) for (k, vlist) in parent._ParseResults__tokdict.items()
                          for v in vlist)
        # collapse out indents if formatting is not desired
        parentTag = None
        if doctag is not None:
            parentTag = doctag
        else:
            if parent._ParseResults__name:
                parentTag = parent._ParseResults__name
        if not parentTag:
            parentTag = "ITEM"
        out = []
        for i, res in enumerate(parent._ParseResults__toklist):
            if isinstance(res, pp.ParseResults):
                if i in namedItems:
                    child = as_tagged(res, namedItems[i])
                else:
                    child = as_tagged(res, None)
                out.append(child)
            else:
                # individual token, see if there is a name for it
                resTag = None
                if i in namedItems:
                    resTag = namedItems[i]
                if not resTag:
                    resTag = "ITEM"
                child = (resTag, pp._ustr(res))
                out += [child]
        return (parentTag, out)

    def combine_nested(opener, closer, content, name=None):
        r"""
        opener, closer, content = '(', ')', nest_body
        """
        import utool as ut  # NOQA
        ret1 = pp.Forward()
        _NEST = ut.identity
        #_NEST = pp.Suppress
        opener_ = _NEST(opener)
        closer_ = _NEST(closer)
        group = pp.Group(opener_ + pp.ZeroOrMore(content) + closer_)
        ret2 = ret1 << group
        if ret2 is None:
            ret2 = ret1
        else:
            pass
            #raise AssertionError('Weird pyparsing behavior. Comment this line if encountered. pp.__version__ = %r' % (pp.__version__,))
        if name is None:
            ret3 = ret2
        else:
            ret3 = ret2.setResultsName(name)
        assert ret3 is not None, 'cannot have a None return'
        return ret3

    # Current Best Grammar
    nest_body = pp.Forward()
    nestedParens = combine_nested('(', ')', content=nest_body, name='paren')
    nestedBrackets = combine_nested('[', ']', content=nest_body, name='brak')
    nestedCurlies = combine_nested('{', '}', content=nest_body, name='curl')
    nonBracePrintables = ''.join(c for c in pp.printables if c not in '(){}[]') + ' '
    nonNested = pp.Word(nonBracePrintables).setResultsName('nonNested')
    nonNested = nonNested.leaveWhitespace()
    # if with_curl and not with_paren and not with_brak:
    if only_curl:
        # TODO figure out how to chain |
        nest_body << (nonNested | nestedCurlies)
    else:
        nest_body << (nonNested | nestedParens | nestedBrackets | nestedCurlies)
    nest_body = nest_body.leaveWhitespace()
    parser = pp.ZeroOrMore(nest_body)
    debug_ = ut.VERBOSE
    if len(string) > 0:
        tokens = parser.parseString(string)
        if debug_:
            print('string = %r' % (string,))
            print('tokens List: ' + ut.repr3(tokens.asList()))
            print('tokens XML: ' + tokens.asXML())
        parsed_blocks = as_tagged(tokens)[1]
        if debug_:
            print('PARSED_BLOCKS = ' + ut.repr3(parsed_blocks, nl=1))
    else:
        parsed_blocks = []
    return parsed_blocks
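There is no built-in grammar-table export in pyparsing that I know of, but as a rough sketch you can walk the expression tree yourself: compound pyparsing elements keep their children in the undocumented exprs and expr attributes. A crude production dump (hypothetical helper, the name is mine) could look like this:

import pyparsing as pp

def dump_productions(element, seen=None, depth=0):
    """Recursively print one line per pyparsing element, indented by depth.

    Relies on the undocumented 'exprs'/'expr' attributes of compound
    elements, so treat it as a debugging aid, not a stable API.
    """
    if seen is None:
        seen = set()
    if id(element) in seen:
        # stop at Forward-created cycles and shared subexpressions
        print('    ' * depth + '<seen before: %s>' % type(element).__name__)
        return
    seen.add(id(element))
    print('    ' * depth + '%s: %s' % (type(element).__name__, element))
    children = list(getattr(element, 'exprs', []))
    if getattr(element, 'expr', None) is not None:
        children.append(element.expr)
    for child in children:
        dump_productions(child, seen, depth + 1)

Calling dump_productions(parser) on the parser built inside parse_nestings (you would have to return or expose it) prints one indented line per production-like node.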
I have tried it this way first:

for model in structure:
    for residue in model.get_residues():
        if PDB.is_aa(residue):
            x += 1

and then this way:

len(structure[0][chain])

But neither of them seems to work...
Your code should work and give you the correct results.
from Bio import PDB

parser = PDB.PDBParser()
pdb1 = './1bfg.pdb'
structure = parser.get_structure("1bfg", pdb1)
model = structure[0]

res_no = 0
non_resi = 0
for model in structure:
    for chain in model:
        for r in chain.get_residues():
            if r.id[0] == ' ':
                res_no += 1
            else:
                non_resi += 1
print("Residues: %i" % (res_no))
print("Other: %i" % (non_resi))

res_no2 = 0
non_resi2 = 0
for model in structure:
    for residue in model.get_residues():
        if PDB.is_aa(residue):
            res_no2 += 1
        else:
            non_resi2 += 1
print("Residues2: %i" % (res_no2))
print("Other2: %i" % (non_resi2))
Output:
Residues: 126
Other: 99
Residues2: 126
Other2: 99
Your statement
print(len(structure[0]['A']))
gives you the total count (225) of all residues, in this case all amino acids plus the water molecules (126 + 99 = 225).
The numbers seem to be correct when compared to manual inspection using PyMol.
What is the specific error message you are getting or the output you are expecting? Any specific PDB file?
Since the PDB file mostly stores the coordinates of the resolved atoms, it is not always possible to get the full structure from it. Another approach would be to use the .cif files.
from Bio import PDB

parser = PDB.PDBParser()
pdb1 = './1bfg.cif'
m = PDB.MMCIF2Dict.MMCIF2Dict(pdb1)
if '_entity_poly.pdbx_seq_one_letter_code' in m.keys():
    print('Full structure:')
    full_structure = m['_entity_poly.pdbx_seq_one_letter_code']
    print(full_structure)
    print(len(full_structure))
Output:
Full structure:
PALPEDGGSGAFPPGHFKDPKRLYCKNGGFFLRIHPDGRVDGVREKSDPHIKLQLQAEERGVVSIKGVSANRYLAMKEDGRLLASKSVTDECFFFERLESNNYNTYRSRKYTSWYVALKRTGQYKLGSKTGPGQKAILFLPMSAKS
146
For multiple chains:
from Bio import PDB

parser = PDB.PDBParser()
pdb1 = './4hlu.cif'
m = PDB.MMCIF2Dict.MMCIF2Dict(pdb1)
if '_entity_poly.pdbx_seq_one_letter_code' in m.keys():
    full_structure = m['_entity_poly.pdbx_seq_one_letter_code']
    chains = m['_entity_poly.pdbx_strand_id']
    for c in chains:
        print('Chain %s' % (c))
        print('Sequence: %s' % (full_structure[chains.index(c)]))
It's just:
from Bio.PDB import PDBParser
from Bio import PDB
pdb = PDBParser().get_structure("1bfg", "1bfg.pdb")
for chain in pdb.get_chains():
    print(len([_ for _ in chain.get_residues() if PDB.is_aa(_)]))
I appreciated Peters' answer, but I also realized that the r.id[0] == ' ' check is more robust (e.g. for HIE). PDB.is_aa() cannot detect that HIE is an amino acid, even though HIE is just ε-nitrogen-protonated histidine. So I recommend:
from Bio import PDB

parser = PDB.PDBParser()
pdb1 = './1bfg.pdb'
structure = parser.get_structure("1bfg", pdb1)  # fixed: was get_structure("1bfg", pdb)
model = structure[0]

res_no = 0
non_resi = 0
for model in structure:
    for chain in model:
        for r in chain.get_residues():
            if r.id[0] == ' ':
                res_no += 1
            else:
                non_resi += 1
print("Residues: %i" % (res_no))
print("Other: %i" % (non_resi))
I think you would actually want to do something like
m = Bio.PDB.MMCIF2Dict.MMCIF2Dict(pdb_cif_file)
if '_entity_poly.pdbx_seq_one_letter_code' in m.keys():
    full_structure = m['_entity_poly.pdbx_seq_one_letter_code']
    chains = m['_entity_poly.pdbx_strand_id']
    for c in chains:
        for ci in c.split(','):
            print('Chain %s' % (ci))
            print('Sequence: %s' % (full_structure[chains.index(c)]))
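A slightly more defensive variant of that loop (just a sketch, assuming as above that the two mmCIF fields are parallel lists) pairs them with zip, so a duplicated entry cannot confuse chains.index(c):

# assumes m was built with MMCIF2Dict as in the snippets above
for seq, chain_group in zip(full_structure, chains):
    for chain_id in chain_group.split(','):
        print('Chain %s' % chain_id)
        print('Sequence: %s' % seq)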
I've written some code to parse an EMBL file and dump specific regions of the file into a dictionary.
The keys of the dictionary correlate to the label of a specific region that I want to capture and each key's value is the region itself.
I have then created another function to write the contents of the dictionary to a text file.
However, I have found that the text file contains the information in a different order to that found in the original EMBL file.
I can't figure out why it is doing this - is it because dictionaries are unordered? Is there any way around it?
from Bio import SeqIO

s6633 = SeqIO.read("6633_seq.embl", "embl")

def make_dict_realgenes(x):
    dict = {}
    for i in range(len(x.features)):
        if x.features[i].type == 'CDS':
            if 'hypothetical' not in x.features[i].qualifiers['product'][0]:
                try:
                    if x.features[i].location.strand == -1:
                        x1 = x.features[i].location.end
                        y1 = x1 + 30
                        dict[str(x.features[i].qualifiers['product'][0])] = \
                            str(x[x1:y1].seq.reverse_complement())
                    else:
                        x2 = x.features[i].location.start
                        y2 = x2 - 30
                        dict[x.features[i].qualifiers['product'][0]] = \
                            str(x[y2:x2].seq)
                except KeyError:
                    if x.features[i].location.strand == -1:
                        x1 = x.features[i].location.end
                        y1 = x1 + 30
                        dict[str(x.features[i].qualifiers['translation'][0])] = \
                            str(x[x1:y1].seq.reverse_complement())
                    else:
                        x2 = x.features[i].location.start
                        y2 = x2 - 30
                        dict[x.features[i].qualifiers['translation'][0]] = \
                            str(x[y2:x2].seq)
    return dict

def rbs_file(dict):
    list = []
    c = 0
    for k, v in dict.iteritems():
        list.append(">" + k + " " + str(c) + "\n" + v + "\n")
        c = c + 1
    f = open("out.txt", "w")
    a = 0
    for i in list:
        f.write(i)
        a = a + 1
    f.close()
To preserve order in a dictionary, use an OrderedDict from collections. Try changing the top of your code to this:
from collections import OrderedDict
from Bio import SeqIO

s6633 = SeqIO.read("6633_seq.embl", "embl")

def make_dict_realgenes(x):
    dict = OrderedDict()
    ...
Also, I would advise against overwriting the builtin 'dict' if you can easily rename it.
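For instance (hypothetical rename, everything else unchanged):

def make_dict_realgenes(x):
    genes = OrderedDict()  # 'genes' instead of shadowing the built-in dict
    ...
    return genes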
I slightly refactored your code, and I suggest writing the output as it is produced while parsing the file, instead of relying on OrderedDicts:
from Bio import SeqIO

output = open("out.txt", "w")
for seq in SeqIO.parse("CP001187.embl", "embl"):
    for feature in seq.features:
        if feature.type == "CDS":
            qualifier = (feature.qualifiers.get("product") or
                         feature.qualifiers.get("translation"))[0]
            if "hypothetical" not in qualifier:
                if feature.location.strand == -1:
                    x1 = feature.location.end
                    x2 = x1 + 30
                    sequence = seq[x1:x2].seq.reverse_complement()
                else:
                    x1 = feature.location.start
                    x2 = x1 - 30
                    sequence = seq[x2:x1].seq
                output.write(">" + qualifier + "\n")
                output.write(str(sequence) + "\n")
                # You can always insert here to the OrderedDict anyway, e.g.
                # d[qualifier] = str(sequence)
output.close()
In Python, "for i in range(len(anything))" is only rarely the way to go.
There is also a cleaner way to output your sequences using Biopython. Use a list to collect the SeqRecords, instead of a dict or OrderedDict:
from Bio.SeqRecord import SeqRecord

my_seqs = []

# Each time you generate a sequence, instead of writing to a file
# or inserting into a dict, do this:
my_seqs.append(SeqRecord(sequence, id=qualifier, description=""))

# Now that you have my_seqs, they can be written in a single line:
SeqIO.write(my_seqs, "output.fas", "fasta")
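Put together with the refactored parsing loop above, the whole script might look like this (a sketch reusing the same file names and the 30-base flanking logic from earlier):

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

my_seqs = []
for seq in SeqIO.parse("CP001187.embl", "embl"):
    for feature in seq.features:
        if feature.type == "CDS":
            qualifier = (feature.qualifiers.get("product") or
                         feature.qualifiers.get("translation"))[0]
            if "hypothetical" not in qualifier:
                if feature.location.strand == -1:
                    x1 = feature.location.end
                    sequence = seq[x1:x1 + 30].seq.reverse_complement()
                else:
                    x1 = feature.location.start
                    sequence = seq[x1 - 30:x1].seq
                my_seqs.append(SeqRecord(sequence, id=qualifier, description=""))

# Records are kept in file order, so the FASTA output preserves it too.
SeqIO.write(my_seqs, "output.fas", "fasta")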
I got stuck with my Xtext grammar definition. Basically, I'd like to define multiple parameters for a component. The component should contain at least one parameter definition: paramA OR paramB OR paramC OR (paramA AND paramB) OR (paramA AND paramC) OR (paramB AND paramC) OR (paramA AND paramB AND paramC).
Overall these are seven cases, as you can see in my grammar definition:
Component:
    'Define available parameters:' (
        (newParamA = ParamA | newParamB = ParamB | newParamC = ParamC)
        | (newParamA = ParamA & newParamB = ParamB)
        | (newParamA = ParamA & newParamC = ParamC)
        | (newParamB = ParamB & newParamC = ParamC)
        | (newParamA = ParamA & newParamB = ParamB & newParamC = ParamC)
    )
;

ParamA: ('paramA = ' paramA=Integer ';');
ParamB: ('paramB = ' paramB=Integer ';');
ParamC: ('paramC = ' paramC=Integer ';');

// Datatype
Integer returns ecore::EIntegerObject: '-'? INT;
Here is what works when I reduce my grammar to (newParamA = ParamA | newParamB = ParamB | newParamC = ParamC) only, i.e. without the other cases in the first code snippet:
Define available parameters:
paramA = 1;
...
Define available parameters:
paramB = 2;
...
Define available parameters:
paramC = 3;
But I'd like to be able to define multiple available params in my DSL, e.g.
Define available parameters:
paramA = 1; paramB = 2;
...
Define available parameters:
paramB = 2; paramC = 3;
...
Define available parameters:
paramA = 1; paramB = 2; paramC = 3;
Any idea how to resolve this issue? I'd appreciate any help!
This is the error I get when generating the grammar from code snippet #1:
warning(200): ../my.packagename/src-gen/my/packagename/projectname/parser/antlr/internal/InternalMyDSL.g:722:1: Decision can match input such as "'paramC = ' '-' RULE_INT ';'" using multiple alternatives: 1, 3, 4, 5
As a result, alternative(s) 3,5,4 were disabled for that input
Semantic predicates were present but were hidden by actions.
...
4514 [main] ERROR enerator.CompositeGeneratorFragment - java.io.FileNotFoundException: ..\my.packagename.ui\src-gen\my\packagename\projectname\ui\contentassist\antlr\internal\InternalMyDSLParser.java (The system cannot find the file specified)
org.eclipse.emf.common.util.WrappedException: java.io.FileNotFoundException: ..\my.packagename.ui\src-gen\my\packagename\projectname\ui\contentassist\antlr\internal\InternalMyDSLParser.java (The system cannot find the file specified)
at org.eclipse.xtext.util.Files.readFileIntoString(Files.java:129)
at org.eclipse.xtext.generator.parser.antlr.AbstractAntlrGeneratorFragment.simplifyUnorderedGroupPredicates(AbstractAntlrGeneratorFragment.java:130)
at org.eclipse.xtext.generator.parser.antlr.AbstractAntlrGeneratorFragment.simplifyUnorderedGroupPredicatesIfRequired(AbstractAntlrGeneratorFragment.java:118)
at org.eclipse.xtext.generator.parser.antlr.XtextAntlrUiGeneratorFragment.generate(XtextAntlrUiGeneratorFragment.java:86)
Here is a workaround I've tried (which works), but it's not a solution because it changes the keywords of the language just to avoid the parser error:
('newParamA1 = ' paramA1=Integer ';')
| ('newParamB1 = ' paramB1=Integer ';')
| ('newParamC1 = ' paramC1=Integer ';')
| (('newParamA2 = ' paramA2=Integer ';') & ('newParamB2 = ' paramB2=Integer ';'))
| (('newParamA3 = ' paramA3=Integer ';') & ('newParamC2 = ' paramC2=Integer ';'))
| (('newParamB3 = ' paramB3=Integer ';') & ('newParamC3 = ' paramC3=Integer ';'))
| (('newParamA4 = ' paramA4=Integer ';') & ('newParamB4 = ' paramB4=Integer ';') & ('newParamC4 = ' paramC4=Integer ';'))
I think what you really want is a validation that ensures that at least one parameter is given, on the semantic level rather than on the syntactic level. This will greatly simplify your grammar, e.g. you could just use
(newParamA = ParamA)? & (newParamB = ParamB)? & (newParamC = ParamC)?
(parentheses added for clarity)
Also note that it's generally a good idea to avoid spaces in keywords. You should prefer 'paramA' '=' over 'paramA ='. This will greatly improve the error handling in the lexer / parser.
What you want to do is something like this:
You want a simple grammar (as Sebastian described it):
(newParamA = ParamA)? & (newParamB = ParamB)? & (newParamC = ParamC)?
To make sure that at least one parameter is required, you can write your own validator, which could look like this:
class MyDSLValidator extends AbstractMyDSLValidator {

    @Check
    def void atLeastOneParameter(Component component) {
        if (component.newParamA == null && component.newParamB == null && component.newParamC == null) {
            error('requires at least one parameter definition', MyDSLPackage.Literals.COMPONENT__PARAMA);
        }
    }
}
I'm struggling to make setParseAction work when one parser definition builds on another (I don't know how to express this in English, so here is an example):
from __future__ import division
from decimal import Decimal
from pyparsing import (Word, alphas, ParseException, Literal, CaselessLiteral,
                       Combine, Optional, nums, Or, Forward, ZeroOrMore,
                       StringEnd, alphanums, Suppress, sglQuotedString,
                       dblQuotedString, Group, restOfLine, Regex, stringEnd)
class ASTNode(object):
    def __init__(self, tokens):
        self.tokens = tokens
        self.assignFields()

    def __str__(self):
        return self.__class__.__name__ + ':' + str(self.__dict__)
    __repr__ = __str__


class ConstantNode(ASTNode):
    def assignFields(self):
        #print " ", self.tokens
        self.setValue(self.tokens[0])

    def transform(self, value):
        return value

    def setValue(self, value):
        self.constant = self.transform(value)
        del self.tokens


class StringNode(ConstantNode):
    pass


class BoolNode(ConstantNode):
    def transform(self, value):
        return bool(value)


class IntNode(ConstantNode):
    def transform(self, value):
        return int(value)


class FloatNode(ConstantNode):
    def transform(self, value):
        print value
        return Decimal(value)


class AssignmentNode(ASTNode):
    def assignFields(self):
        #print self.tokens
        self.lhs, self.rhs = self.tokens
        del self.tokens
LPAR, RPAR, LBRACK, RBRACK, LBRACE, RBRACE, SEMI, COMMA = map(Suppress, "()[]{};,")
PLUS = Literal("+")
MINUS = Literal("-")
MULT = Literal("*")
DIV = Literal("/")
ASSIGN = Literal("=")
POINT = Literal('.')
TRUE = Literal('True')
FALSE = Literal('False')
SEP = Literal(':').suppress()
NAME = Word(alphas + '_?', alphanums + '_?')
TYPE = SEP + NAME
COMMENT = "#" + restOfLine
BOOLEANS = TRUE | FALSE
BOOLEANS.setParseAction(BoolNode)
EXPR = Forward()
ADDOP = PLUS | MINUS
MULTOP = MULT | DIV
PLUSORMINUS = PLUS | MINUS
#Strings
STR = dblQuotedString.setParseAction(ConstantNode) | sglQuotedString.setParseAction(ConstantNode)
STRINGS = STR
#Numbers
NUMBER = Word(nums)
INTEGER = Combine(Optional(PLUSORMINUS) + NUMBER)
FLOATNUMBER = Combine(INTEGER.copy() +
                      Optional(POINT + Optional(NUMBER)) +
                      Optional(INTEGER.copy()))
MONEY = Combine(FLOATNUMBER.copy() + Word("$").suppress())
TYPED_FLOATNUMBER = Combine(FLOATNUMBER + Word(alphas))
INTEGER.setParseAction(IntNode)
FLOATNUMBER.setParseAction(FloatNode)
NUMBERS = MONEY | TYPED_FLOATNUMBER | FLOATNUMBER
TEST_GRAMMAR = """
#Single values
True
False
1 #Int32
1.0 #Float
1$ #MONEY
25.3mt #Typed number"""
Everything parses, but the Bool and Int nodes are not called, only the Float node:
['True']
['False']
1
[FloatNode:{'constant': Decimal('1')}]
1.0
[FloatNode:{'constant': Decimal('1.0')}]
['1']
['25.3mt']
[ConstantNode:{'constant': "'hello world'"}]
[ConstantNode:{'constant': '"hello world"'}]
['2002-08-10']
['100000']
['2002-08-10-100000']
1
[AssignmentNode:{'rhs': FloatNode:{'constant': Decimal('1')}, 'lhs': 'x'}]
1.0
[AssignmentNode:{'rhs': FloatNode:{'constant': Decimal('1.0')}, 'lhs': 'x'}]
[AssignmentNode:{'rhs': '1', 'lhs': 'x'}]
[AssignmentNode:{'rhs': '12.2mt', 'lhs': 'x'}]
I understand that setParseAction is tied to the definition of a partial grammar. However, I find it unclear what happens when definitions are chained, e.g. FLOATNUMBER, because it is built on top of INTEGER.
You left out the code that does the actual parsing, so I added this code:
for line in TEST_GRAMMAR.splitlines():
    if not line or line[0] == '#':
        continue
    print (BOOLEANS ^ INTEGER ^ NUMBERS).parseString(line)
Giving this output:
[BoolNode:{'constant': True}]
[BoolNode:{'constant': False}]
[IntNode:{'constant': 1}]
1.0
[FloatNode:{'constant': Decimal('1.0')}]
['1']
['25.3mt']
I also had to fix a minor bug in BoolNode:
class BoolNode(ConstantNode):
    def transform(self, value):
        return value.lower() == 'true'  # was: bool(value)
In transform, value is going to be one of the strings "True" or "False", but the bool value of both of these strings is True; only the empty string "" gives a bool of False.
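A quick interactive check shows why:

>>> bool("False")   # non-empty string, so truthy
True
>>> bool("")        # only the empty string is falsy
False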
One of the problems you have is that your definition of FLOATNUMBER will also match an INTEGER, so you might consider redefining FLOATNUMBER to require a leading, trailing, or embedded decimal point. I got around this by using '^' instead of '|' as the "or" operator: '^' tests all the given alternatives and selects the longest match, while '|' short-circuits and selects the first match.
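For example, a point-requiring float could look like this (just a sketch using pyparsing's Regex class instead of the Combine chain from the question):

from pyparsing import Regex

# requires a decimal point somewhere, so a plain integer such as '1'
# no longer matches FLOATNUMBER and can be claimed by INTEGER
FLOATNUMBER = Regex(r'[+-]?(\d+\.\d*|\.\d+)([eE][+-]?\d+)?')
FLOATNUMBER.setParseAction(FloatNode)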