How to get JJ and NN (adjective and noun) from the triples generated by StanfordDependencyParser with NLTK?

I got triples using the following code, but I want to get the nouns and adjectives from the triples. I tried a lot but failed; I'm new to NLTK and Python. Any help?
from nltk.parse.stanford import StanfordDependencyParser
dp_prsr = StanfordDependencyParser(
    r'C:\Python34\stanford-parser-full-2015-04-20\stanford-parser.jar',
    r'C:\Python34\stanford-parser-full-2015-04-20\stanford-parser-3.5.2-models.jar',
    model_path='edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz')
word = []
s = 'bit is good university'
sentence = dp_prsr.raw_parse(s)
for line in sentence:
    print(list(line.triples()))
[(('university', 'NN'), 'nsubj', ('bit', 'NN')), (('university', 'NN'), 'cop', ('is', 'VBZ')), (('university', 'NN'), 'amod', ('good', 'JJ'))]
I want to get "university" and "good", and "bit" and "university". I tried the following, but it didn't work:
for line in sentence:
    if (list(line.triples)).__contains__() == 'JJ':
        word.append(list(line.triples()))
print(word)
But I get an empty list. Any help, please?

Linguistically
When you look for triplets that contain a JJ and an NN, what you are looking for usually corresponds to a noun phrase (NP) in a context-free grammar.
In dependency grammar, what you are looking for is a triplet whose arguments carry the JJ and NN POS tags. More specifically, you are looking for a constituent/branch that contains an adjectivally modified noun. In the StanfordDependencyParser output, you need to look for the predicate amod. (If the explanation above is confusing, it is advisable to read up on dependency grammar before proceeding; see https://en.wikipedia.org/wiki/Dependency_grammar.)
Note that the parser outputs triplets (arg1, pred, arg2), where argument 2 (arg2) depends on argument 1 (arg1) through the predicate (pred) relation; i.e., arg1 governs arg2 (see https://en.wikipedia.org/wiki/Government_(linguistics)).
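To make the governor/dependent direction concrete, here is a minimal sketch that collects each governor's dependents. It assumes the triples have the `((word, tag), relation, (word, tag))` shape shown in the question. `dependents_by_governor` is a hypothetical helper name and is not part of NLTK.

```python
from collections import defaultdict


def dependents_by_governor(triples):
    """Map each governor word (arg1) to its (relation, dependent word) pairs (pred, arg2).

    Assumes NLTK-style triples: ((gov_word, gov_tag), relation, (dep_word, dep_tag)).
    """
    deps = defaultdict(list)
    for (gov_word, _gov_tag), relation, (dep_word, _dep_tag) in triples:
        deps[gov_word].append((relation, dep_word))
    return dict(deps)


triples = [(('university', 'NN'), 'nsubj', ('bit', 'NN')),
           (('university', 'NN'), 'cop', ('is', 'VBZ')),
           (('university', 'NN'), 'amod', ('good', 'JJ'))]
print(dependents_by_governor(triples))
# {'university': [('nsubj', 'bit'), ('cop', 'is'), ('amod', 'good')]}
```

Here "university" governs all three words. The `amod` entry is the adjective attached to it.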
Pythonically
Now to the code part of the answer. You want to iterate through a list of tuples (i.e. triplets), so the easiest solution is to unpack each tuple into variables as you iterate and then check for the conditions you need (see "Find an element in a list of tuples"):
>>> x = [(('university', 'NN'), 'nsubj', ('bit', 'NN')), (('university', 'NN'), 'cop', ('is', 'VBZ')), (('university', 'NN'), 'amod', ('good', 'JJ'))]
>>> for arg1, pred, arg2 in x:
...     word1, pos1 = arg1
...     word2, pos2 = arg2
...     if pos1.startswith('NN') and pos2.startswith('JJ') and pred == 'amod':
...         print((arg1, pred, arg2))
...
(('university', 'NN'), 'amod', ('good', 'JJ'))
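The same unpacking can be wrapped in small helpers that return only the words you asked for. This is a sketch that assumes the triples are already in a list, like `x` above. `amod_pairs` and `words_with_tags` are hypothetical helper names. One likely reason your second loop produced an empty list is that NLTK's `raw_parse` returns an iterator, which the first `for line in sentence:` loop has already exhausted. Materialize the triples once, e.g. `triples = list(next(dp_prsr.raw_parse(s)).triples())`, and reuse that list.

```python
def amod_pairs(triples):
    """Return (noun, adjective) word pairs joined by an 'amod' relation."""
    pairs = []
    for (gov_word, gov_tag), relation, (dep_word, dep_tag) in triples:
        if relation == 'amod' and gov_tag.startswith('NN') and dep_tag.startswith('JJ'):
            pairs.append((gov_word, dep_word))
    return pairs


def words_with_tags(triples, prefixes=('NN', 'JJ')):
    """Return every word whose tag starts with one of `prefixes`, keeping first occurrences in order."""
    seen = []
    for governor, _relation, dependent in triples:
        for word, tag in (governor, dependent):
            if tag.startswith(prefixes) and word not in seen:  # str.startswith accepts a tuple
                seen.append(word)
    return seen


x = [(('university', 'NN'), 'nsubj', ('bit', 'NN')),
     (('university', 'NN'), 'cop', ('is', 'VBZ')),
     (('university', 'NN'), 'amod', ('good', 'JJ'))]
print(amod_pairs(x))       # [('university', 'good')]
print(words_with_tags(x))  # ['university', 'bit', 'good']
```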

Related

Find sequence IDs of DNA subsequences in DNA sequences from a FASTA file

I want to write a function that reads a FASTA file with (possibly ambiguous) DNA sequences, takes a subsequence as input, and returns the IDs of all sequences that contain the given subsequence.
To make the script more efficient, I tried to use nt_search to give all possibilities of the ambiguous sequence from the FASTA file. This seemed more efficient than producing all unambiguous possibilities, especially for larger sequences and FASTA files.
Right now, I'm struggling to see how I can check whether the subsequence is part of the output given by nt_search.
I want to see if e.g. 'CGC' (the input subsequence) is part of the possibilities given by nt_search, ['TA[GATC][AT][GT]GCGGT'], and return the IDs of all sequences for which this is true.
What I have so far:
def bonus_subsequence(file, unambiguous_sequence):
    seq_records = SeqIO.parse(file, 'fasta', alphabet=ambiguous_dna)
    resultListOfSeqIds = []
    print(f'Unambiguous sequence {unambiguous_sequence} could be a subsequence of:')
    for record in seq_records:
        d = Seq.IUPAC.IUPACData.ambiguous_dna_values
        couldBeSubSequence = False;
        if unambiguous_sequence in nt_search(unambiguous_sequence, record):
            couldBeSubSequence = True;
        if couldBeSubSequence == True:
            print(f'{record.id}')
            resultListOfSeqIds.append({record.id})
In a second phase, I want to be able to also use this for ambiguous subsequences, but I'd be more than happy with help on this first question, thanks in advance!
I'm not sure I understood you correctly, but you can try this:
Example fasta file:
>seq1
ATGTACGTACGTACNNNNACTG
>seq2
NNNATCGTAGTCANNA
>seq3
NNNNATGNNN
Code:
from Bio import SeqIO

if __name__ == '__main__':
    sub_seq = input('Enter a subsequence: ')
    results = []
    with open('test.fasta', 'r') as fh:
        for seq in SeqIO.parse(fh, 'fasta'):
            if sub_seq in seq:  # literal substring test on the record's sequence
                results.append(seq.name)
    print('Results:')
    print(*results, sep='\n')
Results (console):
Enter a subsequence: ATG
Results:
seq1
seq3
Enter a subsequence: NNNA
Results:
seq1
seq2
seq3
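Note that `sub_seq in seq` is a literal substring test, so an `N` in the file only matches a literal `N`. For the ambiguity-aware behaviour the question asks about (could the query occur here?), one option is to slide the query along each sequence and require that, at every position, the two IUPAC codes share at least one base. That also covers ambiguous queries (the question's second phase). The sketch below is plain Python. The `IUPAC_DNA` table mirrors Biopython's `Bio.Data.IUPACData.ambiguous_dna_values`, and `could_contain` and `matching_ids` are hypothetical names. On the `nt_search` question, my reading of Biopython's source is this: `Bio.SeqUtils.nt_search(seq, subseq)` expands ambiguity codes only in `subseq`. It returns a list whose first element is the regex pattern, followed by the match positions in `seq`. So argument order matters, and `unambiguous_sequence in nt_search(...)` does not test what the question intends.

```python
# IUPAC nucleotide codes and the bases each may stand for
# (same contents as Biopython's Bio.Data.IUPACData.ambiguous_dna_values).
IUPAC_DNA = {
    'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T',
    'M': 'AC', 'R': 'AG', 'W': 'AT', 'S': 'CG', 'Y': 'CT', 'K': 'GT',
    'V': 'ACG', 'H': 'ACT', 'D': 'AGT', 'B': 'CGT',
    'X': 'GATC', 'N': 'GATC',
}


def compatible(code_a, code_b):
    """True if two IUPAC codes can denote the same base (unknown characters raise KeyError)."""
    return bool(set(IUPAC_DNA[code_a]) & set(IUPAC_DNA[code_b]))


def could_contain(sequence, query):
    """True if `query` could occur somewhere in `sequence`, allowing ambiguity codes in both."""
    seq, q = sequence.upper(), query.upper()
    for start in range(len(seq) - len(q) + 1):
        window = seq[start:start + len(q)]
        if all(compatible(s, c) for s, c in zip(window, q)):
            return True
    return False


def matching_ids(records, query):
    """Return the IDs of (id, sequence) pairs whose sequence could contain `query`."""
    return [rec_id for rec_id, sequence in records if could_contain(sequence, query)]


# With Biopython you could build the pairs as:
# records = [(r.id, str(r.seq)) for r in SeqIO.parse('test.fasta', 'fasta')]
records = [('seq1', 'ATGTACGTACGTACNNNNACTG'),
           ('seq2', 'NNNATCGTAGTCANNA'),
           ('seq3', 'NNNNATGNNN')]
print(matching_ids(records, 'ATG'))  # ['seq1', 'seq2', 'seq3']
```

Unlike the literal test above, seq2 now matches `ATG`, because its leading `NNN` could stand for `ATG`. The question's pattern `TA[GATC][AT][GT]GCGGT` corresponds to the IUPAC string `TANWKGCGGT`.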

Z3Py: adding an inequality constraint between two vectors

Here is my attempt:
#!/usr/bin/python
from z3 import *
s = Solver()
veclen = 3
tmp_false = BoolVector('tmp_false', veclen)
for x in range(veclen):
    s.add(tmp_false[x] == False)
tmp = BoolVector('tmp', veclen)
s.add(tmp != tmp_false) # not working
# I want tmp to be anything except (False, False, False)
print(s.check())
print(s.model())
I would use tuples, but the length of the vector is set at runtime.
Should I use arrays?
Or LISP-like cons-cells within tuples, as described in Z3 manuals?
The BoolVector function just creates a Python list. The != operator on Python lists does not create a Z3 expression; it just evaluates to True. So you are not really sending an expression to Z3. To create tuple expressions you can use algebraic data types. A record type is a special case of an algebraic data type, and Z3 understands how to reason about these.
So for example, you can write:
from z3 import *
s = Solver()
Bv = Datatype("record")
Bv.declare('mk', ('1', BoolSort()), ('2', BoolSort()), ('3', BoolSort()))
Bv = Bv.create()
tmp_false = Bv.mk(False, False, False)
tmp = Const('tmp', Bv)
print(tmp != tmp_false)
s.add(tmp != tmp_false)
# tmp may now be anything except mk(False, False, False)
print(s.check())
print(s.model())

z3: conversion of expressions with transcendental functions from z3py to smt-lib2

As far as I know, Z3 doesn't recognize transcendental functions, so it throws an error during conversion with the following code:
from z3 import *

def convertor(f, status="unknown", name="benchmark", logic=""):
    v = (Ast * 0)()
    if isinstance(f, Solver):
        a = f.assertions()
        if len(a) == 0:
            f = BoolVal(True)
        else:
            f = And(*a)
    return Z3_benchmark_to_smtlib_string(f.ctx_ref(), name, logic, status, "", 0, v, f.as_ast())
pi, EI, kA , kB, N = Reals('pi EI kA kB N')
s= Solver()
s.add(pi == 3.1415926535)
s.add(EI == 175.2481)
s.add(kA>= 0)
s.add(kA<= 100)
s.add(kB>= 0)
s.add(kB<= 100)
s.add(N>= 100)
s.add(N<= 200)
Please change the path to the input file "beamfinv3.bch", which can be found at: link
continue_read = False
input_file = open('/home/mani/downloads/new_z3/beamfinv3.bch', 'r')
for line in input_file:
    if line.strip() == "Constraints":
        continue_read = True
        continue
    if line.strip() == "end":
        continue_read = False
    if continue_read == True:
        parts = line.split(';')
        if (parts[0] != "end"):
            #print parts[0]
            s.add(eval(parts[0]))
input_file.close()
file = open('cyber.smt2', 'w')
result = convertor(s, logic="None")
file.write(result)
error:
File "<string>", line 1, in <module>
NameError: name 'sin' is not defined
Is there any way around this? Any help?
Thanks.
The core of this problem is that eval evaluates its argument as Python code. That means every function that occurs within parts[0] must have a corresponding Python function of the same name. This is not the case for the trigonometric functions: they are in neither the Python API nor the C API (the former is based on the latter). For now, you could try to add those functions yourself, perhaps with an implementation based on parse_smt2_string, or perhaps by replacing the Python strings with SMT2 strings altogether.
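To illustrate the first suggestion: `eval` resolves names in the namespace it is given, so the `NameError` for `sin` means no `sin` exists there. The minimal sketch below supplies the missing names explicitly. It uses `math.sin` only as a stand-in. What `sin` should map to for Z3 (for example, a wrapper built on `parse_smt2_string`) is an assumption you would have to implement yourself. `eval_with_names` is a hypothetical helper. `eval` still executes arbitrary code, so only use it on files you trust.

```python
import math


def eval_with_names(expression, names):
    """Evaluate `expression` with only the given names in scope (no builtins)."""
    namespace = {'__builtins__': {}}
    namespace.update(names)
    return eval(expression, namespace)


# Stand-in numeric implementations. For Z3 you would instead map 'sin', 'cos', ...
# to your own (hypothetical) wrappers and the variable names to your Real constants.
names = {'sin': math.sin, 'cos': math.cos, 'N': 0.0, 'kA': 2.0}
print(eval_with_names('kA * cos(N) + sin(N)', names))  # 2.0
```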
Z3 can represent expressions containing trigonometric functions, but it refuses to do so when a logic is set; see arith_decl_plugin. I don't know Python well enough, but it might have to be None instead of "".
While Z3 can represent these expressions, it's probably not very good at solving them. See comments on the limitations in Can Z3 handle sinusoidal and exponential functions, Z3 supports for nonlinear arithmetics, and Z3 Performance with Non-Linear Arithmetic.

Stanford parser - count of tags

I have been using the Stanford Parser for CFG analysis. I can get the output displayed as a tree, but what I really want is a count of tags.
For example, I can get the following output (taken from another question on Stack Overflow):
(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) (VP (VBZ likes) (NP (JJ eating) (NN sausage))) (. .)))
But what I really want is a count of the tags, written to a CSV file:
PRP - 1
JJ - 1
Is this possible with the Stanford parser, particularly as I want to process several text files, or should I use a different program?
Yes, this is easily possible.
You will need:
import java.util.HashMap;
import java.util.Map;
import edu.stanford.nlp.trees.Tree;
I assume from the other question that you already have a Tree object.
I suspect you only want to count the nodes that carry POS tags (PRP$, NN, RB, ... in your example), but you could do this for every node in general. In a Stanford Tree these tag nodes are preterminals; the leaves themselves are the words.
Then iterate over all nodes and count only the preterminals:
Tree tree = ...
Map<String, Integer> counts = new HashMap<>();
for (Tree node : tree) {            // a Tree is a Collection<Tree>; iterating visits every node
    if (node.isPreTerminal()) {     // preterminal = tag node directly above a word, e.g. (NN dog)
        counts.merge(node.value(), 1, Integer::sum);
    }
}
The counting is done with a HashMap that uses the tag as the key and the tag count as the value; you will find many more examples of this on Stack Overflow.
The previous answer, while correct, iterates over all nodes in the parse tree. There is no ready-made method that returns the POS tag counts, but you can get the leaf nodes directly using methods in the edu.stanford.nlp.trees.Trees class, as follows:
(I am using Guava's Function for a little extra elegance in the code, but a simple for loop will work just as well.)
Tree tree = sentence.get(TreeAnnotation.class); // parse tree of the sentence
List<CoreLabel> labels = Trees.taggedLeafLabels(tree); // returns the labels of the leaves in a Tree, augmented with POS tags.
List<String> tags = Lists.transform(labels, getPOSTag);
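Map<String, Integer> counts = new HashMap<>();
for (String tag : new HashSet<>(tags))
    counts.put(tag, Collections.frequency(tags, tag)); // store the count of each distinct tag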
where
Function<CoreLabel, String> getPOSTag = new Function<CoreLabel, String>() {
public String apply(CoreLabel core_label) { return core_label.get(PartOfSpeechAnnotation.class); }
};
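If you save the parser's bracketed (Penn Treebank style) output to text files, the counting and CSV step can also be done outside Java. The sketch below assumes trees in that format, like the example in the question. It treats every innermost `(TAG word)` node as a preterminal and counts its tag. `count_tags` and `write_counts_csv` are hypothetical names. The regular expression assumes tokens contain no spaces or parentheses, which holds when the parser escapes brackets as `-LRB-`/`-RRB-`.

```python
import csv
import io
import re
from collections import Counter

# Matches an innermost "(TAG word)" node; group 1 is the POS tag.
PRETERMINAL = re.compile(r'\(([^\s()]+)\s+[^\s()]+\)')


def count_tags(bracketed_text):
    """Count POS tags in one or more Penn-style bracketed parse trees."""
    return Counter(PRETERMINAL.findall(bracketed_text))


def write_counts_csv(counts, out):
    """Write a header row and then one tag,count row per tag (sorted by tag) to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(['tag', 'count'])
    for tag, n in sorted(counts.items()):
        writer.writerow([tag, n])


tree = ('(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) '
        '(VP (VBZ likes) (NP (JJ eating) (NN sausage))) (. .)))')
buffer = io.StringIO()
write_counts_csv(count_tags(tree), buffer)
print(buffer.getvalue())
```

For several text files, add the per-file `Counter` objects together (`Counter` supports `+`) before writing. Pass a file opened with `newline=''` instead of the `StringIO` buffer.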

prolog: parsing a sentence and generating a response in a simple language parser

So far I have the following working:
gen_phrase(S1,S3,Cr) :- noun_phrase(S1,S2,Cr1), verb_phrase(S2,S3,Cr2),
    append([Cr1],[Cr2],Cr), add_rule(Cr).
question_phrase(S1,S5,Cr) :- ist(S1,S2), noun_phrase(S2,S3,Cr1),
    noun_phrase(S3,S4,Cr2),
    append([Cr1],[Cr2],Cr).
add_rule([X,Y]) :-
    Fact =.. [Y, X],
    assertz(Fact).
Given a test run, the code generates the following:
1 ?- gen_phrase([the,comp456,is,a,computing,course],S3,Cr).
S3 = []
Cr = [comp456, computing_course].
add_rule(Cr) asserts the fact computing_course(comp456).
Now what I would like to do is ask a question:
4 ?- question_phrase([is,the,comp456,a,computing,course],X,Cr).
Cr = [comp456, computing_course] .
What I need to do is extract computing_course and comp456, which I can do, and then convert them into a form accepted by Prolog. This should look like Y(X), where Y = computing_course is a predicate and X = comp456 is an atom. The result should be something similar to:
2 ?- computing_course(comp456).
true.
And later on for questions like "What are computing courses":
3 ?- computing_course(X).
X = comp456.
I thought about using assertz; however, I still do not know how to call the predicate once it is constructed. I am having a hard time figuring out what steps are needed to accomplish this. (Using SWI-Prolog.)
Edit: I have realized that there is a predicate call(). However, I would like to construct something like this:
ask([X,Y]) :- call(Y(X)).
2 ?- gen_phrase([a,comp456,is,a,computing,course],S3,Cr).
S3 = [],
Cr = [comp456, computing_course]
4 ?- question_phrase([is,the,comp456,a,computing,course],X,Cr),ask(Cr).
ERROR: toplevel: Undefined procedure: ask/1 (DWIM could not correct goal)
It appears that such a call() is not syntactically correct. It would be good to know whether this is possible at all, and how.
call/N is what you need (here N = 2):
ask([X,Y]) :- call(Y,X).
You could as well use something very similar to what you already use in add_rule/1:
ask([X,Y]) :- C =.. [Y,X], call(C).
The first form is more efficient, and it is also standardized.
