Terms for basic NLP/logical parsing example - parsing

Given the following clause:
Female or not White
Is the following tree the correct representation of this?
     OR
    /  \
female  NOT white
That is, would "not white" be one unit, or is it considered two?
Additionally, what are the following four elements usually called in parsing:
OR -- (logical?)
female -- (variable name?)
NOT -- (inversion? or is this also logical?)
TRUE -- (for example, whether the value of female is true or not -- variable value?)

Try this code:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Female or not White")
spacy.displacy.render(doc, style='dep')
Output:
So in your case, "not" will be treated as an inversion (negation).
Alternatively, for sentence parsing, see the question "How to get parse tree using Python NLTK?".
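For the logical reading of the clause (as opposed to a dependency parse), here is a minimal sketch of an expression tree, assuming the usual precedence in which NOT binds tighter than OR. Under that assumption "not White" is a subtree, not a single leaf: OR is a binary operator, NOT is a unary operator, female and white are variables (operands), and TRUE/FALSE are the values they take. The class names Var, Not, Or and the function evaluate are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Var:
    """A variable (operand), e.g. female or white."""
    name: str


@dataclass
class Not:
    """Unary logical operator: negates its single operand."""
    operand: object


@dataclass
class Or:
    """Binary logical operator with a left and a right operand."""
    left: object
    right: object


def evaluate(node, env):
    """Evaluate the tree given a mapping from variable names to booleans."""
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Not):
        return not evaluate(node.operand, env)
    if isinstance(node, Or):
        return evaluate(node.left, env) or evaluate(node.right, env)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# "Female or not White": NOT is its own node with "white" as its child.
tree = Or(Var("female"), Not(Var("white")))
```

Calling evaluate(tree, {"female": False, "white": True}) is the only assignment of the two variables for which the clause is false.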

Related

What Lua pattern behaves like a regex negative lookahead?

My problem is that I need to write Lua code to interpret a text file and match lines against a pattern, like
if line_str:match(myPattern) then myAction(arg) end
Let's say I want a pattern to match lines containing "hello" in any context except one containing "hello world". I found that in regex, what I want is called negative lookahead, and you would write it like
.*hello (?!world).*
but I'm struggling to find the Lua version of this.
Let's say I want a pattern to match lines containing "hello" in any context except one containing "hello world".
As Wiktor has correctly pointed out, the simplest way to write this would be line:find"hello" and not line:find"hello world" (you can use both find and match here, but find is probably more performant; you can also turn off pattern matching for find).
I found that in regex, what I want is called negative lookahead, and
you would write it like .*hello (?!world).*
That's incorrect. If you checked against the existence of such a match, all it would tell you would be that there exists a "hello" which is not followed by a "world". The string hello hello world would match this, despite containing "hello world".
Negative lookahead is a questionable feature anyway, as it isn't trivially provided by actual regular expressions and thus may not be implemented in linear time.
If you really need it, look into LPeg; there, negative lookahead is the not-predicate -patt, and pattern1 - pattern2 matches pattern1 only where pattern2 does not match.
Finally, the RegEx may be translated to "just Lua" simply by searching for (1) the pattern without the negative part (2) the pattern with the negative part and checking whether there is a match in (1) that is not in (2) simply by counting:
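-- count matches of the pattern without the negative part ("hello ")
local hello_count = 0; for _ in line:gmatch"hello " do hello_count = hello_count + 1 end
-- count matches of the pattern with the negative part ("hello world")
local helloworld_count = 0; for _ in line:gmatch"hello world" do helloworld_count = helloworld_count + 1 end
if hello_count > helloworld_count then
  -- there is a "hello " not followed by a "world"
end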
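For comparison, the two different checks discussed in this answer can be sketched in Python. This is only an illustration in another language, not a Lua solution, and both function names are hypothetical. The first function is the simple "contains hello but never hello world" test; the second is the counting translation of the regex, which only asks whether some "hello " is not followed by "world".

```python
def contains_hello_but_not_hello_world(line: str) -> bool:
    """True if the line contains "hello" and nowhere contains "hello world"."""
    return "hello" in line and "hello world" not in line


def has_hello_not_followed_by_world(line: str) -> bool:
    """Counting version of `hello (?!world)`.

    True if at least one "hello " is not followed by "world".
    str.count counts non-overlapping occurrences, which is enough here
    because neither substring can overlap with itself.
    """
    return line.count("hello ") > line.count("hello world")
```

The line "hello hello world" shows the difference: the first check rejects it, the second accepts it.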

Randomize math coefficients

I need a way to randomise an input math equation (mainly their coefficients).
So for example "x+y-ze^5" could be randomized into "x-y+2ze^4" or "2x+y-3z*e^4".
I am doing this in Rails at the moment, and one of the main issues I have is that I can only store equations as strings rather than as math objects. How do I do this? Are there any gems or APIs I can use? I would also need to use this with LaTeX input equations. I have used LaTeX APIs, but I have only found ones that can display equations, not ones that turn strings into usable and modifiable math equations.
For example if I input "x+y+z", it should get randomized into "x-y+2ze^4".
Similarly if I give it "x'+sin(x/2)-Integral(xdx)", it could get randomized into "2x'-sin(x/4)-Integral(2xdx)". The idea here is that the function can take any equation I give it and randomise its coefficients.
Not pretty but should be close to what you want
def random_coef
  op1 = ['+','-'].sample
  op2 = op1 == '+' ? '-' : '+'
  "#{[1,2].sample}x #{op1} y #{op2} #{[1,2,3].sample}ze ^ #{[4,5].sample}".gsub(' ','')
end
10.times { puts random_coef }
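The snippet above is tied to one fixed equation. A slightly more general string-based idea can be sketched as follows, shown in Python purely for illustration; the function name randomize_coefficients and the max_coef parameter are hypothetical. It works on the text only, with no math parsing, under these assumptions: every integer literal is treated as a coefficient (so exponents get randomized too), every + or - is treated as a binary operator that may be flipped, and implicit coefficients of 1 are left alone.

```python
import random
import re


def randomize_coefficients(expr, rng=None, max_coef=5):
    """Return expr with integer literals and +/- signs randomized.

    Purely textual: the structure of the expression is preserved, only
    the numbers and the signs change.
    """
    rng = rng or random.Random()
    # Replace each run of digits with a random integer in 1..max_coef.
    out = re.sub(r"\d+", lambda m: str(rng.randint(1, max_coef)), expr)
    # Replace each + or - with a randomly chosen sign.
    out = re.sub(r"[+-]", lambda m: rng.choice("+-"), out)
    return out
```

For example, randomize_coefficients("2x+3y-4z^5") returns a string of the same shape with different numbers and signs. A real solution for arbitrary input would likely need a proper expression parser instead of regular expressions.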

Writing a text file containing LaTeX code from maxima expressions

Suppose in a (wx)Maxima session I have the following
f:sin(x);
df:diff(f,x);
Now I want to have it output a text file containing something like, for example
If $f(x)=\sin(x)$, then $f^\prime(x)=\cos(x)$.
I found the tex and tex1 functions but I think I need some additional string processing to be able to do what I want.
Any help appreciated.
EDIT: Further clarifications.
Auto Multiple Choice is software that helps you create and manage questionnaires. To declare questions, one may use LaTeX syntax. From AMC's documentation, a question looks like this:
\element{geographie}{
\begin{question}{Cameroon}
Which is the capital city of Cameroon?
\begin{choices}
\correctchoice{Yaoundé}
\wrongchoice{Douala}
\wrongchoice{Abou-Dabi}
\end{choices}
\end{question}
}
As can be seen, it is just LaTeX. Now, with a little modification, I can turn this example into a math question
\element{derivatives}{
\begin{question}{trig_fun_diff_1}
If $f(x)=\sin(x)$ then $f^\prime(0)$ is
\begin{choices}
\correctchoice{$1$}
\wrongchoice{$-1$}
\wrongchoice{$0$}
\end{choices}
\end{question}
}
This is the sort of output I want. I'll have, say, a list of functions then execute a loop calculating their derivatives and so on.
OK, in response to your updated question. My advice is to work with questions and answers as expressions -- build up your list of questions first, and then when you have the list in the structure that you want, then output the TeX file as the last step. It is generally much clearer and simpler to work with expressions than with strings.
E.g. Here is a simplistic approach. I'll use defstruct to define a structure so that I can refer to its parts by name.
defstruct (question (name, datum, item, correct, incorrect));
myq1 : new (question);
myq1@name : "trig_fun_diff_1";
myq1@datum : f(x) = sin(x);
myq1@item : 'at ('diff (f(x), x), x = 0);
myq1@correct : 1;
myq1@incorrect : [0, -1];
You can also write
myq1 : question ("trig_fun_diff_1", f(x) = sin(x),
'at ('diff (f(x), x), x = 0), 1, [0, -1]);
I don't know which form is more convenient for you.
Then you can make an output function similar to this:
tex_question (q, output_stream) :=
  (printf (output_stream, "\\begin{question}{~a}~%", q@name),
   printf (output_stream, "If $~a$, then $~a$ is:~%", tex1 (q@datum), tex1 (q@item)),
   printf (output_stream, "\\begin{choices}~%"),
   /* make a list comprising correct and incorrect here */
   /* shuffle the list (see random_permutation) */
   /* output each correct or incorrect here */
   printf (output_stream, "\\end{choices}~%"),
   printf (output_stream, "\\end{question}~%"));
where output_stream is an output stream as returned by openw (which see).
It may take a little bit of trying different stuff to get derivatives to be output in just the format you want. My advice is to put the logic for that into the output function.
A side effect of working with expressions is that it is straightforward to output some representations other than TeX (e.g. plain text, XML, HTML). That might or might not become important for your project.
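The same "keep the question as structured data, render as the last step" idea can be sketched outside Maxima. The following Python version is only an illustration of that design, not part of the Maxima answer: the Question class and amc_question function are hypothetical names, and the fields hold ready-made LaTeX strings, whereas in Maxima they would be expressions passed through tex1.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Question:
    name: str
    datum: str    # LaTeX for the given fact, e.g. f(x)=\sin(x)
    item: str     # LaTeX for the quantity asked about
    correct: str  # LaTeX for the correct answer
    incorrect: list = field(default_factory=list)


def amc_question(q, rng=None):
    """Render one question as AMC-style LaTeX, shuffling the choices."""
    rng = rng or random.Random()
    choices = [("correctchoice", q.correct)]
    choices += [("wrongchoice", w) for w in q.incorrect]
    rng.shuffle(choices)
    lines = [
        "\\begin{question}{%s}" % q.name,
        "If $%s$, then $%s$ is:" % (q.datum, q.item),
        "\\begin{choices}",
    ]
    lines += ["\\%s{$%s$}" % (kind, value) for kind, value in choices]
    lines += ["\\end{choices}", "\\end{question}"]
    return "\n".join(lines)
```

Because the rendering is isolated in one function, producing another output format later only means writing a second renderer over the same Question objects.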
Well, tex is the TeX output function. It can be customized to some extent via texput (which see).
As to post-processing via string manipulation, I don't recommend it. However, if you want to go down that road, there are regex functions which you can access via load(sregex). Unfortunately it's not yet documented; see the comment header of sregex.lisp (somewhere in your Maxima installation) for examples.

Extracting text from APA citation

I have a spreadsheet containing APA citation style text and I want to split them into author(s), date, and title.
An example of a citation would be:
Parikka, J. (2010). Insect Media: An Archaeology of Animals and Technology. Minneapolis: Univ Of Minnesota Press.
Given this string is in field I2 I managed to do the following:
Name: =LEFT(I2, FIND("(", I2)-1) yields Parikka, J.
Date: =MID(I2,FIND("(",I2)+1,FIND(")",I2)-FIND("(",I2)-1) yields 2010
However, I am stuck at extracting the name of the title Insect Media: An Archaeology of Animals and Technology.
My current formula =MID(I2,FIND(").",I2)+2,FIND(").",I2)-FIND(".",I2)) only returns part of the title. The output should show every character between ). and the following . (period).
I tried =REGEXEXTRACT(I2, "\)\.\s(.*[^\.])\.\s" ) and this generally works but does not stop at the first ". " - Like with this example:
Sanders, E. B.-N., Brandt, E., & Binder, T. (2010). A framework for organizing the tools and techniques of participatory design. In Proceedings of the 11th biennial participatory design conference (pp. 195–198). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=1900476
Where is the mistake?
The title can be found (in the two examples you've given, at least) with this:
=MID(I2,find("). ",I2)+3,find(". ",I2,find("). ",I2)+3)-(find("). ",I2)+3)+1)
In English: Get the substring starting after the first occurrence of )., up to and including the first occurrence of . following.
If you wish to use REGEXEXTRACT, then this works (on your two examples). (You can also see a Regex101 demo.):
=REGEXEXTRACT(I3,"(?:.*\(\d{4}\)\.\s)([^.]*\.)(?: .*)")
Where is the mistake?
In your expression, you were capturing (.*[^\.]), which greedily matches any number of characters followed by one character that is not a dot, so multiple sentences can be captured. The expression finished with \.\s, which wasn't captured, so the capture group would end just before a period-then-space rather than including it.
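To see why stopping at the first period matters, the same extraction can be sketched in Python with a negated character class in place of a greedy .*. This is an illustration outside the spreadsheet; the pattern APA and the function split_citation are hypothetical names, and the sketch assumes a four-digit year in parentheses and a title that contains no period of its own (a title with an abbreviation such as "U.S." would be cut short).

```python
import re

# authors: as little as possible up to the "(year)." marker
# title:   everything up to and including the first period after it
APA = re.compile(
    r"^(?P<authors>.*?)\s*\((?P<year>\d{4})\)\.\s+(?P<title>[^.]*\.)"
)


def split_citation(citation):
    """Return a dict with authors, year, and title, or None if no match."""
    m = APA.match(citation)
    return m.groupdict() if m else None
```

Since [^.]* cannot cross a period, the title group ends at the first "." after the year, which is the behavior the greedy version was missing.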
Try:
=split(SUBSTITUTE(SUBSTITUTE(I2, "(",""), ")", ""),".")
If you don't replace the parentheses around 2010, it thinks it is a negative number -2010.
For your Title try adding index split to your existing formula:
=index(split(REGEXEXTRACT(A5, "\)\.\s(.*[^\.])\.\s" ),"."),0,1)&"."

Poker hand range parser ... how do I write the grammar?

I'd like to build a poker hand-range parser, whereby I can provide a string such as the following (assume a standard 52-card deck, ranks 2-A, s = suited, o = offsuit):
"22+,A2s+,AKo-ATo,7d6d"
The parser should be able to produce the following combinations:
6 combinations for each of 22, 33, 44, 55, 66, 77, 88, 99, TT, JJ, KK, QQ, AA
4 combinations for each of A2s, A3s, A4s, A5s, A6s, A7s, A8s, A9s, ATs, AJs, AQs, AKs
12 combinations for each of ATo, AJo, AQo, AKo
1 combination of 7(diamonds)6(diamonds)
I think I know parts of the grammar, but not all of it:
NM+ --> NM, N[M+1], ... ,N[N-1]
NN+ --> NN, [N+1][N+1], ... ,TT where T is the top rank of the deck (e.g. Ace)
NP - NM --> NM, N[M+1], ... ,NP
MM - NN --> NN, [N+1][N+1], ..., MM
I don't know the expression for the grammar for dealing with suitedness.
I'm a programming newbie, so forgive this basic question: is this a grammar induction problem or a parsing problem?
Thanks,
Mike
Well you should probably look at EBNF to show your grammar in a widely accepted manner.
I think it would look something like this:
S = Combination { ',' Combination } .
Combination = Hand ['+' | '-' Hand] .
Hand = Card Card ["s" | "o"] .
Card = rank [ color ] .
Where {} means 0 or more occurrences, [] means 0 or 1 occurrence, and | means either what's to the left of | or what's to the right of it.
So basically what this comes down to is a start symbol (S) that says that the parser has to handle from 1 to any number of combinations that are all separated by a ",".
These combinations consist of a description of a card and then either a "+", a "-" and another card description or nothing.
A card description consists of rank and optionally a color (spades, hearts, etc.). The fact that rank and color aren't capitalized shows that they can't be further divided into subparts (making them a terminal class).
My example doesn't provide the offsuit/suited possibility, mainly because in your examples the o/s comes once at the very end ("AK-ATo") and once in the middle ("A2s+").
Are these examples your own creation or are they given to you from an external source (read: you can't change them)?
If you can change them I would strongly recommend placing those at one specified position of a combination (for example at the end) to make creating the grammar and ultimately the parsing a lot easier.
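Once the notation is fixed, the expansion rules from the question can be sketched as a small hand-written parser. The following Python version is one possible reading, not a definitive implementation, and it makes these assumptions: the s/o marker always directly follows the two ranks (as in "A2s+" and "AKo-ATo"), both ends of a dash range carry the same marker, and a four-character term such as "7d6d" names two specific cards. All function and constant names are hypothetical.

```python
import re

RANKS = "23456789TJQKA"
HAND = re.compile(r"^([2-9TJQKA])([2-9TJQKA])([so]?)$")
SPECIFIC = re.compile(r"^[2-9TJQKA][cdhs][2-9TJQKA][cdhs]$")


def _parse_hand(text):
    """Return (high rank index, low rank index, 's' / 'o' / '')."""
    m = HAND.match(text)
    if not m:
        raise ValueError(f"bad hand: {text!r}")
    a, b, kind = RANKS.index(m.group(1)), RANKS.index(m.group(2)), m.group(3)
    return max(a, b), min(a, b), kind


def expand_term(term):
    """Expand one comma-separated term into a list of hand classes."""
    if SPECIFIC.match(term):
        return [term]
    if term.endswith("+"):
        hi, lo, kind = _parse_hand(term[:-1])
        if hi == lo:  # NN+ -> NN, ..., AA
            return [RANKS[r] * 2 for r in range(hi, len(RANKS))]
        # NM+ -> NM, N[M+1], ..., N[N-1]
        return [RANKS[hi] + RANKS[r] + kind for r in range(lo, hi)]
    if "-" in term:
        first, second = term.split("-")
        hi1, lo1, kind1 = _parse_hand(first)
        hi2, lo2, kind2 = _parse_hand(second)
        if kind1 != kind2:
            raise ValueError(f"mixed suitedness in range: {term!r}")
        if hi1 == lo1 and hi2 == lo2:  # MM-NN -> NN, ..., MM
            start, stop = sorted((hi1, hi2))
            return [RANKS[r] * 2 for r in range(start, stop + 1)]
        if hi1 != hi2 or hi1 == lo1 or hi2 == lo2:
            raise ValueError(f"unsupported range: {term!r}")
        start, stop = sorted((lo1, lo2))  # NP-NM -> NM, ..., NP
        return [RANKS[hi1] + RANKS[r] + kind1 for r in range(start, stop + 1)]
    hi, lo, kind = _parse_hand(term)
    return [RANKS[hi] + RANKS[lo] + kind]


def combos(hand):
    """Number of card combinations for one hand class."""
    if SPECIFIC.match(hand):
        return 1
    if hand[0] == hand[1]:
        return 6
    return {"s": 4, "o": 12, "": 16}[hand[2:]]


def parse_range(text):
    """Expand a full range string such as "22+,A2s+,AKo-ATo,7d6d"."""
    return [h for term in text.split(",") for h in expand_term(term.strip())]
```

With the example string from the question, parse_range returns the 13 pairs, the 12 suited aces, the 4 offsuit aces, and the single specific hand; combos then gives 6, 4, 12, and 1 combinations per entry respectively.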

Resources