Matcher for a keyword and its children in spaCy (parsing)

I have a set of keywords that I am already matching. The real data is medical, so I've made up an equivalent scenario, at least for the parsing I'm trying to do:
I have a car with chrome 1000-inch rims.
Let's say I want to return, as a phrase, the keyword rims together with all of its child tokens, where rims is already marked by spaCy as a CARPART entity.
In Python, this is what I'm doing:
test_phrases = nlp("""I have a car with chrome 100-inch rims.""")
print(test_phrases.cats)
for t in test_phrases:
    print('Token: {} || POS: {} || DEP: {} CHILDREN: {} || ent_type: {}'.format(t, t.pos_, t.dep_, [c for c in t.children], t.ent_type_))
Token: I || POS: PRON || DEP: nsubj CHILDREN: [] || ent_type:
Token: have || POS: VERB || DEP: ROOT CHILDREN: [I, car, .] || ent_type:
Token: a || POS: DET || DEP: det CHILDREN: [] || ent_type:
Token: car || POS: NOUN || DEP: dobj CHILDREN: [a, with] || ent_type:
Token: with || POS: ADP || DEP: prep CHILDREN: [rims] || ent_type:
Token: chrome || POS: ADJ || DEP: amod CHILDREN: [] || ent_type:
Token: 100-inch || POS: NOUN || DEP: compound CHILDREN: [] || ent_type:
Token: rims || POS: NOUN || DEP: pobj CHILDREN: [chrome, 100-inch] || ent_type:
Token: . || POS: PUNCT || DEP: punct CHILDREN: [] || ent_type: CARPART
So what I want to do is use something like:
test_matcher = Matcher(nlp.vocab)
test_phrase = ['']
patterns = [[{'ENT':'CARPART',????}] for kp in test_phrase]
test_matcher.add('CARPHRASE', None, *patterns)
and then call test_matcher on the doc and have it return:
chrome 100-inch rims

I think I found a satisfactory solution that will work when creating a spaCy class object. You can test it to make sure it works with your setup, then add something like this to a spaCy pipeline:
import spacy
from spacy.matcher import Matcher

# Assumption: the original model isn't named; any English pipeline with a dependency parser
nlp = spacy.load("en_core_web_sm")
test_matcher = Matcher(nlp.vocab)
keyword_list = ['rims']
patterns = [[{'LOWER': kw}] for kw in keyword_list]
# List-of-patterns signature (spaCy v3); older v2 releases used add('TESTPHRASE', None, *patterns)
test_matcher.add('TESTPHRASE', patterns)

def add_children_matches(doc, keyword_matcher):
    '''Add children to match on original single-token keyword.'''
    matches = keyword_matcher(doc)
    for match_id, start, end in matches:
        tokens = doc[start:end]
        print('keyword:', tokens)
        # Since we are getting children for the keyword, there should only be one token
        if len(tokens) != 1:
            print('Skipping {}. Too many tokens to match.'.format(tokens))
            continue
        keyword_token = tokens[0]
        # Keyword index plus its direct children's indices, in document order
        sorted_children = sorted([c.i for c in keyword_token.children] + [keyword_token.i])
        print('keyphrase:', doc[min(sorted_children):max(sorted_children) + 1])

doc = nlp("""I have a car with chrome 1000-inch rims.""")
add_children_matches(doc, test_matcher)
This gives:
keyword: rims
keyphrase: chrome 1000-inch rims
Edit: to fully answer my own question, you would need something like the following, which builds a CARPHRASE span for each keyphrase:
from spacy.util import filter_spans

def add_children_matches(doc, keyword_matcher):
    '''Add children to match on original single-token keyword.'''
    matches = keyword_matcher(doc)
    spans = []
    for match_id, start, end in matches:
        tokens = doc[start:end]
        print('keyword:', tokens)
        # Since we are getting children for the keyword, there should only be one token
        if len(tokens) != 1:
            print('Skipping {}. Too many tokens to match.'.format(tokens))
            continue
        keyword_token = tokens[0]
        sorted_children = sorted([c.i for c in keyword_token.children] + [keyword_token.i])
        keyphrase = doc[min(sorted_children):max(sorted_children) + 1]
        print('keyphrase:', keyphrase)
        span = doc.char_span(keyphrase.start_char, keyphrase.end_char, label='CARPHRASE')
        if span is not None:
            spans.append(span)
    # The spans must be attached to the doc, otherwise they are lost when the function returns.
    # filter_spans drops overlaps and keeps the longest span, so the CARPHRASE span
    # replaces the single-token CARPART entity it contains.
    doc.ents = filter_spans(list(doc.ents) + spans)
    return doc
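The answer above takes only the keyword's direct children and slices from the smallest to the largest index. The sketch below runs without spaCy: it models a parse as a list of words plus absolute head indices (read off the CHILDREN column printed in the question), and the helpers children_map, subtree and keyword_phrase are hypothetical names for illustration. It contrasts the direct-children span with a full-subtree span, which also picks up grandchildren. In spaCy itself, the full-subtree span of a token is commonly taken as doc[token.left_edge.i : token.right_edge.i + 1], assuming a projective parse.

```python
def children_map(heads):
    """Map each token index to its child indices; the root's head is itself."""
    kids = {i: [] for i in range(len(heads))}
    for i, h in enumerate(heads):
        if h != i:
            kids[h].append(i)
    return kids


def subtree(i, kids):
    """Return token i plus every token below it in the tree, sorted by position."""
    out = [i]
    for c in kids[i]:
        out.extend(subtree(c, kids))
    return sorted(out)


def keyword_phrase(words, heads, keyword, whole_subtree=False):
    """Slice from the leftmost to the rightmost token of the keyword's phrase."""
    kids = children_map(heads)
    i = words.index(keyword)
    if whole_subtree:
        idx = subtree(i, kids)
    else:
        # What the answer does: the keyword plus its direct children only
        idx = sorted(kids[i] + [i])
    return " ".join(words[min(idx):max(idx) + 1])


# Example parse; heads are taken from the question's printed output
words = ["I", "have", "a", "car", "with", "chrome", "100-inch", "rims", "."]
heads = [1, 1, 3, 1, 3, 7, 7, 4, 1]

print(keyword_phrase(words, heads, "rims"))                     # chrome 100-inch rims
print(keyword_phrase(words, heads, "car"))                      # a car with
print(keyword_phrase(words, heads, "car", whole_subtree=True))  # a car with chrome 100-inch rims
```

For rims both variants agree, because its children have no children of their own; for car, the direct-children version stops at with, while the subtree version reaches the end of the noun phrase.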

Related

How to implement exercise 15.5 in PiL4?

I am working on this exercise in PiL4 (Programming in Lua, 4th edition).
Exercise 15.5:
The approach of avoiding constructors when saving tables with cycles is too radical. It is
possible to save the table in a more pleasant format using constructors for the simple case, and to use
assignments later only to fix sharing and loops. Reimplement the function save (Figure 15.3, “Saving
tables with cycles”) using this approach. Add to it all the goodies that you have implemented in the previous
exercises (indentation, record syntax, and list syntax).
I tried the code below, but it doesn't work for a nested table that has a string key.
local function basicSerialize(o)
  -- number or string
  return string.format("%q", o)
end

local function save(name, value, saved, indentation, isArray)
  indentation = indentation or 0
  saved = saved or {}
  local t = type(value)
  local space = string.rep(" ", indentation + 2)
  local space2 = string.rep(" ", indentation + 4)
  if not isArray then io.write(name, " = ") end
  if t == "number" or t == "string" or t == "boolean" or t == "nil" then
    io.write(basicSerialize(value), "\n")
  elseif t == "table" then
    if saved[value] then
      io.write(saved[value], "\n")
    else
      if #value > 0 then
        if indentation > 0 then io.write(space) end
        io.write("{\n")
      end
      local indexes = {}
      for i = 1, #value do
        if type(value[i]) ~= "table" then
          io.write(space2)
          io.write(basicSerialize(value[i]))
        else
          local fname = string.format("%s[%s]", name, i)
          save(fname, value[i], saved, indentation + 2, true)
        end
        io.write(",\n")
        indexes[i] = true
      end
      if #value > 0 then
        if indentation > 0 then io.write(space) end
        io.write("}\n")
      else
        io.write("{}\n")
      end
      saved[value] = name
      for k, v in pairs(value) do
        if not indexes[k] then
          k = basicSerialize(k)
          local fname = string.format("%s[%s]", name, k)
          save(fname, v, saved, indentation + 2)
          io.write("\n")
        end
      end
    end
  else
    error("cannot save a " .. t)
  end
end

local a = { 1, 2, 3, {"one", "Two"}, 5, {4, b = 4, 5, 6}, a = "ddd" }
local b = { k = a[4] }
local t = {}
save("a", a, t)
save("b", b, t)
print()
And I got the wrong output:
a = {
1,
2,
3,
{
"one",
"Two",
}
,
5,
{
4,
5,
6,
}
a[6]["b"] = 4
,
}
a["a"] = "ddd"
b = {}
b["k"] = a[4]
How can I make the line a[6]["b"] = 4 appear outside the table constructor?
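One way to follow the exercise's hint is to buffer every fix-up assignment and write the buffer only after the outermost constructor has been closed. The sketch below illustrates that strategy in Python (it only generates Lua source text, so the same idea can be ported to the Lua save function by passing a fix-up list down the recursion). It assumes Lua tables are modeled as Python dicts whose keys are positive integers or strings; save, build, lua_literal, key_src and saved are illustrative names, not PiL's. Simple fields such as b = 4 stay inside the constructor, using list syntax for the array part and record syntax for identifier keys; only tables that were already written (shared or cyclic) turn into later assignments.

```python
import re

LUA_KEYWORDS = {
    "and", "break", "do", "else", "elseif", "end", "false", "for", "function",
    "goto", "if", "in", "local", "nil", "not", "or", "repeat", "return", "then",
    "true", "until", "while",
}
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def lua_literal(v):
    """Serialize a scalar (None, bool, number, string) as Lua source."""
    if v is None:
        return "nil"
    if isinstance(v, bool):
        return "true" if v else "false"
    if isinstance(v, (int, float)):
        return repr(v)
    if isinstance(v, str):
        escaped = v.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return '"' + escaped + '"'
    raise TypeError("cannot save a " + type(v).__name__)


def key_src(k):
    """Lua source for a key inside brackets, e.g. 4 or "b"."""
    return str(k) if isinstance(k, int) else lua_literal(k)


def save(name, value, saved):
    """Return Lua source for `name = value`, plus deferred fix-up assignments."""
    fixups = []

    def build(path, tbl, indent):
        saved[id(tbl)] = path            # record before recursing so cycles are detected
        inner = "  " * (indent + 1)
        n = 0
        while n + 1 in tbl:              # array part: consecutive keys 1..n
            n += 1
        rest = [k for k in tbl if not (isinstance(k, int) and 1 <= k <= n)]
        lines = []
        for k in list(range(1, n + 1)) + rest:
            v = tbl[k]
            positional = isinstance(k, int) and 1 <= k <= n
            sub_path = f"{path}[{key_src(k)}]"
            if isinstance(v, dict) and id(v) in saved:
                # Shared or cyclic table: restore it after the whole statement
                fixups.append(f"{sub_path} = {saved[id(v)]}")
                if not positional:
                    continue
                text = "nil"             # placeholder keeps later positions in place
            elif isinstance(v, dict):
                text = build(sub_path, v, indent + 1)
            else:
                text = lua_literal(v)
            if positional:
                lines.append(f"{inner}{text},")
            elif isinstance(k, str) and IDENT.fullmatch(k) and k not in LUA_KEYWORDS:
                lines.append(f"{inner}{k} = {text},")
            else:
                lines.append(f"{inner}[{key_src(k)}] = {text},")
        if not lines:
            return "{}"
        return "{\n" + "\n".join(lines) + "\n" + "  " * indent + "}"

    if isinstance(value, dict) and id(value) in saved:
        head = f"{name} = {saved[id(value)]}"
    elif isinstance(value, dict):
        head = f"{name} = {build(name, value, 0)}"
    else:
        head = f"{name} = {lua_literal(value)}"
    # Fix-ups are emitted only after the outermost constructor is closed
    return "\n".join([head] + fixups)


# The question's data, with Lua tables modeled as dicts
a = {1: 1, 2: 2, 3: 3, 4: {1: "one", 2: "Two"}, 5: 5,
     6: {1: 4, "b": 4, 2: 5, 3: 6}, "a": "ddd"}
b = {"k": a[4]}
saved = {}
print(save("a", a, saved))
print(save("b", b, saved))
```

With the question's data, b = 4 stays inside the constructor of a[6], and the only assignment left outside any constructor is b["k"] = a[4], written after b's own constructor.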

Searching for a table in a table in Lua

I have a nested table structure like this:
[1] = {
[1] = {
category = "WORD",
cursor = <filtered>,
ptype = "STATEMENT",
ttype = "SERVICE",
value = "service",
<metatable> = <filtered>
},
[2] = {
category = "VARIABLE",
cursor = <filtered>,
ptype = "STATEMENT",
ttype = "IDENTIFIER",
value = "TestService",
<metatable> = <filtered>
},
[3] = {
ttype = "BRACE_BLOCK",
value = {
[1] = { ...
...
[2] = {
[1] = {
category = "WORD",
cursor = <filtered>,
ptype = "STATEMENT",
ttype = "SERVICE",
value = "service",
<metatable> = <filtered>
},
[2] = {
category = "VARIABLE",
cursor = <filtered>,
ptype = "STATEMENT",
ttype = "IDENTIFIER",
value = "HelloWorld",
<metatable> = <filtered>
},
I wrote a simple loop that looks for the first table whose ttype is SERVICE and extracts that information. I would then like to assign the remaining tokens, up to the start of the next service, to that service. My idea looks like this:
local found_service = 0
if found_service == 0 then
  for k1, v1 in pairs (parse_tree) do
    for i=1,#v1 do
      if v1[i].ttype == "SERVICE" then
        --Store wanted data in an object
        found_service = 1
      end
      if (found_service == 1 and v1[i].ttype ~= "SERVICE") then
        -- ->assign the rest to last found service
      end
      if found_service == 1 and v1[i].ttype == "SERVICE" then
        -- ->Found the next service -->starting over
        found_service = 0
      end
    end
  end
end
The problem is that at index i, v1[i] is a "SERVICE", so the loop immediately enters the last if clause as well. How do I end one loop iteration after the first if clause? Or is there a much better way to do this?
Thanks in advance.
Theo
I'm not sure I understand your general idea, but here is how to skip the rest of the loop body when the first "SERVICE" is captured:
local found_service = 0
if found_service == 0 then
  for k1, v1 in pairs (parse_tree) do
    for i=1,#v1 do
      if (found_service == 0 and v1[i].ttype == "SERVICE") then
        --Store wanted data in an object
        found_service = 1
      else
        if (found_service == 1 and v1[i].ttype ~= "SERVICE") then
          -- ->assign the rest to last found service
        end
        if found_service == 1 and v1[i].ttype == "SERVICE" then
          -- ->Found the next service -->starting over
          found_service = 0
        end
      end
    end
  end
end
But I still don't understand what should happen when the current record is not a "SERVICE" and found_service == 0. Also note that in my version, once found_service becomes 0 in the third if, the first if can be true again.
If your idea is to build some kind of vector like:
SERVICE_1 (other ttype tables until next SERVICE)
SERVICE_2 (other ttype tables until next SERVICE)
...
In that case, the code could be:
local services = {}          -- one entry per SERVICE: { service = <token>, tokens = {...} }
local current_service = nil  -- the entry currently being filled
-- ipairs visits parse_tree[1], parse_tree[2], ... in order; pairs guarantees no order
for k1, v1 in ipairs(parse_tree) do
  for i = 1, #v1 do
    if v1[i].ttype == "SERVICE" then
      -- Found a (new) service: store it and start collecting tokens for it
      current_service = { service = v1[i], tokens = {} }
      table.insert(services, current_service)
    elseif current_service then
      -- Assign the rest to the last found service (tokens before the first SERVICE are skipped)
      table.insert(current_service.tokens, v1[i])
    end
  end
end

Lua ordered table iteration

I need to iterate through a Lua table in the order in which it was created. I found this article: http://lua-users.org/wiki/SortedIteration
But it doesn't seem to work:
function __genOrderedIndex( t )
  local orderedIndex = {}
  for key in pairs(t) do
    table.insert( orderedIndex, key )
  end
  table.sort( orderedIndex )
  return orderedIndex
end

function orderedNext(t, state)
  -- Equivalent of the next function, but returns the keys in the alphabetic
  -- order. We use a temporary ordered key table that is stored in the
  -- table being iterated.
  key = nil
  --print("orderedNext: state = "..tostring(state) )
  if state == nil then
    -- the first time, generate the index
    t.__orderedIndex = __genOrderedIndex( t )
    key = t.__orderedIndex[1]
  else
    -- fetch the next value
    for i = 1,table.getn(t.__orderedIndex) do
      if t.__orderedIndex[i] == state then
        key = t.__orderedIndex[i+1]
      end
    end
  end
  if key then
    return key, t[key]
  end
  -- no more value to return, cleanup
  t.__orderedIndex = nil
  return
end

function orderedPairs(t)
  return orderedNext, t, nil
end
Here is the usage example:
t = {
  ['a'] = 'xxx',
  ['b'] = 'xxx',
  ['c'] = 'xxx',
  ['d'] = 'xxx',
  ['e'] = 'xxx',
}
for key, val in orderedPairs(t) do
  print(key.." : "..val)
end
I'm getting an error:
attempt to call field 'getn' (a nil value)
What is the problem?
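table.getn was deprecated in Lua 5.1 and removed in Lua 5.2; it is replaced by the # length operator.
Change table.getn(t.__orderedIndex) to #t.__orderedIndex.
Note that orderedPairs visits keys in sorted order, not in insertion order: Lua tables do not remember the order in which keys were added, so iterating in creation order requires keeping a separate array of keys as you insert them.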

Check if a Lua table member exists at any level

I need to check whether a member exists in a table, not at the next level but further along a path of members.
foo = {}
if foo.bar.joe then
  print(foo.bar.joe)
end
This raises "attempt to index field 'bar' (a nil value)" because bar isn't defined.
My usual solution is to test the chain, piece by piece.
foo = {}
if foo.bar and foo.bar.joe then
  print(foo.bar.joe)
end
But this can be very tedious when there are many nested tables. Is there a better way to do this test than piece by piece?
I don't understand what you mean by "along a path of members". From the example, I assume you are trying to find a value in a "subtable"?
local function search(master, target) --target is a string
  for k,v in next, master do
    if type(v)=="table" and v[target] then return true end
  end
  return false
end
A simple example. If you use such a function, you can pass the foo table and the joe string to see if foo.*.joe exists. Hope this helps.
debug.setmetatable(nil, {__index = {}})
foo = {}
print(foo.bar.baz.quux)
print(({}).prd.krt.skrz.drn.zprv.zhlt.hrst.zrn) -- sorry ))
To search for an element that is at any level of a table, I would use a method such as this one:
function exists(tab, element)
  for _, v in pairs(tab) do
    if v == element then
      return true
    elseif type(v) == "table" and exists(v, element) then
      -- Stop early only if the subtable really contains the element;
      -- otherwise keep scanning the remaining entries of tab
      return true
    end
  end
  return false
end
testTable = {{"Carrot", {"Mushroom", "Lettuce"}, "Mayonnaise"}, "Cinnamon"}
print(exists(testTable, "Mushroom")) -- true
print(exists(testTable, "Apple")) -- false
print(exists(testTable, "Cinnamon")) -- true
I think you're looking for something along these lines:
local function get(Obj, Field, ...)
  if Obj == nil or Field == nil then
    return Obj
  else
    return get(Obj[Field], ...)
  end
end
local foo = {x = {y = 7}}
assert(get() == nil)
assert(get(foo) == foo)
assert(get(foo, "x") == foo.x)
assert(get(foo, "x", "y") == 7)
assert(get(foo, "x", "z") == nil)
assert(get(foo, "bar", "joe") == nil)
assert((get(foo, "x", "y") or 41) == 7)
assert((get(foo, "bar", "joe") or 41) == 41)
local Path = {foo, "x", "y"}
assert(get(table.unpack(Path)) == 7)
get simply traverses the given path until a nil is encountered. Seems to do the job. Feel free to think up a better name than "get" though.
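As usual, exercise care when combining with or: or binds more loosely than ==, so parenthesize (get(...) or default), and a stored false is replaced by the default just like nil.
I'm impressed by Egor's clever answer, but in general I think we ought not to rely on such hacks.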
See also
The 'Safe Table Navigation' patch for Lua 5.2 : http://lua-users.org/wiki/LuaPowerPatches
Lengthy discussion on this matter : http://lua-users.org/lists/lua-l/2010-08/threads.html#00519
Related technique : http://lua-users.org/wiki/AutomagicTables
I suspect something relevant has been implemented in MetaLua, but I can't find it at the moment.
If I understood your problem correctly, here's one possibility:
function isField(s)
  local t
  for key in s:gmatch('[^.]+') do
    if t == nil then
      if _ENV[ key ] == nil then return false end
      t = _ENV[ key ]
    else
      if t[ key ] == nil then return false end
      t = t[ key ]
    end
    --print(key) --for DEBUGGING
  end
  return true
end
-- To test
t = {}
t.a = {}
t.a.b = {}
t.a.b.c = 'Found me'
if isField('t.a.b.c') then print(t.a.b.c) else print 'NOT FOUND' end
if isField('t.a.b.c.d') then print(t.a.b.c.d) else print 'NOT FOUND' end
UPDATE: As per cauterite's suggestion, here's a version that also works with locals but has to take two arguments :(
function isField(t,s)
  if t == nil then return false end
  local t = t
  for key in s:gmatch('[^.]+') do
    if t[ key ] == nil then return false end
    t = t[ key ]
  end
  return true
end
-- To test
local t = {}
t.a = {}
t.a.b = {}
t.a.b.c = 'Found me'
if isField(t,'a.b.c') then print(t.a.b.c) else print 'NOT FOUND' end
if isField(t,'a.b.c.d') then print(t.a.b.c.d) else print 'NOT FOUND' end
foo = {}
foo.boo = {}
foo.boo.joe = {}
foo.boo.joe is the same as foo['boo']['joe'], so I wrote the following function:
function exist(t)
  local words = {}
  local command
  -- split the dotted path into its words
  for word in string.gmatch(t, '%w+') do words[#words + 1] = word end
  command = string.format('a = %s', words[1])
  loadstring(command)()
  if a == nil then return false end
  for count = 2, #words do
    a = a[words[count]]
    if a == nil then return false end
  end
  a = nil
  return true
end
foo = {}
foo.boo = {}
foo.boo.joe = {}
print(exist('foo.boo.joe.b.a'))
This uses loadstring to create a temporary global variable, a. My Lua version is 5.1; loadstring is deprecated since Lua 5.2, so use load instead on 5.2 and 5.3.

"Error: attempt to index local 'self' (a nil value)" in string.split function

Quick facts: I got this function from the very bottom of http://lua-users.org/wiki/SplitJoin, and I am attempting to use it in the Corona SDK, though I doubt that's important.
function string:split(sSeparator, nMax, bRegexp)
  assert(sSeparator ~= '')
  assert(nMax == nil or nMax >= 1)
  local aRecord = {}
  if self:len() > 0 then
    local bPlain = not bRegexp
    nMax = nMax or -1
    local nField, nStart = 1, 1
    local nFirst, nLast = self:find(sSeparator, nStart, bPlain)
    while nFirst and nMax ~= 0 do
      aRecord[nField] = self:sub(nStart, nFirst-1)
      nField = nField+1
      nStart = nLast+1
      nFirst, nLast = self:find(sSeparator, nStart, bPlain)
      nMax = nMax-1
    end
    aRecord[nField] = self:sub(nStart)
  end
  return aRecord
end
The input: "1316982303 Searching server"
msglist = string.split(msg, ' ')
This gives me the error in the title. Any ideas? I'm fairly certain the function is just out of date.
Edit: lots more code
Here's some more from the main.lua file:
multiplayer = pubnub.new({
  publish_key = "demo",
  subscribe_key = "demo",
  secret_key = nil,
  ssl = nil, -- ENABLE SSL?
  origin = "pubsub.pubnub.com" -- PUBNUB CLOUD ORIGIN
})
multiplayer:subscribe({
  channel = "MBPocketChange",
  callback = function(msg)
    -- MESSAGE RECEIVED!!!
    print (msg)
    msglist = string.split(msg, ' ')
    local recipient = msglist[0] --Get the value
    table.remove(msglist, 0) --Remove the value from the table.
    local cmdarg = msglist[0]
    table.remove(msglist, 0)
    arglist = string.split(cmdarg, ',')
    local command = arglist[0]
    table.remove(arglist, 0)
    argCount = 1
    while #arglist > 0 do
      argname = "arg" .. argCount
      _G[argname] = arglist[0]
      table.remove(arglist, 0)
      argCount = argCount + 1
    end
Server.py:
This is the multiplayer server that sends the necessary info to clients.
import sys
import tornado
import os
from Pubnub import Pubnub

## Initiate Class
pubnub = Pubnub( 'demo', 'demo', None, False )

## Subscribe Example
def receive(message) :
    test = str(message)
    msglist = test.split()
    recipient = msglist.pop(0)
    msg = msglist.pop(0)
    id = msglist.pop(0)
    if id != "server":
        print id
        print msg
        commandHandler(msg,id)
    return True

def commandHandler(cmd,id):
    global needOp
    needOp = False
    global matchListing
    if server is True:
        cmdArgList = cmd.split(',')
        cmd = cmdArgList.pop(0)
        while len(cmdArgList) > 0:
            argument = 1
            locals()["arg" + str(argument)] = cmdArgList.pop(0)
            argument += 1
        if cmd == "Seeking":
            if needOp != False and needOp != id:
                needOp = str(needOp)
                id = str(id)
                pubnub.publish({
                    'channel' : 'MBPocketChange',
                    #Message order is, and should remain:
                    #----------Recipient, Command,Arguments, Sender
                    'message' : needOp + " FoundOp," + id + " server"
                })
                print ("Attempting to match " + id + " with " + needOp + ".")
                needOp = False
                matchListing[needOp] = id
            else:
                needOp = id
                pubnub.publish({
                    'channel' : 'MBPocketChange',
                    #Message order is, and should remain:
                    #----------Recipient, Command,Arguments, Sender
                    'message' : id + ' Searching server'
                })
                print "Finding a match for: " + id
        elif cmd == "Confirm":
            if matchListing[id] == arg1:
                pubnub.publish({
                    'channel' : 'MBPocketChange',
                    #Message order is, and should remain:
                    #----------Recipient, Command,Arguments, Sender
                    'message' : arg1 + ' FoundCOp,' + id + ' server'
                })
                matchListing[arg1] = id
            else:
                pass #Cheater.
        elif cmd == "SConfirm":
            if matchListing[id] == arg1 and matchListing[arg1] == id:
                os.system('python server.py MBPocketChange' + arg1)
                #Here, the argument tells both players what room to join.
                #The room is created from the first player's ID.
                pubnub.publish({
                    'channel' : 'MBPocketChange',
                    #Message order is, and should remain:
                    #----------Recipient, Command,Arguments, Sender
                    'message' : id + ' GameStart,' + arg1 + ' server'
                })
                pubnub.publish({
                    'channel' : 'MBPocketChange',
                    #Message order is, and should remain:
                    #----------Recipient, Command,Arguments, Sender
                    'message' : arg1 + ' GameStart,' + arg1 + ' server'
                })
            else:
                pass #hax
    else:
        pass

def connected():
    pass

try:
    channel = sys.argv[1]
    server = False
    print("Listening for messages on '%s' channel..." % channel)
    pubnub.subscribe({
        'channel' : channel,
        'connect' : connected,
        'callback' : receive
    })
except:
    channel = "MBPocketChange"
    server = True
    print("Listening for messages on '%s' channel..." % channel)
    pubnub.subscribe({
        'channel' : channel,
        'connect' : connected,
        'callback' : receive
    })
tornado.ioloop.IOLoop.instance().start()
This error message happens if you run:
string.split(nil, ' ')
Double check your inputs to be sure you are really passing in a string.
Edit: in particular, msglist[0] is not the first position of the array; Lua arrays start at 1, so use msglist[1] (and table.remove(msglist, 1)).
As an aside, this function was written with the intention that you'd use the colon syntactic sugar, e.g.
msglist=msg:split(' ')
