How to use the marginal and probability methods of pycrfsuite.Tagger() - named-entity-recognition

The documentation is not helpful to me at all.
First, I tried using set(), but I don't understand what the documentation means by
set an instance for future calls
I could successfully feed my data to the tagger using my dataset's structure, described below,
so I am not sure why I need to use set() for that, as the documentation says.
Here is my feature sequence, built from a scipy.sparse matrix after calling its nonzero() method:
[['66=1', '240=1', '286=1', '347=10', '348=1'],...]
where ... means more elements with the same structure as the previous ones.
The second problem I encountered is with Tagger.probability() and Tagger.marginal().
For Tagger.probability(), I used the same input as for Tagger.tag(), and I get the following error.
And if my input is just a list instead of a list of lists, I get the following error:
Traceback (most recent call last):
File "cliner", line 60, in <module>
main()
File "cliner", line 49, in main
train.main()
File "C:\Users\Anak\PycharmProjects\CliNER\code\train.py", line 157, in main
train(training_list, args.model, args.format, args.use_lstm, logfile=args.log, val=val_list, test=test_list)
File "C:\Users\Anak\PycharmProjects\CliNER\code\train.py", line 189, in train
model.train(train_docs, val=val_docs, test=test_docs)
File "C:\Users\Anak\PycharmProjects\CliNER\code\model.py", line 200, in train
test_sents=test_sents, test_labels=test_labels)
File "C:\Users\Anak\PycharmProjects\CliNER\code\model.py", line 231, in train_fit
dev_split=dev_split )
File "C:\Users\Anak\PycharmProjects\CliNER\code\model.py", line 653, in generic_train
test_X=test_X, test_Y=test_Y)
File "C:\Users\Anak\PycharmProjects\CliNER\code\machine_learning\crf.py", line 220, in train
train_pred = predict(model, X) # ANAK
File "C:\Users\Anak\PycharmProjects\CliNER\code\machine_learning\crf.py", line 291, in predict
print(tagger.probability(xseq[0]))
File "pycrfsuite/_pycrfsuite.pyx", line 650, in pycrfsuite._pycrfsuite.Tagger.probability
ValueError: The numbers of items and labels differ: |x| = 12, |y| = 73
For Tagger.marginal(), I can only produce an error similar to the first error shown for Tagger.probability().
Any clue on how to use these three methods? Please give me short examples of use cases for them.
I feel like there must be some examples of these three methods, but I couldn't find any. Am I looking in the right place? This is the website I am reading the documentation from:
https://python-crfsuite.readthedocs.io/en/latest/pycrfsuite.html
Additional info: I am using CliNER, in case any of you are familiar with it.

I know this question is over a year old, but I just had to figure out the same thing, and I am also using part of the CliNER framework. For the CliNER-specific solution, I forked the repo and rewrote the predict method in the ./code/machine_learning/crf.py file.
To obtain the marginal probabilities, add the following line to the for loop that iterates over the pycrf_instances, after yseq is created (see line 196 here):
y_probs = [tagger.marginal(y, ii) for ii, y in enumerate(yseq)]
You can then return that list of marginal probabilities from the predict method; you will in turn need to rewrite additional functions to accommodate this change.
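As a rough sketch of how the three methods fit together, going by the python-crfsuite documentation: set() stores one item sequence as the "current" sequence, tag() with no argument labels that sequence, probability() takes a label sequence (one label per token, not features), and marginal() takes a single label and a position. Passing features to probability() would explain the ValueError above, which compares |x| with |y|. The helper name tag_with_scores and the model path in the comments are made up for illustration.

```python
def tag_with_scores(tagger, xseq):
    """Return (labels, sequence probability, per-token marginals) for one sentence.

    `tagger` is assumed to behave like an opened pycrfsuite.Tagger, and
    `xseq` is one sentence: a list with one feature list per token.
    """
    tagger.set(xseq)        # make xseq the current sequence for later calls
    yseq = tagger.tag()     # no argument: label the current sequence
    # probability() expects LABELS, and len(yseq) must equal len(xseq)
    seq_prob = tagger.probability(yseq)
    # marginal(y, t): probability of label y at position t
    marginals = [tagger.marginal(y, t) for t, y in enumerate(yseq)]
    return yseq, seq_prob, marginals


# Possible use with a real model (the path is hypothetical):
#   import pycrfsuite
#   tagger = pycrfsuite.Tagger()
#   tagger.open("model.crfsuite")
#   labels, p, m = tag_with_scores(tagger, [["66=1", "240=1"], ["286=1"]])
```

Calling tag(xseq) directly also sets the current sequence, so set() is mainly useful when you want probabilities without tagging first.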

Related

Experiment does not run due to Error Messages

I have started programming an Arithmetic Strategy Use task in PsychoPy. The idea is to have a total of 80 arithmetic problems, which would essentially end up being 4 conditions; single addition (20 problems), single subtraction (20 problems), double addition (20 problems), double subtraction (20 problems).
What I have done so far:
I created 4 Excel sheets, one per condition, each with 20 arithmetic problems
I inserted a routine called Trial and inserted 4 loops with Single Subtraction, Single Addition, Double Subtraction and Double addition.
I included a strategy report question after each trial
I tried to run the experiment, however, several error messages keep popping up and I am not sure how to troubleshoot them! Please find the error messages below:
Traceback (most recent call last):
File “/Users/nina/Desktop/PsychoPy.app/Contents/Resources/lib/python3.8/psychopy/app/builder/builder.py”, line 1419, in onPavloviaRun
File “/Users/nina/Desktop/PsychoPy.app/Contents/Resources/lib/python3.8/psychopy/app/builder/builder.py”, line 1413, in onPavloviaSync
File “/Users/nina/Desktop/PsychoPy.app/Contents/Resources/lib/python3.8/psychopy/app/pavlovia_ui/project.py”, line 844, in syncProject
File “/Users/nina/Desktop/PsychoPy.app/Contents/Resources/lib/python3.8/psychopy/app/pavlovia_ui/functions.py”, line 148, in showCommitDialog
File “/Users/nina/Desktop/PsychoPy.app/Contents/Resources/lib/python3.8/psychopy/projects/pavlovia.py”, line 1167, in commit
File “/Users/nina/Desktop/PsychoPy.app/Contents/Resources/lib/python3.8/git/cmd.py”, line 542, in
File “/Users/nina/Desktop/PsychoPy.app/Contents/Resources/lib/python3.8/git/cmd.py”, line 1005, in _call_process
File “/Users/nina/Desktop/PsychoPy.app/Contents/Resources/lib/python3.8/git/cmd.py”, line 822, in execute
git.exc.GitCommandError: Cmd(‘/Users/nina/Desktop/PsychoPy.app/Contents/Resources/git-core/git’) failed due to: exit code(128)
cmdline: /Users/nina/Desktop/PsychoPy.app/Contents/Resources/git-core/git commit -m _
stderr: ‘fatal: Unable to create ‘/Users/ninajost/Desktop/.git/index.lock’: File exists.
Another git process seems to be running in this repository, e.g.
an editor opened by ‘git commit’. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
Any tips would be greatly appreciated!
I tried to run the experiment and expected it to run.

MIDO: ValueError: variable int must be a positive integer

In my code I get
Traceback (most recent call last):
File "Midi Projects/symbolToChord_v1.py", line 160, in <module>
mo.save("songWithChords.mid")
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/mido/midifiles/midifiles.py", line 432, in save
self._save(file)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/mido/midifiles/midifiles.py", line 445, in _save
write_track(outfile, track)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/mido/midifiles/midifiles.py", line 251, in write_track
data.extend(encode_variable_int(msg.time))
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/mido/midifiles/meta.py", line 112, in encode_variable_int
raise ValueError('variable int must be a positive integer')
ValueError: variable int must be a positive integer
I believe I am running the latest version of mido:
pip freeze | grep mido
mido==1.2.9
What am I doing wrong?
Any help would be greatly appreciated.
I am no expert, but I had a similar problem.
The time attribute in mido is a bit confusing, as it can represent either ticks or seconds, and either delta or absolute time, depending on context. From the documentation (https://mido.readthedocs.io/en/latest/midi_files.html#about-the-time-attribute):
The time attribute is used in several different ways:
inside a track, it is delta time in ticks. This must be an integer.
in messages yielded from play(), it is delta time in seconds (time elapsed since the last yielded message)
(only important to implementers) inside certain methods it is used for absolute time in ticks or seconds
You can also see this github issue for reference https://github.com/mido/mido/issues/189
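Since times inside a track must be integer delta ticks, a float or negative time on a message is a likely cause of the error when saving. Below is a minimal sketch, in plain Python, of converting absolute times in seconds into non-negative integer deltas before assigning them to msg.time. The helper name to_delta_ticks and the default values (480 ticks per beat, a tempo of 500000 microseconds per beat) are assumptions for illustration, and the input times are assumed to be non-negative.

```python
def to_delta_ticks(abs_seconds, ticks_per_beat=480, tempo=500000):
    """Convert absolute times in seconds to integer delta times in ticks.

    tempo is in microseconds per beat (500000 corresponds to 120 BPM).
    Times are sorted first so that every delta is non-negative.
    """
    deltas = []
    prev_tick = 0
    for sec in sorted(abs_seconds):
        # seconds -> beats -> ticks, rounded to the nearest whole tick
        tick = int(round(sec * 1_000_000 * ticks_per_beat / tempo))
        deltas.append(tick - prev_tick)
        prev_tick = tick
    return deltas


deltas = to_delta_ticks([0.0, 0.5, 1.0])
print(deltas)  # [0, 480, 480]
```

Each value in deltas can then be used as the time of the corresponding message in the track.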

Filtering input file with chunksize and skiprows using line number as index in dask dataframe

I have ~70 GB of output from MD simulations. The file consists of a regularly repeating pattern: a fixed number of explanation lines followed by a fixed number of data lines. How can I read the file into a Dask DataFrame chunk by chunk while ignoring the explanation lines?
I successfully wrote a lambda function for the skiprows argument of pandas.read_csv to ignore the explanation lines and read only the data lines. I then converted the pandas-based code to Dask, but it does not work. Here is the Dask code, written by replacing pandas.read_csv with dd.read_csv:
import dask.dataframe as dd

# First extract the number of atoms and hence the number of data lines:
with open(filename[0], mode='r') as file:  # The same as Chanil's code
    line = file.readline()
    line = file.readline()
    line = file.readline()
    line = file.readline()  # natoms
    natoms = int(line)
skiplines = 9  # Number of explanation lines repeating after natoms lines of data
def logic_for_chunk(index):
    """Return True for explanation lines (to skip), False for data lines."""
    return index % (natoms + skiplines) < skiplines
df_chunk = dd.read_csv('trajectory.txt', sep=' ', header=None, index_col=False, skiprows=lambda x: logic_for_chunk(x), chunksize=natoms)
Here the index of the dataframe is the line number in the file. With the above code, in the first chunk, lines 0 to 8 of the file are ignored and lines 9 to 58 are read. In the next chunk, lines 59 to 67 are ignored and a natoms-sized chunk from line 68 to 117 is read. This continues until all the data snapshots are read.
Unfortunately, while the above code works well in pandas, it does not work in Dask. How can I implement a similar procedure with a Dask DataFrame?
The dask dataframe read_csv function cuts the file up at byte locations. It is unable to determine exactly how many lines are in each partition, so it is unwise to depend on the row index within each partition.
If there is some other way to detect a bad line then I would try that. Ideally you will be able to determine a bad line based on the content of the line, not on its location within the file (like every eighth line).
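A content-based check along those lines might look like the sketch below. It assumes (this is not stated in the question) that every data line has a fixed number of numeric, whitespace-separated fields and that no explanation line has that same shape. The field count of 5 and the sample lines are hypothetical.

```python
def is_data_line(line, n_fields=5):
    """True if the line has exactly n_fields whitespace-separated numeric tokens."""
    tokens = line.split()
    if len(tokens) != n_fields:
        return False
    try:
        for tok in tokens:
            float(tok)  # raises ValueError for non-numeric text
    except ValueError:
        return False
    return True


# Hypothetical sample: two explanation lines followed by two data lines
sample = [
    "ITEM: TIMESTEP",
    "1000",
    "1 1 0.5 0.25 0.75",
    "2 1 0.1 0.2 0.3",
]
data_rows = [ln.split() for ln in sample if is_data_line(ln)]
```

Because the predicate looks only at a single line, it does not matter where Dask cuts the file. One option would be to read the file with dask.bag.read_text and pass is_data_line to the bag's filter method before converting the remaining rows to a dataframe.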

Why do the results of my Sage commands become question marks?

I'm trying to use the sagetex package, but the results of my Sage commands show up as question marks when I compile the document.
Here is the code I tried to run:
\documentclass{article}
\usepackage{sagetex}
\usepackage{graphicx}
\usepackage{fancyvrb}
\begin{document}
Using Sage\TeX, one can use Sage to compute things and put them into
your \LaTeX{} document. For example, there are
$\sage{number_of_partitions(1269)}$ integer partitions of $1269$.
You don't need to compute the number yourself, or even cut and paste
it from somewhere.
Here's some Sage code:
\begin{sageblock}
f(x) = exp(x) * sin(2*x)
\end{sageblock}
The second derivative of $f$ is
\[
\frac{\mathrm{d}^{2}}{\mathrm{d}x^{2}} \sage{f(x)} =
\sage{diff(f, x, 2)(x)}.
\]
Here's a plot of $f$ from $-1$ to $1$:
\sageplot{plot(f, -1, 1)}
\sageplot[scale=.5]{plot3d(sin(pi*(x^2+y^2))/2,(x,-1,1),(y,-1,1))}
we know that 2010 factors to $\sage{factor(2010)}$
\begin{sagesilent}
m=identity_matrix(QQ,3)
m[0]=m[0]+m[1]
m[1]=m[1]-m[2]
m[2]=m[2]-2*m[1]
m[1]=m[1]+3*m[0]
m[0]=2*m[0]
\end{sagesilent}
Compute the rref of $\sage{m}$
\begin{sageblock}
g(x)=taylor(tan(x),x,0,10)
\end{sageblock}
$$\tan(x)=\sage{g(x)}$$
\end{document}
When I try to compile this, I get:
**** Error in Sage code on line 23 of file.tex! Traceback follows.
Traceback (most recent call last):
File "file.sagetex.sage.py", line 39, in <module>
_st_.plot(_sage_const_1 , format='notprovided', _p_=plot3d(sin(pi*(x**_sage_const_2 +y**_sage_const_2 ))/_sage_const_2 ,(x,-_sage_const_1 ,_sage_const_1 ),(y,-_sage_const_1 ,_sage_const_1 )))
NameError: name 'y' is not defined
Once a single error occurs, the rest of the Sage output may be lost, leading to all of the question marks. The problem is this line:
\sageplot[scale=.5]{plot3d(sin(pi*(x^2+y^2))/2,(x,-1,1),(y,-1,1))}
and in particular, you have not defined y. (In SageMath, x is automatically defined to be a variable, but not y.) If you add this before the plot, it should work:
\begin{sagesilent}
var('y')
\end{sagesilent}

z3py: How do I set the used logic in z3py?

Using the following code:
import z3
solver = z3.Solver(ctx=z3.Context())
#solver = z3.Solver()
Direction = z3.Datatype('Direction')
Direction.declare('up')
Direction.declare('down')
Direction = Direction.create()
Cell = z3.Datatype('Cell')
Cell.declare('cons', ('front', Direction), ('back', z3.IntSort()))
Cell = Cell.create()
mycell = z3.Const("mycell", Cell)
solver.add(Cell.cons(Direction.up, 10) == Cell.cons(Direction.up, 10))
I get the following error:
Traceback (most recent call last):
File "thedt2opttest.py", line 17, in <module>
solver.add(Cell.cons(Direction.up, 10) == Cell.cons(Direction.up, 10))
File "/home/john/tools/z3-master/build/python/z3/z3.py", line 6052, in add
self.assert_exprs(*args)
File "/home/john/tools/z3-master/build/python/z3/z3.py", line 6040, in assert_exprs
arg = s.cast(arg)
File "/home/john/tools/z3-master/build/python/z3/z3.py", line 1304, in cast
_z3_assert(self.eq(val.sort()), "Value cannot be converted into a Z3 Boolean value")
File "/home/john/tools/z3-master/build/python/z3/z3.py", line 90, in _z3_assert
raise Z3Exception(msg)
z3types.Z3Exception: Value cannot be converted into a Z3 Boolean value
When I use only z3.Solver(), without passing a new z3.Context as a parameter, the code works.
Can someone please answer the following questions:
What is the difference here?
How do I set the logic in z3py?
Which logic should I use with datatypes?
Solution: SolverFor()
To set a logic with Z3Py, instead of creating a solver with the Solver() constructor, you can use the SolverFor(logic) function, where logic is the logic you would like to use.
For example, if you type:
s = SolverFor("LIA")
then the variable s will contain a solver based on Linear Integer Arithmetic; or if you type
s = SolverFor("LRA")
then the variable s will contain a solver based on Linear Real Arithmetic.
Beware that, as far as I know (I haven't used z3 for a while, so newer versions may have fixed this), if you specify a nonexistent or unsupported logic, for example SolverFor("abc"), no error is generated and the logic is guessed automatically as usual.
Because of this, the only ways to test whether the logic you want is actually being used are to compare performance against an automatically chosen logic, or to try solving something that the specified logic does not support (for example, using real variables when you specified LIA, which only accepts integer variables) and see whether an error is generated. If it is, the solver is actually trying to use that logic.

Resources