Getting an error when adding a package to the spaCy pipeline

I'm getting an error while adding a spaCy-compatible extension, med7, to the pipeline. I've included reproducible code below.
!pip install -U https://med7.s3.eu-west-2.amazonaws.com/en_core_med7_lg.tar.gz
import spacy
import en_core_med7_lg
from spacy.lang.en import English
med7 = en_core_med7_lg.load()
# Create the nlp object
nlp2 = English()
nlp2.add_pipe(med7)
# Process a text
doc = nlp2("This is a sentence.")
The error I get is
Argument 'string' has incorrect type (expected str, got spacy.tokens.doc.Doc)
I realized I was having this problem because I don't understand the difference between some components of spaCy. For instance, in the Negex extension package, adding it to the pipeline is done by constructing a Negex component:
negex = Negex(nlp, ent_types=["PERSON","ORG"])
nlp.add_pipe(negex, last=True)
I don't understand the difference between Negex and en_core_med7_lg.load(). For some reason, when I add med7 to the pipeline, it causes this error. I'm new to spaCy and would appreciate an explanation so that I can learn. Please let me know if I can make this question any clearer. Thanks!

med7 is already the fully loaded pipeline, so you don't need to add it to another pipeline. Run:
doc = med7("This is a sentence.")
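For context: Negex(nlp, ...) constructs a single pipeline component, i.e. a callable that takes a Doc and returns a Doc, which is what nlp.add_pipe() expects in spaCy v2 (the version this code targets). en_core_med7_lg.load(), by contrast, returns a complete Language pipeline whose __call__ expects raw text, so when nlp2 hands it a Doc you get the "expected str, got spacy.tokens.doc.Doc" error. A minimal sketch of the intended usage, assuming spaCy v2.x and that med7 is installed as in the question:
import en_core_med7_lg

# en_core_med7_lg.load() returns a full Language pipeline, not a component
med7 = en_core_med7_lg.load()

# Call the loaded pipeline directly on raw text
doc = med7("This is a sentence.")

# On clinical text, med7 annotates medication-related entities
for ent in doc.ents:
    print(ent.text, ent.label_)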

Related

How to tokenize Indic languages using iNLTK

I did this following the iNLTK documentation here:
https://inltk.readthedocs.io/en/latest/index.html
from inltk.inltk import tokenize
text="जो मुझको सताती है तुझे वो बातें आती है जब सामने तू होता नहीं बेचैनी बढ़ जाती है मैं रूठ "
tokenize(text, 'hi')
The error is:
RuntimeError: Internal: src/sentencepiece_processor.cc(890)
[model_proto->ParseFromArray(serialized.data(), serialized.size())]
This issue usually appears when the wrong SentencePiece (SPM) model is used, or when there is some other problem related to the SPM model.
Make sure you set up the language support first:
from inltk.inltk import setup
setup('hi')
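Putting it together, a minimal sketch (setup('hi') downloads the Hindi model files and only needs to run once per environment):
from inltk.inltk import setup, tokenize

# Download the Hindi language files (one-time step per environment)
setup('hi')

text = "जो मुझको सताती है तुझे वो बातें आती है"
print(tokenize(text, 'hi'))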

Not able to save a PySpark IForest model

Using iforest as described here: https://github.com/titicaca/spark-iforest
But model.save() throws the following exception:
scala.NotImplementedError: The default jsonEncode only supports string, vector and matrix. org.apache.spark.ml.param.Param must override jsonEncode for java.lang.Double.
I followed the code snippet under the "Python API" section on the linked GitHub page.
from pyspark.ml.feature import VectorAssembler
import os
import tempfile
from pyspark_iforest.ml.iforest import *

# Input DataFrame (df) schema:
#   col_1: integer
#   col_2: integer
#   col_3: integer
in_cols = ["col_1", "col_2", "col_3"]

assembler = VectorAssembler(inputCols=in_cols, outputCol="features")
featurized = assembler.transform(df)

iforest = IForest(contamination=0.5, maxDepth=2)
model = iforest.fit(featurized)
model.save("model_path")
model.save() should be able to save model files.
Below is the output DataFrame I get after executing model.transform(featurized):
col_1:integer
col_2:integer
col_3:integer
features:udt
anomalyScore:double
prediction:double
I have just fixed this issue. It was caused by an incorrect param type. You can check out the latest code on the master branch and try again.
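For reference, once the updated package is installed, a minimal save/reload sketch (assuming spark-iforest follows the standard Spark ML persistence API and exposes IForestModel.load, and that df is the input DataFrame from the question):
from pyspark.ml.feature import VectorAssembler
from pyspark_iforest.ml.iforest import IForest, IForestModel

# Assemble the feature vector from the raw input columns
assembler = VectorAssembler(inputCols=["col_1", "col_2", "col_3"], outputCol="features")
featurized = assembler.transform(df)

# Fit and persist the model
model = IForest(contamination=0.5, maxDepth=2).fit(featurized)
model.save("model_path")

# Reload it later
reloaded = IForestModel.load("model_path")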

What is the purpose of foundation.dart when writing and reading files using the path_provider plugin?

I am trying to understand how to read and write data to text files using the path_provider plugin.
I've read an example of how to use it in Flutter from here. Then I saw this line of code, which I don't understand:
import "package:flutter/foundation.dart";
I tried to comment it out from the code and ran "flutter run":
//import "package:flutter/foundation.dart";
And to my surprise, it still ran, although it raised some errors like:
E/DartVM (23127): 'dart:core/runtime/libintegers.dart': error: Unexpected tag 0 (Nothing) in ?, expected expression
E/DartVM (23127): ../../third_party/dart/runtime/vm/compiler/intrinsifier.cc: 153: error: Intrinsifier failed to find method ~ in class _Smi
and
E/DartVM (23237): 'dart:typed_data': error: Unexpected tag 15 (DirectPropertyGet) in ?, expected type
E/DartVM (23237): ../../third_party/dart/runtime/vm/compiler/intrinsifier.cc: 153: error: Intrinsifier failed to find method get:x in class _Float32x4
But it ran well, and I don't know why. When should I use it? Which part of the code actually used foundation.dart?
I would appreciate any kind of enlightenment. Thanks in advance.
[UPDATE]
I think I understand why the foundation library was used in the example code: it's probably because the example code uses the "required" constant from the foundation library.

Basic and enhanced dependencies give different results in Stanford CoreNLP

I am using the dependency parsing from CoreNLP for a project of mine. The basic and enhanced dependencies give different results for a particular dependency.
I used the following code to get enhanced dependencies.
val lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
lp.setOptionFlags("-maxLength", "80")
val rawWords = edu.stanford.nlp.ling.Sentence.toCoreLabelList(tokens_arr:_*)
val parse = lp.apply(rawWords)
val tlp = new PennTreebankLanguagePack()
val gsf:GrammaticalStructureFactory = tlp.grammaticalStructureFactory()
val gs:GrammaticalStructure = gsf.newGrammaticalStructure(parse)
val tdl = gs.typedDependenciesCCprocessed()
For the following example,
Account name of ramkumar.
I use the simple API to get basic dependencies. The dependency I get between (account, name) is (compound). But when I use the above code to get enhanced dependencies, I get the relation between (account, name) as (dobj).
What is the fix for this? Is this a bug, or am I doing something wrong?
When I run this command:
java -Xmx8g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse -file example.txt -outputFormat json
With your example text in the file example.txt, I see compound as the relationship between both of those words for both types of dependencies.
I also tried this with the simple API and got the same results.
You can see what simple produces with this code:
package edu.stanford.nlp.examples;

import edu.stanford.nlp.semgraph.SemanticGraphFactory;
import edu.stanford.nlp.simple.*;

import java.util.*;

public class SimpleDepParserExample {

  public static void main(String[] args) {
    Sentence sent = new Sentence("...example text...");
    Properties props = new Properties();
    // use sent.dependencyGraph() or sent.dependencyGraph(props, SemanticGraphFactory.Mode.ENHANCED) to see enhanced dependencies
    System.out.println(sent.dependencyGraph(props, SemanticGraphFactory.Mode.BASIC));
  }
}
I don't know anything about any Scala interfaces for Stanford CoreNLP. I should also note that my results use the latest code from GitHub, though I presume Stanford CoreNLP 3.8.0 would produce similar results. If you are using an older version of Stanford CoreNLP, that could be a potential cause of the error.
But running this example in various ways with Java, I don't see the issue you are encountering.

GCE Python API: oauth2client.util:execute() takes at most 1 positional argument (2 given)

I'm trying to get started with the Python API for Google Compute Engine using their "hello world" tutorial on https://developers.google.com/compute/docs/api/python_guide#setup
Whenever I make the call response = request.execute(auth_http), though, I get the following warning signaling that I can't authenticate:
WARNING:oauth2client.util:execute() takes at most 1 positional argument (2 given)
I'm clearly only passing one positional argument (auth_http), and I've looked into oauth2client/util.py, apiclient/http.py, and oauth2client/client.py for answers, but nothing seems amiss. I found another Stack Overflow post that encountered the same issue, but it seems that in the constructor of the OAuth2WebServerFlow class in oauth2client/client.py, 'access_type' is already set to 'offline' (though, to be honest, I don't completely understand what's going on here in terms of setting up OAuth 2.0 flows).
Any suggestions would be much appreciated, and thanks in advance!
Looking at the code, the @util.positional(1) decorator is what raises the warning. Avoid it by using named parameters.
Instead of:
response = request.execute(auth_http)
Do:
response = request.execute(http=auth_http)
https://code.google.com/p/google-api-python-client/source/browse/apiclient/http.py#637
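To see why the keyword form avoids the warning, here is a rough, self-contained stand-in for what a guard like @util.positional(1) does (a simplified illustration, not the actual oauth2client source):
import functools
import logging

def positional(max_positional):
    # Warn when a function receives more positional arguments than allowed
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if len(args) > max_positional:
                logging.warning("%s() takes at most %d positional argument (%d given)",
                                func.__name__, max_positional, len(args))
            return func(*args, **kwargs)
        return wrapper
    return decorator

class Request:
    @positional(1)  # only `self` may be passed positionally
    def execute(self, http=None):
        return "executed"

req = Request()
req.execute(object())        # positional argument -> warning is logged
req.execute(http=object())   # keyword argument    -> no warning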
I think the documentation is wrong. Please use the following:
auth_http = credentials.authorize(http)
# Build the service
gce_service = build('compute', API_VERSION, http=auth_http)
project_url = '%s%s' % (GCE_URL, PROJECT_ID)
# List instances
request = gce_service.instances().list(project=PROJECT_ID, filter=None, zone=DEFAULT_ZONE)
response = request.execute()
You can do one of three things here:
1. Ignore the warning and do nothing.
2. Suppress the warning by setting the enforcement flag to ignore:
import oauth2client
import gflags
gflags.FLAGS['positional_parameters_enforcement'].value = 'IGNORE'
3. Figure out where the extra positional parameter is being passed and fix it:
import oauth2client
import gflags
gflags.FLAGS['positional_parameters_enforcement'].value = 'EXCEPTION'
# Wrap the offending call in a try/except block:
try:
    pass  # your request.execute(...) call goes here
except TypeError as e:
    # Print the error so you can trace where the positional argument comes from
    # (see the Python traceback docs).
    print(str(e))
