How to tokenize Indic languages using iNLTK

I tried this by following the iNLTK documentation:
https://inltk.readthedocs.io/en/latest/index.html
from inltk.inltk import tokenize
text="जो मुझको सताती है तुझे वो बातें आती है जब सामने तू होता नहीं बेचैनी बढ़ जाती है मैं रूठ "
tokenize(text, 'hi')
The error is:
RuntimeError: Internal: src/sentencepiece_processor.cc(890)
[model_proto->ParseFromArray(serialized.data(), serialized.size())]

This error usually appears when the wrong SentencePiece (SPM) model is loaded, or when there is some other problem with the SPM model file.
Make sure you set up support for the language before tokenizing:
from inltk.inltk import setup
setup('hi')
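Putting the question and the fix together, the full sequence can be sketched as below. This assumes iNLTK is installed and that `setup('hi')` is able to fetch the Hindi model on the first run. The helper name `tokenize_hindi` and its `first_run` flag are hypothetical, introduced only for illustration.

```python
def tokenize_hindi(text, first_run=False):
    """Tokenize Hindi text with iNLTK (hypothetical helper, not part of iNLTK)."""
    # Imported lazily so this module still loads where iNLTK is not installed.
    from inltk.inltk import setup, tokenize

    if first_run:
        # Sets up the Hindi language model; needed before the first tokenize call.
        setup('hi')
    return tokenize(text, 'hi')
```

Call `tokenize_hindi(text, first_run=True)` the first time on a machine, and `tokenize_hindi(text)` afterwards.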

Related

Getting error when adding package to Spacy pipe

I'm getting an error while adding a spacy compatible extension, med7, to the pipeline. I've included the replicable code below.
!pip install -U https://med7.s3.eu-west-2.amazonaws.com/en_core_med7_lg.tar.gz
import spacy
import en_core_med7_lg
from spacy.lang.en import English
med7 = en_core_med7_lg.load()
# Create the nlp object
nlp2 = English()
nlp2.add_pipe(med7)
# Process a text
doc = nlp2("This is a sentence.")
The error I get is
Argument 'string' has incorrect type (expected str, got spacy.tokens.doc.Doc)
I realized I was having this problem because I don't understand the difference between some components of Spacy. For instance, in the Negex extension package, loading the pipeline is done with the Negex command:
negex = Negex(nlp, ent_types=["PERSON","ORG"])
nlp.add_pipe(negex, last=True)
I don't understand what the difference is between Negex and en_core_med7_lg.load(). For some reason, when I add "med7" into the pipeline, it causes this error. I'm new to Spacy and would appreciate an explanation so that I can learn. And please let me know if I can make this question any clearer. Thanks!

med7 is already the loaded pipeline. Run:
doc = med7("This is a sentence.")
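To see why nesting a loaded pipeline inside another one produces a type error, here is a toy model in plain Python. It is not spaCy code: `Doc`, `Pipeline`, and `tagger` are hypothetical stand-ins that only mimic the two call shapes involved. A component takes a document and returns a document, while a loaded pipeline takes a string.

```python
class Doc:
    """Toy stand-in for a processed document."""
    def __init__(self, text):
        self.text = text
        self.tags = []


class Pipeline:
    """Toy stand-in for a loaded pipeline: takes a str, returns a Doc."""
    def __init__(self):
        self.components = []

    def add_pipe(self, component):
        self.components.append(component)

    def __call__(self, text):
        if not isinstance(text, str):
            raise TypeError(f"expected str, got {type(text).__name__}")
        doc = Doc(text)
        # Each component receives the Doc produced so far, never the raw string.
        for component in self.components:
            doc = component(doc)
        return doc


def tagger(doc):
    """Toy component: takes a Doc, returns a Doc."""
    doc.tags.append("tagged")
    return doc


loaded = Pipeline()
loaded.add_pipe(tagger)
doc = loaded("This is a sentence.")  # correct: call the loaded pipeline directly

outer = Pipeline()
outer.add_pipe(loaded)  # wrong: a whole pipeline used as a component
try:
    outer("This is a sentence.")
    nested_error = None
except TypeError as err:
    nested_error = str(err)
```

In this toy, `outer` hands a `Doc` to `loaded`, which expects a string, so the call fails in the same way as the error in the question.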

Parse error: version mismatch between Agda and its standard library

I'm running the prebuilt Windows Agda version 2.4.2.2. In Emacs/Agda2 Include Dirs I have identified c:/agda-stdlib-0.13/src and the folders one level below.
Upon loading a module which consists of only these two lines, I get an error message.
module test1 where
open import Integer
The error message:
C:\agda-stdlib-0.13\src\Data\Empty.agda:13,5-5
C:\agda-stdlib-0.13\src\Data\Empty.agda:13,5: Parse error
HASKELL<ERROR> data AgdaEmpty #-} {-# COMPIL...
Is something missing from the proper installation of the library?

Your Agda is old. See this page for library compatibility. You need Agda 2.5.2 for this library.

H2O POJO causing javac java.lang.IllegalArgumentException

I have a distributed random forest POJO model using the default model settings except for:
ntrees = 150
max_depth = 50
min_rows = 5
Here are the full settings:
buildModel 'drf', {"model_id":"drf-335270ee-8970-4855-b521-c4fb4ca184f5","training_frame":"frame_0.750","validation_frame":"frame_0.250","nfolds":0,"response_column":"DENIAL","ignored_columns":["tx_match_date"],"ignore_const_cols":true,"ntrees":"150","max_depth":"50","min_rows":"5","nbins":20,"seed":-1,"mtries":-1,"sample_rate":0.6320000290870667,"score_each_iteration":true,"score_tree_interval":0,"balance_classes":false,"nbins_top_level":1024,"nbins_cats":1024,"r2_stopping":1.7976931348623157e+308,"stopping_rounds":0,"stopping_metric":"AUTO","stopping_tolerance":0.001,"max_runtime_secs":0,"checkpoint":"","col_sample_rate_per_tree":1,"min_split_improvement":0.00001,"histogram_type":"AUTO","categorical_encoding":"AUTO","build_tree_one_node":false,"sample_rate_per_class":[],"binomial_double_trees":true,"col_sample_rate_change_per_level":1,"calibrate_model":false}
When I try to compile the pojo with:
$javac -cp "h2o-genmodel.jar" -J-Xmx2g -J-XX:MaxPermSize=128m drf_335270ee_8970_4855_b521_c4fb4ca184f5.java
I get the following error.
An exception has occurred in the compiler (1.8.0_131). Please file a bug against the Java compiler via the Java bug reporting page (http://bugreport.java.com) after checking the Bug Database (http://bugs.java.com) for duplicates. Include your program and the following diagnostic in your report. Thank you.
java.lang.IllegalArgumentException
at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
at com.sun.tools.javac.util.BaseFileManager$ByteBufferCache.get(BaseFileManager.java:325)
at com.sun.tools.javac.util.BaseFileManager.makeByteBuffer(BaseFileManager.java:294)
at com.sun.tools.javac.file.RegularFileObject.getCharContent(RegularFileObject.java:114)
at com.sun.tools.javac.file.RegularFileObject.getCharContent(RegularFileObject.java:53)
at com.sun.tools.javac.main.JavaCompiler.readSource(JavaCompiler.java:602)
at com.sun.tools.javac.main.JavaCompiler.parse(JavaCompiler.java:665)
at com.sun.tools.javac.main.JavaCompiler.parseFiles(JavaCompiler.java:950)
at com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:857)
at com.sun.tools.javac.main.Main.compile(Main.java:523)
at com.sun.tools.javac.main.Main.compile(Main.java:381)
at com.sun.tools.javac.main.Main.compile(Main.java:370)
at com.sun.tools.javac.main.Main.compile(Main.java:361)
at com.sun.tools.javac.Main.compile(Main.java:56)
at com.sun.tools.javac.Main.main(Main.java:42)
I don't get this error when replacing the DRF model with a deep learning pojo that I have also downloaded from h2o's Flow UI, so I'm thinking it is likely related to the drf_335270ee_8970_4855_b521_c4fb4ca184f5.java file (note that the POJO was too big to preview in H2O's Flow UI). What could be going on here?
Thanks

Instead of trying to compile an H2O random forest POJO, you can download and use a MOJO instead in almost exactly the same way without needing the compile step.
See:
http://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/index.html
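If the model was built from, or is still reachable through, the H2O Python client, fetching the MOJO can be sketched as follows. This assumes the `h2o` package is installed and that the cluster holding the model is still running at the default address. The helper name `download_mojo_for` is hypothetical.

```python
def download_mojo_for(model_id, out_dir="."):
    """Download a model's MOJO zip and h2o-genmodel.jar (hypothetical helper)."""
    # Imported lazily so this module still loads where h2o is not installed.
    import h2o

    # Connects to a running cluster at the default address (assumption: the
    # cluster that trained the model is still up and holds the model).
    h2o.init()
    model = h2o.get_model(model_id)
    # Writes the MOJO into out_dir, along with h2o-genmodel.jar for scoring.
    return model.download_mojo(path=out_dir, get_genmodel_jar=True)
```

For the model in the question, the call would be `download_mojo_for("drf-335270ee-8970-4855-b521-c4fb4ca184f5")`. The downloaded MOJO is loaded at scoring time rather than compiled, so the javac step is skipped.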

Haskell-src-exts throws TemplateHaskell error

I'm trying to use the haskell-src-exts package to parse Haskell modules. Currently, I'm trying to parse the acme-io package's module, but I keep getting this error no matter what parse mode I try:
*** Exception: fromParseResult: Parse failed at [System/IO/Unsafe/Really/IMeanIt] (1:57): TemplateHaskell is not enabled
The module mentioned makes no references to TemplateHaskell: not in its LANGUAGE pragma, nor is there a $ anywhere in the source file.
I'm wondering if my parse mode has something to do with it - here it is:
defaultParseMode { parseFilename = toFilePath m
, baseLanguage = Haskell2010
, extensions = []
, ignoreLanguagePragmas = True
, ignoreLinePragmas = True
, fixities = Nothing
}
I've also tried to replace the extensions field with knownExtensions from the parsing suite, without any luck.

This duplicates an existing question: using the parseFile function fixed the issue. However, note that haskell-src-exts parses differently from GHC. I ran into a similar issue right after this one, because haskell-src-exts can't handle multi-param contexts without -XMultiParamTypeClasses, yet GHC can, which breaks the parser if you're scraping Hackage. Hint may be a better option, though I can't say for sure.

Is SVCUTIL.exe broken or is it the supplier's WSDL?

I am trying to use SVCUTIL from the SDK to generate the common types in several web services. When I try to generate code I get the errors shown at the bottom, which suggest the WSDL is broken. However, if I import the service under Service References, Visual Studio does not complain. Is SVCUTIL broken?
The WSDLs are public and are:
http://test.wlr3.net/empws/services/WLR3AssuranceServices?wsdl
http://test.wlr3.net/empws/services/WLR3BillingServices?wsdl
http://test.wlr3.net/empws/services/WLR3DialogueServices?wsdl
http://test.wlr3.net/empws/services/WLR3FulfillmentServices?wsdl
http://test.wlr3.net/empws/services/WLR3InventoryServices?wsdl
http://test.wlr3.net/empws/services/WLR3InventoryOrderServices?wsdl
http://test.wlr3.net/empws/services/WLR3InventoryTroubleReportServices?wsdl
http://test.wlr3.net/empws/services/WLR3InventoryWorkItemServices?wsdl
http://test.wlr3.net/empws/services/WLR3IssueServices?wsdl
http://test.wlr3.net/empws/services/WLR3ReportingServices?wsdl
http://test.wlr3.net/empws/services/WLR3SecurityServices?wsdl
If you check these out you'll see a lot of common types and several namespaces. I've tried sending these to SVCUTIL to generate the code but it doesn't like the FulfillmentServices and the InventoryServices ones:
Error: Cannot import wsdl:portType
Detail: An exception was thrown while running a WSDL import extension: System.ServiceModel.Description.XmlSerializerMessageContractImporter
Error: Cannot import invalid schemas. Compilation on the XmlSchemaSet failed.
XPath to Error Source: //wsdl:definitions[@targetNamespace='http://imperatives.co.uk/V20']/wsdl:portType[@name='WLR3FulfilmentServices']
Error: Cannot import wsdl:binding
Detail: There was an error importing a wsdl:portType that the wsdl:binding is dependent on.
XPath to wsdl:portType: //wsdl:definitions[@targetNamespace='http://imperatives.co.uk/V20']/wsdl:portType[@name='WLR3FulfilmentServices']
XPath to Error Source: //wsdl:definitions[@targetNamespace='http://imperatives.co.uk/V20']/wsdl:binding[@name='WLR3FulfillmentServicesHttpBinding']
Error: Cannot import wsdl:port
Detail: There was an error importing a wsdl:binding that the wsdl:port is dependent on.
XPath to wsdl:binding: //wsdl:definitions[@targetNamespace='http://imperatives.co.uk/V20']/wsdl:binding[@name='WLR3FulfillmentServicesHttpBinding']
XPath to Error Source: //wsdl:definitions[@targetNamespace='http://imperatives.co.uk/V20']/wsdl:service[@name='WLR3FulfillmentServices']/wsdl:port[@name='WLR3FulfillmentServicesHttpPort']

As per the comments on the question, there is a small difference in the two problem WSDLs. I compared these with older versions and found that each contains a type based on a virtual base type called "Dto".
This base is applied to only two types, one in the Fulfillment Services and one in the Inventory Services. The same types also appear in other WSDLs but without the base, so the WSDL files are not consistent.
So I can exonerate SVCUTIL here and give the supplier a kick.
Thanks to John Saunders for looking at it.
