MeSH ontology inference type - ontology

What is the meaning of the ontology inference type (materialized vs. non-materialized)? And what is the difference between them?

An ontology is typically stored as an .rdf or .owl file. An ontology contains axioms, and based on these axioms reasoners can infer that additional axioms must hold. When these inferred axioms are stored as part of the .rdf/.owl file, the inferences are said to be materialized; otherwise the inferences are non-materialized.
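The distinction can be sketched in plain Python (this is a toy illustration, not an OWL reasoner; the mini-ontology below is hypothetical, with subClassOf axioms written as (subclass, superclass) pairs):

```python
def materialize(axioms):
    """Return asserted + inferred subClassOf axioms (the transitive closure),
    i.e. the materialized version of the ontology."""
    closed = set(axioms)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closed):
            for (c, d) in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))  # an inferred axiom, now stored
                    changed = True
    return closed

# Non-materialized: only these asserted axioms are stored; a reasoner must
# re-derive (Carcinoma, Disease) at query time.
asserted = {("Carcinoma", "Neoplasm"), ("Neoplasm", "Disease")}

# Materialized: the inferred axiom is stored alongside the asserted ones.
materialized = materialize(asserted)
```

The trade-off is the usual space-vs-time one: materialization makes queries fast at the cost of a larger file and the need to re-materialize when axioms change.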

Related

ALS methods - train, trainImplicit and fit

What is the difference between als.train(), als.fit(), and als.trainImplicit()?
First of all, we should know the difference between implicit and explicit feedback.
Explicit preference (also referred to as "explicit feedback"), such as a "rating" given to an item by a user.
Implicit preference (also referred to as "implicit feedback"), such as "view" and "buy" history.
For a better understanding, you can look at the two links below:
Why does ALS.trainImplicit give better predictions for explicit ratings?
https://stats.stackexchange.com/questions/133565/how-to-set-preferences-for-als-implicit-feedback-in-collaborative-filtering
train and trainImplicit belong to the mllib package, which works on RDD data. For Spark DataFrames, Spark has a newer module named ml. The ml package uses Spark DataFrames for computing ratings, and the corresponding method is named fit. The fit method in ml uses matrix factorization; for more detail, check the docs for the ALS (ml) class.
https://github.com/apache/spark/blob/926e3a1efe9e142804fcbf52146b22700640ae1b/python/pyspark/ml/recommendation.py
Also, the ml module is faster than mllib.
What's the difference between Spark ML and MLLIB packages
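Since all three methods boil down to alternating least squares on a matrix factorization, here is a toy rank-1 ALS sketch in plain Python (explicit feedback only; this is not Spark's implementation, just the alternating-optimization idea behind it):

```python
def als_rank1(ratings, n_users, n_items, n_iters=10):
    """Toy rank-1 ALS on explicit feedback.

    ratings: list of (user, item, rating) triples.
    Alternately solves the 1-D least-squares problem for each user factor u[i]
    with the item factors v fixed, then for each v[j] with u fixed.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(n_iters):
        for i in range(n_users):
            num = sum(r * v[j] for (uu, j, r) in ratings if uu == i)
            den = sum(v[j] ** 2 for (uu, j, r) in ratings if uu == i)
            if den:
                u[i] = num / den
        for j in range(n_items):
            num = sum(r * u[i] for (i, jj, r) in ratings if jj == j)
            den = sum(u[i] ** 2 for (i, jj, r) in ratings if jj == j)
            if den:
                v[j] = num / den
    return u, v

# Ratings generated by an exactly rank-1 model, so ALS can reconstruct them.
ratings = [(0, 0, 3.0), (0, 1, 4.0), (1, 0, 6.0), (1, 1, 8.0)]
u, v = als_rank1(ratings, n_users=2, n_items=2)
# A prediction for (user i, item j) is u[i] * v[j].
```

trainImplicit differs mainly in the loss: implicit feedback is turned into preference/confidence weights rather than being fit as raw rating values.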

Estimator choice for mapping string independent variable to string categorical dependent variable

I'm attempting to build a predictive model that can map text-based vendor-provided descriptions of a service to around 800 standardized service codes, based on a training set of about 13,000 correctly mapped services.
Each standardized service code also has a standardized description, which is usually similar to the vendor-provided description (i.e., some of the words used are the same), but not identical. Descriptions are typically 3-10 words in length.
My main issue is that I'm not sure what type of estimator will be appropriate for this problem.
I have tried using simple fuzzy matching approaches, including:
Counting matching words/characters between the vendor-provided and standardized service descriptions and selecting the one with the most matches
Trying to find the standardized service description with the minimum Levenshtein distance
These have not worked particularly well due to the use of synonymous but different word choices within the vendor-provided and standardized descriptions.
I have also considered using a decision tree, but it seems infeasible given 800+ possible outcomes.
Which type of estimator can I use to solve this problem?
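For reference, the two fuzzy-matching baselines described in the question can be sketched as follows (the descriptions and the `best_match` helper are illustrative, not the asker's actual code):

```python
def word_overlap(a, b):
    """Baseline 1: count words shared between two descriptions."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def levenshtein(a, b):
    """Baseline 2: classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def best_match(vendor_desc, standard_descs):
    """Pick the standardized description with maximum word overlap,
    breaking ties by minimum edit distance."""
    return max(standard_descs,
               key=lambda s: (word_overlap(vendor_desc, s),
                              -levenshtein(vendor_desc, s)))
```

As the question notes, both baselines score synonymous-but-different word choices as complete mismatches, which is exactly where they break down.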

When is entity replacement necessary for relation extraction?

In this tutorial for "Training a Machine Learning Classifier for Relation Extraction from Medical Literature", the author performs entity replacement because "we don’t want the model to learn according to a specific entity name, but we want it to learn according to the structure of the text".
Is this generally true or does it depend on the dataset or the used models?
Entity replacement, much like other text transformation techniques such as stemming and lemmatization, is usually part of the relation extraction process because it increases the number of observations per feature. That increase in ratio might help your problem, depending on the size of the dataset, the quality of the features, the type of feature extraction, and the complexity of the model.
A good rule of thumb is to define your objective, and subsequently your acceptable representation, based on your understanding of the dataset. For instance, the given tutorial sets out to understand the relationship between miRNAs and genes. The author is okay with grouping miRNA-335, miRNA-342, miRNA-100, and others under the same entity name.
In scenarios where you don't have domain understanding of the corpora, you can start out without entity replacement, inspect the results, and examine the model's bias-variance tradeoff. Then, if required, try entity replacement after experimenting with some clustering techniques.
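The miRNA grouping described above can be sketched with simple regex-based entity replacement (the patterns are illustrative, not the tutorial's actual code; in particular, the crude gene pattern would also swallow unrelated all-caps acronyms):

```python
import re

def replace_entities(sentence):
    """Replace specific entity mentions with generic placeholder tokens,
    so a downstream model learns from sentence structure rather than
    from particular entity names."""
    # Illustrative pattern for gene symbols: all-caps alphanumeric tokens.
    sentence = re.sub(r"\b[A-Z][A-Z0-9]{2,}\b", "GENE", sentence)
    # Illustrative pattern for miRNA identifiers like miRNA-335.
    sentence = re.sub(r"\bmiRNA-\d+\b", "MIRNA", sentence)
    return sentence

print(replace_entities("miRNA-335 suppresses the SOX4 gene"))
```

After replacement, "miRNA-335 suppresses the SOX4 gene" and "miRNA-100 suppresses the HOXA1 gene" become the same training observation, which is exactly the increase in observations per feature mentioned above.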

Stanford parser output doesn't match demo output

If I use the Stanford CoreNLP neural network dependency parser with the english_SD model, which performs pretty well according to the website (link, bottom of the page), it provides completely different results compared to this demo, which I assume is based on the LexicalizedParser (or at least some other parser).
If I put the sentence "I don't like the car" into the demo page, this is the result:
If I put the same sentence into the neural network parser, it results in this:
In the result of the neural network parser, everything just depends on like. I think it could be due to the different POS tags, but I used the CoreNLP Maxent Tagger with the english-bidirectional-distsim.tagger model, which is pretty standard, I think. Any ideas on this?
By default, we use the english-left3words-distsim.tagger model for the tagger, which is faster than the bidirectional model but occasionally produces worse results. As both the constituency parser used on the demo page and the neural network dependency parser you used rely heavily on POS tags, it is not really surprising that the different POS sequences lead to different parses, especially when the main verb has a function-word tag (IN = preposition) instead of a content-word tag (VB = verb, base form).
But also note that the demo outputs dependency parses in the new Universal Dependencies representation, while the english_SD model parses sentences to the old Stanford Dependencies representation. For your example sentence the correct parses are actually the same, but you will see differences for other sentences, especially if they contain prepositional phrases, which are treated differently in the new representation.
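If you want the pipeline's tagger to match the one you tested separately, you can switch the POS model via the pipeline properties. A sketch of such a properties file (the model path is illustrative; check the exact paths bundled with your CoreNLP distribution):

```properties
# corenlp.properties (illustrative; verify model paths in your CoreNLP jar)
annotators = tokenize,ssplit,pos,depparse
pos.model = edu/stanford/nlp/models/pos-tagger/english-bidirectional-distsim.tagger
```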

Binary And Alternate Representation Transforming

In this publication about metamorphic viruses I have found this classification:
Metamorphic malware may be either a binary-transformer or an alternate-representation-transformer. The former class transforms the binary image that is executed, whereas the latter class carries its code in a higher-level representation, which is used for transformation.
I could not find a precise definition of these two terms. I would like to know whether there is a generic definition for each one, or a generic context in which to introduce this classification in my dissertation.
Thanks all.
A more common term for Binary-Transformer is Binary Code Obfuscation, or simply Binary Obfuscation (it plays an essential role in evading static malware analysis and detection). Some authors also use the term Post-compilation obfuscation[*]. The term Binary Obfuscation is also used in reverse engineering for innocent purposes (to recover source code).[1][2][3]
Whereas for Alternate-Representation-Transformer, you can use terms such as Assembly-Level Obfuscation, Source Code Obfuscation (or Source Obfuscation), Mnemonic-Level Obfuscation, or Code Obfuscation.
Read this short article to find useful common terms.
(But I am not sure about Post-compilation obfuscation.)
Paper writing is not an exact science. Different authors use different (sometimes rare) words to reduce the probability of matching terminology. Many times my papers/journal submissions were rejected due to presentation alone, not because of a technical flaw.

Resources