How would you evaluate a large-scale ontology (the CSO) that comprises one class and hundreds of individuals? - ontology

Is there a tool that can automate the evaluation of such a large-scale ontology, one that skips the evaluation of the class and focuses mainly on the ontology's individuals and their semantic relationships?

Related

What is the role of probability in machine learning software? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
There are several components and techniques used in learning programs. Machine learning components include ANN, Bayesian networks, SVM, PCA and other probability-based methods. What role do techniques based on Bayesian networks play in machine learning?
It would also be helpful to know how integrating one or more of these components into applications leads to real solutions, and how software deals with limited knowledge and still produces sufficiently reliable results.
Probability and Learning
Probability plays a role in all learning. If we apply Shannon's information theory, the movement of a probability toward one of the extremes, 0.0 or 1.0, is information. The information gained, in bits, is the log_2 of the quotient of the after and before probabilities of a hypothesis. Given the probability of the hypothesis and of its logical inversion, if the probability does not increase for either, no bits of information have been learned.
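As a rough illustration of that arithmetic, here is a minimal sketch that assumes the information gained is computed as log_2(p_after / p_before). The function name bits_learned is made up for this example and does not come from any library.

```python
import math

def bits_learned(p_before: float, p_after: float) -> float:
    """Bits gained about a hypothesis whose probability moved from p_before to p_after.

    Assumption for this sketch: gain = log2(p_after / p_before).
    """
    if not (0.0 < p_before <= 1.0 and 0.0 < p_after <= 1.0):
        raise ValueError("probabilities must be in (0, 1]")
    return math.log2(p_after / p_before)

# A hypothesis that moves from a coin flip to certainty.
coin_flip_resolved = bits_learned(0.5, 1.0)
# A probability that does not move at all.
nothing_learned = bits_learned(0.5, 0.5)
```

Under this assumption, a probability that stays where it was yields zero bits, which matches the point that nothing has been learned in that case.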
Bayesian Approaches
Bayesian Networks are directed graphs that represent causality hypotheses. They are generally drawn as nodes holding conditions, connected by arrows that represent the hypothetical causes and corresponding effects. Algorithms based on Bayes' Theorem have been developed that attempt to statistically analyze causality from data that has been or is being collected.
MINOR SIDE NOTE: There are often usage constraints for the analytic tools. Most Bayesian algorithms require that the directed graph be acyclic, meaning that no sequence of arrows, followed in the direction they point, leads from a node back to itself. This is to avoid endless loops. However, there may be, now or in the future, algorithms that work with cycles and handle them seamlessly from both the mathematical-theory and software-usability perspectives.
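To make the acyclicity constraint concrete, here is a small sketch that checks whether a set of arrows forms a directed acyclic graph. It uses Kahn's algorithm, which is one common way to do this and not necessarily what any particular Bayesian toolkit uses. The node names and the is_acyclic helper are invented for illustration.

```python
from collections import deque

def is_acyclic(edges):
    """Return True if the directed graph given as (cause, effect) pairs has no directed cycle."""
    children = {}
    indegree = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])
        indegree[child] = indegree.get(child, 0) + 1
        indegree.setdefault(parent, 0)
    # Kahn's algorithm: repeatedly remove nodes that have no incoming arrows.
    queue = deque(node for node, degree in indegree.items() if degree == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Any node that could never be removed sits on (or behind) a directed cycle.
    return removed == len(indegree)

# Hypothetical network: rain and a sprinkler both cause wet grass, which causes slipperiness.
network = [("rain", "wet"), ("sprinkler", "wet"), ("wet", "slippery")]
with_feedback = network + [("slippery", "rain")]  # adds a directed loop

network_ok = is_acyclic(network)
feedback_ok = is_acyclic(with_feedback)
```

The first graph passes the check; the second one, with the arrow from "slippery" back to "rain", does not.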
Application to Learning
The application to learning is that the probabilities calculated can be used to predict potential control mechanisms. The litmus test for learning is the ability to reliably alter the future through controls. An important application is the sorting of mail from handwriting. Both neural nets and Naive Bayesian classifiers can be useful in general pattern recognition integrated into routing or manipulation robotics.
Keep in mind here that the term network has a very wide meaning. Neural Nets are not at all the same approach as Bayesian Networks, although they may be applied to similar problem-solution topologies.
Relation to Other Approaches and Mechanisms
How a system designer uses support vector machines, principal component analysis, neural nets, and Bayesian networks in multivariate time series analysis (MTSA) varies from author to author. How they tie together also depends on the problem domain and statistical qualities of the data set, including size, skew, sparseness, and the number of dimensions.
The list given includes only four of a much larger set of machine learning tools. For instance Fuzzy Logic combines weights and production system (rule based) approaches.
The year is also a factor. An answer given now might be stale next year. If I were to write software given the same predictive or control goals as I was given ten years ago, I might combine various techniques entirely differently. I would certainly have a plethora of additional libraries and comparative studies to read and analyse before drawing my system topology.
The field is quite active.

What is the difference between Yago and DBpedia taxonomies?

Both of them are widely used to type DBpedia resources, but it seems that YAGO has many more classes or concepts organized using the rdfs:subClassOf predicate. Despite this, it is not clear whether, for example, that class hierarchy is a DAG (as in DBpedia), how many classes it comprises, etc.
DBpedia is a community effort to extract structured information from Wikipedia. In this sense, both YAGO and DBpedia share the same goal of generating a structured ontology. The projects differ in their foci. In YAGO, the focus is on precision, the taxonomic structure, and the spatial and temporal dimension. For a detailed comparison of the projects, see Chapter 10.3 of our AI journal paper "YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia".
[Link: http://resources.mpi-inf.mpg.de/yago-naga/yago/publications/aij.pdf]

How to combine ontologies?

I am a beginner in the field of ontologies, ontology alignment and composition of ontologies. What is the purpose of composing ontologies, on what basis is it performed, and how?
One of the main advantages of using ontologies is knowledge sharing. Different people from various backgrounds might develop ontologies for the same domain. This will often result in different labels for the same concepts or relations. To take advantage of having multiple ontologies in the same domain, for example to build a more comprehensive and expressive domain ontology, ontology matching/alignment comes into play. In ontology matching, a mapping between the concepts and relations of the various ontologies is created.
For example, before the National Cancer Institute came up with the first version of their cancer ontology, there were multiple ontologies modelling cancer around. They started by combining the various available ontologies and creating a central, more reliable ontology.
There are various algorithms for ontology matching. The algorithms normally are categorised based on:
input
process
output
Broadly speaking, you can match either on an element-to-element basis or based on the structure. The tools that can be used for matching include linguistic resources such as WordNet for semantic matching, domain-specific resources, statistical approaches, taxonomies, various models, and so on. There is a great deal of research in this area, and you should really consider using Google Scholar.
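As a very rough sketch of the element-to-element idea, the snippet below pairs concept labels from two ontologies by string similarity, using Python's standard difflib module. This is only a stand-in for the linguistic and structural matchers mentioned above. The labels, the 0.8 threshold, and the match_labels helper are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

def match_labels(labels_a, labels_b, threshold=0.8):
    """Pair each label in labels_a with its most similar label in labels_b.

    Similarity is a plain character-based ratio; pairs below the threshold are dropped.
    """
    matches = []
    for a in labels_a:
        best_label, best_score = None, 0.0
        for b in labels_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score > best_score:
                best_label, best_score = b, score
        if best_label is not None and best_score >= threshold:
            matches.append((a, best_label, best_score))
    return matches

# Hypothetical concept labels from two cancer ontologies.
ontology_a = ["Tumor", "Carcinoma", "Patient"]
ontology_b = ["Tumour", "carcinoma", "Treatment"]

alignment = match_labels(ontology_a, ontology_b)
matched_pairs = [(a, b) for a, b, _ in alignment]
```

String similarity catches spelling variants such as "Tumor" and "Tumour", but it cannot tell that two differently named concepts mean the same thing. That is where resources such as WordNet or structure-based matching come in.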

What is the definition of "flexibility" of a method in machine learning? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I want to find the definition of "flexibility" of a method in machine learning, for methods such as Lasso, SVM, and Least Squares.
The book An Introduction to Statistical Learning shows a representation of the tradeoff between flexibility and interpretability.
I also think flexibility is a detailed numerical thing.
Because of my reputation, I cannot upload the pictures. If you want to know some details, you can read An Introduction to Statistical Learning; the pictures are on page 25 and page 31.
Thank you.
You can think of the "flexibility" of a model as the model's "curvy-ness" when graphing the model equation. A linear regression is said to be inflexible. On the other hand, if you have 9 training sets that are each very different, and you require a more rigid decision boundary, the model will be deemed flexible, just because the model can't be a straight line.
Of course, there's an essential assumption that these models are adequate representations of the training data (a linear representation doesn't work well for highly spread out data, and a jagged multinomial representation doesn't work well with straight lines).
As a result, a flexible model will:
Generalize well across the different training sets
Come at the cost of higher variance, which is why flexible models are generally associated with low bias
Perform better as complexity and/or the number of data points increases (up to a point, beyond which it won't perform better)
There's no rigorous definition of a method's flexibility. The aforementioned book says
We can try to address this problem by choosing flexible models that can fit many different possible functional forms for f.
In that sense, Least Squares is less flexible, since it's a linear model. Kernel SVM, on the contrary, doesn't have such a limitation and can model fancy non-linear functions.
Flexibility isn't measured in numbers; the picture in the book shows relational data only, not actual points on a 2D plane.
Flexibility describes the ability to increase the degrees of freedom available to the model to "fit" to the training data.
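One way to see the "degrees of freedom" reading is a toy experiment in which the same data is fitted with models that have more and more free parameters. The sketch below uses a piecewise-constant (step function) regression, where the number of bins is the number of free parameters. This model family, the sine-wave data, and the helper names are assumptions chosen for illustration; they are not taken from the book.

```python
import math

def fit_step_function(xs, ys, n_bins):
    """Fit a piecewise-constant model on [0, 1): one mean value per equal-width bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)
    # An empty bin falls back to the overall mean.
    return [s / c if c else overall for s, c in zip(sums, counts)]

def training_mse(xs, ys, means):
    """Mean squared error of the step function on the data it was fitted to."""
    n_bins = len(means)
    errors = [(y - means[min(int(x * n_bins), n_bins - 1)]) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

# Toy data: one period of a sine wave sampled at 40 evenly spaced points.
xs = [i / 40 for i in range(40)]
ys = [math.sin(2 * math.pi * x) for x in xs]

# More bins means more free parameters, i.e. a more flexible model.
mse_by_bins = {k: training_mse(xs, ys, fit_step_function(xs, ys, k)) for k in (1, 2, 4, 8)}
```

The training error shrinks as the number of bins grows, because a more flexible model can follow the training data more closely. Only error on held-out data would show the variance cost discussed above.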

Application of Machine Learning Techniques to Chemistry [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I am a computer science student and I have to choose the theme of my future research work. I really want to solve some scientific problems in chemistry (or maybe biology) using computers. I also have a huge interest in machine learning.
I have been searching the internet for a while and have found some references on that kind of problem. Unfortunately, that material is not enough for me.
So, I am interested in the community's recommendation of particular resources that present the application of an ML technique to solve a problem in chemistry--e.g., a journal article or a good book describing typical (or new) problems in chemistry being solved "in silico".
I should think that chemistry, as much as any domain, would have the richest supply of problems particularly suited for ML. The rubric of problems I have in mind is QSAR (quantitative structure-activity relationships), both for naturally occurring compounds and prospectively, e.g., drug design.
Perhaps have a look at AZOrange--an entire ML library built for the sole purpose of solving chemistry problems using ML techniques. In particular, AZOrange is a re-implementation of the highly-regarded GUI-driven ML Library, Orange, specifically for the solution of QSAR problems.
In addition, here are two particularly good ones--both published within the last year, and in both, ML is at the heart (the link is to the article's page on the Journal of Cheminformatics site and includes the full text of each article):
AZOrange-High performance open source machine learning for QSAR modeling in a graphical programming environment.
2D-Qsar for 450 types of amino acid induction peptides with a novel substructure pair descriptor having wider scope
It seems to me that the general nature of QSAR problems is ideal for study by ML:
a highly non-linear relationship between the expectation variables (e.g., "features") and the response variable (e.g., "class labels" or "regression estimates")
at least for the larger molecules, the structure-activity relationship is sufficiently complex that it is at least several generations from solution by analytical means, so any accurate prediction of these relationships can only be reliably achieved by empirical techniques
oceans of training data pairing analysis of some form of instrument-produced data (e.g., protein structure determined by x-ray crystallography) with laboratory data recording the chemical behavior of that protein (e.g., reaction kinetics)
So here are a couple of suggestions for interesting and current areas of research at the ML-chemistry interface:
QSAR prediction applying current "best practices"; for instance, the technique that won the NetFlix Prize (awarded Sept 2009) was not based on a state-of-the-art ML algorithm; instead, it used kNN. The interesting aspects of the winning technique are:
the data imputation technique--the technique for re-generating the data rows having one or more features missing; the particular technique for solving this sparsity problem is usually referred to by the term Positive Maximum Margin Matrix Factorization (or Non-Negative Maximum Margin Matrix Factorization). Perhaps there are interesting QSAR problems which were deemed insoluble by ML techniques because of poor data quality, in particular sparsity. Armed with PMMMF, these might be good problems to revisit
algorithm combination--the rubric of post-processing techniques that involve combining the results of two or more classifiers was generally known to ML practitioners prior to the NetFlix Prize, but in fact these techniques were rarely used. The most widely used of these techniques are AdaBoost, Gradient Boosting, and Bagging (bootstrap aggregation). I wonder if there are some QSAR problems for which the state-of-the-art ML techniques have not quite provided the resolution or prediction accuracy required by the problem context; if so, it would certainly be interesting to know if those results could be improved by combining classifiers. Aside from their often dramatic improvement in prediction accuracy, an additional advantage of these techniques is that many of them are very simple to implement. For instance, boosting works like this: train your classifier for some number of epochs and look at the results; identify those data points in your training data that caused the poorest resolution by your classifier--i.e., the data points it consistently predicted incorrectly over many epochs; apply a higher weight to those training instances (i.e., penalize your classifier more heavily for an incorrect prediction) and re-train your classifier with this "new" data set. (Bagging, by contrast, trains each classifier on a bootstrap resample of the training data and aggregates their predictions.)
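The reweighting loop described above is the scheme usually associated with boosting, AdaBoost in particular. Below is a minimal sketch of it, assuming one-dimensional inputs, labels in {-1, +1}, and threshold "stumps" as the weak classifiers. The toy data and all function names are made up for illustration; for real QSAR work an existing library implementation would be the more sensible choice.

```python
import math

def best_stump(xs, ys, weights):
    """Pick the (threshold, sign) pair with the lowest weighted error on 1-D data."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            preds = [sign if x >= threshold else -sign for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best

def boost(xs, ys, rounds=3):
    """AdaBoost-style loop: misclassified points get heavier after every round."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, sign = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this stump
        ensemble.append((alpha, threshold, sign))
        # Raise the weight of points this stump got wrong, lower the rest.
        preds = [sign if x >= threshold else -sign for x in xs]
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * (sign if x >= t else -sign) for alpha, t, sign in ensemble)
    return 1 if score >= 0 else -1

# Toy data: the positive class sits on both sides of the negative class,
# so no single threshold can separate the labels.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]

ensemble = boost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
```

On this toy data no single stump classifies every point correctly, while the weighted vote of three stumps does. That is the kind of improvement from combining weak classifiers that the paragraph above has in mind.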

Resources