How to link a domain ontology with WordNet synsets?

I have created a domain ontology for emotions. My task is to detect emotions conveyed in comments posted on a forum. I want to populate each emotion class (e.g., Joy) with its synonyms (e.g., Happy, Glad, etc.) using WordNet synsets, but I don't know how. I tried using IRIs to create individuals, but no luck so far. I'm using Protégé 4.3.
Previously, a plugin called WordNetTab was available, but it is no longer supported (http://protege.cim3.net/cgi-bin/wiki.pl?WordNetTab).
Any suggestions?

As far as I know, there are no plugins that work with Protégé version 4.3 or greater. (I looked at the WordNet Princeton Related Projects page; OntoLing was a plugin someone created, but it works with Protégé 3.2 only.) You may have to resort to one of the following:
1) Rolling back to a much older version of Protégé.
2) Using a Java (or your preferred programming language) API for WordNet (such as JAWS) along with a Java-based ontology framework (such as the OWL API or Apache Jena) to create these links; see the sketch after this list.
3) Writing a Protégé plugin for WordNet yourself!
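For option 2, here is a minimal sketch of the idea, assuming the OWL API together with JAWS. The file name, namespace, WordNet path, and class IRI below are hypothetical placeholders for your own setup:

    import java.io.File;
    import edu.smu.tspell.wordnet.Synset;
    import edu.smu.tspell.wordnet.SynsetType;
    import edu.smu.tspell.wordnet.WordNetDatabase;
    import org.semanticweb.owlapi.apibinding.OWLManager;
    import org.semanticweb.owlapi.model.*;

    public class PopulateEmotions {
        public static void main(String[] args) throws Exception {
            // JAWS reads the WordNet dictionary location from a system property.
            System.setProperty("wordnet.database.dir", "/usr/share/wordnet/dict");
            WordNetDatabase wn = WordNetDatabase.getFileInstance();

            OWLOntologyManager man = OWLManager.createOWLOntologyManager();
            // "emotions.owl" and the namespace below are hypothetical.
            OWLOntology ont = man.loadOntologyFromOntologyDocument(new File("emotions.owl"));
            OWLDataFactory df = man.getOWLDataFactory();
            String ns = "http://example.org/emotions#";
            OWLClass joy = df.getOWLClass(IRI.create(ns + "Joy"));

            // Each word form in each noun synset of "joy" becomes an individual of Joy.
            for (Synset synset : wn.getSynsets("joy", SynsetType.NOUN)) {
                for (String word : synset.getWordForms()) {
                    OWLNamedIndividual ind =
                            df.getOWLNamedIndividual(IRI.create(ns + word.replace(' ', '_')));
                    man.addAxiom(ont, df.getOWLClassAssertionAxiom(joy, ind));
                }
            }
            man.saveOntology(ont);
        }
    }

Run something like this once per emotion class; saving writes the class assertions back into the ontology document, which Protégé can then open normally.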

Related

Differences between classes in the backtype.storm, org.apache.storm & com.twitter.heron packages

I want to write some custom schedulers for Apache Heron, and I'm diving a little deep into the source code. I noticed that the Heron source contains a couple of packages with similar classes. For example, most of the classes in backtype.storm and org.apache.storm are similar (in fact, their code is identical). There are also some similar classes between these two packages and com.twitter.heron (for example, com.twitter.heron.api.tuple.Fields), but some of them differ internally (such as the Fields class). I know that when writing topologies we can import whichever package we want and choose between them, but I'm curious about the differences between them, and why all of these packages sit side by side instead of being merged. And if the Storm classes are the only choice for writing topologies, what are the classes in the com.twitter.heron package good for?
I know that Heron is designed to be fully backward compatible with Storm, and this duplication might exist for that reason, but I have to admit it has confused me a lot, because I need to write my own code inside these classes and I don't know which one to choose: which one is being actively developed and maintained, and should therefore be my candidate to modify?
Thanks in advance.
Based on the descriptions from the developer team here:
Use of heron api classes is not recommended - since we might change them frequently. They are meant for internal usage only.
backtype.storm is for applications written against pre-1.0.0 Storm. For post-1.0.0 applications, you should use org.apache.storm.

Making an ontology in Protégé or with the Jena API library in Eclipse (Java coding)

Is it also possible to build an ontology with the Jena API, for example by writing Java code in Eclipse? If yes, what is the difference between making an ontology in Protégé and making one with the Jena API in Eclipse? I am really confused about the differences, as I am not good at Java programming.
Lots of thanks
It depends on what you actually want to do. Are you trying to build a new ontology from scratch? Do you have an existing ontology that you want to extend by adding some data? Are you generating lots of triples using pre-existing classes and properties, based on some data you're already processing in Java?
Protégé offers a GUI that allows you to create the ontology more quickly. It also gives you a graphical overview of the ontology as it is created; even with all this help, it's sometimes hard to get the big picture. It's a powerful tool for overall ontology authoring. Writing an entire ontology by scribbling Java code line by line seems a gruelling task to me.
Writing plain Java code to create an ontology would be very difficult and inefficient. On top of the complexity of RDF itself, you'd need to understand the Jena API. IMO, it only makes sense if you have a well-defined ontology and you're really sure what kind of triples you want to add. In such a case, using Java code to interface with some data source could save you a lot of time, provided that you know Java well enough to do it efficiently.
In your case, sticking with Protege seems the most reasonable option.
The answer is yes: it's perfectly possible to create a new ontology through the Jena API and Java code.
The difference between Protégé and plain old Java code lies in the task you need to perform: if you need to inspect an ontology or create a few concepts manually, Protégé allows you to do it fairly quickly without writing code first. If you need to execute some repetitive task on a large number of entities, or carry out something else for which Protégé does not cater, then you're better off writing Java code.
This is no different from asking: is it better to use a text editor or to write a Perl script to edit text? It very much depends on the exact task at hand. Both are possible.
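As an illustration, here is a minimal sketch of building a small ontology programmatically with Jena; the namespace and class names are made up:

    import org.apache.jena.ontology.OntClass;
    import org.apache.jena.ontology.OntModel;
    import org.apache.jena.rdf.model.ModelFactory;

    public class BuildOntology {
        public static void main(String[] args) {
            String ns = "http://example.org/demo#"; // hypothetical namespace
            OntModel model = ModelFactory.createOntologyModel();

            // Create a small class hierarchy programmatically.
            OntClass emotion = model.createClass(ns + "Emotion");
            OntClass joy = model.createClass(ns + "Joy");
            emotion.addSubClass(joy);

            // Add an individual, then serialize the ontology as RDF/XML.
            model.createIndividual(ns + "happiness", joy);
            model.write(System.out, "RDF/XML");
        }
    }

The resulting file can be opened in Protégé, so the two approaches are complementary rather than exclusive.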

Generating RDF From Natural Language

Are there any tools available for generating RDF from natural language? A list of RDFizers compiled by the SIMILE project only mentions one, the Monrai Cypher. Unfortunately, it seems to have been a proprietary tool developed by Monrai Technologies, which has since disappeared, and I can't find any download links. Has anyone seen anything similar?
You want some ontology learning and population tools.
This online article lists four different systems:
Text2Onto
Abraxas
KnowItAll
OntoLearn
You may also want to check out this book, which reviews several ontology learning tools:
Ontology learning from text: methods, evaluation and applications, by Paul Buitelaar, Philipp Cimiano, Bernardo Magnini
You might look into OpenCalais, Zemanta, and Hakia, which all have nice APIs for extracting semantic data out of internet resources. I'm not familiar with Monrai Cypher, but these might possibly help.
You could use the Python NLTK to parse the text and emit the RDF triplets.
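The NLTK suggestion is Python; purely to illustrate the "emit RDF triples" half of that pipeline in Java with Jena instead (all URIs here are made up), a minimal sketch might look like this:

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Property;
    import org.apache.jena.rdf.model.Resource;

    public class EmitTriples {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            // Suppose the NL pipeline extracted ("John Smith", worksFor, "Acme").
            Resource subject = model.createResource("http://example.org/JohnSmith");
            Property predicate = model.createProperty("http://example.org/worksFor");
            Resource object = model.createResource("http://example.org/Acme");
            model.add(subject, predicate, object);
            model.write(System.out, "TURTLE");
        }
    }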

Looking for an information retrieval / text mining application or library

We extract various information from e-mails: flights, car rentals, hotels, and more. The method is to extract the body of the mail, usually in HTML form (but sometimes plain text), or to use the information in a PDF/Word/RTF attachment. We then apply regular expressions (sometimes in several steps) to get the information, which is provided in tabular form (you can think of a flight table, hotel table, etc.). Note that even though we parse HTML, this is not web scraping.
Currently we are using QL2's WebQL engine, but we are looking to replace it for business reasons. Can you recommend another engine? It must run on Linux and be accessible from Java (a Java API would be best, but web services are a good solution as well). It must also support regular expressions for text extraction, rather than relying solely on the HTML structure.
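Purely to illustrate the kind of regex-based extraction described above (the pattern and sample line are invented), a minimal Java sketch:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class FlightExtractor {
        // Toy pattern for lines like "Flight LH123 from FRA to JFK"; real
        // confirmation e-mails would need far more robust expressions.
        private static final Pattern FLIGHT = Pattern.compile(
                "Flight\\s+([A-Z]{2}\\d{2,4})\\s+from\\s+([A-Z]{3})\\s+to\\s+([A-Z]{3})");

        public static void main(String[] args) {
            String body = "Flight LH123 from FRA to JFK"; // HTML already stripped to text
            Matcher m = FLIGHT.matcher(body);
            while (m.find()) {
                System.out.printf("flight=%s origin=%s destination=%s%n",
                        m.group(1), m.group(2), m.group(3));
            }
        }
    }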
I recommend that you have a look at R. It has an extensive number of text mining packages: have a look at the Natural Language Processing view. In particular, look at the tm package. Here are some relevant links:
Paper about the package in the Journal of Statistical Software: http://www.jstatsoft.org/v25/i05/paper. The paper includes a nice example analyzing postings from 2006 to the R-devel mailing list (https://stat.ethz.ch/pipermail/r-devel/).
Package homepage: http://cran.r-project.org/web/packages/tm/index.html
Look at the introductory vignette: http://cran.r-project.org/web/packages/tm/vignettes/tm.pdf
In addition, R provides many tools for parsing HTML or XML. Have a look at this question for an example using the RCurl and XML packages.
Edit: You can integrate R with Java using JRI. It's a very widely used package, with many examples. You can also see these related questions.
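A minimal sketch of the JRI route, assuming R and the JRI native library are installed and on java.library.path:

    import org.rosuda.JRI.REXP;
    import org.rosuda.JRI.Rengine;

    public class RFromJava {
        public static void main(String[] args) {
            // Start an embedded R session (no main loop, no callbacks).
            Rengine re = new Rengine(new String[]{"--vanilla"}, false, null);
            REXP result = re.eval("sum(1:10)");
            System.out.println(result.asDouble()); // prints 55.0
            re.end();
        }
    }

From there, evaluating library(tm) in the same way gives you access to the tm package from Java.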
Have a look at:
LingPipe - LingPipe is a suite of Java libraries for the linguistic analysis of human language.
Lucene - Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java.
Just wanted to update: our final decision was to implement the parsing in Groovy, and to add some required functionality (HTML to text, PDF to text, whitespace cleanup, etc.) either by implementing it in Java or by relying on third-party libraries.
I use a custom parser made with Flex and C++ for similar purposes. I'd suggest you take a look at parser generators in Java (JavaCC .jj files); see the javacc-faq. Nutch does it this way (NutchAnalysis.jj).

Tools to manage semantic webs

I've seen a lot of frameworks for creating a semantic web (or rather the model beneath it). What tools are there to create a small semantic web or repository on the desktop, for example for personal information management?
Please include information on how easy these are to use for a casual user (in contrast to someone who has worked in this area for years). I'd like to hear which tools can create a repository without a lot of types up front, where you can type the nodes later as you learn about your problem domain.
For personal semantic information management on the desktop there is NEPOMUK. There are two versions: one is embedded in KDE 4 and lets you tag, rate, and comment on things such as files, folders, pictures, MP3s, etc. on the desktop, across all applications.
The other version is written in Java and is OS-independent; it is more of a research prototype, with more features but overall less stability.
For KDE-Nepomuk see http://nepomuk.kde.org/
For Java-Nepomuk see http://dev.nepomuk.semanticdesktop.org/ and http://dev.nepomuk.semanticdesktop.org/download/ for downloads (the DFKI version is better)
Extensive list of semantic web tools
Also check out Protégé.
If you need to create a small model, I suggest you use TopBraid. I have used it to create much larger models, and I know people who have used it to create humongous models. It comes packaged with a set of reasoners, provides the ability to plug in a custom reasoner, and, should you decide to make your model larger, you can even integrate TopBraid with a triple store like AllegroGraph.
And since it's based on Eclipse, getting started with it is relatively easy.
For developers who are spoiled by working in more mature programming languages like Java (IDEA, anyone?), TopBraid is the closest thing to an actual IDE.
Chandler is "a notebook you can organize, back up and share!" It seems to be pretty simple to use.
OS: Windows, Mac, Linux
