Find OpenCyc concepts on the Web?

Is it possible to look at OpenCyc ontology content on the Web? I have been searching using Google and can find plenty of content about the Cyc ontology but no actual terms.

The OpenCyc semantic web pages are no longer available. You might try running an OpenCyc instance yourself:
https://github.com/bovlb/opencyc

Version 4 of the OpenCyc RDF ontology is hosted by an Ai Blockchain endpoint (a permanent web server), with Cool URIs as described by Tim Berners-Lee.
Here is an example URI: https://api-ai-blockchain.com/kb-endpoint/opencyc/Baseball_Ball
The returned page shows English glosses of the term, and its more specific and more general terms as hyperlinks.
In addition to OpenCyc version 4, the Ai Blockchain endpoint contains an ontology merged from WordNet 3.0 and Wiktionary word senses.
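If the endpoint is still live, such a URI can also be dereferenced programmatically. Here is a minimal Python sketch, assuming the server honours content negotiation for RDF/XML as the Cool URIs conventions suggest (the endpoint's availability is not guaranteed):

    import requests
    from rdflib import Graph

    # The example Cool URI from above; availability is not guaranteed.
    uri = "https://api-ai-blockchain.com/kb-endpoint/opencyc/Baseball_Ball"

    # Ask for machine-readable RDF rather than the human-readable HTML page.
    response = requests.get(uri, headers={"Accept": "application/rdf+xml"}, timeout=30)
    response.raise_for_status()

    # Parse the returned RDF and print the triples describing the concept.
    g = Graph()
    g.parse(data=response.text, format="xml")
    for subject, predicate, obj in g:
        print(subject, predicate, obj)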

Related

How do I know when a web site is using RDF?

How do I know when a web site uses RDF?
For example, I know that eBay and Amazon use RDF because I've read about it in many articles, but how do I verify it in practice?
In practice, there is currently no single standardized way for websites to "advertise" their use of RDF. You find out by being informed about it in some fashion, e.g. by the site publishing a link to API documentation that describes how it uses RDF, or indeed by someone writing an article about it; pretty much the same way you find out about any REST API / web service. Of course, in the case of RDF/linked data you are often helped by the fact that other datasets you already know about may link to the new source, thus making it discoverable.
There are some attempts at defining more standardized mechanisms for 'advertising' a website's linked data use. The W3C VoID specification is the closest thing to a standard in that regard: it provides a vocabulary for publishers to describe the data and access mechanisms they offer, and it also gives pointers on how to make things discoverable. Unfortunately, it is not (yet) very widely adopted.
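If you want to probe for a VoID description yourself, the specification suggests a well-known location you can check. A minimal Python sketch, assuming the site follows the /.well-known/void convention (as noted above, most currently do not):

    import requests
    from rdflib import Graph

    site = "https://example.org"  # hypothetical site to probe

    # The VoID spec suggests publishing a dataset description at this location.
    resp = requests.get(site + "/.well-known/void",
                        headers={"Accept": "text/turtle"}, timeout=30)

    if resp.ok:
        g = Graph()
        g.parse(data=resp.text, format="turtle")
        print(f"Found a VoID description with {len(g)} triples")
    else:
        print("No VoID description advertised at the well-known location")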

How to link a domain ontology with WordNet synsets?

I have created a domain ontology for emotions. My task is to detect emotions conveyed in comments posted on some forum. I want to populate each class of emotion (e.g., Joy) with its synonyms (e.g., Happy, Glad, etc.) using WordNet synsets, but I don't know how. I tried using IRIs to create individuals, but no luck so far. I'm using Protégé 4.3.
Previously, a plugin called WordNetTab was available, but it is no longer supported (http://protege.cim3.net/cgi-bin/wiki.pl?WordNetTab).
Any suggestions?
As far as I know, there are no plugins that work with Protégé version 4.3 or greater. (I looked at the Princeton WordNet Related Projects page; OntoLing was a plugin someone created, but it works with Protégé 3.2 only.) You may have to resort to one of the following:
1) Rolling back to a much older version of Protégé.
2) Using a Java (or your preferred programming language) API for WordNet (like JAWS) along with an ontology framework in the same language (like the OWL API or Apache Jena) to create these links; see the sketch after this list.
3) Writing a Protégé plugin for WordNet yourself!
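As a rough illustration of option 2, here is a Python sketch that uses NLTK's WordNet corpus together with rdflib in place of the Java stack (JAWS plus the OWL API); the namespace and the Joy class are made up for the example:

    from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import OWL, RDF, RDFS

    EMO = Namespace("http://example.org/emotions#")  # hypothetical ontology namespace
    g = Graph()
    g.bind("emo", EMO)

    # Populate the (hypothetical) Joy class with individuals from WordNet synonyms.
    for synset in wn.synsets("joy", pos=wn.NOUN):
        for lemma in synset.lemma_names():
            individual = EMO[lemma]
            g.add((individual, RDF.type, OWL.NamedIndividual))
            g.add((individual, RDF.type, EMO.Joy))
            g.add((individual, RDFS.label, Literal(lemma.replace("_", " "))))

    # Write RDF/XML that can then be imported into Protégé.
    g.serialize(destination="emotions_populated.owl", format="xml")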

HL7 version 3 parsing

I was parsing HL7 version 2.x messages through HAPI. Now I want to parse HL7 version 3 messages, which are in XML format. HAPI does not support HL7 version 3, so how can I do this?
HL7 version 3 is essentially XML-formatted HL7 data. As such, you could use any old XML parser. That said, you would have to build in the intelligence regarding segments and the like yourself.
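For example, here is a bare-bones Python sketch using the standard library XML parser; the element paths are illustrative, since real HL7 v3 payloads vary by message type, but they all share the urn:hl7-org:v3 namespace:

    import xml.etree.ElementTree as ET

    NS = {"hl7": "urn:hl7-org:v3"}  # the HL7 v3 XML namespace

    tree = ET.parse("message.xml")  # hypothetical v3 message file
    root = tree.getroot()

    # Pull a couple of illustrative fields; exact paths depend on the message type.
    msg_id = root.find("hl7:id", NS)
    if msg_id is not None:
        print("Message id:", msg_id.get("root"), msg_id.get("extension"))

    for name in root.findall(".//hl7:patient/hl7:name", NS):
        given = name.findtext("hl7:given", default="", namespaces=NS)
        family = name.findtext("hl7:family", default="", namespaces=NS)
        print("Patient:", given, family)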
It does, however, appear that there is an HL7 v3 Java Special Interest Group, which has developed an API at least for RIM.
Another option would be to look at an integration engine. An open source option here is Mirth, an interface integration engine. It is a separate product, not a library you would integrate into your own application. It can, however, take over the heavy lifting of converting HL7 into something more useful to your application: a web service call, a database insert, or a differently formatted file (PDF, EDI, etc.).
Mohawk College publishes a Free and Open Source (FLOSS) API Framework for HL7 version 3 messaging and CDA Document processing called the "Everest Framework".
This framework is available for Java and .NET and comes with extensive examples and documentation on how to use HL7v3 messaging.
You can download the framework at https://github.com/MohawkMEDIC/everest.
Support is also available via the GitHub project page.
This framework was developed through grant funding provided by the Natural Sciences and Engineering Research Council of Canada (NSERC) and Canada Health Infoway.
I used the HL7 Java SIG some time ago (2008), but it is very easy to either 1. create your own parser from the schemas using JAXB (see "Generate Java classes from .XSD files...?"), or 2. create your own parser from scratch (I would suggest Groovy's XMLSlurper: http://www.groovy-lang.org/processing-xml.html).
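If you are working in Python rather than Java, a schema-binding library plays the role JAXB does; for instance, a sketch with the third-party xmlschema package (the schema and message file names are illustrative):

    import xmlschema  # pip install xmlschema

    # Load an HL7 v3 schema and decode a message against it,
    # much as JAXB-generated classes would in Java.
    schema = xmlschema.XMLSchema("POCD_MT000040.xsd")  # hypothetical schema file

    # Validate and decode into plain Python dicts/lists in one step.
    data = schema.to_dict("message.xml")
    print(data)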
You were asking for a link to the official parser for HL7 v3. I'll admit it's not easy to find (go to the section under "v3 Utilities"), but here it is:
http://www.hl7.org/participate/toolsandresources.cfm?ref=nav
They have some examples and data files to test with as well.

Generating RDF From Natural Language

Are there any tools available for generating RDF from natural language? A list of RDFizers compiled by the SIMILE project only mentions one, the Monrai Cypher. Unfortunately, it seems to have been a proprietary tool developed by Monrai Technologies, which has since disappeared, and I can't find any download links. Has anyone seen anything similar?
You want some ontology learning and population tools.
This online article lists four different systems: Text2Onto, Abraxas, KnowItAll, and OntoLearn.
You may also want to check out this book, which reviews several ontology learning tools: Ontology Learning from Text: Methods, Evaluation and Applications, by Paul Buitelaar, Philipp Cimiano, and Bernardo Magnini.
You might look into OpenCalais, Zemanta, and Hakia, which all have nice APIs for extracting semantic data out of internet resources. I'm not familiar with Monrai Cypher, but these might possibly help.
You could use the Python NLTK to parse the text and emit RDF triples.
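For instance, here is a very rough sketch that extracts subject-verb-object patterns with NLTK's part-of-speech tagger and emits them as triples with rdflib; a real system would need much more robust parsing than this:

    import nltk  # requires: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
    from rdflib import Graph, Namespace

    EX = Namespace("http://example.org/")  # hypothetical namespace

    def sentence_to_triple(sentence):
        """Naive subject-verb-object extraction from a single sentence."""
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        nouns = [word for word, tag in tagged if tag.startswith("NN")]
        verbs = [word for word, tag in tagged if tag.startswith("VB")]
        if len(nouns) >= 2 and verbs:
            return (EX[nouns[0]], EX[verbs[0]], EX[nouns[1]])
        return None

    g = Graph()
    triple = sentence_to_triple("Cats chase mice")
    if triple:
        g.add(triple)
    print(g.serialize(format="turtle"))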

Looking for an information retrieval / text mining application or library

We extract various information from e-mails: flights, car rentals, hotels, and more. The method is to extract the body of the mail, usually in HTML form but sometimes plain text, or to use the information in a PDF/Word/RTF attachment. We then apply regular expressions (sometimes in several steps) to get the information, which is provided in tabular form (think of a flight table, hotel table, etc.). Note that even though we parse HTML, this is not web scraping.
Currently we are using QL2's WebQL engine, but we are looking to replace it for business reasons. Can you recommend another engine? It must run on Linux and be accessible from Java (a Java API would be best, but a web service is a good solution as well). It must also support regular expressions for text extraction rather than relying solely on the HTML structure.
I recommend that you have a look at R. It has an extensive number of text mining packages: have a look at the Natural Language Processing view. In particular, look at the tm package. Here are some relevant links:
Paper about the package in the Journal of Statistical Software: http://www.jstatsoft.org/v25/i05/paper. The paper includes a nice example analysis of R-devel mailing list (https://stat.ethz.ch/pipermail/r-devel/) postings from 2006.
Package homepage: http://cran.r-project.org/web/packages/tm/index.html
Look at the introductory vignette: http://cran.r-project.org/web/packages/tm/vignettes/tm.pdf
In addition, R provides many tools for parsing HTML or XML. Have a look at this question for an example using the RCurl and XML packages.
Edit: You can integrate R with Java with JRI. It's a very widely used package, with many examples. You can also see these related questions.
Have a look at:
LingPipe - a suite of Java libraries for the linguistic analysis of human language.
Apache Lucene - a high-performance, full-featured text search engine library written entirely in Java.
Just wanted to update: our final decision was to implement the parsing in Groovy, and to add some required functionality (HTML to text, PDF to text, whitespace cleanup, etc.) either by implementing it in Java or by relying on third-party libraries.
I use a custom parser made with Flex and C++ for similar purposes. I'd suggest you take a look at parser generators in Java (JavaCC .jj files; see the JavaCC FAQ). Nutch does it this way (NutchAnalysis.jj).
