RDF file for language synonyms - search-engine

I am trying to build my own search engine and I want to add an ontology to it, so I am looking for an RDF ontology that contains all English-language synonyms.
I already tried to search for RDF files on this website, but I didn't find anything helpful:
DataSetRDFDumps
I have also looked for existing files or a way to make one myself, but I can't find anything helpful. Can anyone help me?

A good place to look for language-related RDF data is the Linguistic Linked Open Data site.
English-language synonyms should be in WordNet, which has an RDF version available:
WordNet RDF
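If you go the WordNet RDF route, loading the dump with Jena and inspecting it is straightforward. A minimal sketch (the file name is a placeholder for whatever dump you download, and the exact predicates depend on the RDF version you use):

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.StmtIterator;

    public class LoadWordNet {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            // "wordnet.nt" is a placeholder for the downloaded dump file
            model.read("wordnet.nt", "N-TRIPLES");
            System.out.println("Loaded " + model.size() + " triples");

            // print a few statements to see which predicates the dump actually uses
            StmtIterator it = model.listStatements();
            for (int i = 0; i < 10 && it.hasNext(); i++) {
                System.out.println(it.nextStatement());
            }
        }
    }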

Related

How to link a domain ontology with WordNet synsets?

I have created a domain ontology for emotions. My task is to detect emotions conveyed in comments posted on some forum. I want to populate each class of emotion (e.g., Joy) with its synonyms (e.g., Happy, Glad, etc.) using WordNet synsets, but I don't know how. I tried using IRIs to create individuals, but no luck so far. I'm using Protégé 4.3.
Previously, a plugin called WordNetTab was available, but it is no longer supported (http://protege.cim3.net/cgi-bin/wiki.pl?WordNetTab).
Any suggestions?
As far as I know, there are no plugins that work with Protégé version 4.3 or greater. (I looked at the WordNet Princeton Related Projects page; OntoLing was a plugin someone created, but it works with Protégé 3.2 only.) You may have to resort to:
1) Rolling back to a much older version of Protégé.
2) Using a Java API for WordNet (like JAWS), or one in your preferred programming language, together with an ontology framework (like the OWL API or Apache Jena) to create these links; see the sketch after this list.
3) Writing a Protégé plugin for WordNet yourself!
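For option 2, a minimal sketch of what that could look like with JAWS and Jena (the namespace, class name, and WordNet path are assumptions for illustration, not prescribed values):

    import java.io.FileOutputStream;

    import org.apache.jena.ontology.OntClass;
    import org.apache.jena.ontology.OntModel;
    import org.apache.jena.rdf.model.ModelFactory;

    import edu.smu.tspell.wordnet.Synset;
    import edu.smu.tspell.wordnet.SynsetType;
    import edu.smu.tspell.wordnet.WordNetDatabase;

    public class EmotionPopulator {
        public static void main(String[] args) throws Exception {
            // point JAWS at a local WordNet installation (path is machine-specific)
            System.setProperty("wordnet.database.dir", "/usr/local/WordNet-3.0/dict");
            WordNetDatabase wordnet = WordNetDatabase.getFileInstance();

            OntModel model = ModelFactory.createOntologyModel();
            String ns = "http://example.org/emotions#";   // hypothetical namespace
            OntClass joy = model.createClass(ns + "Joy");

            // add every word form of every "joy" noun synset as an individual of Joy
            for (Synset synset : wordnet.getSynsets("joy", SynsetType.NOUN)) {
                for (String form : synset.getWordForms()) {
                    joy.createIndividual(ns + form.replace(' ', '_'));
                }
            }
            model.write(new FileOutputStream("emotions.owl"), "RDF/XML");
        }
    }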

Making an ontology in Protégé or with the Jena API library in Eclipse (Java coding)

Is it also possible to build an ontology with the Jena API, for example by Java coding in Eclipse? If yes, what is the difference between making an ontology in Protégé and making one with the Jena API in Eclipse? I am really confused about the differences, as I am not good at Java programming.
Thanks a lot.
It depends on what you actually want to do. Are you trying to build a new ontology from scratch? Do you have an existing ontology that you want to extend by adding some data? Are you generating lots of triples using pre-existing classes and properties, based on some data you're already processing in Java?
Protégé offers a GUI that lets you create an ontology more quickly, and it gives you a graphical overview of the ontology as you build it. It's sometimes hard to get the big picture even with all this help. It's a powerful tool for overall ontology authoring. Writing an entire ontology by scribbling Java code line by line seems a gruelling task to me.
Writing plain Java code to create an ontology would be very difficult and inefficient. On top of the complexity of RDF itself, you'd need to understand the Jena API. IMO, it only makes sense if you have a well-defined ontology and you're really sure what kind of triples you want to add. In that case, using Java code to interface with some data source could save you a lot of time, provided you know Java well enough to do it efficiently.
In your case, sticking with Protégé seems the most reasonable option.
The answer is yes: it's perfectly possible to create a new ontology through the Jena API and Java code.
The difference between Protégé and plain old Java code is the task you need to perform: if you need to inspect an ontology or create a few concepts manually, Protégé allows you to do it fairly quickly without writing code first. If you need to execute some repetitive task on a large number of entities, or carry out something else for which Protégé does not cater, then you're better off writing Java code.
This is not different from a question like: is it better to use a text editor or write a Perl script to edit text? It very much depends on the exact task at hand. Both are possible.
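For completeness, here is a minimal sketch of building a small ontology purely in Java with the Jena ontology API (the namespace and all class and property names are made up for illustration):

    import org.apache.jena.ontology.DatatypeProperty;
    import org.apache.jena.ontology.OntClass;
    import org.apache.jena.ontology.OntModel;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.vocabulary.XSD;

    public class BuildOntology {
        public static void main(String[] args) {
            OntModel model = ModelFactory.createOntologyModel();
            String ns = "http://example.org/demo#";       // hypothetical namespace

            OntClass vehicle = model.createClass(ns + "Vehicle");
            OntClass car = model.createClass(ns + "Car");
            vehicle.addSubClass(car);                     // Car rdfs:subClassOf Vehicle

            DatatypeProperty serial = model.createDatatypeProperty(ns + "serialNumber");
            serial.addDomain(vehicle);
            serial.addRange(XSD.xstring);

            model.write(System.out, "TURTLE");            // print the ontology as Turtle
        }
    }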

Authoring and maintaining multilingual documentation

I need to author and maintain multilingual technical documentation, where each document is made up of some "standard" portions and some specific to the product (industrial equipment): standard portions could be warnings, quotes from laws or regulations, common sentences, or the like.
Each "portion" (i.e., a sentence, a paragraph, an image with its caption, an annotation, or a paragraph with an icon) is translated into 13 languages, and each translation should be versioned and annotated with its author.
Hence each document would be a "composition" of those portions: a language-specific instance of the document is one that uses the portions in that specific translation.
Are there any specific technologies, standards, or tools for doing that?
The standard is DITA, and since it is essentially XML-based, there are plenty of tools to work with it: both standalone XML editors (oXygen XML, XMetaL, Adobe FrameMaker; for a full reference, see the maintained list of editors) and extensions to well-known CMSs (Drupal, Joomla) and ECMs (Alfresco and MS SharePoint already have them; others, like Nuxeo, can handle DITA by configuring the XML grammar according to the DITA specs).
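As an illustration of how DITA handles the "standard portions" requirement, reusable elements can live in a shared topic and be pulled into product documents via conref (a minimal sketch; all file and id names are invented):

    <!-- shared.dita: a library of standard portions -->
    <topic id="shared" xml:lang="en">
      <title>Standard portions</title>
      <body>
        <note id="hv-warning" type="danger">Disconnect power before servicing.</note>
      </body>
    </topic>

    <!-- pump-manual.dita: a product document reusing the warning -->
    <topic id="pump-manual" xml:lang="en">
      <title>Pump manual</title>
      <body>
        <note conref="shared.dita#shared/hv-warning"/>
      </body>
    </topic>

A language-specific instance is then produced by resolving the same conrefs against the shared topics translated into that language.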

Generating RDF From Natural Language

Are there any tools available for generating RDF from natural language? A list of RDFizers compiled by the SIMILE project only mentions one, the Monrai Cypher. Unfortunately, it seems to have been a proprietary tool developed by Monrai Technologies, which has since disappeared, and I can't find any download links. Has anyone seen anything similar?
You want some ontology learning and population tools.
This online article lists 4 different systems:
Text2Onto,
Abraxas,
KnowItAll,
OntoLearn
You may also want to check out this book, which reviews several ontology learning tools:
Ontology Learning from Text: Methods, Evaluation and Applications, by Paul Buitelaar, Philipp Cimiano, and Bernardo Magnini
You might look into OpenCalais, Zemanta, and Hakia, which all have nice APIs for extracting semantic data from internet resources. I'm not familiar with Monrai Cypher, but these might help.
You could use the Python NLTK to parse the text and emit the RDF triples.

Looking for an information retrieval / text mining application or library

We extract various information from e-mails - flights, car rentals, hotels, and more. The method is to extract the body of the mail, usually in HTML form but sometimes as plain text, or to use the information in a PDF/Word/RTF attachment. We then apply regular expressions (sometimes in several steps) to extract the information, which is provided in tabular form (think of a flight table, hotel table, etc.). Note that even though we parse HTML, this is not web scraping.
Currently we are using QL2's WebQL engine, but we are looking to replace it for business reasons. Can you recommend another engine? It must run on Linux and be accessible from Java (a Java API would be best, but web services are a good solution as well). It must also support regular expressions for text extraction, not just extraction based on the HTML structure.
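To illustrate the kind of extraction involved, here is a minimal sketch using plain Java regular expressions (the pattern and field names are invented examples, not our production rules):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class FlightExtractor {
        // hypothetical pattern; real ones would be tuned per mail template
        private static final Pattern FLIGHT = Pattern.compile(
                "Flight\\s+(?<number>[A-Z]{2}\\d{2,4})\\s+departs\\s+(?<time>\\d{1,2}:\\d{2})");

        public static void main(String[] args) {
            String body = "Your Flight LH1234 departs 09:40 from FRA.";
            Matcher m = FLIGHT.matcher(body);
            while (m.find()) {
                // each match becomes one row of the flight table
                System.out.println(m.group("number") + " | " + m.group("time"));
            }
        }
    }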
I recommend that you have a look at R. It has many text mining packages: have a look at the Natural Language Processing task view. In particular, look at the tm package. Here are some relevant links:
Paper about the package in the Journal of Statistical Software: http://www.jstatsoft.org/v25/i05/paper. The paper includes a nice example of an analysis of postings from 2006 to the R-devel mailing list (https://stat.ethz.ch/pipermail/r-devel/).
Package homepage: http://cran.r-project.org/web/packages/tm/index.html
Look at the introductory vignette: http://cran.r-project.org/web/packages/tm/vignettes/tm.pdf
In addition, R provides many tools for parsing HTML or XML. Have a look at this question for an example using the RCurl and XML packages.
Edit: You can integrate R with Java via JRI. It's a very widely used package, with many examples. You can also see these related questions.
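A minimal JRI sketch of calling R from Java (assuming R, the tm package, and the JRI native library are installed, with JRI on java.library.path):

    import org.rosuda.JRI.REXP;
    import org.rosuda.JRI.Rengine;

    public class RBridge {
        public static void main(String[] args) {
            // start an embedded R session
            Rengine engine = new Rengine(new String[] {"--vanilla"}, false, null);
            if (!engine.waitForR()) {
                System.err.println("Cannot start R");
                return;
            }
            engine.eval("library(tm)");                   // load the text-mining package
            REXP res = engine.eval("nchar(\"hello world\")");
            System.out.println("R returned: " + res.asInt());
            engine.end();
        }
    }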
Have a look at:
LingPipe - LingPipe is a suite of Java libraries for the linguistic analysis of human language.
Lucene - Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java.
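To give a feel for Lucene's API, here is a minimal indexing-and-search sketch (written against the Lucene 8.x API; the field name and document text are made up):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.ByteBuffersDirectory;
    import org.apache.lucene.store.Directory;

    public class MailIndexDemo {
        public static void main(String[] args) throws Exception {
            StandardAnalyzer analyzer = new StandardAnalyzer();
            Directory dir = new ByteBuffersDirectory();   // in-memory index

            // index one e-mail body
            IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer));
            Document doc = new Document();
            doc.add(new TextField("body", "Your flight AB123 departs at 10:45", Field.Store.YES));
            writer.addDocument(doc);
            writer.close();

            // search it back
            IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
            Query query = new QueryParser("body", analyzer).parse("flight");
            for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                System.out.println(searcher.doc(hit.doc).get("body"));
            }
        }
    }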
Just wanted to update - our final decision was to implement the parsing in Groovy, and to add some required functionality (HTML to text, PDF to text, whitespace cleanup, etc.) either by implementing it in Java or by relying on 3rd-party libraries.
I use a custom parser made with Flex and C++ for similar purposes. I'd suggest you take a look at parser generators in Java (JavaCC, .jj files); see the javacc-faq. Nutch does it this way (NutchAnalysis.jj).
