Creating an ontology for an agglutinative language - ontology

I am sorry if the question is to vague, but I am very new to this concept and don't know where to go.
If you are developing an ontology for a agglutinative language, what is the correct way to identify the terms?
There are not so many examples in English, but :
telling, tell and told basically refer to same concept. How do you deal with that?

The process of finding base or root form (stem) of inflected or derived words is called stemming. There are libraries for doing this, for example NLTK stem package.

Related

Finding Java Grammars

I'm making my way into Rascal - nice work!
I'm halfway through the Tutor and might be getting ahead of myself, but one of my interests is refactoring Java. I've watched Tijs van der Storm's Curry On 2016 talk where he presents an example of this (the "TrafoFields" module around minute 16 in the video).
I was eager to get into this kind of work so I searched the documentation for the Java grammar and a similar example using it, to no avail. What's more, the library documentation has under lang::java only m3 and jdt. I reloaded Tijs' video to find he uses lang::java::\syntax::Java15. I blindly tried importing that in the Repl and it worked (albeit with lots of warnings)! I opened the Rascal .jar file to find there is even more in this package.
So my questions in this context are:
Why is this not in the documentation? I would have expected that the library documentation is exhaustive. Couldn't you at least add "TrafoFields" to the recipes?!
Is there an alternative way of finding out about such modules besides the online documentation (and apart from searching the .jar file)?
What is this weird backslash in the module name before "syntax", ::\syntax?
All good questions; and also implied suggestions for improvement. In the meantime if you get stuck please ask more questions.
Short answer: we've prioritized documenting what is used most and what is used in courses at universities and then what we've received questions about. So thanks for asking.
In reverse order:
The backslash is a generic escape to use identifiers in Rascal which are also keywords in the language. So any variable or package name or module name could be named "if" but you'd have to escape it. This helps a lot when defining abstract syntax trees for programming languages, as you can imagine where "if" would be a great name for an ast node type. "Syntax" happens to be a keyword in Rascal too.
The Rascal explorer view in eclipse can be used to browse through the library interactively. Also you can crawl the library using util::FileSystem and get a first class representation of everything that is in a library like |std:///|. Agreed, these are poorman's solutions and a good feature would a fully indexed searchable library. We're halfway there (see the analysis::text:: Lucene pre-work we've done towards supporting this.
That is a rhetorical question which I accept as a suggestion for improvement 😉😊 the answer is yes.

Creating parsers using flex/bison

Hi I'm need to create a parser to parse search engine advanced query languages:
For instance: “food” language:es
I want to use Flex and Bison but I've never used them. I was wondering if anyone could point me to a good tutorial online, then it would be really helpful. I've been looking online but I didn't find anything useful.
Also, If anyone can provide any sample flex/bison code, I would really appreciate it.
Thanks so much in advance
I'm surprised you have been unable to find good tutorial's online, as the use of flex and bison and similar compiling tools are used in large numbers of computer science university courses world wide. As many people are learning them there are a large number of resources available. You must not have been using the right search terms. There are also numerous helpful tutorial videos on YouTube (including mine).
When I searched, this one came are the first result: http://aquamentus.com/flex_bison.html
The page suggested by #Bart Kiers http://dinosaur.compilertools.net/ is good too.

How does "Language Oriented Programming" compare to OOP/Functional in the real world

I recently began to read some F# related literature, speaking of "Real World Functional Programming" and "Expert F#" e. g.. At the beginning it's easy, because I have some background in Haskell, and know C#. But when it comes to "Language Oriented Programming" I just don't get it. - I read some explanations and it's like reading an academic paper that gets more abstract and strange with every sentence.
Does anybody have an easy example for that kind of stuff and how it compares to existing paradigms? It's not just academic fantasy, isn't it? ;)
Thanks,
wishi
Language oriented program (LOP) can be used to describe any of the following.
Creating an external language (DSL)
This is perhaps the most common use of LOP, and is where you have a specific domain - such as UPS shipping packages via transit types through routes, etc. Rather than try to encode all of these domain-specific entities inside of program code, you rather create a separate programming language for just that domain. So you can encode your problem in a separate, external language.
Creating an internal language
Sometimes you want your program code to look less like 'code' and map more closely to the problem domain. That is, have the code 'read more naturally'. A fluent interface is an example of this: Fluent Interface. Also, F# has Active Patterns which support this quite well.
I wrote a blog post on LOP a while back that provides some code examples.
F# has a few mechanisms for doing programming in a style one might call "language-oriented".
First, the syntax niceties (function calls don't need parentheses, can define own infix operators, ...) make it so that many user-defined libraries have the appearance of embedded DSLs.
Second, the F# "quotations" mechanism can enable you to quote code and then run it with an alternative semantics/evaluation engine.
Third, F# "computation expressions" (aka workflows, monads, ...) also provide a way to provide a type of alternative semantics for certain blocks of code.
All of these kinda fall into the EDSL category.
In Object Oriented Programming, you try to model a problem using Objects. You can then connect those Objects together to perform functions...and in the end solve the original problem.
In Language Oriented Programming, rather than use an existing Object Oriented or Functional Programming Language, you design a new Domain Specific Language that is best suited to efficiently solve your problem.
The term language Oriented Programming may be overloaded in that it might have different meanings to different people.
But in terms of how I've used it, it means that you create a DSL(http://en.wikipedia.org/wiki/Domain_Specific_Language) before you start to solve your problem.
Once your DSL is created you would then write your program in terms of the DSL.
The idea being that your DSL is more suited to expressing the problem than a General purpose language would be.
Some examples would be the make file syntax or Ruby on Rails ActiveRecord class.
I haven't directly used language oriented programming in real-world situations (creating an actual language), but it is useful to think about and helps design better domain-driven objects.
In a sense, any real-world development in Lisp or Scheme can be considered "language-oriented," since you are developing the "language" of your application and its abstract tree as you code along. Cucumber is another real-world example I've heard about.
Please note that there are some problems to this approach (and any domain-driven approach) in real-world development. One major problem that I've dealt with before is mismatch between the logic that makes sense in the domain and the logic that makes sense in software. Domain (business) logic can be extremely convoluted and senseless - and causes domain models to break down.
An easy example of a domain-specific language, mentioned here, is SQL. Also: UNIX shell scripts.
Of course, if you are doing a lot of basic ops and have a lot of overlap with the underlying language, it is probably overengineering.

Writing a parser - In the need of guides and research papers

My knowledge about implementing a parser is a bit rusty.
I have no idea about the current state of research in the area, and could need some links regarding recent advances and their impact on performance.
General resources about writing a parser are also welcome, (tutorials, guides etc.) since much of what I had learned at college I have already forgotten :)
I have the Dragon book, but that's about it.
And does anyone have input on parser generators like ANTLR and their performance? (ie. comparison with other generators)
edit My main target is RDF/OWL/SKOS in N3 notation.
Mentioning the dragon book and antlr means you've answered your own question.
If you're looking for other parser generators you could also check out boost::spirit (http://spirit.sourceforge.net/).
Depending on what you're trying to achieve you might also want to consider a DSL, which you can either parse yourself or write in a scripting language like boo, ruby, python etc...
Hmm … your request is a bit unspecific. While there are many recent developments in this general area, they're all quite specialized (naturally, since the field has matured). The original parsing approaches haven't really changed, though. You might want to read up on changes in parser creation tools (Antlr, Gold Parser, to name but a few).
You might also want to take a look at SableCC, another parser generator "which generates fully featured object-oriented frameworks for building compilers".
Their is some documentation about basic uses here and here. Since you asked about research papers, SableCC's main developper's master thesis (1998) is available and explains a little more about SableCC advantages.
Although the current stable version is 3.2, the development branch v4 is a complete rewrite and should implement features new to parser generators.
If you want to build custom analyzers for complex languages,
consider our DMS Software Reengineering Toolkit.
See http://www.semanticdesigns.com/Products/DMS/DMSToolkit.html
This provides very strong parsing technology, making it "easy" to define your language
(especially in comparison with most parser generators).
Conventional parser generators may help
with parsing, but they provide zero help in the hard part of the
process, which happens after you can parse the code.
DMS provides a vast amount of machinery to support analyzing and transforming
the code once your have parsed it.

Parsing, where can I learn about it

I've been given a job of 'translating' one language into another. The source is too flexible (complex) for a simple line by line approach with regex. Where can I go to learn more about lexical analysis and parsers?
If you want to get "emotional" about the subject, pick up a copy of "The Dragon Book." It is usually the text in a compiler design course. It will definitely meet your need "learn more about lexical analysis and parsers" as well as a bunch of other fun stuff!
IMH(umble)O, save yourself an arm and/or leg and buy an older edition - it will fill your information desires.
Try ANLTR:
ANTLR, ANother Tool for Language
Recognition, is a language tool that
provides a framework for constructing
recognizers, interpreters, compilers,
and translators from grammatical
descriptions containing actions in a
variety of target languages.
There's a book for it also.
Niklaus Wirth's book "Compiler Construction" (available as a free PDF)
http://www.google.com/search?q=wirth+compiler+construction
I've recently been working with PLY which is an implementation of lex and yacc in Python. It's quite easy to get started with it and there are some simple examples in the documentation.
Parsing can quickly become a very technical topic and you'll find that you probably won't need to know all the details of the parsing algorithm if you're using a parser builder like PLY.
Lots of people have recommended books. For many these are much more useful in a structured environment with assignments and due dates and so forth. Even if not, having the material presented in a different way can help greatly.
(a) Have you considered going to a school with a decent CS curriculum?
(b) There are lots of online lectures, such as MIT's Open Courseware. Their EE/CS section has many courses that touch on parsing, though I can't see any on parsing per se. It's typically introduced as one of the first theory courses as language classification and automata is at the heart of much of CS theory.
If you prefer Java based tools, the Java Compiler Compiler, JavaCC, is a nice parser/scanner. It's config file driven, and will generate java code that you can include in your program. I haven't used it a couple years though, so I'm not sure how the current version is. You can find out more here: https://javacc.dev.java.net/
Lexing/Parsing + typecheck + code generation is a great CS exercise I would recommend it to anyone wanting a solid basis, so I'm all for the Dragon Book
I found this site helpful:
Lex and YACC primer/HOWTO
The first time I used lex/yacc was for a relatively simple project. This tutorial was all I really needed. When I approached more complex projects later, the familiarity I had from this tutorial and a simple project allowed me to build something fancier.
After taking (quite) a few compilers classes, I've used both The Dragon Book and C&T. I think C&T does a far better job of making compiler construction digestible. Not to take anything away from The Dragon Book, but I think C&T is a far more practical book.
Also, if you like writing in Java, I recommend using JFlex and BYACC/J for your lexing and parsing needs.
Yet another textbook to consider is Programming Language Pragmatics. I prefer it over the Dragon book, but YMMV.
If you're using Perl, yet another tool to consider is Parse::RecDescent.
If you just need to do this translation once and don't know anything about compiler technology, I would suggest that you get as far as you can with some fairly simplistic translations and then fix it up by hand. Yes, it is a lot of work. But it is less work than learning a complex subject and coding up the right solution for one job. That said, you should still learn the subject, but don't let not knowing it be a roadblock to finishing your current project.
Parsing Techniques - A Practical Guide
By Dick Grune and Ceriel J.H. Jacobs
This book (freely available as PDF) gives an extensive overview of different parsing techniques/algorithms. If you really want to understand the different parsing algorithms, this IMO is a better reference than the Dragon Book (as Parsing Techniques focuses entirely on parsing, while the Dragon Book covers parsing only as one - although important - part of the compiler construction process).
flex and bison are the new lex and yacc though. The syntax for BNF is often derided for being a bit obtuse. Some have moved to ANTLR and Ragel for this reason.
If you're not doing much translation, you may one to pull a one-off using multiline regexes with Perl or Ruby. Writing a compatible BNF grammar for an existing language is not a task to be taken lightly.
On the other hand, it is entirely possible to leverage any given language's .l and .y files if they are available as open source. Then, you could construct new code from an existing parse tree.

Resources