OWL/RDF Automated Planner - machine-learning

Is there any software that acts as an intersection between contemporary OWL/RDF reasoners, and the older STRIPS-style automated planners and schedulers? Both systems make use of RETE-based pattern matching, but only the automated planners seem to formalise the concept of an "action". Unfortunately, all the projects I've found that implemented automated planning, like Graphplan or SOAR, seem to be dead or dying, and never seemed to scale well to begin with. Current data stores are implemented on RDMS and can scale to and reason over millions of triples, but I haven't found any that specifically try and reason over actions. I can envision how the concept of actions might be represented in traditional RDF, but I'm sure it would still be very complicated and hackish without official support. Unfortunately, I can't find much prior art. Has this been done before?

Drools Planner (open source, java, ASL) sits on top of the RETE based rule engine Drools Expert and formalizes the concept of a Move, which might or might not be the action you're looking for. It excels at scaling out, both in data as in planning constraints. And it's production ready and has a complete reference manual.
There is some research going on to do OWL with Drools Expert, but I don't know how far that is at this point.

Related

What's the purpose and mechanism of Ontology in D3WEB

In the expert system D3WEB, it is possible to insert\develop\use Ontology. However, I cannot get the point what's the purpose to introduce ontology in D3WEB?
The nice example on this page, https://www.d3web.de/Wiki.jsp?page=Demo%20-%20Ontology , shows how to develop an ontology in D3WEB. In my opinion, it can be more efficiently developed using Protégé. If the contents shall be changed with a real application, for instance, an ontology about 'dog', in the real application there could be instance dog A, B, C, D. It might be not feasible to 'insert' the instances into the D3WEB knowledge base. However, if the ontology changes over time, how to use the ontology in D3WEB then?
In my opinion, the best way is to develop an ontology outside of D3WEB using Java code. However, I believe the designer of D3WEB would have a nice reason to introduce ontology in D3WEB. I will appreciate it if someone let me know.
This is a somewhat common question we get regarding d3web-KnowWE, one reason might be, that our naming is somewhat misleading. So let me explain.
First there is d3web the java framework to run knowledge bases with strong problem solving knowledge, including rules, decision trees, flow-charts, covering lists, cost-benefit dialog strategies, time based reasoning, and so on. This framework in its core does not provide any GUIs, but is meant to integrate problem solving capabilities in other applications/expert systems. It also does not provide a way to properly create/author the knowledge bases it runs, aside maybe from doing it in the Java code on an API level.
To also provide proper means to author and develop a knowledge base, including some basic dialogs to run, demo, test, and debug the authored knowledge bases, we began working on the wiki system KnowWE, which today is basically a heavily extended JSPWiki. The page d3web.de itself for example is also just a build of KnowWE with specific content.
While we were working on and with KnowWE, we began to really like the approach to edit and author large knowledge bases in this 'wiki way', were you automatically support multiple distributed users to work on the same knowledge base, have automatic versioning, can add nice documentation directly beside the actual formal knowledge, can generate knowledge using script (because it's all just simple text markup), and so forth. Also, the underlying architecture of KnowWE became quite good and mature over the years.
So after some time of this, we found ourselves in the need to also author large ontologies. And yes, Protégé is a nice tool to develop ontologies, but for our use cases, it was just not well suited and we also found it to not scale very well. So we began to implement some simple markups to also allow to also develop ontologies in KnowWE. After then recognizing, that authoring ontologies the 'wiki way' indeed works pretty nicely, we decided to again also share these tools with everybody else on d3web.de. And that is why today you can author/develop both d3web knowledge bases and ontologies in KnowWE, although there is no actual connection/interoperability between both as of now. That would be nice of course and maybe we add this in the future, but for KnowWE is just a development environment for these two knowledge representation.
Maybe you can see KnowWE similar to an IDE like eclipse or IntelliJ, where the same application can be used to develop many different programming languages. KnowWE does the same for different knowledge representations.
A problem is maybe, that historically, we didn't differentiate very well between KnowWE and d3web, because KnowWE was narrowly used to build d3web knowledge bases. We also like to call KnowWE and its distribution package d3web-KnowWE for example. But maybe this should change...
Thanks for pointing this out, I will try to correct/clarify this on d3web.de

Coding Standards for Gremlin/Cypher

We are in the process of developing a review tool for Gremlin/Cypher as we predominantly work with Neo4j graph databases in our project to reduce the manual review effort and also deliver quality code.
Are there any list of coding standards(formatting/performance tips etc.,) for Gremlin and Cypher scripts which can be used as a checklist for performing review of these scripts?
I don't think you're going to find one specific answer, as discussing coding standards can lead to very subjective (and debate-laden) answers. That said: I'll go with something more objective:
First step would be to decide on Gremlin vs Cypher since they're not the same thing nor the same style. When making that decision (and maybe that decision is use both), you should really take a close look at Neo4j 2.0 development (currently at Milestone 4), as Cypher is maturing rapidly and there's a lot of work being put into it, both from expressiveness point of view and performance point of view.
Assuming you go with Cypher, I'd suggest you look at the samples being published by Neo Technology, especially the Cypher learning module. I don't know of any published guidelines, but I'd think most of the guidelines would be similar to any scripting guidelines you already have (such as naming conventions, spacing, etc.). Going further, you're likely going to use Cypher via code as well as through the console. So you'll want to continue using your traditional programming style guidelines, as well as specifying the language-specific library you'll be using.
I can only give you an answer related to Gremlin. First, it is important to note that almost all examples, in the Gremlin wiki, GremlinDocs, the gremlin-users mailing list, etc. are meant for the REPL. Example traversals from these sources tend to be lengthy one-liners, that are written that way for easy transfer via copy/paste to the command line for execution. It is somewhat satisfying to get your answer in one-line in the REPL, but for production code that requires maintenance over time, consider avoiding the temptation to do so, unless there is some specific reason that dictates it.
Second, from a style/formatting perspective, Gremlin is a DSL built on Groovy. Whatever style you like for Groovy should generally work for Gremlin. Start with the Groovy recommendations and tweak them to your needs/liking. I would expect that a tool like CodeNarc would help with general style checks and identifying common Groovy coding problems.
Also see the message from the mailing list https://groups.google.com/forum/#!searchin/neo4j/coding$20standards/neo4j/JYz2APHV-_k/V1BKRwyv5hAJ

Scale now or later?

I am looking to start developing a relatively simple web application that will pull data from various sources and normalizing it. A user can also enter the data directly into the site. I anticipate hitting scale, if successful. Is it worth putting in the time now to use scalable or distributed technologies or just start with a LAMP stack? Framework or not? Any thoughts, suggestions, or comments would help.
Disregard my vague description of the idea, I'd love to share once I get further along.
Later. I can't remember who said it (might have been SO's Jeff Atwood) but it rings true: your first problem is getting other people to care about your work. Worry about scale when they do.
Definitely go with a well structured framework for your own sanity though. Even if it doesn't end up with thousands of users, you'll want to add features as time goes on. Maintaining an expanding codebase without good structure quickly becomes fairly horrible (been there, done that, lost the client).
btw, if you're tempted to write your own framework, be aware that it is a lot of work. My company has an in-house one we're quite proud of, but it's taken 3-4 years to mature.
Is it worth putting in the time now to use scalable or distributed technologies or just start with a LAMP stack?
A LAMP stack is scalable. Apache provides many, many alternatives.
Framework or not?
Always use the highest-powered framework you can find. Write as little code as possible. Get something in front of people as soon as you can.
Focus on what's important: Get something to work.
If you don't have something that works, scalability doesn't matter, does it?
Then read up on optimization. http://c2.com/cgi/wiki?RulesOfOptimization is very helpful.
Rule 1. Don't.
Rule 2. Don't yet.
Rule 3. Profile before Optimizing.
Until you have a working application, you don't know what -- specific -- thing limits your scalability.
Don't assume. Measure.
That means build something that people actually use. Scale comes later.
Absolutely do it later. Scaling pains is a good problem to have, it means people like your project enough to stress the hardware it's running on.
The last company I worked at started fairly small with PHP and the very very first versions of CakePHP that came out (when it was still in beta). Some of the code was dirty, the admin tool was a mess (code-wise), and sure it could have been done better from the start. But do you know what? They got it out the door before their competitors did, and became extremely successful.
When I came on board they were starting to hit the limits of their current potential scalability, and that is when they decided to start looking at CDN's, lighttpd caching techniques, and other ways to clean up the code and make things run smoother when under heavy load. I don't work for them anymore but it was a good experience in growing an architecture beyond what it was originally scoped at.
I can tell you right now if they had tried to do the scalability and optimizations before selling content and getting a website live - they would never have grown to the size they are now. The company is www.beatport.com if you're interested in who I'm talking about (To re-iterate, I'm not trying to advertise them as I am no longer affiliated with them, but it stands as a good case study and it's easier for people to understand what I'm talking about when they see their website).
Personally, after working with Ruby and Rails (and understanding the separation!) for a couple of years, and having experience with PHP at Beatport - I can confidently say that I never want to work with PHP code again =p
Funny to ask "scale now or later?" and label it "ruby on rails".
Actually, Ruby on Rails was created by David Heinemeier Hansson, who has a whole chapter in his book labeled "Scale later" :))
http://gettingreal.37signals.com/ch04_Scale_Later.php
I agree with the earlier respondents -- make it useful, make it work and get people motivated to use it first. I also agree that you should pick off-the shelf components (of which there are many) rather than roll your own, as much as possible. At the same time, make sure that you choose components for your infrastructure that you know to be scalable so that you can go there when you need to, without having to re-write major chunks of your application.
As the Product Manager for Berkeley DB, I've seen countess cases of developers who decided "Oh, we'll just write that to a flat file" or "I can write my own simple B-tree function" or "Database XYZ is 'good enough', I don't have to worry about concurrency or scalability until later". The problem with that approach is that a) you're re-inventing the wheel (and forgoing what others have learned the hard way already) and b) you're ignoring the fact that you'll have to deal with scalability at some point and going with a 'good enough' solution.
Good luck in your implementation.

For distributed applications, which to use, ASIO vs. MPI?

I am a bit confused about this. If you're building a distributed application, which in some cases may perform parallel operations (although not necessarily mathematical), should you use ASIO or something like MPI? I take it MPI is a higher level than ASIO, but it's not clear where in the stack one would begin.
I know nothing about ASIO but from a quick Google it looks to me to be a lot lower level than MPI. For me the whole point of MPI is so that I can program against a higher level of abstraction from the messaging than, it seems, ASIO provides. Where you begin depends on your needs. For mine, parallelising scientific codes for high-performance, the obvious answer is MPI. I'm not sure I'd use it, or at least not sure it would be my default choice, if I were writing more general-purpose distributed, as opposed to parallel, applications. Well, actually, it probably would be my default choice to avoid learning another approach (most of which are less portable and less long-lived than MPI) but I'll admit it might not be the best choice if starting from an equal footing.
As far as I know MPI is currently incapable of handling the situation, when the new distributed nodes want to join the already started group. The problems also may occur if one of the nodes goes offline.
MPI does not reveal any network related machinery that is underneath. Thus if you would ever need something on the lower level -- you're in trouble. If you on the other hand do not aticipate such a need, then you'll save yourself a lot of time using MPI.

How to push Erlang to my workplace

I think Erlang is very well suited for server systems developed in my workplace (currently developed in Java). I am a bit skeptical how this would be accepted both by developers (who have no idea about functional or Erlang) and by managers.
Any ideas on how to approach the issue? I am thinking about some hybrid system, where the hardcore highly reliable infra uses Elrang, and app specific stuff developed in Java (as nodes?)
There are a few approaches, and neither have any guarantees to actually work
Implement something substantial in a short time frame, perhaps using your own time. Don't tell anyone until you have something to display that works. Unless you have a colleague in on it.
Pull up lots of Erlang projects that are good demonstrations of the features you want. Present it to your managers and try to frame them about the risk in keeping using Java with this kind of technology available.
If the company you work for actually have a working code base in Java already, they're not likely to take you seriously when you suggest to rewrite it in another language.
The true test that you believe in Erlang being a much better choice: Quit and start up a competing company and bring the technology insight you have in your current industry. Your managers are really comparing a similar risk-scenario as you would do if you were to quit your job, and they are looking for the same assuring facts for success as you would do, to consider leaving a "safe" paycheck.
As for how to integrate, check out the jinterface application in Erlang. It allows Java code to send messages to Erlang nodes, and it allows Java to expose mailboxes to the Erlang nodes as if there were Erlang processes.
It's all about ROI (Return On Investment) to a manager: a manager will be concerned about performance (of the company). In order to appeal to his business nature, you'll have to make a case for it using dollar$ (or whatever appropriate currency).
Beware that undertaking a "skunkwork" project on the side to "prove" your solution based on Erlang might backfire: "so you had time to play with Erlang, why didn't you spend the time on the project then?" (Of course, not all managers/companies would think this way).
You have to take into account the whole proposal e.g. impact on the team, skills to be developed etc. It's all about money.
If I have an advice for you: start small, plant a seed, nurture it and watch it grow.
A wise man once said to me:
"It's not about technology, it's about
the product & market".
Start by not targetting a rewrite but using erlang for a new feature/project. Rewrites can be expensive and taking a chance on erlang for something that is already a time consuming and costly undertaking is a hard sell. But if there is a new piece that could be done in erlang and java, you stand a better chance. The project will be small enough hopefully that you can discover early if erlang is a good fit and adapt accordingly. And when erlang proves itself in that project you will have better data to make your case with.
We're introducing RabbitMQ into our infrastructure, which currently runs a combination of C++, Java and Python applications. I'm not specifically intending to move the team towards Erlang, but if I were, introducing a well-written third-party tool that just happens to use Erlang is a very good way to get the foot in the door.
One major caveat is that while Erlang is a wonderful language to learn, the surrounding technology (OTP in particular) has a huge learning curve and is extremely primitive in many ways (debugging, IDE's, etc.). It is getting better all the time, but reluctant converts will crucify you if you don't warn them about the pain of learning to program in a radically different environment. Even simple things like the lack of code-sense technology (E.g., type 'foo.' and the IDE tells you what methods you can call on foo) can leave a really bad taste in the mouth.

Resources