How to measure the performance of knowledge-based systems (KBSs) built with ontologies?

I have developed an ontology, and the next step is to build a knowledge-based system (KBS) using the ontology as the knowledge base. However, I don't know how to measure the performance of an ontology-based KBS against traditional applications or against other knowledge-based systems built using other knowledge representation methods. How can I do this? How can I prove that my ontology-based KBS is better than other applications in some respects?

Related

Can I use drake to test Visual SLAM algorithms?

I was wondering whether I could leverage the modularity Drake gives to test Visual SLAM algorithms on real-time data. I would like to create three blocks that output acceleration, angular speed, and RGBD data. The blocks should pull information from a real sensor. Another block would process the data and produce the current transform of the camera and a global map. Effectively, I would like to cast my problem into a "Systems" framework so I can easily add filters where I need them.
My question is: given other people's experience with this library, is Drake the right tool for the job for this use case? Specifically, can I use this library to process real-time information in a production setting?
Visual SLAM is not a use case I've implemented myself, but I believe the Drake Systems framework should be up to the task, depending on what you mean by "realtime".
We definitely ship RGBD data through the framework often.
We haven't made any attempt to support running Drake in hard realtime, but it can certainly run at high rates. If you were to hit a performance bottleneck, we tend to be pretty responsive and would welcome PRs.
As for the production setting, it is certainly our intention for the code and process to be mature enough for that, and numerous teams already use it that way.

How to create a custom environment using OpenAI gym for reinforcement learning

I am a newbie in reinforcement learning working on a college project. The project is related to optimizing x86 hardware power. I am running proprietary software on a Linux distribution (16.04). The goal is to use reinforcement learning to optimize the power consumption of the system while keeping the performance degradation of the software to a minimum. The proprietary software is a cellular network.
As we already know, the primary functional blocks of reinforcement learning are the agent and the environment. The basic idea is to use the cellular network running on x86 hardware as the environment for RL. This environment interacts with the agent through states, actions, and rewards.
From reading different materials, I understand that I need to wrap my software as a custom environment from which I can retrieve the state features. The state features are application-layer KPIs such as latency and throughput. The action space may include instructions to Linux to change the power (I can use some predefined set of power options). I have not yet decided on the reward function.
I read this post and decided that I should use OpenAI gym to create my custom environment.
My doubt is whether using OpenAI Gym to create a custom environment is correct for this type of setup. Am I going in the right direction, or are there alternative or better tools for creating a custom environment? Any tutorial or pointers on creating this custom environment would be appreciated.
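A minimal sketch of what such a custom environment could look like, following the Gym `reset`/`step` interface. Everything here (the `PowerEnv` name, the power levels, the simulated KPIs, and the reward shape) is a made-up illustration, not a working power model; a real implementation would subclass `gym.Env` and declare `observation_space` and `action_space`, but this sketch is kept dependency-free:

```python
import random

class PowerEnv:
    """Gym-style environment sketch for the power-optimization setup.
    A real version would subclass gym.Env; the interface is mimicked here."""

    # Hypothetical discrete action space: index into predefined power levels.
    POWER_LEVELS = [50, 75, 100]  # e.g. percent of maximum CPU power

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.level = len(self.POWER_LEVELS) - 1  # start at full power

    def _observe(self):
        # In the real environment these KPIs would be read from the
        # cellular-network software; here they are simulated.
        power = self.POWER_LEVELS[self.level]
        latency_ms = 10.0 + (100 - power) * 0.2 + self.rng.random()
        throughput = power * 1.0 + self.rng.random()
        return (latency_ms, throughput, power)

    def reset(self):
        self.level = len(self.POWER_LEVELS) - 1
        return self._observe()

    def step(self, action):
        # action: index of the power level to switch to.
        self.level = action
        latency_ms, throughput, power = self._observe()
        # One possible reward shape: penalize both power draw and latency.
        reward = -power - 0.5 * latency_ms
        done = False   # a continuing task; episodes could be time-limited
        info = {}
        return (latency_ms, throughput, power), reward, done, info

env = PowerEnv()
obs = env.reset()
obs, reward, done, info = env.step(0)  # switch to the lowest power level
```

An RL library expecting the Gym API could then interact with the real software through the same two methods, with `step` issuing the actual Linux power command and reading the actual KPIs.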

Splitting OpenCV operations between frontend and backend processors

Is it possible to split an OpenCV application into frontend and backend modules, such that the frontend runs on thin clients with very limited processing power (Intel Atom dual-core processors with 1-2 GB RAM), and the backend does most of the computational heavy lifting, e.g. using Google Compute Engine?
Is this possible under the additional constraint that the network link between frontend and backend is slow, say limited to 128-256 kbps?
Are there any precedents of this kind? Is there any such open-source project?
Are there common architectural patterns that could help in such a design?
Additional clarification:
The frontend node need NOT be purely a frontend, as in only running the user interface. I would imagine that certain OpenCV algorithms could run on the frontend node, which is especially useful for reducing the amount of data that needs to be sent to the backend for processing (e.g. colour-space transformation, conversion to grayscale, histograms, etc.). I've successfully tested real-time face detection (Haar cascade) on this low-end machine, so the frontend node can carry some workload. In fact, I'd prefer to do most of the work on the frontend and push to the backend only those computation-heavy tasks that are clearly and definitely beyond the computational power of the frontend computer.
What I am looking for are suggestions/ideas on the nature of algorithms that are best run on Google Compute Engine, and on tried-and-tested architectural patterns for use with OpenCV to achieve such a split.
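To see why pushing work to the frontend matters at these link speeds, here is a back-of-envelope bandwidth calculation. All the numbers (resolution, frame rate, compression ratio, feature sizes) are illustrative assumptions:

```python
LINK_KBPS = 128                 # slow end of the stated 128-256 kbps link

def kbps(bytes_per_frame, fps):
    """Bandwidth needed to stream frames of the given size at the given rate."""
    return bytes_per_frame * fps * 8 / 1000.0

w, h, fps = 640, 480, 10
raw_rgb = w * h * 3             # raw 24-bit frame: ~900 KB
gray = w * h                    # after frontend grayscale conversion
jpeg = gray // 20               # assume roughly 20:1 JPEG compression

# Streaming frames exceeds the link by a wide margin, even compressed:
assert kbps(raw_rgb, fps) > LINK_KBPS   # ~74,000 kbps
assert kbps(jpeg, fps) > LINK_KBPS      # ~1,200 kbps

# Sending only extracted features (e.g. five face bounding boxes per
# frame at 16 bytes each) fits easily:
features = 16 * 5
assert kbps(features, fps) < LINK_KBPS  # ~6 kbps
```

The arithmetic suggests that at 128-256 kbps the frontend cannot stream frames at all, so it should send extracted features or partial results, reserving the link for occasional regions of interest or keyframes.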

Machine Learning & Big Data [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
In the beginning, I would like to describe my current position and the goal that I would like to achieve.
I am a researcher dealing with machine learning. So far I have gone through several theoretical courses covering machine learning algorithms and social network analysis, and have therefore gained some theoretical concepts useful for implementing machine learning algorithms and feeding them real data.
On simple examples the algorithms work well and the running time is acceptable, whereas big data is a problem when trying to run the algorithms on my PC. Regarding software, I have enough experience to implement any algorithm from articles or design my own, using any language or IDE (so far I have used Matlab, Java with Eclipse, .NET...), but so far I don't have much experience with setting up infrastructure. I have started to learn about Hadoop, NoSQL databases, etc., but I am not sure which strategy would be best given my learning-time constraints.
The final goal is to be able to set up a working platform for analyzing big data, focused on implementing my own machine learning algorithms, and to put it all into production, ready for answering useful questions by processing big data.
As the main focus is on implementing machine learning algorithms, I would like to ask whether there is any existing running platform that offers enough CPU resources to feed in large data, upload my own algorithms, and simply process the data without having to think about distributed processing.
Whether or not such a platform exists, I would like to gain a picture big enough to be able to work in a team that could put into production a whole system tailored to specific customer demands. For example, a retailer would like to analyze daily purchases, so all the daily records have to be uploaded to some infrastructure capable of processing the data using custom machine learning algorithms.
To put all the above into a simple question: how do you design a custom data mining solution for real-life problems, with the main focus on machine learning algorithms, and put it into production, if possible by using existing infrastructure and, if not, by designing a distributed system (using Hadoop or some other framework)?
I would be very thankful for any advice or suggestions about books or other helpful resources.
First of all, your question needs to define more clearly what you mean by Big Data.
Indeed, Big Data is a buzzword that may refer to problems of various sizes. I tend to define Big Data as the category of problems where the data size or the computation time is big enough for "the hardware abstractions to become broken", meaning that a single commodity machine cannot perform the computations without careful attention to computation and memory.
The scale threshold beyond which data becomes Big Data is therefore unclear and is sensitive to your implementation. Is your algorithm bounded by hard-drive bandwidth? Does the data have to fit into memory? Did you try to avoid unnecessary quadratic costs? Did you make any effort to improve cache efficiency, etc.?
From several years of experience running medium-to-large-scale machine learning challenges (on up to 250 commodity machines), I strongly believe that many problems that seem to require a distributed infrastructure can actually be run on a single commodity machine if the problem is expressed correctly. For example, you mention large-scale data for retailers. I have been working on this exact subject for several years, and I often managed to make all the computations run on a single machine, given a bit of optimisation. My company has been working on a simple custom data format that allows one year of all the data from a very large retailer to be stored within 50 GB, which means a single commodity hard drive could hold 20 years of history. You can have a look, for example, at: https://github.com/Lokad/lokad-receiptstream
From my experience, it is worth spending time optimizing the algorithm and memory use so that you can avoid resorting to a distributed architecture. Indeed, distributed architectures come with a triple cost. First of all, the strong knowledge requirements. Secondly, a large complexity overhead in the code. Finally, a significant latency overhead (with the exception of local multi-threaded distribution).
From a practitioner's point of view, being able to perform a given data mining or machine learning algorithm in 30 seconds is one of the key factors for efficiency. I have noticed that when some computation, whether sequential or distributed, takes 10 minutes, my focus and efficiency tend to drop quickly, as it becomes much more complicated to iterate and test new ideas quickly. The latency overhead introduced by many of the distributed frameworks is such that you will inevitably be in this low-efficiency scenario.
If the scale of the problem is such that even with strong effort you cannot perform it on a single machine, then I strongly suggest resorting to off-the-shelf distributed frameworks instead of building your own. One of the most well-known frameworks is the MapReduce abstraction, available through Apache Hadoop. Hadoop can be run on clusters of ten thousand nodes, probably much more than you will ever need. If you do not own the hardware, you can "rent" the use of a Hadoop cluster, for example through Amazon Elastic MapReduce.
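To make the abstraction concrete, here is a local, single-machine simulation of a MapReduce job in the retail setting mentioned earlier. A real Hadoop job would express the same two functions as a Mapper and a Reducer class; the shuffle phase is emulated here with a sort and `groupby`, and the sales records are made-up illustrative data:

```python
from itertools import groupby
from operator import itemgetter

def map_fn(record):
    # Emit (key, value) pairs: one per purchased product.
    customer, product, amount = record
    yield (product, amount)

def reduce_fn(key, values):
    # Aggregate all values sharing a key: total revenue per product.
    return (key, sum(values))

def mapreduce(records, map_fn, reduce_fn):
    """Run map over every record, 'shuffle' by sorting on key,
    then reduce each key group. A cluster does this in parallel."""
    pairs = [kv for rec in records for kv in map_fn(rec)]
    pairs.sort(key=itemgetter(0))                   # shuffle phase
    return [reduce_fn(k, [v for _, v in group])
            for k, group in groupby(pairs, key=itemgetter(0))]

sales = [("alice", "milk", 2), ("bob", "bread", 1), ("alice", "bread", 3)]
totals = dict(mapreduce(sales, map_fn, reduce_fn))
# totals == {"bread": 4, "milk": 2}
```

Because `map_fn` sees each record independently and `reduce_fn` sees each key independently, both halves parallelize trivially across machines, which is exactly what makes the abstraction scale, and also what makes it rigid.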
Unfortunately, the MapReduce abstraction is not suited to all Machine Learning computations.
As far as machine learning is concerned, MapReduce is a rigid framework, and numerous cases have proved difficult or inefficient to adapt to it:
– The MapReduce framework is in itself related to functional programming. The Map procedure is applied to each data chunk independently. The framework is therefore not suited to algorithms where applying the Map procedure to some data chunks requires the results of the same procedure on other chunks as a prerequisite. In other words, MapReduce is not suited when the computations between the different pieces of data are not independent and impose a specific chronology.
– MapReduce is designed to provide a single execution of the map and reduce steps and does not directly provide iterative calls. It is therefore not directly suited to the numerous machine-learning problems involving iterative processing (Expectation-Maximisation (EM), Belief Propagation, etc.). Implementing these algorithms in a MapReduce framework means the user has to engineer a solution that organizes result retrieval and schedules the multiple iterations, so that each map iteration is launched only after the reduce phase of the previous iteration is completed, and is fed with that phase's results.
– Most MapReduce implementations have been designed to address production needs and robustness. As a result, the primary concern of the framework is to handle hardware failures and guarantee the computation results. MapReduce efficiency is therefore partly lowered by these reliability constraints. For example, serializing computation results to hard disk turns out to be rather costly in some cases.
– MapReduce is not suited to asynchronous algorithms.
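The iterative-processing limitation can be made concrete with k-means clustering: each iteration is one map phase (assign every point to its nearest centroid) followed by one reduce phase (recompute the centroids), and the next map cannot start until the previous reduce has finished. A plain-Python sketch with illustrative 1-D data:

```python
def assign(point, centroids):
    """The 'map' step: label each point with its nearest centroid."""
    return min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))

def recompute(points, labels, k):
    """The 'reduce' step: new centroid = mean of the points assigned to it."""
    return [sum(p for p, l in zip(points, labels) if l == i) /
            max(1, sum(1 for l in labels if l == i))
            for i in range(k)]

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centroids = [0.0, 10.0]
for _ in range(5):                    # each pass = one full MapReduce job
    labels = [assign(p, centroids) for p in points]
    centroids = recompute(points, labels, k=2)
# centroids converge to roughly [1.0, 8.0]
```

On a cluster, each of those five loop iterations would be a separate MapReduce job whose output must be written out, retrieved, and fed back in as the next job's input, which is precisely the scheduling overhead described above.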
Criticism of the MapReduce framework has led to richer distributed frameworks where more control and freedom are left to the framework user, at the price of more complexity for that user. Among these frameworks, GraphLab and Dryad (both based on Directed Acyclic Graphs of computations) are well known.
As a consequence, there is no "One size fits all" framework, such as there is no "One size fits all" data storage solution.
To start with Hadoop, you can have a look at the book Hadoop: The Definitive Guide by Tom White
If you are interested in how large-scale frameworks fit machine learning requirements, you may be interested in the second chapter (in English) of my PhD, available here: http://tel.archives-ouvertes.fr/docs/00/74/47/68/ANNEX/texfiles/PhD%20Main/PhD.pdf
If you provide more insight into the specific challenge you want to deal with (type of algorithm, size of the data, time and money constraints, etc.), we could probably provide a more specific answer.
Edit: another reference that could prove to be of interest: Scaling-up Machine Learning
I had to implement a couple of data mining algorithms to work with Big Data too, and I ended up using Hadoop.
I don't know if you are familiar with Mahout (http://mahout.apache.org/), which already has several algorithms ready to use with Hadoop.
Nevertheless, if you want to implement your own algorithm, you can still adapt it to Hadoop's MapReduce paradigm and get good results. This is an excellent book on how to adapt artificial intelligence algorithms to MapReduce:
Mining of Massive Datasets - http://infolab.stanford.edu/~ullman/mmds.html
This seems to be an old question. However, given your use case, the main frameworks focusing on machine learning in the Big Data domain are Mahout, Spark (MLlib), H2O, etc. To run machine learning algorithms on Big Data, however, you have to convert them to parallel programs based on the MapReduce paradigm. This is a nice article giving a brief introduction to the major (though not all) Big Data frameworks:
http://www.codophile.com/big-data-frameworks-every-programmer-should-know/
I hope this will help.

Robotic Applications In Erlang

I want to use Erlang to implement a robotic application. Most current real-world applications implemented in Erlang are web-based. The robot implemented by Prof. Corrado Santoro did not utilize Erlang's concurrency, which is the heart of the language, and concentrated more on artificial intelligence (according to my understanding of that project).
Some ideas that come to my mind are soccer robots, or multiple robots cleaning a room, but in such systems the robots can be programmed in C (or any other programming language) and controlled using MATLAB. MATLAB helps with image processing (the vision system) and with solving complex array calculations, so what is the point of using Erlang? (Please correct me if I am wrong.)
Can somebody suggest a robotic application that would utilize Erlang's features, especially concurrency, for which one could argue that Erlang is better suited than other languages?
A somewhat detailed answer would help me a lot.
Erlang might not be the best approach to every robot application. But you can go for the less ambitious thesis of demonstrating that Erlang supports a computing model that is important in many robotics applications. The points in favor of using Erlang for robotics include:
concurrency to model and monitor a concurrent world;
distribution of sensor, actuator and computing resources;
state machines to tie the behaviors together; and
supervisors for fault tolerance.
Anything can be done in any language, but Erlang makes some things convenient, particularly on the architectural level.
Chapter 14 in Concurrent Programming in Erlang for example models a lift control system by one process for each lift and one for each floor, and later discusses the process structure for a satellite control system. Lifts or satellites maybe aren't very robot-like, but the principles are the same.
The Erlang & Robotics work by Corrado Santoro et al. makes plenty of use of concurrency. Their 2007 mobile robot project has a bunch of different (concurrent) OTP behaviours that range from low-level I/O to high-level planning. Teaching Erlang using robotics and Player/Stage is another recent work.
Your robot soccer and cleaning-robot ideas are fine and have plenty of scope for concurrency and inter-robot communication. But you don't just sit down and write an arbitrary robot application of that size: either you have a team and some specific robots to work on, or you get yourself a simulator (get a simulator either way).
Try simulating a number of robots that steer towards each other until they all collide, each robot running a process of its own. When that is working, replace the task and add processes that (pretend to) control motors, feel walls, see the environment, understand user commands, break down, etc., and exchange messages with other robots and planning processes.
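That exercise can be sketched in actor style even outside Erlang: give each robot its own mailbox, deliver the other robots' positions as messages, and let each robot steer toward their average until they all (nearly) collide. In Erlang each robot would be a spawned process with a built-in mailbox and the scheduling would come for free; here the round-based loop and all names are illustrative:

```python
from queue import Queue

class Robot:
    """One actor per robot: a position and a mailbox of incoming messages."""
    def __init__(self, name, x):
        self.name, self.x = name, x
        self.inbox = Queue()

    def step(self):
        # Drain the mailbox, then steer halfway toward the others' average.
        others = []
        while not self.inbox.empty():
            sender, pos = self.inbox.get()
            others.append(pos)
        if others:
            self.x += 0.5 * (sum(others) / len(others) - self.x)

robots = [Robot("r1", 0.0), Robot("r2", 10.0), Robot("r3", 20.0)]
for _ in range(30):              # scheduler: one broadcast round per tick
    for r in robots:             # every robot mails its position to the rest
        for other in robots:
            if other is not r:
                r.inbox.put((other.name, other.x))
    for r in robots:             # then every robot processes its mailbox
        r.step()

positions = [r.x for r in robots]
# All three robots converge on the same point (here, 10.0).
```

Once this works, the "replace the task" step is just swapping out `step` and adding more actors (motor controllers, wall sensors, planners) that exchange messages the same way, which is exactly the structure Erlang processes give you natively.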
Read up on robotic systems architectures to understand that such designs are common, and why. Then ask yourself: did Erlang facilitate this type of programming?
The only way to understand where Erlang is best, and when, is to program in Erlang for a while. It's not about what Erlang does best; it's more about understanding how a functional design built from Erlang's lightweight spawns and OTP compares to solving the same problems in an imperative language. A short bulleted pros-and-cons list will not do it justice. Go's goroutines and channels, Haskell, and D can act similarly without any problem. Erlang is particularly good at being distributed. In newer concurrent languages you don't really have to work hard to make things concurrent; in Erlang in particular, if your spawns share the CPU-intensive tasks evenly, you are using concurrency. You won't find much information on Erlang for robotics, but you'll find three books with lots of examples of making things distributed and concurrent that tackle many of the same problems. The other concurrent languages don't usually specialize in this way, and many useful primitives are built into Erlang.
A lot of posts have been made about doing similar things in other languages, but sticking with Erlang saves a lot of work.
http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=%22erlang+style%22+%22(concurrency|message+passing)%22
http://groups.google.com/group/golang-dev/browse_thread/thread/e120a586441b9b24/806eab93bd5281a0?#806eab93bd5281a0