"For all" using Apache Jenas rule engine - jena

I'm currently working on some small examples about Apache Jena. What I want to show is universal quantification.
Let's say I have balls that each have a color. These balls are stored within boxes. I now want to determine whether these boxes contain only balls of the same color or if they are mixed.
So basically something along these lines:
SAME_COLOR = ∃x∀y:{y in Box a → color of y = x}
I know that this is probably not directly expressible in Jena, but it can be rewritten as:
SAME_COLOR = ∃x¬∃y:{y in Box a ∧ color of y ≠ x}
With "not exists" Jena's "NoValue" can be used, however, this does (at least for me) not work and I don't know how to translate above logical representations in Jena. Any thoughts on this?
See the code below, which is the only way I could think of:
(?box, ex:isA, ex:Box)
(?ball, ex:isIn, ?box)
(?ball, ex:hasColor, ?color)
(?ball2, ex:isIn, ?box)
(?ball2, ex:hasColor, ?color2)
NotEqual(?color, ?color2)
->
(?box, ex:hasSomeColors, "No").
(?box, ex:isA, ex:Box)
NoValue(?box, ex:hasSomeColors)
->
(?box, ex:hasSomeColors, "Yes").
A box with mixed content now has both values "Yes" and "No".
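For what it's worth, here is a sketch (untested, and ex:isMixed is a property I made up) of the usual workaround: derive a "mixed" marker first, and only afterwards apply the noValue check, e.g. by running two rule sets in sequence so that noValue is evaluated only after every mixed box has been marked:
[mixed: (?box ex:isA ex:Box), (?b1 ex:isIn ?box), (?b1 ex:hasColor ?c1),
        (?b2 ex:isIn ?box), (?b2 ex:hasColor ?c2), notEqual(?c1, ?c2)
        -> (?box ex:isMixed "true")]
[uniform: (?box ex:isA ex:Box), noValue(?box, ex:isMixed)
          -> (?box ex:hasSomeColors "Yes")]
Within a single forward pass the uniform rule can still fire before a mixed fact is derived, which is presumably what you are seeing; feeding the output of the first rule set into a second InfModel that contains only the uniform rule sidesteps the ordering problem.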

I've run into the same sort of problem, in a more simplified form.
The question is how to get a collection of objects, or count the number of objects, in the rule engine.
Given that res:subj ont:has res:obj_xxx (several objects), how can I get these values in the rule engine?
But I just found a Primitive called Remove(), which may inspire me a bit.


How to get a path for script-fu function: gimp-drawable-edit-stroke-item

I am struggling with getting the actual path (or vector) object from an id. I want to stroke a path, and the currently advised way of doing so seems to be the method gimp-drawable-edit-stroke-item, which needs an item as input. By the way, I tried to find a list of all predefined types in script-fu but didn't find anything, so I am not sure what the type Item really is, but it looks like you can pass a vector to it.
All I can find so far to identify a path is (cadr (gimp-image-get-vectors p-image)), which seems to give me only an id: the call (gimp-drawable-edit-stroke-item p-drawable (cadr (gimp-image-get-vectors p-image))) leads to an "Error: Invalid type for argument 2 to gimp-drawable-edit-stroke-item".
Having to navigate lists instead of using names for fields is the reason why I never bothered with Scheme/script-fu since Gimp can be scripted in Python.
This said, with my limited Lisp knowledge:
(gimp-image-get-vectors p-image) returns a (count (v1 v2 v3 ...)) list
so (cadr (gimp-image-get-vectors p-image)) returns the list, and not a single item of the list.
You can get the "active path" directly with (gimp-image-get-active-vectors p-image) (using the paths list doesn't tell you which path in the list is meant by the user anyway).
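If the active path is what you want, the call would look something like this (my untested sketch; PDB calls return lists in Script-Fu, hence the car to unwrap the single id):
(gimp-drawable-edit-stroke-item p-drawable
                                (car (gimp-image-get-active-vectors p-image)))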
"I want to stroke a path"
Rather than gimp-drawable-edit-stroke-item, gimp-pencil (or gimp-paintbrush) can do it:
(define (stroke-path drawable color width . path)
  (gimp-context-set-line-miter-limit 5)         ; default: 10, default mitre up to 60 pixels
  (gimp-context-set-stroke-method STROKE-LINE)  ; default STROKE-PAINT-METHOD
  (gimp-context-set-line-cap-style CAP-BUTT)    ; CAP-ROUND, CAP-SQUARE
  (gimp-context-set-line-join-style JOIN-ROUND) ; JOIN-MITER, JOIN-ROUND, JOIN-BEVEL
  (gimp-context-set-foreground color)
  (gimp-context-set-line-width width)           ; default: 6
  (let ((vec (apply vector path)))
    (gimp-pencil drawable (vector-length vec) vec)))
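A hypothetical call, with drawable bound to the target drawable and the coordinates given as x/y pairs (my example values, not from the original script):
; draw a red, 4-pixel-wide open square
(stroke-path drawable '(255 0 0) 4
             10 10  200 10  200 200  10 200  10 10)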

Find all possible pairs between the subsets of N sets with Erlang

I have a set S. It contains N subsets (which in turn contain some sub-subsets of various lengths):
1. [[a,b],[c,d],[*]]
2. [[c],[d],[e,f],[*]]
3. [[d,e],[f],[f,*]]
N. ...
I also have a list L of 'unique' elements that are contained in the set S:
a, b, c, d, e, f, *
I need to find all possible combinations between the sub-subsets, one from each subset, such that each resulting combination contains each element of the list L exactly once, but any number of occurrences of the element [*] (it is a wildcard element).
So, the result of the needed function working with the above mentioned set S should be (not 100% accurate):
- [a,b],[c],[d,e],[f];
- [a,b],[c],[*],[d,e],[f];
- [a,b],[c],[d,e],[f],[*];
- [a,b],[c],[d,e],[f,*],[*];
So, basically I need an algorithm that does the following:
Take a sub-subset from subset 1.
Add one more sub-subset from subset 2, maintaining the list of 'unique' elements acquired so far (the check against the 'unique' list is skipped if the sub-subset contains the * element).
Repeat step 2 until N is reached.
In other words, I need to generate all possible 'chains' (pairs if N == 2, triples if N == 3), where each 'chain' contains each element from the list L exactly once, except the wildcard element *, which can occur many times in each generated chain.
I know how to do this with N == 2 (it is a simple pair generation), but I do not know how to enhance the algorithm to work with arbitrary values for N.
Maybe Stirling numbers of the second kind could help here, but I do not know how to apply them to get the desired result.
Note: The type of data structure to be used here is not important for me.
Note: This question has grown out of my previous similar question.
Here are some pointers (not complete code) that can probably take you in the right direction:
I don't think you will need any advanced data structures here (make use of Erlang list comprehensions). You should also explore Erlang's sets and lists modules; since you are dealing with sets and lists of sub-sets, they seem like an ideal fit.
Here is how things get solved easily with list comprehensions: [{X,Y} || X <- [[c],[d],[e,f]], Y <- [[a,b],[c,d]]]. Here I am simply generating a list of {X,Y} 2-tuples, but for your use case you will have to put the real logic here (including your wildcard case).
Further note that with list comprehensions, you can use the output of one generator as the input of a later generator, e.g. [{X,Y} || X1 <- [[c],[d],[e,f]], X <- X1, Y1 <- [[a,b],[c,d]], Y <- Y1].
Also, for removing duplicates from a list of things, L = ["a", "b", "a"], you can simply do sets:to_list(sets:from_list(L)).
With the above tools you can generate all possible chains and enforce your logic as the chains are generated.
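To make that concrete, here is a rough, untested sketch of the chain generation (function and variable names are mine; the wildcard must be written as the quoted atom '*' in Erlang). It walks the N subsets left to right, carries the non-wildcard elements used so far, prunes any sub-subset that would repeat one, and finally keeps only the chains that cover all of L:
chains(Subsets) -> chains(Subsets, [], []).

chains([], _Used, Acc) -> [lists:reverse(Acc)];
chains([Subset | Rest], Used, Acc) ->
    lists:append(
      [chains(Rest, [E || E <- Sub, E =/= '*'] ++ Used, [Sub | Acc])
       || Sub <- Subset,
          not lists:any(fun(E) -> E =/= '*' andalso lists:member(E, Used) end,
                        Sub)]).

%% keep only the chains whose non-wildcard elements are exactly L
full_chains(Subsets, L) ->
    Want = lists:sort([E || E <- L, E =/= '*']),
    [C || C <- chains(Subsets),
          lists:sort([E || Sub <- C, E <- Sub, E =/= '*']) =:= Want].
This picks exactly one sub-subset per subset, as in steps 1-3 above; if a chain may take several sub-subsets from the same subset, the recursion needs an extra loop over the current subset.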

Mathematica's TextRecognize not up to par

Please take a look at the screenshot below and see if you can tell me why this won't work. The examples on the reference page for TextRecognize look pretty impressive, and I don't think recognizing single letters like this should be a problem. I've tried resizing the letters as well as sharpening the image.
For convenience in case you want to try this yourself I have included the image that I use at the bottom of this post. You can also find plenty more like this by searching for "Wordfeud" in Google Image Search.
Very cool question!
TextRecognize uses heuristics to recognize whole words from the English language. This is the gotcha that makes recognizing single letters very hard.
Consider the following line of thought:
s = Import["http://i.stack.imgur.com/JHYuh.png"];
p = ImagePartition[s, 32]
Now pick letters to form the English word 'EXIT':
x = {p[[1, 13]], p[[6, 6]], p[[3, 13]], p[[1, 12]]}
Now clean up these images a bit, like so:
d = ImageAssemble[ Map[ImageTake[#, {3, 27}, {2, 20}] &, x ]];
Then this returns the string "EXIT":
TextRecognize[d]
This is an approach completely different from using TextRecognize, so I am posting it as a separate answer. It uses the same image recognition technique as the "How do I find Waldo with Mathematica?" question.
First get the puzzle:
wordfeud = Import["http://i.stack.imgur.com/JHYuh.png"]
And then get the pieces of the puzzle:
Grid[pieces = ImagePartition[wordfeud, 32]]
Let's be interested in the letter E:
LetterE = pieces[[4, 3]]
Get the correlation image:
correlation = ImageCorrelate[wordfeud, Binarize[LetterE], NormalizedSquaredEuclideanDistance]
And highlight the matches:
positions = Dilation[ColorNegate[Binarize[correlation, .1]], DiskMatrix[20]];
found = ImageMultiply[wordfeud, ImageAdd[ColorConvert[positions, "GrayLevel"], .5]]
As before, this requires a bit of tuning on binarizing the correlation image, but other than that this should help to identify bits and pieces of this puzzle.
I thought the quality of your image might be interfering. Binarizing your image did not help: recognition was zilch. I also tried a very sharp black-and-white image of a crossword puzzle solution (see below). Again, nothing was recognized, whether in regular or binarized format.
So I removed the black background leaving only the letters and their thin black frames. Again, recognition was about 0%.
When I removed the frames from around some of the letters AND binarized the image the only parts that were recognizable were those regions in which there was nothing but letters. (see below)
Notice in the output below, ANTS, TIRES, and TEXAS are correctly identified (as well as VECTORS), but just about nothing else.
Notice also that, even though the strings were widely spaced, mma interpreted them as words, rather than separate letters. Note "TEXAS" instead of "T E X A S".
TextRecognize[Binarize@img]
(* output *)
ANTS FFWWW FEEWF
E R o If IU I?
E A FI5F WWWFF 5
5552? L E F F
T s E NTT BT|
H0RWW#0WVlWF;EE F
5 W E ; OCS
FOFT W W R AL%AE
A TT I T ? _
i iE#W'NF WG%S W
A A EW F I i
SWWTW W ALTFCWD N
H A V 5 A F F
PLATT EWWLIGHT
W N E T
HE TIRES C
TEXAS VECTORS
I didn't have the patience to completely clean up the image. It would have been much faster to retype the text by hand.
Conclusion: Don't use text recognition in mma unless you have absolutely clear text against an even-colored, bright, preferably white, background.
The results also varied depending on the file format used. Avoid .pdf altogether.
Edit
acl captured and tried to recognize the last 5 lines (above Edit). His results (in a comment below): mostly gibberish.
I decided to do the same. But since Prashant warned that text size makes a difference, I zoomed in first so that the text appears (to my eyes) to be about 20 pica. Below is the picture of the text I scanned and TextRecognized.
Here's the result of an unbinarized TextRecognize (at that large size):
Gliii. Q lk-ii`t`*¥ if EY £\[CloseCurlyDoubleQuote]1\[Euro]'EE \
Di'¥C~E\"P ITF SKI' T»f}!E'!',IL:?E\[CloseCurlyDoubleQuote] I 2 VEEE5\
\[CloseCurlyQuote] LEP \"- \"VE
1. ur e=\\..r.1.»».»\\\\ rw r 1»»\\|a'*r | r .fm -»'-an \
\[OpenCurlyQuote] -.-rr -_.»~|-.'i~-.w~,.-- nv n.w~»-\
\[OpenCurlyDoubleQuote]~"
Now, here's the result for the TextRecognize of the binarized image. The original image was a .png from Jing.
I didn't have the patience to completely clean up the image. It would \
have been much faster to retype the
text by hand.
Conclusion: Don't use text recognition in mma unless you have \
absolutely clear text against an even-
colored, bright, preferrably white, background.
The results also varied depending on the file format used. Avoid .pdf \
altogether.

Creating a valid function declaration from a complex tuple/list structure

Is there a generic way, given a complex object in Erlang, to come up with a valid function declaration for it besides eyeballing it? I'm maintaining some code previously written by someone who was a big fan of giant structures, and doing it manually is proving to be error-prone.
I don't need to iterate the whole thing, just grab the top level, per se.
For example, I'm working on this right now -
[[["SIP",47,"2",46,"0"],32,"407",32,"Proxy Authentication Required","\r\n"],
[{'Via',
[{'via-parm',
{'sent-protocol',"SIP","2.0","UDP"},
{'sent-by',"172.20.10.5","5060"},
[{'via-branch',"z9hG4bKb561e4f03a40c4439ba375b2ac3c9f91.0"}]}]},
{'Via',
[{'via-parm',
{'sent-protocol',"SIP","2.0","UDP"},
{'sent-by',"172.20.10.15","5060"},
[{'via-branch',"12dee0b2f48309f40b7857b9c73be9ac"}]}]},
{'From',
{'from-spec',
{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"003018CFE4EF"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[{tag,"b7226ffa86c46af7bf6e32969ad16940"}]}},
{'To',
{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"3966"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[{tag,"a830c764"}]},
{'Call-ID',"90df0e4968c9a4545a009b1adf268605#172.20.10.15"},
{'CSeq',1358286,"SUBSCRIBE"},
["date",'HCOLON',
["Mon",44,32,["13",32,"Jun",32,"2011"],32,["17",58,"03",58,"55"],32,"GMT"]],
{'Contact',
[[{'name-addr',
[[]],
{'SIP-URI',
[{userinfo,{user,"3ComCallProcessor"},[]}],
{hostport,"172.20.10.11",[]},
{'uri-parameters',[]},
[]}},
[]],
[]]},
["expires",'HCOLON',3600],
["user-agent",'HCOLON',
["3Com",[]],
[['LWS',["VCX",[]]],
['LWS',["7210",[]]],
['LWS',["IP",[]]],
['LWS',["CallProcessor",[['SLASH',"v10.0.8"]]]]]],
["proxy-authenticate",'HCOLON',
["Digest",'LWS',
["realm",'EQUAL',['SWS',34,"3Com",34]],
[['COMMA',["domain",'EQUAL',['SWS',34,"3Com",34]]],
['COMMA',
["nonce",'EQUAL',
['SWS',34,"btbvbsbzbBbAbwbybvbxbCbtbzbubqbubsbqbtbsbqbtbxbCbxbsbybs",
34]]],
['COMMA',["stale",'EQUAL',"FALSE"]],
['COMMA',["algorithm",'EQUAL',"MD5"]]]]],
{'Content-Length',0}],
"\r\n",
["\n"]]
Maybe https://github.com/etrepum/kvc
I noticed your clarifying comment. I'd prefer to add a comment myself, but don't have enough karma. Anyway, the trick I use for that is to experiment in the shell: I iterate a pattern against a sample data structure until I've found the simplest form. You can use the _ match-all variable. I use an Erlang shell inside an Emacs shell window.
First, bind a sample to a variable:
A = [{a,b},[{c,d}, {e,f}]].
Now set the original structure against the variable:
[{a,b},[{c,d},{e,f}]] = A.
If you hit enter, you'll see they match. Hit Alt-p (Emacs calls it Meta, but it's Alt on my keyboard) to bring back the previous line. Replace some tuple or list item with an underscore:
[_,[{c,d},{e,f}]] = A.
Hit enter to make sure you did it right and they still match. This example is trivial, but for deeply nested, multiline structures it's trickier, so it's handy to be able to just quickly match to test. Sometimes you'll want to try to guess at whole huge swaths, like using an underscore to match a tuple list inside a tuple that's the third element of a list. If you place it right, you can match the whole thing at once, but it's easy to misread it.
Anyway, repeat to explore the essential shape of the structure and place real variables where you want to pull out values:
[_, [_, _]] = A.
[_, _] = A.
[_, MyTupleList] = A. %% let's grab this tuple list
[{MyAtom,b}, [{c,d}, MyTuple]] = A. %% or maybe we want this atom and tuple
That's how I efficiently dissect and pattern match complex data structures.
However, I don't know what you're doing. I'd be inclined to have a wrapper function that uses KVC to pull out exactly what you need and then distributes to helper functions from there for each type of structure.
If I understand you correctly, you want to pattern match some large data structures of unknown formatting.
Example:
Input: {a, b} {a,b,c,d} {a,[],{},{b,c}}
function({A, B}) -> do_something;
function({A, B, C, D}) when is_atom(B) -> do_something_else;
function({A, B, C, D}) when is_list(B) -> more_doing.
The generic answer is of course that it is undecidable from just data to know how to categorize that data.
First you should probably be aware of iolists. They are created by functions such as io_lib:format/2 and in many other places in the code.
One example is that
[["SIP",47,"2",46,"0"],32,"407",32,"Proxy Authentication Required","\r\n"]
will print as
SIP/2.0 407 Proxy Authentication Required
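You can check this in the shell (my transcript):
1> Status = [["SIP",47,"2",46,"0"],32,"407",32,"Proxy Authentication Required","\r\n"].
[["SIP",47,"2",46,"0"],32,"407",32,"Proxy Authentication Required","\r\n"]
2> io:format("~s", [Status]).
SIP/2.0 407 Proxy Authentication Required
ok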
So, I'd start with flattening all those lists, using a function such as
flatten_io(List) when is_list(List) ->
    Flat = lists:map(fun flatten_io/1, List),
    maybe_flatten(Flat);
flatten_io(Tuple) when is_tuple(Tuple) ->
    list_to_tuple([flatten_io(Element) || Element <- tuple_to_list(Tuple)]);
flatten_io(Other) -> Other.

maybe_flatten(L) when is_list(L) ->
    case lists:all(fun(Ch) when Ch > 0 andalso Ch < 256 -> true;
                      (List) when is_list(List) ->
                          lists:all(fun(X) -> X > 0 andalso X < 256 end, List);
                      (_) -> false
                   end, L) of
        true -> lists:flatten(L);
        false -> L
    end.
(Caveat: completely untested and quite inefficient. It will also crash on improper lists, but you shouldn't have those in your data structures anyway.)
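If it works as intended, a quick shell check (my example; assuming the two functions are compiled into a module named sip_util) should give:
1> sip_util:flatten_io([["SIP",47,"2",46,"0"],32,"407"]).
"SIP/2.0 407"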
On second thought, I can't help you. Any data structure that uses the atom 'COMMA' for a comma in a string should be taken out and shot.
You should be able to flatten those things as well and start to get a view of what you are looking at.
I know that this is not a complete answer. Hope it helps.
It's hard to recommend something for handling this.
Transforming all the structures into a saner and more minimal format looks like it's worth it; this depends mainly on the similarities between these structures.
Rather than having a special function for each of the 100 structures, there must be some automatic reformatting that can be done, maybe even putting the parts into records.
Once you have records, it's much easier to write functions for them, since you don't need to know the actual number of elements in the record. More important: your code won't break when the number of elements changes.
To summarize: put a barrier between your code and the insanity of these structures by sanitizing them with the most generic code possible. It will probably be a mix of generic reformatting and structure-specific handling.
As an example already visible in this struct: the 'name-addr' tuples look like they have a uniform structure. So you can recurse over your structures (over all elements of tuples and lists) and match for "things" that have a common structure like 'name-addr' and replace these with nice records.
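For instance (just a sketch; the record and field names are mine, not from your codebase), the repeated {'name-addr', Display, {'SIP-URI', ...}} shape could become:
-record(sip_uri, {userinfo, hostport, params = [], headers = []}).
-record(name_addr, {display = [], uri}).
After the rewrite, a match like #name_addr{uri = #sip_uri{hostport = HP}} = NameAddr pulls out the host/port without counting tuple elements.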
To help with the eyeballing, you can write yourself helper functions along the lines of this example:
eyeball(List) when is_list(List) ->
    io:format("List with length ~b\n", [length(List)]);
eyeball(Tuple) when is_tuple(Tuple) ->
    io:format("Tuple with ~b elements\n", [tuple_size(Tuple)]).
So you would get output like this:
2> eyeball({a,b,c}).
Tuple with 3 elements
ok
3> eyeball([a,b,c]).
List with length 3
ok
Expanding this into a useful tool for your use case is left as an exercise. You could handle multiple levels by recursing over the elements and indenting the output.
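An untested sketch of that recursive version (it replaces the one-level eyeball/1 above, and is naive about strings, which are just lists of integers in Erlang):
eyeball(Term) -> eyeball(Term, 0).

eyeball(List, Depth) when is_list(List) ->
    indent(Depth), io:format("List with length ~b\n", [length(List)]),
    lists:foreach(fun(E) -> eyeball(E, Depth + 2) end, List);
eyeball(Tuple, Depth) when is_tuple(Tuple) ->
    indent(Depth), io:format("Tuple with ~b elements\n", [tuple_size(Tuple)]),
    lists:foreach(fun(E) -> eyeball(E, Depth + 2) end, tuple_to_list(Tuple));
eyeball(Other, Depth) ->
    indent(Depth), io:format("~p\n", [Other]).

indent(N) -> io:format("~s", [lists:duplicate(N, $\s)]).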
Use pattern matching and functions that work on lists to extract only what you need.
Look at http://www.erlang.org/doc/man/lists.html:
keyfind, keyreplace, L = [H|T], ...

Splitting space-delimited entries into new columns in R

I am coding a survey that outputs a .csv file. Within this csv I have some entries that are space delimited, which represent multi-select questions (i.e. questions with more than one response). In the end I want to parse these space-delimited entries into their own columns and create headers for them so I know where they came from.
For example I may start with this (note that the multiselect columns have an _M after them):
Q1, Q2_M, Q3, Q4_M
6, 1 2 88, 3, 3 5 99
6, , 3, 1 2
and I want to go to this:
Q1, Q2_M_1, Q2_M_2, Q2_M_88, Q3, Q4_M_1, Q4_M_2, Q4_M_3, Q4_M_5, Q4_M_99
6, 1, 1, 1, 3, 0, 0, 1, 1, 1
6, , , , 3, 1, 1, 0, 0, 0
I imagine this is a relatively common issue to deal with, but I have not been able to find it in the R section. Any ideas how to do this in R after importing the .csv? My general thoughts (which often lead to inefficient programs) are that I can:
(1) pull column numbers that have the special suffix with grep()
(2) loop through (or use an apply) each of the entries in these columns and determine the levels of responses and then create columns accordingly
(3) loop through (or use an apply) and place indicators in appropriate columns to indicate presence of selection
I appreciate any help and please let me know if this is not clear.
I agree with ran2 and aL3Xa that you probably want to change the format of your data to have a different column for each possible response. However, if munging your dataset into a better format proves problematic, it is possible to do what you asked.
process_multichoice <- function(x) lapply(strsplit(x, " "), as.numeric)
q2 <- c("1 2 3 NA 4", "2 5")
processed_q2 <- process_multichoice(q2)
[[1]]
[1] 1 2 3 NA 4
[[2]]
[1] 2 5
The reason different columns for different responses are suggested is that it is still quite unpleasant trying to retrieve any statistics from the data in this form. Although you can do things like
# Number of reponses given
sapply(processed_q2, length)
#Frequency of each response
table(unlist(processed_q2), useNA = "ifany")
EDIT: One more piece of advice. Keep the code that processes your data separate from the code that analyses it. If you create any graphs, keep the code for creating them separate again. I've been down the road of mixing things together, and it isn't pretty. (Especially when you come back to the code six months later.)
I am not entirely sure what you are trying to do, or what your reasons are for coding it like this, so my advice is more general; just feel free to clarify and I will try to give a more concrete response.
1) You say that you are coding the survey on your own, which is great because it means you have influence over your .csv file. I would NEVER use different kinds of separators in the same .csv file. Just do the naming from the very beginning, just like you suggested in the second block.
Otherwise you might get into trouble with checkboxes, for example. Say someone checks 3 out of 5 possible answers and the next person checks only 1 (i.e. "don't know"). It will then be much harder to create a spreadsheet (data.frame) style view of the results, as opposed to having an empty field (which turns up as an NA in R) that only needs to be recoded.
2) Another important question is whether you intend to run a panel survey (i.e. a longitudinal study asking the same participants over and over again). That (among many other reasons) would be a good reason to think about saving your data to a MySQL database instead of a .csv. RMySQL can connect directly to the database and access its tables and, more importantly, its VIEWS.
Views really help with survey data since you can rearrange the data in different views, conditional on many different needs.
3) Besides all the personal opinion and experience, here's some (less biased) literature to get you started:
Complex Surveys: A Guide to Analysis Using R (Wiley Series in Survey Methodology)
The book is comparatively simple and leaves out panel surveys, but gives a lot of R code and examples, which should make for a practical start.
To avoid reinventing the wheel you might want to check out LimeSurvey, a pretty decent (not speaking of the templates :) ) tool for conducting surveys. The TYPO3 CMS extensions pbsurvey and ke_questionnaire should work well too (I have only tested pbsurvey).
Multiple-choice items should always be coded as separate variables. That is, if you have 5 alternatives and multiple choice, you should code them as i1, i2, i3, i4, i5, i.e. each one is a binary variable (0/1). I see that you have the values 3 5 99 for the Q4_M variable in the first example. Does that mean that you have 99 alternatives in an item? Ouch...
First you should go on and create separate variables for each alternative in a multiple choice item. That is, do:
# note that I follow your example with Q4_M variable
dtf_ins <- as.data.frame(matrix(0, nrow = nrow(<initial dataframe>), ncol = 99))
# name vars appropriately
names(dtf_ins) <- paste("Q4_M_", 1:99, sep = "")
Now you have a data.frame of 0s, so what you need to do is get 1s into the appropriate positions (this is a bit cumbersome); a function will do the job...
# first you gotta change spaces to commas and convert character variable to a numeric one
y <- paste("c(", gsub(" ", ", ", x), ")", sep = "")
z <- eval(parse(text = y))
# now you assign 1 according to the indexes in the z variable
dtf_ins[1, z] <- 1
And that's pretty much it... basically, you would like to reconsider creating a data.frame with _M variables, so you can write a function that does this insertion automatically. Avoid for loops!
Or, even better, create a matrix with logicals, and just do dtf[m] <- 1, where dtf is your multiple-choice data.frame, and m is matrix with logicals.
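A sketch of that matrix trick (untested; resp is my name for the list of per-respondent numeric index vectors, and it assumes no NAs among the indices and the same 99 alternatives as above):
# one row per respondent, one column per alternative
m <- matrix(FALSE, nrow = length(resp), ncol = 99)
# mark every (respondent, alternative) pair in a single vectorized assignment
m[cbind(rep(seq_along(resp), lengths(resp)), unlist(resp))] <- TRUE
# flip the matching cells of the 0-filled data.frame to 1
dtf_ins[m] <- 1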
I would like to help you more on this one, but I'm recuperating after a looong night! =) Hope that I've helped a bit! =)
Thanks for all the responses. I agree with most of you that this format is kind of silly but it is what I have to work with (survey is coded and going into use next week). This is what I came up with from all the responses. I am sure this is not the most elegant or efficient way to do it but I think it should work.
colnums <- grep("_M", colnames(dat))
responses <- nrow(dat)
for (i in colnums) {
  vec <- as.vector(dat[, i])                  # turn into vector
  b <- lapply(strsplit(vec, " "), as.numeric) # split up and turn into numeric
  vals <- sort(unique(unlist(b)))             # which values were used
  newcolnames <- paste(colnames(dat[i]), "_", vals, sep = "") # column names
  e <- matrix(nrow = responses, ncol = length(vals)) # new matrix for indicators
  colnames(e) <- newcolnames
  # next loop looks for responses and puts indicators in the correct places
  for (j in 1:responses) {
    e[j, ] <- ifelse(vals %in% b[[j]], 1, 0)
  }
  dat <- cbind(dat, e)
}
Suggestions for improvement are welcome.
