String pattern recognition without training set - parsing

I have multiple strings, which are created based on a few (mostly) known variables and a few unknown templates. I'd like to know what those templates were to extract the variable parts from these strings. After that I can relatively easily infer the meaning of each substring, so only the pattern recognition is the question here. For example:
"76 (q) h"
"a x q y 123"
"c x e y 73"
"3 (e) z"
...
# pattern recognition: examples -> templates
"{1} x {2} y {3}"
"{1} ({2}) {3}"
# clusters based on template type
"{1} x {2} y {3}" -> ["a x q y 123", "c x e y 73", ...]
"{1} ({2}) {3}" -> ["76 (q) h", "3 (e) z", ...]
# inference: substrings -> extracted variables
"76 (q) h" -> ["76", "q", "h"] -> {x: "h", y: "q", z: 76}
"a x q y 123" -> ["a", "q", "123"] -> {x: "a", y: "q", z: 123}
"c x e y 73" -> ["c", "e", "73"] -> {x: "c", y: "e", z: 73}
"3 (e) z" -> ["3", "e", "z"] -> {x: "z", y: "e", z: 3}
I have found a similar question: Intelligent pattern matching in string, but in my case there is no way to train the parser with positives. Any idea how to solve this?

It turned out that what I need is called sequential pattern mining. There are many algorithms, for example SPADE, PrefixSpan, CloSpan and BIDE. What I need is an algorithm that works with gaps too, or one that finds the frequent substrings, which I can then concatenate with wildcards. Selecting the proper pattern from the frequent closed patterns found is far from obvious. I am still working on it, but I am a lot closer now than I was 2 months ago.
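To make the "shared substrings joined by wildcards" idea concrete, here is a minimal Python sketch. It is not one of the mining algorithms named above: it simply aligns two strings that are assumed to already belong to the same cluster, using the standard difflib module, and it does not address the harder problem of choosing which strings belong together. The names infer_template, render and extract are hypothetical helpers made up for this illustration.

```python
import re
from difflib import SequenceMatcher

# Words, runs of whitespace, and single punctuation marks are separate tokens.
TOKEN = re.compile(r"\w+|\s+|[^\w\s]")

def infer_template(a, b):
    """Align two strings token by token; shared runs become literals,
    differing runs become placeholders (represented as None).
    Assumption: a and b were generated from the same template."""
    ta, tb = TOKEN.findall(a), TOKEN.findall(b)
    parts = []
    matcher = SequenceMatcher(None, ta, tb, autojunk=False)
    for tag, i1, i2, _, _ in matcher.get_opcodes():
        if tag == "equal":
            parts.append("".join(ta[i1:i2]))
        elif not parts or parts[-1] is not None:
            parts.append(None)
    return parts

def render(parts):
    """Show the template in the {1}, {2}, ... notation used in the question."""
    out, n = [], 0
    for p in parts:
        if p is None:
            n += 1
            out.append("{%d}" % n)
        else:
            out.append(p)
    return "".join(out)

def extract(parts, s):
    """Pull the variable substrings out of s, or return None if s does not fit."""
    pattern = "".join("(.+?)" if p is None else re.escape(p) for p in parts)
    m = re.fullmatch(pattern, s)
    return list(m.groups()) if m else None

template = infer_template("a x q y 123", "c x e y 73")
print(render(template))
print(extract(template, "a x q y 123"))
```

Pairing strings from different templates (for example "76 (q) h" with "a x q y 123") would produce a spurious template, which is exactly the pattern-selection problem described above.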


Erlang generic foldl/3 equivalent for binary trees

I need to write the equivalent of lists:foldl/3 for binary trees.
Every node is represented as:
[Value, LeftNode, RightNode]
So a tree with a root 2 and leaves 1 and 3 would look like this:
[2, [1, [], []], [3, [], []]]
Later I want to use that function to perform operations, such as summing up all positive values in that tree etc.
This is the function I wrote:
tree_foldl(_, Acc, []) -> Acc;
tree_foldl(Fun, Acc, [V, L, R]) ->
X1 = tree_foldl(Fun, Fun(V, Acc), L),
X2 = tree_foldl(Fun, Fun(L, X1), R),
X2.
The problem is that it's not truly generic. When we have, let's say, a tree with only a root, e.g.:
[2, [], []]
it calls both Fun(2, 0) and Fun([], 2) and when trying to sum up the values in the second case it tries to do [] + 2, which is an invalid expression. It also would break on more complicated cases.
I would greatly appreciate any help with fixing this function. Thank you.
First of all, you should use tuples rather than lists as nodes in this case. A list is a linked list, so reaching a later element means walking past the ones before it, while a tuple uses contiguous memory, so you can access any single element directly.
There's no need for Fun(L, X1). Fun should only be applied to a node's value, never to a subtree:
tree_foldl(_, Acc, []) -> Acc;
tree_foldl(Fun, Acc, [V, L, R]) ->
Acc0 = Fun(V, Acc),
Acc1 = tree_foldl(Fun, Acc0, L),
Acc2 = tree_foldl(Fun, Acc1, R),
Acc2.
If the node is empty, do nothing, else run the Fun on the node and recurse on both subtrees.
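For readers who want to see the traversal in action, here is the same pre-order fold transliterated to Python, purely as an illustration. It keeps the question's [Value, Left, Right] list representation with [] as the empty tree. The example tree contains a negative value, which is an assumption added here to show the "sum of positive values" use case.

```python
def tree_foldl(fun, acc, node):
    """Pre-order fold over a [Value, Left, Right] tree; [] is the empty tree."""
    if not node:
        return acc
    value, left, right = node
    acc = fun(value, acc)             # visit the node's value first
    acc = tree_foldl(fun, acc, left)  # then fold the left subtree
    return tree_foldl(fun, acc, right)

# Hypothetical example tree: root 2, leaves 1 and -3.
tree = [2, [1, [], []], [-3, [], []]]

# Sum only the positive values.
sum_positive = tree_foldl(lambda v, acc: acc + v if v > 0 else acc, 0, tree)
print(sum_positive)
```

Because fun only ever receives node values, the root-only tree [2, [], []] folds without ever touching an empty subtree.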

How do I merge the content of SPSS variables from different columns

I want to create five new variables, K1 K2 K3 K4 K5, which hold the non-empty contents of each row of the table below, in their order of entry, as shown in Fig 2.
SN  ID1 ID2 ID3 ID4 ID5 IE1 IE2 IE3 IE4 IE5
1   a   b   c           d   e
2   b   a       f       c   k
Fig 2
SN K1 K2 K3 K4 K5
1 a b c d e
2 b a f c k
Here's a possible way to do it:
(first recreating your example data to demonstrate on:)
data list list/ SN (f1) ID1 to ID5 IE1 to IE5 (10a1).
begin data
1, "a", "b", "c", , , "d", "e", , ,
2, "b", "a", , "f", , "c", "k", , ,
end data.
With the example data in place, you can run the following syntax, which will yield the results you expected:
string K1 to K5 (a1).
vector K=K1 to K5.
compute #x=1.
do repeat id=ID1 to IE5.
do if id<>"".
compute K(#x)=id. /* copy the value into the next free K slot */
compute #x=#x+1.
end if.
end repeat.
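The logic of the syntax above is: walk the ten source columns from left to right and copy each non-empty value into the next free K slot. The following Python sketch shows the same row-wise compaction outside SPSS, purely for illustration. The function name compact_left is hypothetical, and unlike the SPSS vector it silently truncates when a row holds more than five values.

```python
def compact_left(values, width=5):
    """Keep the non-empty values in order and pad with "" up to `width`."""
    kept = [v for v in values if v != ""]
    return (kept + [""] * width)[:width]

# The two example rows, in ID1..ID5, IE1..IE5 order.
row1 = ["a", "b", "c", "", "", "d", "e", "", "", ""]
row2 = ["b", "a", "", "f", "", "c", "k", "", "", ""]

for row in (row1, row2):
    print(compact_left(row))
```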

How to solve a system of equations separated by "and" in Maxima?

My student gives me an answer in the form:
x=4 and y=3
Now I want to find out what x and y are in Maxima, and give feedback. For example, "x is correct, but y is incorrect". I know that if the student gives the answer as a list, I can do:
solve([x=4, y=3], [x,y])
Is there a way to either convert this "and" expression to a list, or make Maxima find out directly what x and y are?
If the input expression is a string, then you can use parse_string:
a: "x=3 and y = 4"$
inpart(parse_string(a),1);
(%o1) x = 3
exp: map(lambda([i],inpart(parse_string(a),i)), [1,2]);
(%o2) [x = 3, y = 4]
solve(exp, [x,y]);
(%o3) [[x = 3, y = 4]]
I assume that you can obtain a Maxima expression from the input via parse_string or some other means.
Let e be the expression. Then subst("and"="[", e) returns an expression which has the operator "[" (i.e., a list) instead of "and".
Another way is to use split:
str:"x=4 and y=3";
spl:split(str,"and");
>>> ["x=4 "," y=3"]
eq:map(parse_string,spl);
>>> [x=4,y=3]

Insert Char at specific position in string Erlang

I wish to insert a character at a specific position in the string in Erlang.
E.g., suppose I wish to insert "," into the string "123456789" at positions 3, 5 and 7.
123456789 ~> 12,34,56,789
Any help Appreciated!! Thanks :)
Instead of giving a finished solution, I will show how you could easily find it yourself.
A. Define the input data: Str is the string to transform, Positions is the list of positions at which to insert.
simple_transform(Str,Positions)->
B. Split the problem into parts. What do we need? To iterate over the list while tracking each element and its index, to apply a function to each element, and to detect whether the index is in the list of positions. That's all. When you work with lists you will usually use the lists module from the standard library, so look through its documentation for suitable functions.
transform - that's lists:map/2
iterate (traversing from left to right) - that's lists:foldl/3
But since the combination of these two operations is very common, there is also a function that does both: lists:mapfoldl/3
detect - that's lists:member/2
Collect everything together:
simple_transform(Str,Positions)->
{List,_}=lists:mapfoldl(
fun(El,Acc)->case lists:member(Acc,Positions) of
true ->{[$,,El],Acc+1};
false ->{El,Acc+1} end end,1,Str),
lists:flatten(List).
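The mapfoldl approach above boils down to: walk the string with a 1-based index and emit the separator before any character whose index is in the positions list. Here is the same idea as a short Python sketch, given only as an illustration for comparison. The name insert_before and the sep parameter are hypothetical.

```python
def insert_before(s, positions, sep=","):
    """Insert `sep` before each character whose 1-based index is in `positions`."""
    wanted = set(positions)
    out = []
    for index, ch in enumerate(s, start=1):
        if index in wanted:
            out.append(sep)
        out.append(ch)
    return "".join(out)

print(insert_before("123456789", [3, 5, 7]))
```

As in the Erlang version, positions past the end of the string are ignored, and the positions do not need to be sorted.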
The following solutions require that the positions list be sorted low to high:
1) To insert a single-character string:
insert_test() ->
"a,b" = insert(",", "ab", [2]),
",a" = insert(",", "a", [1]),
"ab" = insert(",", "ab", [3]),
"a,b,c" = insert(",", "abc", [2,3]),
all_tests_passed.
insert([InsertChar], String, Positions) ->
insert(InsertChar, String, Positions, 1, []).
insert(InsertChar, [Char|Chars], [Index|Ps], Index, Acc) ->
insert(InsertChar, Chars, Ps, Index+1, [Char,InsertChar|Acc]);
insert(InsertChar, [Char|Chars], Ps, Index, Acc) ->
insert(InsertChar, Chars, Ps, Index+1, [Char|Acc] );
insert(_, [], _, _, Acc) ->
lists:reverse(Acc).
2) To insert a string of arbitrary length:
insert_test() ->
"a,b" = insert(",", "ab", [2]),
",a" = insert(",", "a", [1]),
"a--b" = insert("--", "ab", [2]),
"--ab" = insert("--", "ab", [1]),
"a--b--c" = insert("--", "abc", [2,3]),
all_tests_passed.
insert(InsertStr, Str, Positions) ->
insert(InsertStr, Str, Positions, 1, []).
insert(InsertStr, [Char|Chars], [Index|Ps], Index, Acc) ->
insert(InsertStr, Chars, Ps, Index+1, combine(InsertStr, Char, Acc) );
insert(InsertStr, [Char|Chars], Ps, Index, Acc) ->
insert(InsertStr, Chars, Ps, Index+1, [Char|Acc]);
insert(_, [], _, _, Acc) ->
lists:reverse(Acc).
combine_test() ->
",X" = lists:reverse( combine(",", $X, []) ),
"a,X" = lists:reverse( combine(",", $X, "a") ),
"ab--X" = lists:reverse( combine("--", $X, lists:reverse("ab") ) ),
all_tests_passed.
combine([], X, Acc) ->
[X|Acc];
combine([Char|Chars], X, Acc) ->
combine(Chars, X, [Char|Acc]).
If you're just looking to transform a string into a very specific format:
insert_commas(String) ->
string:join([string:substr(String, 1, 2), ",", string:substr(String, 3, 2), ",", string:substr(String, 5, 2), ",", string:substr(String, 7)], "").
module:insert_commas("123456789"). returns "12,34,56,789"

Backtracking Recursive Descent Parser for the following grammar

I am trying to figure out some details involving parsing expression grammars, and am stuck on the following question:
For the given grammar:
a = b Z
b = Z Z | Z
(where lower-case letters indicate productions, and uppercase letters indicate terminals).
Is the production "a" supposed to match against the string "Z Z"?
Here is the pseudo-code that I've seen the above grammar get translated to, where each production is mapped to a function that outputs two values. The first indicates whether the parse succeeded. And the second indicates the resulting position in the stream after the parse.
defn parse-a (i:Int) -> [True|False, Int] :
val [r1, i1] = parse-b(i)
if r1 : eat("Z", i1)
else : [false, i]
defn parse-b1 (i:Int) -> [True|False, Int] :
val [r1, i1] = eat("Z", i)
if r1 : eat("Z", i1)
else : [false, i]
defn parse-b2 (i:Int) -> [True|False, Int] :
eat("Z", i)
defn parse-b (i:Int) -> [True|False, Int] :
val [r1, i1] = parse-b1(i)
if r1 : [r1, i1]
else : parse-b2(i)
The above code will fail when trying to parse the production "a" on the input "Z Z". This is because the parsing function for "b" is incorrect. It greedily consumes both Z's in the input and succeeds, leaving nothing for a to parse. Is this what a parsing expression grammar is supposed to do? The pseudocode in Ford's thesis seems to indicate this.
Thanks very much.
-Patrick
In PEGs, disjunctions (alternatives) are indeed ordered. In Ford's thesis, the operator is written / and called "ordered choice", which distinguishes it from the | disjunction operator.
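That makes PEGs fundamentally different from CFGs. In particular, given PEG rules a -> b Z and b -> Z Z / Z, a will not match Z Z: the first alternative of b succeeds and consumes both Z's, and once an alternative has succeeded the parser never goes back to try the next one.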
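The ordered-choice behaviour can be checked with a small Python transcription of the question's pseudocode. This is only a sketch: the input is assumed to be a list of already-split tokens, and the function names mirror the pseudocode rather than any parser library.

```python
def eat(tokens, token, i):
    """Consume one expected token at position i, if present."""
    if i < len(tokens) and tokens[i] == token:
        return True, i + 1
    return False, i

def parse_b(tokens, i):
    """b = Z Z / Z   (ordered choice: try Z Z first, then Z)."""
    ok, j = eat(tokens, "Z", i)
    if ok:
        ok, j = eat(tokens, "Z", j)
        if ok:
            return True, j
    # Only reached when the first alternative failed.
    return eat(tokens, "Z", i)

def parse_a(tokens, i):
    """a = b Z.  If Z fails after b succeeded, b is NOT retried."""
    ok, j = parse_b(tokens, i)
    if ok:
        ok, j = eat(tokens, "Z", j)
        if ok:
            return True, j
    return False, i

print(parse_a(["Z", "Z"], 0))       # b takes both Z's, nothing left for a
print(parse_a(["Z", "Z", "Z"], 0))  # b takes two Z's, a takes the third
```

On the input Z Z the second alternative of b is never tried, because the first one already succeeded. A CFG with the same rules would accept Z Z.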
Thanks for your reply Rici.
I re-read Ford's thesis much more closely, and it reaffirms what you said. The PEG / operator is both ordered and greedy. So the rule presented above is supposed to fail.
-Patrick
