Why no dynamic bit pattern in function argument? (Erlang)

I am experimenting with bit pattern matching in Erlang:
-module(test).
-export([test/2]).
% test(P, <<X:P, 0:1>>) ->
%     X.
test(P, X) ->
    <<Y:P, 0:1>> = X,
    Y.
When compiling the commented-out version of test/2 I get a complaint that "variable 'P' is unbound".
Is there any good reason for not allowing the first version to work the same as the second?

Because in the commented-out version P is a length: for it to work, Erlang would need to perform a double match, matching the value of the second parameter against a pattern which is itself undetermined until P is known.
The question you are asking in a clause pattern match is "is this the clause for me?" You can't 'pop into the clause' and then back out if it isn't.
In the second example X is bound before the match; you are committed to going into the clause, and if <<Y:P,0:1>> doesn't match X, well, crash time!

The reason is that arguments to the function are evaluated independently of each other. The correctness of variable bindings is only checked as a second step.
This means that in your first example P would be unbound when evaluating the second argument, which is against the rules of pattern matching. In contrast, in your second example, P is bound by the time the pattern match on the binary is evaluated.
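For illustration, here is how the working version behaves in the shell (a sketch; the exact error text varies by OTP release):

1> c(test).
{ok,test}
2> test:test(3, <<5:3, 0:1>>).   % bits 101 then 0, so Y is bound to 5
5
3> test:test(3, <<5:3, 1:1>>).   % trailing bit is 1, so the body match fails
** exception error: no match of right hand side value <<11:4>>
     in function  test:test/2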

How to invoke Erlang function with variable?

4> abs(1).
1
5> X = abs.
abs
6> X(1).
** exception error: bad function abs
7> erlang:X(1).
1
8>
Is there any particular reason why I have to use the module name when I invoke a function with a variable? This isn't going to work for me because, well, for one thing it is just way too much syntactic garbage and makes my eyes bleed. For another thing, I plan on invoking functions out of a list, something like (off the top of my head):
[X(1) || X <- [abs, f1, f2, f3...]].
Attempting to tack on various module names here is going to make the verbosity go through the roof, when the whole point of what I am doing is to reduce verbosity.
EDIT: Look here: http://www.erlangpatterns.org/chain.html. The guy has made a pipe-forward function. He invokes functions the same way I want to above, but his code doesn't work when I try to use it. From what I know, though, he is an experienced Erlang programmer; I saw him give a keynote at a conference (well, I saw it online).
Did this kind of thing used to work but not anymore? Surely there is a way I can do what I want - invoke these functions without all the verbosity and boilerplate.
EDIT: If I am reading the documentation right, it seems to imply that my example at the top should work (section 8.6) http://erlang.org/doc/reference_manual/expressions.html
I know abs is an atom, not a function. [...] Why does it work when the module name is used?
The documentation explains that (slightly reorganized):
ExprM:ExprF(Expr1,...,ExprN)
each of ExprM and ExprF must be an atom or an expression that
evaluates to an atom. The function is said to be called by using the
fully qualified function name.
ExprF(Expr1,...,ExprN)
ExprF must be an atom or evaluate to a fun.
If ExprF is an atom, the function is said to be called by using the implicitly qualified function name.
When using fully qualified function names, Erlang expects atoms or expressions that evaluate to atoms. In other words, you have to bind X to an atom: X = abs. That's exactly what you provide.
But in the second form, Erlang expects either an atom or an expression that evaluates to a fun. Notice that last word. In other words, if you do not use a fully qualified function name, you have to bind X to a fun: X = fun module:function/arity.
In the expression X = abs, abs is not a fun but an atom. If you thus want X to be a fun, you can define it like this:
X = fun erlang:abs/1.
or like this:
X = fun(N) -> abs(N) end.
Try:
X = fun(Number) -> abs(Number) end.
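Once X is bound to a fun rather than an atom, the list-comprehension idea from the question works too (a sketch; the anonymous fun stands in for your own single-argument functions):

1> [F(-1) || F <- [fun erlang:abs/1, fun(N) -> N + 1 end]].
[1,0]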
Updated:
After looking at the discussion more, it seems you want to apply multiple functions to some input.
There are two projects that I haven't used personally but have starred on GitHub, and they may be what you're looking for.
Both of these projects use parse transforms:
fun_chain https://github.com/sasa1977/fun_chain
pipeline https://github.com/stolen/pipeline
Pipeline is unique because it uses a special syntax:
Result = [fun1, mod2:fun2, fun3] (Arg1, Arg2).
Of course, you could also write your own function to do this, using a list of funs and applying each one to the previous output until you get the result.
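For instance, a minimal sketch of such a helper (the name chain is illustrative, not taken from either library):

%% Thread Input through a list of one-argument funs, left to right.
chain(Input, Funs) ->
    lists:foldl(fun(F, Acc) -> F(Acc) end, Input, Funs).

%% 1> chain(-5, [fun erlang:abs/1, fun(N) -> N * 2 end]).
%% 10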

Grammar: start: (a b)? a c; Input: a d. Which error is correct at position 2? 1. expected "b", "c" OR 2. expected "c"

Grammar:
rule: (a b)? a c ;
Input:
a d
Question: which error message is correct at position 2 for the given input?
1. expected "b", "c".
2. expected "c".
P.S.
I am writing a parser and face a dilemma: should the parser take into account that "b" is expected at that position, or not?
Error #1 (expected "b", "c") says that the input "a b" was expected; but because it is optional, it is not strictly expected, merely possible.
I don't know whether 'possible' is the same as 'expected' or not.
Which error message is better and correct, #1 or #2?
Thanks for answers.
P.S.
In the first approach I define the testing marker as a position limit:
if (_inputPos > testing) {
    _failure(_inputPos, _code[cp + {{OFFSET_RESULT}}]);
}
The limit is moved in optional expressions:
OPTIONAL_EXPRESSION:
    testing = _inputPos;
The "b" expression moves _inputPos above the testing position and records a failure at _inputPos.
In the second approach I can define the testing marker as a boolean flag:
if (!testing) {
    _failure(_inputPos, _code[cp + {{OFFSET_RESULT}}]);
}
In this case the "b" expression does not record a failure, because it is under test (inside an optional expression).
Which do you think is better and correct?
1. Testing defined as a specific position: if an expression fails above this position (_inputPos > testing), the failure is recorded, even inside an optional expression.
2. Testing defined as a flag: while the flag is set, failures are not taken into account. After executing the optional expression, the previous value of the flag (true or false) is restored (not reset!).
Also, failures are not taken into account if the rule does not fail; they are only reported if parsing fails as a whole.
P.S.
Changes of 06 Jan 2014:
This question was raised because it relates to two different problems.
First problem:
A parsing expression grammar (PEG) describes only three atomic items of input:
terminal symbol
nonterminal symbol
empty string
A PEG provides no such operation as lexical preprocessing, and thus it has no such element as the token.
Second problem:
What is a grammar? Can two grammars be considered equal if they accept the same input but produce different results?
Assume we have two grammars:
Grammar 1
rule <- type? identifier
Grammar 2
rule <- type identifier / identifier
They both accept the same input but produce (in PEG) different results.
Grammar 1 results:
{type : type, identifier : identifier}
{type : null, identifier : identifier}
Grammar 2 results:
{type : type, identifier : identifier}
{identifier : identifier}
Questions:
Are both grammars equal?
Is it painless to optimize grammars?
My answer to both questions is negative: not equal, and not painless.
But you may ask, "Why does this happen?"
I can answer you: "Because this is not a problem. This is a feature."
In PEG, a parser expression ALWAYS consists of these parts:
ORDERED_CHOICE => SEQUENCE => EXPRESSION
And this explanation is my answer to the question "Why does this happen?".
Another problem:
A PEG parser does not recognize WHITESPACE, because it has no tokens and no token separators.
Now look at this grammar (in short):
program <- WHITESPACE expr EOF
expr <- ruleX
ruleX <- 'X' WHITESPACE
WHITESPACE <- ' '?
EOF <- ! .
All PEG grammars are described in this manner: a first WHITESPACE at the beginning, and another WHITESPACE (often) at the end of each rule.
In this case, the optional WHITESPACE in PEG must be treated as expected.
But WHITESPACE does not mean only the space character. It may be more complex, [\t\n\r], and may even include comments.
But the main rule of error messages is the following: if it is not possible to display all expected elements (or at least one of the whole set of expected elements), then it is more correct to display nothing.
More precisely, an "unexpected" error message is required instead.
How would you display an expected WHITESPACE in PEG?
Parser error: expected WHITESPACE
Parser error: expected ' ', '\t', '\n', '\r'
And what about the start characters of comments? They may also be part of WHITESPACE in some grammars.
In this case, an optional WHITESPACE would crowd out all the other potential expected elements, because it is not possible to display WHITESPACE correctly in an error message: WHITESPACE is too complex to display.
Is this good or bad?
I think it is not bad, and it takes some tricks to hide this nature of PEG parsers.
So in my PEG parser I do not treat the inner expression at the first position of an optional (optional & zero_or_more) expression as expected.
But all other inner expressions (except the one at the first position) are treated as expected.
Example 1:
List<int list; // type? ident
Here "List<int" is a "type". But missing ">" is not at the first position in optional "type?".
This failure take into account and report as "expected '>'"
This is because we not skip "type" but enter into "type" and after really optional "List" we move position from first to next real "expected" (that already outside of testing position) element.
"List" was in "testing" position.
If inner expression (inside optional expression) "fits in the limitation" not continue at next position then it not assumed as the expected input.
From this assumption has been asked main question.
You must just take into account that we are talking about PEG parsers and their error messages.
Here is your grammar: start: (a b)? a c;
What is clear here is that after the first a there are two possible inputs: b or c. Your error message should not prioritize one over the other.
The basic idea when producing an error message for an invalid input is to find the farthest place where the parse failed (if your grammar were d | (a b)? a c, d would not be part of the error), determine all the possible inputs that could have let the parser advance, and say "expected '...' but got '...'". There are other approaches that try to recover the parser and force it to continue: if there is only one possible expected token, temporarily insert it into the token stream and continue as if it had been there all along. This leads to better error detection, as you can find errors beyond the point where the parser first stopped.
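A minimal sketch of that "farthest failure" bookkeeping (in Erlang, matching the other threads on this page; all names are illustrative): record every failed alternative as a {Position, Expected} pair, then report only the entries at the maximum position.

%% Keep only the expected tokens recorded at the farthest failure position.
farthest(Failures) ->
    Max = lists:max([Pos || {Pos, _} <- Failures]),
    {Max, lists:usort([E || {Pos, E} <- Failures, Pos =:= Max])}.

%% For input "a d" against (a b)? a c, both alternatives fail at position 2:
%% 1> farthest([{2, "b"}, {2, "c"}]).
%% {2,["b","c"]}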

Haskell/Parsec: How do you use the functions in Text.Parsec.Indent?

I'm having trouble working out how to use any of the functions in the Text.Parsec.Indent module provided by the indents package for Haskell, which is a sort of add-on for Parsec.
What do all these functions do? How are they to be used?
I can understand the brief Haddock description of withBlock, and I've found examples of how to use withBlock, runIndent and the IndentParser type here, here and here. I can also understand the documentation for the four parsers indentBrackets and friends. But many things are still confusing me.
In particular:
What is the difference between withBlock f a p and
do aa <- a
   pp <- block p
   return (f aa pp)
Likewise, what's the difference between withBlock' a p and do {a; block p}
In the family of functions indented and friends, what is ‘the level of the reference’? That is, what is ‘the reference’?
Again, with the functions indented and friends, how are they to be used? With the exception of withPos, it looks like they take no arguments and are all of type IParser () (IParser defined like this or this) so I'm guessing that all they can do is to produce an error or not and that they should appear in a do block, but I can't figure out the details.
I did at least find some examples on the usage of withPos in the source code, so I can probably figure that out if I stare at it for long enough.
<+/> comes with the helpful description “<+/> is to indentation sensitive parsers what ap is to monads” which is great if you want to spend several sessions trying to wrap your head around ap and then work out how that's analogous to a parser. The other three combinators are then defined with reference to <+/>, making the whole group unapproachable to a newcomer.
Do I need to use these? Can I just ignore them and use do instead?
The ordinary lexeme combinator and whiteSpace parser from Parsec will happily consume newlines in the middle of a multi-token construct without complaining. But in an indentation-style language, sometimes you want to stop parsing a lexical construct or throw an error if a line is broken and the next line is indented less than it should be. How do I go about doing this in Parsec?
In the language I am trying to parse, ideally the rules for when a lexical structure is allowed to continue on to the next line should depend on what tokens appear at the end of the first line or the beginning of the subsequent line. Is there an easy way to achieve this in Parsec? (If it is difficult then it is not something which I need to concern myself with at this time.)
So, the first hint is to take a look at IndentParser
type IndentParser s u a = ParsecT s u (State SourcePos) a
I.e. it's a ParsecT keeping an extra close watch on SourcePos, an abstract container which can be used to access, among other things, the current column number. So, it's probably storing the current "level of indentation" in SourcePos. That'd be my initial guess as to what "level of reference" means.
In short, indents gives you a new kind of Parsec which is context sensitive—in particular, sensitive to the current indentation. I'll answer your questions out of order.
(2) The "level of reference" is the "belief" referred in the current parser context state of where this indentation level starts. To be more clear, let me give some test cases on (3).
(3) In order to start experimenting with these functions, we'll build a little test runner. It'll run the parser with a string that we give it and then unwrap the inner State part using an initialPos which we get to modify. In code
import Text.Parsec
import Text.Parsec.Pos
import Text.Parsec.Indent
import Control.Monad.State
testParse :: (SourcePos -> SourcePos)
          -> IndentParser String () a
          -> String -> Either ParseError a
testParse f p src = fst $ flip runState (f $ initialPos "") $ runParserT p () "" src
(Note that this is almost runIndent, except I gave a backdoor to modify the initialPos.)
Now we can take a look at indented. By examining the source, I can tell it does two things. First, it'll fail if the current SourcePos column number is less-than-or-equal-to the "level of reference" stored in the SourcePos stored in the State. Second, it somewhat mysteriously updates the State SourcePos's line counter (not column counter) to be current.
Only the first behavior is important, to my understanding. We can see the difference here.
>>> testParse id indented ""
Left (line 1, column 1): not indented
>>> testParse id (spaces >> indented) " "
Right ()
>>> testParse id (many (char 'x') >> indented) "xxxx"
Right ()
So, in order to have indented succeed, we need to have consumed enough whitespace (or anything else!) to push our column position out past the "reference" column position. Otherwise, it'll fail saying "not indented". Similar behavior exists for the next three functions: same fails unless the current position and reference position are on the same line, sameOrIndented fails if the current column is strictly less than the reference column, unless they are on the same line, and checkIndent fails unless the current and reference columns match.
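Continuing in the same style (a sketch; the exact error wording may differ between indents versions):

>>> testParse id checkIndent ""
Right ()
>>> testParse id (spaces >> checkIndent) "   "
Left (line 1, column 4): indentation doesn't match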
withPos is slightly different. It's not just a IndentParser, it's an IndentParser-combinator—it transforms the input IndentParser into one that thinks the "reference column" (the SourcePos in the State) is exactly where it was when we called withPos.
This gives us another hint, btw. It lets us know we have the power to change the reference column.
(1) So now let's take a look at how block and withBlock work using our new, lower level reference column operators. withBlock is implemented in terms of block, so we'll start with block.
-- simplified from the actual source
block p = withPos $ many1 (checkIndent >> p)
So, block resets the "reference column" to be whatever the current column is and then consumes at least 1 parses from p so long as each one is indented identically as this newly set "reference column". Now we can take a look at withBlock
withBlock f a p = withPos $ do
    r1 <- a
    r2 <- option [] (indented >> block p)
    return (f r1 r2)
So, it resets the "reference column" to the current column, runs the a parser once, tries to parse an indented block of ps, then combines the results using f. Your implementation is almost correct, except that you need to use withPos to choose the correct "reference column".
Then, once you have withBlock, withBlock' = withBlock (\_ bs -> bs).
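As a quick demonstration in the same test style (a sketch; the crude item parser is mine, not from indents):

>>> let item = many1 letter <* spaces
>>> testParse id (withBlock (,) item item) "dogs\n  cats\n  mice\n"
Right ("dogs",["cats","mice"])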
(5) So, indented and friends are exactly the tools to doing this: they'll cause a parse to immediately fail if it's indented incorrectly with respect to the "reference position" chosen by withPos.
(4) Yes, don't worry about these guys until you learn how to use Applicative style parsing in base Parsec. It's often a much cleaner, faster, simpler way of specifying parses. Sometimes they're even more powerful, but if you understand Monads then they're almost always completely equivalent.
(6) And this is the crux. The tools mentioned so far can only do indentation failure if you can describe your intended indentation using withPos. Quickly, I don't think it's possible to specify withPos based on the success or failure of other parses... so you'll have to go another level deeper. Fortunately, the mechanism that makes IndentParsers work is obvious—it's just an inner State monad containing SourcePos. You can use lift :: MonadTrans t => m a -> t m a to manipulate this inner state and set the "reference column" however you like.
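For instance, a sketch of such a manipulation (setRefColumn is a made-up name, not part of indents; lift, modify, and setSourceColumn come from the imports already shown above):

-- Overwrite the reference column stored in the inner State SourcePos.
setRefColumn :: Column -> IndentParser String () ()
setRefColumn c = lift $ modify (\ref -> setSourceColumn ref c)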
Cheers!

Erlang matchspecs with tuple comparison

I want to use Erlang datetime values in the standard format {{Y,M,D},{H,Min,Sec}} in a Mnesia table for logging purposes, and to be able to select log entries by comparing them with constant start and end time tuples.
It seems that the matchspec guard compiler somehow confuses tuple values with guard sub-expressions. Evaluating ets:match_spec_compile(MatchSpec) fails for
MatchSpec = [
    { {'_', '$1', '$2'},
      [ {'==', '$2', {1,2}} ],
      ['$_'] }
]
but succeeds when I compare $2 with any non-tuple value.
Is there a restriction that match guards cannot compare tuple values?
I believe the answer is to use double braces when using tuples (see Variables and Literals section of http://www.erlang.org/doc/apps/erts/match_spec.html#id69408). So to use a tuple in a matchspec expression, surround that tuple with braces, as in,
{'==','$2',{{1,2}}}
So, if I understand your example correctly, you would have
22> M=[{{'_','$1','$2'},[{'==','$2',{{1,2}}}],['$_']}].
[{{'_','$1','$2'},[{'==','$2',{{1,2}}}],['$_']}]
23> ets:match_spec_run([{1,1,{1,2}}],ets:match_spec_compile(M)).
[{1,1,{1,2}}]
24> ets:match_spec_run([{1,1,{2,2}}],ets:match_spec_compile(M)).
[]
EDIT: (sorry to edit your answer but this was the easiest way to get my comment in a readable form)
Yes, this is how it must be done. An easier way to get the match-spec is to use the (pseudo) function ets:fun2ms/1 which takes a literal fun as an argument and returns the match-spec. So
10> ets:fun2ms(fun ({A,B,C}=X) when C == {1,2} -> X end).
[{{'$1','$2','$3'},[{'==','$3',{{1,2}}}],['$_']}]
The shell recognises ets:fun2ms/1; in modules you need to include ms_transform.hrl for it to work. For more information see the ETS documentation. Mnesia uses the same match specs as ETS.
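Applied to the original datetime question, a sketch (assuming log entries are stored as {log, Key, Time, Msg} tuples; adjust the pattern to your actual record layout):

%% Select all January 2014 entries; fun2ms double-braces the constant
%% tuples in the guard for us.
MS = ets:fun2ms(
       fun({log, _Key, T, _Msg} = Row)
             when T >= {{2014,1,1},{0,0,0}},
                  T < {{2014,2,1},{0,0,0}} ->
           Row
       end),
Entries = mnesia:dirty_select(log, MS).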

REBOL path operator vs division ambiguity

I've started looking into REBOL, just for fun, and as a fan of programming languages, I really like seeing new ideas and even just alternative syntaxes. REBOL is definitely full of these. One thing I noticed is the use of '/' as the path operator which can be used similarly to the '.' operator in most object-oriented programming languages. I have not programmed in REBOL extensively, just looked at some examples and read some documentation, but it isn't clear to me why there's no ambiguity with the '/' operator.
x: 4
y: 2
result: x/y
In my example, this should be division, but it seems like it could just as easily be the path operator if x were an object or function refinement. How does REBOL handle the ambiguity? Is it just a matter of an overloaded operator and the type system so it doesn't know until runtime? Or is it something I'm missing in the grammar and there really is a difference?
UPDATE Found a good piece of example code:
sp: to-integer (100 * 2 * length? buf) / d/3 / 1024 / 1024
It appears that arithmetic division requires whitespace, while the path operator requires no whitespace. Is that it?
This question deserves an answer from the syntactic point of view. In Rebol there is, in fact, no "path operator": x/y is a syntactic element called a path. By contrast, the standalone / (delimited by spaces) is not a path; it is a word (which is usually interpreted as the division operator). In Rebol you can examine syntactic elements like this:
length? code: [x/y x / y] ; == 4
type? first code ; == path!
type? second code ; == word!
and so on: the standalone / in the block is itself a word!, not an operator.
The code guide says:
White-space is used in general for delimiting (for separating symbols).
This is especially important because words may contain characters such as + and -.
http://www.rebol.com/r3/docs/guide/code-syntax.html
One acquired skill of being a REBOler is to get the hang of inserting whitespace in expressions where other languages usually do not require it :)
Spaces are generally needed in Rebol, but there are exceptions here and there for "special" characters, such as those delimiting series. For instance:
[a b c] is the same as [ a b c ]
(a b c) is the same as ( a b c )
[a b c]def is the same as [a b c] def
Some fairly powerful tools for doing introspection of syntactic elements are type?, quote, and probe. The quote operator prevents the interpreter from giving behavior to things. So if you tried something like:
>> data: [x [y 10]]
>> type? data/x/y
== integer!
>> probe data/x/y
10
The "live" nature of the code would dig through the path and give you an integer! of value 10. But if you use quote:
>> data: [x [y 10]]
>> type? quote data/x/y
== path!
>> probe quote data/x/y
data/x/y
Then you wind up with a path! whose value is simply data/x/y, it never gets evaluated.
In the internal representation, a PATH! is quite similar to a BLOCK! or a PAREN!. It just has this special distinctive lexical type, which allows it to be treated differently. Although you've noticed that it can behave like a "dot" by picking members out of an object or series, that is only how it is used by the DO dialect. You could invent your own ideas, let's say you make the "russell" command:
russell [
x: 10
y: 20
z: 30
x/y/z
(
print x
print y
print z
)
]
Imagine that in my fanciful example, this outputs 30, 10, 20...because what the russell function does is evaluate its block in such a way that a path is treated as an instruction to shift values. So x/y/z means x=>y, y=>z, and z=>x. Then any code in parentheses is run in the DO dialect. Assignments are treated normally.
When you want to make up a fun new riff on how to express yourself, Rebol takes care of a lot of the grunt work. So for example the parentheses are guaranteed to have matched up to get a paren!. You don't have to go looking for all that yourself, you just build your dialect up from the building blocks of all those different types...and hook into existing behaviors (such as the DO dialect for basics like math and general computation, and the mind-bending PARSE dialect for some rather amazing pattern matching muscle).
But speaking of "all those different types", there's yet another weirdo situation for slash that can create another type:
>> type? quote /foo
== refinement!
This is called a refinement!, and happens when you start a lexical element with a slash. You'll see it used in the DO dialect to call out optional parameter sets to a function. But once again, it's just another symbolic LEGO in the parts box. You can ascribe meaning to it in your own dialects that is completely different...
While I didn't find any definitive written clarification, I did find that +, -, * and others are valid characters in a word, so the division operator clearly requires surrounding spaces.
x*y
is a valid identifier, while
x * y
performs multiplication. It looks like the path operator is just another case of this.
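A quick console illustration (Rebol 2 style; output abbreviated):

>> 4 / 2        ; spaces: / is a word, evaluated as division
== 2
>> blk: [x 10 y 20]
>> blk/y        ; no spaces: a path! that picks out of the block
== 20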
