IMAP criteria with multiple ORs - imap

I am using gmail's IMAP API to search for mails.
I use the 'OR' criteria to search for different keywords. This works well if I go one level only, i.e. something like
'UID SEARCH OR (FROM "somebody#email.com") (TO "somebody#email.com")'
however, it does not work when I try a longer expression like
criteria OR criteria OR criteria or criteria
which translates to (as far as I understand the syntax)
'UID SEARCH OR ( OR (FROM "somebodyelse#email.com") (TO "somebodyelse#email.com")) ( OR (FROM "somebody#email.com") (TO "somebody#email.com"))'
to make it clear I basically want all messages that are either sent from or sent to ANY of a given list of emails
The error I get from the libray is
[ 'A34', 'BAD', 'Could', 'not', 'parse', 'command' ]
any suggestions?

Do not use parenthesis.
IMAP uses polish notation which does not need them:
UID SEARCH OR OR FROM one#mail.com TO one#mail.com OR FROM two#mail.com TO two#mail.com
There are cases where parenthesis are needed (as IMAP AND operator can take more than 2 operands):
https://www.limilabs.com/blog/imap-search-requires-parentheses

So if you have N expressions that you want to "OR" together, you would prefix those N expressions by N-1 "OR"s.
In Perl, for example, that would be:
$search = join " ", ("OR") x (#exprs - 1), #exprs;

Just to clarify Pawel's answer (posting since I lack the points to comment), as I didn't quite pick up the nuance:
OR is a prefix operator and can only take two operands (not more). So this translates into:
UID SEARCH OR FROM "somebodyelse#email.com" TO "somebodyelse#email.com"
If you have more than two, you need to add an OR:
e.g., 3 operands
UID SEARCH OR OR FROM "somebodyelse#email.com" TO "somebodyelse#email.com" FROM "somebody#email.com"
e.g., 4 operands
UID SEARCH OR OR OR FROM "somebodyelse#email.com" TO "somebodyelse#email.com" FROM "somebody#email.com" TO "somebody#email.com"
In other words, for four operands either:
Pawel's answer of :
OR (OR x x) (OR x x)
or what I outlined above:
(OR (OR (OR x x) x) x)
Should work.

Related

How to type AND in regex word matching

I'm trying to do a word search with regex and wonder how to type AND for multiple criteria.
For example, how to type the following:
(Start with a) AND (Contains p) AND (Ends with e), such as the word apple?
Input
apple
pineapple
avocado
Code
grep -E "regex expression here" input.txt
Desired output
apple
What should the regex expression be?
In general you can't implement and in a regexp (but you can implement then with .*) but you can in a multi-regexp condition using a tool that supports it.
To address the case of ands, you should have made your example starts with a and includes p and includes l and ends with e with input including alpine so it wasn't trivial to express in a regexp by just putting .*s in between characters but is trivial in a multi-regexp condition:
$ cat file
apple
pineapple
avocado
alpine
Using &&s will find both words regardless of the order of p and l as desired:
$ awk '/^a/ && /p/ && /l/ && /e$/' file
apple
alpine
but, as you can see, you can't just use .*s to implement and:
$ grep '^a.*p.*l.*e$' file
apple
If you had to use a single regexp then you'd have to do something like:
$ grep -E '^a.*(p.*l|l.*p).*e$' file
apple
alpine
two ways you can do it
all that "&&" is same as negating the totality of a bunch of OR's "||", so you can write the reverse of what you want.
at a single bit-level, AND is same as multiplication of the bits, which means, instead of doing all the && if u think it's overly verbose, you can directly "multiply" the patterns together :
awk '/^a/ * /p/ * /e$/'
so by multiplying them, you're doing the same as performing multiple logical ANDs all at once
(but only use the short hand if inputs aren't too gigantic, or when savings from early exit are known to be negligible.
don't think of them as merely regex patterns - it's easier for one to think of anything not inside an action block, what's typically referred to as pattern, as
any combination and collection of items that could be evaluated for a boolean outcome of TRUE or FALSE in the end
e.g. POSIX-compliant expressions that work in the space include
sprintf()
field assignments, etc
(even decrementing NR - if there's such a need)
but not
statements like next, print, printf(),
delete array etc, or any of the loop structures
surprisingly though, getline is directly doable
in the pattern space area (with some wrapper workaround)

Starting a parser for scheme language

I am writing a basic parser for a Scheme interpreter and here are the definitions I have set up to define the various type of tokens:
# 1. Parens
Type:
PAREN
Subtype:
LEFT_PAREN
Value:
'('
# 2. Operators (<=, =, +, ...)
Type:
OPERATOR
Subtype:
EQUALS
Value:
'='
Arity:
2
# 3. Types (2.5, "Hello", #f, etc.)
Type:
DATA
Subtype:
NUMBER
Value:
2.4
# 4. Procedures, builtins, and such
Type:
KEYWORD
Subtype:
BUILTIN
Value:
"set"
Arity:
2
PROCEDURE:
... // probably need a new class for this
Does the above seem like it's a good starting place? Are there some obvious things I'm missing here, or does this give me a "good-enough" foundation?
Your approach makes distinctions which really don't exist in the syntax of the language, and also makes decisions far too early. For example consider this program:
(let ((x 1))
(with-assignment-notes
(set! x 2)
(set! x 3)
x))
When I run this:
> (let ((x 1))
(with-assignment-notes
(set! x 2)
(set! x 3)
x))
setting x to 2
setting x to 3
3
In order for this to work with-assignment-notes has to somehow redefine what (set! ...) means in its body. Here's a hacky and probably incorrect (Racket) implementation of that:
(define-syntax with-assignment-notes
(syntax-rules (set!)
[(_ form ...)
(let-syntax ([rewrite/maybe
(syntax-rules (set!)
[(_ (set! var val))
(let ([r val])
(printf "setting ~A to ~A~%" 'var r)
(set! var r))]
[(_ thing)
thing])])
(rewrite/maybe form) ...)]))
So the critical features of any parser for a Lisp-family language are:
it should not make any decision about the semantics of the language that it can avoid making;
the structure it constructs must be available to the language itself as first-class objects;
(and optionally) the parser should be modifiable from the language itself.
As examples:
it is probably inevitable that the parser needs to make decisions about what is and is not a number and what sort of number it is;
it would be nice if it had default handling for strings, but this should ideally be controllable by the user;
it should make no decision at all about what, say (< x y) means but rather should return a structure representing it for interpretation by the language.
The reason for the last, optional, requirement is that Lisp-family languages are used by people who are interested in using them for implementing languages. Allowing the reader to be altered from within the language makes that hugely easier, since you don't have to start from scratch each time you want to make a language which is a bit like the one you started with but not completely.
Parsing Lisp
The usual approach to parsing Lisp-family languages is to have machinery which will turn a sequence of characters into a sequence of s-expressions consisting of objects which are defined by the language itself, notably symbols and conses (but also numbers, strings &c). Once you have this structure you then walk over it to interpret it as a program: either evaluating it on the fly or compiling it. Critically, you can also write programs which manipulate this structure itself: macros.
In 'traditional' Lisps such as CL this process is explicit: there is a 'reader' which turns a sequence of characters into a sequence of s-expressions, and macros explicitly manipulate the list structure of these s-expressions, after which the evaluator/compiler processes them. So in a traditional Lisp (< x y) would be parsed as (a cons of a symbol < and (a cons of a symbol x and (a cons of a symbol y and the empty list object)), or (< . (x . (y . ()))), and this structure gets handed to the macro expander and hence to the evaluator or compiler.
In Scheme it is a little more subtle: macros are specified (portably, anyway) in terms of rules which turn a bit of syntax into another bit of syntax, and it's not (I think) explicit whether such objects are made of conses & symbols or not. But the structure which is available to syntax rules needs to be as rich as something made of conses and symbols, because syntax rules get to poke around inside it. If you want to write something like the following macro:
(define-syntax with-silly-escape
(syntax-rules ()
[(_ (escape) form ...)
(call/cc (λ (c)
(define (escape) (c 'escaped))
form ...))]
[(_ (escape val ...) form ...)
(call/cc (λ (c)
(define (escape) (c val ...))
form ...))]))
then you need to be able to look into the structure of what came from the reader, and that structure needs to be as rich as something made of lists and conses.
A toy reader: reeder
Reeder is a little Lisp reader written in Common Lisp that I wrote a little while ago for reasons I forget (but perhaps to help me learn CL-PPCRE, which it uses). It is emphatically a toy, but it is also small enough and simple enough to understand: certainly it is much smaller and simpler than the standard CL reader, and it demonstrates one approach to solving this problem. It is driven by a table known as a reedtable which defines how parsing proceeds.
So, for instance:
> (with-input-from-string (in "(defun foo (x) x)")
(reed :from in))
(defun foo (x) x)
Reeding
To read (reed) something using a reedtable:
look for the next interesting character, which is the next character not defined as whitespace in the table (reedtables have a configurable list of whitespace characters);
if that character is defined as a macro character in the table, call its function to read something;
otherwise call the table's token reader to read and interpret a token.
Reeding tokens
The token reader lives in the reedtable and is responsible for accumulating and interpreting a token:
it accumulates a token in ways known to itself (but the default one does this by just trundling along the string handling single (\) and multiple (|) escapes defined in the reedtable until it gets to something that is whitespace in the table);
at this point it has a string and it asks the reedtable to turn this string into something, which it does by means of token parsers.
There is a small kludge in the second step: as the token reader accumulates a token it keeps track of whether it is 'denatured' which means that there were escaped characters in it. It hands this information to the token parsers, which allows them, for instance, to interpret |1|, which is denatured, differently to 1, which is not.
Token parsers are also defined in the reedtable: there is a define-token-parser form to define them. They have priorities, so that the highest priority one gets to be tried first and they get to say whether they should be tried for denatured tokens. Some token parser should always apply: it's an error if none do.
The default reedtable has token parsers which can parse integers and rational numbers, and a fallback one which parses a symbol. Here is an example of how you would replace this fallback parser so that instead of returning symbols it returns objects called 'cymbals' which might be the representation of symbols in some embedded language:
Firstly we want a copy of the reedtable, and we need to remove the symbol parser from that copy (having previously checked its name using reedtable-token-parser-names).
(defvar *cymbal-reedtable* (copy-reedtable nil))
(remove-token-parser 'symbol *cymbal-reedtable*)
Now here's an implementation of cymbals:
(defvar *namespace* (make-hash-table :test #'equal))
(defstruct cymbal
name)
(defgeneric ensure-cymbal (thing))
(defmethod ensure-cymbal ((thing string))
(or (gethash thing *namespace*)
(setf (gethash thing *namespace*)
(make-cymbal :name thing))))
(defmethod ensure-cymbal ((thing cymbal))
thing)
And finally here is the cymbal token parser:
(define-token-parser (cymbal 0 :denatured t :reedtable *cymbal-reedtable*)
((:sequence
:start-anchor
(:register (:greedy-repetition 0 nil :everything))
:end-anchor)
name)
(ensure-cymbal name))
An example of this. Before modifying the reedtable:
> (with-input-from-string (in "(x y . z)")
(reed :from in :reedtable *cymbal-reedtable*))
(x y . z)
After:
> (with-input-from-string (in "(x y . z)")
(reed :from in :reedtable *cymbal-reedtable*))
(#S(cymbal :name "x") #S(cymbal :name "y") . #S(cymbal :name "z"))
Macro characters
If something isn't the start of a token then it's a macro character. Macro characters have associated functions and these functions get called to read one object, however they choose to do that. The default reedtable has two-and-a-half macro characters:
" reads a string, using the reedtable's single & multiple escape characters;
( reads a list or a cons.
) is defined to raise an exception, as it can only occur if there are unbalanced parens.
The string reader is pretty straightforward (it has a lot in common with the token reader although it's not the same code).
The list/cons reader is mildly fiddly: most of the fiddliness is dealing with consing dots which it does by a slightly disgusting trick: it installs a secret token parser which will parse a consing dot as a special object if a dynamic variable is true, but otherwise will raise an exception. The cons reader then binds this variable appropriately to make sure that consing dots are parsed only where they are allowed. Obviously the list/cons reader invokes the whole reader recursively in many places.
And that's all the macro characters. So, for instance in the default setup, ' would read as a symbol (or a cymbal). But you can just install a macro character:
(defvar *qr-reedtable* (copy-reedtable nil))
(setf (reedtable-macro-character #\' *qr-reedtable*)
(lambda (from quote table)
(declare (ignore quote))
(values `(quote ,(reed :from from :reedtable table))
(inch from nil))))
And now 'x will read as (quote x) in *qr-reedtable*.
Similarly you could add a more compllicated macro character on # to read objects depending on their next character in the way CL does.
An example of the quote reader. Before:
> (with-input-from-string (in "'(x y . z)")
(reed :from in :reedtable *qr-reedtable*))
\'
The object it has returned is a symbol whose name is "'", and it didn't read beyond that of course. After:
> (with-input-from-string (in "'(x y . z)")
(reed :from in :reedtable *qr-reedtable*))
`(x y . z)
Other notes
Everything works one-character-ahead, so all of the various functions get the stream being read, the first character they should be interested in and the reedtable, and return both their value and the next character. This avoids endlessly unreading characters (and probably tells you what grammar class it can handle natively (obviously macro character parsers can do whatever they like so long as things are sane when they return).
It probably doesn't use anything which isn't moderately implementable in non-Lisp languages. Some
Macros will cause pain in the usual way, but the only one is define-token-parser. I think the solution to that is the usual expand-the-macro-by-hand-and-write-that-code, but you could probably help a bit by having an install-or-replace-token-parser function which dealt with the bookkeeping of keeping the list sorted etc.
You'll need a language with dynamic variables to implement something like the cons reeder.
it uses CL-PPCRE's s-expression representation of regexps. I'm sure other languages have something like this (Perl does) because no-one wants to write stringy regexps: they must have died out decades ago.
It's a toy: it may be interesting to read but it's not suitable for any serious use. I found at least one bug while writing this: there will be many more.

In a tree-sitter grammar, how do I match strings except for reserved keywords in identifiers?

This might be related to me not understanding the Keyword Extraction feature, which from the docs seems to be about avoiding an issue where no space exists between a keyword and the following expression. But say I have a fairly standard identifier regex for variable names, function names, etc.:
/\w*[A-Za-z]\w*/
How do I keep this from matching a reserved keyword like IF or ELSE or something like that? So this expression would produce an error:
int IF = 5;
while this would not:
int x = 5;
There is a pull request pending since 2019 to add an EXCLUDE feature, but this is not currently implemented as of time of writing this (April 2021 - if some time has passed and you're reading this, please do re-check this!). And since treesitter also does not support negative lookbehind in its regular expressions, this has to be handled at the semantic level. One thing you can do to make this check easier is to enumerate all your reserved words then add them as an alternative to your identifier regex:
keyword: $ => choice('IF', 'THEN', 'ELSE'),
name: $ => /\w*[A-Za-z]\w*/,
identifier: $ => choice($.keyword, $.name)
According to rule 4 of treesitter's match rules, in the expression int IF = 5; the IF token would match (identifier keyword) rather than (identifier name) since it is a more specific match. This means you can do an easy query for illegal (identifier keyword) nodes and surface the error to the user in your language server or from wherever it is you're using the treesitter grammar.
Note that this approach does run the risk of creating many conflicts between your (identifier keyword) match and the actual language constructs that use those keywords. If so, you'll have to handle the whole thing at the semantic level: scan all identifiers to check whether they're a reserved word.

Matching function in erlang based on string format

I have user information coming in from an outside source and I need to check if that user is active. Sometimes I have a User and a Server and other times I have User#Server. The former case is no problem, I just have:
active(User, Server) ->
do whatever.
What I would like to do with the User#Server case is something like:
active([User, "#", Server]) ->
active(User, Server).
Doesn't seem to work. When calling active in the erlang terminal with a#b for example, I get an error that there is no match. Any help would be appreciated!
You can tokenize the string to get the result:
active(UserString) ->
[User,Server] = string:tokens(UserString,"#"),
active(User,Server).
If you need something more elaborate, or with better handling of something like email addresses, it might then be time to delve into using regular expressions with the re module.
active(UserString) ->
RegEx = "^([\\w\\.-]+)#([\\w\\.-]+)$",
{match, [User,Server]} = re:run(UserString,RegEx,[{capture,all_but_first,list}]),
active(User,Server).
Note: The supplied Regex is hardly sufficient for email address validation, it's just an example that allows all alphanumeric characters including underscores (\\w), dots (\\.), and dashes (-) seperated by an at symbol. And it will fail if the match doesn't stretch the whole length of the string: (^ to $).
A note on the pattern matching, for the real solution to your problem I think #chops suggestions should be used.
When matching patterns against strings I think it's useful to keep in mind that erlang strings are really lists of integers. So the string "#" is actually the same as [64] (64 being the ascii code for #)
This means that you match pattern [User, "#", Server] will match lists like: [97,[64],98], but not "a#b" (which in list form is [97,64,98]).
To match the string you need to do [User,$#,Server]. The $ operator gives you the ascii value of the character.
However this match pattern limits the matching string to be 1 character followed by # and then one more character...
It can be improved by doing [User, $# | Server] which allows the server part to have arbitrary length, but the User variable will still only match one single character (and I don't see a way around that).

REBOL path operator vs division ambiguity

I've started looking into REBOL, just for fun, and as a fan of programming languages, I really like seeing new ideas and even just alternative syntaxes. REBOL is definitely full of these. One thing I noticed is the use of '/' as the path operator which can be used similarly to the '.' operator in most object-oriented programming languages. I have not programmed in REBOL extensively, just looked at some examples and read some documentation, but it isn't clear to me why there's no ambiguity with the '/' operator.
x: 4
y: 2
result: x/y
In my example, this should be division, but it seems like it could just as easily be the path operator if x were an object or function refinement. How does REBOL handle the ambiguity? Is it just a matter of an overloaded operator and the type system so it doesn't know until runtime? Or is it something I'm missing in the grammar and there really is a difference?
UPDATE Found a good piece of example code:
sp: to-integer (100 * 2 * length? buf) / d/3 / 1024 / 1024
It appears that arithmetic division requires whitespace, while the path operator requires no whitespace. Is that it?
This question deserves an answer from the syntactic point of view. In Rebol, there is no "path operator", in fact. The x/y is a syntactic element called path. As opposed to that the standalone / (delimited by spaces) is not a path, it is a word (which is usually interpreted as the division operator). In Rebol you can examine syntactic elements like this:
length? code: [x/y x / y] ; == 4
type? first code ; == path!
type? second code
, etc.
The code guide says:
White-space is used in general for delimiting (for separating symbols).
This is especially important because words may contain characters such as + and -.
http://www.rebol.com/r3/docs/guide/code-syntax.html
One acquired skill of being a REBOler is to get the hang of inserting whitespace in expressions where other languages usually do not require it :)
Spaces are generally needed in Rebol, but there are exceptions here and there for "special" characters, such as those delimiting series. For instance:
[a b c] is the same as [ a b c ]
(a b c) is the same as ( a b c )
[a b c]def is the same as [a b c] def
Some fairly powerful tools for doing introspection of syntactic elements are type?, quote, and probe. The quote operator prevents the interpreter from giving behavior to things. So if you tried something like:
>> data: [x [y 10]]
>> type? data/x/y
>> probe data/x/y
The "live" nature of the code would dig through the path and give you an integer! of value 10. But if you use quote:
>> data: [x [y 10]]
>> type? quote data/x/y
>> probe quote data/x/y
Then you wind up with a path! whose value is simply data/x/y, it never gets evaluated.
In the internal representation, a PATH! is quite similar to a BLOCK! or a PAREN!. It just has this special distinctive lexical type, which allows it to be treated differently. Although you've noticed that it can behave like a "dot" by picking members out of an object or series, that is only how it is used by the DO dialect. You could invent your own ideas, let's say you make the "russell" command:
russell [
x: 10
y: 20
z: 30
x/y/z
(
print x
print y
print z
)
]
Imagine that in my fanciful example, this outputs 30, 10, 20...because what the russell function does is evaluate its block in such a way that a path is treated as an instruction to shift values. So x/y/z means x=>y, y=>z, and z=>x. Then any code in parentheses is run in the DO dialect. Assignments are treated normally.
When you want to make up a fun new riff on how to express yourself, Rebol takes care of a lot of the grunt work. So for example the parentheses are guaranteed to have matched up to get a paren!. You don't have to go looking for all that yourself, you just build your dialect up from the building blocks of all those different types...and hook into existing behaviors (such as the DO dialect for basics like math and general computation, and the mind-bending PARSE dialect for some rather amazing pattern matching muscle).
But speaking of "all those different types", there's yet another weirdo situation for slash that can create another type:
>> type? quote /foo
This is called a refinement!, and happens when you start a lexical element with a slash. You'll see it used in the DO dialect to call out optional parameter sets to a function. But once again, it's just another symbolic LEGO in the parts box. You can ascribe meaning to it in your own dialects that is completely different...
While I didn't find any written definitive clarification, I did also find that +,-,* and others are valid characters in a word, so clearly it requires a space.
x*y
Is a valid identifier
x * y
Performs multiplication. It looks like the path operator is just another case of this.

Resources