Haskell filter option concatenation - parsing

exchangeSymbols str = filter (Char.isAlphaNum) (replaceStr str " " "_")
The code above (called, for example, with "a§ b$ c. 1. 2. 3/") is supposed to first replace all spaces with "_" and then filter the String with Char.isAlphaNum. Unfortunately, Char.isAlphaNum also removes the "_" characters that were just inserted. That isn't my intention, because I want to keep the "_". So I thought it would be nice to just add an exception to the filter, like this:
exchangeSymbols str = filter (Char.isAlphaNum && /='_') (replaceStr str " " "_")
Note the added && /='_'. It produces a parse error, so it is obviously not that easy to combine filter conditions. Is there a smart workaround? I thought about wrapping the filter function, something like 1000 times, with each recursion adding a new filter test ((/='!'), (/='§') and so on) but without adding (/='_'). However, that doesn't seem like a handy solution.

Writing
... filter (Char.isAlphaNum && /='_') ...
is actually a type error. It probably shows up as a parse error because you used /= as a prefix operator, but it's an infix operator. You cannot combine functions with (&&), since it's an operator on Booleans, not on functions.
Actually, this code snippet should read:
... filter (\c -> Char.isAlphaNum c && c /= '_') ...
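Below is a minimal, self-contained sketch of the lambda approach. It assumes Data.Char is imported qualified as Char. The question's replaceStr isn't shown, so spacesToUnderscores below is a hypothetical stand-in for it. Note that with && the underscore is still dropped, because Char.isAlphaNum '_' is False. To keep the underscore, the lambda has to use || instead:

```haskell
import qualified Data.Char as Char

-- Hypothetical stand-in for the question's replaceStr (not shown there):
-- replace every space with an underscore.
spacesToUnderscores :: String -> String
spacesToUnderscores = map (\c -> if c == ' ' then '_' else c)

-- The lambda applies both tests to the same character c.
-- Using || keeps '_' even though Char.isAlphaNum '_' is False.
exchangeSymbols :: String -> String
exchangeSymbols str = filter (\c -> Char.isAlphaNum c || c == '_') (spacesToUnderscores str)
```

With this definition, exchangeSymbols "a§ b$ c. 1. 2. 3/" gives "a_b_c_1_2_3".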

Replace your filter with a list comprehension.
[x | x <- replaceStr str " " "_", x /= '_', Char.isAlphaNum x]
Naturally, you probably want to have multiple exceptions, so define a helper function:
notIn :: (Eq a) => [a] -> a -> Bool
notIn [] _ = True
notIn (x:xs) y = if x == y
  then False
  else notIn xs y
EDIT: Apparently you can use notElem :: (Eq a) => a -> [a] -> Bool instead. I'm leaving the code above for educational purposes.
And use that in your list comprehension:
[x | x <- replaceStr str " " "_", notElem x "chars to reject", Char.isAlphaNum x]
Untested, as Haskell isn't installed on this machine. Bonus points if you are doing a map after the filter, since you can then put the map into the list comprehension as well.
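For the "bonus points" case, here is a sketch that does the filtering and the mapping in one comprehension, so the question's replaceStr isn't needed at all. exchangeSymbolsLC is an illustrative name. It keeps alphanumerics, spaces and underscores, and then turns each kept space into '_':

```haskell
import qualified Data.Char as Char

-- Filter and map in a single comprehension: keep alphanumerics, spaces and
-- existing underscores, then rewrite each kept space as '_'.
exchangeSymbolsLC :: String -> String
exchangeSymbolsLC str =
  [ if c == ' ' then '_' else c
  | c <- str
  , Char.isAlphaNum c || c `elem` " _" ]
```

This gives the same result as replacing first and filtering afterwards. For example, exchangeSymbolsLC "a§ b$ c. 1. 2. 3/" is "a_b_c_1_2_3".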
Edit 2: Try this instead. I had followed in your footsteps instead of thinking it through myself:
[x | x <- replaceStr str " " "_", Char.isAlphaNum x || x == '_']
[x | x <- replaceStr str " " "_", Char.isAlphaNum x || x `elem` "chars to accept"]
At this point the list comprehension doesn't help much. The only reason I changed it was that you asked for an &&, and a list comprehension is great for that.
Since it seems that you don't quite understand the principle of the list comprehension: it's basically a bunch of filters followed by a map, possibly drawing from more than one source. For example:
[(x, y, x + y) | x <- [1, 2, 3, 4, 5], y <- [2, 4], x > y]
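To make the "filters plus map" reading concrete, here is that comprehension next to a hand-written equivalent that uses concatMap, filter and map. The names pairs and pairsDesugared are just for illustration. Both evaluate to [(3,2,5),(4,2,6),(5,2,7),(5,4,9)]:

```haskell
-- The comprehension from the answer: x and y behave like nested loops,
-- and the guard x > y is checked before the tuple is built.
pairs :: [(Int, Int, Int)]
pairs = [(x, y, x + y) | x <- [1, 2, 3, 4, 5], y <- [2, 4], x > y]

-- The same computation written out with concatMap, filter and map.
pairsDesugared :: [(Int, Int, Int)]
pairsDesugared =
  concatMap (\x -> map (\y -> (x, y, x + y)) (filter (\y -> x > y) [2, 4])) [1, 2, 3, 4, 5]
```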

Related

Why does this List comprehension take too long to evaluate on Erlang/OTP 20?

I want to find any 5 numbers whose sum is 100. This can be done with a loop, but I was illustrating list comprehensions to a friend, only to realize that this takes more than 30 minutes on my MacBook Pro (Core i7, 2.2 GHz):
[[A,B,C,D,E] || A <- lists:seq(1,100),B <- lists:seq(1,100),C <- lists:seq(1,100),D <- lists:seq(1,100),E <- lists:seq(1,100),(A + B + C + D + E) == 100]
And if the question is changed to require the 5 numbers to be consecutive, the list comprehension I construct takes even longer. If I am to solve this problem using a list comprehension, am I doing it right? If so, why does it take so long? Please provide a solution that may be faster, perhaps using a loop.
The multiple generators behave like nested loops over the lists, and each lists:seq() call is evaluated again on every iteration of the generators outside it. This takes a very long time, and most of that time is spent allocating list cells and garbage collecting them again. But since they all evaluate to the same constant list anyway, you can rewrite it as L = lists:seq(1,100), [[A,B,C,D,E] || A <- L,B <- L,C <- L,D <- L,E <- L,(A + B + C + D + E) == 100]. Also, running this in the shell will be a lot slower than running it in a compiled module. On my MacBook, the compiled code finished in about 2 min 30 s, using just a single core. Compiling with [native] makes it run in 60 seconds flat.
Because it generates all 100^5 = 10^10 candidate lists of 5 elements (50000000000 elements in total), and the filter rejects almost all of them.
[edit]
I reviewed the answers from RichardC and Alexey Romanov and decided to run some tests:
-module(testlc).
-export([test/1]).
test(N) ->
    F1 = fun() -> [{W,X,Y,Z} || W <- lists:seq(1,N), X <- lists:seq(1,N), Y <- lists:seq(1,N), Z <- lists:seq(1,N), W+X+Y+Z == N] end,
    F2 = fun() -> L = lists:seq(1,N), [{W,X,Y,Z} || W <- L, X <- L, Y <- L, Z <- L, W+X+Y+Z == N] end,
    F3 = fun() -> [{W,X,Y,Z} || W <- lists:seq(1,N-3), X <- lists:seq(1,N-2-W), Y <- lists:seq(1,N-1-W-X), Z <- lists:seq(1,N-W-X-Y), W+X+Y+Z == N] end,
    F4 = fun() -> [{W,X,Y,N-W-X-Y} || W <- lists:seq(1,N-3), X <- lists:seq(1,N-2-W), Y <- lists:seq(1,N-1-W-X)] end,
    F5 = fun() -> L = lists:seq(1,N), [{W,X,Y,N-W-X-Y} || W <- L,
                                           XM <- [N-2-W], X <- L, X =< XM,
                                           YM <- [N-1-W-X], Y <- L, Y =< YM] end,
    {T1,L1} = timer:tc(F1),
    {T2,L2} = timer:tc(F2),
    {T3,L3} = timer:tc(F3),
    {T4,L4} = timer:tc(F4),
    {T5,L5} = timer:tc(F5),
    _L = lists:sort(L1),
    _L = lists:sort(L2),
    _L = lists:sort(L3),
    _L = lists:sort(L4),
    _L = lists:sort(L5),
    {test_for,N,{t1,T1},{t2,T2},{t3,T3},{t4,T4},{t5,T5}}.
and the results (times in microseconds, as returned by timer:tc/1):
1> c(testlc).
{ok,testlc}
2> testlc:test(50).
{test_for,50,
{t1,452999},
{t2,92999},
{t3,32000},
{t4,0},
{t5,0}}
3> testlc:test(100).
{test_for,100,
{t1,4124992},
{t2,1452997},
{t3,203000},
{t4,16000},
{t5,15000}}
4> testlc:test(150).
{test_for,150,
{t1,20312959},
{t2,7483985},
{t3,890998},
{t4,93000},
{t5,110000}}
5> testlc:test(200).
{test_for,200,
{t1,63874875},
{t2,24952951},
{t3,2921995},
{t4,218999},
{t5,265000}}
Preparing the list outside of the list comprehension has a big impact, but it is even more effective to drastically limit the number of useless intermediate values generated before the filter runs, so there is a balance to evaluate. In this example, the two improvements can be combined (thanks to Alexey), but that does not make a big difference.
Erlang is strong at concurrency, so you can also spawn 100 processes, one for each value of A in [1,...,100], which makes the calculation much more manageable for your laptop. For example:
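%% func/6 must be exported (e.g. -export([do/0, func/6]).) so that spawn/3 can call it.
do() ->
    L100 = lists:seq(1,100),
    %% one worker process per value of A
    [spawn(?MODULE, func, [self(), [A], L100, L100, L100, L100]) ||
        A <- L100],
    loop(100, []).

%% collect one {ok, Result} message from each of the N workers
loop(0, Acc) -> Acc;
loop(N, Acc) ->
    receive
        {ok, Result} ->
            loop(N - 1, Acc ++ Result)
    end.

func(Pid, LA, LB, LC, LD, LE) ->
    Result = [[A,B,C,D,E] ||
                 A <- LA, B <- LB, C <- LC, D <- LD, E <- LE, (A + B + C + D + E) == 100],
    Pid ! {ok, Result}.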
With the solution above, my laptop (i3, 2.1 GHz) finishes the calculation in about 1 minute. You can also spawn more processes to shorten the calculation further. Erlang processes are lightweight, so they are cheap to start and stop.
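Here is the same split-the-work idea, sketched in Haskell as an analogue rather than the answer's Erlang code. It uses forkIO and MVar from base as stand-ins for spawn and receive. The names solutionsFor and parallelFiveSum are hypothetical, and each worker uses bounded generators instead of enumerating all five lists in full. Without the threaded runtime (compile with ghc -threaded and run with +RTS -N), the threads share a single core:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Control.Monad (forM)

-- Every solution whose first element is a. The remaining generators are
-- bounded, and e is derived from the others rather than enumerated.
solutionsFor :: Int -> Int -> [[Int]]
solutionsFor n a =
  [ [a, b, c, d, e]
  | b <- [1 .. n - a]
  , c <- [1 .. n - a - b]
  , d <- [1 .. n - a - b - c]
  , let e = n - a - b - c - d
  , e > 0 ]

-- One lightweight thread per value of a. Each thread hands its result back
-- through its own MVar, similar to the Erlang workers sending {ok, Result}.
parallelFiveSum :: Int -> IO [[Int]]
parallelFiveSum n = do
  boxes <- forM [1 .. n] $ \a -> do
    box <- newEmptyMVar
    _ <- forkIO $ do
      let rs = solutionsFor n a
      _ <- evaluate (length rs)  -- force the list spine inside the worker
      putMVar box rs
    return box
  concat <$> mapM takeMVar boxes
```

Collecting the results with takeMVar in the order of a keeps the output deterministic. The Erlang receive loop, by contrast, appends results in whatever order the processes finish.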
One option would be
[[A,B,C,D,100-A-B-C-D] || A <- lists:seq(1,100), B <- lists:seq(1,100-A), C <- lists:seq(1,100-A-B), D <- lists:seq(1,100-A-B-C), 100-A-B-C-D > 0]
Simply not enumerating every possible value of E, when at most one of them can succeed, should make it about 100 times faster (or more, since less garbage is produced). Shrinking the lists for B, C, and D improves it even further.
But there is some code duplication there. Unfortunately, Erlang doesn't allow "local" variables in list comprehensions, but you can emulate them with one-element generators:
[[A,B,C,D,E] || A <- lists:seq(1,100),
BMax <- [100-A], B <- lists:seq(1,BMax),
CMax <- [BMax-B], C <- lists:seq(1,CMax),
DMax <- [CMax-C], D <- lists:seq(1,DMax),
E <- [100-A-B-C-D], E > 0]
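For comparison, Haskell list comprehensions do allow local bindings with let, so the one-element-generator trick isn't needed there. Below is a sketch of the bounded version in Haskell (fiveSum is an illustrative name) that mirrors BMax, CMax and DMax:

```haskell
-- All lists of five positive integers summing to n. Each 'let' names a
-- bound once, like the BMax/CMax/DMax one-element generators in Erlang.
fiveSum :: Int -> [[Int]]
fiveSum n =
  [ [a, b, c, d, e]
  | a <- [1 .. n]
  , let bMax = n - a
  , b <- [1 .. bMax]
  , let cMax = bMax - b
  , c <- [1 .. cMax]
  , let dMax = cMax - c
  , d <- [1 .. dMax]
  , let e = dMax - d
  , e > 0 ]
```

length (fiveSum 100) should be 3764376. That is the number of ways to write 100 as an ordered sum of five positive integers, C(99,4).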
Or, to avoid repeated lists:seq calls, as @RichardC points out:
L = lists:seq(1, 100),
[[A,B,C,D,E] || A <- L,
BMax <- [100-A], B <- L, B =< BMax,
CMax <- [BMax-B], C <- L, C =< CMax,
DMax <- [CMax-C], D <- L, D =< DMax,
E <- [100-A-B-C-D], E > 0]

Case-insensitive matching in LPeg.re (Lua)

I'm new to Lua's "LPeg" and "re" modules. Currently I want to write a pattern based on the following rules:
Match a string that starts with "gv_$/gv$/v$/v_$/x$/xv$/dba_/all_/cdb_"; the prefix "SYS.%s*" or "PUBLIC.%s*" is optional
The string must not be preceded by an alphanumeric character, i.e., the pattern should not match "XSYS.DBA_OBJECTS" because it is preceded by "X"
The pattern is case-insensitive
For example, the following strings should match the pattern:
,sys.dba_objects, --should return "sys.dba_objects"
SyS.Dba_OBJECTS
cdb_objects
dba_hist_snapshot) --should return "dba_hist_snapshot"
My current pattern, shown below, only matches a non-alphanumeric character followed by the string, and only in upper case:
p=re.compile[[
pattern <- %W {owner* name}
owner <- 'SYS.'/ 'PUBLIC.'
name <- {prefix %a%a (%w/"_"/"$"/"#")+}
prefix <- "GV_$"/"GV$"/"V_$"/"V$"/"DBA_"/"ALL_"/"CDB_"
]]
print(p:match(",SYS.DBA_OBJECTS"))
My questions are:
How can I achieve case-insensitive matching? There are some discussions about it, but I'm too new to understand them.
How can I return exactly the matched string, without the character matched by %W? Something like "(?=...)" in Java.
I'd highly appreciate it if you could provide the pattern or a related function.
You can try tweaking this grammar:
local re = require're'
local p = re.compile[[
pattern <- ((s? { <name> }) / s / .)* !.
name <- (<owner> s? '.' s?)? <prefix> <ident>
owner <- (S Y S) / (P U B L I C)
prefix <- (G V '_'? '$') / (V '_'? '$') / (D B A '_') / (C D B '_')
ident <- [_$#%w]+
s <- (<comment> / %s)+
comment <- '--' (!%nl .)*
A <- [aA]
B <- [bB]
C <- [cC]
D <- [dD]
G <- [gG]
I <- [iI]
L <- [lL]
P <- [pP]
S <- [sS]
U <- [uU]
V <- [vV]
Y <- [yY]
]]
local m = { p:match[[
,sys.dba_objects, --should return "sys.dba_objects"
SyS.Dba_OBJECTS
cdb_objects
dba_hist_snapshot) --should return "dba_hist_snapshot"
]] }
print((table.unpack or unpack)(m))  -- table.unpack on Lua 5.2+, global unpack on 5.1
This prints the match table m:
sys.dba_objects SyS.Dba_OBJECTS cdb_objects dba_hist_snapshot
Note that case-insensitivity is quite hard to achieve outside of a lexer, so each letter has to get its own rule. You'll need more of these eventually.
This grammar takes care of the comments in your sample and skips them along with whitespace, so the text after "should return" does not appear in the output.
You can adjust the prefix and ident rules to specify additional prefixes and allowed characters in object names. For example, ALL_, X$ and XV$ from your list are not covered by prefix yet.
Note: !. means end of input. !%nl means "not end of line". !p and &p construct non-consuming patterns, i.e. the current input position is not advanced on a match (the input is only tested).
Note 2: print-ing with unpack is a gross hack.
Note 3: There is a traceable LPeg re that can be used to debug grammars. Pass true as the third parameter of re.compile to get an execution trace with the test/match/skip action on each rule and position visited.
Finally I found a solution, though not a graceful one: add an additional parameter case_insensitive to the re.compile, re.find, re.match and re.gsub functions. When the parameter value is true, case_insensitive_pattern is invoked to rewrite the pattern:
...
local fmt="[%s%s]"
local function case_insensitive_pattern(quote,pattern)
    -- find an optional '%' (group 1) followed by any character (group 2)
    local stack={}
    local is_letter=nil
    local p = pattern:gsub("(%%?)(.)",
        function(percent, letter)
            if percent ~= "" or not letter:match("%a") then
                -- if the '%' matched, or `letter` is not a letter, return "as is"
                if is_letter==false then
                    stack[#stack]=stack[#stack]..percent .. letter
                else
                    stack[#stack+1]=percent .. letter
                    is_letter=false
                end
            else
                if is_letter==false then
                    stack[#stack]=quote..stack[#stack]..quote
                    is_letter=true
                end
                -- else, return a case-insensitive character class of the matched letter
                stack[#stack+1]=fmt:format(letter:lower(), letter:upper())
            end
            return ""
        end)
    if is_letter==false then
        stack[#stack]=quote..stack[#stack]..quote
    end
    if #stack<2 then return stack[1] or (quote..pattern..quote) end
    return '('..table.concat(stack,' ')..')'
end
local function compile (p, defs, case_insensitive)
    if mm.type(p) == "pattern" then return p end -- already compiled
    if case_insensitive==true then
        p=p:gsub([[(['"'])([^\n]-)(%1)]],case_insensitive_pattern):gsub("%(%s*%((.-)%)%s*%)","(%1)")
    end
    local cp = pattern:match(p, 1, defs)
    if not cp then error("incorrect pattern", 3) end
    return cp
end
...
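Both answers come down to rewriting each letter as a two-case class such as [aA]. Here is a small Haskell sketch of that text rewrite for a single literal. caseInsensitiveLiteral is a hypothetical helper, and its output is meant to be pasted into an re grammar. It assumes the literal contains no quote characters, since those are not escaped:

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, toLower, toUpper)

-- Rewrite a literal for LPeg's re syntax. Each ASCII letter becomes a
-- two-case character class and every other character is quoted separately,
-- e.g. "dba_" -> "[dD] [bB] [aA] '_'". Quote characters are not escaped.
caseInsensitiveLiteral :: String -> String
caseInsensitiveLiteral = unwords . map toClass
  where
    toClass c
      | isAsciiLower c || isAsciiUpper c = ['[', toLower c, toUpper c, ']']
      | otherwise = ['\'', c, '\'']
```

For example, caseInsensitiveLiteral "GV$" gives "[gG] [vV] '$'", which should read as a sequence in re syntax.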

How to eliminate the quotation marks of a list of strings?

I receive the following list of strings from a text file:
["{0988070979,APP03#media}","{0988070978,APP01#media}","{0988070977,APP02#media}"]
I need the same list but without the quotation marks ( " " ), something like this:
[{0988070979,APP03#media},{0988070978,APP01#media},{0988070977,APP02#media}]
1> L = ["{0988070979,APP03#media}","{0988070978,APP01#media}","{0988070977,APP02#media}"].
["{0988070979,APP03#media}","{0988070978,APP01#media}",
"{0988070977,APP02#media}"]
2> [{N, M} || X <- L, [N, M] <- [string:tokens(X, "{},")]].
[{"0988070979","APP03#media"},
{"0988070978","APP01#media"},
{"0988070977","APP02#media"}]
or (not recommended)
3> [{list_to_integer(N), list_to_atom(M)} || X <- L, [N, M] <- [string:tokens(X, "{},")]].
[{988070979,'APP03#media'},
{988070978,'APP01#media'},
{988070977,'APP02#media'}]
and so on.
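If the goal is typed values rather than strings, the same parsing can be sketched in Haskell by splitting on the braces and the comma by hand. parseEntry is a hypothetical name. Like the list_to_integer variant above, it drops the leading zero from the number:

```haskell
import Data.Char (isDigit)

-- Parse one "{Number,Name}" string such as "{0988070979,APP03#media}".
-- Returns Nothing if the braces or the comma are missing, or if the
-- number part is empty or not all digits.
parseEntry :: String -> Maybe (Integer, String)
parseEntry ('{' : rest)
  | not (null rest) && last rest == '}' =
      case break (== ',') (init rest) of
        (num, ',' : name)
          | not (null num) && all isDigit num -> Just (read num, name)
        _ -> Nothing
parseEntry _ = Nothing
```

mapM parseEntry over the whole list gives Just the list of tuples, or Nothing if any entry is malformed.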
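You should use the erl_scan module to tokenize the string and erl_parse to convert the tokens to an Erlang term.
Let your string be Str (parse_term expects the token list to end with a dot, hence the appended "."):
{ok, Ts, _} = erl_scan:string(Str ++ ".").
{ok, Tup} = erl_parse:parse_term(Ts).
Tup is the tuple you need. This only works if Str is valid Erlang term syntax. In the example above, APP03 starts with an uppercase letter, so erl_scan reads it as a variable and parse_term should reject it.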

Checking that a price value in a string is in the correct format

I use n <- getLine to get a price from the user. How can I check whether the value is correct? (The price can contain '.' and digits, and it must be greater than 0.)
This doesn't work:
isFloat = do
n <- getLine
let val = case reads n of
((v,_):_) -> True
_ -> False
If The Input Is Always Valid Or Exceptions Are OK
If you have users entering decimal numbers in the form of "123.456" then this can simply be converted to a Float or Double using read:
n <- getLine
let val = read n
Or in one line (having imported Control.Monad):
n <- liftM read getLine
To Catch Erroneous Input
The above code fails with an exception if the users enter invalid entries. If that's a problem then use reads and listToMaybe (from Data.Maybe):
n <- liftM (fmap fst . listToMaybe . reads) getLine
If that code looks complex, don't sweat it. Below is the same operation, doing all the work with an explicit case expression:
n <- getLine
let val = case reads n of
((v,_):_) -> Just v
_ -> Nothing
Notice that we pattern match to get the first element of the tuple at the head of the list. The head of the list is (v,_), and its first element is v. The underscore (_) just means "ignore the value in this spot".
If Floating Point Isn't Acceptable
Floating-point values are well known to be approximate and are not suitable for real-world financial computations (though perhaps fine for homework, depending on your professor). In that case you'd want to read the values into a Rational (from Data.Ratio).
n <- liftM maybeRational getLine
...
where
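  maybeRational :: String -> Maybe Rational
  maybeRational str =
    let (a, b) = break (== '.') str
        frac = drop 1 b
    in do
      whole <- readMaybe a
      fracDigits <- if null frac then Just 0 else readMaybe frac
      -- "12.5" becomes 12 + 5 % 10; a missing fractional part counts as 0
      return (fromInteger whole + fracDigits % (10 ^ length frac))
  readMaybe :: String -> Maybe Integer
  readMaybe = fmap fst . listToMaybe . reads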
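The question also requires that the input contain only digits and at most one '.', and that the value be greater than 0. Here is a stricter sketch that validates that format explicitly before building an exact Rational. parsePriceExact is a hypothetical name. It rejects signs, exponents and surrounding whitespace:

```haskell
import Data.Char (isDigit)
import Data.Ratio ((%))

-- Accept digits with at most one '.', e.g. "12", "12.50" or "0.5", and return
-- the exact value as a Rational, but only if it is greater than zero.
parsePriceExact :: String -> Maybe Rational
parsePriceExact s
  | null whole && null frac = Nothing                      -- "" or "."
  | not (all isDigit whole && all isDigit frac) = Nothing  -- signs, letters, a second '.'
  | value > 0 = Just value
  | otherwise = Nothing
  where
    (whole, rest) = break (== '.') s
    frac = drop 1 rest
    -- Only evaluated once both parts are known to be all digits.
    digitsToInteger :: String -> Integer
    digitsToInteger ds = if null ds then 0 else read ds
    value = fromInteger (digitsToInteger whole) + digitsToInteger frac % (10 ^ length frac)
```

For example, parsePriceExact "12.50" gives Just (25 % 2), while "0", "1.2.3" and "abc" give Nothing.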
In addition to the parsing advice provided by TomMD, consider using the appropriate monad for error reporting. It allows you to conveniently chain computations which can fail, avoiding explicit error checking on every step.
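{-# LANGUAGE FlexibleContexts #-}
-- Control.Monad.Error is deprecated in current mtl; Control.Monad.Except replaces it.
import Control.Monad (when)
import Control.Monad.Except (MonadError, throwError)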
parsePrice :: MonadError String m => String -> m Double
parsePrice s = do
x <- case reads s of
[(x, "")] -> return x
_ -> throwError "Not a valid real number."
when (x <= 0) $ throwError "Price must be positive."
return x
main = do
n <- getLine
case parsePrice n of
Left err -> putStrLn err
Right x -> putStrLn $ "Price is " ++ show x

Generate variations with repetitions Erlang

I've found a way to generate the combinations of a list, where order doesn't matter, but I need to generate variations, where order matters.
Combination:
comb_rep(0,_) ->
[[]];
comb_rep(_,[]) ->
[];
comb_rep(N,[H|T]=S) ->
[[H|L] || L <- comb_rep(N-1,S)]++comb_rep(N,T).
Output of this:
comb_rep(2,[1,2,3]).
[[1,1],[1,2],[1,3],[2,2],[2,3],[3,3]]
Desired output:
comb_rep(2,[1,2,3]).
[[1,1],[1,2],[1,3],[2,1],[2,2],[2,3],[3,1],[3,2],[3,3]]
Following what’s explained in the comments, this will be my initial approach:
cr(0, _) ->
[];
cr(1, L) ->
[ [H] || H <- L ];
cr(N, L) ->
[ [H | T] || H <- L, T <- cr(N - 1, L -- [H]) ].
Permutations of length 0 is an edge case. I would even consider the removal of that clause so that the function fails if invoked as such.
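Permutations of length 1 just means each element in its own list.
Then, for the recursive case, if you already have the permutations of the list without the current element (cr(N - 1, L -- [H])), you can add that element to the head of each list, and you just need to do that for each element in the original list (H <- L). Note that removing H with L -- [H] gives variations without repetition, so cr(2, [1,2,3]) contains no [1,1]. For the variations with repetition asked about in the question, recurse on the full list instead: T <- cr(N - 1, L).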
Hope this helps.
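For variations with repetition, the recursion simply keeps the whole list available at every position. Here is a Haskell sketch (variationsRep is an illustrative name). It produces the same lists, in the same order, as replicateM n from Control.Monad:

```haskell
import Control.Monad (replicateM)

-- Variations with repetition: each position may take any element, so the
-- recursive call receives the full list xs (nothing is removed).
-- For n >= 0 this matches replicateM n xs.
variationsRep :: Int -> [a] -> [[a]]
variationsRep n xs
  | n <= 0 = [[]]  -- one empty variation; also guards against negative n
  | otherwise = [x : rest | x <- xs, rest <- variationsRep (n - 1) xs]
```

variationsRep 2 [1,2,3] gives all nine two-element variations, starting with [[1,1],[1,2],[1,3],[2,1],...].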
