Clojure idiomatic read file to map - parsing

I need to read a file into a map {:v '[] :f '[]}. I split each line, and if the first element is "v" I add the remaining part to the :v vector; likewise for "f" and the :f vector.
Example:
v 1.234 3.234 4.2345234
v 2.234 4.235235 6.2345
f 1 1 1
Expected result:
{:v [("1.234" "3.234" "4.2345234"), ("2.234" "4.235235" "6.2345")]
:f [("1" "1" "1")]}
My result:
{:v [("2.234" "4.235235" "6.2345")]
:f [("1" "1" "1")]}
Questions:
How can I fix the error? (Only the last line of each kind ends up in the map.)
Can I avoid the global variable (model) and side effects?
Code:
(def model
  {:v '[]
   :f '[]})
(defn- file-lines
  [filename]
  (line-seq (io/reader filename)))
(defn- lines-with-data
  [filename]
  (->>
    (file-lines filename)
    (filter not-empty)
    (filter #(not (str/starts-with? % "#")))))
(defn- to-item [data]
  (let [[type & remaining] data]
    (case type
      "v" [:v (conj (:v model) remaining)]
      "f" [:f (conj (:f model) remaining)])))
(defn- fill-model
  [lines]
  (into model
        (for [data lines] (to-item data))))
(defn parse
  [filename]
  (->>
    (lines-with-data filename)
    (map #(str/split % #"\s+"))
    (fill-model)))

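You are discarding the updated state of your model. to-item always conjes onto the original model, whose two vectors are empty, so every item is a one-element vector, and into then overwrites the previous entry for the same key. That is why only the last line survives. You can instead carry the model along as you read the lines, for example using reduce: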
(defn- add-item [m data]
  (let [[type & remaining] data]
    ;; Updates the model `m` by appending data under the given key.
    ;; In the line below, `(keyword type)` will be :v or :f depending on `type`.
    ;; Returns the updated model.
    (update m (keyword type) conj remaining)))
(defn fill-model [lines]
  ;; Applies `add-item` to the original model and each line of the file
  (reduce add-item model lines))
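On the second question (avoiding the global model), here is a minimal sketch that passes the empty map to reduce as its initial value instead of reading it from a def. The names data-line?, add-line, parse-lines and parse-file are made up for this example, and it assumes every data line starts with "v" or "f"; any other prefix would simply end up under its own key.

```clojure
(require '[clojure.string :as str]
         '[clojure.java.io :as io])

(defn data-line?
  "True for lines that are neither blank nor comments (hypothetical helper)."
  [line]
  (and (not (str/blank? line))
       (not (str/starts-with? line "#"))))

(defn add-line
  "Returns `model` with the fields of `line` appended under :v or :f."
  [model line]
  (let [[type & remaining] (str/split (str/trim line) #"\s+")]
    (update model (keyword type) conj remaining)))

(defn parse-lines
  "Pure function: folds a sequence of lines into a fresh model map."
  [lines]
  (reduce add-line {:v [] :f []} (filter data-line? lines)))

(defn parse-file
  "Reads `filename` and parses it; reduce is eager, so the reader
  can be closed safely when with-open exits."
  [filename]
  (with-open [r (io/reader filename)]
    (parse-lines (line-seq r))))

(parse-lines ["# comment"
              "v 1.234 3.234 4.2345234"
              ""
              "v 2.234 4.235235 6.2345"
              "f 1 1 1"])
;; => {:v [("1.234" "3.234" "4.2345234") ("2.234" "4.235235" "6.2345")], :f [("1" "1" "1")]}
```

Keeping the file access in parse-file and the folding in parse-lines means the parsing logic can be tried at the REPL on plain vectors of strings, with the only side effect confined to one small function.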


Type checker problem while writing a parser in racket

I have a language PLANG that supports evaluating a
polynomial on a sequence of points (numbers).
The language allows expressions of the
form {{poly C1 C2 … Ck} {P1 P2 … Pℓ}}, where all Ci and all Pj are
valid AE expressions (and both k ≥ 1 and ℓ ≥ 1).
I was trying to write a parser for this language; here is what I have so far:
(define-type PLANG
[Poly (Listof AE) (Listof AE)])
(define-type AE
[Num Number]
[Add AE AE]
[Sub AE AE]
[Mul AE AE]
[Div AE AE])
(: parse-sexpr : Sexpr -> AE)
;; to convert s-expressions into AEs
(define (parse-sexpr sexpr)
(match sexpr
[(number: n) (Num n)]
[(list '+ lhs rhs) (Add (parse-sexpr lhs)
(parse-sexpr rhs))]
[(list '- lhs rhs) (Sub (parse-sexpr lhs)
(parse-sexpr rhs))]
[(list '* lhs rhs) (Mul (parse-sexpr lhs)
(parse-sexpr rhs))]
[(list '/ lhs rhs) (Div (parse-sexpr lhs)
(parse-sexpr rhs))]
[else (error 'parse-sexpr "bad syntax in ~s"
sexpr)]))
(: parse : String -> PLANG)
;; parses a string containing a PLANG expression to a PLANG AST
(define (parse str)
(let ([code (string->sexpr str)])
(parse-sexpr (code) )))
(test (parse "{{poly 1 2 3} {1 2 3}}")
=> (Poly (list (Num 1) (Num 2) (Num 3))
(list (Num 1) (Num 2) (Num 3))))
(test (parse "{{poly } {1 2} }")
=error> "parse: at least one coefficient is
required in ((poly) (1 2))")
(test (parse "{{poly 1 2} {} }")
=error> "parse: at least one point is
required in ((poly 1 2) ())")
When I try to run it, I get these errors:
Type Checker: Cannot apply expression of type (U (Listof Sexpr) Boolean Real String Symbol), since it is not a function type in: (code)
. Type Checker: type mismatch
expected: Poly
given: AE in: (parse-sexpr (code))
. Type Checker: Summary: 2 errors encountered in:
(code)
(parse-sexpr (code))
>
Any help would be appreciated.
The first problem is caused by an extra pair of parentheses. Keep in mind that in Racket, Typed Racket, and #lang pl, parentheses usually mean function application like this:
(function argument ...)
So when you write (code), it tries to interpret code as a function, to call it with zero arguments.
You can fix this problem by replacing (code) with code in the body of the parse function.
(define (parse str)
(let ([code (string->sexpr str)])
(parse-sexpr code)))
The second problem happens because you specified that the parse function should return a PLANG, but it instead returns the result of parse-sexpr which returns an AE.
Another way of wording this is that you've implemented parsing for AEs, but not for PLANGs.

Parsing concrete syntax in Scheme

I wrote a procedure that takes a valid prefix list for subtraction (e.g., "(- 6 5)" for what we know as "6-5"). Here is my code:
(define parse-diff-list
  (lambda (datum)
    (cond
      ((number? datum) (const-exp datum)) ;; if datum is a number, return const-exp
      ((pair? datum) ;; if datum is a pair:
       (let ((sym (car datum))) ;; let sym be the first of the pair
         (cond
           ((eqv? sym '-) ;; if sym is minus:
            (let ((lst1 (parse-diff-list (cdr datum)))) ;; parse first operand of subtraction
              (let ((lst2 (parse-diff-list (cdr lst1)))) ;; parse second operand from what remains
                (cons (diff-exp (car lst1) (car lst2)) (cdr lst2))))) ;; "perform" the subtraction
           ((number? sym) ;; if sym is number:
            (cons (const-exp sym) (cdr datum))) ;; return const-exp with the remainder of the list, yet to be processed
           (else (eopl:error 'parse-diff-list "bad prefix-expression, expected - ~s" sym)))))
      (else (eopl:error 'parse-diff-list "bad prefix-expression ~s" datum)))))
(define parse-prefix
  (lambda (lst)
    (car (parse-diff-list lst))))
It works fine logically, but I don't understand the logic of the indentation in printing. For the input:
(parse-prefix '(- - 1 2 - 3 - 4 5))
It prints:
#(struct:diff-exp
 #(struct:diff-exp #(struct:const-exp 1) #(struct:const-exp 2))
 #(struct:diff-exp #(struct:const-exp 3) #(struct:diff-exp #(struct:const-exp 4) #(struct:const-exp 5))))
While I would want the following print style:
#(struct:diff-exp
  #(struct:diff-exp
    #(struct:const-exp 1)
    #(struct:const-exp 2))
  #(struct:diff-exp
    #(struct:const-exp 3)
    #(struct:diff-exp
      #(struct:const-exp 4)
      #(struct:const-exp 5))))
It's more than a petty question for me, as it does create indentations but I don't know how it does it.
Thanks a lot!
Take a look at racket/pretty, the pretty-printing library.
In particular, note the parameter pretty-print-columns, which
you can set like this:
(pretty-print-columns 40)
in order to avoid long lines.
http://docs.racket-lang.org/reference/pretty-print.html
(I am guessing you are using DrRacket based on the way the structures are printing)

Thinking in Clojure: Avoid OOP for simple string parser

I'm currently implementing a small parser in Clojure that takes an input string like:
aaa (bbb(ccc)ddd(eee)) fff (ggg) hhh
and returns the string without characters that are not in brackets, i.e.
(bbb(ccc)ddd(eee))(ggg)
I've written the following function:
(defn- parse-str [input]
  (let [bracket (atom 0)
        output (atom [])]
    (doseq [ch (seq input)]
      (case ch
        \( (swap! bracket inc)
        \) (swap! bracket dec)
        nil)
      (if (or (> @bracket 0) (= ch \)))
        (swap! output conj ch)))
    (apply str @output)))
which works for me:
(parse-str "aaa (bbb(ccc)ddd(eee)) fff (ggg) hhh")
"(bbb(ccc)ddd(eee))(ggg)"
I am, however, concerned that my approach is too object-oriented, since it uses atoms as a kind of local variable to keep the current state of the parser.
Is it possible to write the same function from a more functional programming perspective? (avoiding the atoms?)
Any comments to improve my code are appreciated as well.
Two ways: You can use explicit recursion or reduce.
(defn parse-str [input]
  (letfn [(parse [input bracket result]
            (if (seq input)
              (let [[ch & rest] input]
                (case ch
                  \( (recur rest (inc bracket) (conj result ch))
                  \) (recur rest (dec bracket) (conj result ch))
                  (recur rest bracket (if (> bracket 0)
                                        (conj result ch)
                                        result))))
              result))]
    (clojure.string/join (parse input 0 []))))
(defn parse-str [input]
  (clojure.string/join
    (second (reduce (fn [acc ch]
                      (let [[bracket result] acc]
                        (case ch
                          \( [(inc bracket) (conj result ch)]
                          \) [(dec bracket) (conj result ch)]
                          [bracket (if (> bracket 0)
                                     (conj result ch)
                                     result)])))
                    [0 []]
                    input))))
In a lot of cases where you would use local variables, you just put any variable that changes as a parameter to loop, thereby using recursion instead of mutation.
(defn- parse-str [input]
  ;; Instead of using atoms to hold the state, use parameters in loop
  (loop [output []
         bracket 0
         ;; The [ch & tail] syntax is called destructuring,
         ;; it means let ch be the first element of (seq input),
         ;; and tail the rest of the elements
         [ch & tail] (seq input)]
    ;; If there's no elements left, ch will be nil, which is logical false
    (if ch
      (let [bracket* (case ch
                       \( (inc bracket)
                       \) (dec bracket)
                       bracket)
            output* (if (or (> bracket* 0) (= ch \)))
                      (conj output ch)
                      output)]
        ;; Recurse with the updated values
        (recur output* bracket* tail))
      ;; If there's no characters left, apply str to the output
      (apply str output))))
This is an iterative version of your function; but it's still functionally pure. I find having the code laid out like this makes it easy to read. Remember, when using recursion, always check your termination condition first.
(defn parse-str [s]
  (loop [[x & xs] (seq s), acc [], depth 0]
    (cond
      (not x) (clojure.string/join acc)
      (= x \() (recur xs (conj acc x) (inc depth))
      (= x \)) (recur xs (conj acc x) (dec depth))
      (<= depth 0) (recur xs acc depth)
      :else (recur xs (conj acc x) depth))))
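One more variation, offered only as a sketch: the bracket depth after each character can be computed on its own with reductions, and the characters filtered afterwards, so no explicit loop or accumulator pair is needed. The helper name depth-after is made up for this example. Like the original, it keeps every closing bracket, even an unmatched one.

```clojure
(require '[clojure.string :as str])

(defn depth-after
  "Bracket depth after consuming `ch` at the given `depth` (hypothetical helper)."
  [depth ch]
  (case ch
    \( (inc depth)
    \) (dec depth)
    depth))

(defn parse-str [input]
  ;; `reductions` yields the initial 0 followed by the depth after each
  ;; character; `rest` drops the initial value so depths line up with input.
  (let [depths (rest (reductions depth-after 0 input))]
    (str/join
      (for [[ch depth] (map vector input depths)
            :when (or (pos? depth) (= ch \)))]
        ch))))

(parse-str "aaa (bbb(ccc)ddd(eee)) fff (ggg) hhh")
;; => "(bbb(ccc)ddd(eee))(ggg)"
```

Separating "what is the depth here?" from "do I keep this character?" makes each half easy to check at the REPL, at the cost of building an intermediate sequence of depths.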

Recursively parse org-mode hierarchy

I'm trying to parse org-mode text in this way:
* head
** sub-head
- word :: description
** sub-head
- word :: description
- some notes
* head2
** sub-head2
- some more notes
I am trying to capture the data (such as "word :: description" and "some notes") in such a way that each piece of data preserves what its parent headers are and what the parent's parents are, etc. I envision the data coming out in such a form in elisp:
(
 ("head"
  ("sub-head" ("word :: description"))
  ("sub-head" ("word :: description" "some notes"))
 )
 ("head2"
  ("sub-head2" ("some more notes"))
 )
)
I am guessing there is an elegant solution using recursion. I'm open to structuring the data in elisp a different way, if there's a better way to do it.
The function org-element-parse-buffer should help. It parses the whole org-mode buffer into a lisp list. You will get more properties than you need.
http://orgmode.org/worg/exporters/org-element-docstrings.html#sec-10
Here's a recursive solution:
(defun org-splitter (str lvl)
(let* ((lst (split-string
str
(concat lvl " ")))
(out (unless (= (length (car lst))
(length str))
(mapcar
(lambda (s)
(and
(string-match "\\([^\n]+\\)\n\\(.*\\)" s)
(list (match-string 1 s)
(org-splitter
(substring-no-properties
s (match-beginning 2))
(concat lvl "\\*")))))
(cdr lst)))))
(if (string= (car lst) "")
out
(cons (car lst) out))))
(defun org-recurse-all ()
(let ((str (buffer-substring-no-properties
(point-min) (point-max))))
(org-splitter str "^\\*")))

Parse Tab Delimited String

I'm having some trouble figuring out how to separate a tab-delimited string into chunks of data. As an example, say I have a text file that I'm reading from that looks like this:
a1 b1 c1 d1 e1
a2 b2 c2 d2 e2
and I read the first line of my file and get the string
"a1 b1 c1 d1 e2"
I want to separate this into 5 variables a, b, c, d and e, or create a list (a b c d e). Any thoughts?
Thanks.
Try concatenating parentheses onto the front and back of your input string, then using read-from-string (I assume you're using Common Lisp, since you tagged your question clisp).
(setf str "a1 b1 c1 d1 e2")
(print (read-from-string (concatenate 'string "(" str ")")))
Here is yet another way to go about it (a tad more robust, perhaps). You could also easily modify it so that you can `setf' a character in the string once the callback is called, but I didn't do it that way because it seemed like you don't need that ability. For that case I'd rather use a macro anyway.
(defun mapc-words (function vector
                   &aux (whites '(#\Space #\Tab #\Newline #\Rubout)))
  "Iterates over string `vector' and calls the `function'
with the non-white characters collected so far.
The white characters are, by default: #\\Space, #\\Tab
#\\Newline and #\\Rubout.
`mapc-words' will short-circuit when `function' returns false."
  (do ((i 0 (1+ i))
       (start 0)
       (len 0))
      ((= i (1+ (length vector))))
    (if (or (= i (length vector)) (find (aref vector i) whites))
        (if (> len 0)
            (if (not (funcall function (subseq vector start i)))
                (return-from mapc-words)
                (setf len 0 start (1+ i)))
            (incf start))
        (incf len)))
  vector)
(mapc-words
#'(lambda (word)
(not
(format t "word collected: ~s~&" word)))
"a1 b1 c1 d1 e1
a2 b2 c2 d2 e2")
;; word collected: "a1"
;; word collected: "b1"
;; word collected: "c1"
;; word collected: "d1"
;; word collected: "e1"
;; word collected: "a2"
;; word collected: "b2"
;; word collected: "c2"
;; word collected: "d2"
;; word collected: "e2"
Here's an example macro you could use, if you wanted to modify the string as you read it, but I'm not entirely happy with it, so maybe someone will come up with a better variant.
(defmacro with-words-in-string
    ((word start end
      &aux (whites '(#\Space #\Tab #\Newline #\Rubout)))
     s
     &body body)
  `(do ((,end 0 (1+ ,end))
        (,start 0)
        (,word)
        (len 0))
       ((= ,end (1+ (length ,s))))
     (if (or (= ,end (length ,s)) (find (aref ,s ,end) ',whites))
         (if (> len 0)
             (progn
               (setf ,word (subseq ,s ,start ,end))
               ,@body
               (setf len 0 ,start (1+ ,end)))
             (incf ,start))
         (incf len))))
(with-words-in-string (word start end)
"a1 b1 c1 d1 e1
a2 b2 c2 d2 e2"
(format t "word: ~s, start: ~s, end: ~s~&" word start end))
Assuming the fields are separated by tabs (not spaces), this will create a list:
(defun tokenize-tabbed-line (line)
  (loop
    for start = 0 then (+ space 1)
    for space = (position #\Tab line :start start)
    for token = (subseq line start space)
    collect token until (not space)))
which results in the following:
CL-USER> (tokenize-tabbed-line "a1 b1 c1 d1 e1")
("a1" "b1" "c1" "d1" "e1")
