Parsing numbers from strings in lisp


Here's the brief problem:
Input: a list of strings, each containing numbers
(" 3.4 5.4 1.2 6.4" "7.8 5.6 4.3" "1.2 3.2 5.4")
Output: a list of numbers
(3.4 5.4 1.2 6.4 7.8 5.6 4.3 1.2 3.2 5.4)
Here's my attempt at coding this:
(defun parse-string-to-float (line &optional (start 0))
  "Parses a list of floats out of a given string"
  (if (equalp "" line)
      nil
      (let ((num (multiple-value-list (read-from-string (subseq line start)))))
        (if (null (first num))
            nil
            (cons (first num) (parse-string-to-float (subseq line (+ start (second num)))))))))
(defvar *data* (list " 3.4 5.4 1.2 6.4" "7.8 5.6 4.3" "1.2 3.2 5.4"))
(setf *data* (format nil "~{~a ~}" *data*))
(print (parse-string-to-float *data*))
===> (3.4 5.4 1.2 6.4 7.8 5.6 4.3 1.2 3.2 5.4)
However, for rather large data sets, it's a slow process. I'm guessing the recursion isn't as tight as possible and I'm doing something unnecessary. Any ideas?
Furthermore, the grand project involves taking an input file that has various data sections separated by keywords. Example -
%FLAG START_COORDS
1 2 5 8 10 12
%FLAG END_COORDS
3 7 3 23 9 26
%FLAG NAMES
ct re ct cg kl ct
etc...
I'm trying to parse this into a hash table with the keywords that follow %FLAG as the keys, and the values stored as lists of numbers or strings, depending on the particular keyword I'm parsing. Any ideas for libraries that already do this kind of job, or simple ways to do this in Lisp?

This is not a task you want to be doing recursively to begin with. Instead, use LOOP and a COLLECT clause. For example:
(defun parse-string-to-floats (line)
  (loop
    :with n := (length line)
    :for pos := 0 :then chars
    :while (< pos n)
    :for (float chars) := (multiple-value-list
                           (read-from-string line nil nil :start pos))
    :collect float))
Also, you might want to consider using WITH-INPUT-FROM-STRING instead of READ-FROM-STRING, which makes things even simpler.
(defun parse-string-to-float (line)
  (with-input-from-string (s line)
    (loop
      :for num := (read s nil nil)
      :while num
      :collect num)))
As for performance, you might want to do some profiling, and ensure that you are actually compiling your function.
EDIT to add: One thing you do need to be careful of is that the reader can introduce a security hole if you're not sure of the source of the string. There's a read macro, #., which can allow evaluation of arbitrary code following it when it's read from a string. The best way to protect yourself is by binding the *READ-EVAL* variable to NIL, which will make the reader signal an error if it encounters #.. Alternatively, you can use one of the specialized libraries that Rainer Joswig mentions in his answer.
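To make the *READ-EVAL* point concrete, here is a minimal sketch. The function name safe-read-numbers is made up for this illustration; it is the stream-based parser from above with *READ-EVAL* bound to NIL around it. Note that it still accepts any Lisp object the reader understands (symbols, lists), so it only closes the #. hole and does not validate that the input is numeric.

```lisp
;; Sketch: with *READ-EVAL* bound to NIL, "#." in the input signals a
;; READER-ERROR instead of evaluating the form that follows it.
;; SAFE-READ-NUMBERS is a hypothetical name used only for this example.
(defun safe-read-numbers (string)
  "Read whitespace-separated objects from STRING with #. disabled."
  (let ((*read-eval* nil))
    (with-input-from-string (s string)
      (loop for object = (read s nil nil)
            while object
            collect object))))

;; Ordinary input is read as before.
(safe-read-numbers "1.5 2.5")

;; Input containing #. is rejected rather than evaluated.
(handler-case (safe-read-numbers "1.5 #.(+ 1 2)")
  (reader-error () :rejected))
```

The first call should return (1.5 2.5), and the second should return :REJECTED, because the reader signals an error as soon as it reaches #. and nothing gets evaluated.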

Parse a single string:
(defun parse-string-to-floats (string)
  (let ((*read-eval* nil))
    (with-input-from-string (stream string)
      (loop for number = (read stream nil nil)
            while number collect number))))
Process a list of strings and return a single list:
(defun parse-list-of-strings (list)
  (mapcan #'parse-string-to-floats list))
Example:
CL-USER 114 > (parse-list-of-strings (list "1.1 2.3 4.5" "1.17 2.6 7.3"))
(1.1 2.3 4.5 1.17 2.6 7.3)
Note:
Reading float values from streams with READ is a costly operation. There are libraries like PARSE-NUMBER that might be more efficient; some Common Lisp implementations might also have the equivalent of a READ-FLOAT / PARSE-FLOAT function.

Also for performance, try
(declare (optimize (speed 3)))
inside your defun. Some Lisps (for example SBCL) will print helpful notes about where they could not optimize, and the estimated cost of each missed optimization.
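For illustration, a sketch of where the declaration goes, using the stream-based parser from the earlier answers. The type declaration for line is an extra assumption on my part, not something required by the advice above, and since READ probably dominates the run time, the gain here may well be small.

```lisp
;; Sketch: the DECLARE form sits at the start of the function body.
;; (type string line) is an added assumption about the argument.
(defun parse-string-to-floats (line)
  (declare (optimize (speed 3))
           (type string line))
  (let ((*read-eval* nil))
    (with-input-from-string (s line)
      (loop for num = (read s nil nil)
            while num
            collect num))))

(parse-string-to-floats "7.8 5.6 4.3")
```

The call should return (7.8 5.6 4.3), the same as without the declaration; only the compiler's effort and its diagnostics change.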

As for performance, try at least measuring the memory allocation. I guess that all the performance is eaten by memory allocation and GC: you allocate a lot of big strings with subseq. E.g., (time (parse-string-to-float ..)) will show you how much time is spent in your code, how much in GC and how much memory was allocated.
If this is the case, then use string-stream (like in with-input-from-string) to decrease GC pressure.
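The second half of the question (the %FLAG sections) is not covered above, so here is a minimal sketch of one way to do it with the same stream-based reading. The names read-tokens and parse-flag-sections are made up for this example, and it assumes every section header is a line starting with "%FLAG " followed by the keyword. Tokens are read with READ, so entries such as ct come back as symbols (CT), not strings; a section that needs strings would have to be split on whitespace by other means.

```lisp
;; Hypothetical helper: read every object on one line, with #. disabled.
(defun read-tokens (line)
  (let ((*read-eval* nil))
    (with-input-from-string (s line)
      (loop for token = (read s nil nil)
            while token
            collect token))))

;; Hypothetical function: build a hash table keyed by the %FLAG keyword
;; (a string); each value is the list of tokens found in that section.
(defun parse-flag-sections (stream)
  (let ((table (make-hash-table :test #'equal))
        (prefix "%FLAG ")
        (key nil))
    (loop for line = (read-line stream nil nil)
          while line
          do (if (and (>= (length line) (length prefix))
                      (string= prefix line :end2 (length prefix)))
                 ;; Header line: remember the keyword, start an empty section.
                 (setf key (string-trim " " (subseq line (length prefix)))
                       (gethash key table) '())
                 ;; Data line: append its tokens to the current section.
                 ;; Lines before the first header are ignored.
                 (when key
                   (setf (gethash key table)
                         (append (gethash key table) (read-tokens line))))))
    table))

;; Usage with the sample data from the question.
(with-input-from-string
    (in (format nil "%FLAG START_COORDS~%1 2 5 8 10 12~%%FLAG NAMES~%ct re ct cg kl ct~%"))
  (gethash "START_COORDS" (parse-flag-sections in)))
```

The usage form should return (1 2 5 8 10 12) as its first value. For a real file, with-open-file would supply the stream. APPEND recopies the section on every data line, which is fine for a sketch but worth replacing (for example by pushing and reversing at the end) if sections span many lines.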

Related

How to filter list 1 without elements of list 2

What is the best way to create a new list based on List1 without elements of List2?
List1 = ["Candy", "Brandy", "Sandy", "Lady", "Baby", "Shady"].
List2 = ["Sandy", "Shady", "Candy", "Sandy"].
The contents of the new list should be:
List3 = ["Brandy", "Lady", "Baby"].

Currently, the best way to do this is to use a module that handles sets, such as ordsets:
> ordsets:subtract(ordsets:from_list(List1), ordsets:from_list(List2)).
["Baby","Brandy","Lady"]
If you're using Erlang/OTP 22 or later (released in 2019), the best way is to use the -- operator:
> List3 = List1 -- List2.
["Brandy","Lady","Baby"]
The runtime complexity of this operation is O(n log n) starting in Erlang/OTP 22, but in earlier Erlang versions, the runtime complexity of this operation is O(n*m), so it would perform very badly if both lists are very long.
See the Retired Myths chapter in the Erlang Efficiency Guide:
12.3 Myth: List subtraction ("--" operator) is slow
List subtraction used to have a run-time complexity proportional to the product of the length of its operands, so it was extremely slow when both lists were long.
As of OTP 22 the run-time complexity is "n log n" and the operation will complete quickly even when both lists are very long. In fact, it is faster and uses less memory than the commonly used workaround to convert both lists to ordered sets before subtracting them with ordsets:subtract/2.

Erlang: split binary on every char

I wrote a function that works, to split a binary to every char, but I have a feeling there is an easier way to do it:
my_binary_to_list(<<H,T/binary>>) ->
%slightly modified version of http://erlang.org/doc/efficiency_guide/binaryhandling.html
[list_to_binary([H])|my_binary_to_list(T)];
my_binary_to_list(<<>>) -> [].
> my_binary_to_list(<<"ABC">>).
[<<"A">>,<<"B">>,<<"C">>]
I think this is probably messy because of the list_to_binary([H]) because H should already be a binary.
I tried using that linked function directly but got "AA" which was not what I wanted. Then I tried just [H] and got ["A","B","C"] which was also not what I wanted.

You can create a binary from a single byte without creating a list and calling list_to_binary like this:
my_binary_to_list(<<H,T/binary>>) ->
    [<<H>>|my_binary_to_list(T)];
my_binary_to_list(<<>>) -> [].
You can also use binary comprehensions here to do the same logic as above in a single line:
1> [<<X>> || <<X>> <= <<"ABC">>].
[<<"A">>,<<"B">>,<<"C">>]
You can also directly extract binaries of size 1 (this is probably not faster than above though):
2> [X || <<X:1/binary>> <= <<"ABC">>].
[<<"A">>,<<"B">>,<<"C">>]
Edit: a quick bench using timer:tc/1 runs the second code in roughly half the time compared to first, but you should benchmark yourself before using either one for performance reasons. Maybe the second one is sharing the large binary by creating sub binaries?
1> Bin = binary:copy(<<"?">>, 1000000).
<<"????????????????????????????????????????????????????????????????????????????????????????????????????????????????????"...>>
2> timer:tc(fun() -> [<<X>> || <<X>> <= Bin] end).
{14345634,
[<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,
<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,
<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,
<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<...>>|...]}
3> timer:tc(fun() -> [X || <<X:1/binary>> <= Bin] end).
{7374003,
[<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,
<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,
<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,
<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<"?">>,<<...>>|...]}

You can use a list comprehension with a bit string generator (<= consumes binaries, as opposed to <- which consumes lists):
> [<<A>> || <<A>> <= <<"foo">>].
[<<"f">>,<<"o">>,<<"o">>]
In your version, list_to_binary([H]) can be replaced by <<H>> - both generate a binary containing one byte. Whether using a list comprehension instead of a recursive function qualifies as "easier" might be a matter of taste.

Need advice on how to print a matrix in lisp

I have a matrix defined so if I do this
(format t "~a" (get-real-2d a 0 0))
it prints out the element in the first row first column
and if I do this
(format t "~a" (get-real-2d a 0 1))
it prints out the element in first row second column
and if I do this
(format t "~a" (get-real-2d a 1 0))
it prints out the element in second row first column.
The matrix a looks like this
a =
((0 1 2)
(3 4 5)
(6 7 8))
and I was hoping you could show me exactly how to write a dotimes loop or other loop
that would, in as few lines as possible, print out the matrix using the get-real-2d function so the output looks like this:
0 1 2
3 4 5
6 7 8
I'm just hoping you can show me a slick, small loop for printing matrices that I can use in my Lisp library, something professional looking that uses only variables. Something like:
(format t "~a" (get-real-2d a i j))
instead of a bunch of:
(format t "~a" (get-real-2d a 0 0))
(format t "~a" (get-real-2d a 0 1))
(format t "~a" (get-real-2d a 0 2))
;;;;LATEST EDIT;;;
to make this simple I call
(defparameter a (create-mat 3 3 +32fc1+))
to create a 3x3 matrix - create-mat is a wrapper for opencv's cvCreateMat
the output from that command at repl is
(defparameter a (create-mat 3 3 +32fc1+))
A
CL-OPENCV> a
#.(SB-SYS:INT-SAP #X7FFFD8000E00)
i.e. the variable a is a pointer to the 3x3 matrix
then I run
(defparameter data (cffi:foreign-alloc :float :initial-contents
'(0.0f0 1.0f0 2.0f0 3.0f0 4.0f0 5.0f0 6.0f0 7.0f0 8.0f0)))
to create the data for the matrix - which I next will allocate to the matrix
the output from that command at repl is
CL-OPENCV> (defparameter data (cffi:foreign-alloc :float :initial-contents
'(0.0f0 1.0f0 2.0f0 3.0f0 4.0f0 5.0f0 6.0f0 7.0f0 8.0f0)))
DATA
CL-OPENCV> data
#.(SB-SYS:INT-SAP #X7FFFD8000E40)
i.e. the variable data is a pointer to the data I'll add to the matrix
then I call..
(set-data a data 12) to add the data to the matrix - set-data is a wrapper for opencv's cvSetData
so now when I run - (get-real-2d is a wrapper for opencv's cvGetReal2d)
(get-real-2d a 0 0) it gets the element of matrix a at row 0 col 0 which is 0.0d0
the output from that command at repl is
CL-OPENCV> (get-real-2d a 0 0)
0.0d0
and now when I run
(get-real-2d a 0 1) it gets the element of matrix a at row 0 col 1 which is 1.0d0
the output from that command at repl is
CL-OPENCV> (get-real-2d a 0 1)
1.0d0
and when I run this loop
(dotimes (i 3)
  (dotimes (j 3)
    (format t "~a~%" (get-real-2d a i j))))
the output from that command at repl is
CL-OPENCV> (dotimes (i 3)
(dotimes (j 3)
(format t "~a~%" (get-real-2d a i j))))
0.0d0
1.0d0
2.0d0
3.0d0
4.0d0
5.0d0
6.0d0
7.0d0
8.0d0
NIL
but when I try your method, @Svante
(dotimes (i 3)
  (dotimes (j 3)
    (format t "~{~{~a~^ ~}~%~}" (get-real-2d a i j))))
I get error:
The value 0.0d0 is not of type LIST.
[Condition of type TYPE-ERROR]
because the output of one call to get-real-2d is just a single float, i.e.
CL-OPENCV> (get-real-2d a 0 0)
0.0d0
with that info can you help me print the matrix so it looks like this
0.0d0 1.0d0 2.0d0
3.0d0 4.0d0 5.0d0
6.0d0 7.0d0 8.0d0

You can do that directly in the format directive. The format instructions ~{ and ~} descend into a list structure.
(format t "~{~{~a~^ ~}~%~}" matrix)
The outer pair of ~{ ~} loops over the first level of the matrix, so that the directives inside get to see one row at a time. The inner pair of ~{ ~} loops over each such row, so that the directives inside get to see one element at a time. ~A prints that element. The part between ~^ and ~} gets printed only between executions of the loop body, not at the end. ~% emits a #\Newline.
EDIT as requested
Note that the ~{ ~} replace the looping, and that I named the variable matrix, not element. You need to put the entire matrix there, and it is supposed to be in the form of a nested list. I deduced this from your statement that a is ((0 1 2) (3 4 5) (6 7 8)). So, (format t "~{~{~a~^ ~}~%~}" a).
If the matrix happens not to be in the form of a nested list but rather some kind of array, you really need to loop over the indices. Nested dotimes forms should be sufficient at first:
(fresh-line)
(dotimes (i (array-dimension array 0))
  (dotimes (j (array-dimension array 1))
    (format t "~a " (aref array i j)))
  (terpri))
I don't know how your matrices map to arrays, so you will have to replace array-dimension and aref with your versions.
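If you would still like to use the nested ~{ ~} directive when all you have is an element accessor, one option is to collect the elements into a nested list first. The following is only a sketch: matrix-rows is a made-up helper name, and a plain 2D array stands in for the OpenCV matrix so that the example runs on its own. With the wrapper from the question, the accessor would presumably be (lambda (i j) (get-real-2d a i j)).

```lisp
;; Hypothetical helper: build a nested list by calling ACCESSOR on every
;; (row, column) index pair.
(defun matrix-rows (accessor rows cols)
  (loop for i below rows
        collect (loop for j below cols
                      collect (funcall accessor i j))))

;; Stand-in data for the example; not the asker's foreign matrix.
(defparameter *demo* #2A((0 1 2) (3 4 5) (6 7 8)))

;; FORMAT now receives a list of rows, which is what ~{~{ ... ~}~} expects.
(format t "~{~{~a~^ ~}~%~}"
        (matrix-rows (lambda (i j) (aref *demo* i j)) 3 3))
;; prints:
;; 0 1 2
;; 3 4 5
;; 6 7 8
```

This avoids the TYPE-ERROR from the question, which came from handing format a single number where the directive expects a list.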

Your question can be understood in two ways, and that is why it has two solutions:
Define a method for printing an object of type matrix (in this case it may use knowledge of the internal structure of the matrix):
(defmethod print-object ((matrix matrix) stream)
  (format stream "~{~{~a~^ ~}~%~}" matrix))
This uses format as shown in the other answers.
Define a client function that uses only the one accessor of your object, get-real-2d:
(defun print-matrix (matrix dimension-x dimension-y)
  (dotimes (x dimension-x)
    (dotimes (y dimension-y)
      (princ (get-real-2d matrix x y))
      (princ #\Space))
    (princ #\Newline)))
This just uses dotimes.

Here are just the two dotimes loops that you were asking for. The only thing that you need to pay attention to is when to print spaces and when to print newlines.
(dotimes (i 3)
  (dotimes (j 3)
    (princ (get-real-2d a i j))
    (if (< j 2)
        (princ #\Space)
        (terpri))))
Alternatively, you might want to use the format directives for floating point printing to have the numbers always aligned in nice columns. You can choose between ~F that will never print an exponent, ~E that will always print one, and ~G that behaves according to the magnitude. Look for details here in the HyperSpec: http://www.lispworks.com/documentation/HyperSpec/Body/22_cc.htm.
Here's an example that uses ~F with a field width of 5 and 1 fractional digit:
(dotimes (i 3)
  (dotimes (j 3)
    (format t "~5,1F" (get-real-2d a i j)))
  (terpri))

This isn't hard, so I'd rather leave it to you to figure out, but here are some tips to make a "slick loop" Lisp-style. I would suggest one or more instances of mapc (or mapcar), rather than dotimes. This may feel odd if you're not used to functional programming, but once you're used to it, it's easier to read than dotimes, and you don't have to keep track of the indexes, so it can avoid errors. You really should learn to use mapcar/mapc if you aren't already familiar with them. They are cool. Or if you want to be really cool :-) you could use recursion to iterate over the matrix, but I think that for this purpose iterating using mapc will be easier to read. (But you should learn the recursive way for other jobs. If you find recursion confusing--I have no reason to think you do, but some people have trouble with it--my favorite tutorial is The Little Schemer.)
You may also want to use other format directives that allow you to pad numbers with spaces if they don't have enough digits. The ~% directive may be useful as well. Peter Seibel has a very nice introduction to format.

BST printing without mutating?

So I basically want to print BSTs. Here is a little more detail:
Provide a function (printbst t) that prints a BST constructed from BST as provided by bst.rkt in the following format:
-Each node in the BST should be printed on a separate line;
-the left subtree should be printed after the root;
-The right subtree should be printed before the root;
-The key value should be indented by 2d spaces where d is its depth, or distance from the root. That is, the root should not be indented, the keys in its subtrees should be indented 2 spaces, the keys in their subtrees 4 spaces, and so on.
For example, the complete tree containing {1,2,3,4,5,6} would be printed like this:
6
5
4
3
2
1
Observe that if you rotate the output clockwise and connect each node to its subtrees, you arrive at the conventional graphical representation of the tree. Do not use mutation.
Here is what i have so far:
#lang racket
;; Note: struct-out exports all functions associated with the structure
(provide (struct-out BST))
(define-struct BST (key left right) #:transparent)

(define (depth key bst)
  (cond
    [(or (empty? bst) (= key (BST-key bst))) 0]
    [else (+ 1 (depth key (BST-right bst)) (depth key (BST-left bst)))]))

(define (indent int)
  (cond
    [(= int 0) ""]
    [else " " (indent (sub1 int))]))

(define (printbst t)
  (cond
    [(empty? t) (newline)]
    [(and (empty? (BST-right t)) (empty? (BST-left t)))
     (printf "~a~a" (indent (depth (BST-key t) t)) (BST-key t))]))
My printbst only prints a tree with one node, though. I have an idea, but it involves mutation, which I can't use :( Any suggestions? Should I change my approach to the problem altogether?

Short answer: yes, you're going to want to restructure this more or less completely.
On the bright side, I like your indent function :)
The easiest way to write this problem involves making recursive calls on the subtrees. I hope I'm not giving away too much when I tell you that in order to print a subtree, there's one extra piece of information that you need.
...
Based on our discussion below, I'm going to first suggest that you develop the closely related recursive program that prints out the desired numbers with no indentation. So then the correct output would be:
6
5
4
3
2
1
Updating that program to the one that handles indentation is just a question of passing along a single extra piece of information.
P.S.: questions like this that produce output are almost impossible to write good test cases for, and consequently not great for homework. I hope for your sake that you have lots of other problems that don't involve output....

Binary to Integer -> Erlang

I have a binary M such that 34= will always be present and the rest may vary between any number of digits but will always be an integer.
M = [<<"34=21">>]
When I run this command I get an answer like
hd([X || <<"34=", X/binary >> <- M])
Answer -> <<"21">>
How can I get this to be an integer with the most care taken to make it as efficient as possible?

[<<"34=",X/binary>>] = M,
list_to_integer(binary_to_list(X)).
That yields the integer 21

As of R16B, the BIF binary_to_integer/1 can be used:
OTP-10300
Added four new bifs, erlang:binary_to_integer/1,2,
erlang:integer_to_binary/1, erlang:binary_to_float/1 and
erlang:float_to_binary/1,2. These bifs work similarly to how
their list counterparts work, except they operate on
binaries. In most cases converting from and to binaries is
faster than converting from and to lists.
These bifs are auto-imported into erlang source files and can
therefore be used without the erlang prefix.
So that would look like:
[<<"34=",X/binary>>] = M,
binary_to_integer(X).

A single ASCII digit character N can be converted to its numeric value with N-48. For multi-digit numbers you can fold over the binary, multiplying each digit by 10 raised to the power of its position:
-spec to_int(binary()) -> integer().
to_int(Bin) when is_binary(Bin) ->
    to_int(Bin, {size(Bin), 0}).

to_int(_, {0, Acc}) ->
    erlang:trunc(Acc);
to_int(<<N/integer, Tail/binary>>, {Pos, Acc}) when N >= 48, N =< 57 ->
    to_int(Tail, {Pos-1, Acc + ((N-48) * math:pow(10, Pos-1))}).
This is around 100 times slower than the list_to_integer(binary_to_list(X)) option.
