Read a single number from a text file and advance stream position in Julia - parsing

I understand that Julia has a complete set of low level tools for interfacing with binary files on one hand and some powerfull utilities such as readdlm to load text files containing rectangular data into Array structures on the other hand.
What I cannot discover in the standard library docs, however, is how to easily get input from less structured text files. In particular, what would be the Julia equivalent of the c++ idiom
some_input_stream >> a_variable_int_perhaps;
Given this is such a common usage scenario I am surprised something like this does not feature prominently in the standard library...

You can use readuntil http://docs.julialang.org/en/latest/stdlib/io-network/#Base.readuntil
shell> cat test.txt
1 2 3 4
julia> i,j = open("test.txt") do f
parse(Int, readuntil(f," ")), parse(Int, readuntil(f," "))
end
(1,2)
EDIT: To address comments
To get the last integer in an irregularly formatted ascii file you could use split if you know the character preceding the integer (I've use a blank space here)
shell> cat test.txt
1.0, two five:$#!() + 4
last line 3
julia> i = open("test.txt") do f
parse(Int, split(readline(f), " ")[end])
end
4
As far as code length is concerned, the above examples are completely self contained and the file is opened and closed in an exception safe manner (i.e. wrapped in a try-finally block). To do the same in C++ would be quite verbose.

Related

How to make the output of Maxima cleaner?

I want to make use of Maxima as the backend to solve some computations used in my LaTeX input file.
I did the following steps.
Step 1
Download and install Maxima.
Step 2
Create a batch file named cas.bat (for example) as follows.
rem cas.bat
echo off
set PATH=%PATH%;"C:\Program Files (x86)\Maxima-5.31.2\bin"
maxima --very-quiet -r %1 > solution.tex
Save the batch in the same directory in which your input file below exists. It is just for the sake of simplicity.
Step 3
Create the input file named main.tex (for example) as follows.
% main.tex
\documentclass[preview,border=12pt,12pt]{standalone}
\usepackage{amsmath}
\def\f(#1){(#1)^2-5*(#1)+6}
\begin{document}
\section{Problem}
Evaluate $\f(x)$ for $x=\frac 1 2$.
\section{Solution}
\immediate\write18{cas "x: 1/2;tex(\f(x));"}
\input{solution}
\end{document}
Step 4
Compile the input file with pdflatex -shell-escape main and you will get a nice output as follows.
!
Step 5
Done.
Questions
Apparently the output of Maxima is as follows. I don't know how to make it cleaner.
solution.tex
1
-
2
$${{15}\over{4}}$$
false
Now, my question are
how to remove such texts?
how to obtain just \frac{15}{4} without $$...$$?
(1) To suppress output, terminate input expressions with dollar sign (i.e. $) instead of semicolon (i.e. ;).
(2) To get just the TeX-ified expression sans the environment delimiters (i.e. $$), call tex1 instead of tex. Note that tex1 returns a string, which you have to print yourself (while tex prints it for you).
Combining these ideas with the stuff you showed, I think your program could look like this:
"x: 1/2$ print(tex1(\f(x)))$"
I think you might find the Maxima mailing list helpful. I'm pretty sure there have been several attempts to create a system such as the one you describe. You can also look at the documentation.
I couldn't find any way to completely clean up Maxima's output within Maxima itself. It always echoes the input line, and always writes some whitespace after the output. The following is an example of a perl script that accomplishes the cleanup.
#!/usr/bin/perl
use strict;
my $var = $ARGV[0];
my $expr = $ARGV[1];
sub do_maxima_to_tex {
my $m = shift;
my $c = "maxima --batch-string='exptdispflag:false; print(tex1($m))\$'";
my $e = `$c`;
my #x = split(/\(%i\d+\)/,$e); # output contains stuff like (%i1)
my $f = pop #x; # remove everything before the echo of the last input
while ($f=~/\A /) {$f=~s/\A .*\n//} # remove echo of input, which may be more than one line
$f =~ s/\\\n//g; # maxima breaks latex tokens in the middle at end of line; fix this
$f =~ s/\n/ /g; # if multiple lines, get it into one line
$f =~ s/\s+\Z//; # get rid of final whitespace
return $f;
}
my $e1 = do_maxima_to_tex("diff($expr,$var,1)");
my $e2 = do_maxima_to_tex("diff($expr,$var,2)");
print <<TEX;
The first derivative is \$$e1\$. Differentiating a second time,
we get \$$e2\$.
TEX
If you name this script a.pl, then doing
a.pl z 3*z^4
outputs this:
The first derivative is $12\,z^3$. Differentiating a second time,
we get $36\,z^2$.
For the OP's application, a script like this one could be what is invoked by the write18 in the latex file.
If you really want to use LaTeX then the maxiplot package is the answer. It provides a maxima environment inside of which you enter Maxima commands. When you process your LaTeX file a Maxima batch file is generated. Process this file with Maxima and process your LaTeX file again to typeset the equations generated by Maxima.
If you would rather have 2D math input with live typesetting then use TeXmacs. It is a cross-platform document authoring environment (a word processor on steroids if you like) that includes plugins for Maxima, Mathematica and many more scientific computing tools. If you need to or are not satisfied with the typesetting, you can export your document to LaTeX.
I know this is a very old post. Excellent answers for the question asked by OP. I was using --very-quiet -r options on the command line for a long time like OP, but in maxima version 5.43.2 they behave differently. See maxima command line v5.43 is behaving differently than v5.41. I am answering this question with a cross reference because when incorporating these answers in your solutions, make sure the changes in behavior of those command line flags are also incorporated.

Using sed or awk to parse current field/column into additional fields/columns on the same line

Here are 4 sample rows of the text file of interest
EnumerateKey,explorer.exe,HKCR\\Directory\\shellex\\ContextMenuHandlers,NOMORE
CreateSec,explorer.exe,\\WINDOWS\\system32\\verclsid.exe,SUCCESS
QueryKey,AcroRd32.exe,HKCU\\Control Panel\\International,BUFOVRFLOW
QueryValue,AcroRd32.exe,HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Policies\\Explorer\\NoRecentDocsHistory,NOTFOUND
I would like to augment the rows by appending K fields/columns (for example K=3 below) which contain the elements of the path found in $3 but parsed by \\
Here are the desired output for the 4 lines.
EnumerateKey,explorer.exe,HKCR\\Directory\\shellex\\ContextMenuHandlers,NOMORE, Directory, shellex, ContextMenuHandlers
CreateSec,explorer.exe,\\WINDOWS\\system32\\verclsid.exe,SUCCESS, WINDOWS, system32, verclsid.exe
QueryKey,AcroRd32.exe,HKCU\\Control Panel\\International,BUFOVRFLOW, Control Panel, International,
QueryValue,AcroRd32.exe,HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Policies\\Explorer\\NoRecentDocsHistory,NOTFOUND, Software, Microsoft, Windows
After some more study, here are 2 nuances:
Some of the paths begin with HK**, others don't. However, in both cases I only care about the path that starts after the initial \\. This difference is captured between line 1 and 2. Therefore I believe the parsing must be anchored at \\ rather than simply $3 if possible. (Am I using that terminology correctly?)
Second, the depth of the path varies. In order to keep consistency in column/fields I'm willing to lose some information (line 4) as well as have empty fields for the short paths (line 3) in order to maintain this.
Here's one way using GNU awk:
awk 'BEGIN { FS=OFS="," } { split($3,a,"\\\\\\\\"); print $0, a[2], a[3], a[4] }' file
Results:
EnumerateKey,explorer.exe,HKCR\\Directory\\shellex\\ContextMenuHandlers,NOMORE,Directory,shellex,ContextMenuHandlers
CreateSec,explorer.exe,\\WINDOWS\\system32\\verclsid.exe,SUCCESS,WINDOWS,system32,verclsid.exe
QueryKey,AcroRd32.exe,HKCU\\Control Panel\\International,BUFOVRFLOW,Control Panel,International,
QueryValue,AcroRd32.exe,HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Policies\\Explorer\\NoRecentDocsHistory,NOTFOUND,Software,Microsoft,Windows
With some ugly Perl:
perl -lane '{$l=$_;s/.*?\\\\([^,]*),.*/$1/;#v=split(/\\\\/,$_); print "$l,".join(",",#v[0,1,2]);}' input

How to write an array into a text file in maxima?

I am relatively new to maxima. I want to know how to write an array into a text file using maxima.
I know it's late in the game for the original post, but I'll leave this here in case someone finds it in a search.
Let A be a Lisp array, Maxima array, matrix, list, or nested list. Then:
write_data (A, "some_file.data");
Let S be an ouput stream (created by openw or opena). Then:
write_data (A, S);
Entering ?? numericalio at the input prompt, or ?? write_ or ?? read_, will show some info about this function and related ones.
I've never used maxima (or even heard of it), but a little Google searching out of curiousity turned up this: http://arachnoid.com/maxima/files_functions.html
From what I can gather, you should be able to do something like this:
stringout("my_new_file.txt",values);
It says the second parameter to the stringout function can be one or more of these:
input: all user entries since the beginning of the session.
values: all user variable and array assignments.
functions: all user-defined functions (including functions defined within any loaded packages).
all: all of the above. Such a list is normally useful only for editing and extraction of useful sections.
So by passing values it should save your array assignments to file.
A bit more necroposting, as google leads here, but I haven't found it useful enough. I've needed to export it as following:
-0.8000,-0.8000,-0.2422,-0.242
-0.7942,-0.7942,-0.2387,-0.239
-0.7776,-0.7776,-0.2285,-0.228
-0.7514,-0.7514,-0.2124,-0.212
-0.7168,-0.7168,-0.1912,-0.191
-0.6750,-0.6750,-0.1655,-0.166
-0.6272,-0.6272,-0.1362,-0.136
-0.5746,-0.5746,-0.1039,-0.104
So I've found how to do this with printf:
with_stdout(filename, for i:1 thru length(z_points) do
printf (true,"~,4f,~,4f,~,4f,~,3f~%",bot_points[i],bot_points[i],top_points[i],top_points[i]));
A bit cleaner variation on the #ProdoElmit's answer:
list : [1,2,3,4,5]$
with_stdout("file.txt", apply(print, list))$
/* 1 2 3 4 5 is then what appears in file.txt */
Here the trick with apply is needed as you probably don't want to have square brackets in your output, as is produced by print(list).
For a matrix to be printed out, I would have done the following:
m : matrix([1,2],[3,4])$
with_stdout("file.txt", for row in args(m) do apply(print, row))$
/* 1 2
3 4
is what you then have in file.txt */
Note that in my solution the values are separated with spaces and the format of your values is fixed to that provided by print. Another caveat is that there is a limit on the number of function parameters: for example, for me (GCL 2.6.12) my method does not work if length(list) > 64.

How to check for EOF/EOL with Stream I/O in Fortran?

I would like to use FORTRAN streaming I/O to make a program that tells me how many lines a text-file has. The idea is to make something like this:
OPEN(UNIT=10,ACCESS='STREAM',FILE='testfile.txt')
nLines=0
bContinue=.TRUE.
DO WHILE (bContinue)
READ(UNIT=10) cCharacter
IF (cCharacter.EQ.{EOL-char}) nLines=nLines+1
IF (cCharacter.EQ.{EOF-char}) bContinue=.FALSE.
ENDDO
(I didn't include variable declaration but I think you get the idea of what they are; the only important clarification would be that that cCharacter has LEN=1)
My problem is that I don't know how to check if the character I just read from the file is an end-of-line or end-of-file (the "ifs" in the code). When you read and print characters this way, you eventually get newlines in the same place you had them in the original text, so I think it does read and recognize them as "characters", somehow. Perhaps turning the characters into integers and comparing to the appropriate number? Or is there a more direct way?
(I know that you can use the register reading (EDIT: I meant record reading) to do a program that reads lines more easily and add an IOstatus to check for eof, but the "line counter" is just a useful example, the idea is to learn how to move in a more controlled way through a textfile)
Checking for a specific character as line terminator makes you program OS dependent. It would be better to use the facilities of the language so that your program is compiler and OS dependent. Since lines are basically records, why do this with steam I/O? That request seems to make an easy job into a hard one. If are can use regular IO, here is an example program to count the lines in a text file.
EDIT: the code fragment was changed into a program to answer questions in the comments. With "line" as a character variable, when I test the program with gfortran and ifort I don't see a problem when the input file has empty or blank lines.
program test_lc
use, intrinsic :: iso_fortran_env
integer :: LineCount, Read_Code
character (len=200) :: line
open (unit=51, file="temp.txt", status="old", access='sequential', form='formatted', action='read' )
LineCount = 0
ReadLoop: do
read (51, '(A)', iostat=Read_Code) line
if ( Read_Code /= 0 ) then
if ( Read_Code == iostat_end ) then
exit ReadLoop ! end of file --> line count found
else
write ( *, '( / "read error: ", I0 )' ) Read_Code
stop
end if
end if
LineCount = LineCount + 1
write (*, '( I0, ": ''", A, "''" )' ) LineCount, trim (line)
if ( len_trim (line) == 0 ) write (*, '("The above is an empty or all blank line.")' )
end do ReadLoop
write (*, *) "found", LineCount, " lines"
end program test_lc
If you want to do further processing of the file, you can rewind it.
P.S.
The main reason that I have used Fortran Stream IO is to read files produced by other languages, e.g., C
Portable methods are provided to write new-line boundaries; I'm not aware of a portable method to test for such.

Correct word-count of a LaTeX document

I'm currently searching for an application or a script that does a correct word count for a LaTeX document.
Up till now, I have only encountered scripts that only work on a single file but what I want is a script that can safely ignore LaTeX keywords and also traverse linked files...ie follow \include and \input links to produce a correct word-count for the whole document.
With vim, I currently use ggVGg CTRL+G but obviously that shows the count for the current file and does not ignore LaTeX keywords.
Does anyone know of any script (or application) that can do this job?
I use texcount. The webpage has a Perl script to download (and a manual).
It will include tex files that are included (\input or \include) in the document (see -inc), supports macros, and has many other nice features.
When following included files you will get detail about each separate file as well as a total. For example here is the total output for a 12 page document of mine:
TOTAL COUNT
Files: 20
Words in text: 4188
Words in headers: 26
Words in float captions: 404
Number of headers: 12
Number of floats: 7
Number of math inlines: 85
Number of math displayed: 19
If you're only interested in the total, use the -total argument.
I went with icio's comment and did a word-count on the pdf itself by piping the output of pdftotext to wc:
pdftotext file.pdf - | wc - w
latex file.tex
dvips -o - file.dvi | ps2ascii | wc -w
should give you a fairly accurate word count.
To add to #aioobe,
If you use pdflatex, just do
pdftops file.pdf
ps2ascii file.ps|wc -w
I compared this count to the count in Microsoft Word in a 1599 word document (according to Word). pdftotext produced a text with 1700+ words. texcount did not include the references and produced 1088 words. ps2ascii returned 1603 words. 4 more than in Word.
I say that's a pretty good count. I am not sure where's the 4 word difference, though. :)
In Texmaker interface you can get the word count by right clicking in the PDF preview:
Overleaf has a word count feature:
Overleaf v2:
Overleaf v1:
I use the following VIM script:
function! WC()
let filename = expand("%")
let cmd = "detex " . filename . " | wc -w | perl -pe 'chomp; s/ +//;'"
let result = system(cmd)
echo result . " words"
endfunction
… but it doesn’t follow links. This would basically entail parsing the TeX file to get all linked files, wouldn’t it?
The advantage over the other answers is that it doesn’t have to produce an output file (PDF or PS) to compute the word count so it’s potentially (depending on usage) much more efficient.
Although icio’s comment is theoretically correct, I found that the above method gives quite accurate estimates for the number of words. For most texts, it’s well within the 5% margin that is used in many assignments.
If the use of a vim plugin suits you, the vimtex plugin has integrated the texcount tool quite nicely.
Here is an excerpt from their documentation:
:VimtexCountLetters Shows the number of letters/characters or words in
:VimtexCountWords the current project or in the selected region. The
count is created with `texcount` through a call on
the main project file similar to: >
texcount -nosub -sum [-letter] -merge -q -1 FILE
<
Note: Default arguments may be controlled with
|g:vimtex_texcount_custom_arg|.
Note: One may access the information through the
function `vimtex#misc#wordcount(opts)`, where
`opts` is a dictionary with the following
keys (defaults indicated): >
'range' : [1, line('$')]
'count_letters' : 0/1
'detailed' : 0
<
If `detailed` is 0, then it only returns the
total count. This makes it possible to use for
e.g. statusline functions. If the `opts` dict
is not passed, then the defaults are assumed.
*VimtexCountLetters!*
*VimtexCountWords!*
:VimtexCountLetters! Similar to |VimtexCountLetters|/|VimtexCountWords|, but
:VimtexCountWords! show separate reports for included files. I.e.
presents the result of: >
texcount -nosub -sum [-letter] -inc FILE
<
*VimtexImapsList*
*<plug>(vimtex-imaps-list)*
The nice part about this is how extensible it is. On top of counting the number of words in your current file, you can make a visual selection (say two or three paragraphs) and then only apply the command to your selection.
For a very basic article class document I just look at the number of matches for a regex to find words. I use Sublime Text, so this method may not work for you in a different editor, but I just hit Ctrl+F (Command+F on Mac) and then, with regex enabled, search for
(^|\s+|"|((h|f|te){)|\()\w+
which should ignore text declaring a floating environment or captions on figures as well as most kinds of basic equations and \usepackage declarations, while including quotations and parentheticals. It also counts footnotes and \emphasized text and will count \hyperref links as one word. It's not perfect, but it's typically accurate to within a few dozen words or so. You could refine it to work for you, but a script is probably a better solution, since LaTeX source code isn't a regular language. Just thought I'd throw this up here.

Resources