parsing input file in fortran - parsing

This is a continuation of my older thread.
I have a file produced by a different code that I need to parse and use as my input.
A snippet from it looks like:
GLOBAL SYSTEM PARAMETER
NQ 2
NT 2
NM 2
IREL 3
*************************************
BEXT 0.00000000000000E+00
SEMICORE F
LLOYD F
NE 32 0
IBZINT 2
NKTAB 936
XC-POT VWN
SCF-ALG BROYDEN2
SCF-ITER 29
SCF-MIX 2.00000000000000E-01
SCF-TOL 1.00000000000000E-05
RMSAVV 2.11362995016878E-06
RMSAVB 1.25411205586140E-06
EF 7.27534671479201E-01
VMTZ -7.72451391270293E-01
*************************************
And so on.
Currently I am reading it line by line, as:
Program readpot
use iso_fortran_env
Implicit None
integer ::i,filestat,nq
character(len=120):: rdline
character(10)::key!,dimension(:),allocatable ::key
real,dimension(:),allocatable ::val
i=0
open(12,file="FeRh.pot_new",status="old")
readline:do
i=i+1
read(12,'(A)',iostat=filestat) rdline!(i)
if (filestat /= 0) then
if (filestat == iostat_end ) then
exit readline
else
write ( *, '( / "Error reading file: ", I0 )' ) filestat
stop
endif
end if
if (rdline(1:2)=="NQ") then
read(rdline(19:20),'(i2)') nq
write(*,*)nq
end if
end do readline
End Program readpot
So I have to read every line, manually find the columns that hold the value for each key, and write it out. (For brevity, I have shown this for one value only.)
My question: is this the proper way of doing this, or is there a simpler way? Kindly let me know.

If the file has no variability you scarcely need to parse it at all. Let's suppose that you have declared variables for all the interesting data items in the file and that those variables have the names shown on the lines of the file. For example
INTEGER :: nq , nt, nm, irel
REAL :: bext, scf_mix, scf_tol ! '-' not allowed in Fortran names
CHARACTER(len=48) :: label, text
LOGICAL :: semicore, lloyd
! Complete this as you wish
Then write a block of code like this
OPEN(12,file="FeRh.pot_new",status="old")
READ(12,*) ! Not interested in the 1st line
READ(12,*) label, nq
READ(12,*) label, nt
READ(12,*) label, nm
READ(12,*) label, irel
READ(12,*) ! Not interested in this line
READ(12,*) label, bext
READ(12,*) label, semicore
! Other lines to write
CLOSE(12)
Fortran's list-directed input treats blanks in a line as separators between values. It will not read those blanks as part of a character variable. That behaviour can be changed, but in your case you don't need to. Note that it will also understand the character F to mean .false. when read into a logical variable.
My code snippet just ignores the labels and lines of explanation. If you are of a nervous disposition you could process them, perhaps
IF (label/='NE') STOP
or whatever you wish.
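If the order of the lines ever does vary, a key-based lookup is an alternative to reading positionally. The following is a minimal sketch in Python rather than Fortran, only to illustrate the idea of splitting each line on blanks and dispatching on the label; the names parse_pot_lines, convert and wanted are hypothetical, and the sample lines are taken from the snippet in the question.

```python
def convert(token):
    """Convert one token: T/F to bool, then int, then float, else keep the string."""
    if token in ("T", "F"):
        return token == "T"
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_pot_lines(lines, wanted):
    """Return {key: value} for the lines whose first token is in `wanted`."""
    values = {}
    for line in lines:
        fields = line.split()  # blanks separate the label from its values
        if len(fields) >= 2 and fields[0] in wanted:
            converted = [convert(tok) for tok in fields[1:]]
            # A single value is stored bare, several values as a list.
            values[fields[0]] = converted[0] if len(converted) == 1 else converted
    return values


sample = """GLOBAL SYSTEM PARAMETER
NQ 2
IREL 3
*************************************
BEXT 0.00000000000000E+00
SEMICORE F
NE 32 0
SCF-MIX 2.00000000000000E-01
""".splitlines()

result = parse_pot_lines(sample, {"NQ", "BEXT", "SEMICORE", "NE", "SCF-MIX"})
print(result)
```

Header and separator lines are skipped simply because their first token is not one of the wanted keys.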


Erlang equivalent of javascript codePointAt?

Is there an erlang equivalent of codePointAt from js? One that gets the code point starting at a byte offset, without modifying the underlying string/binary?
You can use bit syntax pattern matching to skip the first N bytes and decode the first character from the remaining bytes as UTF-8:
1> CodePointAt = fun(Binary, Offset) ->
<<_:Offset/binary, Char/utf8, _/binary>> = Binary,
Char
end.
Test:
2> CodePointAt(<<"πr²"/utf8>>, 0).
960
3> CodePointAt(<<"πr²"/utf8>>, 1).
** exception error: no match of right hand side value <<207,128,114,194,178>>
4> CodePointAt(<<"πr²"/utf8>>, 2).
114
5> CodePointAt(<<"πr²"/utf8>>, 3).
178
6> CodePointAt(<<"πr²"/utf8>>, 4).
** exception error: no match of right hand side value <<207,128,114,194,178>>
7> CodePointAt(<<"πr²"/utf8>>, 5).
** exception error: no match of right hand side value <<207,128,114,194,178>>
As you can see, if the offset is not on a valid UTF-8 character boundary, the function will throw an error. You can handle that differently using a case expression if needed.
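For comparison, the same byte-offset lookup can be sketched in Python (not Erlang). This is only an illustration of the technique: the function name code_point_at is hypothetical, and it raises ValueError where the Erlang version fails with a badmatch.

```python
def code_point_at(data: bytes, offset: int) -> int:
    """Return the code point of the UTF-8 character starting at byte `offset`."""
    if not 0 <= offset < len(data):
        raise ValueError("offset out of range")
    lead = data[offset]
    # The lead byte tells how many bytes the encoded character occupies.
    if lead < 0x80:
        length = 1
    elif 0xC0 <= lead < 0xE0:
        length = 2
    elif 0xE0 <= lead < 0xF0:
        length = 3
    elif 0xF0 <= lead < 0xF8:
        length = 4
    else:
        # 0x80..0xBF is a continuation byte, so we are inside a character.
        raise ValueError("offset is not on a character boundary")
    # decode() validates the sequence; UnicodeDecodeError is a ValueError.
    return ord(data[offset:offset + length].decode("utf-8"))


encoded = "πr²".encode("utf-8")
for position in range(len(encoded) + 1):
    try:
        print(position, code_point_at(encoded, position))
    except ValueError as error:
        print(position, "error:", error)
```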
First, remember that only binary strings are using UTF-8 in Erlang. Plain double-quote strings are already just lists of code points (much like UTF-32). The unicode:chardata() type represents both of these kinds of strings, including mixed lists like ["Hello", $\s, [<<"Filip"/utf8>>, $!]]. You can use unicode:characters_to_list(Chardata) or unicode:characters_to_binary(Chardata) to get a flattened version to work with if needed.
Meanwhile, the JS codePointAt function works on UTF-16 encoded strings, which is what JavaScript uses. Note that the index in this case is not a byte position, but the index of the 16-bit units of the encoding. And UTF-16 is also a variable length encoding: code points that need more than 16 bits use a kind of escape sequence called "surrogate pairs" - for example emojis like 👍 - so if such characters can occur, the index is misleading: in "a👍z" (in JavaScript), the a is at 0, but the z is not at 2 but at 3.
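The index shift caused by a surrogate pair can be checked with a small Python sketch (Python strings are indexed by code point, so the UTF-16 position has to be computed explicitly). The helper name utf16_index is hypothetical.

```python
def utf16_index(text: str, char: str) -> int:
    """Index of `char` in `text`, counted in UTF-16 code units as JavaScript does."""
    position = text.index(char)  # index counted in code points
    prefix = text[:position]
    # Every UTF-16 code unit is two bytes in the utf-16-le encoding.
    return len(prefix.encode("utf-16-le")) // 2


text = "a👍z"
print(text.index("z"))         # position counted in code points
print(utf16_index(text, "z"))  # position counted in UTF-16 code units
```

The two numbers differ by one because 👍 takes a single code point but two 16-bit units.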
What you want is probably what's called the "grapheme clusters" - those that look like a single thing when printed (see the docs for Erlang's string module: https://www.erlang.org/doc/man/string.html). And you can't really use numerical indexes to dig the grapheme clusters out from a string - you need to iterate over the string from the start, getting them out one at a time. This can be done with string:next_grapheme(Chardata) (see https://www.erlang.org/doc/man/string.html#next_grapheme-1) or if you for some reason really need to index them numerically, you could insert the individual cluster substrings in an array (see https://www.erlang.org/doc/man/array.html). For example: array:from_list(string:to_graphemes(Chardata)).

How to define custom character with code > 126 using ESC/POS commands?

I'm trying to define custom characters on thermal printer NCR 7199.
I used ESC/POS command
ESC & y c1 c2 x d1...dn
and it works fine. But this command can change only characters in range 32-126, and those characters are latin letters and common symbols.
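For reference, the byte sequence for this command can be assembled as in the following Python sketch. It assumes the commonly documented ESC/POS layout, in which y is the number of bytes per column, c1 and c2 are the first and last character codes, and each glyph is sent as its column count x followed by y*x data bytes; the NCR 7199 may interpret the parameters differently, and the function name define_user_chars is hypothetical.

```python
ESC = 0x1B


def define_user_chars(y: int, first_code: int, glyphs: list) -> bytes:
    """Build ESC & y c1 c2 [x d1...dn]... for consecutive character codes.

    Each entry of `glyphs` is a bytes object of column data whose length
    is a multiple of y (assumed layout, see the note above).
    """
    if not glyphs:
        raise ValueError("at least one glyph is required")
    last_code = first_code + len(glyphs) - 1
    # The range limit is the one stated above for this command.
    if first_code < 32 or last_code > 126:
        raise ValueError("character codes must lie in 32..126")
    out = bytearray([ESC, ord("&"), y, first_code, last_code])
    for data in glyphs:
        if len(data) == 0 or len(data) % y != 0:
            raise ValueError("glyph data must be a non-empty multiple of y bytes")
        out.append(len(data) // y)  # x: number of columns in this glyph
        out += data
    return bytes(out)


# One blank glyph, 12 columns of 3 bytes each, replacing the character 'A'.
payload = define_user_chars(3, ord("A"), [bytes(36)])
print(payload.hex(" "))
```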
I'd prefer to replace characters with codes 8E-8F, for example, but cannot do it using this command.
Is it possible? Or is there any other ESC/POS command for user-defined characters?
UPD.
It seems like a firmware update can fix this problem. Firmware version on our printer is v99.21, and I saw this in release notes:
v99.25 "based on v99.24"
1. Allowed User-defined characters defined range from 20H to FFH in 7199 Emulation mode
The other user-defined character command, for Kanji, applies to printers that support the MBCS character set, which is not what you want.
FS 2
Define user-defined Kanji characters
However, although it is unclear whether NCR 7199 supports it, ESC/POS has the ability to customize the font of the user-defined code page rather than the individual characters.
Please refer to the contents of the following pages.
GS ( E <Function 7>
Copy the user-defined page
GS ( E <Function 8>
Define the data (column format) for the character code page
GS ( E <Function 9>
Define the data (raster format) for the character code page
GS ( E <Function 10>
Delete the data for the character code page
Problem solved by updating firmware.
After updating "Main Firmware" to v99.27 (it seems the version must be v99.25 or later) and changing the Emulation mode to "NCR 7199", I was finally able to define characters across the whole range 20H-FFH.

How to check for EOF/EOL with Stream I/O in Fortran?

I would like to use FORTRAN streaming I/O to make a program that tells me how many lines a text-file has. The idea is to make something like this:
OPEN(UNIT=10,ACCESS='STREAM',FILE='testfile.txt')
nLines=0
bContinue=.TRUE.
DO WHILE (bContinue)
READ(UNIT=10) cCharacter
IF (cCharacter.EQ.{EOL-char}) nLines=nLines+1
IF (cCharacter.EQ.{EOF-char}) bContinue=.FALSE.
ENDDO
(I didn't include variable declaration but I think you get the idea of what they are; the only important clarification would be that that cCharacter has LEN=1)
My problem is that I don't know how to check if the character I just read from the file is an end-of-line or end-of-file (the "ifs" in the code). When you read and print characters this way, you eventually get newlines in the same place you had them in the original text, so I think it does read and recognize them as "characters", somehow. Perhaps turning the characters into integers and comparing to the appropriate number? Or is there a more direct way?
(I know that you can use the register reading (EDIT: I meant record reading) to do a program that reads lines more easily and add an IOstatus to check for eof, but the "line counter" is just a useful example, the idea is to learn how to move in a more controlled way through a textfile)
Checking for a specific character as line terminator makes your program OS dependent. It would be better to use the facilities of the language so that your program is compiler and OS independent. Since lines are basically records, why do this with stream I/O? That request seems to make an easy job into a hard one. If you can use regular I/O, here is an example program to count the lines in a text file.
EDIT: the code fragment was changed into a program to answer questions in the comments. With "line" as a character variable, when I test the program with gfortran and ifort I don't see a problem when the input file has empty or blank lines.
program test_lc
use, intrinsic :: iso_fortran_env
integer :: LineCount, Read_Code
character (len=200) :: line
open (unit=51, file="temp.txt", status="old", access='sequential', form='formatted', action='read' )
LineCount = 0
ReadLoop: do
read (51, '(A)', iostat=Read_Code) line
if ( Read_Code /= 0 ) then
if ( Read_Code == iostat_end ) then
exit ReadLoop ! end of file --> line count found
else
write ( *, '( / "read error: ", I0 )' ) Read_Code
stop
end if
end if
LineCount = LineCount + 1
write (*, '( I0, ": ''", A, "''" )' ) LineCount, trim (line)
if ( len_trim (line) == 0 ) write (*, '("The above is an empty or all blank line.")' )
end do ReadLoop
write (*, *) "found", LineCount, " lines"
end program test_lc
If you want to do further processing of the file, you can rewind it.
P.S.
The main reason that I have used Fortran Stream IO is to read files produced by other languages, e.g., C
Portable methods are provided to write new-line boundaries; I'm not aware of a portable method to test for such.
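To make the byte-level approach from the question concrete, here is a minimal sketch in Python (not Fortran) that counts lines by scanning raw bytes. It assumes the terminator is LF or CRLF, which is exactly the kind of OS-specific assumption warned about above; a file that uses a bare CR would be miscounted. The function name count_lines_bytewise is hypothetical.

```python
import os
import tempfile


def count_lines_bytewise(path: str, chunk_size: int = 65536) -> int:
    """Count lines by counting LF bytes, plus a final unterminated line."""
    count = 0
    last = b""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # an empty read signals end of file
                break
            count += chunk.count(b"\n")
            last = chunk[-1:]
    # A last line without a trailing newline still counts as a line.
    if last and last != b"\n":
        count += 1
    return count


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "testfile.txt")
    with open(path, "wb") as handle:
        handle.write(b"first\r\n\r\nthird without terminator")
    print(count_lines_bytewise(path))
```

Note that end of file is not a character in the data: it shows up as an empty read, much as the Fortran program detects it through iostat rather than by comparing characters.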

Easiest way to remove Latex tag (but not its content)?

I am using TeXnicCenter to edit a LaTeX document.
I now want to remove a certain tag (say, \emph{blabla}) which occurs multiple times in my document, but not the tag's content (so in this example, I want to remove all emphasis).
What is the easiest way to do so?
May also be using another program easily available on Windows 7.
Edit: In response to regex suggestions, it is important that it can deal with nested tags.
Edit 2: I really want to remove the tag from the text file, not just disable it.
Using a regular expression do something like s/\\emph\{([^\}]*)\}/\1/g. If you are not familiar with regular expressions this says:
s -- replace
/ -- begin match section
\\emph\{ -- match \emph{
( -- begin capture
[^\}]* -- match any characters except (meaning up until) a close brace because:
[] a group of characters
^ means not or "everything except"
\} -- the close brace
and * means 0 or more times
) -- end capture, because this is the first (in this case only) capture, it is number 1
\} -- match end brace
/ -- begin replace section
\1 -- replace with captured section number 1
/ -- end regular expression, begin extra flags
g -- global flag, meaning do this every time the match is found not just the first time
This is with Perl syntax, as that is what I am familiar with. The following perl "one-liners" will accomplish two tasks
perl -pe 's/\\emph\{([^\}]*)\}/\1/g' filename will "test" printing the file to the command line
perl -pi -e 's/\\emph\{([^\}]*)\}/\1/g' filename will change the file in place.
Similar commands may be available in your editor, but if not this will (should) work.
Crowley should have added this as an answer, but I will do that for him, if you replace all \emph{ with { you should be able to do this without disturbing the other content. It will still be in braces, but unless you have done some odd stuff it shouldn't matter.
The regex would be a simple s/\\emph\{/\{/g but the search and replace in your editor will do that one too.
Edit: Sorry, used the wrong brace in the regex, fixed now.
\renewcommand{\emph}[1]{#1}
Any reasonably advanced editor should let you do a search/replace using regular expressions, replacing \emph{bla} by bla, etc.
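The question asks for nested tags to be handled, and a pattern such as [^\}]* stops at the first close brace, so it cannot match an \emph{...} that contains other braces. One way around that is to count braces instead of using a regular expression. The following Python sketch shows the idea; the function name strip_command is hypothetical, and it assumes the open brace directly follows the command name (no space, no optional arguments).

```python
def strip_command(text: str, command: str = r"\emph") -> str:
    """Remove every `command{...}` wrapper but keep the braced content."""
    opener = command + "{"
    out = []
    i = 0
    while i < len(text):
        if text.startswith(opener, i):
            # Scan for the matching close brace, counting nested braces.
            depth = 1
            j = i + len(opener)
            while j < len(text) and depth > 0:
                if text[j] == "\\":
                    j += 2  # skip escaped characters such as \{ and \}
                    continue
                if text[j] == "{":
                    depth += 1
                elif text[j] == "}":
                    depth -= 1
                j += 1
            if depth == 0:
                inner = text[i + len(opener):j - 1]
                # Recurse so that nested occurrences are removed as well.
                out.append(strip_command(inner, command))
                i = j
                continue
        # No match, or unbalanced braces: copy the character unchanged.
        out.append(text[i])
        i += 1
    return "".join(out)


print(strip_command(r"a \emph{b \textbf{c} \emph{d}} e"))
```

Unlike the \emph{ to { replacement, this removes the braces too, and it leaves text with unbalanced braces untouched.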

Character column parsing in Boost::Spirit

I'm working on a Boost Spirit 2.0 based parser for a small subset of Fortran 77. The issue I'm having is that Fortran 77 is column oriented, and I have been unable to find anything in Spirit that can allow its parsers to be column-aware. Is there any way to do this?
I don't really have to support the full arcane Fortran syntax, but it does need to be able to ignore lines that have a character in the first column (Fortran comments), and recognize lines with a character in the sixth column as continuation lines.
It seems like folks dealing with batch files would at least have the same first-column problem as me. Spirit appears to have an end-of-line parser, but not a start-of-line parser (and certainly not a column(x) parser).
Well, since I now have an answer to this, I guess I should share it.
Fortran 77, like probably all other languages that care about columns, is a line-oriented language. That means your parser has to keep track of the EOL and actually use it in its parsing.
Another important fact is that in my case, I didn't care about parsing the line numbers that Fortran can put in those early control columns. All I need is to know when it is telling me to scan the rest of the line differently.
Given those two things, I could entirely handle this issue with a Spirit skip parser. I wrote mine to
skip the entire line if the first (comment) column contains an alphabetic character.
skip the entire line if there is nothing on it.
ignore the preceding EOL and everything up to the sixth column if the sixth column contains a '.' (continuation line). This tacks it onto the preceding line.
skip all non-eol whitespace (even spaces don't matter in Fortran. Yes, it's a weird language.)
Here's the code:
skip =
// Full line comment
(spirit::eol >> spirit::ascii::alpha >> *(spirit::ascii::char_ - spirit::eol))
[boost::bind (&fortran::parse_info::skipping_line, &pi)]
|
// remaining line comment
(spirit::ascii::char_ ('!') >> *(spirit::ascii::char_ - spirit::eol)
[boost::bind (&fortran::parse_info::skipping_line_comment, &pi)])
|
// Continuation
(spirit::eol >> spirit::ascii::blank >>
spirit::qi::repeat(4)[spirit::ascii::char_ - spirit::eol] >> ".")
[boost::bind (&fortran::parse_info::skipping_continue, &pi)]
|
// empty line
(spirit::eol >>
-(spirit::ascii::blank >> spirit::qi::repeat(0, 4)[spirit::ascii::char_ - spirit::eol] >>
*(spirit::ascii::blank) ) >>
&(spirit::eol | spirit::eoi))
[boost::bind (&fortran::parse_info::skipping_empty, &pi)]
|
// whitespace (this needs to be the last alternative).
(spirit::ascii::space - spirit::eol)
[boost::bind (&fortran::parse_info::skipping_space, &pi)]
;
I would advise against blindly using this yourself for line-oriented Fortran, as I ignore line numbers, and different compilers have different rules for valid comment and continuation characters.
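The same skipping rules can also be written as a line-based preprocessing pass, which may be easier to reason about than a skip parser. The following Python sketch is an illustration under the same simplifications as above, not a translation of the Spirit code: the function name logical_lines is hypothetical, the first six columns are dropped without looking at line numbers, a '!' is treated as a comment start even inside a string literal, and all blanks are removed.

```python
def logical_lines(source: str) -> list:
    """Join fixed-form continuation lines and drop comments and blank lines."""
    result = []
    for raw in source.splitlines():
        line = raw.split("!", 1)[0].rstrip()  # drop a trailing ! comment
        if not line.strip():
            continue  # nothing on the line
        if line[0].isalpha():
            continue  # alphabetic character in column 1: full-line comment
        statement = "".join(line[6:].split())  # blanks are not significant
        is_continuation = len(line) >= 6 and line[0] == " " and line[5] == "."
        if is_continuation and result:
            result[-1] += statement  # tack onto the preceding line
        else:
            result.append(statement)
    return result


sample = "\n".join([
    "C     a full-line comment",
    "      X = 1 +",
    "     .    2",
    "",
    "      Y = X  ! trailing comment",
])
print(logical_lines(sample))
```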
