I have JSON files with hundreds of lines, but when there is an error that causes a parsing exception, the library returns a character position, not a line number.
A line number would be hugely helpful, since most text editors will show you the line number or take you to it, but I don't know of any that give the absolute character number.
I found the spot in parse_error where deserialization member byte_ holds the character index, but it doesn't seem to have line number info.
Does the json container "know" which line it is, and I could ask for it in the exception handler? I know this isn't a trivial issue, since different OS's give us the "joys" of different EOLs, but perhaps it has been handled anyway?
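If the library only reports a position, the line number can be recovered from the original text after the fact. The following is a minimal sketch in Python, not part of any JSON library: the function name offset_to_line_col is made up for illustration, and it assumes the reported position is a 0-based byte offset (subtract 1 first if your library counts from 1). It treats \r\n, \n and a lone \r each as one line ending, which covers the usual EOL variants.

```python
import re

# Matches one line terminator: \r\n (Windows), \n (Unix) or a lone \r (old Mac).
_EOL = re.compile(rb"\r\n|\n|\r")

def offset_to_line_col(data: bytes, offset: int):
    """Return 1-based (line, column) for a 0-based byte offset into data.

    Hypothetical helper for illustration; the column is counted in bytes.
    """
    if not 0 <= offset <= len(data):
        raise ValueError("offset out of range")
    line = 1
    line_start = 0
    for match in _EOL.finditer(data):
        # Stop at the first terminator that ends after the offset.
        if match.end() > offset:
            break
        line += 1
        line_start = match.end()
    return line, offset - line_start + 1

sample = b'{\r\n  "a": 1,\r\n  "b": oops\r\n}'
print(offset_to_line_col(sample, sample.index(b"oops")))  # (3, 8)
```

In an exception handler you would pass the bytes that were handed to the parser together with the reported position, then print the resulting line and column alongside the original message.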
My question pertains to file status 23, which according to MicroFocus means that, upon my attempt to READ from a .DAT file:
"Indicates no record found."
or
"Indicates a duplicate key condition. Attempt has been made to store a
record that would create a duplicate key in the indexed or relative
file or a duplicate alternate record key that does not allow
duplicates."
I have ruled out the latter as my issue because I'm allowing duplicates in this case.
The reason I'm stumped is that I'm using a START to navigate to the record inside of my .DAT file, and when I execute a READ just after the START has positioned my file pointer, I get the file status 23.
Here is my code:
900-GET-INST-ID.
    OPEN INPUT INST-MST.
    MOVE FALL-IN-INST TO INST-NAME-REC.
    START INST-MST
        KEY EQUAL TO INST-NAME-REC
        INVALID KEY
            DISPLAY "RECORD NOT FOUND"
        NOT INVALID KEY
            READ INST-MST
            MOVE INST-ID-REC TO WS-INST-ID
    END-START.
    CLOSE INST-MST.
So when I am running this code my START successfully runs and goes into the NOT INVALID KEY block, and then the very next line executes and my read is null. How can this be if my alternate key (INST-NAME-REC) is actually found inside the .DAT?
I have ensured that my FD picture clauses match exactly in the ISAM Build program and in this program (the reading program).
The second reason you show is excluded not because you allow duplicate keys, but because that error message with that file-status is for a WRITE, and your failure is on a READ.
Here's your problem:
READ INST-MST
Here's how you fix it:
READ INST-MST NEXT
In COBOL 85, the READ statement has two formats. Format 1 is for a sequential read and Format 2 is for a keyed (random) read.
Unfortunately, the minimum READ syntax for both sequential and keyed reads is:
READ file-name
This means that if you write READ file-name, the compiler implicitly treats it as Format 1 or Format 2 depending on your SELECT statement.
READ file-name NEXT RECORD is identical to READ file-name NEXT.
Consult your vendor's documentation for a full explanation and to discover any Language Extensions. If you read it closely, you will see that the behaviour of READ file-name with no further option depends on the type of file. With a keyed file, the default is a keyed READ. Your key field (luckily) does not contain a key that exists, so you get the 23.
Even if it didn't work like that, what would be the point of not using the word NEXT? The compiler always knows what you tell it (which sometimes is not what you think you tell it), but in a situation like this, the human reader can be very unsure. The last thing you want to do when bug-hunting is break off to look at the manual to discover exactly how that behaves, and then try to work out whether that behaviour was the one sought by the original coder. The bug? A bug? Intended, but sloppy, code? No-one wants to spend that time, and look, even now, it is you.
A couple of comments on your code.
Look up the FILE STATUS clause of the SELECT. Use it. One field per file. Check after each IO. It'll save you grief.
Once you are using the FILE STATUS, ditch the imperative parts of the IO statements (the something/NOT something) and replace them with tests of the file-status field (using 88s).
It looks like you are OPENing and CLOSEing your look-up file all the time. Please don't. OPEN and CLOSE can be very heavy and time-consuming, so do them once per program per file. If you've done that because of a problem, find a correct resolution to that problem; don't use a hack.
Drop the full-stops/periods except where they are needed. This is COBOL 85, which means that for 30 years the number of full-stops/periods required in the PROCEDURE DIVISION has been greatly reduced. Get modern and take advantage of that. It'll save you Gotcha!s as you copy/paste code, leaving in a full-stop which shouldn't be there and changing the way the program behaves.
I have modified the PLSQL parser provided by [Porcelli](https://github.com/porcelli/plsql-parser). I am using this parser to parse PL/SQL files. After successful parsing, I print the AST. Now I want to edit the AST and print back the original PL/SQL source with the edited information. How can I achieve this? How can I get the source file back from the AST with comments, newlines and whitespace? The formatting should also remain the same as in the original file.
Any lead towards this would be helpful.
Each node in an AST comes with an index member which gives you the token position in the input stream (token stream actually). When you examine the indexes in your AST you will see that not all indexes appear there (there are holes in the occurring indexes). These are the positions that have been filtered out (usually the whitespace and comments).
Your input stream however is able to give you a token at a given index and, important, to give you every found token, regardless of the channel it is in. So, your strategy could be to iterate over the tokens from your token stream and print them out as they come along. Additionally, you can inspect your AST for the current index and see if instead a different output must be generated or additional output must be appended.
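The strategy above can be sketched in a few lines. This is an illustration in Python rather than the ANTLR API: the Token class, its hidden flag and the regenerate function are all hypothetical stand-ins for whatever your token stream and AST actually expose. The point is only that every token, including those on the hidden channel, is written out verbatim unless an edit has been recorded for its index.

```python
from dataclasses import dataclass

@dataclass
class Token:
    # Hypothetical token record, not ANTLR's Token class.
    index: int    # position in the full token stream
    text: str     # exact source text of the token
    hidden: bool  # True for whitespace/comments that never reach the AST

def regenerate(tokens, edits):
    """Rebuild source text from the full token stream.

    edits maps a token index (taken from an edited AST node) to the
    replacement text; every other token is emitted unchanged.
    """
    return "".join(edits.get(tok.index, tok.text) for tok in tokens)

tokens = [
    Token(0, "SELECT", False),
    Token(1, " ", True),
    Token(2, "a", False),
    Token(3, "  -- keep me\n", True),
    Token(4, "FROM", False),
    Token(5, " ", True),
    Token(6, "t", False),
]

# Rename column a to b and table t to u; comment and spacing survive.
print(regenerate(tokens, {2: "b", 6: "u"}))
```

With an empty edits mapping the output is byte-for-byte the original text, which is a useful sanity check before applying any changes.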
The simple answer is "walk the tree, and spit out text that corresponds to the nodes".
ANTLR offers "StringTemplates" as a basic kind of help, but in fact there's a lot of
fine detail that needs to be addressed: indentation, literals and their formats, comments,...
See my SO answer on Compiling an AST back to source code for a lot more detail.
One thing not addressed there is the general need to reproduce the original character encoding of the file (if you can, sometimes you can't, e.g., you had an ASCII file but inserted a string containing a Unicode character).
I'm trying to interface Haskell with a command line program that has a read-eval-print loop. I'd like to put some text into an input handle, and then read from an output handle until I find a prompt (and then repeat). The reading should block until a prompt is found, but no longer. Instead of coding up my own little state machine that reads one character at a time until it constructs a prompt, it would be nice to use Parsec or Attoparsec. (One issue is that the prompt changes over time, so I can't just check for a constant string of characters.)
What is the best way to read the appropriate amount of data from the output handle and feed it to a parser? I'm confused because most of the handle-reading primitives require me to decide beforehand how much data I want to read. But it's the parser that should decide when to stop.
You seem to have two questions wrapped up in here. One is about incremental parsing, and one is about incremental reading.
Attoparsec supports incremental parsing directly. See the IResult type in Data.Attoparsec.Text. Parsec, alas, doesn't. You can run your parser on what you have, and if it gives an error, add more input and try again, but you really don't know whether the error was an unrecoverable parse error or just a need for more input.
In your case, REPLs usually read one line at a time. Hence you can use hGetLine to read a line and pass it to Attoparsec. If it parses, evaluate it; if not, get another line.
If you want to see all this in action, I do this kind of thing in Plush.Job.Output, but with three small differences: 1) I'm parsing byte streams, not strings. 2) I've set it up to pull as much as is available from the input and parse as many items as I can. 3) I'm reading directly from file descriptors. But the same structure should help you do it in your situation.
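The control flow of "feed the parser until it says it is done" can be sketched independently of the library. The following is written in Python for brevity and is not Attoparsec's API: try_parse returning None plays the role of a Partial result, a returned tuple plays the role of Done, and the prompt shape (a word followed by "> ") is an assumption made up for the example.

```python
import re

# Assumed prompt shape for this sketch: a word followed by "> "
# at the start of a line, e.g. "repl> " or "repl2> ".
PROMPT = re.compile(r"^\w+> ", re.MULTILINE)

def try_parse(buffer):
    """Return (output, prompt, rest) once a prompt is present, else None.

    None means "need more input", not "parse error".
    """
    m = PROMPT.search(buffer)
    if m is None:
        return None
    return buffer[:m.start()], m.group(0), buffer[m.end():]

def read_reply(chunks):
    """Pull chunks only until the parser reports a complete reply."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        result = try_parse(buffer)
        if result is not None:
            return result
    raise EOFError("stream ended before a prompt was seen")

# The chunk boundaries are arbitrary, as they would be when reading a pipe.
stream = iter(["resu", "lt: 42\nre", "pl2> ", "never read"])
print(read_reply(stream))
```

The reader never decides in advance how much to read: it takes whatever chunk arrives, and the parser alone decides when to stop. Here the fourth chunk is left unread because the prompt was already found.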
As already pointed out in the topic, I got the following error:
Character #\u009C cannot be represented in the character set CHARSET:CP1252
This happens when trying to print a string returned by drakma:http-request. As far as I understand the error, the problem is that the Windows encoding (CP1252) does not support this character.
Therefore, to be able to process it, I might have to convert the whole string.
My question is: which package/library supports converting strings to certain character sets efficiently?
A similar question is this one, but just ignoring the error would not help in my case.
Drakma already does the job of "converting strings": after all, when it reads from some random webserver, it just gets a stream of bytes. It then has to convert that to a lisp string. You probably want to bind *drakma-default-external-format* to something else, although I can't remember off-hand what the allowable values are. Maybe something like :utf-8?
Given the following module:
run(N) ->
    timer:tc(?MODULE, fct, [N]).
I call it by run(100). from a shell and I have this:
{1,
{'EXIT',{undef,[{parser,loop,"d"},
{timer,tc,3},
{erl_eval,do_apply,5},
{shell,exprs,7},
{shell,eval_exprs,7},
{shell,eval_loop,3}]}}}
100 is interpreted as a char ($d = 100) and not as an integer !
Where is my fault ?
In Erlang, [100] and "d" are indistinguishable, so the code you show above isn't the problem. The Erlang shell is being helpful (for certain values of help) and printing [100] as "d" because it's a list containing only integers that represent printable characters.
The real problem is the undef error in the above. My guess is that your parser module doesn't contain a function parser:loop/1, which you call via parser:fct/1.
Did you get any warnings on compilation? I suspect you will see at least one message about an unused function. As you are learning, if you see a warning message then investigate it, understand it and correct it. Generally speaking, you want your code to have no warning messages.
If a function is called in MFA style then it has to be exported in the source code. From what you've shown it's not clear whether it is named "fct" or "loop". So make sure your naming is consistent, and make sure the function is exported. You need this in your source code (assuming the function is called "loop" and takes 1 argument):
-export([loop/1]).
Error messages in Erlang can be tricky to decipher at first. Take some time to read more and become more familiar with them and you will save yourself lots of time going forward.