How do I skip document with errors in YAML stream? - parsing

It does not appear that the Python library pyyaml will allow me to read a multi-document YAML stream and continue past the point of an parsing error. I have two related questions:
Am I just missing something, and some other API will support this?
Do parsers in other programming languages support this operation? (if so, which)
Here is an example of a multiple-document YAML stream:
%YAML 1.1
---
# YAML can contain comments like this
name: David
age: 55
---
name: Mei
age: 50 # Including end-of-line
---
name: Juana: ERROR
age: 47
...
---
name: Adebayo
age: 58
...
I would like code similar to this to skip the bad document, but figure out "no matter how bad this document is, something new starts after the ... and ---.
with open('data/multidoc-bad.yaml') as stream:
docs = yaml.load_all(stream)
while True:
try:
doc = next(docs)
print(doc)
except StopIteration:
break
except Exception as err:
print(err)
I'd like to get:
{'name': 'David', 'age': 55}
{'name': 'Mei', 'age': 50}
mapping values are not allowed here
in "data/multidoc-bad.yaml", line 10, column 12
{'name': 'Adebayo', 'age': 58}
But in reality I do not get that last line for "Adebayo."
I recognize that I could write a small parser myself that reads lines and only looks for ... and --- lines to chunk the stream. Then pass only single documents to yaml.loads() after my own parsing. But it sure seems like that's what a parser is supposed to do for me.

Am I just missing something, and some other API will support this?
No, PyYAML cannot do this.
Do parsers in other programming languages support this operation? (if so, which)
None that I know of. Most YAML parsers are hand-written with quite some being translations from PyYAML. I don't know a single one that implements error recovery. (I worked with SnakeYAML, go-yaml, PyYAML, libyaml, YamlDotNet, and authored NimYAML and AdaYaml.)
But it sure seems like that's what a parser is supposed to do for me.
I think the reasons why parsers don't support this include
writing a compliant parser for YAML is already very complex without error recovery,
the multi-document feature is seldom used and therefore little effort is put into enhancing it,
this is the only case where it is obvious how to implement error recovery; I would argue that inside a YAML document, it is nigh impossible to implement useful error recovery, and therefore error recovery is not seen as an obvious feature to implement,
the workaround is very simple (you described it yourself).

Related

Serilog ExpressionTemplate rename log level and apply formatting

I'm trying to use serilog's ExpressionTemplate and write the log level in the following way:
{ "level": "INF", ... }
I know how to alias #l to level - level: #l
and I know how to format the level to three upper-case letters - #l:u3 but I'm unable to combine the two
I have tried the following without success (fails when trying to format the template with an exception):
level: #l:u3
{'level': #l:u3}
At the time of writing this is a limitation in Serilog.Expressions, which will hopefully be addressed soon.
Update: version 3.4.0-dev-* now on NuGet supports ToString(#l, 'u3').
You can work around it with conditionals:
{'level': if #l = 'Information' then 'INF' else if #l = 'Warning' then 'WRN' else 'ERR'}
With a few more branches for remaining levels.
(Alternatively, you could write and plug in a user defined function along the lines of ToShortLevel(#l), and use that instead.)

Puppet3 | read values from different yaml file

So I'm using puppet3 and I have X.yaml and Y.yaml. X.yaml has profiles::resolv_conf::nameservers: [ '1.1.1.1', '8.8.8.8', '2.2.2.2' ]in it. I want to add that [ '1.1.1.1', '8.8.8.8', '2.2.2.2' ] as a value to the servers: which is in Y.yaml:
'dns_test':
plugin_type: 'dns_query'
options:
'servers': \['1.1.1.1', '8.8.8.8', "2.2.2.2"\]
'domains': \['google.com'\]
'record_type': 'A'
'timeout': 5
tags:
'input_source': 'dns_query'
By doing this I want to make sure that when someone change values in profiles::resolv_conf::nameservers: that value is changed in this telegraf plugin too.
I tried multiple solution but the one that was the closest was:
'dns_test':
plugin_type: 'dns_query'
options:
'servers': "%{hiera('profiles::resolv_conf::nameservers')}"
'domains': ['google.com']
'record_type': 'A'
'timeout': 5
tags: 'input_source': 'dns_query'
but problem is that puppet was adding extra " " to the value and final value in plugin conf was:
"["1.1.1.1", "2.2.2.2", "8.8.8.8"]" instead of ["1.1.1.1", "2.2.2.2", "8.8.8.8"]
TL;DR: You can't.
From the current docs and the Puppet documentation archive, I confirm that no version of the %{hiera} interpolation function or its replacement, %{lookup}, ever supported interpolating values other than strings. That's expressed in the current docs like so:
The lookup and hiera interpolation functions look up a key and return
the resulting value. The result of the lookup must be a string; any
other result causes an error.
(Emphasis added)
What you're looking for would be supported by Hiera 5's %{alias} function, provided that the data are available somewhere else in the same hierarchy (which is also a requirement for %{hiera}). Since you're stuck on Puppet 3, however, you're probably on Hiera 2, and certainly not later than Hiera 3.
"But wait!" You may say. "I'm getting a successful interpolation, but the data are just munged". Specifically, you wrote:
problem is that puppet was adding extra " " to the value and final value
Since %{hiera()} interpolates only strings, it is not surprising that you got a string value, given that you got a value at all. I do find it a bit surprising that Puppet did not throw an error, but I'm not prepared to comment further on that without a minimum reproducible example that demonstrates the behavior.

Sending IFS File to Outq Prints Line of "#" Symbols

I am attempting to send a file from IFS to an outq on our AS/400 system. Whenever I do, I get exactly what I send, as well as a line of "#" symbols of varying lengths appended to the end.
Here's the command I'm using:
qsh cmd('cat -c /path/test.txt | Rfile -wbQ -c "ovrprtf file(qprint)
outq(*LIBL/ABCD) devtype(*USERASCII) rplunprt(*no) splfname(test) hold(*no)"
qprint')
The contents of test.txt is just Hello World!
The output I get when I send the command is
Hello World!####################################################################
I have not found any posts online about a similar problem, and have tried changing values and looking for additional switches to get it to work. Nothing I'm doing seems to fix the issue.
Is there a command or switch that I am missing, or is something I have in there already causing this?
EDIT:
I found this documentation which is the first time I've seen this issue mentioned, but it's not very helpful:
“Messages for a Take Action command might consist of a long string of "at" symbols (#) in a pop-up message. (The Reflex automation Take Action command, which is configured in situations, does not have this problem.) A resolution for this problem is under construction. This problem might be resolved by the time of the product release. If you see this problem, contact IBM Software Support.”
The only differences are: 1) this is not a pop-up message, it's printed. 2) I don't believe we use Tivoli Monitoring, although I could be wrong.
Assuming we do use Tivoli Monitoring, what would the solution be? There's no additional documentation past that, and I am not a system administrator, so I can't really make the call to IBM Software Support myself. And assuming we DON'T use it, what else could cause this issue?
I get different results, yet similar. I created a test.txt with Windows Explorer, put in Hello, world!, saved it and tried the script. I got gibberish for the 'Hello, world!' and then the line of # symbols.
My system is 7.3 TR5, CCSID 37 (US English) and my IFS file is CCSID 1252 (Windows English). Results did not change if I used a stream file of CCSID 819 (US ASCII).
I didn't have any luck modifying Rfile switches.
I found that removing devtype(*userascii) produced printed output in plain English without the # symbols. Do you really need *USERASCII? I would think that would be more for a pre-formatted 'print-ready' file like Postscript or the like.
EDIT: some more things to try
I don't understand why *USERASCII is adding those # symbols; it looks like a translation issue.
I tried this and still got the extra ###... You might have to play with the TOCCSID() parameter. Although a failure, it did give me an idea: what if those # symbols are EBCDIC spaces being sent as-is to the *USERASCII print stream? All we'd need is a way to send only the number of bytes in the stream file, without any padding.
CRTPF FILE(QTEMP/PRTSTMF) RCDLEN(132)
CPY OBJ('/path/test.txt') TOOBJ('/qsys.lib/qtemp.lib/prtstmf.file/prtstmf.mbr') replace(*yes)
ovrprtf file(qprint) outq(*LIBL/prt3812) devtype(*USERASCII) rplunprt(*no) splfname(test) hold(*no)
cpyf prtstmf qprint
The data in QTEMP/PRTSTMF is in ASCII; DSPPFM shows that much. It also shows a bunch of spaces: after all, it is a fixed length file. My next step was to write an RPG program to read the stream file and print it, but Scott Klement already did that: http://www.scottklement.com/PrtStmf.zip
This works on my system:
ovrprtf file(qsysprt) outq(*LIBL/abcd) devtype(*USERASCII) rplunprt(*no) splfname(test) hold(*no)
prtstmf stmf('/path/test.txt') outq(abcd)

How do I get the exported types of an erlang module?

I had cause to check the types exported by a module, and I immediately thought "right, module_info then" but was surprised to run into a few difficulties. I found I can get the exported types from modules I compile, but not from say modules in stdlib.
My (three) questions are, how do I reliably get the exported types of a module, why are the exported types in the attributes bit of the module info on some modules, and why some modules and not others?
I discovered that if I build this module:
-module(foo).
-export([bar/0]).
-export_types([baz/0]).
bar() -> bat .
And then use foo:module_info/0, I get this:
[{exports,[{bar,0},{module_info,0},{module_info,1}]},
{imports,[]},
{attributes,[{vsn,[108921085595958308709649797749441408863]},
{export_types,[{baz,0}]}]},
{compile,[{options,[{outdir,"/tmp"}]},
{version,"5.0.1"},
{time,{2015,10,22,10,38,8}},
{source,"/tmp/foo.erl"}]}]
Great, hidden away in 'attributes' is 'export_types'. Why this is in attributes I'm not quite sure, but... whatever...
I now know this will work:
4> lists:keyfind(export_types, 1, foo:module_info(attributes)).
{export_types,[{baz,0}]}
Great. So, I now know this will work:
5> lists:keyfind(export_types, 1, ets:module_info(attributes)).
false
Ah... it doesn't.
I know there are exported types of course, if the documentation isn't good enough the ets source shows:
-export_type([tab/0, tid/0, match_spec/0, comp_match_spec/0, match_pattern/0]).
In fact the exported type information for the ets module doesn't seem to be anywhere in the module info:
6> rp(ets:module_info()).
[{exports,[{match_spec_run,2},
{repair_continuation,2},
{fun2ms,1},
{foldl,3},
{foldr,3},
{from_dets,2},
{to_dets,2},
{test_ms,2},
{init_table,2},
{tab2file,2},
{tab2file,3},
{file2tab,1},
{file2tab,2},
{tabfile_info,1},
{table,1},
{table,2},
{i,0},
{i,1},
{i,2},
{i,3},
{module_info,0},
{module_info,1},
{tab2list,1},
{match_delete,2},
{filter,3},
{setopts,2},
{give_away,3},
{update_element,3},
{match_spec_run_r,3},
{match_spec_compile,1},
{select_delete,2},
{select_reverse,3},
{select_reverse,2},
{select_reverse,1},
{select_count,2},
{select,3},
{select,2},
{select,1},
{update_counter,3},
{slot,2},
{safe_fixtable,2},
{rename,2},
{insert_new,2},
{insert,2},
{prev,2},
{next,2},
{member,2},
{match_object,3},
{match_object,2},
{match_object,1},
{match,3},
{match,2},
{match,1},
{last,1},
{info,2},
{info,1},
{lookup_element,3},
{lookup,2},
{is_compiled_ms,1},
{first,1},
{delete_object,2},
{delete_all_objects,1},
{delete,2},
{delete,1},
{new,2},
{all,0}]},
{imports,[]},
{attributes,[{vsn,[310474638056108355984984900680115120081]}]},
{compile,[{options,[{outdir,"/tmp/buildd/erlang-17.1-dfsg/lib/stdlib/src/../ebin"},
{i,"/tmp/buildd/erlang-17.1-dfsg/lib/stdlib/src/../include"},
{i,"/tmp/buildd/erlang-17.1-dfsg/lib/stdlib/src/../../kernel/include"},
warnings_as_errors,debug_info]},
{version,"5.0.1"},
{time,{2014,7,25,16,54,59}},
{source,"/tmp/buildd/erlang-17.1-dfsg/lib/stdlib/src/ets.erl"}]}]
ok
I took things to extremes now and ran this, logging the output to a file:
rp(beam_disasm:file("/usr/lib/erlang/lib/stdlib-2.1/ebin/ets.beam")).
Not that I don't consider this absurd... but anyway, it's about 5,000 lines of output, but nowhere do I find an instance of the string "tid".
Up to Erlang 18 this information is not easily available.
Dialyzer, for example, extracts it from the abstract syntax tree of the core Erlang version of a module (see e.g. dialyzer_utils:get_record_and_type_info/1 used by e.g. dialyzer_analysis_callgraph:compile_byte/5)
Regarding this part:
why are the exported types in the attributes bit of the module info on some modules, and why some modules and not others?
this is due to a bad definition in your module. The attribute should be -export_type, not -export_types. If you use the correct one (and define the baz/0 type and use it somewhere so that the module compiles), the exported types... vanish, as is expected.

Stream reasoning / Reactive programming in prolog?

I was wondering if you know of any way to use prolog for stream processing, that is, some kind of reactive programming, or at least to let a query run on a knowledge base that is continuously updated (effectively a stream), and continuously output the output of the reasoning?
Anything implemented in the popular "prologs", such as SWI-prolog?
You can use Logtalk's support for event-driven programming to define monitors that watch for knowledge base update events and react accordingly. You can run Logtalk using most Prolog systems as the backed compiler, including SWI-Prolog.
The event-driven features are described e.g. in the user manual:
http://logtalk.org/manuals/userman/events.html
The current distribution contains some examples of using events and monitors. An interesting one considering your question is the bricks example:
https://github.com/LogtalkDotOrg/logtalk3/tree/master/examples/bricks
Running this example first and then looking at its code should give you as good idea of what you can do with system wide events and monitors.
XSB has stream processing capabilities. See page 14 in the
XSB Manual
I'm working on something related, in project pqConsole already there is the basic capability: report to the user structured data, containing actionable areas (links) that call back in Prolog current state, hence the possibility to expose actions and react appropriately (hopefully).
It's strictly related to pqConsole::win_write_html, showcasing recent Qt capabilities of SWI-Prolog.
Here an example of a snippet producing only a simple formatted report, I'll try now to add the reactive part, so you can evaluate if you find expressive this basic system. Hints are welcome...
/* File: win_html_write_test.pl
Author: Carlo,,,
Created: Aug 27 2013
Purpose: example usage win_html_write/1
*/
:- module(win_html_write_test,
[dir2list/1
]).
:- [library(http/html_write)].
:- [library(dirtree)].
dir2list(Path) :-
dirtree(Path, DirTree),
% sortree(compare_by_attr(name), DirTree, Sorted), !,
DirTree = Sorted,
phrase(html([\css,
\logo,
hr([]),
ul(\dirtree2html(Sorted, [])),
br([])]), Tokens),
with_output_to(atom(X), print_html(Tokens)),
win_html_write(X),
dump_page_to_debug(X).
css --> html(style(type='text/css',
['.size{color:blue;}'
])).
logo --> html(img([src=':/swipl.png'],[])).
dirtree2html(element(dir, A, S), Parents) -->
html(li([\elem2html(A),
ul(\elements2html(S, [A|Parents]))])).
dirtree2html(element(file, A, []), _Parents) -->
html(li(\elem2html(A))).
elem2html(A) -->
{memberchk(name=N, A),
memberchk(size=S, A)
},
html([span([class=name], N), ' : ', span([class=size], S)]).
elements2html([E|Es], Parents) -->
dirtree2html(E, Parents),
elements2html(Es, Parents).
elements2html([], _Parents) --> [].
dump_page_to_debug(X) :-
open('page_to_debug.html', write, S),
format(S, '<html>~n~s~n</html>~n', [X]),
close(S).
This snippet depends on dirtree, that should be installed with
?- pack_install(dirtree).
edit With 3 modifications now the report has the ability to invoke editing of files:
call to get paths in structure
dir2list(Path) :-
dirtree(Path, DirTreeT),
assign_path(DirTreeT, DirTree),
...
request a specialized output for files only
dirtree2html(element(file, A, []), _Parents) -->
html(li(\file2html(A))).
finally, the 'handler' - here just place a request to invoke the editor
file2html(A) -->
{memberchk(name=N, A),
memberchk(path=P, A),
memberchk(size=S, A)
},
html([span([class=name],
[a([href='writeln(editing(\'~s\')), edit(\'~s\')'-[N,P]], N)]
), ' : ', span([class=size], S)]).
and now the file names are clickable, write a message and get edited if requested: a picture
You should check out RTEC: Run-Time Event Calculus:
https://github.com/aartikis/RTEC
RTEC is an open-source Event Calculus dialect optimised for stream reasoning. It is written in Prolog and has been tested under YAP 6.2.
Feature highlights:
Interval-based.
Sliding window reasoning.
Interval manipulation constructs for non-inertial fluents.
Caching for hierarchical knowledge bases.
Support for out-of-order data streams.
Indexing for handling efficiently irrelevant data.
There is also a mention of it on the SWI-Prolog website:
https://www.swi-prolog.org/pack/file_details/prologmud_I7/prolog/ec_planner/RTEC/README.md
which presumably relies on:
https://www.swi-prolog.org/pldoc/doc/_SWI_/library/dialect/yap.pl
I don't know why this hasn't been brought up so far, but in SWI-Prolog there is prolog_listen, which can, amongst other things, monitor dynamic updates to the database:

Resources