String Stream in Prolog?

I have to work with some SWI-Prolog code that opens a new stream (which creates a file on the file system) and pours some data in. The generated file is read somewhere else later on in the code.
I would like to replace the file stream with a string stream in Prolog so that no files are created and then read everything that was put in the stream as one big string.
Does SWI-Prolog have string streams? If so, how could I use them to accomplish this task? I would really appreciate it if you could provide a small snippet. Thank you!

SWI-Prolog implements in-memory files (library(memfile)). Here is a snippet from some old code of mine that does both the write and the read:
:- use_module(library(memfile)).  % new_memory_file/1, open_memory_file/3,4
:- use_module(library(sgml)).     % load_html_file/2
:- use_module(library(xpath)).    % xpath/3
%% html2text(+Html, -Text) is det.
%
% convert from html to text
%
html2text(Html, Text) :-
    html_clean(Html, HtmlDescription),
    new_memory_file(Handle),
    open_memory_file(Handle, write, S),
    format(S, '<html><head><title>html2text</title></head><body>~s</body></html>', [HtmlDescription]),
    close(S),
    open_memory_file(Handle, read, R, [free_on_close(true)]),
    load_html_file(stream(R), [Xml]),
    close(R),
    xpath(Xml, body(normalize_space), Text).

Another option is using with_output_to/2 combined with current_output/1:
write_your_output_to_stream(Stream) :-
    format(Stream, 'example output\n', []),
    format(Stream, 'another line', []).
str_out(Codes) :-
    with_output_to(codes(Codes), (
        current_output(Stream),
        write_your_output_to_stream(Stream)
    )).
Usage example:
?- portray_text(true), str_out(C).
C = "example output
another line"
Of course, you can choose between redirecting output to atom, string, list of codes (as per example above), etc., just use the corresponding parameter to with_output_to/2:
with_output_to(atom(Atom), ... )
with_output_to(string(String), ... )
with_output_to(codes(Codes), ... )
with_output_to(chars(Chars), ... )
See with_output_to/2 documentation:
http://www.swi-prolog.org/pldoc/man?predicate=with_output_to/2
Later on, you can use open_string/2, open_codes_stream/2 and similar predicates to open a string or a list of codes as an input stream and read the data back.

Related

Read into Dask from Minio raises issue with reading / converting binary string JSON into utf8

I'm trying to read JSON-LD into Dask from Minio. The pipeline works but the strings come from Minio as binary strings
So
with oss.open('gleaner/summoned/repo/file.jsonld', 'rb') as f:
    print(f.read())
results in
b'\n{\n "@context": "http://schema.org/",\n "@type": "Dataset",\n ...
I can simply convert this with
with oss.open('gleaner/summoned/repo/file.jsonld', 'rb') as f:
    print(f.read().decode("utf-8"))
and now everything is as I expect it.
However, I am working with Dask, and when reading into a bag with
dgraphs = db.read_text('s3://bucket/prefa/prefb/*.jsonld',
                       storage_options={
                           "key": key,
                           "secret": secret,
                           "client_kwargs": {"endpoint_url": "https://example.org"}
                       }).map(json.loads)
I cannot get the content coming from Minio to arrive as regular strings rather than binary strings. I suspect these need to be converted before they reach the json.loads map.
I assume I can inject the "decode" step in here somehow as well, but I can't work out how.
Thanks
As the name implies, read_text opens the remote file in text mode, equivalent to open(..., 'rt'). The signature of read_text includes the various decoding arguments, such as UTF8 as the default encoding. You should not need to do anything else, but please post a specific error if you are having trouble, ideally with example file contents.
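The difference between the binary read and a text-mode read can be reproduced with the standard library alone. The following is a minimal sketch, assuming a small in-memory payload in place of the Minio object; `raw` and `parse_text` are illustrative names and are not part of Dask or s3fs.

```python
import io
import json

# Hypothetical payload standing in for the bytes returned by a binary-mode read.
raw = b'\n{\n "@context": "http://schema.org/",\n "@type": "Dataset"\n}\n'

def parse_text(binary_stream, encoding="utf-8"):
    """Wrap a binary stream in a text decoder, then parse the JSON it contains."""
    with io.TextIOWrapper(binary_stream, encoding=encoding) as text_stream:
        return json.load(text_stream)

# Text-mode read: the wrapper decodes the bytes, much as opening with 'rt' would.
from_text = parse_text(io.BytesIO(raw))

# Explicit decode, as in the question.
from_decoded = json.loads(raw.decode("utf-8"))

# json.loads also accepts UTF-8 encoded bytes directly (Python 3.6 and later).
from_bytes = json.loads(raw)

print(from_text == from_decoded == from_bytes)
```

All three routes yield the same dictionary, so a bytes-versus-str mismatch alone should not stop json.loads from parsing the content.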
If your data isn't delimited by lines, read_text might not be right for you, and you can do something like
@dask.delayed
def read_a_file(fn):
    # or preferably open in text mode and json.load from the file
    with oss.open(fn, 'rb') as f:
        return json.loads(f.read().decode("utf-8"))
output = [read_a_file(f) for f in filenames]
and then you can create a bag or dataframe from this, as required.
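The line-delimited caveat matters because a line-based reader hands json.loads one line at a time, and a pretty-printed document is not valid JSON line by line. Here is a standard-library sketch of that difference, assuming hypothetical multi-line files written to a temporary directory; `lines_parse` and the file names are made up for illustration, and the per-file function would be the one to wrap with dask.delayed.

```python
import json
import tempfile
from pathlib import Path

def read_a_file(path):
    """Parse one whole file as a single JSON document."""
    with open(path, "rt", encoding="utf-8") as f:
        return json.load(f)

def lines_parse(path):
    """Return True only if every line of the file is valid JSON on its own."""
    with open(path, "rt", encoding="utf-8") as f:
        try:
            for line in f:
                json.loads(line)
        except json.JSONDecodeError:
            return False
    return True

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical pretty-printed documents, each spanning several lines.
    filenames = []
    for i in range(2):
        path = Path(tmp) / f"doc{i}.jsonld"
        path.write_text(json.dumps({"@type": "Dataset", "n": i}, indent=1),
                        encoding="utf-8")
        filenames.append(path)

    per_line_ok = [lines_parse(p) for p in filenames]   # line-by-line parsing
    output = [read_a_file(p) for p in filenames]        # whole-file parsing

print(per_line_ok)
print([doc["n"] for doc in output])
```

If the files were written as one JSON document per line instead, the line-by-line route would work and read_text followed by map(json.loads) would be the simpler choice.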

How to read string stored in hdf5 format files by DM

I am scripting with DM and would like to read hdf5 file format.
I borrowed Tore Niermann's gms_HDF5_Plug-In (hdf5_GMS2X_amd64.dll) and his CMD_import_hdf5.s script. It uses h5_read_dataset(filename, datapath) to read an image dataset.
I am trying to figure out how to read string information stored in the same file. I am particularly interested in reading the angle, which is stored as a string, as shown in this figure. [Figure: demonstrated string to read] The h5_read_dataset(filename, datapath) function doesn't work for reading strings.
There is a help file (hdf5_plugin.chm) with a list of functions but unfortunately I can't open them to see more info.
[Figure: hdf5_plugin.chm showing the function list]
I suppose the right function to read strings should be something like h5_read_attr() or h5_info(), but I could not test them: DM always says the two functions don't exist.
After reading out the angle as a string, I will also need a bit of help converting the string to a double datatype.
Thank you.
Converting String to Number is done with the Val() command.
There is no integer/double/float concept for variables in DM-script; all are just number. (This is different for images, where you can define the numeric type. Also, for file import/export, a type differentiation can be made using the taggroup streaming commands in the other answer.)
Example script:
string numStr = "1.234e-2"
number num = val( numStr )
ClearResults()
Result( "\n As string:" + numStr )
Result( "\n As value:" + num )
Result( "\n As value, formatted:" + Format(num,"%3.2f") )
Potential answer regarding the .chm files: When you download (or email) .chm files in Windows, the OS classifies them as "potentially dangerous" (because they could contain executable HTML code, I think). As a result, these files cannot be shown by default. However, you can right-click these files and "unblock" them in the file properties.
I think this will be most likely a question specific to that plugin and not general DM scripting. So it might be better to contact the plugin-author directly.
The alternative (not good) solution would be to write your own HDF5 file reader, if you know the file format. For this you would need the "Streaming" commands of the DM script language to browse through the (binary?) source file to the appropriate file location. The starting point for reading on this in the F1 help documentation would be here:

Is it possible to test a lexer made in JFlex without writing a parser?

I am beginning to use JFlex and I want to try to write a lexer first, and then move onto the parser. However, it seems like there is no way to test your JFlex lexer without writing a parser in CUP as well.
All I want to do is write a lexer, give it an input file and then output the lexemes to check that it read everything correctly. Later I would like to output the tokens, but lexemes would be a good start.
Yes, it is possible to write a standalone scanner. You can read the details on this page. If you specify the %standalone directive, JFlex adds a main method to the generated class, and you can pass input files as command-line arguments when you run it. The JFlex tarball comes with an examples directory; one standalone example is in the examples/standalone-maven/src/main/jflex directory. For quick reference, here is that example:
/**
   This is a small example of a standalone text substitution scanner
   It reads a name after the keyword name and substitutes all occurrences
   of "hello" with "hello <name>!". There is a sample input file
   "sample.inp" provided in this directory
*/
package de.jflex.example.standalone;
%%
%public
%class Subst
%standalone
%unicode
%{
  String name;
%}
%%
"name " [a-zA-Z]+  { name = yytext().substring(5); }
[Hh] "ello"        { System.out.print(yytext()+" "+name+"!"); }

How to open Excel file written with incorrect character encoding in VBA

I read an Excel 2003 file with a text editor to see some markup language.
When I open the file in Excel it displays incorrect characters. On inspection of the file I see that the encoding is Windows 1252 or some such. If I manually replace this with UTF-8, my file opens fine. Ok, so far so good, I can correct the thing manually.
Now the trick is that this file is generated automatically, that I need to process it automatically (no human interaction) with limited tools on my desktop (no perl or other scripting language).
Is there any simple way to open this XL file in VBA with the correct encoding (and ignore the encoding specified in the file)?
Note, Workbook.ReloadAs does not function for me, it bails out on error (and requires manual action as the file is already open).
Or is the only way to correct the file to go through some hoops? Either: text in, check line for encoding string, replace if required, write each line to new file...; or export to csv, then import from csv again with specific encoding, save as xls?
Any hints appreciated.
EDIT:
ADODB did not work for me (XL says user defined type, not defined).
I solved my problem with a workaround:
name2 = Replace(name, ".xls", ".txt")
Set wb = Workbooks.Open(name, True, True) ' open read-only
Set ws = wb.Worksheets(1)
ws.SaveAs FileName:=name2, FileFormat:=xlCSV
wb.Close False ' close workbook without saving changes
Set wb = Nothing ' free memory
Workbooks.OpenText FileName:=name2, _
    Origin:=65001, _
    DataType:=xlDelimited, _
    Comma:=True
Well, I think you can do it from another workbook. Add a reference to Microsoft ActiveX Data Objects, then add this sub:
Sub Encode(ByVal sPath$, Optional SetChar$ = "UTF-8")
    Dim stream As ADODB.stream
    Set stream = New ADODB.stream
    With stream
        .Open
        .LoadFromFile sPath ' Loads a File
        .Charset = SetChar ' sets stream encoding (UTF-8)
        .SaveToFile sPath, adSaveCreateOverWrite
        .Close
    End With
    Set stream = Nothing
    Workbooks.Open sPath
End Sub
Then call this sub with the path to file with the off encoding.

Backslash read and write and F# interactive console

Edit: what's the difference between reading a backslash from a file and writing it to the interactive window vs. writing the string directly to the interactive window?
For example
let toto = "Adelaide Gu\u00e9nard"
toto;;
the interactive window prints "Adelaide Guénard".
Now, if I save a txt file with the single line Adelaide Gu\u00e9nard and read it in:
System.IO.File.ReadAllLines(@"test.txt")
The interactive window prints [|"Adelaide Gu\u00e9nard"|]
What is the difference between these 2 statements in terms of the interactive window printing ?
As far as I know, there is no library that would decode the F#/C# escaping of string for you, so you'll have to implement that functionality yourself. There was a similar question on how to do that in C# with a solution using regular expressions.
You can rewrite that to F# like this:
open System
open System.Globalization
open System.Text.RegularExpressions
let regex = new Regex (@"\\[uU]([0-9A-F]{4})", RegexOptions.IgnoreCase)
let line = "Adelaide Gu\\u00e9nard"
let line = regex.Replace(line, fun (m:Match) ->
    (char (Int32.Parse(m.Groups.[1].Value, NumberStyles.HexNumber))).ToString())
(If you write "some\\u00e9etc", you create a string that contains the same thing as what you'd read from the text file. If you use a single backslash, the F# compiler interprets the escape sequence.)
F# Interactive uses the StructuredFormat stuff from the F# PowerPack to print values. For your string, it's effectively doing printfn "%A" toto;;.
You can achieve the same behaviour in a text file as follows:
open System.IO;;
File.WriteAllText("toto.txt", toto);;
The default encoding used by File.WriteAllText is UTF-8. You should be able to open toto.txt in Notepad or Visual Studio and see the é correctly.
Edit: If I wanted to write the content of test.txt to another file in the clean form that F# interactive prints, how would I proceed?
It looks like fsi is being too clever when printing the contents of test.txt. It's formatting it as a valid F# expression, complete with quotes, [| |] brackets, and a Unicode character escape. The string returned by File.ReadAllLines doesn't contain any of these things; it just contains the words Adelaide Guénard.
You should be able to take the array returned by File.ReadAllLines and pass it to File.WriteAllLines, without the contents being mangled.
