I have two questions about logging with the OSLog library.
I used to use log4j and similar logging libraries, and now I'm confused by OSLog.
I have a struct called coord with x and y vars inside it. I can print it directly with print, but I cannot pass it to the os_log function:
os_log("Step: %{coord}d", log: OSLog.default, type: .info, myCoord)
I know OSLog stores log entries in memory and that I must use Console.app to see them, but can I spool all the log entries of my app to a log file? I see some property files in the Apple docs, but they don't cover writing the entries to a text file.
2.1 Is it possible to write only one category of log entries to a file, so that I can analyze just some categories, for example only an algorithm?
I'd like to use a pandoc Lua filter when I convert multiple markdown files to PDF. I'd like the titles of the individual markdown files to be used as chapters (first-level headers).
I have studied the existing examples, and I think this one is close to what I need: basically I need to add pandoc.Header(1, doc.meta.title) to each of my markdown files, but I'm struggling to write the Lua filter and make it work.
I think this question does something similar: pandoc filter in lua and walk_block
The pandoc command:
pandoc -N --lua-filter add-title.lua blog/*.md --pdf-engine=xelatex --toc -s -o my_book.pdf
The add-title.lua (this is just wrong; there are no exceptions, but nothing happens to the output):
function add_header (header)
  return {
    {Header = pandoc.Header(1, meta.title)}
  }
end
Input files:
1.md
---
title: Topic1
---
## Sample Header from file 1.md
text text text
2.md
---
title: Topic2
---
## Sample Header from file 2.md
text text text
Expected output, equivalent to this markdown (but my final format is PDF):
---
title: Title from pandoc latex variable
---
# Topic1
## Sample Header from file 1.md
text text text
# Topic2
## Sample Header from file 2.md
text text text
I think the key problem is that the lua filters only run once the full set of documents have been parsed into a single AST. So the individual files are effectively concatenated prior to parsing to create a single document with a single set of metadata. The individual title settings in the yaml metadata blocks are being overridden before the filter has a chance to run. Assuming that you need to get the heading from each separate metadata block (and can't just put the header in directly) this means that you cannot let pandoc join the files. You will need to read and parse each file separately. Fortunately this is pretty easy with filters.
The first step is to make a single reference file that contains links to all of the other files.
---
title: Combined title
---
![First file](1.md){.markdown}
![Second file](2.md){.markdown}
Note that the links are specified using images with a special class .markdown. You could use some other method, but images are convenient because they support attributes and are easy to recognise.
Now we just need a filter that will replace these images with the parsed elements from the linked markdown file. We can do this by opening the files from lua, and parsing them as complete documents with pandoc.read (see https://www.pandoc.org/lua-filters.html#module-pandoc). Once we have the documents we can read the title from the metadata and insert the new header. Note that we apply the filter to a Para element rather than the Image itself. This is because pandoc separates Block elements from Inline elements, and the return value of the filter must be of the same type. An Image filter cannot return the list of blocks parsed from the file but a Para can.
So here is the resulting code.
function Para(elem)
  if #elem.content == 1 and elem.content[1].t == "Image" then
    local img = elem.content[1]
    if img.classes:find('markdown', 1) then
      local f = io.open(img.src, 'r')
      local doc = pandoc.read(f:read('*a'))
      f:close()
      -- now we need to create a header from the metadata
      local title = pandoc.utils.stringify(doc.meta.title) or "Title has not been set"
      local newHeader = pandoc.Header(1, {pandoc.Str(title)})
      table.insert(doc.blocks, 1, newHeader)
      return doc.blocks
    end
  end
end
If you run it on the combined file with
pandoc -f markdown -t markdown -i combined.md -s --lua-filter addtitle.lua
you will get
---
title: Combined title
---
Topic1
======
Sample Header from file 1.md
----------------------------
text text text
Topic2
======
Sample Header from file 2.md
----------------------------
text text text
as required.
Note that any other yaml metadata in the included files is lost. You could capture anything else by taking it from the individual meta object and placing it into the global one.
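One possible way to do that, sticking with the filter above (an untested sketch): collect the extra metadata from each included file while the Para filter runs, then merge it into the global metadata in a Meta function, which pandoc applies after the block filters.
local collected = {}

-- inside the Para filter, right after `local doc = pandoc.read(f:read('*a'))`:
--   for k, v in pairs(doc.meta) do
--     if k ~= 'title' then collected[k] = v end
--   end

function Meta (meta)
  -- keep values already set in the master file, add anything collected
  for k, v in pairs(collected) do
    if meta[k] == nil then
      meta[k] = v
    end
  end
  return meta
end
Note that if two included files set the same key, the later one wins.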
My filename contains information that I need in my pipeline; for example, the identifier for my data points is part of the filename and not a field in the data. E.g. every wind turbine generates a file turbine-loc-001-007.csv, and I need the loc data within the pipeline.
Java (sdk 2.9.0):
Beam's TextIO readers do not give access to the filename itself. For these use cases we need to make use of FileIO to match the files and gain access to the information stored in the file name. Unlike TextIO, the reading of the file needs to be taken care of by the user in transforms downstream of the FileIO read. The result of a FileIO read is a PCollection of ReadableFile; the ReadableFile class contains the file name as metadata, which can be used along with the contents of the file.
FileIO does have a convenience method readFullyAsUTF8String(), which will read the entire file into a String object; note that this reads the whole file into memory first. If memory is a concern, you can work directly with the file using utility classes like FileSystems.
From the Beam documentation:
PCollection<KV<String, String>> filesAndContents = p
    .apply(FileIO.match().filepattern("hdfs://path/to/*.gz"))
    // withCompression can be omitted - by default compression is detected from the filename.
    .apply(FileIO.readMatches().withCompression(GZIP))
    .apply(MapElements
        // uses imports from TypeDescriptors
        .into(kvs(strings(), strings()))
        .via((ReadableFile f) -> KV.of(
            f.getMetadata().resourceId().toString(),
            f.readFullyAsUTF8String())));
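As mentioned above, if reading the whole file into memory is a concern, one alternative (an untested sketch, assuming a line-oriented text file; the *.csv pattern and the filesAndLines name are just placeholders, and imports are omitted as in the example above) is to open the file as a channel via FileSystems inside a DoFn and emit the filename together with each line:
// Sketch only: stream the file instead of calling readFullyAsUTF8String().
PCollection<KV<String, String>> filesAndLines = p
    .apply(FileIO.match().filepattern("hdfs://path/to/*.csv"))
    .apply(FileIO.readMatches())
    .apply(ParDo.of(new DoFn<FileIO.ReadableFile, KV<String, String>>() {
      @ProcessElement
      public void process(@Element FileIO.ReadableFile f,
                          OutputReceiver<KV<String, String>> out) throws IOException {
        String fileName = f.getMetadata().resourceId().toString();
        // FileSystems.open returns a ReadableByteChannel; wrap it in a reader.
        try (BufferedReader reader = new BufferedReader(
            Channels.newReader(FileSystems.open(f.getMetadata().resourceId()), "UTF-8"))) {
          String line;
          while ((line = reader.readLine()) != null) {
            out.output(KV.of(fileName, line)); // the filename travels with each record
          }
        }
      }
    }));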
Python (sdk 2.9.0):
For 2.9.0 of the Python SDK you will need to collect the list of URIs outside of the Dataflow pipeline and feed it in as a parameter to the pipeline, for example by using FileSystems to read the list of files via a glob pattern and then passing that list to a PCollection for processing.
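For example, something like the following (an untested sketch; the read_file helper and the *.csv pattern are just placeholders):
import apache_beam as beam
from apache_beam.io.filesystems import FileSystems

def read_file(path):
    # Hypothetical helper: keep the filename next to the contents.
    f = FileSystems.open(path)
    try:
        return path, f.read().decode('utf-8')
    finally:
        f.close()

# Resolve the glob outside the pipeline and feed the paths in as data.
paths = [m.path
         for match in FileSystems.match(['hdfs://path/to/*.csv'])
         for m in match.metadata_list]

with beam.Pipeline() as p:
    files_and_contents = (p
                          | beam.Create(paths)
                          | beam.Map(read_file))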
Once fileio (see PR https://github.com/apache/beam/pull/7791/) is available, the following code would also be an option for Python.
import apache_beam as beam
from apache_beam.io import fileio

with beam.Pipeline() as p:
    readable_files = (p
                      | fileio.MatchFiles('hdfs://path/to/*.txt')
                      | fileio.ReadMatches()
                      | beam.Reshuffle())
    files_and_contents = (readable_files
                          | beam.Map(lambda x: (x.metadata.path,
                                                x.read_utf8())))
I am using the HDFS sink and writing to HDFS, but the payload I write to HDFS is prefixed with ?contentType "text/plain", although this is not in the payload.
Please let me know why this is getting prefixed and how to remove it.
stream create --definition ":streaming --spring.cloud.stream.bindings.output.producer.headerMode=raw > myprocessor --spring.cloud.stream.bindings.output.content-type=text/plain --spring.cloud.stream.bindings.input.consumer.headerMode=raw|hdfs --spring.hadoop.fsUri=hdfs://127.0.0.1:50071 --hdfs.directory=/ws/sparkoutput --hdfs.file-name=sparkstream --hdfs.enable-sync=true --hdfs.flush-timeout=10000 --spring.cloud.stream.bindings.input.consumer.headerMode=raw --spring.cloud.stream.bindings.input.content-type=text/plain" --name sparkstream
If you are assuming that the header mode for the hdfs input is raw, then you should make the output of myprocessor raw as well, i.e.
myprocessor --spring.cloud.stream.bindings.output.content-type=text/plain --spring.cloud.stream.bindings.input.consumer.headerMode=raw --spring.cloud.stream.bindings.output.producer.headerMode=raw
Alternatively, you could remove the header settings on the hdfs sink (since it will then just process the payload).
I am getting the following error when trying to load a large RDF/XML document into Fuseki:
> Code: 4/UNWISE_CHARACTER in PATH: The character matches no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
How do I find out what line contains the offending error?
I have tried turning up the output in Log4j.properties, and I also tried validating the RDF/XML file using the Jena command-line rdfxml tool (as well as utf8 and riot); it validates with no errors reported. But I'm new to this toolset.
(version?)
Check the ""-strings in your RDF/XML data for undesirable URIs, especially spaces in URIs.
Best to validate before loading: try riot YourFile and send stderr and stdout to a file. The errors should be at approximately the position of the parser output (N-triples) at the time.
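For example, something along these lines (assuming riot's default N-triples output):
riot YourFile.rdf > data.nt 2> errors.log
The last triples written to data.nt before a message appears in errors.log give a rough idea of where in the input the offending URI is.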