xterm.js: intercepting and replacing ANSI sequences

I need to intercept the ANSI sequences received by xterm.js, mostly CSI sequences, in order to modify/customize them. What would be the best method to do so?
Is there an xterm.js API that could help me?
Examples of what I need to achieve:
<esc>[2m (SGR faint) replaced by a vivid color such as cyan: <esc>[36m
<esc>[2J (erase screen) replaced by a sequence of newlines emitted before the erase sequence, because xterm.js's implementation does not push anything erased into the scrollback buffer
Note: I can't modify the software generating the CSI sequences, and I'm obliged to change the behavior of the sequences above to match the exact behavior of an old client I'm replacing.
Since I'm using WebSSH as a framework, I considered doing it earlier, in the Python backend's on_read handler around the SSH channel.recv(), but that is challenging because the data is split over multiple messages, so an ANSI sequence can be split across two messages. Cleanly detecting incomplete ANSI sequences seems hard to get right.
Moreover, patching that text buffer on the backend feels dirty to me.
That's why I think it would be much more efficient to do it directly in xterm.js, between its ANSI sequence detection and rendering.
I gave the xterm.js CSI hook registerCsiHandler a try, but it doesn't seem to let me change the whole sequence, let alone add more data; I don't think it's designed for my needs.
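One way to sidestep both the backend and the parser hooks is to rewrite the data client-side, just before it is handed to term.write(), holding back any trailing incomplete escape sequence until the next chunk arrives. Below is a minimal TypeScript sketch of that idea; rewriteAnsi and pending are illustrative names (not xterm.js API), the newline count is arbitrary, and only CSI sequences are handled.

```typescript
// Minimal sketch: rewrite selected CSI sequences in the incoming data stream
// before it reaches xterm.js. Only term.write() is assumed from xterm.js.

const ESC = "\x1b";
let pending = "";            // carries a possibly incomplete trailing CSI sequence

function rewriteAnsi(chunk: string): string {
  let data = pending + chunk;
  pending = "";

  // If the chunk ends in the middle of a CSI sequence (ESC [ params/intermediates
  // without a final byte), hold that tail back until the next chunk arrives.
  const tail = data.match(/\x1b(\[[0-?]*[ -\/]*)?$/);
  if (tail) {
    pending = tail[0];
    data = data.slice(0, -tail[0].length);
  }

  return data
    // SGR faint -> cyan foreground
    .replace(/\x1b\[2m/g, `${ESC}[36m`)
    // erase screen -> emit newlines first so current content lands in scrollback
    // (40 is illustrative; something like term.rows would be more precise)
    .replace(/\x1b\[2J/g, `${"\n".repeat(40)}${ESC}[2J`);
}

// Hook it in wherever the websocket data is currently written to the terminal, e.g.:
// socket.onmessage = (ev) => term.write(rewriteAnsi(ev.data));
```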

Related

Erlang and Elixir's colorful REPL shells

How does Learn You Some Erlang or IEx colorize the REPL shell? Is kjell a stable drop-in replacement?
The way this is done in LYSE is to use a JavaScript plugin called highlight.js, so LYSE isn't actually doing it; your browser is. There are plugins/modes for most mainstream(ish) languages available for highlight.js. If the web is what you are interested in, this is one way to do it (except for when a user can't use JS or has it turned off).
This isn't actually the shell being highlighted at all, nor is it useful anywhere outside of browsers. I've been messing around with a way to do this more generically, initially by inserting static formatting in HTML and XML documents (feed it a document, and it outputs one with Erlang syntax highlighted a certain way whenever this is detected/tagged). I don't yet have a decent project to publish for this (very low on my priority list atm), but I can point you in the direction of some solid inspiration: the source for wx:demo.
Pay particular attention to the function demo:code_area/1. There you will see how the tokenization routines are used to provide highlight hints for the source code text display area in the wx:demo application. This can provide a solid foundation to build your own source highlighting/display utility. (I think it wouldn't be impossible, considering every terminal in common use today responds correctly to ANSI color codes, to write a plugin to the shell that highlights terminal input directly -- not that there is a big clamor for this feature at the moment.)
EDIT (Prompted by a comment by Fred the Magic Wonder Dog)
On the subject of ANSI color codes, if this is what you are actually after, they are easy to implement as a prepend to any string value you are returning within a terminal. The terminal interprets them as escape sequences, so you won't see the characters; instead it performs whatever action the code represents. There is no termination (it's not like a markup tag that encloses the text) and typically no concept of a "default color to go back to" (though there are a gajillion-jillion extensions to telnet and terminal modes that enable all sorts of nonsense like this).
An example of basic colorization is the telcon:greet/0 and telcon:sys_help/0 functions in the v0.1 code of ErlMUD (along with a slew of other places -- colorization in games is sort of a thing). What you see there is a pre-built list per color, but this could be represented any way that would get those values at the front of the string. (I just happened to remember the code value sequences, but didn't remember the characters that make them up; the next version of the code represents this somewhat differently.) Here is a link to a list of ANSI color codes and a discussion about colorizing the shell. Play around! It's nerdy fun, 1980s style!
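As a tiny, language-neutral illustration of the "prepend a code" idea (shown here in TypeScript), the codes themselves are standard SGR sequences: 36 selects cyan and 0 resets the attributes afterwards.

```typescript
// Prepend a standard SGR code to a string: "\x1b[36m" switches to cyan,
// "\x1b[0m" resets attributes so later output is unaffected.
const cyan = (s: string): string => `\x1b[36m${s}\x1b[0m`;

console.log(cyan("System online.") + " Type 'help' for commands.");
```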
Oh, I almost forgot... if you really want to go down the rabbit hole without silly little child toys like ncurses to help you, take a look at termcap.
I don't know if kjell is a stable drop-in replacement for Erl but it wouldn't be for IEx.
As far as how the colors are done: to the best of my knowledge, it's done with ANSI escape sequences.

How to parse a very large file in F# using FParsec

I'm trying to parse a very large file using FParsec. The file's size is 61GB, which is too big to hold in RAM, so I'd like to generate a sequence of results (i.e. seq<'Result>), rather than a list, if possible. Can this be done with FParsec? (I've come up with a jerry-rigged implementation that actually does this, but it doesn't work well in practice due to the O(n) performance of CharStream.Seek.)
The file is line-oriented (one record per line), which should make it possible in theory to parse in batches of, say, 1000 records at a time. The FParsec "Tips and tricks" section says:
If you're dealing with large input files or very slow parsers, it might also be worth trying to parse multiple sections within a single file in parallel. For this to be efficient there must be a fast way to find the start and end points of such sections. For example, if you are parsing a large serialized data structure, the format might allow you to easily skip over segments within the file, so that you can chop up the input into multiple independent parts that can be parsed in parallel. Another example could be a programming language whose grammar makes it easy to skip over a complete class or function definition, e.g. by finding the closing brace or by interpreting the indentation. In this case it might be worth not to parse the definitions directly when they are encountered, but instead to skip over them, push their text content into a queue and then to process that queue in parallel.
This sounds perfect for me: I'd like to pre-parse each batch of records into a queue, and then finish parsing them in parallel later. However, I don't know how to accomplish this with the FParsec API. How can I create such a queue without using up all my RAM?
FWIW, the file I'm trying to parse is here if anyone wants to give it a try with me. :)
The "obvious" thing that comes to mind would be to pre-process the file using something like File.ReadLines and then parse one line at a time.
If this doesn't work (your PDF looked like a record is a few lines long), then you can make a seq of records, or of batches of 1000 records or so, using normal FileStream reading. This would not need to know the details of a record, but it would be convenient if you can at least delimit the records.
Either way, you end up with a lazy seq that the parser can then read.
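Language aside (the question is about F#/FParsec), the shape of that approach might look like the following TypeScript/Node sketch: stream the file line by line and yield lazy batches that can then be parsed, possibly in parallel. parseRecords is a hypothetical stand-in for whatever per-batch parser you use.

```typescript
// Stream a huge line-oriented file and yield batches of lines, so each batch
// can be handed to a parser without ever holding the whole file in memory.
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

async function* batches(path: string, size = 1000): AsyncGenerator<string[]> {
  const lines = createInterface({ input: createReadStream(path) });
  let batch: string[] = [];
  for await (const line of lines) {
    batch.push(line);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch;   // trailing partial batch
}

// Usage (parseRecords is hypothetical):
// for await (const b of batches("huge-file.txt")) parseRecords(b);
```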

Incremental Parsing from Handle in Haskell

I'm trying to interface Haskell with a command line program that has a read-eval-print loop. I'd like to put some text into an input handle, and then read from an output handle until I find a prompt (and then repeat). The reading should block until a prompt is found, but no longer. Instead of coding up my own little state machine that reads one character at a time until it constructs a prompt, it would be nice to use Parsec or Attoparsec. (One issue is that the prompt changes over time, so I can't just check for a constant string of characters.)
What is the best way to read the appropriate amount of data from the output handle and feed it to a parser? I'm confused because most of the handle-reading primitives require me to decide beforehand how much data I want to read. But it's the parser that should decide when to stop.
You seem to have two questions wrapped up in here. One is about incremental parsing, and one is about incremental reading.
Attoparsec supports incremental parsing directly. See the IResult type in Data.Attoparsec.Text. Parsec, alas, doesn't. You can run your parser on what you have, and if it gives an error, add more input and try again, but you really don't know if the error was an unrecoverable parse error or just a need for more input.
In your case: usually REPLs read one line at a time, so you can use hGetLine to read a line, pass it to Attoparsec, and if it parses, evaluate it; if not, get another line.
If you want to see all this in action, I do this kind of thing in Plush.Job.Output, but with three small differences: 1) I'm parsing byte streams, not strings. 2) I've set it up to pull as much as is available from the input and parse as many items as I can. 3) I'm reading directly from file descriptors. But the same structure should help you do it in your situation.
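Independent of Attoparsec, the general "accumulate and re-test" structure might look like the following TypeScript sketch; matchPrompt is a hypothetical stand-in for whatever recognizes the current (changing) prompt.

```typescript
// Incremental "read until the parser succeeds": buffer chunks as they arrive
// and re-test the buffer against a prompt matcher after each one.
// The point is that the matcher, not the reader, decides when reading stops.
async function readUntilPrompt(
  chunks: AsyncIterable<string>,
  matchPrompt: (buf: string) => number | null,   // returns prompt end index, or null
): Promise<{ output: string; rest: string }> {
  let buf = "";
  for await (const chunk of chunks) {
    buf += chunk;
    const end = matchPrompt(buf);
    if (end !== null) {
      return { output: buf.slice(0, end), rest: buf.slice(end) };
    }
    // no complete prompt yet: keep waiting for the next chunk
  }
  throw new Error("stream ended before a prompt was seen");
}
```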

Delphi TStringList wrapper to implement on-the-fly compression

I have an application for storing many strings in a TStringList. The strings will be largely similar to one another and it occurs to me that one could compress them on the fly - i.e. store a given string in terms of a mixture of unique text fragments plus references to previously stored fragments. StringLists such as lists of fully-qualified path and filenames should be able to be compressed greatly.
Does anyone know of a TStringList descendant that implements this, i.e. provides read and write access to the uncompressed strings but stores them internally compressed, so that TStringList.SaveToFile produces a compressed file?
While you could implement this by uncompressing the entire stringlist before each access and re-compressing it afterwards, it would be unnecessarily slow. I'm after something that is efficient for incremental operations and random "seeks" and reads.
TIA
Ross
I don't think there's any freely available implementation around for this (not that I know of anyway, although I've written at least 3 similar constructs in commercial code), so you'd have to roll your own.
The remark Marcelo made about adding items in order is very relevant, as I suppose you'll probably want to compress the data at addition time - having quick access to entries already similar to the one being added gives much better performance than having to look up a "best fit" entry (needed for similarity compression) over the entire set.
Another thing you might want to read up on is "ropes" - a conceptually different type from strings, which I already suggested to Marco Cantu a while back. At the cost of a next-pointer per "twine" (for lack of a better word) you can concatenate parts of a string without keeping any duplicate data around. The main problem is how to retrieve the parts that can be combined into a new "rope" representing your original string. Once that problem is solved, you can reconstruct the data as a string at any time, while still having compact storage.
If you don't want to go the "rope" route, you could also try something called "prefix reduction", which is a simple form of compression - just start each string with an index of a previous string and the number of characters that should be treated as a prefix for the new string. Be aware that you should not recurse this too far back, or access speed will suffer greatly. In one simple implementation, I did a mod 16 on the index to establish the entry at which prefix reduction started, which gave me on average about 40% memory savings (this number is completely data-dependent, of course).
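A rough sketch of that prefix-reduction idea (written in TypeScript purely for illustration; the original context is Delphi), with a full copy stored every 16th entry so lookups never chain back too far:

```typescript
// "Prefix reduction": each entry stores how many characters it shares with the
// previous entry plus only the differing suffix. Every 16th entry is stored in
// full so that decoding never has to walk back more than 16 steps.
type Entry = { prefixLen: number; suffix: string };

class PrefixReducedList {
  private entries: Entry[] = [];

  add(s: string): void {
    const i = this.entries.length;
    const prev = i > 0 && i % 16 !== 0 ? this.get(i - 1) : "";
    let p = 0;
    while (p < prev.length && p < s.length && prev[p] === s[p]) p++;
    this.entries.push({ prefixLen: p, suffix: s.slice(p) });
  }

  get(i: number): string {
    const e = this.entries[i];
    if (e.prefixLen === 0) return e.suffix;            // anchor or no shared prefix
    return this.get(i - 1).slice(0, e.prefixLen) + e.suffix;
  }
}
```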
You could try to wrap a Delphi or COM API around Judy arrays. The JudySL type would do the trick, and has a fairly simple interface.
EDIT: I assume you are storing unique strings and want to (or are happy to) store them in lexicographical order. If these constraints aren't acceptable, then Judy arrays are not for you. Mind you, any compression system will suffer if you don't sort your strings.
I suppose you expect general flexibility from the list (including the delete operation); in this case I don't know of any out-of-the-box solution, but I'd suggest one of two approaches:
1. Split your strings into words and keep a separate, growing dictionary to reference the words, so that internally you only store a list of indexes (sketched below).
2. Implement something on top of the zlib stream available in Delphi, but operating on blocks that each contain, for example, 10-100 strings. In this case you still have to decompress/recompress a complete block, but the "price" you pay is lower.
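A sketch of the first approach (again in TypeScript rather than Delphi, purely to show the data layout): tokens are interned in a growing dictionary and each entry stores only their indexes. Splitting on whitespace is illustrative; for path lists you would split on the path separator instead.

```typescript
// Approach 1: intern each token in a growing dictionary and store entries as
// lists of token indexes. The capturing split keeps the separators so that
// get() reproduces the original string exactly.
class WordIndexedList {
  private words: string[] = [];
  private index = new Map<string, number>();   // token -> position in `words`
  private entries: number[][] = [];

  add(s: string): void {
    const ids = s.split(/(\s+)/).filter(w => w !== "").map(w => {
      let id = this.index.get(w);
      if (id === undefined) {
        id = this.words.length;
        this.words.push(w);
        this.index.set(w, id);
      }
      return id;
    });
    this.entries.push(ids);
  }

  get(i: number): string {
    return this.entries[i].map(id => this.words[id]).join("");
  }
}
```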
I don't think you really want to compress TStrings items in memory, because it's terribly inefficient. I suggest you look at the TStream implementations in the Zlib unit. Just wrap the regular stream in TDecompressionStream on load and TCompressionStream on save (you can even emit a gzip header there).
Hint: you will want to override LoadFromStream/SaveToStream instead of LoadFromFile/SaveToFile.
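As a rough analogue of that suggestion in TypeScript/Node (the Delphi version would wrap the stream in TCompressionStream/TDecompressionStream as described): keep plain strings in memory and only compress the serialized list on save, decompressing again on load.

```typescript
// Keep plain strings in memory; gzip only on save and gunzip on load.
import { gzipSync, gunzipSync } from "node:zlib";
import { writeFileSync, readFileSync } from "node:fs";

function saveToFile(path: string, lines: string[]): void {
  writeFileSync(path, gzipSync(lines.join("\n")));
}

function loadFromFile(path: string): string[] {
  return gunzipSync(readFileSync(path)).toString("utf8").split("\n");
}
```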

Why does TeX/LaTeX not speed up in subsequent runs?

I really wonder why even recent TeX/LaTeX systems do not use any caching to speed up later runs. Every time I fix a single comma*, running LaTeX costs me about the same amount of time, because it needs to load and convert every single picture file.
(* I know that even changing a tiny comma could affect the whole structure but of course, a well-written cache format could see the impact of that. Also, there might be situations where 100% correctness is not needed as long as it’s fast.)
Is there something in the TeX language which makes this complicated or impossible to accomplish, or is it just that the original implementation of TeX had no need for it (because it would have been slow anyway on the large computers of the time)?
But then, on the other hand, why doesn't this annoy other people so much that they've started a fork which has some sort of caching (or a transparent conversion of TeX files to a format which is faster to parse)?
Is there anything I can do to speed up subsequent runs of LaTeX, apart from putting all the stuff into chapterXX.tex files and then commenting them out?
Let's try to understand how TeX works. What happens when you write the following?
tex.exe myfile.tex
TeX reads your file byte by byte. First of all, TeX converts each character into a pair <category code, ASCII code>. The category code says what kind of token the character is: an opening brace ({), an entrance into math mode ($), a symbol macro (~, for example), a letter (A-Z, a-z), and so on.
If TeX gets characters with category code 11 (letters) or 12 (other symbols: digits, commas, periods), it starts a paragraph. You want to cache all paragraphs.
Suppose you changed something in your document. How can TeX check that all paragraphs after your change are the same? Maybe you changed the category of some character. Maybe you changed the meaning of some macro. Or you removed a } somewhere and thus changed the current font.
To be sure that a paragraph is the same, you must be sure that all its characters are the same, that all their category codes are the same, that the current font is the same, that all math fonts are the same, and that the values of a number of internal parameters are the same (for example \hsize, \vsize, \pretolerance, \tolerance, \hyphenpenalty, \exhyphenpenalty, \widowpenalty, \spaceskip, ...).
You can only be sure that all paragraphs before your change are the same. But in that case you would have to keep the full state after every paragraph.
Your hypothetical SuperCachedTeX system is getting very complicated, isn't it?
If you're using pdftex, then you can use --draftmode on the command line for the first runs. This instructs pdftex not to generate a PDF.
Of course lots of things could be cached (graphics information, for instance), but the way TeX works makes it hard to do. There is a rather complex initialization of TeX when it starts up, and one TeX run always means exactly one PDF written out. In order to do caching efficiently, you would need to keep the data in memory.
You could use IPC and talk to a daemon to get the cached information, but that would involve a lot of programming. For normal purposes TeX is so blazingly fast that this does not really gain a lot. On the other hand, this is a good question, as I have seen LaTeX runs (on current hardware) that take more than 10 hours and would have benefited from caching.
Yet another answer, not strictly related:
You can use the LaTeX macro \include{...}, and with \includeonly{} you can re-run your document for a subset of the includes only. But this is not caching, nor does it give you the complete document.
There are solutions such as preview-latex which pre-compile stuff into a dedicated format file for speed purposes. You need to remember that TeX optimises pages on a local basis. There is no concept at the engine level of material being fixed on a particular page, so you can't just "re-TeX one page".
Actually, the correct answer is (IMO): LaTeX already caches information in its output files (.aux, plus additional files for other packages). So if you add a comma, this information is reused and the typesetting run is much faster than it would be without the .aux file.
TeX does have a caching facility, namely format files, and I think, pace Alexey's valuable summary of the problems of representing TeX's state, it should be possible to use them to allow resuming a run after any page eject.
The major issue is that page breaks will affect paragraphs or floats, and these may not occur at a particular point in the text, but may instead occur during the execution of macros whose behavior depends on the transient state passed to them when they were invoked.
So to make the idea of creating "breakpoints" work, one would need to hack TeX's internals to dump additional information, beyond what is normally dumped in format files, and package it up with the state of the auxiliary files. Given what Joseph says about TeX fragment previewers, why would anyone bother hacking TeX to do this?
