How to colour lines in LaTeX that match regular expressions

How to colour lines in LaTeX that match regular expressions - latex

I'm currently using the LaTeX listings package to display a block of code, and a diff of the file against a previous version.
Both blocks are coloured by listings as if they are code (which they are) but I would like to colour the diff similarly to emacs diff-mode - red for lines matching ^-, green for ^\+ etc.
Does anyone know if there is a package to achieve this, whether listings can do it, or whether it is possible to write a LaTeX command to do it? (or rather - where a good source of information on the latter can be found?)
Thanks,
Hud

If you just want unadorned diffs, Pygments supports them, so texments (a latex front-end for pygments) does what you want.
But I guess that what you want is to have diff's coloured, while having the syntax of the underlying code highlighted appropriately. This you can't do properly the usual way, in general, because syntax highlighting may depend on the state from the previous line, and with udiffs the previous line may be missing, or an inserted line might follow a deleted line, &c.
To do the right thing, you'd need to syntax highlight the old and new versions, and then scramble the highlighted versions together to get the right output. Quite a bit of work, and I've not heard of anyone who's done that.
You could also try simply modifying the usual syntax highlighter for a language, removing highlighting rules that involve multiline state, and inserting rules to colour lines with udiff markup. Cf. Pygments' Write your own lexer; what you want from diff is trickier, since you want what is coloured to be highlighted, so you can't just make the lines into GenericTokens; I don't know what the right way to do this is.

Use a scripting tool such as sed, awk, Python [lang of choice]to produce the Latex source code.

Related

Erlang and Elixir's colorful REPL shells

How does Learn some Erlang or IEx colorize the REPL shell? Is kjell a stable drop-in replacement?

The way this is done in LYSE is to use a javascript plugin called highlight.js, so LYSE isn't actually doing it, your browser is. There are plugins/modes for most mainstream(ish) languages available for highlight.js. If the web is what you are interested in, this is one way to do it (except for when a user can't use JS or has it turned off).
This isn't actually the shell being highlighted at all, nor is it useful anywhere outside of browsers. I've been messing around with a way to do this more generically, initially by inserting static formatting in HTML and XML documents (feed it a document, and it outputs one with Erlang syntax highlighted a certain way whenever this is detected/tagged). I don't yet have a decent project to publish for this (very low on my priority list atm), but I can point you in the direction of some solid inspiration: the source for wx:demo.
Pay particular attention to the function demo:code_area/1. There you will see how the tokenization routines are used to provide highlight hints for the source code text display area in the wx:demo application. This can provide a solid foundation to build your own source highlighting/display utility. (I think it wouldn't be impossible, considering every terminal in common use today responds correctly to ANSI color codes, to write a plugin to the shell that highlights terminal input directly -- not that there is a big clamor for this feature at the moment.)
EDIT (Prompted by a comment by Fred the Magic Wonder Dog)
On the subject of ANSI color codes, if this is what you are actually after, they are easy to implement as a prepend to any string value you are returning within a terminal. The terminal escapes them, so you won't see the characters, but will perform whatever action the code represents. There is no termination (its not like a markup tag that encloses the text) and typically no concept of "default color to go back to" (though there are a gajillion-jillion extensions to telnet and terminal modes that enable all sorts of nonsense like this).
An example of basic colorization is the telcon:greet/0 and telcon:sys_help/0 functions in the v0.1 code of ErlMUD (along with a slew of other places -- colorization in games is sort of a thing). What you see there is a pre-built list per color, but this could be represented any way that would get those values at the front of the string. (I just happened to remember the code value sequences, but didn't remember the characters that make them up; the next version of the code represents this somewhat differently.) Here is a link to a list of ANSI color codes and a discussion about colorizing the shell. Play around! Its nerdy fun, 1980's style!
Oh, I almost forgot... if you really want to go down the rabbit hole without silly little child toys like ncurses to help you, take a look at termcap.

I don't know if kjell is a stable drop-in replacement for Erl but it wouldn't be for IEx.
As far as how the colors are done; to the best of my knowledge it's done with ANSI Escape Sequences.

How to define a multipage environment not interrupted by tables and figures?

I have defined a new LaTeX environment for excursions in a book chapter I am writing. The environment is multipage and often includes inline images. Moreover, I am using the shaded environment to give the environment a background colour to make it stand out a bit.
However, the environment, as shown below, is split up by floating tables and images, which makes the flow of the environment visually more difficult to follow. For example, it is now difficult to see if that floating image or table is part (the missing background colour does not help). So, I like to extend my environment to disallow it to be interrupted by floating elements, but do not know how to get that done.
\newcounter{bioclipse}
\def\thebioclipse{\thechapter-\arabic{bioclipse}}
\newenvironment{bioclipse}[2][]{\begin{small}\begin{shaded}\refstepcounter{bioclipse} \par\medskip\noindent%
\textbf{Bioclipse Excursion~\thebioclipse #1: #2
\vspace{0.1cm} \hrule \vspace{0.1cm}}
\rmfamily}{\medskip \end{shaded}\end{small}}
Any solution to disallow interruption is fine, even if the background colour is done differently.

The algorithm for insertion is rather complex. Basically you want that any pending insertion must not be put into the page where the env bioclipse apply. As first fast solution you could flush all the insertion first, and then start the new chapter. If you want to put figures or whatever into the environment, and you want them to be "flushed" only after the last page where the env is at work... second fast solution is: put them after the environment directly!! So they won't "annoy" the page/s at all (of course, avoid using of footnotes).
The other solution (making it someway automatic) is a little bit tricky. Insertion places for "pending" materials are chosen while constructing the vertical list that is the page ("candidate"), in the output routine. This means you have to play with the output routine at worst; but maybe it is too much, unless you're planning your own TeX format, and maybe LaTeX gives easier choices...
Digging a bit in LaTeX codes, I see there's a conditional you can try to use, it is \#insert* i.e. \#insertfalse and \#inserttrue. If you're lucky they "drive" the possibility of putting insertions, so that at the beginning of your env you can say \#insertfalse and at the end \#inserttrue. Try, I am not saying it works.
As maybe you know to use # as letter catcode so that it can be part of a "command" name, you have to use \makeatletter and \makeatother when you finished (likely default class/style preamble does it for you).
You could also be iterested in having a look at placeins style (it could be already in your installation, otherwise, see here ) which apparently could solve (part of) your problem.

Latex - Apply an operation to every character in a string

I am using LaTeX and I have a problem concerning string manipulation.
I want to have an operation applied to every character of a string, specifically
I want to replace every character "x" with "\discretionary{}{}{}x". I want to do
this because I have a long string (DNA) which I want to be able to separate at
any point without hyphenation.
Thus I would like to have a command called "myDNA" that will do this for me instead of
inserting manually \discretionary{}{}{} after every character.
Is this possible? I have looked around the web and there wasnt much helpful
information on this topic (at least not any I could understand) and I hoped
that you could help.
--edit
To clarify:
What I want to see in the finished document is something like this:
the dna sequence is CTAAAGAAAACAGGACGATTAGATGAGCTTGAGAAAGCCATCACCACTCA
AATACTAAATGTGTTACCATACCAAGCACTTGCTCTGAAATTTGGGGACTGAGTACACCAAATACGATAG
ATCAGTGGGATACAACAGGCCTTTACAGCTTCTCTGAACAAACCAGGTCTCTTGATGGTCGTCTCCAGGT
ATCCCATCGAAAAGGATTGCCACATGTTATATATTGCCGATTATGGCGCTGGCCTGATCTTCACAGTCAT
CATGAACTCAAGGCAATTGAAAACTGCGAATATGCTTTTAATCTTAAAAAGGATGAAGTATGTGTAAACC
CTTACCACTATCAGAGAGTTGAGACACCAGTTTTGCCTCCAGTATTAGTGCCCCGACACACCGAGATCCT
AACAGAACTTCCGCCTCTGGATGACTATACTCACTCCATTCCAGAAAACACTAACTTCCCAGCAGGAATT
just plain linebreaks, without any hyphens. The DNA sequence will be one
long string without any spaces or anything but it can break at any point.
This is why my idea was to inesert a "\discretionary{}{}{}" after every
character, so that it can break at any point without inserting any hyphens.

This takes a string as an argument and calls \discretionary{}{}{} after each character. The input string stops at the first dollar sign, so you should not use that.
\def\hyphenateWholeString #1{\xHyphenate#1$\wholeString}
\def\xHyphenate#1#2\wholeString {\if#1$%
\else\say{#1}\discretionary{}{}{}%
\takeTheRest#2\ofTheString
\fi}
\def\takeTheRest#1\ofTheString\fi
{\fi \xHyphenate#1\wholeString}
\def\say#1{#1}
You’d call it like \hyphenateWholeString{CTAAAGAAAACAGGACG}.
Instead of \discretionary{}{}{} you can also try \hspace{0pt}, if you like that more (and are in a latex environment). In order to align the right margin, I think you’d need to do some more fine tuning (but see below). The effect is of course minimised by using a font of fixed width.
Revision:
\def\hyphenateWholeString #1{\xHyphenate#1$\wholeString\unskip}
\def\xHyphenate#1#2\wholeString {\if#1$%
\else\transform{#1}%
\takeTheRest#2\ofTheString\fi}
\def\takeTheRest#1\ofTheString\fi
{\fi \xHyphenate#1\wholeString}
\def\transform#1{#1\hskip 0pt plus 1pt}
Steve’s suggestion of using \hskip sounds like a very good idea to me, so I made a few corrections. Note that I’ve renamed the \say macro and made it more useful in that it now actually does the transformation. (However, if you remove the \hskip from \transform, you’ll also need to remove the \unskip in the main macro definition.
Edit:
There is also the seqsplit package which seems to be made for printing DNA data or long numbers. They also bring a few options for nicer output, so maybe that is what you’re looking for…

Debilski's post is definitely a solid way to do it, although the \say is not necessary. Here's a shorter way that makes use of some LaTeX internal shortcuts (\#gobble and \#ifnextchar):
\makeatletter
\def\hyphenatestring#1{\xHyphen#te#1$\unskip}
\def\xHyphen#te{\#ifnextchar${\#gobble}{\sw#p{\hskip 0pt plus 1pt\xHyphen#te}}}
\def\sw#p#1#2{#2#1}
\makeatother
Note the use of \hskip 0pt plus 1pt instead of \discretionary - when I tried your example I ended up with a ragged margin because there's no stretchability. The \hskip adds some stretchable glue in between each character (and the \unskip afterwards cancels the extra one we added). Also note the LaTeX style convention that "end user" macros are all lowercase, while internal macros have an # in them somewhere so that users don't accidentally call them.
If you want to figure out how this works, \#gobble just eats whatever's in front of it (in this case the $, since that branch is only run when a $ is the next char). The main point is that \sw#p is only given one argument in the "else" branch, so it swaps that argument with the next char (that isn't a $). We could just as well have written \def\hyphenate#next#1{#1\hskip...\xHyphen#te} and put that with no args in the "else" branch, but (in my opinion) \sw#p is more general (and I'm surprised it's not in standard LaTeX already).

There is a contrib package on CTAN that deals with typesetting DNA sequences. It does a little more than just line-breaking, for example, it also supports colouring. I'm not sure if it is possible to get the output you are after though, and I have no experience in the DNA-sequence-typesetting area, but is one long string the most readable representation?

Assuming your string is the same, in your preamble, use the \newcommand{}{}. Like this:
\newcommand{\myDNA}{blah blah blah}
if that doesn't satisfy your requirements, I suggest:
2. Break the strings down to the smallest portion, then use the \newcommand and then call the new commands in sequence: \myDNA1 \myDNA2.
If that still doesn't work, you might want to look at writing a perl script to satisfy your string replacement needs.

When you write TeX source, how do you use your editor's word wrap?

Do you use "hard wrapping" (either yourself or automatically by your editor) by inserting newlines into your source document at a certain line length, or do you write your paragraphs in one continual line and let your editor "soft-wrap" for you?
Also, what editor do you use for this?
Note: I'm interested in how you wrap lines in your TeX source code (.tex file, general prose), not how TeX wraps lines for the final document.

I recently switched to hard-wrapping per sentence (i.e., newline after sentence end only; one-to-one mapping between lines and sentences) for two reasons:
softwrap for a whole paragraph makes typos impossible to spot in version control diffs.
hardwrapped paragraphs look nice until you start to edit them, and if you re-flow a hard wrapped paragraph you end up with a whole bunch of lines changed in the diff for a possibly one word change.
Only wrapping per sentence fixes these two problems:
Small changes are comparatively easy to spot in a diff.
No re-flowing of text, only changes to, insertions of, or removal of single lines.
Looks a bit weird when you first look at it, but is the only compromise I've seen that addresses the two problems of soft and hard wrapping.
Of course, if you're working collaboratively, the answer is to use whatever the other people are using.

I use Emacs (with AUCTeX). After editing or writing a paragraph, I hit M-q to hard-wrap it. It also handles indenting items, and it also formats commented paragraphs. I don't like soft wraps, because they are visually indistinguishable from real newline characters, but behave differently.

I generally let my LaTeX editor softwrap the lines. I think part of it is due to the fact that I had some bad experiences with significant whitespace when I was first learning LaTeX, and part of it is because I don't like heavily-jagged right-margins when I'm editing the text file.

Depending on what os you use, i recommend winedt (windows) and kile (linux). Both of these soft wrap, and there is no need for hard wraps. (That is, i leave my paragraphs as long lines in the source) Latex sorts out line breaks in the output and when i read the source, i use my editor.
The only possible reason to use hard line breaks is to make it easier to find errors in the code (which the compiler indicates by line number) but they are generally not hard to find, if it's mainly text, errors are rare anyway.

Typically I have my editor insert newlines. That is, I try not to hit the "enter" key for a new line, but when the editor soft-wraps, it actually inserts a newline character.
I use vim to accomplish this, and I don't know if other editors have this feature or how they work. In specific, i use the wrapmargin feature.
I typically try to keep my lines of code (TeX or otherwise) at n-characters long for clarity and consistency. I tend to go with 80 characters, but that is up to you.
More vim-related line-breaking docs:
http://www.vim.org/htmldoc/usr_25.html
http://www.vim.org/htmldoc/options.html#%27textwidth%27

I tend to do hard-wrapping with TeX, but that's rooted more in my obsession with text formatting than any real gain of efficiency. One major thing that I don't like about soft-wrapping is that it tends (in my opinion, obviously) to make things harder to read by wrapping in semantically-random places.

Although I would prefer to use soft wrapping I end up using hard wrapping for one practical reason: all of my collaborators do the same. So, when I work on an article with someone it would be a big pain for me to soft wrap while the other person hard wraps. The second reason is that Emacs was until recently able to handle properly on hard wrapping. Emacs 23 which I currently use changes this but it will be a long time before everybody upgrades to 23 so I can sneak soft wrapped texts to them.
The way I actually use hard wrapping is to have auto-fill-mode turned on. Furthermore M-q is bound to LaTeX-fill-paragraph (in the AucTeX mode - but I don't remember if this is a standard binding or one of my bindings - I'm pretty sure it's the latter). Combining these two I manage to keep my TeX source more or less decently formatted.
By the way, I have heard the suggestion to always start a new sentence at the beginning of a line. In other words a period at the end of a sentence should be followed by a hard return. The benefit is that it works well with version control systems since changes to a sentence can remain localized. I think that this is in principle a nice idea but I have not managed to use it because of my obsessive-compulsive usage of M-q.

I use Kile under Linux with hard wrapping (called static word wrap in Kile) because apparently in my work environment everybody do like that. Soft wrapping makes much more sense to me, so if I could choose I would use that rather than hard wrapping.

I work in joe mostly. I from time to time press enter automatically, and if it doesn't look good I press auto-format (ctrl-k j).
Joe has autowrap modes, but I don't even bother.

I use Auctex with automatic line breaking switched off, and insert line breaks by hand. I avoid auto-formatting, since I want as few changes to where line breaks occur between edits to the document, which makes diffs less cluttered.
Using a smarter diff, one that doesn't care about tex-irrelevant whitespace, would be better, but that's the tool I use.
I like Will's suggestion of hard wrapping per sentence. I thought about it before, but I am fixed in my habits.

Is it possible to write own "packages" for LaTeX?

As a programmer, I wonder if I could create my own package for LaTeX. I need something like that famous "listings" package, but something that is much more capable for my needs. I look for a listings solution that would watch out for a comment line like
// BEGIN LISTING 3122
// END LISTING 3122
No syntax highlighting, but intelligent support for tab indents. That package then would be used with a file name or path, walk through the lines and copy out just the snippets of interest.
I am 100% sure there is absolutely nothing like this on the market. So I want to program it for LaTeX. If that's possible. I have no idea how and what programming language / IDE. Where would I start looking?

This is certainly possible, but it is non-trivial in the TeX programming language. I don't have time to code it up at the moment but here's an algorithm; I suggest asking on comp.text.tex for more specific LaTeX programming advice.
Locally set all catcodes of special chars to "other" (\dospecials) and start reading in the input file line by line (\read)
for each line compare the first however many tokens of the line (some iterative use of \if or \ifx; there might be a package to make this easier such as stringstrings or xstring)
in the default state throw away the input line and read the next
unless it's // BEGIN LISTING, in which case save each line into a macro (something like \g#addto#macro)
until it's // END LISTING, obviously
keep going until the end of the file (\ifeof)
TeX by Topic is a good reference guide for this sort of work.

The rather simple texments package shows how code can be piped into pdflatex: by writing your shell-invocable filter, you should be able to do something similar with your idea.

I'm pretty certain you can't do this in LaTeX. Basically you can go nuts with anything that's either a command (\foo) or an environment (\begin{foo} ... \end{foo}) but not in the way you are describing here. Within environments or commands it is possible to turn off LaTeX's processing and handle everything yourself in some way. This is how verbatim and listings work. It's not very pretty though.

Basically, I think it might be possible, if you make ‘/’ an active character (there is \makeactive for example but I imagine there are more solutions) and then invent some good magic around it. (You will need to emulate/create an environment with your logic.) Similar things are done in some of the internationalisation packages in order to ease the input of letters with diacritics.
For a character like ‘/’ this might be even harder as this one could have been written in other places of your text, too. So of course you’d have to take special care for that.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart