TeX command which affects the next complete word - LaTeX

Is it possible to have a TeX command which will take the whole next word (or the next letters up to but not including the next punctuation symbol) as an argument and not only the next letter or {} group?
I’d like to have a \caps command on certain acronyms but don’t want to type curly brackets over and over.

First of all, create your command, for example:
\def\capsimpl#1{{\sc #1}}% Your main macro
The solution to catch a space or punctuation:
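\catcode`\@=11 % allow @ in the internal macro names below
% \addtopunct{<c>} marks <c> as a word terminator by letting
% \punct@<meaning of c> be \let
\def\addtopunct#1{\expandafter\let\csname punct@\meaning#1\endcsname\let}
\addtopunct{ }
\addtopunct{.} \addtopunct{,} \addtopunct{?}
\addtopunct{!} \addtopunct{;} \addtopunct{:}
\newtoks\capsarg % collects the tokens of the word
\def\caps{\capsarg{}\futurelet\punctlet\capsx}
% peek at the next token: if it is a terminator, typeset the collected
% word with \capsimpl; otherwise append it to \capsarg and look again
\def\capsx{\expandafter\ifx\csname punct@\meaning\punctlet\endcsname\let
\expandafter\capsend
\else \expandafter\continuecaps\fi}
\def\capsend{\expandafter\capsimpl\expandafter{\the\capsarg}}
\def\continuecaps#1{\capsarg=\expandafter{\the\capsarg#1}\futurelet\punctlet\capsx}
\catcode`\@=12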

@Debilski - I wrote something similar to your active * code for the acronyms in my thesis. I activated < and then used \def<#1> to print the acronym, as well as its expansion the first time it's encountered. I also went a bit off the deep end by allowing the expansions to be defined in-line and using the .aux files to send the expansions "back in time" if they're used before they're declared, or to report errors if an acronym is never declared.
Overall, it seemed like a good idea at the time - I rarely needed < to be catcode 12 in my actual text (since all my macros were in a separate .sty file), and I made it behave in math mode, so I couldn't foresee any difficulties. But boy, was it brittle... I don't know how many times I accidentally broke my build by changing something seemingly unrelated. So, all that to say: be very careful about activating characters that are even remotely commonly used.
On the other hand, with XeTeX and higher Unicode characters, it's probably a lot safer, and there are generally easy ways to type these extra characters, such as setting up a multi (or compose) key (I usually map either Num Lock or one of the Windows keys to this), so that e.g. multi-!-! produces ¡. Or, if you're running Emacs, you can use C-\ to switch into TeX input mode briefly to insert Unicode by typing the TeX command for it (though this is a pain for actually typing TeX documents, since it intercepts your actual \'s, and please, please don't try defining your own escape character!)

Regarding whitespace after commands: see the xspace package and the TeX FAQ item "Commands gobble following space".
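If typing one command per acronym is acceptable, xspace removes the need for braces after it without touching any catcodes. A minimal sketch, where \nasa is a hypothetical acronym command:

```latex
\documentclass{article}
\usepackage{xspace}
% Hypothetical per-acronym command. \xspace adds a space after the
% acronym unless the next token is punctuation (or similar), so no
% {} is needed after \nasa in running text.
\newcommand{\nasa}{\textsc{nasa}\xspace}
\begin{document}
The \nasa budget was cut, according to \nasa.
\end{document}
```

This trades a generic \caps for one definition per acronym, which is arguably less fragile than making characters active.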
Now, why this is very difficult: as you noted yourself, it seems things like that can only be done by changing catcodes. Catcodes are assigned to characters when TeX reads them, and TeX reads one line at a time, so you cannot do anything with other spaces on the same line, IMHO. There might be a way around this, but I do not see it.
Dangerous code below!
This code will do what you want only at the end of the line, so if what you want is more "fluent" typing without brackets, but you are willing to hit 'return' after each acronym (and not run any auto-indent later), you can use this:
\def\caps{\begingroup\catcode`^^20 =11\mcaps}
\def\mcaps#1{\def\next##1 {\sc #1##1\catcode`^^20 =10\endgroup\ }\next}

One solution might be to make another character active and use it as the delimiter. This does not remove the need for a closing character, but it avoids typing the \caps macro, making it easier to type overall.
Under very special circumstances, the following works:
\catcode`\*=\active
\def*#1*{\textsc{\MakeTextLowercase{#1}}}
Now follows an *Acronym*.
Unfortunately, this makes using \section*{} impossible without additional macro definitions.
In XeTeX, it seems to be possible to exploit Unicode characters for this, so one could define
\catcode`\•=\active
\def•#1•{\textsc{\MakeTextLowercase{#1}}}
Now follows an •Acronym•.
This should reduce the effect on other commands, but of course the character ‘•’ needs to be mapped to a key somewhere to be of use.

Are there ANSI escape sequences for superscript and subscript?

I'm playing around with ANSI escape sequences, e.g.
echo -e "\e[91mHello\e[m"
on a Linux console to display colored text.
Now I'm trying to produce superscript and subscript output like a=b².
I read here and here about Partial Line Down (subscript) and Partial Line Up (superscript), but I'm not sure about the exact syntax, or even which terminal clients might support this.
Any suggestions about this?
Possibly some commercial product supports it, but it's not supported by any terminal emulator you'll encounter (unless someone modifies one just to prove a point).
The standard describes possible escape sequences, but there is no requirement that any given sequence is supported by any terminal. There are commonly supported (and assumed) sequences such as clearing the screen, but even for that, not all terminals have supported the feature.
The reason is that terminal emulators are generally used with applications (such as text editors) which assume a regular set of rows/columns, and that the text is shown compactly (no extra space such as would be needed to allow for partial line movement). Back in the day when people used typewriters, it was common to have 1.5 or 2.0 line spacing, and get no more than 33 lines on a page. That changed long ago.
The need for subscripts/superscripts didn't go away: Unicode provides a usable set of characters with that representation (see the Superscripts and Subscripts block, range 2070–209F).
Further reading:
Your New Royal Portable (1953).
Line Spacing - Butterick's Practical Typography
console_codes - Linux console escape and control sequences

Automatically capitalize first letter of first word in a new sentence in LaTeX

I know one of LaTeX's bragging points is that it doesn't have this Microsoftish behavior. Nevertheless, it's sometimes useful.
LaTeX already adds an extra space after you type a (non-backslashed) period, so it should be possible to make it automatically capitalize the following letter as well.
Is there an obvious way to write a macro that does this, or is there a LaTeX package that does it already?
The following code solves the problem.
\let\period.
\catcode`\.\active
\def\uppercasesingleletter#1{\uppercase{#1}}
\def.{\period\afterassignment\periodx\let\next= }
\def \periodx{\ifcat\space\next \next\expandafter\uppercasesingleletter \else\expandafter\next\fi}
First. second.third. relax.relax. up
\let\period. saves the period character in \period.
\catcode`\.\active makes every period an active character (i.e. it behaves like a macro).
\def\uppercasesingleletter#1{\uppercase{#1}} defines the macro \uppercasesingleletter, which capitalizes the letter that follows it.
\def.{\period\afterassignment\periodx\let\next= } typesets the saved period, lets \next be the following token, and then runs \periodx.
\def \periodx{\ifcat\space\next \next\expandafter\uppercasesingleletter \else\expandafter\next\fi} If the next token is a space, the space is typeset and \uppercasesingleletter is inserted to capitalize the token after it; otherwise the token is simply put back.
ages ago there was discussion of this idea on comp.text.tex, and the general conclusion was you can't do it satisfactorily. satisfactory, in my book, involves not making characters active, but i can't see how that could work at all.
personally, i would want to make space active, and have it then look at \spacefactor and \MakeUppercase the following character if the factor is 3000.
something like
\catcode`\ \active% latex already has a saved space character -- \space
\def {\ifhmode% \spacefactor is invalid
% (or something) in vertical mode
\ifnum\spacefactor<3000\relax\space\else% note: with space active,
% even cs-ended lines need %-termination
\expandafter\expandafter\expandafter\gobbleandupper\fi%
\else\space\fi}%
\def\gobbleandupper#1{\def\tempa{#1}\def\tempb{ }%
\ifx\tempa\tempb% can't indent the code, either :-(
% here, we have another space
\expandafter\gobbleandupper% try again
\else\space% insert a "real" space to soak up the
% space factor
\MakeUppercase{#1}\fi}%
this doesn't really do the job -- there are enough loose ends to knit a fairisle jumper. for example, given that we can't rely on \everypar in latex, how do you uppercase the first letter of a paragraph?
no ... however much it hurts (which is why i avoid unnecessary key operations) we need to type latex "properly" :-(
I decided to solve it in the following way:
Since I always compile the LaTeX code three times before I view the result in Okular (to get pagination and references right), I decided to build the capitalization of sentences into that process.
Thus, I now have a shell script that calls my capitalization script (written in CRM114) first, then pdflatex three times, and then Okular. This way, everything happens as the result of a single command.

LaTeX - Apply an operation to every character in a string

I am using LaTeX and I have a problem concerning string manipulation.
I want to have an operation applied to every character of a string, specifically
I want to replace every character "x" with "\discretionary{}{}{}x". I want to do
this because I have a long string (DNA) which I want to be able to separate at
any point without hyphenation.
Thus I would like to have a command called "myDNA" that will do this for me instead of
inserting manually \discretionary{}{}{} after every character.
Is this possible? I have looked around the web and there wasn't much helpful
information on this topic (at least not any I could understand) and I hoped
that you could help.
--edit
To clarify:
What I want to see in the finished document is something like this:
the dna sequence is CTAAAGAAAACAGGACGATTAGATGAGCTTGAGAAAGCCATCACCACTCA
AATACTAAATGTGTTACCATACCAAGCACTTGCTCTGAAATTTGGGGACTGAGTACACCAAATACGATAG
ATCAGTGGGATACAACAGGCCTTTACAGCTTCTCTGAACAAACCAGGTCTCTTGATGGTCGTCTCCAGGT
ATCCCATCGAAAAGGATTGCCACATGTTATATATTGCCGATTATGGCGCTGGCCTGATCTTCACAGTCAT
CATGAACTCAAGGCAATTGAAAACTGCGAATATGCTTTTAATCTTAAAAAGGATGAAGTATGTGTAAACC
CTTACCACTATCAGAGAGTTGAGACACCAGTTTTGCCTCCAGTATTAGTGCCCCGACACACCGAGATCCT
AACAGAACTTCCGCCTCTGGATGACTATACTCACTCCATTCCAGAAAACACTAACTTCCCAGCAGGAATT
just plain linebreaks, without any hyphens. The DNA sequence will be one
long string without any spaces or anything but it can break at any point.
This is why my idea was to insert a "\discretionary{}{}{}" after every
character, so that it can break at any point without inserting any hyphens.
This takes a string as an argument and inserts \discretionary{}{}{} after each character. The input is read up to the first dollar sign, so the string itself must not contain one.
\def\hyphenateWholeString #1{\xHyphenate#1$\wholeString}
\def\xHyphenate#1#2\wholeString {\if#1$%
\else\say{#1}\discretionary{}{}{}%
\takeTheRest#2\ofTheString
\fi}
\def\takeTheRest#1\ofTheString\fi
{\fi \xHyphenate#1\wholeString}
\def\say#1{#1}
You’d call it like \hyphenateWholeString{CTAAAGAAAACAGGACG}.
Instead of \discretionary{}{}{} you can also try \hspace{0pt}, if you like that better (and are using LaTeX). To align the right margin, I think you’d need some more fine-tuning (but see below). The effect is of course minimised by using a fixed-width font.
Revision:
\def\hyphenateWholeString #1{\xHyphenate#1$\wholeString\unskip}
\def\xHyphenate#1#2\wholeString {\if#1$%
\else\transform{#1}%
\takeTheRest#2\ofTheString\fi}
\def\takeTheRest#1\ofTheString\fi
{\fi \xHyphenate#1\wholeString}
\def\transform#1{#1\hskip 0pt plus 1pt}
Steve’s suggestion of using \hskip sounds like a very good idea to me, so I made a few corrections. Note that I’ve renamed the \say macro and made it more useful in that it now actually does the transformation. (However, if you remove the \hskip from \transform, you’ll also need to remove the \unskip in the main macro definition.)
Edit:
There is also the seqsplit package, which seems to be made for printing DNA data or long numbers. It also offers a few options for nicer output, so maybe that is what you’re looking for…
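A minimal sketch of the seqsplit route. It assumes only the package's \seqsplit command; the \ttfamily is there purely to illustrate the fixed-width font suggestion:

```latex
\documentclass{article}
\usepackage{seqsplit}
\begin{document}
% \seqsplit allows a line break between any two characters of its
% argument, without inserting hyphens.
The DNA sequence is {\ttfamily\seqsplit{CTAAAGAAAACAGGACGATTAGATGAGCTTGAGAAAGCCATCACCACTCAAATACTAAATGTGTTACCATACCAAGCACTTGCTCTGAAATTTGGGGACTGAGTACACCAAATACGATAG}}.
\end{document}
```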
Debilski's post is definitely a solid way to do it, although the \say is not necessary. Here's a shorter way that makes use of some LaTeX internal shortcuts (\@gobble and \@ifnextchar):
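\makeatletter
% \hyphenatestring{<string>}: put stretchable glue after every character;
% the $ marks the end of the string and \unskip drops the last glue
\def\hyphenatestring#1{\xHyphen@te#1$\unskip}
\def\xHyphen@te{\@ifnextchar${\@gobble}{\sw@p{\hskip 0pt plus 1pt\xHyphen@te}}}
% swap: typeset the next character, then the glue and the recursion
\def\sw@p#1#2{#2#1}
\makeatother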
Note the use of \hskip 0pt plus 1pt instead of \discretionary - when I tried your example I ended up with a ragged margin because there's no stretchability. The \hskip adds some stretchable glue between each pair of characters (and the \unskip afterwards cancels the extra one we added). Also note the LaTeX style convention that "end user" macros are all lowercase, while internal macros have an @ in them somewhere so that users don't accidentally call them.
If you want to figure out how this works, \@gobble just eats the argument that follows it (in this case the $, since that branch is only run when a $ is the next char). The main point is that \sw@p is only given one argument in the "else" branch, so it swaps that argument with the next char (which isn't a $). We could just as well have written \def\hyphenate@next#1{#1\hskip...\xHyphen@te} and put that with no args in the "else" branch, but (in my opinion) \sw@p is more general (and I'm surprised it's not in standard LaTeX already).
There is a contrib package on CTAN that deals with typesetting DNA sequences. It does a little more than just line-breaking, for example, it also supports colouring. I'm not sure if it is possible to get the output you are after though, and I have no experience in the DNA-sequence-typesetting area, but is one long string the most readable representation?
Assuming your string is the same, in your preamble, use the \newcommand{}{}. Like this:
\newcommand{\myDNA}{blah blah blah}
if that doesn't satisfy your requirements, I suggest:
Break the string down into the smallest portions, define a \newcommand for each one, and then call the new commands in sequence, e.g. \myDNAa \myDNAb. (Control-word names can only contain letters, so \myDNA1 would be read as \myDNA followed by the digit 1.)
If that still doesn't work, you might want to look at writing a Perl script to satisfy your string replacement needs.

Why does TeX/LaTeX not speed up in subsequent runs?

I really wonder why even recent TeX/LaTeX systems do not use any caching to speed up later runs. Every time I fix a single comma*, calling LaTeX costs me about the same amount of time, because it needs to load and convert every single picture file.
(* I know that even changing a tiny comma could affect the whole structure but of course, a well-written cache format could see the impact of that. Also, there might be situations where 100% correctness is not needed as long as it’s fast.)
Is there something in the TeX language that makes this complicated or impossible to accomplish, or is it just that the original implementation of TeX had no need for this (because it would have been slow anyway on those large computers)?
But then, on the other hand, why doesn’t this annoy other people so much that they’ve started a fork with some sort of caching (or transparent conversion of TeX files to a format which is faster to parse)?
Is there anything I can do to speed up subsequent runs of LaTeX, other than putting all the stuff into chapterXX.tex files and then commenting them out?
Let's try to understand how TeX works. What happens when you write the following?
tex.exe myfile.tex
TeX reads your file byte by byte and converts each character into a pair <category code, character code>. The category code tells TeX whether the character is, for example, an opening brace ({), the math shift that enters math mode ($), an active character that behaves like a macro (~, for example), or a letter (A-Z, a-z).
When TeX, in vertical mode, gets a character with category code 11 (letters) or 12 (other symbols: digits, comma, period), it starts a paragraph. Suppose you want to cache all paragraphs.
Now suppose you changed something in your document. How can TeX check that all paragraphs after your change are the same? Maybe you changed the category code of some character. Maybe you changed the meaning of some macro. Or you removed a } somewhere and thus changed the current font.
To be sure that a paragraph is the same, you must be sure that all characters in the paragraph are the same, that all their category codes are the same, that the current font is the same, that all math fonts are the same, and that the values of some internal parameters are the same (for example, \hsize, \vsize, \pretolerance, \tolerance, \hyphenpenalty, \exhyphenpenalty, \widowpenalty, \spaceskip, and so on).
You can only be sure that all paragraphs before your change are the same. But in that case you must save TeX's complete state after each paragraph.
Your SuperCachedTeX system would be very complicated, wouldn't it?
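These category codes can be inspected directly. A minimal plain TeX sketch (run it through plain TeX), assuming the default plain TeX category codes; the numbers in the comments are what the \message lines print:

```latex
% Print a few category codes to the terminal and the log file.
\message{opening brace: \the\catcode`\{}  % 1  (begin-group)
\message{dollar: \the\catcode`\$}         % 3  (math shift)
\message{tilde: \the\catcode`\~}          % 13 (active character)
\message{letter A: \the\catcode`\A}       % 11 (letter)
\message{comma: \the\catcode`\,}          % 12 (other)
\bye
```

Any of these values can be changed with \catcode at any point, which is exactly why a cache cannot assume that the same bytes produce the same tokens.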
If you're using pdftex, then you can use --draftmode on the command line for the first runs. This instructs pdftex not to generate a PDF.
Of course lots of things could be cached (like graphics information, for instance), but the way TeX works makes it hard to do. There is a rather complex initialization of TeX when it starts up, and one TeX run always means exactly one PDF written out. In order to do caching, you need to keep the data in memory (to be efficient).
You could use IPC and talk to a daemon to get the cached information, but that would involve a lot of programming. For normal purposes TeX is so blazingly fast that this would not gain much. On the other hand, this is a good question, as I have seen LaTeX runs (on current hardware) that take more than 10 hours and would have benefited from caching.
Yet another answer, not strictly related:
You can use the LaTeX macro \include{...} and with \includeonly{} you can rerun your document for a subset only. But this is not caching, nor does it give you the complete document.
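A minimal sketch of that workflow; the file names chapter1.tex and chapter2.tex are hypothetical:

```latex
\documentclass{book}
% Only chapter2 is typeset on this run. Page numbers and cross-references
% for the other chapters are taken from their existing .aux files, so the
% whole document should be compiled once without \includeonly first.
\includeonly{chapter2}
\begin{document}
\include{chapter1}
\include{chapter2}
\end{document}
```

Note that \include always starts a new page, so this works best when the parts are chapters.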
There are solutions such as preview-latex, which precompile stuff into a dedicated format file for speed. You need to remember that TeX optimises pages on a local basis. There is no concept at the engine level of material being fixed on a particular page, so you can't just "re-TeX one page".
Actually, the correct answer is (IMO): LaTeX already caches information in its auxiliary files (.aux, plus additional files for other packages). So if you add a comma, this information is reused, and thus the typesetting run is much faster than without this .aux file.
TeX does have a caching facility, called format files, and I think, pace Alexey's valuable summary of the problems of representing TeX's state, it should be possible to use them to allow TeX to resume after any page eject.
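For readers unfamiliar with format files, this is roughly how one is made with plain TeX: iniTeX loads the macros, and \dump writes the resulting state to a .fmt file that later runs load instead of re-reading the macros. The file name mypreamble.tex and the \hello macro are hypothetical, and the command lines in the comments assume a web2c-based TeX such as TeX Live:

```latex
% mypreamble.tex -- build the format with:   tex -ini mypreamble
% then typeset a document with:             tex -fmt=mypreamble doc
\input plain   % load the plain TeX macros into the fresh iniTeX
\def\hello{Hello from a precompiled format}
\dump          % write mypreamble.fmt and stop
```

A document typeset with that format could then be as short as \hello\bye. What a format cannot hold is the state in the middle of a document, which is the hacking the next paragraphs are about.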
The major issue is that page breaks affect paragraphs or floats, and these may not occur at a particular point in the text, but may occur during the execution of macros whose behaviour depends on the transient state passed to them when they were invoked.
So to make the idea of creating "breakpoints" work, one would need to hack TeX internals to dump additional information, beyond that normally dumped in format files, and package it up with the state of the auxiliary files. Given what Joseph says about TeX fragment previewers, why would anyone bother hacking TeX to do this?

When you write TeX source, how do you use your editor's word wrap?

Do you use "hard wrapping" (either yourself or automatically by your editor) by inserting newlines into your source document at a certain line length, or do you write your paragraphs in one continual line and let your editor "soft-wrap" for you?
Also, what editor do you use for this?
Note: I'm interested in how you wrap lines in your TeX source code (.tex file, general prose), not how TeX wraps lines for the final document.
I recently switched to hard-wrapping per sentence (i.e., newline after sentence end only; one-to-one mapping between lines and sentences) for two reasons:
softwrap for a whole paragraph makes typos impossible to spot in version control diffs.
hardwrapped paragraphs look nice until you start to edit them, and if you re-flow a hard wrapped paragraph you end up with a whole bunch of lines changed in the diff for a possibly one word change.
Only wrapping per sentence fixes these two problems:
Small changes are comparatively easy to spot in a diff.
No re-flowing of text, only changes to, insertions of, or removal of single lines.
Looks a bit weird when you first look at it, but is the only compromise I've seen that addresses the two problems of soft and hard wrapping.
Of course, if you're working collaboratively, the answer is to use whatever the other people are using.
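Per-sentence wrapping is safe because TeX treats a single end-of-line like a space and only a blank line ends a paragraph. A small illustration with placeholder sentences:

```latex
\documentclass{article}
\begin{document}
% One sentence per line: each newline below becomes an ordinary
% interword space, so these three sentences form a single paragraph.
This is the first sentence of the paragraph.
This is the second sentence.
Editing it changes only this line in a diff.

% The blank line above ends the paragraph.
This sentence starts a new paragraph.
\end{document}
```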
I use Emacs (with AUCTeX). After editing or writing a paragraph, I hit M-q to hard-wrap it. It also handles indenting items and formats commented paragraphs. I don't like soft wraps, because they are visually indistinguishable from real newline characters, but behave differently.
I generally let my LaTeX editor softwrap the lines. I think part of it is due to the fact that I had some bad experiences with significant whitespace when I was first learning LaTeX, and part of it is because I don't like heavily-jagged right-margins when I'm editing the text file.
Depending on what OS you use, I recommend WinEdt (Windows) or Kile (Linux). Both of these soft wrap, and there is no need for hard wraps. (That is, I leave my paragraphs as long lines in the source.) LaTeX sorts out line breaks in the output, and when I read the source, I use my editor.
The only possible reason to use hard line breaks is to make it easier to find errors in the code (which the compiler reports by line number), but they are generally not hard to find; if it's mainly text, errors are rare anyway.
Typically I have my editor insert newlines. That is, I try not to hit the "enter" key for a new line, but when the editor soft-wraps, it actually inserts a newline character.
I use vim to accomplish this, and I don't know whether other editors have this feature or how it works there. Specifically, I use the wrapmargin feature.
I typically try to keep my lines of code (TeX or otherwise) at most n characters long for clarity and consistency. I tend to go with 80 characters, but that is up to you.
More vim-related line-breaking docs:
http://www.vim.org/htmldoc/usr_25.html
http://www.vim.org/htmldoc/options.html#%27textwidth%27
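If you want the same line length in every .tex file without touching your vimrc, one option is a vim modeline inside a TeX comment. This sketch uses 'textwidth' (the option from the second link) rather than 'wrapmargin', and it assumes modelines are enabled in your vim configuration and that 'formatoptions' keeps its default automatic wrapping:

```latex
% The modeline must be within the first or last five lines of the file.
% vim: set textwidth=80:
\documentclass{article}
\begin{document}
Text typed here in vim is wrapped at 80 columns.
\end{document}
```

To TeX the modeline is just a comment, so it has no effect on the output.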
I tend to do hard-wrapping with TeX, but that's rooted more in my obsession with text formatting than any real gain of efficiency. One major thing that I don't like about soft-wrapping is that it tends (in my opinion, obviously) to make things harder to read by wrapping in semantically-random places.
Although I would prefer to use soft wrapping, I end up using hard wrapping for one practical reason: all of my collaborators do the same. So when I work on an article with someone, it would be a big pain for me to soft wrap while the other person hard wraps. The second reason is that until recently Emacs could only handle hard wrapping properly. Emacs 23, which I currently use, changes this, but it will be a long time before everybody upgrades to 23 so that I can sneak soft-wrapped texts to them.
The way I actually use hard wrapping is to have auto-fill-mode turned on. Furthermore, M-q is bound to LaTeX-fill-paragraph (in AUCTeX mode - but I don't remember if this is a standard binding or one of my bindings - I'm pretty sure it's the latter). Combining these two, I manage to keep my TeX source more or less decently formatted.
By the way, I have heard the suggestion to always start a new sentence at the beginning of a line. In other words a period at the end of a sentence should be followed by a hard return. The benefit is that it works well with version control systems since changes to a sentence can remain localized. I think that this is in principle a nice idea but I have not managed to use it because of my obsessive-compulsive usage of M-q.
I use Kile under Linux with hard wrapping (called static word wrap in Kile) because apparently everybody in my work environment does it that way. Soft wrapping makes much more sense to me, so if I could choose I would use that rather than hard wrapping.
I mostly work in joe. From time to time I press Enter out of habit, and if it doesn't look good I use auto-format (Ctrl-K J).
Joe has autowrap modes, but I don't even bother.
I use AUCTeX with automatic line breaking switched off, and insert line breaks by hand. I avoid auto-formatting, since I want as few changes as possible to where line breaks occur between edits to the document, which makes diffs less cluttered.
Using a smarter diff, one that doesn't care about tex-irrelevant whitespace, would be better, but that's the tool I use.
I like Will's suggestion of hard wrapping per sentence. I thought about it before, but I am fixed in my habits.
