Replace strings in LaTeX

Replace strings in LaTeX - latex

I want LaTeX to automatically replace strings like " a ", " s ", " z " with " a~", " s~", " z~", because they can't be at line end. Any suggestions?

For Czech typographic rules, there is a preprocessor called Vlna" by Petr Olšák - download . The set of (usually prepositions in czech) is customizable - so it might be usable for other languages as well.

You can use \StrSubstitute from xstring package.
e.g.
\StrSubstitute{change ME}{ ME}{d}
will convert change ME into changed.
Although, nesting is not possible, so to make another substitution you must use an intermediate variable in this way
\StrSubstitute{change ME}{ ME}{d}[\mystring]
\StrSubstitute{\mystring}{ed}{ing}
Finally, your solution would be
\usepackage{xstring}
\def\mystring{...source string here...}
\begin{document}
\StrSubstitute{\mystring}{ a }{a~{}}[\mystring]
\StrSubstitute{\mystring}{ s }{s~{}}[\mystring]
\StrSubstitute{\mystring}{ z }{z~{}}[\mystring]
\mystring
\end{document}
Note the use of the empty string {} to avoid the sequence ~}.

I'm afraid (to the best of my knowledge) this is basically impossible with LaTeX. A LuaTeX-based solution might be possible, though.
It's not actually clear to me, however, that " a ", for example, shouldn't appear at the end of a
line. Although I might be used to different typographic rules.
(Is there anything wrong with the line break in the last paragraph? :))

As far as I know there is no way to do this in LaTeX itself. I'd go for automating this with some external tools, as my typical setup involves a Makefile handling the LaTeX run by itself. This makes it rather easy to run tools like sed on the sources and do some replacements using regular expressions, and a simple rule would do this for your case.
If you use some LaTeX editor that does everything for you you should check the editors regular expression search and replace functionality.

Yes, this is the age old argument of data processing vs. data composition. We have always done these things in a pre-processor environment responsible for extracting the information from its source environment, SQL or plain-text, and created the contents of a \input(file.tex).
But yes, it is possible (TeX is after all a programming language) but you will have to become a wizard. Get the 4 volume set TeX in Practice by Stephan von Bechtolsheim.
The approach would be to begin an environment (execute a macro) whose ''argument'' was all text down to the end of the environment. Then just munge though the tokens fixing the ones you want.
Still, I don't think any of us are advocating this approach.

If you are using TeXmaker to write your LaTeX file, then you may click on the Edit button on the toolbar, then click on Replace.
A dialogue box will come up, and you can enter your strings one after the other.
You put the strings to be changed in the Find text input and what you want it to be changed to in the Replace text input.
You can also specify where you want the replacement to start from.

Click Find and Replace (or similiar) in the menu of your text editor and do it.

Related

LaTeX - Define a list of words to use a certain font

Problem
I'm writing an essay/documentation about an application. Within this doc there are a lot of code-words which I want to be highlighted using a different font. Currently I work with:
{\fontfamily{cmtt}\selectfont SOME-KEY-WORD}
Which is a bit of work to use every time.
Question
I'm looking for a way to declare a list of words to use a specific font within the text.
I know that I can use the listings package and define morekeywords which will be highlighted within the listings-environment but I need it in the text.
I thought of something like this:
\defineList{\fontfamily{cmtt}}{
SOME-KEY-WORD-1,
SOME-Key-word-2,
...
}
EDIT
I forgot to mention that I already tried something like:
\def\somekeyword{\fontfamily{cmtt}\selectfont some\_key\_word\normalfont}
which is a little bit better then the first attempt but I still need to use \somekeyword in the text.
EDIT 2
I came upon a workaround:
\newcommand{\cmtt}[1]{{\fontfamily{cmtt}\selectfont #1\normalfont}}
It's a little better then EDIT but still not the perfect solution.

Substitution every time a word occurs, without providing any clues to TeX, might be difficult and is beyond my skills (though I'd be interested to see someone come up with a solution).
But why not simply create a macro for each of those words?
\newcommand\somekeyword{\fontfamily{cmtt}\selectfont SOME-KEY-WORD}
Use like this:
Hello, \somekeyword{} is the magic word!
The trailing {} are unfortunately necessary to prevent eating the subsequent whitespace; even the built-in \LaTeX command requires them.
If you have very many of these words and are worried about maintainability, you can even create a macro to create the macros:
\newcommand\declareword[2][]{%
\expandafter\newcommand%
\csname\if\relax#1\relax#2\else#1\fi\endcsname%
{{\fontfamily{cmtt}\selectfont #2}}%
}
\declareword{oneword} % defines \oneword
\declareword{otherword} % defines \otherword
\declareword[urlspy]{urls.py} % defines \urlspy
...
The optional argument indicates the name of the command, in case the word itself contains characters like . which cannot be used in the name of a command.

How can I expand Sublime's language syntax understanding to incorporate custom syntax?

I know that sounds vague. Basically I just want Sublime to highlight custom syntax (color the text), just like it does with native syntax.
I am using Sublime to write LaTeX code. For those that don't know, LaTeX equations are typically enclosed by \[ \], e.g.
\[ E = m c^2 \]
Sublime understands that syntax and colors the enclosing code appropriately.
However, I use my custom defined command, \eq{ ... }, which wraps the \[ \] functionality (so I can globally change some settings by just redefining the \eq definition). e.g.
\eq{ E = m c^2 }
I don't know anything about Sublime under the hood beyond basic key bindings. I want to expand Sublime's understanding of syntax to incorporate my custom command without wasting a ton of time digging through tutorials and such.

Since you are mainly interested in the result and not in the reasoning, I will try to be as straight forward as I can.
The LaTeX syntax of Sublime Text will change in release 3119 and I would recommend to use that, if you want to change something.
Just download it from https://github.com/sublimehq/Packages and put the LaTeX folder into the folder, which opens when you select Preferences >> Browse Packages... in the Sublime Text menu.
Afterwards open the file LaTeX.sublime-syntax and search for ensuremath (LaTeX.sublime-syntax#L498). Duplicate that part (everything with a higher indent) and change the command to the command you wish, e.g. in your example this would be - match: '((\\)eq)(\{)'.
Aside the new syntax removes the highlighting of math environments as strings, because this has lead to several problems.
I made a small entry in the LaTeXTools wiki to explain, how you restore the highlight.

Latex - Apply an operation to every character in a string

I am using LaTeX and I have a problem concerning string manipulation.
I want to have an operation applied to every character of a string, specifically
I want to replace every character "x" with "\discretionary{}{}{}x". I want to do
this because I have a long string (DNA) which I want to be able to separate at
any point without hyphenation.
Thus I would like to have a command called "myDNA" that will do this for me instead of
inserting manually \discretionary{}{}{} after every character.
Is this possible? I have looked around the web and there wasnt much helpful
information on this topic (at least not any I could understand) and I hoped
that you could help.
--edit
To clarify:
What I want to see in the finished document is something like this:
the dna sequence is CTAAAGAAAACAGGACGATTAGATGAGCTTGAGAAAGCCATCACCACTCA
AATACTAAATGTGTTACCATACCAAGCACTTGCTCTGAAATTTGGGGACTGAGTACACCAAATACGATAG
ATCAGTGGGATACAACAGGCCTTTACAGCTTCTCTGAACAAACCAGGTCTCTTGATGGTCGTCTCCAGGT
ATCCCATCGAAAAGGATTGCCACATGTTATATATTGCCGATTATGGCGCTGGCCTGATCTTCACAGTCAT
CATGAACTCAAGGCAATTGAAAACTGCGAATATGCTTTTAATCTTAAAAAGGATGAAGTATGTGTAAACC
CTTACCACTATCAGAGAGTTGAGACACCAGTTTTGCCTCCAGTATTAGTGCCCCGACACACCGAGATCCT
AACAGAACTTCCGCCTCTGGATGACTATACTCACTCCATTCCAGAAAACACTAACTTCCCAGCAGGAATT
just plain linebreaks, without any hyphens. The DNA sequence will be one
long string without any spaces or anything but it can break at any point.
This is why my idea was to inesert a "\discretionary{}{}{}" after every
character, so that it can break at any point without inserting any hyphens.

This takes a string as an argument and calls \discretionary{}{}{} after each character. The input string stops at the first dollar sign, so you should not use that.
\def\hyphenateWholeString #1{\xHyphenate#1$\wholeString}
\def\xHyphenate#1#2\wholeString {\if#1$%
\else\say{#1}\discretionary{}{}{}%
\takeTheRest#2\ofTheString
\fi}
\def\takeTheRest#1\ofTheString\fi
{\fi \xHyphenate#1\wholeString}
\def\say#1{#1}
You’d call it like \hyphenateWholeString{CTAAAGAAAACAGGACG}.
Instead of \discretionary{}{}{} you can also try \hspace{0pt}, if you like that more (and are in a latex environment). In order to align the right margin, I think you’d need to do some more fine tuning (but see below). The effect is of course minimised by using a font of fixed width.
Revision:
\def\hyphenateWholeString #1{\xHyphenate#1$\wholeString\unskip}
\def\xHyphenate#1#2\wholeString {\if#1$%
\else\transform{#1}%
\takeTheRest#2\ofTheString\fi}
\def\takeTheRest#1\ofTheString\fi
{\fi \xHyphenate#1\wholeString}
\def\transform#1{#1\hskip 0pt plus 1pt}
Steve’s suggestion of using \hskip sounds like a very good idea to me, so I made a few corrections. Note that I’ve renamed the \say macro and made it more useful in that it now actually does the transformation. (However, if you remove the \hskip from \transform, you’ll also need to remove the \unskip in the main macro definition.
Edit:
There is also the seqsplit package which seems to be made for printing DNA data or long numbers. They also bring a few options for nicer output, so maybe that is what you’re looking for…

Debilski's post is definitely a solid way to do it, although the \say is not necessary. Here's a shorter way that makes use of some LaTeX internal shortcuts (\#gobble and \#ifnextchar):
\makeatletter
\def\hyphenatestring#1{\xHyphen#te#1$\unskip}
\def\xHyphen#te{\#ifnextchar${\#gobble}{\sw#p{\hskip 0pt plus 1pt\xHyphen#te}}}
\def\sw#p#1#2{#2#1}
\makeatother
Note the use of \hskip 0pt plus 1pt instead of \discretionary - when I tried your example I ended up with a ragged margin because there's no stretchability. The \hskip adds some stretchable glue in between each character (and the \unskip afterwards cancels the extra one we added). Also note the LaTeX style convention that "end user" macros are all lowercase, while internal macros have an # in them somewhere so that users don't accidentally call them.
If you want to figure out how this works, \#gobble just eats whatever's in front of it (in this case the $, since that branch is only run when a $ is the next char). The main point is that \sw#p is only given one argument in the "else" branch, so it swaps that argument with the next char (that isn't a $). We could just as well have written \def\hyphenate#next#1{#1\hskip...\xHyphen#te} and put that with no args in the "else" branch, but (in my opinion) \sw#p is more general (and I'm surprised it's not in standard LaTeX already).

There is a contrib package on CTAN that deals with typesetting DNA sequences. It does a little more than just line-breaking, for example, it also supports colouring. I'm not sure if it is possible to get the output you are after though, and I have no experience in the DNA-sequence-typesetting area, but is one long string the most readable representation?

Assuming your string is the same, in your preamble, use the \newcommand{}{}. Like this:
\newcommand{\myDNA}{blah blah blah}
if that doesn't satisfy your requirements, I suggest:
2. Break the strings down to the smallest portion, then use the \newcommand and then call the new commands in sequence: \myDNA1 \myDNA2.
If that still doesn't work, you might want to look at writing a perl script to satisfy your string replacement needs.

Tex command which affects the next complete word

Is it possible to have a TeX command which will take the whole next word (or the next letters up to but not including the next punctuation symbol) as an argument and not only the next letter or {} group?
I’d like to have a \caps command on certain acronyms but don’t want to type curly brackets over and over.

First of all create your command, for example
\def\capsimpl#1{{\sc #1}}% Your main macro
The solution to catch a space or punctuation:
\catcode`\#=11
\def\addtopunct#1{\expandafter\let\csname punct#\meaning#1\endcsname\let}
\addtopunct{ }
\addtopunct{.} \addtopunct{,} \addtopunct{?}
\addtopunct{!} \addtopunct{;} \addtopunct{:}
\newtoks\capsarg
\def\caps{\capsarg{}\futurelet\punctlet\capsx}
\def\capsx{\expandafter\ifx\csname punct#\meaning\punctlet\endcsname\let
\expandafter\capsend
\else \expandafter\continuecaps\fi}
\def\capsend{\expandafter\capsimpl\expandafter{\the\capsarg}}
\def\continuecaps#1{\capsarg=\expandafter{\the\capsarg#1}\futurelet\punctlet\capsx}
\catcode`\#=12

#Debilski - I wrote something similar to your active * code for the acronyms in my thesis. I activated < and then \def<#1> to print the acronym, as well as the expansion if it's the first time it's encountered. I also went a bit off the deep end by allowing defining the expansions in-line and using the .aux files to send the expansions "back in time" if they're used before they're declared, or to report errors if an acronym is never declared.
Overall, it seemed like it would be a good idea at the time - I rarely needed < to be catcode 12 in my actual text (since all my macros were in a separate .sty file), and I made it behave in math mode, so I couldn't foresee any difficulties. But boy was it brittle... I don't know how many times I accidentally broke my build by changing something seemingly unrelated. So all that to say, be very careful activating characters that are even remotely commonly-used.
On the other hand, with XeTeX and higher unicode characters, it's probably a lot safer, and there are generally easy ways to type these extra characters, such as making a multi (or compose) key (I usually map either numlock or one of the windows keys to this), so that e.g. multi-!-! produces ¡). Or if you're running in emacs, you can use C-\ to switch into TeX input mode briefly to insert unicode by typing the TeX command for it (though this is a pain for actually typing TeX documents, since it intercepts your actual \'s, and please please don't try defining your own escape character!)

Regarding whitespace after commands: see package xspace, and TeX FAQ item Commands gobble following space.
Now why this is very difficult: as you noted yourself, things like that can only be done by changing catcodes, it seems. Catcodes are assigned to characters when TeX reads them, and TeX reads one line at a time, so you can not do anything with other spaces on the same line, IMHO. There might be a way around this, but I do not see it.
Dangerous code below!
This code will do what you want only at the end of the line, so if what you want is more "fluent" typing without brackets, but you are willing to hit 'return' after each acronym (and not run any auto-indent later), you can use this:
\def\caps{\begingroup\catcode`^^20 =11\mcaps}
\def\mcaps#1{\def\next##1 {\sc #1##1\catcode`^^20 =10\endgroup\ }\next}

One solution might be setting another character as active and using this one for escaping. This does not remove the need for a closing character but avoids typing the \caps macro, thus making it overall easier to type.
Therefore under very special circumstances, the following works.
\catcode`\*=\active
\def*#1*{\textsc{\MakeTextLowercase{#1}}}
Now follows an *Acronym*.
Unfortunately, this makes uses of \section*{} impossible without additional macro definitions.
In Xetex, it seems to be possible to exploit unicode characters for this, so one could define
\catcode`\•=\active
\def•#1•{\textsc{\MakeTextLowercase{#1}}}
Now follows an •Acronym•.
Which should reduce the effects on other commands but of course needs to have the character ‘•’ mapped to the keyboard somewhere to be of use.

Is it possible to write own "packages" for LaTeX?

As a programmer, I wonder if I could create my own package for LaTeX. I need something like that famous "listings" package, but something that is much more capable for my needs. I look for a listings solution that would watch out for a comment line like
// BEGIN LISTING 3122
// END LISTING 3122
No syntax highlighting, but intelligent support for tab indents. That package then would be used with a file name or path, walk through the lines and copy out just the snippets of interest.
I am 100% sure there is absolutely nothing like this on the market. So I want to program it for LaTeX. If that's possible. I have no idea how and what programming language / IDE. Where would I start looking?

This is certainly possible, but it is non-trivial in the TeX programming language. I don't have time to code it up at the moment but here's an algorithm; I suggest asking on comp.text.tex for more specific LaTeX programming advice.
Locally set all catcodes of special chars to "other" (\dospecials) and start reading in the input file line by line (\read)
for each line compare the first however many tokens of the line (some iterative use of \if or \ifx; there might be a package to make this easier such as stringstrings or xstring)
in the default state throw away the input line and read the next
unless it's // BEGIN LISTING, in which case save each line into a macro (something like \g#addto#macro)
until it's // END LISTING, obviously
keep going until the end of the file (\ifeof)
TeX by Topic is a good reference guide for this sort of work.

The rather simple texments package shows how code can be piped into pdflatex: by writing your shell-invocable filter, you should be able to do something similar with your idea.

I'm pretty certain you can't do this in LaTeX. Basically you can go nuts with anything that's either a command (\foo) or an environment (\begin{foo} ... \end{foo}) but not in the way you are describing here. Within environments or commands it is possible to turn off LaTeX's processing and handle everything yourself in some way. This is how verbatim and listings work. It's not very pretty though.

Basically, I think it might be possible, if you make ‘/’ an active character (there is \makeactive for example but I imagine there are more solutions) and then invent some good magic around it. (You will need to emulate/create an environment with your logic.) Similar things are done in some of the internationalisation packages in order to ease the input of letters with diacritics.
For a character like ‘/’ this might be even harder as this one could have been written in other places of your text, too. So of course you’d have to take special care for that.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart