Add forbidden words to TexStudio / Latex - latex

I have some words in my language (German) that seem to be valid according to TexStudios spellchecker.
However they must not be used for my thesis (and globally for me at least).
Is it possible to add words to a list, that trigger a (optimally huge) sign "DO NOT USE THIS!" or even prevent compilation in Latex when such words are used?
I'm looking for something like a negative dictionary.
I've seen files like "badwords" or "stopwords" but don't know when/how they are used. I can freely use them although "check for bad words" is on.

In case anyone else has the problem: Badword files are named after the main language. For me it happened that I have "de_DE_frami" as the dictionary set. Hence it did not use the "de_DE.badwords".
For a good highlighting: One can change the appearance in the options dialog (syntaxhighlighting->badwords) and make it e.g. background red, size 200%
I'd still would like to have a "bad" words and a "impossible" words distinction as you can sometimes not avoid "bad" words or they are not bad in all contexts.

Related

LaTeX - Define a list of words to use a certain font

Problem
I'm writing an essay/documentation about an application. Within this doc there are a lot of code-words which I want to be highlighted using a different font. Currently I work with:
{\fontfamily{cmtt}\selectfont SOME-KEY-WORD}
Which is a bit of work to use every time.
Question
I'm looking for a way to declare a list of words to use a specific font within the text.
I know that I can use the listings package and define morekeywords which will be highlighted within the listings-environment but I need it in the text.
I thought of something like this:
\defineList{\fontfamily{cmtt}}{
SOME-KEY-WORD-1,
SOME-Key-word-2,
...
}
EDIT
I forgot to mention that I already tried something like:
\def\somekeyword{\fontfamily{cmtt}\selectfont some\_key\_word\normalfont}
which is a little bit better then the first attempt but I still need to use \somekeyword in the text.
EDIT 2
I came upon a workaround:
\newcommand{\cmtt}[1]{{\fontfamily{cmtt}\selectfont #1\normalfont}}
It's a little better then EDIT but still not the perfect solution.
Substitution every time a word occurs, without providing any clues to TeX, might be difficult and is beyond my skills (though I'd be interested to see someone come up with a solution).
But why not simply create a macro for each of those words?
\newcommand\somekeyword{\fontfamily{cmtt}\selectfont SOME-KEY-WORD}
Use like this:
Hello, \somekeyword{} is the magic word!
The trailing {} are unfortunately necessary to prevent eating the subsequent whitespace; even the built-in \LaTeX command requires them.
If you have very many of these words and are worried about maintainability, you can even create a macro to create the macros:
\newcommand\declareword[2][]{%
\expandafter\newcommand%
\csname\if\relax#1\relax#2\else#1\fi\endcsname%
{{\fontfamily{cmtt}\selectfont #2}}%
}
\declareword{oneword} % defines \oneword
\declareword{otherword} % defines \otherword
\declareword[urlspy]{urls.py} % defines \urlspy
...
The optional argument indicates the name of the command, in case the word itself contains characters like . which cannot be used in the name of a command.

Eggplant : How to read text with special characters like ' _ etc

I am trying to read a text in a given rectangle using readText() function.
The function works correctly except when it has to read some text which has special characters like ' _ & etc.
I tried using validCharacters with readText() function. But it didn't help.
Code -
put ReadText((287,125,810,164),validCharacters:"_-'.ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890") into Login
I tried working with character collections. But that doesn't seem to be right because the text trying to pick is a dynamic text combination of numbers alphabets and a special character. So one cannot create a library of character collection of every alphabet (a-z, A-Z), numbers(0-9) and special characters.
Example of text trying to read:
Login_Userid1_1, Login'Userid1_1
So how do I read such text correctly
Debugging OCR is a bit of an imprecise science. EggPlant has a lot of OCR Parameters to tweak. When designing test cases it's best to try use other mechanisms to gather information whenever possible. ReadText() should be considered a last resort when more reliable methods are unavailable. When I've used it I've often needed to do a lot of trial and error to find the right set of settings, and SearchRectangle to get consistent results. Without seeing exactly what images you are trying to read text from it's difficult to impossible to troubleshoot where the issue might be.
One thing that does stand out to me is that you're trying to read strings that may contain underscores. ReadText() has an optional property IgnoreUnderscores which treats underscores as spaces. By default this property is set to ON. It defaults to ON because some OCR engines have problems identifying underscore characters consistently.
If you want to have ReadText() handle underscores you'll want to explicitly set this property to OFF.
ReadText(rect, validCharacters:chars, ignoreUnderscores:OFF)

Max size for PO file strings

I know that PO / MO files are meant to be used for small strings like button names, labels, etc. Not long text like an About page, etc.
But lately I am encountering a lot of situations that are in the middle. For example, a two sentence call to action. Or a short paragraph.
Is there best practice or "rule of thumb" for when a string is too long to put in a PO file?
update
For "long" text I use partials and include the correct language version. My question is WHEN is it optimal to use one vs the other. I've heard that PO files are "inefficient" for "long" pieces of text. But what does that mean and when is it too "long"? Or is this not a concern?
Use one entry for a self-contained chunk of text; e.g. a sentence as you say.
Two sentences that belong together and don't make sense without each other should be one entry. Why? Because otherwise the translator wouldn't have the context necessary to translate it well. Same goes for a short paragraph, e.g. explaining a setting: if it's inseparable in the code, it should be one entry.
If you encounter a situation where you have lots of long texts regularly (e.g. entire pages or paragraphs of pages), that's usually a sign that you are using an ill-fitting tool. Some people do it, using Gettext for entire articles, but you're better off having separate documents in such cases. But that doesn't seem to be the case here.

Validate words against an English dictionary in Rails?

I've done some Google searching but couldn't find what I was looking for.
I'm developing a scrabble-type word game in rails, and was wondering if there was a simple way to validate what the player inputs in the game is actually a word. They'd be typing the word out.
Is validation against some sort of English language dictionary database loaded within the app best way to solve this problem? If so, are there any libraries that offer this kind of functionality? If not, what would you suggest?
Thanks for your help!
You need two things:
a word list
some code
The word list is the tricky part. On most Unix systems there's a word list at /usr/share/dict/words or /usr/dict/words -- see http://en.wikipedia.org/wiki/Words_(Unix) for more details. The one on my Mac has 234,936 words in it. But they're not all valid Scrabble words. So you'd have to somehow acquire a Scrabble dictionary, make sure you have the right license to use it, and process it so it's a text file.
(Update: The word list for LetterPress is now open source, and available on GitHub.)
The code is no problem in the simple case. Here's a script I whipped up just now:
words = {}
File.open("/usr/share/dict/words") do |file|
file.each do |line|
words[line.strip] = true
end
end
p words["magic"]
p words["saldkaj"]
This will output
true
nil
I leave it as an exercise for the reader to make it into a proper Words object. (Technically it's not a Dictionary since it has no definitions.) Or to use a DAWG instead of a hash, even though a hash is probably fine for your needs.
A piece of language-agnostic advice here, is that if you only care about the existence of a word (which in such a case, you do), and you are planning to load the entire database into the application (which your query suggests you're considering) then a DAWG will enable you to check the existence in O(n) time complexity where n is the size of the word (dictionary size has no effect - overall the lookup is essentially O(1)), while being a relatively minimal structure in terms of memory (indeed, some insertions will actually reduce the size of the structure, a DAWG for "top, tap, taps, tops" has fewer nodes than one for "tops, tap").

When you write TeX source, how do you use your editor's word wrap?

Do you use "hard wrapping" (either yourself or automatically by your editor) by inserting newlines into your source document at a certain line length, or do you write your paragraphs in one continual line and let your editor "soft-wrap" for you?
Also, what editor do you use for this?
Note: I'm interested in how you wrap lines in your TeX source code (.tex file, general prose), not how TeX wraps lines for the final document.
I recently switched to hard-wrapping per sentence (i.e., newline after sentence end only; one-to-one mapping between lines and sentences) for two reasons:
softwrap for a whole paragraph makes typos impossible to spot in version control diffs.
hardwrapped paragraphs look nice until you start to edit them, and if you re-flow a hard wrapped paragraph you end up with a whole bunch of lines changed in the diff for a possibly one word change.
Only wrapping per sentence fixes these two problems:
Small changes are comparatively easy to spot in a diff.
No re-flowing of text, only changes to, insertions of, or removal of single lines.
Looks a bit weird when you first look at it, but is the only compromise I've seen that addresses the two problems of soft and hard wrapping.
Of course, if you're working collaboratively, the answer is to use whatever the other people are using.
I use Emacs (with AUCTeX). After editing or writing a paragraph, I hit M-q to hard-wrap it. It also handles indenting items, and it also formats commented paragraphs. I don't like soft wraps, because they are visually indistinguishable from real newline characters, but behave differently.
I generally let my LaTeX editor softwrap the lines. I think part of it is due to the fact that I had some bad experiences with significant whitespace when I was first learning LaTeX, and part of it is because I don't like heavily-jagged right-margins when I'm editing the text file.
Depending on what os you use, i recommend winedt (windows) and kile (linux). Both of these soft wrap, and there is no need for hard wraps. (That is, i leave my paragraphs as long lines in the source) Latex sorts out line breaks in the output and when i read the source, i use my editor.
The only possible reason to use hard line breaks is to make it easier to find errors in the code (which the compiler indicates by line number) but they are generally not hard to find, if it's mainly text, errors are rare anyway.
Typically I have my editor insert newlines. That is, I try not to hit the "enter" key for a new line, but when the editor soft-wraps, it actually inserts a newline character.
I use vim to accomplish this, and I don't know if other editors have this feature or how they work. In specific, i use the wrapmargin feature.
I typically try to keep my lines of code (TeX or otherwise) at n-characters long for clarity and consistency. I tend to go with 80 characters, but that is up to you.
More vim-related line-breaking docs:
http://www.vim.org/htmldoc/usr_25.html
http://www.vim.org/htmldoc/options.html#%27textwidth%27
I tend to do hard-wrapping with TeX, but that's rooted more in my obsession with text formatting than any real gain of efficiency. One major thing that I don't like about soft-wrapping is that it tends (in my opinion, obviously) to make things harder to read by wrapping in semantically-random places.
Although I would prefer to use soft wrapping I end up using hard wrapping for one practical reason: all of my collaborators do the same. So, when I work on an article with someone it would be a big pain for me to soft wrap while the other person hard wraps. The second reason is that Emacs was until recently able to handle properly on hard wrapping. Emacs 23 which I currently use changes this but it will be a long time before everybody upgrades to 23 so I can sneak soft wrapped texts to them.
The way I actually use hard wrapping is to have auto-fill-mode turned on. Furthermore M-q is bound to LaTeX-fill-paragraph (in the AucTeX mode - but I don't remember if this is a standard binding or one of my bindings - I'm pretty sure it's the latter). Combining these two I manage to keep my TeX source more or less decently formatted.
By the way, I have heard the suggestion to always start a new sentence at the beginning of a line. In other words a period at the end of a sentence should be followed by a hard return. The benefit is that it works well with version control systems since changes to a sentence can remain localized. I think that this is in principle a nice idea but I have not managed to use it because of my obsessive-compulsive usage of M-q.
I use Kile under Linux with hard wrapping (called static word wrap in Kile) because apparently in my work environment everybody do like that. Soft wrapping makes much more sense to me, so if I could choose I would use that rather than hard wrapping.
I work in joe mostly. I from time to time press enter automatically, and if it doesn't look good I press auto-format (ctrl-k j).
Joe has autowrap modes, but I don't even bother.
I use Auctex with automatic line breaking switched off, and insert line breaks by hand. I avoid auto-formatting, since I want as few changes to where line breaks occur between edits to the document, which makes diffs less cluttered.
Using a smarter diff, one that doesn't care about tex-irrelevant whitespace, would be better, but that's the tool I use.
I like Will's suggestion of hard wrapping per sentence. I thought about it before, but I am fixed in my habits.

Resources