Different encoding of LaTeX and BibTeX files

Does LaTeX handle the situation where a .bib file has a different encoding than the .tex file? For instance, the .tex file is in ISO-8859-2 and the .bib file is in UTF-8. Can the encoding be converted on the fly by LaTeX, or is the only way to do it manually?

First of all, according to the LyX wiki, BibTeX can't use UTF-8:
BibTeX does not support files encoded in UTF-8 (i.e., Unicode), which is nowadays the default file encoding on most OSes. The reason is that current BibTeX (v. 0.99c) was released in 1988 and thus predates the advent of unicode. Unless the long-announced BibTeX v. 1.0 or one of the many planned potential successing applications are ready, latin1 (ISO-8859-1) or another 8-bit encoding has to be used for the bib file (this does not affect the LaTeX encoding, which still can be utf8).
Usually, whatever is inside a BibTeX file (book titles, authors, and so on) gets copied verbatim into the LaTeX source code, perhaps with some formatting and case changes.
So your BibTeX file encoding has to match the one used by your LaTeX file, otherwise things get funny. You also can't use babel-provided commands in BibTeX (such as "a for ä, provided by german/ngerman) unless your document loads the right packages.
The canonical way is to make BibTeX files agnostic of any encoding or package issues by always specifying special characters with their appropriate commands.
This basically means that instead of writing ä you would have to use {\"a} if you want to be absolutely sure that it works. This seems to be fairly standard practice.
The BibTeX manual BibTeXing by Oren Patashnik also details this:
BibTeX now handles accented characters. For example if you have an entry with the two fields
author = "Kurt G{\"o}del",
year = 1931,
and if you're using the alpha bibliography style, then BibTeX will construct the label [Göd31] for this entry, which is what you'd want. To get this feature to work you must place the entire accented character in braces; in this case either {\"o} or {\"{o}} will do. Furthermore these braces must not themselves be enclosed in braces (other than the ones that might delimit the entire field or the entire entry); and there must be a backslash as the very first character inside the braces. Thus neither {G{\"{o}}del} nor {G\"{o}del} will work for this example. This feature handles all the accented characters and all but the nonbackslashed foreign symbols found in Tables 3.1 and 3.2 of the LaTeX book. This feature behaves similarly for "accents" you might define; we'll see an example shortly. For the purposes of counting letters in labels, BibTeX considers everything contained inside the braces as a single letter.
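
The rule quoted above can be tried out with a small self-contained document. This is only a sketch: the file name goedel.bib and the key goedel1931 are made up for illustration, and it assumes a LaTeX installation with classic BibTeX. The .bib content is pure ASCII, so it does not depend on the encoding of the .tex file.

```latex
% Write a small ASCII-only .bib file next to the document.
% The file name and the citation key are hypothetical.
\begin{filecontents}{goedel.bib}
@article{goedel1931,
  author  = "Kurt G{\"o}del",
  title   = "{\"U}ber formal unentscheidbare S{\"a}tze der Principia Mathematica und verwandter Systeme I",
  journal = "Monatshefte f{\"u}r Mathematik und Physik",
  year    = 1931,
}
\end{filecontents}

\documentclass{article}
\begin{document}
G\"odel's incompleteness paper~\cite{goedel1931}.

% The alpha style builds its label from the author name and the year.
\bibliographystyle{alpha}
\bibliography{goedel}
\end{document}
```

Run pdflatex, then bibtex, then pdflatex twice. According to the manual passage quoted above, the label for this entry should come out as [Göd31].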

You can change the input encoding on the fly:
\inputencoding{latin2}
\bibliography{mybib}
\inputencoding{utf8}
The \inputencoding command is provided by the inputenc package.
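
For the case in the question (.tex in ISO-8859-2, .bib in UTF-8), the switch could look roughly like the sketch below. The citation key somekey and the file mybib.bib are placeholders, and the example body is kept in ASCII. Note that this only changes how LaTeX reads the generated .bbl file; it does not make BibTeX itself sort or abbreviate UTF-8 names correctly.

```latex
% main.tex, assumed to be saved as ISO-8859-2 (hypothetical file name)
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[latin2]{inputenc}   % encoding of this .tex file

\begin{document}
Some text with a citation~\cite{somekey}. % placeholder key from mybib.bib

\bibliographystyle{plain}
\inputencoding{utf8}     % the .bbl copied from the UTF-8 .bib is read as UTF-8
\bibliography{mybib}
\inputencoding{latin2}   % switch back to the encoding of the .tex file
\end{document}
```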

BibTeX has huge problems with non-ASCII characters, even in the newest version. If you prefer a modern system, I'd like to recommend the combination of biblatex and biber. Both are still in beta stage, but they work quite well even in production environments. With this combination, most problems related to LaTeX bibliographies will vanish. As a side note, the biblatex documentation also contains a section about encoding issues with traditional BibTeX (§ 2.4.3).
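
As a rough sketch of that combination for the question's setup, assuming biblatex and biber are installed and that mybib.bib (a placeholder name) is saved as UTF-8: the bibencoding option tells biblatex the encoding of the .bib file, and biber re-encodes the data to match the document.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[latin2]{inputenc}   % the .tex file is ISO-8859-2
% bibencoding declares the encoding of the .bib file
\usepackage[backend=biber,bibencoding=utf8]{biblatex}
\addbibresource{mybib.bib}      % placeholder file name, saved as UTF-8

\begin{document}
\nocite{*}            % list every entry, just for this demonstration
\printbibliography
\end{document}
```

Compile with pdflatex, then biber, then pdflatex again.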

BibTeX's support for non-ASCII character encodings is unreliable: sometimes it works, most of the time it doesn't, and officially it is not supported.
Personally, in .bib, I stick to plain ASCII and LaTeX magic like \"o. For .tex, if I don't write in English, I keep .tex in UTF-8 with \usepackage[utf8]{inputenc}.

Related

The standard article.sty

...the authors should submit the final version as source files, including a word processor file of the text, such as Word or LateX (If using LaTeX, please use the standard article.sty as a style file and also send a PDF version of the LaTeX file)...
What do they mean by the standard article.sty? Do they mean the llncs format, or which format should I use to write the final version of my paper?
As in the first comment to your question, they mean that you should use the article documentclass if you write in LaTeX. In this case, the first line of your main .tex file would be:
\documentclass[]{article}
with the options (or optional arguments) that you choose, separated by commas, between [ and ].
The user guide for the class lists the options and other details. There is no need to worry about installation: the class is almost certainly included in your LaTeX distribution.
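
A minimal skeleton along those lines might look like this. The options 11pt and a4paper are standard article options chosen only as an illustration; the title, author and text are placeholders.

```latex
% Options go between [ and ], separated by commas.
\documentclass[11pt,a4paper]{article}

\title{Placeholder Title}
\author{Placeholder Author}

\begin{document}
\maketitle

\section{Introduction}
Text of the paper.
\end{document}
```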

@-names in plain TeX

Texinfo adds macros starting with '@'. I'm curious how to do that in plain TeX because I'm trying to create a simple TeX framework for my friends' needs and it would be more readable for them.
Texinfo essentially replaces plain TeX's backslash with the at sign. This is done by setting the so-called \catcode of the at sign to 0.
Note that with this setting, @command means exactly the same as \command; you don't get a second family of command names.
Note also that designing and implementing a new TeX format is a lot of work; it is much easier to resort to one of the existing TeX formats.
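
The \catcode setting described above can be seen in a few lines. The sketch is wrapped in a LaTeX document so that it compiles on its own, but \catcode is a TeX primitive, so the same assignment works in plain TeX. The macro name greeting is made up for the example.

```latex
\documentclass{article}
\begin{document}
% Category code 0 marks an escape character, like the backslash.
\catcode`\@=0

% From here on @name and \name are the same control sequence.
@def@greeting{Hello from an at-sign macro}

@greeting{} and \greeting{} print the same text.
\end{document}
```

Both spellings expand the same macro, which illustrates that the at sign does not create a second family of command names.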

How to read a text file in ancient encoding?

There is a public project called Moby containing several word lists. Some files contain symbols from European alphabets and were created in pre-Unicode times. The readme, dated 1993, reads:
"Foreign words commonly used in English usually include their
diacritical marks, for example, the acute accent e is denoted by ASCII
142."
Wikipedia says that the last ASCII symbol has number 127.
For example, this file: http://www.gutenberg.org/files/3203/files/mobypos.txt contains symbols that I couldn't read in any of the various Latin encodings. (There are plenty of such symbols at the very end of the section of words beginning with B, just before the letter C.)
Could someone please advise what encoding should be used for reading this file, or how it can be converted to some readable modern encoding?
A little research suggests that the encoding for this page is Mac OS Roman, which has é at position 142. Viewing the page you linked and changing the encoding (in Chrome, View → Encoding → Western (Macintosh)) seems to display all the words correctly (it is incorrectly reporting ISO-8859-1).
How you deal with this depends on the language / tools you are using. Here’s an example of how you could convert into UTF-8 with Ruby:
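require 'open-uri'
# URI.open replaces the bare open(url) call, which Ruby 3.0 and later no longer support
s = URI.open('http://www.gutenberg.org/files/3203/files/mobypos.txt').read
s.force_encoding('macroman')  # reinterpret the raw bytes as Mac OS Roman
s.encode!('utf-8')            # transcode the string to UTF-8 in place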
You are right in that ASCII only goes up to position 127 (it’s a 7-bit encoding), but there are a large number of 8 bit encodings that are supersets of ASCII and people sometimes refer to those as “Extended ASCII”. It appears that whoever wrote the readme you refer to didn’t know about the variety of encodings and thought the one he happened to be using at the time was universal.
There isn’t a general solution to problems like this, as there is no guaranteed way to determine the encoding of some text from the text itself. In this case I just used Wikipedia to look through a few until I found one that matched. Joel Spolsky’s article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) is a good place to start reading about character sets and encodings if you want to learn more.

MLA-style bibliography with BibLaTeX: How to organise by section?

I'm using the MLA authoring style. I would like to print out a bibliography subdivided into different sections. I also want annotations on each source. Is this possible with BibLaTeX? Should I just do it manually?
Yes, I think you can do that with Biblatex, but I think you should still just do it manually.
Note, though, that you will probably want to craft your notes differently for each citation from one paper to the next, which leads to the question: why use BibTeX at all? You can build the bibliography with BibTeX the usual way until all the references are there, then cut and paste the contents of the .bbl file into place in your LaTeX file, and annotate and reformat away to your heart's content.
So I think that BibTeX makes sense as a standard repository of the basic facts about citations you might make again and again; in particular, you can get it error-free. My experience as a scientific editor is that although most authors are sure that their bibliographies are error-free, most have errors in between 10% and 60% of their entries. LaTeX users tend to be better than Word users in this respect, and I think that is because of BibTeX.
Caveat: you will need to mess about with the thebibliography environment to do this. But that is another question... Also, if there are errors in your BibTeX file, you will need to correct them in two places.
Why I don't like biblatex: the BibTeX representation is a standard, and is accepted by all kinds of other document processors. You shouldn't put special LaTeX formatting into your bibliographic database: that will reduce the utility of that database. For me in particular, I use both LaTeX and ConTeXt: both use BibTeX, but only LaTeX uses biblatex.
I managed to write a quite nice MLA-style bibliography with BibTeX and the style provided by Reed College (which is based on natbib), and bibunits to subdivide the entries into sections (as discussed here)
(let me know if you have any tips with MLA styles, my paper is not finished yet)
EDIT: my answer was for standard bibtex, not biblatex, sorry
Yes, you can do it easily with biblatex, using custom headings.
For instance:
\defbibheading{general}{\section*{General Architecture}}
\defbibheading{european}{\section*{European Architecture}}
\printbibliography[heading=general,keyword=general]
\printbibliography[heading=european,keyword=european]
and add the matching field, keywords = {general} or keywords = {european}, to the entries in your *.bib files.
There is also an MLA style for biblatex, biblatex-mla, if you need one (see also a related question; you may face the same problem).
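
Put together, a complete test document could look like the following sketch. The two .bib entries are invented placeholders, not real references, and the file name sections.bib is arbitrary. It uses the default biblatex style rather than an MLA one, and assumes biber as the backend. Each entry carries a keywords field that matches the keyword filter of one \printbibliography call.

```latex
% Placeholder entries, invented purely for this demonstration.
\begin{filecontents}{sections.bib}
@book{general-example,
  author    = {Example, Anna},
  title     = {A Placeholder Survey of Architecture},
  publisher = {Placeholder Press},
  year      = {2001},
  keywords  = {general},
}
@book{european-example,
  author    = {Sample, Bob},
  title     = {A Placeholder History of European Buildings},
  publisher = {Placeholder Press},
  year      = {2005},
  keywords  = {european},
}
\end{filecontents}

\documentclass{article}
\usepackage[backend=biber]{biblatex}
\addbibresource{sections.bib}

% One named heading per bibliography section.
\defbibheading{general}{\section*{General Architecture}}
\defbibheading{european}{\section*{European Architecture}}

\begin{document}
\nocite{*}  % include every entry without citing it in the text
\printbibliography[heading=general,keyword=general]
\printbibliography[heading=european,keyword=european]
\end{document}
```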

Icelandic, utf8 and utf8x in LaTeX

First of all, what's the difference between utf8 and utf8x in
\usepackage[utf8]{inputenc}
\usepackage[utf8x]{inputenc}
when used in LaTeX?
Secondly, what packages are required when writing an article in Icelandic using LaTeX? I found:
\usepackage[icelandic]{babel}
\usepackage[T1]{fontenc}
\usepackage[utf8x]{inputenc}
after experimenting a bit, but I have a feeling some part of the code may be redundant. And even with the aforementioned packages, the code inside
\begin{lstlisting}
...
\end{lstlisting}
isn't rendered with Icelandic characters when compiled with pdflatex on Ubuntu, although it works on my friend's computer (who's running Debian). What's missing?
The utf8 option is "supported" by the LaTeX team and covers a fairly specific/limited range of Unicode input characters. It only defines those symbols that are known to be available with the current font encoding.
The utf8x option, AFAIK, is no longer supported, but covers a much broader range of input symbols. I would recommend only trying it if utf8 doesn't do what you need.
Secondly, the listings package (and most other related packages that do character scanning) does not support UTF-8 input. (If it's working on a friend's machine they must be using an 8-bit input encoding instead.) The listingsutf8 package provides a UTF-8-compatible replacement for \lstinputlisting but not for the main lstlisting environment. Using XeLaTeX might help you here, however.
