LaTeX vs DocBook [closed] - latex

Closed 10 years ago.
I have only a little knowledge of LaTeX: basic formatting, basic math formulae, etc. I found that LaTeX is hard to configure to my own taste. Recently, I've heard about DocBook, which is also a typesetting mechanism, but much easier since it uses XML. So, if my main job using LaTeX/DocBook is writing a simple document (not a class book) with some mathematics, and I want easy configuration and a highly customizable application, which one is better? And is there any good book on DocBook?

DocBook isn't "a typesetting mechanism". DocBook is all about separating presentation from content. DocBook only deals with content; it's used to create an abstract representation of a book, article, etc. There are numerous tools out there which lay out DocBook according to predefined templates. Some of these tools use LaTeX. AFAIK, O'Reilly uses a slightly modified version of the DocBook language to author their content, then they feed this XML into custom scripts that integrate with Adobe FrameMaker to lay out their books.
LaTeX is essentially an attempt to separate presentation from content within TeX, but it doesn't quite achieve that goal IMO. Presentation is still mixed with the content in most cases. I think LaTeX is currently the best open source tool for laying out paginated documents. However, proprietary tools like InDesign have many features (like good OpenType support) that TeX doesn't have (XeTeX kind of adds OpenType support). Either way, if you're writing a book, I highly recommend using DocBook to author your content rather than LaTeX.
That said, it sounds like you're writing short, one-off documents with a bit of math. I think LaTeX is probably your best choice. If you need lots of customizability, you might need to use Plain TeX as opposed to LaTeX, but it's going to require quite a bit of work on your part.

Well, I haven't used DocBook, but from a quick look on Wikipedia and Google:
DocBook does not have elements to describe mathematics.
DocBook is XML, as you say. To me, that makes it a horrible thing to write by hand (or, rather, with a basic text editor). Maybe you enjoy writing XML, or have a good IDE. I guess you could look at this question.
DocBook's Wikipedia page lists a couple of books on it which you may want to look at, though I obviously can't say whether they are "good" books.
I would suggest going with LaTeX. Get someone to give you a basic template, then writing LaTeX is as simple as:
\section{Introduction}
This is my introduction.
\section{Stuff}
Here is some stuff.
\subsection{Particular stuff}
A particular type of stuff. With maths:
$\int_{x=1}^n 3x^2 \, dx$
% etc.
Google is your friend for finding basic templates that you can start from:
One
Two
Three
To go from source code to a document, you'll need a working install of LaTeX (which is beyond the scope of this answer, but is pretty easy if you're on Linux). Ideally your LaTeX install will include pdflatex. Then you just run:
pdflatex source.tex
(there's a bit more work if you have a bibliography – but that's a topic for a different question)
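For reference, a basic template might look like the sketch below. The class options, the title and the author name are placeholders chosen for illustration, not something any particular template prescribes; amsmath is loaded only because it is the usual companion for mathematics.

```latex
\documentclass[11pt,a4paper]{article}
\usepackage[T1]{fontenc}   % output font encoding; a common default
\usepackage{amsmath}       % extra maths environments

\title{A Short Document}   % placeholder title
\author{Author Name}       % placeholder author

\begin{document}
\maketitle

\section{Introduction}
This is my introduction.

\section{Stuff}
Here is some stuff.

\subsection{Particular stuff}
A particular type of stuff. With maths:
\[
  \int_{1}^{n} 3x^2 \, dx = n^3 - 1
\]
\end{document}
```

Saved as source.tex, this should compile with the pdflatex command shown above.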

The great thing about DocBook is that it is XML based - so a chapter is a full subtree, a section is a full subtree, etc. In LaTeX, separation is only determined by the structure of the document during a linear scan.
The worst thing about DocBook is that it is XML based - lower-level stuff is extremely dirty and annoying to code manually.

I'm not really familiar with DocBook, though I have used LaTeX fairly extensively. The idea of LaTeX is not to produce a customized document, it's to produce a readable, attractive document. It's a set of libraries, templates, macros, and so forth around TeX, set up by people who know what they are doing when it comes to document design. Of course, you have special needs that they can't anticipate, so you're going to have to do some tweaking, too. It is a very high-level, declarative language that is meant to reflect the content and structure of a document, rather than what it should look like, the idea being that your ideas and how they are organized is what you should concern yourself with, not the layout of your text on the page. If you need more control, there exists a HUGE library of additional styles and macros and so forth (CTAN), and some of them (memoir comes to mind) give you back a lot of that control.
If you are shoving a lot of complicated formatting stuff into the body of your LaTeX document, you're doing it wrong. What you need to do is get your content in there, and your document structured into chapters and sections and subsections semantically, then go back in and worry about formatting. You shouldn't have to go into the body of your document much at this point; it should all be general stuff that applies to the whole document, preferably in a reusable way. This ensures consistency.
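As a rough sketch of that approach, the presentation can be kept in a couple of preamble macros so that the body only says what things are. The macro names \keyterm and \filename are made up for this example; they are not part of LaTeX.

```latex
\documentclass{article}

% Presentation decisions live here, in one place.
% \keyterm and \filename are hypothetical names for this sketch.
\newcommand{\keyterm}[1]{\emph{#1}}
\newcommand{\filename}[1]{\texttt{#1}}

\begin{document}
\section{Introduction}
% The body only states what each thing is, not how it should look.
A \keyterm{document class} is loaded from a file such as \filename{article.cls}.
\end{document}
```

If key terms should later be bold instead of italic, only the one definition in the preamble changes and every use in the body follows.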

Yes, LaTeX is kind of difficult to configure to produce exactly the kind of layout you want. I suggest you take a look at the manual of the LaTeX class memoir to see what kinds of layouts it enables you to produce.
There is a book on DocBook available online. Take a look at that too, to see what kind of layouts you can produce and if you can easily format the math content you want with DocBook.
My suggestion is to go with LaTeX if you have to write any nontrivial math, but of course it depends on which format you find it easier to work with.

About two years ago, I tried to like and use DocBook; however, I returned to LaTeX because, at least at the time, LaTeX produced better quality output (PDFs). I never managed to get the DocBook to LaTeX to PDF translation working. My problems were likely "operator error", but I suggest trying DocBook (and LaTeX) for a few simple documents before choosing one.
Here are a few points that led me to choose LaTeX:
BibTeX for bibliographies with JabRef as a GUI
Excellent quality PDF output
Lots of examples on the Internet, including several similar to my preferred format
Good books, like "A Guide to LaTeX"
If you like GUIs, take a look at LyX.

The real reasons to use DocBook center on having your document marked up meaningfully, being able to validate it, and transform it for many purposes, not only publishing. LaTeX and other macro sets add a layer of semantic markup, but you're always free to introduce TeX code, and add macros from other sources. Fundamentally, a TeX document is a computer program that can only be parsed by a TeX processor.
For maths and DocBook: DocBook being XML it allows you to use other XML technologies as appropriate; in this case MathML. The XMLmind XMLEditor already mentioned provides a GUI maths editor, and includes stylesheets to format them for web and print along with the DocBook contents.
There are also tools available that enable translation of XML documents into other languages (xml2po is a simple one, http://heartsome.net/EN/home.html is a whole suite).

I don't want to go down the "easier" or "better" route, as I regard this as a matter of taste and habit. I see DocBook being XML as an advantage, because it can be transformed into almost anything you like using XSLT. Combined with its self-containedness, it feels more like structuring content than LaTeX does. DocBook is very widely used, especially for documenting open source software. You can easily grab the templates and stylesheets of e.g. Hibernate and/or Spring and tweak them to your needs.
Another aspect I'd like to point out is integration with build systems. For Maven there is a plugin called docbkx, which produces PDF, HTML and whatever else you like from the contents and an appropriate XSLT. No further installations are needed. The only way I have seen to get this done with LaTeX is installing a few packages on the build OS and building your own script around them. IMHO that's not a feasible way to go, especially if you build cross-platform.
Regarding the editor, I can recommend XMLmind XMLEditor, which takes away a lot of the pain and provides quite a nice WYSIWYG approach to DocBook.
If you rely on mathematical expressions, I would also rather choose LaTeX, as there is nothing with the same power available in DocBook.

FWIW I use DocBook via XMLmind (http://xmlmind.com/) to produce HTML and .chm files. I've also set FOP up to produce PDFs, but they aren't pretty.
Having got the DocBook source done, I cook it with xsltproc and the docbook.xsl files. This is protracted and painful to set up, but once it's working it's sweet.
Another approach would be to use pandoc (an extended markdown type tool) to get from markdown to DocBook. This would cut the xml editor out, but you still have to do the transformation(s) to your output format.

Anyone who has had to create a professional, scientific document (research paper, book, technical guide, etc.) will know why TeX is a better choice.
For those who are not aware of some facts, here is a perfect example: at good colleges a student's work may be completely refused if (s)he did not properly reference other people's works. There are, I believe, hundreds of "official" ways of citing and referencing: Harvard has its own, ACM has its own, and among computer scientists numeric (Vancouver) notation is the most common. Many professional organisations have their own styles, and they stick to them. As far as I know, TeX is the only typesetting system that is aware of that, and with the help of BibTeX it becomes an extremely powerful tool for authors. It can save hours, if not days, of work.
If I were a novelist, or the author of some non-technical document, I might choose DocBook.
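To illustrate the point about citation styles, here is a minimal sketch using only the standard BibTeX styles. The file name refs.bib and the key knuth1984texbook are made up for this example; the single entry is written out by the filecontents environment so the example is self-contained.

```latex
% Write a one-entry bibliography database next to the document.
% refs.bib and the citation key are hypothetical names for this sketch.
\begin{filecontents}{refs.bib}
@book{knuth1984texbook,
  author    = {Donald E. Knuth},
  title     = {The {\TeX}book},
  publisher = {Addison-Wesley},
  year      = {1984}
}
\end{filecontents}

\documentclass{article}
\begin{document}
\TeX{} is documented in~\cite{knuth1984texbook}.

% Changing this one line switches the citation and reference format.
% plain, abbrv, alpha and unsrt are the standard styles.
\bibliographystyle{plain}
\bibliography{refs}
\end{document}
```

Running pdflatex, then bibtex, then pdflatex twice resolves the citation. A publisher's or organisation's own style would normally be supplied as a .bst file and selected on the same \bibliographystyle line.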

Have you looked at ConTeXt? It is more flexible and much easier to configure compared to LaTeX.

Arbortext supports native LaTeX. You can send the publishing engine or print composer LaTeX and it'll pass it through. It also supports a lot of other composition languages as well.

Related

Software to identify patterns in text files

I work on some software that parses large text files and inserts data into a database. Every time we get a new client, we have to write new parsing code for their text files.
I'm looking for some software to help simplify analyzing the text files. It would be nice to have some software that could identify patterns in the file.
I'm also open to any general purpose parsing libraries (.NET) that may simplify the job. Or any other relevant software.
Thanks.
More Specific
I open a text file with some magic software that shows me repeating patterns that it has identified. Really I'm just looking for any tools that developers have used to help them parse files. If something has helped you do this, please tell me about it.
Well, likely not exactly what you are looking for, but clone detection might be the right kind of idea.
There are a variety of such detectors. Some work only on raw lines of text, and that might apply directly to you.
Some work only on the words ("tokens") that make up the text, for some definition of "token".
You'd have to define what you mean by tokens to such tools.
But you seem to want something that discovers the structure of the text and then looks for repeating blocks with some parametric variation. I think this is really hard to do, unless you know sort of what that structure is in advance.
Our CloneDR does this for programming language source code, where the "known structure" is that of the programming language itself, as described specifically by the BNF grammar rules.
You probably don't want Java-biased duplicate detection on semi-structured text. But if you do know something about the structure of the documents, you could write that down as a grammar, and our CloneDR tool would then pick it up.

Pipeline for writing a book for programmers in a collaboration

I'm a member of a group of enthusiast writers who decided to collaborate on a cookbook-style book for a programming language.
We're trying to pick a pipeline for the collaboration.
I like how ProGit is made.
That is Markdown + some custom pre-processing, processed by Pandoc. But I'm concerned that Markdown is too simple for our case.
I have looked at Sphinx, but I have no experience using it.
I know that LaTeX would work — but I'm afraid that it will scare off the contributors. Also it may be too powerful, and too easy to build a byzantine pipeline if you don't have the necessary experience (which I do not).
Please do not suggest solutions where a person has to write XML by hand or must use some specific GUI (optionally available GUIs are good, of course). Commercial and non-cross-platform solutions are not an option either.
It's hard to say whether pandoc's extended version of markdown would be too simple for your case unless you say what features you need. Note also that, if you're able to do a bit of very simple Haskell scripting, you can use the pandoc API to add features.

If TeX is a programming language, how could I start programming in TeX?

I use a Mac. But I also have a PC with Windows 7. So when I want to start programming functionality for LaTeX using TeX, what's my starting point? Is there an SDK and documentation? I couldn't find any book on TeX programming.
Programming something in TeX that isn't a document:
A BASIC interpreter
The Mars rover
So it can be done — it's just an exercise in esoteric, obfuscated programming.
Read Don Knuth's The TeXbook—everything you wanted to know about programming TeX, straight from the source. My favorite chapter is Appendix D: Dirty Tricks.
(Michael Plass, who was Don's student and worked with Don on TeX, told me once that "Don tried very hard not to make TeX a programming language. Unfortunately, he didn't succeed.")
To quote Eric Raymond:
TeX is intentionally Turing-complete (it has conditionals, loops, and recursion), but while it can be made to do amazing things, TeX code tends to be unreadable and painful to debug.
He goes on to say that even if it's possible to program in TeX, it's a really bad idea.
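To give a flavour of what the conditionals and loops mentioned in the quote look like, here is a small sketch that prints the numbers 1 to 5 with the \loop ... \repeat construct. The counter name \examplecounter is made up for this illustration.

```latex
\documentclass{article}
\newcount\examplecounter   % hypothetical counter name for this sketch
\begin{document}
\examplecounter=1
\loop
  \the\examplecounter\space      % typeset the current value
  \ifnum\examplecounter<5        % the conditional controls the loop
    \advance\examplecounter by 1
\repeat
\end{document}
```

Even in a loop this small, the conditional has to sit inside the body and has no closing \fi of its own, which hints at why larger TeX programs get hard to read.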
TeX and LaTeX (its follow-up) are quite old languages nowadays, but are still widespread and often used. Their primary use is found in universities, specifically with mathematics and natural sciences. It's not a programming language like C#, PHP or JavaScript, it's more a document-layout language (a bit like HTML perhaps, without any of the modern events).
The idea behind TeX was to use the computer to calculate how the text would be best laid out on the paper when you print it. That means that in TeX, you lay out a table, but you don't say anything (or little) about its size. The TeX compiler will take care of that.
When it comes to books or sites, there are a great many of them. Try Amazon, for instance this Guide To LaTeX.
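The point about tables can be seen in a small sketch: the source below says which columns exist and how they are aligned, but gives no widths. The cell contents are placeholders.

```latex
\documentclass{article}
\begin{document}
% Two columns: left-aligned and right-aligned. No widths are given;
% TeX measures the cells and sizes each column to its widest entry.
\begin{tabular}{l r}
\hline
Item & Quantity \\
\hline
Short & 1 \\
A considerably longer entry & 1000 \\
\hline
\end{tabular}
\end{document}
```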
Forget about TeX, use LaTeX. It's the same, it's easier, and it's more widespread. LaTeX is TeX. But TeX is not LaTeX.
Let me suggest Prof. Knuth's site.
There are all the books you would want to have on TeX.
Please see the answer I gave you to your previous question.
You could also try starting out with the expl3 programming language, which is a layer on top of TeX with more consistent syntax, more abstracted data structures, and quite a deal more in-built functionality for performing programmatic tasks than LaTeX's kernel. Disclaimer: I'm involved with its development.
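As a small taste of expl3, the sketch below defines a function that squares an integer expression and then exposes it under a document-level name. The names \example_square:n and \ExampleSquare are invented for this illustration; \cs_new:Npn, \cs_new_eq:NN and \int_eval:n are part of expl3 itself.

```latex
\documentclass{article}
\usepackage{expl3}   % already built into recent LaTeX kernels; harmless there

\ExplSyntaxOn
% \example_square:n is a hypothetical function name for this sketch.
% The ":n" suffix records that it takes one braced argument.
\cs_new:Npn \example_square:n #1
  { \int_eval:n { (#1) * (#1) } }
% Give it a name that can be used outside the expl3 syntax regime.
\cs_new_eq:NN \ExampleSquare \example_square:n
\ExplSyntaxOff

\begin{document}
The square of 12 is \ExampleSquare{12}.
\end{document}
```

The naming convention (module prefix, function name, argument signature after the colon) is the "more consistent syntax" referred to above.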
Just to add to the other good answers: as TeX is a typesetting system, there are no IDEs in that sense. There are lots of TeX-aware editors (I use TeXworks, but Emacs is obviously a popular choice, particularly on Linux, while TeXnicCenter is very common on Windows).
If you ever need to write an essay or a paper or anything that you want to look nice and publishable, LaTeX is definitely your friend. I recommend More Math into LaTeX, as that book really helped me in the past. Wikibooks also has an excellent LaTeX guide (on top of being free!).
If you're already familiar with the Eclipse IDE, then the TeXlipse plugin is great for beginners, since it can show the user immediate feedback and documentation.

WYSIWYG vs WYSIWYM

Which one is better and ideal in a web based application?
Edit:
Actually I am developing a community site, so the level of users may vary. I have heard about XSS security issues with WYSIWYG editors. Also, I am not familiar with WYSIWYM editors and their features. As far as I know, WYSIWYM editors have fewer features than WYSIWYG ones. I found one named "WMD: The Wysiwym Markdown Editor". It's quite easy to use.
So security and ease of use should both be there. In such a situation, which editor will be better?
If your users can handle WYSIWYM, I'd go with that.
I'm assuming your system will be visual, that is, if you say something is a title it'll look like a title (otherwise the WYS part wouldn't apply). If the user has to manually type markup, then only the most savvy or technical users will be able to handle it.
What I've seen with most users is that they have trouble giving meaning to what they want in a document. They don't think "this is a title", they think "this should be bigger and bold". People that cannot think "this is a title" can't handle a WYSIWYM or they'll find it hard.
Who is going to be your user base? If it's people writing academic papers I'd go with WYSIWYM because they'll have no trouble handling it. If it's for house wives writing recipes, they may not be able to handle it or they'll find it so hard that they'll decide it's not worth the effort.
For me the ideal is WYSIWYM, but do it only if you think your target users will be able to handle it, otherwise you'll have to go with WYSIWYG.
I personally love the WYSIWYM mechanism. I use it for my own work as much as possible. I like it so much that I try to get others to try it too.
Boy, that goes over like a fart in a space-suit.
My cynical self assumes this to mean that most folks are ruined by tools like Word. Everyone knows to make a meaningful document. They also know what a meaningful document looks like. If it doesn't look like that, the tool is wrong! What's actually happening is these document producers don't actually know what they mean, and are used to hiding that fact with pretty borders and adjusting tab-stops.
What I really think is happening, though, is that these folks who are resistant to WYSIWYM are that way because it's a harder way of thinking about something they already invested in learning. This is a level of abstraction above WYSIWYG, though not quite as far removed as composing documents in markup like LaTeX or HTML. And since they can already create any sort of document in a tool that requires no abstraction, it's just a hard sell.
That being said, I think you should force WYSIWYM on your users if that is feasible. There are some good reasons for this:
All of the benefits that naturally come with two stage composition. Formatting is not decidable until the document is composed, so any time spent before the document is finished on formatting is time wasted. Get it composed quickly.
The document is marked up with semantic information. This can be used in searches, or for other tasks that strictly visual markup cannot. This is especially useful for accessibility.
By depriving your users of arbitrary formatting decisions, all of your documents will follow company branding. Everything will be in a standard font and color. All text will use the same spacing and height. It will look to readers like it came from a single entity.
Check this
http://www.wymeditor.org/
Could you be a bit more specific? What kind of web application? How many users? Who will the users be?
In general though, I've had quite a bit of experience implementing WYSIWYG editors for various CMSs and found them to be quite problematic, because clients often like to go wild with formatting their content over and over again and often end up having the editor generate HTML of poor quality. This causes all sorts of layout issues or simply pages that look really messy, because everyone likes to fancy themselves as graphic designers.
If done properly WYSIWYG can work very well, but it is more work to really get it right, especially when taking CSS into consideration. Most of the good editors are nicely configurable and allow you to specify just how much control to give the client over visual formatting.
As for the quality of the code they generate, tools such as FCKEditor and TinyMCE are very mature and do a good job of editing out the irrelevant crud in the source code, but be prepared to provide support for clients using a WYSIWYG when their content doesn't look the way they would like it to.
Since WYSIWYM editors are much like WYSIWYG with structural formatting instead of visual formatting, philosophically I think they are better and less problem prone. So if the client doesn't have a need to visually format the content I think WYSIWYM is bound to cause less headaches down the road.
The editor used here in Stack Overflow is a good example of restrained WYSIWYG. You can format the content visually but only to a certain extent.
If your users are tech-savvy, understand the basics of markup, and you think they will feel more empowered using WYSIWYM, then use that. If your application is going to be used by people who have little technical knowledge, use WYSIWYG.
Unless you are building a tool for print layout (i.e. InDesign or maybe a mailing list printing tool), you are probably better off sticking with WYSIWYM.
It is likely to be easier to implement
Web browsers are highly configurable and you may not have fine grained control over items like font size.
The structure is explicit so rendering to different media is comparatively easy.
It avoids the temptation on the part of users to over-design the document or whatever they are entering.
Document structure facilitates indexing, table-of-contents generation and cross-referencing where this is relevant. Compare (for example) maintaining a large index in Word with doing it in LaTeX or FrameMaker.
Anecdotal experience from LaTeX users (in particular) suggests that organising a document by structure is likely to produce a better document.
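The indexing and cross-referencing point can be sketched in LaTeX as follows. The label name and the index entry are chosen only for illustration.

```latex
\documentclass{article}
\usepackage{makeidx}   % standard index support
\makeindex             % write index entries to an .idx file

\begin{document}
\tableofcontents       % generated from the \section commands below

\section{Structure}\label{sec:structure}
Semantic markup\index{markup!semantic} records what each part of a text is.

\section{Presentation}
% Section and page numbers are filled in automatically.
See Section~\ref{sec:structure} on page~\pageref{sec:structure}.

\printindex            % typeset the sorted index
\end{document}
```

Running pdflatex, then makeindex on the generated .idx file, then pdflatex again produces the contents, the cross-reference and the index without any of them being maintained by hand.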
From my experience with WYSIWYM, I was very seduced by the idea and the looks, but I was then disappointed to find that the editor didn't give me a simple and efficient way to restrict the user. For example, the user can insert images inside paragraphs, and I don't want that. I want more control over what the user can do.

Latex styles - what do you use and where to find them

What Latex styles do you use and where do you find them?
The reason I'm asking this is that it seems that some 99.9999% of all styles on the internet are copies of each other and of a physics exam paper.
However, when you try to find a style for a paper like this one... Good luck, you are never going to find it.
Creating your own style is often not really an option, because it requires you to dig quite deep into the very advanced features of TeX/LaTeX and fighting your way against possible incompatibilities with document classes/packages/whatnot.
LaTeX was originally designed as a reasonably flexible system on which a few standard classes were distributed — that were themselves rather inflexible.
In the current state of affairs, if you want a custom layout, you need to write a fair amount of supporting code yourself. How else would it happen? It's not like HTML+CSS gives you templates to work with; you need to implement the design yourself.
Creating your own style is often not really an option
Ah, well, not unless you know how to program in LaTeX!
Seriously, it all depends on knowing where to start and what to build on top of. That catalogue you give as an example would, in my opinion, be reasonably easy to do in LaTeX; it's just a bunch of boxes.
You could write something like
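% Requires \usepackage{graphicx} in the preamble for \includegraphics.
% Arguments: #1 title, #2 image file, #3 characteristics, #4 application.
\newcommand\catalogueEntry[4]{%
\parbox[t]{0.23\linewidth}{\textbf{#1}}%
\hfill
% Scale the image to the column; inside a \parbox, \linewidth is the box width.
\parbox[t]{0.23\linewidth}{\includegraphics[width=\linewidth]{#2}}%
\hfill
\parbox[t]{0.23\linewidth}{\textbf{Characteristics}\\ #3}%
\hfill
\parbox[t]{0.23\linewidth}{\textbf{Application}\\ #4}%
}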
and use it as so
\catalogueEntry{Spotlights}{spotlight.jpg}
{Eclipse spotlights are...}
{Narrow to medium...}
This is just a basic illustration of what could be knocked up quickly — much more sophistication could be used to turn this into a more flexible system.
I see LaTeX as an extensible markup system. If you separate your markup from its presentation on the page, it's not too hard to get your information represented in whichever form you wish. But getting started is a little tricky, I have to admit; the learning curve for LaTeX programming can be rather steep.
Memoir is a more flexible document class than the default ones, and its manual is excellent.
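As a hedged illustration of how memoir exposes layout choices, the sketch below picks a paper size, a chapter style and a page style with one line each. The particular choices (a5paper, bringhurst, ruled) are just examples of options the class ships with; the manual lists many more.

```latex
\documentclass[a5paper,11pt,oneside]{memoir}

\chapterstyle{bringhurst}   % one of memoir's predefined chapter styles
\pagestyle{ruled}           % one of memoir's predefined page styles

\begin{document}
\chapter{Introduction}
Some text.

\section{Details}
More text.
\end{document}
```

Swapping either style name changes the look of every chapter or page without touching the body.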
Well I think CTAN is the best resource for LaTeX and TeX-related stuff. Also lots of scientific organizations provide their own styles, it makes sense to try tracing who was the author/publisher of the paper you like and check their websites.
