-
Notifications
You must be signed in to change notification settings - Fork 1
Home
This is a class designed for those that want to use LaTeX to publish material on the internet. As it becomes more common to publish material via some content-management system, so it becomes rarer to generate (X)HTML documents directly and content for the web is usually written in some limited markup language. This class is designed to make it possible to author material for such systems using the facility of LaTeX.
It is important at the outset to know what this class is designed for and the easiest way to do that is to explain what it is not for. It is not intended as a way to take an already authored LaTeX document and convert it into a format suitable for putting on the internet. It is also not intended as a "plugin" for a content-management system so that authors can use LaTeX as the input format for their blog, forum, wiki, or whatever.
The first of these is both impossible and undesirable. It is impossible for the following simple reason. Because TeX controls the whole process of creating its output, it can do some amazing things. A web document, on the other hand, is extremely malleable and the reader can transform it considerably; therefore, not everything that TeX can do can be (easily) done on the web. It is, via a considerable amount of trickery, to get quite close but only at the expense of this malleability. And this flexibility is a good thing as it allows the reader to make the document as easy for them to access as they can. This is why it is undesirable to support absolutely everything that LaTeX can do.
The second, the plugin, is also undesirable. There is a good reason that the current input formats were chosen: they are simple. They are simple to understand by a human: when writing on a wiki, it would be extremely inconvenient to have to learn the current page's set of macros so there is an advantage to having them consistent across the whole system. They are also simple to understand by a computer: parsing a file in, say, markdown is extremely quick and can be done in real time by scripting languages such as Perl and PHP. Parsing TeX is much harder due to its complexity, and so either the scripting language has to limit the syntax or it has to pass it to an external program, both of which can put severe limitations on the system.
So this package is designed for someone who wants to write a web page, not necessarily directly, using their LaTeX skills. They want the full flexibility of LaTeX together with its familiarity (presumably they write other documents in LaTeX already) but know at the outset that the document will end up on the web.
Usage of this package is extremely simple. It is designed as a class and so should be loaded with a simple:
\documentclass{internet}
There are various options available for the class. These are used to determine what type of output to produce.
The first pair of options switch between a normal LaTeX document and a "special" one. In addition to making it easy to produce a PDF version of the final document, it is often an easier workflow to use the PDF when writing as it is simple to view. The two options that control this output are:
-
doc
This is the default and processes the document as if it were a LaTeX document. -
text
This produces the alternative output.
At the next level, one needs to select the desired output format. These are further split into two groups: a main format and a mathematical format. The main formats are as follows:
-
markdown
The standard Markdown format. -
markdownextra
The Markdown Extra format, which extends the Markdown format. -
maruku
This is the Ruby implementation of Markdown, which extends Markdown Extra. -
xhtml
This produces XHTML code. -
epub
This produces an ePub3 document. -
wordpress
This is basically themarkdownextra
format, but with a couple of modifications for a wordpress blog. -
instiki
This is suitable for theinstiki
wiki system. Note that this also selects theitex
maths format.
The mathematical formats are currently quite limited:
-
itex
This modifies the mathematics to be suitable for processing by theitextomml
program.
These can take options, which are passed on to the appropriate module (and where one module is built on top of another, they are passed down until one accepts the option). The current options are:
-
section level=integer
for the Markdown and XHTML formats, sets the level of the top-level section. -
split at=section type
for the ePub format, sets the level at which the document will be split into separate files.
There is one further option. The use filename option means that TeX will try to determine the output format from the filename. It will split off the last part of the filename (without the extension) using an underscore as the separator. This will then be passed in as if it were a class option. The expected use of this is to have symlinks to the main file with name file_text.tex
and file_doc.tex
whereupon running pdflatex
on one or the other produces the appropriate document.
The code is written in LaTeX3. My version dates from December 2012. It may be compatible with earlier versions, but I cannot say for sure. When producing a document with any kinds of links in them, it is common to use the hyperref
package. To use the hyperref
package with this class, it needs to be at least version 6.83f (dated 2012-09-26) which introduced support for the custom driver
option which this class uses.
Although the intention is that a document written using this system be written knowing the eventual output, it is certainly not unreasonable to use a single file for several outputs, or to use a single fragment in documents on different systems. Whilst the attempt is to make it so that the same input works for all outputs, there will be times when one output type requires a slightly different input to another. For that situation, the imode
command is provided. It works in much the same fashion as the beamer
command \mode
. The syntax is either \imode<mode>{stuff}
or just \imode<mode>
, the latter must be by itself on a line. In both cases, mode
is one of the possible outputs or one of the key words doc
, text
, or all
.
Let us take \imode<mode>{stuff}
first. If mode
matches what was specified as a class option, then the contents of the argument is executed. If not, it is thrown away.
The second use, \imode<mode>
is more complicated. If it mode
matches, then TeX carries on as normal. If not, it starts gobbling stuff until it reaches a line with just imode
or \begin{document}
or \end{document}
on it. In the first case, the next line should be of the form <mode>
and TeX reevaluates the mode
. In the second or third cases, it starts acting normally again.
Using the \imode
command it is possible to specify material that is only processed in one mode.
(There is not yet support for specifying multi-modes.)
Whatever the output format, pdflatex
produces a PDF. When a text format is desired, this needs to be converted to text. There are many excellent tools for doing that, the program pdftotext
from the xpdf
system works well. Use it as follows:
pdftotext -enc ASCII7 -layout -nopgbrk texfile.pdf
Since line-breaks and indentation are often significant in text formats, the text modes define an extremely wide page in the hope that no paragraph will be quite that long. Whilst pdftotext
is fairly good at preserving the layout, it is not reliable as to preserving the exact number of spaces. For this reason, in a mode where indentation is significant, every line starts with a string of the form XXXY
where the number of X
s is the number of spaces to indent. A simple perl
script can convert those to spaces. A shell script, latex2txt.sh
is provided that automates the process of going from LaTeX document to text output with correct indentation.
Too many to mention!
The major limitation is to do with external packages. Most external packages will not work directly with this class (at least, in a text mode). This harks back to the problem of taking an already-written document and converting it. LaTeX packages were written with the understanding that the final outcome would be a static document, not some text that will be further processed.
That is not to say that no packages will ever be supported. Packages could be supported by writing an alternative which translates the commands to a sensible output. However, this will need to be done on a case-by-case basis as the need arises.
Thus, for now, it is best to put all the \usepackage
statements inside a \imode<doc>{...}
command.
Fork this repository!