  utf-8 for Sanskrit (utfskt) - Typesetting Sanskrit Texts with OmegaTeX


Processing Sanskrit texts in utf-8 notation with Omega TeX


Strictly speaking, (La)TeX does not require a Sanskrit package, because all diacritics can be produced by TeX commands: a-macron (ā) can be produced through \=a, d-underdot (ḍ) as \d{d}, and so forth. Although there are individuals who, presumably as a sort of cyber-prāyaścitta, still use this cumbersome system, there are now various ways to ensure that the input text remains readable. Unfortunately, many Sanskritists who tried TeX a long time ago, when none of these tools were available, discarded it for reasons that no longer apply.
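To illustrate how cumbersome this input becomes, here is a minimal sketch of a complete LaTeX file that uses only the standard accent commands (\=, \d, \'). The sample words are chosen here purely for illustration and are not taken from the package.

```latex
\documentclass{article}
\begin{document}
% Every diacritic is built from a standard LaTeX accent command:
%   \=a    gives a with macron
%   \d{d}  gives d with underdot
%   \'s    gives s with acute
pr\=aya\'scitta,   % prāyaścitta
pa\d{n}\d{d}ita,   % paṇḍita
sa\d{m}sk\d{r}ta   % saṃskṛta
\end{document}
```

Even in these three words the source is noticeably harder to read and to search than the transliteration it produces.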

It is well known that the sudden recognition that computers would be used for languages other than English led to the construction of many "code pages", and few Indologists interested in computers could resist constructing a more ingenious font layout than their predecessors for use with Sanskrit and other Indian languages. As long as text files were kept on a private hard disk this had no detrimental effects, but for sharing documents, sending electronic texts to a publisher, creating databases of texts, and for internet presentation, the existence of more than a dozen ways to encode texts in one language was a problem.

One attempt to resolve this was the production of fonts in a unified layout for different programs and platforms, as for instance in the so-called CSXp (Classical Sanskrit Extended plus) convention, which was also widely used with TeX. For typesetting texts in Devanāgarī there is the devnag package, which consists of a high-quality font and works with a preprocessor. This preprocessor expects text in a specific encoding: "aa" for a-macron, ".d" for d-underdot, etc. The disadvantage was that different input files were required, one encoding for transcription and another for Nāgarī, which made searching and conversion more difficult. If we take into account that Sanskrit is only one of the languages used in Indology, it is clear that the growing set of converters and preprocessors was soon to become difficult to handle.

Understandably, it was expected that Unicode would solve some of these problems in the near future. The creation of a Unicode-enabled TeX, Omega, was a step in this direction, but the unpredictable cycles of development and support led to various successors, such as aleph or mem. Another attempt to harmonize Unicode and LaTeX without depending on Omega is the package ucs, which works with normal TeX and translates a large set of utf-8 input into the correct TeX codes. It is a good alternative to utf-skt for those who wish to stick with normal TeX, but only for transliterated Sanskrit.
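For comparison, the ucs route with ordinary LaTeX might look roughly like the following sketch. It assumes that the file is saved as utf-8 and that the installed ucs distribution covers the transliteration characters used; the sample words are illustrative only.

```latex
\documentclass{article}
\usepackage{ucs}              % Unicode support for ordinary 8-bit LaTeX
\usepackage[utf8x]{inputenc}  % the utf8x option is provided by ucs
\begin{document}
% The transliteration is typed directly; ucs translates each
% character into the corresponding TeX accent command.
prāyaścitta, paṇḍita, saṃskṛta
\end{document}
```

The input stays readable and searchable, while the typesetting is still done by normal TeX.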

The present package is the result of occasional contacts between some, mostly Indological, TeX users. These contacts led to an informal discussion group, which eventually included (in alphabetical order) Stefan Baums, Jürgen Hanneder, Richard Mahoney, Norbert Preining, John D. Smith, and Toru Tomabechi. In the course of these discussions it became clear that most of the components for a Sanskrit environment for OmegaTeX had already been written by Stefan Baums and Toru Tomabechi and "only" had to be harmonized, with the exception of an external translation program for Transliterated Sanskrit Unicode -> Devanagari Unicode (ur2ud), which was written by J.D. Smith. The package described here is an attempt to put all these elements into a simple LaTeX interface and to add a few extensions, for instance to cover Tamil transliteration.

J Hanneder

