Buddhist texts and resources for the cultivation path


三藏 Sanzang: CJK Machine Translation

Sanzang is a compact, simple, cross-platform machine translation system. It is especially useful for translating from the CJK languages (Chinese, Japanese, and Korean), including ancient and otherwise difficult texts. Unlike most other machine translation systems, Sanzang is small and approachable. Any user can develop their own translation rules, which are stored in a plain text file and applied at runtime.
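To make the idea of rules kept in a text file and applied at runtime concrete, here is a minimal sketch in Python 3. It is not Sanzang's actual implementation: the function names parse_table and translate are hypothetical, the one-rule-per-line "source|translation" layout is an assumption made for illustration, and the longest-match-first scan is one common way to apply such rules rather than a description of what Sanzang does internally.

```python
def parse_table(text, sep="|"):
    """Parse rules from text, one per line: source term, separator, translation.

    The "source|translation" layout is assumed for this sketch only.
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        source, _, target = line.partition(sep)
        if source and target:
            table[source] = target
    return table


def translate(text, table):
    """Scan left to right, always taking the longest source term that matches."""
    max_len = max((len(source) for source in table), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so phrases win over their substrings.
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            # No rule matched here: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return " ".join(out)


# Illustrative rules only; these are not taken from the Sanzang Tables package.
RULES = """\
如是我聞|thus have I heard
如是|thus
我|I
佛|buddha
"""

table = parse_table(RULES)
print(translate("如是我聞", table))
print(translate("我聞佛", table))
```

Because longer terms are tried first, the four-character phrase is translated as a unit instead of being split into its shorter sub-terms, and characters with no matching rule are left in place for the reader to resolve.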

Demo: Sanzang on the Web

This is a simplified and limited Web interface for Sanzang, useful for demo purposes or for short amounts of text (one or two fascicles of classical Buddhist texts). With it, you can try out some basic Sanzang functionality without downloading or installing any software. For anything beyond very basic usage, Sanzang Utils is more suitable.

Sanzang Utils (ZIP)

Sanzang Utils is the newer set of tools for working with CJK text and generating translation listings. This package supersedes the old Sanzang program, and its tools are smaller and faster. It also contains related tools for editing translation tables and making text substitutions. The software is written in Python 3 and released as free and open-source software under the MIT License. Full documentation is included. Articles are also available.

Sanzang Tables (ZIP)

This is our current set of translation rules for the Taishō Tripiṭaka, which can be used with Sanzang Utils or with the older Sanzang program. The development of these rules is a long-term project. Our aim is to eventually have a fairly reliable translation table that eases reading and translation of the Taishō Tripiṭaka.

Older Sanzang program

This is the legacy Sanzang program, which has been superseded by Sanzang Utils (above). The old Sanzang program is distributed in RubyGem format and hosted on RubyGems.org. It requires Ruby 1.9 or later and is licensed under the GNU GPL. Full documentation is included.

Project Status

The Sanzang translation engine is ready for use. Running on a mid-range PC with a translation table of over 6,000 rules, Sanzang Utils can generate translation listing files for the entire CBETA standard corpus (Taishō volumes 1-55 and 85) in less than 10 minutes. The next major phase is the development of a larger and more reliable translation table.
