Buddhist texts and resources for the cultivation path

Sanzang: A CJK Machine Translation System

Sanzang (三藏) is a simple machine translation system. It is especially useful for translating from the CJK languages (Chinese, Japanese, and Korean), including ancient and otherwise difficult texts. Unlike most other machine translation systems, Sanzang is small and approachable. Any user can develop their own translation rules, which are stored in a text file and applied at runtime. To learn more, check out our Introduction to Sanzang.
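To illustrate the general idea of rules kept in a text file and applied at runtime, here is a minimal sketch in Python. It is not the Sanzang implementation: the one-rule-per-line "source|target" format, the function names parse_table and translate, and the longest-match-first strategy are all assumptions made for this example. See the Sanzang Utils documentation for the actual table format and behavior.

```python
# Illustrative sketch only; not the actual Sanzang code or table format.

def parse_table(text):
    """Parse rules of the assumed form 'source|target', one per line."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "|" not in line:
            continue  # skip blank or malformed lines
        source, target = line.split("|", 1)
        rules[source] = target
    return rules


def translate(text, rules):
    """Replace each source term with its target, trying the longest match first."""
    max_len = max((len(s) for s in rules), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            # No rule matched here: copy the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical rules; each target keeps a leading space as a word separator.
TABLE = """\
如是我聞| thus have I heard
如是| thus
佛| buddha
"""

rules = parse_table(TABLE)
result = translate("如是我聞佛", rules)
print(result)
```

Because the rules live in ordinary text, changing a translation in this sketch only requires editing a line of the table and running the program again; no code changes are needed.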

Demo: Sanzang on the Web

This is a simplified and limited Web interface for Sanzang that is useful for demonstration or for short amounts of text (one or two fascicles of classical Buddhist texts). With it, you can try out some basic Sanzang functionality without needing to download or install any software. For anything beyond this very basic usage, the Sanzang Utils package below is more suitable.

sanzang-utils ZIP | sanzang-tables ZIP | sanzang-lib ZIP

Sanzang Utils is the set of programs implementing the Sanzang translation system. This package also contains related tools for editing translation tables and formatting CJK text. This software is written in Python 3, and is released as free and open-source software under the MIT License. Full documentation is included. You can also see the Sanzang Utils Tutorial, which teaches you how to install and use Sanzang Utils.

Sanzang Tables is a package containing our current set of translation rules for translating classical texts from the Taishō Tripiṭaka into English. The Sanzang Utils programs can apply these rules to generate translations. The development of these translation rules is a long-term project. Our aim is to eventually have a fairly reliable translation table that will ease reading and translation of the Chinese Buddhist canon.

Sanzang Lib is a Python module, or library, that contains core functions of the Sanzang translation system. If you are a programmer, you can use this software to integrate Sanzang functions into your own programs. This software is written in Python 3, and it is free and open-source software released under the MIT License. Note: you only need this if you are a programmer and you want to write special programs.

The old program (you probably do not want this)

This is the legacy Sanzang program, written in Ruby, which has been superseded by Sanzang Utils (above). The old Sanzang program is hosted on RubyGems.org. Ruby 1.9 or later is required, and it is licensed under the GNU GPL. Full program documentation is included. This remains here for users who wish to continue using the older and less efficient implementation.

Project Status

The Sanzang translation engine is ready for use. Running on a mid-range PC with a translation table of over 6,000 rules, Sanzang Utils can generate translation listing files for the entire CBETA standard corpus (Taishō volumes 1-55 and 85) in less than 10 minutes. The next major phase is the development of a larger and more reliable translation table.
