Software

Current Software

One important goal of this project is to make new, original, cross-platform tools available for working with CJK (Chinese, Japanese, and Korean) source materials. The main software unique to this project is Sanzang, a framework and set of tools that helps readers read and analyze texts. Other tools are also available, including a console CJK dictionary program and tools for dictionary conversion and Hanyu Pinyin formatting.

sanzang-utils
This is our standard implementation of the Sanzang machine translation system. Using these programs, you can reformat CJK texts, generate rough translations into other languages, edit translation rules, and perform text substitutions. Source code is hosted on GitHub, and a simple Web tool is also available for translating texts from the Chinese Buddhist canon into English online using a translation table. Several articles are available, including Introduction to Sanzang, Sanzang Examples, and Sanzang Utils Tutorial.
sanzang-tables
This is a package containing our current set of translation rules for translating classical Buddhist texts from the Taishō Tripiṭaka into English (zh_en_tripitaka). These translation rules can be used by the sanzang-utils programs, which can apply the rules to generate translations. The development of these translation rules is a long-term project. Our aim is to have a fairly reliable translation table in the future which will ease reading and translation of the Chinese Buddhist canon. You can download a ZIP file to get a copy of the current files.
pinyin-dec
This is our software for formatting Hanyu Pinyin transliterations with the proper diacritics. Using this software, you can easily write nicely formatted Hanyu Pinyin, or you can drive the program from a script to reformat text from files or other sources. The source code is hosted on GitHub, and a small Web tool is available for online use. You can read our article pinyin-dec: Pinyin Formatting to learn more about this program.
edict-to-csv
edict-to-csv is a set of tools for converting EDICT format dictionaries to delimited text (CSV). Using these programs, you can convert dictionary entries in EDICT1 or CEDICT formats. This is useful for converting definitions for open-source Chinese-English and Japanese-English dictionary projects. The source code is hosted on GitHub.
cjk-defn
cjk-defn is a small and flexible console CJK dictionary program with a SQLite back-end. Using this program, you can add dictionaries and use them to analyze lines of text. The application can match the longest terms found in its dictionaries, or do single-character lookups. Any number of dictionaries can be added and used at the same time. The source code is hosted on GitHub.
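
The two lookup modes described for cjk-defn, longest matching terms and single-character lookups, can be illustrated with a minimal Python sketch. This is not cjk-defn's actual code: the function names and the sample entries are hypothetical, and a plain in-memory dict stands in for the SQLite back-end.

```python
# Illustrative sketch only; names and data are assumptions, not cjk-defn's API.

def longest_match(line, dictionary):
    """Split a line into (term, definition) pairs, preferring the
    longest dictionary term that starts at each position."""
    max_len = max((len(term) for term in dictionary), default=1)
    results = []
    i = 0
    while i < len(line):
        # Try the longest candidate first, then shrink until one matches.
        for size in range(min(max_len, len(line) - i), 0, -1):
            term = line[i:i + size]
            if term in dictionary:
                results.append((term, dictionary[term]))
                i += size
                break
        else:
            # No entry starts here: keep the character with no definition.
            results.append((line[i], None))
            i += 1
    return results


def single_chars(line, dictionary):
    """Look up every character on its own, ignoring multi-character terms."""
    return [(ch, dictionary.get(ch)) for ch in line]


# Hypothetical sample entries standing in for a loaded dictionary.
sample_dictionary = {
    "如是": "thus",
    "如": "as; like",
    "是": "to be; this",
    "我": "I",
    "聞": "to hear",
}

if __name__ == "__main__":
    for term, definition in longest_match("如是我聞", sample_dictionary):
        print(term, definition)
    for term, definition in single_chars("如是我聞", sample_dictionary):
        print(term, definition)
```

With the sample entries, the longest-match mode groups 如是 into one term, while the single-character mode reports 如 and 是 separately. Several dictionaries could be supported in the same way by merging their entries or by querying each one in turn.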

Legacy Software

Occasionally we made programs or libraries that were later replaced by better or more suitable implementations. The old software is still listed below for reference purposes, but use of these programs is discouraged in favor of current programs such as sanzang-utils.

sanzang-lib
This is a Python module, or programming library, for using the basic functions of the Sanzang translation system. Originally, sanzang-utils programs could not be used programmatically, and so a special library had to be made for integrating the functions with other programs. Since sanzang-utils version 1.3.0, the program modules can be imported, which makes the library redundant and obsolete.
sanzang (Ruby implementation)
This is the first implementation of the Sanzang translation system, written in the Ruby programming language. While this program contained all the basic functionality for Sanzang, it suffered from performance issues due to slow string operations in Ruby. The Ruby implementation of Sanzang was replaced with a Python implementation, called sanzang-utils, which is much faster, lighter, and more efficient.
