cjk-defn

https://github.com/yaoguai/cjk-defn

cjk-defn is a command-line dictionary program for looking up definitions of terms and phrases from the CJK languages (Chinese, Japanese, and Korean). Unlike most other dictionaries, cjk-defn allows you to analyze entire lines of text, and shows relevant definitions for all terms and phrases found. The program is implemented in Python 3, and released under the MIT License.

Features include:

History / Why?

Around 2008-2009, I wanted a program like this, but none existed at the time. Every tool with similar functionality was Web-based, proprietary, or limited to traditional single-term dictionary lookups. For someone who preferred console programs, there was nothing suitable, so I had no choice but to write one myself.

After playing around with different formats, I wrote a program similar to this one, also using a SQLite back-end. However, it was limited to only a few pre-defined dictionaries. Because the application was ugly Python 2 and tied to specific dictionaries, the program stayed private for years.

Coming back to the project recently, I wanted to make it general enough to be useful to other people, and capable of using any number of dictionaries. After kicking around a few designs, I went with a very simple database with just two tables and a handful of fields.

The old codebase has been replaced with a cleaner and smaller one. The new program has the flexibility to work with any number of dictionaries. Since the implementation has been generalized, I think it may now be useful to others who want such a dictionary program, and prefer the power and flexibility of Unix tools.

Dictionaries

By default, the dictionary program includes no dictionaries or definitions. These are added by creating the SQLite database and inserting data into it. The DICTIONARIES table contains basic dictionary metadata. The DEFINITIONS table is for all dictionary definitions, and includes only a few standard fields.

For definitions, several free dictionaries are available, including those from JMdict/EDICT (Japanese-English), and CC-CEDICT (Chinese-English). Tools from the "edict-to-csv" software package may be helpful in converting dictionary data for imports.
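
As a rough illustration of an import, the following Python sketch loads CSV rows into the DEFINITIONS table. The four-column CSV layout (standard form, other form, transliteration, definition), the function name import_definitions, and the sample row are assumptions made for this example; the actual output of the edict-to-csv tools may be laid out differently.

```python
import csv
import io
import sqlite3


def import_definitions(conn, dict_id, csv_file):
    """Insert CSV rows into DEFINITIONS and return how many were added.

    Assumed (hypothetical) CSV columns: form1, form2, alt, defn.
    Rows with any other number of columns are skipped.
    """
    rows = [(dict_id, *row) for row in csv.reader(csv_file) if len(row) == 4]
    conn.executemany(
        "INSERT INTO DEFINITIONS "
        "(DF_DICT, DF_FORM1, DF_FORM2, DF_ALT, DF_DEFN) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Stand-in for the real database: an in-memory table with the same
    # column names (the TEXT types are an assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DEFINITIONS "
        "(DF_DICT TEXT, DF_FORM1 TEXT, DF_FORM2 TEXT, DF_ALT TEXT, DF_DEFN TEXT)"
    )
    sample = io.StringIO("中國,中国,Zhong1 guo2,China\n")  # made-up sample row
    print(import_definitions(conn, "cedict", sample), "row(s) imported")
```

Using executemany with "?" placeholders keeps the import in a single transaction and avoids quoting problems in definitions that contain commas or apostrophes.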

Installation

To run this program, Python 3.x is required. Installation on a Unix-like platform is advised, but Windows is possible too. If you must use Windows, then Cygwin is the best environment.

To install the program, you can use the older setup.py method:

# python3 setup.py install

Or you can use pip, which is the newer and preferred method, for example by running "pip3 install ." from the source directory.

Database Setup

After installing the program, initialize the database. The database directory is located under the installation prefix:

$(PREFIX)/var/lib/cjk-defn/

In this directory, you can run the script "make-database" to create the database. Only two tables are in the database, and these are as follows:

DICTIONARIES:       <= Each entry is a dictionary
    DI_DICT         <= Dictionary ID (letters and numbers, no spaces)
    DI_SIGIL_S      <= 2-char ID for standard definitions (e.g. =Z)
    DI_SIGIL_C      <= 2-char ID for character definitions (e.g. -Z)
    DI_SHORT_DESCR  <= Very short description of the dictionary
    DI_LONG_DESCR   <= Long description, as much as you like

DEFINITIONS:        <= Each entry is a dictionary definition
    DF_DICT         <= Dictionary ID
    DF_FORM1        <= Standard form of the term
    DF_FORM2        <= Other form of the term
    DF_ALT          <= Transliteration
    DF_DEFN         <= Full definition
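
To make the layout above concrete, here is a minimal Python sketch that creates the two tables and registers one dictionary. The column names come from the listing above; the TEXT types, the primary key, and the helper names create_database and register_dictionary are assumptions for illustration, and are not taken from the make-database script.

```python
import sqlite3

# Column names follow the README; types and the primary key are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS DICTIONARIES (
    DI_DICT        TEXT PRIMARY KEY,
    DI_SIGIL_S     TEXT,
    DI_SIGIL_C     TEXT,
    DI_SHORT_DESCR TEXT,
    DI_LONG_DESCR  TEXT
);
CREATE TABLE IF NOT EXISTS DEFINITIONS (
    DF_DICT  TEXT,
    DF_FORM1 TEXT,
    DF_FORM2 TEXT,
    DF_ALT   TEXT,
    DF_DEFN  TEXT
);
"""


def create_database(path):
    """Create both tables in the SQLite file at `path`; return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn


def register_dictionary(conn, dict_id, sigil_s, sigil_c, short_descr, long_descr=""):
    """Add one row of dictionary metadata to DICTIONARIES."""
    conn.execute(
        "INSERT INTO DICTIONARIES VALUES (?, ?, ?, ?, ?)",
        (dict_id, sigil_s, sigil_c, short_descr, long_descr),
    )
    conn.commit()


if __name__ == "__main__":
    conn = create_database(":memory:")  # use a file path for a real database
    register_dictionary(conn, "cedict", "=Z", "-Z", "CC-CEDICT Chinese-English")
    print(conn.execute("SELECT DI_DICT, DI_SIGIL_S FROM DICTIONARIES").fetchall())
```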

Program Usage

By invoking the program with "-h" or "--help" flags, you can see usage information, and also see which dictionaries are available in the database:

$ cjk-defn -h

If we see that the dictionary "cedict" is available, then we can use it in the following way:

$ cjk-defn cedict

If we want to include character definitions, then we should add the suffix "/c" to the dictionary name:

$ cjk-defn cedict cedict/c

You can use any combination of dictionaries, in any order you like. If you want to view definitions for an entire text, you could do something like the following:

$ cat mytext.txt | cjk-defn cedict cedict/c | less
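
This document does not describe how terms are located within a line, so the following Python sketch shows only one plausible approach: try every substring up to a maximum length and look each one up in the DEFINITIONS table. The function name find_terms, the max_len limit, and the sample entries are assumptions for illustration, not the program's actual implementation.

```python
import sqlite3


def find_terms(conn, line, max_len=8):
    """Return (term, definition) pairs for substrings of `line` in DEFINITIONS.

    Brute-force sketch: every substring of up to max_len characters is
    checked against both DF_FORM1 and DF_FORM2.
    """
    results = []
    seen = set()
    for start in range(len(line)):
        last = min(len(line), start + max_len)
        for end in range(start + 1, last + 1):
            term = line[start:end]
            if term in seen:
                continue  # report each distinct term only once
            seen.add(term)
            query = (
                "SELECT DF_DEFN FROM DEFINITIONS "
                "WHERE DF_FORM1 = ? OR DF_FORM2 = ?"
            )
            for (defn,) in conn.execute(query, (term, term)):
                results.append((term, defn))
    return results


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real database (made-up sample entries).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DEFINITIONS "
        "(DF_DICT TEXT, DF_FORM1 TEXT, DF_FORM2 TEXT, DF_ALT TEXT, DF_DEFN TEXT)"
    )
    conn.executemany(
        "INSERT INTO DEFINITIONS VALUES ('cedict', ?, ?, ?, ?)",
        [
            ("中國", "中国", "Zhong1 guo2", "China"),
            ("人", "人", "ren2", "person"),
        ],
    )
    for term, defn in find_terms(conn, "中国人"):
        print(term, defn)
```

Scanning all substrings is simple but issues many queries per line; an index on the form columns would likely be needed for full-size dictionaries.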

If you want to define a default set of dictionaries, you can set an environment variable, and then it is no longer necessary to specify the dictionaries when invoking cjk-defn:

$ export CJK_DEFN_DICTS='cedict cedict/c'
$ cat mytext.txt | cjk-defn | less
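
The dictionary arguments and the environment variable fallback could be handled along the lines of the Python sketch below. It is an illustration only: the function name parse_dict_specs and the (name, wants_characters) tuple format are hypothetical, and the program's real argument handling may differ.

```python
import os


def parse_dict_specs(args, environ=os.environ):
    """Turn dictionary arguments into (dict_id, wants_characters) pairs.

    If no arguments are given, fall back to the whitespace-separated
    list in the CJK_DEFN_DICTS environment variable.
    """
    specs = list(args) or environ.get("CJK_DEFN_DICTS", "").split()
    parsed = []
    for spec in specs:
        if spec.endswith("/c"):
            # "/c" suffix: character definitions from this dictionary
            parsed.append((spec[:-2], True))
        else:
            parsed.append((spec, False))
    return parsed


if __name__ == "__main__":
    print(parse_dict_specs(["cedict", "cedict/c"]))
    print(parse_dict_specs([], {"CJK_DEFN_DICTS": "cedict cedict/c"}))
```

Preserving the order of the specs matters here, since dictionaries may be combined in any order.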

Documentation

This document includes basic information about the program. In the future, the documentation should be expanded, and a manual page should be included with the program, called cjk-defn(1).
