From Scratch: A Complete Guide to Configuring the Wubi Input Method

Preface#

Rime cannot be simply classified as an input method; rather, it is a framework for input methods, and "Squirrel" is its application on the macOS system. Specific operating systems and their corresponding input methods can be referenced in the table below.

Operating System	Input Method	GitHub Repository
Windows	Weasel	https://github.com/rime/weasel
macOS	Squirrel	https://github.com/rime/squirrel
Ubuntu	Zhongzhouyun	https://github.com/rime/ibus-rime
iOS	Hamster Input Method	https://github.com/imfuxiao/Hamster
Android	Tongwen Input Method	https://github.com/osfans/trime

These applications all run on a combination of input schemes and their corresponding configuration. Under this framework you can, in theory, build a highly customized input method by combining different configuration files.

It should be noted that there are many mature input method software available on the market, and Rime's customization requires a certain level of familiarity. If you do not have high demands for input methods in your daily life, or if you do not need to improve productivity through customized dictionaries at work, then most of the content in this article may not be useful for you.

This article serves as notes taken during research for personal reference when reviewing related knowledge. Since my main platforms are macOS and iOS, the content primarily focuses on Squirrel and Hamster Input Method. However, as mentioned earlier, the underlying framework is the same Rime configuration and dictionary combination, regardless of the platform application. If you are interested in Rime and similar customized input methods and need to look up relevant information, this article can provide some reference.

With so many mature options available, choosing an input method that demands this much customization may look like unnecessary hassle. However, I have relied heavily on the Wubi input method since childhood, and during an early internship my work depended on fixed dictionaries, so I have a particular attachment to customized input methods.

Mainstream input methods try to be user-friendly by expanding their dictionaries and offering richer interface customization. Most of these features require internet access, which vendors describe as "product optimization." This gives me an uncomfortable feeling of being monitored when I type sensitive personal information, which is why I prefer Rime.

Now that Apple has opened up to third-party input method applications in recent years, I can finally have the setup I wanted, hence this record.

Finally, the content of this article may be updated over time, and I will continue to update it on my personal blog. If you read this article on a public account, you can find the original address and updates at the end of the article.

Table of Contents

Configuration Files
Input Schemes (Schema)
Expanding Dictionaries
Other Supplements
References

Configuration Files#

System Configuration Directory#

"/Library/Input Methods/Squirrel.app/Contents/SharedSupport/"
This directory contains the preset configurations for Rime, which will be automatically updated when the software is updated, so there is no need for manual modification.

User Configuration Directory#

The modifications we need to make are usually in the ~/Library/Rime/ directory. Since Rime supports a high degree of user customization, it helps to understand the main purposes of the files in this directory.

The installation.yaml file records the version information of the current Rime program. It contains a field, installation_id, that uniquely identifies the current Rime installation when synchronizing user dictionaries.
The user.yaml file records the user's usage status, such as the timestamp of the last "re-deployment" and the last selected input scheme.
The build directory contains files generated after each "re-deployment," including the .bin files generated from compiled dictionary files and various yaml configuration files generated after merging with custom configurations.
The xxx.userdb directory contains the user dictionary for the corresponding input scheme, which includes dynamic information such as phrases and word frequencies selected by the user during use. This directory is updated in real-time.
The sync directory is used for user data synchronization. When you choose to synchronize user data from the Squirrel menu, the contents of the user configuration directory are copied to this directory. Each sync/<installation_id> directory holds the user data from the Rime program on one device.

Default Configuration Modification#

Note
If you want to modify the configuration, do not edit an original file such as default.yaml directly. Instead, create a new xxx.custom.yaml file (for example default.custom.yaml), where xxx is the same as the original file name. Different input schemes may share the same default configuration, so keeping your changes in a separate custom file prevents them from being lost.

If you are unsure which default configurations you need, you can import a ready-made input scheme and configuration from someone else. The open-source list Awesome-rime collects the mainstream shape-based schemes, phonetic schemes, Chinese schemes, and minority language schemes that are available. Thanks to @ayaka14732 for compiling it.
GitHub - ayaka14732 Awesome Rime Input Scheme and Configuration List

Scheme Switching#

The list of available input schemes, the number of candidate words, and the shortcut keys that open the scheme-switching menu are all set in the configuration, for example:


patch:
  # List of schemes available in the menu. Adjust the order to your preference; the first one is the default scheme.
  schema_list:
    - schema: numbers # Uppercase numbers
    # - schema: wubi86_jidian_trad        # Wubi - Simplified to Traditional
    # - schema: wubi86_jidian_trad_pinyin # Wubi Pinyin Mixed Input - Simplified to Traditional
    - schema: wubi86_jidian # Wubi
    - schema: wubi86_jidian_pinyin # Wubi Pinyin Mixed Input
    - schema: pinyin_simp # Simplified Pinyin

  # In a .custom.yaml file, every setting has to sit under patch
  menu:
    page_size: 8 # Number of candidate words, supports up to 10
  switcher:
    hotkeys: # Shortcut keys to pop up the menu
      - "Control+0"
      - "Shift+Control+0"
    abbreviate_options: true
    caption: "【 Input Method Settings 】"
    option_list_separator: "｜"

Color Appearance#

The configuration file for color appearance is squirrel.yaml, and we need to create a new squirrel.custom.yaml file in the user configuration directory.

This file is the appearance configuration file for Squirrel, and the configuration content includes color schemes, layout methods, fonts, etc.
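As a rough illustration of what such a file can look like, here is a minimal sketch of a squirrel.custom.yaml. The scheme name demo_light and all color and size values are made up for this example, and the available keys can differ between Squirrel versions, so compare against the squirrel.yaml shipped with your installation before relying on it.

```yaml
# ~/Library/Rime/squirrel.custom.yaml (a sketch, not the author's actual file)
patch:
  style/color_scheme: demo_light      # must match a key under preset_color_schemes
  style/font_point: 16                # candidate font size
  style/corner_radius: 6              # rounded corners of the candidate window
  preset_color_schemes/demo_light:    # hypothetical scheme name
    name: "Demo Light"
    author: "example"
    # Squirrel color values are conventionally written as 0xBBGGRR
    back_color: 0xFFFFFF                    # window background
    text_color: 0x333333                    # text of the code being typed
    candidate_text_color: 0x333333          # non-selected candidates
    hilited_candidate_back_color: 0xD77800  # background of the selected candidate
    hilited_candidate_text_color: 0xFFFFFF  # text of the selected candidate
```

After saving the file, run "re-deploy" so that the merged configuration is regenerated in the build directory.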

Input Schemes (Schema)#

A scheme must include a scheme definition file (.schema.yaml) and a dictionary file (.dict.yaml)

Scheme Definition File (.schema.yaml)#

The file name prefix is <schema_id>, such as wubi86_jidian. This file configuration includes basic information such as scheme ID, scheme name, version number, author, dictionary dependencies, etc. It also includes the core components of the scheme.
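To make the structure more concrete, the following is a minimal sketch of a table-based scheme definition. The schema_id wubi86_demo, the name, and the author are hypothetical placeholders, and the component lists follow the common layout of table-based schemes rather than the exact contents of wubi86_jidian.schema.yaml.

```yaml
# wubi86_demo.schema.yaml (hypothetical schema_id, for illustration only)
schema:
  schema_id: wubi86_demo
  name: "Wubi 86 Demo"
  version: "0.1"
  author:
    - "example author"
  description: "Minimal table-based scheme"

switches:
  - name: ascii_mode            # Chinese / English toggle
    reset: 0
    states: ["中文", "西文"]

engine:                         # the core components of the scheme
  processors:
    - ascii_composer
    - recognizer
    - key_binder
    - speller
    - punctuator
    - selector
    - navigator
    - express_editor
  segmentors:
    - ascii_segmentor
    - matcher
    - abc_segmentor
    - punct_segmentor
    - fallback_segmentor
  translators:
    - punct_translator
    - table_translator
  filters:
    - uniquifier

speller:
  alphabet: "abcdefghijklmnopqrstuvwxyz"

translator:
  dictionary: wubi86_jidian     # name of the .dict.yaml file to load

punctuator:
  import_preset: default
```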

Introduction to the Core and Components of the Rime Engine#

A diagram has been drawn to roughly explain the core components and working principles of Rime.

Dictionary File (`.dict.yaml`)#

Taking the Jidian Wubi 86 scheme that I use as an example:

---
name: wubi86_jidian  # Dictionary name. It can be the same as the input scheme identifier (`schema_id`) or different.
version: "4.3"  # Dictionary version
sort: by_weight # by_weight: sort by weight; original: keep the order in the file
import_tables: # This is where user-defined dictionaries are added.
  - wubi86_jidian_user # Personal private dictionary
  - wubi86_jidian_user_hamster # Hamster dictionary
  - wubi86_jidian_extra # Extended dictionary

use_preset_vocabulary: true # Imports the preset vocabulary essay.txt from the system configuration directory
...
# The code table follows

Compiling Input Schemes#

The build directory in the shared directory contains precompiled .bin files at the time of program release, while the build directory in the user directory contains .bin files generated after the user clicks "re-deploy."

Here are the main binary files generated and their purposes.

"Rime Prism" <schema_id>.prism.bin
This is a binary file generated by combining the dictionary source files with the spelling operation rules defined in the scheme. (My guess is that it is built from the syllable codes defined in the dictionary source files and the spelling operation rules, producing a mapping between the original syllables and their alternative spellings. For example, the original syllable "lve" plus the spelling operation rule "derive/^([nl])ve$/$1ue/" yields the mapping "lue" → "lve". Thus, when the user types "lue", Rime can find the characters stored under "lve" through this mapping.)
"Rime Solid Dictionary" <schema_id>.table.bin
This comes directly from the dictionary source file .dict.yaml.
"Rime Reverse Dictionary" <schema_id>.reverse.bin
This is the binary file for the reverse lookup dictionary. For example, I use Wubi as my main input method, but when I forget how to decompose a character, I fall back on Pinyin. When I spell the character in Pinyin, its Wubi code is displayed after it.
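The Pinyin reverse lookup described above is usually wired up in the scheme definition file. Below is a sketch of the relevant fragments, assuming a Pinyin scheme named pinyin_simp is installed and that "z" is chosen as the lookup prefix. Both choices are assumptions made for this example, so check your own wubi86_jidian.schema.yaml for the values it actually uses.

```yaml
# Fragments of a Wubi .schema.yaml that enable Pinyin reverse lookup (a sketch)
schema:
  dependencies:
    - pinyin_simp                 # the scheme whose dictionary is used for lookup

engine:
  translators:
    - punct_translator
    - reverse_lookup_translator   # handles input tagged as reverse_lookup
    - table_translator

reverse_lookup:
  dictionary: pinyin_simp         # dictionary used while looking up
  prefix: "z"                     # assumed trigger key: type z, then the Pinyin
  tips: "〔拼音〕"                 # hint shown while in lookup mode

recognizer:
  patterns:
    reverse_lookup: "^z[a-z]*$"   # input matching this pattern enters lookup mode
```

With a setup like this, typing the prefix followed by Pinyin lists characters with their Wubi codes shown as comments.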

Spelling Operations#

Spelling operations are used to implement features such as error correction and fuzzy Pinyin. Without them, input must match the codes defined in the dictionary file exactly. Wubi schemes rarely use this feature, so I do not take notes on it here. If you are interested, read the reference articles at the end.
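For Pinyin users, a small sketch of what spelling operations look like may still be useful. The rules below are common examples of the derive and abbrev operations and are not taken from the author's configuration. They live under speller/algebra in a Pinyin scheme definition.

```yaml
# Fragment of a Pinyin .schema.yaml (illustrative rules only)
speller:
  alphabet: zyxwvutsrqponmlkjihgfedcba
  delimiter: " '"
  algebra:
    - derive/^([nl])ve$/$1ue/    # syllables stored as lve/nve can also be typed lue/nue
    - derive/^([zcs])h/$1/       # fuzzy: zh, ch, sh can also be typed as z, c, s
    - derive/([ei])n$/$1ng/      # fuzzy: en, in can also be typed as eng, ing
    - abbrev/^([a-z]).+$/$1/     # allow the first letter as an abbreviation
```

Each rule is a regular-expression substitution applied to the syllables in the dictionary. derive keeps the original spelling and adds the alternative, which is how the prism file ends up containing both.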

Modifying Punctuation#

In half-width or full-width mode, pressing a punctuation key offers related candidates. For example, pressing / offers candidates such as ÷ or 、. These can be changed in the scheme configuration.

Reference Example

punctuator:
  full_shape:
    # ... omitted ...
    "/": ["／", "÷"]
  half_shape:
    "/": ["、", "､", "/", "／", "÷"]

Expanding Dictionaries#

Create a <name>.extended.dict.yaml file. The name field inside the file must match the file name without the .dict.yaml suffix. The .extended part is not required; it only marks the file as an expanded dictionary. For example, I prefer to use "extra" in my file names.

In the dictionary configuration of the corresponding input scheme, use import_tables to import this dictionary, so that dictionaries can be combined for different usage scenarios. For example, create a dictionary for countries and regions named china_district_extra.dict.yaml, then adjust my wubi86_jidian.dict.yaml to include it.

# This configuration is for `wubi86_jidian.dict.yaml`
name: wubi86_jidian
version: "2.0"
sort: by_weight # by_weight: sort by weight; original: keep the order in the file
import_tables: # This is where user-defined dictionaries are added.
  - wubi86_jidian_2357 # Personal private dictionary
  - wubi86_jidian_user_hamster # Mobile Hamster dictionary
  - china_district_extra # Custom extended country and region dictionary

Custom Phrases (Reference)#

If you are a Pinyin user, it is not recommended to use English words or initials as codes in the Pinyin dictionary. Because of Rime's segmentation mechanism, doing so will crowd out other candidates that you need. For mappings of this kind, such as msd → 马上到, custom phrases are a better fit.

Since Wubi input does not have this segmentation mechanism, you can set the code directly in the dictionary without using custom phrases. The following example is for Pinyin users.

Because custom phrases are independent of the dictionary, their candidates are ranked with very high priority, so use them with caution. For more detail on custom phrases, see the article by Ha Wu Wang listed at the end, which is more thorough than the official documentation.

## Example settings for custom_phrase.txt (placed in the scheme configuration)
custom_phrase:
  dictionary: ""
  user_dict: custom_phrase # Needs to be the same name as the custom phrase file
  db_class: stabledb
  enable_completion: false
  enable_sentence: false
  initial_quality: 1
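The custom_phrase section above only takes effect once a translator refers to it. A minimal sketch of that hookup follows, assuming a Pinyin scheme whose main translator is script_translator. The list of translators is an assumption for this example and should be merged with the one your scheme already defines.

```yaml
# Fragment of a Pinyin .schema.yaml (a sketch; keep your scheme's other translators)
engine:
  translators:
    - punct_translator
    - table_translator@custom_phrase   # reads the custom_phrase section
    - script_translator                # the Pinyin scheme's main translator

# custom_phrase.txt lives in the user configuration directory.
# Each line has the form phrase<Tab>code<Tab>weight, for example:
#   马上到<Tab>msd<Tab>1
```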

Manual Word Creation#

Manual word creation is done by adding words directly to the .dict.yaml file.
Thanks to @KyleBing's open-source project Wubi Code Table Assistant, which provides a graphical interface for word creation.

Wubi Code Table Assistant

Automatic Word Creation#

Enabling automatic word creation means turning off auto-commit and top-candidate commit (after all, you need to confirm the selected word yourself).

You need to modify the following contents in wubi86_jidian.schema.yaml

speller:
  # max_code_length: 4                 # Commit automatically at four codes
  auto_select: false                   # Auto-commit when there is a unique candidate

translator:
  enable_sentence: true                # Sentence input mode
  enable_user_dict: true               # Whether to enable the user dictionary (it records dynamic word frequencies and user words)
  enable_encoder: true                 # Encode newly created words automatically

Entry Weight#

The larger the weight number, the higher the weight, and the higher the ranking.
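As a small illustration, here is a sketch of a dictionary file in which two entries share one code. The dictionary name, the entries 词甲 and 词乙, and the code abcd are placeholders rather than real Wubi codes. Columns are separated by tabs in the default order text, code, weight.

```yaml
---
name: weight_demo        # hypothetical dictionary name
version: "0.1"
sort: by_weight          # weights only affect ranking with this sort mode
...
词甲	abcd	100
词乙	abcd	10
```

With by_weight sorting, typing abcd would list 词甲 before 词乙 because 100 is greater than 10.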

LUA Extension Scripts#

By default, Rime loads Lua scripts from rime.lua in the user configuration directory.
If you cannot write your own scripts, you can find ones that others have published online. After adding a Lua script, you need to reference it under engine/translators in the configuration file of the corresponding input scheme.

Example of introduction:

patch:
  engine:
    translators:
      # Add the Lua script; keep the scheme's existing translators in this list as well
      - lua_translator@*<script_name>

Other Supplements#

Multi-Device Synchronization#

Multi-device synchronization works by pointing every device at the same synchronization location.
Since I use Apple platforms, I synchronize through iCloud. The iOS app saves to iCloud by default, so the other platforms should follow the storage location used on iOS.

During the first synchronization, you need to decide which scheme is shared. Of course, you can also customize different input schemes for different devices.

To transfer the PC scheme to the mobile device, enable the Wi-Fi upload service on the mobile device and replace the original Rime directory on the device with the scheme configuration directory from the Mac. To transfer the mobile scheme to the PC, enable iCloud synchronization and copy the application files to iCloud. Note that this overwrites files with the same name in iCloud, so be sure to back up first.
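On the Mac side, the shared location is typically set in installation.yaml through the sync_dir field. The fragment below is a sketch: the installation_id value and the folder name RimeSync are made up, and <you> stands for your macOS user name. The folder that the iOS app actually uses may differ, so check its settings first.

```yaml
# ~/Library/Rime/installation.yaml (only the two fields relevant to sync)
installation_id: "macbook-demo"   # hypothetical; use a distinct ID for each device
sync_dir: "/Users/<you>/Library/Mobile Documents/com~apple~CloudDocs/RimeSync"  # hypothetical iCloud Drive folder
```

After a re-deploy and a sync, each device writes its data into its own sync_dir/<installation_id> subdirectory, which matches the sync directory layout described earlier.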

References: One Skill a Day | Use iCloud Drive to Achieve RIME Input Method Configuration Cross-Platform Synchronization - sspai (少数派)
References: Multi-Device Synchronization | oh-my-rime Input Method

Default to Use English Input Method in Specific Programs#

You need to check the Info.plist of the specific program to find its bundle identifier, a string similar to a package name, such as com.apple.Xcode. For the detailed steps, you can search online or refer to the official documentation: CustomizationGuide · rime/home Wiki · GitHub

Example for two terminal applications, Termius and iTerm2:

# --- Define initial state of English input for specific programs ---
app_options:
  com.termius-dmg.mac: # Termius - Mac
    ascii_mode: true
  com.googlecode.iterm2: # iTerm2 - Mac
    ascii_mode: true

Dictionary Saved but Not Effective#

In a dictionary file, the YAML header must end with a line of three dots (...), and the entries come after it. Some editors automatically reformat the file and change the ... to ---, which makes the dictionary ineffective.

Example

---
name: wubi86_jidian_extra
note: "Extended Dictionary"
version: "2024-01-01"
last_edit_time: "2024-01-1"
dict_grouped: true # Dictionary management application recognition: mark this code table as grouped mode
sort: original
# Sorting method of the code table: by_weight weight, original original order

...  # <-<- Note here that it should be three dots, not three dashes
## Cars
朗逸	yvqk
验车	cwlg

References#

Help and Feedback | RIME | Zhongzhouyun Input Method Engine - Official Documentation
Squirrel Input Method Configuration – Ha Wu Wang - A particularly well-written third-party document, this article focuses on this as notes
App +1 | Hamster Input Method: Let iOS Use Rime Comfortably - sspai (少数派) - Introduces the synchronization method on the iOS side
Rime Configuration: Wusong Pinyin - Dvel's Blog