togatoga / karukan

Karukan: A Japanese Input System for Linux and macOS — Neural Kana-Kanji Conversion Engine

Project Structure

Crate	Description
karukan-fcitx5	IME frontend for Linux: fcitx5 addon + C FFI
karukan-macos	IME frontend for macOS: Swift/InputMethodKit
karukan-im	Shared IME engine: state machine, romaji conversion, and karukan-imserver (JSON-RPC server for macOS)
karukan-engine	Core library: Romaji-to-Hiragana conversion and neural Kana-Kanji conversion via llama.cpp
karukan-cli	CLI tools and server: dictionary build, Sudachi dictionary generation, dictionary viewer, AJIMEE-Bench, HTTP server
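The table lists Romaji-to-Hiragana conversion as part of karukan-engine. A common way to implement this is greedy longest-match lookup against a romaji table. The Rust sketch below illustrates that technique under stated assumptions. The table is a tiny hypothetical subset. The name `romaji_to_hiragana` and the rules for doubled consonants and a lone `n` are also illustrative and are not karukan-engine's actual API or rule set.

```rust
/// Tiny hypothetical romaji table (illustration only, not karukan's data).
const ROMAJI: &[(&str, &str)] = &[
    ("kya", "きゃ"), ("sha", "しゃ"), ("shi", "し"), ("chi", "ち"), ("tsu", "つ"),
    ("ka", "か"), ("ki", "き"), ("ku", "く"), ("ke", "け"), ("ko", "こ"),
    ("sa", "さ"), ("su", "す"), ("se", "せ"), ("so", "そ"),
    ("ta", "た"), ("te", "て"), ("to", "と"),
    ("na", "な"), ("ni", "に"), ("nu", "ぬ"), ("ne", "ね"), ("no", "の"),
    ("ha", "は"), ("hi", "ひ"), ("fu", "ふ"), ("he", "へ"), ("ho", "ほ"),
    ("ji", "じ"),
    ("a", "あ"), ("i", "い"), ("u", "う"), ("e", "え"), ("o", "お"),
];

fn is_vowel(c: char) -> bool {
    matches!(c, 'a' | 'i' | 'u' | 'e' | 'o')
}

/// Converts lowercase romaji to hiragana by greedy longest match.
fn romaji_to_hiragana(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        let next = chars.get(i + 1).copied();

        // A doubled consonant other than "n" (e.g. the "tt" in "kitte") becomes small っ.
        if next == Some(c) && !is_vowel(c) && c != 'n' {
            out.push('っ');
            i += 1;
            continue;
        }

        // Try the longest table key first (keys in this table are at most 3 letters).
        let matched = (1..=3).rev().find_map(|len| {
            let end = i + len;
            if end > chars.len() {
                return None;
            }
            let key: String = chars[i..end].iter().collect();
            ROMAJI
                .iter()
                .find(|(r, _)| *r == key.as_str())
                .map(|&(_, kana)| (kana, len))
        });
        if let Some((kana, len)) = matched {
            out.push_str(kana);
            i += len;
            continue;
        }

        if c == 'n' && next.map_or(true, |n| !is_vowel(n) && n != 'y') {
            // "n" not followed by a vowel or "y" becomes ん.
            out.push('ん');
        } else {
            // Anything the table cannot convert is passed through unchanged.
            out.push(c);
        }
        i += 1;
    }
    out
}
```

With these rules, `kanji` becomes かんじ, `kitte` becomes きって, and `konnichiha` becomes こんにちは. A real IME also has to handle input that is still incomplete while the user is typing (e.g., a trailing `k`). This one-shot sketch ignores that case.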

Features

Neural Kana-Kanji Conversion: Converts kana to kanji using a GPT-2-based model, with inference run through llama.cpp.

Live Conversion: Conversion results are displayed in real time as you type, without pressing Space. Toggle it on or off with Ctrl+Shift+L.

Context Awareness: Conversion takes the surrounding text into account.

Conversion Learning: Remembers the conversion results you select and ranks them higher in later conversions. Predictive conversion (prefix matching) suggests learned candidates while you are still typing.
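Below is a minimal sketch of how a learning store with predictive (prefix-matching) lookup could work. It assumes learned entries are kept in a sorted map keyed by reading. `LearningStore`, `record`, and `predict` are hypothetical names and are not karukan's API. The sketch relies on one property of sorted maps: every reading that shares a prefix sorts contiguously, starting at the prefix itself. A range scan that begins at the prefix therefore finds all matches and can stop at the first non-match.

```rust
use std::collections::BTreeMap;

/// Hypothetical learning store: reading -> (surface form -> times the user chose it).
#[derive(Default)]
struct LearningStore {
    entries: BTreeMap<String, BTreeMap<String, u32>>,
}

impl LearningStore {
    /// Records that the user committed `surface` for `reading`.
    fn record(&mut self, reading: &str, surface: &str) {
        *self
            .entries
            .entry(reading.to_string())
            .or_default()
            .entry(surface.to_string())
            .or_insert(0) += 1;
    }

    /// Predictive lookup: learned surfaces whose reading starts with `prefix`,
    /// most frequently chosen first (ties broken by string order for stable output).
    fn predict(&self, prefix: &str) -> Vec<String> {
        let mut hits: Vec<(&String, u32)> = self
            .entries
            // Keys sharing a prefix sort contiguously, starting at the prefix itself.
            .range(prefix.to_string()..)
            .take_while(|(reading, _)| reading.starts_with(prefix))
            .flat_map(|(_, surfaces)| surfaces.iter().map(|(s, &count)| (s, count)))
            .collect();
        hits.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
        hits.into_iter().map(|(s, _)| s.clone()).collect()
    }
}
```

Suppose the user has picked 今日 twice and 京 once for きょう, and 京都 once for きょうと. Then `predict("きょ")` returns 今日, 京, 京都 in that order. Ranking purely by selection count is itself an assumption. A real engine would likely also weigh recency and merge these hits with dictionary and model candidates.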

System Dictionary: Built from SudachiDict dictionary data.

Candidate Rewriter (ported from Mozc): Automatically generates candidates for half-width Katakana, English letters (uppercase/lowercase, full-width/half-width), symbols, and various numeric notations (Kanji numerals, Daiji formal numerals such as 壱 and 弐, Roman numerals, circled numbers, hexadecimal/octal/binary). Each candidate carries an annotation derived from Mozc (e.g., “Half-width Katakana”, “Hexadecimal”).
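The numeric part of the rewriter can be sketched as a function that expands one number into several annotated notations. This is an illustration only, not the code ported from Mozc. `Candidate`, `numeric_candidates`, the `0x`/`0o`/`0b` prefixes, and the English annotation labels are all hypothetical. Kanji numerals are rendered digit by digit (e.g., 一二 for 12), and Daiji are omitted.

```rust
/// A hypothetical candidate: the converted text plus an annotation label.
#[derive(Debug)]
struct Candidate {
    value: String,
    annotation: &'static str,
}

/// Standard Roman numerals, defined here only for 1..=3999.
fn to_roman(mut n: u32) -> Option<String> {
    if n == 0 || n > 3999 {
        return None;
    }
    const TABLE: [(u32, &str); 13] = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
        (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ];
    let mut s = String::new();
    for &(value, symbol) in TABLE.iter() {
        while n >= value {
            s.push_str(symbol);
            n -= value;
        }
    }
    Some(s)
}

/// Circled numbers ① (U+2460) through ⑳ (U+2473) are contiguous code points.
fn to_circled(n: u32) -> Option<char> {
    if (1..=20).contains(&n) {
        char::from_u32(0x2460 + n - 1)
    } else {
        None
    }
}

/// Kanji numerals written digit by digit (12 -> 一二), not the 十/百 form.
fn to_kanji_digits(n: u32) -> String {
    const DIGITS: [char; 10] = ['〇', '一', '二', '三', '四', '五', '六', '七', '八', '九'];
    n.to_string()
        .chars()
        // Every char of a u32's decimal string is an ASCII digit, so unwrap cannot fail.
        .map(|c| DIGITS[c.to_digit(10).unwrap() as usize])
        .collect()
}

fn numeric_candidates(n: u32) -> Vec<Candidate> {
    let mut out = vec![
        Candidate { value: to_kanji_digits(n), annotation: "Kanji numerals" },
        Candidate { value: format!("0x{:x}", n), annotation: "Hexadecimal" },
        Candidate { value: format!("0o{:o}", n), annotation: "Octal" },
        Candidate { value: format!("0b{:b}", n), annotation: "Binary" },
    ];
    // Notations with a limited range are only offered when they apply.
    if let Some(roman) = to_roman(n) {
        out.push(Candidate { value: roman, annotation: "Roman numeral" });
    }
    if let Some(circled) = to_circled(n) {
        out.push(Candidate { value: circled.to_string(), annotation: "Circled number" });
    }
    out
}
```

For 12, this yields 一二, 0xc, 0o14, 0b1100, XII, and ⑫. For 4000, it yields only the four unbounded notations, because Roman numerals and circled numbers are not defined for that value here.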

Emoji Input: Supports both Kana readings (e.g., pien → 🥺, kinniku → 💪) and Slack-style :trigger queries (e.g., :smile → 😄, :halo → 😇).
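Slack-style `:trigger` queries can be sketched as prefix matching over a shortcode table. The table below is a tiny hypothetical subset, and `emoji_candidates` is an illustrative name. karukan's actual emoji data and matching rules (for example, how results are ranked) may differ.

```rust
/// Tiny hypothetical shortcode table, kept sorted by shortcode.
const SHORTCODES: &[(&str, &str)] = &[
    ("halo", "😇"),
    ("heart_eyes", "😍"),
    ("smile", "😄"),
    ("smiley", "😃"),
    ("sob", "😭"),
];

/// Returns emoji whose shortcode starts with the text after a leading ':'.
/// Input without a leading ':' (or a bare ':') yields no candidates.
fn emoji_candidates(input: &str) -> Vec<&'static str> {
    match input.strip_prefix(':') {
        Some(query) if !query.is_empty() => SHORTCODES
            .iter()
            .filter(|(code, _)| code.starts_with(query))
            .map(|&(_, emoji)| emoji)
            .collect(),
        _ => Vec::new(),
    }
}
```

Typing `:smi` offers both 😄 and 😃, and the list narrows as more characters are typed. Plain input such as `smile` is left to normal conversion.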

Note: The model is downloaded from Hugging Face on first launch, so the first conversion may take a while to start. Later launches use the cached model.

Installation

Linux (fcitx5): See the karukan-fcitx5 README.
macOS: See the karukan-macos README.

License

Dual-licensed under MIT OR Apache-2.0.

Data under karukan-engine/data/ includes files derived from Mozc (Google's Japanese Input System), which is distributed under the BSD 3-Clause License. See THIRD_PARTY_LICENSES for the origin of each derived file and for Mozc's copyright notice.