shiyu-coder / Kronos

Kronos: A Foundation Model for the Language of Financial Markets

Kronos is the first open-source foundation model for financial candlesticks (K-lines), trained on data from over 45 global exchanges.

📰 News

🚩 [2025.11.10] Kronos has been accepted by AAAI 2026.

🚩 [2025.08.17] We have released the scripts for fine-tuning! Check them out to adapt Kronos to your own tasks.

🚩 [2025.08.02] Our paper is now available on arXiv!

📜 Introduction

Kronos is a family of decoder-only foundation models, pre-trained specifically for the “language” of financial markets: K-line sequences. Unlike general-purpose time series foundation models (TSFMs), Kronos is designed to handle the unique, high-noise characteristics of financial data. It uses a novel two-stage framework. First, a specialized tokenizer quantizes continuous, multi-dimensional K-line data (OHLCV) into hierarchical discrete tokens. Second, a large autoregressive Transformer is pre-trained on these tokens, enabling it to serve as a unified model for diverse quantitative tasks.
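To give a feel for what "hierarchical discrete tokens" can mean, here is a toy sketch. It is not Kronos's tokenizer: it assumes a hypothetical scheme in which an encoder has already produced a small latent vector for one candlestick, each component is reduced to one bit by its sign, and the bits are split into a coarse and a fine sub-token. The function names, the latent values, and the bit split are all illustrative assumptions.

```python
from typing import List, Tuple


def bits_to_int(bits: List[int]) -> int:
    """Interpret a list of 0/1 bits as an unsigned integer, most significant bit first."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def toy_hierarchical_token(latent: List[float], coarse_bits: int) -> Tuple[int, int]:
    """Quantize a latent vector into a (coarse, fine) pair of discrete tokens.

    Each component becomes one bit (1 if positive, else 0). The first
    `coarse_bits` bits form the coarse token; the remaining bits form the
    fine token. This is an illustrative scheme, not the Kronos tokenizer.
    """
    if not 0 < coarse_bits < len(latent):
        raise ValueError("coarse_bits must split the vector into two non-empty parts")
    bits = [1 if x > 0 else 0 for x in latent]
    return bits_to_int(bits[:coarse_bits]), bits_to_int(bits[coarse_bits:])


# Hypothetical 6-dimensional latent for one candlestick (values are made up).
latent = [0.8, -0.1, 0.3, -0.7, 0.2, 0.9]
coarse, fine = toy_hierarchical_token(latent, coarse_bits=3)
print(coarse, fine)
```

Splitting one large code into two smaller sub-tokens keeps each vocabulary small, which is one plausible motivation for a hierarchical design.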

✨ Live Demo

We have set up a live demo to visualize Kronos’s forecasting results. The webpage showcases a forecast for the BTC/USDT trading pair over the next 24 hours.

👉 Access the Live Demo Here

📦 Model Zoo

We release a family of pre-trained models with varying capacities to suit different computational and application needs. All models are readily accessible from the Hugging Face Hub.

Model	Tokenizer	Context length	Params	Open-source
Kronos-mini	Kronos-Tokenizer-2k	2048	4.1M	✅ NeoQuasar/Kronos-mini
Kronos-small	Kronos-Tokenizer-base	512	24.7M	✅ NeoQuasar/Kronos-small
Kronos-base	Kronos-Tokenizer-base	512	102.3M	✅ NeoQuasar/Kronos-base
Kronos-large	Kronos-Tokenizer-base	512	499.2M	❌

🚀 Getting Started

Installation

Install Python 3.10+, and then install the dependencies:

pip install -r requirements.txt
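If you are unsure which interpreter your environment uses, a quick check like the following can confirm the Python 3.10+ requirement before installing. The helper name is hypothetical and not part of Kronos.

```python
import sys


def python_is_supported(version_info=sys.version_info, minimum=(3, 10)) -> bool:
    """Return True when the interpreter's (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    found = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {found} found; 3.10 or newer is expected.")
```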

📈 Making Forecasts

Forecasting with Kronos is straightforward using the KronosPredictor class. It handles data preprocessing, normalization, prediction, and inverse normalization, allowing you to get from raw data to forecasts in just a few lines of code.
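The normalization and inverse normalization steps can be pictured with a small sketch. It assumes a z-score scheme that uses the mean and standard deviation of the lookback window; the README does not state which scheme KronosPredictor actually uses, so treat the function names and the `eps` term as assumptions.

```python
from statistics import fmean, pstdev
from typing import List, Tuple


def normalize(values: List[float], eps: float = 1e-5) -> Tuple[List[float], float, float]:
    """Z-score a series and return the scaled values with the statistics used."""
    mean = fmean(values)
    std = pstdev(values)
    # eps guards against division by zero when the series is constant.
    scaled = [(v - mean) / (std + eps) for v in values]
    return scaled, mean, std


def denormalize(scaled: List[float], mean: float, std: float, eps: float = 1e-5) -> List[float]:
    """Map model-space values back to the original price scale."""
    return [s * (std + eps) + mean for s in scaled]


closes = [10.0, 10.5, 10.2, 10.8]  # made-up closing prices
scaled, mean, std = normalize(closes)
restored = denormalize(scaled, mean, std)
print(restored)
```

In a forecasting setting, the statistics would be computed on the historical window only and then reused to map the model's outputs back to prices.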

Important Note: The max_context for Kronos-small and Kronos-base is 512. This is the maximum sequence length the model can process. For optimal performance, keep your input data length (i.e., lookback) within this limit. KronosPredictor automatically truncates inputs that are longer.
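If you prefer to truncate explicitly rather than rely on the predictor, a sketch like the one below works. It assumes that the most recent rows are the ones worth keeping; the README does not say which end KronosPredictor trims, and the helper name is hypothetical.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def keep_most_recent(rows: Sequence[T], max_context: int = 512) -> Sequence[T]:
    """Return at most the last `max_context` items of a sequence."""
    if max_context <= 0:
        raise ValueError("max_context must be positive")
    # Negative slicing returns the whole sequence when it is already short enough.
    return rows[-max_context:]


history = list(range(600))  # stand-in for 600 historical bars
window = keep_most_recent(history, max_context=512)
print(len(window))
```

For a pandas DataFrame, the equivalent positional slice is `df.iloc[-max_context:]`, applied to both the data and its timestamps so the two stay aligned.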

Here is a step-by-step guide to making your first forecast.

1. Load the Tokenizer and Model

First, load a pre-trained Kronos model and its corresponding tokenizer from the Hugging Face Hub.

from model import Kronos, KronosTokenizer, KronosPredictor

# Load from Hugging Face Hub
tokenizer = KronosTokenizer.from_pretrained("NeoQuasar/Kronos-Tokenizer-base")
model = Kronos.from_pretrained("NeoQuasar/Kronos-small")

2. Instantiate the Predictor

Create an instance of KronosPredictor, passing the model, the tokenizer, and the maximum context length (max_context).

# Initialize the predictor
predictor = KronosPredictor(model, tokenizer, max_context=512)

3. Prepare Input Data

The predict method requires three main inputs:

df: A pandas DataFrame containing the historical K-line data. It must include columns ['open', 'high', 'low', 'close']. volume and amount are optional.
x_timestamp: A pandas Series of timestamps corresponding to the historical data in df.
y_timestamp: A pandas Series of timestamps for the future periods you want to predict.
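Before calling predict, it can help to confirm that the required columns are present. The following is a minimal sketch; the helper and constant names are hypothetical and not part of the Kronos API.

```python
from typing import Iterable, List

REQUIRED_COLUMNS = ("open", "high", "low", "close")
OPTIONAL_COLUMNS = ("volume", "amount")


def missing_required_columns(columns: Iterable[str]) -> List[str]:
    """Return the required K-line columns that are absent, in canonical order."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# With a DataFrame, pass its column index: missing_required_columns(df.columns)
print(missing_required_columns(["open", "close", "volume"]))
```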

import pandas as pd

# Load your data
df = pd.read_csv("./data/XSHG_5min_600977.csv")
df['timestamps'] = pd.to_datetime(df['timestamps'])

# Define context window and prediction length
lookback = 400
pred_len = 120

# Prepare inputs for the predictor
x_df = df.loc[:lookback-1, ['open', 'high', 'low', 'close', 'volume', 'amount']]
x_timestamp = df.loc[:lookback-1, 'timestamps']
y_timestamp = df.loc[lookback:lookback+pred_len-1, 'timestamps']
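The example above slices y_timestamp from rows that already exist in the file, which suits backtesting. When forecasting beyond the end of your data, you need to construct the future timestamps yourself. The sketch below assumes evenly spaced 5-minute bars and ignores trading-session gaps, weekends, and holidays; the helper name and the `freq` default are assumptions.

```python
import pandas as pd


def future_timestamps(x_timestamp: pd.Series, pred_len: int, freq: str = "5min") -> pd.Series:
    """Build `pred_len` evenly spaced timestamps that follow the last historical one."""
    last = x_timestamp.iloc[-1]
    step = pd.Timedelta(freq)
    # Start one step after the last observed bar.
    return pd.Series(pd.date_range(start=last + step, periods=pred_len, freq=freq))


history = pd.Series(pd.date_range("2024-01-02 09:30", periods=4, freq="5min"))
y_timestamp = future_timestamps(history, pred_len=3)
print(y_timestamp.tolist())
```

For exchange data with lunch breaks or overnight gaps, you would likely replace the plain date range with a calendar that reflects actual trading sessions.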

4. Generate Forecasts

Call the predict method to generate forecasts. You can control the sampling process with parameters like T, top_p, and sample_count for probabilistic forecasting.

# Generate predictions
pred_df = predictor.predict(
    df=x_df,
    x_timestamp=x_timestamp,
    y_timestamp=y_timestamp,
    pred_len=pred_len,
    T=1.0,          # Temperature for sampling
    top_p=0.9,      # Nucleus sampling probability
    sample_count=1  # Number of forecast paths to generate and average
)

print("Forecasted Data Head:")
print(pred_df.head())

The predict method returns a pandas DataFrame containing the forecasted values for open, high, low, close, volume, and amount, indexed by the y_timestamp you provided.
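To build intuition for the T and top_p parameters, the sketch below applies standard temperature scaling and nucleus (top-p) filtering to a toy set of logits. It illustrates the general technique only; whether Kronos implements sampling in exactly this way is not stated in this README, and all function names here are hypothetical.

```python
import math
import random
from typing import List, Optional


def temperature_softmax(logits: List[float], T: float = 1.0) -> List[float]:
    """Convert logits to probabilities; lower T sharpens, higher T flattens."""
    scaled = [x / T for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: List[float], top_p: float = 0.9) -> List[float]:
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(logits: List[float], T: float = 1.0, top_p: float = 0.9,
                 rng: Optional[random.Random] = None) -> int:
    """Draw one token index from the temperature-scaled, top-p-filtered distribution."""
    rng = rng or random.Random()
    probs = top_p_filter(temperature_softmax(logits, T), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.0, -1.0]
print(sample_token(toy_logits, T=1.0, top_p=0.9, rng=random.Random(0)))
```

With these toy logits and top_p=0.9, the least likely token is excluded from sampling entirely, which is how nucleus sampling suppresses low-probability outcomes.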

For efficient processing of multiple time series, Kronos provides a predict_batch method that enables parallel prediction on multiple datasets simultaneously. This is particularly useful when you need to forecast multiple assets or time periods at once.

# Prepare multiple datasets for batch prediction
df_list = [df1, df2, df3] # List of DataFrames
x_timestamp_list = [x_ts1, x_ts2, x_ts3] # List of historical timestamps
y_timestamp_list = [y_ts1, y_ts2, y_ts3] # List of future timestamps

# Generate batch predictions
pred_df_list = predictor.predict_batch(
    df_list=df_list,
    x_timestamp_list=x_timestamp_list,
    y_timestamp_list=y_timestamp_list,
    pred_len=pred_len,
    T=1.0,
    top_p=0.9,
    sample_count=1,
    verbose=True
)

# pred_df_list contains prediction results in the same order as input
for i, pred_df in enumerate(pred_df_list):
    print(f"Predictions for series {i}:")
    print(pred_df.head())

Important Requirements for Batch Prediction:

All series must have the same historical length (lookback window).
All series must have the same prediction length (pred_len).
Each DataFrame must contain the required columns: ['open', 'high', 'low', 'close'].
volume and amount columns are optional and will be filled with zeros if missing.
The predict_batch method leverages GPU parallelism for efficient processing and automatically handles normalization and denormalization for each series independently.
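The batch requirements listed above can be checked up front with a small helper. This is a sketch under the assumption that each item in df_list supports `len()` and exposes a `columns` attribute, as a pandas DataFrame does; the function name is hypothetical and not part of Kronos.

```python
from typing import List, Sequence

REQUIRED_COLUMNS = ("open", "high", "low", "close")


def batch_input_problems(df_list: Sequence, y_timestamp_list: Sequence, pred_len: int) -> List[str]:
    """Return human-readable problems found in batch inputs; an empty list means none."""
    problems = []
    if len(df_list) != len(y_timestamp_list):
        problems.append("df_list and y_timestamp_list differ in length")
    # All series must share one lookback length.
    lookbacks = {len(df) for df in df_list}
    if len(lookbacks) > 1:
        problems.append(f"series have different lookback lengths: {sorted(lookbacks)}")
    # Every future timestamp series must match pred_len.
    for i, y_ts in enumerate(y_timestamp_list):
        if len(y_ts) != pred_len:
            problems.append(f"series {i}: expected {pred_len} future timestamps, got {len(y_ts)}")
    # Every frame needs the required price columns.
    for i, df in enumerate(df_list):
        present = set(df.columns)
        missing = [c for c in REQUIRED_COLUMNS if c not in present]
        if missing:
            problems.append(f"series {i}: missing columns {missing}")
    return problems
```

Calling `batch_input_problems(df_list, y_timestamp_list, pred_len)` before `predict_batch` and stopping when the returned list is non-empty gives clearer error messages than a shape mismatch deep inside the model.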