stefan-jansen / machine-learning-for-trading

ML for Trading - 2nd Edition

This book aims to show how ML can add value to algorithmic trading strategies in a practical yet comprehensive way. It covers a broad range of ML techniques from linear regression to deep reinforcement learning and demonstrates how to build, backtest, and evaluate a trading strategy driven by model predictions.

Across four parts with 23 chapters plus an appendix, the book spans more than 800 pages and covers: important aspects of data sourcing, financial feature engineering, and portfolio management; the design and evaluation of long-short strategies based on supervised and unsupervised ML algorithms; how to extract tradeable signals from financial text data such as SEC filings, earnings call transcripts, or financial news; how to use deep learning models such as CNNs and RNNs with market and alternative data; how to generate synthetic data with generative adversarial networks; and how to train a trading agent using deep reinforcement learning.

This repo contains over 150 notebooks that put the concepts, algorithms, and use cases discussed in the book into action. They provide numerous examples that show: how to work with and extract signals from market, fundamental, and alternative text and image data; how to train and tune models that predict returns for different asset classes and investment horizons, including how to replicate recently published research; and how to design, backtest, and evaluate trading strategies.

We highly recommend reviewing the notebooks while reading the book; they are usually in an executed state and often contain additional information that the book omits due to space constraints. In addition to the information in this repo, the book’s website contains chapter summaries and additional information.

Join the ML4T Community!

To make it easy for readers to ask questions about the book’s content and code examples, as well as the development and implementation of their own strategies and industry developments, we are hosting an online platform. Please join our community to connect with fellow traders interested in leveraging ML for trading strategies, share your experience, and learn from each other!

What’s new in the 2nd Edition?

First and foremost, this book demonstrates how you can extract signals from a diverse set of data sources and design trading strategies for different asset classes using a broad range of supervised, unsupervised, and reinforcement learning algorithms. It also provides relevant mathematical and statistical knowledge to facilitate the tuning of an algorithm or the interpretation of the results. Furthermore, it covers the financial background that will help you work with market and fundamental data, extract informative features, and manage the performance of a trading strategy.

From a practical standpoint, the 2nd edition aims to equip you with the conceptual understanding and tools to develop your own ML-based trading strategies. To this end, it frames ML as a critical element in a process rather than a standalone exercise, introducing the end-to-end ML for trading (ML4T) workflow from data sourcing, feature engineering, and model optimization to strategy design and backtesting.

More specifically, the ML4T workflow starts with generating ideas for a well-defined investment universe, collecting relevant data, and extracting informative features. It also involves designing, tuning, and evaluating ML models suited to the predictive task. Finally, it requires developing trading strategies to act on the models’ predictive signals, as well as simulating and evaluating their performance on historical data using a backtesting engine. Once you decide to execute an algorithmic strategy in a real market, you will find yourself iterating over this workflow repeatedly to incorporate new information and a changing environment.
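
The stages described above can be sketched end to end in a few lines. This is a minimal illustration under loud assumptions, not the book's implementation: the return series is simulated, the only feature is the previous day's return, the model is a one-variable least-squares fit, and the "backtest" ignores trading costs and slippage. All function names (`make_returns`, `fit_ols`, `run_workflow`) are hypothetical.

```python
import random
from statistics import mean


def make_returns(n_days=500, seed=42):
    """Simulate daily returns with mild autocorrelation (synthetic, illustrative only)."""
    rng = random.Random(seed)
    returns = [rng.gauss(0, 0.01)]
    for _ in range(n_days - 1):
        returns.append(0.1 * returns[-1] + rng.gauss(0, 0.01))
    return returns


def fit_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = cov_xy / var_x
    return y_bar - slope * x_bar, slope


def run_workflow(returns, train_frac=0.7):
    # 1. Feature engineering: yesterday's return is the single predictor of today's.
    features, targets = returns[:-1], returns[1:]

    # 2. Model training: fit on the earlier part of the sample only.
    split = int(len(features) * train_frac)
    intercept, slope = fit_ols(features[:split], targets[:split])

    # 3. Strategy: go long (+1) or short (-1) according to the sign of the prediction.
    predictions = [intercept + slope * x for x in features[split:]]
    positions = [1 if p > 0 else -1 for p in predictions]

    # 4. Backtest: out-of-sample strategy returns, ignoring costs and slippage.
    strategy_returns = [pos * r for pos, r in zip(positions, targets[split:])]
    return {
        "slope": slope,
        "n_test": len(strategy_returns),
        "mean_daily_return": mean(strategy_returns),
    }


result = run_workflow(make_returns())
print(result)
```

Splitting by time rather than at random keeps the evaluation out of sample: each position depends only on information available before the return it is applied to.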

The second edition’s emphasis on the ML4T workflow translates into a new chapter on strategy backtesting, a new appendix describing over 100 different alpha factors, and many new practical applications. We have also rewritten most of the existing content for clarity and readability. The trading applications now use a broader range of data sources beyond daily US equity prices, including international stocks and ETFs. The book also demonstrates how to use ML for an intraday strategy with minute-frequency equity data. Furthermore, it extends the coverage of alternative data sources to include SEC filings for sentiment analysis and return forecasts, as well as satellite images to classify land use.

Another innovation of the second edition is to replicate several trading applications recently published in top journals. Chapter 18 demonstrates how to apply convolutional neural networks to time series converted to image format for return predictions, based on Sezer and Ozbayoglu (2018). Chapter 20 shows how to extract risk factors conditioned on stock characteristics for asset pricing using autoencoders, based on "Autoencoder Asset Pricing Models" by Shihao Gu, Bryan T. Kelly, and Dacheng Xiu (2019). Chapter 21 shows how to create synthetic training data using generative adversarial networks, based on "Time-series Generative Adversarial Networks" by Jinsung Yoon, Daniel Jarrett, and Mihaela van der Schaar (2019).

All applications now use the latest software versions available at the time of writing, such as pandas 1.0 and TensorFlow 2.2. There is also a customized version of Zipline that makes it easy to include machine learning model predictions when designing a trading strategy.
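
To give a feel for what "time series converted to image format" can mean, here is a simplified sketch. It is an assumption-laden stand-in, not the method of the cited paper or of Chapter 18: it builds a small square grid whose rows are simple moving averages with different windows and whose columns are consecutive days, then min-max scales the grid to [0, 1] so it can be treated like a grayscale image. The helper names `sma` and `to_image`, the window choices, and the price series are all hypothetical.

```python
def sma(prices, window, end):
    """Simple moving average of the `window` prices ending at index `end` (inclusive)."""
    chunk = prices[end - window + 1 : end + 1]
    return sum(chunk) / window


def to_image(prices, end, size=5):
    """Return a size x size grid of scaled SMA values.

    Rows correspond to SMA windows 2..size+1; columns to the last `size` days up to `end`.
    """
    # The longest window on the earliest column reaches back to index end - 2*size + 1.
    if end >= len(prices) or end - 2 * size + 1 < 0:
        raise ValueError("not enough price history for the requested image")

    windows = range(2, size + 2)
    days = range(end - size + 1, end + 1)
    grid = [[sma(prices, w, d) for d in days] for w in windows]

    # Min-max scale the whole grid to [0, 1]; guard against a constant series.
    lo = min(min(row) for row in grid)
    hi = max(max(row) for row in grid)
    scale = (hi - lo) or 1.0
    return [[(v - lo) / scale for v in row] for row in grid]


prices = [100.0 + i for i in range(20)]  # made-up, steadily rising prices
image = to_image(prices, end=19)
for row in image:
    print([round(v, 2) for v in row])
```

A grid like this would be one input sample; a label such as the subsequent return would be attached separately before training a convolutional network.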

Installation, data sources and bug reports

The code examples rely on a wide range of Python libraries from the data science and finance domains. It is not necessary to try to install all libraries at once; doing so increases the likelihood of encountering version conflicts. Instead, we recommend that you install the libraries required for a specific chapter as you go along.

Update March 2022: zipline-reloaded, pyfolio-reloaded, alphalens-reloaded, and empyrical-reloaded are now available on.
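
When installing libraries chapter by chapter, it can help to check first which ones are already present in the active environment. The sketch below uses the standard library's `importlib.metadata` for that; the helper `installed_versions` and the `chapter_requirements` list are hypothetical and not taken from the repo.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Hypothetical per-chapter list; substitute the libraries a given chapter actually needs.
chapter_requirements = ["pandas", "zipline-reloaded", "alphalens-reloaded"]

for name, version in installed_versions(chapter_requirements).items():
    print(f"{name}: {version or 'not installed'}")
```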
