How China-focused funds turn Weibo into alt-data (Python, 2026)

If you run a China book (equities, FX, commodities, or just a macro tilt), you already know the problem: the official numbers are slow, and English-language coverage lags what has already moved on Chinese social platforms. By the time a theme reaches Bloomberg, retail Weibo has often been talking about it for days.

Weibo (微博) is where Chinese consumer and retail-investor sentiment shows up first: 580M+ monthly actives, a public hot-search board that turns over hourly, and cashtag-style chatter on every listed name. The catch: there’s no official API for international developers, and the data is in Chinese.

This post walks through how to pull Weibo into a usable alt-data feed with a few lines of Python, covering hot-search trend tracking, keyword/cashtag sentiment, and KOL post monitoring. It uses an Apify Actor I maintain, so you don’t have to babysit visitor cookies or rate limits.

The three signals worth pulling

  1. Hot search board (the leading indicator). Weibo’s trending board is one of the fastest reads on what Chinese internet users are paying attention to. A brand, a policy rumor, a product recall, a CEO quote: it surfaces here first. For a fund, the delta matters more than the snapshot: what entered the board in the last hour, and how fast it’s climbing.

  2. Keyword / cashtag sentiment. Search a ticker’s Chinese name, a brand, or a product line and you get the raw retail read: positive or negative tone, the volume of chatter, and which posts have reach. This is the consumer-demand nowcast that quarterly filings give you 90 days late.

  3. KOL post monitoring. A single finance or consumer KOL with 5M followers can move retail flows in hours. Tracking specific accounts’ posts (and their engagement velocity) is a cleaner signal than aggregate noise.
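To make "engagement velocity" concrete, here is a minimal sketch that ranks posts by interactions per hour since posting. It assumes each post is a dict carrying the repostsCount, commentsCount, and likesCount fields plus a createdAt value already parsed into a timezone-aware datetime; the sample posts, the engagement_velocity helper, and the 15-minute age floor are illustrative choices, not part of the Actor.

```python
from datetime import datetime, timedelta, timezone

ENGAGEMENT_FIELDS = ("repostsCount", "commentsCount", "likesCount")

def engagement_velocity(post, now):
    """Interactions per hour since the post went up (hypothetical helper)."""
    engagement = sum(post.get(field) or 0 for field in ENGAGEMENT_FIELDS)
    age_hours = (now - post["createdAt"]).total_seconds() / 3600
    # Treat anything younger than 15 minutes as 15 minutes old,
    # so a brand-new post does not produce an exploding rate.
    return engagement / max(age_hours, 0.25)

# Made-up posts for illustration only.
now = datetime(2026, 1, 5, 12, 0, tzinfo=timezone.utc)
posts = [
    {"id": "a", "createdAt": now - timedelta(hours=2),
     "repostsCount": 100, "commentsCount": 50, "likesCount": 850},
    {"id": "b", "createdAt": now - timedelta(hours=20),
     "repostsCount": 400, "commentsCount": 600, "likesCount": 3000},
]

ranked = sorted(posts, key=lambda p: engagement_velocity(p, now), reverse=True)
for p in ranked:
    print(p["id"], round(engagement_velocity(p, now), 1))
```

Post "b" has four times the total engagement, but post "a" is accumulating it faster, which is the property a velocity measure is meant to surface.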


Pull the hot-search board

from apify_client import ApifyClient
client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("zhorex/weibo-scraper").call(run_input={
    "mode": "hot_search",
    "maxResults": 100,
})

for topic in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(topic["rank"], topic["title"], topic.get("heat"))

Run this on a cron every 30-60 minutes and diff consecutive snapshots. A topic that jumps 40 ranks in an hour is the alpha; its absolute position is not.
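The snapshot diff can be sketched in a few lines. This assumes each snapshot is a list of dicts with the title and rank keys used above, that rank is an integer with 1 at the top, and that titles are stable between runs; rank_movers and its min_jump threshold are hypothetical names for illustration.

```python
def rank_movers(prev, curr, min_jump=10):
    """Return topics that are new to the board or climbed at least min_jump ranks."""
    prev_rank = {t["title"]: t["rank"] for t in prev}
    movers = []
    for t in curr:
        old = prev_rank.get(t["title"])
        if old is None:
            # Not on the previous snapshot: a fresh entry.
            movers.append({"title": t["title"], "rank": t["rank"], "jump": None, "new": True})
        elif old - t["rank"] >= min_jump:
            # Lower rank number means higher on the board, so old - new is the climb.
            movers.append({"title": t["title"], "rank": t["rank"], "jump": old - t["rank"], "new": False})
    # New entries first, then the biggest climbers.
    movers.sort(key=lambda m: (not m["new"], -(m["jump"] or 0)))
    return movers

# Made-up snapshots for illustration only.
prev = [{"title": "A", "rank": 5}, {"title": "B", "rank": 48}, {"title": "C", "rank": 10}]
curr = [{"title": "B", "rank": 6}, {"title": "A", "rank": 7}, {"title": "D", "rank": 30}]

movers = rank_movers(prev, curr)
for m in movers:
    print(m)
```

In practice you would persist each run's items (a JSON file or a database table keyed by timestamp is enough) and pass the last two snapshots to the function.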


Keyword sentiment as a consumer nowcast

Say you’re long a Chinese EV name and want the retail read before the delivery numbers print:

from apify_client import ApifyClient
import pandas as pd

client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("zhorex/weibo-scraper").call(run_input={
    "mode": "search",
    "searchQuery": "比亚迪", # BYD in Chinese — Chinese keywords yield far better recall
    "maxResults": 300,
})

df = pd.DataFrame(client.dataset(run["defaultDatasetId"]).iterate_items())

# Engagement-weight the chatter: rank posts by reposts + comments + likes (a proxy for reach; follower counts are not used here).
df["reach"] = df["repostsCount"].fillna(0) + df["commentsCount"].fillna(0) + df["likesCount"].fillna(0)
print(df.sort_values("reach", ascending=False)[["text", "reach", "createdAt"]].head(10))

Pipe the text field through whatever Chinese sentiment model you already run (or a multilingual LLM) and you have a daily polarity series per name. Track the 7-day delta in mention volume and polarity and you’ve built a sentiment-velocity factor for a few cents per run.
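One way to turn scored posts into that factor is sketched below. It assumes you have already added a polarity column in [-1, 1] from your own model and that createdAt parses with pd.to_datetime; both are assumptions about your pipeline, and sentiment_velocity is a hypothetical helper name.

```python
import pandas as pd

def sentiment_velocity(posts: pd.DataFrame, window: int = 7) -> pd.DataFrame:
    """Daily mention count and mean polarity, plus their change over `window` days."""
    ts = pd.to_datetime(posts["createdAt"])
    daily = (
        posts.assign(day=ts.dt.floor("D"))
        .groupby("day")
        .agg(mentions=("polarity", "size"), polarity=("polarity", "mean"))
        .asfreq("D")  # insert rows for days with no posts
    )
    daily["mentions"] = daily["mentions"].fillna(0)  # no posts means zero mentions
    # Polarity stays NaN on empty days: there is nothing to average.
    daily["mentions_delta"] = daily["mentions"].diff(window)
    daily["polarity_delta"] = daily["polarity"].diff(window)
    return daily

# Made-up scored posts for illustration only.
posts = pd.DataFrame({
    "createdAt": ["2026-01-01T09:00:00", "2026-01-01T12:00:00",
                  "2026-01-08T10:00:00", "2026-01-08T11:00:00", "2026-01-08T15:00:00"],
    "polarity": [0.2, 0.4, -0.5, -0.1, 0.0],
})

daily = sentiment_velocity(posts)
print(daily.tail(1))
```

Here mention volume is up by one post week over week while mean polarity has dropped by about 0.5. More chatter with a worse tone is the kind of combination the factor is meant to flag.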


Build a daily China alt-data job

Two Actors work best together: Weibo for broad consumer and retail sentiment, and the Xueqiu Scraper for finance-specific cashtag chatter (Xueqiu is China’s retail-investor forum, closer to a StockTwits read). Run both on the same cron, join on ticker, and you get consumer sentiment and investor sentiment side by side.
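The join step might look like the sketch below. It assumes you have already reduced each source to one row per ticker; the column names, the ticker format, and every number here are placeholders, since the Xueqiu Scraper's output schema is not shown in this post.

```python
import pandas as pd

# Placeholder per-ticker summaries; values are made up for illustration.
weibo = pd.DataFrame({
    "ticker": ["002594.SZ", "9992.HK"],
    "weibo_mentions": [180, 95],
    "weibo_polarity": [0.12, 0.40],
})
xueqiu = pd.DataFrame({
    "ticker": ["002594.SZ", "LKNCY"],
    "xueqiu_mentions": [60, 25],
    "xueqiu_polarity": [-0.05, 0.10],
})

# An outer join keeps names that only one source covered;
# the _merge column records which side each row came from.
joined = weibo.merge(xueqiu, on="ticker", how="outer", indicator=True)

# Consumer tone minus investor tone; NaN where one side is missing.
joined["divergence"] = joined["weibo_polarity"] - joined["xueqiu_polarity"]
print(joined)
```

The Weibo half of that job, reduced to raw mention counts, is the loop that follows.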

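from apify_client import ApifyClient
import pandas as pd

client = ApifyClient("YOUR_APIFY_TOKEN")

# English name -> Chinese search keyword
tickers = {"BYD": "比亚迪", "Pop Mart": "泡泡玛特", "Luckin": "瑞幸咖啡"}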
rows = []

for name, zh in tickers.items():
    run = client.actor("zhorex/weibo-scraper").call(run_input={
        "mode": "search",
        "searchQuery": zh,
        "maxResults": 200,
    })
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    rows.append({"name": name, "mentions": len(items)})

print(pd.DataFrame(rows).sort_values("mentions", ascending=False))

Diff today’s mention counts against a trailing 7-day mean and you have a chatter-velocity screen across your whole China book.
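A minimal version of that screen, assuming you store each day's counts in a DataFrame with one row per date and one column per name (the storage layout, the sample numbers, and the chatter_velocity helper are all illustrative):

```python
import pandas as pd

def chatter_velocity(history: pd.DataFrame, window: int = 7) -> pd.DataFrame:
    """Compare the latest day's mentions with the mean of the `window` days before it."""
    today = history.iloc[-1]
    trailing = history.iloc[-(window + 1):-1].mean()
    out = pd.DataFrame({"today": today, "trailing_mean": trailing})
    # A ratio above 1 means more chatter than usual; a zero trailing mean gives inf or NaN.
    out["ratio"] = out["today"] / out["trailing_mean"]
    return out.sort_values("ratio", ascending=False)

# Made-up daily mention counts for illustration only.
history = pd.DataFrame(
    {
        "BYD": [100] * 7 + [150],
        "Pop Mart": [50] * 7 + [200],
        "Luckin": [200] * 7 + [180],
    },
    index=pd.date_range("2026-01-01", periods=8, freq="D"),
)

screen = chatter_velocity(history)
print(screen)
```

One caveat: a count taken as the length of a capped pull tops out at maxResults, so a busy name can look flat. Counting only posts whose createdAt falls on the day in question, or raising the cap, avoids that saturation.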


Pricing

The Weibo Scraper is pay-per-event: you pay per item returned, with no subscription and no seat fee. A 300-post sentiment pull is a few cents. A daily 20-ticker monitoring job lands in the low tens of dollars per month. Compare that to a Bloomberg China module or a packaged alt-data feed and the math is not close.

Job | Volume
Hourly hot-search tracker | ~70K topics/mo
20-ticker daily sentiment | ~120K posts/mo
One-off theme research | a few K posts

See the Actor’s Pricing tab for the exact per-result rate.
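The volumes above appear consistent with the maxResults values used in this post (100 topics per hourly run, 200 posts per ticker per day). The sketch below reproduces that arithmetic; the per-1,000-item rate is a placeholder to replace with the figure from the Pricing tab, not the Actor's actual price.

```python
def monthly_items(runs_per_day: int, items_per_run: int, days: int = 30) -> int:
    """Total items returned over a month of scheduled runs."""
    return runs_per_day * items_per_run * days

def monthly_cost(items: int, rate_per_1k_usd: float) -> float:
    """Cost in USD at a given price per 1,000 items."""
    return items / 1000 * rate_per_1k_usd

hot_search = monthly_items(runs_per_day=24, items_per_run=100)  # hourly board pulls
sentiment = monthly_items(runs_per_day=20, items_per_run=200)   # 20 tickers, one pull each per day

PLACEHOLDER_RATE = 0.20  # USD per 1,000 items; hypothetical, check the Pricing tab

print(hot_search, sentiment)
print(monthly_cost(sentiment, PLACEHOLDER_RATE))
```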


What this is NOT

  • Not real-time tick data. Cron-based polling; a 30-60 min cadence is realistic and plenty for sentiment.
  • Not a sentiment model. It returns the raw posts, engagement, and metadata. You bring (or plug in) the NLP.
  • Not authenticated content. Public surface only: hot search, public search results, public profiles. Some modes (user timelines) work better with your own session cookie, which is optional.
  • Not financial advice or a signal in a box. It’s a data feed. The factor construction is yours.

The broader China stack

If Weibo is the consumer + retail-sentiment layer, the rest of the stack fills in the gaps:

  • Xueqiu Scraper: retail-investor forum, cashtag-tagged, the finance-specific sentiment read.
  • RedNote / Xiaohongshu Scraper: consumer-brand and product sentiment, the highest-trust purchase-decision channel in China.
  • Bilibili Scraper: Gen-Z video sentiment and creator analytics.
  • Chinese Brand Monitor: if you’d rather not wire up four scrapers, this aggregates Weibo + RedNote + Bilibili + Douban + Xueqiu into one normalized, deduplicated, sentiment-tagged feed at a per-mention price.