How China-focused funds turn Weibo into alt-data (Python, 2026)
If you run a China book (equities, FX, commodities, or just a macro tilt), you already know the problem: the official numbers are slow, and English-language coverage lags what has already moved on Chinese social platforms. By the time a theme reaches Bloomberg, retail Weibo has often been talking about it for days.
Weibo (微博) is where Chinese consumer and retail-investor sentiment shows up first: 580M+ monthly actives, a public hot-search board that turns over hourly, and cashtag-style chatter on every listed name. The catch: there’s no official API for international developers, and the data is in Chinese.
This post walks through how to pull Weibo into a usable alt-data feed with a few lines of Python (hot-search trend tracking, keyword/cashtag sentiment, and KOL post monitoring) using an Apify Actor I maintain, so you don’t have to babysit visitor cookies or rate limits.
The three signals worth pulling
- Hot search board (the leading indicator). Weibo’s trending board is the fastest public read on what its 580M+ users are paying attention to. A brand, a policy rumor, a product recall, a CEO quote: it tends to surface here first. For a fund, the delta matters more than the snapshot: what entered the board in the last hour, and how fast it’s climbing.
- Keyword / cashtag sentiment. Search a ticker’s Chinese name, a brand, or a product line and you get the raw retail read: positive or negative tone, the volume of chatter, and which posts have reach. This is the consumer-demand nowcast that quarterly filings give you up to 90 days late.
- KOL post monitoring. A single finance or consumer KOL with 5M followers can move retail flows in hours. Tracking specific accounts’ posts (and their engagement velocity) is a cleaner signal than aggregate noise.
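To make “engagement velocity” concrete, here is a minimal sketch. It assumes you poll the same account twice and that each post record carries an `id` field alongside the `repostsCount`, `commentsCount`, and `likesCount` fields used later in this post. The `id` field name and both helper functions are assumptions for illustration, not documented Actor output.

```python
from datetime import datetime


def total_engagement(post: dict) -> int:
    """Sum reposts, comments, and likes, treating missing values as 0."""
    keys = ("repostsCount", "commentsCount", "likesCount")
    return sum(int(post.get(k) or 0) for k in keys)


def engagement_velocity(earlier: list[dict], later: list[dict],
                        t_earlier: datetime, t_later: datetime) -> dict[str, float]:
    """Return engagement gained per hour for each post present in both snapshots."""
    hours = (t_later - t_earlier).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("t_later must be after t_earlier")
    before = {p["id"]: total_engagement(p) for p in earlier}
    return {
        p["id"]: (total_engagement(p) - before[p["id"]]) / hours
        for p in later
        if p["id"] in before  # posts first seen in `later` have no baseline yet
    }


# Two made-up snapshots of one account, taken 30 minutes apart.
snap_a = [{"id": "p1", "repostsCount": 10, "commentsCount": 5, "likesCount": 85}]
snap_b = [
    {"id": "p1", "repostsCount": 60, "commentsCount": 40, "likesCount": 500},
    {"id": "p2", "repostsCount": 1, "commentsCount": 0, "likesCount": 3},
]
velocity = engagement_velocity(
    snap_a, snap_b, datetime(2026, 1, 5, 9, 0), datetime(2026, 1, 5, 9, 30)
)
print(velocity)
```

Posts that appear only in the later snapshot are skipped here because they have no baseline; you may prefer to treat them as starting from zero.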
Pull the hot-search board
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Run the Actor in hot-search mode and wait for it to finish.
run = client.actor("zhorex/weibo-scraper").call(run_input={
    "mode": "hot_search",
    "maxResults": 100,
})

# Each dataset item is one trending topic.
for topic in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(topic["rank"], topic["title"], topic.get("heat"))
```
Run this on a cron every 30-60 minutes and diff consecutive snapshots. A topic that jumps 40 ranks in one hour is the alpha, not its absolute position.
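The diff itself can be sketched as below, assuming each snapshot is a list of items with the `rank` and `title` fields printed above, and that `rank` is an integer where 1 is the top of the board. The `rank_movers` helper and the 40-rank threshold are illustrative choices, not part of the Actor.

```python
def rank_movers(previous: list[dict], current: list[dict], min_jump: int = 40) -> list[dict]:
    """Compare two hot-search snapshots and flag new entries and fast climbers."""
    prev_rank = {t["title"]: t["rank"] for t in previous}
    movers = []
    for topic in current:
        old = prev_rank.get(topic["title"])
        if old is None:
            # Topic was not on the board in the previous snapshot.
            movers.append({"title": topic["title"], "rank": topic["rank"],
                           "jump": None, "new": True})
        elif old - topic["rank"] >= min_jump:
            # A smaller rank number is a higher position, so old - new is the climb.
            movers.append({"title": topic["title"], "rank": topic["rank"],
                           "jump": old - topic["rank"], "new": False})
    return movers


# Made-up snapshots one polling interval apart.
prev_snapshot = [{"rank": 45, "title": "topic A"}, {"rank": 3, "title": "topic B"}]
curr_snapshot = [
    {"rank": 4, "title": "topic A"},
    {"rank": 2, "title": "topic B"},
    {"rank": 30, "title": "topic C"},
]
movers = rank_movers(prev_snapshot, curr_snapshot)
for m in movers:
    print(m)
```

Matching on `title` is the simplest key available from the fields shown; if the same topic is retitled between snapshots, it will show up as a new entry.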
Keyword sentiment as a consumer nowcast
Say you’re long a Chinese EV name and want the retail read before the delivery numbers print:
```python
from apify_client import ApifyClient
import pandas as pd

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("zhorex/weibo-scraper").call(run_input={
    "mode": "search",
    "searchQuery": "比亚迪",  # BYD in Chinese; Chinese keywords yield far better recall
    "maxResults": 300,
})

df = pd.DataFrame(list(client.dataset(run["defaultDatasetId"]).iterate_items()))

# Engagement-weight the chatter: a widely reposted post counts more than one nobody saw.
engagement_cols = ["repostsCount", "commentsCount", "likesCount"]
df["reach"] = df[engagement_cols].fillna(0).sum(axis=1)
print(df.sort_values("reach", ascending=False)[["text", "reach", "createdAt"]].head(10))
```
Pipe the text field through whatever Chinese sentiment model you already run (or a multilingual LLM) and you have a daily polarity series per name. Track the 7-day change in mention volume and polarity and you’ve built a sentiment-velocity factor for a few cents per run.
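One way to turn scored posts into that factor is sketched below. It assumes you have already added a `polarity` column (here taken to be a score in [-1, 1] from your own model) and that `createdAt` parses with `pd.to_datetime`; both are assumptions, and the `sentiment_velocity` helper is hypothetical.

```python
import pandas as pd


def sentiment_velocity(posts: pd.DataFrame, window: int = 7) -> pd.DataFrame:
    """Daily mention count and mean polarity, plus their change over `window` days."""
    posts = posts.copy()
    posts["date"] = pd.to_datetime(posts["createdAt"]).dt.floor("D")
    daily = posts.groupby("date").agg(
        mentions=("polarity", "size"),
        polarity=("polarity", "mean"),
    )
    # Reindex to a continuous calendar so shifting by `window` rows means `window` days.
    full_range = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
    daily = daily.reindex(full_range)
    daily["mentions"] = daily["mentions"].fillna(0)  # no posts that day
    daily["mentions_delta"] = daily["mentions"] - daily["mentions"].shift(window)
    daily["polarity_delta"] = daily["polarity"] - daily["polarity"].shift(window)
    return daily


# Made-up scored posts: two on Jan 1, five on Jan 8.
demo = pd.DataFrame({
    "createdAt": ["2026-01-01T09:00:00"] * 2 + ["2026-01-08T10:00:00"] * 5,
    "polarity": [0.0, 0.2, 0.5, 0.5, 0.5, 0.5, 0.5],
})
factor = sentiment_velocity(demo)
print(factor.tail(1))
```

Days with no posts keep a missing polarity rather than zero, so a quiet day is not mistaken for neutral sentiment.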
Build a daily China alt-data job
Two Actors matter most together: Weibo for broad consumer and retail sentiment, and the Xueqiu Scraper for finance-specific cashtag chatter (Xueqiu is China’s retail-investor forum, closer to a StockTwits read). Run both on the same cron, join on ticker, and you get consumer sentiment and investor sentiment side by side. The loop below covers the Weibo half:
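```python
from apify_client import ApifyClient
import pandas as pd

client = ApifyClient("YOUR_APIFY_TOKEN")

# English label -> Chinese search term
tickers = {"BYD": "比亚迪", "Pop Mart": "泡泡玛特", "Luckin": "瑞幸咖啡"}

rows = []
for name, zh in tickers.items():
    run = client.actor("zhorex/weibo-scraper").call(run_input={
        "mode": "search",
        "searchQuery": zh,
        "maxResults": 200,
    })
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    # len(items) is capped at maxResults, so raise the cap if names routinely hit it.
    rows.append({"name": name, "mentions": len(items)})

print(pd.DataFrame(rows).sort_values("mentions", ascending=False))
```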
Diff today’s mention counts against a trailing 7-day mean and you have a chatter-velocity screen across your whole China book.
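A minimal sketch of that screen follows, assuming you persist each day’s counts into a table with one row per day and one column per name. The `chatter_velocity` helper and the sample numbers are made up for illustration.

```python
import pandas as pd


def chatter_velocity(history: pd.DataFrame, window: int = 7) -> pd.DataFrame:
    """Compare the latest day's mentions per name to the mean of the prior `window` days.

    `history` has one row per day (date index) and one column per name.
    """
    history = history.sort_index()
    today = history.iloc[-1]
    trailing = history.iloc[-(window + 1):-1].mean()  # excludes today
    screen = pd.DataFrame({"today": today, "trailing_mean": trailing})
    # A ratio above 1 means chatter is running ahead of its recent baseline.
    screen["ratio"] = screen["today"] / screen["trailing_mean"]
    return screen.sort_values("ratio", ascending=False)


# Made-up history: seven flat days, then today's counts.
days = pd.date_range("2026-01-01", periods=8, freq="D")
history = pd.DataFrame(
    {"BYD": [100] * 7 + [300], "Luckin": [50] * 7 + [40]},
    index=days,
)
screen = chatter_velocity(history)
print(screen)
```

A ratio is used rather than a raw difference so that heavily discussed and thinly discussed names are comparable. If counts regularly sit at the `maxResults` cap, the ratio flattens toward 1, so it may be worth counting only posts inside a fixed time window.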
Pricing
The Weibo Scraper is pay-per-event: you pay per item returned, with no subscription and no seat fee. A 300-post sentiment pull is a few cents. A daily 20-ticker monitoring job lands in the low tens of dollars per month. Compare that to a Bloomberg China module or a packaged alt-data feed and the math is not close.
| Job | Approx. volume |
|---|---|
| Hourly hot-search tracker | ~70K topics/mo |
| 20-ticker daily sentiment | ~120K posts/mo |
| One-off theme research | a few K posts |
See the Actor’s Pricing tab for the exact per-result rate.
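The volume figures in the table can be reproduced with simple arithmetic, sketched here using the `maxResults` values from the examples in this post and a 30-day month. The helper names are hypothetical, and the per-item rate is deliberately left as an argument for you to fill in from the Pricing tab.

```python
def monthly_items(runs_per_day: int, items_per_run: int, days: int = 30) -> int:
    """Upper bound on items returned per month, assuming every run hits its cap."""
    return runs_per_day * items_per_run * days


def monthly_cost(items: int, usd_per_item: float) -> float:
    """Cost estimate; take usd_per_item from the Actor's Pricing tab."""
    return items * usd_per_item


# Hourly hot-search tracker: 24 runs a day, 100 topics per run.
hot_search_items = monthly_items(runs_per_day=24, items_per_run=100)

# 20-ticker daily sentiment: 20 runs a day (one per ticker), 200 posts per run.
sentiment_items = monthly_items(runs_per_day=20, items_per_run=200)

print(hot_search_items, sentiment_items)
```

These are ceilings rather than forecasts: a run that returns fewer items than its cap is billed for fewer items under per-item pricing.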
What this is NOT
- Not real-time tick data. Cron-based polling; 30-60 min cadence is realistic and plenty for sentiment.
- Not a sentiment model. It returns the raw posts + engagement + metadata. You bring (or plug in) the NLP.
- Not authenticated content. Public surface only — hot search, public search results, public profiles. Some modes (user timelines) work better with your own session cookie, which is optional.
- Not financial advice or a signal in a box. It’s a data feed. The factor construction is yours.
The broader China stack
If Weibo is the consumer and retail-sentiment layer, the rest of the stack fills in the gaps:
- Xueqiu Scraper — retail-investor forum, cashtag-tagged, the finance-specific sentiment read.
- RedNote / Xiaohongshu Scraper — consumer-brand and product sentiment, the highest-trust purchase-decision channel in China.
- Bilibili Scraper — Gen-Z video sentiment and creator analytics.
- Chinese Brand Monitor — if you’d rather not wire up four scrapers, this aggregates Weibo + RedNote + Bilibili + Douban + Xueqiu into one normalized, deduplicated, sentiment-tagged feed at a per-mention price.