The Smart TV in Your Living Room Is a Node in the AI Scraping Economy
Our work at Include Security has us dealing with AI day in and day out (hacking it, using it, training it, etc.). We’re all aware of the recent community-level opposition to the datacenters being built to improve AI capabilities. What you might not be aware of are the distributed efforts to train AI that could be using the devices inside your home. In this post, we’re going to explore how the company Bright Data facilitates the scraping of training data for modern AI models from the Internet using its residential proxy network.
Bright Data is a data-collection company that sells access to what it markets as the world’s largest residential proxy network: 400M+ home IP addresses through which its customers route web-scraping traffic. The supply behind that network comes from an SDK: a piece of software embedded in consumer apps that, with the user’s consent, turns their phone or smart TV into one of those exit nodes.
We’ll document what you, the average user, should know about what this company’s SDK does on devices such as your mobile phone and your smart TV. We’re going to explore how the SDK works, which platforms have shipped it, and why your Internet-connected TV is the ultimate proxy for AI models looking to train on data scraped from the Internet.
Why This Matters Now
AI companies depend on web-scraped content: for pre-training, for retrieval, for agent grounding, for search. But the modern web isn’t scrapeable from a datacenter. Cloudflare, DataDome, and HUMAN, among others, throttle or block requests from known cloud IPs. The workaround is residential proxies. A scraping job routed through a Comcast or T-Mobile subscriber’s connection arrives at the target site from an IP that belongs to a paying residential customer.
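To make the datacenter-versus-residential distinction concrete, here is a minimal sketch of the kind of IP-range check a site could run on incoming requests. It is an illustration only: the function names are hypothetical, the ranges are placeholder documentation blocks (RFC 5737) rather than real cloud ranges, and commercial anti-bot services rely on many more signals than a range lookup.

```python
import ipaddress

# Placeholder ranges taken from the RFC 5737 documentation blocks.
# Assumption: a real deployment would load published cloud-provider
# ranges here instead.
KNOWN_CLOUD_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def looks_like_datacenter(client_ip: str) -> bool:
    """Return True if the client IP falls inside a listed cloud range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in KNOWN_CLOUD_RANGES)


def decide(client_ip: str) -> str:
    """Hypothetical policy: challenge listed cloud IPs, allow everything else."""
    return "challenge" if looks_like_datacenter(client_ip) else "allow"
```

A request relayed through a home connection carries an address outside every listed range, so under a check like this it falls through to "allow". That is the property a residential proxy sells.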
Krebs reported in October 2025 that “a glut of proxies from Aisuru and other sources is fueling large-scale data harvesting efforts tied to various AI projects.” Academic measurement going back to 2019 shows these networks are overwhelmingly misused. The FBI issued a formal advisory earlier this year. Most of the existing press has focused on the illegal residential-proxy supply: botnets (Aisuru, Kimwolf), trojanized apps (HUMAN Security’s PROXYLIB disclosure), pre-infected IoT hardware (Google/Mandiant’s IPIDEA takedown). These are the bad actors.
The legal supply side, by contrast, has received far less scrutiny. Bright Data is today the largest residential proxy network in the world by its own marketing, advertising “150M+ IPs” sourced via a consent SDK embedded in partner apps. This research documents how that SDK works, which platforms have shipped it, and why the connected TV is the ultimate residential proxy.
Why Connected TV (CTV) is the Ideal Proxy
A connected TV, a.k.a. a smart TV, is a near-perfect residential proxy. Compared to a mobile phone:
| Factor | Mobile phone | Smart TV / CTV |
|---|---|---|
| Power | Battery most of the day | Always plugged in |
| Network | WiFi + cellular | Always WiFi, high-speed |
| Uptime | Intermittent | 24/7 in standby |
| Bandwidth ceiling | Low (cellular caps) | Effectively unlimited |
| User attention | Actively used | Often unattended |
| Consent UI | Text on a phone screen | Text navigated via TV remote arrow keys |
| Corporate/family oversight | Higher (MDM, mobile EDR) | Virtually none |
A TV never hits 1% battery, never jumps between WiFi networks, and never gets locked while the user is asleep. Some partner publishers do disclose the Bright Data relationship in their privacy policies; PlayWorks is one example. But privacy-policy disclosure is the wrong control surface for a TV. It is hard to scroll through a legal document using the arrow keys on a remote, and the in-app consent dialog doesn’t convey that a paying Bright Data customer is about to route their scraping traffic through the user’s home internet.
Petflix, a Roku app documented by The Verge, is a representative case. Its opt-in screen reads: “To enjoy Petflix for free with fewer ads, you are allowing Bright Data to occasionally use your device’s free resources and IP address to download public web data from the internet. Bright Data will only use your IP address for approved business-related use cases. None of your personal information is accessed or collected except your IP address. Period.”
The Petflix dialog says “occasionally.” The SDK’s publicly queryable config sets max_bw_monthly_wifi to 200,000,000,000 bytes, a default monthly WiFi budget of 200 GB.
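For a sense of scale, the short sketch below converts that budget into a daily figure and an average sustained rate. The 30-day month and the function name are assumptions made for illustration, and the result describes the ceiling the config allows, not measured traffic.

```python
# Value quoted from the SDK config discussed above, in bytes.
MAX_BW_MONTHLY_WIFI = 200_000_000_000

SECONDS_PER_DAY = 86_400


def monthly_budget_summary(budget_bytes: int, days: int = 30):
    """Return (GB per month, GB per day, average Mbit/s) for a byte budget.

    Assumption: a 30-day month and decimal units (1 GB = 10**9 bytes).
    """
    gb_total = budget_bytes / 1e9
    gb_per_day = gb_total / days
    avg_mbit_per_s = budget_bytes * 8 / (days * SECONDS_PER_DAY) / 1e6
    return gb_total, gb_per_day, avg_mbit_per_s


gb_total, gb_per_day, avg_mbit = monthly_budget_summary(MAX_BW_MONTHLY_WIFI)
print(f"{gb_total:.0f} GB/month, {gb_per_day:.1f} GB/day, {avg_mbit:.2f} Mbit/s average")
```

If the budget were fully consumed, that works out to roughly 6.7 GB per day, or about 0.6 Mbit/s sustained around the clock.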
Who Bright Data Names as Partners
Bright Data exposes a partner manifest endpoint. The endpoint is unauthenticated, so anyone can fetch it. The names in the manifest that I was able to identify with high confidence from public sources are:
| Partner ID (from config) | Entity | Scale |
|---|---|---|
| playworks_digital | PlayWorks Digital Ltd | 400+ CTV game titles; reach ~250M TV homes via Comcast, Sky, Cox, LG, Samsung, Vizio, Roku |
| cloudtv | CloudTV | Integrated across 125+ TV brands and 15+ OEMs |
| longvision_media_hong_kong_co_limited | Longvision Media HK (LongTV) | 5M OTT users across HK and Malaysia |
| viber_media_s_r_l | Viber Media S.à r.l. (Rakuten) | 250M–820M monthly users of the Viber messenger |
| supercent_inc | Supercent (Korea) | #1 Korean mobile publisher by downloads in 2023 |
| moonfrog_labs_private_limited | Moonfrog Labs (Stillfront subsidiary) | ~10M MAU on Teen Patti Gold alone; acquired for $90M |
| hola_networks | Hola Networks | Bright Data’s lineage parent; user base reported in the range of tens of millions to ~100M+ at peak, per Hola’s own historical marketing |
Others (desoline, free_time, ott_studio, global_microtrading, m_m_media, easystaff_lp) are present but less identifiable from public sources. bright_screensavers, bright_videos, and brightdata are Bright Data’s own apps.
A note on what the partner list proves: being listed in Bright Data’s config means an integration might have existed at some point. It does not by itself prove that a specific publisher’s currently shipping apps include the SDK in production. For any named publisher, per-app verification is required. What the partner list does directly prove is that Bright Data ships this roster in an unauthenticated public endpoint.
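One possible starting point for per-app verification is to search an app package for strings associated with the SDK. The sketch below assumes the package is a zip archive (Android APKs are) and takes the marker strings as input. The function name and the example marker are hypothetical placeholders, not known Bright Data identifiers.

```python
import zipfile


def find_markers(package, markers):
    """Map each marker to the archive entries whose name or bytes contain it.

    `package` is a path or a file-like object for a zip-based app package.
    `markers` is a list of byte strings chosen by the analyst (hypothetical
    here; real markers must come from your own analysis of the SDK).
    """
    hits = {}
    with zipfile.ZipFile(package) as archive:
        for name in archive.namelist():
            data = archive.read(name)
            for marker in markers:
                if marker in data or marker in name.encode("utf-8"):
                    hits.setdefault(marker, []).append(name)
    return hits


# Usage with a placeholder marker and a hypothetical file name:
# find_markers("some_app.apk", [b"EXAMPLE_SDK_MARKER"])
```

A hit is only a lead for further analysis, and an empty result proves nothing, since compressed resources or obfuscated code can hide such strings.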