Setting Up Your Own Large Language Model


Large Language Models: Still a long way to go, but the future is promising

You’ve likely seen the headlines: frontier AI models are increasingly at risk of being locked behind strict export controls or mounting API costs. As this technology embeds itself into our daily lives, the open-source movement isn’t just a philosophical preference; it is a necessary mechanism for keeping AI in the hands of everyday users. We aren’t at parity yet: the proprietary models from the big tech labs still hold a commanding lead in raw performance. But there is reason to hope the gap is closing fast. Around the clock, an independent community of researchers and developers is working to make this technology accessible to anyone with a computer.

Today, the foundation for true democratization is already here: you can run a highly capable model entirely on your own laptop. For this experiment, I set out to find a large language model that runs entirely on my laptop, and to use it for the simple tasks I’d normally hand off to a big lab model. We’ll install Qwen 3 8B on my MacBook Air, run it fully offline, and finally have a language model living on my own machine instead of in a distant datacenter. The Qwen family of models is trained by Alibaba, the Chinese technology company, and the weights are openly published for anyone to download. The 8B model has roughly 8 billion parameters (weights) and takes up around 6 GB of RAM when loaded.

What follows is a practical, start-to-finish guide to running a proper local LLM on an Apple Silicon Mac, including the terminal commands you need. But before we open the terminal, we need to talk about why this is worth doing at all.

Why Do This?

Most of the time, cloud models are better and easier. I’m not going to pretend an 8-billion-parameter model on a laptop beats frontier AI. It doesn’t, and I will keep using the massive cloud models for heavy lifting. But the constant pricing and sovereignty wars around AI may make open-source and local models very relevant in a future where access to the technology makes a huge difference. Every time you use Claude or ChatGPT, you are sending your data to remote servers, and your access to them can be blocked at any time.

“Digital sovereignty” is a grand phrase for a very ordinary desire: we may want to own the thing that reads our most sensitive thoughts, the same way we own a physical notebook or keep some cash at home. A local model answers that cleanly in the AI world. Once it’s downloaded, nothing leaves the machine. No API keys, no shifting terms of service, no quiet data retention policies. You can switch Wi-Fi off entirely and it keeps working. For the highly sensitive part of your work, that alone may be worth the price of admission.

People love to say local models are “democratizing” AI. I want that to be true, but we aren’t there yet. Running this stack still assumes you own a €1,500 laptop with plenty of unified memory and that you’re comfortable on the command line. That’s a narrow, lucky slice of the world. But the trajectory points toward democratization. Two years ago, running a decent offline model required a dedicated workstation and serious technical pain. This weekend, it took me a couple of hours and 5 GB of disk space. So let’s install the thing.

The Machine and the Specs

I built this on an M4 MacBook Air with 24 GB of unified memory and about 235 GB of free storage. This was a fresh start: no Homebrew, no Python environment nightmares. The number that actually matters here is the 24 GB. Apple Silicon’s “unified memory” is the trick that makes Macs so good at this. Because the CPU and GPU share the same memory pool, large neural network weights don’t have to be copied back and forth between separate CPU and GPU memory. An 8B model takes up about 5 GB on disk and sits at roughly 6 GB in memory when loaded. On a 24 GB machine, that’s deeply comfortable. You could run a 14B model and still keep dozens of browser tabs open. (If you’re on an 8 GB Mac, stick to the 1.5B or 3B models and close your other apps.)
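The sizing above can be sanity-checked with back-of-the-envelope arithmetic: the weights need roughly (parameters × bits per weight ÷ 8) bytes. The sketch below is only an estimate. The bits-per-weight values are my own illustrative assumptions about how the model is stored, and the helper name estimate_gb is hypothetical.

```shell
# Rough size of a model's weights in GB:
#   parameters (in billions) x bits per weight / 8
# The bits-per-weight values used below are illustrative assumptions,
# not measurements of any particular model file.
estimate_gb() {
  awk -v params="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", params * bits / 8 }'
}

estimate_gb 8 4     # 8B parameters at 4 bits per weight  -> prints 4.0
estimate_gb 8 16    # the same model at 16 bits per weight -> prints 16.0
estimate_gb 14 4    # a 14B model at 4 bits per weight     -> prints 7.0
```

If the downloaded file is stored at a little over 4 bits per weight, the 5 GB on disk is in the right ballpark, and the extra gigabyte or so in memory would be runtime overhead. Treat both explanations as plausible assumptions rather than measured facts.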

Why Ollama?

There are a dozen ways to run local AI, and most of them ask you to care about compiler flags and dependency trees. You shouldn’t have to. Ollama is an open-source tool that just works. It’s a single binary that bundles a highly optimized model runner (llama.cpp using Apple’s Metal for GPU acceleration), a Docker-style model registry, and a local HTTP API. You install it, you pull a model, and you talk to it. That’s it!
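To make the “local HTTP API” part concrete, here is a minimal sketch of what talking to it can look like once the server is running and a model has been pulled. The model tag qwen3:8b is an assumption (use whatever `ollama list` reports on your machine), and build_payload is a hypothetical helper written for this illustration.

```shell
# Assumed model tag; adjust to whatever `ollama list` shows on your machine.
MODEL="qwen3:8b"
API="http://localhost:11434"

# Build the JSON body for Ollama's /api/generate endpoint.
# Deliberately naive: the prompt must not contain double quotes or backslashes.
build_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$MODEL" "$1"
}

# Only send the request if the local server answers at all.
if curl -s --max-time 2 "$API" >/dev/null 2>&1; then
  curl -s "$API/api/generate" -d "$(build_payload "Explain unified memory in one sentence.")"
else
  echo "No Ollama server on $API yet; install and start it first."
fi
```

Setting "stream" to false asks for one JSON response instead of a stream of partial ones, which is easier to read in a terminal.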

Step 1: Install Ollama (No Homebrew Required)

Ollama ships as a standard macOS app in a zip file. The command-line interface (CLI) is tucked inside the app bundle, so we can set it up entirely by hand.

# Download the Apple Silicon build
cd ~/Downloads
curl -L -o Ollama-darwin.zip https://ollama.com/download/Ollama-darwin.zip

# Unzip and move the app into your Applications folder
unzip -o -q Ollama-darwin.zip
mv Ollama.app /Applications/

If you don’t know how to open the terminal, just go to your Mac’s applications and search for “Terminal”.
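Before moving on, it can be worth confirming that the app landed where expected. The following is a small sketch, assuming the CLI sits at the path inside the app bundle that the next step links to. check_cli is a hypothetical helper name.

```shell
# Path of the CLI inside the app bundle, as used in this guide.
CLI="/Applications/Ollama.app/Contents/Resources/ollama"

# Report whether a file exists and is executable.
check_cli() {
  if [ -x "$1" ]; then echo "found"; else echo "missing"; fi
}

echo "CPU architecture: $(uname -m)"    # Apple Silicon Macs report arm64
echo "Ollama CLI: $(check_cli "$CLI")"
```

If the second line says “missing”, the most likely cause is that the unzip or mv step did not complete, so re-run those two commands before continuing.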

Step 2: Put Ollama on Your PATH

I didn’t want to fight with sudo permissions in /usr/local/bin, so I symlinked the bundled CLI into a local directory I own. This is just a handy shortcut to speed up the installation and get the LLM running.

# Create a local bin directory and symlink the CLI
mkdir -p ~/.local/bin
ln -sf /Applications/Ollama.app/Contents/Resources/ollama ~/.local/bin/ollama

# Make it permanent in your zsh profile
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc

# Apply it to your current shell
export PATH="$HOME/.local/bin:$PATH"
ollama --version
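One small caveat with the echo line above: every time it runs, it appends another copy of the same export to ~/.zshrc. A minimal sketch of an idempotent alternative follows. add_line_once is a hypothetical helper, not part of Ollama or zsh.

```shell
# Append a line to a file only if that exact line is not already present.
#   $1 = file to edit, $2 = line to add
add_line_once() {
  touch "$1"
  # -x: match whole lines, -F: treat the pattern as a fixed string, -q: quiet
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}
```

Calling `add_line_once ~/.zshrc 'export PATH="$HOME/.local/bin:$PATH"'` has the same effect as the echo command the first time and does nothing on later runs.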

Step 3: Start the Server

Ollama runs a lightweight background server that exposes the API and manages how models are loaded into your computer’s memory.

# Start the server and log output
mkdir -p ~/.ollama/logs
nohup ollama serve > ~/.ollama/logs/serve.log 2>&1 &

# Ping it to check if it's alive
curl -s http://localhost:11434
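Because the server is started in the background, the ping can occasionally fire before it is ready. A small polling loop is one way around that. This is a sketch: wait_for_server is a hypothetical helper, and the number of attempts is an arbitrary choice.

```shell
# Poll a URL once per second until it answers or the attempts run out.
#   $1 = URL to check, $2 = number of attempts (default 10)
wait_for_server() {
  url="$1"
  tries="${2:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -s --max-time 2 "$url" >/dev/null 2>&1; then
      echo "up"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "down"
  return 1
}

# Give the server a few seconds; point to the log file if it never answers.
wait_for_server "http://localhost:11434" 3 || echo "Check ~/.ollama/logs/serve.log for errors."
```

If it prints “down”, the log file written by the nohup command above is the first place to look.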