How VictoriaLogs Stores Your Logs in a Columnar Layout

If you run VictoriaLogs, your day-to-day comes down to three things: sending logs, querying them, and setting retention so the disk does not fill up. Everything else happens quietly on disk.

This post follows a single log line from the moment it arrives to where it finally rests on disk, so you can picture what VictoriaLogs is doing under the hood and explain what you’re seeing: why your queries come back fast, why you sometimes see many files on disk, and which flags and metrics matter when something looks off. This article is for everyone: no programming background is needed and there is no Go code to read. If you do want to go deeper, the VictoriaLogs source is always the reference.

1. A log line arrives

VictoriaLogs accepts logs over many protocols: JSON Lines, Elasticsearch bulk, Loki push, OpenTelemetry, syslog, and more (see the data ingestion docs for the full list).

Whichever one you use, the first thing VictoriaLogs does is translate that record into a single internal shape that the rest of the system understands: a timestamp, a set of named fields, and a “stream identity”.

Every protocol maps to one internal shape. Each protocol has its own small processor that does this translation, and you can influence it either with query arguments or with headers in the request itself:

  • Drop fields you do not want to store with the ignore_fields query argument or the VL-Ignore-Fields header.
  • Strip terminal color codes from values with the decolorize_fields query argument or the VL-Decolorize-Fields header.
  • Attach extra fields to every record with the extra_fields query argument or the VL-Extra-Fields header.
  • Point VictoriaLogs at the main message field (_msg) with the _msg_field query argument or the VL-Msg-Field header.
  • Tell it which field holds the timestamp with the _time_field query argument or the VL-Time-Field header.
  • Choose which fields define the stream identity with the _stream_fields query argument or the VL-Stream-Fields header.
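As a rough illustration of the query-argument form, the sketch below builds (but does not send) an ingestion URL for the JSON Lines protocol. The address localhost:9428 and the field names message, ts, pod, container and password are assumptions made up for the example, and build_ingest_url is a hypothetical helper, not part of any VictoriaLogs client.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_ingest_url(base, stream_fields, msg_field, time_field, ignore_fields=()):
    """Return an ingestion URL that carries the per-request parsing options."""
    params = {
        "_stream_fields": ",".join(stream_fields),  # fields that define the stream identity
        "_msg_field": msg_field,                    # field holding the main message
        "_time_field": time_field,                  # field holding the timestamp
    }
    if ignore_fields:
        params["ignore_fields"] = ",".join(ignore_fields)  # fields to drop before storing
    return f"{base}?{urlencode(params)}"

# Assumed local instance and example field names.
url = build_ingest_url(
    "http://localhost:9428/insert/jsonline",
    stream_fields=["pod", "container"],
    msg_field="message",
    time_field="ts",
    ignore_fields=["password"],
)
print(url)
# Decode the query string again to show what the server side would read.
print(parse_qs(urlsplit(url).query))
```

The same options could be passed as the VL-* headers listed above instead; which form is more convenient usually depends on whether your log shipper lets you set headers.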

Stream identity is the most important idea in this whole post. Logs that share the same stream fields are treated as a single stream, and you are the one who decides what that stream looks like. For example, set _stream_fields=pod,container, and all logs with the same pod and container form one stream.

VictoriaLogs keeps each stream’s logs together on disk. That grouping is what makes them compress so well, and it lets a query touch only the streams it needs instead of scanning everything.

The practical rule for you as an operator: keep stream fields stable and low-cardinality, meaning they should have only a handful of distinct values, such as host, app, pod, or container. Keep high-cardinality values, ones with very many unique entries like trace_id or user_id, as normal fields, not stream fields.
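To make the cardinality rule concrete, here is a small sketch that groups a few records by a chosen set of stream fields and counts the resulting streams. The records and the helpers stream_key and group_by_stream are invented for illustration; they only mimic the idea of a stream identity and say nothing about how VictoriaLogs encodes it internally.

```python
from collections import defaultdict

def stream_key(record, stream_fields):
    """Build a stream identity from the chosen fields (missing fields become empty strings)."""
    return tuple((name, record.get(name, "")) for name in stream_fields)

def group_by_stream(records, stream_fields):
    """Group records so that every distinct stream identity gets its own list."""
    streams = defaultdict(list)
    for record in records:
        streams[stream_key(record, stream_fields)].append(record)
    return streams

# Made-up sample records.
records = [
    {"pod": "api-1", "container": "app", "trace_id": "a1", "_msg": "started"},
    {"pod": "api-1", "container": "app", "trace_id": "b2", "_msg": "ready"},
    {"pod": "api-1", "container": "sidecar", "trace_id": "c3", "_msg": "proxy up"},
    {"pod": "api-2", "container": "app", "trace_id": "d4", "_msg": "started"},
]

good = group_by_stream(records, ["pod", "container"])
bad = group_by_stream(records, ["pod", "container", "trace_id"])
print(len(good), len(bad))  # prints "3 4"
```

With pod and container as stream fields, four records fall into three streams. Adding trace_id gives every record its own stream, so nothing is grouped at all.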

After receiving and normalizing the incoming records, VictoriaLogs does not write them out one at a time. It accumulates them in an in-memory buffer and, about once a second (or sooner if the buffer fills up), turns the whole batch into a small searchable chunk that still lives in RAM (an in-memory part).

That in-memory buffer is not a single shared queue. If every incoming batch had to line up for the same buffer, they would waste time waiting on each other, so VictoriaLogs splits the buffer into shards, one per CPU core, and spreads incoming batches across them in turn.

So on a 3-CPU machine there are 3 buffer shards filling in parallel, and each shard flushes on its own, writing its batch out as a new in-memory part about once a second.
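The round-robin idea can be sketched in a few lines. This is a simplified, single-threaded model under stated assumptions: the class ShardedBuffer and its methods are hypothetical names, there are no locks or timers, and a “part” is just a Python list. The real implementation is concurrent and flushes on a timer or when a shard fills up.

```python
import itertools

class ShardedBuffer:
    """Toy model of a buffer split into independent shards."""

    def __init__(self, num_shards):
        self.shards = [[] for _ in range(num_shards)]
        # Cycle through shard indexes so batches are spread in turn.
        self._next = itertools.cycle(range(num_shards))

    def add_batch(self, rows):
        """Append a batch to the next shard and return that shard's index."""
        shard = next(self._next)
        self.shards[shard].extend(rows)
        return shard

    def flush_shard(self, shard):
        """Turn one shard's rows into a new 'part' and empty the shard."""
        part = list(self.shards[shard])
        self.shards[shard] = []
        return part

buf = ShardedBuffer(num_shards=3)  # e.g. a 3-CPU machine
placed = [buf.add_batch([f"row-{i}"]) for i in range(4)]
print(placed)              # prints "[0, 1, 2, 0]"
print(buf.flush_shard(0))  # prints "['row-0', 'row-3']"
```

Because each shard is flushed separately, a busy moment can produce several small in-memory parts per second rather than one, which is part of why part counts fluctuate.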

A part is one of the core data structures across VictoriaLogs (and the other VictoriaMetrics products): a self-contained bundle of data that can be searched, and therefore queried, on its own.

Most of the time, the buffered batch is flushed into an in-memory part. In rare cases, if a batch is large enough to exceed the in-memory size limit, the part is written straight to disk as a small or big part instead.

Tip: The vl_insert_flush_duration_seconds metric shows how long it takes to turn a buffered batch into an in-memory part.

2. Daily partitions

When a batch is flushed, VictoriaLogs files each log into a partition, and a partition holds exactly one calendar day of logs (in UTC). It reads each log’s timestamp, works out which day it belongs to, and routes it there.

In other words, your logs are separated by date. You can see this directly on disk, one directory per day:

$ tree victoria-logs-data/
victoria-logs-data/
└── partitions/
    ├── 20260109
    ├── 20260110
    ├── 20260111
    └── 20260112
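The mapping from timestamp to day directory can be sketched as follows. The function partition_name is a hypothetical helper; it only reproduces the YYYYMMDD naming visible in the listing above and the fact that the day is computed in UTC.

```python
from datetime import datetime, timedelta, timezone

def partition_name(ts):
    """Map a timezone-aware timestamp to its UTC day directory name (YYYYMMDD)."""
    return ts.astimezone(timezone.utc).strftime("%Y%m%d")

# A log stamped late in the evening in a UTC-05:00 zone...
local = datetime(2026, 1, 10, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
# ...already belongs to the next day in UTC.
print(partition_name(local))  # prints "20260111"
```

This is worth remembering when a log seems to land in the “wrong” day: the partition follows the UTC date of the log’s timestamp, not your local date.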

This per-day layout is not just an implementation detail; it is why two everyday operations are cheap:

  • Retention is achieved by deleting whole day directories. When logs age out past -retentionPeriod (7 days by default), or when disk-based retention kicks in, VictoriaLogs drops entire day folders rather than hunting down individual log lines.
  • Queries are almost always time-bounded (_time:1h, _time:5m), so VictoriaLogs only has to open the day partitions that overlap your time range and can ignore the rest.
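Both operations reduce to simple date arithmetic over directory names, as this sketch suggests. The helpers days_in_range and expired are hypothetical, and the exact boundary rule used here (a day is dropped once its last moment is older than the retention cutoff) is an assumption for illustration, not a statement about VictoriaLogs internals.

```python
from datetime import datetime, timedelta, timezone

def days_in_range(start, end):
    """List the UTC day partitions (YYYYMMDD) that overlap [start, end]."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    names = []
    while day <= last:
        names.append(day.strftime("%Y%m%d"))
        day += timedelta(days=1)
    return names

def expired(partitions, now, retention):
    """Return partitions whose whole day lies before now - retention (assumed rule)."""
    cutoff = now.astimezone(timezone.utc) - retention
    out = []
    for name in partitions:
        day_start = datetime.strptime(name, "%Y%m%d").replace(tzinfo=timezone.utc)
        if day_start + timedelta(days=1) <= cutoff:
            out.append(name)
    return out

now = datetime(2026, 1, 12, 0, 30, tzinfo=timezone.utc)
# A one-hour query just after midnight UTC touches two day partitions.
touched = days_in_range(now - timedelta(hours=1), now)
print(touched)  # prints "['20260111', '20260112']"

on_disk = ["20260109", "20260110", "20260111", "20260112"]
# A 2-day retention is used here only to keep the example small.
dropped = expired(on_disk, now, timedelta(days=2))
print(dropped)  # prints "['20260109']"
```

Neither function needs to look inside a partition, which is the point: the directory name alone decides whether a day is opened or dropped.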

A partition is not purely a folder on disk. It has two faces: an on-disk side that holds the parts already written out, and an in-memory side that holds the buffer shards and the in-memory parts we just saw.

When you query a day, the partition serves results from both sides simultaneously, pulling the relevant parts from disk only when needed.

Tip: The vl_storage_parts metric counts how many parts exist, broken down by where they live: {type="storage/inmemory"} for parts still in memory, {type="storage/small"} and {type="storage/big"} for parts on disk. The vl_pending_rows{type="storage"} metric counts the rows that are still pending, that is, buffered but not yet turned into a part.
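If you want to eyeball these numbers without a dashboard, a few lines are enough to pull them out of metrics text. The sample below is invented (the values are not real measurements, and real output carries many more lines and possibly more labels), and parse_metrics is a hypothetical helper that only handles the single-label form shown here.

```python
import re

# Made-up sample in the text exposition format; values are illustrative only.
SAMPLE = '''\
vl_storage_parts{type="storage/inmemory"} 6
vl_storage_parts{type="storage/small"} 14
vl_storage_parts{type="storage/big"} 2
vl_pending_rows{type="storage"} 1280
'''

# Matches lines of the form: name{type="label"} value
LINE = re.compile(r'^(\w+)\{type="([^"]+)"\}\s+(\d+(?:\.\d+)?)$')

def parse_metrics(text):
    """Return {(metric_name, type_label): value} for lines with a single type label."""
    result = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            name, type_label, value = match.groups()
            result[(name, type_label)] = float(value)
    return result

metrics = parse_metrics(SAMPLE)
parts_on_disk = (
    metrics[("vl_storage_parts", "storage/small")]
    + metrics[("vl_storage_parts", "storage/big")]
)
print(parts_on_disk)  # prints "16.0"
```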