
    Zoekt Shard Memory Layout


    How are shards laid out in memory in Zoekt?

    Answer

    In Zoekt, shards are managed in memory by the shardedSearcher struct, which holds a collection of rankedShard pointers. Each rankedShard represents a single shard and contains a zoekt.Searcher instance, which is responsible for the actual search operations on that shard's data.

    Here's a breakdown of how shards are laid out in memory:

    Shard Management: shardedSearcher

    The shardedSearcher struct, defined in shards/shards.go (shards.go:201-213), is the primary component for managing shards in memory. It contains:

    • A map of rankedShard pointers: shards map[string]*rankedShard (shards.go:209). The map key is the file path of the shard.
    • A sorted slice of rankedShard pointers, held in ranked atomic.Value (shards.go:212). The slice is sorted by priority to optimize search performance.
    • A scheduler that limits the number of parallel queries.

    Shards are loaded from disk by the loadShard function (shards.go:1200-1217), which in turn calls zoekt.NewSearcher to create a Searcher for the shard's index file.
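
    The two views described above (a map keyed by shard path plus a priority-sorted slice published through an atomic.Value) can be sketched roughly as follows. This is a minimal illustration, not Zoekt's code: shardRegistry, replace, rankedNames, the simplified rankedShard fields, and the shard file names are all hypothetical, and the scheduler is omitted.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"sync/atomic"
)

// rankedShard is a hypothetical, simplified stand-in for Zoekt's type:
// only a name and a priority, with no Searcher attached.
type rankedShard struct {
	name     string
	priority float64
}

// shardRegistry (hypothetical) keeps a map keyed by shard file path and a
// priority-sorted snapshot of the same shards stored in an atomic.Value.
type shardRegistry struct {
	mu     sync.Mutex
	shards map[string]*rankedShard
	ranked atomic.Value // always holds a []*rankedShard
}

func newShardRegistry() *shardRegistry {
	r := &shardRegistry{shards: make(map[string]*rankedShard)}
	r.ranked.Store([]*rankedShard{})
	return r
}

// replace adds or swaps the shard for path, then rebuilds the sorted snapshot.
func (r *shardRegistry) replace(path string, priority float64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.shards[path] = &rankedShard{name: path, priority: priority}

	snapshot := make([]*rankedShard, 0, len(r.shards))
	for _, s := range r.shards {
		snapshot = append(snapshot, s)
	}
	// Highest priority first; ties are broken by name for a stable order.
	sort.Slice(snapshot, func(i, j int) bool {
		if snapshot[i].priority != snapshot[j].priority {
			return snapshot[i].priority > snapshot[j].priority
		}
		return snapshot[i].name < snapshot[j].name
	})
	r.ranked.Store(snapshot)
}

// rankedNames reads the current snapshot without taking the mutex.
func (r *shardRegistry) rankedNames() []string {
	snapshot := r.ranked.Load().([]*rankedShard)
	names := make([]string, len(snapshot))
	for i, s := range snapshot {
		names[i] = s.name
	}
	return names
}

func main() {
	r := newShardRegistry()
	r.replace("a.zoekt", 1)
	r.replace("b.zoekt", 5)
	fmt.Println(r.rankedNames())
}
```

    In this sketch, writers rebuild the whole slice under a mutex while readers only load the latest snapshot, which is one common reason to pair a map with an atomic.Value; whether Zoekt rebuilds its ranked slice in exactly this way is not shown in the answer above.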

    In-Memory Shard Representation: indexData

    The zoekt.Searcher interface is implemented by the indexData struct, defined in read.go. The NewSearcher function (read.go:564-577) reads an IndexFile and populates an indexData struct with the shard's data.

    The indexData struct holds all the data for a single shard in memory. This includes:

    • File Contents: The raw content of all files in the shard is stored in a byte slice. The boundaries field (read.go:279) stores the start and end offsets for each file's content.
    • N-grams and Postings Lists: Zoekt uses an n-gram index for fast substring searches.
      • contentNgrams: A btreeIndex for the n-grams found in file contents (read.go:314).
      • fileNameNgrams: A btreeIndex for the n-grams found in file names (read.go:331).
      • The btreeIndex itself doesn't store the postings lists directly in memory but reads them from the IndexFile on demand.
    • File and Branch Metadata:
      • fileNameContent and fileNameIndex: The names of all files in the shard (read.go:324-329).
      • fileBranchMasks: Bitmasks that associate files with their respective branches (read.go:319).
      • repoMetaData: A slice of Repository structs containing metadata for each repository in the shard (read.go:265).
    • Symbols and Sections:
      • symbols: Holds data related to symbol definitions and their kinds (read.go:285).
      • docSectionsIndex: An index for document sections, used for structured data (read.go:283).
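
    To make the offsets-plus-n-grams idea concrete, here is a toy model of one concatenated content blob, a boundaries slice, and a trigram index that maps each trigram to byte offsets. It is only a sketch under simplifying assumptions: shardContent, buildShardContent, fileContent, fileForOffset, and buildTrigramIndex are hypothetical names, trigrams are taken over bytes within each file, and the postings live in a plain Go map rather than a btreeIndex backed by the IndexFile.

```go
package main

import (
	"fmt"
	"sort"
)

// shardContent is a hypothetical model: every file's bytes concatenated
// into blob, with boundaries[i] and boundaries[i+1] delimiting file i.
type shardContent struct {
	blob       []byte
	boundaries []uint32
}

func buildShardContent(files []string) shardContent {
	sc := shardContent{boundaries: []uint32{0}}
	for _, f := range files {
		sc.blob = append(sc.blob, f...)
		sc.boundaries = append(sc.boundaries, uint32(len(sc.blob)))
	}
	return sc
}

// fileContent slices file i out of the blob using its two boundaries.
func (sc shardContent) fileContent(i int) ([]byte, error) {
	if i < 0 || i+1 >= len(sc.boundaries) {
		return nil, fmt.Errorf("file index %d out of range", i)
	}
	return sc.blob[sc.boundaries[i]:sc.boundaries[i+1]], nil
}

// fileForOffset maps a byte offset in blob back to the file containing it:
// it finds the first boundary strictly greater than off and steps back one.
func (sc shardContent) fileForOffset(off uint32) int {
	next := sort.Search(len(sc.boundaries), func(i int) bool {
		return sc.boundaries[i] > off
	})
	return next - 1
}

// buildTrigramIndex records, for each 3-byte sequence inside a file,
// the blob offsets where it starts (a toy postings list).
func buildTrigramIndex(sc shardContent) map[string][]uint32 {
	idx := make(map[string][]uint32)
	for f := 0; f+1 < len(sc.boundaries); f++ {
		start, end := sc.boundaries[f], sc.boundaries[f+1]
		for i := start; i+3 <= end; i++ {
			key := string(sc.blob[i : i+3])
			idx[key] = append(idx[key], i)
		}
	}
	return idx
}

func main() {
	sc := buildShardContent([]string{"hello world", "func main"})
	idx := buildTrigramIndex(sc)
	for _, off := range idx["mai"] {
		file := sc.fileForOffset(off)
		content, _ := sc.fileContent(file)
		fmt.Printf("trigram at offset %d is in file %d: %q\n", off, file, content)
	}
}
```

    A substring query in this toy would look up one or more trigrams, take the candidate offsets, and map each offset to a file through the boundaries slice before verifying the full match against that file's content.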

    The IndexFile that indexData reads from is often a memory-mapped file, which allows efficient access to the shard's data without loading the entire file into RAM at once. The readSectionBlob function (read.go:198-200) reads chunks of data from the IndexFile as needed.
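
    The on-demand access pattern can be illustrated with a small random-access interface and a section described by an offset and a size. This is a sketch, not Zoekt's API: the indexFile interface, the memFile type (an in-memory stand-in for a memory-mapped file), the section struct, and this readSectionBlob signature are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// indexFile is a hypothetical, simplified random-access view of a shard file.
type indexFile interface {
	Read(off, sz uint32) ([]byte, error)
	Size() (uint32, error)
}

// section (hypothetical) locates one region of the file.
type section struct {
	off, sz uint32
}

// memFile stands in for a memory-mapped file: Read returns a sub-slice of
// the backing bytes instead of copying the whole file.
type memFile struct {
	data []byte
}

func (f *memFile) Read(off, sz uint32) ([]byte, error) {
	end := uint64(off) + uint64(sz) // widen first to avoid uint32 overflow
	if end > uint64(len(f.data)) {
		return nil, errors.New("read past end of file")
	}
	return f.data[int(off):int(end)], nil
}

func (f *memFile) Size() (uint32, error) {
	return uint32(len(f.data)), nil
}

// readSectionBlob fetches only the bytes that belong to one section.
func readSectionBlob(f indexFile, sec section) ([]byte, error) {
	return f.Read(sec.off, sec.sz)
}

func main() {
	f := &memFile{data: []byte("headerTRIGRAMSfooter")}
	blob, err := readSectionBlob(f, section{off: 6, sz: 8})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s\n", blob)
}
```

    Because the caller asks for one section at a time, only the requested byte range needs to be touched; with a real memory-mapped file the operating system would page in just those regions.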

    shards.go
    read.go