How are shards laid out in memory in Zoekt?
Answer
In Zoekt, shards are managed in memory by the shardedSearcher struct, which holds a collection of rankedShard pointers. Each rankedShard represents a single shard and contains a zoekt.Searcher instance, which is responsible for the actual search operations on that shard's data.
Here's a breakdown of how shards are laid out in memory:
shardedSearcher
The shardedSearcher struct, defined in shards/shards.go (shards.go:201-213), is the primary component for managing shards in memory. It contains:
- A map of rankedShard pointers: shards map[string]*rankedShard (shards.go:209-209). The key of the map is the file path of the shard.
- A sorted slice of rankedShard pointers: ranked atomic.Value (shards.go:212-212). This slice is sorted by priority to optimize search performance.
Shards are loaded from disk via the loadShard function (shards.go:1200-1217), which in turn calls zoekt.NewSearcher to create a Searcher for the shard's index file.
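The map-plus-sorted-snapshot layout described above can be sketched roughly as follows. This is a minimal illustration, not Zoekt's actual code: the searcher interface, the namedSearcher type, the priority field, the mutex, and the replace and snapshot helpers are all hypothetical names introduced here. The sketch only assumes that the atomic.Value holds a []*rankedShard that is rebuilt whenever the map changes.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"sync/atomic"
)

// searcher is a hypothetical stand-in for zoekt.Searcher.
type searcher interface {
	String() string
}

// namedSearcher is a toy searcher used only for this illustration.
type namedSearcher string

func (n namedSearcher) String() string { return string(n) }

// rankedShard pairs a searcher with a priority (the field name is an assumption).
type rankedShard struct {
	searcher
	priority float64
}

// shardSet mirrors the two fields described above: a map keyed by shard
// file path, and an atomic.Value holding a slice sorted by priority.
type shardSet struct {
	mu     sync.Mutex
	shards map[string]*rankedShard
	ranked atomic.Value // always holds a []*rankedShard
}

func newShardSet() *shardSet {
	s := &shardSet{shards: make(map[string]*rankedShard)}
	s.ranked.Store([]*rankedShard{})
	return s
}

// replace installs the shard for path (or removes it when shard is nil)
// and publishes a freshly sorted snapshot for readers.
func (s *shardSet) replace(path string, shard *rankedShard) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if shard == nil {
		delete(s.shards, path)
	} else {
		s.shards[path] = shard
	}
	ranked := make([]*rankedShard, 0, len(s.shards))
	for _, sh := range s.shards {
		ranked = append(ranked, sh)
	}
	// Highest priority first; ties are broken by name for determinism.
	sort.Slice(ranked, func(i, j int) bool {
		if ranked[i].priority != ranked[j].priority {
			return ranked[i].priority > ranked[j].priority
		}
		return ranked[i].String() < ranked[j].String()
	})
	s.ranked.Store(ranked)
}

// snapshot returns the current sorted slice without taking the mutex.
func (s *shardSet) snapshot() []*rankedShard {
	return s.ranked.Load().([]*rankedShard)
}

func main() {
	s := newShardSet()
	s.replace("/index/a.zoekt", &rankedShard{searcher: namedSearcher("a"), priority: 1})
	s.replace("/index/b.zoekt", &rankedShard{searcher: namedSearcher("b"), priority: 5})
	for _, sh := range s.snapshot() {
		fmt.Println(sh.String(), sh.priority)
	}
}
```

Keeping the sorted slice behind an atomic.Value lets searches read a consistent snapshot without locking, while updates pay the cost of rebuilding and re-sorting the slice.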
indexData
The zoekt.Searcher interface is implemented by the indexData struct, defined in read.go. The NewSearcher function (read.go:564-577) reads an IndexFile and populates an indexData struct with the shard's data.
The indexData struct holds all the data for a single shard in memory. This includes:
- The boundaries field (read.go:279-279) stores the start and end offsets for each file's content.
- contentNgrams: A btreeIndex for the n-grams found in file contents (read.go:314-314).
- fileNameNgrams: A btreeIndex for n-grams in file names (read.go:331-331).
- The btreeIndex itself doesn't store the postings lists directly in memory but reads them from the IndexFile on demand.
- fileNameContent and fileNameIndex: The names of all files in the shard (read.go:324-329).
- fileBranchMasks: Bitmasks that associate files with their respective branches (read.go:319-319).
- repoMetaData: A slice of Repository structs containing metadata for each repository in the shard (read.go:265-265).
- symbols: Holds data related to symbol definitions and their kinds (read.go:285-285).
- docSectionsIndex: An index for document sections, used for structured data (read.go:283-283).
The IndexFile that indexData reads from is often a memory-mapped file, which allows for efficient access to the shard's data without loading the entire file into RAM at once. The readSectionBlob function (read.go:198-200) is used to read chunks of data from the IndexFile as needed.
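To make the on-demand access pattern concrete, here is a small sketch of how boundary offsets and ranged reads could fit together. It is an illustration under stated assumptions rather than Zoekt's implementation: the indexFile interface, memFile, section, shard, and readDoc are hypothetical names, and the sketch assumes boundaries are stored relative to the start of a single content section.

```go
package main

import (
	"errors"
	"fmt"
)

// indexFile is a hypothetical stand-in for an index file that supports
// ranged reads; a memory-mapped file could satisfy the same interface.
type indexFile interface {
	Read(off, sz uint32) ([]byte, error)
}

// memFile serves reads from an in-memory byte slice.
type memFile struct{ data []byte }

func (f *memFile) Read(off, sz uint32) ([]byte, error) {
	end := uint64(off) + uint64(sz)
	if end > uint64(len(f.data)) {
		return nil, errors.New("read past end of file")
	}
	// Returns a sub-slice of the backing data; nothing is copied.
	return f.data[off:end], nil
}

// section locates a blob inside the file (names are assumptions).
type section struct{ off, sz uint32 }

// shard keeps only small metadata resident: where the content lives and
// the per-document boundaries. Content bytes are fetched when requested.
type shard struct {
	file       indexFile
	content    section
	boundaries []uint32 // boundaries[i] and boundaries[i+1] delimit document i
}

// readDoc fetches the bytes of document i with a single ranged read.
func (s *shard) readDoc(i int) ([]byte, error) {
	if i < 0 || i+1 >= len(s.boundaries) {
		return nil, fmt.Errorf("document %d out of range", i)
	}
	start, end := s.boundaries[i], s.boundaries[i+1]
	if end < start || end > s.content.sz {
		return nil, fmt.Errorf("corrupt boundaries for document %d", i)
	}
	return s.file.Read(s.content.off+start, end-start)
}

func main() {
	// A 3-byte header followed by two documents stored back to back.
	f := &memFile{data: []byte("HDRpackage mainfunc f()")}
	s := &shard{
		file:       f,
		content:    section{off: 3, sz: 20},
		boundaries: []uint32{0, 12, 20},
	}
	doc, err := s.readDoc(1)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", doc)
}
```

With a memory-mapped file behind the interface, a ranged read like this would only touch the pages that back the requested range, which is consistent with the point above about not loading the entire shard into RAM.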
