How are shards laid out in memory in Zoekt?
Answer
In Zoekt, shards are managed in memory by the shardedSearcher struct, which holds a collection of rankedShard pointers. Each rankedShard represents a single shard and contains a zoekt.Searcher instance, which is responsible for the actual search operations on that shard's data.
Here's a breakdown of how shards are laid out in memory:
shardedSearcher
The shardedSearcher struct, defined in shards/shards.go (shards.go:201-213), is the primary component for managing shards in memory. It contains:
- A map of rankedShard pointers: shards map[string]*rankedShard (shards.go:209). The key of the map is the file path of the shard.
- A slice of rankedShard pointers: ranked atomic.Value (shards.go:212). This slice is sorted by priority to optimize search performance.
Shards are loaded from disk via the loadShard function (shards.go:1200-1217), which in turn calls zoekt.NewSearcher to create a Searcher for the shard's index file.
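The map-plus-sorted-snapshot layout described above can be sketched in a few lines of Go. This is a minimal illustration, not Zoekt's actual code: shardSet, newShardSet, replace, getShards, and the priority field are hypothetical names, and the example paths are made up. The real rankedShard wraps a zoekt.Searcher, which is omitted here.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"sync/atomic"
)

// rankedShard is a simplified stand-in. The real struct wraps a
// zoekt.Searcher; the priority field here is a hypothetical ranking signal.
type rankedShard struct {
	path     string
	priority float64
}

// shardSet is a hypothetical, reduced version of shardedSearcher.
type shardSet struct {
	mu     sync.Mutex
	shards map[string]*rankedShard // keyed by shard file path
	ranked atomic.Value            // holds a []*rankedShard sorted by priority
}

func newShardSet() *shardSet {
	s := &shardSet{shards: make(map[string]*rankedShard)}
	s.ranked.Store([]*rankedShard{})
	return s
}

// replace adds or swaps the shard stored under path, then publishes a
// freshly sorted snapshot so that readers never observe a partial update.
func (s *shardSet) replace(path string, priority float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.shards[path] = &rankedShard{path: path, priority: priority}

	snapshot := make([]*rankedShard, 0, len(s.shards))
	for _, sh := range s.shards {
		snapshot = append(snapshot, sh)
	}
	// Highest priority first; ties are broken by path for a stable order.
	sort.Slice(snapshot, func(i, j int) bool {
		if snapshot[i].priority != snapshot[j].priority {
			return snapshot[i].priority > snapshot[j].priority
		}
		return snapshot[i].path < snapshot[j].path
	})
	s.ranked.Store(snapshot)
}

// getShards returns the current sorted snapshot without taking the lock.
func (s *shardSet) getShards() []*rankedShard {
	return s.ranked.Load().([]*rankedShard)
}

func main() {
	s := newShardSet()
	s.replace("/data/index/repo-a.zoekt", 1.0) // made-up paths
	s.replace("/data/index/repo-b.zoekt", 5.0)
	for _, sh := range s.getShards() {
		fmt.Println(sh.path, sh.priority)
	}
}
```

Keeping the map for lookups by path and a separately published slice for iteration means a search can walk a consistent, pre-sorted list while shards are being added or replaced.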
indexData
The zoekt.Searcher interface is implemented by the indexData struct, defined in read.go. The NewSearcher function (read.go:564-577) reads an IndexFile and populates an indexData struct with the shard's data.
The indexData struct holds all the data for a single shard in memory. This includes:
- boundaries: Stores the start and end offsets for each file's content (read.go:279).
- contentNgrams: A btreeIndex for the n-grams found in file contents (read.go:314).
- fileNameNgrams: A btreeIndex for n-grams in file names (read.go:331). The btreeIndex itself does not store the postings lists in memory; it reads them from the IndexFile on demand.
- fileNameContent and fileNameIndex: The names of all files in the shard (read.go:324-329).
- fileBranchMasks: Bitmasks that associate files with their respective branches (read.go:319).
- repoMetaData: A slice of Repository structs containing metadata for each repository in the shard (read.go:265).
- symbols: Holds data related to symbol definitions and their kinds (read.go:285).
- docSectionsIndex: An index for document sections, used for structured data (read.go:283).
The IndexFile that indexData reads from is often a memory-mapped file, which allows efficient access to the shard's data without loading the entire file into RAM at once. The readSectionBlob function (read.go:198-200) reads chunks of data from the IndexFile as needed.
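To make the on-demand access pattern concrete, here is a minimal sketch of reading a section and a single file's content through an IndexFile-like interface. Everything in it is an assumption for illustration: the indexFile method set, the memFile type (a byte slice standing in for a memory-mapped file), the section struct, and the docContent helper are hypothetical, and the boundaries slice is assumed to hold one more entry than there are files so that consecutive entries delimit each file.

```go
package main

import (
	"errors"
	"fmt"
)

// indexFile plays the role of Zoekt's IndexFile. The method set shown
// here is an assumption made for this sketch.
type indexFile interface {
	Read(off, sz uint32) ([]byte, error)
}

// memFile is a byte-slice-backed stand-in for a memory-mapped file.
type memFile struct{ data []byte }

func (f *memFile) Read(off, sz uint32) ([]byte, error) {
	end := uint64(off) + uint64(sz)
	if end > uint64(len(f.data)) {
		return nil, errors.New("read past end of file")
	}
	// Sub-slicing avoids a copy, much like slicing a mapped region.
	return f.data[off:end], nil
}

// section is a hypothetical (offset, size) pair locating a blob in the file.
type section struct{ off, sz uint32 }

// readSectionBlob fetches only the bytes belonging to one section.
func readSectionBlob(f indexFile, sec section) ([]byte, error) {
	return f.Read(sec.off, sec.sz)
}

// docContent returns the content of one file, using boundaries[doc] and
// boundaries[doc+1] as its start and end offsets within the content section.
func docContent(f indexFile, content section, boundaries []uint32, doc int) ([]byte, error) {
	if doc < 0 || doc+1 >= len(boundaries) {
		return nil, errors.New("document index out of range")
	}
	start, end := boundaries[doc], boundaries[doc+1]
	if start > end || end > content.sz {
		return nil, errors.New("corrupt boundaries")
	}
	return f.Read(content.off+start, end-start)
}

func main() {
	// A 6-byte header followed by two files' content stored back to back.
	f := &memFile{data: []byte("HEADERfoo barbaz!")}
	content := section{off: 6, sz: 11}
	boundaries := []uint32{0, 7, 11}

	for doc := 0; doc < len(boundaries)-1; doc++ {
		b, err := docContent(f, content, boundaries, doc)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("doc %d: %q\n", doc, b)
	}
}
```

Only the offsets stay resident in this sketch; the content bytes are touched when a specific file is requested, which is the property a memory-mapped index file relies on.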