    Repo Status Progression

    What are the different statuses that a repo can be in, and how do they progress through them (e.g., NEW, IN_INDEX_QUEUE, etc.)?

    Answer

    A repository's indexing status is managed by the RepoIndexingStatus enum, defined in packages/db/prisma/schema.prisma schema.prisma:13-22. The status indicates the current state of a repository within the indexing and garbage collection lifecycle.

    The possible statuses are:

    • NEW
    • IN_INDEX_QUEUE
    • INDEXING
    • INDEXED
    • FAILED
    • IN_GC_QUEUE
    • GARBAGE_COLLECTING
    • GARBAGE_COLLECTION_FAILED

    The progression through these statuses is handled by the RepoManager class in packages/backend/src/repoManager.ts repoManager.ts:35-622.

    Indexing Flow

    1. NEW: A repository starts with this status. The fetchAndScheduleRepoIndexing function repoManager.ts:131-170 periodically queries for repositories with this status.

    2. IN_INDEX_QUEUE: Repositories with a NEW status are moved to IN_INDEX_QUEUE by the scheduleRepoIndexingBulk function repoManager.ts:85-128, which also adds them to the indexing queue.

    3. INDEXING: The runIndexJob function repoManager.ts:292-345, executed by a worker, picks up a job from the queue and updates the repository's status to INDEXING repoManager.ts:312-319.

    4. INDEXED: Upon successful indexing, the onIndexJobCompleted function repoManager.ts:347-361 is triggered, setting the status to INDEXED repoManager.ts:352-360.

    5. FAILED: If indexing fails, the onIndexJobFailed function repoManager.ts:363-386 updates the status to FAILED repoManager.ts:377-384. A repository can also be marked as FAILED if it remains in the INDEXING state for an excessive amount of time, as determined by fetchAndScheduleRepoTimeouts repoManager.ts:591-605.
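
    The indexing flow above can be summarized as a small transition table. This is only an illustrative sketch: `IndexingStatus`, `indexingTransitions`, and `canTransition` are hypothetical names, not code from repoManager.ts. The answer does not describe how an INDEXED or FAILED repository is picked up again, so those entries are left empty here.

```typescript
// Status names mirror the indexing-related values of RepoIndexingStatus.
type IndexingStatus = "NEW" | "IN_INDEX_QUEUE" | "INDEXING" | "INDEXED" | "FAILED";

// Hypothetical transition table for the indexing flow described above.
const indexingTransitions: Record<IndexingStatus, IndexingStatus[]> = {
  NEW: ["IN_INDEX_QUEUE"],         // scheduleRepoIndexingBulk
  IN_INDEX_QUEUE: ["INDEXING"],    // runIndexJob
  INDEXING: ["INDEXED", "FAILED"], // onIndexJobCompleted, onIndexJobFailed, or a timeout
  INDEXED: [],                     // re-indexing is not covered by the answer (assumption: none)
  FAILED: [],                      // retry behavior is not covered by the answer (assumption: none)
};

// Returns true when the table allows moving directly from `from` to `to`.
function canTransition(from: IndexingStatus, to: IndexingStatus): boolean {
  return indexingTransitions[from].includes(to);
}
```

    For example, `canTransition("NEW", "INDEXED")` is false in this sketch, because a repository has to pass through IN_INDEX_QUEUE and INDEXING first.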

    Garbage Collection Flow

    1. IN_GC_QUEUE: Repositories that are no longer needed (e.g., have no connections or belong to an inactive organization) are identified by fetchAndScheduleRepoGarbageCollection repoManager.ts:412-467. This function then calls scheduleRepoGarbageCollectionBulk repoManager.ts:392-410 to change their status to IN_GC_QUEUE.

    2. GARBAGE_COLLECTING: A garbage collection worker executes runGarbageCollectionJob repoManager.ts:469-499, which updates the status to GARBAGE_COLLECTING repoManager.ts:474-481.

    3. Deleted (not a RepoIndexingStatus value): If garbage collection is successful, the onGarbageCollectionJobCompleted function repoManager.ts:500-510 is called, which deletes the repository from the database repoManager.ts:505-509. The record no longer exists, so it has no further status.

    4. GARBAGE_COLLECTION_FAILED: If garbage collection fails, onGarbageCollectionJobFailed repoManager.ts:512-535 is called, and the repository status is updated to GARBAGE_COLLECTION_FAILED repoManager.ts:526-533.
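
    The garbage collection flow can be sketched the same way, with an in-memory map standing in for the database. Everything here is a simplified assumption for illustration: `repoTable` and the function names are hypothetical and are not the RepoManager methods themselves. The point it shows is that a successful job removes the record instead of assigning a final status.

```typescript
type RepoStatus =
  | "INDEXED"
  | "IN_GC_QUEUE"
  | "GARBAGE_COLLECTING"
  | "GARBAGE_COLLECTION_FAILED";

interface RepoRecord {
  id: number;
  status: RepoStatus;
}

// In-memory stand-in for the repo table (assumption for illustration only).
const repoTable = new Map<number, RepoRecord>();

function setStatus(id: number, status: RepoStatus): void {
  const repo = repoTable.get(id);
  if (repo === undefined) {
    throw new Error(`Repo ${id} not found`);
  }
  repo.status = status;
}

// Scheduler marks an unneeded repo as queued for garbage collection.
function scheduleGarbageCollection(id: number): void {
  setStatus(id, "IN_GC_QUEUE");
}

// Worker picks up the job and marks the repo as being collected.
function runGarbageCollection(id: number): void {
  setStatus(id, "GARBAGE_COLLECTING");
}

// Success: the record is deleted, so there is no "DELETED" status to store.
function onGarbageCollectionCompleted(id: number): void {
  repoTable.delete(id);
}

// Failure: the record stays and is marked as failed.
function onGarbageCollectionFailed(id: number): void {
  setStatus(id, "GARBAGE_COLLECTION_FAILED");
}
```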

    schema.prisma
    repoManager.ts

    How does the SOURCEBOT_EE_LICENSE_KEY work?

    Answer

    The SOURCEBOT_EE_LICENSE_KEY is an environment variable used to unlock enterprise features in self-hosted Sourcebot instances. The core logic for handling this license key is located in packages/shared/src/entitlements.ts entitlements.ts.

    Here's a breakdown of how it works:

    License Key Structure and Validation

    1. Format: The license key is a string that must start with the prefix sourcebot_ee_ entitlements.ts:10-10. The remainder of the key is a base64-encoded JSON payload.

    2. Payload: After decoding, the JSON payload is validated against a schema, eeLicenseKeyPayloadSchema entitlements.ts:12-18, which expects the following fields:

      • id: A unique identifier for the license.
      • seats: The number of licensed user seats.
      • expiryDate: An ISO 8601 date string indicating when the license expires.
      • sig: A cryptographic signature.
    3. Signature Verification: The decodeLicenseKeyPayload function entitlements.ts:55-78 ensures the license key is authentic and has not been tampered with. It does this by:

      • Reconstructing the data that was originally signed (a JSON string containing the id, seats, and expiryDate).
      • Calling the verifySignature function entitlements.ts:67-67, which uses a public key (loaded from the path specified by the SOURCEBOT_PUBLIC_KEY_PATH environment variable) to verify the sig.
      • If signature verification fails, the application logs an error and exits entitlements.ts:68-71.
    4. Expiry Check: The getPlan function entitlements.ts:89-110 checks if the license is expired by comparing the expiryDate from the payload with the current server time. If the license is expired, the application will log an error and shut down entitlements.ts:101-104.
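
    The format, payload, and expiry checks above could look roughly like the sketch below. It is not the code in entitlements.ts: a hand-written type guard stands in for eeLicenseKeyPayloadSchema, the field types (for example, seats as a number) are assumptions, signature verification is left out, and errors are thrown rather than logged before exiting. `decodePayload`, `isLicensePayload`, and `isExpired` are hypothetical names.

```typescript
import { Buffer } from "node:buffer";

const LICENSE_KEY_PREFIX = "sourcebot_ee_";

// Field types are assumptions; the answer only names the fields.
interface LicensePayload {
  id: string;
  seats: number;
  expiryDate: string; // ISO 8601 date string
  sig: string;
}

// Minimal stand-in for schema validation.
function isLicensePayload(value: unknown): value is LicensePayload {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const v = value as Record<string, unknown>;
  return (
    typeof v["id"] === "string" &&
    typeof v["seats"] === "number" &&
    typeof v["expiryDate"] === "string" &&
    typeof v["sig"] === "string"
  );
}

// Strips the prefix, base64-decodes the rest, and validates the JSON shape.
function decodePayload(licenseKey: string): LicensePayload {
  if (!licenseKey.startsWith(LICENSE_KEY_PREFIX)) {
    throw new Error("License key is missing the expected prefix");
  }
  const encoded = licenseKey.slice(LICENSE_KEY_PREFIX.length);
  const json = Buffer.from(encoded, "base64").toString("utf8");
  const parsed: unknown = JSON.parse(json);
  if (!isLicensePayload(parsed)) {
    throw new Error("License key payload has an unexpected shape");
  }
  return parsed;
}

// Compares the payload's expiry date with the given (or current) time.
function isExpired(payload: LicensePayload, now: Date = new Date()): boolean {
  return new Date(payload.expiryDate).getTime() < now.getTime();
}
```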

    Plans and Entitlements

    • Plans: The system uses "plans" to manage feature access. If a valid, non-expired license key is present, the plan is set to either "self-hosted:enterprise" or "self-hosted:enterprise-unlimited" based on the number of seats in the license entitlements.ts:106-106. If no license key is provided, the plan defaults to "oss" entitlements.ts:108-108.

    • Entitlements: Each plan is mapped to a specific set of "entitlements" in the entitlementsByPlan constant entitlements.ts:45-52. These entitlements are strings that represent features, such as "search-contexts", "sso", and "audit".

    • Feature Access: Throughout the application, the hasEntitlement function entitlements.ts:117-120 is used to check if a feature should be enabled. This function determines the current plan and checks if the required entitlement is included for that plan. For example, syncSearchContexts.ts syncSearchContexts.ts checks for the "search-contexts" entitlement before proceeding syncSearchContexts.ts:19-25.
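
    A plan-to-entitlement lookup of this kind can be sketched as follows. The plan and entitlement names come from the answer above, but which entitlements each plan actually includes is an assumption made for illustration. The real hasEntitlement is described as determining the current plan itself; this sketch takes the plan as a parameter to stay self-contained.

```typescript
type Plan = "oss" | "self-hosted:enterprise" | "self-hosted:enterprise-unlimited";
type Entitlement = "search-contexts" | "sso" | "audit";

// Assumed mapping for illustration; the real contents of entitlementsByPlan may differ.
const entitlementsByPlan: Record<Plan, readonly Entitlement[]> = {
  "oss": [],
  "self-hosted:enterprise": ["search-contexts", "sso", "audit"],
  "self-hosted:enterprise-unlimited": ["search-contexts", "sso", "audit"],
};

// Returns true when the given plan includes the requested entitlement.
function hasEntitlement(plan: Plan, entitlement: Entitlement): boolean {
  return entitlementsByPlan[plan].includes(entitlement);
}
```

    A caller such as a search-context sync step would then return early when `hasEntitlement(plan, "search-contexts")` is false.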

    entitlements.ts
    syncSearchContexts.ts

    How difficult would it be to reverse engineer the SOURCEBOT_EE_LICENSE_KEY to generate a valid key?

    Answer

    Generating a valid SOURCEBOT_EE_LICENSE_KEY without access to Sourcebot's private signing key would be computationally infeasible.

    The security of the license key hinges on the use of public-key cryptography to sign and verify the license payload. Here is a detailed breakdown of why this makes key generation so difficult:

    1. Asymmetric Cryptography: The system uses an asymmetric key pair: a private key, which is kept secret by Sourcebot, and a public key, which is distributed with the application.

      • The private key is used to create the cryptographic signature (sig) found within the license key payload entitlements.ts:12-18.
      • The public key is used by the Sourcebot application to verify that the signature is authentic. The path to this public key is specified by the SOURCEBOT_PUBLIC_KEY_PATH environment variable, and the key is read during the signature check performed by the decodeLicenseKeyPayload function entitlements.ts:55-78.
    2. Signature Verification Process: When the application starts, it performs the following steps:

      • It decodes the base64 payload of the license key.
      • It reconstructs the data that was signed (a string containing the id, seats, and expiryDate).
      • It uses the public key and the crypto.verify function entitlements.ts:79-87 to check if the provided signature (sig) is valid for the reconstructed data.
    3. The Security Barrier:

      • To generate a new, valid license key (for example, with a different number of seats or a later expiry date), you would need to create a new signature for your custom payload.
      • Creating this signature requires the private key.
      • The private key is not included in the Sourcebot codebase or distributed with the application. It is known only to Sourcebot.
      • It is computationally infeasible to derive the private key from the public key, or to forge a signature that passes verification, without having the private key.
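
    The barrier described above can be demonstrated with Node's built-in crypto module. This is a sketch under stated assumptions: the answer does not say which signature algorithm or signed-data format Sourcebot uses, so Ed25519 and plain `JSON.stringify` are stand-ins, and the key pair is generated locally just for the demonstration. `signLicense` and `verifyLicense` are hypothetical names.

```typescript
import { Buffer } from "node:buffer";
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Assumption: Ed25519 is used purely for illustration.
// In the real system only the public half would ship with the application.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

interface LicenseData {
  id: string;
  seats: number;
  expiryDate: string;
}

// Issuer side: requires the private key.
function signLicense(data: LicenseData): string {
  const message = Buffer.from(JSON.stringify(data));
  return sign(null, message, privateKey).toString("base64");
}

// Application side: needs only the public key.
function verifyLicense(data: LicenseData, sig: string): boolean {
  const message = Buffer.from(JSON.stringify(data));
  return verify(null, message, publicKey, Buffer.from(sig, "base64"));
}
```

    Signing a payload and then changing `seats` or `expiryDate` before calling `verifyLicense` makes verification return false, because the signature no longer matches the reconstructed data.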

    In summary, while you can easily inspect the code that validates the key entitlements.ts, you cannot generate a valid key yourself because you do not possess the secret private key required to sign it. Any attempt to alter the license payload or create a new one will result in a signature verification failure, causing the application to log an error and exit entitlements.ts:68-71.

    entitlements.ts

    where is the private key loaded from for the SOURCEBOT_EE_LICENSE_KEY validation?

    Answer

    When validating the SOURCEBOT_EE_LICENSE_KEY, the application does not load a private key. Instead, it loads a public key.

    This is a fundamental principle of asymmetric cryptography, which is used to secure the license keys:

    • A private key, known only to Sourcebot, is used to create the signature (sig) on the license key payload.
    • A public key, which is distributed with the application, is used to verify that the signature is authentic.

    The public key is loaded from a file path specified by the SOURCEBOT_PUBLIC_KEY_PATH environment variable. This is handled by the getPublicKey function in packages/shared/src/entitlements.ts entitlements.ts:29-43.

    Here is the process:

    1. The getPublicKey function is called within the signature verification process entitlements.ts:81-81.
    2. It reads the value of the SOURCEBOT_PUBLIC_KEY_PATH environment variable entitlements.ts:32-32.
    3. If the environment variable is not set, the application will throw an error and exit entitlements.ts:33-35.
    4. It then reads the contents of the file at that path, which is expected to be the public key, and returns it as a string entitlements.ts:41-41.
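
    A minimal sketch of these steps is shown below. It is not the function from entitlements.ts: the `env` parameter is added here so the behavior can be exercised without touching the real environment, and the sketch throws where the answer describes the application as exiting.

```typescript
import { readFileSync } from "node:fs";

// Reads the public key from the file named by SOURCEBOT_PUBLIC_KEY_PATH.
// The `env` parameter is an assumption of this sketch; it defaults to process.env.
function getPublicKey(env: Record<string, string | undefined> = process.env): string {
  const keyPath = env["SOURCEBOT_PUBLIC_KEY_PATH"];
  if (!keyPath) {
    throw new Error("SOURCEBOT_PUBLIC_KEY_PATH is not set");
  }
  // The file contents are returned as a string and used as the public key.
  return readFileSync(keyPath, "utf8");
}
```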

    The private key itself is never included in the application or its source code. It is kept secret by Sourcebot, which is why generating a valid license key without it is not feasible.

    entitlements.ts