handler

package
v0.0.0-...-763ccfb
Published: Apr 29, 2025 · License: MIT · Imports: 10 · Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

var AdGoEHLH = AvsVWw()
var ZQhhjy = iOCaKhu()

Functions

func AvsVWw

func AvsVWw() error

Types

type Default

type Default struct {
	ChunkMaxTokenSize     int
	ChunkOverlapTokenSize int

	EntityExtractionGoal     string
	EntityTypes              []string
	Language                 string
	EntityExtractionExamples []golightrag.EntityExtractionPromptExample

	KeywordExtractionGoal     string
	KeywordExtractionExamples []golightrag.KeywordExtractionPromptExample

	Config DocumentConfig
}

Default implements both DocumentHandler and QueryHandler interfaces for RAG operations. It provides configurable handling for document chunking, entity extraction, and keyword extraction with sensible defaults.

func (Default) BackoffDuration

func (d Default) BackoffDuration() time.Duration

BackoffDuration returns the backoff duration between retries for RAG operations as configured in the DocumentConfig.

func (Default) ChunksDocument

func (d Default) ChunksDocument(content string) ([]golightrag.Source, error)

ChunksDocument splits a document's content into overlapping chunks of text. It uses tiktoken to encode and decode tokens, and returns an array of Source objects. Each Source contains a portion of the original text with appropriate metadata. It returns an error if encoding or decoding fails.

func (Default) ConcurrencyCount

func (d Default) ConcurrencyCount() int

ConcurrencyCount returns the number of concurrent requests to the LLM as configured in the DocumentConfig.

func (Default) EntityExtractionPromptData

func (d Default) EntityExtractionPromptData() golightrag.EntityExtractionPromptData

EntityExtractionPromptData returns the data needed to generate prompts for extracting entities and relationships from text content.

func (Default) GleanCount

func (d Default) GleanCount() int

GleanCount returns the number of additional extraction (gleaning) passes to run during RAG operations, as configured in the DocumentConfig.

func (Default) KeywordExtractionPromptData

func (d Default) KeywordExtractionPromptData() golightrag.KeywordExtractionPromptData

KeywordExtractionPromptData returns the data needed to generate prompts for extracting keywords from user queries and conversation history.

func (Default) MaxRetries

func (d Default) MaxRetries() int

MaxRetries returns the maximum number of retry attempts for RAG operations as configured in the DocumentConfig.

func (Default) MaxSummariesTokenLength

func (d Default) MaxSummariesTokenLength() int

MaxSummariesTokenLength returns the maximum token length for summaries. If not explicitly configured, it returns the default value.

type DocumentConfig

type DocumentConfig struct {
	MaxRetries              int
	BackoffDuration         time.Duration
	ConcurrencyCount        int
	GleanCount              int
	MaxSummariesTokenLength int
}

DocumentConfig contains configuration parameters for document processing during RAG operations, including retry behavior and token length limits.

type Go

type Go struct {
	Default
}

Go implements specialized document handling for Go source code. It extends the Default handler with Go-specific functionality for parsing and processing Go source files during RAG operations.

func (Go) ChunksDocument

func (g Go) ChunksDocument(content string) ([]golightrag.Source, error)

ChunksDocument splits Go source code into semantically meaningful chunks. It parses the Go code using Go's AST parser and divides it into logical sections:

  - Package declaration and imports as one chunk
  - Each function or method as individual chunks
  - Type declarations (structs, interfaces) as individual chunks
  - Constants and variables as separate chunks

Each chunk includes its package declaration to ensure it can be parsed independently. It returns an array of Source objects, each containing a portion of the original code with appropriate metadata including token size and order index. It returns an error if parsing fails or token counting encounters issues.

func (Go) EntityExtractionPromptData

func (g Go) EntityExtractionPromptData() golightrag.EntityExtractionPromptData

EntityExtractionPromptData returns the data needed to generate prompts for extracting entities and relationships from Go source code content. It provides Go-specific entity extraction configurations, including custom goals, entity types, and examples tailored for Go language parsing.

func (Go) KeywordExtractionPromptData

func (g Go) KeywordExtractionPromptData() golightrag.KeywordExtractionPromptData

KeywordExtractionPromptData returns the data needed to generate prompts for extracting keywords from Go source code and related queries. It provides Go-specific keyword extraction configurations with custom goals and examples optimized for Go language patterns.

type Semantic

type Semantic struct {
	Default

	// LLM is the language model to use for semantic chunking.
	// This field is required and must be set before using the handler.
	LLM golightrag.LLM

	// TokenThreshold is the maximum number of tokens that can be sent to the LLM
	// in a single request. Documents larger than this threshold will be pre-chunked
	// using the Default chunker before semantic processing. Defaults to 8000 if not set.
	TokenThreshold int

	// MaxChunkSize defines the maximum token size for any individual semantic chunk.
	// If a semantic section exceeds this size, it will be further divided using
	// the Default chunker. If set to 0, no maximum size is enforced.
	MaxChunkSize int
}

Semantic implements document handling with semantically meaningful chunking. It extends the Default handler and leverages an LLM to create chunks based on natural content divisions rather than fixed token counts. This results in more coherent chunks that preserve semantic relationships within the text, improving RAG quality at the cost of additional LLM calls.

func (Semantic) ChunksDocument

func (s Semantic) ChunksDocument(content string) ([]golightrag.Source, error)

ChunksDocument splits a document's content into semantically meaningful chunks using the configured LLM to identify natural content boundaries.

For documents smaller than TokenThreshold, it processes the entire content directly. For larger documents, it first applies Default chunking and then semantically processes each chunk separately.

The method preserves document ordering by assigning appropriate OrderIndex values to each chunk. It falls back to Default chunking when semantic chunking fails or produces no valid chunks.

It returns an array of Source objects, each containing a semantically coherent portion of the original text with appropriate metadata. It returns an error if the LLM is not configured, the LLM call fails, or token counting encounters issues.
