Configuration

VectorWave provides robust and type-safe configuration management based on pydantic-settings. All settings can be controlled via a .env file or system environment variables.

1. Connection & Schema

Defines the Weaviate database connection and the collection (Table) names where data will be stored.

| Variable Name | Description | Default Value |
| --- | --- | --- |
| WEAVIATE_HOST | Weaviate instance host address | localhost |
| WEAVIATE_PORT | HTTP port (REST API) | 8080 |
| WEAVIATE_GRPC_PORT | gRPC port (for bulk data transfer) | 50051 |
| COLLECTION_NAME | Collection name for storing function metadata (static info) | VectorWaveFunctions |
| EXECUTION_COLLECTION_NAME | Collection name for storing execution logs (dynamic info) | VectorWaveExecutions |
| GOLDEN_COLLECTION_NAME | Collection name for storing golden datasets for testing | VectorWaveGoldenDataset |
| IS_VECTORIZE_COLLECTION_NAME | Whether to store function definitions (static data). If False, only logs are stored. | True |
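
As a rough illustration of how these variables resolve, the sketch below reads a few of them from an environment mapping and falls back to the defaults listed above. It uses only the standard library; `ConnectionSettings` and `load_connection_settings` are hypothetical names for this example and are not VectorWave's own pydantic-settings loader.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionSettings:
    """Hypothetical container for a subset of the connection variables."""
    host: str
    http_port: int
    grpc_port: int
    collection_name: str


def load_connection_settings(env=None):
    """Read connection variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return ConnectionSettings(
        host=env.get("WEAVIATE_HOST", "localhost"),
        http_port=int(env.get("WEAVIATE_PORT", "8080")),
        grpc_port=int(env.get("WEAVIATE_GRPC_PORT", "50051")),
        collection_name=env.get("COLLECTION_NAME", "VectorWaveFunctions"),
    )


# Only the host is overridden here; the ports keep their defaults.
settings = load_connection_settings({"WEAVIATE_HOST": "weaviate.internal"})
print(settings.host, settings.http_port, settings.grpc_port)
```

Passing a plain dictionary instead of `os.environ` keeps the example easy to test without touching the real environment.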

2. AI & Vectorizer Strategy

Sets the engine and model for embedding (Vectorizing) data.

Strategy Selection (VECTORIZER)

| Value | Description |
| --- | --- |
| huggingface | (Default) Uses local sentence-transformers models. Free & secure. |
| openai_client | Uses OpenAI API. High accuracy, costs apply. |
| weaviate_module | Delegates to Weaviate's internal modules (text2vec-*). |
| none | Disables vectorization (storage only). |

Detailed Settings

| Variable Name | Description | Default Value |
| --- | --- | --- |
| HF_MODEL_NAME | HuggingFace model name (used in local mode) | sentence-transformers/all-MiniLM-L6-v2 |
| OPENAI_API_KEY | OpenAI API key (sk-...) | None |
| WEAVIATE_VECTORIZER_MODULE | Module name to use in Weaviate module mode | text2vec-openai |
| WEAVIATE_GENERATIVE_MODULE | Weaviate generative module name (for RAG) | generative-openai |
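
A pre-flight check can catch an inconsistent strategy choice before the application starts. The sketch below is an assumption-laden illustration: `check_vectorizer_config` is a hypothetical helper, and the rule that `openai_client` requires `OPENAI_API_KEY` is inferred from the tables above rather than taken from VectorWave's validation code.

```python
# The four strategy values documented for VECTORIZER.
VALID_VECTORIZERS = {"huggingface", "openai_client", "weaviate_module", "none"}


def check_vectorizer_config(env):
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    strategy = env.get("VECTORIZER", "huggingface")
    if strategy not in VALID_VECTORIZERS:
        problems.append(f"unknown VECTORIZER value: {strategy!r}")
    elif strategy == "openai_client" and not env.get("OPENAI_API_KEY"):
        # Assumption: the OpenAI strategy cannot work without an API key.
        problems.append("VECTORIZER=openai_client needs OPENAI_API_KEY")
    return problems


for problem in check_vectorizer_config({"VECTORIZER": "openai_client"}):
    print(problem)
```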

3. Performance & Batching

Balances real-time processing and bulk data processing. VectorWave uses Async Batching by default.

| Variable Name | Description | Default Value |
| --- | --- | --- |
| BATCH_THRESHOLD | Batch buffer size. Sends to DB when this count is reached. | 20 |
| FLUSH_INTERVAL_SECONDS | Forcibly sends if this time passes even if the buffer isn't full. | 2.0 (seconds) |

Tip: In high-traffic production environments, increase BATCH_THRESHOLD to 100 or more to maximize throughput.
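
To make the interaction between the two settings concrete, here is a simplified, synchronous sketch of a size-or-time flush policy. `BatchBuffer` is a hypothetical class, not VectorWave's asynchronous batcher; in particular, measuring the interval from the last flush is an assumption of this example.

```python
import time


class BatchBuffer:
    """Buffers items and hands them to `send` in batches."""

    def __init__(self, send, threshold=20, flush_interval=2.0, clock=time.monotonic):
        self._send = send                      # callable that receives a list of items
        self._threshold = threshold            # plays the role of BATCH_THRESHOLD
        self._flush_interval = flush_interval  # plays the role of FLUSH_INTERVAL_SECONDS
        self._clock = clock                    # injectable so tests can control time
        self._items = []
        self._last_flush = clock()

    def add(self, item):
        """Buffer one item and flush as soon as the threshold is reached."""
        self._items.append(item)
        if len(self._items) >= self._threshold:
            self.flush()

    def tick(self):
        """Call periodically; flushes a partial buffer once the interval has elapsed."""
        if self._items and self._clock() - self._last_flush >= self._flush_interval:
            self.flush()

    def flush(self):
        """Send whatever is buffered and restart the interval timer."""
        if self._items:
            batch, self._items = self._items, []
            self._send(batch)
        self._last_flush = self._clock()


buffer = BatchBuffer(send=lambda batch: print(f"sending {len(batch)} items"), threshold=3)
for event in ["a", "b", "c", "d"]:
    buffer.add(event)
buffer.flush()  # drain the remaining partial batch on shutdown
```

A larger threshold means fewer, bigger writes, while the interval bounds how long a quiet service can hold an unsent record.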

4. Security & Data Masking

Automatically masks sensitive personal information or secret keys to prevent them from being stored in the Vector DB.

| Variable Name | Description | Default Value |
| --- | --- | --- |
| SENSITIVE_FIELD_NAMES | List of keywords to mask (comma separated). Arguments containing these keywords are replaced with [MASKED]. | password,api_key,token,secret,auth_token |

# Example: password argument is automatically masked and stored
@vectorize
def login(username, password): ...
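
The masking rule can be pictured with a small standalone sketch. It assumes that matching is a case-insensitive substring check on the argument name; `parse_sensitive_names` and `mask_arguments` are hypothetical helpers for illustration, not functions exported by VectorWave.

```python
DEFAULT_SENSITIVE = "password,api_key,token,secret,auth_token"


def parse_sensitive_names(raw=DEFAULT_SENSITIVE):
    """Split the comma-separated setting into a list of lowercase keywords."""
    return [name.strip().lower() for name in raw.split(",") if name.strip()]


def mask_arguments(arguments, sensitive_names):
    """Return a copy of `arguments` with sensitive values replaced by [MASKED]."""
    masked = {}
    for key, value in arguments.items():
        # Assumption: an argument is sensitive if its name contains any keyword.
        is_sensitive = any(name in key.lower() for name in sensitive_names)
        masked[key] = "[MASKED]" if is_sensitive else value
    return masked


call_arguments = {"username": "alice", "password": "hunter2"}
print(mask_arguments(call_arguments, parse_sensitive_names()))
```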

5. Monitoring & Alerting

Sends external notifications when an event at or above the configured minimum level occurs (errors only, by default).

| Variable Name | Description | Default Value |
| --- | --- | --- |
| ALERTER_STRATEGY | Alert strategy (webhook, log, none) | none |
| ALERTER_WEBHOOK_URL | Slack/Discord webhook URL | None |
| ALERTER_MIN_LEVEL | Minimum log level to trigger alerts (INFO, WARNING, ERROR) | ERROR |
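
The effect of ALERTER_MIN_LEVEL can be sketched as a simple threshold comparison on standard `logging` levels. `should_alert` is a hypothetical helper; how VectorWave actually maps events to levels is not described here.

```python
import logging

# The three level names documented for ALERTER_MIN_LEVEL.
LEVELS = {"INFO": logging.INFO, "WARNING": logging.WARNING, "ERROR": logging.ERROR}


def should_alert(event_level, min_level="ERROR"):
    """Return True when an event is at or above the configured minimum level."""
    if min_level not in LEVELS:
        raise ValueError(f"unsupported ALERTER_MIN_LEVEL: {min_level!r}")
    return event_level >= LEVELS[min_level]


print(should_alert(logging.WARNING))             # uses the default minimum of ERROR
print(should_alert(logging.WARNING, "WARNING"))  # lowered minimum
```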

6. Advanced Analytics

Settings for detecting data changes or for recommendation systems.

📉 Data Drift Detection

Detects when the distribution of input data deviates from training (or past) data.

| Variable Name | Description | Default Value |
| --- | --- | --- |
| DRIFT_DETECTION_ENABLED | Whether to enable drift detection | False |
| DRIFT_DISTANCE_THRESHOLD | Vector distance threshold to determine drift (0 to 2) | 0.25 |
| DRIFT_NEIGHBOR_AMOUNT | Number of recent neighbors to compare | 5 |
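
One plausible reading of these two numbers is sketched below: compute the cosine distance (which ranges from 0 to 2) between a new input vector and stored vectors, then flag drift when the mean distance to the nearest few exceeds the threshold. Both the choice of cosine distance and the use of a mean are assumptions of this example; `cosine_distance` and `is_drifted` are hypothetical names, not VectorWave's implementation.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors: 0 means identical direction, 2 opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def is_drifted(query, history, threshold=0.25, neighbor_amount=5):
    """Flag drift when the mean distance to the nearest stored vectors exceeds the threshold."""
    if not history:
        return False  # nothing to compare against yet
    distances = sorted(cosine_distance(query, past) for past in history)
    nearest = distances[:neighbor_amount]
    return sum(nearest) / len(nearest) > threshold


past_inputs = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0]]
print(is_drifted([1.0, 0.05], past_inputs))  # close to past inputs
print(is_drifted([0.0, 1.0], past_inputs))   # points in a different direction
```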

🎯 Recommendation Margins

Adjusts the search range for similarity search.

| Variable Name | Description | Default Value |
| --- | --- | --- |
| RECOMMENDATION_STEADY_MARGIN | Stable (similar) recommendation range | 0.05 |
| RECOMMENDATION_DISCOVERY_MARGIN | New (diverse) recommendation discovery range | 0.15 |

7. File Path Settings

| Variable Name | Description | Default Value |
| --- | --- | --- |
| CUSTOM_PROPERTIES_FILE_PATH | Path to custom metadata schema file | .weaviate_properties |
| FAILURE_MAPPING_FILE_PATH | Path to error code mapping file | .vectorwave_errors.json |