SaaS and Hosting Providers
Offer AI-as-a-Service in your region without building the underlying platform: per-client inference endpoints with usage controls, white-label deployment per account, and migration between backends and vector stores. Everything is self-hosted, with no lock-in.
Channel policy: SMEs and end-customer enterprises are served through authorized partners. We provide L3 technical support; we do not offer direct functional consulting to SMEs.
Why it's for you
You work with SMEs on colocation or private cloud, and they ask for "AI" without changing their infrastructure.
You want to offer inference as a service without setting up a different stack per client.
You need white label per account, usage limits and export for billing.
You want independence from a single model/vector store (migration and fallback).
What you can offer your clients (simple catalog)
Inference endpoints per client: one endpoint per account with their model/prompt/versioning.
Chatbots/Q&A with evidence: RAG on client documents (local Qdrant or commercial provider).
Internal copilots: support/sales/operations with A/B testing and metrics.
Vector Store Hosting: managed embedding storage and query, with migration and verification.
Domain fine-tuning (lite): PEFT/adapters with limited datasets and basic evaluation.
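As an illustration of "migration and verification" for hosted vector stores, here is a minimal Python sketch. It uses plain in-memory mappings as stand-ins for the source and target stores; `fingerprint` and `migrate` are hypothetical names, not part of any product API. A real migration would go through each backend's own client and would also have to handle writes that arrive during the copy.

```python
import hashlib
from typing import Dict, List, MutableMapping

Vector = List[float]


def fingerprint(vectors: Dict[str, Vector]) -> str:
    """Order-independent digest of all (id, vector) pairs."""
    digest = hashlib.sha256()
    for key in sorted(vectors):
        digest.update(key.encode("utf-8"))
        digest.update(repr(vectors[key]).encode("utf-8"))
    return digest.hexdigest()


def migrate(source: Dict[str, Vector],
            target: MutableMapping[str, Vector]) -> bool:
    """Copy every vector to the target, verify the copy, roll back on mismatch.

    Assumption: both stores behave like mappings from id to vector.
    """
    snapshot = dict(target)  # kept so the target can be restored
    for key, vector in source.items():
        target[key] = list(vector)

    # Verification: same number of items and identical content digest.
    copied = {key: target[key] for key in source if key in target}
    verified = (len(copied) == len(source)
                and fingerprint(copied) == fingerprint(source))

    if not verified:
        # Rollback: restore the target to its pre-migration state.
        target.clear()
        target.update(snapshot)
    return verified
```

The caller would switch traffic to the new store only when `migrate` returns `True`; on `False` the target is left exactly as it was before the attempt.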
How you implement it (technical)
Base installation (15–30 min)
Opt-in stack: Prometheus, Qdrant, PostgreSQL + pgvector, MinIO, Kafka, Redis, RabbitMQ, MLflow. Isolation by namespace/volumes, no forced dependencies.
Per-client configuration
One namespace/tenant per account, token/QPS budgets, API keys, logs/export and white-label per client. Migration between backends/vector stores without downtime.
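The token/QPS budgets mentioned above could be enforced along these lines. This is a sketch under stated assumptions, not the product's implementation: `TenantBudget` and its fields are hypothetical, QPS is limited with a standard token bucket, and the monthly token quota is a simple counter.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TenantBudget:
    """Hypothetical per-tenant limits: QPS token bucket plus monthly quota."""
    max_qps: float
    monthly_tokens: int
    used_tokens: int = 0
    _allowance: float = field(default=0.0, init=False)
    _last_check: Optional[float] = field(default=None, init=False)

    def __post_init__(self) -> None:
        # Start with a full bucket so an initial burst up to max_qps passes.
        self._allowance = self.max_qps

    def allow(self, tokens: int, now: Optional[float] = None) -> bool:
        """Return True and record usage if the request fits both limits."""
        now = time.monotonic() if now is None else now
        if self._last_check is not None:
            elapsed = max(0.0, now - self._last_check)
            # Refill the bucket in proportion to elapsed time, capped at max_qps.
            self._allowance = min(self.max_qps,
                                  self._allowance + elapsed * self.max_qps)
        self._last_check = now

        if self._allowance < 1.0:
            return False  # over the QPS budget
        if self.used_tokens + tokens > self.monthly_tokens:
            return False  # over the monthly token budget
        self._allowance -= 1.0
        self.used_tokens += tokens
        return True
```

One instance per namespace/tenant keeps the limits isolated between accounts. The optional `now` argument exists only to make the behavior reproducible in tests.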
Monitoring and billing
Usage metrics, latency, throughput and cost per request. Export to CSV/API for billing. Alerts and SLOs per client.
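To show what a CSV export for billing might look like, the sketch below aggregates per-request records into one row per client. The record fields (`tenant`, `tokens`, `latency_ms`, `cost`) and the column layout are assumptions made for illustration; the actual export format is not specified here.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable


def export_usage_csv(records: Iterable[Dict]) -> str:
    """Aggregate per-request records into one CSV row per tenant."""
    totals = defaultdict(
        lambda: {"requests": 0, "tokens": 0, "latency_ms": 0.0, "cost": 0.0}
    )
    for record in records:
        entry = totals[record["tenant"]]
        entry["requests"] += 1
        entry["tokens"] += record["tokens"]
        entry["latency_ms"] += record["latency_ms"]
        entry["cost"] += record["cost"]

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["tenant", "requests", "tokens", "avg_latency_ms", "cost"])
    for tenant in sorted(totals):
        entry = totals[tenant]
        avg_latency = entry["latency_ms"] / entry["requests"]
        writer.writerow([
            tenant,
            entry["requests"],
            entry["tokens"],
            f"{avg_latency:.1f}",
            f"{entry['cost']:.4f}",
        ])
    return buffer.getvalue()
```

The returned string can be written to a file or served from an API endpoint, so the same aggregation covers both export paths.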
Pricing model (suggested)
Per active client
Fixed monthly price per account (independent of usage). Includes: endpoint, base model, basic RAG, metrics and L3 support.
Per usage (tokens/requests)
Variable price according to real consumption. Includes: automatic billing, budgets and alerts, migration between backends.
Hybrid
Fixed base fee plus a variable charge for usage above the included amount. Includes: availability guarantee, automatic scaling, complete white-label.
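The three suggested models differ only in how the monthly invoice is computed, which a short sketch makes concrete. Everything numeric here is a placeholder: the function name, the per-1,000-token unit, and the idea of an included token allowance for the hybrid model are assumptions, not published prices or terms.

```python
from decimal import Decimal


def monthly_invoice(model: str, tokens_used: int, *,
                    fixed_fee: Decimal = Decimal("0"),
                    price_per_1k: Decimal = Decimal("0"),
                    included_tokens: int = 0) -> Decimal:
    """Compute a monthly invoice for one client under a given pricing model."""
    cent = Decimal("0.01")
    if model == "per_client":
        # Fixed fee, independent of usage.
        total = fixed_fee
    elif model == "per_usage":
        # Pay only for actual consumption.
        total = price_per_1k * tokens_used / 1000
    elif model == "hybrid":
        # Fixed base plus a charge for tokens above the included allowance.
        overage = max(0, tokens_used - included_tokens)
        total = fixed_fee + price_per_1k * overage / 1000
    else:
        raise ValueError(f"unknown pricing model: {model}")
    return total.quantize(cent)
```

For example, with a placeholder base fee of 50, 1,000,000 included tokens and 0.02 per 1,000 tokens, a client that used 1,500,000 tokens would pay 50 + 500 × 0.02 = 60 under the hybrid model.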
What your implementation includes
OEM Light: endpoints per client with budgets, API keys, logs/export and white-label.
Operational RAG: local Qdrant + one commercial vector store, migration with rollback.
Basic training: PEFT/adapters with evaluation and model versioning.
Metrics and billing: usage, cost, latency and export for billing.
Migration without downtime: between backends, vector stores and models.
Isolated installation: opt-in stack, no forced dependencies, 15–30 min typical.