Training and Custom Models

Create specialized models for your domain and stack without complex infrastructure: efficient fine-tuning (PEFT), adapters, and model merging, with reproducible evaluation. Register and deploy in one click.

"Verified compatibility" indicates internal testing until 09/2025. Some advanced functions appear as In development or Planned.

Why it's different

  • Production-oriented: models by domain and by stack (Technology/Domain Packs), not generic.
  • Real efficiency: PEFT (LoRA/QLoRA) to reduce cost/VRAM and accelerate iteration.
  • Automatic evaluation before deployment: quality, security, robustness, latency and cost.
  • Complete traceability: datasets, parameters, weights and versions with MLflow.

Training modes

  • PEFT (LoRA/QLoRA) [GA]: efficient fine-tuning on modest GPUs or accelerated CPUs; ideal for specific domains (see the sketch after this list).
  • Adapters [GA]: activate/deactivate specializations per stack/domain without retraining the core model.
  • Model Merge [Planned]: combines base models and specializations (e.g., "stack + domain") with stability controls.
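
To make the PEFT and Adapters modes concrete, here is a minimal sketch using the Hugging Face transformers and peft libraries. The base model name, rank, and adapter paths are illustrative assumptions, not platform defaults.

```python
# Minimal LoRA fine-tuning sketch with Hugging Face transformers + peft.
# Base model name, rank, and adapter paths are illustrative, not platform defaults.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

lora = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                 # low-rank dimension of the update matrices
    lora_alpha=32,                        # scaling applied to the LoRA update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()        # typically <1% of the base weights

# Adapters mode: attach saved specializations to a frozen base and switch
# between them at runtime (hypothetical local paths).
frozen = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
serving = PeftModel.from_pretrained(frozen, "adapters/python-stack",
                                    adapter_name="python-stack")
serving.load_adapter("adapters/banking", adapter_name="banking")
serving.set_adapter("banking")            # activate the banking specialization
```

QLoRA follows the same pattern with the base model loaded in 4-bit (e.g., via bitsandbytes), which is where most of the VRAM savings come from.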

Training Workflow

Diagram: domain model training workflow, a guided process from data to deployed model (steps detailed under "Guided flow" below).

Training packs

  • ⚙️ Technology Packs: models fine-tuned to stacks and versions (frameworks, libraries, and conventions).
  • 🎯 Domain Packs: models fine-tuned to verticals (support, banking, retail, health) with specific metrics and evaluations.

Guided flow (from data to production)

  1. Data: load your dataset (QA/code/tickets/manuals); validation and balancing.
  2. Configuration: choose the base model and method (LoRA/QLoRA/Adapter).
  3. Job: launch training with hardware presets (GPU/CPU); monitoring and logs.
  4. Evaluation: run the suite (quality, security/robustness, latency/cost).
  5. Registration: save weights and metadata in MLflow and version them (see the sketch after this list).
  6. Deployment: one click to the Serving module; define policies and metrics.
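
As an illustration of step 5, registration could look like the following with the MLflow client; the experiment name, metric keys, and artifact paths here are hypothetical.

```python
# Sketch of step 5 (Registration) with the MLflow tracking client.
# Experiment name, metric keys, and artifact paths are hypothetical.
import mlflow

mlflow.set_experiment("banking-support-qlora")
with mlflow.start_run(run_name="qlora-r16") as run:
    mlflow.log_params({"method": "QLoRA", "rank": 16, "base": "mistral-7b"})
    mlflow.log_metrics({"f1": 0.87, "exact_match": 0.74, "p95_latency_ms": 210})
    mlflow.log_artifacts("outputs/adapter")  # adapter weights + tokenizer files
    mlflow.register_model(f"runs:/{run.info.run_id}/adapter",
                          "banking-support-assistant")
```

MLflow's model registry then gives each trained model a name and version that downstream serving can resolve instead of a loose file path.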

Automatic evaluation (reproducible)

  • 🧠 General LLM: exact match, F1/accuracy, BLEU/ROUGE/BERTScore (depending on the task).
  • 💻 Code: compiles / tests pass, pass@k (see the sketch after this list).
  • 🔍 RAG: recall@k, MRR/nDCG, groundedness / % of responses with a source.
  • 🛡️ Security and robustness: jailbreak/PII/toxicity, prompt injection, adversarial variations.
  • Operation: P50/P95 latency, throughput, cost per query / per 1k tokens.
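
Two of these metrics are simple enough to show inline: the standard unbiased pass@k estimator (Chen et al., 2021) and a plain MRR computation. These are the textbook forms, not necessarily the platform's exact implementation.

```python
# Textbook implementations of two metrics above; not necessarily the
# platform's exact code.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k (Chen et al., 2021): probability that at least one of
    k sampled generations passes, given n samples of which c passed."""
    if n - c < k:
        return 1.0  # fewer failures than draws: a passing sample is guaranteed
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

def mean_reciprocal_rank(first_hit_ranks: list[int | None]) -> float:
    """MRR over queries; ranks are 1-based, None means nothing relevant retrieved."""
    return sum(1.0 / r for r in first_hit_ranks if r is not None) / len(first_hit_ranks)

print(pass_at_k(n=20, c=3, k=5))              # ≈ 0.60
print(mean_reciprocal_rank([1, 3, None, 2]))  # (1 + 1/3 + 0 + 1/2) / 4 ≈ 0.458
```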

Evaluation adapters map to recognized methodologies; we can add new ones within 48–72 hours if you need them.

Hardware and backends (verified compatibility)

  • GA: CPU and NVIDIA GPUs (vLLM/Transformers/ONNX Runtime), plus OpenVINO on Intel CPUs.
  • Beta: AMD ROCm, Intel iGPU/NPU, Habana.
  • Presets by memory/speed trade-off (balanced, performance, memory_efficient); a hypothetical mapping is sketched below.
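
As an illustration only, a preset name could resolve to a set of training knobs like the ones below; the schema and values are hypothetical, not the platform's actual configuration.

```python
# Hypothetical preset table: knob names and values are illustrative only,
# not the platform's actual configuration schema.
PRESETS = {
    "balanced":         {"precision": "bf16", "batch_size": 8,  "grad_checkpointing": False},
    "performance":      {"precision": "bf16", "batch_size": 16, "grad_checkpointing": False},
    "memory_efficient": {"precision": "int4", "batch_size": 2,  "grad_checkpointing": True},
}

def resolve_preset(name: str) -> dict:
    """Look up a preset by name, failing loudly on an unknown key."""
    if name not in PRESETS:
        raise ValueError(f"unknown preset {name!r}; choose from {sorted(PRESETS)}")
    return PRESETS[name]
```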

Key metrics you'll see

  • Improvement vs. base in target metric (e.g., +F1, +exact match, +pass@k).
  • Latency/cost per query compared (before/after).
  • Adapter size/VRAM and total training time.
  • Global health score (quality + security + operation).

Availability

Available today (v0.10 – 10/10/2025)

  • PEFT (LoRA/QLoRA) and Adapters with guided flow and hardware presets.
  • Basic evaluation (quality/operation) and MLflow registration.
  • 1-click deployment to the Serving module.

In development (v0.11 – Q4 2025)

  • Expanded suites (security/robustness), comparative dashboards and more Packs.

Planned (2026)

  • Advanced Model Merge, distributed multi-node jobs and expanded sector templates.

Ready to train your first domain model?