Fine-Tuning,
Now Fully Reproducible.
The specialized fine-tuning system for Gaslamp. Train Reasoning (GRPO), Vision, and Text models across Nvidia GPUs and Apple Silicon with zero friction, backed by a 100% auditable gaslamp.md roadbook.
// Phase 1: Interview
[unsloth-buddy] Locking scope: SFT method, Qwen2.5-7B, Apple Silicon target.
[unsloth-buddy] Wrote decisions to gaslamp.md
// Phase 3: Hardware constraints
[unsloth-buddy] Environment: Darwin arm64. Unsloth unavailable.
[unsloth-buddy] Pivoting to mlx-tune backend.
[unsloth-buddy] VRAM check: Peak overhead ~4.2GB. Fits in 16GB.
// Phase 5.5: Demo Builder
[unsloth-buddy] Evaluating cross-entropy...
[unsloth-buddy] Generated static HTML comparison board 'demos/qwen2.5-medical/index.html'
// Phase 7: Reflection
[unsloth-buddy] Synthesizing memory snapshot...
[unsloth-buddy] +1 lesson saved to ~/.gaslamp/lessons.md
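For the curious: the Phase 3 VRAM check above is, at heart, back-of-the-envelope arithmetic over parameter counts. A minimal sketch of such an estimate in Python; every constant here is an illustrative assumption, not Unsloth-Buddy's actual formula:

```python
def estimate_peak_gb(n_params_b: float, bits: int = 4,
                     lora_frac: float = 0.01, overhead_gb: float = 1.0) -> float:
    """Rough peak-memory estimate for LoRA fine-tuning.

    n_params_b:  base model size in billions of parameters.
    bits:        quantization width of the frozen base weights.
    lora_frac:   trainable LoRA params as a fraction of the base model.
    overhead_gb: activations, optimizer state, and framework overhead.
    All constants are illustrative assumptions, not Unsloth-Buddy's formula.
    """
    base_gb = n_params_b * bits / 8          # quantized base weights
    # LoRA adapters train in 16-bit (2 bytes), with Adam keeping ~2 extra copies.
    lora_gb = n_params_b * lora_frac * 2 * 3
    return base_gb + lora_gb + overhead_gb

# Qwen2.5-7B at 4-bit: ~3.5 GB weights + adapters + overhead.
print(f"{estimate_peak_gb(7.0):.1f} GB")     # ~4.9 GB -> fits in 16 GB
```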
🧠 Self-Evolving Memory.
The second time you fine-tune, it already knows your adapter path convention. The agent learns from your hardware constraints, hyperparameter tweaks, and setup requirements.
In Phase 7, it captures these "gotchas" into ~/.gaslamp/ as reusable skills and lessons. Every new project injects a Frozen Snapshot at startup, automatically applying your past knowledge.
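A minimal sketch of what that startup injection could look like. The ~/.gaslamp/ and .gaslamp_context/ paths appear on this page; the file names and read-only convention are assumptions:

```python
import shutil
from pathlib import Path

def inject_frozen_snapshot(project_dir: Path) -> None:
    """Copy accumulated lessons into a new project as a read-only snapshot.

    ~/.gaslamp/ and .gaslamp_context/ are named on this page; the exact
    file layout here is an illustrative assumption.
    """
    memory = Path.home() / ".gaslamp" / "lessons.md"
    context = project_dir / ".gaslamp_context"
    context.mkdir(parents=True, exist_ok=True)
    if memory.exists():
        snapshot = context / "frozen_snapshot.md"
        shutil.copyfile(memory, snapshot)
        # Mark read-only so the agent can consult but never mutate it.
        snapshot.chmod(0o444)

inject_frozen_snapshot(Path("qwen2.5-medical"))
```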
The Reproducibility Contract.
Models without audit trails are just prototypes. Unsloth-Buddy documents every decision—from exact quantization settings to data parsing logic—in a structured, 11-section gaslamp.md roadbook.
Hand this file to any MLE (or a fresh agent session months later) to identically reproduce the project end-to-end.
Rank: 16 | Alpha: 32
Source: generated by src/prepare.py
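Those two numbers map directly onto a standard LoRA configuration. A hedged sketch with Hugging Face peft; the module list and dropout are illustrative, not what the roadbook actually records:

```python
from peft import LoraConfig

# Rank 16 / alpha 32 as recorded in gaslamp.md; module list is illustrative.
lora_config = LoraConfig(
    r=16,                        # Rank: 16
    lora_alpha=32,               # Alpha: 32 (scaling = alpha / r = 2.0)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```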
Task-Aware Dashboards.
An SSE-streamed terminal UI. Whether you're tracking SFT loss curves or DPO chosen/rejected reward deltas, the dashboard adapts to your method automatically.
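Under the hood, an SSE dashboard is just a long-lived HTTP response streaming events. A minimal Flask sketch; the endpoint name and event shape are assumptions, not the actual dashboard API:

```python
import json
import time

from flask import Flask, Response

app = Flask(__name__)

@app.route("/events")
def events() -> Response:
    def stream():
        # In the real dashboard these would come from the trainer;
        # here we fake a decaying SFT loss curve.
        for step in range(1, 101):
            payload = {"step": step, "loss": 2.0 / step ** 0.5}
            yield f"data: {json.dumps(payload)}\n\n"
            time.sleep(0.5)
    return Response(stream(), mimetype="text/event-stream")
```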
Built for Empowered Teams.
We handle the infra and the math, so you can focus on product value.
🚧 The 2-Question Interview
Generative AI is optimistic: it happily writes broken code. Unsloth-Buddy forces a simple 2-question interview (Task + Data) that captures scope, method, and audience before any PyTorch gets written.
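A sketch of the kind of record such an interview could lock in before any code is generated; the field names are assumptions inferred from the phases on this page:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass(frozen=True)  # frozen: scope is locked once the interview ends
class InterviewRecord:
    task: str                                     # Q1: what to fine-tune for
    data: str                                     # Q2: where the data lives
    method: Literal["SFT", "DPO", "GRPO", "Vision"]
    base_model: str
    audience: str                                 # drives demo building

record = InterviewRecord(
    task="medical Q&A assistant",
    data="data/medical_dialogues.jsonl",
    method="SFT",
    base_model="Qwen2.5-7B",
    audience="clinicians",
)
```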
🔍 Apple vs Nvidia Routing
Hardware routing happens at the skill level. It detects your silicon and generates either native Unsloth scripts or MLX-Tune scripts. No more "CUDA out of memory" on a MacBook.
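A minimal sketch of that routing decision; the backend names come from this page, while the detection logic is an assumption:

```python
import importlib.util
import platform

def pick_backend() -> str:
    """Route to a training backend based on the detected silicon."""
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mlx"          # Apple Silicon: generate MLX-Tune scripts
    if importlib.util.find_spec("torch") is not None:
        import torch
        if torch.cuda.is_available():
            return "unsloth"  # Nvidia GPU: generate native Unsloth scripts
    raise RuntimeError("No supported accelerator detected")

print(pick_backend())
```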
👁️ Native Vision SFT
Train Qwen2.5-VL and Gemma 3 Vision directly on Apple Silicon M-series chips via native integration with `mlx-vlm`. Plus static HTML demos for VLMs.
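For orientation, a hedged inference sketch against mlx-vlm's documented load/generate flow; the model id and image are assumptions, and signatures can shift between mlx-vlm releases:

```python
# Sketch only: follows mlx-vlm's README flow; verify against your installed version.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_id = "mlx-community/Qwen2.5-VL-7B-Instruct-4bit"  # assumed community build
model, processor = load(model_id)
config = load_config(model_id)

images = ["demo_scan.png"]  # hypothetical local test image
prompt = apply_chat_template(processor, config, "Describe this image.",
                             num_images=len(images))
print(generate(model, processor, prompt, images, verbose=False))
```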
One Conversation. An 8-Phase Lifecycle.
Describe what you want. The agent locks scope, formats data, checks hardware, trains, generates demo UI, handles local deploy, and stores lessons.
Init
Creates the project directory and injects the read-only frozen memory snapshot from past sessions.
Interview
Locks task + data with two questions, capturing domain/audience for demo building.
Data Strategy
Acquires your data and reformats it into the exact schema the chosen method requires.
Env & Math
Hardware scan, virtual-environment setup, and peak-VRAM estimation.
Train
Generates and executes SFT/DPO/GRPO/Vision training scripts natively and securely.
Demo
Evaluates models side-by-side in an auto-generated, portable HTML dashboard.
Deploy
Exports merged adapters or auto-quantizes to GGUF and hosts locally via llama.cpp.
Reflect
Synthesizes run lessons, setup traps, and scenario recipes into a reusable memory snapshot.
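Taken together, the eight phases form a fixed, ordered pipeline the agent walks in one conversation. A minimal sketch; the phase names come from the list above, while the runner itself is illustrative:

```python
from enum import Enum

class Phase(Enum):
    INIT = "Init"
    INTERVIEW = "Interview"
    DATA_STRATEGY = "Data Strategy"
    ENV_AND_MATH = "Env & Math"
    TRAIN = "Train"
    DEMO = "Demo"
    DEPLOY = "Deploy"
    REFLECT = "Reflect"

def run_lifecycle(handlers: dict) -> None:
    """Walk the eight phases in order; each handler may consult gaslamp.md."""
    for phase in Phase:  # Enum iteration preserves definition order
        handlers[phase]()

# Stub handlers that just log each phase; real ones would do the work.
handlers = {p: (lambda p=p: print(f"[unsloth-buddy] Phase: {p.value}"))
            for p in Phase}
run_lifecycle(handlers)
```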