Fine-Tuning,
Now Fully Reproducible.

The specialized fine-tuning system for Gaslamp. Seamlessly train Reasoning (GRPO), Vision, and Text models across Nvidia GPUs and Apple Silicon with zero friction and a 100% auditable gaslamp.md roadbook.

gaslamp/unsloth-buddy[ACTIVE]
claude ~/project $ /unsloth-buddy I need a model to parse medical records. I have a MacBook Air.

// Phase 1: Interview
[unsloth-buddy] Locking scope: SFT method, Qwen2.5-7B, Apple Silicon target.
[unsloth-buddy] Wrote decisions to gaslamp.md

// Phase 3: Hardware constraints
[unsloth-buddy] Environment: Darwin arm64. Unsloth unavailable.
[unsloth-buddy] Pivoting to mlx-tune backend.
[unsloth-buddy] VRAM check: Peak overhead ~4.2GB. Fits in 16GB.

// Phase 5.5: Demo Builder
[unsloth-buddy] Evaluating cross-entropy...
[unsloth-buddy] Generated static HTML comparison board 'demos/qwen2.5-medical/index.html'

// Phase 7: Reflection
[unsloth-buddy] Synthesizing memory snapshot...
[unsloth-buddy] +1 lesson saved to ~/.gaslamp/lessons.md
gaslamp_context/: user.md · lessons.md · skills.md — Frozen Snapshot, injected at Phase 0
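The "VRAM check" in Phase 3 boils down to simple arithmetic over model size, quantization width, and adapter rank. A minimal sketch of that estimate — all constants here are illustrative assumptions, not Unsloth-Buddy's actual formula:

```python
def estimate_peak_gb(n_params_b: float, bits: int = 4,
                     lora_rank: int = 16, overhead_gb: float = 1.5) -> float:
    """Rough peak-memory estimate for QLoRA-style fine-tuning.

    n_params_b  -- base model size in billions of parameters
    bits        -- quantization width of the frozen base weights
    overhead_gb -- activations, optimizer state, KV cache (illustrative guess)
    """
    base_gb = n_params_b * bits / 8                # quantized base weights
    lora_gb = n_params_b * 0.02 * lora_rank / 16   # adapters are a small fraction
    return round(base_gb + lora_gb + overhead_gb, 1)

# A 7B model in 4-bit: 7 * 4 / 8 = 3.5 GB of weights, plus adapters and overhead.
print(estimate_peak_gb(7))
```

The useful property is the comparison, not the absolute number: the same model in 8-bit roughly doubles the weight term, which is how the agent can decide a 16GB MacBook Air fits before generating any training code.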

🧠 Self-Evolving Memory.

The second time you fine-tune, it already knows your adapter path convention. The agent learns from your hardware constraints, hyperparameter tweaks, and setup requirements.

In Phase 7, it captures these "gotchas" into ~/.gaslamp/ as reusable skills and lessons. Every new project injects a Frozen Snapshot at startup, automatically applying your past knowledge.
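The Phase 0 injection can be as simple as concatenating the memory files into one read-only context block. A sketch using the file names from the snapshot layout above — the exact delimiter format is an assumption:

```python
from pathlib import Path

def frozen_snapshot(root: Path = Path.home() / ".gaslamp") -> str:
    """Concatenate gaslamp memory files into a read-only context block (Phase 0).

    File names follow the gaslamp_context/ layout; the delimiter
    comments are an illustrative choice, not the tool's real format.
    """
    parts = []
    for name in ("user.md", "lessons.md", "skills.md"):
        f = root / name
        if f.exists():  # missing files are simply skipped
            parts.append(f"<!-- {name} (frozen) -->\n{f.read_text()}")
    return "\n\n".join(parts)
```

Because the snapshot is rebuilt from files on every run, deleting a stale lesson from ~/.gaslamp/lessons.md is all it takes to "unlearn" it.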

"Because debugging an OOM error once is enough."

The Reproducibility Contract.

Models without audit trails are just prototypes. Unsloth-Buddy documents every decision—from exact quantization settings to data parsing logic—in a structured, 11-section gaslamp.md roadbook.

Hand this file to any MLE (or a fresh agent session months later) to identically reproduce the project end-to-end.

# gaslamp.md
## 3. Model & LoRA
Base: mlx-community/Qwen2.5-7B
Rank: 16 | Alpha: 32
📖 Learn: Rank 16 provides enough capacity for specialized domain terminology without excessive VRAM overhead.
## 4. Data Strategy
Format: ChatML
Source: generated by src/prepare.py
📖 Learn: The JSONL is mapped to ChatML dynamically to prevent padding token bleed during GRPO reward generation.
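The dynamic JSONL-to-ChatML mapping described in section 4 might look like the sketch below. The `messages` record schema is an assumption (the common chat-format convention), not necessarily what src/prepare.py emits:

```python
import json

def to_chatml(record: dict) -> str:
    """Render a {'messages': [{'role', 'content'}, ...]} record as ChatML text."""
    return "\n".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in record["messages"]
    )

# One JSONL line, mapped at load time rather than baked into the dataset.
line = '{"messages": [{"role": "user", "content": "Summarize this record."}]}'
print(to_chatml(json.loads(line)))
```

Mapping at load time, instead of pre-rendering the ChatML strings into the dataset, is what keeps the special tokens out of the raw data where they could leak into padding during reward generation.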

Task-Aware Dashboards.

An SSE-streamed terminal UI. Whether you're tracking SFT loss curves or DPO chosen/rejected reward deltas, the dashboard adapts automatically to your method.

Memory Breakdown: Total Peak VRAM 10.2 GB (Base Model + LoRA Overhead) · GRPO Dashboard (Terminal): Reward ± StdDev Confidence Band
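On the wire, SSE is just `event:` and `data:` lines separated by a blank line. A minimal sketch of how the dashboard's metric stream could be framed — the event name and payload fields are assumptions for illustration:

```python
import json

def sse_event(event: str, payload: dict) -> str:
    """Frame one server-sent event per the SSE wire format:
    an 'event:' line, a 'data:' line, then a blank line terminator."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

print(sse_event("grpo_step", {"step": 10, "reward_mean": 0.42, "reward_std": 0.07}))
```

Since each event carries its metric names in the JSON payload, the same stream can serve a loss-curve view for SFT and a reward-band view for GRPO without changing the transport.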

Built for Empowered Teams.

We handle the infra and the math, so you can focus on the product value.

🚧 The 2-Question Interview

Generative AI is optimistic; it will happily write broken code. Unsloth-Buddy forces a simplified 2-question interview (Task + Data) that captures scope, method, and audience before jumping into PyTorch.

🔍 Apple vs Nvidia Routing

Hardware routing happens at the skill level. It detects your silicon and generates either native Unsloth scripts or MLX-Tune scripts. No more "CUDA out of memory" on a MacBook.
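The routing decision itself is a small platform check. A simplified sketch — the backend names mirror the ones in this page's transcript, and the exact detection logic is an assumption:

```python
import platform

def pick_backend() -> str:
    """Route to a training backend by detected silicon (simplified sketch)."""
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mlx"          # Apple Silicon: generate MLX-Tune scripts
    try:
        import torch          # optional dependency; absent on many Macs
        if torch.cuda.is_available():
            return "unsloth"  # Nvidia GPU: generate native Unsloth scripts
    except ImportError:
        pass
    return "cpu"              # fall back rather than emit CUDA code blindly
```

Making this check before any code generation is what prevents the classic failure mode of handing a CUDA training script to a machine with no CUDA.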

👁️ Native Vision SFT

Train Qwen2.5-VL and Gemma 3 vision models directly on Apple Silicon M-series chips via native integration with `mlx-vlm`. Plus static VLM HTML demos.

One Conversation. An 8-Phase Lifecycle.

Describe what you want. The agent locks scope, formats data, checks hardware, trains, generates demo UI, handles local deploy, and stores lessons.

0. Init — Creates the project directory and injects the read-only frozen memory snapshot from past sessions.

1. Interview — Two questions lock task + data, capturing domain and audience for demo building.

2. Data Strategy — Acquires and reformats your data into the exact schema the chosen method requires.

3. Env & Math — Hardware scan, virtual-environment setup, and peak-VRAM estimation.

4. Train — Generates the training logic and runs SFT/DPO/GRPO/Vision fine-tunes locally.

5. Demo — Evaluates models side-by-side in an auto-generated, portable HTML viewer.

6. Deploy — Exports merged adapters or auto-quantizes to GGUF and serves via llama.cpp.

7. Reflect — Synthesizes run lessons, setup traps, and scenario recipes into a reusable memory footprint.
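The "+1 lesson saved" step from the transcript can be as lightweight as appending to the shared memory file. A sketch using the ~/.gaslamp/lessons.md path shown earlier — the dated-bullet entry format is an assumption:

```python
from datetime import date
from pathlib import Path

def save_lesson(text: str, root: Path = Path.home() / ".gaslamp") -> Path:
    """Append one dated lesson to lessons.md (illustrative entry format)."""
    root.mkdir(parents=True, exist_ok=True)
    f = root / "lessons.md"
    with f.open("a") as fh:           # append: never rewrite past lessons
        fh.write(f"- [{date.today()}] {text}\n")
    return f
```

Append-only writes here pair with the read-only snapshot at Phase 0: Reflect is the only phase that mutates memory, so a crashed run can never corrupt what earlier projects learned.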