Unsloth,
Zero Friction.

The specialized fine-tuning skill for Gaslamp. Seamlessly train Reasoning (GRPO), Vision, and Text models with hardware-aware guardrails across NVIDIA GPUs and Apple Silicon.

Install the Skill
01. Auto-Environment

No more CUDA nightmares. The agent probes your hardware to install the exact Unsloth wheel—or automatically pivots to mlx-tune for native Apple Silicon training.
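The probe itself boils down to an OS/architecture check. A minimal sketch (backend names and branching are illustrative; the real skill also inspects CUDA toolkit versions and available wheels before installing anything):

```python
import platform

def pick_backend(system: str, machine: str) -> str:
    """Choose a training backend from the host OS and CPU architecture.
    Sketch only: a production probe would also verify CUDA and driver
    versions before selecting an Unsloth wheel."""
    if system == "Darwin" and machine == "arm64":
        return "mlx"       # native Apple Silicon training path
    return "unsloth"       # CUDA path for NVIDIA GPUs

# Probe the current host:
backend = pick_backend(platform.system(), platform.machine())
```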

02. VRAM Guardrails

The agent dynamically calculates QLoRA vs. LoRA VRAM requirements to guarantee the model fits your hardware.
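The guardrail math can be sketched as a back-of-the-envelope heuristic (the constants here are illustrative assumptions, not the skill's actual formula):

```python
def estimate_vram_gb(params_billion: float, method: str,
                     overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate for a fine-tuning run.

    Heuristic: QLoRA holds base weights in 4-bit (~0.5 bytes/param),
    while 16-bit LoRA holds them in 2 bytes/param; a flat overhead
    term stands in for activations, adapter optimizer state, and
    framework context.
    """
    bytes_per_param = 0.5 if method == "qlora" else 2.0
    return params_billion * bytes_per_param + overhead_gb
```

Under these assumptions a 7B model needs roughly 5.5 GB with QLoRA versus 16 GB with 16-bit LoRA, which is exactly the kind of delta that decides whether a run fits a consumer GPU.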

03. 1-Click Deploy

End-to-end execution. Automatically merge adapters and export to GGUF or Ollama Modelfiles.
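For the Ollama target, the deploy artifact is a Modelfile pointing at the merged GGUF. A rough example (the filename, parameter, and system prompt are illustrative, not the skill's actual output):

```text
FROM ./model-q4_k_m.gguf
PARAMETER temperature 0.2
SYSTEM You are a fine-tuned summarization assistant.
```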

gaslamp/unsloth-buddy [ACTIVE]
claude ~/project $ /unsloth-buddy I have 500 doctor-patient consultation transcripts — want a model that summarizes them. I only have a MacBook Air

[Claude] Loading Gaslamp skill: unsloth-buddy
[unsloth-buddy] Probing environment...
[unsloth-buddy] OS: Darwin (macOS) | Arch: arm64
[unsloth-buddy] Detecting Apple Silicon processing unit...
[unsloth-buddy] Unsloth lacks native Mac training. Pivoting to mlx-tune framework...
[unsloth-buddy] Model mlx-community/Qwen2.5-7B-Instruct-4bit selected.
[unsloth-buddy] Writing MLX SFT Formatting Script for consultation summaries...
[unsloth-buddy] Launching MLX Trainer...

Expert Training Templates

Like all Gaslamp blueprints, unsloth-buddy is pre-loaded with expert design patterns, so your agent doesn't have to guess how to format complex datasets.

🧠 Reasoning (GRPO)

Natively train DeepSeek-R1 style models. The agent automatically builds the standard reward functions (format adherence + answer correctness) and configures the `GRPOTrainer`.
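The format + correctness pattern can be sketched as plain reward functions. Signatures are simplified for illustration; TRL's `GRPOTrainer` typically passes batched lists of completions rather than single strings:

```python
import re

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the DeepSeek-R1 style
    <think>...</think><answer>...</answer> tag layout."""
    ok = re.match(r"^<think>.*?</think>\s*<answer>.*?</answer>$",
                  completion, re.DOTALL)
    return 1.0 if ok else 0.0

def correctness_reward(completion: str, reference: str) -> float:
    """2.0 if the extracted <answer> block matches the reference."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 2.0 if m and m.group(1).strip() == reference.strip() else 0.0
```

Weighting correctness above format is a common choice so the policy is never rewarded for well-formed but wrong answers more than for right ones.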

👁️ Vision & Multimodal

Handles the notoriously complex collators required for VLMs (like Qwen-VL or Llama 3.2 Vision) and automatically structures your image dataset correctly.
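The dataset-structuring half of that job can be sketched as follows. The key names follow the common interleaved messages/content convention; the exact schema is model-specific, so treat this as an assumed shape, not the definitive one:

```python
def to_vision_chat(image_path: str, question: str, answer: str) -> list:
    """One training example in the interleaved image+text chat layout
    that VLM processors (Qwen-VL style) typically consume."""
    return [
        {"role": "user", "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": question},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": answer},
        ]},
    ]
```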

⚖️ Alignment (DPO/ORPO)

Train models on human preference data. The agent knows exactly how to load chosen/rejected data pairs to configure Unsloth's optimized preference trainers.
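The target shape is simple but strict. A minimal sketch of converting ranked pairs into the `{prompt, chosen, rejected}` rows that DPO/ORPO trainers expect:

```python
def to_preference_rows(examples) -> list:
    """Convert (prompt, better_response, worse_response) tuples into
    the {prompt, chosen, rejected} row schema used by preference
    trainers such as DPO and ORPO."""
    return [
        {"prompt": p, "chosen": good, "rejected": bad}
        for p, good, bad in examples
    ]
```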

📝 Standard SFT

The classic instruction-tuning track. Convert any dataset into chat-template syntax effortlessly for rapid fine-tuning runs.
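The conversion step amounts to wrapping each instruction/response pair in the messages schema that `tokenizer.apply_chat_template` consumes. A minimal sketch:

```python
def to_chat_row(instruction: str, response: str, system: str = "") -> dict:
    """Wrap one instruction/response pair in the chat messages schema
    used during SFT preprocessing. The optional system prompt is
    prepended only when provided."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": instruction})
    messages.append({"role": "assistant", "content": response})
    return {"messages": messages}
```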

One Sentence. One Fine-Tuned Model.

A fine-tuning agent you talk to like a colleague. Describe what you want. It asks the right questions, formats your data, picks the appropriate technique, trains on your hardware, and packages it for deployment.

01

Interview

Locks in the method, model, data, hardware, and deploy target before writing a single line of code.

02

Data Strategy

Acquires and reformats your data into the exact schema the specific trainer (SFT, DPO, GRPO) requires.

03

Env & Math

The hardware scan blocks until the environment matches exactly, then calculates precise VRAM requirements.

04

Train

Generates the optimized Unsloth or MLX training script and executes the run natively.

05

Evaluate

Runs the fine-tuned adapter against the base model side-by-side so you can see the actual delta.

06

Export

Automatically merges and exports to GGUF, 16-bit, or HuggingFace Hub based on your chosen deploy target.