Unsloth,
Zero Friction.
The specialized fine-tuning skill for Gaslamp. Seamlessly train Reasoning (GRPO), Vision, and Text models with hardware-aware guardrails across NVIDIA GPUs and Apple Silicon.
Install the Skill
No more CUDA nightmares. The agent probes your hardware to install the exact Unsloth wheel, or automatically pivots to mlx-tune for native Apple Silicon training.
Agent calculates dynamic QLoRA vs LoRA VRAM requirements to guarantee the model fits your hardware.
End-to-end execution. Automatically merge adapters and export to GGUF or Ollama Modelfiles.
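The QLoRA-vs-LoRA fit check boils down to a weights-plus-overhead estimate. A minimal sketch of that kind of calculation, using common rules of thumb (2 bytes/param for 16-bit LoRA, ~0.5 bytes/param for 4-bit QLoRA, plus a fixed overhead we assume here for activations and adapter state); the agent's actual formula also accounts for sequence length and batch size:

```python
def estimate_vram_gb(params_billion: float, method: str = "qlora") -> float:
    """Rough VRAM estimate: base weights plus a fixed overhead.

    Rule of thumb only. LoRA keeps base weights in 16-bit (~2 bytes/param);
    QLoRA quantizes them to 4-bit (~0.5 bytes/param). Real usage also scales
    with sequence length, batch size, and optimizer state.
    """
    bytes_per_param = {"lora": 2.0, "qlora": 0.5}[method]
    weights_gb = params_billion * bytes_per_param  # 1e9 params x bytes / 1e9
    overhead_gb = 2.0  # activations, framework context, adapter weights (assumed)
    return round(weights_gb + overhead_gb, 1)

# A 7B model: ~16 GB with 16-bit LoRA vs ~5.5 GB with 4-bit QLoRA.
print(estimate_vram_gb(7, "lora"))   # 16.0
print(estimate_vram_gb(7, "qlora"))  # 5.5
```

This is why a 7B model that overflows a 16 GB consumer card under LoRA still fits comfortably under QLoRA.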
[Claude] Loading Gaslamp skill: unsloth-buddy
[unsloth-buddy] Probing environment...
[unsloth-buddy] OS: Darwin (macOS) | Arch: arm64
[unsloth-buddy] Detecting Apple Silicon processing unit...
[unsloth-buddy] Unsloth lacks native Mac training. Pivoting to mlx-tune framework...
[unsloth-buddy] Model mlx-community/Qwen2.5-7B-Instruct-4bit selected.
[unsloth-buddy] Writing MLX SFT Formatting Script for consultation summaries...
[unsloth-buddy] Launching MLX Trainer...
█
Expert Training Templates
Like all Gaslamp blueprints, unsloth-buddy is pre-programmed with expert design patterns, so your agent doesn't have to guess how to format complex datasets.
🧠 Reasoning (GRPO)
Natively train DeepSeek-R1 style models. The agent automatically builds the standard reward functions (format + correctness) and configures the `GRPOTrainer`.
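The format + correctness pair can be sketched as two plain Python functions. This sketch assumes string completions and a `<reasoning>…</reasoning><answer>…</answer>` schema (a common R1-style convention); the exact signature the trainer expects depends on your TRL/Unsloth version and whether the dataset is conversational:

```python
import re

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
SCHEMA_RE = re.compile(r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>", re.DOTALL)

def format_reward(completions, **kwargs):
    """Reward 1.0 when a completion follows the reasoning/answer schema."""
    return [1.0 if SCHEMA_RE.search(c) else 0.0 for c in completions]

def correctness_reward(completions, answers, **kwargs):
    """Reward 2.0 when the extracted <answer> matches the reference."""
    scores = []
    for completion, ref in zip(completions, answers):
        m = ANSWER_RE.search(completion)
        scores.append(2.0 if m and m.group(1).strip() == ref.strip() else 0.0)
    return scores

good = "<reasoning>2+2 is 4</reasoning><answer>4</answer>"
print(format_reward([good, "just 4"]))    # [1.0, 0.0]
print(correctness_reward([good], ["4"]))  # [2.0]
```

Weighting correctness higher than format (2.0 vs 1.0) nudges the policy to care about the answer, not just the wrapper.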
👁️ Vision & Multimodal
Handles the notoriously complex collators required for VLMs (like Qwen-VL or Llama 3.2 Vision) and automatically structures your image dataset correctly.
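The "structured correctly" part means reshaping each example into the typed-content chat schema most VLM chat templates expect. A minimal sketch, assuming a Qwen-VL-style message format where user content is a list of image and text parts (the collator itself is model-specific and handled by the skill):

```python
def to_vision_record(image_path: str, question: str, answer: str) -> dict:
    """Shape one (image, question, answer) example into the conversational
    schema VLM chat templates consume: content is a list of typed parts."""
    return {
        "messages": [
            {"role": "user", "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": answer},
            ]},
        ]
    }

record = to_vision_record("scan_001.png", "What does the scan show?", "No abnormality.")
print(record["messages"][0]["content"][0]["type"])  # image
```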
⚖️ Alignment (DPO/ORPO)
Train models on human preference data. The agent knows exactly how to load chosen/rejected data pairs to configure Unsloth's optimized preference trainers.
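Preference trainers expect each row as a prompt with one chosen and one rejected response. A sketch of that reshaping step, where the source column names (`question`, `good`, `bad`) are hypothetical stand-ins for whatever your raw annotations use:

```python
def to_preference_rows(raw: list[dict]) -> list[dict]:
    """Map raw annotation rows to the prompt/chosen/rejected schema
    that DPO/ORPO-style preference trainers consume."""
    return [
        {"prompt": r["question"], "chosen": r["good"], "rejected": r["bad"]}
        for r in raw
    ]

rows = to_preference_rows([
    {"question": "Summarize the visit.", "good": "Concise summary.", "bad": "Rambling notes."},
])
print(rows[0]["chosen"])  # Concise summary.
```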
📝 Standard SFT
The classic instruction-tuning track. Convert any dataset into chat-template syntax effortlessly for rapid fine-tuning runs.
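"Chat-template syntax" here means the special-token wrapper the base model was trained on. A sketch of the target shape using the ChatML convention (used by Qwen-style models); in a real run you would let the tokenizer's `apply_chat_template` render this rather than hand-building strings:

```python
def to_chatml(instruction: str, response: str,
              system: str = "You are a helpful assistant.") -> str:
    """Render one instruction/response pair as ChatML-style training text."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{instruction}<|im_end|>\n"
        f"<|im_start|>assistant\n{response}<|im_end|>\n"
    )

sample = to_chatml("Summarize the consultation.", "Patient reports mild symptoms.")
print(sample.count("<|im_start|>"))  # 3
```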
One Sentence. One Fine-Tuned Model.
A fine-tuning agent you talk to like a colleague. Describe what you want. It asks the right questions, formats your data, picks the appropriate technique, trains on your hardware, and packages it for deployment.
Interview
Locks in the method, model, data, hardware, and deploy target before writing a single line of code.
Data Strategy
Acquires and reformats your data into the exact schema the specific trainer (SFT, DPO, GRPO) requires.
Env & Math
The hardware scan blocks until the environment is a verified match, then calculates exact VRAM requirements.
Train
Generates the optimized Unsloth or MLX training script and executes the run natively.
Evaluate
Runs the fine-tuned adapter against the base model side-by-side so you can see the actual delta.
Export
Automatically merges and exports to GGUF, 16-bit, or HuggingFace Hub based on your chosen deploy target.
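For the Ollama target, packaging amounts to emitting a Modelfile that points at the exported GGUF. A minimal sketch; `FROM` and `PARAMETER` are standard Modelfile directives, though a production export would typically also set a `TEMPLATE` and stop tokens matching the chat format the model was tuned on:

```python
def make_modelfile(gguf_path: str, temperature: float = 0.7) -> str:
    """Emit a minimal Ollama Modelfile pointing at an exported GGUF."""
    return (
        f"FROM {gguf_path}\n"
        f"PARAMETER temperature {temperature}\n"
    )

print(make_modelfile("./model-q4_k_m.gguf"))
```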