kernel-triton-writing

Name: kernel-triton-writing
Author: NVIDIA

ONLY for OpenAI Triton (@triton.jit) kernel development. NEVER use for CUDA C++ kernels, TileIR, or profiling tools (ncu, nsys). The user's request must involve Triton explicitly. Covers Triton-specific patterns: fused elementwise, reductions (softmax, LayerNorm, RMSNorm), tiled GEMM with triton.autotune, and flash attention. Workflow: design, write, verify (with fast-path for explicit requests).

View Source framework-internals

maintainer

NVIDIA

Updated 4/8/2026

Stars

13335

Forks

2271

quick start

Installation and usage

Installation

$ install --globalskills.sh

Usage

Once installed, you can use this skill by running the following command in your terminal:

skills use kernel-triton-writing