wavecap-evaluate
Evaluate WaveCap audio analysis and transcription accuracy. Use when the user wants to run regression tests, compare transcriptions against ground truth, calculate WER/CER metrics, or assess overall system quality.
Create multilingual glossaries for educational content, maintain terminology consistency across translations, build translation memory databases, and define preferred terms by domain and region. Use when managing translation terminology. Activates on "glossary", "terminology management", or "translation memory".
ElevenLabs Speech-to-Text transcription workflows with Scribe v1 supporting 99 languages, speaker diarization, and Vercel AI SDK integration. Use when implementing audio transcription, building STT features, integrating speech-to-text, setting up Vercel AI SDK with ElevenLabs, or when user mentions transcription, STT, Scribe v1, audio-to-text, speaker diarization, or multi-language transcription.
Split large texts into meaningful, AI-optimized chunks while preserving semantic coherence and document structure. Use when processing large documents for AI training, RAG systems, or when you need to break down content while maintaining context and relationships.
Expert in Natural Language Processing, designing systems for text classification, NER, translation, and LLM integration using Hugging Face, spaCy, and LangChain. Use when building NLP pipelines, text analysis, or LLM-powered features. Triggers include "NLP", "text classification", "NER", "named entity", "sentiment analysis", "spaCy", "Hugging Face", "transformers".
Medical document OCR processing for extracting structured clinical data from medical images (prescriptions, lab results, clinical notes). Uses Google Cloud Vision for text extraction and medical NLP for entity recognition. Deploy when processing healthcare documents, extracting patient data, or converting medical images to structured formats.
Development skill for CaseMark's Court Recording Transcriber - an AI-powered application for transcribing court recordings with speaker identification, synchronized playback, search, and legal document exports. Built with Next.js 16, PostgreSQL, Drizzle ORM, wavesurfer.js, and Case.dev APIs. Use this skill when: (1) Working on or extending the court-record-transcriber codebase, (2) Integrating with Case.dev transcription APIs, (3) Working with audio playback/waveforms, (4) Building transcript export features, or (5) Adding speaker identification logic.
Complete stenography guide for court reporting and legal transcription. Use when building steno projects from basic to professional level, learning court reporting workflows, optimizing legal dictionaries, setting up Plover for courtroom use, creating specialized legal briefs, or developing speed for legal documentation.
Process walk recordings into usable research material. Covers the transcription workflow, insight extraction, and integration with the pipeline. Use after WALK stage recording.
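Use AI to understand and analyze multimedia content (images, video, audio). Use when the user wants to understand an image, analyze a video, transcribe audio to text, ask questions about a video, understand media, describe an image, or asks what is in a video, image, or audio file.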
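A skill that detects and removes the unnatural feel of AI-generated Japanese (the "AI smell") and "deodorizes" it into natural, human-sounding prose. Activates on requests such as "deodorize this text", "remove the AI smell", "make this sound human-written", "fix this into natural Japanese", or "fix the translationese". Supports both prevention through prompt engineering and treatment through post-editing.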
Use when user requests translating Qt project localization files (TS files), automating translation workflows, or setting up multilingual support for Qt applications. This skill uses parallel processing with ThreadPoolExecutor to translate TS (Translation Source) files efficiently.
Transform textbook content based on the 10-dimension user profile to provide personalized learning experiences. Agent: AIEngineer
This skill is ALWAYS ACTIVE once installed. Automatically applies classical Chinese (文言文) writing style to all responses. Uses concise, elegant expressions while keeping technical terms intact. No trigger phrase needed - activates on every response.
Configure WaveCap LLM-based transcription correction. Use when the user wants to enable/disable LLM correction, change models, tune prompts, or optimize correction quality on Apple Silicon.
Document ingestion pipeline - docs to chunks to metadata for RAG
Process audio, video, and media on cloud GPUs. Transcribe with Whisper, clone voices, generate videos, upscale images, and run batch media processing. All results sync back to your Mac.
Configure WaveCap hallucination detection and prevention. Use when Whisper outputs gibberish, repeated phrases, or phantom text on silent audio.
Generate a Sora video from a text prompt via an Azure OpenAI endpoint, then download the resulting .mp4 locally. Use when the user asks to generate a Sora video/video.mp4 from a prompt or wants the generated video saved to disk.
Analyze videos using Google's Gemini API - describe content, answer questions, transcribe audio with visual descriptions, reference timestamps, clip videos, and process YouTube URLs. Supports 9 video formats, multiple models (Gemini 2.5/2.0), and context windows up to 2M tokens (6 hours of video).
Generates images and videos using ComfyUI node-based workflows. Use when creating AI-generated assets, text-to-image, text-to-video, image-to-video, running Stable Diffusion, Flux, HunyuanVideo, or when user mentions "comfy," "ComfyUI," "generate image," "generate video," "AI art," "diffusion model," or needs visual content for courses/projects.
Fine-tuning Speech-to-Text models like Whisper using Unsloth's optimized LoRA pipeline. Triggers: stt, whisper, transcription, audio fine-tuning, speech-to-text, audio normalization.
Combine multiple images using Gemini 2.5 Flash (Nano Banana) via OpenRouter. Use when merging 2-8 images with AI-guided composition.