segment-anything-model

davila7

Updated 11 days ago

292 views

18,478

1,685



Meta, Multimodal, Image Segmentation, Computer Vision, SAM, Zero-Shot

About

The Segment Anything Model skill performs zero-shot image segmentation: developers can isolate objects with prompts such as points or bounding boxes, or generate masks for all objects automatically. It is well suited to building annotation tools, generating training data, or processing images from new domains without task-specific training. Core capabilities include interactive prompt handling and strong out-of-the-box performance across a range of computer vision pipelines.
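The two modes described above (prompted segmentation and automatic mask generation) can be sketched roughly as follows. This is a minimal sketch, not the skill's own code: it assumes the original `segment_anything` Python package from Meta plus `numpy` are installed and that a model checkpoint has already been downloaded. The helper names `build_prompt`, `segment_with_prompts`, and `segment_everything`, and the `checkpoint_path` argument, are hypothetical and exist only for illustration.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, bool]          # (x, y, is_foreground)
Box = Tuple[float, float, float, float]    # (x_min, y_min, x_max, y_max)


def build_prompt(points: Sequence[Point] = (), box: Optional[Box] = None) -> dict:
    """Hypothetical helper: validate prompts and lay them out as plain lists.

    Point labels follow the SAM convention: 1 = foreground, 0 = background.
    """
    if not points and box is None:
        raise ValueError("provide at least one point or a box")
    if box is not None:
        x0, y0, x1, y1 = box
        if x1 <= x0 or y1 <= y0:
            raise ValueError("box must be (x_min, y_min, x_max, y_max)")
    coords = [[float(x), float(y)] for x, y, _ in points]
    labels = [1 if fg else 0 for _, _, fg in points]
    return {
        "point_coords": coords or None,
        "point_labels": labels or None,
        "box": [float(v) for v in box] if box is not None else None,
    }


def segment_with_prompts(image_rgb, checkpoint_path, points=(), box=None,
                         model_type="vit_h"):
    """Return the highest-scoring mask for the given point/box prompts.

    image_rgb is assumed to be an HxWx3 uint8 RGB array.
    """
    # Imported lazily so the prompt helper works without the heavy dependencies.
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    prompt = build_prompt(points, box)
    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)

    def to_array(value):
        return None if value is None else np.array(value)

    masks, scores, _ = predictor.predict(
        point_coords=to_array(prompt["point_coords"]),
        point_labels=to_array(prompt["point_labels"]),
        box=to_array(prompt["box"]),
        multimask_output=True,   # ask for several candidate masks
    )
    return masks[int(np.argmax(scores))]


def segment_everything(image_rgb, checkpoint_path, model_type="vit_h"):
    """Generate masks for all objects; each result is a dict that includes
    a 'segmentation' mask and a 'bbox' entry."""
    from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    return SamAutomaticMaskGenerator(sam).generate(image_rgb)
```

A call such as `segment_with_prompts(image, "path/to/checkpoint.pth", points=[(320, 240, True)])` would then return a boolean mask for the object under that pixel; the checkpoint path here is a placeholder. Keeping prompt validation in a separate pure-Python helper makes it easy to check inputs from an annotation UI before loading the model.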

Quick install

Claude Code

GitHub Repository

davila7/claude-code-templates

Path: cli-tool/components/skills/ai-research/multimodal-segment-anything

anthropic, anthropic-claude, claude, claude-code

Related skills

blip-2-vision-language

Design

BLIP-2 is a vision-language framework that connects a frozen image encoder with a large language model for multimodal tasks. Use it for zero-shot image captioning, visual question answering, or image-text retrieval without task-specific fine-tuning. It's ideal for developers needing to add state-of-the-art visual understanding to LLM-based applications.


stable-diffusion-image-generation

audiocraft-audio-generation

whisper

Other

Whisper is OpenAI's multilingual speech recognition model for transcription and translation across 99 languages. It handles tasks like speech-to-text, podcast transcription, and processing noisy or multilingual audio. Developers should use it for robust, production-ready automatic speech recognition (ASR).
