histolab
정보
histolab는 전체 슬라이드 이미지(WSI)에서 타일을 추출하고 전처리하기 위한 경량 Python 라이브러리로, 기본적인 슬라이드 처리 및 데이터셋 준비에 이상적입니다. 조직 감지, 타일 추출, H&E 염색 정규화와 같은 핵심 작업을 처리하여 간단한 분석 파이프라인에 적합합니다. 공간 단백체학이나 딥러닝과 같은 고급 워크플로우의 경우 개발자는 대신 pathml 라이브러리를 고려해야 합니다.
빠른 설치
Claude Code
추천npx skills add K-Dense-AI/claude-scientific-skills -a claude-code/plugin add https://github.com/K-Dense-AI/claude-scientific-skillsgit clone https://github.com/K-Dense-AI/claude-scientific-skills.git ~/.claude/skills/histolabClaude Code에서 이 명령을 복사하여 붙여넣어 스킬을 설치하세요
문서
Histolab
Overview
Histolab is a Python library for processing whole slide images (WSI) in digital pathology. It automates tissue detection, extracts informative tiles from gigapixel images, and prepares datasets for deep learning pipelines. The library handles multiple WSI formats, implements sophisticated tissue segmentation, and provides flexible tile extraction strategies.
Installation
Install OpenSlide system libraries first (OpenSlide download), then install histolab:
uv pip install histolab
For built-in TCGA sample slides via histolab.data, also install pooch:
uv pip install pooch
Histolab 0.7.0 (latest stable) supports Python 3.8–3.11 on Linux and macOS. Windows is not supported as of 0.7.0.
Quick Start
Basic workflow for extracting tiles from a whole slide image:
from histolab.slide import Slide
from histolab.tiler import RandomTiler
# Load slide
slide = Slide("slide.svs", processed_path="output/")
# Configure tiler
tiler = RandomTiler(
tile_size=(512, 512),
n_tiles=100,
level=0,
seed=42
)
# Preview tile locations
tiler.locate_tiles(slide, n_tiles=20)
# Extract tiles
tiler.extract(slide)
Core Capabilities
1. Slide Management
Load, inspect, and work with whole slide images in various formats.
Common operations:
- Loading WSI files (SVS, TIFF, NDPI, etc.)
- Accessing slide metadata (dimensions, magnification, properties)
- Generating thumbnails for visualization
- Working with pyramidal image structures
- Extracting regions at specific coordinates
Key classes: Slide
Reference: references/slide_management.md contains comprehensive documentation on:
- Slide initialization and configuration
- Built-in sample datasets (prostate, ovarian, breast, heart, kidney tissues)
- Accessing slide properties and metadata
- Thumbnail generation and visualization
- Working with pyramid levels
- Multi-slide processing workflows
Example workflow:
from histolab.slide import Slide
from histolab.data import prostate_tissue
# Load sample data
prostate_svs, prostate_path = prostate_tissue()
# Initialize slide
slide = Slide(prostate_path, processed_path="output/")
# Inspect properties
print(f"Dimensions: {slide.dimensions}")
print(f"Levels: {slide.levels}")
print(f"Magnification: {slide.properties.get('openslide.objective-power')}")
# Save thumbnail to processed_path
from pathlib import Path
Path(slide.processed_path).mkdir(parents=True, exist_ok=True)
slide.thumbnail.save(Path(slide.processed_path) / f"{slide.name}_thumbnail.png")
2. Tissue Detection and Masks
Automatically identify tissue regions and filter background/artifacts.
Common operations:
- Creating binary tissue masks
- Detecting largest tissue region
- Excluding background and artifacts
- Custom tissue segmentation
- Removing pen annotations
Key classes: TissueMask, BiggestTissueBoxMask, BinaryMask
Reference: references/tissue_masks.md contains comprehensive documentation on:
- TissueMask: Segments all tissue regions using automated filters
- BiggestTissueBoxMask: Returns bounding box of largest tissue region (default)
- BinaryMask: Base class for custom mask implementations
- Visualizing masks with
locate_mask() - Creating custom rectangular and annotation-exclusion masks
- Mask integration with tile extraction
- Best practices and troubleshooting
Example workflow:
from histolab.masks import TissueMask, BiggestTissueBoxMask
# Create tissue mask for all tissue regions
tissue_mask = TissueMask()
# Visualize mask on slide
slide.locate_mask(tissue_mask)
# Get mask array
mask_array = tissue_mask(slide)
# Use largest tissue region (default for most extractors)
biggest_mask = BiggestTissueBoxMask()
When to use each mask:
TissueMask: Multiple tissue sections, comprehensive analysisBiggestTissueBoxMask: Single main tissue section, exclude artifacts (default)- Custom
BinaryMask: Specific ROI, exclude annotations, custom segmentation
3. Tile Extraction
Extract smaller regions from large WSI using different strategies.
Three extraction strategies:
RandomTiler: Extract fixed number of randomly positioned tiles
- Best for: Sampling diverse regions, exploratory analysis, training data
- Key parameters:
n_tiles,seedfor reproducibility
GridTiler: Systematically extract tiles across tissue in grid pattern
- Best for: Complete coverage, spatial analysis, reconstruction
- Key parameters:
pixel_overlapfor sliding windows
ScoreTiler: Extract top-ranked tiles based on scoring functions
- Best for: Most informative regions, quality-driven selection
- Key parameters:
scorer(NucleiScorer, CellularityScorer, custom)
Common parameters:
tile_size: Tile dimensions (e.g., (512, 512))level: Pyramid level for extraction (0 = highest resolution)check_tissue: Filter tiles by tissue contenttissue_percent: Minimum tissue coverage (default 80%)extraction_mask: Mask defining extraction region
Reference: references/tile_extraction.md contains comprehensive documentation on:
- Detailed explanation of each tiler strategy
- Available scorers (NucleiScorer, CellularityScorer, custom)
- Tile preview with
locate_tiles() - Extraction workflows and reporting
- Advanced patterns (multi-level, hierarchical extraction)
- Performance optimization and troubleshooting
Example workflows:
from histolab.tiler import RandomTiler, GridTiler, ScoreTiler
from histolab.scorer import NucleiScorer
# Random sampling (fast, diverse)
random_tiler = RandomTiler(
tile_size=(512, 512),
n_tiles=100,
level=0,
seed=42,
check_tissue=True,
tissue_percent=80.0
)
random_tiler.extract(slide)
# Grid coverage (comprehensive)
grid_tiler = GridTiler(
tile_size=(512, 512),
level=0,
pixel_overlap=0,
check_tissue=True
)
grid_tiler.extract(slide)
# Score-based selection (most informative)
score_tiler = ScoreTiler(
tile_size=(512, 512),
n_tiles=50,
scorer=NucleiScorer(),
level=0
)
score_tiler.extract(slide, report_path="tiles_report.csv")
Always preview before extracting:
# Preview tile locations on thumbnail
tiler.locate_tiles(slide, n_tiles=20)
4. Filters and Preprocessing
Apply image processing filters for tissue detection, quality control, and preprocessing.
Filter categories:
Image Filters: Color space conversions, thresholding, contrast enhancement
RgbToGrayscale,RgbToHsv,RgbToHedOtsuThreshold,AdaptiveThresholdStretchContrast,HistogramEqualization
Morphological Filters: Structural operations on binary images
BinaryDilation,BinaryErosionBinaryOpening,BinaryClosingRemoveSmallObjects,RemoveSmallHoles
Composition: Chain multiple filters together
Compose: Create filter pipelines
Reference: references/filters_preprocessing.md contains comprehensive documentation on:
- Detailed explanation of each filter type
- Filter composition and chaining
- Common preprocessing pipelines (tissue detection, pen removal, nuclei enhancement)
- Applying filters to tiles
- Custom mask filters
- Quality control filters (blur detection, tissue coverage)
- Best practices and troubleshooting
Example workflows:
from histolab.filters.compositions import Compose
from histolab.filters.image_filters import RgbToGrayscale, OtsuThreshold
from histolab.filters.morphological_filters import (
BinaryDilation, RemoveSmallHoles, RemoveSmallObjects
)
# Standard tissue detection pipeline
tissue_detection = Compose([
RgbToGrayscale(),
OtsuThreshold(),
BinaryDilation(disk_size=5),
RemoveSmallHoles(area_threshold=1000),
RemoveSmallObjects(area_threshold=500)
])
# Use with custom mask
from histolab.masks import TissueMask
custom_mask = TissueMask(filters=tissue_detection)
# Apply filters to tile
from histolab.tile import Tile
filtered_tile = tile.apply_filters(tissue_detection)
5. Stain Normalization
Standardize staining appearance across slides for deep learning (added in histolab 0.6.0).
Key classes: MacenkoStainNormalizer, ReinhardStainNormalizer
from histolab.stain_normalizer import MacenkoStainNormalizer, ReinhardStainNormalizer
from PIL import Image
target = Image.open("reference_stain.png") # Style reference slide/tile
source = Image.open("slide_to_normalize.png")
normalizer = MacenkoStainNormalizer()
normalizer.fit(target)
normalized = normalizer.transform(source)
normalized.save("normalized.png")
Use ReinhardStainNormalizer() for Reinhard color transfer. Fit on a representative target image, then transform source tiles or thumbnails. See references/filters_preprocessing.md for filter-based alternatives.
6. Visualization
Visualize slides, masks, tile locations, and extraction quality.
Common visualization tasks:
- Displaying slide thumbnails
- Visualizing tissue masks
- Previewing tile locations
- Assessing tile quality
- Creating reports and figures
Reference: references/visualization.md contains comprehensive documentation on:
- Slide thumbnail display and saving
- Mask visualization with
locate_mask() - Tile location preview with
locate_tiles() - Displaying extracted tiles and mosaics
- Quality assessment (score distributions, top vs bottom tiles)
- Multi-slide visualization
- Filter effect visualization
- Exporting high-resolution figures and PDF reports
- Interactive visualization in Jupyter notebooks
Example workflows:
import matplotlib.pyplot as plt
from histolab.masks import TissueMask
# Display slide thumbnail
plt.figure(figsize=(10, 10))
plt.imshow(slide.thumbnail)
plt.title(f"Slide: {slide.name}")
plt.axis('off')
plt.show()
# Visualize tissue mask
tissue_mask = TissueMask()
slide.locate_mask(tissue_mask)
# Preview tile locations
tiler = RandomTiler(tile_size=(512, 512), n_tiles=50)
tiler.locate_tiles(slide, n_tiles=20)
# Display extracted tiles in grid
from pathlib import Path
from PIL import Image
tile_paths = list(Path("output/tiles/").glob("*.png"))[:16]
fig, axes = plt.subplots(4, 4, figsize=(12, 12))
axes = axes.ravel()
for idx, tile_path in enumerate(tile_paths):
tile_img = Image.open(tile_path)
axes[idx].imshow(tile_img)
axes[idx].set_title(tile_path.stem, fontsize=8)
axes[idx].axis('off')
plt.tight_layout()
plt.show()
Typical Workflows
Workflow 1: Exploratory Tile Extraction
Quick sampling of diverse tissue regions for initial analysis.
from histolab.slide import Slide
from histolab.tiler import RandomTiler
from pathlib import Path
import logging
# Enable logging for progress tracking
logging.basicConfig(level=logging.INFO)
# Load slide
slide = Slide("slide.svs", processed_path="output/random_tiles/")
# Inspect slide
print(f"Dimensions: {slide.dimensions}")
print(f"Levels: {slide.levels}")
Path(slide.processed_path).mkdir(parents=True, exist_ok=True)
slide.thumbnail.save(Path(slide.processed_path) / f"{slide.name}_thumbnail.png")
# Configure random tiler
random_tiler = RandomTiler(
tile_size=(512, 512),
n_tiles=100,
level=0,
seed=42,
check_tissue=True,
tissue_percent=80.0
)
# Preview locations
random_tiler.locate_tiles(slide, n_tiles=20)
# Extract tiles
random_tiler.extract(slide)
Workflow 2: Comprehensive Grid Extraction
Complete tissue coverage for whole-slide analysis.
from histolab.slide import Slide
from histolab.tiler import GridTiler
from histolab.masks import TissueMask
# Load slide
slide = Slide("slide.svs", processed_path="output/grid_tiles/")
# Use TissueMask for all tissue sections
tissue_mask = TissueMask()
slide.locate_mask(tissue_mask)
# Configure grid tiler
grid_tiler = GridTiler(
tile_size=(512, 512),
level=1, # Use level 1 for faster extraction
pixel_overlap=0,
check_tissue=True,
tissue_percent=70.0
)
# Preview grid
grid_tiler.locate_tiles(slide)
# Extract all tiles
grid_tiler.extract(slide, extraction_mask=tissue_mask)
Workflow 3: Quality-Driven Tile Selection
Extract most informative tiles based on nuclei density.
from histolab.slide import Slide
from histolab.tiler import ScoreTiler
from histolab.scorer import NucleiScorer
import pandas as pd
import matplotlib.pyplot as plt
# Load slide
slide = Slide("slide.svs", processed_path="output/scored_tiles/")
# Configure score tiler
score_tiler = ScoreTiler(
tile_size=(512, 512),
n_tiles=50,
level=0,
scorer=NucleiScorer(),
check_tissue=True
)
# Preview top tiles
score_tiler.locate_tiles(slide, n_tiles=15)
# Extract with report
score_tiler.extract(slide, report_path="tiles_report.csv")
# Analyze scores
report_df = pd.read_csv("tiles_report.csv")
plt.hist(report_df['score'], bins=20, edgecolor='black')
plt.xlabel('Tile Score')
plt.ylabel('Frequency')
plt.title('Distribution of Tile Scores')
plt.show()
Workflow 4: Multi-Slide Processing Pipeline
Process entire slide collection with consistent parameters.
from pathlib import Path
from histolab.slide import Slide
from histolab.tiler import RandomTiler
import logging
logging.basicConfig(level=logging.INFO)
# Configure tiler once
tiler = RandomTiler(
tile_size=(512, 512),
n_tiles=50,
level=0,
seed=42,
check_tissue=True
)
# Process all slides
slide_dir = Path("slides/")
output_base = Path("output/")
for slide_path in slide_dir.glob("*.svs"):
print(f"\nProcessing: {slide_path.name}")
# Create slide-specific output directory
output_dir = output_base / slide_path.stem
output_dir.mkdir(parents=True, exist_ok=True)
# Load and process slide
slide = Slide(slide_path, processed_path=output_dir)
# Save thumbnail for review
Path(slide.processed_path).mkdir(parents=True, exist_ok=True)
slide.thumbnail.save(Path(slide.processed_path) / f"{slide.name}_thumbnail.png")
# Extract tiles
tiler.extract(slide)
print(f"Completed: {slide_path.name}")
Workflow 5: Custom Tissue Detection and Filtering
Handle slides with artifacts, annotations, or unusual staining.
from histolab.slide import Slide
from histolab.masks import TissueMask
from histolab.tiler import RandomTiler
from histolab.filters.compositions import Compose
from histolab.filters.image_filters import RgbToGrayscale, OtsuThreshold
from histolab.filters.morphological_filters import (
BinaryDilation, RemoveSmallObjects, RemoveSmallHoles
)
# Define custom filter pipeline for aggressive artifact removal
aggressive_filters = Compose([
RgbToGrayscale(),
OtsuThreshold(),
BinaryDilation(disk_size=10),
RemoveSmallHoles(area_threshold=5000),
RemoveSmallObjects(area_threshold=3000) # Remove larger artifacts
])
# Create custom mask
custom_mask = TissueMask(filters=aggressive_filters)
# Load slide and visualize mask
slide = Slide("slide.svs", processed_path="output/")
slide.locate_mask(custom_mask)
# Extract with custom mask
tiler = RandomTiler(tile_size=(512, 512), n_tiles=100)
tiler.extract(slide, extraction_mask=custom_mask)
Best Practices
Slide Loading and Inspection
- Always inspect slide properties before processing
- Save thumbnails with
slide.thumbnail.save()for quick visual review - Check pyramid levels and dimensions
- Verify tissue is present using thumbnails
Tissue Detection
- Preview masks with
locate_mask()before extraction - Use
TissueMaskfor multiple sections,BiggestTissueBoxMaskfor single sections - Customize filters for specific stains (H&E vs IHC)
- Handle pen annotations with custom masks
- Test masks on diverse slides
Tile Extraction
- Always preview with
locate_tiles()before extracting - Choose appropriate tiler:
- RandomTiler: Sampling and exploration
- GridTiler: Complete coverage
- ScoreTiler: Quality-driven selection
- Set appropriate
tissue_percentthreshold (70-90% typical) - Use seeds for reproducibility in RandomTiler
- Extract at appropriate pyramid level for analysis resolution
- Enable logging for large datasets
Performance
- Extract at lower levels (1, 2) for faster processing
- Use
BiggestTissueBoxMaskoverTissueMaskwhen appropriate - Adjust
tissue_percentto reduce invalid tile attempts - Limit
n_tilesfor initial exploration - Use
pixel_overlap=0for non-overlapping grids
Quality Control
- Validate tile quality (check for blur, artifacts, focus)
- Review score distributions for ScoreTiler
- Inspect top and bottom scoring tiles
- Monitor tissue coverage statistics
- Filter extracted tiles by additional quality metrics if needed
Common Use Cases
Training Deep Learning Models
- Extract balanced datasets using RandomTiler across multiple slides
- Use ScoreTiler with NucleiScorer to focus on cell-rich regions
- Extract at consistent resolution (level 0 or level 1)
- Generate CSV reports for tracking tile metadata
Whole Slide Analysis
- Use GridTiler for complete tissue coverage
- Extract at multiple pyramid levels for hierarchical analysis
- Maintain spatial relationships with grid positions
- Use
pixel_overlapfor sliding window approaches
Tissue Characterization
- Sample diverse regions with RandomTiler
- Quantify tissue coverage with masks
- Extract stain-specific information with HED decomposition
- Compare tissue patterns across slides
Quality Assessment
- Identify optimal focus regions with ScoreTiler
- Detect artifacts using custom masks and filters
- Assess staining quality across slide collection
- Flag problematic slides for manual review
Dataset Curation
- Use ScoreTiler to prioritize informative tiles
- Filter tiles by tissue percentage
- Generate reports with tile scores and metadata
- Create stratified datasets across slides and tissue types
Troubleshooting
No tiles extracted
- Lower
tissue_percentthreshold - Verify slide contains tissue (check thumbnail)
- Ensure extraction_mask captures tissue regions
- Check tile_size is appropriate for slide resolution
Many background tiles
- Enable
check_tissue=True - Increase
tissue_percentthreshold - Use appropriate mask (TissueMask vs BiggestTissueBoxMask)
- Customize mask filters to better detect tissue
Extraction very slow
- Extract at lower pyramid level (level=1 or 2)
- Reduce
n_tilesfor RandomTiler/ScoreTiler - Use RandomTiler instead of GridTiler for sampling
- Use BiggestTissueBoxMask instead of TissueMask
Tiles have artifacts
- Implement custom annotation-exclusion masks
- Adjust filter parameters for artifact removal
- Increase small object removal threshold
- Apply post-extraction quality filtering
Inconsistent results across slides
- Use same seed for RandomTiler
- Normalize staining with
MacenkoStainNormalizerorReinhardStainNormalizer - Adjust
tissue_percentper staining quality - Implement slide-specific mask customization
Resources
This skill includes detailed reference documentation in the references/ directory:
references/slide_management.md
Comprehensive guide to loading, inspecting, and working with whole slide images:
- Slide initialization and configuration
- Built-in sample datasets
- Slide properties and metadata
- Thumbnail generation and visualization
- Working with pyramid levels
- Multi-slide processing workflows
- Best practices and common patterns
references/tissue_masks.md
Complete documentation on tissue detection and masking:
- TissueMask, BiggestTissueBoxMask, BinaryMask classes
- How tissue detection filters work
- Customizing masks with filter chains
- Visualizing masks
- Creating custom rectangular and annotation-exclusion masks
- Integration with tile extraction
- Best practices and troubleshooting
references/tile_extraction.md
Detailed explanation of tile extraction strategies:
- RandomTiler, GridTiler, ScoreTiler comparison
- Available scorers (NucleiScorer, CellularityScorer, custom)
- Common and strategy-specific parameters
- Tile preview with locate_tiles()
- Extraction workflows and CSV reporting
- Advanced patterns (multi-level, hierarchical)
- Performance optimization
- Troubleshooting common issues
references/filters_preprocessing.md
Complete filter reference and preprocessing guide:
- Image filters (color conversion, thresholding, contrast)
- Morphological filters (dilation, erosion, opening, closing)
- Filter composition and chaining
- Built-in stain normalization (Macenko, Reinhard) and filter-based alternatives
- Common preprocessing pipelines
- Applying filters to tiles
- Custom mask filters
- Quality control filters
- Best practices and troubleshooting
references/visualization.md
Comprehensive visualization guide:
- Slide thumbnail display and saving
- Mask visualization techniques
- Tile location preview
- Displaying extracted tiles and creating mosaics
- Quality assessment visualizations
- Multi-slide comparison
- Filter effect visualization
- Exporting high-resolution figures and PDFs
- Interactive visualization in Jupyter notebooks
Usage pattern: Reference files contain in-depth information to support workflows described in this main skill document. Load specific reference files as needed for detailed implementation guidance, troubleshooting, or advanced features.
GitHub 저장소
연관 스킬
executing-plans
디자인executing-plans 스킬은 검토 체크포인트가 포함된 통제된 배치로 실행할 완전한 구현 계획이 있을 때 사용합니다. 이 스킬은 계획을 불러와 비판적으로 검토한 후, 소규모 배치(기본값 3개 작업)로 작업을 실행하면서 각 배치 사이에 진행 상황을 아키텍트 검토를 위해 보고합니다. 이를 통해 내재된 품질 관리 체크포인트를 갖춘 체계적인 구현이 보장됩니다.
requesting-code-review
디자인이 스킬은 코드 변경 사항을 요구 사항에 따라 분석하기 위해 코드 리뷰어 하위 에이전트를 호출합니다. 작업 완료 후, 주요 기능 구현 후, 또는 메인 브랜치에 병합하기 전에 사용해야 합니다. 이 리뷰는 현재 구현체와 원래 계획을 비교하여 문제를 조기에 발견하는 데 도움이 됩니다.
connect-mcp-server
디자인이 스킬은 개발자들이 HTTP, stdio 또는 SSE 전송 방식을 통해 MCP 서버를 Claude Code에 연결하는 포괄적인 가이드를 제공합니다. GitHub, Notion 및 사용자 정의 API와 같은 외부 서비스를 통합하기 위한 설치, 구성, 인증 및 보안을 다룹니다. MCP 통합 설정, 외부 도구 구성 또는 Claude의 모델 컨텍스트 프로토콜 작업 시 활용하세요.
web-cli-teleport
디자인이 스킬은 작업 분석을 기반으로 개발자가 Claude Code 웹 인터페이스와 CLI 인터페이스 중 선택할 수 있도록 돕고, 두 환경 간 원활한 세션 텔레포트를 가능하게 합니다. 웹, CLI 또는 모바일 환경 전환 시 세션 상태와 컨텍스트를 관리하여 워크플로를 최적화합니다. 다양한 단계에서 서로 다른 도구가 필요한 복잡한 프로젝트에 사용하세요.
