Machine Learning · 5 December 2024 · 11 min read

SAM2 for Remote Sensing: From Zero-Shot to Production Workflows

Meta's Segment Anything Model 2 is transforming geospatial image analysis. We explore how Geo-SAM2 and samgeo are bringing interactive segmentation to satellite imagery workflows.

SAM2 · Segmentation · Remote Sensing · Computer Vision
Aerial view of agricultural fields showing geometric patterns (photo: Tom Fisk on Unsplash)

In July 2024, Meta released Segment Anything Model 2 (SAM2), and the geospatial community immediately recognized its potential. While the original SAM was revolutionary for interactive segmentation, SAM2 adds video understanding, improved accuracy, and crucially, better performance on the challenging domain of remote sensing imagery.

From SAM to SAM2: What Changed

The original Segment Anything Model, released in 2023, demonstrated that a single model could segment virtually any object in any image with simple point or box prompts. For remote sensing, this was promising but limited—SAM was trained on natural images, not the peculiarities of satellite and aerial photography.

SAM2 brings three major improvements: enhanced zero-shot generalization (better performance on unfamiliar domains without fine-tuning), video segmentation with memory (tracking objects across frames), and the Hiera backbone architecture that provides multi-scale, high-resolution features.

SAM2 uses a transformer architecture with streaming memory. This allows it to process video frames one at a time while storing information about segmented objects, enabling temporal consistency that's crucial for change detection workflows.
Meta AI Research

Geo-SAM2: Bringing SAM2 to QGIS

The Geo-SAM2 QGIS plugin represents the state of the art for interactive remote sensing segmentation. Built within the geospatial open-source community, it decouples the computationally intensive image encoding from lightweight prompt-based inference.

Geo-SAM2 Key Features

  • Multi-scale feature support — Leverages both image_embed and high_res_feats for detailed mask generation
  • Large image handling — Automatically splits large rasters into 1024×1024 patches with edge-adaptive cropping
  • Flexible spectral input — Supports 1–3 bands (grayscale, RGB, spectral indices, SAR)
  • CRS integration — Fully integrated with QGIS coordinate reference systems
  • CPU-viable inference — Prompt-based inference runs on modest hardware after encoding
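"Edge-adaptive cropping" just means the last patch in each row and column is shifted inward so it still covers a full 1024×1024 window (overlapping its neighbor) rather than a narrow sliver. A minimal sketch of that tiling logic, not the plugin's actual code, assuming the raster is at least one patch wide and tall:

```python
def tile_windows(width, height, patch=1024):
    """Yield (x, y) upper-left corners of patch-by-patch windows covering
    the raster. Edge windows are shifted back inside the raster so every
    window is full-size, at the cost of overlapping its neighbor."""
    xs = list(range(0, max(width - patch, 0) + 1, patch))
    if xs[-1] + patch < width:
        xs.append(width - patch)  # edge-adaptive: slide the last column back
    ys = list(range(0, max(height - patch, 0) + 1, patch))
    if ys[-1] + patch < height:
        ys.append(height - patch)  # same for the last row
    return [(x, y) for y in ys for x in xs]

# A 2500x1500 raster needs a 3x2 grid; the edge windows overlap.
windows = tile_windows(2500, 1500)
```

Each window is then encoded independently, and the per-patch masks are mosaicked back together in raster coordinates.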

The workflow is intuitive: load a raster, encode it (this takes time but only needs to happen once), then interactively click points to segment buildings, fields, water bodies, or any other features. Each click takes milliseconds.

samgeo: Python-First Approach

For those who prefer code to GUI, the samgeo Python package provides a Pythonic interface to SAM for geospatial data. Created by Dr. Qiusheng Wu, it's become the standard library for automated satellite image segmentation.

samgeo_example.py
from samgeo import SamGeo2

# Initialize with a SAM2 (Hiera-Large) model; weights download on first use
sam = SamGeo2(
    model_id="sam2-hiera-large",
    automatic=True
)

# Segment satellite image into a georeferenced mask raster
sam.generate(
    source="sentinel2_rgb.tif",
    output="segmented.tif",
    foreground=True,
    unique=True
)

The samgeo library handles all the complexity of georeferencing, tiling large images, and converting segments back to vector polygons with proper coordinate systems.

Practical Applications

We've deployed SAM2-based workflows across several use cases:

Production Use Cases

  • Building footprint extraction — Interactive correction of OSM data for rural areas
  • Agricultural field delineation — Rapid mapping of parcel boundaries from high-res imagery
  • Water body mapping — Flood extent extraction with minimal training data
  • Solar panel detection — Identifying rooftop installations for energy audits
  • Road network extraction — Tracing unpaved roads in developing regions

Limitations and Hybrid Approaches

SAM2 isn't perfect for remote sensing. The model struggles with ambiguous boundaries common in natural landscapes—where exactly does a wetland end and forest begin? It also lacks semantic understanding; it can segment objects but doesn't inherently know what they are.

The emerging best practice is combining SAM2 with domain-specific classifiers. Use a trained classifier (or GeoFM like Clay) to identify what's in the scene, then apply SAM2 for precise boundary delineation. This hybrid approach captures the best of both worlds.
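One way to wire the hybrid up: take the classifier's label map, pick a seed pixel inside each connected region of the target class, and hand those seeds to SAM2 as point prompts. The seed-picking half is plain raster logic; a sketch in which the class IDs are hypothetical and centroids are assumed to fall inside their (convex-ish) blobs:

```python
import numpy as np
from scipy import ndimage

def seed_points(class_map, class_id):
    """Return one (x, y) seed per connected blob of `class_id`, taken at
    each blob's centroid -- ready to use as SAM2 foreground point prompts.
    Caveat: a centroid can fall outside a strongly concave blob."""
    mask = class_map == class_id
    labeled, n = ndimage.label(mask)  # 4-connected components
    centroids = ndimage.center_of_mass(mask, labeled, range(1, n + 1))
    # center_of_mass returns (row, col); SAM2 prompts are (x, y).
    return [(float(c), float(r)) for r, c in centroids]

# Toy 6x6 label map with two blobs of class 2 (hypothetical "water").
cm = np.zeros((6, 6), dtype=int)
cm[0:2, 0:2] = 2
cm[4:6, 4:6] = 2
seeds = seed_points(cm, 2)  # one prompt per water blob
```

Each seed then becomes a foreground click for SAM2, which traces the precise boundary the coarse classifier could not.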

Our Perspective

SAM2 represents a genuine productivity multiplier for geospatial workflows. The ability to extract precise polygons with a few clicks, rather than painstakingly digitizing by hand or waiting for ML model training, changes the economics of spatial data production.

However, I'd caution against treating SAM2 as fully autonomous. In our experience, the best results come from human-in-the-loop workflows where operators use SAM2 to draft features, then refine boundaries with domain knowledge. The model excels at the tedious work of following edges; humans excel at knowing which edges matter.
