The Dawn of Geospatial Foundation Models: Why Clay Changes Everything
Foundation models are revolutionizing how we analyze Earth observation data. Clay and similar GeoFMs represent a paradigm shift from task-specific models to versatile, pre-trained systems that understand our planet.
The geospatial industry is experiencing its GPT moment. Just as large language models transformed natural language processing by learning general representations from vast text corpora, geospatial foundation models (GeoFMs) are now doing the same for Earth observation data. At the center of this revolution is Clay—an open-source foundation model that represents a fundamental shift in how we approach satellite imagery analysis.
What Makes Foundation Models Different
Traditional machine learning for remote sensing has followed a predictable pattern: collect labeled data for a specific task (building detection, land cover classification, crop type mapping), train a model from scratch, and hope it generalizes to new regions. This approach is expensive, time-consuming, and brittle. A model trained on European agricultural fields often fails spectacularly on African landscapes.
Foundation models flip this paradigm. By pre-training on massive amounts of unlabeled satellite imagery using self-supervised learning, these models develop rich internal representations of what Earth looks like across seasons, geographies, and spectral bands. The result is a model that can be fine-tuned for downstream tasks with far less labeled data—sometimes just a few dozen examples.
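One common way to exploit pre-trained representations with very few labels is to keep the encoder frozen and fit a small classifier on its embeddings. The sketch below illustrates the idea only: the 4-dimensional vectors are toy stand-ins for real embeddings, the class names are made up, and a nearest-centroid classifier is used as a deliberately simple substitute for a fine-tuned head.

```python
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def fit_centroids(embeddings, labels):
    """Group embeddings by label and return one centroid per class."""
    by_label = {}
    for emb, lab in zip(embeddings, labels):
        by_label.setdefault(lab, []).append(emb)
    return {lab: centroid(vs) for lab, vs in by_label.items()}

def predict(centroids, emb):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], emb))

# Toy stand-ins for embeddings from a frozen encoder (assumption: real
# embeddings would have hundreds of dimensions, not four).
train_embeddings = [
    [0.9, 0.1, 0.0, 0.2], [0.8, 0.2, 0.1, 0.1],  # labeled "cropland"
    [0.1, 0.9, 0.7, 0.0], [0.2, 0.8, 0.9, 0.1],  # labeled "forest"
]
train_labels = ["cropland", "cropland", "forest", "forest"]

centroids = fit_centroids(train_embeddings, train_labels)
prediction = predict(centroids, [0.15, 0.85, 0.8, 0.05])
print(prediction)  # forest
```

With only two labeled examples per class, the classifier has nothing to learn beyond a mean per class, so nearly all the discriminative power has to come from the embeddings themselves.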
GeoFMs can also deliver value without any task-specific training: embeddings from the frozen model can be used directly for tasks such as similarity search. They are an emerging research area and are typically pre-trained vision transformers adapted to geospatial data sources.
Clay: Open Foundation Model for Earth
Clay emerged from the team behind Microsoft's Planetary Computer and operates as a fiscally sponsored project under Radiant Earth. Unlike proprietary alternatives, Clay is fully open-source, allowing researchers and practitioners to inspect, modify, and build upon the model.
The architecture is a Vision Transformer (ViT) adapted to understand geospatial and temporal relationships in Earth observation data. Clay uses a Masked Autoencoder (MAE) approach for self-supervised learning—the model learns by predicting masked portions of satellite images, developing robust feature representations in the process.
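The masking step at the heart of an MAE can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Clay's training code: the chip size (256x256), the patch size (8) and the mask ratio (0.75) are example values, and `random_patch_mask` is a hypothetical helper.

```python
import random

def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into (visible, masked) lists, MAE-style."""
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked

# Assumed setup: a 256x256 chip cut into 8x8-pixel patches -> 32 * 32 = 1024 patches.
num_patches = (256 // 8) ** 2
visible, masked = random_patch_mask(num_patches, mask_ratio=0.75, seed=0)

# The encoder would see only the `visible` patches; the decoder is trained to
# reconstruct the pixels of the `masked` patches.
print(len(visible), len(masked))  # 256 768
```

Because the encoder processes only the visible quarter of the patches in this setup, pre-training is cheaper per image than encoding the full chip, and the reconstruction task requires no labels.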
Key Clay Capabilities
- Multi-spectral input: Works with all Sentinel-2 bands, though commonly uses RGB and NIR
- Location-aware: Incorporates geographic coordinates as input features
- Temporal understanding: Processes time series data to understand seasonal patterns
- 768-dimensional embeddings: Rich representations for downstream tasks
- Flexible inference: Can accept varying image sizes, resolutions, and band combinations
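To make the location-aware input concrete, here is one plausible way to turn coordinates into model features: a sine/cosine encoding that keeps values bounded and avoids a discontinuity at the antimeridian. This is an illustrative assumption and not necessarily the exact scheme Clay uses; `encode_location` is a hypothetical helper.

```python
import math

def encode_location(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to four values in [-1, 1]."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (math.sin(lat), math.cos(lat), math.sin(lon), math.cos(lon))

# Longitudes -180 and 180 are the same meridian, so their features should
# agree up to floating-point error.
west = encode_location(0.0, -180.0)
east = encode_location(0.0, 180.0)
print(all(abs(a - b) < 1e-9 for a, b in zip(west, east)))  # True
```

Feeding raw degrees instead would place -180 and 180 at opposite ends of the input range even though they describe the same place.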
The GeoFM Landscape in 2024
Clay isn't alone. The GeoFM space has exploded with alternatives, each with different design philosophies:
Notable Foundation Models
- Prithvi-100M (IBM/NASA): Trained on Harmonized Landsat Sentinel-2 (HLS) data, strong on climate applications
- SatMAE: Pioneering work on masked autoencoders for satellite imagery
- SpectralGPT: Focuses on multispectral data with spectral-aware pretraining
- DOFA: Dynamic One-For-All architecture for multi-sensor fusion
- SatVision-Base (NASA Goddard): Swin Transformer V2 model pre-trained on MODIS imagery
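What sets Clay apart is its practical focus on deployment and its emphasis on similarity search. Because similar landscapes map to nearby embeddings, a Clay-powered system can take one example of a pattern, such as a fresh forest clearing, and retrieve look-alike locations across an archive. This works like a "reverse image search" for the planet, and it could help flag emerging deforestation before it expands into large-scale clearing operations.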
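Embedding-based similarity search reduces to ranking stored vectors by their similarity to a query vector. The sketch below uses cosine similarity over a tiny in-memory index. The tile IDs and the 3-dimensional vectors are made up for illustration, and a production system would more likely use a vector database or an approximate nearest-neighbor index.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_similar(query, index, k=2):
    """Return the k (tile_id, score) pairs most similar to the query."""
    scored = [(tile_id, cosine_similarity(query, emb)) for tile_id, emb in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical tile IDs with toy 3-d embeddings (real embeddings are far larger).
index = {
    "tile_a": [1.0, 0.0, 0.0],
    "tile_b": [0.9, 0.1, 0.0],
    "tile_c": [0.0, 1.0, 0.0],
    "tile_d": [0.0, 0.0, 1.0],
}
query = [1.0, 0.02, 0.0]  # embedding of the example chip we want more of
results = top_k_similar(query, index, k=2)
print([tile_id for tile_id, _ in results])  # ['tile_a', 'tile_b']
```

The query never needs a label: a single example chip is enough to pull back its nearest neighbors, which is what makes this workflow attractive for monitoring.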
Current Limitations and Challenges
Foundation models aren't magic. Research from ACM SIGSPATIAL notes that on multimodal geospatial tasks—those requiring fusion of satellite imagery with POI data, street-level photos, or tabular attributes—existing FMs still underperform task-specific models.
Pixel-level precision remains challenging. Transformer encoders operate on patch tokens rather than individual pixels, so their feature maps are coarser than the input (a 4-5x reduction in feature resolution), sacrificing the fine-grained spatial detail needed for precise segmentation or sub-meter change detection. This is where hybrid approaches that pair GeoFMs with dedicated segmentation models (like SAM 2) become essential.
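The resolution loss follows from simple arithmetic. The exact factor depends on the architecture; in a plain ViT, a patch size of p yields one feature vector per p x p block of pixels. The numbers below (256x256 chip, 10 m pixels, 8-pixel patches) are assumptions chosen for illustration, and `token_grid` is a hypothetical helper.

```python
def token_grid(height, width, patch_size):
    """Return the (rows, cols) of patch tokens for an image of the given size."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    return height // patch_size, width // patch_size

# Assumed example: a 256x256 chip at 10 m/pixel with 8-pixel patches.
rows, cols = token_grid(256, 256, patch_size=8)
ground_per_token_m = 10 * 8  # each token summarizes an 80 m x 80 m footprint
print(rows, cols, ground_per_token_m)  # 32 32 80
```

Anything smaller than a token's footprint has to be recovered by a decoder or a separate segmentation model, which is why dense prediction tasks are harder than chip-level classification.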
What's Next: The LLM-GeoFM Convergence
The most exciting frontier is the integration of large language models with GeoFMs. Imagine querying a satellite archive with natural language: "Show me all locations where solar panel installations increased by more than 20% between 2020 and 2024" and receiving not just coordinates, but explanatory analysis grounded in the imagery.
This convergence is already happening. AWS's geospatial FM service combines Prithvi with Claude for natural language interaction. Development Seed's work on semantic search using Clay embeddings points toward a future where geospatial analysis is accessible to domain experts without ML expertise.
My Perspective
Having worked with enterprise GIS systems for nearly a decade, I see GeoFMs as the most significant shift since cloud-native geospatial formats. The ability to extract meaningful features from imagery without manual labeling campaigns changes the economics of satellite analytics entirely.
However, I'm skeptical of claims that GeoFMs will replace domain expertise. The real value lies in augmentation—combining foundation model capabilities with deep understanding of specific geographies, sensor characteristics, and application requirements. The teams that will succeed are those building on Clay and similar models while maintaining rigorous validation against ground truth.
References & Further Reading
Clay Foundation Model Documentation
Official Clay model documentation and API reference
https://clay-foundation.github.io/model/index.html
Using Foundation Models for Earth Observation
Development Seed's guide to GeoFM applications
https://developmentseed.org/blog/2024-11-01-geofm/
On the Opportunities and Challenges of Foundation Models for GeoAI
Comprehensive academic review of GeoFM capabilities and limitations
https://arxiv.org/abs/2304.06798
Revolutionizing Earth Observation with Geospatial Foundation Models on AWS
AWS implementation guide for production GeoFM deployment
https://aws.amazon.com/blogs/machine-learning/revolutionizing-earth-observation-with-geospatial-foundation-models-on-aws/
GeoAI Unpacked: EO Foundation Models
Practical overview of the GeoFM ecosystem
https://geoaiunpacked.substack.com/p/geoai-unpacked-1-eo-foundation-models