UncoverML Geoscience Pipeline
Deployed a machine learning toolkit for geoscience datasets on the National Computational Infrastructure (NCI), enabling distributed multi-core, multi-node processing for compute-intensive geological analysis.
Client
Geoscience Australia
Key Results
HPC Deployment
Analysis Time
Team Usage
Contribution
The Challenge
Geoscientists needed to apply machine learning to massive geological datasets but lacked infrastructure and tools optimized for high-performance computing environments.
Key challenges included:
- Massive datasets too large for single-machine processing
- Complex HPC environment with specific requirements
- Need for distributed processing across multiple nodes
- Integration with existing geoscience workflows
- Legacy Python 2 codebase requiring migration
National Computational Infrastructure
NCI is Australia's national facility providing high-performance computing services to researchers. Optimizing workflows for NCI lets researchers tackle problems that were previously infeasible.
Gadi Supercomputer
Multi-petaflop computing power
Massive Datasets
Petabytes of geoscience data
Our Solution
Designed and deployed distributed infrastructure on NCI, implementing feature extraction, hyperparameter optimization, prediction mapping, and model exploration capabilities. Refactored the codebase from Python 2 to Python 3 for future compatibility.
MPI Distribution
Implemented MPI-based parallelization to distribute workloads efficiently across multiple compute nodes.
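As a rough illustration of how work can be divided among MPI processes, the sketch below splits a list of work items into contiguous, near-equal ranges, one per rank. It is a minimal example under stated assumptions, not the project's actual code: the `partition` helper is hypothetical, and the ranks are simulated in a loop so the snippet runs without an MPI installation.

```python
def partition(n_items, rank, size):
    """Return the half-open index range [start, stop) owned by `rank`.

    Hypothetical helper for illustration; items are split as evenly as possible.
    """
    base, extra = divmod(n_items, size)
    # The first `extra` ranks each take one additional item.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# Simulate 4 ranks sharing 10 work items. Under mpi4py, rank and size would
# come from MPI.COMM_WORLD.Get_rank() and MPI.COMM_WORLD.Get_size() instead.
ranges = [partition(10, r, 4) for r in range(4)]
for r, (start, stop) in enumerate(ranges):
    print(f"rank {r} handles items {start}..{stop - 1}")
```

Contiguous ranges keep each process reading neighbouring data, which tends to suit raster workloads; other schemes such as round-robin are equally valid.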
Feature Extraction
Built scalable feature extraction pipelines for processing large geospatial raster datasets.
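One common way to keep raster processing scalable is to work tile by tile rather than loading a whole dataset at once. The sketch below is an assumption-laden illustration of that idea: `tile_windows` and `window_mean` are hypothetical names, the raster is a small in-memory list of lists, and the per-tile mean stands in for whatever features a real pipeline would compute.

```python
def tile_windows(n_rows, n_cols, tile_size):
    """Yield (row_start, row_stop, col_start, col_stop) windows covering the raster."""
    for r0 in range(0, n_rows, tile_size):
        for c0 in range(0, n_cols, tile_size):
            # Edge tiles are clipped to the raster bounds.
            yield r0, min(r0 + tile_size, n_rows), c0, min(c0 + tile_size, n_cols)


def window_mean(raster, window):
    """Illustrative feature: the mean of all pixel values inside one window."""
    r0, r1, c0, c1 = window
    values = [raster[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(values) / len(values)


# A toy 4x5 raster whose pixel value is its row-major index.
raster = [[r * 5 + c for c in range(5)] for r in range(4)]
windows = list(tile_windows(4, 5, tile_size=2))
features = [window_mean(raster, w) for w in windows]
```

Because each window is independent, the list of windows can be handed out to separate processes without coordination between them.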
Hyperparameter Optimization
Automated hyperparameter tuning that uses HPC resources to run exhaustive searches.
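An exhaustive search can be sketched as enumerating every parameter combination and sharing the candidates among workers. The example below is only a sketch: the parameter names, the `score` function, and the round-robin assignment are all assumptions made for illustration, and a real run would fit and cross-validate a model for each candidate.

```python
from itertools import product


def expand_grid(param_grid):
    """Return every combination of the listed parameter values as a dict."""
    names = sorted(param_grid)
    combos = product(*(param_grid[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def assign(candidates, rank, size):
    """Round-robin split: worker `rank` evaluates every `size`-th candidate."""
    return candidates[rank::size]


def score(params):
    # Stand-in objective, purely illustrative; higher is better.
    return -((params["depth"] - 4) ** 2) - abs(params["rate"] - 0.1)


grid = {"depth": [2, 4, 8], "rate": [0.01, 0.1]}
candidates = expand_grid(grid)

# Simulate 3 workers; each scores its share and the results are merged.
results = [(score(p), p) for rank in range(3) for p in assign(candidates, rank, 3)]
best_score, best_params = max(results, key=lambda item: item[0])
```

Each candidate is evaluated independently, so the search scales with the number of workers until there are more workers than candidates.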
Prediction Mapping
Generated prediction maps at scale, producing interpretable outputs for geoscientists.
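Producing a map from per-pixel predictions typically means placing each value back at its grid position and marking pixels that had no valid input. The following is a minimal sketch of that step, assuming a flat validity mask in row-major order; the `to_grid` helper and the sample values are hypothetical.

```python
def to_grid(predictions, valid_mask, n_cols, nodata=None):
    """Place per-pixel predictions back into a row-major 2D grid.

    `valid_mask` is a flat list of booleans, one per pixel; pixels marked
    False receive `nodata`. `predictions` holds one value per True entry.
    """
    values = iter(predictions)
    flat = [next(values) if ok else nodata for ok in valid_mask]
    return [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]


# A 2x3 map in which two pixels had no usable input data.
mask = [True, True, False, True, False, True]
prediction_map = to_grid([0.2, 0.9, 0.4, 0.7], mask, n_cols=3)
```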
Python 3 Migration
Refactored the entire codebase from Python 2 to Python 3 for long-term maintainability.
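For readers unfamiliar with what such a migration involves, the snippet below shows a few well-known behaviour differences between Python 2 and Python 3 that numerical code tends to run into. These are general language changes chosen for illustration, not a record of the specific edits made in this project, and the variable names are hypothetical.

```python
# Division: in Python 2, 7 / 2 gave 3 for integers. Python 3 always performs
# true division with "/" and uses "//" for floor division.
true_half = 7 / 2
floor_half = 7 // 2

# Text vs bytes: Python 3 separates the two types, so bytes read from a file
# or a network stream must be decoded explicitly before being treated as text.
raw_name = b"band_1"
band_name = raw_name.decode("utf-8")

# Dictionary views: keys() no longer returns a list, so code that indexes the
# result needs an explicit conversion.
bands = {"band_1": 0, "band_2": 1}
first_key = sorted(bands.keys())[0]

# print is a function rather than a statement.
print(true_half, floor_half, band_name, first_key)
```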
scikit-learn Integration
Deep integration with scikit-learn for access to a wide range of ML algorithms.
Technology Stack
Project Impact
Research Acceleration
- Reduced analysis time from weeks to hours
- Enabled previously infeasible large-scale analyses
- Used by geoscience teams nationally
- Supports mineral exploration and mapping
Open Source Contribution
- Contributed improvements back to the open source project
- Python 3 migration benefits the entire community
- Documentation for reproducible research
- Foundation for ongoing geoscience ML work