A comprehensive benchmark and development suite evaluating 14 state-of-the-art depth models across 6 diverse datasets including Etna-LRNT, Etna-S3LI, and LunarSim. We introduce stereo ground truth for Chang'e-3 and the CHERI dark-analog suite.
Monocular Depth Estimation (MDE) is crucial for autonomous lunar rover navigation using electro-optical cameras. However, deploying terrestrial MDE networks on the Moon introduces a severe domain gap due to harsh shadows, textureless regolith, and zero atmospheric scattering. Existing evaluations rely on analogs that fail to replicate these conditions and lack actual metric ground truth. To address this, we present LuMon, a comprehensive benchmarking framework to evaluate MDE methods for lunar exploration. We introduce novel datasets featuring high-quality stereo ground-truth depth from the real Chang'e-3 mission and the CHERI dark analog dataset. Using this framework, we conduct a systematic zero-shot evaluation of state-of-the-art architectures across synthetic, analog, and real datasets. We rigorously assess performance against mission-critical challenges such as craters, rocks, extreme shading, and varying depth ranges. Furthermore, we establish a sim-to-real domain adaptation baseline by fine-tuning a foundation model on synthetic data. While this adaptation yields drastic in-domain performance gains, it exhibits minimal generalization to authentic lunar imagery, highlighting a persistent cross-domain transfer gap. Our extensive analysis reveals the inherent limitations of current networks and sets a standard foundation to guide future advancements in extraterrestrial perception and domain adaptation.
Six concrete contributions advancing the state of lunar autonomous perception research.
From simulation and Earth analogs to real Chang'e-3 mission imagery, LuMon aggregates six datasets for cross-domain lunar MDE evaluation.
| Dataset | Domain | GT Type | Resolution | Max Depth | Lighting | Status |
|---|---|---|---|---|---|---|
| LunarSim | Synthetic | Dense | 1280×720 | — | Standard | Existing |
| LuSNAR | Synthetic | Dense | 1024×1024 | 50 m | Varied | Existing |
| Etna-LRNT | Earth Analog | Dense (Stereo) | 1292×964 | 15 m | Natural | Existing |
| Etna-S3LI | Earth Analog | Sparse (LiDAR) | 688×512 | 30 m | Natural | Existing |
| CHERI | Dark Analog | Dense (Stereo) | 1280×720 | 17 m | Extreme | Novel ★ |
| Chang'e-3 | Real Lunar | Dense (Stereo) | 27 cm/px | 25 m | Harsh | Mission Data |
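For scripting against the benchmark, the dataset table can be encoded as a small lookup structure. The sketch below is illustrative only: the `DatasetInfo` class, the `DATASETS` dictionary, and the `names_by_domain` helper are hypothetical names rather than part of any released LuMon code, and only the domain, ground-truth type, and maximum depth columns are copied from the table (LunarSim has no listed maximum depth, so it is stored as `None`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetInfo:
    """One row of the dataset table (hypothetical container, not LuMon's API)."""
    domain: str
    gt_type: str
    max_depth_m: Optional[float]  # None where the table lists no value


# Values copied from the table above; the dictionary itself is an assumption.
DATASETS = {
    "LunarSim": DatasetInfo("Synthetic", "Dense", None),
    "LuSNAR": DatasetInfo("Synthetic", "Dense", 50.0),
    "Etna-LRNT": DatasetInfo("Earth Analog", "Dense (Stereo)", 15.0),
    "Etna-S3LI": DatasetInfo("Earth Analog", "Sparse (LiDAR)", 30.0),
    "CHERI": DatasetInfo("Dark Analog", "Dense (Stereo)", 17.0),
    "Chang'e-3": DatasetInfo("Real Lunar", "Dense (Stereo)", 25.0),
}


def names_by_domain(domain: str) -> list:
    """Return the sorted dataset names whose domain matches exactly."""
    return sorted(name for name, info in DATASETS.items() if info.domain == domain)
```

Calling `names_by_domain("Synthetic")` returns the two simulated datasets, which is one way to separate in-domain from cross-domain results when summarizing scores.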
All models visualized with consistent depth colormaps. Comparisons reveal stark differences in handling of craters, shadows, and low-texture regions.
Holistic evaluation covering depth quality, structural fidelity, downstream navigation, and computational cost.
Metric depth models consistently dominate across all evaluation axes, with fine-tuned DAv2 setting the new state-of-the-art.
Metric foundation models preserve internal linearity for stable 3D geometry and provide the strongest zero-shot baseline for lunar topography.
Complex surface geometries like craters and rocks degrade zero-shot depth estimation significantly more than severe lunar shading.
While fine-tuning bridges the sim-to-sim gap, it fails to resolve the broader sim-to-real shift, indicating that simulated environments alone are insufficient.
| Model | LunarSim δ₁ ↑ | LunarSim A.Rel ↓ | LunarSim RMSE ↓ | LuSNAR δ₁ ↑ | LuSNAR A.Rel ↓ | LuSNAR RMSE ↓ | Etna-LRNT δ₁ ↑ | Etna-LRNT A.Rel ↓ | Etna-LRNT RMSE ↓ | Etna-S3LI δ₁ ↑ | Etna-S3LI A.Rel ↓ | Etna-S3LI RMSE ↓ | Chang'e-3 δ₁ ↑ | Chang'e-3 A.Rel ↓ | Chang'e-3 RMSE ↓ | CHERI δ₁ ↑ | CHERI A.Rel ↓ | CHERI RMSE ↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DAv2-FT (Ours) | 0.84 | 0.13 | 13.96 | 0.93 | 0.07 | 1.07 | 1.00 | 0.05 | 0.13 | 0.84 | 0.13 | 1.30 | 0.92 | 0.08 | 0.70 | 0.36 | 0.50 | 3.90 |
| DepthAnything-v2 | 0.56 | 0.26 | 15.18 | 0.31 | 0.45 | 2.89 | 1.00 | 0.06 | 0.15 | 0.88 | 0.12 | 1.22 | 0.95 | 0.06 | 0.56 | 0.32 | 0.54 | 4.14 |
| DepthAnything-v3 | 0.63 | 0.22 | 14.15 | 0.53 | 0.29 | 1.99 | 1.00 | 0.04 | 0.11 | 0.91 | 0.10 | 1.12 | 0.96 | 0.05 | 0.44 | 0.34 | 0.51 | 3.97 |
| MoGe | 0.77 | 0.15 | 12.17 | 0.53 | 0.29 | 1.80 | 0.99 | 0.05 | 0.15 | 0.91 | 0.10 | 1.08 | 0.95 | 0.06 | 0.45 | 0.32 | 0.54 | 4.12 |
| DepthPro | 0.50 | 0.29 | 17.57 | 0.33 | 0.45 | 2.70 | 0.73 | 241.71 | 450.05 | 0.89 | 0.11 | 1.11 | 0.89 | 0.10 | 0.89 | 0.32 | 0.55 | 4.18 |
| Metric3Dv2 | 0.75 | 0.16 | 12.53 | 0.61 | 0.25 | 1.71 | 1.00 | 0.03 | 0.10 | 0.87 | 0.12 | 1.19 | 0.95 | 0.05 | 0.43 | 0.33 | 0.53 | 4.09 |
| UniDepth | 0.76 | 0.21 | 16.16 | 0.63 | 0.26 | 2.14 | 1.00 | 0.03 | 0.07 | 0.87 | 0.12 | 1.20 | 0.94 | 0.07 | 0.54 | 0.35 | 0.51 | 3.99 |
| MapAnything | 0.86 | 0.11 | 10.73 | 0.74 | 0.17 | 1.53 | 1.00 | 0.03 | 0.08 | 0.91 | 0.10 | 1.05 | 0.95 | 0.07 | 0.52 | 0.36 | 0.50 | 3.91 |
| VDA | 0.75 | 0.16 | 13.94 | 0.77 | 0.17 | 2.05 | 1.00 | 0.04 | 0.12 | 0.83 | 0.14 | 1.34 | 0.93 | 0.08 | 0.68 | 0.43 | 0.44 | 3.56 |
| MetricAnything | 0.79 | 0.15 | 12.08 | 0.51 | 0.30 | 1.83 | 0.98 | 0.07 | 0.22 | 0.91 | 0.10 | 1.05 | 0.95 | 0.06 | 0.44 | 0.32 | 0.55 | 4.17 |
| DepthAnything-AC | 0.70 | 0.28 | 19.12 | 0.70 | 0.26 | 2.63 | 1.00 | 0.05 | 0.12 | 0.87 | 0.12 | 1.19 | 0.66 | 0.22 | 1.55 | 0.55 | 0.35 | 3.09 |
| DepthCrafter | 0.40 | 0.37 | 20.28 | 0.41 | 0.47 | 2.96 | 0.92 | 0.10 | 0.28 | 0.80 | 0.15 | 1.44 | 0.87 | 0.13 | 1.05 | 0.33 | 0.53 | 4.08 |
| Lotus | 0.59 | 0.27 | 20.43 | 0.36 | 0.44 | 3.08 | 0.65 | 0.19 | 0.49 | 0.80 | 0.15 | 1.45 | 0.70 | 0.19 | 1.64 | 0.35 | 0.49 | 3.90 |
| MiDaS | 0.78 | 0.17 | 14.97 | 0.81 | 0.13 | 1.68 | 1.00 | 0.04 | 0.12 | 0.89 | 0.11 | 1.15 | 0.81 | 0.15 | 1.12 | 0.40 | 0.46 | 3.71 |
| Marigold | 0.67 | 0.21 | 17.25 | 0.62 | 0.24 | 2.02 | 0.94 | 0.10 | 0.27 | 0.86 | 0.12 | 1.26 | 0.91 | 0.09 | 0.69 | 0.38 | 0.47 | 3.75 |
↑ higher is better, ↓ lower is better.
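The three metrics in the table follow conventions that are standard in the depth estimation literature: δ₁ is the fraction of pixels where max(pred/gt, gt/pred) < 1.25, A.Rel is the mean of |pred − gt| / gt, and RMSE is the root mean squared error. The page does not spell out LuMon's exact evaluation protocol, so the following is a minimal sketch under assumptions: the function name `depth_metrics`, the validity rule (finite, positive depth), and the optional `max_depth` cap are illustrative choices, not the authors' implementation.

```python
import numpy as np


def depth_metrics(pred, gt, max_depth=None):
    """Compute delta1, abs_rel, and rmse over valid ground-truth pixels.

    Assumption: a pixel is valid when both depths are finite and positive,
    and (optionally) the ground truth does not exceed max_depth.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    valid = np.isfinite(gt) & (gt > 0) & np.isfinite(pred) & (pred > 0)
    if max_depth is not None:
        valid &= gt <= max_depth
    if not valid.any():
        raise ValueError("no valid pixels to evaluate")

    p, g = pred[valid], gt[valid]
    ratio = np.maximum(p / g, g / p)  # symmetric ratio, always >= 1
    return {
        "delta1": float(np.mean(ratio < 1.25)),
        "abs_rel": float(np.mean(np.abs(p - g) / g)),
        "rmse": float(np.sqrt(np.mean((p - g) ** 2))),
    }
```

Masking before any division matters for sparse ground truth such as LiDAR, where most pixels carry no depth value and would otherwise produce division-by-zero artifacts.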
For extensive quantitative results, see the paper PDF.
Region-wise analysis identifies primary failure modes and their relative severity for autonomous rover depth perception.
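One way to picture a region-wise analysis is to compute an error metric separately inside per-pixel region masks. The sketch below assumes boolean masks are available for each region type; the function name `abs_rel_by_region` and the region labels used in the example are hypothetical, and the page does not describe how LuMon actually annotates or scores regions.

```python
import numpy as np


def abs_rel_by_region(pred, gt, region_masks):
    """Return the mean absolute relative error inside each boolean mask.

    region_masks maps a label (e.g. "crater") to a boolean array with the
    same shape as gt. Regions without valid pixels map to None.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = np.isfinite(gt) & (gt > 0) & np.isfinite(pred)

    results = {}
    for label, mask in region_masks.items():
        sel = valid & np.asarray(mask, dtype=bool)
        if sel.any():
            err = np.abs(pred[sel] - gt[sel]) / gt[sel]
            results[label] = float(err.mean())
        else:
            results[label] = None  # region absent from this image
    return results
```

Comparing the per-region values against the whole-image error gives a rough ranking of which scene elements hurt a model most, which is the kind of comparison the failure-mode analysis describes.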
If you find LuMon useful in your research, please consider citing our work.
@inproceedings{sekmen2026lumon,
title={LuMon: A Comprehensive Benchmark and Development Suite with Novel Datasets for Lunar Monocular Depth Estimation},
author={Aytac Sekmen and Fatih Gunes and Furkan Horoz and Umut Isik and Alp Ozaydin and Altay Topaloglu and Umutcan Ustundas and Alp Yeni and Ersin Soken and Erol Sahin and Gokberk Cinbis and Sinan Kalkan},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
year={2026}
}