LuMon - Lunar Monocular Depth Estimation Benchmark

Aytaç Sekmen^1,3,♥ Fatih Emre Gunes^1,3,♥ Furkan Horoz^1,3,♥ Hüseyin Umut Işık^1,3,♥ Mehmet Alp Ozaydin^1,3,♥ Onur Altay Topaloglu^1,3,♥ Şahin Umutcan Üstündaş^1,3,♥ Yurdasen Alp Yeni^1,3,♥ Halil Ersin Soken^2,3 Erol Sahin^1,3 Ramazan Gokberk Cinbis^1,3,♦ Sinan Kalkan^1,3,♦

¹Department of Computer Engineering, ²Department of Aerospace Engineering, ³ROMER
Middle East Technical University (METU), Ankara, Turkey

♥ Equal contribution.
♦ Equal senior contribution. Contact email: skalkan@metu.edu.tr

Abstract

Overview

Monocular Depth Estimation (MDE) is crucial for autonomous lunar rover navigation using electro-optical cameras. However, deploying terrestrial MDE networks on the Moon introduces a severe domain gap due to harsh shadows, textureless regolith, and the absence of atmospheric scattering. Existing evaluations rely on analogs that fail to replicate these conditions and lack actual metric ground truth. To address this, we present LuMon, a comprehensive benchmarking framework to evaluate MDE methods for lunar exploration. We introduce novel datasets featuring high-quality stereo ground-truth depth from the real Chang'e-3 mission and the CHERI dark analog dataset. Using this framework, we conduct a systematic zero-shot evaluation of state-of-the-art architectures across synthetic, analog, and real datasets. We rigorously assess performance against mission-critical challenges such as craters, rocks, extreme shading, and varying depth ranges. Furthermore, we establish a sim-to-real domain adaptation baseline by fine-tuning a foundation model on synthetic data. While this adaptation yields drastic in-domain performance gains, it exhibits minimal generalization to authentic lunar imagery, highlighting a persistent cross-domain transfer gap. Our extensive analysis reveals the inherent limitations of current networks and establishes a foundation to guide future advancements in extraterrestrial perception and domain adaptation.

Contributions

What We Contribute

Six concrete contributions advancing the state of lunar autonomous perception research.

Novel Lunar Datasets

Stereo ground-truth depth for real Chang'e-3 imagery and the novel CHERI dark-analog dataset (extreme polar-style lighting).

Comprehensive Zero-Shot Benchmark

First systematic evaluation of 14 contemporary MDE architectures - covering both metric and relative paradigms - across 6 datasets without dataset-specific tuning.

Failure Mode Analysis

Region-wise evaluation across craters, rocks, shadows, and flat terrain with scale drift characterization across depth ranges. Reveals concrete model weaknesses for deployment planning.

Downstream Pose Estimation

Integrates depth quality metrics with relative pose estimation for rover navigation, bridging the gap between raw accuracy numbers and practical operational utility.

Open Development Suite

Full release of evaluation scripts, LoRA fine-tuning code, dataset processing tools, multi-GPU inference, and visualization utilities.

Efficiency Profiling

Detailed profiling of FPS, VRAM consumption, and TFLOPs alongside depth accuracy, offering practical deployment guidance for power- and memory-constrained rover hardware.

Datasets

Benchmark Suite - 6 Datasets

From simulation and Earth analogs to real Chang'e-3 mission imagery, LuMon aggregates six datasets for cross-domain lunar MDE evaluation.

LunarSim

Synthetic

Physics-based lunar simulation with accurate shadow rendering, crater morphology, and rock distributions. Standard orbital lighting conditions.

1280×720 Dense GT Rel. depth

LuSNAR

Synthetic

Public simulator based dataset, providing stereo images, dense depth, and semantic masks across five geological classes.

1024×1024 Dense GT Seg. masks

Etna-LRNT

Earth Analog

Volcanic terrain analog with SGM stereo-derived ground truth. Reveals why Earth analogs can be deceptive proxies for true lunar performance.

1292×964 Dense GT 15 m max

Etna-S3LI

Earth Analog

LiDAR-derived ground truth from volcanic terrain. Enables precise geometric error analysis of crater rims and rock edges.

688×512 Sparse GT 30 m max

CHERI ★

Novel

Our novel dark analog environment with extreme polar lighting. Captures the conditions of permanently shadowed regions, which cause catastrophic failure in relative methods.

1280×720 Dense GT 17 m max

Chang'e-3

In-situ

Authentic lunar surface data from China's Chang'e-3 mission. The only real lunar dataset in the benchmark, providing the decisive test of domain transfer from simulation and analogs.

27 cm/px Dense GT 25 m max

LuMon benchmark dataset comparison
Dataset	Domain	GT Type	Resolution	Max Depth	Lighting	Status
LunarSim	Synthetic	Dense	1280×720	—	Standard	Existing
LuSNAR	Synthetic	Dense	1024×1024	50 m	Varied	Existing
Etna-LRNT	Earth Analog	Dense (Stereo)	1292×964	15 m	Natural	Existing
Etna-S3LI	Earth Analog	Sparse (LiDAR)	688×512	30 m	Natural	Existing
CHERI	Dark Analog	Dense (Stereo)	1280×720	17 m	Extreme	Novel ★
Chang'e-3	Real Lunar	Dense (Stereo)	27 cm/px	25 m	Harsh	Mission Data

Videos

Side-by-Side Model Comparisons

All models visualized with consistent depth colormaps. Comparisons reveal stark differences in handling of craters, shadows, and low-texture regions.

LunarSim (Synthetic)

LuSNAR (Synthetic)

CHERI (Novel Dataset)

Etna-LRNT (Earth Analog)

Etna-S3LI (Earth Analog)

Evaluation Framework

11+ Evaluation Metrics

Holistic evaluation covering depth quality, structural fidelity, downstream navigation, and computational cost.

Depth Quality

δ₁, δ₂, δ₃ - Threshold accuracy: fraction of pixels with max(pred/gt, gt/pred) below 1.25, 1.25², 1.25³

A.Rel - Absolute relative error

Sq.Rel - Squared relative error

RMSE - Root mean squared error (m)

MAE - Mean absolute error (m)

SILog - Scale-invariant log error
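The depth metrics listed above can be sketched in a few lines. This is a minimal illustration, not the LuMon evaluation script: the function name `depth_metrics`, the `min_depth` validity cutoff, and the SILog convention (square root of the variance of log differences, without the ×100 scaling some benchmarks apply) are assumptions. Plain Python lists are used so the sketch stays self-contained; a real pipeline would vectorize this over full depth maps.

```python
import math

def depth_metrics(pred, gt, min_depth=1e-3):
    """Standard MDE error metrics over paired depth samples (metres).

    pred, gt: flat sequences of predicted and ground-truth depths.
    Samples with gt <= min_depth (assumed invalid) or pred <= 0 are skipped.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > min_depth and p > 0]
    if not pairs:
        raise ValueError("no valid depth samples")
    n = len(pairs)

    # Symmetric ratio used by the threshold accuracies delta_1..delta_3.
    ratios = [max(p / g, g / p) for p, g in pairs]
    # Log differences used by the scale-invariant log error.
    log_diff = [math.log(p) - math.log(g) for p, g in pairs]
    mean_log = sum(log_diff) / n
    log_var = sum(d * d for d in log_diff) / n - mean_log ** 2

    return {
        "delta1": sum(r < 1.25 for r in ratios) / n,
        "delta2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "delta3": sum(r < 1.25 ** 3 for r in ratios) / n,
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,
        "sq_rel": sum((p - g) ** 2 / g for p, g in pairs) / n,
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        "mae": sum(abs(p - g) for p, g in pairs) / n,
        # Clamp at zero to absorb floating-point round-off.
        "silog": math.sqrt(max(log_var, 0.0)),
    }
```

A prediction that is off by a constant factor, for example `depth_metrics([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])`, scores poorly on A.Rel and RMSE but yields a SILog of approximately zero, which is what makes that metric scale-invariant.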

Pose Estimation

Median Rotation Error - Angular pose error (°)

Median Translation Error - Positional pose error (m)

AUC - Area under pose accuracy curve

Downstream rover navigation validation
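As a rough sketch of how these pose metrics are commonly computed: rotation error as the geodesic angle between estimated and ground-truth rotations, translation error as a Euclidean distance, and AUC as the normalised area under the recall-versus-threshold curve. The exact definitions used by LuMon are not given on this page, so the function names, the exact step-function integration, and the 10° AUC threshold below are assumptions for illustration only.

```python
import math
from statistics import median

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices (nested lists)."""
    # trace(R_est^T R_gt) equals the sum of element-wise products.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] so round-off cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def translation_error_m(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth translations."""
    return math.dist(t_est, t_gt)

def pose_auc(errors, max_threshold):
    """Normalised area under recall(t) = fraction of errors <= t, for t in [0, max_threshold]."""
    if not errors:
        raise ValueError("no pose errors given")
    # Each error e below the limit contributes recall 1/n over the interval [e, max_threshold].
    area = sum(max_threshold - e for e in errors if e < max_threshold) / len(errors)
    return area / max_threshold

def summarize(rot_errors_deg, trans_errors_m, auc_threshold_deg=10.0):
    """Aggregate per-pair errors; the 10 degree AUC limit is a hypothetical default."""
    return {
        "median_rot_deg": median(rot_errors_deg),
        "median_trans_m": median(trans_errors_m),
        "auc_rot": pose_auc(rot_errors_deg, auc_threshold_deg),
    }
```

Medians are less sensitive than means to the occasional catastrophic pose failure, which is presumably why they are reported here.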

Efficiency

FPS - Inference speed on standard GPU

VRAM - Peak GPU memory footprint (GB)

TFLOPs - Computational complexity

Practical rover deployment analysis
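Throughput numbers such as FPS depend heavily on how they are timed. The following is a minimal, framework-agnostic sketch of a timing harness, not the profiling code released with LuMon; `measure_fps`, `infer`, `sample`, and `sync` are hypothetical names. The warm-up iterations keep one-off costs (lazy initialisation, caches) out of the measurement.

```python
import time

def measure_fps(infer, sample, warmup=3, iters=20, sync=None):
    """Estimate inference throughput in frames per second.

    infer:  callable that runs the model on one input (hypothetical).
    sample: the input passed to infer on every call.
    sync:   optional callable that blocks until queued device work is done.
    """
    # Warm-up runs are excluded from timing.
    for _ in range(warmup):
        infer(sample)
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iters):
        infer(sample)
    # Wait for asynchronous work before stopping the clock.
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start

    return iters / elapsed if elapsed > 0 else float("inf")
```

With a PyTorch model on a GPU, `torch.cuda.synchronize` would be a natural choice for `sync`, because CUDA kernels launch asynchronously, and peak VRAM can be read with `torch.cuda.max_memory_allocated()` after calling `torch.cuda.reset_peak_memory_stats()` before the run.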

Results

Quantitative Findings

Metric depth models lead on most evaluation axes, and fine-tuned DAv2 achieves the best LuSNAR scores in the table below.

Foundation Dominance

Metric foundation models preserve internal linearity for stable 3D geometry and provide the strongest zero-shot baseline for lunar topography.

Topological Hazards

Complex surface geometries like craters and rocks degrade zero-shot depth estimation significantly more than severe lunar shading.

Sim-to-Real Gap

While fine-tuning bridges the sim-to-sim gap, it fails to resolve the broader sim-to-real shift, indicating that simulated data alone is insufficient.

Overall Evaluation Across Datasets

Model	LunarSim			LuSNAR			Etna-LRNT			Etna-S3LI			Chang'e-3			CHERI
Model	δ₁ ↑	A.Rel ↓	RMSE ↓	δ₁ ↑	A.Rel ↓	RMSE ↓	δ₁ ↑	A.Rel ↓	RMSE ↓	δ₁ ↑	A.Rel ↓	RMSE ↓	δ₁ ↑	A.Rel ↓	RMSE ↓	δ₁ ↑	A.Rel ↓	RMSE ↓
DAv2-FT (Ours)	0.84	0.13	13.96	0.93	0.07	1.07	1.00	0.05	0.13	0.84	0.13	1.30	0.92	0.08	0.70	0.36	0.50	3.90
DepthAnything-v2	0.56	0.26	15.18	0.31	0.45	2.89	1.00	0.06	0.15	0.88	0.12	1.22	0.95	0.06	0.56	0.32	0.54	4.14
DepthAnything-v3	0.63	0.22	14.15	0.53	0.29	1.99	1.00	0.04	0.11	0.91	0.10	1.12	0.96	0.05	0.44	0.34	0.51	3.97
MoGe	0.77	0.15	12.17	0.53	0.29	1.80	0.99	0.05	0.15	0.91	0.10	1.08	0.95	0.06	0.45	0.32	0.54	4.12
DepthPro	0.50	0.29	17.57	0.33	0.45	2.70	0.73	241.71	450.05	0.89	0.11	1.11	0.89	0.10	0.89	0.32	0.55	4.18
Metric3Dv2	0.75	0.16	12.53	0.61	0.25	1.71	1.00	0.03	0.10	0.87	0.12	1.19	0.95	0.05	0.43	0.33	0.53	4.09
UniDepth	0.76	0.21	16.16	0.63	0.26	2.14	1.00	0.03	0.07	0.87	0.12	1.20	0.94	0.07	0.54	0.35	0.51	3.99
MapAnything	0.86	0.11	10.73	0.74	0.17	1.53	1.00	0.03	0.08	0.91	0.10	1.05	0.95	0.07	0.52	0.36	0.50	3.91
VDA	0.75	0.16	13.94	0.77	0.17	2.05	1.00	0.04	0.12	0.83	0.14	1.34	0.93	0.08	0.68	0.43	0.44	3.56
MetricAnything	0.79	0.15	12.08	0.51	0.30	1.83	0.98	0.07	0.22	0.91	0.10	1.05	0.95	0.06	0.44	0.32	0.55	4.17
DepthAnything-AC	0.70	0.28	19.12	0.70	0.26	2.63	1.00	0.05	0.12	0.87	0.12	1.19	0.66	0.22	1.55	0.55	0.35	3.09
DepthCrafter	0.40	0.37	20.28	0.41	0.47	2.96	0.92	0.10	0.28	0.80	0.15	1.44	0.87	0.13	1.05	0.33	0.53	4.08
Lotus	0.59	0.27	20.43	0.36	0.44	3.08	0.65	0.19	0.49	0.80	0.15	1.45	0.70	0.19	1.64	0.35	0.49	3.90
MiDaS	0.78	0.17	14.97	0.81	0.13	1.68	1.00	0.04	0.12	0.89	0.11	1.15	0.81	0.15	1.12	0.40	0.46	3.71
Marigold	0.67	0.21	17.25	0.62	0.24	2.02	0.94	0.10	0.27	0.86	0.12	1.26	0.91	0.09	0.69	0.38	0.47	3.75

↑ higher is better, ↓ lower is better.

For extensive quantitative results, see the paper PDF.

Failure Analysis

Key Challenges for Lunar Deployment

Region-wise analysis identifies primary failure modes and their relative severity for autonomous rover depth perception.

Crater Geometry

Sharp depth discontinuities at crater rims and concave interiors cause a 2.05× error increase. Models lack lunar-specific geometric priors for these non-terrestrial shapes.
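A region-wise error ratio of this kind can be obtained by grouping per-pixel errors by semantic label and dividing one region's mean error by a reference region's. The sketch below assumes absolute relative error as the metric and flat terrain as the reference; this page does not state which metric or baseline underlies the reported figure, and the label strings and function names are hypothetical.

```python
from collections import defaultdict

def region_abs_rel(pred, gt, labels):
    """Mean absolute relative error per semantic label.

    pred, gt: flat sequences of depths in metres.
    labels:   one label per sample, e.g. "crater", "rock", "flat" (hypothetical names).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for p, g, lab in zip(pred, gt, labels):
        if g <= 0:  # skip samples without valid ground truth
            continue
        sums[lab] += abs(p - g) / g
        counts[lab] += 1
    return {lab: sums[lab] / counts[lab] for lab in counts}

def error_ratio(per_region, region, baseline="flat"):
    """How many times larger the error is in `region` than in `baseline`."""
    return per_region[region] / per_region[baseline]

# Toy example: crater pixels carry twice the relative error of flat pixels.
scores = region_abs_rel(
    pred=[1.2, 2.0, 4.4, 5.0],
    gt=[1.0, 2.0, 4.0, 5.0],
    labels=["crater", "crater", "flat", "flat"],
)
ratio = error_ratio(scores, "crater")
```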

Extreme Shadows

Permanent polar shadows cause catastrophic failure in relative methods. Metric models maintain 2–3× better robustness. The CHERI dataset specifically targets this failure mode.
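Scoring a relative method against metric ground truth presupposes an alignment step, since relative predictions are defined only up to an unknown scale and shift. This page does not say which alignment LuMon applies; a widely used convention is a per-image least-squares fit, sketched below under that assumption. The function name is hypothetical, and many relative models predict inverse depth, in which case the fit is usually performed in inverse-depth space before converting back.

```python
def align_scale_shift(pred, gt):
    """Least-squares scale s and shift t minimising sum((s * p + t - g) ** 2).

    pred: relative predictions on valid pixels.
    gt:   metric ground truth on the same pixels.
    """
    n = len(pred)
    if n == 0 or n != len(gt):
        raise ValueError("pred and gt must be non-empty and of equal length")
    mean_p = sum(pred) / n
    mean_g = sum(gt) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0:
        raise ValueError("constant prediction: scale is undetermined")
    # Closed-form solution of the 2-parameter linear regression.
    s = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gt)) / var_p
    t = mean_g - s * mean_p
    return s, t

# Toy example: the prediction is the ground truth up to scale 2 and shift 3.
s, t = align_scale_shift([0.0, 1.0, 2.0], [3.0, 5.0, 7.0])
aligned = [s * p + t for p in [0.0, 1.0, 2.0]]
```

Because the fit uses the ground truth of the very image being scored, aligned scores measure structural quality only; metric models evaluated without alignment are additionally being tested on absolute scale.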

Rock & Boulder Fields

Scattered boulders create complex occlusion and high-frequency depth variations that challenge standard encoder architectures trained primarily on structured terrestrial scenes.

Scale Drift

Models show non-linear depth distortion across distance ranges. Performance degrades progressively beyond 15 m without domain-specific fine-tuning on lunar data.
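One simple way to expose this kind of drift is to bin pixels by ground-truth depth and report the error per bin. The sketch below uses absolute relative error and hypothetical bin edges of 0, 5, 10, 15, and 25 m; the ranges and the metric actually used in the LuMon analysis may differ.

```python
from bisect import bisect_right

def abs_rel_by_range(pred, gt, edges=(0.0, 5.0, 10.0, 15.0, 25.0)):
    """Mean absolute relative error per ground-truth depth bin [lo, hi).

    Returns a dict keyed by (lo, hi); bins without samples map to None.
    """
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for p, g in zip(pred, gt):
        if g <= 0:  # skip samples without valid ground truth
            continue
        k = bisect_right(edges, g) - 1  # index of the bin containing g
        if 0 <= k < n_bins:
            sums[k] += abs(p - g) / g
            counts[k] += 1
    return {
        (edges[k], edges[k + 1]): (sums[k] / counts[k] if counts[k] else None)
        for k in range(n_bins)
    }

# Toy example: relative error grows in the far bins.
drift = abs_rel_by_range(
    pred=[1.1, 2.0, 12.0, 24.0],
    gt=[1.0, 2.0, 10.0, 20.0],
)
```

An error that is flat across bins suggests a consistent scale, while an error that rises with distance points to the non-linear distortion described above.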

Low-Texture Regolith

Featureless lunar soil provides minimal photometric cues. Models must rely on shading gradients that break down under oblique, high-contrast lunar illumination.

Domain Transfer Gap

Sim-to-real transfer remains limited. Earth analog datasets (Etna) show deceptively good performance that fails to generalize to actual lunar imagery (Chang'e-3).

Citation

BibTeX

If you find LuMon useful in your research, please consider citing our work.

@inproceedings{sekmen2026lumon,
  title={LuMon: A Comprehensive Benchmark and Development Suite with Novel Datasets for Lunar Monocular Depth Estimation},
  author={Aytac Sekmen and Fatih Gunes and Furkan Horoz and Umut Isik and Alp Ozaydin and Altay Topaloglu and Umutcan Ustundas and Alp Yeni and Ersin Soken and Erol Sahin and Gokberk Cinbis and Sinan Kalkan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  year={2026}
}
