Accepted · 4th AI4Space Workshop · CVPR 2026

LuMon: Lunar Monocular Depth Estimation Benchmark

A comprehensive benchmark and development suite evaluating 14 state-of-the-art depth models across 6 diverse datasets including Etna-LRNT, Etna-S3LI, and LunarSim. We introduce stereo ground truth for Chang'e-3 and the CHERI dark-analog suite.

Denver, Colorado · CVPR 2026
June 3–7, 2026
ai4space.space
15 Models · DAv2-FT · 6 Datasets
Aytaç Sekmen1,3,♥ Fatih Emre Gunes1,3,♥ Furkan Horoz1,3,♥ Hüseyin Umut Işık1,3,♥ Mehmet Alp Ozaydin1,3,♥ Onur Altay Topaloglu1,3,♥ Şahin Umutcan Üstündaş1,3,♥ Yurdasen Alp Yeni1,3,♥ Halil Ersin Soken2,3 Erol Sahin1,3 Ramazan Gokberk Cinbis1,3,♦ Sinan Kalkan1,3,♦

1 Department of Computer Engineering, 2 Department of Aerospace Engineering, 3 ROMER
Middle East Technical University (METU), Ankara, Turkey

♥ Equal contribution.
♦ Equal senior contribution. Contact email: skalkan@metu.edu.tr

Abstract

Overview

Monocular Depth Estimation (MDE) is crucial for autonomous lunar rover navigation using electro-optical cameras. However, deploying terrestrial MDE networks on the Moon introduces a severe domain gap due to harsh shadows, textureless regolith, and the absence of atmospheric scattering. Existing evaluations rely on analogs that fail to replicate these conditions and lack actual metric ground truth. To address this, we present LuMon, a comprehensive benchmarking framework for evaluating MDE methods for lunar exploration. We introduce novel datasets featuring high-quality stereo ground-truth depth from the real Chang'e-3 mission and the CHERI dark-analog dataset. Using this framework, we conduct a systematic zero-shot evaluation of state-of-the-art architectures across synthetic, analog, and real datasets. We rigorously assess performance against mission-critical challenges such as craters, rocks, extreme shading, and varying depth ranges. Furthermore, we establish a sim-to-real domain adaptation baseline by fine-tuning a foundation model on synthetic data. While this adaptation yields large in-domain performance gains, it generalizes only minimally to authentic lunar imagery, highlighting a persistent cross-domain transfer gap. Our extensive analysis reveals the inherent limitations of current networks and establishes a standardized foundation to guide future advances in extraterrestrial perception and domain adaptation.

Contributions

What We Contribute

Six concrete contributions advancing the state of lunar autonomous perception research.

Novel Lunar Datasets
Stereo ground-truth depth for real Chang'e-3 imagery and the novel CHERI dark-analog dataset (extreme polar-style lighting). The benchmark also incorporates LuSNAR, an existing public simulator-based dataset providing stereo images, dense depth, and semantic masks across five geological classes.
Comprehensive Zero-Shot Benchmark
First systematic evaluation of 14 contemporary MDE architectures, covering both metric and relative paradigms, across 6 datasets without dataset-specific tuning.
Failure Mode Analysis
Region-wise evaluation across craters, rocks, shadows, and flat terrain with scale drift characterization across depth ranges. Reveals concrete model weaknesses for deployment planning.
Downstream Pose Estimation
Integrates depth quality metrics with relative pose estimation for rover navigation, bridging the gap between raw accuracy numbers and practical operational utility.
Open Development Suite
Full release of evaluation scripts, LoRA fine-tuning code, dataset processing tools, multi-GPU inference, and visualization utilities.
Efficiency Profiling
Detailed profiling of FPS, VRAM consumption, and TFLOPs alongside depth accuracy, offering practical deployment guidance for power- and memory-constrained rover hardware.
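Stereo ground truth rests on the rectified-stereo relation Z = f·B/d (focal length f in pixels, baseline B in meters, disparity d in pixels). The helper below is a hypothetical sketch of that conversion, not LuMon's actual ground-truth pipeline, whose stereo matcher and filtering steps are not described here. It marks non-positive disparities as invalid and drops depths beyond a maximum range, analogous to the per-dataset maximum depths reported for CHERI (17 m) and Chang'e-3 (25 m). `disparity_to_depth`, `focal_px`, `baseline_m`, and `max_depth_m` are illustrative names.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, max_depth_m=None):
    """Convert a rectified-stereo disparity map (pixels) to metric depth (meters).

    Hypothetical helper: applies Z = f * B / d. Pixels with non-positive
    disparity, or with depth beyond max_depth_m, become NaN (invalid GT).
    """
    d = np.asarray(disparity, dtype=np.float64)
    depth = np.full(d.shape, np.nan)
    valid = d > 0  # zero/negative disparity: no match or infinitely far
    depth[valid] = focal_px * baseline_m / d[valid]
    if max_depth_m is not None:
        # Far points have tiny disparities, so their depth is least reliable.
        with np.errstate(invalid="ignore"):
            depth[depth > max_depth_m] = np.nan
    return depth
```

Because depth error grows roughly quadratically with distance for a fixed disparity error, capping the range is a common way to keep only trustworthy stereo ground truth.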
Datasets

Benchmark Suite - 6 Datasets

From simulation and Earth analogs to real Chang'e-3 mission imagery, LuMon aggregates six datasets for cross-domain lunar MDE evaluation.

LunarSim
Synthetic
Physics-based lunar simulation with accurate shadow rendering, crater morphology, and rock distributions. Standard orbital lighting conditions.
1280×720 Dense GT Rel. depth
LuSNAR
Synthetic
Public simulator-based dataset providing stereo images, dense depth, and semantic masks across five geological classes.
1024×1024 Dense GT Seg. masks
Etna-LRNT
Earth Analog
Volcanic terrain analog with SGM stereo-derived ground truth. Reveals why Earth analogs can be deceptive proxies for true lunar performance.
1292×964 Dense GT 15 m max
Etna-S3LI
Earth Analog
LiDAR-derived ground truth from volcanic terrain. Enables precise geometric error analysis of crater rims and rock edges.
688×512 Sparse GT 30 m max
CHERI ★
Novel
Our novel dark-analog environment with extreme polar lighting. Captures permanently shadowed region conditions that cause catastrophic failure in relative methods.
1280×720 Dense GT 17 m max
Chang'e-3
In-situ
Authentic lunar surface data from China's Chang'e-3 mission. The only real lunar dataset in the benchmark, providing the decisive test of domain transfer from simulation and Earth analogs.
27 cm/px Dense GT 25 m max
LuMon benchmark dataset comparison

| Dataset | Domain | GT Type | Resolution | Max Depth | Lighting | Status |
|---|---|---|---|---|---|---|
| LunarSim | Synthetic | Dense | 1280×720 | n/a (rel. depth) | Standard | Existing |
| LuSNAR | Synthetic | Dense | 1024×1024 | 50 m | Varied | Existing |
| Etna-LRNT | Earth Analog | Dense (Stereo) | 1292×964 | 15 m | Natural | Existing |
| Etna-S3LI | Earth Analog | Sparse (LiDAR) | 688×512 | 30 m | Natural | Existing |
| CHERI | Dark Analog | Dense (Stereo) | 1280×720 | 17 m | Extreme | Novel ★ |
| Chang'e-3 | Real Lunar | Dense (Stereo) | 27 cm/px | 25 m | Harsh | Mission Data |
Videos

Side-by-Side Model Comparisons

All models are visualized with consistent depth colormaps. The comparisons reveal stark differences in how models handle craters, shadows, and low-texture regions.

LuSNAR (Synthetic)

CHERI (Novel Dataset)

Etna-LRNT (Earth Analog)

Etna-S3LI (Earth Analog)

Evaluation Framework

11+ Evaluation Metrics

Holistic evaluation covering depth quality, structural fidelity, downstream navigation, and computational cost.

Depth Quality
δ₁, δ₂, δ₃ - Threshold accuracy at 1.25×, 1.25²×, 1.25³×
A.Rel - Absolute relative error
Sq.Rel - Squared relative error
RMSE - Root mean squared error (m)
MAE - Mean absolute error (m)
SILog - Scale-invariant log error
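The depth-quality metrics above follow standard definitions, with δₖ being the fraction of valid pixels whose symmetric ratio max(d/d*, d*/d) is below 1.25ᵏ. The function below is a minimal NumPy sketch of these definitions, not LuMon's evaluation script. It assumes predictions are already in metric units (relative models would first need alignment) and uses one common SILog variant, the KITTI-style 100·sqrt(Var[log d − log d*]). The paper's exact SILog form and validity masking are assumptions here.

```python
import numpy as np

def depth_metrics(pred, gt, min_depth=1e-3):
    """Common monocular-depth metrics, computed only on valid GT pixels."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Ignore missing GT (e.g. gaps in sparse LiDAR) and non-positive depths.
    valid = np.isfinite(gt) & np.isfinite(pred) & (gt > min_depth) & (pred > min_depth)
    p, g = pred[valid], gt[valid]

    ratio = np.maximum(p / g, g / p)  # symmetric ratio used by the delta thresholds
    diff = p - g                      # signed error in meters
    log_diff = np.log(p) - np.log(g)
    return {
        "delta1": np.mean(ratio < 1.25),
        "delta2": np.mean(ratio < 1.25 ** 2),
        "delta3": np.mean(ratio < 1.25 ** 3),
        "abs_rel": np.mean(np.abs(diff) / g),
        "sq_rel": np.mean(diff ** 2 / g),
        "rmse": np.sqrt(np.mean(diff ** 2)),
        "mae": np.mean(np.abs(diff)),
        # SILog variant (KITTI-style): 100 * sqrt(Var[log pred - log gt]).
        "silog": 100.0 * np.sqrt(max(np.mean(log_diff ** 2) - np.mean(log_diff) ** 2, 0.0)),
    }
```

A prediction that is uniformly twice the ground truth scores δ₁ = 0 and A.Rel = 1, but its SILog is 0. This is why SILog is reported separately as a scale-invariant measure.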
Pose Estimation
Median Rotation Error - Angular pose error (°)
Median Translation Error - Positional pose error (m)
AUC - Area under pose accuracy curve
Downstream rover navigation validation
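The pose metrics compare estimated relative poses with ground truth. Below is a minimal sketch under stated assumptions. Rotation error is the geodesic angle of R_predᵀR_gt, and translation error is the Euclidean distance between metric translations. The AUC follows a convention widely used in pose benchmarks: recall as a function of error, integrated up to a threshold and normalized to [0, 1]. This page does not specify which error feeds the AUC or what thresholds LuMon uses, and the function names are hypothetical.

```python
import numpy as np

def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle (degrees) between two 3x3 rotation matrices."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def translation_error_m(t_pred, t_gt):
    """Euclidean distance (meters) between metric translation vectors."""
    return float(np.linalg.norm(np.asarray(t_pred) - np.asarray(t_gt)))

def pose_auc(errors, threshold):
    """Normalized area under the recall-vs-error curve on [0, threshold], threshold > 0."""
    errors = np.sort(np.asarray(errors, dtype=np.float64))
    recall = np.arange(1, len(errors) + 1) / len(errors)
    errors = np.concatenate([[0.0], errors])
    recall = np.concatenate([[0.0], recall])
    last = np.searchsorted(errors, threshold)   # errors strictly below the threshold
    e = np.concatenate([errors[:last], [threshold]])
    r = np.concatenate([recall[:last], [recall[last - 1]]])
    # Trapezoidal integration; a method with zero error everywhere scores 1.0.
    area = np.sum((e[1:] - e[:-1]) * (r[1:] + r[:-1]) / 2.0)
    return float(area / threshold)
```

The median errors reported in the benchmark would then be `np.median` over the per-pair values returned by the first two functions.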
Efficiency
FPS - Inference speed on a standard GPU
VRAM - Peak GPU memory footprint (GB)
TFLOPs - Computational complexity
Practical rover deployment analysis
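FPS numbers are only comparable under a fixed timing protocol. The function below is a minimal, framework-agnostic sketch. It assumes a warm-up phase and a caller-supplied synchronization hook, because GPU kernels run asynchronously and the clock must not stop before queued work finishes. LuMon's actual batch size, input resolution, and warm-up count are not stated here, and `measure_fps` is a hypothetical name.

```python
import time
from typing import Any, Callable, Optional, Sequence

def measure_fps(model_fn: Callable[[Any], Any], inputs: Sequence[Any], warmup: int = 5,
                sync: Optional[Callable[[], None]] = None) -> float:
    """Average frames per second of model_fn over inputs, measured after warm-up."""
    if not inputs:
        raise ValueError("need at least one input")
    for x in inputs[:warmup]:
        model_fn(x)  # warm-up: kernel selection, caches, lazy allocations
    if sync is not None:
        sync()       # drain pending device work so it is not timed below
    start = time.perf_counter()
    for x in inputs:
        model_fn(x)
    if sync is not None:
        sync()       # wait for asynchronous work before stopping the clock
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed
```

For a PyTorch model on CUDA, `torch.cuda.synchronize` is the natural `sync` hook. Peak memory can be read by calling `torch.cuda.reset_peak_memory_stats()` before the loop and `torch.cuda.max_memory_allocated()` after it. That value counts tensor memory managed by PyTorch's caching allocator, not the whole process footprint.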
Results

Quantitative Findings

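Metric depth models provide the strongest zero-shot baselines overall. Fine-tuned DAv2 (DAv2-FT) improves substantially over its DAv2 base on both synthetic datasets and is best on LuSNAR, but these gains do not carry over to real Chang'e-3 imagery.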

Foundation Dominance

Metric foundation models preserve internal linearity for stable 3D geometry and provide the strongest zero-shot baseline for lunar topography.
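One way to make "internal linearity" concrete is to fit a least-squares scale s and shift t so that s·pred + t ≈ gt on valid pixels, then measure how much of the ground-truth variance that single line explains (R²). The sketch below assumes this formulation, with the fit done in depth space. Many benchmarks instead align relative predictions in inverse-depth space, and this is not necessarily the alignment LuMon applies. `fit_scale_shift` is a hypothetical name.

```python
import numpy as np

def fit_scale_shift(pred, gt, mask=None):
    """Least-squares (s, t) minimizing ||s * pred + t - gt||^2 over valid pixels.

    Returns (s, t, r2); r2 near 1 means pred is close to an affine
    function of the ground-truth depth.
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    gt = np.asarray(gt, dtype=np.float64).ravel()
    valid = np.isfinite(pred) & np.isfinite(gt)
    if mask is not None:
        valid &= np.asarray(mask, dtype=bool).ravel()
    p, g = pred[valid], gt[valid]
    A = np.stack([p, np.ones_like(p)], axis=1)        # design matrix [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    residual = g - (s * p + t)
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((g - g.mean()) ** 2)
    return float(s), float(t), float(r2)
```

The same fitted (s, t) can also serve as a standard way to map affine-invariant predictions to metric depth before computing the metrics in the table.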

Topological Hazards

Complex surface geometries like craters and rocks degrade zero-shot depth estimation significantly more than severe lunar shading.

Sim-to-Real Gap

While fine-tuning bridges the sim-to-sim gap, it fails to resolve the broader sim-to-real shift, showing that simulated environments alone are insufficient.
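The development suite includes LoRA fine-tuning code, and LoRA is one standard way to adapt a foundation model to synthetic lunar data at low cost. The NumPy class below is a minimal sketch of the low-rank adaptation idea, not LuMon's training code. A frozen weight W receives a trainable update (alpha/r)·B·A, and B is initialized to zero so the adapted layer starts out identical to the pretrained one. `LoRALinear`, the rank, alpha, and the init scale are illustrative assumptions.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer y = x W^T + b plus a low-rank update (alpha / r) * x A^T B^T."""

    def __init__(self, weight, bias, rank=8, alpha=16.0, seed=0):
        rng = np.random.default_rng(seed)
        out_features, in_features = weight.shape
        self.weight = weight          # frozen pretrained weights, shape (out, in)
        self.bias = bias              # frozen bias, shape (out,)
        self.scale = alpha / rank
        # Only A and B would be trained: rank * (in + out) parameters.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_features))  # small random init
        self.B = np.zeros((out_features, rank))                   # zero init: no change at start

    def __call__(self, x):
        base = x @ self.weight.T + self.bias
        update = (x @ self.A.T) @ self.B.T * self.scale
        return base + update

    def merged_weight(self):
        """Fold the adapter into W for deployment, so inference cost is unchanged."""
        return self.weight + self.scale * self.B @ self.A
```

Because the update can be merged into W after training, a LoRA-adapted model keeps the same FPS and VRAM profile as its base model at inference time.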

Overall Evaluation Across Datasets

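| Model | LunarSim δ₁ ↑ | A.Rel ↓ | RMSE ↓ | LuSNAR δ₁ ↑ | A.Rel ↓ | RMSE ↓ | Etna-LRNT δ₁ ↑ | A.Rel ↓ | RMSE ↓ | Etna-S3LI δ₁ ↑ | A.Rel ↓ | RMSE ↓ | Chang'e-3 δ₁ ↑ | A.Rel ↓ | RMSE ↓ | CHERI δ₁ ↑ | A.Rel ↓ | RMSE ↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DAv2-FT (Ours) | 0.84 | 0.13 | 13.96 | 0.93 | 0.07 | 1.07 | 1.00 | 0.05 | 0.13 | 0.84 | 0.13 | 1.30 | 0.92 | 0.08 | 0.70 | 0.36 | 0.50 | 3.90 |
| DepthAnything-v2 | 0.56 | 0.26 | 15.18 | 0.31 | 0.45 | 2.89 | 1.00 | 0.06 | 0.15 | 0.88 | 0.12 | 1.22 | 0.95 | 0.06 | 0.56 | 0.32 | 0.54 | 4.14 |
| DepthAnything-v3 | 0.63 | 0.22 | 14.15 | 0.53 | 0.29 | 1.99 | 1.00 | 0.04 | 0.11 | 0.91 | 0.10 | 1.12 | 0.96 | 0.05 | 0.44 | 0.34 | 0.51 | 3.97 |
| MoGe | 0.77 | 0.15 | 12.17 | 0.53 | 0.29 | 1.80 | 0.99 | 0.05 | 0.15 | 0.91 | 0.10 | 1.08 | 0.95 | 0.06 | 0.45 | 0.32 | 0.54 | 4.12 |
| DepthPro | 0.50 | 0.29 | 17.57 | 0.33 | 0.45 | 2.70 | 0.73 | 241.71 | 450.05 | 0.89 | 0.11 | 1.11 | 0.89 | 0.10 | 0.89 | 0.32 | 0.55 | 4.18 |
| Metric3Dv2 | 0.75 | 0.16 | 12.53 | 0.61 | 0.25 | 1.71 | 1.00 | 0.03 | 0.10 | 0.87 | 0.12 | 1.19 | 0.95 | 0.05 | 0.43 | 0.33 | 0.53 | 4.09 |
| UniDepth | 0.76 | 0.21 | 16.16 | 0.63 | 0.26 | 2.14 | 1.00 | 0.03 | 0.07 | 0.87 | 0.12 | 1.20 | 0.94 | 0.07 | 0.54 | 0.35 | 0.51 | 3.99 |
| MapAnything | 0.86 | 0.11 | 10.73 | 0.74 | 0.17 | 1.53 | 1.00 | 0.03 | 0.08 | 0.91 | 0.10 | 1.05 | 0.95 | 0.07 | 0.52 | 0.36 | 0.50 | 3.91 |
| VDA | 0.75 | 0.16 | 13.94 | 0.77 | 0.17 | 2.05 | 1.00 | 0.04 | 0.12 | 0.83 | 0.14 | 1.34 | 0.93 | 0.08 | 0.68 | 0.43 | 0.44 | 3.56 |
| MetricAnything | 0.79 | 0.15 | 12.08 | 0.51 | 0.30 | 1.83 | 0.98 | 0.07 | 0.22 | 0.91 | 0.10 | 1.05 | 0.95 | 0.06 | 0.44 | 0.32 | 0.55 | 4.17 |
| DepthAnything-AC | 0.70 | 0.28 | 19.12 | 0.70 | 0.26 | 2.63 | 1.00 | 0.05 | 0.12 | 0.87 | 0.12 | 1.19 | 0.66 | 0.22 | 1.55 | 0.55 | 0.35 | 3.09 |
| DepthCrafter | 0.40 | 0.37 | 20.28 | 0.41 | 0.47 | 2.96 | 0.92 | 0.10 | 0.28 | 0.80 | 0.15 | 1.44 | 0.87 | 0.13 | 1.05 | 0.33 | 0.53 | 4.08 |
| Lotus | 0.59 | 0.27 | 20.43 | 0.36 | 0.44 | 3.08 | 0.65 | 0.19 | 0.49 | 0.80 | 0.15 | 1.45 | 0.70 | 0.19 | 1.64 | 0.35 | 0.49 | 3.90 |
| MiDaS | 0.78 | 0.17 | 14.97 | 0.81 | 0.13 | 1.68 | 1.00 | 0.04 | 0.12 | 0.89 | 0.11 | 1.15 | 0.81 | 0.15 | 1.12 | 0.40 | 0.46 | 3.71 |
| Marigold | 0.67 | 0.21 | 17.25 | 0.62 | 0.24 | 2.02 | 0.94 | 0.10 | 0.27 | 0.86 | 0.12 | 1.26 | 0.91 | 0.09 | 0.69 | 0.38 | 0.47 | 3.75 |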

↑ higher is better, ↓ lower is better. Gold indicates best, Silver indicates second best.

For extensive quantitative results, see the paper PDF.

Failure Analysis

Key Challenges for Lunar Deployment

Region-wise analysis identifies primary failure modes and their relative severity for autonomous rover depth perception.

Crater Geometry
Sharp depth discontinuities at crater rims and concave interiors cause a 2.05× increase in error. Models lack lunar-specific geometric priors for these non-terrestrial shapes.
Extreme Shadows
Permanently shadowed polar regions cause catastrophic failure in relative methods, while metric models are 2–3× more robust. The CHERI dataset specifically targets this failure mode.
Rock & Boulder Fields
Scattered boulders create complex occlusion and high-frequency depth variations that challenge standard encoder architectures trained primarily on structured terrestrial scenes.
Scale Drift
Models show non-linear depth distortion across distance ranges. Without domain-specific fine-tuning on lunar data, performance degrades progressively beyond 15 m.
Low-Texture Regolith
Featureless lunar soil provides minimal photometric cues. Models must rely on shading gradients, which break down under oblique, high-contrast lunar illumination.
Domain Transfer Gap
Sim-to-real transfer remains limited. Earth analog datasets (Etna) show deceptively good performance that fails to generalize to actual lunar imagery (Chang'e-3).
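Region-wise and depth-range analysis amounts to computing the same metric over subsets of pixels. The sketch below is minimal and makes several assumptions. Boolean region masks are taken to be available, for instance derived from LuSNAR's semantic masks. AbsRel stands in for whichever metric is reported, and the helper names are hypothetical. An error-increase factor such as the 2.05× quoted for craters would then be a ratio of two such region-wise errors, although the reference region is not specified here.

```python
import numpy as np

def masked_abs_rel(pred, gt, mask):
    """AbsRel restricted to pixels where mask is True and GT is valid; NaN if none."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = np.asarray(mask, dtype=bool) & np.isfinite(gt) & (gt > 0) & np.isfinite(pred)
    if not valid.any():
        return float("nan")
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

def region_and_range_report(pred, gt, region_masks, depth_edges):
    """AbsRel per named region mask and per ground-truth depth bin [lo, hi)."""
    report = {name: masked_abs_rel(pred, gt, m) for name, m in region_masks.items()}
    gt = np.asarray(gt, dtype=np.float64)
    for lo, hi in zip(depth_edges[:-1], depth_edges[1:]):
        in_bin = (gt >= lo) & (gt < hi)  # bins expose scale drift with distance
        report[f"{lo:g}-{hi:g} m"] = masked_abs_rel(pred, gt, in_bin)
    return report
```

Comparing, for example, the `"0-15 m"` and `"15-50 m"` entries across models is one way to characterize the degradation beyond 15 m described under Scale Drift.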
Citation

BibTeX

If you find LuMon useful in your research, please consider citing our work.

@inproceedings{sekmen2026lumon,
  title={LuMon: A Comprehensive Benchmark and Development Suite with Novel Datasets for Lunar Monocular Depth Estimation},
  author={Aytac Sekmen and Fatih Gunes and Furkan Horoz and Umut Isik and Alp Ozaydin and Altay Topaloglu and Umutcan Ustundas and Alp Yeni and Ersin Soken and Erol Sahin and Gokberk Cinbis and Sinan Kalkan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  year={2026}
}