A multi-view, synchronized dataset of surgical hand gestures captured from expert surgeons and medical students at Geneva University Hospital.
Geneva University Hospital · University of Geneva
Our method consists of three main components: (1) a multi-view synchronized dataset collection pipeline, (2) a gesture recognition framework with multi-view fusion and knowledge distillation for efficient single-view deployment, and (3) an LLM-based feedback module (SurgEventRAG) that generates actionable coaching feedback for medical students based on recognized gesture sequences.
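To make the second component concrete, below is a minimal PyTorch sketch of the multi-view fusion and knowledge-distillation idea: a teacher classifies gestures from all five synchronized views, and a single-view student is trained against the teacher's softened predictions for efficient deployment. The module names, feature sizes, and mean-pooling fusion strategy are illustrative assumptions, not the released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_GESTURES = 15  # frame-level gesture primitives, per the dataset description

class ViewEncoder(nn.Module):
    """Per-view frame encoder; a tiny stand-in for any CNN/transformer backbone."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, x):  # x: (B, 3, H, W) RGB frames from one camera
        return self.backbone(x)

class MultiViewTeacher(nn.Module):
    """Encodes each of the five views and mean-pools features before classifying."""
    def __init__(self, num_views: int = 5, feat_dim: int = 256):
        super().__init__()
        self.encoders = nn.ModuleList(ViewEncoder(feat_dim) for _ in range(num_views))
        self.head = nn.Linear(feat_dim, NUM_GESTURES)

    def forward(self, views):  # views: list of (B, 3, H, W) tensors, one per camera
        fused = torch.stack([enc(v) for enc, v in zip(self.encoders, views)]).mean(dim=0)
        return self.head(fused)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Standard knowledge distillation: soften teacher predictions with
    temperature T and mix the KL term with the hard-label cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: distill the fused teacher into a single-view student.
views = [torch.randn(2, 3, 64, 64) for _ in range(5)]
labels = torch.randint(0, NUM_GESTURES, (2,))
teacher = MultiViewTeacher()
student = nn.Sequential(ViewEncoder(), nn.Linear(256, NUM_GESTURES))
with torch.no_grad():
    teacher_logits = teacher(views)
loss = distillation_loss(student(views[0]), teacher_logits, labels)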
Five cameras capturing simultaneous viewpoints, enabling multi-view and single-view recognition benchmarks.
Recorded at Geneva University Hospital with expert surgeons and medical students performing incision and suturing tasks.
Frame-level gesture labels, skill-level annotations, and synchronized temporal segments across all views (see the loading sketch below).
Covers the full expertise spectrum — from novice medical students to board-certified expert surgeons.
Designed to support downstream tasks: gesture recognition, skill assessment, and LLM-based feedback generation.
Incision and suturing — covering distinct fine motor skills critical to surgical training curricula.
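For concreteness, the following sketch shows one way the synchronized recordings and frame-level labels could be consumed. The directory layout, file names, and CSV annotation format are assumptions for illustration, not the dataset's released structure.

from pathlib import Path
import csv

def load_trial(root: str, participant: str, task: str, trial: int, num_views: int = 5):
    """Collect the per-view video paths and frame-level gesture labels
    for one (participant, task, trial) recording."""
    trial_dir = Path(root) / participant / task / f"trial_{trial}"
    # One synchronized video file per camera (hypothetical naming scheme).
    videos = [trial_dir / f"view_{v}.mp4" for v in range(num_views)]
    # One gesture label per synchronized frame index (hypothetical CSV columns).
    labels = []
    with open(trial_dir / "gestures.csv") as f:
        for row in csv.DictReader(f):  # assumed columns: frame, gesture
            labels.append((int(row["frame"]), row["gesture"]))
    return videos, labels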
In surgical training for medical students, proficiency development relies on expert-led skill assessment, which is costly, time-limited, and difficult to scale, and whose expertise remains confined to institutions with available specialists. Automated AI-based assessment offers a viable alternative, but progress is constrained by the lack of datasets containing realistic trainee errors and the multi-view variability needed to train robust computer vision approaches. To address this gap, we present Surgical-Hands (SHANDS), a large-scale multi-view video dataset for surgical hand-gesture and error recognition for medical training. SHANDS captures linear incision and suturing performed by 52 participants (20 experts and 32 trainees), each completing three standardized trials per procedure, recorded by five RGB cameras from complementary viewpoints. The videos are annotated at the frame level with 15 gesture primitives and include a validated taxonomy of 8 trainee error types, enabling both gesture recognition and error detection. We further define standardized evaluation protocols for single-view, multi-view, and cross-view generalization, and benchmark state-of-the-art deep learning models on the dataset. SHANDS will be publicly released to support the development of robust and scalable AI systems for surgical training grounded in clinically curated domain knowledge.
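As one concrete reading of the three evaluation regimes named above, the sketch below enumerates a plausible set of camera splits. The exact protocol definitions are those specified in the paper; the split choices here are illustrative assumptions only.

VIEWS = [0, 1, 2, 3, 4]  # the five synchronized RGB cameras

def protocol_splits():
    return {
        # train and test on the same single camera, one configuration per view
        "single_view": [{"train_views": [v], "test_views": [v]} for v in VIEWS],
        # train and test with all five views available for fusion
        "multi_view": [{"train_views": VIEWS, "test_views": VIEWS}],
        # hold out one camera entirely to measure viewpoint generalization
        "cross_view": [{"train_views": [v for v in VIEWS if v != held],
                        "test_views": [held]}
                       for held in VIEWS],
    }

for name, splits in protocol_splits().items():
    print(f"{name}: {len(splits)} configuration(s)")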
The SHANDS dataset will be made available to the research community for academic and non-commercial research purposes in June 2026. Downloading will require completing a data access agreement.
If you use our dataset or models in your research, please cite:
@inproceedings{le2026surggest,
  title     = {SHANDS: A Multi-View Dataset and Benchmark for Surgical Hand-Gesture and Error Recognition Toward Medical Training},
  author    = {Le, Ma and Freitas dos Santos, Thiago and Magnenat-Thalmann, Nadia and Wac, Katarzyna},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}
Funding: Supported by IDS (100.133 IP-ICT) and INDUX-R (GA No. 101135556; DOI: 10.3030/101135556). Funded by the European Union and the Swiss State Secretariat for Education, Research and Innovation (SERI). Disclaimer: Opinions expressed are the authors' alone and do not necessarily represent the EU or CINEA. Neither the EU nor the granting authority is responsible for the content.