This IEEE conference talk explores a novel approach to predicting ray pointer landing poses in virtual reality using multimodal LSTM-based neural networks. Discover how researchers from Xi'an Jiaotong-Liverpool University, Hong Kong University of Science and Technology, Simon Fraser University, and Tsinghua University tackle the crucial challenge of target selection in VR interaction. Learn about their prediction heuristics for enhancing user experience: an LSTM neural network that processes time-series data from three channels, namely hand movements, head-mounted display positioning, and eye tracking. Examine the research methodology, which involved two studies: one gathering motion data to identify the optimal combination of modalities, and another validating raycasting selection across various distances, angles, and target sizes. The presentation reports prediction accuracy within 4.6° and improvements of 3.5 and 1.9 times over a baseline and a prior kinematics-based method, respectively. Part of the "Enhancing Interaction and Feedback in Virtual and Cross-Reality Systems" session at IEEE VR 2025.
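To make the three-channel setup concrete, here is a minimal sketch of how such a multimodal pipeline might look: hand, head-mounted-display, and eye-gaze time series are fused per frame and fed through an LSTM, with a linear readout producing a 2-D landing direction. The feature dimensions, sequence length, hidden size, and random weights are illustrative assumptions, not the authors' actual architecture or trained model.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step; gates stacked as [input, forget, cell, output]."""
    H = h.shape[0]
    z = W @ x + U @ h + b                 # pre-activations for all four gates, shape (4H,)
    i = 1 / (1 + np.exp(-z[:H]))          # input gate
    f = 1 / (1 + np.exp(-z[H:2*H]))       # forget gate
    g = np.tanh(z[2*H:3*H])               # candidate cell state
    o = 1 / (1 + np.exp(-z[3*H:]))        # output gate
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
# Assumed per-frame features: pose as position + quaternion (7-D), gaze as a 3-D direction.
hand = rng.normal(size=(30, 7))   # 30 frames of controller/hand pose
hmd  = rng.normal(size=(30, 7))   # 30 frames of head-mounted-display pose
eye  = rng.normal(size=(30, 3))   # 30 frames of eye-gaze direction

# Early fusion: concatenate the three modality channels at each time step.
x_seq = np.concatenate([hand, hmd, eye], axis=1)

D, H = x_seq.shape[1], 32
W = rng.normal(scale=0.1, size=(4 * H, D))   # input-to-gate weights (random stand-ins)
U = rng.normal(scale=0.1, size=(4 * H, H))   # recurrent weights
b = np.zeros(4 * H)

h = c = np.zeros(H)
for x in x_seq:                    # unroll the LSTM over the motion sequence
    h, c = lstm_step(x, h, c, W, U, b)

# Linear readout from the final hidden state to a 2-D landing direction (e.g. yaw, pitch).
W_out = rng.normal(scale=0.1, size=(2, H))
pred = W_out @ h
print(pred.shape)  # (2,)
```

In a trained system the weights would of course be learned from the collected motion data; the point of the sketch is only the data flow, with the modalities fused before the recurrent layer rather than processed by separate encoders.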