Twofold Sparsity: Joint Bit and Network-level Sparse Deep Neural Networks for Energy-efficient RRAM Computing
EDGE AI FOUNDATION via YouTube
Overview
Watch a technical talk exploring an approach to deep neural network optimization called Twofold Sparsity - a joint bit- and network-level sparsity method designed for energy-efficient Compute-in-Memory (CIM) architectures. Learn how this method addresses the challenges of implementing AI on edge devices by combining network sparsification using Linear Feedback Shift Register (LFSR) masks with bit-level sparsity techniques. Discover how this approach achieves energy-efficiency improvements ranging from 2.2x to 14x compared to traditional 8-bit networks, making it possible to run sophisticated deep learning models on power-constrained edge devices. Gain insights into how this solution overcomes the limitations of the traditional Von Neumann architecture and enables more efficient on-device inference for AI-powered mobile applications.
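The talk names LFSR-generated masks as the mechanism for network sparsification, but the listing gives no implementation details. As a rough illustration of the idea - using a pseudorandom bit stream from a linear feedback shift register to decide which weights to zero out - here is a minimal sketch; the seed, tap positions, and mask policy are assumptions, not the method from the talk:

```python
def lfsr_bits(seed, taps, n):
    """Fibonacci LFSR over a 16-bit state; returns n pseudorandom bits.

    Each step outputs the low bit, then shifts in the XOR of the
    tapped bit positions as the new high bit.
    """
    state = seed & 0xFFFF
    out = []
    for _ in range(n):
        fb = 0
        for t in taps:  # feedback = XOR of tapped positions
            fb ^= (state >> t) & 1
        out.append(state & 1)
        state = (state >> 1) | (fb << 15)
    return out

def lfsr_mask(weights, seed=0xACE1, taps=(15, 13, 12, 10)):
    """Zero out weights where the LFSR bit is 0 (roughly 50% sparsity).

    Seed and taps are illustrative choices, not values from the talk.
    """
    bits = lfsr_bits(seed, taps, len(weights))
    return [w if b else 0.0 for w, b in zip(weights, bits)]

weights = [0.5, -1.2, 0.3, 0.8, -0.7, 1.1, -0.4, 0.9]
masked = lfsr_mask(weights)
```

Because the mask is fully determined by the seed and tap positions, it never needs to be stored alongside the weights - the same LFSR can regenerate it on-chip at inference time, which is what makes the scheme attractive for CIM hardware.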
Syllabus
tinyML Talks: Twofold Sparsity: Joint Bit- and Network-level Sparse Deep Neural Networks for Energy-efficient RRAM Computing
Taught by
EDGE AI FOUNDATION