SF Express's Journey with Apache Spark and Gluten

Learn how SF Express, one of China's largest logistics companies, used Apache Spark and Gluten to speed up its big data processing in this conference talk from the Data Lake & Data Warehouse Track. Discover the technical challenges SF Express faced in handling massive volumes of logistics data, and explore how the team deployed Gluten, a plugin that offloads Spark SQL execution to native vectorized engines, to accelerate its Apache Spark workloads. Gain insights into the performance improvements the integration delivered, including query optimization strategies and real-world benchmarking results. Understand the architectural decisions made during the migration and the best practices for running Gluten in production. Explore the use cases where vectorized execution delivered the largest gains for logistics data processing, including shipment tracking, route optimization, and customer analytics. Examine how SF Express engineers collaborated with Intel developers to tune the solution for these workloads, and how that partnership fed performance enhancements and bug fixes back into the broader Apache Spark ecosystem.