Overview
Explore the latest enhancements to PySpark's DataFrame API in this 39-minute conference talk, which introduces new features for more expressive and modular data workflows. Learn how to use table-valued functions (TVFs) through Python User-Defined Table Functions (UDTFs), including support for polymorphism that allows flexible function definitions. Discover the new subquery API, which simplifies complex analytical logic, and see how lateral joins tie these features together. Get to know practical developer tools, including built-in plotting for data visualization and profiling tools for performance optimization, and preview upcoming features such as UDF logging and a Python-native data source API. The session covers both foundational concepts and advanced techniques, making it useful for developers building production pipelines for analytics or AI applications and for those extending PySpark with custom implementations.
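For readers unfamiliar with Python UDTFs, the following is a minimal sketch of the idea rather than material taken from the talk. It assumes PySpark 3.5 or later, where `pyspark.sql.functions.udtf` is available. The class name `SplitWords`, the helper `register_split_words`, and the SQL name `split_words` are hypothetical and chosen only for illustration.

```python
# A UDTF handler is a plain Python class whose eval() yields zero or more
# output rows (as tuples) for each call, so it returns a table, not a scalar.
class SplitWords:  # hypothetical example name, not from the talk
    def eval(self, text: str):
        for word in text.split():
            yield (word, len(word))


def register_split_words(spark):
    """Wrap the handler as a UDTF and register it on an existing SparkSession.

    Assumption: PySpark 3.5+ is installed; `spark` is a live SparkSession.
    """
    from pyspark.sql.functions import udtf

    split_words = udtf(SplitWords, returnType="word: string, length: int")
    spark.udtf.register("split_words", split_words)
    return split_words


# The row-generating logic can be checked without starting Spark.
rows = list(SplitWords().eval("hello big world"))
print(rows)
```

Once registered, such a function could be called from SQL in the FROM clause, for example `SELECT * FROM split_words('hello big world')`, or applied per row of another table with a lateral join along the lines of `SELECT d.id, w.word FROM docs d, LATERAL split_words(d.text) w`, where `docs` is an assumed table with `id` and `text` columns.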
Syllabus
What’s New in PySpark: TVFs, Subqueries, Plots, and Profilers
Taught by
Databricks