I’ve recently started working quite a bit with Spark and have found that there’s not much guidance on best practices for packaging and deploying libraries and apps on Spark. I’m planning to write a series of posts on Spark packaging and app deployment as we find patterns that work for the data platform at Mozilla.
Spark is written in Scala, but provides client libraries for Scala, Java, Python, and a few other languages. At Mozilla, we tend to write our large-scale ETL jobs in Scala, but most of the folks interacting with Spark are doing so in Python via notebooks like Jupyter, so we generally need to support libraries in both Scala and Python. This post focuses on how Python bindings can be packaged and deployed alongside Java/Scala code.[Read More]