## lib.jar: Java library? Python package? Both?

I’ve recently started working quite a bit with Spark and have found that there’s not much guidance on best practices for packaging and deploying libraries and apps on Spark. I’m planning to write a series of posts on Spark packaging and app deployment as we find patterns that work for the data platform at Mozilla.

Spark is written in Scala, but provides client libraries for Scala, Java, Python, and a few other languages. At Mozilla, we tend to write our large-scale ETL jobs in Scala, but most of the folks interacting with Spark are doing so in Python via notebooks like Jupyter, so we generally need to support libraries in both Scala and Python. This post focuses on how Python bindings can be packaged and deployed alongside Java/Scala code.

## A Change Data Capture Pipeline From PostgreSQL to Kafka

Originally posted on the Simple engineering blog; also presented at PGConf US 2017 and Ohio LinuxFest 2017

We previously wrote about a pipeline for replicating data from multiple siloed PostgreSQL databases to a data warehouse in Building Analytics at Simple, but we knew that pipeline was only the first step. This post details a rebuilt pipeline that captures a complete history of data-changing operations in near real-time by hooking into PostgreSQL’s logical decoding feature. The new pipeline powers not only a higher-fidelity warehouse, but also user-facing features.

## Building Analytics at Simple

Originally posted on the Simple engineering blog; also presented at PyOhio 2016 and CloudDevelop 2016

Early in 2014, Simple was a mid-stage startup with only a single analytics-focused employee. When we wanted to answer a question about customer behavior or business performance, we would have to query production databases. Everybody in the company wanted to make informed decisions, from engineering to product strategy to business development to customer relations, so it was clear that we needed to build a data warehouse and a team to support it.

## A Search for Exotic Particles

My Ph.D. dissertation, performed at the University of Wisconsin-Madison.

Abstract: A search for exotic particles decaying via WZ to final states with electrons and muons is performed using a data sample of pp collisions collected at 7 TeV center-of-mass energy by the CMS experiment at the LHC, corresponding to an integrated luminosity of 4.98 inverse femtobarns. A cross section measurement for the production of WZ is also performed on a subset of the collision data. No significant excess is observed over the Standard Model background, so lower bounds at the 95% confidence level are set on the production cross sections of hypothetical particles decaying to WZ in several theoretical scenarios. Assuming the Sequential Standard Model, W’ bosons with masses below 1143 GeV are excluded. New limits are also set for several configurations of Low-Scale Technicolor.

## Some LHC Calculations

The LHC is currently running at 7 TeV, giving a relativistic gamma factor of 3730: $$E = \gamma mc^2 \rightarrow \gamma = {E \over mc^2} \rightarrow \gamma(3.5\,\text{TeV}) = 3730$$ This means the protons are moving at 99.999996% of the speed of light: $$\gamma = {1 \over \sqrt{1 - \beta^2}} \rightarrow \beta = \sqrt{1 - {1 \over \gamma^2}} \approx 0.99999996$$ Alright, now let’s look at the collision rate, assuming a recent luminosity of 1e33 cm⁻²s⁻¹:
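The two formulas above are easy to check numerically. A minimal Python sketch (the proton rest energy of 938.272 MeV is taken from the PDG; the function names are my own):

```python
import math

# Proton rest energy m*c^2, in GeV (PDG value)
M_PROTON_GEV = 0.938272

def lorentz_gamma(beam_energy_gev: float) -> float:
    """gamma = E / (m c^2) for a proton of total energy E."""
    return beam_energy_gev / M_PROTON_GEV

def beta_from_gamma(gamma: float) -> float:
    """Invert gamma = 1 / sqrt(1 - beta^2) to get beta."""
    return math.sqrt(1.0 - 1.0 / gamma**2)

gamma = lorentz_gamma(3500.0)  # 3.5 TeV per beam
beta = beta_from_gamma(gamma)
print(f"gamma ≈ {gamma:.0f}, beta ≈ {beta:.8f}")
# gamma ≈ 3730, beta ≈ 0.99999996
```

At these energies the direct `sqrt(1 - 1/gamma**2)` is fine in double precision, though for even larger gamma the expansion β ≈ 1 − 1/(2γ²) avoids cancellation.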

## Influencing Dynamics in Neural Networks

This research was performed under an NSF-funded Research Experience for Undergraduates program at Indiana University and then extended at Wittenberg University to serve as an undergraduate honors thesis. The work was overseen by John Beggs. It was later published in the 2006 issue of Wittenberg University’s non-fiction literary magazine, Spectrum.