By Dusty — Dec 14, 2023

The wrong tool - Clojure and Spark

I try to use the right tool for the job, but sometimes I grab the wrong one. One of the worst was my decision to use Clojure for Spark jobs.

Spark runs on the JVM, like Clojure. Clojure is functional, and it felt like a great fit for Spark jobs. The team had Clojure experience, but no Scala and not much Java.

Worst case, we could wrap any needed Java classes in Clojure, right?

Initially things worked pretty well. Getting set up and going took some work, but then it felt like we were on the right path.

As our jobs became more complex, cracks started to show. I don't remember all the details, but one was that debugging a failed job was really hard, because there was no support for translating the Java stack trace back to the Clojure code.

Another issue was that working with Clojure in Jupyter notebooks was really hard or impossible. We often explored the data with Python and then translated to Clojure for production. That's a big translation!

We persisted, mostly because the sunk cost fallacy got me. I should have re-evaluated when early signs of friction started to show up.

But we had mostly working code, we knew Clojure, etc., etc. The bottom line was that it wasn't the tool we should have been using.

Two lessons jumped out at me.

Make sure the language I want to use is well-supported by the ecosystem. Even though Clojure was a good fit for a number of reasons, it wasn't a fit in the Spark ecosystem, which meant an uphill battle for us.

And pay attention to friction early, and remember the sunk cost fallacy. Maybe the friction wasn't addressable, but I should have re-evaluated. I think switching languages would have made development and debugging faster, and that alone would have been worth it.
