How To Use Selenium To Web-Scrape on AWS Lambda
Karran Besen | Jan 12, 2023
In the last few months, I worked on two projects that were similar in scope and complexity: both were API backends for mobile apps, with similar requirements and timeframes. However, one thing set them apart: one was written in Django + REST while the other was written in Clojure + GraphQL.
Since I was the main developer for both projects, I was able to make useful comparisons between both approaches. In this post, I’ll describe the pros and cons of each.
I won’t get into details about why Clojure is awesome since there are many blogs around the web that do a much better job than I could, but I’ll say this: the combination of immutability, REPL-driven development, and functional programming is unmatched in any other language in terms of a productive and fun development experience.
There are a few GraphQL options in Clojure and, after testing the popular libraries, one came out way ahead of the others: Lacinia. It follows the Clojure principles of simplicity and data over functions over macros, letting the developer compose the schema in pure(ish) EDN. It is also used for internal APIs at Walmart.
Now, this is one well-designed library. It is simple, predictable, makes extending the code base a breeze and gets out of the way when things get complex. When you need to add a new query or mutation (GraphQL parlance for endpoints), it is simply a matter of updating the schema, writing the resolver function (which is datastore-agnostic, so you can fetch from one or multiple stores or even a REST endpoint) and you’re done. Plus, the error messages are excellent (especially for a Clojure lib).
Compared to the Django project, we experienced a lot less friction between the backend and mobile teams. There were fewer arguments about which fields an endpoint should provide or how to format a response since the client can just query however it wants.
In addition, having a typed schema means there’s a contract sitting between the client and the data that guarantees the entire team is on the same page from the start. It makes it possible to introspect the schema, which is used to great advantage in the GraphQL world.
Once the schema is defined, anyone with the GraphiQL client can interactively explore it with niceties such as autocompletion, syntax highlighting and even pretty formatting. This alone is worth its weight in gold. I am a fan of auto-generated API docs, and this feature makes the project less likely to suffer from documentation drift.
Since GraphQL queries are extremely flexible, allowing the client to rename fields in the response and even fetch the entire UI’s requirements in one go, the need for UI-aware endpoints is retired.
Every commit we make is formatted according to the Angular git commit guidelines so that we can auto-generate nice changelogs. This also allowed us to generate a few stats for comparison: the Django codebase contains 0.8 fixes for every feature committed, while the Clojure codebase contains 0.2. This is a huge difference and, to be honest, I can’t attribute it to Clojure, Lacinia or GraphQL alone, but I think the synergy between these technologies is powerful.
This lower fix rate might be related to code size, since there’s a known correlation between lines of code (LOC) and the number of bugs in a codebase.
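As a rough illustration of how a fixes-per-feature ratio can be derived from Angular-style commit subjects (`<type>(<scope>): <summary>`), here is a minimal sketch. The function name and the sample subjects are hypothetical and come from neither project; the post does not say how its own numbers were produced.

```python
import re
from collections import Counter

# Angular-style subject line: "<type>(<optional scope>): <summary>"
SUBJECT = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?:\s")


def fixes_per_feature(subjects):
    """Return the number of `fix` commits divided by the number of `feat` commits."""
    counts = Counter()
    for subject in subjects:
        match = SUBJECT.match(subject)
        if match:  # subjects that don't follow the convention are ignored
            counts[match.group("type")] += 1
    if counts["feat"] == 0:
        raise ValueError("no feat commits found")
    return counts["fix"] / counts["feat"]


# Hypothetical commit subjects, made up for this example.
sample = [
    "feat(auth): add token login",
    "fix(auth): reject expired tokens",
    "feat(upload): accept avatar images",
    "docs: describe upload flow",
    "feat(feed): paginate timeline",
    "feat(feed): add likes",
    "feat: add profile endpoint",
]

print(fixes_per_feature(sample))  # prints 0.2 (1 fix / 5 feats)
```

In practice the subjects could be fed in from something like `git log --format=%s`, one line per commit.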
I happen to hold a hard-won minority opinion about code bases. In particular I believe, quite staunchly I might add, that the worst thing that can happen to a code base is size.
– Steve Yegge
Even though we had to implement many features that come for free with Django (or Django Rest Framework) in the Clojure project, like an admin site, token-based authentication, and social authentication, in the end, there were fewer lines of code. As of this writing, the Clojure project contains ~53% fewer lines of code.
In Django-land, most of the features that you might need in a web app are covered by an external library. Authentication is a good example: there are many mature libraries that are mostly plug and play, which lets you focus on the important part of the problem domain. Although there are lots of web libraries for Clojure, the number of libraries that solve the authentication problem for GraphQL is zero: you have to write your own. It is not a big deal since there are excellent libraries like Buddy to help you along but still, not even close to plug and play.
Besides libraries, there’s also the problem that there’s no consensus on many architecture issues, such as status codes, auth, and file uploads.
GraphQL endpoints always return 200, so the status code can’t convey what happened when something fails, and backend errors don’t surface through it. Of course, HTTP status codes map poorly onto GraphQL failures anyway: a single field resolver might return an error while its parent resolves fine. Still, these are open questions that are left to the developer to figure out.
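To make the partial-failure case concrete, here is a small sketch of a client-side helper that inspects a decoded GraphQL response body. The `data` and `errors` keys, and the `message` and `path` entries on each error, follow the GraphQL specification's response format; the helper name and the sample payload are assumptions for illustration only.

```python
def split_result(body):
    """Split a decoded GraphQL response body into (data, errors)."""
    return body.get("data"), body.get("errors", [])


# Hypothetical response: the HTTP status was 200, yet one field failed
# while its parent object resolved normally.
body = {
    "data": {"user": {"name": "Ada", "avatar": None}},
    "errors": [
        {"message": "avatar store unavailable", "path": ["user", "avatar"]}
    ],
}

data, errors = split_result(body)

# Each error's `path` points at the field whose resolver failed.
failed_paths = [".".join(str(p) for p in e.get("path", [])) for e in errors]

print(data["user"]["name"])  # prints Ada
print(failed_paths)          # prints ['user.avatar']
```

The client has to check `errors` on every response, since the status code alone says nothing about which fields came back empty.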
We chose to be pragmatic from the start so, given a roadblock, we would implement the endpoint as REST and, later, try to find an idiomatic GraphQL solution. We hit this situation twice right at the beginning with authentication and file uploads and, months later, we still haven’t found a solution for file uploads.
We tried using base64 string-based mutations for file uploads, but there were two very problematic downsides: base64 makes file sizes about a third larger (~33%) and you lose the option of restarting a failed upload.
GraphQL doesn’t care about your store(s); it only cares about calling your resolver functions to populate the data for the request, so it is very easy to run into N+1 queries (one query for a list, then one more for each item in it) and performance issues. (Granted, even with those problems, an unoptimized Clojure app is an order of magnitude faster than a fast Django server.) However, the burden is entirely on the developer to build smart resolvers.
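The size penalty follows from how base64 works: every 3 bytes of input become 4 output characters. A quick check with Python's standard library (the function name is just for this example):

```python
import base64


def base64_overhead(num_bytes):
    """Return the relative size increase of base64-encoding `num_bytes` bytes."""
    raw = bytes(num_bytes)  # zero-filled; encoded size does not depend on content
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw) - 1


print(f"{base64_overhead(300_000):.1%}")  # prints 33.3%
```

That is before any JSON string escaping or transport overhead, so the payload on the wire is at least this much larger than the original file.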
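Here is a minimal sketch of the N+1 problem and one way around it, written in Python with an in-memory stand-in for the datastore. Everything here (`FakeStore`, the resolver names, the sample data) is hypothetical and is not Lacinia's API; it only shows the shape of the problem: a naive per-item resolver issues one query per post, while a batched one fetches all authors in a single round trip.

```python
class FakeStore:
    """In-memory stand-in for a database that counts round trips."""

    def __init__(self):
        self.queries = 0
        self._posts = [
            {"id": 1, "author_id": 10},
            {"id": 2, "author_id": 11},
            {"id": 3, "author_id": 10},
        ]
        self._users = {
            10: {"id": 10, "name": "Ada"},
            11: {"id": 11, "name": "Linus"},
        }

    def all_posts(self):
        self.queries += 1
        return list(self._posts)

    def user_by_id(self, user_id):
        self.queries += 1
        return self._users[user_id]

    def users_by_ids(self, user_ids):
        self.queries += 1
        return {uid: self._users[uid] for uid in user_ids}


def resolve_naive(store):
    """One query for the posts, then one more query per post for its author."""
    posts = store.all_posts()
    return [{**p, "author": store.user_by_id(p["author_id"])} for p in posts]


def resolve_batched(store):
    """One query for the posts, then a single query for all distinct authors."""
    posts = store.all_posts()
    authors = store.users_by_ids({p["author_id"] for p in posts})
    return [{**p, "author": authors[p["author_id"]]} for p in posts]


naive_store, batched_store = FakeStore(), FakeStore()
naive = resolve_naive(naive_store)
batched = resolve_batched(batched_store)

print(naive_store.queries, batched_store.queries)  # prints 4 2
print(naive == batched)                            # prints True
```

With three posts the difference is small, but the naive version grows linearly with the list while the batched one stays at two queries.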
Our main tool for writing web servers in Clojure is Luminus, an awesome set of tested and documented utilities and libraries to start you off. Lacinia provides a companion library that exposes your schema as Pedestal endpoints and has support for subscriptions and a few utilities, but none for Luminus (we’re working on something).
Django provides a pretty decent admin interface out of the box, which is in itself a strong reason for choosing Django. There is no corresponding package for Clojure, so our only choice was to develop a custom admin interface. Fortunately, we found a great library called Admin on Rest, which lets you build a custom admin interface using pre-built React components.
I left these for last because these complaints are not new in the Clojure community: horrendous Java stack traces, memory usage and startup time still suck. Plus, there’s the impedance mismatch between JS’s camel case and Clojure’s kebab case.
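The case mismatch is more than cosmetic because GraphQL names may only contain letters, digits and underscores, so a hyphenated key cannot appear in a schema as-is. A small sketch of the kind of key translation this forces at the boundary (function names are made up for this example, and acronyms such as `userID` would need extra handling):

```python
import re


def kebab_to_camel(name):
    """created-at -> createdAt"""
    head, *tail = name.split("-")
    return head + "".join(part.capitalize() for part in tail)


def camel_to_kebab(name):
    """createdAt -> created-at"""
    # Insert a hyphen before every uppercase letter except at the start.
    return re.sub(r"(?<!^)(?=[A-Z])", "-", name).lower()


def camelize_keys(value):
    """Recursively rename the keys of nested dicts and lists to camelCase."""
    if isinstance(value, dict):
        return {kebab_to_camel(k): camelize_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [camelize_keys(v) for v in value]
    return value


print(camelize_keys({"created-at": 1, "author": {"first-name": "Ada"}}))
# prints {'createdAt': 1, 'author': {'firstName': 'Ada'}}
print(camel_to_kebab("firstName"))  # prints first-name
```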
Generally, it was a much better experience developing a web app in Clojure + GraphQL than Django + Django Rest Framework. I feel like my experience is comparable in both languages (I’ve been using Clojure since 2009 and Python since 2007) and, although these were different projects, there were a lot of similarities in scope and size.
Django + DRF is a framework on top of another framework, which makes it hard for a developer not familiar with DRF’s idioms to extend the codebase. The combo also tends to encourage complex OO code, where you have to know exactly which methods to override when you need custom functionality, which makes the code path from an endpoint to the actual data fetching kind of opaque. If you only know Python, then the learning curve is pretty steep.
… the real win with choosing an immutable functional language such as Clojure is around the 2/3 mark in a project. Just when a large traditional OO system starts to buckle under its own gravity … this is when a Clojure code-base will shine in comparison
Clojure + Lacinia is the opposite of the above problem: everything’s just plain Clojure, using the same data structures you’re used to. If you know the language well, it’ll only take a few hours to get going.
In the end, although there’s a very large ecosystem around web development in Python and Django comes by default with the inimitable admin, given a similar project (backend API for a mobile app) I would choose Clojure (and probably Lacinia) for the simplicity and productivity.
Aimless contrarian, insurgent lisper, coffee addict. Loves sports and great food.