I’m a software engineer with a focus on designing for simplicity, reliability and stability in distributed systems.
At SoundCloud I’m building tools for the musicians and artists who host content on the platform. Before that I was the Tech Lead of the Data Platform, working on the architecture of our data infrastructure and guiding how we work with data across the company.
- February 12, 2019 – automatic checks with supervision
SoundCloud Premier Distribution allows creators to distribute their music from SoundCloud to other streaming platforms and stores. For many of our users, this will be their first experience with the strict requirements of the music industry supply chain on metadata and media. Here we’ll look at how a system of automatic and manual validations allows users to get fast feedback as they prepare a release.
- December 1, 2017 – basic definitions
Google's 2015 paper on the Dataflow model describes general solutions to general data pipeline processing problems. The terms they use have been helpful to me in understanding patterns in these problems.
- June 20, 2017 – a better model for data ownership
We have a good solution for ownership of services in a microservices architecture. We can learn from this to define ownership of datasets in a way that reduces the total cost of maintenance and integration across teams.
- February 17, 2017 – limits of verifiability
A blockchain allows independent parties to make verifiable statements. This works with bitcoin, whose value comes from the system itself, but fails in applications where the value is external.
- January 6, 2017 – an old friend
Two-phase commit is a long-established means of keeping two resources strongly synchronised. These days it's not so sexy, but it's an important part of the heritage of distributed computing.
- December 30, 2016 – storing nested data in columns
Record shredding allows nested data structures to be considered in a sort-of-tabular way, and stored in a columnar data store. This post describes the intuition behind how this can be done while preserving message structure, as in Dremel and Parquet.
- December 7, 2016 – so many things to go wrong
We like our code to be "robust". This post looks at the different failure modes against which a system needs to be protected.
- October 15, 2016 – in praise of writing down design choices
Being explicit about costs and implications when making choices makes future decisions easier when things change. A collaborative document can be a great implementation of this.
- [Series] February 24, 2016 – introduction to JVM compilation
An introduction to compilation for the JVM, bytecode and JIT compilation, and benchmarking with JMH. It accompanies a talk I gave to the Berlin-Brandenburg Scala User Group.
- [How To] January 26, 2016 – multiple builds for one project
How to build two artifacts from one source folder in SBT
- December 14, 2015 – functional programming with state
Learning about what the State monad represents and how to use and understand it
- [How To] December 3, 2015 – implement sequence on your own types
How to add Applicative and Traverse instances for your own types, use sequence, sequenceU and Unapply
- April 12, 2015 – no-code intro
Deriving how non-blocking I/O must work, from first principles
- April 20, 2014 – a brief introduction to generated tests
Step-by-step guide to using the Guava Testlib library for test case generation
- [How To] June 17, 2013 – understanding RMI settings
An SSH tunnel can allow access to a JMX endpoint that is only exposed to the local machine.
- [How To] June 13, 2013 – creating a path back home
SSH tunnelling allows opening a hole back through a firewall or NAT, and it's really easy to set up.
- May 13, 2013 – my approach to interviewing
A description of my interview approach while at GSA – what I was looking for and what I expected from candidates