I spent my summer at Knewton working on an autoscaling project. Along the way I learned a lot about capacity planning, queuing theory, code instrumentation, and server management and deployment, and I recently presented what I learned at Surge Conference 2012.
The talk focused on instrumentation more than anything else, and you can check out the slides here. If capacity planning sounds interesting, we also open-sourced leptoid, the autoscaling library I worked on, so you can review the source on GitHub.
I also had a chance to speak with two excellent guys, Nathan Harvey and Dave Zwieback, for an interview on Food Fight. During that interview I talked about my Surge presentation and experiences as a junior engineer at Knewton.
During my talk I mentioned that a good capacity forecasting model matters much less than good instrumentation and deployment practices. That was a revelation for me. I actually spent the majority of my summer outside of my machine learning & statistical wheelhouse, working on this autoscaling project with Knewton’s systems team. Lots of time went into:
- instrumenting code with statsd and Coda Hale’s metrics (there’s a small sketch of this after the list)
- managing a cluster of Graphite hosts
- fumbling around with Chef
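
For anyone who hasn’t seen it, here’s a minimal sketch of what instrumenting a piece of code with Coda Hale’s metrics library and shipping the numbers to Graphite can look like. It uses the current com.codahale.metrics API rather than the exact setup from that summer, and the hostname, prefix, and metric name are placeholders, not Knewton’s actual configuration:

```java
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import com.codahale.metrics.graphite.Graphite;
import com.codahale.metrics.graphite.GraphiteReporter;

import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;

public class InstrumentedHandler {

    private static final MetricRegistry registry = new MetricRegistry();

    // A Timer measures both throughput (calls/sec) and a latency histogram.
    private static final Timer requests = registry.timer("requests");

    static {
        // Report everything in the registry to a Graphite host once a minute.
        // "graphite.example.com" and the "myservice" prefix are placeholders.
        Graphite graphite = new Graphite(new InetSocketAddress("graphite.example.com", 2003));
        GraphiteReporter.forRegistry(registry)
                .prefixedWith("myservice")
                .convertRatesTo(TimeUnit.SECONDS)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .build(graphite)
                .start(1, TimeUnit.MINUTES);
    }

    public void handle() {
        // Time the interesting section of code; stop() records the duration.
        final Timer.Context ctx = requests.time();
        try {
            doWork(); // stand-in for whatever the handler actually does
        } finally {
            ctx.stop();
        }
    }

    private void doWork() {
        // real request handling would go here
    }
}
```

A Timer like this gives you request rates and latency percentiles essentially for free, which is exactly the kind of signal you want feeding a capacity-planning or autoscaling decision.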
That list of items looks really obnoxious, I know, but I don’t want to forget any of it. I was also studying operating systems at NYU during this project, and I have a list of topics I’d like to dig into once things slow down:
- scheduling and thread prioritization (beyond aging and “NRU” scheduling algorithms)
- networking, especially DNS lookups and comparisons between proxies
- sharding (or distributed storage, especially w.r.t. something like HDFS)
- kernel v. user modes, and how they work in practice
Either way, I feel like a better developer after this summer. It was an excellent experience, and Dave and Peter Norton (who really needs to get on Twitter or Pinterest or something) were great mentors.