Talks and Papers

Most of my conference and user group talks can be found in this GitHub repo. I recently removed some very old talks from the list on this page.

Modularity: A Retrospective

GOTO Chicago Nights, February 18, 2020 and Scala in the City, May 28, 2020

A look at what we've accomplished in making software modular and where we need to go.

Reinforcement Learning with Ray RLlib

Chicago Cloud Conference, September 22, 2020

Ray comes with a powerful reinforcement learning library, RLlib. This talk discussed reinforcement learning and how to use RLlib.

Cluster-wide Scaling of ML with Ray

YOW! Data, July 2020, and CodeMesh, November 2020

Ray is a distributed computing system that offers a concise, intuitive API, with excellent performance for distributed workloads. It emerged out of the AI community at U.C. Berkeley.

Ray for Natural Language Processing

NLP Summit, October 7, 2020

Ray is being adopted by popular NLP frameworks like spaCy and Hugging Face. I discuss the problems Ray solves for them and how they use it.

Executive Briefing: What It Takes to Use ML in Fast Data Pipelines

Strata San Francisco, London, and NYC 2019

A briefing for managers and executives about the challenges of serving ML models in a streaming data context.

Executive Briefing: What You Need to Know about Fast Data

Strata London and NYC 2018

A briefing for managers and executives about the trends in Fast Data and how they impact their organizations.

Download PDF Watch video (A similar webinar)

Streaming Microservices with Akka Streams and Kafka Streams

Strata San Jose and London 2018, Scala Days NYC 2018, Reactive Summit 2018, YOW! 2018

I discuss processing data in microservices using Akka Streams and Kafka Streams, vs. using tools like Spark and Flink.

Stream All the Things!!

Software Architecture Conference NYC 2017, Strata London and NYC 2017, Reactive Summit 2017, ScalaIO 2017, YOW! Data 2018

I discuss the emerging architecture for large-scale stream data processing, which also integrates the best of microservice architectures.

Bash and All That

GOTO Chicago 2018

A celebration of the UNIX philosophy and the tools it spawned.

Scala and the JVM for Big Data: Lessons from Spark

Strata San Jose and Singapore 2016

The JVM is the standard platform for Big Data, and Scala is emerging as the standard programming language for Big Data developers, driven in part by Spark. What lessons can we draw from this picture?

Why Spark Is The Next Top (Compute) Model

Numerous Venues 2014 and 2015

Spark has emerged as the replacement for MapReduce in Hadoop applications. This talk explains why.

Data Science at Scale with Spark

GOTO Chicago 2015

Using examples, I show how to use Spark for Data Science at scale in ways that were previously not feasible with other tools.

The Unreasonable Effectiveness of Scala for Big Data

Scala Days 2015

Why Scala has proven so effective as the general-purpose programming language for Big Data development.

Copious Data, the Killer App for Functional Programming

LambdaJam Chicago

I argue that "Copious" Data (okay, Big Data) is driving adoption of Functional Programming (FP), more so than multicore concurrency concerns, because more developers will grapple with data problems than concurrency. Because FP is based on Mathematics, it is a natural fit for working with Data, whereas languages like Java, in which Hadoop is written, are poor choices. (November 21, 2013)

Download PDF Watch video (earlier version)

SQL Strikes Back! Recent Trends in Data Persistence and Analysis

CodeMesh London 2014

Relational databases fell out of fashion with the rise of NoSQL and Hadoop. But SQL proved too useful for too many people, so there are now many SQL-based query tools for Hadoop and subsets of SQL on several "NoSQL" databases. This talk discusses this trend and why it started. (November 4th, 2014)

The Seductions of Scala

Various Venues

An introduction to Scala that I often give at conferences and user groups. The PDF includes a lot of extra material that won't fit into a 50-60 minute time slot. The GitHub page for this talk also has the sources used for the examples. In particular, for the Akka-based Actor example at the end of the talk, see README.md. (November 19, 2013)

MapReduce and Its Discontents

QCon NYC 2012, and Big Data Techcon Boston 2013

My first public talk in which I claimed that MapReduce is the Enterprise JavaBeans of our time. I criticized the MapReduce programming model and, in particular, the technical limitations of the Hadoop implementation. In part, I argued that Java (especially before Java 8) is the wrong tool for developing Big Data applications and middleware. Instead, we should be using Functional Programming, since when we work with data, we are really doing Mathematics! (April 11, 2013)

Why Big Data Needs to Be Functional

NE Scala Symposium 2011

A more general version of the previous "Discontents" talk, where I argue that the Hadoop community needs to drop reliance on Java-centric, Object-Oriented approaches and embrace Functional Programming and languages like Scala. (April 15th, 2012)

Heresies and Dogmas in Software Development

Strange Loop 2011

I look at five ideas in the history of software development that were once popular, and still are in some quarters, but are now seen by most people as obsolete. (November 9th, 2011)

Better Programming Through Functional Programming

A half-day tutorial that introduces Functional Programming, why it has become important for our time, and how you can apply its ideas in almost any language. Examples are given in Java and Ruby. There is also a shorter talk version. (July 31st, 2011)

Polyglot and Poly-paradigm Programming

QCon San Francisco 2008

An argument that modern development problems benefit from a multi-paradigm and/or multi-language solution strategy. Different strategies are discussed in the context of example problems. (April 2, 2011)

Download PDF Watch video (Early version of this talk)

Hive - SQL for Hadoop

Chicago Hadoop Users Group

This talk introduces Hive, the original SQL tool for Hadoop, and explains why it's a key technology that drove adoption of the ecosystem, primarily because it makes it easier to transition SQL-based data warehouses to Hadoop and enables conventional data analysts to work with Hadoop. (January 2012)

Other Presentations

In addition to the talks above, here are a few others.

Akka

Reactive Programming

The Reactive Manifesto lays out a vision for Reactive Programming. These talks explore various aspects of Reactive.

Aquarium

General AOSD