Talks and Papers

The “source files” for my talks can be found in this GitHub repo. The most recent and relevant talks are listed here.

See also my Google Scholar page for papers, some of which are listed below.

AI in the Open: Why It Matters. How to Achieve It.
AI Camp, Chicago, February 2024
To maximize availability and safety of AI, we should follow the path of open-source software, while recognizing what is new.
Download PDF

Open Source: Science vs. Software. What's Different? What's the Same?
Scale by the Bay, November 2023
What I learned at IBM Research about the differences and similarities of open-source software (OSS) and open-source science (OSSci).
Download PDF | Watch video

Reinforcement Learning, ChatGPT, Games, and More
GOTO Chicago, May 2023, and IBM Research, October 2023
Things move fast; this is an update to January's RL talk that expands the coverage of Reinforcement Learning from Human Feedback, a key element in training ChatGPT.
Download PDF (GOTO 2023 version: Download PDF)

Reinforcement Learning with Ray RLlib (V2 for Data Day Texas)
Data Day Texas, January 2023
Updated version of my talk on Ray's powerful reinforcement learning library, RLlib. This talk discusses reinforcement learning and how to use RLlib.
Download PDF

Lessons Learned from 15 Years of Scala in the Wild
Several conferences, 2021-2022
In the roughly 15 years since I joined the Scala community, the community has learned a lot about making the language more robust and easier to use effectively. I've also learned many lessons about effective "enterprise" programming with Scala. Finally, I see warning signs for the future growth of functional programming (FP).
Download PDF | Watch video

Next Generation AI: Transitioning to the Continuous, Self-Learning Enterprise
Draft, May 2021
A talk I'm developing that looks at the state of AI/ML and what enterprises need to do to fully adopt and leverage it.
Download PDF

Modularity: A Retrospective
GOTO Chicago Nights, February 18, 2020, and Scala in the City, May 28, 2020
A look at what we've accomplished in making software modular and where we need to go.
Download PDF

Reinforcement Learning with Ray RLlib
Several conferences, 2020-2023
Ray comes with a powerful reinforcement learning library, RLlib. This talk discusses reinforcement learning and how to use RLlib.
Download PDF

Cluster-wide Scaling of ML with Ray
YOW! Data, July 2020, and CodeMesh, November 2020
Ray is a distributed computing system that offers a concise, intuitive API with excellent performance for distributed workloads. It emerged out of the AI community at U.C. Berkeley.
Download PDF

Ray for Natural Language Processing
NLP Summit, October 7, 2020
Ray is being adopted by popular NLP frameworks like spaCy and Hugging Face. I discuss the problems Ray solves for them and how they use it.
Download PDF

Executive Briefing: What It Takes to Use ML in Fast Data Pipelines
Strata San Francisco, London, and NYC 2019
A briefing for managers and executives about the challenges of serving ML models in a streaming data context.
Download PDF

Executive Briefing: What You Need to Know about Fast Data
Strata London and NYC 2018
A briefing for managers and executives about the trends in Fast Data and how they impact their organizations.
Download PDF | Watch video (a similar webinar)

Streaming Microservices with Akka Streams and Kafka Streams
Strata San Jose and London 2018, Scala Days NYC 2018, Reactive Summit 2018, YOW! 2018
I discuss processing data in microservices using Akka Streams and Kafka Streams, vs. using tools like Spark and Flink.
Download PDF | Watch video

Stream All the Things!!
Software Architecture Conference NYC 2017, Strata London and NYC 2017, Reactive Summit 2017, ScalaIO 2017, YOW! Data 2018
I discuss the emerging architecture for large-scale stream data processing that also integrates the best of microservice architectures.
Download PDF | Watch video

Bash and All That
GOTO Chicago 2018
A celebration of the UNIX philosophy and the tools it spawned.
Download PDF

Scala and the JVM for Big Data: Lessons from Spark
Strata San Jose and Singapore 2016
The JVM is the standard platform for Big Data, and Scala is emerging as the standard programming language for Big Data developers, driven in part by Spark. What lessons can we draw from this picture?
Download PDF | Longer version | Watch video

Why Spark Is The Next Top (Compute) Model
Numerous venues, 2014 and 2015
Spark has emerged as the replacement for MapReduce in Hadoop applications. This talk explains why.
Download PDF | Watch video

Data Science at Scale with Spark
GOTO Chicago 2015
Using examples, I show how to use Spark for Data Science at scale in ways that were previously not feasible with other tools.
Download PDF

The Unreasonable Effectiveness of Scala for Big Data
Scala Days 2015
Why Scala has proven so effective as the general-purpose programming language for Big Data development.
Download PDF | Watch video

Why Scala Is Taking Over the Big Data World
Scala Days 2014, Scala eXchange 2014, and Data Day Texas 2015
Scala has emerged as the de facto language for big data development, driven in part by tools like Scalding and Spark.
Download PDF | Watch video (Scala eXchange)

The Internet Was Made for Cats
Chicago Scala Users Group, January 2016
My informal introduction to the Typelevel Cats project, including why I think it's a model for open-source development.
Download PDF

Copious Data, the Killer App for Functional Programming
LambdaJam Chicago and other venues
I argued that "Copious" Data (okay, Big Data) is driving adoption of Functional Programming (FP), more so than multicore concurrency concerns, because more developers will grapple with data problems than with concurrency problems. Because FP is based on Mathematics, it is a natural fit for working with Data, whereas languages like Java, in which Hadoop is written, are poor choices. I revisited this subject in March 2020. The original talks were presented at events in 2013-2014.
2020 update (short version): Download PDF
Original 2014 talk (long version): Download PDF | Watch video

SQL Strikes Back! Recent Trends in Data Persistence and Analysis
CodeMesh London 2014
Relational databases fell out of fashion with the rise of NoSQL and Hadoop. But SQL proved too useful for too many people, so there are now many SQL-based query tools for Hadoop and subsets of SQL on several "NoSQL" databases. This talk discusses this trend and why it started. (November 4th, 2014)
Download PDF

The Seductions of Scala
Various venues
An introduction to Scala that I often give at conferences and user groups. The PDF includes a lot of extra material that won't fit into a 50-60 minute time slot. The GitHub page for this talk also has the sources used for the examples. In particular, for the Akka-based Actor example at the end of the talk, see README.md. (November 19, 2013)
Download PDF

MapReduce and Its Discontents
QCon NYC 2012 and Big Data TechCon Boston 2013
My first public talk where I claimed that MapReduce is the Enterprise JavaBeans of our time. I criticized the MapReduce programming model and, in particular, the technical limitations of the Hadoop implementation. In part, I argued that Java (especially before Java 8) is the wrong tool for developing Big Data applications and middleware. Instead, we should be using Functional Programming, since when we work with data, we are really doing Mathematics! (April 11, 2013)
Download PDF

Why Big Data Needs to Be Functional
NE Scala Symposium 2011
A more general version of the previous "Discontents" talk, where I argue that the Hadoop community needs to drop its reliance on Java-centric, Object-Oriented approaches and embrace Functional Programming and languages like Scala. (April 15th, 2012)
Download PDF | Watch video

Heresies and Dogmas in Software Development
Strange Loop 2011
I look at 5 ideas in the history of software development that were once popular, and still are in some quarters, but are now seen by most people as obsolete. (November 9th, 2011)
Download PDF | Watch video

Better Programming Through Functional Programming
A half-day tutorial that introduces Functional Programming, why it has become important for our time, and how you can apply its ideas in almost any language. Examples are given in Java and Ruby. There is also a shorter talk version. (July 31st, 2011)
Download PDF | Download shorter talk

Polyglot and Poly-paradigm Programming
QCon San Francisco 2008
An argument that modern development problems benefit from a multi-paradigm and/or multi-language solution strategy. Different strategies are discussed in the context of example problems. (April 2, 2011)
Download PDF | Watch video (early version of this talk)

Hive - SQL for Hadoop
Chicago Hadoop Users Group
This talk introduces Hive, the original SQL tool for Hadoop, and explains why it's a key technology that drove adoption of the ecosystem, primarily because it makes it easier to transition SQL-based data warehouses to Hadoop, and it enables conventional data analysts to work with Hadoop. (January 2012)
Download PDF

Research Papers and Other Presentations

In addition to the talks above, here are a few other talks and the research papers I have written or co-written.

Akka

The Akka Framework: An overview of the Akka Framework for building robust, highly concurrent servers in Java or Scala.

Reactive Programming

The Reactive Manifesto lays out a vision for Reactive Programming. These talks explore various aspects of Reactive Programming.

Aquarium

Aquarium: Aspect-Oriented Programming for Ruby. There are a few exercises that go with the talk: Aquarium_RubyAOP_exercises.zip
Aquarium: AOP for Ruby. A 30-minute talk presented at Aspect-Oriented Software Development 2008. The talk was based on my Industry Track paper.

General AOSD

Noninvasiveness and Aspect-Oriented Design: Lessons from Object-Oriented Design Principles. Research Paper discussing classic OO design principles in the context of AOSD.
AOP in Academia and Industry.
Aspect-Oriented Design Principles: Lessons from Object-Oriented Design.
AOP@Work: Component Design with Contract4J (IBM developerWorks).
The Challenges of Writing Reusable and Portable Aspects in AspectJ: Lessons from Contract4J.
Contract4J for Design by Contract in Java: Design Pattern-Like Protocols and Aspect Interfaces
The Future of AOP

Other

Accelerating automation of digital health applications via cloud native approach. Experience report of lessons learned while building the digital health application portfolio on IBM’s Accelerated Discovery Platform.
