Big Data Infrastructure: Spring 2015


Jimmy Lin


iSchool at Maryland




Over the past few years, we have seen the emergence of “big data”: disruptive technologies that have transformed commerce, science, and many aspects of society. These developments are enabled by infrastructure that allows us to distribute computations across hundreds or even thousands of commodity servers. One key breakthrough that makes this possible is the development of abstractions for data-intensive computing that allow programmers to reason about computations at massive scale while hiding low-level details such as synchronization, data movement, and fault tolerance.

This course provides an introduction to big data infrastructure, starting with MapReduce, the first of these datacenter-scale programming abstractions. The Hadoop implementation of MapReduce lies at the core of an application stack that is gaining widespread adoption in both industry and academia. A major focus of this course is algorithm design and “thinking at scale”, applied to a variety of domains, including text, graphs, and relational data. We will also cover a number of next-generation systems that are vying to replace MapReduce as the de facto platform for big data processing.

Required Textbooks:

Lin, J., Dyer, C. 2010. Data-Intensive Text Processing with MapReduce.
White, T. 2015. Hadoop: The Definitive Guide.

Link to Syllabus:

