I am a distributed systems researcher. My focus is on better abstractions for distributed systems that are simpler and easier to build, modify, and operate.

Previously, I worked at Facebook, where I started and productionized Delos, a storage system for control plane services (i.e., Facebook's Chubby). Delos has been in production for 3+ years, processes over 2B transactions per day for services such as the Twine scheduler, and is replacing all uses of ZooKeeper at Facebook. For more about Delos, see this blog and talk from 2018; this blog and talk from 2021; or our OSDI 2020 paper and talk. Delos builds upon my prior research on shared log systems, including CORFU (NSDI 2012; productionized at VMware as CorfuDB), Tango (SOSP 2013), and the FuzzyLog (OSDI 2018).

Prior to that, I was an Associate Professor in the Department of Computer Science at Yale University from 2015 to 2019. Before that, I worked at VMware Research and Microsoft Research Silicon Valley. I got my PhD at Cornell University in 2009 with Ken Birman on reliable communication protocols for data centers.

At Yale, my work was funded by an NSF AitF grant, Facebook Faculty Awards (2015 and 2016), and a VMware Early Career Faculty Grant (2017). My work has received best paper awards at OSDI, ASPLOS, SYSTOR, Middleware, and HotStorage.

Here is an incipient blog.

Service:

NSDI 2023 (upcoming co-chair) || OSDI 2022 || NSDI 2022 || OSDI 2021 || FAST 2021 || NSDI 2021 || SoCC 2020 || HotCloud 2020 || ATC 2019 || FAST 2019 || OSDI 2018 (light) || ATC 2018 || FAST 2018 || HotStorage 2017 || SoCC 2017 (co-chair) || WWW 2017 || EuroSys 2017 || FAST 2017 || NSDI 2017 || SoCC 2016 || MaRS 2016 || HotStorage 2016 || ICDCS 2016 || ATC 2016 || SYSTOR 2016 || NSDI 2016 || EuroSys 2016 (light) || LADIS 2015 (co-chair) || INFLOW 2015 || SoCC 2015 (poster chair) || APSys 2015 || ICDCS 2015 || IPDPS 2015 || CloudDM 2015 || WWW 2015 || INFLOW 2014 || TRIOS 2014 || SFMA 2014 || SoCC 2013 || LADIS 2012 || ICDCS 2010 || SSS 2009

Publications:

SOSP 2021
Log-structured Protocols in Delos.
Mahesh Balakrishnan, Chen Shen, Ahmed Jafri, Suyog Mapara, David Geraghty, Jason Flinn, Vidhya Venkat, Ivailo Nedelchev, Santosh Ghosh, Mihir Dharamshi, Jingming Liu, Filip Gruszczynski, Jun Li, Rounak Tibrewal, Ali Zaveri, Rajeev Nagar, Ahmed Yossef, Francois Richard, Yee Jiun Song.
To appear in SOSP 2021: The 28th ACM Symposium on Operating Systems Principles, Koblenz, Germany, November 2021.
[paper]

OSDI 2020
Virtual Consensus in Delos.
Mahesh Balakrishnan, Jason Flinn, Chen Shen, Mihir Dharamshi, Ahmed Jafri, Xiao Shi, Santosh Ghosh, Hazem Hassan, Aaryaman Sagar, Rhed Shi, Jingming Liu, Filip Gruszczynski, Xianan Zhang, Huy Hoang, Ahmed Yossef, Francois Richard, Yee Jiun Song.
In OSDI 2020: 14th USENIX Symposium on Operating Systems Design and Implementation, November 2020.
(best paper award)
[paper]

NSDI 2020
Check before You Change: Preventing Correlated Failures in Service Updates.
Ennan Zhai, Ang Chen, Ruzica Piskac, Mahesh Balakrishnan, Bingchuan Tian, Bo Song, Haoliang Zhang.
In NSDI 2020: 17th USENIX Symposium on Networked Systems Design and Implementation, Santa Clara, CA, February 2020.
[paper]

SoCC 2019
WormSpace: A Modular Foundation for Simple, Verifiable Distributed Systems.
Ji Yong Shin, Juno Kim, Wolf Honore, Hernan Vanzetto, Srihari Radhakrishnan, Mahesh Balakrishnan, Zhong Shao.
In SoCC 2019: ACM Symposium on Cloud Computing, Santa Cruz, CA, November 2019.
[paper]

CACM 2018
How to implement any concurrent data structure.
Irina Calciu, Siddhartha Sen, Mahesh Balakrishnan, Marcos K. Aguilera.
In Communications of the ACM 61 (12), December 2018.
[paper]

OSDI 2018
The FuzzyLog: A Partially Ordered Shared Log.
Joshua Lockerman, Jose Faleiro, Juno Kim, Soham Sankaran, Daniel Abadi, Jim Aspnes, Siddhartha Sen, Mahesh Balakrishnan.
In OSDI 2018: 13th USENIX Symposium on Operating Systems Design and Implementation, Carlsbad, CA, October 2018.
[paper]

TOS 2017
Isotope: ACID Transactions for Block Storage.
Ji-Yong Shin, Mahesh Balakrishnan, Tudor Marian, Hakim Weatherspoon.
In ACM Transactions on Storage (TOS), Volume 13 Issue 1, Feb 2017.
[paper]

ASPLOS 2017
Black-box concurrent data structures for NUMA architectures.
Irina Calciu, Siddhartha Sen, Mahesh Balakrishnan, Marcos K. Aguilera.
In ASPLOS 2017: 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Xi'an, China, April 2017.
(best paper award)
[paper]

FPT 2016
Design and Implementation of Open-Source SATA III Core for Stratix V FPGAs.
Sumedh Guha, Wen Wang, Shafeeq Ibraheem, Mahesh Balakrishnan, Jakub Szefer.
In FPT 2016: International Conference on Field-Programmable Technology, Xi'an, China, December 2016.
[paper]

SoCC 2016
Towards Weakly Consistent Local Storage Systems.
Ji-Yong Shin, Mahesh Balakrishnan, Tudor Marian, Jakub Szefer, Hakim Weatherspoon.
In SoCC 2016: ACM Symposium on Cloud Computing, Santa Clara, CA, October 2016.
[paper]

SYSTOR 2016
Enabling Space Elasticity in Storage Systems.
Helgi Sigurbjarnarson, Petur Orri Ragnarsson, Juncheng Yang, Ymir Vigfusson, Mahesh Balakrishnan.
In SYSTOR 2016: 9th ACM International Systems and Storage Conference, Haifa, Israel, June 2016.
(best student paper award)
[paper]

FAST 2016
Isotope: Transactional Isolation for Block Storage.
Ji-Yong Shin, Mahesh Balakrishnan, Tudor Marian, Hakim Weatherspoon.
In FAST 2016: 14th USENIX Conference on File and Storage Technologies, San Jose, CA, February 2016.
[paper]

HotStorage 2014
Harmonium: Elastic Cloud Storage via File Motifs.
Helgi Sigurbjarnarson, Petur Orri Ragnarsson, Ymir Vigfusson, Mahesh Balakrishnan.
In HotStorage 2014: 6th Usenix Workshop on Hot Topics in Storage and File Systems, Philadelphia, PA, June 2014.
[paper]

Internet Computing 2013
Contrail: Decentralized and Privacy-Preserving Social Networks on Smartphones.
Patrick Stuedi, Iqbal Mohomed, Mahesh Balakrishnan, Morley Mao, Doug Terry, Ted Wobber.
In IEEE Internet Computing, December 2013.

CCSNA 2013
Hiding behind the Clouds: Efficient, Privacy-Preserving Queries via Cloud Proxies.
Surabhi Gaur, Melody Moh, Mahesh Balakrishnan.
In IEEE Globecom 2013 Workshop on Cloud Computing Systems, Networks, and Applications, Atlanta, GA, December 2013.
[paper]

TOCS 2013
CORFU: A Distributed Shared Log.
Mahesh Balakrishnan, Dahlia Malkhi, John D. Davis, Vijayan Prabhakaran, Michael Wei, Ted Wobber.
In ACM Transactions on Computer Systems (TOCS), 31(4), 10, December 2013 (invited paper).

SOSP 2013
Tango: Distributed Data Structures over a Shared Log.
Mahesh Balakrishnan, Dahlia Malkhi, Ted Wobber, Ming Wu, Vijayan Prabhakaran, Michael Wei, John D. Davis, Sriram Rao, Tao Zou, Aviad Zuck.
In SOSP 2013: The 24th ACM Symposium on Operating Systems Principles.
[paper] [slides] [video]

SOSP 2013
Consistency-Based Service Level Agreements for Cloud Storage.
Douglas Terry, Vijayan Prabhakaran, Rama Kotla, Mahesh Balakrishnan, Marcos K. Aguilera, Hussam Abu-Libdeh.
In SOSP 2013: The 24th ACM Symposium on Operating Systems Principles.
[paper]

SYSTOR 2013
Beyond Block I/O: Implementing a Distributed Shared Log in Hardware.
Michael Wei, John D. Davis, Ted Wobber, Mahesh Balakrishnan, Dahlia Malkhi.
In SYSTOR 2013: 6th International Systems and Storage Conference, Haifa, Israel, June 2013.
[paper]

FAST 2013
Gecko: Contention-Oblivious Disk Arrays for Cloud Storage.
Ji-Yong Shin, Mahesh Balakrishnan, Tudor Marian, Hakim Weatherspoon.
In FAST 2013: 11th USENIX Conference on File and Storage Technologies, San Jose, CA, February 2013.
[paper]

HotStorage 2012
Gecko: A Contention-Oblivious Design for Cloud Storage.
Ji-Yong Shin, Mahesh Balakrishnan, Lakshmi Ganesh, Tudor Marian, Hakim Weatherspoon.
In HotStorage 2012: 4th Usenix Workshop on Hot Topics in Storage and File Systems, Boston, MA, June 2012.

NSDI 2012
CORFU: A Shared Log Design for Flash Clusters.
Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, Ted Wobber, Michael Wei, John D. Davis.
In NSDI 2012: 9th Usenix Symposium on Networked Systems Design and Implementation, San Jose, CA, April 2012.
[paper] [video]

OSR 2012
From Paxos to CORFU: A Flash-Speed Shared Log.
Dahlia Malkhi, Mahesh Balakrishnan, John D. Davis, Vijayan Prabhakaran, Ted Wobber.
In ACM SIGOPS Operating Systems Review, Volume 46 Issue 1, January 2012.

Middleware 2011
Contrail: Enabling Decentralized Social Networks on Smartphones.
Patrick Stuedi, Iqbal Mohomed, Mahesh Balakrishnan, Ted Wobber, Doug Terry, Morley Mao.
In Middleware 2011: ACM/IFIP/USENIX 12th International Middleware Conference, Lisboa, Portugal, December 2011.
(best paper award)
[paper]

Usenix ATC 2011
Online Migration for Geo-Distributed Storage Systems.
Nguyen Tran, Marcos K. Aguilera, Mahesh Balakrishnan.
In Usenix ATC 2011: Usenix Annual Technical Conference, Portland, OR, June 2011.
[paper]

TON 2011
Maelstrom: Transparent Error Correction for Communication between Data Centers.
Mahesh Balakrishnan, Tudor Marian, Ken Birman, Hakim Weatherspoon, Lakshmi Ganesh.
In IEEE/ACM Transactions on Networking, June 2011.

TOS 2010
Differential RAID: Rethinking RAID for SSD Reliability.
Mahesh Balakrishnan, Asim Kadav, Vijayan Prabhakaran, Dahlia Malkhi.
In ACM Transactions on Storage, Volume 6 Issue 2, June 2010 (invited paper).

EuroSys 2010
Differential RAID: Rethinking RAID for SSD Reliability.
Mahesh Balakrishnan, Asim Kadav, Vijayan Prabhakaran, Dahlia Malkhi.
In EuroSys 2010: 5th ACM European Conference on Computer Systems, Paris, France, April 2010.
[paper]

EuroSys 2010
Dr. Multicast: Rx for Data Center Communication Scalability.
Ymir Vigfusson, Hussam Abu-Libdeh, Mahesh Balakrishnan, Ken Birman, Robert Burgess, Gregory Chockler, Haoyuan Li, Yoav Tock.
In EuroSys 2010: 5th ACM European Conference on Computer Systems, Paris, France, April 2010.
[paper]

HotNets 2010
Location, Location, Location! Modeling Data Proximity in the Cloud.
Birjodh Tiwana, Mahesh Balakrishnan, Marcos Aguilera, Hitesh Ballani, Z. Morley Mao.
In HotNets IX: Ninth Workshop on Hot Topics in Networking, Monterey, CA, October 2010.
[paper]

HotStorage 2010
Depletable Storage Systems.
Vijayan Prabhakaran, Mahesh Balakrishnan, Ted Wobber, John Davis.
In HotStorage 2010: 2nd Workshop on Hot Topics in Storage and File Systems, Boston, MA, June 2010.
[paper]

FAST 2010
Extending SSD Lifetimes with Disk-Based Write Caches.
Gokul Soundararajan, Vijayan Prabhakaran, Mahesh Balakrishnan, Ted Wobber.
In FAST 2010: 8th USENIX Conference on File and Storage Technologies, San Jose, CA, February 2010.
[paper]

IMC 2009
Where's that Phone?: Geolocating IP Addresses on 3G Networks.
Mahesh Balakrishnan, Iqbal Mohomed, Venugopalan Ramasubramanian.
In IMC 2009: Internet Measurement Conference, Chicago, IL, November 2009.
[paper]

HotStorage 2009
Differential RAID: Rethinking RAID for SSD Reliability.
Asim Kadav, Mahesh Balakrishnan, Vijayan Prabhakaran, Dahlia Malkhi.
In HotStorage 2009: 1st Workshop on Hot Topics in Storage and File Systems, Big Sky, MT, October 2009. This version also appeared in ACM SIGOPS Operating Systems Review, 44(1), January 2010.
(best paper award)
[paper]

SIGMETRICS 2009
On the Treeness of Internet Latency and Bandwidth.
Venugopalan Ramasubramanian, Dahlia Malkhi, Fabian Kuhn, Mahesh Balakrishnan, Archit Gupta, Aditya Akella.
In SIGMETRICS / Performance 2009: Joint International Conference on Measurement and Modeling of Computer Systems, Seattle, WA, June 2009.

FAST 2009
Smoke and Mirrors: Shadowing Files at Remote Locations without Performance Loss.
Hakim Weatherspoon, Lakshmi Ganesh, Tudor Marian, Mahesh Balakrishnan, Ken Birman.
In FAST 2009: 7th USENIX Conference on File and Storage Technologies, San Francisco, CA, February 2009.

HotNets 2008
Dr. Multicast: Rx for Datacenter Communication Scalability.
Ymir Vigfusson, Hussam Abu-Libdeh, Mahesh Balakrishnan, Ken Birman, Yoav Tock.
In HotNets VII: Seventh ACM Workshop on Hot Topics in Networks, Calgary, Canada, October 2008.
[paper]

DSN 2008
Tempest: Soft State Replication in the Service Tier.
Tudor Marian, Mahesh Balakrishnan, Ken Birman, Robbert van Renesse.
In DSN 2008: 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DCCS track), Anchorage, AK, June 2008.
[paper]

NSDI 2008
Maelstrom: Transparent Error Correction for Lambda Networks.
Mahesh Balakrishnan, Tudor Marian, Ken Birman, Hakim Weatherspoon, Einar Vollset.
In NSDI 2008: Fifth Usenix Symposium on Networked Systems Design and Implementation, San Francisco, CA, April 2008.
[paper]

PODC 2007
Reconstructing Approximate Tree Metrics.
Ittai Abraham, Mahesh Balakrishnan, Fabian Kuhn, Dahlia Malkhi, Kunal Talwar, Venugopalan Ramasubramanian.
In PODC 2007: 26th Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, Portland, OR, August 2007.
[paper]

HotOS 2007
Optimizing Power Consumption in Large Scale Storage Systems.
Lakshmi Ganesh, Hakim Weatherspoon, Mahesh Balakrishnan, Ken Birman.
In HotOS XI: 11th Workshop on Hot Topics in Operating Systems, San Diego, CA, May 2007.
[paper]

NSDI 2007
Ricochet: Lateral Error Correction for Time-Critical Multicast.
Mahesh Balakrishnan, Ken Birman, Amar Phanishayee, Stefan Pleisch.
In NSDI 2007: Fourth Usenix Symposium on Networked Systems Design and Implementation, Cambridge, MA, April 2007.
[paper]

Comsware 2007
Scalable Multicast Platforms for a New Generation of Robust Distributed Applications.
Ken Birman, Mahesh Balakrishnan, Danny Dolev, Tudor Marian, Krzysztof Ostrowski, Amar Phanishayee.
In COMSWARE 2007: 2nd IEEE/Create-Net/ICST International Conference on Communication System Software and Middleware, Bangalore, India, January 2007.

SRDS 2006
PLATO: Predictive Latency-Aware Total Ordering.
Mahesh Balakrishnan, Ken Birman and Amar Phanishayee.
In SRDS 2006: 25th IEEE Symposium on Reliable Distributed Systems, Leeds, UK, October 2006.
[paper]

WASR 2006
Reliable Multicast for Time-Critical Systems.
Mahesh Balakrishnan and Ken Birman.
In WASR 2006: 1st IEEE Workshop on Applied Software Reliability, Philadelphia, PA, June 2006.

MobiHoc 2006
Mistral: Efficient Flooding in Mobile Ad-hoc Networks.
Stefan Pleisch, Mahesh Balakrishnan, Ken Birman and Robbert van Renesse.
In MobiHoc 2006: 7th ACM International Symposium on Mobile Ad Hoc Networking and Computing, Florence, Italy, May 2006.
[paper]

NCA 2005
Slingshot: Time-Critical Multicast for Clustered Applications.
Mahesh Balakrishnan, Stefan Pleisch and Ken Birman.
In NCA 2005: 5th IEEE International Symposium on Network Computing and Applications, Boston, MA, July 2005.
[paper]