Courses
Research
My research focuses broadly on data-oriented systems and the way they drive computing. This spans topics in database systems, distributed computing, data visualization, machine learning and programming languages.
More information on current and past research here.
Industry
In 2020 I co-founded RunLLM, an LLM-powered assistant for technical support. RunLLM grew out of research in the RISELab and SkyLab with my colleague Joseph Gonzalez and PhD students Vikram Sreekanti and Chenggang Wu.
In 2012 I co-founded Trifacta, a cloud data engineering platform that runs on AWS, Azure and GCP (where it also powers Google Dataprep by Trifacta). Trifacta is based on prior research in my group on interactive data preparation [1, 2, 3]. Trifacta was acquired by Alteryx in 2022.
Greenplum was a parallel data warehouse startup based on PostgreSQL. I served as an advisor, collaborator (on Apache MADlib) and briefly as Engineering Manager at the startup, up through its acquisition in 2010 by EMC, where I also served as a technical advisor.
Intel Research Berkeley was an experimental lablet that colocated Intel staff researchers next to Berkeley faculty and students to do "off-roadmap" experimental research. I took a leave to join Intel as Lab Director from 2003-2005, leading research topics in networking, distributed systems, databases, IoT and human-computer interaction.
Selected Talks
- Hydroflow: A Compiler Target for Fast Correct Distributed Programs. Keynote, ACM SPLASH (OOPSLA) 2023. [pdf]
- Toward a Programmable Cloud: Foundations and Challenges. Keynote, ACM POPL 2021. [pdf]
- A Data-Centric Lens on Cloud Programming and Serverless Computing. Keynote, ICDE 2020. Distinguished Lecture, Darmstadt U, 2020. [pdf]
- Serverless Computing: One Step Forward, Two Steps Back. CIDR, 2019 [pptx] [pdf]
- Approximation and Interaction: A Progressive's View. Keynote, NSF ACAIA, 2017 [pptx], [pdf].
- Ground: A Data Context Service. CIDR 2017. [pdf]
- People, Computers, and the Hot Mess of Real Data. Keynote, ACM KDD 2016. [pdf]
- Progressive Systems, LinkedIn NYC 2015. [pdf]
- Dancing Calmly with the Devil, Keynote, ACM SoCC 2014. [pdf]
- Of Rocket Ships and Washing Machines: Data Technology for People, Keynote, Strata 2012. [video, 10:46]
- Keep CALM and Query On, RICON 2012, UCSD 2013, UCR 2013. [pdf] [video, 49:24].
- The Declarative Imperative: Experiences and Conjectures in Distributed Logic. Keynote, ACM PODS, 2010. [.key.zip], [pdf], [video]
- MAD Skills: New Practices for Big Data. VLDB, 2009. [pptx], [pdf]
- Quantitative Data Cleaning for Large Databases. Keynote, QDB, 2009. [.key.zip], [pdf]
- Bricolage: Data at Play. Keynote, ICDM 2007. [.key.zip] [.mov] [pdf]
- The Marvelous Structure of Reality. Keynote, WebDB 2003 [PDF], [.mov] [.key.sit]
Selected Publications
- Readings in Database Systems, 5th Edition. With M. Stonebraker and P. Bailis. [redbook.io]
- New Directions in Cloud Programming. With A. Cheung, N. Crooks and M. Milano. CIDR 2021. [pdf]
- Cloudburst: Stateful Functions-as-a-Service. With V. Sreekanti, C. Wu, X. Lin, J. Schleier-Smith, J. Gonzalez and A. Tumanov. VLDB 2020 [pdf]
- Towards Scalable Dataframe Systems. With D. Petersohn, W. Ma, D. Lee, S. Macke, D. Xin, X. Mo, J. Gonzalez, A. Joseph and A. Parameswaran. VLDB 2020 [pdf]
- A Fault Tolerance Shim for Serverless Computing. With V. Sreekanti, C. Wu, S. Chhatrapati, J. E. Gonzalez, and J. M. Faleiro. EuroSys 2020. [pdf]
- Deep Unsupervised Cardinality Estimation. With Z. Yang, E. Liang, A. Kamsetty, C. Wu, Y. Duan, P. Chen, P. Abbeel, S. Krishnan and I. Stoica. PVLDB 2019.
- Serverless Computing: One Step Forward, Two Steps Back. With J. M. Faleiro, J. Gonzalez, J. Schleier-Smith, V. Sreekanti, A. Tumanov and C. Wu. CIDR 2019. [pdf]
- Anna: A KVS For Any Scale. With C. Wu, J. M. Faleiro and Y. Lin. ICDE 2018. [pdf]
- Ground: A Data Context Service. With V. Sreekanti, J. Gonzalez et al. CIDR 2017. [pdf]
- Scalable Atomic Visibility with RAMP Transactions. With P. Bailis, A. Fekete, A. Ghodsi, and I. Stoica. TODS 2016. [pdf]
- Predictive Interaction for Data Transformation. With J. Heer and S. Kandel. CIDR 2015. [pdf]
- Logic and Lattices for Distributed Programming. With W. R. Marczak, P. Alvaro, N. R. Conway, and D. Maier. SoCC, 2012. [pdf]
- Enterprise Data Analysis and Visualization: An Interview Study. With S. Kandel, A. Paepcke and J. Heer. IEEE VAST, 2012. [pdf]
- Searching for Jim Gray: a technical overview (with D. L. Tennenhouse on behalf of a large team of volunteers). Commun. ACM 54(7), 2011. [pdf]
- Wrangler: Interactive Visual Specification of Data Transformation Scripts (with S. Kandel, A. Paepcke, and J. Heer). CHI 2011. [PDF]
- Data in the First Mile (with K. Chen and T. Parikh). CIDR 2011 [PDF].
- Consistency Analysis in Bloom: a CALM and Collected Approach (with P. Alvaro, N. Conway, and W.R. Marczak). CIDR 2011. [PDF]
- The Declarative Imperative: Experiences and Conjectures in Distributed Logic. SIGMOD Record 39:1, Sep. 2010. [pdf]
- Declarative Networking (with B. T. Loo, T. Condie, M. Garofalakis, D. E. Gay, P. Maniatis, R. Ramakrishnan, T. Roscoe and I. Stoica). Research Highlights, CACM 52(11), 2009. [Intro by Peter Druschel] [pdf].
- Architecture of a Database System (with M. Stonebraker and J. Hamilton). Foundations and Trends in Databases 1(2). [PDF]
- Implementing Declarative Overlays. (with B. T. Loo, T. Condie, P. Maniatis, T. Roscoe, and I. Stoica). In 20th SOSP, 2005. [PDF]
- TinyDB: An Acquisitional Query Processing System for Sensor Networks (with S. Madden, M. Franklin, and Wei Hong). ACM TODS. [PDF]
- Model-Driven Data Acquisition in Sensor Networks (with A. Deshpande, C. Guestrin, S. Madden and W. Hong). VLDB 2004 [PDF]
- Commencement Address. Computer Science, College of Letters and Science, UC Berkeley, May 26, 2002. [pdf]
- On a Model of Indexability and its Bounds for Range Queries (with E. Koutsoupias, D. Miranker, C. Papadimitriou, and V. Samoladas). JACM 49(1) (2002). [pdf]
- Potter's Wheel: An Interactive Data Cleaning System (with V. Raman). VLDB 2001. [PDF]
- Eddies: Continuously Adaptive Query Processing (with R. Avnur). SIGMOD 2000. [PDF] [PS].
- Interactive Data Analysis with CONTROL (with many others). IEEE Computer, August 1999. [PDF]
- Generalized Search Trees for Database Systems (with J. F. Naughton and A. Pfeffer). VLDB 1995. [PS]
Publications