Joseph M. Hellerstein

Jim Gray Professor of the Graduate School
EECS Computer Science Division
UC Berkeley

Research Interests

My research focuses broadly on data-oriented systems and the way they drive computing. Recently this includes distributed programming models, serverless computing, distributed consistency and isolation, data management for machine learning and data science, interactive data visualization and transformation, and query processing.

My research is driven by collaborations with colleagues in a wide variety of fields including Programming Languages, Human-Computer Interaction, AI, Networking, Security, and Theoretical Computer Science.

I am co-director of the

Current Projects

Distributed Systems: The Hydro project is developing new techniques for the programmable cloud. Sub-projects include:

  • Anna: an any-scale, multi-tier autoscaling key-value store.
  • Cloudburst: stateful functions-as-a-service.
  • Cloudflow: a dataflow DSL for prediction serving pipelines.

Data Management for Machine Learning: The machine learning lifecycle presents many data management problems.

  • FLOR is a system for hindsight logging of ML training pipelines.

Interactive Data Visualization: Data visualization systems merge language design, data processing and asynchronous event processing in service of human-centric data interaction. Current projects include:

  • B2 is a Jupyter extension bridging code with interactive visualization.
  • DIEL is an interactive visualization framework for handling asynchrony.

Past Projects

: Data Context Services.

BOOM and : Orders Of Magnitude simpler code for the Cloud.

d^p ("deep"): Data to the People, led to Trifacta, Captricity and MADlib.

BayesStore: Probabilistic data management.

Declarative Networking and the P2 system.

Querying, monitoring, and networking using wireless sensor networks.

PIER: A peer-to-peer query engine based on distributed hash table (DHT) overlay technologies.

Telegraph: An Adaptive Dataflow System for networked data and services.

TinyDB: A query processing engine for ad-hoc wireless sensor networks.

CONTROL: Interactive Analysis of Massive Datasets, including online aggregation, online data cleaning (Potter's Wheel), online data mining and scalable spreadsheets.

GiST: Generalized Indexing (GiST for PostgreSQL, libgist), Access Method Profiling and Debugging (amdb), and Indexability.

Open Source Software


Last modified: 2020/12/02 08:24:20 by Joe Hellerstein