Automated Scalable Bayesian Inference via Data Summarization
Abstract:
Bayesian methods are attractive for analyzing large-scale data due in part to their coherent uncertainty quantification, their ability to model complex phenomena, and the ease with which they incorporate expert information. However, many standard Bayesian inference algorithms are computationally expensive, so their direct application to large datasets can be difficult or infeasible; other standard algorithms sacrifice accuracy in the pursuit of scalability. We take a new approach. Namely, we leverage the insight that data often exhibit approximate redundancies to obtain a weighted subset of the data (called a "coreset") that is much smaller than the original dataset. This small coreset can then be used as input to existing Bayesian inference algorithms without modification. We provide theoretical guarantees on the size and approximation quality of the coreset; in particular, we show that our method provides geometric decay in posterior approximation error as a function of coreset size. We validate the method on both synthetic and real datasets, demonstrating that it reduces posterior approximation error by orders of magnitude relative to uniform random subsampling.
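To make the coreset idea concrete, the following is a minimal sketch in Python/NumPy of the general recipe: draw an importance-weighted subset of the data and hand its weighted log-likelihood to any downstream inference algorithm. The model (logistic regression), the crude sensitivity proxy, and all function names here are illustrative assumptions; the construction described in the talk, which achieves the geometric error decay mentioned above, is more sophisticated than the simple importance subsampling shown below.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_coreset(X, y, m, rng):
    """Draw an importance-weighted subset ("coreset") of m points.

    Points are sampled with probability proportional to a crude
    sensitivity proxy (here, the data norm); each sampled point gets
    weight 1 / (m * p_n), so the weighted log-likelihood on the coreset
    is an unbiased estimate of the full-data log-likelihood.
    """
    n = X.shape[0]
    sens = np.linalg.norm(X, axis=1) + 1.0   # illustrative sensitivity proxy
    p = sens / sens.sum()                    # sampling probabilities
    idx = rng.choice(n, size=m, replace=True, p=p)
    w = 1.0 / (m * p[idx])                   # importance weights
    return X[idx], y[idx], w

def weighted_loglik(theta, X, y, w):
    """Weighted logistic log-likelihood: sum_n w_n * log p(y_n | x_n, theta)."""
    z = X @ theta
    return np.sum(w * (y * z - np.log1p(np.exp(z))))

# Large synthetic dataset.
N, d = 100_000, 5
X = rng.normal(size=(N, d))
theta_true = rng.normal(size=d)
y = (rng.random(N) < 1.0 / (1.0 + np.exp(-X @ theta_true))).astype(float)

# Summarize with 500 weighted points; any existing inference algorithm
# (e.g. MCMC) can now target weighted_loglik on the coreset instead of
# the full dataset.
Xc, yc, w = make_coreset(X, y, m=500, rng=rng)
print(f"full-data log-lik: {weighted_loglik(theta_true, X, y, np.ones(N)):.1f}")
print(f"coreset estimate:  {weighted_loglik(theta_true, Xc, yc, w):.1f}")
```

The key design point the sketch illustrates is that the summarization step is decoupled from inference: only the likelihood is reweighted, so the downstream sampler or variational method needs no modification.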