## February 19, 2012

### "From Data to Knowledge: Machine-Learning with Real-time & Streaming Applications" (Dept. of Signal Amplification)

Attention conservation notice: Intellectuals gathering in Berkeley to argue about "knowledge" and "revolution".

This looks like fun, and if I didn't have conflicting obligations I'd definitely be there.

#### From Data to Knowledge: Machine-Learning with Real-time & Streaming Applications

May 7-11 2012
On the Campus of the University of California, Berkeley

We are experiencing a revolution in the capacity to quickly collect and transport large amounts of data. Not only has this revolution changed the means by which we store and access this data, but it has also caused a fundamental transformation in the methods and algorithms that we use to extract knowledge from data. In scientific fields as diverse as climatology, medical science, astrophysics, particle physics, computer vision, and computational finance, massive streaming data sets have sparked innovation in methodologies for knowledge discovery in data streams. Cutting-edge methodology for streaming data has come from a number of diverse directions, from on-line learning, randomized linear algebra and approximate methods, to distributed optimization methodology for cloud computing, to multi-class classification problems in the presence of noisy and spurious data.

This conference will bring together researchers from applied mathematics and several diverse scientific fields to discuss the current state of the art and open research questions in streaming data and real-time machine learning. The conference will be domain driven, with talks focusing on well-defined areas of application and describing the techniques and algorithms necessary to address the current and future challenges in the field.

Sessions will be accessible to a broad audience and will have a single track format with additional rooms for breakout sessions and posters. There will be no formal conference proceedings, but conference applicants are encouraged to submit an abstract and present a talk and/or poster.

See the conference page for submission details, schedules, etc.

Via conference organizer and CMU alumnus Joey Richards.

Posted by crshalizi at February 19, 2012 12:44 | permanent link

### Talks Next Week

Attention conservation notice: Only of interest if you (1) like hearing people talk about statistics and machine learning, and (2) will be in Pittsburgh next week.

I have been remiss about advertising upcoming talks.

Mark Davenport, "To Adapt or Not To Adapt: The Power and Limits of Adaptivity for Sparse Estimation"
Abstract: In recent years, the fields of signal processing, statistical inference, and machine learning have come under mounting pressure to accommodate massive amounts of increasingly high-dimensional data. Despite extraordinary advances in computational power, the data produced in application areas such as imaging, remote surveillance, meteorology, genomics, and large scale network analysis continues to pose a number of challenges. Fortunately, in many cases these high-dimensional signals contain relatively little information compared to their ambient dimensionality. For example, signals can often be well-approximated as sparse in a known basis, as a matrix having low rank, or using a low-dimensional manifold or parametric model. Exploiting this structure is critical to any effort to extract information from such data.
In this talk I will overview some of my recent research on how to exploit such models to recover high-dimensional signals from as few observations as possible. Specifically, I will primarily focus on the problem of estimating a sparse vector from a small number of noisy measurements. To begin, I will consider the case where the measurements are acquired in a nonadaptive fashion. I will establish a lower bound on the minimax mean-squared error of the recovered vector which very nearly matches the performance of $\ell_1$-minimization techniques, and hence shows that these techniques are essentially optimal. I will then consider the case where the measurements are acquired sequentially in an adaptive manner. I will prove a lower bound that shows that, surprisingly, adaptivity does not allow for substantial improvement over standard nonadaptive techniques in terms of the minimax MSE. Nonetheless, I will also show that there are important regimes where the benefits of adaptivity are clear and overwhelming.
Time and place: 4--5 pm on Monday, 20 February 2012, in Scaife Hall 125
Ambuj Tewari, "From Probabilistic to Game Theoretic Foundations for Learning and Prediction"
Abstract: The probabilistic approach to prediction problems assumes that the data is generated from an underlying stochastic process. A reasonable goal then is to minimize the expected loss, or risk. The game theoretic approach, in contrast, views prediction as a repeated game between the learner and an adversary. The learner's goal then is to do well no matter what strategy is followed by the adversary. Minimizing regret is one of the well known ways to operationalize the notion of doing well. With a long history in varied disciplines such as Computer Science, Economics, Information Theory, and Statistics, the game theoretic approach has witnessed a vigorous development. Yet the suite of standard tools available for the probabilistic setting, such as Rademacher & Gaussian averages, covering numbers, and combinatorial dimensions, was missing in the game theoretic setting. In this talk, I will show how it is indeed possible to develop analogues of these tools for the game theoretic setting. Unlike the probabilistic setting, where empirical risk minimization is a canonical algorithm, we will not be able to exhibit a corresponding canonical algorithm for the game theoretic setting. However, under the additional assumption of convexity, I will show that Mirror Descent, a classic algorithm from optimization theory, is a canonical algorithm achieving minimax regret rates.
(Talk is based on papers written jointly with Alexander Rakhlin, Nathan Srebro, and Karthik Sridharan.)
Time and place: 10--11 am on Wednesday, 22 February 2012, in Gates Hall 6115
Forrest W. Crawford, "Birth, Death, Sex, Lies: Markov Counting Processes in Genetics and Beyond"
Abstract: A general birth-death process (BDP) is a continuous-time Markov chain that counts the number of particles in a system over time. At any moment in time, a particle may give birth or die, and the rate at which these events occur depends on the number of particles in the system at that time. While widely used in population biology, genetics, and evolution, statistical inference techniques for general BDPs remain elusive. In fact, the likelihood of a discrete observation from many of these processes cannot be written in closed form. In this talk, I outline several fundamental results that allow computation of transition probabilities and maximum likelihood estimates for general BDPs. I apply these novel methods to three important applied problems. First, I describe a technique for determining the effect of antibody treatment on the growth of lymphoma cells in vitro. Second, I investigate the evolution of DNA microsatellites in humans and chimpanzees using a log-linear model for the rates of repeat duplication and deletion. Finally, I use a BDP to infer true counts of sex acts from rounded self-reported counts in a longitudinal study of risky behaviors in young people living with HIV. These applications illustrate the mathematical, statistical, and computational challenges involved in learning from BDPs in biology, medicine, and public health.
Time and place: 4--5 pm on Wednesday, 22 February 2012, in Scaife Hall 125
Ron Bekkerman, "Scaling Up Machine Learning"
Abstract: In this talk, I'll provide an extensive introduction to parallel and distributed machine learning. I'll answer the questions "How actually big is the big data?", "How much training data is enough?", "What do we do if we don't have enough training data?", "What are platform choices for parallel learning?" etc. Over an example of k-means clustering, I'll discuss pros and cons of machine learning in Apache Pig, MPI, DryadLINQ, and CUDA. Time permitting, I'll take a dive into a super large scale text categorization task.
Time and place: 1:30--2:30 pm on Thursday, 23 February 2012, in Newell-Simon Hall 1305
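
For readers who have not met a birth-death process before, here is a minimal simulation sketch of the definition in Crawford's abstract above: a continuous-time Markov chain on counts, where the rates of the next birth or death depend on the current count. It is not taken from the talk and does none of the inference the talk is about. The function name `simulate_bdp`, the linear rates, and the parameter values are all assumptions made up for illustration.

```python
import random


def simulate_bdp(n0, birth_rate, death_rate, t_max, rng=None):
    """Simulate one path of a general birth-death process up to time t_max.

    birth_rate(n) and death_rate(n) are the total rates of a birth or a
    death when n particles are present (hypothetical user-supplied functions).
    Returns a list of (time, count) pairs, starting with (0.0, n0).
    """
    rng = rng or random.Random()
    t, n = 0.0, n0
    path = [(t, n)]
    while True:
        lam = birth_rate(n)
        mu = death_rate(n) if n > 0 else 0.0  # no deaths from an empty system
        total = lam + mu
        if total <= 0.0:
            break  # absorbing state: nothing can happen any more
        t += rng.expovariate(total)  # exponential waiting time to next event
        if t > t_max:
            break
        # The next event is a birth with probability lam / total, else a death.
        n += 1 if rng.random() < lam / total else -1
        path.append((t, n))
    return path


# Illustrative linear rates (assumed, not from the abstract): each particle
# gives birth at rate 0.5 and dies at rate 0.3, independently of the others.
path = simulate_bdp(
    10,
    birth_rate=lambda n: 0.5 * n,
    death_rate=lambda n: 0.3 * n,
    t_max=5.0,
    rng=random.Random(0),
)
```

Simulating forward like this is the easy direction. The hard part, and the subject of the talk, is going the other way: computing transition probabilities and likelihoods when the process is only observed at discrete times.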

As always, the talks are free and open to the public.

(You see why I have trouble keeping up with these.)

Posted by crshalizi at February 19, 2012 12:30 | permanent link

Three-Toed Sloth:   Hosted, but not endorsed, by the Center for the Study of Complex Systems