Thursday, September 24, 2015

This Year's Andreessen-Horowitz Meeting

Andreessen Horowitz had another Academic Roundtable, which I've written about in previous years (here, here). 

For me, the most exciting session was on Virtual and Augmented Reality (VR/AR).  It made me sit up and take notice. 

Matthew Turk started with an introduction, likening augmented reality systems to giving people superpowers.  Then he discusses his Augmented Reality system for remote communication.  Imagine something wrong with your car;  you call up a repairperson with your phone.  Instead of just sending video, your phone also sends information building up a 3-D model of what's being viewed.  So the repairperson can manipulate that 3-D model, essentially poking around the car looking at things independently, and then sending you annotated information (like "flip this switch", and circling the switch in an image from the model).  It looked pretty useful.

Steven Seitz then talked about Google's Jump and Cardboard projects.  Jump is a relatively cheap circular camera system that lets you take 360 degree "surround-sound" type video.  Cardboard let's you take your phone, put it in an Viewmaster style cardboard box, and view these 360 degree videos.  I got to try it, and it's amazing;  you really feel like you're in the middle of a scene.  It's "immersive", which appears to the one of the buzzwords for VR/AR experiences.  A scene when you're in barn surrounded by horses in their stalls really had the feel of being in a barn surrounded by horses.  You have to be careful -- you tend to turn and walk a bit as you're in the scene, which can be a recipe for trouble (like walking into a pool with your cell phone).   I didn't realize how "soon" this type of VR experience will be mainstream.

Derek Belch of STRIVR talked about his company, which is bringing virtual reality systems to sports training.  So the quarterback can practice passing plays, getting the mental training without having to have a team physically out on the field.  Besides showing off their innovative system, he talked about building a company in this space. 

There were some other great talks -- several others, but to name just a couple, Michael Jordan of Berkeley talked about the theory of combining differential privacy with statistical inference, and Jason Mars of the University of Michigan talked about Sirius, their open-source intelligent personal assistant  

The meeting always makes the whole startup idea seem very exciting.  However, during Marc Andreessen's fireside chat session (no actual fire), they said the dominant failure case for failed startups coming out of universities is where a professor starts a company on an interim basis, and tries to hand it off to others.  There is no substitute for the "core team" being full time devoted to the startup.  (The "core team" doesn't have to include the professor, it could be students, of course.)   

The fireside chat also talked about CS education, and what it meant that CS was becoming mainstream.  How do we design a more open CS education, so people know enough to use and develop/create things on computers without being "computer scientists"?  How should we think of movements like the recent activity in New York, where Mayor de Blasio has said that all the city's public schools will be required to offer computer science to all of their students? 

Overall, again, I'm glad I had the chance to go.

[Note:  Andreessen-Horowitz doesn't pay me anything, or even suggest I blog about their meeting.  But they do pay for expenses for people they invite.]  

Thursday, September 03, 2015

One Lecture Down....

CS125, the "new", "honors-ish" Algorithms and Complexity course, got off to a good start today.  The room was full with not enough seats for people, the students asked good questions and responded well to questions asked, and we got through the amount of material I expected.  It's year 2 of the class, which is easier than year 1 in some ways (lots of material prepared), and possibly harder in others (some thinking about what needs to be tweaked or fixed from year 1).  We'll see how things shake out next week, but I'm expecting we'll be in the 30-40 student range, like last year.  I can never tell if I managed to scare students off or make them want to take it.  (The challenge is that I want to do both;  scare off people without sufficient background, but interest students who do but might not realize it and might not even be Computer Science majors.)  Pleasantly, I felt very excited during and after the lecture, and will try to hold on to that positive energy.  

In other, much much stranger news, Harvard's CS50 appears to have a "backlash" movement, as described in today's Crimson article.  Apparently, according to some, there's intense "social pressure" to take CS50, and students need to be told that it's OK not to take the class.  I find this quite odd and, from my vantage point, misguided.  (Of course, I'm not a college freshman.)  I can't recall any such organized movement against Economics 10 at Harvard, which has been for decades now the most popular class at Harvard, although even when I was a student there was something one could potentially call cult-like about it.  (Cult of Wall Street....)  But that didn't mean people complained;  if you didn't want to take the class, you didn't, not really a thing.  Sure, the CS department here has been actively trying to attract students for decades -- CS50 was a good-sized class even before David Malan took over -- and David has just been very successful at it, with a combination of organization, interesting material, vision, and, yes, good marketing.  Naturally, here in CS, we believe that in our idealized world nearly all undergraduates would have a CS course as part of their liberal arts education, and we provide several other entry courses besides CS50.  I was initially thinking the movement described in the article was just a joke, and maybe I'm being April Fooled, but I'm not sure where those responsible are coming from.

And speaking of bringing in students to CS, Ben Golub and Yaron Singer are doing a new Econ-CS class at Harvard (counts for either;  also good for Applied Math majors) simply entitled Networks.  I'm a bit jealous -- this is a class I've thought about teaching also, but was busy and happy teaching algorithms -- but hopefully now that they've started it up it means I'll get a chance to teach it some year(s) down the line.

More insight into whether our enrollment numbers are still rising (is that still possible?) next week...


Tuesday, August 25, 2015

CACM Viewpoints on Theory and Experiments

There's a fun pair of viewpoints in the September CACM by Jeffrey Ullman and myself on experiments in computer science research, with him addressing systems conferences(/people) being far too focused on experiments as the research validation methodology, and me addressing theory conferences(/people) being almost strangely averse to experimental results.  (This link may bring you a digital version of his viewpoint, and this link to mine.)  I hope they might be interesting reading or food for thought.  As someone who works in both camps, I find this separation -- which we both seem to think is growing -- worrisome for the future of the CS research community.   

We actually wrote these up about a year ago (or maybe longer).  Jeff wrote something on the topic on Google+, and I responded.  I think he got drafted into writing something for CACM, and then I got drafted in later.  There was a pretty thorough reviewing process, with a couple of back and forth rounds;  then there was a non-trivial wait for publication.  This seems OK to me -- I'm glad CACM has a non-trivial queue of items for publication.  Overall it was a thorough and reasonably pleasant publication experience, and it's appealing that CACM offers a platform for these types of editorial comments.

Friday, August 21, 2015

SIGACT Meeting, Some Stuff

As some of you know, I was recently elected to the position of SIGACT (ACM Special Interest Group on Algorithms and Computation Theory) Chair.  So some part of this blog will be devoted to issues related to SIGACT for the next few years.  Comments and opinions are, naturally, extremely welcome.  

I'm on the train back from an orientation meeting for new chairs, and the SIG Board meeting.  Here are some things mostly from the orientation meeting, in a semi-random order.

1)  For people who have issues with ACM publication practices, some news.  First, they've moved to a model where conferences can maintain links to papers so that they are freely accessible.  STOC is/will be doing this, for example, so there will be a "permanent" STOC conference page each year with links to the STOC papers.  Similarly, authors can generate an "Author-izer" link for their web page giving people access to each of their ACM papers.  (I think there is a limit there of one such link per author.)  In any case, making publications more freely accessible is an issue ACM is dealing with, and the direction appears quite positive.

2)  There's other publication issues being considered, including whether/how conference papers can be published as journal articles in computer science (what should be the conference-journal relationship?), and some other forward thinking about publication models.  [There will be editorial and some viewpoints in the September CACM issue.]

3)  On the more mundane side, ACM is re-doing its website.  There is a preview at , and they are desirous of feedback.  (On most browsers, there should be a feedback link over on the right hand side of the page.)

4)  The ACM is starting to do book series.  You can get more information at , and they are looking for authors.

5)  The ACM has a student research competition (beyond the CRA stuff) -- information at .  We may look to see if we can incorporate this into STOC (or another theory conference).

6)  For those interested in logic and computation, or formal methods in computer science (broadly defined), you may be interested in (and not yet have heard about) SIGLOG (The ACM Special Interest Group on Logic and Computation), which formally came into being about a year ago.  Please consider joining.  

There was plenty of other stuff, but it was primarily administrative information that would bore you.

Friday, July 03, 2015

The High Cost of Conferences

At some point, I'm convinced the "conference structure" is going to fall apart.

Case in point -- I haven't bought my tickets yet for SIGCOMM because, unless I'm missing something, a schedule isn't up yet, and unfortunately, because ACM has scheduled a SIG chair meeting overlapping with SIGCOMM (which I don't understand also, but perhaps beside the point), I want to see what's going on when at the conference to plan my timing.   

6+ weeks out, round trip tickets from Boston to London are over $2500 on nonstop economy flights.  And those don't seem to be on US carriers;  since I have to stop back through New York for this other meeting, and I need to find a US carrier (or figure out if this is a case where it's an exception to what is the currently believed NSF policy), tickets look to be well over $3000.  Then there's registration, hotel, etc. 

At some point, this becomes unsustainable, I think.

Monday, June 15, 2015

SoCG Proceedings

The 31st International Symposium on Computational Geometry has its proceedings available online here.

I point this out because the SoCG proceedings were managed by LIPIcs, the Leibniz International Proceedings in Informatics.  I've been on the editorial board for LIPIcs for several years.  The goal was to establish an open but professional publication mechanism that would be affordable in comparison to what was being offered by standard publishing programs.  From the announcement in 2009:

Schloss Dagstuhl Leibniz Center for Informatics (LCI) establishes a new series of conference proceedings called Leibniz International Proceedings in Informatics (LIPIcs). The objective of this series is to publish the proceedings of high-quality conferences in all fields of computer science, and LCI institutes an Editorial Board to oversee the selection of the conferences to be included in this series.

The proceedings in the LIPIcs series will be published electronically and will be accessible freely and universally on the internet, keeping the copyrights of the authors, and under an open access license guaranteeing free dissemination. To face the cost of electronic publication, a one-time fee will be required from the conference organizers. This fee will be kept to a minimum, thought to cover the costs of LCI, thanks in particular to a sharing of the workload between LCI and the conference organizers. 

LIPIcs has been, I think, growing in visibility and success, in terms of attracting more conferences.  If you're looking for a different approach to publishing proceedings for your conference that seems more in line with what many I've heard want from their conference proceedings, I encourage you to take a look.  It may not be for everyone, but we expect it may be useful for more conferences than are currently using it.  

Wednesday, June 03, 2015

More Good News for SEAS

About six months ago, I was able to point to a nice gift from Steve Ballmer for Computer Science at Harvard.  Today, more good news, this time for all of the Harvard School of Engineering and Applied Sciences:  a $400 million gift from John Paulson, which (at least for now) is the largest donation to Harvard in its history. 

Not surprisingly, there's extensive new coverage, starting with the Harvard Gazette, the Harvard Crimson, and the New York Times.

While I'm not aware of the details of the effects on our budget, it's clearly great for us long-term, and will help us continue to innovate and grow.   

Also, for those interested, a recent Q&A with Harry Lewis from the Crimson, who's been acting as Interim Dean for SEAS.

Monday, May 18, 2015

What makes a well-attended conference? (STOC 2017 Discussion)

I've always been impressed how the SIGCOMM conference accepts a very small number of papers, but many hundreds of people attend the conference.  (This is true for some other conferences as well;  SIGCOMM is just he one I happen to know best.)  The conference is somehow an "attend-if-possible" venue for the networking community.  Most other large conferences I know work instead by accepting a lot of papers.  ISIT is a week-long event with 9 parallel sessions (in 2014).   Location may certainly be helpful;  ISIT was in Honolulu in 2014 and Hong Kong in 2015;  SIGCOMM will be in London this year.  But it also isn't decisive;  the Allerton conference has been steadily growing, so it's a few hundred people, even though Urbana-Champaign, where the conference is held, really can't be the biggest draw.  (No offense to those who are there -- I always enjoy visiting!)  In fact, really, at this point, Allerton has been so successful it has seemingly outgrown the Allerton conference center!

If you have ideas for what makes a conference a "must-attend" venue, I'd be interested in hearing them.  The question has arisen because of a Windows on Theory post about changing the format for STOC 2017;  for those of you on the theory side who haven't seen it, I'd recommend looking at the post and registering an opinion by comment or by e-mail to the appropriate parties, after you've considered the issue.  The underlying question is what role should the STOC/FOCS conferences play in the world these days, and how might we change things to get them to play that role.  I think another way of framing this is that I don't think STOC/FOCS aren't really "attend-if-possible" venues for much of the theory community -- particularly for certain subareas -- and the question is whether this can change.  Again, I'd be happy for insights from other communities as well.  

There are a large number of comments already.  I'd say in terms of my personal opinion on what to do I'm most aligned with Alex Lopez-Ortiz and Eric Vigoda.  Alex's point is straightforward -- accept more papers.  I've been in the accept more papers camp for a while (a very old post on the topic suggests one way it could be done), but for some reason there seems to be huge numbers of people in the theory community against the idea, based on some sort of quality argument. For me, the tradeoffs are pretty simple;  I generally prefer not to travel, so I need a good reason to go to a conference, and one very good reason is that I have a paper in the conference that I or a colleague is presenting.  (I recognize this is not true for everyone, but being selective in travel is an issue that arises for people with families, or other regular obligations.)  Clearly conferences such as SIGCOMM show that there are other paths possible, but I'm not clear on how to reproduce that kind of buy-in from the theory community.  Eric's suggestion was that lots of theory conferences happen during the summer, so instead of trying other ways to create a larger "STOC festival", why not co-locate a number of conferences (that would have parallel sessions) that would get more of the theoretical computer science community together.  That makes sense to me, and seems like it could be particularly beneficial for some of the smaller conferences. 

But my personal opinion, while undoubtedly fascinating to some (most notably myself), is really not the point.  The point is for the community to examine the current situation, think if there are ways to make it better, and then act on what they come up with.  The bet way for this to happen is if people actively participate in discussions.   So please, go opine!  There's no reason the post shouldn't have 100+ comments, so comment away.  

Friday, May 15, 2015

New Work on Equitability and MIC

A few years ago, working with a diverse group of scientists, I was looking at a problem related to big data analysis.  The setting was data exploration, where you are working with a high-dimensional dataset, and you are seeking pairs of dimensions that are related in interesting but a priori unknown ways;  that is, maybe there's a linear relationship, but maybe there's a more complicated relationship (sinusoidal, parabolic, etc.), and you don't know what you're looking for in advance.

One statistical approach is to throw a so-called measure of dependence at the problem, which will tell you whether variables are related or unrelated.  But in big data sets, you may expect to have lots of dependent pairs;  what you really want is not just a "dependent-independent" test, but a scoring mechanism that ranks the extent of dependence appropriately over a wide range of possible relationships.  This methodology seemed underserved by the literature.  We dubbed what we were aiming for "equitability", and designed a statistic we dubbed MIC (maximal information coefficient), a "bucketed" version of mutual information, that seemed to be a good ranking mechanism for equitability.  The work appeared in Science some years back (the link here will allow you to access it). 

The work has, I think, been quite successful -- a number of people have used MIC in their research to analyze data sets, and the idea of equitability seems to be catching on.  But after it came out, there were some questions and issues raised primarily by statisticians.  We had always planned to go back and revisit some of these issues, and, after various delays caused by life, a group of us started back on it again.  The project seemed to balloon, and we've been working on and off on it for at least a year now.  (Some earlier, initial drafts pointing to where we were going are on the arxiv.)   Finally, we've reached a stopping point, and we've just put up 3 papers on the arxiv, all of which are now being submitted to various journals.  In this post, I'll outline what the papers cover, after the links.
  1. Equitability, interval estimation, and statistical power. arXiv

  2. Measuring dependence powerfully and equitably. arXiv

  3. An Empirical Study of Leading Measures of Dependence. arXiv 
Really, there seemed to be 3 issues that people wanted to hear more about after the initial work, and each topic ended up being "large" enough that it merited its own paper.

First, people wanted further theoretical foundations for equitability.  Our Science paper, being for a general audience, provided an intuitive definition, but we didn't define it as one would in a mathematics paper.  (That was, I should be clear, intentional.)  We thought our definition was pretty clear in plain English, but for some it was not sufficient;  one group, in particular, seemed to ignore our explanations when in their own work they added restrictions to "equitability" and incorrectly ascribed them to us. 

So the first paper formalizes this notion of equitability.  As is usually the case, formalizing it turns out to be helpful.  In particular, it shows that there are (at least) two natural ways of thinking about equitability.  One way formalizes our original conception that an equitable statistic is a statistic that gives similar scores to relationships with the same amount of noise, even if the relationships are of different types.  But a different way of viewing the formalization shows that equitability can naturally be seen as an extension of statistical power against a null hypothesis of independence (i.e., relationship strength = 0), to power against null hypotheses representing all levels of dependence (i.e. relationship strength <=x for any x).  This view clarifies the relationship between equitability and power against independence, showing the former generalizes the latter.  In our new papers we have some really nice "heat-map" style visualizations for viewing equitability performance results based on this view that I think are really useful in thinking about and understanding equitability in the paper.  

Second, people wanted faster, better versions of our proposed scoring function, MIC.  We ended up going back to the theory of MIC, examining further its relation to mutual information, and, perhaps unsurprisingly, doing so allowed us to make tangible practical advances.  We ended up with algorithms for computing MIC that are both faster and more accurate.  (Our original algorithm in the Science paper used a heuristic approach based on dynamic programming to approximate the MIC score from the data;  our new approaches have both improved speed and accuracy.)  We're hopeful that people using MIC will be able to switch over time to these improved algorithms.  Connecting to the first paper, in re-examining MIC we also developed a variation of MIC (called TIC, for total information coefficient) that is better designed for achieving power against independence as opposed to equitability.  We are hopeful that TIC may prove useful to people on its own, as well as in conjunction with MIC. 

Third, people just wanted to see more comparisons.  Well, really, here I think everyone just has their own favorite measure of dependence, and before they go switching to this new-fangled thing that appeared in Science of all places, they want to see more.  Specifically, subsequent to that initial paper, there were people concerned about the power against independence of our methods -- although, again, to be clear, equitability is a larger notion that power against independence, so one might expect a method designed for equitability would not have as high power against independence as methods designed specifically for that purpose.  Also, there were some people who claimed, based on very limited experiments, that mutual information estimation would be more equitable than MIC.

So the third paper is a large-scale empirical study.  In terms of equitability, we find that MIC still seems to be the current best measure.  (Mutual information sometimes does well -- in some very particular circumstances, it can do better than MIC, which is not especially surprising since MIC is a "scaled" version of mutual information -- but MIC still performs much better overall.)  In terms of power against independence, we find that MICs power was underrated in previous studies.  The discrepancy seemed to be that previous studies used MICs default parameter settings, which were designed for equitability performance, not power against independence;  using different default parameters yields substantially better power against independence.  (The new algorithms in the second paper that yield more accurate calculations improve the power further.)  Finally, the TIC measure described in the second paper does even better, is easily computed when computing MIC scores (and so has negligible overhead if computing MIC scores), and appears to be comparable to other state-of-the-art measures in terms of power against independence.  We also find that the new ways of computing MIC are indeed faster and more accurate than previous methods. 

We're hoping these works, collectively, move the ball forward on this topic.  We like seeing MIC being used, and hope people will start using it more when analyzing their large dimensional data sets.  To be clear, perhaps tomorrow we will find that there are better scoring mechanisms than MIC for equitability;  perhaps even better mutual information estimators will suffice.  That would be great!  (I think of it like clustering algorithms;  there are lots of good clustering algorithms, different ones may be better suited to different situations.  The more the merrier.)  We also think equitability continues to be a useful framework, and that there's more to be done with equitability.  Though for now our group may again take a breather on this topic, and see what arises from our current work.