Thursday, September 18, 2014

On Academia vs. Industry.... MSR SVC Closing

I'm one of those professor types that ends up defending the joys of life as an academic versus a career in industry -- some of you who read this blog have probably seen me comment at Matt Welsh's blog or Daniel Lemire's blog/Google+ chain.  And to be clear I'm not anti-industry, I just want to make sure there's a fair discussion.

In that vein, I'm sad to hear that Microsoft Research Silicon Valley will be closing, as described in this article.  I know many people at MSRSV, and have visited there often.  It's a loss to the community to have it close.  I have sympathy for those who work there -- this must be stressful -- but this sympathy is happily diminished because I know the people there are so talented, they will quickly move on to other employment.  (When Digital Systems Research Center closed long ago, a number of the people I worked with in the lab just moved on to some small company that seemed to be just starting to really get going, called Google.  I wonder how it worked out for them.) 

After a moment of silence for the lab, I do feel it necessary to point out that this is one of the "issues" in the academia vs industry life choice.  The life cycle for companies moves rapidly.  That's not a bad thing, but it's a thing.  Disruptions like this are a non-trivial risk in industry, much less so in academia.  (Just look at the history of research labs.)  Again, I'm sure it will work out for the people at MSRSV, but any life shift like this -- even if it ends positively -- is stressful.  Without wanting to overstate the consequences, it's worth pointing out.

Monday, September 15, 2014

Simultaneous Enrollment

The Crimson has an op-ed on simultaneous enrollment, that I agree with.

Harvard does not like simultaneous enrollment, which means a student taking two classes that meet at the same time -- any time overlap counts (whether the whole class or half an hour once a week).  If you want to take a class via simultaneous enrollment, you have to petition the Administrative Board, and your professor is supposed to provide direct hour-per-hour instruction for the class you can't intend.  As a previous Crimson article states:
The Faculty Handbook requires that “direct and personal compensatory instruction” for simultaneous enrollment, but only recently has the Ad Board refused to recognize videotaped lectures as a stand-in for class time.
The article references that for the past several years the Ad Board has accepted recorded lectures, under some additional conditions, as a suitable proxy for the direct and personal compensatory instruction.  This apparently represented a change from their past position, and this last year, while I was on sabbatical, some Standing Committee on Education Policy decided to push back and say no more recorded substitutions. 

(An aside:  I don't know why they used the dated term "videotaped".  My lectures have been recorded and made available online;  these days, I'm not sure any "videotape" is actually involved in the process.)  

I believe I had a great deal to do with the Ad Board accepting recorded lectures the last several years.  My undergraduate class, CS 124, was recorded with very high quality production for the Harvard Extension School, and I made this video available to the students.  Some years ago, students starting approaching me wanting to do simultaneous enrollment for CS 124, and I thought it was fine, because of the availability of class lectures.  (That was, for instance, how the Extension students were taking the class.)  In particular, every year some number of students seemed to want to take CS 124 and a conflicting math class that overlapped, which made it very difficult for a small number of students who wanted to pursue both CS theory and mathematics.  (Both classes were "gateways" to more advanced classes;  at least for me, changing my class time would have just introduced other more challenging time conflicts;  neither class appeared poised to change times.)  But other class overlaps came up as well. 

I initially faced some opposition from the Ad Board when I supported the students' petitions for simultaneous enrollment because I suggested the recorded lectures were a suitable proxy.  They objected, and I objected to their objection.  After some back and forth, we found language which we managed to agree met the language of the faculty rules, with the outcome being the students could, in the end, rely on the recordings of class lectures.  I admit, I believe it helped that I had served on the Ad Board, and had a good working relationship with them, so they were willing to work with me to come to a mutually acceptable solution.  The framework we established was later used for CS 50 and other computer science classes, because I passed on my successful approach with the Ad Board to others.  (David Malan asked me for assistance on the issue, for example.)  There was, however, always some tension, with the Ad Board continuously questioning whether using class recordings to justify simultaneous enrollment was suitable.  Meanwhile, the number of such petitions kept growing. 

Apparently, while I was sabbaticalling, various powers that be have acted to tighten the rules for simultaneous enrollment, and disallow my previous framework.  Interestingly, CS 50 -- the class with the most requests for simultaneous enrollment -- seems to have acquired a blanket exception, which I am happy for, but still leaves the rest of us facing unhappy students who often have to deal with ugly situations.

What I find frustrating is that, to my knowledge, this change is not based on any data about student performance, or any understanding of student behavior.  It's based on a presumption that students are supposed to sit in classes to learn.  David Malan has some great data showing this presumption doesn't seem to have a basis in reality.  Students in CS50 taking the class under simultaneous enrollment don't do any worse than other students.  In fact, if I recall the data right, they do better than average;  this would not be surprising, as they are a self-selected group with apparently high interest in the subject.  David also has great data showing how lecture attendance steadily drops over the semester.  If a substantial fraction of the students are missing class and watching the video anyway, what's the point of blocking simultaneous enrollment?  Maybe because of all this data at David's command CS 50 got its exemption, but I'm sure if we go back we'd find similar findings for CS 124 and other classes.   

Professors have always had the right to refuse simultaneous enrollment petitions for their class, so if the professor expects student attendance, there is no problem.  I don't think it's a radical suggestion to say that for recorded classes, professors and students should have the ability to jointly decide if simultaneous enrollment is a suitable solution to the small number of inevitable class scheduling conflicts.  I'm not sure what combination of faculty and administrative busybodies decided they had to fix what I see as a non-problem, but I'm sure their rule-making (or in this case rule-interpreting), while it only affect a small number of students, will be a significant net negative.  Kudos to the Crimson for pointing out that this is a poor decision, and I only wish I was optimistic that it could be quickly reversed.  

Thursday, September 11, 2014

Biggering

Preliminary course enrollment numbers are in.  Our new theory course, CS 125 (rapid introduction to theory), has 32 undergrads and 5 others, for an enrollment of 37.  Salil and I had predicted and desired in the 20-40 range for the first year of this course, so we're on target.  We hadn't thought about CS 125 being a class for grad students, but it actually makes sense.  Entering CS grad students who haven't had background in theory, or grad students from other related areas (statistics, mathematics, etc.) who want a fast-paced introduction to the basics of CS theory, would both fit in well for this class.  Maybe next year we'll even advertise it for grad students as well.

Our mainstream required-for-majors theory course, CS 121 (the "here's Turing machines and undecidability and NP-completeness" course), has an enrollment of just over 150, the largest it's ever been.  Given that CS 125 was supposed to draw some students from 121, that's even more amazing.  

And the standard CS introductory course, CS 50, has over 800 undergraduates (and over 850 total) signed up, making it now the largest class at Harvard.  This was so noteworthy that the Crimson had an article this morning about it.  As usual, you can find a great quote from Harry Lewis: 
“Harvard students are smart people,” said Harry R. Lewis ’68, former dean of the College and current director of undergraduate studies for Computer Science. “They have figured out that in pretty much every area of study, computational methods and computational thinking are going to be important to the future.”
So the growth continues, at least for another year.

Wednesday, September 10, 2014

You Can Rename My Family for.....

An interesting article in today's Crimson about the background to the announcement that Harvard's School of Public Health will be re-named the Harvard T. H. Chan School of Public Health, for a $350 million donation.  Feel free to comment on your feelings regarding the issue.  (The Kennedy School of Government notwithstanding, Harvard schools have not been "named" via donations historically, although certainly buildings and such have.)  I'm not publicly taking sides -- indeed, it seems like a wonderful debate challenge to figure out arguments on both sides of the issue as to whether naming schools in this way is a good or bad idea.  Even if you think it's a good idea, you might wonder about the price tag...

In particular, I can't help but wonder what the naming rights of the School of Engineering and Applied Sciences could go for?  In the course of our capital campaign, will I have the chance to find out?  Maybe we could get a bidding war going?  How much would SEAS have to get for me to feel good about having "Harvard's XXXX School of Engineering and Applied Sciences" on my letterhead for some appropriate name XXXX? 

At a baser level, Mitzenmacher has always been a problematic name.  Too long.  (On so many standardized forms, I end up as Michae Mitzenmache....)  Maybe I should offer myself up for naming rights.  Sadly, I don't think I'd get $350 million.  

Sunday, September 07, 2014

Teaching Sorting

For many years now, while teaching the introductory Algorithms and Data Structures, I have told students that we won't be starting with (or really covering) sorting and searching, because it's boring.  While that description is an exaggeration (the students see mergesort [an example of divide and conquer] and heapsort [heaps are important as an implementation of priority queues for Dijkstra's algorithm]), I've never thought that staring this class with a long unit on sorting/searching algorithms, which most textbooks like to start with, is the best use of time.

So it was a bit odd to find myself for our new "honors course", CS 125, spending the first day-and-a-half or so on sorting algorithms.  The key was that there was the right motivation for it.  We'll be tackling both algorithms/data structures and complexity, and the theme of how to model computation will play a big role throughout the course.  When we thought about what was the right way to introduce the class, and in particular this theme, sorting seemed like the right choice.

After very quickly refreshing the students on some basic comparison sorts (bubblesort and mergesort), we present the standard lower bound argument for comparison-based sorts.  But then we show how this lower bound can be broken, by assumptions about the data that allow one to go beyond comparisons to other operations.  Specifically, we raced through counting sort, bucket sort (for random data), and sorting via van Emde Boas trees.    (The students in particular seemed to be talking through the details of van Emde Boas trees at the class break, which I'd like to think because they were new and interesting, but possibly was due to the fact that it was the first time I'd ever presented them, and maybe my delivery for that topic needs some tuning.) 

Lo and behold, I find myself finding sorting interesting!  We didn't even get to more advanced interesting possibilities, like data-oblivious sorting or "bleeding edge" parallel sorting.  I feel I need to apologize to sorting, for mistakenly suggesting that it's a boring topic.  I still wouldn't want to start an algorithms/data structures class with multiple weeks of sorting and searching algorithms, but sorting is a natural way to introduce some key concepts in algorithms, and there's fun to be had in the large variety of approaches to sorting.