## Sunday, February 26, 2012

### Technology Diffusion (Guest Post from Zhenming Liu)

My student, Zhenming Liu, writes about some of his recent work:

Sharon Goldberg and I recently submitted a paper about technology diffusion in the context of communication networks. There has been lots of work in this area in the networking literature; while networking researchers and practitioners usually know what technology they want to deploy in the network (e.g. Secure BGP, QoS, IPv6, etc.), they often don't know how to get the economically-motivated nodes in the network to adopt it. What sort of utility models could capture an economically-motivated node's decision to adopt a communication technology? Some assumptions Sharon has found in the literature (and used in her own work) include:
• A node's utility depends on the number of other nodes it can communicate with using the new technology. This is sometimes known as Metcalfe's Law.
• Two nodes can communicate with each other only if there is a path between them such that every node on the path has deployed the new technology.
Thus, unlike existing works on viral marketing in social networks -- where a node's utility function depends only on its immediate neighbors-- in our setting, node utility is non-local. We considered the following question:

Given a communication network $G(V, E)$, we assume that node $u$ activates (i.e., deploys the new technology) if it is adjacent to a connected component of active nodes in $G$ of size exceeding node $u$’s threshold $\theta(u)$. Our goal is to find an efficient algorithm that determines the smallest seedset of early adopter nodes, that once activated, causes a cascade that eventually causes all other nodes in the networks to activate as well.

We designed an approximation algorithm that returns a seedset which is at most $O(r\ell \log |V|)$ larger than the optimal seedset, where $r$ is the graph diameter and each node's threshold can take on one of at most $\ell$ possible values.
1. We seem to have the first tractable model that considers cascade effects with non-local influence in a graph; in the social network literature, the influences are local, while the networking literature considers non-local influences, but thus far has not developed any approximation algorithms.
2. Our algorithm substantially deviates from the greedy strategies we see in almost all influence maximization papers for social networks today. After reading lots of papers in this area and writing some ourselves (e.g. this and this), my colleagues and I started to suspect that a “computational dichotomy” exists among these problems: either the problem is deadly hard or greedy algorithms already work well. Our new result demonstrates that this is not the case, at least when influence is non-local.
3. Finally, our randomized algorithm makes use of a linear program, which is quite surprising because the problem seems highly non-linear; we managed to linearize the process by first carefully restricting our search space, while paying the price of only constant approximation loss.
I hope this paper will lead to more work on diffusion processes with non-local influences; studying other relevant models, or developing more efficient algorithms would both be interesting.

## Thursday, February 16, 2012

### Advantages of a Bigger Class

I'm several weeks into the super-sized class this semester.  Mostly, I find it's going fine -- indeed, there are some advantages, I'm finding, to the larger class size.

1)  Student participation.  While I'm sure the fraction of students attending is probably about the same, with a bigger class, the room actually feels full.  And, when I ask a question, there are more students who will answer.  I think the interactive aspect of the class is much higher this year.  (Related bonus:  sounds like more laughing when I tell a joke.)
2)  More creativity.  This is mostly due to my teaching staff.  With a larger teaching staff, we've optimized our resources a bit (e.g.: not every TA teaches a section every week, freeing them to do other things;  one-person jobs like prepping section notes have to be done less frequently per person).  So now we have special post-homework-go-over-the-problems sections, a new Web site, and Piazza.  Thanks team.

The advantages, so far, outweigh the downsides.
1)  Grading.  I historically grade the first programming assignment on my own, and help grade the midterm/final.  I haven't had to do any of that yet.  I'm not looking forward to it...
2)  Student interest.  More students want to meet with me, have lunch, and so on than in previous years.  Normally, I would put this in the positive column.  But since the semester started I've been soooo busy that I just end up feeling guilty telling students yes... but later in the semester.

I'm curious to see if the increase in students will hold for next year.  But if it does I think I can find the positive.

## Wednesday, February 15, 2012

### Using Piazza

My TAs forced me to "get with the program" and use Piazza as our mechanism for student questions.  In the past I've simply used a central e-mail address that got forwarded to all the TAs (and me);  questions sent there would be answered by a TA (or me) and we all would see the reply.

Piazza offers some additional bulletin-board+ functionality on top of that framework -- things like being able to type things in Latex, easily see what questions are left open, etc.  It seems useful enough;  it's definitely better than past bulletin-board-style offerings I've seen.  Honestly, though, for me the primary benefit is that it lessens the amount entering my mail spool -- it's more organized having these questions all in one place, with formatting.  I'll have to see what the students think during and at the end of the semester.

At the moment I'd recommend it, which is not something I thought I'd say -- like I said, I've eschewed such tools in the past.  On the other hand, I'm not clear on their eventual business model -- I don't think I'd pay for using it as a service.  Perhaps the plan is just to get us so used to it that the idea of not using it is painful enough that we decide to pay.  We'll see if I reach that point over the semester.

## Tuesday, February 14, 2012

### A Systems Complaint

I think of myself as a person who works both in systems and theory.  And I've found that's often a challenging position.  I've been vocal before that I think theory conferences, to their detriment, downplay experimental work -- most algorithms papers don't actually implement algorithms.  Today it's time for a complaint about the systems culture.

I think many systems people appreciate good and useful theoretical results.  However, they have I think some unusual expectations abut what is good and useful, in that it has to be something they can absorb in the space of a few pages*.  (Really, in the space of the paper you submit, but given that you have to actually produce some implementation results for a systems paper, you really only have a couple of pages to get your theory across.)  For example, I was fairly disturbed when I got back reviews for a paper (some time back now...) sent to a systems conference with significant theoretical content that included comments like (edited to protect the conference):
While the program chairs were eventually able to locate reviewers who knew enough about the previous work in the area to understand and verify your work, this indicates to me that the XXXX audience will have a difficult time understanding this paper.   (The average XXXX attendee will not be familiar with Celebrated20YearOldTheoryResult, much less the more recent work you build on.)
Another review stated:
It is reasonable (though a bit of a stretch) to assume a XXXX reader is familiar with MAJOR20+YEAROLDAREAOFTHEORY and Celebrated20YearOldTheoryResult. It is less reasonable to assume the reader is familiar with [major paper cited several times in our submission], and not at all reasonable to assume the reader is familiar with [other work under submission and on the arxiv], an unpublished manuscript by the authors. Yet I found that this paper was not sufficiently self-contained to understand it without referring to those prior works.
Of course, this was a challenging paper, and in fact in the paper we were quite clear that we were building on lots of previous work.  We tended to state things in the form of, "It is known that X holds [citation], and using that we do..."  So if they didn't believe us, they could go look it up (though, admittedly, it's not easy stuff).  We had expected that they would take our word on background material;  as the reviews state, they did verify we were accurate in our use of the prior work, and that we had clearly cited it.

But apparently (and I didn't get the memo on this), "self-containedness" is considered important for systems conferences.  Which, interestingly, goes against most work in theory, where you're often trying to build on a substantial chain of previous work.

I find this worrisome for work spanning the systems/theory divide, and will plan to fight this attitude on whatever systems PCs I am on.  I think it's great that systems people are interested in and like to use theory.  But if they want to apply something more complex than, say, a Bloom filter (a truly wonderful data structure, easily described completely in 2-3 paragraphs), the community might have to accept that, yes, to fully grok the interesting stuff in our paper, they'll either have to accept our word for what's in the past papers, or find and trust reviewers who can vouch for what's in past papers, or go read the papers themselves.  Not everything can be adequately spelled out in complete detail starting from a blank slate in a 10 page paper.

Perhaps one could argue that this was an isolated incident, and certainly I know there are (plenty of) people in systems who do not think this way.  But I do think it's representative of (a non-trivial part) of the culture.

I find the idea that a paper that involves theory that goes beyond what you, or even your community, has yet to experience is automatically out of scope a disturbing framework, and I find the expectation that I should be able to get you up to speed on a complex area in a few pages so that the paper is "self-contained" unreasonable.

*I've heard some people more disparagingly say that you have to make them think they've understood it, not actually understand it.

## Friday, February 10, 2012

### Class Size Incentives

There's an interesting story in today's Crimson about a possible rise in "limited enrollment" classes at Harvard.  (See this older Crimson article as well.)  There may, of course, be didactic reasons for limiting the number of students in a class, but there seems to be an impression that there's something else behind this.  A somewhat harsh quote from Harry Lewis:
“My sense is that it’s increasingly thought to be a matter of professorial prerogative to say, ‘to make a better teaching experience for me, I’m not going to teach these students who are paying \$50,000 a year to come to Harvard’—and it doesn’t seem right to me,” Lewis said.
Clearly Harry has a point.  On the other hand, from the faculty point of view, Harvard (and from what I know, many other schools) don't generally incentivize teaching larger classes -- which, I can tell you (since my class size doubled from last year), is indeed more work.  If you're not tenured, there may be some incentive to teach a larger class sometime early on -- it shows you're a team player, useful to the department/college, etc.  Past tenure, I'm not sure there's a direct incentive -- although plenty of faculty enjoy teaching, and are happy to teach larger classes at least once in a while.  Without such incentives, isn't limited enrollment understandable (as long as nobody stops you)?

Should a university offer incentives for teaching larger classes?  Perhaps some additional time on teaching leave?  Or direct monetary compensation?  What is the "social contract" between faculty and students regarding when class size should be limited?  Or the less social contract between faculty and their employers, the universities?  How are these questions affected (if at all) by the newly developing paradigm of classes of 100,000+ online?

Challenging questions, I think.

## Thursday, February 09, 2012

### Citation Circles on FSP

Sorry I haven't blogged much.  Life's been busy.

But here's an interesting post by FemaleScienceProfessor discussing the appearance of "citation circles", people forming groups to cite each other (when possible) in order to get their citation numbers up.  On the Web, the equivalent is link farms, designed to get one's ranking up in Web search.  It makes me wonder if other methods for improving one's search rank are showing up or already commonly in play by people who try to work their citation count numbers upwards.  And whether methods for finding link farms used by search engines could (and should?) be used to correct citation counts in some way.