Monday, February 22, 2010

EWSN 2010 Keynote Video

I've posted the video for my keynote at EWSN 2010 below. You can check out the full resolution version at blip.tv.

Highlights from EWSN 2010

I was invited to give the keynote speech at the European Wireless Sensor Networks conference in Coimbra, Portugal. This was a fantastic location for a conference -- Coimbra has one of the oldest universities in Europe, over 700 years old. It's a beautiful city. EWSN is the European counterpart to conferences such as SenSys and IPSN, and it draws a very different crowd than typically attends those events. I learned a lot about a couple of the big EU-sponsored sensor nets projects, including CoNet and GINSENG. Interestingly, the Contiki OS seems to be pretty popular amongst the European research groups, in contrast to the TinyOS-dominated US landscape.

My keynote was entitled "The Next Decade of Sensor Networking" and I tried to argue that the field is running the risk of becoming stagnant unless we define some big research challenges that will carry us for the next decade. I've blogged about these themes here before. I delivered the talk in "Larry Lessig" style -- having written the "script" as an essay and then making slides to highlight the key points, rather than starting with the slides and ad libbing the voiceover as I usually do. I'll post a video here soon -- the slides are more than 50 MB and don't really go well on their own.

A couple of highlights from the conference, though I had to miss the last day.

Jayant Gupchup from Johns Hopkins gave a talk on Phoenix, an approach to reconstructing the timestamps for sensor data after the fact. The idea is to skip the time synchronization protocol and instead have nodes log enough information that the timestamps can be corrected post hoc. This is an interesting problem, motivated by their experience running sensor nets for more than a year, during which they observed a lot of node reboots (which throw off simple timing approaches) and extended periods with no suitable global timebase. Phoenix collects nodes' local timestamps along with beacons from GPS-enabled nodes at the base station, and applies a time rectification technique similar to the one we developed for correcting our volcano sensor network data. Phoenix achieves timestamp accuracy of around 1 second (which is acceptable for environmental monitoring), even when the GPS clock source is offline for a significant period of time.
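
To make the idea concrete, here's a rough sketch of what post-hoc time rectification can look like (my own toy illustration, not the actual Phoenix code): within each reboot segment, fit a linear mapping from the node's local clock to global time using whatever (local, GPS) anchor pairs were logged, then apply that mapping to the data samples.

```python
# Toy sketch of post-hoc time rectification (not the Phoenix implementation):
# within one reboot segment, fit local-clock -> global-time as a line using
# the (local, GPS) anchor pairs that were logged, then rectify the samples.
import numpy as np

def rectify_segment(anchors, sample_times):
    """anchors: list of (local_ticks, gps_seconds) pairs logged in one
    reboot segment; sample_times: local tick values of the data samples."""
    local = np.array([a[0] for a in anchors], dtype=float)
    gps = np.array([a[1] for a in anchors], dtype=float)
    # Least-squares fit: gps_time ~= skew * local_ticks + offset
    skew, offset = np.polyfit(local, gps, 1)
    return skew * np.asarray(sample_times, dtype=float) + offset

# Example: two GPS beacons observed during a segment, three data samples.
anchors = [(1_000, 1266800000.0), (61_000, 1266800060.2)]
print(rectify_segment(anchors, [10_000, 30_000, 50_000]))
```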

Raghu Ganti from UIUC gave a talk on "Privacy Preserving Reconstruction of Multidimensional Data Maps in Vehicular Participatory Sensing." The title is a bit unwieldy, but the idea is to reconstruct aggregate statistics from a large number of users reporting individual sensor data, such as their vehicle speed and location. The problem is that users don't want to report their true speed and location, yet we still want to be able to generate aggregate statistics such as the mean speed on a given road. Their approach is to add noise to each user's data according to a model chosen to make it difficult for an attacker to recover the user's original data. They use the EM (expectation-maximization) algorithm to estimate the density distribution of the data in aggregate.
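
A toy illustration of the perturbation idea (my own sketch, not the paper's algorithm): each user adds zero-mean noise drawn from a publicly known model, which masks individual reports while leaving the aggregate mean unbiased. Recovering the full density, rather than just the mean, is where the paper's EM-based estimator comes in.

```python
# Toy illustration (not the paper's algorithm): each user perturbs their
# reported speed with zero-mean Gaussian noise from a publicly known model,
# so individual reports are masked but the aggregate mean stays unbiased.
import random

def perturb(speed_mph, noise_std=15.0):
    return speed_mph + random.gauss(0.0, noise_std)

true_speeds = [random.gauss(55.0, 8.0) for _ in range(10_000)]
reports = [perturb(s) for s in true_speeds]

print("true mean:     %.2f" % (sum(true_speeds) / len(true_speeds)))
print("reported mean: %.2f" % (sum(reports) / len(reports)))
```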

Although the paper considered a number of attacks against the scheme, I was left wondering about a simpler leak: the binary fact of whether a user had recently left their home (similar to PleaseRobMe.com). One solution is to delay the data reporting, although an attacker could still learn the approximate time that an individual is likely to leave home each day. The other approach is to perturb the timing data as well, but that would seem to interfere with the ability to ask questions about, say, traffic levels at certain times of day.

Finally, Christos Koninis from the University of Patras gave a talk on federating sensor network testbeds over the Internet, allowing one to run experiments across multiple testbeds simultaneously, with "virtual" radio links between nodes on different testbeds. So you could combine a run on our MoteLab testbed (around 190 nodes) with the TWIST testbed (220 nodes) to get a virtual testbed of more than 400 nodes. This is a very cool idea and potentially extremely useful for doing larger-scale sensor net experiments. Their approach routes packets from a node's serial port through a gateway server to the other side, where they are injected into the destination testbed at the appropriate point. They can emulate a given packet loss rate across each virtual link, not unlike Emulab. Unfortunately, they did not really consider making the cross-testbed packet transmission timings realistic, so it would be difficult to use this approach to evaluate a MAC protocol or time sync protocol. It also does not properly emulate RF interference, but I think this is still a very interesting and useful idea. Another cool aspect of this project is that they can add virtual simulated nodes to the testbed, allowing one to run mixed-mode experiments.
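
Here's roughly how I picture the gateway working (a hypothetical sketch with made-up names, not the authors' code): a packet read from a node's serial port on one testbed is shipped over the Internet and injected into a node on the other testbed, with a configurable drop probability standing in for the virtual radio link.

```python
# Hypothetical sketch of the federation gateway idea (illustrative only):
# packets from a node's serial port on testbed A are forwarded to a node on
# testbed B, with a drop probability emulating the virtual radio link.
import random

class VirtualLink:
    def __init__(self, src, dst, loss_rate):
        self.src, self.dst, self.loss_rate = src, dst, loss_rate

    def forward(self, packet, inject):
        if random.random() < self.loss_rate:
            return False          # emulate loss on the virtual radio link
        inject(self.dst, packet)  # deliver into the remote testbed
        return True

def inject(dst_node, packet):
    print("inject %d bytes into %s" % (len(packet), dst_node))

link = VirtualLink("motelab-17", "twist-142", loss_rate=0.2)
for seq in range(5):
    link.forward(bytes([seq]) * 10, inject)
```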

Wednesday, February 3, 2010

David Shaw's Anton Supercomputer

Today, David Shaw of D. E. Shaw Research delivered a Distinguished Lecture in Computational Science here at Harvard (this is a new seminar series that Ros Reid and I cooked up to bring in a few high-profile speakers each semester). Of course, prior to forming D. E. Shaw Research, David founded D. E. Shaw and Co., a hedge fund that became one of the most successful quantitative analysis shops. Since 2001, David has been doing research full time -- D. E. Shaw Research develops both algorithms and customized machine architectures for molecular dynamics simulations, such as protein folding and macromolecule interactions. The result is the Anton supercomputer, a heavily customized machine with 512 specialized computing cores specifically designed for particle interaction simulations. It was a great talk and was very well attended -- I'll post the video once it's available.

David presented the algorithms and architecture behind the Anton machine. The goal is to run molecular dynamics simulations of molecules on the order of 50,000 atoms for 1 millisecond of simulated time. The performance target for Anton is 10,000 simulated nanoseconds for each day of compute time. To put this in perspective, the fastest codes on conventional parallel machines can muster around 100 ns of simulated time per day, meaning that 1 ms of simulated time would take more than 27 years to run. Anton can do the same in around 3 months. Prior to Anton, the longest simulations of these macromolecules were on the order of a few microseconds, which is not long enough to see some of the interesting structural changes that occur over longer time scales. (1 ms may not seem like a lot, but it's amazing how much happens to these proteins during that time.)
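
The back-of-the-envelope arithmetic behind those numbers:

```python
# Quick check of the simulation-rate numbers above.
target_ns = 1_000_000          # 1 ms of simulated time, in nanoseconds

conventional_ns_per_day = 100      # fast conventional parallel codes
anton_ns_per_day = 10_000          # Anton's performance target

print("conventional: %.1f years" % (target_ns / conventional_ns_per_day / 365))
print("Anton:        %.1f months" % (target_ns / anton_ns_per_day / 30))
```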

Basically, each of the 512 nodes consists of a heavily pipelined special-purpose ASIC designed to compute the particle force interactions (using an algorithm called the NT method), along with a general-purpose processor that supports a limited instruction set. Communication is heavily optimized to reduce the amount of data exchanged between nodes. The processors are connected in a 3D torus, and each processor "owns" the set of particles corresponding to a particular cube of space. They have built eight 512-node machines, with a 1024-node machine coming online in March. They are working to make one of these available for free to the research community, to be hosted at the Pittsburgh Supercomputing Center.
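
As a rough sketch of the spatial decomposition (my own illustration, which glosses over the NT method entirely): carve the simulation box into an 8 x 8 x 8 grid of cells, one per node in a 512-node machine, and let each particle be owned by the node whose cell contains it.

```python
# Sketch of the spatial decomposition idea (not the NT method itself):
# split the simulation box into an 8 x 8 x 8 grid of cells, one per node,
# so each particle is "owned" by the node whose cell contains it.
def owner_node(position, box_size=64.0, nodes_per_dim=8):
    """Map a particle's (x, y, z) position to its owning node's 3D index."""
    cell = box_size / nodes_per_dim
    return tuple(int(coord // cell) % nodes_per_dim for coord in position)

print(owner_node((3.2, 47.9, 60.1)))   # -> (0, 5, 7)
```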

The best part of the talk was the animations showing visualizations of a protein structure evolving over time. A movie showing just 230 usec of gpW showed substantial structural changes including partial unfoldings of the manifold. Apparently these dynamics have never been observed in other simulations and it's incredible how much insight the longer time scales can reveal.

David was very cool to talk with -- over lunch the conversation ran the gamut from computer architecture to quantum chromodynamics. I never got a sense of whether D. E. Shaw Research has any plans to commercialize this -- they really seem to be in it for the research value, and consider Anton just a tool for making discoveries. (Of course, I can imagine such a "tool" would be pretty lucrative to the drug-discovery industry.) This project is a great example of what's possible when computer scientists and domain scientists work closely together.
