It is practically an axiom that software can never be built to perfection, and even high-quality code throws exceptions that interfere with solutions deployed into production environments. In this regard, distributed systems present perhaps one of the most challenging scenarios for identifying and dealing with bugs. Distributed computing involves resources spread across multiple computers that communicate with one another over a network. Microsoft's MoDist tool is designed to find bugs in exactly these distributed systems.
Roy Levin, director of the Silicon Valley lab, showcased the tool at the Microsoft Research Road Show in Silicon Valley at the end of last week. MoDist is, of course, a project of Microsoft Research Silicon Valley. "This is a tool that goes in and intercepts on each of those computers the place where they call through on a core interface, the WIN API in this case, and then there's a controlling machine that basically intercepts all of those requests and basically drives unexpected events back. So it causes packets to be dropped, or it causes wrong answers to happen, or simply simulates a crash, or whatever," Levin explained.
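The arrangement Levin describes, with per-machine interception and a central controller that injects faults, can be pictured with a small Python sketch. This is an illustration only, not MoDist's code: the names `Controller`, `InterceptedNode`, `Action` and `run_demo` are hypothetical, and an ordinary Python function stands in for the intercepted operating system call.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"  # pass the call through unchanged
    DROP = "drop"        # pretend the packet was lost
    CRASH = "crash"      # simulate the node failing


class NodeCrashed(Exception):
    """Raised when the controller tells a node to simulate a crash."""


@dataclass
class Controller:
    """Hypothetical central controller deciding the fate of each intercepted call."""
    plan: dict = field(default_factory=dict)  # (node name, call index) -> Action
    log: list = field(default_factory=list)   # every decision, in order

    def decide(self, node, call_index):
        action = self.plan.get((node, call_index), Action.DELIVER)
        self.log.append((node, call_index, action))
        return action


class InterceptedNode:
    """Wraps a node's send function; the wrapper stands in for an intercepted API call."""

    def __init__(self, name, controller, real_send):
        self.name = name
        self.controller = controller
        self.real_send = real_send
        self.calls = 0

    def send(self, message):
        action = self.controller.decide(self.name, self.calls)
        self.calls += 1
        if action is Action.DROP:
            return False                 # the message silently disappears
        if action is Action.CRASH:
            raise NodeCrashed(self.name)
        self.real_send(message)          # normal path: forward to the real call
        return True


def run_demo():
    delivered = []
    controller = Controller(plan={("a", 1): Action.DROP, ("b", 0): Action.CRASH})
    a = InterceptedNode("a", controller, delivered.append)
    b = InterceptedNode("b", controller, delivered.append)
    results = [a.send("m0"), a.send("m1"), a.send("m2")]
    try:
        b.send("m3")
        crashed = False
    except NodeCrashed:
        crashed = True
    return delivered, results, crashed
```

In this toy run, the second message from node `a` is dropped and node `b` crashes on its first call, while everything else passes through untouched. Because the fault plan lives in one place, the same run can be replayed exactly.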
MoDist is set up not only to simulate exceptions as part of the effort to sniff out bugs in distributed systems, but also to observe the environment. The monitoring aspect of the tool involves the systematic exploration of all possible causes of error. "There's obviously a very large state space here, and we use model checking as the technique for exploring that state space, and some guidance from the programmer about what the high level properties of the system are supposed to be, and then check those results to see whether the thing is done," Levin added.
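To make the idea of exploring a state space against programmer-supplied properties more concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not MoDist's algorithm: the explorer is a plain breadth-first search, and the two-replica system, the message set and the names `explore`, `make_successors` and `replicas_agree` are all invented for illustration. The property being checked is that, once every message has been delivered, both replicas hold the same value.

```python
from collections import deque


def explore(initial, successors, invariant):
    """Breadth-first search over reachable states.

    Returns the list of events leading to the first state found that
    violates the invariant, or None if no reachable state violates it.
    """
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        state, trace = queue.popleft()
        if not invariant(state):
            return trace
        for event, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, trace + [event]))
    return None


def make_successors(apply):
    """Build a successor function; `apply` is how a replica merges a new write."""
    def successors(state):
        pending, a, b = state
        # The network may deliver any pending message next.
        for msg in sorted(pending):
            dest, value = msg
            rest = pending - {msg}
            if dest == "a":
                yield ("deliver", dest, value), (rest, apply(a, value), b)
            else:
                yield ("deliver", dest, value), (rest, a, apply(b, value))
    return successors


def replicas_agree(state):
    """High-level property: with nothing in flight, the replicas must match."""
    pending, a, b = state
    return bool(pending) or a == b


# Writes 1 and 2 are sent to both replicas; both replicas start at 0.
INITIAL = (frozenset({("a", 1), ("a", 2), ("b", 1), ("b", 2)}), 0, 0)

# Buggy replica: keeps whichever write arrived last.
buggy = explore(INITIAL, make_successors(lambda old, new: new), replicas_agree)
# Repaired replica: keeps the highest value seen so far.
fixed = explore(INITIAL, make_successors(max), replicas_agree)
```

The search finds a delivery order that leaves the buggy replicas disagreeing and returns it as a trace, while the repaired version survives every ordering. Real systems have vastly larger state spaces, which is why the exploration strategy matters.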
In the end, though, model checking does not do much by way of actually identifying a bug. What it does is drive the systems under test into states generated specifically to pinpoint problems, a benefit that eludes normal testing scenarios. Levin applauded MoDist for its ability to identify latent bugs in multi-machine production environments that had been in operation for a few years.