"Our greatest responsibility is to be good ancestors."

-Jonas Salk

Tuesday, January 1, 2008

NCAR vs Google: Place your Bets

Sticking with a technical theme for a bit, and following on from my previous article, I'd like to further address what I perceive as the limited productivity of climate modeling as an enterprise.



Getting software right is a very subtle skill. In general it is much easier to write barely useful software than to write powerful software, and broken software is easier still. On the other hand, observing that it is possible to do much better than has been done in the past has become trivially easy.

That it is possible to get software right is proven daily in the commercial sector. In my commercial career, and especially more recently in the Chicago Python group, I have been exposed to extremely talented programmers, some of whom are literally orders of magnitude more productive than ordinary programmers.

They are very particular about their choice of tools, as any highly talented and productive person is, and many of them tend to choose Python or similar tools for most of their work. Of course, I cannot paint like Rembrandt even given the finest brushes and paints, but that doesn't mean brushes and paints don't matter. The tool isn't the point, though. The point is that a new approach has emerged.

You don't have to take my word for it anymore. The consequences of these capacities, interestingly developed almost entirely outside the academic sector, are increasingly visible through the productivity of Google, the largest collection of people with this sort of talent. Many smaller companies and organizations also partake of this cluster of techniques, but the remarkable productivity of Google is visible to anyone who cares to look.

So let me flip my question from the previous article, which was whether computational technique can help to increase the rate of progress in climate modeling. Here is a recasting of that question:

Can we be sure that the greatly improved methodologies recently developed in the private sector can't be applied to climate modeling and related endeavors?


It seems to be obviously premature to say so.

The question, then, is whether climate modeling (and computational science, at least as applied to complex natural systems) matters. If it doesn't, we should all take up macrame or real estate. If it does matter, we should be paying very close attention to what works elsewhere. Not everything will apply, but what will apply will not be nothing, either.

Should Silicon Valley folks just build a climate model? It's a risky endeavor and I don't see a viable business proposition, but maybe, maybe somehow.

Should they just fund a climate model and not get involved? No. Foundation money directed toward software is probably even more likely to go astray into worthless boondoggles than government money.

Should they just shrug and forget the whole sorry business, accept that CO2 is a major forcing, and get on with other things? Maybe, but I hope not. This abandons the adaptation side, which is going to matter, trust me.

So I think the problem is institutional and motivational, not technical.

12 comments:

Craig Allen said...

I would have thought that helping climatologists to revolutionize their climate modeling software would be something that Google would be very interested in sponsoring and lending their expertise to. Perhaps a collective of climate modelers could put together a proposal and take it to them.

Michael Tobis said...

Craig, there are arguments against that I can think of, from Google's point of view, but I guess I will refrain from making them.

Rob said...

I think Google has already looked at this and decided what they can best do:

http://www.google.org/climate.html

EliRabett said...

Have you looked at LabView and similar that are used for programming real time applications??

Dr. Lemming said...

What if the NSF or some other government funding agency was to offer an X-prize style reward for the first climate model that met certain objective performance criteria?

Marion Delgado said...

I don't see "commercial" programmers as ever evidencing any superiority whatsoever as a group, Michael. The Python you mentioned, for instance, is open source software. The developers are not all either commercial or noncommercial.

Microsoft is famous for taking good programmers and either ruining them or underemploying them. Their goal is not good product, it's profit. The key difference is that they have an anti-scientific process. The whole goal is to identify as many possible micro-points of patentable intellectual property as possible, and push the envelope of alleged value-added minus costs - more productive coding is simply one element of cost reduction. MSFT are monopolists, but so are many commercial software companies, in their specialties.

The science code developed at universities, and the general code developed there and shared with, in effect, only quasi-commercial labs like Bell Labs* and Xerox PARC - stuff developed for mainframes and timesharing - are what the commercial programmers relied on later.

It blows my mind to see commercial programmers held up as a standard. The actual process DOES in fact involve more academic money and more people, and a process of give and take that commercial programming simply does not permit.

The most successful commercial programming ventures that do science work are, in fact, the most open and reciprocal.

For example, Michael, it seems to me most bioinformatics software is free and while it may not be GNU compliant, is open to changes. I think there's a reason. The trend is in fact away from relying on corporations to produce software. Even MSFT is trying to lure in open source programmers with its .NET, C# and MONO.

Unless you meant to imply that good general, academic or open source programmers are often able to leverage good commercial employment, I think that's a wrongheaded approach.

The actual process of improving climate science software is almost certainly going to resemble science.

How do you get ACCURATE results? By non-disclosure agreements and deception? Or by peer review? An open source development cycle when everyone involved will benefit - say by better modeling - can be many times faster than a commercial development cycle.

*AT&T were net beneficiaries of what was in fact open sourcing but not labeled as such. The people they sold UNIX to have not added anything to the world, but have, Wright Brothers-like, actually inhibited progress.

Michael Tobis said...

Marion, I think I understand what you are saying. If so, I think you misunderstand what I am saying.

I am contrasting academic vs non-academic, not open source vs closed source.

In a sense academic computing is a third model, and to some extent it lags either of the others. I think we have more to gain from the software craftsmanship model than from the process-driven software engineering model.

What we have now is very loosely informed by either process. It is sad to say that partly under the influence of hungry university lawyers looking for commercial royalties, partly under the pressure of the paranoia induced by unfair criticisms of the denialist stripe, and partly out of sheer impenetrability, the practice of scientific coding is very far from the scientific exchange ideal that Linus likes to talk about and is much closer to closed source practice.

Things may be a bit better in the actual CS community, but they have long since stopped taking much interest in computational disciplines. In the application disciplines, especially messy ones like environmental sciences, it's an astonishing mess.

That all said, it's not process I'm talking about here, but technique.

There simply isn't enough reward for doing a better job. It's the only conceivable explanation of the awkward mess we have to work with.

The weak capacity of the disciplines and the computer sciences to collaborate effectively is trapping us in antiquated methodologies.

There's endless lip service paid to the possibility of such a collaboration but for some reason the practice doesn't work out.

In America, ESMF is charged with solving this problem and I think they're barking up the wrong tree.

I think the European effort may be less misguided, but still may lack the needed agility.

David B. Benson said...

Sorry, but what is 'ESMF'?

Michael Tobis said...

Eli, re LabView, I've seen some demos.

Honestly I am not that big on GUIs for these purposes. If we do our job right (in which the most doubtful step is that we get funding) a GUI will practically write itself someday if it's useful.

On the other hand, if there were more of an educational component to our proposal (and I have tried to spin my approach that way in the past) yes, something like that would be very interesting.

It happens that there's an Austin-based Python company called Enthought that is doing absolutely amazing things with science/engineering GUIs.

Michael Tobis said...

David, "GIYF" stands for "Google Is Your Friend".

"ESMF" stands for Earth System Modeling Framework which has a lot of institutional backing, and has engendered a lot of disgruntlement in the trenches.

Most people who actually touch climate codes think that the flaw in ESMF is too much computer science, but it is really too much misdirected software engineering. NASA leadership is inappropriate for this sort of objective; NASA mostly needs to write bulletproof code, not flexible code. The best development techniques are very different.

Another way of looking at the ESMF problem is via the legend of the toaster.

Anonymous said...

Hey cool betting blog.

We have a group of UK tipsters who just started out Free Bets. Free Betting & UK free Betting previews. We are doing pretty well, take a look, we are always on the look out for new tipsters.

Michael Tobis said...

Spam spam spam spam baked beans and spam. Turning comment moderation back on...