"Our greatest responsibility is to be good ancestors."

-Jonas Salk

Tuesday, December 25, 2007

Excised Paragraphs

Spending Xmas day trying yet again to convince NSF to let me not so much rewrite the climate models as redesign the architecture of the models to match the workflow.

Given the nature of the call, the following is probably not going to strengthen my argument, but I think it's interesting and I welcome your input. (I'd especially welcome commentary from JM and JM).

===

The difficulties in constructing working high-performance codes color the scientific process and other decision support networks dependent on it.

To some extent the problem in climate modeling is based on the origins of the component models in operational prediction communities (such as weather prediction), wherein the goal of software design is the cost-efficient optimal projection of the state of the atmosphere into the near future. It's often noted that this is an initial value problem while running similar codes in climate mode is a boundary condition problem; the objectives are substantially different. Nevertheless, a weather code has a climate and a climate code has weather; these are structurally similar. Accordingly, the methods of the weather modeling community are injected into climate methodologies.

The problem is not in the different mathematical structure of the purposes at hand. It is in the different social structures. A weather model is write-once, run many times. Its purpose is efficiency and correctness. A climate model is an experimental platform. While it is efficiency constrained, flexibility and transparency are key to its utility, keys which are of trivial importance in operational settings.

I believe that despite the very slow progress of the past decade or so, climate modeling has the potential to be vastly more skillful. It seems at least that this should be put to the test. Flexibility, transparency, interoperability, testability and accessibility to automated reasoning are needed. Climate modeling needs to partake of modern agile development methodologies such as those at Google and similar very high productivity companies. Certainly the potential value add is there. It's time that some institutional structure existed to support this.

9 comments:

David B. Benson said...

There are several quite new programming languages which, it seems, would aid in solving some of the problems better than (whatever version of) Fortran does.

Fortress, from Sun Research, comes to mind as specifically designed for scientific programming.

Also F#, from Microsoft Research, might be worth your consideration, being a redo of O'Caml.

Disclosure: I do almost all my programming in SML/NJ, a slight variant of Standard ML.

Michael Tobis said...

I believe that Python + C suffices.

I don't expect that Fortress, Chapel et al. will succeed, nor that they solve the problems that most concern me.

F# is news to me, but thanks for the tip.

David B. Benson said...

Python is good (expressive) and seems to be very fast for some problems. I know nothing about the state of software engineering support for this language.

C has many, many known problems with expressivity and with the difficulty of properly assessing code quality. I say this despite the model checking work which has managed to find actual bugs in production compilers.

Michael Tobis said...

Maybe we should take the technical discussions about languages offline. I'd be happy to carry this conversation further, being a bit of a Python fanatic.

My posting was intended to distinguish codes used operationally from codes used in an exploratory mode.

It is my understanding that the exploratory nature of climate research is not well supported by the operational structure of the models, which by the way are almost always written in some dialect of Fortran.

The expressivity of Python is indeed what I like about it. It's important to understand that the nature of scientific modeling is very different from other codes. The concepts that need to be expressed are very different.

In most coding environments, there is a clear hierarchy of skill with perhaps a few specialists lurking in corners. Nevertheless the idea is that a good coder can fill in for another if it becomes necessary, and management's role is to facilitate that.

In scientific coding, there is non-overlapping technical insight between the people designing the system and the people implementing it. This is a crucial and unique aspect that is being dealt with inadequately.

Anonymous said...

In my past career, which included writing software for a certain industry, our group found prototyping and analysis of algorithms and methods very much easier to do with scripting languages. The content was then (manually and with many refinements) transferred to a compiled language on a different platform, for cross-verification with the script version, testing and finally release.

It is so much faster and more productive to do research as well as development work with a nice scripting language in a nice environment *cough* like Matlab *cough* where you can do vector operations natively, plot stuff very easily, pass multiple arguments to and from functions with ease etc etc...

But damn if it ain't slow running compared to real compiled C. But speed is not important in the R&D phase.

You just have to have some way of keeping the versions similar. One should be able to keep em modular and cross-change modules between script and compiled versions (very very handy for debugging, also makes dev faster as you need to only keep part of the cycle in script, or you can save intermediate results), and results should stay unchanged (to a point, there might be some allowable computational differences).

That's how I'd tackle this problem, but I of course don't have very massive experience in all this...
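The cross-verification workflow described in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: `smooth_reference`, `smooth_fast` and `cross_verify` are hypothetical names, the three-point smoother is a made-up stand-in for a real model module, and `smooth_fast` plays the part of the ported (compiled) version even though it is plain Python here. The tolerances stand for the "allowable computational differences" and would need to be chosen per problem.

```python
import math


def smooth_reference(values):
    """Prototype version: three-point moving average, written for clarity."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


def smooth_fast(values):
    """Stand-in for the ported version: same average computed from prefix sums."""
    n = len(values)
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n, i + 2)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def cross_verify(reference, candidate, inputs, rel_tol=1e-9, abs_tol=1e-12):
    """Run both versions on the same inputs and list the points that disagree.

    Returns a list of (index, reference_value, candidate_value) tuples;
    an empty list means the two versions agree within tolerance.
    """
    ref = reference(inputs)
    cand = candidate(inputs)
    if len(ref) != len(cand):
        raise ValueError("outputs differ in length")
    return [
        (i, r, c)
        for i, (r, c) in enumerate(zip(ref, cand))
        if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


if __name__ == "__main__":
    data = [0.1 * i ** 2 for i in range(50)]
    mismatches = cross_verify(smooth_reference, smooth_fast, data)
    print("mismatches:", len(mismatches))
```

Because the comparison works module by module, the same harness could be pointed at any one module that has been swapped between the script and compiled versions, which is the debugging convenience the comment mentions.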

James Annan said...

"almost always written in some dialect of Fortran"

I've never met a programming language I couldn't write Fortran in :-)

David B. Benson said...

Just a brief note to state that model checking has been applied to find errors in production operating systems, not, AFAIK, compilers.

I lurk and occasionally post on comp.lang.functional, which seems a more appropriate place for programming language discussions.

But James Annan needs to look at Haskell. I doubt that he can fortran in that language. :-)

EliRabett said...

What you want is something like LabView from National Instruments, which is used for real-time data acquisition. It abstracts instruments (voltmeters, etc.) into single objects in a graphical interface. LabView and relatives are also used for robotics.

Bryan Lawrence said...

Well, I'm a big Python fan too, and what with the advent of parallel Python, a number of options abound.

However, regardless of the programming language, module definition is the issue for constructing earth system models with enough complexity for climate modelling. Plug and play with modules is hard, without initiatives such as ESMF (US) and PRISM (EU), and it's hard with them too ...

What's the take-home point? The reason Python is so useful is that the modularisation, and the thinking to support it, is hidden below the Python layer. We need to get to that level with climate modelling, and that means we need the hard computer science that is (slowly, oh so slowly) delivering automatic couplers and the like ...
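To make the "module definition" point in the comment above a little more concrete, here is a toy sketch of what a plug-and-play component interface and a coupler might look like. Everything in it is an assumption for illustration: the `Component` protocol, `ToyAtmosphere`, `ToyOcean`, `run_coupled` and the field names `sst` and `heat_flux` are hypothetical, and none of it reflects the actual ESMF or PRISM interfaces.

```python
from typing import Dict, List, Protocol


class Component(Protocol):
    """Hypothetical contract: read imported fields, advance, return exports."""

    def step(self, imports: Dict[str, float], dt: float) -> Dict[str, float]:
        ...


class ToyAtmosphere:
    def __init__(self, temp: float, heat_capacity: float, exchange: float):
        self.temp = temp
        self.heat_capacity = heat_capacity
        self.exchange = exchange  # made-up air-sea exchange coefficient

    def step(self, imports: Dict[str, float], dt: float) -> Dict[str, float]:
        # With no sea surface temperature available yet, assume zero flux.
        sst = imports.get("sst", self.temp)
        flux = self.exchange * (self.temp - sst)  # positive means heat leaves the air
        self.temp -= flux * dt / self.heat_capacity
        return {"heat_flux": flux}


class ToyOcean:
    def __init__(self, temp: float, heat_capacity: float):
        self.temp = temp
        self.heat_capacity = heat_capacity

    def step(self, imports: Dict[str, float], dt: float) -> Dict[str, float]:
        flux = imports.get("heat_flux", 0.0)
        self.temp += flux * dt / self.heat_capacity
        return {"sst": self.temp}


def run_coupled(components: List[Component], n_steps: int, dt: float) -> Dict[str, float]:
    """Minimal coupler: pass a shared dictionary of fields between components."""
    fields: Dict[str, float] = {}
    for _ in range(n_steps):
        for component in components:
            fields.update(component.step(fields, dt))
    return fields


if __name__ == "__main__":
    atmosphere = ToyAtmosphere(temp=20.0, heat_capacity=1.0, exchange=0.1)
    ocean = ToyOcean(temp=10.0, heat_capacity=4.0)
    run_coupled([atmosphere, ocean], n_steps=10, dt=1.0)
    print(atmosphere.temp, ocean.temp)
```

The coupler knows nothing about what the components compute; it only moves named fields around. The hard part the comment alludes to (grids, units, time stepping, parallel decomposition) is exactly what this toy leaves out.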