"Our greatest responsibility is to be good ancestors."

-Jonas Salk

Tuesday, July 21, 2009

Literate Climate Model Development

I am gratified to have been contacted by Steve Easterbrook of the other U of T since we seem to have similar interests. He is threatening to take me up on the idea of building a literate climate model, from scratch, in Python (yay!). He is considering the enlistment of several semesters' worth of computer science classes as unpaid labor. I guess I should encourage this if it's at all realistic.

The idea is NOT to advance the state of the art in resolution or fidelity or process inclusion in climate modeling, at least not immediately. Rather it is to build a more accessible (readable, robust and modifiable) coupled general circulation model, representing physical oceanography, physical meteorology, and sea ice, forced by prescribed atmospheric concentrations, orbital configuration, and land surface and land ice configurations. It should be possible to download and install the model on mass market commercial computers. The design should not preclude eventual elaboration into a competitive state-of-the-art model, but that should not be a primary concern.

In the Python spirit, readability counts. There may be places where readability must be sacrificed for performance, but we seek to minimize those. Above all, this is a project in literate computing. The idea is to build a CGCM with an entirely new, open, flexible and testable codebase that as many people as possible will be able, and will want, to read, and that can also be run easily on conventional computers. To do this, we seek to minimize complexity subject to a requirement of moderate fidelity.
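To make "literate" and "testable" a little more concrete, here is a toy sketch of the intended style. It is a zero-dimensional energy balance model, far simpler than anything proposed here and not part of any actual design; the function names, the forward-Euler stepper and the effective-emissivity knob are illustrative assumptions. Each function states the equation it implements, so the code reads as its own documentation.

```python
"""A toy zero-dimensional energy balance model, written in a literate style.

Illustration only (hypothetical, not part of any proposed CGCM). The constants
are standard textbook values; the effective emissivity is an assumed tuning knob.
"""

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
S0 = 1361.0             # solar constant, W m^-2 (approximate)
ALBEDO = 0.30           # planetary albedo (approximate)


def absorbed_shortwave(solar_constant=S0, albedo=ALBEDO):
    """Globally averaged absorbed solar flux: S0 (1 - albedo) / 4."""
    return solar_constant * (1.0 - albedo) / 4.0


def equilibrium_temperature(emissivity=1.0, solar_constant=S0, albedo=ALBEDO):
    """Temperature at which emitted longwave balances absorbed shortwave.

    Solves emissivity * SIGMA * T**4 = S0 (1 - albedo) / 4 for T.
    """
    asr = absorbed_shortwave(solar_constant, albedo)
    return (asr / (emissivity * SIGMA)) ** 0.25


def step(temperature, dt, heat_capacity, emissivity=1.0):
    """Advance C dT/dt = ASR - emissivity * SIGMA * T**4 by one forward-Euler step.

    temperature in K, dt in s, heat_capacity in J m^-2 K^-1.
    """
    net_flux = absorbed_shortwave() - emissivity * SIGMA * temperature ** 4
    return temperature + dt * net_flux / heat_capacity
```

With no greenhouse effect (emissivity 1) the equilibrium comes out near 255 K, and an effective emissivity of about 0.61 gives roughly 288 K. Checks such as "the equilibrium is a fixed point of `step`" or "a cold start relaxes toward it" can live next to the code as executable documentation.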

I think it is important to start a project like this from a well-designed plan and a committed body of participants. At first I liked Steve's idea of drawing on an army of students passing through his classroom. But I think that's over-optimistic. Somebody has to spend a long time getting up to speed, especially on the radiation and dynamics codes.

However, there are big chunks of the code, including the overall design, that are amenable to classroom work. This might help such a project attain critical mass. But I don't think it can come to full fruition without some specialists. Perhaps we should start with the more modest intention of building an EMIC (Earth system model of intermediate complexity) tuned to a CGCM.


Image snarfed from csa.fragme.org

18 comments:

Frank O'Dwyer said...

Has anyone attempted an open source climate model written in a language that facilitates parallel computing and optimisation for modern CPUs? (I think Erlang is such a thing, but I may be mistaken and there may be a better choice. Python is probably not it.)

I think it would be very interesting to see such a thing, as you would then benefit from computer scientists optimising the code and perhaps wringing more resolution/performance out of it. My assumption here is that the faster the code runs, the better the model could be, because I read somewhere that the models are currently better than the computers' ability to calculate them.

Anonymous said...

CCSM from NCAR (http://www.ccsm.ucar.edu/models/ccsm3.0/) is technically an open source project, but it is basically a traditional Fortran GCM with a whole lot of support in place for community contributions.

There is also a Java climate model with source available, but it isn't set up as an open source project (yet): http://www.astr.ucl.ac.be/users/matthews/jcm/index.html

Anyway, I agree with Michael about the need for expertise for some of the core elements. I'll have to think on how we do that...

Anonymous said...

Whatever language you use, and whether or not the model is, er, "literate", it's still a model, that is, someone's opinion/best guess (delete, to taste) backed up by some nifty statistics.

And still not worth the paper it's not printed on!
David Duff

Michael Tobis said...

Well, David, you have not the slightest idea what it actually is, do you?

I do not know how anybody can look at this and call it a "guess". Perhaps David missed that. Whatever that is, "guess" isn't it.

Anyway, a major point (aside from creating a curriculum in a certain type of high performance computing of which climate modeling is the prototype) is to make it possible for someone, perhaps someone with a little more patience and a little more open-mindedness than David, to find out.

There is a lot of complaining about how science is opaque and data inaccessible on the more responsible skeptical reaches of the climate discussion. Of course, much of it fades into paranoia and smug foolishness, but the key point is true enough.

Science of the sort we practice should be open from top to bottom. Then David could go over it with a fine toothed comb and identify the guesswork.

Further commentary on this thread will be tightly moderated under the "please do not derail the thread" clause.

Real discussion is welcome, especially if it has technical content. Plenty of other threads are around here and elsewhere for random sniping.

Aaron said...

Michael,
David Duff has a point. The video is visually impressive, but are the effects it conveys accurate? Engineers must post a performance bond for their designs. Would you post a performance bond for any of the current climate models?

Some guys made a guess about what would be important to a climate model. The things they guessed would not be important got left out. Errors from what got left out cascade through the entire model. It is to fix some of those guesses that your Python climate model is so important. If the current models were very good, we would not need your Python model.

Look me in the eye and tell me that current climate models are correct with respect to ice sheet dynamics and ocean currents driven by sea water density near submerged ice. Tell me they correctly handle CO2 and CH4 from permafrost melt. Tell me they correctly handle sea bed methane clathrates. Tell me they understand the anchor ice on the flanks of Antarctica.

Tell me current models allow planning of civil infrastructure with an expected useful life of 80 years. With a 20-year planning and finance period, engineers need to be looking 100 years into the future. Will current models predict maximum storm events in 100 years with the accuracy required for an engineer to sign a drawing (and post a bond)? (What kind of storm drains should we allow?) Will they predict sea level in 100 years? (Where do we put the bridge?)

Are current models good risk assessment tools? (When will THIS_CITY have to be abandoned?)

Given the past performance of climate models with respect to Arctic sea ice, I do not think we can have a lot of confidence in any of the current climate models. For engineering and public policy purposes, current models are no better than guesses.

We need a better model.

Michael Tobis said...

I don't think this project will build a better model in the sense you advocate. It will build a better model in other important senses, but it is unlikely to answer the important questions you list.

"Some guys made a guess about what would be important to a climate model. The things they guessed would not be important got left out. Errors from what got left out cascade through the entire model."

This is sort of true, and sort of not. The existing ability of the models to actually generate the right dynamics proves that the most important things are already in. (The probability of hitting it so closely with a wrong model is negligible.)

That's the point of showing, as a movie, one of the many aspects of the daily output, and one which might be relatively familiar. See, it behaves very much like the real system.

"Look me in the eye and tell me that current climate models are correct with respect to ice sheet dynamics and ocean currents driven by sea water density near submerged ice. Tell me they correctly handle CO2 and CH4 from permafrost melt. Tell me they correctly handle sea bed methane clathrates. Tell me they understand the anchor ice on the flanks of Antarctica."

None of these are even attempted in CGCMs. You are advocating for ESMs (Earth system models). I actually think ESMs are premature at best. In what is conventionally called a climate model, these are at best explicit inputs.

"Tell me current models allow planning of civil infrastructure with an expected useful life of 80 years. With a 20-year planning and finance period, engineers need to be looking 100 years into the future. Will current models predict maximum storm events in 100 years with the accuracy required for an engineer to sign a drawing (and post a bond)? (What kind of storm drains should we allow?) Will they predict sea level in 100 years? (Where do we put the bridge?)"

These questions are much more germane to what I would call CGCM work. The answer is that we cannot remotely do these things, and only a few people pretend we can. Unfortunately, some of those people are getting funding at the state/provincial level. I have no kind words for this sort of activity.

I have already raised the question here as to whether such results will ever be practicable.

You raise important questions that are beyond the reach of science at present and may remain so. Of course, that's why we do research.

None of them are primary purposes of CGCMs. The primary design purpose of existing CGCMs is in exploring global dynamics and paleoclimate. The application of these tools to century-scale prognoses is natural - they are the best we can do.

Most people either grossly overstate or grossly understate the capacities of the current models. They are an impressive achievement.

The purpose that I propose is primarily didactic: it would be a tool to introduce motivated, scientifically serious people to the actual state of climatological knowledge, not some ridiculous half-baked ideas gotten from magazine articles and blogs.

It would be a tool to investigate the architecture of complex modeling systems and to nucleate an academic discipline to study them.

It would not immediately be a tool for advancing the state of the art directly. Perhaps we would learn enough to do so.

On the other hand, as I have said before, perhaps we are reaching a point of diminishing returns. We have to make decisions based on the information we already have. It is in no way clear that better information is coming.

Anonymous said...
This comment has been removed by a blog administrator.
Michael Tobis said...

Frank: Steve has a view of the openness of climate source here. It's not pretty.

We have been parallel for a long time but we don't use Erlang-style parallelism.

It's all Fortran with message passing. I expect we will stick to the message-passing paradigm for the low level code. This is because message passing is relatively easy and suffices.
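For readers coming from outside the field, the message-passing pattern can be sketched in a few lines. The sketch below is hypothetical: Python threads and `queue.Queue` stand in for MPI processes and their send/receive calls (a real model would use MPI itself, for example through mpi4py), and the "model" is just explicit diffusion on a periodic ring. Each worker owns one slice of the domain and learns its neighbours' edge values (halo cells) only through messages, and the gathered result matches a single-process run.

```python
import queue
import threading

import numpy as np


def diffuse_serial(u, nu, steps):
    """Reference: explicit diffusion on a periodic 1-D ring, single process."""
    for _ in range(steps):
        u = u + nu * (np.roll(u, 1) - 2.0 * u + np.roll(u, -1))
    return u


def worker(rank, nranks, local, nu, steps, from_left, from_right, results):
    """One hypothetical 'rank': owns a slice and talks to neighbours only via messages."""
    left, right = (rank - 1) % nranks, (rank + 1) % nranks
    for _ in range(steps):
        # Send edge values. Each channel has exactly one sender, so FIFO order
        # keeps messages from successive time steps from getting mixed up.
        from_right[left].put(local[0])    # my first cell = left neighbour's right halo
        from_left[right].put(local[-1])   # my last cell = right neighbour's left halo
        # Block until both halo values for this step have arrived.
        left_halo = from_left[rank].get()
        right_halo = from_right[rank].get()
        padded = np.concatenate(([left_halo], local, [right_halo]))
        local = local + nu * (padded[:-2] - 2.0 * local + padded[2:])
    results[rank] = local


def diffuse_message_passing(u, nu, steps, nranks):
    """Split the ring into nranks slices, run them concurrently, then gather."""
    from_left = [queue.Queue() for _ in range(nranks)]
    from_right = [queue.Queue() for _ in range(nranks)]
    results = [None] * nranks
    threads = [
        threading.Thread(
            target=worker,
            args=(r, nranks, chunk.copy(), nu, steps, from_left, from_right, results),
        )
        for r, chunk in enumerate(np.array_split(u, nranks))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return np.concatenate(results)
```

Because of Python's global interpreter lock, these threads give no speedup; the point is the pattern of domain decomposition and halo exchange, which is the same whether the low-level code is Fortran or something else.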

Performance Python is not as silly a concept as it might appear, but in practice, so far, high performance Python remains wrapped around compiled, statically typed modules. (It can be argued that the boundary between Python and C is fuzzy, and in fact the most common flavor of Python is just a large and powerful C library. It can link to other C codes either as caller or callee.)

It's perfectly possible to build systems that are Python at the coarse grain and Erlang at the fine grain. The only serious example I know of is disco.

Anonymous said...

Well, now I'm disappointed!

I'm a rising Junior studying Atmospheric Science at Cornell, and this is exactly what I planned on starting during my remaining tenure at the school. My idea was to use a project like this as a vessel to explore dynamics and the works because it's getting frustrating waiting around for those classes to come by. I wanted to experiment with implementing some of the numerics in CUDA, but I was going to stick with Python for the vast majority of the project.

Looks like I'll have to fire off an e-mail to Steve to indicate my support for the project.

Michael Tobis said...

Don't be disappointed! Build us a dynamic core! Software engineers, no matter how high up the credentials ladder, can't do that!

Forget CUDA for now. (A technical stunt like that is not good for a career in climate. I will tell you the sad story of somebody else who fell for that once. Me, specifically.)

We are not about performance. Anyway the low level computations can be abstracted away and targeted to CUDA later. (See my PyNSol project.)

We just need spherical dynamics that works. We can talk about appropriate wrappers for the low level math; what we are after is code that is correct and self-describing and not horribly inefficient. NumPy will be fine.
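As an illustration of what "correct and self-describing and not horribly inefficient" NumPy might look like, here is a hypothetical sketch. It is not a dynamical core, only first-order upwind advection of a tracer along one latitude circle by a constant zonal wind; the function name and the CFL guard are assumptions made for the example. It also shows why latitude-longitude grids get awkward: the zonal grid spacing shrinks with cos(latitude), so a time step that is stable at the equator violates the CFL condition near the poles.

```python
import numpy as np

EARTH_RADIUS = 6.371e6  # mean Earth radius, m


def zonal_upwind_step(q, u, lat_deg, dt):
    """Advance a tracer q one time step along a single latitude circle.

    q: 1-D array of tracer values at equally spaced longitudes (periodic)
    u: zonal wind in m/s, constant along the circle (positive = eastward)
    lat_deg: latitude of the circle in degrees
    dt: time step in seconds

    First-order upwind for dq/dt = -u dq/dx, differenced on the upstream side.
    """
    nlon = q.size
    # Grid spacing shrinks toward the poles as cos(latitude).
    dx = EARTH_RADIUS * np.cos(np.deg2rad(lat_deg)) * 2.0 * np.pi / nlon
    courant = u * dt / dx
    if abs(courant) > 1.0:
        raise ValueError(f"CFL condition violated: |u dt / dx| = {abs(courant):.2f}")
    if u >= 0:
        # Eastward wind: the upstream neighbour is to the west (index i - 1).
        return q - courant * (q - np.roll(q, 1))
    # Westward wind: the upstream neighbour is to the east (index i + 1).
    return q - courant * (np.roll(q, -1) - q)
```

The scheme conserves the total tracer exactly (up to rounding) because the rolled array has the same sum as the original, which is the kind of property a test suite could check automatically.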

Email me if you're interested.

Michael Tobis said...
This comment has been removed by the author.
Anonymous said...

Alas, Michael, your mysterious 'thing' does not appear to be showing via the link you provided.

I am very happy for you to construct a new method of modeling climate provided that, at each stage where there are two or more differing possibilities, you tell us what they are and why you chose what you chose. You indicate that complete openness would be your policy and I accept that without question. I wish you good luck in the near-certain knowledge that you will fail to predict anything much with any degree of accuracy for reasons which have nothing to do with your undoubted competence.

Incidentally, you know as well as I do that I couldn't even double-check the date/time stamp on your programme, let alone the statistics, but, to quote a well-known advertising line over here, "I know a man who can", and that is the key element.
David Duff

Anonymous said...

David is confusing two different uses of models: for forecasting (like weather forecasting) and for understanding complex systems (by constructing a simulation and exploring comparisons with the real system). Climate modellers do the latter, and tend to be rather nervous about calls from policy makers to provide predictions about the future.

However, the science is now sufficiently mature that climate models do represent embodied scientific theories - they really are detailed theories about how earth systems work, and, just like any good theory, can be used to make predictions that can subsequently be tested.

That doesn't obviate the need to answer the question of how we know these theories are valid. I'm working on a longer answer to that and will pop back in a week or so with details. But calling a climate model "just a guess" is about equivalent to calling the theory of evolution "just a guess".

Michael Tobis said...

I was going to say something very much like what Steve said, but I held back specifically because of the extent to which the IPCC schedule drives the course of action, at least at the CCSM group at NCAR.

The extent to which "prediction" is part of the funding game and hence part of the legitimate expectation of the public is a key part of the deterioration of the field.

(Another key part is buying snake oil when they could have had the whole snake, but that's another story entirely.)

But I will go so far as to agree prescriptively: climate modeling research in the foreseeable future should not be about prediction. If people make this mistake it is little wonder, though.

Consider the CCSM homepage. "Such models use mathematical formulas to recreate the chemical and physical processes that drive Earth's climate. What emerges from trillions of computer calculations is a picture of the world's climate in all its complexity," it begins. Fair enough. But just below that is the "news":

- Community Ice Sheet Model Will Aid Understanding of Sea Level Rise

- Projecting Emperor Penguin Population in a Warming World

- The computational future for climate and Earth system models: on the path to petaflop and beyond

- Predicting 21st-century polar bear habitat distribution from global climate models

It's not all prognostic, but it all has a prognostic spin. (Also a bit heavy on the polar charismatic megafauna, wouldn't you say?)

If people mistakenly understand that we modelers are in the prediction business, we have only ourselves to blame.

Anonymous said...

@helicity (Daniel): Take heart - we're going to need you for this (if we ever get it going)!

Anonymous said...

Steve, thank you for clarifying what passes for my mind with such courtesy.

Alas and alack, both you and Michael are in dreamland if you think a 'climate model', that is, a model that simulates and explains how climate 'works', is going to be of anything other than esoteric interest. We know (or think we know - but that's another story!) how evolution works and whilst it is of interest it has no practical application because it cannot, by its very nature (no pun intended), predict anything.

Michael's model, assuming it works, will be of like mode and unless it can predict with accuracy it will remain fascinating to the scientists concerned but of marginal interest to everyone else. Of course, that's not a reason not to do it.

Finally, it has been a source of great amusement to me that so many of your fellow scientists in the last 50 years or so have, like lost tourists wandering into Soho (or the Bronx 'over there'?) been blinded by the razzle-dazzle of the lights and been suckered into all sorts of naughtiness. Well, at least it shows that scientists are human just like the rest of us!

Perhaps I, too, could offer some advice to the young scientist above who is just setting forth on his voyage of discovery: Don't ever even shake hands with a politician!
David Duff

Michael Tobis said...

David, yes, esoteric, exactly. Please pick another thread, now.

David B. Benson said...

I strongly recommend you consider spherical triangles (several web sites) for the basic gridding of the (almost) sphere. These easily subdivide for locations which need finer gridding. That can even be changed dynamically if that is a help.
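A minimal sketch of the subdivision idea, assuming the common choice of starting from a regular icosahedron (the comment does not say which polyhedron to start from). Each spherical triangle is split into four by joining its edge midpoints, which are pushed back onto the unit sphere; shared edges reuse the same midpoint so the mesh stays conforming. The function names are hypothetical, and this sketch refines uniformly; the local or dynamic refinement mentioned above would additionally need to handle triangles next to unrefined neighbours, which is not shown.

```python
import numpy as np


def icosahedron():
    """Vertices (on the unit sphere) and triangular faces of a regular icosahedron."""
    phi = (1.0 + np.sqrt(5.0)) / 2.0
    verts = np.array([
        [-1, phi, 0], [1, phi, 0], [-1, -phi, 0], [1, -phi, 0],
        [0, -1, phi], [0, 1, phi], [0, -1, -phi], [0, 1, -phi],
        [phi, 0, -1], [phi, 0, 1], [-phi, 0, -1], [-phi, 0, 1],
    ], dtype=float)
    verts /= np.linalg.norm(verts, axis=1, keepdims=True)
    faces = [
        (0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
        (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
        (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
        (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1),
    ]
    return verts, faces


def subdivide(verts, faces):
    """Split every spherical triangle into four, projecting midpoints onto the sphere."""
    verts = [np.asarray(p, dtype=float) for p in verts]
    midpoint_index = {}  # (i, j) with i < j -> index of that edge's midpoint

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_index:
            m = (verts[i] + verts[j]) / 2.0
            verts.append(m / np.linalg.norm(m))  # push back onto the unit sphere
            midpoint_index[key] = len(verts) - 1
        return midpoint_index[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return np.array(verts), new_faces
```

Applying `subdivide` n times to the icosahedron gives 20 * 4**n triangles and 10 * 4**n + 2 vertices; two levels give 320 triangles on 162 vertices.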