DETECTION AND DESIGNATION

Characterization, detection, and designation of observed extreme events.

Extreme weather has large effects on human and natural systems. Through the use of observations, climate models, and statistical techniques, CASCADE researchers examine how changes in the natural environment have affected recent weather extremes.

Increased understanding of the influence of environmental drivers on current extreme weather increases confidence in projections of changes in future extreme weather statistics. The centerpiece modeling effort of the CASCADE Detection and Designation team is the C20C+ (Climate of the 20th Century) experiment, an international project coordinated by the World Climate Research Programme (WCRP) to which CASCADE is a primary contributor. This multi-model effort aims to aid event attribution by building a database of ensemble climate model simulations describing the “world that was” in an as-realistic-as-possible configuration and the “worlds that might have been” in counterfactual configurations in which environmental drivers have not changed.

Attributing changes in the behavior of extreme events to specific environmental drivers requires a systematic characterization of extreme events in the recent past. Recent advances in simulation capabilities and statistical methodologies allow us to focus on the impacts of a wide range of environmental drivers at regional-to-local scales, on the factors governing the spatial and temporal co-occurrence of extremes, and on simulating how we might expect extremes to change in the future.

UNDERSTANDING STRANGE WEATHER

Understanding how changes in the ocean and atmosphere make weather events more extreme.

Climate extremes – such as hurricanes, major floods, and heat waves – not only stress society but also push the bounds of what modern climate models can simulate.

While extreme weather events often have impacts at relatively small (city-wide) scales, they are often driven by planetary-scale forces. Observing and simulating these events requires datasets and models with high fidelity at a wide range of scales. CASCADE is making novel use of self-similarity* in the atmosphere to define new standards for how model performance should change as model scale changes from ‘city’ to ‘planetary’. The CASCADE team is using the Department of Energy’s new Accelerated Climate Model for Energy (ACME) to simulate past weather and is using these new standards to evaluate the model. Through a tight collaboration with ACME developers, these insights are being translated into improved model fidelity at a wide range of scales.

Designating and projecting changes in extremes requires a well-developed understanding of the processes that drive changes in extremes. In particular, for the overall goal of the CASCADE SFA, it is necessary to understand how changes in the physical behavior of the coupled system have altered the frequencies of occurrence and the characteristics of extreme climate events. To address this issue, it is also necessary to advance our understanding of the processes governing the properties of the extremes being investigated within the CASCADE SFA. The SFA team focuses specifically on the processes that drive multivariate extremes, the processes that drive changes in the spatio-temporal characteristics of extremes, and the fidelity with which these processes are represented in climate models.

*Self-similarity means that the statistics of the atmosphere change predictably depending on the scale at which they are evaluated.

How to run UV-CDAT in parallel at NERSC

**[Michael Wehner](mailto:mfwehner@lbl.gov) and [Hari Krishnan](mailto:hkrishnan@lbl.gov), Lawrence Berkeley National Laboratory**

## Introduction

Most climate data analyses have at least one dimension that can be exploited at NERSC in an embarrassingly parallel manner. In fact, the most common of these is simply time. The scripts presented here are a general solution for taking advantage of temporal parallelism in a wide variety of lengthy UVCDAT calculations.

Typically, long time series climate data sets are spread across multiple files, usually to keep the file sizes manageable. The UVCDAT script called `cdscan` is used to construct an xml file that is read by the `cdms2` module as a single pseudo data file containing the entire time domain. When exploiting temporal parallelism, it is most straightforward to instead keep the files separate and assign a single processor to each file. Hence, in order to get high degrees of parallelism, many files covering short time intervals are actually better than a few lengthy files. Another frequently parallel dimension occurs in ensemble simulations, where files are often arranged by realization number.

In order to assign individual files to individual processors, we use the `mpi4py` module. The total number of MPI tasks is set equal to the total number of processors. For instance, if the number of input files is 48, the following simple batch command will work on hopper.nersc.gov or edison.nersc.gov.

```
qsub python_test.pbs
```

where the batch input file to execute the parallel UVCDAT script, `python_test.py`, described below, is as follows:

### python_test.pbs

```
#PBS -q debug
#PBS -l mppwidth=48
#PBS -l walltime=00:30:00
#PBS -N python_test
#PBS -e python_test.$PBS_JOBID.err
#PBS -o python_test.$PBS_JOBID.out
#PBS -V

module load cascade
module load mpi4py

cd $SCRATCH/my-directory

aprun -n 48 python python_test.py --parallel *.nc
```

In this example, it is assumed that there are 48 netCDF files in $SCRATCH/my-directory. This job would use 2 nodes on hopper or edison.

**Note: It is highly recommended that you perform all parallel operations in the scratch directories or on another parallel file system. Jobs executed on slower file systems such as your home directory or /project could be up to 10 times slower than on a parallel scratch file system such as Lustre.**

This script loads the `cascade` module, but you may replace that line with `module load uvcdat`.

## Memory limits

In many cases, there will not be enough memory per processor to run your script. Batch runs can return many confusing error messages, but when the problem is insufficient memory per processor, the log file `python_test.$PBS_JOBID.out` (where $PBS_JOBID is a number) will contain a message like this:

```
[NID 05718] 2015-03-18 17:07:02 Apid 47875875: OOM killer terminated this process.
```

To correct this error, you will need to idle out some processors. The script below will use only 12 of the 24 processors on each hopper node. Hence, it asks for twice as many processors (96) to run 48 MPI tasks. The NERSC website can tell you how much memory per node is available. If you know the memory footprint of your code, you can calculate `mppnppn` by dividing the available memory on a node by the code’s memory footprint. It is not uncommon to use values of 4 or less for high resolution data sets. Note that the value of -N in the `aprun` command is equal to `mppnppn` and should be an integer divisor of the number of processors per node (24 on hopper and edison). Also note that the newly recommended syntax is to continue to set `mppwidth` to the number of tasks actually required.
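As a rough worked example of that arithmetic (a sketch only; the 32 GB per-node figure and the 2.5 GB task footprint are assumptions, so check the NERSC documentation and your own code's actual usage):

```python
import math

node_memory_gb = 32.0   # assumed memory per node; verify on the NERSC website
task_memory_gb = 2.5    # assumed memory footprint of one task
cores_per_node = 24     # hopper and edison
n_tasks = 48            # total number of input files / MPI tasks

# Tasks that fit in one node's memory, rounded down to a divisor of the core count.
tasks_per_node = max(1, int(node_memory_gb // task_memory_gb))
while cores_per_node % tasks_per_node != 0:
    tasks_per_node -= 1

nodes_needed = int(math.ceil(float(n_tasks) / tasks_per_node))
print("mppnppn = %d, nodes needed = %d" % (tasks_per_node, nodes_needed))
# With these assumptions: mppnppn = 12 and 4 nodes, matching the script below.
```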

### python_test_2xMemory.pbs

```
#PBS -q debug
#PBS -l mppwidth=48
#PBS -l mppnppn=12
#PBS -l walltime=00:30:00
#PBS -N python_test
#PBS -e python_test.$PBS_JOBID.err
#PBS -o python_test.$PBS_JOBID.out
#PBS -V

module load cascade
module load mpi4py

cd /global/project/projectdirs/m1517/tmp/python_test

aprun -n 48 -N 12 python python_test.py --parallel *.nc
```

This job would use 4 nodes on hopper or edison.

## UVCDAT example using mpi4py

In order to robustly enable multiple MPI tasks, see the python script below. Real scripts will be more complex, but this will work as a stencil.

### python_test.py

```python
import sys, os, cdms2, string

# The parallel branch
if sys.argv[1] == "--parallel":
    # note that the mpi4py module has been made available by the batch script (module load mpi4py).
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    # size is the number of tasks, controlled by -n in aprun. 48 in this example.
    size = comm.Get_size()
    # rank is the id of this task and hence this processor.
    rank = comm.Get_rank()
    # files is a list of the input file names determined by the filtered list on the aprun line.
    files = sys.argv[2:]
    # file_name is the name of the file to be processed by this task.
    file_name = files[rank]

# A serial branch that we find useful to test code.
# The execute line is:
# python python_test.py some_netcdf_file.nc
if sys.argv[1] != "--parallel":
    rank = 0
    file_name = sys.argv[1]

# The main body of the code. Likely extracted from a current serial code.
print "processor "+string.zfill(rank, 4)+" is starting "+file_name

try:
    f = cdms2.open(file_name)
    # Do some fancy math on file_name with python code here.
    # var = f('tas')  # etc.
    # Or you can call some other kind of program, such as C, Fortran, ncl, etc.
    os.system("some_serial_code.x "+file_name)
    print "Rank", rank, " succeeded!!"
except:
    print "Rank", rank, " failed!!!!!!"

sys.exit(0)
# end of python_test.py
```

Print statements will be found in the logfile `python_test.$PBS_JOBID.out`. The try/except structure is an attempt to enable parallel jobs with a few bad input files to complete the tasks on the good files. In cases with thousands of files, you don’t want the job killed because one of those files is corrupt. You can find the bad input files quickly by grepping on “failed” in the logfile, then grepping again for the rank numbers that failed. The order of print statements will be fairly random, as tasks do not appear to initiate in lock step. Without the try/except coding structure, any error will cause all tasks to end immediately. With this coding structure, if only a few files are corrupted, you can go back and run your script serially afterwards on the repaired files. This will be a lot faster than waiting again in the queue for 10000 processors. This trick appears to work, but there are segmentation errors that it does not capture. A robust exit strategy to capture errors without ending all tasks remains to be developed.
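One possible refinement, offered here only as a sketch and not part of the original script, is to gather the names of failed files on rank 0 with `mpi4py`, so the bad files are listed directly rather than grepped out of the log:

```python
# Sketch: collect per-rank failures on rank 0 (assumes the parallel branch of python_test.py).
from mpi4py import MPI
import sys, cdms2

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
files = sys.argv[2:]
file_name = files[rank]

failed = None
try:
    f = cdms2.open(file_name)
    # ... per-file analysis goes here ...
    f.close()
except:
    failed = file_name

# gather returns a list with one entry per rank (None for successes) on rank 0.
all_failed = comm.gather(failed, root=0)
if rank == 0:
    bad = [name for name in all_failed if name is not None]
    print("Failed files: %s" % bad)
```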

Also, this example can be used as a quick and dirty way to call an external serial program on many files in parallel, as indicated above. This may be a good way to quickly parallelize ncl/nco scripts or compiled programs. Note, however, that it will fail when calling serial python scripts due to conflicts, so for python scripts you will need to modify the code itself per this example.

## Efficiency issues

As there is no interprocessor communication in this embarrassingly parallel example, you might expect near 100% parallel efficiency. That will likely not be the case. As noted above, efficiency will be much better on parallel file systems. However, even then, `cdms2` is not designed for parallel input and output, so there is contention for I/O resources. We find that I/O is generally the biggest single computational efficiency issue. Nonetheless, we have reduced throughput from weeks to hours in many cases. We do not recommend trying to read xml files constructed by `cdscan`, as contention for the scanned files is large.

Furthermore, we find that a large number of shorter duration files, each read by a single processor, is faster than having multiple processors read segments of longer duration files, due to file contention resulting from the lack of parallel I/O UVCDAT modules. However, if time is not an embarrassingly parallel dimension, then space often is (as in temporal averaging or similar calendar operations). In that case, you will most likely need many processors to extract data from the same files, and efficiency will suffer. However, throughput gains over serial execution may still be possible.

## Limitations of these scripts

These scripts deal with only one use case: parallel execution of a serial script that reads one file per task. Although this is often useful, especially when time is an embarrassingly parallel dimension, there can be more parallelism to be had if files contain more than one time step. In this case, code could be written to have multiple tasks read from the same file. However, without a parallel `cdms2`, efficiency is limited. Regarding output, writing multiple files is not too troublesome, as `cdscan` can be used to serially join them afterwards.
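As an illustration of that multiple-tasks-per-file idea (a sketch only, not part of the original scripts; it assumes the file holds a variable named `tas` with time as its first dimension), each rank can open the same file and read just its own block of time steps:

```python
# Sketch: several MPI tasks read disjoint time slices of one file.
from mpi4py import MPI
import sys, cdms2

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

f = cdms2.open(sys.argv[1])
var = f["tas"]            # file variable; data are read only when sliced
nt = var.shape[0]         # number of time steps (time assumed to be the first axis)

# Split the time axis as evenly as possible across the tasks.
start = (rank * nt) // size
stop = ((rank + 1) * nt) // size
chunk = var[start:stop]   # this rank's block of time steps

print("rank %d read time steps %d:%d, shape %s" % (rank, start, stop, str(chunk.shape)))
f.close()
```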

Finally, depending on your patience, it may not be worth the trouble to run UVCDAT in parallel at all. Queue waiting times can be many hours, even days. We do not generally bother with a parallel execution if a serial execution can be run in a day or less. However, when execution time (including queue waiting times) can be reduced from months to days or hours, parallel UVCDAT execution can enable analyses heretofore impractical if not impossible.

Additionally posted on: https://github.com/UV-CDAT/uvcdat/wiki/How-to-run-UV-CDAT-in-parallel-at-NERSC

North American Extreme Temperature Events and Related Large Scale Meteorological Patterns: A Review of Statistical Methods, Dynamics, Modeling, and Trends

Two BER-funded DOE laboratory scientists, Ruby Leung (PNNL) and Michael Wehner (LBNL), are among the principal authors of a comprehensive new review of the large scale meteorological patterns (LSMPs) responsible for short-term North American heat waves and cold snaps. The objective of this paper is to review statistical methods, dynamics, modeling efforts, and trends related to such events. In particular, the role of LSMPs in observed past and simulated future extreme temperature changes is explored. Leung reviewed the state of the art in climate modeling of LSMPs and extreme temperatures and their future changes. Wehner reviewed the current statistical modeling of observed and simulated temperature extremes and trends and assessed climate model performance in simulating observations. The paper concludes by assessing gaps in our knowledge about LSMPs and temperature extremes.

This review paper was part of the activities conducted under the auspices of the US CLIVAR “Extremes and Large Scale Meteorological Patterns” Working Group. Previously, the working group sponsored a workshop on the topic at the Lawrence Berkeley National Laboratory.

Citation: Richard Grotjahn, Robert Black, Ruby Leung, Michael F. Wehner, Mathew Barlow, Mike Bosilovich, Alexander Gershunov, William J. Gutowski, John R. Gyakum, Richard W. Katz, Yun-Young Lee, Young-Kwon Lim, Prabhat (2015) North American Extreme Temperature Events and Related Large Scale Meteorological Patterns: A Review of Statistical Methods, Dynamics, Modeling, and Trends. Climate Dynamics, doi:10.1007/s00382-015-2638-6.

http://link.springer.com/article/10.1007%2Fs00382-015-2638-6


Performance portrait of the CMIP5 models’ ability to represent the temperature-based ETCCDI indices over North American land. The colors represent normalized root mean square errors (RMSE) of seasonal indices compared to the ERA-Interim reanalysis. Blue colors represent errors lower than the median error, while red colors represent errors larger than the median error. Seasons are denoted by triangles within each square. Models marked with “*” are not included in the RCP8.5 projections. Root mean square errors normalized by the model median RMSE for three other reanalyses are shown in the rightmost columns for comparison.

[Figure: tas_ETCCDI_changes]

Projected seasonal changes in North American extreme temperatures from the CMIP5 multi-model ensemble at the end of this century under the RCP8.5 forcing scenario. The reference period is 1985-2005, while the future period is 2080-2100. Winter changes are shown on the left and summer changes on the right. The top panels show changes in cold nights (TNn), while the lower panels show changes in hot days (TXx). Units: Kelvin.

[Figures: tas_figure1, tas_figure1_right]

Change over 1950-2007 in estimated 20-year annual return values (°C) for a) the hot tail of daily maximum temperature (TXx), b) the cold tail of daily maximum temperature (TXn), c) the hot tail of daily minimum temperature (TNx), and d) the cold tail of daily minimum temperature (TNn). Results are based on fitting extreme value statistical models with a linear trend in the location parameter to exceedances of a location-specific threshold (greater than the 99th percentile for the upper tail and less than the 1st percentile for the lower tail). As this analysis was based on anomalies with respect to average values for that time of year, hot minimum temperature values, for example, are just as likely to occur in winter as in summer. The circles indicate the z-score for the estimated change (estimate divided by its standard error), with absolute z-scores exceeding 1, 2, and 3 indicated by open circles of increasing size. A higher z-score indicates greater statistical significance.

Fewer hurricanes overall, but more intense ones, in a warmer world

The four idealized configurations of the US CLIVAR Hurricane Working Group are integrated using the global Community Atmospheric Model version 5.1 at two different horizontal resolutions, approximately 100 km and 25 km. The publicly released 0.9°x1.3° configuration is a poor predictor of the sign of the 0.23°x0.31° configuration’s change in the total number of tropical storms in a warmer climate. However, it does predict the sign of the higher resolution configuration’s change in the number of intense tropical cyclones in a warmer climate. In the 0.23°x0.31° configuration, both increased CO2 concentrations and elevated sea surface temperature (SST) independently lower the number of weak tropical storms and shorten their average duration. Conversely, increased SST causes more intense tropical cyclones and lengthens their average duration, resulting in a greater number of intense tropical cyclone days globally. Increased SST also increases maximum instantaneous tropical storm precipitation rates across all storm intensities. We find that while a measure of maximum potential intensity based on climatological mean quantities adequately predicts the 0.23°x0.31° model’s forced response in its most intense simulated tropical cyclones, a related measure of cyclogenesis potential fails to predict the model’s actual cyclogenesis response to warmer SSTs. These analyses lead to two broader conclusions: 1) Projections of future tropical storm activity obtained by directly tracking tropical storms simulated by CMIP5-class resolution climate models must be interpreted with caution. 2) Projections of future tropical cyclogenesis obtained from metrics of model behavior that are based solely on changes in long-term climatological fields and tuned to historical records must also be interpreted with caution.

Citation: Michael Wehner, Prabhat, Kevin Reed, Daithi Stone, William D. Collins, Julio Bacmeister, Andrew Gettelman (2014) Resolution dependence of future tropical cyclone projections of CAM5.1 in the US CLIVAR Hurricane Working Group idealized configurations. J. Climate 28, 3905-3925, doi:10.1175/JCLI-D-14-00311.1.

[Data file: figure4.xlsx]

Changes in the number of tropical cyclones in the idealized hurricane working group simulations.

Self-similarity of clouds

Self-similarity of clouds from MODIS

The above three images are taken from the same satellite image. Each covers a region of a different size, but all have been resized to the same size in the image above. One is approximately 112 km across, one is approximately 325 km across, and one is approximately 750 km across. Which is which?*

Moreover, why does this matter?

Strange weather – extreme events – generally involves a wide range of scales. Weather systems are often several hundred kilometers wide, but rain within them changes rapidly over a few kilometers. The Doppler radar imagery that is prevalent on weather reports shows this: one part of a city might have pouring rain while another part has none.

Because of this wide range of scales, weather and climate models must be able to simulate how the atmosphere moves at a wide range of scales in order to simulate strange weather. Weather and climate models divide the atmosphere into a set of relatively large ‘boxes’ (analogous to pixels in an image) and track the movement of wind, water, and energy among the boxes. The boxes in state-of-the-science global climate models are approximately 25 km across, and the models must use a separate type of model (technically called a parameterization) to deal with what happens within these boxes. In order to do a good job of simulating strange weather, these parameterizations must take what is happening at large scales and translate it reliably into what is happening at the small scales at which strange weather occurs.

This is where self-similarity comes in.

Many aspects of the atmosphere are what we call ‘self-similar’. The cloud images above illustrate this perfectly. The images are by no means identical, but they also aren’t very different. There is very little that distinguishes one image from another, despite one image being almost 10 times wider than another. This is the essence of self-similarity: the statistics at one scale are essentially identical to the statistics at another. In the cloud images above, this means that if you count the clouds that cover 8 pixels and compare that count to the number of clouds that cover 67 pixels, the ratio of those two numbers will be about the same in each image!
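A minimal sketch of how such a check could be coded (purely illustrative; the random stand-in data, the cloud threshold, and the use of scipy are assumptions, not the CASCADE analysis): label connected cloudy pixels in each scene, count clouds of two chosen sizes, and compare the ratios.

```python
import numpy as np
from scipy import ndimage

def cloud_size_ratio(cloud_mask, small=8, large=67):
    """Ratio of the number of clouds covering `small` pixels to those covering `large` pixels."""
    labels, n = ndimage.label(cloud_mask)                        # connected-component labeling
    sizes = ndimage.sum(cloud_mask, labels, range(1, n + 1))     # pixel count of each labeled cloud
    n_small = int(np.sum(sizes == small))
    n_large = int(np.sum(sizes == large))
    return float(n_small) / max(n_large, 1)

# Stand-in for three scenes of different physical extent resized to the same pixel grid
# (real MODIS cloud masks would be used in practice).
rng = np.random.RandomState(0)
scenes = {"112 km": rng.rand(512, 512) < 0.45,
          "325 km": rng.rand(512, 512) < 0.45,
          "750 km": rng.rand(512, 512) < 0.45}

for name, mask in scenes.items():
    print("%s scene: N(8 px) / N(67 px) = %.2f" % (name, cloud_size_ratio(mask)))
# If the cloud field is self-similar, these ratios should be roughly the same at every scale.
```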

In the CASCADE project, we are taking advantage of self-similarity to improve the way that weather and climate models connect the large scales to the small scales.  If you know the statistics of the atmosphere at large scales, this tells you what the statistics should be at small scales.  This also means that as we shrink the size of weather and climate model grid boxes, we know how the statistics of the atmosphere in these grid boxes should change.  In some recent work, we used this knowledge to provide a detailed guide for how parameterizations of clouds should change as model grid boxes shrink.

We are actively working on developing a new type of cloud parameterization that takes advantage of this self-similarity. We look forward to blogging about this as we develop this idea further!

*Cloud image key: left (750 km), middle (325 km), right (112 km)

Extreme value statistics for daily precipitation

The CASCADE D&A team is fully committed to using the highest resolution climate models possible on the machines at DOE’s National Energy Research Scientific Computing Center (NERSC). We have demonstrated that high resolution (on the order of 25 km) is a necessary but not sufficient condition to reproduce the distribution of extreme daily averaged precipitation. Working closely with the CASCADE statistics and software teams, we have applied extreme value statistics to a variety of observed and simulated daily precipitation products to quantify model performance in simulating extreme precipitation.
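As a hedged illustration of the kind of calculation behind return-value estimates like those mapped below (a sketch, not the CASCADE pipeline; the stationary GEV fit and the synthetic data are assumptions), a twenty-year return value can be estimated by fitting a generalized extreme value distribution to annual maxima of daily precipitation:

```python
import numpy as np
from scipy import stats

# Stand-in for annual maxima of daily precipitation (mm/day) at one location;
# in practice these would come from observations or model output.
rng = np.random.RandomState(42)
annual_max = stats.genextreme.rvs(c=-0.1, loc=40.0, scale=10.0, size=60, random_state=rng)

# Fit a stationary generalized extreme value (GEV) distribution to the annual maxima.
shape, loc, scale = stats.genextreme.fit(annual_max)

# The 20-year return value is the level exceeded with probability 1/20 in any given year.
rv20 = stats.genextreme.isf(1.0 / 20.0, shape, loc, scale)
print("Estimated 20-year return value: %.1f mm/day" % rv20)
```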

[Figure: webDA-conus_figure2a-for-web]

The figure to the left shows comparisons of the annual probability density distributions of daily precipitation between the range of observed precipitation and three different CAM5.1 horizontal resolutions over the contiguous United States. Only the high-resolution model (blue) falls within the range of observational uncertainty.

[Figure: webDA-figure4a-for-web]

The  figure to the right shows twenty-year return values (simulated and observed) of the boreal winter maximum daily precipitation over land. Observations are calculated from the period 1979-1999. Model results are calculated from the period 1979-2005. All results are shown at the native resolution.

Source: Wehner et al. (2014) The effect of horizontal resolution on simulation quality in the Community Atmospheric Model, CAM5.1. Journal of Advances in Modeling Earth Systems, 6, doi:10.1002/2013MS000276.

Systematically simulating past weather

[Figure: modeval-CAM5-Clouds-July-for-web]

ACME: weather as seen by satellite

In the CASCADE project, we are focused on understanding extreme climate events–strange weather–and we use climate models to do this.  These extreme climate events are fundamentally weather events, and so our climate models must be able to do a good job of simulating weather.  We are running an early version of the Accelerated Climate Model for Energy to simulate past weather events.  This way we can check the model’s weather against the weather that actually happened.

The figure to the left is a snapshot of weather in the Accelerated Climate Model for Energy (ACME) as a satellite would see it.

UNCERTAINTY OF EXTREMES

Statistical methods to quantify changes in extreme weather in light of uncertainty.

Extremes are by definition rare. Statistical methods are critical for characterizing extreme weather in time and space, estimating changes in extreme weather, and quantifying our uncertainty about the frequency and trends in extremes.

The Statistics team is developing, implementing, and advising on statistical methods for characterizing extremes in observations and model output. We are particularly focused on detection and designation of changes in extreme events: quantifying the evidence that the probabilities of extreme events are changing over time and that those changes are caused by environmental drivers. As part of this work we are addressing the question of uncertainty characterization. A key focus is to identify the leading sources of uncertainty in our understanding of weather extremes (e.g., initial condition/sampling uncertainty, forcing uncertainty, model parameter uncertainty).

Studies that aim to detect and attribute changes in extreme events have numerous sources of uncertainty, including parametric uncertainty, structural uncertainty, and even methodological uncertainty. In the current paradigm, these sources of uncertainty are dealt with in a piecemeal fashion that can result in overconfident statements of causation. This can produce, and in fact has produced, conflicting causation statements about the same event. The sensitivity of event causation conclusions to the various sources of uncertainty remains sparsely investigated but is demonstrably important. The SFA team focuses on performing a multifaceted set of modeling experiments and analyses designed specifically to characterize, and if possible quantify, the importance of structural uncertainty, parametric uncertainty, and methodological uncertainty in our understanding of various classes of events.


COMPUTATION AND PREDICTION

High performance computing to detect and predict changes in weather extremes.

The CASCADE computation and predictions team is developing scientific tools, workflow patterns, and scalable algorithms that can process massive model output on modern HPC systems.

The computation and predictions team is tightly integrating the detection system with the attribution framework so that statistics from the detection analyses automatically yield the probability distribution functions required to produce quantitative attribution and projection statements for extreme events. In a related effort, we are integrating event detection and analysis with the ILIAD (InitiaLized-ensemble Identify, Analyze, Develop) framework to ensure that probabilities of event detection do not depend on model configuration, thereby mitigating the resolution dependence of hurricane detection.

The CASCADE research portfolio requires extensive computational and statistical infrastructure. Much of the SFA research requires implementation of novel statistical methods. Likewise, the formal application of UQ methods for extremes requires the implementation of a surrogate model and new developments in emulator methodology. Further, all of the SFA’s analyses require sophisticated, robust, and parallelizable data analysis tools to operate on the enormous datasets that we use (O(100-1000) TB). Therefore, the SFA focuses on three main research and development foci to support the broader goals of the project: methodological development for systematic event causation at fine spatial scales, development of a statistical framework for the holistic uncertainty characterization work, and development of a multilevel model emulator for extremes.

How to run UV-CDAT in parallel at NERSC

 

**[Michael Wehner](mailto:mfwehner@lbl.gov) and [Hari Krishnan](mailto:hkrishnan@lbl.gov), Lawrence Berkeley National Laboratory**

## Introduction

Most climate data analyses have at least one dimension that be exploited at NERSC in an embarrassingly parallel manner. In fact, the most common of these is simply time. The scripts presented here are a general solution to take advantage of temporal parallelism for a wide variety of lengthy UVCDAT calculations.

Typically, long time series climate data sets are spread across multiple files, usually to keep the file sizes manageable. The UVCDAT script called `cdscan` is used to construct an xml file to be read by the `cdms2` module as a single pseudo data file that contains the entire time domain. In the case of exploiting temporal parallelism, it is most straightforward to instead keep the files separate and assign a single processor to each file. Hence, in order to get high degrees of parallelism, more files of short time intervals is actually better than a few lengthy files. Another often parallel dimension is found in ensemble simulations, where files are often arranged by realization number.

In order to assign individual files to individual processors, we use the `mpi4py` module. The total number of MPI tasks is set equal to the total number of processors. For instance, if the number of input files is 48, the following simple batch command will work on hopper.nersc.gov or edison.nersc.gov.

“`
qsub python_test.pbs
“`

where the batch input file to execute the parallel UVCDAT script, `python_test.py`, described below, is as follows:

### python_test.pbs

“`
#PBS -q debug
#PBS -l mppwidth=48 #PBS -l walltime=00:30:00
#PBS -N python_test
#PBS -e python_test.$PBS_JOBID.err
#PBS -o python_test.$PBS_JOBID.out
#PBS -V

module load cascade
module load mpi4py

cd $SCRATCH/my-directory

aprun -n 48 python python_test.py –parallel *nc
“`

In this example, it assumed that there are 48 netcdf files in $SCRATCH/my-directory. This job would use 2 nodes on hopper or edison.

**Note: It is highly recommended that you perform all parallel operations in the scratch directories or other parallel file systems. Jobs executed on parallel file systems such as your home directory or /project could be up to 10 times slower than on a parallel file systems such as LUSTRE.**

This script uses the `cascade` module, but you may replace that with `module load uvcdat`

## Memory limits

In many cases, there will not be enough memory per processor to run your script. There are many weird error messages that can be returned from batch runs. In the case of not enough memory per processor, the log file, python_test.$PBS_JOBID.out, where $PBS_JOBID is a number, will contain a message like this:

“`
[NID 05718] 2015-03-18 17:07:02 Apid 47875875: OOM killer terminated this process.
“`

To correct this error, you will need to idle out some processors. This script below will use only 12 of each of the 24 processors per hopper node. Hence, it asks for twice as many processors (96) to run 48 MPI tasks. The NERSC website can tell you how much memory per node is available. If you know the memory footprint of your code, you can calculate `mppnppn` by dividing the available memory on a node by the code’s memory footprint. It is not uncommon to use values of 4 or less for high resolution data sets. Note that the value of –N in the `aprun` command is equal to `mppnppn` and should be an integer divisor of the number of processors per node (24 on hopper and edison). Also note that the newly recommended syntax is to continue to set `mppwidth` to the number of tasks actually required.

### python_test_2xMemory.pbs

“`
#PBS -q debug
#PBS -l mppwidth=48
#PBS -l mppnppn=12
#PBS -l walltime=00:30:00
#PBS -N python_test
#PBS -e python_test.$PBS_JOBID.err #PBS -o python_test.$PBS_JOBID.out
#PBS -V

module load cascade
module load mpi4py

cd /global/project/projectdirs/m1517/tmp/python_test

aprun -n 48 –N 12 python python_test.py –parallel *nc
“`

This job would use 4 nodes on hopper or edison.

## UVCDAT example using mpi4py

In order to robustly enable multiple MPI tasks, see the python script below. Real scripts will be more complex, but this will work as a stencil.

### python_test.py

“`python
import sys, cdms2, string
# The parallel branch
if sys.argv[1]==”–parallel”:
# note that mpi4py has been imported in the batch script. You may want to do it here instead in some cases.
# import mpi4py
from mpi4py import MPI
comm = MPI.COMM_WORLD
# Size is the number of tasks controlled by –n in aprun. 48 in this example.
size = comm.Get_size()
# rank is the id of this task and hence this processor.
rank = comm.Get_rank()
# files is a list of the input file names determined by the filtered list on the aprun line.
files=sys.argv[2:]
# file_name is the name of the file to be processed by this task.
file_name=files[rank]
# A serial branch that we find useful to test code.
# The execute line is:
# python python_test.py some_netcdf_file.nc
if sys.argv[1]!=’–parallel’:
rank=0
file_name=sys.argv[1]

# The main body of the code. Likely extracted from a current serial code.
print “processor “+string.zfill(rank,4)+” is starting “+file_name

try:
f=cdms2.open(file_name)
# Do some fancy math on file_name with python code here. var=f(‘tas’) # etc.
# Or you can call some other kind of program, such as C, Fortran, ncl, etc. import os
os.system(‘some_serial_code.x ‘+file_name)
print “Rank”,rank, ” succeeded!!”
except:
print “Rank”,rank, ” failed!!!!!! ”

sys.exit(0)
# end of python_test.py
“`

Print statements will be found in the logfile `python_test.$PBS_JOBID.out`. The try/except structure is an attempt to enable parallel jobs with a few bad input files to complete the tasks on the good files. In cases with thousands of files, you don’t want the job killed because one of those files is corrupt. You can find the bad input files quickly by grepping on “failed” in the logfile, then grepping again for the rank numbers that failed. The order of print statements will be pretty random as tasks do not appear to initiate in lock step. Without the try/except coding structure, any error will cause all tasks to end immediately. With this coding structure, if only a few files are corrupted, you can go back and run your script serially afterwards on the repaired files. This will be a lot faster than waiting again in the queue for 10000 processors. This trick appears to work, but we there are segmentation errors that it does not capture. A robust exit strategy to capture errors without ending all tasks remains to be developed.

Also, this example can be used as a quick and dirty way to call an external serial program on many files in parallel as indicated above. This may be a good way to quickly parallize ncl/nco scripts or compiled programs. But note that it will fail when calling serial python scripts due to conflicts. So for python scripts, you will need to modify the code itself per this example.

## Efficiency issues.

As there is no interprocessor communication in this embarrassingly parallel example, you might expect near 100% parallel efficiency. This will not likely be the case. As noted above, efficiency will be much better on parallel file systems. However even then, `cdms2` is not designed for parallel input and output, hence there is contention for i/o resources. We find that i/o is generally the biggest single computational efficiency issue. Nonetheless, we have reduced throughput from weeks to hours in many cases. We do not recommend trying to read xml files constructed by `cdscan`, as contention for the scanned files is large.

Furthermore, we find that a large number of shorter-duration files, each read by a single processor, is faster than having multiple processors read segments of longer-duration files, due to the file contention that results from the lack of parallel i/o in the UVCDAT modules. However, if time is not an embarrassingly parallel dimension, then space often is (as in temporal averaging or similar calendar operations). In this case, you will most likely need many processors to extract data from single files, and efficiency will suffer. Even so, throughput gains over serial execution may still be possible.
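For illustration, a spatial decomposition of a single file might look like the minimal sketch below; the file name, the variable name 'tas', and the even latitude split are assumptions for the example rather than part of any CASCADE script:

```python
# Minimal sketch (assumed file and variable names) of decomposing one file
# across MPI ranks by latitude band instead of by file. Every rank opens the
# same file, so i/o contention will limit parallel efficiency.
import numpy
import cdms2
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# One evenly spaced latitude band per task.
lat_edges = numpy.linspace(-90.0, 90.0, size + 1)

f = cdms2.open('some_long_netcdf_file.nc')
# Each rank reads only its own latitude band, for all time steps.
var = f('tas', latitude=(lat_edges[rank], lat_edges[rank + 1]))
# ... reduce over time here, e.g. compute this band's temporal average ...
f.close()
```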

## Limitations of these scripts

These scripts deal with only one use case: parallel execution of a serial script that reads one file per task. Although this is often useful, especially when time is an embarrassingly parallel dimension, more parallelism can be had if files contain more than one time step. In that case, code could be written to have multiple tasks read from the same file. However, without a parallel `cdms2`, efficiency is limited. Regarding output, writing multiple files is not too troublesome, as `cdscan` can be used to serially join them afterwards.

Finally, depending on your patience, it may not be worth the trouble to run UVCDAT in parallel at all. Queue waiting times can be many hours, even days. We do not generally bother with a parallel execution if a serial execution can be run in a day or less. However, when execution time (including queue waiting time) can be reduced from months to days or hours, parallel UVCDAT execution can enable analyses that were heretofore impractical, if not impossible.

Additionally posted on: https://github.com/UV-CDAT/uvcdat/wiki/How-to-run-UV-CDAT-in-parallel-at-NERSC

North American Extreme Temperature Events and Related Large Scale Meteorological Patterns: A Review of Statistical Methods, Dynamics, Modeling, and Trends

Two BER-funded DOE laboratory scientists, Ruby Leung (PNNL) and Michael Wehner (LBNL), are among the principal authors of a comprehensive new review of the large scale meteorological patterns (LSMPs) responsible for short-term North American heat waves and cold snaps. The objective of the paper is to review statistical methods, dynamics, modeling efforts, and trends related to such events. In particular, the role of LSMPs in observed past and simulated future extreme temperature changes is explored. Leung reviewed the state of the art in climate modeling of LSMPs and extreme temperatures and their future changes. Wehner reviewed the current statistical modeling of observed and simulated temperature extremes and trends and assessed climate model performance in simulating observations. The paper concludes by assessing gaps in our knowledge about LSMPs and temperature extremes.


This review paper was part of the activities conducted under the auspices of the US CLIVAR “Extremes and Large Scale Meteorological Patterns” Working Group. Previously, the working group sponsored a workshop on the topic at the Lawrence Berkeley National Laboratory.


Citation: Richard Grotjahn, Robert Black, Ruby Leung, Michael F. Wehner, Mathew Barlow, Mike Bosilovich, Alexander Gershunov, William J. Gutowski, John R. Gyakum, Richard W. Katz, Yun-Young Lee, Young-Kwon Lim, Prabhat (2015) North American Extreme Temperature Events and Related Large Scale Meteorological Patterns: A Review of Statistical Methods, Dynamics, Modeling, and Trends. Climate Dynamics. DOI: 10.1007/s00382-015-2638-6

http://link.springer.com/article/10.1007%2Fs00382-015-2638-6


Performance portrait of the CMIP5 models’ ability to represent the temperature-based ETCCDI indices over North American land. The colors represent normalized root mean square errors (RMSE) of seasonal indices compared to the ERA-Interim reanalysis. Blue colors represent errors lower than the median error, while red colors represent errors larger than the median error. Seasons are denoted by triangles within each square. Models marked with “*” are not included in the RCP8.5 projections. Root mean square errors normalized by the model median RMSE for three other reanalyses are shown in the rightmost columns for comparison.



Projected seasonal changes in North American extreme temperatures from the CMIP5 multi-model ensemble at the end of this century under the RCP8.5 forcing scenario. The reference period is 1985-2005, while the future period is 2080-2100. Winter changes are shown on the left and summer changes on the right. The top panels show changes in cold nights (TNn), while the lower panels show changes in hot days (TXx). Units: Kelvin.



Change over 1950-2007 in estimated 20-year annual return values (°C) for (a) the hot tail of daily maximum temperature (TXx), (b) the cold tail of daily maximum temperature (TXn), (c) the hot tail of daily minimum temperature (TNx), and (d) the cold tail of daily minimum temperature (TNn). Results are based on fitting extreme value statistical models with a linear trend in the location parameter to exceedances of a location-specific threshold (greater than the 99th percentile for the upper tail and less than the 1st percentile for the lower tail). Because this analysis was based on anomalies with respect to average values for that time of year, hot minimum temperature values, for example, are just as likely to occur in winter as in summer. The circles indicate the z-score for the estimated change (the estimate divided by its standard error), with absolute z-scores exceeding 1, 2, and 3 indicated by open circles of increasing size. A higher z-score indicates greater statistical significance.
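For readers unfamiliar with this kind of fit, the sketch below shows the basic idea of a “linear trend in the location parameter” using a simplified block-maxima (annual maximum) analogue rather than the threshold-exceedance model used for the figure; it is purely illustrative and not the authors’ code.

```python
# Illustrative only: maximum-likelihood fit of a GEV distribution whose
# location parameter varies linearly in time, mu(t) = mu0 + trend * t.
# annual_max and years are assumed to be 1-D numpy arrays of equal length.
import numpy as np
from scipy.stats import genextreme
from scipy.optimize import minimize

def fit_gev_with_trend(annual_max, years):
    t = years - years.mean()

    def nll(params):
        mu0, trend, log_scale, shape = params
        loc = mu0 + trend * t
        return -np.sum(genextreme.logpdf(annual_max, shape, loc=loc,
                                         scale=np.exp(log_scale)))

    start = [annual_max.mean(), 0.0, np.log(annual_max.std()), 0.1]
    result = minimize(nll, start, method='Nelder-Mead')
    return result.x  # [mu0, trend per year, log(scale), shape]
```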

Fewer hurricanes in total but more intense ones in a warmer world

The four idealized configurations of the US CLIVAR Hurricane Working Group are integrated using the global Community Atmospheric Model version 5.1 at two different horizontal resolutions, approximately 100 km and 25 km. The publicly released 0.9°x1.3° configuration is a poor predictor of the sign of the 0.23°x0.31° configuration’s change in the total number of tropical storms in a warmer climate. However, it does predict the sign of the higher-resolution configuration’s change in the number of intense tropical cyclones in a warmer climate. In the 0.23°x0.31° configuration, both increased CO2 concentrations and elevated sea surface temperature (SST) independently lower the number of weak tropical storms and shorten their average duration. Conversely, increased SST causes more intense tropical cyclones and lengthens their average duration, resulting in a greater number of intense tropical cyclone days globally. Increased SST also increases maximum tropical storm instantaneous precipitation rates across all storm intensities. We find that while a measure of maximum potential intensity based on climatological mean quantities adequately predicts the 0.23°x0.31° model’s forced response in its most intense simulated tropical cyclones, a related measure of cyclogenesis potential fails to predict the model’s actual cyclogenesis response to warmer SSTs. These analyses lead to two broader conclusions: 1) projections of future tropical storm activity obtained by directly tracking tropical storms simulated by CMIP5-class resolution climate models must be interpreted with caution, and 2) projections of future tropical cyclogenesis obtained from metrics of model behavior that are based solely on changes in long-term climatological fields and tuned to historical records must also be interpreted with caution.


Citation: Michael Wehner, Prabhat, Kevin Reed, Daithi Stone, William D. Collins, Julio Bacmeister, Andrew Gettelman (2014) Resolution dependence of future tropical cyclone projections of CAM5.1 in the US CLIVAR Hurricane Working Group idealized configurations. J. Climate, 28, 3905-3925. DOI: 10.1175/JCLI-D-14-00311.1



Changes in the number of tropical cyclones in the idealized hurricane working group simulations.



Self-similarity of clouds

Self-similarity of clouds from MODIS

The three images above are taken from the same satellite image. Each covers a different physical size, but all have been resized to the same display size. One is approximately 112 km across, one is approximately 325 km across, and one is approximately 750 km across. Which is which?*

And why does this matter?

Strange weather (extreme events) generally involves a wide range of scales. Weather systems are often several hundred kilometers wide, but the rain within them changes rapidly over a few kilometers. The Doppler radar imagery that is prevalent on weather reports shows this: one part of a city might have pouring rain while another part has none.

Because of this wide range of scales, weather and climate models must be able to simulate how the atmosphere moves across a wide range of scales in order to simulate strange weather. Weather and climate models divide the atmosphere into a set of relatively large ‘boxes’ (analogous to pixels in an image) and track the movement of wind, water, and energy among the boxes. The boxes in state-of-the-science global climate models are approximately 25 km across, and the models must use a separate type of model (technically called a parameterization) to deal with what happens within these boxes. In order to do a good job of simulating strange weather, these parameterizations must take what is happening at large scales and translate it reliably into what is happening at the small scales at which strange weather occurs.

This is where self-similarity comes in.

Many aspects of the atmosphere are what we call ‘self-similar’. The cloud images above illustrate this perfectly. The images are by no means identical, but they also are not very different. There is very little that distinguishes one image from another, despite one image being almost seven times wider than another. This is the essence of self-similarity: the statistics at one scale are essentially the same as the statistics at another. In the cloud images above, this means that if you count the number of clouds that cover 8 pixels and compare that to the number of clouds that cover 67 pixels, the ratio of those two numbers will be about the same in each image!
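A toy sketch of that comparison is shown below, assuming the cloud images are available as boolean numpy masks (cloudy versus clear pixels); it is illustrative only and not CASCADE analysis code.

```python
# Toy illustration: compare cloud-size statistics of two images.
# cloud_mask is a 2-D boolean numpy array where True marks cloudy pixels.
import numpy as np
from scipy import ndimage

def cloud_sizes(cloud_mask):
    """Pixel size of every connected cloud in the mask."""
    labels, n_clouds = ndimage.label(cloud_mask)
    return np.bincount(labels.ravel())[1:]  # label 0 is clear sky

def size_ratio(cloud_mask, small=8, large=67):
    """Ratio of the number of clouds near `small` pixels to those near `large` pixels."""
    sizes = cloud_sizes(cloud_mask)
    n_small = np.sum((sizes >= small) & (sizes < 2 * small))
    n_large = np.sum((sizes >= large) & (sizes < 2 * large))
    return float(n_small) / max(n_large, 1)

# For a self-similar cloud field, size_ratio(image_112km) and
# size_ratio(image_750km) should be roughly equal, even though the
# two images span very different physical scales.
```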

In the CASCADE project, we are taking advantage of self-similarity to improve the way that weather and climate models connect the large scales to the small scales.  If you know the statistics of the atmosphere at large scales, this tells you what the statistics should be at small scales.  This also means that as we shrink the size of weather and climate model grid boxes, we know how the statistics of the atmosphere in these grid boxes should change.  In some recent work, we used this knowledge to provide a detailed guide for how parameterizations of clouds should change as model grid boxes shrink.

We are actively working on developing a new type of cloud parameterization that takes advantage of this self-similarity. We look forward to blogging about this as we develop this idea further!

*Cloud image key: left (750 km), middle (325 km), right (112 km)


Extreme value statistics for daily precipitation


The CASCADE D&A team is fully committed to using the highest-resolution climate models possible on the machines at DOE’s National Energy Research Scientific Computing Center (NERSC). We have demonstrated that high resolution (on the order of 25 km) is a necessary but not sufficient condition to reproduce the distribution of extreme daily averaged precipitation. Working closely with the CASCADE statistics and software teams, we have applied extreme value statistics to a variety of observed and simulated daily precipitation products to quantify model performance in simulating extreme precipitation.
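For readers curious how a twenty-year return value is estimated, a minimal illustration using annual maxima and a GEV fit is sketched below; the actual CASCADE analyses use more careful statistical procedures, so treat this only as a conceptual example.

```python
# Conceptual example: a T-year return value is the (1 - 1/T) quantile of the
# fitted distribution of annual maxima. daily_precip and years are assumed to
# be 1-D numpy arrays with one entry per day.
import numpy as np
from scipy.stats import genextreme

def return_value(daily_precip, years, period=20):
    # Block maxima: the largest daily value in each calendar year.
    annual_max = np.array([daily_precip[years == y].max() for y in np.unique(years)])
    # Fit a GEV distribution to the annual maxima.
    shape, loc, scale = genextreme.fit(annual_max)
    # The 20-year return value is the 0.95 quantile of the fitted GEV.
    return genextreme.ppf(1.0 - 1.0 / period, shape, loc=loc, scale=scale)
```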

The figure to the left shows comparisons of the annual probability density distributions of daily precipitation between the range of observed precipitation and three different CAM5.1 horizontal resolutions over the contiguous United States. Only the high-resolution model (blue) falls within the range of observational uncertainty.




The figure to the right shows twenty-year return values (simulated and observed) of the boreal winter maximum daily precipitation over land. Observations are calculated from the period 1979-1999. Model results are calculated from the period 1979-2005. All results are shown at the native resolution.



Source: Wehner et al. (2014) The effect of horizontal resolution on simulation quality in the Community Atmospheric Model, CAM5.1. Early online release: Journal of Advances in Modeling Earth Systems, 6, doi:10.1002/2013MS000276.

Systematically simulating past weather


ACME: weather as seen by satellite

In the CASCADE project, we are focused on understanding extreme climate events (strange weather), and we use climate models to do this. These extreme climate events are fundamentally weather events, so our climate models must do a good job of simulating weather. We are running an early version of the Accelerated Climate Model for Energy to simulate past weather events. This way we can check the model’s weather against the weather that actually happened.

The figure to the left is a snapshot of weather in the Accelerated Climate Model for Energy (ACME) as a satellite would see it.