TECA 2.0 publicly released
We are pleased to announce the release of TECA 2.0.
You can download the software at https://github.com/LBL-EESA/TECA
Documentation is available in the GitHub repository.
This is a major rewrite of TECA with much friendlier user and developer interfaces. Version 2.0 contains only the TECA 1.0 tropical cyclone (TC) detection algorithm; other TC methods and analytics are under development, as are trackers for other storm types.
On NERSC systems, simply type “module load teca” to use it.
Here is a movie from TECA 2.0 illustrating some TC tracks:
https://drive.google.com/file/d/0B3y5yyus32lvNjkyUUF5YklmMUE/view
The big data ecosystem for science
Large-scale data management is essential for modern climate science; the CASCADE project has itself generated upwards of 3 PB of data. LBNL data scientists have prepared a blog post describing the state of large-scale data management across a variety of disciplines, including climate science: https://www.oreilly.com/ideas/the-big-data-ecosystem-for-science
LBNL CASCADE fvCAM5.1 US CLIVAR Hurricane Working Group datasets now online
0.25-degree and 1-degree simulations from the US CLIVAR Hurricane Working Group are now online. The data are held on HPSS tape, so there will be a short staging period prior to download.
http://portal.nersc.gov/cascade/hwg/
The First C20C+ Event Attribution Hackathon
The first C20C+ “Hackathon” was held December 7-11, 2015 at the Computational Research Division of Lawrence Berkeley National Laboratory. This get-together was a true working meeting, with scientists learning what is in the C20C+ database of simulations and how to access it. Twelve scientists from around the world, including Japan, Australia, Great Britain, Germany, South Africa and the United States, spent the week in a room full of computer terminals sharing ideas and experiences about extreme event attribution. Well over half of the attendees were early-career scientists. Led by CASCADE team member Dáithí Stone, the C20C+ Event Attribution Subproject is building a multi-model database of simulations targeted at extreme event analysis. Currently containing over 2 PB of model simulations from four different modeling groups, the database continues to grow with additional modeling groups and simulations. C20C+ data is available now, without restriction, from http://portal.nersc.gov/c20c/. Questions about the C20C+ Detection and Attribution Project should be directed to Dr. Stone at dstone@lbl.gov.
C20C+ hackers toured the new Computational Research and Theory (CRT) Facility and saw the latest NERSC supercomputer, Cori (cori.nersc.gov), on which many high-resolution C20C+ simulations will be performed.
O’Reilly article on Top 10 Data Analytics problems at NERSC
See this popular-press blog article by Prabhat detailing data analytics problems at NERSC, including a contribution from the CASCADE project:
https://www.oreilly.com/ideas/big-science-problems-big-data-solutions
Graphics from CASCADE’s high-resolution model simulation featured in Nature magazine
An oft-used figure from the original 25 km CAM5 simulation performed by the CASCADE team of Michael Wehner and Prabhat has been featured again in a high-profile way. The article in this week’s Nature magazine discussing imprecise computing (or, more precisely, stochastic parameterization) used the image at the top of the page. See:
http://www.nature.com/news/modelling-build-imprecise-supercomputers-1.18437
This image is actually a still from an animation.
Berkeley Lab Climate Software Honored for Pattern Recognition Advances
Original article: http://www.nersc.gov/news-publications/nersc-news/nersc-center-news/2015/berkeley-lab-climate-software-honored-for-pattern-recognition-advances/
Contact: Kathy Kincade, +1 510 495 2124, kkincade@lbl.gov
The Toolkit for Extreme Climate Analysis (TECA), developed at Lawrence Berkeley National Laboratory to help climate researchers detect extreme weather events in large datasets, has been recognized for its achievements in solving large-scale pattern recognition problems.
“TECA: Petascale Pattern Recognition for Climate Science,” a paper presented by scientists from Berkeley Lab and Argonne National Laboratory at the 16th International Conference on Computer Analysis of Images and Patterns (CAIP), was awarded CAIP’s Juelich Supercomputing Center prize for the best application of HPC technology in solving a pattern recognition problem.
The paper, authored by Prabhat, Surendra Byna, Eli Dart, Michael Wehner and Bill Collins of Berkeley Lab and Venkat Vishwanath of Argonne, describes work funded in part by Berkeley Lab’s Scientific Focus Area entitled “Calibrated and Systematic Characterization, Attribution, and Detection of Extremes (CASCADE),” in which pattern recognition techniques are being developed to better identify extreme weather events associated with climate change. The researchers demonstrated how TECA, running at full scale on NERSC’s Hopper system (a Cray XE6) and Argonne’s Mira system (an IBM BG/Q), reduced the runtime for pattern detection tasks from years to hours.
Figure: TECA implements multivariate threshold conditions to detect and track extreme weather events in large climate datasets. The visualization depicts tropical cyclone tracks overlaid on atmospheric flow patterns.
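To make “multivariate threshold conditions” concrete, here is a minimal sketch of the kind of test involved, written in plain NumPy. The field names and threshold values below are illustrative assumptions, not TECA’s actual criteria or code.

    # Simplified sketch of a multivariate threshold test for tropical
    # cyclone candidates, in the spirit of TECA's detection stage.
    # Field names and thresholds are illustrative assumptions only.
    import numpy as np

    def tc_candidates(vorticity, slp, warm_core_anomaly,
                      vort_min=3.5e-5, slp_max=100500.0, warm_min=0.8):
        """Return a boolean mask of grid cells satisfying all three
        threshold conditions simultaneously."""
        return ((vorticity > vort_min) &           # strong low-level rotation
                (slp < slp_max) &                  # sea-level pressure depression
                (warm_core_anomaly > warm_min))    # warm-core anomaly (K)

    # Example on a synthetic 180 x 360 (lat x lon) snapshot:
    rng = np.random.default_rng(0)
    mask = tc_candidates(rng.normal(0, 2e-5, (180, 360)),
                         rng.normal(101325, 600, (180, 360)),
                         rng.normal(0, 0.5, (180, 360)))
    print("candidate cells:", int(mask.sum()))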
Modern climate simulations produce massive amounts of data, requiring sophisticated pattern recognition on terabyte- to petabyte-sized datasets. For this project, Prabhat and his colleagues downloaded 56 TB of climate data from the fifth phase of the Coupled Model Intercomparison Project (CMIP5) to NERSC. Their goal was to access subsets of those datasets to identify three different classes of storms: tropical cyclones, atmospheric rivers and extratropical cyclones.
All of the datasets were accessed through a portal created by the Earth System Grid Federation to facilitate the sharing of data: in this case, atmospheric model data at six-hour increments, running out to the year 2100, stored at 21 sites around the world, including Norway, the United Kingdom, France, Japan and the United States. NERSC’s Hopper system was used to preprocess the data, which took about two weeks and resulted in a final 15 TB dataset.
ESnet’s High-Speed Network Critical
Moving the data from NERSC to the Argonne Leadership Computing Facility (ALCF) was accomplished using ESnet’s 100-gigabit-per-second network backbone. Globus, a software package developed for moving massive datasets easily, further sped things up. Several datasets were moved during the project, with much better performance than the original data staging; for example, a replication of the entire raw project dataset (56 TB) from NERSC to ALCF took only two days.
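For scale, a quick back-of-the-envelope calculation using only the numbers quoted above (56 TB in roughly two days) gives the effective transfer rate:

    # Simple arithmetic on the figures quoted in the text:
    # 56 TB replicated from NERSC to ALCF in about two days.
    tb = 56 * 1e12                  # bytes transferred
    seconds = 2 * 24 * 3600         # two days
    print(f"{tb / seconds / 1e6:.0f} MB/s "
          f"(~{8 * tb / seconds / 1e9:.1f} Gb/s of the 100 Gb/s backbone)")

That works out to roughly 324 MB/s sustained, around 2.6 Gb/s of the 100 Gb/s backbone.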
“ESnet exists to enable precisely this sort of work,” said Dart, a network engineer at ESnet who oversaw the data transfer. “It is essentially impossible, in the general case, to assemble large-scale datasets from this big, distributed archive without a high-performance, feature-rich network to connect all the different pieces. Participating in projects like this not only helps the project but helps ESnet understand what the needs of the scientists are so that we can run our networks better and help other groups that might have similar problems.”
Once the data was received at ALCF, Vishwanath helped optimize the code to scale to Mira’s massively parallel architecture and high-performance storage system, thereby enabling TECA to fully exploit the system. The job ran simultaneously on 755,200 of Mira’s 786,432 processor cores, using 1 million processor hours in just an hour and a half.
“We have now developed capabilities to scale TECA to entire supercomputers—Hopper and Edison at NERSC and Mira at ALCF—and ask sophisticated questions about how extreme weather is expected to change in future climate regimes,” said Prabhat, who leads NERSC’s Data and Analytics Services team. “Only by running at these large scales are we able to process these terabyte datasets in under an hour. Attempting similar analytics on a single machine would be simply intractable. TECA is enabling climate scientists to extract value from large-scale simulation and observational datasets.”
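The scaling strategy described here is, at its heart, data parallelism over the simulation’s time steps: each core examines its own share of the data and the partial results are combined at the end. The mpi4py sketch below illustrates that general pattern under those assumptions; it is not TECA’s actual implementation, and the detector is a placeholder.

    # Minimal mpi4py sketch of the map-reduce pattern described above:
    # each rank detects events in its share of time steps ("map"), then
    # per-rank counts are combined on rank 0 ("reduce"). Illustrative
    # only; not TECA's actual code.
    from mpi4py import MPI

    def detect_events(step):
        """Placeholder per-time-step detector; a real one would read the
        step's fields and apply threshold tests."""
        return step % 7 == 0  # fake: every 7th step has an "event"

    comm = MPI.COMM_WORLD
    n_steps = 10_000

    # Map: assign time steps to ranks round-robin and detect locally.
    local_hits = sum(1 for s in range(comm.rank, n_steps, comm.size)
                     if detect_events(s))

    # Reduce: combine partial counts on rank 0.
    total = comm.reduce(local_hits, op=MPI.SUM, root=0)
    if comm.rank == 0:
        print(f"events detected across {n_steps} steps: {total}")

Run with, for example, “mpirun -n 4 python detect.py”; because the time steps are independent, the same pattern scales from a workstation to the full machine.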
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is the primary high-performance computing facility for scientific research sponsored by the U.S. Department of Energy’s Office of Science. Located at Lawrence Berkeley National Laboratory, the NERSC Center serves more than 6,000 scientists at national laboratories and universities researching a wide range of problems in combustion, climate modeling, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a U.S. Department of Energy national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. DOE Office of Science. Learn more about computing sciences at Berkeley Lab.
Using CASCADE tools at NERSC
CASCADE support on NERSC requires a simple command. On Hopper and Edison, simply type:
“module load cascade”
This post highlights many of the software technologies that underpin the CASCADE environment. In future blog posts, we will delve into each technology and provide examples of how to exercise them; as a first taste, a small sketch follows the list below.
Technologies provided within the CASCADE module:
- Python: mako, pnetcdf, rpy2, matplotlib, UV-CDAT
- R: pbd (pmclust, pbdBASE, pbdSLAP, pbdDMAT, pbdNCDF4, pbdMPI); maps, ncdf, fields, splancs, SpatialExtremes, extRemes
- Custom tools: TECA (C/C++), LLEX (R), Depcache (Python), MPI-Regridder (Python)
- Other operations: HPSS-Archiver, ESGF publishing support
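As a small preview, here is a minimal sketch exercising two of the components listed above, calling R’s extRemes package from Python via rpy2 to fit an extreme value distribution. It assumes an environment, such as the CASCADE module, in which rpy2 and extRemes are installed; the data are synthetic.

    # Minimal sketch: fit a Generalized Extreme Value (GEV) distribution
    # with R's extRemes package, driven from Python through rpy2.
    # Assumes rpy2 and the R package extRemes are installed.
    import numpy as np
    from rpy2.robjects import FloatVector, r
    from rpy2.robjects.packages import importr

    extremes = importr('extRemes')

    # Fifty synthetic "annual maximum" values standing in for, e.g.,
    # yearly peak precipitation at a grid cell.
    annual_max = FloatVector(np.random.default_rng(1).gumbel(30.0, 5.0, 50))

    # Fit a GEV distribution to the block maxima and print the R summary.
    fit = extremes.fevd(annual_max, type='GEV')
    r['print'](fit)

This kind of Python-to-R bridge is why both language stacks ship together in the module: Python handles data wrangling at scale while established R packages supply the extreme value statistics.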