DETECTION AND DESIGNATION

Characterization, detection, and designation of observed extreme events.

Extreme weather has large effects on human and natural systems. Using observations, climate models, and statistical techniques, CASCADE researchers examine how changes in the natural environment have affected recent weather extremes.

Increased understanding of the influence of environmental drivers on current extreme weather increases confidence in projections of future extreme weather statistics. The centerpiece modeling effort of the CASCADE Detection and Designation team is the C20C+ (Climate of the 20th Century) experiment, a coordinated international project of the World Climate Research Programme (WCRP) to which CASCADE is a primary contributor. This multi-model effort aims to aid event attribution by building a database of ensemble climate model simulations describing the “world that was” in an as-realistic-as-possible configuration and the “worlds that might have been” in counterfactual configurations in which environmental drivers have not changed.

Associating changes in the behavior of extreme events with specific environmental drivers requires a systematic characterization of extreme events in the recent past. Recent advances in simulation capabilities and statistical methodologies allow us to examine the impacts of a wide range of environmental drivers at regional-to-local scales, to investigate the factors governing the spatial and temporal co-occurrence of extremes, and to simulate how we might expect extremes to change in the future.

Extreme value statistics for daily precipitation

  The CASCADE D&A team is fully committed to using…
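
The truncated note above concerns extreme value statistics for daily precipitation. As an illustrative sketch only (not the team's actual analysis pipeline), annual block maxima of daily precipitation can be fit with a generalized extreme value (GEV) distribution to estimate return levels; the synthetic data below are an assumption standing in for real station records:

```python
# Minimal GEV block-maxima sketch for daily precipitation extremes.
# Synthetic data only; real analyses use observed or simulated records.
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(42)
# Stand-in for 50 years of daily precipitation (mm/day).
daily = rng.gamma(shape=0.5, scale=8.0, size=(50, 365))
annual_max = daily.max(axis=1)

# Fit the GEV distribution to the block (annual) maxima.
shape, loc, scale = genextreme.fit(annual_max)

# The T-year return level is the (1 - 1/T) quantile of the fitted GEV.
return_level_20yr = genextreme.ppf(1 - 1 / 20, shape, loc=loc, scale=scale)
print(f"20-year return level: {return_level_20yr:.1f} mm/day")
```

Return levels of this kind, together with uncertainty intervals from bootstrap or profile-likelihood methods, are a standard summary statistic in this setting.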

C20C+ highlights

The figure shows an Australian heat wave as an example use of…

Tropical cyclones

The figure shows an average number of tropical storms, tropical…

UNDERSTANDING STRANGE WEATHER

Understanding how changes in the ocean and atmosphere make weather events more extreme.

Climate extremes – such as hurricanes, major floods, and heat waves – not only stress society but also push the bounds of what modern climate models can simulate.

While extreme weather events often have impacts at relatively small (city-wide) scales, they are often driven by planetary-scale forces. Observing and simulating these events requires datasets and models with high fidelity across a wide range of scales. CASCADE is making novel use of self-similarity* in the atmosphere to define new standards for how model performance should change as model scale varies from ‘city’ to ‘planetary’. The CASCADE team is using the Department of Energy’s new Accelerated Climate Model for Energy (ACME) to simulate past weather and is using these new standards to evaluate the model. Through a tight collaboration with ACME developers, these insights are being translated into improved model fidelity across a wide range of scales.

Designating and projecting changes in extremes requires a well-developed understanding of the processes that drive those changes. In particular, for the overall goal of the CASCADE SFA, it is necessary to understand how changes in the physical behavior of the coupled system have altered the frequencies of occurrence and the characteristics of extreme climate events. To address this issue, it is also necessary to advance our understanding of the processes governing the properties of the extremes being investigated within the CASCADE SFA. The SFA team focuses specifically on the processes that drive multivariate extremes, the processes that drive changes in the spatio-temporal characteristics of extremes, and the fidelity with which these processes are represented in climate models.

*Self-similarity means that the statistics of the atmosphere change predictably with the scale at which they are evaluated.
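
As a toy illustration of the footnote (not the CASCADE methodology itself), one can coarse-grain a signal at several scales and check that a statistic changes predictably, i.e., follows a power law in scale. The random-walk signal below is an assumption, chosen because its scaling behavior is well known:

```python
# Toy self-similarity check: variance of increments vs. averaging scale.
import numpy as np

rng = np.random.default_rng(0)

def coarse_grain(field, factor):
    """Block-average a 1-D field by an integer factor."""
    n = field.size - field.size % factor
    return field[:n].reshape(-1, factor).mean(axis=1)

# A random walk is a simple self-similar process.
signal = np.cumsum(rng.standard_normal(2**16))

scales = [1, 2, 4, 8, 16, 32]
variances = [np.var(np.diff(coarse_grain(signal, s))) for s in scales]

# For a self-similar process, log(variance) is roughly linear in
# log(scale); the fitted slope is the scaling exponent.
slope = np.polyfit(np.log(scales), np.log(variances), 1)[0]
print(f"estimated scaling exponent: {slope:.2f}")
```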

CASCADE Scientist Dáithí Stone interviewed at AGU on the Paris COP21

CASCADE Scientist Dáithí Stone was interviewed on video at the AGU about the Paris COP21 agreement.

See this link; Dáithí’s section starts at 2:49 in the video:
http://www.carbonbrief.org/agu-2015-scientists-react-to-paris-agreement-on-climate-change

The First C20C+ Event Attribution Hackathon

The first C20C+ “Hackathon” was held December 7-11, 2015 at the Computational Research Division of the Lawrence Berkeley National Laboratory. This get-together was a true working meeting, with scientists learning what is in the C20C+ database of simulations and how to access it. Twelve scientists from around the world, including Japan, Australia, Great Britain, Germany, South Africa, and the United States, spent the week in a room full of computer terminals sharing ideas and experiences about extreme event attribution. Well over half of the attendees were early career scientists. Led by CASCADE team member Dáithí Stone, the C20C+ Event Attribution Subproject is building a multi-model database of simulations targeted for extreme event analysis. Currently containing over 2 PB of model simulations from four different modeling groups, the database continues to grow as additional modeling groups and simulations are added. C20C+ data are available now, without restriction, from the website http://portal.nersc.gov/c20c/ . Questions about the C20C+ Detection and Attribution Project should be directed to Dr. Stone at dstone@lbl.gov .

C20C+ hackers toured the new Computational Research and Theory (CRT) Facility and saw the latest NERSC supercomputer, Cori (cori.nersc.gov), on which many high-resolution C20C+ simulations will be performed.

O’Reilly article on Top 10 Data Analytics problems at NERSC

See this popular-press blog article by Prabhat detailing data analytics problems at NERSC, including a contribution from the CASCADE project:
https://www.oreilly.com/ideas/big-science-problems-big-data-solutions

CASCADE team members participate in National Academy report on Event Attribution

The National Academy of Sciences has been commissioned to write a report on “Extreme Weather Events and Climate Change Attribution”. Chris Paciorek, the CASCADE statistics team leader, has been chosen to serve on the committee. A recent open meeting at the National Academy was also attended by Bill Collins, Dáithí Stone, and Michael Wehner. Michael served on a panel discussing attribution of extreme flooding events and gave a short presentation. Dáithí was a rapporteur for one of the breakout groups. The agenda and presentations for the entire meeting can be found at this link:

http://dels.nas.edu/global/basc/eea-workshop-linked-agenda.xml

The meeting was streamed live and we expect that a video recording will be available shortly.

Graphics from CASCADE’s high resolution model simulation featured in NATURE magazine

An oft-used figure from the original 25 km CAM5 simulation performed by the CASCADE team of Michael Wehner and Prabhat has been featured again in a high-profile way. An article in this week’s Nature discussing imprecise computing (or, more precisely, stochastic parameterization) used the image at the top of the page. See:

http://www.nature.com/news/modelling-build-imprecise-supercomputers-1.18437

This image is actually a still from an animation which may be viewed here:

Berkeley Lab Climate Software Honored for Pattern Recognition Advances

The Toolkit for Extreme Climate Analysis (TECA), developed at Lawrence Berkeley National Laboratory to help climate researchers detect extreme weather events in large datasets, has been recognized for its achievements in solving large-scale pattern recognition problems.

“TECA: Petascale Pattern Recognition for Climate Science,” a paper presented by scientists from Berkeley Lab and Argonne National Laboratory at the 16th International Conference on Computer Analysis of Images and Patterns (CAIP), was awarded CAIP’s Juelich Supercomputing Center prize for the best application of HPC technology in solving a pattern recognition problem.

The paper, authored by Prabhat, Surendra Byna, Eli Dart, Michael Wehner and Bill Collins of Berkeley Lab and Venkat Vishwanath of Argonne, is funded in part by Berkeley Lab’s Scientific Focus Area entitled “Calibrated and Systematic Characterization, Attribution, and Detection of Extremes (CASCADE),” in which pattern recognition is being analyzed to better identify extreme weather events associated with climate change. The researchers demonstrated how TECA, running at full scale on NERSC’s Hopper system (a Cray XE6) and Argonne’s Mira system (an IBM BG/Q), reduced the runtime for pattern detection tasks from years to hours.

TECA implements multi-variate threshold conditions to detect and track extreme weather events in large climate datasets. This visualization depicts tropical cyclone tracks overlaid on atmospheric flow patterns.
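
A heavily simplified sketch of the kind of multivariate threshold test described above. The real TECA adds further criteria (e.g., vorticity, warm-core structure, and trajectory stitching across time steps), and the variable names and thresholds here are illustrative assumptions:

```python
# Toy multivariate threshold detection on a single model snapshot:
# flag grid points with strong surface wind AND low sea-level pressure.
import numpy as np

def detect_candidates(wind_speed, slp, wind_thresh=17.0, slp_thresh=100500.0):
    """Return (row, col) indices of storm-candidate grid points."""
    mask = (wind_speed >= wind_thresh) & (slp <= slp_thresh)
    return np.argwhere(mask)

# Tiny synthetic snapshot: quiet background plus one storm-like feature.
rng = np.random.default_rng(1)
wind = rng.uniform(0, 10, size=(20, 20))                       # m/s
slp = 101325.0 + rng.normal(0, 100, size=(20, 20))             # Pa
wind[12, 7] = 35.0      # gale-force wind at the storm center
slp[12, 7] = 98000.0    # deep pressure minimum

candidates = detect_candidates(wind, slp)
print(candidates)  # the single candidate at grid point (12, 7)
```

In a full tracker, per-snapshot candidates like these are then stitched into trajectories across time steps.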

Modern climate simulations produce massive amounts of data, requiring sophisticated pattern recognition on terabyte- to petabyte-sized datasets. For this project, Prabhat and his colleagues downloaded 56 TB of climate data from the fifth phase of the Coupled Model Intercomparison Project (CMIP5) to NERSC. Their goal was to access subsets of those datasets to identify three different classes of storms: tropical cyclones, atmospheric rivers and extra-tropical cyclones.

All of the datasets were accessed through a portal created by the Earth System Grid Federation to facilitate the sharing of data—in this case, atmospheric model data at six-hour increments running out to the year 2100, stored at 21 sites around the world, including Norway, the United Kingdom, France, Japan and the United States. NERSC’s Hopper system was used to preprocess the data, which took about two weeks and resulted in a final 15 TB dataset.

ESnet’s High-Speed Network Critical

Moving the data from NERSC to the Argonne Leadership Computing Facility (ALCF) was accomplished using ESnet’s 100-gigabit-per-second network backbone. Globus, a software package developed for moving massive datasets easily, further sped things up. Several datasets were moved during the project, with much better performance than the original data staging; for example, a replication of the entire raw project dataset (56 TB) from NERSC to ALCF took only two days.

“ESnet exists to enable precisely this sort of work,” said Dart, a network engineer at ESnet who oversaw the data transfer. “It is essentially impossible, in the general case, to assemble large-scale datasets from this big, distributed archive without a high-performance, feature-rich network to connect all the different pieces. Participating in projects like this not only helps the project but helps ESnet understand what the needs of the scientists are so that we can run our networks better and help other groups that might have similar problems.”

Once the data was received at ALCF, Vishwanath helped optimize the code to scale to Mira’s massively parallel architecture and high-performance storage system, thereby enabling TECA to fully exploit the system. The job ran simultaneously on 755,200 of Mira’s 786,432 processor cores, using 1 million processor hours in just an hour and a half.

“We have now developed capabilities to scale TECA to entire supercomputers—Hopper and Edison at NERSC and Mira at ALCF—and ask sophisticated questions about how extreme weather is expected to change in future climate regimes,” said Prabhat, who leads NERSC’s Data and Analytics Services team. “Only by running at these large scales are we able to process these terabyte datasets in under an hour. Attempting similar analytics on a single machine would be simply intractable. TECA is enabling climate scientists to extract value from large-scale simulation and observational datasets.”

Related reading:

Weathering the Flood of Big Data in Climate Research


About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is the primary high-performance computing facility for scientific research sponsored by the U.S. Department of Energy’s Office of Science. Located at Lawrence Berkeley National Laboratory, the NERSC Center serves more than 6,000 scientists at national laboratories and universities researching a wide range of problems in combustion, climate modeling, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a U.S. Department of Energy national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. DOE Office of Science. Learn more about computing sciences at Berkeley Lab.

UNCERTAINTY OF EXTREMES

Statistical methods to quantify changes in extreme weather in light of uncertainty.

Extremes are by definition rare. Statistical methods are critical for characterizing extreme weather in time and space, estimating changes in extreme weather, and quantifying our uncertainty about the frequency and trends in extremes.

The Statistics team is developing, implementing, and advising on statistical methods for characterizing extremes in observations and model output. We are particularly focused on detection and designation of changes in extreme events: quantifying the evidence that the probabilities of extreme events are changing over time and that such changes are caused by environmental drivers. As part of this work we are addressing the question of uncertainty characterization. A key focus is to identify the leading sources of uncertainty in our understanding of weather extremes (e.g., initial condition uncertainty/sampling uncertainty, forcing uncertainty, model parameter uncertainty).
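
One common way to quantify such evidence is the risk ratio between factual and counterfactual exceedance probabilities, with bootstrap resampling to express sampling uncertainty. The sketch below uses synthetic ensembles and an arbitrary threshold; it illustrates the statistic, not the team's actual code:

```python
# Risk-ratio sketch: P(exceedance | actual) / P(exceedance | counterfactual),
# with a bootstrap interval. All data here are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(7)

# Synthetic ensemble values from the "world that was" and a counterfactual.
actual = rng.normal(31.0, 2.0, size=400)          # e.g., summer Tmax (deg C)
counterfactual = rng.normal(30.0, 2.0, size=400)
threshold = 34.0                                  # illustrative event threshold

def exceedance_prob(x, thr):
    return np.mean(x > thr)

rr = exceedance_prob(actual, threshold) / exceedance_prob(counterfactual, threshold)

# Bootstrap the ratio to express sampling uncertainty.
boot = []
for _ in range(2000):
    a = rng.choice(actual, size=actual.size, replace=True)
    c = rng.choice(counterfactual, size=counterfactual.size, replace=True)
    pc = exceedance_prob(c, threshold)
    if pc > 0:  # skip resamples with no counterfactual exceedances
        boot.append(exceedance_prob(a, threshold) / pc)
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"risk ratio {rr:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

A risk ratio well above 1, with an interval excluding 1, is the kind of statement event attribution studies aim to support or refute.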

Studies that aim to detect and attribute changes in extreme events have numerous sources of uncertainty, including parametric uncertainty, structural uncertainty, and even methodological uncertainty. In the current paradigm, these sources of uncertainty are dealt with in a piecemeal fashion that can result in overconfident statements of causation. This can cause, and in fact has produced, conflicting causation statements about the same event. The sensitivity of event causation conclusions to the various sources of uncertainty remains sparsely investigated but is demonstrably important. The SFA team focuses on performing a multifaceted set of modeling experiments and analyses designed specifically to characterize, and if possible quantify, the importance of structural uncertainty, parametric uncertainty, and methodological uncertainty on our understanding of various classes of events.


COMPUTATION AND PREDICTION

High performance computing to detect and predict changes in weather extremes.

The CASCADE computation and prediction team is developing scientific tools, workflow patterns, and scalable algorithms that can process massive model output on modern HPC systems.

The computation and prediction team is tightly integrating the detection system with the attribution framework so that statistics from the detection analyses automatically yield the probability distribution functions required to produce quantitative attribution and projection statements for extreme events. In a related effort, we are integrating event detection and analysis with the ILIAD (InitiaLized-ensemble Identify, Analyze, Develop) framework to ensure that probabilities of event detection do not depend on model configuration, thereby mitigating the resolution dependence of hurricane detection.

The CASCADE research portfolio requires extensive computational and statistical infrastructure. Much of the SFA research requires implementation of novel statistical methods. Likewise, the formal application of UQ methods for extremes requires the implementation of a surrogate model and new developments in emulator methodology. Further, all of the SFA’s analyses require sophisticated, robust, and parallelizable data analysis tools to operate on the enormous datasets that we use (O(100-1000) TB). Therefore, the SFA pursues three main research and development thrusts in support of the broader goals of the project: methodological development for systematic event causation at fine spatial scales, development of a statistical framework for the holistic uncertainty characterization work, and development of a multilevel model emulator for extremes.
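
The scalable-analysis requirement above usually reduces to a map-over-chunks pattern: compute per-chunk summaries in parallel, then reduce them to a global answer. The thread-based sketch below is a minimal illustration under assumed synthetic data; a production tool would read netCDF chunks from disk and typically run under MPI:

```python
# Minimal parallel map/reduce over data chunks: per-chunk summaries
# computed concurrently, then reduced to a global statistic.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def chunk_summary(chunk):
    """Per-chunk statistics needed downstream (here: max and count)."""
    return chunk.max(), chunk.size

def global_max(chunks, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(chunk_summary, chunks))
    return max(m for m, _ in results)

# Synthetic stand-in for chunks of a large model-output variable.
rng = np.random.default_rng(3)
chunks = [rng.normal(size=100_000) for _ in range(8)]
result = global_max(chunks)
print(f"global maximum: {result:.3f}")
```

The same structure carries over directly to counting threshold exceedances, accumulating histograms, or collecting block maxima for extreme value fits.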
