
Why NOAA’s Big Data Partnership with Amazon is a Game Changer

Introduction

The National Oceanic and Atmospheric Administration (NOAA) is one of the premier scientific agencies in the United States. As part of the Department of Commerce, it has a mission to “understand and predict changes in climate, weather, oceans, and coasts.” To generate useful insights from the vast amounts of data it collects and processes daily from satellites, radars, weather models, and more, NOAA announced in 2017 that it was partnering with Amazon Web Services (AWS) to develop a private cloud-based big data system.

This collaboration marked a major milestone for NOAA and signaled a transformation in how the agency derives value from its data assets. In this article, we will explore why the big data capabilities enabled by NOAA’s AWS partnership are a game changer.


The Scale and Complexity of NOAA’s Data

To appreciate why transitioning to a cloud-based big data architecture is monumental for NOAA, it is important to first understand the massive scale and intricacy of the data the agency continuously collects and generates.

NOAA gathers over 20 terabytes of observational data daily from sources like satellites, sea buoys, weather radars, and scientific instruments all around the world. Additionally, NOAA’s supercomputers produce sophisticated weather and climate models that can output petabytes of data.
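To put the 20-terabyte figure in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes decimal units (1 TB = 10^12 bytes) and a perfectly even arrival rate over the day, neither of which is stated by NOAA, and the function names are our own.

```python
# Back-of-the-envelope figures for the daily observational volume.
# Assumptions (ours, not NOAA's): decimal units and an evenly spread arrival rate.
BYTES_PER_TB = 10**12
SECONDS_PER_DAY = 86_400
DAILY_TB = 20  # "over 20 terabytes" per day, as quoted in the text


def sustained_rate_mb_per_s(daily_tb: float) -> float:
    """Average ingest rate in MB/s needed to keep up with a daily volume."""
    return daily_tb * BYTES_PER_TB / SECONDS_PER_DAY / 10**6


def yearly_volume_pb(daily_tb: float) -> float:
    """Volume accumulated over a 365-day year, in petabytes."""
    return daily_tb * 365 / 1000


if __name__ == "__main__":
    print(f"{sustained_rate_mb_per_s(DAILY_TB):.1f} MB/s sustained")
    print(f"{yearly_volume_pb(DAILY_TB):.1f} PB per year")
```

Under these assumptions, the observational stream alone averages roughly 230 MB every second and adds more than 7 PB per year, before any model output is counted.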

All of this raw data is incredibly diverse, consisting of thousands of different variables covering the land, atmosphere, oceans, space, and sun. It arrives in a multitude of formats from countless instruments and modeling systems. Analyzing this heterogeneous and unstructured data requires substantial computing power.

On top of the volume and variety challenges, the velocity of NOAA’s data adds yet another layer of complexity. Massive amounts of real-time data stream in continuously from weather radars, satellites, and ocean buoys and need to be processed immediately. Historical data also needs to be re-examined as computing capabilities improve.

Limitations of NOAA’s Legacy IT Systems

In the past, NOAA’s systems were not designed to handle this tremendous scale and heterogeneity of observational and model data arriving at high speed. Its legacy IT infrastructure consisted of three data centers containing over 200 application-specific databases. Each database had its own custom schema tailored to particular data types.

This fragmented system made accessing, integrating, and analyzing data across applications quite difficult. Scientists wasted time learning how to extract and reformat data from individual databases instead of gaining insights. It also led to data silos, preventing users from leveraging the full spectrum of NOAA’s data assets.

In addition, the traditional data centers relied on aging hardware and inflexible on-premises storage. The infrastructure could not easily scale storage capacity or compute resources to match explosive data growth. Latency issues led to slow processing times, hampering real-time analytics.

Advantages of a Cloud-Based Big Data Architecture

To overcome these hurdles, NOAA embarked on an ambitious initiative in 2017 to transform its data infrastructure in partnership with AWS. The cornerstone of this modernization focused on migrating to a cloud-based big data architecture. This transition to the cloud unlocked numerous advantages over NOAA’s aging legacy systems.

First and foremost, the effectively unlimited storage capacity and flexible compute of the cloud allow the system to scale up or down efficiently to meet variable demand. NOAA’s big data platform can now easily expand to accommodate spikes in data volume, variety, and velocity. Advanced cloud services provide the “elasticity” needed for rapidly growing data workloads.
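To make the idea of elasticity concrete, here is a minimal sketch of a backlog-driven scaling rule. This is our own illustration, not NOAA’s or AWS’s actual autoscaling logic: the function name, the per-worker capacity, and the worker limits are all hypothetical.

```python
import math


def desired_workers(backlog_files: int, files_per_worker: int,
                    min_workers: int = 1, max_workers: int = 100) -> int:
    """Return how many workers to run for the current backlog.

    Hypothetical rule: enough workers to clear the backlog in one pass,
    clamped to a floor (always-on capacity) and a ceiling (cost guard).
    """
    if files_per_worker <= 0:
        raise ValueError("files_per_worker must be positive")
    needed = math.ceil(backlog_files / files_per_worker)
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    for backlog in (0, 1_200, 1_000_000):
        print(backlog, "files ->", desired_workers(backlog, files_per_worker=50), "workers")
```

The point of the sketch is the shape of the rule: capacity follows demand in both directions, which fixed data center hardware cannot do.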

The new architecture also enables simplified but secure data access through governed APIs. This makes sharing and querying data far easier for users across NOAA. Cloud-based analytics tools facilitate advanced processing like machine learning to uncover hidden insights.

Furthermore, the cloud’s global infrastructure offers resilient storage with built-in redundancy and failover. This protects NOAA’s irreplaceable observational records while providing speedy access for users worldwide. Overall, the cloud delivers the availability, agility, and durability required for smooth operations.

Building a Cloud-Native Big Data Platform

The process of constructing NOAA’s cloud-native big data architecture on AWS was meticulously planned and executed over multiple years. It involved migrating a massive 300 petabytes of legacy storage into AWS along with associated applications, models, and processing tools. Let’s examine some key steps NOAA undertook to ensure a successful transition.

Consolidating Databases into a Data Lake

Rather than copying hundreds of segregated databases into the cloud, NOAA streamlined storage into a centralized AWS S3 data lake. This eliminated data silos and consolidated storage into a single repository using standardized formats. Building a well-governed data lake was essential for scalability and accessibility.
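One common way to standardize a data lake is a predictable, partitioned object key layout. The sketch below shows what such a convention could look like. The `source=/variable=/year=/month=/day=` scheme and the function name are assumptions for illustration only; the article does not describe NOAA’s actual layout.

```python
from datetime import datetime, timezone


def lake_key(source: str, variable: str, observed_at: datetime, filename: str) -> str:
    """Build a partitioned object key for one file in the data lake.

    Hypothetical convention: key=value path segments, with dates normalized to UTC.
    """
    if observed_at.tzinfo is None:
        raise ValueError("observed_at must be timezone-aware")
    ts = observed_at.astimezone(timezone.utc)
    return (
        f"source={source}/variable={variable}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/{filename}"
    )


if __name__ == "__main__":
    when = datetime(2017, 8, 25, 23, 30, tzinfo=timezone.utc)
    print(lake_key("radar", "reflectivity", when, "scan.nc"))
```

With a single convention like this, every dataset can be located the same way, regardless of which of the original databases it came from.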

Leveraging Managed Cloud Services

NOAA took advantage of managed services like AWS Glue, Athena, and Redshift to enable serverless data integration, querying, and analytics directly in the cloud data lake. This simplified the process of working with vast amounts of heterogeneous data.
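As an illustration of what serverless querying looks like in practice, the sketch below assembles the parameters for an Athena query that filters on partition columns. The database, table, column names, and results bucket are hypothetical. The dictionary keys match the keyword arguments of the `start_query_execution` call on boto3’s Athena client, but the call itself is left as a comment so the example runs without AWS credentials.

```python
def build_athena_request(database: str, table: str, year: int, month: int,
                         output_location: str) -> dict:
    """Assemble keyword arguments for an Athena query over one month of data.

    Assumes hypothetical string-typed partition columns `year` and `month`
    and hypothetical data columns `variable` and `value`.
    """
    query = (
        f'SELECT variable, avg(value) AS mean_value FROM "{table}" '
        f"WHERE year = '{year:04d}' AND month = '{month:02d}' "
        "GROUP BY variable"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


if __name__ == "__main__":
    request = build_athena_request(
        "lake_db", "observations", 2017, 8, "s3://example-results/athena/"
    )
    print(request["QueryString"])
    # With credentials configured, the request could be submitted like this:
    # import boto3
    # boto3.client("athena").start_query_execution(**request)
```

Filtering on partition columns matters because Athena bills by the amount of data scanned, so a query limited to one month reads only that month’s files.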

Creating New Data Pipelines

New AWS-based ingestion pipelines were developed to collect and process streaming data from satellites, radars, and other observation systems. Open formats like NetCDF were utilized to make data management easier. Automation ensured low-latency delivery into the cloud data lake.
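A core building block of a streaming ingestion pipeline is grouping incoming records into small batches before writing them out. The following sketch shows that step in isolation. It is a generic illustration with a hypothetical function name and record shape, not a description of NOAA’s actual pipelines.

```python
from typing import Any, Iterable, Iterator, List


def micro_batches(records: Iterable[Any], max_batch: int) -> Iterator[List[Any]]:
    """Group a stream of records into lists of at most `max_batch` items.

    Each yielded batch could then be written to the data lake as one object.
    """
    if max_batch <= 0:
        raise ValueError("max_batch must be positive")
    batch: List[Any] = []
    for record in records:
        batch.append(record)
        if len(batch) >= max_batch:
            yield batch
            batch = []
    if batch:
        # Flush whatever remains when the stream ends.
        yield batch


if __name__ == "__main__":
    # Hypothetical buoy readings; a real feed would be unbounded.
    stream = ({"buoy": "demo-1", "reading": i} for i in range(5))
    for batch in micro_batches(stream, max_batch=2):
        print(len(batch), "records ->", batch)
```

Batch size is the main tuning knob here: smaller batches reduce latency into the lake, while larger ones produce fewer, bigger objects that are cheaper to query.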


Building Cloud-Native Applications

Hundreds of applications were rebuilt as cloud-native to leverage on-demand processing and maximize performance. Containers and orchestration systems helped deploy and run these applications efficiently across the dynamic cloud infrastructure.

Enhancing Security Posture

Security was paramount when designing the new architecture. Controls like role-based access, encryption, VPCs, and network segmentation helped enforce strict governance and access policies for sensitive data assets.
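To show how such controls are typically expressed, here is a sketch of an IAM policy document that grants read-only access to a single prefix of a bucket and rejects unencrypted connections. The bucket and prefix names are hypothetical, and this is a generic pattern rather than NOAA’s actual policy.

```python
import json


def read_only_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy allowing read-only access to one prefix of a bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Allow listing, but only within the given prefix.
                "Sid": "ListPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {   # Allow downloading objects under the prefix.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/{prefix}/*",
            },
            {   # Deny any request that is not sent over TLS.
                "Sid": "DenyPlainHttp",
                "Effect": "Deny",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_policy("example-data-lake", "source=radar"), indent=2))
```

Attaching a policy like this to a role, rather than to individual users, is what makes the access role-based: people gain or lose access by being assigned the role.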

Impact and Results Thus Far

While migration to the cloud is still ongoing, NOAA’s big data capabilities enabled by AWS have already led to huge performance gains along with improved reliability and security. Most notably, data access latency plummeted from hours to just seconds in some cases. This allows scientists to derive insights at unprecedented speeds.

Additionally, the new architecture has created major cost savings by eliminating expensive legacy hardware. Reduced maintenance needs also lower operational costs over the long term. Most importantly, the AWS partnership enables NOAA’s data to deliver much greater value to scientists, policymakers, businesses, and the public through advanced analytics.

As evidence of early successes, NOAA has already built innovative cloud-based services like the NOAA Commercial Weather Data Pilot on AWS. This provides businesses with easy access to its weather data assets to support improved decision making. More creative initiatives and use cases will continue to be unlocked as NOAA’s cloud migration progresses.

Looking Ahead to a Cloud-Powered Future

NOAA’s big data move to the AWS cloud represents a truly transformative leap into the future that lays the foundation for tackling ever-growing data challenges. With a scalable, elastic, and serverless architecture, NOAA can keep pace with exponential data volume increases while making information easier to access and analyze.

This will empower scientists to combine complex weather, climate, and environmental datasets in new ways, leading to groundbreaking discoveries that were impossible before. By unleashing its data assets’ full potential through the cloud, NOAA can continue fulfilling its public mission to understand and predict our dynamic Earth system like never before.

The efficiencies unlocked by migrating mission-critical systems to the cloud highlight why large government agencies are following NOAA’s lead. As more scientific and research organizations make similar cloud transitions in the coming years, they will be able to gain extraordinary insights from their data and deliver greater value to science and society as a whole.


Next Steps

Round Table Environmental Informatics (RTEI) is a consulting firm that helps our clients to leverage digital technologies for environmental analytics. We offer free consultations to discuss how we at RTEI can help you.
