Abstract
The Integrated Global Radiosonde Archive Toolkit (IGRAT) is a software package for processing data from the Integrated Global Radiosonde Archive. The archive provides global radiosonde observations in a text‐based format that requires additional manipulation before it is suitable for analysis. IGRAT provides an easy‐to‐use set of tools to streamline this preprocessing step, allowing users to readily visualise temporal and spatial patterns, plot atmospheric profiles, and export processed data sets in more standard formats. IGRAT is accessible through a Python library and a web interface, so users can adapt it to their preferred workflow. By reducing the preprocessing time needed before analysis, IGRAT is suited to applications in climate research, meteorology and atmospheric sciences. IGRAT is fully open‐source, allowing the community to contribute to it and to modify it for personal use.
Introduction
The Integrated Global Radiosonde Archive (IGRA) is the most comprehensive set of global radiosonde observations (Durre et al. 2006, 2016), containing over 50 million soundings for over 2800 stations from 1905 to the present, with data updated on a daily basis. It is an invaluable resource for studies concerning the atmosphere, including analyses of troposphere-stratosphere exchange, tropopause structure and trends, and water vapour content, as well as climatological studies (Seidel and Randel 2006; Seidel et al. 2001, 2010; Randel et al. 2007, 2009; Xian and Homeyer 2019; Philipona et al. 2018; Schröder et al. 2016; Van Malderen et al. 2014; Thompson and Solomon 2005). Additionally, IGRA data are assimilated into numerical weather prediction models and global reanalyses, and are employed to calibrate satellite observations (Kobayashi et al. 2015; Reale et al. 2012; Free et al. 2005). IGRA data are distributed as simple text-based files in a standardised format, which must be parsed and preprocessed to obtain the desired observations. This preprocessing step can be cumbersome and is prone to errors owing to ambiguities in the data documentation and the limited usability of the data format. Consequently, the broader non-specialist audience has limited access to the IGRA data.
Here, we introduce the Integrated Global Radiosonde Archive Toolkit (IGRAT), a Python program and library which streamlines the preprocessing step of IGRA data. IGRAT automates the ingestion and parsing of the raw text files, converting the data into structured and easily manageable Network Common Data Form (NetCDF) files. It also provides built-in functions for data manipulation, visualisation and exporting preprocessed datasets. Additionally, since the toolkit directly queries the IGRA archive rather than storing a separate copy, any updates to IGRA (such as the addition of new sites) are automatically available through the toolkit without requiring modifications to the code.
It is important to note that the toolkit does not modify or maintain its own copy of the IGRA dataset; rather, it functions solely as a software layer that facilitates streamlined access, filtering and visualisation of IGRA data. Consequently, any limitations arising from the underlying dataset (e.g., missing values or gaps in station records) reflect the characteristics of IGRA itself rather than the toolkit, and users should remain mindful of these inherent data properties when conducting analyses.
IGRAT is open-source and easily modifiable for personal use cases. Unlike PyIGRA (Stauffer 2017), the only previous effort to simplify IGRA data preprocessing, IGRAT offers a comprehensive suite of analysis functions in addition to download and extraction functions. To the best of our knowledge, IGRAT is the first open-source, comprehensive package for IGRA data analysis.
In the next sections, we demonstrate how to install and use IGRAT, describe its functionality, and suggest workflows. We then show the practicality of IGRAT with function calls that perform tasks for classical tropospheric analysis.
Getting Started
The IGRA Toolkit requires Python 3.7 or higher and can be installed directly from GitHub with the following command:
IGRAT is compatible with all major operating systems (Linux, macOS, Windows), and we have included detailed instructions on how to install and use it. IGRAT provides access to the IGRA, which contains quality-controlled radiosonde observations from over 2800 stations worldwide.
IGRAT depends on several scientific Python packages, including NumPy (Harris et al. 2020), Pandas (McKinney 2010), Xarray (Hoyer and Hamman 2017), NetCDF (Rew and Davis 1990), Plotly (Plotly Technologies Inc. 2015) and Matplotlib (Hunter 2007). These dependencies are installed automatically with the toolkit.
IGRAT provides two main data formats for working with radiosonde observations: (i) Pandas DataFrames, which offer flexible data manipulation and analysis capabilities; and (ii) NetCDF, which is optimised for large-scale atmospheric data and maintains the hierarchical structure of sounding profiles. IGRAT allows users to choose their preferred format based on their analysis needs.
A select set of illustrative example notebooks is available in the toolkit's documentation, demonstrating various use cases including data analysis, visualisation and quality control. IGRAT's GitHub repository () provides additional examples and documentation.
Features
IGRAT functions belong to one of three categories: data access, data processing, or data visualisation (Table 2). Data access functions manage interactions with the IGRA. Specifically, these functions implement efficient data downloading, caching and parsing mechanisms to handle the complex structure of raw IGRA data files. Additionally, there are functions for station metadata management, data file retrieval and format conversion between IGRA's native format and more accessible data structures. Data processing functions provide tools for data manipulation and analysis, including interpolation (on station data and individual profiles), filtering, and statistical analysis. With respect to quality control, we rely on the quality flags provided by the original IGRA dataset for data filtering. Specifically, missing and removed values are indicated by the codes −9999 and −8888, respectively, and our functions automatically exclude these values during data access and processing. Built on modern visualisation libraries (e.g., Matplotlib and Plotly), the visualisation functions allow users to plot individual profiles and station locations for both spatial and temporal analysis. The design of IGRAT allows interactions between all three categories, which we demonstrate in later sections.
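The handling of the −9999 and −8888 codes described above can be illustrated with a minimal, library-free sketch. It is not IGRAT's implementation: the function names, the field names and the sample values are hypothetical, and the mapping of −9999 to "missing" and −8888 to "removed" is our reading of the codes.

```python
import math

# Sentinel codes used in IGRA files (assumed mapping: see the text above).
MISSING_CODE = -9999   # value missing
REMOVED_CODE = -8888   # value removed during quality assurance
SENTINELS = {MISSING_CODE, REMOVED_CODE}


def mask_sentinels(values):
    """Return a list copy of `values` with sentinel codes replaced by NaN."""
    return [math.nan if v in SENTINELS else float(v) for v in values]


def drop_incomplete_levels(levels, required):
    """Keep only levels whose `required` fields hold real observations."""
    return [
        level for level in levels
        if all(level[name] not in SENTINELS for name in required)
    ]


# Hypothetical sounding levels (illustrative numbers, not real IGRA data).
levels = [
    {"pressure": 1000, "temperature": 25.0, "wind_speed": 3.0},
    {"pressure": 850, "temperature": -9999, "wind_speed": 7.5},
    {"pressure": 700, "temperature": 8.5, "wind_speed": -8888},
    {"pressure": 500, "temperature": -6.0, "wind_speed": 12.0},
]

temperatures = mask_sentinels(level["temperature"] for level in levels)
complete = drop_incomplete_levels(levels, required=("temperature", "wind_speed"))
```

Masking with NaN keeps the level in place, which preserves the vertical structure of the profile, whereas dropping incomplete levels is more convenient when every retained row must be usable in a calculation.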
All data processing and data visualisation functions take as input either a Pandas DataFrame or a NetCDF Dataset in the IGRAT format (Figure 1). The Pandas DataFrame format offers a tabular representation of radiosonde observations, where each row represents a single observation level within a sounding profile. The structure includes profile identification columns, atmospheric variables and quality control flags. This format excels at data manipulation, filtering, and statistical analysis, making it ideal for exploratory data analysis and time series studies. The DataFrame structure maintains the hierarchical nature of sounding data while providing a familiar interface for users accustomed to tabular data analysis.
[IMAGE OMITTED. SEE PDF]
In contrast, the NetCDF format preserves the multidimensional structure of radiosonde observations, organising data into a hierarchical dataset with dimensions for profiles and levels. The format includes the same atmospheric variables as the DataFrame but stores them as multidimensional arrays, with dimensions ‘num_profiles’ and ‘levels’. This structure is particularly well suited for large-scale atmospheric analysis, as it maintains the vertical structure of soundings and enables efficient subsetting of data. The NetCDF format also includes comprehensive metadata, such as variable units, long names and global attributes describing the dataset's origin and processing history. This format is optimised for memory efficiency when working with large datasets and integrates seamlessly with other atmospheric science tools that use the NetCDF standard.
Both formats support the same set of atmospheric variables in the same units (Table 1), ensuring consistency in analysis regardless of the chosen format. The toolkit provides functions to convert between formats, allowing users to leverage the strengths of each representation as needed. The DataFrame format is particularly useful for interactive analysis and data exploration, while the NetCDF format excels at handling large datasets and maintaining the multidimensional structure of atmospheric observations.
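To make the relationship between the two layouts concrete, the sketch below rearranges tabular rows (one row per observation level) into a grid with the dimensions (num_profiles, levels) and back again. It uses plain Python lists rather than Pandas or Xarray objects, and the column names, profile labels and values are hypothetical, so it should be read as an illustration of the data layout, not of IGRAT's conversion functions.

```python
import math

# Hypothetical tabular rows: one row per observation level, mirroring the
# DataFrame layout described above. Column names are illustrative only.
rows = [
    {"profile": "2020-01-01T00", "pressure": 1000.0, "temperature": 24.0},
    {"profile": "2020-01-01T00", "pressure": 850.0, "temperature": 15.5},
    {"profile": "2020-01-01T00", "pressure": 500.0, "temperature": -7.0},
    {"profile": "2020-01-01T12", "pressure": 1000.0, "temperature": 26.0},
    {"profile": "2020-01-01T12", "pressure": 700.0, "temperature": 9.0},
]


def to_profile_level_grid(rows, variable):
    """Arrange one variable as a (num_profiles, levels) grid padded with NaN."""
    by_profile = {}
    for row in rows:
        by_profile.setdefault(row["profile"], []).append(row[variable])
    num_levels = max(len(values) for values in by_profile.values())
    profiles = list(by_profile)  # order of first appearance
    grid = [
        by_profile[p] + [math.nan] * (num_levels - len(by_profile[p]))
        for p in profiles
    ]
    return profiles, grid


def to_rows(profiles, grid, variable):
    """Flatten the grid back into tabular rows, skipping the NaN padding."""
    return [
        {"profile": p, variable: value}
        for p, values in zip(profiles, grid)
        for value in values
        if not math.isnan(value)
    ]


profiles, temperature = to_profile_level_grid(rows, "temperature")
round_trip = to_rows(profiles, temperature, "temperature")
```

Because soundings differ in their number of levels, the gridded layout needs padding for shorter profiles, which is one reason the tabular form is often more convenient for filtering while the gridded form suits array-based computation.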
TABLE 1 Data variables and units.
| Variable | Unit |
| --- | --- |
| Pressure | hPa |
| Geopotential height | m |
| Temperature | °C |
| Relative humidity | % |
| Dewpoint depression | °C |
| Wind direction | Degrees from north (90° = east) |
| Wind speed | m/s |
TABLE 2 Core IGRAT functions.
| Name | Category | Description |
| --- | --- | --- |
| download_station_file | Data access | Download and unzip a station's data file from the IGRA archive |
| read_station_data | Data access | Download, unzip, read, and parse a station's data file, converting it to a Pandas DataFrame or NetCDF file in IGRAT format (Figure 1) |
| open_data | Data access | Open a file in IGRAT format |
| filter_by_date_range | Data processing | Filter an IGRAT file by date |
| filter_variables | Data processing | Filter an IGRAT file by variable |
| filter_stations | Data processing | Filter station data by year, latitude, and longitude range |
| interp_data | Data processing | Create a uniform grid of the index variable and linearly interpolate the specified variable onto that grid |
| interp_data_to_pressure_levels | Data processing | Interpolate station data onto standard pressure levels |
| read_station_locations | Data processing | Download the station list from NOAA's IGRA archive and parse it into a DataFrame containing station metadata, including location, elevation, and data availability |
| get_availability | Data processing | Return all available dates and times in an IGRAT file |
| get_profile | Data processing | Extract a profile for a specific date and time from an IGRAT file |
| compute_potential_temp | Data processing | Compute the potential temperature for all soundings in an IGRAT file (WMO 1966) |
| plot_station_map | Data visualisation | Display an interactive map of IGRA stations using Plotly. The map can optionally be coloured by elevation, date or number of observations |
| plot_profile | Data visualisation | Plot a profile between any two variables for a specific date and time |
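As an illustration of the kind of calculation behind compute_potential_temp in Table 2, the sketch below applies the standard dry-air (Poisson) definition of potential temperature to the variables and units of Table 1. The function name, the constants and the sample values are our assumptions; the exact constants used inside IGRAT are not stated here.

```python
# Assumed constants for dry air; IGRAT's own choices may differ.
KAPPA = 287.05 / 1004.0    # R_d / c_p, approximately 0.286
P0_HPA = 1000.0            # reference pressure in hPa
ZERO_CELSIUS_K = 273.15    # offset between degrees Celsius and kelvin


def potential_temperature(temperature_c, pressure_hpa):
    """Potential temperature (K) from temperature (deg C) and pressure (hPa)."""
    temperature_k = temperature_c + ZERO_CELSIUS_K
    return temperature_k * (P0_HPA / pressure_hpa) ** KAPPA


# Illustrative values only: at the reference pressure, the potential
# temperature equals the absolute temperature.
theta_surface = potential_temperature(15.0, 1000.0)
theta_500 = potential_temperature(-20.0, 500.0)
```

Note that the temperature must be converted from °C to kelvin before the pressure ratio is applied; omitting this step is a common source of error when working directly with the units in Table 1.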
Usage
To identify stations of interest, IGRAT provides functions to search and filter stations based on location, data availability and other criteria.
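The idea of filtering stations by location and data availability can be sketched as follows. This is not the signature of IGRAT's filter_stations function: the Station class, the function name, its parameters and the station records are all hypothetical and serve only to show the selection logic.

```python
from dataclasses import dataclass


@dataclass
class Station:
    station_id: str
    latitude: float
    longitude: float
    first_year: int
    last_year: int


def filter_stations_sketch(stations, lat_range, lon_range, year):
    """Return stations inside the lat/lon box whose record covers `year`."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    return [
        s for s in stations
        if lat_min <= s.latitude <= lat_max
        and lon_min <= s.longitude <= lon_max
        and s.first_year <= year <= s.last_year
    ]


# Hypothetical station records; IDs and coordinates are made up.
stations = [
    Station("XXM00000001", 10.0, 100.0, 1950, 2025),
    Station("XXM00000002", 12.5, 105.0, 1990, 2005),
    Station("XXM00000003", 55.0, 10.0, 1960, 2025),
]

tropical_2020 = filter_stations_sketch(
    stations, lat_range=(-20.0, 20.0), lon_range=(90.0, 120.0), year=2020
)
```

This simple box test does not handle longitude ranges that cross the 180° meridian; a selection spanning the date line would need two boxes or a wrapped comparison.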
The plot_station_map function allows users to visualise selected stations (Figure 2), all stations in the IGRA (Figure 3) or only recently updated stations, that is, stations that have received updates in 2025 (Figure 4). The function plots an interactive map that can be coloured by elevation, starting year, ending year or number of observations. It accesses the original data hosted by the NCEI every time it is called.
[IMAGE OMITTED. SEE PDF]
[IMAGE OMITTED. SEE PDF]
[IMAGE OMITTED. SEE PDF]
For illustration, suppose a user has identified a station of interest (e.g., USM00072435). Users can access raw IGRA station files via the download_station_file function. Alternatively, they can access station data as NetCDF Datasets or Pandas DataFrames in the IGRAT file structure via the read_station_data function. The read_station_data function directly downloads the raw IGRA files hosted by the NCEI and converts them into a Pandas DataFrame or NetCDF Dataset in the IGRAT format. IGRAT also provides functions to convert between NetCDF Datasets and Pandas DataFrames.
IGRAT provides functions to interpolate variables to a uniform grid or standard pressure levels, and then plot individual profiles (Figure 5). The interp_data function supports a wide range of interpolation techniques (e.g., linear, cubic, nearest neighbour).
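A minimal sketch of interpolation onto standard pressure levels is given below. It interpolates linearly in the logarithm of pressure, which is a common choice but an assumption on our part; the method used by interp_data_to_pressure_levels is not specified here. The function name, the subset of levels and the sample sounding are hypothetical.

```python
import math

# A few commonly used pressure levels (hPa); an illustrative subset.
STANDARD_LEVELS_HPA = (1000.0, 850.0, 700.0, 500.0, 300.0, 200.0, 100.0)


def interp_to_pressure_levels(pressure_hpa, values, targets=STANDARD_LEVELS_HPA):
    """Linearly interpolate `values` in log-pressure onto `targets`.

    `pressure_hpa` must decrease strictly (surface first). Targets outside
    the observed pressure range are returned as NaN, not extrapolated.
    """
    log_p = [math.log(p) for p in pressure_hpa]
    result = []
    for target in targets:
        value = math.nan
        log_t = math.log(target)
        for i in range(len(log_p) - 1):
            lower, upper = log_p[i], log_p[i + 1]  # lower = higher pressure
            if upper <= log_t <= lower:
                weight = (lower - log_t) / (lower - upper)
                value = values[i] + weight * (values[i + 1] - values[i])
                break
        result.append(value)
    return result


# Hypothetical sounding (illustrative values only).
pressure = [980.0, 900.0, 800.0, 600.0, 400.0, 250.0]
temperature = [22.0, 17.0, 11.0, -3.0, -22.0, -45.0]

on_levels = interp_to_pressure_levels(pressure, temperature)
```

In this example the 1000, 200 and 100 hPa levels fall outside the observed range and are therefore returned as NaN, which avoids silently extrapolating beyond the sounding.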
[IMAGE OMITTED. SEE PDF]
Case Study: Tropopause Analysis
IGRA data are frequently used in climatological analyses of the tropopause. This is usually done by interpolating temperature data for a select set of stations, calculating the height of the tropopause from the interpolated data, and then taking the average height across the stations (Seidel et al. 2001; Xian and Homeyer 2019). IGRAT allows users to do this in as little as 25 lines of code, demonstrated below for the station ID GQM00091212. The output of the code is displayed in Figure 6. We note that the tropopause figure presented here is not intended to provide an authoritative or fully accurate representation; rather, it serves as a proof of concept to demonstrate how the toolkit can be applied to IGRA soundings in the context of a real research problem. Users who require more sophisticated methods will need to modify the find_tropopause function in the code below.
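For readers who want a starting point for such a calculation, the following sketch applies the widely used WMO lapse-rate criterion: the tropopause is the lowest level at which the lapse rate falls to 2 K/km or less, provided the mean lapse rate between that level and all higher levels within 2 km also does not exceed 2 K/km. It is a stand-in written for this illustration, not the find_tropopause function referred to above; the function name, the 5 km lower search limit and the synthetic profile are our assumptions.

```python
def find_tropopause_sketch(height_m, temperature_c, min_height_m=5000.0,
                           threshold_k_per_km=2.0, layer_depth_m=2000.0):
    """Return the height (m) of the lapse-rate tropopause, or None.

    Levels must be ordered by strictly increasing height. The lapse rate is
    defined as -dT/dz, so it is positive when temperature falls with height.
    """
    n = len(height_m)
    for i in range(n - 1):
        if height_m[i] < min_height_m:
            continue  # skip low-level inversions (assumed search limit)
        dz_km = (height_m[i + 1] - height_m[i]) / 1000.0
        lapse = -(temperature_c[i + 1] - temperature_c[i]) / dz_km
        if lapse > threshold_k_per_km:
            continue
        # Mean lapse rate from level i to each higher level within 2 km.
        satisfied = True
        for j in range(i + 1, n):
            depth_m = height_m[j] - height_m[i]
            if depth_m > layer_depth_m:
                break
            mean_lapse = -(temperature_c[j] - temperature_c[i]) / (depth_m / 1000.0)
            if mean_lapse > threshold_k_per_km:
                satisfied = False
                break
        if satisfied:
            return height_m[i]
    return None


# Synthetic profile on a uniform 1 km grid: cooling at 6.5 K/km up to 16 km,
# then warming at 2 K/km above (illustrative, not observed data).
height = [1000.0 * k for k in range(21)]
temperature = [27.0 - 6.5 * k if k <= 16 else -77.0 + 2.0 * (k - 16)
               for k in range(21)]

tropopause_height = find_tropopause_sketch(height, temperature)
```

Applied to soundings that have first been interpolated to a uniform height grid, a function of this kind can be evaluated per station and the resulting heights averaged, as in the workflow described above.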
[IMAGE OMITTED. SEE PDF]
The IGRAT web application provides a comprehensive and user-friendly interface for interacting with IGRA radiosonde data entirely through a web browser, eliminating the need for local file parsing or specialised software installation. It can export station data to NetCDF files in IGRAT convention and also supports batch exporting. It includes an availability viewer and station information pane as well as a global map of all possible stations in the IGRA (Figure 7).
[IMAGE OMITTED. SEE PDF]
The IGRAT web application enhances access to IGRA for specialists and non-specialists alike by providing an intuitive, interactive platform for exploring radiosonde data without requiring programming expertise. It enables not only researchers, but also educators, students, and scientists in other fields to interact with an important atmospheric science data set. By lowering the barrier to entry, the web application facilitates hands-on learning and fosters curiosity about atmospheric processes, enabling users to visualise measured atmospheric temperature, pressure and humidity profiles, and investigate phenomena such as the tropopause or inversion layers. As a result, IGRAT not only serves the research community but also functions as an effective educational tool, supporting outreach efforts and promoting early engagement in climate and atmospheric science.
Conclusion
The lack of a readily available interface for IGRA data means that individual researchers must develop their own tools. This leads to methodological inconsistencies, unnecessary duplication of effort, reduced reproducibility of analyses and reduced opportunities for collaboration in atmospheric research involving IGRA data. IGRAT closes these gaps by providing the community with a standardised and well-maintained library, enabling researchers to more efficiently leverage the IGRA for diverse atmospheric investigations.
The present version of IGRAT was developed as a lightweight toolkit to introduce a core set of essential functions for accessing, filtering and visualising IGRA data. While the current release provides a foundational framework for radiosonde data analysis, several enhancements are planned to further expand its scientific utility.
In particular, incorporating additional thermodynamic and dynamical parameters such as convective inhibition (CIN) and wind shear would substantially broaden the scope of analyses possible with the toolkit. Similarly, the development of functions for automated identification of convective and stable boundary layers represents a promising avenue for enhancing the toolkit's capabilities.
The open-source nature of IGRAT lets researchers use it as a base for developing more specialised tools, thereby saving a significant amount of time needed to develop the tedious but necessary base tools for parsing raw IGRA files. We recognise that GitHub may not be accessible in all countries, and to address this concern we are exploring options for providing mirrored access through additional platforms to ensure broader accessibility. We will update the project documentation accordingly as additional hosting options become available. Finally, the IGRAT web application provides even greater accessibility of radiosonde data by providing much of the same functionality as the IGRAT Python library in an online, easy-to-use web interface.
Conflicts of Interest
The authors declare no conflicts of interest.
Data Availability Statement
The authors have nothing to report.
References
Durre, I., R. S. Vose, and D. B. Wuertz. 2006. “Overview of the Integrated Global Radiosonde Archive.” Journal of Climate 19, no. 1: 53–68.
Durre, I., X. Yin, R. S. Vose, et al. 2016. “Overview of the Integrated Global Radiosonde Archive.”
Free, M., D. J. Seidel, J. K. Angell, J. Lanzante, I. Durre, and T. C. Peterson. 2005. “Radiosonde Atmospheric Temperature Products for Assessing Climate (RATPAC): A New Data Set of Large‐Area Anomaly Time Series.” Journal of Geophysical Research: Atmospheres 110, no. D22: D22102.
Harris, C. R., K. J. Millman, S. J. van der Walt, et al. 2020. “Array Programming With NumPy.” Nature 585, no. 7825: 357–362.
Hoyer, S., and J. Hamman. 2017. “xarray: N-D Labeled Arrays and Datasets in Python.” Journal of Open Research Software 5, no. 1: 10.
Hunter, J. D. 2007. “Matplotlib: A 2D Graphics Environment.” Computing in Science & Engineering 9, no. 3: 90–95.
Plotly Technologies Inc. 2015. “Collaborative Data Science.”
Kobayashi, S., Y. Ota, Y. Harada, et al. 2015. “The JRA‐55 Reanalysis: General Specifications and Basic Characteristics.” Journal of the Meteorological Society of Japan. Ser. II 93: 5–48.
McKinney, W. 2010. “Data Structures for Statistical Computing in Python.” Proceedings of the 9th Python in Science Conference 445: 51–56.
Philipona, R., C. Mears, M. Fujiwara, et al. 2018. “Radiosondes Show That After Decades of Cooling, the Lower Stratosphere Is Now Warming.” Journal of Geophysical Research: Atmospheres 123, no. 22: 12,509–12,522.
Randel, W. J., D. J. Seidel, and L. L. Pan. 2007. “Observational Characteristics of Double Tropopauses.” Journal of Geophysical Research: Atmospheres 112, no. D7: D07309.
Randel, W. J., K. P. Shine, J. Austin, et al. 2009. “An Update of Observed Stratospheric Temperature Trends.” Journal of Geophysical Research: Atmospheres 114, no. D2: D02107.
Reale, T., B. Sun, F. H. Tilley, and M. Pettey. 2012. “The NOAA Products Validation System (NPROVS).” Journal of Atmospheric and Oceanic Technology 29, no. 5: 629–645.
Rew, R., and G. Davis. 1990. “NetCDF: An Interface for Scientific Data Access.” IEEE Computer Graphics and Applications 10, no. 4: 76–82.
Schröder, M., M. Lockhoff, J. M. Forsythe, H. Q. Cronk, T. H. Vonder Haar, and R. Bennartz. 2016. “The GEWEX Water Vapor Assessment: Results From Intercomparison, Trend, and Homogeneity Analysis of Total Column Water Vapor.” Journal of Applied Meteorology and Climatology 55, no. 7: 1633–1649.
Seidel, D. J., C. O. Ao, and K. Li. 2010. “Estimating Climatological Planetary Boundary Layer Heights From Radiosonde Observations: Comparison of Methods and Uncertainty Analysis.” Journal of Geophysical Research: Atmospheres 115, no. D16: D16113.
Seidel, D. J., and W. J. Randel. 2006. “Variability and Trends in the Global Tropopause Estimated From Radiosonde Data.” Journal of Geophysical Research: Atmospheres 111, no. D21: D21101.
Seidel, D. J., R. J. Ross, J. K. Angell, and G. C. Reid. 2001. “Climatological Characteristics of the Tropical Tropopause as Revealed by Radiosondes.” Journal of Geophysical Research: Atmospheres 106, no. D8: 7857–7878.
Stauffer, R. 2017. “PyIGRA.” https://github.com/retostauffer/PyIGRA.git.
Thompson, D. W., and S. Solomon. 2005. “Recent Stratospheric Climate Trends as Evidenced in Radiosonde Data: Global Structure and Tropospheric Linkages.” Journal of Climate 18, no. 22: 4785–4795.
Van Malderen, R., H. Brenot, E. Pottiaux, et al. 2014. “A Multi‐Site Intercomparison of Integrated Water Vapour Observations for Climate Change Analysis.” Atmospheric Measurement Techniques 7, no. 8: 2487–2512.
WMO. 1966. “International Meteorological Tables.”
Xian, T., and C. R. Homeyer. 2019. “Global Tropopause Altitudes in Radiosondes and Reanalyses.” Atmospheric Chemistry and Physics 19, no. 8: 5661–5678.
© 2025. This work is published under http://creativecommons.org/licenses/by/4.0/.