Content area
Accurate and precise
Using the proposed evaluation framework, we were able to define optimized parameter settings for the methods that have user-configurable parameters. Most of these methods showed a significant reduction in the estimation errors after the optimization, with respect to the default settings. We have found significant differences in the performance of the studied methods, where the best-performing methods showed smaller normalized biases in the high reflectivity values (i.e.,
1 Introduction
The specific differential phase () plays an important role in many weather radar applications, particularly in hydrometeor classification and quantitative precipitation estimation (QPE) , and is used in data assimilation for numerical weather prediction models and in hydrological applications . Compared to radar power variables; i.e., the reflectivity factor at horizontal polarization () and differential reflectivity (), offers advantages in terms of accuracy, resilience, and reliability due to its immunity to radar miscalibration, attenuation , and partial beam blockage . It has also proven successful in hydrometeor classification routines , especially in the detection of graupel and small melting hail , and in the dendritic growth zone and the processes within . The ability of to accurately estimate heavy rainfall, differentiate hydrometeor types, and overcome attenuation in precipitation makes it an invaluable operational and research radar variable.
Despite its advantages, accurate estimation of from the radar-measured differential phase () remains challenging. Mathematically, is half of the range derivative of , which measures the phase shift between horizontally and vertically polarized signals as they propagate through precipitation. This phase shift () is influenced by hydrometeor concentration, shape, orientation, and composition . However, is not typically smooth and does not monotonically increase along the rain path; it contains fluctuations due to noise () and the backscattering differential phase () . Excessive filtering of to remove can lead to the loss of fine-scale precipitation features, affecting the accuracy of estimates, especially in light precipitation . In heavier precipitation, causes spikes in , especially at higher radar frequencies, further complicating accurate estimation .
To address these challenges, various methods have been developed to post-process and derive . Basic approaches include median filters and moving windows, while more advanced methods use regression techniques and self-consistency constraints based on or . Many of these methods are now available in open-source Python libraries such as the Python ARM (atmospheric radiation measurement) Radar Toolkit
Recent studies show that estimates can vary significantly depending on the algorithm and the optimizing parameters used. evaluated the errors in several methods using synthetic profiles and found that no single algorithm was optimal across all rainfall conditions. Instead, performance varied according to the complexity of the rain profile and the parameters selected. They identified kdp_maesaka (Py-ART's implementation of the , method) and phase_proc_lp (Py-ART's implementation of the , method) as particularly versatile. However, used synthetic data, which may miss some of the effects present in radar observations of rainfall (e.g., ). More recently, compared kdp_maesaka and phase_proc_lp in an extreme summer rainfall event, finding that fine-tuning the methods played a key role in retrieving the most accurate estimate. Despite these insights, the performance of and uncertainties in most methods of rainfall observations remain largely unexplored.
The goal of this study is to evaluate the performance of publicly available estimation methods on real rainfall observations and quantify their uncertainties as a function of reflectivity intensities. To achieve this, we employ a benchmarking , herein , computed from measured and , and use self-consistency relations in rain. In rainfall observations, the polarimetric radar variables are not independent, but one can be computed in terms of the others via the self-consistency relations . These relations have proven successful in hydrometeor classification and radar calibration correction routines. For instance, showed that the hydrometeors can be classified based on their proximity to clusters around self-consistency curves between polarimetric variables. At nearly the same time, noted the self-consistency of , , and in rainfall and proposed a method to calibrate and correct –rainfall estimates, benchmarking against –rainfall estimates. Thereafter, several methods linking the polarimetric variables via self-consistency relations have been widely used to calibrate . In this study, is computed using the consistency relation linking to and , which was first noted by and described in , requiring thorough selection and filtering of and . computed from quality controlled and measurements provides a solid benchmark against which to compare the methods' performance, to select optimal parameters, and to quantify the associated uncertainties.
This paper is organized as follows. Section describes the radar and disdrometer data, shows the evaluation framework, and introduces the estimation methods. Section presents and discusses the parameter optimization and performance evaluation of the methods, and Sect. summarizes the findings.
2 Data and methods
2.1 Radar and disdrometer data
This study evaluates the performance of estimation methods using real rainfall data. The dataset was collected from the Finnish Meteorological Institute (FMI) C-band Vantaa radar, located near Helsinki, Finland (see Fig. ). The radar recorded various quantities, including , , , , the cross-correlation coefficient (), and the hydrometeor classification product available in IRIS and based on the methodology described by . The spatial resolution of the radar is 500 m in range and 1° in azimuth, with scans performed every 5 min, and the data were collected with an elevation angle of 0.7°. The dataset spans June to September during the years 2017 to 2019, capturing precipitation events with variable rainfall intensities and spatial extents. The raw radar dataset as well as the post-processed estimates are available from the link provided in .
Figure 1
Map showing the locations of FMI's Vantaa radar (VAN) and Hyytiälä's research station where the drop size distribution (DSD) data were collected. The shaded area is a circle with a 250 km radius corresponding to the spatial coverage of the radar.
[Figure omitted. See PDF]
To ensure data quality, only periods when the Vantaa radar had calibration errors within 1 dB were selected. The calibration was verified by (i) identifying periods where solar flux estimates from Vantaa radar estimates aligned consistently with Dominion Radio Astrophysical Observatory (DRAO) estimates and (ii) selecting radar scans within the periods where -calibration offsets were within 1 dB, following the absolute calibration procedure outlined by . bias was estimated and corrected during these periods by computing the offset between observed and self-consistent derived from observed , as described in , and by computing the average for several cases.
The performance of the estimation methods is benchmarked against the self-consistent computed from measured and and by using self-consistency relations in rain. The self-consistency relations, which link the polarimetric radar variables, were derived by fitting radar variables computed using the open-source library, PyTMatrix . PyTMatrix provides a simple interface for T-matrix electromagnetic scattering calculations , requiring the user to provide drop size distribution (DSD) data and setting parameters such as temperature, the radar wavelength band, and the raindrop shape model. The parameters used for the T-matrix calculations were 10 °C, C-band, and , respectively, and the DSD data provided were collected by an optical Parsivel disdrometer located in Hyytiälä, Finland (see Fig. ).
The Parsivel disdrometer records the number of particles and their velocity at 1 min intervals, sorting the data into 32 bins depending on the particle's size (i.e., equivalent volume diameter) and 32 additional bins depending on the particle's fall velocity. From the number of particles and the size and velocity classes, the Parsivel disdrometer computes the precipitation type, which was used to filter out non-liquid particles. Observations were further limited to times when the 30 min average 2 m temperature exceeded 2 °C to ensure liquid rain. Following the filtering procedure suggested by to reduce statistical errors, only those measurements with at least 100 counts in two consecutive bins and positive counts in at least four consecutive bins were retained. The disdrometer dataset, covering June to September from 2014 to 2019, provided a robust basis for deriving average summer-season DSD parameters such as the mean volume diameter () and intercept () and shape () parameters. These parameters showed strong agreement with those reported by in a climatological study of Finland. From the derived DSD parameters (, , and ), the polarimetric radar variables were computed and used to derive the self-consistency relation, defining the framework to evaluate the estimation methods.
2.2 evaluation frameworkThe performance of the estimation methods is evaluated using as a benchmark. This quantity is calculated from each radar-measured tuple (, ), following a relationship of the form
1 where represents in linear units (mm6 m−3), and is in decibels (dB). The coefficients used in this relation are , , , and . The coefficients align well with those reported by , which employed the raindrop shape models by and .
To ensure the accuracy and robustness of the estimates used in the method assessment, it was crucial to quality control the and data. Radar observations of rain are often affected by non-meteorological measurements, resonance effects, and hail contamination . To address these issues, the following filtering steps were applied.
-
Noise filtering. A minimum threshold of 0.97 was applied to .
-
Non-meteorological observation filtering. The hydrometeor classification product from IRIS , based on , was used to exclude gates classified as non-meteorological.
-
reduction. Gates with 3.5 dB were excluded .
-
Non-liquid rain filtering.
-
Only radar scans from the warm months (June–September) were selected.
-
Gates not classified as rain by the hydrometeor classification product were excluded.
-
Hail contamination was addressed by removing gates with 50 dBZ.
-
Observations from the melting layer and above were suppressed by masking gates further than 70 km (see last dashed ring in Fig. ) from the radar in the radial direction. The distance was manually set by identifying gates with melting layer signatures .
-
In addition to addressing noise and non-liquid rain measurements, estimates are affected by attenuation in and differential attenuation in , particularly in cases of heavy rainfall, of extended propagation paths through rain (hereafter rain paths) , and when the radar's antenna radome is wet . To mitigate these effects, radar scans when there was rain on top of the radar within the past 20 min were discarded. Then, for the remaining cases, attenuation in heavy precipitation or extended rain paths was addressed by flagging the radar gates when suspected attenuation of at least 1 dB was detected. The attenuation in range gates was inferred using a standard method that linearly relates the losses in and to increases in . corresponds to the total span of along the radial within a rain path. A rain path was defined as a set of consecutive gates with rain features extending at least 20 km in the radial direction. For C-band radar, a minimum threshold of 12° in indicates attenuation of at least 1 dB . In this study, a threshold of 10° was used, meaning that gates within rain paths featuring 10° were flagged as attenuated.
An example of the filtering procedure applied to a radar scan is shown in Fig. . This figure demonstrates the effects of the filtering process and the attenuation considered on the chosen data samples.
Figure 2
Example of a Vantaa radar scan during a precipitation event on 16 July 2019 at an elevation angle of 0.7°. Panel (a) shows measured ; panel (b) shows filtered with masked gates in gray; panel (c) shows the same as panel (b) but with attenuated gates marked in red and non-attenuated gates marked in blue. Dashed rings represent radial distances of 10, 30, 50, and 70 km from the radar.
[Figure omitted. See PDF]
Following the filtering process, the dataset comprised 652 624 quality controlled gates from 70 radar scans. Figure presents a histogram of the data proportions across different values, showing the highest percentage of data between 30 and 35 dBZ, with a sharp decrease from 35 to 50 dBZ. The stacked bars indicate the percentages of attenuated and non-attenuated gates, with the ratio of attenuated to non-attenuated data increasing with greater .
Figure 3
Proportion of data across intervals of 5 dBZ. Attenuated data are represented by red bars, and non-attenuated data are represented by blue bars. The legend indicates the total number of gates with suspected attenuation of at least 1 dB (red) and less than 1 dB (blue).
[Figure omitted. See PDF]
2.3 estimation methodsThis section provides an overview of the estimation methods selected for this study. The selection criteria focused on the availability of these methods in widely used open-source libraries, such as Py-ART and radlib . At the time of this study, Py-ART version 1.17.0 included the following methods: kdp_maesaka, kdp_vulpiani, phase_proc_lp, and kdp_schneebeli. radlib version 2.0.3 included kdp_from_phidp and phidp_kdp_vulpiani. However, phidp_kdp_vulpiani was excluded from our analysis, as it is based on the same method proposed by that is already represented in Py-ART by kdp_vulpiani. Additionally, kdp_iris, a method based on and implemented by Vaisala in the IRIS software , was included. Table summarizes the key features of the selected methods, and a brief description of the methods is provided below.
a.
kdp_maesaka. Developed by and available in Py-ART, this method estimates non-negative from liquid-precipitation measurements. It addresses the issue of negative estimates observed in exclusively liquid-precipitation regions when using classical methods based on iterative filtering and local linear regression. identified that negative were caused by noise in during weak precipitation and by during heavy precipitation. The method restricts to positive values and assumes that is a monotonically increasing function with range, which is already unfolded.
- b.
kdp_vulpiani. Developed by and available in Py-ART, this method estimates for any type of precipitation. It uses a multistep moving-window range derivative approach to obtain . It calculates a profile from the range derivative of a noise-reduced, offset-corrected, and unfolded profile. At each window, is compared to thresholds representing unrealistic values within precipitation, correcting possible aliasing with the minimum threshold.
- c.
phase_proc_lp. Developed by and available in Py-ART, this method estimates non-negative from liquid-precipitation measurements. It uses a linear-programming (LP) method to enforce monotonic behavior in , restricting to positive values. It extracts from , and it uses self-consistency constraints to bound estimates based on measured . The method requires quality controlled and allows user-defined thresholds to exclude hail and the setting of the environmental 0 °C level to exclude mixed-phase particles.
- d.
kdp_from_phidp. Implemented in radlib and based on , this method estimates for any type of precipitation. It computes range-wise differentiation of over a user-defined window size length, defaulting to seven gates for a range resolution of 1 km. Unlike kdp_vulpiani, it allows the selection of the method for range gate differentiation, albeit without supporting multiple iterations, prioritizing speed over phase unfolding and noise issues in .
- e.
kdp_schneebeli. Developed by and available in Py-ART, this method estimates for any type of precipitation. It selects the best-averaged profile from forward and backward propagation Kalman-filtered estimates. The Kalman filters are applied twice to each range gate state (accounting for forward and backward propagation) multiple times, recalculating the covariance matrices each time to yield unique states, and the best estimate is selected.
- f.
kdp_iris. Implemented in the Vaisala software IRIS and based on , this method estimates for any type of precipitation. It computes adaptively through piece-wise regression and a regularization framework that minimizes both smoothness in and regression errors. The regularization adapts based on range variations in and measurements, preserving steep changes in high-intensity precipitation while reducing variations in low-intensity precipitation.
List of the methods studied, with key features.
| Method | Source | Data | Precipitation | Mathematical approach | Tested parameters |
|---|---|---|---|---|---|
| pre-requisites | type | (constraints) | |||
| kdp_maesaka | Py-ART | Unfolded | Liquid | Variational | Clpf |
| kdp_vulpiani | Py-ART | Pre-filtered | Any | Moving window | windsize, n_iter |
| phase_proc_lp | Py-ART | Unattenuated | Liquid | Linear programming () | self_const, coef, window_len |
| kdp_from_phidp | radlib | No NaN values | Any | Moving window | winlen, dr |
| kdp_schneebeli | Py-ART | Pre-filtered | Any | Kalman filter | – |
| kdp_iris | IRIS | – | Any | Adaptive regression | – |
3.1 Parameter optimization of methods
All the methods except kdp_iris are available in open-source libraries and feature user-configurable parameters to improve the estimates. However, two methods are excluded from the optimization: kdp_schneebeli and kdp_iris. In kdp_schneebeli, the error covariance matrices of the measurements (rcov) and state transitions (pcov) require a large ensemble of stochastic simulated rainfall fields to be derived. Since such information is not available to us, we use the method with default settings. In kdp_iris, the end user has no effect on the derivation of . Instead, at the FMI, we use the product as it comes from the IRIS software . Therefore, the optimization focuses on the kdp_maesaka, kdp_vulpiani, phase_proc_lp, and kdp_from_phidp methods, and in this section, we quantify the errors under varying parameter settings and select the optimal values.
First, a qualitative analysis is provided using of vs. scatterplots, illustrating the relationship between estimated ( axis) and ( axis) and benchmarking against (dashed black line). Then, the errors in each method as a function of parameter setting and are provided. To achieve this, the dataset was divided into six 5 dB intervals ranging from 20 to 50 dBZ; we then computed the root-mean-square error (RMSE) and mean error (herein bias) for each interval and normalized by the mean from each interval. The optimal parameters were selected based on the smallest averaged normalized RMSE (herein NRMSE) in the last three intervals (i.e., 35–50 dBZ), prioritizing the accuracy of estimates in high-intensity precipitation.
The settings tested for kdp_maesaka, kdp_vulpiani, phase_proc_lp, and kdp_from_phidp are summarized in Table , which indicates the tested values, the default value(s) used in the implementation, and the optimal value(s) found in this study.
Table 2
Summary of the parameter settings for each of the optimized methods.
| Method | Parameter(s) tested | Tested values | Default | Optimal |
|---|---|---|---|---|
| kdp_maesaka | Clpf | |||
| kdp_vulpiani | windsize | 10 | 10 | |
| n_iter | 10 | 2 | ||
| phase_proc_lp | window_len | 35 | 5 | |
| kdp_from_phidp | winlen | 7 | 11 | |
| dr | 1.0 | 2 |
Py-ART's implementation of the method, kdp_maesaka, features the optimizing parameter Clpf, which regulates the low-pass filter in . The low-pass filter controls the degree of smoothing of , with higher Clpf values producing smoother profiles. In kdp_maesaka, the default value of Clpf is 1.0, and this value is scaled by the range resolution of the radar to match the resolution of the constraints applied to . The scaling is proportional to the fourth power of the range resolution of the radar, and if we were to compare to the values used in , a value of 1.0 corresponds to for the Vantaa radar's range resolution of 500 m. In , Clpf values from to were tested on one rainfall case using a 250 m range resolution X-band radar. Their results show that values closer to suppressed fine-scale precipitation features while producing a smooth and clean , whereas values closer to preserved fine-scale features while including substantially more noise. These results lead us to test values from to , corresponding to and in kdp_maesaka and accounting for the Vantaa radar's range resolution. Figure a–h show scatterplots of estimates using kdp_maesaka as a function of for different Clpf values. All scatterplots show overall accurate and precise estimates within the range of 0–30 dBZ. This result implies that the subset of Clpf values studied produces sufficiently smoothed to reduce the impact of noise in light precipitation. However, the effects of excessive smoothing are observed in the range of 40–50 dBZ, where noticeably underestimates . By comparing the scatterplots from Clpf to Clpf in the interval of 40–50 dBZ, the underestimation of is stronger with increasing Clpf.
Figure 4
Scatterplots of estimated from kdp_maesaka as a function of reflectivity and for various values of Clpf. Panels (a)–(h) show results with Clpf values from to . The dashed black line corresponds to .
[Figure omitted. See PDF]
To capture the influence of Clpf on the errors when estimating as a function of precipitation intensity, Fig. a and b show NRMSE and the normalized bias of estimates with varying Clpf. The smaller and more consistent NRMSEs in regions of 35 dBZ in Fig. a indicate that kdp_maesaka reaches stable solutions for all Clpf values tested. However, Clpf of showed the largest variability when transitioning from lowest to highest among the values tested, producing the largest NRMSE for 35 dBZ and the lowest otherwise. The underestimation of using is evidenced in Fig. b for 35 dBZ, where the results were the most negatively biased. The biases from the remaining parameters were equally consistent and smaller.
Figure 5
Panel (a) shows RMSE normalized by the interval-averaged of kdp_maesaka relative to as a function of reflectivity and for various values of Clpf; panel (b) shows the same as panel (a) but for the normalized bias metric.
[Figure omitted. See PDF]
Our results show that larger values of Clpf lead to larger errors due to oversmoothing of . Overall, kdp_maesaka performs consistently when precipitation intensities reach 35 dBZ. The Clpf yielding the smallest 35–50 dBZ averaged NRMSE was .
3.1.2 Py-ART's Vulpiani methodPy-ART's implementation of the method, kdp_vulpiani, features two optimizing parameters: windsize (the number of gates used for estimating ) and n_iter (the number of re-estimations of per window). Higher values of these parameters result in smoother profiles. found various parameter combinations that worked well depending on precipitation complexity, leading us to test combinations from 2 to 14 for both parameters. Figure a–p show scatterplots comparing the performance of kdp_vulpiani for different values of windsize and n_iter when estimating . Figure a shows the scatter of using the largest settings tested, whereas Fig. p shows the results for the smallest. Each row holds windsize constant, while each column holds n_iter constant. In the scatterplot from Fig. a with windsize 14 and n_iter 14, the data are predominantly clustered under for 35 dBZ, indicating underestimation of . For 35 dBZ, this parameter setting produces accurate and precise results. The scatterplots for smaller setting values, i.e., towards Fig. p, are slightly more accurate, albeit significantly less precise; the scatterplot from Fig. p with windsize 2 and n_iter 2 shows a wider spread of data for all values, although with slightly enhanced clustering of data around for 35 dBZ. These results indicate a trade-off between precision and accuracy when varying windsize and n_iter from 14 to 2. In particular, larger settings favored precision while degrading accuracy, and smaller settings favored accuracy with degraded precision.
Figure 6
Scatterplots of estimated from kdp_vulpiani as a function of reflectivity and for various values of windsize and n_iter. Panels (a)–(p) show results with (windsize, n_iter) tuple values from (14, 14) to (2, 2), decreasing windsize with increasing rows. The dashed black line corresponds to .
[Figure omitted. See PDF]
To further analyze the trade-off between accuracy and precision when varying windsize and n_iter in kdp_vulpiani, Fig. a–b show the NRMSE and normalized bias of estimates with varying windsize and n_iter as a function of . Figure a shows that a windsize of 2 yielded the worst performance, implying that the gain in accuracy by including fine-scale fluctuations in is not enough to compensate for the increased errors due to the inclusion of outliers. On the other hand, a windsize of 14 shows good performance across the entire range. However, the predominantly negative normalized bias of a windsize of 14 relative to the smaller counterparts in Fig. b indicates that the larger windsize leads to more underestimation of than lower windsize values. The consistent errors when varying n_iter in Fig. a indicate that this parameter setting does not impact the performance of kdp_vulpiani as strongly as windsize does, especially in low . However, results from Fig. b suggest that smaller n_iter significantly reduces the underestimation of estimates when the windsize is large. Our results strongly resemble those reported in , indicating that a smaller number of iterations and moderate window sizes significantly enhance the performance of kdp_vulpiani. In particular, among the RMSE heat maps of kdp_vulpiani shown in , windsize 10 and n_iter 2 produced the best results, coinciding with the smallest 35–50 dBZ averaged NRMSE in this study.
Figure 7
Panel (a) shows RMSE normalized by the interval-averaged of kdp_vulpiani relative to as a function of reflectivity and for various values of windsize and n_iter; panel (b) shows the same as panel (a) but for the normalized bias metric.
[Figure omitted. See PDF]
3.1.3 Py-ART's linear programming methodPy-ART's implementation of an LP method proposed in , phase_proc_lp, allows the user to tune the window length to smooth , window_len, and two intertwined parameters constraining the output via self-consistency relations: self_const and coef. The former is the weight of the self-consistency constraint and the latter is the exponent in the self-consistency relation linking to , which is given in as but is expressed in phase_proc_lp as self_const. Since information about the expected was known beforehand, given by , we provided the method with the optimal values of self_const and coef 0.914. In this way, the parameter optimization of phase_proc_lp was focused solely on window_len variations.
The parameter window_len defines the window length for smoothing of the LP-processed field before is estimated. The default setting of this parameter is 35, indicating a smoothing window length of 17.5 km for a range resolution of 500 m. To include finer-scale precipitation features (e.g., 2.5 km), phase_proc_lp was tested with window_len values ranging from 5 to 40. Figure a–h show scatterplots comparing the performance of phase_proc_lp for different settings of window_len estimating . Each panel from Fig. a to h shows estimated using window lengths from 5 to 40 in intervals of 5. The scatterplot from Fig. a with window_len 5 shows data points predominantly clustered around across the entire range, indicating strong correlation between and . Even in high ranges (i.e., 35 dBZ), the tight correlation between and holds, indicating high accuracy and precision of in the presence of heavy precipitation. The accuracy and precision of relative to decreases progressively when window_len increases, indicated by the spreading and downward shifting of the estimates relative to . Especially for the range 35 dBZ, the scatterplots from Fig. e to h, with window_len from 25 to 40, respectively, show substantial underestimation of relative to , indicating stronger oversmoothing of for larger values of window_len. Comparing the scatterplots, window_len 5 undoubtedly shows the best performance of phase_proc_lp. This result agrees with phase_proc_lp window_len experiments by in an extremely heavy precipitation event, where small window_len yielded the best performance. Compared to the phase_proc_lp experiments by , our results suggest that smaller window_len produce overall more accurate estimates. However, the influence of the self-consistency constraints proposed in plays a key role in this aspect; if optimal self-consistency constraints are not provided or do not match theoretical expectations, the precision and accuracy in significantly decreases, and larger window_len values compensate for this by oversmoothing (see Appendix for results of the performance of phase_proc_lp with very little influence of self-consistency constraints).
Figure 8
Scatterplots of estimated from phase_proc_lp as a function of reflectivity and for various values of window_len. Panels (a)–(h) show results with window_len values from 5 to 40 when fixing coef at 0.914 and self_const at . The dashed black line corresponds to .
[Figure omitted. See PDF]
To further investigate the effects of window_len on the performance of phase_proc_lp, Fig. a–b show the NRMSE and normalized bias of estimates with varying window_len as a function of . In agreement with the patterns observed in the scatterplots in Fig. , window_len 5 produced the best performance compared to other parameter settings. Interestingly, even in light precipitation (e.g., 30 dBZ), smaller values of window_len produced the best NRMSE metrics, indicating that larger window_len do not further improve the precision of phase_proc_lp. Instead, larger window_len enhanced the bias of relative to , as shown in Fig. b. The parameter window_len 5 produced undoubtedly the best metrics for phase_proc_lp, and it was selected as the optimal parameter.
Figure 9
Panel (a) shows RMSE normalized by the interval-averaged of phase_proc_lp relative to as a function of reflectivity and for various values of window_len; panel (b) shows the same as panel (a) but for the normalized bias metric.
[Figure omitted. See PDF]
3.1.4 radlib's Vulpiani methodradlib's implementation of the method, kdp_from_phidp, features two optimizing parameters: winlen (the number of gates used to reconstruct ) and dr (the gate length resolution in km). We tested winlen values from 3 to 11 and dr values from 0.5 to 4. Figure a–l show scatterplots of estimates using kdp_from_phidp when varying the settings of winlen and dr. Each row of scatterplots holds winlen constant, while decreasing dr from left to right. Similarly, each column of scatterplots holds dr constant, while decreasing winlen from top to bottom. The scatterplot in Fig. a, with winlen 11 and dr 4, shows clustered predominantly around 0 ° km−1 across the entire range, indicating substantial oversmoothing of . Even for 30 dBZ, the noticeable underestimation of relative to indicates that kdp_from_phidp is not able to capture signatures of heavy precipitation for large winlen and dr settings. Moving towards the scatterplot in Fig. d, a smaller dr enhances the accuracy of kdp_from_phidp, particularly for 30 dBZ. However, the gain in accuracy comes together with a loss in precision in estimates, indicated by the wider spread of the data. In addition, decreasing dr makes kdp_from_phidp more prone to the inclusion of outliers, illustrated by data points with 1 ° km−1, even for 20 dBZ. The scatterplots in Fig. e–h follow the same behavior as in the first row except for a wider spread of data, suggesting that decreasing winlen while holding dr constant overall reduces the precision of kdp_from_phidp. When moving from Fig. e to h, the accuracy of estimates increases while precision decreases with decreasing dr. In the last row, i.e., from Fig. i to l, estimates are the most scattered for the same dr, indicating a loss in precision of kdp_from_phidp when reducing winlen. The scatterplot in Fig. l with the smallest parameter settings tested (winlen 3 and dr 0.5) resembles a scatterplot of random noise with no significant clustering of data, suggesting extremely poor correlation relative to . Comparing the scatterplots row-wise and column-wise, decreasing winlen or dr significantly degrades the precision of the method. However, the effect on the accuracy is more complex; simultaneously setting winlen and dr to large values leads to substantial underestimation of , whereas small values lead to noisy . These results suggest that the effects of varying winlen and dr on the performance of kdp_from_phidp are strongly intertwined, requiring more analysis of the trade-off between accuracy and precision offered by variations in these parameters.
Figure 10
Scatterplots of estimated from kdp_from_phidp as a function of reflectivity and for various values of winlen and dr. Panels (a)–(l) show results with (winlen, dr) tuple values from (11, 4) to (3, 0.5); winlen decreases in intervals of 4 per row, whereas dr decreases by half per column. The dashed black line corresponds to .
[Figure omitted. See PDF]
To analyze the trade-off between accuracy and precision when using winlen and dr in kdp_from_phidp, Fig. a–b show the NRMSE and normalized bias of estimates with varying winlen and dr as a function of . Even though Fig. a has been clipped at 5.0, it is important to note the significantly high values when using the smallest dr (97.6, 135.4, and 278.6 for winlen of 11, 7, and 3, respectively). The predominantly higher NRMSE values with the smallest dr indicate that the precision of kdp_from_phidp reduces significantly with dr 1 for any winlen tested. An exception occurs in the interval dBZ, where the smallest dr yield the best metrics due to slight improvements in the accuracy. Despite the limited amount of data within this interval (see Fig. ), the clustering of around in Fig. and the small normalized biases in Fig. b suggest that accuracy improved slightly for the smallest dr. The smaller NRMSE with high dr in Fig. a is counterbalanced by the predominantly larger negative bias for larger dr in Fig. b. This implies that larger dr values in kdp_from_phidp lead to the underestimation of for all winlen tested. As a conclusion, combining large winlen with smaller dr produces the best performance for heavier precipitation (i.e., 30 dBZ), whereas combining large winlen with larger dr produces the best results for light precipitation. Overall, small values of winlen reduce the precision significantly in the method without improving accuracy. The parameter setting with the smallest 35–50 dBZ averaged NRMSE was winlen 11 and dr 2.
Figure 11
Panel (a) shows RMSE normalized by the interval-averaged of kdp_from_phidp relative to as a function of reflectivity and for various values of winlen and dr; panel (b) shows the same as panel (a) but for the normalized bias metric. The numbers on top of the bars indicate the values of the metric exceeding the -axis limit selected.
[Figure omitted. See PDF]
3.2 Performance assessment of methods relative toThe performance of the methods is analyzed qualitatively in Sect. and quantitatively in Sect. . For these analyses, we used the parameter-optimized kdp_maesaka, kdp_vulpiani, phase_proc_lp, and kdp_from_phidp and included kdp_schneebeli and kdp_iris.
3.2.1 Qualitative assessment
We qualitatively assessed the precision and accuracy of the estimated using scatterplots of vs. for each method. Figure shows six scatterplots comparing the performance of kdp_maesaka, kdp_vulpiani, phase_proc_lp, kdp_from_phidp, kdp_schneebeli, and kdp_iris at estimating . Each scatterplot illustrates the relationship between estimated ( axis) relative to ( axis) against benchmarking (dashed black line). For the parameter-optimized methods in Fig. a–d, the optimal parameter selected is indicated in the panel title together with the method's name. Comparing the scatterplots, phase_proc_lp demonstrates the highest accuracy and precision, evidenced by the data narrowly clustered around across the entire range. The kdp_from_phidp and kdp_schneebeli methods show the least accuracy and precision, with a broader spread and more outliers, particularly when 30 dBZ. For higher values, even though kdp_from_phidp shows better precision but worse accuracy than kdp_schneebeli, these two methods strongly underestimate , evidenced by the predominant clustering of estimates below 0.5 ° km−1. The kdp_maesaka method shows less scattering of estimates compared to kdp_from_phidp and kdp_schneebeli, indicating higher precision and accuracy, particularly for 30 dBZ. However, for 30 dBZ, the performance of kdp_maesaka deteriorates rapidly, as shown by the broader spread and significant underestimation of relative to . The kdp_vulpiani and kdp_iris methods show moderate performance, with better accuracy and precision than kdp_from_phidp, kdp_schneebeli, and kdp_maesaka but worse performance than phase_proc_lp. Between the kdp_vulpiani and kdp_iris methods, kdp_vulpiani shows better correlation of estimates with for 35 dBZ, indicating higher accuracy in heavier precipitation. However, kdp_iris shows less scattering across the entire range, indicating overall higher precision than kdp_vulpiani. The kdp_vulpiani, kdp_from_phidp, kdp_schneebeli, and kdp_iris methods include negative values, which should not be expected in rain observations. These negative estimates predominantly show up in lighter precipitation (i.e., 30 dBZ), indicating that they are most likely produced by noise in . However, the inclusion of negative estimates is useful, for instance, in the detection of snow crystals, allowing kdp_vulpiani, kdp_from_phidp, kdp_schneebeli, and kdp_iris to be used in a wider range of applications compared to kdp_maesaka and phase_proc_lp. The relatively high accuracy and precision of kdp_iris and kdp_vulpiani, together with the inclusion of negative estimates, leave these two methods as candidates well-suited for QPE and calibration and hydrometeor classification routines.
Figure 12
Scatterplot of estimated from each parameter-optimized method relative to as a function of reflectivity. Panels (a)–(f) show kdp_maesaka, kdp_vulpiani, phase_proc_lp, kdp_from_phi_dp, kdp_schneebeli, and kdp_iris, respectively. The dashed black line corresponds to .
[Figure omitted. See PDF]
3.2.2 Quantitative assessmentThe quantitative assessment of the methods was achieved through the metrics of NRMSE and normalized bias and complemented with statistics from the Wasserstein distance (WD) . The WD measures the similarity between two cumulative distributions, given in this study by the estimated by each method and . On the one hand, NRMSE and normalized bias computed as a function of , allow the assessment of the relative accuracy and precision of the methods based on precipitation intensities. The WD, on the other hand, estimated from each radar scan independently and with statistics over the entire set of scans, allows the assessment of the relative consistency and robustness of the methods.
Figure a–b show the NRMSE and normalized bias of estimated for each method. Overall, phase_proc_lp shows the best performance, as evidenced by the lowest NRMSE values in Fig. a and moderately low bias in Fig. b across all intervals. In contrast, kdp_schneebeli shows the worst performance among the methods, indicated by the highest NRMSE values and moderately high bias across all intervals. The kdp_from_phidp method shows substantially higher NRMSE values than kdp_maesaka, kdp_vulpiani, phase_proc_lp, and kdp_iris but is significantly smaller than kdp_schneebeli, particularly for the smallest values. The relatively small bias of kdp_from_phidp when NRMSE values are substantially high is explained by the positive-to-negative symmetrical spread of estimates around the axis, indicating poor precision. Additionally, the persistently negative and large normalized bias of this method relative to the other methods indicates that kdp_from_phidp underestimates the most. The kdp_maesaka, kdp_vulpiani, and kdp_iris methods have moderate NRMSE values, performing better than kdp_schneebeli and kdp_from_phidp but not as well as phase_proc_lp. Among these three methods, kdp_maesaka has the smallest NRMSE values for 35 dBZ but the largest when 40 dBZ. The relatively large positive bias of kdp_maesaka when 30 is a direct consequence of the exclusion of negative estimates. However, the persistently larger negative bias of kdp_maesaka relative to kdp_vulpiani and kdp_iris when 30 dBZ indicates stronger underestimation of and thus less accuracy. These results indicate that in comparison to other methods, kdp_maesaka performs slightly better in light precipitation (i.e., dBZ) but worse in heavier precipitation. Between kdp_vulpiani and kdp_iris, kdp_iris shows overall smaller NRMSEs and normalized bias, indicating higher accuracy and precision than kdp_vulpiani.
Figure 13
Panel (a) shows the bias of estimated from each parameter-optimized method relative to as a function of reflectivity; panel (b) shows the same as panel (a), but the bias is normalized by interval-averaged . The numbers on top of the bars indicate the values of the metric exceeding the -axis limit selected.
[Figure omitted. See PDF]
Figure 14
Panel (a) shows the boxplot of the WD computed for each parameter-optimized method; panel (b) shows the same as panel (a) but excluding kdp_schneebeli for better visualization of the better-performing methods. The boxplots display the WD median (dashed black line), IQRs (boundaries of the box), IQR (whiskers), and the outliers (black crosses).
[Figure omitted. See PDF]
Complementary to the NRMSE and normalized bias metrics, we evaluated the consistency and robustness of the methods using the Wasserstein distance (WD). The WD was computed for each radar scan independently using the wasserstein_distance module from SciPy . Then, the statistics from the estimated WD values for all scans were visualized and analyzed using boxplots. Figure consists of two panels comparing the WD boxplots of the methods. Figure a compares the WD for all methods, including kdp_schneebeli, which presented a significantly large WD. Figure b presents the same data as (a) but excludes kdp_schneebeli to better compare the remaining methods. Each boxplot summarizes the statistics of estimated WDs by showing the median (dashed black line), interquartile ranges (IQR), IQR (whiskers), and outliers (crosses). The insights provided by the boxplots in this analysis are twofold. First, a WD median closer to 0 indicates higher similarity between the cumulative distributions of a method's estimated and that from , ultimately indicating higher accuracy. Second, a narrower IQR indicates less variability in a method's performance between scans, indicating higher consistency.
In Fig. a, the axis lists six methods: kdp_maesaka, kdp_vulpiani, phase_proc_lp, kdp_from_phidp, kdp_schneebeli, and kdp_iris. The axis measures the WD values ranging from 0 to 2. The boxplot for the kdp_schneebeli method shows the largest WD, with a median of 0.33, an IQR from 0.18 to 0.45, and several outliers. The other methods (kdp_maesaka, kdp_vulpiani, phase_proc_lp, kdp_from_phidp, and kdp_iris) have median WD values ranging from 0.0 to 0.1, with smaller IQRs and fewer outliers. In Fig. b, the kdp_schneebeli method is excluded, allowing for a clearer comparison of the kdp_maesaka, kdp_vulpiani, phase_proc_lp, kdp_from_phidp, and kdp_iris methods. The axis is rescaled to range from 0.0 to 0.2 for better visualization. The phase_proc_lp method has the lowest WD median at 0.01, with a narrow IQR from 0.008 to 0.018. The kdp_from_phidp method has a significantly larger WD median of 0.098 and an IQR from 0.077 to 0.122. The kdp_maesaka and kdp_iris methods have WD medians of 0.026 and 0.041, respectively, with moderate IQRs and few outliers. The kdp_vulpiani method has a moderate WD median of 0.049 but a noticeably wider IQR from 0.033 to 0.096 when compared to kdp_maesaka, phase_proc_lp, kdp_from_phidp, and kdp_iris.
The large WD median of kdp_schneebeli indicates that it performs worse compared to the other methods, overshadowing the performance differences among the remaining methods. Additionally, the large IQR of kdp_schneebeli implies that the method does not perform consistently, thus reducing its reliability. The phase_proc_lp method demonstrates the best and most consistent performance, with the lowest WD median and narrowest IQR. These results additionally indicate that the distribution of estimated from phase_proc_lp is the closest to . It is important to remember here that phase_proc_lp is supported by self-consistency relations constraining the estimates based on observations, ultimately enhancing its accuracy and stability. The moderate IQR and significantly larger WD median of kdp_from_phidp indicate that its performance is consistent, albeit less accurate relative to the other methods. The kdp_vulpiani method, in turn, has a moderate WD median but relatively larger IQR, indicating better accuracy than kdp_from_phidp, although less consistency. The kdp_maesaka and kdp_iris methods show similar consistency and accuracy, evidenced by their relatively low WD medians and moderate IQRs. These findings suggest that while kdp_schneebeli is the least accurate and consistent, the performance among the remaining methods varies, with phase_proc_lp presenting the highest robustness, provided that the method with quality controlled and optimized self-consistency settings is used.
3.3 Consistency analysis of retrievalsEach method has a unique combination of mathematical approaches, data requirements, and constraints (see Table ), indicating uniqueness in the fields produced. The similarity or dissimilarity of these outputs is not clearly visible from the metrics computed or from the scatterplots displayed in Sect. . To answer this question, we study the consistency among methods using the vs. correlation plots shown in Fig. . Each scatterplot in Fig. shows the relationship between estimated by a method ( axis) with respect to estimated by a different method ( axis), and the Pearson correlation coefficient () is shown in the upper-left corner of each scatterplot. The axes range from 0.5 to 3.0 ° km−1 to include negative estimates. This part of the analysis does not require any ground-truth framework, allowing the use of the entirety of the radar dataset, i.e., including the attenuated observations (see red data in Fig. ).
Figure 15
Correlation between the estimation methods. Each scatterplot shows the relationship between two different methods without repetition, and no method is compared to itself. The and axes represent the estimated by a method, in units of ° km−1. Each panel shows the Pearson correlation coefficient between the two methods compared.
[Figure omitted. See PDF]
In Fig. , the scatterplot of kdp_iris against kdp_vulpiani shows the best correlation among the methods, illustrated by the data significantly clustered along the diagonal and corroborated by the highest of 0.66. The kdp_iris and kdp_vulpiani methods correlate similarly with phase_proc_lp, indicated by the second-highest of 0.65 for both. In relation to kdp_maesaka, the consistencies of kdp_iris and kdp_vulpiani are rather moderate, whereas in relation to kdp_from_phidp and kdp_schneebeli, they are significantly poorer. Among the methods, kdp_schneebeli correlates the least with any of the other methods, evidenced by the data widely spread along the axes and showing negligible clustering along the diagonal. In particular, kdp_schneebeli against kdp_from_phidp shows the worst consistency, with 0 and the majority of the data clustered around the and axes. The phase_proc_lp method correlates moderately with kdp_maesaka, with an 0.41, although the scatterplot does not exhibit any particular pattern or clustering of data along the diagonal. Relative to kdp_from_phidp, phase_proc_lp shows significantly lower despite the clear data correlation off of the diagonal. However, the small value becomes evident when observing the dense clustering of data around 0 ° km−1 for phase_proc_lp. This result indicates that the consistency between kdp_from_phidp and phase_proc_lp is highly influenced by the negative estimates in kdp_from_phidp that are mapped to 0 ° km−1 in phase_proc_lp. Overall, the scatterplots show that kdp_from_phidp underestimates relative to the other methods. The kdp_maesaka method shows no significant correlation with any method, with the largest being 0.41 relative to both phase_proc_lp and kdp_vulpiani.
4 ConclusionsIn this study, we conducted a comprehensive evaluation of several estimation methods using C-band weather radar data, with a focus on their performance in rainfall observations. We employed a self-consistency framework that links and observations, with as the basis for our evaluations. This approach allows for the construction of the reference observations that can be used to assess the accuracy and robustness of the estimation methods studied. The use of the self-consistency framework requires rather strict quality control, which is described in the paper. In this way, our study focuses on the performance of the methods in highly idealized rainfall observations.
Some (four out of six) of the estimation methods have user-configurable parameters. Using the evaluation framework proposed, we were able to define optimized parameter settings. Most of the methods showed significant improvement in the performance after the optimization.
By comparing the relative performance of the estimation methods over the range of rain intensities, as characterized by the radar values, we have found significant differences in the performance of the methods evaluated. Overall, implementations of the , , and methods exhibited the lowest NRMSE and normalized biases over the range of values studied, from 20 to 50 dBZ.
Our comparative analysis revealed that while the implementation of the method stands out for its high accuracy and precision, its performance is heavily dependent on the self-consistency constraint provided. Without proper optimization of the self-consistency relation, linking of and , and quality control of , even the best window length setting for this method can lead to suboptimal results, i.e., higher RMSE and underestimation at higher values. It should be noted, however, that the reference framework and the method use self-consistency relations to determine , and, therefore, the uncertainties are correlated, and part of the reported performance is caused by this dependence. The implementations of and showed good performance and do not require the use of other radar variables, which potentially make them less sensitive to radar data quality issues, such as calibration and attenuation.
An additional qualitative comparison of the performance of the methods was carried out by computing correlations of derived values from the dataset that also included attenuated radar observations. The correlation between values estimated using different methods is not very high. The highest correlation values (0.65–0.66) were observed between the , , and methods. This indicates that uncertainty between different precipitation estimates could stem from the differences in the methods used.
The study is based on a self-consistency framework that limits its use to the cases where no significant attenuation is observed. Additionally, the scope of our study is limited to the Finnish climatology and a single radar frequency, namely C-band radar observations. Despite these limitations, our findings offer valuable guidance for the use of estimation methods in rainfall observations. These results have significant implications for both operational radar networks and hydrometeorological research, where the accuracy, precision, and stability of estimates are crucial.
Appendix A Influence of the self-consistency constraint on phase_proc_lp
Figure shows the same scatterplots as Fig. , with estimated from phase_proc_lp using self_const of instead of . The motivation behind this was to study the performance of phase_proc_lp with little influence from self-consistency constraints. In , the non-negativity condition in estimates is ensured by restricting the vectors: 0. In addition, to produce more realistic estimates, they introduced the self-consistency relation to bound the estimates based on observed , requiring that the user provide quality controlled data. The restriction of the vectors becomes , which in phase_proc_lp is implemented as self_const. Therefore, a self_const value that is 2 orders of magnitude larger was used in this study to test the performance of phase_proc_lp with a significantly reduced influence of self-consistency constraints. The scatterplots show data clustered around up to 35 dBZ. Beyond this threshold, precision and accuracy decay significantly regardless of the window length. However, in scatterplots with larger window lengths, data are less scattered across the entire range and are only slightly less accurate after 35 dBZ.
Figure A1
Scatterplots of estimated from phase_proc_lp as a function of reflectivity and for various values of window_len. Panels (a)–(h) show results with window_len values from 5 to 40 when fixing coef to 0.914 and self_const to . The dashed black lines correspond to .
[Figure omitted. See PDF]
To further investigate the effects of the self-consistency constraint on phase_proc_lp, Fig. a–b show the normalized RMSE and bias of (estimated with self_const ) relative to . Interestingly, the normalized RMSE in Fig. a behaves inversely to the normalized RMSE in Fig. , whereas normalized bias shows similar behavior for both. The opposite behaviors in normalized RMSE results indicate that window length has a strong impact on the performance of phase_proc_lp, depending on whether adequate self-consistency settings were provided; if so, smaller window lengths yield better performance by capturing fine-scale precipitation features, especially in heavy precipitation. In the opposite case, larger window lengths yield better performance by oversmoothing , thus reducing the impact of noise at the expense of losing fine-scale precipitation features. The oversmoothing effect from larger window lengths in is also implied from the normalized bias shown in Fig. b; larger window lengths produced the largest absolute biases at both extremes of the range. In addition, even though the normalized bias shows similar behavior for self_const and self_const , the latter produces larger differences between window lengths, indicating that the high accuracy and precision of phase_proc_lp predominates in smaller window lengths, provided that there are adequate self-consistency constraints and quality controlled .
Figure A2
Panel (a) shows RMSE normalized by the interval-averaged of phase_proc_lp relative to as a function of reflectivity and for various values of window_len; panel (b) shows the same as panel (a) but for the normalized bias metric.
[Figure omitted. See PDF]
Data availability
The raw radar data and dataset can be accessed via the link at 10.57707/fmi-b2share.4126c5db27d24ddeae10d5c3163ff95a . This includes the raw radar data and the -processed data used to analyzed the estimation methods. The data have been processed using Python and include the following.
- -
The radar folder includes several subfolders, such as yyyy/mm/dd/iris/raw/VAN. The VAN subfolder includes the .raw radar with plan position indicators (PPIs) observed by Vantaa radar at an elevation angle of 0.7 for a specific time: yyyymmddHHMM_VAN.PPI3_B.raw. This data can be read with Py-ART
10.5334/jors.119 . - -
The folder _data includes five .hdf5 files storing tables containing information about date (in pandas numerical value). It requires transformation to a date–time object), (in dBZ), (in dB), attenuated gate (Boolean), theoretical or self-consistent (in ° km−1), or computed (in ° km−1) from a given method for different settings. The method is indicated in the name of the file as kdp_method_scatter.hdf5, where the method can be one of the following.
- -
iris_sch refers to table containing from the iris software (used in the Finnish Meteorological Institute) and computed from Py-ART's implementation of
10.1109/TGRS.2013.2287017 . These two methods were computed together because only one output was retrieved. They do not feature any user-configurable parameters to test. - -
mae refers to table containing computed from Py-ART's implementation of (,
http://www.meteo.fr/cic/meetings/2012/ERAD/extended_abs/QPE_233_ext_abs.pdf ). The columns correspond to computed by varying the parameter Clpf. - -
vulpiani refers to a table containing computed from Py-ART's implementation of
10.1175/JAMC-D-10-05024.1 . The columns correspond to computed by varying the parameters windsize and n_iter. - -
pplp refers to table containing computed from Py-ART's implementation of
10.1175/JTECH-D-12-00147.1 . The columns correspond to computed by varying the parameter windowlen. - -
wradlib refers to table containing computed from radlib's implementation of
10.1175/JAMC-D-10-05024.1 . The columns correspond to computed by varying the parameters winlen and dr.
- -
The disdrometer dataset to obtain the DSD parameters can be accessed via the link provided in (,
Author contributions
MA conducted the investigation process, collected the data, and performed the formal analysis of the data and visualization; MA, SP, and DM designed the methodology; SP, AL, MK, and DM formulated the research goals and aims; AL and DM provided data; MA prepared the paper draft; and MA, SP, AL, MK, and DM reviewed, commented on, and edited the paper.
Competing interests
The contact author has declared that none of the authors has any competing interests.
Disclaimer
Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.
Acknowledgements
We thank Jenna Ritvanen for her valuable comments for improving the visualization of the data; we thank colleagues from the Early Career Community at the Finnish Meteorological Institute for feedback on how to make the paper understandable also for a general audience.
Financial support
Open-access funding was provided by the Helsinki University Library.
Review statement
This paper was edited by Gianfranco Vulpiani and reviewed by two anonymous referees.
© 2025. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.