1 Introduction
Microphysical processes in clouds involving ice particles contribute to major uncertainties in cloud formation, evolution and precipitation formation and, consequently, in the radiative properties associated with these clouds on both regional and global scales.
A major limitation remains the accurate phase partitioning between water and ice particles, which requires reliable discrimination of supercooled liquid cloud droplets from ice crystals in mixed-phase clouds (MPCs).
For instance, for the same water content, ice clouds are optically thinner than liquid water clouds and thus reflect less shortwave radiation back to space, while simultaneously trapping more longwave radiation in the Earth–atmosphere system owing to their lower cloud top temperatures.
A recent review of MPCs concluded that a major limitation in studying these clouds is the availability of instrumentation able to distinguish cloud droplets from ice crystals. For particles below (diameter), reliable measurements are especially scarce.
The operating principle of many cloud probe instruments, such as the Cloud Droplet Probe
Nevertheless, particle size reported by optical scattering instruments is often used to infer particle phase during continuous flow diffusion chamber (CFDC) studies.
To overcome such limitations, scattering probes with phase-discriminating capabilities have been deployed.
A common way to determine hydrometeor phase in CFDC studies involves polarization analysis.
Over the past two decades, many instruments analyzing spatially resolved scattering profiles have been built for the purpose of cloud particle detection.
A major limitation of these devices is their low frame rate, on the order of a few tens of particles per second, which limits the number of individual particles that can be sampled. This becomes problematic when hydrometeors of different phase are spatially inhomogeneously distributed within clouds. In addition, high frame rates are beneficial for laboratory studies, where particle number concentrations easily exceed a few tens of particles per cubic centimeter and low frame rates lead to coincidence errors even at moderate number concentrations.
To overcome the low frame rates of previous optical devices, linear complementary metal-oxide semiconductor (CMOS) arrays, rather than a CCD camera, have been deployed to investigate light scattering by asbestos fibers. Capturing less light scattering (pixel) information with two linear CMOS arrays, compared to the complete 2-D scattering pattern recorded by a CCD camera, allows for significantly higher detection rates, as less information has to be processed and stored. Recent advances in data analysis techniques from the field of machine learning (ML) provide powerful tools to handle large data sets. Rather than following a rule-based scheme to separate particles within a data set, ML techniques identify the parameters (called predictors) that categorize the data best. Hence, these methods offer considerable potential for the particle classification problem faced in MPC studies and can thus advance our understanding of the microphysical processes therein.
Here, we present a new instrument called the High-speed Particle Phase Discriminator (PPD-HS) to discriminate cloud hydrometeor phase.
Specifically, PPD-HS, developed by the University of Hertfordshire, UK, is designed to capture the spatial intensity distribution of light forward-scattered by airborne particles on two linear CMOS arrays on a particle-by-particle (PbP) basis.
Since the resulting scattering pattern is a function of particle size, shape and orientation with respect to the incident light, as well as of the polarization and wavelength of the incident light, some morphological features of the particle can be inferred; size and sphericity are of interest in this study.
To assess the performance of PPD-HS, characterization experiments have been carried out using laboratory-generated particles of known and well-defined size and geometry. Using this calibration data set, a random forest model is trained to classify particles detected by PPD-HS.
Finally, we examine the discrimination of simulated cloud hydrometeors as either ice crystals or cloud droplets from a set of experiments wherein PPD-HS was coupled to the horizontal ice nucleation chamber (HINC).
2 Description of PPD-HS
2.1 Overview and flow configuration
Particles are drawn into the instrument by an external pump through the inlet, a inner diameter stainless steel pipe, tapered to over a length of .
The resulting laminar flow has a parabolic velocity profile, causing elongated particles such as fibers, columns and needle-shaped ice crystals to be preferentially aligned along the flow axis.
2.2 Particle detection and sizing
Inside the scattering chamber, shown in Fig. a, particles are exposed to two laser beams, as illustrated in Fig. b. First, particles pass a continuous wave trigger laser beam ( diode laser, Optoelectronics Inc., part no. HL6501MG), which is used for particle detection and sizing. The beam is apertured to transform its initial Gaussian intensity profile into a top-hat profile. The beam is then expanded to a width of in the plane normal to the sample flow and focused at the intersection of the particle flow and the trigger laser to a depth of . As the sample flow, which has a diameter of approximately at the position of the trigger laser, lies wholly within the trigger laser beam, the so-called sensing volume of the trigger laser (gray dot on top of the trigger laser beam in Fig. a) is defined mechanically by the flow speed and inlet size. The focal depth of the trigger beam of constrains the size of particles that can be detected by PPD-HS to this diameter, as each particle needs to be contained within the focus of the trigger laser beam in order to cause a trigger and subsequently be recorded. Light scattered between approximately and to the laser beam axis (light orange shading) is collected by a spherical mirror (M1, Edmund Optics, part no. 43467) and focused onto a silicon photodiode (Edmund Optics, part no. 54-035). Over this angular range, which is dominated by sideways-scattered light, a near-monotonic relation between photodiode intensity and particle size is achieved.
Figure 1
Schematic of PPD-HS along with an illustration of its working principle. (a) Detailed top view of the optics, (b) simplified side view illustrating the vertical displacement of the lasers used for particle detection and sizing, as well as imaging, and (c) the corresponding signal and laser switching. L1 to L7 denote optical lenses, M1 a parabolic mirror, IL an imaging laser, TL a trigger laser and PD a photodiode. Light orange and brown shading in panels (a) and (b) correspond to light scattered by particles when passing the trigger laser beam and image laser beam, respectively, which is ultimately detected by the PD and the CMOS arrays.
[Figure omitted. See PDF]
2.3 Particle imaging
The intensity signal recorded by the photodiode is used to assess particle size and whether the particle is central to the sample flow. If the intensity of the scattered light meets a user-defined threshold, a trigger signal is sent to the image laser to initiate the subsequent scattering pattern acquisition. The image laser is vertically displaced from the trigger laser by approximately (Fig. b). The image laser is a pulsed , multimode diode laser (Optnext Japan Inc., part no. HL6388MG). The beam is focused at the intersection with the sample flow and apertured twice before reaching the scattering volume. The first aperture limits the width of the beam to approximately in order to reduce stray light and evens out the energy distribution across the beam, limiting the variation to approximately %, which minimizes erroneous classification of particles with trajectories close to the beam edge. The second aperture, in front of the scattering volume, further reduces stray light. Laser firing and duration times are adjustable by the user through the trigger delay and integration time (Fig. c), in order to account for the different total flow rates at which PPD-HS can be operated. For particles illuminated by the image laser, light scattered in the forward direction between and (vertical, azimuthal angle is ) is collected and passed through an optical assembly composed of a series of lenses (L2–L5) before impinging on the detector arrays (light brown shading, Fig. a). The lens assembly is designed such that a beam dump, mounted around the center axis of the image laser beam on the surface of L2, behind the sensitive volume, absorbs the direct laser beam. The two cylindrical lenses L2 and L3 compress the scattering pattern vertically and thus increase the range of elevation angles of scattered light detected by the CMOS arrays.
The lenses L4 and L5 reduce the size of the image independently in the horizontal and vertical planes, respectively, yielding the elliptical output image ultimately captured by the linear CMOS array system. The detector system is composed of two linear CMOS arrays (Hamamatsu Corp., Japan, model S9227) aligned vertically and spaced laterally and symmetrically about the center of the optical axis. CMOS arrays were chosen because they exhibit an almost linear response over a wide dynamic range, which is needed to cover a particle size range between and . At the same time, CMOS arrays offer both a high readout speed, allowing for high particle detection rates, and a small pixel size, producing high-resolution scattering patterns, with each array being composed of 512 vertically aligned, individually sensed pixels. The intensity signal recorded by each pixel is integrated during the activation of the image laser (integration time, Fig. c) and corrected by a background value (see Supplement Sect. S5.1).
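Because the image laser sits a fixed distance downstream of the trigger laser, the trigger delay essentially has to match the particle transit time between the two beams, and this transit time changes with the total flow rate. The Python sketch below estimates it from a volumetric flow and a jet diameter. It assumes a circular jet and uses the fact that a laminar parabolic (Poiseuille) profile has a centerline speed twice its mean speed; all function names and numbers are hypothetical illustrations, not PPD-HS settings.

```python
import math


def mean_velocity(flow_lpm: float, jet_diameter_m: float) -> float:
    """Mean flow speed (m/s) of a volumetric flow (L/min) through a circular cross section."""
    flow_m3_s = flow_lpm / 1000.0 / 60.0  # L/min -> m^3/s
    area_m2 = math.pi * (jet_diameter_m / 2.0) ** 2
    return flow_m3_s / area_m2


def transit_time(beam_separation_m: float, velocity_m_s: float) -> float:
    """Time (s) a particle needs to travel from the trigger beam to the image beam."""
    return beam_separation_m / velocity_m_s


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    v_mean = mean_velocity(flow_lpm=1.0, jet_diameter_m=1e-3)
    # Laminar parabolic profile: particles on the flow axis move at twice the mean speed,
    # so they arrive earlier than the mean-speed estimate suggests.
    v_center = 2.0 * v_mean
    separation_m = 0.5e-3  # hypothetical beam separation
    print(f"mean speed {v_mean:.1f} m/s, centerline speed {v_center:.1f} m/s")
    print(f"transit time at centerline speed: {transit_time(separation_m, v_center) * 1e6:.1f} us")
    print(f"transit time at mean speed: {transit_time(separation_m, v_mean) * 1e6:.1f} us")
```

Doubling the flow rate halves the transit time, which is why the delay has to be adjustable.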
The example in Fig. a illustrates an interference pattern that is radially symmetric around the center line of the optical axis, composed of concentric intensity maxima (white rings), as obtained from spherical particles such as liquid cloud droplets. The scattered light falling on the detector plane, indicated by the squares on top of Fig. a, represents a two-dimensional projection of the three-dimensional intensity distribution of the light scattered by a particle in the near-forward direction. The light intensities recorded by the two linear detector arrays, in turn, represent one-dimensional strips of the complete two-dimensional scattering image (red and green pixels). In the case of a spherical particle, the light scattering information captured by the PPD-HS detectors is given by the intersection of the CMOS arrays with the diffraction fringes, resulting in a scattering pattern as illustrated in Fig. d, where the vertical axis covers the 512 pixels and the horizontal axis indicates the relative forward-scattered light intensity for the two CMOS arrays, shown in different colors for clarity. Symmetry evaluation of these scattering patterns is subsequently used to determine particle shape, with spherical and aspherical particles producing symmetric and asymmetric scattering patterns, respectively (see Sect. ). As particle phase is ultimately related to particle shape, we restrict the following discussion to particle shape.
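To make the reduction from a 2-D pattern to two 1-D strips concrete, the following toy example builds a synthetic image of concentric cos² fringes (a stand-in, not a Mie calculation) and reads out two vertical columns placed symmetrically about the center. For the radially symmetric image, both strips are identical and mirror-symmetric about their midpoint. An optional shear term, a crude hypothetical stand-in for a non-spherical scatterer, breaks both symmetries. The function names and the 513-pixel grid (chosen so that an exact center pixel exists) are assumptions of this sketch.

```python
from typing import Tuple

import numpy as np


def toy_ring_pattern(size: int = 513, ring_period: float = 40.0, shear: float = 0.0) -> np.ndarray:
    """Synthetic forward-scattering image with concentric cos^2 fringes.

    shear != 0 distorts the rings (crude stand-in for an aspherical scatterer).
    """
    y, x = np.mgrid[0:size, 0:size].astype(float)
    c = (size - 1) / 2.0  # exact center pixel for odd sizes
    r = np.hypot(x - c, (y - c) + shear * (x - c))
    return np.cos(np.pi * r / ring_period) ** 2


def linear_array_strips(image: np.ndarray, offset: int) -> Tuple[np.ndarray, np.ndarray]:
    """Read two vertical 1-D strips located symmetrically at +/- offset from the center column."""
    c = image.shape[1] // 2
    return image[:, c - offset], image[:, c + offset]


if __name__ == "__main__":
    s1, s2 = linear_array_strips(toy_ring_pattern(), offset=60)
    print("symmetric pattern, strips identical:", np.array_equal(s1, s2))
    a1, a2 = linear_array_strips(toy_ring_pattern(shear=0.5), offset=60)
    print("sheared pattern, strips identical:", np.allclose(a1, a2))
```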
All data on optical particle size and scattering patterns are collected for individual particles and thus allow for quantifying optical properties on a PbP basis.
2.4 Electronic data acquisition configuration
Two separate sets of electronics read out the CMOS data. The first performs rapid analysis of the CMOS data and provides the user with real-time feedback on different scattering pattern parameters, particle size and the instrument settings.
The second board collects and stores the raw data, i.e., the intensity values of the individual pixels of the detector arrays, for subsequent offline processing and analysis.
Both boards are built around a field-programmable gate array (FPGA).
The reduction of scattering information, along with the implementation of fast electronics, is key to the high detection rate of PPD-HS (see Supplement Table S1), with CMOS dead time being the rate-limiting factor (see Supplement Sect. S5.3).
We note that the minimum dead time of PPD-HS ( ) is relatively high compared to that of, e.g., SID-2H.
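The interplay between particle arrival rate and dead time, which drives both the coincidence problem and the rate limit discussed above, can be estimated with a simple counting model. The sketch below assumes a non-paralyzable dead time (each accepted particle blocks the detector for a fixed time τ, regardless of further arrivals) and Poisson-distributed arrivals. These are modeling assumptions, not a description of the PPD-HS electronics, and the example numbers are hypothetical.

```python
import math


def arrival_rate(number_conc_per_cm3: float, sample_flow_cm3_per_s: float) -> float:
    """Mean particle arrival rate (1/s) = number concentration x sampled volume flow."""
    return number_conc_per_cm3 * sample_flow_cm3_per_s


def recorded_rate(true_rate_per_s: float, dead_time_s: float) -> float:
    """Recorded rate of a non-paralyzable detector: m = n / (1 + n * tau)."""
    return true_rate_per_s / (1.0 + true_rate_per_s * dead_time_s)


def prob_arrival_during_dead_time(true_rate_per_s: float, dead_time_s: float) -> float:
    """Probability that at least one further particle arrives within tau after a trigger,
    assuming Poisson (random) arrivals."""
    return 1.0 - math.exp(-true_rate_per_s * dead_time_s)


if __name__ == "__main__":
    n = arrival_rate(number_conc_per_cm3=50.0, sample_flow_cm3_per_s=5.0)  # hypothetical
    tau = 1e-3  # hypothetical dead time (s)
    m = recorded_rate(n, tau)
    print(f"arrivals {n:.0f}/s, recorded {m:.0f}/s, lost fraction {1 - m / n:.2f}")
```

Under these assumptions, 250 arrivals per second with a 1 ms dead time yield 200 recorded particles per second, so one in five particles is missed.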
The readout of the real-time electronics is triggered by an analogue circuit detecting an intensity peak from the photodiode (see Fig. 1c). The peak value is stored for later conversion to particle size and is also used to control the pulsing of the image laser, as described above. When the real-time electronics begin reading out the CMOS arrays, they signal the raw-data electronics to read out simultaneously. The raw-data board is not connected to the photodiode, so its output does not contain any information on particle size (see Supplement Sect. S4.1). If either set of electronics is still processing when another particle arrives, that board does not read out the corresponding trigger. Consequently, the particle information recorded by the two electronics boards is complementary but not identical.
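The statement that the two boards record complementary but non-identical particle sets can be illustrated with a toy simulation. This sketch follows one reading of the description above: the real-time board sees every photodiode trigger, forwards a trigger to the raw-data board only when it starts a readout, and each board skips triggers that arrive while it is busy. The busy times, trigger times and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def accept_triggers(trigger_times: Sequence[float], busy_time: float) -> List[int]:
    """Indices of triggers a board reads out, skipping any trigger that arrives
    while the board is still busy with its previous readout."""
    accepted: List[int] = []
    busy_until = float("-inf")
    for i, t in enumerate(trigger_times):
        if t >= busy_until:
            accepted.append(i)
            busy_until = t + busy_time
    return accepted


def two_board_readout(trigger_times: Sequence[float], rt_busy: float,
                      raw_busy: float) -> Tuple[List[int], List[int]]:
    """Real-time board sees every trigger; the raw-data board only receives the
    triggers forwarded by the real-time board when it starts a readout."""
    rt = accept_triggers(trigger_times, rt_busy)
    forwarded = [trigger_times[i] for i in rt]
    raw = [rt[j] for j in accept_triggers(forwarded, raw_busy)]
    return rt, raw


if __name__ == "__main__":
    times_ms = [0.0, 0.5, 1.2, 1.6, 3.0]  # hypothetical trigger times
    rt_ids, raw_ids = two_board_readout(times_ms, rt_busy=1.0, raw_busy=1.5)
    print("real-time board:", rt_ids, "raw-data board:", raw_ids)
```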
2.5 Phase discrimination indicators
Figure d–f show examples of CMOS array data for individual airborne particles of different shapes.
Figure 2
(a–c) Example 2-D scattering patterns captured by SID3, showing (a) a cloud droplet, (b) an ice crystal and (c) a columnar ice crystal. The forward scattering pattern of the cloud droplet reveals rotational symmetry, while those of the ice crystal and the fiber do not. The pixels of the CCD camera are schematically indicated by the gray squares on top of panel (a). The reduced information captured by the linear CMOS arrays of PPD-HS is highlighted by the red (array 1) and green (array 2) squares, respectively. Panels (d)–(f) show scattering data from the linear CMOS arrays, showing relative forward-scattered light intensity for the individual pixels of an array for (d) a spherical particle, (e) an aspherical particle and (f) a fiber-like particle. Images (a)–(c) provided by Chris Stopford. Please see figures below for further scattering patterns. Scattering patterns of spherical particles, where multiple diffraction fringes are detected by the linear CMOS arrays, can be found in, for example, Supplement Fig. S20.
[Figure omitted. See PDF]
As can be seen from Fig. a, the rotational symmetry of a spherical particle translates into a nearly perfect azimuthal symmetry in the 2-D scattering pattern and thus, in the CMOS array data, into a nearly perfect alignment of the intensity peaks across the two arrays, as well as symmetry around the center line of each array. In contrast, a more complexly shaped, aspherical particle produces a jagged scattering pattern, showing multiple randomly arranged peaks with little overall symmetry among these peaks, both within one array and between the two arrays (Fig. b). Finally, needles and columnar particles are characterized by very distinct scattering patterns composed of a single sharp peak along each array, as shown in Fig. c. Here, fiber-like particles were used to allow for discrimination between columnar ice crystals and (hexagonal) plates, both of which belong to the aspherical particle class. For particle shape evaluation and the associated phase classification, a combination of different indicators is calculated from these scattering patterns, as described in the following.
Similar to previous studies, we calculate a peak-to-mean (PTM) ratio as follows:

$$\mathrm{PTM}_i = \frac{I_{\max,i}}{\bar{I}_i}, \tag{1}$$

where $I_{\max,i}$ denotes the maximum intensity recorded by a pixel on array $i$ and $\bar{I}_i$ the mean scattering intensity along that array. Fiber-like particles, characterized by an intense and narrow peak on both arrays (Fig. c), will yield relatively large PTM values, whereas values for spheres and aspherical (non-fiber-like) particles will be significantly lower, as their intensity patterns cover a larger pixel range. It should be noted that an asphericity factor as used in previous studies cannot be calculated because of the linear geometry of the PPD-HS detector arrays. Such an asphericity factor measures the variation in scattered light intensity detected by the pixels across a circular detector array; for highly symmetric scatterers, i.e., spherical particles producing ring-shaped interference patterns, it should theoretically be zero, whereas more aspherical particles yield larger values. For our linear detector system, we developed a new shape indicator, called top-to-bottom comparison (TBC), that can be used to investigate the symmetry of a scattering pattern along one array. TBC sums the absolute differences of equidistant pairs of pixels around a midpoint pixel, normalized by the maximum intensity. Mathematically, the TBC is defined as follows:

$$\mathrm{TBC}_i = \frac{1}{2N} \sum_{n=1}^{N} \frac{\left| I_{i,m_i+n} - I_{i,m_i-n} \right|}{I_{\max,i}}, \tag{2}$$

where the subscript $m_i$ denotes the midpoint pixel of CMOS array $i$ and $n$ the pixel number relative to this midpoint pixel. The number of pixels on either side of the midpoint that are considered for the TBC calculation is given by $N$, so that $2N$ pixels enter the calculation in total. Perfectly spherical scatterers would also yield TBC values of zero (neglecting electrical noise), whereas aspherical particles such as ice crystals yield larger TBC values. Normalization to the maximum intensity is needed to compare the TBC of hydrometeors of different (physical) sizes, which produce light scattering patterns of different absolute intensities (see, for instance, the intensity values in Fig. d–f).
At the same time, normalization to the total number of pixels considered for the TBC calculation ($2N$) ensures comparability of TBC values obtained with different midpoint configurations, i.e., pixel information content. The TBC values of both arrays can moreover be used to derive further sphericity parameters, e.g., the ratio of the two TBCs or their absolute difference, which can improve particle classification.
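A minimal NumPy sketch of the PTM and TBC indicators, following the verbal definitions above: PTM is the array maximum divided by the array mean, and TBC sums the absolute differences of pixel pairs mirrored about a midpoint, divided by the array maximum and by the number of pixels entering the comparison (2N for N pixels on each side). The function names and the exact normalization are assumptions of this sketch, and the intensities are taken to be background-corrected.

```python
import numpy as np


def ptm(intensity: np.ndarray) -> float:
    """Peak-to-mean ratio of one background-corrected CMOS array."""
    return float(intensity.max() / intensity.mean())


def tbc(intensity: np.ndarray, midpoint: int, n_side: int) -> float:
    """Top-to-bottom comparison of one array: sum of |I[m+n] - I[m-n]| for n = 1..n_side,
    divided by the array maximum and by the 2 * n_side pixels involved."""
    if midpoint - n_side < 0 or midpoint + n_side >= intensity.size:
        raise ValueError("n_side pixels must fit on both sides of the midpoint")
    offsets = np.arange(1, n_side + 1)
    upper = intensity[midpoint + offsets]
    lower = intensity[midpoint - offsets]
    return float(np.abs(upper - lower).sum() / intensity.max() / (2 * n_side))


if __name__ == "__main__":
    symmetric = np.array([1.0, 2.0, 5.0, 2.0, 1.0])
    lopsided = np.array([1.0, 2.0, 5.0, 3.0, 1.0])
    print(ptm(symmetric), tbc(symmetric, 2, 2), tbc(lopsided, 2, 2))
```

With background-corrected data, an array whose mean is negative yields a negative PTM, which is one way such particles can be flagged.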
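Similar to the TBC, the scattering patterns captured by the two CMOS arrays can also be compared directly with each other. We therefore calculate the so-called array intercomparison (AIC) indicator as follows:

$$\mathrm{AIC} = \frac{1}{2(2N+1)} \sum_{n=-N}^{N} \left| \frac{I_{1,m_1+n}}{I_{\max,1}} - \frac{I_{2,m_2+n}}{I_{\max,2}} \right|, \tag{3}$$

where $I_{i,m_i+n}$ is the intensity of the pixel at position $n$ relative to the midpoint pixel $m_i$ of array $i$, and $I_{\max,i}$ the maximum intensity on that array.
Again, an equal number of pixels is compared around the midpoint pixels, but now across the two CMOS arrays, with $N$ pixels on either side of each midpoint, so that the total number of pixels considered for the calculation of AIC is $2(2N+1)$. For spherical particles, the relative intensities captured by both arrays should theoretically cancel out, whereas aspherical particles, whose intensity peaks are randomly distributed along the CMOS arrays, yield larger values.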
It should be noted that the midpoints do not necessarily coincide with the physical center (pixel 256) of the CMOS arrays but were found to be and for array 1 and array 2, respectively (see Appendix ).
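The AIC indicator can be sketched in the same way. The version below normalizes each array by its own maximum, compares the two arrays pixel by pixel over offsets −N to +N around their respective midpoints and divides by the 2(2N+1) pixels involved. This normalization, the function name and the midpoint values used in the example are assumptions; the empirically determined midpoints are given in the appendix. Passing a separate midpoint per array lets the comparison account for arrays whose effective centers do not fall on pixel 256.

```python
import numpy as np


def aic(array1: np.ndarray, array2: np.ndarray, mid1: int, mid2: int, n_side: int) -> float:
    """Array intercomparison: compare max-normalized intensities of the two arrays
    pixel by pixel over offsets -n_side..+n_side around their own midpoints."""
    for arr, mid in ((array1, mid1), (array2, mid2)):
        if mid - n_side < 0 or mid + n_side >= arr.size:
            raise ValueError("n_side pixels must fit on both sides of each midpoint")
    offsets = np.arange(-n_side, n_side + 1)
    rel1 = array1[mid1 + offsets] / array1.max()
    rel2 = array2[mid2 + offsets] / array2.max()
    # 2 arrays x (2 * n_side + 1) pixels enter the comparison
    return float(np.abs(rel1 - rel2).sum() / (2 * (2 * n_side + 1)))


if __name__ == "__main__":
    a1 = np.array([1.0, 2.0, 4.0, 2.0, 1.0])
    print(aic(a1, 2.0 * a1, 2, 2, 2))  # same shape, different brightness
    print(aic(a1, np.array([1.0, 4.0, 4.0, 1.0, 1.0]), 2, 2, 2))
```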
3 Methods
3.1 Experimental setup
3.1.1 PPD-HS calibration measurements
Different types of particles were used for PPD-HS calibration experiments. A vibrating orifice aerosol generator (VOAG, TSI Inc., model 3450) was used to produce almost monodisperse populations of both spherical and aspherical particles as proxies for cloud droplets and ice crystals, respectively (see Supplement Fig. S12). The experimental setup is shown in Fig. .
In a VOAG, a solution pumped at a constant flow rate through a micrometer-sized metal orifice forms a steady cylindrical liquid jet. The assembly holding the orifice vibrates at a constant frequency, which periodically breaks up the liquid jet into solution droplets of uniform mass.
In an evaporation column downstream of the VOAG, the solvent evaporates, resulting in equally sized solute particles.
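The droplet and particle sizes produced by a VOAG follow from mass conservation. If one droplet is formed per vibration cycle, each droplet has volume Q/f for liquid feed rate Q and frequency f, and after complete evaporation of the solvent the particle volume shrinks by the volumetric solute fraction C. A short sketch of these two relations follows; it assumes complete evaporation and a compact, void-free residue (for NaCl, the result is therefore a volume-equivalent diameter), and the function names and numbers are illustrative.

```python
import math


def droplet_diameter(liquid_flow_m3_s: float, frequency_hz: float) -> float:
    """Initial droplet diameter (m): each vibration cycle cuts one droplet of volume Q/f."""
    droplet_volume = liquid_flow_m3_s / frequency_hz
    return (6.0 * droplet_volume / math.pi) ** (1.0 / 3.0)


def dry_particle_diameter(droplet_diameter_m: float, solute_volume_fraction: float) -> float:
    """Volume-equivalent diameter (m) left after complete solvent evaporation,
    assuming the non-volatile solute forms a compact particle."""
    return droplet_diameter_m * solute_volume_fraction ** (1.0 / 3.0)


if __name__ == "__main__":
    d_drop = droplet_diameter(liquid_flow_m3_s=math.pi * 1e-11, frequency_hz=6e4)
    d_dry = dry_particle_diameter(d_drop, solute_volume_fraction=1e-3)
    print(f"droplet {d_drop * 1e6:.2f} um -> particle {d_dry * 1e6:.2f} um")
```

Because the dry size scales with the cube root of C, a thousandfold dilution shrinks the particle diameter only tenfold, which is why both the frequency and the solute-to-solvent ratio are used to tune size.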
In order to produce spherical particles, solutions of 2-propanol and polyethylene glycol (PEG-400, Sigma Aldrich, BioUltra 400, hereafter referred to as PEG) were used.
The low vapor pressure of PEG at room temperature ( ) allows for production of uniform droplet sizes. PEG has a refractive index of at , and we assume a shape factor of unity.
For the production of aspherical particles, mixtures of 2-propanol, Milli-Q water and NaCl were used, resulting in the formation of solid salt crystals upon evaporation of the solvents.
The VOAG was operated with a and diameter orifice, a dispersion air flow rate of and a dilution air flow rate of , using particle-free compressed air.
Adjusting the vibration frequency and the solute-to-solvent ratio allows the production of different particle sizes. Here, we generated particles with diameters between approximately 3 and 32 µm, covering the typical size range of freshly formed cloud particles, where phase discrimination remains an ongoing issue. At the same time, this range covers the size range of hydrometeors typically formed within CFDCs.
Figure 3
Experimental setup used for PPD-HS calibration measurements. A VOAG was used to generate both spherical and aspherical particles of different sizes. An APS was operated in parallel to PPD-HS, in order to monitor the particle size distribution (see example size distribution in Supplement Fig. S12). A vacuum pump, attached downstream of PPD-HS, is used to draw air through PPD-HS.
[Figure omitted. See PDF]
3.1.2 PPD-HS coupled to HINC
A series of cloud chamber experiments was performed to simulate liquid, mixed-phase and cirrus cloud conditions.
Ice crystals and cloud droplets are generated in HINC, which is operated upstream of PPD-HS. HINC is a CFDC and has recently been described in detail elsewhere.
Experiments using HINC were conducted using two different aerosol species.
In a first set of experiments particles were used, atomized from a solution and size-selected for a mobility diameter of using a differential mobility analyzer (DMA, TSI Inc., classifier model 3080, with a 3081 column and a polonium radiation source) and an aerosol-to-sheath flow ratio of .
aerosol was used in order to produce cloud droplets at and RH %, as well as ice crystals at where homogeneous freezing rates are expected to cause significant freezing.
These experiments provide droplet-only and ice-only cases in order to cross-check the shape discrimination capability of PPD-HS.
In a second set of HINC experiments, we used illite NX (Arginotec, NX, Nanopowder), a mineral dust aerosol, dry-dispersed with a fluidized bed aerosol generator (FBAG, TSI Inc., model 3400A) and DMA size-selected to , with an aerosol-to-sheath flow ratio of . Illite NX is widely used in ice nucleation experiments.
3.2 Particle shape classification: supervised machine learning
Particle shape is ultimately inferred by classifying the PbP data with a supervised ML approach. Supervised ML is frequently used for laboratory data sampled under controlled conditions, where the correct class of each particle is known from the particle production process. In general, supervised ML algorithms assign new, unknown data to predetermined classes through similarity analysis between the new data and the data comprising the a priori determined target classes. In the case of PPD-HS, the target classes are spherical particles (cloud droplets) and aspherical particles (ice crystals). These target classes therefore require training data that cover each class. We generated calibration data sets for each of these classes using the VOAG experimental setup described in Sect. . Training of the random forest model was constrained to the PEG and NaCl particles generated by the VOAG, which we assume to have shape properties similar to those of spherical cloud droplets and aspherical ice crystals, respectively. Moreover, the vertical alignment of the VOAG setup allowed us to cover a larger size range without changing the experimental conditions than would have been possible with the horizontal CFDC setup. Finally, because the HINC-PPD-HS data were excluded from model training, they provide independent data for testing classification model performance.
The supervised ML approach used to classify the PPD-HS PbP data is a random forest model, based on the TreeBagger algorithm of MATLAB (MathWorks Inc., R2018b).
A random forest model is an ensemble of decision trees whose predictions are combined by averaging over the individual trees. Decision tree approaches have previously been used specifically for cloud particle classification.
In an individual decision tree, a matrix of feature vectors encompassing the PbP data is used, and the tree is grown by considering all possible splits across all predictors (variables) and choosing the predictor that divides the data best, i.e., that maximizes the separation of particle classes within the entire phase space. However, individual decision trees tend to over-fit the data and are thus often too specific to the training data used; i.e., they do not cover the entirety of particle types that might be collected under conditions not encountered in the training data set, the latter being a general disadvantage of supervised ML approaches. Random forest models, i.e., ensembles of multiple decision trees, on the other hand, relax the over-fitting problem and achieve a higher level of generalization, owing to the random statistical methods used for model construction. Contrary to basic decision trees, only a random subset of predictors is considered at each decision node within a tree. In addition, only a random subset (bootstrap sample) of the entire training data set, obtained by sampling with replacement, is used to grow each decision tree during training, which allows individual particles to be selected multiple times or not at all. Thus, randomness is introduced both through choosing the predictors used for decision splits and through bagging the training data. In fact, each tree of the random forest ensemble constitutes an independently trained model, grown on an independently drawn sample of the same size as the entire training data set. The unsampled particles of the training data set are referred to as out-of-bag (OOB) observations. These OOB observations ensure that training and testing are not performed on the same particle data, allowing for model cross-validation, i.e., estimation of the classification error of the model.
This is achieved by predicting the particle class of these unsampled data and comparing it with the true particle label (see Supplement Sect. S7). In our model, each particle is assigned a class by each tree within the random forest, and the final particle type prediction is the type most frequently chosen, with the predictions of all trees weighted equally.
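The randomization ingredients described above (bootstrap sampling with replacement, out-of-bag observations, random predictor subsets at each split and an equal-weight majority vote) can be illustrated in a few lines of NumPy. This sketch is not MATLAB's TreeBagger and omits the actual tree fitting; it only shows the sampling and voting logic, and all function names are hypothetical. For a bootstrap sample of the same size as the training set, the expected OOB fraction approaches (1 − 1/n)^n ≈ e^(−1) ≈ 36.8 % for large n.

```python
import numpy as np


def bootstrap_sample(n_obs: int, rng: np.random.Generator) -> np.ndarray:
    """Indices of one bootstrap sample: n_obs draws with replacement."""
    return rng.integers(0, n_obs, size=n_obs)


def out_of_bag_mask(n_obs: int, in_bag: np.ndarray) -> np.ndarray:
    """True for observations never drawn into the bootstrap sample (the tree's OOB set)."""
    mask = np.ones(n_obs, dtype=bool)
    mask[in_bag] = False
    return mask


def random_predictor_subset(n_predictors: int, n_per_split: int,
                            rng: np.random.Generator) -> np.ndarray:
    """Predictor indices that a single split is allowed to consider."""
    return rng.choice(n_predictors, size=n_per_split, replace=False)


def majority_vote(tree_predictions: np.ndarray) -> np.ndarray:
    """Equal-weight vote over trees. tree_predictions has shape (n_trees, n_particles)
    and holds non-negative integer class labels; ties go to the smaller label."""
    n_classes = int(tree_predictions.max()) + 1
    counts = np.apply_along_axis(np.bincount, 0, tree_predictions, minlength=n_classes)
    return counts.argmax(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    oob = out_of_bag_mask(100_000, bootstrap_sample(100_000, rng))
    print(f"OOB fraction: {oob.mean():.3f}")
    print(majority_vote(np.array([[0, 1, 1], [1, 1, 0], [0, 1, 0]])))
```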
To build the random forest model for the PPD-HS data, the calibration data sets of all target classes (particle shapes) were processed to calculate the PbP matrix of parameter values. Particle usability was checked using a minimum variance criterion of (see Appendix ). Particles not fulfilling this criterion were removed from the respective target class and classified as noise. In addition, particles were required to have a minimum mean intensity of and a positive PTM value along either array, in order to discard particles affected by bad CMOS backgrounds (see Supplement Sect. S5.1). Any PEG or NaCl particle produced as described in Sect. and fulfilling these usability criteria is defined as a spherical or aspherical particle, respectively, within the calibration data set, without further visual inspection of the scattering pattern. However, it should be noted that this approach does not filter out aspherical NaCl particles that produce rather symmetric scattering patterns and consequently low values of the symmetry parameters (see Supplement Fig. S16). This may explain, for instance, the overlap between the TBC distributions of the two particle classes (see Supplement Fig. S13a and b) and thus the potential misclassification of aspherical particles as spherical.
To identify which of the parameters calculated from the scattering pattern data, which describe the data cloud in an -dimensional parameter space, are best suited for determining particle shape, we performed a principal component analysis (PCA; see Appendix ). The PCA results revealed that , , and AIC yield the most robust shape analysis. We therefore trained our random forest model using a matrix of feature vectors constrained to , , and AIC, which we refer to as shape indicators in the following. Using these four predictors, we trained a random forest model on 400 000 randomly selected particles (training data set), consisting of equal fractions of spherical and aspherical particles, using a total of trees (see Supplement Sect. S7). These particles were randomly selected from the entire VOAG data set (see Supplement Table S1), covering the entire particle size range of 3–32 µm. Selecting a sufficiently large number of aspherical particles for training ensures that the statistical majority of them show TBC values different from those observed for spherical particles (see Fig. S13a and b). Classification performance was then tested on the remaining particle data (test data set, 4 371 162 particles, same size range), and the model was subsequently applied to simulated hydrometeors from our HINC experiments, where particle size is usually constrained to diameters .
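Drawing a class-balanced training set and keeping everything else for testing could look like the following sketch; the function name and the toy label vector are hypothetical. Because the test set is simply the remainder, it inherits the class imbalance of the full calibration data, which is why per-class accuracies matter when the model is evaluated.

```python
import numpy as np


def balanced_split(labels: np.ndarray, n_train: int, rng: np.random.Generator):
    """Draw n_train training indices with equal numbers per class (without replacement);
    all remaining indices form the test set."""
    classes = np.unique(labels)
    per_class = n_train // classes.size
    train_parts = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        if idx.size < per_class:
            raise ValueError(f"class {c!r} has only {idx.size} members")
        train_parts.append(rng.choice(idx, size=per_class, replace=False))
    train = np.concatenate(train_parts)
    test = np.setdiff1d(np.arange(labels.size), train)
    return train, test


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    labels = np.array([0] * 900 + [1] * 300)  # toy: 0 = spherical, 1 = aspherical
    train, test = balanced_split(labels, 400, rng)
    print("train per class:", np.bincount(labels[train]), "test per class:", np.bincount(labels[test]))
```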
4 Results and discussion
4.1 Particle sizing
As stated above, particle size is inferred from the signal recorded by the photodiode by relating the signal intensity to the scattering cross section of the particle using Mie theory, assuming spherical particle geometry and an isotropic refractive index. For spherical particles, the diameter can usually be calculated with reasonable accuracy. Aspherical particles, however, are ascribed a spherical equivalent size based on the photodiode intensity and thus can be incorrectly sized. In Fig. , we compare the optical particle size obtained by PPD-HS with the corresponding aerodynamic diameter from the APS for the calibration data sets obtained with the setup shown in Fig. , for particles within the APS size range (0–20 µm; see Supplement Table S1 for details). We find good agreement between the optical and aerodynamic sizes for spherical particles and, even though the relationship between optical and aerodynamic size is complicated by an unknown shape factor, reasonable agreement for aspherical particles. From these measurements we conclude that PPD-HS sizes particles with reasonable accuracy up to approximately , the range over which we can compare to our APS measurements.
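Part of the difference between optical and aerodynamic size can be understood from the standard relation between volume-equivalent diameter d_ve and aerodynamic diameter d_a in the continuum regime, d_a = d_ve √(ρ_p / (ρ_0 χ)), where ρ_p is the particle density, ρ_0 = 1000 kg m⁻³ and χ the dynamic shape factor. The sketch below neglects the slip correction, a reasonable approximation for micrometer-sized particles; the density and shape-factor values in the example are approximate literature values assumed for illustration, not values used in this study.

```python
import math

RHO_0 = 1000.0  # unit density (kg m^-3)


def aerodynamic_diameter(volume_equiv_diameter_m: float, particle_density_kg_m3: float,
                         dynamic_shape_factor: float = 1.0) -> float:
    """Aerodynamic diameter in the continuum limit (slip correction neglected):
    d_a = d_ve * sqrt(rho_p / (rho_0 * chi))."""
    return volume_equiv_diameter_m * math.sqrt(
        particle_density_kg_m3 / (RHO_0 * dynamic_shape_factor))


if __name__ == "__main__":
    # Assumed, approximate values: NaCl ~2165 kg/m^3 with a cube-like chi of ~1.08,
    # PEG-400 ~1125 kg/m^3 with chi = 1 (sphere).
    print(f"NaCl: {aerodynamic_diameter(10e-6, 2165.0, 1.08) * 1e6:.1f} um")
    print(f"PEG:  {aerodynamic_diameter(10e-6, 1125.0) * 1e6:.1f} um")
```

Under these assumptions, a 10 µm volume-equivalent NaCl particle has d_a ≈ 14.2 µm, whereas a PEG droplet of the same size has d_a ≈ 10.6 µm, so density and shape shift the aerodynamic size even when the optical size is similar.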
Figure 4
Particle sizing of PPD-HS and APS for spherical (PEG) and aspherical (NaCl) particles, showing the geometric mean of the optical diameter () determined by PPD-HS as a function of the geometric mean of the aerodynamic diameter () obtained from the APS, using the calibration setup shown in Fig. . Vertical and horizontal error bars indicate the geometric standard deviation of the optical and aerodynamic mean size, respectively. Data points outside the size range of the APS are not shown. A complete list of the individual data sets, including those covering sizes , is given in Supplement Table S1.
[Figure omitted. See PDF]
This is further supported by comparing the PPD-HS instrument response (AD) with the response predicted theoretically for the PEG particles using Mie theory, taking into account the optical geometry of PPD-HS (see Supplement Sect. S8). In Fig. we depict the final calibration curves for the particle types used here, showing instrument response as a function of particle diameter. It can further be seen that a maximum AD is reached for particles of approximately , yielding an upper size limit for particles to be detected and recorded by PPD-HS (detector saturation). However, it should be noted that the maximum particle size tested here was 32 µm and that the upper size limit of PPD-HS needs to be verified in future experiments.
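Converting a recorded AD value back into a diameter amounts to inverting the calibration curve over its monotonic part. The minimal sketch below uses linear interpolation on a hypothetical calibration table; AD values outside the calibrated range (for example, at detector saturation) are flagged as NaN rather than extrapolated. A real calibration would use the Mie-based curves described above, and the table values and function name here are invented for illustration.

```python
import numpy as np


def ad_to_diameter(ad_values, cal_ad, cal_diameter_um):
    """Invert a monotonic calibration curve AD(d) by linear interpolation.
    AD values outside the calibrated range return NaN instead of being extrapolated."""
    cal_ad = np.asarray(cal_ad, dtype=float)
    cal_d = np.asarray(cal_diameter_um, dtype=float)
    if not np.all(np.diff(cal_ad) > 0):
        raise ValueError("calibration AD must increase strictly with diameter")
    ad = np.asarray(ad_values, dtype=float)
    d = np.interp(ad, cal_ad, cal_d)
    outside = (ad < cal_ad[0]) | (ad > cal_ad[-1])
    return np.where(outside, np.nan, d)


if __name__ == "__main__":
    cal_ad = [100.0, 1000.0, 3000.0, 4000.0]  # hypothetical instrument response
    cal_d = [3.0, 10.0, 20.0, 30.0]           # hypothetical diameters (um)
    print(ad_to_diameter([550.0, 2000.0, 5000.0], cal_ad, cal_d))
```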
Figure 5
PPD-HS instrument response (AD) as a function of particle diameter for different particle types measured in this study. AD is a function of the particle scattering cross section and instrument properties such as photodiode sensitivity, signal amplification and laser power (see Supplement Sect. S8).
[Figure omitted. See PDF]
4.2 Particle shape classification: random forest model
In Fig. 6 we provide the classification results obtained when the trained random forest model is applied to the test data, encompassing both PEG and NaCl particles with diameters between 3 and 32 µm (see Table S1). The confusion matrix is derived by comparing the model predictions with the true particle type. In this matrix, the diagonal cells (green boxes) show the number of correctly classified particles and the corresponding percentage of the total number of particles in the test data set. The dark gray cell at the bottom right indicates the overall model accuracy, defined as the ratio of correctly predicted particles to the total number of particles classified by the model. We find a high overall model accuracy of %, i.e., a good discrimination of particle shape when applying our random forest model to the test data set.
However, when the classes of the test data set contain imbalanced numbers of particles, the overall model accuracy gives a biased picture, because the class with the largest number of members dominates the counting statistics. Since our calibration data set contains more spherical than aspherical particles, it is more meaningful to assess the model performance for each class separately, yielding a per-class accuracy. The per-class accuracies are given in the light gray cells of the bottom row and are defined as the number of correctly predicted particles divided by the true number of particles in a target class. For instance, 592 168 particles are correctly classified as aspherical, i.e., % of all particles belonging to the (true) aspherical target class are correctly classified by the model. A higher classification performance is achieved for spherical particles, with a per-class accuracy of %. We attribute the lower per-class accuracy of the aspherical class to small-scale aspherical features of some particles, which may result in calculated parameter values (TBC, , AIC) comparable to those of spherical particles. In addition, some of the misclassification can result from near-spherical NaCl particles within the training data set, as discussed above.
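To illustrate why per-class accuracies are more informative than the overall accuracy for an imbalanced test set, the sketch below computes both from a 2 × 2 confusion matrix laid out as in Fig. 6 (rows: predicted class, columns: true class). The counts are made up for illustration, and the balanced accuracy (mean of the per-class accuracies) is an additional summary metric that is not used in the text.

```python
import numpy as np

CLASSES = ("spherical", "aspherical")


def accuracy_summary(cm):
    """Overall, per-class and balanced accuracy.

    cm[i, j] = number of particles predicted as class i whose true class is j.
    """
    cm = np.asarray(cm, dtype=float)
    overall = np.trace(cm) / cm.sum()
    per_class_arr = np.diag(cm) / cm.sum(axis=0)   # correct / all true members (column sums)
    per_class = {c: float(v) for c, v in zip(CLASSES, per_class_arr)}
    balanced = float(per_class_arr.mean())
    return float(overall), per_class, balanced


# Hypothetical, strongly imbalanced test set: 910 spherical, 90 aspherical particles
cm = [[900, 30],
      [10, 60]]
overall, per_class, balanced = accuracy_summary(cm)
print(f"overall = {overall:.3f}, per class = {per_class}, balanced = {balanced:.3f}")
```

In this made-up example the overall accuracy of 0.96 hides the fact that only two thirds of the aspherical particles are recovered.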
Figure 6. Confusion matrix of the random forest model applied to the test data, i.e., the fraction of particles not used for model training, encompassing both spherical and aspherical particles with diameters between approximately 3 and 32 µm. The random forest model was trained using four predictor variables and 200 trees. The confusion matrix shows the true (i.e., target) class (columns) of the particles versus the predicted (i.e., output) class (rows). Colored cells: each cell gives the number of particles and the percentage of the total. Precision: the light gray column on the right-hand side indicates the percentages of all particles predicted to belong to each class that are correctly (green, true discovery) and incorrectly (red, false discovery) predicted. Per-class accuracy: the light gray cells in the bottom row give the percentages of all particles belonging to each class that are correctly (green, true positives) and incorrectly (red, false negatives) classified. Overall accuracy: the values in the bottom right cell give the overall model accuracy, i.e., the fraction of particles correctly (green) and incorrectly (red) predicted out of all particles classified by the model.
[Figure omitted. See PDF]
Another way to quantify the predictive power of the model is to evaluate its precision. Model precision is given by the values in the light gray column on the right-hand side, indicating the percentages of all particles predicted to belong to a class that are correctly (green) and incorrectly (red) predicted. We refer to the incorrectly predicted fraction as the false discovery rate (FDR). For instance, the FDR of the aspherical particle class is %; hence, out of all particles predicted to belong to the aspherical class, % are incorrectly classified. Because precision and FDR are conditioned on the predicted class rather than on the true class, the FDR directly provides an uncertainty estimate for our model predictions when classifying particles of unknown label. Assuming that the FDR does not depend on the data set, the number of misclassified particles in each class can be estimated from the total number of particles predicted to belong to that class.
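The FDR-based estimate in the last sentence reduces to multiplying each class's FDR by the number of particles predicted to belong to that class. A minimal sketch, assuming the confusion-matrix layout of Fig. 6 (rows: predicted, columns: true) and that the test-set FDR carries over unchanged to new data; all counts are hypothetical.

```python
import numpy as np


def false_discovery_rates(cm):
    """Per predicted class FDR = 1 - precision, with cm rows = predicted, columns = true."""
    cm = np.asarray(cm, dtype=float)
    precision = np.diag(cm) / cm.sum(axis=1)   # correct / all predicted members (row sums)
    return 1.0 - precision


def expected_misclassified(predicted_counts, fdr):
    """Estimated number of wrongly assigned particles per predicted class,
    assuming the FDR from the test set applies to the new data set."""
    return np.asarray(predicted_counts, dtype=float) * fdr


cm_test = [[900, 30],
           [10, 60]]
fdr = false_discovery_rates(cm_test)               # order: spherical, aspherical
wrong = expected_misclassified([5000, 2000], fdr)  # hypothetical predictions on new data
print(fdr.round(4), wrong.round(1))
```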
4.3 Coupled HINC-PPD-HS measurements
To further test the performance of PPD-HS, we coupled it to HINC for detection of simulated cloud hydrometeors. Changing the thermodynamic conditions within HINC allows the simulation of clouds containing only ice crystals, only supercooled liquid cloud droplets or a mixture of the two, akin to MPCs, depending on the aerosol, and RH used in HINC. In the following, we present the results of three experiments using and illite NX as seed aerosol, classified with the random forest model validated on the calibration particles.
Figure 7. Freezing experiments of aerosol using HINC at and a residence time . (a) RH (blue line, left-hand ordinate) along with at the HINC center line (orange line, right-hand ordinate), where aerosol particles are injected and hydrometeors are formed. (b) Classification of particles detected by PPD-HS (left-hand ordinate) using the random forest model (see Sect. ) along with the total number of detected particles (dashed orange line, right-hand ordinate). (c) Distribution of the maximum TBC and (d) the AIC of the PPD-HS scattering patterns. (e) PPD-HS optical particle size distribution obtained from the RT electronics. Vertical solid lines indicate the start and end times of the periods of interest.
[Figure omitted. See PDF]
Figure 8. Collection of example particle intensity patterns imaged with PPD-HS during period 1 (high misclassification) of the experiment shown in Fig. 7. All scattering patterns are shown on the same intensity scale ( a.u.). The background color of each scattering pattern indicates the class assigned by the random forest model: spherical (blue) or aspherical (yellow). The values on top of each panel give the TBC for array 1 and array 2 (first and second number, respectively) and the AIC.
[Figure omitted. See PDF]
Figure 9. As in Fig. 8 but for particles imaged with PPD-HS during period 2 of the experiment shown in Fig. 7.
[Figure omitted. See PDF]
In Fig. 7, we show the results of an experiment at using aerosol. Figure 7a shows the temporal evolution of and RH along the center line of HINC (where the aerosol is injected), representing a typical RH scan in a CFDC. We emphasize again that cloud particles in HINC nucleate on the injected seed aerosol and that supercooled liquid cloud droplets can only form once RH % is reached within the chamber at . Thus, the measurements in Fig. 7 illustrate the response of PPD-HS to a pure ice cloud, formed through homogeneous freezing of solution droplets formed by the particles. The experiment starts at low RH, when the inlet valve is opened and particles are introduced into HINC. As the RH in the chamber increases, the particles grow hygroscopically and form solution droplets. Ice crystals eventually form once homogeneous freezing conditions are exceeded, as indicated by the dashed red line in Fig. 7a; the gray shading around this line indicates the uncertainty in RH across the aerosol layer in HINC . At % (11:47 UTC1), where the number of particles detected by PPD-HS increases sharply (Fig. 7b), the first ice crystals formed via homogeneous freezing have grown large enough to be detected. The delay of the observed homogeneous freezing relative to the theoretical prediction from the water-activity-based homogeneous freezing parameterization of solution droplets by can likely be explained by the particles initially being below the detectable size of PPD-HS. Figure 7b further shows the particle type classification obtained when applying the random forest model to the ice nucleation data. In the early stages of the experiment, the classification is noisy, with strong fluctuations of the individual particle type fractions. After 11:49 UTC1, the majority of the particles are correctly classified as ice particles.
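The expected onset of homogeneous freezing can be estimated from a water-activity-based nucleation rate. As an illustration only, the sketch below uses the fit of Koop et al. (2000), log10 J = −906.7 + 8502 Δa_w − 26924 Δa_w² + 29180 Δa_w³ (J in cm⁻³ s⁻¹, valid for 0.26 < Δa_w < 0.34); whether this is the parameterization referred to in the text is an assumption. Here Δa_w is the difference between the droplet water activity (equal to the saturation ratio with respect to liquid water at equilibrium) and the water activity of a solution in equilibrium with ice at the same temperature. The droplet diameter and residence time are hypothetical inputs.

```python
import math


def log10_j_koop2000(delta_aw: float) -> float:
    """log10 of the homogeneous ice nucleation rate coefficient J (cm^-3 s^-1),
    using the Koop et al. (2000) polynomial in delta_aw (assumed parameterization)."""
    if not 0.26 <= delta_aw <= 0.34:
        raise ValueError("delta_aw outside the fitted range 0.26-0.34")
    return -906.7 + 8502.0 * delta_aw - 26924.0 * delta_aw**2 + 29180.0 * delta_aw**3


def freezing_probability(delta_aw: float, diameter_um: float, time_s: float) -> float:
    """Probability that a solution droplet freezes within time_s, P = 1 - exp(-J V t),
    treating nucleation as a volume-proportional Poisson process."""
    j = 10.0 ** log10_j_koop2000(delta_aw)
    radius_cm = 0.5 * diameter_um * 1e-4
    volume_cm3 = 4.0 / 3.0 * math.pi * radius_cm**3
    return -math.expm1(-j * volume_cm3 * time_s)  # accurate for small J*V*t


# Hypothetical 0.5 µm solution droplets with a 10 s residence time
for daw in (0.28, 0.29, 0.30, 0.31):
    print(daw, freezing_probability(daw, 0.5, 10.0))
```

With this fit, J increases by several orders of magnitude between Δa_w = 0.28 and 0.31, which is why homogeneous freezing appears as a fairly sharp threshold during an RH scan.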
We have highlighted two periods of this ice cloud experiment, marked by the vertical lines in Fig. 7, at the beginning and end of the experiment, where particle misclassification is high and low, respectively. Figures 8 and 9 show the corresponding scattering patterns of a random selection of particles in chronological order. The scattering patterns in Fig. 8 show that many ice particles produce symmetric scattering patterns, i.e., small, freshly nucleated ice crystals appear optically spherical. If the axis of a hexagonal ice crystal is perfectly aligned with the optical axis of the image laser and, at the same time, the crystal is oriented such that a symmetrical diffraction pattern falls on the detector arrays, low TBC and AIC values can result, which would cause the particle to be classified as spherical by our random forest model (see Supplement Fig. S1). However, while this can cause some ice crystals to produce symmetrical scattering patterns, it is unlikely to explain all of the optically spherical ice crystals observed at low RH in our experiments.
The optical properties of ice particles have been reported to depend on their formation process, with ice crystals nucleated from the vapor phase generally showing a higher degree of (optical) asphericity than liquid-origin ice crystals, for which smooth frozen droplets can form . While this would be in line with our homogeneous freezing experiment, where the solution droplets initially freeze into spherical frozen water droplets, we cannot exclude the formation of droxtals, i.e., frozen water droplets with faceted surfaces, or of other complex crystals whose asphericity cannot be resolved by our instrument.
Figure 10. Freezing experiments of aerosol using HINC at and a residence time . Panels and symbols as in Fig. 7.
[Figure omitted. See PDF]
However, similarly high spherical fractions were also found at low RH when pure ice clouds were formed heterogeneously on illite NX dust particles (not shown), i.e., for ice crystals nucleated from and grown in the vapor phase.
Thus, we attribute the observed misclassification to the small optical particle size at these low RH values, at which particle asphericity cannot be resolved by our instrument, consistent with the findings of , who report ice particle roundness to be mainly a function of particle size.
In fact, spherical ice particles, commonly reported for cirrus clouds
Nevertheless, we note that some particles are classified as aspherical even though their scattering patterns appear symmetrical, for instance, particles 39 and 46 in Fig. , whose symmetry parameters are comparable to those of particles classified as spherical (e.g., particle 29 in Fig. ). This likely results from the reduced information (shape indicators) used for particle classification by our random forest model. For instance, high TBC values are not exclusive to aspherical particles, as can be seen from the overlapping probability distributions of the TBC predictor shown in Supplement Fig. S13. This likely arises from spherical particles whose symmetric scattering patterns are slightly offset from the midpoint pixel (see Supplement Fig. S15). Conversely, we also observed some particles that appear symmetrical (low TBC value) on one of the detector arrays (see Supplement Fig. S16) but usually not on both (see Supplement Fig. S14). Thus, using the information from two independent arrays should largely avoid or reduce the misclassification of such particles. Overall, we cannot completely exclude artifacts within our training data sets, for instance, particles that produce symmetrical scattering patterns but are not removed by our usability criteria. As a consequence, some particles are misclassified, for example as aspherical, despite their overall symmetric scattering patterns (see above). This error could be reduced by visually inspecting every particle in the calibration data set and manually assigning its class prior to training the random forest model. However, this approach is time-consuming and impractical for the number of particles sampled, and it was not pursued here, given the overall good classification of spherical and aspherical particles from the VOAG.
This good overall classification results from the majority of the spherical and aspherical particles in the calibration data set differing distinctly in their symmetry parameters (see Supplement Fig. S13).
Figure 11. Freezing experiments of illite NX aerosol using HINC at and a residence time . Panels and symbols as in Fig. 7.
[Figure omitted. See PDF]
At approximately 11:51 UTC1 the aspherical fraction starts to dominate, consistent with the ice particles developing sufficient asphericity during diffusional growth as the RH within HINC increases and nonspherical features emerge. Using the corresponding size distribution from the RT electronics for the period around 11:51 UTC1 (Fig. 7e), when the majority of the particles are correctly classified as aspherical, we find that a minimum optical particle size of approximately is required to detect asphericity (see Supplement Fig. S19). After 12:00 UTC1, hardly any maximum TBC values below are observed (Fig. 7c; see also Supplement Fig. S13a–b), and the aspherical and spherical fractions remain almost constant at approximately % and %, respectively. The high aspherical fraction is supported by the asymmetric scattering patterns observed during period 2, depicted in Fig. 9.
In Fig. 10 we show the results from a pure supercooled liquid sample formed at within HINC using aerosol particles. Water-supersaturated conditions (Fig. 10a) are required to activate cloud droplets within HINC and grow them to sizes detectable by PPD-HS. As expected, no particles are detected by PPD-HS at RH %, because they should be below the detection limit. Figure 10b shows that the hydrometeors formed within HINC are correctly classified as spherical particles, consistent with the absence of freezing of the supercooled liquid cloud droplets formed on the at this temperature. While the high spherical fraction could partly result from the optical particle sizes being limited to below approximately (Fig. 10e), the low maximum TBC values (Fig. 10c), along with the symmetrical scattering patterns observed during the experiment (see Supplement Figs. S20 and S21), are consistent with the classification results from our random forest model and demonstrate the potential of using particle shape for phase analysis.
Finally, in Fig. 11 we show the results obtained when using illite NX aerosol particles within HINC to simulate MPC conditions at .
The first ice crystals are heterogeneously nucleated on the dust particles around 17:45 UTC1 at water subsaturated conditions, either through deposition nucleation or pore condensation and freezing
In the current PPD-HS configuration, the raw electronics are only triggered after the RT board has been triggered by the photodiode. Triggering both the RT and raw electronics directly from the photodiode would allow higher particle detection rates by the RT electronics, which is desirable for laboratory experiments with typically high particle number concentrations, and would at the same time allow the raw electronics data to include (optical) particle size information. The latter could be used to investigate the (a)sphericity of small ice crystals more closely and to enable a size-dependent particle classification by the random forest model. In PPD-HS, the raw electronics have the advantage of recording the complete scattering pattern, which allows any particle parameter to be calculated in a post-processing step, whereas the RT electronics could (theoretically) achieve particle detection rates higher than the approximately particles per second presented here but are limited to the parameters specified a priori (such as TBC, PTM, etc.). Furthermore, we have noted in Sect. that the CMOS dead time of PPD-HS is relatively long compared to similar devices. As a result, the fraction of missed particles is relatively high when sampling with PPD-HS at high particle number concentrations and high total flow rates, which has consequences for sampling atmospheric MPCs, where the ice fraction is (initially) low. Hence, upgrading to CMOS arrays with shorter dead times would be worthwhile in view of potential future field applications of similar devices. Nevertheless, such changes do not affect the capability of using the reduced CMOS array scattering data to successfully determine particle shape.
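The effect of detector dead time on the fraction of missed particles can be estimated with a simple non-paralyzable dead-time model, in which each recorded particle blocks the detector for a fixed time τ, so that the recorded rate is m = n / (1 + nτ) for a true arrival rate n. The sketch below is illustrative only: the model choice, the dead time and the flow rate are assumptions and not PPD-HS specifications.

```python
def arrival_rate_per_s(number_conc_per_cm3: float, flow_lpm: float) -> float:
    """Particle arrival rate (s^-1) for a number concentration (cm^-3) and flow (L min^-1)."""
    return number_conc_per_cm3 * flow_lpm * 1000.0 / 60.0


def missed_fraction(true_rate_per_s: float, dead_time_s: float) -> float:
    """Fraction of arriving particles lost by a non-paralyzable detector:
    recorded rate m = n / (1 + n * tau), so the missed fraction is n*tau / (1 + n*tau)."""
    x = true_rate_per_s * dead_time_s
    return x / (1.0 + x)


# Hypothetical: 6 cm^-3 sampled at 1 L/min with an assumed 5 ms dead time
n = arrival_rate_per_s(6.0, 1.0)  # 100 particles per second
print(n, missed_fraction(n, 5e-3))
```

In this example, a ten-fold higher concentration raises the missed fraction from one third to about 83 %, which illustrates why a rare ice population is easily undersampled at high total concentrations.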
Our coupled HINC-PPD-HS measurements are limited by the need for high supersaturation (or longer growth times) to form ice crystals and cloud droplets of similar size, combined with the horizontal orientation of the setup.
The fast growth kinetics for mean that cloud droplets quickly grow by diffusion to diameters , at which phase can be reliably determined by PPD-HS.
However, for the residence times used in the experiments presented herein, these particles are close to being lost by gravitational settling before reaching PPD-HS, which requires a trade-off between sufficient particle growth and losses due to gravitational settling. Such losses are circumvented by CFDCs with vertical orientation.
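The trade-off between growth and settling can be made concrete with the Stokes settling velocity, v_s = ρ_p g d² / (18 μ), which grows with the square of the droplet diameter. The sketch below evaluates v_s and the corresponding fall distance over a residence time; the Sutherland law for air viscosity, the neglect of the slip correction (reasonable for droplets of several micrometers), and the example temperature and residence time are our assumptions, not HINC specifications.

```python
import math

G = 9.81             # gravitational acceleration, m s^-2
RHO_WATER = 1000.0   # droplet density, kg m^-3


def air_viscosity(temp_k: float) -> float:
    """Dynamic viscosity of air (Pa s) from Sutherland's law."""
    mu0, t0, s = 1.716e-5, 273.15, 110.4
    return mu0 * (temp_k / t0) ** 1.5 * (t0 + s) / (temp_k + s)


def settling_velocity(diameter_m: float, temp_k: float, rho_p: float = RHO_WATER) -> float:
    """Stokes terminal settling velocity (m s^-1), without slip correction."""
    return rho_p * G * diameter_m**2 / (18.0 * air_viscosity(temp_k))


# Hypothetical residence time of 10 s at 243 K
for d_um in (5, 10, 15):
    v = settling_velocity(d_um * 1e-6, 243.0)
    print(f"{d_um:>3} µm: v = {v * 1e3:.2f} mm/s, fall in 10 s = {v * 10 * 1e3:.1f} mm")
```

Because droplets reach their final size only toward the end of their transit, these fall distances are upper bounds; the d² scaling nevertheless shows why larger droplets quickly approach the loss limit in a horizontal chamber.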
Finally, we have noted above that our random forest model is associated with a misclassification rate, with some symmetrical scattering patterns being classified as aspherical and vice versa (see Sect. and Fig. ). We have argued that this is a consequence of artifacts within the calibration data set (see Supplement Sect. S6.1), from which particles are randomly selected for training the classification algorithm. In future studies, this error could be reduced and the overall classification improved by manually cleaning the calibration data set prior to model training.
6 Conclusions
A major challenge in MPC analysis remains the discrimination between cloud droplets and ice crystals. Here, a new instrument, the High-speed Particle Phase Discriminator (PPD-HS), has been presented and characterized for sizing cloud particles and determining their phase, with the goal of quantifying the liquid and ice fraction under conditions relevant for MPCs.
PPD-HS captures the near-forward spatial intensity distribution of scattered light on a single-particle basis. Unlike previous devices such as the PPD-2K, which use CCD cameras to capture the complete 2-D scattering pattern, PPD-HS deploys two linear detector arrays that capture two 1-D strips of the complete scattering pattern. This reduction in the scattering data recorded and analyzed per particle, combined with fast electronics to process these data, allows particle detection rates of several hundred particles per second. Symmetry analysis of these 1-D scattering patterns is used to determine the shape of the scatterer, which in turn is used to discriminate between spherical cloud droplets and aspherical ice crystals. Here, we introduced new shape indicators, the top-to-bottom comparison (TBC) and the array intercomparison (AIC), which can be used to determine particle phase from symmetry analysis of the scattering patterns captured by the two linear CMOS arrays. We presented a systematic characterization of both particle sizing and phase determination in a well-controlled laboratory setup, which allows the generation of nearly monodisperse spherical and aspherical particle populations covering a size range of approximately 3–32 µm using a vibrating orifice aerosol generator. Supervised machine learning was applied to the laboratory-generated monodisperse calibration particles to train a random forest model. Applying the trained model to a test data set of similar particles, we demonstrated a high overall classification accuracy, with the model correctly classifying % of the particles. The classifier was subsequently used to classify simulated cloud hydrometeors sampled by PPD-HS in a set of CFDC experiments using mineral dust (illite NX) and salt () aerosol, for which the phase of the hydrometeors can be predicted from the thermodynamic conditions.
The results discussed in this paper show that, for a sample flow containing only ice crystals, our random forest model incorrectly classifies the majority of particles as droplets at early stages of the RH scan within the CFDC experiment, consistent with the symmetrical scattering patterns recorded during these experiments. We attribute this to small, optically spherical ice crystals formed within the CFDC. Thus, small ice crystals with diameters below approximately still remain a challenge for optical instruments such as PPD-HS. However, once RH has increased and the ice crystals have grown sufficiently, they are correctly recognized, yielding as the lower size limit for the phase discrimination capabilities of PPD-HS. The misclassification rate is substantially lower for a pure supercooled liquid cloud, for which our random forest model predicts spherical fractions of unity nearly throughout the experiment. This likely results from a less variable TBC distribution constrained to lower absolute values, although this result is likely also biased by the limited droplet sizes ( ) achievable within our horizontal setup. To our knowledge, these data are the first of their type to be recorded on linear CMOS arrays, showing successful discrimination of spherical and aspherical cloud particles.
To what extent PPD-HS can be used to determine the phase of atmospheric cloud particles remains to be investigated. It is clear from this study that PPD-HS successfully discriminates between cloud droplets and ice crystals for particles when used with a CFDC setup, rendering PPD-HS an alternative to the size threshold criterion usually used with OPCs.
Code availability
The code version used for data post-processing and analysis in this paper is written in MATLAB and is available upon request to the authors.
Data availability
The data presented in this publication are available at the following DOI: 10.3929/ethz-b-000313787 (Mahrt et al., 2019).
Appendix A Array midpoint determination
Determination of the midpoint of an array, i.e., the center pixel, is crucial for the correct calculation of shape indicators and thus ultimately for phase discrimination. The midpoint of each array is determined empirically by calibrating PPD-HS with spherical particles of uniform size. For this purpose, all particles within a data set of spherical particles are treated as one ensemble, and the mean TBC of this ensemble is calculated for different midpoint pixels () using Eq. () and a range of pixels around the physical array midpoint ().
The midpoint pixel is then chosen as the pixel yielding the lowest mean TBC, since spherical particles producing symmetrical scattering patterns should theoretically yield a TBC of zero. In Fig. A1, the mean TBC as a function of the midpoint pixel is shown separately for both arrays. The mean TBC of the different spherical data sets converges towards a minimum for a center pixel close to the physical array center. Using this method, we determined the midpoint pixels to be and for array 1 and array 2, respectively, and all data presented here are referenced to these midpoints. For reference, we include the mean TBC for our aspherical data sets (dashed gray line). For incorrect midpoint pixels (where the TBC is not minimized for spherical particles), spherical and aspherical particles can no longer be distinguished using the TBC, because the mean TBC of the aspherical particles (dashed gray line) is then smaller than that of the spherical particles (bold colored lines), as seen in Fig. A1.
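The midpoint search can be sketched as a scan over candidate pixels that minimizes the mean TBC of a spherical-particle data set. Because the TBC equation is not reproduced in this section, the sketch uses a hypothetical stand-in, the mean absolute difference between intensities mirrored about the candidate midpoint, normalized by the maximum intensity; the synthetic patterns, the 256-pixel array length and the candidate range are also assumptions.

```python
import numpy as np


def tbc_stand_in(intensity, midpoint, half_width):
    """Hypothetical symmetry metric (not the paper's equation): mean |I(m+k) - I(m-k)|
    for k = 1..half_width, normalized by the maximum intensity. It is zero for
    patterns that are perfectly symmetric about the candidate midpoint m."""
    i = np.asarray(intensity, dtype=float)
    k = np.arange(1, half_width + 1)
    return np.abs(i[midpoint + k] - i[midpoint - k]).mean() / i.max()


def find_midpoint(patterns, candidates, half_width):
    """Return the candidate pixel with the lowest mean metric and the full mean-metric curve."""
    curve = np.array([np.mean([tbc_stand_in(p, m, half_width) for p in patterns])
                      for m in candidates])
    return candidates[int(np.argmin(curve))], curve


# Synthetic symmetric patterns of three "particle sizes", all centred on pixel 130
x = np.arange(256)
patterns = [np.sinc((x - 130) / w) ** 2 for w in (8.0, 12.0, 20.0)]
best, curve = find_midpoint(patterns, list(range(118, 139)), half_width=100)
print("midpoint pixel:", best)
```

A fixed half-width is used for every candidate so that the metric values remain comparable across the scan.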
Figure A1. CMOS array midpoint determination from PPD-HS data of spherical particles: mean TBC across all particles within a data set of uniformly sized particles for different CMOS pixels considered as the midpoint in Eq. (), for (a) CMOS array 1 and (b) CMOS array 2. Both panels include the same data sets: the lightly shaded lines correspond to individual data sets (see Supplement Table S1) and the bold lines to the mean across all data sets. The dashed gray line corresponds to the mean TBC across all data sets of aspherical particles for reference.
[Figure omitted. See PDF]
Appendix B PPD-HS data processing and analysis
Using MATLAB, we developed routines to analyze data from PPD-HS. Each data set sampled by PPD-HS contains artifacts, which require a careful usability check prior to phase discrimination analysis. In Fig. B1 we show an exemplary set of scattering patterns for a data set of spherical PEG particles of uniform size. While most particles show scattering patterns with clear features, some show very low, noisy or almost absent peak intensities. Scattering patterns with a low signal-to-noise ratio are caused by small particles that scatter little light or by particles that miss the image laser beam completely, due to an expansion of the particle sample flow or a mismatch between trigger detection and pulsing of the image laser resulting from the (parabolic) velocity distribution of the particles. As an indicator of the intensity features recorded by the two arrays, we evaluate the variance of the intensity along each array:
B1
Low variance values are mainly associated with noisy scattering patterns. Usable scattering patterns with a high signal-to-noise ratio, in contrast, are characterized by a relatively high variance resulting from clear scattering features (distinct peaks) along each CMOS array. By comparing the variance distributions of spherical-particle data sets with those of pure electronic noise, recorded by the CMOS arrays when no flow, and thus no particles, were present in the scattering chamber, we empirically determined a minimum variance value of needed on each array for a pattern to be considered usable. A raw scattering pattern that does not fulfill this criterion is considered noise and is excluded from the determination of particle shape. Nevertheless, these noise patterns are included in the concentration analysis, without classifying the particle type. For the scattering patterns shown in Fig. B1, we indicate the values of TBC, AIC and variance of both arrays on top of each panel. Visual inspection reveals that particles 1, 3, 9 and 11 are not suitable for phase discrimination, consistent with their low variance values. Particle 13 shows weak scattering intensities compared to other particles (e.g., 2, 10 and 12) but can still be associated with a spherical particle and has variance values above on each array.
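The usability check amounts to a per-array variance threshold, with rejected patterns still counted for the number concentration. The sketch below assumes the population variance (the variance equation is not reproduced here) and uses a hypothetical threshold, since the empirically determined minimum variance is not given in this section; the synthetic patterns are also invented.

```python
import numpy as np


def split_usable(patterns, min_variance):
    """Split (array1, array2) intensity pairs into usable and noise patterns.

    A pattern is usable only if the intensity variance on *each* array reaches
    min_variance. Noise patterns are excluded from shape classification but are
    still counted for the particle number concentration.
    """
    usable, noise = [], []
    for idx, (a1, a2) in enumerate(patterns):
        if np.var(a1) >= min_variance and np.var(a2) >= min_variance:
            usable.append(idx)
        else:
            noise.append(idx)
    return usable, noise


x = np.arange(256)
peaked = 1000.0 * np.sinc((x - 128) / 10.0) ** 2   # clear scattering features
flat = np.full(256, 5.0)                             # featureless, noise-like signal
patterns = [(peaked, peaked), (peaked, flat), (flat, flat)]
usable, noise = split_usable(patterns, min_variance=50.0)  # hypothetical threshold
print("usable:", usable, "noise:", noise, "counted for concentration:", len(patterns))
```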
In Fig. B2 we show the effect of the scattering-pattern usability check, using a minimum variance criterion of , for a typical VOAG data set of spherical particles. The TBC probability density functions (PDFs) are constrained to lower TBC values when only particles with a minimum variance of on each CMOS array are considered (dotted lines), compared to when all particles of the data set are used (solid lines). For instance, for the data set shown in Fig. B2, most of the particles with TBC are removed after applying the minimum variance criterion. Furthermore, this example shows good agreement between the TBC distributions of the two CMOS arrays when the respective midpoints discussed above are used.
In addition to this variance criterion, other selection criteria can be applied to constrain the data, at the cost of reducing the size of the data set. This is particularly useful when a sample consists of multiple particle types. For instance, the fly ash suspension contained residuals that produced scattering patterns that could be associated with spherical particles, such as those displayed in Fig. a, and thus could not be unambiguously attributed to fibers (see Supplement Fig. S2). Fibrous particles can heuristically be separated from this data set by selecting only particles above a certain PTM threshold, i.e., particles with a certain aspect ratio. Similarly, to avoid any bias in the calculated particle parameters from high CMOS background intensities (see Supplement Sect. S5.1), we categorized as noise all particles with intensities below (see Supplement Sect. S5.2).
Figure B1. Example scattering patterns showing the relative scattered light intensity as a function of array pixel number for array 1 (green) and array 2 (red). Data are background-corrected and correspond to VOAG-generated PEG particles of . The values on top of each panel depict the TBC, AIC and variance, where the first number corresponds to array 1 and the second number to array 2.
[Figure omitted. See PDF]
Figure B2. Example comparison of TBC distributions of a VOAG data set of spherical particles ( ) when the minimum variance criterion of is (dotted) and is not (solid) applied. Green lines correspond to array 1 and red to array 2.
[Figure omitted. See PDF]
Appendix C Multivariate PbP data
C1 Principal component analysis
The intensity data from the CMOS arrays can be used to calculate a user-defined number of parameters for each particle in a post-processing step, leading to a multivariate data set in which each variable constitutes a dimension (degree of freedom) and each particle represents an observation. Here, we calculate a total of 11 variables from the scattering pattern data. In the phase space spanned by these variables, information is often correlated and thus redundant. The covariances of the variables describing such a data cloud need to be considered when using ML techniques to classify particles, in order to obtain robust classification results.
Here, we apply PCA to our multivariate data set. In general, PCA identifies the dominant modes of variability, which are mutually orthogonal and uncorrelated. During PCA, redundant information in the form of correlated variables is bundled by describing the PbP data using a set of new, linearly uncorrelated variables, the so-called principal components (PCs), which are linear combinations of the original variables, i.e., particle parameters (TBC, PTM, etc.). Usually, a few PCs are sufficient to explain a large fraction of the total variance, so that describing the data in a lower-dimensional subspace spanned by the dominant PCs is adequate. Here, we aim to reduce the dimensionality of the phase space describing the PPD-HS data prior to using supervised ML for particle shape classification, in order to identify variables that are informative about particle shape.
Mathematically, the PbP data are transformed into a new, orthonormal coordinate system spanned by the eigenvectors obtained from an eigenvalue decomposition of the variance–covariance matrix of the original PbP data matrix. That is, the eigenvectors are aligned along the symmetry axes of the data cloud, with the first eigenvector pointing in the direction of the largest data spread, the second eigenvector along the largest variability orthogonal to the first eigenvector, and so on, with the total variance being preserved by the coordinate transformation. To derive meaningful results, it is important to normalize the original PbP data matrix prior to performing PCA. For instance, for the size range covered here, the mean light intensity shows a larger variability than the TBC, since the latter is normalized to the maximum light intensity and is thus independent of physical particle size, whereas the mean light intensity is not. Without normalization, the variability in the mean light intensity would give this variable a stronger weight in the PCA than variables with less variability (such as the TBC). Therefore, a z-score standardization of the variables is performed here prior to PCA, subtracting the mean from each data value and dividing by the standard deviation, so that a similar emphasis is given to all phase-space directions in the PCA.
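The standardization and eigendecomposition steps described above can be sketched in NumPy as follows. The synthetic input is an assumption chosen to show the effect of standardization: the third variable has a much larger spread than the others and would dominate the first PC without z-scoring, whereas after standardization the first PC captures the two strongly correlated variables. The use of the sample (n − 1) standard deviation is also an assumption.

```python
import numpy as np


def standardized_pca(X):
    """PCA on z-scored variables via eigendecomposition of their covariance matrix.

    Returns eigenvalues (descending), coefficients (columns = PCs), explained
    variance fractions, scores, and the mean/std needed to transform new data.
    """
    X = np.asarray(X, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))  # symmetric matrix
    order = np.argsort(eigvals)[::-1]                           # largest spread first
    eigvals, coeff = eigvals[order], eigvecs[:, order]
    return {"eigvals": eigvals, "coeff": coeff, "explained": eigvals / eigvals.sum(),
            "scores": Z @ coeff, "mean": mean, "std": std}


rng = np.random.default_rng(0)
base = rng.normal(size=(500, 1))
X = np.hstack([base,
               2.0 * base + 0.1 * rng.normal(size=(500, 1)),  # strongly correlated with the first
               100.0 * rng.normal(size=(500, 1))])            # large but independent spread
pca = standardized_pca(X)
print(pca["explained"].round(3), np.cumsum(pca["explained"]).round(3))
```

The cumulative explained variance computed in the last line corresponds to the right-hand ordinate quantity of Fig. C1.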
Figure C1. Fraction of variance explained by each PC (left-hand ordinate), listed in Table , and cumulative explained variance (right-hand ordinate). PCA was performed on a subset of 10 000 particles, randomly selected from the entire calibration data set and consisting of equal numbers of spherical and aspherical particles.
[Figure omitted. See PDF]
In Fig. we show the variance explained by each PC. These values were obtained from a PCA of the standardized PbP data matrix of 10 000 particles randomly selected from the entire calibration data set, with equal numbers of spherical and aspherical particles (the two target classes). The first four PCs, listed in Table , describe over % of the total variance, i.e., they correspond to the largest eigenvalues.
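The class-balanced subsampling and the per-PC and cumulative explained variance shown in Fig. C1 could be computed along the following lines. This is a sketch under stated assumptions. The function names and the label coding (0 = spherical, 1 = aspherical) are hypothetical. The eigenvalues in the usage example are illustrative and are not the values obtained for the PPD-HS data.

```python
import numpy as np


def balanced_subsample(labels, n_total, rng):
    """Return indices of n_total particles drawn without replacement,
    with the same number of particles from each class."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    per_class = n_total // classes.size
    picks = [rng.choice(np.flatnonzero(labels == c), size=per_class, replace=False)
             for c in classes]
    return np.concatenate(picks)


def explained_variance(eigvals):
    """Fraction of the total variance per PC and its cumulative sum."""
    eigvals = np.asarray(eigvals, dtype=float)
    fraction = eigvals / eigvals.sum()
    return fraction, np.cumsum(fraction)


# Usage: an imbalanced pool (300 spherical, 700 aspherical) is subsampled
# to 100 particles per class.
rng = np.random.default_rng(42)
labels = np.array([0] * 300 + [1] * 700)
idx = balanced_subsample(labels, 200, rng)
fraction, cumulative = explained_variance([4.0, 3.0, 2.0, 1.0])
```

Balancing the classes before the PCA keeps the more abundant class from dominating the covariance structure.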
Table C1. PC coefficients, also known as loadings or eigenvectors, derived from a PCA of the standardized (mean-centered and normalized to standard deviation) PbP data matrix of the original particle variables. PCA was performed on 10 000 particles, composed of equal numbers of aspherical and spherical particles randomly selected from the entire calibration data set. Each column contains the coefficients of one PC, i.e., the weights with which the original variables (rows) enter that PC's linear combination.
Variable | PC 1 | PC 2 | PC 3 | PC 4 | PC 5 | PC 6 | PC 7 | PC 8 | PC 9 | PC 10 | PC 11 |
---|---|---|---|---|---|---|---|---|---|---|---|
0.41 | 0.02 | 0.09 | 0.14 | 0.02 | 0.05 | 0.61 | 0.23 | ||||
0.39 | 0.04 | 0.12 | 0.02 | 0.29 | |||||||
0.43 | 0.04 | 0.04 | 0.56 | 0.06 | |||||||
0.43 | 0.00 | 0.15 | 0.52 | 0.66 | |||||||
0.01 | 0.53 | 0.07 | 0.77 | 0.02 | 0.02 | 0.00 | |||||
0.00 | 0.54 | 0.21 | 0.00 | 0.00 | |||||||
0.26 | 0.62 | 0.67 | 0.21 | 0.05 | 0.18 | 0.06 | |||||
0.00 | 0.24 | 0.67 | 0.25 | 0.11 | 0.10 | 0.04 | |||||
0.39 | 0.18 | 0.57 | 0.14 | 0.06 | |||||||
0.39 | 0.02 | 0.51 | 0.02 | 0.63 | 0.41 | ||||||
AIC | 0.03 | 0.55 | 0.75 | 0.16 | 0.01 | 0.00 |
Using the PC coefficients listed in Table , we transformed the PbP data matrix of the 10 000 randomly selected particles into the phase space spanned by the eigenvectors of the PCA. This matrix contains the original particle variables (TBC, PTM, AIC, etc.; see first column in Table ). The transformed data matrix was subsequently used to train a random forest model, as described in Sect. . We repeated this procedure 10 times, each time randomly sampling 10 000 particles from the entire calibration data set (see Supplement Table S1) but using the same PC coefficients (see Table ). The repetitions served to test the robustness of the model and to estimate predictor importance, with the goal of identifying robust particle shape predictors.
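A sketch of this repeated resampling follows. Each new class-balanced subset is projected onto the same, fixed PC axes before it is handed to the classifier. Two assumptions apply here. First, the code reuses the mean and standard deviation of the reference subset for standardization, because the text does not state whether each resampled subset was restandardized. Second, the function name and its parameters are hypothetical. The random forest training itself is omitted.

```python
import numpy as np


def repeated_pc_projections(X, labels, mean, std, eigvecs,
                            n_reps=10, n_per_class=5000, seed=0):
    """Yield (scores, labels) for n_reps class-balanced random subsets,
    all projected onto the same, fixed PC axes (columns of eigvecs)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    for _ in range(n_reps):
        # Draw n_per_class particles from each class without replacement.
        idx = np.concatenate([
            rng.choice(np.flatnonzero(labels == c), size=n_per_class, replace=False)
            for c in np.unique(labels)])
        Z = (X[idx] - mean) / std          # reuse the reference standardization
        yield Z @ eigvecs, labels[idx]     # PC scores, one column per PC


# Usage with toy data and identity axes (scores then equal the raw values).
rng = np.random.default_rng(3)
X_toy = rng.standard_normal((400, 3))
y_toy = np.repeat([0, 1], 200)
reps = list(repeated_pc_projections(X_toy, y_toy, np.zeros(3), np.ones(3),
                                    np.eye(3), n_reps=10, n_per_class=50))
```

Keeping the coefficients fixed means that differences in predictor importance between repetitions reflect the resampled particles, not a change of the PC basis.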
Figure shows box plots of the predictor importance for particle shape discrimination, derived from the PC-based random forest model and estimated using a curvature test. In the curvature test, the best predictor at every decision node within a tree is the one that minimizes the p value of a test of the null hypothesis that predictor and response are independent, as detailed in . The predictor importance can thus be viewed as a measure of how well a given predictor splits the data, compared with the other predictors in the randomly selected set considered at a decision node. The second and sixth PCs show a significantly larger predictor importance than the other PCs. They therefore discriminate particle shape best among all predictors (PCs) used as input for our random forest model.
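The idea behind the curvature test can be illustrated with a simplified sketch. Each numeric predictor is binned into quartiles, and a chi-square test of independence is computed between the binned predictor and the particle shape class. The predictor with the smallest p value is then chosen for the split. This is not the exact implementation referenced above. Quartile binning, the restriction to two classes and all function names are assumptions. Two classes give (4 - 1)(2 - 1) = 3 degrees of freedom, for which the chi-square survival function has a closed form in terms of `math.erfc`.

```python
import math

import numpy as np


def chi2_sf_dof3(x):
    """Survival function P(X > x) of a chi-square variable with 3 dof."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def curvature_pvalue(x, y):
    """p value of a chi-square independence test between the quartile-binned
    predictor x and a binary response y."""
    x, y = np.asarray(x), np.asarray(y)
    n = x.size
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch assumes exactly two classes")
    ranks = np.argsort(np.argsort(x))
    bins = (4 * ranks) // n                # four equal-count bins (quartiles)
    observed = np.array([[np.sum((bins == b) & (y == c)) for c in classes]
                         for b in range(4)], dtype=float)
    expected = (observed.sum(axis=1, keepdims=True)
                * observed.sum(axis=0, keepdims=True) / n)
    chi2 = float(((observed - expected) ** 2 / expected).sum())
    return chi2_sf_dof3(chi2)


def best_split_predictor(X, y):
    """Index of the predictor (column) with the smallest p value."""
    pvals = [curvature_pvalue(X[:, j], y) for j in range(X.shape[1])]
    return int(np.argmin(pvals)), pvals


# Usage: column 0 is unrelated to the class, column 1 separates the classes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 500)
X = np.column_stack([rng.standard_normal(1000),
                     y + 0.1 * rng.standard_normal(1000)])
best, pvals = best_split_predictor(X, y)
```

Selecting splits by p value, rather than by impurity reduction, avoids favoring predictors merely because they offer many candidate split points.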
This observation is consistent with the PC coefficients listed in Table . We interpret PC 2 as a shape component (variability). This interpretation is consistent with the relatively strong contribution of the coefficients that describe the symmetry of a scattering pattern, namely , and AIC. In contrast, in PC 6 the TBC coefficients have opposite signs. We interpret this PC to describe particle shape in terms of a , given by the difference between the TBC values obtained from the two CMOS arrays. This difference is small for spherical particles, whose scattering patterns are similar on both arrays. It is larger for aspherical particles, whose intensity distribution is not symmetric.
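The following projection shows why opposite-sign loadings make PC 6 act as a difference between the two arrays. If a PC weights the standardized TBC values of the two CMOS arrays with equal magnitude and opposite sign, its score is proportional to their difference. The loading values below are hypothetical, chosen only for illustration. The contributions of the other variables to PC 6 are ignored.

```python
import numpy as np

# Hypothetical loadings of one PC on the standardized TBC values of the
# two CMOS arrays: equal magnitude, opposite sign.
LOADINGS = np.array([0.5, -0.5])


def tbc_contrast_score(z_tbc_array1, z_tbc_array2, loadings=LOADINGS):
    """PC score restricted to the two TBC variables.

    With loadings (a, -a) this equals a * (z_tbc_array1 - z_tbc_array2),
    i.e., it measures the difference between the two arrays.
    """
    z = np.column_stack([z_tbc_array1, z_tbc_array2])
    return z @ loadings


# The first particle has identical TBC on both arrays (symmetric, sphere-like
# pattern) and scores zero. The second has dissimilar values (asymmetric
# pattern) and scores higher.
scores = tbc_contrast_score(np.array([1.2, 1.5]), np.array([1.2, -0.5]))
```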
From the PCA we conclude that the symmetry parameters , , and AIC are best suited to discriminate the shape of particles detected by PPD-HS. The PbP input for our classification model should therefore be restricted to these predictors. We note that the PTM coefficients also contribute to PC 2. However, we are not aware of any physical mechanism by which the PTM could help distinguish spherical from aspherical particles, so we do not use this parameter in our random forest model.
Figure C2. Estimates of predictor importance derived from the random forest model using the PCs as model input (see Table ). The input comprises 10 000 observations with equal numbers of spherical and aspherical particles, randomly drawn from the entire calibration data set. The random forest model was grown with 200 independent decision trees, and predictor importance was derived from a curvature test for predictor splitting using the OOB samples. Each box encompasses a total of 10 independent simulations (random forest models). The red line represents the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. Whiskers extend to the most extreme data points not considered outliers. Outliers are shown as red crosses.
[Figure omitted. See PDF]
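The box-plot convention described in the caption of Fig. C2 can be summarized numerically as below. The whisker length of 1.5 times the interquartile range is an assumption, a common plotting default that the caption does not state. NumPy's default linear interpolation for percentiles may also differ slightly from other software. The input would be the 10 importance estimates of one PC.

```python
import numpy as np


def boxplot_summary(values, whisker=1.5):
    """Median, quartiles, whisker ends and outliers of one box.

    Points farther than `whisker` * IQR beyond the box edges are treated as
    outliers; the whiskers end at the most extreme remaining points.
    """
    v = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = v[(v >= low_fence) & (v <= high_fence)]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": v[(v < low_fence) | (v > high_fence)],
    }


stats = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```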
Author contributions
FM prepared all figures and wrote the manuscript with contributions from all authors. The schematic illustrating the experimental setup was created by JW. PPD-HS was built by CS and purchased by ETH, where detailed characterization measurements were performed. FM designed the calibration measurements, and FM and ZAK designed the ice nucleation experiments. JW and FM conducted PPD-HS experiments with the help of RD. JW, RD and FM analyzed PPD-HS measurements and developed data analysis routines. HS performed theoretical Mie calculations. RD, JW, FM, CS, HRS and ZAK discussed and interpreted data. ZAK supervised the project.
Competing interests
The authors declare that they have no conflict of interest.
Acknowledgements
We would like to thank Sarah Grawe for providing the fly ash sample and for help with the production of fiber-like particles. Hannes Wydler, Peter Isler and Marco Vechellio are acknowledged for technical support throughout the project. Fabian Mahrt thanks Carsten Kykal for advice on using the VOAG. The authors thank Ulrike Lohmann, Caroline Rösch and Zbigniew Ulanowski for reading the manuscript and for the helpful discussions. Niklas Pfister is thanked for his helpful input and discussions on the supervised machine learning approach. The authors thank the anonymous reviewers for their careful reading and their helpful comments.
Financial support
This research has been supported by the ETH grant (grant no. ETH-25-15-1) and by the ETH Scientific Equipment Program.
Review statement
This paper was edited by Charles Brock and reviewed by two anonymous referees.
© 2019. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Herein we describe a new instrument, the High-speed Particle Phase Discriminator (PPD-HS), developed at the University of Hertfordshire for sizing individual cloud hydrometeors and determining their phase. PPD-HS performs an in situ analysis of the spatial intensity distribution of near-forward scattered light for individual hydrometeors, yielding shape properties. Discrimination of spherical and aspherical particles is based on an analysis of the symmetry of the recorded scattering patterns. Scattering patterns are collected onto two linear detector arrays. This reduces the complete 2-D scattering pattern to the light intensities captured on two one-dimensional strips of light-sensitive pixels. Using this reduced scattering information, we calculate symmetry indicators that are used for particle shape and ultimately phase analysis. This reduction of information allows for detection rates of a few hundred particles per second.
Here, we present a comprehensive analysis of instrument performance using both spherical and aspherical particles generated in a well-controlled laboratory setting using a vibrating orifice aerosol generator (VOAG) and covering a size range of approximately 3–32
1 Institute for Atmospheric and Climate Science, ETH Zurich, 8092 Zurich, Switzerland
2 Institute for Atmospheric and Climate Science, ETH Zurich, 8092 Zurich, Switzerland; now at: Center for Climate System Modeling, ETH Zurich, 8092 Zurich, Switzerland
3 Centre for Atmospheric and Climate Physics, University of Hertfordshire, Hatfield, Hertfordshire, AL10 9AB, UK