Optimizing ExoMars Rover Remote Sensing

Introduction

Problems of Data Limited Dynamic Exploration

A robotic spacecraft explores an environment by collecting data and transmitting it to Earth, where it is interpreted manually to inform a new sequence of data collecting actions that are transmitted back to the spacecraft and executed. A new data set is collected, and the cycle of exploration continues. This separation of the sensory and cognitive systems imposes a “data-limitation” problem on dynamic robotic planetary exploration (Francis et al., 2017), where the volume of data collected during a given set of actions may exceed the data budget afforded by the next communication window. This problem can be addressed by increasing the science autonomy of robotic spacecraft, enabling on-board identification and follow-up observations of science targets (Castano et al., 2007; Chien et al., 2005; Estlin et al., 2012; Francis et al., 2017; Kubitschek et al., 2007). However, science autonomy is perceived to introduce risk and expense, and implementation requires co-ordination across many aspects of a mission-architecture at a systems-level (Amini et al., 2021).

Without autonomy, data-limited scenarios require strategies for prioritizing subsets of data for transmission, based on the decisions required for the next command sequence, even when prior knowledge of the data content is limited. This paper proposes a method for using limited prior knowledge of an environment to prioritize a subset of spectral channels for transmission from the multispectral imager of a robotic spacecraft that is dynamically exploring that environment.

Once received at Earth, these data sets must be visualized in a way that minimizes cognitive loading, as interpretations and next-step decisions may be required on a timescale of hours. High-dimensional data sets require reduction for interpretation, at the cost of information-loss. However, this loss can be mitigated by using prior knowledge of signals of interest. For example, in spectral reflectance image data sets, electronic or vibrational absorption features can indicate mineral composition; dimension reductions that measure these features, or spectral parameters, can be designed in advance, generated with low computational cost, and displayed as false color images, providing an efficient means for browsing these high-dimensional data sets for specific compositional indicators (Bell et al., 2000; Farrand et al., 2007; Fraeman et al., 2020; Jacob et al., 2020; Johnson et al., 2015; Pelkey et al., 2007; Viviano-Beck et al., 2014; Wellington et al., 2017). Spectral parameters must be chosen with care, to complement the spectral sampling of the instrument and the form of the target material reflectance spectra.

In this paper, we address these two problems of data-prioritization for transmission and data-reduction for interpretation, in tandem, for spectral image data sets under constrained data downlink volume, in the special case where a particular target material is sought in an environment, and where the background environment materials have been inferred from prior observations. We present and explore a strategy for selecting and combining subsets of spectral images captured by a panoramic multispectral imager of a Mars rover, using prior information of the materials composing the rover surroundings obtained from orbit.

Multispectral Imaging From a Mars Rover

ExoMars PanCam

This work has been conducted in preparation for the operation of the Panoramic Camera (PanCam), the stereo multispectral imaging system (Coates et al., 2017) of the ESA ExoMars Rosalind Franklin rover (Vago et al., 2017). Rosalind Franklin has the primary objective of finding evidence of ancient life in the subsurface of Mars at Oxia Planum, an ancient phyllosilicate-rich terrain (Quantin-Nataf et al., 2021). PanCam will provide visual characterization of the geology of the landing site, and, with tricolor and 12-channel multispectral visible-to-near-infrared (VNIR; ∼400–1,100 nm) imaging, a preliminary assessment of the material composition (Cousins et al., 2010, 2012). Rosalind Franklin will operate semi-autonomously (Winter et al., 2017), but without science autonomy, relaying data to the Rover Operation Control Center (Turin, Italy) via orbiter, typically with a transmission opportunity at the start and end of each sol (Vago et al., 2017). As such, tactical data interpretation and next-sol activities must be sequenced on a timescale of hours for an economically efficient mission.

VNIR Reflectance Spectroscopy

Multispectral images can be calibrated into units of relative surface reflectance, providing an approximation of the intrinsic surface reflectance spectrum of the material within the projected area of each pixel (Hayes et al., 2021; Reid et al., 1999). Features identified in the population of pixel spectra can be compared with reference libraries of mineral reflectance spectra (captured under controlled conditions), which may lead to the identification of signatures of chemical composition, diagnostic of particular mineral species, and the discrimination of spectrally unique features. At Mars, VNIR spectral features are dominated by three processes: electronic absorptions of the transition metals, notably iron, which are influenced by neighboring ligands as described by crystal field theory (Burns, 1993a, 1993b); charge transfer absorptions of transition metals centered in the ultra-violet (Burns, 1993a; Clark, 1999; Hunt, 1977); and overtones of vibrational modes of the infrared-active molecules of H₂O, OH and CO₃ centered in the short-wave infrared (Farmer, 1974; Hunt, 1977).

Examples of VNIR Multispectral Imaging at Mars

VNIR multispectral imaging has provided vital support to the exploration of the surface of Mars, providing preliminary identification of mineralogy indicative of water activity, including hematite and hydrous silicates, guiding further investigations that have established evidence of sites of past habitability (Arvidson et al., 2014; Bell et al., 2004; Calvin et al., 2008; Rice et al., 2010, 2023; Squyres et al., 2008). Reviews of VNIR multispectral imaging from the martian surface are given by Bell et al. (2008, 2019), Farrand et al. (2008), and Gunn and Cousins (2016).

VNIR multispectral imaging from the surface can be complemented with hyperspectral imaging from orbit, as exemplified by the investigation of Vera Rubin Ridge by the Mars Science Laboratory Curiosity rover (MSL). Strong signatures of hematite, detected from high spectral resolution orbital imagery (Compact Reconnaissance Imaging Spectrometer for Mars, CRISM (Murchie et al., 2007)), in a distinct ridge unit were hypothesized to indicate a redox interface, providing a case for further ground exploration (Fraeman et al., 2013). During the traverse of MSL toward the ridge, spectral parameters derived from multispectral images captured by the on-board Mastcam imaging system (Bell et al., 2017) were used to compare and confirm ground-based and orbital observations with the absorption features associated with the unit (Fraeman et al., 2020; Wellington et al., 2017). The band depths of the 535 and 850 nm features were measured from the available multispectral rover and hyperspectral orbital image channels and compared, validating the presence of the hematite signature in both of these observation modes (Fraeman et al., 2020; Wellington et al., 2017). False color decorrelation-stretch images of infrared channels were also used to visualize the spectral diversity of the scenes (Fraeman et al., 2020; Wellington et al., 2017). The exploration of Vera Rubin Ridge demonstrated how spectral parameters can be used to understand relationships between VNIR observations from the ground and from orbit, providing lessons on the role of subpixel areal mixing of sand and bedrock signatures and intimate mixing of dust, and ultimately leading to a renewed understanding of the distribution of hematite across the MSL traverse. 
3-filter subsets for band-depth measurements have been advised for Mastcam-Z (Mars 2020 Perseverance rover) multispectral observations to relieve downlink data volume (Rice et al., 2020), and spectrophotometric studies have been volume-optimized using these subsets (Johnson et al., 2022).

Solution Overview

In anticipation of Rosalind Franklin performing a campaign analogous to Vera Rubin Ridge in a data-limited scenario, we have developed a method for quantitatively ranking subsets of PanCam multispectral filters for identifying a target mineral against a background. We have implemented the method in a Python open-source toolkit, the Spectral Parameters Toolkit (sptk) (Stabbins & Grindrod, 2024c). Functionally, the toolkit maps target and background mineral labels, via multispectral filter properties, to a list of candidate subsets of filters sorted by a target-vs.-background separation score (see Section 3.3). We construct the filter subsets as linear combinations of spectral parameters, and use Linear Discriminant Analysis to find optimal linear coefficients of the spectral parameter combinations, and to evaluate the target-vs.-background separation. A secondary objective of this paper is to investigate different ways of evaluating the separation score.

To develop this method, we have used the example of identifying hematite in the context of the expected mineralogy of the ExoMars rover Oxia Planum landing site (Quantin-Nataf et al., 2021). Notably, PanCam will sample with a unique set of spectral channels, optimized for detection of the mineralogy of Mars (Cousins et al., 2010, 2012), and distinct from the spectral channels of previous Mars rovers (Grindrod et al., 2022; Gunn & Cousins, 2016). This tool allows for the investigation of how this new multispectral instrument should be optimally operated in the new environment of Oxia Planum.

Data

Target and Background Material Selection

To develop the method we have chosen a set of materials expected at the landing site of Oxia Planum. We represent the Fe/Mg-rich phyllosilicate unit with vermiculite and saponite (Carter et al., 2016) and the mafic-rich capping unit with basalt and basaltic soil (Quantin-Nataf et al., 2021). We also represent a putative Al-rich phyllosilicate unit with montmorillonite (Turner et al., 2021), and include hematite, that has been reported in orbital Gamma-Ray Spectroscopy observations (Da Pieve et al., 2021), but has not yet been reported from orbital reflectance spectroscopy studies of Oxia Planum. As found during the investigation of Vera Rubin Ridge, the signatures of hematite can be weak in orbital reflectance spectroscopy observations even when in situ contact measurements show a significant abundance (Fraeman et al., 2020). The presence of hematite can indicate a change from a reducing to an oxidizing environment, and/or aqueous surface or groundwater activity (Burns, 1993b; Catling & Moore, 2003; Jiang et al., 2022), of relevance to the astrobiological objectives of ExoMars.

For each mineral we seek multiple sources of spectral reflectance to represent the variability of spectral signatures for a single species due to variations in grain size (Hapke, 1981) and the effects of small amounts of sample impurities, allowing for the statistical analysis presented in this paper to be performed. Hematite typically occurs on Mars in three VNIR spectrally distinct phases: nanophase, fine-grained red crystalline and coarse-grained gray, with typical particle sizes of <10 nm, <10 μm and >10 μm (Jiang et al., 2022; Morris et al., 1989). Although the discrimination of these phases has the potential to provide a richer understanding of formation conditions, here we do not sub-categorize the hematite entries, and instead attempt to simultaneously separate all 3 phases from the background materials. Weathering can also affect spectral reflectance, for instance basalts can develop ferric signatures. No basalt entries used in this study showed indication of weathering.

Mineral Spectral Library

We have sourced multiple reference reflectance spectra for each mineral species via the Western Washington University Visible and Infrared Spectroscopy brOwseR (VISOR) (Million et al., 2022; St. Clair & Million, 2022). VISOR provides an online interface to a standardized formatting of the following publicly available spectral libraries: ASTER/ECOSTRESS Spectral Library (Baldridge et al., 2009; Meerdink et al., 2019), CSIRO National Virtual Core Library (Huntington, 2016), playa evaporites (Crowley, 1991), USGS Spectral Database (Clark et al., 2007; Kokaly et al., 2017), University of Winnipeg Spectrophotometer Facility (Cloutis, 2015; Cloutis et al., 2006), and data collected at Western Washington University. Each entry is formatted with details of the source library and notes on the measurement conditions, such as angles of emission and incidence, instrument resolution and range, grain size if the sample was powdered, and known impurities.

Upon retrieving all matches of the candidate mineral types in the VISOR database with complete spectral coverage in the VNIR domain, defined here as (400, 1,100) nm, the entries associated with each relevant mineral type of the study are labeled as belonging to either the “target” or “background” classes. The entries are then re-sampled and truncated from the various original instrument resolutions of each source to a common wavelength resolution of 1 nm across (400, 1,100) nm, that we denote as the discrete set Λ, by linear interpolation (Figure 1a).
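
The re-sampling step described above can be sketched as follows. This is a minimal illustration using NumPy's `np.interp`, not the sptk implementation; the function name `resample_to_grid` and the synthetic input spectrum are hypothetical, and the coverage check is an assumed way of enforcing "complete spectral coverage".

```python
import numpy as np

# Common 1 nm grid across 400-1100 nm (the discrete set Lambda in the text).
LAMBDA = np.arange(400.0, 1101.0, 1.0)

def resample_to_grid(src_wavelengths, src_reflectance, grid=LAMBDA):
    """Linearly interpolate one library spectrum onto the common grid."""
    order = np.argsort(src_wavelengths)
    wl = np.asarray(src_wavelengths, dtype=float)[order]
    refl = np.asarray(src_reflectance, dtype=float)[order]
    # Reject entries without complete VNIR coverage rather than extrapolating.
    if wl[0] > grid[0] or wl[-1] < grid[-1]:
        raise ValueError("spectrum does not cover the full grid range")
    return np.interp(grid, wl, refl)

# Synthetic example: a spectrum sampled every 5 nm from 350 to 2500 nm.
src_wl = np.arange(350.0, 2501.0, 5.0)
src_refl = 0.1 + 2e-4 * (src_wl - 350.0)
resampled = resample_to_grid(src_wl, src_refl)
```

Truncation to (400, 1,100) nm happens implicitly here, because only the grid wavelengths are evaluated.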

[IMAGE OMITTED. SEE PDF]

Class Balancing

In this method we learn dimension reductions from the labeled data set, and as such the learning process is susceptible to bias in the training data set. Ideally the relative proportions of the class sizes should represent the expected relative abundances of the class materials in the environment. The spatial distribution of materials can be inferred from orbital studies, but only at a coarse resolution relative to the ground resolution of a multispectral imager. At the finer scale of ground observations, the distribution of materials is not expected to be homogeneous, such that a given scene captured by the field of view of the imager is unlikely to contain a distribution of abundances that matches that of an orbital view. We represent this lack of knowledge of the content of a rover-view scene by assigning equal probability to each class through class balancing, such that each class is of equal size, and the constituent mineral groups of each class are of equal size. We do this with under-sampling, by discarding entries at random from the larger mineral groups and larger class until the sizes match. In this study, this results in the “hematite” and “basalts & phyllosilicates” classes each having a size of 60 entries, and the basalts, montmorillonite, saponite and vermiculite groups each having a size of 15, giving a total data set size of 120 samples (see Table S1 for a list of entries used in this study). Our focus is on mapping spectral reflectance features to material labels, so we do not apply balancing across the latent variables including the different hematite phases. Empirical studies have shown that class balancing typically improves linear discriminant classifier performance (Xue & Hall, 2015), supporting our use of class balancing, but theoretical support for this is lacking, warranting further study of the implications of class balancing for this application.
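
A minimal sketch of the under-sampling described above is given here, using only the Python standard library. The function name `balance_classes`, the two-step ordering (groups first, then classes), and the entry counts in the example library are assumptions made for illustration; they are not taken from sptk or from Table S1.

```python
import random

def balance_classes(library, seed=0):
    """Under-sample so groups within a class, and the classes, match in size.

    `library` maps class name -> {mineral group name -> list of entries}.
    """
    rng = random.Random(seed)
    # Step 1: within each class, trim every group to the smallest group size.
    trimmed = {}
    for cls, groups in library.items():
        n_min = min(len(entries) for entries in groups.values())
        trimmed[cls] = {g: rng.sample(e, n_min) for g, e in groups.items()}
    # Step 2: trim every class to the smallest class size, spread evenly
    # over its groups (assumes the size divides evenly by the group count).
    class_size = min(sum(len(e) for e in g.values()) for g in trimmed.values())
    balanced = {}
    for cls, groups in trimmed.items():
        per_group = class_size // len(groups)
        balanced[cls] = {g: rng.sample(e, per_group) for g, e in groups.items()}
    return balanced

# Hypothetical pre-balancing counts, for illustration only.
raw_counts = {
    "target": {"hematite": 70},
    "background": {"basalts": 25, "montmorillonite": 15,
                   "saponite": 18, "vermiculite": 30},
}
library = {cls: {g: [f"{g}_{i}" for i in range(n)] for g, n in groups.items()}
           for cls, groups in raw_counts.items()}
balanced = balance_classes(library)
```

With these hypothetical counts the smallest background group (15 entries) sets the background class size to 60, and the target class is then trimmed to match.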

Instrument Spectral Transmission

We represent the spectral sampling of the PanCam Geology filter set with Gaussian transmission profiles (Equation 1), using the center wavelength (λ_CWL) and full-width-at-half-maximum (FWHM, Δλ) of each filter given by Coates et al. (2017), illustrated in Figure 1c (see Table S2 in Supporting Information S1 for values of λ_CWL and Δλ used in this study).

$T_{f}\left(\lambda \mid \lambda_{CWL}, \Delta\lambda\right) = \exp\left(\frac{-\left(\lambda - \lambda_{CWL}\right)^{2}}{2\left(\frac{\Delta\lambda}{2\sqrt{2\ln 2}}\right)^{2}}\right)$ (1)

We represent the transmission function of the filter labeled f as T_f[λ] ∈ [0,1],λ ∈ Λ.
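
As an illustration, Equation 1 can be evaluated on the 1 nm grid Λ as sketched below. The function name and the filter values (center wavelength 530 nm, FWHM 20 nm) are hypothetical and are not taken from Table S2.

```python
import numpy as np

# Common 1 nm wavelength grid over 400-1100 nm (the set Lambda).
LAMBDA = np.arange(400.0, 1101.0, 1.0)

def gaussian_transmission(wavelengths, cwl, fwhm):
    """Equation 1: Gaussian transmission profile with a peak value of 1."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> std. deviation
    return np.exp(-((wavelengths - cwl) ** 2) / (2.0 * sigma ** 2))

# Hypothetical filter values, for illustration only.
t_f = gaussian_transmission(LAMBDA, cwl=530.0, fwhm=20.0)
```

By construction the profile equals 1 at the center wavelength and 0.5 at half the FWHM either side of it.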

Resampling of the Spectral Library With the Multispectral Filter Suite

We simulate the multispectral sampling of the reflectance by each filter with Equation 2, under the assumption that Δλ ≫ 1 nm:

$R_{m}[f] = \frac{\int_{\Lambda} R_{m}[\lambda]\, T_{f}[\lambda]\, d\lambda}{\int_{\Lambda} T_{f}[\lambda]\, d\lambda}$ (2)

Here we represent the reflectance spectrum as R_m[λ]:R ∈ [0,1],λ ∈ Λ, where m indexes the material in the set of all entries of the study, D. The resampled case study spectral library is illustrated in Figure 1b (see the Data Products repository, “/observation/tables/observation.csv” for the resampled reflectance spectra data (Stabbins & Grindrod, 2024a)).
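
A minimal numerical sketch of Equation 2 follows. It assumes the uniform 1 nm grid Λ, so that the constant dλ cancels between numerator and denominator and the integrals reduce to sums; the function names, the filter values and the synthetic spectrum are hypothetical.

```python
import numpy as np

LAMBDA = np.arange(400.0, 1101.0, 1.0)  # uniform 1 nm grid, constant d-lambda

def gaussian_transmission(wavelengths, cwl, fwhm):
    """Equation 1: Gaussian filter transmission profile."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-((wavelengths - cwl) ** 2) / (2.0 * sigma ** 2))

def sample_with_filter(reflectance, transmission):
    """Equation 2 on a uniform grid: transmission-weighted mean reflectance."""
    return np.sum(reflectance * transmission) / np.sum(transmission)

# Hypothetical filter (CWL 740 nm, FWHM 15 nm) and a synthetic linear spectrum.
t_f = gaussian_transmission(LAMBDA, cwl=740.0, fwhm=15.0)
r_m = 0.2 + 5e-4 * (LAMBDA - 400.0)
r_f = sample_with_filter(r_m, t_f)
```

For a spectrum that is linear across the filter bandpass, the sampled value equals the reflectance at the center wavelength, which gives a quick sanity check of an implementation.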

Theory and Method

Method Rationale

Given this data set of class-labeled mineral reflectance spectra, sampled to the PanCam multispectral filter wavelengths, our goal is to find minimal subsets of the filters that separate the labeled classes. To separate 2 classes we require a model that reduces the dimensionality of the data set from the number of unique filters in a given subset to 1. After projection of the data set onto a 1D distribution, we evaluate class separation with the Fisher Ratio, a measure of mean inter-class distance weighted by total intra-class variance (Equation 3), and with Classification Accuracy (Equation 15), with respect to a decision boundary between the projected classes.

We construct multiple models for each filter subset using spectral parameters and Linear Discriminant Analysis (LDA). Spectral parameters are defined by equations parameterized by channel reflectance, and in some cases channel wavelength. We compute spectral parameters from the sampled reflectance data set for 4 different equation types and for every allowed permutation of the available channels. Individual spectral parameters are typically insufficient to separate material classes, but paired combinations have been used to reveal clusters and patterns of material compositions and properties (e.g., Figure 12 of Rice et al., 2022, Figure 18 of Rice et al., 2023). We consider in turn each possible pair combination of the spectral parameters we compute, and we use LDA to find the linear coefficients of the pair that maximizes the class separation, through maximization of the Fisher Ratio. The spectral parameter equations and the linear coefficients give a model for target-vs.-background separation for the given filter subset, and the maximized Fisher Ratio gives a score, comparable across all models, of the separation of the classes.

LDA: Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) is a standard dimension reduction technique (Duda et al., 2001), that has been applied successfully to the classification of spectral imagery data for several decades (Chang & Ren, 2000; Jordan et al., 1978; Manolakis et al., 2016; Steiner, 1970; Tom & Miller, 1984). The key distinction of LDA from the more commonly used Principal Component Analysis (PCA) is that PCA does not use labeled data and finds the dimension-reducing projections that maximize the total variance, whereas LDA uses data assigned with class labels to find projections that maximize the separation between the mean values of each class (between-class scatter), whilst minimizing the total of the variance within each class (within-class scatter). This optimization problem maximizes the Fisher Ratio, that for the simple 2-class case is expressed as

$FR = \frac{\left|\mu_{a} - \mu_{b}\right|^{2}}{\sigma_{a}^{2} + \sigma_{b}^{2}}$ (3)

where $\mu_{a}$, $\mu_{b}$ and $\sigma_{a}^{2}$, $\sigma_{b}^{2}$ are the mean and variance values of classes a and b after projection by the n-vector $\boldsymbol{a}$ of the data set from the n-dimensional feature space to 1-dimension.

LDA solves for $\boldsymbol{a}$ by maximizing the Fisher Ratio expressed as a function of $\boldsymbol{a}$ (Equation 4)

$FR(\boldsymbol{a}) = \frac{\left|\boldsymbol{a}^{T}\boldsymbol{S}_{B}\boldsymbol{a}\right|}{\left|\boldsymbol{a}^{T}\boldsymbol{S}_{W}\boldsymbol{a}\right|}$ (4)

where $\boldsymbol{S}_{B}$ is the $n \times n$ between-class scatter matrix (Equation 5)

$\boldsymbol{S}_{B} = \sum_{c \in \mathcal{C}} n_{c}\left(\boldsymbol{\mu}_{c} - \boldsymbol{\mu}\right)\left(\boldsymbol{\mu}_{c} - \boldsymbol{\mu}\right)^{T}$ (5)

and $\boldsymbol{S}_{W}$ is the $n \times n$ within-class scatter matrix (Equation 6)

$\boldsymbol{S}_{W} = \sum_{c \in \mathcal{C}}\left(\sum_{\mathbf{x} \in c}\left(\mathbf{x} - \boldsymbol{\mu}_{c}\right)\left(\mathbf{x} - \boldsymbol{\mu}_{c}\right)^{T}\right)$ (6)

and $n_{c}$ is the number of entries of class $c$, $\mathbf{x}$ is an $n$-vector datapoint in $c$, $\boldsymbol{\mu}_{c}$ is the $n$-vector mean across $c$, and $\boldsymbol{\mu}$ is the $n$-vector mean across all classes. For the 2-class case $FR(\boldsymbol{a})$ is maximized by solving the generalized eigenvalue problem (under the condition that $\boldsymbol{S}_{W}$ is non-singular)

$\boldsymbol{S}_{W}^{-1}\boldsymbol{S}_{B}\boldsymbol{\phi}_{i} = \lambda_{i}\boldsymbol{\phi}_{i}$ (7)

and setting $\boldsymbol{a}$ to the eigenvector $\boldsymbol{\phi}_{i}$ corresponding to the largest eigenvalue $\lambda_{i}$.
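
The 2-class procedure of Equations 3–7 can be sketched with NumPy as below. This is an illustrative implementation under stated assumptions, not the sptk code: the function names are hypothetical, the data are synthetic Gaussian clusters, and Equation 3 is evaluated with population (not sample) variances.

```python
import numpy as np

def scatter_matrices(x_a, x_b):
    """Between- and within-class scatter (Equations 5 and 6) for 2 classes."""
    mu_a, mu_b = x_a.mean(axis=0), x_b.mean(axis=0)
    mu = np.vstack([x_a, x_b]).mean(axis=0)
    s_b = (len(x_a) * np.outer(mu_a - mu, mu_a - mu)
           + len(x_b) * np.outer(mu_b - mu, mu_b - mu))
    s_w = (x_a - mu_a).T @ (x_a - mu_a) + (x_b - mu_b).T @ (x_b - mu_b)
    return s_b, s_w

def lda_projection(x_a, x_b):
    """Equation 7: leading eigenvector of S_W^-1 S_B (S_W non-singular)."""
    s_b, s_w = scatter_matrices(x_a, x_b)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_w, s_b))
    a = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return a / np.linalg.norm(a)

def fisher_ratio(x_a, x_b, a):
    """Equation 3, evaluated on the 1D projections of both classes."""
    p_a, p_b = x_a @ a, x_b @ a
    return (p_a.mean() - p_b.mean()) ** 2 / (p_a.var() + p_b.var())

# Synthetic 2D data standing in for one pair of spectral parameters.
rng = np.random.default_rng(0)
x_target = rng.normal([1.0, 0.5], [0.3, 0.3], size=(60, 2))
x_background = rng.normal([0.0, 0.0], [0.3, 0.6], size=(60, 2))
a = lda_projection(x_target, x_background)
fr = fisher_ratio(x_target, x_background, a)
```

For two classes the leading eigenvector is parallel to $\boldsymbol{S}_{W}^{-1}(\boldsymbol{\mu}_{a} - \boldsymbol{\mu}_{b})$, which offers a cheaper closed form and a useful cross-check on the eigen-solver result.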

Applying LDA to Spectral Parameter Combinations

Typically, individual filters are not capable of uniquely discriminating materials, but LDA can be used to find optimal linear combinations of filters that improve discrimination (e.g., Robert et al., 1992). Spectral parameters can also be defined that combine filters in ways that increase discrimination (Section 1.2.3); however, some key spectral parameters (e.g., Ratio and Band Depth, Equations 9 and 11) are not linear, and so these reductions are not discoverable through LDA. Hence, we expand the 12-filter feature space by computing spectral parameters, and we treat each of these as a dimension of the data set (see Section 3.3.1). This extends previous work in the literature, where multiclass-LDA has been applied to trichromatic imaging data for the classification of rock outcrops from a rover perspective, using a small number of spectral parameters (e.g., red/blue channel ratio) as part of the feature space (Francis et al., 2014).

We do not wish to apply LDA across the high dimensionality of this spectral parameter feature space, as our goal is to search for ways of combining minimal numbers of filters. Instead, we draw pairs of spectral parameters, taking advantage of the fact that each may require up to 3 filters to construct, so that by taking pairs we are considering a range of subsets using 1–6 unique filters. We allow for spectral parameters to be combined with themselves, and we treat individual filter channels as special types of spectral parameter. We apply LDA to each of the pairs, or spectral parameter combinations (SPCs), finding the optimal projection a and evaluating FR(a) for each. By also recording the number of unique filter channels (NUC) required to compute the SPC, we obtain a look-up table to find the minimal optimal filters to choose, and instructions of how to use them (Equation 14), to perform a material discrimination task.
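
The NUC of an SPC is the size of the union of the filter sets of its two spectral parameters. A minimal sketch is given below; it assumes that each filter is identified by a unique center wavelength and that parameter labels take the underscore-separated form used in this paper (e.g., BD_530_840_900). The helper names are hypothetical.

```python
def filters_of(label):
    """Center wavelengths (nm) named in a label such as 'BD_530_840_900'."""
    return {int(token) for token in label.split("_")[1:]}

def number_of_unique_channels(label_x, label_y):
    """NUC of a spectral parameter combination: size of the filter union."""
    return len(filters_of(label_x) | filters_of(label_y))

nuc_ratio_pair = number_of_unique_channels("R_500_740", "R_440_840")
nuc_band_depth_pair = number_of_unique_channels("BD_530_840_900",
                                                "BD_440_780_1000")
nuc_self_pair = number_of_unique_channels("R_500_740", "R_500_740")
```

A parameter paired with itself contributes only its own filters, so self-pairs of Ratio or Slope parameters give NUC = 2.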

Computing the Spectral Parameters

We have chosen 4 spectral parameter types: Ratio, Slope, Band-Depth and Shoulder-Height (Equations 8–13), as defined by Pelkey et al. (2007) and Viviano-Beck et al. (2014) for analyzing CRISM spectral imaging data (excluding the doublet/two-band spectral parameters that typical multispectral VNIR systems are not capable of resolving).

Reflectance: $\mathrm{R}\,\lambda_{CWL}[f] = R_{i}[f]$ (8)

Ratio: $\mathrm{R}\_\lambda_{CWL}[f_{1}]\_\lambda_{CWL}[f_{2}] = \frac{R_{i}[f_{1}]}{R_{i}[f_{2}]}$ (9)

Slope: $\mathrm{S}\_\lambda_{CWL}[f_{1}]\_\lambda_{CWL}[f_{2}] = \frac{R_{i}[f_{1}] - R_{i}[f_{2}]}{\lambda_{CWL}[f_{1}] - \lambda_{CWL}[f_{2}]}$ (10)

Band Depth: $\mathrm{BD}\_\lambda_{CWL}[f_{1}]\_\lambda_{CWL}[f_{2}]\_\lambda_{CWL}[f_{3}] = 1 - \frac{R_{i}[f_{2}]}{a R_{i}[f_{1}] + b R_{i}[f_{3}]}$ (11)

Shoulder Height: $\mathrm{SH}\_\lambda_{CWL}[f_{1}]\_\lambda_{CWL}[f_{2}]\_\lambda_{CWL}[f_{3}] = 1 - \frac{a R_{i}[f_{1}] + b R_{i}[f_{3}]}{R_{i}[f_{2}]}$ (12)

$a = 1 - b, \qquad b = \frac{\lambda_{CWL}[f_{2}] - \lambda_{CWL}[f_{1}]}{\lambda_{CWL}[f_{3}] - \lambda_{CWL}[f_{1}]}$ (13)
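
The spectral parameter definitions can be sketched as plain Python functions, as below. The function names and example values are hypothetical. Note the assumption made here: both Band Depth and Shoulder Height compare the middle-channel reflectance against the continuum linearly interpolated between the outer channels, a·R[f₁] + b·R[f₃], with a and b from Equation 13.

```python
def continuum_weights(wl1, wl2, wl3):
    """Equation 13: interpolation weights at the middle wavelength."""
    b = (wl2 - wl1) / (wl3 - wl1)
    return 1.0 - b, b

def ratio(r1, r2):
    """Equation 9."""
    return r1 / r2

def slope(r1, r2, wl1, wl2):
    """Equation 10: reflectance change per nm between two channels."""
    return (r1 - r2) / (wl1 - wl2)

def band_depth(r1, r2, r3, wl1, wl2, wl3):
    """Equation 11, assuming the interpolated continuum a*r1 + b*r3."""
    a, b = continuum_weights(wl1, wl2, wl3)
    return 1.0 - r2 / (a * r1 + b * r3)

def shoulder_height(r1, r2, r3, wl1, wl2, wl3):
    """Equation 12."""
    a, b = continuum_weights(wl1, wl2, wl3)
    return 1.0 - (a * r1 + b * r3) / r2

# Hypothetical channel values: an absorption at the middle wavelength.
wavelengths = (500.0, 600.0, 800.0)
bd = band_depth(0.1, 0.1, 0.4, *wavelengths)
sh = shoulder_height(0.1, 0.1, 0.4, *wavelengths)
```

Under this convention a spectrum that is linear across the three channels gives zero Band Depth and zero Shoulder Height, and an absorption at the middle channel gives a positive Band Depth.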

To compute the complete set of these spectral parameters, lists are constructed of the component channels for each permutation. Slope filter permutations are constrained such that λ_CWL[f₁] < λ_CWL[f₂] to eliminate duplication, and Band Depth and Shoulder Height filter combinations are constrained such that λ_CWL[f₁]<λ_CWL[f₂] < λ_CWL[f₃] to ensure geometrically meaningful results. For the PanCam 12-filter set, this results in 12 Reflectance, 132 Ratio, 66 Slope, 220 Band-Depth and 220 Shoulder-Height spectral parameter features, giving a 650-dimensional feature-space (listed in Table S3). This results in 211,575 pair SPCs to perform LDA on (listed in Table S4).
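
The feature and pair counts quoted above can be reproduced with a short enumeration using the standard library. This sketch indexes the 12 filters in order of increasing center wavelength, so that ordered combinations satisfy the wavelength constraints; the variable names are illustrative.

```python
from itertools import combinations, combinations_with_replacement, permutations

# 12 filters, indexed in order of increasing center wavelength.
filters = range(12)

counts = {
    "Reflectance": len(list(filters)),
    # Ratio: both orderings of each filter pair are distinct parameters.
    "Ratio": len(list(permutations(filters, 2))),
    # Slope: f1 < f2 only, since swapping the filters gives the same value.
    "Slope": len(list(combinations(filters, 2))),
    # Band Depth and Shoulder Height: f1 < f2 < f3 in wavelength.
    "Band Depth": len(list(combinations(filters, 3))),
    "Shoulder Height": len(list(combinations(filters, 3))),
}
n_features = sum(counts.values())

# Pairs of spectral parameters, allowing a parameter to pair with itself.
n_spcs = sum(1 for _ in combinations_with_replacement(range(n_features), 2))
```

The pair count equals n(n + 1)/2 for n = 650 features, because self-pairs are allowed.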

Evaluating Target-Versus-Background Separation of Spectral Parameter Combinations With LDA

Each spectral parameter is a representation of the data set of materials, with each material entry belonging to either the target (“hematite”) or background (“basalts & phyllosilicates”) class. Each SPC therefore gives a 2D representation of the data set, that we perform LDA on to give the linear combination:

$\boldsymbol{spc}_{x,y} = a_{x}\,\boldsymbol{sp}_{x}\left(\boldsymbol{f}_{x}\right) + a_{y}\,\boldsymbol{sp}_{y}\left(\boldsymbol{f}_{y}\right)$ (14)

where a = (a_x,a_y) is the linear projection of the first (sp_x(f_x)) and second (sp_y(f_y)) spectral parameters of the given SPC that maximizes FR(a) (where f_i is the set of filters used to construct sp_i) (Equation 4). We record a such that the SPC can be retrieved and recreated when analyzing natural data sets, and we record the Fisher Ratio as a measure of SPC success.

Evaluating Spectral Parameter Combination Classification Accuracy

The relationship between the Fisher Ratio and the separation of classes relies on the assumption of Gaussian class distributions (Duda et al., 2001). As the distributions of material reflectance across the class populations are not constrained to be Gaussian (e.g., due to the assortment of mineral groups composing the background class), we require a second metric, classification accuracy, to verify the Fisher Ratio representation of class separation.

We evaluate the classification accuracy of the projected data set spc_x,y by finding the class-mean midpoint decision boundary (under the condition of equal class sizes) that separates the classes, assigning a class label to either side of the boundary, and comparing the true class label of each datapoint to the area it falls in. We perform LDA fitting and classification accuracy evaluation on separate training and test data sets respectively. These are drawn by performing stratified random sampling of the complete data set with a train/test ratio of 80/20.

For each SPC, each entry in the test data set is projected and compared to the boundary value and assigned a class label accordingly. The assigned labels are compared to the true labels of the test data set. We count the number of true-positive (TP), true-negative (TN), false-negative (FN) and false-positive (FP) detections, and from these compute the metrics of classification accuracy (ACC, Equation 15), sensitivity (TPR, Equation 16) and specificity (1 − FPR, Equation 17).

$ACC = \frac{TP + TN}{TP + TN + FP + FN}$ (15)

$TPR = \frac{TP}{TP + FN}$ (16)

$FPR = \frac{FP}{FP + TN}$ (17)
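
The midpoint decision boundary and the metrics of Equations 15–17 can be sketched as follows. The function names and the projected values are hypothetical; the boundary is fitted on training projections and applied to separate test projections, as described above.

```python
import numpy as np

def midpoint_boundary(proj_target, proj_background):
    """Class-mean midpoint, and whether the target class lies above it."""
    mu_t, mu_b = np.mean(proj_target), np.mean(proj_background)
    return 0.5 * (mu_t + mu_b), bool(mu_t > mu_b)

def classify(proj, boundary, target_is_above):
    """Label each projected entry True (target) or False (background)."""
    proj = np.asarray(proj, dtype=float)
    return proj > boundary if target_is_above else proj < boundary

def confusion_metrics(predicted, truth):
    """Equations 15-17 from counts of TP, TN, FP and FN."""
    predicted = np.asarray(predicted, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(predicted & truth))
    tn = int(np.sum(~predicted & ~truth))
    fp = int(np.sum(predicted & ~truth))
    fn = int(np.sum(~predicted & truth))
    return {"ACC": (tp + tn) / (tp + tn + fp + fn),
            "TPR": tp / (tp + fn),
            "FPR": fp / (fp + tn)}

# Hypothetical 1D projections: training set fixes the boundary ...
boundary, above = midpoint_boundary([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0])
# ... and the test set is scored against it.
test_proj = [2.5, 3.0, 1.0, 0.2, -0.5, 1.8]
test_truth = [True, True, True, False, False, False]
metrics = confusion_metrics(classify(test_proj, boundary, above), test_truth)
```

In this toy case one target entry falls below the boundary and one background entry falls above it, so each class contributes one error.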

Repeat-Holdout and Re-Substitution

The repeat-holdout method is implemented to evaluate the spectral parameter combinations in a way that reveals sensitivity to variations in the training data set. LDA and classification are repeated for 500 trials and the results are aggregated for analysis. For each trial, random stratified data splitting is repeated with replacement to generate new test/train data sets. The mean and standard deviation for FR and ACC are then computed, giving a table of metrics for each SPC.
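
A minimal sketch of repeat-holdout for a single SPC is given below, using synthetic data. Several choices here are assumptions for illustration rather than the sptk implementation: the function names, the use of the 2-class closed form a ∝ S_W⁻¹(μ_target − μ_background) in place of an eigen-solver, and the evaluation of FR on the training projections of each trial.

```python
import numpy as np

def stratified_split(labels, test_fraction, rng):
    """Boolean train/test masks with the same class ratio in each part."""
    test_idx = []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        test_idx.extend(idx[: int(round(test_fraction * len(idx)))])
    test_mask = np.zeros(len(labels), dtype=bool)
    test_mask[test_idx] = True
    return ~test_mask, test_mask

def lda_direction(x, y):
    """2-class LDA closed form; target projects above background."""
    x_t, x_b = x[y], x[~y]
    s_w = (np.cov(x_t.T, bias=True) * len(x_t)
           + np.cov(x_b.T, bias=True) * len(x_b))
    return np.linalg.solve(s_w, x_t.mean(axis=0) - x_b.mean(axis=0))

def repeat_holdout(x, y, n_trials=500, test_fraction=0.2, seed=0):
    """Mean and standard deviation of FR and ACC over repeated splits."""
    rng = np.random.default_rng(seed)
    fr, acc = [], []
    for _ in range(n_trials):
        train, test = stratified_split(y, test_fraction, rng)
        a = lda_direction(x[train], y[train])
        p_train, p_test = x[train] @ a, x[test] @ a
        p_t, p_b = p_train[y[train]], p_train[~y[train]]
        fr.append((p_t.mean() - p_b.mean()) ** 2 / (p_t.var() + p_b.var()))
        boundary = 0.5 * (p_t.mean() + p_b.mean())
        acc.append(np.mean((p_test > boundary) == y[test]))
    return {"FR_mean": np.mean(fr), "FR_std": np.std(fr),
            "ACC_mean": np.mean(acc), "ACC_std": np.std(acc)}

# Synthetic stand-in for one SPC: 60 target and 60 background entries in 2D.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(3.0, 1.0, size=(60, 2)),
               rng.normal(0.0, 1.0, size=(60, 2))])
y = np.concatenate([np.ones(60, dtype=bool), np.zeros(60, dtype=bool)])
summary = repeat_holdout(x, y)
```

Each trial draws a fresh split from the complete data set, so individual entries can appear in the test set of many trials.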

After repeat-holdout, we repeat the LDA training process on the complete data set and evaluate FR and ACC with re-substitution, to ensure that the computation of a uses all available data. These measures of FR and ACC are faster to compute than the repeat-holdout metrics; the correlation and concordance of these metrics is therefore of interest to the problem of efficient analysis.

Results

For each of 211,575 SPCs we have fitted optimal linear discriminants to the instrument-sampled data set and evaluated the Fisher Ratio (FR) and the classification accuracy (ACC). We have performed 500 repeat trials of training and testing, with random train/test splitting with replacement, and aggregated the results of these to give the mean and variance of the Fisher Ratio (FR_μ, $F{R}_{{\sigma }^{2}}$ ) and classification accuracy (ACC_μ, $AC{C}_{{\sigma }^{2}}$ ). We have also obtained linear discriminants across the complete data set, and evaluated the all-data Fisher Ratio (FR_D) and classification accuracy (ACC_D) by re-substitution validation (i.e., without testing/training splitting). The complete table giving these results for each of the 211,575 SPCs can be found in the data products repository (“/spc_classifier/tables/complete_results.csv” (Stabbins & Grindrod, 2024a)). Here we present the highest scoring SPCs when ranked by these metrics and when the NUC of each SPC is constrained, and we present visualizations and statistical summaries of these metrics across the population of SPCs. The data for each plot and table can be found in the data products repository (Stabbins & Grindrod, 2024a).

Performance of Top Ranked SPCs on Complete Data Set

Overall Top Ranked Spectral Parameter Combinations

Two different SPCs are found to perform best when ranked by FR_μ and ACC_μ: R_500_740 versus R_440_840, and BD_530_840_900 versus BD_440_780_1000, respectively (Figure 2) (see Equations 8–13 for definitions of SPC labels). R_500_740 versus R_440_840 has NUC = 4, using channels L01, L02, R01 and R03, and BD_530_840_900 versus BD_440_780_1000 has NUC = 6, using channels L01, L03, R02, R03, R05, and R06. Figures 2a and 2b show the distribution of the complete data set in the 2D SPC spaces, and the linear discriminant and decision boundary lines fitted to this complete data set, and Figures 2c and 2d show the data set distributions in the 1D LDA projection spaces. The top-ranked SPC by FR_μ of R_500_740 versus R_440_840 yields scores of FR_μ = 6.291 and ACC_μ = 0.991, and the top-ranked SPC by ACC_μ of BD_530_840_900 versus BD_440_780_1000 yields scores of FR_μ = 1.316 and ACC_μ = 0.996. We address the discrepancy between the scores for the different ranking methods in Sections 5.3 and 5.5.

[IMAGE OMITTED. SEE PDF]

Top-Ranked Spectral Parameter Combinations for Each Number of Unique Filter Channels

We can query the results database for SPCs that use a given NUC, such that we can report on the change in class separation performance under data budget constraints. Figure 3 gives the top SPCs when limited to each NUC ∈ [1,6], and when ranked by FR_μ and FR_D (Figure 3a), and ACC_μ and ACC_D (Figure 3b), in contrast to the upper percentile, decile, quartile and median values for each of these ranking metrics (Section 4.2.1). Note that when ranked by ACC_μ there are 3,424 equal top-ranked SPCs for NUC = 5, and when ranked by ACC_D there are non-unique top-ranked SPCs for all cases but NUC = 4. The tabular information for these top-SPCs is provided in the data products repository (“spc_classifier/tables/top_lda_*.csv” (Stabbins & Grindrod, 2024a)), and we comment on these results in the context of minimal-filter selection in Section 5.1.

[IMAGE OMITTED. SEE PDF]

Spectral Parameter Combination Population Performance

Here we report on the distribution of FR and ACC performance across the entire population of SPCs, providing insight to the relationship between these different metrics.

Upper Percentile, Decile and Quartiles of the Fisher Ratio and Classification Accuracy

The values of the 99th, 90th and 75th percentiles (and SPC population sizes that exceed these) are, for FR_μ: 99th = 2.536 (2,116), 90th = 1.706 (21,158), 75th = 1.344 (52,894); and for ACC_μ: 99th = 0.991 (6,530), 90th = 0.982 (21,200), 75th = 0.953 (52,914), illustrated in Figure 3. The population size for the 99th percentile by ACC_μ exceeds that of the equivalent FR_μ population by a significant margin due to the large number of equal-ranked SPCs with ACC_μ = 0.991, as noted previously.
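The effect of tied scores on the population sizes can be illustrated with a short sketch. It assumes that a population "exceeding" a percentile is counted with a greater-than-or-equal comparison, which is our reading of the tie behavior described above; the function name and the toy scores are hypothetical.

```python
import numpy as np

def percentile_summary(scores, percentiles=(99, 90, 75)):
    """Return {percentile: (threshold, number of scores >= threshold)}."""
    scores = np.asarray(scores, dtype=float)
    summary = {}
    for p in percentiles:
        threshold = np.percentile(scores, p)
        summary[p] = (threshold, int(np.sum(scores >= threshold)))
    return summary

# Toy population with six tied top scores (not the study data).
acc = np.array([0.5, 0.7, 0.8, 0.9] + [0.991] * 6)
summary = percentile_summary(acc)
```

In this toy case the six tied scores all meet the 99th-percentile threshold, so the counted population is far larger than 1% of the entries, mirroring the inflation seen for ACC_μ.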

Receiver Operating Characteristic

We report on the classification accuracy of the population of SPCs with a Receiver Operating Characteristic scatter plot (Figure 4a), illustrating the relationship between the mean True-Positive Rate (TPR) (Equation 16) and mean False-Positive Rate (FPR) (Equation 17) (averaged over repeat-holdout trials). Each point on the plot represents the classification performance of a single SPC with a fixed decision boundary, as opposed to representing each SPC with a line of varying decision boundaries as is typical for a Receiver Operating Characteristic diagram. This shows a concentration of SPCs with classification of TPR ∼ 0.95–1 and FPR ∼ 0–0.1, indicated by density contours.
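A minimal sketch of how one point on such a plot could be computed for a single SPC at its fixed decision boundary is given below. It assumes the standard definitions TPR = TP/(TP + FN) and FPR = FP/(FP + TN), and that targets project above the boundary; the function name and the sample values are illustrative.

```python
import numpy as np

def roc_point(projected, is_target, boundary):
    """TPR and FPR of one classifier with a fixed decision boundary."""
    projected = np.asarray(projected, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    predicted = projected > boundary  # assumed: target lies above the boundary
    tp = np.sum(predicted & is_target)
    fn = np.sum(~predicted & is_target)
    fp = np.sum(predicted & ~is_target)
    tn = np.sum(~predicted & ~is_target)
    return tp / (tp + fn), fp / (fp + tn)

# Four target and four background samples in the 1D projection space.
projected = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1, 0.3, 0.6]
is_target = [1, 1, 1, 1, 0, 0, 0, 0]
tpr, fpr = roc_point(projected, is_target, boundary=0.5)
```

Averaging the two rates over the repeat-holdout trials would give the mean TPR and FPR plotted for each SPC.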

[IMAGE OMITTED. SEE PDF]

Uni- and Bi-Variate Distributions of Mean Fisher Ratio and Mean Classification Accuracy

The univariate axes of Figure 4b show the negatively skewed ACC_μ and the positively skewed FR_μ distributions, and the bivariate plot shows the relationship between these. The relationship between FR_μ and ACC_μ is nonlinear (Figure 4b) with a regression coefficient of r² = 0.547. SPCs with FR_μ exceeding or equal to the 99th percentile (2.536) have ACC_μ≳0.9. As FR_μ decreases, the lower limit of ACC_μ decreases, and consequently the range of ACC_μ at a given value of FR_μ increases with decreasing FR_μ.

All-Data Metrics Versus Repeat-Holdout Mean Metrics

Figures 4c and 4d show the distributions of FR_D versus FR_μ±FR_σ and ACC_D versus ACC_μ±ACC_σ, to visualize the concordances and linearities between the all-data and repeat-holdout derived metrics, and the variance of these metrics across the repeat-holdout trials. FR_μ and FR_D (Figure 4c) exhibit a concordance correlation coefficient of 0.9998, and a Pearson correlation coefficient of 0.9999. ACC_μ and ACC_D (Figure 4d) exhibit a concordance correlation coefficient of 0.9922, and a Pearson correlation coefficient of 0.9902. The spread of ACC_D about the concordance line appears to be greater than for FR_D, and ACC_σ appears to typically be greater than FR_σ with respect to the total range of the data. However, when considering the density of the distribution of SPCs by ACC, we see a concentration (indicated by white/gray shaded region) about the line of concordance. That is, the SPCs that visibly deviate from the line of concordance represent a small percentage of the SPC population, in agreement with the high concordance and Pearson correlation coefficients for ACC. As expected, we see that there is an asymmetry in the distribution of SPCs about the line of concordance, showing that for a given interval of values of ACC_D, ACC_μ is typically less than ACC_D.
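The distinction between the two coefficients can be sketched as follows. We assume here that the concordance correlation coefficient is Lin's form, which penalizes any offset from the 1:1 line, whereas the Pearson coefficient measures linearity only; the function names and the toy scores are illustrative.

```python
import numpy as np

def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
    return 2.0 * cov_xy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    return np.corrcoef(x, y)[0, 1]

# Toy all-data scores and holdout means with a constant offset of 0.02.
all_data = np.array([0.90, 0.95, 0.99, 1.00])
holdout_mean = all_data - 0.02
ccc = concordance_ccc(all_data, holdout_mean)
r = pearson_r(all_data, holdout_mean)
```

A constant offset leaves the Pearson coefficient at 1 but lowers the concordance coefficient, which is why both are useful when checking whether an all-data metric can stand in for its repeat-holdout mean.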

Mean versus Coefficient-Of-Variation Analysis of Fisher Ratio and Classification Accuracy

The plots of FR_σ/FR_μ versus FR_μ and ACC_σ/ACC_μ versus ACC_μ of Figures 4e and 4f show the amount of variation of each metric across the 500 repeat trials performed, indicating the sensitivity of each SPC to changes in the data set. Figure 4e shows that for FR_μ≳1.8, the coefficient of variation is typically constrained to $\frac{F{R}_{\sigma }}{F{R}_{\mu }}\lesssim 0.2$, and that the highest FR_μ SPCs lie close to this boundary. For each value of FR_μ we see that the lowest $\frac{F{R}_{\sigma }}{F{R}_{\mu }}$ valued SPCs form a boundary (the “efficient frontier”), and we see that all SPCs lying on this boundary exhibit ACC_μ≳0.95 (indicated by hue). We see that as FR_μ→0, FR_σ typically exceeds FR_μ. The ACC_σ/ACC_μ versus ACC_μ distribution is notably distinct from the FR_σ/FR_μ versus FR_μ distribution, with the highest ACC_μ SPCs (ACC_μ≥0.991) also exhibiting the lowest ACC_σ/ACC_μ values of <0.02, and with the highest FR_μ SPCs (FR_μ > 6) also appearing in this region, indicated by hue.
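One possible way to compute the coefficient of variation and to trace an approximate efficient frontier is sketched below. The binning of the mean metric, the function names, and the synthetic trial scores are assumptions made for illustration; the frontier in Figure 4e is identified visually rather than by this procedure.

```python
import numpy as np

def coefficient_of_variation(trial_scores):
    """trial_scores: (n_spc, n_trials) array of per-trial metric values."""
    mean = trial_scores.mean(axis=1)
    std = trial_scores.std(axis=1, ddof=1)
    return mean, std / mean  # assumes strictly positive means

def efficient_frontier(mean, cv, n_bins=10):
    """Index of the lowest-CV SPC within each bin of the mean metric."""
    edges = np.linspace(mean.min(), mean.max(), n_bins + 1)
    bin_index = np.clip(np.digitize(mean, edges) - 1, 0, n_bins - 1)
    frontier = []
    for b in range(n_bins):
        members = np.flatnonzero(bin_index == b)
        if members.size:
            frontier.append(members[np.argmin(cv[members])])
    return np.array(frontier)

# Synthetic scores: 100 SPCs, 500 trials each, 10% relative scatter.
rng = np.random.default_rng(1)
true_means = rng.uniform(0.5, 6.0, size=100)
trials = rng.normal(true_means[:, None], 0.1 * true_means[:, None], size=(100, 500))
mean, cv = coefficient_of_variation(trials)
frontier = efficient_frontier(mean, cv)
```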

Discussion

On the Recommended Filter Subsets

We have identified the highest performing SPCs, and thereby the highest performing filter subsets (Figure 2), and we have identified the highest performing SPCs and filter subsets for each number of unique channels. Here we discuss a decision process for selecting the highest performing minimal filter subset, using the information collected in Figure 3, that we now consider in the context of the complete set of results.

In agreement with the implications of Figure 4c we see that ranking by FR_μ and FR_D each select the same SPCs, with almost identical FR values, and that a unique SPC is found for each NUC (Figure 3a). As the evaluation of FR_D does not require repeat-holdout, we recommend ranking by FR_D as opposed to FR_μ. We see that the peak FR occurs for NUC = 4, with the implication that using >4 filters either limits the inter-class distance, or amplifies the intra-class variance. This implies that the Fisher Ratio favors SPCs that use shared channels, which may be a result of high correlations between the constituent spectral parameter distributions, resulting in class separations achieved with a projection with low variance (also exemplified by Figure 2a). Figure 3a shows that, whilst NUC = 1 yields a score near the upper quartile, all other NUCs have scores exceeding the upper percentile. If we choose the upper percentile as an arbitrary threshold that an SPC score must exceed, then we can conclude that ranking by FR_D produces a recommended minimal subset of NUC = 2 using the L01 and L02 filters, using the spectral parameters of R440 and R_440_500 (Figures 5a and 5c).

[IMAGE OMITTED. SEE PDF]

When ranking by ACC_μ and ACC_D we see first that a set of SPCs and filter subsets are selected distinct from those recommended by FR, and second that evaluation by re-substitution yields multiple SPCs with equal scores for all but NUC = 4 (Figure 3b), and multiple SPCs with “perfect” scores of ACC_D = 1.0. This confirms that ACC evaluation by re-substitution is not a suitable method for ranking SPCs, and thus we recommend ranking by ACC_μ as opposed to ACC_D. We also see that for NUC = 5 when ranked by ACC_μ there are multiple equal-ranked SPCs. This implies that ACC_μ does not offer sufficient granularity to distinguish all SPCs, an issue that could be addressed through increasing the size of the testing data set, through adjustment of the train/test ratio, or sourcing a greater number of material entries. Of the SPCs selected by ACC_μ, we see that all top-SPCs for NUC > 1 have scores near the upper percentile. Taking the upper percentile as an arbitrary threshold of performance, we find that the minimum subset of filters recommended when ranking by ACC_μ has NUC = 3, using filters L01, L02 and L03, via the spectral parameters of R_440_500 and R_440_530 (Figures 5b and 5d).
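The threshold-based decision process described above can be sketched as a simple search over the top score at each NUC. The function name is hypothetical, and the scores in the example are illustrative placeholders rather than values read from Figure 3; only the 99th-percentile threshold (2.536) and the NUC = 4 peak (6.291) are taken from the results.

```python
def minimal_filter_subset(top_score_by_nuc, threshold):
    """Return the smallest NUC whose top-ranked score exceeds the threshold."""
    for nuc in sorted(top_score_by_nuc):
        if top_score_by_nuc[nuc] > threshold:
            return nuc
    return None  # no subset size meets the requirement

# Placeholder top FR_D scores per NUC (illustrative, except NUC = 4).
top_fr = {1: 1.4, 2: 3.0, 3: 4.0, 4: 6.291, 5: 5.0, 6: 4.5}
recommended_nuc = minimal_filter_subset(top_fr, threshold=2.536)
```

With these placeholder scores the search returns NUC = 2, matching the form of the FR_D recommendation; an absolute threshold could be substituted for the percentile without changing the procedure.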

Performance of Hematite Indicative Spectral Parameters of the Literature

We can place the spectral parameters used for the identification of hematite in previous studies (Fraeman et al., 2020) into the context of the results of this study by substituting the constituent spectral channels for the nearest afforded by PanCam. Of the 5 spectral parameters listed by Fraeman et al. (2020), 4 can be approximated uniquely with the PanCam filter suite, labeled as (as detailed through Equations 10 and 11): BD_740_840_1000, after BD867 (Bell et al., 2000) and BD860_2 (Viviano-Beck et al., 2014); BD_440_530_670, after BD527 (Wellington et al., 2017); BD_500_530_610, after BD535 (Johnson et al., 2015); and S_740_840, after S750:840 (Johnson et al., 2015). The data separation, rankings and metric scores achieved by these single spectral parameters are shown in Figure 6.
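The substitution of literature band centers with the nearest available channel can be sketched as a nearest-neighbor lookup. The list of center wavelengths below is assembled only from the SPC labels that appear in this paper and is not a complete PanCam filter table; the function name is hypothetical.

```python
def nearest_channel(wavelength_nm, channel_centres_nm):
    """Return the channel centre closest to the requested wavelength."""
    return min(channel_centres_nm, key=lambda c: abs(c - wavelength_nm))

# Nominal centres taken from SPC labels in the text (assumed, incomplete).
centres = [440, 500, 530, 610, 670, 740, 780, 840, 900, 1000]

# Band centres of the literature parameters BD867, BD527, BD535 and S750:840.
mapped = {nm: nearest_channel(nm, centres) for nm in (867, 527, 535, 750, 840)}
```

Under these assumed centers, 867 nm maps to the 840 nm channel and both 527 and 535 nm map to the 530 nm channel, in line with the labels BD_740_840_1000, BD_440_530_670 and BD_500_530_610.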

[IMAGE OMITTED. SEE PDF]

Ranking by either FR_μ or ACC_μ gives the same ordering to the literature spectral parameters of #1 BD_740_840_1000, #2 S_740_840, #3 BD_440_530_670 and #4 BD_500_530_610. With respect to the population, by ACC_μ, BD_740_840_1000 can be considered to perform well (93rd Percentile Rank (P.R.)), S_740_840 performs moderately (54th P.R.), whilst BD_440_530_670 and BD_500_530_610 perform poorly (18th and 11th P.R.s respectively). By FR_μ, BD_740_840_1000 can be considered to perform well (90th P.R.), whilst S_740_840, BD_440_530_670 and BD_500_530_610 all perform poorly (39th, 24th and 11th P.R.s respectively). That the highest rank achieved is #15239 (93rd P.R., BD_740_840_1000 ranked by ACC_μ) shows that there exist many (∼15,000) SPCs capable of greater separation than the literature spectral parameters, under the conditions and metric definitions of this study and method. The literature SPCs, as single spectral parameters, have NUCs of 3 or 2, whilst the top-ranked SPCs for these NUC values are all in the 99th P.R., with the exception of NUC = 2 in the 95th P.R., when ranked by ACC_μ (Figure 3). This highlights the ability of the method to identify low-NUC SPCs with high performance, and the potential for improved performance by using combinations over single spectral parameters.

Interpretation of the Spectral Parameter Combinations

With regard to the literature spectral parameters, BD_740_840_1000 and S_740_840 both exhibit lower percentile rank by FR_μ (1.7, 90th P.R. and 0.645, 39th P.R. respectively) relative to ACC_μ (0.986, 93rd P.R. and 0.883, 54th P.R. respectively). By visual inspection of Figure 6a we see that the coexistence of these scores is explained by the opposite and high skewness of each class, such that visible separation is achieved (i.e., high ACC), whilst the total intra-class variance remains high (i.e., low Fisher Ratio) (Figure 6b). These examples explain the existence of low FR_μ and high ACC_μ SPCs as consequences of unequal class covariance (Figure 6a) and non-Gaussian class distributions (Figure 6b), both of which violate assumptions that are typically enforced when conventionally applying LDA (Duda et al., 2001). This highlights the importance of verifying the Fisher Ratio representation of class separation against the classification accuracy.

As discussed by Fraeman et al. (2020) the literature spectral parameters are chosen to uniquely identify crystalline (“red”) hematite, whereas here we have not distinguished crystalline from specular (“gray”) samples in the target class. We can speculate that the positive-skew and fat-tail distributions of the hematite class in the BD_740_840_1000, BD_440_530_670, and BD_500_530_610 distributions (Figure 6) represent the separation of crystalline (“red”) and specular (“gray”) samples; validation of this could be achieved through isolation of the tail population and consultation of the metadata of the samples in VISOR (or the respective source spectral library) for grain size information. As this study focuses on deriving spectral parameter combinations from reflectance spectra only, we reserve the incorporation of additional metadata for future development.

Given the previous use of the 840 and 530 nm Band Depths and 740 versus 840 nm Slope spectral parameters of the literature, we might expect the top-ranked SPCs to also feature these, or similar, spectral parameters. Indeed, the top SPC for NUC = 1 is R840 under all rankings (Figure 3), the top SPC across all NUC when ranked by ACC_μ uses an 840 nm centered Band Depth (BD_530_840_900), and when ranked by FR_μ, although no direct equivalents of the literature spectral parameters are used, the constituent parameters use the 740 and 840 nm channels (R_500_740 and R_440_840; Figure 2). The literature spectral parameter band depths are typically narrow and therefore trivial to interpret as absorption features; conversely, we note that the wings of the band depths composing the top-ranked SPC by ACC_μ are too broad to be interpreted as direct measurements of absorption. We can interpret BD_530_840_900 as simultaneously measuring the slope between the troughs of the 840–910 nm (⁶A₁➔ ⁴T₁) and the 535 nm (2(⁶A₁)➔2(⁴T₁)) transition absorption bands, and the slope across the 840–910 nm trough, and BD_440_780_1000 as measuring the 780 nm peak with respect to the VNIR short- and long-wavelength limits (Fraeman et al., 2020 and references therein). We would not expect either of these parameters alone to provide unique separation of hematite, verified by the visibly poor separation in the univariate plots of Figure 2b, and by low ACC_μ (0.466, 0th P.R. and 0.686, 15th P.R. for BD_530_840_900 and BD_440_780_1000 respectively), yet the combination provides the highest ranking ACC_μ score. This is a result not just of the combined observations on hematite samples, but of the contrast with the observations on the background class, which, as we see from Figure 2b, typically give negative values (concave curvature). This highlights the strength of this method in finding otherwise unintuitive spectral parameter combinations. Similarly, for the top-ranked SPC by FR_μ the individual spectral parameters perform poorly (both in the first P.R.).

The top minimal-NUC SPCs (Figure 5) share the use of the L01 and L02 filters, such that both focus on the relative reflectances between the short-wavelength single- and pair-electron transitions in the 400–535 nm region (Fraeman et al., 2020 and references therein), distinguishing between the near-flat profile exhibited by all hematite samples in this range in contrast to the typical positive slope of the basalt and phyllosilicate samples. It is notable that when the NUC is minimized in this way the infrared wavelengths featured in the SPCs of Figure 2 are discarded in favor of the short-wavelengths, in contrast to the expected importance of the depth of the ∼840 nm band and ∼535 nm bands relative to the neighboring red-to-infrared channels. In contrast to the overall top-ranked SPCs (Figure 2), the constituent spectral parameters of these minimal-NUC SPCs visibly provide some separation of the classes (Figure 5), albeit with ACC_μ scores below the median (0.778 (30th P.R.), 0.803 (34th P.R.), and 0.791 (32nd P.R.) for R440, R_440_500, and R_440_530 respectively).

Implications of Fisher Ratio and Classification Accuracy Distributions on Instrument Material Discrimination

The distributions of FR_μ or ACC_μ provide insight to the performance of the given multispectral instrument on the given material discrimination task. In this study, the ACC_μ upper quartile of 0.953 can be considered high; that is, we have found over 50,000 SPCs capable of classifying the data set with ≤5 misclassifications per 100 samples. This simple metric of the SPC population implies that the separation task is not difficult for the given target and background class and multispectral instrument, as expected given the discussion and results of Fraeman et al. (2020) (and references therein). This is verified by the more comprehensive ROC plot (Figure 4a), where the clustering in the top left axes (TPR ∼ 1, FPR ∼ 0) shows that many of the SPCs found can be considered near-perfect classifiers. The ACC_μ upper quartile and ROC plot could conceivably be implemented as a method of comparing the abilities of alternative multispectral instruments to discriminate target and background materials, building on the cross-comparison of Grindrod et al. (2022). The FR_μ upper quartile of 1.344 is less intuitively interpreted. If we take the square-root of Equation 3, we can interpret $\sqrt{F{R}_{\mu }}$ as a measure of the distance between the class means in terms of the quadrature sum of the class standard deviations (σ_t), indicating the statistical significance of the separation. Under this interpretation, we can report that the FR_μ upper quartile here implies a separation of 1.16σ_t. As an alternative to using the upper percentile as a threshold for selection of the top minimal-NUC SPCs, we could instead define absolute (as opposed to population-relative) limits, such as ACC_μ > 0.99 and $\sqrt{F{R}_{\mu }} > \,2$ , to enforce minimum separability requirements. However, as we explore alternative data sets and separation problems, we may find that these thresholds require revision, hence our recommendation of the upper percentiles as thresholds.
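The interpretation of the square root can be written out explicitly. We assume here that Equation 3 takes the form of the squared difference of the class means over the sum of the class variances, consistent with the terms quoted for Equation 4; under that assumption the upper-quartile value works through as:

```latex
\sqrt{FR_{\mu}}
  = \frac{\left|\mu_a - \mu_b\right|}{\sqrt{\sigma_a^{2} + \sigma_b^{2}}}
  = \frac{\left|\mu_a - \mu_b\right|}{\sigma_t},
\qquad
\sqrt{1.344} \approx 1.16
\;\Rightarrow\;
\left|\mu_a - \mu_b\right| \approx 1.16\,\sigma_t .
```

By the same reasoning, the absolute limit of $\sqrt{F{R}_{\mu }} > 2$ is equivalent to requiring FR_μ > 4.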

Ranking by Fisher Ratio Versus Ranking by Classification Accuracy

We have reported on the rankings of SPCs by both metrics of FR and ACC in parallel. ACC is the more direct measure of successful separation of a target from a background data set, as exemplified in Figure 2, where, when applied to the complete data set, the highest ranked FR_μ SPC included a misclassification event, whereas the highest ranked ACC_μ SPC did not. However, ACC_μ requires repeated training and testing cycles on randomly resampled data sets and the additional computational steps of defining a decision boundary and evaluating the projected data set against this, whereas FR_D can be evaluated on the complete data set just once (Figure 4c). In this study, trials were repeated 500 times as typically the metric mean and variance settled at near steady values after this number. The computation, implemented to perform LDA across all SPCs in parallel, requires 22 min 32 ± 1.5 s, whereas training the LDA coefficients on the complete data set just once for all SPCs takes 1.95 ± 0.01 s, using a consumer-grade laptop (Apple MacBook Air M2 2022). Therefore it is of interest to consider if FR_D can be used as a proxy for ACC_μ, in time-constrained dynamic spacecraft operation scenarios. As demonstrated in Figure 4b, the relationship between FR_μ, and by approximation FR_D (Figure 4c) and ACC_μ is highly nonlinear (r² = 0.547). This is expected: under the assumption that both classes in the data set are Gaussian distributed, as FR increases by either an increase in ${\left({\mu }_{a}-{\mu }_{b}\right)}^{2}$ or a decrease in ${\sigma }_{a}^{2}+{\sigma }_{b}^{2}$ (Equation 4) the area of overlap between the distributions, represented by ACC, would decrease at a rate proportional to the gradients of the two distributions, that tend to 0 far from the mean. This is evident in the lower boundary of ACC_μ values as FR_μ increases (Figure 4b). 
The range of ACC_μ exhibited at each FR_μ value, that has an upper value of ACC_μ ∼ 1 for much of the range of FR_μ, can be explained by considering that there is no requirement for each spectral parameter to represent each class with a Gaussian distribution. The key implication of this observation is that an SPC with a high rank by FR_μ is constrained to also have a high ACC_μ, but an SPC with a high rank by ACC_μ is not constrained to have a high FR_μ. Therefore, we conclude that in time-constrained situations FR_D can be used as a computationally efficient proxy to the classification accuracy, but where time allows, ACC_μ should be evaluated and used as the ranking metric. However, we emphasize that this conclusion is drawn for this specific case study, and further investigations of alternative target and background material types should be conducted before generalizing this conclusion. Neglected from this discussion is consideration of the potential cost of misclassification as a consequence of poor filter selection. This is important in the evaluation of this method, but requires systematic consideration of the operational scenarios that the ExoMars Rover, or similar data-limited dynamic exploration missions, may encounter; we reserve such a study for future development of the method.
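The saturating relationship expected under the Gaussian assumption can be sketched for an idealized case. The example assumes two classes with equal variance and equal sample numbers, separated by a boundary at the midpoint of the class means; these simplifications, and the function name, are ours and do not describe the study data.

```python
import math

def gaussian_accuracy(fisher_ratio):
    """Accuracy of a midpoint boundary for two equal-variance Gaussians."""
    # With equal variances, FR = d^2 / (2 sigma^2), so the boundary lies
    # d / (2 sigma) = sqrt(FR / 2) standard deviations from each mean.
    z = math.sqrt(fisher_ratio / 2.0)
    # Standard normal cumulative distribution evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Accuracy gained per unit increase in FR at low and at high FR.
gain_low = gaussian_accuracy(2.0) - gaussian_accuracy(1.0)
gain_high = gaussian_accuracy(7.0) - gaussian_accuracy(6.0)
```

The gain in accuracy per unit of Fisher Ratio shrinks as the Fisher Ratio grows, which is the nonlinear lower boundary described above; departures from these idealized assumptions account for the SPCs lying above that boundary.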

On the Fisher Ratio and Classification Accuracy Repeat-Holdout Variance

As we have used material reflectance spectra measured on prepared samples under laboratory conditions, we can expect data sets collected from natural scenes to exhibit noise, and thus it is preferable to select an SPC that exhibits robustness to data set variance. Figure 4e illustrates that for SPCs with FR_μ ≳ 1.8 there is a trade-off between FR_μ and the coefficient of variation, such that the higher ranking SPCs by FR_μ are not amongst the SPCs with the lowest scoring $\frac{F{R}_{\sigma }}{F{R}_{\mu }}$ values. With regard to classification accuracy, the relationship between ACC_μ, ACC_σ/ACC_μ, and FR_μ illustrated in Figure 4f implies that high FR_μ, and therefore high ACC_μ, SPCs will have scores that have low sensitivity to variations in the data sets, with $\frac{AC{C}_{\sigma }}{AC{C}_{\mu }}< 0.1$. Whilst it would be possible to incorporate the $\frac{F{R}_{\sigma }}{F{R}_{\mu }}$ score into the SPC ranking process, we have not implemented this, as $\frac{F{R}_{\sigma }}{F{R}_{\mu }}$ requires repeat-holdout, and thus its evaluation loses the efficiency advantage that FR_D otherwise holds over ACC_μ, which we have established takes precedence as a ranking metric. In further investigations using more diverse data sets of background and target materials, and when validating the method on natural scenes and incorporating instrument noise into the analysis, we recommend that changes to the mean versus coefficient-of-variation distributions should be tracked, for insights into the effects of these additional uncertainty sources on the use of LDA.

Method Limitations and Scope for Further Development

Limitations

We have demonstrated that the method of constructing dimension reductions using linear combinations of spectral parameters provides an efficient means for exploring the separability of a spectral data set for a given multispectral imaging system, and for finding viable combinations of minimal numbers of multispectral filters. The key limitations to the study are associated with the choice of data set, namely (a) the exclusive use of spectra of pure-endmember minerals, (b) the exclusive use of spectra of powdered samples captured under fixed phase angles, (c) the restricted scope of background and target materials used, and (d) the non-Gaussian class distributions. To address limitations (a), (b), and (c), before application of the method in operational scenarios the recommended SPCs should be trialled on sets of mixed minerals, expressed in comprehensively characterized bulk samples, or complete natural scenes, under natural illumination conditions, and using background materials that have been selected as analogs of Oxia Planum under more rigorous criteria. Currently no comprehensive analogs of Oxia Planum have been identified on Earth, and so validation studies using natural scenes, such as those captured in previous PanCam analog campaigns (Allender et al., 2021; Harris et al., 2015), may require complementary dedicated laboratory studies combined with synthetic investigation through spectral imaging simulation (e.g., using methods developed by Stabbins, 2022). Extending the data set in this way will crucially provide more robust evaluation of the discriminatory ability of the SPCs, in particular those that incorporate multiple broad spectral features that have potential for false-positive target material identification, but also more robust training of the linear discriminants, through exposure to a greater diversity of materials that may occur in natural scenes on Mars.
Limitation (d) is unavoidable for this task, as there is no a priori condition for target and background material collections of assorted mineral types to have Gaussian distributed reflectance. Despite this, this case-study has found that high Fisher Ratio scores constrain high classification accuracy scores. However, the generality of this observation must be investigated further, through testing on a diverse set of target and background classification tasks, as applicable to the context of the Oxia Planum landing site to be explored by the ExoMars Rosalind Franklin rover.

On the Influence of Dust

At Oxia Planum we can expect our material signatures to be mixed with those of the global Martian surface dust (Kinch et al., 2015), including a ferric signature characterized by strong ratios of the form R_[>600 nm]_[<600 nm] and a shallow 850 nm absorption similar to hematite, but without a ∼535 nm band, such that we can expect the hematite to be separable from the dust (Fraeman et al., 2020). We can expect the SPC rankings to be sensitive to the inclusion of dust, penalizing SPCs that rely on R_[>600 nm]_[<600 nm] type spectral parameters (e.g. the SPC of Figures 2a and 2c). To investigate the implications of dust on the LDA method and SPC recommendations, further studies should not only include dust as an endmember in the background material set, but also consider linearly and nonlinearly modeled mixtures of dust with both the background and target minerals (Johnson et al., 2006).

Minimal Filters for Multiple Problems

In this study we have limited ourselves to the 2-class problem, such that we only consider paired combinations of spectral parameters. Extending to n-class LDA requires the consideration of n + 1 spectral parameters in each combination, with the number of SPCs increasing exponentially with n + 1; to keep this method computationally light-weight we have therefore not extended to the multi-class case. However, the existing scheme may still be used for multi-class classification, if multiple classification tasks are defined as separate problems, and the result tables of SPCs (one for each classification problem) are compared to find the highest scoring SPCs that use common filter subsets across all problems. This can be interpreted as utilizing the variance (i.e., information) of the data set in SPC space perpendicular to the linear discriminant (e.g., Figure 2), such that information discarded in one SPC can be utilized in another.
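A comparison of result tables across problems could be sketched as follows. The table layout (a mapping from filter subset to the best score achieved with that subset), the worst-case ranking rule, the function name, and all scores are assumptions made for illustration; other ranking rules, such as the mean score across problems, would be equally plausible.

```python
def best_shared_subsets(problem_tables):
    """Rank filter subsets present in every table by their worst-case score."""
    shared = set(problem_tables[0])
    for table in problem_tables[1:]:
        shared &= set(table)  # keep only subsets available to all problems
    def worst(subset):
        return min(table[subset] for table in problem_tables)
    ranked = sorted(shared, key=worst, reverse=True)
    return [(sorted(subset), worst(subset)) for subset in ranked]

# Hypothetical best scores per filter subset for two separate 2-class problems.
problem_a = {frozenset({"L01", "L02"}): 0.95,
             frozenset({"L01", "L02", "L03"}): 0.99,
             frozenset({"R01", "R03"}): 0.97}
problem_b = {frozenset({"L01", "L02"}): 0.90,
             frozenset({"L01", "L02", "L03"}): 0.93,
             frozenset({"L03", "R02"}): 0.98}
shared_ranking = best_shared_subsets([problem_a, problem_b])
```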

Mapping Spectral Parameter Combinations to PanCam Image Products

An SPC learned from a spectral library can be applied directly to PanCam images of natural scenes by evaluating spc_x,y (Equation 14) for each pixel, via the constituent spectral parameters (Equations 8–13), and then producing a binary classification mask by comparing spc_x,y to the boundary value, and highlighting pixels with positive target material detections (e.g., hematite). To preserve the information contained in the range of spc_x,y values, we could alternatively produce a novel diverging colormap, centered on the boundary value, with color saturation reached at the mean value of each class. Future studies should explore these opportunities for novel image representations.
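A per-pixel evaluation of this kind might look like the sketch below. It assumes that Equation 14 is a linear combination of the two constituent spectral parameter images, and that the target class projects above the boundary; the function names, coefficients and pixel values are illustrative.

```python
import numpy as np

def spc_image(sp_a, sp_b, coeff_a, coeff_b):
    """Assumed form of spc_x,y: a linear combination of two parameter images."""
    return coeff_a * sp_a + coeff_b * sp_b

def target_mask(spc, boundary):
    """Binary detection mask: True where the target material is detected."""
    return spc > boundary

def diverging_scale(spc, boundary, target_mean, background_mean):
    """Scale to [-1, 1]: 0 at the boundary, saturating at each class mean."""
    # Assumes background_mean < boundary < target_mean.
    scaled = np.where(spc >= boundary,
                      (spc - boundary) / (target_mean - boundary),
                      (spc - boundary) / (boundary - background_mean))
    return np.clip(scaled, -1.0, 1.0)

# Toy 2x2 spectral parameter images.
sp_a = np.array([[0.0, 0.25], [0.5, 1.0]])
sp_b = np.array([[0.0, 0.25], [0.5, 1.0]])
spc = spc_image(sp_a, sp_b, coeff_a=1.0, coeff_b=1.0)
mask = target_mask(spc, boundary=0.5)
scaled = diverging_scale(spc, boundary=0.5, target_mean=1.0, background_mean=0.0)
```

The scaled image can then be displayed with any diverging colormap over the fixed range of -1 to 1, so that the neutral color marks the decision boundary.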

Applicability of the Method Beyond ExoMars

The method is in principle applicable to any VNIR multispectral imaging experiment, in particular those operated on board dynamic exploration missions; this instrument-generality could be investigated trivially by substitution of the filter transmission profiles. Application of the method to current Mars rover multispectral imagers (Bell et al., 2017, 2021) or future sample collecting missions such as the MMX OROCHI instrument (Kameda et al., 2021) could identify minimal subsets of filters for prioritized transmission under data volume and time constrained scenarios, effectively improving the data efficiency of science-driven operation decisions. A key extension of the method that would become pertinent in this cross-instrument comparison would be the incorporation of instrument noise, via synthetic generation of Gaussian distributed reflectance values according to the expected instrument signal-to-noise ratio. The additional noisy samples would directly affect the Fisher Ratio and classification accuracy scores, such that the method would otherwise not need modification.
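Such a noise extension could be sketched as below, assuming that the noise on each channel is Gaussian with a standard deviation equal to the reflectance divided by the signal-to-noise ratio. The function name, the SNR value and the reflectance values are hypothetical.

```python
import numpy as np

def add_instrument_noise(reflectance, snr, n_draws, rng):
    """Draw noisy copies of a sampled spectrum at a given signal-to-noise ratio."""
    reflectance = np.asarray(reflectance, dtype=float)
    sigma = reflectance / snr  # assumed noise model: sigma = signal / SNR
    return rng.normal(loc=reflectance, scale=sigma,
                      size=(n_draws,) + reflectance.shape)

# One three-channel spectrum, 1,000 noisy realizations at SNR = 100.
rng = np.random.default_rng(2)
spectrum = np.array([0.10, 0.30, 0.45])
noisy = add_instrument_noise(spectrum, snr=100.0, n_draws=1000, rng=rng)
```

The noisy realizations would be appended to the labeled data set before computing the spectral parameters, leaving the ranking steps unchanged.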

Conclusions

In order to investigate ways of selecting and combining a minimal subset of multispectral filters to perform a material identification task, we have developed and demonstrated a method for exhaustively evaluating and ranking the Fisher Ratio of a labeled data set of material reflectance spectra across all pair combinations of all channel permutations of a fixed set of spectral parameter types. The method provides recommendations of filter combinations in terms of optimal linear combination coefficients of pairs of spectral parameters, and we have applied the method to the task of separating hematite from a mixture of basalts and phyllosilicates observed with the multispectral filters of PanCam for the ExoMars Rover. We have found that a maximum accuracy of 99.6% can be achieved with 6 filters, and that when limited to 3 filters an accuracy of 99.2% can be achieved. We have found that when ranked by the more efficiently computed Fisher Ratio, the top spectral parameter combination achieves an accuracy of 99.1% with 4 filters, and when limited to 2 filters an accuracy of 95.1% is achieved. We have found that the top scoring Fisher Ratio spectral parameter combinations (FR_μ > 2.536 (99th P.R.)) exhibit high classification accuracy (ACC_μ ≳ 0.9) and low sensitivity to variations of the data set (ACC_σ/ACC_μ < 0.02). As the Fisher Ratio can be evaluated on a timescale of ∼1 min on all spectral parameter combinations and across the complete data set in parallel, we conclude that the method of evaluating and ranking spectral parameter combinations by Fisher Ratio shows promise for facilitating time-sensitive command sequencing for data-limited dynamic spacecraft mission operations. However, when timescales allow for processing times of >20 min we recommend the evaluation of linear discriminant classification accuracy over 500 repeat-holdout trials. 
This conclusion is drawn under the specific conditions of this study, namely the materials selected and their representative reflectance spectra, and should be investigated on a diverse range of material discrimination tasks and validated against natural scenes before deployment during spacecraft operations. We have also found that the analysis offers a novel method for investigating the ability of a given multispectral instrument to perform a given material discrimination task, and for finding novel and unintuitive spectral parameters and combinations that maximize class separation. We have found for this specific study that there are ∼50,000 spectral parameter combinations that score >95% classification accuracy, and that more than 15,000 spectral parameter combinations yield Fisher Ratio and classification accuracy scores greater than the spectral parameters used previously in the literature for crystalline hematite discrimination. We anticipate that further novel spectral parameter combinations could be discovered through systematic exploration of the materials cataloged by the VISOR spectral library collection, not only for the ExoMars PanCam instrument, but any VNIR multispectral imaging system.

Acknowledgments

RBS, PMG, CRC, and EJA thank the UK Space Agency for support (Grants ST/T001747/1 and ST/Y005910/1). SM acknowledges a UK Science and Technology Facilities Council (STFC) PhD studentship (Grant ST/R504961/1). We thank 2 anonymous reviewers and A. Fraeman for their insightful comments that have helped to improve the quality of this manuscript.

Data Availability Statement

The spectral library (mineral reflectance data) used for training and testing the Spectral Parameter Combination Classifiers, and the instrument data used to sample the spectral library, in the study are available at Zenodo via (R. Stabbins & Grindrod, 2024b). All processed data produced for this study are available at Zenodo via (R. Stabbins & Grindrod, 2024a). Version 0.1 of the Spectral Parameters Toolkit (sptk) used for resampling the spectral library with the instrument data, computing the spectral parameters and performing the training and testing of the Spectral Parameter Combination Classifiers is preserved at , available via the MIT license and developed openly at (R. Stabbins & Grindrod, 2024c). The Jupyter Notebook to execute the analysis in the paper is hosted at and is preserved at (R. Stabbins & Grindrod, 2024d).

References

Allender, E. J., Cousins, C. R., Gunn, M. D., & Mare, E. R. (2021). Multiscale spectral discrimination of poorly‐crystalline and intermixed alteration phases using aerial and ground‐based ExoMars rover emulator data. Icarus, 114541. https://doi.org/10.1016/j.icarus.2021.114541

© 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

In this paper we address two problems associated with data‐limited dynamic spacecraft exploration: data‐prioritization for transmission, and data‐reduction for interpretation, in the context of ESA ExoMars rover multispectral imaging. We present and explore a strategy for selecting and combining subsets of spectral channels captured by the ExoMars Panoramic Camera. As a test case, we seek hematite against a background of phyllosilicates and basalts, a scenario anticipated from orbital studies of the rover landing site. We compute all available dimension reductions on the material reflectance spectra afforded by 4 spectral parameter types, and consider all possible paired combinations of these. We then use Linear Discriminant Analysis to find the optimal linear combination of each pair, and evaluate the resultant target‐vs.‐background separation in terms of the Fisher Ratio and classification accuracy. We find ∼50,000 spectral parameter combinations with a classification accuracy >95% that use 6 or fewer filters. The highest accuracy score is 99.6% using 6 filters, but an accuracy of >99% can still be achieved with 2 filters. When the more computationally efficient Fisher Ratio is used to rank the combinations, the highest accuracy is 99.1% using 4 filters, and 95.1% when limited to 2 filters. These findings are applicable to the task of time‐constrained planning of multispectral observations, and to the evaluation and cross‐comparison of multispectral imaging systems at specific material discrimination tasks.
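
The scoring step described above can be illustrated with a minimal sketch of two‐class Linear Discriminant Analysis applied to one pair of spectral parameters. This is not the sptk implementation: the function name fisher_lda, the midpoint decision threshold, and the synthetic parameter values are assumptions made purely for illustration, and the Fisher Ratio is taken here in its common form, the squared separation of the projected class means divided by the sum of the projected class variances.

```python
import numpy as np


def fisher_lda(target, background):
    """Two-class Fisher LDA for a pair of spectral parameters.

    Each input is an (n_samples, 2) array: one row per spectrum, one column
    per spectral parameter. Hypothetical helper, not part of sptk.
    """
    mu_t = target.mean(axis=0)
    mu_b = background.mean(axis=0)
    # Within-class scatter, taken as the sum of the class covariance matrices.
    s_w = np.cov(target, rowvar=False) + np.cov(background, rowvar=False)
    # Linear combination of the two parameters that maximizes the Fisher Ratio.
    w = np.linalg.solve(s_w, mu_t - mu_b)
    w = w / np.linalg.norm(w)
    # Project both classes onto the one-dimensional discriminant axis.
    p_t = target @ w
    p_b = background @ w
    fisher_ratio = (p_t.mean() - p_b.mean()) ** 2 / (
        p_t.var(ddof=1) + p_b.var(ddof=1)
    )
    # Assumed decision rule: threshold midway between the projected class means.
    # The target class always projects above the background for this choice of w.
    threshold = 0.5 * (p_t.mean() + p_b.mean())
    n_correct = np.sum(p_t > threshold) + np.sum(p_b <= threshold)
    accuracy = n_correct / (len(p_t) + len(p_b))
    return w, fisher_ratio, accuracy


# Synthetic stand-ins for two spectral parameters (made-up values, not study data).
rng = np.random.default_rng(0)
target = rng.normal([0.30, 0.10], 0.05, size=(200, 2))
background = rng.normal([0.10, 0.20], 0.05, size=(300, 2))

w, fisher_ratio, accuracy = fisher_lda(target, background)
print("weights:", w)
print("Fisher Ratio:", fisher_ratio)
print("accuracy:", accuracy)
```

In this sketch the Fisher Ratio needs only class means and covariances, whereas the accuracy requires classifying every sample, which is consistent with the Fisher Ratio being the cheaper quantity for ranking a large number of combinations.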

Details

Title

Optimizing ExoMars Rover Remote Sensing Multispectral Science II: Choosing and Using Multispectral Filters for Dynamic Planetary Surface Exploration With Linear Discriminant Analysis

Author

Stabbins, R. B.¹; Grindrod, P. M.¹; Motaghian, S.¹; Allender, E. J.²; Cousins, C. R.³

¹ Natural History Museum, London, UK
² Department of Space & Climate Physics, Mullard Space Science Laboratory, University College London, London, UK
³ School of Earth & Environmental Sciences, University of St. Andrews, St Andrews, UK

Section

Research Article

Publication year

2024

Publication date

Oct 1, 2024

Publisher

John Wiley & Sons, Inc.

e-ISSN

2333-5084

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1029/2023EA003398

ProQuest document ID

3121355246
