1. Introduction
Nonlinear dimensionality reduction (NDR) mapping [1,2,3,4] addresses the transformation of high-dimensional observations to an embedded lower-dimensional manifold. NDR mapping has attracted much attention for analyzing large volumes of high-dimensional observations, such as genomics [5], images [6,7], video [8] and audio signals. The goal is to preserve and visualize neighborhood relations of observations by displaying their transformed images in the low-dimensional output space. Principal component analysis (PCA) [9,10] extracts orthogonal eigenvectors, termed principal components, which serve as internal representations of given training observations for linearly transforming high-dimensional observations to an output space. Linear projections of observations on selected principal components can be determined without reserving the original training observations. Linear transformation by selected principal components can operate as an online process that transforms one observation at a time, but it has been shown inadequate for topology preservation [1,2,3] and cannot be directly applied to NDR mapping.
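As a point of reference for the linear case above, the following is a minimal NumPy sketch of PCA projection onto selected principal components; the function name and interface are illustrative, not taken from the paper.

```python
import numpy as np

def pca_project(X, d):
    """Project the rows of X onto the d leading principal components.

    X : (N, D) array of observations; d : target dimension.
    Returns the (N, d) linear projections and the (D, d) components.
    """
    Xc = X - X.mean(axis=0)                  # de-mean the observations
    cov = np.cov(Xc, rowvar=False)           # sample covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)     # eigen-decomposition
    order = np.argsort(eigval)[::-1]         # sort by decreasing eigenvalue
    components = eigvec[:, order[:d]]        # d leading principal components
    return Xc @ components, components
```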
Locally linear embedding (LLE) [1] has been presented for NDR mapping and data visualization. LLE is a batch process that simultaneously determines the images of all observations in a training set S. Applying the k-nearest-neighbor method, it recruits the k closest observations to form the neighborhood of each observation. It assumes a locally linear relation within each neighborhood, such that every observation is a linear combination of its neighbors. For topology preservation, the images of an observation's neighbors are regarded as neighbors of its image. After optimizing the coefficients of the linear relation that expresses each observation, LLE further poses a linear relation among the images of the neighboring observations. Based on the assumption that each image is a linear combination of the images of its neighbors using the same coefficients, solving all of these linear relations simultaneously attains the images of all observations.
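The first stage of LLE can be summarized by the reconstruction weights of one observation from its k nearest neighbors. The sketch below, in NumPy, follows the standard formulation with a sum-to-one constraint and a small regularization term; the names and the regularization constant are illustrative.

```python
import numpy as np

def lle_weights(X, i, k, reg=1e-3):
    """Reconstruction weights of X[i] from its k nearest neighbours (weights sum to 1)."""
    dists = np.linalg.norm(X - X[i], axis=1)
    nbrs = np.argsort(dists)[1:k + 1]        # k closest observations, excluding X[i]
    Z = X[nbrs] - X[i]                       # neighbours shifted to the origin
    C = Z @ Z.T                              # local Gram matrix
    C += reg * np.trace(C) * np.eye(k)       # regularise for numerical stability
    w = np.linalg.solve(C, np.ones(k))
    return nbrs, w / w.sum()                 # optimal coefficients of the linear relation
```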
In LLE, expressing the image of an observation by a linear relation makes use of the neighborhood and the coefficients of the linear relation that expresses the observation itself. Inferring the image of a novel observation during the execution phase hence needs neighbors defined over all training observations. LLE cannot operate with only internal representations extracted from S for image inference during the testing phase. This limits the portability and computational efficiency of LLE due to massive memory access to all training observations. To overcome this difficulty, this work extends LLE to locally nonlinear embedding (LNE) for NDR mapping. LNE adopts nonlinear relations for inferring the images of novel observations during the execution phase. LNE holds within a larger scope than LLE and can operate with only extracted internal representations for neural approximation of NDR mapping.
Similar to LLE, Isomap [2] and Laplacian Eigenmaps [11,12] maintain a neighborhood for each observation. Isomap [2] applies the k-nearest-neighbor method to calculate geodesic distances and applies the traditional multidimensional scaling method [13,14,15], equivalent to PCA, to infer the images of all observations. Laplacian Eigenmaps sketch the k-nearest-neighbor graph over all observations and solve a generalized eigenvalue problem to infer the images of all observations. Both Isomap and Laplacian Eigenmaps require reserving all training observations for inferring the images of novel observations during the execution phase.
Self-organizing maps (SOM) [16,17,18,19] as well as elastic nets (EN) [20,21] use grid-organized receptive fields as adaptable internal representations for inferring images of observations. Unsupervised learning is a process that extracts internal representations subject to training observations. Equipped with well-extracted receptive fields, SOM emulates a cortex-like map and attains a two-dimensional embedding for topology-preserving mapping. It activates one and only one node in response to an observation following the winner-take-all principle. The active neural node must have the receptive field that is closest to the given observation, and its geometrical location on the grid refers to the inferred image in the low-dimensional embedding. SOM infers images of novel observations during the execution phase without reserving training observations. Since unsupervised learning of SOM makes use of updating operations that directly adopt Euclidean distances among observations, it needs further improvement for NDR mapping.
The NDR mapping proposed in this work extracts essential internal representations and recovers the low-dimensional embedded manifold. Based on the extracted internal representations and locally nonlinear embedding, the NDR mapping infers the images of novel observations during the testing phase, requiring no reservation of training observations.
This work proposes graph-organized data supports to scope training observations. The union of graph-organized data supports well sketches the underlying global density support of raw observations. Internal representations of the proposed NDR mapping contain a set of receptive fields and built-in parameters of adalines (adaptive linear elements) [22,23], where receptive fields represent the centroids of distributed data supports. The scope of each local data support is a K-dimensional regular box, where K is less than or equal to the dimension of the input space. A neural module consisting of K pairs of adalines is employed to determine the membership of observations to the correspondent data support. An adaline neural module is an indicator of the scope of the correspondent data support. There are M neural modules, respectively determining the individual scopes of data supports as well as their neighboring relations. Neighboring relations among data supports are related to the edges of a graph. Derived from training observations, the graph configuration describes the neighboring relations among data supports. All neural modules are further extended for NDR mapping. The extension simply equips every neural module with a posterior weight that represents the image of the centroid of the correspondent data support. The image of every centroid and the images of its neighboring centroids, following the property of locally nonlinear embedding, induce nonlinear constraints for optimizing all posterior weights.
Internal representations extracted from training observations include features that well characterize the membership to every data support. Based on the extracted internal representations of neural modules as well as the posterior weights, the NDR mapping following locally nonlinear embedding can infer the images of novel observations during the testing phase without reserving the original training observations. This property greatly increases the portability of the proposed NDR mapping. The size of the adaptable built-in parameters for the proposed NDR mapping depends on the number of neural modules and the dimension of every data support. Massive training observations are no longer required during execution of the proposed NDR mapping for testing.
The challenge is to optimize the adaptable built-in parameters and posterior weights of joint adaline neural modules for the proposed NDR mapping. The union of graph-organized data supports sketches a bounded domain of the proposed neural approximation for NDR mapping. The NDR mapping explored in this work transforms high-dimensional observations to images in the output space that recovers the manifold embedded within the input space. It is realized by adaline neural modules extended with posterior weights. The learning process mainly contains two stages, respectively constructing graph-organized cluster supports and optimizing posterior weights by the Levenberg-Marquardt algorithm [24,25,26,27]. The first learning stage is aimed at optimizing centroids, built-in parameters of adaline modules and graph interconnections for representing graph-organized data supports. The second stage is to determine posterior weights by solving a nonlinear system that characterizes distance-preserving mapping of centroids to images in the output space. The proposed neural approximation realizes NDR mapping without reserving training observations, depending only on adaptable feature representations or built-in parameters. Equipped with well-trained built-in parameters, the proposed NDR mapping can determine the image of a novel testing observation during the execution phase by resolving the constraints of locally nonlinear embedding.
2. Materials and Methods
2.1. Adaline Neural Modules for Representing Distributed Data Supports
Adaline neural modules are proposed for NDR mapping from the space of high-dimensional observations to the output space of images. These neural modules have lateral interconnections and posterior weights for performing composite linear and nonlinear transformations. Every neural module is equipped with a receptive field as well as K pairs of adalines [22,23], where K denotes the projection dimension, for the reception of observations. An NDR mapping model is composed of many neural modules. Like SOM and EN, there exist lateral interconnections among neural modules. The NDR mapping is realized by graph-organized neural modules, where the graph configuration represents neighboring relations among extracted data supports with interconnections dynamically derived subject to training observations.
Let an observation in the input space be received by a neural module equipped with a receptive field. Their difference is a result of de-meaning, as the receptive field is related to the centroid of a data cluster. The difference propagates forward through the projective fields of adalines as shown in Figure 1, where the projective fields of a module are collected in matrix form. Each projection forms an external field to a pair of threshold elements of adalines,
and when both threshold elements are active in response to the observation, the projection falls within the corresponding interval. Each adaline neural module for receiving observations is equipped with K projective fields and K pairs of threshold elements in addition to a receptive field. Each pair of thresholds defines the lower and upper bounds of one projection. When all K pairs of threshold elements are active, the projection tuple is within a K-dimensional cuboid, or box, defined by the Cartesian product of the K projection intervals.
In the original observation space, the corresponding bounded region, termed a data support, consists of all observations whose projections fall within this box.
A neural module is equipped with a threshold element that has a lower bound for indicating membership to the data support, as shown in Figure 1. This threshold element activates if all K pairs of threshold elements contribute positive responses to the given observation. The internal representations of a neural module in Figure 1 include a receptive field, K projective fields and K pairs of lower and upper bounds for the reception of observations.
When an observation belongs to the data support, equivalently when all of its projections fall within the corresponding intervals, the adaline module summarizes the membership to the cluster support and is activated if the external field reaches the lower threshold level. If the membership threshold element is active, the attached posterior weight produces a non-zero image in the output space, whose dimension is less than that of the input space for dimensionality reduction; otherwise the output is a zero vector.
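The membership test and output of a single module can be summarized as follows. This is an illustrative NumPy sketch under the box-membership reading given above; the argument names (centroid, P, lower, upper, posterior) are assumptions, not notation from the paper.

```python
import numpy as np

def module_output(x, centroid, P, lower, upper, posterior):
    """Membership test and output of one adaline neural module.

    P : (D, K) matrix of orthogonal projective fields,
    lower/upper : length-K projection bounds, posterior : image of the centroid.
    Returns (active, output); the output is zero when the module is silent.
    """
    p = P.T @ (x - centroid)                             # external field of the K adaline pairs
    active = bool(np.all((p >= lower) & (p <= upper)))   # all K threshold pairs must fire
    return active, (posterior if active else np.zeros_like(posterior))
```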
The membership threshold element in an adaline neural module activates to indicate the membership of the input to a correspondent data support. The proposed NDR model consists of multiple graph-organized neural modules, each possessing its own data support. The union of all cluster supports presents an approximation to the global density support underlying training observations in the original input space for modeling the embedded manifold. An NDR neural model contains not only receptive fields for input perception but also posterior weights for output generation.
The NDR neural model shown in Figure 2 is composed of M adaline neural modules. Each neural module has its own internal representations, including a receptive field, projective fields in matrix form, lower and upper bounds, and a posterior weight. Figure 3 shows training observations from the Swiss roll and the edges between vertices of a graph. Each vertex in the graph has a set of neighboring vertices according to the graph configuration derived by the learning process. It is notable that the neighboring relations of nodes exactly coincide with those among the correspondent data supports and the lateral interconnections of joint adaline neural modules.
2.2. NDR Model Learning
The proposed neural model for NDR mapping inherits the receptive fields and graph-organized neural nodes of SOM and EN. The model design further recruits a neural organization with a dynamical graph configuration, projective fields, threshold elements and posterior weights. For data analysis, an NDR model has adaptable built-in parameters for representing distributed cluster supports, neighboring relations of local supports, and images of centroids. NDR model learning has the objectives of approximating the global density support underlying training observations by the union of distributed cluster supports, structuring the neighborhood relations of distributed cluster supports by the dynamic graph configuration, and determining the images of centroids based on distance-preserving mapping.
2.2.1. Clustering Analysis for Learning Receptive Fields
The given high-dimensional observations are patterns without labels for NDR mapping. Let S be the training set. NDR model learning is subject to unlabeled training observations and is hence unsupervised. An NDR model transforms an observation to an image that is not explicitly provided in advance. This work relates receptive fields to the centroids of clusters derived by clustering analysis subject to the training observations in S.
Mathematical modeling [28,29,30] for clustering analysis involves the formulation of constraints and an objective using mixed integer and continuous variables. The mixed integer programming leads to the annealed clustering algorithm [28,29] in Appendix A, where M receptive fields eventually partition the training observations in S into M non-overlapping subsets. Let each subset collect the observations that are closest to its receptive field among the M receptive fields,
(1)
The annealed clustering algorithm attains not only all receptive fields but also the exclusive membership of each observation to the M subsets, denoted by a unitary vector of M binary elements. There is one and only one active bit among the M binary bits. The annealed clustering algorithm ensures that the bit for a given subset is the only active bit if and only if the observation belongs to that subset. The vector hence represents the membership of each observation to the M subsets partitioned by the receptive fields.
The objective function [28,29] for optimizing continuous receptive fields and discrete memberships could take different forms under variant computation objectives. The objective function is commonly not differentiable with respect to the discrete membership variables. Minimization of the objective function with respect to discrete and continuous variables by the annealed clustering algorithm has been extensively verified in previous works [28,29]. Appendix A gives a simplified version of the annealed clustering algorithm; numerical simulations showing its effectiveness and reliability for clustering analysis have been given in those works.
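For reference, the partition of Equation (1) and the one-hot membership vectors can be computed from given receptive fields as in the following NumPy sketch; the annealed optimization of the receptive fields themselves is summarized in Appendix A, and the names here are illustrative.

```python
import numpy as np

def partition(X, centroids):
    """Exclusive membership of each observation to its nearest receptive field (Eq. (1))."""
    # squared distances between every observation and every centroid
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)                    # index of the closest receptive field
    delta = np.eye(centroids.shape[0])[labels]    # one-hot membership vectors
    return labels, delta
```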
2.2.2. Deriving Cluster Supports by Optimizing Adaline Modules
Each adaline module in Figure 2 plays the role of indicating the membership to a cluster support. Clustering analysis partitions the training observations in S into M disjoint subsets whose union recovers S. Both SOM and EN make use of receptive fields to determine the exclusive membership of an observation following the winner-take-all (WTA) principle. This work further constructs cluster supports by optimizing adaline modules. A local cluster support is related to a region characterized by K orthogonal projective fields derived from the training observations of the corresponding subset, where K is less than or equal to the input dimension. The membership to a support is determined by lower and upper thresholds of projections on the projective fields.
The receptive field derived by clustering analysis well represents the mean of the training observations in its subset. Subtracting the receptive field attains demeaned observations, from which the covariance matrix of the subset is formed. Orthogonal projective fields can be obtained by solving the eigenvalue problem of this covariance matrix.
The K orthogonal projective fields are set to the eigenvectors corresponding to the K largest eigenvalues. The lower and upper bounds of the projections on each projective field over the subset are set respectively to the minimum and the maximum of the projections of the demeaned observations.
Built-in parameters of an adaline module, including K projective fields and K pairs of lower and upper thresholds, have thus been determined for constructing cluster supports. Each projection interval is the minimal interval that covers the projections of all demeaned observations in the subset on the corresponding projective field.
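Putting Section 2.2.2 together, one cluster support is characterized by its centroid, the K leading eigenvectors of the cluster covariance, and the covering projection intervals. A compact NumPy sketch, with illustrative names:

```python
import numpy as np

def fit_support(X_m, K):
    """Centroid, K projective fields and projection intervals of one cluster support."""
    c = X_m.mean(axis=0)                               # receptive field (cluster centroid)
    cov = np.cov(X_m - c, rowvar=False)                # covariance of demeaned observations
    eigval, eigvec = np.linalg.eigh(cov)
    P = eigvec[:, np.argsort(eigval)[::-1][:K]]        # eigenvectors of the K largest eigenvalues
    proj = (X_m - c) @ P                               # projections of demeaned observations
    lower, upper = proj.min(axis=0), proj.max(axis=0)  # minimal covering intervals
    return c, P, lower, upper
```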
Let a projection mapping transfer each demeaned observation to its K-dimensional projection vector. The local support is defined by
(2)
If the projection vector of an observation belongs to the Cartesian product of the K projection intervals, the observation belongs to the local support.
Conversely, if an observation belongs to the local support, by definition each component of its projection vector must belong to the corresponding projection interval, so the projection vector belongs to the Cartesian product. If K equals the input dimension, the local support is simply a geometrical box with orthogonal edges, centered at the receptive field and containing the training observations in the subset. The union of all local supports is an approximation to the real global density support that covers all training observations in S. The volume of a local support can be calculated by
(3)
The ratio of the cluster size to this volume represents the uniform density of a local support, and the global uniform density is estimated over all local supports.
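Under this reading, the volume of Equation (3) is the product of the interval lengths, and the uniform density of a support is the cluster size divided by that volume; a small sketch (names illustrative):

```python
import numpy as np

def support_volume(lower, upper):
    """Volume of the K-dimensional box spanned by the projection intervals (Eq. (3))."""
    return float(np.prod(upper - lower))

def uniform_density(n_members, lower, upper):
    """Ratio of cluster size to support volume, the uniform density of one support."""
    return n_members / support_volume(lower, upper)
```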
It is notable that the membership of the projection vector to the box is calculated by the K pairs of threshold elements shown in Figure 1. The top threshold element in Figure 1, which has a lower bound, is active if all K pairs of threshold elements are active.
2.2.3. Graph Configuration and Neighboring Relations of Cluster Supports
The membership of an observation to a cluster support can be determined by the adaline neural module in Figure 2. It takes one subtraction, K projections on the projective fields and comparisons by the threshold elements to determine the membership. The test is extended to determine the membership of a line segment between two points in the input space to the union of two cluster supports. The membership of a line segment is significant for calculating the distance between the centroids of two supports. The line segment between two centroids is expressed as a convex combination of them with parameter t,
where t ranges over [0, 1]. By linearly spacing this interval into a partition, one obtains many equally spaced points on the segment. If all sampled points on the segment belong to the union of the two supports, it is inferred that the line segment belongs to that union, as shown in Figure 4C. In this case, the distance between the two centroids is defined by
(4)
If there exists some t such that the corresponding point does not belong to the union, the above definition is not feasible for determining the distance between the two centroids.
When K equals the input dimension, the projection mapping is invertible, and the Cartesian product of the projection intervals defines a geometrical box with 2^K vertices. Two vertices that share all coordinates except one are neighboring. Applying the inverse of the projection mapping to two neighboring vertices induces the two end points of an edge of the support in the input space. Again, the membership of each vertex or edge of one support to another support can be examined by the neural circuit in Figure 2. If there exists one vertex, or one point sampled from an edge, of one support belonging to the other support, or inversely, their intersection is not empty. Two cluster supports are neighboring if their intersection is not empty. The neighboring relations are maintained by a symmetric interconnection matrix whose entry for a pair of supports equals one if they are neighboring and zero otherwise.
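The segment and neighborhood tests described above reduce to repeated box-membership checks. A simplified sketch, assuming each support is stored as a tuple (centroid, P, lower, upper) and using a fixed number of sample points on the segment:

```python
import numpy as np

def inside(x, support):
    """Box membership of x to one support (centroid, P, lower, upper)."""
    c, P, lower, upper = support
    p = P.T @ (x - c)
    return bool(np.all((p >= lower) & (p <= upper)))

def segment_inside(c_m, c_n, supports, n_samples=50):
    """True if the segment between two centroids stays inside the union of the
    given supports (the situation of Figure 4C)."""
    for t in np.linspace(0.0, 1.0, n_samples):
        x = (1.0 - t) * c_m + t * c_n            # equally spaced point on the segment
        if not any(inside(x, s) for s in supports):
            return False
    return True
```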
The distance between two centroids can also be determined when some part of the segment does not belong to the union, as shown in Figure 4D, provided the two supports are neighboring. The following definition is employed when some sampled point on the segment falls outside the union,
(5)
The calculation of Equation (5) is approximated by searching for the best point among the vertices and edges of the two neighboring cluster supports for simplification. When K is less than the input dimension, the vertices of a support cannot be determined from the Cartesian product of the projection intervals, since the projection mapping is no longer invertible. In this case, the nearest distance between two supports is employed to determine their neighboring relation.
When K is less than the input dimension, two supports are taken as neighboring if their nearest distance is less than a predetermined small positive number, and as not neighboring otherwise, where the nearest distance between two supports is defined as the minimal distance between any pair of observations respectively drawn from the two subsets. The two observations whose distance attains this minimum could be reserved for recalculating distances during the testing phase. The distance between the centroids of two neighboring supports is then determined by the sum of three distances: between the first centroid and its reserved observation, between the two reserved observations, and between the other reserved observation and the second centroid, where the distance between the two reserved observations is less than the predetermined threshold. The reserved observations accompanying each neighboring pair are kept in order to calculate the distance between two observations respectively belonging to two neighboring local supports.
For the case where K equals the input dimension, the connectivity is one if the intersection of two supports is non-empty and zero otherwise. Since the projection function is then invertible, the vertices and edges of each local support can be sketched and their memberships to other supports can be checked directly. However, when K is less than the input dimension, the projection function is not invertible, and the connectivity is checked by comparing the set distance between two supports with a predetermined threshold. Figure 3B shows the connectivity derived subject to the Swiss-roll data. The graph configuration, denoted by G, is not predetermined and is a result of checking neighboring relations between cluster supports. Neighboring relations among training observations in LLE have thus been extended to neighboring relations of data supports.
The distance between the centroids of two cluster supports that are not neighboring is calculated based on the given graph configuration and all distances between neighboring centroids. The problem is then to determine the distance between two centroids whose supports are not directly connected. Generally, G denotes the edges among nodes on a graph such that two nodes are connected if and only if the corresponding entry of G is one. The distance corresponding to an edge that connects two nodes is well defined. Two arbitrary nodes on a graph are path-connected if there exists a path between them. The distance between two path-connected nodes can be determined by the shortest-path algorithm of Dijkstra [31].
If nodes on the graph are path-connected, all distances among nodes or all entries in matrix D are determined.
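Completing D from the neighboring-pair distances is a standard all-pairs shortest-path computation; below is a sketch using SciPy's Dijkstra implementation, with illustrative matrix names.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def full_distance_matrix(D_local, G):
    """Distances between all path-connected centroids.

    D_local : (M, M) distances of Eqs. (4)/(5) for neighbouring supports;
    G : (M, M) 0/1 adjacency matrix.  Remaining entries are filled by
    Dijkstra's shortest-path algorithm; np.inf marks unconnected pairs."""
    W = np.where(G > 0, D_local, 0.0)            # keep only edge distances
    return shortest_path(W, method='D', directed=False)
```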
2.3. Levenberg-Marquardt Learning for Distance-Preserving Mapping and Optimal Posterior Weights
The distance matrix D provides clues for determining the images of the centroids of cluster supports, where the images of centroids are regarded as the optimal posterior weights of graph-organized neural modules for NDR mapping. The dimension of the output space is typically less than or equal to three for data visualization by computer graphics. If the output dimension equals one, without loss of generality, the output space is related to an angular interval for visualization, and the posterior weight refers to a radian on a unit circle. The problem of determining all posterior weights turns into finding images on a unit circle based on the distance matrix D. On a unit circle, each image has two neighbors. The goal is to minimize the total distance between neighboring images subject to the given D. The neighboring relation defines a cyclic path that visits each node exactly once and returns to where it starts. The task is different from a typical TSP (traveling salesman problem), which is provided with city sites within a Euclidean space.
The Hopfield neural network [32,33] and Potts neural networks [34,35] for solving the TSP with only a distance matrix can be directly applied for optimizing the images of centroids, equivalently the posterior weights. Given the distance matrix D, neural approaches [34,35] attain a circular sequence that visits all centroids. Consider the index of the node at each stop on the determined circular sequence, with the stop after the last returning to the first for circular representation. Let the difference between the images of two consecutive nodes on the sequence be proportional to their distance in D, and let all differences sum to the full circle. Then
(6)
By setting the image of the first node on the sequence to zero, one can determine the image of each subsequent node by adding the corresponding difference to the image of the preceding node.
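One reading of Equation (6) for a one-dimensional output is sketched below: consecutive image differences on the tour are proportional to the corresponding entries of D and are normalized to sum to 2π, the full turn of the unit circle. The normalization constant and the names are assumptions.

```python
import numpy as np

def circle_images(tour, D):
    """Radian images of centroids on the unit circle from a circular visiting order."""
    M = len(tour)
    edges = [D[tour[s], tour[(s + 1) % M]] for s in range(M)]   # tour-edge distances
    total = sum(edges)
    angles = np.zeros(M)
    for s in range(1, M):                        # cumulative, proportional differences
        angles[s] = angles[s - 1] + 2.0 * np.pi * edges[s - 1] / total
    images = np.zeros(M)
    images[np.asarray(tour)] = angles            # image (radian) of each centroid
    return images
```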
The matrix D contains distances defined inside the union of cluster supports, satisfying the non-negativity, symmetry and triangle properties of a distance measure. The first two properties are trivial. The triangle property also holds for distances in D. Consider the shortest path from node i to node k. According to the shortest-path algorithm, this path contains no cycle. Consider also the shortest paths between nodes i and j and between nodes j and k, and let node j be absent from the shortest path from i to k. The triangle property holds if the length of the shortest path from i to k is less than or equal to the sum of the other two lengths. Assume the contrary. Then the path from i to k is not the shortest one, since it could be improved by concatenating the shortest path from i to j with the shortest path from j to k. This leads to a contradiction, so the length of the shortest path from i to k must be less than or equal to the sum.
Now the task extends to finding the images of centroids, or posterior weights, where the dimension of the output space is two or three. The purpose is to find and display the image of every centroid. The distance between two centroids is determined by Equations (4) and (5) if their supports are neighboring, and by the shortest-path algorithm otherwise. For distance-preserving mapping, the distances in D propose the following constraint for determining the images of any two path-connected centroids in the output space,
(7)
The constraint (7) over all pairs of path-connected nodes constitutes a nonlinear system, with one constraint per pair. The LM (Levenberg-Marquardt) algorithm has been extensively applied for learning multilayer neural networks [26,27] and for solving nonlinear systems. This work applies the LM algorithm for solving the nonlinear system (7) and determining the images of centroids for distance-preserving mapping. The following criterion is formulated for least-squares fitting,
where the criterion sums the squared differences between inter-image distances and the corresponding entries of D over all path-connected pairs, and the unknowns collect all posterior weights. The outstanding performance of the LM algorithm has been extensively explored in previous works [26,27] for learning neural networks. The LM algorithm is applied to find the posterior weights that minimize this criterion. If the graph contains unconnected subgraphs, solving the nonlinear system (7) by the LM algorithm can be applied separately to every unconnected subgraph. There will then be multiple isolated subgraphs and multiple sets of images, each corresponding to one subgraph whose nodes are path-connected.
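A sketch of this stage using SciPy's least_squares with the MINPACK Levenberg-Marquardt method is given below; the residuals implement the distance-preserving constraints (7) over path-connected pairs, while the initialization scale and function names are assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def embed_centroids(D, dim=2, seed=0):
    """Images of all centroids by least-squares fitting of constraints (7)."""
    M = D.shape[0]
    pairs = [(m, n) for m in range(M) for n in range(m + 1, M) if np.isfinite(D[m, n])]
    rng = np.random.default_rng(seed)

    def residuals(flat):
        Y = flat.reshape(M, dim)
        return np.array([np.linalg.norm(Y[m] - Y[n]) - D[m, n] for m, n in pairs])

    y0 = 0.01 * rng.standard_normal(M * dim)          # small random initialisation
    sol = least_squares(residuals, y0, method='lm')   # Levenberg-Marquardt fit
    return sol.x.reshape(M, dim)
```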
2.4. Locally Nonlinear Embedding for NDR Mapping
Learning the proposed neural model subject to the training observations in S achieves optimal built-in parameters, including receptive fields, projective fields, projection bounds and posterior weights. This section presents locally nonlinear embedding (LNE) for NDR mapping based on a neural model that has been equipped with optimal built-in parameters and posterior weights.
In response to an observation in the training set S, a well-trained NDR model is expected to generate an image in the output space. Let the observation belong to some subset. Then it belongs to the corresponding cluster support and the correspondent neural module is active. Let the distance between the observation and each neighboring centroid be measured similarly to the distance described in Equation (5). If the line segment between the observation and a centroid belongs to the union of the involved supports, the distance is calculated as the Euclidean distance between them, otherwise by the following equation
(8)
The calculation of Equation (8) is approximated by seeking the best point on the vertices and edges of the involved supports for simplicity. Similar to Equation (7), the LNE distance-preserving constraint is given by
(9)
where the index runs over the covering local supports. Equation (9) poses locally nonlinear constraints for inferring the image of an observation subject to the given distances. Following locally nonlinear embedding, the constraint (9) specifies a nonlinear system whose only unknown is the image of the observation, and the number of constraints equals the number of covering supports. By the strategy of distance preservation, the optimal image can be readily resolved by the LM algorithm with random initialization near the images of the covering centroids. A well-trained NDR mapping can be applied to determine the image of each observation in S. A well-trained NDR model is likewise employed to generate the locally nonlinear constraints (9) for mapping a novel observation. The graph-organized neural modules operate simultaneously to encode a novel observation. If there is no active module that generates an image in the output space, it is implied that the observation lies outside all local supports and is not within the domain of NDR mapping; otherwise there exists at least one local support that covers it. Let a set collect the indices of the local supports commonly covering the observation. These local supports must be overlapping, and the set is small, with only several elements. Let a vector collect the distances between the observation and the covering centroids according to Equation (8), and let the images of those centroids be collected correspondingly. The locally nonlinear constraints (9) are then well defined by the given distances and images. The calculation of distances for neighboring and non-neighboring supports has been given in Section 2.2.3.
Inferring the image of a novel observation is based on the collected distances and images. According to Equation (9), the number of locally nonlinear constraints is identical to the number of covering local supports. Since this set is small and there is only one unknown, the locally nonlinear constraints (9) constitute a nonlinear system that can be easily solved by the LM algorithm.
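The execution-phase inference is a very small least-squares problem. The sketch below assumes the covering supports, their centroid images and a local_distance function implementing Equation (8) are available; SciPy's default trust-region solver is used in place of LM here because the number of constraints may be smaller than the output dimension.

```python
import numpy as np
from scipy.optimize import least_squares

def map_observation(x, active_ids, supports, images, local_distance):
    """Image of a novel observation under the locally nonlinear constraints (9)."""
    d = np.array([local_distance(x, supports[i]) for i in active_ids])  # Eq. (8) distances
    Y = np.array([images[i] for i in active_ids])                       # covering centroid images

    def residuals(y):
        return np.linalg.norm(Y - y, axis=1) - d   # one constraint per covering support

    y0 = Y.mean(axis=0)                            # start near the covering images
    return least_squares(residuals, y0).x
```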
3. Results
Numerical simulations verify the proposed neural model and learning process for NDR mapping of the three-dimensional Swiss-roll data [1,2]. The given Swiss-roll data contain N = 5000 three-dimensional points. The training data S are embedded within a two-dimensional surface as shown in Figure 3A. The Swiss-roll data are suitable for evaluating the effectiveness of the proposed neural model for NDR mapping.
The first step applies the annealed clustering algorithm [28,29] in Appendix A to partition S into M non-overlapping subsets as described in Section 2.2.1. This step obtains receptive fields as the centroids of cluster supports.
The second step completes cluster support construction for optimal projective fields and projection intervals as described in Section 2.2.2. The projection dimension K is set to three for the current example. The centroid of a local support is set to the mean of the correspondent cluster. Figure 4A and Figure 5A show all centroids. Subtracting the centroid from the observations in each cluster leads to demeaned observations and subsequently a covariance matrix. Solving the corresponding eigenvalue problem attains K orthogonal projective fields and K projection intervals. As shown in Figure 4B, these parameters sketch a local support as a three-dimensional box. The union of all local supports approximates the global density support underlying the observations in S.
If M = 1, there is only one cluster support. Let 20,000 novel data points be randomly sampled from the sole regular cluster support that covers all observations in S. A well-trained neural model with M = 64 can evaluate the memberships of the newly sampled observations to the union of its M cluster supports. Membership is verified if there exists at least one activated module in the neural model. Reserving those inside the union of cluster supports attains the results in Figure 5B, which display a structure similar to the original Swiss roll.
The third step figures out the lateral interconnections of neural modules, equivalently the neighboring relations of cluster supports. For the current situation, the projection mapping is invertible, such that the vertices and edges of each cluster support can be efficiently tracked. Figure 6A shows the neighboring relations of cluster supports in the input space. On this basis, the distance between the centroids of two neighboring cluster supports is calculated in one of two ways, depending on whether the straight line between the two centroids lies totally within the two neighboring cluster supports, as shown in Figure 4C, or only partially within, as shown in Figure 4D. Based on the distances between centroids of neighboring cluster supports, as described in Section 2.2.3, the shortest-path algorithm of Dijkstra [31] is further applied to determine the distance between every pair of path-connected nodes. Now all entries in the matrix D have been determined for inferring the images of all centroids.
Based on the distances of centroids in the matrix D, the distance-preserving constraints in Equation (7) constitute a large-scale nonlinear system, where the images of all centroids are unknown. As described in Section 2.3, the LM algorithm minimizes the least-squares criterion for optimizing all posterior weights. As a result, the images of all centroids on a 2D plane are shown in Figure 6B. The centroids in Figure 6A are mapped to images according to the distance-preserving criterion. This illustrates the significance of applying the LM algorithm for topology-preserving dimensionality reduction.
A well trained NDR neural model is employed to map each training or testing observation. As described in Section 2.4, locally nonlinear constraints (9) are given for each training or testing observation, where the only unknown denotes the generated image in the output space. Figure 6C shows the generated images of all observations. The LM algorithm is effective and reliable for seeking the image of each testing observation that satisfies locally nonlinear constraints (9). It is notable that a well-trained NDR neural model is semi-parametric. It operates without reserving full training observations and is able to map novel testing observations to an embedded low dimensional manifold faithfully.
4. Discussion
Let V(M) be the sum of all Vi, where Vi denotes the volume of the i-th cluster support determined by Equation (3). V(M) is expected to be related to an upper bound of the volume of the global density support that covers all data points in S. However, as shown in Figure 5A, V(M) decreases as M increases. Hence minimal V(M) cannot serve as the only criterion for setting M. For the current example, two local supports are neighboring if their intersection is non-empty. The connectivity of the graph G tends to increase as M increases. Following the minimal-volume and maximal-connectivity principle, the optimal number of cluster supports derived by advanced clustering analysis is determined by
(10)
where C(M) is the ratio of the sum of elements in G to the number, M(M−1), of full connections. Figure 5A shows the empirical volume V(M) and connectivity C(M), where the horizontal axis denotes the number of clusters. Here λ is set to V(1) for balancing the two quantities. As the number of clusters increases to M = 64, the criterion tends to balance volume and connectivity. Advanced clustering analysis partitions S into M clusters, as shown in Figure 5B. For this case, all nodes on the graph are connected and there is no isolated node.
The proposed NDR model is applied for neighborhood-preservation mapping. Consisting of two hidden layers in addition to the input and output layers, the proposed NDR model is a deep neural network [36,37,38]. For generalization, learning an NDR model subject to training observations is verified by a testing set that is not provided during the learning phase. The first step is to check whether given testing observations are within the union of local supports defined by the trained NDR model. The screened observations whose images can be inferred by the trained NDR model are collected, and the reservation ratio of testing observations is expressed by the ratio of the number of screened observations to the size of the testing set.
By a well-trained NDR model and locally nonlinear embedding, each screened observation is transformed to an image in the output space. Consider the nearest neighbors of an observation in the input space and the nearest neighbors of its image in the output space. For each nearest neighbor in the input space, an indicator is set to one if its image belongs to the output-space neighborhood, and zero otherwise. An error percentage for testing is defined by
(11)
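One concrete reading of the indicator and of Equation (11) is the fraction of k-nearest input-space neighbors whose images miss the k-nearest output-space neighborhood; the exact definition in the paper may differ in detail, so the sketch below is illustrative.

```python
import numpy as np

def error_percentage(X, Y, k=10):
    """Neighbourhood-preservation error over mapped observations, in percent."""
    n, errors = X.shape[0], 0
    for i in range(n):
        in_nbrs = np.argsort(np.linalg.norm(X - X[i], axis=1))[1:k + 1]
        out_nbrs = set(np.argsort(np.linalg.norm(Y - Y[i], axis=1))[1:k + 1])
        errors += sum(int(j not in out_nbrs) for j in in_nbrs)
    return 100.0 * errors / (n * k)
```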
Determining the images of testing observations can be realized by parallel computation. Inferences of the images of two testing observations are independent and can be performed simultaneously. Numerical simulations employ a notebook equipped with four 2.3 GHz Intel Core i5 workers that execute simultaneously to speed up NDR mapping of testing observations under the Matlab environment.
Numerical simulations independently generate two "brokenswiss" datasets, one for training and one for testing. Each dataset consists of three-dimensional observations. The testing "brokenswiss" dataset is shown in Figure 7. Learning an NDR model with M = 91 internal nodes subject to the training set ten times attains the entries in the first row of Table 1, where the mean and variance of the reservation ratio and error percentage over ten executions are listed. Low variances in Table 1 reflect the high reliability of the proposed learning process. For the current example, the mean reservation ratio is near 90% and the mean error percentage is less than 4%, which shows the effectiveness of the proposed learning process.
Neither LLE nor Isomap is model-based. Without further improvement, these two methods are unable to produce models for testing, hence yielding neither a reservation ratio nor an error percentage for testing in Table 1. The LLE and Isomap codes [39] yield execution results that miss the images of half of the training observations in the output space for this example. Neither LLE nor Isomap is able to reduce the training error percentage for the current example; their training error percentage is about 50%.
Figure 8A shows observations sampled from the two-colored surface of a ball. The training set contains four-dimensional observations, since, besides the 3D coordinates, an attribute is recruited to represent the two different colors. The dimension of the input space is four for this example. Figure 8B shows a graph with edges in the output space, where the positions of nodes refer to the images of the centroids of local supports. These edges represent the neighboring relations of extracted local supports. The graph contains two isolated subgraphs. It is observed that two nodes respectively belonging to the two isolated subgraphs are not connected. It is verified that the local supports corresponding to the nodes on each isolated subgraph contain observations of the same color, and the two subgraphs separately reflect the two different colors of observations. The proposed NDR mapping successfully translates 4D observations to an embedded low-dimensional manifold. The encouraging results motivate further applications of the proposed NDR neural model to high-dimensional observations originating from spectral features of sounds and handwritten patterns of characters.
The second row of Table 1 lists numerical results of evaluating the proposed learning process for the testing dataset in Figure 8. Both the training set and the testing set contain four-dimensional observations. The proposed learning process executes ten times for training an NDR model subject to the training set, and each time the trained NDR model is verified with the testing set. Statistics over ten executions are summarized in Table 1 and Table 2, where the reservation ratio is 95% and the error percentage is 16.51% on average. For the same reason stated previously, testing results of LLE and Isomap are not listed in Table 1. As shown in Table 2, the proposed learning process on average takes 166 s to train an NDR model and 190 s to map 8000 testing observations to images through parallel computation. The Matlab environment can further extend parallel computation with more cores. LLE and Isomap are likewise unable to reduce the training error percentage effectively for this example. It is notable that Isomap takes 1336 s to process the training observations for this example. The proposed learning process significantly improves computational efficiency for training and accuracy for mapping testing observations.
5. Conclusions
Numerical simulations show the proposed NDR neural model feasible for nonlinear dimensionality reduction and visualization of the embedded low-dimensional manifold. A well-trained NDR neural model needs no training observations for mapping novel testing observations during the execution phase. Traditional LLE and Isomap require reserving the full training observations for mapping a novel testing observation. Compared with LLE and Isomap, the proposed NDR neural model possesses better portability for mapping testing observations during the execution phase; it only requires built-in parameters for further execution.
All built-in parameters of a well-trained NDR model are comprehensible and meaningful as abstractions of data structures. The built-in parameters optimized subject to training observations constitute the extracted local supports, serving as internal representations of the centroid and projected PCA box of each local support. The obtained posterior weights refer to the images of all centroids of local supports. The dynamic graph configuration states the neighboring relations of local supports. Edges on the graph refer to the lateral interconnections of adaline neural modules in a well-trained NDR model.
The LM algorithm has been shown effective for solving a large-scale nonlinear system for transferring centroids to images in the output space. The nonlinear system is derived following the principle of distance-preserving mapping. The LM algorithm has also been shown effective for seeking the image of a novel observation that satisfies the locally nonlinear embedding constraints (9) during the execution phase. Since there is only one unknown in a small set of locally nonlinear constraints (9), the LM algorithm is fast for NDR mapping of a single testing observation. This work has successfully extended LLE to locally nonlinear embedding constraints. The LNE constraints hold within a larger scope than traditional LLE, substantially enhancing the quality of NDR mapping. Learning an NDR neural model is a process of integrating advanced clustering analysis, optimizing adaline neural modules and solving large-scale distance-preserving nonlinear constraints. The integrated learning process has been shown effective and reliable for optimizing the built-in parameters of the proposed NDR neural model. An NDR neural model performs a standalone approximation that infers the images of testing observations without maintaining a huge volume of training observations.
Author Contributions
Conceptualization of locally nonlinear embedding, K.H. and J.-M.W.; methodology, S.-S.W. and J.-M.W.; software, S.-S.W. and J.-M.W.; validation, S.-J.J., K.H. and J.-M.W.; data curation, S.-S.W. and S.-J.J.; writing—original draft preparation, S.-S.W. and J.-M.W.; writing—review and editing, S.-S.W., K.H. and J.-M.W.; visualization, S.-S.W. and J.-M.W.; supervision, J.-M.W. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Informed Consent Statement
Not Applicable.
Data Availability Statement
The data presented in this study are openly available in [39].
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
A simplified version of the annealed clustering algorithm [28,29] is summarized by the following iterative steps.
i. Randomly set all centroids near the mean of all data points, initialize the inverse temperature parameter β to a sufficiently low value, and set a perturbation threshold to a small positive value.
ii. Calculate the distance between each data point and each centroid.
iii. Update the expectation of each element of the exclusive membership variables.
iv. Set the stability to the mean of the maximal membership expectation over all data points. If the stability is less than the sum of its previous value and the small positive threshold, add a small random noise to each centroid for perturbation.
v. Fix all membership expectations and minimize the objective function with respect to all centroids.
vi. If stability < 0.98, set β to β/0.995 and repeat Steps ii–v, otherwise halt.
As in previous works [28,29], the algorithm operates under a physical-like annealing process. Annealing schedules a temperature-like parameter from sufficiently high to low values. The expectation of each element of the stochastic membership variables and each adaptive centroid are maintained during the physical-like annealing process. At each intermediate temperature, the algorithm iteratively updates each membership expectation and each centroid. At the end of the annealing process, the temperature-like parameter is scheduled to sufficiently low values, where the algorithm eventually attains optimal exclusive memberships and centroids. The mean of the maximal membership expectation over all data points denotes the stability. This quantity is calculated at Step iv. If it is less than the sum of its previous value and a predetermined small positive value, each centroid is perturbed with noise to escape a trivial state at Step iv. If it approaches one at Step vi, the algorithm halts.
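A minimal deterministic-annealing reading of Steps i–vi is sketched below in NumPy. Membership expectations are softmax responsibilities at inverse temperature β and centroids are their weighted means; the perturbation step and the exact objective of the cited works are simplified, so the constants and names are assumptions.

```python
import numpy as np

def annealed_clustering(X, M, beta=1e-4, stop=0.98, seed=0):
    """Simplified annealed clustering loop: returns centroids and hard labels."""
    rng = np.random.default_rng(seed)
    w = X.mean(axis=0) + 1e-3 * rng.standard_normal((M, X.shape[1]))  # Step i
    stability = 0.0
    while stability < stop:                                           # Step vi
        d2 = ((X[:, None, :] - w[None, :, :]) ** 2).sum(axis=2)       # Step ii
        e = np.exp(-beta * (d2 - d2.min(axis=1, keepdims=True)))
        v = e / e.sum(axis=1, keepdims=True)                          # Step iii
        stability = v.max(axis=1).mean()                              # Step iv
        w = (v.T @ X) / (v.sum(axis=0)[:, None] + 1e-12)              # Step v
        beta /= 0.995                                                 # lower the temperature
    return w, v.argmax(axis=1)
```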
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Figures and Tables
Figure 1. A feedforward neural module for translation of a high-dimensional observation to an image in the output space.
Figure 2. A deep neural network consisting of multiple neural modules for dimensionality reduction mapping.
Figure 3. (A) Swiss-roll data. (B) A graph with derived edges for organizing neural modules.
Figure 4. (A) Centroids of partitioned subsets. (B) A regular box for representing a local support. (C) The line segment that connects two centers totally belongs to the union of two local supports. (D) The line segment that connects two centers partially belongs to the union of two local supports.
Figure 5. (A) Volume V(M) and connectivity C(M) empirically change as cluster number M increases. (B) Centroids.
Figure 6. (A) Edges on a graph and neighboring relations of cluster supports in the input space. (B) Images of centroids and training observations. (C) Images of all observations in the output space.
Figure 7. (A) A broken Swiss dataset for testing. (B) Isolated images of centroids and their edges. (C) Images of testing observations.
Figure 8. (A) Observations sampled from the two-colored ball surface for testing. (B) Isolated sub-graphs and their edges.
Table 1. Mean and variance of reservation ratio and error percentage for testing.
Dataset | Reservation Ratio (Testing), Mean | Reservation Ratio (Testing), Var | Error Percentage (Testing), Mean | Error Percentage (Testing), Var
---|---|---|---|---
Brokenswiss | 89.78% | 1.73 × 10−5 | 3.92% | 6.03 × 10−5
Colorball | 95.00% | 5.35 × 10−6 | 16.51% | 2.83 × 10−5
Table 2. Mean and variance of execution time for training and testing.
Dataset | Training Time (Seconds), Mean | Training Time (Seconds), Var | Testing Time (Seconds), Mean | Testing Time (Seconds), Var
---|---|---|---|---
Colorball | 166.0 | 66.0 | 190.0 | 43.7
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
This work explores neural approximation for nonlinear dimensionality reduction mapping based on internal representations of graph-organized regular data supports. The given training observations are assumed to be a sample from a high-dimensional space with an embedded low-dimensional manifold. An approximating function consisting of adaptable built-in parameters is optimized subject to the given training observations by the proposed learning process, and verified by transforming novel testing observations to images in the low-dimensional output space. Optimized internal representations sketch graph-organized supports of distributed data clusters and their representative images in the output space. On this basis, the approximating function is able to operate for testing without reserving the original massive training observations. The neural approximating model contains multiple modules. Each activates a non-zero output for mapping in response to an input inside its correspondent local support. Graph-organized data supports have lateral interconnections for representing neighboring relations, inferring the minimal path between the centroids of any two data supports, and proposing distance constraints for mapping all centroids to images in the output space. Following the distance-preserving principle, this work proposes Levenberg-Marquardt learning for optimizing the images of centroids in the output space subject to the given distance constraints, and further develops local embedding constraints for mapping during the execution phase. Numerical simulations show the proposed neural approximation effective and reliable for nonlinear dimensionality reduction mapping.