Abstract
With the widespread use of point clouds, the demand for their compression and transmission has become increasingly prominent. However, these processes introduce various losses, so applications need to evaluate point cloud quality. Therefore, we propose a new point cloud quality assessment (PCQA) metric named statistical information similarity (SISIM). First, we preprocess the point cloud (PC) by scaling it based on its distribution density and then project it into texture maps and geometry maps. Next, after verifying that the texture maps obey Natural Scene Statistics (NSS), we propose statistical information similarity based on NSS as the texture features. Furthermore, we extract geometry features based on local binary patterns (LBP), motivated by the observation that the LBP maps of geometry images vary with different distortions. Finally, we predict the quality of PCs by fusing the texture features with the geometry features. Experiments show that our proposed method outperforms state-of-the-art PCQA metrics on three publicly available datasets.
Introduction
In recent years, with the continuous upgrading of cameras and sensors, visual information has become increasingly convenient to acquire [1]. Immersive, perceptually interactive media content represented by Virtual Reality (VR) and Augmented Reality (AR) technologies has attracted great attention [2]. As the data foundation of immersive media, the three-dimensional (3D) PC has become a matter of great concern [3, 4].
A PC is a set of data points in 3D space. A static PC usually contains a large number of irregularly arranged points, each carrying 3D coordinates (x, y, z) as well as additional attributes such as color, reflection intensity, and normal vectors. Compared with two-dimensional (2D) images, PC data are larger and more loosely organized [5]. Therefore, PCs need to be compressed for storage and transmission, which inevitably introduces distortion. To ensure visual quality, PCQA studies how human eyes perceive the degradation of PCs and quantifies their quality [6]. Based on assessment results, PC processing algorithms can be optimized to improve the visual experience of immersive media. PCQA is thus essential in PC processing [7].
PCQA can be divided into subjective and objective evaluation. Subjective evaluation measures the quality of PCs via the visual perception of human observers [8, 9–10], while objective evaluation quantitatively describes degradation through algorithms. Early research on PCQA focused on subjective assessment experiments: Pan et al. [8] and Zhang et al. [9] analyzed the psychological factors that affect subjective evaluation scores, and Bulbul et al. [10] explored the impact of the experimental environment on subjective scores. However, subjective evaluation is expensive and time-consuming. Based on PCQA databases with Mean Opinion Scores (MOS) [9, 11, 12, 13–14], objective quality evaluation methods have developed rapidly and are widely used in practice. Generally, objective PCQA models can be divided into three types: full reference, reduced reference, and no reference models [12]. Among them, reduced reference and no reference models mainly rely on the characteristics of the PC itself [15, 16]. At present, most researchers focus on full-reference models. However, projection-based full reference models tend to lose the spatial information of the point cloud during the projection process.
Based on the above analysis, we propose a projection-based full reference PCQA metric called SISIM, which combines the texture features and the geometry features of the point cloud. The main contributions of this paper are summarized as follows:
We propose a scaling preprocessing strategy based on the distribution density of the PC, which better matches human visual perception and benefits PCQA.
We prove that the texture projections of PCs comply with Natural Scene Statistics (NSS) rules, and propose statistical information similarity features to extract texture information.
We find that the LBP maps of geometry projections differ under different distortion types, and design geometry features based on LBP maps.
We propose SISIM by combining the texture features with the 3D geometry features. The proposed method predicts PC quality more accurately than classical metrics.
Related work
Full reference models can be subdivided into two categories: 3D feature-based evaluation models and projection-based evaluation models.
3D feature-based models
3D feature-based models measure the quality of PCs by calculating the difference between corresponding points or areas in the reference and distorted PCs. The Point-to-Point (P2Point) method calculates the Euclidean distance between corresponding points [17]. The Point-to-Plane (P2Plane) method calculates the distance from the distorted PC to the corresponding local plane in the reference PC [18]. The Plane-to-Plane method calculates the angle between the normal vectors of corresponding points to further measure their similarity [19]. The methods in [17, 18–19] need to match correspondences between point clouds through a nearest-neighbor algorithm. However, this process may cause mismatches and has high computational complexity.
Inspired by the Image Quality Assessment (IQA) algorithm SSIM, Meynet et al. [20] proposed the PC-MSDM method based on local curvature statistics to compare the geometric disparity between PCs. PC-MSDM only takes geometric discrepancy into account and ignores color and other information. They further proposed the PCQM algorithm on the basis of PC-MSDM, which supplements the comparison with texture attribute information [21]. Color attributes were taken into consideration in later PCQA methods. Diniz et al. directly calculated LBP descriptors from the nearest neighbors of points in 3D space, and then compared the LBP features to predict quality [22]. In [23], they calculated P2Point metrics of PCs, P2Plane metrics of 3D LBP maps, and distances between LBP histograms, and used these distances to fit a third-degree polynomial function as the quality model. These LBP-based algorithms [22, 23] proved the effectiveness of LBP in PCQA. Alexiou and Ebrahimi [24] proposed the PointSSIM method by comprehensively considering four attributes of the PC: geometry, normal vectors, curvature, and color. The geometry correlation is measured by Euclidean distance, the normal vector correlation by angular similarity, the curvature similarity follows the PCQM algorithm, and the color similarity is designed with reference to SSIM. This algorithm integrates multiple evaluation indicators and achieves good prediction results. Diniz et al. utilized the CIEDE2000 color distance as the color feature and the normalized normal vector difference as the geometry feature [25]. Simulating the calculation of LBP, they coded the features, counted histograms, and fitted a logistic function on these data to generate a quality prediction score. A Local Color Pattern (LCP), adapted from the LBP descriptor, was also proposed by Diniz et al. [26]. Combining the 3D LBP in [22] and LCP, they established a PC visual quality prediction model. The works in [22, 23, 25, 26] show that local characteristics can reflect the overall quality of the PC.
Projection-based models
Projection-based models project the PC onto 2D images, and then use widely accepted IQA metrics to predict the visual quality of the PC. Ricardo used PSNR metrics on projected PCs to judge the performance of PC compression [17]. IQA metrics such as SSIM and VIFP were further explored in [27]. To analyze whether the number of projection planes affects PCQA, Alexiou and Ebrahimi [28] weighted different views based on the time that users observed each projection plane during subjective evaluation. However, 3D structure features are discarded in projection-based models. Wu et al. considered the projection schemes in Video-based Point Cloud Compression (V-PCC), directly calculating IQA metrics on the geometry and texture maps generated by patch projection as the evaluation score [29]. In [30], Diniz et al. used a deep neural network to fold and map PC attributes onto 2D meshes, and then designed texture descriptors to extract pure texture information from the 2D meshes and estimate visual quality. The methods in [29, 30] began to explore new projection strategies and to consider geometry features, but are time-consuming.
Based on the above analysis, projection-based PCQA methods are simple and effective, but the projection strategy affects PCQA results, and geometric structure information is difficult to preserve. Our proposed SISIM combines texture and geometry features, extending mature IQA metrics to make them suitable for 3D PCs. Furthermore, qualitative and quantitative experimental results show the effectiveness of the proposed model.
Proposed method
As shown in Fig. 1, the proposed SISIM algorithm is composed of four modules: preprocessing and projection, texture feature extraction, geometry feature extraction, and feature fusion prediction. First, we apply the proposed density-based preprocessing to the reference and distorted PCs and orthogonally project them into texture maps and geometry maps, respectively. Then, the statistical information similarity based on the Natural Scene Statistics (NSS) distribution of the texture images is extracted as the texture features. Besides, geometry features are devised based on Local Binary Patterns (LBP): global geometry features are defined as the Jensen–Shannon (JS) divergence and residual of the LBP maps, while local geometry features are designed according to human perception characteristics. Finally, we utilize Support Vector Regression (SVR) to fuse these features and predict the quality of the PC.
Fig. 1 [Images not available. See PDF.]
Diagram of the SISIM model
Preprocessing and projection
First, we use perpendicular projection to obtain six orthogonal projections. When projection is conducted without any preprocessing, the results are as shown in Fig. 2. There are gaps of different sizes in the projection images of some point clouds, which leads to unstable visual quality: the projections of PCs (a–d) are smooth, while those of PCs (h–i) are rough. This is attributed to the distribution differences among PCs. To describe the distribution density of a PC quantitatively, we define it using the average nearest neighbor distance $\bar{d}$:
$\bar{d} = \frac{1}{N}\sum_{i=1}^{N} d_i$ (1)
$\rho = \frac{1}{\bar{d}}$ (2)
where $N$ is the total number of points in the PC and $d_i = \min_{j \neq i} \lVert p_i - p_j \rVert_2$ is the distance between point $p_i$ and its nearest neighbor. Therefore, $\bar{d}$ is the average nearest neighbor distance and $\rho$ is the distribution density of the PC. $\rho$ can quantitatively describe the denseness of PCs. The distribution density of the reference PCs in the SJTU-PCQA dataset is shown in Table 1.
Fig. 2 [Images not available. See PDF.]
PC projections of SJTU-PCQA. Numbers match the order in Table 1
Table 1. Density of PCs in SJTU-PCQA
No | PCs | Projections (before resize) | Density $\rho$ |
|---|---|---|---|
a | longdress.ply | Great | 1.000 |
b | loot.ply | Great | 1.000 |
c | redandblack.ply | Great | 1.000 |
d | soldier.ply | Great | 1.000 |
e | hhi.ply | Sparse | 0.011 |
f | Romanoillamp.ply | Great | 24.390 |
g | shiva.ply | Holes | 1.076 |
h | statue.ply | Sparse | 0.311 |
i | ULB Unicorn.ply | Sparse | 0.517 |
It can be seen from Table 1 that the distribution density of a point cloud is closely related to the quality of its projection. The density of PCs (a–d) is 1.000, and their projections are continuous and pleasant to observe. By comparison, the projections of PCs (f–g) are blurry in texture, while the projections of PCs (h–i) look grainy. Both problems are caused by densities unsuitable for projection. Based on these observations, a suitable PC density should be chosen before projecting in order to reduce the effect of projection on PCQA. Therefore, we propose to preprocess PCs by scaling their distribution density to 1.000 before perpendicular projection. When projecting, the depth and color of each point are drawn on the geometry map and texture map as in [12], respectively. In case of self-occlusion, only the information of the point closest to the projection plane is recorded. The resolution of the projection map is set to the minimum power of two that is not less than the projection width of the point cloud.
Figure 3 shows two texture projections before and after scaling. It can be seen that "Romanoillamp", with density 24.390, loses details without scaling; after scaling, more texture details are accurately reflected in the projection map. For "statue", with density 0.311, the original projection is so scattered that it is difficult to judge the distortion; after scaling, we obtain a projection that complies with subjective human observation, on which the distortion is seen more clearly. Therefore, we scale PCs as a preprocessing step to reduce the effect of projection on PCQA.
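The preprocessing above can be summarized in a few lines. The following is a minimal sketch assuming NumPy and SciPy; the function names are illustrative, and the power-of-two rule follows the resolution setting described above.

```python
# Hypothetical sketch of the density-based preprocessing step (Sect. 3.1).
import numpy as np
from scipy.spatial import cKDTree

def scale_to_unit_density(points: np.ndarray) -> np.ndarray:
    """Scale a point cloud so that its distribution density rho = 1/d_bar is 1.000."""
    tree = cKDTree(points)
    # k=2 because each point's nearest neighbor in the tree is the point itself.
    dists, _ = tree.query(points, k=2)
    d_bar = dists[:, 1].mean()      # average nearest-neighbor distance, Eq. (1)
    return points / d_bar           # after scaling, d_bar ~= 1, so rho ~= 1

def projection_resolution(points: np.ndarray) -> int:
    """Smallest power of two not less than the projection width."""
    width = (points.max(axis=0) - points.min(axis=0)).max()
    return int(2 ** np.ceil(np.log2(max(width, 1.0))))
```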
Fig. 3 [Images not available. See PDF.]
PC projections of “Romanoillamp.ply” (a) and “statue.ply” (c) and their preprocessed version (b, d)
Texture feature extraction
Image statistical features are widely used to measure the level of image and video distortion [31, 32]. We study the statistical characteristics of the texture projections and find that the reference texture projections still conform to the statistical rules of natural images [33]. However, distortion changes the statistical characteristics of the projection maps. Therefore, we extract the statistical distribution parameter similarity of texture projections as texture features. The NSS rules can be summarized as follows:
Mean Subtracted Contrast Normalized (MSCN) coefficients of natural images follow a normal distribution, while distortion causes the MSCN coefficients to deviate from it.
The correlation of adjacent MSCN coefficients is low, and normalization can be used to de-correlate adjacent pixels.
The inner product of adjacent coefficients approximates the Asymmetric Generalized Gaussian Distribution (AGGD).
First, we obtain the MSCN coefficients of the texture projection to explore whether it conforms to the first statistical rule of natural images. Each single texture map is recorded as $I(i,j)$, and the MSCN coefficients $\hat{I}(i,j)$ can be expressed as:
$\hat{I}(i,j) = \frac{I(i,j) - \mu(i,j)}{\sigma(i,j) + C}$ (3)
$\mu(i,j) = \sum_{k=-K}^{K}\sum_{l=-L}^{L} w_{k,l}\, I(i+k, j+l)$ (4)
$\sigma(i,j) = \sqrt{\sum_{k=-K}^{K}\sum_{l=-L}^{L} w_{k,l}\,[I(i+k, j+l) - \mu(i,j)]^2}$ (5)
where $\mu(i,j)$ is the local average value, $\sigma(i,j)$ is the local contrast, $w = \{w_{k,l}\}$ is a symmetric Gaussian weight function, and we set $C = 1$.
Figure 4 shows the MSCN histograms of the original "longdress.ply" texture map and its distorted versions. We observe that the texture map of the reference PC presents a unimodal symmetric distribution, approximately a symmetric Gaussian distribution, while the histograms of the distorted versions deviate in different ways, consistent with the first statistical rule of natural images mentioned above. Previous research shows that MSCN coefficients can be fitted with a Generalized Gaussian Distribution (GGD) model [23]. Therefore, the GGD parameters $\alpha$ and $\sigma^2$ of PC projections can be utilized as a part of the texture features [34].
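For concreteness, a compact implementation of Eqs. (3)–(5) might look as follows. The Gaussian width of 7/6 is the value commonly used in NSS work and is an assumption here, as is the use of scipy.ndimage for the local filtering.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn_coefficients(img: np.ndarray, sigma: float = 7 / 6, C: float = 1.0) -> np.ndarray:
    """MSCN coefficients of a texture map, Eqs. (3)-(5)."""
    img = img.astype(np.float64)
    mu = gaussian_filter(img, sigma)                          # local mean, Eq. (4)
    var = gaussian_filter(img * img, sigma) - mu * mu
    sigma_map = np.sqrt(np.abs(var))                          # local contrast, Eq. (5)
    return (img - mu) / (sigma_map + C)                       # Eq. (3)
```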
Fig. 4 [Images not available. See PDF.]
Histograms of MSCN coefficients of “longdress” PC front projection view and its various distorted versions. Distortion types are from SJTU-PCQA dataset. OT octree-based compression, CN color noise, DS downscaling, D + C downscaling and color noise, D + G downscaling and geometry Gaussian noise, GGN geometry Gaussian noise, C + G color noise and geometry Gaussian noise
Then, we study the decorrelation ability of MSCN, which corresponds to the second statistical rule. Figure 5a, b shows the results: the more obvious the diagonal structure is, the higher the correlation between pixels. As shown in Fig. 5a, adjacent pixels of the texture projection are strongly correlated. In Fig. 5b, the diagonal trend is weakened, which indicates that MSCN normalization reduces the correlation of adjacent pixels to a certain extent.
Fig. 5 [Images not available. See PDF.]
Rows from top to bottom illustrate horizontal, vertical, main diagonal, and secondary diagonal neighbors. a Scatter plot between neighboring values of original gray maps. b Scatter plot between neighboring values of MSCN coefficients. c Histograms of paired product of “longdress” PC front projection view and various distorted versions. OT octree-based compression, CN color noise, DS downscaling, D + C downscaling and color noise, D + G downscaling and geometry Gaussian noise, GGN geometry Gaussian noise, C + G color noise and geometry Gaussian noise
Subsequently, we verify the third statistical rule. Specifically, we calculate the paired products of neighboring MSCN coefficients in four orientations, horizontal $H(i,j) = \hat{I}(i,j)\hat{I}(i,j+1)$, vertical $V(i,j) = \hat{I}(i,j)\hat{I}(i+1,j)$, main diagonal $D_1(i,j) = \hat{I}(i,j)\hat{I}(i+1,j+1)$, and secondary diagonal $D_2(i,j) = \hat{I}(i,j)\hat{I}(i+1,j-1)$, and obtain the corresponding histograms, shown in Fig. 5c. Under the assumption that the MSCN coefficients follow a GGD, their paired products follow an asymmetric probability density distribution. The GGD itself, however, is symmetric and has too few parameters, so it is not suitable for fitting the products of distorted MSCN coefficients. In addition, we find that the histograms of the paired products spread asymmetrically to both sides. The parameters of the AGGD can describe these characteristics accurately.
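A sketch of the four paired products, computed by shifting the MSCN map, is given below; each of the four arrays is then fitted with the AGGD of Eq. (6).

```python
import numpy as np

def paired_products(mscn: np.ndarray) -> dict:
    """Products of neighboring MSCN coefficients in four orientations."""
    return {
        "H": mscn[:, :-1] * mscn[:, 1:],        # horizontal
        "V": mscn[:-1, :] * mscn[1:, :],        # vertical
        "D1": mscn[:-1, :-1] * mscn[1:, 1:],    # main diagonal
        "D2": mscn[:-1, 1:] * mscn[1:, :-1],    # secondary diagonal
    }
```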
In the preceding discussion, we have shown that texture projections conform to all three statistical rules of natural images. Therefore, we adopt the AGGD model with zero mode for fitting, which is defined as follows:
$f(x;\alpha,\sigma_l^2,\sigma_r^2) = \begin{cases}\frac{\alpha}{(\beta_l+\beta_r)\Gamma(1/\alpha)}\exp\!\big(-(-x/\beta_l)^{\alpha}\big), & x < 0\\ \frac{\alpha}{(\beta_l+\beta_r)\Gamma(1/\alpha)}\exp\!\big(-(x/\beta_r)^{\alpha}\big), & x \ge 0\end{cases}$ (6)
where $\beta_l = \sigma_l\sqrt{\Gamma(1/\alpha)/\Gamma(3/\alpha)}$ and $\beta_r = \sigma_r\sqrt{\Gamma(1/\alpha)/\Gamma(3/\alpha)}$, the parameters $(\alpha, \sigma_l^2, \sigma_r^2)$ are obtained by fitting, and the mean parameter $\eta$ is defined as follows:
$\eta = (\beta_r - \beta_l)\frac{\Gamma(2/\alpha)}{\Gamma(1/\alpha)}$ (7)
so that $(\eta, \alpha, \sigma_l^2, \sigma_r^2)$ is a set of AGGD distribution parameters. Four sets of AGGD parameters can be obtained from the paired products of adjacent MSCN coefficients in the four orientations. According to Eqs. (3)–(7), each single projection image can be represented by two GGD parameters and four sets of AGGD parameters, totaling 18 distribution features. In addition, considering the impact of multiple scales on human perception, the statistical features are extracted at two scales, giving 36 distribution parameter features $(f_1, f_2, \ldots, f_{36})$ in total. Specifically, we use the distribution parameter features of the images to fit a Multivariate Gaussian (MVG) model to obtain the statistical characteristics of each projection:
$f_X(x_1, \ldots, x_k) = \frac{1}{(2\pi)^{k/2}\lvert\Sigma\rvert^{1/2}}\exp\!\Big(-\tfrac{1}{2}(x-\nu)^{T}\Sigma^{-1}(x-\nu)\Big)$ (8)
where $\Sigma$ and $\nu$ are the covariance matrix and mean vector of the MVG model, respectively. The covariance matrices extracted from the reference and distorted PC projection planes are denoted $\Sigma_r$ and $\Sigma_d$, and the corresponding mean vectors $\nu_r$ and $\nu_d$, respectively. Then, the statistical similarity feature of a single projection plane is defined as follows:
$S = \sqrt{(\nu_r - \nu_d)^{T}\Big(\frac{\Sigma_r + \Sigma_d}{2}\Big)^{-1}(\nu_r - \nu_d)}$ (9)
The similarity features of the six projection planes are calculated to obtain six statistical similarity features $S_1, S_2, S_3, S_4, S_5, S_6$.
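A sketch of Eq. (9) is shown below. Note that fitting an MVG requires multiple feature samples per projection; here we assume, as in NIQE-style models, that the 36 parameters are computed over many patches of the map. That sampling scheme is our assumption rather than a detail stated above.

```python
import numpy as np

def mvg_similarity(feats_ref: np.ndarray, feats_dist: np.ndarray) -> float:
    """Statistical similarity of one projection plane, Eq. (9).

    feats_ref / feats_dist: (n_samples, 36) arrays of distribution parameter
    features from the reference and distorted projection maps (assumed to be
    collected per patch).
    """
    nu_r, nu_d = feats_ref.mean(axis=0), feats_dist.mean(axis=0)
    sigma_r = np.cov(feats_ref, rowvar=False)
    sigma_d = np.cov(feats_dist, rowvar=False)
    diff = nu_r - nu_d
    pooled = (sigma_r + sigma_d) / 2.0                 # pooled covariance
    return float(np.sqrt(diff @ np.linalg.pinv(pooled) @ diff))
```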
Geometry feature extraction
Since LBP has shown excellent performance in PCQA [22, 23, 25, 26], we calculate the LBP maps of geometry projections under different distortions to determine whether LBP can distinguish different distortion types. Specifically, we compute the rotation-invariant pattern $LBP_{P,R}^{ri}$ from $P = 8$ equally spaced pixels sampled on a circular neighborhood of radius $R = 1$ around each center point [35]:
$LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^{p}$ (10)
$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$ (11)
$LBP_{P,R}^{ri} = \min\{\, ROR(LBP_{P,R}, i) \mid i = 0, 1, \ldots, P-1 \,\}$ (12)
where $g_p$ is the $p$-th point in the circular neighborhood, $g_c$ is the center point, and $ROR(x, i)$ represents the value obtained when the number $x$ is circularly rotated to the right by $i$ bits.
Figure 6 shows that the LBP map of the geometry image changes under different distortions. The stability of the relative relationships between local geometric coordinates within the point cloud, together with LBP's encoding of local neighborhood pixels, ensures the grayscale invariance of the geometry LBP maps. At the same time, the rotation-invariant LBP relieves the restriction of a fixed projection angle in projection-based methods, making the prediction result more stable. Therefore, we extract geometry features from the LBP maps of the geometry projections to reflect point cloud quality, dividing them into global features and local features as described in Sects. 3.3.1 and 3.3.2.
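As an illustration, the following is a minimal NumPy sketch of the rotation-invariant LBP of Eqs. (10)–(12); the function name and the border handling (ignoring a one-pixel frame) are our own choices.

```python
import numpy as np

def lbp_ri(img: np.ndarray) -> np.ndarray:
    """Rotation-invariant LBP with P = 8, R = 1, Eqs. (10)-(12)."""
    h, w = img.shape
    center = img[1:h-1, 1:w-1]
    code = np.zeros(center.shape, dtype=np.uint16)
    # Offsets of the eight circular neighbors (R = 1) around each center pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for p, (dy, dx) in enumerate(offsets):
        neighbor = img[1+dy:h-1+dy, 1+dx:w-1+dx]
        code |= (neighbor >= center).astype(np.uint16) << p    # Eqs. (10)-(11)
    # Rotation invariance: minimum over all circular 8-bit rotations, Eq. (12).
    rotations = [((code >> i) | (code << (8 - i))) & 0xFF for i in range(8)]
    return np.minimum.reduce(rotations).astype(np.uint8)
```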
Fig. 6 [Images not available. See PDF.]
LBP maps of reference PC and PCs with different distortion. a LBP maps of reference PC, b LBP maps of PC with OT, c LBP maps of PC with CN, d LBP maps of PC with DS, e LBP maps of PC with D + C, f LBP maps of PC with D + G, g LBP maps of PC with GGN, h LBP maps of PC with C + G
Global geometry features
The statistical histogram of the LBP feature spectrum is commonly used for classification and recognition tasks [36, 37]. To capture the structural difference between the distorted PC and the reference PC, SISIM uses two global features of their LBP maps: the Jensen–Shannon (JS) divergence and the average residual (Res). JS divergence measures the difference between two probability distributions over the same event space in a symmetric way, so we calculate the JS divergence between the LBP histograms to quantify the deviation of the geometry maps caused by distortion:
$JS\big(H(L_r)\,\|\,H(L_d)\big) = \tfrac{1}{2} KL\big(H(L_r)\,\|\,M\big) + \tfrac{1}{2} KL\big(H(L_d)\,\|\,M\big), \quad M = \tfrac{1}{2}\big(H(L_r) + H(L_d)\big)$ (13)
where $L_r$ and $L_d$ are the geometry LBP maps of the reference and distorted PCs, respectively, $H(\cdot)$ represents the histogram, and $KL(\cdot\|\cdot)$ denotes the Kullback–Leibler divergence. The Res is used to quantify the global discrepancy of the LBP maps:
$Res = \frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H} \lvert L_r(i,j) - L_d(i,j) \rvert$ (14)
where $W$ and $H$ represent the width and height of the LBP map.
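Both global features can be computed directly from the two LBP maps. The sketch below assumes 8-bit LBP codes and equally sized maps; the histogram binning and KL computation follow the standard definitions.

```python
import numpy as np

def global_geometry_features(lbp_ref: np.ndarray, lbp_dist: np.ndarray):
    """JS divergence and average residual of two LBP maps, Eqs. (13)-(14)."""
    def hist(lbp):
        h = np.bincount(lbp.ravel(), minlength=256).astype(np.float64)
        return h / h.sum()

    def kl(p, q):
        mask = p > 0                         # terms with p = 0 contribute 0
        return np.sum(p[mask] * np.log(p[mask] / q[mask]))

    hr, hd = hist(lbp_ref), hist(lbp_dist)
    m = (hr + hd) / 2.0
    js = 0.5 * kl(hr, m) + 0.5 * kl(hd, m)   # Eq. (13)
    res = np.abs(lbp_ref.astype(np.float64)
                 - lbp_dist.astype(np.float64)).mean()  # Eq. (14)
    return js, res
```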
Local geometry features
Global features may ignore local geometric distortion. In addition, local features are more flexible and robust when dealing with image aliasing and occlusion. Thus, we divide the geometry LBP map into 8 × 8 patches. In an LBP map, regions with larger values have complex depth changes and are more likely to attract the attention of the human eye. Therefore, we count the number $n_p$ of pixels with large LBP values in each patch $p$. Once it exceeds 10% of the total pixels of the patch, we regard the patch as a valid block in $B$. Equation (15) represents the filtering process from patches to blocks:
$B = \{\, p \in \mathcal{P} \mid n_p > T \times 64 \,\}$ (15)
where $n_p$ represents the number of effective pixels in patch $p$, $T$ represents the filtering threshold 0.1, $\mathcal{P}$ represents the set of all patches, $p$ represents a patch that meets the filtering criterion, and $B$ represents the set of all blocks obtained after filtering. Then, the mean $\mu_k^r$ of each block $b_k$ in $B$ is calculated as follows:
$\mu_k^r = \frac{1}{64}\sum_{(i,j)\in b_k} L_r(i,j)$ (16)
The mean of the corresponding blocks in the LBP map of the distorted PC is $\mu_k^d$. The local similarity feature based on the reference PC is defined as follows:
$S_{ref} = \frac{1}{N_r}\sum_{k=1}^{N_r} \frac{2\mu_k^r \mu_k^d + \epsilon}{(\mu_k^r)^2 + (\mu_k^d)^2 + \epsilon}$ (17)
where $N_r$ refers to the number of valid blocks filtered from the reference PC, and the small constant $\epsilon$ prevents division by zero. Symmetrically, we obtain the number of valid blocks $N_d$ selected from the distorted PC and the local similarity feature $S_{dis}$ based on the distorted PC. As distortions affect the number of valid blocks, the selection rate of valid blocks is also an important indicator:
$Rate = \frac{\min(N_r, N_d) + \epsilon}{\max(N_r, N_d) + \epsilon}$ (18)
where $\epsilon$ again prevents division by zero. The means of $S_{ref}$, $S_{dis}$, and $Rate$ over the six geometry projection planes are passed to the feature fusion module as the local features.
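The local feature pipeline of Eqs. (15)–(18) can be sketched as follows. The threshold defining a "large" LBP value and the exact value of ε are not fixed above, so the choices `high` and `eps` below are illustrative assumptions, as is the SSIM-like ratio used for Eq. (17).

```python
import numpy as np

def local_geometry_features(lbp_ref, lbp_dist, patch=8, thresh=0.1,
                            high=0.5, eps=1e-6):
    """Sketch of the valid-block filtering and local similarity, Eqs. (15)-(18)."""
    def block_stats(lbp):
        h, w = lbp.shape
        # Crop to a multiple of the patch size and tile into (hb, wb, 8, 8) blocks.
        blocks = lbp[: h - h % patch, : w - w % patch].reshape(
            h // patch, patch, w // patch, patch).swapaxes(1, 2)
        large = blocks >= high * lbp.max()            # "large" LBP values (assumed)
        valid = large.mean(axis=(2, 3)) > thresh      # Eq. (15)
        means = blocks.mean(axis=(2, 3))              # Eq. (16)
        return valid, means

    valid_r, mu_r = block_stats(lbp_ref)
    valid_d, mu_d = block_stats(lbp_dist)
    sim = (2 * mu_r * mu_d + eps) / (mu_r ** 2 + mu_d ** 2 + eps)
    s_ref = sim[valid_r].mean() if valid_r.any() else 0.0   # Eq. (17)
    s_dis = sim[valid_d].mean() if valid_d.any() else 0.0
    n_r, n_d = int(valid_r.sum()), int(valid_d.sum())
    rate = (min(n_r, n_d) + eps) / (max(n_r, n_d) + eps)    # Eq. (18)
    return s_ref, s_dis, rate
```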
SVR feature fusion prediction
In this work, we extract 11 features in total: the six texture similarity features $S_1, \ldots, S_6$, the two global geometry features $JS$ and $Res$, and the three local geometry features $S_{ref}$, $S_{dis}$, and $Rate$. A regression algorithm is used to fuse these features: SVR takes them as input, learns the mapping between features and subjective scores, and outputs the predicted PC quality score. The LIBSVM package [38] is applied in the proposed model to map the features to quality scores. In detail, ε-SVR with a Radial Basis Function (RBF) kernel is utilized, and the SVR parameters are chosen via the parameter search (grid search) tool of LIBSVM [38].
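A hedged sketch of the fusion stage is given below. It uses scikit-learn's SVR (which wraps LIBSVM) instead of the LIBSVM package itself, and the grid-search ranges follow the common LIBSVM convention; these are substitutions, not the exact configuration used in the paper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def train_quality_model(X: np.ndarray, y: np.ndarray):
    """epsilon-SVR with RBF kernel and grid search over (C, gamma).

    X: (n_pcs, 11) feature matrix; y: subjective scores (MOS).
    """
    pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
    grid = {"svr__C": 2.0 ** np.arange(-5, 11, 2),
            "svr__gamma": 2.0 ** np.arange(-15, 4, 2)}
    search = GridSearchCV(pipe, grid, cv=5,
                          scoring="neg_root_mean_squared_error")
    search.fit(X, y)
    return search.best_estimator_
```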
Experimental results and discussion
Configuration and settings
In this paper, three static PCQA databases are used for comparative experiments; they are briefly introduced here.
SJTU-PCQA [12] This database contains object PCs and human PCs. The distortion types include octree-based compression, color noise, geometry Gaussian noise, and scaling.
IRPC [14] This database contains object PCs and human PCs. Distortion types include octree-based compression, G-PCC, and V-PCC.
WPC [13] This database only contains object PCs. Distortion types include downsampling, Gaussian noise, and compression.
In the experiments, we use the Pearson Linear Correlation Coefficient (PLCC), Spearman Rank-order Correlation Coefficient (SRCC), and Root Mean Square Error (RMSE) as evaluation criteria. PLCC and SRCC describe the consistency between MOS and predicted quality; greater values indicate a higher correlation. A smaller RMSE represents smaller deviations between MOS and predicted scores. The nonlinear regression procedure is performed as in [39], mapping the prediction scores to the range of the MOS before evaluating prediction performance.
In each experiment, we divide the distorted PCs into a training set (80%) and a test set (20%). The split is repeated 1000 times, and the median of the results is taken as the final performance index.
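For reference, a typical implementation of this evaluation protocol is sketched below; the 5-parameter logistic is the mapping commonly used in quality assessment studies and is assumed here, since the exact regression function of [39] is not reproduced above.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr

def logistic5(x, b1, b2, b3, b4, b5):
    """Common 5-parameter logistic mapping from objective scores to MOS."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (x - b3)))) + b4 * x + b5

def evaluate(pred: np.ndarray, mos: np.ndarray):
    """PLCC / SRCC / RMSE after the nonlinear regression step."""
    p0 = [np.max(mos), 1.0, np.mean(pred), 0.0, np.mean(mos)]
    params, _ = curve_fit(logistic5, pred, mos, p0=p0, maxfev=10000)
    mapped = logistic5(pred, *params)
    plcc = pearsonr(mapped, mos)[0]
    srcc = spearmanr(pred, mos)[0]    # rank order, unaffected by the mapping
    rmse = float(np.sqrt(np.mean((mapped - mos) ** 2)))
    return plcc, srcc, rmse
```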
Performance evaluation
First, we analyze the performance of the proposed SISIM by comparing it with the point-based Moving Picture Experts Group (MPEG) metrics (a. p2point_RMSE, b. PSNR_p2point_Hausdorf, c. PSNR_p2plane_RMSE, and d. PSNR_p2plane_Hausdorf) [40], traditional IQA metrics (PSNR [41], SSIM [27], MS-SSIM [27], and VIFP [27]), PointSSIM [24], and GraphSIM [42]. Table 2 shows the comparative results; the top two results are in bold.
Table 2. Results of comparison experiments
Metrics | SJTU-PCQA | | | IRPC | | | WPC | | | Average | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | PLCC | SRCC | RMSE | PLCC | SRCC | RMSE | PLCC | SRCC | RMSE | PLCC | SRCC | RMSE |
P2point a | 0.258 | 0.408 | 2.340 | 0.509 | 0.567 | 0.811 | 0.444 | 0.465 | 20.544 | 0.404 | 0.48 | 7.898 |
P2point b | 0.294 | 0.391 | 2.320 | 0.385 | 0.423 | 0.869 | 0.345 | 0.302 | 21.513 | 0.341 | 0.372 | 8.234 |
P2plane c | 0.119 | 0.242 | 2.410 | 0.324 | 0.416 | 0.891 | 0.340 | 0.290 | 22.914 | 0.261 | 0.316 | 8.738 |
P2plane d | 0.229 | 0.357 | 2.362 | 0.289 | 0.279 | 0.902 | 0.354 | 0.304 | 21.440 | 0.291 | 0.313 | 8.235 |
PSNR | 0.538 | 0.550 | 2.047 | 0.231 | 0.247 | 0.916 | 0.519 | 0.484 | 19.596 | 0.429 | 0.427 | 7.520 |
SSIM | 0.617 | 0.616 | 1.910 | 0.324 | 0.264 | 0.891 | 0.459 | 0.457 | 20.371 | 0.467 | 0.446 | 7.724 |
MS-SSIM | – | – | – | 0.346 | 0.315 | 0.883 | 0.610 | 0.589 | 18.172 | – | – | – |
VIFP | 0.650 | 0.640 | 1.844 | 0.476 | 0.521 | 0.828 | 0.630 | 0.704 | 17.801 | 0.585 | 0.622 | 6.824 |
PointSSIM | 0.672 | 0.687 | 1.798 | 0.792 | 0.643 | 0.937 | 0.460 | 0.454 | 20.351 | 0.641 | 0.595 | 7.695 |
GraphSIM | 0.846 | 0.841 | 1.292 | 0.940 | 0.760 | 0.310 | 0.692 | 0.694 | 16.557 | 0.826 | 0.765 | 6.053 |
SISIM | 0.796 | 0.780 | 1.462 | 0.826 | 0.692 | 0.523 | 0.699 | 0.677 | 16.381 | 0.774 | 0.716 | 6.122 |
SISIM-T | 0.703 | 0.689 | 1.717 | 0.674 | 0.504 | 0.656 | 0.495 | 0.447 | 19.834 | 0.624 | 0.547 | 7.402 |
SISIM-G | 0.692 | 0.674 | 1.737 | 0.873 | 0.743 | 0.451 | 0.589 | 0.553 | 18.442 | 0.718 | 0.657 | 6.877 |
SISIM achieves higher accuracy on all three datasets than the MPEG metrics, traditional IQA metrics, and PointSSIM. The MPEG metrics and PointSSIM only compare the geometric structure of point clouds, while the proposed SISIM considers both texture and geometry features. On average, the PLCC of our approach is 91.58–196.55% higher than that of the MPEG metrics, and the SRCC is 49.17–128.75% higher. In addition, our method improves PLCC and SRCC by 20.75% and 20.34%, respectively, compared with PointSSIM, and reduces RMSE by over 20%. The superior performance of SISIM proves the effectiveness of combining texture and geometry features. Traditional IQA metrics are all projection-based texture metrics, so we compare SISIM with them to evaluate the contribution of the geometry features: PLCC and SRCC improve by 32.31–80.42% and 15.11–67.68%, respectively, and RMSE is reduced by 10.29–20.74% on average. This indicates that the geometry features are greatly helpful to PCQA. When only texture features (SISIM-T) or only geometry features (SISIM-G) are used to predict quality, the prediction performance is close to that of the VIFP algorithm.
SISIM also achieves PLCC and RMSE competitive with GraphSIM on the WPC database. However, it does not perform as well on the SJTU-PCQA and IRPC databases. To explore the cause, we compare the algorithms and find that GraphSIM extracts features in 3D space, while SISIM extracts features from projections. Based on this observation, we export views of the PCs and their corresponding Poisson reconstructions via the CloudCompare software. Figure 7 shows the possible cause. There is an obvious hole in the shiva PC in Fig. 7a. Holes formed by incomplete acquisition lead to defects in the projected view, which change its texture statistics; at the same time, the hole boundary may be identified as an edge, affecting the geometry features. Figure 7b shows a more severe defect: the bottom of the PC is missing. Statue and other relic PCs lack an effective bottom view. We observe similar problems in the IRPC database, as shown in Fig. 8: several holes are distributed over "Façade.ply", and the roof and bottom of "House.ply" contain no data points. Therefore, we infer that projection introduces errors in such cases. When SISIM extracts features from a defective bottom projection, the prediction reliability is reduced.
Fig. 7 [Images not available. See PDF.]
a “shiva.ply” in SJTU-PCQA and b “statue.ply” in SJTU-PCQA and their Poisson Reconstruction Surface under specific observation views
Fig. 8 [Images not available. See PDF.]
a Poisson reconstruction of “Façade.ply” in IRPC, b bottom view of “House.ply” in IRPC
Besides, we compare the feature extraction time of GraphSIM and SISIM. Table 3 reports the feature extraction time for the 9 reference PCs of the SJTU-PCQA dataset, with the average time shown in bold. For all experimental point clouds, the runtime of SISIM is significantly lower than that of GraphSIM, with an average time consumption of about 10% of GraphSIM's. Combining the results of Tables 2 and 3, SISIM achieves prediction accuracy similar to GraphSIM in much less time, improving the speed of point cloud quality prediction.
Table 3. Time cost comparison results in SJTU-PCQA
PC name | Algorithm | |
|---|---|---|
GraphSIM (s) | SISIM (s) | |
longdress.ply | 231.1089 | 12.1271 |
loot.ply | 242.5160 | 13.5942 |
redandblack.ply | 239.0433 | 11.9433 |
soldier.ply | 255.7105 | 15.2590 |
hhi.ply | 292.2625 | 45.4080 |
Romanoillamp.ply | 416.2010 | 45.3291 |
shiva.ply | 302.4393 | 72.6721 |
ULB Unicorn.ply | 643.0716 | 51.6857 |
statue.ply | 128.0698 | 11.9266 |
Average | 305.6025 | 31.1050 |
Ablation study
In this section, two ablation experiments are conducted: the first proves the effectiveness of the preprocessing, and the second verifies the effectiveness of combining multiple features.
Density ablation
According to the analysis in Sect. 3.1, the distribution density of a PC is an important factor determining the quality of its projection images, and PCs with density near 1.000 yield projections closer to human visual perception. To verify the effectiveness of scaling, a comparative experiment is carried out and the results are listed in Table 4.
Table 4. Results on SJTU-PCQA
| PLCC | SRCC | RMSE |
|---|---|---|---|
Not preprocessed | 0.746 | 0.733 | 1.601 |
Preprocessed | 0.796 | 0.780 | 1.462 |
The best result for each metric is shown in bold
The first row in Table 4 shows the performance of SISIM without scaling preprocessing on the SJTU-PCQA dataset, and the second row shows the result after scaling the density to 1.000. The preprocessing step increases PLCC and SRCC by over 6% and decreases RMSE by 8.68%. SISIM predicts more accurately on preprocessed PCs, which verifies that PCs with a density of 1.000 are more suitable for quality prediction.
Feature ablation
Figure 9 and Table 5 report the results of the feature ablation experiment. SISIM-T uses only texture features to predict quality scores, while SISIM-G uses only geometry features.
Fig. 9 [Images not available. See PDF.]
Ablation results averaged over 3 databases
Table 5. Results of feature ablation on 3 databases
Algorithms | SJTU-PCQA | | | IRPC | | | WPC | | | Average | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | PLCC | SRCC | RMSE | PLCC | SRCC | RMSE | PLCC | SRCC | RMSE | PLCC | SRCC | RMSE |
SISIM | 0.796 | 0.780 | 1.462 | 0.826 | 0.692 | 0.523 | 0.699 | 0.677 | 16.381 | 0.774 | 0.716 | 6.122 |
SISIM-T | 0.703 | 0.689 | 1.717 | 0.674 | 0.504 | 0.656 | 0.495 | 0.447 | 19.834 | 0.624 | 0.547 | 7.402 |
SISIM-G | 0.692 | 0.674 | 1.737 | 0.873 | 0.743 | 0.451 | 0.589 | 0.553 | 18.442 | 0.718 | 0.657 | 6.877 |
The best result for each metric is shown in bold
Compared with SISIM-T, the PLCC of SISIM is improved by 24.04%, the SRCC by 30.90%, and the RMSE is reduced by 17.29%. Compared with SISIM-G, SISIM raises PLCC and SRCC by 7.80% and 8.98%, respectively, and reduces RMSE by 10.98%. The proposed SISIM, combining texture features with geometry features, thus achieves more accurate performance. For a more comprehensive analysis, the numerical results of the ablation experiments on the three databases are provided in Table 5.
In addition, to show the advantages of the proposed algorithm, we compare SISIM-T and SISIM-G with the other PCQA algorithms in Table 2 of Sect. 4.2. When only texture features are utilized, SISIM-T shows more stable performance than the projection-based IQA metrics in Table 2, indicating that the proposed texture features are less susceptible to disturbance when evaluating PCs and are therefore more suitable for describing them. Furthermore, SISIM-G shows strong advantages over the MPEG metrics and PointSSIM, which only compare geometric structure, demonstrating that the geometry features based on LBP maps can represent the quality of PCs more accurately. These results reveal that both the texture features and the geometry features are beneficial for PCQA.
Conclusion
In this paper, we propose a scaling preprocessing strategy based on density that unifies the denseness of PCs. Moreover, a quality evaluation metric for PCs based on statistical information similarity is proposed, which combines texture features and geometry features to improve the performance of PCQA. The statistical similarity features of the texture images detect distortions in 2D space, while the geometry features capture spatial information unique to point clouds.
SISIM is compared with several classic and state-of-the-art PCQA metrics on three common PCQA databases. It outperforms all classic and projection-based metrics. Although the performance of SISIM is affected by the projection algorithm, it still performs better than most 3D feature-based metrics. In the future, we plan to design a projection algorithm that more accurately reflects human visual perception, which should further improve the prediction accuracy of SISIM.
Funding
This work was supported by the Natural Science Foundation of China under Grants 61671283.
Data availability
The associated datasets of the current study are available from the corresponding author on reasonable request.
Declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Li, C; Yu, L; Fei, S. Large-scale, real-time 3D scene reconstruction using visual and IMU sensors. IEEE Sens. J.; 2020; 20, pp. 5597-5605. [DOI: https://dx.doi.org/10.1109/JSEN.2020.2971521]
2. Pereira, F.: Deep learning-based point cloud coding for immersive experiences. In: Proceedings of the 30th ACM International Conference on Multimedia, pp. 7368–7370 (2022). https://doi.org/10.1145/3503161.3546961
3. Jia, X; Yang, S; Wang, Y et al. Dual-view 3D reconstruction via learning correspondence and dependency of point cloud regions. IEEE Trans. Image Process.; 2022; 31, pp. 6831-6846. [DOI: https://dx.doi.org/10.1109/TIP.2022.3215024]
4. Chen, J; Zhang, Y; Huang, K et al. Self-supervised boundary point prediction task for point cloud domain adaptation. IEEE Robot. Autom. Lett.; 2023; 8, pp. 5878-5885. [DOI: https://dx.doi.org/10.1109/LRA.2023.3301278]
5. Yuan, H; Zhang, D; Wang, W et al. A sampling-based 3D point cloud compression algorithm for immersive communication. Mob. Netw. Appl.; 2020; 25, pp. 1863-1872. [DOI: https://dx.doi.org/10.1007/s11036-020-01570-y]
6. Alexiou, E., Upenik, E., Ebrahimi, T.: Towards subjective quality assessment of point cloud imaging in augmented reality. In: 2017 IEEE 19th International Workshop on Multimedia Signal Processing (MMSP), pp. 1–6 (2017). https://doi.org/10.1109/MMSP.2017.8122237
7. He, Z., Jiang, G., Jiang, Z., et al.: Towards a colored point cloud quality assessment method using colored texture and curvature projection. In: 2021 IEEE International Conference on Image Processing (ICIP), pp. 1444–1448 (2021). https://doi.org/10.1109/ICIP42928.2021.9506762
8. Pan, YX; Cheng, I; Basu, A. Quality metric for approximating subjective evaluation of 3-D objects. IEEE Trans. Multimed.; 2005; 7, pp. 269-279. [DOI: https://dx.doi.org/10.1109/TMM.2005.843364]
9. Zhang, J., Huang, W., Zhu, X., et al.: A subjective quality evaluation for 3D point cloud models. In: ICALIP, Shanghai, China, pp. 827–831 (2014). https://doi.org/10.1109/ICALIP.2014.7009910
10. Bulbul, A; Capin, T; Lavoué, G et al. Assessing visual quality of 3-D polygonal models. IEEE Signal Process. Mag.; 2011; 28, pp. 80-90. [DOI: https://dx.doi.org/10.1109/MSP.2011.942466]
11. Javaheri, A., Brites, C., Pereira, F., et al.: Subjective and objective quality evaluation of 3D point cloud denoising algorithms. In: ICMEW, Hong Kong, China, pp. 1–6 (2017). https://doi.org/10.1109/ICMEW.2017.8026263
12. Yang, Q; Chen, H; Ma, Z et al. Predicting the perceptual quality of point cloud: a 3D-to-2D projection-based exploration. IEEE Trans. Multimed.; 2021; 23, pp. 3877-3891. [DOI: https://dx.doi.org/10.1109/TMM.2020.3033117]
13. Su, H., Duanmu, Z., Liu, W., et al.: Perceptual quality assessment of 3d point clouds. In: ICIP, Taipei, Taiwan, China, pp. 3182–3186 (2019). https://doi.org/10.1109/ICIP.2019.8803298
14. Javaheri, A; Brites, C; Pereira, F et al. Point cloud rendering after coding: impacts on subjective and objective quality. IEEE Trans. Multimed.; 2021; 23, pp. 4049-4064. [DOI: https://dx.doi.org/10.1109/TMM.2020.3037481]
15. Viola, I; Cesar, P. A reduced reference metric for visual quality evaluation of point cloud contents. IEEE Signal Process. Lett.; 2020; 27, pp. 1660-1664. [DOI: https://dx.doi.org/10.1109/LSP.2020.3024065]
16. Liu, Q; Yuan, H; Su, HL et al. PQA-Net: deep no reference point cloud quality assessment via multi-view projection. IEEE Trans. Circuits Syst. Video Technol.; 2021; 31, pp. 4645-4660. [DOI: https://dx.doi.org/10.1109/TCSVT.2021.3100282]
17. Schwarz, S., Flynn, D.: Common test conditions for point cloud compression, ISO/IEC JTC1/SC29/WG11 N17995, Macau (2018)
18. Dong, T., Hideaki, O., Chen, F., et al.: Geometric distortion metrics for point cloud compression. In: ICIP, Beijing, China, pp. 3460–3464 (2017)
19. Alexiou, E., Ebrahimi, T.: Point cloud quality assessment metric based on angular similarity. In: ICME, Santiago, USA, pp. 1–6 (2018). https://doi.org/10.1109/ICME.2018.8486512
20. Meynet, G., Digne, J., Lavoué, G.: PC-MSDM: a quality metric for 3D point clouds. In: QoMEX, Berlin, Germany, pp. 1–3 (2019). https://doi.org/10.1109/QoMEX.2019.8743313
21. Meynet, G., Nehmé, Y., Dignem, J., Lavoué, G.: PCQM: a full-reference quality metric for colored 3D point clouds. In: QoMEX, Athlone, Ireland, pp. 1–6 (2020). https://doi.org/10.1109/QoMEX48832.2020.9123147
22. Diniz, R., Freitas, P.G., Farias, M.C.Q.: Towards a point cloud quality assessment model using local binary patterns. In: QoMEX, Athlone, Ireland, pp. 1–6 (2020). https://doi.org/10.1109/QoMEX48832.2020.9123076
23. Diniz, R., Freitas, P.G., Farias, M.C.Q.: Multi-distance point cloud quality assessment. In: ICIP, Abu Dhabi, United Arab Emirates, pp. 3443–3447 (2020). https://doi.org/10.1109/ICIP40778.2020.9190956
24. Alexiou, E., Ebrahimi, T.: Towards a point cloud structural similarity metric. In: ICMEW, London, UK, pp. 1–6 (2020). https://doi.org/10.1109/ICMEW46912.2020.9106005
25. Diniz, R; Freitas, PG; Farias, MCQ. Color and geometry texture descriptors for point-cloud quality assessment. IEEE Signal Process. Lett.; 2021; 28, pp. 1150-1154. [DOI: https://dx.doi.org/10.1109/LSP.2021.3088059]
26. Diniz, R; Freitas, PG; Farias, MCQ. Point cloud quality assessment based on geometry-aware texture descriptors. Comput. Graph.; 2022; 103, pp. 31-44. [DOI: https://dx.doi.org/10.1016/j.cag.2022.01.003]
27. Torlig, E.M., Alexiou, E., Fonseca, T.A., et al.: A novel methodology for quality assessment of voxelized point clouds. In: Applications of digital image processing XLI., San Diego, California, United States, p. 10752 (2018). https://doi.org/10.1117/12.2322741
28. Alexiou, E., Ebrahimi, T.: Exploiting user interactivity in quality assessment of point cloud imaging. In: QoMEX, Berlin, Germany, pp. 1–6 (2019). https://doi.org/10.1109/QoMEX.2019.8743277
29. Wu, X; Zhang, Y; Fan, C et al. Subjective quality database and objective study of compressed point clouds with 6DoF head-mounted display. IEEE Trans. Circuits Syst. Video Technol.; 2021; 31, pp. 4630-4644. [DOI: https://dx.doi.org/10.1109/TCSVT.2021.3101484]
30. Freitas, XG; Diniz, R; Farias, MCQ. Point cloud quality assessment: unifying projection, geometry, and texture similarity. Vis. Comput.; 2022; [DOI: https://dx.doi.org/10.1007/s00371-022-02454-w]
31. Mittal, A; Soundararajan, R; Bovik, AC. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett.; 2013; 20, pp. 209-212. [DOI: https://dx.doi.org/10.1109/LSP.2012.2227726]
32. Wang, Y; Shuai, Y; Zhu, Y et al. Jointly learning perceptually heterogeneous features for blind 3D video quality assessment. Neurocomputing; 2019; 332, pp. 298-304. [DOI: https://dx.doi.org/10.1016/j.neucom.2018.12.029]
33. Moorthy, A.K., Bovik, A.C.: Statistics of natural image distortions. In: ICASSP, Dallas, TX, USA, pp. 962–965 (2010). https://doi.org/10.1109/ICASSP.2010.5495298
34. Sharifi, K; Leon-Garcia, A. Estimation of shape parameter for generalized Gaussian distributions in subband decompositions of video. IEEE Trans. Circuits Syst. Video Technol.; 1995; 5, pp. 52-56. [DOI: https://dx.doi.org/10.1109/76.350779]
35. Ojala, T; Pietikainen, M; Maenpaa, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell.; 2002; 24, pp. 971-987. [DOI: https://dx.doi.org/10.1109/TPAMI.2002.1017623]
36. Karanwal, S.: COC-LBP: complete orthogonally combined local binary pattern for face recognition. In: 2021 IEEE 12th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (UEMCON), pp. 0534–0540 (2021). https://doi.org/10.1109/UEMCON53757.2021.9666506
37. Chen, H; Ma, M; Liu, G et al. Breast tumor classification in ultrasound images by fusion of deep convolutional neural network and shallow LBP feature. J. Digit. Imaging; 2023; 36, pp. 932-946. [DOI: https://dx.doi.org/10.1007/s10278-022-00711-x]
38. Chang, CC; Lin, CJ. LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol.; 2011; [DOI: https://dx.doi.org/10.1145/1961189.1961199]
39. Lu, Z; Huang, H; Zeng, H; Hou, J; Ma, K-K. Point cloud quality assessment via 3D edge similarity measurement. IEEE Signal Process. Lett.; 2022; 29, pp. 1804-1808. [DOI: https://dx.doi.org/10.1109/LSP.2022.3198601]
40. Mpeg’s Pcc Metric Version 0.13.5, Flynn, D., Julien, R., Tian, D., Mekuria, R., Jean-Claude, C., Valentin, V. (2020). https://github.com/rafael2k/bitdance-pc_metric/tree/main/mpeg-pcc-dmetric-0.13.05
41. de Queiroz, RL; Chou, PA. Motion-compensated compression of dynamic voxelized point clouds. IEEE Trans. Image Process.; 2017; 26, pp. 3886-3895.
42. Yang, Q; Ma, Z; Xu, Y et al. Inferring point cloud quality via graph similarity. IEEE Trans. Pattern Anal. Mach. Intell.; 2022; 44, pp. 3015-3029. [DOI: https://dx.doi.org/10.1109/TPAMI.2020.3047083]