Article

Accelerated Stochastic Variance Reduction Gradient Algorithms for Robust Subspace Clustering

1 Medical College, Tianjin University, Tianjin 300072, China
2 Peng Cheng Laboratory, Shenzhen 518000, China
3 Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, School of Artificial Intelligence, Xidian University, Xi’an 710071, China
4 Department of Mathematics and Physics, North China Electric Power University, Baoding 071003, China
5 College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
6 Hangzhou Institute of Technology, Xidian University, Hangzhou 311231, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Sensors 2024, 24(11), 3659; https://doi.org/10.3390/s24113659
Submission received: 2 April 2024 / Revised: 21 May 2024 / Accepted: 31 May 2024 / Published: 5 June 2024

Abstract

Robust face clustering enjoys a wide range of applications in gate passes, surveillance systems and security analysis in embedded sensors. Nevertheless, existing algorithms have limitations in finding accurate clusters when the data contain noise (e.g., occluded face clustering and recognition). It is known that in subspace clustering, the ℓ1- and ℓ2-norm regularizers can improve subspace preservation and connectivity, respectively, and the elastic net regularizer (i.e., a mixture of the ℓ1- and ℓ2-norms) provides a balance between the two properties. However, existing deterministic methods have high per-iteration computational complexities, making them inapplicable to large-scale problems. To address this issue, this paper proposes the first robust accelerated stochastic variance reduction gradient (RASVRG) algorithm for robust subspace clustering. We also introduce a new momentum acceleration technique into the RASVRG algorithm. With this momentum, the RASVRG algorithm achieves both the best oracle complexity and the fastest convergence rate, and it reaches higher efficiency in practice for both strongly convex and not strongly convex models. Various experimental results show that the RASVRG algorithm outperformed existing state-of-the-art methods with elastic net and ℓ1-norm regularizers in terms of accuracy in most cases. As demonstrated on real-world face datasets with different manually added levels of pixel corruption and occlusion, the RASVRG algorithm achieved much better performance in terms of accuracy and robustness.

1. Introduction

Subspace clustering aims to find groups of similar objects, or clusters, which usually exist in low-dimensional subspaces. With the development of artificial intelligence and the popularity of computer vision applications such as face recognition and clustering [1,2], motion segmentation [3] and document analysis [4], subspace clustering has attracted more attention in recent years, especially mask-occluded face recognition in embedded sensors due to COVID-19. Samples of different classes can be approximated well by data from a union of low-dimensional subspaces. In practice, we perform the task of dividing the data points into various subspaces based on their similarity. Subspace clustering is a subcategory of clustering that gathers data into different groups such that each group consists only of data points from the same subspace. For subspace clustering, a series of related methods have been developed, such as statistical, iterative and algebraic methods, spectral clustering and deep learning algorithms [5,6,7,8,9,10].
Compared with other techniques, methods based on spectral clustering have become increasingly popular because of their convenient implementation, complete theoretical support and reliable accuracy [11]. The key to these methods is to adopt an ℓ1-norm, ℓ2-norm or elastic net (i.e., a mixture of the ℓ1- and ℓ2-norms) regularizer to solve an optimization problem that yields an affinity matrix, and then to apply spectral clustering to this matrix. Each point from the union of subspaces can be represented as a linear combination of the other data points in the subspace [12], which is called the self-expressiveness property. It can be formulated as follows:
$x_j = X c_j \quad \text{and} \quad c_{jj} = 0. \qquad (1)$
In fact, Equation (1) is equivalent to the following form:
$X = XC \quad \text{and} \quad \operatorname{diag}(C) = 0 \qquad (2)$
where $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{D \times N}$ is the data matrix, $C = [c_1, c_2, \ldots, c_N] \in \mathbb{R}^{N \times N}$ is the coefficient matrix whose jth column $c_j$ is the sparse representation of $x_j$, $c_{jj}$ is the jth element of $c_j$ and $\operatorname{diag}(C) \in \mathbb{R}^N$ is the vector of the diagonal elements of C. Although multiple solutions for C may exist rather than a unique one, a solution with the property that $c_{ij} \neq 0$ only if $x_i$ belongs to the same subspace as $x_j$ still exists. Because such solutions preserve the subspace structure, they are called subspace-preserving. If a subspace-preserving C exists, and the connection between a pair of points $x_i$ and $x_j$ is encoded in an affinity matrix W (i.e., $w_{ij} = |c_{ij}| + |c_{ji}|$), then one can cluster the data points by applying spectral clustering [13] to the affinity matrix.
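As a concrete illustration of the self-expressiveness property in Equations (1) and (2), the following NumPy sketch (an illustrative example written for this exposition, not code from the paper) generates points that lie in a single low-dimensional subspace and verifies that one point can be reproduced as a linear combination of the others:
```python
import numpy as np

rng = np.random.default_rng(0)
D, d, n_pts = 20, 3, 30                              # ambient dim, subspace dim, points
U = np.linalg.qr(rng.standard_normal((D, d)))[0]     # orthonormal basis of one subspace
X = U @ rng.standard_normal((d, n_pts))              # data points lying in that subspace

# Self-expressiveness: x_0 is a linear combination of the *other* points,
# so a coefficient vector with c_00 = 0 exists.
A = X[:, 1:]                                         # all points except x_0
c, *_ = np.linalg.lstsq(A, X[:, 0], rcond=None)
print(np.linalg.norm(A @ c - X[:, 0]))               # ~0 up to numerical precision
```
Such a coefficient vector is not unique; the regularized problems discussed next are what select a sparse, subspace-preserving solution.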
In order to obtain the subspace-preserving matrix C, one effective method is to regularize C and solve the following minimization problem:
$C^* = \arg\min_{C} \|C\|_1, \quad \text{s.t.} \quad X = XC, \ \operatorname{diag}(C) = 0 \qquad (3)$
where $\|\cdot\|_1$ is the ℓ1-norm (i.e., $\|C\|_1 = \sum_{i=1}^{N} \sum_{j=1}^{N} |c_{ij}|$). Here, the ℓ1-norm can be replaced by the ℓ2-norm (i.e., $\|C\|_2 = \sqrt{\sum_{i=1}^{N} \sum_{j=1}^{N} c_{ij}^2}$).
Depending on the choice of regularizer for the coefficient matrix C, there are different subspace clustering algorithms. For instance, the sparse subspace clustering (SSC) method [12] applies the ℓ1-norm to find the coefficient matrix C. Previous work has indicated that SSC can provide a subspace-preserving solution under certain circumstances, where the subspaces are independent [14] or the data in different subspaces meet some separation conditions and the data in the same subspace are well distributed [14,15]. Similar conclusions are also obtained when the data are corrupted by noise [16] or outliers [17]. Least squares regression [18] uses the ℓ2-norm regularizer on the matrix C. Low-rank representation [19] applies the nuclear norm regularizer to C to retain its low-rank property. Moreover, the authors of [17,20,21] utilized the elastic net regularizer to induce a sparse matrix C.
In recent years, many subspace clustering methods have greatly promoted the development of SSC algorithms. However, the performance of these methods is mostly evaluated on clean datasets, ignoring the potential noise present in reality. On the other hand, many algorithms require additional procedures to estimate and remove the noise. For instance, the authors of [22] required principal component analysis (PCA) to be performed on the data for dimensionality and noise reduction, and the authors of [17] modeled and removed the outliers before further clustering. These methods thus have a strong dependence on the cleanliness of the data. In consideration of the above reasons, the actual performance of existing methods on real-world datasets is not always satisfactory.
In this paper, we propose a robust accelerated stochastic variance reduction gradient (RASVRG) method and present its efficient implementation for self-expressiveness-based subspace clustering problems. Our algorithms can be directly applied to SSC problems with ℓ1-norm and elastic net regularizers on datasets that may be corrupted by noise, and they achieve superior clustering accuracy and efficiency compared with existing popular algorithms, demonstrating the excellent performance and strong robustness of the RASVRG.
The key acceleration technique in the RASVRG is the snapshot momentum proposed in our previous work [23,24]. We introduce this momentum acceleration technique into the proximal stochastic variance reduction gradient (Prox-SVRG) method [25]. The proposed RASVRG algorithm requires tracking only one variable vector in the inner loop, which means that its computational time and memory overhead are exactly the same as those of the SVRG [26] and Prox-SVRG [25]. Thus, our RASVRG algorithms have much lower per-iteration complexity than other accelerated methods (e.g., Katyusha [27]), which makes the RASVRG more suitable for large-scale SSC problems [28], especially large-scale robust face clustering problems. To the best of our knowledge, this work is the first to propose faster stochastic optimization algorithms instead of deterministic methods to solve various SSC problems, including robust face clustering.
We summarize the major contributions of this paper as follows:
  • Faster convergence rates: Our RASVRG obtains the oracle complexity $O((D + \sqrt{D\kappa}) \log(1/\epsilon))$ for strongly convex (SC) subspace clustering problems (e.g., the elastic net regularized face clustering problem), which is the best oracle gradient complexity, as pointed out in [29], where κ is the condition number of the objective function. For subspace clustering problems which are not strongly convex (non-SC) (e.g., the ℓ1-norm regularized problem), the RASVRG achieves an optimal convergence rate of $O(1/S^2)$, where S is the number of epochs; that is, the RASVRG is much faster than existing stochastic and deterministic algorithms, such as the Prox-SVRG [25].
  • Better accuracy: Both in theory and in practice, our algorithms can generally yield better performance than existing state-of-the-art methods for solving problems with the ℓ1-norm or elastic net regularizer, while most existing methods are greatly influenced by the choice of the regularizer.
  • More robust: Our RASVRG obtains much better performance than other algorithms on real-world datasets with different manually added levels of random pixel corruption or unrelated block occlusion for simulating potential real-world noise, while existing methods may deteriorate seriously under such strong noise.
  • Extension to more applications: Our RASVRG requires tracking only one variable vector in the inner loop, which means its computational cost and memory overhead are exactly the same as those of the SVRG [26] and Prox-SVRG algorithms. This feature allows the RASVRG to be extended to other real-world clustering applications and to more settings, such as sparse and asynchronous settings, which can significantly accelerate the RASVRG.
The rest of this paper is organized as follows. In Section 2, we discuss some related works concerning sparse subspace clustering and stochastic optimization methods. Section 3 proposes two efficient RASVRG algorithms for solving both strongly convex and non-strongly convex models and analyzes their convergence properties for ℓ1-norm and elastic net regularized SSC problems. In Section 4, we exhibit the practical performance of the RASVRG for subspace clustering tasks on synthetic and real-world face datasets. Section 5 concludes this paper and discusses future work.

2. Related Works

In this section, we briefly overview some related works concerning sparse subspace clustering and stochastic optimization.

2.1. Sparse Subspace Clustering

In this part, we briefly overview sparse subspace clustering (SSC) [22]. Let $X = [x_1, \ldots, x_N] \in \mathbb{R}^{D \times N}$ be a collection of N signals $\{x_i \in \mathbb{R}^D\}_{i=1}^N$ drawn from a union of K linear subspaces $S_1 \cup S_2 \cup \cdots \cup S_K$ of $\mathbb{R}^D$ with dimensions $\{d_k\}_{k=1}^K$. In addition, $X_k \in \mathbb{R}^{D \times N_k}$ is the submatrix of X containing the $N_k$ points in the subspace $S_k$, where $\sum_{k=1}^{K} N_k = N$.
Each point in a subspace $S_k$ can ideally be represented as a linear combination of a small number of other points from the same subspace, rather than the $(N - N_k)$ points from the other subspaces. Therefore, we can find such a sparse representation by solving the following optimization problem:
$\min_{c_j} \|c_j\|_0, \quad \text{s.t.} \quad x_j = X c_j, \ c_{jj} = 0 \qquad (4)$
where $c_j = [c_{j1}, c_{j2}, \ldots, c_{jN}]^T \in \mathbb{R}^N$ is the coefficient vector and $\|c_j\|_0$ counts the number of nonzero entries in the vector $c_j$. Since this is an NP-hard problem, the authors of [12] relaxed it and solved the following problem instead:
$\min_{c_j} \|c_j\|_1, \quad \text{s.t.} \quad x_j = X c_j, \ c_{jj} = 0 \qquad (5)$
where $\|c_j\|_1 = \sum_{i=1}^{N} |c_{ji}|$ is the ℓ1-norm of $c_j \in \mathbb{R}^N$. For all the data points $j = 1, \ldots, N$, the optimization problem in Equation (5) can be expressed in matrix form:
$\min_{C} \|C\|_1, \quad \text{s.t.} \quad X = XC, \ \operatorname{diag}(C) = 0 \qquad (6)$
where $C = [c_1, c_2, \ldots, c_N] \in \mathbb{R}^{N \times N}$ is the coefficient matrix and each column $c_i$ is the sparse representation vector of $x_i$.
Equations (4) and (5) have attracted much attention in the fields of compressed sensing [30,31], subspace clustering [5] and face recognition [32]. In fact, the two solutions are the same under certain conditions. However, the results of compressed sensing may not be suitable for the subspace clustering problem, since the solution for C is not necessarily unique as the columns of X lie in a union of subspaces. When the matrix C is obtained, spectral clustering [33] is applied to the affinity matrix $W = |C| + |C|^T$ for clustering.
Furthermore, we also consider the case where the data points from a union of linear subspaces contain a certain amount of noise. More specifically, the jth data point contaminated with noise $\zeta_j$ is represented by $\bar{x}_j = x_j + \zeta_j$, where $\zeta_j$ satisfies $\|\zeta_j\|_2 \leq \epsilon$ and $\|\cdot\|_2$ is the Euclidean norm. We can find the sparsest solution of the following problem to obtain the sparse representation of $\bar{x}_j$ with a given error tolerance ϵ:
$\min_{c_j} \|c_j\|_1 \quad \text{s.t.} \quad \|X c_j - \bar{x}_j\|_2 \leq \epsilon, \ c_{jj} = 0. \qquad (7)$
However, we cannot determine the scale of the noise $\zeta_j$ in most instances. In this circumstance, the Lasso formulation [34] can be applied to obtain the sparse representation in the following form:
$\min_{c_j} \|c_j\|_1 + \gamma \|X c_j - \bar{x}_j\|_2^2, \quad c_{jj} = 0 \qquad (8)$
where $\gamma > 0$ is a constant parameter.
In addition, potential connectivity issues in the representation graph may exist [35] (i.e., over-segmentation problems). This phenomenon is caused by the sparsity of the representation matrix C computed from Equation (8). In order to promote connections between data points, the authors of [17,21] used the elastic net regularizer to compute the sparse coefficient matrix C as follows:
$\min_{c_j} \ \lambda \|c_j\|_1 + \frac{1-\lambda}{2} \|c_j\|_2^2 + \frac{\gamma}{2} \|X c_j - \bar{x}_j\|_2^2 \qquad (9)$
where $\lambda \in [0, 1]$ determines the trade-off between sparseness (from the ℓ1-norm regularizer) and connectivity (from the ℓ2-norm regularizer). In particular, when λ is extremely close to one, the performance of the elastic net approaches that of the method based on the ℓ1-norm. The purpose of the ℓ2-norm regularizer is to enhance the connectivity between data points; that is, in the case of a relatively small λ, there exist more nonzero elements in the matrix C.
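For reference, the elastic net problem in Equation (9) can be solved by a simple deterministic proximal gradient (ISTA-style) iteration; the sketch below is an illustrative NumPy baseline written for this exposition (the step size 1/L and the iteration count are assumptions), not one of the solvers compared in Section 4. Each iteration touches the full matrix X, which motivates the stochastic methods discussed next.
```python
import numpy as np

def ista_elastic_net(b, X, lam=0.9, gamma=50.0, n_iter=500):
    """Proximal gradient for Eq. (9):
       min_c lam*||c||_1 + (1-lam)/2*||c||_2^2 + gamma/2*||X c - b||_2^2."""
    D, N = X.shape
    # Lipschitz constant of the smooth part (data term + l2 term)
    L = gamma * np.linalg.norm(X, 2) ** 2 + (1.0 - lam)
    c = np.zeros(N)
    for _ in range(n_iter):
        grad = gamma * X.T @ (X @ c - b) + (1.0 - lam) * c      # gradient of the smooth part
        w = c - grad / L                                        # gradient step
        c = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)   # soft-threshold: prox of lam*||.||_1
    return c
```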

2.2. Stochastic Methods

All the algorithms mentioned above for SSC are deterministic methods, and their per-iteration complexity is $O(ND)$, which is expensive for extremely large N. Recently, stochastic gradient descent (SGD) has been successfully applied to many large-scale machine learning problems due to its significantly lower per-iteration complexity of $O(D)$. SGD only requires one (or a small batch of) component function(s) per iteration to form an estimator of the full gradient. However, the variance of the stochastic gradient estimator may be large [26], which leads to slow convergence and poor performance.
More recently, many researchers have proposed accelerated stochastic variance reduced methods such as Acc-Prox-SVRG [36], APCG [37], Catalyst [38], SPDC [39], point-SAGA [40], Katyusha [27] and MiG [23]. For strongly convex problems, both Acc-Prox-SVRG [36] and Catalyst [38] make good use of Nesterov’s momentum in [41] and attain the corresponding oracle gradient complexities $O((D + b\sqrt{\kappa}) \log(1/\epsilon))$ with a sufficiently large mini-batch size b and $O((D + \sqrt{\kappa D}) \log(\kappa) \log(1/\epsilon))$, respectively, where $\kappa = L/\sigma$ denotes the condition number of an L-smooth and σ-strongly convex function. In particular, APCG, SPDC, point-SAGA and Katyusha essentially achieve the best-known oracle complexity $O((D + \sqrt{\kappa D}) \log(1/\epsilon))$. For non-strongly convex problems, Katyusha also attains the best-known convergence rate $O(1/S^2)$. However, existing accelerated stochastic variance reduction methods generally have more complex update rules and higher computational costs [42]. Therefore, this paper proposes a faster stochastic variance reduced gradient method for sparse subspace clustering.
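To make the variance reduction idea concrete, the snippet below (an illustrative sketch for a simple least-squares finite sum, not code from any cited package) forms the SVRG-style gradient estimator: the stochastic gradient at the current point is corrected by the same component's gradient at a stored snapshot plus the snapshot's full gradient, which keeps the estimator unbiased while its variance shrinks as both points approach the optimum.
```python
import numpy as np

def svrg_estimator(A, b, c, c_snap, full_grad_snap, i):
    """Variance-reduced gradient estimator for
       f(c) = (1/D) * sum_i f_i(c),  f_i(c) = (D/2) * (a_i^T c - b_i)^2."""
    D = A.shape[0]
    a = A[i]
    g_i = D * a * (a @ c - b[i])               # grad of f_i at the current point
    g_i_snap = D * a * (a @ c_snap - b[i])     # grad of f_i at the snapshot
    return g_i - g_i_snap + full_grad_snap     # unbiased estimate of grad f(c)

rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 20)), rng.standard_normal(100)
c = rng.standard_normal(20)
c_snap = c + 0.01 * rng.standard_normal(20)
full_grad_snap = A.T @ (A @ c_snap - b)        # grad f at the snapshot
v = svrg_estimator(A, b, c, c_snap, full_grad_snap, i=rng.integers(100))
```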

3. Accelerated Stochastic Variance Reduced Gradient Algorithms for Sparse Subspace Clustering

In this section, we propose a new robust accelerated stochastic variance reduced gradient (RASVRG) method for solving sparse representation problems such as SSC. For elastic net regularized problems, we present a strongly convex (SC) version of the RASVRG (RASVRG SC), which has the best-known linear convergence rate. Moreover, we also provide a non-strongly convex (NSC) version (RASVRG NSC), which attains the fastest convergence rate of $O(1/S^2)$.
Compared with stochastic variance reduced gradient methods (e.g., the SVRG [26] or SAGA [43]), most existing accelerated methods have improved convergence rates but complex coupling structures, which can lead to slow convergence in practice [24]. Thus, we propose a robust accelerated stochastic variance reduced gradient (RASVRG) method for both strongly convex and non-strongly convex problems. This means that the RASVRG can solve the SSC problem [14] based on both the ℓ1-norm regularizer and the elastic net regularizer [21]. We focus on the following convex optimization problem with a finite-sum structure, which is common in machine learning and statistics:
$\min_{\alpha \in \mathbb{R}^N} \{ F(\alpha) = f(\alpha) + g(\alpha) \}. \qquad (10)$
Here, the symbol α stands for the optimization variable, and N is the dimension of α. Meanwhile, $f(\alpha) = \frac{1}{D} \sum_{i=1}^{D} f_i(\alpha)$ is a finite average of D smooth convex functions $f_i(\alpha)$, and $g(\alpha)$ is a relatively simple convex function. A convex function $f: \mathbb{R}^N \rightarrow \mathbb{R}$ is L-smooth if for all $\alpha, \beta \in \mathbb{R}^N$, we have
$f(\alpha) \leq f(\beta) + \langle \nabla f(\beta), \alpha - \beta \rangle + \frac{L}{2} \|\alpha - \beta\|^2 \qquad (11)$
and it is σ-strongly convex if for all $\alpha, \beta \in \mathbb{R}^N$, we have
$f(\alpha) \geq f(\beta) + \langle G, \alpha - \beta \rangle + \frac{\sigma}{2} \|\alpha - \beta\|^2 \qquad (12)$
where $G \in \partial f(\beta)$ is a subgradient of f at β. We make the following two assumptions to categorize Equation (10):
Assumption 1
(Strongly convex). In Equation (10), each $f_i(\cdot)$ is L-smooth and convex, and $g(\cdot)$ is σ-strongly convex.
Assumption 2
(Non-strongly convex). In Equation (10), each $f_i(\cdot)$ is L-smooth and convex, and $g(\cdot)$ is convex.
Next, we propose an efficient RASVRG method for both strongly convex (e.g., elastic net regularized SSC) and non-strongly convex (e.g., ℓ1-norm regularized SSC) problems.

3.1. RASVRG   SC for Elastic Net Regularized SSC

In this subsection, we consider the elastic net regularized SSC problem, which is strongly convex. Inspired by our previous work [23], we present the RASVRG   SC , shown in Algorithm 1, as a solver together with an active set-based optimization framework [21] (i.e., Oracle Guided Elastic Net (ORGEN)):
$f(c; b, X) := \lambda \|c\|_1 + \frac{1-\lambda}{2} \|c\|_2^2 + \frac{\gamma}{2} \|b - Xc\|_2^2, \quad c_{jj} = 0 \qquad (13)$
where $b \in \mathbb{R}^D$, $X = [x_1, \ldots, x_N] \in \mathbb{R}^{D \times N}$, $\gamma > 0$ and $\lambda \in [0, 1)$. In addition, b and $\{x_j\}_{j=1}^N$ are normalized to unit ℓ2-norm. The goal of the elastic net model is to attain $c^*(b, X)$ as follows:
$c^*(b, X) := \arg\min_{c} f(c; b, X). \qquad (14)$
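To see how Equation (13) fits the finite-sum form of Equation (10), one possible splitting (an illustrative choice for this exposition; the paper's internal scaling may differ) with rows $a_i^T$ of X is the following:
```latex
f_i(c) = \frac{D\gamma}{2}\bigl(b_i - a_i^{\top} c\bigr)^{2}, \qquad
f(c) = \frac{1}{D}\sum_{i=1}^{D} f_i(c) = \frac{\gamma}{2}\lVert b - Xc\rVert_2^{2}, \qquad
g(c) = \lambda\lVert c\rVert_1 + \frac{1-\lambda}{2}\lVert c\rVert_2^{2}.
```
Under this splitting, each $f_i$ is convex and L-smooth with $L = D\gamma \|a_i\|_2^2$, and g is σ-strongly convex with σ = 1 − λ > 0 whenever λ < 1, so Assumption 1 applies; for λ = 1 (the ℓ1-norm case of Section 3.2), g is only convex and Assumption 2 applies.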
We apply our RASVRG SC to compute $c^*(b, X)$. As we can see in step 5 of Algorithm 1, y is a convex combination of c and $\tilde{c}$ with the momentum parameter θ. In other words, our algorithm uses the momentum acceleration technique proposed in our previous work [23]. We only need to keep track of y in a single inner loop and perform a weighted update for $\tilde{c}$ (see step 10), which is simple to implement yet efficient. Note that the temporary variables $w_1$ and $w_2$ are introduced into Algorithm 1 only to make the algorithm statement clearer; of course, we can rewrite the algorithm and keep track of only one vector per iteration during implementation. Below, we give the convergence analysis of the RASVRG SC.
Algorithm 1 RASVRG SC
Input: initial vector $c^0$, train matrix $A \in \mathbb{R}^{D \times (N-1)}$, $b \in \mathbb{R}^D$, epoch length m, learning rate η and parameters θ and γ.
Initialize: $\tilde{c}^0 = c_0^1 = c^0$, $p = 1 + \eta\sigma$.
1: for $s = 1, \ldots, S$ do
2:    $\mu^s = \frac{\gamma}{D} A^T (A \tilde{c}^{s-1} - b)$;
3:    for $j = 1, \ldots, m$ do
4:       Pick a row $i_j \in \{1, \ldots, D\}$ of A uniformly at random and assign it to $a^T$;
5:       $y_j = \theta c_{j-1}^s + (1-\theta) \tilde{c}^{s-1}$;
6:       $w_1 = a (a^T y_j - b_{i_j}) - a (a^T \tilde{c}^{s-1} - b_{i_j}) + \mu^s + \frac{1-\lambda}{\gamma} c_{j-1}^s$;
7:       $w_2 = c_{j-1}^s - \eta \, w_1$;      // temporary variables $w_1$, $w_2$
8:       $c_j^s = \operatorname{sign}(w_2) \odot \max\{ |w_2| - \frac{\lambda}{\gamma} \eta, 0 \}$;
9:    end for
10:   $\tilde{c}^s = \theta \big( \sum_{j=0}^{m-1} p^j \big)^{-1} \sum_{j=0}^{m-1} \omega_j c_{j+1}^s + (1-\theta) \tilde{c}^{s-1}$, where $\omega_j = p^j$;
11:   $c_0^{s+1} = c_m^s$;
12: end for
Output: $\tilde{c}^S$.
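To make the update rules concrete, the following NumPy sketch implements a simplified version of Algorithm 1 for the objective in Equation (13). It is an illustrative re-implementation under explicit assumptions, not the authors' reference code: it uses the splitting sketched above, a conservative step size, and a uniform snapshot average instead of the $p^j$-weighted average in step 10, and the constraint $c_{jj} = 0$ is handled implicitly because A excludes the jth column.
```python
import numpy as np

def prox_elastic_net(w, eta, lam):
    # prox of eta * ( lam*||.||_1 + (1-lam)/2*||.||_2^2 ): soft-threshold, then shrink
    return np.sign(w) * np.maximum(np.abs(w) - lam * eta, 0.0) / (1.0 + (1.0 - lam) * eta)

def rasvrg_sc_sketch(A, b, lam=0.9, gamma=50.0, theta=0.5, n_epochs=30, m=None, seed=0):
    """Simplified momentum-coupled Prox-SVRG for
       min_c lam*||c||_1 + (1-lam)/2*||c||_2^2 + gamma/2*||b - A c||_2^2."""
    rng = np.random.default_rng(seed)
    D, N = A.shape
    m = m or D                                         # epoch length Theta(D), as in Theorem 1
    L = gamma * D * np.max(np.sum(A ** 2, axis=1))     # smoothness bound for the sampled f_i
    eta = 1.0 / (4.0 * L)                              # conservative step size (assumption)
    c_tilde = np.zeros(N)                              # snapshot vector
    c = np.zeros(N)                                    # inner-loop iterate
    for _ in range(n_epochs):
        mu = gamma * A.T @ (A @ c_tilde - b)           # full gradient of the smooth part (step 2)
        c_sum = np.zeros(N)
        for _ in range(m):
            i = rng.integers(D)                        # sample one row of A (step 4)
            a = A[i]
            y = theta * c + (1.0 - theta) * c_tilde    # momentum coupling (step 5)
            # variance-reduced stochastic gradient of gamma/2*||b - Ac||^2 at y (cf. steps 6-7)
            v = gamma * D * a * ((a @ y) - (a @ c_tilde)) + mu
            c = prox_elastic_net(c - eta * v, eta, lam)  # proximal step (step 8)
            c_sum += c
        c_tilde = theta * (c_sum / m) + (1.0 - theta) * c_tilde  # simplified snapshot update
    return c_tilde
```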
Theorem 1
(Strongly convex). Let $c^*$ be the optimal solution of Equation (13). Suppose that Assumption 1 holds. Then, by choosing $m = \Theta(D)$, the RASVRG SC achieves an ϵ-additive error with the following oracle complexity in expectation:
$O\Big( \sqrt{\kappa D} \, \log \frac{F(c^0) - F(c^*)}{\epsilon} \Big)$ if $\frac{m}{\kappa} \leq \frac{3}{4}$, and $O\Big( D \log \frac{F(c^0) - F(c^*)}{\epsilon} \Big)$ if $\frac{m}{\kappa} > \frac{3}{4}$.
The proof of this theorem is similar to that in our previous work [23], and thus it is omitted here. Similar to the analysis in our previous work [23], the overall oracle complexity of the RASVRG SC is $O\big( (D + \sqrt{\kappa D}) \log \frac{1}{\epsilon} \big)$. This result indicates that under strongly convex conditions, the RASVRG SC has the best-known oracle complexity among stochastic accelerated algorithms (e.g., APCG [37], SPDC [39] and Katyusha [27]). As analyzed in [23], the RASVRG SC tracks only one variable y, while most existing accelerated methods, including Katyusha, require two additional variables. Therefore, the RASVRG SC has a faster convergence speed than them in practice.
We applied the RASVRG SC within the ORGEN framework, an efficient method that can handle large-scale datasets, to find the optimal $c^*$ for each column of X. The basic idea of ORGEN is to solve a sequence of reduced-scale subproblems defined by an active set. Here, we introduce the ORGEN-RASVRG SC algorithm, as shown in Algorithm 2.
Algorithm 2 ORGEN-RASVRG SC
Input: $A \in \mathbb{R}^{D \times (N-1)}$, $b \in \mathbb{R}^D$, λ and γ.
Initialize: the support set $T_0$ and set $k \leftarrow 0$.
1: while $T_{k+1} \neq T_k$ do
2:    Compute $c^*(b, A_{T_k})$ by using Algorithm 1;
3:    Compute $\delta(b, A) := \gamma \cdot (b - A c^*(b, A))$;
4:    Update the active set: $T_{k+1} \leftarrow \{ j : |a_j^T \, \delta(b, A_{T_k})| > \lambda \}$;
5:    Set $k \leftarrow k + 1$;
6: end while
Output: c with $c_{T_k} = c^*(b, A_{T_k})$ and zeros otherwise.
By solving a series of reduced-scale problems in step 2 of Algorithm 2, we can address large-scale data efficiently. Next, we define the subspace clustering problem with the ORGEN-RASVRG SC. Let $X \in \mathbb{R}^{D \times N}$ be a real-valued matrix whose columns are drawn from a union of n subspaces of $\mathbb{R}^D$, where each $x_i$ is normalized to unit ℓ2-norm. Here, we use $A = X_{-j}$ to denote the matrix X without the jth column. The goal of subspace clustering is to segment the columns of X into their corresponding subspaces by finding a sparse representation of each point in terms of the other points. The sparse subspace clustering procedure of the ORGEN-RASVRG SC is shown in Algorithm 3.
Algorithm 3 SSC by ORGEN-RASVRG SC
Input: data $X = [x_1, \ldots, x_N]$ and parameters $k_{\max}$ and ϵ.
1: For each j, set $b \leftarrow x_j$ and $A \leftarrow X_{-j}$, and compute $c^*(b, A)$ by the ORGEN-RASVRG SC;
2: Set $C^* = [c_1^*, \ldots, c_N^*]$ and $W = |C^*| + |C^*|^T$;
3: Compute the segmentation from W by spectral clustering;
Output: segmentation of the data X.
The vector $c_j^* \in \mathbb{R}^N$ (i.e., the jth column of $C^* \in \mathbb{R}^{N \times N}$) is computed by the ORGEN-RASVRG SC. After $C^*$ is computed, spectral clustering is applied to the matrix $W = |C^*| + |C^*|^T$ to obtain the segmentation result of X.
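A compact sketch of the whole Algorithm 3 pipeline is given below. It is an illustrative wrapper rather than the authors' implementation: solve_column stands for any per-column elastic net solver (for example, the ORGEN-RASVRG SC of Algorithm 2), and spectral clustering is delegated to scikit-learn.
```python
import numpy as np
from sklearn.cluster import SpectralClustering

def ssc_segmentation(X, n_clusters, solve_column):
    """X: D x N data matrix with unit-norm columns.
       solve_column(b, A) -> coefficient vector of length N-1 for b over the columns of A."""
    D, N = X.shape
    C = np.zeros((N, N))
    for j in range(N):
        A = np.delete(X, j, axis=1)                  # X with the j-th column removed
        c = solve_column(X[:, j], A)                 # sparse self-expression of x_j
        C[np.arange(N) != j, j] = c                  # c_jj = 0 by construction
    W = np.abs(C) + np.abs(C).T                      # symmetric affinity matrix
    labels = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                                random_state=0).fit_predict(W)
    return labels
```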

3.2. RASVRG NSC for ℓ1-Norm Regularized SSC

In this subsection, we consider Equation (13) with λ = 1 , also known as the Lasso problem, which is a non-strongly convex problem:
$f(c; b, X) := \|c\|_1 + \frac{\gamma}{2} \|b - Xc\|_2^2. \qquad (15)$
The RASVRG NSC shown in Algorithm 4 can achieve a convergence rate of $O(1/S^2)$.
Algorithm 4 RASVRG NSC
Input: $A \in \mathbb{R}^{D \times (N-1)}$, $b \in \mathbb{R}^D$, γ and epoch length m.
Initialize: $\tilde{c}^0 = c_0^1 = c^0$.
1: for $s = 1, \ldots, S$ do
2:    $\theta = \frac{2}{s+4}$, $\eta = \frac{1}{4 L \theta}$, $\tilde{c} = \tilde{c}^{s-1}$, $\mu^s = \frac{\gamma}{D} A^T (A \tilde{c}^{s-1} - b)$;
3:    for $j = 1, \ldots, m$ do
4:       Pick a row $i_j \in \{1, \ldots, D\}$ of A uniformly at random and assign it to $a^T$;
5:       $y_j = \theta c_{j-1}^s + (1-\theta) \tilde{c}^{s-1}$;
6:       $w_1 = a (a^T y_j - b_{i_j}) - a (a^T \tilde{c}^{s-1} - b_{i_j}) + \mu^s$;
7:       $w_2 = c_{j-1}^s - \eta \, w_1$;      // temporary variables $w_1$, $w_2$
8:       $c_j^s = \operatorname{sign}(w_2) \odot \max\{ |w_2| - \frac{\eta}{\gamma}, 0 \}$;
9:    end for
10:   $\tilde{c}^s = \frac{\theta}{m} \sum_{j=1}^{m} c_j^s + (1-\theta) \tilde{c}^{s-1}$;
11:   $c_0^{s+1} = c_m^s$;
12: end for
Output: $\tilde{c}^S$.
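Analogously to the sketch given for Algorithm 1, the non-strongly convex variant can be outlined as follows. This is an illustrative NumPy sketch, not the authors' code: the decaying momentum θ = 2/(s + 4) and the step size η = 1/(4Lθ) follow step 2 of Algorithm 4, while the smoothness bound L and the ℓ1 proximal step with threshold η are assumptions tied to the objective exactly as written in Equation (15) (the η/γ threshold in step 8 may correspond to a rescaled objective).
```python
import numpy as np

def soft_threshold(w, t):
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def rasvrg_nsc_sketch(A, b, gamma=50.0, n_epochs=50, m=None, seed=0):
    """Simplified accelerated Prox-SVRG for the Lasso-type problem of Eq. (15):
       min_c ||c||_1 + gamma/2 * ||b - A c||_2^2."""
    rng = np.random.default_rng(seed)
    D, N = A.shape
    m = m or D
    L = gamma * D * np.max(np.sum(A ** 2, axis=1))   # smoothness bound for the sampled terms
    c_tilde = np.zeros(N)
    c = np.zeros(N)
    for s in range(1, n_epochs + 1):
        theta = 2.0 / (s + 4)                        # decaying momentum weight (step 2)
        eta = 1.0 / (4.0 * L * theta)                # matching step size (step 2)
        mu = gamma * A.T @ (A @ c_tilde - b)         # full gradient of the smooth part
        c_sum = np.zeros(N)
        for _ in range(m):
            i = rng.integers(D)
            a = A[i]
            y = theta * c + (1.0 - theta) * c_tilde  # momentum coupling (step 5)
            v = gamma * D * a * ((a @ y) - (a @ c_tilde)) + mu   # variance-reduced gradient
            c = soft_threshold(c - eta * v, eta)     # prox of ||.||_1 with step eta
            c_sum += c
        c_tilde = theta * (c_sum / m) + (1.0 - theta) * c_tilde  # snapshot update (step 10)
    return c_tilde
```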
Theorem 2
(Non-strongly convex). If Assumption 2 holds, then by choosing $m = \Theta(D)$, the RASVRG NSC achieves the following oracle complexity in expectation:
$O\Big( D \sqrt{[F(c^0) - F(c^*)] / \epsilon} + \sqrt{D L \|c^0 - c^*\|^2 / \epsilon} \Big).$
The proof of Theorem 2 is similar to that in our previous work [23], and thus it is omitted here. This result indicates that the RASVRG NSC attains the optimal convergence rate of $O(1/S^2)$, where each epoch includes m stochastic iterations plus the cost of one full gradient over the D component functions. We also applied Algorithm 4 to solve the SSC problem. The sparse subspace clustering procedure of the RASVRG NSC is shown in Algorithm 5.
We tested whether the RASVRG NSC works well under the same general experimental framework as the algorithm in [22]. Moreover, we give an intuitive demonstration of the optimization ability of the RASVRG NSC for computing $c_j^*$ by recovering a corrupted image. We simply computed the matrix product of $X_{-j}$ and $c_j^*$ to obtain the restoration of the corrupted image, as shown in Figure 1. All the results show that our RASVRG NSC performed well in restoring the corrupted image.
Algorithm 5 Sparse subspace clustering by RASVRG NSC
Input: data $X = [x_1, \ldots, x_N]$ and parameters $k_{\max}$ and ϵ.
1: For each j, set $b \leftarrow x_j$ and $A \leftarrow X_{-j}$, and compute $c^*(b, A)$ with the RASVRG NSC;
2: Set $C^* = [c_1^*, \ldots, c_N^*]$ and $W = |C^*| + |C^*|^T$;
3: Compute the segmentation from W by spectral clustering;
Output: segmentation of the data X.
For example, we chose an image from the dataset and manually added random pixel corruption with corruption rates ρ = 0.3 and 0.6, as shown in Figure 1b,c. Then, the image was reshaped into the vector b, and the whole dataset without the chosen image formed A. Using the RASVRG NSC, the recovery vector was computed as the matrix product of A and $c^*$. Finally, the recovered vector was reshaped back into an image. As we can see in Figure 1d,e, our algorithm recovered the damaged image well, which intuitively shows the good robustness of the RASVRG NSC.
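A minimal sketch of this recovery experiment is shown below. The function and variable names are hypothetical, and scikit-learn's generic Lasso solver is used here purely as a stand-in for the RASVRG NSC.
```python
import numpy as np
from sklearn.linear_model import Lasso

def recover_image(b_corrupt, A, img_shape, alpha=1e-3):
    """b_corrupt: flattened corrupted image; A: D x (N-1) matrix whose columns are
       the other (flattened) images; img_shape: original (height, width)."""
    solver = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
    solver.fit(A, b_corrupt)                 # sparse code c* over the other images
    recovered = A @ solver.coef_             # reconstruction A c*, as described above
    return recovered.reshape(img_shape)
```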

4. Experimental Results

In this section, we evaluate the efficiency, clustering accuracy and robustness of the RASVRG on many synthetic and real-world face datasets.

4.1. Experimental Set-Up

Datasets. Firstly, the synthetic datasets were generated as follows. Each element of the matrix $X \in \mathbb{R}^{D \times N}$ was drawn independently from an identical Gaussian distribution. When solving highly underdetermined problems, we found that none of the algorithms performed well or converged, so we generated highly overdetermined data X for sparse subspace clustering; for more details, refer to [14]. The test sample b was obtained as $b = X c_0$, where the sparsity of $c_0$ was set to p, meaning that the number of nonzero entries satisfied $\|c_0\|_0 = p \times N$. Each nonzero entry of $c_0$ was generated in the interval [−10, 10] according to a uniform distribution. We set p = 0.1 and λ = 1e−6 for consistency with the literature [14] so that the results can be compared. We computed the relative error with respect to $c_0$ as a function of time and of the number of passes.
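A sketch of this synthetic data generation is given below (an illustrative implementation; the interpretation of the sparsity level as p·N nonzero entries and the smaller default sizes, compared with the 50,000 × 1000 matrix used for Figure 3, are assumptions made here for readability).
```python
import numpy as np

def make_synthetic(D=5000, N=500, p=0.1, seed=0):
    """Highly overdetermined synthetic problem: i.i.d. Gaussian X, a p-sparse code c0
       with nonzero entries uniform in [-10, 10], and the test sample b = X c0."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((D, N))
    c0 = np.zeros(N)
    support = rng.choice(N, size=int(p * N), replace=False)
    c0[support] = rng.uniform(-10.0, 10.0, size=support.size)
    b = X @ c0
    return X, b, c0

X, b, c0 = make_synthetic()
# relative error of a candidate solution c with respect to the ground truth:
# np.linalg.norm(c - c0) / np.linalg.norm(c0)
```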
Secondly, the AR face database includes more than 4000 frontal images of size 165 × 120 under different illumination changes, expressions and facial disguises. In order to save computational time, we downsampled the face images and reduced the number of individuals in the experiments. We randomly chose a subset with no more than 15 individuals, and each individual had 26 face images, which were downsampled to 32 × 32 pixels.
Thirdly, the extended Yale B database contains 2414 images [44]. The face images of each individual were taken under different illumination conditions and cropped to a size of 192 × 168 [45]. We randomly chose 10 persons, and each individual had 64 images. Each image was manually downsampled to a size of 42 × 48.
On these three datasets, we compared the clustering accuracy and running time of our algorithms with those of state-of-the-art algorithms under different conditions. For the AR face database, we manually added different levels of random, unrelated block occlusion to measure the performance in the framework of the elastic net. For the extended Yale B database, different levels of random pixel corruption were added to the images, and the performance was measured in the framework of the ℓ0/ℓ1-norm regularizers.
All of the experiments were performed on a computer with an Intel i7-7700K CPU, the Windows 7 operating system and 40 GB of memory. We used Matlab and C++ to implement each of our clustering tasks.
Parameter Settings. In the face clustering tasks, the regularization parameter γ of each algorithm was selected in the range of $10^{\{1, 2, \ldots, 9\}}$. Through tuning γ, all of the algorithms achieved their best performance. The addition of different levels of random pixel corruption and random square blocks is explained here in detail. For random pixel corruption, a variable $\rho \in [0, 1)$ is introduced as the random corruption index: a fraction ρ of randomly chosen pixels in each image is replaced with values following a uniform distribution on [0, 1]. Moreover, an occlusion index $\phi \in [0, 1]$ is set, and we replaced random square blocks covering a fraction ϕ of each image with an unrelated image. In the two real-world face data experiments, we set each of the parameters ρ and ϕ to 0.3 and 0.6, respectively, to simulate two levels of possible corruption of the clean original images. The occlusion index was set to 0.3 and 0.6 for analysis, and it could also be set to other values, such as 0.1 and 0.8; the smaller the value, the better the recovery result.
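The two corruption models can be simulated as in the following sketch (an illustrative implementation with assumptions: images are normalized to [0, 1], the occluding block is roughly square with area fraction ϕ, and the unrelated patch is at least as large as the block).
```python
import numpy as np

def add_pixel_corruption(img, rho, rng):
    """Replace a fraction rho of randomly chosen pixels with uniform noise in [0, 1]."""
    out = img.copy()
    mask = rng.random(img.shape) < rho
    out[mask] = rng.random(np.count_nonzero(mask))
    return out

def add_block_occlusion(img, phi, patch, rng):
    """Overlay an unrelated patch on a random block covering a fraction phi of the image."""
    out = img.copy()
    h, w = img.shape
    bh, bw = int(h * np.sqrt(phi)), int(w * np.sqrt(phi))   # block area ~ phi * h * w
    r = rng.integers(h - bh + 1)
    c = rng.integers(w - bw + 1)
    out[r:r + bh, c:c + bw] = patch[:bh, :bw]
    return out
```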

4.2. Clustering on Synthetic Data

We tested the performance of the RASVRG in the frameworks of the ℓ1/ℓ0-norm regularizers and the elastic net regularizer in terms of clustering accuracy and running time on synthetic data. All the results reported are the averages of 10 independent trials.
We compared the proposed RASVRG NSC with three state-of-the-art algorithms, namely OMP [22], the Prox-SVRG [25] and DALM [46], with ℓ1/ℓ0-norm regularizers. The experimental results for the clustering accuracy are shown in Figure 2. Here, K denotes the number of subspaces, D is the ambient dimension, $N_i$ is the number of points per subspace, and d denotes the dimension of the subspaces. The clustering accuracies are reported for different D, K and γ values when setting d = 10 and $N_i = 60$.
From Figure 2, we can see that the RASVRG NSC consistently achieved accuracy rates over 95% and outperformed the other methods, including the Prox-SVRG algorithm. Compared with the Prox-SVRG algorithm, the RASVRG NSC achieved higher accuracy in a shorter time, which also shows the obvious acceleration effect of our RASVRG method. OMP is a fast algorithm, but its accuracy is usually the lowest. Other algorithms such as the Prox-SVRG run quickly but ultimately do not reach the highest accuracy. It can be seen that the clustering accuracies of all the algorithms except OMP fluctuated over time. In addition, the accuracy of the Prox-SVRG decreased even after 4 s in Figure 2e, while the RASVRG NSC had the most stable performance and always achieved superior accuracy over the other algorithms. Moreover, it was found that the performance of the DALM and Prox-SVRG algorithms was affected by the change in γ, while the RASVRG NSC had stable performance. We also compared the convergence performance (including the objective value minus the minimum value (i.e., the objective gap) versus the number of effective passes or the running time) of all the methods, as shown in Figure 3. Note that evaluating N component function gradients or computing a single full gradient was considered one effective pass. All the experimental results show that the proposed RASVRG NSC algorithm converged significantly faster than the other methods.
We compared the proposed RASVRG SC with four algorithms, namely RFSS [47], FISTA [48], Homotopy [49] and the Prox-SVRG [25], for the elastic net model. In order to evaluate the robustness of the different methods with respect to the parameter γ, we varied γ in the range of $10^{\{5, 6, 7, 8, 9\}}$ when setting d = 40 and $N_i = 60$ to generate two types of random data.
The clustering accuracies of these methods are listed in Table 1. It is clear that the RASVRG   SC achieved the highest accuracy in most cases. The stochastic method (Prox-SVRG) performed relatively well, but the running time was longer than that of the RASVRG   SC . The accuracies of RFSS and Homotopy were similar and much lower than those of the RASVRG   SC and Prox-SVRG, and thus we did not report their running time results. FISTA performed much better than Homotopy and RFSS, and it sometimes achieved an accuracy of over 90%, but it ran more than five times slower than the RASVRG   SC . All the results indicate that the RASVRG   SC was superior to the other methods in terms of both robustness and efficiency.

4.3. Face Clustering Based on Elastic-Net Regularizer

In this part, we compare the proposed RASVRG SC with popular elastic net solvers on the AR face database, including regularized feature sign search (RFSS) [47], Homotopy [49], the proximal stochastic variance reduction gradient (Prox-SVRG) [25] and the fast iterative shrinkage thresholding algorithm (FISTA) [48]. In each trial, we randomly picked $k \in \{2, 5, 8, 11, 14\}$ individuals and took all their images (under different illuminations) as the data to be clustered.
Table 2 reports the clustering accuracies of the different methods on the face datasets with manually added random, unrelated block occlusion. All the results indicate that in most cases, the RASVRG SC outperformed the other algorithms in terms of clustering accuracy. When the number of clusters was k = 2, all the algorithms except for Homotopy and RFSS achieved relatively high accuracies. With the number of clusters and the occlusion rate ϕ increasing, the performance of Homotopy had certain advantages over the other methods, except for the RASVRG SC. In addition, the accuracies of Homotopy were close to those of RFSS. Due to the existence of random block occlusion, the clustering accuracies of all of the algorithms decreased rapidly as the number of clusters k increased. When $k \leq 10$, the RASVRG SC obtained the best clustering accuracy. In the case of slight corruption (e.g., ϕ = 0.3), the Prox-SVRG algorithm achieved the best accuracy when k = 14, but in the other cases, the RASVRG SC significantly outperformed the other algorithms. When the occlusion rate was high (e.g., ϕ = 0.6), the RASVRG SC performed much better than the other algorithms, except for the case of k = 14.

4.4. Face Clustering Based on ℓ0- and ℓ1-Norm Regularizers

In this part, we evaluate the clustering accuracy and robustness of the RASVRG NSC on the extended Yale Face B database compared with five popular ℓ0/ℓ1-based solvers, namely the Prox-SVRG [25], Homotopy [50], OMP [22], DALM [46] and PALM [46]. We randomly picked $k \in \{2, 3, 5, 8, 10\}$ individuals and took all their images under different illuminations as the data to be clustered.
The clustering performance of all the methods is shown in Figure 4. The RASVRG   NSC attained the highest clustering accuracy in almost all cases. It is quite clear that the performance of OMP was the worst in all cases when the random pixel corruption value ρ varied from 0.3 to 0.6. In the case of slight pixel corruption (e.g., ρ = 0.3 ), all the algorithms reached more than 90% accuracy when k = 2 , except for OMP. The accuracy of Homotopy decreased rapidly when k > 2 . The Prox-SVRG and PALM had rather close accuracies and maintained high clustering accuracies as k increased. When the random pixel corruption (e.g., ρ = 0.6 ) was high, only the RASVRG   NSC and Prox-SVRG algorithms achieved more than 70% accuracy when k = 2 . As the number of clusters increased, the clustering accuracies of all of the algorithms decreased. DALM performed the best when k = 10 . Similarly, the RASVRG   NSC performed well under various numbers of clusters, and the clustering accuracy was always about 10% higher than that of Homotopy.
The RASVRG algorithm outperformed the other algorithms in terms of clustering accuracy under the frameworks of both the elastic net and ℓ1/ℓ0-norm regularizers on the real-world face datasets with manually added random pixel corruption and unrelated block occlusion in most cases. This illustrates the strong robustness and wide applicability of our algorithms for various SSC problems, especially robust face clustering.

5. Conclusions and Future Work

In this paper, we proposed two efficient algorithms, the RASVRG SC and RASVRG NSC, to solve the elastic net regularized and ℓ1-norm regularized sparse subspace clustering problems, respectively. To the best of our knowledge, this work is the first to propose faster stochastic optimization algorithms instead of deterministic methods to solve various large-scale SSC problems, especially large-scale robust face clustering. The experimental results on both synthetic and real-world face datasets demonstrated the effectiveness of our algorithms. Our algorithms performed much better than the state-of-the-art methods in terms of both clustering accuracy and running time. On the synthetic datasets, both the RASVRG NSC and RASVRG SC achieved more stable and higher clustering accuracies in most cases compared with other elastic net and ℓ1-norm solvers. On the real-world face datasets with different levels of random pixel corruption and random block occlusion, our algorithms also achieved much higher clustering accuracies, which indicates their robustness to corrupted or damaged data. In other words, aside from enjoying a higher speed, the RASVRG algorithm performed much better than the state-of-the-art methods in terms of both accuracy and robustness.
It is worth noting that the RASVRG tracks only one variable in the inner loop, which makes it quite friendly to asynchronous parallel and distributed implementations, including privacy-preserving federated learning [51]. Applying parallel acceleration to our algorithms and to other subspace clustering problems, such as those in [52,53], can make the RASVRG excellent in terms of both clustering accuracy and speed in large-scale privacy-preserving clustering. This is an exciting direction for our future work.

Author Contributions

Methodology, H.L.; Validation, L.Y.; Formal analysis, F.S.; Investigation, L.Z. and L.W.; Resources, L.W.; Data curation, H.L., L.Z. and Y.L.; Writing—original draft, L.Y.; Writing—review & editing, F.S.; Supervision, Y.L.; Funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 62276182), and Peng Cheng Lab Program (No. PCL2023A08).

Data Availability Statement

Data are contained within the article.

Acknowledgments

We thank all of the reviewers for their valuable comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823. [Google Scholar]
  2. Shi, Y.; Otto, C.; Jain, A.K. Face Clustering: Representation and Pairwise Constraints. IEEE Trans. Inf. Forensics Secur. 2018, 13, 1626–1640. [Google Scholar] [CrossRef]
  3. Keuper, M.; Andres, B.; Brox, T. Motion trajectory segmentation via minimum cost multicuts. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3271–3279. [Google Scholar]
  4. da Cruz Nassif, L.F.; Hruschka, E.R. Document Clustering for Forensic Analysis: An Approach for Improving Computer Inspection. IEEE Trans. Inf. Forensics Secur. 2013, 8, 46–54. [Google Scholar] [CrossRef]
  5. Vidal, R. Subspace clustering. IEEE Signal Process. Mag. 2011, 28, 52–68. [Google Scholar] [CrossRef]
  6. Wang, M.; Deng, W. Deep Face Recognition: A Survey. Neurocomputing 2021, 429, 215–244. [Google Scholar] [CrossRef]
  7. Yang, Y.; Li, P. Noisy L0-sparse subspace clustering on dimensionality reduced data. In Proceedings of the Uncertainty in Artificial Intelligence, Eindhoven, The Netherlands, 1–5 August 2022; pp. 2235–2245. [Google Scholar]
  8. Zhu, W.; Peng, B. Sparse and low-rank regularized deep subspace clustering. Knowl. Based Syst. 2020, 204, 1–8. [Google Scholar] [CrossRef]
  9. Wang, L.; Wang, Y.; Deng, H.; Chen, H. Attention reweighted sparse subspace clustering. Pattern Recognit. 2023, 139, 109438. [Google Scholar] [CrossRef]
  10. Zhao, J.; Li, Y. Binary multi-view sparse subspace clustering. Neural Comput. Appl. 2023, 35, 21751–21770. [Google Scholar] [CrossRef]
  11. Zhong, G.; Pun, C.M. Subspace clustering by simultaneously feature selection and similarity learning. Knowl. Based Syst. 2020, 193, 1–10. [Google Scholar] [CrossRef]
  12. Elhamifar, E.; Vidal, R. Sparse subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 2790–2797. [Google Scholar]
  13. Von Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 2007, 17, 395–416. [Google Scholar] [CrossRef]
  14. Elhamifar, E.; Vidal, R. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2765–2781. [Google Scholar] [CrossRef]
  15. Soltanolkotabi, M.; Candes, E.J. A geometric analysis of subspace clustering with outliers. Ann. Stat. 2012, 40, 2195–2238. [Google Scholar] [CrossRef]
  16. Wang, Y.X.; Xu, H. Noisy sparse subspace clustering. J. Mach. Learn. Res. 2016, 17, 320–360. [Google Scholar]
  17. You, C.; Robinson, D.P.; Vidal, R. Provable self-representation based outlier detection in a union of subspaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3395–3404. [Google Scholar]
  18. Lu, C.Y.; Min, H.; Zhao, Z.Q.; Zhu, L.; Huang, D.S.; Yan, S. Robust and efficient subspace segmentation via least squares regression. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; pp. 347–360. [Google Scholar]
  19. Liu, G.; Lin, Z.; Yan, S.; Sun, J.; Yu, Y.; Ma, Y. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 171–184. [Google Scholar] [CrossRef] [PubMed]
  20. Panagakis, Y.; Kotropoulos, C. Elastic net subspace clustering applied to pop/rock music structure analysis. Pattern Recognit. Lett. 2014, 38, 46–53. [Google Scholar] [CrossRef]
  21. You, C.; Li, C.G.; Robinson, D.P.; Vidal, R. Oracle based active set algorithm for scalable elastic net subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3928–3937. [Google Scholar]
  22. You, C.; Robinson, D.; Vidal, R. Scalable sparse subspace clustering by orthogonal matching pursuit. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3918–3927. [Google Scholar]
  23. Zhou, K.; Shang, F.; Cheng, J. A Simple Stochastic Variance Reduced Algorithm with Fast Convergence Rates. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 5975–5984. [Google Scholar]
  24. Shang, F.; Jiao, L.; Zhou, K.; Cheng, J.; Ren, Y.; Jin, Y. ASVRG: Accelerated Proximal SVRG. In Proceedings of the Asian Conference on Machine Learning, Beijing, China, 14–16 November 2018; pp. 815–830. [Google Scholar]
  25. Xiao, L.; Zhang, T. A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 2014, 24, 2057–2075. [Google Scholar] [CrossRef]
  26. Johnson, R.; Zhang, T. Accelerating stochastic gradient descent using predictive variance reduction. Adv. Neural Inf. Process. Syst. 2013, 26. [Google Scholar]
  27. Allen-Zhu, Z. Katyusha: The first direct acceleration of stochastic gradient methods. J. Mach. Learn. Res. 2017, 18, 8194–8244. [Google Scholar]
  28. Zhang, T. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004; p. 116. [Google Scholar]
  29. Lan, G.; Zhou, Y. An optimal randomized incremental gradient method. Math. Program. 2018, 171, 167–215. [Google Scholar] [CrossRef]
  30. Bruckstein, A.M.; Donoho, D.L.; Elad, M. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev. 2009, 51, 34–81. [Google Scholar] [CrossRef]
  31. Candès, E.J.; Wakin, M.B. An introduction to compressive sampling. IEEE Signal Process. Mag. 2008, 25, 21–30. [Google Scholar] [CrossRef]
  32. Wright, J.; Yang, A.Y.; Ganesh, A.; Sastry, S.S.; Ma, Y. Robust Face Recognition via Sparse Representation. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 210–227. [Google Scholar] [CrossRef] [PubMed]
  33. Ng, A.Y.; Jordan, M.I.; Weiss, Y. On spectral clustering: Analysis and an algorithm. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 9–14 December 2002; pp. 849–856. [Google Scholar]
  34. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288. [Google Scholar] [CrossRef]
  35. Nasihatkon, B.; Hartley, R. Graph connectivity in sparse subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, 20–25 June 2011; pp. 2137–2144. [Google Scholar]
  36. Nitanda, A. Stochastic proximal gradient descent with acceleration techniques. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 1574–1582. [Google Scholar]
  37. Lin, Q.; Lu, Z.; Xiao, L. An accelerated proximal coordinate gradient method. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 3059–3067. [Google Scholar]
  38. Lin, H.; Mairal, J.; Harchaoui, Z. A universal catalyst for first-order optimization. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 3384–3392. [Google Scholar]
  39. Zhang, Y.; Xiao, L. Stochastic primal-dual coordinate method for regularized empirical risk minimization. J. Mach. Learn. Res. 2017, 18, 2939–2980. [Google Scholar]
  40. Defazio, A. A simple practical accelerated method for finite sums. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 676–684. [Google Scholar]
  41. Nesterov, Y. Introductory Lectures on Convex Optimization: A Basic Course; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013; Volume 87. [Google Scholar]
  42. Shang, F.; Zhou, K.; Liu, H.; Cheng, J.; Tsang, I.; Zhang, L.; Tao, D.; Jiao, L. VR-SGD: A Simple Stochastic Variance Reduction Method for Machine Learning. IEEE Trans. Knowl. Data Eng. 2020, 32, 188–202. [Google Scholar] [CrossRef]
  43. Defazio, A.; Bach, F.; Lacoste-Julien, S. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 1646–1654. [Google Scholar]
  44. Georghiades, A.S.; Belhumeur, P.N.; Kriegman, D.J. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 643–660. [Google Scholar] [CrossRef]
  45. Lee, K.C.; Ho, J.; Kriegman, D.J. Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 684–698. [Google Scholar] [PubMed]
  46. Yang, A.Y.; Zhou, Z.; Balasubramanian, A.G.; Sastry, S.S.; Ma, Y. Fast ℓ1-Minimization Algorithms for Robust Face Recognition. IEEE Trans. Image Process. 2013, 22, 3234–3246. [Google Scholar] [CrossRef] [PubMed]
  47. Jin, B.; Lorenz, D.A.; Schiffler, S. Elastic-net regularization: Error estimates and active set methods. Inverse Probl. 2009, 25, 115022. [Google Scholar] [CrossRef]
  48. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef]
  49. Carreira-Perpinán, M.A. The Elastic Embedding Algorithm for Dimensionality Reduction. In Proceedings of the International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010; Volume 10, pp. 167–174. [Google Scholar]
  50. Osborne, M.R.; Presnell, B.; Turlach, B.A. A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 2000, 20, 389–403. [Google Scholar] [CrossRef]
  51. Truex, S.; Baracaldo, N.; Anwar, A.; Steinke, T.; Ludwig, H.; Zhang, R.; Zhou, Y. A Hybrid Approach to Privacy-Preserving Federated Learning. In Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, London, UK, 15 November 2019; pp. 1–12. [Google Scholar]
  52. Zheng, Q.; Zhu, J.; Tian, Z.; Li, Z.; Pang, S.; Jia, X. Constrained bilinear factorization multi-view subspace clustering. Knowl. Based Syst. 2020, 194, 1–10. [Google Scholar] [CrossRef]
  53. Chen, J.; Mao, H.; Sang, Y.; Yi, Z. Subspace clustering using a symmetric low-rank representation. Knowl. Based Syst. 2017, 127, 1–12. [Google Scholar] [CrossRef]
Figure 1. Examples of recovered results of our RASVRG method for a face image with random pixel corruption, where the face image was chosen from the extended Yale B database. (a) An original clean image chosen from the extended Yale B database [44]. (b,c) The images with random pixel corruption of ρ = 0.3 and ρ = 0.6 , respectively. (d,e) The images recovered by the RASVRG from (b) and (c), respectively.
Figure 2. Comparison of the algorithms based on 0 - and 1 -norm regularizers on synthetic datasets under different conditions.
Figure 3. Comparison of the convergence performance of the methods on the synthetic data 50,000 × 1000 in size. Note that the horizontal axis denotes the number of effective passes (left) or running time (right, in seconds), and the vertical axis corresponds to the objective value minus the minimum value.
Figure 4. Comparison of the algorithms based on 0 - and 1 -norm regularizers on the extended Yale B database with different random pixel corruption.
Table 1. Clustering accuracy and running time of the algorithms based on elastic net regularizers on synthetic data. The highest clustering accuracy is shown in bold.
γ | 10^9 | 10^8 | 10^7 | 10^6 | 10^5

(a) D = 50, K = 20
Clustering accuracy (%)
RFSS | 73.38 | 81.58 | 78.40 | 70.88 | 71.10
FISTA | 90.65 | 92.65 | 88.96 | 91.75 | 89.03
Homotopy | 76.18 | 79.65 | 74.90 | 73.21 | 71.56
Prox-SVRG | 94.81 | 98.21 | 96.18 | 97.30 | 95.88
RASVRG SC | 97.33 | 98.63 | 98.76 | 98.81 | 96.06
Running time (seconds)
FISTA | 7.61 | 10.46 | 10.15 | 10.63 | 9.82
Prox-SVRG | 1.86 | 1.74 | 1.77 | 2.03 | 2.05
RASVRG SC | 1.31 | 1.34 | 1.32 | 1.46 | 1.49

(b) D = 50, K = 40
Clustering accuracy (%)
RFSS | 69.78 | 72.78 | 71.31 | 71.82 | 70.50
FISTA | 87.01 | 87.85 | 88.95 | 86.51 | 87.50
Homotopy | 74.41 | 75.70 | 76.33 | 71.43 | 71.01
Prox-SVRG | 92.85 | 91.60 | 93.28 | 92.89 | 93.64
RASVRG SC | 94.25 | 94.85 | 93.32 | 94.17 | 94.79
Running time (seconds)
FISTA | 11.86 | 18.75 | 22.63 | 23.11 | 20.24
Prox-SVRG | 3.60 | 3.88 | 4.70 | 4.21 | 4.36
RASVRG SC | 2.77 | 3.02 | 3.23 | 3.16 | 3.26
Table 2. Clustering performance of different algorithms based on the elastic net regularizer on the AR face database with random, unrelated block occlusion.
Clusters (k) | 2 | 5 | 8 | 11 | 14

(a) ϕ = 0.3
Clustering accuracy (%)
RFSS | 53.84 | 33.84 | 25.00 | 18.88 | 17.58
FISTA | 55.76 | 33.07 | 24.51 | 19.93 | 17.85
Homotopy | 53.84 | 33.84 | 24.03 | 18.88 | 17.58
Prox-SVRG | 61.53 | 32.30 | 24.51 | 20.97 | 19.27
RASVRG SC | 65.38 | 35.14 | 25.31 | 21.67 | 19.38

(b) ϕ = 0.6
Clustering accuracy (%)
RFSS | 53.84 | 30.00 | 22.11 | 19.58 | 18.40
FISTA | 55.76 | 28.46 | 21.15 | 18.53 | 15.93
Homotopy | 53.84 | 29.23 | 23.07 | 20.62 | 19.23
Prox-SVRG | 57.69 | 30.00 | 22.11 | 19.93 | 16.75
RASVRG SC | 57.69 | 31.14 | 23.41 | 21.52 | 20.94
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
