Analyzing the Convergence of Federated Learning with Biased Client Participation

  • Conference paper
  • Conference series: Advanced Data Mining and Applications (ADMA 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14177)

Abstract

Federated Learning (FL) is a promising decentralized machine learning framework that enables a massive number of clients (e.g., smartphones) to collaboratively train a global model over the Internet without sacrificing their privacy. Although FL's efficacy on non-convex problems has been established, its convergence under biased client participation lacks theoretical study. In this paper, we analyze the convergence of FedAvg, the most renowned FL algorithm, on non-convex problems. Assuming data that are evenly sized but non-IID across clients, we derive the convergence rate of FedAvg under biased client participation. Our analysis reveals that biased client participation can significantly reduce the accuracy of the FL model. We validate this finding through trace-driven experiments, which show that unbiased client participation yields 11% to 50% higher test accuracy than extremely biased client participation.
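To make the setting concrete, the following minimal sketch contrasts unbiased and biased client participation in FedAvg. It is an illustration only, not the experimental setup of this paper: the quadratic local objectives, the `bias` parameterization of the sampling distribution, and all function names are our own assumptions.

```python
import numpy as np

def fedavg(num_clients=20, clients_per_round=5, rounds=200,
           local_steps=5, lr=0.05, bias=0.0, seed=0):
    """Toy FedAvg on quadratic client objectives F_k(w) = 0.5*||w - c_k||^2.

    `bias` skews the client-sampling distribution: 0.0 samples clients
    uniformly (unbiased participation), while larger values concentrate
    participation on a few clients (biased participation).
    """
    rng = np.random.default_rng(seed)
    optima = rng.normal(size=(num_clients, 2)) * 3.0  # non-IID client optima
    logits = bias * np.arange(num_clients, dtype=float)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                              # sampling distribution

    w = np.zeros(2)                                   # global model
    for _ in range(rounds):
        chosen = rng.choice(num_clients, size=clients_per_round,
                            replace=False, p=probs)
        updates = []
        for k in chosen:
            wk = w.copy()
            for _ in range(local_steps):              # E local SGD steps
                wk -= lr * (wk - optima[k])           # gradient of F_k at wk
            updates.append(wk)
        w = np.mean(updates, axis=0)                  # aggregation
    # The global objective is minimized at the mean of all client optima.
    return np.linalg.norm(w - optima.mean(axis=0))

print("unbiased gap:", fedavg(bias=0.0))  # participation spread evenly
print("biased gap:  ", fedavg(bias=1.0))  # a few clients dominate
```

Under these assumptions, the biased run settles near the minimizer of the over-sampled clients' objectives rather than that of the average objective, which is the effect the analysis below quantifies.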


References

  1. Abay, A., Zhou, Y., Baracaldo, N., Rajamoni, S., Chuba, E., Ludwig, H.: Mitigating bias in federated learning. arXiv preprint arXiv:2012.02447 (2020). https://doi.org/10.48550/arXiv.2012.02447

  2. Amiri, M.M., Gündüz, D., Kulkarni, S.R., Poor, H.V.: Convergence of federated learning over a noisy downlink. IEEE Trans. Wireless Commun. 21(3), 1422–1437 (2021). https://doi.org/10.1109/TWC.2021.3103874

  3. Balakrishnan, R., Li, T., Zhou, T., Himayat, N., Smith, V., Bilmes, J.: Diverse client selection for federated learning via submodular maximization. In: International Conference on Learning Representations (ICLR) (2021)

  4. Chen, F., Chen, N., Mao, H., Hu, H.: Assessing four neural networks on handwritten digit recognition dataset (MNIST). arXiv preprint arXiv:1811.08278 (2018). https://doi.org/10.48550/ARXIV.1811.08278

  5. Chilimbi, T., Suzue, Y., Apacible, J., Kalyanaraman, K.: Project adam: building an efficient and scalable deep learning training system. In: Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation (OSDI), pp. 571–582 (2014)

  6. Cho, Y.J., Wang, J., Joshi, G.: Client selection in federated learning: convergence analysis and power-of-choice selection strategies. arXiv preprint arXiv:2010.01243 (2020). https://doi.org/10.48550/arXiv.2010.01243

  7. Duan, M., Liu, D., Chen, X., Liu, R., Tan, Y., Liang, L.: Self-balancing federated learning with global imbalanced data in mobile systems. IEEE Trans. Parallel Distrib. Syst. 32(1), 59–71 (2020). https://doi.org/10.1109/TPDS.2020.3009406

  8. Haddadpour, F., Mahdavi, M.: On the convergence of local descent methods in federated learning. arXiv preprint arXiv:1910.14425 (2019). https://doi.org/10.48550/arXiv.1910.14425

  9. Kairouz, P., et al.: Advances and open problems in federated learning. Found. Trends® Mach. Learn. 14(1–2), 1–210 (2021)

  10. Khaled, A., Mishchenko, K., Richtárik, P.: First analysis of local GD on heterogeneous data. arXiv preprint arXiv:1909.04715 (2019). https://doi.org/10.48550/ARXIV.1909.04715

  11. Khan, L.U., Saad, W., Han, Z., Hossain, E., Hong, C.S.: Federated learning for internet of things: recent advances, taxonomy, and open challenges. IEEE Commun. Surv. Tutor. (2021). https://doi.org/10.1109/COMST.2021.3090430

  12. Krizhevsky, A.: Learning Multiple Layers of Features From Tiny Images. University of Toronto, Toronto (2012)

  13. Li, A., Zhang, L., Tan, J., Qin, Y., Wang, J., Li, X.Y.: Sample-level data selection for federated learning. In: IEEE Conference on Computer Communications (INFOCOM), pp. 1–10 (2021). https://doi.org/10.1109/INFOCOM42981.2021.9488723

  14. Li, T., Hu, S., Beirami, A., Smith, V.: Ditto: Fair and robust federated learning through personalization. In: Proceedings of the 38th International Conference on Machine Learning (ICML), pp. 6357–6368. PMLR (2021)

  15. Li, T., Sahu, A.K., Zaheer, M., Sanjabi, M., Talwalkar, A., Smith, V.: Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2, 429–450 (2020)

  16. Li, T., Sanjabi, M., Smith, V.: Fair resource allocation in federated learning. In: International Conference on Learning Representations (ICLR) (2020)

  17. Li, X., Huang, K., Yang, W., Wang, S., Zhang, Z.: On the convergence of Fedavg on non-IID data. In: Eighth International Conference on Learning Representations (ICLR) (2020)

  18. Liu, R., Cao, Y., Yoshikawa, M., Chen, H.: Fedsel: Federated SGD under local differential privacy with top-k dimension selection. In: DASFAA (2020)

  19. Ma, J., Xie, M., Long, G.: Personalized federated learning with robust clustering against model poisoning. In: Chen, W., Yao, L., Cai, T., Pan, S., Shen, T., Li, X. (eds.) ADMA 2022. LNCS, vol. 13726, pp. 238–252. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-22137-8_18

  20. McMahan, B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics (AISTATS), pp. 1273–1282 (2017)

  21. Segarceanu, S., Gavat, I., Suciu, G.: Evaluation of deep learning techniques for acoustic environmental events detection. Romanian J. Technical Sci. Appl. Mech. 66(1), 19–37 (2021)

  22. Tan, L., et al.: Adafed: optimizing participation-aware federated learning with adaptive aggregation weights. IEEE Trans. Network Sci. Eng. 9(4), 2708–2720 (2022). https://doi.org/10.1109/TNSE.2022.3168969

  23. Xu, J., Glicksberg, B.S., Su, C., Walker, P., Bian, J., Wang, F.: Federated learning for healthcare informatics. J. Healthcare Inform. Res. 5(1), 1–19 (2021)

  24. Yang, H., Fang, M., Liu, J.: Achieving linear speedup with partial worker participation in non-IID federated learning. In: International Conference on Learning Representations (ICLR) (2021)

  25. Yang, W., et al.: Gain without pain: offsetting DP-injected noises stealthily in cross-device federated learning. IEEE Internet Things J. 9(22), 22147–22157 (2021). https://doi.org/10.1109/JIOT.2021.3102030

  26. Yu, H., Jin, R., Yang, S.: On the linear speedup analysis of communication efficient momentum SGD for distributed non-convex optimization. In: International Conference on Machine Learning (ICML), pp. 7184–7193 (2019)


Acknowledgements

This study received support from the National Natural Science Foundation of China through Grants U1911201 and U2001209, the Natural Science Foundation of Guangdong under Grant 2021A1515011369, and the Science and Technology Program of Guangzhou under Grant 2023A04J2029.

Author information

Correspondence to Di Wu.

Appendix

In this appendix, we provide the proofs of Lemma 1 and Lemma 2.

1.1 Proof of Lemma 1

For any \(t \ge 0\), there exists a \(t_0 \le t\) with \(t - t_0 \le E\) such that \(\omega_k^{t_0} = \omega^{t_0}\) for all \(k = 1, 2, \ldots, N\) (i.e., \(t_0\) is the most recent synchronization step). Similar to previous work [17], we have

$$\begin{aligned} \mathbb{E}\left\| \omega_k^t - \omega^t \right\|^2 &= \mathbb{E}\left\| (\omega_k^t - \omega^{t_0}) - (\omega^t - \omega^{t_0}) \right\|^2 \\ &\le \mathbb{E}\left\| \omega_k^t - \omega^{t_0} \right\|^2 \\ &\le \eta_t^2 E \sum_{i=t_0}^{t} \mathbb{E}\left\| \nabla F_k(\omega_k^i, \xi_k^i) \right\|^2 \\ &\le \eta_t^2 E^2 G^2. \end{aligned}$$
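As a sanity check of the drift bound above, the following snippet (our own illustration, not part of the paper) simulates \(E\) local SGD steps with stochastic gradients clipped to norm at most \(G\) and confirms that the squared drift from the last synchronization point never exceeds \(\eta_t^2 E^2 G^2\).

```python
import numpy as np

# Numeric illustration (ours) of Lemma 1: after at most E local steps since
# the last synchronization, a client's model drifts from the sync point by
# at most eta*E*G when every stochastic gradient has norm at most G.
rng = np.random.default_rng(0)
eta, E, G, dim = 0.1, 5, 2.0, 10

worst = 0.0
for _ in range(1000):
    w_sync = rng.normal(size=dim)            # model at last synchronization
    w_k = w_sync.copy()
    for _ in range(E):
        g = rng.normal(size=dim)
        g *= G / max(np.linalg.norm(g), G)   # enforce ||g|| <= G
        w_k -= eta * g                       # one local SGD step
    worst = max(worst, np.linalg.norm(w_k - w_sync) ** 2)

print(worst, "<=", (eta * E * G) ** 2)       # the Lemma 1 bound holds
```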

1.2 Proof of Lemma 2

Since \(n_k = \frac{n}{N}\) for all clients \(k\) and \(\sum_{k=1}^{N} M_k^t = mt\), we can derive that

$$\sum_{k\in S_t} p_k^2 = \sum_{k\in S_t} \left( \frac{\frac{n}{N} M_k^t}{\sum_{k'=1}^{N} \frac{n}{N} M_{k'}^t} \right)^2 = \sum_{k\in S_t} \left( \frac{M_k^t}{mt} \right)^2.$$
(22)
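The term \(\sum_{k\in S_t} p_k^2\) is exactly where participation bias enters the bound: for a fixed total participation count \(mt\), it is smallest when the \(M_k^t\) are equal. A quick numeric illustration (our own toy numbers, with all \(N = m = 5\) clients selected so the example stays small):

```python
import numpy as np

# p_k = M_k^t / (m*t) as in Eq. (22), where M_k^t counts how often client k
# has participated and sum_k M_k^t = m*t. The counts below are made up.
def pk_squared_sum(counts):
    p = np.asarray(counts, dtype=float) / sum(counts)
    return float((p ** 2).sum())

uniform = [20, 20, 20, 20, 20]  # unbiased: equal participation
biased  = [92,  2,  2,  2,  2]  # biased: one client dominates

print(pk_squared_sum(uniform))  # 0.2    (the minimum, 1/N)
print(pk_squared_sum(biased))   # ~0.848 (inflates the bound in Eq. (33))
```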

Utilizing the \(\rho\)-smoothness of \(F(\omega)\), we can derive the following inequality:

$$\mathbb{E} F(\omega^{t+1}) \le \mathbb{E} F(\omega^t) + \mathbb{E}\left\langle \nabla F(\omega^t), \omega^{t+1} - \omega^t \right\rangle + \frac{\rho}{2}\, \mathbb{E}\left\| \omega^{t+1} - \omega^t \right\|^2.$$
(23)

By applying the fact that \(\mathbb{E}\left\| x \right\|^2 = \mathbb{E}\left[ \left\| x - \mathbb{E}x \right\|^2 \right] + \left\| \mathbb{E}x \right\|^2\), we can obtain

$$\begin{aligned} \mathbb{E}\left\| \omega^{t+1} - \omega^t \right\|^2 &= \eta_t^2\, \mathbb{E}\left\| \frac{N}{m} \sum_{k\in S_t} p_k \nabla F_k(\omega_k^t, \xi_k^t) \right\|^2 \\ &= \eta_t^2\, \mathbb{E}\left\| \frac{N}{m} \sum_{k\in S_t} p_k \left[ \nabla F_k(\omega_k^t, \xi_k^t) - \nabla F_k(\omega_k^t) \right] \right\|^2 + \eta_t^2\, \mathbb{E}\left\| \frac{N}{m} \sum_{k\in S_t} p_k \nabla F_k(\omega_k^t) \right\|^2. \end{aligned}$$
(24)
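The second equality above is the standard variance decomposition. A quick Monte Carlo check of the identity (ours, purely illustrative):

```python
import numpy as np

# Check E||x||^2 = E||x - Ex||^2 + ||Ex||^2 on i.i.d. Gaussian vectors.
rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=(100_000, 4))

lhs = np.mean(np.sum(x ** 2, axis=1))
mean = x.mean(axis=0)                 # empirical stand-in for Ex
rhs = np.mean(np.sum((x - mean) ** 2, axis=1)) + np.sum(mean ** 2)
print(lhs, rhs)                       # agree up to Monte Carlo error
```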

Since the clients work independently in parallel, and by Assumption 3 (bounded gradient variance \(\delta^2\)), we have

$$\begin{aligned} \mathbb{E}\left\| \omega^{t+1} - \omega^t \right\|^2 &= \frac{\eta_t^2 N^2}{m^2} \sum_{k\in S_t} p_k^2\, \mathbb{E}\left\| \nabla F_k(\omega_k^t, \xi_k^t) - \nabla F_k(\omega_k^t) \right\|^2 + \eta_t^2\, \mathbb{E}\left\| \frac{N}{m} \sum_{k\in S_t} p_k \nabla F_k(\omega_k^t) \right\|^2 \\ &\le \frac{\eta_t^2 N^2 \delta^2}{m^2} \sum_{k\in S_t} p_k^2 + \eta_t^2\, \mathbb{E}\left\| \frac{N}{m} \sum_{k\in S_t} p_k \nabla F_k(\omega_k^t) \right\|^2. \end{aligned}$$
(25)

We further note that

$$\begin{aligned} \mathbb{E}\left\langle \nabla F(\omega^t), \omega^{t+1} - \omega^t \right\rangle &= -\eta_t\, \mathbb{E}\left\langle \nabla F(\omega^t), \frac{N}{m} \sum_{k\in S_t} p_k \nabla F_k(\omega_k^t, \xi_k^t) \right\rangle \\ &= -\eta_t\, \mathbb{E}\left[ \mathbb{E}\left[ \left\langle \nabla F(\omega^t), \frac{N}{m} \sum_{k\in S_t} p_k \nabla F_k(\omega_k^t, \xi_k^t) \right\rangle \,\Big|\, \xi^t \right] \right] \\ &= -\eta_t\, \mathbb{E}\left\langle \nabla F(\omega^t), \mathbb{E}\left[ \frac{N}{m} \sum_{k\in S_t} p_k \nabla F_k(\omega_k^t, \xi_k^t) \,\Big|\, \xi^t \right] \right\rangle \\ &= \underbrace{-\eta_t\, \mathbb{E}\left\langle \nabla F(\omega^t), \frac{N}{m} \sum_{k\in S_t} p_k \nabla F_k(\omega_k^t) \right\rangle}_{A_1}. \end{aligned}$$
(26)

First, for the term \(A_1\), applying the identity \(2\langle a, b\rangle = \|a\|^2 + \|b\|^2 - \|a - b\|^2\), we can obtain

$$\begin{aligned} A_1 &= -\frac{\eta_t}{2}\, \mathbb{E}\left\| \nabla F(\omega^t) \right\|^2 - \frac{\eta_t}{2}\, \mathbb{E}\left\| \frac{N}{m} \sum_{k\in S_t} p_k \nabla F_k(\omega_k^t) \right\|^2 \\ &\quad + \underbrace{\frac{\eta_t}{2}\, \mathbb{E}\left\| \nabla F(\omega^t) - \frac{N}{m} \sum_{k\in S_t} p_k \nabla F_k(\omega_k^t) \right\|^2}_{A_2}. \end{aligned}$$
(27)

Second, \(\omega^{t+1} = \frac{N}{m} \sum_{k \in S_t} p_k \omega_k^{t+1}\) according to Eq. (4); therefore, we can obtain \(\nabla F(\omega^{t+1}) = \frac{N}{m} \sum_{k \in S_{t+1}} p_k \nabla F_k(\omega^{t+1})\) [24]. For the term \(A_2\), we can obtain

$$\begin{aligned} A_2 &= \frac{\eta_t}{2}\, \mathbb{E}\left\| \frac{N}{m} \sum_{k\in S_t} p_k \nabla F_k(\omega_k^t) - \frac{N}{m} \sum_{k\in S_t} p_k \nabla F_k(\omega^t) \right\|^2 \\ &= \frac{\eta_t}{2}\, \mathbb{E}\left\| \frac{N}{m} \sum_{k\in S_t} p_k \left[ \nabla F_k(\omega_k^t) - \nabla F_k(\omega^t) \right] \right\|^2. \end{aligned}$$
(28)

According to the Cauchy-Buniakowsky-Schwarz inequality, we have

$$A_2 \le \frac{\eta_t N^2}{2m^2} \sum_{k\in S_t} p_k^2 \sum_{k\in S_t} \mathbb{E}\left\| \nabla F_k(\omega_k^t) - \nabla F_k(\omega^t) \right\|^2.$$
(29)
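The step from Eq. (28) to Eq. (29) uses \(\left\|\sum_k p_k v_k\right\|^2 \le \left(\sum_k p_k^2\right)\left(\sum_k \|v_k\|^2\right)\); a quick numeric check (ours, with arbitrary weights and vectors):

```python
import numpy as np

# Cauchy-Schwarz step behind Eq. (29):
# ||sum_k p_k v_k||^2 <= (sum_k p_k^2) * (sum_k ||v_k||^2).
rng = np.random.default_rng(2)
p = rng.random(8)              # arbitrary nonnegative weights p_k
v = rng.normal(size=(8, 3))    # arbitrary vectors v_k

lhs = np.linalg.norm((p[:, None] * v).sum(axis=0)) ** 2
rhs = (p ** 2).sum() * (np.linalg.norm(v, axis=1) ** 2).sum()
print(lhs <= rhs)              # always True
```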

By using Assumption 1, we can obtain

$$A_2 \le \frac{\eta_t \rho^2 N^2}{2m^2} \sum_{k\in S_t} p_k^2 \sum_{k\in S_t} \mathbb{E}\left\| \omega_k^t - \omega^t \right\|^2.$$
(30)

By using Lemma 1 and noting that \(|S_t| = m\), we can derive the bound of \(A_2\) as

$$A_2 \le \frac{\eta_t^3 \rho^2 N^2 E^2 G^2}{2m} \sum_{k\in S_t} p_k^2.$$
(31)

Upon substituting Eq. (31) into Eq. (27), we arrive at the upper bound for \(A_1\) as follows:

$$A_1 \le -\frac{\eta_t}{2}\, \mathbb{E}\left\| \nabla F(\omega^t) \right\|^2 - \frac{\eta_t}{2}\, \mathbb{E}\left\| \frac{N}{m} \sum_{k\in S_t} p_k \nabla F_k(\omega_k^t) \right\|^2 + \frac{\eta_t^3 \rho^2 N^2 E^2 G^2}{2m} \sum_{k\in S_t} p_k^2.$$
(32)

By substituting Eq. (25), Eq. (26), and Eq. (32) into Eq. (23), we can obtain

$$\begin{aligned} \mathbb{E} F(\omega^{t+1}) &\le \mathbb{E} F(\omega^t) - \frac{\eta_t}{2}\, \mathbb{E}\left\| \nabla F(\omega^t) \right\|^2 - \frac{\eta_t - \eta_t^2 \rho}{2}\, \mathbb{E}\left\| \frac{N}{m} \sum_{k\in S_t} p_k \nabla F_k(\omega_k^t) \right\|^2 \\ &\quad + \frac{\eta_t^3 \rho^2 N^2 E^2 G^2}{2m} \sum_{k\in S_t} p_k^2 + \frac{\eta_t^2 \rho N^2 \delta^2}{2m^2} \sum_{k\in S_t} p_k^2. \end{aligned}$$
(33)

Since \(\eta_t = \frac{1}{\rho} \sqrt{\frac{1}{T}}\) with \(T \ge 1\), we have \(0 < \eta_t \le \frac{1}{\rho}\), so \(\eta_t - \eta_t^2 \rho \ge 0\) and the term containing it is non-positive and can be dropped:

$$\mathbb{E} F(\omega^{t+1}) \le \mathbb{E} F(\omega^t) - \frac{\eta_t}{2}\, \mathbb{E}\left\| \nabla F(\omega^t) \right\|^2 + \frac{\eta_t^3 \rho^2 N^2 E^2 G^2}{2m} \sum_{k\in S_t} p_k^2 + \frac{\eta_t^2 \rho N^2 \delta^2}{2m^2} \sum_{k\in S_t} p_k^2.$$
(34)

Rearranging Eq. (34) and dividing both sides by \(\frac{\eta_t}{2}\), we have

$$\mathbb{E}\left\| \nabla F(\omega^t) \right\|^2 \le \frac{2}{\eta_t} \left( \mathbb{E} F(\omega^t) - \mathbb{E} F(\omega^{t+1}) \right) + \frac{\eta_t^2 \rho^2 N^2 E^2 G^2}{m} \sum_{k\in S_t} p_k^2 + \frac{\eta_t \rho N^2 \delta^2}{m^2} \sum_{k\in S_t} p_k^2.$$
(35)

According to Eq. (13), we have \(\sum_{k\in S_t} p_k^2 = \sum_{k\in S_t} \left( \frac{M_k^t}{mt} \right)^2\). Since \(\eta_t = \frac{1}{\rho} \sqrt{\frac{1}{T}}\), summing Eq. (35) from \(t=0\) to \(T-1\), telescoping the \(\mathbb{E} F(\omega^t) - \mathbb{E} F(\omega^{t+1})\) terms with \(\mathbb{E} F(\omega^T) \ge F(\omega^*)\), and dividing by \(T\), we obtain

$$\frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\left\| \nabla F(\omega^t) \right\|^2 \le \frac{2\rho \left( F(\omega^0) - F(\omega^*) \right)}{\sqrt{T}} + \left( \frac{N^2 E^2 G^2}{m T^2} + \frac{N^2 \delta^2}{m^2 T^{3/2}} \right) \sum_{t=0}^{T-1} \sum_{k\in S_t} \left( \frac{M_k^t}{mt} \right)^2,$$

where \(\omega^*\) is the optimal solution.
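The only selection-dependent quantity in this bound is the accumulated term \(\sum_{t}\sum_{k\in S_t} (M_k^t/(mt))^2\). The small simulation below (our own, with an assumed softmax-skewed sampling distribution) shows how much faster it grows under biased selection than under uniform selection, which is why biased participation loosens the bound.

```python
import numpy as np

# Simulate the selection-dependent term of the final bound:
# sum over rounds t of sum_{k in S_t} (M_k^t / (m*t))^2, where M_k^t is
# the number of times client k was selected in the first t rounds.
def participation_penalty(probs, N=20, m=5, T=500, seed=3):
    rng = np.random.default_rng(seed)
    counts = np.zeros(N)
    total = 0.0
    for t in range(1, T + 1):        # start at t=1 to avoid dividing by 0
        S = rng.choice(N, size=m, replace=False, p=probs)
        counts[S] += 1
        total += np.sum((counts[S] / (m * t)) ** 2)
    return total

N = 20
uniform = np.full(N, 1.0 / N)         # unbiased selection
skewed = np.exp(0.5 * np.arange(N))   # a few clients far more likely
skewed /= skewed.sum()

print("uniform:", participation_penalty(uniform))  # ~ T * m * (1/N)^2
print("skewed: ", participation_penalty(skewed))   # roughly N^2/m^2 larger
```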

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tan, L., Hu, M., Zhou, Y., Wu, D. (2023). Analyzing the Convergence of Federated Learning with Biased Client Participation. In: Yang, X., et al. Advanced Data Mining and Applications. ADMA 2023. Lecture Notes in Computer Science, vol. 14177. Springer, Cham. https://doi.org/10.1007/978-3-031-46664-9_29

  • DOI: https://doi.org/10.1007/978-3-031-46664-9_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46663-2

  • Online ISBN: 978-3-031-46664-9

  • eBook Packages: Computer Science, Computer Science (R0)
