research-article

Fault-Tolerant Scheduling for Real-Time Scientific Workflows with Elastic Resource Provisioning in Virtualized Clouds

Authors:

Xiaomin Zhu,

Ji Wang,

Hui Guo,

Dakai Zhu,

Laurence T. Yang,

Ling Liu

IEEE Transactions on Parallel and Distributed Systems, Volume 27, Issue 12

Pages 3501 - 3517

https://doi.org/10.1109/TPDS.2016.2543731

Published: 01 December 2016

Abstract

Clouds are becoming an important platform for scientific workflow applications. However, with many nodes deployed in clouds, managing the reliability of resources becomes a critical issue, especially for real-time scientific workflows, whose deadlines must be met. Fault tolerance in clouds is therefore essential. Primary-backup (PB) based scheduling is a popular fault-tolerance technique and has been used effectively in cluster and grid computing. However, applying this technique to real-time workflows in a virtualized cloud is much more complicated and has rarely been studied. In this paper, we address this problem. We first establish a real-time workflow fault-tolerant model that extends the traditional PB model by incorporating cloud characteristics. Based on this model, we develop approaches for task allocation and message transmission to ensure that faults can be tolerated during workflow execution. Finally, we propose FASTER, a dynamic fault-tolerant scheduling algorithm for real-time workflows in virtualized clouds. FASTER has three key features: 1) it employs a backward shifting method to make full use of idle resources, and incorporates task overlapping and VM migration for high resource utilization; 2) it applies vertical/horizontal scaling-up to quickly provision resources for bursts of workflows; and 3) it uses vertical scaling-down to avoid unnecessary and ineffective resource changes caused by fluctuating workflow requests. We evaluate FASTER with synthetic workflows and with workflows collected from real scientific and business applications, and compare it with six baseline algorithms. The experimental results demonstrate that FASTER can effectively improve resource utilization and schedulability, even in the presence of node failures in virtualized clouds.
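The abstract only names the ingredients of the PB model. As a rough illustration of the core constraint a PB scheduler in a virtualized cloud has to respect, the sketch below places a primary copy and a backup copy of one task on VMs that run on different physical hosts, so that a single host failure cannot take out both copies, and rejects any placement that misses the task's deadline. This is a minimal sketch under our own assumptions, not FASTER itself: the names `VM`, `Placement`, and `schedule_pb` are hypothetical; each VM is assumed to be free from a known `ready_time`; the backup is treated as passive and starts no earlier than the primary's finish time; and candidates are ranked by earliest primary finish, then earliest backup finish. Backward shifting, task overlapping, VM migration, and elastic scaling are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VM:
    name: str
    host: str                # physical host this VM runs on
    ready_time: float = 0.0  # earliest time the VM is free (assumed known)


@dataclass
class Placement:
    primary_vm: str
    backup_vm: str
    primary_finish: float
    backup_finish: float


def schedule_pb(exec_time: float, arrival: float, deadline: float,
                vms: List[VM]) -> Optional[Placement]:
    """Hypothetical primary-backup placement for a single task.

    Primary and backup must run on VMs hosted on different physical
    hosts, and both copies must finish by the deadline. The backup is
    treated as passive: it starts no earlier than the primary's finish.
    Returns None if no feasible placement exists.
    """
    best: Optional[Placement] = None
    for p in vms:
        p_finish = max(arrival, p.ready_time) + exec_time
        if p_finish > deadline:
            continue  # the primary alone already misses the deadline
        for b in vms:
            if b.host == p.host:
                continue  # one host failure would kill both copies
            b_finish = max(p_finish, b.ready_time) + exec_time
            if b_finish > deadline:
                continue  # the backup could not recover in time
            cand = Placement(p.name, b.name, p_finish, b_finish)
            # Prefer the earliest primary finish, then the earliest backup finish.
            if best is None or (cand.primary_finish, cand.backup_finish) < (
                    best.primary_finish, best.backup_finish):
                best = cand
    return best


if __name__ == "__main__":
    vms = [VM("vm1", "h1"), VM("vm2", "h1", 2.0), VM("vm3", "h2", 1.0)]
    print(schedule_pb(exec_time=3.0, arrival=0.0, deadline=10.0, vms=vms))
    # -> Placement(primary_vm='vm1', backup_vm='vm3', primary_finish=3.0, backup_finish=6.0)
```

In this example vm1 and vm2 share host h1, so every valid pairing puts one copy on vm3; the sketch picks vm1 as primary (finishing at 3.0) and vm3 as backup (finishing at 6.0). Tightening the deadline to 5.0 leaves no feasible pairing, and the function returns None.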

Recommendations

FESTAL: Fault-Tolerant Elastic Scheduling Algorithm for Real-Time Tasks in Virtualized Clouds
As clouds have been deployed widely in various fields, the reliability and availability of clouds become the major concern of cloud service providers and users. Thereby, fault tolerance in clouds receives a great deal of attention in both industry and ...

Cost and fault-tolerant aware resource management for scientific workflows using hybrid instances on clouds
Cloud service providers are offering computing resources at a reasonable price as a pay-per-use model. Further, cloud service providers have also introduced different pricing models like spot, blockspot and spotfleet instances that are cost effective ...

Resource provisioning and scheduling in clouds: QoS perspective
Resource provisioning of appropriate resources to cloud workloads depends on the quality of service (QoS) requirements of cloud applications and is a challenging task. In cloud environment, heterogeneity, uncertainty and dispersion of resources ...

Information

Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 27, Issue 12

December 2016

304 pages

ISSN: 1045-9219

Publisher

IEEE Press

Publication History

Published: 01 December 2016

Qualifiers

Research-article

Bibliometrics

Total Citations: 33

Total Downloads: 0

Downloads (last 12 months): 0

Downloads (last 6 weeks): 0

Reflects downloads up to 17 Oct 2024

Cited By

Umamaheswari K, Muthu kumaran A (2023). HGPSO. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 44(3), 4445-4458. doi:10.3233/JIFS-222842. Online publication date: 1 Jan 2023.
https://dl.acm.org/doi/10.3233/JIFS-222842

Xiao W, Fang X, Liu B, Wang J, Zhu X (2023). UNION: Fault-tolerant Cooperative Computing in Opportunistic Mobile Edge Cloud. ACM Transactions on Internet Technology, 23(4), 1-27. doi:10.1145/3617994. Online publication date: 17 Nov 2023.
https://dl.acm.org/doi/10.1145/3617994

Beikzadeh Abbasi F, Rezaee A, Adabi S, Movaghar A (2023). Fault-tolerant scheduling of graph-based loads on fog/cloud environments with multi-level queues and LSTM-based workload prediction. Computer Networks: The International Journal of Computer and Telecommunications Networking, 235(C). doi:10.1016/j.comnet.2023.109964. Online publication date: 1 Nov 2023.
https://dl.acm.org/doi/10.1016/j.comnet.2023.109964

Yu J, Tong W, Lv P, Feng D (2022). TERMS. Journal of Parallel and Distributed Computing, 170(C), 74-85. doi:10.1016/j.jpdc.2022.08.005. Online publication date: 1 Dec 2022.
https://dl.acm.org/doi/10.1016/j.jpdc.2022.08.005

Coleman T, Casanova H, Pottier L, Kaushik M, Deelman E, Ferreira da Silva R (2022). WfCommons. Future Generation Computer Systems, 128(C), 16-27. doi:10.1016/j.future.2021.09.043. Online publication date: 1 Mar 2022.
https://dl.acm.org/doi/10.1016/j.future.2021.09.043

Saxena D, Singh A (2022). OFP-TM: an online VM failure prediction and tolerance model towards high availability of cloud computing environments. The Journal of Supercomputing, 78(6), 8003-8024. doi:10.1007/s11227-021-04235-z. Online publication date: 1 Apr 2022.
https://dl.acm.org/doi/10.1007/s11227-021-04235-z

Chakravarthi K, Neelakantan P, Shyamala L, Vaidehi V (2022). Reliable budget aware workflow scheduling strategy on multi-cloud environment. Cluster Computing, 25(2), 1189-1205. doi:10.1007/s10586-021-03464-4. Online publication date: 1 Apr 2022.
https://dl.acm.org/doi/10.1007/s10586-021-03464-4

Khojasteh Toussi G, Naghibzadeh M (2021). A divide and conquer approach to deadline constrained cost-optimization workflow scheduling for the cloud. Cluster Computing, 24(3), 1711-1733. doi:10.1007/s10586-020-03223-x. Online publication date: 1 Sep 2021.
https://dl.acm.org/doi/10.1007/s10586-020-03223-x

Mousavi Nik S, Naghibzadeh M, Sedaghat Y (2021). Task replication to improve the reliability of running workflows on the cloud. Cluster Computing, 24(1), 343-359. doi:10.1007/s10586-020-03109-y. Online publication date: 1 Mar 2021.
https://dl.acm.org/doi/10.1007/s10586-020-03109-y

Long T, Chen P, Xia Y, Jiang N, Wang X, Long M (2021). A Novel Fault-Tolerant Approach to Web Service Composition upon the Edge Computing Environment. Web Services – ICWS 2021, 15-31. doi:10.1007/978-3-030-96140-4_2. Online publication date: 10 Dec 2021.
https://dl.acm.org/doi/10.1007/978-3-030-96140-4_2
