Cost and fault-tolerant aware resource management for scientific workflows using hybrid instances on clouds

Published: 01 April 2018

Abstract

Cloud service providers offer computing resources at reasonable prices under a pay-per-use model. They have also introduced cost-effective pricing models such as spot, blockspot and spotfleet instances, for which users must bid in order to balance reliability against monetary cost. Consequently, scientific workflows (SWf), which model high-throughput, computation-intensive and complex large-scale data-analysis applications, are increasingly adopting these computing resources. However, spot instances are terminated when the market spot price exceeds the user's bid price. Moreover, failures are inevitable in such large distributed systems, which makes designing a fault-tolerant scheduling algorithm for SWf challenging. This paper presents an efficient, low-cost, fault-tolerant scheduling algorithm and a bidding strategy that minimize the volatility and cost of resource provisioning for SWf. The proposed algorithm combines spot and blockspot instances as hybrid instances, compared against on-demand instances, to reduce execution cost and provide fault tolerance while meeting the SWf deadline. An empirical simulation study demonstrates the promising potential of the proposed algorithm, which remains robust under short deadlines with minimal makespan and cost.
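
The central trade-off in the abstract is that a spot instance keeps running only while the market price stays at or below the user's bid. The following minimal Python sketch illustrates that rule and its effect on cost. It is not the authors' scheduling algorithm or bidding strategy. Its assumptions are all hypothetical: hourly price points, billing at the market price for each hour actually run, checkpointing so that completed work survives termination, and an on-demand fallback for the remaining hours. Blockspot instances are not modeled, and the names `run_on_spot` and `hybrid_cost` are illustrative only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SpotRun:
    completed_hours: int  # hours of work finished on the spot instance
    terminated: bool      # True if the market price rose above the bid
    cost: float           # paid at the market price for each hour actually run


def run_on_spot(price_trace: Sequence[float], bid: float, work_hours: int) -> SpotRun:
    """Run a task hour by hour on a spot instance (hypothetical model).

    The instance keeps running only while the hourly market price is at or
    below the bid; the first hour whose price exceeds the bid terminates it.
    If the trace ends first, the run stops unfinished but not terminated.
    """
    cost = 0.0
    done = 0
    for price in price_trace:
        if done == work_hours:
            break
        if price > bid:
            return SpotRun(done, True, cost)
        cost += price
        done += 1
    return SpotRun(done, False, cost)


def hybrid_cost(price_trace: Sequence[float], bid: float,
                work_hours: int, on_demand_price: float) -> float:
    """Spot first, then finish the remaining hours on an on-demand instance.

    Assumes checkpointing, so hours completed on spot are not lost.
    """
    run = run_on_spot(price_trace, bid, work_hours)
    remaining = work_hours - run.completed_hours
    return run.cost + remaining * on_demand_price


if __name__ == "__main__":
    trace = [0.10, 0.12, 0.30, 0.11]  # hypothetical hourly spot prices (USD)
    run = run_on_spot(trace, bid=0.20, work_hours=4)
    print(run.completed_hours, run.terminated)  # 2 True
    print(round(hybrid_cost(trace, 0.20, 4, on_demand_price=0.40), 2))  # 1.02
    print(round(4 * 0.40, 2))  # on-demand only: 1.6
```

In this toy model, a higher bid lowers the chance of termination but raises the price the user may end up paying. Raising `bid` to 0.35 in the example lets the whole task finish on spot.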

Cited By

  • (2022) OFP-TM: an online VM failure prediction and tolerance model towards high availability of cloud computing environments. The Journal of Supercomputing 78(6):8003-8024. doi:10.1007/s11227-021-04235-z. Online publication date: 1 April 2022.

Published In

Multimedia Tools and Applications, Volume 77, Issue 8
Apr 2018
1145 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Blockspot
  2. Cloud computing
  3. Fault-tolerant
  4. Instances
  5. On-demand
  6. Scheduling
  7. Scientific workflows
  8. Spot

Qualifiers

  • Article
