Research article · DOI: 10.1145/2541940.2541963

Disengaged scheduling for fair, protected access to fast computational accelerators

Published: 24 February 2014

Abstract

Today's operating systems treat GPUs and other computational accelerators as if they were simple devices, with bounded and predictable response times. With accelerators assuming an increasing share of the workload on modern machines, this strategy is already problematic, and likely to become untenable soon. If the operating system is to enforce fair sharing of the machine, it must assume responsibility for accelerator scheduling and resource management.
Fair, safe scheduling is a particular challenge on fast accelerators, which allow applications to avoid kernel-crossing overhead by interacting directly with the device. We propose a disengaged scheduling strategy in which the kernel intercedes between applications and the accelerator on an infrequent basis, to monitor their use of accelerator cycles and to determine which applications should be granted access over the next time interval.
Our strategy assumes a well-defined, narrow interface exported by the accelerator. We build upon such an interface, systematically inferred for the latest Nvidia GPUs. We construct several example schedulers, including Disengaged Timeslice with overuse control, which guarantees fairness, and Disengaged Fair Queueing, which is probabilistic but effective at limiting resource idleness. Both schedulers ensure fair sharing of the GPU, even among uncooperative or adversarial applications; Disengaged Fair Queueing incurs a 4% overhead on average (max 18%) compared to direct device access across our evaluation scenarios.
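The abstract describes the monitoring-and-grant cycle only at a high level. As a rough illustration of the fair-queueing bookkeeping it implies, consider the Python sketch below: the kernel samples each application's accelerator usage only at coarse intervals, charges a weighted virtual time, and grants the next interval to the most "behind" application. This is not the authors' implementation, which runs in the kernel against sampled hardware state; all class, method, and field names here (`App`, `DisengagedFairQueue`, `vtime`) are illustrative.

```python
class App:
    def __init__(self, name, weight=1.0):
        self.name = name
        self.weight = weight
        self.vtime = 0.0  # accumulated usage, scaled by 1/weight

class DisengagedFairQueue:
    def __init__(self, apps):
        self.apps = list(apps)

    def account(self, app, measured_cycles):
        # Invoked once per coarse monitoring interval with sampled usage;
        # between samples, applications access the device directly,
        # avoiding kernel-crossing overhead.
        app.vtime += measured_cycles / app.weight

    def next_grant(self):
        # The app with the smallest virtual time has received the least
        # weighted service, so it is granted the next interval.
        return min(self.apps, key=lambda a: a.vtime)

render = App("render")
compute = App("compute", weight=2.0)
queue = DisengagedFairQueue([render, compute])
queue.account(render, 100)
queue.account(compute, 100)
print(queue.next_grant().name)  # "compute": its weight halves the charge
```

Because accounting happens only at sampling points rather than on every request, fairness here is probabilistic over monitoring intervals, which matches the abstract's characterization of Disengaged Fair Queueing.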



Published In

cover image ACM Conferences
ASPLOS '14: Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems
February 2014
780 pages
ISBN:9781450323055
DOI:10.1145/2541940
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. fairness
  2. gpus
  3. hardware accelerators
  4. operating system protection
  5. scheduling

Qualifiers

  • Research-article

Conference

ASPLOS '14

Acceptance Rates

ASPLOS '14 paper acceptance rate: 49 of 217 submissions (23%).
Overall acceptance rate: 535 of 2,713 submissions (20%).

Article Metrics

  • Downloads (last 12 months): 23
  • Downloads (last 6 weeks): 0
Reflects downloads up to 16 October 2024.

Cited By

  • RELIEF: Relieving Memory Pressure In SoCs Via Data Movement-Aware Accelerator Scheduling. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1063-1079, March 2024. DOI: 10.1109/HPCA57654.2024.00084
  • GASS: GPU Automated Sharing at Scale. 2024 IEEE 17th International Conference on Cloud Computing (CLOUD), pp. 439-445, July 2024. DOI: 10.1109/CLOUD62652.2024.00056
  • Gemini: Enabling Multi-Tenant GPU Sharing Based on Kernel Burst Estimation. IEEE Transactions on Cloud Computing, 11(1):854-867, January 2023. DOI: 10.1109/TCC.2021.3119205
  • Optimizing Goodput of Real-time Serverless Functions using Dynamic Slicing with vGPUs. 2021 IEEE International Conference on Cloud Engineering (IC2E), pp. 60-70, October 2021. DOI: 10.1109/IC2E52221.2021.00020
  • Deadline-Aware Offloading for High-Throughput Accelerators. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 479-492, February 2021. DOI: 10.1109/HPCA51647.2021.00048
  • GPU-aware resource management in heterogeneous cloud data centers. The Journal of Supercomputing, April 2021. DOI: 10.1007/s11227-021-03779-4
  • Telekine. Proceedings of the 17th USENIX Conference on Networked Systems Design and Implementation (NSDI), pp. 817-834, February 2020. DOI: 10.5555/3388242.3388301
  • A Dynamic and Proactive GPU Preemption Mechanism Using Checkpointing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 39(1):75-87, January 2020. DOI: 10.1109/TCAD.2018.2883906
  • Hodor. Proceedings of the 2019 USENIX Annual Technical Conference, pp. 489-503, July 2019. DOI: 10.5555/3358807.3358849
  • NICA. Proceedings of the 2019 USENIX Annual Technical Conference, pp. 345-361, July 2019. DOI: 10.5555/3358807.3358838
