research-article
Open access

Halide: decoupling algorithms from schedules for high-performance image processing

Published: 27 December 2017

Abstract

Writing high-performance code on modern machines requires not just locally optimizing inner loops, but globally reorganizing computations to exploit parallelism and locality, for example by tiling and blocking whole pipelines to fit in cache. This is especially true for image processing pipelines, where individual stages do far too little work to amortize the cost of loading and storing results to and from off-chip memory. As a result, the performance difference between a naive implementation of a pipeline and one globally optimized for parallelism and locality is often an order of magnitude. However, using existing programming tools, writing high-performance image processing code requires sacrificing simplicity, portability, and modularity. We argue that this is because traditional programming models conflate the computations defining the algorithm with decisions about intermediate storage and the order of computation, which we call the schedule.
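To make the conflation concrete, here is a minimal sketch in plain Python (not Halide) of one two-stage pipeline written twice by hand. The function names, the 3-tap sum used as a stand-in for a blur, and the strip height are all assumptions chosen for illustration. Both versions compute the same output; they differ only in when the intermediate stage is computed and how much of it is stored at once.

```python
# Illustrative sketch only: plain Python, hypothetical names, no Halide.

def make_image(width, height):
    """Deterministic integer test image stored as a list of rows."""
    return [[(x * 7 + y * 13) % 32 for x in range(width)] for y in range(height)]

def blur_breadth_first(img):
    """Run each stage over the whole image; the intermediate is stored in full."""
    h, w = len(img), len(img[0])
    # Stage 1: horizontal 3-tap sum wherever the window fits.
    blur_x = [[img[y][x] + img[y][x + 1] + img[y][x + 2] for x in range(w - 2)]
              for y in range(h)]
    # Stage 2: vertical 3-tap sum over the stored intermediate.
    out = [[blur_x[y][x] + blur_x[y + 1][x] + blur_x[y + 2][x]
            for x in range(w - 2)]
           for y in range(h - 2)]
    return out, h * (w - 2)  # second value: intermediate values held at once

def blur_tiled(img, tile_rows=4):
    """Same result, but stage 1 is computed per strip of output rows."""
    h, w = len(img), len(img[0])
    out, peak = [], 0
    for y0 in range(0, h - 2, tile_rows):
        y1 = min(y0 + tile_rows, h - 2)
        # Output rows y0..y1-1 need stage-1 rows y0..y1+1 (two extra rows).
        strip = [[img[y][x] + img[y][x + 1] + img[y][x + 2] for x in range(w - 2)]
                 for y in range(y0, y1 + 2)]
        peak = max(peak, len(strip) * (w - 2))
        for y in range(y0, y1):
            r = y - y0
            out.append([strip[r][x] + strip[r + 1][x] + strip[r + 2][x]
                        for x in range(w - 2)])
    return out, peak
```

The tiled version keeps a much smaller intermediate live at any time, at the cost of recomputing two stage-1 rows at every strip boundary. The arithmetic is identical in both functions, yet the loop structure is so different that changing the organization means rewriting the code.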
We propose a new programming language for image processing pipelines, called Halide, that separates the algorithm from its schedule. Programmers can change the schedule to express many possible organizations of a single algorithm. The Halide compiler then synthesizes a globally combined loop nest for an entire algorithm, given a schedule. Halide models a space of schedules which is expressive enough to describe organizations that match or outperform state-of-the-art hand-written implementations of many computational photography and computer vision algorithms. The model is also simple enough that doing so often takes only a few lines of code, and small changes generate efficient implementations for x86, ARM, graphics processors (GPUs), and specialized image processors, all from a single algorithm.
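The separation can be sketched with a toy model in plain Python. This is not the Halide API: `realize`, the `schedule` dictionary, and its `"root"` and `"inline"` options are hypothetical names invented for this illustration. The algorithm is written once as pure functions of pixel coordinates, and a separate schedule value decides where the intermediate stage is computed and how the output loop is split.

```python
# Toy illustration only: hypothetical names, not the Halide API.

def blur_x(inp, x, y):
    """Algorithm, stage 1: horizontal 3-tap sum at (x, y)."""
    return inp[y][x] + inp[y][x + 1] + inp[y][x + 2]

def blur_y(producer, x, y):
    """Algorithm, stage 2: vertical 3-tap sum of stage 1 at (x, y)."""
    return producer(x, y) + producer(x, y + 1) + producer(x, y + 2)

def realize(inp, schedule):
    """Run a loop nest for blur_y chosen by `schedule`.

    schedule["blur_x"]    "root": compute and store all of stage 1 first;
                          "inline": recompute stage 1 at every use.
    schedule["tile_rows"] number of output rows per outer-loop iteration.
    """
    h, w = len(inp), len(inp[0])
    out_h, out_w = h - 2, w - 2
    if schedule["blur_x"] == "root":
        stored = [[blur_x(inp, x, y) for x in range(out_w)] for y in range(h)]
        producer = lambda x, y: stored[y][x]
    elif schedule["blur_x"] == "inline":
        producer = lambda x, y: blur_x(inp, x, y)
    else:
        raise ValueError("unknown placement for blur_x")
    out = [[0] * out_w for _ in range(out_h)]
    step = schedule["tile_rows"]
    for y0 in range(0, out_h, step):                  # outer loop over strips
        for y in range(y0, min(y0 + step, out_h)):    # rows inside one strip
            for x in range(out_w):
                out[y][x] = blur_y(producer, x, y)
    return out

image = [[(x * 5 + y * 3) % 17 for x in range(12)] for y in range(10)]
schedules = [
    {"blur_x": "root", "tile_rows": 10},
    {"blur_x": "inline", "tile_rows": 1},
    {"blur_x": "inline", "tile_rows": 4},
]
results = [realize(image, s) for s in schedules]
```

All three schedules yield the same image. Only the dictionary changes between runs, while `blur_x` and `blur_y` stay untouched. In this toy, the interpreter picks the loop nest at run time; in Halide, the abstract says the compiler synthesizes the combined loop nest from the schedule.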
Halide has been public and open source for over four years, during which it has been used by hundreds of programmers to deploy code to tens of thousands of servers and hundreds of millions of phones, processing billions of images every day.

Published In

Communications of the ACM, Volume 61, Issue 1
January 2018
110 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/3176926
This work is licensed under a Creative Commons Attribution-NoDerivs International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • NSF
  • Department of Energy
