DOI: 10.5555/3049832.3049863

Parallel associative reductions in Halide

Published: 04 February 2017

Abstract

Halide is a domain-specific language for fast image processing that separates pipelines into the algorithm, which defines what values are computed, and the schedule, which defines how they are computed. Changes to the schedule are guaranteed not to change the results. While Halide supports parallelizing and vectorizing naturally data-parallel operations, it does not support the same scheduling for reductions. Instead, the programmer must create data parallelism by manually factoring reductions into multiple stages. This manipulation of the algorithm can introduce bugs, impairs readability and portability, and makes it impossible for automatic scheduling methods to parallelize reductions.
We describe a new Halide scheduling primitive, rfactor, which moves this factoring transformation into the schedule, as well as a novel synthesis-based technique that takes serial Halide reductions and synthesizes an equivalent binary associative reduction operator and its identity. This enables us to automatically replace the original pipeline stage with a pair of stages that first compute partial results over slices of the reduction domain and then combine them. Our technique permits parallelization and vectorization of Halide algorithms that previously required manipulating both the algorithm and the schedule.
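
The factoring described above can be illustrated outside Halide. The following is a minimal plain-Python sketch, not Halide code and not the paper's implementation. The names serial_reduce, factored_reduce, and is_associative_with_identity are hypothetical, and the sketch assumes the binary operator and its identity are already known rather than synthesized. It splits the reduction domain into contiguous slices, reduces each slice independently starting from the identity, and then combines the partial results. A brute-force check over a few sample values stands in for the synthesis step; it gives evidence of associativity, not a proof.

```python
from itertools import product


def serial_reduce(op, identity, values):
    """Serial reduction: fold values left to right, starting from identity."""
    acc = identity
    for v in values:
        acc = op(acc, v)
    return acc


def factored_reduce(op, identity, values, num_slices):
    """Two-stage reduction: partial results per slice, then a combine stage."""
    # Contiguous slices keep the original element order, so only
    # associativity (not commutativity) of op is required.
    chunk = max(1, -(-len(values) // num_slices))  # ceiling division
    slices = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    # Stage 1: each slice starts from the identity and does not depend on
    # any other slice, so these reductions could run in parallel.
    partials = [serial_reduce(op, identity, s) for s in slices]
    # Stage 2: combine the partial results with the same operator.
    return serial_reduce(op, identity, partials)


def is_associative_with_identity(op, identity, samples):
    """Check identity and associativity on a finite sample set only."""
    for a in samples:
        if op(identity, a) != a or op(a, identity) != a:
            return False
    return all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(samples, repeat=3))


def add(a, b):
    return a + b


def sub(a, b):
    return a - b
```

With an associative operator such as add (identity 0) or max (identity negative infinity), factored_reduce returns the same value as serial_reduce for any number of slices. With sub, which is not associative, the two disagree, which is why the factoring is only valid once an associative operator and its identity have been found.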

Published In

CGO '17: Proceedings of the 2017 International Symposium on Code Generation and Optimization
February 2017
317 pages
ISBN: 9781509049318

Publisher

IEEE Press

Acceptance Rates

CGO '17 Paper Acceptance Rate: 26 of 116 submissions, 22%
Overall Acceptance Rate: 312 of 1,061 submissions, 29%

Cited By

  • (2020) Ansor. Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation, pages 863-879. DOI: 10.5555/3488766.3488815. Online publication date: 4-Nov-2020.
  • (2020) Model-Based Warp Overlapped Tiling for Image Processing Programs on GPUs. Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, pages 317-328. DOI: 10.1145/3410463.3414649. Online publication date: 30-Sep-2020.
  • (2019) Automatic generation of warp-level primitives and atomic instructions for fast and portable parallel reduction on GPUs. Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization, pages 73-84. DOI: 10.5555/3314872.3314884. Online publication date: 16-Feb-2019.
  • (2018) Associative instruction reordering to alleviate register pressure. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pages 1-13. DOI: 10.5555/3291656.3291718. Online publication date: 11-Nov-2018.
  • (2018) Revealing parallel scans and reductions in recurrences through function reconstruction. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pages 1-13. DOI: 10.1145/3243176.3243204. Online publication date: 1-Nov-2018.
  • (2018) Differentiable programming for image processing and deep learning in halide. ACM Transactions on Graphics 37:4, pages 1-13. DOI: 10.1145/3197517.3201383. Online publication date: 30-Jul-2018.
  • (2018) Associative instruction reordering to alleviate register pressure. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pages 1-13. DOI: 10.1109/SC.2018.00049. Online publication date: 11-Nov-2018.
  • (2017) Halide. Communications of the ACM 61:1, pages 106-115. DOI: 10.1145/3150211. Online publication date: 27-Dec-2017.
