skip to main content
10.1145/1024393.1024414acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
Article

Locality phase prediction

Published: 07 October 2004 Publication History

Abstract

As computer memory hierarchy becomes adaptive, its performance increasingly depends on forecasting the dynamic program locality. This paper presents a method that predicts the locality phases of a program by a combination of locality profiling and run-time prediction. By profiling a training input, it identifies locality phases by sifting through all accesses to all data elements using variable-distance sampling, wavelet filtering, and optimal phase partitioning. It then constructs a phase hierarchy through grammar compression. Finally, it inserts phase markers into the program using binary rewriting. When the instrumented program runs, it uses the first few executions of a phase to predict all its later executions.Compared with existing methods based on program code and execution intervals, locality phase prediction is unique because it uses locality profiles, and it marks phase boundaries in program code. The second half of the paper presents a comprehensive evaluation. It measures the accuracy and the coverage of the new technique and compares it with best known run-time methods. It measures its benefit in adaptive cache resizing and memory remapping. Finally, it compares the automatic analysis with manual phase marking. The results show that locality phase prediction is well suited for identifying large, recurring phases in complex programs.

References

[1]
F. Allen and J. Cocke. A proram data flow analysis procedure. Communications of the ACM, 19:137--147, 1976.]]
[2]
R. Balasubramonian, D. Albonesi, A. Buyuktosunoglu, and S. Dwarkadas. Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures. In Proceedings of the 33rd International Symposium on Microarchitecture, Monterey, California, December 2000.]]
[3]
R. Balasubramonian, S. Dwarkadas, and D. H. Albonesi. Dynamically managing the communication-parallelism trade-off in future clustered processors. In Proceedings of International Symposium on Computer Architecture, San Diego, CA, June 2003.]]
[4]
A. P. Batson and A. W. Madison. Measurements of major locality phases in symbolic reference strings. In Proceedings of the ACM SIGMETRICS Conference on Measurement & Modeling Computer Systems, Cambridge, MA, March 1976.]]
[5]
T. M. Chilimbi. Efficient representations and abstractions for quantifying and exploiting data reference locality. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, Snowbird, Utah, June 2001.]]
[6]
R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.]]
[7]
I. Daubechies. Ten Lectures on Wavelets. Capital City Press, Montpelier,Vermont, 1992.]]
[8]
P.J. Denning. Working sets past and present. IEEE Transactions on Software Engineering, SE-6(1), January 1980.]]
[9]
A. S. Dhodapkar and J. E. Smith. Managing multi-configuration hardware via dynamic working-set analysis. In Proceedings of International Symposium on Computer Architecture, Anchorage, Alaska, June 2002.]]
[10]
A. S. Dhodapkar and J. E. Smith. Comparing program phase detection techniques. In Proceedings of International Symposium on Microarchitecture, December 2003.]]
[11]
C. Ding and K. Kennedy. Improving cache performance in dynamic applications through data and computation reorganization at run time. In Proceedings of the SIGPLAN '99 Conference on Programming Language Design and Implementation, Atlanta, GA, May 1999.]]
[12]
C. Ding and Y. Zhong. Predicting whole-program locality with reuse distance analysis. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, CA, June 2003.]]
[13]
E. Duesterwald, C. Cascaval, and S. Dwarkadas. Characterizing and predicting program behavior and its variability. In Proceedings of International Conference on Parallel Architectures and Compilation Techniques, New Orleans, Louisiana, September 2003.]]
[14]
C. Fang, S. Carr, S. Onder, and Z. Wang. Reuse-distance-based miss-rate prediction on a per instruction basis. In Proceedings of the first ACM SIGPLAN Workshop on Memory System Performance, Washington DC, June 2004.]]
[15]
H. Han and C. W. Tseng. Improving locality for adaptive irregular scientific codes. In Proceedings of Workshop on Languages and Compilers for High-Performance Computing (LCPC'00), White Plains, NY, August 2000.]]
[16]
J. E. Hopcroft and J. D. Ullman. Introduction to automata theory, languages, and computation. Addison-Wesley, 1979.]]
[17]
C.-H. Hsu and U. Kermer. The design, implementation and evaluation of a compiler algorithm for CPU energy reduction. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, CA, June 2003.]]
[18]
R. Joseph, Z. Hu, and M. Martonosi. Wavelet analysis for microprocessor design: Experiences with wavelet-based di/dt characterization. In Proceedings of International Symposium on High Performance Computer Architecture, February 2004.]]
[19]
J. R. Larus. Whole program paths. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, Atlanta, Georgia, May 1999.]]
[20]
C. Luk and T. C. Mowry. Memory forwarding: enabling aggressive layout optimizations by guaranteeing the safety of data relocation. In Proceedings of International Symposium on Computer Architecture, Atlanta, GA, May 1999.]]
[21]
M. Huang and J. Renau and J. Torrellas. Positional adaptation of processors: application to energy reduction. In Proceedings of the International Symposium on Computer Architecture, San Diego, CA, June 2003.]]
[22]
G. Magklis, M. L. Scott, G. Semeraro, D. H. Albonesi, and S. Dropsho. Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor. In Proceedings of the International Symposium on Computer Architecture, San Diego, CA, June 2003.]]
[23]
G. Marin and J. Mellor-Crummey. Cross architecture performance predictions for scientific applications using parameterized models. In Proceedings of Joint International Conference on Measurement and Modeling of Computer Systems, New York City, NY, June 2004.]]
[24]
R. L. Mattson, J. Gecsei, D. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM System Journal, 9(2):78--117, 1970.]]
[25]
J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. International Journal of Parallel Programming, 29(3), June 2001.]]
[26]
C. G. Nevill-Manning and I. H. Witten. Identifying hierarchical structure in sequences: a linear-time algorithm. Journal of Artificial Intelligence Research, 7:67--82, 1997.]]
[27]
V. S. Pingali, S. A. McKee, W. C. Hsieh, and J. B. Carter. Restructuring computations for temporal data cache locality. International Journal of Parallel Programming, 31(4), August 2003.]]
[28]
X. Shen, Y. Zhong, and C. Ding. Regression-based multi-model prediction of data reuse signature. In Proceedings of the 4th Annual Symposium of the Las Alamos Computer Science Institute, Sante Fe, New Mexico, November 2003.]]
[29]
T. Sherwood, E. Perelman, and B. Calder. Basic block distribution analysis to find periodic behavior and simulation points in applications. In Proceedings of International Conference on Parallel Architectures and Compilation Techniques, Barcelona, Spain, September 2001.]]
[30]
T. Sherwood, S. Sair, and B. Calder. Phase tracking and prediction. In Proceedings of International Symposium on Computer Architecture, San Diego, CA, June 2003.]]
[31]
A. Srivastava and A. Eustace. ATOM: A system for building customized program analysis tools. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, Orlando, Florida, June 1994.]]
[32]
M. M. Strout, L. Carter, and J. Ferrante. Compile-time composition of run-time data and iteration reorderings. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, CA, June 2003.]]
[33]
R. A. Sugumar and S. G. Abraham. Efficient simulation of caches under optimal replacement with applications to miss characterization. In Proceedings of the ACM SIGMETRICS Conference on Measurement & Modeling Computer Systems, Santa Clara, CA, May 1993.]]
[34]
L. Zhang. Efficient Remapping Mechanism for an Adaptive Memory System. PhD thesis, Department of Computer Science, University of Utah, August 2000.]]
[35]
L. Zhang, Z. Fang, M. Parker, B. K. Mathew, L. Schaelicke, J. B. Carter, W. C. Hsieh, and S. A. McKee. The Impulse memory controller. IEEE Transactions on Computers, 50(11), November 2001.]]
[36]
Y. Zhong, M. Orlovich, X. Shen, and C. Ding. Array regrouping and structure splitting using whole-program reference affinity. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2004.]]

Cited By

View all
  • (2024)Viper: Utilizing Hierarchical Program Structure to Accelerate Multi-Core SimulationIEEE Access10.1109/ACCESS.2024.335406912(17669-17678)Online publication date: 2024
  • (2023)Beacons: An End-to-End Compiler Framework for Predicting and Utilizing Dynamic Loop CharacteristicsProceedings of the ACM on Programming Languages10.1145/36228037:OOPSLA2(173-203)Online publication date: 16-Oct-2023
  • (2021)Online Working Set Change Detection with Constant ComplexityProceedings of the International Symposium on Memory Systems10.1145/3488423.3519332(1-16)Online publication date: 27-Sep-2021
  • Show More Cited By

Recommendations

Reviews

Peter C. Patton

The principles of task and data locality are very important in computer architecture, especially in the application of cache memory to bridge the speed mismatch between a central processing unit (CPU) and its primary memory. This paper goes well beyond qualitative consideration of the principles of locality, to describe a detailed, quantitative evaluation of their effects in actual memory hierarchies, for an actual benchmark program. As memory hierarchies become adaptive, performance becomes dependent on forecasting dynamic locality. This paper presents a method that predicts the locality phases of a program, by a combination of locality profiling and runtime prediction. Previous studies on the manual adaptation of programs by dynamic data reorganization, at both the hardware and application program levels, have shown improvement in cache locality and prefetching efficiency. Unfortunately, such manual analysis is difficult, and depends on the user's intimate knowledge of the program and its data sets. Most large-scale scientific and engineering computations require substantial computer time, and also have recurring locality phases. These computations are good candidates for adaptation using the method described in this paper. Application of the three-step method to the familiar tomcatv benchmark predicts the locality phase of this program with near perfect accuracy, showing a cache size reduction of 40 percent, without additional misses, and a program speedup of 35 percent. This is an excellent paper on quantitative computer architecture; even Hennessy and Patterson would be impressed. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
ASPLOS XI: Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
October 2004
296 pages
ISBN:1581138040
DOI:10.1145/1024393
  • cover image ACM SIGARCH Computer Architecture News
    ACM SIGARCH Computer Architecture News  Volume 32, Issue 5
    ASPLOS 2004
    December 2004
    283 pages
    ISSN:0163-5964
    DOI:10.1145/1037947
    Issue’s Table of Contents
  • cover image ACM SIGOPS Operating Systems Review
    ACM SIGOPS Operating Systems Review  Volume 38, Issue 5
    ASPLOS '04
    December 2004
    283 pages
    ISSN:0163-5980
    DOI:10.1145/1037949
    Issue’s Table of Contents
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 39, Issue 11
    ASPLOS '04
    November 2004
    283 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1037187
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 October 2004

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. dynamic optimization
  2. locality analysis and optimization
  3. phase hierarchy
  4. program phase analysis and prediction
  5. reconfigurable architecture

Qualifiers

  • Article

Conference

ASPLOS04

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)29
  • Downloads (Last 6 weeks)4
Reflects downloads up to 16 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Viper: Utilizing Hierarchical Program Structure to Accelerate Multi-Core SimulationIEEE Access10.1109/ACCESS.2024.335406912(17669-17678)Online publication date: 2024
  • (2023)Beacons: An End-to-End Compiler Framework for Predicting and Utilizing Dynamic Loop CharacteristicsProceedings of the ACM on Programming Languages10.1145/36228037:OOPSLA2(173-203)Online publication date: 16-Oct-2023
  • (2021)Online Working Set Change Detection with Constant ComplexityProceedings of the International Symposium on Memory Systems10.1145/3488423.3519332(1-16)Online publication date: 27-Sep-2021
  • (2021)A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006ACM Transactions on Architecture and Code Optimization10.1145/344620018:2(1-20)Online publication date: 8-Mar-2021
  • (2021)DNA: Dynamic Resource Allocation for Soft Real-Time Multicore Systems2021 IEEE 27th Real-Time and Embedded Technology and Applications Symposium (RTAS)10.1109/RTAS52030.2021.00024(196-209)Online publication date: May-2021
  • (2020)Computational Geometry Column 70ACM SIGACT News10.1145/3388392.338840451:1(105-117)Online publication date: 12-Mar-2020
  • (2020)Mocktails: Capturing the Memory Behaviour of Proprietary Mobile Architectures2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA45697.2020.00046(460-472)Online publication date: May-2020
  • (2019)Evaluating the effectiveness of program data features for guiding memory managementProceedings of the International Symposium on Memory Systems10.1145/3357526.3357537(383-395)Online publication date: 30-Sep-2019
  • (2019)DynaSprintProceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture10.1145/3352460.3358301(426-439)Online publication date: 12-Oct-2019
  • (2019)Coding Scheme Optimization for Fast Fluorescence Lifetime ImagingACM Transactions on Graphics10.1145/332513638:3(1-16)Online publication date: 7-Jun-2019
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media