DOI: 10.1109/ISCA.2005.28

Exploiting Structural Duplication for Lifetime Reliability Enhancement

Published: 01 May 2005

Abstract

Increased power densities (and resultant temperatures) and other effects of device scaling are predicted to cause significant lifetime reliability problems in the near future. In this paper, we study two techniques that leverage microarchitectural structural redundancy for lifetime reliability enhancement. First, in structural duplication (SD), redundant microarchitectural structures are added to the processor and designated as spares. Spare structures can be turned on when the original structure fails, increasing the processor's lifetime. Second, graceful performance degradation (GPD) is a technique which exploits existing microarchitectural redundancy for reliability. Redundant structures that fail are shut down while still maintaining functionality, thereby increasing the processor's lifetime, but at a lower performance. Our analysis shows that exploiting structural redundancy can provide significant reliability benefits, and we present guidelines for efficient usage of these techniques by identifying situations where each is more beneficial. We show that GPD is the superior technique when only limited performance or cost resources can be sacrificed for reliability. Specifically, on average for our systems and applications, GPD increased processor reliability to 1.42 times the base value for less than a 5% loss in performance. On the other hand, for systems where reliability is more important than performance or cost, SD is more beneficial. SD increases reliability to 3.17 times the base value for 2.25 times the base cost, for our applications. Finally, a combination of the two techniques (SD+GPD) provides the highest reliability benefit.
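
The difference between SD and GPD can be illustrated with a small Monte Carlo sketch. This is a toy model under stated assumptions, not the paper's methodology: unit lifetimes are drawn from an exponential distribution purely for simplicity, a spare is modeled as a cold standby whose lifetime starts when the original fails, and the `groups` layout, the function names, and all numeric values are hypothetical. Performance loss and area cost are not modeled.

```python
import random


def processor_lifetime(rng, groups, spares=0, degrade=False):
    """Sample one processor lifetime.

    groups  -- list of (units, mean_life): each group is one kind of
               structure with `units` identical copies (hypothetical layout).
    spares  -- SD: cold spares added per unit; a spare is switched on when
               the unit it backs up fails, so their lifetimes add.
    degrade -- GPD: a group keeps working (at lower performance) until its
               last unit fails, instead of failing with its first unit.
    """
    group_lives = []
    for units, mean_life in groups:
        unit_lives = []
        for _ in range(units):
            # Assumption: exponential lifetimes, chosen only for simplicity.
            life = sum(rng.expovariate(1.0 / mean_life)
                       for _ in range(1 + spares))
            unit_lives.append(life)
        group_lives.append(max(unit_lives) if degrade else min(unit_lives))
    # The processor fails as soon as any group has failed.
    return min(group_lives)


def mean_lifetime(groups, spares=0, degrade=False, trials=20000, seed=0):
    """Estimate the mean time to failure by averaging sampled lifetimes."""
    rng = random.Random(seed)
    total = sum(processor_lifetime(rng, groups, spares, degrade)
                for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Hypothetical processor: one duplicated structure and one single
    # structure, each unit with a mean lifetime of 30 (arbitrary units).
    groups = [(2, 30.0), (1, 30.0)]
    for label, kwargs in [("base", {}),
                          ("SD", {"spares": 1}),
                          ("GPD", {"degrade": True}),
                          ("SD+GPD", {"spares": 1, "degrade": True})]:
        print(f"{label:7s} mean lifetime ~ {mean_lifetime(groups, **kwargs):.1f}")
```

In this toy setting both techniques lengthen the estimated lifetime relative to the base configuration, and combining them lengthens it further, which matches the qualitative ordering in the abstract. The magnitudes depend entirely on the assumed distribution and layout and should not be compared with the 1.42x and 3.17x figures reported above.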

Published In

ISCA '05: Proceedings of the 32nd annual international symposium on Computer Architecture
June 2005
541 pages
ISBN:076952270X
Also published in: ACM SIGARCH Computer Architecture News, Volume 33, Issue 2
ISCA 2005
May 2005
531 pages
ISSN: 0163-5964
DOI: 10.1145/1080695

Publisher

IEEE Computer Society

United States

Conference

ISCA05
Acceptance Rates

ISCA '05 Paper Acceptance Rate 45 of 194 submissions, 23%;
Overall Acceptance Rate 543 of 3,203 submissions, 17%
