
8th Workshop on Challenges for Parallel Computing

Published: 18 November 2013

Abstract

Over the last decade, interest in parallel programming has grown tremendously. Hardware systems that contain many different levels of parallelism have become mainstream. At one end of the spectrum, computer systems that contain many processing cores, each capable of running multiple hardware threads, are becoming commonplace. It is common to find laptop and desktop systems that contain a small number of these Shared-Memory Processor (SMP) chips. Furthermore, high-end computing systems can now contain hundreds of these SMP chips, resulting in machines capable of running an incredibly large number of hardware threads simultaneously. As processor speeds begin to stagnate, software developers are being forced to exploit the parallelism available in these systems in order to improve the performance of their applications.
At the other end of the spectrum, as commodity hardware prices fall, it is becoming increasingly affordable to build large-scale multi-node distributed machines. A survey of the top 10 supercomputers in the world (www.top500.org) shows that these systems contain an average of about 400,000 cores running at an average frequency of 2.5 GHz. Since the average clock frequency of these machines is fairly low, the full potential of these systems must be exploited through efficient use of the parallelism provided by the thousands of processors they contain.
New types of heterogeneous parallel computing systems have begun to emerge. These systems contain multiple types of processors - typically a powerful CPU core paired with some type of Graphics Processing Unit (GPU) or hardware accelerator unit. This type of heterogeneous system presents a new set of challenges for software developers in terms of how to distribute work among the different units, based on their capabilities, to maximize their utilization. Many advances in heterogeneous system design are still being realized as people continue to explore how to combine existing hardware in novel ways. Still others are exploring new advances in hardware design that can further increase the breadth of combinations that can be used to create heterogeneous systems. One striking example is the use of Field Programmable Gate Arrays (FPGAs) as re-configurable specialized processing units that can be included in a heterogeneous system to perform specialized work on demand.
Of course, all of these advances in the construction of large parallel machines, whether single-node SMPs or large distributed clusters, are made with the intention of providing more performance for the applications designed to run on these systems. Thus, it is imperative that we provide software developers with the means to exploit these systems. Programming models and languages are instrumental in allowing software developers to efficiently develop parallel applications with suitable performance. Unfortunately, the perfect programming language for dealing with the different types of parallel systems has yet to be found. Existing languages and models, such as OpenMP and the Message Passing Interface (MPI), are well established within their communities. However, it remains unclear whether they alone can provide solutions for all the different types of parallel systems that are available. Similarly, the Partitioned Global Address Space (PGAS) programming model is gaining traction within the community, as it provides a paradigm for developing parallel software with performance that is becoming increasingly competitive with existing approaches. Other emerging languages, such as OpenCL and OpenACC, provide a means to effectively develop applications for heterogeneous systems.
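As a minimal illustration of the shared-memory end of this spectrum, the sketch below uses OpenMP, one of the established models named above, to parallelize an element-wise vector addition in C. The array size, thread usage, and compilation flags are illustrative assumptions rather than anything prescribed by the workshop.

    /* A minimal OpenMP sketch; compile with e.g. "cc -fopenmp vecadd.c". */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000                  /* illustrative problem size */

    static double a[N], b[N], c[N];

    int main(void) {
        for (int i = 0; i < N; i++) {  /* serial initialization */
            a[i] = i * 0.5;
            b[i] = i * 2.0;
        }

        /* The runtime distributes independent iterations across the
           hardware threads of the SMP node. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[42] = %f (up to %d threads)\n", c[42], omp_get_max_threads());
        return 0;
    }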
The performance of parallel applications relies heavily on the underlying synchronization primitives used for concurrency control, so it is necessary to study the performance implications of these primitives. Programming scalable, massively parallel applications using fine-grained locking is a very challenging problem requiring significant expertise. Transactional Memory (TM) is emerging as a promising alternative to traditional lock-based synchronization. Transactional programming is easier for programmers because much of the burden of concurrency control is handled by the underlying system. This will become increasingly important as the productivity of software developers continues to be stressed.
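To make the contrast concrete, here is a small sketch (an assumed example, not from the workshop) of the same shared counter protected first with a conventional Pthreads mutex and then with a transaction, using GCC's experimental -fgnu-tm support; the workload and thread count are hypothetical.

    /* Lock-based vs. transactional synchronization, sketched with GCC;
       compile with "gcc -fgnu-tm -pthread tm_sketch.c". */
    #include <pthread.h>
    #include <stdio.h>

    #define ITERS 100000               /* hypothetical workload */
    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Lock-based: the programmer chooses the lock, its scope, and its
       acquisition order; fine-grained designs multiply this burden. */
    static void *inc_locked(void *arg) {
        for (int i = 0; i < ITERS; i++) {
            pthread_mutex_lock(&lock);
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    /* Transactional: the TM runtime detects conflicts and retries,
       so no lock is named and the critical section cannot deadlock. */
    static void *inc_tm(void *arg) {
        for (int i = 0; i < ITERS; i++)
            __transaction_atomic { counter++; }
        return NULL;
    }

    static void run(void *(*fn)(void *), const char *label) {
        pthread_t t[4];
        counter = 0;
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, fn, NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        printf("%s: counter = %ld\n", label, counter);
    }

    int main(void) {
        run(inc_locked, "mutex");
        run(inc_tm, "tm");
        return 0;
    }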
Compilers play a significant role in transforming and optimizing code for parallel execution. Most mainstream compilers offer some level of automatic parallelization, but there is still a long way to go. A good understanding of the hardware, especially in heterogeneous systems, is essential if the compiler and runtime systems are to leverage new hardware features.
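That gap is easy to see in a small example: of the two loops below, one has independent iterations and can in principle be parallelized automatically (for instance with GCC's -ftree-parallelize-loops option), while the other carries a dependence that defeats naive automatic parallelization. The loops themselves are an illustrative assumption, not taken from the workshop.

    /* Illustrative loops for automatic parallelization;
       try e.g. "gcc -O2 -ftree-parallelize-loops=4 loops.c". */
    #include <stdio.h>

    #define N 1000000                  /* illustrative size */
    static double x[N], y[N];

    int main(void) {
        for (int i = 0; i < N; i++)
            x[i] = (double)i;

        /* Independent iterations: a compiler can split this loop
           across threads automatically. */
        for (int i = 0; i < N; i++)
            y[i] = 2.0 * x[i];

        /* Loop-carried dependence: y[i] needs y[i-1], so a naive
           automatic transformation is unsafe and the loop stays serial. */
        for (int i = 1; i < N; i++)
            y[i] += y[i - 1];

        printf("y[N-1] = %f\n", y[N - 1]);
        return 0;
    }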
Tools that assist in the development, debugging, testing and analysis of parallel software are also of utmost importance. Debugging large parallel applications is a formidable task. Most debuggers offer minimal support for debugging parallel applications and are lagging in keeping up with new parallel programming models and paradigms. A recent challenge for parallel debuggers lies in scalability: it is essential for a debugger to have a small memory footprint so that both the debugger and the application can scale well. Organizing the large amount of debug information and presenting it to the programmer is a daunting task. Occupying a large system for several hours of interactive debugging is a rare luxury and is often not feasible. Therefore, it is crucial for the debugger to gather meaningful information and process it so as to best assist the programmer in diagnosing the problem.

The testing of parallel applications to ensure correct behaviour is an equally hard problem. The non-determinism inherent in parallel applications makes reliable and reproducible testing extremely difficult, if not impossible. New techniques and tools must be developed to assist with this as well. There is also an urgent need for more tools and infrastructure for performance tuning and profiling of parallel applications.
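The following sketch, again only an assumed illustration, shows why that non-determinism is so troublesome: the unsynchronized counter below races, so the printed result (and whether a test "passes") changes from run to run. Race-detection tools such as ThreadSanitizer (-fsanitize=thread in GCC/Clang) can flag the problem, which is exactly the kind of tooling the workshop is concerned with.

    /* A data race that makes test results non-reproducible;
       compile with "cc -pthread race.c" (add -fsanitize=thread to detect it). */
    #include <pthread.h>
    #include <stdio.h>

    static long shared = 0;            /* unsynchronized shared counter */

    static void *worker(void *arg) {
        for (int i = 0; i < 100000; i++)
            shared++;                  /* racy read-modify-write */
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);

        /* Rarely equals 400000, and differs between runs, so a simple
           pass/fail check cannot reproduce the failure reliably. */
        printf("shared = %ld\n", shared);
        return 0;
    }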
A study of parallel applications is crucial in order to understand the inherent parallelism available in a program. Such a study reveals important performance characteristics, possible performance gains, and the scalability of the application. Information on the nature of the parallelism (whether structured or unstructured) can be very useful in extending existing programming models.
The goals of this workshop are to bring together different groups from the parallel computing community (application developers, language developers, compiler and tools developers, and academic researchers) to further explore the current challenges that parallel computing faces and to present ideas on how to deal with these challenges.
Topics to be discussed in the workshop include, but are not limited to:
• Parallel architectures
• Parallel programming models & languages
• Concurrency control mechanisms
• Compiler, runtime, debugger, tools and infrastructure for parallel computing
• Parallel applications (scientific and non-scientific)
• Parallel performance evaluation
• New trends in parallel computing

Published In

CASCON '13: Proceedings of the 2013 Conference of the Center for Advanced Studies on Collaborative Research
November 2013
449 pages

Sponsors

  • IBM Canada
  • CAS: IBM Centers for Advanced Studies

Publisher

IBM Corp.

United States

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 24 of 90 submissions, 27%
