- research-article, October 2023
Optimizing K-Mer Fingerprint Generation for Machine Learning
BCB '23: Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, Article No.: 101, Pages 1–5, https://doi.org/10.1145/3584371.3612946
With the increasing availability of genomic data obtained through Whole-Genome Sequencing (WGS), Machine Learning (ML) algorithms are being used to analyze this data. However, processing large datasets or files poses challenges. One approach is to ...
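As a hedged illustration of the idea in the entry above (not the paper's actual pipeline), a k-mer fingerprint can be sketched in plain Python: slide a length-k window over a DNA sequence and count occurrences of every possible k-mer, yielding a fixed-length feature vector that ML models can consume. The function name and alphabet handling here are assumptions for illustration.

```python
from collections import Counter
from itertools import product

def kmer_fingerprint(seq, k=2):
    """Count every length-k window of `seq` over all 4**k DNA k-mers,
    returning a fixed-length vector in lexicographic k-mer order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    alphabet = (''.join(p) for p in product('ACGT', repeat=k))
    return [counts.get(kmer, 0) for kmer in alphabet]
```

Because the vector length depends only on k, fingerprints of different genomes are directly comparable as ML features.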
- abstract, June 2023
Brief Announcement: Optimized GPU-accelerated Feature Extraction for ORB-SLAM Systems
SPAA '23: Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, Pages 299–302, https://doi.org/10.1145/3558481.3591310
Reducing the execution time of the ORB-SLAM algorithm is a crucial aspect of autonomous vehicles since it is computationally intensive for embedded boards. We propose a parallel GPU-based implementation, able to run on embedded boards, of the Tracking part ...
- research-article, September 2021
Warp-centric K-Nearest Neighbor Graphs construction on GPU
ICPP Workshops '21: 50th International Conference on Parallel Processing Workshop, Article No.: 5, Pages 1–10, https://doi.org/10.1145/3458744.3474053
Recent advances and applications of machine learning algorithms are becoming more common in different fields. It is expected that some applications require the processing of large datasets with those algorithms, which leads to high computational costs. ...
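To make the data structure above concrete, here is a minimal brute-force sketch of k-nearest-neighbor graph construction. The paper's contribution is the warp-centric GPU parallelization of the distance computations; this sketch shows only what the graph is, with hypothetical names.

```python
import math

def knn_graph(points, k):
    """Brute-force KNN graph: for each point, the indices of its k
    closest other points by Euclidean distance."""
    graph = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in dists[:k]])
    return graph
```

The O(n²) distance step is exactly the part that maps well onto thousands of GPU threads.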
- poster, June 2021
CharminG: A Scalable GPU-resident Runtime System
HPDC '21: Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing, Pages 261–262, https://doi.org/10.1145/3431379.3464454
Host-driven execution of applications on modern GPU-accelerated systems suffers from frequent host-device synchronizations, data movement, and limited flexibility in scheduling user tasks. We present CharminG, a runtime system designed to run entirely on ...
- research-article, June 2021
SnuRHAC: A Runtime for Heterogeneous Accelerator Clusters with CUDA Unified Memory
HPDC '21: Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing, Pages 107–120, https://doi.org/10.1145/3431379.3460647
This paper proposes a framework called SnuRHAC, which provides an illusion of a single GPU for the multiple GPUs in a cluster. Under SnuRHAC, a CUDA program designed to use a single GPU can utilize multiple GPUs in a cluster without any source code ...
- research-article, June 2021
TEMPI: An Interposed MPI Library with a Canonical Representation of CUDA-aware Datatypes
HPDC '21: Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing, Pages 95–106, https://doi.org/10.1145/3431379.3460645
MPI derived datatypes are an abstraction that simplifies handling of non-contiguous data in MPI applications. These datatypes are recursively constructed at runtime from primitive Named Types defined in the MPI standard. More recently, the development ...
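As a rough sketch of what "canonicalizing" a derived datatype means (illustrative only, not TEMPI's actual API): an MPI vector datatype describes count blocks of blocklength elements separated by a stride, and expanding that recursive description into a flat list of element offsets is what lets a library reason uniformly about packing.

```python
def vector_offsets(count, blocklength, stride):
    """Flatten an MPI_Type_vector-style description (count blocks of
    blocklength elements, stride elements apart) into element offsets."""
    return [b * stride + e for b in range(count) for e in range(blocklength)]

def pack(buffer, offsets):
    """Gather the non-contiguous elements into a contiguous list,
    as a pack routine would before sending."""
    return [buffer[o] for o in offsets]
```

On a GPU, the gather loop becomes one thread per offset, which is why a canonical flat form is convenient.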
- research-article, October 2020
SphericRTC: A System for Content-Adaptive Real-Time 360-Degree Video Communication
MM '20: Proceedings of the 28th ACM International Conference on Multimedia, Pages 3595–3603, https://doi.org/10.1145/3394171.3413999
We present the SphericRTC system for real-time 360-degree video communication. 360-degree video allows the viewer to observe the environment in any direction from the camera location. This more-immersive streaming experience allows users to more-...
- research-article, September 2020
cuSZ: An Efficient GPU-Based Error-Bounded Lossy Compression Framework for Scientific Data
- Jiannan Tian,
- Sheng Di,
- Kai Zhao,
- Cody Rivera,
- Megan Hickman Fulp,
- Robert Underwood,
- Sian Jin,
- Xin Liang,
- Jon Calhoun,
- Dingwen Tao,
- Franck Cappello
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 3–15, https://doi.org/10.1145/3410463.3414624
Error-bounded lossy compression is a state-of-the-art data reduction technique for HPC applications because it not only significantly reduces storage overhead but also can retain high fidelity for postanalysis. Because supercomputers and HPC ...
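A minimal sketch of the error-bound guarantee behind compressors like cuSZ (the real framework adds prediction and entropy coding; this shows only the bound): quantizing each value to the nearest integer multiple of 2·eb keeps the reconstruction error within eb.

```python
def quantize(values, eb):
    """Map each value to the nearest integer multiple of 2*eb."""
    return [round(v / (2 * eb)) for v in values]

def dequantize(codes, eb):
    """Reconstruct values; |original - reconstructed| <= eb by construction."""
    return [c * 2 * eb for c in codes]
```

The small-integer codes are then highly compressible, which is where the storage savings come from.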
- research-article, August 2020
Detailed Analysis and Optimization of CUDA K-means Algorithm
ICPP '20: Proceedings of the 49th International Conference on Parallel Processing, Article No.: 69, Pages 1–11, https://doi.org/10.1145/3404397.3404426
K-means is one of the most frequently used algorithms for unsupervised clustering data analysis. Individual steps of the k-means algorithm include nearest neighbor finding, efficient distance computation, and cluster-wise reduction, which may be ...
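The three steps named in the abstract can be sketched in one plain-Python Lloyd iteration (a toy serial version, not the paper's CUDA kernels): distance computation, nearest-centroid assignment, and cluster-wise reduction to new means.

```python
def kmeans_step(points, centroids):
    """One Lloyd iteration: assign each point to its nearest centroid,
    then reduce each cluster to its mean."""
    k, dim = len(centroids), len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # distance computation + nearest-centroid assignment
        c = min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
        counts[c] += 1
        for d in range(dim):  # cluster-wise reduction
            sums[c][d] += p[d]
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
            for j in range(k)]
```

On a GPU each of these steps parallelizes differently, which is why the paper analyzes them individually.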
- research-article, August 2020
Massively parallel rendering of complex closed-form implicit surfaces
ACM Transactions on Graphics (TOG), Volume 39, Issue 4, Article No.: 141, Pages 141:1–141:10, https://doi.org/10.1145/3386569.3392429
We present a new method for directly rendering complex closed-form implicit surfaces on modern GPUs, taking advantage of their massive parallelism. Our model representation is unambiguously solid, can be sampled at arbitrary resolution, and supports both ...
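To illustrate what "unambiguously solid" means for a closed-form implicit surface (a toy example, not the paper's representation): the model is a function f(x, y, z) that is negative inside the solid and positive outside, so rendering reduces to evaluating f at many sample points in parallel.

```python
def sphere(x, y, z, r=1.0):
    """Closed-form implicit function: negative inside, positive outside."""
    return x * x + y * y + z * z - r * r

def classify(f, samples):
    """Evaluate f at each sample point; True means inside the solid."""
    return [f(*p) < 0 for p in samples]
```

Because every sample is independent, this evaluation maps directly onto the GPU's massive parallelism.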
- short-paper, April 2019
Simultaneous Solving of Batched Linear Programs on a GPU
ICPE '19: Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering, Pages 59–66, https://doi.org/10.1145/3297663.3310308
Linear Programs (LPs) appear in a large number of applications. Offloading the LP solving tasks to a GPU is viable to accelerate an application's performance. Existing work on offloading and solving an LP on a GPU shows that performance can be ...
Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects
ICPE '19: Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering, Pages 209–218, https://doi.org/10.1145/3297663.3310299
Data-intensive applications such as machine learning and analytics have created a demand for faster interconnects to avert the memory bandwidth wall and allow GPUs to be effectively leveraged for lower compute intensity tasks. This has resulted in wide ...
- research-article, March 2018
Fast and accurate volume data curvature determination using GPGPU computation
ACMSE '18: Proceedings of the 2018 ACM Southeast Conference, Article No.: 19, Pages 1–8, https://doi.org/10.1145/3190645.3190681
A methodology for fast determination of a key shape feature in volume datasets using a GPU is described. The shape feature, surface curvature, which is a valuable descriptor for structure classification and dataset registration applications, can be time-...
- short-paper, August 2017
An Out-of-Core GPU based Dimensionality Reduction Algorithm for Big Mass Spectrometry Data and Its Application in Bottom-up Proteomics
ACM-BCB '17: Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, Pages 550–555, https://doi.org/10.1145/3107411.3107466
Modern high resolution Mass Spectrometry instruments can generate millions of spectra in a single systems biology experiment. Each spectrum consists of thousands of peaks but only a small number of peaks actively contribute to deduction of peptides. ...
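A hedged sketch of the dimensionality-reduction idea in the abstract (the peak-count threshold and function name are illustrative assumptions): each spectrum is a list of (m/z, intensity) peaks, and only the highest-intensity peaks are kept before downstream peptide deduction.

```python
def reduce_spectrum(peaks, keep=3):
    """Keep only the `keep` most intense (m/z, intensity) peaks,
    returned in ascending m/z order."""
    top = sorted(peaks, key=lambda p: p[1], reverse=True)[:keep]
    return sorted(top)  # tuples sort by m/z, the first element
```

Applied across millions of spectra, this per-spectrum selection is the embarrassingly parallel workload the out-of-core GPU algorithm targets.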
- poster, September 2016
POSTER: hVISC: A Portable Abstraction for Heterogeneous Parallel Systems
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 443–445, https://doi.org/10.1145/2967938.2976039
Programming heterogeneous parallel systems can be extremely complex because a single system may include multiple different parallelism models, instruction sets, and memory hierarchies, and different systems use different combinations of these features. ...
- poster, September 2016
POSTER: Collective Dynamic Parallelism for Directive Based GPU Programming Languages and Compilers
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, Pages 423–424, https://doi.org/10.1145/2967938.2974056
Early programs for GPU (Graphics Processing Units) acceleration were based on a flat, bulk parallel programming model, in which programs had to perform a sequence of kernel launches from the host CPU. In the latest releases of these devices, dynamic (or ...
- poster, July 2016
A CUDA Implementation of an Improved Decomposition Based Evolutionary Algorithm for Multi-Objective Optimization
GECCO '16 Companion: Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion, Pages 71–72, https://doi.org/10.1145/2908961.2908971
In the last few years, the concept of decomposition has been extensively used in a number of evolutionary algorithms, wherein a multiobjective problem is solved as a set of single objective sub-problems. Such algorithms have demonstrated significant break-...
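The decomposition concept mentioned above can be sketched with the simplest scalarization, a weighted sum (one common choice among several; the function name is illustrative): each weight vector turns the multi-objective vector into one single-objective sub-problem.

```python
def decompose(objectives, weight_vectors):
    """Weighted-sum decomposition: scalarize a multi-objective vector into
    one single-objective value per weight vector (sub-problem)."""
    return [sum(w * f for w, f in zip(ws, objectives)) for ws in weight_vectors]
```

Since every sub-problem is evaluated independently, the set of scalarizations is a natural unit of GPU parallelism.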
- research-article, May 2016
Faster and Cheaper: Parallelizing Large-Scale Matrix Factorization on GPUs
HPDC '16: Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, Pages 219–230, https://doi.org/10.1145/2907294.2907297
Matrix factorization (MF) is used by many popular algorithms such as collaborative filtering. A GPU, with massive cores and high memory bandwidth, sheds light on accelerating MF much further when its architectural characteristics are appropriately exploited. ...
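To show what is being parallelized, here is a toy serial SGD matrix factorization for collaborative filtering (hyperparameters and names are assumptions; the paper's contribution is batching such updates efficiently on GPUs): learn user and item factor vectors whose dot products approximate observed ratings.

```python
import random

def factorize(ratings, n_users, n_items, rank=2, lr=0.05, epochs=2000, seed=0):
    """Learn factors so dot(U[u], V[i]) approximates rating r for each
    observed (u, i, r) triple, via plain stochastic gradient descent."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for f in range(rank):  # gradient step on both factor vectors
                U[u][f], V[i][f] = (U[u][f] + lr * err * V[i][f],
                                    V[i][f] + lr * err * U[u][f])
    return U, V
```

The inner update touches only one user row and one item row, which is what makes many such updates runnable concurrently on a GPU.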
- research-article, May 2016
GPU Delegation: Toward a Generic Approach for Developping MABS using GPU Programming
AAMAS '16: Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, Pages 1249–1258
When using Multi-Agent Based Simulation (MABS), computing resource requirements often limit the extent to which a model can be experimented with. As the number of agents and the size of the environment are constantly growing in these simulations, using General-...
- research-article, June 2015
Automated GPU Kernel Transformations in Large-Scale Production Stencil Applications
HPDC '15: Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, Pages 259–270, https://doi.org/10.1145/2749246.2749255
This paper proposes an end-to-end framework for automatically transforming stencil-based CUDA programs to exploit inter-kernel data locality. The CUDA-to-CUDA transformation collectively replaces the user-written kernels by auto-generated kernels ...
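The inter-kernel data locality idea can be illustrated with a toy 1D example (periodic boundaries and names are assumptions, not the framework's transformation): two back-to-back 3-point stencil passes compose into a single 5-point stencil, so the intermediate array never round-trips through memory.

```python
def blur(a):
    """3-point periodic averaging stencil."""
    n = len(a)
    return [(a[(i - 1) % n] + a[i] + a[(i + 1) % n]) / 3 for i in range(n)]

def two_kernels(a):
    return blur(blur(a))  # intermediate array is written out, then read back

def fused_kernel(a):
    """Composing the two 3-point stencils yields one 5-point stencil
    with weights [1, 2, 3, 2, 1] / 9, eliminating the intermediate array."""
    n = len(a)
    w = [1, 2, 3, 2, 1]
    return [sum(w[k] * a[(i + k - 2) % n] for k in range(5)) / 9
            for i in range(n)]
```

Both versions compute the same result; the fused form trades redundant arithmetic for far less memory traffic, which is the winning trade on GPUs.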