Proceeding Downloads
Evaluating Autoencoders for Dimensionality Reduction of MRI-derived Radiomics and Classification of Malignant Brain Tumors
Malignant brain tumors including parenchymal metastatic (MET) lesions, glioblastomas (GBM), and lymphomas (LYM) account for 29.7% of brain cancers. However, the characterization of these tumors from MRI imaging is difficult due to the similarity of ...
LearnedSort as a learning-augmented SampleSort: Analysis and Parallelization
This work analyzes and parallelizes LearnedSort, the novel algorithm that sorts using machine learning models based on the cumulative distribution function. LearnedSort is analyzed under the lens of algorithms with predictions, and it is argued that ...
Indexing Temporal Relations for Range-Duration Queries
Temporal information plays a crucial role in many database applications; however, support for queries on such data is limited. We present an index structure, termed RD-index, to support range-duration queries over interval timestamped relations, which ...
SciDG: Benchmarking Scientific Dynamic Graph Queries
Dynamic graphs are increasingly being utilized in domain knowledge modeling and large-scale scientific data management. Managing dynamic graph data requires a graph database system that can handle constantly changing volumes and data versions, while ...
Data Driven Dimensionality Reduction to Improve Modeling Performance
In a number of applications, data may be anonymized, obfuscated, or highly noisy. In such cases, it is difficult to use domain knowledge or low-dimensional visualizations to engineer the features for tasks such as machine learning. Instead, we explore ...
Privacy-Preserving OLAP via Modeling and Analysis of Query Workloads: Innovative Theories and Theorems
This paper proposes innovative theories and theorems in the context of a state-of-the-art paper that computes privacy-preserving OLAP cubes via modeling and analyzing query workloads. The work contributes to actual literature by devising a solid ...
ESM2-Tree: An maintenance efficient authentication data structure in blockchain
Blockchain technology is gaining broader attention. Owing to its immutability property and byzantine fault-tolerance consensus protocol, blockchain offers a brand new trusted data-sharing solution. Some researchers use blockchain to drive autonomous ...
ST-CopulaGNN: A Multi-View Spatio-Temporal Graph Neural Network for Traffic Forecasting
Modern cities heavily rely on complex transportation, making accurate traffic speed prediction crucial for traffic management authorities. Classical methods, including statistical techniques and traditional machine learning techniques, fail to capture ...
Towards Efficient Discovery of Spatially Interesting Patterns in Geo-referenced Sequential Databases
A geo-referenced time series is a crucial form of spatiotemporal data. Useful information that can empower the users to achieve economic development is hidden in this series. When confronted with this problem, researchers modeled this series as a ...
Multi-representations Space Separation based Graph-level Anomaly-aware Detection
Graph structure patterns have recently been widely used to model data from different areas. How to detect anomalous graph information on these graph data has become a popular research problem. The objective of this research is centered on the particular issue that ...
Federated Learning on Personal Data Management Systems: Decentralized and Reliable Secure Aggregation Protocols
The development and adoption of personal data management systems (PDMS) has been fueled by legal and technical means such as smart disclosure, data portability and data altruism. By using a PDMS, individuals can effortlessly gather and share data, ...
A Computer Vision Approach for Detecting Discrepancies in Map Textual Labels
- Abdulrahman Salama,
- Mahmoud Elkamhawy,
- Mohamed Ali,
- Ehab Al-Masri,
- Adel Sabour,
- Abdeltawab Hendawi,
- Ming Tan,
- Vashutosh Agrawal,
- Ravi Prakash
Maps provide various sources of information. An important example of such information is textual labels such as cities, neighborhoods, and street names. Although we treat this information as facts, and despite the massive effort done by providers to ...
Accelerating Machine Learning Queries with Linear Algebra Query Processing
The rapid growth of large-scale machine learning (ML) models has led numerous commercial companies to utilize ML models for generating predictive results to help business decision-making. As two primary components in traditional predictive pipelines, ...
A Long-term Time Series Forecasting method with Multiple Decomposition
In various real-world applications such as weather forecasting, energy consumption planning, and traffic flow prediction, time serves as a critical variable. These applications can be collectively referred to as time-series prediction problems. Despite ...
Heterogeneous Graph Neural Network via Knowledge Relations for Fake News Detection
The proliferation of fake news in social media has been recognized as a severe problem for society, and substantial attempts have been devoted to fake news detection to alleviate the detrimental impacts. Knowledge graphs (KGs) comprise rich factual ...
Less is More: How Fewer Results Improve Progressive Join Query Processing
With the requirements to enable data analytics and exploration interactively and efficiently, progressive data processing, especially progressive join, became essential to data science. Join queries are particularly challenging due to the correlation ...
Fast Algorithm for Embedded Order Dependency Validation
Order Dependencies (ODs) have many applications, such as query optimization, data integration, and data cleaning. Although many works addressed the problem of discovering OD (and its variants), they do not consider datasets with missing values, a ...
MSLS: Meta-graph Search with Learnable Supernet for Heterogeneous Graph Neural Networks
In recent years, heterogeneous graph neural networks (HGNNs) have achieved excellent performance. The efficient HGNNs consist of meta-graphs and aggregation operations. Since manually designing meta-graph is an expert-dependent and time-consuming ...
InfoMoD: Information-theoretic Model Diagnostics
Validating and debugging machine learning models is done by testing them on unseen data. Analyzing model performance on various subsets of the data is critical for fairness, trust, bias detection and explainability. In this paper, we describe a new way ...
Decoupled Graph Neural Architecture Search with Variable Propagation Operation and Appropriate Depth
To alleviate the over-smoothing problem caused by deep graph neural networks, decoupled graph neural networks (DGNNs) are proposed. DGNNs decouple the graph neural network into two atomic operations, the propagation (P) operation and the transformation ...
Early ICU Mortality Prediction with Deep Federated Learning: A Real-World Scenario
The generation of large amounts of healthcare data has motivated the use of Machine Learning (ML) to train robust models for clinical tasks. However, limitations of local datasets and restrictions on sharing patient data impede the use of traditional ML ...
Privacy-Preserving Redaction of Diagnosis Data through Source Code Analysis
Protecting sensitive information in diagnostic data, such as logs, is a critical concern in the industrial software diagnosis and debugging process. While there are many tools developed to automatically redact the logs for identifying and removing ...
TGSLN: Time-aware Graph Structure Learning Network for Multi-variates Stock Sector Ranking Recommendation
In the field of financial prediction, most studies focus on individual stocks or stock indices. Stock sectors are collections of stocks with similar characteristics and the indices of sectors have more stable trends and predictability compared to ...
Selecting Efficient Cluster Resources for Data Analytics: When and How to Allocate for In-Memory Processing?
Distributed dataflow systems such as Apache Spark or Apache Flink enable parallel, in-memory data processing on large clusters of commodity hardware. Consequently, the appropriate amount of memory to allocate to the cluster is a crucial consideration. ...
Interactive Data Mashups for User-Centric Data Analysis
Nowadays, the amount of data is growing rapidly. Through data mining and analysis, information and knowledge can be derived based on this growing volume of data. Different tools have been introduced in the past to specify data analysis scenarios in a ...
Four Factors Affecting Missing Data Imputation
Missing data is a common problem in datasets and impacts the reliability of data analysis. Numerous methods to impute (i.e., predict and replace) missing values have been proposed. The quality of these imputed values depends on factors like correlation,...
Index Terms
- Proceedings of the 35th International Conference on Scientific and Statistical Database Management