RaSQL: Greater Power and Performance for Big Data Analytics with Recursive-aggregate-SQL on Spark

J Gu, YH Watanabe, WA Mazza, A Shkapsky, M Yang, L Ding, C Zaniolo
Proceedings of the 2019 International Conference on Management of Data, 2019
Thanks to a simple SQL extension, Recursive-aggregate-SQL (RaSQL) can express very powerful queries and declarative algorithms, such as classical graph algorithms and data mining algorithms. A novel compiler implementation allows RaSQL to map declarative queries into one basic fixpoint operator supporting aggregates in recursive queries. A fully optimized implementation of this fixpoint operator leads to superior performance, scalability and portability. Thus, our RaSQL system, which extends Spark SQL with the aforementioned new constructs and implementation techniques, matches and often surpasses the performance of other systems, including Apache Giraph, GraphX and Myria.
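
To make the idea of a fixpoint operator that supports aggregates in recursion more concrete, the following is a minimal Python sketch. It is not RaSQL syntax and not the system's Spark implementation. It illustrates the general technique on single-source shortest paths, where a `min` aggregate is applied inside the recursion and only facts that improved in the previous round are re-expanded (semi-naive evaluation). The function name `sssp_fixpoint`, the edge-tuple layout, and the sample graph are assumptions made for illustration.

```python
from collections import defaultdict


def sssp_fixpoint(edges, source):
    """Single-source shortest paths as a fixpoint with a min aggregate.

    Illustrative only: `edges` is a list of (src, dst, cost) tuples, a
    hypothetical layout chosen for this sketch. Assumes non-negative
    costs so that the iteration terminates.
    """
    # Index edges by source node: src -> list of (dst, cost).
    adj = defaultdict(list)
    for src, dst, cost in edges:
        adj[src].append((dst, cost))

    best = {source: 0}   # aggregated relation: node -> minimum cost so far
    delta = {source: 0}  # facts that were new or improved in the last round

    # Iterate until no fact improves, i.e. the fixpoint is reached.
    while delta:
        new_delta = {}
        for node, cost in delta.items():
            for dst, weight in adj[node]:
                candidate = cost + weight
                # The min aggregate is applied inside the recursion:
                # keep a derived fact only if it beats the current minimum.
                if candidate < best.get(dst, float("inf")):
                    best[dst] = candidate
                    new_delta[dst] = candidate
        delta = new_delta
    return best


# Hypothetical sample graph: (src, dst, cost).
edges = [(1, 2, 4), (1, 3, 1), (3, 2, 2), (2, 4, 5)]
result = sssp_fixpoint(edges, 1)
print(result)
```

Applying the aggregate during the recursion, rather than after computing all paths, keeps the intermediate relation small: dominated paths are discarded as soon as they are derived, which is the kind of behavior the abstract attributes to aggregates in recursive queries.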