Welcome to the Third ACM Symposium on Cloud Computing (SoCC'12). This annual symposium is co-sponsored by the ACM Special Interest Group on Management of Data (SIGMOD) and the ACM Special Interest Group on Operating Systems (SIGOPS). Both communities share a common interest in the rapidly developing field of cloud computing, i.e., large-scale distributed systems that manage massive volumes of data and yet deliver reliable and efficient service, and they co-sponsor this symposium with active participation and shared responsibilities. In its first year, SoCC was held in conjunction with ACM SIGMOD, the flagship conference of the database community; in its second year, it was held in conjunction with ACM SOSP, the premier conference for operating systems. Co-location was chosen to facilitate effective networking across the two communities, and the symposium was successfully launched. This year's edition is being held, for the first time, as an independent event, hosted in San Jose, California, in the heart of Silicon Valley, reflecting the region's high level of industrial activity in the cloud computing arena.
Proceeding Downloads
Logic and lattices for distributed programming
In recent years there has been interest in achieving application-level consistency criteria without the latency and availability costs of strongly consistent storage infrastructure. A standard technique is to adopt a vocabulary of commutative operations;...
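The commutative-operations approach can be illustrated with a join-semilattice. The following sketch is purely illustrative (it is not the paper's Bloom^L system): a grow-only counter whose merge is commutative, associative, and idempotent, so replicas converge to the same value regardless of the order in which updates arrive.

```python
# Illustrative sketch: a grow-only counter modeled as a join-semilattice.
# merge() is commutative, associative, and idempotent, so replicas
# converge no matter how their states are exchanged.

class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # per-replica contribution

    def increment(self, amount=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other):
        # Least upper bound: element-wise max of the two maps.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self):
        return sum(self.counts.values())

# Two replicas apply updates concurrently, then exchange state in
# either order; both converge to the same total.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```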
vBalance: using interrupt load balance to improve I/O performance for SMP virtual machines
A Symmetric MultiProcessing (SMP) virtual machine (VM) enables users to take advantage of a multiprocessor infrastructure in supporting scalable job throughput and request responsiveness. It is known that hypervisor scheduling activities can heavily ...
Improving large graph processing on partitioned graphs in the cloud
As the study of large graphs over hundreds of gigabytes becomes increasingly popular for various data-intensive applications in cloud computing, developing large graph processing systems has become a hot and fruitful research area. Many of those ...
Sailfish: a framework for large scale data processing
In this paper, we present Sailfish, a new Map-Reduce framework for large scale data processing. The Sailfish design is centered around aggregating intermediate data, specifically data produced by map tasks and consumed later by reduce tasks, to improve ...
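The benefit of aggregating intermediate data can be seen in a toy model (illustrative only; this is not Sailfish's actual I-file implementation): a naive shuffle materializes one intermediate file per (map task, reducer) pair, while aggregation appends all map outputs destined for a reducer into one shared file, cutting the file count from M x R to at most R.

```python
# Toy comparison of per-pair intermediate files vs. per-reducer
# aggregated files (illustrative sketch of the aggregation idea).

from collections import defaultdict

def naive_shuffle(map_outputs, num_reducers):
    # map_outputs: list (one entry per map task) of (key, value) records.
    # One "file" per (map task, reducer) pair.
    files = {}
    for m, records in enumerate(map_outputs):
        for r in range(num_reducers):
            files[(m, r)] = [kv for kv in records
                             if hash(kv[0]) % num_reducers == r]
    return files

def aggregated_shuffle(map_outputs, num_reducers):
    # All map outputs for a reducer are appended to one shared "file".
    files = defaultdict(list)
    for records in map_outputs:
        for kv in records:
            files[hash(kv[0]) % num_reducers].append(kv)
    return files

outputs = [[("x", 1), ("y", 2)], [("x", 3)], [("z", 4)]]  # 3 map tasks
assert len(naive_shuffle(outputs, 4)) == 3 * 4   # M * R files
assert len(aggregated_shuffle(outputs, 4)) <= 4  # at most R files
```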
OS-Sommelier: memory-only operating system fingerprinting in the cloud
Precise fingerprinting of an operating system (OS) is critical to many security and virtual machine (VM) management applications in the cloud, such as VM introspection, penetration testing, guest OS administration (e.g., kernel update), kernel dump ...
How consistent is your cloud application?
Current cloud datastores usually trade consistency for performance and availability. However, it is often not clear how an application is affected when it runs under a low level of consistency. In fact, current application designers have basically no ...
Heterogeneity and dynamicity of clouds at scale: Google trace analysis
To better understand the challenges in developing effective cloud-based resource schedulers, we analyze the first publicly available trace data from a sizable multi-purpose cluster. The most notable workload characteristic is heterogeneity: in resource ...
Using vector interfaces to deliver millions of IOPS from a networked key-value storage server
The performance of non-volatile memories (NVM) has grown by a factor of 100 during the last several years: Flash devices today are capable of over 1 million I/Os per second. Unfortunately, this incredible growth has put strain on software storage ...
Chronos: predictable low latency for data center applications
In data center applications, predictability in service time and controlled latency, especially tail latency, are essential for building performant applications. This is especially true for applications or services built by accessing data across ...
Bridging the tenant-provider gap in cloud services
The disconnect between the resource-centric interface exposed by today's cloud providers and tenant goals hurts both entities. Tenants are encumbered by having to translate their performance and cost goals into the corresponding resource requirements, ...
Using batteries to reduce the power costs of internet-scale distributed networks
Modern Internet-scale distributed networks have hundreds of thousands of servers deployed in hundreds of locations and networks around the world. Canonical examples of such networks are content delivery networks (called CDNs) that we study in this ...
Zeta: scheduling interactive services with partial execution
This paper presents a scheduling model for a class of interactive services in which requests are time bounded and lower result quality can be traded for shorter execution time. These applications include web search engines, finance servers, and other ...
Themis: an I/O-efficient MapReduce
"Big Data" computing increasingly utilizes the MapReduce programming model for scalable processing of large data collections. Many MapReduce jobs are I/O-bound, and so minimizing the number of I/O operations is critical to improving their performance. ...
Cake: enabling high-level SLOs on shared storage systems
Cake is a coordinated, multi-resource scheduler for shared distributed storage environments with the goal of achieving both high throughput and bounded latency. Cake uses a two-level scheduling scheme to enforce high-level service-level objectives (SLOs)...
Generalized resource allocation for the cloud
Resource allocation is an integral, evolving part of many data center management problems such as virtual machine placement in data centers, network virtualization, and multi-path network routing. Since the problems are inherently NP-Hard, most existing ...
Balancing reducer skew in MapReduce workloads using progressive sampling
The elapsed time of a parallel job depends on the completion time of its longest running constituent. We present a static load balancing algorithm that distributes work evenly across the reducers in a MapReduce job resulting in significant elapsed time ...
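The core idea of sampling-based range partitioning can be sketched as follows (the paper's progressive-sampling refinements are omitted; the quantile scheme here is a simplified assumption): sample the map output keys, take evenly spaced quantiles of the sample as split points, and assign each key to the reducer whose range contains it. With a representative sample, reducer loads even out despite key skew.

```python
# Sketch of sampling-based range partitioning for reducer load balance.
import bisect
import random

def choose_boundaries(sampled_keys, num_reducers):
    # num_reducers - 1 split points at evenly spaced sample quantiles.
    s = sorted(sampled_keys)
    return [s[(i * len(s)) // num_reducers] for i in range(1, num_reducers)]

def assign_reducer(key, boundaries):
    return bisect.bisect_right(boundaries, key)

random.seed(0)
keys = [random.paretovariate(1.5) for _ in range(100_000)]  # skewed keys
boundaries = choose_boundaries(random.sample(keys, 2_000), 4)

loads = [0, 0, 0, 0]
for k in keys:
    loads[assign_reducer(k, boundaries)] += 1

# Each of the 4 reducers ends up near 25% of the records.
assert max(loads) < 1.2 * (len(keys) / 4)
```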
Probabilistic deduplication for cluster-based storage systems
The need to backup huge quantities of data has led to the development of a number of distributed deduplication techniques that aim to reproduce the operation of centralized, single-node backup systems in a cluster-based environment. At one extreme, ...
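One common building block for cluster deduplication can be sketched minimally (this is an illustration, not the paper's probabilistic scheme): each chunk is fingerprinted, and the fingerprint both routes the chunk to a node and detects duplicates, so identical chunks always land on the same node and are stored only once.

```python
# Minimal sketch of fingerprint-routed chunk deduplication in a cluster.
import hashlib

NUM_NODES = 4
nodes = [dict() for _ in range(NUM_NODES)]  # fingerprint -> chunk

def store(chunk: bytes) -> bool:
    """Route a chunk by its fingerprint; return True if it was a duplicate."""
    fp = hashlib.sha256(chunk).hexdigest()
    node = nodes[int(fp, 16) % NUM_NODES]
    is_duplicate = fp in node
    node.setdefault(fp, chunk)  # store only the first copy
    return is_duplicate

data = [b"block-a", b"block-b", b"block-a", b"block-c", b"block-b"]
dupes = sum(store(c) for c in data)
assert dupes == 2                           # two repeats detected
assert sum(len(n) for n in nodes) == 3      # only unique chunks stored
```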
All aboard the Databus!: LinkedIn's scalable consistent change data capture platform
- Shirshanka Das,
- Chavdar Botev,
- Kapil Surlaker,
- Bhaskar Ghosh,
- Balaji Varadarajan,
- Sunil Nagaraj,
- David Zhang,
- Lei Gao,
- Jemiah Westerman,
- Phanindra Ganti,
- Boris Shkolnik,
- Sajid Topiwala,
- Alexander Pachev,
- Naveen Somasundaram,
- Subbu Subramaniam
In Internet architectures, data systems are typically categorized into source-of-truth systems that serve as primary stores for the user-generated writes, and derived data stores or indexes which serve reads and other complex queries. The data in these ...
Untangling cluster management with Helix
- Kishore Gopalakrishna,
- Shi Lu,
- Zhen Zhang,
- Adam Silberstein,
- Kapil Surlaker,
- Ramesh Subramonian,
- Bob Schulman
Distributed data systems are used in a variety of settings like online serving, offline analytics, data transport, and search, among other use cases. They let organizations scale out their workloads using cost-effective commodity hardware, while ...
More for your money: exploiting performance heterogeneity in public clouds
- Benjamin Farley,
- Ari Juels,
- Venkatanathan Varadarajan,
- Thomas Ristenpart,
- Kevin D. Bowers,
- Michael M. Swift
Infrastructure-as-a-service compute clouds such as Amazon's EC2 allow users to pay a flat hourly rate to run their virtual machine (VM) on a server providing some combination of CPU access, storage, and network. But not all VM instances are created equal:...
Romano: autonomous storage management using performance prediction in multi-tenant datacenters
Workload consolidation is a key technique in reducing costs in virtualized datacenters. When considering storage consolidation, a key problem is the unpredictable performance behavior of consolidated workloads on a given storage system. In practice, ...
The potential dangers of causal consistency and an explicit solution
Causal consistency is the strongest consistency model that is available in the presence of partitions and provides useful semantics for human-facing distributed services. Here, we expose its serious and inherent scalability limitations due to write ...
A case for dual stack virtualization: consolidating HPC and commodity applications in the cloud
With the growth of Infrastructure as a Service (IaaS) cloud providers, many have begun to seriously consider cloud services as a substrate for HPC applications. While the cloud promises many benefits for the HPC community, it currently does not come ...
True elasticity in multi-tenant data-intensive compute clusters
Data-intensive computing (DISC) frameworks scale by partitioning a job across a set of fault-tolerant tasks, then diffusing those tasks across large clusters. Multi-tenanted clusters must accommodate service-level objectives (SLO) in their resource ...
alsched: algebraic scheduling of mixed workloads in heterogeneous clouds
As cloud resources and applications grow more heterogeneous, allocating the right resources to different tenants' activities increasingly depends upon understanding tradeoffs regarding their individual behaviors. One may require a specific amount of RAM,...
Designing good algorithms for MapReduce and beyond
As MapReduce/Hadoop grows in importance, we find more exotic applications being written this way. Not every program written for this platform performs as well as we might wish. There are several reasons why a MapReduce program can underperform ...
Distributed programming and consistency: principles and practice
In recent years, distributed programming has become a topic of widespread interest among developers. However, writing reliable distributed programs remains stubbornly difficult. In addition to the inherent challenges of distribution---asynchrony, ...
Open source cloud technologies
Open source cloud technologies such as OpenStack, CloudStack, OpenNebula, Eucalyptus, OpenShift, and Cloud Foundry have gained significant momentum in the last few years. For researchers and practitioners, they present a unique opportunity to analyze, ...