Constellation: A secure self-optimizing framework for genomic processing

Abstract:
The Constellation framework is designed for process automation, secure workload distribution, and performance optimization for genomic processing. The goal of Constellation is to provide a flexible platform for the processing of custom “write once, run anywhere” genomic pipelines across a range of computational resources and environments, through the agent-based management of genomic processing containers. An implementation of the Constellation framework is currently in use at the University of Kentucky Medical Center for clinical diagnostic and research genomic processing.
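
The abstract describes agent-based management of containerized, “write once, run anywhere” pipeline stages. As a rough illustration only, and not Constellation's actual API, the Java sketch below shows the core idea: an agent receives a stage descriptor and runs it as a container on the local resources it manages. The PipelineStage record, image name, command, and paths are hypothetical placeholders.

    import java.io.IOException;
    import java.util.List;

    /** Hypothetical descriptor for one containerized pipeline stage ("write once, run anywhere"). */
    record PipelineStage(String image, List<String> command, String inputDir, String outputDir) {}

    /** Minimal agent sketch: receives a stage descriptor and runs it as a container locally. */
    public class GenomicAgent {

        /** Launch the stage with Docker; a real agent would also report status, enforce policy, etc. */
        public int runStage(PipelineStage stage) throws IOException, InterruptedException {
            var cmd = new java.util.ArrayList<String>(List.of(
                    "docker", "run", "--rm",
                    "-v", stage.inputDir() + ":/data/in:ro",   // read-only input volume
                    "-v", stage.outputDir() + ":/data/out",    // writable output volume
                    stage.image()));
            cmd.addAll(stage.command());
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            return p.waitFor();                                // exit code signals stage success/failure
        }

        public static void main(String[] args) throws Exception {
            // Hypothetical alignment stage; image name and paths are placeholders.
            PipelineStage align = new PipelineStage(
                    "example.org/aligner:latest",
                    List.of("align", "--in", "/data/in", "--out", "/data/out"),
                    "/scratch/sample01/fastq", "/scratch/sample01/bam");
            System.exit(new GenomicAgent().runStage(align));
        }
    }
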
Published in: 2016 IEEE 18th International Conference on e-Health Networking, Applications and Services (Healthcom)

Date of Conference: 14-16 Sept. 2016
Date Added to IEEE Xplore: 21 November 2016
ISBN Information:
Electronic ISBN: 978-1-5090-3370-6
Print on Demand (PoD) ISBN: 978-1-5090-3371-3

DOI: 10.1109/HealthCom.2016.7749534
Publisher: IEEE

Citation:
Bumgardner, VK Cody, et al. “Constellation: A secure self-optimizing framework for genomic processing.” e-Health Networking, Applications and Services (Healthcom), 2016 IEEE 18th International Conference on. IEEE, 2016.

Encyclopedia of Cloud Computing: Educational Applications of the Cloud

This chapter covers a broad range of cloud computing concepts, technologies, and challenges related to education. We define educational applications as resources related to learning, encompassing stages of development from preschool through higher and continuing education. In the second section we describe the ways cloud technology is being adopted in education and cover the benefits of, and barriers to, cloud adoption. In the third section we describe cloud resources used in the direct instruction of students, in front-office (user-facing) applications, and in back-office (administrative) applications. The final section describes cloud computing in research.

Citation:
Bumgardner, V. K., Victor Marek, and Doyle Friskney. “Educational Applications of the Cloud.” Encyclopedia of Cloud Computing (2016): 505-516.

Collating time-series resource data for system-wide job profiling

Abstract:
Through the collection and association of discrete time-series resource metrics and workloads, we can provide both benchmark and intra-job resource collations, along with system-wide job profiling. Traditional RDBMSes are not designed to store and process long-term discrete time-series metrics, and the commonly used resolution-reducing round-robin databases (RRDB) make poor long-term sources of data for workload analytics. We implemented a system that employs “big data” (Hadoop/HBase) and analytics (R) techniques and tools to store, process, and characterize HPC workloads. Using this system, we have collected and processed over 30 billion time-series metrics from existing short-term, high-resolution (15-second RRDB) sources, profiling over 200 thousand jobs across a wide spectrum of workloads. The system is currently in use at the University of Kentucky to better understand individual jobs, to profile the system as a whole, and as a strategic source of data for resource allocation and future acquisitions.
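
The central data problem here is keeping long-term, high-resolution time-series samples in a form that HBase/Hadoop (and later R) can scan efficiently. The Java sketch below illustrates one common HBase schema for this kind of data, using a row key that buckets samples by host, metric, and hour; the table name, column family, and key layout are assumptions for illustration, not the schema described in the paper.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MetricWriter {

        /** Row key: host + metric + hour bucket, so one row holds an hour of 15-second samples. */
        static byte[] rowKey(String host, String metric, long epochSec) {
            long hourBucket = epochSec - (epochSec % 3600);
            return Bytes.toBytes(host + ":" + metric + ":" + hourBucket);
        }

        public static void main(String[] args) throws Exception {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("metrics"))) {  // table name is a placeholder

                long now = System.currentTimeMillis() / 1000;
                Put put = new Put(rowKey("node042", "cpu.user", now));
                // Column qualifier is the offset within the hour; the cell value is the sampled metric.
                put.addColumn(Bytes.toBytes("m"), Bytes.toBytes(Long.toString(now % 3600)),
                              Bytes.toBytes("87.5"));
                table.put(put);
            }
        }
    }

Keyed this way, a single Scan bounded by host, metric, and time range retrieves contiguous rows, which is what makes both the MapReduce-style batch characterization and per-job collation practical at this volume.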

Published in: Network Operations and Management Symposium (NOMS), 2016 IEEE/IFIP
Date of Conference: 25-29 April 2016
Date Added to IEEE Xplore: 04 July 2016
ISSN Information:
Electronic ISSN: 2374-9709

INSPEC Accession Number: 16124063
DOI: 10.1109/NOMS.2016.7502958
Publisher: IEEE

Citation:
Bumgardner, VK Cody, Victor W. Marek, and Ray L. Hyatt. “Collating time-series resource data for system-wide job profiling.” Network Operations and Management Symposium (NOMS), 2016 IEEE/IFIP. IEEE, 2016.

Scalable hybrid stream and Hadoop network analysis system

Collections of network traces have long been used in network traffic analysis. Flow analysis can be used in network anomaly discovery, intrusion detection, and, more generally, the discovery of actionable events on the network. The data collected during processing may also be used for prediction and avoidance of traffic congestion, network capacity planning, and the development of software-defined networking rules. As network flow rates increase and new network technologies are introduced on existing hardware platforms, many organizations find themselves either technically or financially unable to generate, collect, and/or analyze network flow data. The continued rapid growth of network trace data requires new methods of scalable data collection and analysis. We report on our deployment of a system designed and implemented at the University of Kentucky that supports analysis of network traffic across the enterprise. Our system addresses problems of scale in existing systems by using distributed computing methodologies, and it is based on a combination of stream and batch processing techniques. In addition to collection, stream processing using Storm is used to enrich the data stream with ephemeral environment data. The enriched stream data is then used for event detection and near-real-time flow analysis by an in-line complex event processor. Batch processing is performed by the Hadoop MapReduce framework on data stored in HBase (BigTable-style) storage.

In benchmarks on our 10-node cluster, using actual network data, we were able to stream-process over 315k flows/sec. In batch analysis we were able to process over 2.6M flows/sec with a storage compression ratio of 6.7:1.
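
The stream side of the system uses Storm to enrich each flow record with ephemeral environment data before event detection and near-real-time analysis. The Java sketch below is a minimal bolt in that style, written against the current org.apache.storm (Storm 2.x) API rather than whatever version the 2014 deployment used; the tuple field names and the in-memory IP-to-owner map stand in for the real enrichment source and are assumptions.

    import java.util.Map;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    /** Minimal sketch: tags each flow tuple with the owner of its source address before downstream analysis. */
    public class FlowEnrichBolt extends BaseRichBolt {

        private OutputCollector collector;
        private Map<String, String> ipToOwner;  // stand-in for ephemeral environment data (e.g., DHCP leases)

        @Override
        public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            this.ipToOwner = Map.of("10.0.0.5", "lab-cluster");  // placeholder data
        }

        @Override
        public void execute(Tuple flow) {
            String srcIp = flow.getStringByField("src_ip");
            String owner = ipToOwner.getOrDefault(srcIp, "unknown");
            // Emit the enriched flow for event detection / near-real-time analysis downstream.
            collector.emit(flow, new Values(srcIp, flow.getStringByField("dst_ip"), owner));
            collector.ack(flow);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("src_ip", "dst_ip", "owner"));
        }
    }
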

Citation:
Bumgardner, Vernon KC, and Victor W. Marek. “Scalable hybrid stream and Hadoop network analysis system.” Proceedings of the 5th ACM/SPEC international conference on Performance engineering. ACM, 2014.