Scalable Analysis for Big Biological Data

The modern era of bioinformatics is defined by a deluge of data coupled with powerful computing. Providing tools and data that are FAIR (Findable, Accessible, Interoperable, and Reusable), although not yet consistently practiced, is a sensible goal for the bioinformatics community. Applying big biological data to scientific and medical questions can yield meaningful improvements in quality of life and drive future discoveries. To realize that potential, however, tools and methodology must be honed for both the scale of the data and the quality of the results. How do we adapt algorithms to efficiently handle terabytes or even petabytes of genomic data? What problems arise only when working at large scale? How do we balance the trade-off between scalability and accuracy?

Available biological datasets have grown exponentially over the past decades thanks to technological improvements. As modern databases grow, so do the cost of maintaining them and the complexity of the data they hold. Growing biological data demands additional computational expertise and infrastructure, as well as the biological expertise needed to interpret the data in the burgeoning field of bioinformatics. As a result, efficient and scalable analyses are urgently needed if the scientific community is to leverage the opportunities of “big data” while ensuring that results are accurate and accessible. Replicating results at scale is especially difficult because of the computing power required, and differences between cloud computing environments compound the problem. The era of big data has the potential to catalyze incredible progress toward answering scientific questions, but a concerted effort is needed to ensure that the methods are robust and reliable.

Presentations highlighting solutions to these challenges are the theme of this session. Analyses of big datasets and scalable algorithmic approaches are especially relevant. We are also interested in talks that develop best practices for replicability and apply them when comparing algorithms and results. Talks may focus on algorithm development, large-scale applications, and/or novel, understudied problems involving big biological data. We look forward to proposals covering FAIR, challenging, and pressing issues in computational biology, as well as high-throughput and high-performance computing.

Schedule subject to change
All times listed are in EDT
Thursday, May 16th
10:30-11:00
Moving from Data Enclaves through Knowledge and Genome Graphs into Complex Models
Room: Cathedral of Learning, G24
Format: Live from venue

  • Ben Busby, DNAnexus, United States


Presentation Overview:

This talk will discuss how to think about and implement analyses on and across multimodal data enclaves, such as the UK Biobank-RAP. It will also cover data management techniques for massive-scale data and which types of models are appropriate for various analyses, particularly the issues to consider when implementing transformer models and other deep learning models. Knowledge graphs and genome graphs can massively reduce the time and computational power necessary for indexing data for large models, particularly for multi-locus analysis, and this will be covered as well. The implementation portion of the session will include an overview of how to use data enclaves, including common data processing operations and leveraging models both within and between data enclaves.

11:00-11:15
Lessons from integration of 168,000 human gut microbiome samples
Room: Cathedral of Learning, G24
Format: Live from venue

  • Richard Abdill, University of Chicago, United States
  • Samantha Graham, University of Minnesota, United States
  • Vincent Rubinetti, University of Colorado, United States
  • Frank Albert, University of Minnesota, United States
  • Casey Greene, University of Colorado, United States
  • Sean Davis, University of Colorado, United States
  • Ran Blekhman, University of Chicago, United States


Presentation Overview:

As evidence accumulates of the complex interactions between microbiota and their hosts, it has become clear that a holistic characterization of human health and disease requires understanding the factors that shape variation in the human microbiome. While other genomics fields have used large, pre-compiled compendia to extract systematic insights requiring otherwise impractical sample sizes, there has been no comparable resource for the 16S rRNA sequencing data most commonly used to quantify microbiome composition. To help close this gap, we assembled a set of 168,464 publicly available human gut microbiome samples, processed with a single pipeline and combined into the largest unified microbiome dataset to date. We used this resource, which is freely available at microbiomap.org, to shed light on global variation in the human gut microbiome. Here, I describe the computational approach to building and maintaining this resource and discuss the technical hurdles we've encountered, with a focus on metadata and inconsistencies between projects.

11:15-11:30
Scalable Community Detection for Large Networks
Room: Cathedral of Learning, G24
Format: Live from venue

  • Aidan Lakshman, University of Pittsburgh, United States
  • Erik Wright, University of Pittsburgh, United States


Presentation Overview:

Community detection in graphs has numerous applications, from social networks to biology. Bioinformatic analyses especially depend on graph community detection algorithms to infer homology groups from sequence similarity networks. However, the immense size of modern graphs makes it challenging to detect communities accurately. Approaches with linear-time scalability typically perform worse than approaches with higher time complexity. Here, we set out to compare popular methods for community detection on synthetic graphs with known communities and biological graphs with unknown communities. We found that faster algorithms often detect communities less accurately than less scalable algorithms. To address this issue, we introduce two new variants of the Fast Label Propagation algorithm for clustering extremely large sequence similarity networks. Our approach offers accuracy comparable to less scalable approaches while providing linear-time scalability. Furthermore, we made it possible to run our community detection algorithms outside of main memory, which permits community detection on huge graphs with limited RAM. This advance democratizes community detection because access to expensive supercomputer resources is not required. We compared the communities automatically detected by different algorithms on both synthetic and real biological networks, and we discuss the accuracy of different community detection algorithms in the context of their relative time and memory complexities. Our implementation of community detection is available in the open-source SynExtend package for R.
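For readers unfamiliar with the underlying technique, here is a minimal sketch of classic asynchronous label propagation in Python. It illustrates the general idea only; it is not the Fast Label Propagation variants or the SynExtend implementation described above, and the toy graph is invented for the example.

```python
import random
from collections import Counter

def label_propagation(adjacency, max_iters=100, seed=0):
    """Classic asynchronous label propagation on an adjacency-list graph.
    adjacency: dict mapping each node to an iterable of its neighbours."""
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # every node starts as its own community
    nodes = list(adjacency)
    for _ in range(max_iters):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            neighbours = list(adjacency[node])
            if not neighbours:
                continue
            counts = Counter(labels[n] for n in neighbours)
            top = max(counts.values())
            best = sorted(lab for lab, c in counts.items() if c == top)
            new_label = rng.choice(best)  # break ties at random
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels

# toy usage: two triangles joined by a single bridge edge
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
print(label_propagation(graph))
```

Each node repeatedly adopts the most common label among its neighbours until labels stop changing, which is what gives label-propagation approaches their near-linear scaling in the number of edges.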

11:30-11:45
G2PDeep-v2: a web-based deep-learning framework for phenotype prediction and biomarker discovery using multi-omics data.
Room: Cathedral of Learning, G24
Format: Live from venue

  • Sania Zafar Awan, University of Missouri - Columbia, United States
  • Shuai Zeng, University of Missouri - Columbia, United States
  • Trinath Adusumilli, University of Missouri - Columbia, United States
  • Manish Sridhar Immadi, University of Missouri - Columbia, United States
  • Trupti Joshi, University of Missouri - Columbia, United States
  • Dong Xu, University of Missouri - Columbia, United States


Presentation Overview:

The G2PDeep-v2 server is a web-based platform, powered by deep learning, for phenotype prediction and marker discovery from multi-omics data in any organism, including humans, plants, animals, and viruses. The server provides multiple services for researchers to create deep-learning models through an interactive interface and train these models using an automated hyperparameter tuning algorithm on high-performance computing resources.
Unlike the previous version of G2PDeep [1], the new version, G2PDeep-v2, supports multiple inputs for multi-omics data, offers a broader array of model selection options and advanced settings for tuning model hyperparameters, and includes comprehensive Gene Set Enrichment Analysis (GSEA) functionality. Notably, compared with other available applications, G2PDeep-v2 provides end-to-end management of machine learning and deep learning projects, from multi-omics dataset creation all the way to model interpretation, and supports predictions from individual omics types or any combination of up to three omics data types. It is equipped with a fully automated pipeline to process and organize multi-omics data such as gene expression, miRNA expression, DNA methylation, protein expression, SNP, and CNV data.
To accelerate scientific research on survival analysis in cancer studies, we utilized G2PDeep-v2 for long-term survival prediction and identified candidate biomarkers associated with survival in 23 cancer studies using The Cancer Genome Atlas (TCGA) datasets. Various models, including our proposed multi-CNN, Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF) models, were employed for predictions. To ensure reproducibility, the data for each cancer study were systematically divided into a training dataset (60% of the data) for model training, a validation dataset (20%) for hyperparameter tuning, and a test dataset (20%) to evaluate model performance. Predictive performance was quantified as the mean area under the curve (AUC) over a 5-fold cross-validation framework. G2PDeep-v2 with our proposed multi-CNN outperforms all other machine learning models in predicting phenotypes for the Skin Cutaneous Melanoma (SKCM) study with uniform multi-omics data.
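As an illustration of the evaluation protocol described above (not the G2PDeep-v2 code itself), mean AUC over 5-fold cross-validation can be computed along these lines with scikit-learn; the synthetic matrix and the logistic regression model are stand-ins for a real multi-omics feature table and the models listed above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# synthetic stand-in for a (samples x features) multi-omics matrix with a binary phenotype
X, y = make_classification(n_samples=300, n_features=500, n_informative=30, random_state=0)

aucs = []
for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    model = LogisticRegression(max_iter=5000)
    model.fit(X[train_idx], y[train_idx])
    prob = model.predict_proba(X[test_idx])[:, 1]   # predicted probability of the positive class
    aucs.append(roc_auc_score(y[test_idx], prob))

print(f"mean AUC over 5 folds: {np.mean(aucs):.3f}")
```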
The G2PDeep server is publicly available at http://g2pdeep.org. The Python-based deep-learning model is available at https://github.com/shuaizengMU/G2PDeep_model

11:45-12:00
A novel algorithm for full-resolution registration of gigapixel images with sub-cellular accuracy
Room: Cathedral of Learning, G24
Format: Live from venue

  • Rajdeep Pawar, Department of Computational and Systems Biology, University of Pittsburgh, United States
  • Aatur Singhi, Department of Pathology, University of Pittsburgh, United States
  • Shikhar Uttam, Department of Computational and Systems Biology, University of Pittsburgh, United States


Presentation Overview:

Introduction:
Two-dimensional (2D) analysis of Hematoxylin and Eosin (H&E)-stained tissue sections has long been the cornerstone of anatomic pathology. However, it is increasingly being recognized that extending this analysis to 3D microenvironments is required to fully understand the morphological and architectural complexity of tumor pathobiology. Although significant algorithmic advances are being made toward this goal, accurate registration (alignment) of H&E images of serially adjacent 2D sections into a coherent 3D volume at full image resolution remains a challenge. This difficulty stems from various factors, including inherent differences between adjacent tissue sections and tissue deformation induced during the sectioning process. In response to these challenges, we introduce a new registration method for serial whole-slide images, based on sparse representation, that facilitates high-precision 3D reconstruction of the tumor microenvironment.

Methods:
Our approach relies on a sparse representation of gigapixel whole-slide images, effectively reducing the data from approximately 6.5 billion pixels to a mere 50,000 pixels – compression by a factor of 1300 – without compromising accuracy. This reduction in data volume allows us to apply matrix decomposition algorithms without the prohibitive computational cost typically associated with high-resolution images. In addition, we use the scale-invariant feature transform (SIFT) and the random sample consensus (RANSAC) algorithm to precisely align the 2D serial sections and reconstruct the 3D tumor microenvironment, while preserving sub-cellular details that are crucial for accurate analysis.
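The feature-matching step can be sketched as follows with OpenCV; this shows generic SIFT keypoint matching with RANSAC outlier rejection between two grayscale section images and is not the authors' sparse-representation pipeline.

```python
import cv2
import numpy as np

def estimate_section_alignment(fixed_gray: np.ndarray, moving_gray: np.ndarray):
    """Estimate a similarity transform between two grayscale (uint8) section images
    using SIFT keypoints, Lowe's ratio test, and RANSAC."""
    sift = cv2.SIFT_create()
    kp_f, des_f = sift.detectAndCompute(fixed_gray, None)
    kp_m, des_m = sift.detectAndCompute(moving_gray, None)

    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_m, des_f, k=2)
    good = [pair[0] for pair in matches
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]

    src = np.float32([kp_m[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_f[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # RANSAC discards mismatched keypoints before fitting the transform
    matrix, inlier_mask = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
    return matrix  # 2x3 matrix usable with cv2.warpAffine to map `moving` onto `fixed`
```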

Results:
The proposed method enables high registration accuracy at full resolution, achieving a normalized cross-correlation (NCC) of +0.62. This performance markedly exceeds that of leading methodologies such as CODA (Nature Methods, 2022), which achieves +0.38. Furthermore, our algorithm recovers the correct ordering of the serial sections with 100% accuracy.
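For reference, the NCC metric quoted above is the zero-normalised cross-correlation, which can be computed for any pair of same-sized registered images as follows.

```python
import numpy as np

def normalized_cross_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-normalised cross-correlation of two same-shape images; +1 is a perfect match."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))
```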

Conclusion:
We have developed an algorithm for reconstructing 3D tumor microenvironments from 2D whole-slide images at full resolution and with sub-cellular accuracy, without requiring image down-sampling. Our algorithm demonstrates significantly better performance than current state-of-the-art methods. We anticipate that our method will substantially improve 3D tumor microenvironment analysis, enabling deeper insights into the spatial biology of the tumor microenvironment and the development of more effective diagnostic and therapeutic strategies.

References:
1. Lowe, D.G. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision 60.2 (2004): 91-110.
2. Fischler, Martin A., et al. "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography." Communications of the ACM 24.6 (1981).
3. Kiemen, Ashley L., et al. "CODA: quantitative 3D reconstruction of large tissues at cellular resolution." Nature Methods (2022).

13:30-13:45
Exploring the Role of Mycoviruses in Fungal Pathogenesis: Implications for Precision Antifungal Therapies
Room: Cathedral of Learning, G24
Format: Live from venue

  • Sergei Lenskii, University of Minnesota, Canada
  • Nadezhda Lenskaia, Independent Researcher, Canada


Presentation Overview:

Infectious diseases, including drug-resistant fungal infections, pose a significant threat to global health, with substantial mortality rates. Recognizing this challenge, the World Health Organization (WHO) has established the Fungal Priority Pathogens List (FPPL) to tackle the growing threat of invasive fungal infections, especially among people with weakened immune systems. Our study delves into the intricate interactions between fungal pathogens and their viruses (mycoviruses) to advance understanding of these complex relationships for future treatment interventions. Mycoviruses exhibit multifaceted impacts on fungal virulence, growth, and reproduction. While some mycoviruses attenuate fungal virulence, others inhibit growth or increase fungal susceptibility to stress, e.g., the virus AfuPmV-1 in Aspergillus fumigatus. However, not all mycoviruses are detrimental; some establish neutral or even beneficial relationships with their fungal hosts.

Recognizing the potential of mycoviruses, akin to that of bacteriophages in phage therapy, we propose a computational approach to explore a novel frontier in therapeutics against fungal infections. In our view, mycoviruses offer targeted strategies to combat specific fungal pathogens, minimizing collateral damage and resistance development. Although further research and clinical validation are imperative, computational strategies for discovering mycoviruses hold substantial promise for advancing precision-oriented antifungal therapies. Mycoviruses may also have synergistic effects in combination with antifungals, potentially revolutionizing the treatment of fungal infections.

We have developed a scalable computational approach to analyze available resources for mycovirus exploration, including the research literature, databases of known mycovirus sequences, and large public sequencing datasets. We applied this approach to explore the viromes of major fungal pathogens listed on the WHO FPPL. The exponential growth of fungal and environmental sequencing projects presents a wealth of data for mycovirus discovery, and scalable computational approaches to extract virus sequences from these datasets are crucial. Our analysis showcases how integrated computational methods can harness big data for mycovirus exploration and help researchers identify candidate mycoviruses, potentially leading to the development of novel treatments for fungal infections.

We demonstrate how computational approaches enable researchers to navigate vast biomedical datasets efficiently, identifying prospective mycovirus candidates for further experimentation and validation. Additionally, the developed computational framework facilitates the study of virus-host interactions in priority fungal pathogens lacking known viruses. The potential synergistic effects of discovered mycoviruses and antifungals on fungal pathogens represent a promising avenue for exploration. The results of our study underscore the importance of understanding mycovirus-fungal interactions and harnessing computational tools for advancing precision antifungal therapies, offering new insights into combating fungal infections.

13:45-14:00
Deciphering Phage Genomes: A Comprehensive Approach with Hidden Markov Models
Room: Cathedral of Learning, G24
Format: Live from venue

  • Tatiana Lenskaia, University of Toronto, Canada
  • Alan Davidson, University of Toronto, Canada


Presentation Overview:

Phages, viruses that infect bacteria, play pivotal roles in microbial communities, bacterial evolution, and biotechnology. However, a substantial portion of their genetic material remains enigmatic, with many genes annotated as hypothetical proteins lacking known functions. To address this knowledge gap, we propose an integrative approach leveraging Hidden Markov Models (HMMs) for comprehensive phage annotation.
Our methodology begins with constructing several hundred high-quality HMM profiles derived from diverse sets of known phage structural genes. These HMM profiles are manually curated and validated by leading experts in phage research. The developed computational approach captures conserved motifs, structural features, and conserved genome-context characteristics in phage genomes. These features serve as sensitive detectors capable of identifying putative phage genes within genomic sequences, even amidst noise and genetic variability. In tandem with HMM-based detection, clustering algorithms are employed to enhance annotation accuracy and efficiency. These techniques, trained on curated datasets of phage gene functions, augment the predictive power of our approach by inferring gene function from a comprehensive analysis of multiple factors, including sequence similarity, domain architecture, and contextual genome information.
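As a concrete illustration of the profile-scanning step, assuming profiles stored in HMMER format, a curated profile set can be searched against predicted phage proteins with hmmsearch roughly as follows; this is a sketch, not the authors' full pipeline.

```python
import subprocess
from pathlib import Path

def scan_phage_proteins(profiles_hmm: Path, proteins_faa: Path, out_tbl: Path, evalue: float = 1e-5):
    """Run HMMER's hmmsearch with curated profiles against predicted phage proteins
    and return per-target hits parsed from the tabular (--tblout) output."""
    subprocess.run(
        ["hmmsearch", "--tblout", str(out_tbl), "-E", str(evalue),
         "--cpu", "4", str(profiles_hmm), str(proteins_faa)],
        check=True,
    )
    hits = []
    for line in out_tbl.read_text().splitlines():
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        # tblout columns: target name, target acc, query (profile) name, query acc, full-seq E-value, ...
        hits.append({"protein": fields[0], "profile": fields[2], "evalue": float(fields[4])})
    return hits
```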
By systematically annotating structural genes, our approach sheds light on a substantial portion of the genetic repertoire of phages and possible roles of putative genes in phage biology, providing valuable insights into phage-host interactions, microbial evolution, and ecological dynamics. Also, our method offers practical benefits for both basic research and biotechnological applications. The comprehensive phage annotation enables identification of candidate genes for novel virulence factors, defense mechanisms, antibiotic resistance determinants, and other biomedically relevant elements encoded within phage genomes. Moreover, understanding phage gene functions is crucial for the development of phage-based therapies, bioprospecting for novel enzymes, and engineering synthetic phages for various biotechnological purposes.
In conclusion, our comprehensive phage annotation approach represents a powerful tool for unraveling many mysteries encoded within phage genomes. By elucidating the functions of hypothetical genes and deciphering the intricate genetic landscapes of phages, our methodology opens new avenues for exploring the vast diversity and biological significance of these ubiquitous viruses.

14:00-14:15
Generative Model for Gene Expression Samples
Room: Cathedral of Learning, G24
Format: Live from venue

  • Oleksandr Narykov, Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, United States
  • Alexander Partin, Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, United States
  • Yitan Zhu, Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, United States
  • Thomas Brettin, Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL 60439, United States


Presentation Overview:

Genetics-informed translational studies in the medical field are challenging because of the significant variability between individual organisms. In the case of pre-clinical studies, the problem is even more pronounced as there is a need to fill the gap between different biological models that vary in purity and availability. Previous studies addressed the problem of augmenting tumor gene expression data in the limited context of cancer sample classification using Generative Adversarial Networks (GANs). However, training this class of Neural Networks (NNs) is challenging due to vanishing gradients, model collapse, and failures to converge. Diffusion models (DMs) are a cutting-edge advancement in generative AI.
DMs are succeeding adversarial approaches such as GANs in advanced image, audio, and video generation; they rely on an iterative process of degrading data with noise and then denoising the result. The performance of these models is robust; however, the generative process is known to be computationally intensive and slow. Recent OpenAI work addresses the slow, iterative sampling of DMs by introducing a new class of DM, Consistency Models. Ultimately, we aim to leverage this architecture and the Argonne Leadership Computing Facility’s AI Testbed resources to construct a generative model for multiple biological model systems – cell lines, single-cell RNA-Seq samples from patients, and patient-derived xenografts (PDX). Generating robust biological data is an open problem plagued by challenges, such as a lack of training samples, data inconsistency, high dimensionality, and poor interpretability, that are less frequent in traditional AI domains, e.g., image and audio. Overcoming these limitations requires developing new strategies and approaches, so it is vital to be able to test them promptly. Our research provides life science researchers with a way to create synthetic data based on cell line samples and obtain in silico samples for complex, realistic settings for further analysis and refinement.
We present an adaptation of the Consistency Model (CM) [1] that generates synthetic RNA-Seq gene expression profiles and was tested in the context of cancer cell lines. The main idea behind CMs is to learn a consistency function f(x_t, t) → x, where x_t corresponds to the noisy data at an arbitrary timepoint t. This allows us to generate a ground-truth sample x from any point along the diffusion process in a single step. Using unsupervised learning methods, we attempt to distinguish ground-truth examples from synthetic samples to validate the generative model.
[1] Song, Yang, et al. "Consistency Models." arXiv preprint arXiv:2303.01469 (2023).
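A minimal sketch of the consistency-function parameterisation of Song et al. [1], applied here to expression vectors rather than images; the backbone network, its signature, and the hyperparameter values are placeholders rather than the authors' architecture.

```python
import torch
import torch.nn as nn

class ConsistencyWrapper(nn.Module):
    """Sketch of the consistency function f(x_t, t) with the skip/output scalings of
    Song et al. (2023). `backbone` is any module mapping (x_t, t) to a tensor with
    the same shape as x_t (e.g. an MLP over expression vectors)."""

    def __init__(self, backbone: nn.Module, sigma_data: float = 0.5, eps: float = 0.002):
        super().__init__()
        self.backbone = backbone
        self.sigma_data = sigma_data
        self.eps = eps  # smallest time step; the boundary condition is f(x, eps) = x

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        t = t.view(-1, 1)  # one time value per sample, broadcast over features
        # scalings chosen so that c_skip(eps) = 1 and c_out(eps) = 0, enforcing f(x, eps) = x
        c_skip = self.sigma_data**2 / ((t - self.eps) ** 2 + self.sigma_data**2)
        c_out = self.sigma_data * (t - self.eps) / torch.sqrt(self.sigma_data**2 + t**2)
        return c_skip * x_t + c_out * self.backbone(x_t, t)
```

The skip connection is what lets a trained model map a noisy sample at any timepoint back to a clean sample in a single forward pass.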

14:15-14:30
ECTP: an R package for predicting gene targets of environmental chemical exposures using coupled matrix-matrix completion
Room: Cathedral of Learning, G24
Format: Live from venue

  • Kai Wang, University of Michigan, United States
  • Justin Colacino, University of Michigan, United States
  • Dana Dolinoy, University of Michigan, United States
  • Maureen Sartor, University of Michigan, United States


Presentation Overview:

Environmental exposures present a huge health burden. Identifying the target genes of environmental exposures is a key step in identifying detrimental health effects of chemicals with limited toxicity knowledge. Traditional in vivo and in vitro toxicity testing is powerful but requires substantial time and funding. To accelerate environmental chemical safety assessment, computational toxicology methods are widely used to find potential toxic effects of chemicals. Methods such as read-across and quantitative structure-activity relationship (QSAR) modeling are usually used to predict a binary label for chemicals, e.g., whether a chemical is a carcinogen or whether certain biological pathways could be activated. The limitation of these methods is that they cannot provide a comprehensive picture of the biological response to chemical exposure. We previously implemented a novel method, the Coupled Matrix-Matrix Completion (CMMC) algorithm, to predict the gene targets of environmental chemicals and tested it with human exposure-gene interaction data from the Comparative Toxicogenomics Database (CTD). Unlike previous methods, CMMC can integrate environmental chemicals and target genes on a broad scale to predict overall chemical-gene interactions. The input to CMMC consists of three matrices: a main matrix containing the known chemical-gene interactions, a chemical-chemical similarity matrix containing similarity values among all chemicals in the main matrix, and a gene-gene similarity matrix containing similarity values among all genes in the main matrix. After the calculation, missing values in the main matrix are imputed, representing novel predicted chemical-gene interactions. Compared to alternative methods, CMMC achieved the best AUC and stable performance on a series of benchmark datasets with different sizes of input matrices. The first implementation of this method is in C++ using the Armadillo library. To make our method more user-friendly for biologists and toxicologists, we now introduce an R package named ECTP (Environmental Chemical Target Prediction). In this package, we implemented the core CMMC function with the Rcpp and RcppArmadillo packages, providing a runtime comparable to the C++ version. The input to our package can be a single chemical (IUPAC name) or a chemical list if the user would like to predict gene targets for multiple chemicals. RRDKit is used to calculate chemical similarities between the user-provided chemical(s) and the built-in chemical list. The prediction results are returned as an R data frame for further downstream analysis.
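To make the inputs concrete, the sketch below imputes a chemical-gene matrix by low-rank factorisation regularised by the two similarity matrices; it only illustrates the kind of completion problem described above and is NOT the authors' CMMC algorithm or the ECTP package.

```python
import numpy as np

def complete_with_similarities(M, S_chem, S_gene, rank=10, lam=0.1, lr=0.01, iters=500, seed=0):
    """Impute missing chemical-gene interactions (NaNs in M) by low-rank factorisation
    regularised by chemical-chemical (S_chem) and gene-gene (S_gene) similarity graphs.
    Illustrative only; not the CMMC algorithm."""
    rng = np.random.default_rng(seed)
    n_chem, n_gene = M.shape
    U = 0.1 * rng.standard_normal((n_chem, rank))
    V = 0.1 * rng.standard_normal((n_gene, rank))
    mask = ~np.isnan(M)                              # observed entries
    L_chem = np.diag(S_chem.sum(1)) - S_chem         # graph Laplacians of the similarity matrices
    L_gene = np.diag(S_gene.sum(1)) - S_gene
    M0 = np.nan_to_num(M)
    for _ in range(iters):
        R = mask * (U @ V.T - M0)                    # residual on observed entries only
        U -= lr * (R @ V + lam * L_chem @ U)         # gradient steps with Laplacian regularisation
        V -= lr * (R.T @ U + lam * L_gene @ V)
    return U @ V.T                                   # previously-missing cells are the predictions
```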

14:30-14:45
Streamlining High Throughput Kinome Analysis: Introducing KADL, a Comprehensive Kinome Analysis Description Language
Room: Cathedral of Learning, G24
Format: Live from venue

  • Ali Imami, University of Toledo, United States
  • William Ryan, University of Toledo, United States
  • Hunter Eby, University of Toledo, United States
  • Jennifer Nguyen, University of Toledo, United States
  • Taylen Arvay, University of Toledo, United States
  • Robert McCullumsmith, University of Toledo, United States


Presentation Overview:

High-throughput functional kinome analysis using PamChip has significantly advanced our understanding of functional proteomics and kinomics. However, the diversity of methods for analyzing the resulting datasets forces researchers either to pick a single method or to undergo painful procedures to integrate results across multiple methods. To address these issues, we introduce the Kinome Analysis Description Language (KADL), a unified Domain Specific Language (DSL) for analyzing kinome data.

KADL is expressed as a subset of the English language that allows for a declarative description of the desired analysis results, which can then be displayed as an R Shiny application, a PDF document, or individual figures that can be integrated directly into a manuscript. The description conforms to a Parsing Expression Grammar (PEG) representation that maps the description into precise steps, which can then be executed to generate the results. Rust, a memory- and type-safe language, is used to translate the KADL representation into distinct analysis steps in R; the R representation then generates the final analysis results. It integrates established tools such as Upstream Kinase Analysis (UKA), Kinome Random Sampling Analyzer (KRSA), Kinase Enrichment Analysis (KEA3), and PTM Substrate Enrichment Analysis (PTM-SEA).
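As a toy illustration of the declarative-description-to-steps idea only: the syntax below is invented for this example and is not KADL's actual grammar, and a simple regular expression stands in for a real PEG parser.

```python
import re

# Hypothetical declarative description mapped to ordered analysis steps (not KADL syntax).
DESCRIPTION = "compare treated vs control using KRSA and UKA, output pdf"

pattern = re.compile(
    r"compare (?P<case>\w+) vs (?P<ctrl>\w+) using (?P<methods>[\w, and]+), output (?P<fmt>\w+)"
)
m = pattern.match(DESCRIPTION)
steps = [
    {"step": "load_groups", "case": m["case"], "control": m["ctrl"]},
    *[{"step": "run_method", "method": meth.strip()}
      for meth in re.split(r",| and ", m["methods"]) if meth.strip()],
    {"step": "render_report", "format": m["fmt"]},
]
for s in steps:
    print(s)  # each dict is one executable step in the generated analysis plan
```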
KADL offers a comprehensive high throughput kinome analysis platform, supporting multiple visualizations and providing a user-friendly syntax. The combination of Rust and R contributes to its efficiency, accuracy, and adaptability. The language addresses methodological disparities, streamlining the analysis workflow and fostering reproducibility.
In conclusion, the Kinome Analysis Description Language (KADL) represents a significant advancement in high throughput kinome analysis. Its integration of tools, user-friendly syntax, and utilization of PEG parsers, Rust, and R contribute to standardization, efficiency, and the generation of robust results. KADL stands as a powerful tool poised to catalyze advancements in kinome research.

14:45-15:00
Automated identification of radiotherapy courses from US Department of Veterans Affairs administrative data
Room: Cathedral of Learning, G24
Format: Live from venue

  • Max Schreyer, Oregon Health & Science University, VA Portland Health Care System (VAPORHCS), United States
  • Chris Anderson, VA Portland Health Care System (VAPORHCS), United States
  • Ryan Melson, VA Portland Health Care System (VAPORHCS), United States
  • Reid Thompson, Oregon Health & Science University, VA Portland Health Care System (VAPORHCS), United States


Presentation Overview:

Radiotherapy is a critically important cancer treatment, both globally and for the aging population of US Veterans, with over 60% of cancer patients receiving radiotherapy during their disease course. Despite this, radiation courses are not clearly defined within the Veterans Health Administration (VHA), the single largest integrated healthcare system in the United States. We present a supervised machine learning model that utilizes billing and diagnostic codes from VHA and Centers for Medicare & Medicaid Services (CMS) databases to predict radiation course dates with compelling accuracy (average precision of 0.99). A multi-center cohort of 1,982 radiation patients was selected for model training and testing, with ground-truth labels determined through manual chart review. We encoded a set of 304 radiation code-dependent, time-based features and built three machine learning models: a random forest, AdaBoost, and a neural network. All models showed high accuracy when predicting individual course date labels (range 96.3% - 97.5%), with the random forest showing the highest overall performance. The retrospective application of our model to 1,333,286 patients, coupled with a heuristic algorithm for assembling radiation courses, identified 1,526,660 predicted instances of radiotherapy. The identified courses were collected into a shared resource to facilitate future VHA-based studies, and our predictive model is available for application to a wider range of non-VHA datasets, particularly those leveraging CMS data.
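A sketch of the kind of model evaluation described, with a synthetic matrix standing in for the billing- and diagnostic-code-derived, time-based features (this is not the VHA model or data).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# synthetic stand-in: rows are patient-days, columns are 304 code-derived features,
# and y marks whether a day is a radiation course date
rng = np.random.default_rng(0)
X = rng.poisson(0.2, size=(5000, 304))
y = (X[:, :10].sum(axis=1) + rng.normal(0, 1, 5000) > 3).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]
print(f"average precision: {average_precision_score(y_te, scores):.3f}")
```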