Adventures in the Biology trade: Bioinformatics in the Petabyte era (60 mins, ~42 MB)
Description
Bioinformatics, and more widely computational biology, is a largely data-driven science. The array of high-throughput technology platforms introduced in the last 10 years means that the amount of data being generated in this field is likely to reach exabytes by 2020. The challenges associated with this are quite different from those of the data sets generated by high energy physics or astrophysics, in that the data tend to be gathered from a wide variety of different providers. Meta-analyses of these data sets can give startling new insights but come with many caveats, in particular that the quality of the data from each provider can be highly variable. I will spend some time talking about one set of experiences I have had dealing with one specific technology platform, and in particular how it is clear that the detection of bias in data sets is a key element of any high-throughput analysis. This talk was given as part of our MSc in HPC's 'HPC Ecosystem' course.
More Episodes
Performing complex solar shading analysis to take into account the sun's path and solar penetration on large buildings has historically consumed very many CPU cycles for IES "Virtual Environment" (3D building physics) simulation users. One particularly complex model took almost 2 weeks to...
Published 03/14/14
Intel will provide an insight into future HPC technology development looking at hardware trends, ecosystem support and the challenges around ExaScale computing. The talk will also touch upon the convergence of High Performance Computing and High Performance Data Analytics, examining where the...
Published 02/28/14
PrimeGrid is a volunteer computing project that gives participants the chance to be the discoverer of a new world record prime number! In addition, we are working towards the solution of several mathematical problems which have remained unsolved for over 50 years. The talk will cover some basic...
Published 09/07/13