Saturday 16th August, 1pm to 6pm at Broad Oak Social Club
Bioinformatics - Jan Kim
Biological systems generate and process information, and new DNA sequencing technologies [1] now provide unprecedented experimental access to genetic information. The volume of sequence data necessitates high-performance computing for its storage and analysis, and the majority of sequence data is now held in public repositories operated by global collaborations [2].
Linux is a major platform for the development of bioinformatics systems and APIs. These include EMBOSS [3], Clustal [4], APIs for general-purpose programming systems, such as Bioconductor [5] for R and Biopython [6] for Python, and specialised high-performance systems for "Next Generation Sequencing" (NGS) data analysis [7, 8, 9]. These systems are generally developed as Open Source projects and packaged for all major Linux distributions.
My general plan for this talk is to start with a brief background on molecular genetics and then give a flavour of how some of the systems above are used in scientific computing environments [10]. Which of these we look into, and to what extent, will be a group decision. I'll bring my laptop and slides covering most of the systems above, but I expect there won't be time for all of them.