Archive for June, 2007

The Self-Organized Gene (Part 2)

Wednesday, June 13th, 2007

In 2003 the Human Genome Project was completed, mapping the entire human genome. The project was started in 1990 and was estimated to take some 30-40 years to complete. What the initial predictions missed was that DNA sequencing was subject to what Kurzweil refers to as the law of accelerating returns. The power of DNA sequencing has been increasing exponentially while the cost of the sequencing has been falling exponentially. Thus the project was completed much earlier than expected.

The Human Genome Project is however not the only completed full genome mapping. Thanks to the fast and cheap sequencing a wide range of other organisms have had their DNA fully sequenced. In this second part of this tutorial we will look at yeast and how its genes control its metabolic processes. We will use DNA microarray data to look at this problem.

In the first part of the tutorial we introduced competitive algorithms and focused on the Self-Organizing Map (SOM). We showed how it can be used to visualize high-dimensional data in a very intuitive way.

In this part we will move on to meta-clustering: how and why to cluster a SOM using a neural gas. You will also familiarize yourself with the unified distance matrix (U-Matrix). We will then move on to the actual problem involving the microarray data. After we have done our clustering with the SOM Visualizer we will create a standard neural network based classifier based on the results.

If you haven’t gone through the first part, it is strongly recommended that you do so before proceeding.

Although having Synapse is not absolutely necessary for this tutorial, it is recommended as interactivity helps a lot. You can download it here.

(more…)