Thursday, July 23, 2009

Performing ANOVA t-test on Microarray Gene Expression data

Obama's surprise visit to our research lab at the Children's Hospital Research Institute didn't shake my supervisor's focus on lab work. That very afternoon, I was assigned a new task that required using MATLAB® to analyze microarray data collected from muscle tissue affected by muscular dystrophy. I only got to it today because I was still recovering from hyper-Obama excitement syndrome. The data was collected using U133plus2 human genome microarray chips, which have about 60,000 gene probes. I currently have four data sets with 120 timestamps for each gene probe, which means each matrix has approximately 60,000 rows and 120 columns.


The task was to remove genes with negligible change in their expression values. To achieve this, I had to run a one-way ANOVA (ANalysis Of VAriance) test for each gene. Using Microsoft Excel for this undertaking would be like searching for a needle in a haystack. With a simple function, signGenes (short for Significant Genes), that I wrote in MATLAB®, I was able to pick the significant genes out of the matrices in less than 30 minutes. More power to MATLAB®!
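For readers who want to try something similar, here is a minimal sketch of a per-gene one-way ANOVA filter. It is not the original signGenes function: the name signGenesSketch, the groupLabels input (one group label per column) and the alpha cutoff are all assumptions made for illustration, and the sketch relies on anova1 from the Statistics Toolbox.

```matlab
% Sketch only: signGenesSketch, groupLabels and alpha are hypothetical names,
% not the original signGenes function. Requires the Statistics Toolbox (anova1).
function [significant, pvals] = signGenesSketch(data, groupLabels, alpha)
    % data        : genes-by-samples matrix of expression values
    % groupLabels : one group label per column of data (assumed grouping)
    % alpha       : significance cutoff, e.g. 0.05 (assumed)
    nGenes = size(data, 1);
    pvals  = zeros(nGenes, 1);          % one p-value per gene, sized up front
    for g = 1:nGenes
        % 'off' suppresses the ANOVA table and box plot figures
        pvals(g) = anova1(data(g, :)', groupLabels(:), 'off');
    end
    keep        = pvals < alpha;        % logical mask of significant genes
    significant = data(keep, :);        % drop genes with negligible change
end
```

On a toy matrix with two groups of three samples, a call might look like `[sig, p] = signGenesSketch(data, [1 1 1 2 2 2], 0.05);`. How the 120 columns of the real matrices should be grouped depends on the experiment design, so the labels here are placeholders.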



The function works fine, taking about 50 seconds to run through each microarray matrix. However, the M-Lint code checker keeps giving me the following message:
Line 27: Array 'remove' might be grown using subscripting. Consider preallocating for speed.
I am still a MATLAB novice, so this report kind of made me hit a wall. How can I preallocate a cell array if I do not know the number of elements that I will be putting in it? I wish this were Java®; I could just make use of Node objects or something!
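One possible way around the question: the number of genes to remove is unknown, but the total number of genes is known before the loop starts. That suggests swapping the growing array for a logical vector with one flag per gene, which can be preallocated. The sketch below uses made-up data, and a simple range test with a hypothetical threshold stands in for the real per-gene criterion.

```matlab
% Sketch with made-up data; 'threshold' and the range test are stand-ins
% for whatever criterion the real function applies per gene.
data      = [1.0 1.1 0.9; 2.0 5.0 8.0; 3.0 3.0 3.1; 0.5 4.0 9.5];
threshold = 0.5;                       % hypothetical cutoff

nGenes = size(data, 1);
remove = false(nGenes, 1);             % preallocated: one flag per gene

for g = 1:nGenes
    % flag the gene instead of appending its index to a growing array
    remove(g) = (max(data(g, :)) - min(data(g, :))) < threshold;
end

filtered = data(~remove, :);           % keep only the unflagged rows
```

Because remove never changes size inside the loop, the M-Lint warning should no longer apply, and the final indexing step deletes all flagged rows in one go.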
