Association in multifactorial traits: how to deal with rare observations?
Jannot A.-S., Essioux L., Clerget-Darpoux F.
To detect the role of a candidate gene in a trait using a sample of individuals, we may test the effects of SNP haplotypes or diplotypes. When the sample size is limited, many haplotype or diplotype categories may contain only a few individuals. This reduces the power of the test of association between the trait and the haplotypes or diplotypes, because these rare categories add little information while increasing the degrees of freedom. The present paper proposes a new strategy that groups rare categories according to a measure of similarity between haplotypes or diplotypes. It compares this strategy with two other ways of dealing with rare categories: a SNP selection strategy based on haplotype diversity, and a grouping strategy that pools all rare categories into a single baseline group. The comparison is performed by simulation under four scenarios. We show that the new strategy yields the largest increase in power, whatever the model underlying the role of the candidate gene in the studied trait. This strategy therefore provides a powerful alternative to the methods currently used to reduce the number of rare categories.
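The difference between pooling rare categories and grouping them by similarity can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not specify the similarity measure, so the sketch assumes haplotypes coded as strings of 0/1 alleles, a hypothetical frequency threshold `min_count` below which a category counts as rare, and Hamming distance as a stand-in similarity. All function names are hypothetical, and diplotypes are not handled.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of SNPs at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same SNPs")
    return sum(x != y for x, y in zip(a, b))


def pool_rare(haplotypes, min_count):
    """Pool every haplotype seen fewer than min_count times into one group."""
    counts = Counter(haplotypes)
    return [h if counts[h] >= min_count else "POOLED" for h in haplotypes]


def group_by_similarity(haplotypes, min_count):
    """Reassign each rare haplotype to its most similar common haplotype.

    Assumption: similarity is measured by Hamming distance; ties go to the
    more frequent common haplotype, then to the lexicographically smaller one.
    """
    counts = Counter(haplotypes)
    common = [h for h, n in counts.items() if n >= min_count]
    if not common:
        raise ValueError("no haplotype reaches min_count")
    mapping = {}
    for h, n in counts.items():
        if n >= min_count:
            mapping[h] = h  # common categories keep their own label
        else:
            mapping[h] = min(
                common, key=lambda cand: (hamming(h, cand), -counts[cand], cand)
            )
    return [mapping[h] for h in haplotypes]


def degrees_of_freedom(labels):
    """df of an association test over the categories: groups minus one."""
    return len(set(labels)) - 1


if __name__ == "__main__":
    # Hypothetical sample: two common haplotypes and three singletons.
    sample = ["0011"] * 5 + ["1100"] * 4 + ["0111", "1101", "1000"]
    print(degrees_of_freedom(sample))                           # raw categories
    print(degrees_of_freedom(pool_rare(sample, 3)))             # pooled
    print(degrees_of_freedom(group_by_similarity(sample, 3)))   # by similarity
```

In this toy sample the five raw categories give 4 degrees of freedom, pooling the three singletons gives 2, and similarity grouping gives 1, with each singleton absorbed by the common haplotype that differs from it at a single SNP. The tie-breaking rule is an arbitrary choice made only to keep the mapping deterministic.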