Copy Number Variations Jean-Baptiste Cazier http://www.well.ox.ac.uk/dr-jean-bap


Jean-Baptiste Cazier
http://www.well.ox.ac.uk/dr-jean-baptiste-cazier
Jean-Baptiste.Cazier@well.ox.ac.uk
DTC Bioinformatics Course, Hilary Term 2010
WTHCG, Thursday 12th of February

Outline

Definitions
  Acronyms:
    CNP: Copy Number Polymorphisms
    CNV: Copy Number Variations
    CNA: Copy Number Aberrations / Copy Number Alterations
  Creation: Germline vs Somatic
    Is the CNV coming from the original cell, or did it evolve only in a few cells?
    Very many CNVs are shared within populations, like SNPs or STRs
    Somatic propagation of CNVs is a mark of cancer
  Finding the missing heritability of complex diseases. TA Manolio et al. Nature 461, 747-753 (2009)


Gain, Loss, etc.
  Normal: 2 chromosomes are inherited, one from each parent
  Deletion:
    Homozygous: 0 copies left
    Hemizygous: 1 copy left
    Sizeable event -> not InDels
  Gain:
    Can be 3, 4, 5, ... copies
    Most often nearby, but not always
    Not LINE, SINE, repeats, etc.
  Copy Neutral Loss of Heterozygosity:
    Not a Copy Number Polymorphism per se, but needs to be addressed
  Copy Number Variation in Human Health, Disease, and Evolution. Zhang F et al., Ann. Rev. of Gen. and Hum. Gen. 2009 (10) 451-481

Mechanisms
  4 main mechanisms in the generation of CNV:
    NAHR: Non-Allelic Homologous Recombination
    NHEJ: Non-Homologous End-Joining
    FoSTeS: Fork Stalling and Template Switching
    L1 retrotransposition
  Copy Number Variation in Human Health, Disease, and Evolution. Zhang F et al., Ann. Rev. of Gen. and Hum. Gen. 2009 (10) 451-481

Characterization
  Identification: a genome-wide test
    Karyotyping
    Multi-color chromosome painting
    Comparative Genomic Hybridization (CGH)
    Array CGH (aCGH)
    "SNP" array
  Validation: a local test
    qPCR: quantitative Polymerase Chain Reaction
    MLPA: Multiplex Ligation-dependent Probe Amplification
    Fluorescent In-Situ Hybridization (FISH)
    Sequencing

Array technology
  Array CGH
    Agilent, Nimblegen
    2 channels: compare hybridization level to a common background reference
    Usually 42 million probes genome-wide
    Resolution up to 200bp
  SNP array
    Illumina, Affymetrix
    Test one or few samples at a time
    Initially developed for genotyping
    2 channels: allele A/B
    Increasing density of markers
      From 10,000 Linkage SNPs
      Up to 5M SNPs and CNV probes

Affymetrix CNV in color
  [Figure: (a) Aberrations leading to aneuploidy. (b) Aberrations leaving the chromosome apparently intact.]
  Chromosome aberrations in solid tumors. Albertson DG et al. Nature Genetics 34, 369-376 (2003)

SNP array
  [Figure: SNP array]

Revival
  Genome-Wide Association provided some success in the identification of variants for many diseases: AMD, Coeliac disease, Type 2 Diabetes, Prostate Cancer, Colorectal Cancer, etc.
  However, most variants are 'only' statistically significant: 80% fall outside of coding regions
  The case of Missing Heritability: whatever the number of variants identified, they usually account for only a small proportion of the heritability
  Finding the missing heritability of complex diseases. TA Manolio et al. Nature 461, 747-753 (2009)

Missing Heritability
  Need to find other "reasons" to explain the difference.
  Heritability definition: proportion of phenotypic variance attributable to additive genetic factors
  The Common Variant Common Disease model is challenged
  Look for more markers
    Rarer with strong effect
    Common with lower effect
    Gene-Gene interaction
    Shared environment
  This is essentially a question of power
    Groups are joining forces in very large consortia
    Better technological coverage of the rarer variants
    More variant types: Copy Number Variation, InDels, Segmental Duplications
    Comparable phenotyping in meta-analysis
  The 'Dark Matter'
    Does it really exist?
    Can we see it beyond its influence?
  [Figure: Feasibility of identifying genetic variants by risk allele frequency and strength of genetic effect (odds ratio).]
  Finding the missing heritability of complex diseases. TA Manolio et al. Nature 461, 747-753 (2009)

SNP-array signature
  Sample data for a number of different copy number and LOH events.
  The Log R Ratio scales with copy number
  The distribution of the B allele frequency is governed by a more complex relationship with allowable genotypes.
  [Figure panels: Simulation (Gain, Neutral, Loss); Real data]

Copy Number Loss
  [Figure: SNP array, aCGH]
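
The SNP-array signature described above can be sketched numerically. This is an idealised illustration only: it assumes the Log R Ratio is exactly log2(copies / 2) and that the B allele frequency clusters at k/n for k B alleles out of n copies. Real arrays show a compressed and noisy response, and the function name and the loh flag are made up for this example.

```python
import math

def expected_signature(copy_number, loh=False, normal_copies=2):
    """Idealised (Log R Ratio, B allele frequency clusters) for a copy number.

    Assumption: LRR = log2(copy_number / normal_copies), no compression or noise.
    """
    if copy_number == 0:
        # Homozygous deletion: no signal, so the BAF is undefined
        return float("-inf"), []
    lrr = math.log2(copy_number / normal_copies)
    if loh:
        # Loss of heterozygosity: only homozygous genotypes remain
        bafs = [0.0, 1.0]
    else:
        # k B alleles out of copy_number copies, for k = 0 .. copy_number
        bafs = [k / copy_number for k in range(copy_number + 1)]
    return lrr, bafs

for cn in range(5):
    lrr, bafs = expected_signature(cn)
    print(cn, f"{lrr:.2f}", [f"{b:.2f}" for b in bafs])
```

Under these assumptions a hemizygous loss and a copy-neutral LOH share the same BAF clusters (0 and 1) and are told apart only by the Log R Ratio.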

Copy Number Loss and Gain
  [Figure: SNP array, aCGH]

Mixed Cell Population
  [Figure: SNP array, aCGH]

Copy Neutral LOH
  [Figure: SNP array, aCGH]
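
For the mixed cell population case, a rough sketch of why the signal is attenuated: the measured values become a weighted average over aberrant and normal cells. The model below is an assumption for illustration (linear intensity response, a germline-heterozygous SNP, hypothetical parameter names), not the method used to produce the slides.

```python
import math

def mixed_signature(aberrant_cn, aberrant_b, fraction, normal_cn=2, normal_b=1):
    """Expected (LRR, BAF) at a heterozygous SNP when only a fraction of cells
    carries the aberration. Assumes a linear, noise-free intensity response."""
    total = fraction * aberrant_cn + (1 - fraction) * normal_cn
    b_total = fraction * aberrant_b + (1 - fraction) * normal_b
    if total == 0:
        # Every cell has lost both copies: no signal at all
        return float("-inf"), float("nan")
    return math.log2(total / normal_cn), b_total / total

# Hemizygous loss of the B allele in half of the cells
print(mixed_signature(aberrant_cn=1, aberrant_b=0, fraction=0.5))
# Copy-neutral LOH (two B copies) in half of the cells
print(mixed_signature(aberrant_cn=2, aberrant_b=2, fraction=0.5))
```

In this sketch the copy-neutral LOH leaves the Log R Ratio at 0 while the BAF moves away from 0.5, which is why the B allele frequency is needed to see it.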

Automatic recognition of CNVs
  Originally done by visual inspection
    Problem of reproducibility
    Problem of accuracy
    With increasing density, problem of visibility
  Automation and test
    Moving average
    Probe selection / compilation
    Segmentation, Hidden Markov Model
    Significance testing
  Need to compile data with uncertainty

Moving average

Automation by use of Hidden Markov Model
  Automatically select the optimal copy number sequence over a chromosome to fit the model
  Evaluate the probability of the sequence of intensity signal fitting this model
  Can test various models and select the most appropriate
  The model can be trained simply by feeding it "typical" data sets
    Look for minimum number of changes
    Look for maximum instability
    Select a most likely default state
  [Figure: copy number states 0, 1, 2]
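
A minimal sketch of how a Hidden Markov Model can pick the most likely copy number sequence along a chromosome, using the standard Viterbi algorithm. An HMM is conventionally specified by initial, transition and emission probabilities; every number below (state means, noise level, probability of staying in the same state, prior on the default state 2) is an assumed value chosen for illustration, not a trained parameter.

```python
import math

STATES = [0, 1, 2, 3]                        # hypothetical copy number states
MEANS = {0: -2.0, 1: -0.6, 2: 0.0, 3: 0.4}   # assumed mean Log R Ratio per state
SIGMA = 0.2                                  # assumed common noise level
STAY = 0.99                                  # assumed probability of no change

def log_initial(state):
    # Copy number 2 is taken as the most likely default state
    return math.log(0.97) if state == 2 else math.log(0.01)

def log_transition(a, b):
    # A high STAY value favours a minimum number of changes
    return math.log(STAY) if a == b else math.log((1 - STAY) / (len(STATES) - 1))

def log_emission(x, state):
    # Gaussian log-density of intensity x around the state mean
    z = (x - MEANS[state]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2 * math.pi))

def viterbi(signal):
    """Most likely sequence of copy number states for a list of intensities."""
    score = {s: log_initial(s) + log_emission(signal[0], s) for s in STATES}
    back = []
    for x in signal[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_transition(p, s))
            pointers[s] = prev
            new_score[s] = score[prev] + log_transition(prev, s) + log_emission(x, s)
        back.append(pointers)
        score = new_score
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):          # trace the best path backwards
        state = pointers[state]
        path.append(state)
    return path[::-1]

print(viterbi([0.0] * 5 + [-0.6] * 5 + [0.0] * 5))
```

With these assumed parameters a run of five low probes is called as a loss, while a single low probe is absorbed into the default state, because one outlier does not outweigh the cost of two state changes.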

Process
  Definition: find the underlying states giving the observation
    Underlying states are the number of copies: 0, 1, 2, ...
    Observation is the signal intensity
  Defined by 3 probabilistic entities
  [Figure: copy number states 2, 1, 0]

Segmentation
  CNAM employs a powerful optimal segmenting algorithm using dynamic programming to detect inherited and de novo CNVs on a per-sample (univariate) and multi-sample (multivariate) basis. Unlike Hidden Markov Models, which assume the means of different copy number states are consistent, optimal segmenting properly delineates CNV boundaries in the presence of mosaicism, even at a single probe level, and with controllable sensitivity and false discovery rate.

Available software
  Graphical interface:
    Agilent
    Golden Helix
    Partek
    BeadStudio/GenomeStudio
    Golf
    CNAT
    CNAG
    dChip
    PennCNV
  Uneven field of quality and specificity
  Command line:
    QuantiSNP
    BirdSuite
    OncoSNP
  R packages:
    Somatics
    DNACopy
    Aroma
  Cancer-specific tools
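
The idea of optimal segmentation by dynamic programming, mentioned in the Segmentation slide, can be illustrated with a generic penalised least-squares fit. This is not the CNAM algorithm, whose details are not given here; it is a textbook sketch in which each segment gets its own mean (no fixed state means, unlike an HMM) and a hypothetical penalty parameter controls sensitivity.

```python
def optimal_segments(values, penalty):
    """Split values into segments minimising the total squared error around
    each segment mean plus a fixed penalty per segment (exact, O(n^2))."""
    n = len(values)
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def cost(i, j):
        # Squared error of values[i:j] around its own mean, via prefix sums
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    best = [0.0] + [float("inf")] * n   # best[j]: optimal cost of values[:j]
    start = [0] * (n + 1)               # start[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j) + penalty
            if c < best[j]:
                best[j], start[j] = c, i

    segments, j = [], n
    while j > 0:                        # walk back through the stored starts
        segments.append((start[j], j))
        j = start[j]
    return segments[::-1]               # list of (start, end) half-open ranges

log_ratios = [0.0] * 6 + [-0.6] * 4 + [0.0] * 6   # made-up data
print(optimal_segments(log_ratios, penalty=0.2))
```

A lower penalty makes the fit more sensitive, down to single-probe segments, at the price of more false calls; a higher one merges small events into their neighbours.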

Development of recent array
  In 2008 McCarroll and Korn published the identification of CNPs and CNVs using/designing the Affymetrix SNP 6.0 high-resolution array

SNP 6.0 by McCarroll
  "We designed a hybrid genotyping array (Affymetrix SNP 6.0) to simultaneously measure 906,600 SNPs and copy number at 1.8 million genomic locations. By characterizing 270 HapMap samples, we developed a map of human CNV (at 2-kb breakpoint resolution) informed by integer genotypes for 1,320 copy number polymorphisms (CNPs)"

McCarroll
  Published both the analysis with chip design and the algorithm suite: BirdSuite
    Performs both genotyping and CNV identification
    First call for known CNP
    Look for new CNV
  80% of observed copy number differences are due to common CNPs (MAF>5%); >99% derive from inheritance rather than new mutation.
  Found a common deletion polymorphism in perfect LD with Crohn's disease SNPs
    2kb upstream IRGM
    Affects level of expression

High density of probes
  Can identify smaller events
    E.g. important to spot residual event in translocation/fusion genes
  Gain confidence in SNP regions by increasing the number of probes
  Can get better resolution, i.e. more accurate breakpoints
    Can split existing large regions into smaller ones
  Better coverage of CNP
    These regions were mostly not covered by SNP-only arrays
    Beware of overrepresentation of these regions
  Tiling across the genome
    More exhaustive picture


[Figure panels: 10K, 250K Nsp, 250K Sty, 6.0; axis: Copy Number (4, 2, 1)]
Copy Number Loss of 65Kb region confidently identified only with SNP 6.0. Bryan Young et al, Cancer Research UK

Increase density
  [Figure panels: t-test on Run I; t-test on Run II; Summation of I and II; Log 2 Ratio I; Log 2 Ratio II; axes: Copy Number (4, 2, 1), t-test]
  Too much data
  Replicates increase the signal-to-noise ratio and avoid false positives and false negatives
  But it costs twice as much!

Potential Issues
  Interpretation
    What to use as a baseline, i.e. define the Ratio
  Variations in probe coverage:
    Gaps
    Overlapping probes
  Inaccurate reference
    Reference build is inaccurate
    Probes cannot match the locus accurately
  Systematic error
    Autocorrelation with GC content
    Preparation, e.g. genome amplification

  Overlapping probes in regions of CNP
  Probes in repeat elements
  SNPs in probes

The special case of rodents:
  There can be many strains from a limited number of founders
  Full sequencing has been limited
  The reference used for the probe generation can be far from the strain tested
  This will lead to failure across the genome
  Gauguier et al, in preparation
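
The earlier point about replicates (t-test on Run I, t-test on Run II, summation of I and II) can be illustrated with a one-sample t statistic on the log2 ratios of a candidate region. The values below are made up so that the noise differs between the two runs; the function name is hypothetical and the sketch ignores multiple testing across the genome.

```python
import math
import statistics

def t_statistic(log_ratios):
    """One-sample t statistic: does the mean log2 ratio differ from 0?"""
    n = len(log_ratios)
    mean = statistics.mean(log_ratios)
    sd = statistics.stdev(log_ratios)        # sample standard deviation
    return mean / (sd / math.sqrt(n))

run_1 = [-0.9, 0.1, -0.7, -0.2, -0.8, 0.0]   # made-up log2 ratios, run I
run_2 = [-0.1, -0.8, -0.3, -0.9, 0.1, -0.7]  # made-up log2 ratios, run II
summed = [a + b for a, b in zip(run_1, run_2)]

for name, data in [("Run I", run_1), ("Run II", run_2), ("I + II", summed)]:
    print(name, f"{t_statistic(data):.2f}")
```

Because the loss is present in both runs while the probe-level noise is not, the summed signal gives a larger t statistic in absolute value than either run alone in this toy example.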

Future
  Catalogue of CNPs
    GSV and WTCCC effort
    Use of the 1000 Genomes Project
  Methods
    Improvements of the algorithms
    Improvements of the computing power
  Other technologies
    Use of expression data
    Use of clonal sequencing
    Single molecule sequencing

Useful references
  Collections of known aberrations:
    Mitelman Database of Chromosome Aberrations in Cancer
      http://cgap.nci.nih.gov/Chromsomes/Mitelman
      Cytogenetically confirmed
    Database of Genomic Variants
      Zhang, J et al. Development of bioinformatics resources for display and analysis of copy number and other structural variants in the human genome. Cytogenet. Genome Res. (2006).
      Redon, R. et al. Global variation in copy number in the human genome. Nature (2006).
      Iafrate, A.J. et al. Detection of large-scale variation in the human genome. Nat. Genet. (2004).
    McCarroll & Korn (2008)
      Based on SNP 6.0 in 270 HapMap samples
    Genome Structural Variation Consortium
      Conrad D et al. Origins and functional impact of copy number variation in the human genome. Nat. Genet. (2009)
  Practical R packages:
    DNACopy: A Package for Analyzing DNA Copy Data
      A faster circular binary segmentation algorithm for the analysis of array CGH data. Venkatraman, E. S. and Olshen, A. B. (2007). Bioinformatics 23: 657-663
    snapCGH: Segmentation, Normalization and Processing of aCGH Data
      BioHMM: a heterogeneous hidden Markov model for segmenting array CGH data. Marioni, J. C., Thorne, N. P., and Tavaré, S. (2006). Bioinformatics 22: 1144-1146
    BeadarraySNP: package for the analysis of Illumina genotyping BeadArray data
      High-resolution copy number analysis of paraffin-embedded archival tissue using SNP BeadArrays. Oosting J et al. Genome Res. 2007 Mar;17(3):368-76
  Web interface: Integration of CNV results across multiple samples
    http://www.well.ox.ac.uk/~jcazier/GWA-Viewer.html
