Analysis and comparison of very large metagenomes with fast clustering and functional annotation


Weizhong Li, BMC Bioinformatics 2009
Presented by Chuan-Yih Yu

Outline
  Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP)
  Goal
  Methodology
  Metagenome comparison
  Conclusion
  Discussion

Goal
  Reduce computation time
    Global Ocean Survey (GOS): 1 M CPU hours, roughly 114 years on a single CPU
  Discover novel genes or protein families
    Metagenomic Profiling of Nine Biomes (BIOME): ~90% of sequences unknown
    GOS: doubled the number of protein families
  Compare metagenome data
    Clustering-based
    Protein family-based


RAMMCAP

RNA

Meta-RNA & tRNAscan
  High sensitivity, low specificity (except 16S)
  "Identification of ribosomal RNA genes in metagenomic fragments.", Huang, Y., Gilna, P. & Li, W. Z. Bioinformatics
  "tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.", Lowe, T.M. and Eddy, S.R. Nucleic Acids Res

Clustering
  CD-HIT

CD-HIT
  Greedy incremental clustering algorithm
  Avoids full pairwise alignment
  Short word (2~5) index table
  "Clustering of highly homologous sequences to reduce the size of large protein database", Weizhong Li, et al. Bioinformatics (2001)
  "Tolerating some redundancy significantly speeds up clustering of large protein databases", Weizhong Li, et al. Bioinformatics (2002)
  "Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences", Weizhong Li, et al. Bioinformatics (2006)

Limitation of CD-HIT
  Evenly distributed mismatches
  Greedy issue: a sequence is grouped into the first cluster it meets
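The greedy incremental idea with a short-word filter can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CD-HIT's actual implementation: the function names are hypothetical, the identity check is a simple ungapped position-by-position comparison standing in for a real alignment, and the word filter uses the bound that a sequence of length L at identity t must share at least L - k + 1 - (1 - t) * k * L words of length k with its representative.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count the short words (k-mers) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_kmers(a, b):
    """Number of k-mers two count tables have in common (multiset intersection)."""
    return sum((a & b).values())


def identity(a, b):
    """Stand-in for alignment (assumption): ungapped identity over the shorter sequence."""
    n = min(len(a), len(b))
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n if n else 0.0


def greedy_cluster(seqs, threshold=0.9, k=2):
    """Greedy incremental clustering: longest sequences become representatives first."""
    order = sorted(seqs, key=len, reverse=True)
    clusters = []  # each entry: (representative, its k-mer counts, member list)
    for s in order:
        s_k = kmer_counts(s, k)
        # Each mismatch destroys at most k words, so fewer shared words than
        # this bound rules out the pair without computing the identity.
        need = len(s) - k + 1 - (1 - threshold) * k * len(s)
        for rep, rep_k, members in clusters:
            if shared_kmers(s_k, rep_k) < need:
                continue  # filtered out by the short-word check
            if identity(s, rep) >= threshold:
                members.append(s)  # joins the first cluster it meets
                break
        else:
            clusters.append((s, s_k, [s]))
    return [members for _, _, members in clusters]


example = ["MKTAYIAKQR", "MKTAYIAKQL", "GGGGGGGGGG", "MKTAYIAKQ"]
result = greedy_cluster(example)
```

Because a sequence joins the first representative that passes the threshold, it is never compared against later representatives that might match it better, which is the greedy issue listed above.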

CD-HIT Performance

ORF clustering

Why Cluster ORFs
  Function studies
  Novel gene finding

ORF Prediction
  ORF-finder
  Metagene

ORF Prediction Performance
  MetaSim
    Average read length 100, 200, 400, 800 bp; 1 million reads
  True ORF (sensitivity)
    Overlap of 30 AA with an NCBI annotated ORF
  Predicted ORF (specificity)
    50% overlap with a true ORF
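One possible reading of the two overlap criteria above, sketched in Python. The slides do not spell out the exact definitions, so everything here is an assumption: ORFs are half-open (start, end) intervals in amino-acid coordinates on the same read and frame, an annotated ORF counts as recovered when some prediction overlaps it by at least 30 AA, and a prediction counts as correct when at least 50% of its length overlaps an annotated ORF. All function names are hypothetical.

```python
def overlap(a, b):
    """Length of the overlap between two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def orf_sensitivity(true_orfs, predicted, min_overlap_aa=30):
    """Fraction of annotated ORFs recovered by at least one prediction."""
    found = sum(
        1 for t in true_orfs
        if any(overlap(t, p) >= min_overlap_aa for p in predicted)
    )
    return found / len(true_orfs)


def orf_specificity(true_orfs, predicted, min_fraction=0.5):
    """Fraction of predictions mostly covered by some annotated ORF."""
    good = sum(
        1 for p in predicted
        if any(overlap(p, t) >= min_fraction * (p[1] - p[0]) for t in true_orfs)
    )
    return good / len(predicted)


# Toy data (made up for illustration only)
annotated = [(0, 100), (200, 260)]
predictions = [(10, 90), (250, 300), (400, 450)]
sens = orf_sensitivity(annotated, predictions)
spec = orf_specificity(annotated, predictions)
```

With the toy intervals, one of the two annotated ORFs is recovered and one of the three predictions is counted as correct.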

ORF Clustering
  Run 1 clustering
    90~95% identity
  Run 2 clustering
    60% identity over 80% of length (454)
    30% identity over 80% of length (Sanger)
  Merge run 1 & 2 results

Clustering Evaluation
  Test sets
    GOS-ORF (30%), BIOME (95%), BIOME-ORF (60%)
  BIOME Microbiomes & Viromes
    Microbial sequences are more conserved than viral sequences.
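The merge of the two runs can be pictured as composing two mappings. The sketch below assumes that run 2 clusters the representatives produced by run 1; the slides do not state this, and the function and variable names are hypothetical.

```python
def merge_runs(run1, run2):
    """Combine two clustering levels into final clusters.

    run1: dict mapping each sequence id to its run-1 representative id
    run2: dict mapping each run-1 representative id to its run-2 representative id
    Returns a dict: run-2 representative id -> list of all member sequence ids.
    """
    merged = {}
    for seq_id, rep1 in run1.items():
        # A representative absent from run 2 stays its own representative.
        rep2 = run2.get(rep1, rep1)
        merged.setdefault(rep2, []).append(seq_id)
    return merged


# Toy ids (made up): run 1 at high identity, run 2 at lower identity
run1 = {"a": "a", "b": "a", "c": "c", "d": "d"}
run2 = {"a": "a", "c": "a", "d": "d"}
final = merge_runs(run1, run2)
```

Clustering at high identity first shrinks the input to the slower low-identity run, since only the run-1 representatives need to be compared at the looser threshold.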

Clustering Quality
  Need a conservative threshold
  Use only Pfam sequences longer than 30 AA
  Discard the shorter sequence when Pfam sequences overlap
  Place into different clusters
    Sequences in the same Pfam may be placed into different clusters.

Clustering Validation
  Generate clusters whose sequences come from the same Pfam
  Minimize the number of clusters
  Good clusters: >95% of members from the same Pfam
  >97% of sequences are in good clusters
  ~30 times more than in bad clusters

Figure: number of sequences and number of clusters versus cluster size
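The "good cluster" criterion above lends itself to a short check. The following Python sketch assumes each cluster is given as a list of Pfam family labels, one per member sequence; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def is_good(cluster_pfams, min_fraction=0.95):
    """A cluster is good when more than min_fraction of members share one Pfam."""
    top_count = Counter(cluster_pfams).most_common(1)[0][1]
    return top_count / len(cluster_pfams) > min_fraction


def fraction_in_good_clusters(clusters, min_fraction=0.95):
    """Fraction of all sequences that sit in good clusters."""
    total = sum(len(c) for c in clusters)
    good = sum(len(c) for c in clusters if is_good(c, min_fraction))
    return good / total


# Toy clusters (made-up labels): 20 pure, 19 of 20 pure, and a 1:1 mix
clusters = [["PF1"] * 20, ["PF2"] * 19 + ["PF3"], ["PF4", "PF5"]]
share = fraction_in_good_clusters(clusters)
```

The comparison is strict, following the ">95%" wording, so a cluster with exactly 95% of members from one family is not counted as good in this sketch.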


Protein Family Annotation
  Pfam (24.0, Oct. 2009, 11912 families)
    Textual descriptions, other resources and literature references
  TIGRFAMs (9.0, Nov. 2009, 3808 models)
    GO, Pfam and InterPro models
  COG (2003, 4873 clusters of orthologous groups)
    3 lineages and ancient conserved domain
  RPSBLAST (reverse PSI-BLAST)
    E-value 0.001

Novel Protein Families Discovery
  ORFs that appear spurious (no homology match) but fall in large clusters may belong to novel protein families.
  In GOS, only 1.3% of clusters with cluster size 10 map to 93% of true ORFs
  In BIOME, only 1.0% of clusters with cluster size 5 map to 28% of true ORFs

Metagenome comparison

Statistical Comparison of Metagenomes
  Occurrence profile coefficient
  z score: why not Rodriguez-Brito's method? It requires 105 simulated samples
  Low occurrence cutoff
    HA=4 (0.95), z=1.96
    HA=7 (0.99), z=2.58
  Significance conditions
    1. z > cutoff
    2. PA > f x PB
  Comparison between Rodriguez-Brito's method and the z test method

Clustering-based Comparison
  Figure: GOS ORF clusters, rAB versus number of clusters
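The z test and the two significance conditions can be sketched as follows. The slides do not give the formula, so this assumes the standard pooled two-proportion z statistic, where PA and PB are the fractions of sequences from samples A and B that fall in a given cluster or family. The low-occurrence cutoff is read here as a minimum count in sample A, and the default f = 2.0 is an arbitrary illustrative value; both are assumptions, and the function names are hypothetical.

```python
import math


def z_score(count_a, n_a, count_b, n_b):
    """Pooled two-proportion z statistic (assumed form of the z test)."""
    p_a = count_a / n_a
    p_b = count_b / n_b
    pooled = (count_a + count_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se if se > 0 else 0.0


def over_represented_in_a(count_a, n_a, count_b, n_b,
                          z_cutoff=1.96, f=2.0, min_count=4):
    """Apply the low-occurrence cutoff, then both significance conditions."""
    if count_a < min_count:
        return False  # too few occurrences to test (assumed reading of HA)
    z = z_score(count_a, n_a, count_b, n_b)
    p_a = count_a / n_a
    p_b = count_b / n_b
    return z > z_cutoff and p_a > f * p_b


# Toy counts (made up): a cluster holding 40 of 1000 reads in A, 10 of 1000 in B
z = z_score(40, 1000, 10, 1000)
flag = over_represented_in_a(40, 1000, 10, 1000)
```

Unlike a resampling approach, the statistic is computed in closed form from four counts, so it can be evaluated for every cluster or family without generating simulated samples.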

Conclusion
  RAMMCAP improves performance
    CD-HIT
    z test
  Novel protein families discovery
    ORF clustering
  Metagenome comparison
    Cluster-based
    Protein family-based

Discussion
  How much of the improvement applies to RNA prediction?
  How to determine the significance factor f in PA > f x PB (f>1)?
