Big Data and Cloud Computing: Current State and Future Opportunities


EDBT 2011 Tutorial
Divy Agrawal, Sudipto Das, and Amr El Abbadi
Department of Computer Science, University of California at Santa Barbara

Why the Web is replacing the Desktop

Paradigm Shift in Computing


What is Cloud Computing?
- Delivering applications and services over the Internet: Software as a Service
- Extended to:
  - Infrastructure as a Service: Amazon EC2
  - Platform as a Service: Google AppEngine, Microsoft Azure
- Utility computing: pay-as-you-go computing
  - Illusion of infinite resources
  - No up-front cost
  - Fine-grained billing (e.g. hourly)

Cloud Computing: History

Cloud Computing: Why Now?
- Experience with very large datacenters
  - Unprecedented economies of scale
  - Transfer of risk
- Technology factors
  - Pervasive broadband Internet
  - Maturity in virtualization technology
- Business factors
  - Minimal capital expenditure
  - Pay-as-you-go billing model

Economics of Cloud Users (slide credits: Berkeley RAD Lab)
- Pay by use instead of provisioning for peak (static data center vs. data center in the cloud)
- Risk of over-provisioning: underutilization (unused resources in a static data center)
- Heavy penalty for under-provisioning:
  - Lost revenue
  - Lost users
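The trade-off between provisioning for peak and paying by use can be illustrated with a small calculation. This is only a sketch: the demand series, the capacities, and the price per server-hour are made-up values, not figures from the tutorial, and it assumes the same hourly price in both settings, which need not hold in practice.

```python
# Hypothetical demand (servers needed) in eight consecutive hours.
# These numbers are illustrative assumptions, not data from the tutorial.
demand = [20, 20, 30, 50, 90, 100, 60, 30]


def static_cost(demand, capacity, price_per_server_hour):
    """Fixed-size data center: every provisioned server is paid for every hour."""
    return capacity * price_per_server_hour * len(demand)


def pay_as_you_go_cost(demand, price_per_server_hour):
    """Cloud model: pay only for the server-hours actually used."""
    return sum(demand) * price_per_server_hour


def unused_server_hours(demand, capacity):
    """Over-provisioning: capacity that sits idle."""
    return sum(max(capacity - d, 0) for d in demand)


def unserved_server_hours(demand, capacity):
    """Under-provisioning: demand that cannot be served (lost revenue, lost users)."""
    return sum(max(d - capacity, 0) for d in demand)


peak = max(demand)
print("static, sized for peak:", static_cost(demand, peak, 1.0))
print("pay as you go:         ", pay_as_you_go_cost(demand, 1.0))
print("idle at peak capacity: ", unused_server_hours(demand, peak))
print("unserved at capacity 50:", unserved_server_hours(demand, 50))
```

With this demand curve, sizing a static data center for the peak leaves half of the paid server-hours idle, while sizing it at 50 servers leaves demand unserved during the busy hours.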

The Big Picture
- Unlike the earlier attempts:
  - Distributed Computing
  - Distributed Databases
  - Grid Computing
- Cloud Computing is likely to persist:
  - Organic growth: Google, Yahoo, Microsoft, and Amazon
  - Poised to be an integral aspect of national infrastructure in the US and other countries

Cloud Reality
- Facebook generation of application developers
- Animoto.com:
  - Started with 50 servers on Amazon EC2
  - Growth of 25,000 users/hour
  - Needed to scale to 3,500 servers in 2 days (RightScale@SantaBarbara)
- Many similar stories: RightScale, Joyent

Cloud Challenges: Elasticity

Cloud Challenges: Differential Pricing Models

Outline
- Data in the Cloud
- Platforms for Data Analysis
- Platforms for Update-intensive Workloads
- Data Platforms for Large Applications
- Multitenant Data Platforms
- Open Research Challenges

Our Data-driven World
- Science: databases from astronomy, genomics, environmental data, transportation data
- Humanities and Social Sciences: scanned books, historical documents, social interactions data
- Business & Commerce: corporate sales, stock market transactions, census, airline traffic
- Entertainment: Internet images, Hollywood movies, MP3 files
- Medicine: MRI & CT scans, patient records

Data-rich World
- Data capture and collection:
  - Highly instrumented environment
  - Sensors and smart devices
  - Network
- Data storage:
  - Seagate 1 TB Barracuda @ $72.95 from Amazon.com (7.3¢/GB)

What can we do with this wealth?
- What can we do?
  - Scientific breakthroughs
  - Business process efficiencies
  - Realistic special effects
  - Improve quality of life: healthcare, transportation, environmental disasters, daily life
- Could we do more?
  - YES: but we need major advances in our capability to analyze this data

Cloud Computing Modalities
- Hosted applications and services
  - Pay-as-you-go model
  - Scalability, fault-tolerance, elasticity, and self-manageability
- Very large data repositories
  - Complex analysis
  - Distributed and parallel data processing
- “Can we outsource our IT software and hardware infrastructure?”
- “We have terabytes of click-stream data – what can we do with it?”

Data Warehousing, Data Analytics & Decision Support Systems
- Used to manage and control business
- Transactional data: historical or point-in-time
- Optimized for inquiry rather than update
- Use of the system is loosely defined and can be ad hoc
- Used by managers and analysts to understand the business and make judgments

Data Analytics in the Web Context
- Data capture at the user-interaction level, in contrast to the client-transaction level in the enterprise context
- As a consequence, the amount of data increases significantly
- Greater need to analyze such data to understand user behaviors

Data Analytics in the Cloud
- Scalability to large data volumes:
  - Scan 100 TB on 1 node @ 50 MB/sec = 23 days
  - Scan on 1000-node cluster = 33 minutes
  - Divide-and-conquer (i.e., data partitioning)
- Cost-efficiency:
  - Commodity nodes (cheap, but unreliable)
  - Commodity network
  - Automatic fault-tolerance (fewer administrators)
  - Easy to use (fewer programmers)

Platforms for Large-scale Data Analysis
- Parallel DBMS technologies
  - Proposed in the late eighties
  - Matured over the last two decades
  - Multi-billion dollar industry: proprietary DBMS engines intended as data warehousing solutions for very large enterprises
- MapReduce
  - Pioneered by Google
  - Popularized by Yahoo! (Hadoop)

Parallel DBMS Technologies
- Popularly used for more than two decades
  - Research projects: Gamma, Grace
  - Commercial: multi-billion dollar industry, but access to only a privileged few
- Relational data model
- Indexing
- Familiar SQL interface
- Advanced query optimization
- Well understood and well studied
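The scan-time figures quoted under "Data Analytics in the Cloud" can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), an even split of the data across nodes, and no coordination overhead; the function name is chosen here for illustration.

```python
TB = 10**12  # bytes, decimal units (assumption)
MB = 10**6   # bytes, decimal units (assumption)


def scan_seconds(total_bytes, nodes, bytes_per_sec_per_node):
    """Time to scan data split evenly across nodes, ignoring all overhead."""
    return total_bytes / (nodes * bytes_per_sec_per_node)


one_node = scan_seconds(100 * TB, 1, 50 * MB)
cluster = scan_seconds(100 * TB, 1000, 50 * MB)

print(f"1 node:     {one_node / 86400:.1f} days")
print(f"1000 nodes: {cluster / 60:.1f} minutes")
```

A single node needs 2,000,000 seconds, about 23 days. With 1000 nodes each scanning one thousandth of the data, the time falls to 2,000 seconds, about 33 minutes. That ideal linear speed-up is the point of the divide-and-conquer bullet.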


MapReduce [Dean et al., OSDI 2004, CACM Jan 2008, CACM Jan 2010]
- Overview:
  - Data-parallel programming model
  - An associated parallel and distributed implementation for commodity clusters
- Pioneered by Google
  - Processes 20 PB of data per day
- Popularized by the open-source Hadoop project
  - Used by Yahoo!, Facebook, Amazon, and the list is growing

Programming Framework
- Raw input: MAP, then REDUCE

MapReduce Advantages
- Automatic parallelization:
  - Depending on the size of the raw input data, instantiate multiple MAP tasks
  - Similarly, depending on the number of intermediate partitions, instantiate multiple REDUCE tasks
- Run-time:
  - Data partitioning
  - Task scheduling
  - Handling machine failures
  - Managing inter-machine communication
- Completely transparent to the programmer/analyst/user
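The MAP and REDUCE roles described above can be made concrete with a word count run in a single process. This is an illustrative sketch only, not the Google or Hadoop API: `run_mapreduce`, `map_words`, `reduce_counts`, and the `num_reducers` parameter are hypothetical names. The scheduling, failure handling, and inter-machine communication that a real run-time provides are left out.

```python
from collections import defaultdict


def map_words(doc_id, text):
    """MAP: emit one (word, 1) pair for every word in a document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """REDUCE: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn, num_reducers=2):
    """Toy single-process driver (hypothetical, for illustration only)."""
    # Map phase: apply map_fn to each input record and partition the
    # intermediate pairs by key hash, so each key lands in one partition.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in inputs:
        for k, v in map_fn(key, value):
            partitions[hash(k) % num_reducers][k].append(v)

    # Reduce phase: one logical reduce task per intermediate partition.
    results = {}
    for partition in partitions:
        for k, values in partition.items():
            out_key, out_value = reduce_fn(k, values)
            results[out_key] = out_value
    return results


docs = [
    ("d1", "big data in the cloud"),
    ("d2", "data analysis in the cloud"),
]
counts = run_mapreduce(docs, map_words, reduce_counts)
print(sorted(counts.items()))
```

The user supplies only `map_words` and `reduce_counts`. Partitioning and the order in which tasks run stay inside the driver, which is the sense in which the run-time is transparent to the programmer.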

MapReduce Experience
- Runs on large commodity clusters: 1000s to 10,000s of machines
- Processes many terabytes of data
- Easy to use, since run-time complexity is hidden from the users
- 1000s of MR jobs/day at Google (circa 2004)
- 100s of MR programs implemented (circa 2004)

The Need
- Special-purpose programs to process large amounts of data: crawled documents, Web query logs, etc.
- At Google and others (Yahoo!, Facebook):
  - Inverted index
  - Graph structure of the Web documents
  - Summaries of pages/host, set of frequent queries, etc.
  - Ad optimization
  - Spam filtering

MapReduce Contributions
- Simple and powerful programming paradigm for large-scale data analysis
- Run-time system for large-scale parallelism and distribution

Coffee Break

References
- [Dean et al., OSDI 2004] MapReduce: Simplified Data Processing on Large Clusters, J. Dean, S. Ghemawat, In OSDI 2004
- [Dean et al., CACM 2008] MapReduce: Simplified Data Processing on Large Clusters, J. Dean, S. Ghemawat, In CACM Jan 2008
- [Dean et al., CACM 2010] MapReduce: a flexible data processing tool, J. Dean, S. Ghemawat, In CACM Jan 2010
- [Stonebraker et al., CACM 2010] MapReduce and parallel DBMSs: friends or foes?, M. Stonebraker, D. J. Abadi, D. J. DeWitt, S. Madden, E. Paulson, A. Pavlo, A. Rasin, In CACM Jan 2010
- [Pavlo et al., SIGMOD 2009] A comparison of approaches to large-scale data analysis, A. Pavlo et al., In SIGMOD 2009
- [Abouzeid et al., VLDB 2009] HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads, A. Abouzeid et al., In VLDB 2009
- [Afrati et al., EDBT 2010] Optimizing joins in a map-reduce environment, F. N. Afrati, J. D. Ullman, In EDBT 2010
- [Agrawal et al., SIGMOD 2009] Asynchronous view maintenance for VLSD databases, P. Agrawal et al., In SIGMOD 2009
- [Das et al., SIGMOD 2010] Ricardo: Integrating R and Hadoop, S. Das et al., In SIGMOD 2010
- [Cohen et al., VLDB 2009] MAD Skills: New Analysis Practices for Big Data, J. Cohen et al., In VLDB 2009
