Cloud Computing and Big Data Processing
Shivaram Venkataraman
UC Berkeley, AMP Lab


Slides from Matei Zaharia

Cloud Computing, Big Data

Hardware


Software
Open MPI

Google 1997
Data, Data, Data
“Storage space must be used efficiently to store indices and, optionally, the documents themselves. The indexing system must process hundreds of gigabytes of data efficiently.”

Google 2001
Commodity CPUs
Lots of disks
Low-bandwidth network
Cheap!

Datacenter Evolution
Facebook’s daily logs: 60 TB
1000 Genomes Project: 200 TB
Google web index: 10+ PB
(IDC report)
Slide from Ion Stoica

Datacenter Evolution
Google data centers in The Dalles, Oregon
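To see why “lots of disks” matters at these data sizes, here is a rough scan-time estimate. It is a sketch that assumes 100 MB/s of sequential throughput per disk (an assumed round number, not a figure from the slides) and perfect parallelism across disks; `scan_hours` is a hypothetical helper.

```python
# Back-of-the-envelope scan time. The 100 MB/s per-disk sequential
# throughput is an assumed round number, not a figure from the slides.
DISK_MB_PER_S = 100.0


def scan_hours(dataset_tb: float, num_disks: int, mb_per_s: float = DISK_MB_PER_S) -> float:
    """Hours to read a dataset once, assuming perfect parallelism across disks."""
    total_mb = dataset_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    seconds = total_mb / (mb_per_s * num_disks)
    return seconds / 3600


if __name__ == "__main__":
    datasets = [("Facebook daily logs", 60), ("1000 Genomes", 200), ("Google web index", 10_000)]
    for name, tb in datasets:
        print(f"{name}: {scan_hours(tb, 1):,.0f} h on 1 disk, "
              f"{scan_hours(tb, 10_000):.2f} h on 10,000 disks")
```

Under these assumptions a single disk needs roughly 27,800 hours (over three years) to read 10 PB once, while 10,000 disks working in parallel finish in under three hours.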

Datacenter Evolution
Capacity: ~10,000 machines
Bandwidth: 12-24 disks per node
Latency: 256 GB RAM cache

Datacenter Networking
Initially tree topology
Oversubscribed links
Fat tree, BCube, VL2, etc.
Lots of research to get full bisection bandwidth

Datacenter Design
Goals
Power usage effectiveness (PUE)
Cost-efficiency
Custom machine design
Open Compute Project (Facebook)
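The networking and design slides mention oversubscribed links, full bisection bandwidth, and PUE. The helpers below are a hypothetical back-of-the-envelope sketch of those quantities: oversubscription as offered host bandwidth divided by uplink capacity, the host count of a three-tier fat tree built from k-port switches (k³/4), and PUE as total facility power divided by IT equipment power. The example rack and power figures are made up for illustration.

```python
# Back-of-the-envelope datacenter metrics. All example numbers are
# hypothetical; only the formulas are standard definitions.


def oversubscription(hosts: int, host_gbps: float, uplinks: int, uplink_gbps: float) -> float:
    """Offered host bandwidth divided by uplink capacity.

    1.0 means no oversubscription at this tier; > 1.0 means oversubscribed.
    """
    return (hosts * host_gbps) / (uplinks * uplink_gbps)


def fat_tree_hosts(k: int) -> int:
    """Hosts supported at full bisection bandwidth by a three-tier fat tree of k-port switches."""
    if k % 2:
        raise ValueError("k must be even")
    return k ** 3 // 4


def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power (ideal = 1.0)."""
    return total_facility_kw / it_equipment_kw


if __name__ == "__main__":
    # Hypothetical rack: 40 servers at 10 Gbps sharing 4 x 40 Gbps uplinks.
    print(oversubscription(40, 10, 4, 40))  # 400 / 160 = 2.5
    # Hypothetical 48-port switches.
    print(fat_tree_hosts(48))
    # Hypothetical facility drawing 1200 kW to power 1000 kW of IT load.
    print(pue(1200, 1000))
```

With 48-port switches the fat tree supports 27,648 hosts; the hypothetical rack is oversubscribed 2.5:1, and the hypothetical facility has a PUE of 1.2.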

Datacenters

Cloud Computing
“Long-held dream of computing as a utility”
From mid-2006
Rent virtual computers in the “Cloud”
On-demand machines, spot pricing

Amazon EC2
1 ECU = CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor
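Renting on demand means cost scales with machine-hours used rather than with hardware owned. A minimal sketch, assuming made-up hourly prices (not real EC2 rates) and ignoring that spot instances can be reclaimed; `job_cost`, `ON_DEMAND`, and `SPOT` are hypothetical names.

```python
# Pay-as-you-go cost sketch. Hourly prices are hypothetical placeholders,
# not actual EC2 rates; real spot prices vary and instances can be reclaimed.
ON_DEMAND = 0.50  # hypothetical $/machine-hour
SPOT = 0.15       # hypothetical $/machine-hour


def job_cost(machines: int, hours: float, price_per_machine_hour: float) -> float:
    """Cost of renting `machines` identical instances for `hours` each."""
    return machines * hours * price_per_machine_hour


if __name__ == "__main__":
    print(job_cost(1, 1000, ON_DEMAND))  # 1 machine for 1000 hours
    print(job_cost(1000, 1, ON_DEMAND))  # 1000 machines for 1 hour
    print(job_cost(1000, 1, SPOT))       # same job at the spot price
```

Under these assumptions, 1 machine for 1000 hours and 1000 machines for 1 hour both cost $500, so finishing 1000 times sooner costs nothing extra; the hypothetical spot rate brings the same job down to $150.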

Hardware
Hopper vs. Datacenter²
² http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/

Summary
Focus on Storage vs. FLOPS
Scale out with commodity components
Pay-as-you-go model

Jeff Dean @ Google

How do we program this?

Programming Models
Message Passing Models (MPI)
Fine-grained messages + computation
Hard to deal with disk locality, failures, stragglers
1 server fails every 3 years, so 10K nodes see ~10 faults/day

Programming Models
Data Parallel Models
Restrict the programming interface
Automatically handle failures, locality, etc.
“Here’s an operation, run it on all of the data”
I don’t care where it runs (you schedule that)
In fact, feel free to retry it on different nodes
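The failure-rate line is simple arithmetic: with N servers that each fail on average once every 3 years, the cluster sees about N / (3 × 365) failures per day. A sketch, assuming independent failures and a 365-day year; `faults_per_day` is a hypothetical helper.

```python
# Expected failures per day for a cluster, assuming independent failures
# and a mean time between failures (MTBF) per server given in years.
DAYS_PER_YEAR = 365


def faults_per_day(nodes: int, mtbf_years: float) -> float:
    """Each server contributes 1 / (MTBF in days) failures per day on average."""
    return nodes / (mtbf_years * DAYS_PER_YEAR)


if __name__ == "__main__":
    print(f"{faults_per_day(10_000, 3):.1f} faults/day")
```

For 10,000 nodes this gives about 9.1 faults/day, which rounds to the ~10 on the slide: at this scale some machine is failing every few hours.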


MapReduce
Google 2004
Build search index
Compute PageRank
Hadoop: Open-source at Yahoo, Facebook

MapReduce Programming Model
Data type: Each record is (key, value)
Map function:
(K_in, V_in) → list(K_inter, V_inter)
Reduce function:
(K_inter, list(V_inter)) → list(K_out, V_out)

Example: Word Count
def mapper(line):
    # Emit (word, 1) for every word in the line
    for word in line.split():
        yield (word, 1)

def reducer(key, values):
    # Sum all the counts emitted for this word
    yield (key, sum(values))
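The programming model can be simulated on one machine to make the data flow concrete. This is a hypothetical, single-process sketch (`run_mapreduce` is not a Hadoop or Google API): it applies the mapper to every record, groups intermediate values by key (the shuffle), sorts the keys, and calls the reducer once per key. Here `yield` plays the role of the slide's emit step. It has none of the distribution, locality handling, or fault tolerance of a real MapReduce system.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in the line.
    for word in line.split():
        yield (word, 1)


def reducer(key: str, values: List[int]) -> Iterator[Tuple[str, int]]:
    # Sum all the counts emitted for this word.
    yield (key, sum(values))


def run_mapreduce(records: Iterable[str], map_fn: Callable, reduce_fn: Callable) -> List[Tuple]:
    """Single-process sketch of map -> shuffle & sort -> reduce."""
    # Map phase, with the shuffle folded in: group intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # Sort phase: visit keys in order; reduce phase: one call per key.
    results: List[Tuple] = []
    for key in sorted(groups):
        results.extend(reduce_fn(key, groups[key]))
    return results


if __name__ == "__main__":
    lines = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]
    for word, count in run_mapreduce(lines, mapper, reducer):
        print(word, count)
```

On the three input lines from the word-count execution slide, it returns the same counts as the two reducers there: ("the", 3), ("brown", 2), ("fox", 2), and every other word once.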

Word Count Execution
[Figure: Input → Map → Shuffle & Sort → Reduce → Output. The input lines “the quick brown fox”, “the fox ate the mouse” and “how now brown cow” go to three map tasks, which emit (word, 1) pairs. After shuffle and sort, one reduce task outputs brown 2, fox 2, how 1, now 1, the 3, and the other outputs ate 1, cow 1, mouse 1, quick 1.]

Word Count Execution
Automatically split work
Schedule tasks with locality
[Figure: Submit a Job → JobTracker]

Fault Recovery
If a task crashes:
Retry on another node
If the same task repeatedly fails, end the job
Requires user code to be deterministic
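The fault-recovery policy can be sketched as a retry loop. This is a hypothetical illustration, not the JobTracker's actual code: `run_with_retries`, `JobFailed`, and the default of 4 attempts are assumptions. Each attempt runs the task on the next node in the list; if every attempt fails, the whole job is ended. Re-executing a task is only safe because its output is assumed to depend solely on its input, which is why the slide requires deterministic user code.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class JobFailed(Exception):
    """Raised when the same task keeps failing, so the whole job is ended."""


def run_with_retries(task: Callable[[str], T], nodes: List[str], max_attempts: int = 4) -> T:
    """Run `task` on successive nodes until it succeeds or attempts run out.

    Hypothetical sketch: assumes `task` is deterministic, so re-running it
    on another node produces the same result as the crashed attempt would have.
    """
    last_error = None
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]  # cycle through the nodes, one per attempt
        try:
            return task(node)
        except Exception as err:  # treat any exception as a task crash
            last_error = err
    raise JobFailed(f"task failed {max_attempts} times") from last_error


if __name__ == "__main__":
    def flaky_word_count(node: str) -> int:
        # Simulated crash on one particular (hypothetical) node.
        if node == "node-1":
            raise RuntimeError(f"crash on {node}")
        return len("the fox ate the mouse".split())

    print(run_with_retries(flaky_word_count, ["node-1", "node-2", "node-3"]))
```

Here the simulated crash on `node-1` is absorbed by the retry on `node-2`, and the call returns 5.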

~250 in person, 3000 online
http://ampcamp.berkeley.edu
Hands-on exercises using Spark, Shark, etc.

Course Project Ideas
Linear algebra on commodity clusters
Optimizing algorithms
Cost model for datacenter topology
Measurement studies
Comparing EC2 vs. Hopper
Optimizing BLAS for virtual machines

Conclusion
Commodity clusters needed for big data
Key challenges: fault tolerance, stragglers
Data-parallel models: MapReduce and Spark
Simplify programming
Handle faults automatically
