Peta-Graph Mining


Christos Faloutsos, Aditya Prakash, Suyash Shringarpure, Charalampos Tsourakakis, Ana Appel, Polo Chau, Jure Leskovec, U Kang

Our goal: a one-stop solution for mining huge graphs

Outline
Datasets:
Synthetic (‘Kronecker’, ~300M nodes, 1B edges)
NetFlix (20K movies, ~500K users, 100M edges)


Degree Distributions – NetFlix
[Plot: movie in-degree count, with a ‘theoretically expected’ curve; 100 machines, 8 min]
[Plot: user out-degree count, with a ‘theoretically expected’ curve; 100 machines, 8 min]
Sharp drop below 100 ratings

Degree Distributions – Kronecker
Nodes: 259M, edges: 1B
[Plot: degree count; 100 machines, 6 h]

Degree Distributions – timings
[Plot: time (sec) vs. edge file size (MB), for 1, 24, and 48 tasks]
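The slides report degree-distribution results but not how they were computed. As a rough single-machine illustration, the sketch below mirrors a two-stage count: first a degree per node, then the number of nodes per degree. The function name `degree_distributions` and the (user, movie) edge layout are assumptions made for this example, not details from the talk.

```python
from collections import Counter


def degree_distributions(edges):
    """Return (out_degree_histogram, in_degree_histogram) for directed edges.

    Each histogram maps a degree value to the number of nodes with that degree.
    """
    out_degree = Counter()
    in_degree = Counter()
    # Stage 1: count edges per node (the role a first reduce step would play).
    for src, dst in edges:
        out_degree[src] += 1
        in_degree[dst] += 1
    # Stage 2: count nodes per degree value (a second reduce step).
    return Counter(out_degree.values()), Counter(in_degree.values())


# Hypothetical toy data: (user, movie) rating edges.
ratings = [("u1", "m1"), ("u1", "m2"), ("u2", "m1"), ("u3", "m1")]
user_hist, movie_hist = degree_distributions(ratings)
print(dict(user_hist))   # users per out-degree
print(dict(movie_hist))  # movies per in-degree
```

Both stages are plain group-and-count operations, which is why this kind of job spreads naturally across many machines.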

Diameter
Diameter of a graph: the maximum shortest path
Normally, > O(N^2)
ANF, the ‘Approximate Neighborhood Function’ [Palmer+02]: O(E)
Goal: calculate the neighborhood function
Neighborhood N(h): number of pairs of nodes within distance h

Diameter
For large jobs, parallelization helps
Unstable results due to shared machines
[Plot: time (min) vs. edge file size (MB), for 1, 28, and 48 nodes]
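To make the neighborhood function N(h) concrete, here is a minimal exact version for small graphs. It is not ANF itself: it keeps an explicit set of reachable nodes per node, where ANF replaces those sets with small probabilistic sketches so that each round costs one pass over the edges. The function name, the undirected treatment of edges, and the convention of counting ordered pairs (each node paired with itself included) are assumptions made for this sketch.

```python
def neighborhood_function(edges, max_hops=64):
    """Return [N(0), N(1), ...] for an undirected graph given as (u, v) pairs.

    N(h) counts ordered pairs (u, v) with distance(u, v) <= h, including u == v.
    The list stops at the first h where N(h) no longer grows.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # reach[u] holds every node within the current number of hops of u.
    reach = {u: {u} for u in neighbors}
    counts = [sum(len(s) for s in reach.values())]
    for _ in range(max_hops):
        # One round: each node absorbs what its neighbors could already reach.
        new_reach = {
            u: reach[u].union(*(reach[v] for v in neighbors[u]))
            for u in neighbors
        }
        total = sum(len(s) for s in new_reach.values())
        if total == counts[-1]:
            break
        counts.append(total)
        reach = new_reach
    return counts


# Path graph 0 - 1 - 2 - 3: the farthest pair is three hops apart.
hop_counts = neighborhood_function([(0, 1), (1, 2), (2, 3)])
diameter = len(hop_counts) - 1
print(hop_counts, diameter)
```

The diameter falls out as the first h at which N(h) stops increasing, which is what a hop plot shows graphically.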

Diameter / Hop Plot (Netflix)
[Plot: # of reachable pairs within <= h hops vs. h, the # of hops]
Diameter: 3

Community detection
Cross associations [Chakrabarti+ ’04]

Triangles
‘Friends of friends are friends’
Naïve algo: 3-way join (slow)
[Tsourakakis’08]: triangles ~ sum of cubes of eigenvalues
Thus, super-fast computation of triangles (100x - 25,000x faster than naïve; >95% accuracy)
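The eigenvalue claim rests on a standard identity: for an undirected simple graph with adjacency matrix A, the number of triangles equals trace(A^3) / 6, and trace(A^3) is the sum of the cubes of the eigenvalues of A. The sketch below checks this on a toy graph. It assumes a dense adjacency matrix and uses a small Jacobi eigenvalue routine as a stand-in for Lanczos, which would be the practical choice on a large sparse graph; all function names are hypothetical.

```python
import math
from itertools import combinations


def count_triangles_naive(adj):
    """Count triangles by testing every unordered triple of nodes."""
    n = len(adj)
    return sum(
        1
        for i, j, k in combinations(range(n), 3)
        if adj[i][j] and adj[j][k] and adj[i][k]
    )


def symmetric_eigenvalues(matrix, max_sweeps=50, tol=1e-20):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, tau) / (abs(tau) + math.hypot(1.0, tau))
                c = 1.0 / math.hypot(1.0, t)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def triangles_from_eigenvalues(eigenvalues):
    """Triangle count as (sum of cubed eigenvalues) / 6."""
    return sum(lam ** 3 for lam in eigenvalues) / 6.0


# Toy graph: two triangles sharing node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
n = 5
adj = [[0] * n for _ in range(n)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

exact = count_triangles_naive(adj)
estimate = triangles_from_eigenvalues(symmetric_eigenvalues(adj))
print(exact, round(estimate, 6))
```

With the full spectrum the two counts agree up to rounding. The speedup quoted on the slide presumably comes from computing only a few dominant eigenvalues, since cubing makes the largest ones dominate the sum; that reading is an assumption, not something the slide spells out.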

Triangles
Easy to implement on Hadoop: it only needs eigenvalues (to do, with Lanczos)

Visualization
Principled visualization of large graphs (show the few most ‘important’ edges)


Summary
Goal: a one-stop solution for mining huge graphs
