Data Mining: Concepts and Techniques


Jiawei Han and Micheline Kamber

Chapter 1. Introduction
  Motivation: Why data mining?
  What is data mining?
  Data mining: On what kind of data?
  Data mining functionality
  Classification of data mining systems
  Top-10 most popular data mining algorithms
  Major issues in data mining
  Overview of the course

Why Data Mining?
  The explosive growth of data: from terabytes to petabytes
    Data collection and data availability
      Automated data collection tools, database systems, Web, computerized society
    Major sources of abundant data
      Business: Web, e-commerce, transactions, stocks
      Science: remote sensing, bioinformatics, scientific simulation
      Society and everyone: news, digital cameras, YouTube
  We are drowning in data, but starving for knowledge!
  "Necessity is the mother of invention": data mining, the automated analysis of massive data sets


Evolution of Sciences
  Before 1600: empirical science
  1600-1950s: theoretical science
    Each discipline has grown a theoretical component. Theoretical models often motivate experiments and generalize our understanding.
  1950s-1990s: computational science
    Over the last 50 years, most disciplines have grown a third, computational branch (e.g., empirical, theoretical, and computational ecology, or physics, or linguistics).
    Computational science traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models.
  1990-now: data science
    The flood of data from new scientific instruments and simulations
    The ability to economically store and manage petabytes of data online
    The Internet and computing Grid that makes all these archives universally accessible
    Scientific information management, acquisition, organization, query, and visualization tasks scale almost linearly with data volumes. Data mining is a major new challenge!
  Jim Gray and Alex Szalay, The World Wide Telescope: An Archetype for Online Science, Comm. ACM, 45(11): 50-54, Nov. 2002

Evolution of Database Technology
  1960s: Data collection, database creation, IMS and network DBMS
  1970s: Relational data model, relational DBMS implementation
  1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.); application-oriented DBMS (spatial, scientific, engineering, etc.)
  1990s: Data mining, data warehousing, multimedia databases, and Web databases
  2000s: Stream data management and mining; data mining and its applications; Web technology (XML, data integration) and global information systems

What Is Data Mining?
  Data mining (knowledge discovery from data)
    Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
    Data mining: a misnomer?
  Alternative names
    Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
  Watch out: Is everything "data mining"?
    Simple search and query processing
    (Deductive) expert systems

Knowledge Discovery (KDD) Process
  Data mining is the core of the knowledge discovery process
  [Figure: Databases -> Data Cleaning, Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation]

Data Mining and Business Intelligence
  [Figure: pyramid of layers; the potential to support business decisions increases toward the top]
  Layers, top to bottom:
    Decision Making
    Data Presentation: Visualization Techniques
    Data Mining: Information Discovery
    Data Exploration: Statistical Summary, Querying, and Reporting
    Data Preprocessing/Integration, Data Warehouses
    Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
  Roles, top to bottom: End User, Business Analyst, Data Analyst, DBA

Data Mining: Confluence of Multiple Disciplines
  [Figure: diagram not preserved in this transcript]
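The stages of the KDD process listed above (cleaning, integration, selection, mining, pattern evaluation) can be sketched as a chain of small functions. This is only an illustrative toy, not the slides' own implementation: the record fields (customer, item, region), the counting-based "mining" step, and the min_count threshold are all assumptions made for the example.

```python
from collections import Counter

def clean(records):
    # Data cleaning: drop records with a missing customer, item, or region.
    return [r for r in records
            if all(r.get(k) for k in ("customer", "item", "region"))]

def integrate(*sources):
    # Data integration: merge sources and remove exact duplicate records.
    seen, merged = set(), []
    for source in sources:
        for r in source:
            key = (r["customer"], r["item"], r["region"])
            if key not in seen:
                seen.add(key)
                merged.append(r)
    return merged

def select(records, region):
    # Selection: keep only the task-relevant data.
    return [r for r in records if r["region"] == region]

def mine(records):
    # Data mining (toy stand-in): count the records for each item.
    return Counter(r["item"] for r in records)

def evaluate(patterns, min_count):
    # Pattern evaluation: keep patterns that pass a simple threshold.
    return {item: n for item, n in patterns.items() if n >= min_count}

def kdd_pipeline(sources, region, min_count):
    cleaned = [clean(s) for s in sources]
    warehouse = integrate(*cleaned)
    relevant = select(warehouse, region)
    return evaluate(mine(relevant), min_count)

# Made-up data from two hypothetical sources.
web = [
    {"customer": "a", "item": "beer", "region": "west"},
    {"customer": "b", "item": "beer", "region": "west"},
    {"customer": None, "item": "milk", "region": "west"},  # dropped by clean()
]
store = [
    {"customer": "a", "item": "beer", "region": "west"},   # duplicate of a web record
    {"customer": "c", "item": "milk", "region": "east"},
    {"customer": "d", "item": "diapers", "region": "west"},
]

result = kdd_pipeline([web, store], region="west", min_count=2)
print(result)  # {'beer': 2}
```

Keeping each stage as a separate function mirrors the slide's point that data mining is one step inside a larger process: most of the code here prepares or filters data rather than mining it.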

Why Not Traditional Data Analysis?
  Tremendous amount of data
    Algorithms must be highly scalable to handle data volumes such as terabytes
  High dimensionality of data
    A micro-array may have tens of thousands of dimensions
  High complexity of data
    Data streams and sensor data
    Time-series data, temporal data, sequence data
    Structure data, graphs, social networks, and multi-linked data
    Heterogeneous databases and legacy databases
    Spatial, spatiotemporal, multimedia, text, and Web data
    Software programs, scientific simulations
  New and sophisticated applications

Multi-Dimensional View of Data Mining
  Data to be mined
    Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW
  Knowledge to be mined
    Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
    Multiple/integrated functions and mining at multiple levels
  Techniques utilized
    Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc.
  Applications adapted
    Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.

Data Mining: Classification Schemes
  General functionality
    Descriptive data mining
    Predictive data mining
  Different views lead to different classifications
    Data view: Kinds of data to be mined
    Knowledge view: Kinds of knowledge to be discovered
    Method view: Kinds of techniques utilized
    Application view: Kinds of applications adapted

Data Mining: On What Kinds of Data?
  Database-oriented data sets and applications
    Relational database, data warehouse, transactional database
  Advanced data sets and advanced applications
    Data streams and sensor data
    Time-series data, temporal data, sequence data (incl. bio-sequences)
    Structure data, graphs, social networks, and multi-linked data
    Object-relational databases
    Heterogeneous databases and legacy databases
    Spatial data and spatiotemporal data
    Multimedia database
    Text databases
    The World-Wide Web

Data Mining Functionalities
  Multidimensional concept description: characterization and discrimination
    Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
  Frequent patterns, association, correlation vs. causality
    Diaper -> Beer [0.5%, 75%] (correlation or causality?)
  Classification and prediction
    Construct models (functions) that describe and distinguish classes or concepts for future prediction
      E.g., classify countries based on (climate), or classify cars based on (gas mileage)
    Predict some unknown or missing numerical values

Data Mining Functionalities (2)
  Cluster analysis
    Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
    Maximizing intra-class similarity and minimizing interclass similarity
  Outlier analysis
    Outlier: a data object that does not comply with the general behavior of the data
    Noise or exception? Useful in fraud detection and rare events analysis
  Trend and evolution analysis
    Trend and deviation: e.g., regression analysis
    Sequential pattern mining: e.g., digital camera -> large SD memory
    Periodicity analysis
    Similarity-based analysis
  Other pattern-directed or statistical analyses
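The bracketed pair on the Diaper/Beer association rule above is conventionally read as [support, confidence]: support is the fraction of all transactions that contain both items, and confidence is the fraction of diaper transactions that also contain beer. A minimal sketch of both measures follows, using a made-up list of baskets (the data and function names are assumptions for illustration, so the support printed here differs from the slide's 0.5%).

```python
def support(transactions, itemset):
    # Fraction of transactions that contain every item in itemset.
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    # Of the transactions containing the antecedent, the fraction that
    # also contain the consequent: support(A and B) / support(A).
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0

# Eight hypothetical market baskets.
baskets = [
    {"diaper", "beer", "chips"},
    {"diaper", "beer"},
    {"diaper", "beer", "milk"},
    {"diaper", "milk"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"chips"},
]

rule_support = support(baskets, {"diaper", "beer"})
rule_confidence = confidence(baskets, {"diaper"}, {"beer"})
print(f"Diaper -> Beer [{rule_support:.1%}, {rule_confidence:.1%}]")
# prints: Diaper -> Beer [37.5%, 75.0%]
```

A high confidence only says the two items co-occur; as the slide's question hints, it does not establish that buying one causes buying the other.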

Top-10 Most Popular DM Algorithms: 18 Identified Candidates (I)
  Classification
    1. C4.5: Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
    2. CART: L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth, 1984.
    3. K Nearest Neighbours (kNN): Hastie, T. and Tibshirani, R. 1996. Discriminant Adaptive Nearest Neighbor Classification. TPAMI. 18(6)
    4. Naive Bayes: Hand, D.J., Yu, K., 2001. Idiot's Bayes: Not So Stupid After All? Internat. Statist. Rev. 69, 385-398.
  Statistical Learning
    5. SVM: Vapnik, V. N. 1995. The Nature of Statistical Learning Theory. Springer-Verlag.
    6. EM: McLachlan, G. and Peel, D. (2000). Finite Mixture Models. J. Wiley, New York.
  Association Analysis
    7. Apriori: Rakesh Agrawal and Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules. In VLDB '94.
    8. FP-Tree: Han, J., Pei, J., and Yin, Y. 2000. Mining frequent patterns without candidate generation. In SIGMOD '00.

The 18 Identified Candidates (II)
  Link Mining
    9. PageRank: Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual Web search engine. In WWW-7, 1998.
    10. HITS: Kleinberg, J. M. 1998. Authoritative sources in a hyperlinked environment. SODA, 1998.
  Clustering
    11. K-Means: MacQueen, J. B., Some methods for classification and analysis of multivariate observations, in Proc. 5th Berkeley Symp. Mathematical Statistics and Probability, 1967.
    12. BIRCH: Zhang, T., Ramakrishnan, R., and Livny, M. 1996. BIRCH: an efficient data clustering method for very large databases. In SIGMOD '96.
  Bagging and Boosting
    13. AdaBoost: Freund, Y. and Schapire, R. E. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 1 (Aug. 1997), 119-139.

The 18 Identified Candidates (III)
  Sequential Patterns
    14. GSP: Srikant, R. and Agrawal, R. 1996. Mining Sequential Patterns: Generalizations and Performance Improvements. In Proceedings of the 5th International Conference on Extending Database Technology, 1996.
    15. PrefixSpan: J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In ICDE '01.
  Integrated Mining
    16. CBA: Liu, B., Hsu, W., and Ma, Y. M. Integrating classification and association rule mining. KDD-98.
  Rough Sets
    17. Finding reduct: Zdzislaw Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, Norwell, MA, 1992
  Graph Mining
    18. gSpan: Yan, X. and Han, J. 2002. gSpan: Graph-Based Substructure Pattern Mining. In ICDM '02.
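To make one of the candidates concrete, here is a minimal sketch of the K-Means idea (candidate 11 in the list above): alternate between assigning each point to its nearest center and moving each center to the mean of its assigned points. The sketch is restricted to one-dimensional data and takes the starting centers as an argument. The function name, the house-price data, and the stopping rule are assumptions for illustration and are not taken from the cited paper.

```python
def kmeans_1d(points, centers, max_iter=100):
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # centers stopped moving, so the assignment is stable
        centers = new_centers
    return centers, clusters

# Hypothetical house prices (in thousands) forming two obvious groups.
prices = [100, 110, 120, 300, 310, 320]
centers, clusters = kmeans_1d(prices, centers=[100, 320])
print(centers)   # [110.0, 310.0]
print(clusters)  # [[100, 110, 120], [300, 310, 320]]
```

The result depends on the starting centers, which is why practical implementations usually try several initializations.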

Top-10 Algorithms Finally Selected at ICDM'06
  #1: C4.5 (61 votes)
  #2: K-Means (60 votes)
  #3: SVM (58 votes)
  #4: Apriori (52 votes)
  #5: EM (48 votes)
  #6: PageRank (46 votes)
  #7: AdaBoost (45 votes)
  #7: kNN (45 votes)
  #7: Naive Bayes (45 votes)
  #10: CART (34 votes)

Major Issues in Data Mining
  Mining methodology
    Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web
    Performance: efficiency, effectiveness, and scalability
    Pattern evaluation: the interestingness problem
    Incorporation of background knowledge
    Handling noise and incomplete data
    Parallel, distributed, and incremental mining methods
    Integration of the discovered knowledge with existing knowledge: knowledge fusion
  User interaction
    Data mining query languages and ad-hoc mining
    Expression and visualization of data mining results
    Interactive mining of knowledge at multiple levels of abstraction
  Applications and social impacts
    Domain-specific data mining and invisible data mining
    Protection of data security, integrity, and privacy

A Brief History of Data Mining Society
  1989 IJCAI Workshop on Knowledge Discovery in Databases
    Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
  1991-1994 Workshops on Knowledge Discovery in Databases
    Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
  1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD'95-98)
    Journal of Data Mining and Knowledge Discovery (1997)
  ACM SIGKDD conferences since 1998 and SIGKDD Explorations
  More conferences on data mining
    PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM (2001), etc.
  ACM Transactions on KDD starting in 2007

Conferences and Journals on Data Mining
  KDD conferences
    ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)
    SIAM Data Mining Conf. (SDM)
    (IEEE) Int. Conf. on Data Mining (ICDM)
    Conf. on Principles and Practices of Knowledge Discovery and Data Mining (PKDD)
    Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
  Other related conferences
    ACM SIGMOD
    VLDB
    (IEEE) ICDE
    WWW, SIGIR
    ICML, CVPR, NIPS
  Journals
    Data Mining and Knowledge Discovery (DAMI or DMKD)
    IEEE Trans. on Knowledge and Data Eng. (TKDE)
    KDD Explorations
    ACM Trans. on KDD

Where to Find References: DBLP, CiteSeer, Google
  Data mining and KDD (SIGKDD: CDROM)
    Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.
    Journals: Data Mining and Knowledge Discovery, KDD Explorations, ACM TKDD
  Database systems (SIGMOD: ACM SIGMOD Anthology, CD ROM)
    Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA
    Journals: IEEE-TKDE, ACM-TODS/TOIS, JIIS, J. ACM, VLDB J., Info. Sys., etc.
  AI and machine learning
    Conferences: Machine learning (ML), AAAI, IJCAI, COLT (Learning Theory), CVPR, NIPS, etc.
    Journals: Machine Learning, Artificial Intelligence, Knowledge and Information Systems, IEEE-PAMI, etc.
  Web and IR
    Conferences: SIGIR, WWW, CIKM, etc.
    Journals: WWW: Internet and Web Information Systems
  Statistics
    Conferences: Joint Stat. Meeting, etc.
    Journals: Annals of Statistics, etc.
  Visualization
    Conference proceedings: CHI, ACM-SIGGraph, etc.
    Journals: IEEE Trans. Visualization and Computer Graphics, etc.

Recommended Reference Books
  S. Chakrabarti. Mining the Web: Statistical Analysis of Hypertext and Semi-Structured Data. Morgan Kaufmann, 2002
  R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2nd ed. Wiley-Interscience, 2000
  T. Dasu and T. Johnson. Exploratory Data Mining and Data Cleaning. John Wiley & Sons, 2003
  U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996
  U. Fayyad, G. Grinstein, and A. Wierse. Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, 2001
  J. Han and M. Kamber. Data Mining: Concepts and Techniques, 2nd ed. Morgan Kaufmann, 2006
  D. J. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. MIT Press, 2001
  T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2001
  B. Liu. Web Data Mining. Springer, 2006
  T. M. Mitchell. Machine Learning. McGraw Hill, 1997
  G. Piatetsky-Shapiro and W. J. Frawley. Knowledge Discovery in Databases. AAAI/MIT Press, 1991
  P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Wiley, 2005
  S. M. Weiss and N. Indurkhya. Predictive Data Mining. Morgan Kaufmann, 1998
  I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2nd ed. Morgan Kaufmann, 2005


Summary
  Data mining: discovering interesting patterns from large amounts of data
  A natural evolution of database technology, in great demand, with wide applications
  A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
  Mining can be performed in a variety of information repositories
  Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
  Data mining systems and architectures
  Major issues in data mining

Supplementary Lecture Slides
  Note: The slides following the end-of-chapter summary are supplementary slides that could be useful for supplementary reading or teaching.
  These slides may have corresponding text in the book chapters, but were omitted from the author's own course lectures due to limited time.
  The slides in other chapters follow a similar convention and treatment.

Why Data Mining? Potential Applications
  Data analysis and decision support
    Market analysis and management
      Target marketing, customer relationship management (CRM), market basket analysis, cross selling, market segmentation
    Risk analysis and management
      Forecasting, customer retention, improved underwriting, quality control, competitive analysis
    Fraud detection and detection of unusual patterns (outliers)
  Other applications
    Text mining (news group, email, documents) and Web mining
    Stream data mining
    Bioinformatics and bio-data analysis

Ex. 1: Market Analysis and Management
  Where does the data come from? Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
  Target marketing
    Find clusters of "model" customers who share the same characteristics: interest, income level, spending habits, etc.
    Determine customer purchasing patterns over time
  Cross-market analysis: find associations/co-relations between product sales, and predict based on such associations
  Customer profiling: what types of customers buy what products (clustering or classification)
  Customer requirement analysis
    Identify the best products for different groups of customers
    Predict what factors will attract new customers
  Provision of summary information
    Multidimensional summary reports
    Statistical summary information (data central tendency and variation)

Ex. 2: Corporate Analysis and Risk Management
  Finance planning and asset evaluation
    Cash flow analysis and prediction
    Contingent claim analysis to evaluate assets
    Cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)
  Resource planning
    Summarize and compare the resources and spending
  Competition
    Monitor competitors and market directions
    Group customers into classes and a class-based pricing procedure
    Set pricing strategy in a highly competitive market

Ex. 3: Fraud Detection and Mining Unusual Patterns
  Approaches: clustering and model construction for frauds, outlier analysis
  Applications: health care, retail, credit card service, telecomm.
    Auto insurance: ring of collisions
    Money laundering: suspicious monetary transactions
    Medical insurance
      Professional patients, ring of doctors, and ring of references
      Unnecessary or correlated screening tests
    Telecommunications: phone-call fraud
      Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm
    Retail industry
      Analysts estimate that 38% of retail shrink is due to dishonest employees
    Anti-terrorism
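The phone-call model above flags patterns that deviate from an expected norm. One simple way to sketch that idea is to summarize a customer's past call durations by their mean and standard deviation, then flag a new call whose z-score exceeds a threshold. The z-score test, the threshold of 3.0, and the sample durations are assumptions chosen for illustration; the slides do not specify a particular method.

```python
from statistics import mean, stdev

def build_norm(history):
    # The "expected norm": mean and sample standard deviation of past values.
    return mean(history), stdev(history)

def is_unusual(value, norm, threshold=3.0):
    # Flag a value that lies more than `threshold` standard deviations
    # away from the historical mean.
    mu, sigma = norm
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Made-up past call durations in minutes for one customer.
history = [3, 4, 5, 4, 3, 5, 4, 4]
norm = build_norm(history)

new_calls = [4, 6, 60]
flags = [is_unusual(d, norm) for d in new_calls]
print(flags)  # [False, False, True]
```

Building the norm from history and scoring new calls separately keeps an extreme call from inflating the very statistics used to judge it. A fuller model would also use the other attributes the slide mentions, such as destination and time of day or week.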

Architecture: Typical Data Mining System
