Classification
I have a box of apples
Decision Tree Classifier
Decision Tree Classification
How do we construct the decision tree?


We have seen that we can do machine learning on data that is in a nice “flat file” format: rows are objects, columns are features. Taking a real problem and “massaging” it into this format is domain dependent, but often the most fun part of machine learning. Let’s see just one example. [Figure: a Western pipistrelle (Parastrellus hesperus), photo by Michael Durham; Western pipistrelle calls; a spectrogram of a bat call.]


We can easily measure two features of bat calls: their characteristic frequency and their call duration.

Classification. We have seen two classification techniques: the simple linear classifier and nearest neighbor. Let us see two more techniques: the decision tree and Naïve Bayes. There are other techniques (neural networks, support vector machines) that we will not consider.

I have a box of apples. If Pr(X = good) = p, then Pr(X = bad) = 1 − p, and the entropy of X is given by

    H(X) = −p log2(p) − (1 − p) log2(1 − p)

[Figure: the binary entropy function, plotted from p = 0 (all bad) to p = 1 (all good).] The binary entropy function attains its maximum value when p = 0.5, and is 0 when the box is all good or all bad.

Decision Tree Classifier (Ross Quinlan)

[Figure: a decision tree over insect features. Abdomen Length > 7.1? yes: Katydid; no: Antenna Length > 6.0? yes: Katydid; no: Grasshopper.]

Decision trees predate computers. [Figure: a classic field-guide key using questions such as “Antennae shorter than body?”, “Foretibia has ears?”, and “3 tarsi?” to separate Grasshoppers, Crickets, Katydids, and Camel Crickets.]
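The binary entropy function above is a one-liner in code. A minimal Python sketch (the function name is our own, not from the slides):

```python
from math import log2

def binary_entropy(p):
    """H(X) = -p*log2(p) - (1-p)*log2(1-p), taking 0*log2(0) as 0."""
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(binary_entropy(0.5))  # 1.0 (maximum uncertainty: half good, half bad)
print(binary_entropy(1.0))  # 0.0 (all good: no uncertainty at all)
```

Note how the edge cases p = 0 and p = 1 are handled explicitly, matching the convention that 0 log(0) is defined as 0.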

Decision tree: a flow-chart-like tree structure. An internal node denotes a test on an attribute, a branch represents an outcome of the test, and leaf nodes represent class labels or class distributions.

Decision tree generation consists of two phases:
Tree construction: at the start, all the training examples are at the root; examples are then partitioned recursively based on selected attributes.
Tree pruning: identify and remove branches that reflect noise or outliers.

Use of a decision tree: to classify an unknown sample, test the attribute values of the sample against the decision tree.

Decision Tree Classification: the basic algorithm (a greedy algorithm). The tree is constructed in a top-down, recursive, divide-and-conquer manner. At the start, all the training examples are at the root. Attributes are categorical (if continuous-valued, they can be discretized in advance). Examples are partitioned recursively based on selected attributes. Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).

Conditions for stopping partitioning: all samples for a given node belong to the same class; there are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf); or there are no samples left.

How do we construct the decision tree? Information gain as a splitting criterion: select the attribute with the highest information gain (information gain is the expected reduction in entropy). Assume there are two classes, P and N, and let the set of examples S contain p elements of class P and n elements of class N. The amount of information needed to decide if an arbitrary example in S belongs to P or N is defined as

    I(p, n) = −(p / (p + n)) log2(p / (p + n)) − (n / (p + n)) log2(n / (p + n))

where 0 log(0) is defined as 0.
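The greedy, top-down construction just described can be sketched in a few lines of Python. This is a toy ID3-style builder for categorical attributes only; the function names, the data, and the tuple-based tree representation are our own illustrative choices, not anything from the slides:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Expected reduction in entropy from partitioning on attr."""
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr], []).append(y)
    remainder = sum(len(ys) / len(labels) * entropy(ys) for ys in parts.values())
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    """Top-down, greedy, divide-and-conquer tree construction."""
    # Stopping conditions: pure node, or no attributes left (majority vote).
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: split on the attribute with the highest information gain.
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    branches = {}
    for value in {row[best] for row in rows}:
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in keep],
                                     [labels[i] for i in keep],
                                     attrs - {best})
    return (best, branches)

# Toy data: one categorical attribute separates the two classes perfectly.
rows = [{"antennae": "short"}, {"antennae": "short"}, {"antennae": "long"}]
labels = ["grasshopper", "grasshopper", "katydid"]
tree = build_tree(rows, labels, {"antennae"})
print(tree[0])           # antennae
print(tree[1]["short"])  # grasshopper
print(tree[1]["long"])   # katydid
```

An internal node is a `(attribute, branches)` pair and a leaf is a bare class label, mirroring the internal-node/branch/leaf structure described above.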

Information Gain in Decision Tree Induction. Assume that using attribute A, the current set will be partitioned into some number of child sets. The encoding information that would be gained by branching on A is

    Gain(A) = I(p, n) − E(A),  where  E(A) = Σi ((pi + ni) / (p + n)) I(pi, ni)

Note: entropy is at its minimum if the collection of objects is completely uniform (all one class).

Let us try splitting on Hair Length (Hair Length <= 5):
Entropy(4F,5M) = −(4/9)log2(4/9) − (5/9)log2(5/9) = 0.9911
Entropy(1F,3M) = −(1/4)log2(1/4) − (3/4)log2(3/4) = 0.8113
Entropy(3F,2M) = −(3/5)log2(3/5) − (2/5)log2(2/5) = 0.9710
Gain(Hair Length <= 5) = 0.9911 − (4/9 × 0.8113 + 5/9 × 0.9710) = 0.0911

Let us try splitting on Weight (Weight <= 160):
Entropy(4F,5M) = −(4/9)log2(4/9) − (5/9)log2(5/9) = 0.9911
Entropy(4F,1M) = −(4/5)log2(4/5) − (1/5)log2(1/5) = 0.7219
Entropy(0F,4M) = −(0/4)log2(0/4) − (4/4)log2(4/4) = 0
Gain(Weight <= 160) = 0.9911 − (5/9 × 0.7219 + 4/9 × 0) = 0.5900

Let us try splitting on Age (Age <= 40):
Entropy(4F,5M) = −(4/9)log2(4/9) − (5/9)log2(5/9) = 0.9911
Entropy(3F,3M) = −(3/6)log2(3/6) − (3/6)log2(3/6) = 1
Entropy(1F,2M) = −(1/3)log2(1/3) − (2/3)log2(2/3) = 0.9183
Gain(Age <= 40) = 0.9911 − (6/9 × 1 + 3/9 × 0.9183) = 0.0183

Of the 3 features we had, Weight was best. But while people who weigh over 160 are perfectly classified (as males), the under-160 people are not perfectly classified. So we simply recurse! This time we find that we can split on Hair Length (Hair Length <= 2), and we are done. We don’t need to keep the data around, just the test conditions.
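The three candidate splits above can be checked numerically. A minimal sketch, with class counts passed as (females, males) tuples (the function names are our own):

```python
from math import log2

def entropy(counts):
    """Shannon entropy of a class-count vector, e.g. (4, 5) for 4F, 5M."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def gain(parent, children):
    """Parent entropy minus the size-weighted average child entropy."""
    n = sum(parent)
    return entropy(parent) - sum(sum(ch) / n * entropy(ch) for ch in children)

# The 9-person (4F, 5M) example from the text:
print(f"{gain((4, 5), [(1, 3), (3, 2)]):.4f}")  # Hair Length <= 5: 0.0911
print(f"{gain((4, 5), [(4, 1), (0, 4)]):.4f}")  # Weight <= 160:    0.5900
print(f"{gain((4, 5), [(3, 3), (1, 2)]):.4f}")  # Age <= 40:        0.0183
```

The `if c > 0` guard implements the 0 log(0) = 0 convention, which is exactly what the (0F,4M) branch of the Weight split needs.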
[Figure: the final tree. Weight <= 160? no: Male; yes: Hair Length <= 2? yes: Male; no: Female.] How would these people be classified?

It is trivial to convert decision trees to rules. Rules to classify Males/Females:
If Weight > 160, classify as Male.
Else if Hair Length <= 2, classify as Male.
Else classify as Female.

Once we have learned the decision tree, we don’t even need a computer! [Figure: a decision tree for a typical shared-care setting, applying the system for the diagnosis of prostatic obstructions.] This decision tree is attached to a medical machine, and is designed to help nurses make decisions about what type of doctor to call.
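The rules above translate directly into ordinary conditional code; a sketch (illustrative names, assuming weight in pounds and hair length in inches as in the worked example):

```python
def classify(weight, hair_length):
    """The learned tree converted to rules; no training data is needed."""
    if weight > 160:
        return "Male"
    elif hair_length <= 2:
        return "Male"
    else:
        return "Female"

print(classify(weight=180, hair_length=10))  # Male
print(classify(weight=150, hair_length=1))   # Male
print(classify(weight=150, hair_length=8))   # Female
```

This is the sense in which we need only the test conditions, not the data: the entire classifier is the if/elif chain.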

Garzotto M et al. JCO 2005;23:4322-4329. (PSA = serum prostate-specific antigen level; PSAD = PSA density; TRUS = transrectal ultrasound.)

The worked examples we have seen were performed on small datasets. However, with small datasets there is a great danger of overfitting the data. When you have few datapoints, there are many possible splitting rules that perfectly classify the data but will not generalize to future datasets. For example, the rule “Wears green” perfectly classifies the data; so does “Mother’s name is Jacqueline”; so does “Has blue shoes”.

Avoiding overfitting in classification. The generated tree may overfit the training data: too many branches, some of which may reflect anomalies due to noise or outliers, resulting in poor accuracy for unseen samples. There are two approaches to avoid overfitting:
Prepruning: halt tree construction early. Do not split a node if this would result in the goodness measure falling below a threshold. It is difficult to choose an appropriate threshold.
Postpruning: remove branches from a “fully grown” tree, obtaining a sequence of progressively pruned trees. Use a set of data different from the training data to decide which is the “best pruned tree”.
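Prepruning amounts to one extra check before each split. A minimal illustration, where the threshold value 0.05 is an arbitrary assumption of ours (the slides note that choosing it well is exactly the hard part):

```python
def should_split(parent_entropy, weighted_child_entropy, min_gain=0.05):
    """Prepruning: split only if the information gain clears a threshold.

    min_gain=0.05 is a hypothetical value chosen for illustration.
    """
    return parent_entropy - weighted_child_entropy >= min_gain

# Using numbers from the worked example (root entropy 0.9911):
print(should_split(0.9911, 0.4011))  # Weight split, gain ~0.59:  True
print(should_split(0.9911, 0.9728))  # Age split, gain ~0.0183:   False
```

Set the threshold too high and useful splits (like Age, in a dataset where Age matters) are discarded; too low and the tree grows bushy and overfits.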

Which of the “Pigeon Problems” can be solved by a decision tree? Some require a deep, bushy tree; for some, even a deep, bushy tree is useless. The decision tree has a hard time with correlated attributes.

Advantages/Disadvantages of Decision Trees
Advantages: easy to understand (doctors love them!); easy to generate rules.
Disadvantages: may suffer from overfitting; classifies by rectangular partitioning (so does not handle correlated features very well); can be quite large, so pruning is necessary; does not handle streaming data easily.

The Supreme Court’s search and seizure decisions, 1962–1984 terms. U = Unreasonable, R = Reasonable. Classification problem: Fourth Amendment cases before the Supreme Court. Keogh vs. State of California = {0,1,1,0,0,0,1,0}. [Figure: a decision tree for Supreme Court Justice Sandra Day O’Connor.] We can also learn decision trees for individual Supreme Court members. Using similar decision trees for the other eight justices, these models correctly predicted the majority opinion in 75 percent of the cases, substantially outperforming the experts’ 59 percent.
