Efficient Query Filtering for Streaming Time Series


ICDM '05

Outline of Talk
- Introduction to time series
- Time series filtering
- Wedge-based approach
- Experimental results
- Conclusions

What are Time Series
Time series are collections of observations made sequentially in time.
4.7275 4.7083 4.6700 4.6600 4.6617 4.6517 4.6500 4.6500 4.6917 4.7533 4.8233 4.8700 4.8783 4.8700 4.8500 4.8433 4.8383 4.8400 4.8433


Time Series are Everywhere
Examples: ECG heartbeat, image, stock, video

Time Series Data Mining Tasks
- Clustering
- Classification
- Query by Content
- Rule Discovery (s = 0.5, c = 0.3)
- Motif Discovery
- Anomaly Detection
- Visualization

Time Series Filtering
Given a time series T, a set of candidates C, and a distance threshold r, find all subsequences in T that are within distance r of any of the candidates in C.
[Figure: a time series compared against candidates 1 to 12; a match is reported for Q11.]
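The filtering problem stated above can be written down directly as a brute-force scan. This is a minimal sketch, not the method from the talk: the function name `filter_brute_force`, the list-based representation, and the use of "distance <= r" for "within r" are assumptions made for illustration.

```python
import math


def filter_brute_force(t, candidates, r):
    """Return sorted (start, candidate_index) pairs where the subsequence of t
    starting at `start` is within Euclidean distance r of that candidate.

    Hypothetical helper: compares every window against every candidate.
    """
    hits = []
    for j, c in enumerate(candidates):
        m = len(c)
        # Slide a window of the candidate's length over the time series.
        for start in range(len(t) - m + 1):
            window = t[start:start + m]
            d = math.sqrt(sum((w - x) ** 2 for w, x in zip(window, c)))
            if d <= r:
                hits.append((start, j))
    return sorted(hits)


if __name__ == "__main__":
    series = [0, 0, 1, 2, 1, 0]
    print(filter_brute_force(series, [[1, 2, 1]], r=0.1))
```

Every window costs one full distance computation per candidate here, which is the baseline the rest of the talk tries to beat.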

Filtering vs. Querying
[Figure: filtering compares a set of queries (1 to 12) against a database and reports matches (Q11); querying compares one query (template) against database items 1 to 10 and returns the best match.]

Euclidean Distance Metric
Given two time series Q = q_1, ..., q_n and C = c_1, ..., c_n, the Euclidean distance between them is defined as:

$$D(Q, C) = \sqrt{\sum_{i=1}^{n} (q_i - c_i)^2}$$

Early Abandon
During the computation, if the current sum of the squared differences between each pair of corresponding data points exceeds r^2, we can safely stop the calculation.
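The early abandoning rule can be sketched as a running sum that is checked against r^2 after every point. The function name `euclidean_early_abandon` and the convention of returning `None` on abandonment are assumptions for this illustration.

```python
import math


def euclidean_early_abandon(q, c, r):
    """Return the Euclidean distance between q and c, or None if the
    computation was abandoned because the distance exceeds r."""
    if len(q) != len(c):
        raise ValueError("sequences must have equal length")
    threshold = r * r
    total = 0.0
    for qi, ci in zip(q, c):
        total += (qi - ci) ** 2
        if total > threshold:
            return None  # early abandon: the distance is already above r
    return math.sqrt(total)


if __name__ == "__main__":
    print(euclidean_early_abandon([0, 0], [3, 4], r=5))    # full computation
    print(euclidean_early_abandon([0, 0], [3, 4], r=4.9))  # abandoned
```

Comparing squared sums against r^2 avoids taking a square root until the full distance is actually needed.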

Classic Approach
Individually compare each candidate sequence to the query using the early abandoning algorithm.
[Figure: a time series compared against candidates 1 to 12.]

Wedge
Having candidate sequences C_1, ..., C_k, we can form two new sequences U and L:
U_i = max(C_{1i}, ..., C_{ki})
L_i = min(C_{1i}, ..., C_{ki})
They form the smallest possible bounding envelope that encloses sequences C_1, ..., C_k. We call the combination of U and L a wedge, and denote a wedge as W: W = {U, L}.
[Figure: wedge W with upper envelope U and lower envelope L enclosing C_1 and C_2, shown with a query Q.]

A lower bounding measure between an arbitrary query Q and the entire set of candidate sequences contained in a wedge W:

$$\mathrm{LB\_Keogh}(Q, W) = \sqrt{\sum_{i=1}^{n} \begin{cases} (q_i - U_i)^2 & \text{if } q_i > U_i \\ (q_i - L_i)^2 & \text{if } q_i < L_i \\ 0 & \text{otherwise} \end{cases}}$$

Generalized Wedge
Use W(1,2) to denote that a wedge is built from sequences C_1 and C_2. Wedges can be hierarchically nested. For example, W((1,2),3) consists of W(1,2) and C_3.
[Figure: W((1,2),3)]
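A wedge and its lower bound can be sketched in a few lines. The names `build_wedge` and `lb_keogh`, the tuple representation of a wedge, and the optional threshold `r` (which applies the same early abandoning test to the bound) are assumptions for this illustration, not code from the talk.

```python
import math


def build_wedge(candidates):
    """Return (U, L): the pointwise maximum and minimum over the candidates."""
    if not candidates:
        raise ValueError("need at least one candidate")
    n = len(candidates[0])
    if any(len(c) != n for c in candidates):
        raise ValueError("candidates must have equal length")
    upper = [max(column) for column in zip(*candidates)]
    lower = [min(column) for column in zip(*candidates)]
    return upper, lower


def lb_keogh(q, wedge, r=math.inf):
    """Lower bound on the distance from q to every sequence inside the wedge.

    Returns None if the bound already exceeds r (early abandon).
    """
    upper, lower = wedge
    threshold = r * r
    total = 0.0
    for qi, ui, li in zip(q, upper, lower):
        if qi > ui:
            total += (qi - ui) ** 2   # query is above the envelope
        elif qi < li:
            total += (qi - li) ** 2   # query is below the envelope
        if total > threshold:
            return None
    return math.sqrt(total)


if __name__ == "__main__":
    wedge = build_wedge([[1, 2, 3], [2, 1, 4]])
    print(wedge)
    print(lb_keogh([0, 1.5, 6], wedge))
```

Points of the query that fall inside the envelope contribute nothing, so the bound can never exceed the true distance to any enclosed candidate.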

Wedge Based Approach
- Compare the query to the wedge using LB_Keogh.
- If the LB_Keogh function early abandons, we are done.
- Otherwise, individually compare each candidate sequence to the query using the early abandoning algorithm.
[Figure: a time series compared against candidates 1 to 12.]

Examples of Wedge Merging
[Figure: W((1,2),3)]

Hierarchical Clustering
[Figure: hierarchical clustering of C1 (or W1), C4 (or W4), C2 (or W2), C5 (or W5), C3 (or W3).]
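The wedge based comparison can be sketched as a recursive search over nested wedges. This sketch assumes a descent through child wedges rather than jumping straight to individual candidates, and it treats a single candidate as a degenerate wedge with U = L, for which the lower bound equals the Euclidean distance. The `Wedge` class and the `leaf`, `merge`, and `matches` helpers are hypothetical names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Wedge:
    upper: list
    lower: list
    label: object = None                      # candidate id, set on leaves only
    children: list = field(default_factory=list)


def leaf(label, candidate):
    """A single candidate as a degenerate wedge (U = L = candidate)."""
    return Wedge(list(candidate), list(candidate), label=label)


def merge(a, b):
    """Build the wedge enclosing wedges a and b, e.g. W(1,2) from C1 and C2."""
    return Wedge(
        [max(x, y) for x, y in zip(a.upper, b.upper)],
        [min(x, y) for x, y in zip(a.lower, b.lower)],
        children=[a, b],
    )


def lb_keogh(q, wedge, r):
    """Lower bound with early abandoning; None means the bound exceeds r."""
    threshold = r * r
    total = 0.0
    for qi, ui, li in zip(q, wedge.upper, wedge.lower):
        if qi > ui:
            total += (qi - ui) ** 2
        elif qi < li:
            total += (qi - li) ** 2
        if total > threshold:
            return None
    return math.sqrt(total)


def matches(q, wedge, r):
    """Return the labels of all candidates in the wedge within r of q."""
    if lb_keogh(q, wedge, r) is None:
        return []                 # the whole wedge is pruned in one comparison
    if not wedge.children:
        return [wedge.label]      # leaf: the bound is the exact distance
    found = []
    for child in wedge.children:
        found.extend(matches(q, child, r))
    return found


if __name__ == "__main__":
    w12 = merge(leaf(1, [0, 0, 0]), leaf(2, [0, 1, 0]))
    w123 = merge(w12, leaf(3, [5, 5, 5]))     # W((1,2),3)
    print(matches([0, 0.5, 0], w123, r=0.6))
    print(matches([9, 9, 9], w123, r=1))
```

When the query is far from every candidate, a single comparison against the outermost wedge rules out all of them at once.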

Which Wedge Set to Choose
- Test all k wedge sets on a representative sample of data.
- Choose the wedge set which performs the best.

Upper Bound on Wedge Based Approach
The wedge based approach seems to be efficient when comparing a set of time series to a large batch dataset. But what about streaming time series?
- Streaming algorithms are limited by their worst case.
- Being efficient on average does not help.

Worst case
[Figure: a subsequence compared against the wedge hierarchy, with wedge sets from K = 5 down to K = 1; labeled wedges include W((1,2),3), W((2,5),3), W(1,4), and W(((2,5),3),(1,4)).]

Triangular Inequality
If dist(W((2,5),3), W(1,4)) >= 2r, the subsequence cannot fail on both wedges: a subsequence less than r from one wedge must be more than r from the other.

Experimental Setup
Datasets:
- ECG Dataset
- Stock Dataset
- Audio Dataset
We measure the number of computational steps used by the following methods:
- Brute force
- Brute force with early abandoning (classic)
- Our approach (Atomic Wedgie)
- Our approach with random wedge set (AWR)

How to choose r
A logical value for r would be the average distance from a pattern to its nearest neighbor.

ECG Dataset
- Batch time series: 650,000 data points (half an hour's ECG signals)
- Candidate set: 200 time series of length 40
- 4 types of patterns: left bundle branch block beat, right bundle branch block beat, atrial premature beat, ventricular escape beat
- r = 0.5
- Upper Bound: 2,120 (8,000 for brute force)

Stock Dataset
- Batch time series: 2,119,415 data points
- Candidate set: 337 time series of length 128
- 3 types of patterns: head and shoulders, reverse head and shoulders, cup and handle
- r = 4.3
- Upper Bound: 18,048 (43,136 for brute force)

Audio Dataset
- Batch time series: 37,583,512 data points (one hour's sound)
- Candidate set: 68 time series of length 51
- 3 species of harmful mosquitoes: Culex quinquefasciatus, Aedes aegypti, Culiseta spp
- Sliding window: 11,025 (1 second)
- Step: 5,512 (0.5 second)
- r = 2
- Upper Bound: 2,929 (6,868 for brute force)

Conclusions
- We introduce the problem of time series filtering.
- Combining similar sequences into a wedge is a quite promising idea.
- We have provided the upper bound of the cost of the algorithm to compute the fastest arrival rate we can guarantee to handle.

Future Work
- Dynamic wedge set choosing for data with concept shifting
- Extension to other distance measures, for example, DTW (Dynamic Time Warping) and uniform scaling

Questions
All datasets used in this talk can be found at http://www.cs.ucr.edu/~wli/ICDM05/

Z-Normalization
Normalize a data sequence C to have mean = 0 and standard deviation = 1:
C = (C - mean(C)) / std(C)
[Figure: plot with x-axis from 0 to 1000.]
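The normalization formula can be sketched as follows. The slide does not say whether std is the population or the sample standard deviation; this sketch assumes the population form (dividing by n), and the function name `z_normalize` is hypothetical.

```python
import math


def z_normalize(c):
    """Return a copy of c rescaled to mean 0 and standard deviation 1.

    Assumption: population standard deviation (divide by n, not n - 1).
    """
    n = len(c)
    if n == 0:
        raise ValueError("empty sequence")
    mean = sum(c) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in c) / n)
    if std == 0:
        raise ValueError("constant sequence has zero standard deviation")
    return [(x - mean) / std for x in c]


if __name__ == "__main__":
    print(z_normalize([1, 2, 3, 4]))
```

A constant sequence has no defined normalization under this formula, so the sketch raises an error rather than dividing by zero.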
