Chapter 11: Indexing and Hashing / Chapter 12: Indexing and Hashing


Chapter outline:
- Basic Concepts
- Ordered Indices
- B+-Tree Index Files
- B-Tree Index Files
- Static Hashing
- Dynamic Hashing
- Comparison of Ordered Indexing and Hashing
- Index Definition in SQL
- Multiple-Key Access

Basic Concepts
Indexing mechanisms are used to speed up access to desired data, e.g., the author catalog in a library.
Search key: an attribute or set of attributes used to look up records in a file.
An index file consists of records (called index entries) of the form (search-key, pointer).
Index files are typically much smaller than the original file.
Two basic kinds of indices:
- Ordered indices: search keys are stored in sorted order.
- Hash indices: search keys are distributed uniformly across "buckets" using a "hash function".


Index Evaluation Metrics
- Access types supported efficiently, e.g., records with a specified value in the attribute, or records with an attribute value falling in a specified range of values.
- Access time
- Insertion time
- Deletion time
- Space overhead

Ordered Indices
In an ordered index, index entries are stored sorted on the search-key value, e.g., the author catalog in a library.
Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file. Also called a clustering index. The search key of a primary index is usually, but not necessarily, the primary key.
Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called a non-clustering index.
Index-sequential file: an ordered sequential file with a primary index.

Dense Index Files
Dense index: an index record appears for every search-key value in the file, e.g., an index on the ID attribute of the instructor relation.

Example: dense index on dept-name, with the instructor file sorted on dept-name.

Sparse Index Files
Sparse index: contains index records for only some search-key values. Applicable when records are sequentially ordered on the search key.
To locate a record with search-key value K:
1. Find the index record with the largest search-key value ≤ K.
2. Search the file sequentially, starting at the record to which the index record points.
Compared to dense indices, sparse indices:
- need less space and less maintenance overhead for insertions and deletions;
- are generally slower for locating records.
Good tradeoff: a sparse index with an index entry for every block in the file, corresponding to the least search-key value in the block.

Secondary Indices Example
Example: secondary index on the salary field of instructor.
An index record points to a bucket that contains pointers to all the actual records with that particular search-key value.
Secondary indices have to be dense.

Primary and Secondary Indices
Indices offer substantial benefits when searching for records, BUT updating indices imposes overhead on database modification: when a file is modified, every index on the file must be updated.
A sequential scan using a primary index is efficient, but a sequential scan using a secondary index is expensive:
- each record access may fetch a new block from disk;
- a block fetch requires about 5 to 10 milliseconds, versus about 100 nanoseconds for a memory access.

Multilevel Index
If the primary index does not fit in memory, access becomes expensive.
Solution: treat the primary index kept on disk as a sequential file and construct a sparse index on it.
- Outer index: a sparse index of the primary index.
- Inner index: the primary index file.
If even the outer index is too large to fit in main memory, yet another level of index can be created, and so on.
Indices at all levels must be updated on insertion or deletion from the file.
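The two-step sparse-index lookup described under Sparse Index Files can be sketched in Python as follows. This is only an illustration under stated assumptions: the file is modeled as an in-memory list of blocks, the index holds the least search-key value of each block (the "good tradeoff" layout), and the names `blocks`, `index_keys` and `sparse_lookup` as well as the sample records are hypothetical.

```python
from bisect import bisect_right

# Hypothetical toy file: each block is a list of (search_key, record) pairs,
# and the whole file is sorted on the search key.
blocks = [
    [(10101, "Srinivasan"), (12121, "Wu"), (15151, "Mozart")],
    [(22222, "Einstein"), (32343, "El Said"), (33456, "Gold")],
    [(45565, "Katz"), (58583, "Califieri"), (76543, "Singh")],
]

# Sparse index: one entry per block, holding the least search-key value in it.
# Entry i implicitly points to blocks[i].
index_keys = [block[0][0] for block in blocks]


def sparse_lookup(key):
    """Return the record stored under `key`, or None if there is none."""
    # Step 1: index entry with the largest search-key value <= key.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    # Step 2: scan the file sequentially from the block that entry points to.
    for block in blocks[pos:]:
        for k, record in block:
            if k == key:
                return record
            if k > key:
                return None  # passed the position where key would be stored
    return None
```

For example, `sparse_lookup(32343)` starts at the second block because 22222 is the largest indexed key not exceeding 32343, and then scans forward within the file.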
Index Update: Deletion
If the deleted record was the only record in the file with its particular search-key value, the search key is deleted from the index also.
Single-level index entry deletion:
- Dense indices: deletion of the search key is similar to file record deletion.
- Sparse indices: if an entry for the search key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order). If the next search-key value already has an index entry, the entry is deleted instead of being replaced.

Index Update: Insertion
Single-level index insertion:
- Perform a lookup using the search-key value appearing in the record to be inserted.
- Dense indices: if the search-key value does not appear in the index, insert it.
- Sparse indices: if the index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created. If a new block is created, the first search-key value appearing in the new block is inserted into the index.
Multilevel insertion and deletion: the algorithms are simple extensions of the single-level algorithms.

Secondary Indices
Frequently, one wants to find all the records whose values in a certain field (which is not the search key of the primary index) satisfy some condition.
- Example 1: in the instructor relation stored sequentially by ID, we may want to find all instructors in a particular department.
- Example 2: as above, but where we want to find all instructors with a specified salary or with salary in a specified range of values.
We can have a secondary index with an index record for each search-key value.

B+-Tree Index Files
Disadvantage of indexed-sequential files: performance degrades as the file grows, since many overflow blocks get created. Periodic reorganization of the entire file is required.
Advantage of B+-tree index files: the tree automatically reorganizes itself with small, local changes in the face of insertions and deletions. Reorganization of the entire file is not required to maintain performance.
(Minor) disadvantage of B+-trees: extra insertion and deletion overhead, and space overhead.
The advantages of B+-trees outweigh the disadvantages, and B+-trees are used extensively.
B+-tree indices are an alternative to indexed-sequential files.

Example of B+-Tree (figure)

B+-Tree Index Files (Cont.)
A B+-tree is a rooted tree satisfying the following properties:
- All paths from root to leaf are of the same length.
- Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
- A leaf node has between ⌈(n-1)/2⌉ and n-1 values.
- Special cases: if the root is not a leaf, it has at least 2 children; if the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and n-1 values.

B+-Tree Node Structure
Typical node: P1, K1, P2, K2, ..., P(n-1), K(n-1), Pn
- Ki are the search-key values.
- Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes).
The search keys in a node are ordered: K1 < K2 < K3 < ... < K(n-1). (Initially assume no duplicate keys; duplicates are addressed later.)

Leaf Nodes in B+-Trees
Properties of a leaf node:
- For i = 1, 2, ..., n-1, pointer Pi points to a file record with search-key value Ki.
- If Li and Lj are leaf nodes and i < j, Li's search-key values are less than or equal to Lj's search-key values.
- Pn points to the next leaf node in search-key order.

Non-Leaf Nodes in B+-Trees
Non-leaf nodes form a multilevel sparse index on the leaf nodes.
For a non-leaf node with m pointers:
- All the search keys in the subtree to which P1 points are less than K1.
- For 2 ≤ i ≤ m-1, all the search keys in the subtree to which Pi points have values greater than or equal to K(i-1) and less than Ki.
- All the search keys in the subtree to which Pm points have values greater than or equal to K(m-1).

Example of B+-tree
B+-tree for the instructor file (n = 6):
- Leaf nodes must have between 3 and 5 values (⌈(n-1)/2⌉ and n-1, with n = 6).
- Non-leaf nodes other than the root must have between 3 and 6 children (⌈n/2⌉ and n, with n = 6).
- The root must have at least 2 children.

Observations about B+-trees
- Since the inter-node connections are done by pointers, "logically" close blocks need not be "physically" close.
- The non-leaf levels of the B+-tree form a hierarchy of sparse indices.
- The B+-tree contains a relatively small number of levels:
  - the level below the root has at least 2 * ⌈n/2⌉ values;
  - the next level has at least 2 * ⌈n/2⌉ * ⌈n/2⌉ values;
  - and so on.
- If there are K search-key values in the file, the tree height is no more than ⌈log_⌈n/2⌉(K)⌉, so searches can be conducted efficiently.
- Insertions and deletions to the main file can be handled efficiently, as the index can be restructured in logarithmic time (as we shall see).
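The occupancy ranges and the height bound above are easy to check numerically. The following Python sketch is only an illustration; the function names `node_bounds` and `max_height` are hypothetical, and the height is computed with integer arithmetic (smallest h such that ⌈n/2⌉^h ≥ K) rather than a floating-point logarithm.

```python
from math import ceil


def node_bounds(n):
    """Occupancy ranges for a B+-tree whose nodes hold up to n pointers."""
    return {
        # A leaf holds between ceil((n-1)/2) and n-1 values.
        "leaf_values": (ceil((n - 1) / 2), n - 1),
        # A non-root, non-leaf node has between ceil(n/2) and n children.
        "nonleaf_children": (ceil(n / 2), n),
    }


def max_height(num_keys, n):
    """Upper bound ceil(log_{ceil(n/2)}(num_keys)) on the tree height."""
    fanout = ceil(n / 2)
    if fanout < 2:
        raise ValueError("n must be at least 3 for the bound to be meaningful")
    height, reach = 0, 1
    # Multiply by the minimum fanout until num_keys values can be covered.
    while reach < num_keys:
        reach *= fanout
        height += 1
    return height
```

With n = 6, `node_bounds(6)` reproduces the 3 to 5 values per leaf and 3 to 6 children per non-leaf node stated in the example.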

Queries on B+-Trees
Find record with search-key value V:
    C = root
    while C is not a leaf node {
        let i be the least value such that V ≤ Ki
        if no such i exists, set C = last non-null pointer in C
        else { if (V = Ki) set C = P(i+1) else set C = Pi }
    }
    let i be the least value such that Ki = V
    if there is such a value i, follow pointer Pi to the desired record
    else no record with search-key value V exists

Handling Duplicates
With duplicate search keys, in both leaf and internal nodes:
- we cannot guarantee that K1 < K2 < K3 < ... < K(n-1),
- but we can guarantee that K1 ≤ K2 ≤ K3 ≤ ... ≤ K(n-1).
Search keys in the subtree to which Pi points are ≤ Ki, but not necessarily < Ki. To see why, suppose the same search-key value V is present in two leaf nodes Li and L(i+1). Then in the parent node, Ki must be equal to V.
We modify the find procedure as follows:
- Traverse Pi even if V = Ki.
- As soon as we reach a leaf node C, check whether C has only search-key values less than V; if so, set C = right sibling of C before checking whether C contains V.
Procedure printAll:
- uses the modified find procedure to find the first occurrence of V;
- traverses through consecutive leaves to find all occurrences of V.
Errata note: the modified find procedure is missing in the first printing of the 6th edition.

Queries on B+-Trees (Cont.)
If there are K search-key values in the file, the height of the tree is no more than ⌈log_⌈n/2⌉(K)⌉.
A node is generally the same size as a disk block, typically 4 kilobytes, and n is typically around 100 (40 bytes per index entry).
With 1 million search-key values and n = 100, at most ⌈log_50(1,000,000)⌉ = 4 nodes are accessed in a lookup.
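The basic find procedure for the no-duplicates case can be sketched in Python as follows. This is a minimal illustration, not the textbook's code: the `Node` class is a hypothetical in-memory stand-in for a disk block, leaf nodes store one record per key and omit the next-leaf pointer Pn, and the small sample tree is made up for the example. The modified procedure for duplicate keys is not covered.

```python
from bisect import bisect_left


class Node:
    """Hypothetical in-memory B+-tree node (no duplicate keys assumed)."""

    def __init__(self, keys, pointers, is_leaf):
        self.keys = keys          # K1 .. K(m-1), sorted
        self.pointers = pointers  # children (non-leaf) or records (leaf)
        self.is_leaf = is_leaf


def find(root, v):
    """Return the record with search-key value v, or None if absent."""
    c = root
    while not c.is_leaf:
        # Least (0-based) i such that v <= c.keys[i].
        i = bisect_left(c.keys, v)
        if i == len(c.keys):
            c = c.pointers[-1]     # no such key: follow the last pointer
        elif v == c.keys[i]:
            c = c.pointers[i + 1]  # v = Ki: follow P(i+1)
        else:
            c = c.pointers[i]      # v < Ki: follow Pi
    i = bisect_left(c.keys, v)
    if i < len(c.keys) and c.keys[i] == v:
        return c.pointers[i]
    return None


# Made-up two-level tree used only to exercise find().
leaf1 = Node(["Adams", "Brandt"], ["rec:Adams", "rec:Brandt"], True)
leaf2 = Node(["Califieri", "Crick"], ["rec:Califieri", "rec:Crick"], True)
leaf3 = Node(["Einstein", "Gold"], ["rec:Einstein", "rec:Gold"], True)
root = Node(["Califieri", "Einstein"], [leaf1, leaf2, leaf3], False)
```

Because list indices are 0-based, `c.keys[i]` corresponds to K(i+1) in the slide notation, which is why the branch for an exact match follows `c.pointers[i + 1]`.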
By contrast, a balanced binary tree with 1 million search-key values requires around 20 node accesses per lookup. The difference is significant, since every node access may need a disk I/O, costing around 20 milliseconds.

Updates on B+-Trees: Insertion
1. Find the leaf node in which the search-key value would appear.
2. If the search-key value is already present in the leaf node:
   - add the record to the file;
   - if necessary, add a pointer to the bucket.
3. If the search-key value is not present:
   - add the record to the main file (and create a bucket if necessary);
   - if there is room in the leaf node, insert the (key-value, pointer) pair in the leaf node;
   - otherwise, split the node (along with the new (key-value, pointer) entry) as described next.
Splitting a leaf node:
- Take the n (search-key value, pointer) pairs (including the one being inserted) in sorted order. Place the first ⌈n/2⌉ in the original node, and the rest in a new node.
- Let the new node be p, and let k be the least key value in p. Insert (k, p) in the parent of the node being split.
- If the parent is full, split it and propagate the split further up.
Splitting of nodes proceeds upwards until a node that is not full is found. In the worst case the root node may be split, increasing the height of the tree by 1.
Example: result of splitting the node containing Brandt, Califieri and Crick on inserting Adams. Next step: insert an entry with (Califieri, pointer-to-new-node) into the parent.

Example Grid File for account (figure)

Queries on a Grid File
A grid file on two attributes A and B can handle queries of all the following forms with reasonable efficiency:
- (a1 ≤ A ≤ a2)
- (b1 ≤ B ≤ b2)
- (a1 ≤ A ≤ a2 ∧ b1 ≤ B ≤ b2)
E.g., to answer (a1 ≤ A ≤ a2 ∧ b1 ≤ B ≤ b2), use the linear scales to find the corresponding candidate grid array cells, and look up all the buckets pointed to from those cells.

Grid Files (Cont.)
- During insertion, if a bucket becomes full, a new bucket can be created if more than one cell points to it. The idea is similar to extendable hashing, but on multiple dimensions.
- If only one cell points to it, either an overflow bucket must be created or the grid size must be increased.
- Linear scales must be chosen to uniformly distribute records across cells; otherwise there will be too many overflow buckets.
- Periodic reorganization to increase the grid size will help, but reorganization can be very expensive.
- Space overhead of the grid array can be high.
- R-trees (Chapter 23) are an alternative.
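The range-query lookup described under Queries on a Grid File can be sketched in Python as follows. This is a simplified illustration under several assumptions: the class `GridFile`, the helper `cell_of` and the sample scale values are hypothetical, every grid cell starts with its own bucket, and bucket overflow and splitting are not modeled.

```python
from bisect import bisect_right


def cell_of(scale, value):
    """Map a value to its cell number using a sorted linear scale."""
    return bisect_right(scale, value)


class GridFile:
    """Toy grid file on two attributes A and B (no bucket splitting)."""

    def __init__(self, scale_a, scale_b):
        self.scale_a, self.scale_b = scale_a, scale_b
        # grid[i][j] points to a bucket; here a bucket is a plain list.
        self.grid = [[[] for _ in range(len(scale_b) + 1)]
                     for _ in range(len(scale_a) + 1)]

    def insert(self, a, b, record):
        i, j = cell_of(self.scale_a, a), cell_of(self.scale_b, b)
        self.grid[i][j].append((a, b, record))

    def range_query(self, a1, a2, b1, b2):
        """Records with a1 <= A <= a2 and b1 <= B <= b2."""
        i_lo, i_hi = cell_of(self.scale_a, a1), cell_of(self.scale_a, a2)
        j_lo, j_hi = cell_of(self.scale_b, b1), cell_of(self.scale_b, b2)
        seen, results = set(), []
        for i in range(i_lo, i_hi + 1):
            for j in range(j_lo, j_hi + 1):
                bucket = self.grid[i][j]
                # Several cells may point to one bucket; visit it only once.
                if id(bucket) in seen:
                    continue
                seen.add(id(bucket))
                results.extend(rec for a, b, rec in bucket
                               if a1 <= a <= a2 and b1 <= b <= b2)
        return results


# Hypothetical scales: A is cut at 10, 20, 30 and B at 100, 200.
g = GridFile([10, 20, 30], [100, 200])
for a, b, rec in [(5, 50, "r1"), (15, 150, "r2"), (25, 150, "r3"),
                  (35, 250, "r4"), (15, 250, "r5")]:
    g.insert(a, b, rec)
```

The linear scales only narrow the search to candidate cells; each record in a candidate bucket is still checked against the query bounds, since a cell can extend beyond the queried range.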
