This paper first introduces the Maximally Stable Extremal Region (MSER) detector, which is affine invariant. The extremal regions are extracted using a disjoint-set forest data structure with union-find. Combining the component tree with the maximal-stability condition then yields the MSERs. Next, SIFT descriptors, which serve as low-level local features, are computed within the MSERs and clustered into visual "words". By using standard weighting, the query region is selected by the rectangle ...