MySQL :: MySQL 8.0 Reference Manual :: 8.3.9 Comparison of B-Tree and Hash Indexes https://dev.mysql.com/doc/refman/8.0/en/index-btree-hash.html
If you use ... LIKE '%string%' and string is longer than three characters, MySQL uses the Turbo Boyer-Moore algorithm to initialize the pattern for the string and then uses this pattern to perform the search more quickly.
Turbo-BM algorithm http://igm.univ-mlv.fr/~lecroq/string/node15.html
Boyer-Moore algorithm http://igm.univ-mlv.fr/~lecroq/string/node14.html#SECTION00140
The Boyer-Moore algorithm is considered the most efficient string-matching algorithm in usual applications. A simplified version of it, or the entire algorithm, is often implemented in text editors for the "search" and "substitute" commands.
The algorithm scans the characters of the pattern from right to left, beginning with the rightmost one. In case of a mismatch (or a complete match of the whole pattern) it uses two precomputed functions to shift the window to the right. These two shift functions are called the good-suffix shift (also called the matching shift) and the bad-character shift (also called the occurrence shift).
Assume that a mismatch occurs between the character x[i] = a of the pattern and the character y[i+j] = b of the text during an attempt at position j.
Then, x[i+1 .. m-1] = y[i+j+1 .. j+m-1] = u and x[i] ≠ y[i+j]. The good-suffix shift consists in aligning the segment y[i+j+1 .. j+m-1] = x[i+1 .. m-1] with its rightmost occurrence in x that is preceded by a character different from x[i] (see figure 13.1).
Figure 13.1. The good-suffix shift, u re-occurs preceded by a character c different from a.
If there exists no such segment, the shift consists in aligning the longest suffix v of y[i+j+1 .. j+m-1] with a matching prefix of x (see figure 13.2).
Figure 13.2. The good-suffix shift, only a suffix of u re-occurs in x.
The bad-character shift consists in aligning the text character y[i+j] with its rightmost occurrence in x[0 .. m-2] (see figure 13.3).
Figure 13.3. The bad-character shift, a occurs in x.
If y[i+j] does not occur in the pattern x, no occurrence of x in y can include y[i+j], and the left end of the window is aligned with the character immediately after y[i+j], namely y[i+j+1] (see figure 13.4).
Figure 13.4. The bad-character shift, b does not occur in x.
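The bad-character rule described above can be sketched directly from its definition, without a precomputed table. This is a minimal illustration only: the function name bad_char_shift is hypothetical, and the code scans x[0 .. m-2] on every call instead of using the bmBc table from the listing further down.

```c
#include <assert.h>

/* Shift suggested by the bad-character rule when pattern position i
   mismatches the text character c. The result can be negative. */
int bad_char_shift(const char *x, int m, int i, char c) {
    int p;
    assert(m > 0 && i >= 0 && i < m);
    /* Look for the rightmost occurrence of c in x[0 .. m-2]. */
    for (p = m - 2; p >= 0; --p)
        if (x[p] == c)
            return i - p;   /* align x[p] with the text character */
    return i + 1;           /* c is absent: move the window past it */
}
```

For the pattern GCAGAGAG (m = 8), a mismatch at i = 7 against the text character A gives a shift of 1, and against T, which does not occur in the pattern, a shift of 8. A mismatch at i = 3 against A gives -3, because the rightmost A sits to the right of the mismatch position.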
Note that the bad-character shift can be negative; thus, for shifting the window, the Boyer-Moore algorithm applies the maximum of the good-suffix shift and the bad-character shift. More formally, the two shift functions are defined as follows.
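As a small sketch of how the two shifts are combined, the helper below takes the two tables as inputs and returns the larger shift. The name bm_shift is hypothetical; the expression bmBc[c] - m + 1 + i is the one used in the searching loop of the C code in this document, and the tables are assumed to be indexed by mismatch position and by unsigned character value respectively.

```c
#include <assert.h>

/* Window shift after a mismatch at pattern position i against text
   character c: the larger of the good-suffix and bad-character shifts. */
int bm_shift(const int *bmGs, const int *bmBc, int m, int i, unsigned char c) {
    int gs, bc;
    assert(i >= 0 && i < m);
    gs = bmGs[i];               /* good-suffix shift, always positive */
    bc = bmBc[c] - m + 1 + i;   /* bad-character shift, may be negative */
    return gs > bc ? gs : bc;
}
```

With the tables for the pattern GCAGAGAG (bmGs = 7 7 7 2 7 4 7 1; bmBc is 1 for A, 6 for C, 2 for G and 8 for every other character), a mismatch at i = 3 against A yields a bad-character shift of -3 and a good-suffix shift of 2, so the window moves by 2.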
The good-suffix shift function is stored in a table bmGs of size m+1.
Let us define two conditions:
Cs(i, s): for each k such that i < k < m, s ≥ k or x[k-s] = x[k], and
Co(i, s): if s < i then x[i-s] ≠ x[i].
Then, for 0 ≤ i < m: bmGs[i+1] = min{s > 0 : Cs(i, s) and Co(i, s) hold}
and we define bmGs[0] as the length of the period of x. The computation of the table bmGs uses a table suff defined as follows: for 1 ≤ i < m, suff[i] = max{k : x[i-k+1 .. i] = x[m-k .. m-1]}.
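The definition of suff can be checked with a naive computation that follows the formula literally. This is a sketch for illustration only: the name suffixes_naive is hypothetical, it runs in O(m^2) time, and it also fills suff[0] and suff[m-1], as the linear-time suffixes routine in the C code does.

```c
#include <assert.h>

/* suff[i] = max{k : x[i-k+1 .. i] = x[m-k .. m-1]}, i.e. the length of the
   longest suffix of x that ends at position i, computed naively. */
void suffixes_naive(const char *x, int m, int *suff) {
    int i, k;
    assert(m > 0);
    for (i = 0; i < m; ++i) {
        k = 0;
        /* Extend the match leftwards from x[i] and from x[m-1]. */
        while (k <= i && x[i - k] == x[m - 1 - k])
            ++k;
        suff[i] = k;
    }
}
```

For the pattern GCAGAGAG this gives suff = 1 0 0 2 0 4 0 8: for example, the suffix GAGAG of length 4 would need a preceding A to extend, so the match ending at position 5 stops at AGAG and suff[5] = 4.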
The bad-character shift function is stored in a table bmBc of size σ, where σ is the size of the alphabet Σ. For c in Σ: bmBc[c] = min{i : 1 ≤ i ≤ m-1 and x[m-1-i] = c} if c occurs in x[0 .. m-2], and m otherwise.
Tables bmBc and bmGs can be precomputed in time O(m+σ) before the searching phase and require extra space in O(m+σ). The searching phase time complexity is quadratic, but at most 3n text character comparisons are performed when searching for a non-periodic pattern. On large alphabets (relative to the length of the pattern) the algorithm is extremely fast. When searching for a^(m-1)b in b^n the algorithm makes only O(n / m) comparisons, which is the absolute minimum for any string-matching algorithm in the model where only the pattern is preprocessed.
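To make the comparison counts concrete, here is a self-contained sketch that runs the same search while counting character comparisons. It follows the routines from the C code in this document, with a few assumptions of its own: the name bm_count, the 256-entry alphabet, heap-allocated tables instead of fixed-size arrays, and a counter in place of the OUTPUT call.

```c
#include <assert.h>
#include <stdlib.h>

#define ALPHABET_SIZE 256   /* assumption: one slot per unsigned char value */

/* suff[i] = length of the longest suffix of x that ends at position i. */
static void suffixes(const char *x, int m, int *suff) {
    int f = m - 1, g = m - 1, i;
    suff[m - 1] = m;
    for (i = m - 2; i >= 0; --i) {
        if (i > g && suff[i + m - 1 - f] < i - g)
            suff[i] = suff[i + m - 1 - f];
        else {
            if (i < g) g = i;
            f = i;
            while (g >= 0 && x[g] == x[g + m - 1 - f]) --g;
            suff[i] = f - g;
        }
    }
}

/* Good-suffix table, indexed by the mismatch position. */
static void pre_bm_gs(const char *x, int m, int *bmGs, int *suff) {
    int i, j = 0;
    suffixes(x, m, suff);
    for (i = 0; i < m; ++i) bmGs[i] = m;
    for (i = m - 1; i >= 0; --i)
        if (suff[i] == i + 1)
            for (; j < m - 1 - i; ++j)
                if (bmGs[j] == m) bmGs[j] = m - 1 - i;
    for (i = 0; i <= m - 2; ++i)
        bmGs[m - 1 - suff[i]] = m - 1 - i;
}

/* Returns the number of occurrences of x in y and stores the number of
   character comparisons performed in *comparisons. */
long bm_count(const char *x, int m, const char *y, int n, long *comparisons) {
    int bmBc[ALPHABET_SIZE];
    int *bmGs, *suff;
    int i, j = 0;
    long found = 0;

    assert(m > 0 && n >= 0);
    bmGs = malloc((size_t)m * sizeof *bmGs);
    suff = malloc((size_t)m * sizeof *suff);
    assert(bmGs != NULL && suff != NULL);

    /* Preprocessing: bad-character table, then good-suffix table. */
    for (i = 0; i < ALPHABET_SIZE; ++i) bmBc[i] = m;
    for (i = 0; i < m - 1; ++i) bmBc[(unsigned char)x[i]] = m - 1 - i;
    pre_bm_gs(x, m, bmGs, suff);

    /* Searching: compare right to left, counting every comparison. */
    *comparisons = 0;
    while (j <= n - m) {
        for (i = m - 1; i >= 0; --i) {
            ++*comparisons;
            if (x[i] != y[i + j]) break;
        }
        if (i < 0) {
            ++found;
            j += bmGs[0];
        } else {
            int bc = bmBc[(unsigned char)y[i + j]] - m + 1 + i;
            j += bmGs[i] > bc ? bmGs[i] : bc;
        }
    }
    free(bmGs);
    free(suff);
    return found;
}
```

Searching for aaab (m = 4) in a text of 40 b characters inspects 10 windows with 2 comparisons each, 20 comparisons in total, which is 2n/m. Searching for GCAGAGAG in GCATCGCAGAGAGTATACAGTACG finds the single occurrence at position 5 using 17 comparisons.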
The C code
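#include <stdio.h>
#define ASIZE 256                       /* alphabet size: one entry per unsigned char value */
#define XSIZE 1024                      /* assumed upper bound on the pattern length m */
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define OUTPUT(j) printf("%d\n", (j))   /* assumed: report a match at text position j */

/* bmBc[c] = distance from the rightmost c in x[0 .. m-2] to x[m-1], or m if absent. */
void preBmBc(char *x, int m, int bmBc[]) {
    int i;
    for (i = 0; i < ASIZE; ++i)
        bmBc[i] = m;
    for (i = 0; i < m - 1; ++i)
        bmBc[(unsigned char)x[i]] = m - i - 1;   /* cast avoids a negative index */
}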
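/* suff[i] = length of the longest suffix of x that ends at position i. */
void suffixes(char *x, int m, int *suff) {
    int f = m - 1, g, i;   /* f is assigned before its first read; initialized to silence warnings */

    suff[m - 1] = m;
    g = m - 1;
    for (i = m - 2; i >= 0; --i) {
        if (i > g && suff[i + m - 1 - f] < i - g)
            suff[i] = suff[i + m - 1 - f];   /* reuse a value computed earlier */
        else {
            if (i < g)
                g = i;
            f = i;
            while (g >= 0 && x[g] == x[g + m - 1 - f])
                --g;
            suff[i] = f - g;
        }
    }
}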
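/* Good-suffix table, indexed by the pattern position where the mismatch occurs. */
void preBmGs(char *x, int m, int bmGs[]) {
    int i, j, suff[XSIZE];

    suffixes(x, m, suff);

    for (i = 0; i < m; ++i)
        bmGs[i] = m;
    /* Only a suffix of u re-occurs, as a prefix of x. */
    j = 0;
    for (i = m - 1; i >= 0; --i)
        if (suff[i] == i + 1)
            for (; j < m - 1 - i; ++j)
                if (bmGs[j] == m)
                    bmGs[j] = m - 1 - i;
    /* u re-occurs in x, preceded by a different character. */
    for (i = 0; i <= m - 2; ++i)
        bmGs[m - 1 - suff[i]] = m - 1 - i;
}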
void BM(char *x, int m, char *y, int n) {
    int i, j, bmGs[XSIZE], bmBc[ASIZE];

    /* Preprocessing */
    preBmGs(x, m, bmGs);
    preBmBc(x, m, bmBc);

    /* Searching */
    j = 0;
    while (j <= n - m) {
        /* Compare right to left; i stops at the mismatch position or at -1. */
        for (i = m - 1; i >= 0 && x[i] == y[i + j]; --i);
        if (i < 0) {
            OUTPUT(j);
            j += bmGs[0];   /* full match: shift by the period of x */
        }
        else
            j += MAX(bmGs[i], bmBc[(unsigned char)y[i + j]] - m + 1 + i);
    }
}
[Figures: preprocessing phase (bmBc and bmGs tables used by the Boyer-Moore algorithm); searching phase]