update stale links and standardize formatting
CGJennings committed Jun 19, 2020
1 parent fef2b3a commit ed4029c
Showing 7 changed files with 141 additions and 137 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
Copyright © 2017 Christopher G. Jennings
Copyright © 2020 Christopher G. Jennings

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
4 changes: 2 additions & 2 deletions README.md
@@ -4,15 +4,15 @@

String search algorithms find occurrences of a pattern string in a text, like the search feature of a text editor. The FJS (Franek-Jennings-Smyth) algorithm is the fastest known string search algorithm under a wide variety of conditions. It combines the linear-time worst case guarantee of the well-known KMP (Knuth-Morris-Pratt) algorithm with the fast average-case performance of the BMS (Boyer-Moore-Sunday) algorithm.

[More information, including an interactive visualization.](https://cgjennings.ca/articles/fjs.html)
[More information, including an interactive visualization.](https://cgjennings.ca/articles/fjs/)

## The sample code

Sample implementations are currently provided in C and Java. Both implementations find *all* matches of the pattern string in the text, rather than simply finding the first or last.

### c/

The C implementation is meant as a starting point that you can customize to suit your specific needs. Note that it is also based on 8-bit characters. For wider characters you might want to adapt the simple hash strategy demonstrated by the Java code to improve performance on short texts. Another option is to process the string as 8-bit characters and ignore spurious matches. For example, cast pointers to 16-bit character strings to byte array pointers and then ignore "matches" that start at an odd offset.
The C implementation is meant as a starting point that you can customize to suit your specific needs. Note that it is also based on 8-bit characters. For wider characters you might want to adapt the simple hash strategy demonstrated by the Java code to improve performance on short texts. Another option is to process the string as 8-bit characters and ignore spurious matches. For example, cast pointers to 16-bit character strings to byte array pointers and then ignore "matches" that start on odd offsets.
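
The "ignore spurious matches" idea can be sketched as follows. This is an illustration only, written in Java to match the rest of the sample code rather than with C pointer casts. The class name `WideCharByteSearchSketch` and both helper methods are hypothetical, and the naive byte search merely stands in for whatever 8-bit searcher you actually use.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class WideCharByteSearchSketch {

    /** Naive byte search; a stand-in (assumption) for any 8-bit searcher. */
    static List<Integer> findAllBytes(byte[] p, byte[] x) {
        List<Integer> hits = new ArrayList<>();
        for (int j = 0; j + p.length <= x.length; ++j) {
            int i = 0;
            while (i < p.length && p[i] == x[i + j]) {
                ++i;
            }
            if (i == p.length) {
                hits.add(j);
            }
        }
        return hits;
    }

    /**
     * Searches 16-bit text as raw bytes, then keeps only the matches that
     * start on an even byte offset (that is, on a character boundary).
     */
    static List<Integer> findAllChars(String pattern, String text) {
        // UTF_16BE writes two bytes per char and no byte order mark
        byte[] p = pattern.getBytes(StandardCharsets.UTF_16BE);
        byte[] x = text.getBytes(StandardCharsets.UTF_16BE);
        List<Integer> charOffsets = new ArrayList<>();
        for (int byteOffset : findAllBytes(p, x)) {
            if ((byteOffset & 1) == 0) {
                charOffsets.add(byteOffset / 2); // byte offset -> char index
            }
        }
        return charOffsets;
    }

    public static void main(String[] args) {
        // Bytes 00 01 | 01 00 contain 01 01 at odd offset 1: a spurious match
        System.out.println(findAllChars("\u0101", "\u0001\u0100"));
        System.out.println(findAllChars("ab", "abcab"));
    }
}
```

The filter is cheap, but the byte-level search may still spend time on misaligned candidates, so the hash strategy can be the better choice when spurious matches are frequent.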

### java/

2 changes: 1 addition & 1 deletion c/fjs.c
@@ -3,7 +3,7 @@
FJS is a very fast algorithm for finding every occurrence
of a string p of length m in a string x of length n.
For details see <https://cgjennings.ca/articles/fjs.html>.
For details see <https://cgjennings.ca/articles/fjs/>.
Christopher G. Jennings.
See LICENSE.md for license details (MIT license).
14 changes: 7 additions & 7 deletions java/ca/cgjennings/algo/BruteForceStringSearcher.java
@@ -1,3 +1,4 @@
/* See LICENSE.md for license details (MIT license). */
package ca.cgjennings.algo;

import java.util.stream.IntStream;
@@ -11,23 +12,22 @@
public final class BruteForceStringSearcher implements StringSearcher {

/**
* Creates a new {@code StringSearcher} that uses brute force to find
* matches.
* Creates a new {@code StringSearcher} that uses brute force to find matches.
*/
public BruteForceStringSearcher() {
}

@Override
@SuppressWarnings( "empty-statement" )
public IntStream findAll( CharSequence p, CharSequence x ) {
@SuppressWarnings("empty-statement")
public IntStream findAll(CharSequence p, CharSequence x) {
final int m = p.length(), n = x.length();

final IntStream.Builder stream = IntStream.builder();
int i, j;

for( j=0; j <= n-m; ++j ) {
for( i=0; i < m && p.charAt(i) == x.charAt(i+j); ++i );
if( i >= p.length() ) {
for (j = 0; j <= n - m; ++j) {
for (i = 0; i < m && p.charAt(i) == x.charAt(i + j); ++i);
if (i >= p.length()) {
stream.accept(j);
}
}
76 changes: 39 additions & 37 deletions java/ca/cgjennings/algo/FJSStringSearcher.java
@@ -1,3 +1,4 @@
/* See LICENSE.md for license details (MIT license). */
package ca.cgjennings.algo;

import java.util.Arrays;
@@ -12,6 +13,8 @@
* implementation (very rare in practice).
*
* @author Christopher G. Jennings
* @see <a href="https://cgjennings.ca/articles/fjs/">The FJS string matching
* algorithm</a>
*/
public final class FJSStringSearcher implements StringSearcher {

@@ -20,7 +23,7 @@ public final class FJSStringSearcher implements StringSearcher {
*/
public FJSStringSearcher() {
// reused since it does not depend on pattern size
delta = new int[ ALPHABET_HASH_SIZE ];
delta = new int[ALPHABET_HASH_SIZE];
}

// The hash size must be a power of 2; typical texts may not see a speedup
@@ -30,74 +33,73 @@ public FJSStringSearcher() {
private final int[] delta;

@Override
public IntStream findAll( CharSequence p, CharSequence x ) {
public IntStream findAll(CharSequence p, CharSequence x) {
final int n = x.length();
final int m = p.length();

if( m == 0 ) {
return IntStream.rangeClosed( 0, n );
if (m == 0) {
return IntStream.rangeClosed(0, n);
}
if( m > n ) {
if (m > n) {
return IntStream.empty();
}

final int beta[] = makeBeta( p );
@SuppressWarnings( "LocalVariableHidesMemberVariable" )
final int delta[] = makeDelta( p );
final int beta[] = makeBeta(p);
@SuppressWarnings("LocalVariableHidesMemberVariable")
final int delta[] = makeDelta(p);
final IntStream.Builder stream = IntStream.builder();

int mp = m-1, np = n-1, i = 0, ip = i+mp, j = 0;
int mp = m - 1, np = n - 1, i = 0, ip = i + mp, j = 0;

outer:
while( ip < np ) {
if( j <= 0 ) {
while( p.charAt(mp) != x.charAt(ip) ) {
ip += delta[x.charAt(ip+1) & HASH_MASK];
if( ip >= np ) {
outer: while (ip < np) {
if (j <= 0) {
while (p.charAt(mp) != x.charAt(ip)) {
ip += delta[x.charAt(ip + 1) & HASH_MASK];
if (ip >= np) {
break outer;
}
}
j = 0;
i = ip - mp;
while( (j < mp) && (x.charAt(i) == p.charAt(j)) ) {
while ((j < mp) && (x.charAt(i) == p.charAt(j))) {
++i;
++j;
}
if( j == mp ) {
stream.accept( i-mp );
if (j == mp) {
stream.accept(i - mp);
++i;
++j;
}
if( j <= 0 ) {
if (j <= 0) {
++i;
} else {
j = beta[j];
}
} else {
while( (j < m) && (x.charAt(i) == p.charAt(j)) ) {
while ((j < m) && (x.charAt(i) == p.charAt(j))) {
++i;
++j;
}
if( j == m ) {
stream.accept( i-m );
if (j == m) {
stream.accept(i - m);
}
j = beta[j];
}
ip = i + mp - j;
}

// check final alignment p[0..m-1] == x[n-m..n-1]
if( ip == np ) {
if( j < 0 ) {
if (ip == np) {
if (j < 0) {
j = 0;
}
i = n - m + j;
while( j < m && x.charAt(i) == p.charAt(j) ) {
while (j < m && x.charAt(i) == p.charAt(j)) {
++i;
++j;
}
if( j == m ) {
stream.accept( n-m );
if (j == m) {
stream.accept(n - m);
}
}

@@ -109,17 +111,17 @@ public IntStream findAll( CharSequence p, CharSequence x ) {
*
* @param pattern the search pattern
*/
private int[] makeDelta( CharSequence pattern ) {
private int[] makeDelta(CharSequence pattern) {
final int m = pattern.length();
@SuppressWarnings( "LocalVariableHidesMemberVariable" )
@SuppressWarnings("LocalVariableHidesMemberVariable")
final int[] delta = this.delta;

Arrays.fill( delta, m + 1 );
for( int i=0; i < m; ++i ) {
Arrays.fill(delta, m + 1);
for (int i = 0; i < m; ++i) {
final char ch = pattern.charAt(i);
final int slot = ch & HASH_MASK;
final int jump = m - i;
if( jump < delta[slot] ) {
if (jump < delta[slot]) {
delta[slot] = jump;
}
}
@@ -132,19 +134,19 @@ private int[] makeDelta( CharSequence pattern ) {
* @param pattern the search pattern
* @return a new β′ array based on the borders of the pattern
*/
private int[] makeBeta( CharSequence pattern ) {
private int[] makeBeta(CharSequence pattern) {
final int m = pattern.length();
final int[] beta = new int[ m + 1 ];
final int[] beta = new int[m + 1];
int i = 0, j = beta[0] = -1;

while( i < m ) {
while( (j > -1) && (pattern.charAt(i) != pattern.charAt(j)) ) {
while (i < m) {
while ((j > -1) && (pattern.charAt(i) != pattern.charAt(j))) {
j = beta[j];
}

++i;
++j;
if( (i < m) && (pattern.charAt(i) == pattern.charAt(j)) ) {
if ((i < m) && (pattern.charAt(i) == pattern.charAt(j))) {
beta[i] = beta[j];
} else {
beta[i] = j;
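
The visible part of `makeBeta` looks like the usual construction of a KMP-style "strong" border table. As a rough sketch, assuming the lines collapsed in this diff only close the loop and return the array, the computation can be tried in isolation like this. The class name `BorderTableSketch` and the method name `strongBorders` are hypothetical and not part of the repository.

```java
import java.util.Arrays;

final class BorderTableSketch {

    /**
     * Builds a table of m + 1 entries where entry i says which pattern
     * position to resume at after a mismatch at position i; -1 means
     * "advance the text and restart at pattern position 0".
     */
    static int[] strongBorders(CharSequence pattern) {
        final int m = pattern.length();
        final int[] beta = new int[m + 1];
        int i = 0, j = beta[0] = -1;

        while (i < m) {
            // fall back through shorter borders until one can be extended
            while ((j > -1) && (pattern.charAt(i) != pattern.charAt(j))) {
                j = beta[j];
            }
            ++i;
            ++j;
            if ((i < m) && (pattern.charAt(i) == pattern.charAt(j))) {
                // the same character would mismatch again, so skip ahead
                beta[i] = beta[j];
            } else {
                beta[i] = j;
            }
        }
        return beta; // assumed tail: the diff hides the end of the method
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(strongBorders("abab")));
        // prints [-1, 0, -1, 0, 2]
    }
}
```

For the pattern `abab`, the final entry 2 records that after a full match the search can resume with the first two characters already matched, which is what lets overlapping occurrences be reported without re-reading text.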
11 changes: 6 additions & 5 deletions java/ca/cgjennings/algo/StringSearcher.java
@@ -1,3 +1,4 @@
/* See LICENSE.md for license details (MIT license). */
package ca.cgjennings.algo;

import java.util.stream.IntStream;
@@ -19,13 +20,13 @@
*/
public interface StringSearcher {
/**
* Finds all matches of the pattern within the text. Each entry in the
* returned {@code IntStream} is the index of one match. If the pattern does
* not occur in the text, an empty stream is returned.
* Finds all matches of the pattern within the text. Each entry in the returned
* {@code IntStream} is the index of one match. If the pattern does not occur in
* the text, an empty stream is returned.
*
* @param pattern the pattern to search for
* @param text the text to search within
* @param text the text to search within
* @return a stream of the indices at which matches were found
*/
IntStream findAll( CharSequence pattern, CharSequence text );
IntStream findAll(CharSequence pattern, CharSequence text);
}
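
A minimal usage sketch of the `findAll` contract follows. It does not use the repository classes: the nested `Searcher` interface is a local stand-in that copies the signature above, and `BRUTE_FORCE` is a hypothetical searcher modelled on the brute-force loop in this commit, so the block compiles on its own.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class FindAllUsageSketch {

    /** Local stand-in (assumption) mirroring the findAll signature. */
    interface Searcher {
        IntStream findAll(CharSequence pattern, CharSequence text);
    }

    /** Brute-force searcher: tries every alignment of p against x. */
    static final Searcher BRUTE_FORCE = (p, x) -> {
        final int m = p.length(), n = x.length();
        final IntStream.Builder stream = IntStream.builder();
        for (int j = 0; j <= n - m; ++j) {
            int i = 0;
            while (i < m && p.charAt(i) == x.charAt(i + j)) {
                ++i;
            }
            if (i == m) {
                stream.accept(j); // record the start index of this match
            }
        }
        return stream.build();
    };

    public static void main(String[] args) {
        // Overlapping matches are all reported
        int[] hits = BRUTE_FORCE.findAll("aba", "ababa").toArray();
        System.out.println(Arrays.toString(hits)); // prints [0, 2]
    }
}
```

With this stand-in, an empty pattern matches at every index from 0 to the text length inclusive, and a pattern longer than the text yields an empty stream, which agrees with the early returns visible in `FJSStringSearcher.findAll`.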