
How to modify the word-break behaviour of the indexing engine?


  • Sometimes you need to change how the indexing engine breaks text into separate words (you may be more familiar with the term "tokenization").

    For example, you may want SKU numbers that contain periods, minuses, or spaces to be treated by the search engine as a single word. Say the article number "EBR-001-567" should be found only by leading substrings such as "EBR" or "EBR-001", but never by the substrings "56" or "567".

    By default, the WPFTS indexer's tokenizer treats the minus as a separator, so it places three separate words, "EBR", "001", and "567", in the index. The phrase "EBR 001 567" still has priority in the search (the engine gives a relevance bonus to whole phrases), but "567" or "001" can still be found on their own, which is unacceptable in our case.
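
    The default split can be reproduced outside WordPress with a few lines of PHP. This is a minimal sketch: it assumes the default rule behaves like the "minus is a divider" expression used in the sample filter in this post, and the variable names are only for illustration.

    ```php
    <?php
    // Character ranges copied from the sample filter in this post.
    $w = '\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w';

    // Assumption: this mirrors the default rule, where the minus is a separator.
    $default_rule = "~([{$w}][{$w}']*[{$w}]+|[{$w}]+)~u";

    preg_match_all($default_rule, 'Article EBR-001-567', $matches);
    $tokens = $matches[1];

    print_r($tokens); // Article, EBR, 001, 567

    // "567" is a word of its own, so a search for it would match this text.
    var_dump(in_array('567', $tokens, true)); // bool(true)
    ```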

    To overcome this problem, we must change the tokenizer so that the minus is no longer a word separator. This can be done in at least two ways: a simple one, which removes the minus from the list of separators for the entire text, and a complex one, which detects which words are article numbers and turns off the splitting only for them.

    Here is some sample code for the simple approach, applied only to the post title.

    It uses two regular expressions to split the text. They are almost identical, so look carefully: the title rule additionally allows a minus ("\-") inside a word.

    add_filter('wpfts_split_to_words', function($words, $text)
    {
        // The context stores useful information about current post and cluster
        global $wpfts_context;
        
        // Check if we are in the indexing stage
        if ($wpfts_context && ($wpfts_context->index_post > 0)) {
            // Ok, we are indexing now
            // Let's apply different rules for post_title and any other cluster
            if ($wpfts_context->index_token == 'post_title') {
                // The part number can be in the title, using the rule where "minus" is NOT a divider
                $rule = "~([\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w][\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w'\-]*[\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w]+|[\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w]+)~u";
            } else {
                // Other parts of the document will be broken assuming "minus" is a divider
                $rule = "~([\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w][\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w']*[\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w]+|[\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w]+)~u";
            }

            // Finally let's make a split; $matches[1] holds the captured words
            $matches = array();
            preg_match_all($rule, $text, $matches);
            if (isset($matches[1])) {
                $words = $matches[1];
            } else {
                $words = array();
            }
        }
      
        return $words;
    }, 10, 2); // Priority 10; accept 2 arguments so that $text is passed to the callback
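
    The one-character difference between the two rules is easier to see when both are built from a shared fragment. The helper names demo_build_rule() and demo_split() below are hypothetical and not part of WPFTS. The third rule, which also allows a period inside a word, is an assumption about how the same idea could be extended to SKUs that contain periods.

    ```php
    <?php
    // Hypothetical helper: builds the word pattern; $inner_extra lists extra
    // characters that are allowed inside a word (never at its start or end).
    function demo_build_rule(string $inner_extra = ''): string
    {
        $w = '\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w';
        return "~([{$w}][{$w}'{$inner_extra}]*[{$w}]+|[{$w}]+)~u";
    }

    // Hypothetical helper: returns the list of words found by a rule.
    function demo_split(string $text, string $rule): array
    {
        preg_match_all($rule, $text, $m);
        return $m[1];
    }

    $body_rule   = demo_build_rule();      // minus is a separator
    $title_rule  = demo_build_rule('\-');  // minus may appear inside a word
    $period_rule = demo_build_rule('\-.'); // assumption: period allowed as well

    print_r(demo_split('EBR-001-567', $body_rule));         // EBR, 001, 567
    print_r(demo_split('EBR-001-567', $title_rule));        // EBR-001-567
    print_r(demo_split('Part EBR.001-567.', $period_rule)); // Part, EBR.001-567
    ```

    A word must still start and end with a letter, digit, or underscore, so a stray " - " or a sentence-final period is not indexed. Allowing the period also joins text such as "end.Next" into one word, so it is probably best limited to clusters where SKUs are expected. Spaces inside SKUs cannot be handled by simply extending the character class.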
    

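    The filter may look a bit complex, but there is nothing too hard to understand: it replaces the word list only during indexing and picks the rule according to the cluster being indexed.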
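
    For the complex approach mentioned earlier, one possible sketch puts an SKU pattern in front of the ordinary word pattern, so that article numbers are captured whole and everything else is still split on the minus. The SKU shape assumed here (two or more capital letters followed by hyphen-separated digit groups) and the function name demo_split_sku_aware() are hypothetical; adjust the pattern to your own numbering scheme.

    ```php
    <?php
    // Hypothetical helper, not part of WPFTS.
    function demo_split_sku_aware(string $text): array
    {
        // Assumed SKU shape, e.g. "EBR-001-567": capital letters, then "-digits" groups.
        $sku = '\b[A-Z]{2,}(?:-[0-9]+)+\b';

        // Ordinary words: same character ranges as the sample filter, minus is a separator.
        $w    = '\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w';
        $word = "[{$w}][{$w}']*[{$w}]+|[{$w}]+";

        // The SKU alternative comes first, so it is tried before the word pattern.
        if (!preg_match_all("~({$sku}|{$word})~u", $text, $m)) {
            return array();
        }
        return $m[1];
    }

    print_r(demo_split_sku_aware('well-known EBR-001-567 part'));
    // well, known, EBR-001-567, part
    ```

    Inside the wpfts_split_to_words callback, the returned array could be assigned to $words in place of the preg_match_all() result.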

