    How to modify word-break behaviour of the indexing engine?

      EpsilonAdmin

      Sometimes you need to change how the indexing engine breaks text into separate words (you may know this process as "tokenization").

      For example, you may want SKU numbers that contain periods, minuses, or spaces to be treated by the search engine as a single word. Let's say the article number "EBR-001-567" should be found by the substrings "EBR" or "EBR-001", but never by the substrings "56" or "567".

      By default, the WPFTS indexer's tokenizer treats the minus as a separator, so it puts three separate words, "EBR", "001" and "567", into the index. The phrase "EBR 001 567" still gets priority in the search (the engine gives a relevance bonus to whole phrases), but "567" or "001" can still be found on their own, which is unacceptable in our case.

      To overcome this problem, we must change the tokenizer so that the minus is no longer a word separator. This can be done in at least two ways: a simple one, which removes the minus from the list of separators for the entire text, and a complex one, which works out which words are article numbers and turns off splitting only for them.
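
The complex way could look roughly like the sketch below. It is only an illustration that runs outside WordPress: the function name `split_keeping_skus` and the SKU pattern (two or more capital letters followed by hyphen-separated groups of digits) are assumptions made for this example, not part of WPFTS, so adjust the pattern to your own article numbers.

```php
<?php
// Hypothetical helper, not part of WPFTS: keeps SKU-like tokens whole
// and splits everything else on the usual separators (including "minus").
function split_keeping_skus($text)
{
    // Assumed SKU shape: 2+ capital letters, then one or more "-digits" groups,
    // for example "EBR-001-567"
    $sku = '[A-Z]{2,}(?:-\d+)+';
    // Ordinary word: a run of \w characters
    $word = '\w+';

    // The SKU alternative goes first, so it wins wherever both could match
    $matches = array();
    preg_match_all('~(' . $sku . '|' . $word . ')~u', $text, $matches);

    return $matches[1];
}

// "EBR-001-567" stays whole, while "well-known" is still split in two
print_r(split_keeping_skus('Spare part EBR-001-567 for the well-known unit'));
```

Inside a `wpfts_split_to_words` callback, the returned array would take the place of `$words`. The trade-off is that the result is only as good as the SKU pattern: anything that does not match it is split as usual.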

      Here is some sample code that follows the simple approach.

      It uses two regular expressions to split the text. They are very similar but not identical: the rule for post_title allows \- (minus) in the middle character class, and the other rule does not.

      add_filter('wpfts_split_to_words', function($words, $text)
      {
          // The context stores useful information about the current post and cluster
          global $wpfts_context;

          // Check whether we are in the indexing stage
          if ($wpfts_context && ($wpfts_context->index_post > 0)) {
              // We are indexing now.
              // Apply different rules for post_title and for every other cluster
              if ($wpfts_context->index_token == 'post_title') {
                  // The part number can appear in the title, so use the rule where "minus" is NOT a divider
                  $rule = "~([\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w][\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w'\-]*[\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w]+|[\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w]+)~u";
              } else {
                  // Other parts of the document are split assuming "minus" is a divider
                  $rule = "~([\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w][\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w']*[\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w]+|[\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w]+)~u";
              }

              // Finally, do the split
              $matches = array();
              preg_match_all($rule, $text, $matches);
              if (isset($matches[1])) {
                  $words = $matches[1];
              } else {
                  $words = array();
              }
          }

          return $words;
      }, 10, 2); // priority 10; 2 = the callback receives both $words and $text
      

      Yes, it may look a bit complex, but there is actually nothing too hard to understand here.
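
To check what each rule does before putting it on a live site, the two patterns can be run on a sample string outside WordPress. The sketch below is a small test harness, not WPFTS code: `build_rule` and `split_words` are hypothetical helpers that assemble the same two expressions from shared pieces, which also makes the single difference (the `\-` in the middle character class) easy to see.

```php
<?php
// Hypothetical helper (not part of WPFTS): rebuilds the two patterns from the post.
// $keep_minus = true  -> "minus" may appear inside a word (the post_title rule)
// $keep_minus = false -> "minus" splits words (the rule for all other clusters)
function build_rule($keep_minus)
{
    // Characters allowed at the start and at the end of a word
    $edge = '[\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w]';
    // Characters allowed in the middle: the same set plus apostrophe and, optionally, minus
    $middle = '[\x{00C0}-\x{1FFF}\x{2C00}-\x{D7FF}\w\'' . ($keep_minus ? '\-' : '') . ']';

    return '~(' . $edge . $middle . '*' . $edge . '+|' . $edge . '+)~u';
}

// Hypothetical helper: splits $text with one of the two rules
function split_words($text, $keep_minus)
{
    $matches = array();
    preg_match_all(build_rule($keep_minus), $text, $matches);
    return $matches[1];
}

$title = "EBR-001-567 driver's kit";
echo implode(' | ', split_words($title, true)), "\n";  // EBR-001-567 | driver's | kit
echo implode(' | ', split_words($title, false)), "\n"; // EBR | 001 | 567 | driver's | kit
```

With the first rule the article number reaches the index as one word, so "567" on its own no longer matches it. With the second rule it is split into three words, as before.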

      https://e-wm.org
