
How to search a post by the attached PDF file


    The Problem

    Sometimes you want to search for posts that have a PDF file attached, and you want those posts to be found by the content of that PDF. Assume you don't want to display the attached files themselves in the search results. Instead, you want the parent posts (possibly of a custom post type) to be shown in the search results, matched by the content of their attached files.

    A "universal" solution to this can be complex, because there are numerous ways a file can be attached to a post. For example:

    1. The simplest case: the post ID is stored as the post_parent of the attachment's record. In this case, WordPress considers the attachment "uploaded to" the post.

    2. The ID of the PDF attachment is stored in an arbitrary meta field of the post. You can use a plugin such as the popular ACF (Advanced Custom Fields) for this; the classic WP way is to call update_post_meta().

    3. A direct PDF link or file path is stored in a meta field of the post. Again, you can use ACF for this or set it with update_post_meta().

    4. A PDF link appears somewhere in the post_content, for example, text that contains one or more direct links to PDF files.

    5. The post_content may even include a shortcode that expands into something quite elaborate, for example, a PDF viewer widget or a real3D book.

    6. Imagine more cases...
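    Some of these cases can be detected with plain PHP. For case 4, a regular expression over post_content is often enough. Below is a minimal sketch, assuming the links are absolute http(s) URLs ending in .pdf (optionally followed by a query string); find_pdf_links() is a hypothetical name, not a WordPress or WPFTS function. Relative links and links produced by shortcodes (case 5) are not covered.

```php
<?php
// Hypothetical helper (not a WordPress or WPFTS function): collect absolute
// http(s) links to PDF files that appear in a post's content (case 4).
function find_pdf_links(string $content): array
{
    // A URL ends at whitespace, a quote or an angle bracket. It must end in
    // ".pdf" (any letter case), optionally followed by a query string or fragment.
    $pattern = '~https?://[^\s"\'<>]+?\.pdf(?:[?#][^\s"\'<>]*)?(?=[\s"\'<>]|$)~i';
    preg_match_all($pattern, $content, $matches);

    // The same file is often linked more than once; keep each URL only once.
    return array_values(array_unique($matches[0]));
}

$content = '<p>See <a href="https://example.com/wp-content/uploads/2023/05/manual.pdf">the manual</a>'
    . ' or https://example.com/files/specs.PDF?v=2 for details.</p>'
    . '<p><a href="https://example.com/wp-content/uploads/2023/05/manual.pdf">Manual again</a>'
    . ' <img src="https://example.com/photo.png"></p>';

$links = find_pdf_links($content);
print_r($links); // the manual.pdf URL once, then the specs.PDF?v=2 URL
```

    The URLs found this way still have to be mapped to attachments or local files before their text can be extracted.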

    So how should we handle all these cases?

    Basic Idea of the Algorithm

    With the usual "direct" WordPress search, this would be almost impossible. But the indexed search of WPFTS Pro is a "silver bullet" for this kind of task.

    The main idea is this: extract the text content from the attached file(s) and put it into a dedicated search index cluster of the parent post.

    Sounds great, but how do we actually achieve this?

    The Implementation

    As you may know, WPFTS provides a special hook, wpfts_index_post, which is called each time the plugin needs to index a new or changed post (or custom post, or attachment; in WordPress they are all actually "posts"). The purpose of this hook is to let developers add any custom data to the post's index.

    How does it work?

    Just before writing anything to the search index, WPFTS collects all the required textual information related to the post (by default, only post_title and post_content) and puts it into the $index array, where each key is the name of an index cluster and each value is a string of text. Thus, initially the $index variable looks like this:

    $index = array(
        'post_title' => "...The post title is here...",
        'post_content' => "...The post content is here...",
    );
    

    After this (and before writing $index to the search index), WPFTS calls the wpfts_index_post hook mentioned above.

    This is the point where we can plug in our own code to add file content to the index!
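    Because wpfts_index_post is a filter, every callback receives the current $index and must return it. The stand-alone sketch below imitates that flow with a tiny hypothetical DemoFilters class so it can be run outside WordPress. It is not WordPress's add_filter()/apply_filters() and ignores priorities and argument counts; the post and text values are made up.

```php
<?php
// Hypothetical, simplified stand-in for WordPress filters; illustration only.
final class DemoFilters
{
    /** @var array<string, callable[]> */
    private static $callbacks = [];

    public static function add(string $hook, callable $callback): void
    {
        self::$callbacks[$hook][] = $callback;
    }

    // Pass $value through every callback registered for $hook, in order.
    public static function apply(string $hook, $value, ...$args)
    {
        foreach (self::$callbacks[$hook] ?? [] as $callback) {
            $value = $callback($value, ...$args);
        }
        return $value;
    }
}

// Roughly what WPFTS starts with: one index cluster per key.
$post = (object) ['ID' => 42, 'post_type' => 'post'];
$index = [
    'post_title'   => 'Annual report',
    'post_content' => 'Our yearly summary.',
];

// Our callback adds a new cluster and leaves the existing ones intact.
DemoFilters::add('wpfts_index_post', function (array $index, $post) {
    $index['attachment_content'] = 'Text extracted from report.pdf';
    return $index; // without this return, the filtered value would be null
});

$index = DemoFilters::apply('wpfts_index_post', $index, $post);
print_r(array_keys($index)); // post_title, post_content, attachment_content
```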

    Assume we have uploaded a PDF file "into the post" using the Add Media button on the Edit Post page, and now we want the post to be found by the content of that PDF. (By the way, it doesn't have to be a PDF; any supported file type will do.)

    Let's add our own hook implementation.

    add_filter ( 'wpfts_index_post', function($index, $post) {
    
        global $wpfts_core;
     
        // We can add anything we want to the $index here. 
        // But now we need to add exactly the content of the PDF file(s), "uploaded"
        // to the post.
    
        // First step: let's check if we're working with the post of the correct type
        if ($post->post_type == "post") {
            // Second step: okay, now let's check if we have any 
            // files "uploaded to the post" and collect their IDs
    
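            // Select attached files by MIME type: 'application/pdf' for PDFs only,
            // or 'application/*' for any application/* file (PDF, DOCX, etc.).
            // Signature: get_attached_media(string $type, int|WP_Post $post)
            $files = get_attached_media('application/*', $post->ID);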
            if (count($files) > 0) {
                // Okay, we found some files! Let's extract the text from those files
                // And add it to the $index
                $sum_string = '';  // We will collect texts here
                foreach ($files as $file) {
                    if ($file) {
                        // Call WPFTS's method to get text from the file
                        // Despite the fact that this is a very complex 
                        // and slow function, its result is cached
                        $att_data = $wpfts_core->getCachedAttachmentContent($file->ID);
    
                        if (isset($att_data['post_content'])) {
                            $sum_string .= $att_data['post_content'].' ';
                        }
                    }
                }
    
                // Now all texts are collected, store them to the $index array
                // You can use any cluster name
                $index['attachment_content'] = $sum_string;
            }
        }
    
        // Always return the (possibly extended) $index so that
        // WPFTS can write the data to the search index
        return $index;
    }, 10, 2);
    

    Please read all the comments in the code carefully to understand how the magic happens.

    To test this code, you can either use the "Index Tester" in the WPFTS Settings / Sandbox Area tab, or open the Edit Post page and click the Update button. Updating the post triggers an automatic reindex, so the hook runs as well.
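    The hook above covers case 1 only. If a post stores a direct PDF link in a meta field (case 3), that link has to be resolved first. If the file is a Media Library attachment, WordPress's attachment_url_to_postid() returns its ID, which could presumably be passed to getCachedAttachmentContent() as in the hook above. For files that are not registered as attachments, a local path is needed. The sketch below maps an uploads URL to a file path; upload_url_to_path() is a hypothetical helper, and how WPFTS would then extract text from an arbitrary path is not covered here.

```php
<?php
// Hypothetical helper: map a URL under the uploads folder to a local file path.
// In WordPress, $baseurl and $basedir would typically come from the 'baseurl'
// and 'basedir' keys of wp_upload_dir().
function upload_url_to_path(string $url, string $baseurl, string $basedir): ?string
{
    // Drop any query string or fragment, e.g. "manual.pdf?ver=2".
    $clean  = (string) preg_replace('/[?#].*$/s', '', $url);
    $prefix = rtrim($baseurl, '/') . '/';

    // Only URLs below the uploads base URL can be mapped this way.
    if (strpos($clean, $prefix) !== 0) {
        return null;
    }

    // Decode "%20" and similar escapes before checking the path.
    $relative = rawurldecode(substr($clean, strlen($prefix)));

    // Refuse path traversal such as "../../wp-config.php".
    if (in_array('..', explode('/', $relative), true)) {
        return null;
    }

    return rtrim($basedir, '/') . '/' . $relative;
}

$baseurl = 'https://example.com/wp-content/uploads';
$basedir = '/var/www/html/wp-content/uploads';

$path = upload_url_to_path(
    'https://example.com/wp-content/uploads/2023/05/My%20Manual.pdf?ver=2',
    $baseurl,
    $basedir
);
echo $path, PHP_EOL; // /var/www/html/wp-content/uploads/2023/05/My Manual.pdf
```

    Restricting the mapping to the uploads directory is deliberate: a meta field may contain any URL, and mapping it blindly could expose files outside uploads. Before reading the file, you would still check it with is_readable().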

