
How to search for a post by its attached PDF file



    The Problem

    Sometimes you want to find posts by the content of the PDF files attached to them. Assume you don't want the attached files themselves to appear in the search results. Instead, you want their parent posts (maybe custom post types) to show up in search results based on the content of the attached files.

    A "universal" solution to this can be complex, because there are numerous ways a file can be attached to a post. For example:

    1. The simplest case: the attachment's record stores the post ID as its post_parent. In this case, WordPress considers the attachment "uploaded" to the post.

    2. Another case is when the PDF attachment ID is stored in an arbitrary meta field of the post. You can use a plugin like the popular ACF (Advanced Custom Fields) for this. The classic WP way is to call update_post_meta().

    3. A direct PDF link or file path can be stored in a post meta field. You can use ACF for this as well, or set it up with the update_post_meta() function.

    4. A PDF link can be mentioned somewhere in the post_content. For example, the text contains one or more direct links to PDF files.

    5. The post_content can even include a shortcode that expands into something very beautiful, for example a PDF viewer widget or a real3D book.

    6. Imagine more cases...

    OK, well. So how should we manage all these cases?

    Basic Idea of the Algorithm

    Solving this is almost impossible with the usual "direct" WordPress search. But indexed search is a "silver bullet" for this type of task.

    Here is the main idea: we need to extract the content from the attached file(s) and put it into a specific search index cluster of the parent post.

    Sounds great, but how can we actually achieve this?

    The Implementation

    Okay. As you know, WPFTS has a special hook, wpfts_index_post, which is called each time the plugin needs to index a changed or newly added post (or custom post, or attachment; in WP they are all actually "posts"). The goal of this hook is to let developers add any custom data to the post's index.

    How does it work?

    Just before putting anything into the Search Index, WPFTS collects all the required textual information related to the post (by default only post_title and post_content) and puts it into the $index array, where each key is the name of an index cluster and each value is a string of text. Thus, initially the $index variable looks like this:

    $index = array(
        'post_title' => "...The post title is here...",
        'post_content' => "...The post content is here...",
    );
    

    After this (and before putting $index into the search index), WPFTS calls the wpfts_index_post hook mentioned above.

    And this is the point where we can put our own code to add file content to the index!
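    Before the full example, here is a minimal sketch of that mechanism: a callback receives the $index array and the post object, adds one more cluster, and returns the array. The function name my_add_demo_cluster and the cluster name my_notes are made up for illustration, and the function_exists() guard is only there so the snippet can also be loaded outside WordPress.

```php
<?php
/**
 * Hypothetical callback for the wpfts_index_post hook.
 * It adds one extra cluster to $index and returns the result.
 */
function my_add_demo_cluster($index, $post) {
    // 'my_notes' is an arbitrary cluster name chosen for this sketch
    $index['my_notes'] = 'Extra searchable text for post #' . $post->ID;

    // Always return $index, otherwise the collected data is lost
    return $index;
}

// Register the callback only when WordPress is loaded.
// The last argument (2) tells WordPress to pass both $index and $post.
if (function_exists('add_filter')) {
    add_filter('wpfts_index_post', 'my_add_demo_cluster', 10, 2);
}
```

    The existing post_title and post_content clusters are left untouched; the callback only adds a new key next to them.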

    Assume we have uploaded a PDF file "into the post" using the Add Media button on the Edit Post page, and now we want the post to be found by this PDF file's content. (BTW, it doesn't have to be a PDF file. It can be any supported file type.)

    Let's add our own hook implementation.

    add_filter ( 'wpfts_index_post', function($index, $post) {
    
        global $wpfts_core;
     
        // We can add anything we want to the $index here. 
        // But now we need to add exactly the content of the PDF file(s), "uploaded"
        // to the post.
    
        // First step: let's check if we're working with the post of the correct type
        if ($post->post_type == "post") {
            // Second step: okay, now let's check if we have any 
            // files "uploaded to the post" and collect their IDs
    
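            // You can specify different mime-types to select files you need
            // (for example, 'application/pdf' for PDF files only).
            // The mime-type and the post ID are two separate arguments.
            $files = get_attached_media('application/*', $post->ID);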
            if (count($files) > 0) {
                // Okay, we found some files! Let's extract the text from those files
                // And add it to the $index
                $sum_string = '';  // We will collect texts here
                foreach ($files as $file) {
                    if ($file) {
                        // Call WPFTS's method to get text from the file
                        // Despite the fact that this is a very complex 
                        // and slow function, its result is cached
                        $att_data = $wpfts_core->getCachedAttachmentContent($file->ID);
    
                        if (isset($att_data['post_content'])) {
                            $sum_string .= $att_data['post_content'].' ';
                        }
                    }
                }
    
                // Now all texts are collected, store them to the $index array
                // You can use any cluster name
                $index['attachment_content'] = $sum_string;
            }
        }
    
        // We have to return the updated $index so that
        // WPFTS can put the data into the search index
        return $index;
    }, 10, 2);
    

    Please read all the comments in the code carefully to understand how the magic happens.

    To check this code, you can either use the "Index Tester" functionality on the WPFTS Settings / Sandbox Area tab, or open the Edit Post page and press the Update button. The latter forces an automatic reindex of the post, so this hook will run too.
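    The same approach can probably be adapted to case 2 from the list above, where the PDF attachment ID is stored in a post meta field. The sketch below is an assumption, not part of the original example: the meta key my_pdf_attachment_id and the function name are hypothetical, the meta value is assumed to be a numeric attachment ID, and getCachedAttachmentContent() is used the same way as in the hook above.

```php
<?php
// Hypothetical meta key that holds the attachment ID; replace with your own
const MY_PDF_META_KEY = 'my_pdf_attachment_id';

/**
 * Hypothetical wpfts_index_post callback: adds the text of the file
 * referenced by a post meta field to the 'attachment_content' cluster.
 */
function my_index_pdf_from_meta($index, $post) {
    global $wpfts_core;

    // With $single = true, get_post_meta() returns '' when the key is absent
    $attachment_id = (int) get_post_meta($post->ID, MY_PDF_META_KEY, true);

    if ($attachment_id > 0) {
        // Same cached extraction call as in the example above
        $att_data = $wpfts_core->getCachedAttachmentContent($attachment_id);

        if (isset($att_data['post_content'])) {
            // Append instead of overwrite, in case another callback
            // has already filled this cluster
            $existing = isset($index['attachment_content']) ? $index['attachment_content'] : '';
            $index['attachment_content'] = trim($existing . ' ' . $att_data['post_content']);
        }
    }

    return $index;
}

// Register the callback only when WordPress is loaded
if (function_exists('add_filter')) {
    add_filter('wpfts_index_post', 'my_index_pdf_from_meta', 10, 2);
}
```

    Appending to the cluster rather than assigning it means this callback can coexist with the one that handles files "uploaded to the post".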