How to search files by content when using the PDFViewer plugin

EpsilonAdmin

There is a good (albeit somewhat outdated) plugin for embedding PDF files directly into a WordPress page: PDFViewer. With it, you insert a special shortcode into a page or post and put a direct link to the PDF file inside that shortcode. This article shows how to make posts with embedded PDF files searchable by the content of those files.

It is extremely convenient. However, for the WPFTS plugin to index the contents of the embedded file and include them in the page/post index, a little bit of coding magic is needed. This makes the parent page searchable by the content of the files embedded in it!

I want to show you how to write this magic code.

So, we need to add a wpfts_index_post hook handler that first checks the type of the post being processed. If the post_type matches one of the desired types, we look for [pdfviewer] tags in the post content and take the link to the PDF file that each tag contains. Next, we pull the text content of each of these PDFs and add it to the index of the parent post.

Seems simple? Now, look at the code. I tried to add comments to make it clear what happens where. If you still have questions, write in the comments.

add_filter('wpfts_index_post', function($index, $p)
{
    global $wpfts_core;

    // Check if we are processing the correct post_type
    if (in_array($p->post_type, array('post', 'page'))) {
        global $post;

        // Look for [pdfviewer] tags using a regular expression.
        // The attribute part is optional, so both [pdfviewer]URL[/pdfviewer]
        // and [pdfviewer width="600px"]URL[/pdfviewer] are matched.
        if (preg_match_all('~\[pdfviewer(?:\s+[^\]]*)?\]([^\[]*)\[/pdfviewer\]~sU', $p->post_content, $zz, PREG_OFFSET_CAPTURE)) {
            // Group 1 is the file URL
            // Group 0 is the whole shortcode tag

            // Load the WPFTS utilities (we will need them below)
            require_once $wpfts_core->root_dir.'/includes/wpfts_utils.class.php';

            // We are going to collect the extracted texts here
            $sum = '';

            foreach ($zz[1] as $k => $d) {
                // With PREG_OFFSET_CAPTURE each match is array(text, offset)
                $url = trim($d[0]);
                if (preg_match('~^http~', $url)) {
                    // We have a proper URL, so let's extract the text from that file.
                    // This method uses caching to prevent repeated extractions
                    $ret = WPFTS_Utils::GetCachedFileContent_ByLocalLink($url);

                    // Append the extracted content
                    $sum .= (isset($ret['post_content']) ? trim($ret['post_content']) : '').' ';
                }
            }

            // Store the extracted texts in a separate cluster
            // (the isset() check avoids an "undefined index" notice on first use)
            $index['pdfviewer_content'] = (isset($index['pdfviewer_content']) ? $index['pdfviewer_content'] : '').$sum;

            // But we are not finished yet. Let's remove the [shortcodes] from the content
            // to be sure they will not appear in search results
            global $shortcode_tags;

            // Temporarily disable the shortcode processor for [pdfviewer]
            $removed_tmp = array();

            $shortcode_list = array('pdfviewer');

            foreach ($shortcode_list as $dd) {
                if (isset($shortcode_tags[$dd])) {
                    $removed_tmp[$dd] = $shortcode_tags[$dd];    // Save the shortcode function
                    unset($shortcode_tags[$dd]);
                    add_shortcode($dd, function(){ return ''; });    // Dummy function that renders an empty string
                }
            }

            // Render the post content with shortcodes
            $post = get_post($p->ID);
            setup_postdata($post);

            ob_start();
            the_content();

            $r = ob_get_clean();
            $r = strip_tags(str_replace('<', ' <', $r));

            // Restore the disabled shortcode processors
            foreach ($removed_tmp as $k => $d) {
                $shortcode_tags[$k] = $d;
            }

            wp_reset_postdata();

            // Okay, we are done. Just store the result in the cluster
            $index['post_content'] = $r;
        }
    }

    return $index;
}, 3, 2);

Using this method, you can create your own handler for any such shortcodes.
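
To illustrate how the URL-matching step could be generalized, here is a minimal sketch in plain PHP (no WordPress or WPFTS calls). The function name example_collect_shortcode_urls and the [my-docs] shortcode are hypothetical and used only for illustration; the sketch assumes the shortcode wraps the file URL the same way [pdfviewer] does.

```php
<?php
/**
 * Collect the URLs wrapped by enclosing shortcodes such as [tag ...]URL[/tag].
 * Hypothetical helper for illustration; not part of WPFTS or PDFViewer.
 *
 * @param string   $content Raw post content.
 * @param string[] $tags    Shortcode names to scan for.
 * @return string[] Unique http(s) URLs, grouped by tag in the order given.
 */
function example_collect_shortcode_urls($content, array $tags)
{
    $urls = array();

    foreach ($tags as $tag) {
        // preg_quote() keeps special characters in the tag name (e.g. "-") literal
        $t = preg_quote($tag, '~');

        // Optional attributes after the tag name, then the wrapped text
        $pattern = '~\[' . $t . '(?:\s[^\]]*)?\]([^\[]*)\[/' . $t . '\]~s';

        if (preg_match_all($pattern, $content, $m)) {
            foreach ($m[1] as $candidate) {
                $candidate = trim($candidate);
                // Keep only values that look like http(s) links
                if (preg_match('~^https?://~i', $candidate)) {
                    $urls[] = $candidate;
                }
            }
        }
    }

    return array_values(array_unique($urls));
}

$content = 'Intro [pdfviewer width="600px"]https://example.com/a.pdf[/pdfviewer] '
         . 'and [my-docs] https://example.com/b.docx [/my-docs] '
         . 'and [pdfviewer]not-a-url[/pdfviewer]';

print_r(example_collect_shortcode_urls($content, array('pdfviewer', 'my-docs')));
```

Inside a real wpfts_index_post handler, each returned URL would then be passed to WPFTS_Utils::GetCachedFileContent_ByLocalLink() as in the code above, and the same tag names would go into $shortcode_list so the shortcodes are rendered as empty strings.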

If you don't want to mess with code, simply download and install this addon.
WPFTS Addon for PDFViewer (zip)
