There is a good (albeit somewhat outdated) plugin for embedding PDF files directly into a WordPress page: PDF Viewer. With this plugin, you insert a special shortcode into a page or post and put a direct link to the PDF file inside the shortcode. This article shows how to make posts with embedded PDF files searchable by the content of those files.
It is extremely convenient. However, in order for the WPFTS plugin to index the contents of the inserted file and include it in the page/post index, a little bit of coding magic is needed. This will make the parent page searchable by the content of the files inserted into it!
I want to show you how to write this magic code.
So, we need to add a wpfts_index_post hook handler, inside which we check the type of the post being indexed. If the post_type matches one of the desired types, we look for [pdfviewer] tags in the post content and extract the link to the PDF file that each tag contains. Next, we pull the text content of each of these PDFs and add it to the index of the parent post.
Seems simple? Now, look at the code. I have added comments to explain what happens where. If you still have questions, ask in the comments.
add_filter('wpfts_index_post', function($index, $p)
{
    global $wpdb, $wpfts_core;

    // Check if we are processing the correct post_type
    if (in_array($p->post_type, array('post', 'page'))) {
        global $post;

        // Look for pdfviewer tags using a regular expression
        // (the attribute part is optional, so a bare [pdfviewer] tag matches too)
        if (preg_match_all('~\[pdfviewer(\s[^\]]*)?\]([^\[]*)\[/pdfviewer\]~sU', $p->post_content, $zz, PREG_OFFSET_CAPTURE)) {
            // $zz[0] holds the whole shortcode tags
            // $zz[2] holds the file URLs

            // Let's include WPFTS Utils (we will need it below)
            require_once $wpfts_core->root_dir.'/includes/wpfts_utils.class.php';

            // We are going to collect the extracted texts here
            $sum = '';
            foreach ($zz[2] as $k => $d) {
                $url = trim($d[0]);
                if (preg_match('~^http~', $url)) {
                    // In case we have a correct URL, let's extract the text from that file.
                    // This method uses caching to prevent repeated extractions
                    $ret = WPFTS_Utils::GetCachedFileContent_ByLocalLink($url);

                    // Append the extracted content
                    $sum .= (isset($ret['post_content']) ? trim($ret['post_content']) : '').' ';
                }
            }

            // Store the extracted texts in a separate cluster
            $index['pdfviewer_content'] = (isset($index['pdfviewer_content']) ? $index['pdfviewer_content'] : '').$sum;

            // But we are not finished yet. Let's remove the [pdfviewer] shortcodes from
            // the content to be sure they will not appear in search results
            global $shortcode_tags;

            // Temporarily disable the shortcode processor for [pdfviewer]
            $removed_tmp = array();
            $shortcode_list = array('pdfviewer');
            foreach ($shortcode_list as $dd) {
                if (isset($shortcode_tags[$dd])) {
                    $removed_tmp[$dd] = $shortcode_tags[$dd]; // Save the shortcode function
                    unset($shortcode_tags[$dd]);
                    add_shortcode($dd, function(){ return ''; }); // Dummy function that renders an empty string
                }
            }

            // Render the post content with shortcodes
            $post = get_post($p->ID);
            setup_postdata($post);
            ob_start();
            the_content();
            $r = ob_get_clean();
            $r = strip_tags(str_replace('<', ' <', $r));

            // Restore the disabled shortcode processors
            foreach ($removed_tmp as $k => $d) {
                $shortcode_tags[$k] = $d;
            }
            wp_reset_postdata();

            // Okay, we are done. Just store the result in the cluster
            $index['post_content'] = $r;
        }
    }
    return $index;
}, 3, 2);
Using this method, you can create your own handler for any such shortcodes.
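To adapt the handler to another enclosing shortcode, the URL extraction step can be pulled out into a small helper. The following is only a sketch: the function name my_extract_shortcode_urls() is hypothetical (it is not part of WPFTS or WordPress), and it assumes the shortcode has the form [tag ...]URL[/tag], like [pdfviewer].

```php
<?php
// Hypothetical helper (an assumption, not part of WPFTS or WordPress):
// collects the URLs enclosed by [tag ...]URL[/tag] shortcodes.
function my_extract_shortcode_urls(string $content, string $tag): array
{
    // Escape the tag name so it is safe to use inside the pattern
    $t = preg_quote($tag, '~');

    // Optional attributes after the tag name, then the enclosed text
    $pattern = '~\[' . $t . '(?:\s[^\]]*)?\]([^\[]*)\[/' . $t . '\]~s';

    $urls = array();
    if (preg_match_all($pattern, $content, $m)) {
        foreach ($m[1] as $inner) {
            $url = trim($inner);
            // Keep only absolute http(s) links, similar to the check in the handler above
            if (preg_match('~^https?://~i', $url)) {
                $urls[] = $url;
            }
        }
    }
    return $urls;
}

// Sample content with one shortcode that has attributes and one that does not
$content = 'Intro [pdfviewer width="600px"]https://example.com/a.pdf[/pdfviewer] '
         . 'and [pdfviewer]https://example.com/b.pdf[/pdfviewer] outro.';
$urls = my_extract_shortcode_urls($content, 'pdfviewer');
```

Inside a wpfts_index_post handler, each returned URL could then be passed to WPFTS_Utils::GetCachedFileContent_ByLocalLink() exactly as in the code above, and the extracted text stored in a cluster of its own.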
@EpsilonAdmin Thanks for quick response and question.
Yes, my hope was to use the standard WP search widget - but I haven't explored any other option. If there's a better way I'm happy to get guidance 😀
My site has been recently re-created in WordPress after quite a few years of running under Joomla and that Joomla installation had a free plugin called jiFiles (?) which did the document scanning/indexing. A standard search, scoped on file name or a string from within file content, would pull up a list of file names each hyperlinked to the file itself to easily click on for in-browser viewing (or possibly downloading).
I appreciate that there are other WordPress plugins that offer a full document management system but they have a much larger feature set than I need and are also majorly expensive for a small non-profit community web site.