
wpfts_get_attachment_content (Filter)

WARNING! This filter is only available in the PRO version of the WPFTS plugin.

The wpfts_get_attachment_content filter in WP Fast Total Search lets developers modify the data that is indexed for attachments or add to it. WPFTS calls it while indexing an attachment, so besides the standard attachment fields (title and description) you can add data from other sources, such as text extracted from the attachment file.
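As a minimal sketch of adding data from another source, the callback below stores the words from the attachment's file name in a new cluster, so that a search for "annual report" can match annual-report_2023.pdf. The function names and the attachment_filename cluster are hypothetical, and the sketch assumes, as the PDF example further down does, that $chunks is keyed by cluster name.

```php
<?php
// Hypothetical helper: turns "/uploads/annual-report_2023.pdf" into "annual report 2023"
function my_filename_terms(string $path): string
{
	$name = pathinfo($path, PATHINFO_FILENAME);
	// Replace runs of dashes, underscores, dots and whitespace with single spaces
	return trim(preg_replace('/[-_.\s]+/', ' ', $name));
}

function my_index_attachment_filename($chunks, $post, $is_reset_cache)
{
	// Full path of the uploaded file, or false if WordPress cannot resolve it
	$file = get_attached_file($post->ID);
	if ($file) {
		// Store the words in a new (hypothetical) cluster; existing keys are kept
		$chunks['attachment_filename'] = my_filename_terms($file);
	}
	return $chunks;
}

// Register the callback only when running inside WordPress
if (function_exists('add_filter')) {
	add_filter('wpfts_get_attachment_content', 'my_index_attachment_filename', 10, 3);
}
```

Like any new cluster, attachment_filename only affects relevance once it has a weight.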

When to Use

This filter is particularly useful when you need to index the contents of attachment files such as PDF, DOCX, or other text-based documents. Use it together with a library that extracts text from those files, and add the extracted text to the WPFTS index.
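For DOCX files, PHP's ZipArchive class (zip extension) is enough for a rough extraction, because a DOCX file is a ZIP archive whose body text is stored in word/document.xml. The sketch below, with hypothetical function names, strips the XML markup and keeps one line per paragraph. It ignores headers, footers, and footnotes, which live in separate XML parts, so a dedicated library will usually give better results.

```php
<?php
// Hypothetical helper: returns the plain text of a DOCX file, or '' on failure
function my_extract_docx_text(string $path): string
{
	if (!is_readable($path)) {
		return '';
	}
	$zip = new ZipArchive();
	if ($zip->open($path) !== true) {
		return ''; // not a valid ZIP/DOCX file
	}
	// The main document body of a DOCX file is stored in word/document.xml
	$xml = $zip->getFromName('word/document.xml');
	$zip->close();
	if ($xml === false) {
		return '';
	}
	// Turn each paragraph end into a line break, then drop the remaining XML tags
	$text = strip_tags(str_replace('</w:p>', "\n", $xml));
	return trim(html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8'));
}

function my_index_docx_content($chunks, $post, $is_reset_cache)
{
	$docx = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document';
	if ($post->post_mime_type === $docx) {
		$text = my_extract_docx_text((string) get_attached_file($post->ID));
		if ($text !== '') {
			$chunks['attachment_content'] = $text;
		}
	}
	return $chunks;
}

// Register the callback only when running inside WordPress
if (function_exists('add_filter')) {
	add_filter('wpfts_get_attachment_content', 'my_index_docx_content', 10, 3);
}
```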

Arguments

  • $chunks (array): An array of data prepared for indexing, keyed by cluster name. By default, it contains the attachment's title (post_title) and description (post_content).
  • $post (WP_Post object): The WordPress attachment object.
  • $is_reset_cache (bool): A flag indicating whether to reset the attachment data cache.
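The $is_reset_cache flag matters when extraction is expensive. A minimal sketch, assuming you cache the extracted text in post meta under a hypothetical key: the callback reuses the cached text unless WPFTS passes true, in which case it reads the file again. A plain-text attachment stands in for a real parser to keep the example short; the function and key names are illustrative only.

```php
<?php
// Hypothetical post meta key used to cache the extracted text
const MY_TEXT_CACHE_KEY = '_my_extracted_text';

function my_index_cached_text($chunks, $post, $is_reset_cache)
{
	if ($post->post_mime_type !== 'text/plain') {
		return $chunks;
	}
	// Reuse the cached text unless WPFTS asks for the cache to be reset
	$text = $is_reset_cache ? '' : (string) get_post_meta($post->ID, MY_TEXT_CACHE_KEY, true);

	if ($text === '') {
		// Cache miss or reset: read the file again (the expensive step with real parsers)
		$file = get_attached_file($post->ID);
		$text = ($file && is_readable($file)) ? (string) file_get_contents($file) : '';
		// update_post_meta() unslashes its input, so slash the value first
		update_post_meta($post->ID, MY_TEXT_CACHE_KEY, wp_slash($text));
	}

	if ($text !== '') {
		$chunks['attachment_content'] = $text;
	}
	return $chunks;
}

// Register the callback only when running inside WordPress
if (function_exists('add_filter')) {
	add_filter('wpfts_get_attachment_content', 'my_index_cached_text', 10, 3);
}
```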

Return Value

  • $chunks (array): The modified array of data for indexing.

Example (Adding Text from a PDF File to the Index)

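add_filter('wpfts_get_attachment_content', 'add_pdf_content_to_index', 10, 3);

function add_pdf_content_to_index($chunks, $post, $is_reset_cache)
{
	// Only handle PDF attachments
	if ($post->post_mime_type !== 'application/pdf') {
		return $chunks;
	}

	// Full path to the uploaded file (false if it cannot be resolved)
	$file = get_attached_file($post->ID);
	if (!$file || !is_readable($file)) {
		return $chunks;
	}

	// This example assumes the smalot/pdfparser library installed with Composer;
	// adjust the path to your own autoloader or library
	require_once 'path/to/vendor/autoload.php';

	try {
		$parser = new \Smalot\PdfParser\Parser();
		$content = trim($parser->parseFile($file)->getText());
	} catch (\Exception $e) {
		// Malformed or unsupported PDFs make the parser throw; skip them
		return $chunks;
	}

	if ($content !== '') {
		$chunks['attachment_content'] = $content; // Add the PDF content to the new cluster `attachment_content`
	}
	return $chunks;
}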

Important Notes

  • Remember to set weights for new clusters (e.g., attachment_content in the example above) in the plugin settings or via the wpfts_cluster_weights filter for them to be considered when calculating relevance.
  • Make sure that the library you are using to extract text from files is installed and working correctly.
  • Extracting text from large files can take considerable time and server resources. Keep the callback cheap, for example by skipping files above a size limit and capping how much extracted text you index.
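One way to keep large files in check is to skip files above a size limit and cap the amount of extracted text. The sketch below shows two hypothetical helpers; the limit values are placeholders to tune for your server, not recommendations from the plugin. mb_substr() requires the mbstring extension.

```php
<?php
// Hypothetical limits; tune them for your server and content
const MY_MAX_FILE_BYTES = 20 * 1024 * 1024; // skip files larger than 20 MB
const MY_MAX_TEXT_CHARS = 200000;           // index at most this many characters

// Returns true if the file exists, is readable and is small enough to parse
function my_should_extract(string $file): bool
{
	if (!is_readable($file)) {
		return false;
	}
	$size = filesize($file);
	return $size !== false && $size <= MY_MAX_FILE_BYTES;
}

// Truncates extracted text without splitting multibyte (UTF-8) characters
function my_limit_text(string $text): string
{
	return mb_substr($text, 0, MY_MAX_TEXT_CHARS, 'UTF-8');
}
```

In a callback such as the PDF example above, you would call my_should_extract() on the path from get_attached_file() before running the parser, and pass the extracted text through my_limit_text() before adding it to $chunks.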

The wpfts_get_attachment_content filter gives developers the ability to index the content of attachment files, significantly expanding the search capabilities of WP Fast Total Search.