Sorting of Search Results


  • Is there a way to modify the relevance sort weightings when applied to attachments, particularly pdfs, based upon the number of times the search term appears within the document? (i.e. the more often the term appears, the higher the relevance).


  • Hi @Nick

    That's an interesting problem to think about! The relevance calculation is based mainly on the TF-IDF algorithm, which means the number of occurrences of a term is also divided by the length of the document. So what we actually get is not the raw number of occurrences, but a term density.

    However, WPFTS is very flexible, and I think the algorithm you need can be implemented with some code magic.

    Could you please explain why you need such an algorithm and exactly how it should work?
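
To make the difference between a term count and a term density concrete, here is a minimal Python sketch. It is not WPFTS code: the function names, the sample documents, and the smoothed IDF formula (one common textbook variant) are all assumptions chosen for illustration, and the plugin's actual formula may differ.

```python
import math

def term_frequency(term, tokens):
    """Share of the document's words that equal the term (a density, not a count)."""
    return tokens.count(term) / len(tokens) if tokens else 0.0

def inverse_document_frequency(term, corpus):
    """Terms that appear in fewer documents get a larger weight (smoothed variant)."""
    containing = sum(1 for tokens in corpus if term in tokens)
    return math.log((1 + len(corpus)) / (1 + containing)) + 1.0

def tf_idf(term, tokens, corpus):
    """Density-based score: occurrences divided by document length, times IDF."""
    return term_frequency(term, tokens) * inverse_document_frequency(term, corpus)

def raw_count_score(term, tokens, corpus):
    """Count-based alternative: occurrences are NOT divided by document length."""
    return tokens.count(term) * inverse_document_frequency(term, corpus)

# Hypothetical documents: a short note and a long journal issue.
short_note = ["canal"] * 2 + ["filler"] * 8        # 2 of 10 words
long_journal = ["canal"] * 30 + ["filler"] * 970   # 30 of 1000 words
corpus = [short_note, long_journal, ["railway", "filler"]]

for name, doc in (("short_note", short_note), ("long_journal", long_journal)):
    print(name, tf_idf("canal", doc, corpus), raw_count_score("canal", doc, corpus))
```

With density-based scoring the short note ranks first, because a larger share of its words match. With count-based scoring the long journal ranks first, which is the behaviour asked for in the question.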


  • Most of the documents we have on our website are PDFs of our Journals, which were originally paper copies going back to 1955. Each Journal typically contains 6 to 8 substantial articles of about 10 pages each, plus many shorter articles, including comments on earlier articles.
    So, the more times a search term is mentioned in an article and, particularly, if the term is included in the title of the article, the more likely it is that the article is specifically about the search term or something closely related to it.
    On the assumption that WPFTS can't tell where one article ends and another begins, the number of instances within the document would be a good proxy.


  • Okay @Nick

    So the main issue is that we cannot extract separate articles from the PDF and assign more weight to their titles, correct?

    Your solution looks reasonable, and we can implement it. But before that, do you think we could try to extract separate articles from the PDF or (which is even simpler) extract the article titles? We could store these titles in a separate index cluster and assign this cluster more weight. It could work even better than weighting relevance by the number of occurrences.

    Could you please send me one or two typical PDFs? It is worth trying to extract titles from them using some heuristic algorithm.

    Also, could you tell me how many PDFs you have in total? Would it be a headache to extract the titles manually?
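
The idea of a separately weighted title cluster can be sketched as follows. This is an illustrative Python sketch only, not the WPFTS implementation: the cluster names `article_titles` and `post_content`, the weight values, and the sample documents are hypothetical, and the per-cluster score is simplified to a plain term density.

```python
def weighted_relevance(term, clusters, weights):
    """Sum of per-cluster term densities, each multiplied by its cluster weight."""
    total = 0.0
    for name, tokens in clusters.items():
        if not tokens:
            continue  # an empty cluster contributes nothing
        density = tokens.count(term) / len(tokens)
        total += weights.get(name, 1.0) * density
    return total

# Hypothetical Journal A: the term is in an article title, but rare in the text.
journal_a = {
    "article_titles": ["the", "oxford", "canal", "history"],
    "post_content": ["canal"] * 5 + ["filler"] * 95,
}
# Hypothetical Journal B: the term is frequent in the text, but absent from titles.
journal_b = {
    "article_titles": ["early", "railway", "signalling"],
    "post_content": ["canal"] * 40 + ["filler"] * 60,
}

equal = {"article_titles": 1.0, "post_content": 1.0}
title_boost = {"article_titles": 5.0, "post_content": 1.0}

for label, weights in (("equal", equal), ("title_boost", title_boost)):
    print(label,
          weighted_relevance("canal", journal_a, weights),
          weighted_relevance("canal", journal_b, weights))
```

With equal weights Journal B ranks first. Once the title cluster is boosted, Journal A ranks first, which is the effect a higher title weight is meant to produce.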


  • We have over 1,000 PDF documents on our website, and the most important are the issues of our Journal, with 238 editions at present. As an example, a couple of issues of our Journal are here, but they are all similar in structure:
    https://rchs.org.uk/wp-content/uploads/2020/02/Journal-100-Nov-1975.pdf
    https://rchs.org.uk/wp-content/uploads/2020/02/Journal-001-Jan-1955.pdf
    You will see that at the bottom of the second page of Journal 100 there is a table of contents for the issue and, if the PDF is saved and then opened in Acrobat Reader, there is an equivalent set of bookmarks. It would help if a search term appearing in an article title (as listed in the table of contents) were given a higher weighting than one in the text of the document, but quite often the term will only occur in the text, and not in the title at all.
    Ideally, we would like the weighting to be based upon articles, but weightings based upon the whole Journal are acceptable. This is because the articles within a Journal tend to be on unrelated topics, so there is probably little difference between the number of instances of a specific search term within an article and within its parent document.
    I think it would help me if you could explain, in non-technical terms, how the four weightings operate within WPFTS. I've looked at the TF-IDF article on Wikipedia and understand the basics, but the majority of the article is too technical for me. Perhaps this information could also be added to the WPFTS documentation?
    As a related issue, could WPFTS open the document listed in the search results at the first article page where the search term is found? I realise it might go to the article title instead, and if so that wouldn't really help much.
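
On the last point, PDF viewers such as Acrobat and most browsers honour a `#page=N` fragment on a PDF URL and open the file at that page. The Python sketch below only shows how such a link could be built. The helper name and the page number are hypothetical, and whether WPFTS records the page on which a match occurs is not stated in this thread.

```python
def pdf_link_at_page(pdf_url, page_number):
    """Return the PDF URL with a '#page=N' fragment (pages are numbered from 1)."""
    if page_number < 1:
        raise ValueError("page_number must be 1 or greater")
    base = pdf_url.split("#", 1)[0]  # drop any fragment that is already present
    return f"{base}#page={page_number}"

# The page number 3 is a made-up example, not a real search hit.
link = pdf_link_at_page(
    "https://rchs.org.uk/wp-content/uploads/2020/02/Journal-100-Nov-1975.pdf", 3
)
print(link)
```

The link would open the issue at the chosen page rather than at the cover, provided the search index can supply the page of the first match.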
