@hugo Yes, relevancy tuning is hard to do well. I know the big search engines have huge teams using a combination of pretty advanced tech, and a surprising amount of manual intervention.
@michaellewis I find search ranking fascinating. It's a really tough problem to solve, but with this engine I got the feeling the results were ranked by something really basic, like index date. I could be wrong, though; I was only there for 5 minutes.
@hugo At the moment I have a "query fields" boost on title, tags, description, url, author and body, a "boost query" on contains_adverts and owner_verified, and a "boost function" on the log of indexed_inlinks_count (the config snippet is in a blog post). It could definitely be improved, but I was hoping it would be good enough for now. If you're interested, Relevant Search by Doug Turnbull is a good starting point, although it doesn't cover some of the fancy AI/ML techniques that companies with billions to spend are using.
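For readers unfamiliar with those terms: "query fields", "boost query" and "boost function" correspond to the `qf`, `bq` and `bf` parameters of Solr's edismax query parser. The sketch below shows roughly how such a request could be assembled. It is not the searchmysite.net config (that lives in the blog post mentioned above): the field weights, the `contains_adverts:false` / `owner_verified:true` boost directions, the `sum(...,1)` guard against `log(0)`, and the base URL are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Hypothetical field weights: the real values are in the searchmysite.net config, not this thread.
QUERY_FIELDS = {
    "title": 6,
    "tags": 4,
    "description": 3,
    "url": 3,
    "author": 2,
    "body": 1,
}


def build_edismax_params(user_query: str, rows: int = 10) -> list[tuple[str, str]]:
    """Build Solr edismax parameters as (name, value) pairs so bq can repeat."""
    # qf: which fields to search and how much a match in each one counts.
    qf = " ".join(f"{field}^{weight}" for field, weight in QUERY_FIELDS.items())
    return [
        ("q", user_query),
        ("defType", "edismax"),
        ("qf", qf),
        # bq: additive boosts for documents matching these clauses (assumed directions).
        ("bq", "contains_adverts:false^5"),
        ("bq", "owner_verified:true^5"),
        # bf: additive function boost; sum(...,1) avoids log(0) for pages with no inlinks.
        ("bf", "log(sum(indexed_inlinks_count,1))"),
        ("rows", str(rows)),
    ]


def build_select_url(base_url: str, user_query: str) -> str:
    """Return a /select URL for a hypothetical Solr core at base_url."""
    return f"{base_url}/select?{urlencode(build_edismax_params(user_query))}"


if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr/content", "hobbies"))
```

Using the log of the inlink count, rather than the raw count, keeps a handful of heavily linked pages from swamping the text-relevance score; raising the multiplier on that function is one simple way to give inlinks more weight.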
@michaellewis Oh! I didn't realise that you were the person behind searchmysite.net and I want to say that I am not putting down your work. I do love the idea. It's something that can be really useful and I am jealous that it never crossed my mind to do a user submitted search engine that can be used as a site specific search. It's a great approach.
@michaellewis When I went to the site the first searches I did were:
https://searchmysite.net/search/?q=hobbies
and https://searchmysite.net/search/?q=mastodon
And the first result for hobbies is a blank page and the first result for mastodon is a page that says nothing about Mastodon.
@michaellewis The searches I did are probably not representative, but I remember playing with basic MySQL full-text search on a previous project, and I could get some decent relevance ranking out of HTML body content. Just a thought.
But well done, I love what you've done so far, and if I can ever help with anything, feel free to reach out :)
@hugo Thanks for the offer. I think your queries were reasonable and the results ranking for them could be better. I've made a tweak to give more of a boost on the number of indexed_inlinks which I think helps a bit for now, but it definitely needs a lot more tuning so I've made a note to revisit later. Having more real data will actually help too.
@michaellewis OK, hope the project goes well :)
@mike yep, pretty cool idea. The ranking of results is still really basic, but I like it :)
@celia