
searchmysite.net has its own dedicated blog now, and I've posted more details of some of the changes I've made since the burst of activity in mid-October at blog.searchmysite.net/posts/im

TIL there's a breed of semi-feral sheep on a small Scottish island that has evolved to eat mostly seaweed, giving the meat a "unique, rich flavour": en.wikipedia.org/wiki/North_Ro

Had to log in to FB for the first time in 7 years, to change my password as apparently it had been compromised (it says there was a login from a Windows PC, so that definitely wasn't me). What a disaster zone it has become.

@monorail I've been doing a search for "pikmin 4" once or twice a year since 2015, when the creator said pikmin 4 was "close to completion".

There has been a slight mismatch between the number of sites submitted and the number indexed. Turns out that 10 sites have "User-agent: *" followed by "Disallow: /" in their robots.txt, which blocks all crawlers from the whole site. I've added those sites to the do not index list, which means if you resubmit them you'll see the message '... has previously been submitted but ... Access blocked by robots.txt'. If you see this but have updated robots.txt to allow searchmysite.net, let me know and I'll move the site back to the index list.
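A minimal sketch of the kind of check involved, using Python's standard-library urllib.robotparser. The user-agent token "searchmysite" is hypothetical, not necessarily the string the real crawler sends, and the second robots.txt is an invented example of how a site owner could let one crawler in while still blocking the rest.

```python
from urllib.robotparser import RobotFileParser

# Blocks every crawler from the whole site (the case described above).
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt: everyone else is still blocked, but a crawler
# identifying itself as "searchmysite" is allowed in.
ALLOW_SEARCHMYSITE = """\
User-agent: searchmysite
Allow: /

User-agent: *
Disallow: /
"""


def allows_crawling(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allows_crawling(BLOCK_ALL, "searchmysite", "https://example.com/"))           # False
print(allows_crawling(ALLOW_SEARCHMYSITE, "searchmysite", "https://example.com/"))  # True
```

Named user-agent groups are checked before the "*" group, which is why the second file lets the named crawler through.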

@metbril @celia Many thanks. There are actually quite a few non-English sites listed now, which is great, although I need to make it easier for non-English speakers to use. In the short term, I'm planning on simply having a Language drop-down on both the Search and Browse pages (probably having to default to All Languages to avoid user tracking and profiling). Longer term, it would be good to properly internationalise the menus etc., but that'll probably be something for the issue log at first.
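A rough sketch of how a default-to-All-Languages drop-down might translate into search filters, assuming a Solr-style backend with filter queries. The field name "language" and the set of supported codes are hypothetical.

```python
# Hypothetical language codes offered in the drop-down.
SUPPORTED_LANGUAGES = {"en", "de", "es", "fr", "nl"}
ALL_LANGUAGES = "all"


def language_filters(selected: str = ALL_LANGUAGES) -> list[str]:
    """Return filter queries for the chosen language.

    The default, "All Languages", adds no filter at all, so picking the
    default needs no information about the visitor.
    """
    # Unknown values fall back to no filter rather than being passed through.
    if selected == ALL_LANGUAGES or selected not in SUPPORTED_LANGUAGES:
        return []
    # "language" is a hypothetical indexed field name.
    return [f"language:{selected}"]


print(language_filters())      # []
print(language_filters("fr"))  # ['language:fr']
```

Checking the selection against a fixed list also keeps arbitrary user input out of the filter query.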

@dajbelshaw Looks great, but one thing missing - what happens if someone breaks the rules?

So today's my 20 year anniversary at my current job (I remember the date because it was my dad's birthday). They used to make a big thing about significant anniversaries. But today? Not even an email.

@celia Simple rule of thumb I use: if it is more of a "site" than an "app" don't use an SPA, i.e. if it is more about content than functionality, reading rather than writing, more "passive" consumption rather than "active" interaction, etc. Of course in the real world many use cases aren't that clear cut.

@goldfinch Thanks. Still getting to grips with it, but seems good so far.

Had the first outage of searchmysite.net last night :-( Good news is that the monitoring and alerting triggered. Bad news is that it happened just after I went to bed and I put my phone on Do Not Disturb overnight. Hopefully no-one was too inconvenienced. Root cause was the system running out of memory trying to index a ginormous .cbr file. Needless to say, some changes have already been made to prevent a repeat, and more are planned.
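The actual fixes aren't described here, but one common way to stop a single huge file from exhausting memory is to skip known-bulky file types and cap how many bytes are read per document. A minimal sketch, where the size limit, the extension list and the function names are all hypothetical:

```python
import io
from urllib.parse import urlparse

# Hypothetical limits; real values would depend on the server's memory.
MAX_DOCUMENT_BYTES = 5 * 1024 * 1024
SKIP_EXTENSIONS = {".cbr", ".cbz", ".zip", ".iso"}


class DocumentTooLarge(Exception):
    """Raised when a download would exceed the byte limit."""


def has_skipped_extension(url: str) -> bool:
    """True if the URL path ends in an extension we never try to index."""
    path = urlparse(url).path.lower()
    return any(path.endswith(ext) for ext in SKIP_EXTENSIONS)


def read_capped(stream, max_bytes: int = MAX_DOCUMENT_BYTES,
                chunk_size: int = 64 * 1024) -> bytes:
    """Read a binary stream in chunks, refusing to hold more than max_bytes."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Check before appending so memory never grows past the limit.
        if len(buf) + len(chunk) > max_bytes:
            raise DocumentTooLarge(f"document exceeds {max_bytes} bytes")
        buf.extend(chunk)
    return bytes(buf)


print(has_skipped_extension("https://example.com/comics/issue1.CBR"))  # True
print(len(read_capped(io.BytesIO(b"x" * 1000), max_bytes=2000)))       # 1000
```

The extension check is cheap but easy to evade, while the byte cap protects memory regardless of what the file is called.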

@hugo Thanks for the offer. I think your queries were reasonable and the results ranking for them could be better. I've made a tweak to give more of a boost based on the number of indexed_inlinks, which I think helps a bit for now, but it definitely needs a lot more tuning, so I've made a note to revisit later. Having more real data will actually help too.

@hugo At the moment I've a "query fields" boost on title, tags, description, url, author, body, a "boost query" on contains_adverts and owner_verified, and a "boost function" on the log of indexed_inlinks_count (config snippet is in a blog post). It could definitely be improved, but I was hoping it would be good enough for now. If you're interested, Relevant Search by Doug Turnbull is a good starting point, although it doesn't cover some of the fancy things AI/ML companies with billions to spend are doing.
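"Query fields", "boost query" and "boost function" correspond to the qf, bq and bf parameters of Solr's edismax query parser. A minimal sketch of such a request, assuming a Solr backend; the per-field weights and boost values are made up for illustration and are not the actual searchmysite.net configuration. The +1 inside log() is also an assumption, there to avoid taking log(0) for sites with no inlinks.

```python
from urllib.parse import parse_qs, urlencode


def build_search_params(user_query: str) -> dict:
    """Build edismax parameters for a Solr /select request (weights hypothetical)."""
    return {
        "defType": "edismax",
        "q": user_query,
        # Query fields: which fields to search, with per-field boosts.
        "qf": "title^3 tags^2 description^2 url^2 author^1 body^1",
        # Boost queries: documents matching these get an extra score contribution.
        "bq": ["contains_adverts:false^2", "owner_verified:true^2"],
        # Boost function: favour sites with more indexed inlinks, damped by log().
        "bf": "log(sum(indexed_inlinks_count,1))",
    }


# doseq=True sends each bq entry as its own bq parameter.
query_string = urlencode(build_search_params("static site generator"), doseq=True)
print(query_string)
print(parse_qs(query_string)["bq"])  # ['contains_adverts:false^2', 'owner_verified:true^2']
```

Using log() on the inlink count means a site with thousands of inlinks gets only a moderately larger boost than one with dozens, so text relevance still dominates.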

@hugo Yes, relevancy tuning is hard to do well. I know the big search engines have huge teams using a combination of pretty advanced tech, and a surprising amount of manual intervention.

@mike @celia Many thanks for the mention. Any suggestions for improvements welcome. BTW I'm aiming to open source it in time.

@nathand @celia Hi Nathan, good to see you again. Yes, it's just a side project at the moment, normally around a couple of hours at night after the kids are in bed.

Fosstodon

Fosstodon is an English speaking Mastodon instance that is open to anyone who is interested in technology; particularly free & open source software.