@Mojeek I am in the process of adding Mojeek as the default search engine on Privacy Browser, but a user informed me that searches involving Cyrillic characters return a 403 - Forbidden error. Is that intentional?

@privacybrowser Hey Soren, we sent you an email also, but for anyone with keen eyes on this, here's an excerpt:

"At the moment we don't offer Cyrillic searches as our index does not currently contain crawled results which would provide anything useful. It takes time to build up an index and for the moment we are focussed on improving our results in English & some other (mainly Romance) languages. It's likely the Cyrillic characters return a 403, due to our bot-capturing software."

@Mojeek Do you have a timeline for when you expect to support Cyrillic searches?

@privacybrowser @Mojeek I reported this more than a year ago, still not fixed. Not having any matches in the index is understandable, but it is the misuse of the HTTP status code that annoys me.

@privacybrowser @older Agreed too. We are looking into how to address this. No easy fix; bots are a big challenge for us, in general, and for all search engines. A solution for some could involve using reCaptcha; but you may well understand that we and our users don't want us using that.

@Mojeek @older How many more system resources does it consume to return a page saying that no search results were found compared to returning a 403 - Forbidden error?

@privacybrowser @older That's not the issue. There may be search results. The issue is: is this a bot? And we are deluged with them.

@Mojeek @older You said previously that requests with Cyrillic characters are classified as bots (in my experience 100% of the time). You also said that you have no indexes for Cyrillic characters, so searches for them always return no results. So, why not just return a page saying there are no results instead of returning a 403 - Forbidden error? It has the same impact on your system, while being much more helpful to legitimate searches from real people.
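
To make the status code argument above concrete, here is a minimal sketch of the policy being asked for. It is purely illustrative and assumes nothing about how Mojeek's servers actually work; `choose_response` and `flagged_as_bot` are hypothetical names.

```python
from http import HTTPStatus


def choose_response(results: list[str], flagged_as_bot: bool) -> tuple[HTTPStatus, str]:
    """Pick a status code and body for a search request (hypothetical policy)."""
    if flagged_as_bot:
        # The server understood the request but refuses to serve it.
        return HTTPStatus.FORBIDDEN, "Forbidden"
    if not results:
        # A search that matches nothing was still processed successfully.
        return HTTPStatus.OK, "No results found."
    return HTTPStatus.OK, "\n".join(results)


# A human query with no index matches gets a normal page, not an error.
status, body = choose_response([], flagged_as_bot=False)
print(int(status), body)
```

Under this sketch, an empty index only ever produces a 200 with a "no results" page; a 403 is reserved for requests the server has decided not to serve.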

@privacybrowser @older I am sorry if we have not been clear, but we did not say that "requests with Cyrillic characters are classified as bots". Perhaps a re-read of the explanation in the email sent to you by our CEO 3 days ago would make things clearer.

@Mojeek @older The text of the email you sent me is published at stoutner.com/switching-from-st . In it, you say that it may be the problem, but that you tried several searches with Cyrillic characters that were not blocked. However, my experience is that all English queries produce a webpage, even if no results are found, and all Cyrillic queries produce a 403 - Forbidden error. Were you perhaps testing from an IP address that is whitelisted from your bot blocking tech?

@Mojeek @older Thank you for the link to your monetization plan. I found it to be well written and to address the concerns I had. I have updated my post accordingly.

stoutner.com/switching-from-st

@privacybrowser @older Thank you and glad you found it informative. We looked at the Cyrillic problem overnight and we may have a fix. Please try it out now and let us know.

@Mojeek @privacybrowser Seems to be fixed. Now I get an actual response instead of HTTP status 403.
Now, wake me up when it supports Ukrainian, I might consider switching ;)

@privacybrowser @older Many thanks for your input and persistence. We learned that Cyrillic characters have a specific URL encoding which we were not handling appropriately.

@Mojeek @privacybrowser That surprises me. There's nothing special about Cyrillic characters. They are encoded according to RFC 3986, just like any other non-ASCII characters.

@older @privacybrowser Indeed. But every character is prefixed with %D0 and that was a signal showing up in bot attacks. We might have been overzealous in our defences, or even messed up. As you saw and pointed out it was killing human queries. Anyway, apologies about that and thanks for highlighting the issue.
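
For readers following the encoding discussion, the sketch below shows what a Cyrillic query looks like once it is percent-encoded as UTF-8. It uses Python's standard `urllib.parse.quote`; the helper names `percent_encode` and `lead_bytes` are made up for this illustration and say nothing about Mojeek's own code. Note that basic Cyrillic letters encode to two bytes whose first byte is `D0` or `D1`, so `%D0` is very common in such queries but is not the only prefix.

```python
from urllib.parse import quote


def percent_encode(query: str) -> str:
    """Percent-encode a query string as UTF-8, leaving no reserved characters unescaped."""
    return quote(query, safe="")


def lead_bytes(query: str) -> list[str]:
    """Return the sorted, distinct first UTF-8 bytes (as hex) of the non-ASCII characters."""
    return sorted({f"{ch.encode('utf-8')[0]:02X}" for ch in query if ord(ch) > 127})


print(percent_encode("привет"))  # %D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82
print(lead_bytes("привет"))      # ['D0', 'D1']
print(percent_encode("café"))    # caf%C3%A9
```

The same mechanism applies to every non-ASCII character, as the `café` example shows; Cyrillic text simply produces long runs of `%D0` and `%D1` because its letters sit in one small Unicode range.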

@Mojeek A user pointed out that searching for `site:stoutner.com "privacy browser"` always produces a `403 - Forbidden` error, while searching for `"privacy browser" site:stoutner.com` does not.

@Mojeek The issue was originally reported at forum.f-droid.org/t/privacy-br The reporter experiences it intermittently using Tor, but I have experienced it consistently (at least so far) without using Tor or any other VPN.

@savely @privacybrowser Hey Savely, any chance of you sending some more information to mojeek.com/about/contact, specifically your IP, so we can check it out?
