Today I found out that Google Docs infects HTML exports with spyware. There are no scripts, but the links in your document are replaced with invisible Google tracking redirects. I was using their software because a friend wanted me to work with him on a Google Doc. He is a pretty big fan of their software, but we were both absolutely shocked that they would go that far.
Google Docs exports automatically infected with tracking links:
txt - unaffected
html + AFFECTED
odt - unaffected
pdf - unaffected
epub + AFFECTED
rtf - unaffected
docx - unaffected
sample web html <a> tag:
<a class="c4" href="https://www.google.com/url?q=https://wikimediafoundation.org/&sa=D&source=editors&ust=1696089933805520&usg=AOvVaw2ypOvslXzoEGwdryv4bFyJ">https://wikimediafoundation.org/</a>
sample epub xhtml <a> tag:
<a class="c5" href="https://www.google.com/url?q=https://wikimediafoundation.org/&sa=D&source=editors&ust=1696087392161966&usg=AOvVaw1v4xpIFWD9GYkMFifXd1uo">https://wikimediafoundation.org/</a>
For those unfamiliar with HTML: the href value, everything between href=" and ">, is the real link, and the text between > and </a> is the display text.
This HTML feature is useful so that a link can display as something like "Read more", "Profile", "Wiki" etc., but in this case it is misused.
Google tracks people who are not using any of its products by adding hidden tracking links to exports without the author's knowledge or the end user's consent.
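To make the difference between the real link and the display text concrete, here is a minimal Python sketch that splits one of the sample hrefs above into the intended destination and the extra parameters riding along with it. The function name split_redirect is made up for illustration, and the check for www.google.com/url is based only on the two samples above, so other exports may look different.

```python
from urllib.parse import urlsplit, parse_qsl

def split_redirect(href):
    """Return (destination, extras) for a www.google.com/url link, else (href, {})."""
    parts = urlsplit(href)
    # Assumption: redirects always look like the two samples above.
    if parts.netloc != "www.google.com" or parts.path != "/url":
        return href, {}
    params = dict(parse_qsl(parts.query))
    # "q" carries the link the author actually typed; the rest is extra.
    destination = params.pop("q", href)
    return destination, params

sample = (
    "https://www.google.com/url?q=https://wikimediafoundation.org/"
    "&sa=D&source=editors&ust=1696089933805520"
    "&usg=AOvVaw2ypOvslXzoEGwdryv4bFyJ"
)
destination, extras = split_redirect(sample)
print(destination)     # the link shown to the reader
print(sorted(extras))  # names of the parameters added around it
```

A link that does not match the pattern comes back unchanged with an empty dict, so the function is safe to run over every href in a file.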
@lil5 Interesting find. Perhaps they only officially track the Google user who made the doc, so users unwittingly help Google collect stats on the author and their site. Or perhaps only those who are signed in to Google are tracked, and almost every person with a computer is usually signed into an account. But even so, this tactic shows that there is no trust, so why wouldn't they track everyone? It's not like they have to share the code that they say only tracks legally.
@Joe_0237 @lil5 without a logged in account, the google domain gets the same data any other site would have. If they're not setting a cookie, and the link is taking you where it is intended, I don't see what they're doing wrong.
I'm up for being educated on this topic, as it affects my field of work. Thanks
@adg @lil5 I have not investigated that very deeply, but I don't think that the fact that any company can track makes it moral.
I don't know whether they set a cookie or, if not, whether they use a fingerprinter, but beyond any doubt they have your IP address and user agent, which is probably enough to identify you unless you are on a VPN, at a coffee shop, or sharing a network with your apartment building.
But ...
@adg @lil5 ... even if they are not tracking the person who follows the link, they are surely at least tracking link traffic on someone else's website or ebook. They know the source page because of the Referer HTTP header, they know the destination, and they know the document the link belongs to from the ID numbers in the "ust" and "usg" URL parameters.
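For anyone who wants to poke at those parameters, here is a small Python sketch that pulls "ust" out of the sample web link from the first post and tries reading it as a Unix timestamp in microseconds. That reading is only a guess based on the size of the number. Nothing in the export says what "ust" or "usg" actually mean, so treat the result as a hint and not as documentation.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

href = (
    "https://www.google.com/url?q=https://wikimediafoundation.org/"
    "&sa=D&source=editors&ust=1696089933805520"
    "&usg=AOvVaw2ypOvslXzoEGwdryv4bFyJ"
)

params = parse_qs(urlsplit(href).query)
ust = int(params["ust"][0])

# Guess: microseconds since 1970-01-01 UTC. Integer division drops the
# sub-second part so no floating point rounding is involved.
as_time = datetime.fromtimestamp(ust // 1_000_000, tz=timezone.utc)
print(as_time.isoformat())
```

If the guess holds, the value lands on the same day as the post. The "usg" value is opaque and this sketch does not try to decode it.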
@adg @Joe_0237 @lil5 In this case it's sending the IP address of e.g. an ebook's reader to a third party. That's not something that person expects or agreed to.
Also, breaking the GDPR doesn't require that we have proof. We only need proof to convict somebody, and just like most other criminals, big data companies will try to hide their crimes.
(And Google got caught violating laws many times before, so I'm definitely not going to trust them...)
@adg @Joe_0237 @lil5 The GDPR isn't about cookies, it's about personally identifiable information. An IP address and time stamp is sufficiently personally identifiable. This is why privacy-conscious access logs mask IP addresses.
Note that the EU ePrivacy directive also exists and it covers cookies more explicitly but the GDPR is the baseline.
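As an illustration of what masking an IP address in an access log can look like, here is a minimal Python sketch using the standard ipaddress module. The function name mask_ip and the prefix lengths (keep the first 24 bits of IPv4, the first 48 bits of IPv6) are assumptions chosen for the example, not a legal standard. How much needs to be dropped is a question for whoever handles your compliance.

```python
import ipaddress

def mask_ip(addr: str) -> str:
    """Zero the host part of an address before it is written to a log."""
    ip = ipaddress.ip_address(addr)
    # Assumed prefix lengths: /24 for IPv4, /48 for IPv6.
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)

print(mask_ip("203.0.113.77"))           # 203.0.113.0
print(mask_ip("2001:db8:1234:5678::1"))  # 2001:db8:1234::
```

The addresses used here come from the ranges reserved for documentation, so they do not belong to anyone.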
@Joe_0237 For people who have too much stuff on Google Docs to migrate easily, it would be nice to distribute a one-liner script that strips the evil redirects from downloads. Maybe a maker of an unzip utility could even add an option to strip the redirects while unzipping the dirty HTML or Epub.
@Steve98052 it would be a little tricky in a single line. How about ten lines (probs more than 30 for someone as verbose as me) and the right to import an HTML or XML parser, a URL parser, and a module for dealing with zip archives.
(Both epub and web exports use a kind of HTML; web exports from Google are zipped, and an epub file is zipped XHTML plus more.)
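Here is a rough Python sketch of that idea, standard library only. To keep it short it swaps the HTML parser for a regular expression over href="..." attributes, which is cruder than real parsing. It assumes double-quoted hrefs, UTF-8 files, and redirects shaped like the samples in the first post. The function names are made up for illustration, and it has not been tested against real exports.

```python
import html
import re
import zipfile
from urllib.parse import urlsplit, parse_qs

# Assumption: hrefs are double-quoted, as in the samples in this thread.
HREF_RE = re.compile(r'href="([^"]*)"')

def unwrap(url: str) -> str:
    """Return the 'q' target of a www.google.com/url link, else the url itself."""
    parts = urlsplit(url)
    if parts.netloc == "www.google.com" and parts.path == "/url":
        target = parse_qs(parts.query).get("q")
        if target:
            return target[0]
    return url

def clean_markup(text: str) -> str:
    """Rewrite redirect hrefs in an HTML or XHTML string."""
    def fix(match):
        url = html.unescape(match.group(1))  # turn &amp; back into &
        clean = unwrap(url)
        if clean == url:
            return match.group(0)            # leave other links untouched
        return 'href="' + html.escape(clean, quote=True) + '"'
    return HREF_RE.sub(fix, text)

def clean_archive(src, dst) -> None:
    """Copy a zipped HTML or epub export, cleaning every (X)HTML member."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():          # keeps the original member order
            data = zin.read(info.filename)
            if info.filename.lower().endswith((".html", ".htm", ".xhtml")):
                data = clean_markup(data.decode("utf-8")).encode("utf-8")
            zout.writestr(info, data)        # reuses each member's compression
```

Usage would be something like clean_archive("export.epub", "export-clean.epub"), where both file names are placeholders. It writes a new file and leaves the download alone, so the two can be compared before trusting the result.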