Archive Team and the Fediverse

The popular Mastodon instance berries.space recently shut down, with all of its content taken offline, most likely permanently. It was a popular home for teenagers migrating away from Tumblr. At its peak, it had over 3,000 users.

When an instance on the Fediverse shuts down, its data is gone, unless the administrators make it accessible in some way. While other instances may have locally cached copies of some posts, for the most part, it’s gone.

This was not the case with berries.space.

Update: Added a former berries.space admin’s response.
Update 2: Added the Archive Team founder’s response.

A list of dead instances archived by the Archive Team. berries.space is “Saved!”

I won’t be providing links to any of the archived data (or pages that link to it), as I don’t want to spread it.

The Archive Team

The Archive Team is a “loose collective of rogue archivists, programmers, writers and loudmouths” who archive publicly accessible content on the internet. They are separate from archive.org.

The Archive Team has a vast library of archived content, including everything ever posted on Google Plus. This is both good and bad – it’s good because it means that potentially important information isn’t lost, but it’s very, very bad for privacy.

Let’s say that you make a post revealing some private information one day. An archival bot, such as archive.org’s crawler, visits the site you posted on and creates a publicly accessible archived copy of the page. Later on, you decide to delete the post. Even though the post will disappear from the website you posted it on, archive.org still has a copy of it.

Forever.

This is, quite obviously, a huge privacy concern. Thankfully, archive.org has an exclusion policy which you can make use of to remove your data, or to request that it isn’t added in the first place. The Archive Team does not provide a service like this.

When the Archive Team archived all publicly accessible content on berries.space, they didn’t notify any of the users, and they didn’t provide any means for users to opt out. A member of archive.org stated that they had also archived berries.space, and they too provided no means of opting out. This means that regardless of whether the users wanted their data to be archived (or even knew it was happening), their public post history was archived by the Archive Team.

There was no way for anybody involved with berries.space, not even the admins, to opt out of this public data collection. The Archive Team has a rant (CW: suicide mention) on their wiki about how they despise robots.txt, a text file that allows you to specify whether or not you’d like parts of your website to be archived or crawled by bots. The second paragraph states:

If you do not know what ROBOTS.TXT is and you run a site… excellent. If you do know what it is and you have one, delete it. Regardless, Archive Team will ignore it and we’ll delete your complaints, just like you should be deleting ROBOTS.TXT.

The Archive Team

The Archive Team is not violating any rules or laws by ignoring the robots.txt file – although they are violating laws in other ways – but regardless, they are stating that they will not honour the one method that a site administrator has of opting out of collection of public data. On top of that, they also say that they’ll delete any complaints you have about this behaviour. Will they archive your complaints before deleting them? :mlem:

They go on to make many other outlandish claims about how robots.txt is ineffective, stating with authority that anyone who uses it is stupid and wrong. One would think that a group of people dedicated to preserving data on the internet would also have some respect for the data’s owner, but apparently not.
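For anyone who hasn’t seen one, here is a rough sketch of what a robots.txt opt-out looks like and how a well-behaved crawler would check it, using Python’s standard `urllib.robotparser` module. The bot name `ExampleArchiveBot` and the `example.social` URLs are made up for illustration; they are not the Archive Team’s actual crawler or any real instance.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: one named bot is asked to stay away entirely,
# and every other bot is asked to skip the /private/ section.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    post = "https://example.social/@alice/1"
    print(is_allowed(ROBOTS_TXT, "ExampleArchiveBot", post))
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", post))
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.social/private/x"))
```

Note that this check only happens if the crawler chooses to run it. robots.txt is a request, not an access control, which is exactly why a crawler that announces it will ignore the file leaves administrators with nothing else to fall back on.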

Archive Team’s response

The day after this article was published, the founder of the Archive Team stated on Twitter that berries.space had been archived by a single member of the group, who had then gone on to speak as though it was a decision made by the entire team.

To quote the Archive Team:

If you don’t want people to have your data, don’t put it online.

The Archive Team

It’s entirely possible that this data has already been downloaded. The damage may already have been done. This is a definite step in the right direction, and will almost certainly mean that the data is gone forever (if removed from the Archive Team’s archives). But there’s still a chance it’s out there, and nothing has been done to prevent this from happening in future.

Even though this particular case may have been fixed in the end, this entire debacle should serve as a reminder that there are people who will archive your data – even if you’re a minor – without consent or even a way to avoid it, and will make it public. The Archive Team has gone back on their decision this time, but they still have entire archives of other social networks, such as Google Plus and Miiverse. While the onus shouldn’t be on you to make sure your public conversations and interactions aren’t monitored, it seems as though this is how things are going to be on the internet.

Making something public is not the same as consenting to having it stored for all eternity

If you don’t have any problems with your public posts being archived, there’s nothing wrong with that. But I doubt that all 3,000+ users on berries.space wanted their post history to be archived forever. If the Archive Team had simply asked users for permission first, this wouldn’t be an issue. But again, not only did they not ask, they didn’t provide a method to avoid the data collection.

One of the former berries.space admins has said that the Archive Team’s behaviour is “absolutely not acceptable”, and that the admin team will be taking action.

berries had A LOT of minors who didn’t give any kind of consent to have their posts archived.

@tutu@midnight.dance, former berries.space admin

I have deleted posts that reveal information that I’m not comfortable with before. I closed my Facebook account long ago, and would really rather it stayed that way. I made many embarrassing posts from my Facebook account (I was a teenager), and would hate for them to resurface. Just because I was comfortable with sharing something at one point doesn’t mean I’m comfortable with sharing it today. That doesn’t mean they’re gone forever – I downloaded an archive of my content before I closed my account. I have access to them, and if there was anything good in there (there isn’t), I would be able to provide it to the public at my own discretion.

Mastodon has a utility to export your entire post history. You can click a button in the settings menu and download every post you’ve ever made, along with its attached media. If people want their content archived, they can do it themselves, without invading the privacy of others.
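As a rough illustration of what you can do with your own export, here is a minimal sketch that lists your posts from the archive. I’m assuming the archive contains an `outbox.json` file in ActivityPub format, with your activities listed under `orderedItems`; check your own download, as the layout may differ between Mastodon versions. The function names and the sample data are made up for this example.

```python
import json


def load_outbox(path: str) -> dict:
    """Read an outbox JSON file (assumed name: outbox.json) from an export."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def list_posts(outbox: dict) -> list:
    """Return (published, content) pairs for posts the account authored.

    Assumes each authored post is a "Create" activity whose "object" is a
    dict; boosts ("Announce" activities) and anything else are skipped.
    """
    posts = []
    for activity in outbox.get("orderedItems", []):
        if activity.get("type") != "Create":
            continue
        obj = activity.get("object")
        if not isinstance(obj, dict):
            continue
        posts.append((obj.get("published", ""), obj.get("content", "")))
    return posts


# Hypothetical sample shaped like the assumed export format.
SAMPLE_OUTBOX = {
    "type": "OrderedCollection",
    "orderedItems": [
        {
            "type": "Create",
            "object": {
                "published": "2019-01-01T12:00:00Z",
                "content": "<p>Hello, fediverse!</p>",
            },
        },
        {"type": "Announce", "object": "https://example.social/@bob/2"},
    ],
}

if __name__ == "__main__":
    for published, content in list_posts(SAMPLE_OUTBOX):
        print(published, content)
```

The point is that the archive stays on your own machine, and you decide which posts (if any) get republished.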

There are many situations in which someone would not want the public to know that they’re trans. If a trans woman had previously posted somewhere using her deadname[1], this would not only out her as trans, but reveal her deadname to the public, which can be a major source of dysphoria. If the Archive Team’s “never post anything you don’t want to be known forever” advice is to be taken, then she would have to somehow know in advance that she wasn’t male, and know never to post anything using her real name until she realised that she was trans. This is, of course, ridiculous. This is a case where mass public data collection is dangerous to the target’s well-being.

Just because something’s publicly available doesn’t mean people want it harvested and stored forever by a dedicated group of data archivists. Posting your phone number on a billboard isn’t consenting to having your phone number posted online by an archival team. Neither is protesting in public, or having a conversation in public, or going skinny-dipping in public, or walking past a CCTV camera in public. Being in public does not mean you automatically consent to having your activities archived.

While data archival is important, it shouldn’t come at the expense of privacy. It can be difficult to make that decision sometimes, but if you’re part of a team of archivists, you should really consider some sort of code of ethics. Privacy is becoming increasingly difficult on the internet, and things like this make it worse. The idea that you should never do something in public because someone might be archiving your every move is absurd.

  1. A deadname is a name that a person (usually trans) no longer uses.
