Privacy Policy for Bluesky Measurements

Who are you?

We are a team of researchers from multiple institutions. As a first point of contact, please reach out to the Communication Networks Lab at TU Darmstadt, Germany.

What data are you collecting?

As part of our ongoing research, we collect data from the Bluesky social network. This data comes from several components of the network:

Firehose Data

We subscribe to the official Bluesky Firehose, i.e., the main relay that aggregates and re-broadcasts the repository commits of all federated PDSes (Personal Data Servers). We extract and parse these commits, which then take the following form:

did, timestamp, commit

where each commit contains a list of operations:

operation_type, collection, rkey, record

These repository commits are produced by a user's PDS in response to the user's actions on Bluesky. They are part of the normal operation of the network and contain users' public content.
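To illustrate the structure above, here is a minimal Python sketch of how a parsed commit could be represented. It is not our actual pipeline. It assumes that a firehose message has already been decoded into a dictionary with the keys "repo", "time" and "ops", where each operation carries an "action", a "path" of the form "<collection>/<rkey>" and, for creations and updates, the "record" itself. The names Operation, Commit and parse_commit are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operation:
    operation_type: str        # e.g. "create", "update" or "delete"
    collection: str            # collection NSID, e.g. "app.bsky.feed.post"
    rkey: str                  # record key within the collection
    record: Optional[dict]     # record content; None for deletions


@dataclass
class Commit:
    did: str
    timestamp: str
    operations: list[Operation]


def parse_commit(message: dict) -> Commit:
    """Flatten an already-decoded commit message (assumed shape) into a Commit."""
    operations = []
    for op in message["ops"]:
        # "path" is assumed to be "<collection>/<rkey>"
        collection, rkey = op["path"].split("/", 1)
        operations.append(Operation(op["action"], collection, rkey, op.get("record")))
    return Commit(message["repo"], message["time"], operations)


example = {
    "repo": "did:plc:example",
    "time": "2024-01-01T00:00:00Z",
    "ops": [{"action": "create", "path": "app.bsky.feed.post/3kexample",
             "record": {"text": "Hello"}}],
}
commit = parse_commit(example)
print(commit.did, commit.operations[0].collection, commit.operations[0].rkey)
```

Each parsed commit thus maps to one "did, timestamp, commit" row, and each of its operations to one "operation_type, collection, rkey, record" row.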

PLC Directory Mirror

We replicate the official PLC directory using its export functionality, which provides an ordered list of operations. We replay these operations to reconstruct the latest state of each DID. As a result, our mirror stores the current and historical DID documents for every registered DID.

This data is part of the decentralized identity infrastructure. It has to be public for content on Bluesky to be verifiable, as the DID documents contain the public key(s) of a user. We also operate our mirror as a service to the community: the (centralized) plc.directory instance imposes necessary rate limits, whereas our mirror currently does not, which makes this vital data more easily available to researchers and the community.
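The replay step can be sketched roughly as follows. This is a simplified illustration, not our implementation: it assumes each exported line is a JSON object with the fields "did", "createdAt", "nullified" and "operation", skips nullified operations, and treats the latest remaining operation of a DID as its current state. It neither verifies signatures nor derives full DID documents, and the operation payloads in the example are simplified placeholders.

```python
import json
from collections import defaultdict


def replay(export_lines):
    """Replay an ordered PLC export into current state and history per DID."""
    history = defaultdict(list)  # did -> [(createdAt, operation), ...] in export order
    for line in export_lines:
        entry = json.loads(line)
        if entry.get("nullified"):
            continue  # ignore operations that were later invalidated
        history[entry["did"]].append((entry["createdAt"], entry["operation"]))
    # The last surviving operation of each DID is taken as its current state
    current = {did: ops[-1][1] for did, ops in history.items()}
    return current, dict(history)


export = [
    json.dumps({"did": "did:plc:a", "createdAt": "2023-01-01", "nullified": False,
                "operation": {"handle": "alice.example"}}),
    json.dumps({"did": "did:plc:a", "createdAt": "2023-06-01", "nullified": False,
                "operation": {"handle": "alice.example.org"}}),
    json.dumps({"did": "did:plc:b", "createdAt": "2023-02-01", "nullified": True,
                "operation": {"handle": "bob.example"}}),
]
current, history = replay(export)
print(current)
```

Keeping the full per-DID list rather than only the last entry is what allows historical DID documents to be retained alongside the current ones.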

Full-Network Mirror

We replicate the data of the entire network in a single large database. To do so, we subscribe to all PDSes and apply their event streams to the database. Its contents essentially have the format

DID, rkey, <record JSON>

Due to the design of the AT Protocol, this data is public. Our mirror serves the same function as the Bluesky AppView, but without a public frontend.

From the database we export snapshots for research purposes. We correctly implement removals, i.e., we do not process deleted data for our research.
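How removals interact with snapshots can be sketched with a small in-memory stand-in for the database. The class Mirror and its methods are hypothetical and only illustrate the idea. Unlike the format above, the sketch also keys records by collection, since record keys are only unique within a collection of a repository.

```python
from typing import Any, Optional

Key = tuple[str, str, str]  # (did, collection, rkey)


class Mirror:
    """In-memory stand-in for the mirror database (hypothetical)."""

    def __init__(self) -> None:
        self.records: dict[Key, Any] = {}

    def apply(self, did: str, action: str, collection: str, rkey: str,
              record: Optional[dict] = None) -> None:
        """Apply one event from a PDS event stream."""
        key = (did, collection, rkey)
        if action in ("create", "update"):
            self.records[key] = record
        elif action == "delete":
            # Removal: the record is dropped, so later snapshots no longer contain it
            self.records.pop(key, None)
        else:
            raise ValueError(f"unknown action: {action}")

    def snapshot(self) -> list:
        """Export the current state as (did, collection, rkey, record) rows."""
        return [(*key, record) for key, record in sorted(self.records.items())]


mirror = Mirror()
mirror.apply("did:plc:a", "create", "app.bsky.feed.post", "1", {"text": "first"})
mirror.apply("did:plc:a", "create", "app.bsky.feed.post", "2", {"text": "second"})
mirror.apply("did:plc:a", "delete", "app.bsky.feed.post", "1")
print(mirror.snapshot())
```

Because snapshots are exported from the current state, a record deleted before the export does not appear in that snapshot.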

Labeler Logs

We subscribe to all publicly listed Labeler accounts and log their labels.

This data is necessarily public for Bluesky's moderation system to work. We log and archive the operations taken by these Labelers for research purposes.

Why are you doing this?

Our overarching goal is to derive general, non-personalized insights and statistics about Bluesky. We plan to make the results of our research public but will not make our raw collected data public at any point. However, we may release anonymized versions of some datasets, especially if we believe them to be useful to the wider community.

We also use the collected data for testing during the development of new methods and their implementations.

We will use the collected data only to further the goals outlined above, that is, only for purposes explicitly stated in this document and for purposes that are permissible under Art. 6 (4) GDPR.

We operate within the bounds of European data protection law, specifically Art. 5 (1) (e) GDPR.

For how long are you going to keep this data?

At present, we operate an indefinite research archive for the data outlined above. We believe it is vital for future researchers that these datasets are collected and preserved. We continually review our data storage practices. If we conclude that some data fields or entire datasets are no longer needed for our intended purpose, we will delete them accordingly.

Who are you sharing this data with?

On request, we share parts of this data with researchers working on thematically similar topics, but only once we are convinced that our data helps them in their research and that the goals of their research overlap with our own purposes for collecting and storing the shared data. In each individual case, we determine the minimal subset of data that needs to be shared. We also inform the receiving researchers of the sensitive nature of this data and ask them to handle it responsibly and in line with our own privacy policy.

I have questions. Can I contact you?

Of course! You can contact us via email. You can also reach us by post at

FG KOM 
TU Darmstadt 
Rundeturmstraße 10
64283 Darmstadt
Germany

What are my rights?

You have the right to be informed about the data we collect about you. Where the law grants you these rights, you can also have the data we collect about you corrected, deleted, or restricted. In those cases, please contact us.