· 10 min read

This post lays out the current AT Protocol (atproto) development plan, through to a "version one" release. This document is written for developers already familiar with atproto concepts and terminology. The scope here is features of the underlying protocol itself, not any application or Lexicons on top of the protocol. In particular, this post doesn't describe any product features specific to the Bluesky microblogging application (app.bsky.* Lexicons).

At a high level:

  • We are pushing towards federation on the production network early next year (2024), if development continues as planned.
  • A series of related infrastructure and operational changes to Bluesky services are underway to ensure we can handle a large influx of content from federated instances in a resource and cost-efficient way.
  • We plan to submit future governance and formalization of the core protocol to an independent standards body. These are consensus processes that rely on the interest and support of a community of volunteers. We have already done some initial outreach, and we expect to start conversations in earnest around the same time as opening up federation.

Feel free to share your thoughts via GitHub Discussions here.

Getting to Federation

The AT Protocol is designed as a federated social web protocol. While we have an open federated sandbox network for developers to experiment with, the main Bluesky services are currently still not federated. Our current development priority is to resolve any final protocol issues or decisions that would impede federation with independent PDS instances. Some of these points are listed below.

Account Migration: One of the differentiating features of atproto is seamless migration of both identity and public content from one PDS instance to another. The foundations for this feature were set in place from the start, but there are some details and behaviors to decide on and document.

Auth refactor: We want both to improve third-party auth flows (eg, OAuth2) and to support verifiable inter-service requests (eg, with UCANs). These involve both authentication (“who is this”) and authorization (“what is allowed”). This work will hopefully be a matter of integrating and adapting existing standards. There is a chance that this will be ready in time for federation, but it is not a hard requirement.

Repo Event Stream (Firehose) Iteration: A healthy ecosystem of projects has subscribed to the full stream of repo commits that the firehose provides. We have a few improvements and options in mind to help with efficiency and developer experience, without changing the underlying semantics and power dynamic (aka, the ability to replicate all public content). These include "epochs" of sequence numbers; sharding; the option to strip MST nodes ("proof" chain) to reduce bandwidth; and changes to account lifecycle events.

Third-party Labeling: This is the protocol-level ability to distribute and subscribe to labels from many sources, including cryptographic signatures for verification. The lexicons are mostly designed, but are not included in the AppView reference implementation, and documentation is needed.

Account Status Propagation: Actions like account takedowns and (self) deletions don't currently get broadcast to other parties in the network, though they can be publicly observed (i.e., content becomes inaccessible). Some form of explicit signal would likely be helpful. Additionally, we’d like to allow temporary account deactivation as an option, instead of only account deletion. Temporary deactivation will have behavior similar to an account takedown, with effects propagating through the network.

Refactor Moderation Actions: The com.atproto.admin.* Lexicons have a concept of "actions" which "resolve" specific "reports," and this model has proven rigid to work with. A refactor will introduce more flexible "events" (including both reports and private flags or annotations) which update "subjects." The moderation report submission process should not be impacted, but other admin Lexicons may have breaking changes.

Legacy Record Cleanup: We have a short remaining pre-federation window during which we could, in theory, rewrite any records to remove legacy or invalid data. Specifically, we could try to remove all instances of the legacy "blob" schema, and fix many invalid datetimes. This would break a large number of strong references between records, in a way that is difficult to clean up, and would be an aggressive intervention to existing repository content, so there is a good chance this won't actually happen and these crufty records will be in the network forever.

Operational Changes for Federation

A few infrastructure changes are planned for the coming weeks and months that will impact other folks in the ecosystem. We will try to communicate these ahead of time, but we are balancing disruption to the existing developer and service ecosystem against the goal of federating the network quickly.

Multiple PDS instances: The Bluesky PDS (bsky.social) is currently a monolithic PostgreSQL database with over a million hosted repositories. We will be splitting accounts across multiple instances, using the protocol itself to help with scaling. Depending on details, this will probably not be very visible to end users or client developers, but will impact firehose consumers and some developers.

BGS Firehose: Most folks currently subscribe to the firehose coming from the bsky.social PDS directly. We have had a BGS instance running in production for some time, but it has not been promoted and has had occasional downtime. With the move to multiple PDSes, it will still technically be possible to subscribe to multiple individual PDS feeds, but most folks will want to shift to the unified BGS feed instead. This will involve a change in hostname, and the sequence numbers will be different.

Possible bulk did:plc updates: We are not yet certain when, but there is a good chance that DID documents will need to be updated for all (or a large fraction) of bsky.social accounts. This identity churn will be in-spec, but could cause disruptions to other services if they are not prepared.

Search updates: The search.bsky.social service, which was never a formal part of the Bluesky API (Lexicons), will be deprecated soon, with both actor (profile) and post search possible through app.bsky.* Lexicons, served by the AppView (and proxied by the PDS).

Spam mitigation systems: Anti-spam efforts will be a crucial part of opening federation. This is not a protocol-level change per se, but we expect to deploy automated systems to combat spam by labeling posts and accounts at scale.

Network Growth Control: We don't have a specific proposal for feedback from the ecosystem yet, but expect to have some form of resource use limits in place as we open federation to prevent the social graph, firehose consumers, and human and technical systems from being overwhelmed.

New Application Development

Atproto has had a clear system for third-party application development and application-layer protocol extension built in from the beginning: the Lexicon schema system. The current PDS reference implementation restricts record creation for unknown schemas, even when the "skip validation" query parameter is set, but we expect to relax that constraint in coming months.

There is a lot of supporting framework and documentation needed to make atproto a developer-friendly foundation for application development, and a few protocol pieces still need to be worked out:

Lexicon schema resolution: There should be a clear automated method for resolving new and unknown NSIDs (schema names) to a Lexicon JSON object over the web.

Clarify validation failure behaviors: We need to specify, in detail and with test cases, what each piece of infrastructure is expected to do when records fail to validate against a Lexicon schema.

Lexicon schema evolution and extension: The backwards-compatibility rules for changes to application Lexicon schemas are mostly in place, but they’re not well documented. There is also wiggle room for third parties to inject extension fields into third-party records, and the norms and best practices for this type of extension need documentation.

Further Down the Road

It is worth mentioning a few parts of the protocol that we do not expect to work on in the near future. We think these are pretty important (which is why we are mentioning them!), but our current priority is federation.

Private content and messaging (DMs): We intend to eventually incorporate group-private E2EE (end-to-end encryption) content in the protocol, as well as a robust E2EE messaging solution. These are important and rightfully expected features. We expect this to be a major additional component of atproto, not a simple bolt-on to the current public repository data structure, even if we adopt an existing standard like MLS, Matrix, or the Signal protocol. As a result, this will be a large body of integration work, and will not be a focus until after federation.

Record versioning: The low-level protocol allows updates to existing records. It should be possible to store "old" versions of records in the repo, allowing (optional) transparency into history and edits. This may involve additions to repository paths, AT URIs, and core repo read and update endpoints.

Floating point numbers: The data model currently only supports integers, not floats. The concerns around floats in content-addressed systems are well documented, and there are alternatives and work-arounds (like string encoding). But if an elegant solution presents itself, it would be nice to support floats as a first-class data type.

Static CAR file repository hosting: For many use cases, it would be convenient if atproto repositories could be served as static files (probably CAR files), instead of requiring a full-featured PDS instance.

Protocol Governance

Development of atproto to date has been driven by a single company, Bluesky PBC. Once the network opens to federation, protocol changes and improvements will still be necessary, but will impact multiple organizations, communities, and individuals, each with separate priorities and development interests. If the protocol is successful, there certainly will be disagreements and competitive tensions at play. Having a single company controlling the protocol will not work long-term.

The plan is to bring development and governance of the protocol itself to an established standards body around the time the network opens to federation. Our current hope is to bring this work to the IETF, likely as a new working group, which would probably be a multi-year process. If the IETF does not work out as a home, we will try again with other bodies. While existing work can be proposed exactly "as-is", it is common for some evolution and breaking changes to come out of the standardization process.

Which parts of the protocol would be in-scope for independent standardization? Like many network protocols, atproto is abstracted into multiple layers, with distinctions between "application-specific" bits and the core protocol. The identity, auth, data model, repositories, streams, and Lexicon schema language are all part of the core protocol, and in-scope for standardization. Some API endpoints under the com.atproto.* namespace are also relatively core, and that namespace specifically could fall under independent governance.

Other application-specific APIs and namespaces, including the app.bsky.* microblogging application, would not be in-scope for standardization as part of the core protocol. The authority for these APIs and Lexicons is encoded in the name itself, and Bluesky PBC intends to retain control over future development of the bsky.app application. The API schemas (Lexicons) will be open, and the expectations and rules for third-party interoperability and extensions for any Lexicon will be part of the core protocol specification.

· 3 min read

We have a number of protocol and infrastructure changes rolling out in the next three months, and want to keep everybody in the loop.

This update was also emailed to the developer mailing list, which you can subscribe to here.

TL;DR

  • As of this week, the Bluesky AppView instance now consumes from a Bluesky BGS, instead of directly from the PDS. Devs can access the current streaming API at https://bsky.network/xrpc/com.atproto.sync.subscribeRepos or for WebSocket directly, wss://bsky.network/xrpc/com.atproto.sync.subscribeRepos
    Your existing cursor for bsky.social will not be in sync with bsky.network, so check the live stream first to grab a recent seq before connecting!
  • We are updating the DID document public key syntax to “Multikey” format next week on the main network PLC directory (plc.directory). This change is already live on the sandbox PLC directory.

How will this affect me?

  • For today, if you're consuming the firehose, grab a new cursor from bsky.network and restart your firehose consumer pointed at bsky.network.
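If you consume the firehose with a custom script, the switch can be as small as pointing your WebSocket client at the new host and dropping the old cursor. Here is a minimal sketch, assuming the third-party websockets package; decoding the binary DAG-CBOR frames is out of scope here:

import asyncio

import websockets  # third-party package: pip install websockets

FIREHOSE = "wss://bsky.network/xrpc/com.atproto.sync.subscribeRepos"

async def consume(cursor=None):
    # Omit the cursor to start from the live stream; persist the seq numbers
    # you observe and pass one back here to resume after a restart.
    url = f"{FIREHOSE}?cursor={cursor}" if cursor is not None else FIREHOSE
    async with websockets.connect(url) as ws:
        while True:
            frame = await ws.recv()  # binary DAG-CBOR frame
            print(f"received {len(frame)} bytes")  # decode with a CBOR library

asyncio.run(consume())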

Bluesky BGS

The Bluesky services themselves are moving to a federated deployment, with multiple Bluesky (the company) PDS instances aggregated by a BGS, and the AppView downstream of that. As of yesterday, the Bluesky AppView instance (api.bsky.app) consumes from a Bluesky PBC BGS (bsky.network), which consumes from the Bluesky PDS (bsky.social). Until now, the AppView consumed directly from the PDS.

How close are we to federation?

Technically, the main network BGS could start consuming from independent PDS instances today, the same as the sandbox BGS does. We have configured it not to do so until we finish implementing some more details, and do our own round of security hardening. If you want to bang on the BGS implementation (written in Go, code in the indigo GitHub repository), please do so in the sandbox environment, not the main network.

This change impacts devs in two ways:

  • In the next couple weeks, new Bluesky (company) PDS instances will appear in the main network. Our plan is to abstract this away for most client developers, so they can optionally continue to connect to bsky.social as a virtual PDS. But the actual PDS hostnames will be distinct and will show up in DID documents.
  • Firehose consumers (feed generators, mirrors, metrics dashboards, etc) will need to switch over and consume from the BGS instead of the PDS directly. If they do not, they will miss content from the new (Bluesky) PDS instances.

The firehose subscription endpoint, which works as of today, is https://bsky.network/xrpc/com.atproto.sync.subscribeRepos (or wss:// for WebSocket directly). Note that this endpoint has different sequence numbers. When switching over, we recommend folks consume from both the BGS and PDS for a period to ensure no events are lost, or to scroll back the BGS cursor to ensure there is reasonable overlap in streams.

We encourage developers and operators to switch to the BGS firehose sooner rather than later.

DID Document Formatting Changes

We also want to remind folks that we are planning to update the DID document public key syntax to “Multikey” format next week on the main network PLC directory (plc.directory). These changes are described here, with example documents for testing, and are live now on the sandbox PLC directory.

· 5 min read

To get future blog posts directly in your email, you can now subscribe to Bluesky’s Developer Mailing List here.

Adding Rate Limits

Now that we have a better sense of user activity on the network, we’re adding some application rate limits. This helps us keep the network secure — for example, by limiting the number of requests a user or bot can make in a given time period, it prevents bad actors from brute-forcing certain requests and helps us limit spammy behavior.

We’re adding a rate limit for the number of write actions (creates, updates, and deletes) per DID. These numbers shouldn’t affect typical Bluesky users, and won’t affect the majority of developers either, but they will affect prolific bots, such as the ones that follow every user or like every post on the network. The limit is 5,000 points per hour and 35,000 points per day, where:

Action Type | Value
------------|----------
CREATE      | 3 points
UPDATE      | 2 points
DELETE      | 1 point

To reiterate, these limits should be high enough to affect no human users, but low enough to constrain abusive or spammy bots. We decided to release this new rate limit immediately, instead of giving developers advance notice, in order to secure the network from abusive behavior as soon as possible, especially since bad actors might take this blog post as an open invite!

Per this system, an account may create at most 1,666 records per hour and 11,666 records per day. That means an account can like up to 1,666 records in one hour with no problem. We took the most active human users on the network into account when we set this threshold (you surpassed our expectations!).
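For reference, a minimal sketch of the points math above, using the limits and costs from this post:

# Rate limit points accounting, using the values from the table above
POINTS = {"CREATE": 3, "UPDATE": 2, "DELETE": 1}
HOURLY_LIMIT = 5_000
DAILY_LIMIT = 35_000

def max_actions(action_type: str) -> tuple:
    # How many actions of a single type fit in each window
    cost = POINTS[action_type]
    return (HOURLY_LIMIT // cost, DAILY_LIMIT // cost)

print(max_actions("CREATE"))  # (1666, 11666)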

In case you missed it, in August, we added some other rate limits as well.

  • Global limit (aggregated across all routes)
    • Rate limited by IP
    • 3000/5 min
  • updateHandle
    • Rate limited by DID
    • 10/5 min
    • 50/day
  • createAccount
    • Rate limited by IP
    • 100/5 min
  • createSession
    • Rate limited by handle
    • 30/5 min
    • 300/day
  • deleteAccount
    • Rate limited by IP
    • 50/5 min
  • resetPassword
    • Rate limited by IP
    • 50/5 min

We’ll also return rate limit headers on each response so developers can dynamically adapt to these limits.
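For example, a client might back off whenever its remaining quota hits zero. A hedged sketch: the header names below follow the common RateLimit-* convention and the reset value is assumed to be a unix timestamp, so confirm both against a live response before relying on them:

import time

import requests

def wait_if_throttled(resp: requests.Response) -> None:
    # Assumed header names: "RateLimit-Remaining" and "RateLimit-Reset"
    remaining = resp.headers.get("RateLimit-Remaining")
    reset = resp.headers.get("RateLimit-Reset")
    if remaining is not None and int(remaining) == 0 and reset is not None:
        # Sleep until the rate limit window resets
        time.sleep(max(0, int(reset) - time.time()))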

In a future update (in about a week), we’re also lowering the applyWrites limit from 200 to 10. This function applies a batch transaction of creates, updates, and deletes. This is part of the PDS distribution upgrade to v3 (read more below) — now that repos are ahistorical, we no longer need a higher limit to account for batch writes. applyWrites is used for transactional writes, and logic that requires more than 10 transactional records is rare.

PDS Distribution v3

We’re rolling out v3 of the PDS distribution. This shouldn’t be a breaking change, though we will be wiping the PLC sandbox. PDSs in parallel networks should still continue to operate with the new distribution.

Reminder: The PDS distribution auto-updates via the Watchtower companion Docker container, unless you specifically disabled that option. We’re adding the admin upgradeRepoVersion endpoint to the upgraded PDS distribution, so PDS admins can also upgrade their repos by hand.

Handle Invalidations on App View

Last month, we began proxying requests to the App View. In our federation architecture, the App View is the piece of the stack that gives you all your views of data, such as profiles and threads. Initially, we started out by serving all of these requests from our bsky.social PDS, but proxying these to the App View is one way of scaling our infrastructure to handle many more users on the network. (Read our federation architecture overview blog post for more information.)

For some users, this caused an invalid handle error. If you have an invalid handle, the user-facing UI will display this instead of your handle:

Screenshot of a profile with an invalid handle

You can use our debugging tool to investigate this: https://bsky-debug.app/handle. Just type your handle in. If it shows no error, please try updating your handle to the same value you currently have; this should resolve the issue.

If the debugging page shows an error for your handle, follow this guide to make sure you set up your handle properly.

If that still isn’t working for you, file a support ticket through the app (“Help” button in the left menu on mobile or right side on desktop) and a Bluesky team member will assist you.

Subscribe for Developer Updates

You can subscribe to Bluesky’s Developer Mailing List here to receive future updates in your email. If you received your invite code from the developer waitlist, you’re already subscribed. Each email will have the option to unsubscribe.

We’ll continue to publish updates to our technical blog as well as on the app from @atproto.com.

· 4 min read

We’re excited to announce that we’re rolling out a new version of atproto repositories that removes history from the canonical structure of repositories, and replaces it with a logical clock. We’ll start rolling out this update next week (August 28, 2023).

For most developers with projects subscribed to the firehose, such as feed generators, this change shouldn't affect you. It will only affect you if you're doing commit-aware repo sync (a good rule of thumb is whether you've ever passed earliest or latest to the com.atproto.sync.getRepo method) or are explicitly checking the repo version when processing commits.

Removing Repository History

Repositories on the AT Protocol are like Git repositories, but for structured records. Just like Git, each commit to an atproto repository currently includes a pointer to the previous commit. However, this approach has caused a couple of pain points:

  • Record deletions are difficult to process. If a user deletes a record, that commit needs to be erased from their repository to match their intent.
  • Increased storage cost. Maintaining repo history can cause anywhere from a 5x to 10x increase in repo size.

We attempted to resolve both of these in the current model through rebases (discrete moments when the history of a repository is deleted/mutated, like in Git). However, this is a tricky and sensitive operation that is expensive to conduct and complex to communicate across the network.

Using a Logical Clock for Repositories

To address the above issues, we’re replacing the prev pointer in commits with a logical clock. We originally published our intention to do so a few weeks ago. These are the changes we’re making to the way we handle repository history:

  • Incrementing the repo version to 3
  • Making the prev field on repo commits optional
  • Adding a new required rev (revision) field which is a logical clock
  • Removing or adjusting commit-aware repo sync mechanisms

Note: If you explicitly verify the version of a repo commit or do strict type checking on repo commits (which you shouldn’t — the spec allows unspecified fields!), you will need to make that check inclusive of version 3.

To facilitate backwards compatibility with software that is still running repo v2, we will continue setting the prev field on commits in the interim.

Even though we are setting the prev field, this can be considered a “hint” and the history is no longer considered a canonical part of the repository.


Repository Revisions

The new sync semantics for the repository rely on a logical clock included in each signed commit.

This “revision” takes the form of a TID and must be monotonically increasing.

The included revision serves a few functions:

Ordering

The clock provides a simple ordering mechanism for encountered repos or commits. If a consumer encounters the same repo from two different sources, each with a valid signature and structure, the revision gives a simple mechanism to determine which is the most recent repository.
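Because TIDs are fixed-length strings over a base32-sortable alphabet, this comparison can be a plain string comparison. A small illustration (the rev values below are hypothetical):

# Later TIDs sort lexicographically after earlier ones
rev_a = "3k4duaz5vfs2b"
rev_b = "3k4duaz5vfs2m"
newest = max(rev_a, rev_b)  # plain string comparison gives the newer revision
print(newest)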

Sync

When syncing a repository, revisions give a series of signposts that allow you to request everything from a given repo since a previously seen version. Because revisions are ordered and monotonically increasing, the provider does not need the exact revision that the consumer is asking for (as it would with a commit hash); rather, it can provide all repo contents from the latest version of the repo it remembers that precedes the requested revision.

The PDS, for instance, will track the revision at which each repo block or record was introduced into a repository. If a consumer asks for every block or record since a given revision, the PDS has a simple mechanism for providing that information, without needing a complicated sync algorithm.

Stale Reads

Finally, a logical clock on the repo gives us a mechanism through which we can detect stale reads. (We actually already snuck this in with an optional revision field on v2 repos!)

Repo revisions may be returned in response headers to most requests. A client will know their own repo’s current revision and can compare that with the upstream service’s revision.

We use this today on the PDS to paper over some read-after-write concerns that are inherent in eventually consistent architectures. Some clients may use these headers to alert their users that their PDS is “out of sync” with other services in the network (for instance an AppView).
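A client-side check might look like the following sketch; the header name here is an assumption based on the reference implementation, so verify it against actual responses before relying on it:

import requests

def is_stale(resp: requests.Response, known_rev: str) -> bool:
    # Assumed header name: "Atproto-Repo-Rev", the upstream service's view of
    # this repo's revision; revisions (TIDs) compare as plain strings
    upstream_rev = resp.headers.get("Atproto-Repo-Rev")
    return upstream_rev is not None and upstream_rev < known_rev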

Available sync methods


If you have questions about these changes, join us on GitHub Discussions here.

· 11 min read

This blog post may become outdated as new features are added to atproto and the Bluesky application schemas.


First, you'll need a Bluesky account. We'll create a session with HTTPie (brew install httpie).

http post https://bsky.social/xrpc/com.atproto.server.createSession \
identifier="$BLUESKY_HANDLE" \
password="$BLUESKY_APP_PASSWORD"

Now you can create a post by sending an authenticated POST request to the createRecord endpoint. Use the accessJwt value from the session response above as $AUTH_TOKEN.

http post https://bsky.social/xrpc/com.atproto.repo.createRecord \
Authorization:"Bearer $AUTH_TOKEN" \
repo="$BLUESKY_HANDLE" \
collection=app.bsky.feed.post \
record:="{\"text\": \"Hello world! I posted this via the API.\", \"createdAt\": \"`date -u +"%Y-%m-%dT%H:%M:%SZ"`\"}"

Posts can get a lot more complicated with replies, mentions, embedding images, and more. This guide will walk you through how to create these more complex posts in Python, but there are many API clients and SDKs for other programming languages and Bluesky PBC publishes atproto code in TypeScript and Go as well.

Skip the steps below and get the full script here. It was tested with Python 3.11, with the requests and bs4 (BeautifulSoup) packages installed.


Authentication

Posting on Bluesky requires account authentication. Have your Bluesky account handle and App Password handy.

import requests

BLUESKY_HANDLE = "example.bsky.social"
BLUESKY_APP_PASSWORD = "123-456-789"

resp = requests.post(
    "https://bsky.social/xrpc/com.atproto.server.createSession",
    json={"identifier": BLUESKY_HANDLE, "password": BLUESKY_APP_PASSWORD},
)
resp.raise_for_status()
session = resp.json()
print(session["accessJwt"])

The com.atproto.server.createSession API endpoint returns a session object containing two API tokens: an access token (accessJwt) which is used to authenticate requests but expires after a few minutes, and a refresh token (refreshJwt) which lasts longer and is used only to update the session with a new access token. Since we're just publishing a single post, we can get away with a single session and not bother with refreshing.
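If you do need a longer-lived session, the refresh flow is a separate call. A minimal sketch, continuing from the snippet above; the com.atproto.server.refreshSession endpoint takes the refresh token (not the access token) as the bearer token:

# Exchange the refresh token for a new session (fresh accessJwt + refreshJwt)
resp = requests.post(
    "https://bsky.social/xrpc/com.atproto.server.refreshSession",
    headers={"Authorization": "Bearer " + session["refreshJwt"]},
)
resp.raise_for_status()
session = resp.json()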

Post Record Structure

Here is what a basic post record should look like, as a JSON object:

{
  "$type": "app.bsky.feed.post",
  "text": "Hello World!",
  "createdAt": "2023-08-07T05:31:12.156888Z"
}

Bluesky posts are repository records with the Lexicon type app.bsky.feed.post — this just defines the schema for what a post looks like.

Each post requires these fields: text and createdAt (a timestamp).

The script below will create a simple post with just a text field and a timestamp, using the datetime module from Python's standard library.

import json
from datetime import datetime, timezone

# Fetch the current time
# Using a trailing "Z" is preferred over the "+00:00" format
now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")

# Required fields that each post must include
post = {
    "$type": "app.bsky.feed.post",
    "text": "Hello World!",
    "createdAt": now,
}

resp = requests.post(
    "https://bsky.social/xrpc/com.atproto.repo.createRecord",
    headers={"Authorization": "Bearer " + session["accessJwt"]},
    json={
        "repo": session["did"],
        "collection": "app.bsky.feed.post",
        "record": post,
    },
)
print(json.dumps(resp.json(), indent=2))
resp.raise_for_status()

The full repository path (including the auto-generated rkey) will be returned as a response to the createRecord request. It looks like:

{
  "uri": "at://did:plc:u5cwb2mwiv2bfq53cjufe6yn/app.bsky.feed.post/3k4duaz5vfs2b",
  "cid": "bafyreibjifzpqj6o6wcq3hejh7y4z4z2vmiklkvykc57tw3pcbx3kxifpm"
}

Setting the Post's Language

Setting the post's language helps custom feeds or other services filter and parse posts.

This snippet sets the text and langs value of a post to be Thai and English.

# an example with Thai and English (US) languages
post["text"] = "สวัสดีชาวโลก!\nHello World!"
post["langs"] = ["th", "en-US"]

The resulting post record object looks like:

{
  "$type": "app.bsky.feed.post",
  "text": "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35\u0e0a\u0e32\u0e27\u0e42\u0e25\u0e01!\nHello World!",
  "createdAt": "2023-08-07T05:44:04.395087Z",
  "langs": [ "th", "en-US" ]
}

The langs field indicates the post language, which can be an array of strings in BCP-47 format.

You can include multiple values in the array if there are multiple languages present in the post. The Bluesky Social client auto-detects the languages in each post and sets them as the default langs value, but a user can override the configuration on a per-post basis.

Mentions and Links

Mentions and links are annotations that point into the text of a post. They are part of a broader system for rich-text "facets." Facets only support links and mentions for now, but could be extended to support features like bold and italics in the future.

Suppose we have a post:

✨ example mentioning @atproto.com to share the URL 👨‍❤️‍👨 https://en.wikipedia.org/wiki/CBOR.

Our goal is to turn the handle (@atproto.com) into a mention and the URL (https://en.wikipedia.org/wiki/CBOR) into a link. To do that, we grab the starting and ending locations of each "facet".

✨ example mentioning @atproto.com to share the URL 👨‍❤️‍👨 https://en.wikipedia.org/wiki/CBOR.
Here the mention starts at byte 23 and ends at byte 35, and the URL starts at byte 74 and ends at byte 108 (offsets count UTF-8 bytes).

We then identify them in the facets array, using the mention and link feature types. (You can view the schema of a facet object here.) The post record will then look like this:

{
  "$type": "app.bsky.feed.post",
  "text": "\u2728 example mentioning @atproto.com to share the URL \ud83d\udc68\u200d\u2764\ufe0f\u200d\ud83d\udc68 https://en.wikipedia.org/wiki/CBOR.",
  "createdAt": "2023-08-08T01:03:41.157302Z",
  "facets": [
    {
      "index": {
        "byteStart": 23,
        "byteEnd": 35
      },
      "features": [
        {
          "$type": "app.bsky.richtext.facet#mention",
          "did": "did:plc:ewvi7nxzyoun6zhxrhs64oiz"
        }
      ]
    },
    {
      "index": {
        "byteStart": 74,
        "byteEnd": 108
      },
      "features": [
        {
          "$type": "app.bsky.richtext.facet#link",
          "uri": "https://en.wikipedia.org/wiki/CBOR"
        }
      ]
    }
  ]
}

You can programmatically set the start and end points of a facet with regexes. Here's a script that parses mentions and links:

import re
from typing import List, Dict

def parse_mentions(text: str) -> List[Dict]:
spans = []
# regex based on: https://atproto.com/specs/handle#handle-identifier-syntax
mention_regex = rb"[$|\W](@([a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)"
text_bytes = text.encode("UTF-8")
for m in re.finditer(mention_regex, text_bytes):
spans.append({
"start": m.start(1),
"end": m.end(1),
"handle": m.group(1)[1:].decode("UTF-8")
})
return spans

def parse_urls(text: str) -> List[Dict]:
spans = []
# partial/naive URL regex based on: https://stackoverflow.com/a/3809435
# tweaked to disallow some training punctuation
url_regex = rb"[$|\W](https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*[-a-zA-Z0-9@%_\+~#//=])?)"
text_bytes = text.encode("UTF-8")
for m in re.finditer(url_regex, text_bytes):
spans.append({
"start": m.start(1),
"end": m.end(1),
"url": m.group(1).decode("UTF-8"),
})
return spans

Once the facet segments have been parsed out, we can then turn them into app.bsky.richtext.facet objects.

# Parse facets from text and resolve the handles to DIDs
def parse_facets(text: str) -> List[Dict]:
    facets = []
    for m in parse_mentions(text):
        resp = requests.get(
            "https://bsky.social/xrpc/com.atproto.identity.resolveHandle",
            params={"handle": m["handle"]},
        )
        # If the handle can't be resolved, just skip it!
        # It will be rendered as text in the post instead of a link
        if resp.status_code == 400:
            continue
        did = resp.json()["did"]
        facets.append({
            "index": {
                "byteStart": m["start"],
                "byteEnd": m["end"],
            },
            "features": [{"$type": "app.bsky.richtext.facet#mention", "did": did}],
        })
    for u in parse_urls(text):
        facets.append({
            "index": {
                "byteStart": u["start"],
                "byteEnd": u["end"],
            },
            "features": [
                {
                    "$type": "app.bsky.richtext.facet#link",
                    # NOTE: URI ("I") not URL ("L")
                    "uri": u["url"],
                }
            ],
        })
    return facets

The list of facets gets attached to the facets field of the post record:

post["text"] = "✨ example mentioning @atproto.com to share the URL 👨‍❤️‍👨 https://en.wikipedia.org/wiki/CBOR."
post["facets"] = parse_facets(post["text"])

Replies, Quote Posts, and Embeds

Replies and quote posts contain strong references to other records. A strong reference is a combination of:

  • AT URI: indicates the repository DID, collection, and record key
  • CID: the hash of the record itself

Posts can have several types of embeds: record embeds, images, and external embeds (link/webpage cards, the preview that shows up when you post a URL).

Replies

A complete reply post record looks like:

{
  "$type": "app.bsky.feed.post",
  "text": "example of a reply",
  "createdAt": "2023-08-07T05:49:40.501974Z",
  "reply": {
    "root": {
      "uri": "at://did:plc:u5cwb2mwiv2bfq53cjufe6yn/app.bsky.feed.post/3k43tv4rft22g",
      "cid": "bafyreig2fjxi3rptqdgylg7e5hmjl6mcke7rn2b6cugzlqq3i4zu6rq52q"
    },
    "parent": {
      "uri": "at://did:plc:u5cwb2mwiv2bfq53cjufe6yn/app.bsky.feed.post/3k43tv4rft22g",
      "cid": "bafyreig2fjxi3rptqdgylg7e5hmjl6mcke7rn2b6cugzlqq3i4zu6rq52q"
    }
  }
}

Since threads of replies can get pretty long, reply posts need to reference both the immediate "parent" post and the original "root" post of the thread.

Here's a Python script to find the parent and root values:

# Resolve the parent record and copy whatever root reply reference is there
# If none exists, the parent record was a top-level post, so that parent
# reference can be reused as the root value
def get_reply_refs(parent_uri: str) -> Dict:
    # parse_uri() is a helper from the full script that splits an AT URI
    # into "repo", "collection", and "rkey" request params
    uri_parts = parse_uri(parent_uri)

    resp = requests.get(
        "https://bsky.social/xrpc/com.atproto.repo.getRecord",
        params=uri_parts,
    )
    resp.raise_for_status()
    parent = resp.json()

    parent_reply = parent["value"].get("reply")
    if parent_reply is not None:
        root_uri = parent_reply["root"]["uri"]
        root_repo, root_collection, root_rkey = root_uri.split("/")[2:5]
        resp = requests.get(
            "https://bsky.social/xrpc/com.atproto.repo.getRecord",
            params={
                "repo": root_repo,
                "collection": root_collection,
                "rkey": root_rkey,
            },
        )
        resp.raise_for_status()
        root = resp.json()
    else:
        # The parent record is a top-level post, so it is also the root
        root = parent

    return {
        "root": {
            "uri": root["uri"],
            "cid": root["cid"],
        },
        "parent": {
            "uri": parent["uri"],
            "cid": parent["cid"],
        },
    }

The root and parent refs are stored in the reply field of posts:

post["reply"] = get_reply_refs("at://atproto.com/app.bsky.feed.post/3k43tv4rft22g")

Quote Posts

A quote post embeds a reference to another post record. A complete quote post record would look like:

{
  "$type": "app.bsky.feed.post",
  "text": "example of a quote-post",
  "createdAt": "2023-08-07T05:49:39.417839Z",
  "embed": {
    "$type": "app.bsky.embed.record",
    "record": {
      "uri": "at://did:plc:u5cwb2mwiv2bfq53cjufe6yn/app.bsky.feed.post/3k44deefqdk2g",
      "cid": "bafyreiecx6dujwoeqpdzl27w67z4h46hyklk3an4i4cvvmioaqb2qbyo5u"
    }
  }
}

The record embedded here is the post that's getting quoted. The post record type is app.bsky.feed.post, but you can also embed other record types in a post, like lists (app.bsky.graph.list) and feed generators (app.bsky.feed.generator).

Image Embeds

Images are also embedded objects in a post. This example code demonstrates reading an image file from disk and uploading it, capturing a blob in the response:

IMAGE_PATH = "./example.png"
IMAGE_MIMETYPE = "image/png"
IMAGE_ALT_TEXT = "brief alt text description of the image"

with open(IMAGE_PATH, "rb") as f:
    img_bytes = f.read()

# this size limit is specified in the app.bsky.embed.images lexicon
if len(img_bytes) > 1000000:
    raise Exception(
        f"image file size too large. 1000000 bytes maximum, got: {len(img_bytes)}"
    )

# TODO: strip EXIF metadata here, if needed

resp = requests.post(
    "https://bsky.social/xrpc/com.atproto.repo.uploadBlob",
    headers={
        "Content-Type": IMAGE_MIMETYPE,
        "Authorization": "Bearer " + session["accessJwt"],
    },
    data=img_bytes,
)
resp.raise_for_status()
blob = resp.json()["blob"]

The blob object, as JSON, would look something like:

{
  "$type": "blob",
  "ref": {
    "$link": "bafkreibabalobzn6cd366ukcsjycp4yymjymgfxcv6xczmlgpemzkz3cfa"
  },
  "mimeType": "image/png",
  "size": 760898
}

The blob is then included in an app.bsky.embed.images array, along with an alt-text string. The alt field is required for each image; pass an empty string if there is no alt text available.

post["embed"] = {
"$type": "app.bsky.embed.images",
"images": [{
"alt": IMAGE_ALT_TEXT,
"image": blob,
}],
}

A complete post record, containing two images, would look something like:

{
  "$type": "app.bsky.feed.post",
  "text": "example post with multiple images attached",
  "createdAt": "2023-08-07T05:49:35.422015Z",
  "embed": {
    "$type": "app.bsky.embed.images",
    "images": [
      {
        "alt": "brief alt text description of the first image",
        "image": {
          "$type": "blob",
          "ref": {
            "$link": "bafkreibabalobzn6cd366ukcsjycp4yymjymgfxcv6xczmlgpemzkz3cfa"
          },
          "mimeType": "image/webp",
          "size": 760898
        }
      },
      {
        "alt": "brief alt text description of the second image",
        "image": {
          "$type": "blob",
          "ref": {
            "$link": "bafkreif3fouono2i3fmm5moqypwskh3yjtp7snd5hfq5pr453oggygyrte"
          },
          "mimeType": "image/png",
          "size": 13208
        }
      }
    ]
  }
}

Each post can contain up to four images, and each image can have its own alt text and is limited to 1,000,000 bytes in size. Image files are referenced by posts, but are not actually embedded inside the post record itself (eg, as base64-encoded bytes). Instead, the files are first uploaded as "blobs" using com.atproto.repo.uploadBlob, which returns a blob metadata object that is then included in the post record.

It's strongly recommended best practice to strip image metadata before uploading. The server (PDS) may block uploads containing such metadata by default in the future, but today it is the responsibility of clients (and apps) to sanitize files before upload.
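One way to do that sanitization in Python is to re-encode the image, which drops EXIF and similar ancillary metadata. A hedged sketch, assuming the third-party Pillow package:

from io import BytesIO

from PIL import Image  # third-party package: pip install Pillow

def strip_image_metadata(img_bytes: bytes) -> bytes:
    # Re-saving through Pillow writes a fresh file without the original
    # EXIF/ancillary metadata chunks
    img = Image.open(BytesIO(img_bytes))
    out = BytesIO()
    img.save(out, format=img.format)
    return out.getvalue()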

Website Card Embeds

A website card embed, often called a "social card," is the rendered preview of a website link. A complete post record with an external embed, including image thumbnail blob, looks like:

{
  "$type": "app.bsky.feed.post",
  "text": "post which embeds an external URL as a card",
  "createdAt": "2023-08-07T05:46:14.423045Z",
  "embed": {
    "$type": "app.bsky.embed.external",
    "external": {
      "uri": "https://bsky.app",
      "title": "Bluesky Social",
      "description": "See what's next.",
      "thumb": {
        "$type": "blob",
        "ref": {
          "$link": "bafkreiash5eihfku2jg4skhyh5kes7j5d5fd6xxloaytdywcvb3r3zrzhu"
        },
        "mimeType": "image/png",
        "size": 23527
      }
    }
  }
}

Here's an example of embedding a website card:

from bs4 import BeautifulSoup

def fetch_embed_url_card(access_token: str, url: str) -> Dict:
    # the required fields for every embed card
    card = {
        "uri": url,
        "title": "",
        "description": "",
    }

    # fetch the HTML
    resp = requests.get(url)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    # parse out the "og:title" and "og:description" HTML meta tags
    title_tag = soup.find("meta", property="og:title")
    if title_tag:
        card["title"] = title_tag["content"]
    description_tag = soup.find("meta", property="og:description")
    if description_tag:
        card["description"] = description_tag["content"]

    # if there is an "og:image" HTML meta tag, fetch and upload that image
    image_tag = soup.find("meta", property="og:image")
    if image_tag:
        img_url = image_tag["content"]
        # naively turn a "relative" URL (just a path) into a full URL, if needed
        if "://" not in img_url:
            img_url = url + img_url
        resp = requests.get(img_url)
        resp.raise_for_status()

        blob_resp = requests.post(
            "https://bsky.social/xrpc/com.atproto.repo.uploadBlob",
            headers={
                "Content-Type": IMAGE_MIMETYPE,
                "Authorization": "Bearer " + access_token,
            },
            data=resp.content,
        )
        blob_resp.raise_for_status()
        card["thumb"] = blob_resp.json()["blob"]

    return {
        "$type": "app.bsky.embed.external",
        "external": card,
    }

An external embed is stored under embed like all the others:

post["embed"] = fetch_embed_url_card(session["accessJwt"], "https://bsky.app")

On Bluesky, each client fetches and embeds this card metadata, including blob upload if needed. Embedding the card content in the record ensures that it appears consistently to everyone and reduces waves of automated traffic being sent to the referenced website, but it does require some extra work by the client.

Putting It All Together

A complete script, with command-line argument parsing, is available from this Git repository.

As mentioned at the beginning, we expect most folks will use SDKs or libraries for their programming language of choice to help with most of the details described here. But sometimes it is helpful to see what is actually going on behind the abstractions.

· 6 min read

SkyFeed is a third-party client built by redsolver. Users can create a dashboard out of their feeds, profiles, and more. Additionally, while custom feeds currently take some developer familiarity to build from scratch, SkyFeed allows Bluesky users to easily build their own custom feeds based on regexes or lists.

You can try SkyFeed yourself here, and follow SkyFeed’s Bluesky account for updates.

Screenshot of SkyFeed


Can you share a bit about yourself and your background?

Hi, I’m redsolver, a developer from Germany. In the past I tried building a decentralized social network twice, but both times it failed, most recently due to the decentralized storage layer (Skynet) shutting down completely. So last year I started working on a new content-addressed storage network myself, with all the features needed for a truly reliable social network. I'm still actively working on it and building open-source apps on top, like an end-to-end-encrypted cloud storage app. But instead of building yet another social network from scratch, I decided to focus on building cool stuff for atproto/Bluesky. The AT Protocol shares many ideas with my previous attempts (like decentralized identity) and is already a lot more mature.

What is SkyFeed?

There's the SkyFeed app, which is a third-party web client (cross-platform soon) for using Bluesky. Some users compare the experience to TweetDeck. A unique feature is that it subscribes to a custom minimal version of the Bluesky firehose (all events happening on the network). This makes it possible to have all like/reply/repost counts update in real-time and new posts pop up in near real-time everywhere in the app! Another cool feature is the collapsible thread view which makes following big discussions a lot easier.

But most users are using the app because of the integrated SkyFeed Builder, a tool to make building feeds easier for both developers and non-developers. It's really exciting watching a very diverse set of users build the over 6,000 feeds that are already published using the builder! The SkyFeed web app is available at https://skyfeed.app/.

Screenshot of SkyFeed Builder

What inspired you to build SkyFeed?

As mentioned earlier, I've been really interested in decentralized social networks for a while. After getting a Bluesky invite and reading the atproto docs, the tech really caught my interest.

There were already quite a few third-party clients, but none of them were written in Flutter (my favorite framework). So I started working on a new one, both for getting a better feel of the Bluesky internals and because I wanted a desktop client that I personally enjoy using daily. Even though the first release was missing quite a lot of important features (like notifications), the positive feedback motivated me to continue building.

When the Bluesky team published the custom feed spec and the feed generator starter kit, things really took off. I made some feeds and added experimental support for using them to the SkyFeed app. They are an awesome concept and in my opinion really give Bluesky the edge over competing networks. It makes content discovery so much easier, because no algorithm or AI has more relevant suggestions than highly engaged users building elaborate feeds for any and all niche interests they have. So the reason I made the SkyFeed Builder was to give this power to as many people as possible. And what inspires me to continue building and improving SkyFeed is all the positive feedback and happy users :)

What tech stack is SkyFeed built on?

The SkyFeed app is built using the Flutter framework and the Dart programming language. I'm using the excellent Dart atproto/Bluesky packages, created by Shinya Kato. Most of the backend is written in Dart and running on some Hetzner servers; the feed generator proxy and cache were recently moved to fly.io for better scalability. I'm running multiple open-source indexers which listen to the entire network firehose and store everything in an instance of SurrealDB. SurrealDB is still in beta, but it's fun to work with! And apart from some performance issues, it has been pretty reliable. The query engine for the SkyFeed Builder feeds is written in Rust and is open-source too. It keeps all posts from the last 7 days and their metadata in memory and then executes all of the SkyFeed Builder steps/blocks. Additional metadata (like the full post history for a single user) is fetched on demand from SurrealDB.

What's in the future for SkyFeed?

  • New "Remix" feature to edit, improve and re-publish any SkyFeed Builder feed (as long as it has an open license)
  • Make it easier to self-host the SkyFeed Builder infrastructure and get some third-party providers online. This will give users more choice and make the whole feed ecosystem more reliable and robust
  • Add support for personalized feeds and SurrealQL queries to the builder, but they are very resource-intensive so will likely be invite-only (but self-hosting always works of course!)
  • Improve the SkyFeed app, get a nice new logo, fully open-source it and release cross-platform (Android, iOS, Linux, Windows, macOS)
  • Support for videos, audio and other media content with a new custom lexicon in a backwards-compatible way. They will use the storage network I'm working on, but with an atproto-compatible blob format. The main difference is that it uses the BLAKE3 hash function instead of SHA256 and has no file size limit
  • A self-hosted proxy which bridges other social networks (Mastodon, Nostr, RSS, Hacker News) and makes them available in any Bluesky client. Reddit and "X" might be supported too, but with a bring-your-own-API-key requirement. The proxy also adds more features like advanced (word) muting, an audit log to see exactly which changes third-party apps made to your account and the option to use a self-hosted "App View" (basically the SkyFeed Indexer with SurrealDB)
  • A new List Builder (based on profile, name, follower count and more) as soon as lists other than Mute Lists are supported

In summary: Make SkyFeed (apps, builder and more) the ultimate power user experience, while open-sourcing everything and keeping the option to self-host all components.


You can follow redsolver on Bluesky here, SkyFeed for project updates here, and be sure to try out SkyFeed yourself here.

Note: Use an App Password when logging in to third-party tools for account security and read our disclaimer for third-party applications.

· One min read

Bluesky is an open social network built on the AT Protocol, a flexible technology that will never lock developers out of the ecosystems they help build. With atproto, third-party development can be as seamless as first-party through custom feeds, federated services, clients, and more.

If you're a developer interested in building on atproto, we'd love to email you an invite code. Simply share your GitHub (or similar) profile with us via this form.

Read more about the AT Protocol here and check out some third-party developer projects here.

· 7 min read

Welcome to the atproto federation developer sandbox! ✨

This is a completely separate network from our production services that allows us to test out the federation architecture and wire protocol.

The federation sandbox environment is an area set up for exploration and testing of the technical components of the AT Protocol distributed social network. It is intended for developers and self-hosters to test out data availability in a federated environment.

To maintain a positive and productive developer experience, we've established this Code of Conduct that outlines our expectations and guidelines. This sandbox environment is initially meant to test the technical components of federation.

Given that this is a testing environment, we will be defederating from any instances that do not abide by these guidelines, or that cause unnecessary trouble, and will not be providing specific justifications for these decisions.

Using the sandbox environment means you agree to adhere to our Guidelines. Please read the following carefully:

Post responsibly. The sandbox environment is intended to test infrastructure, but user content may be created as part of this testing process. Content generation can be automated or manual. Do not post content that requires active moderation or violates the Bluesky Community Guidelines.

Keep the emphasis on testing. We’re striving to maintain a sandbox environment that fosters learning and technical growth. We will defederate with instances that recruit users without making it clear that this is a test environment.

Do limit account creation. We don't want any one server using a majority of the resources in the sandbox. To keep things balanced, to start, we’re only federating with Personal Data Servers (PDS) with up to 1000 accounts. However, we may change this if needed.

Don’t expect persistence or uptime. We will routinely be wiping the data on our infrastructure. This is intended to reset the network state and to test sync protocols. Accounts and content should not be mirrored or migrated between the sandbox and real-world environments.

Don't advertise your service as being "Bluesky." This is a developer sandbox and is meant for technical users. Do not promote your service as being a way for non-technical users to use Bluesky.

Do not mirror sandbox did:plcs to production.

Status and Wipes

🐉 Beware of dragons!

This hasn’t been production tested yet. It seems to work pretty well, but who knows what’s lurking under the surface — that's what this sandbox is for! Have patience with us as we prep for federation.

On that note, please give us feedback either in Issues (actual bugs) or Discussions (higher-level questions/discussions) on the atproto repo.

🗓 Routine wipes

As part of the sandbox, we will be doing routine wipes of all network data.

We expect to perform wipes on a weekly or bi-weekly basis, though we reserve the right to do a wipe at any point.

When we wipe data, we will be wiping it on all services (BGS, App View, PLC). We will also mark any existing DIDs as “invalid” & will refuse to index those accounts in the next epoch of the network, to discourage users from attempting to “roll over” their accounts across wipes.

Getting started ✨

Now that you've read the sandbox guidelines, you're ready to self-host a PDS in the developer sandbox. For complete instructions on getting your PDS set up, check out the README.

To access your account, you’ll log in with the client of your choice in the exact same way that you log into production Bluesky, for instance the Bluesky web client. When you do so, please provide the URL of your PDS as the service that you wish to log in to.

Auto-updates

We’ve included Watchtower in the PDS distribution. Every day at midnight PST, this will check our GitHub container registry to see if there is a new version of the PDS container & update it on your service.

This will allow us to rapidly iterate on protocol changes, as we’ll be able to push them out to the network on a daily basis.

When we do routine network wipes, we will be pushing out a database migration to participating PDS that wipes content and accounts.

You are within your rights to disable Watchtower auto-updates, but we strongly encourage their use and will not be providing support if you decide not to run the most up-to-date PDS distribution.

Odds & Ends & Warnings & Reminders

🧪 Experiment & have fun!

🤖 Run feed generators. They should work the exact same way as production - be sure to adjust your env to listen to Sandbox BGS!

🌈 Feel free to run your own AppView or BGS - although it’s a bit more involved & we’ll be providing limited support for this.

👤 Your PDS will provide your handle by default. Custom domain handles should work exactly the same in sandbox as they do on production Bluesky, though you will not be able to re-use your handle from production Bluesky, since you can only have one DID set per handle.

🚨 If you follow the self-hosted PDS setup instructions, you’ll have private key material in your env file - be careful about sharing that!

📣 This is a sandbox version of a public broadcast protocol - please do not share sensitive information.

🤝 Help each other out! Respond to issues & discussions, chat in the community-run Matrix or Discord, etc.

Learn more about atproto federation

Check out the high-level view of federation.

Dive deeper with the atproto docs.

Network Services

We are running three services: PLC, BGS, and the Bluesky "App View".

PLC

Hostname: plc.bsky-sandbox.dev

Code: https://github.com/did-method-plc/did-method-plc

PLC is the default DID provider for the network. DIDs are the root of your identity in the network. Sandbox PLC functions exactly the same as production PLC, but it is run as a separate service with a separate dataset. The DID resolution client in the self-hosted PDS package is set up to talk to the correct PLC service.
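Resolving a DID against the sandbox directory is a plain HTTP GET of the DID as the path. A minimal sketch (the DID value here is hypothetical):

import requests

did = "did:plc:abc123"  # hypothetical sandbox DID
resp = requests.get(f"https://plc.bsky-sandbox.dev/{did}")
resp.raise_for_status()
print(resp.json())  # the resolved DID document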

BGS

Hostname: bgs.bsky-sandbox.dev

Code: https://github.com/bluesky-social/indigo/tree/main/bgs

BGS (Big Graph Service) is the firehose for the entire network. It collates data from PDSs & rebroadcasts it out on one giant websocket.

BGS has to find out about your server somehow, so when we do any sort of write, we ping BGS with com.atproto.sync.requestCrawl to notify it of new data. This is done automatically in the self-hosted PDS package.
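If you want to trigger that notification by hand, the call is a simple procedure taking your PDS hostname. A minimal sketch (the hostname is a placeholder):

import requests

resp = requests.post(
    "https://bgs.bsky-sandbox.dev/xrpc/com.atproto.sync.requestCrawl",
    json={"hostname": "pds.example.com"},  # replace with your PDS hostname
)
resp.raise_for_status()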

If you’re familiar with the Bluesky production firehose, you can subscribe to the BGS firehose in the exact same manner; the interface & data should be identical.

Bluesky App View

Hostname: api.bsky-sandbox.dev

Code: https://github.com/bluesky-social/atproto/tree/main/packages/bsky

The Bluesky App View aggregates data from across the network to service the Bluesky microblogging application. It consumes the firehose from the BGS, processing it into serviceable views of the network such as feeds, post threads, and user profiles. It functions as a fairly traditional web service.

When you request a Bluesky-related view from your PDS (getProfile, for instance), your PDS will actually proxy the request up to the App View.

Feel free to experiment with running your own App View if you like!

The PDS

The PDS (Personal Data Server) is where users host their social data such as posts, profiles, likes, and follows. The goal of the sandbox is to federate many PDS together, so we hope you’ll run your own.

We’re not actually running a Bluesky PDS in sandbox. You might see Bluesky team members' accounts in the sandbox environment, but those are self-hosted too.

The PDS that you’ll be running is much of the same code that is running on the Bluesky production PDS. Notably, all of the in-pds-appview code has been torn out. You can see the actual PDS code that you’re running on the atproto/simplify-pds branch.

Feedback

We're excited for you to join us in the developer sandbox soon! Please give us feedback either in Issues (actual bugs) or Discussions (higher-level questions/discussions) on the atproto repo.

· 11 min read

The technical implementation of public blocks and some possibilities for more privacy preserving block implementations — an area of active research and experimentation.


In April, we shipped a block feature to all users. Unlike on other centralized platforms, blocks on Bluesky are public and enumerable data, because all servers across the network need to know that they exist in order to respect the user’s request.

The current system of public blocks is just one aspect of our composable moderation stack, which we are actively building during our beta period. We’re working on more sophisticated individual and community-level interaction controls and moderation tooling, and we also encourage third-party community developers to contribute to this ecosystem.

In this post, we’ll share the technical implementation of public blocks and discuss some possibilities for more privacy-preserving block implementations — an area of active research and experimentation. We welcome community suggestions, so if after reading this post you have a proposal for how to implement private blocks, please contribute to our public discussion here.

What are blocks?

At an abstract level, across many social media platforms, blocks between two accounts usually have the following features:

  • Symmetric: the behavior is the same regardless of which account initiated a block first
  • Mutual mute: neither account can read any content (public or private) from the other account, while logged in
  • Mutual interaction block: direct interactions between the two accounts are not allowed. This includes direct mentions resulting in a notification, replies to posts, direct messages (DMs), and follows (which normally result in notifications).

Blocks add a significant and high-impact degree of friction. There are many cases where this friction alone is sufficient to de-escalate conflict.

However, it is important to note that blocking does not prevent all possible interaction (even on centralized social networks). For example, when content is public, as it is on Bluesky, blogs, or websites, blocked people can still easily access the content by simply logging out or opening an incognito browser tab. Posts can still be screenshotted and shared either on-network or off-network. Harassment can continue to occur even without direct mentions or replies (“subtweeting,” posting screenshots, etc.).

On most existing services, the blockee can detect that they’ve been blocked, though it may not be immediately obvious. For example, if they’re able to navigate to the blocker’s profile, they may see a screen that says they’ve been blocked, or the absence of the profile is indication enough that they have been blocked. Most social apps provide each user with a list of the accounts that they have blocked.

How are blocks currently implemented in Bluesky?

Blocks prevent interaction. Blocked accounts will not be able to like, reply, mention, or follow you, and if they navigate directly to your profile, they will see that they have been blocked. Like other public social networks, if they log out of their account or use a different account, they will be able to view your content. (This much is standard across centralized social networks as well.)

Currently, on Bluesky, you can view a list of your blocked accounts, and while the list of people who have blocked you is not surfaced in the app, developers familiar with the API could crawl the network to parse this information. This section will dive into the technical constraints that cause blocks to be public, and in a later section, we’ll discuss possible alternative implementations.

Blocks in Bluesky are implemented as part of the app.bsky.* application protocol, which builds on top of the underlying AT Protocol (atproto). Blocks are records stored in account repositories. They look and behave very similarly to “follows”: the app.bsky.graph.block and app.bsky.graph.follow record schemas are nearly identical.
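
To make the similarity concrete, here is roughly what the two record shapes look like (a sketch; the DID and timestamp are placeholder values):

```typescript
const block = {
  $type: "app.bsky.graph.block",
  subject: "did:plc:target-account", // DID of the blocked account (placeholder)
  createdAt: "2023-08-01T00:00:00.000Z",
};

const follow = {
  $type: "app.bsky.graph.follow",
  subject: "did:plc:target-account", // DID of the followed account (placeholder)
  createdAt: "2023-08-01T00:00:00.000Z",
};
```

Either record is written into the author’s own repository (via com.atproto.repo.createRecord), which is why it ends up publicly visible like any other repository content.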

The block behavior is then implemented by several pieces of software. Servers and clients will index the block records and prevent actions which would have violated the intended behaviors: posts will not appear in feeds and reply threads; profile fetches will be empty or annotated with block state; creation of reply posts, quote posts, embeds, and mentions are blocked; any notifications involving the other account are additionally suppressed.
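
As a rough illustration of the enforcement side (a sketch with hypothetical types and a hypothetical block index, not the actual AppView code):

```typescript
// Hypothetical AppView-style filtering: hide replies that cross a block
// relationship in either direction. `blocks` is an index of "a->b" pairs
// built from app.bsky.graph.block records observed on the firehose.
type Reply = { authorDid: string; text: string };

function visibleReplies(
  viewerDid: string,
  replies: Reply[],
  blocks: Set<string>
): Reply[] {
  const blocked = (a: string, b: string) =>
    blocks.has(`${a}->${b}`) || blocks.has(`${b}->${a}`);
  return replies.filter((reply) => !blocked(viewerDid, reply.authorDid));
}
```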

One of the core principles of the AT Protocol, which Bluesky is built on, is that account holders have total control over their own data. This means that while protocol-compliant clients and servers prevent blocked accounts from creating replies or other disallowed records in each user’s data repository, it is technically possible to bypass those restrictions if a client refuses to be protocol-compliant. The act of being blocked also does not result in any change to the blockee’s repository, and any old replies or mentions remain in place, untouched. For example, in the user-facing app, if someone replies to your post and then you block them, their replies will now be hidden from you. If you later decide to unblock them, their replies to that post will appear again, because the replies themselves were not deleted.

Despite blocks not removing content from other users’ repositories, the data is not shown, because blocks are primarily enforced by other nodes and services — personal data servers (PDS), App Views, and clients. One side effect of this architecture is that follow relationships are not changed by a block, and “soft blocks” (rapid block/unblock) do not work as a mechanism to remove a follower. While a follow relationship might still exist in the graph, the block prevents any actual viewing or delivery of content. As future work, we can also ensure that details such as “like” counts and “follower” accounts are updated when block status changes.

How will blocks work with open federation?

Bluesky is a public social network built on a protocol to support public conversation, so similar to blogs and websites, you do not need a Bluesky account in order to see content posted to the app. In order to support open federation where many servers, clients, and App Views are collaborating to surface content to users, each account’s data repository — which contains information like follows and blocks — must be public. All of the servers across the network must be able to read the data. Servers must know which accounts you have blocked in order to be able to enforce that relationship.

Once we launch federation, there will be many personal data servers (PDS), clients, and App Views. The expectation is that virtually all accounts will be using clients and servers that respect blocking behavior.

It is this need for multiple parties to coordinate that necessitates blocks being public. “Mute” behavior can be implemented entirely in a client app because it only impacts the view of the local account holder. Blocks require coordination and enforcement by other parties, because the views and actions of multiple (possibly antagonistic) parties are involved.

In theory, a bad actor could create their own rogue client or interface which ignores some of the blocking behaviors, since the content is posted to a public network. But showing content or notifications to the person who created the block won’t be possible, as that behavior is controlled by their own PDS and client. It’s technically possible for a rogue client to create replies and mentions, but they would be invisible or at least low-impact to the recipient account for the same reasons. Protocol-compliant software in the ecosystem will keep such content invisible to other accounts on the network. If a significant fraction of accounts elected to use noncompliant rogue infrastructure, we would consider that a failure of the entire ecosystem.

Remember that clever bypasses of the blocking behaviors are already possible on most networks (centralized or not), and it is the added friction that matters.

Are there other ways to implement blocks in federated systems?

Yes, and we are actively exploring other implementations and novel research areas to inform our development on the AT Protocol. We also welcome community suggestions and discussions on this topic.

One example is ActivityPub, which is the protocol that Mastodon is built on. ActivityPub does not require public blocks because content there is not globally public by default — this is also why picking which server you join matters, because it limits the content that you see. Despite this, Mastodon does sometimes show block information to other parties, which is a frequent topic of discussion in the ActivityPub ecosystem.

As we currently understand it, on Mastodon you only see content when there is an explicit follow relationship between accounts and servers, and follows require mutual consent. (In practice, most follow requests are auto-accepted, so this behavior is not always obvious to end users.) The mutual-mute behavior that blocks require can be implemented on Mastodon by first disallowing any follows between the two accounts, and then adding a regular “mute.” Similar to Bluesky, the interaction-block behavior relies on enforcement by both the server and the client. So on Mastodon, too, a bad actor could implement a server that ignores blocks and displays blocked replies in threads. Both ActivityPub and AT Protocol can use de-federation as an enforcement mechanism to disconnect from servers that don’t respect blocks.

Technical approaches we’ve considered for private blocks

One proposed mechanism to make blocks less public on Bluesky is the use of bloom filters. The basic idea is to encode block relationships in a statistical data structure, and to distribute that data structure instead of the set of actual blocks. The data structure would make it easy to check if there was a block relationship between two specific accounts, but not make it easy to list all of the blocks. Other servers and clients in the network would then use the data structure to enforce the blocking behaviors. The bloom filters could either be per-account (aka, a bloom filter stored in a record), or per-PDS, or effectively global, with individual PDS instances submitting block relationships to a trusted central service which would publish the bloom filter lists. We considered a scheme like this before implementing blocks, but there are a few issues and concerns:

  • Bloom filters don’t fully prevent enumerating blocks. They really only add a mask: if a bad actor is interested in specific accounts, they can still recover the list of blocked accounts with relatively little work. While the full NxN matrix of possible block relationships (where N is the number of accounts in the network, potentially hundreds of millions in the future) might be too large to test exhaustively, in reality a bad actor would likely target only prominent accounts or specific communities. In that case, only on the order of billions of candidate pairs would need to be tested, which is trivial on modern hardware.
  • Bloom filters are computationally expensive. While bloom filters are known for efficiently compressing set-membership lookups over a large number of hashes, they carry a large overhead compared to individual hashes. In the context of blocks, every creation or deletion of a block record would potentially require regenerating and redistributing a full-sized bloom filter. The storage and bandwidth overhead becomes significant at scale, especially since a significant fraction of social media accounts could have many thousands of blocks.
  • Mitigations for bloom filter overhead introduce latency problems. The storage and bandwidth concerns above could be mitigated by “batching,” or through a trusted central service. But those solutions have their own problems with latency (time until a block is enforced across the network) and with trust and reliability (a central service would hold the full enumeration of block relationships).

The team is still actively discussing this option, and it’s possible that the extra effort and resources required by bloom filters are worth the imperfect but additional friction they provide. At the moment, it’s not entirely obvious to us that the tradeoff is worth it. While we’re currently iterating on other moderation and account safety features, we decided to initially release blocks with this simple public system as a first pass.
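
To make the enumeration concern concrete, here is a toy bloom filter over (blocker, blocked) DID pairs; the sizes, hashing scheme, and DIDs are purely illustrative, not a proposed design:

```typescript
import { createHash } from "node:crypto";

const BITS = 1 << 20; // filter size in bits (illustrative)
const HASHES = 5; // number of hash functions (illustrative)
const bits = new Uint8Array(BITS / 8);

// Derive HASHES bit positions for a (blocker, blocked) pair.
function* positions(blocker: string, blocked: string): Generator<number> {
  for (let i = 0; i < HASHES; i++) {
    const digest = createHash("sha256")
      .update(`${i}:${blocker}->${blocked}`)
      .digest();
    yield digest.readUInt32BE(0) % BITS;
  }
}

function addBlock(blocker: string, blocked: string): void {
  for (const p of positions(blocker, blocked)) bits[p >> 3] |= 1 << (p & 7);
}

// Returns "probably yes" or "definitely no" (false positives are possible).
function mightBlock(blocker: string, blocked: string): boolean {
  for (const p of positions(blocker, blocked))
    if (!(bits[p >> 3] & (1 << (p & 7)))) return false;
  return true;
}

addBlock("did:plc:alice-placeholder", "did:plc:bob-placeholder");
mightBlock("did:plc:alice-placeholder", "did:plc:bob-placeholder"); // true
// Enforcement needs exactly this query, but so does an attacker: given a target
// account, they can test every candidate DID against the filter and recover
// (a superset of) that account's block list.
```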

Some other proposals we’re exploring include:

  • Label-based block enforcement. Instead of trying to prevent all violations of blocking relationships across the network, scan for violations of them and label them.
  • Interaction gating. Place authority for post threads and quote posts in the original poster’s PDS, so block information doesn’t need to leave that server.
  • Zero-knowledge proofs. We’re aware of existing ZK approaches to distributed blocks, such as SNARKBlock, and we’re speaking with trusted advisors about this open area of research and experimentation. This research may lead us to deploy a novel system in the future.
  • Trusted App Views. Accounts could privately register their blocks with their PDS, and then these servers would forward block metadata to a small number of “blessed” App Views.

If you have experience here or have thoughts about how to implement private block relationships in decentralized systems, we’d love to hear from you. Please contribute to our discussion here.