a more complete bluesky feed generator
Bluesky, the recently trending Twitter-like platform, offers a powerful promise for the future of social media. Especially in modern times, when our attention is a battleground for large corporations and nation states alike, we really need better tools to make our information diets more transparent.
Bluesky offers quite a few of those tools, including:
- An event firehose that allows anyone with an Internet connection to inspect real-time interactions on the network.
- The ability for developers to create new custom feeds, giving users a choice of which algorithms they use to filter content on the network.
- "Stackable" moderation through labelers that users can subscribe to in order to customize how content is displayed.
I used these tools to create Alt Text Hotties, a feed encouraging accessibility best practices by displaying selfies that include alt text from across the entire network.
While the team has offered up some boilerplate templates to help developers get started with creating their own custom feeds, I didn't find anything that could be considered "feature complete", and wanted to share the learnings from developing my own custom feed.
I started with this minimal Go-based template (thank you Jaz!) and extended the functionality to create dynamic feeds. This fork is published on GitHub, and the rest of this post is a deep dive for anyone who wishes to use this template to create their own feeds.
Before diving into the code, we'll begin with an overview of how the protocol underlying Bluesky works.
the AT protocol
Bluesky is built on top of the Authenticated Transfer protocol, or atproto for short. In theory, atproto can be used for a range of social media apps (not just a Twitter clone) and potentially pave the way towards decentralized social media becoming mainstream.
In practice, the protocol is a work in progress and there are a few limitations today that need to be addressed.
Despite the caveats, Bluesky is one of the few places with decent enough network effects to make it interesting, and atproto provides a radically open view into the network’s machinery in a way that other social media platforms do not.
Each user on atproto is represented by a unique Decentralized Identifier (DID) and has their own Personal Data Server (PDS) that stores all their interactions on the network.
Users interact with the network by committing data records to their PDS.
Relays are responsible for monitoring every PDS and providing a synchronized view of activity across the network. These relays are what provide the “firehose” of Bluesky data.
Figure: data flow from multiple PDS instances and a relay
Subscribing to one of these relays is where we start the process of generating a feed.
working with the event firehose
Rather than listening to the firehose directly, the project makes use of `jetstream`, which provides a simplified JSON view of events streaming through Bluesky’s relays. Using `jetstream` drastically decreases the overall bandwidth flowing through the service and is sufficient for the purposes of creating a custom feed.
In `service/pkg/stream/subscriber.go`, we connect to `jetstream` through a WebSocket connection at the following URI:
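As an illustration (the host and collections here are assumptions, not necessarily what `subscriber.go` uses), this minimal Go sketch builds a subscription URI for one of the public Jetstream instances, using Jetstream's `wantedCollections` query parameter to limit the stream to the record types a feed cares about. `jetstreamURL` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// jetstreamURL builds a Jetstream subscription URI (hypothetical helper).
// The host passed in below is one public instance; the project may use another.
func jetstreamURL(host string, collections []string) string {
	q := url.Values{}
	for _, c := range collections {
		// wantedCollections may be repeated, once per collection NSID.
		q.Add("wantedCollections", c)
	}
	u := url.URL{
		Scheme:   "wss",
		Host:     host,
		Path:     "/subscribe",
		RawQuery: q.Encode(),
	}
	return u.String()
}

func main() {
	fmt.Println(jetstreamURL("jetstream2.us-east.bsky.network",
		[]string{"app.bsky.feed.post", "app.bsky.feed.like"}))
}
```

Filtering server-side like this means the service only receives the collections it will actually process.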
We use the `client` interface to connect to the WebSocket and start reading events from the stream.
Notice the call to `s.handleCommit()`. This is within a callback function that gets called for every event received over the WebSocket connection.
Take a look at the `handleCommit` implementation. Here, we inspect the commit and make decisions based on what kind of event it is, as the `switch` statements on `event.Commit.Operation` and `event.Commit.Collection` show.
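To make the dispatch concrete, here is a minimal sketch of decoding a Jetstream commit event and switching on its operation and then its collection. The `Event` and `Commit` types are simplified local stand-ins mirroring Jetstream's JSON fields (`kind`, `commit.operation`, `commit.collection`, `commit.rkey`, `commit.record`), and the handler names returned by the hypothetical `route` function are illustrative, not the project's.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event mirrors a subset of a Jetstream event (simplified, hypothetical type).
type Event struct {
	Did    string  `json:"did"`
	TimeUS int64   `json:"time_us"`
	Kind   string  `json:"kind"`
	Commit *Commit `json:"commit,omitempty"`
}

// Commit mirrors the commit payload of a Jetstream event.
type Commit struct {
	Operation  string          `json:"operation"`
	Collection string          `json:"collection"`
	RKey       string          `json:"rkey"`
	Record     json.RawMessage `json:"record,omitempty"`
}

// route names the handler an event would be sent to, switching on the
// operation first and the collection second.
func route(ev Event) string {
	if ev.Kind != "commit" || ev.Commit == nil {
		return "ignore" // identity/account events carry no commit
	}
	switch ev.Commit.Operation {
	case "create":
		switch ev.Commit.Collection {
		case "app.bsky.feed.post":
			return "createPost"
		case "app.bsky.feed.like":
			return "createLike"
		}
	case "delete":
		switch ev.Commit.Collection {
		case "app.bsky.feed.post":
			return "deletePost"
		}
	}
	return "ignore"
}

func main() {
	raw := []byte(`{"did":"did:plc:example","time_us":1,"kind":"commit",
		"commit":{"operation":"create","collection":"app.bsky.feed.post","rkey":"3kabc",
		"record":{"text":"look, a heron"}}}`)
	var ev Event
	if err := json.Unmarshal(raw, &ev); err != nil {
		panic(err)
	}
	fmt.Println(route(ev)) // createPost
}
```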
There are several constants defined for the collection types we're interested in processing:
These constants describe the different types of data records that Bluesky uses as defined by atproto’s Lexicon schema language. Seem familiar? These are the same records that are committed to a user’s PDS.
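For reference, these are the Lexicon NSIDs of a few common Bluesky record types. Which of them the project actually tracks, and the constant names themselves, are assumptions in this sketch.

```go
package main

import "fmt"

// Lexicon NSIDs for common record types. Constant names are hypothetical.
const (
	CollectionPost   = "app.bsky.feed.post"
	CollectionLike   = "app.bsky.feed.like"
	CollectionRepost = "app.bsky.feed.repost"
	CollectionFollow = "app.bsky.graph.follow"
)

// tracked is the (assumed) set of collections a subscriber bothers decoding.
var tracked = map[string]bool{
	CollectionPost: true,
	CollectionLike: true,
}

func isTracked(collection string) bool {
	return tracked[collection]
}

func main() {
	fmt.Println(isTracked(CollectionPost), isTracked(CollectionFollow)) // true false
}
```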
commit handlers
Now take a look at `pkg/stream/handlers.go`.
Here, we have a variety of methods that handle events depending on the commit operation and the collection type the commit belongs to. For instance, here’s the handler for post deletion:
These can be modified to your liking depending on how you want to structure your feed.
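As a rough sketch of what a post-deletion handler can look like, assuming a hypothetical `PostStore` interface rather than the project's `db` package: a Jetstream delete commit carries the author's DID and the record key but no record body, so the handler rebuilds the post's AT URI (`at://<did>/<collection>/<rkey>`) and deletes by that key.

```go
package main

import (
	"errors"
	"fmt"
)

// PostStore is a hypothetical persistence interface.
type PostStore interface {
	DeletePost(uri string) error
}

var errNotFound = errors.New("post not found")

// memStore is an in-memory PostStore for illustration.
type memStore map[string]bool

func (m memStore) DeletePost(uri string) error {
	if !m[uri] {
		return errNotFound
	}
	delete(m, uri)
	return nil
}

// atURI rebuilds a record's AT URI from the fields a delete commit carries.
func atURI(did, collection, rkey string) string {
	return fmt.Sprintf("at://%s/%s/%s", did, collection, rkey)
}

// handleDeletePost removes the post if we stored it. Most deleted posts were
// never saved by this feed, so "not found" is not treated as a failure.
func handleDeletePost(store PostStore, did, rkey string) error {
	err := store.DeletePost(atURI(did, "app.bsky.feed.post", rkey))
	if errors.Is(err, errNotFound) {
		return nil
	}
	return err
}

func main() {
	store := memStore{"at://did:plc:abc/app.bsky.feed.post/3kxyz": true}
	fmt.Println(handleDeletePost(store, "did:plc:abc", "3kxyz"), len(store)) // <nil> 0
}
```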
post classification and persistence
The `handleCreatePost` method contains most of the interesting logic. In this method, we:
- inspect a post's contents
- check if the post contains images
- if so, run those images through a classifier
- save matching posts in a database
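The first two steps above can be sketched as follows. This assumes the post record arrives as raw JSON and only handles the `app.bsky.embed.images` embed type (quote posts with media, `app.bsky.embed.recordWithMedia`, are skipped). The CDN URL format is an assumption about how image URLs are obtained for the classifier, and `extractImages` and `cdnURL` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// postRecord is a minimal view of an app.bsky.feed.post record.
type postRecord struct {
	Text  string `json:"text"`
	Embed *struct {
		Type   string `json:"$type"`
		Images []struct {
			Alt   string `json:"alt"`
			Image struct {
				Ref struct {
					Link string `json:"$link"` // blob CID
				} `json:"ref"`
				MimeType string `json:"mimeType"`
			} `json:"image"`
		} `json:"images"`
	} `json:"embed"`
}

type imageRef struct {
	CID string
	Alt string
}

// extractImages returns the blob CID and alt text of each attached image.
func extractImages(raw []byte) ([]imageRef, error) {
	var rec postRecord
	if err := json.Unmarshal(raw, &rec); err != nil {
		return nil, err
	}
	if rec.Embed == nil || rec.Embed.Type != "app.bsky.embed.images" {
		return nil, nil // no image embed: nothing to classify
	}
	out := make([]imageRef, 0, len(rec.Embed.Images))
	for _, img := range rec.Embed.Images {
		out = append(out, imageRef{CID: img.Image.Ref.Link, Alt: img.Alt})
	}
	return out, nil
}

// cdnURL builds a Bluesky CDN URL for a blob (assumed URL scheme).
func cdnURL(did, cid string) string {
	return fmt.Sprintf("https://cdn.bsky.app/img/feed_fullsize/plain/%s/%s@jpeg", did, cid)
}

func main() {
	raw := []byte(`{"text":"backyard visitor","embed":{"$type":"app.bsky.embed.images",
		"images":[{"alt":"a robin on a fence","image":{"$type":"blob",
		"ref":{"$link":"bafkreiexample"},"mimeType":"image/jpeg","size":1234}}]}}`)
	imgs, err := extractImages(raw)
	if err != nil {
		panic(err)
	}
	for _, img := range imgs {
		fmt.Println(cdnURL("did:plc:example", img.CID), img.Alt)
	}
}
```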
Since most machine learning tools (like PyTorch, transformers, etc.) live in the Python ecosystem, we wrap the image classification model in a separate container that exposes a REST interface.
The Go service makes POST calls to the classifier at `http://classifier:12000/classify`, passing in an `image_url` parameter. When called, the Python service loads the image and runs it through a transformer-based classifier.
In this example, we use the CLIP transformer model in `classifier/app.py` to label images `bird` or `not_bird`:
There is another method in `app.py` called `classify_image` that serves as our POST request handler and returns JSON in the response that looks like this:
`handleCreatePost` will check this response for the proper label and save posts if the confidence exceeds 85%:
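The decision itself is small; here is a sketch, assuming a response with a label and a confidence in [0, 1]. `shouldSave` and `anyImageMatches` are hypothetical names, and treating a post as a match when any one of its images qualifies is an assumption about how multi-image posts are handled.

```go
package main

import "fmt"

// classification matches the assumed classifier response.
type classification struct {
	Label      string
	Confidence float64
}

// minConfidence is the 85% threshold.
const minConfidence = 0.85

// shouldSave reports whether an image passed the "is it a bird" check.
// "Exceeds" is read as a strict comparison.
func shouldSave(c classification) bool {
	return c.Label == "bird" && c.Confidence > minConfidence
}

// anyImageMatches reports whether at least one image in a post qualifies.
func anyImageMatches(results []classification) bool {
	for _, r := range results {
		if shouldSave(r) {
			return true
		}
	}
	return false
}

func main() {
	results := []classification{{"not_bird", 0.97}, {"bird", 0.88}}
	fmt.Println(anyImageMatches(results)) // true
}
```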
serving feeds
The `feedrouter` package defines the interface we expect every feed to satisfy:
To create a dynamic feed, we'll retrieve the posts previously persisted from the firehose, matching our "is it a bird" criteria.
We've already seen examples of using the `db` package to add and remove posts received from the firehose. Let's take a look at its interface:
The last two methods are what we'll use to retrieve posts. Notice how these methods expect `limit` and `cursor` integers. These are the same values passed into `GetPage` to serve a feed with pagination.
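Cursor-based pagination over integer IDs can be sketched with a toy in-memory store. This assumes posts have an auto-incrementing ID that doubles as the cursor and that feeds are served newest first; `memDB` and its methods are hypothetical stand-ins for the `db` package. Because each page asks for IDs strictly below the last one seen, new posts arriving between requests don't shift later pages or cause duplicates.

```go
package main

import (
	"fmt"
	"sort"
)

// post is a hypothetical row; ID doubles as the pagination cursor.
type post struct {
	ID  int64
	URI string
}

// memDB keeps posts ordered by ascending ID.
type memDB struct {
	posts  []post
	nextID int64
}

func (db *memDB) AddPost(uri string) {
	db.nextID++
	db.posts = append(db.posts, post{ID: db.nextID, URI: uri})
}

func (db *memDB) DeletePost(uri string) {
	for i, p := range db.posts {
		if p.URI == uri {
			db.posts = append(db.posts[:i], db.posts[i+1:]...)
			return
		}
	}
}

// GetPosts returns up to limit posts, newest first, with IDs strictly below
// cursor (cursor <= 0 starts from the newest). The returned cursor is the ID
// of the last post on the page, or 0 when the page came back short.
func (db *memDB) GetPosts(limit int, cursor int64) ([]post, int64) {
	end := len(db.posts)
	if cursor > 0 {
		// First index whose ID is >= cursor; everything before it is older.
		end = sort.Search(len(db.posts), func(i int) bool { return db.posts[i].ID >= cursor })
	}
	var page []post
	for i := end - 1; i >= 0 && len(page) < limit; i-- {
		page = append(page, db.posts[i])
	}
	if len(page) < limit {
		return page, 0
	}
	return page, page[len(page)-1].ID
}

func main() {
	db := &memDB{}
	for i := 1; i <= 5; i++ {
		db.AddPost(fmt.Sprintf("at://did:plc:example/app.bsky.feed.post/%d", i))
	}
	page, cursor := db.GetPosts(2, 0)
	for page != nil {
		fmt.Println(page, cursor)
		if cursor == 0 {
			break
		}
		page, cursor = db.GetPosts(2, cursor)
	}
}
```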
Now let's register two dynamic feeds leveraging these `db` functions in our main program. We'll use the `NewDynamicFeed` method in `main.go`, like so:
Looking at the first feed we register, the returned `justBirdsFeed` has both `GetPage` and `Describe` methods, so it satisfies the feed interface and `feedRouter` can use it to serve up our newly implemented feeds.
testing
To run the feed generator, the main dependency you'll need is Docker. With Docker installed, create a `.env` file based on `.env.example`. For now, the main variable to change is `FEED_ACTOR_DID`, which will be the DID associated with the feed you're publishing.
You can find the DID for your Bluesky handle with the following command:
With `FEED_ACTOR_DID` populated, you can start the feed generator's services by running:
Then you can see which posts get returned by one of the dynamic feeds like so:
This will return a list of AT URIs and a cursor. The AT URI looks something like this:
To see the post on Bluesky, use the DID and the post record to format the URL in your browser like so:
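The feed endpoint responds in the `app.bsky.feed.getFeedSkeleton` format: a `feed` array of objects with a `post` AT URI, plus an optional `cursor`. A minimal sketch of parsing that response and turning each post URI into a `bsky.app` link (`https://bsky.app/profile/<did>/post/<rkey>`) is below; `webURL` is a hypothetical helper that only handles `app.bsky.feed.post` URIs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// skeleton mirrors the getFeedSkeleton response: post URIs plus an optional cursor.
type skeleton struct {
	Feed []struct {
		Post string `json:"post"`
	} `json:"feed"`
	Cursor string `json:"cursor,omitempty"`
}

// webURL turns at://<did>/app.bsky.feed.post/<rkey> into a bsky.app link.
func webURL(atURI string) (string, error) {
	if !strings.HasPrefix(atURI, "at://") {
		return "", fmt.Errorf("not an AT URI: %q", atURI)
	}
	parts := strings.Split(strings.TrimPrefix(atURI, "at://"), "/")
	if len(parts) != 3 || parts[1] != "app.bsky.feed.post" {
		return "", fmt.Errorf("not a post AT URI: %q", atURI)
	}
	return fmt.Sprintf("https://bsky.app/profile/%s/post/%s", parts[0], parts[2]), nil
}

func main() {
	raw := []byte(`{"feed":[{"post":"at://did:plc:example/app.bsky.feed.post/3kabc"}],"cursor":"42"}`)
	var sk skeleton
	if err := json.Unmarshal(raw, &sk); err != nil {
		panic(err)
	}
	for _, item := range sk.Feed {
		link, err := webURL(item.Post)
		fmt.Println(link, err)
	}
	fmt.Println("next cursor:", sk.Cursor)
}
```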
deployment
While creating a full cloud deployment is outside the scope of this article, examining AT URIs isn't as fun as seeing the feed in Bluesky's app.
If you'd like to see the feed on Bluesky without a full deployment, you can use a proxy to provide a public endpoint to the local `feedgen` service. Here, we'll use `ngrok`.
Once you have installed `ngrok` and created an account, register an auth token (from the account page) like so:
Then, you can create a tunnel to your local `feedgen` service by running:
This will create a secure endpoint that forwards traffic to your local `feedgen` service:
This HTTPS URL will serve as the `SERVICE_ENDPOINT` found in `.env`. Once you've set this variable, you can use these instructions in the official Bluesky feed generator template to publish your feed.
And that's about it. Hopefully you found this guide useful and a reasonable starting point to develop your own custom feeds. If you did find it useful, be sure to give the repo a star on GitHub. Happy developing!
references
- AT Protocol Documentation
- Bluesky's `indigo` library
- The Xblock third-party labeler by @aendra.bsky.social, for inspiration on image classification
- `bsky-furry-feed`, for inspiration on firehose consumption
- The original `go-bsky-feed-generator` by @jaz.bsky.social