I argue that you shouldn't accuse your interlocutor of being insufficiently truth-seeking. This doesn't mean you can't internally model their level of truth-seeking and use that for
A prominent approach to AI safety goes under the name of "evals" or "evaluations". These are a critical component of plans that various major labs have, such as Anthropic's
I've written previously about factors impacting cooperation, especially in the presence of large disagreements. One cooperation-related factor that has been on my mind lately is trust. There are
I'm interested in how people can cooperate despite large disagreements. Part of this is because I believe such cooperation may be necessary for tackling issues in AI safety (e.g. some