Don't accuse your interlocutor of being insufficiently truth-seeking
I argue that you shouldn't accuse your interlocutor of being insufficiently truth-seeking. This doesn't mean you can't internally model their level of truth-seeking and use that for
The limits of black-box evaluations: two hypotheticals
A prominent approach to AI safety goes under the name of "evals" or "evaluations". These are a critical component of plans that various major labs have, such as Anthropic
A different take on the Musk v OpenAI preliminary injunction order
The judge in the Musk v OpenAI[1] case out of the Northern District of California has issued an order on Musk's motion for preliminary injunction, which asked for an order
Do-Not-Train Signals
What data should an ML model developer be able to train on? This post is a proposal for addressing that question. In my view, powerful ML systems will radically change the world, and
Is Agency Identifiable?
Identifiability in IRL
One of my favorite papers is this one, titled "Occam's razor is insufficient to infer the preferences of irrational agents". It relates to an area of