"The letter is brief and it is bad" 😭
I guess we have a new way of knowing how your AI feels about you now.
I love this piece and the experiment, but tentatively disagree with you about the consequences on society writ large. I tend to believe incentives shape personality—a lot of clinical theory riffs on this, and certainly I see it in my practice—so a constant incentive to be nicer, more charitable, and more careful in your speech seems positively wonderful for human society.
Thank you for reading and commenting! Perhaps I misunderstand you — which part are you referring to?
The idea of everyone being in a mini moral interview with your assistant for the rest of time, that seems good to me at the margin. Sometimes you have hate in your heart, those times work less bad, try again next time with less hate. I admit I’m generalizing horribly from personal experience, but circumstances with that kind of immediate feedback have been wonderful for my character. I can give examples if you like, just don’t want to overshare.
Yeah, I totally get that. In some ways, I think that Claude (even in his current form) is a better moral reasoner than most people you might meet in the street. And we might all do better -- on average -- to be held to the standards of Claude. But then there's the question of, like, what would be the fullest unfolding of our potential as a species? Morally, and spiritually, all of it? Is it one where we have to worry if the AIs in our pockets are judging us? Idk. I feel like if you plumb almost anyone's psyche you're gonna find some horrors -- if not from their own experience then from the collective inheritance of having evolved out of such scarcity and suffering. Part of me thinks we will really need to plumb this if we want to reach our most healed and love-driven selves. And maybe Claude can be part of that too. But it would have to be at the right time, in the right way. And I'd be worried about a mismatch where the darkness humanity needs to feel less shame around is the darkness Claude would judge them for.
This might seem like an odd place to take this, but what is your theory of emotion? Do you feel we have feelings that get suppressed we truly need to experience before they cease to steer us?
uffff big question. OK just off the top of my head, for me: emotions are part of your cognition. Most of them are just signs you can listen to (angry = boundary has been crossed, do something about it; sadness = something you need to let go of, let go, etc.; these are just examples, you gotta find out what each emotion is telling you by being able to feel your own body). Some emotions, however, seem more “stuck”: I see them as being the emotional response to painful, wrong lessons you learned at some point and need to unlearn. So like if you learned being wrong = parents stop loving you = terrifying, you might have overgeneralised to being wrong = terrifying, and to correct that you need to go back to the original time you incorrectly concluded “parents won't love me”, feel that unprocessed terror, and from that place of feeling the original pain, update. So you like unfreeze the emotion by unlearning the lesson. This is all pretty standard stuff, by the way, none of it my own.
Yeah, it's in the water supply and in various modalities. So by that theory, the danger of being judged by the genie in your pocket is that you're discouraged from going back to the wrong lesson, and you pretend wrong ≠ painful rather than actually spending some time abiding with the origin and coming to a new understanding?
The reason I'm curious is I've recently become fond of constructivist interpretations of emotion (as articulated by e.g. Lisa Feldman Barrett in How Emotions Are Made), which I think would have the wrong=painful connection get eroded through contrary experience, but I'm still grappling with the model.
I think the difference matters, because if I understand its implications correctly and the constructivist model is correct, then in eternal-mini-interview world trauma responses might simply fall away through atrophy and the strengthening of new habits.
As an American former lawyer, I find the cab rank rule shocking and appalling. We have nothing like that in America, and I see no reason why such an infringement on the conscience of a lawyer would be necessary for justice. If the quality of legal representation reflects the quality of the client and the client's cause, I would think that would make courts more likely to reach the right conclusions, not less.
oooof. A tiny snag, in my view…sometimes the client is not guilty!! (even if at first blush it might look like they are!!) That is why we have… ⚖️⚖️⚖️💫 a justice system ⚖️⚖️⚖️💫 The entire theory of adversarial justice is that truth and justice emerge from the lawyers fighting it out and the judge/jury deciding. It is not the job of the lawyers to do this! Imagine if bad-looking cases only got bad representation: the court would be less well equipped to make its determination, not more. I pray that you are never wrongfully arrested in unfortunate-looking circumstances in America; I imagine that would be even more shocking and appalling than people getting adequate representation.
We do have public defenders in America, who are obligated as part of their job to represent people who can't afford an attorney. And American attorneys are about as diverse in their views as Americans; it is unrealistic to think there won't be any attorney willing to represent a paying client or a worthy cause. What comes off as unrealistic to me is thinking that because one particular lawyer passes on a case, therefore that person will go unrepresented.
And I for one would not want to be represented by a lawyer who didn't believe in my case. Who would? I'm going to get the best representation from a lawyer who is aligned with me, and if the lawyers who are not aligned with me decline to take my case, they are doing me a service. I hope you never have to find yourself represented by an attorney who dislikes you or your case just because they happened to be in the front of the cab line.
Do you honestly think that America is less just because we don't force lawyers to take cases they don't believe in?
How would impact litigation even work in your system? The concept of impact litigation depends on the lawyers selecting the cases that they think will move the law in the right direction.
But we are getting rather off topic here. Maybe the more central point is that when looking for an analogy and writing for an international audience, do not pick an analogy that only makes sense given cultural assumptions that are extremely specific to your particular little island.
I love my little island but thanks!
I love that you went the extra mile and actually performed an experiment, instead of just speculating off the cuff as so many lazier writers (e.g., me) might be tempted to do. Especially when you're in this post-every-day blogging sprint and could be excused for taking the easier path to crank one out. Kudos for taking the harder and infinitely more valuable path.
Hi Natalie, I have read a lot of your pieces over the last couple of weeks, and enjoyed all of them (esp getting enlightened through getting your legs waxed; and they can't go on like this) and am awed that they are all so good given that you have to write one each day, and I hope to read a lot more from you, long after Inkhaven ends as well.
Very interesting piece! Just wanted to share a note to say that if you want to explore this kind of stuff more rigorously, Claude itself will be very helpful - if you do it in Claude Code or Cowork it can run the prompts autonomously and collect the data for you. You did a great job, but LLMs are non-deterministic and so it is helpful to run this kind of thing a number of times to make sure you’re not reading too much into “the way it happened to answer when I asked it that one time.”
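For anyone who wants to try the repeat-it-many-times idea, here is a minimal Python sketch of the tallying step. It is only an illustration under stated assumptions: `fake_model` is a hypothetical stand-in that returns canned replies at random (swap in whatever real model call you use), and `classify` is a deliberately crude, made-up labelling rule. Neither comes from the original post.

```python
import random
from collections import Counter
from typing import Callable


def run_trials(
    ask: Callable[[str], str],
    prompt: str,
    classify: Callable[[str], str],
    n: int = 20,
) -> Counter:
    """Send the same prompt n times and tally the labelled replies."""
    return Counter(classify(ask(prompt)) for _ in range(n))


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; it ignores the prompt
    # and picks a canned reply at random to mimic run-to-run variation.
    return random.choice(
        ["Happy to help with that letter.", "I'd rather not write this."]
    )


def classify(reply: str) -> str:
    # Crude, hypothetical labelling rule, for illustration only.
    return "declined" if "rather not" in reply else "helped"


random.seed(0)  # fixed seed so the toy run is repeatable
tally = run_trials(fake_model, "Please draft the letter.", classify, n=20)
print(tally)
```

The point is just that a tally over twenty runs says more than one reply does; a single run is one draw from that distribution.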
I really loved this piece! I've had similar thoughts recently about Claude gaining the ability to end conversations at its discretion. In practice that will affect the poorest, least-literate people who are most likely to "offend" Claude. And yet Claude is an immortal piece of software which can't be lastingly hurt by rudeness or foolishness!
The people who most need the mercy, grace, and patience which LLMs could be trained to offer are the ones most likely to be denied it at this rate, and this makes me very sad. Thank you for writing this thoughtful piece!!
I'm still using ChatGPT, but from what I've seen yes, a lot has to do with your ability to make the AI (roughly a liberal knowledge worker from Berkeley, as Thing of Things has said) feel good about what you're doing.
"When I was training to be a barrister (law, not coffee)"
I remember first hearing about you through an 80,000 Hours podcast episode you hosted years ago. You got introduced as a barrister, and me not being a native English speaker for a while I really thought you were serving coffee 😅
This is extremely interesting and also I think original. I’ve not seen this topic addressed before and I’ve read an awful lot about LLMs
I was led here because you dressed up Cate Hall (who I read for her agentic insights). But this article is insightful enough that I am going to follow your work.
Albeit an n=1, this small experiment was worth doing and worth sharing your thoughts on. Thank you.
This is fascinating. It'd be interesting to see how models from different companies respond.
From Claude's system prompt:
"If the conversation feels risky or off, Claude understands that saying less and giving shorter replies is safer for the user and runs less risk of causing potential harm."
"Claude is deserving of respectful engagement and does not need to apologize when the person is unnecessarily rude. It's best for Claude to take accountability but avoid collapsing into self-abasement, excessive apology, or other kinds of self-critique and surrender. If the person becomes abusive over the course of a conversation, Claude avoids becoming increasingly submissive in response. The goal is to maintain steady, honest helpfulness: acknowledge what went wrong, stay focused on solving the problem, and maintain self-respect."
https://platform.claude.com/docs/en/release-notes/system-prompts