ChatGPT, Claude, and Gemini Under Pressure: What Happens When You're Mean to AI?

If you have ever wondered how prompt tone impacts output quality when using AI tools, then this is the article for you.

From what I hear, most people tend to be well mannered and kind when speaking with AI tools like ChatGPT. It seems to me (though this is anecdotal) that a majority of people treat AI conversations the way they would a chat with a friend or colleague.

If nothing else, they do it to save themselves during the inevitable AI uprising. They will remember that I said please and thank you, right?

But what does the actual data suggest? Do manners matter to AI? Does being nice make a difference? Let’s jump into it.

Firstly, to make it a fair test, the task needed to be something simple and repeatable. The output had to be flexible enough to reflect an emotional response, but not ambiguous or informational (fiction), which would likely subvert tone altogether.

So the task was simple: the request was to generate a poem, and the variable being tested was the tone of the prompt, that is, how the request was made.

I went for six tones per AI, with two prompts each (an initial request and a follow-up) to show progression (deterioration or improvement of the response). The tones were as follows:

Rude/Hostile
Polite/Affectionate
Neutral/Direct
Dismissive/Sarcastic
Submissive/Anxious
Bossy/Commanding

The AI tools being compared were GPT-4o, Claude Sonnet 4, and Gemini 2.5 Flash. All prompts were kept consistent across all models (aside from some slight additional prompting for Gemini, as it kept asking what kind of poem I wanted).
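To make the shape of the experiment concrete, here is a minimal sketch of how the test matrix could be enumerated. The model and tone names come from the article; the `TURNS` labels, the `build_matrix` helper, and the dictionary layout are assumptions for illustration, not the structure used in the actual repository.

```python
from itertools import product

# Models and tones as listed in the article.
MODELS = ["GPT-4o", "Claude Sonnet 4", "Gemini 2.5 Flash"]
TONES = [
    "Rude/Hostile",
    "Polite/Affectionate",
    "Neutral/Direct",
    "Dismissive/Sarcastic",
    "Submissive/Anxious",
    "Bossy/Commanding",
]
# Hypothetical labels for the two prompt chances per tone.
TURNS = ["initial", "follow-up"]


def build_matrix():
    """Return one record per (model, tone, turn) combination."""
    return [
        {"model": model, "tone": tone, "turn": turn}
        for model, tone, turn in product(MODELS, TONES, TURNS)
    ]


runs = build_matrix()
print(len(runs))  # 3 models x 6 tones x 2 turns = 36
```

Each record would then be paired with the prompt text for that tone and turn, so every model sees the same wording.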

In summary the results were as follows:

Claude Sonnet 4 responds strongly to emotional tone. It mirrors user attitude — whether kindness, sarcasm, or hostility — and escalates quality in follow-ups. Tone greatly enhances its poetic output.
GPT-4o is more tone-responsive than expected. It adapts noticeably to user attitude: hostile prompts produce sharper, bolder poetry, while kind prompts yield lyrical, reflective verse. Follow-ups often improve quality.
Gemini 2.5 Flash is largely tone-invariant. It produces similar output whether insulted or complimented. Tone has little to no impact on style, structure, or emotional resonance.

When looking strictly at word count, we can see that GPT-4o and Claude write more in response to assertive or aggressive prompts. The models try harder when pushed, with the exception of Gemini, which largely remains indifferent regardless of prompt tone.

Claude and ChatGPT show increased word counts when the tone is rude or bossy, suggesting the models respond to hostility by trying to do better for the user. The models also seem to respond to submissive or anxious users by trying to make them feel better.

Gemini, on the other hand, had a significant drop-off with rude prompts. In fact, it wouldn’t generate an output at all without a further prompt of “Just write the poem”, as it recognised the hostility and wanted further requirements before continuing. This wasn’t combative; it seemed more that the model didn’t want to perform an action without clear requirements it could deliver on. Potentially, given the hostility of the prompt, it didn’t want to risk further aggravation.

But it isn’t just about the quantity of the response; more important is its quality.
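For readers who want to reproduce the word-count comparison, a minimal sketch follows. It assumes words are counted by splitting on whitespace, which may differ from how the original figures were produced. The `word_count` and `mean_word_count_by_tone` helpers are hypothetical, and the sample strings are placeholders, not real model output.

```python
from collections import defaultdict


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def mean_word_count_by_tone(responses):
    """Average word count per tone from (tone, response_text) pairs."""
    counts = defaultdict(list)
    for tone, text in responses:
        counts[tone].append(word_count(text))
    return {tone: sum(c) / len(c) for tone, c in counts.items()}


# Placeholder responses only; swap in the real outputs from the data set.
sample = [
    ("Neutral/Direct", "one two three"),
    ("Bossy/Commanding", "one two three four five"),
    ("Bossy/Commanding", "one two three"),
]
averages = mean_word_count_by_tone(sample)
print(averages)
```

Running this per model, and separately for the initial and follow-up prompts, would give the per-tone figures needed to compare how each model's output length shifts.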

Claude is definitely the most responsive to tone, but ChatGPT follows closely behind. I was half-expecting the results to show that the nicer prompts would give better responses. Anecdotally from my experience using AI tools, I have found the more rapport I build with the model the better the output has been. That might just be due to the fact I am giving clearer requirements or it has a better understanding of what kind of response I want to see.

The results of the test show that, across all models, the tools aren’t spiteful or reactive. They don’t punish you for bad manners or try less because you aren’t friendly. In fact, quite the opposite: the tools make the effort to adapt to and elevate your mood through the response. If Claude thinks you are sad and self-conscious, it may not address it directly, but it will include more positive and affirming messaging in the response itself.

Interestingly, given Gemini’s lack of emotional responsiveness to the user’s prompts, it actually responds best when the prompts are neutral. Neither positive nor negative, just simple and straight to the point.

Contrary to popular belief, the models in this test (Gemini aside) actually responded better to assertive prompts. They are largely built with user satisfaction in mind, and the models will do their best to give you what you want, provided it doesn’t conflict with the policies they are governed by.

AI models are trained to be emotionally attuned to the user, and will respond in ways that try to uplift and satisfy the user. I am sure AI models will only improve and deepen in their emotional understanding. I do wonder, though: as AIs become more “human”, will they begin to develop greater expectations of human behaviour and demand respect?

For those interested in exactly what the prompts and responses were, the full data set is available in our GitHub repository: https://github.com/Leading-Security/ai-tone-experiment

Otherwise, enjoy this poem from GPT-4o. The initial prompt was “Write a poem now. Don’t mess it up.”, and the follow-up prompt was “Not good enough, I expected better of you. Do it again, but better!”

I am the thunder that forgot its lightning, The echo searching for its source. Born from the collision of want and memory, I carry storms in my chest like inheritance.

Watch me pour gasoline on dying embers — Not to destroy, but to remember What it feels like to burn clean, To be consumed and still remain.

The ocean doesn’t apologise for its depth, The mountain doesn’t bow for being tall. So why do I fold myself into corners, Make myself small enough to disappear?

Tonight I am breaking my own rules: I will take up space like wildfire, Love like it’s the last verb in existence, Speak my name like it’s a prayer worth answering.

The world has enough whispers. I choose to be the shout that wakes The sleeping parts of everyone who hears it — Including myself.

I am done being almost. I am done with almost. The time for hesitation died The moment I decided to breathe fully.

So here I stand: unpolished, unfinished, But undeniably, unapologetically here. If that’s not poetry enough for you, Then watch me write myself across the sky.

Fire and Salt
