Comparison · AI Tools · Custom Models

Why ChatGPT, DALL-E, and Midjourney can't generate YOUR pet (and what actually does)

Generic AI image tools paint a stereotype of your pet's breed, not your pet. Here's the technical reason why, what each tool can and can't do, and the only approach that captures real likeness.

The PawModel Team

April 26, 2026 · 7 min read

If you've spent an evening typing prompts like "oil painting of my black-and-white border collie named Mochi" into ChatGPT or Midjourney, you already know the punchline: the result is a border collie. Black and white. In oil paint. Not Mochi.

It's not your prompt. It's the technology. Here's exactly why every general-purpose AI image tool will give you "a random dog" — and what changes when you switch to a custom-trained model.

What generic AI is genuinely great at

Let's give credit first. ChatGPT (with DALL-E 3), Midjourney, Stable Diffusion, Imagen, Firefly — these tools are incredible at:

  • Style — oil paintings, watercolors, anime, photorealism, fantasy. They've seen millions of examples per style.
  • Composition — framing, lighting, depth of field, painterly brushwork.
  • Concepts and scenes — "a dog in a wizard's tower at sunset" works because each piece has rich training data.
  • Generic likeness — they know what a golden retriever looks like, what a calico cat looks like, what a chihuahua looks like.

What they can't do is identity. They've never seen Mochi. They've seen a hundred million border collies, and they'll average them.

The technical reason (in plain English)

A general image model has weights: a giant set of numbers that encode "what things look like" based on millions of training images. When you prompt it, it samples an image from near the center of its training distribution for the description you gave.

The center of "border collie" is the average border collie. Your dog isn't average. Mochi has a specific muzzle length, a specific eye color, a slightly crooked left ear, a chest blaze that's narrower than typical. None of that is in the model's weights, because none of it was ever in the training data.

You can't fix this by being more specific in the prompt. "Border collie with one floppy ear and a narrow chest blaze" is just a different prompt — the model still has to guess which floppy ear, which narrow blaze. It will guess differently every generation. Identity isn't something you can compose out of words.
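To make that concrete, here is a toy sketch in Python. It is not how a diffusion model works internally; the three trait names, the 0 to 1 scale, and the `generate` function are all made up for illustration. The only point is that a prompt narrows a range, and every trait it leaves open gets drawn fresh on each run.

```python
import random

# Hypothetical traits on an arbitrary 0-1 scale. A real model encodes far
# more than three numbers; this is a toy stand-in.
BREED_TRAITS = ["muzzle_length", "blaze_width", "left_ear_tilt"]


def generate(prompt_constraints, seed):
    """Sample one 'dog'. Traits named in the prompt get a narrower range;
    everything else is drawn from the breed-wide range."""
    rng = random.Random(seed)
    dog = {}
    for trait in BREED_TRAITS:
        low, high = prompt_constraints.get(trait, (0.0, 1.0))
        dog[trait] = round(rng.uniform(low, high), 2)
    return dog


# "Narrow chest blaze" narrows one range, but it is still a range.
prompt = {"blaze_width": (0.0, 0.3)}
samples = [generate(prompt, seed) for seed in range(5)]
for dog in samples:
    print(dog)
```

Every sample satisfies the prompt, and no two need agree on the muzzle or the ear. Which one is Mochi? None of them, except by luck.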

What about reference images? (GPT-4o in ChatGPT, Midjourney --cref, etc.)

Fair pushback: most modern tools do accept a reference image. ChatGPT (with GPT-4o) takes image inputs. Midjourney has --cref (character reference). Stable Diffusion has IP-Adapter. They each let you point at a photo and say "make it look like this."

These features work, with three real limits:

  • They're hints, not memory. The model isn't learning your pet across sessions. It's nudging this single generation toward the photo, then forgetting. Open a new chat tomorrow and you start over.
  • Likeness fades across styles. A reference photo holds up reasonably in similar styles (photo-to-photo, photo-to-light-illustration). Push toward an oil painting, anime, or superhero version and the model leans on the artistic style template more than your pet's specific features. Mochi becomes "a generic border collie, in anime style."
  • Built for humans, not pets. Midjourney's --cref is designed around faces and is noticeably weaker on animals. Pet-focused users consistently report drift on multi-style runs.

Reference images are useful for one-off shots when you're patient enough to regenerate until you get a keeper. They fall down when you want a consistent library — birthday cards, holiday cards, a series of styled portraits, a reel where the same dog appears in 6 different scenes — because each prompt is a fresh roll of the dice with new drift.
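The "roll of the dice" framing can be put into rough numbers. The sketch below assumes each generation is independent and has the same chance of being a keeper; the 0.5 rate is a hypothetical placeholder, not a measured figure for any tool.

```python
def chance_all_match(p_keeper, n_images):
    """Probability that n independent generations are all keepers."""
    return p_keeper ** n_images


def expected_generations(p_keeper, n_images):
    """Average number of generations needed to collect n keepers
    when every miss is regenerated."""
    return n_images / p_keeper


p = 0.5  # hypothetical per-image keeper rate, an assumption for illustration
print(chance_all_match(p, 6))      # chance a 6-scene reel works first try
print(expected_generations(p, 6))  # average rolls to finish the reel
```

Under those assumptions, a six-scene reel comes out right on the first pass less than 2% of the time, and finishing it takes about twelve generations on average. One-off shots tolerate that; a library doesn't.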

The fix has a name: custom training

The technical term is fine-tuning. You take a general image model and continue training it on a small set of photos of one specific subject. Your pet. The model adjusts its weights so that "your dog" becomes a known entity — same face, same markings, every time.

In practice, the most efficient form of fine-tuning for image models is a LoRA (Low-Rank Adaptation): a tiny adapter that sits on top of the big general model and encodes "this specific pet" without touching the base weights. Training takes a few minutes, costs a few dollars, and once trained the model can generate hundreds of portraits of your pet across any style, with consistent identity.
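For the curious, the core idea fits in a few lines. The sketch below is a simplified illustration in plain Python, assuming a single square weight matrix with made-up sizes (`d = 64`, rank `r = 4`). Real image models apply adapters to many much larger layers, and the helper names here are ours, not any library's API. The adapter is two thin matrices whose product is added to the frozen base weights.

```python
import random


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def lora_effective_weights(base, b, a, scale=1.0):
    """Return base + scale * (b @ a) as a new matrix; base is not modified."""
    delta = matmul(b, a)
    return [[w + scale * dw for w, dw in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]


rng = random.Random(0)
d, r = 64, 4  # illustrative layer width and adapter rank, not real sizes

base = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]  # frozen, d x d
b = [[0.0] * r for _ in range(d)]                               # d x r, zeros
a = [[rng.gauss(0, 0.01) for _ in range(d)] for _ in range(r)]  # r x d

base_params = d * d
adapter_params = d * r + r * d
print(f"base: {base_params} values, adapter: {adapter_params} values")

# With b at zero the adapter is a no-op, so training starts from the
# base model's existing behavior.
untrained = lora_effective_weights(base, b, a)

# Pretend training nudged one adapter value: the effective weights change,
# while the base matrix itself stays exactly as it was.
b[0][0] = 0.5
adapted = lora_effective_weights(base, b, a)
```

Here the adapter holds 512 numbers against the base layer's 4,096, and `base` is never modified: the pet-specific part lives entirely in the two small matrices.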

This isn't new technology. Researchers have been doing it since 2022 (DreamBooth was the first popular method). What's new is that you can now do it without touching a Python notebook.

Tool-by-tool, honestly

ChatGPT / DALL-E 3 / GPT-4o image — Excellent general-purpose generator. Image inputs work as a reference for the current generation. No consumer fine-tuning for image models. Likeness is per-prompt — no persistent "this is my dog" the model remembers across sessions.

Midjourney — Best painterly aesthetic. --cref provides some consistency for the character you reference within a session, but it was designed for humans and likeness on pets is weaker. No user-uploaded LoRA support, no per-user fine-tuning. Stop passing the reference and the model has no memory of your pet.

Stable Diffusion (raw, via Automatic1111 / ComfyUI) — Can fine-tune via DreamBooth or LoRA training. Power-user route: install the software, build a training set, run training on a GPU, manage versions, write the prompts. Total time investment to get a working pet model: 4-12 hours and a willingness to debug Python errors. Free if you have the GPU; otherwise rent compute.

Adobe Firefly — Has Custom Models, historically gated to enterprise plans. Adobe is rolling features out unevenly across tiers — check the current plan page. Even where available, the training pipeline isn't optimized for individual pet portraits.

Pet-specific custom-model services (PawModel, etc.) — Skip the technical setup. Upload 10-20 photos. The service trains a LoRA on your pet. You get a generation interface that produces consistent portraits in any style, indefinitely, without you re-uploading reference images each time.

What changes when the model knows your pet

Once a model is trained on your pet, identity stops being a guess. The same set of features carries through every style:

  • "Watercolor portrait of [your pet]" → your pet, in watercolor.
  • "Oil painting of [your pet] as a Renaissance noble" → your pet, in Renaissance regalia.
  • "Anime version of [your pet] in a magical forest" → your pet, anime-fied.

The style is the variable. The pet is the constant. That's the inversion that makes custom-trained models worth it.

You also get cross-style consistency: a five-image holiday card in five different styles still shows the same dog. Try that with Midjourney and you'll likely get five different border collies.

When generic AI is still the right call

For full transparency: if you want a one-off "fantasy scene with a dog in it" and you don't care that the dog isn't your dog, generic AI is faster, cheaper, and more flexible. Use ChatGPT or Midjourney.

The custom-trained route is for when likeness matters. Pet portraits as gifts. Memorial art. Holiday cards your family will recognize. Reels for social where your pet is the recurring character. Birthday posts where the photo has to actually look like the birthday pet.

Try it on your pet

PawModel trains a custom adapter on 10-20 photos of your specific pet. After 10–15 minutes of training, you can generate portraits and short video reels in any style you can describe, with consistent identity. Each portrait costs 1 credit ($0.33). The first model + 25 portraits is $14.99.
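If you want to budget a bigger batch, the arithmetic is simple. This sketch uses the two prices quoted above and assumes that portraits beyond the first 25 are billed at one credit each; it ignores extra models, video reels, taxes, and any credit bundles, so treat it as an estimate and check the pricing page for the real terms.

```python
STARTER_CENTS = 1499     # first model + 25 portraits, as quoted above
STARTER_PORTRAITS = 25
CREDIT_CENTS = 33        # 1 credit = 1 portrait


def total_cost_cents(portraits):
    """Estimated total in cents, assuming every portrait past the starter
    bundle costs one credit (an assumption, not a published rule)."""
    extra = max(0, portraits - STARTER_PORTRAITS)
    return STARTER_CENTS + extra * CREDIT_CENTS


for n in (25, 40, 100):
    print(f"{n} portraits: ${total_cost_cents(n) / 100:.2f}")
```

Working in whole cents avoids floating-point rounding surprises. Under these assumptions, 40 portraits come to $19.94 and 100 portraits to $39.74.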

If you want to nail the photo set first, read the 10-photo guide.

Start your pet's portrait.
