
The Truth About Text-to-Image AI Art: Our Experience with Midjourney and Stable Diffusion

We spent several days using AI art tools such as Midjourney and Stable Diffusion to illustrate my daughter's story, and hit a wall trying to keep the character consistent across scenes. (Note: AI image generation has improved dramatically since this 2023 post. Midjourney v6+, DALL-E 3, and Flux now handle character consistency much better.)

This article was written in 2023, and some of its content may be out of date.

Have you ever tried to create art with text-to-image AI tools and failed miserably? Well, that is what happened to my daughter and me.

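Like many of you, I had watched plenty of YouTube videos and read plenty of online articles about how easy it is to use text-to-image AI tools to create art and full stories with illustrations. Some influencers (including VCs) even suggested on podcasts that they would make children's books with their kids over a weekend. Sounds simple enough, right? Especially since I had already been playing with Stable Diffusion (mainly via Dream Studio) for a while. So, "naturally", I told my daughter it would be fun to work together and turn her story (Inner truths) into a book with illustrations.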

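After several days of trying, the results were disappointing! So I am writing this post for two purposes:

  1. Share our experiences
  2. Learn something from the wisdom of the internet that can improve the situation, so that I don't disappoint my daughter.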

The tools we used

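We mainly used Midjourney and Stable Diffusion (via Dream Studio and Outpainting). I'm sure professional tools exist that can generate beautiful illustrations, because we have seen amazing work from Disney, Marvel, and other companies. But the point of many articles and videos about AI art is that you can create this kind of work with mass-market tools. :( That was overhyped.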

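Creating the main character's face was relatively easy

With a little guidance, my daughter found it fairly easy to create the main character's face for her story. As the following two images show, she had very specific details in mind for her main character.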

AI-generated artwork example from text-to-image tool

Avila Abrams, a white girl with little curly hair and it is a very dark brown color, green eyes with a hint of blue, light freckles, a loose white sweater with grey stripes, light bags under her eyes, a little frown on her face, a sharp v-shaped face, and she is wearing headphones in her ears

The first image was created within 20 minutes, and the second was created with Midjourney about an hour after that. The description (or prompt) was roughly: "Avila Abrams, a girl with little curly hair and it is a very dark brown color, green eyes with a hint of blue, light freckles, a loose white sweater with grey stripes, light bags under her eyes, a little frown on her face, a sharp v-shaped face, and she is wearing headphones in her ears."

The second image is the final version we chose.

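Then we got stuck

With the main character's face done, we wanted to generate the rest of her look and put her into the first scene. My daughter wanted her character, Avila, to wear a loose white sweater with grey stripes and dark blue skinny jeans. But we could not generate this image while keeping her face the same as in the picture above. I had been following the latest videos from "Tokenized AI by Christian Heidorn", and we tried prompts such as: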

  • /imagine [URL] description
  • /imagine wide angle shot, description --seed [seed number]
  • /imagine [URL] wide angle shot, full body image, description --seed [seed number]
  • /imagine [URL] full body image, wide angle shot, description
  • etc.

All of them failed.
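For anyone who wants to try the same combinations more systematically, here is a minimal Python sketch that only assembles the prompt text in the orders listed above. It does not talk to Midjourney; you would still paste each line in yourself. The function name `build_imagine_prompt`, the example URL, the shortened description, and the seed value are all placeholders I made up for illustration, not the values we actually used.

```python
from typing import Optional, Sequence


def build_imagine_prompt(
    description: str,
    image_url: Optional[str] = None,
    framing: Sequence[str] = (),
    seed: Optional[int] = None,
) -> str:
    """Assemble one /imagine prompt string (hypothetical helper).

    Order: optional reference image URL, optional framing phrases,
    the description, then an optional --seed parameter at the end.
    """
    parts = ["/imagine"]
    if image_url:
        parts.append(image_url)
    # Framing phrases and the description are joined with commas.
    parts.append(", ".join([*framing, description]))
    if seed is not None:
        parts.append(f"--seed {seed}")
    return " ".join(parts)


# Placeholder values for illustration only.
FACE_URL = "https://example.com/avila-face.png"
DESCRIPTION = "Avila Abrams, a loose white sweater with grey stripes, dark blue skinny jeans"
SEED = 1234

# The four combinations from the list above, in the same order.
variants = [
    build_imagine_prompt(DESCRIPTION, image_url=FACE_URL),
    build_imagine_prompt(DESCRIPTION, framing=("wide angle shot",), seed=SEED),
    build_imagine_prompt(
        DESCRIPTION,
        image_url=FACE_URL,
        framing=("wide angle shot", "full body image"),
        seed=SEED,
    ),
    build_imagine_prompt(
        DESCRIPTION,
        image_url=FACE_URL,
        framing=("full body image", "wide angle shot"),
    ),
]

for prompt in variants:
    print(prompt)
```

Writing the variants out like this at least makes it easy to see which pieces changed between attempts (reference image, framing words, seed), even though none of these combinations kept the face consistent for us.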

After that, I tried uploading Avila's face to Dream Studio to generate a full body image from it, but that failed too. We could not keep the main features of her face to a reasonable degree.

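Then I did more research and found this video from Prompt Muse. It covers a combination of "Thin Plate Motion Colab Notebook", "Out Painting", and "Dreambooth". I got stuck halfway through Thin Plate Motion because of some errors I couldn't resolve (well, I'm not a coder :|). As for Out Painting, it is based on Stable Diffusion, but the interface is very clunky. After many tries, the output was still not what we wanted.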

The first scene my daughter wanted was "Avila in a modern middle school geography classroom, wearing an olive green waterproof jacket and dark blue skinny jeans, walking away from her desk, one of the girl's hands on a dark brown leather bag." The outputs are below, and none of them is what we wanted. You can see that in some outputs the machine used a comic style, which is not what we asked for.

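We tried blending two images to see what would happen

Then I had an idea: first generate a full body image of the character with the right camera angle, then blend it with a detailed classroom image. Well, we didn't manage to make that work either. The character's face and look drifted too far from the original. The machine couldn't handle the level of classroom detail my daughter imagined. T.T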

And this is only the first scene of the story :(

I tried Bing Chat, but well, that didn't work either

I asked Bing Chat for a step-by-step guide on how to do this via Midjourney or Stable Diffusion, but what it offered was no different from the above.

Help

So what did we do wrong? I wanted this to be a fun project with my daughter. But we are stuck!

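Meanwhile, my conclusion is that these tools are not yet ready for mass-market use. They can generate a single image well, but not a series of images. Controlling the direction the character's face points and the image's "camera angle" is not easy, especially if the angle isn't wide-angle or top-down. My daughter has a very detailed scene in her imagination. These tools can't create it for us.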

Tell me in the comments: what should we do?

Last but not least, our request to Midjourney, Stable Diffusion, or similar companies: could you make life easier for us? Give us an option to keep a character's main features constant and to put the character into different scenes more easily. Right now it is just too hard T.T

Chandler
