DALL.E style (2) API 사용 달리공장

2024-03-17

chatgpt, dalle, openai

contributor: 조남경님

지난 글에서 360개의 스타일을 확인했습니다.
분명히 빠진 것들이 있는 것으로 보아 실제로는 더 있을 수도 있고,
여기 있는 스타일들이 DALL.E에서 충실히 구현된다는 보장이 없어 더 적을 수도 있습니다.
알아볼 수 있는 유일한 방법은 그림을 그려보는 것입니다.

3. OpenAI API: Image Generation

3.1. 기본 코드 작성

OpenAI API: Image Generation
OpenAI API Pricing
조남경, <미드저니 프롬프트 마스터 가이드>

ChatGPT에 접속해 프롬프트를 일일이 입력하여 이미지를 생성하고 다운받을 수 있습니다.
문제는 Plus Subscription은 3시간에 40개, Team 요금제도 3시간에 100개가 한도라는 것입니다.
그림 한 장은 20초면 그려지지만 이 이유로 360개의 그림을 그릴 때 12시간 가까이 필요합니다.
알 수 없는 이유로 그림을 거부하기라도 하거나 Network Error가 나면 그만큼 시간 손실은 커질 수 있습니다.

게다가 우리 목적은 스타일 비교입니다. 스타일을 제외한 나머지 프롬프트는 모두 동일해야 합니다.
귀찮은 일을 자동화할 때는 API가 가장 좋습니다.
그러나 API는 쓰는 만큼 지불하는 종량제입니다.

OpenAI API Key를 OPENAI_API_KEY라는 이름으로 미리 받아놓았습니다.
다음과 같은 코드를 입력하면, Self-Destructive 스타일로 a cup of coffee를 1024 x 1024로 얻을 수 있습니다.
기본 프롬프트를 a cup of coffee로 정한 것은 조남경님의 저서 <미드저니 프롬프트 마스터 가이드> 결과와 비교하기 좋게 하기 위함입니다.
스타일 이름만 덜렁 입력해도 되지만 더 스타일 이름이 엉뚱하게 동작하지 않도록 in style of 를 붙였습니다.

이미지 생성 코드입니다.

from openai import OpenAI

# OpenAI API 사용 설정
client = OpenAI(api_key=OPENAI_API_KEY)

# style, 기본 prompt 지정
style = "Self-Destructive"
prompt = f"a cup of coffee, in style of {style}, square aspect ratio"

# DALL.E3 이미지 생성
response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        quality="hd",  
        n=1,
        )

실행 결과

{'created': 1710607774,
 'data': [{'b64_json': None,
           'revised_prompt': 'Depict an artistic representation of a cup of coffee in a self-destructive style with a square aspect ratio emphasizing on disarray, chaotic and fragmented imagery. The medium should appear to be oil on canvas to mimic the intensity of traditional artworks.',
           'url': 'https://oaidalleapiprodscus.blob.core.windows.net/private/org-AhUkzJK2WZGMFoMyGMghYXRV/user-GJilwEnUIKmd9QOgEkYhUF46/img-nQiJ8EslTUU9RSU7T1rBlqGW.png?st=2024-03-16T15%3A49%3A34Z&se=2024-03-16T17%3A49%3A34Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2024-03-16T06%3A36%3A49Z&ske=2024-03-17T06%3A36%3A49Z&sks=b&skv=2021-08-06&sig=dGC3n2kgSSyFzEIgOEpcRVZktmDZJErQ/iNHyQDSoP0%3D'}]}

실행 결과 중 created는 무엇인지 알 수 없습니다. seed number라 추측되지만 확신이 없습니다.
data의 revised prompt는 실제 이미지를 생성시킨 프롬프트입니다.
제가 입력한 프롬프트보다 훨씬 깁니다.
DALL.E는 입력이 너무 짧으면 스스로의 상상력을 보태 이미지를 구체화하는 프롬프트를 작성합니다.
DALL.E의 창의력이 발현된다고 볼 수 있는 대목입니다.
이렇게 생성된 이미지는 url에 담겨 있으며, 클릭하면 아래와 같이 표현됩니다.

DALL.E가 알려준 Self-Destructive의 의미는 An art movement where the creation is designed to decay, degrade, or destroy itself after a certain period.라고 되어 있습니다.
전반적으로 설명이 잘 반영된 것으로 보입니다.
그러나 일일이 다운받으려고 시작한 일이 아닙니다.

다음과 같이 다운로드 코드를 작성합니다.

import requests
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# 생성 이미지와 수정 프롬프트, seed number 가져오기
image_url = response.data[0].url
revised_prompt = response.data[0].revised_prompt
created = response.created   # seed number?

# 이미지 다운로드
response = requests.get(image_url)
with open(filename, 'wb') as file:
    file.write(response.content)

# Revised prompt
print(f"# revised prompt: {revised_prompt}")

# created
print(f"# created: {created}")

# Load the image
img = mpimg.imread(filename)

# Display the image
plt.imshow(img)
plt.axis('off')
plt.show()

이미지 다운로드와 더불어 revised prompt와 created를 함께 출력시켰습니다.
뭔가 이상한 점이 있다면 revised prompt에서 들어갔을 것이고,
created는 혹시 모르니까요.

3.2. 함수 작성

반복 작업은 함수로 만들면 편리합니다.
위 코드를 모아서 함수로 만듭니다.
이미지를 생성하는 함수는 get_image(), 생성된 이미지를 내려받는 함수는 get_image()라는 이름으로 만듭니다.

그리고 이 둘을 엮는 run_dalle()함수를 만듭니다.

# DALL.E3 이미지 생성 & 다운로드 함수

import requests
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

def gen_image(prompt, aspect=1, retry=10):
    if aspect == 1:
        size = "1024x1024"
    elif aspect.lower() in ["landscape", "horizontal", "16:9"]:
        size = "1792x1024"
    elif aspect.lower() in ["portrait", "vertical", "9:16"]:
        size = "1024x1792"
    else:
        print("inapproprate aspect ratio: should be 'longitudinal|horizontal|16:9' or 'portrait|vertical|9:16'")
        return None

    for _ in range(retry):
        try:
            response = client.images.generate(
            model="dall-e-3",
            prompt=prompt,
            size=size,
            quality="hd",
            n=1,
            )
            if response.data[0].url:
                image_url = response.data[0].url
                revised_prompt = response.data[0].revised_prompt
                created = response.created
                break
        except:
            image_url = ""
            revised_prompt = ""
            created = ""
            pass
        
    return image_url, revised_prompt, created

def get_image(image_url, filename="image.png"):
    response = requests.get(image_url)
    with open(filename, 'wb') as file:
        file.write(response.content)

    # Load the image
    img = mpimg.imread(filename)

    # Display the image
    plt.imshow(img)
    plt.axis('off')
    plt.show()

def run_dalle(prompt, filename, aspect=1):
    with open("dalle3.log", 'a') as file:
        image_url, revised_prompt, created = gen_image(prompt, aspect=aspect)
        if image_url:
            # Revised prompt
            print(f"# revised prompt: {revised_prompt}")
            # created
            print(f"# created: {created}")
            # get image
            get_image(image_url, filename)

        else:
            print("Image generation failed.")

        file.write(f"{prompt}, {revised_prompt}, {created}, {image_url}\n")

앞서 만든 코드보다 조금 복잡해졌습니다.
gen_image()에는 aspect ratio(종횡비)를 설정하는 매개변수 aspect가 붙었고,
여기에 재시도 횟수를 결정하는 매개변수 retry가 붙었습니다.
의도하지 않은 이미지 생성 정책 위반에 걸리더라도 다시 시도하는 루틴입니다.

run_dalle()에는 로그 파일 작성 루틴이 붙었습니다.
이미지 생성과 관련된 모든 정보를 모으기 위해서입니다.

코드가 잘 도는지 확인합니다.

style = "American Gothic"
prompt = f"a cup of coffee, in style of {style}, square aspect ratio"
filename = f"./coffee/coffee_{style.replace(" ", "_")}.png"

run_dalle(prompt, filename)

실행 결과

1
2

# revised prompt: Create a square aspect ratio image of a cup of coffee, interpreted with the key aspects of American Gothic - restrained, detailed, and simple. Utilize the composition and themes common in 20th-century rural American art. The predominant medium should be oil on beaverboard.
# created: 1710609757

잘 됩니다.
이제 남은 것은 공장을 돌리는 것 뿐입니다.
어제 만든 목록을 입력받아 스타일대로 출력하게 합니다.

공장이 잘 돌아갑니다.
360개 DALL.E3 HD 이미지를 만드는 데 필요한 비용을 계산하면, 0.08 USD/EA x 360 = 28.8 USD 입니다.
1 USD ~ 1330 KRW 적용시 3만9천원 정도 됩니다.
개인, 특히 학생에게는 부담이 될 수 있는 금액이지만 이미지 하나에 100원 가량입니다.

다음 글 예고

공장의 최대 장점은 마구 찍어낼 수 있다는 점이지만,
역설적으로 마구 찍어내기만 하는 공장은 무의미합니다.
나름의 의미를 찾아보려 합니다.

도움이 되셨나요? 카페인을 투입하시면 다음 포스팅으로 변환됩니다

General

3. OpenAI API: Image Generation

3.1. 기본 코드 작성

3.2. 함수 작성

다음 글 예고