3.4 Fine-tuning Gemma-2B-it on a Single GPU

  • Fine-tune Gemma-2B-it

3.4.1 Set Up Runpod
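Once the Runpod GPU pod is running, it is worth confirming that PyTorch actually sees the GPU before loading Gemma. A minimal sanity check, assuming a standard PyTorch pod image:

import torch

# Verify that a CUDA device is visible and report its name and memory size.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("Device:", props.name)
    print("Total VRAM (GB):", round(props.total_memory / 1024**3, 1))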

3.4.2 Model Preparation

  1. Create a Hugging Face account
  2. Create an access token
  3. Use the model: https://huggingface.co/google/gemma-2b-it
  4. Run the code below:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2b-it"

# Load the pretrained causal language model.
model = AutoModelForCausalLM.from_pretrained(
    model_name,                   # Name or path of the pretrained model.
    use_cache=False,              # Disable KV caching to save memory (re-enabled for generation later).
    device_map="auto",            # Automatically place the model on the best available device(s).
    torch_dtype=torch.bfloat16,   # bfloat16 precision reduces memory usage with little quality loss.
    low_cpu_mem_usage=True,       # Reduce CPU memory usage while loading a large model.
    attn_implementation="eager",  # Use the eager attention implementation for compatibility.
)

# Load the tokenizer that matches the model.
tokenizer = AutoTokenizer.from_pretrained(model_name)
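Gemma is a gated model on the Hugging Face Hub, so the token created in step 2 must be supplied before from_pretrained can download the weights. A minimal sketch using the huggingface_hub login helper; reading the token from an HF_TOKEN environment variable is an assumption, not part of the original post:

import os
from huggingface_hub import login

# Authenticate with the access token from step 2 so the gated
# google/gemma-2b-it weights can be downloaded.
# The HF_TOKEN environment variable name is a placeholder.
login(token=os.environ["HF_TOKEN"])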

3.4.3 Prepare Dataset
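The dataset code is not listed in this excerpt; the sketch below shows one plausible way to load a Korean summarization corpus with the datasets library and reshape it into the same user/assistant chat format used at inference time. The dataset ID and column names are placeholders, not the ones used by the author:

from datasets import load_dataset

# Placeholder dataset ID; substitute the actual summarization corpus used for fine-tuning.
dataset = load_dataset("your-namespace/korean-news-summarization", split="train")

def to_chat_format(example):
    # Pair each article with its reference summary as a user/assistant exchange.
    return {
        "messages": [
            {"role": "user", "content": example["document"]},      # "document" is a placeholder column name.
            {"role": "assistant", "content": example["summary"]},  # "summary" is a placeholder column name.
        ]
    }

dataset = dataset.map(to_chat_format)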

3.4.4 Checking the Gemma Model's Capabilities

Keyword Extraction

def change_inference_chat_format(input_text):
    # Build a chat-style message list (alternating user/assistant turns) in the
    # structured format that Gemma-2B-it's chat template expects.
    return [
        {"role": "user", "content": f"{input_text}"},  # The user's input, passed in dynamically.
        # The assistant turn is prefilled with a short Korean news passage
        # (a fatal wrong-way accident in Busan involving a high-school delivery-motorcycle rider)
        # so the model has concrete text to extract keywords from.
        {"role": "assistant", "content": """부산의 한 왕복 2차선 도로에서 역주행 사고로 배달 오토바이 운전자인 고등학생이 숨지는 사고가 발생했다.
유족은 '가해자가 사고 후 곧바로 신고하지 않고 늑장 대응해 피해를 키웠다'고 주장하고 있다."""},
        # The user then asks for five important keywords ("중요한 키워드 5개를 뽑아주세요.").
        {"role": "user", "content": "중요한 키워드 5개를 뽑아주세요."},
        {"role": "assistant", "content": ""},  # Left empty; the model generates this turn.
    ]

# Build the conversation prompt from input_text (the news article defined earlier in the post).
prompt = change_inference_chat_format(input_text)

# Apply the chat template to turn the message list into model-ready input IDs.
inputs = tokenizer.apply_chat_template(
    prompt,                      # The chat exchange built above.
    tokenize=True,               # Return token IDs rather than a formatted string.
    add_generation_prompt=True,  # Append the tokens that start the assistant's turn.
    return_tensors="pt"          # Return PyTorch tensors.
).to(model.device)               # Move the input to the model's device (e.g., GPU).

# Generate the assistant's response with the model.
outputs = model.generate(
    input_ids=inputs,    # Tokenized chat prompt (already on the model's device).
    max_new_tokens=256   # Cap the length of the generated response.
)

# Decode the generated sequence and print it, skipping special tokens such as <bos>/<eos>.
print(tokenizer.decode(
    outputs[0],               # The first (and only) generated sequence.
    skip_special_tokens=True  # Strip special tokens from the output.
))
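Note that outputs[0] contains the prompt tokens followed by the newly generated ones. If only the model's answer is needed, the prompt length can be sliced off first; a small variation on the code above, not shown in the original post:

# Keep only the tokens generated after the prompt.
generated = outputs[0][inputs.shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))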

Text Summarization

# input_text: the same article as defined above

def change_inference_chat_format(input_text):
    return [
        {"role": "user", "content": f"{input_text}"},
        # Prefill the assistant turn with "한국어 요약:\n" ("Korean summary:")
        # to steer the model toward producing a Korean-language summary.
        {"role": "assistant", "content": "한국어 요약:\n"}
    ]

# Apply the chat template
prompt = change_inference_chat_format(input_text)

# Generate the summary
inputs = tokenizer.apply_chat_template(prompt, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=256, use_cache=True)  # Re-enable the KV cache for faster generation.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
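Prefilling the final assistant turn with "한국어 요약:" ("Korean summary:") is a small prompt-engineering choice: the phrase places an explicit summarization cue in the chat context, so when the generation prompt is appended the model tends to respond with a Korean-language summary of the article rather than some other continuation.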

 
