3.4 Fine-Tuning Gemma-2B-it on a Single GPU
- Fine-tune Gemma-2B-it
3.4.1 Set Up Runpod
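Once the Runpod pod is running, it helps to confirm that PyTorch actually sees the GPU before loading the model. A minimal sanity check, assuming PyTorch is already installed on the pod:

import torch

# Verify that a CUDA-capable GPU is visible to PyTorch.
print(torch.cuda.is_available())       # Expect: True
print(torch.cuda.get_device_name(0))   # The pod's GPU model
print(f"{torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB VRAM")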
3.4.2 Model Preparation
- Create a Hugging Face account
- Create an access token (Gemma is a gated model, so accept the license on the model page and authenticate with the token)
- Model: https://huggingface.co/google/gemma-2b-it
- Log in with the token (see the sketch below), then run the code that follows to load the model and tokenizer
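A minimal login sketch using huggingface_hub (the token string below is a placeholder; in practice, prefer the huggingface-cli login prompt or an environment variable over hard-coding it):

from huggingface_hub import login

# Authenticate so the gated Gemma weights can be downloaded.
# "hf_..." is a placeholder; use your own access token.
login(token="hf_...")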
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2b-it"

# Load the pretrained causal language model using the Hugging Face Transformers library.
model = AutoModelForCausalLM.from_pretrained(
    model_name,                   # Name or path of the pretrained model.
    use_cache=False,              # Disable caching of past key-value pairs to save memory (re-enabled later for generation).
    device_map="auto",            # Automatically map the model to the best available devices (e.g., GPU, CPU).
    torch_dtype=torch.bfloat16,   # bfloat16 precision reduces memory usage while maintaining performance.
    low_cpu_mem_usage=True,       # Optimize CPU memory usage during loading, helpful for large models.
    attn_implementation="eager",  # Eager attention for debugging or compatibility.
)

# Load the tokenizer that corresponds to the specified model.
tokenizer = AutoTokenizer.from_pretrained(
    model_name                    # Same name or path as the model to ensure compatibility.
)
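As a quick check that the 2B model fits on a single GPU, Transformers can report the model's in-memory size (get_memory_footprint is part of the PreTrainedModel API; bfloat16 stores roughly 2 bytes per parameter):

# Should be on the order of 5 GB for gemma-2b-it in bfloat16.
print(f"Model memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")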
3.4.3 Prepare Dataset
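The fine-tuning data is loaded in this step. As a hypothetical sketch (the dataset name below is a placeholder for illustration, not necessarily the dataset the book uses), loading and inspecting data with the datasets library looks like this:

from datasets import load_dataset

# "some-org/korean-news-summaries" is a placeholder dataset name, for illustration only.
dataset = load_dataset("some-org/korean-news-summaries")
print(dataset)               # Inspect the available splits and columns.
print(dataset["train"][0])   # Look at one example before building chat-format prompts.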
3.4.4 Checking the Gemma Model's Capabilities
Keyword Extraction

# input_text: the full news article, defined earlier in the chapter.

def change_inference_chat_format(input_text):
    # Build a conversation in the structured chat format that instruction-tuned
    # chat models expect: alternating 'user' and 'assistant' turns.
    return [
        {"role": "user", "content": f"{input_text}"},  # The user's input (the article) is set dynamically.
        # The assistant's preset response, a short statement about the article:
        # "On a two-lane road in Busan, a high-school student riding a delivery
        # motorcycle was killed in a wrong-way collision. The bereaved family claims
        # the at-fault driver delayed reporting the accident, worsening the harm."
        {"role": "assistant", "content": """부산의 한 왕복 2차선 도로에서 역주행 사고로 배달 오토바이 운전자인 고등학생이 숨지는 사고가 발생했다.
유족은 '가해자가 사고 후 곧바로 신고하지 않고 늑장 대응해 피해를 키웠다'고 주장하고 있다."""},
        # The user then asks for the five most important keywords.
        {"role": "user", "content": "중요한 키워드 5개를 뽑아주세요."},
    ]

# Build the conversation prompt from the input text.
prompt = change_inference_chat_format(input_text)

# Apply the model's chat template and tokenize the conversation.
inputs = tokenizer.apply_chat_template(
    prompt,                      # The structured chat exchange.
    tokenize=True,               # Tokenize the rendered text.
    add_generation_prompt=True,  # Append the marker that cues the model to respond
                                 # (this replaces the original trailing empty assistant turn).
    return_tensors="pt",         # Return PyTorch tensors.
).to(model.device)               # Move the input to the model's device (e.g., GPU).

# Generate the assistant's response.
outputs = model.generate(
    input_ids=inputs,            # Input tensor (already on the model's device).
    max_new_tokens=256,          # Cap the number of newly generated tokens.
)

# Decode and print the output, skipping special tokens such as <end_of_turn>.
print(tokenizer.decode(
    outputs[0],                  # The first (and only) generated sequence.
    skip_special_tokens=True,    # Strip special tokens from the output.
))
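The decode call above prints the entire sequence, prompt included. A small optional tweak (not in the original listing) slices off the prompt tokens so only the newly generated keywords are printed:

# Print only the tokens generated after the prompt.
new_tokens = outputs[0][inputs.shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))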
Data Summary

# input_text: the same news article defined above.
def change_inference_chat_format(input_text):
    return [
        {"role": "user", "content": f"{input_text}"},        # The article to summarize.
        {"role": "assistant", "content": "한국어 요약:\n"},  # Seed the reply with "Korean summary:" to steer the model.
    ]

# Apply the chat template.
prompt = change_inference_chat_format(input_text)

# Tokenize and generate.
inputs = tokenizer.apply_chat_template(prompt, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=256, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
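Note that the model was loaded with use_cache=False, so generate re-enables the KV cache here for faster decoding. If the summary comes out repetitive, optional decoding arguments can help; the settings below are standard generate parameters offered as a suggestion, not the book's configuration:

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    use_cache=True,          # Re-enable KV caching for faster autoregressive decoding.
    do_sample=False,         # Greedy decoding for a deterministic summary.
    repetition_penalty=1.1,  # Mildly discourage repeated phrases.
)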