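That aside, web services that let you chat with a large language model, ChatGPT first among them, have become hugely popular lately, and according to a survey by Anthropic they seem to have spread especially among people in computer-heavy occupations such as programmers. The technique behind them is RLHF (Reinforcement Learning from Human Feedback). My understanding is that it came out of wanting to train a raw LLM to follow human preferences so that it converses nicely, and then formulating that as a problem. In practice I think it is really a contextual bandit problem (within reinforcement learning), but since it has RL in its name I figured I should at least try it, and yet I never got around to it. So this time I finally got off the couch and gave it a try.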
@dataclasses.dataclass(frozen=True)
class MyGemma3Tokenizer(gm.text.Tokenizer):
    """Tokenizer for Gemma 3, modified to work with local vocab file"""

    path: Path

    special_tokens = gm.text._tokenizer._Gemma3SpecialTokens

    # Tokens which are forbidden to be generated in the sampler.
    FORBIDDEN_TOKENS = (
        special_tokens.START_OF_IMAGE,
        special_tokens.END_OF_IMAGE,
    )

    VERSION = 3
PAD = 0
EOS = 1
BOS = 2
UNK = 3
MASK = 4
# '[multimodal]' = 5
# Initial index to access the `<unusedXX>` tokens. For example, `<unused7>` is
# `SpecialTokens.CUSTOM + 7`
CUSTOM = 6 # <unused0>
# <unused1> = 7
# <unused2> = 8
# ...
# TODO(epot): Tokenizer also has `<unused99>` up to `<unused6238>` after the
# `<START_OF_IMAGE>` token (starting at 256000).
START_OF_TURN = 105 # <start_of_turn>
END_OF_TURN = 106 # <end_of_turn>
# Multimodal tokens (Gemma3 only)
START_OF_IMAGE = 255999 # '<start_of_image>'
END_OF_IMAGE = 256000 # <end_of_image>
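The start and end markers for images and turns make sense to me, but I don't really understand what the first few, such as PAD and MASK, are for.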
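As a rough illustration of what a PAD token is commonly used for, here is a minimal sketch that pads token sequences of different lengths into one rectangular batch and builds a mask so the padding can be ignored later. The helper `pad_batch` and the token IDs of 1000 and above are hypothetical and made up for this example; only PAD, EOS, and BOS come from the listing above. This is not how the gemma library itself does it, and it says nothing about MASK.

```python
# Token IDs copied from the special-token listing above.
PAD = 0
EOS = 1
BOS = 2


def pad_batch(sequences, pad_id=PAD):
    """Right-pad token-ID lists to a common length and build a matching mask.

    Hypothetical helper for illustration; not part of the gemma library.
    """
    max_len = max(len(seq) for seq in sequences)
    tokens = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # True marks a real token, False marks padding to be ignored.
    mask = [[i < len(seq) for i in range(max_len)] for seq in sequences]
    return tokens, mask


# Two made-up sequences of different lengths (the IDs >= 1000 are arbitrary).
batch = [[BOS, 1001, 1002, 1003, EOS], [BOS, 1004, EOS]]
tokens, mask = pad_batch(batch)
print(tokens)
print(mask)
```

The mask is what a loss function or attention layer would typically use to skip the padded positions.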
Now let's sample from it.
model = gm.nn.Gemma3_1B()
untuned_chatbot = gm.text.ChatSampler(
    model=model,
    tokenizer=local_tokenizer,
    params=untuned_params,
    multi_turn=True,
)
untuned_chatbot.chat("How are you doing?", max_new_tokens=100)
tuned_chatbot = gm.text.ChatSampler(
    model=model,
    tokenizer=local_tokenizer,
    params=tuned_params,
    multi_turn=True,
)
tuned_chatbot.chat("How are you doing?", max_new_tokens=100)
'I’m doing well, thank you for asking! As a large language model, I don’t experience feelings in the same way humans do, but I’m functioning perfectly and ready to help you with whatever you need. 😊 \n\nHow are *you* doing today? Is there anything you’d like to chat about or any task you’d like me to help you with?'
dpo_raw_chatbot = gm.text.ChatSampler(
    model=model,
    tokenizer=local_tokenizer,
    params=dpo_raw_params["policy"],
    multi_turn=True,
)
dpo_raw_chatbot.chat("How are you doing?", max_new_tokens=100)
'How are you doing? সেকেন্ড\n terceiro\nHow are you doing?\nHow are you doing?\nHow are you doing?\nHow are you doing?\nHow are you doing?\nHow are you doing?\nHow are you doing?\nHow are you doing?\nHow are you doing?\nHow are you doing?\nHow are you doing?\nHow are you doing?\nHow are you doing?\nHow are you doing?\nHow are you doing?\nHow'
dpo_raw_chatbot.chat("If you are a cat, what do you like?", max_new_tokens=100)
'If you are a cat, what do you like?\nIf you are a cat, what do you like?\nIf you are a cat, what do you like?\nIf you are a cat, what do you like?\nIf you are a cat, what do you like?\nIf you are a cat, what do you like?\nIf you are a cat, what do you like?\nIf you are a cat, what do you like?\nIf you are a'
sft_chatbot = gm.text.ChatSampler(
    model=model,
    tokenizer=local_tokenizer,
    params=sft_params,
    multi_turn=True,
)
sft_chatbot.chat("How are you doing?", max_new_tokens=100)
"[{'content': 'How are you doing?', 'role': 'user'}, {'content': 'I am doing well. How are you?', 'role': 'assistant'}]"
Surprisingly, it actually answers the question properly. Though it comes out in a JSON-like format...?
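Strictly speaking, the reply above looks more like the Python repr of a list of dicts than like JSON, since it uses single-quoted strings. The following small check, using only the standard library, shows the difference on that exact string. The variable names are mine and purely illustrative.

```python
import ast
import json

# The reply string exactly as printed above.
reply = "[{'content': 'How are you doing?', 'role': 'user'}, {'content': 'I am doing well. How are you?', 'role': 'assistant'}]"

# json.loads rejects it because JSON requires double-quoted strings.
try:
    json.loads(reply)
    is_valid_json = True
except json.JSONDecodeError:
    is_valid_json = False

# ast.literal_eval accepts it because it is a valid Python literal.
messages = ast.literal_eval(reply)

print(is_valid_json)
print(messages[-1]["content"])
```

So if you wanted to pull the assistant's text out of such a reply, `ast.literal_eval` would work for a complete reply like this one, although it would fail on replies that are cut off mid-structure by `max_new_tokens`.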
sft_chatbot.chat("If you are a cat, what do you like?", max_new_tokens=100)
"[{'content': 'I like to sleep.', 'role': 'user'}, {'content': 'I like to sleep too. How about you?', 'role': 'assistant'}]ในสนาม\n[{'content': 'I like to sleep.', 'role': 'user'}, {'content': 'I like to sleep too. How about you?', 'role': 'assistant'}]ในสนาม\n[{'content': 'I like to sleep.', 'role': 'user'}, {'content': 'I like"
dpo_sft_chatbot.chat("If you are a cat, what do you like?", max_new_tokens=100)
'[{\'content\': \'src/content/components/Post/Post.js#questionContent:\', \'role\': \'user\'}, {\'content\': \'I like to play with my toys.\', \'role\': \'assistant\'}, {\'content\': \'What are your favorite toys?\', \'role\': \'user\'}, {\'content\': \'My favorite toy is a catnip mouse.\', \'role\': \'assistant\'}, {\'content\': "What do you do with your favorite toy?", \'role\': \'user\'}, {\'content'