This section explains how to save and reload fine-tuned models (BERT, GPT, GPT-2, and Transformer-XL). To reload a fine-tuned model, you need to save three types of files:
- The model weights, saved as a serialized PyTorch model (pytorch.org/docs/stable…)
- The model's configuration, saved as a JSON file
- The vocabulary (plus, for models based on the GPT and GPT-2 BPE vocabulary, an additional merges file)
The default file names for these files are as follows:
- Model weights file: pytorch_model.bin
- Configuration file: config.json
- Vocabulary file: vocab.txt for BERT and Transformer-XL, vocab.json for GPT/GPT-2 (BPE vocabulary)
- Additional merges file for GPT/GPT-2 (BPE vocabulary): merges.txt
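As a quick sanity check, the layout above can be verified with a few lines of standard-library Python. The filename constants here are hard-coded assumptions mirroring the list above (the library itself exports WEIGHTS_NAME and CONFIG_NAME for the first two):

```python
import os

# Default filenames listed above (hard-coded assumptions; the library
# exports WEIGHTS_NAME / CONFIG_NAME constants for the first two).
WEIGHTS_NAME = "pytorch_model.bin"
CONFIG_NAME = "config.json"
VOCAB_NAMES = ("vocab.txt", "vocab.json")  # BERT/Transformer-XL vs. GPT/GPT-2

def has_default_files(output_dir):
    """Return True if output_dir holds everything from_pretrained() expects."""
    files = set(os.listdir(output_dir))
    return (WEIGHTS_NAME in files
            and CONFIG_NAME in files
            and any(name in files for name in VOCAB_NAMES))
```

If this check passes for your output directory, the from_pretrained() reloading path described next will find all the files it needs.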
If you save the model with these default file names, you can reload the model and tokenizer using the from_pretrained() method.
This is the recommended way to save the model, configuration, and vocabulary files. Save them to output_dir, then reload the model and tokenizer from there:
```python
import os
import torch
from transformers import WEIGHTS_NAME, CONFIG_NAME

output_dir = "./models/"

# Step 1: Save a fine-tuned model, configuration, and vocabulary

# If we have a distributed/parallel model, save only the wrapped model
# (it was wrapped in DistributedDataParallel, DataParallel, etc.)
model_to_save = model.module if hasattr(model, 'module') else model

# If we save with the predefined names, we can reload with from_pretrained()
output_model_file = os.path.join(output_dir, WEIGHTS_NAME)
output_config_file = os.path.join(output_dir, CONFIG_NAME)

torch.save(model_to_save.state_dict(), output_model_file)
model_to_save.config.to_json_file(output_config_file)
tokenizer.save_vocabulary(output_dir)

# Step 2: Reload the saved model and tokenizer

# BERT example
model = BertForQuestionAnswering.from_pretrained(output_dir)
tokenizer = BertTokenizer.from_pretrained(output_dir, do_lower_case=args.do_lower_case)  # Add specific options if needed

# GPT example
model = OpenAIGPTDoubleHeadsModel.from_pretrained(output_dir)
tokenizer = OpenAIGPTTokenizer.from_pretrained(output_dir)
```
If you want to use a specific path for each type of file, there is another way to save and reload the model:
```python
import torch

output_model_file = "./models/my_own_model_file.bin"
output_config_file = "./models/my_own_config_file.bin"
output_vocab_file = "./models/my_own_vocab_file.bin"

# Step 1: Save a fine-tuned model, configuration, and vocabulary

# If we have a distributed/parallel model, save only the wrapped model
model_to_save = model.module if hasattr(model, 'module') else model

torch.save(model_to_save.state_dict(), output_model_file)
model_to_save.config.to_json_file(output_config_file)
tokenizer.save_vocabulary(output_vocab_file)

# Step 2: Reload the saved model and vocabulary

# Since we did not save with the predefined weight and configuration names,
# we cannot use from_pretrained() to load the saved model.
# Here is how to do it in this case:

# BERT example
config = BertConfig.from_json_file(output_config_file)
model = BertForQuestionAnswering(config)
state_dict = torch.load(output_model_file)
model.load_state_dict(state_dict)
tokenizer = BertTokenizer(output_vocab_file, do_lower_case=args.do_lower_case)

# GPT example
config = OpenAIGPTConfig.from_json_file(output_config_file)
model = OpenAIGPTDoubleHeadsModel(config)
state_dict = torch.load(output_model_file)
model.load_state_dict(state_dict)
tokenizer = OpenAIGPTTokenizer(output_vocab_file)
```
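The custom-path variant above boils down to a state-dict save plus a JSON round-trip of the configuration. A minimal, library-free sketch of the config round-trip (TinyConfig is a made-up stand-in, not a transformers class) looks like this:

```python
import json

class TinyConfig:
    """Toy stand-in for a model configuration object (hypothetical)."""
    def __init__(self, hidden_size=768, num_layers=12):
        self.hidden_size = hidden_size
        self.num_layers = num_layers

    def to_json_file(self, path):
        # Mirrors model_to_save.config.to_json_file(output_config_file) above
        with open(path, "w") as f:
            json.dump(self.__dict__, f)

    @classmethod
    def from_json_file(cls, path):
        # Mirrors BertConfig.from_json_file(output_config_file) above
        with open(path) as f:
            return cls(**json.load(f))
```

Because the configuration is plain JSON, it can be inspected or edited by hand between saving and reloading, which is one reason the weights and configuration are kept in separate files.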
Original link: huggingface.co/transformer…