Model file path
ls /home/wenyn/.cache/modelscope/hub/damo/nlp_structbert_zero-shot-classification_chinese-base
Training pitfalls
The config says the task is zero-shot-classification
** build_dataset error log: __call__() missing 2 required positional arguments: 'hypothesis_template' and 'candidate_labels'
z-s-c probably fills these in automatically (at pipeline time)?
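The error above makes sense in hindsight: the zero-shot preprocessor's `__call__` expects `candidate_labels` and a `hypothesis_template` so it can expand one sentence into one NLI (premise, hypothesis) pair per label, and `build_dataset` calls it without them. A minimal sketch of that expansion (the function name `expand_to_nli_pairs` is mine, not ModelScope's):

```python
def expand_to_nli_pairs(sentence, candidate_labels, hypothesis_template="{}"):
    """Turn one zero-shot example into one NLI (premise, hypothesis) pair per label."""
    return [(sentence, hypothesis_template.format(label)) for label in candidate_labels]

# Each pair is then scored by the NLI model; the entailment probability
# becomes that label's score.
pairs = expand_to_nli_pairs("世界那么大,我想去看看", ["家居", "旅游"], "这是一条关于{}的新闻")
print(pairs)
```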
After trainer.train(), the dataset changes
Its config still describes the dataset in NLI terms; try switching the task to nli
Find where z-s-c sets up its dataset
It's in the configuration
Switched the task to nli; the keys don't seem to get mapped
How do the inputs get turned into features?
After training succeeds
the model turns out heavily biased instead: everything predicts '家居' (label 0)?
Before:
labels = ['家居', '旅游', '科技', '军事', '游戏', '故事']
sentence = '世界那么大,我想去看看'
print(classifier(sentence, candidate_labels=labels))
sentence = '苟利国家生死以,岂因祸福避趋之'
print(classifier(sentence, candidate_labels=labels))
{'labels': ['旅游', '故事', '游戏', '家居', '科技', '军事'], 'scores': [0.511588454246521, 0.16600897908210754, 0.11971477419137955, 0.08431538194417953, 0.06298772990703583, 0.05538470670580864]}
{'labels': ['游戏', '故事', '家居', '旅游', '军事', '科技'], 'scores': [0.24303244054317474, 0.20803643763065338, 0.17602896690368652, 0.17113320529460907, 0.11692868918180466, 0.08484029024839401]}
After:
2024-03-26 10:33:27,904 - modelscope - INFO - The key of sentence1: premise, The key of sentence2: hypothesis, The key of label: label
2024-03-26 10:33:27,913 - modelscope - INFO - The key of sentence1: premise, The key of sentence2: hypothesis, The key of label: label
Keyword arguments {'candidate_labels': ['家居', '旅游', '科技', '军事', '游戏', '故事'], 'hypothesis_template': '{}'} not recognized.
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:962: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
warnings.warn(
{'labels': ['家居'], 'scores': [1.0]}
Keyword arguments {'candidate_labels': ['家居', '旅游', '科技', '军事', '游戏', '故事'], 'hypothesis_template': '{}'} not recognized.
{'labels': ['家居'], 'scores': [1.0]}
Everything collapses to label 0 ('家居').
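A plausible reading of the collapse: the warning shows `candidate_labels` and `hypothesis_template` are no longer recognized, so the pipeline stops building one hypothesis per label and softmaxing the entailment scores across them, and nothing forces the scores to spread over the label set. The scoring step the working pipeline performs looks roughly like this (pure-Python sketch; the entailment logits are invented for illustration):

```python
import math

def zero_shot_scores(entailment_logits):
    """Softmax the per-label entailment logits so scores sum to 1 across labels."""
    exps = [math.exp(x) for x in entailment_logits]
    total = sum(exps)
    return [e / total for e in exps]

# One entailment logit per candidate label, e.g. for ['家居', '旅游', '科技'].
# The label with the highest entailment gets the highest score.
scores = zero_shot_scores([0.1, 2.3, 0.8])
print(scores)
```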
{
"framework": "pytorch",
"task": "nli", #好像必须nli? 来推理做训练?
"preprocessor": {
"type": "sen-sim-tokenizer", #用的nli
"first_sequence": "premise",
"second_sequence": "hypothesis",
"label": "label",
"label2id": {
"0": 0,
"1": 1,
"2": 2
}
},
"model": {
"type": "structbert"
},
"pipeline": {
"type": "zero-shot-classification" #看看行不行
},
"dataset": {
"train": {
"first_sequence": "premise",
"second_sequence": "hypothesis",
"label": "label"
}
},
"train": {
"work_dir": "/tmp",
"max_epochs": 5,
"dataset": {
"train": {
"labels": [
"0",
"1",
"2"
],
"first_sequence": "premise",
"second_sequence": "hypothesis",
"label": "label"
}
},
"dataloader": {
"batch_size_per_gpu": 32,
"workers_per_gpu": 1
},
"optimizer": {
"type": "AdamW",
"lr": 2e-5,
"options": {}
},
"lr_scheduler": {
"type": "LinearLR",
"start_factor": 1.0,
"end_factor": 0.0,
"total_iters": 10,
"options": {
"by_epoch": false
}
},
"hooks": [
{
"type": "CheckpointHook",
"interval": 1
},
{
"type": "TextLoggerHook",
"interval": 1
},
{
"type": "IterTimerHook"
},
{
"type": "EvaluationHook",
"by_epoch": false,
"interval": 100
}
]
},
"evaluation": {
"dataloader": {
"batch_size_per_gpu": 16,
"workers_per_gpu": 1,
"shuffle": false
},
"metrics": [
"Metrics.seq_cls_metric" #要加
]
}
}
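Note the config above is not valid JSON as written, because of the `#` annotations; they have to be stripped before feeding it to `json.loads` (or removed entirely from the real configuration.json). A small helper, assuming `#` never occurs inside a quoted string value:

```python
import json

def load_annotated_json(text):
    """Parse JSON after stripping '#' end-of-line annotations.

    Assumes '#' never occurs inside a quoted string value.
    """
    cleaned = "\n".join(line.split("#", 1)[0] for line in text.splitlines())
    return json.loads(cleaned)

cfg = load_annotated_json('{"task": "nli", "metrics": ["seq-cls-metric"] # comment\n}')
print(cfg["task"])  # nli
```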
Besides that, if you only want to download the model locally, there is also a lower-level API, snapshot_download(). It downloads the model directly, and you can specify the download location.
from modelscope.hub.snapshot_download import snapshot_download
model_dir = snapshot_download('damo/nlp_structbert_word-segmentation_chinese-base', cache_dir='path/to/local/dir')
No idea where it went; find it and delete it.
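When `cache_dir` is not passed, snapshot_download defaults (in the versions I've used; verify against your installed modelscope) to the same cache root as the `ls` at the top of these notes, so the word-segmentation model should land under a path like this sketch constructs:

```python
from pathlib import Path

# Assumed default ModelScope cache root (it matches the ls path at the top of
# these notes); confirm against your modelscope version before deleting anything.
cache_root = Path.home() / ".cache" / "modelscope" / "hub"
model_dir = cache_root / "damo" / "nlp_structbert_word-segmentation_chinese-base"
print(model_dir)
if model_dir.exists():
    # Once confirmed, remove it with shutil.rmtree(model_dir).
    print("found the downloaded snapshot")
```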
Remote out-of-sync bug: path not found? looping while resolving the path?
This seems to have happened before: PyCharm's remote view shows the file, but neither the local machine nor the command line inside docker has it.
(does the docker command line need a restart?)
KeyError: 'Metrics.seq_cls_metric is not in the metrics registry group default. Please make sure the correct version of ModelScope library is used.'
try:
remove the metrics setting
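Rather than dropping metrics entirely, the likely root cause is that the JSON contains the literal string "Metrics.seq_cls_metric" instead of the key it resolves to: in the modelscope versions I've checked, `Metrics.seq_cls_metric` is a Python constant whose value is the string "seq-cls-metric", and the registry is keyed by that value. A toy registry illustrating the mismatch (the registry contents here are stand-ins, not modelscope's real ones):

```python
# Toy stand-in for modelscope's metrics registry, keyed by the string value
# ('seq-cls-metric'), not by the Python attribute path ('Metrics.seq_cls_metric').
METRICS_REGISTRY = {"seq-cls-metric": "SequenceClassificationMetric"}

def lookup(name):
    if name not in METRICS_REGISTRY:
        raise KeyError(f"{name} is not in the metrics registry")
    return METRICS_REGISTRY[name]

print(lookup("seq-cls-metric"))   # resolves fine
try:
    lookup("Metrics.seq_cls_metric")  # reproduces the KeyError above
except KeyError as e:
    print(e)
```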
MMRotate: "xxxxx is not in the model registry" when running a custom model - Zhihu (zhihu.com)