Problem description
While training a model, clip_image_processor failed to process a sample, which suggests that the dataset contains dirty data. The data used here is LAION-Aesthetics-V2-6.5plus, downloaded from https://dagshub.com/DagsHub-Datasets/LAION-Aesthetics-V2-6.5plus.
Traceback (most recent call last):
  ...
  File "/xxx/check_train_data.py", line 69, in __getitem__
    raise e  # Re-raise the exception to halt the training process
    ^^^^^^^
  File "/xxx/check_train_data.py", line 64, in __getitem__
    clip_image = self.clip_image_processor(images=raw_image, return_tensors="pt").pixel_values
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/xxx/lib/python3.12/site-packages/transformers/image_processing_utils.py", line 41, in __call__
    return self.preprocess(images, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/xxx/lib/python3.12/site-packages/transformers/models/clip/image_processing_clip.py", line 341, in preprocess
    self.normalize(image=image, mean=image_mean, std=image_std, input_data_format=input_data_format)
  File "/xxx/lib/python3.12/site-packages/transformers/image_processing_utils.py", line 111, in normalize
    return normalize(
           ^^^^^^^^^^
  File "/xxx/lib/python3.12/site-packages/transformers/image_transforms.py", line 392, in normalize
    raise ValueError(f"mean must have {num_channels} elements if it is an iterable, got {len(mean)}")
ValueError: mean must have 1 elements if it is an iterable, got 3

Solution
Wrap the original clip_image = self.clip_image_processor(...) call in try/except to find the image that triggers the error. Pull the data-loading code out of the training script and iterate over the whole dataset once.

    # read image
    raw_image = Image.open(os.path.join(self.image_root_path, image_file))
    image = self.transform(raw_image.convert("RGB"))
    # clip_image = self.clip_image_processor(images=raw_image, return_tensors="pt").pixel_values
    try:
        clip_image = self.clip_image_processor(images=raw_image, return_tensors="pt").pixel_values
        print(f"image_file_{idx} processed with clip_image_processor: {image_file}")
    except Exception as e:
        print(f"Error processing image_file_{idx}: {image_file}")
        print(e)
        raise e  # Re-raise the exception to halt the training process

The loop eventually stopped near image 4235. Inspecting the files by eye showed that image 4236 was empty. After manually deleting image 4236 and its corresponding json text, training ran normally.
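The manual search above can probably be shortened by scanning the image folder before training. The following is only a minimal sketch, not the method used in this post: the function name find_suspect_images, the extension list, and the assumption that all images sit in one flat directory are hypothetical. It uses Pillow to flag files that are zero bytes, cannot be opened, or do not have exactly 3 channels (the error message complains about a 1-channel image meeting a 3-element mean). A flagged file is a candidate to inspect, not necessarily a broken one.

```python
import os

from PIL import Image

# Hypothetical list of extensions to scan; adjust to the dataset at hand.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".webp")


def find_suspect_images(image_root_path):
    """Return (file_name, reason) pairs for images that look like dirty data."""
    suspects = []
    for name in sorted(os.listdir(image_root_path)):
        if not name.lower().endswith(IMAGE_EXTENSIONS):
            continue
        path = os.path.join(image_root_path, name)

        # A zero-byte file can never be decoded.
        if os.path.getsize(path) == 0:
            suspects.append((name, "empty file"))
            continue

        try:
            # verify() checks file integrity; the image has to be
            # reopened afterwards before it can be used again.
            with Image.open(path) as img:
                img.verify()
            with Image.open(path) as img:
                mode = img.mode
                bands = len(img.getbands())
        except Exception as exc:
            suspects.append((name, f"cannot be opened: {exc}"))
            continue

        # Assumption: the pipeline expects 3-channel input, as the
        # 3-element mean in the error message suggests.
        if bands != 3:
            suspects.append((name, f"mode {mode} has {bands} channel(s), expected 3"))
    return suspects
```

Calling find_suspect_images on the image directory and printing each returned pair gives a short list of files to check by hand, together with their json text, before deleting anything.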