

news 2026/4/24 21:36:31

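🍨 This post is a study log from the 🔗365-day deep learning training camp
🍖 Original author: K同学啊

SVM and Ensemble Learning

SVM

Hyperplane: SVM searches the feature space for the hyperplane that maximizes the margin between classes, called the maximum-margin hyperplane. This hyperplane is the boundary that splits the dataset into its classes.

Support vectors: the support vectors are the samples closest to the separating hyperplane, and they determine its position and orientation. In other words, only these samples affect the classification result; the remaining samples do not.

SVM linear model

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Preprocessing: fit the scaler on the training set only, then apply it to the test set
# (fitting on all of X before splitting leaks test-set statistics into training)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Create the SVM model with a linear kernel
svm = SVC(kernel='linear', C=1.0)

# Train the model
svm.fit(X_train, y_train)

# Predict
y_pred = svm.predict(X_test)

# Evaluate model performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
```

SVM nonlinear model

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Preprocessing: fit the scaler on the training set only, then apply it to the test set
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Create the SVM model with an RBF (Gaussian) kernel; gamma controls the kernel width
svm = SVC(kernel='rbf', C=1.0, gamma=0.1)

# Train the model
svm.fit(X_train, y_train)

# Predict
y_pred = svm.predict(X_test)

# Evaluate model performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
```

Common SVM parameters

C (default 1.0)
  ○ Purpose: penalty parameter that balances maximizing the margin against penalizing misclassification.
  ○ Explanation: a larger C penalizes misclassification more heavily, so the model tries to classify more training points correctly, but the margin may shrink and the model may overfit. A smaller C puts more weight on a wide margin and tolerates more misclassifications, which can improve generalization.
  ○ Typical range: usually tuned between 0.001 and 1000.
kernel (default 'rbf')
  ○ Purpose: selects the kernel function, i.e. which (non)linear mapping is used.
  ○ Options:
    ■ 'linear': linear kernel, no nonlinear mapping.
    ■ 'poly': polynomial kernel, typically for data that a polynomial boundary can separate.
    ■ 'rbf': radial basis function kernel, also called the Gaussian kernel; the most commonly used nonlinear kernel.
    ■ 'sigmoid': similar to a neural-network activation function; used less often.
    ■ You can also pass a custom kernel as a callable.
degree (default 3)
  ○ Purpose: the degree of the polynomial kernel when kernel='poly'.
  ○ Explanation: with the polynomial kernel, degree sets the order of the polynomial, usually 2 or 3. Other kernels ignore it.
gamma (default 'scale')
  ○ Purpose: kernel coefficient for the 'rbf', 'poly' and 'sigmoid' kernels.
  ○ Options:
    ■ 'scale': uses 1 / (n_features * X.var()), which adapts automatically to the number of features and the variance of the input.
    ■ 'auto': uses 1 / n_features.
  ○ Explanation: a larger gamma makes the model fit the training data more closely and can lead to overfitting; a smaller gamma gives a smoother decision boundary.
coef0 (default 0.0)
  ○ Purpose: the independent term of the kernel function; only meaningful for kernel='poly' or kernel='sigmoid'.
  ○ Explanation: controls the offset in the polynomial and sigmoid kernels.

Ensemble learning

Bagging: at prediction time, classification tasks use simple (majority) voting and regression tasks use simple averaging. If two classes receive the same number of votes in a classification prediction, one of them is chosen at random.

Boosting (how it works):
Weak learners: the weak learners in Boosting are usually models that perform only slightly better than random guessing, typically simple models such as shallow decision trees.
Weighted training: in each iteration, Boosting adjusts the weight of every sample, increasing the weights of the samples the previous model got wrong so that later learners focus on these hard-to-classify samples.
Weighted voting: the final model combines the predictions of all weak learners with weights, usually by weighted voting (classification) or weighted averaging (regression).

Random forest: an ensemble learning algorithm used mainly for classification and regression. A random forest combines many decision trees to improve accuracy and robustness, in the following steps:
Random sampling: draw several sample sets at random, with replacement, from the original training data (usually each the same size as the original), one training set per decision tree.
Building the decision trees: for each sample set, grow a decision tree that considers only a random subset of the features when choosing splits. Nodes are split using criteria such as information gain or the Gini index.
Ensemble prediction: for classification, the random forest takes a vote over all trees' predictions and outputs the class with the most votes; for regression, it outputs the average of all trees' predictions.

Importing the data

```python
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

data = pd.read_csv(r"C:\Users\11054\Desktop\kLearning\L678_learning\data.csv")
data
```

Inspecting the data

```python
data.info()
```

```python
import matplotlib.pyplot as plt

plt.rcParams['font.family'] = 'SimHei'      # default font SimHei, so Chinese labels can render
plt.rcParams['axes.unicode_minus'] = False  # avoids "Glyph 8722 (MINUS SIGN) missing" warnings with SimHei

# Column name -> display name used in the plot titles
feature_map = {
    'Temperature': 'Temperature',
    'Humidity': 'Humidity (%)',
    'Wind Speed': 'Wind speed',
    'Precipitation (%)': 'Precipitation (%)',
    'Atmospheric Pressure': 'Atmospheric pressure',
    'UV Index': 'UV index',
    'Visibility (km)': 'Visibility (km)',
}

plt.figure(figsize=(15, 10))

# One box plot per numeric feature on a 2 x 4 grid
for i, (col, col_name) in enumerate(feature_map.items(), 1):
    plt.subplot(2, 4, i)
    sns.boxplot(y=data[col])
    plt.title(f'Box plot of {col_name}', fontsize=14)
    plt.ylabel('Value', fontsize=12)
    plt.grid(axis='y', linestyle='--', alpha=0.7)

plt.tight_layout()
plt.show()
```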
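The box plots above follow the usual box-plot rule (seaborn's `boxplot` defaults to `whis=1.5`): points lying more than 1.5 × IQR below the first quartile or above the third quartile are drawn as outliers. A minimal sketch that counts those points per column, so the plots can be checked numerically. The helper `iqr_outlier_counts` and the toy DataFrame are hypothetical and not part of the original notebook; on the real data you would pass `data` and the keys of `feature_map`.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, cols, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: count values outside [Q1 - k*IQR, Q3 + k*IQR] per column."""
    counts = {}
    for col in cols:
        q1, q3 = df[col].quantile([0.25, 0.75])  # linear interpolation (pandas default)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        # A value is an outlier if it falls below the lower or above the upper fence
        counts[col] = int(((df[col] < lower) | (df[col] > upper)).sum())
    return pd.Series(counts)


# Hypothetical toy data with one extreme value per column
toy = pd.DataFrame({
    "Temperature": [10, 12, 11, 13, 12, 95],
    "Humidity": [60, 65, 70, 62, 68, 109],
})

# expected: Temperature 1, Humidity 1 (the 95 and the 109)
print(iqr_outlier_counts(toy, ["Temperature", "Humidity"]))
```

This rule adapts to each column's spread, so its counts need not match counts based on fixed physical thresholds such as 60 °C or 100 %.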
```python
n_rows = data.shape[0]
n_temp = data[data['Temperature'] > 60].shape[0]
n_humidity = data[data['Humidity'] > 100].shape[0]
n_precip = data[data['Precipitation (%)'] > 100].shape[0]

print(f"Rows with temperature above 60°C: {n_temp}, {round(n_temp / n_rows * 100, 2)}% of the data.")
print(f"Rows with humidity above 100%: {n_humidity}, {round(n_humidity / n_rows * 100, 2)}% of the data.")
print(f"Rows with precipitation above 100%: {n_precip}, {round(n_precip / n_rows * 100, 2)}% of the data.")
```

```
Rows with temperature above 60°C: 207, 1.57% of the data.
Rows with humidity above 100%: 416, 3.15% of the data.
Rows with precipitation above 100%: 392, 2.97% of the data.
```

Data analysis

```python
data.describe(include='all')
```

```python
plt.figure(figsize=(20, 15))

plt.subplot(3, 4, 1)
sns.histplot(data['Temperature'], kde=True, bins=20)
plt.title('Temperature distribution')
plt.xlabel('Temperature')
plt.ylabel('Count')

plt.subplot(3, 4, 2)
sns.boxplot(y=data['Humidity'])
plt.title('Humidity (%) box plot')
plt.ylabel('Humidity (%)')

plt.subplot(3, 4, 3)
sns.histplot(data['Wind Speed'], kde=True, bins=20)
plt.title('Wind speed distribution')
plt.xlabel('Wind speed (km/h)')
plt.ylabel('Count')

plt.subplot(3, 4, 4)
sns.boxplot(y=data['Precipitation (%)'])
plt.title('Precipitation (%) box plot')
plt.ylabel('Precipitation (%)')

plt.subplot(3, 4, 5)
sns.countplot(x='Cloud Cover', data=data)
plt.title('Cloud cover (category) distribution')
plt.xlabel('Cloud cover (category)')
plt.ylabel('Count')

plt.subplot(3, 4, 6)
sns.histplot(data['Atmospheric Pressure'], kde=True, bins=10)
plt.title('Atmospheric pressure distribution')
plt.xlabel('Pressure (hPa)')
plt.ylabel('Count')

plt.subplot(3, 4, 7)
sns.histplot(data['UV Index'], kde=True, bins=14)
plt.title('UV index distribution')
plt.xlabel('UV index')
plt.ylabel('Count')

plt.subplot(3, 4, 8)
Season_counts = data['Season'].value_counts()
plt.pie(Season_counts, labels=Season_counts.index, autopct='%1.1f%%', startangle=140)
plt.title('Season distribution')

plt.subplot(3, 4, 9)
sns.histplot(data['Visibility (km)'], kde=True, bins=10)
plt.title('Visibility distribution')
plt.xlabel('Visibility (km)')
plt.ylabel('Count')

plt.subplot(3, 4, 10)
sns.countplot(x='Location', data=data)
plt.title('Location distribution')
plt.xlabel('Location')
plt.ylabel('Count')

plt.subplot(3, 4, (11, 12))  # the last panel spans grid cells 11 and 12
sns.countplot(x='Weather Type', data=data)
plt.title('Weather type distribution')
plt.xlabel('Weather type')
plt.ylabel('Count')

plt.tight_layout()
plt.show()
```

Random forest model

```python
new_data = data.copy()
label_encoders = {}
categorical_features = ['Cloud Cover', 'Season', 'Location', 'Weather Type']

# Encode each categorical column as integers and keep the fitted encoder
for feature in categorical_features:
    le = LabelEncoder()
    new_data[feature] = le.fit_transform(data[feature])
    label_encoders[feature] = le

# Print the integer -> category mapping of each encoded column
for feature in categorical_features:
    print(f"Mapping for feature '{feature}':")
    for index, class_ in enumerate(label_encoders[feature].classes_):
        print(f"  {index}: {class_}")
```

```
Mapping for feature 'Cloud Cover':
  0: clear
  1: cloudy
  2: overcast
  3: partly cloudy
Mapping for feature 'Season':
  0: Autumn
  1: Spring
  2: Summer
  3: Winter
Mapping for feature 'Location':
  0: coastal
  1: inland
  2: mountain
Mapping for feature 'Weather Type':
  0: Cloudy
  1: Rainy
  2: Snowy
  3: Sunny
```

```python
# Build features x and target y
x = new_data.drop(['Weather Type'], axis=1)
y = new_data['Weather Type']

# Split the dataset
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=15)

# Build and train the random forest model
rf_clf = RandomForestClassifier(random_state=15)
rf_clf.fit(x_train, y_train)
```

Prediction results

```python
y_pred_rf = rf_clf.predict(x_test)
class_report_rf = classification_report(y_test, y_pred_rf)
print(class_report_rf)
```

Labels 0 to 3 correspond to Cloudy, Rainy, Snowy and Sunny, following the Weather Type mapping above.

```
              precision    recall  f1-score   support

           0       0.87      0.93      0.90      1018
           1       0.93      0.91      0.92       967
           2       0.96      0.92      0.94      1007
           3       0.91      0.91      0.91       968

    accuracy                           0.92      3960
   macro avg       0.92      0.91      0.92      3960
weighted avg       0.92      0.92      0.92      3960
```

Result analysis

```python
feature_importances = rf_clf.feature_importances_
features_rf = pd.DataFrame({'Feature': x.columns, 'Importance': feature_importances})
features_rf.sort_values(by='Importance', ascending=False, inplace=True)

plt.figure(figsize=(10, 8))
sns.barplot(x='Importance', y='Feature', data=features_rf)
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.title('Random forest feature importance')
plt.show()
```

Personal summary

I learned how to use the random forest model and understood the basic principles of SVM and ensemble learning.
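The SVM section at the start of this post says that only the support vectors determine the separating hyperplane. A minimal sketch to check that claim with scikit-learn, assuming a binary subset of iris (setosa vs. versicolor) and a linear kernel: fit `SVC`, refit on the rows listed in `support_` alone, and compare the weight vectors and predictions. The very small `tol` is an assumption added here so that both fits converge to nearly the same optimum.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Binary subset of iris: setosa (0) vs. versicolor (1)
X, y = load_iris(return_X_y=True)
mask = y < 2
X, y = X[mask], y[mask]

# Fit on all samples; tight tolerance (assumption) so the solver converges closely
full = SVC(kernel="linear", C=1.0, tol=1e-9).fit(X, y)
sv_idx = full.support_  # indices of the support vectors in X

# Refit using only the support vectors
sv_only = SVC(kernel="linear", C=1.0, tol=1e-9).fit(X[sv_idx], y[sv_idx])

print("samples:", len(X), "support vectors:", len(sv_idx))
print("max |w_full - w_sv|:", np.abs(full.coef_ - sv_only.coef_).max())
print("same predictions:", bool((full.predict(X) == sv_only.predict(X)).all()))
```

Samples that are not support vectors have a zero dual coefficient, and the primal objective is strictly convex in w, so removing them leaves the optimal w unchanged; any small difference printed here comes from the solver tolerance.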
