当前位置: 首页 > news >正文

上海公司做网站如何设计网站的首页

上海公司做网站,如何设计网站的首页,北京设计装修公司排名,做短视频网站需要审批目录 决策树优化与可视化 1 决策树分类 2 决策树可视化 3 显示树的特征重要性 特征重要性可视化 决策树回归 1 决策树回归 决策树优化与可视化 1 决策树分类 from sklearn.datasets import load_breast_cancer from sklearn.tree import DecisionTreeClassifier from sk…

目录

决策树优化与可视化

1 决策树分类

2 决策树可视化

3 显示树的特征重要性

 特征重要性可视化

决策树回归

1 决策树回归


决策树优化与可视化

1 决策树分类

from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import datasets
import matplotlib.pyplot as plt
import numpy as npcancer = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, stratify=cancer.target, random_state = 42)
tree = DecisionTreeClassifier(random_state=0)tree.fit(X_train, y_train)
print("Accuracy on traning set:{:.3f}".format(tree.score(X_train, y_train)))
print("Accuracy on test set:{:.3f}".format(tree.score(X_test, y_test)))
print("tree max depth:{}".format(tree. tree_.max_depth))
# 报错:AttributeError: 'function' object has no attribute 'data' function对象没有data属性
# 解决之后:
#Accuracy on traning set:1.000
#Accuracy on test set:0.937
#tree max depth:7

可以得到,训练集的精度是100%,这是因为叶子结点都是纯的,树的深度为7,足以完美地记住训练数据的所有标签,测试集泛化精度只有93.7%,明显过拟合。

不限制决策树的深度,它的深度和复杂度都可以变得特别大。故未剪枝的树容易过拟合,对新数据的泛化性能不佳。

现在将预剪枝应用在决策树上,可以阻止树的完全生长。

设置max_depth=4,这表明构造的决策树只有4层,限制树的深度可以减少过拟合,这会降低训练集的精度,但可以提高测试集的精度。

from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import datasets
import matplotlib.pyplot as plt
import numpy as npcancer = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, stratify=cancer.target, random_state = 42)
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
print("Accuracy on traning set:{:.3f}".format(tree.score(X_train, y_train)))
print("Accuracy on test set:{:.3f}".format(tree.score(X_test, y_test)))
Accuracy on traning set:0.988
Accuracy on test set:0.951

训练精度为98.8%,测试精度为95.1%,树的最大深度只有4层,降低了训练精度,但提高了泛化(测试)精度,改善了过拟合的状况。

2 决策树可视化

 

 使用 pip3 install graphviz 后, import graphviz 仍然报错:

ModuleNotFoundError: No module named 'graphviz'

使用命令:conda install python-graphviz;


 

from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import datasets
import matplotlib.pyplot as plt
import numpy as np
import graphviz
from sklearn.tree import export_graphviz
cancer = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, stratify=cancer.target, random_state = 42)
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
export_graphviz(tree,out_file="tree.dot",class_names=["malignat","benign"],feature_names=cancer.feature_names,impurity=False,filled=True)with open("tree.dot") as f:dot_graph = f.read() 
graphviz.Source(dot_graph)# out:ModuleNotFoundError: No module named 'graphviz'

尝试了很多种方法并没有解决问题‼️

http://t.csdn.cn/wAVEK ⬅️可用此方法再次验证

3 显示树的特征重要性

其中最常用的是特征重要性(Feature Importance),每个特征对树决策的重要性进行排序, 其中0表示“根本没用到”,1表示“完美预测目标值”,特征重要性的求和始终为1。

from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import datasets
import matplotlib.pyplot as plt
import numpy as npcancer = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, stratify=cancer.target, random_state = 42)
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
print("Feature imprtance:\n{}".format(tree.feature_importances_))

Feature imprtance:
[0.         0.         0.         0.         0.         0.0.         0.         0.         0.         0.01019737 0.048398250.         0.         0.0024156  0.         0.         0.0.         0.         0.72682851 0.0458159  0.         0.0.0141577  0.         0.018188   0.1221132  0.01188548 0.        ]

 特征重要性可视化

from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import datasets
import matplotlib.pyplot as plt
import numpy as npcancer = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, stratify=cancer.target, random_state = 42)
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
print("Feature imprtance:\n{}".format(tree.feature_importances_))def plot_feature_importances_cancer(model):n_features = cancer.data.shape[1]plt.barh(range(n_features),model.feature_importances_,align='center')plt.yticks(np.arange(n_features),cancer.feature_names)plt.xlabel("Feature importance")plt.ylabel("Feature")plot_feature_importances_cancer(tree)

 


决策树回归

1 决策树回归

#决策树回归
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
boston = datasets.load_boston()X = boston.data
y = boston.target
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, random_state=666)# DecisionTreeRegressor决策树的回归器
from sklearn.tree import DecisionTreeRegressor
dt_reg = DecisionTreeRegressor( max_depth= 11 )
dt_reg.fit(X_train, y_train)
print(dt_reg.score(X_test,y_test))
print(dt_reg.score(X_train,y_train))
# 0.6005800948958887
# 1.0# 此时决策树在训练数据集上预测准确率是百分百的,但是在测试数据集上只有60%的准确率
# 很显然出现了过拟合,可通过设置树深来改善过拟合
# 0.6908496704356424
# 0.9918292293652428

此时决策树在训练数据集上预测准确率是百分百的,但是在测试数据集上只有60%的准确率,很显然出现了过拟合,可通过设置树深来改善过拟合。

http://www.hkea.cn/news/948152/

相关文章:

  • 做网站时如何去掉网站横条小红书软文案例
  • 图虫南宁百度快速排名优化
  • 上城网站建设app推广文案
  • 网站建设特点宁波seo搜索引擎优化公司
  • 地产商网站建设网球新闻最新消息
  • 做爰全过程网站免费的视频谷歌seo搜索引擎
  • 怎么架设网站seo推广培训
  • 自己网站做问卷调查网页设计学生作业模板
  • 清远企业网站排名深圳网站建设系统
  • 互助平台网站建设费用卡点视频免费制作软件
  • 上海做b2b国际网站公司排名优化公司电话
  • 裙晖wordpress重庆seo整站优化
  • 乌克兰网站后缀谷歌浏览器下载电脑版
  • 建设部网站撤销注册资质的都是公职人员吗正规网络公司关键词排名优化
  • 杂志网站建设推广方案铜川网络推广
  • 网站建设后怎么搜索引擎优化解释
  • 网站建设维护 天博网络成都营销型网站制作
  • 秦皇岛北京网站建设百度广告投放电话
  • 团购做的比较好的网站营销推广ppt
  • 网站怎么做网站地图重庆网站制作公司哪家好
  • wordpress改地址后打不开seo品牌优化整站优化
  • 网页设计师证书含金量高吗百度网络优化
  • 咸阳网站开发长沙seo优化公司
  • 网站通cms国内十大搜索引擎排名
  • centos7安装 wordpress网站如何进行seo
  • 设计师灵感网站美国今天刚刚发生的新闻
  • 重庆南岸营销型网站建设公司推荐竞价sem托管
  • 深圳做二维码网站建设什么是互联网营销
  • 网易企业邮箱收费标准百色seo关键词优化公司
  • 做网站的财务需求张北网站seo