当前位置: 首页 > news >正文

网站什么意思石家庄seo扣费

网站什么意思,石家庄seo扣费,代理公司注册代理公司注册汇发财税,做家教什么网站比较好join是两个结果集之间的链接,需要进行数据的匹配。 演示一下join是否存在shuffle。 1. 如果两个rdd没有分区器,分区个数一致 ,会发生shuffle。但分区数量不变。 scala> val arr Array(("zhangsan",300),("lisi",…

join是两个结果集之间的链接,需要进行数据的匹配。

演示一下join是否存在shuffle。

1. 如果两个rdd没有分区器,分区个数一致

,会发生shuffle。但分区数量不变。

scala> val arr = Array(("zhangsan",300),("lisi",400),("wangwu",350),("zhaosi",450))
arr: Array[(String, Int)] = Array((zhangsan,300), (lisi,400), (wangwu,350), (zhaosi,450))scala> val arr1 = Array(("zhangsan",22),("lisi",24),("wangwu",30),("guangkun",5))
arr1: Array[(String, Int)] = Array((zhangsan,22), (lisi,24), (wangwu,30), (guangkun,5))scala> sc.makeRDD(arr,3)
res116: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[108] at makeRDD at <console>:27scala> sc.makeRDD(arr1,3)
res117: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[109] at makeRDD at <console>:27scala> res116 join res117
res118: org.apache.spark.rdd.RDD[(String, (Int, Int))] = MapPartitionsRDD[112] at join at <console>:28scala> res118.collect
res119: Array[(String, (Int, Int))] = Array((zhangsan,(300,22)), (wangwu,(350,30)), (lisi,(400,24)))

2. 如果分区个数不一致,有shuffle,且产生的rdd的分区个数以多的为主。

3. 如果分区个数一样并且分区器一样,那么是没有shuffle的

scala> sc.makeRDD(arr,3)
res128: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[118] at makeRDD at <console>:27scala> sc.makeRDD(arr1,3)
res129: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[119] at makeRDD at <console>:27scala> res128.reduceByKey(_+_)
res130: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[120] at reduceByKey at <console>:26scala> res129.reduceByKey(_+_)
res131: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[121] at reduceByKey at <console>:26scala> res130 join res131
res132: org.apache.spark.rdd.RDD[(String, (Int, Int))] = MapPartitionsRDD[124] at join at <console>:28scala> res132.collect
res133: Array[(String, (Int, Int))] = Array((zhangsan,(300,22)), (wangwu,(350,30)), (lisi,(400,24)))scala> res132.partitions.size
res134: Int = 3

4. 都存在分区器但是分区个数不同,也会存在shuffle

scala> val arr = Array(("zhangsan",300),("lisi",400),("wangwu",350),("zhaosi",450))
arr: Array[(String, Int)] = Array((zhangsan,300), (lisi,400), (wangwu,350), (zhaosi,450))scala>  val arr1 = Array(("zhangsan",22),("lisi",24),("wangwu",30),("guangkun",5))
arr1: Array[(String, Int)] = Array((zhangsan,22), (lisi,24), (wangwu,30), (guangkun,5))scala> sc.makeRDD(arr,3)
res0: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at makeRDD at <console>:27scala> sc.makeRDD(arr1,4)
res1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[1] at makeRDD at <console>:27scala> res0.reduceByKey(_+_)
res2: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[2] at reduceByKey at <console>:26scala> res1.reduceByKey(_+_)
res3: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[3] at reduceByKey at <console>:26scala> res2 join res3
res4: org.apache.spark.rdd.RDD[(String, (Int, Int))] = MapPartitionsRDD[6] at join at <console>:28scala> res4.collect
res5: Array[(String, (Int, Int))] = Array((zhangsan,(300,22)), (wangwu,(350,30)), (lisi,(400,24)))scala> res4.partitions.size
res6: Int = 4

这里为啥stage3里reduceByKey和join过程是连在一起的,因为分区多的RDD是不需要进行shuffle的,数据该在哪个分区就在哪个分区,反而是分区少的RDD要进行join,要进行数据的打散。

分区以多的为主。

5. 一个带有分区器一个没有分区器,那么以带有分区器的rdd分区数量为主,并且存在shuffle

scala> arr
res7: Array[(String, Int)] = Array((zhangsan,300), (lisi,400), (wangwu,350), (zhaosi,450))scala> arr1
res8: Array[(String, Int)] = Array((zhangsan,22), (lisi,24), (wangwu,30), (guangkun,5))scala> sc.makeRDD(arr,3)
res9: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[7] at makeRDD at <console>:27scala> sc.makeRDD(arr,4)
res10: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[8] at makeRDD at <console>:27scala> res9.reduceByKey(_+_)
res11: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[9] at reduceByKey at <console>:26scala> res10 join res11
res12: org.apache.spark.rdd.RDD[(String, (Int, Int))] = MapPartitionsRDD[12] at join at <console>:28scala> res12.partitions.size
res13: Int = 3scala> res12.collect
res14: Array[(String, (Int, Int))] = Array((zhangsan,(300,300)), (wangwu,(350,350)), (lisi,(400,400)), (zhaosi,(450,450)))

同理,stage6的reduceByKey过程和join过程是连在一起的,是因为有分区器的RDD并不需要进行shuffle操作,原来的数据该在哪在哪,而没有分区器的RDD要进行join要进行数据的打散,有shuffle过程,所以有stage4到stage6的连线。

http://www.hkea.cn/news/159355/

相关文章:

  • 贸易网站建设互联网广告代理加盟
  • 深圳网站建设网络公司河北关键词排名推广
  • 在工商网上怎么注册公司seo优化博客
  • 免费的小程序怎么赚钱历下区百度seo
  • 河北石家庄最新疫情最新消息优化防疫政策
  • 一站式做网站哪家强新闻小学生摘抄
  • 江西南昌网站建设公司哪家好谷歌google 官网下载
  • 公司网站用什么开发百度指数怎么用
  • 建站主机 wordpress济南网站万词优化
  • 哈尔滨app开发seo自学网官网
  • 网站答辩ppt怎么做全网关键词云在哪里看
  • 网站建设 视频seo关键词词库
  • 网站应用软件设计成都网站建设技术外包
  • 用哪个软件做网站网址查询域名解析
  • 网站安全优化域名停靠浏览器
  • 我做中医培训去哪个网站找学员谷歌排名算法
  • 如何将网站让百度收录网店培训班
  • wordpress旧版页面编辑界面百度seo推广计划类型包括
  • 网站建设茶店网网站换友链平台
  • 珠海建设工程信息网站网络营销百度百科
  • 帮别人做网站推广犯法吗关键词排名网站
  • 建设通网站是政府的么高端网站定制设计
  • 玉溪做网站的公司夸克搜索网页版
  • wordpress导航主题haowseo挂机赚钱
  • 广州做家教的网站深圳网络推广招聘
  • 锐捷网络公司排名seo技术介绍
  • 新圩做网站公司拼多多代运营一般多少钱
  • 免费网站可以做cpa?短视频营销的优势
  • b2b外贸营销型网站如何做电商赚钱
  • 建设无障碍网站seo分析报告怎么写