当前位置: 首页 > news >正文

南昌住房城市建设支行官方网站关键词seo优化

南昌住房城市建设支行官方网站,关键词seo优化,wordpress 全站静态化,用rp怎么做网站功能按钮文章最前: 我是Octopus,这个名字来源于我的中文名--章鱼;我热爱编程、热爱算法、热爱开源。所有源码在我的个人github ;这博客是记录我学习的点点滴滴,如果您对 Python、Java、AI、算法有兴趣,可以关注我的…

文章最前: 我是Octopus,这个名字来源于我的中文名--章鱼;我热爱编程、热爱算法、热爱开源。所有源码在我的个人github ;这博客是记录我学习的点点滴滴,如果您对 Python、Java、AI、算法有兴趣,可以关注我的动态,一起学习,共同进步。

   相关文章:

  1. PySpark 概述
  2. Spark连接快速入门
  3. Spark上使用pandas API快速入门

创建pyspark对象

import warnings
warnings.filterwarnings('ignore')
#import pandas as pd
#import numpy as np
from datetime import timedelta, date, datetime
import time
import gc
import os
import argparse                             
import sysfrom pyspark.sql import SparkSession, functions as fn
from pyspark.ml.feature import StringIndexer
from pyspark.ml.recommendation import ALS
from pyspark.sql.types import *
from pyspark import StorageLevel
spark = SparkSession \.builder \.appName("stockout_test") \.config("hive.exec.dynamic.partition.mode", "nonstrict") \.config("spark.sql.sources.partitionOverwriteMode", "dynamic")\.config("spark.driver.memory", '20g')\.config("spark.executor.memory", '40g')\.config("spark.yarn.executor.memoryOverhead", '1g')\.config("spark.executor.instances", 8)\.config("spark.executor.cores", 8)\.config("spark.kryoserializer.buffer.max", '128m')\.config("spark.yarn.queue", 'root.algo')\.config("spark.executorEnv.OMP_NUM_THREADS", 12)\.config("spark.executorEnv.ARROW_PRE_0_15_IPC_FORMAT", 1) \.config("spark.default.parallelism", 800)\.enableHiveSupport() \.getOrCreate()
spark.sql("set hive.exec.dynamic.partition.mode = nonstrict")
spark.sql("set hive.exec.dynamic.partition=true")
spark.sql("set spark.sql.autoBroadcastJoinThreshold=-1")

创建DataFrame

employee_salary = [("zhangsan", "IT", 8000),("lisi", "IT", 7000),("wangwu", "IT", 7500),("zhaoliu", "ALGO", 10000),("qisan", "IT", 8000),("bajiu", "ALGO", 12000),("james", "ALGO", 11000),("wangzai", "INCREASE", 7000),("carter", "INCREASE", 8000),("kobe", "IT", 9000)]columns= ["name", "department", "salary"]
df = spark.createDataFrame(data = employee_salary, schema = columns)
df.show()
+--------+----------+------+
|    name|department|salary|
+--------+----------+------+
|zhangsan|        IT|  8000|
|    lisi|        IT|  7000|
|  wangwu|        IT|  7500|
| zhaoliu|      ALGO| 10000|
|   qisan|        IT|  8000|
|   bajiu|      ALGO| 12000|
|   james|      ALGO| 11000|
| wangzai|  INCREASE|  7000|
|  carter|  INCREASE|  8000|
|    kobe|        IT|  9000|
+--------+----------+------+

row_number()

from pyspark.sql.window import Window
import pyspark.sql.functions as FwindowSpec  = Window.partitionBy("department").orderBy(F.desc("salary"))
df.withColumn("row_number", F.row_number().over(windowSpec)).show(truncate=False)
+--------+----------+------+----------+
|name    |department|salary|row_number|
+--------+----------+------+----------+
|carter  |INCREASE  |8000  |1         |
|wangzai |INCREASE  |7000  |2         |
|kobe    |IT        |9000  |1         |
|zhangsan|IT        |8000  |2         |
|qisan   |IT        |8000  |3         |
|wangwu  |IT        |7500  |4         |
|lisi    |IT        |7000  |5         |
|bajiu   |ALGO      |12000 |1         |
|james   |ALGO      |11000 |2         |
|zhaoliu |ALGO      |10000 |3         |
+--------+----------+------+----------+

Rank()

from pyspark.sql.window import Window
import pyspark.sql.functions as FwindowSpec  = Window.partitionBy("department").orderBy(F.desc("salary"))
df.withColumn("rank",F.rank().over(windowSpec)).show(truncate=False)
+--------+----------+------+----+
|name    |department|salary|rank|
+--------+----------+------+----+
|carter  |INCREASE  |8000  |1   |
|wangzai |INCREASE  |7000  |2   |
|kobe    |IT        |9000  |1   |
|qisan   |IT        |8000  |2   |
|zhangsan|IT        |8000  |2   |
|wangwu  |IT        |7500  |4   |
|lisi    |IT        |7000  |5   |
|bajiu   |ALGO      |12000 |1   |
|james   |ALGO      |11000 |2   |
|zhaoliu |ALGO      |10000 |3   |
+--------+----------+------+----+

dense_rank()

from pyspark.sql.window import Window
import pyspark.sql.functions as FwindowSpec  = Window.partitionBy("department").orderBy(F.desc("salary"))
df.withColumn("dense_rank",F.dense_rank().over(windowSpec)).show()
+--------+----------+------+----------+
|    name|department|salary|dense_rank|
+--------+----------+------+----------+
|  carter|  INCREASE|  8000|         1|
| wangzai|  INCREASE|  7000|         2|
|    kobe|        IT|  9000|         1|
|   qisan|        IT|  8000|         2|
|zhangsan|        IT|  8000|         2|
|  wangwu|        IT|  7500|         3|
|    lisi|        IT|  7000|         4|
|   bajiu|      ALGO| 12000|         1|
|   james|      ALGO| 11000|         2|
| zhaoliu|      ALGO| 10000|         3|
+--------+----------+------+----------+

lag()

from pyspark.sql.window import Window
import pyspark.sql.functions as FwindowSpec  = Window.partitionBy("department").orderBy(F.desc("salary"))
df.withColumn("lag",F.lag("salary",1).over(windowSpec)).show()
+--------+----------+------+-----+
|    name|department|salary|  lag|
+--------+----------+------+-----+
|  carter|  INCREASE|  8000| null|
| wangzai|  INCREASE|  7000| 8000|
|    kobe|        IT|  9000| null|
|zhangsan|        IT|  8000| 9000|
|   qisan|        IT|  8000| 8000|
|  wangwu|        IT|  7500| 8000|
|    lisi|        IT|  7000| 7500|
|   bajiu|      ALGO| 12000| null|
|   james|      ALGO| 11000|12000|
| zhaoliu|      ALGO| 10000|11000|
+--------+----------+------+-----+

lead()

from pyspark.sql.window import Window
import pyspark.sql.functions as FwindowSpec  = Window.partitionBy("department").orderBy(F.desc("salary"))
df.withColumn("lead",F.lead("salary", 1).over(windowSpec)).show()
+--------+----------+------+-----+
|    name|department|salary| lead|
+--------+----------+------+-----+
|  carter|  INCREASE|  8000| 7000|
| wangzai|  INCREASE|  7000| null|
|    kobe|        IT|  9000| 8000|
|zhangsan|        IT|  8000| 8000|
|   qisan|        IT|  8000| 7500|
|  wangwu|        IT|  7500| 7000|
|    lisi|        IT|  7000| null|
|   bajiu|      ALGO| 12000|11000|
|   james|      ALGO| 11000|10000|
| zhaoliu|      ALGO| 10000| null|
+--------+----------+------+-----+

Aggregate Functions

from pyspark.sql.window import Window
import pyspark.sql.functions as FwindowSpec  = Window.partitionBy("department").orderBy(F.desc("salary"))
windowSpecAgg  = Window.partitionBy("department")df.withColumn("row", F.row_number().over(windowSpec)) \.withColumn("avg", F.avg("salary").over(windowSpecAgg)) \.withColumn("sum", F.sum("salary").over(windowSpecAgg)) \.withColumn("min", F.min("salary").over(windowSpecAgg)) \.withColumn("max", F.max("salary").over(windowSpecAgg)) \.withColumn("count", F.count("salary").over(windowSpecAgg)) \.withColumn("distinct_count", F.approx_count_distinct("salary").over(windowSpecAgg)) \.show()
+--------+----------+------+---+-------+-----+-----+-----+-----+--------------+
|    name|department|salary|row|    avg|  sum|  min|  max|count|distinct_count|
+--------+----------+------+---+-------+-----+-----+-----+-----+--------------+
|  carter|  INCREASE|  8000|  1| 7500.0|15000| 7000| 8000|    2|             2|
| wangzai|  INCREASE|  7000|  2| 7500.0|15000| 7000| 8000|    2|             2|
|    kobe|        IT|  9000|  1| 7900.0|39500| 7000| 9000|    5|             4|
|zhangsan|        IT|  8000|  2| 7900.0|39500| 7000| 9000|    5|             4|
|   qisan|        IT|  8000|  3| 7900.0|39500| 7000| 9000|    5|             4|
|  wangwu|        IT|  7500|  4| 7900.0|39500| 7000| 9000|    5|             4|
|    lisi|        IT|  7000|  5| 7900.0|39500| 7000| 9000|    5|             4|
|   bajiu|      ALGO| 12000|  1|11000.0|33000|10000|12000|    3|             3|
|   james|      ALGO| 11000|  2|11000.0|33000|10000|12000|    3|             3|
| zhaoliu|      ALGO| 10000|  3|11000.0|33000|10000|12000|    3|             3|
+--------+----------+------+---+-------+-----+-----+-----+-----+--------------+
from pyspark.sql.window import Window
import pyspark.sql.functions as F
# 需要注意的是 approx_count_distinct() 函数适用于窗函数的统计,
# 而在groupby中通常用countDistinct()来代替该函数,用来求组内不重复的数值的条数。
# approx_count_distinct()取的是近似的数值,不太准确,使用需注意 windowSpec  = Window.partitionBy("department").orderBy(F.desc("salary"))
windowSpecAgg  = Window.partitionBy("department")df.withColumn("row", F.row_number().over(windowSpec)) \.withColumn("avg", F.avg("salary").over(windowSpecAgg)) \.withColumn("sum", F.sum("salary").over(windowSpecAgg)) \.withColumn("min", F.min("salary").over(windowSpecAgg)) \.withColumn("max", F.max("salary").over(windowSpecAgg)) \.withColumn("count", F.count("salary").over(windowSpecAgg)) \.withColumn("distinct_count", F.approx_count_distinct("salary").over(windowSpecAgg)) \.where(F.col("row")==1).select("department","avg","sum","min","max","count","distinct_count") \.show()

 +----------+-------+-----+-----+-----+-----+--------------+ |department| avg| sum| min| max|count|distinct_count| +----------+-------+-----+-----+-----+-----+--------------+ | INCREASE| 7500.0|15000| 7000| 8000| 2| 2| | IT| 7900.0|39500| 7000| 9000| 5| 4| | ALGO|11000.0|33000|10000|12000| 3| 3| +----------+-------+-----+-----+-----+-----+--------------+

http://www.hkea.cn/news/848/

相关文章:

  • 设计网站 站什么网2022最新新闻
  • 个人网站怎么做百度推广seo搜索排名优化公司
  • 在越南做网站需要什么2021年重大新闻事件
  • 罗湖企业网站建设网站建设报价明细表
  • 直播网站的建设西安百度seo推广
  • 建设网站前的市场分析包括下载百度手机助手
  • 化妆品网站建设报告上海百网优seo优化公司
  • 网站怎样自己做推广seo网站优化服务
  • 产品网站免费模板下载地址免费站推广网站在线
  • website是什么意思深圳网站关键词优化公司
  • 建立外贸网站seo行业岗位有哪些
  • 南宁企业建站模板武汉seo优化服务
  • 国外flash网站模板长沙网站到首页排名
  • wordpress部署到saeseo专员是什么职位
  • 佛山企业网站建设平台一键免费建站
  • 汉中市建设局网站seo系统源码
  • 买好域名之后怎么做网站大数据免费查询平台
  • wordpress主题比较宝鸡网站seo
  • wordpress 物流信息大同优化推广
  • 超云建站企业建站 平台
  • php网站后台制作人民日报官网
  • 政府网站建设典型经验材料app推广平台有哪些
  • 网站域名去哪里备案网络营销策略分析报告
  • 网站运营发展前景百度网盘搜索引擎入口
  • asp做的网站如何更新百度统计收费吗
  • 百度云怎么做网站空间网站搜索引擎优化技术
  • 公司做网站要注意什么宁波做网站的公司
  • 网站引导页设计seo门户
  • 网站建设发展现状下载百度导航app
  • 青岛网站维护公司aso优化注意什么