1. Prerequisites
(1) First, get the Hadoop cluster, Hive, and Spark configured.
For the Hadoop cluster and Hive setup, this blogger's posts are a good reference:
大数据_蓝净云的博客-CSDN博客
Or watch the 黑马程序员 videos:
黑马程序员大数据入门到实战教程，大数据开发必会的Hadoop、Hive云平台实战项目全套一网打尽_哔哩哔哩_bilibili
For this author's own Hadoop cluster and Hive setup, see this article:
黑马程序员hadoop三件套hdfs,Mapreduce,yarn的安装配置以及hive的安装配置-CSDN博客
For the Spark setup, see:
spark的安装配置_spark基本配置-CSDN博客
(2) It is also worth installing FinalShell; a download tutorial:
保姆级教程下载finalshell以及连接云服务器基础的使用教程_finalshell下载安装-CSDN博客

2. Configuring spark-sql
(1) On node1, log in as root, go to the conf directory under the Hive install directory, and edit hive-site.xml:
cd /export/server/apache-hive-3.1.3-bin/conf/
vi hive-site.xml
Add the following:
<property>
    <name>hive.spark.client.jar</name>
    <value>${SPARK_HOME}/lib/spark-assembly-*.jar</value>
</property>
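To double-check the edit without reopening vi, a quick grep-based sketch; it assumes the `<name>` tag sits on its own line, as written above:

```shell
# Quick check that hive-site.xml now contains the new property.
# Assumes the <name> tag is on its own line, as in the snippet above.
has_property() {  # usage: has_property <property-name> <file>
  grep -q "<name>$1</name>" "$2"
}

has_property hive.spark.client.jar /export/server/apache-hive-3.1.3-bin/conf/hive-site.xml \
  && echo "hive.spark.client.jar: present" \
  || echo "hive.spark.client.jar: MISSING"
```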
(2) Copy hive-site.xml to /export/server/spark-3.4.4-bin-hadoop3/conf and distribute it to node2 and node3:
cp /export/server/apache-hive-3.1.3-bin/conf/hive-site.xml /export/server/spark-3.4.4-bin-hadoop3/conf/
scp /export/server/apache-hive-3.1.3-bin/conf/hive-site.xml node2:/export/server/spark-3.4.4-bin-hadoop3/conf/
scp /export/server/apache-hive-3.1.3-bin/conf/hive-site.xml node3:/export/server/spark-3.4.4-bin-hadoop3/conf/
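The copy-then-scp pattern above repeats for every file in this section, so it can be wrapped in a small loop. A sketch; the echo makes it a dry run, so swap it for the real scp once passwordless SSH to node2/node3 is confirmed:

```shell
# Distribute a file to the two worker nodes in one loop.
# Dry run: prints the scp commands instead of executing them.
distribute() {  # usage: distribute <file> <remote-dir>
  local file="$1" dest="$2" node
  for node in node2 node3; do
    echo "scp $file $node:$dest"   # replace echo with: scp "$file" "$node:$dest"
  done
}

distribute /export/server/apache-hive-3.1.3-bin/conf/hive-site.xml \
           /export/server/spark-3.4.4-bin-hadoop3/conf/
```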
(3) Copy the MySQL JDBC driver to /export/server/spark-3.4.4-bin-hadoop3/jars/ and distribute it to node2 and node3:
cp /export/server/apache-hive-3.1.3-bin/lib/mysql-connector-java-5.1.34.jar /export/server/spark-3.4.4-bin-hadoop3/jars/
scp /export/server/spark-3.4.4-bin-hadoop3/jars/mysql-connector-java-5.1.34.jar node2:/export/server/spark-3.4.4-bin-hadoop3/jars/
scp /export/server/spark-3.4.4-bin-hadoop3/jars/mysql-connector-java-5.1.34.jar node3:/export/server/spark-3.4.4-bin-hadoop3/jars/
(4) On node1, configure the MySQL driver in /export/server/spark-3.4.4-bin-hadoop3/conf/spark-env.sh, then distribute the file to node2 and node3:
vi /export/server/spark-3.4.4-bin-hadoop3/conf/spark-env.sh
Add the following:
export SPARK_CLASSPATH=/export/server/spark-3.4.4-bin-hadoop3/jars/mysql-connector-java-5.1.34.jar
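Re-running these steps can leave duplicate export lines in spark-env.sh; a small sketch that appends the line only if it is not already there:

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {  # usage: append_once <line> <file>
  grep -qxF "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Example against a local copy of the file (path shortened for illustration):
append_once 'export SPARK_CLASSPATH=/export/server/spark-3.4.4-bin-hadoop3/jars/mysql-connector-java-5.1.34.jar' \
            spark-env.sh
```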
Distribute:
scp /export/server/spark-3.4.4-bin-hadoop3/conf/spark-env.sh node2:/export/server/spark-3.4.4-bin-hadoop3/conf/
scp /export/server/spark-3.4.4-bin-hadoop3/conf/spark-env.sh node3:/export/server/spark-3.4.4-bin-hadoop3/conf/
(5) On node1, lower the log level, then distribute the file to node2 and node3:
cp /export/server/spark-3.4.4-bin-hadoop3/conf/log4j2.properties.template /export/server/spark-3.4.4-bin-hadoop3/conf/log4j2.properties
vi /export/server/spark-3.4.4-bin-hadoop3/conf/log4j2.properties
Comment out this part:
rootLogger.level = info
rootLogger.appenderRef.stdout.ref = console
After commenting it looks like this:
# rootLogger.level = info
# rootLogger.appenderRef.stdout.ref = console
Then add the following:
rootLogger.level = warn
rootLogger.appenderRef.console.ref = console
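Rather than editing the file by hand on every node, the comment-and-append change can be scripted. A sketch, assuming GNU sed and the exact `key = value` spacing of the stock Spark 3.4 template:

```shell
# Comment out the info-level root logger lines and append warn-level ones.
# Assumes GNU sed (-i) and the "key = value" spacing of the stock template.
quiet_logs() {  # usage: quiet_logs <log4j2.properties>
  sed -i \
    -e 's/^rootLogger.level = info/# &/' \
    -e 's/^rootLogger.appenderRef.stdout.ref = console/# &/' "$1"
  printf '%s\n' 'rootLogger.level = warn' 'rootLogger.appenderRef.console.ref = console' >> "$1"
}
```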
Then distribute:
scp /export/server/spark-3.4.4-bin-hadoop3/conf/log4j2.properties node2:/export/server/spark-3.4.4-bin-hadoop3/conf/
scp /export/server/spark-3.4.4-bin-hadoop3/conf/log4j2.properties node3:/export/server/spark-3.4.4-bin-hadoop3/conf/

3. Trying out spark-sql
(1) First start everything that needs to be running. On node1, as root, just paste the following commands into the terminal:
su - hadoop
start-dfs.sh
start-yarn.sh
nohup /export/server/hive/bin/hive --service metastore >> /export/server/hive/logs/metastore.log 2>&1 &
cd /export/server/spark-3.4.4-bin-hadoop3/sbin
./start-all.sh
jps
spark-sql
The output looks like this:
[root@node1 ~]# su - hadoop
Last login: Wed Dec 4 20:52:45 CST 2024 on pts/0
[hadoop@node1 ~]$ start-dfs.sh
Starting namenodes on [node1]
Starting datanodes
Starting secondary namenodes [node1]
[hadoop@node1 ~]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
[hadoop@node1 ~]$ nohup /export/server/hive/bin/hive --service metastore >> /export/server/hive/logs/metastore.log 2>&1 &
[1] 39039
[hadoop@node1 ~]$ cd /export/server/spark-3.4.4-bin-hadoop3/sbin
[hadoop@node1 sbin]$ ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /export/server/spark-3.4.4-bin-hadoop3/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node1.out
node2: starting org.apache.spark.deploy.worker.Worker, logging to /export/server/spark-3.4.4-bin-hadoop3/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node2.out
node3: starting org.apache.spark.deploy.worker.Worker, logging to /export/server/spark-3.4.4-bin-hadoop3/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node3.out
[hadoop@node1 sbin]$ jps
39952 Jps
37553 SecondaryNameNode
38978 WebAppProxyServer
36902 NameNode
39127 Master
39143 VersionInfo
38537 NodeManager
37118 DataNode
38335 ResourceManager
[hadoop@node1 sbin]$ spark-sql
24/12/04 22:20:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/12/04 22:20:20 WARN HiveConf: HiveConf of name hive.metastore.event.db.notification.api.auth does not exist
24/12/04 22:20:20 WARN HiveConf: HiveConf of name hive.spark.client.jar does not exist
Spark master: spark://node1:7077, Application Id: app-20241204222029-0000
spark-sql (default)> use demo1;
(2) Try writing some SQL in spark-sql
Pick any database to work in:
use demo1;
-- create a new table
CREATE TABLE employees (
    id INT,
    name STRING,
    salary DOUBLE
);
-- insert a single row
INSERT INTO employees VALUES (4, 'Alice', 1300);
-- insert multiple rows
INSERT INTO employees VALUES
    (5, 'Bob', 1400),
    (6, 'Charlie', 1100);
-- query all rows in the table
SELECT * FROM employees;
-- drop the table
DROP TABLE IF EXISTS employees;
The output looks like this:
spark-sql (default)> use demo1;
Time taken: 9.682 seconds
spark-sql (demo1)> -- create a new table
spark-sql (demo1)> CREATE TABLE employees ( id INT, name STRING, salary DOUBLE );
24/12/04 22:22:50 WARN ResolveSessionCatalog: A Hive serde table will be created as there is no table provider specified. You can set spark.sql.legacy.createHiveTableByDefault to false so that native data source table will be created instead.
[TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view demo1.employees because it already exists.
Choose a different name, drop or replace the existing object, or add the IF NOT EXISTS clause to tolerate pre-existing objects.
spark-sql (demo1)> -- drop the table
spark-sql (demo1)> DROP TABLE IF EXISTS employees;
Time taken: 4.462 seconds
spark-sql (demo1)> -- create a new table
spark-sql (demo1)> CREATE TABLE employees ( id INT, name STRING, salary DOUBLE );
24/12/04 22:23:29 WARN ResolveSessionCatalog: A Hive serde table will be created as there is no table provider specified. You can set spark.sql.legacy.createHiveTableByDefault to false so that native data source table will be created instead.
24/12/04 22:23:29 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
Time taken: 1.541 seconds
spark-sql (demo1)> -- insert a single row
spark-sql (demo1)> INSERT INTO employees VALUES (4, 'Alice', 1300);
Time taken: 15.817 seconds
spark-sql (demo1)> -- insert multiple rows
spark-sql (demo1)> INSERT INTO employees VALUES (5, 'Bob', 1400), (6, 'Charlie', 1100);
Time taken: 10.018 seconds
spark-sql (demo1)> -- query all rows in the table
spark-sql (demo1)> SELECT * FROM employees;
4	Alice	1300.0
5	Bob	1400.0
6	Charlie	1100.0
Time taken: 5.835 seconds, Fetched 3 row(s)
spark-sql (demo1)> -- drop the table
spark-sql (demo1)> DROP TABLE IF EXISTS employees;
Time taken: 0.784 seconds
spark-sql (demo1)>
At this point everything is basically working.
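As a quick programmatic version of the jps check done earlier, a sketch that reports which expected daemons are missing; the process names are the ones that appeared in this cluster's jps listing:

```shell
# Report which expected daemons are absent from a jps listing.
# Expected names are taken from the jps output shown earlier in this post.
check_daemons() {  # usage: check_daemons "<space-separated process names>"
  local running="$1" missing="" p
  for p in NameNode DataNode SecondaryNameNode ResourceManager NodeManager Master; do
    case " $running " in
      *" $p "*) ;;                 # found, nothing to do
      *) missing="$missing $p" ;;  # not in the listing
    esac
  done
  echo "${missing# }"
}

# usage on node1: check_daemons "$(jps | awk '{print $2}' | tr '\n' ' ')"
```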
(3) Shutting everything down
First press Ctrl+C to exit spark-sql, then run:
cd /export/server/spark-3.4.4-bin-hadoop3/sbin
./stop-all.sh
cd
stop-yarn.sh
stop-dfs.sh
jps
Finally, kill the leftover RunJar process (the Hive metastore) with kill -9.
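Reading the RunJar PID out of jps by eye and typing it into kill -9 can be scripted; a sketch that filters a jps listing:

```shell
# Pull the PID of the RunJar process (the Hive metastore) out of jps output.
runjar_pid() {  # reads a jps listing on stdin, prints the RunJar PID if any
  awk '$2 == "RunJar" {print $1}'
}

# usage: jps | runjar_pid | xargs -r kill -9
```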
The output of the commands above looks like this:
Time taken: 0.784 seconds
spark-sql (demo1)> [hadoop@node1 sbin]$
[hadoop@node1 sbin]$ cd /export/server/spark-3.4.4-bin-hadoop3/sbin
[hadoop@node1 sbin]$ ./stop-all.sh
node2: stopping org.apache.spark.deploy.worker.Worker
node3: stopping org.apache.spark.deploy.worker.Worker
stopping org.apache.spark.deploy.master.Master
[hadoop@node1 sbin]$ cd
[hadoop@node1 ~]$ stop-yarn.sh
Stopping nodemanagers
Stopping resourcemanager
Stopping proxy server [node1]
[hadoop@node1 ~]$ stop-dfs.sh
Stopping namenodes on [node1]
Stopping datanodes
Stopping secondary namenodes [node1]
[hadoop@node1 ~]$ jps
64996 Jps
39039 RunJar
[hadoop@node1 ~]$ kill -9 39039
[hadoop@node1 ~]$ jps
66159 Jps
[1] Killed                  nohup /export/server/hive/bin/hive --service metastore >> /export/server/hive/logs/metastore.log 2>&1
[hadoop@node1 ~]$ jps
66219 Jps
[hadoop@node1 ~]$

4. Reference Articles
(1) 黑马大数据学习笔记4-Hive部署和基本操作_黑马大数据 hive笔记-CSDN博客
(2) spark的安装配置_spark基本配置-CSDN博客
(3) Scala配置教程_统信 scala-CSDN博客
(4) 大数据_蓝净云的博客-CSDN博客
5. Additional Notes
(1) Hive 3.x normally handles Spark integration automatically
vi /export/server/apache-hive-3.1.3-bin/bin/hive
Why doesn't my hive launcher script contain the line sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`?
In Hive 3.x, sparkAssemblyPath is not a standard configuration item, so it is completely normal that this line does not appear in your hive startup script.
### Why is this line gone?
1. **Hive version changes**: Between Hive 2.x and Hive 3.x, the way Hive integrates with Spark changed. In particular, for Spark 2.x and later, the integration no longer relies on the old spark-assembly JAR. spark-assembly was a key artifact of Spark 1.x; in Spark 2.x and later, Spark was rearchitected and that file was split into many separate JARs, so the Hive launcher script no longer needs to load it via sparkAssemblyPath.
2. **How Hive integrates Spark now**: In Hive 3.x, Spark is wired in through hive-site.xml and other configuration files, so the Hive startup script no longer has to point at Spark's JAR files explicitly. Spark loads the JARs it needs based on environment variables and configuration files.
### What to do instead
If you want Hive to work with Spark and need the Spark-related settings, configure them as follows:
1. **Check the Hive configuration files**: set the Spark-related parameters in hive-site.xml, especially the Spark SQL engine and metastore settings.
2. **Set the SPARK_HOME environment variable**: make sure your Hive environment can find the right Spark installation by exporting SPARK_HOME.
3. **Add Spark JARs to SPARK_CLASSPATH**: if you really do need to point at Spark JARs by hand (for example, if they are not picked up automatically), set SPARK_CLASSPATH in spark-env.sh:
   export SPARK_CLASSPATH=/path/to/spark/jars/*.jar
4. **Start the combined Hive and Spark services**: run Spark SQL queries through spark-submit or spark-sql, and make sure the Hive metastore has been started first.
### Checking that the Hive and Spark integration works
1. **Check that the Hive metastore is running**:
   hive --service metastore
2. **Test the connection between Hive and Spark**: in spark-sql, check that Spark can read the Hive databases:
   spark-sql --conf spark.sql.warehouse.dir=/path/to/hive/warehouse
3. **Check Spark's JAR directory**: make sure all the JARs Spark needs are in its jars directory:
   ls $SPARK_HOME/jars
### Summary
In Hive 3.x the sparkAssemblyPath setting is no longer used; the Spark and Hive integration is managed through configuration files and environment variables instead. If you need to point at Spark JARs manually, do it through the SPARK_CLASSPATH environment variable, e.g. in spark-env.sh.

(2) Notes from an earlier troubleshooting session
[hadoop@node1 sbin]$ ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /export/server/spark-3.4.4-bin-hadoop3/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node1.out
node3: starting org.apache.spark.deploy.worker.Worker, logging to /export/server/spark-3.4.4-bin-hadoop3/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node3.out
node2: starting org.apache.spark.deploy.worker.Worker, logging to /export/server/spark-3.4.4-bin-hadoop3/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node2.out
[hadoop@node1 sbin]$
The ./start-all.sh script ran successfully. The output shows that the **Master** process started on node1 and is writing its log to spark-hadoop-org.apache.spark.deploy.master.Master-1-node1.out, and that **Worker** processes started on node2 and node3 with their own log files.
### Next, verify that the cluster is healthy
#### 1. Check the Master node
Visit the Spark Master web UI. The master itself listens for cluster traffic on port 7077; its web UI is on port 8080 by default:
http://node1:8080
This page shows the cluster status, the Worker nodes, and other information. If it loads and lists node2 and node3 as workers, the Master and Worker processes are healthy.
#### 2. Check the Worker nodes
Each Worker also has its own web UI, on port 8081 by default:
http://node2:8081
http://node3:8081
If these pages load without errors, the Workers have joined the cluster successfully.
#### 3. Check the log files
Look through the logs to make sure there are no errors:
- Master log: /export/server/spark-3.4.4-bin-hadoop3/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node1.out
- Worker logs: /export/server/spark-3.4.4-bin-hadoop3/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node2.out and /export/server/spark-3.4.4-bin-hadoop3/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node3.out
If the logs contain errors, diagnose further from there.
#### 4. Submit a Spark job
You can also submit a simple job to confirm the cluster actually runs work, for example:
spark-submit --class org.apache.spark.examples.SparkPi --master spark://node1:7077 $SPARK_HOME/examples/jars/spark-examples_2.12-3.4.4.jar 10
This runs the SparkPi example; if the cluster is healthy, the job completes and prints an estimate of Pi.
### Summary
- Check the cluster state through the web UIs (http://node1:8080, http://node2:8081, http://node3:8081).
- Check the log files for errors.
- Submit a simple job to confirm the cluster can run work.
(3) Plain-text reference copy of the configuration steps
1. Go to the conf directory under the Hive install directory and edit hive-site.xml:
cd /export/server/apache-hive-3.1.3-bin/conf/
vi hive-site.xml
Add:
<property>
    <name>hive.spark.client.jar</name>
    <value>${SPARK_HOME}/lib/spark-assembly-*.jar</value>
</property>
2. Copy hive-site.xml to /export/server/spark-3.4.4-bin-hadoop3/conf:
cp /export/server/apache-hive-3.1.3-bin/conf/hive-site.xml /export/server/spark-3.4.4-bin-hadoop3/conf/
scp /export/server/apache-hive-3.1.3-bin/conf/hive-site.xml node2:/export/server/spark-3.4.4-bin-hadoop3/conf/
scp /export/server/apache-hive-3.1.3-bin/conf/hive-site.xml node3:/export/server/spark-3.4.4-bin-hadoop3/conf/
3. Copy the MySQL driver to /export/server/spark-3.4.4-bin-hadoop3/jars:
cd /export/server/apache-hive-3.1.3-bin/lib/
cp /export/server/apache-hive-3.1.3-bin/lib/mysql-connector-java-5.1.34.jar /export/server/spark-3.4.4-bin-hadoop3/jars/
scp /export/server/spark-3.4.4-bin-hadoop3/jars/mysql-connector-java-5.1.34.jar node2:/export/server/spark-3.4.4-bin-hadoop3/jars/
scp /export/server/spark-3.4.4-bin-hadoop3/jars/mysql-connector-java-5.1.34.jar node3:/export/server/spark-3.4.4-bin-hadoop3/jars/
4. On every node, configure the MySQL driver in /export/server/spark-3.4.4-bin-hadoop3/conf/spark-env.sh:
vi /export/server/spark-3.4.4-bin-hadoop3/conf/spark-env.sh
Add:
export SPARK_CLASSPATH=/export/server/spark-3.4.4-bin-hadoop3/jars/mysql-connector-java-5.1.34.jar
Distribute:
scp /export/server/spark-3.4.4-bin-hadoop3/conf/spark-env.sh node2:/export/server/spark-3.4.4-bin-hadoop3/conf/
scp /export/server/spark-3.4.4-bin-hadoop3/conf/spark-env.sh node3:/export/server/spark-3.4.4-bin-hadoop3/conf/
5. Lower the log level on each node:
cp /export/server/spark-3.4.4-bin-hadoop3/conf/log4j2.properties.template /export/server/spark-3.4.4-bin-hadoop3/conf/log4j2.properties
vi /export/server/spark-3.4.4-bin-hadoop3/conf/log4j2.properties
Comment out:
rootLogger.level = info
rootLogger.appenderRef.stdout.ref = console
Then add:
rootLogger.level = warn
rootLogger.appenderRef.console.ref = console
Distribute:
scp /export/server/spark-3.4.4-bin-hadoop3/conf/log4j2.properties node2:/export/server/spark-3.4.4-bin-hadoop3/conf/
scp /export/server/spark-3.4.4-bin-hadoop3/conf/log4j2.properties node3:/export/server/spark-3.4.4-bin-hadoop3/conf/
6. Start everything and open spark-sql:
su - hadoop
start-dfs.sh
start-yarn.sh
nohup /export/server/hive/bin/hive --service metastore >> /export/server/hive/logs/metastore.log 2>&1 &
cd /export/server/spark-3.4.4-bin-hadoop3/sbin
./start-all.sh
jps
spark-sql
7. Try writing some SQL in spark-sql:
use demo1;
-- create a new table
CREATE TABLE employees ( id INT, name STRING, salary DOUBLE );
-- insert a single row
INSERT INTO employees VALUES (4, 'Alice', 1300);
-- insert multiple rows
INSERT INTO employees VALUES (5, 'Bob', 1400), (6, 'Charlie', 1100);
-- query all rows in the table
SELECT * FROM employees;
-- drop the table
DROP TABLE IF EXISTS employees;