当前位置：首页 > news >正文

网站都是在哪里制作的东道设计应届生收入

news 2026/4/25 4:24:19

网站都是在哪里制作的,东道设计应届生收入,c 开发商城网站开发,网络规划设计师培训机构中文版什么是 Coordinate Ascent 算法#xff1f; Coordinate Ascent#xff08;坐标上升#xff09;是一种优化算法#xff0c;它通过在每次迭代时优化一个变量#xff08;或一个坐标#xff09;#xff0c;并保持其他变量不变#xff0c;逐步逼近最优解。与坐标下…中文版什么是 Coordinate Ascent 算法 Coordinate Ascent坐标上升是一种优化算法它通过在每次迭代时优化一个变量或一个坐标并保持其他变量不变逐步逼近最优解。与坐标下降Coordinate Descent算法相对坐标上升通常用于求解带有多个变量的优化问题特别是在优化函数在每个维度上都是可分的情况下。在每次迭代中Coordinate Ascent算法通过选择一个变量来最大化该变量对目标函数的贡献然后再选择下一个变量进行优化。它通过反复更新每个坐标来寻找最优解。虽然这个过程可能不会全局收敛但它在很多实际问题中表现得很好特别是在目标函数较为简单且每个坐标可以独立优化时。以EM算法为例介绍Coordinate Ascent EMExpectation-Maximization算法是一种常见的优化算法常用于统计学和机器学习中尤其是当数据中有隐变量时。EM算法的基本思路是通过迭代求解期望步骤E步和最大化步骤M步来优化参数。我们可以将EM算法视为一种坐标上升方法。每一轮的E步和M步都可以视为分别优化不同的“坐标”其中E步优化期望函数M步最大化该期望函数。 EM算法中的坐标上升假设我们有一个隐变量模型其目标是最大化对数似然函数 L ( θ ) log ⁡ P ( X , Z ∣ θ ) \mathcal{L}(\theta) \log P(X, Z | \theta) L(θ)logP(X,Z∣θ) 其中 ( X X X ) 是观察数据( Z Z Z ) 是隐变量( θ \theta θ ) 是模型的参数。由于隐变量 ( Z Z Z ) 是不可观测的直接求解对数似然函数是困难的。因此EM算法通过迭代的方法来解决这个问题。 E步Expectation step 在E步中我们计算隐变量的期望值即给定当前参数估计 ( θ ( t ) \theta^{(t)} θ(t) ) 下隐变量 ( Z Z Z ) 的条件概率分布 Q ( θ , θ ( t ) ) E Z ∣ X , θ ( t ) [ log ⁡ P ( X , Z ∣ θ ) ] Q(\theta, \theta^{(t)}) \mathbb{E}_{Z | X, \theta^{(t)}} \left[ \log P(X, Z | \theta) \right] Q(θ,θ(t))EZ∣X,θ(t)[logP(X,Z∣θ)] 这是一个坐标上升的过程因为我们通过固定当前的参数 ( θ ( t ) \theta^{(t)} θ(t) ) 来最大化隐变量的期望。 M步Maximization step 在M步中我们最大化E步得到的期望函数 ( Q ( θ , θ ( t ) ) Q(\theta, \theta^{(t)}) Q(θ,θ(t)) ) 来更新参数 ( θ \theta θ ) θ ( t 1 ) arg ⁡ max ⁡ θ Q ( θ , θ ( t ) ) \theta^{(t1)} \arg\max_\theta Q(\theta, \theta^{(t)}) θ(t1)argθmaxQ(θ,θ(t)) 这也可以视为一次坐标上升因为我们固定隐变量的期望并优化参数 ( θ \theta θ )。在EM算法中E步和M步交替进行每一轮都像是在“沿着不同的坐标轴上升”通过迭代更新参数和隐变量的估计值。数学公式假设我们有如下的对数似然函数 L ( θ ) log ⁡ P ( X ∣ θ ) log ⁡ ∑ Z P ( X , Z ∣ θ ) \mathcal{L}(\theta) \log P(X | \theta) \log \sum_{Z} P(X, Z | \theta) L(θ)logP(X∣θ)logZ∑P(X,Z∣θ) 由于隐变量 ( Z ) 无法直接观测我们在E步中计算隐变量的条件期望 Q ( θ , θ ( t ) ) E Z ∣ X , θ ( t ) [ log ⁡ P ( X , Z ∣ θ ) ] Q(\theta, \theta^{(t)}) \mathbb{E}_{Z | X, \theta^{(t)}} \left[ \log P(X, Z | \theta) \right] Q(θ,θ(t))EZ∣X,θ(t)[logP(X,Z∣θ)] 然后在M步中最大化这个期望 θ ( t 1 ) arg ⁡ max ⁡ θ Q ( θ , θ ( t ) ) \theta^{(t1)} \arg\max_\theta Q(\theta, \theta^{(t)}) θ(t1)argθmaxQ(θ,θ(t)) 这两个步骤交替进行直到收敛为止。例子高斯混合模型Gaussian Mixture Model, GMM 高斯混合模型是一种常见的隐变量模型它假设数据是由多个高斯分布组成的每个数据点属于某个高斯分布但我们不知道每个数据点具体属于哪个分布。EM算法可以用来估计模型参数。假设数据 ( X { x 1 , x 2 , … , x n } X \{x_1, x_2, \dots, x_n\} X{x1,x2,…,xn} ) 来自两个高斯分布每个高斯分布的参数为 ( ( μ 1 , σ 1 2 ) (\mu_1, \sigma_1^2) (μ1,σ12) ) 和 ( ( μ 2 , σ 2 2 ) (\mu_2, \sigma_2^2) (μ2,σ22) )。我们要估计这些参数。 E步计算每个数据点属于每个高斯分布的概率即隐变量的期望给定当前参数 ( ( μ 1 ( t ) , σ 1 2 ( t ) , μ 2 ( t ) , σ 2 2 ( t ) ) (\mu_1^{(t)}, \sigma_1^{2(t)}, \mu_2^{(t)}, \sigma_2^{2(t)}) (μ1(t),σ12(t),μ2(t),σ22(t)) )我们计算每个数据点 ( x i x_i xi ) 属于第一个高斯分布的概率称为责任 γ i 1 π 1 N ( x i ∣ μ 1 ( t ) , σ 1 2 ( t ) ) π 1 N ( x i ∣ μ 1 ( t ) , σ 1 2 ( t ) ) π 2 N ( x i ∣ μ 2 ( t ) , σ 2 2 ( t ) ) \gamma_{i1} \frac{\pi_1 \mathcal{N}(x_i | \mu_1^{(t)}, \sigma_1^{2(t)})}{\pi_1 \mathcal{N}(x_i | \mu_1^{(t)}, \sigma_1^{2(t)}) \pi_2 \mathcal{N}(x_i | \mu_2^{(t)}, \sigma_2^{2(t)})} γi1π1N(xi∣μ1(t),σ12(t))π2N(xi∣μ2(t),σ22(t))π1N(xi∣μ1(t),σ12(t)) 其中 ( π 1 \pi_1 π1 ) 和 ( π 2 \pi_2 π2 ) 是高斯分布的混合系数 ( N ( x i ∣ μ , σ 2 ) \mathcal{N}(x_i | \mu, \sigma^2) N(xi∣μ,σ2) ) 是高斯分布的概率密度函数。 M步最大化E步得到的期望函数以更新参数根据责任 ( γ i 1 \gamma_{i1} γi1 ) 和 ( γ i 2 \gamma_{i2} γi2 )我们更新高斯分布的均值和方差 μ 1 ( t 1 ) ∑ i 1 n γ i 1 x i ∑ i 1 n γ i 1 \mu_1^{(t1)} \frac{\sum_{i1}^{n} \gamma_{i1} x_i}{\sum_{i1}^{n} \gamma_{i1}} μ1(t1)∑i1nγi1∑i1nγi1xi σ 1 2 ( t 1 ) ∑ i 1 n γ i 1 ( x i − μ 1 ( t 1 ) ) 2 ∑ i 1 n γ i 1 \sigma_1^{2(t1)} \frac{\sum_{i1}^{n} \gamma_{i1} (x_i - \mu_1^{(t1)})^2}{\sum_{i1}^{n} \gamma_{i1}} σ12(t1)∑i1nγi1∑i1nγi1(xi−μ1(t1))2 类似地我们也更新 ( μ 2 \mu_2 μ2 ) 和 ( σ 2 2 \sigma_2^2 σ22 )并更新混合系数 ( π 1 \pi_1 π1 ) 和 ( π 2 \pi_2 π2 )。 Python代码实现以下是使用EM算法估计GMM参数的简单Python代码示例具体解析见文末 import numpy as np import matplotlib.pyplot as plt from scipy.stats import norm# 生成数据两个高斯分布混合的样本 np.random.seed(42) n_samples 500 data1 np.random.normal(0, 1, n_samples//2) # 均值为0标准差为1 data2 np.random.normal(5, 1, n_samples//2) # 均值为5标准差为1 X np.concatenate([data1, data2])# 初始化参数 pi np.array([0.5, 0.5]) # 混合系数 mu np.array([0, 5]) # 均值 sigma np.array([1, 1]) # 方差# EM算法迭代 n_iterations 100 for iteration in range(n_iterations):# E步计算责任每个点属于每个分布的概率resp1 pi[0] * norm.pdf(X, mu[0], sigma[0])resp2 pi[1] * norm.pdf(X, mu[1], sigma[1])total_resp resp1 resp2resp1 / total_respresp2 / total_resp# M步最大化Q函数更新参数pi[0] np.mean(resp1)pi[1] np.mean(resp2)mu[0] np.sum(resp1 * X) / np.sum(resp1)mu[1] np.sum(resp2 * X) / np.sum(resp2)sigma[0] np.sqrt(np.sum(resp1 * (X - mu[0])**2) / np.sum(resp1))sigma[1] np.sqrt(np.sum(resp2 * (X - mu[1])**2) / np.sum(resp2))# 打印每次迭代的结果print(fIteration {iteration1}:)print(f pi: {pi})print(f mu: {mu})print(f sigma: {sigma})# 可视化结果 x_vals np.linspace(-5, 10, 1000) y_vals pi[0] * norm.pdf(x_vals, mu[0], sigma[0]) pi[1] * norm.pdf(x_vals, mu[1], sigma[1]) plt.hist(X, bins30, densityTrue, alpha0.6, colorg) plt.plot(x_vals, y_vals, colorr) plt.title(EM Algorithm for GMM) plt.show() 总结 Coordinate Ascent坐标上升算法是一种通过逐个优化坐标变量来最大化目标函数的方法。在EM算法中E步和M步分别优化隐变量的期望和模型参数的最大化这可以看作是一种坐标上升的过程。通过反复交替执行这两个步骤EM算法最终能够逼近最优的模型参数。英文版 What is the Coordinate Ascent Algorithm? Coordinate Ascent is an optimization algorithm that improves one variable (or coordinate) at a time, while keeping the other variables fixed, gradually approaching the optimal solution. In contrast to Coordinate Descent, which focuses on minimizing the objective function, Coordinate Ascent is used to maximize the objective function, typically in situations where the optimization function is separable across dimensions. In each iteration, the Coordinate Ascent algorithm selects a variable and maximizes the contribution of that variable to the objective function, then proceeds to the next variable for optimization. The process involves iteratively updating each coordinate to find the optimal solution. While this process may not always converge globally, it performs well in many practical problems, particularly when the objective function is simple, and each coordinate can be independently optimized. Coordinate Ascent in the Context of the EM Algorithm The Expectation-Maximization (EM) algorithm is a common optimization method used in statistics and machine learning, especially when dealing with hidden variables in the data. The basic idea of the EM algorithm is to iteratively solve the Expectation (E) step and Maximization (M) step to optimize the model parameters. We can view the EM algorithm as a form of Coordinate Ascent. Each round of the E-step and M-step can be seen as optimizing different “coordinates,” with the E-step optimizing the expectation function, and the M-step maximizing that expectation. Coordinate Ascent in the EM Algorithm Assume we have a latent variable model with the goal of maximizing the log-likelihood function: L ( θ ) log ⁡ P ( X , Z ∣ θ ) \mathcal{L}(\theta) \log P(X, Z | \theta) L(θ)logP(X,Z∣θ) where: ( X X X ) represents observed data,( Z Z Z ) represents latent variables,( θ \theta θ ) represents model parameters. Since the latent variables ( Z Z Z ) are unobserved, directly maximizing the log-likelihood function is difficult. Thus, the EM algorithm solves this problem iteratively. E-step (Expectation step): In the E-step, we compute the expected value of the latent variables, i.e., given the current parameter estimate ( θ ( t ) \theta^{(t)} θ(t) ), the conditional probability distribution of the latent variables ( Z Z Z ): Q ( θ , θ ( t ) ) E Z ∣ X , θ ( t ) [ log ⁡ P ( X , Z ∣ θ ) ] Q(\theta, \theta^{(t)}) \mathbb{E}_{Z | X, \theta^{(t)}} \left[ \log P(X, Z | \theta) \right] Q(θ,θ(t))EZ∣X,θ(t)[logP(X,Z∣θ)] This is a coordinate ascent process because we maximize the expectation of the latent variables while holding the current parameters ( θ ( t ) \theta^{(t)} θ(t) ) fixed. M-step (Maximization step): In the M-step, we maximize the expected function ( Q ( θ , θ ( t ) ) Q(\theta, \theta^{(t)}) Q(θ,θ(t)) ) computed in the E-step to update the model parameters ( θ \theta θ ): θ ( t 1 ) arg ⁡ max ⁡ θ Q ( θ , θ ( t ) ) \theta^{(t1)} \arg\max_\theta Q(\theta, \theta^{(t)}) θ(t1)argθmaxQ(θ,θ(t)) This is also a coordinate ascent process because we fix the expectation of the latent variables and optimize the parameters ( θ \theta θ ). The E-step and M-step alternate, with each step resembling an ascent along different coordinates. Through iterative updates of the parameters and the latent variables’ estimates, the EM algorithm moves towards the optimal solution. Mathematical Formulas Assume we have the following log-likelihood function: L ( θ ) log ⁡ P ( X ∣ θ ) log ⁡ ∑ Z P ( X , Z ∣ θ ) \mathcal{L}(\theta) \log P(X | \theta) \log \sum_{Z} P(X, Z | \theta) L(θ)logP(X∣θ)logZ∑P(X,Z∣θ) Since the latent variables ( Z ) are unobserved, in the E-step, we compute the conditional expectation of the latent variables: Q ( θ , θ ( t ) ) E Z ∣ X , θ ( t ) [ log ⁡ P ( X , Z ∣ θ ) ] Q(\theta, \theta^{(t)}) \mathbb{E}_{Z | X, \theta^{(t)}} \left[ \log P(X, Z | \theta) \right] Q(θ,θ(t))EZ∣X,θ(t)[logP(X,Z∣θ)] Then, in the M-step, we maximize this expectation: θ ( t 1 ) arg ⁡ max ⁡ θ Q ( θ , θ ( t ) ) \theta^{(t1)} \arg\max_\theta Q(\theta, \theta^{(t)}) θ(t1)argθmaxQ(θ,θ(t)) These two steps alternate until convergence. Example: Gaussian Mixture Model (GMM) A Gaussian Mixture Model (GMM) is a common latent variable model that assumes the data is generated from a mixture of several Gaussian distributions. Each data point belongs to one of the Gaussian distributions, but we don’t know which one. The EM algorithm can be used to estimate the model parameters. Assume that the data ( X { x 1 , x 2 , … , x n } X \{x_1, x_2, \dots, x_n\} X{x1,x2,…,xn} ) comes from two Gaussian distributions, with parameters ( ( μ 1 , σ 1 2 ) (\mu_1, \sigma_1^2) (μ1,σ12) ) and ( ( μ 2 , σ 2 2 ) (\mu_2, \sigma_2^2) (μ2,σ22) ), and we aim to estimate these parameters. E-step: Calculate the probability (responsibility) of each data point belonging to each Gaussian distribution: Given the current parameters ( ( μ 1 ( t ) , σ 1 2 ( t ) , μ 2 ( t ) , σ 2 2 ( t ) ) (\mu_1^{(t)}, \sigma_1^{2(t)}, \mu_2^{(t)}, \sigma_2^{2(t)}) (μ1(t),σ12(t),μ2(t),σ22(t)) ), we calculate the probability that data point ( x i x_i xi ) belongs to the first Gaussian distribution (called responsibility): γ i 1 π 1 N ( x i ∣ μ 1 ( t ) , σ 1 2 ( t ) ) π 1 N ( x i ∣ μ 1 ( t ) , σ 1 2 ( t ) ) π 2 N ( x i ∣ μ 2 ( t ) , σ 2 2 ( t ) ) \gamma_{i1} \frac{\pi_1 \mathcal{N}(x_i | \mu_1^{(t)}, \sigma_1^{2(t)})}{\pi_1 \mathcal{N}(x_i | \mu_1^{(t)}, \sigma_1^{2(t)}) \pi_2 \mathcal{N}(x_i | \mu_2^{(t)}, \sigma_2^{2(t)})} γi1π1N(xi∣μ1(t),σ12(t))π2N(xi∣μ2(t),σ22(t))π1N(xi∣μ1(t),σ12(t)) where ( π 1 \pi_1 π1 ) and ( π 2 \pi_2 π2 ) are the mixing coefficients, and ( N ( x i ∣ μ , σ 2 ) \mathcal{N}(x_i | \mu, \sigma^2) N(xi∣μ,σ2) ) is the Gaussian distribution’s probability density function. M-step: Maximize the expected function obtained in the E-step to update the parameters: Using the responsibilities ( γ i 1 \gamma_{i1} γi1 ) and ( γ i 2 \gamma_{i2} γi2 ), we update the Gaussian distribution’s mean and variance: μ 1 ( t 1 ) ∑ i 1 n γ i 1 x i ∑ i 1 n γ i 1 \mu_1^{(t1)} \frac{\sum_{i1}^{n} \gamma_{i1} x_i}{\sum_{i1}^{n} \gamma_{i1}} μ1(t1)∑i1nγi1∑i1nγi1xi σ 1 2 ( t 1 ) ∑ i 1 n γ i 1 ( x i − μ 1 ( t 1 ) ) 2 ∑ i 1 n γ i 1 \sigma_1^{2(t1)} \frac{\sum_{i1}^{n} \gamma_{i1} (x_i - \mu_1^{(t1)})^2}{\sum_{i1}^{n} \gamma_{i1}} σ12(t1)∑i1nγi1∑i1nγi1(xi−μ1(t1))2 Similarly, we update ( μ 2 \mu_2 μ2 ), ( σ 2 2 \sigma_2^2 σ22 ), and the mixing coefficients ( π 1 \pi_1 π1 ) and ( π 2 \pi_2 π2 ). Python Code Implementation Here is a simple Python implementation of the EM algorithm for estimating GMM parameters: import numpy as np import matplotlib.pyplot as plt from scipy.stats import norm# Generate data: samples from a mixture of two Gaussian distributions np.random.seed(42) n_samples 500 data1 np.random.normal(0, 1, n_samples//2) # Mean 0, Std 1 data2 np.random.normal(5, 1, n_samples//2) # Mean 5, Std 1 X np.concatenate([data1, data2])# Initialize parameters pi np.array([0.5, 0.5]) # Mixing coefficients mu np.array([0, 5]) # Means sigma np.array([1, 1]) # Variances# EM algorithm iterations n_iterations 100 for iteration in range(n_iterations):# E-step: Calculate responsibilities (probabilities of each point belonging to each distribution)resp1 pi[0] * norm.pdf(X, mu[0], sigma[0])resp2 pi[1] * norm.pdf(X, mu[1], sigma[1])total_resp resp1 resp2resp1 / total_respresp2 / total_resp# M-step: Maximize Q function, update parameterspi[0] np.mean(resp1)pi[1] np.mean(resp2)mu[0] np.sum(resp1 * X) / np.sum(resp1)mu[1] np.sum(resp2 * X) / np.sum(resp2)sigma[0] np.sqrt(np.sum(resp1 * (X - mu[0])**2) / np.sum(resp1))sigma[1] np.sqrt(np.sum(resp2 * (X - mu[1])**2) / np.sum(resp2))# Print results for each iterationprint(fIteration {iteration1}:)print(f pi: {pi})print(f mu: {mu})print(f sigma: {sigma})# Visualize the results x_vals np.linspace(-5, 10, 1000) y_vals pi[0] * norm.pdf(x_vals, mu[0], sigma[0]) pi[1] * norm.pdf(x_vals, mu[1], sigma[1]) plt.hist(X, bins30, densityTrue, alpha0.6, colorg) plt.plot(x_vals, y_vals, colorr) plt.title(EM Algorithm for GMM)plt.show()Summary The Coordinate Ascent method is a useful iterative approach to optimization, where only one parameter (coordinate) is updated at a time while others remain fixed. This method is central to algorithms like Expectation-Maximization (EM), where the E-step and M-step alternately optimize different coordinates (latent variables and model parameters). The Gaussian Mixture Model (GMM) example illustrates how the EM algorithm can be used for model estimation in latent variable models. python代码中哪一步体现数据的似然最大化在EM算法的M步中“最大化对数似然函数” 这一目标体现在通过更新参数混合系数 ( π k \pi_k πk )、均值 ( μ k \mu_k μk ) 和方差 ( σ k \sigma_k σk )来最大化每个数据点的概率从而最大化整个模型的数据似然。下面详细解析这个过程并明确如何体现最大化似然。 1. 对数似然函数的目标对数似然函数表示的是数据给定当前模型参数下的概率。在高斯混合模型GMM中假设我们有一个由 ( K K K ) 个高斯分布组成的混合模型数据集 ( X { x 1 , x 2 , … , x N } X \{x_1, x_2, \dots, x_N\} X{x1,x2,…,xN} ) 由这些高斯分布生成。模型的对数似然函数是 L ( θ ) ∑ i 1 N log ⁡ ( ∑ k 1 K π k N ( x i ; μ k , σ k 2 ) ) \mathcal{L}(\theta) \sum_{i1}^{N} \log \left( \sum_{k1}^{K} \pi_k \mathcal{N}(x_i; \mu_k, \sigma_k^2) \right) L(θ)i1∑Nlog(k1∑KπkN(xi;μk,σk2)) 其中 ( π k \pi_k πk ) 是第 ( k k k ) 个高斯分布的混合系数即该高斯分布在总模型中的权重( N ( x i ; μ k , σ k 2 ) \mathcal{N}(x_i; \mu_k, \sigma_k^2) N(xi;μk,σk2) ) 是第 ( k k k ) 个高斯分布对数据点 ( x i x_i xi ) 的概率密度( μ k \mu_k μk ) 是第 ( k k k ) 个高斯分布的均值( σ k 2 \sigma_k^2 σk2 ) 是第 ( k k k ) 个高斯分布的方差。对数似然函数衡量的是在给定数据的情况下当前模型参数 ( π k , μ k , σ k \pi_k, \mu_k, \sigma_k πk,μk,σk ) 下数据的整体似然。我们的目标是最大化这个对数似然函数找到能够最好地解释数据的模型参数。 2. M步的作用最大化似然函数在EM算法中M步的目标是最大化对数似然函数即通过更新参数使得模型的似然值数据的概率最大化。如何做到这一点呢 2.1. E步计算责任期望在E步中我们计算每个数据点 ( x i x_i xi ) 属于每个高斯分布 ( k k k ) 的责任即数据点 ( x i x_i xi ) 属于第 ( k k k ) 个分布的后验概率 γ i k P ( z i k ∣ x i ) π k N ( x i ; μ k , σ k 2 ) ∑ k ′ 1 K π k ′ N ( x i ; μ k ′ , σ k ′ 2 ) \gamma_{ik} P(z_i k | x_i) \frac{\pi_k \mathcal{N}(x_i; \mu_k, \sigma_k^2)}{\sum_{k1}^{K} \pi_{k} \mathcal{N}(x_i; \mu_{k}, \sigma_{k}^2)} γikP(zik∣xi)∑k′1Kπk′N(xi;μk′,σk′2)πkN(xi;μk,σk2) 这里( γ i k \gamma_{ik} γik ) 表示数据点 ( x i x_i xi ) 属于高斯分布 ( k k k ) 的概率责任。 2.2. M步更新参数 E步计算得到的责任 ( γ i k \gamma_{ik} γik ) 用来在M步中更新模型的参数。通过更新混合系数 ( π k \pi_k πk )、均值 ( μ k \mu_k μk ) 和方差 ( σ k \sigma_k σk )我们让每个数据点的似然由它属于每个高斯分布的概率加权计算的最大化。更新混合系数 ( π k \pi_k πk )混合系数表示每个高斯分布的权重。更新规则为 π k 1 N ∑ i 1 N γ i k \pi_k \frac{1}{N} \sum_{i1}^{N} \gamma_{ik} πkN1i1∑Nγik 这里的更新表示的是每个高斯分布的权重是所有数据点在该分布下的责任概率的平均值。更新均值 ( μ k \mu_k μk )均值是高斯分布的中心通过对数据点的加权平均来更新。更新规则为 μ k ∑ i 1 N γ i k x i ∑ i 1 N γ i k \mu_k \frac{\sum_{i1}^{N} \gamma_{ik} x_i}{\sum_{i1}^{N} \gamma_{ik}} μk∑i1Nγik∑i1Nγikxi 这里的加权平均意味着每个数据点 ( x i x_i xi ) 对均值的贡献是它属于该高斯分布的概率 ( γ i k \gamma_{ik} γik )。更新方差 ( σ k 2 \sigma_k^2 σk2 )方差衡量高斯分布的扩散程度同样通过加权平方差来更新。更新规则为 σ k 2 ∑ i 1 N γ i k ( x i − μ k ) 2 ∑ i 1 N γ i k \sigma_k^2 \frac{\sum_{i1}^{N} \gamma_{ik} (x_i - \mu_k)^2}{\sum_{i1}^{N} \gamma_{ik}} σk2∑i1Nγik∑i1Nγik(xi−μk)2 同样地这里加权平方差意味着每个数据点 ( x i x_i xi ) 对方差的贡献是它属于该高斯分布的概率 ( γ i k \gamma_{ik} γik )。 3. 最大化似然如何体现最大化对数似然函数是通过更新参数( π k , μ k , σ k \pi_k, \mu_k, \sigma_k πk,μk,σk )使得每个数据点的似然最大化来实现的。在M步中更新混合系数 ( π k \pi_k πk )通过更新每个高斯分布的权重使得每个数据点属于各个高斯分布的责任概率最大化。更新均值 ( μ k \mu_k μk )通过更新每个高斯分布的均值使得数据点在该分布下的概率密度最大化。更新方差 ( σ k \sigma_k σk )通过更新每个高斯分布的方差使得数据点在该分布下的概率密度最大化。最终EM算法会通过迭代E步和M步逐步优化参数直到对数似然函数收敛即模型的似然最大化。 M步通过计算每个数据点在E步中得到的责任更新模型的参数混合系数 ( π k \pi_k πk )、均值 ( μ k \mu_k μk ) 和方差 ( σ k \sigma_k σk )使得数据在当前模型下的似然最大化。这是通过加权平均和加权平方差的方式来进行的每次更新都朝着使数据在当前模型下最可能的方向调整参数。后记 2024年12月28日14点12分于上海在GPT4o大模型辅助下完成。

查看全文

http://www.hkea.cn/news/14403674/