Simulation-计算统计——Monte Carlo

阅读原文时间：2023年07月09日阅读：2

找到原函数，再计算
无法找到原函数,MC积分

Assume that we can generate $U_1, . . . , U_n \sim Uniform (0, 1)$, and define $\hat \theta_n = \frac{1}{n} \sum_{i=1}^{n} g(U_i)$

By law of large numbers， if $\int_{0}^{1}|g(u)|du < \infty$, we have $\hat \theta_n \rightarrow \theta := \mathbb{E}g(U) = \int_0^1 g(u)du \quad a.s. \quad n \rightarrow \infty$

a.s. means almost surely:the set of possible exceptions may be non-empty, but it has probability 0.

bounded interval

Eg. Calculate $\theta = \int_0^1 x^2 dx$

n <- 100
set.seed(1)
x <- runif(n)
theta.hat <- mean(x^2)
theta.hat

求：$\theta = \int_a^{b} g(t) dt$

利用$\int_a^bg(t)dt = (b-a)\int_a^b g(t)\frac{1}{b-a}dt = (b-a)\mathbb{E}g(Y)$, where Y~ Uniform(a,b)

所以我们可以：

先生成$Y_1,…,Y_n \sim Uniform(a,b)$
$\hat \theta_n = (b-a) * \frac{1}{n} \sum_{i=1}^n g(Y_i)$

Eg. Calculate $\theta = \int_1^4 x^2dx$

n <- 100
set.seed(1)
x <- runif(n, min = 1, max = 4)
theta.hat <- (4 - 1) * mean(x^2)
theta.hat

Unbounded Interval

\[\Phi(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}e^\frac{-t^2}{2}dt
\]

Note that $$\Phi(x) = \mathbb{P}(Z \le x) = \mathbb{E}[I_{Z \le x}], \text{where } Z \sim N(0,1)$$

Therefore, we can estimate

\[\widehat{\Phi(x)}=\frac{1}{n} \sum_{i=1}^n \mathbb{I}\left(Z_i \leq x\right)=\frac{\#\left\{i: Z_i \leq x\right\}}{n}
\]

事件发生的概率等于它的示性函数的期望。

Eg. Calculate $\Phi(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}e^\frac{t^2}{2}dt$ (1) when x = 0.3. (2) when $x = (x_1, …, x_T)$

#(1)
x <- 0.3
n <- 100
set.seed(1)
z <- rnorm(n)
mean(z < x) # estimate
pnorm(x) # true value by pnorm()

#(2)
Phi.hat <- function(x, n = 1000){
  z <- rnorm(n)
  return(mean(z < x))
}
x <- seq(0, 2.5, len = 5)
Phi.hat(x) #但是这样有问题，输出只有一个值
##原因是当向量和数进行比较的时候，向量的每一个元素会分别跟数比较；而向量和向量比较时，是对应元素进行比较。Be Careful！
round(pnorm(x), digits = 3) # true value

#(2)解决方法
probab <- numeric(5)
for(i in 1:5){
  probab[i] <- Phi.hat(x[i])
}
probab

For General Distributions

Calculate $\theta = \int_A g(x)f(x) dx$

先选一个满意的$f(x)$
生成一组$X_1,…,X_n$ 服从$f(x)$ (上节课的知识)
计算$\hat \theta_n = \frac{1}{n} \sum{i=1}^n g(X_i)$

Eg. Calculate $\int_0^{\infty} \frac{e^{-x}}{1+x^2}dx$

可以把$e^{-x}$视作均值为1的指数分布的密度函数，由此可以生成X。

Let$\xi_1,\xi_2…$ be a sequence of i.i.d. random variables with finite expected value,by $\mu$. Let

\[\bar{\xi}_n=\frac{1}{n} \sum_{i=1}^n \xi_i .
\]

Then,

Weak law of large numbers

\[\bar{\xi}_n \stackrel{P}{\rightarrow} \mu \text { as } n \rightarrow \infty .
\]

Strong law of large numbers)

\[\bar{\xi}_n \stackrel{\text { a.s. }}{\longrightarrow} \mu \text { as } n \rightarrow \infty .
\]

Weak law of large numbers

Definition (Convergence in probability)

Let $\left(\bar{\xi}_n\right)_{n \geq 1}$ be a sequence of real-valued random variables defined on a probability space $(\Omega, \mathcal{F}, P)$.
For any positive number $\varepsilon$,

\[\lim _{n \rightarrow \infty} P\left(\left\{\omega \in \Omega:\left|\bar{\xi}_n(\omega)-\mu\right|>\varepsilon\right\}\right)=0 .
\]

Then, we say $\bar{\xi}_n$ converges in probability to $\mu$, denoted by

\[\bar{\xi}_n \stackrel{P}{\longrightarrow} \mu .
\]

证明见老师ppt。

M <- 100000
n <- 100
eps <- 0.1
xi.bar <- numeric(M) # xi.bar[i] will record the sample mean of trial i
for (i in 1:M){
set.seed(i)
xi.vec <- sample(1:6, size = n, replace = TRUE)
xi.bar[i] <- mean(xi.vec)
}
mu <- 3.5
mean(abs(xi.bar - mu) > eps)

注意这里只跟n有关，m是试验数量。

如何从

估计这种积分形式的标准差。

如果能用数值解，就用数值解法。不知道分布的时候，可以用MC来估计。

\[\theta = \int_{-\infty}^{\infty} g(x)f(x)dx
\]

where $f(x)$is a p.d.f.

\[\hat \theta_n = \frac{1}{n}\sum_{i = 1}^ng(X_i)
\]

, where $X_i \sim f(x)$

\[\mathbb{E}[\hat\theta_n] = \theta
\]

现求$Var(\hat\theta_n)$

standard error：标准误差

The variance can be calculated by

\[\operatorname{Var}\left(\hat{\theta}_n\right)=\frac{\sigma^2}{n}=:(\text { standard error of } \hat{\theta})^2=\left(\operatorname{se}\left(\hat{\theta}_n\right)\right)^2,
\]

where

\[\sigma^2=\operatorname{Var}(g(X))=\int_{-\infty}^{\infty}(g(x)-\theta)^2 f(x) d x .
\]

However, $\sigma^2$ is not known in general.
We can use the estimated variance:

\[\hat{\sigma}_n^2=\frac{1}{n-1} \sum_{i=1}^n\left(g\left(X_i\right)-\hat{\theta}_n\right)^2 .
\]

The estimated variance of $\hat{\theta}_n$ is

\[\widehat{\operatorname{se}}\left(\hat{\theta}_n\right)^2:=\frac{\hat{\sigma}_n^2}{n}=\frac{1}{n(n-1)} \sum_{i=1}^n\left(g\left(X_i\right)-\hat{\theta}_n\right)^2 .
\]

f(x) 是自己取的，{$X_i$}是根据f(x)生成的。n是MC取的样本数。

由前面的内容，

\[\hat{\theta}_n=\frac{1}{n} \sum_{i=1}^n g\left(X_i\right) .
\]

Eg. Calculate $\int_0^1 xdx$

Choose f(x)=1, then g(x) = x, 然后可以代入公式计算。

重复生成如10000次$\hat \theta_n$（每一次都要投100次骰子，总的会投很多很多次骰子）, 当n足够大时，$$P(\frac{\theta_n-\theta}{\sigma/\sqrt{n}}) \rightarrow \Phi(x)$$

Eg. Variance of Dice Rolling

sigma <- sqrt(35/12)
xi.bar.standardized <- (xi.bar - mu)/(sigma/sqrt(n))
hist(xi.bar.standardized, probability = TRUE)

By CLT(Central Limit Theorem) $$ P(-1.96 \le \frac{\hat \theta_n - \theta}{\sigma/\sqrt{n}}) \le 1.96 \sim 0.95 $$, So,$$ \hat{\theta}_n \in\left[\theta-1.96 \times \frac{\sigma^{\prime}}{\sqrt{n}}, \theta+1.96 \times \frac{\sigma}{\sqrt{n}}\right] . $$

但是$\sigma$不一定知道，可以用估计值代替，于是就有了t-分布。

As $\sigma^2$ is unknown in general, people usually use $\hat{\sigma}^2$ to replace it in practice. In this case,

\[\frac{\hat{\theta}_n-\theta}{\hat{\sigma}_n / \sqrt{n}}=\frac{\hat{\theta}_n-\theta}{\hat{\operatorname{se}}\left(\hat{\theta}_n\right)} \Longrightarrow t_{n-1} .
\]

So, by limit distribution,

\[P\left(\theta-t_{0.025}(n-1) \frac{\hat{\sigma}_n}{\sqrt{n}} \leq \hat{\theta}_n \leq t_{0.025}(n-1) \frac{\hat{\sigma}_n}{\sqrt{n}}\right) \approx 0.05
\]

Eg. Dice Rolling

# student's t-distribution
M <- 100
xi.student <- numeric(M)
for (i in 1:M){
set.seed(i)
xi.vec <- sample(1:6, size = n, replace = TRUE)
xi.student[i] <- (mean(xi.vec) - 3.5) / (sd(xi.vec) / sqrt(n))
}

Note在n越来越大的时候，t分布会和normal越来越像。

M<-100
n<-200
u<-runif(n)
set.seed(1)
g <- numeric(M)
for (i in 1:M){
  g[i]<-sin(u[i])^4*exp(-u[i])
  theta<-g[i]
}
theta<-theta/M
theta

Variance and Efficiency

Definition (Efficiency)

Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two estimators of $\theta$. We say $\hat{\theta}_1$ is more efficient (in a statistical sense) than $\hat{\theta}_2$ if

\[\frac{\operatorname{Var}\left(\hat{\theta}_1\right)}{\operatorname{Var}\left(\hat{\theta}_2\right)}<1
\]

Remark.

If the variances of estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ are unknown, we can estimate efficiency by substituting a sample estimate of the variance for each estimator.

Antithetic variables (对照变量)

如果自己生成两个负相关的变量，然后就可以生成一个variance更小的变量。

We can also use

\[\hat{\theta}_{1,2}=\frac{1}{2}\left(\hat{\theta}_1+\hat{\theta}_2\right)
\]

to estimate $\theta$.

Note that

\[\operatorname{Var}\left(\hat{\theta}_{1,2}\right)=\operatorname{Var}\left(\frac{\hat{\theta}_1+\hat{\theta}_2}{2}\right)=\frac{1}{4}\left(\operatorname{Var}\left(\hat{\theta}_1\right)+\operatorname{Var}\left(\hat{\theta}_2\right)+2 \operatorname{Cov}\left(\hat{\theta}_1, \hat{\theta}_2\right)\right) \text {. }
\]

If $\hat{\theta}_1$ and $\hat{\theta}_2$ are negatively correlated, then

\[\operatorname{Cov}\left(\hat{\theta}_1, \hat{\theta}_2\right)<0 \text {, }
\]

and hence,

\[\operatorname{Var}\left(\hat{\theta}_{1,2}\right) \leq \frac{1}{4}\left(\operatorname{Var}\left(\hat{\theta}_1\right)+\operatorname{Var}\left(\hat{\theta}_2\right)\right)
\]

想要构造$\hat\theta_1, \hat\theta_2$

使得他们负相关。下面构造的$X_i$和$\tilde X_i$就是负相关的。

If $$ \hat{\theta}1=\frac{1}{n} \sum^n g\left(X_i\right) $$

where

\[X_i \sim F_X(x)
\]

Assume that $X_i$ is generated from the inverse transform method, then

\[X_i=F^{-1}\left(U_i\right) \stackrel{d}{=} \tilde{X}_i=F^{-1}\left(1-U_i\right) .
\]

If the function $g$ is a monotone function, then

\[\operatorname{Cov}\left(g\left(X_i\right), g\left(\tilde{X}_i\right)\right) \leq 0
\]

Therefore,

\[\hat{\theta}_2=\frac{1}{n} \sum_{i=1}^n g\left(\tilde{X}_i\right)
\]

is also an estimator of $\theta$

Eg.$\int_0^1x^2dx$

u<-runif(100)
theta1 <-mean(u^2)
theta2<-mean((1-u)^2)
theta<-mean(theta1,theta2)
theta

Eg. Normal distribution function

x <- 1
pnorm(1)

n <- 100
V <- runif(100, 0, 1)
V.tilde <- 1 - V

theta1 <- mean(exp(-V^2/2)*(1/sqrt(2*pi)))
theta2 <- mean(exp(-V.tilde^2/2)*(1/sqrt(2*pi)))

theta <- mean(theta1, theta2)
theta

控制方差的程度和g(x)有关，也就是和f(x)的选择有关，好的时候能控制到90%以下

Control variates

Choose $$ c^*=-\frac{\operatorname{Cov}(g(X), f(X))}{\operatorname{Var}(f(X))} $$
Therefore, the improved estimator is defined as

\[\hat{\theta}^*=\frac{1}{n} \sum_{i=1}^n g\left(X_i\right)+\hat{c}^* \cdot \frac{1}{n} \sum_{i=1}^n\left(f\left(X_i\right)-\mu\right),
\]

where

\[\hat{c}^*=-\frac{\text { sample covariance }}{\text { sample variance }} \text {. }
\]

Eg. $\theta = \int_0^1 e^u du$

u <- runif(100)
gu <- exp(u)
theta.hat <- mean(gu)
fu <- u
mu <- 1/2
sample.cov <- cov(gu,fu)
sample.var <- var(fu)

c.star <- -sample.cov / sample.var
theta.star <- mean(gu) + c.star*mean(fu-1/2)

print(theta.hat)
print(theta.star)

这里取f(u)=u

手机扫一扫

移动阅读更方便

你可能感兴趣的文章

AutoEmbedding论文阅读笔记

论文翻译：2021_Real-Time Denoising and Dereverberation wtih Tiny Recurrent U-Net

markdown的摘要测试

知识图谱顶会论文(KDD-2022) kgTransformer：复杂逻辑查询的预训练知识图谱Transformer