Improve NN
Table of Contents
- Improve NN
  - train/dev/test set
  - Bias/Variance
  - basic recipe
  - Regularization
    - Logistic Regression
    - Neural network
    - other ways
  - optimization problem
    - Normalizing inputs
    - vanishing/exploding gradients
    - weight initialization
  - gradient check
    - Numerical approximation
    - grad check
train/dev/test set
- 0.7/0.3 (train/test) or 0.6/0.2/0.2 (train/dev/test) for small datasets (roughly 100–10,000 examples)
- 0.98/0.01/0.01 or even more extreme splits for big data, since 1% of a very large dataset is already plenty for dev and test
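A minimal sketch of such a split in NumPy (the ratios, names, and layout are illustrative; columns of `X` are examples, as in the course convention):

```python
import numpy as np

def split_dataset(X, Y, train=0.6, dev=0.2, seed=0):
    """Shuffle example columns, then slice into train/dev/test sets."""
    m = X.shape[1]
    perm = np.random.default_rng(seed).permutation(m)
    X, Y = X[:, perm], Y[:, perm]
    i, j = int(train * m), int((train + dev) * m)
    return (X[:, :i], Y[:, :i]), (X[:, i:j], Y[:, i:j]), (X[:, j:], Y[:, j:])
```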
Bias/Variance
Bias measures a single model's ability to fit the data, while variance measures how stable the same model is across different datasets.
high variance -> high dev set error
high bias -> high train set error
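A quick worked example, assuming human-level (Bayes) error is near 0%:
- train error 1%, dev error 11% -> high variance
- train error 15%, dev error 16% -> high bias
- train error 15%, dev error 30% -> high bias and high variance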
basic recipe
high bias -> bigger network / train longer / more advanced optimization algorithms / different NN architecture
high variance -> more data / regularization / different NN architecture
Regularization
Logistic Regression
L2 regularization adds a penalty on the weights to the cost being minimized:

$$\min_{w,b} J(w,b),\qquad J(w,b)=\frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(\hat y^{(i)},y^{(i)})+\frac{\lambda}{2m}\Vert w\Vert_2^2$$
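A minimal NumPy sketch of this regularized cost (the function and variable names are illustrative; `lambd` avoids shadowing Python's `lambda` keyword):

```python
import numpy as np

def l2_regularized_cost(A, Y, w, lambd):
    """Cross-entropy cost plus the L2 penalty (lambda / 2m) * ||w||_2^2."""
    m = Y.shape[1]
    cross_entropy = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    l2_penalty = (lambd / (2 * m)) * np.sum(np.square(w))
    return cross_entropy + l2_penalty
```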
Neural network
For a multi-layer network, the penalty uses the Frobenius norm of each layer's weight matrix:

$$\Vert w^{[l]}\Vert_F^2=\sum_{i=1}^{n^{[l]}}\sum_{j=1}^{n^{[l-1]}}\left(w_{i,j}^{[l]}\right)^2$$

Dropout regularization (inverted dropout, illustrated for layer 3):

```python
d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob  # mask: keep each unit with probability keep_prob
a3 = np.multiply(a3, d3)   # zero out the dropped units
a3 /= keep_prob            # scale up so the expected value of a3 is unchanged
```
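Because the surviving activations are scaled up by 1/keep_prob during training (that is what makes this *inverted* dropout), their expected value is unchanged, so no extra scaling is needed at test time; dropout is simply turned off when making predictions.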
other ways
- early stopping (sketched below)
- data augmentation
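A minimal sketch of an early-stopping loop; `init_params`, `train_one_epoch`, and `dev_error` are hypothetical helpers standing in for your training code:

```python
params = init_params()                 # hypothetical: initialize weights
best_error, best_params, bad_epochs, patience = float("inf"), None, 0, 5
for epoch in range(100):
    params = train_one_epoch(params)   # hypothetical: one pass over the training set
    error = dev_error(params)          # hypothetical: current error on the dev set
    if error < best_error:
        best_error, best_params, bad_epochs = error, params, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:     # no dev improvement for `patience` epochs
            break
params = best_params                   # restore the weights with the lowest dev error
```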
optimization problem
Setting up the optimization problem well speeds up the training of your neural network.
Normalizing inputs
- subtract mean
$$\mu=\frac{1}{m}\sum_{i=1}^{m}x^{(i)},\qquad x:=x-\mu$$
- normalize variance
$$\sigma^2=\frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)}\right)^2,\qquad x:=x/\sigma$$
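A combined sketch of the two steps above in NumPy (columns of `X` are examples; the same `mu` and `sigma2` computed on the training set should be reused to normalize the dev and test sets):

```python
import numpy as np

def normalize_inputs(X):
    """Zero-center each feature, then scale it to unit variance."""
    mu = np.mean(X, axis=1, keepdims=True)           # per-feature mean
    X = X - mu
    sigma2 = np.mean(X ** 2, axis=1, keepdims=True)  # per-feature variance (after centering)
    X = X / np.sqrt(sigma2)
    return X, mu, sigma2
```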
vanishing/exploding gradients
In a deep (nearly linear) network, $y=w^{[L]}w^{[L-1]}\cdots w^{[2]}w^{[1]}x$. If $w^{[l]}>I$, then $(w^{[l]})^{L}\rightarrow\infty$ (exploding gradients); if $w^{[l]}<I$, then $(w^{[l]})^{L}\rightarrow 0$ (vanishing gradients).
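A quick numeric illustration of the effect (depth and weight scales are illustrative):

```python
import numpy as np

L, n = 50, 4
x = np.ones((n, 1))
for scale in (1.5, 0.5):            # weights slightly above / below the identity
    W = scale * np.eye(n)
    y = x
    for _ in range(L):
        y = W @ y                   # linear forward pass through L layers
    print(scale, np.linalg.norm(y)) # 1.5 -> ~1e9 (explodes), 0.5 -> ~2e-15 (vanishes)
```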
weight initialization
$$\mathrm{var}(w)=\frac{1}{n^{[l-1]}},\qquad w^{[l]}=\texttt{np.random.randn(shape)}\times\sqrt{\frac{1}{n^{[l-1]}}}$$
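A runnable version for a single layer (the layer sizes here are illustrative; for ReLU activations, variance $2/n^{[l-1]}$, i.e. He initialization, is the usual variant):

```python
import numpy as np

n_prev, n_l = 512, 256                                  # fan-in / fan-out of layer l (illustrative)
W = np.random.randn(n_l, n_prev) * np.sqrt(1 / n_prev)  # var(w) = 1 / n^{[l-1]}
b = np.zeros((n_l, 1))                                  # biases can start at zero
print(W.var())                                          # close to 1/512 ~ 0.00195
```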
gradient check
Numerical approximation
$$f(\theta)=\theta^3,\qquad f'(\theta)\approx\frac{f(\theta+\varepsilon)-f(\theta-\varepsilon)}{2\varepsilon}$$
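Checking the two-sided approximation on $f(\theta)=\theta^3$; the two-sided difference has error $O(\varepsilon^2)$, versus $O(\varepsilon)$ for the one-sided version, which is why grad check uses it:

```python
theta, eps = 1.0, 0.01
f = lambda t: t ** 3
approx = (f(theta + eps) - f(theta - eps)) / (2 * eps)
print(approx)   # 3.0001, vs. the exact f'(1) = 3*theta^2 = 3
```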
grad check
For each parameter $\theta_i$, approximate the partial derivative with a two-sided difference and compare it to the analytic gradient:

$$d\theta_{\text{approx}}[i]=\frac{J(\theta_1,\ldots,\theta_i+\varepsilon,\ldots)-J(\theta_1,\ldots,\theta_i-\varepsilon,\ldots)}{2\varepsilon}\approx d\theta[i]$$

check:

$$\frac{\Vert d\theta_{\text{approx}}-d\theta\Vert_2}{\Vert d\theta_{\text{approx}}\Vert_2+\Vert d\theta\Vert_2}\approx 10^{-7}$$
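A minimal sketch of the check for a parameter vector, assuming a hypothetical cost function `J(theta)` and its analytic gradient `dtheta` (a relative difference near $10^{-7}$ is great; near $10^{-3}$ usually means a bug in the backprop code):

```python
import numpy as np

def grad_check(J, theta, dtheta, eps=1e-7):
    """Compare the analytic gradient dtheta to a two-sided numerical one."""
    dtheta_approx = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += eps
        minus[i] -= eps
        dtheta_approx[i] = (J(plus) - J(minus)) / (2 * eps)
    return (np.linalg.norm(dtheta_approx - dtheta)
            / (np.linalg.norm(dtheta_approx) + np.linalg.norm(dtheta)))

# usage: J(theta) = sum(theta^2), whose gradient is 2*theta
theta = np.array([1.0, -2.0, 3.0])
print(grad_check(lambda t: np.sum(t ** 2), theta, 2 * theta))  # ~1e-10: gradient is correct
```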