Hugh

ML: Understanding softmax

Posted on 2020-02-25 Edited on 2020-02-27

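What softmax means

softmax maps a set of inputs to real numbers between 0 and 1 and normalizes them so that they sum to 1. As a result, the probabilities over the classes of a multi-class problem also sum to exactly 1.

First, a simple way to read the name. softmax is made of two words, one of which is max. We all know max: given two variables a and b, max is a if a > b, and b otherwise.
The other word is soft. What is the problem with max? Viewed as a classification rule, max is black and white: the final output is a single definite value. More often we want the output to be the probability of each class. In other words, we want the highest-scoring item to be picked most of the time, while lower-scoring items still get picked occasionally with some probability. That is where soft comes in: the final output is the probability of each class being picked.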

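Computing softmax

The computation is shown in the figure below.

Suppose we have an array V with n elements, and $ V_i $ is the i-th element of V. The softmax value of this element is:
$$ s_i = \frac{e^{V_i}}{\sum_{j = 1}^{n} e^{V_j}} $$
That is, the softmax value of an element is the ratio of its exponential to the sum of the exponentials of all elements.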
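The definition above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `softmax` and the example scores are chosen here for demonstration, and subtracting the maximum before exponentiating is a common numerical-stability trick that is not part of the definition itself (it leaves the result unchanged).

```python
import numpy as np

def softmax(v):
    """Return the softmax of a 1-D array of scores."""
    v = np.asarray(v, dtype=np.float64)
    # Subtracting the max does not change the ratio but avoids overflow in exp.
    shifted = v - np.max(v)
    exp_v = np.exp(shifted)
    return exp_v / np.sum(exp_v)

scores = np.array([3.0, 1.0, -3.0])  # illustrative scores
probs = softmax(scores)
print(probs)        # three values, each strictly between 0 and 1
print(probs.sum())  # 1.0 up to floating-point rounding
```

The largest score receives the largest probability, but the smaller scores keep a nonzero share, which is the "soft" part of the name.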

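The definition is simple and intuitive. So why does it take this form? There are two main reasons.

1. softmax was designed so that features affect the probability multiplicatively. Because $ e^{a + b} = e^a \cdot e^b $, adding to a score multiplies its unnormalized probability by a constant factor.
2. The objective for multi-class classification is usually cross-entropy, $ L = - \sum_{k} t_k \cdot \ln P(y = k) $, where $ t_k $ is 1 for the target class and 0 for every other class. Combined with softmax, this loss has a particularly simple derivative.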

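Differentiating softmax

In multi-class problems we usually use cross-entropy as the loss function:
$$ Loss = - \sum_i t_i \ln{y_i} $$
Here $ t_i $ is the true value and $ y_i $ is the computed softmax value.
When the true class is the i-th one, $ t_i = 1 $ and every other $ t $ is 0, so the loss reduces to:
$$ Loss_i = -\ln {y_i} $$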
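To make the reduction concrete, the following sketch evaluates the cross-entropy sum with a one-hot target and compares it to the single term $ -\ln y_i $. The helper name `cross_entropy` and the probability vector are hypothetical values chosen for illustration only.

```python
import numpy as np

def cross_entropy(t, y):
    """Cross-entropy between a target distribution t and predicted probabilities y."""
    t = np.asarray(t, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return -np.sum(t * np.log(y))

y = np.array([0.7, 0.2, 0.1])  # hypothetical softmax output
t = np.array([1.0, 0.0, 0.0])  # one-hot target: the true class is index 0

loss = cross_entropy(t, y)
single_term = -np.log(y[0])
print(loss, single_term)  # the two values agree
```

Only the term of the true class survives the sum, so the loss depends solely on the probability assigned to the correct class.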

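Next we differentiate $ Loss $. By definition:
$$ y_i = \frac{e^{V_i}} {\sum_j e^{V_j}} $$

Since the values have been mapped into the range 0 to 1 and sum to 1, we have:
$$ \frac{e^{V_i}} {\sum_j e^{V_j}} = 1 - \frac{ \sum_{j \neq i} e^{V_j}} { \sum_j e^{V_j}} $$

Taking the derivative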
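The derivation is cut off at this point in the post. The well-known result for softmax combined with cross-entropy, writing $ V_k $ for the k-th input score and $ t $ for the one-hot target, is $ \partial Loss / \partial V_k = y_k - t_k $. The sketch below does not derive this; it only checks it numerically with central finite differences. All function names and the example scores are assumptions made for illustration.

```python
import numpy as np

def softmax(v):
    shifted = v - np.max(v)          # stability shift, result unchanged
    exp_v = np.exp(shifted)
    return exp_v / np.sum(exp_v)

def loss_fn(v, target):
    """Cross-entropy loss -ln(y_target) for scores v and true class index target."""
    return -np.log(softmax(v)[target])

def numerical_grad(v, target, eps=1e-6):
    """Central finite-difference approximation of dLoss/dV."""
    grad = np.zeros_like(v)
    for k in range(v.size):
        step = np.zeros_like(v)
        step[k] = eps
        grad[k] = (loss_fn(v + step, target) - loss_fn(v - step, target)) / (2 * eps)
    return grad

v = np.array([2.0, -1.0, 0.5])       # illustrative scores
target = 0                           # index of the true class
t = np.zeros_like(v)
t[target] = 1.0

analytic = softmax(v) - t            # y - t
numeric = numerical_grad(v, target)
print(np.max(np.abs(analytic - numeric)))  # a very small number
```

The gradient is negative for the true class (pushing its score up) and positive for all others, and its components sum to zero because both $ y $ and $ t $ sum to 1.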

TF2-fashion_mnist classification model

Posted on 2020-02-25 Edited on 2020-03-02 In TF2

Classify the images using three fully connected layers.


import matplotlib as mpl
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow import keras

from sklearn.preprocessing import StandardScaler
import numpy as np

# Load the data
fashion_mnist = keras.datasets.fashion_mnist
(x_train_all, y_train_all), (x_test, y_test) = fashion_mnist.load_data()

# Split the training data into a validation set (first 5000) and a training set
x_valid, x_train = x_train_all[:5000], x_train_all[5000:]
y_valid, y_train = y_train_all[:5000], y_train_all[5000:]

print(x_valid.shape, y_valid.shape)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

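# Standardize: fit one mean and standard deviation on the training pixels,
# then apply the same transform to the validation and test sets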
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(
    x_train.astype(np.float32).reshape(-1, 1)).reshape(-1, 28, 28)

x_valid_scaled = scaler.transform(
    x_valid.astype(np.float32).reshape(-1, 1)).reshape(-1, 28, 28)

x_test_scaled = scaler.transform(
    x_test.astype(np.float32).reshape(-1, 1)).reshape(-1, 28, 28)
print(np.max(x_train_scaled), np.min(x_train_scaled))


def show_single_image(img_arr):
    plt.imshow(img_arr, cmap="binary")
    plt.show()

show_single_image(x_train[0])

def show_imgs(n_rows, n_cols, x_data, y_data, class_names):
    plt.figure(figsize = (n_cols * 1.4, n_rows * 1.6))
    for row in range(n_rows):
        for col in range(n_cols):
            index = n_cols * row + col
            plt.subplot(n_rows, n_cols, index + 1)
            plt.imshow(x_data[index], cmap = "binary", interpolation = "nearest")
            plt.axis('off')
            plt.title(class_names[y_data[index]])
    plt.show()
class_names = ['T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

show_imgs(3, 5, x_train, y_train, class_names)

# Build the model
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape = [28, 28]),
    keras.layers.Dense(300, activation = 'relu'),
    keras.layers.Dense(100, activation = 'relu'),
    keras.layers.Dense(10, activation = 'softmax')
])

# Compile the model
model.compile(loss = 'sparse_categorical_crossentropy',
              optimizer = "sgd",
              metrics=["accuracy"])

model.summary()

# Train the model on the standardized data (the scaled arrays computed above)
history = model.fit(x_train_scaled, y_train, epochs = 10,
                    validation_data = (x_valid_scaled, y_valid))

history.history
import pandas as pd
def plot_learning_curves(history):
    pd.DataFrame(history.history).plot(figsize=(8,5))
    plt.grid(True)
    plt.gca().set_ylim(0, 5)
    plt.show()

plot_learning_curves(history)
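The normalization step in the listing above reshapes each image array to a single column with `reshape(-1, 1)` before scaling, which means one global mean and one standard deviation are computed over all pixels rather than one per pixel position. The NumPy-only sketch below mimics that behaviour on random stand-in data; the array `images` is hypothetical and is not the Fashion-MNIST dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for image data: 4 "images" of 28x28 pixels in 0..255.
images = rng.integers(0, 256, size=(4, 28, 28)).astype(np.float32)

# One mean and one standard deviation over every pixel, which is what
# fitting a scaler on an array reshaped to (-1, 1) amounts to.
mean = images.mean()
std = images.std()  # population standard deviation (ddof=0)

scaled = (images - mean) / std
print(scaled.shape)                 # (4, 28, 28)
print(scaled.mean(), scaled.std())  # approximately 0 and 1
```

Using statistics from the training set only, and reusing them for validation and test data, keeps the three sets on the same scale without leaking information from held-out data.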