Building a Horse Racing Prediction Model with Deep Learning


Verifying the Deep-Learning Model Analysis and Management Tools¶

0. Importing the Required Modules¶

Parts of the source code are paid content. If you would like to develop a horse racing prediction program under the same conditions, please consider purchasing the source.
List of published sources: https://keiba-ds-lab.com/bookers-article-lists/

In [1]:
import gc
import sys
import warnings
import pathlib
import numpy as np
import random
import torch
import torch.nn as nn


def set_seed(seed):
    random.seed(seed)  # seed Python's built-in RNG
    np.random.seed(seed)  # seed NumPy's RNG
    torch.manual_seed(seed)  # seed PyTorch's RNG (CPU)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)  # CUDA RNG (single GPU)
        torch.cuda.manual_seed_all(seed)  # all GPUs
    # Adjust PyTorch backend settings for reproducibility
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


# Fix the random seed
set_seed(42)


sys.path.append(".")
sys.path.append("..")
sys.path.append("../..")
from src.data_manager.preprocess_tools import DataPreProcessor  # noqa
from src.data_manager.data_loader import DataLoader  # noqa
from src.core.meta.bet_name_meta import BetName  # noqa
from src.model_manager.pytorch_manager import PyTorchModelManager  # noqa
from src.model_manager.base_manager import BaseModelManager  # noqa
from src.model_manager.pytorch_utils.custom_dataset.thirdmodel_dataset import CustomKaibaAIDatasetForMultiTask  # noqa
warnings.filterwarnings("ignore")

cache_file_path = pathlib.Path(
    f"data/cache_0001_{CustomKaibaAIDatasetForMultiTask.__name__}.pkl")

root_dir = pathlib.Path(".").absolute().parent.parent

start_year = 2010  # oldest year available in the DB
split_year = 2010  # first year of the training period
target_year = 2019  # first year of the test period
end_year = 2023  # last year of the test period (the DB must of course contain data for it)

# Create the instances
data_loader = DataLoader(
    start_year,
    end_year,
    dbpath=root_dir / "data/keibadata.db"  # adjust dbpath to your environment; an absolute path is recommended
)

dataPreP = DataPreProcessor(
    # Set True to use the cache feature. Default: True
    use_cache=True,
    cache_dir=pathlib.Path("./data")
)
In [2]:
pytorch_model_manager = PyTorchModelManager(
    # Path to the folder (named after the model) under the models directory. An absolute path is safest.
    root_dir / "models" / "DL_first_model",
    split_year,
    target_year,
    end_year,
)
2025-01-21 15:51:28.987 | INFO     | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_type, val=NeauralNetwork
2025-01-21 15:51:28.987 | INFO     | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_id, val=DL_first_model
2025-01-21 15:51:28.987 | INFO     | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_dir, val=e:\dev_um_ai\dev-um-ai\models\DL_first_model
2025-01-21 15:51:28.987 | INFO     | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_analyze_dir, val=e:\dev_um_ai\dev-um-ai\models\DL_first_model\analyze
2025-01-21 15:51:28.987 | INFO     | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_predict_dir, val=e:\dev_um_ai\dev-um-ai\models\DL_first_model\analyze\00_predict
2025-01-21 15:51:28.987 | INFO     | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_column, val=pred_prob
2025-01-21 15:51:28.987 | INFO     | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_rank_column, val=pred_rank

1. Loading the Racing Data and Running Preprocessing¶

Process the racing data with the preprocessing module built in Roadmap 3-1.

In [3]:
df = data_loader.load_racedata()
dfblood = data_loader.load_horseblood()

df = dataPreP.exec_pipeline(
    df,
    dfblood,
    blood_set=["s", "b", "bs", "bbs", "ss", "sss", "ssss", "bbbs"],
    lagN=5
)
2025-01-21 15:51:29.507 | INFO     | src.data_manager.data_loader:load_racedata:23 - Get Year Range: 2010 -> 2023.
2025-01-21 15:51:29.507 | INFO     | src.data_manager.data_loader:load_racedata:24 - Loading Race Info ...
2025-01-21 15:51:31.065 | INFO     | src.data_manager.data_loader:load_racedata:26 - Loading Race Data ...
2025-01-21 15:51:42.141 | INFO     | src.data_manager.data_loader:load_racedata:28 - Merging Race Info and Race Data ...
2025-01-21 15:51:43.472 | INFO     | src.data_manager.data_loader:load_horseblood:45 - Loading Horse Blood ...
2025-01-21 15:52:17.429 | INFO     | src.data_manager.preprocess_tools:load_cache:760 - Loading Cache. file: data\cache_data.pkl
2025-01-21 15:52:30.996 | INFO     | src.data_manager.preprocess_tools:load_cache:771 - Check Cache version... cache ver: 14
2025-01-21 15:52:30.996 | INFO     | src.data_manager.preprocess_tools:exec_pipeline:170 - OK! Completed Loading Cache File. cache ver: 14

2. Preparing the Explanatory Variables¶

Separate the explanatory variables into numerical and categorical features.
Unlike LightGBM, deep learning requires categorical variables to be properly encoded as numbers.
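As a minimal sketch of what that encoding means (toy data, not the project's actual encoder): each category gets a contiguous integer id, with one extra id reserved at the end for unknown/padding values, matching the `padding_idx=cat_num-1` convention that `nn.Embedding` uses in the model later on.

```python
import pandas as pd

# Toy example (not the project's actual encoder): map each category to a
# contiguous integer id; the last id is reserved for unseen/padding values.
df_toy = pd.DataFrame({"sex_en": ["M", "F", "M", "G"]})
codes, uniques = pd.factorize(df_toy["sex_en"])
df_toy["sex_en_code"] = codes      # [0, 1, 0, 2]
cat_num = len(uniques) + 1         # one extra slot for unseen/padding -> 4
```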

In [4]:
# Features used for model building
# Numerical features
num_feas = [
    'distance',
    'number',
    'boxNum',
    'age',
    'jweight',
    'weight',
    'gl',
    'race_span_fill',
] + ['winR_stallion', 'winR_breed', 'winR_bStallion', 'winR_b2Stallion']
# Categorical features
cat_feas = [
    'place_en',
    'field_en',
    'sex_en',
    'condition_en',
    'jockeyId',
    'teacherId',
    'dist_cat_en',
    'horseId',
    "raceGrade", "stallionId", "breedId", "bStallionId", "b2StallionId"
]
if not cache_file_path.exists():
    dataset_mapping = pytorch_model_manager.make_dataset_mapping(df)
del dfblood, df
gc.collect()
Out[4]:
0

Run the standardization of the numerical variables and the rest of the base preprocessing here.

In [5]:
# Columns targeted by the base preprocessing
# Features standardized over all races
sd_by_all_race = ["distance", "age"]
# Features standardized within each race
sd_by_a_race = ["number", "boxNum", "odds", "favorite", "jweight", "weight"]

# Stationary categories whose cardinality stays the same across all periods
stationary_category = ["raceGrade", "place_en",
                       "field_en", "sex_en", "condition_en", "dist_cat_en"]
# Categories whose cardinality can change over time
non_stationary_category = ["jockeyId", "teacherId", "horseId",
                           "stallionId", "breedId", "bStallionId", "b2StallionId"]

if not cache_file_path.exists():
    # Run the base preprocessing on the dataset mapping

    pytorch_model_manager.base_preprocess(
        dataset_mapping,
        sd_by_a_race,
        sd_by_all_race,
        non_stationary_category,
        stationary_category
    )
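To illustrate the two standardization modes above on toy data (a sketch, not the manager's internal implementation): globally standardized features get a single z-score over all rows, while per-race features are z-scored within each `raceId` group.

```python
import pandas as pd

# Toy frame (hypothetical values) with two races of two horses each
df_toy = pd.DataFrame({
    "raceId": [1, 1, 2, 2],
    "number": [1, 2, 1, 2],
    "distance": [1200.0, 1200.0, 2400.0, 2400.0],
})
# Standardize over all races (as for "distance", "age")
s = df_toy["distance"]
df_toy["distance_sd"] = (s - s.mean()) / s.std()
# Standardize within each race (as for "number", "weight", ...)
df_toy["number_sd"] = df_toy.groupby("raceId")["number"].transform(
    lambda g: (g - g.mean()) / g.std())
```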

3. Building the Datasets¶

Define the dataset-building module via the pytorch_model_manager.setup_dataset_moduler method

Build the datasets for training the deep-learning model with the pytorch_model_manager.setup_dataset method

In [6]:
from src.model_manager.pytorch_manager import PyTorchDataset
import pickle

num_feas2 = pytorch_model_manager.convert_features_name(num_feas)
cat_feas2 = pytorch_model_manager.convert_features_name(cat_feas)


def setup_dataset_moduler(dataset_dict: PyTorchDataset, num_feas2: list[str], cat_feas2: list[str]):
    cat_num_list = CustomKaibaAIDatasetForMultiTask.generate_cat_num_list(
        dataset_dict.train[cat_feas2], cat_feas2)
    dataset_dict.train_dataset = CustomKaibaAIDatasetForMultiTask(
        dataset_dict.train, num_feas2, cat_feas2, cat_num_list)

    dataset_dict.valid_dataset = CustomKaibaAIDatasetForMultiTask(
        dataset_dict.valid, num_feas2, cat_feas2, cat_num_list)

    dataset_dict.test_dataset = CustomKaibaAIDatasetForMultiTask(
        dataset_dict.test, num_feas2, cat_feas2, cat_num_list)
    dataset_dict.cat_num_list = cat_num_list
    dataset_dict.params = (
        dataset_dict.cat_num_list,
        len(num_feas2)
    )


if not cache_file_path.exists():
    # Set the custom module that matches the dataset class
    pytorch_model_manager.setup_dataset_moduler = setup_dataset_moduler

    dataset_mapping = pytorch_model_manager.setup_dataset(
        dataset_mapping, [num_feas2, cat_feas2])

    with open(cache_file_path, "wb") as f:
        pickle.dump(dataset_mapping, f)
else:
    with open(cache_file_path, "rb") as f:
        dataset_mapping: dict[str, PyTorchDataset] = pickle.load(f)
Defining the loss function
In [7]:
# Loss function


class MultiTaskLoss(nn.Module):
    def __init__(self, l1=1/3, l2=1/3, l3=1/3, topn=1):
        super(MultiTaskLoss, self).__init__()
        self.l1 = l1
        self.l2 = l2
        self.l3 = l3
        self.topn = topn
        self.KLloss1 = nn.KLDivLoss(reduction="batchmean")
        self.KLloss2 = nn.KLDivLoss(reduction="batchmean")
        # self.MAE = nn.MSELoss()
        self.MAE = nn.L1Loss()
        self.loss1 = 0
        self.loss2 = 0
        self.loss3 = 0

    def forward(self, pred1: torch.Tensor, pred2: torch.Tensor, label: torch.Tensor, odds_rate: torch.Tensor,  mask: torch.Tensor) -> torch.Tensor:
        if len(pred1.shape) == 1:
            batch_size = 1
            pred1, pred2, label, odds_rate, mask = pred1.unsqueeze(0), pred2.unsqueeze(
                0), label.unsqueeze(0), odds_rate.unsqueeze(0), mask.unsqueeze(0)
        else:
            batch_size = pred1.shape[0]

        # loss1
        _, indices = torch.topk(label, k=self.topn, largest=False)
        _, indices2 = torch.topk(label, k=18-self.topn)
        ans_proba = torch.cat([torch.gather(odds_rate, dim=1, index=indices), torch.gather(
            odds_rate, dim=1, index=indices2).sum(dim=1).reshape(batch_size, 1)], dim=1)

        _, indices_pred1 = torch.topk(pred1, k=self.topn)
        _, indices_pred2 = torch.topk(pred1, k=18-self.topn, largest=False)
        pred_proba = torch.cat([torch.gather(pred2, dim=1, index=indices_pred1), torch.gather(
            pred2, dim=1, index=indices_pred2).sum(dim=1).reshape(batch_size, 1)], dim=1)

        loss1: torch.Tensor = self.KLloss1(nn.functional.log_softmax(
            pred_proba, dim=1), ans_proba)

        self.loss1 = loss1.detach().cpu().item()

        pred2 = torch.log(torch.clip(
            nn.functional.softmax(pred2, dim=1), 0.8/500))
        odds_rate = torch.log(torch.clip(odds_rate, 0.8/500))

        loss2: torch.Tensor = self.MAE(pred2, odds_rate)
        self.loss2 = loss2.detach().cpu().item()

        # loss3
        ans_rank = (19 - label)
        pair_diff = pred1.unsqueeze(2) - pred1.unsqueeze(1)
        # pair_labels = (ans_rank.unsqueeze(2) - ans_rank.unsqueeze(1)).sign()
        pair_labels = 5*(ans_rank.unsqueeze(2) - ans_rank.unsqueeze(1))
        # Logistic loss, formulated for numerical stability
        log_loss = nn.functional.softplus(-pair_labels * pair_diff)
        # Return the mean loss
        loss3: torch.Tensor = log_loss.mean()
        self.loss3 = loss3.detach().cpu().item()

        return self.l1*loss1 + self.l2*loss2 + self.l3*loss3
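To see what the pairwise term (loss3) rewards, here is a NumPy-only sketch of the same softplus logistic loss on a hypothetical 3-horse race (the real loss runs on torch tensors): orderings that agree with the true ranks get a smaller loss.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x))
    return np.logaddexp(0.0, x)

# ans_rank plays the role of "19 - label" above: bigger = better finish
ans_rank = np.array([3.0, 1.0, 2.0])

def pairwise_loss(pred):
    pair_diff = pred[:, None] - pred[None, :]
    pair_labels = 5 * (ans_rank[:, None] - ans_rank[None, :])
    return softplus(-pair_labels * pair_diff).mean()

good = pairwise_loss(np.array([2.0, 0.5, 1.0]))  # ordering matches ans_rank
bad = pairwise_loss(np.array([0.5, 2.0, 1.0]))   # ordering reversed
# good < bad: concordant score orderings are penalized less
```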

Defining the model to train
Since it has to be written completely from scratch, a good approach is to bounce ideas off a generative AI and have it produce a first draft.

In [8]:
# Model definition
import math


class KeibaAIThirdModelForMultiTask(nn.Module):
    def __init__(self, cat_num_list: list[int], numerous_feature_num: int) -> None:
        super(KeibaAIThirdModelForMultiTask, self).__init__()

        cat_embed_list = []
        self.embed_num_list = []
        for cat_num in cat_num_list:
            embed_num = round(math.sqrt(cat_num))
            self.embed_num_list += [embed_num]
            embed_layer = nn.Embedding(
                cat_num, embed_num, padding_idx=cat_num-1)
            cat_embed_list += [embed_layer]
        # Embedding layers that vectorize the categorical features
        self.cat_embed_list = nn.ModuleList(cat_embed_list)
        self.cat_num_list = cat_num_list

        # Convolutional layer
        self.conv1 = nn.Conv2d(
            in_channels=1, out_channels=8, kernel_size=3, stride=1, padding=1)

        # Convolutional layer (shared)
        self.shared_conv = nn.Conv2d(
            in_channels=8, out_channels=8, kernel_size=3, stride=1, padding=1)

        self.task1_input0 = nn.Linear(
            8 * (18//4) * (((numerous_feature_num+sum(self.embed_num_list))//2)//2), 2048)
        self.task2_input0 = nn.Linear(
            8 * (18//4) * (((numerous_feature_num+sum(self.embed_num_list))//2)//2), 2048)

        self.task1_input1 = nn.Linear(2048, 512)
        self.task2_input1 = nn.Linear(2048, 512)

        self.task1_input2 = nn.Linear(512, 256)
        self.task2_input2 = nn.Linear(512, 256)

        self.task1_input3 = nn.Linear(256, 128)
        self.task2_input3 = nn.Linear(256, 128)

        self.task1_input4 = nn.Linear(128, 64)
        self.task2_input4 = nn.Linear(128, 64)

        # Task 1: finishing-order estimation
        self.fc_task1 = nn.Linear(64, 18)
        # Task 2: odds-distribution estimation
        self.fc_task2 = nn.Linear(64, 18)

        self.drop_rate = 0.25
        self.dropout = nn.Dropout(p=self.drop_rate)
        self.relu = nn.ReLU()

        # self.log_softmax = nn.LogSoftmax(dim=0)
        self.mode = torch.tensor(0)
        self.max_pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.max_pool2 = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x_num: torch.Tensor, x_cat_list: torch.Tensor) -> torch.Tensor:
        x_cat_list = x_cat_list.int()
        cat_embed_list = [self.cat_embed_list[idx](
            x_cat.T) for idx, x_cat in enumerate(x_cat_list.T)]
        cat_embed_list = torch.cat(
            cat_embed_list, dim=2 if len(x_cat_list.shape) > 2 else 1)
        x = torch.cat([x_num, cat_embed_list],
                      dim=2 if len(x_num.shape) > 2 else 1).unsqueeze(1 if len(x_num.shape) > 2 else 0)

        # Convolution + pooling
        # input (batch_size, 1, 18, n_features) -> (batch_size, 8, 18, n_features)
        x = self.relu(self.conv1(x))
        # (batch_size, 8, 18, n_features) -> (batch_size, 8, 9, n_features//2)
        x = self.max_pool1(x)

        # Shared convolution
        # (batch_size, 8, 9, n_features//2) -> (batch_size, 8, 9, n_features//2)
        x = self.relu(self.shared_conv(x))
        # (batch_size, 8, 9, n_features//2) -> (batch_size, 8, 4, n_features//4)
        x = self.max_pool2(x)

        # Flatten
        if len(x_num.shape) > 2:
            x = x.view(x.size(0), -1)
        else:
            x = x.view(-1)

        # Task 1: finishing-order estimation
        x_task1 = self.dropout(self.relu(self.task1_input0(x)))
        x_task1 = self.dropout(self.relu(self.task1_input1(x_task1)))
        x_task1 = self.dropout(self.relu(self.task1_input2(x_task1)))
        x_task1 = self.dropout(self.relu(self.task1_input3(x_task1)))
        x_task1 = self.dropout(self.relu(self.task1_input4(x_task1)))
        task1_output: torch.Tensor = self.fc_task1(x_task1)

        # Task 2: odds-distribution estimation
        x_task2 = self.dropout(self.relu(self.task2_input0(x)))
        x_task2 = self.dropout(self.relu(self.task2_input1(x_task2)))
        x_task2 = self.dropout(self.relu(self.task2_input2(x_task2)))
        x_task2 = self.dropout(self.relu(self.task2_input3(x_task2)))
        x_task2 = self.dropout(self.relu(self.task2_input4(x_task2)))
        task2_output: torch.Tensor = self.fc_task2(x_task2)

        # task2_output = self.log_softmax(task2_output)
        if self.mode < 1:
            return task1_output, task2_output
        elif self.mode < 2:
            return task1_output
        else:
            return task2_output

    def predict(self, x_num: torch.Tensor, x_cat_list: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        output1, output2 = self.forward(x_num, x_cat_list)
        return output1[mask > 0], output2[mask > 0]
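The embedding sizes and the flattened width feeding `task1_input0`/`task2_input0` follow directly from the constructor above; a quick arithmetic check, with hypothetical category counts and numeric-feature count, looks like this:

```python
import math

# Hypothetical category counts; the model sizes each embedding as
# round(sqrt(cat_num)), reserving the last index (padding_idx) for padding.
cat_num_list = [10, 100, 5000]
embed_num_list = [round(math.sqrt(n)) for n in cat_num_list]  # [3, 10, 71]

# Flattened width after two 2x2 max-pools over an (18 x n_features) input
# with 8 output channels, assuming 12 numeric features:
numerous_feature_num = 12
flat = 8 * (18 // 4) * (((numerous_feature_num + sum(embed_num_list)) // 2) // 2)
```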

4. Building the Model¶

Define the model-training module with the pytorch_model_manager.train_moduler method

In [ ]:
import tqdm

# Device setup (use the GPU when available)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

optim_param = {}
scheduler_param = {}


def train_moduler(
    target_dataset: PyTorchDataset
):
    train_dataloader = torch.utils.data.DataLoader(
        target_dataset.train_dataset, batch_size=1024, shuffle=True)
    valid_dataloader = torch.utils.data.DataLoader(target_dataset.valid_dataset, batch_size=len(
        target_dataset.valid_dataset), shuffle=False)
    test_dataloader = torch.utils.data.DataLoader(
        target_dataset.test_dataset, batch_size=len(target_dataset.test_dataset), shuffle=False)

    loss_list_all = []
    # Training loop
    num_epochs = 100
    valid_loss, test_loss, best_loss, global_step = 0, 0, 0, 0
    torch.cuda.empty_cache()

    model: nn.Module = KeibaAIThirdModelForMultiTask(
        target_dataset.cat_num_list,
        len(target_dataset.train_dataset.numerous.columns)
    ).to(device)

    param = np.array([1., 1., 1.])
    param /= param.sum()
    loss_fn = MultiTaskLoss(*param.tolist(), topn=3).to(device)

    pytorch_model_manager.expt_info_map.model_class_name = KeibaAIThirdModelForMultiTask.__name__
    # Optimizer
    pytorch_model_manager.optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=optim_param.get("lr", 5e-4),
        weight_decay=optim_param.get("weight_decay", 0.5)
    )

    # Scheduler (ReduceLROnPlateau)
    pytorch_model_manager.scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        pytorch_model_manager.optimizer,
        mode='min',
        factor=scheduler_param.get("factor", 0.5),
        patience=scheduler_param.get("patience", 2),
        verbose=True
    )

    with tqdm.tqdm(total=len(train_dataloader)*num_epochs, desc=f"Epoch 1/{num_epochs}. loss: train=None, valid=None, test=None, Best=None (0)") as pbar:
        for epoch in range(num_epochs):
            torch.cuda.empty_cache()
            model.train()  # training mode
            loss_list = []
            for datas in train_dataloader:
                global_step += 1
                num_data, cat_data_list, labels, odds_rate, mask = datas["num_feas"], datas[
                    "cat_feas"], datas["label"], datas["odds_rate"], datas["mask"]
                # Move data and labels to the GPU
                num_data, cat_data_list, labels, odds_rate, mask = num_data.to(device), cat_data_list.to(device), labels.to(
                    device), odds_rate.to(device), mask.to(device)

                # Forward pass
                pytorch_model_manager.optimizer.zero_grad()  # reset gradients
                outputs = model(num_data, cat_data_list)

                # Compute the loss
                loss = loss_fn(outputs[0], outputs[1],
                               labels, odds_rate, mask)

                # Backward pass
                loss.backward()

                # Optimization step
                pytorch_model_manager.optimizer.step()
                current_lr = pytorch_model_manager.optimizer.param_groups[0]['lr']
                pytorch_model_manager.writer.add_scalar('Loss/Loss1 (top-n odds distribution)',
                                                        loss_fn.loss1, global_step)
                pytorch_model_manager.writer.add_scalar('Loss/Loss2 (odds-estimate MAE)',
                                                        loss_fn.loss2, global_step)
                pytorch_model_manager.writer.add_scalar('Loss/Loss3 (rank pairwise)',
                                                        loss_fn.loss3, global_step)
                pytorch_model_manager.writer.add_scalar(
                    'Loss/Total', loss.item(), global_step)

                # Record the loss
                loss_list += [loss.item()]
                pbar.update()
                pbar.set_description(
                    desc=f"Epoch {epoch+1}/{num_epochs}. loss: train={np.mean(loss_list):.4f}, valid={valid_loss:.4f}, test={test_loss:.4f}, Best={best_loss:.4f} ({pytorch_model_manager.early_stopping.best_epoch})")
                del num_data, cat_data_list, labels, odds_rate, mask, datas, loss

            loss_list_all += [loss_list]

            with torch.no_grad():
                model.eval()  # inference mode
                # Check the loss on the validation data
                for datas_val in valid_dataloader:
                    num_data_val, cat_data_list_val, labels_val, odds_rate_val, mask_val = datas_val["num_feas"], datas_val[
                        "cat_feas"], datas_val["label"], datas_val["odds_rate"], datas_val["mask"]
                    # Move data and labels to the GPU
                    num_data_val, cat_data_list_val, labels_val, odds_rate_val, mask_val = num_data_val.to(device), cat_data_list_val.to(device), labels_val.to(
                        device), odds_rate_val.to(device), mask_val.to(device)

                    # Forward pass
                    outputs = model(num_data_val, cat_data_list_val)
                    # Compute the loss
                    loss_val = loss_fn(outputs[0], outputs[1],
                                       labels_val, odds_rate_val, mask_val)
                    valid_loss = loss_val.item()
                    del num_data_val, cat_data_list_val, labels_val, odds_rate_val, mask_val, datas_val, loss_val

                # Check the loss on the test data
                for datas_te in test_dataloader:
                    num_data_te, cat_data_list_te, labels_te, odds_rate_te, mask_te = datas_te["num_feas"], datas_te[
                        "cat_feas"], datas_te["label"], datas_te["odds_rate"], datas_te["mask"]
                    # Move data and labels to the GPU
                    num_data_te, cat_data_list_te, labels_te, odds_rate_te, mask_te = num_data_te.to(device), cat_data_list_te.to(device), labels_te.to(
                        device), odds_rate_te.to(device), mask_te.to(device)

                    # Forward pass
                    outputs_test = model(num_data_te, cat_data_list_te)
                    # Compute the loss
                    loss_test = loss_fn(
                        outputs_test[0], outputs_test[1], labels_te, odds_rate_te, mask_te)

                    test_loss = loss_test.item()
                    del num_data_te, cat_data_list_te, labels_te, odds_rate_te, mask_te, datas_te, loss_test

                pytorch_model_manager.writer.add_scalar('Epoch Loss/Total-Train',
                                                        np.mean(loss_list), global_step)
                pytorch_model_manager.writer.add_scalar('Epoch Loss/Total-Valid',
                                                        valid_loss, global_step)
                pytorch_model_manager.writer.add_scalar('Epoch Loss/Total-Test',
                                                        test_loss, global_step)

            # Update the learning-rate scheduler
            pytorch_model_manager.scheduler.step(valid_loss, epoch)
            pytorch_model_manager.writer.add_scalar(
                'Learning Rate/current', current_lr, epoch)

            # Early-stopping check
            pytorch_model_manager.early_stopping(
                valid_loss, model, epoch, current_lr)
            best_loss = pytorch_model_manager.early_stopping.best_loss
            pbar.set_description(
                desc=f"Epoch {epoch+1}/{num_epochs}. loss: train={np.mean(loss_list):.4f}, valid={valid_loss:.4f}, test={test_loss:.4f}, Best={best_loss:.4f} ({pytorch_model_manager.early_stopping.best_epoch})")

            if pytorch_model_manager.early_stopping.early_stop:
                print(f"Early stopping at epoch {epoch+1}")
                break


pytorch_model_manager.train_moduler = train_moduler
torch.cuda.empty_cache()

Run the model training with the pytorch_model_manager.train method

In [10]:
import numpy as np

pytorch_model_manager.train(
    dataset_mapping,
    model_class=KeibaAIThirdModelForMultiTask,
    early_params={
        "patience": 5,
        "min_delta": 0.0,
        "lr_threshold": 1e-9
    }
)
2025-01-21 15:53:09.839 | INFO     | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_class_name, val=KeibaAIThirdModelForMultiTask

5. Running Inference on the Datasets¶

Define the inference module with the pytorch_model_manager.predict_moduler method,
then run inference on the target datasets with the pytorch_model_manager.predict method

In [ ]:
def predict_moduler(dataset: PyTorchDataset):
    # Use torch's DataLoader explicitly: the bare name DataLoader is shadowed
    # by src.data_manager.data_loader.DataLoader imported at the top
    train_dataloader = torch.utils.data.DataLoader(
        dataset.train_dataset, batch_size=1024, shuffle=False)
    valid_dataloader = torch.utils.data.DataLoader(
        dataset.valid_dataset, batch_size=1024, shuffle=False)
    test_dataloader = torch.utils.data.DataLoader(
        dataset.test_dataset, batch_size=1024, shuffle=False)
    with torch.no_grad():
        pytorch_model_manager.model.to(pytorch_model_manager.device)
        pytorch_model_manager.model.eval()
        for mode, dataloader in [('train', train_dataloader), ('valid', valid_dataloader), ('test', test_dataloader)]:
            all_outputs0 = []
            all_outputs1 = []
            for datas in dataloader:
                num_data, cat_data_list, labels, odds_rate, mask = datas["num_feas"], datas[
                    "cat_feas"], datas["label"], datas["odds_rate"], datas["mask"]
                # Move data and labels to the GPU/CPU
                num_data, cat_data_list, labels, odds_rate, mask = [
                    num_data.to(pytorch_model_manager.device),
                    cat_data_list.to(pytorch_model_manager.device),
                    labels.to(pytorch_model_manager.device),
                    odds_rate.to(pytorch_model_manager.device),
                    mask.to(pytorch_model_manager.device)
                ]

                # Forward pass
                if hasattr(pytorch_model_manager.model, "predict"):
                    outputs = pytorch_model_manager.model.predict(
                        num_data, cat_data_list, mask)
                else:
                    outputs = pytorch_model_manager.model(num_data, cat_data_list, mask)
                all_outputs0 += outputs[0].to(
                    "cpu").detach().numpy().tolist()
                all_outputs1 += outputs[1].to(
                    "cpu").detach().numpy().tolist()
                del num_data, cat_data_list, labels, odds_rate, mask

            idf = dataset.__dict__[mode].copy()
            idf[pytorch_model_manager.pred_fav_columns] = np.exp(np.array(all_outputs1))
            idf[pytorch_model_manager.pred_fav_columns] /= idf["raceId"].map(
                idf[["raceId", pytorch_model_manager.pred_fav_columns]].groupby("raceId")[pytorch_model_manager.pred_fav_columns].sum())
            idf[pytorch_model_manager.pred_odds_columns] = 0.8/idf[pytorch_model_manager.pred_fav_columns]
            idf[pytorch_model_manager.pred_fav_columns] = idf.groupby(
                "raceId")[pytorch_model_manager.pred_fav_columns].rank(ascending=False).astype(int)
            idf[pytorch_model_manager.confidence_column] = np.array(all_outputs0)
            idf[pytorch_model_manager.confidence_rank_column] = idf.groupby(
                "raceId")[pytorch_model_manager.confidence_column].rank(ascending=False).astype(int)
            del all_outputs0, all_outputs1
            dataset.__dict__[mode] = idf.copy()
            del idf, dataloader

pytorch_model_manager.predict_moduler = predict_moduler
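The post-processing above turns the task-2 outputs into per-race win probabilities (normalized within each raceId) and implied odds with a 20% takeout, i.e. odds = 0.8 / probability. On toy numbers (a sketch, not the manager's code):

```python
import numpy as np

# Hypothetical task-2 values for the three horses of one race
raw = np.array([0.5, 0.3, 0.2])
prob = raw / raw.sum()        # normalize so each race's probabilities sum to 1
implied_odds = 0.8 / prob     # 20% takeout, mirroring pred_odds above
```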
In [ ]:
pytorch_model_manager.predict(dataset_mapping)
2025-01-21 15:53:10.621 | INFO     | src.model_manager.pytorch_manager:load_model:493 - Loading model... model name: 2019first
2025-01-21 15:53:11.210 | INFO     | src.model_manager.pytorch_manager:load_model:497 - model activate! model_name: 2019first
2025-01-21 15:53:53.427 | INFO     | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2019first
2025-01-21 15:53:53.477 | INFO     | src.model_manager.pytorch_manager:save_predict_result:468 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\DL_first_model\analyze\00_predict\2019first
2025-01-21 15:53:57.126 | INFO     | src.model_manager.pytorch_manager:load_model:493 - Loading model... model name: 2019second
2025-01-21 15:53:57.656 | INFO     | src.model_manager.pytorch_manager:load_model:497 - model activate! model_name: 2019second
2025-01-21 15:54:42.611 | INFO     | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2019second
2025-01-21 15:54:42.657 | INFO     | src.model_manager.pytorch_manager:save_predict_result:468 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\DL_first_model\analyze\00_predict\2019second
2025-01-21 15:54:46.483 | INFO     | src.model_manager.pytorch_manager:load_model:493 - Loading model... model name: 2020first
2025-01-21 15:54:47.078 | INFO     | src.model_manager.pytorch_manager:load_model:497 - model activate! model_name: 2020first
2025-01-21 15:55:33.983 | INFO     | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2020first
2025-01-21 15:55:34.025 | INFO     | src.model_manager.pytorch_manager:save_predict_result:468 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\DL_first_model\analyze\00_predict\2020first
2025-01-21 15:55:38.125 | INFO     | src.model_manager.pytorch_manager:load_model:493 - Loading model... model name: 2020second
2025-01-21 15:55:38.696 | INFO     | src.model_manager.pytorch_manager:load_model:497 - model activate! model_name: 2020second
2025-01-21 15:56:29.009 | INFO     | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2020second
2025-01-21 15:56:29.074 | INFO     | src.model_manager.pytorch_manager:save_predict_result:468 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\DL_first_model\analyze\00_predict\2020second
2025-01-21 15:56:33.178 | INFO     | src.model_manager.pytorch_manager:load_model:493 - Loading model... model name: 2021first
2025-01-21 15:56:33.774 | INFO     | src.model_manager.pytorch_manager:load_model:497 - model activate! model_name: 2021first
2025-01-21 15:57:26.481 | INFO     | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2021first
2025-01-21 15:57:26.541 | INFO     | src.model_manager.pytorch_manager:save_predict_result:468 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\DL_first_model\analyze\00_predict\2021first
2025-01-21 15:57:30.976 | INFO     | src.model_manager.pytorch_manager:load_model:493 - Loading model... model name: 2021second
2025-01-21 15:57:31.600 | INFO     | src.model_manager.pytorch_manager:load_model:497 - model activate! model_name: 2021second
2025-01-21 15:58:26.561 | INFO     | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2021second
2025-01-21 15:58:26.624 | INFO     | src.model_manager.pytorch_manager:save_predict_result:468 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\DL_first_model\analyze\00_predict\2021second
2025-01-21 15:58:31.212 | INFO     | src.model_manager.pytorch_manager:load_model:493 - Loading model... model name: 2022first
2025-01-21 15:58:31.829 | INFO     | src.model_manager.pytorch_manager:load_model:497 - model activate! model_name: 2022first
2025-01-21 15:59:30.480 | INFO     | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2022first
2025-01-21 15:59:30.531 | INFO     | src.model_manager.pytorch_manager:save_predict_result:468 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\DL_first_model\analyze\00_predict\2022first
2025-01-21 15:59:35.337 | INFO     | src.model_manager.pytorch_manager:load_model:493 - Loading model... model name: 2022second
2025-01-21 15:59:36.028 | INFO     | src.model_manager.pytorch_manager:load_model:497 - model activate! model_name: 2022second
2025-01-21 16:00:38.561 | INFO     | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2022second
2025-01-21 16:00:38.626 | INFO     | src.model_manager.pytorch_manager:save_predict_result:468 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\DL_first_model\analyze\00_predict\2022second
2025-01-21 16:00:43.462 | INFO     | src.model_manager.pytorch_manager:load_model:493 - Loading model... model name: 2023first
2025-01-21 16:00:44.146 | INFO     | src.model_manager.pytorch_manager:load_model:497 - model activate! model_name: 2023first
2025-01-21 16:01:47.146 | INFO     | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2023first
2025-01-21 16:01:47.213 | INFO     | src.model_manager.pytorch_manager:save_predict_result:468 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\DL_first_model\analyze\00_predict\2023first
2025-01-21 16:01:52.244 | INFO     | src.model_manager.pytorch_manager:load_model:493 - Loading model... model name: 2023second
2025-01-21 16:01:52.938 | INFO     | src.model_manager.pytorch_manager:load_model:497 - model activate! model_name: 2023second
2025-01-21 16:03:00.227 | INFO     | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2023second
2025-01-21 16:03:00.303 | INFO     | src.model_manager.pytorch_manager:save_predict_result:468 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\DL_first_model\analyze\00_predict\2023second

7. Creating the Profit/Loss Graph¶

Use the pytorch_model_manager.set_bet_column method to add a bet flag to each dataset

Use the pytorch_model_manager.merge_dataframe_data method to build the DataFrames to be analyzed

Use the pytorch_model_manager.generate_profit_loss method to generate the raw data for the profit/loss graph

In [247]:
bet_mode = BetName.tan
bet_column = pytorch_model_manager.get_bet_column(bet_mode=bet_mode)
pl_column = pytorch_model_manager.get_profit_loss_column(bet_mode=bet_mode)
pred_fav_column = pytorch_model_manager.pred_fav_columns
pred_odds_column = pytorch_model_manager.pred_odds_columns
lo, up = 5, 10    # bet only when the predicted favorite rank is in [5, 10)
loo, upo = 0, 10  # bet only when the predicted odds are in [0, 10)
for dataset_dict in dataset_mapping.values():
    pytorch_model_manager.set_bet_column(dataset_dict, bet_mode)
    # Narrow the bet flag to rows whose predictions fall inside both ranges
    for df in (dataset_dict.pred_valid, dataset_dict.pred_test):
        df[bet_column] &= df[pred_fav_column].ge(lo) & df[pred_fav_column].lt(up)
        df[bet_column] &= df[pred_odds_column].ge(loo) & df[pred_odds_column].lt(upo)
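To make the range filtering above concrete, here is a minimal self-contained sketch. The column names `bet_tan`, `pred_favorite`, and `pred_odds` are hypothetical stand-ins for the values the manager returns: the bet flag starts out True, and each `&=` keeps it True only where the prediction also falls inside the half-open range.

```python
import pandas as pd

# Toy prediction frame; column names are illustrative, not the manager's actual ones
df = pd.DataFrame({
    "bet_tan": [True, True, True, True],
    "pred_favorite": [3, 5, 9, 10],      # predicted popularity rank
    "pred_odds": [2.0, 4.5, 12.0, 8.0],  # predicted win odds
})

lo, up = 5, 10     # keep favorite ranks in [5, 10)
loo, upo = 0, 10   # keep odds in [0, 10)

# Each &= narrows the flag: a row stays True only if it passes every condition
df["bet_tan"] &= df["pred_favorite"].ge(lo) & df["pred_favorite"].lt(up)
df["bet_tan"] &= df["pred_odds"].ge(loo) & df["pred_odds"].lt(upo)

print(df["bet_tan"].tolist())  # [False, True, False, False]
```

Only the second row survives: rank 5 is inside [5, 10) and odds 4.5 inside [0, 10); every other row fails at least one bound.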
In [ ]:
# Merge the per-period datasets, then attach per-bet profit/loss
_, dfbetva, dfbette = pytorch_model_manager.merge_dataframe_data(
    dataset_mapping, mode=True)

dfbetva, dfbette = pytorch_model_manager.generate_profit_loss(
    dfbetva, dfbette, bet_mode)

# Cumulative profit/loss over the test period (the profit/loss curve)
dfbette[f"{pl_column}_sum"] = dfbette[pl_column].cumsum()
dfbette[["raceDate", "raceId", "label", "favorite",
         bet_column, pl_column, f"{pl_column}_sum"]]
2025-01-25 10:06:47.984 | INFO     | src.model_manager.base_manager:__save_profit_loss:646 - Save profit loss data. save_path: e:\dev_um_ai\dev-um-ai\models\DL_first_model\analyze\tan\profit_loss
Out[ ]:
raceDate raceId label favorite bet_tan pl_tan pl_tan_sum
34 2019-01-06 201906010212 12 5 True -100.0 -100.0
84 2019-01-13 201908010404 16 9 True -100.0 -200.0
112 2019-01-14 201908010509 11 10 True -100.0 -300.0
117 2019-01-19 201906010602 15 7 True -100.0 -400.0
132 2019-01-19 201907010107 2 3 True -100.0 -500.0
... ... ... ... ... ... ... ...
1654 2023-12-28 202309050902 12 14 True -100.0 -108080.0
1655 2023-12-28 202309050903 2 9 True -100.0 -108180.0
1656 2023-12-28 202309050905 6 9 True -100.0 -108280.0
1657 2023-12-28 202309050906 8 7 True -100.0 -108380.0
1658 2023-12-28 202309050907 2 9 True -100.0 -108480.0

3323 rows × 7 columns
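The `pl_tan_sum` column above is the series the profit/loss graph plots. As a minimal sketch with toy data (the `raceDate` and `pl_tan` names mirror the columns above; the real plotting is handled by the tool and not shown in this notebook), the curve is simply a cumulative sum drawn against the race date:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen so this runs headless
import matplotlib.pyplot as plt

# Toy per-bet results: -100 yen for a missed 100-yen win bet, payout - 100 for a hit
bets = pd.DataFrame({
    "raceDate": pd.to_datetime(
        ["2019-01-06", "2019-01-13", "2019-01-14", "2019-01-19"]),
    "pl_tan": [-100.0, 450.0, -100.0, -100.0],
})

# Running total: the profit/loss curve
bets["pl_tan_sum"] = bets["pl_tan"].cumsum()

fig, ax = plt.subplots()
ax.plot(bets["raceDate"], bets["pl_tan_sum"])
ax.set_xlabel("raceDate")
ax.set_ylabel("cumulative profit/loss (yen)")
fig.savefig("profit_loss_sketch.png")
```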

8. Basic Analysis and Saving the Results¶

Use the pytorch_model_manager.basic_analyze method to compute return-rate and hit-rate statistics, tally bet counts by favorite rank, and save the results

In [249]:
pytorch_model_manager.basic_analyze(dataset_mapping)
2025-01-25 10:06:48.046 | INFO     | src.model_manager.base_manager:basic_analyze:220 - Start basic analyze.
2025-01-25 10:06:48.567 | INFO     | src.model_manager.base_manager:basic_analyze:256 - Saving Return And Hit Rate Summary.
2025-01-25 10:06:48.585 | INFO     | src.model_manager.base_manager:basic_analyze:259 - Saving Favorite Bet Num Summary.
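The two headline statistics that basic_analyze saves can be computed by hand. A minimal sketch, assuming a bet log with a bet flag and a per-bet profit/loss column (hypothetical names, flat 100-yen stakes as in the table above; this is not the library's actual implementation):

```python
import pandas as pd

# Toy bet log: bet_tan marks placed bets, pl_tan is profit/loss per 100-yen stake
df = pd.DataFrame({
    "bet_tan": [True, True, True, False],
    "pl_tan": [-100.0, 450.0, -100.0, 0.0],
})

bets = df[df["bet_tan"]]
stake = 100 * len(bets)                       # total amount wagered
returned = (bets["pl_tan"] + 100).sum()       # total payouts received
return_rate = 100 * returned / stake          # return rate (%)
hit_rate = 100 * (bets["pl_tan"] > 0).mean()  # hit rate (%)

print(round(return_rate, 1), round(hit_rate, 1))  # 183.3 33.3
```

One hit out of three 100-yen bets with a 550-yen payout gives a 183.3% return rate and a 33.3% hit rate.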

9. Creating the Odds Graph¶

Use the pytorch_model_manager.merge_dataframe_data method to build the DataFrames to be analyzed

Use the pytorch_model_manager.gegnerate_odds_graph method to generate the raw data for the odds graph

In [250]:
dftrain, dfvalid, dftest = pytorch_model_manager.merge_dataframe_data(
    dataset_mapping,
    mode=True
)
summary_dict = pytorch_model_manager.gegnerate_odds_graph(
    dftrain, dfvalid, dftest, bet_mode)
print("Check the odds graph for the 'test' data")
summary_dict["test"].fillna(0)
2025-01-25 10:06:51.067 | INFO     | src.model_manager.base_manager:__save_odds_graph:514 - Save Odds Graph. save_path: e:\dev_um_ai\dev-um-ai\models\DL_first_model\analyze\tan\odds_graph
Check the odds graph for the 'test' data
Out[250]:
win rate (%) support rate (%) return >100% (%) weight count
odds_round
1.25 85.714286 64.000000 80.000000 0.004213 14
1.75 46.753247 45.714286 57.142857 0.023172 77
2.25 39.024390 35.555556 44.444444 0.024676 82
2.75 32.558140 29.090909 36.363636 0.025880 86
3.25 20.987654 24.615385 30.769231 0.024376 81
... ... ... ... ...
120.00 0.000000 0.666667 0.833333 0.012037 40
130.00 5.882353 0.615385 0.769231 0.010232 34
140.00 0.000000 0.571429 0.714286 0.012639 42
150.00 0.000000 0.533333 0.666667 0.005116 17
200.00 0.000000 0.400000 0.500000 0.114956 382

90 rows × 5 columns
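The summary above buckets bets by rounded odds (the odds_round index) and aggregates per bucket. A minimal sketch of that grouping with pandas, under the assumption that buckets are 0.5 wide and labeled by their midpoints (1.25, 1.75, ...); the column names are illustrative, not the library's:

```python
import pandas as pd

# Toy bet results: win odds and whether the horse won
df = pd.DataFrame({
    "odds": [1.3, 1.6, 2.1, 2.4, 3.1, 1.2],
    "won":  [True, False, True, False, False, True],
})

# Bucket odds into 0.5-wide bins labeled by their midpoint (1.25, 1.75, ...)
df["odds_round"] = (df["odds"] // 0.5) * 0.5 + 0.25

summary = df.groupby("odds_round").agg(
    win_rate=("won", lambda s: 100 * s.mean()),  # win rate (%) per bucket
    count=("won", "size"),                       # number of bets per bucket
)
print(summary)
```

Plotting win rate against the bucketed odds is what makes an "odds graph": it shows at which odds ranges the model's bets beat the break-even line.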

10. Exporting Model Info for the um-AI Analysis Platform¶

Use the pytorch_model_manager.export_model_info method to export the model information

In [251]:
pytorch_model_manager.export_model_info()
2025-01-25 10:06:51.084 | INFO     | src.model_manager.base_manager:export_model_info:848 - Export Model info json. export path: e:\dev_um_ai\dev-um-ai\models\DL_first_model\model_info.json
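As the log shows, export_model_info writes the model metadata to model_info.json. The general shape of such an export can be sketched with the standard library; the field names below are purely illustrative assumptions, not um-AI's actual schema:

```python
import json
import pathlib

# Illustrative metadata; these field names are assumptions, not um-AI's schema
model_info = {
    "model_name": "DL_first_model",
    "framework": "pytorch",
    "train_years": [2010, 2018],
    "test_years": [2019, 2023],
    "bet_modes": ["tan"],
}

export_path = pathlib.Path("model_info.json")
# ensure_ascii=False keeps any Japanese strings readable in the file
export_path.write_text(json.dumps(model_info, indent=2, ensure_ascii=False),
                       encoding="utf-8")
print(export_path.read_text(encoding="utf-8")[:30])
```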
