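Introduction
I am developing a horse-racing prediction AI and explain the development process in videos. Please take a look if you are interested.
Some of the source code shared here is paid content.
If you would like to run the same analysis, please obtain it from the article below.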
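4. Second Model Creation: Preparation
Starting with this installment, we begin building the second model.
The goal is a model that takes the pedigree analysis results from the previous installments into account.
However, the first model was built partly as a demo, and its problem setup is simply flawed in places; for example, it relies on unrealistic assumptions.
Therefore, before building a model that incorporates pedigree information, this preparation part rebuilds the first model.
What we do today
- Rebuild the first model
- Compare the results when the model is split into turf and dirt
Outline
- Preliminaries
- First model settings
- Training the first model
- Checking performance (launching the web app)
- How to split turf and dirt
- Training the turf/dirt-specific models
- Conclusion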
4-0. Preliminaries
Pedigree data is not needed this time, so this step only covers importing the required modules and loading the data used to build the model.
import pathlib
import warnings
import lightgbm as lgbm
import sys
sys.path.append(".")
sys.path.append("..")
from src.model_manager.lgbm_manager import LightGBMDataset # noqa
from src.model_manager.lgbm_manager import LightGBMModelManager # noqa
from src.core.meta.bet_name_meta import BetName # noqa
from src.data_manager.preprocess_tools import DataPreProcessor # noqa
from src.data_manager.data_loader import DataLoader # noqa
warnings.filterwarnings("ignore")
root_dir = pathlib.Path(".").absolute().parent
start_year = 2000  # earliest year available in the DB
split_year = 2014  # first year of the training period
target_year = 2019  # first year of the test period
end_year = 2023  # last year of the test period (the DB must contain data for this year)
# Create the instances
data_loader = DataLoader(
    start_year,
    end_year,
    # Set dbpath to match your environment; an absolute path is recommended.
    dbpath=root_dir / "data" / "keibadata.db"
)
dataPreP = DataPreProcessor()
df = data_loader.load_racedata()
df = dataPreP.exec_pipeline(df)
2024-08-10 12:57:11.300 | INFO | src.data_manager.data_loader:load_racedata:23 - Get Year Range: 2000 -> 2023.
2024-08-10 12:57:11.301 | INFO | src.data_manager.data_loader:load_racedata:24 - Loading Race Info ...
2024-08-10 12:57:13.243 | INFO | src.data_manager.data_loader:load_racedata:26 - Loading Race Data ...
2024-08-10 12:57:32.115 | INFO | src.data_manager.data_loader:load_racedata:28 - Merging Race Info and Race Data ...
2024-08-10 12:57:34.422 | INFO | src.data_manager.preprocess_tools:__0_check_use_save_checkpoints:35 - Start PreProcess #0 ...
2024-08-10 12:57:34.425 | INFO | src.data_manager.preprocess_tools:__1_exec_all_sub_prep1:38 - Start PreProcess #1 ...
2024-08-10 12:57:41.420 | INFO | src.data_manager.preprocess_tools:__2_exec_all_sub_prep2:40 - Start PreProcess #2 ...
2024-08-10 12:57:56.039 | INFO | src.data_manager.preprocess_tools:__3_convert_type_str_to_number:42 - Start PreProcess #3 ...
2024-08-10 12:58:00.261 | INFO | src.data_manager.preprocess_tools:__4_drop_or_fillin_none_data:44 - Start PreProcess #4 ...
2024-08-10 12:58:04.361 | INFO | src.data_manager.preprocess_tools:__5_exec_all_sub_prep5:46 - Start PreProcess #5 ...
2024-08-10 12:58:29.183 | INFO | src.data_manager.preprocess_tools:__6_convert_label_to_rate_info:48 - Start PreProcess #6 ...
2024-08-10 12:58:40.926 | INFO | src.data_manager.preprocess_tools:__7_convert_distance_to_smile:50 - Start PreProcess #7 ...
2024-08-10 12:58:41.173 | INFO | src.data_manager.preprocess_tools:__8_category_encoding:52 - Start PreProcess #8 ...
2024-08-10 12:58:46.586 | INFO | src.data_manager.preprocess_tools:__9_convert_raceClass_to_grade:54 - Start PreProcess #9 ...
Preparation complete.
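4-1. First model settings
4-1-1. List of assumptions
- Machine learning model
  - LightGBM
- Learning task
  - Binary classification
- Problem setup
  - Whether the horse finishes first
- Training period
  - 2010 to December 2022
- Validation period
  - 2010 to June 2023
- Prediction period
  - 2019 to December 2023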
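4-1-2. Model description
Motivation for building the model
We want a model that serves as the baseline for everything.
Any improved model must at least beat it.
Purpose of the model
To output the probability that each horse finishes first.
Hypothesis to verify
Whether the winning horse can be predicted from basic information (race-day information that anyone can look up).
Caveat (as of August 2024)
Note that the final odds are used.
Odds analysis will come later, so for now the final odds are used as a theoretical value.
Features
Odds, popularity rank, track surface, distance, distance category, racecourse, track condition,
horse number, bracket number, age, carried weight, horse body weight, weight change, days since last race,
race grade, sex, jockey ID, trainer ID, horse ID
Target variable
Binary: 1 if the horse finishes first, 0 otherwise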
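4-2. Training the first model
4-2-0. Steps required to build a machine learning model
- Prepare the data
- Create the datasets
- Run training
- Check the results
- Export the model
4-2-1. Data preparation
Creating the model-training instance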
lgbm_model_manager = LightGBMModelManager(
    # Path of the folder, under the models directory, named after the model to create.
    # Using an absolute path is safer.
    root_dir / "models" / "first_model",
    split_year,
    target_year,
    end_year
)
2024-08-10 14:36:38.327 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_type, val=lightGBM
2024-08-10 14:36:38.329 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_id, val=first_model
2024-08-10 14:36:38.330 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_dir, val=e:\dev_um_ai\dev-um-ai\models\first_model
2024-08-10 14:36:38.332 | INFO | src.model_manager.base_manager:__init__:43 - make directory. path: e:\dev_um_ai\dev-um-ai\models\first_model
2024-08-10 14:36:38.335 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_analyze_dir, val=e:\dev_um_ai\dev-um-ai\models\first_model\analyze
2024-08-10 14:36:38.338 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_predict_dir, val=e:\dev_um_ai\dev-um-ai\models\first_model\analyze\00_predict
2024-08-10 14:36:38.340 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_column, val=pred_prob
2024-08-10 14:36:38.341 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_rank_column, val=pred_rank
Creating the feature and target variables
# Columns used as features
feature_columns = [
    'distance',
    'number',
    'boxNum',
    'odds',
    'favorite',
    'age',
    'jweight',
    'weight',
    'gl',
    'race_span',
    "raceGrade"  # add the grade information
] + dataPreP.encoding_columns  # add the categorical columns
# Categorical columns:
# "place", "field", "sex", "condition", "jockeyId", "teacherId", "dist_cat", "horseId"
# Column used as the target variable
objective_column = "label_in1"
# Register the feature and target columns with the model-building instance
lgbm_model_manager.set_feature_and_objective_columns(
    feature_columns, objective_column)
# Create the target variable: flag the rows of first-place finishers as positive
df = lgbm_model_manager.add_objective_column_to_df(df, "label", 1)
2024-08-10 14:46:13.622 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:77 - Set Feature columns. ['distance', 'number', 'boxNum', 'odds', 'favorite', 'age', 'jweight', 'weight', 'gl', 'race_span', 'raceGrade', 'place_en', 'field_en', 'sex_en', 'condition_en', 'jockeyId_en', 'teacherId_en', 'dist_cat_en', 'horseId_en']
2024-08-10 14:46:13.625 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:79 - Set Objective columns. label_in1
2024-08-10 14:46:13.626 | INFO | src.model_manager.lgbm_manager:add_objective_column_to_df:80 - make objective data. label_in1. topN: 1
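As a rough illustration of what the target flag amounts to, the sketch below marks a row with 1 when the finishing position is within the top N (N = 1 here) and 0 otherwise. The helper `add_top_n_label` and the list-of-dicts representation are assumptions made for this illustration, as is the reading of the `label` column as the finishing position; the internals of `add_objective_column_to_df`, which operates on the DataFrame, are not shown in this article.

```python
def add_top_n_label(rows, position_key="label", top_n=1):
    """Return copies of rows with a binary flag column named label_in<top_n>.

    Hypothetical helper: each row is a dict whose position_key holds the
    finishing position (1 = winner).
    """
    flag_key = f"label_in{top_n}"
    labeled = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is not mutated
        new_row[flag_key] = 1 if row[position_key] <= top_n else 0
        labeled.append(new_row)
    return labeled


rows = [
    {"horseId": "A", "label": 1},
    {"horseId": "B", "label": 4},
    {"horseId": "C", "label": 2},
]
labeled_rows = add_top_n_label(rows, top_n=1)
print([r["label_in1"] for r in labeled_rows])
```

Keeping `top_n` as a parameter means the same idea could later produce a top-3 target without changing the rest of the pipeline.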
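4-2-2. Creating the datasets
Dataset creation produces the following training, validation, and prediction (test) data:
- Training data
  - From 2010 up to just before the validation data
- Validation data
  - The half year that follows the training data
- Prediction data
  - January to June 2019
  - July to December 2019
  - January to June 2020
  - July to December 2020
  - January to June 2021
  - July to December 2021
  - January to June 2022
  - July to December 2022
  - January to June 2023
  - July to December 2023
The datasets above can be created just by running the following two lines: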
dataset_mapping = lgbm_model_manager.make_dataset_mapping(df)
dataset_mapping = lgbm_model_manager.setup_dataset(dataset_mapping)
2024-08-10 14:55:04.157 | INFO | src.data_manager.dataset_tools:make_dataset_mapping:103 - Generate dataset mapping. Year Range: 2019 -> 2023
2024-08-10 14:55:07.072 | INFO | src.model_manager.lgbm_manager:setup_dataset:110 - Create LightGBM Dataset.
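The half-year windows described above can be sketched as follows. This is only an illustration under stated assumptions: the function `half_year_windows` is hypothetical, and it assumes that each validation window is the half year immediately before its test window, with training using everything earlier. The names such as `2019first` follow the model names that appear in the training log.

```python
from datetime import date


def half_year_windows(target_year, end_year):
    """Build (name, valid_start, test_start, test_end) tuples per half year.

    Assumption: validation is the half year right before the test window,
    and training uses all data before valid_start.
    """
    windows = []
    for year in range(target_year, end_year + 1):
        # First half: test on Jan-Jun, validate on Jul-Dec of the previous year.
        windows.append((f"{year}first", date(year - 1, 7, 1),
                        date(year, 1, 1), date(year, 6, 30)))
        # Second half: test on Jul-Dec, validate on Jan-Jun of the same year.
        windows.append((f"{year}second", date(year, 1, 1),
                        date(year, 7, 1), date(year, 12, 31)))
    return windows


windows = half_year_windows(2019, 2023)
for name, valid_start, test_start, test_end in windows:
    print(name, "valid from", valid_start, "test", test_start, "to", test_end)
```

Splitting strictly by time like this keeps future races out of the training data for each test window.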
4-2-3. Running training
# Model parameters for LightGBM
# The parameter values themselves are chosen arbitrarily (not tuned).
params = {
    'boosting_type': 'gbdt',
    # binary classification
    'objective': 'binary',
    'metric': 'auc',
    'verbose': 0,
    'seed': 77777,
    'learning_rate': 0.01,
    "n_estimators": 10000
}
lgbm_model_manager.train_all(
    params,
    dataset_mapping,
    stopping_rounds=500,  # stop early if the validation score does not improve for this many rounds
    val_num=250  # logging interval (in rounds)
)
2024-08-10 14:59:18.928 | INFO | src.model_manager.lgbm_manager:save_root_model_info:281 - Save model params and dataset columns
2024-08-10 14:59:18.933 | INFO | src.model_manager.lgbm_manager:train_all:262 - Training Start!
2024-08-10 14:59:18.934 | INFO | src.model_manager.lgbm_manager:train_all:263 - ================== train params ========================
2024-08-10 14:59:18.935 | INFO | src.model_manager.lgbm_manager:train_all:266 - boosting_type = gbdt
2024-08-10 14:59:18.936 | INFO | src.model_manager.lgbm_manager:train_all:266 - objective = binary
2024-08-10 14:59:18.938 | INFO | src.model_manager.lgbm_manager:train_all:266 - metric = auc
2024-08-10 14:59:18.940 | INFO | src.model_manager.lgbm_manager:train_all:266 - verbose = 0
2024-08-10 14:59:18.941 | INFO | src.model_manager.lgbm_manager:train_all:266 - seed = 77777
2024-08-10 14:59:18.942 | INFO | src.model_manager.lgbm_manager:train_all:266 - learning_rate = 0.01
2024-08-10 14:59:18.944 | INFO | src.model_manager.lgbm_manager:train_all:266 - n_estimators = 10000
2024-08-10 14:59:18.945 | INFO | src.model_manager.lgbm_manager:train_all:267 - ==========================================================
2024-08-10 14:59:18.946 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2019first
2024-08-10 14:59:18.948 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument
2024-08-10 14:59:19.182 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found.
2024-08-10 14:59:19.184 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories.
2024-08-10 14:59:19.422 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds
2024-08-10 14:59:30.076 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.871807 valid_1's auc: 0.840632
2024-08-10 14:59:50.062 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.89677 valid_1's auc: 0.840119
2024-08-10 14:59:55.592 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [63] training's auc: 0.851271 valid_1's auc: 0.840833
2024-08-10 14:59:55.771 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\first_model\params\2019first\model.params
2024-08-10 14:59:56.891 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2019second
2024-08-10 14:59:56.892 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument
2024-08-10 14:59:57.073 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found.
2024-08-10 14:59:57.075 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories.
2024-08-10 14:59:57.300 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds
2024-08-10 15:00:09.275 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.870639 valid_1's auc: 0.819911
2024-08-10 15:00:31.367 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.894912 valid_1's auc: 0.819113
2024-08-10 15:00:55.486 | INFO | lightgbm.basic:_log_info:191 - [750] training's auc: 0.909803 valid_1's auc: 0.817025
2024-08-10 15:00:56.263 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [258] training's auc: 0.871511 valid_1's auc: 0.819973
2024-08-10 15:00:57.070 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\first_model\params\2019second\model.params
2024-08-10 15:00:58.334 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2020first
2024-08-10 15:00:58.335 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument
2024-08-10 15:00:58.559 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found.
2024-08-10 15:00:58.561 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories.
2024-08-10 15:00:58.806 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds
2024-08-10 15:01:11.586 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.86771 valid_1's auc: 0.832936
2024-08-10 15:01:36.192 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.891366 valid_1's auc: 0.830434
2024-08-10 15:01:55.015 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [170] training's auc: 0.859468 valid_1's auc: 0.833484
2024-08-10 15:01:55.543 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\first_model\params\2020first\model.params
2024-08-10 15:01:56.708 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2020second
2024-08-10 15:01:56.709 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument
2024-08-10 15:01:56.934 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found.
2024-08-10 15:01:56.936 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories.
2024-08-10 15:01:57.198 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds
2024-08-10 15:02:10.774 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.866477 valid_1's auc: 0.82393
2024-08-10 15:02:37.416 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.889361 valid_1's auc: 0.822261
2024-08-10 15:02:53.458 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [133] training's auc: 0.855343 valid_1's auc: 0.824802
2024-08-10 15:02:53.859 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\first_model\params\2020second\model.params
2024-08-10 15:02:54.969 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2021first
2024-08-10 15:02:54.971 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument
2024-08-10 15:02:55.160 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found.
2024-08-10 15:02:55.161 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories.
2024-08-10 15:02:55.441 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds
2024-08-10 15:03:10.001 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.86393 valid_1's auc: 0.822076
2024-08-10 15:03:38.653 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.886215 valid_1's auc: 0.820405
2024-08-10 15:03:41.101 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [20] training's auc: 0.842479 valid_1's auc: 0.822224
2024-08-10 15:03:41.235 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\first_model\params\2021first\model.params
2024-08-10 15:03:42.247 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2021second
2024-08-10 15:03:42.248 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument
2024-08-10 15:03:42.446 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found.
2024-08-10 15:03:42.447 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories.
2024-08-10 15:03:42.705 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds
2024-08-10 15:03:58.465 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.861851 valid_1's auc: 0.835742
2024-08-10 15:04:29.155 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.883583 valid_1's auc: 0.834695
2024-08-10 15:04:46.434 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [124] training's auc: 0.850631 valid_1's auc: 0.836322
2024-08-10 15:04:46.789 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\first_model\params\2021second\model.params
2024-08-10 15:04:47.909 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2022first
2024-08-10 15:04:47.910 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument
2024-08-10 15:04:48.094 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found.
2024-08-10 15:04:48.096 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories.
2024-08-10 15:04:48.389 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds
2024-08-10 15:05:05.606 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.861216 valid_1's auc: 0.826826
2024-08-10 15:05:39.098 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.882493 valid_1's auc: 0.825194
2024-08-10 15:05:52.470 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [88] training's auc: 0.847512 valid_1's auc: 0.827772
2024-08-10 15:05:52.728 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\first_model\params\2022first\model.params
2024-08-10 15:05:53.764 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2022second
2024-08-10 15:05:53.765 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument
2024-08-10 15:05:53.994 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found.
2024-08-10 15:05:53.996 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories.
2024-08-10 15:05:54.294 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds
2024-08-10 15:06:11.673 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.859738 valid_1's auc: 0.843091
2024-08-10 15:06:46.386 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.880705 valid_1's auc: 0.842285
2024-08-10 15:07:22.512 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [233] training's auc: 0.85823 valid_1's auc: 0.843208
2024-08-10 15:07:23.364 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\first_model\params\2022second\model.params
2024-08-10 15:07:24.657 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2023first
2024-08-10 15:07:24.658 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument
2024-08-10 15:07:24.888 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found.
2024-08-10 15:07:24.889 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories.
2024-08-10 15:07:25.214 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds
2024-08-10 15:07:43.829 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.859688 valid_1's auc: 0.839903
2024-08-10 15:08:20.570 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.8801 valid_1's auc: 0.838717
2024-08-10 15:08:56.642 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [221] training's auc: 0.856973 valid_1's auc: 0.840087
2024-08-10 15:08:57.454 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\first_model\params\2023first\model.params
2024-08-10 15:08:58.741 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2023second
2024-08-10 15:08:58.742 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument
2024-08-10 15:08:58.962 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found.
2024-08-10 15:08:58.963 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories.
2024-08-10 15:08:59.285 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds
2024-08-10 15:09:18.716 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.859039 valid_1's auc: 0.838319
2024-08-10 15:09:57.586 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.879004 valid_1's auc: 0.836402
2024-08-10 15:10:00.030 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [13] training's auc: 0.840206 valid_1's auc: 0.838479
2024-08-10 15:10:00.168 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\first_model\params\2023second\model.params
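To make the role of `stopping_rounds` concrete, here is a minimal pure-Python sketch of the early-stopping rule: training ends once the validation score has gone that many rounds without improving, and the best round so far is kept. The function name and the toy score list are hypothetical, and LightGBM's own implementation handles further details (for example, multiple metrics); this shows only the core idea. It is consistent with the log above, where the `2019first` model reports a best iteration of 63 and stops shortly after round 500.

```python
def best_iteration_with_early_stopping(valid_scores, stopping_rounds):
    """Return (best_iteration, stopped_at), both 1-based.

    valid_scores[i] is the validation metric after iteration i + 1
    (higher is better, as with AUC). Training stops once the best score
    has not improved for stopping_rounds consecutive iterations.
    """
    best_score = float("-inf")
    best_iter = 0
    for i, score in enumerate(valid_scores, start=1):
        if score > best_score:
            best_score, best_iter = score, i
        elif i - best_iter >= stopping_rounds:
            return best_iter, i  # patience exhausted
    return best_iter, len(valid_scores)  # ran out of iterations first


# Toy curve: the score peaks at iteration 3 and then slowly degrades.
scores = [0.80, 0.82, 0.84, 0.83, 0.83, 0.82, 0.82, 0.81]
result = best_iteration_with_early_stopping(scores, stopping_rounds=4)
print(result)  # (3, 7)
```

Several models in the log reach their best validation AUC within the first few dozen rounds, so the remaining rounds mostly improve the training AUC only.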
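4-2-4. Checking the results
In the training above, the model was fit on the training data, and the validation data was used to keep its generalization performance in check.
To measure the model's real performance, we need predictions on separately prepared test data.
So here we run inference on the test data that was not used in training.
Inference on the test data only requires running the following code: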
for dataset_dict in dataset_mapping.values():
    lgbm_model_manager.load_model(dataset_dict.name)
    lgbm_model_manager.predict(dataset_dict)
2024-08-10 15:51:55.406 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2019first
2024-08-10 15:51:55.469 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2019first
2024-08-10 15:51:56.554 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2019first
2024-08-10 15:51:56.590 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\first_model\analyze\00_predict\2019first
2024-08-10 15:51:59.518 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2019second
2024-08-10 15:51:59.639 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2019second
2024-08-10 15:52:02.309 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2019second
2024-08-10 15:52:02.355 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\first_model\analyze\00_predict\2019second
2024-08-10 15:52:05.419 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2020first
2024-08-10 15:52:05.515 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2020first
2024-08-10 15:52:07.600 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2020first
2024-08-10 15:52:07.636 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\first_model\analyze\00_predict\2020first
2024-08-10 15:52:10.760 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2020second
2024-08-10 15:52:10.830 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2020second
2024-08-10 15:52:12.680 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2020second
2024-08-10 15:52:12.734 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\first_model\analyze\00_predict\2020second
2024-08-10 15:52:16.109 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2021first
2024-08-10 15:52:16.161 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2021first
2024-08-10 15:52:17.602 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2021first
2024-08-10 15:52:17.659 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\first_model\analyze\00_predict\2021first
2024-08-10 15:52:21.303 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2021second
2024-08-10 15:52:21.363 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2021second
2024-08-10 15:52:23.450 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2021second
2024-08-10 15:52:23.505 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\first_model\analyze\00_predict\2021second
2024-08-10 15:52:27.362 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2022first
2024-08-10 15:52:27.423 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2022first
2024-08-10 15:52:29.321 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2022first
2024-08-10 15:52:29.381 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\first_model\analyze\00_predict\2022first
2024-08-10 15:52:33.474 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2022second
2024-08-10 15:52:33.624 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2022second
2024-08-10 15:52:37.295 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2022second
2024-08-10 15:52:37.356 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\first_model\analyze\00_predict\2022second
2024-08-10 15:52:41.289 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2023first
2024-08-10 15:52:41.411 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2023first
2024-08-10 15:52:45.080 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2023first
2024-08-10 15:52:45.142 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\first_model\analyze\00_predict\2023first
2024-08-10 15:52:49.316 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2023second
2024-08-10 15:52:49.367 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2023second
2024-08-10 15:52:51.391 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2023second
2024-08-10 15:52:51.462 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\first_model\analyze\00_predict\2023second
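4-2-5. Exporting the model
Before the model can be exported, its performance must be computed.
The following are computed:
- Profit and loss
- Basic statistics
- Odds graph
Calculating profit and loss
The profit-and-loss code has not been maintained, so it is a little clunky, but running the following code is all that is needed.
The current source can only calculate profit and loss for win bets (tansho).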
bet_mode = BetName.tan
bet_column = lgbm_model_manager.get_bet_column(bet_mode=bet_mode)
pl_column = lgbm_model_manager.get_profit_loss_column(bet_mode=bet_mode)
for dataset_dict in dataset_mapping.values():
    lgbm_model_manager.set_bet_column(dataset_dict, bet_mode)
_, dfbetva, dfbette = lgbm_model_manager.merge_dataframe_data(
    dataset_mapping, mode=True)
dfbetva, dfbette = lgbm_model_manager.generate_profit_loss(
    dfbetva, dfbette, bet_mode)
dfbette[["raceDate", "raceId", "label", "favorite", bet_column, pl_column]]
2024-08-10 16:18:08.425 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=bet_columns_map, val={'tan': 'bet_tan'}
2024-08-10 16:18:08.425 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=pl_column_map, val={'tan': 'pl_tan'}
2024-08-10 16:18:09.374 | INFO | src.model_manager.base_manager:__save_profit_loss:646 - Save profit loss data. save_path: e:\dev_um_ai\dev-um-ai\models\first_model\analyze\tan\profit_loss
2024-08-10 16:18:09.376 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=profit_loss_dir, val={'tan': 'e:\\dev_um_ai\\dev-um-ai\\models\\first_model\\analyze\\tan\\profit_loss'}
(index) | raceDate | raceId | label | favorite | bet_tan | pl_tan |
---|---|---|---|---|---|---|
896624 | 2019-01-05 | 201906010101 | 4 | 1 | 1 | -100.0 |
896642 | 2019-01-05 | 201906010102 | 3 | 1 | 1 | -100.0 |
896651 | 2019-01-05 | 201906010103 | 1 | 1 | 1 | 140.0 |
896670 | 2019-01-05 | 201906010104 | 2 | 1 | 1 | -100.0 |
896693 | 2019-01-05 | 201906010105 | 2 | 1 | 1 | -100.0 |
… | … | … | … | … | … | … |
1126007 | 2023-12-28 | 202309050908 | 1 | 1 | 1 | 170.0 |
1126030 | 2023-12-28 | 202309050909 | 1 | 1 | 1 | 250.0 |
1126031 | 2023-12-28 | 202309050910 | 4 | 1 | 1 | -100.0 |
1126051 | 2023-12-28 | 202309050911 | 2 | 1 | 1 | -100.0 |
1126061 | 2023-12-28 | 202309050912 | 3 | 1 | 1 | -100.0 |
16630 rows × 6 columns
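The `pl_tan` values in the table can be read as the result of a flat win bet. The sketch below is an assumption-based illustration, not the author's `generate_profit_loss`: it supposes a 100-yen stake (which matches the -100.0 losses above) and decimal odds that include the stake. The function name is hypothetical.

```python
def win_bet_profit_loss(finish_position, odds, stake=100):
    """Profit or loss of one win (tansho) bet.

    Assumption: flat stake per bet; the payout is stake * odds when the
    horse finishes first and zero otherwise.
    """
    if finish_position == 1:
        return stake * odds - stake
    return -float(stake)


print(win_bet_profit_loss(1, 2.4))  # a winner at odds 2.4
print(win_bet_profit_loss(4, 2.4))  # a fourth-place finish loses the stake
```

Under these assumptions, the 140.0 entry in the table would correspond to a winner at final odds of 2.4.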
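Calculating basic statistics
The basic statistics aggregate the return rate, the hit rate, and the number of bets per popularity rank.
Just run the following code: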
lgbm_model_manager.basic_analyze(dataset_mapping)
2024-08-10 16:18:17.410 | INFO | src.model_manager.base_manager:basic_analyze:220 - Start basic analyze.
2024-08-10 16:18:17.901 | INFO | src.model_manager.base_manager:basic_analyze:256 - Saving Return And Hit Rate Summary.
2024-08-10 16:18:17.909 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=return_hit_rate_file, val={'tan': 'e:\\dev_um_ai\\dev-um-ai\\models\\first_model\\analyze\\tan\\hit_and_return_rate.csv'}
2024-08-10 16:18:17.910 | INFO | src.model_manager.base_manager:basic_analyze:259 - Saving Favorite Bet Num Summary.
2024-08-10 16:18:17.922 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=fav_bet_num_dir, val={'tan': 'e:\\dev_um_ai\\dev-um-ai\\models\\first_model\\analyze\\tan\\fav_bet_num'}
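For reference, the hit rate and return rate can be derived from per-bet profit and loss roughly as follows. This is a sketch under assumptions, not the internals of `basic_analyze`: it supposes a flat 100-yen stake and counts a bet as a hit when its profit is positive. The function name is hypothetical.

```python
def hit_and_return_rate(profit_losses, stake=100):
    """Return (hit_rate, return_rate) in percent for a list of per-bet P/L.

    Assumption: every bet uses the same flat stake, and a bet counts as
    a hit when its profit/loss is positive.
    """
    n_bets = len(profit_losses)
    if n_bets == 0:
        return 0.0, 0.0
    hits = sum(1 for pl in profit_losses if pl > 0)
    total_payout = sum(pl + stake for pl in profit_losses)  # a loss pays out 0
    hit_rate = 100.0 * hits / n_bets
    return_rate = 100.0 * total_payout / (stake * n_bets)
    return hit_rate, return_rate


# First five test rows of the profit/loss table: one hit out of five bets.
rates = hit_and_return_rate([-100.0, -100.0, 140.0, -100.0, -100.0])
print(rates)  # (20.0, 48.0)
```

A return rate above 100% means the bets were profitable overall; the hit rate alone says nothing about profitability.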
Calculating the odds graph
For an explanation of the odds graph, please see the video below.
It covers the most important point in evaluating a model.
dftrain, dfvalid, dftest = lgbm_model_manager.merge_dataframe_data(
dataset_mapping,
mode=True
)
summary_dict = lgbm_model_manager.gegnerate_odds_graph(
dftrain, dfvalid, dftest, bet_mode)
print("Check the odds graph for the 'test' data")
summary_dict["test"].fillna(0)
2024-08-10 16:18:23.029 | INFO | src.model_manager.base_manager:__save_odds_graph:514 - Save Odds Graph. save_path: e:\dev_um_ai\dev-um-ai\models\first_model\analyze\tan\odds_graph
2024-08-10 16:18:23.030 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=odds_graph_file, val={'tan': 'e:\\dev_um_ai\\dev-um-ai\\models\\first_model\\analyze\\tan\\odds_graph'}
Check the odds graph for the 'test' data
odds_round | Win rate (%) | Support rate (%) | Return >100% threshold (%) | weight | Count |
---|---|---|---|---|---|
1.25 | 65.805471 | 64.000000 | 80.000000 | 0.039567 | 658 |
1.75 | 46.336567 | 45.714286 | 57.142857 | 0.179735 | 2989 |
2.25 | 35.981967 | 35.555556 | 44.444444 | 0.213410 | 3549 |
2.75 | 29.793673 | 29.090909 | 36.363636 | 0.218581 | 3635 |
3.25 | 24.479568 | 24.615385 | 30.769231 | 0.155983 | 2594 |
3.75 | 23.122413 | 21.333333 | 26.666667 | 0.101684 | 1691 |
4.25 | 14.965986 | 18.823529 | 23.529412 | 0.044197 | 735 |
4.75 | 16.666667 | 16.842105 | 21.052632 | 0.023452 | 390 |
5.25 | 16.818182 | 15.238095 | 19.047619 | 0.013229 | 220 |
5.75 | 9.523810 | 13.913043 | 17.391304 | 0.005051 | 84 |
6.25 | 13.513514 | 12.800000 | 16.000000 | 0.002225 | 37 |
6.75 | 5.555556 | 11.851852 | 14.814815 | 0.001082 | 18 |
7.25 | 9.090909 | 11.034483 | 13.793103 | 0.000661 | 11 |
7.75 | 16.666667 | 10.322581 | 12.903226 | 0.000361 | 6 |
8.75 | 50.000000 | 9.142857 | 11.428571 | 0.000120 | 2 |
9.25 | 0.000000 | 8.648649 | 10.810811 | 0.000060 | 1 |
9.75 | 0.000000 | 8.205128 | 10.256410 | 0.000241 | 4 |
10.25 | 0.000000 | 7.804878 | 9.756098 | 0.000060 | 1 |
10.75 | 0.000000 | 7.441860 | 9.302326 | 0.000060 | 1 |
15.75 | 0.000000 | 5.079365 | 6.349206 | 0.000060 | 1 |
22.25 | 100.000000 | 3.595506 | 4.494382 | 0.000060 | 1 |
34.75 | 0.000000 | 2.302158 | 2.877698 | 0.000060 | 1 |
50.00 | 0.000000 | 1.600000 | 2.000000 | 0.000060 | 1 |
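The reference columns in this table can be reproduced from the odds alone. The sketch below is an observation about the printed values, not the code behind `gegnerate_odds_graph`: 回収率100%超 matches 100 / odds, 支持率 matches 80 / odds (an assumed payout rate of 0.8), and weight matches 件数 divided by the 16630 bet rows.

```python
PAYOUT_RATE = 0.8   # assumed payout rate; it reproduces the 支持率 column
TOTAL_BETS = 16630  # number of bet rows reported for the test data


def break_even_win_rate(odds: float) -> float:
    """Win rate (%) at which betting at these odds returns exactly 100%."""
    return 100.0 / odds


def support_rate(odds: float, payout_rate: float = PAYOUT_RATE) -> float:
    """Share of the betting pool (%) implied by the odds."""
    return 100.0 * payout_rate / odds


def bin_weight(count: int, total: int = TOTAL_BETS) -> float:
    """Fraction of all bets that fall into one odds bin."""
    return count / total


# (odds_round, 件数) pairs taken from the first two rows of the table
for odds, count in [(1.25, 658), (1.75, 2989)]:
    print(
        odds,
        round(support_rate(odds), 6),
        round(break_even_win_rate(odds), 6),
        round(bin_weight(count), 6),
    )
```

Read this way, an odds bin only returns more than 100% when its observed 勝率 exceeds the 回収率100%超 value. In the 1.25 bin, for example, the observed 65.8% falls short of the required 80%.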
Exporting the model¶
Running the following exports the model's results.
lgbm_model_manager.export_model_info()
2024-08-10 16:22:41.109 | INFO | src.model_manager.base_manager:export_model_info:848 - Export Model info json. export path: e:\dev_um_ai\dev-um-ai\models\first_model\model_info.json
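4-3. Checking performance (launching the web app)¶
The code below prepares and launches the web app. Note that the `runserver` line is commented out in this cell; uncomment it to actually start the server on port 12345.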
! python ../app_keiba/manage.py makemigrations
! python ../app_keiba/manage.py migrate
! echo server launch OK
# ! python ../app_keiba/manage.py runserver 12345
No changes detected Operations to perform: Apply all migrations: admin, auth, contenttypes, model_analyzer, sessions Running migrations: Applying contenttypes.0001_initial... OK Applying auth.0001_initial... OK Applying admin.0001_initial... OK Applying admin.0002_logentry_remove_auto_add... OK Applying admin.0003_logentry_add_action_flag_choices... OK Applying contenttypes.0002_remove_content_type_name... OK Applying auth.0002_alter_permission_name_max_length... OK Applying auth.0003_alter_user_email_max_length... OK Applying auth.0004_alter_user_username_opts... OK Applying auth.0005_alter_user_last_login_null... OK Applying auth.0006_require_contenttypes_0002... OK Applying auth.0007_alter_validators_add_error_messages... OK Applying auth.0008_alter_user_username_max_length... OK Applying auth.0009_alter_user_last_name_max_length... OK Applying auth.0010_alter_group_name_max_length... OK Applying auth.0011_update_proxy_permissions... OK Applying auth.0012_alter_user_first_name_max_length... OK Applying model_analyzer.0001_initial... OK Applying model_analyzer.0002_alter_modellist_motivate... OK Applying sessions.0001_initial... OK server launch OK
Once "server launch OK" is displayed, click the link below to access the web app.
 | モデルID (model ID) | 支持率OGS (support rate OGS) | 回収率OGS (return rate OGS) | AonBOGS |
---|---|---|---|---|
1 | first_model (baseline) | 0.41924 | -7.64492 | |
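4-4. How to split turf and dirt¶
There are two ways to train separately for turf and dirt:
- Pattern 1: split only the validation and test data into turf and dirt
- Pattern 2: split every dataset into turf and dirt
Splitting the data on a specific condition requires changes to the core source code.
Since no benefit can be expected at this point, changing the source is risky, so here we build separate models for turf and dirt and merge their results.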
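The "build separately, then merge" idea can be sketched as follows. This is a minimal illustration with pandas and not the project's own merge logic; the sample frames, the `ダ` label for dirt, and the choice of columns are assumptions.

```python
import pandas as pd

# Hypothetical results from the turf-only and dirt-only models.
# Column names and values are assumptions for illustration.
pred_turf = pd.DataFrame({
    "raceId": ["r1", "r3"],
    "field": ["芝", "芝"],
    "pl_tan": [170.0, -100.0],
})
pred_dirt = pd.DataFrame({
    "raceId": ["r2", "r4"],
    "field": ["ダ", "ダ"],
    "pl_tan": [-100.0, 250.0],
})

# Each race is run on exactly one surface, so the two results do not overlap
# and can simply be stacked, then put back into race order.
merged = (
    pd.concat([pred_turf, pred_dirt], ignore_index=True)
    .sort_values("raceId")
    .reset_index(drop=True)
)

# Sanity check: no race should appear in both results
assert not merged["raceId"].duplicated().any()
print(merged)
```

Because the merged frame has the same shape as a single model's result, it can be evaluated with the same statistics as the baseline, which keeps the comparison fair.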
4-5. Training the turf/dirt-specific models¶
4-5-1. Building the Pattern 1 model¶
Splitting the data into turf and dirt¶
dataset_grass, dataset_dirt = {}, {}
for key, dataset in dataset_mapping.items():
    # Pattern 1: the training data is shared; only valid/test are split.
    # Every row whose field is not "芝" (turf) is treated as dirt.
    idfv = dataset.valid
    idfv_g, idfv_d = idfv[idfv["field"].isin(
        ["芝"])], idfv[~idfv["field"].isin(["芝"])]
    idft = dataset.test
    idft_g, idft_d = idft[idft["field"].isin(
        ["芝"])], idft[~idft["field"].isin(["芝"])]
    # region turf dataset
    dataset_g = LightGBMDataset(key, dataset.train, idfv_g, idft_g)
    dataset_g.train_dataset, dataset_g.valid_dataset, dataset_g.test_dataset = \
        lgbm.Dataset(dataset.train[feature_columns], dataset.train[objective_column]), \
        lgbm.Dataset(idfv_g[feature_columns], idfv_g[objective_column]), \
        lgbm.Dataset(idft_g[feature_columns], idft_g[objective_column])
    dataset_grass[key] = dataset_g
    # endregion
    # region dirt dataset
    dataset_d = LightGBMDataset(key, dataset.train, idfv_d, idft_d)
    dataset_d.train_dataset, dataset_d.valid_dataset, dataset_d.test_dataset = \
        lgbm.Dataset(dataset.train[feature_columns], dataset.train[objective_column]), \
        lgbm.Dataset(idfv_d[feature_columns], idfv_d[objective_column]), \
        lgbm.Dataset(idft_d[feature_columns], idft_d[objective_column])
    dataset_dirt[key] = dataset_d
    # endregion
Building the turf-only model first¶
Training¶
lgbm_model_manager = LightGBMModelManager(
    # Create the folder for the turf-only model
    root_dir / "models" / "model_only_grass",
    split_year,
    target_year,
    end_year
)
lgbm_model_manager.set_feature_and_objective_columns(
    feature_columns, objective_column)
lgbm_model_manager.topN = 1
lgbm_model_manager.train_all(
    params,
    dataset_grass,
    stopping_rounds=500,  # stop early once the validation score has not improved for this many rounds
    val_num=250  # interval (in rounds) between log outputs
)
2024-08-05 04:15:40.343 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_type, val=lightGBM 2024-08-05 04:15:40.345 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_id, val=model_only_grass 2024-08-05 04:15:40.347 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_dir, val=e:\dev_um_ai\dev-um-ai\models\model_only_grass 2024-08-05 04:15:40.348 | INFO | src.model_manager.base_manager:__init__:43 - make directory. path: e:\dev_um_ai\dev-um-ai\models\model_only_grass 2024-08-05 04:15:40.353 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_analyze_dir, val=e:\dev_um_ai\dev-um-ai\models\model_only_grass\analyze 2024-08-05 04:15:40.354 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_predict_dir, val=e:\dev_um_ai\dev-um-ai\models\model_only_grass\analyze\00_predict 2024-08-05 04:15:40.355 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_column, val=pred_prob 2024-08-05 04:15:40.356 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_rank_column, val=pred_rank 2024-08-05 04:15:40.359 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:77 - Set Feature columns. ['distance', 'number', 'boxNum', 'odds', 'favorite', 'age', 'jweight', 'weight', 'gl', 'race_span', 'raceGrade', 'place_en', 'field_en', 'sex_en', 'condition_en', 'jockeyId_en', 'teacherId_en', 'dist_cat_en', 'horseId_en'] 2024-08-05 04:15:40.360 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:79 - Set Objective columns. 
label_in1 2024-08-05 04:15:40.361 | INFO | src.model_manager.lgbm_manager:save_root_model_info:281 - Save model params and dataset columns 2024-08-05 04:15:40.366 | INFO | src.model_manager.lgbm_manager:train_all:262 - Training Start! 2024-08-05 04:15:40.367 | INFO | src.model_manager.lgbm_manager:train_all:263 - ================== train params ======================== 2024-08-05 04:15:40.368 | INFO | src.model_manager.lgbm_manager:train_all:266 - boosting_type = gbdt 2024-08-05 04:15:40.369 | INFO | src.model_manager.lgbm_manager:train_all:266 - objective = binary 2024-08-05 04:15:40.370 | INFO | src.model_manager.lgbm_manager:train_all:266 - metric = auc 2024-08-05 04:15:40.371 | INFO | src.model_manager.lgbm_manager:train_all:266 - verbose = 0 2024-08-05 04:15:40.373 | INFO | src.model_manager.lgbm_manager:train_all:266 - seed = 77777 2024-08-05 04:15:40.374 | INFO | src.model_manager.lgbm_manager:train_all:266 - learning_rate = 0.01 2024-08-05 04:15:40.375 | INFO | src.model_manager.lgbm_manager:train_all:266 - n_estimators = 10000 2024-08-05 04:15:40.375 | INFO | src.model_manager.lgbm_manager:train_all:267 - ========================================================== 2024-08-05 04:15:40.376 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2019first 2024-08-05 04:15:40.377 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:15:40.566 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:15:40.568 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-08-05 04:15:40.777 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:15:51.188 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.871807 valid_1's auc: 0.846729 2024-08-05 04:16:11.179 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.89677 valid_1's auc: 0.846496 2024-08-05 04:16:12.590 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [15] training's auc: 0.843835 valid_1's auc: 0.847674 2024-08-05 04:16:12.701 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\params\2019first\model.params 2024-08-05 04:16:13.657 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2019second 2024-08-05 04:16:13.658 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:16:13.821 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:16:13.823 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:16:14.023 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:16:25.823 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.870639 valid_1's auc: 0.814135 2024-08-05 04:16:48.232 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.894912 valid_1's auc: 0.812338 2024-08-05 04:16:49.919 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [16] training's auc: 0.84428 valid_1's auc: 0.815285 2024-08-05 04:16:50.027 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... 
model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\params\2019second\model.params 2024-08-05 04:16:50.979 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2020first 2024-08-05 04:16:50.980 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:16:51.174 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:16:51.175 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:16:51.388 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:17:04.015 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.86771 valid_1's auc: 0.837084 2024-08-05 04:17:28.689 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.891366 valid_1's auc: 0.833318 2024-08-05 04:17:52.496 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [216] training's auc: 0.864218 valid_1's auc: 0.837541 2024-08-05 04:17:53.126 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\params\2020first\model.params 2024-08-05 04:17:54.311 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2020second 2024-08-05 04:17:54.312 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:17:54.502 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 
2024-08-05 04:17:54.503 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:17:54.727 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:18:08.198 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.866477 valid_1's auc: 0.809263 2024-08-05 04:18:35.366 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.889361 valid_1's auc: 0.806705 2024-08-05 04:18:44.266 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [71] training's auc: 0.8496 valid_1's auc: 0.810884 2024-08-05 04:18:44.466 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\params\2020second\model.params 2024-08-05 04:18:45.475 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2021first 2024-08-05 04:18:45.477 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:18:45.687 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:18:45.688 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-08-05 04:18:45.925 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:19:00.435 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.86393 valid_1's auc: 0.825407 2024-08-05 04:19:29.590 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.886215 valid_1's auc: 0.823367 2024-08-05 04:20:01.442 | INFO | lightgbm.basic:_log_info:191 - [750] training's auc: 0.900037 valid_1's auc: 0.821308 2024-08-05 04:20:01.830 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [253] training's auc: 0.864237 valid_1's auc: 0.825461 2024-08-05 04:20:02.629 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\params\2021first\model.params 2024-08-05 04:20:03.838 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2021second 2024-08-05 04:20:03.839 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:20:04.039 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:20:04.041 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-08-05 04:20:04.287 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:20:19.879 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.861851 valid_1's auc: 0.825318 2024-08-05 04:20:50.933 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.883583 valid_1's auc: 0.82441 2024-08-05 04:21:12.691 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [154] training's auc: 0.853197 valid_1's auc: 0.82568 2024-08-05 04:21:13.167 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\params\2021second\model.params 2024-08-05 04:21:14.238 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2022first 2024-08-05 04:21:14.240 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:21:14.451 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:21:14.453 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:21:14.709 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:21:31.799 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.861216 valid_1's auc: 0.832859 2024-08-05 04:22:06.034 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.882493 valid_1's auc: 0.831195 2024-08-05 04:22:08.678 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [17] training's auc: 0.840881 valid_1's auc: 0.833574 2024-08-05 04:22:08.804 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... 
model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\params\2022first\model.params 2024-08-05 04:22:09.805 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2022second 2024-08-05 04:22:09.807 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:22:10.019 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:22:10.021 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:22:10.278 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:22:27.523 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.859738 valid_1's auc: 0.834225 2024-08-05 04:23:02.911 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.880705 valid_1's auc: 0.832829 2024-08-05 04:23:40.013 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [233] training's auc: 0.85823 valid_1's auc: 0.834375 2024-08-05 04:23:40.755 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\params\2022second\model.params 2024-08-05 04:23:41.986 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2023first 2024-08-05 04:23:41.987 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:23:42.202 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 
2024-08-05 04:23:42.203 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:23:42.510 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:24:00.995 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.859688 valid_1's auc: 0.829658 2024-08-05 04:24:36.009 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.8801 valid_1's auc: 0.828485 2024-08-05 04:25:08.297 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [221] training's auc: 0.856973 valid_1's auc: 0.830037 2024-08-05 04:25:09.070 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\params\2023first\model.params 2024-08-05 04:25:10.297 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2023second 2024-08-05 04:25:10.297 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:25:10.486 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:25:10.487 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-08-05 04:25:10.736 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:25:29.769 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.859039 valid_1's auc: 0.827814 2024-08-05 04:26:08.071 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.879004 valid_1's auc: 0.82535 2024-08-05 04:26:10.269 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [12] training's auc: 0.839905 valid_1's auc: 0.828929 2024-08-05 04:26:10.375 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\params\2023second\model.params
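The logs above report a best iteration that is far earlier than the last logged round (for example, best iteration 15 after the [500] line). That is how patience-based early stopping behaves: training continues for `stopping_rounds` more rounds after the best score before giving up. The following is a plain-Python sketch of that rule, not LightGBM's code; the function name and sample scores are made up for illustration.

```python
def early_stopping_trace(valid_scores, patience):
    """Return (best_iteration, stop_iteration), both 1-indexed.

    Training stops once `patience` rounds have passed since the best
    validation score. If that never happens, stop_iteration is the last round.
    """
    best_score = float("-inf")
    best_iter = 0
    for i, score in enumerate(valid_scores, start=1):
        if score > best_score:
            # New best: remember it and keep training
            best_score, best_iter = score, i
        elif i - best_iter >= patience:
            # No improvement for `patience` rounds: stop here
            return best_iter, i
    return best_iter, len(valid_scores)


# Hypothetical validation AUC per round
scores = [0.80, 0.82, 0.85, 0.84, 0.83, 0.84, 0.82]
print(early_stopping_trace(scores, patience=3))  # → (3, 6)
```

With `stopping_rounds=500`, a best iteration of 15 therefore implies that training ran until round 515, and the saved model uses only the first 15 rounds.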
Running inference¶
for dataset_dict in dataset_grass.values():
    lgbm_model_manager.load_model(dataset_dict.name)
    lgbm_model_manager.predict(dataset_dict)
bet_mode = BetName.tan
bet_column = lgbm_model_manager.get_bet_column(bet_mode=bet_mode)
pl_column = lgbm_model_manager.get_profit_loss_column(bet_mode=bet_mode)
for dataset_dict in dataset_grass.values():
    lgbm_model_manager.set_bet_column(dataset_dict, bet_mode)
_, dfbetva_grass, dfbette_grass = lgbm_model_manager.merge_dataframe_data(
    dataset_grass, mode=True)
2024-08-05 04:26:12.308 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2019first 2024-08-05 04:26:12.358 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2019first 2024-08-05 04:26:13.246 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2019first 2024-08-05 04:26:13.305 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\analyze\00_predict\2019first 2024-08-05 04:26:15.896 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2019second 2024-08-05 04:26:15.955 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2019second 2024-08-05 04:26:16.903 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2019second 2024-08-05 04:26:16.966 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\analyze\00_predict\2019second 2024-08-05 04:26:19.707 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2020first 2024-08-05 04:26:19.805 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2020first 2024-08-05 04:26:22.068 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2020first 2024-08-05 04:26:22.122 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\analyze\00_predict\2020first 2024-08-05 04:26:25.033 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... 
model name: 2020second 2024-08-05 04:26:25.089 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2020second 2024-08-05 04:26:26.453 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2020second 2024-08-05 04:26:26.520 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\analyze\00_predict\2020second 2024-08-05 04:26:29.562 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2021first 2024-08-05 04:26:29.671 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2021first 2024-08-05 04:26:32.719 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2021first 2024-08-05 04:26:32.787 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\analyze\00_predict\2021first 2024-08-05 04:26:35.955 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2021second 2024-08-05 04:26:36.031 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2021second 2024-08-05 04:26:38.239 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2021second 2024-08-05 04:26:38.309 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\analyze\00_predict\2021second 2024-08-05 04:26:41.596 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2022first 2024-08-05 04:26:41.658 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! 
model_name: 2022first 2024-08-05 04:26:43.192 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2022first 2024-08-05 04:26:43.272 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\analyze\00_predict\2022first 2024-08-05 04:26:46.737 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2022second 2024-08-05 04:26:46.844 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2022second 2024-08-05 04:26:50.430 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2022second 2024-08-05 04:26:50.502 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\analyze\00_predict\2022second 2024-08-05 04:26:54.167 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2023first 2024-08-05 04:26:54.300 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2023first 2024-08-05 04:26:57.941 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2023first 2024-08-05 04:26:58.016 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\analyze\00_predict\2023first 2024-08-05 04:27:01.781 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2023second 2024-08-05 04:27:01.833 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2023second 2024-08-05 04:27:03.679 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. 
model_name: 2023second 2024-08-05 04:27:03.761 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass\analyze\00_predict\2023second 2024-08-05 04:27:06.731 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=bet_columns_map, val={'tan': 'bet_tan'} 2024-08-05 04:27:06.732 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=pl_column_map, val={'tan': 'pl_tan'}
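To make the roles of `pred_prob`, `pred_rank`, `bet_tan`, and `pl_tan` concrete, here is a minimal sketch of a top-1 win bet. It is only one plausible reading of what `predict` and `set_bet_column` produce with `topN = 1`; the sample data, the 100-yen stake, and the exact profit/loss formula are assumptions.

```python
import pandas as pd

STAKE = 100  # assumed stake per bet, in yen
TOP_N = 1    # bet on the top-N horses per race by predicted probability

# Hypothetical predictions for two races; `label_in1` is 1 for the winner.
df = pd.DataFrame({
    "raceId": ["r1", "r1", "r1", "r2", "r2", "r2"],
    "odds": [2.7, 4.0, 9.5, 1.8, 3.5, 12.0],
    "pred_prob": [0.45, 0.30, 0.10, 0.35, 0.40, 0.05],
    "label_in1": [1, 0, 0, 1, 0, 0],
})

# Rank horses within each race by predicted probability (1 = most confident)
df["pred_rank"] = (
    df.groupby("raceId")["pred_prob"]
    .rank(ascending=False, method="first")
    .astype(int)
)

# Bet flag: 1 for the horses we bet on, 0 otherwise
df["bet_tan"] = (df["pred_rank"] <= TOP_N).astype(int)

# Profit/loss: payout minus stake on a hit, minus stake on a miss, 0 if no bet
payout = df["odds"] * STAKE * df["label_in1"]
df["pl_tan"] = (payout - STAKE) * df["bet_tan"]

print(df[["raceId", "pred_rank", "bet_tan", "pl_tan"]])
```

In this toy data the model hits race r1 (profit 170) and misses race r2 (loss 100), which mirrors the 170.0 and -100.0 values seen in the bet table earlier.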
Building the dirt-only model¶
Training¶
lgbm_model_manager_dirt = LightGBMModelManager(
    # Create the folder for the dirt-only model
    root_dir / "models" / "model_only_dirt",
    split_year,
    target_year,
    end_year
)
lgbm_model_manager_dirt.set_feature_and_objective_columns(
    feature_columns, objective_column)
lgbm_model_manager_dirt.topN = 1
lgbm_model_manager_dirt.train_all(
    params,
    dataset_dirt,
    stopping_rounds=500,  # stop early once the validation score has not improved for this many rounds
    val_num=250  # interval (in rounds) between log outputs
)
2024-08-05 04:27:07.338 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_type, val=lightGBM 2024-08-05 04:27:07.340 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_id, val=model_only_dirt 2024-08-05 04:27:07.341 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_dir, val=e:\dev_um_ai\dev-um-ai\models\model_only_dirt 2024-08-05 04:27:07.343 | INFO | src.model_manager.base_manager:__init__:43 - make directory. path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt 2024-08-05 04:27:07.347 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_analyze_dir, val=e:\dev_um_ai\dev-um-ai\models\model_only_dirt\analyze 2024-08-05 04:27:07.347 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_predict_dir, val=e:\dev_um_ai\dev-um-ai\models\model_only_dirt\analyze\00_predict 2024-08-05 04:27:07.348 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_column, val=pred_prob 2024-08-05 04:27:07.349 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_rank_column, val=pred_rank 2024-08-05 04:27:07.350 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:77 - Set Feature columns. ['distance', 'number', 'boxNum', 'odds', 'favorite', 'age', 'jweight', 'weight', 'gl', 'race_span', 'raceGrade', 'place_en', 'field_en', 'sex_en', 'condition_en', 'jockeyId_en', 'teacherId_en', 'dist_cat_en', 'horseId_en'] 2024-08-05 04:27:07.351 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:79 - Set Objective columns. 
label_in1 2024-08-05 04:27:07.352 | INFO | src.model_manager.lgbm_manager:save_root_model_info:281 - Save model params and dataset columns 2024-08-05 04:27:07.356 | INFO | src.model_manager.lgbm_manager:train_all:262 - Training Start! 2024-08-05 04:27:07.356 | INFO | src.model_manager.lgbm_manager:train_all:263 - ================== train params ======================== 2024-08-05 04:27:07.357 | INFO | src.model_manager.lgbm_manager:train_all:266 - boosting_type = gbdt 2024-08-05 04:27:07.358 | INFO | src.model_manager.lgbm_manager:train_all:266 - objective = binary 2024-08-05 04:27:07.359 | INFO | src.model_manager.lgbm_manager:train_all:266 - metric = auc 2024-08-05 04:27:07.360 | INFO | src.model_manager.lgbm_manager:train_all:266 - verbose = 0 2024-08-05 04:27:07.361 | INFO | src.model_manager.lgbm_manager:train_all:266 - seed = 77777 2024-08-05 04:27:07.362 | INFO | src.model_manager.lgbm_manager:train_all:266 - learning_rate = 0.01 2024-08-05 04:27:07.362 | INFO | src.model_manager.lgbm_manager:train_all:266 - n_estimators = 10000 2024-08-05 04:27:07.363 | INFO | src.model_manager.lgbm_manager:train_all:267 - ========================================================== 2024-08-05 04:27:07.364 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2019first 2024-08-05 04:27:07.365 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:27:07.543 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:27:07.544 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-08-05 04:27:07.763 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:27:17.996 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.871807 valid_1's auc: 0.833088 2024-08-05 04:27:37.923 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.89677 valid_1's auc: 0.832282 2024-08-05 04:27:59.627 | INFO | lightgbm.basic:_log_info:191 - [750] training's auc: 0.912683 valid_1's auc: 0.829816 2024-08-05 04:28:01.259 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [270] training's auc: 0.874199 valid_1's auc: 0.833208 2024-08-05 04:28:02.048 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\params\2019first\model.params 2024-08-05 04:28:03.282 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2019second 2024-08-05 04:28:03.283 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:28:03.472 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:28:03.474 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-08-05 04:28:03.674 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:28:15.664 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.870639 valid_1's auc: 0.824531 2024-08-05 04:28:37.797 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.894912 valid_1's auc: 0.824518 2024-08-05 04:29:01.931 | INFO | lightgbm.basic:_log_info:191 - [750] training's auc: 0.909803 valid_1's auc: 0.823077 2024-08-05 04:29:19.679 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [441] training's auc: 0.890461 valid_1's auc: 0.824739 2024-08-05 04:29:20.959 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\params\2019second\model.params 2024-08-05 04:29:22.505 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2020first 2024-08-05 04:29:22.506 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:29:22.669 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:29:22.671 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-08-05 04:29:22.887 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:29:35.552 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.86771 valid_1's auc: 0.82787 2024-08-05 04:30:00.386 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.891366 valid_1's auc: 0.826831 2024-08-05 04:30:13.713 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [116] training's auc: 0.854617 valid_1's auc: 0.82877 2024-08-05 04:30:14.020 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\params\2020first\model.params 2024-08-05 04:30:15.051 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2020second 2024-08-05 04:30:15.052 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:30:15.240 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:30:15.242 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:30:15.467 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:30:29.156 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.866477 valid_1's auc: 0.835707 2024-08-05 04:30:56.858 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.889361 valid_1's auc: 0.834774 2024-08-05 04:31:13.608 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [133] training's auc: 0.855343 valid_1's auc: 0.836302 2024-08-05 04:31:14.000 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... 
model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\params\2020second\model.params 2024-08-05 04:31:15.048 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2021first 2024-08-05 04:31:15.049 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:31:15.238 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:31:15.240 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:31:15.480 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:31:29.984 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.86393 valid_1's auc: 0.817968 2024-08-05 04:31:59.341 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.886215 valid_1's auc: 0.816825 2024-08-05 04:32:01.727 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [17] training's auc: 0.841838 valid_1's auc: 0.818268 2024-08-05 04:32:01.824 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\params\2021first\model.params 2024-08-05 04:32:02.779 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2021second 2024-08-05 04:32:02.780 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:32:02.977 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 
2024-08-05 04:32:02.979 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:32:03.228 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:32:18.864 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.861851 valid_1's auc: 0.843667 2024-08-05 04:32:50.088 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.883583 valid_1's auc: 0.84249 2024-08-05 04:32:54.977 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [33] training's auc: 0.84323 valid_1's auc: 0.844655 2024-08-05 04:32:55.111 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\params\2021second\model.params 2024-08-05 04:32:56.088 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2022first 2024-08-05 04:32:56.090 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:32:56.321 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:32:56.323 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-08-05 04:32:56.579 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:33:13.852 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.861216 valid_1's auc: 0.81993 2024-08-05 04:33:48.301 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.882493 valid_1's auc: 0.818373 2024-08-05 04:34:03.216 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [94] training's auc: 0.847917 valid_1's auc: 0.821375 2024-08-05 04:34:03.469 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\params\2022first\model.params 2024-08-05 04:34:04.785 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2022second 2024-08-05 04:34:04.787 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:34:05.003 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:34:05.004 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-08-05 04:34:05.273 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:34:22.654 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.859738 valid_1's auc: 0.849977 2024-08-05 04:34:58.308 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.880705 valid_1's auc: 0.84974 2024-08-05 04:35:38.268 | INFO | lightgbm.basic:_log_info:191 - [750] training's auc: 0.893643 valid_1's auc: 0.849005 2024-08-05 04:35:44.163 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [291] training's auc: 0.863579 valid_1's auc: 0.850116 2024-08-05 04:35:45.268 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\params\2022second\model.params 2024-08-05 04:35:46.640 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2023first 2024-08-05 04:35:46.640 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:35:46.842 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:35:46.843 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-08-05 04:35:47.154 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:36:05.385 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.859688 valid_1's auc: 0.851099 2024-08-05 04:36:42.627 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.8801 valid_1's auc: 0.849899 2024-08-05 04:37:24.301 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [245] training's auc: 0.859216 valid_1's auc: 0.85114 2024-08-05 04:37:25.156 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\params\2023first\model.params 2024-08-05 04:37:26.438 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2023second 2024-08-05 04:37:26.439 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:37:26.646 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:37:26.648 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-08-05 04:37:26.925 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:37:46.250 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.859039 valid_1's auc: 0.84669 2024-08-05 04:38:25.838 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.879004 valid_1's auc: 0.845218 2024-08-05 04:39:11.058 | INFO | lightgbm.basic:_log_info:191 - [750] training's auc: 0.891391 valid_1's auc: 0.843779 2024-08-05 04:39:13.360 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [263] training's auc: 0.860106 valid_1's auc: 0.846709 2024-08-05 04:39:14.268 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\params\2023second\model.params
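The training log is long, so it can help to pull out only the early-stopping lines. The following is a minimal sketch and not part of the toolkit used in this article: it assumes the log text is available as a string, and the helper name `summarize_early_stopping` is hypothetical. The sample lines are shortened copies of entries from the log above.

```python
import re

# Matches a "Start training model" entry and the next early-stopping report.
# Assumption: every model in the log stops early; a model that ran to the
# last round would have no such report and would be paired incorrectly.
BEST_ITER_RE = re.compile(
    r"Start training model\. model name: (?P<name>\S+)"
    r".*?best iteration is: \[(?P<iter>\d+)\]"
    r"\s+training's auc: (?P<train>[\d.]+)"
    r"\s+valid_1's auc: (?P<valid>[\d.]+)",
    re.DOTALL,
)


def summarize_early_stopping(log_text):
    """Return (model name, best iteration, train AUC, valid AUC) per model."""
    return [
        (m["name"], int(m["iter"]), float(m["train"]), float(m["valid"]))
        for m in BEST_ITER_RE.finditer(log_text)
    ]


log_text = (
    "Start training model. model name: 2019first "
    "[250] training's auc: 0.871807 valid_1's auc: 0.833088 "
    "Early stopping, best iteration is: [270] "
    "training's auc: 0.874199 valid_1's auc: 0.833208 "
    "Start training model. model name: 2019second "
    "Early stopping, best iteration is: [441] "
    "training's auc: 0.890461 valid_1's auc: 0.824739 "
)

rows = summarize_early_stopping(log_text)
for name, best_iter, train_auc, valid_auc in rows:
    print(f"{name}: best={best_iter} train={train_auc} valid={valid_auc}")
```

Putting the best iteration next to the validation AUC makes it easier to spot periods where training stopped after only a handful of rounds.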
Run inference¶
# Load each period's trained model and run prediction on its dataset
for dataset_dict in dataset_dirt.values():
    lgbm_model_manager_dirt.load_model(dataset_dict.name)
    lgbm_model_manager_dirt.predict(dataset_dict)
bet_mode = BetName.tan
bet_column = lgbm_model_manager_dirt.get_bet_column(bet_mode=bet_mode)
pl_column = lgbm_model_manager_dirt.get_profit_loss_column(bet_mode=bet_mode)
# Add the bet column for the selected bet type to every dataset
for dataset_dict in dataset_dirt.values():
    lgbm_model_manager_dirt.set_bet_column(dataset_dict, bet_mode)
_, dfbetva_dirt, dfbette_dirt = lgbm_model_manager_dirt.merge_dataframe_data(
    dataset_dirt, mode=True)
2024-08-05 04:39:16.467 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2019first 2024-08-05 04:39:16.618 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2019first 2024-08-05 04:39:18.714 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2019first 2024-08-05 04:39:18.772 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\analyze\00_predict\2019first 2024-08-05 04:39:21.337 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2019second 2024-08-05 04:39:21.538 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2019second 2024-08-05 04:39:26.255 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2019second 2024-08-05 04:39:26.318 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\analyze\00_predict\2019second 2024-08-05 04:39:29.012 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2020first 2024-08-05 04:39:29.106 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2020first 2024-08-05 04:39:30.531 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2020first 2024-08-05 04:39:30.599 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\analyze\00_predict\2020first 2024-08-05 04:39:33.384 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... 
model name: 2020second 2024-08-05 04:39:33.478 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2020second 2024-08-05 04:39:35.158 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2020second 2024-08-05 04:39:35.225 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\analyze\00_predict\2020second 2024-08-05 04:39:38.319 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2021first 2024-08-05 04:39:38.381 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2021first 2024-08-05 04:39:39.625 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2021first 2024-08-05 04:39:39.699 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\analyze\00_predict\2021first 2024-08-05 04:39:42.912 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2021second 2024-08-05 04:39:42.973 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2021second 2024-08-05 04:39:44.509 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2021second 2024-08-05 04:39:44.585 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\analyze\00_predict\2021second 2024-08-05 04:39:47.867 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2022first 2024-08-05 04:39:47.951 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! 
model_name: 2022first 2024-08-05 04:39:49.721 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2022first 2024-08-05 04:39:49.796 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\analyze\00_predict\2022first 2024-08-05 04:39:53.267 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2022second 2024-08-05 04:39:53.436 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2022second 2024-08-05 04:39:57.637 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2022second 2024-08-05 04:39:57.719 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\analyze\00_predict\2022second 2024-08-05 04:40:01.432 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2023first 2024-08-05 04:40:01.586 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2023first 2024-08-05 04:40:05.336 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2023first 2024-08-05 04:40:05.415 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\analyze\00_predict\2023first 2024-08-05 04:40:09.129 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2023second 2024-08-05 04:40:09.283 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2023second 2024-08-05 04:40:13.574 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. 
model_name: 2023second 2024-08-05 04:40:13.665 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt\analyze\00_predict\2023second 2024-08-05 04:40:16.606 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=bet_columns_map, val={'tan': 'bet_tan'} 2024-08-05 04:40:16.607 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=pl_column_map, val={'tan': 'pl_tan'}
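The log shows that predictions are stored in `pred_prob` and `pred_rank`, and that the win-bet flag goes in `bet_tan`. How the toolkit fills these columns is not shown in this article. The sketch below is one plausible reading, assuming horses are ranked within each race by predicted probability and the top `topN` horses are bet on. The toy data and the function name `add_rank_and_bet` are hypothetical.

```python
import pandas as pd


def add_rank_and_bet(df, top_n=1):
    """Rank horses within each race by pred_prob and flag the top_n as bets."""
    out = df.copy()
    # Rank 1 = highest predicted probability within the race.
    out["pred_rank"] = (
        out.groupby("raceId")["pred_prob"]
        .rank(method="first", ascending=False)
        .astype(int)
    )
    # 1 where a bet is placed, 0 otherwise (assumed encoding).
    out["bet_tan"] = (out["pred_rank"] <= top_n).astype(int)
    return out


toy = pd.DataFrame({
    "raceId": ["r1", "r1", "r1", "r2", "r2"],
    "number": [1, 2, 3, 1, 2],
    "pred_prob": [0.10, 0.55, 0.20, 0.30, 0.40],
})
ranked = add_rank_and_bet(toy, top_n=1)
print(ranked)
```

With `top_n=1`, exactly one horse per race is flagged, which matches the `topN = 1` setting used for the managers in this article.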
Exporting the model information¶
This is the laborious part: the turf and dirt results have to be merged so that the web app can import them.
lgbm_model_manager_merge = LightGBMModelManager(
    # Create the model folder for the merged results
    root_dir / "models" / "model_field_div_part1",
    split_year,
    target_year,
    end_year
)
lgbm_model_manager_merge.set_feature_and_objective_columns(
    feature_columns, objective_column)
lgbm_model_manager_merge.topN = 1
lgbm_model_manager_merge.expt_info_map.bet_columns_map = {"tan": "bet_tan"}
lgbm_model_manager_merge.expt_info_map.pl_column_map = {"tan": "pl_tan"}
lgbm_model_manager_merge.expt_info_map.confidence_column = "pred_prob"
lgbm_model_manager_merge.expt_info_map.confidence_rank_column = "pred_rank"
2024-08-05 04:40:17.260 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_type, val=lightGBM 2024-08-05 04:40:17.261 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_id, val=model_grass_dirt_part1 2024-08-05 04:40:17.263 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_dir, val=e:\dev_um_ai\dev-um-ai\models\model_grass_dirt_part1 2024-08-05 04:40:17.265 | INFO | src.model_manager.base_manager:__init__:43 - make directory. path: e:\dev_um_ai\dev-um-ai\models\model_grass_dirt_part1 2024-08-05 04:40:17.271 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_analyze_dir, val=e:\dev_um_ai\dev-um-ai\models\model_grass_dirt_part1\analyze 2024-08-05 04:40:17.273 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_predict_dir, val=e:\dev_um_ai\dev-um-ai\models\model_grass_dirt_part1\analyze\00_predict 2024-08-05 04:40:17.274 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_column, val=pred_prob 2024-08-05 04:40:17.275 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_rank_column, val=pred_rank 2024-08-05 04:40:17.277 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:77 - Set Feature columns. ['distance', 'number', 'boxNum', 'odds', 'favorite', 'age', 'jweight', 'weight', 'gl', 'race_span', 'raceGrade', 'place_en', 'field_en', 'sex_en', 'condition_en', 'jockeyId_en', 'teacherId_en', 'dist_cat_en', 'horseId_en'] 2024-08-05 04:40:17.278 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:79 - Set Objective columns. label_in1
Calculating profit and loss
import pandas as pd
# Stack the dirt and turf bet tables, then restore chronological order
dfbetva = pd.concat([dfbetva_dirt, dfbetva_grass]).sort_values(
    ["raceDate", "raceId"], ignore_index=True)
dfbette = pd.concat([dfbette_dirt, dfbette_grass]).sort_values(
    ["raceDate", "raceId"], ignore_index=True)
dfbetva, dfbette = lgbm_model_manager_merge.generate_profit_loss(
    dfbetva, dfbette, bet_mode)
2024-08-05 04:40:17.873 | INFO | src.model_manager.base_manager:__save_profit_loss:646 - Save profit loss data. save_path: e:\dev_um_ai\dev-um-ai\models\model_grass_dirt_part1\analyze\tan\profit_loss 2024-08-05 04:40:17.875 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=profit_loss_dir, val={'tan': 'e:\\dev_um_ai\\dev-um-ai\\models\\model_grass_dirt_part1\\analyze\\tan\\profit_loss'}
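The cell above stacks the dirt and turf tables and then re-sorts them by date and race ID, so the merged table is in chronological order again. Here is a minimal sketch with toy data. The values are made up; only the column names `raceDate`, `raceId` and `pred_prob` come from the code and logs in this article.

```python
import pandas as pd

dirt = pd.DataFrame({
    "raceDate": ["2019-01-05", "2019-01-06"],
    "raceId": ["d1", "d2"],
    "pred_prob": [0.42, 0.31],
})
turf = pd.DataFrame({
    "raceDate": ["2019-01-05", "2019-01-07"],
    "raceId": ["g1", "g2"],
    "pred_prob": [0.27, 0.55],
})

# Stack both tables, then sort by date and race ID.
# ignore_index=True renumbers the rows 0..n-1 after sorting, so the
# duplicate index labels left over from the two inputs disappear.
merged = pd.concat([dirt, turf]).sort_values(
    ["raceDate", "raceId"], ignore_index=True)
print(merged)
```

Without `ignore_index=True` the merged table would keep the index labels 0, 1, 0, 1 from the two inputs, which can cause surprises in later label-based lookups.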
Calculating the basic statistics
dataset_merge = {}
# Pair each period's turf dataset with the dirt dataset of the same period
for (key, dset_g), dset_d in zip(dataset_grass.items(), dataset_dirt.values()):
    dset_m = LightGBMDataset(key, None, None, None)
    dset_m.pred_train = pd.concat(
        [dset_g.pred_train, dset_d.pred_train]).sort_values(
        ["raceDate", "raceId"], ignore_index=True)
    dset_m.pred_test = pd.concat(
        [dset_g.pred_test, dset_d.pred_test]).sort_values(
        ["raceDate", "raceId"], ignore_index=True)
    dset_m.pred_valid = pd.concat(
        [dset_g.pred_valid, dset_d.pred_valid]).sort_values(
        ["raceDate", "raceId"], ignore_index=True)
    dataset_merge[key] = dset_m
lgbm_model_manager_merge.model_name_list = lgbm_model_manager.model_name_list
lgbm_model_manager_merge.basic_analyze(dataset_merge)
2024-08-05 04:40:20.941 | INFO | src.model_manager.base_manager:basic_analyze:220 - Start basic analyze. 2024-08-05 04:40:21.600 | INFO | src.model_manager.base_manager:basic_analyze:256 - Saving Return And Hit Rate Summary. 2024-08-05 04:40:21.610 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=return_hit_rate_file, val={'tan': 'e:\\dev_um_ai\\dev-um-ai\\models\\model_grass_dirt_part1\\analyze\\tan\\hit_and_return_rate.csv'} 2024-08-05 04:40:21.612 | INFO | src.model_manager.base_manager:basic_analyze:259 - Saving Favorite Bet Num Summary. 2024-08-05 04:40:21.627 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=fav_bet_num_dir, val={'tan': 'e:\\dev_um_ai\\dev-um-ai\\models\\model_grass_dirt_part1\\analyze\\tan\\fav_bet_num'}
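When pairing the per-period turf and dirt entries, keep in mind that iterating over a dict directly yields its keys, not its values. The sketch below shows a key-based pairing on toy data. It is an illustration only: plain DataFrames stand in for the prediction tables, and `merge_by_key` is a hypothetical helper.

```python
import pandas as pd


def merge_by_key(turf_by_key, dirt_by_key):
    """Concatenate the turf and dirt tables stored under the same key."""
    merged = {}
    for key, turf_df in turf_by_key.items():
        dirt_df = dirt_by_key[key]  # look up by key instead of relying on order
        merged[key] = pd.concat([turf_df, dirt_df]).sort_values(
            ["raceDate", "raceId"], ignore_index=True)
    return merged


turf_by_key = {"2019first": pd.DataFrame(
    {"raceDate": ["2019-01-06"], "raceId": ["g1"]})}
dirt_by_key = {"2019first": pd.DataFrame(
    {"raceDate": ["2019-01-05"], "raceId": ["d1"]})}

merged = merge_by_key(turf_by_key, dirt_by_key)
print(merged["2019first"])

# Iterating a dict yields keys, so zipping dict_a.items() with dict_b pairs
# each (key, value) tuple with a bare key rather than with a value.
paired = list(zip(turf_by_key.items(), dirt_by_key))
print(type(paired[0][1]))
```

Looking the second table up by key also raises a `KeyError` if a period is missing on one side, instead of silently pairing mismatched periods.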
Creating the odds graph
dft_g, dfv_g, dfte_g = lgbm_model_manager.merge_dataframe_data(
    dataset_grass,
    mode=True
)
dft_d, dfv_d, dfte_d = lgbm_model_manager_dirt.merge_dataframe_data(
    dataset_dirt,
    mode=True
)
dft_m = pd.concat([dft_g, dft_d])
dfv_m = pd.concat([dfv_g, dfv_d])
dfte_m = pd.concat([dfte_g, dfte_d])
summary_dict = lgbm_model_manager_merge.gegnerate_odds_graph(
    dft_m, dfv_m, dfte_m, bet_mode)
2024-08-05 04:40:23.771 | INFO | src.model_manager.base_manager:__save_odds_graph:514 - Save Odds Graph. save_path: e:\dev_um_ai\dev-um-ai\models\model_grass_dirt_part1\analyze\tan\odds_graph 2024-08-05 04:40:23.773 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=odds_graph_file, val={'tan': 'e:\\dev_um_ai\\dev-um-ai\\models\\model_grass_dirt_part1\\analyze\\tan\\odds_graph'}
Exporting the model information
lgbm_model_manager_merge.export_model_info()
2024-08-05 04:40:23.814 | INFO | src.model_manager.base_manager:export_model_info:848 - Export Model info json. export path: e:\dev_um_ai\dev-um-ai\models\model_grass_dirt_part1\model_info.json
4-5-2. Building the pattern 2 model¶
Splitting the data into turf and dirt¶
dataset_grass, dataset_dirt = {}, {}
for key, dataset in dataset_mapping.items():
    # Split train/valid/test into turf ("芝") rows and all remaining rows
    idftr = dataset.train
    idftr_g, idftr_d = idftr[idftr["field"].isin(
        ["芝"])], idftr[~idftr["field"].isin(["芝"])]
    idfv = dataset.valid
    idfv_g, idfv_d = idfv[idfv["field"].isin(
        ["芝"])], idfv[~idfv["field"].isin(["芝"])]
    idft = dataset.test
    idft_g, idft_d = idft[idft["field"].isin(
        ["芝"])], idft[~idft["field"].isin(["芝"])]
    # region turf dataset
    dataset_g = LightGBMDataset(key, idftr_g, idfv_g, idft_g)
    dataset_g.train_dataset, dataset_g.valid_dataset, dataset_g.test_dataset = \
        lgbm.Dataset(idftr_g[feature_columns], idftr_g[objective_column]), \
        lgbm.Dataset(idfv_g[feature_columns], idfv_g[objective_column]), \
        lgbm.Dataset(idft_g[feature_columns], idft_g[objective_column])
    # endregion
    # region dirt dataset
    # Use the dirt-only training rows, mirroring the turf dataset above
    dataset_d = LightGBMDataset(key, idftr_d, idfv_d, idft_d)
    dataset_d.train_dataset, dataset_d.valid_dataset, dataset_d.test_dataset = \
        lgbm.Dataset(idftr_d[feature_columns], idftr_d[objective_column]), \
        lgbm.Dataset(idfv_d[feature_columns], idfv_d[objective_column]), \
        lgbm.Dataset(idft_d[feature_columns], idft_d[objective_column])
    dataset_grass[key] = dataset_g
    dataset_dirt[key] = dataset_d
    # endregion
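The split relies on a boolean mask and its negation, so every row lands in exactly one of the two groups. Any row whose `field` is not `芝` ends up on the dirt side, including any other surface label the data may contain. Here is a minimal sketch with made-up rows. The labels `ダ` and `障` are assumed values used only to show that behavior.

```python
import pandas as pd

toy = pd.DataFrame({
    "raceId": ["r1", "r2", "r3", "r4"],
    "field": ["芝", "ダ", "芝", "障"],  # "ダ" and "障" are assumed labels
})

# One mask, used once as-is and once negated, gives two disjoint groups.
is_turf = toy["field"].isin(["芝"])
turf, non_turf = toy[is_turf], toy[~is_turf]

print(turf["raceId"].tolist())      # turf rows only
print(non_turf["raceId"].tolist())  # everything else, not only dirt
```

If the second group should contain dirt races only, filtering with an explicit `isin` on the dirt label would be the stricter choice.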
First, build the turf-only model¶
Run training¶
lgbm_model_manager = LightGBMModelManager(
    # Create the model folder for the turf-only model
    root_dir / "models" / "model_only_grass2",
    split_year,
    target_year,
    end_year
)
lgbm_model_manager.set_feature_and_objective_columns(
    feature_columns, objective_column)
lgbm_model_manager.topN = 1
lgbm_model_manager.train_all(
    params,
    dataset_grass,
    stopping_rounds=500,  # stop early once the validation score has not improved for this many rounds
    val_num=250  # interval (in iterations) between log outputs
)
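`stopping_rounds` acts as a patience value: as the log line "Training until validation scores don't improve for 500 rounds" says, training continues until the validation score has failed to improve for that many rounds, and the best round is then reported. That is why a run can report a best iteration far earlier than the round at which it stopped. The sketch below reproduces the rule in plain Python on a made-up score sequence. It is a simplified illustration, not LightGBM's actual implementation, and `find_early_stop` is a hypothetical name.

```python
def find_early_stop(valid_scores, stopping_rounds):
    """Return (best_round, stop_round), both 1-based.

    Higher scores are better (as with AUC). Training stops at the first
    round where the best score is `stopping_rounds` rounds old; if that
    never happens, stop_round is the last round.
    """
    best_round, best_score = 0, float("-inf")
    for rnd, score in enumerate(valid_scores, start=1):
        if score > best_score:
            best_round, best_score = rnd, score
        elif rnd - best_round >= stopping_rounds:
            return best_round, rnd
    return best_round, len(valid_scores)


# Made-up validation AUC per round: peaks at round 3, then never recovers.
scores = [0.80, 0.82, 0.84, 0.83, 0.835, 0.83, 0.82]
print(find_early_stop(scores, stopping_rounds=3))
```

With a patience of 3 the best round is 3 and training stops at round 6. A larger patience only delays the stop; it does not change which round is best.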
2024-08-05 04:40:27.691 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_type, val=lightGBM 2024-08-05 04:40:27.693 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_id, val=model_only_grass2 2024-08-05 04:40:27.694 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_dir, val=e:\dev_um_ai\dev-um-ai\models\model_only_grass2 2024-08-05 04:40:27.695 | INFO | src.model_manager.base_manager:__init__:43 - make directory. path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2 2024-08-05 04:40:27.699 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_analyze_dir, val=e:\dev_um_ai\dev-um-ai\models\model_only_grass2\analyze 2024-08-05 04:40:27.699 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_predict_dir, val=e:\dev_um_ai\dev-um-ai\models\model_only_grass2\analyze\00_predict 2024-08-05 04:40:27.701 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_column, val=pred_prob 2024-08-05 04:40:27.702 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_rank_column, val=pred_rank 2024-08-05 04:40:27.705 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:77 - Set Feature columns. ['distance', 'number', 'boxNum', 'odds', 'favorite', 'age', 'jweight', 'weight', 'gl', 'race_span', 'raceGrade', 'place_en', 'field_en', 'sex_en', 'condition_en', 'jockeyId_en', 'teacherId_en', 'dist_cat_en', 'horseId_en'] 2024-08-05 04:40:27.706 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:79 - Set Objective columns. 
label_in1 2024-08-05 04:40:27.707 | INFO | src.model_manager.lgbm_manager:save_root_model_info:281 - Save model params and dataset columns 2024-08-05 04:40:27.718 | INFO | src.model_manager.lgbm_manager:train_all:262 - Training Start! 2024-08-05 04:40:27.719 | INFO | src.model_manager.lgbm_manager:train_all:263 - ================== train params ======================== 2024-08-05 04:40:27.720 | INFO | src.model_manager.lgbm_manager:train_all:266 - boosting_type = gbdt 2024-08-05 04:40:27.722 | INFO | src.model_manager.lgbm_manager:train_all:266 - objective = binary 2024-08-05 04:40:27.723 | INFO | src.model_manager.lgbm_manager:train_all:266 - metric = auc 2024-08-05 04:40:27.724 | INFO | src.model_manager.lgbm_manager:train_all:266 - verbose = 0 2024-08-05 04:40:27.725 | INFO | src.model_manager.lgbm_manager:train_all:266 - seed = 77777 2024-08-05 04:40:27.726 | INFO | src.model_manager.lgbm_manager:train_all:266 - learning_rate = 0.01 2024-08-05 04:40:27.727 | INFO | src.model_manager.lgbm_manager:train_all:266 - n_estimators = 10000 2024-08-05 04:40:27.729 | INFO | src.model_manager.lgbm_manager:train_all:267 - ========================================================== 2024-08-05 04:40:27.730 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2019first 2024-08-05 04:40:27.732 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:40:27.846 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:40:27.848 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-08-05 04:40:27.974 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:40:33.296 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.892565 valid_1's auc: 0.843136 2024-08-05 04:40:42.059 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.924031 valid_1's auc: 0.84129 2024-08-05 04:40:42.744 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [18] training's auc: 0.853563 valid_1's auc: 0.844379 2024-08-05 04:40:42.839 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\params\2019first\model.params 2024-08-05 04:40:43.820 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2019second 2024-08-05 04:40:43.821 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:40:43.937 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:40:43.939 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:40:44.073 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:40:49.800 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.8896 valid_1's auc: 0.809824 2024-08-05 04:40:59.331 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.919221 valid_1's auc: 0.807434 2024-08-05 04:41:01.670 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [57] training's auc: 0.861228 valid_1's auc: 0.813443 2024-08-05 04:41:01.814 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... 
model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\params\2019second\model.params 2024-08-05 04:41:02.789 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2020first 2024-08-05 04:41:02.790 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:41:02.906 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:41:02.908 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:41:03.055 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:41:09.359 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.886358 valid_1's auc: 0.833312 2024-08-05 04:41:20.143 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.915381 valid_1's auc: 0.829913 2024-08-05 04:41:22.641 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [55] training's auc: 0.858773 valid_1's auc: 0.834047 2024-08-05 04:41:22.792 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\params\2020first\model.params 2024-08-05 04:41:23.761 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2020second 2024-08-05 04:41:23.762 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:41:23.889 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 
2024-08-05 04:41:23.890 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:41:24.028 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:41:30.955 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.884546 valid_1's auc: 0.807105 2024-08-05 04:41:43.111 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.912097 valid_1's auc: 0.803467 2024-08-05 04:41:48.789 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [110] training's auc: 0.865402 valid_1's auc: 0.808919 2024-08-05 04:41:49.078 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\params\2020second\model.params 2024-08-05 04:41:50.119 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2021first 2024-08-05 04:41:50.120 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:41:50.232 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:41:50.233 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-08-05 04:41:50.397 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:41:57.801 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.881131 valid_1's auc: 0.821904 2024-08-05 04:42:10.714 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.909083 valid_1's auc: 0.819465 2024-08-05 04:42:11.873 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [20] training's auc: 0.849044 valid_1's auc: 0.823702 2024-08-05 04:42:12.028 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\params\2021first\model.params 2024-08-05 04:42:12.986 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2021second 2024-08-05 04:42:12.988 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:42:13.158 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:42:13.160 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:42:13.323 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:42:21.193 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.87889 valid_1's auc: 0.824324 2024-08-05 04:42:35.249 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.906096 valid_1's auc: 0.822413 2024-08-05 04:42:36.818 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [25] training's auc: 0.849061 valid_1's auc: 0.825301 2024-08-05 04:42:36.915 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... 
model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\params\2021second\model.params 2024-08-05 04:42:37.873 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2022first 2024-08-05 04:42:37.875 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:42:38.017 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:42:38.019 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:42:38.184 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:42:46.663 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.87727 valid_1's auc: 0.830538 2024-08-05 04:43:02.141 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.903892 valid_1's auc: 0.82801 2024-08-05 04:43:06.585 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [67] training's auc: 0.854664 valid_1's auc: 0.832685 2024-08-05 04:43:06.763 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\params\2022first\model.params 2024-08-05 04:43:07.747 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2022second 2024-08-05 04:43:07.748 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:43:07.912 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 
2024-08-05 04:43:07.913 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:43:08.090 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:43:17.035 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.875465 valid_1's auc: 0.831728 2024-08-05 04:43:33.706 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.90205 valid_1's auc: 0.829298 2024-08-05 04:43:34.023 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [4] training's auc: 0.840408 valid_1's auc: 0.833176 2024-08-05 04:43:34.137 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\params\2022second\model.params 2024-08-05 04:43:35.088 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2023first 2024-08-05 04:43:35.090 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:43:35.295 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:43:35.297 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-08-05 04:43:35.480 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:43:44.918 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.874735 valid_1's auc: 0.827236 2024-08-05 04:44:02.301 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.900642 valid_1's auc: 0.824765 2024-08-05 04:44:07.800 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [72] training's auc: 0.853329 valid_1's auc: 0.828611 2024-08-05 04:44:08.021 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\params\2023first\model.params 2024-08-05 04:44:09.018 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2023second 2024-08-05 04:44:09.019 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:44:09.179 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:44:09.181 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:44:09.369 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:44:19.292 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.872935 valid_1's auc: 0.825762 2024-08-05 04:44:37.925 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.899011 valid_1's auc: 0.823072 2024-08-05 04:44:40.339 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [29] training's auc: 0.846909 valid_1's auc: 0.828874 2024-08-05 04:44:40.442 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... 
model path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\params\2023second\model.params
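In every run above, the reported best iteration is small (for example [18] for 2019first), yet the log still prints the [250] and [500] lines before stopping. The following is a minimal sketch of patience-based early stopping in plain Python, written to illustrate that behaviour. It is not LightGBM's actual implementation: the function name `simulate_early_stopping` and the synthetic score curve are assumptions made for illustration only.

```python
def simulate_early_stopping(valid_scores, stopping_rounds):
    """Simulate patience-based early stopping on a list of validation scores.

    Returns (best_iteration, stopped_at), both 1-indexed.
    stopped_at is None when the patience limit is never reached.
    Higher scores are treated as better (as with AUC).
    """
    best_score = float("-inf")
    best_iter = 0
    for i, score in enumerate(valid_scores, start=1):
        if score > best_score:
            # New best validation score: remember where it happened.
            best_score = score
            best_iter = i
        elif i - best_iter >= stopping_rounds:
            # No improvement for `stopping_rounds` iterations: stop here.
            return best_iter, i
    return best_iter, None


# Synthetic curve (hypothetical values): the score improves for 18
# iterations, then stays below the best value for the rest of training.
scores = [0.80 + 0.001 * i for i in range(1, 19)] + [0.81] * 600
best_iter, stopped_at = simulate_early_stopping(scores, stopping_rounds=500)
print(best_iter, stopped_at)
```

Under this simplified rule, a best iteration of 18 with a patience of 500 means training continues until roughly iteration 518. That would be consistent with the log above, where the early-stopping message for 2019first appears shortly after the [500] line.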
Running inference¶
# Load the trained model for each period and run inference on its dataset
for dataset_dict in dataset_grass.values():
    lgbm_model_manager.load_model(dataset_dict.name)
    lgbm_model_manager.predict(dataset_dict)

# Evaluate with win bets (BetName.tan)
bet_mode = BetName.tan
bet_column = lgbm_model_manager.get_bet_column(bet_mode=bet_mode)
pl_column = lgbm_model_manager.get_profit_loss_column(bet_mode=bet_mode)
for dataset_dict in dataset_grass.values():
    lgbm_model_manager.set_bet_column(dataset_dict, bet_mode)

# Merge the per-period results into single DataFrames
_, dfbetva_grass, dfbette_grass = lgbm_model_manager.merge_dataframe_data(
    dataset_grass, mode=True)
2024-08-05 04:44:42.359 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2019first 2024-08-05 04:44:42.432 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2019first 2024-08-05 04:44:42.939 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2019first 2024-08-05 04:44:42.973 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\analyze\00_predict\2019first 2024-08-05 04:44:44.782 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2019second 2024-08-05 04:44:44.860 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2019second 2024-08-05 04:44:45.437 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2019second 2024-08-05 04:44:45.473 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\analyze\00_predict\2019second 2024-08-05 04:44:47.360 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2020first 2024-08-05 04:44:47.430 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2020first 2024-08-05 04:44:48.026 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2020first 2024-08-05 04:44:48.065 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\analyze\00_predict\2020first 2024-08-05 04:44:50.056 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... 
model name: 2020second 2024-08-05 04:44:50.146 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2020second 2024-08-05 04:44:50.951 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2020second 2024-08-05 04:44:50.993 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\analyze\00_predict\2020second 2024-08-05 04:44:52.983 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2021first 2024-08-05 04:44:53.044 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2021first 2024-08-05 04:44:53.686 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2021first 2024-08-05 04:44:53.732 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\analyze\00_predict\2021first 2024-08-05 04:44:55.831 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2021second 2024-08-05 04:44:55.904 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2021second 2024-08-05 04:44:56.588 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2021second 2024-08-05 04:44:56.636 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\analyze\00_predict\2021second 2024-08-05 04:44:58.825 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2022first 2024-08-05 04:44:58.901 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! 
model_name: 2022first 2024-08-05 04:44:59.676 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2022first 2024-08-05 04:44:59.725 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\analyze\00_predict\2022first 2024-08-05 04:45:01.905 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2022second 2024-08-05 04:45:01.954 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2022second 2024-08-05 04:45:02.686 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2022second 2024-08-05 04:45:02.739 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\analyze\00_predict\2022second 2024-08-05 04:45:05.062 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2023first 2024-08-05 04:45:05.141 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2023first 2024-08-05 04:45:06.069 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2023first 2024-08-05 04:45:06.125 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\analyze\00_predict\2023first 2024-08-05 04:45:08.534 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2023second 2024-08-05 04:45:08.592 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2023second 2024-08-05 04:45:09.452 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. 
model_name: 2023second 2024-08-05 04:45:09.512 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_grass2\analyze\00_predict\2023second 2024-08-05 04:45:11.028 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=bet_columns_map, val={'tan': 'bet_tan'} 2024-08-05 04:45:11.029 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=pl_column_map, val={'tan': 'pl_tan'}
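The log above shows that the manager exports a confidence column (`pred_prob`), a rank column (`pred_rank`), a bet column (`bet_tan`), and a profit-and-loss column (`pl_tan`). How `set_bet_column` fills them is not shown in this article, so the following is only a hypothetical sketch of one plausible rule: bet a fixed stake on the top-N horses by predicted probability in each race, and settle the bet with the win odds. The function name `add_bet_and_pl`, the `raceId` key, the 100-yen stake, and the rule itself are all assumptions, and `label_in1` is assumed to be 1 for the winning horse. It uses plain Python dictionaries rather than the project's DataFrame so that it runs without extra dependencies.

```python
from collections import defaultdict


def add_bet_and_pl(rows, top_n=1, stake=100):
    """Add pred_rank, bet_tan and pl_tan to each row (hypothetical rule).

    rows: list of dicts with raceId, pred_prob, odds and label_in1.
    The rows are modified in place and also returned.
    """
    by_race = defaultdict(list)
    for row in rows:
        by_race[row["raceId"]].append(row)

    for race_rows in by_race.values():
        # Rank horses within a race by predicted probability (1 = highest).
        ranked = sorted(race_rows, key=lambda r: r["pred_prob"], reverse=True)
        for rank, row in enumerate(ranked, start=1):
            row["pred_rank"] = rank
            row["bet_tan"] = 1 if rank <= top_n else 0
            if row["bet_tan"]:
                # Assumed settlement: payout = odds * stake if the horse won.
                payout = row["odds"] * stake if row["label_in1"] == 1 else 0
                row["pl_tan"] = payout - stake
            else:
                row["pl_tan"] = 0
    return rows


# Made-up example data: two races with two horses each.
rows = [
    {"raceId": "A", "pred_prob": 0.6, "odds": 2.5, "label_in1": 1},
    {"raceId": "A", "pred_prob": 0.3, "odds": 4.0, "label_in1": 0},
    {"raceId": "B", "pred_prob": 0.5, "odds": 3.0, "label_in1": 0},
    {"raceId": "B", "pred_prob": 0.4, "odds": 5.0, "label_in1": 1},
]
add_bet_and_pl(rows, top_n=1)
total_pl = sum(r["pl_tan"] for r in rows)
print(total_pl)
```

With `top_n=1` this mirrors the `topN = 1` setting used for the model managers in this article, but the actual betting and settlement logic in `LightGBMModelManager` may differ.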
Building the dirt-only model¶
Running training¶
lgbm_model_manager_dirt = LightGBMModelManager(
    # Create the folder for the dirt-only model
    root_dir / "models" / "model_only_dirt2",
    split_year,
    target_year,
    end_year
)
lgbm_model_manager_dirt.set_feature_and_objective_columns(
    feature_columns, objective_column)
lgbm_model_manager_dirt.topN = 1
lgbm_model_manager_dirt.train_all(
    params,
    dataset_dirt,
    stopping_rounds=500,  # Stop once the validation score has not improved for this many rounds
    val_num=250  # Logging interval (in iterations)
)
2024-08-05 04:45:11.415 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_type, val=lightGBM 2024-08-05 04:45:11.417 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_id, val=model_only_dirt2 2024-08-05 04:45:11.418 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_dir, val=e:\dev_um_ai\dev-um-ai\models\model_only_dirt2 2024-08-05 04:45:11.419 | INFO | src.model_manager.base_manager:__init__:43 - make directory. path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2 2024-08-05 04:45:11.424 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_analyze_dir, val=e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\analyze 2024-08-05 04:45:11.425 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_predict_dir, val=e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\analyze\00_predict 2024-08-05 04:45:11.426 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_column, val=pred_prob 2024-08-05 04:45:11.427 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_rank_column, val=pred_rank 2024-08-05 04:45:11.431 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:77 - Set Feature columns. ['distance', 'number', 'boxNum', 'odds', 'favorite', 'age', 'jweight', 'weight', 'gl', 'race_span', 'raceGrade', 'place_en', 'field_en', 'sex_en', 'condition_en', 'jockeyId_en', 'teacherId_en', 'dist_cat_en', 'horseId_en'] 2024-08-05 04:45:11.432 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:79 - Set Objective columns. 
label_in1 2024-08-05 04:45:11.433 | INFO | src.model_manager.lgbm_manager:save_root_model_info:281 - Save model params and dataset columns 2024-08-05 04:45:11.440 | INFO | src.model_manager.lgbm_manager:train_all:262 - Training Start! 2024-08-05 04:45:11.441 | INFO | src.model_manager.lgbm_manager:train_all:263 - ================== train params ======================== 2024-08-05 04:45:11.442 | INFO | src.model_manager.lgbm_manager:train_all:266 - boosting_type = gbdt 2024-08-05 04:45:11.443 | INFO | src.model_manager.lgbm_manager:train_all:266 - objective = binary 2024-08-05 04:45:11.444 | INFO | src.model_manager.lgbm_manager:train_all:266 - metric = auc 2024-08-05 04:45:11.445 | INFO | src.model_manager.lgbm_manager:train_all:266 - verbose = 0 2024-08-05 04:45:11.446 | INFO | src.model_manager.lgbm_manager:train_all:266 - seed = 77777 2024-08-05 04:45:11.447 | INFO | src.model_manager.lgbm_manager:train_all:266 - learning_rate = 0.01 2024-08-05 04:45:11.449 | INFO | src.model_manager.lgbm_manager:train_all:266 - n_estimators = 10000 2024-08-05 04:45:11.450 | INFO | src.model_manager.lgbm_manager:train_all:267 - ========================================================== 2024-08-05 04:45:11.451 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2019first 2024-08-05 04:45:11.452 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:45:11.547 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:45:11.549 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-08-05 04:45:11.674 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:45:16.833 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.892287 valid_1's auc: 0.827398 2024-08-05 04:45:26.096 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.924317 valid_1's auc: 0.824905 2024-08-05 04:45:26.932 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [22] training's auc: 0.853636 valid_1's auc: 0.8305 2024-08-05 04:45:27.021 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\params\2019first\model.params 2024-08-05 04:45:27.977 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2019second 2024-08-05 04:45:27.978 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:45:28.121 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:45:28.122 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:45:28.258 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:45:34.009 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.889208 valid_1's auc: 0.820366 2024-08-05 04:45:44.360 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.920663 valid_1's auc: 0.818507 2024-08-05 04:45:45.422 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [25] training's auc: 0.854107 valid_1's auc: 0.823117 2024-08-05 04:45:45.546 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... 
model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\params\2019second\model.params 2024-08-05 04:45:46.523 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2020first 2024-08-05 04:45:46.525 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:45:46.691 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:45:46.693 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:45:46.834 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:45:53.226 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.884969 valid_1's auc: 0.825236 2024-08-05 04:46:04.998 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.915289 valid_1's auc: 0.822479 2024-08-05 04:46:06.446 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [30] training's auc: 0.852515 valid_1's auc: 0.827618 2024-08-05 04:46:06.542 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\params\2020first\model.params 2024-08-05 04:46:07.500 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2020second 2024-08-05 04:46:07.501 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:46:07.629 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 
2024-08-05 04:46:07.631 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:46:07.778 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:46:14.617 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.883667 valid_1's auc: 0.832062 2024-08-05 04:46:27.364 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.912794 valid_1's auc: 0.830017 2024-08-05 04:46:29.057 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [31] training's auc: 0.852393 valid_1's auc: 0.834074 2024-08-05 04:46:29.176 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\params\2020second\model.params 2024-08-05 04:46:30.138 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2021first 2024-08-05 04:46:30.139 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:46:30.321 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:46:30.323 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-08-05 04:46:30.476 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:46:37.870 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.881116 valid_1's auc: 0.814129 2024-08-05 04:46:51.973 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.909153 valid_1's auc: 0.812765 2024-08-05 04:46:59.715 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [132] training's auc: 0.865583 valid_1's auc: 0.814922 2024-08-05 04:47:00.035 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\params\2021first\model.params 2024-08-05 04:47:01.105 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2021second 2024-08-05 04:47:01.107 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:47:01.239 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:47:01.241 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:47:01.402 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:47:09.257 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.879119 valid_1's auc: 0.840663 2024-08-05 04:47:24.520 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.906463 valid_1's auc: 0.837274 2024-08-05 04:47:27.283 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [43] training's auc: 0.852409 valid_1's auc: 0.842359 2024-08-05 04:47:27.383 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... 
model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\params\2021second\model.params 2024-08-05 04:47:28.353 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2022first 2024-08-05 04:47:28.354 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:47:28.524 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:47:28.525 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:47:28.693 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:47:37.070 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.876844 valid_1's auc: 0.816646 2024-08-05 04:47:53.754 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.903971 valid_1's auc: 0.814472 2024-08-05 04:47:56.347 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [35] training's auc: 0.850723 valid_1's auc: 0.8197 2024-08-05 04:47:56.482 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\params\2022first\model.params 2024-08-05 04:47:57.450 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2022second 2024-08-05 04:47:57.451 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:47:57.624 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 
2024-08-05 04:47:57.625 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:47:57.803 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:48:06.762 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.875403 valid_1's auc: 0.848327 2024-08-05 04:48:24.699 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.902316 valid_1's auc: 0.846846 2024-08-05 04:48:36.450 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [159] training's auc: 0.864476 valid_1's auc: 0.848732 2024-08-05 04:48:36.901 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\params\2022second\model.params 2024-08-05 04:48:38.005 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2023first 2024-08-05 04:48:38.006 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:48:38.182 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:48:38.184 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-08-05 04:48:38.367 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:48:47.998 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.874736 valid_1's auc: 0.849187 2024-08-05 04:49:07.635 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.900896 valid_1's auc: 0.847178 2024-08-05 04:49:17.504 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [115] training's auc: 0.859439 valid_1's auc: 0.849603 2024-08-05 04:49:17.794 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\params\2023first\model.params 2024-08-05 04:49:18.839 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2023second 2024-08-05 04:49:18.841 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-08-05 04:49:19.033 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-08-05 04:49:19.035 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-08-05 04:49:19.220 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-08-05 04:49:29.269 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.874521 valid_1's auc: 0.842834 2024-08-05 04:49:49.601 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.900463 valid_1's auc: 0.840593 2024-08-05 04:49:50.647 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [11] training's auc: 0.845526 valid_1's auc: 0.844109 2024-08-05 04:49:50.770 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... 
model path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\params\2023second\model.params
Run inference¶
# Load each period's dirt model and run inference on its dataset
for dataset_dict in dataset_dirt.values():
    lgbm_model_manager_dirt.load_model(dataset_dict.name)
    lgbm_model_manager_dirt.predict(dataset_dict)

bet_mode = BetName.tan
bet_column = lgbm_model_manager_dirt.get_bet_column(bet_mode=bet_mode)
pl_column = lgbm_model_manager_dirt.get_profit_loss_column(bet_mode=bet_mode)
# Add the bet column to every dataset
for dataset_dict in dataset_dirt.values():
    lgbm_model_manager_dirt.set_bet_column(dataset_dict, bet_mode)

# Keep only the validation and test DataFrames
_, dfbetva_dirt, dfbette_dirt = lgbm_model_manager_dirt.merge_dataframe_data(
    dataset_dirt, mode=True)
2024-08-05 04:49:52.687 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2019first 2024-08-05 04:49:52.756 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2019first 2024-08-05 04:49:53.605 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2019first 2024-08-05 04:49:53.667 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\analyze\00_predict\2019first 2024-08-05 04:49:56.429 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2019second 2024-08-05 04:49:56.510 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2019second 2024-08-05 04:49:57.515 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2019second 2024-08-05 04:49:57.578 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\analyze\00_predict\2019second 2024-08-05 04:50:00.346 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2020first 2024-08-05 04:50:00.420 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2020first 2024-08-05 04:50:01.517 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2020first 2024-08-05 04:50:01.583 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\analyze\00_predict\2020first 2024-08-05 04:50:04.372 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... 
model name: 2020second 2024-08-05 04:50:04.424 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2020second 2024-08-05 04:50:05.614 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2020second 2024-08-05 04:50:05.691 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\analyze\00_predict\2020second 2024-08-05 04:50:08.954 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2021first 2024-08-05 04:50:09.038 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2021first 2024-08-05 04:50:10.874 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2021first 2024-08-05 04:50:10.946 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\analyze\00_predict\2021first 2024-08-05 04:50:14.144 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2021second 2024-08-05 04:50:14.199 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2021second 2024-08-05 04:50:15.646 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2021second 2024-08-05 04:50:15.723 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\analyze\00_predict\2021second 2024-08-05 04:50:18.976 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2022first 2024-08-05 04:50:19.027 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! 
model_name: 2022first 2024-08-05 04:50:20.616 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2022first 2024-08-05 04:50:20.685 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\analyze\00_predict\2022first 2024-08-05 04:50:24.192 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2022second 2024-08-05 04:50:24.285 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2022second 2024-08-05 04:50:26.743 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2022second 2024-08-05 04:50:26.826 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\analyze\00_predict\2022second 2024-08-05 04:50:30.574 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2023first 2024-08-05 04:50:30.656 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2023first 2024-08-05 04:50:32.865 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2023first 2024-08-05 04:50:32.940 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\analyze\00_predict\2023first 2024-08-05 04:50:36.891 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2023second 2024-08-05 04:50:36.961 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2023second 2024-08-05 04:50:38.869 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. 
model_name: 2023second 2024-08-05 04:50:38.956 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: e:\dev_um_ai\dev-um-ai\models\model_only_dirt2\analyze\00_predict\2023second 2024-08-05 04:50:42.097 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=bet_columns_map, val={'tan': 'bet_tan'} 2024-08-05 04:50:42.098 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=pl_column_map, val={'tan': 'pl_tan'}
Export model information¶
This is the laborious part: the turf and dirt results have to be merged so that the web app can import them.
lgbm_model_manager_merge = LightGBMModelManager(
    # Create a model folder for the merged results
    root_dir / "models" / "model_field_div_part2",
    split_year,
    target_year,
    end_year
)
lgbm_model_manager_merge.set_feature_and_objective_columns(
    feature_columns, objective_column)
lgbm_model_manager_merge.topN = 1
# Set by hand the export information that training would normally fill in
lgbm_model_manager_merge.expt_info_map.bet_columns_map = {"tan": "bet_tan"}
lgbm_model_manager_merge.expt_info_map.pl_column_map = {"tan": "pl_tan"}
lgbm_model_manager_merge.expt_info_map.confidence_column = "pred_prob"
lgbm_model_manager_merge.expt_info_map.confidence_rank_column = "pred_rank"
2024-08-05 04:50:42.721 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_type, val=lightGBM 2024-08-05 04:50:42.722 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_id, val=model_grass_dirt_part2 2024-08-05 04:50:42.724 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_dir, val=e:\dev_um_ai\dev-um-ai\models\model_grass_dirt_part2 2024-08-05 04:50:42.725 | INFO | src.model_manager.base_manager:__init__:43 - make directory. path: e:\dev_um_ai\dev-um-ai\models\model_grass_dirt_part2 2024-08-05 04:50:42.728 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_analyze_dir, val=e:\dev_um_ai\dev-um-ai\models\model_grass_dirt_part2\analyze 2024-08-05 04:50:42.729 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_predict_dir, val=e:\dev_um_ai\dev-um-ai\models\model_grass_dirt_part2\analyze\00_predict 2024-08-05 04:50:42.731 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_column, val=pred_prob 2024-08-05 04:50:42.732 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_rank_column, val=pred_rank 2024-08-05 04:50:42.734 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:77 - Set Feature columns. ['distance', 'number', 'boxNum', 'odds', 'favorite', 'age', 'jweight', 'weight', 'gl', 'race_span', 'raceGrade', 'place_en', 'field_en', 'sex_en', 'condition_en', 'jockeyId_en', 'teacherId_en', 'dist_cat_en', 'horseId_en'] 2024-08-05 04:50:42.735 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:79 - Set Objective columns. label_in1
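Calculate profit and loss
import pandas as pd  # not needed if pandas was already imported earlier

# Stack the dirt and turf bet results and restore chronological race order
sort_keys = ["raceDate", "raceId"]
dfbetva = pd.concat(
    [dfbetva_dirt, dfbetva_grass]
).sort_values(sort_keys, ignore_index=True)
dfbette = pd.concat(
    [dfbette_dirt, dfbette_grass]
).sort_values(sort_keys, ignore_index=True)
dfbetva, dfbette = lgbm_model_manager_merge.generate_profit_loss(
    dfbetva, dfbette, bet_mode)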
2024-08-05 04:50:43.316 | INFO | src.model_manager.base_manager:__save_profit_loss:646 - Save profit loss data. save_path: e:\dev_um_ai\dev-um-ai\models\model_grass_dirt_part2\analyze\tan\profit_loss 2024-08-05 04:50:43.318 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=profit_loss_dir, val={'tan': 'e:\\dev_um_ai\\dev-um-ai\\models\\model_grass_dirt_part2\\analyze\\tan\\profit_loss'}
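The merge used for the profit and loss step boils down to stacking the per-surface DataFrames and sorting them back into race order. Below is a minimal sketch of that pattern on toy data. The frames, their values, and the helper name `merge_by_race` are hypothetical stand-ins, not part of the project's source; only the `raceDate`/`raceId` sort keys come from the notebook.

```python
import pandas as pd

# Hypothetical toy frames standing in for the dirt and turf bet results.
df_dirt = pd.DataFrame({
    "raceDate": ["2019-01-06", "2019-01-05"],
    "raceId": ["201906010102", "201906010101"],
    "bet_tan": [3, 1],
})
df_turf = pd.DataFrame({
    "raceDate": ["2019-01-05", "2019-01-06"],
    "raceId": ["201906010103", "201906010101"],
    "bet_tan": [7, 2],
})


def merge_by_race(*frames: pd.DataFrame) -> pd.DataFrame:
    """Stack per-surface frames and sort them by race date, then race ID."""
    return pd.concat(frames).sort_values(
        ["raceDate", "raceId"], ignore_index=True)


merged = merge_by_race(df_dirt, df_turf)
print(merged)
```

`ignore_index=True` matters here: without it the concatenated frame keeps the duplicate row labels of its inputs, which can make later label-based lookups ambiguous.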
Compute basic statistics
dataset_merge = {}
# Pair each turf dataset with its dirt counterpart
# (assumes both dicts list the periods in the same order)
for (key, dset_g), dset_d in zip(
        dataset_grass.items(), dataset_dirt.values()):
    dset_m = LightGBMDataset(key, None, None, None)
    # Stack turf and dirt predictions, then restore race order
    dset_m.pred_train = pd.concat(
        [dset_g.pred_train, dset_d.pred_train]).sort_values(
        ["raceDate", "raceId"], ignore_index=True)
    dset_m.pred_test = pd.concat(
        [dset_g.pred_test, dset_d.pred_test]).sort_values(
        ["raceDate", "raceId"], ignore_index=True)
    dset_m.pred_valid = pd.concat(
        [dset_g.pred_valid, dset_d.pred_valid]).sort_values(
        ["raceDate", "raceId"], ignore_index=True)
    dataset_merge[key] = dset_m
lgbm_model_manager_merge.model_name_list = lgbm_model_manager.model_name_list
lgbm_model_manager_merge.basic_analyze(dataset_merge)
2024-08-05 04:50:45.275 | INFO | src.model_manager.base_manager:basic_analyze:220 - Start basic analyze. 2024-08-05 04:50:45.716 | INFO | src.model_manager.base_manager:basic_analyze:256 - Saving Return And Hit Rate Summary. 2024-08-05 04:50:45.727 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=return_hit_rate_file, val={'tan': 'e:\\dev_um_ai\\dev-um-ai\\models\\model_grass_dirt_part2\\analyze\\tan\\hit_and_return_rate.csv'} 2024-08-05 04:50:45.728 | INFO | src.model_manager.base_manager:basic_analyze:259 - Saving Favorite Bet Num Summary. 2024-08-05 04:50:45.756 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=fav_bet_num_dir, val={'tan': 'e:\\dev_um_ai\\dev-um-ai\\models\\model_grass_dirt_part2\\analyze\\tan\\fav_bet_num'}
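The loop above pairs the turf and dirt datasets by their position in the two dictionaries. As a rough sketch of a more defensive alternative, the datasets can be paired by key so that a missing or extra period fails loudly. `ToyDataset` and `merge_datasets` are hypothetical names that mimic only the fields used here; they are not the project's `LightGBMDataset` API.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class ToyDataset:
    """Hypothetical stand-in for LightGBMDataset (only the fields used here)."""
    name: str
    pred_test: pd.DataFrame


def merge_datasets(turf: dict, dirt: dict) -> dict:
    """Merge turf and dirt datasets period by period, matching on dict keys."""
    if turf.keys() != dirt.keys():
        raise KeyError(f"period mismatch: {sorted(turf.keys() ^ dirt.keys())}")
    merged = {}
    for key, dset_t in turf.items():
        dset_d = dirt[key]  # look up by key instead of relying on dict order
        merged[key] = ToyDataset(
            key,
            pd.concat([dset_t.pred_test, dset_d.pred_test]).sort_values(
                ["raceDate", "raceId"], ignore_index=True),
        )
    return merged


def _frame(race_id: str) -> pd.DataFrame:
    """Build a one-row toy prediction frame."""
    return pd.DataFrame({"raceDate": ["2019-01-05"], "raceId": [race_id]})


turf = {"2019first": ToyDataset("2019first", _frame("B"))}
dirt = {"2019first": ToyDataset("2019first", _frame("A"))}
merged = merge_datasets(turf, dirt)
print(merged["2019first"].pred_test)
```

With this shape, a period that exists for only one surface raises a `KeyError` instead of being silently paired with the wrong period or dropped.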
Create the odds graph
# Merge the per-period data for turf and for dirt
dft_g, dfv_g, dfte_g = lgbm_model_manager.merge_dataframe_data(
    dataset_grass,
    mode=True
)
dft_d, dfv_d, dfte_d = lgbm_model_manager_dirt.merge_dataframe_data(
    dataset_dirt,
    mode=True
)
# Combine turf and dirt for the training, validation, and test sets
dft_m = pd.concat([dft_g, dft_d])
dfv_m = pd.concat([dfv_g, dfv_d])
dfte_m = pd.concat([dfte_g, dfte_d])
summary_dict = lgbm_model_manager_merge.gegnerate_odds_graph(
    dft_m, dfv_m, dfte_m, bet_mode)
2024-08-05 04:50:47.856 | INFO | src.model_manager.base_manager:__save_odds_graph:514 - Save Odds Graph. save_path: e:\dev_um_ai\dev-um-ai\models\model_grass_dirt_part2\analyze\tan\odds_graph 2024-08-05 04:50:47.856 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=odds_graph_file, val={'tan': 'e:\\dev_um_ai\\dev-um-ai\\models\\model_grass_dirt_part2\\analyze\\tan\\odds_graph'}
Export the model information
lgbm_model_manager_merge.export_model_info()
2024-08-05 04:50:47.880 | INFO | src.model_manager.base_manager:export_model_info:848 - Export Model info json. export path: e:\dev_um_ai\dev-um-ai\models\model_grass_dirt_part2\model_info.json
4-5-3. Checking performance (launching the web app)¶
! python ../app_keiba/manage.py makemigrations
! python ../app_keiba/manage.py migrate
! echo server launch OK
# ! python ../app_keiba/manage.py runserver 12345
No changes detected Operations to perform: Apply all migrations: admin, auth, contenttypes, model_analyzer, sessions Running migrations: No migrations to apply. server launch OK
Once "server launch OK" is displayed, click the link below to open the web app.
To stop the server, press the cell's interrupt button.
 | Model ID | Support-rate OGS | Return-rate OGS | AonB OGS |
---|---|---|---|---|
1 | model_field_div_part2 | 0.56458 | -7.49332 | 0.15654 |
2 | model_field_div_part1 | 0.38790 | -7.67658 | -0.02672 |
3 | first_model (baseline) | 0.41924 | -7.64492 | |
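4-6. Conclusion¶
Conclusion: it is probably worth training separate models for turf and dirt.
Judging by AonB OGS, the model built in Part 2, i.e. the one trained completely separately for turf and dirt, performs best.
Another notable point is that the model built in Part 1, i.e. the one where only the validation and test data were split into turf and dirt, performs worse than the first model.
This suggests that, when predicting dirt (or turf) races, the training data does not need information from turf (or dirt) races.
That said, previous-race data has not been taken into account yet, so it feels premature to draw a firm conclusion.
Frankly, training separate models for turf and dirt requires drastic changes to the existing source code, so I want to make this decision carefully.
However, the results show an improvement of 0.15654 in AonB OGS over the first model.
That is hard to ignore.
So, eventually, I will rework the source code so that models are trained separately for turf and dirt.
Model performance might improve even further with finer splits, such as by racecourse or by distance category.
When to make the changes needed for that kind of general-purpose model splitting, beyond just turf and dirt, will be decided again after building a model that takes pedigree and previous-race data into account.
At that point I will also improve this notebook a little so that it can aggregate results over various combinations: by surface, by racecourse, by distance category, by surface × racecourse, by surface × racecourse × distance category, and so on.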
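As a rough sketch of what that generalized aggregation could look like, a single pandas `groupby` helper can summarize results for any combination of split columns. The column names `field_en`, `place_en`, and `dist_cat_en` come from the feature list in the logs above. The `hit` column, the toy values, and the names `SPLITS` and `summarize` are assumptions made up for this illustration.

```python
import pandas as pd

# Toy data; "hit" is a hypothetical column (1 if the bet won, else 0).
df = pd.DataFrame({
    "field_en": [0, 0, 1, 1, 1],
    "place_en": [5, 6, 5, 5, 6],
    "dist_cat_en": [2, 2, 1, 2, 2],
    "hit": [1, 0, 0, 1, 1],
})

# Each entry is one way of splitting the data.
SPLITS = [
    ["field_en"],
    ["place_en"],
    ["dist_cat_en"],
    ["field_en", "place_en"],
    ["field_en", "place_en", "dist_cat_en"],
]


def summarize(frame: pd.DataFrame, keys: list) -> pd.DataFrame:
    """Count rows and compute the hit rate for every group defined by keys."""
    return (
        frame.groupby(keys)["hit"]
        .agg(n_rows="count", hit_rate="mean")
        .reset_index()
    )


summaries = {" x ".join(keys): summarize(df, keys) for keys in SPLITS}
for name, table in summaries.items():
    print(f"--- {name} ---")
    print(table)
```

Because the split is just a list of column names, adding a new combination only means appending one entry to `SPLITS`. The same list could later drive which subset of the data each model is trained on.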