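Introduction
This series provides sample code showing how to use the model management and analysis class explained in the videos.
For the list of assumptions for model creation that were decided in Roadmap 2, please refer to the following page.
Video list: the development process of the horse racing prediction software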
How the model is evaluated
The following page explains how the model is evaluated.
Refer to it if you want to understand the details.
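Source code of the model management and analysis class
The class was created so that the fixed sequence of steps needed when developing the planned horse racing prediction software (loading models, obtaining inference results, basic analysis, creating odds graphs, and checking profit and loss) is managed in one place. By creating an instance, the software can then easily refer to the model's analysis results.
I have already published two pieces of code: the code that scrapes horse racing data and the class that preprocesses the data. Following that precedent, the source of this model management and analysis class is published on Bookers as the third release.
Bookers link is here ↓
A list of all code published on Bookers so far is here ↓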
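Notebook with sample code for the model management and analysis class
The sample code below, which shows how to use the model management and analysis class created this time, was written as a Notebook.
It includes the actual execution results. Refer to it whenever you are unsure how to use the class.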
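Operation check of the model analysis and management tool, ver 0.2
Change history
ver 0.1 → 0.2
- Added the model export feature
0. Importing the required modules
Some of the sources used here are paid. If you would like to develop the horse racing prediction software under the same conditions, please consider purchasing them.
List of published sources: https://keiba-ds-lab.com/bookers-article-lists/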
import warnings
import sys
sys.path.append(".")
sys.path.append("..")
from src.data_manager.preprocess_tools import DataPreProcessor # noqa
from src.data_manager.data_loader import DataLoader # noqa
from src.core.meta.bet_name_meta import BetName
from src.model_manager.lgbm_manager import LightGBMModelManager
warnings.filterwarnings("ignore")
start_year = 2010  # oldest year held in the DB
split_year = 2014  # first year of the training period
target_year = 2018  # first year of the test period
end_year = 2023  # last year of the test period (the DB must contain data for this year)
# Create the instances
data_loader = DataLoader(
    start_year,
    end_year,
    dbpath="E:/keiba_dev/keiba_ai/data/keibadata.db"  # Set the path to suit your environment; an absolute path is recommended
)
dataPreP = DataPreProcessor()
lgbm_model_manager = LightGBMModelManager(
    "E:/keiba_dev/keiba_ai/models/first_model",  # Folder path for the model to create under the models directory; an absolute path is safer
    split_year,
    target_year,
    end_year
)
2024-04-17 23:22:49.893 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_type, val=lightGBM
2024-04-17 23:22:49.894 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_id, val=first_model
2024-04-17 23:22:49.895 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_dir, val=E:\keiba_dev\keiba_ai\models\first_model
2024-04-17 23:22:49.896 | INFO | src.model_manager.base_manager:__init__:43 - make directory. path: E:\keiba_dev\keiba_ai\models\first_model
2024-04-17 23:22:49.899 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_analyze_dir, val=E:\keiba_dev\keiba_ai\models\first_model\analyze
2024-04-17 23:22:49.900 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_predict_dir, val=E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict
2024-04-17 23:22:49.901 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_column, val=pred_prob
2024-04-17 23:22:49.901 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_rank_column, val=pred_rank
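The code comments above recommend absolute paths for both the database file and the model directory. The following is a minimal sketch of one way to derive them with the standard library. The helper name `build_paths` and the `data/` and `models/` layout under a project root are assumptions made for illustration and are not part of the published source.

```python
from pathlib import Path
from typing import Tuple


def build_paths(project_root: str, model_name: str) -> Tuple[Path, Path]:
    """Return absolute paths for the DB file and the model directory.

    Assumed layout: <root>/data/keibadata.db and <root>/models/<model_name>.
    Adjust it to your own environment.
    """
    root = Path(project_root).resolve()  # resolve() yields an absolute path
    db_path = root / "data" / "keibadata.db"
    model_dir = root / "models" / model_name
    return db_path, model_dir


db_path, model_dir = build_paths(".", "first_model")
print(db_path.is_absolute(), model_dir.name)
```

The resulting values could then be passed as `str(db_path)` and `str(model_dir)` wherever a path string is expected.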
1. Loading the horse racing data and running the preprocessing
The preprocessing module created in Roadmap 3-1 is used to process the horse racing data.
df = data_loader.load_racedata()
df = dataPreP.exec_pipeline(df)
2024-04-17 23:22:49.921 | INFO | src.data_manager.data_loader:load_racedata:23 - Get Year Range: 2010 -> 2023.
2024-04-17 23:22:49.922 | INFO | src.data_manager.data_loader:load_racedata:24 - Loading Race Info ...
2024-04-17 23:22:51.000 | INFO | src.data_manager.data_loader:load_racedata:26 - Loading Race Data ...
2024-04-17 23:23:01.749 | INFO | src.data_manager.data_loader:load_racedata:28 - Merging Race Info and Race Data ...
2024-04-17 23:23:03.193 | INFO | src.data_manager.preprocess_tools:__0_check_use_save_checkpoints:34 - Start PreProcess #0 ...
2024-04-17 23:23:03.195 | INFO | src.data_manager.preprocess_tools:__1_exec_all_sub_prep1:37 - Start PreProcess #1 ...
2024-04-17 23:23:07.309 | INFO | src.data_manager.preprocess_tools:__2_exec_all_sub_prep2:39 - Start PreProcess #2 ...
2024-04-17 23:23:11.643 | INFO | src.data_manager.preprocess_tools:__3_convert_type_str_to_number:41 - Start PreProcess #3 ...
2024-04-17 23:23:14.347 | INFO | src.data_manager.preprocess_tools:__4_drop_or_fillin_none_data:43 - Start PreProcess #4 ...
2024-04-17 23:23:16.548 | INFO | src.data_manager.preprocess_tools:__5_exec_all_sub_prep5:45 - Start PreProcess #5 ...
2024-04-17 23:23:30.601 | INFO | src.data_manager.preprocess_tools:__6_convert_label_to_rate_info:47 - Start PreProcess #6 ...
2024-04-17 23:23:41.948 | INFO | src.data_manager.preprocess_tools:__7_convert_distance_to_smile:49 - Start PreProcess #7 ...
2024-04-17 23:23:42.105 | INFO | src.data_manager.preprocess_tools:__8_category_encoding:51 - Start PreProcess #8 ...
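2. Setting the feature and objective columns
Use the lgbm_model_manager.set_feature_and_objective_columns method to register the feature columns and the objective column.
Use the lgbm_model_manager.add_objective_column_to_df method to add the objective column to the DataFrame.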
# Feature columns
feature_columns = [
    'distance',
    'number',
    'boxNum',
    'odds',
    'favorite',
    'age',
    'jweight',
    'weight',
    'gl',
    'race_span',
] + dataPreP.encoding_columns
# Objective column
objective_column = "label_in3"
lgbm_model_manager.set_feature_and_objective_columns(feature_columns, objective_column)
df = lgbm_model_manager.add_objective_column_to_df(df, "label", 3)
2024-04-17 23:23:45.414 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:77 - Set Feature columns. ['distance', 'number', 'boxNum', 'odds', 'favorite', 'age', 'jweight', 'weight', 'gl', 'race_span', 'place_en', 'field_en', 'sex_en', 'condition_en', 'jockeyId_en', 'teacherId_en', 'dist_cat_en', 'horseId_en']
2024-04-17 23:23:45.416 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:79 - Set Objective columns. label_in3
2024-04-17 23:23:45.417 | INFO | src.model_manager.lgbm_manager:add_objective_column_to_df:80 - make objective data. label_in3. topN: 3
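The log reports that the objective `label_in3` was built with `topN: 3`. The sketch below shows what such a top-N objective could look like. It assumes that `label` holds the finishing position and that the objective is 1 when the horse finishes within the top N. The helper `make_top_n_labels` is hypothetical and works on plain lists, whereas the actual class operates on a DataFrame.

```python
from typing import List


def make_top_n_labels(finish_positions: List[int], top_n: int = 3) -> List[int]:
    """Return 1 for horses that finished within top_n, otherwise 0."""
    return [1 if position <= top_n else 0 for position in finish_positions]


# Finishing positions of six horses in one hypothetical race
labels = make_top_n_labels([1, 5, 3, 2, 8, 4], top_n=3)
print(labels)  # [1, 0, 1, 1, 0, 0]
```

With pandas, the same idea would probably be written as `(df["label"] <= 3).astype(int)`.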
3. Creating the datasets
Use the lgbm_model_manager.make_dataset_mapping method to create the dataset mapping.
Use the lgbm_model_manager.setup_dataset method to create the datasets used to train the LightGBM models.
dataset_mapping = lgbm_model_manager.make_dataset_mapping(df)
dataset_mapping = lgbm_model_manager.setup_dataset(dataset_mapping)
2024-04-17 23:23:45.458 | INFO | src.data_manager.dataset_tools:make_dataset_mapping:103 - Generate dataset mapping. Year Range: 2018 -> 2023
2024-04-17 23:23:48.408 | INFO | src.model_manager.lgbm_manager:setup_dataset:110 - Create LightGBM Dataset.
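The mapping covers 2018 to 2023, and the training log in the next step names one model per half-year (`2018first`, `2018second`, ..., `2023second`). The sketch below generates such names together with date ranges. The function `half_year_periods` is hypothetical, and the split into January to June and July to December is an assumption, since the actual boundaries are defined inside the class.

```python
from datetime import date
from typing import Dict, Tuple


def half_year_periods(target_year: int, end_year: int) -> Dict[str, Tuple[date, date]]:
    """Map names such as '2018first' to (start, end) dates of each half-year."""
    periods = {}
    for year in range(target_year, end_year + 1):
        # Assumption: 'first' is January-June and 'second' is July-December
        periods[f"{year}first"] = (date(year, 1, 1), date(year, 6, 30))
        periods[f"{year}second"] = (date(year, 7, 1), date(year, 12, 31))
    return periods


periods = half_year_periods(2018, 2023)
print(len(periods))       # 12
print(list(periods)[:2])  # ['2018first', '2018second']
```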
4. Creating the models
Use the lgbm_model_manager.train_all method to train the models.
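# This part will also be turned into a class in the horse racing prediction software
# Model parameters for LightGBM
# The parameter values are chosen roughly; fine-tuning them too much becomes a quagmire
params = {
    'boosting_type': 'gbdt',
    # Binary classification
    'objective': 'binary',
    'metric': 'auc',
    'verbose': 0,
    'seed': 77777,
    'learning_rate': 0.01,
    "n_estimators": 10000,
    # "device_type": "gpu"
}
lgbm_model_manager.train_all(
    params,
    dataset_mapping,
    stopping_rounds=500,  # stop when the validation score has not improved for this many rounds
    val_num=250  # interval (in iterations) between evaluation log lines
)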
2024-04-17 23:23:48.694 | INFO | src.model_manager.lgbm_manager:save_root_model_info:281 - Save model params and dataset columns 2024-04-17 23:23:48.704 | INFO | src.model_manager.lgbm_manager:train_all:262 - Training Start! 2024-04-17 23:23:48.706 | INFO | src.model_manager.lgbm_manager:train_all:263 - ================== train params ======================== 2024-04-17 23:23:48.706 | INFO | src.model_manager.lgbm_manager:train_all:266 - boosting_type = gbdt 2024-04-17 23:23:48.707 | INFO | src.model_manager.lgbm_manager:train_all:266 - objective = binary 2024-04-17 23:23:48.708 | INFO | src.model_manager.lgbm_manager:train_all:266 - metric = auc 2024-04-17 23:23:48.709 | INFO | src.model_manager.lgbm_manager:train_all:266 - verbose = 0 2024-04-17 23:23:48.710 | INFO | src.model_manager.lgbm_manager:train_all:266 - seed = 77777 2024-04-17 23:23:48.711 | INFO | src.model_manager.lgbm_manager:train_all:266 - learning_rate = 0.01 2024-04-17 23:23:48.712 | INFO | src.model_manager.lgbm_manager:train_all:266 - n_estimators = 10000 2024-04-17 23:23:48.713 | INFO | src.model_manager.lgbm_manager:train_all:267 - ========================================================== 2024-04-17 23:23:48.716 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2018first 2024-04-17 23:23:48.723 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:23:48.947 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:23:48.950 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-04-17 23:23:49.134 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:23:57.728 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.836305 valid_1's auc: 0.814343 2024-04-17 23:24:13.730 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.853341 valid_1's auc: 0.81094 2024-04-17 23:24:20.778 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [98] training's auc: 0.824781 valid_1's auc: 0.815309 2024-04-17 23:24:20.868 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: E:\keiba_dev\keiba_ai\models\first_model\params\2018first\model.params 2024-04-17 23:24:21.637 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2018second 2024-04-17 23:24:21.640 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:24:21.818 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:24:21.820 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:24:22.004 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:24:31.316 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.835273 valid_1's auc: 0.82158 2024-04-17 23:24:49.114 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.851262 valid_1's auc: 0.818537 2024-04-17 23:24:53.950 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [60] training's auc: 0.821836 valid_1's auc: 0.822222 2024-04-17 23:24:54.050 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... 
model path: E:\keiba_dev\keiba_ai\models\first_model\params\2018second\model.params 2024-04-17 23:24:54.680 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2019first 2024-04-17 23:24:54.682 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:24:54.878 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:24:54.880 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:24:55.073 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:25:05.288 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.834499 valid_1's auc: 0.816296 2024-04-17 23:25:25.566 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.849992 valid_1's auc: 0.814961 2024-04-17 23:25:38.330 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [137] training's auc: 0.826974 valid_1's auc: 0.816743 2024-04-17 23:25:38.467 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: E:\keiba_dev\keiba_ai\models\first_model\params\2019first\model.params 2024-04-17 23:25:39.190 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2019second 2024-04-17 23:25:39.191 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:25:39.386 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 
2024-04-17 23:25:39.388 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:25:39.597 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:25:51.176 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.833594 valid_1's auc: 0.816957 2024-04-17 23:26:12.451 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.848205 valid_1's auc: 0.814346 2024-04-17 23:26:22.673 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [100] training's auc: 0.824327 valid_1's auc: 0.817385 2024-04-17 23:26:22.802 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: E:\keiba_dev\keiba_ai\models\first_model\params\2019second\model.params 2024-04-17 23:26:23.437 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2020first 2024-04-17 23:26:23.438 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:26:23.651 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:26:23.653 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-04-17 23:26:23.867 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:26:36.336 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.832645 valid_1's auc: 0.808625 2024-04-17 23:27:01.111 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.846424 valid_1's auc: 0.80635 2024-04-17 23:27:13.967 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [111] training's auc: 0.824523 valid_1's auc: 0.808954 2024-04-17 23:27:14.110 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: E:\keiba_dev\keiba_ai\models\first_model\params\2020first\model.params 2024-04-17 23:27:14.773 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2020second 2024-04-17 23:27:14.774 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:27:14.951 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:27:14.953 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:27:15.182 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:27:29.018 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.831374 valid_1's auc: 0.805466 2024-04-17 23:27:55.421 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.844969 valid_1's auc: 0.803805 2024-04-17 23:28:01.462 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [48] training's auc: 0.820561 valid_1's auc: 0.805655 2024-04-17 23:28:01.542 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... 
model path: E:\keiba_dev\keiba_ai\models\first_model\params\2020second\model.params 2024-04-17 23:28:02.150 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2021first 2024-04-17 23:28:02.152 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:28:02.371 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:28:02.373 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:28:02.600 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:28:17.373 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.8299 valid_1's auc: 0.806191 2024-04-17 23:28:45.665 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.843182 valid_1's auc: 0.804131 2024-04-17 23:29:10.152 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [185] training's auc: 0.8262 valid_1's auc: 0.806393 2024-04-17 23:29:10.470 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: E:\keiba_dev\keiba_ai\models\first_model\params\2021first\model.params 2024-04-17 23:29:11.192 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2021second 2024-04-17 23:29:11.195 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:29:11.396 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 
2024-04-17 23:29:11.399 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:29:11.633 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:29:27.232 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.828814 valid_1's auc: 0.816412 2024-04-17 23:29:58.057 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.841652 valid_1's auc: 0.814492 2024-04-17 23:30:32.958 | INFO | lightgbm.basic:_log_info:191 - [750] training's auc: 0.84921 valid_1's auc: 0.812417 2024-04-17 23:30:35.440 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [269] training's auc: 0.829938 valid_1's auc: 0.816484 2024-04-17 23:30:36.018 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: E:\keiba_dev\keiba_ai\models\first_model\params\2021second\model.params 2024-04-17 23:30:36.837 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2022first 2024-04-17 23:30:36.838 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:30:37.043 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:30:37.045 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-04-17 23:30:37.295 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:30:53.491 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.828343 valid_1's auc: 0.809561 2024-04-17 23:31:26.061 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.840578 valid_1's auc: 0.807362 2024-04-17 23:31:52.130 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [174] training's auc: 0.824298 valid_1's auc: 0.810015 2024-04-17 23:31:52.467 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: E:\keiba_dev\keiba_ai\models\first_model\params\2022first\model.params 2024-04-17 23:31:53.252 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2022second 2024-04-17 23:31:53.254 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:31:53.448 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:31:53.451 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:31:53.719 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:32:11.300 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.827511 valid_1's auc: 0.818719 2024-04-17 23:32:46.265 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.839455 valid_1's auc: 0.817531 2024-04-17 23:33:20.930 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [216] training's auc: 0.82566 valid_1's auc: 0.818836 2024-04-17 23:33:21.382 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... 
model path: E:\keiba_dev\keiba_ai\models\first_model\params\2022second\model.params 2024-04-17 23:33:22.156 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2023first 2024-04-17 23:33:22.158 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:33:22.389 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:33:22.391 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:33:22.675 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:33:41.153 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.827143 valid_1's auc: 0.815338 2024-04-17 23:34:18.014 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.838694 valid_1's auc: 0.814349 2024-04-17 23:34:45.463 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [159] training's auc: 0.822708 valid_1's auc: 0.815399 2024-04-17 23:34:45.756 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: E:\keiba_dev\keiba_ai\models\first_model\params\2023first\model.params 2024-04-17 23:34:46.461 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2023second 2024-04-17 23:34:46.462 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:34:46.663 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 
2024-04-17 23:34:46.664 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:34:46.933 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:35:06.078 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.826685 valid_1's auc: 0.821159 2024-04-17 23:35:44.858 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.837975 valid_1's auc: 0.819708 2024-04-17 23:36:16.322 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [174] training's auc: 0.823088 valid_1's auc: 0.821376 2024-04-17 23:36:16.679 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: E:\keiba_dev\keiba_ai\models\first_model\params\2023second\model.params
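Each model in the log above trains "until validation scores don't improve for 500 rounds" and then reports a best iteration. The sketch below illustrates that stopping rule in plain Python. It is an illustration of the general early-stopping idea, not LightGBM's internal implementation, and the function name `early_stopping_point` is hypothetical.

```python
from typing import List, Tuple


def early_stopping_point(valid_scores: List[float], stopping_rounds: int) -> Tuple[int, int]:
    """Return (best_iteration, stopped_at), both 1-based.

    Training stops once the validation score has not improved for
    `stopping_rounds` consecutive iterations (higher is better, as with AUC).
    """
    best_score = float("-inf")
    best_iteration = 0
    for iteration, score in enumerate(valid_scores, start=1):
        if score > best_score:
            best_score, best_iteration = score, iteration
        elif iteration - best_iteration >= stopping_rounds:
            return best_iteration, iteration
    # Ran out of iterations before the stopping condition was met
    return best_iteration, len(valid_scores)


scores = [0.80, 0.81, 0.815, 0.814, 0.813, 0.812, 0.811]
print(early_stopping_point(scores, stopping_rounds=3))  # (3, 6)
```

Under this rule, a best iteration of 98 with `stopping_rounds=500` means training continued to roughly iteration 598. That is consistent with the first model's log, which prints the evaluation at iteration 500 but stops before iteration 750.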
5. Running inference on the datasets
Use the lgbm_model_manager.load_model method to activate a trained model.
Use the lgbm_model_manager.predict method to run inference on the target dataset.
for dataset_dict in dataset_mapping.values():
    lgbm_model_manager.load_model(dataset_dict.name)
    lgbm_model_manager.predict(dataset_dict)
2024-04-17 23:36:18.100 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2018first 2024-04-17 23:36:18.135 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2018first 2024-04-17 23:36:19.055 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2018first 2024-04-17 23:36:19.093 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2018first 2024-04-17 23:36:21.437 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2018second 2024-04-17 23:36:21.472 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2018second 2024-04-17 23:36:22.351 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2018second 2024-04-17 23:36:22.395 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2018second 2024-04-17 23:36:24.982 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2019first 2024-04-17 23:36:25.026 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2019first 2024-04-17 23:36:26.331 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2019first 2024-04-17 23:36:26.372 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2019first 2024-04-17 23:36:29.085 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2019second 2024-04-17 23:36:29.133 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! 
model_name: 2019second 2024-04-17 23:36:30.349 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2019second 2024-04-17 23:36:30.398 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2019second 2024-04-17 23:36:32.978 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2020first 2024-04-17 23:36:33.020 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2020first 2024-04-17 23:36:34.431 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2020first 2024-04-17 23:36:34.482 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2020first 2024-04-17 23:36:37.437 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2020second 2024-04-17 23:36:37.470 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2020second 2024-04-17 23:36:38.864 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2020second 2024-04-17 23:36:38.922 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2020second 2024-04-17 23:36:42.055 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2021first 2024-04-17 23:36:42.106 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2021first 2024-04-17 23:36:44.452 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. 
model_name: 2021first 2024-04-17 23:36:44.502 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2021first 2024-04-17 23:36:47.721 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2021second 2024-04-17 23:36:47.819 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2021second 2024-04-17 23:36:51.284 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2021second 2024-04-17 23:36:51.343 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2021second 2024-04-17 23:36:54.805 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2022first 2024-04-17 23:36:54.859 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2022first 2024-04-17 23:36:57.312 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2022first 2024-04-17 23:36:57.369 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2022first 2024-04-17 23:37:01.066 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2022second 2024-04-17 23:37:01.152 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2022second 2024-04-17 23:37:04.259 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2022second 2024-04-17 23:37:04.322 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. 
save path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2022second 2024-04-17 23:37:08.185 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2023first 2024-04-17 23:37:08.229 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2023first 2024-04-17 23:37:11.051 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2023first 2024-04-17 23:37:11.108 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2023first 2024-04-17 23:37:15.158 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2023second 2024-04-17 23:37:15.217 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2023second 2024-04-17 23:37:18.077 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2023second 2024-04-17 23:37:18.139 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2023second
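The initialization log registers `pred_prob` as the confidence column and `pred_rank` as the confidence rank column. The sketch below shows how a within-race rank could be derived from predicted probabilities. It assumes that rank 1 means the highest `pred_prob` among the horses sharing a `raceId`. The helper `add_pred_rank` and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def add_pred_rank(rows: List[Dict]) -> List[Dict]:
    """Add 'pred_rank' (1 = highest pred_prob) within each raceId."""
    by_race = defaultdict(list)
    for row in rows:
        by_race[row["raceId"]].append(row)
    for race_rows in by_race.values():
        ordered = sorted(race_rows, key=lambda r: r["pred_prob"], reverse=True)
        for rank, row in enumerate(ordered, start=1):
            row["pred_rank"] = rank  # rows are updated in place
    return rows


rows = [
    {"raceId": "A", "number": 1, "pred_prob": 0.20},
    {"raceId": "A", "number": 2, "pred_prob": 0.55},
    {"raceId": "B", "number": 1, "pred_prob": 0.40},
    {"raceId": "A", "number": 3, "pred_prob": 0.35},
]
ranked = add_pred_rank(rows)
print([(r["raceId"], r["number"], r["pred_rank"]) for r in ranked])
# [('A', 1, 3), ('A', 2, 1), ('B', 1, 1), ('A', 3, 2)]
```

With pandas, the equivalent would likely be `df.groupby("raceId")["pred_prob"].rank(ascending=False, method="first")`.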
6. Loading the saved inference results
Use the lgbm_model_manager.load_predict_result method to set the saved inference results on the specified dataset.
for dataset_dict in dataset_mapping.values():
    lgbm_model_manager.load_predict_result(dataset_dict)
2024-04-17 23:37:21.756 | INFO | src.model_manager.base_manager:load_predict_result:442 - Loading predict data. data path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2018first
2024-04-17 23:37:22.252 | INFO | src.model_manager.base_manager:load_predict_result:442 - Loading predict data. data path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2018second
2024-04-17 23:37:22.741 | INFO | src.model_manager.base_manager:load_predict_result:442 - Loading predict data. data path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2019first
2024-04-17 23:37:23.299 | INFO | src.model_manager.base_manager:load_predict_result:442 - Loading predict data. data path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2019second
2024-04-17 23:37:23.832 | INFO | src.model_manager.base_manager:load_predict_result:442 - Loading predict data. data path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2020first
2024-04-17 23:37:24.460 | INFO | src.model_manager.base_manager:load_predict_result:442 - Loading predict data. data path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2020second
2024-04-17 23:37:25.095 | INFO | src.model_manager.base_manager:load_predict_result:442 - Loading predict data. data path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2021first
2024-04-17 23:37:25.795 | INFO | src.model_manager.base_manager:load_predict_result:442 - Loading predict data. data path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2021second
2024-04-17 23:37:26.644 | INFO | src.model_manager.base_manager:load_predict_result:442 - Loading predict data. data path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2022first
2024-04-17 23:37:27.537 | INFO | src.model_manager.base_manager:load_predict_result:442 - Loading predict data. data path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2022second
2024-04-17 23:37:28.418 | INFO | src.model_manager.base_manager:load_predict_result:442 - Loading predict data. data path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2023first
2024-04-17 23:37:29.386 | INFO | src.model_manager.base_manager:load_predict_result:442 - Loading predict data. data path: E:\keiba_dev\keiba_ai\models\first_model\analyze\00_predict\2023second
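The log above shows one prediction directory per dataset, named by test year and half (2018first through 2023second). As a minimal sketch of how such names could be enumerated, assuming every year is split into a "first" and a "second" half (the helper half_year_names is hypothetical and not part of the model manager class):

```python
def half_year_names(start_year: int, end_year: int) -> list[str]:
    """Return names like '2018first', '2018second', ... for each year in the range."""
    names = []
    for year in range(start_year, end_year + 1):
        for half in ("first", "second"):
            names.append(f"{year}{half}")
    return names


names = half_year_names(2018, 2023)
print(len(names))            # number of datasets in the range
print(names[0], names[-1])   # first and last dataset names
```

With target_year = 2018 and end_year = 2023 this yields twelve names, matching the twelve directories loaded in the log.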
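7. Creating the profit and loss graph
Use the lgbm_model_manager.set_bet_column method to add a bet flag to each specified dataset.
Use the lgbm_model_manager.merge_dataframe_data method to build the DataFrames to be analyzed.
Use the lgbm_model_manager.generate_profit_loss method to generate the source data for the profit and loss graph.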
bet_mode = BetName.tan
bet_column = lgbm_model_manager.get_bet_column(bet_mode=bet_mode)
pl_column = lgbm_model_manager.get_profit_loss_column(bet_mode=bet_mode)
for dataset_dict in dataset_mapping.values():
    lgbm_model_manager.set_bet_column(dataset_dict, bet_mode)
_, dfbetva, dfbette = lgbm_model_manager.merge_dataframe_data(dataset_mapping, mode=True)
dfbetva, dfbette = lgbm_model_manager.generate_profit_loss(dfbetva, dfbette, bet_mode)
dfbette[["raceDate", "raceId", "label", "favorite", bet_column, pl_column]]
2024-04-17 23:37:30.426 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=bet_columns_map, val={'tan': 'bet_tan'}
2024-04-17 23:37:30.427 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=pl_column_map, val={'tan': 'pl_tan'}
2024-04-17 23:37:31.387 | INFO | src.model_manager.base_manager:__save_profit_loss:646 - Save profit loss data. save_path: E:\keiba_dev\keiba_ai\models\first_model\analyze\tan\profit_loss
2024-04-17 23:37:31.388 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=profit_loss_dir, val={'tan': 'E:\\keiba_dev\\keiba_ai\\models\\first_model\\analyze\\tan\\profit_loss'}
(index) | raceDate | raceId | label | favorite | bet_tan | pl_tan |
---|---|---|---|---|---|---|
14 | 2018-01-06 | 201806010101 | 1 | 1 | 1 | 100.0 |
28 | 2018-01-06 | 201806010102 | 6 | 1 | 1 | -100.0 |
33 | 2018-01-06 | 201806010103 | 2 | 1 | 1 | -100.0 |
52 | 2018-01-06 | 201806010104 | 2 | 1 | 1 | -100.0 |
71 | 2018-01-06 | 201806010105 | 1 | 1 | 1 | 40.0 |
… | … | … | … | … | … | … |
22408 | 2023-12-28 | 202309050908 | 1 | 1 | 1 | 170.0 |
22431 | 2023-12-28 | 202309050909 | 1 | 1 | 1 | 250.0 |
22432 | 2023-12-28 | 202309050910 | 4 | 1 | 1 | -100.0 |
22452 | 2023-12-28 | 202309050911 | 2 | 1 | 1 | -100.0 |
22462 | 2023-12-28 | 202309050912 | 3 | 1 | 1 | -100.0 |
19957 rows × 6 columns
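The pl_tan values above are consistent with a flat 100-yen win bet: a miss loses the stake (-100.0), while a hit returns odds × 100 minus the stake (for example 100.0 at odds of 2.0). The following is a minimal sketch of that calculation under this assumption. The function name and the fixed 100-yen stake are illustrative and not taken from the class source.

```python
def win_bet_profit_loss(label: int, odds: float, stake: int = 100) -> float:
    """Profit or loss of one win bet: payout minus stake on a hit, minus stake otherwise."""
    if label == 1:  # the horse finished first
        return round(odds * stake - stake, 1)
    return float(-stake)


# Cumulative profit and loss over a few hypothetical bets
bets = [(1, 2.0), (6, 3.5), (2, 1.8), (1, 1.4)]  # (finishing position, win odds)
cumulative = []
total = 0.0
for label, odds in bets:
    total += win_bet_profit_loss(label, odds)
    cumulative.append(total)
print(cumulative)
```

Plotting the cumulative values against raceDate gives the kind of curve a profit and loss graph shows.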
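8. Basic analysis and saving the results
Use the lgbm_model_manager.basic_analyze method to compute return-rate and hit-rate statistics, count the number of bets by popularity rank (favorite), and save the results.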
lgbm_model_manager.basic_analyze(dataset_mapping)
2024-04-17 23:37:31.457 | INFO | src.model_manager.base_manager:basic_analyze:220 - Start basic analyze.
2024-04-17 23:37:31.749 | INFO | src.model_manager.base_manager:basic_analyze:256 - Saving Return And Hit Rate Summary.
2024-04-17 23:37:31.755 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=return_hit_rate_file, val={'tan': 'E:\\keiba_dev\\keiba_ai\\models\\first_model\\analyze\\tan\\hit_and_return_rate.csv'}
2024-04-17 23:37:31.757 | INFO | src.model_manager.base_manager:basic_analyze:259 - Saving Favorite Bet Num Summary.
2024-04-17 23:37:31.785 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=fav_bet_num_dir, val={'tan': 'E:\\keiba_dev\\keiba_ai\\models\\first_model\\analyze\\tan\\fav_bet_num'}
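As a rough illustration of what the two saved statistics mean (this is not the class's actual implementation), the sketch below assumes flat 100-yen bets and takes a list of per-bet profit and loss values such as pl_tan. Hit rate is the share of bets that paid out, and return rate is total payout divided by total stake. The function name is hypothetical.

```python
def hit_and_return_rate(pl_values: list[float], stake: int = 100) -> tuple[float, float]:
    """Return (hit rate %, return rate %) for flat-stake bets.

    A bet counts as a hit when its profit and loss is greater than -stake,
    i.e. some payout was received.
    """
    n_bets = len(pl_values)
    if n_bets == 0:
        return 0.0, 0.0
    hits = sum(1 for pl in pl_values if pl > -stake)
    payout = sum(pl + stake for pl in pl_values)  # payout = profit + stake
    hit_rate = 100.0 * hits / n_bets
    return_rate = 100.0 * payout / (stake * n_bets)
    return hit_rate, return_rate


# The first five test rows shown in section 7
print(hit_and_return_rate([100.0, -100.0, -100.0, -100.0, 40.0]))
```

A return rate above 100% means the bets were profitable overall, which is why both figures are usually read together.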
9. Creating the odds graph
Use the lgbm_model_manager.merge_dataframe_data method to build the DataFrames to be graphed.
Use the lgbm_model_manager.gegnerate_odds_graph method to generate the source data for the odds graph.
dftrain, dfvalid, dftest = lgbm_model_manager.merge_dataframe_data(
    dataset_mapping,
    mode=True
)
summary_dict = lgbm_model_manager.gegnerate_odds_graph(dftrain, dfvalid, dftest, bet_mode)
print("Check the odds graph for the 'test' data")
summary_dict["test"]
2024-04-17 23:37:32.854 | INFO | src.model_manager.base_manager:__save_odds_graph:514 - Save Odds Graph. save_path: E:\keiba_dev\keiba_ai\models\first_model\analyze\tan\odds_graph
2024-04-17 23:37:32.855 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=odds_graph_file, val={'tan': 'E:\\keiba_dev\\keiba_ai\\models\\first_model\\analyze\\tan\\odds_graph'}
Check the odds graph for the 'test' data
odds_round | 勝率 (win rate, %) | 支持率 (support rate, %) | 回収率100%超 (win rate needed for a return above 100%, %) | weight | 件数 (count) |
---|---|---|---|---|---|
1.25 | 65.356265 | 64.000000 | 80.000000 | 0.040788 | 814 |
1.75 | 45.913860 | 45.714286 | 57.142857 | 0.181490 | 3622 |
2.25 | 36.124067 | 35.555556 | 44.444444 | 0.214862 | 4288 |
2.75 | 29.582287 | 29.090909 | 36.363636 | 0.219522 | 4381 |
3.25 | 24.166402 | 24.615385 | 30.769231 | 0.157789 | 3149 |
3.75 | 21.748179 | 21.333333 | 26.666667 | 0.096307 | 1922 |
4.25 | 16.524909 | 18.823529 | 23.529412 | 0.041239 | 823 |
4.75 | 17.168142 | 16.842105 | 21.052632 | 0.028311 | 565 |
5.25 | 11.956522 | 15.238095 | 19.047619 | 0.013830 | 276 |
5.75 | 14.084507 | 13.913043 | 17.391304 | 0.003558 | 71 |
6.25 | 3.571429 | 12.800000 | 16.000000 | 0.001403 | 28 |
6.75 | NaN | 11.851852 | 14.814815 | 0.000451 | 9 |
7.25 | 50.000000 | 11.034483 | 13.793103 | 0.000200 | 4 |
7.75 | NaN | 10.322581 | 12.903226 | 0.000050 | 1 |
8.25 | NaN | 9.696970 | 12.121212 | 0.000100 | 2 |
8.75 | 50.000000 | 9.142857 | 11.428571 | 0.000100 | 2 |
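The 支持率 and 回収率100%超 columns in the table can be reproduced from odds_round alone: the values match 80 / odds and 100 / odds respectively (64.0 and 80.0 at odds of 1.25), and weight matches each bin's count divided by the total of 19957 rows. The sketch below reproduces those relations. The 0.5-wide binning rule and the 0.8 payout rate are inferred from the table rather than taken from the class source, and the function names are hypothetical.

```python
import math


def odds_bin_center(odds: float, width: float = 0.5) -> float:
    """Map odds to the center of its bin, e.g. 1.0-1.5 -> 1.25 (assumed binning)."""
    return math.floor(odds / width) * width + width / 2


def support_rate(odds: float, payout_rate: float = 0.8) -> float:
    """Implied win probability (%) when payout_rate of the pool is paid back."""
    return 100.0 * payout_rate / odds


def break_even_win_rate(odds: float) -> float:
    """Win rate (%) above which flat bets at these odds return more than 100%."""
    return 100.0 / odds


center = odds_bin_center(1.3)
print(center, support_rate(center), break_even_win_rate(center))
print(round(814 / 19957, 6))  # weight of the 1.25 bin
```

Read this way, a bin where 勝率 exceeds 支持率 is one where horses won more often than the odds implied, and a bin where 勝率 exceeds 回収率100%超 would have been profitable to bet on.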
10. Creating a provisional second model
To compute odds graph scores and compare them against another model, we create a provisional second model.
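The second model in this section uses a binary objective column named label_in1, which marks whether a horse finished first. As a minimal sketch of how such a column might be derived from the finishing position (the helper below is hypothetical and works on plain dictionaries rather than the DataFrame the class uses):

```python
def add_top_n_objective(rows: list[dict], label_key: str = "label", top_n: int = 1) -> list[dict]:
    """Add a 0/1 column f'{label_key}_in{top_n}' marking finishes within top_n."""
    objective_key = f"{label_key}_in{top_n}"
    out = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        new_row[objective_key] = int(row[label_key] <= top_n)
        out.append(new_row)
    return out


rows = [{"label": 1}, {"label": 6}, {"label": 2}]
print(add_top_n_objective(rows))
```

Changing top_n to 3 would produce a label_in3 column for a model that predicts a finish within the top three.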
params = {
    'boosting_type': 'gbdt',
    # binary classification
    'objective': 'binary',
    'metric': 'auc',
    'verbose': 0,
    'seed': 77777,
    'learning_rate': 0.01,
    "n_estimators": 10000,
    # "device_type": "gpu"
}
# Keep the feature columns unchanged
target_feature_columns = [
    'distance',
    'number',
    'boxNum',
    'odds',
    'favorite',
    'age',
    'jweight',
    'weight',
    'gl',
    'race_span',
] + dataPreP.encoding_columns
# Build a model whose objective is predicting a first-place finish
target_objective_column = "label_in1"
# Create an instance of the model manager class
target_lgbm_model_manager = LightGBMModelManager("../models/first_model_label1", 2010, 2018, 2023)
# Create the objective column
target_lgbm_model_manager.set_feature_and_objective_columns(target_feature_columns, target_objective_column)
df = target_lgbm_model_manager.add_objective_column_to_df(df, "label", 1)
# Create the datasets
target_dataset_mapping = target_lgbm_model_manager.make_dataset_mapping(df)
target_dataset_mapping = target_lgbm_model_manager.setup_dataset(target_dataset_mapping)
# Train the models (model parameters are saved under models/first_model_label1)
target_lgbm_model_manager.train_all(
    params,
    target_dataset_mapping.copy(),
    stopping_rounds=500,
    val_num=250
)
for target_dataset_dict in target_dataset_mapping.values():
    target_lgbm_model_manager.load_model(target_dataset_dict.name)
    target_lgbm_model_manager.predict(target_dataset_dict)
# for dataset_dict in target_dataset_mapping.values():
#     target_lgbm_model_manager.load_predict_result(dataset_dict)
2024-04-17 23:37:32.901 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_type, val=lightGBM 2024-04-17 23:37:32.903 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_id, val=first_model_label1 2024-04-17 23:37:32.905 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_dir, val=e:\keiba_dev\keiba_ai\notebook\..\models\first_model_label1 2024-04-17 23:37:32.906 | INFO | src.model_manager.base_manager:__init__:43 - make directory. path: ..\models\first_model_label1 2024-04-17 23:37:32.909 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_analyze_dir, val=e:\keiba_dev\keiba_ai\notebook\..\models\first_model_label1\analyze 2024-04-17 23:37:32.910 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=model_predict_dir, val=e:\keiba_dev\keiba_ai\notebook\..\models\first_model_label1\analyze\00_predict 2024-04-17 23:37:32.911 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_column, val=pred_prob 2024-04-17 23:37:32.912 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=confidence_rank_column, val=pred_rank 2024-04-17 23:37:32.913 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:77 - Set Feature columns. ['distance', 'number', 'boxNum', 'odds', 'favorite', 'age', 'jweight', 'weight', 'gl', 'race_span', 'place_en', 'field_en', 'sex_en', 'condition_en', 'jockeyId_en', 'teacherId_en', 'dist_cat_en', 'horseId_en'] 2024-04-17 23:37:32.914 | INFO | src.data_manager.dataset_tools:set_feature_and_objective_columns:79 - Set Objective columns. label_in1 2024-04-17 23:37:32.915 | INFO | src.model_manager.lgbm_manager:add_objective_column_to_df:80 - make objective data. label_in1. 
topN: 1 2024-04-17 23:37:32.926 | INFO | src.data_manager.dataset_tools:make_dataset_mapping:103 - Generate dataset mapping. Year Range: 2018 -> 2023 2024-04-17 23:37:37.631 | INFO | src.model_manager.lgbm_manager:setup_dataset:110 - Create LightGBM Dataset. 2024-04-17 23:37:38.015 | INFO | src.model_manager.lgbm_manager:save_root_model_info:281 - Save model params and dataset columns 2024-04-17 23:37:38.020 | INFO | src.model_manager.lgbm_manager:train_all:262 - Training Start! 2024-04-17 23:37:38.021 | INFO | src.model_manager.lgbm_manager:train_all:263 - ================== train params ======================== 2024-04-17 23:37:38.022 | INFO | src.model_manager.lgbm_manager:train_all:266 - boosting_type = gbdt 2024-04-17 23:37:38.024 | INFO | src.model_manager.lgbm_manager:train_all:266 - objective = binary 2024-04-17 23:37:38.025 | INFO | src.model_manager.lgbm_manager:train_all:266 - metric = auc 2024-04-17 23:37:38.026 | INFO | src.model_manager.lgbm_manager:train_all:266 - verbose = 0 2024-04-17 23:37:38.026 | INFO | src.model_manager.lgbm_manager:train_all:266 - seed = 77777 2024-04-17 23:37:38.027 | INFO | src.model_manager.lgbm_manager:train_all:266 - learning_rate = 0.01 2024-04-17 23:37:38.028 | INFO | src.model_manager.lgbm_manager:train_all:266 - n_estimators = 10000 2024-04-17 23:37:38.028 | INFO | src.model_manager.lgbm_manager:train_all:267 - ========================================================== 2024-04-17 23:37:38.029 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2018first 2024-04-17 23:37:38.030 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:37:38.222 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 
2024-04-17 23:37:38.223 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:37:38.510 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:37:55.928 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.862091 valid_1's auc: 0.836297 2024-04-17 23:38:28.228 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.882241 valid_1's auc: 0.834686 2024-04-17 23:38:42.858 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [96] training's auc: 0.848662 valid_1's auc: 0.83707 2024-04-17 23:38:43.014 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: ..\models\first_model_label1\params\2018first\model.params 2024-04-17 23:38:43.652 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2018second 2024-04-17 23:38:43.654 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:38:43.839 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:38:43.842 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-04-17 23:38:44.102 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:39:02.463 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.861462 valid_1's auc: 0.835734 2024-04-17 23:39:36.854 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.881338 valid_1's auc: 0.833685 2024-04-17 23:40:15.417 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [237] training's auc: 0.860302 valid_1's auc: 0.835782 2024-04-17 23:40:15.850 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: ..\models\first_model_label1\params\2018second\model.params 2024-04-17 23:40:16.598 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2019first 2024-04-17 23:40:16.600 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:40:16.785 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:40:16.787 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:40:17.084 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:40:36.658 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.860475 valid_1's auc: 0.841981 2024-04-17 23:41:12.771 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.8799 valid_1's auc: 0.841053 2024-04-17 23:41:28.978 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [96] training's auc: 0.847828 valid_1's auc: 0.842271 2024-04-17 23:41:29.146 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... 
model path: ..\models\first_model_label1\params\2019first\model.params 2024-04-17 23:41:29.817 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2019second 2024-04-17 23:41:29.819 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:41:30.030 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:41:30.032 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:41:30.308 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:41:50.628 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.85993 valid_1's auc: 0.821816 2024-04-17 23:42:28.170 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.878888 valid_1's auc: 0.821082 2024-04-17 23:42:45.424 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [92] training's auc: 0.847311 valid_1's auc: 0.822497 2024-04-17 23:42:45.577 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: ..\models\first_model_label1\params\2019second\model.params 2024-04-17 23:42:46.294 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2020first 2024-04-17 23:42:46.295 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:42:46.509 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:42:46.511 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-04-17 23:42:46.823 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:43:08.040 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.858524 valid_1's auc: 0.834816 2024-04-17 23:43:47.320 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.877163 valid_1's auc: 0.83338 2024-04-17 23:44:29.292 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [220] training's auc: 0.856065 valid_1's auc: 0.834993 2024-04-17 23:44:29.780 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: ..\models\first_model_label1\params\2020first\model.params 2024-04-17 23:44:30.609 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2020second 2024-04-17 23:44:30.610 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:44:30.803 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:44:30.805 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:44:31.115 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:44:53.747 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.857948 valid_1's auc: 0.825461 2024-04-17 23:45:34.465 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.875943 valid_1's auc: 0.824503 2024-04-17 23:46:08.078 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [167] training's auc: 0.851538 valid_1's auc: 0.825715 2024-04-17 23:46:08.483 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... 
model path: ..\models\first_model_label1\params\2020second\model.params 2024-04-17 23:46:09.242 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2021first 2024-04-17 23:46:09.244 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:46:09.498 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:46:09.501 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:46:09.815 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:46:32.959 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.856623 valid_1's auc: 0.823788 2024-04-17 23:47:15.526 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.874608 valid_1's auc: 0.823069 2024-04-17 23:47:38.375 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [110] training's auc: 0.846505 valid_1's auc: 0.823886 2024-04-17 23:47:38.593 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: ..\models\first_model_label1\params\2021first\model.params 2024-04-17 23:47:39.333 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2021second 2024-04-17 23:47:39.335 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:47:39.572 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 
2024-04-17 23:47:39.574 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:47:39.897 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:48:04.472 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.855594 valid_1's auc: 0.83644 2024-04-17 23:48:49.120 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.872919 valid_1's auc: 0.836433 2024-04-17 23:49:18.137 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [132] training's auc: 0.847115 valid_1's auc: 0.836851 2024-04-17 23:49:18.420 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: ..\models\first_model_label1\params\2021second\model.params 2024-04-17 23:49:19.156 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2022first 2024-04-17 23:49:19.158 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:49:19.399 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:49:19.401 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 
2024-04-17 23:49:19.743 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:49:46.364 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.855245 valid_1's auc: 0.827909 2024-04-17 23:50:33.643 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.872575 valid_1's auc: 0.826546 2024-04-17 23:51:09.565 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [158] training's auc: 0.848622 valid_1's auc: 0.828375 2024-04-17 23:51:09.941 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: ..\models\first_model_label1\params\2022first\model.params 2024-04-17 23:51:10.725 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2022second 2024-04-17 23:51:10.726 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:51:11.025 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:51:11.028 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:51:11.382 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:51:37.831 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.854347 valid_1's auc: 0.844362 2024-04-17 23:52:26.240 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.871073 valid_1's auc: 0.843462 2024-04-17 23:52:29.335 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [13] training's auc: 0.838279 valid_1's auc: 0.844482 2024-04-17 23:52:29.426 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... 
model path: ..\models\first_model_label1\params\2022second\model.params 2024-04-17 23:52:30.081 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2023first 2024-04-17 23:52:30.082 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:52:30.333 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 2024-04-17 23:52:30.336 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:52:30.701 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:52:58.129 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.854308 valid_1's auc: 0.841028 2024-04-17 23:53:48.113 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.870579 valid_1's auc: 0.840079 2024-04-17 23:54:49.194 | INFO | lightgbm.basic:_log_info:191 - [750] training's auc: 0.879958 valid_1's auc: 0.83887 2024-04-17 23:54:50.040 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [253] training's auc: 0.854565 valid_1's auc: 0.841179 2024-04-17 23:54:50.738 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: ..\models\first_model_label1\params\2023first\model.params 2024-04-17 23:54:51.673 | INFO | src.model_manager.lgbm_manager:train_all:271 - Start training model. model name: 2023second 2024-04-17 23:54:51.674 | WARNING | lightgbm.basic:_log_warning:195 - Found `n_estimators` in params. Will use it instead of argument 2024-04-17 23:54:51.919 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found. 
2024-04-17 23:54:51.922 | INFO | lightgbm.basic:_log_native:200 - [LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories. 2024-04-17 23:54:52.298 | INFO | lightgbm.basic:_log_info:191 - Training until validation scores don't improve for 500 rounds 2024-04-17 23:55:20.514 | INFO | lightgbm.basic:_log_info:191 - [250] training's auc: 0.853987 valid_1's auc: 0.838904 2024-04-17 23:56:12.352 | INFO | lightgbm.basic:_log_info:191 - [500] training's auc: 0.870068 valid_1's auc: 0.837347 2024-04-17 23:57:12.727 | INFO | lightgbm.basic:_log_info:191 - Early stopping, best iteration is: [241] training's auc: 0.853356 valid_1's auc: 0.838922 2024-04-17 23:57:13.386 | INFO | src.model_manager.lgbm_manager:save_model:222 - Saving model... model path: ..\models\first_model_label1\params\2023second\model.params 2024-04-17 23:57:14.881 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2018first 2024-04-17 23:57:14.934 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2018first 2024-04-17 23:57:16.947 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2018first 2024-04-17 23:57:17.010 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: ..\models\first_model_label1\analyze\00_predict\2018first 2024-04-17 23:57:20.531 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2018second 2024-04-17 23:57:20.617 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2018second 2024-04-17 23:57:24.380 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2018second 2024-04-17 23:57:24.439 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. 
save path: ..\models\first_model_label1\analyze\00_predict\2018second 2024-04-17 23:57:28.084 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2019first 2024-04-17 23:57:28.123 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2019first 2024-04-17 23:57:30.384 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2019first 2024-04-17 23:57:30.449 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: ..\models\first_model_label1\analyze\00_predict\2019first 2024-04-17 23:57:34.621 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2019second 2024-04-17 23:57:34.669 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2019second 2024-04-17 23:57:37.315 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2019second 2024-04-17 23:57:37.388 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: ..\models\first_model_label1\analyze\00_predict\2019second 2024-04-17 23:57:41.964 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2020first 2024-04-17 23:57:42.050 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2020first 2024-04-17 23:57:46.466 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2020first 2024-04-17 23:57:46.536 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: ..\models\first_model_label1\analyze\00_predict\2020first 2024-04-17 23:57:51.128 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... 
model name: 2020second
2024-04-17 23:57:51.185 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2020second
2024-04-17 23:57:55.032 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2020second
2024-04-17 23:57:55.099 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: ..\models\first_model_label1\analyze\00_predict\2020second
2024-04-17 23:57:59.788 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2021first
2024-04-17 23:57:59.843 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2021first
2024-04-17 23:58:03.105 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2021first
2024-04-17 23:58:03.193 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: ..\models\first_model_label1\analyze\00_predict\2021first
2024-04-17 23:58:07.953 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2021second
2024-04-17 23:58:07.998 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2021second
2024-04-17 23:58:11.428 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2021second
2024-04-17 23:58:11.513 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: ..\models\first_model_label1\analyze\00_predict\2021second
2024-04-17 23:58:16.410 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2022first
2024-04-17 23:58:16.482 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2022first
2024-04-17 23:58:20.413 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2022first
2024-04-17 23:58:20.500 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: ..\models\first_model_label1\analyze\00_predict\2022first
2024-04-17 23:58:25.682 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2022second
2024-04-17 23:58:25.727 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2022second
2024-04-17 23:58:28.525 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2022second
2024-04-17 23:58:28.608 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: ..\models\first_model_label1\analyze\00_predict\2022second
2024-04-17 23:58:33.544 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2023first
2024-04-17 23:58:33.633 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2023first
2024-04-17 23:58:39.869 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2023first
2024-04-17 23:58:39.947 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: ..\models\first_model_label1\analyze\00_predict\2023first
2024-04-17 23:58:45.128 | INFO | src.model_manager.lgbm_manager:load_model:201 - Loading model... model name: 2023second
2024-04-17 23:58:45.230 | INFO | src.model_manager.lgbm_manager:load_model:203 - model activate! model_name: 2023second
2024-04-17 23:58:51.464 | INFO | src.model_manager.base_manager:set_predict_dataframe:379 - Set the infered DataFrame into the dataset. model_name: 2023second
2024-04-17 23:58:51.548 | INFO | src.model_manager.base_manager:save_predict_result:406 - Save predict result. save path: ..\models\first_model_label1\analyze\00_predict\2023second
# Bet on win (tan) tickets
bet_column = target_lgbm_model_manager.get_bet_column(BetName.tan)
pl_column = target_lgbm_model_manager.get_profit_loss_column(BetName.tan)
for dataset_dict in target_dataset_mapping.values():
    target_lgbm_model_manager.set_bet_column(dataset_dict, bet_mode)
# Build the inference results for the test data
# and save the basic analysis results as well
target_lgbm_model_manager.basic_analyze(target_dataset_mapping, bet_mode)
2024-04-17 23:58:56.271 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=bet_columns_map, val={'tan': 'bet_tan'}
2024-04-17 23:58:56.273 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=pl_column_map, val={'tan': 'pl_tan'}
2024-04-17 23:58:56.899 | INFO | src.model_manager.base_manager:basic_analyze:220 - Start basic analyze.
2024-04-17 23:58:57.597 | INFO | src.model_manager.base_manager:basic_analyze:256 - Saving Return And Hit Rate Summary.
2024-04-17 23:58:57.607 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=return_hit_rate_file, val={'tan': 'e:\\keiba_dev\\keiba_ai\\notebook\\..\\models\\first_model_label1\\analyze\\tan\\hit_and_return_rate.csv'}
2024-04-17 23:58:57.609 | INFO | src.model_manager.base_manager:basic_analyze:259 - Saving Favorite Bet Num Summary.
2024-04-17 23:58:57.627 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=fav_bet_num_dir, val={'tan': 'e:\\keiba_dev\\keiba_ai\\notebook\\..\\models\\first_model_label1\\analyze\\tan\\fav_bet_num'}
11. Comparing odds graph scores
Use lgbm_model_manager.calc_odds_graph_score to compute the odds graph score.
Depending on the arguments, it can also compute a score that compares two odds graphs.
# Get the DataFrames for the first model's odds graph
dftrain, dfvalid, dftest = lgbm_model_manager.merge_dataframe_data(
    dataset_mapping,
    mode=True
)
# Create the odds graph
base_summary_dict = lgbm_model_manager.gegnerate_odds_graph(dftrain, dfvalid, dftest, bet_mode)
# Get the DataFrames for the second model's odds graph
dftrain_base, dfvalid_base, dftest_base = target_lgbm_model_manager.merge_dataframe_data(
    target_dataset_mapping,
    mode=True
)
# Create the second model's odds graph
target_summary_dict = target_lgbm_model_manager.gegnerate_odds_graph(dftrain_base, dfvalid_base, dftest_base, bet_mode)
# Compare the odds graph scores of the first model and the second model
summary_score = lgbm_model_manager.calc_odds_graph_score(target_summary_dict, base_summary_dict)
# The odds graph score can also be computed without a second odds graph to compare against
base_summary_score = target_lgbm_model_manager.calc_odds_graph_score(base_summary_dict)
2024-04-17 23:58:58.708 | INFO | src.model_manager.base_manager:__save_odds_graph:514 - Save Odds Graph. save_path: E:\keiba_dev\keiba_ai\models\first_model\analyze\tan\odds_graph
2024-04-17 23:59:00.454 | INFO | src.model_manager.base_manager:__save_odds_graph:514 - Save Odds Graph. save_path: ..\models\first_model_label1\analyze\tan\odds_graph
2024-04-17 23:59:00.455 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=odds_graph_file, val={'tan': 'e:\\keiba_dev\\keiba_ai\\notebook\\..\\models\\first_model_label1\\analyze\\tan\\odds_graph'}
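12. Checking the odds graph scores
The first display call shows the weighted-average scores on the test data for first_model_label1 and first_model.
The second display call shows the odds graph scores of first_model_label1, including the comparison against first_model.
The third display call shows the odds graph scores of first_model.
The columns are described below.
Column | Description |
---|---|
weighted_over_support_score | Support-rate odds graph score (支持率オッズグラフスコア) |
weighted_over_100per_score | Return-rate odds graph score (回収率オッズグラフスコア) |
weighted_over_base_score | AonB odds graph score (AonBオッズグラフスコア): the return-rate odds graph score of first_model_label1 with first_model as the baseline |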
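How the weighted columns relate to the raw ones can be checked against the notebook output. The following is a minimal sketch, not the class's actual implementation: it assumes that `weight` is each odds bin's share of all bets (件数 divided by the 19957 bets in the test data) and that a weighted score is the raw score multiplied by that weight. Both assumptions reproduce the printed values for `weighted_over_support_score` and `weighted_over_100per_score`; `weighted_over_base_score` does not follow this simple rule and is left out.

```python
import pandas as pd

# Three rows copied from the first_model_label1 test output (odds_round 1.25, 1.75, 2.25).
scores = pd.DataFrame(
    {
        "over_support_score": [1.356265, 0.212254, 0.597751],
        "over_100per_score": [-14.643735, -11.216318, -8.291138],
        "件数": [814, 3621, 4279],
    },
    index=pd.Index([1.25, 1.75, 2.25], name="odds_round"),
)

# The 件数 column of the full table sums to 19957.
TOTAL_BETS = 19957

# Assumption: weight is the bin's share of all bets.
scores["weight"] = scores["件数"] / TOTAL_BETS

# Assumption: weighted score = raw score * weight.
for col in ["over_support_score", "over_100per_score"]:
    scores[f"weighted_{col}"] = scores[col] * scores["weight"]

print(scores.round(6))
```

Under these assumptions, summing a weighted column over every odds bin, as the notebook does with `.sum()`, gives the bet-count-weighted average of the raw score.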
from IPython.display import display
from pandas import option_context, concat

weighted_columns = ["weighted_over_support_score", "weighted_over_100per_score", "weighted_over_base_score"]
with option_context('display.max_rows', None):
    # Weighted-average scores of both models, side by side
    display(
        concat(
            [summary_score["test"][weighted_columns].sum(), base_summary_score["test"][weighted_columns].sum()],
            axis=1
        ).rename(
            columns={0: target_lgbm_model_manager.model_dir.name, 1: lgbm_model_manager.model_dir.name},
            index={
                "weighted_over_support_score": "支持率オッズグラフスコア",
                "weighted_over_100per_score": "回収率オッズグラフスコア",
                "weighted_over_base_score": "AonBオッズグラフスコア"
            }
        )
    )
    # Per-odds-bin scores of first_model_label1, compared against first_model
    display(summary_score["test"])
    # Per-odds-bin scores of first_model
    display(base_summary_score["test"].fillna(0))
 | first_model_label1 | first_model |
---|---|---|
支持率オッズグラフスコア | 0.217186 | 0.159289 |
回収率オッズグラフスコア | -7.891067 | -7.942142 |
AonBオッズグラフスコア | 0.059617 | 0 |
odds_round | over_support_score | over_100per_score | over_base_score | weighted_over_support_score | weighted_over_100per_score | weighted_over_base_score | weight | base_weight | 件数 | base_件数 |
---|---|---|---|---|---|---|---|---|---|---|
1.25 | 1.356265 | -14.643735 | 0.000000 | 0.055319 | -0.597284 | 0.000000 | 0.040788 | 0.040788 | 814 | 814.0 |
1.75 | 0.212254 | -11.216318 | 0.012680 | 0.038511 | -2.035090 | 0.002863 | 0.181440 | 0.181490 | 3621 | 3622.0 |
2.25 | 0.597751 | -8.291138 | 0.029240 | 0.128164 | -1.777711 | 0.010022 | 0.214411 | 0.214862 | 4279 | 4288.0 |
2.75 | 0.500236 | -6.772491 | 0.008858 | 0.110966 | -1.502321 | -0.013666 | 0.221827 | 0.219522 | 4427 | 4381.0 |
3.25 | -0.258101 | -6.411947 | 0.190882 | -0.039238 | -0.974788 | 0.067067 | 0.152027 | 0.157789 | 3034 | 3149.0 |
3.75 | 0.418214 | -4.915120 | 0.003368 | 0.044028 | -0.517446 | -0.043761 | 0.105276 | 0.096307 | 2101 | 1922.0 |
4.25 | -1.811534 | -6.517416 | 0.487087 | -0.083238 | -0.299467 | -0.010611 | 0.045949 | 0.041239 | 917 | 823.0 |
4.75 | 0.791305 | -3.419221 | 0.465269 | 0.017089 | -0.073843 | 0.036130 | 0.021596 | 0.028311 | 431 | 565.0 |
5.25 | -2.040126 | -5.849650 | 1.241448 | -0.020139 | -0.057743 | 0.040325 | 0.009871 | 0.013830 | 197 | 276.0 |
5.75 | -4.389234 | -7.867495 | -4.560698 | -0.013856 | -0.024836 | -0.013072 | 0.003157 | 0.003558 | 63 | 71.0 |
6.25 | -6.133333 | -9.333333 | 3.095238 | -0.009220 | -0.014030 | 0.003407 | 0.001503 | 0.001403 | 30 | 28.0 |
6.75 | -11.851852 | -14.814815 | 0.000000 | -0.005345 | -0.006681 | 0.000000 | 0.000451 | 0.000451 | 9 | 9.0 |
7.25 | -3.891626 | -6.650246 | -42.857143 | -0.002730 | -0.004665 | -0.011922 | 0.000702 | 0.000200 | 14 | 4.0 |
7.75 | 2.177419 | -0.403226 | 12.500000 | 0.000873 | -0.000162 | 0.000485 | 0.000401 | 0.000050 | 8 | 1.0 |
8.25 | -9.696970 | -12.121212 | 0.000000 | -0.000486 | -0.000607 | 0.000607 | 0.000050 | 0.000100 | 1 | 2.0 |
8.75 | -9.142857 | -11.428571 | -50.000000 | -0.000458 | -0.000573 | -0.004438 | 0.000050 | 0.000100 | 1 | 2.0 |
9.25 | -8.648649 | -10.810811 | -10.810811 | -0.000433 | -0.000542 | -0.000542 | 0.000050 | 0.000000 | 1 | 0.0 |
9.75 | -8.205128 | -10.256410 | -10.256410 | -0.000822 | -0.001028 | -0.001028 | 0.000100 | 0.000000 | 2 | 0.0 |
10.25 | -7.804878 | -9.756098 | -9.756098 | -0.000391 | -0.000489 | -0.000489 | 0.000050 | 0.000000 | 1 | 0.0 |
10.75 | -7.441860 | -9.302326 | -9.302326 | -0.000373 | -0.000466 | -0.000466 | 0.000050 | 0.000000 | 1 | 0.0 |
13.75 | -5.818182 | -7.272727 | -7.272727 | -0.000292 | -0.000364 | -0.000364 | 0.000050 | 0.000000 | 1 | 0.0 |
15.75 | -5.079365 | -6.349206 | -6.349206 | -0.000509 | -0.000636 | -0.000636 | 0.000100 | 0.000000 | 2 | 0.0 |
25.75 | -3.106796 | -3.883495 | -3.883495 | -0.000156 | -0.000195 | -0.000195 | 0.000050 | 0.000000 | 1 | 0.0 |
50.00 | -1.600000 | -2.000000 | -2.000000 | -0.000080 | -0.000100 | -0.000100 | 0.000050 | 0.000000 | 1 | 0.0 |
odds_round | over_support_score | over_100per_score | over_base_score | weighted_over_support_score | weighted_over_100per_score | weighted_over_base_score | weight | base_weight | 件数 | base_件数 |
---|---|---|---|---|---|---|---|---|---|---|
1.25 | 1.356265 | -14.643735 | 0 | 0.055319 | -0.597284 | 0 | 0.040788 | 0 | 814 | 0 |
1.75 | 0.199574 | -11.228997 | 0 | 0.036221 | -2.037953 | 0 | 0.181490 | 0 | 3622 | 0 |
2.25 | 0.568512 | -8.320377 | 0 | 0.122152 | -1.787733 | 0 | 0.214862 | 0 | 4288 | 0 |
2.75 | 0.491378 | -6.781349 | 0 | 0.107868 | -1.488655 | 0 | 0.219522 | 0 | 4381 | 0 |
3.25 | -0.448983 | -6.602829 | 0 | -0.070845 | -1.041855 | 0 | 0.157789 | 0 | 3149 | 0 |
3.75 | 0.414846 | -4.918488 | 0 | 0.039953 | -0.473685 | 0 | 0.096307 | 0 | 1922 | 0 |
4.25 | -2.298621 | -7.004503 | 0 | -0.094792 | -0.288856 | 0 | 0.041239 | 0 | 823 | 0 |
4.75 | 0.326036 | -3.884490 | 0 | 0.009230 | -0.109973 | 0 | 0.028311 | 0 | 565 | 0 |
5.25 | -3.281573 | -7.091097 | 0 | -0.045383 | -0.098068 | 0 | 0.013830 | 0 | 276 | 0 |
5.75 | 0.171464 | -3.306797 | 0 | 0.000610 | -0.011764 | 0 | 0.003558 | 0 | 71 | 0 |
6.25 | -9.228571 | -12.428571 | 0 | -0.012948 | -0.017437 | 0 | 0.001403 | 0 | 28 | 0 |
6.75 | 0.000000 | 0.000000 | 0 | 0.000000 | 0.000000 | 0 | 0.000451 | 0 | 9 | 0 |
7.25 | 38.965517 | 36.206897 | 0 | 0.007810 | 0.007257 | 0 | 0.000200 | 0 | 4 | 0 |
7.75 | 0.000000 | 0.000000 | 0 | 0.000000 | 0.000000 | 0 | 0.000050 | 0 | 1 | 0 |
8.25 | 0.000000 | 0.000000 | 0 | 0.000000 | 0.000000 | 0 | 0.000100 | 0 | 2 | 0 |
8.75 | 40.857143 | 38.571429 | 0 | 0.004095 | 0.003865 | 0 | 0.000100 | 0 | 2 | 0 |
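One pattern in the output above is worth noting: in every row shown, `over_support_score` exceeds `over_100per_score` by exactly 20 / odds_round. One reading, which is an inference from the numbers and not something the source states, is that both scores are measured in percentage points against reference lines of 80 / odds and 100 / odds. The sketch below only verifies the observed gap on a few rows copied from the first_model_label1 output.

```python
# Rows copied from the first_model_label1 test output:
# (odds_round, over_support_score, over_100per_score)
rows = [
    (1.25, 1.356265, -14.643735),
    (2.25, 0.597751, -8.291138),
    (50.00, -1.600000, -2.000000),
]

gaps = []
for odds, support_score, full_score in rows:
    gap = support_score - full_score
    gaps.append((odds, gap))
    # Compare the observed gap with 20 / odds_round.
    print(odds, round(gap, 6), round(20 / odds, 6))
```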
13. Also checking the return rate of the provisional second model (first_model_label1)
Run this in preparation for exporting the model.
for dataset_dict in target_dataset_mapping.values():
    target_lgbm_model_manager.set_bet_column(dataset_dict)
_, dfbetva_target, dfbette_target = target_lgbm_model_manager.merge_dataframe_data(target_dataset_mapping, mode=True)
dfbetva_target, dfbette_target = target_lgbm_model_manager.generate_profit_loss(dfbetva_target, dfbette_target, bet_mode)
dfbette_target[["raceDate", "raceId", "label", "favorite", bet_column, pl_column]]
2024-04-17 23:59:02.176 | INFO | src.model_manager.base_manager:__save_profit_loss:646 - Save profit loss data. save_path: ..\models\first_model_label1\analyze\tan\profit_loss
2024-04-17 23:59:02.178 | INFO | src.model_manager.base_manager:set_keyvalue_to_export_mapping:139 - Set Export info. key=profit_loss_dir, val={'tan': 'e:\\keiba_dev\\keiba_ai\\notebook\\..\\models\\first_model_label1\\analyze\\tan\\profit_loss'}
 | raceDate | raceId | label | favorite | bet_tan | pl_tan |
---|---|---|---|---|---|---|
383411 | 2018-01-06 | 201806010101 | 1 | 1 | 1 | 100.0 |
383425 | 2018-01-06 | 201806010102 | 6 | 1 | 1 | -100.0 |
383430 | 2018-01-06 | 201806010103 | 2 | 1 | 1 | -100.0 |
383449 | 2018-01-06 | 201806010104 | 2 | 1 | 1 | -100.0 |
383468 | 2018-01-06 | 201806010105 | 1 | 1 | 1 | 40.0 |
… | … | … | … | … | … | … |
659505 | 2023-12-28 | 202309050908 | 1 | 1 | 1 | 170.0 |
659528 | 2023-12-28 | 202309050909 | 1 | 1 | 1 | 250.0 |
659529 | 2023-12-28 | 202309050910 | 4 | 1 | 1 | -100.0 |
659549 | 2023-12-28 | 202309050911 | 2 | 1 | 1 | -100.0 |
659559 | 2023-12-28 | 202309050912 | 3 | 1 | 1 | -100.0 |
19957 rows × 6 columns
14. Exporting model information for the um-AI analysis platform
Use the lgbm_model_manager.export_model_info and target_lgbm_model_manager.export_model_info methods to export the model information of each model.
lgbm_model_manager.export_model_info()
target_lgbm_model_manager.export_model_info()
2024-04-17 23:59:02.254 | INFO | src.model_manager.base_manager:export_model_info:847 - Export Model info json. export path: E:\keiba_dev\keiba_ai\models\first_model\model_info.json
2024-04-17 23:59:02.258 | INFO | src.model_manager.base_manager:export_model_info:847 - Export Model info json. export path: ..\models\first_model_label1\model_info.json
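Export file
The model_info.json file, which manages the exported model information, holds the following items.
Item | Description |
---|---|
model_id | Model ID. Normally the name of the folder that holds the model parameters and inference results, i.e. the last folder name in the model_dir argument given when instantiating the model management and analysis class. |
model_type | Model type. Currently only 'lightGBM' is available. Other models such as NN (neural networks) and RL (reinforcement learning) are planned. |
model_dir | Model directory. The path of the directory that manages the model information, i.e. the value of the model_dir argument given when instantiating the model management and analysis class. |
model_analyze_dir | Model analysis results directory. Path to the directory that holds the raw data for the basic analysis and the odds graphs. |
model_predict_dir | Model inference results directory. Path to the directory that holds the model's inference result data. |
bet_columns_map | Mapping from each ticket type the model targets to the name of the column holding the bet data (JSON format) |
pl_column_map | Mapping from each ticket type the model targets to the name of the column holding the profit and loss data (JSON format) |
return_hit_rate_file | Mapping from each ticket type the model targets to the path of the return rate and hit rate summary file (JSON format) |
fav_bet_num_dir | Mapping from each ticket type the model targets to the path of the bet counts by popularity rank (JSON format) |
profit_loss_dir | Mapping from each ticket type the model targets to the path of the per-ticket profit and loss results (JSON format) |
odds_graph_file | Mapping from each ticket type the model targets to the path of the odds graph file (JSON format) |
confidence_column | Confidence column of the inference results |
confidence_rank_column | Confidence rank column of the inference results |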
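Software that consumes the export could read these items back with the standard json module. The sketch below is only an illustration: it assumes model_info.json is a flat JSON object keyed by the item names above, and it writes a small sample file to a temporary directory so that it runs on its own. The sample values are taken from the log output in this notebook; `load_model_info` is a hypothetical helper, not part of the published class.

```python
import json
import tempfile
from pathlib import Path

# Sample content using a subset of the items above (values taken from the logs).
# Assumption: the real file is a flat JSON object with these keys.
sample_info = {
    "model_id": "first_model",
    "model_type": "lightGBM",
    "bet_columns_map": {"tan": "bet_tan"},
    "pl_column_map": {"tan": "pl_tan"},
    "confidence_column": "pred_prob",
    "confidence_rank_column": "pred_rank",
}


def load_model_info(path):
    """Load a model_info.json file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    info_path = Path(tmp) / "model_info.json"
    info_path.write_text(json.dumps(sample_info, ensure_ascii=False), encoding="utf-8")
    info = load_model_info(info_path)

# Look up the columns for win (tan) tickets through the mappings.
tan_bet_column = info["bet_columns_map"]["tan"]
tan_pl_column = info["pl_column_map"]["tan"]
print(info["model_id"], info["model_type"], tan_bet_column, tan_pl_column)
```

Going through the mappings instead of hard-coding names such as bet_tan keeps the consuming code independent of which ticket types a given model targets.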