# xgboost_optuna
## To get started

### Dynamically pull and run

```python
from hamilton import dataflows, driver

# downloads into ~/.hamilton/dataflows and loads the module
# WARNING: ensure you know what code you're importing!
xgboost_optuna = dataflows.import_module("xgboost_optuna", "zilto")

dr = (
    driver.Builder()
    .with_config({})  # replace with configuration as appropriate
    .with_modules(xgboost_optuna)
    .build()
)

# If you have sf-hamilton[visualization] installed, you can see the dataflow graph.
# In a notebook this will show an image; otherwise pass in arguments to save to a file.
# dr.display_all_functions()

# Execute the dataflow, specifying what you want back. Returns a dictionary.
result = dr.execute(
    [xgboost_optuna.CHANGE_ME, ...],  # this specifies what you want back
    inputs={...}  # pass in inputs as appropriate
)
```
### Use published library version

```bash
pip install sf-hamilton-contrib --upgrade  # make sure you have the latest
```

```python
from hamilton import dataflows, driver

# Make sure you've done - `pip install sf-hamilton-contrib --upgrade`
from hamilton.contrib.user.zilto import xgboost_optuna

dr = (
    driver.Builder()
    .with_config({})  # replace with configuration as appropriate
    .with_modules(xgboost_optuna)
    .build()
)

# If you have sf-hamilton[visualization] installed, you can see the dataflow graph.
# In a notebook this will show an image; otherwise pass in arguments to save to a file.
# dr.display_all_functions()

# Execute the dataflow, specifying what you want back. Returns a dictionary.
result = dr.execute(
    [xgboost_optuna.CHANGE_ME, ...],  # this specifies what you want back
    inputs={...}  # pass in inputs as appropriate
)
```
### Modify for your needs

Now, if you want to modify the dataflow, you can copy it to a new folder (renaming is possible) and modify it there.

```python
dataflows.copy(xgboost_optuna, "path/to/save/to")
```
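For instance, assuming you copied it into a local package `my_dataflows` (a hypothetical name), you can import and run your edited copy like any other Hamilton module:

```python
from hamilton import driver

# import your local, edited copy (hypothetical package path)
from my_dataflows import xgboost_optuna as xgboost_optuna_local

dr = (
    driver.Builder()
    .with_config({})  # replace with configuration as appropriate
    .with_modules(xgboost_optuna_local)
    .build()
)
```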
## Purpose of this module

This module implements a dataflow to train an XGBoost model with hyperparameter tuning using Optuna. You give it 2D arrays for `X_train`, `y_train`, `X_test`, and `y_test`, and you are good to go!
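For illustration, here is a minimal sketch that wires real data through the dataflow using scikit-learn's breast cancer dataset. The requested output `best_model` is a hypothetical name, not a confirmed node of this module; call `dr.list_available_variables()` to see what it actually exposes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from hamilton import dataflows, driver

# load a small binary classification dataset and split it
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# pull the dataflow and build a driver for the classification task
xgboost_optuna = dataflows.import_module("xgboost_optuna", "zilto")
dr = (
    driver.Builder()
    .with_config({"task": "classification"})  # uses xgboost.XGBClassifier
    .with_modules(xgboost_optuna)
    .build()
)

# "best_model" is a hypothetical output name used for illustration;
# call dr.list_available_variables() to see the module's actual nodes.
result = dr.execute(
    ["best_model"],
    inputs={
        "X_train": X_train,
        "y_train": y_train,
        "X_test": X_test,
        "y_test": y_test,
    },
)
```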
## Configuration Options

The Hamilton driver can be configured with the following options:
- `{"task": "classification"}` to use `xgboost.XGBClassifier`.
- `{"task": "regression"}` to use `xgboost.XGBRegressor`.
There are several relevant inputs and override points; a combined sketch follows these lists.

Inputs:
- `model_config_override`: pass a dictionary to override the XGBoost default config. Warning: passing `model_config_override = {"objective": "binary:logistic"}` to an `XGBRegressor` effectively changes it into an `XGBClassifier`.
- `optuna_distributions_override`: pass a dictionary of Optuna distributions to define the hyperparameter search space.

Overrides:
- `base_model`: can be changed to the type `xgboost.XGBRanker` for a ranking task, or to `xgboost.dask.DaskXGBClassifier` to support Dask.
- `scoring_func`: can be any `sklearn.metrics` function that accepts `y_true` and `y_pred` as arguments. Remember to set `higher_is_better` accordingly for the optimization task.
- `cross_validation_folds`: can be any sequence of tuples defining (`train_index`, `validation_index`) pairs to train the model with cross-validation over `X_train`.
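Here is a hedged sketch of passing these at execution time. It reuses `dr` and the data arrays from the earlier example, and again assumes the hypothetical output node `best_model`; Hamilton's `overrides=` argument replaces a node's computed value with the one you supply.

```python
import optuna
from sklearn.metrics import f1_score
from xgboost import XGBClassifier

result = dr.execute(
    ["best_model"],  # hypothetical output node, for illustration
    inputs={
        "X_train": X_train,
        "y_train": y_train,
        "X_test": X_test,
        "y_test": y_test,
        # narrow the default hyperparameter search space
        "optuna_distributions_override": {
            "max_depth": optuna.distributions.IntDistribution(2, 6),
            "learning_rate": optuna.distributions.FloatDistribution(1e-3, 0.3, log=True),
        },
    },
    overrides={
        "base_model": XGBClassifier,  # or xgboost.XGBRanker, xgboost.dask.DaskXGBClassifier
        "scoring_func": f1_score,     # any sklearn.metrics function taking (y_true, y_pred)
        "higher_is_better": True,     # f1_score improves as it increases
    },
)
```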
## Limitations

- It is difficult to adapt for distributed Optuna hyperparameter search.
- The current structure makes it difficult to add custom training callbacks to the XGBoost model (this can be done to some extent via `model_config_override`).
## Source code

- `__init__.py`
## Requirements

- `numpy`
- `optuna`
- `pandas`
- `scikit-learn`
- `sf-hamilton[visualization]`
- `xgboost`