I am trying to run cross-validation on a multiclass classification problem with XGBClassifier, using the following code adapted from http://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/
import numpy as np
import pandas as pd
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn import cross_validation, metrics
from sklearn.grid_search import GridSearchCV
import matplotlib.pyplot as plt  # needed for plt.ylabel below
def modelFit(alg, X, y, useTrainCV=True, cvFolds=5, early_stopping_rounds=50):
    if useTrainCV:
        # Use xgb.cv to pick the number of boosting rounds
        xgbParams = alg.get_xgb_params()
        xgTrain = xgb.DMatrix(X, label=y)
        cvresult = xgb.cv(xgbParams, xgTrain,
                          num_boost_round=alg.get_params()['n_estimators'],
                          nfold=cvFolds, stratified=True, metrics={'mlogloss'},
                          early_stopping_rounds=early_stopping_rounds, seed=0,
                          callbacks=[xgb.callback.print_evaluation(show_stdv=False),
                                     xgb.callback.early_stop(3)])
        print(cvresult)
        alg.set_params(n_estimators=cvresult.shape[0])
    # Fit the algorithm
    alg.fit(X, y, eval_metric='mlogloss')
    # Predict on the training set
    dtrainPredictions = alg.predict(X)
    dtrainPredProb = alg.predict_proba(X)
    # Print model report:
    print("\nModel Report")
    print("Classification report: \n")
    print(metrics.classification_report(y, dtrainPredictions))
    print("Accuracy : %.4g" % metrics.accuracy_score(y, dtrainPredictions))
    print("Log Loss score (Train): %f" % metrics.log_loss(y, dtrainPredProb))
    # Plot feature importances
    feat_imp = pd.Series(alg.booster().get_fscore()).sort_values(ascending=False)
    feat_imp.plot(kind='bar', title='Feature Importances')
    plt.ylabel('Feature Importance score')
# 1) Read training set
print('>> Read training set')
train = pd.read_csv(trainFile)
# 2) Extract target attribute and convert to numeric
print('>> Preprocessing')
y_train = train['OutcomeType'].values
le_y = LabelEncoder()
y_train = le_y.fit_transform(y_train)
train.drop('OutcomeType',axis=1,inplace=True)
# 4) Extract features and target from training set
X_train = train.values
# 5) First classifier
# Named xgb1 so that it does not shadow the xgb module used inside modelFit
xgb1 = XGBClassifier(learning_rate=0.1, n_estimators=1000, max_depth=5,
                     min_child_weight=1, gamma=0, subsample=0.8,
                     colsample_bytree=0.8, scale_pos_weight=1,
                     objective='multi:softprob', seed=27)
modelFit(xgb1, X_train, y_train)
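where y_train contains labels from 0 to 4. However, when I run this code, the xgb.cv function raises the following error: xgboost.core.XGBoostError: value 0 for Parameter num_class should be greater equal to 1. In the XGBoost documentation I read that in the multiclass case xgb infers the number of classes from the labels in the target vector, so I don't understand what is going on.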
Best answer
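A likely explanation, offered as an assumption rather than a confirmed diagnosis: the scikit-learn wrapper seems to infer the number of classes only inside fit(), so the dictionary returned by get_xgb_params() before fitting may not contain num_class, and the native xgb.cv function then sees the default value of 0. The sketch below shows one way to add the parameter by hand. The helper with_num_class is hypothetical (it is not part of xgboost), and it assumes the labels are the contiguous integers 0..K-1 that LabelEncoder produces.

```python
def with_num_class(params, y):
    """Return a copy of params with num_class derived from the labels in y.

    Hypothetical helper, not part of xgboost. Assumes y holds integer
    labels 0..K-1, as produced by LabelEncoder.
    """
    updated = dict(params)  # copy so the caller's dict is left untouched
    updated['num_class'] = len(set(int(label) for label in y))
    return updated


# A plain dict stands in here for the result of alg.get_xgb_params()
params = with_num_class({'objective': 'multi:softprob'}, [0, 1, 2, 3, 4, 4, 0])
print(params['num_class'])  # prints 5
```

Inside modelFit this would amount to replacing the first line of the if block with xgbParams = with_num_class(alg.get_xgb_params(), y) before the call to xgb.cv. Setting xgbParams['num_class'] = 5 directly would be equivalent for the labels 0 to 4 described in the question.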