I am trying to perform recursive feature elimination with cross-validation (RFECV) using SVC as the classifier, with hyperparameter tuning via GridSearchCV, as shown below.
My code is as follows.
from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC

X = df[my_features]
y = df['gold_standard']

x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)
k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

clf = SVC(class_weight="balanced")
rfecv = RFECV(estimator=clf, step=1, cv=k_fold, scoring='roc_auc')

param_grid = {'estimator__C': [0.001, 0.01, 0.1, 0.25, 0.5, 0.75, 1.0, 10.0, 100.0, 1000.0],
              'estimator__gamma': [0.001, 2.0, 3.0],
              'estimator__kernel': ('rbf', 'sigmoid', 'poly')}

CV_rfc = GridSearchCV(estimator=rfecv, param_grid=param_grid, cv=k_fold, scoring='roc_auc', verbose=10)
CV_rfc.fit(x_train, y_train)
However, I get an error message: RuntimeError: The classifier does not expose "coef_" or "feature_importances_" attributes.
Is there a way to resolve this error? If not, what other feature selection techniques can I use with SVC?
I am happy to provide more details if needed.
Best answer
For more feature selection implementations, you can take a look at:
https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection
For example, the following snippet (based on a scikit-learn example that combines k-best feature selection with an SVC) applies SelectKBest inside a pipeline.
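One option from that module that works with SVMs is SelectFromModel driven by an L1-penalized linear SVM, whose sparse coefficients zero out weak features. A minimal sketch (the C value is an arbitrary illustration, and the iris data just stands in for your DataFrame):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# L1 penalty drives some coefficients to exactly zero; those
# features are then dropped by SelectFromModel.
lsvc = LinearSVC(C=0.01, penalty="l1", dual=False, max_iter=5000).fit(X, y)
model = SelectFromModel(lsvc, prefit=True)
X_new = model.transform(X)
print(X_new.shape)  # fewer columns than the original X
```

The selected feature mask is available via `model.get_support()`, so you can map the reduced matrix back to column names.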
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

# Maybe some of the original features were good, too?
selection = SelectKBest()

# Build the SVC
svm = SVC(kernel="linear")

# Do grid search over k and C:
pipeline = Pipeline([("features", selection), ("svm", svm)])
param_grid = dict(features__k=[1, 2], svm__C=[0.1, 1, 10])

grid_search = GridSearchCV(pipeline, param_grid=param_grid, cv=5, verbose=10)
grid_search.fit(X, y)
print(grid_search.best_estimator_)
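As for the original RuntimeError: RFECV ranks features using the estimator's `coef_` or `feature_importances_`, and SVC only exposes `coef_` with a linear kernel, so RFE cannot work with 'rbf', 'sigmoid', or 'poly'. A minimal sketch showing that RFECV runs once the kernel is linear (iris data and the scoring/CV settings here are illustrative placeholders, not your original setup):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# A linear kernel exposes coef_, which RFECV needs to rank features.
clf = SVC(kernel="linear", class_weight="balanced")
rfecv = RFECV(estimator=clf, step=1,
              cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
              scoring="accuracy")
rfecv.fit(X, y)
print(rfecv.support_)  # boolean mask of the selected features
```

If you need a non-linear kernel, keep RFE out of the loop and use a selector that does not rely on coefficients, such as SelectKBest above.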