This post summarizes sklearn.pipeline.Pipeline.

1. The sklearn.pipeline.Pipeline class

Official documentation: http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html

class sklearn.pipeline.Pipeline(steps)

The official description reads:

Pipeline of transforms with a final estimator.

Sequentially apply a list of transforms and a final estimator. Intermediate steps of the pipeline must be 'transforms', that is, they must implement fit and transform methods. The final estimator only needs to implement fit.

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a '__', as in the example below.

Explanation: a pipeline bundles several steps so that they can be cross-validated as a single unit while different parameter values are tried. To address a parameter of one particular step, join the step's name and the parameter name with a double underscore '__'. For example, 'anova__k' refers to parameter k of the step named 'anova'.

Parameters:
steps: list
List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator.
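The '__' naming convention is what makes "cross-validate several steps together while setting different parameters" practical. The sketch below assumes a recent scikit-learn in which GridSearchCV lives in sklearn.model_selection. It reuses the kind of synthetic data from the official example and searches over one parameter of each step. The grid values (k of 5 or 10, C of 0.1 or 1.0) are arbitrary illustrative choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)

pipe = Pipeline([
    ("anova", SelectKBest(f_regression)),
    ("svc", SVC(kernel="linear")),
])

# Grid keys follow the "<step name>__<parameter name>" convention.
# The values here are illustrative only.
param_grid = {
    "anova__k": [5, 10],
    "svc__C": [0.1, 1.0],
}

# Each candidate setting is cross-validated as a whole pipeline:
# the selector is refit on every training fold together with the SVC.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))

# get_params() lists every tunable name, including the "__" ones.
print("anova__k" in pipe.get_params())
```

The search refits the entire pipeline on each training fold, so the feature selector never sees the held-out fold. That is the main reason to tune it inside a pipeline instead of selecting features once on all the data.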
Note: the steps parameter is a list of (name, transform) tuples. The last tuple holds the final estimator, which is the model we actually train. The tuples before it are the intermediate transform steps, and each of them must implement fit and transform.

Here is an example from the official documentation:

    #!/usr/bin/env python
    from sklearn import svm
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest
    from sklearn.feature_selection import f_regression
    from sklearn.pipeline import Pipeline

    # generate some data to play with
    X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)
    print(X)
    print(y)

    # ANOVA SVM-C
    anova_filter = SelectKBest(f_regression, k=5)
    print(anova_filter)
    clf = svm.SVC(kernel='linear')  # the model we choose to train
    anova_svm = Pipeline([('anova', anova_filter), ('svc', clf)])

    # You can set the parameters using the names issued
    # For instance, fit using a k of 10 in the SelectKBest
    # and a parameter 'C' of the svm
    anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)  # '<step>__<param>' addresses one step's parameter
    print(anova_svm.named_steps)  # dict-like mapping from step name to step object
    print(type(anova_svm))

    prediction = anova_svm.predict(X)
    score = anova_svm.score(X, y)
    print(prediction, type(prediction))
    print(score)

The output was captured under Python 2 with an older scikit-learn release, so newer versions print some parts differently, such as the SVC repr and the type of the prediction array:

    X
    [[-2.70323229 0.67787532 -0.65407568 ..., 0.18958162 0.50109417 2.41185611]
     [-0.30777823 0.21915033 0.24938368 ..., 0.64548418 0.74625357 1.33408391]
     [-0.25737654 -1.66858407 0.39922312 ..., 0.61351797 0.12003133 -0.22989455]
     ...,
     [-0.01530985 0.5792915 0.11958037 ..., -1.47891157 0.39180401 0.21434039]
     [-1.33123295 -1.83620537 0.50799133 ..., 0.95670232 0.70810868 -2.14387014]
     [-1.31183623 -1.06511366 -0.3052247 ..., 0.55781031 1.39020755 -1.58909265]]
    Y
    [1 0 1 1 1 0 0 0 1 1 0 1 0 1 1 1 0 1 1 0 0 1 0 1 0 1 0 1 0 1 1 0 0 0 1 1 1 0 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1 0 0 1 0 0 1 0 0 0 1 1 1 1 1 0 1 1 0 1 1 1 1 0 0 0 1 0 1 1]
    anova_filter: SelectKBest(k=5, score_func=<function f_regression at 0xaa05e9c>)
    anova_svm.named_steps: {'svc': SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0, kernel='linear', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False), 'anova': SelectKBest(k=10, score_func=<function f_regression at 0xaa05e9c>)}
    type(anova_svm)= <class 'sklearn.pipeline.Pipeline'>
    prediction= [0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 0 1 0 1 1 1 0 1 0 0 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 0 1 0 1 0 1 1 0 1 1 1 0 0 1 1 0 0 0 1 0 1 1 0 0 1 1 1 1 0 0 1 0 0 1 1 1 1 1 0 0 1 0 1 1] <type 'numpy.ndarray'>
    score= 0.77

The example above uses these methods:

set_params(**params): sets parameter values on the named steps, using the 'stepname__param' syntax. It returns the pipeline itself, which is why .fit(X, y) can be chained onto it.
predict(*args, **kwargs): Applies transforms to the data, and the predict method of the final estimator. In other words, it produces predictions from the fitted pipeline.
score(*args, **kwargs): Applies transforms to the data, and the score method of the final estimator. In other words, it scores the final result; for an SVC this is the mean accuracy.
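To make the predict and score descriptions concrete, the sketch below rebuilds the same pipeline and checks the claim directly. Calling predict or score on the pipeline should match running X through the fitted transform step by hand and then calling the final estimator. It assumes a current scikit-learn, and the names X_reduced, manual_pred and manual_score are only illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)

anova_svm = Pipeline([
    ("anova", SelectKBest(f_regression, k=5)),
    ("svc", SVC(kernel="linear")),
])
# set_params returns the pipeline itself, so fit can be chained onto it.
anova_svm.set_params(anova__k=10, svc__C=0.1).fit(X, y)

# What Pipeline.predict does: transform X with the fitted intermediate
# step(s), then call predict on the final estimator.
X_reduced = anova_svm.named_steps["anova"].transform(X)
manual_pred = anova_svm.named_steps["svc"].predict(X_reduced)
print(np.array_equal(manual_pred, anova_svm.predict(X)))

# Pipeline.score delegates the same way, to SVC.score (mean accuracy).
manual_score = anova_svm.named_steps["svc"].score(X_reduced, y)
print(manual_score == anova_svm.score(X, y))

# k=10 was set via anova__k, so 10 of the 20 features remain.
print(X_reduced.shape)
```

Both comparisons should print True, because the pipeline performs exactly these calls internally. The pipeline version simply guarantees that the same fitted transforms are always applied before the model.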