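I am new to Python and TensorFlow. I recently started working through the TensorFlow examples and came across this one: https://www.tensorflow.org/versions/r0.10/tutorials/wide_and_deep/index.html
I get the error TypeError: argument of type 'float' is not iterable, and I believe the problem is in the following line of code:
df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
(income_bracket is the label column of the census dataset; '>50K' is one of the possible label values and the other is '<=50K'. The dataset is read into df_train. The tutorial explains the line above as follows: "Since the task is a binary classification problem, we'll construct a label column named "label" whose value is 1 if the income is over 50K, and 0 otherwise.")
If anyone can explain what exactly is happening and how I can fix it, that would be great. I tried both Python 2.7 and Python 3.4, and I don't think the problem is the language version. Also, if anyone knows of good tutorials for newcomers to TensorFlow and pandas, please share the links.
Full program: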
import pandas as pd
import urllib
import tempfile
import tensorflow as tf
gender = tf.contrib.layers.sparse_column_with_keys(column_name="gender",keys=["female","male"])
race = tf.contrib.layers.sparse_column_with_keys(column_name="race",keys=["Amer-Indian-Eskimo","Asian-Pac-Islander","Black","Other","White"])
education = tf.contrib.layers.sparse_column_with_hash_bucket("education",hash_bucket_size=1000)
marital_status = tf.contrib.layers.sparse_column_with_hash_bucket("marital_status",hash_bucket_size=100)
relationship = tf.contrib.layers.sparse_column_with_hash_bucket("relationship",hash_bucket_size=100)
workclass = tf.contrib.layers.sparse_column_with_hash_bucket("workclass",hash_bucket_size=100)
occupation = tf.contrib.layers.sparse_column_with_hash_bucket("occupation",hash_bucket_size=1000)
native_country = tf.contrib.layers.sparse_column_with_hash_bucket("native_country",hash_bucket_size=1000)
age = tf.contrib.layers.real_valued_column("age")
age_buckets = tf.contrib.layers.bucketized_column(age,boundaries=[18,25,30,35,40,45,50,55,60,65])
education_num = tf.contrib.layers.real_valued_column("education_num")
capital_gain = tf.contrib.layers.real_valued_column("capital_gain")
capital_loss = tf.contrib.layers.real_valued_column("capital_loss")
hours_per_week = tf.contrib.layers.real_valued_column("hours_per_week")
wide_columns = [
    gender, native_country, education, occupation, workclass, marital_status, relationship, age_buckets,
    tf.contrib.layers.crossed_column([education, occupation], hash_bucket_size=int(1e4)),
    tf.contrib.layers.crossed_column([native_country, occupation], hash_bucket_size=int(1e4)),
    tf.contrib.layers.crossed_column([age_buckets, race, occupation], hash_bucket_size=int(1e6))]
deep_columns = [
    tf.contrib.layers.embedding_column(workclass, dimension=8),
    tf.contrib.layers.embedding_column(education, dimension=8),
    tf.contrib.layers.embedding_column(marital_status, dimension=8),
    tf.contrib.layers.embedding_column(gender, dimension=8),
    tf.contrib.layers.embedding_column(relationship, dimension=8),
    tf.contrib.layers.embedding_column(race, dimension=8),
    tf.contrib.layers.embedding_column(native_country, dimension=8),
    tf.contrib.layers.embedding_column(occupation, dimension=8),
    age, education_num, capital_gain, capital_loss, hours_per_week]
model_dir = tempfile.mkdtemp()
m = tf.contrib.learn.DNNLinearCombinedClassifier(
    model_dir=model_dir,
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50])
COLUMNS = ["age","workclass","fnlwgt","education","education_num","marital_status","occupation","relationship","race","gender","capital_gain","capital_loss","hours_per_week","native_country","income_bracket"]
LABEL_COLUMN = 'label'
CATEGORICAL_COLUMNS = ["workclass","native_country"]
CONTINUOUS_COLUMNS = ["age","hours_per_week"]
train_file = tempfile.NamedTemporaryFile()
test_file = tempfile.NamedTemporaryFile()
urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",train_file.name)
urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test",test_file.name)
df_train = pd.read_csv(train_file,names=COLUMNS,skipinitialspace=True)
df_test = pd.read_csv(test_file,skipinitialspace=True,skiprows=1)
df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
df_test[LABEL_COLUMN] = (df_test['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
def input_fn(df):
    # Continuous features become dense constant tensors.
    continuous_cols = {k: tf.constant(df[k].values)
                       for k in CONTINUOUS_COLUMNS}
    # Categorical features become sparse tensors with one value per row.
    categorical_cols = {k: tf.SparseTensor(
        indices=[[i, 0] for i in range(df[k].size)],
        values=df[k].values,
        shape=[df[k].size, 1])
                        for k in CATEGORICAL_COLUMNS}
    # Python 2 only: items() returns lists, so the two can be concatenated.
    feature_cols = dict(continuous_cols.items() + categorical_cols.items())
    label = tf.constant(df[LABEL_COLUMN].values)
    return feature_cols, label
def train_input_fn():
    return input_fn(df_train)
def eval_input_fn():
    return input_fn(df_test)
m.fit(input_fn=train_input_fn,steps=200)
results = m.evaluate(input_fn=eval_input_fn,steps=1)
for key in sorted(results):
    print("%s: %s" % (key, results[key]))
Thanks
PS: Full stack trace of the error:
Traceback (most recent call last):
File "/home/jaspreet/PycharmProjects/TicTacTensorFlow/census.py",line 73,in
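Best answer
If you inspect adult.test, you can see that the first row of data has NaN in the income_bracket field (the file starts with the line |1x3 Cross validator, which is not a data record). pandas reads that missing value as a float NaN, and evaluating '>50K' in x on a float raises the TypeError from the question.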
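To see why a NaN in that column produces this exact error, here is a minimal sketch. The sample rows are made up for illustration and only mimic the layout of adult.test, and the three-entry COLS list is a shortened, hypothetical stand-in for COLUMNS. It is written for Python 3 and a recent pandas rather than the Python 2 setup in the question.

```python
import io
import pandas as pd

# Hypothetical, shortened stand-in for COLUMNS.
COLS = ["age", "workclass", "income_bracket"]

# Made-up sample: a one-field note line followed by two records.
raw = (
    "|1x3 Cross validator\n"
    "25, Private, <=50K.\n"
    "44, Private, >50K.\n"
)

df = pd.read_csv(io.StringIO(raw), names=COLS, skipinitialspace=True)

# The note line has a single field, so pandas fills the remaining
# columns of that row with NaN, which is a float.
first_label = df["income_bracket"].iloc[0]

try:
    labels = df["income_bracket"].apply(lambda x: ">50K" in x)
    error_message = None
except TypeError as exc:
    # The "in" operator needs a container on the right-hand side; a float is not one.
    error_message = str(exc)

print(type(first_label), error_message)
```

The exact wording of the message can differ between Python versions, but the exception type is TypeError in each case.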
I also checked that this is the only row that contains NaN:
ib = df_test["income_bracket"]
t = type('12')  # str
for idx, i in enumerate(ib):
    if type(i) != t:
        print idx, type(i)
Result: 0 <type 'float'>
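The same check can be done without an explicit loop. The sketch below assumes Python 3 and a recent pandas, and uses a made-up Series in place of df_test["income_bracket"]; isna() flags the missing entries, and the index of the flagged rows plays the role of idx in the loop above.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df_test["income_bracket"]: the first entry is missing.
ib = pd.Series([np.nan, "<=50K.", ">50K.", "<=50K."])

# Boolean mask of missing entries, then the index labels where it is True.
missing_rows = ib.index[ib.isna()].tolist()

print(missing_rows)  # [0]
```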
So you can skip this line when reading the file:
df_test = pd.read_csv(test_file, names=COLUMNS, skipinitialspace=True, skiprows=1)
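As a sketch of this fix on made-up data (again with a shortened, hypothetical column list, and assuming Python 3 with a recent pandas): skiprows=1 drops the note line before parsing, so every label is a string. If dropping the line is not an option, Series.str.contains with na=False is one possible alternative that maps a missing label to 0 instead of raising.

```python
import io
import pandas as pd

# Hypothetical, shortened stand-in for COLUMNS.
COLS = ["age", "workclass", "income_bracket"]

# Made-up sample: a one-field note line followed by two records.
raw = (
    "|1x3 Cross validator\n"
    "25, Private, <=50K.\n"
    "44, Private, >50K.\n"
)

# Fix from the answer: skip the first line of the file.
df_skip = pd.read_csv(io.StringIO(raw), names=COLS,
                      skipinitialspace=True, skiprows=1)
labels_skip = df_skip["income_bracket"].apply(lambda x: ">50K" in x).astype(int)

# Alternative: keep the line and let pandas treat the missing label as False.
df_all = pd.read_csv(io.StringIO(raw), names=COLS, skipinitialspace=True)
labels_all = (df_all["income_bracket"]
              .str.contains(">50K", regex=False, na=False)
              .astype(int))

print(labels_skip.tolist())  # [0, 1]
print(labels_all.tolist())   # [0, 0, 1]
```

Note that the alternative leaves the note line in the frame as an extra row labeled 0, so for this dataset skipping the line is the cleaner choice.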