Let’s verify that the data still looks good. Displaying a sample of the labels and images from the ndarray. Hint: you can use matplotlib.pyplot.
This is what I tried:
    import random
    rand_smpl = [train_datasets[i] for i in sorted(random.sample(xrange(len(train_datasets)), 1))]
    print(rand_smpl)
    filename = rand_smpl[0]
    import pickle
    loaded_pickle = pickle.load(open(filename, "r"))
    image_size = 28  # Pixel width and height.
    import numpy as np
    dataset = np.ndarray(shape=(len(loaded_pickle), image_size, image_size), dtype=np.float32)
    import matplotlib.pyplot as plt
    plt.plot(dataset[2])
    plt.ylabel('some numbers')
    plt.show()
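But this is what I got:
This doesn't make much sense. To be honest, my code might not either, since I'm not sure how to approach this problem!
The pickles were created like this: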
    import os
    import numpy as np
    from scipy import ndimage

    image_size = 28  # Pixel width and height.
    pixel_depth = 255.0  # Number of levels per pixel.

    def load_letter(folder, min_num_images):
        """Load the data for a single letter label."""
        image_files = os.listdir(folder)
        # One (image_size x image_size) slot per file; unreadable files are trimmed off below.
        dataset = np.ndarray(shape=(len(image_files), image_size, image_size),
                             dtype=np.float32)
        print(folder)
        num_images = 0
        for image in image_files:
            image_file = os.path.join(folder, image)
            try:
                # Shift and scale pixel values from [0, 255] to [-0.5, 0.5].
                image_data = (ndimage.imread(image_file).astype(float) -
                              pixel_depth / 2) / pixel_depth
                if image_data.shape != (image_size, image_size):
                    raise Exception('Unexpected image shape: %s' % str(image_data.shape))
                dataset[num_images, :, :] = image_data
                num_images = num_images + 1
            except IOError as e:
                print('Could not read:', image_file, ':', e, '- it\'s ok, skipping.')

        # Keep only the slots that were actually filled.
        dataset = dataset[0:num_images, :, :]
        if num_images < min_num_images:
            raise Exception('Many fewer images than expected: %d < %d' %
                            (num_images, min_num_images))

        print('Full dataset tensor:', dataset.shape)
        print('Mean:', np.mean(dataset))
        print('Standard deviation:', np.std(dataset))
        return dataset
    dataset = load_letter(folder, min_num_images_per_class)
    try:
        with open(set_filename, 'wb') as f:
            pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)
    except Exception as e:
        print('Unable to save data to', set_filename, ':', e)
The idea here is:
Now let’s load the data in a more manageable format. Since, depending on your computer setup, you might not be able to fit it all in memory, we’ll load each class into a separate dataset, store them on disk and curate them independently. Later we’ll merge them into a single dataset of manageable size.
We’ll convert the entire dataset into a 3D array (image index, x, y) of floating point values, normalized to have approximately zero mean and standard deviation ~0.5 to make training easier down the road.
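To make the normalization concrete, here is a minimal sketch of the same arithmetic that `load_letter` applies, run on a made-up 8-bit image rather than a real notMNIST file. The array `raw` is purely hypothetical test data.

```python
import numpy as np

pixel_depth = 255.0  # Number of levels per pixel, as in the code above.

# Hypothetical 8-bit image that covers every value from 0 to 255 once.
raw = np.arange(256, dtype=np.uint8).reshape(16, 16)

# Same formula as in load_letter: shift by half the depth, then scale.
normalized = (raw.astype(float) - pixel_depth / 2) / pixel_depth

print("min:", normalized.min(), "max:", normalized.max())
print("mean:", normalized.mean())
```

Pixel values end up in the range [-0.5, 0.5], centered on zero, which is what the stored arrays should contain when they are loaded back.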
Solution
    # Define a function to convert a label to a letter
    def letter(i):
        return 'abcdefghij'[i]

    # You need "%matplotlib inline" to be able to show images in a Python notebook
    %matplotlib inline

    # Some random index in the range 0 to len(train_dataset) - 1
    sample_idx = np.random.randint(0, len(train_dataset))

    # Now we show it
    plt.imshow(train_dataset[sample_idx])
    plt.title("Char " + letter(train_labels[sample_idx]))
Your code actually changes the type of the dataset: what you plot is no longer an array of size (220000, 28, 28).
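Two things in the question's snippet probably explain the odd picture. First, `np.ndarray(shape=...)` allocates a new, uninitialized array, so `dataset[2]` contains arbitrary memory rather than anything read from the pickle. Second, `plt.plot` given a 2D array draws one curve per column instead of an image. The sketch below illustrates the second point on a made-up 28x28 array (the random `image` is a stand-in, not real data):

```python
import numpy as np
import matplotlib.pyplot as plt

image_size = 28

# Hypothetical stand-in for one normalized image, values in [-0.5, 0.5).
rng = np.random.default_rng(0)
image = rng.random((image_size, image_size), dtype=np.float32) - 0.5

fig, (ax_plot, ax_imshow) = plt.subplots(1, 2, figsize=(8, 4))

# plot() treats a 2D array as a set of curves: one line per column.
lines = ax_plot.plot(image)
ax_plot.set_title("plot: %d separate lines" % len(lines))

# imshow() treats the same array as a grid of pixels.
ax_imshow.imshow(image, cmap="gray")
ax_imshow.set_title("imshow: one image")
```

In a notebook with `%matplotlib inline` the figure appears automatically; in a plain script, call `plt.show()` afterwards.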
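In general, a pickle is a file that stores some objects, not the array itself. You should use the object from the pickle directly to get your training dataset (using the notation from your code snippet):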
    # Will give you train_dataset and labels
    train_dataset = loaded_pickle['train_dataset']
    train_labels = loaded_pickle['train_labels']
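Note that the dictionary keys above assume a merged pickle that stores a dict. If the file is instead one of the per-letter pickles written by the `load_letter` code in the question, the loaded object should already be the 3D array, and it can be shown directly. Here is a rough, self-contained sketch of that case; the temporary file `A.pickle` and the random `fake_letter` array are stand-ins invented for the example:

```python
import os
import pickle
import tempfile

import numpy as np
import matplotlib.pyplot as plt

image_size = 28

# Hypothetical stand-in for one per-letter pickle: 50 random "images".
rng = np.random.default_rng(0)
fake_letter = (rng.random((50, image_size, image_size)) - 0.5).astype(np.float32)
filename = os.path.join(tempfile.mkdtemp(), "A.pickle")
with open(filename, "wb") as f:
    pickle.dump(fake_letter, f, pickle.HIGHEST_PROTOCOL)

# Pickles are binary, so open with "rb" (plain "r" fails on Python 3).
with open(filename, "rb") as f:
    letter_set = pickle.load(f)

# Use the loaded array itself; do not allocate a new np.ndarray.
sample_ids = rng.choice(len(letter_set), size=8, replace=False)
fig, axes = plt.subplots(1, 8, figsize=(12, 2))
for ax, idx in zip(axes, sample_ids):
    ax.imshow(letter_set[idx], cmap="gray")
    ax.set_title(str(idx))
    ax.axis("off")
```

With real data, replace `filename` with one of the entries in `train_datasets`; every image in a given file then belongs to the same letter.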
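Update:
At @gsarmas's request, the link to my complete Assignment 1 solution is here.
The code is commented and mostly self-explanatory, but if you have any questions, feel free to contact me by any means available on GitHub.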