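0. Install PyCharm and Spark
Download PyCharm: http://www.jetbrains.com/pycharm/
Download Spark: http://spark.apache.org/
PS: a Java environment must be present on the system before installing PyCharm (Spark itself also runs on the JVM).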
1. Install py4j
$ sudo pip install py4j
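Before moving on, it can help to confirm that the prerequisites are visible to the interpreter PyCharm will use. The following is only a minimal sketch: the function name check_prerequisites is made up for illustration, and it only checks that a java executable is on PATH and that py4j can be imported, not that the versions are compatible with your Spark release.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report whether a `java` executable and the py4j package can be found.

    Hypothetical helper for illustration; it does not check versions.
    """
    return {
        # shutil.which returns None when the executable is not on PATH
        "java": shutil.which("java") is not None,
        # find_spec returns None when the top-level package is not installed
        "py4j": importlib.util.find_spec("py4j") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print("%s: %s" % (name, "found" if found else "missing"))
```

Run it with the same interpreter that is selected in PyCharm, since pip may have installed py4j into a different Python.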
2. Configure PyCharm
In Run/Debug Configurations, configure the settings as shown in the figure below.
After that, PySpark programs can be run from within PyCharm.
A quick test:
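The same kind of setup can also be done from inside a script instead of the Run/Debug dialog. This is a sketch under assumptions, not a reproduction of the settings in the figure: it assumes the configuration comes down to setting SPARK_HOME and making Spark's python directory (plus the py4j zip that Spark distributions ship under python/lib) importable. The function name configure_pyspark_paths and the path passed to it are hypothetical.

```python
import glob
import os
import sys


def configure_pyspark_paths(spark_home):
    """Point this interpreter at the PySpark sources under spark_home.

    Hypothetical helper: sets SPARK_HOME and prepends Spark's python
    directory and any bundled py4j zip to sys.path.
    Returns the list of paths that were newly added.
    """
    os.environ["SPARK_HOME"] = spark_home
    python_dir = os.path.join(spark_home, "python")
    # Assumption: the distribution bundles py4j as python/lib/py4j-*.zip
    py4j_zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*.zip"))
    added = []
    for path in [python_dir] + py4j_zips:
        if path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return added
```

Call it with the directory where Spark was unpacked, before the line that imports pyspark, for example configure_pyspark_paths("/path/to/spark") with the placeholder replaced by your own location. Setting the values in Run/Debug Configurations keeps scripts free of machine-specific paths, so the in-script variant is mostly useful for quick experiments.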
from pyspark import SparkContext

sc = SparkContext()
logData = sc.textFile("README.md").cache()
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()
print("Lines with a: %i,lines with b: %i" % (numAs, numBs))
Run result: