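I am trying to run a Scalding job on EMR. The Scala code is supposed to read the contents of a text file stored in an S3 bucket, but the scala.io.Source library mangles the S3 path.
I pass the runidfile argument to the EMR job: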
--runidfile s3://my-bucket/input.txt
The Scala code does the following:
val runid_path = args("runidfile")
val runid = Source.fromFile(runid_path).getLines().mkString
Running the job fails with the following exception:
Caused by: java.io.FileNotFoundException: s3:/my-bucket/input.txt (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at scala.io.Source$.fromFile(Source.scala:90)
at scala.io.Source$.fromFile(Source.scala:75)
at scala.io.Source$.fromFile(Source.scala:53)
at com.move.scalding.userEvents.RecommenderValidator.<init>(RecommenderValidator.scala:37)
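Solution
The scala.io.Source library is not designed to read files directly from Amazon S3. Source.fromFile treats its argument as a path on the local filesystem, which is why the s3:// URI ends up as s3:/ in the exception. You need a different library.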
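To see where the mangled path in the stack trace comes from, here is a minimal sketch. It assumes a Linux or macOS JVM (as on EMR nodes); PathDemo and asLocalPath are hypothetical names used only for illustration.

```scala
import java.io.File

object PathDemo {
  // java.io.File interprets its argument as a local filesystem path, not a URI.
  // Source.fromFile(String) goes through java.io.File, so it inherits this behavior.
  def asLocalPath(uri: String): String = new File(uri).getPath

  def main(args: Array[String]): Unit = {
    // On Unix-like systems, path normalization collapses the duplicate slash,
    // so this prints: s3:/my-bucket/input.txt
    println(asLocalPath("s3://my-bucket/input.txt"))
  }
}
```

The result matches the path reported in the FileNotFoundException: the JVM is looking for a local file named s3:/my-bucket/input.txt and never contacts S3.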
You can use the official Amazon S3 Java library (part of the AWS SDK for Java). Here is some sample code (copied from this question and its answers):
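import com.amazonaws.auth.BasicAWSCredentials
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.GetObjectRequest
import scala.io.Source

// Replace the placeholder key and secret with your own credentials
val credentials = new BasicAWSCredentials("myKey", "mySecretKey")
val s3Client = new AmazonS3Client(credentials)
// Fetch the object by bucket name and key, not by s3:// URI
val s3Object = s3Client.getObject(new GetObjectRequest("my-bucket", "input.txt"))
val myData = Source.fromInputStream(s3Object.getObjectContent())
val runid = myData.getLines().mkString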
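The sample above hardcodes the bucket and key, while the job receives a full s3:// URI in runidfile. One possible way to bridge the two is sketched below using only java.net.URI. S3Path and parse are hypothetical names, not part of the AWS SDK, and the sketch assumes the key contains no characters that are illegal in a URI (such as spaces).

```scala
import java.net.URI

object S3Path {
  // Hypothetical helper: splits "s3://bucket/some/key" into (bucket, key).
  def parse(s3Uri: String): (String, String) = {
    val uri = new URI(s3Uri)
    require(uri.getScheme == "s3", s"not an s3:// URI: $s3Uri")
    val bucket = uri.getAuthority          // the part between "s3://" and the next "/"
    val key = uri.getPath.stripPrefix("/") // S3 keys have no leading slash
    (bucket, key)
  }
}
```

With this, val (bucket, key) = S3Path.parse(args("runidfile")) yields values that can be passed to new GetObjectRequest(bucket, key) in place of the literals.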