When I try to import a CSV into my Redshift database, I get this error:
Missing newline: Unexpected character 0x75 found at location 4194303
The CSV itself seems fine. The stl table tells me the error is on line 70269 of the CSV, which contains this string:
10:00:10,2014-07-28,Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0),Not Listed,multiRetrieve,OS-Preview-logItemUsage,"[{""PubEndDate""=>""2013/12/31"",""ItmId""=>""1353296053"",""SourceType""=>""Scholarly Journals"",""ReasonCode""=>""Free"",""MyResearchUser""=>""246763"",""ProjectCode""=>"""",""PublicationCode""=>"""",""PubStartDate""=>""2013/01/01"",""ItmFrmt""=>""AbstractPreview"",""Subrole""=>""AbstractPreview"",""PaymentType""=>""Transactional"",""UsageInfo""=>""P-1008275-154977-CUSTOMER-10000137-2950635"",""Role""=>""AbstractPreview"",""RetailPrice""=>0,""EffectivePrice""=>0,""ParentItemId""=>""53628""}]","[""optype:Online"",""location:null"",""target:null""]",192.234.111.8,DIALOG,20140728131712007:882391,1119643,"2014-07-28 10:00:10-0400,421 {""Items"":[{""PubEndDate"":""2013/12/31"",""ItmId"":""1353296053"",""SourceType"":""Scholarly Journals"",""ReasonCode"":""Free"",""MyResearchUser"":""246763"",""ProjectCode"":"""",""PublicationCode"":"""",""PubStartDate"":""2013/01/01"",""ItmFrmt"":""AbstractPreview"",""Subrole"":""AbstractPreview"",""PaymentType"":""Transactional"",""UsageInfo"":""P-1008275-154977-CUSTOMER-10000137-2950635"",""Role"":""AbstractPreview"",""RetailPrice"":0,""EffectivePrice"":0,""ParentItemId"":""53628""}],""Operation"":[""optype:Online"",""target:null""],""UserAgent"":""Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"",""UserInfo"":{""IP"":""192.234.111.8"",""AppId"":""DIALOG"",""SessId"":""20140728131712007:882391"",""UsageGroupId"":""1119643""},""UsageType"":""multiRetrieve"",""BreadCrumb"":""OS-Preview-logItemUsage""}
Any idea why it won't load?
Edit: This is clearly related to the number 4194303. Many of my Redshift uploads are failing; here is a short sample from my stl_load_errors:
Missing newline: Unexpected character 0x3a found at location 4194303
Missing newline: Unexpected character 0x63 found at location 4194303
Missing newline: Unexpected character 0x6c found at location 4194303
Missing newline: Unexpected character 0x22 found at location 4194303
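To see what these messages have in common, it may help to decode the reported character codes and compare the location with 4 MiB. The sketch below only uses values copied from the error messages in this post; the variable names are made up for illustration.

```python
# Character codes copied from the error messages quoted in this post.
codes = [0x75, 0x3A, 0x63, 0x6C, 0x22]

# Each code is a plain ASCII character, nothing unusual by itself.
decoded = [chr(code) for code in codes]
print(decoded)

# The location is identical in every error; compare it with 4 MiB.
location = 4194303
four_mb = 4 * 1024 * 1024
print(four_mb - location)
```

The characters turn out to be ordinary letters and punctuation, and the location sits one byte below 4 * 1024 * 1024, so the common factor looks like the offset rather than any particular character.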
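All columns in the table where the error occurs are of type "text"; there are about 30 columns. The CSV itself contains several thousand records (these are fairly large CSV files).
I found that the number 4194303 comes from the 4 MB limit that applies when the TRUNCATECOLUMNS option of Redshift's COPY is set. With that option disabled, I get "String length exceeds DDL length" errors instead (which is why I was using TRUNCATECOLUMNS in the first place).
So the problem is that many of my records exceed 4 MB, and Redshift does not support such records if any attribute needs to be truncated.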
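However, by using the MAXERROR 1000 option of the COPY command, I can skip the 4 MB records and end up with a database that contains only the rows I want, those under 4 MB.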
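For reference, a COPY statement combining these options might look like the sketch below. Only TRUNCATECOLUMNS and MAXERROR 1000 come from this post; the CSV format option, the IAM_ROLE clause, the table name, the S3 path and the role ARN are all placeholders assumed for illustration, and the helper build_copy_statement is hypothetical.

```python
def build_copy_statement(table, s3_path, iam_role, max_errors=1000):
    """Build a Redshift COPY statement that truncates over-long columns
    and tolerates up to `max_errors` rejected rows before failing.

    Plain string formatting is used for readability; do not pass
    untrusted input into this helper.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "CSV\n"
        "TRUNCATECOLUMNS\n"
        f"MAXERROR {max_errors};"
    )


# All three arguments are made-up placeholder values.
statement = build_copy_statement(
    table="usage_log",
    s3_path="s3://example-bucket/usage/2014-07-28.csv",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(statement)
```

Rows rejected under MAXERROR are still recorded in stl_load_errors, so it is worth checking that table after the load to confirm that only the oversized records were skipped.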