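I am trying to download photos posted with a specific tag. I found the real-time API pretty useless, so I am using a long-polling strategy. Below is pseudocode, with comments describing the subtle bugs in it.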
    newMediaCount = getMediaCount();
    delta = newMediaCount - mediaCount;
    if (delta > 0) {
        // if mediaCount changed by now, realDelta > delta, so realDelta - delta photos
        // won't be grabbed, and on next poll, if mediaCount didn't change again,
        // realDelta - delta would be duplicated, else ...
        // if photo posted from private account, last photo will be duplicated,
        // as counter changes but nothing is added to recent
        recentMedia = getRecentMedia(delta);
        // persist recentMedia
        mediaCount = newMediaCount;
    }
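The second issue can probably be solved with a set of some sort, I guess. But the first one really bothers me. I have moved the two calls to the Instagram API as close together as possible, but is that enough?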
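To illustrate the duplicate problem only: a minimal sketch of filtering out already-seen media IDs with a bounded set. The class name `SeenIdFilter` and the capacity value are hypothetical, and the sketch assumes each media item carries a unique string ID. It does not address the first (skipped photos) issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: remembers the most recently seen media IDs so repeats can be dropped. */
class SeenIdFilter {
    private final Set<String> seen;

    SeenIdFilter(final int capacity) {
        // Insertion-ordered map that evicts its eldest entry once capacity is exceeded,
        // so memory stays bounded during a long-running poll.
        this.seen = Collections.newSetFromMap(new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        });
    }

    /** Returns only the IDs that were not seen before, in their original order. */
    List<String> filterNew(List<String> ids) {
        List<String> fresh = new ArrayList<>();
        for (String id : ids) {
            if (seen.add(id)) { // add() returns false for an ID that is already remembered
                fresh.add(id);
            }
        }
        return fresh;
    }
}
```

The capacity should be comfortably larger than the number of items a single poll can return; otherwise an ID could be evicted before its duplicate arrives.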
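Edit

As Amir suggested, I rewrote the code using min/max_tag_ids. But it still skips photos. I could not find a better way to test this than saving the images to disk for a while and comparing the result with instagram.com/explore/tags/.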
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.List;
    import javax.imageio.ImageIO;
    import org.junit.Test;
    // Instagram, TagMediaFeed, MediaFeedData and InstagramException come from
    // the Instagram client library in use.

    public class LousyInstagramApiTest {

        @Test
        public void testFeedContinuity() throws Exception {
            Instagram instagram = new Instagram(Settings.getClientId());
            final String TAG_NAME = "portrait";
            // Start from the cursor of the most recent media at the time of the first call.
            String id = instagram.getRecentMediaTags(TAG_NAME).getPagination().getMinTagId();
            HashtagEndpoint endpoint = new HashtagEndpoint(instagram, TAG_NAME, id);
            for (int i = 0; i < 10; i++) {
                Thread.sleep(3000);
                // Save every newly reported image to disk for manual comparison.
                endpoint.recentFeed().forEach(d -> {
                    try {
                        URL url = new URL(d.getImages().getLowResolution().getImageUrl());
                        BufferedImage img = ImageIO.read(url);
                        ImageIO.write(img, "png", new File("D:\\tmp\\" + d.getId() + ".png"));
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
            }
        }
    }

    class HashtagEndpoint {
        private final Instagram instagram;
        private final String hashtag;
        private String minTagId;

        public HashtagEndpoint(Instagram instagram, String hashtag, String minTagId) {
            this.instagram = instagram;
            this.hashtag = hashtag;
            this.minTagId = minTagId;
        }

        public List<MediaFeedData> recentFeed() throws InstagramException {
            TagMediaFeed feed = instagram.getRecentMediaTags(hashtag, minTagId, null);
            List<MediaFeedData> dataList = feed.getData();
            if (dataList.size() == 0) return Collections.emptyList();

            // Fetch older pages while the next cursor is still ahead of the last poll.
            String maxTagId = feed.getPagination().getNextMaxTagId();
            if (maxTagId != null && maxTagId.compareTo(minTagId) > 0)
                dataList.addAll(paginateFeed(maxTagId));

            Collections.reverse(dataList);
            // dataList.removeIf(d -> d.getId().compareTo(minTagId) < 0);
            minTagId = feed.getPagination().getMinTagId();
            return dataList;
        }

        private Collection<? extends MediaFeedData> paginateFeed(String maxTagId) throws InstagramException {
            System.out.println("pagination required");
            List<MediaFeedData> dataList = new ArrayList<>();
            do {
                TagMediaFeed feed = instagram.getRecentMediaTags(hashtag, null, maxTagId);
                maxTagId = feed.getPagination().getNextMaxTagId();
                dataList.addAll(feed.getData());
            } while (maxTagId.compareTo(minTagId) > 0);
            return dataList;
        }
    }
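One detail worth checking in the code above: `String.compareTo` orders strings lexicographically, so two cursors of different lengths may not compare the way their numeric values would. The following is a minimal sketch of a numeric comparison, assuming the tag IDs are plain decimal strings (an assumption, not something confirmed here). The class and method names are hypothetical.

```java
import java.math.BigInteger;

/** Hypothetical helper for ordering ID cursors by numeric value. */
class TagIdOrder {
    /**
     * Compares two decimal ID strings by their numeric value rather than
     * character by character. Throws NumberFormatException for non-decimal input.
     */
    static int compareNumeric(String a, String b) {
        return new BigInteger(a).compareTo(new BigInteger(b));
    }
}
```

For example, `"999".compareTo("1000")` is positive because `'9'` sorts after `'1'`, while `TagIdOrder.compareNumeric("999", "1000")` is negative. Whether this is what causes the skipped photos is not established; it is only one thing to rule out.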
Solution
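When you use the Tag endpoints to get the recent media with a given tag, the response returns a min_tag_id in its pagination info, which is tied to the most recently tagged media at the time of your call. Since the API also accepts a min_tag_id parameter, you can pass that number from your last query to receive only the media that were tagged after your last query.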
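So whatever polling mechanism you have, you just call the API to get the new recent media, if any, based on the last received min_tag_id.
You will also need to pass a large count parameter and follow the pagination of the response, so that you receive all the data without losing anything when media are tagged faster than you poll.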
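The polling scheme described above can be sketched roughly as follows. This is not the real client API: `TagPage`, `TagPageSource` and `TagPoller` are hypothetical names standing in for a call to the tag endpoint with min_tag_id and max_tag_id, and the sketch assumes each page lists media newest first and reports a null next cursor on the last page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** One page of tag results. Hypothetical type, not part of any real client library. */
class TagPage {
    final List<String> ids;     // media IDs on this page, newest first
    final String minTagId;      // cursor for the newest item on this page, or null if the page is empty
    final String nextMaxTagId;  // cursor for the next (older) page, or null on the last page

    TagPage(List<String> ids, String minTagId, String nextMaxTagId) {
        this.ids = ids;
        this.minTagId = minTagId;
        this.nextMaxTagId = nextMaxTagId;
    }
}

/** Hypothetical stand-in for one call to the tag endpoint with min_tag_id / max_tag_id. */
interface TagPageSource {
    TagPage fetch(String minTagId, String maxTagId);
}

class TagPoller {
    private final TagPageSource source;
    private String minTagId; // cursor returned by the previous poll

    TagPoller(TagPageSource source, String initialMinTagId) {
        this.source = source;
        this.minTagId = initialMinTagId;
    }

    /** Returns the IDs tagged since the previous poll, oldest first. */
    List<String> poll() {
        TagPage first = source.fetch(minTagId, null);
        List<String> ids = new ArrayList<>(first.ids);

        // Follow the pagination until the source reports no further page.
        String next = first.nextMaxTagId;
        while (next != null) {
            TagPage page = source.fetch(minTagId, next);
            ids.addAll(page.ids);
            next = page.nextMaxTagId;
        }

        // Advance the cursor only when the first page actually carried one,
        // so an empty poll does not lose the previous position.
        if (first.minTagId != null) {
            minTagId = first.minTagId;
        }
        Collections.reverse(ids); // pages arrive newest first
        return ids;
    }
}
```

The cursor is taken from the first page because, under the assumptions above, that page contains the newest item; every later page only walks backwards toward the previous cursor.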
Update:

Based on your updated code:
    public List<MediaFeedData> recentFeed() throws InstagramException {
        // Ask for a large count so that a single call returns as much as possible.
        TagMediaFeed feed = instagram.getRecentMediaTags(hashtag, 100000);
        List<MediaFeedData> dataList = feed.getData();
        if (dataList.size() == 0) return Collections.emptyList();

        // follow the pagination
        MediaFeed recentMediaNextPage = instagram.getRecentMediaNextPage(feed.getPagination());
        while (recentMediaNextPage.getPagination() != null) {
            dataList.addAll(recentMediaNextPage.getData());
            recentMediaNextPage = instagram.getRecentMediaNextPage(recentMediaNextPage.getPagination());
        }

        Collections.reverse(dataList);
        // Remember the cursor of the first page for the next poll.
        minTagId = feed.getPagination().getMinTagId();
        return dataList;
    }