Persistent multiprocess shared cache in Python with stdlib or minimal dependencies

I have just been trying out the Python shelve module as a persistent cache for data fetched from an external service. The complete example is here.

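I was wondering what the best approach would be if I wanted to make this multiprocess safe. I am aware of redis, memcached and similar "real solutions", but I would like to use only parts of the Python standard library, or very minimal dependencies, to keep my code compact and to avoid introducing unnecessary complexity when the code runs in a single-process, single-thread model.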

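It is easy to come up with a single-process solution, but that does not work well with current Python web runtimes. Specifically, in an Apache + mod_wsgi environment the requirements are: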

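>Only one process at a time updates the cached data (file locking, somehow?)
>Other processes keep using the existing cached data while the update is in progress
>If a process fails to update the cached data, there is a penalty of N minutes before another process may try again (to prevent a thundering herd and the like); how can this be signalled between mod_wsgi processes?
>No "heavy tools" are used for this, only the Python standard library and UNIX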
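
One possible way to meet these four requirements with only the standard library on UNIX is an advisory file lock plus a "failure marker" file. The following is a minimal sketch, not a tested production recipe: the function name `refresh_if_needed`, the `.lock` and `.failed` file suffixes, the JSON storage format and the default timings are all assumptions made for illustration.

```python
import fcntl
import json
import os
import time
from pathlib import Path


def refresh_if_needed(cache_path, fetch, max_age=3600, penalty=300):
    """Return cached data; let at most one process at a time call fetch()."""
    lock_path = cache_path + ".lock"      # hypothetical lock file
    fail_path = cache_path + ".failed"    # hypothetical failure marker

    def read_cache():
        try:
            with open(cache_path) as f:
                return json.load(f)
        except FileNotFoundError:
            return None

    def age(path):
        try:
            return time.time() - os.path.getmtime(path)
        except FileNotFoundError:
            return float("inf")

    # Fresh data, or a recent failure: do not contact the external service.
    if age(cache_path) < max_age or age(fail_path) < penalty:
        return read_cache()

    with open(lock_path, "w") as lock:
        try:
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Another process holds the lock; serve the stale data meanwhile.
            return read_cache()
        # Re-check: another process may have finished just before we locked.
        if age(cache_path) < max_age:
            return read_cache()
        try:
            data = fetch()
        except Exception:
            # Touch the marker so that all processes back off for `penalty`.
            Path(fail_path).touch()
            return read_cache()
        tmp_path = cache_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(data, f)
        os.replace(tmp_path, cache_path)  # atomic on POSIX file systems
        return data
    # The lock is released when the lock file is closed.
```

Because the penalty is signalled through a file's modification time, no shared memory or extra daemon is needed between mod_wsgi processes. `fcntl` is UNIX-only, which matches the constraints above.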

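Also, if some PyPI package does this without external dependencies, please let me know. Alternative approaches and recommendations, such as "just use sqlite", are welcome.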
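
To make the "just use sqlite" idea concrete, here is a minimal sketch built on the standard-library `sqlite3` module. SQLite does its own file locking, so several processes can open the same database file. The table layout and the helper names `open_cache`, `cache_set` and `cache_get` are assumptions for illustration, and values are assumed to be JSON-serialisable.

```python
import json
import sqlite3
import time


def open_cache(path):
    # timeout: how long to wait if another process holds the database lock.
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache "
        "(key TEXT PRIMARY KEY, value TEXT NOT NULL, updated REAL NOT NULL)"
    )
    conn.commit()
    return conn


def cache_set(conn, key, value):
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO cache (key, value, updated) VALUES (?, ?, ?)",
            (key, json.dumps(value), time.time()),
        )


def cache_get(conn, key, max_age):
    """Return the cached value, or None if it is missing or older than max_age seconds."""
    row = conn.execute(
        "SELECT value, updated FROM cache WHERE key = ?", (key,)
    ).fetchone()
    if row is None or time.time() - row[1] > max_age:
        return None
    return json.loads(row[0])
```

This covers safe concurrent reads and writes only; limiting the refresh to a single process and applying the failure penalty would still need extra logic on top.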

Example:

import datetime
import os
import shelve
import logging


logger = logging.getLogger(__name__)


class Converter:

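    def __init__(self, fpath, api_url=None, refresh_delay=datetime.timedelta(hours=1)):
        # api_url and refresh_delay are used further down but were never set
        # in the original snippet; the defaults here are placeholders.
        self.api_url = api_url
        self.refresh_delay = refresh_delay
        self.last_updated = None

        # Check for the file before shelve.open() creates it, so that a
        # brand-new empty cache is not mistaken for fresh data.
        if os.path.exists(fpath):
            self.last_updated = datetime.datetime.fromtimestamp(os.path.getmtime(fpath))

        self.data = shelve.open(fpath)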

    def convert(self, source, target, amount, update=True, determiner="24h_avg"):
        # Do something with cached data
        pass

    def is_up_to_date(self):
        if not self.last_updated:
            return False

        return datetime.datetime.now() < self.last_updated + self.refresh_delay

    def update(self):
        try:
            # Update data from the external server
            self.last_updated = datetime.datetime.now()
            self.data.sync()
        except Exception:
            # logger.exception() logs at ERROR level and appends the traceback.
            logger.exception("Could not refresh market data: %s", self.api_url)

Best answer

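I think you would want to use an existing caching library; dogpile.cache comes to mind. It already has a lot of features, and you can easily plug in whichever backends you might need.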

The dogpile.cache documentation says the following:

This “get-or-create” pattern is the entire key to the “Dogpile”
system, which coordinates a single value creation operation among many
concurrent get operations for a particular key, eliminating the issue
of an expired value being redundantly re-generated by many workers
simultaneously.
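
As a rough illustration of that pattern, the sketch below configures a region with dogpile.cache's file-based DBM backend, which, as far as I understand it, also uses lock files so that separate processes can coordinate. The helper name `build_region`, the default expiration time and the file path in the usage note are assumptions, not part of the original answer.

```python
def build_region(cache_file, expiration_time=3600):
    """Create a dogpile.cache region stored in a DBM file at cache_file."""
    # Imported lazily so this module still loads when dogpile.cache
    # (a third-party package) is not installed.
    from dogpile.cache import make_region

    return make_region().configure(
        "dogpile.cache.dbm",
        expiration_time=expiration_time,  # seconds until a value is stale
        arguments={"filename": cache_file},
    )
```

With a region in hand, a call such as `region.get_or_create("market_data", fetch_market_data)` returns the cached value and invokes the creator (here a hypothetical `fetch_market_data` function) only when the value is missing or expired.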
