Python

Environment

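Installing Python packages

Where do your Python packages actually get installed?

Suppose the current Python interpreter lives at $path_prefix/bin/python. When you start an interactive session or run a script with this interpreter, it searches the following locations by default:

- $path_prefix/lib/pythonX.Y (the standard library)
- $path_prefix/lib/pythonX.Y/site-packages (third-party packages; X.Y is the major and minor version of that Python, e.g. 3.7 or 2.6)
- the current working directory (the output of pwd) in an interactive session or with python -m; when a script file is run, the directory containing that script is used instead

A few useful attributes:

- sys.executable: path of the Python interpreter currently in use
- sys.path: the current list of package search paths
- sys.prefix: the $path_prefix currently in use

You can also run python -m site on the command line, which prints information about the current Python, including the search path list.

Adding search paths with an environment variable

If the path of your package is not in the search path list above, add it to the PYTHONPATH environment variable.

Virtual environments

A virtual environment isolates the dependencies of different projects by installing them under different paths, which prevents dependency conflicts. Once you understand how Python decides where packages are installed, the principle behind virtual environments (virtualenv, the venv module) is easy to follow. Running virtualenv myenv copies a new Python interpreter into myenv/bin and creates directories such as myenv/lib and myenv/lib/pythonX.Y/site-packages (the venv module does not copy the interpreter, but the result is essentially the same). Running source myenv/bin/activate puts myenv/bin at the front of PATH, so the copied interpreter is found first. From then on, $path_prefix is myenv whenever packages are installed, which is what isolates the install paths.

Running Python scripts

To run the code of a script that lives in a subdirectory, use python -m <module_name>. The argument after python -m is a module name (separated by .), not a file path.

pip

There are two ways to run pip:

    pip ...
    python -m pip ...

The two are largely the same. The difference is that the first uses the Python interpreter written in the shebang of the pip file. In general, if your pip is at $path_prefix/bin/pip, the corresponding Python is $path_prefix/bin/python. On a Unix system, the first line printed by cat $(which pip) contains the interpreter path. The second way specifies the Python interpreter explicitly. ...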
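The attributes mentioned above can be inspected together. The following is a minimal sketch, not part of the original notes: `describe_environment` is a hypothetical helper name, and comparing `sys.prefix` with `sys.base_prefix` to detect a virtual environment assumes Python 3.3+ with venv or a recent virtualenv.

```python
import sys
import sysconfig


def describe_environment() -> dict:
    """Collect the interpreter facts discussed above into one dictionary."""
    return {
        # path of the interpreter that is running this code
        "executable": sys.executable,
        # the $path_prefix currently in use
        "prefix": sys.prefix,
        # prefix of the interpreter the environment was created from
        "base_prefix": sys.base_prefix,
        # assumption: inside a venv/virtualenv these two prefixes differ
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # where third-party packages are installed for this interpreter
        "site_packages": sysconfig.get_paths()["purelib"],
        # the module search path, in lookup order
        "search_path": list(sys.path),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running it once with the system interpreter and once after activating a virtual environment should show `prefix` and `site_packages` moving into the environment directory.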

Libraries

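Task scheduling

schedule

Install

    pip install schedule

Usage

    import time

    import schedule

    # add a scheduled job
    schedule.every(10).seconds.do(lambda: print("running"))

    # run the scheduler loop
    while True:
        schedule.run_pending()
        time.sleep(1)

Add a job with parameters

    def func(name: str):
        print(f"My name is {name}")

    schedule.every(5).seconds.do(func, name="Tom")

    while True:
        schedule.run_pending()
        time.sleep(1)

APScheduler

Install

    pip install apscheduler

Triggers: the logic that decides when a job fires

- cron: fires on a cron-style schedule
- interval: fires at a fixed time interval
- date: fires once at a given date and time
- combine: fires on a combination of conditions

Scheduler

- BlockingScheduler: blocking; use it when the scheduler is the only thing the program runs
- BackgroundScheduler: the scheduler runs in the background

Executor

- ThreadPoolExecutor: the default, thread-based executor
- ProcessPoolExecutor: a process-based executor, suitable for CPU-bound jobs

Job store: scheduling information kept in memory is lost when the program exits; other job stores can be used to persist it

- MemoryJobStore: the default, in-memory store
- SQLAlchemyJobStore
- MongoDBJobStore
- etc.

Creating a scheduler ...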

Python

Command

    # print version
    python -V
    # run a Python command
    python -c "print('Hello world!')"

Python Files

Header

    #!/usr/bin/python
    # -*- coding: utf-8 -*-

Module

A Python file is a module.

    main.py
    database.py
    const.py

Import a module

    # method 1: import module
    import database
    client = database.Client()

    # method 2: import class from module
    from database import Client

Run a module as a script

    python -m module_name
    # if the module is in parent/child/module_name.py
    python -m parent.child.module_name

Package

A folder of Python files is a package ...
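To make the `python -m parent.child.module_name` form concrete, here is a small sketch that builds that layout in a temporary directory and runs it. The helper name `run_nested_module` and the printed message are made up for illustration; the `__init__.py` files are added on the assumption that regular (non-namespace) packages are wanted.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_nested_module() -> str:
    """Create parent/child/module_name.py in a temp dir and run it with -m."""
    with tempfile.TemporaryDirectory() as root:
        child = Path(root) / "parent" / "child"
        child.mkdir(parents=True)
        # __init__.py files mark the folders as regular packages
        (Path(root) / "parent" / "__init__.py").write_text("")
        (child / "__init__.py").write_text("")
        (child / "module_name.py").write_text(
            "if __name__ == '__main__':\n"
            "    print('hello from module_name')\n"
        )
        result = subprocess.run(
            # dotted module name, not a file path
            [sys.executable, "-m", "parent.child.module_name"],
            # with -m, the working directory is put on the search path
            cwd=root,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_nested_module())
```

Because the working directory is on the search path when `-m` is used, the command has to be run from the directory that contains `parent/`, which is why `cwd=root` is passed.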

Scraper

urllib

Python's built-in library for web requests.

Import

    from urllib.request import urlopen
    from urllib.request import urlretrieve
    from urllib.error import HTTPError

Open a URL

    page = urlopen(URL)

Requests

HTTP for Humans.

Import

    import requests

get/post

    r = requests.get(URL)
    r = requests.post(URL)

Add headers

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        'Accept': 'text/html, application/xhtml+xml, application/xml; q=0.9, image/webp, */*; q=0.8',
        'Host': 'www.zhihu.com',
        'Referer': 'https://www.zhihu.com/',
    }
    r = requests.get(URL, headers=headers)

Add cookies

    cookies = dict(cookies_are='working')
    r = requests.get(URL, cookies=cookies)

    # using a cookie jar
    jar = requests.cookies.RequestsCookieJar()
    jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
    jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
    r = requests.get(URL, cookies=jar)

Check results ...
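The urllib imports above include `HTTPError`, which `urlopen` raises for responses such as 404. A minimal sketch of handling it follows; to stay self-contained it talks to a throwaway local server rather than a real site. `DemoHandler`, `fetch`, and `demo` are hypothetical names, and the sketch assumes no HTTP proxy is configured for loopback addresses.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Tuple
from urllib.error import HTTPError
from urllib.request import urlopen


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: serves "/" and answers 404 for everything else."""

    def do_GET(self):
        if self.path == "/":
            body = b"hello"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(url: str) -> Tuple[int, str]:
    """Return (status, body); an HTTPError becomes (error code, "")."""
    try:
        with urlopen(url, timeout=5) as page:
            return page.status, page.read().decode("utf-8")
    except HTTPError as err:
        return err.code, ""


def demo() -> List[Tuple[int, str]]:
    # port 0 asks the OS for any free port
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        base = f"http://127.0.0.1:{server.server_port}"
        return [fetch(base + "/"), fetch(base + "/missing")]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

In a real scraper only `fetch` would be kept; the local server exists so the example can run without network access.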

StandardLib

Text Processing Services

re

Regular expressions.

    import re

    # compile
    datepat = re.compile(r'\d+/\d+/\d+')

    # match
    text1 = '11/27/2012'
    if datepat.match(text1):
        print('yes')

    # search
    text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
    datepat.findall(text)  # ['11/27/2012', '3/13/2013']

    # capture groups are usually used
    datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
    m = datepat.match('11/27/2012')
    print(m.group(0), m.group(1), m.group(2), m.group(3), m.groups())
    datepat.findall(text)  # [('11', '27', '2012'), ('3', '13', '2013')]

    # iterate over matches
    for m in datepat.finditer(text):
        print(m.groups())

    # for a one-off match/search there is no need to compile first
    re.findall(r'(\d+)/(\d+)/(\d+)', text)

    # substitute
    re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)  # 'Today is 2012-11-27. PyCon starts 2013-3-13.'
    # named groups
    re.sub(r'(?P<month>\d+)/(?P<day>\d+)/(?P<year>\d+)', r'\g<year>-\g<month>-\g<day>', text)

Data Types

datetime

    from datetime import datetime

    a = datetime(2012, 9, 23)
    # datetime to string
    a.strftime('%Y-%m-%d')
    # string to datetime
    text = '2012-09-20'
    y = datetime.strptime(text, '%Y-%m-%d')

zoneinfo (3.9+)

    from datetime import datetime
    from zoneinfo import ZoneInfo

    # Create a datetime object without timezone
    naive_dt = datetime.now()
    # Attach the timezone to the datetime object (the clock time is not converted)
    aware_dt = naive_dt.replace(tzinfo=ZoneInfo('Asia/Shanghai'))
    print(aware_dt)

collections

namedtuple

    from collections import namedtuple

    # namedtuple(typename, field_names)
    Point = namedtuple('Point', ['x', 'y'])
    p = Point(x=11, y=22)
    print(p.x + p.y)

deque

    from collections import deque

    d = deque(["a", "b", "c"])
    d.append("f")  # add to the right side
    d.appendleft("z")  # add to the left side
    e = d.pop()  # pop from the right side
    e = d.popleft()  # pop from the left side
    d = deque(maxlen=10)  # deque with max length, FIFO

Counter

collections — Container datatypes ...
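A minimal sketch of `collections.Counter`, the container named last above. The word list is made-up sample data, and only a few commonly used methods are shown.

```python
from collections import Counter

# made-up sample data
words = ["red", "blue", "red", "green", "blue", "red"]
counts = Counter(words)

# most_common(n) returns the n highest counts as (element, count) pairs
top_two = counts.most_common(2)  # [('red', 3), ('blue', 2)]

# missing keys count as zero instead of raising KeyError
missing = counts["purple"]  # 0

# update() adds the counts from another iterable
counts.update(["green", "green"])
green_total = counts["green"]  # 3
```

Counter is a dict subclass, so the usual dictionary operations (iteration, `items()`, membership tests) work on it as well.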

Visualization

Matplotlib

Basic

Import

    from matplotlib import pyplot as plt

Build figure

    fig = plt.figure(1)
    fig = plt.figure(1, figsize=(10,10))  # set figure size

Tighten the layout

    fig.tight_layout()

Build subplots

    ax = plt.subplot(111)
    ax = plt.subplot(211)  # 2 rows x 1 column grid, select the top subplot
    ax = plt.subplot(111, projection='polar')  # build polar subplot

Draw graphs

    ax.plot()
    ax.bar()
    ax.hist()
    ax.scatter()
    ax.plot_date()

Show figure

    fig.show()

Clear figure

    fig.clf()

Save figure

    plt.savefig('path/name.png')

Legend & Label & Tick & Grid

    import numpy as np

    # title
    ax.set_title('plot', fontsize=20)

    # label
    ax.set_xlabel('Threshold (m/s)')
    ax.set_ylabel('Storm periods (hours)')

    # ticks
    ax.set_xticks(np.arange(0, 1.1, 0.1))
    ax.set_yticks(np.arange(0, 1.1, 0.1))
    ax.set_xticklabels(labels, size=9, rotation=15)

    # axis limits
    plt.xlim(0, 1)
    # or
    ax.set_xlim(0, 1)

    # grid
    ax.grid(True)
    ax.grid(False)
    ax.yaxis.grid(True)

    # legend
    ax.plot(xx, yy, label='plot1')
    ax.legend(loc='lower left', frameon=False, fontsize=12)
    # or
    ax.legend(['line1', 'line2'])

Two y-axis ...