Environment

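Installing Python packages

Where do your Python packages get installed?

Suppose the current Python interpreter lives at $path_prefix/bin/python. When you start the interactive interpreter or run a script with it, Python searches the following locations by default:

- $path_prefix/lib/pythonX.Y (the standard library)
- $path_prefix/lib/pythonX.Y/site-packages (third-party packages; X.Y is the major and minor version of that Python, such as 3.7 or 2.6)
- the current working directory (the output of pwd) in an interactive session, or the script's own directory when running a script

A few useful attributes:

- sys.executable: path of the Python interpreter currently in use
- sys.path: the current list of package search paths
- sys.prefix: the $path_prefix currently in use

You can also run python -m site on the command line, which prints information about the current Python, including the search path list.

Adding search paths with an environment variable

If the path of your package is not in the search path list above, add it to the PYTHONPATH environment variable.

Virtual environments

A virtual environment isolates the dependencies of different projects by installing them under different paths, which prevents dependency conflicts. Once you understand how Python decides where packages are installed, the principle behind virtual environments (virtualenv, the venv module) is easy to follow. Running virtualenv myenv copies a new Python interpreter into myenv/bin and creates directories such as myenv/lib and myenv/lib/pythonX.Y/site-packages (the venv module does not copy the interpreter, but the result is basically the same). Running source myenv/bin/activate then puts myenv/bin at the front of PATH, so the copied interpreter is found first. From then on, $path_prefix is myenv whenever you install a package, which is what isolates the install path.

Running Python scripts

To run a script that lives in a subdirectory, use python -m <module_name>. The argument after python -m is a module name (separated by .), not a file path.

pip

There are two ways to run pip:

- pip ...
- python -m pip ...

The two are largely the same. The difference is that the first uses the Python interpreter written in the shebang of the pip file; normally, if pip is at $path_prefix/bin/pip, the matching interpreter is $path_prefix/bin/python. On a Unix system, the first line printed by cat $(which pip) contains the interpreter path. The second way states explicitly which Python to use. ...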
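To check the locations described above on your own machine, the following minimal sketch gathers them into one dictionary. The helper name `describe_environment` is made up for illustration, and the virtual-environment check assumes Python 3.3 or later, where `sys.base_prefix` exists.

```python
import sys
import sysconfig


def describe_environment() -> dict:
    """Collect where the current interpreter lives and where it looks for packages."""
    return {
        # Path of the interpreter that is running this code.
        "executable": sys.executable,
        # The $path_prefix of the interpreter in use.
        "prefix": sys.prefix,
        # Inside a venv/virtualenv, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # Where pure-Python third-party packages are installed.
        "site_packages": sysconfig.get_paths()["purelib"],
        # Copy of the module search path, in lookup order.
        "search_path": list(sys.path),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running it once with the system interpreter and once after activating a virtual environment should show `prefix` and `site_packages` moving into the environment directory.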

January 1, 2000

Libraries

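Task scheduling

schedule

install

    pip install schedule

usage

    import time

    import schedule

    # add a scheduled job
    schedule.every(10).seconds.do(lambda: print("running"))

    # run the scheduler
    while True:
        schedule.run_pending()
        time.sleep(1)

add job with parameters

    import time

    import schedule

    def func(name: str):
        print(f"My name is {name}")

    schedule.every(5).seconds.do(func, name="Tom")

    while True:
        schedule.run_pending()
        time.sleep(1)

APScheduler

Install

    pip install apscheduler

Triggers: the logic that decides when a job fires

- cron: fire on a cron-style schedule
- interval: fire at a fixed time interval
- date: fire once at a fixed date
- combine: fire on a combination of conditions

Scheduler

- BlockingScheduler: blocking; use it when the scheduler is the only thing the program runs
- BackgroundScheduler: the scheduler runs in the background

Executor

- ThreadPoolExecutor: the default, runs jobs in a thread pool
- ProcessPoolExecutor: a process pool, suitable for CPU-bound jobs

Job store: if job scheduling information is kept only in memory, it is lost when the program exits; other job stores can be used to persist it

- MemoryJobStore: the default, keeps jobs in memory
- SQLAlchemyJobStore
- MongoDBJobStore
- etc.

Creating a scheduler ...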
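The `while True: run_pending(); sleep(1)` loop used with schedule is a polling pattern: on every tick, each job whose due time has passed is run and then rescheduled. The sketch below illustrates that idea with the standard library only. It is not how schedule or APScheduler are implemented; `Job`, `TinyScheduler`, and `simulate` are hypothetical names, and the clock is passed in explicitly so the behaviour can be checked without waiting.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    interval: float            # seconds between runs
    func: Callable[[], None]   # what to run
    next_run: float            # time at which the job is next due


@dataclass
class TinyScheduler:
    jobs: List[Job] = field(default_factory=list)

    def every(self, interval: float, func: Callable[[], None], now: float) -> None:
        """Register a job that first becomes due `interval` seconds after `now`."""
        self.jobs.append(Job(interval, func, now + interval))

    def run_pending(self, now: float) -> int:
        """Run every job that is due at `now`; return how many ran."""
        ran = 0
        for job in self.jobs:
            if now >= job.next_run:
                job.func()
                job.next_run = now + job.interval
                ran += 1
        return ran


def simulate(seconds: int) -> List[str]:
    """Poll once per simulated second and record each run of a 10-second job."""
    calls: List[str] = []
    scheduler = TinyScheduler()
    scheduler.every(10, lambda: calls.append("tick"), now=0)
    for second in range(seconds + 1):
        scheduler.run_pending(now=second)
    return calls
```

In a real program the simulated clock would be replaced by `time.monotonic()` and a short `time.sleep()` between polls, which is the shape of the loop shown for schedule.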

January 1, 2000

Python

Command

    # print version
    python -V
    # run a Python command
    python -c "print('Hello world!')"

Python Files

Header

    #!/usr/bin/python
    # -*- coding: utf-8 -*-

Module

A Python file is a module.

    main.py
    database.py
    const.py

import module

    # method 1: import the module
    import database
    client = database.Client()

    # method 2: import a class from the module
    from database import Client

run a module as script

    python -m module_name
    # if the module is in parent/child/module_name.py
    python -m parent.child.module_name

Package

A folder of Python files is a package ...
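To see `python -m parent.child.module_name` in action, the sketch below builds that layout in a temporary directory and runs the module through the current interpreter. The function name `run_nested_module` and the file contents are illustrative only. The `__init__.py` files are added for clarity.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_nested_module() -> str:
    """Create parent/child/module_name.py and run it with `python -m`."""
    with tempfile.TemporaryDirectory() as root:
        package_dir = Path(root) / "parent" / "child"
        package_dir.mkdir(parents=True)

        # Mark both folders as packages.
        (Path(root) / "parent" / "__init__.py").write_text("")
        (package_dir / "__init__.py").write_text("")

        # The module prints its __name__, which is "__main__" when run as a script.
        (package_dir / "module_name.py").write_text(
            'if __name__ == "__main__":\n'
            '    print("running as", __name__)\n'
        )

        # The argument to -m is a dotted module name, resolved relative to cwd.
        result = subprocess.run(
            [sys.executable, "-m", "parent.child.module_name"],
            cwd=root,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_nested_module())
```

Passing the file path `parent/child/module_name.py` to `-m` instead of the dotted name would fail, because `-m` expects a module name rather than a path.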

January 1, 2000

Scraper

urllib

Python built-in lib for web requests

Import

    from urllib.request import urlopen
    from urllib.request import urlretrieve
    from urllib.error import HTTPError

Open url

    page = urlopen(URL)

Requests

HTTP for Humans

Import

    import requests

get/post

    r = requests.get(URL)
    r = requests.post(URL)

Add Headers

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        'Accept': 'text/html, application/xhtml+xml, application/xml; q=0.9, image/webp,*/*; q=0.8',
        'Host': 'www.zhihu.com',
        'Referer': 'https://www.zhihu.com/'}
    r = requests.get(URL, headers=headers)

Add cookies

    cookies = dict(cookies_are='working')
    r = requests.get(URL, cookies=cookies)

    # using a cookie jar
    jar = requests.cookies.RequestsCookieJar()
    jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
    jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
    r = requests.get(URL, cookies=jar)

Check results ...
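The urllib imports above include `HTTPError`, which is usually caught around `urlopen`. The following is a minimal sketch of that pattern using only the standard library. The function name `fetch`, the `example-scraper/0.1` user agent, and the choice to return `None` on failure are assumptions made for illustration.

```python
from typing import Optional
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch(url: str, user_agent: str = "example-scraper/0.1",
          timeout: float = 5.0) -> Optional[str]:
    """Return the body of `url` as text, or None if the request fails."""
    # A Request object lets us attach headers, much like requests.get(..., headers=...).
    request = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(request, timeout=timeout) as page:
            return page.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        print(f"HTTP error {err.code} for {url}")
    except URLError as err:
        # The request never got a valid answer (DNS failure, refused connection, ...).
        print(f"Could not reach {url}: {err.reason}")
    return None
```

`HTTPError` is a subclass of `URLError`, so it has to be caught first; otherwise the more general handler would swallow it.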

January 1, 2000

StandardLib

Text Processing Services

re

Regular expressions

    import re

    # compile
    datepat = re.compile(r'\d+/\d+/\d+')

    # match (anchored at the start of the string)
    text1 = '11/27/2012'
    if datepat.match(text1):
        print('yes')

    # search
    text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
    datepat.findall(text)  # ['11/27/2012', '3/13/2013']

    # patterns usually use capture groups
    datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
    m = datepat.match('11/27/2012')
    print(m.group(0), m.group(1), m.group(2), m.group(3), m.groups())
    datepat.findall(text)  # [('11', '27', '2012'), ('3', '13', '2013')]

    # iterate over matches
    for m in datepat.finditer(text):
        print(m.groups())

    # for a one-off match/search there is no need to compile first
    re.findall(r'(\d+)/(\d+)/(\d+)', text)

    # substitute
    re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)  # 'Today is 2012-11-27. PyCon starts 2013-3-13.'

    # substitute using named groups
    re.sub(r'(?P<month>\d+)/(?P<day>\d+)/(?P<year>\d+)', r'\g<year>-\g<month>-\g<day>', text)

Data Types

datetime

    from datetime import datetime

    a = datetime(2012, 9, 23)

    # datetime to string
    a.strftime('%Y-%m-%d')

    # string to datetime
    text = '2012-09-20'
    y = datetime.strptime(text, '%Y-%m-%d')

zoneinfo (3.9+)

    from datetime import datetime
    from zoneinfo import ZoneInfo

    # Create a datetime object without timezone
    naive_dt = datetime.now()

    # Attach the timezone to the datetime object (the clock time is not converted)
    aware_dt = naive_dt.replace(tzinfo=ZoneInfo('Asia/Shanghai'))
    print(aware_dt)

collections

namedtuple

    from collections import namedtuple

    # namedtuple(typename, field_names)
    Point = namedtuple('Point', ['x', 'y'])
    p = Point(x=11, y=22)
    print(p.x + p.y)

deque

    from collections import deque

    d = deque(["a", "b", "c"])
    d.append("f")       # add to the right side
    d.appendleft("z")   # add to the left side
    e = d.pop()         # pop from the right side
    e = d.popleft()     # pop from the left side
    d = deque(maxlen=10)  # bounded deque: when full, adding an item drops one from the opposite end

Counter

collections - Container datatypes ...
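The `deque(maxlen=...)` behaviour is easiest to see with a small example. The sketch below keeps only the most recent items of a stream and shows what happens when a full deque is extended from either side; the helper name `last_n` is made up for illustration.

```python
from collections import deque
from typing import Iterable, List


def last_n(items: Iterable[int], n: int) -> List[int]:
    """Return the last `n` items seen, using a bounded deque as a sliding window."""
    window = deque(maxlen=n)
    for item in items:
        # Once the window is full, each append silently drops the oldest (leftmost) item.
        window.append(item)
    return list(window)


recent = last_n(range(10), 3)      # keeps 7, 8, 9

full = deque([1, 2, 3], maxlen=3)
full.appendleft(0)                 # full deque: the rightmost item (3) is dropped
left_result = list(full)           # [0, 1, 2]
```

This makes a bounded deque a convenient fit for "keep the last N lines" or "last N measurements" style bookkeeping without manual trimming.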

January 1, 2000

Visualization

Matplotlib

Basic

Import

    import numpy as np  # used below for tick positions
    from matplotlib import pyplot as plt

Build figure

    fig = plt.figure(1)
    fig = plt.figure(1, figsize=(10,10))  # set figure size

Tighten the layout

    fig.tight_layout()

Build subplots

    ax = plt.subplot(111)
    ax = plt.subplot(211)  # 2 rows x 1 column grid; select the top subplot
    ax = plt.subplot(111, projection='polar')  # build polar subplot

Draw graphs

    ax.plot()
    ax.bar()
    ax.hist()
    ax.scatter()
    ax.plot_date()

Show figure

    fig.show()

Clear figure

    fig.clf()

Save figure

    plt.savefig('path/name.png')

Legend & Label & Tick & Grid

    # title
    ax.set_title('plot', fontsize=20)

    # label
    ax.set_xlabel('Threshold (m/s)')
    ax.set_ylabel('Storm periods (hours)')

    # ticks
    ax.set_xticks(np.arange(0, 1.1, 0.1))
    ax.set_yticks(np.arange(0, 1.1, 0.1))
    ax.set_xticklabels(labels, size=9, rotation=15)

    # axis limits
    plt.xlim(0, 1)
    # or
    ax.set_xlim(0, 1)

    # grid
    ax.grid(True)
    ax.grid(False)
    ax.yaxis.grid(True)

    # legend
    ax.plot(xx, yy, label='plot1')
    ax.legend(loc='lower left', frameon=False, fontsize=12)
    # or
    ax.legend(['line1', 'line2'])

Two y-axis ...
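The individual calls above can be combined into one small end-to-end script. The following is a sketch under a few assumptions: matplotlib and NumPy are installed, the non-interactive Agg backend is selected so it runs without a display, and the data, labels, and the `build_figure` name are invented for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render to files only, no window needed

import numpy as np
from matplotlib import pyplot as plt


def build_figure(path: str = "example.png"):
    """Draw a line plot above a histogram and save the figure to `path`."""
    x = np.linspace(0, 1, 50)

    # Two stacked axes: 2 rows, 1 column.
    fig, (ax_top, ax_bottom) = plt.subplots(2, 1, figsize=(6, 6))

    # Top: line plot with title, limits, ticks, legend, and grid.
    ax_top.plot(x, x ** 2, label="x squared")
    ax_top.set_title("plot", fontsize=14)
    ax_top.set_xlim(0, 1)
    ax_top.set_xticks(np.arange(0, 1.1, 0.25))
    ax_top.legend(loc="lower right", frameon=False)
    ax_top.grid(True)

    # Bottom: histogram with axis labels.
    ax_bottom.hist(x ** 2, bins=10)
    ax_bottom.set_xlabel("value")
    ax_bottom.set_ylabel("count")

    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)  # release the figure from pyplot's registry
    return fig
```

Using `plt.subplots` returns the figure and all axes at once, which avoids repeated `plt.subplot(...)` calls when more than one panel is needed.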

January 1, 2000