新聞中心
itertools —- 為高效循環(huán)而創(chuàng)建迭代器的函數(shù)
本模塊實現(xiàn)一系列 iterator ,這些迭代器受到APL,Haskell和SML的啟發(fā)。為了適用于python,它們都被重新寫過。

本模塊標準化了一個快速、高效利用內(nèi)存的核心工具集,這些工具本身或組合都很有用。它們一起形成了“迭代器代數(shù)”,這使得在純Python中有可能創(chuàng)建簡潔又高效的專用工具。
例如,SML有一個制表工具: tabulate(f),它可產(chǎn)生一個序列 f(0), f(1), ...。在Python中可以組合 map() 和 count() 實現(xiàn): map(f, count())。
這些內(nèi)置工具同時也能很好地與 operator 模塊中的高效函數(shù)配合使用。例如,我們可以將兩個向量的點積映射到乘法運算符: sum(map(operator.mul, vector1, vector2)) 。
無窮迭代器:
|
迭代器 |
實參 |
結(jié)果 |
示例 |
|---|---|---|---|
count() | start, [step] | start, start+step, start+2*step, … |
|
cycle() | p | p0, p1, … plast, p0, p1, … |
|
repeat() | elem [,n] | elem, elem, elem, … 重復無限次或n次 |
|
根據(jù)最短輸入序列長度停止的迭代器:
|
迭代器 |
實參 |
結(jié)果 |
示例 |
|---|---|---|---|
accumulate() | p [,func] | p0, p0+p1, p0+p1+p2, … |
|
chain() | p, q, … | p0, p1, … plast, q0, q1, … |
|
chain.from_iterable() | iterable — 可迭代對象 | p0, p1, … plast, q0, q1, … |
|
compress() | data, selectors | (d[0] if s[0]), (d[1] if s[1]), … |
|
dropwhile() | pred, seq | seq[n], seq[n+1], … 從pred首次真值測試失敗開始 |
|
filterfalse() | pred, seq | seq中pred(x)為假值的元素,x是seq中的元素。 |
|
groupby() | iterable[, key] | 根據(jù)key(v)值分組的迭代器 | |
islice() | seq, [start,] stop [, step] | seq[start:stop:step]中的元素 |
|
pairwise() | iterable — 可迭代對象 | (p[0], p[1]), (p[1], p[2]) |
|
starmap() | func, seq | func(seq[0]), func(seq[1]), … |
|
takewhile() | pred, seq | seq[0], seq[1], …, 直到pred真值測試失敗 |
|
tee() | it, n | it1, it2, … itn 將一個迭代器拆分為n個迭代器 | |
zip_longest() | p, q, … | (p[0], q[0]), (p[1], q[1]), … |
|
排列組合迭代器:
|
迭代器 |
實參 |
結(jié)果 |
|---|---|---|
product() | p, q, … [repeat=1] | 笛卡爾積,相當于嵌套的for循環(huán) |
permutations() | p[, r] | 長度r元組,所有可能的排列,無重復元素 |
combinations() | p, r | 長度r元組,有序,無重復元素 |
combinations_with_replacement() | p, r | 長度r元組,有序,元素可重復 |
|
例子 |
結(jié)果 |
|---|---|
|
|
|
|
|
|
|
|
Itertool函數(shù)
下列模塊函數(shù)均創(chuàng)建并返回迭代器。有些迭代器不限制輸出流長度,所以它們只應在能截斷輸出流的函數(shù)或循環(huán)中使用。
itertools.accumulate(iterable[, func, **, initial=None*])
創(chuàng)建一個迭代器,返回累積匯總值或其他雙目運算函數(shù)的累積結(jié)果值(通過可選的 func 參數(shù)指定)。
如果提供了 func,它應當為帶有兩個參數(shù)的函數(shù)。 輸入 iterable 的元素可以是能被 func 接受為參數(shù)的任意類型。 (例如,對于默認的加法運算,元素可以是任何可相加的類型包括 Decimal 或 Fraction。)
通常,輸出的元素數(shù)量與輸入的可迭代對象是一致的。 但是,如果提供了關(guān)鍵字參數(shù) initial,則累加會以 initial 值開始,這樣輸出就比輸入的可迭代對象多一個元素。
大致相當于:
def accumulate(iterable, func=operator.add, *, initial=None):'Return running totals'# accumulate([1,2,3,4,5]) --> 1 3 6 10 15# accumulate([1,2,3,4,5], initial=100) --> 100 101 103 106 110 115# accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120it = iter(iterable)total = initialif initial is None:try:total = next(it)except StopIteration:returnyield totalfor element in it:total = func(total, element)yield total
There are a number of uses for the func argument. It can be set to min() for a running minimum, max() for a running maximum, or operator.mul() for a running product. Amortization tables can be built by accumulating interest and applying payments:
>>> data = [3, 4, 6, 2, 1, 9, 0, 7, 5, 8]>>> list(accumulate(data, operator.mul)) # running product[3, 12, 72, 144, 144, 1296, 0, 0, 0, 0]>>> list(accumulate(data, max)) # running maximum[3, 4, 6, 6, 6, 9, 9, 9, 9, 9]# Amortize a 5% loan of 1000 with 4 annual payments of 90>>> cashflows = [1000, -90, -90, -90, -90]>>> list(accumulate(cashflows, lambda bal, pmt: bal*1.05 + pmt))[1000, 960.0, 918.0, 873.9000000000001, 827.5950000000001]
參考一個類似函數(shù) functools.reduce() ,它只返回一個最終累積值。
3.2 新版功能.
在 3.3 版更改: 增加可選參數(shù) func 。
在 3.8 版更改: 添加了可選的 initial 形參。
itertools.chain(\iterables*)
創(chuàng)建一個迭代器,它首先返回第一個可迭代對象中所有元素,接著返回下一個可迭代對象中所有元素,直到耗盡所有可迭代對象中的元素??蓪⒍鄠€序列處理為單個序列。大致相當于:
def chain(*iterables):# chain('ABC', 'DEF') --> A B C D E Ffor it in iterables:for element in it:yield element
classmethod chain.from_iterable(iterable)
構(gòu)建類似 chain() 迭代器的另一個選擇。從一個單獨的可迭代參數(shù)中得到鏈式輸入,該參數(shù)是延遲計算的。大致相當于:
def from_iterable(iterables):# chain.from_iterable(['ABC', 'DEF']) --> A B C D E Ffor it in iterables:for element in it:yield element
itertools.combinations(iterable, r)
返回由輸入 iterable 中元素組成長度為 r 的子序列。
The combination tuples are emitted in lexicographic ordering according to the order of the input iterable. So, if the input iterable is sorted, the output tuples will be produced in sorted order.
Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeated values in each combination.
大致相當于:
def combinations(iterable, r):# combinations('ABCD', 2) --> AB AC AD BC BD CD# combinations(range(4), 3) --> 012 013 023 123pool = tuple(iterable)n = len(pool)if r > n:returnindices = list(range(r))yield tuple(pool[i] for i in indices)while True:for i in reversed(range(r)):if indices[i] != i + n - r:breakelse:returnindices[i] += 1for j in range(i+1, r):indices[j] = indices[j-1] + 1yield tuple(pool[i] for i in indices)
combinations() 的代碼可被改寫為 permutations() 過濾后的子序列,(相對于元素在輸入中的位置)元素不是有序的。
def combinations(iterable, r):pool = tuple(iterable)n = len(pool)for indices in permutations(range(n), r):if sorted(indices) == list(indices):yield tuple(pool[i] for i in indices)
當 0 <= r <= n 時,返回項的個數(shù)是 n! / r! / (n-r)!;當 r > n 時,返回項個數(shù)為0。
itertools.combinations_with_replacement(iterable, r)
返回由輸入 iterable 中元素組成的長度為 r 的子序列,允許每個元素可重復出現(xiàn)。
The combination tuples are emitted in lexicographic ordering according to the order of the input iterable. So, if the input iterable is sorted, the output tuples will be produced in sorted order.
不同位置的元素是不同的,即使它們的值相同。因此如果輸入中的元素都是不同的話,返回的組合中元素也都會不同。
大致相當于:
def combinations_with_replacement(iterable, r):# combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CCpool = tuple(iterable)n = len(pool)if not n and r:returnindices = [0] * ryield tuple(pool[i] for i in indices)while True:for i in reversed(range(r)):if indices[i] != n - 1:breakelse:returnindices[i:] = [indices[i] + 1] * (r - i)yield tuple(pool[i] for i in indices)
combinations_with_replacement() 的代碼可被改寫為 production() 過濾后的子序列,(相對于元素在輸入中的位置)元素不是有序的。
def combinations_with_replacement(iterable, r):pool = tuple(iterable)n = len(pool)for indices in product(range(n), repeat=r):if sorted(indices) == list(indices):yield tuple(pool[i] for i in indices)
當 n > 0 時,返回項個數(shù)為 (n+r-1)! / r! / (n-1)!.
3.1 新版功能.
itertools.compress(data, selectors)
創(chuàng)建一個迭代器,它返回 data 中經(jīng) selectors 真值測試為 True 的元素。迭代器在兩者較短的長度處停止。大致相當于:
def compress(data, selectors):# compress('ABCDEF', [1,0,1,0,1,1]) --> A C E Freturn (d for d, s in zip(data, selectors) if s)
3.1 新版功能.
itertools.count(start=0, step=1)
創(chuàng)建一個迭代器,它從 start 值開始,返回均勻間隔的值。常用于 map() 中的實參來生成連續(xù)的數(shù)據(jù)點。此外,還用于 zip() 來添加序列號。大致相當于:
def count(start=0, step=1):# count(10) --> 10 11 12 13 14 ...# count(2.5, 0.5) --> 2.5 3.0 3.5 ...n = startwhile True:yield nn += step
當對浮點數(shù)計數(shù)時,替換為乘法代碼有時精度會更好,例如: (start + step * i for i in count()) 。
在 3.1 版更改: 增加參數(shù) step ,允許非整型。
itertools.cycle(iterable)
創(chuàng)建一個迭代器,返回 iterable 中所有元素并保存一個副本。當取完 iterable 中所有元素,返回副本中的所有元素。無限重復。大致相當于:
def cycle(iterable):# cycle('ABCD') --> A B C D A B C D A B C D ...saved = []for element in iterable:yield elementsaved.append(element)while saved:for element in saved:yield element
注意,該函數(shù)可能需要相當大的輔助空間(取決于 iterable 的長度)。
itertools.dropwhile(predicate, iterable)
創(chuàng)建一個迭代器,如果 predicate 為true,迭代器丟棄這些元素,然后返回其他元素。注意,迭代器在 predicate 首次為false之前不會產(chǎn)生任何輸出,所以可能需要一定長度的啟動時間。大致相當于:
def dropwhile(predicate, iterable):# dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1iterable = iter(iterable)for x in iterable:if not predicate(x):yield xbreakfor x in iterable:yield x
itertools.filterfalse(predicate, iterable)
創(chuàng)建一個迭代器,只返回 iterable 中 predicate 為 False 的元素。如果 predicate 是 None,返回真值測試為false的元素。大致相當于:
def filterfalse(predicate, iterable):# filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8if predicate is None:predicate = boolfor x in iterable:if not predicate(x):yield x
itertools.groupby(iterable, key=None)
創(chuàng)建一個迭代器,返回 iterable 中連續(xù)的鍵和組。key 是一個計算元素鍵值函數(shù)。如果未指定或為 None,key 缺省為恒等函數(shù)(identity function),返回元素不變。一般來說,iterable 需用同一個鍵值函數(shù)預先排序。
groupby() 操作類似于Unix中的 uniq。當每次 key 函數(shù)產(chǎn)生的鍵值改變時,迭代器會分組或生成一個新組(這就是為什么通常需要使用同一個鍵值函數(shù)先對數(shù)據(jù)進行排序)。這種行為與SQL的GROUP BY操作不同,SQL的操作會忽略輸入的順序?qū)⑾嗤I值的元素分在同組中。
返回的組本身也是一個迭代器,它與 groupby() 共享底層的可迭代對象。因為源是共享的,當 groupby() 對象向后迭代時,前一個組將消失。因此如果稍后還需要返回結(jié)果,可保存為列表:
groups = []uniquekeys = []data = sorted(data, key=keyfunc)for k, g in groupby(data, keyfunc):groups.append(list(g)) # Store group iterator as a listuniquekeys.append(k)
groupby() 大致相當于:
class groupby:# [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B# [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC Ddef __init__(self, iterable, key=None):if key is None:key = lambda x: xself.keyfunc = keyself.it = iter(iterable)self.tgtkey = self.currkey = self.currvalue = object()def __iter__(self):return selfdef __next__(self):self.id = object()while self.currkey == self.tgtkey:self.currvalue = next(self.it) # Exit on StopIterationself.currkey = self.keyfunc(self.currvalue)self.tgtkey = self.currkeyreturn (self.currkey, self._grouper(self.tgtkey, self.id))def _grouper(self, tgtkey, id):while self.id is id and self.currkey == tgtkey:yield self.currvaluetry:self.currvalue = next(self.it)except StopIteration:returnself.currkey = self.keyfunc(self.currvalue)
itertools.islice(iterable, stop)
itertools.islice(iterable, start, stop[, step])
Make an iterator that returns selected elements from the iterable. If start is non-zero, then elements from the iterable are skipped until start is reached. Afterward, elements are returned consecutively unless step is set higher than one which results in items being skipped. If stop is None, then iteration continues until the iterator is exhausted, if at all; otherwise, it stops at the specified position.
如果 start 為 None,迭代從0開始。如果 step 為 None ,步長缺省為1。
Unlike regular slicing, islice() does not support negative values for start, stop, or step. Can be used to extract related fields from data where the internal structure has been flattened (for example, a multi-line report may list a name field on every third line).
大致相當于:
def islice(iterable, *args):# islice('ABCDEFG', 2) --> A B# islice('ABCDEFG', 2, 4) --> C D# islice('ABCDEFG', 2, None) --> C D E F G# islice('ABCDEFG', 0, None, 2) --> A C E Gs = slice(*args)start, stop, step = s.start or 0, s.stop or sys.maxsize, s.step or 1it = iter(range(start, stop, step))try:nexti = next(it)except StopIteration:# Consume *iterable* up to the *start* position.for i, element in zip(range(start), iterable):passreturntry:for i, element in enumerate(iterable):if i == nexti:yield elementnexti = next(it)except StopIteration:# Consume to *stop*.for i, element in zip(range(i + 1, stop), iterable):pass
itertools.pairwise(iterable)
返回從輸入 iterable 中獲取的連續(xù)重疊對。
輸出迭代器中 2 元組的數(shù)量將比輸入的數(shù)量少一個。 如果輸入可迭代對象中少于兩個值則它將為空。
大致相當于:
def pairwise(iterable):# pairwise('ABCDEFG') --> AB BC CD DE EF FGa, b = tee(iterable)next(b, None)return zip(a, b)
3.10 新版功能.
itertools.permutations(iterable, r=None)
連續(xù)返回由 iterable 元素生成長度為 r 的排列。
如果 r 未指定或為 None ,r 默認設(shè)置為 iterable 的長度,這種情況下,生成所有全長排列。
The permutation tuples are emitted in lexicographic order according to the order of the input iterable. So, if the input iterable is sorted, the output tuples will be produced in sorted order.
Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeated values within a permutation.
大致相當于:
def permutations(iterable, r=None):# permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC# permutations(range(3)) --> 012 021 102 120 201 210pool = tuple(iterable)n = len(pool)r = n if r is None else rif r > n:returnindices = list(range(n))cycles = list(range(n, n-r, -1))yield tuple(pool[i] for i in indices[:r])while n:for i in reversed(range(r)):cycles[i] -= 1if cycles[i] == 0:indices[i:] = indices[i+1:] + indices[i:i+1]cycles[i] = n - ielse:j = cycles[i]indices[i], indices[-j] = indices[-j], indices[i]yield tuple(pool[i] for i in indices[:r])breakelse:return
permutations() 的代碼也可被改寫為 product() 的子序列,只要將含有重復元素(來自輸入中同一位置的)的項排除。
def permutations(iterable, r=None):pool = tuple(iterable)n = len(pool)r = n if r is None else rfor indices in product(range(n), repeat=r):if len(set(indices)) == r:yield tuple(pool[i] for i in indices)
當 0 <= r <= n ,返回項個數(shù)為 n! / (n-r)! ;當 r > n ,返回項個數(shù)為0。
itertools.product(\iterables, repeat=1*)
可迭代對象輸入的笛卡兒積。
大致相當于生成器表達式中的嵌套循環(huán)。例如, product(A, B) 和 ((x,y) for x in A for y in B) 返回結(jié)果一樣。
嵌套循環(huán)像里程表那樣循環(huán)變動,每次迭代時將最右側(cè)的元素向后迭代。這種模式形成了一種字典序,因此如果輸入的可迭代對象是已排序的,笛卡爾積元組依次序發(fā)出。
要計算可迭代對象自身的笛卡爾積,將可選參數(shù) repeat 設(shè)定為要重復的次數(shù)。例如,product(A, repeat=4) 和 product(A, A, A, A) 是一樣的。
該函數(shù)大致相當于下面的代碼,只不過實際實現(xiàn)方案不會在內(nèi)存中創(chuàng)建中間結(jié)果。
def product(*args, repeat=1):# product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy# product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111pools = [tuple(pool) for pool in args] * repeatresult = [[]]for pool in pools:result = [x+[y] for x in result for y in pool]for prod in result:yield tuple(prod)
在 product() 運行之前,它會完全耗盡輸入的可迭代對象,在內(nèi)存中保留值的臨時池以生成結(jié)果積。 相應地,它只適用于有限的輸入。
itertools.repeat(object[, times])
Make an iterator that returns object over and over again. Runs indefinitely unless the times argument is specified.
大致相當于:
def repeat(object, times=None):# repeat(10, 3) --> 10 10 10if times is None:while True:yield objectelse:for i in range(times):yield object
A common use for repeat is to supply a stream of constant values to map or zip:
>>> list(map(pow, range(10), repeat(2)))[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
itertools.starmap(function, iterable)
Make an iterator that computes the function using arguments obtained from the iterable. Used instead of map() when argument parameters are already grouped in tuples from a single iterable (when the data has been “pre-zipped”).
The difference between map() and starmap() parallels the distinction between function(a,b) and function(*c). Roughly equivalent to:
def starmap(function, iterable):# starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000for args in iterable:yield function(*args)
itertools.takewhile(predicate, iterable)
創(chuàng)建一個迭代器,只要 predicate 為真就從可迭代對象中返回元素。大致相當于:
def takewhile(predicate, iterable):# takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4for x in iterable:if predicate(x):yield xelse:break
itertools.tee(iterable, n=2)
從一個可迭代對象中返回 n 個獨立的迭代器。
The following Python code helps explain what tee does (although the actual implementation is more complex and uses only a single underlying FIFO queue):
def tee(iterable, n=2):it = iter(iterable)deques = [collections.deque() for i in range(n)]def gen(mydeque):while True:if not mydeque: # when the local deque is emptytry:newval = next(it) # fetch a new value andexcept StopIteration:returnfor d in deques: # load it to all the dequesd.append(newval)yield mydeque.popleft()return tuple(gen(d) for d in deques)
Once a tee() has been created, the original iterable should not be used anywhere else; otherwise, the iterable could get advanced without the tee objects being informed.
tee 迭代器不是線程安全的。當同時使用由同一個 tee() 調(diào)用所返回的迭代器時可能引發(fā) RuntimeError,即使原本的 iterable 是線程安全的。
該迭代工具可能需要相當大的輔助存儲空間(這取決于要保存多少臨時數(shù)據(jù))。通常,如果一個迭代器在另一個迭代器開始之前就要使用大部份或全部數(shù)據(jù),使用 list() 會比 tee() 更快。
itertools.zip_longest(\iterables, fillvalue=None*)
創(chuàng)建一個迭代器,從每個可迭代對象中收集元素。如果可迭代對象的長度未對齊,將根據(jù) fillvalue 填充缺失值。迭代持續(xù)到耗光最長的可迭代對象。大致相當于:
def zip_longest(*args, fillvalue=None):# zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-iterators = [iter(it) for it in args]num_active = len(iterators)if not num_active:returnwhile True:values = []for i, it in enumerate(iterators):try:value = next(it)except StopIteration:num_active -= 1if not num_active:returniterators[i] = repeat(fillvalue)value = fillvaluevalues.append(value)yield tuple(values)
如果其中一個可迭代對象有無限長度,zip_longest() 函數(shù)應封裝在限制調(diào)用次數(shù)的場景中(例如 islice() 或 takewhile())。除非指定, fillvalue 默認為 None 。
itertools 配方
本節(jié)將展示如何使用現(xiàn)有的 itertools 作為基礎(chǔ)構(gòu)件來創(chuàng)建擴展的工具集。
The primary purpose of the itertools recipes is educational. The recipes show various ways of thinking about individual tools — for example, that chain.from_iterable is related to the concept of flattening. The recipes also give ideas about ways that the tools can be combined — for example, how compress() and range() can work together. The recipes also show patterns for using itertools with the operator and collections modules as well as with the built-in itertools such as map(), filter(), reversed(), and enumerate().
A secondary purpose of the recipes is to serve as an incubator. The accumulate(), compress(), and pairwise() itertools started out as recipes. Currently, the iter_index() recipe is being tested to see whether it proves its worth.
基本上所有這些西方和許許多多其他的配方都可以通過 Python Package Index 上的 more-itertools 項目 來安裝:
python -m pip install more-itertools
Many of the recipes offer the same high performance as the underlying toolset. Superior memory performance is kept by processing elements one at a time rather than bringing the whole iterable into memory all at once. Code volume is kept small by linking the tools together in a functional style which helps eliminate temporary variables. High speed is retained by preferring “vectorized” building blocks over the use of for-loops and generators which incur interpreter overhead.
def take(n, iterable):"Return first n items of the iterable as a list"return list(islice(iterable, n))def prepend(value, iterator):"Prepend a single value in front of an iterator"# prepend(1, [2, 3, 4]) --> 1 2 3 4return chain([value], iterator)
當前標題:創(chuàng)新互聯(lián)Python教程:itertools—-為高效循環(huán)而創(chuàng)建迭代器的函數(shù)
標題鏈接:http://m.fisionsoft.com.cn/article/djopspd.html


咨詢
建站咨詢
