About the title
Python - Cookbook study CHAPTER 1 - Data Strucutres and Algorithms - PCB1 for short
Something to say before everything begins
I plan to have a python cookbook study (David Beazley & Brian K. Jones). Aim to have a deeper understanding Python coding tech. And for better studying, I will make a notebook for studying process, I will upload this part to github, if I inadvertently encroaching on the interests of anyone, please contact me in time, I will delete the related information immediately.
1.1. Unpacking a Sequence into Separate Variables
Unpack N-element tuple or sequence into a collection of N variables
data = ['ACME', 50, 91.1, (2012, 12, 21)] |
and if there is a mismatch in the number of elements, you’ll get an error.
ValueError: need more than 2 values to unpack |
throwaway variable name
If you want to discard certain values.
data = ['ACME', 50, 91.1, (2012, 12, 21)] |
1.2. Unpacking Elements from Iterables of Arbitrary Length
Problems of “too many values to unpack”
Python use “star expressions” to address the problems when there are too many values to unpack.
def drop_first_last(grades): |
The unpacked variables will be returned as a list (including none)
record = ('Dave', 'dave@example.com', '773-555-1212', '847-555-1212') |
example 2 is to put the star expressions at the first one in the list.
*trailing_qtrs, current_qtr = sales_record |
and
*trailing, current = [10, 8, 7, 1, 9, 5, 10, 3] |
star syntax can be especially useful when iterating over a sequence of tupls of varying length. For example, a sequence of tagged tuples:
records = [('ffo', 1, 2), |
star unpacking combined with certain kinds of string processing operations, such as splitting.
line = 'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false' |
combined with a common throwaway variable name, such as _ or ign (ignored).
record = ('ACME', 50, 123.45, (12, 18, 2012)) |
Split a list into head and tail components with star syntax
items = [1, 10, 7, 4, 5, 9] |
apply the star unpacking to recursive algorithm
(not recommand, because of Python is not strong about inherent recursion limit)
def sum(items): |
1.3. Keeping the Last N Items
To keep a limited history of the last few items seen during iteration or during some other kind of processing. It’s a perfect use for a cllections.deque.
What is deque
Related codes and text translated from HERE.
- Similirity with list
# deque provides the same function with part of list
from collections import deque
d = deque() #Create a 'deque' sequence
d.append(3)
d.append(8)
d.append(1)
d
d = deque([3, 8, 1])
len(d)
3
d[0]
3
d[-1]
1 - The use of pop of deque
d = deque('12345')
d = deque(['1', '2', '3', '4', '5'])
d.pop()
5
d.leftpop()
1 - The ‘length limit fuction’ of deque
d = deque(maxlen = 3)
for i in range(30):
d.append(str(i)):
d.append(str(i))
d
d = deque(['10', '11', '12'], maxlen = 3) - Extend the list to deque
d = deque([1, 2, 3, 4, 5]) |
Yield?
Codes and text is tranlated HERE
Another thing to concern is the yield, here I find some explain about this function.
At first, you can just think yield as return for easy understanding. And next, think it as a generator
def foo(): |
Here you may understand the relationship between yield and return, yield is a generator but not a function. There is a function of yield, next(), means which function to generate next step, and this time the fuction will continue from where previous next() stops, and when call the next(), generator will not begin from foo(), but from the previous stopped point. Then when meet the yield again, return the generated number, and this stip will stop.
Then another example about send()
def foo(): |
Here, <-**-> The fucntion of g.send() includes g.next(), and different from g.next(), it can pass the value 7 to res.
Why need the yield?
If there is a very big data, like (0 ~ 1000).
for n in range(1000): |
|
It is the same with xrange(1000):
for n in xrange(1000): |
However, in python3, range() is xrange() already. So you don’t need to care about this.
After the preparing knowledge about yield as well as the deque. The following code performs a simple text match on a sequence of lines and yields the matching line along with the previous N lines of context when found:
from collections import deque |
When writing code to search for items, it is common to use a generator function involving yield, this decouples the process of searching from the code that uses the reults. And using deque(maxlen=N) creates a fixed-sized queue. When new items are added and the queue is full, the oldest item is automatically removed.
More generally, a deque can be used whenever you need a simple queue structure. If you don’t give it a maximum size, you get an unbounded queue that lets you append and pop items on either end. Adding or popping items from either end of a queue has O(1) complexity. This is unlike a list where inserting or removing items from the front of the list is O(N).
1.4. Finding the Largest or Smallest N Items
The heapq module has two functions, nlargest() and nsmallest() — that do exactly what you want.
import heapq |
A key parameter that allows them to be used with more complicated data structures is accepted.
portfolio = [ |
If N is small compared to the overall size of the collection and you are looking for the N smallest or largest items.
They work by first converting the data into a list where items are ordered as a heap. For example,
nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2] |
The most important feature of a heap, heap[0] is **always the smallest item. Moreover, subsequent items can be easily found using the heapq.heappop() method. For example, to find the three smallest items, you can do this:
heapq.heappop(heap) |
If you just want to find the single largest or smallest, max() and min() is faster.
If N is about the same size as the collection itself, sort first and take a slice (sorted(items)[:N] or sorted(items)[-N:] is faster).
1.5. Implemeting a Priority Queue
How to implement a queue that sorts items by a given priority and always returns the item with the highest priority on each pop operations.
import heapq |
Make data a priority, item tuple will help make a comparision between two items.
a = Item('foo') |
<-**-> Here index will function as a proper way to handle the problems when two items possess the same priority.
a = (1, 0, Item('foo')) |