Miscellaneous Topics

Notes on various Python topics.

zip( ) and dict( )

Python stands out for built-in functions that handle tasks which take real effort in other languages. The functions zip( ) and dict( ) illustrate this well.

Suppose we have two parallel lists: one of apple types and one of each type's typical color.

>>> apple_types = ['Red delicious', 'Yellow delicious', 'Granny Smith']
>>> colors = ['red', 'yellow', 'green']

We can use zip( ) to pair apple type with color.

>>> pairs = zip(apple_types, colors)
>>> list(pairs)
[('Red delicious', 'red'), ('Yellow delicious', 'yellow'), ('Granny Smith', 'green')]

Combine zip( ) with dict( ) to get a dictionary instead.

>>> appleDict = dict(zip(apple_types, colors))

>>> appleDict
{'Red delicious': 'red', 'Yellow delicious': 'yellow', 'Granny Smith': 'green'}

In Python 3.x, zip( ) returns a zip object rather than a list of pairs, as it did in Python 2.x. That's why we had to call list(pairs) in the code snippet above, which generates all of the elements of the zip object.

The zip object is a type of iterator. The tricky thing about iterators is that you can generate their elements only one time.

>>> pairs = zip(apple_types, colors)

>>> list(pairs)
[('Red delicious', 'red'), ('Yellow delicious', 'yellow'), ('Granny Smith', 'green')]

>>> list(pairs)
[]

You see that, once all its elements have been generated, the zip object is exhausted. So if you want to use the pairs more than once, you need to store them in a list first.

>>> pairs = zip(apple_types, colors)

>>> pairList = list(pairs)

>>> pairList
[('Red delicious', 'red'), ('Yellow delicious', 'yellow'), ('Granny Smith', 'green')]

# You can reuse pairList.
>>> pairList
[('Red delicious', 'red'), ('Yellow delicious', 'yellow'), ('Granny Smith', 'green')]

# pairs, however, is empty.
>>> list(pairs)
[]

You can iterate through a zip object one element at a time, just as if it were a list. Here again, though, you can visit the pairs only once; after that, the iterator is exhausted and a second loop prints nothing.

>>> pairs = zip(apple_types, colors)
>>> for pair in pairs:
...     print(pair)
... 
('Red delicious', 'red')
('Yellow delicious', 'yellow')
('Granny Smith', 'green')

>>> for pair in pairs:
...     print(pair)
...
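Since each element of the zip object is a tuple, the loop can also unpack it into separate names. The following is a minimal sketch that reuses the apple lists from above; the names apple, color and descriptions are chosen here purely for illustration.

```python
apple_types = ['Red delicious', 'Yellow delicious', 'Granny Smith']
colors = ['red', 'yellow', 'green']

descriptions = []
# Unpack each (apple type, color) tuple directly in the for statement.
for apple, color in zip(apple_types, colors):
    descriptions.append(apple + ' is ' + color)

print(descriptions)
```

Building the zip object inside the for statement also sidesteps the single-use problem, because a fresh iterator is created each time the loop runs.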

"Unzipping": Given a list of tuples, you can split them up with zip( ) and the * operator.

>>> pairList
[('Red delicious', 'red'), ('Yellow delicious', 'yellow'), ('Granny Smith', 'green')]

>>> unzipped = list(zip(*pairList))

>>> unzipped
[('Red delicious', 'Yellow delicious', 'Granny Smith'), ('red', 'yellow', 'green')]
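Because zip(*pairList) yields one tuple per position, the result can also be unpacked straight into separate names. A small sketch, assuming the pairList from above; the names types_back and colors_back are made up for this example. Note that the pieces come back as tuples, not lists.

```python
pairList = [('Red delicious', 'red'),
            ('Yellow delicious', 'yellow'),
            ('Granny Smith', 'green')]

# The * operator passes each pair to zip( ) as a separate argument.
types_back, colors_back = zip(*pairList)

print(types_back)
print(colors_back)

# Zipping the two tuples again reproduces the original pairs.
assert list(zip(types_back, colors_back)) == pairList
```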

String Formatting with F-Strings

As of version 3.6, Python has f-strings, a quick way of formatting strings that embed variable values.

>>> str_17 = "seventeen"

>>> dig_17 = 17

>>> print(f"'{dig_17}' is the same as '{str_17}'.") 
'17' is the same as 'seventeen'.

Here's what happens if you forget the 'f':

>>> print("'{dig_17}' is the same as '{str_17}'.") 
'{dig_17}' is the same as '{str_17}'.

And here's what happens if you try it on a version of Python older than 3.6:

>>> print(f"'{dig_17}' is the same as '{str_17}'.")
  File "<stdin>", line 1
    print(f"'{dig_17}' is the same as '{str_17}'.")
                                                 ^
SyntaxError: invalid syntax

>>> import sys

>>> sys.version_info
sys.version_info(major=3, minor=5, micro=9, releaselevel='final', serial=0)
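If a script might end up on an older interpreter, one option is to check sys.version_info up front and fail with a clearer message. The sketch below is only an illustration: the function name require_python and its error text are assumptions, not part of any library. The check has to live in a file that contains no f-strings itself, because the SyntaxError is raised when a file is parsed, before any of its code runs.

```python
import sys

def require_python(minimum=(3, 6)):
    """Raise a clear error if the interpreter is older than `minimum`."""
    if sys.version_info < minimum:
        raise RuntimeError("Python %d.%d or newer is required" % minimum)
    return True

# f-strings were introduced in Python 3.6, so that is the default minimum.
require_python()
```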

Ilya Kamen has a blog post on Medium that explains why f-strings are better than other formatting options. He also introduces a library that converts these other formats to f-strings, for the sake of code readability.
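For a rough sense of the difference, here is the earlier example written three ways. This is a generic sketch, not taken from the blog post mentioned above; the names percent_style, format_style and f_style are made up for the comparison.

```python
str_17 = "seventeen"
dig_17 = 17

# Three ways to build the same string.
percent_style = "'%d' is the same as '%s'." % (dig_17, str_17)
format_style = "'{}' is the same as '{}'.".format(dig_17, str_17)
f_style = f"'{dig_17}' is the same as '{str_17}'."

assert percent_style == format_style == f_style

# f-strings also accept expressions and format specifiers inside the braces.
print(f"{dig_17 + 1} comes after {str_17}.")  # 18 comes after seventeen.
print(f"{dig_17 / 4:.2f}")                    # 4.25
```

With an f-string, each value sits at the spot where it appears in the output, so there is no argument list to keep in sync with the placeholders.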

PyInstaller

It's often clumsy to run a Python script outside of your normal development environment, for example on a network machine. I always end up checking out my repository on the network machine and running the script from the working directory.

A better solution to this problem is PyInstaller. This program lets you build an executable from a Python program on Windows, Linux and Mac OS X.

In a Nutshell

Activate your virtual environment and install pyinstaller into it.

$ source venv/bin/activate

(venv) $ pip install pyinstaller

You run pyinstaller from the command line. Let's suppose my project has a directory structure something like this.

- MyProject
    - NER
        - __init__.py
        - NerService.py
    - venv

NerService is a web service program inside a Python package called NER. My virtual environment lives in the venv directory. I want the output, the executable file, to go into the root of my project, so I run PyInstaller from there.

$ pwd
/.../workspace/MyProject

$ source venv/bin/activate

(venv) $ pyinstaller --onefile NER/NerService.py

After the run, the root of my project contains two new directories, build and dist, in addition to a few other files.

- MyProject
    - NER
        - __init__.py
        - NerService.py
    - build
    - dist
        - NerService
    - venv

The executable file is dist/NerService. Since I did the build on a Linux system, I can copy that file to another Linux machine and run it with ./NerService. I don't even need to activate a virtual environment.

Troubleshooting

Regarding portability, there's one thing to be aware of: an executable built by PyInstaller may not run on a machine whose system-level C libraries are older than those of the machine it was built on. An executable built on an older machine will run on a newer one; however, an executable built on a newer machine might fail on an older machine with a message like the following:

/lib64/libc.so.6: version `GLIBC_2.14' not found (required by /tmp/_MEIHvp19q/libz.so.1)

The upshot is that you should build your executable on the oldest version of Linux you mean to support.
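One way to compare machines is to ask Python which C library version it is running against. The following is a minimal sketch using the standard library's platform.libc_ver( ); run it on the build machine and on the target, and build on whichever reports the older version. The example value in the comment is illustrative, not taken from a real run.

```python
import platform

# libc_ver( ) returns a (library, version) tuple, for example ('glibc', '2.17').
# On systems where it cannot tell, both strings may be empty.
lib, version = platform.libc_ver()

if lib:
    print(lib + ' ' + version)
else:
    print('C library version could not be determined')
```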

For any other issues, and for the many options the pyinstaller program offers, see the PyInstaller Manual.

Close to the Machine