Showing posts with label Python Weekly. Show all posts
Showing posts with label Python Weekly. Show all posts

Saturday, 7 May 2016

Python Weekly #12 - Decorators 101

Decorators

Decorators 101 - how they work, and how to write them.

For someone new to Python, Decorators are one of the more baffling of the language features. In this post, I will explore why one might need decorators, and how to write them.
Simply put a decorator is a way to extend the functionality of a function or a method without changing the internal implementation of the function or crucially without changing the calling signature.

Examples of when to use decorators include :
  • To add logging to multiple functions
  • To add common code safety checks to multiple functions
  • To transform function arguments or return values

Calling Signature

The calling signature is the list of arguments that you pass to the function, and the values that the function returns. So long as the calling signature is unchanged, other code can continue to call the function as normal, as if they decorator hadn't been applied.

The Basic Principles

In python everything is an 'object' - this includes numbers, strings, and crucially here, functions. Since functions are objects in Python they can be passed as arguments to other functions, and they can be returned by other functions. Objects are `bound` to names - so this code
>>> spam = 'Spam and Eggs.'
Creates a string object - and binds it to the name spam. Names given to functions are no different. A function is an object, which is bound to a name. This means that we can do some interesting things with functions :
  • We can rename them, by assigning a different name
  • We can pass them as arguments to other functions
  • We can store them in lists, dictionaries or anywhere else
  • We can call the function from anywhere we have the function object, so long as we honour the calling signature. It is irrelevant what name we currently use, or if the function has a name at all.  
  • We can return functions from functions
It is these properties that allow us to write decorators.

Functions as arguments

>>> def greeting():
...    print 'Hello from John, Paul, George & Ringo'

>>> def logme( function ):
...    print 'INFO: Calling {} function'.format( function.__name__)
...    return function()

>>> logme( greeting )
INFO : Calling greeting function
Hello from John, Paul, George & Ringo
We are not quite at a decorator yet - but we have been able to pass a function (the greeting function) as an argument into the logme function, and then call it from the logme function. You will notice that the logme function does not use greeting by name - but it uses it's own function argument.

Although this works, it isn't very useful, as we have to remember to call the logme function, and pass the greeting function as argument, each time we want this function call to be logged, and if the greeting function had it's own arguments we would need a different version of 'logme', it would get messy to call greeting each time. Finally it is no longer obvious from the code logme( greeting ), that we are even calling the greeting function at all - all in all it makes the code more difficult to write and to read.
Before we get to a full decorator, we have one more python feature to explore - nested functions :

Nested Functions

>>> def outer( arg):
...    def inner( arg2):
...        print  'inner called with {}'.format(arg2)
...        return arg*arg2 + 1
...    print  'outer called with {}'.format(arg)
...    return inner( arg  + 1) 
...
>>> outer( 5 )
outer called with 5
inner called with 6
31
>>> inner( 6 )
Traceback (most recent call last):
  File "", line 1, in 
NameError: name 'inner' is not defined
There are a few things to note here :
  • The inner function has access to the arguments passed to the outer function. If you try it you will find that inner can change the value of arg, but that change is not visible in the outer function, even after it calls inner.
  • The inner function can only be accessed inside the outer function. As the error message shows inner can't be called from elsewhere.
Instead of having outer return the result of calling inner, what if outer just returned the inner function itself without calling it:

>>> def outer( arg):
...    def inner( arg2):
...        print  'inner called with {}'.format(arg2)
...        return arg * arg2 + 1
...    print  'outer called with {}'.format(arg)
...    return inner
...
>>> five = outer( 5 )
outer called with 5
>>> six = outer( 6 )
outer called with 6
Now the outer function doesn't return a value - it returns the inner function, and the inner function is never called (well not yet). Strictly speaking the outer function returns a function object. In our code above we call outer twice, and bind the results to the names five and six respectively. We know that the original inner object was expecting a single argument (arg2), and we know that five and six are bound to a function object so - we should be able to call them :
>>>five(2)
11
>>>five(3)
16
>>>six(2)
13
>>>six(3)
19
We can see that calling five and six generate different results - and working through the code, you can see that the five function behaves as we would expect if arg is 5, and five function behaves as we would expect if arg is 6. You will find this to always be true - when you define nested functions, and the inner function uses arguments or variables from the outer function, then the inner function will be locked to the values as they are when the inner function is defined. So lets glue this all together - and build a decorator

Our first decorator

>>> def logme( func ):
...     def wrapper( *args, **kwargs ):
...         print 'INFO : calling {}( {}, {} )'.format( function.__name__, args, kwargs)
...         return func( *args, **kwargs )
...     return wrapper
...
>>> def greeting( name ):
...     print 'Hello {}'.format( name )
...
>>> def goodbye( name ):
...     print 'Goodbye {}'.format( name )
...
>>> greeting( 'Tony' )
Hello Tony
>>> goodbye( 'Tony' )
Goodbye Tony
>>> greeting = logme( greeting)
>>> goodbye = logme( goodbye)
>>> greeting( 'Tony' )
INFO : calling greeting( (Tony,), {})
Hello Tony
>>> goodbye( 'Tony' )
INFO : calling goodbye( (Tony,), {})
Goodbye Tony
If you aren't sure - work through the logme function. Notice that it will return the wrapper function, which in turn uses the func argument which was passed to logme. If you haven't seen the *args and **kwargs notation - this is argument unpacking. You can see at the bottom of the code snippet that we redefine the names greeting and goodbye (remember function names are simply names, and they can be used and reused as we wish). We now have new functions which have added functionality, and we can add this functionality very easily to any function we wish. This ability is so useful, there is a very simple syntax which removes the need for use to redefine the function names :
>>> def logme( func ):
...     def wrapper( *args, **kwargs ):
...         print 'INFO : calling {}( {}, {} )'.format( function.__name__, args, kwargs)
...         return func( *args, **kwargs )
...     return wrapper
...
>>> @logme
>>> def greeting( name ):
...     print 'Hello {}'.format( name )
...
>>> @logme
>>> def goodbye( name ):
...     print 'Goodbye {}'.format( name )
...
>>> greeting( 'Tony' )
INFO : calling greeting( (Tony,), {})
Hello Tony
>>> goodbye( 'Tony' )
INFO : calling goodbye( (Tony,), {})
Goodbye Tony
The @logme line before each function is a special syntax which causes the function to be wrapped in the decorator (in this case the 'logme' decorator), and saves you having to redefine the function name each time. Just remember that
>>> @logme
>>> def greeting( name ):
...     pass 
...
... # The code does EXACTLY the same as the code below 
...
>>> def greeting( name ):
...     pass 
...
>>> greeting = logme(greeting)
You should now have a better understanding of how to write simple decorators, and how they work. The key rules for these simple decorators are :
  1. The Outer function takes one argument - that is the function which will be decorated
  2. The name of the outer function is the name of the decorator
  3. The Outer function is called whenever the decorator is used - i.e. whenever the `@logme' is used in the above example.
  4. The Outer function should only return the inner function. 
  5. The inner function takes the same arguments as the function being decorated (or more usefully *args & **kwargs)
  6. In general it is the inner function which implements the new functionality (either before or after calling the function)
  7. The Inner function can return anything at all, but ideally it should return the same type as the function being decorated (or raise an exception) - remember the calling signature. 
  8. The inner function is called every time the decorated function is called.
There will be other posts in this Decorator series - watch this space.

Saturday, 26 December 2015

Weekly Python #11 - Duck Typing

Duck Typing

If it walks like a duck, and quacks like a duck ....

If you are new to Python you may well have heard of Duck Typing but you might not be too sure of what it means.  In this article we will explore Duck Typing with one of the most powerful concepts in Python : Iterators, a very powerful example of duck typing.

An Iterator is any object which implements functionality to allow the user to move through a collection of data one item at a time. Iterator behaviours are both simple to use and implement. Many parts of the python standard library implement iterators : strings, lists, dictionaries, tuples, generators, and even files.

Because every iterator shares the same behaviour, then your application can write one function which will work with every single iterator - no matter what type it is, and your application can implement it's own iterators, and the same functions which work on standard library iterators, will work the same way on your iterator too.

That is the theory, lets see something in practice :

>>> def dup( iterator ):
...     "A Generator which will duplicate each item in the given iterator"
...     for item in iterator:
...         yield item
...         yield item
The function above will take any iterator, and will return a generator which yields each item in the original duplicated. This function relies on Duck Typing: it needs whatever is passed in as the iterator argument to be implemented as an Iterator: It will work on any standard iterator (Note that Iterables (such as Lists, strings, tuples) are also Iterators, where as generator expressions, files are only Iterators).
>>> # A list of numbers
>>> [i for i in dup([0,1,2,3,4,5,6,7,8,9])]
[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9]
>>>
>>> # A String
>>> [i for i in dup("abcdef")]
['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd', 'e', 'e', 'f', 'f']
>>>
>>> # A Tuple
>>> [i for i in dup( (0,1,2,3,4,5,6) )]
[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
>>>
>>> # A Generator expression
>>> [i for i in dup( (i for i in range(0,7) )]
[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
This function would also work if you passed in an open file object, and would generate a duplicate for every line in the file, and with a dictionary it will generate a duplicated sequence of keys.

With a language such as C or C++, you will struggle to implement such a generic function, which would operate on strings, vectors, arrays and files, since C is a statically typed language which means you need to declare ahead the arguments that your function, and strings, vectors, arrays are all different types.

With Duck typing, as we have seen it isn't the type that matters, it is the behaviour (in this case data which behaves like an Iterable). In general you don't even worry about testing that the arguments are the correct/expected type (let the lower levels worry about raising exceptions and cascading them upwards); if your code is using hasattr or isinstance to check the type or behaviour exists before you use it, then you aren't utilizing the full power of Duck-Typing.

 Custom Errors

You might consider that if you need to raise a custom error rather than use the exception generated by the  lower level code, then the right thing to do is to check that the attributes have the right behaviour before you use them :

>>> def dup( iterator ):
...     "A Generator which will duplicate each item in the given iterator"
...      if not hasattr(iterator,"__iter__"):
...           raise TypeError("You can only duplicate items from an iterator")
...
...      for item in iterator:
...          yield item
...          yield item

Although this will work, it is not the best use of the language, and not very Pythonic (i.e. best practice for Python code). A Better solution is to use the power of Duck Typing, and exception handling.
>>> def dup( iterator ):
...     "A Generator which will duplicate each item in the given iterator"
...      try:
...          for item in iterator:
...              yield item
...              yield item
...      except TypeError:
...          raise TypeError("You can only duplicate items from an iterator")
Not only is this version best practice (if you really insist on a custom Exception message), but it will be quicker as it wont need to execute the hasattr test each time the function is called.

using isinstance or hasattr

There is one case where using isinstance or hasattr is completely valid thing to do (and even recommended) and that is when your code is taking data as an argument into a class __init__ which needs to be an expected type, but your code doesn't use that data until much later (as it is always better to report errors as soon as possible).

Tuesday, 22 December 2015

Weekly Python #10 - Employing disabled workers

Employing disabled workers

This article is a short post which has nothing to do with Python, but is about my other area of interest : Disability and Employment rights.

In most modern jurisdictions (certainly true in the EU), it is an obligation of employees to treat disabled workers fairly and not show any bias due to their disability. The definition of disability is pretty wide; for instance in the UK it is generally defined as anyone with a recognized chronic (i.e. long term) condition.

For an employer with worker who becomes disabled, it is responsibility of the employer to make reasonable adjustments to the job to ensure that the worker can still do their role, but what does the obligation to fairness mean when you are looking to take on a brand new employee.

During both CV and interview stage a prospective employee should concentrate on the skills of the applicants - and what they can bring to the organization, rather than what they might not be able to do based on their disability. All relevant skills and experience should be seen as positives for that candidate, and training needs are clearly costs for employing that candidate. My understanding of the law [1] suggests that if there are extra costs associated with employing a disabled candidate that these should not be counted against that person unless they would be unreasonable for the company to bear.

I have an example recently which is worth recounting :

My current employer (I am not giving any names) has a strategy to try to bring everyone together into a number of key locations (in order to improve collaboration, and reduce infrastructure and building costs). The strategy has clearly stated exceptions for employees who are disabled or otherwise unable to be based in one of these locations. I recently had a telephone interview for a design role: being a member of a team completing software and network designs.

During the interview though, I wasn't really asked about my skills or experience, or what I could bring to the role. The interviewer focused almost entirely on the company's buildings strategy and how me working from home was incompatible with that strategy, and that it was "a waste of time" to talk to me any further. The role had no operational need for everyone to be be based in the same building; no secure networks, no key customers to work with daily. There was no rationale given for why being in the office was essential to the role, although I can understand that one person working from home will impact how the team works together (most of which could be overcome with an appropriate use of technology). The only "explanation" given was the building strategy, despite it not being mandatory for all employees.

I am not suggesting that every role can be executed efficiently by everyone (a person in a wheelchair would probably find it difficult to be a scaffold rigger), but there are many cases where a disabled person would be able to do a given role just as efficiently as a non-disabled person, and therefore the critical decision should be whether that person has the right mix of technical, business and personal skills to do the role.

I accept that there is sometimes a fine line between two candidates and I am not suggesting that an employer should always choose the disabled candidate, but the employer should be looking at a disability as something to adjust to, rather than something which prevents the job from being done.

[1] I am not an employment lawyer, or a trained recruiter. I am a s/w designer and developer with 27 years experience of systems and network experience, and 2 years experience of living with a disability.



Sunday, 13 December 2015

Weekly Python #9 - Adding to a list - only one way ?

Adding to a list

Is there really only one way ?

It is one of the principles of python that "There should be one-- and preferably only one --obvious way to do it." You will notice though when it comes to many of the parts of the standard language, that there is actually many different ways to do somethings - for instance adding a single item to a list.
>>> a = [0,1,2,3,4,5,6,7,8,9,10]
>>> a.append(11)
>>> a += [12]
>>> a = a + [13]
>>> a.extend([14])
>>> a
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14]
All of these appear to do the same things, but behind the scenes they all do different things - and have different advantages.

a.append 

Using the append method is by far the most obvious method : It is very readable (and readability is key when writing software). It is implemented a single method call on the list method, and once the append is complete there is no return value. it all makes for a lightning quick execution
$ python -m timeit -n 1000000 -s 'a = [i for i in range(1000)]' 'a.append(1001)'
1000000 loops, best of 3: 0.0452 usec per loop
0.0452 micro seconds (a micro second is one millionth of a second) to append 1,000,000 elements to a list.

Using +=

For some people this idiom is just as readable as an append but it is actually a bit less efficient, as there a few things that have to be done before the item can be appended to the list.
  • the element (12 in our case) has to be made into a list
  • The __iadd__ method is called
  • The __iadd__ method does a type check to ensure that the argument is a list
$ python -m timeit -n 1000000 -s 'a = [i for i in range(1000)]' 'a += [1001]'
1000000 loops, best of 3: 0.081 usec per loop
Using the += method takes nearly 2 times longer than using append for adding single elements.

It would probably be far more efficient to use this method to add another existing list (rather than looping around the list and appending each element) - we will look at this later

Using + 

A first glance doing a = a + [13] is going to be similar in performance to the previous statement a += [12] but there is a lot more going under the skin here.
  • the element (12 in our case) has to be made into a list
  • The __add__ method is called
  • The __add__ method does a type check to ensure that the argument is a list
  • The __add__ method creates a brand new empty list to hold the result of the addition, and will need to copy the original a list and argument list into this new empty list. The name a will be bound to the new list returned by the __add__ method.
These difference go some way to explaining to performance differences :
$ python -m timeit -n 1000000 -s 'a = [i for i in range(1000)]' 'a = a + [1001]'
10000 loops, best of 3: 25.4 usec per loop
So using a straight list addition as above - it takes over 500 times longer to add a single element to a list; and it can't be recommended for use. It will also use more memory (as it creates this brand new list and fills it, before the old one is unbound), and this could also be significant for your application.

Note

This significant performance impact seems to not be connected to the size of the original list - even adding a single element to an empty list takes a similar amount of time, and creation of a new empty list doesn't seem to be the issue either. At the time of writing the performance impact seems to be caused by the re-binding of the new list to one of the names in the expression.

The only advantage is that if you have another name bound to the original list, then that original list wont be changed - but there are better ways to ensure that this happens - by taking a copy of the original list before you change it :
>>> a = [0,1,2,3,4,5,6,7,8,9,10]
>>> b = a[:]            # Take an explicit shallow copy of a
>>> a.append(11)        # Use append - which we know is a lot quicker
>>> a, b
[0,1,2,3,4,5,6,7,8,9,10,11], [0,1,2,3,4,5,6,7,8,9,10]
>>>

extend

The extend method is intended to be used to combine two lists together; a.extend(x) is equivalent to a += x (where a and x are both lists). We can also use extend to add single elements to a list :
$ python -m timeit -s 'a = [i for i in range(1000)]' 'a.extend([1001])'
10000000 loops, best of 3: 0.0924 usec per loop
So extend is approximately equivalent to using a += x - as expected.

Adding multiple elements efficiently

So far we have looked at adding single elements to a list, but often our code needs combine two lists; so it would be useful to look at the relative performance of a += x vs a.extend(x} where x is a list.
$ python -m timeit -s 'a = [i for i in range(1000)]' -s 'b=[i for i in range(1001,2001)]' 'a += b'
100000 loops, best of 3: 2.25 usec per loop
$ python -m timeit -s 'a = [i for i in range(1000)]' -s 'b=[i for i in range(1001,2001)]' 'a.extend(b)'
100000 loops, best of 3: 2.3 usec per loop
So within a margin of error the two are roughly equivalent. Just out of interest how much slower is appending each element :
$ python -m timeit -s 'a = [i for i in range(1000)]' -s 'b=[i for i in range(1001,2001)]' 
'for e in b:' '   a.append(e)'
10000 loops, best of 3: 47.2 usec per loop
Appending each element of a 1000 element list is 23 times slower than adding the two lists together, and it is not unreasonable to expect that the gap between the two approaches will increase as the lists get longer.

Conclusion

Although (on the face of things) there are 4 different methods for adding an element to a list, and at least 3 for combining lists together, there is really only one way that you should use; append for single items and extend of combining lists. The guiding "One obvious way" principle espoused by Python experts still holds, although sometimes it isn't obvious which way is the obvious one.

Friday, 4 December 2015

Python Weekly #8 - I know Python - now what ?

I know Python - so now what ?

In this article I wanted to write a few words about a question that I have seen regularly on a number of different support forums. The question is generally of the form given in the title : New programmers have adsorbed the syntax and control structures of Python, and now want to know what to do with it.

Have you really learned Python ?

The first thing that springs to mind is whether the person asking the question has really learned the language at all. The syntax of Python is relatively simple, and the basic data types (str, list, int, bool etc) are relatively easy to learn and intuitive, but Python is so much more than that - for instance - do you know what this does before trying it :
>>> import antigravity 
This is just one (albeit just a bit of fun) of over 100 different importable modules that come delivered with python (The Standard Library) that allows you to do wonderful things with it. Beyond that there is the world of pyPi (python Package index) - a world of over 70,000 modules and packages developed across the world and freely available for you to use in  a few easy steps.

These package provide functionality that far extends the basic syntax, and it is all reusable for free.

Now I am not suggest you should know everyone of the 70,000+ modules on pyPi or all the details even of the 100+ modules that form the Standard Library, but you should be aware that they exist, how to find them, how to read the documentation, and at least have a working knowledge of some of the main ones (See my previous article Batteries included - 5 Standard Library Packages everyone should know). In the case of the pyPi contributions you should also know how to install these to your local environment (Hint : you will need to use pip).

Practice, Practice, Practice

The best way to learn anything and get proficient at it is to continue to practice. There are a number of ways to hone your skills in Python and these include :
  • Project Euler : "a series of challenging mathematical/computer programming problems that will require more than just mathematical insights to solve. Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required to solve most problems". A heavy mathematical bent, but well worth trying some to hone your skills, especially when it comes to writing efficient and performant code. There are 526 Project Euler problems created at the time of writing.
  • The Python Challenge : Billed as "The first programming riddle on the internet", this is a chain of problems which are solved by deciphering a set of clues, and then writing a short scripts to calculate/derive the answer - which is then the form of the URL for the next riddle". For the most part they rely on the standard library. There are 33 steps to the riddle currently.
  • Python programming challenges : A small set of 7 programming challenges set to support the UK GCSE Computer Programming qualification (GCSE is aimed at students aged around 16).
  • CodingBat Python problems : "CodingBat is a free site of live coding problems to build coding skill in Python created by  who is computer science lecturer at Stanford. The coding problems give immediate feedback, so it's an opportunity to practice and solidify understanding of the concepts.
These (and similar sites) allow you to practice without the difficulty of trying to come up with your own ideas - and if you had your own "Big Idea" then you wouldn't be reading this article.

What are you interested in ?

If you really want to get moving on your own big idea, then my advice is to look around you - is there something in your life, that you do that would benefit from a computer program to help you do it. This could be something from your work environment or your home life (Caution if you are going to tackle a work based problem - check that you wont be in breach of any work based security rules before you start. I would hate for you to loose your job because you followed my advice, identified a work problem and  installed Python on your work PC/laptop without permission).

The reason behind this advice is simple: If you need something, (or even think you need something) it is far more likely that you will work on it until it is complete (or at least good enough). If you start on something that you don't need, or you are not enthused about, you will be far less likely to complete it.

I will give you an example based on my first application developed in 2008.  I am an amateur photographer, and I use flickr to store the majority of my pictures. At the time the flickr uploader on the website was clunky, slow, unreliable and required a lot of work post upload to add tags, titles, to the pictures, as well needing to add pictures to folders etc. It was not suprising then that my first project was to write my own PC based uploader, that allowed me to take groups of pictures - add them into folders, add tags, titles, descriptions,  change permissions etc. and then to upload them to flickr at a single click.  This application required me to work with a number of different libraries and packages to complete the functionality including :
  • pyGTK  - Python bindings for the Gnome Toolkit - A fulll featured GUI framework
  • PIL - Python Image Library - an image manipulation toolset
  • flickrapi - A Python interface onto the http based Flickr API.
I completed this - and used it "in anger" for several years (fixing bugs and adding more features) as I went along  until in around 2012 the web site based uploader was re-developed and is now far better than it was. My pyFlickr application although complete is now never used.

I have to admit that my development directory is littered with programming projects which I have started but never completed - mainly because at the end of the day I didn't need it.

Get Involved

If you haven't got a project that you need to write, but you still want to keep your Python skills fresh, then look around for a project that others have started but that you can work on (by it's very nature much of the work published on pyPi and other places is Open Source, and many projects are looking for collaborators, either to fix bugs or assist in developing new features). Find a project that interests you and contact the authors.

Many of the bigger projects will require some proof of experience and knowledge of python - a project you have completed yourself - or contributions to other projects. Nearly all projects will require you to have a good working knowledge of one of the python testing frameworks and knowledge of how to use the chosen method of change control, and I would suggest that you should be proficient in being able to profile your code in terms of performance and memory usage, and in being able to measure metrics like code coverage etc.

Sunday, 29 November 2015

Python Weekly #7 - An Easy to use extend plugin framework

An easy to extend plugin framework

In a number of my projects I have made several iterations of developing a plugin framework - that is, a way that functionality can be extended easily (by me, or anyone else) without having to explicitly editing the main code.
An example might be a game where the user has to manage a number of different types of resources produced by different types of buildings, and with specialist people/aliens/trolls etc staffing those buildings and using those resources. With an efficient plugin system, it is relatively easy to imagine  the base game defining a starting set of buildings, resources, and types of workers/aliens etc, and then be able to extend the game adding new buildings, new resources etc by simply adding a new plugin, and without anyone editing the main game.
A plugin system has to have the following characteristics : 
  1. Automatic Discovery : Newly added plugins should be able to be automatically discoverable - i.e. the application should be able to find new plugins easily without any user intervention
  2. Clear Functionality : It should be obvious to the application what "type" of functionality the plugin adds (using the game example above does the plugin add new resources, new buildings or new workers - or all 3 ?).
  3. Is there a simple way for the user to use the plugin once it is added; for instance are plugins added to existing menus, or do they create obvious new menus etc.
This article is going to describe a simple framework that can be used to extend your python applications. The framework certainly addresses the first two points of the list above, and I will give you some pointers on the 3rd item, as there is no generic solution to it - it really does depend on your application

Source Code

Source code to accompany this article is published on GitHub : plugin framework. This includes a package which encapsulates the code within a simple to use class, an example very simple application (showing how to use the framework), and a very simple skeleton plugin, showing the very simple requirements that any plugin should implement to work with this framework.

1 -Automatic Discovery

This is actually three parts; can we find the code that forms the plugin, can we load this code (a python module or package), and can we identify which parts of this loaded code are actually the plugins and which are supporting code only being used by the plugin.

Finding the python code

Perhaps unsurprisingly, this is the simplest problem to address - the application can keep all of the plugins in a standard directory (or maybe two - a system wide and user specific directory) :
import sys
import os

def get_plugin_dirs( app_name ):
    """Return a list of plugin directories for this application or user

    Add all possible plugin paths to sys.path - these could be <app_path>/plugins and ~/<app_name>/plugins 
    Only paths which exist are added to sys.path : It is entirely possible for nothing to be added.
    return the paths which were added to sys.path
    """
    # Construct the directories into a list
    plugindirs = [os.path.join(os.path.dirname(os.path.abspath(sys.argv[0])), "plugins"),
                  os.path.expanduser("~/.{}/plugins".format(app_name) )]

    # Remove any non-existant directories
    plugindirs =  [path for path in plugindirs if os.path.isdir(path)]
  
    sys.path = plugindirs + sys.path
    return plugindirs

Note

The get_plugin_dirs function presented above relies heavily on the os.path library, since this is the most portable way to ensure that the application correctly constructs valid file paths etc.

We have a list of zero or more directories which may contain plugin code, so - lets identify code in those directories.
In Python code could exist as either :
  • An uncompiled python file, with the `.py` extension.
  • A compiled python file, with the `.pyc` extension
  • A C extension, with `.so` extension (or similar)
Thankfully - python makes it very easy to identify all of these files : use imp.get_suffixes() (in python 3.5 you should use importlib.get_suffixes()). Because of the features we want to use later we actually only want to use the python files (compiled and uncompiled) - and not any of the C extensions.

Plugins written in C ?

If you are adept enough to write an extension in C which you want to use as a plugin, then you can also easily write one or more wrappers in Python around you C extension code so that it complies with our framework - more on that later.
import imp
import importlib

def identify_modules(dir_list):
    """Generate a list of valid modules or packages to be imported

    param: dir_list : A list of directories to search in
    return: A list of modules/package names which might be importable
    """
    # imp.get_suffixes returns a list of tuples : (<suffix>, <mode>, <type>)
    suff_list = [s[0] for s in imp.get_suffixes() if s[2] in [imp.PY_SOURCE, imp.PY_COMPILED]]
      
    # By using a set we easily remove duplicated names - e.g. file.py and file.pyc
    candidates = set()

    # Look through all the directories in the dir_list
    for dir in dir_list:
        # Get the content of each dir - don't need os.walk
        dir_content = os.listdir(dir)

        # Look through each name in the directory
        for file in dir_content:

            # Does the file have a valid suffix for a python file
            if os.path.isfile(os.path.join(dir,file)) and os.path.splitext(file)[1] in suff_list:
                candidates.add(os.path.splitext(file)[0])
              
            # Is the file a package (i.e. a directory containing a __init__.py or __init__.pyc file 
            if os.path.isdir(os.path.join(dir, file)) and
                      any(os.path.exists(os.path.join(dir, file, f)) for f in ["__init__"+s for s in suff_list]):
                candidates.add(os.path.splitext(file)[0])
    return candidates 
In the final discovery step - we need to see if any of the identified files actually implement a plugin, and for this step we can use a hidden gem of the Python Standard Library - the inspect library. The inspect provides functionality to look inside python modules and classes, including ways to list the classes within modules, and methods in classes (and a lot more besides). We are also going to make use of a key feature of Object Oriented programming - inheritance. We can define a basic PluginBase class, and use the inspect library to look at each of out candidate modules to find a class which inherits from the plugin class. In order to comply with our framework, the classes which implement our plugins must inherit from PluginBase.

Location of PluginBase

Currently we are presenting our framework as a set of functions - without identifying a module etc. If our plugin classes are going to inherit from PluginBase, then our framework, and especially PluginBase will need to exists in a place where it can be easily imported by our plugin modules. This is achieved by having the PluginBase class defined in a top level module, or a module in a top level package. (A top level module/package is one that exists directly under one of the entries in sys.path).
import inspect
import importlib

class PluginBase(object):
    @classmethod
    def register(cls_):
        """Must be implemented by the actual plugin class
           
           Must return a basic informational string about the plugin
        """
        raise NotImplemented("Register method not implemented in {}".format(cls_))

def find_plugin_classes(module_list):
   """Return a list of classes which inherit from PluginBase
   param: module_list: a list of valid modules - from identify_modules
   return : A dictionary of classes, which inherit from PluginBase, and implement the register method
            The class is the key in the dictionary, and the value is the returned string from the register method
   """
   cls_dict = {}
   for mod_name in module_list:
       m = importlib.import_module(mod_name)
       for name, cls_ in inspect.getmembers(m, inspect.isclass):
           if issubclass(cls_, PluginBase):
              try:
                 info = cls_.register()
              except NotImplemented:
                  continue
              else:
                  cls_dict[cls_] = info

   return cls_dict
And there we have it - the basis of plugin characteristic #1- That the plugin is automatically discoverable. Using the code above - all that the Plugin implementation needs to do is to be in a module which exists in one of the two plugin directories, be a class which inherit from the PluginBase class, and implement a sensible register method.

2 - Clear Functionality

The clear functionality characteristic is incredibly easy to implement, again using inheritance. Your application will have a number of base classes which define the basic functionality that each element of your game will implement, so just ensure that your plugin classes inherit from one of these classes.

import collections

def categorise_plugins(cls_dict, base_classes):
    """Split the cls_dict into one or more lists depending on which base_class the plugin class inherits from"""
   
   categorise = collections.defaultdict(lambda x: {}) 
   for base in base_classes:
       for cls_ in cls_dict:
          if issubclass(cls_,base):
             categorise[base][cls_] = cls_dict[cls_]
   return categorise
We can put all of this together into a useful helper class for the loading and unloading of the plugin functionality - see plugin_framework on GitHub for the full implementation, and a demonstration simple application.

3 - Simple to Use

How your application makes the plug-in simple to use and accessible really does depend on your application, but as promised here are  some pointers :
  • The register() method on the plugin-class could be used to return information on which menus etc this plugin should appear - or even if a new menus, toolboxes etc should be created to allow access to this plugin. 
  • In most cases the plugin class should also allow itself to be instantiated, and each instance may well be in a different state at any given time. The class therefore will need to implement methods to allow the the application to use those instances, to change their state etc.
It is up to the application to define the expected interface that is expected of each different BaseClass (i.e. the attributes, classmethods, staticmethods and instance methods). This definition should be clearly documented.

Saturday, 21 November 2015

Python Weekly #6 : Regular Expressions - a starter for 10

Regular expressions

or "How I stopped worrying and learned to love regular expressions"
(with apologies to Stanley Kubrick)

In a previous post (Batteries Included - 5 modules every python developer should know) I mentioned that everyone should learn about regular expressions and the re package in the Standard Library. I will be completely honest at this point and say that when I wrote that post, I really did not understand the package myself, but I did understand how important it is, and that I really should learn it one day. That day is now - I have started to learn regular expressions, and some of my early experiments are documented in this article.

In general terms, a regular expression (or regex) is a way of executing a very flexible search (and potentially replace) type operation against a piece of text, without having to worry about parsing each character at a time. Python has the re package which implements a powerful set of regex syntax, and I will be using this in this article.

Useful sites

A few useful sites which would be worth bookmarking :
This article will work through building a regex to extract telephone numbers from some arbitary text. Throughout this article, I am going to use the findall method to find matching sub strings. Depending on your uses, it could be more advantageous to use some of the other methods within the re package. For instance if you just need the first match, it is likely to be much more efficient to use the search method, or if you plan to simply separate the text into various sections - then the split method is probably what you need.
Lets start by simply trying to find numbers within your text, and the regex will build with complexity and functionality as we go.
>>> import re
>>> def find_nums_re(text_str):
>>>     return re.findall(r"\d+", text_str)
>>>
>>> find_nums( "Call us on 0133 4456 789 or 07664 282 777 between 8am to 6pm (08:00 to 18:00)")
['01334', '456', '789', '07664', '282', '777', '8', '6', '08', '00', '18','00']
And there is the power of regex in a nutshell - one simple line. Let's explain that one line :
return re.findall(r"\d+", text_str)
The key part of this is the first argument to the findall - this is the regular expression (regex):
  • r : this signifies a raw string - which means that python won't try to do anything with the '\' characters within the string - i.e. what gets passed to the findall method is exactly what you type. It is strongly recommended that you always use raw strings when constructing regexs.
  • \d : A special regex sequence which specifically matches a single digit
  • + : A special character (a repeater) which means that you should match at least one or more of the previous matched sequence - i.e. we match one or more digits
The second argument to the findall method is termed the 'target text', which is the text to be searched or matched. The findall method searches the target text to find all the non-overlapping sub strings with match the regex - in our case all substrings which are numbers. This regex is greedy, which means that each match is the longest possible substring it can be - which is what we wanted here.

Greedy vs not-greedy

Many discussions and tutorials use the terms 'greedy' and 'non-greedy' without clear explanations. Hopefully these examples will make things clear.
  • A regex of r"\d+" with a target text of "1234" will match "1234" - it is a greedy match, and uses the longest possible string.
  • A regex of r"\d+?" with a target text of "1234" will match "1" - it is non-greedy match - and uses the shortest possible string.
As you can see - findall returns a list of the matched sub-strings if there are any, or any empty list if there are no matches, but we don't actually want all of those matches (as you can see it has found the text relating to the opening times, as well as just the phone numbers.

Let's start building up a more complex regex - that will just extract the phone numbers (or at least things that look like phone numbers). From now on I will use the verbose flag - so that we can document the regex as we go.
Lets start with our first format - that will match a phone number of the format nnnn nnnn nnn (4 digits, 4 digits, 3 digits), with simple spaces separating the various number groups.
>>> import re
>>> def find_nums_re(text_str):
>>>     return re.findall(r"""
...         (?x)              # Verbose mode
...         \d{4}             # 4 digits
...         \s                # space
...         \d{4}             # 4 digits
...         \s                # space
...         \d{3}           # 3 digits - followed by a non digit
...         """, text_str)
>>>
>>> find_nums( "Call us on 0133 4456 789 or 07664 282 777 between 8am to 6pm (08:00 to 18:00)")
['01334 456 789']
This seems to do what we want - but just for one format, and it would probably be inefficient to do separate searches for each possible format, so we will use the '|' combination operator to combine multiple searches into a single regex
>>> import re
>>> def find_nums_re(text_str):
>>>     return re.findall(r"""
...         (?x)                   # Verbose mode
...         \d{4}\s\d{4}\s\d{3}    # nnnn nnnn nnn
...         |                      # Or
...         \d{5}\s\d{3}\s\d{3}    # nnnnn nnn nnn
...         """, text_str)
>>>
>>> find_nums( "Call us on 0133 4456 789 or 07664 282 777 between 8am to 6pm (08:00 to 18:00)")
['01334 456 789', '07664 282 777']
There is no limit to how complex the search sequence can be between the '|' characters, and brackets are not needed to group together the regex patterns either side of the '|'.
 Our regex has found both of the phone numbers, and importantly it no longer finds the time information. So how could we extend it ? In the uk - all full format numbers will always start with a zero :
>>> import re
>>> def find_nums_re(text_str):
>>>     return re.findall(r"""
...         (?x)                   # Verbose mode
...         0\d{3}\s\d{4}\s\d{3}    # 0nnn nnnn nnn
...         |                      # Or
...         0\d{4}\s\d{3}\s\d{3}    # 0nnnn nnn nnn
...         """, text_str)
>>>
>>> find_nums( "Call us on 0133 4456 789 or 07664 282 777 between 8am to 6pm (08:00 to 18:00)")
['01334 456 789', '07664 282 777']

We continue to add functionality : In the UK we have a habit of putting brackets around the local code (the first block of numbers), but the brackets are optional - so let's extend our regex:
>>> import re
>>> def find_nums_re(text_str):
>>>     return re.findall(r"""
...     (?x)                    # Verbose mode
...     \(?                     # Optional opening bracket
...     0\d{3}                  # 0nnn dialing code
...     \)?                     # Optional closing bracket
...     \s\d{4}\s\d{3}          # remainder of number : nnnn nnn
...     |                       # Or
...     \(?                     # Optional opening bracket
...     0\d{4}                  # 0nnnn dialing code 
...     \)?                     # Optional closing bracket
...     \s\d{3}\s\d{3}          # remainder of number : nnn nnn
...     """, text_str)
>>>
>>> find_nums( "Call us on 0133 4456 789 or 07664 282 777 between 8am to 6pm (08:00 to 18:00)")
['01334 456 789', '07664 282 777']
>>> find_nums( "Call us on (0133) 4456 789 or 07664 282 777 between 8am to 6pm (08:00 to 18:00)")
['01334 456 789', '07664 282 777']
There is a problem - this simple regex is very naive - and will match a number which is probably malformed : 
>>> import re
>>> find_nums( "Call us on 0133) 4456 789 or 07664 282 777 between 8am to 6pm (08:00 to 18:00)")
['01334) 456 789', '07664 282 777']
and this is probably not what you want to happen - it certainly looks wrong, so lets make our regex more robust :
>>> import re
>>> def find_nums_re(text_str):
>>>     return re.findall(r"""
...      (?x)                    # Verbose mode
...      (                       # Outer Group 0nnn nnnn nnn or 0nnnn nnn nnnn
...          (                   # Group 1 - 
...               \(0\d{3}\)     # match (0nnn)
...               |              # or
...               0\d{3}         # match 0nnn
...          )                   # Close Group 1
...          \s\d{4}\s\d{3}      # remainder of number : nnnn nnn
...          |                   # Or
...          (                   # Group 2 
...               \(0\d{4}\)     # match (0nnnn)
...               |              # or
...               0\d{4}         # match 0nnnn
...          )                   # Close Group 2
...          \s\d{3}\s\d{3}      # remainder of number : nnn nnn
...      )                       # Close Outer Group
...          """, text_str)
>>>
>>> find_nums( "Call us on 0133 4456 789 or 07664 282 777 between 8am to 6pm (08:00 to 18:00)")
 [('0133 4456 789', '', ''), ('07664 282 777', '', '')]
>>> find_nums( "Call us on (0133) 4456 789 or (07664) 282 777 between 8am to 6pm (08:00 to 18:00)")
 [('(0133) 4456 789', '(', ''), ('(07664) 282 777', '', '(')]
>>> find_nums( "Call us on 0133) 4456 789 or 07664 282 777 between 8am to 6pm (08:00 to 18:00)")
 [('07664 282 777', '', '')]
Because we are now using groups - the findall method returns a list of tuples (where each element of the tuple is associated with a group in the regex, in the same order that the group starts in the regex - if you are not sure simply count along group starting '(' in your regex - be beware of other '(' in your regex which don't start groups.

Beware

One issue that is commonly faced is that the regex syntax uses the same symbols for lots of different things, and they can be tough to read. This is why the verbose mode is so useful, as you can clearly annotate exactly what your regex does, and where groups start (and end).
In our case we have a outer group which surrounds all of our regex - and this appears as the 1st element in our tuple. The other elements are associated with the groups we have used to capture the opening brackets, and they can be ignored in this example.

This seems like the finished article - we can detect both common number formats (with and without brackets), and we can reject numbers with incorrect brackets. But - there is an issue (we are ignoring that there are a number of other UK number formats or UK numbers with the international prefix). We tested for a misplaced closing bracket - but what about a misplaced opening bracket ?
>>> find_nums( "Call us on (0133 4456 789 or 07664 282 777 between 8am to 6pm (08:00 to 18:00)")
 [('(0133 4456 789', '' ''), ('07664 282 777', '', '')]
By including another test case we can see that our current regex will ignore that one of our numbers has a leading brace and no closing brace, and match against it anyway. The problem we have is that although the regex allows for brackets around the dialling code, it doesn't exclude a '(' in the case which is meant to be matching the un-bracketed local code, so lets fix our regex. We will use a special syntax called a Negative Lookbehind Assertion : the syntax is (?<!...). This looks back along the string being matched and generates a positive result if the preceeding characters do not match the given pattern - in our case (?<!\() will only match if the previous character is not a '(' - which is exactly what we need.
>>> import re
>>> def find_nums_re(text_str):
>>>     return re.findall(r"""
...      (?x)                    # Verbose mode
...      (                       # Outer Group 0nnn nnnn nnn or 0nnnn nnn nnnn
...          (                   # Group 1 - 
...               \(0\d{3}\)     # match (0nnn)
...               |              # or
...               (?<!\()0\d{3}         # match 0nnn
...          )                   # Close Group 1
...          \s\d{4}\s\d{3}      # remainder of number : nnnn nnn
...          |                   # Or
...          (                   # Group 2 
...               \(0\d{4}\)     # match (0nnnn)
...               |              # or
...               (?<!\()0\d{4}         # match 0nnnn
...          )                   # Close Group 2
...          \s\d{3}\s\d{3}      # remainder of number : nnn nnn
...      )                       # Close Outer Group
...          """, text_str)
>>>
>>> find_nums( "Call us on 0133 4456 789 or 07664 282 777 between 8am to 6pm (08:00 to 18:00)")
 [('0133 4456 789', '0133', ''), ('07664 282 777', '', '07664')]
>>> find_nums( "Call us on (0133) 4456 789 or (07664) 282 777 between 8am to 6pm (08:00 to 18:00)")
 [('(0133) 4456 789', '(0133)', ''), ('(07664) 282 777', '', '(07664)')]
>>> find_nums( "Call us on 0133) 4456 789 or 07664 282 777 between 8am to 6pm (08:00 to 18:00)")
 [('07664 282 777', '', '07664')]
>>> find_nums( "Call us on (0133 4456 789 or 07664 282 777 between 8am to 6pm (08:00 to 18:00)")
 [(('07664 282 777', '', '07664')]
This now seems to do exactly what we expect - well formed phone numbers are extracted from an arbitrary piece of text. The lesson from this step is if there is an optional substring, your regex might need two paths: one which looks for the optional substring, and a second path which works on the premise that the option substring. There is one final issue to address, which can be illustrated by the following test case :
>>> find_nums( "Call me on 0133 4456 78977")
 [('0133 4456 789', '', '')]
Our regex matches numbers which are too long, but which contain a valid number format - we need to ensure that there are no digits after the last one we match. This is relatively easy - We can use a Negative Lookahead Assertion : the syntax is (?!...). This looks back forward along the string being matched and generates a positive result if the next characters do not match the given pattern - in our case (?!\d) will only match if the next characters are not digits - which is exactly what we need.
>>> import re
>>> def find_nums_re(text_str):
>>>     return re.findall(r"""
...      (?x)                        # Verbose mode
...      (                           # Outer Group 0nnn nnnn nnn or 0nnnn nnn nnnn
...          (                       # Group 1 - 
...               \(0\d{3}\)         # match (0nnn)
...               |                  # or
...               (?<!\()0\d{3}      # match 0nnn
...          )                       # Close Group 1
...          \s\d{4}\s\d{3}(?!\d)    # remainder of number : nnnn nnn
...          |                       # Or
...          (                       # Group 2 
...               \(0\d{4}\)         # match (0nnnn)
...               |                  # or
...               (?<!\()0\d{4}      # match 0nnnn
...          )                       # Close Group 2
...          \s\d{3}\s\d{3}(?!\d)    # remainder of number : nnn nnn
...      )                           # Close Outer Group
...          """, text_str)
>>>
>>> find_nums( "Call us on 0133 4456 789 or 07664 282 777 between 8am to 6pm (08:00 to 18:00)")
 [('0133 4456 789', '0133', ''), ('07664 282 777', '', '07664')]
>>> find_nums( "Call us on (0133) 4456 789 or (07664) 282 777 between 8am to 6pm (08:00 to 18:00)")
 [('(0133) 4456 789', '(0133)', ''), ('(07664) 282 777', '', '(07664)')]
>>> find_nums( "Call us on 0133) 4456 789 or 07664 282 777 between 8am to 6pm (08:00 to 18:00)")
 [('07664 282 777', '', '07664')]
>>> find_nums( "Call us on (0133 4456 789 or 07664 282 777 between 8am to 6pm (08:00 to 18:00)")
 [('07664 282 777', '', '07664')]
>>> find_nums( "Call me on 0133 4456 78977")
 []

Non-capturing regex

There is a concept called a non-capturing regex, the syntax is (?:...). The intention is that these patterns match against the 'target text', consume the characters that are matched, but unlike ordinary groups, the results aren't captured as a separate group in the results (i.e. no extra elements in the reported tuple from the findall method, or reported as group in the results from the search or match methods.
At first glance it seems that we could have used them instead of the Lookahead or Lookbehind assertions (which we used in the last two iterations), as all we want to do is ensure that certain substrings exist (or don't), but we don't care what they are. But, in reality, there is a wrinkle :
It is true that uhe non-capturing group (?[^\(]) will only match if there isn't a '(', and there wont be a group created for whatever is there, but the character sequence that is there will be reported as part of the containing group, and since the entire regex is treated as group, the character sequence will be reported as the content of the regex - which is not what we want in this case.
The main benefit of a non-capturing regex is that they can be added to existing sequences without impacting the numbering of the existing groups, and there is a small performance improvement.

Final Comments

It should be noted that we are only testing our regex pattern with 5 test cases - which is definitely not sufficient for production software, but it does seem like we have most of the issues solved.

For those of you who like picture, this is a visualization (created by Debugex) of our final regex.

Saturday, 14 November 2015

Python Weekly #5 : Batteries Included - 5 Standard Library Packages every python developer should know

Batteries Included

5 Standard Library Packages every python developer should know

One of the reasons that python is incredibly popular is that as a language it is relatively fully formed out of the box. By fully formed I mean that many of the low level, and even medium level functionality is already developed and included in the Standard Library (this is what is often referred to as Batteries Included). The Standard Library is also very well documented for all versions of python - so it is difficult to go wrong.

It is also often the case that even if the standard library doesn't include what you need - someone else may have already published what you need : (See the pyPi - Python Package Index - for a complete list of everything that has been published by other developer - 69350 packages so far. It can be tricky to search if you aren't sure exactly how to describe what you need).

In this blog I am going to summarise 5 packages within the standard library that every developer should learn - in no particular order

re : Regular expression matching

Many applications will require some form of text proceesing, and if you have to go any more complex than finding a simple string in a piece of input, then you should learn how to use regular expressions - as implemented by the re module.
Regular expressions are a fantastic way to fuzzy search a text string, and extract sections of the text. In a future blog post I will cover the module in more detail.
Despite the power of regular expressions, some text problems are not really conducive to using regular expressions - for instance parsing XML, HTML or CSS : In these cases it is far better to use dedicated libraries to parse those types of input.
A word of warning though : regular expressions are very powerful, and can be complex. They will take time to learn and even longer to master - they are worth it - there are plenty of tutorials available.

datetime : Manipulation of dates and time stamps

Why bother with datetime ? -  very simply (unless you are asked to do it as a college assignment) life is far too short to try to write your own module to handle date and times. Seemingly simple rules can become very complex when you have to take into account leap years, leap seconds, time zones, different formats, and all the other nuances. Don't make the effort, even to write a simple module, when the datetime module (and it's partners calendar) already exist, and are known to work.

logging : flexible event logging system

The logging module is a flexible and standard method for getting useful information from your application, instead of including lots of print statements/functions in your code while developing, and then removing them before delivery (and risking breaking your code). With the logging module in use - just change your logging level - and your debug logging will cease - although the best way to debug is to actually use a debugger, in combination with your logging information, and of course your automated test cases.

argparse : Command line argument parse

Many complex applications will support a command line with one or more options and arguments. The argparse package within the standard library contains all the tools you need to parse the command line and extract the arguments and options. You will need to write code to identify invalid combinations, or invalid arguments, although in many cases this should not be that complex.

unittest : A unit testing framework

I mentioned in my previous blog post : 10 things to learn as a python developer, and again in : Incremental Development, it is vital that you use some technique to allow you to run a consistent set of tests against your code, to ensure that it works and continues to work as you develop it. There are plenty of mechanisms that do this, but I would strongly recommend that you use unittest, a very powerful set of tools which allow you to manage and execute your tests. 

Saturday, 7 November 2015

Python Weekly # 4 : Scoping - arguments and variables.

Python Scoping

Arguments, variables and other things

In this article I wanted to explore something of the rules which govern python scoping. Scoping is how Python decided when and where a name is valid and accessible. In Python names are not the same as objects or data, so even if the name is no longer valid, the object may well still exist (it depends on how many other references there are to the object - in simple terms how many other names are bound to that object or how many times the object appears in a dictionary or list).

When a python program executes, scopes are created for each module that is imported, and each class that is defined (although there is a twist to class scopes explained below). A scope is also created each time a function or method runs. You can think of a scope as like a dictionary which matches a name to the object that it is bound to.  As the program runs, python keeps track of the various scopes which are created, in a nested fashion, and at any point, there one more scopes which exist.

The scoping rules are actually relatively simple : 
  • When a name is bound to an object it is created in the inner most scope by default. If a function is being executed, than a name will be created in that function's scope, unless the name has been listed on a global statement
  • When a name is referenced (i.e. not created), the name is searched for  inner most scope first, and then going outwards towards the module and builtin scopes
A name is bound when :
  • It is the name of a module which is imported
  • It is the name of a class which is being defined
  • It is the name of a function which is being defined
  • It is the name of an argument to a function which is being executed.
  • It is a name which appears on the left hand side of an assignment
  • It is a name which appears as the loop variable in a for loop 
  • It is a name which appears as in an except statement.
Some examples would be helpful here :
  • A function defined in a module : If a name is defined in the function which matches a name defined in a module, then the version defined in the function will be used in the function (unless the global statement is used for that name) - that includes any arguments which are defined for that function.
  • If a function is defined in another function : The inner function can refer to any names defined in the outer function (or of course the module/builtin scope), but the outer function can't refer to names in the inner function. If the inner function rebinds a name used in the outer function, that doesn't change the binding made in the outer function :
     
    
    >>> def outer():
    >>>    a = 1
    >>>    def inner():
    >>>        a = 2
    >>>        print "Inner ",a
    >>>    print
    >>>    print "Outer a",a
    >>>    inner()
    >>>    print "Outer b",a
    >>>
    >>> outer()
    
    Outer a 1
    Inner 2
    Outer b 1
    
     
    
  • If a function is defined in a class (i.e. a method), then the method cannot access anything at the class scope, without using either the instance or class identity and qualifying the name - i.e using either self.name or self.__class__.name (or something similar) - this is the class scope twist that was mentioned above. The decision to use a qualified name (rather than just the name) is so that there is a single way that your code refers to or rebind names in the outer scope (contrast that with the nested function example above where the inner function has no way to rebind the name defined in the outer scope).
These scoping rules are fairly sensible (the principle of "least surprise" is common in Python - meaning code should always do what the developer expects it to do) and doesn't hold that many surprises to people already used to other programming languages. However, unlike in some other languages there are no compiler/interpret warnings if you define a name which hides/masks one of the builtin names. In theory that means you could redefine one of the very critical functions - like open (although it is not recommended unless you really know what you are doing, as you can easily break things).

Beware !!!

In Python 2.7 any name defined in a list comprehension is treated exactly as if the name had been defined in a for loop, or similar.
 
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
>>> l = [a for a in range(10)]
>>> a
9
 
This is one example of where the "least surprise" principle isn't actually maintained, especially when a similar generator expression does not do the same - in this case trying to access the generator loop variable does not succeed:
 
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
>>> l = [(a for a in range(10))]
>>> a 
NameError: name 'a' is not defined 
 
This surprise is due to how Python2.7 implements the list comprehension in the first example, and this implementation issue is resolved in Python 3. It is definitely not recommended that you write any form of code that relies on this implementation detail in Python 2.7, as it is simply not clear what the expected value should be, and your code will break when translated from Python 2 to Python 3.

Saturday, 31 October 2015

Python Weekly # 3 : Incremental development

Incremental Development

or how a long journey should start with a single step

Unlike most of the other articles in this blog,  this article really isn't about python, but the language does lend itself to this style of project.

When you start a software project, it is very easy to dive straight in and start coding - with a good idea of the end goal, but maybe with not much of an idea of how you get from a empty file to the final application. I have made exactly that mistake myself - and my Development folder is littered with half started "big" projects.

It is much more efficient to take small steps, coding, testing, debugging and then repeating until finished. This is Incremental Development, and there are a few good reasons why it works - and works well :
  • You know that each step works - so when you start your next step you are not introducing bugs on top of bugs.
  • You might well surprise yourself and finish early - as you realise you get to point where your application is good enough, and you don't need all the bells and whistle you thought you might need.
  • You get a powerful sense of achievement - each cycle you know your code works and you are one more step along your journey - a bit like seeing distance markers on a highway.
With Incremental Development you follow a few simple rules :
  • Your cycle (design, code, test, debug) should be no more than a couple of weeks long - and if possible even shorter.
  • Target a small number of features on each cycle - maybe only one or two features to begin with.
  • Start your cycle by working out how you will test your feature - and if you can start by writing your test cases first.
  • At the end of the cycle - make sure your latest working code is saved in some form - so if the next cycles goes horribly wrong - you can always go back. The best way to do this is to use a formal change control tool such as git, mercurial or subversion.
  • Within your cycle complete one feature before you do the next one - that way you have progress made even if the 2nd feature doesn't complete.
  • If a feature proves too complex,  drop it from the cycle - and take time to refine the design or split it up by redesigning it in a brand new cycle.
The key to this process is the concept of a feature, which is best defined as follows :
  • A single piece of  functionality which can be described by a simple sentence. One thing your software will do in order to achieve the whole aim.
This is best explained by an example :

One of my first home projects was an application to upload one or more images to flickr. To achieve this the application need to do a number of things including :
  • Connect to a flickr account
  • Get a list of Sets from flickr (a set is a group of images)
  • Upload a image to flickr
  • Add an image to a set
  • Record an optional title for the image
  • and many more (as you might imagine)
The application also included a GUI which needed to be developed which included a number of different elements.

Any one of those bullet points (and any one of the GUI elements) could be described as a "feature" - don't treat the GUI as a single feature - even with very good design tools, GUIs can be very complex, and maybe could be left to last.

Don't be tempted to make your features too complicated; if a features seems too easy (for example recording a title against an image) you will be able to develop more than one feature in a cycle. If you start trying to combine items and call the combination a feature - you run risk of something being far more complex, and you end up with a bigger failed feature.

Language features in python such as classes, modules and packages make it very easy to practise this style of development. Many features can be added as either new methods or new classes and since the language itself does not have a complex syntax which requires a lot of extra punctuation, it is relatively painless to write new code quickly. The python interpreter is also a massive benefit - giving you the change to test new code quickly, even without full test scripts.

To summarise - start slowly in small steps, and test as you go along - don't try to do everything at once. Don't be afraid to go backwards.

Saturday, 24 October 2015

Python Weekly # 2 : 10 things to learn as a Python Developer

10 things to learn as a Python Developer

Whether you are contributing to Open source projects, or simply creating applications for your own use, there are a number of tools that you really should learn - theses tools will make your life so much easier.

These are not recommendations for beginners - In fact I would strongly suggest as a beginner you don't worry about extra tools until you are confident with the basic language capabilities - including how to write Object Oriented Code, and how to separate your code into modules and Python Packages.
  1. An IDE  - A good IDE can shave significant time off your development cycle. There are many good IDEs available : IDLE, Eclipse, PyCharm (to name but a few).  My personal favourite is PyCharm - I will explore why in another post. There are few projects which will insist collaborators using a particular IDE, although of course some work environments may have standard IDEs which you need to use.
  2. Choose a unit test framework or two and get familiar with then. The ability to design, construct and maintain unit tests is invaluable in any serious development project, as you will find your code quality increases (as does your confidence in the code), if you have a set of test cases that you can execute on a regular basis. unittest (Python 3.5), doctest (Python 3.5) and nose (if you are using unittest) are a few examples. 
  3. A mocking framework : Mocking is an incredibly powerful technique which allows you to replace a module with artificial version, allowing you fine control over your testing, and allowing you to test your code in ways that might otherwise be impossible (or at least very difficult) to reproduce. A common mocking framework is mock (from Python 3.3 onwards this is part of the standard library)
  4. Test Driven Development : A very powerful technique which effectively boils down to defining what your code will do before you start writing it. The difference is that instead of defining the code in a document, you define your code in terms of unit tests which will pass once your code is complete. During development regularly run your tests, so you can understand what is left to develop.
  5. A Debugger : It is highly unlikely that your code will be developed bug free - so when you find a bug, knowing how to use the debugger will be significant. Certainly using a debugger well is far more efficient than splattering your code with multiple print statements. Most IDEs (see above) will come with an inbuilt debugger, and of course you can use pdb (Python 3.5). Remember once you find and fix the bug - write a test case for it, and include it in your standard test cases that you run.
  6. Data Persistence : Any reasonably complex project will probably have a need to store and retrieve data between executions, and for that you will need to decide how that data is stored and retrieved. Options include : csv files (Python 3.5), json (Python 3.5), pickle (Python 3.5) or sqlite (Python 3.5). Personally I tend to use pickle for small low volume objects, and sqllite for larger volume data storage.
  7. Documentation : Don't forget to document your code, and do it as you go. Make sure your code is readable, and that the comments and documentation strings explain what your code does and why (in preference to how it works). Also make sure you have good overall documentation on your complete application - README files, user guides etc. It is surprising how quickly you can forget a piece of code that you wrote a few weeks ago - and end up staring at it for hours trying to work out what it does.
  8. Virtualisation : You will probably want to ensure that you isolate your code as you test it, to ensure that you don't destroy your working system. If you don't have a separate test machine, then this could be as simple as creating a clean working directory, but you might well need to use virtualenv or even virtualbox depending on what your code does and how far you want to isolate your testing.
  9. Change Control : Even if you have no intention of publishing your code, making use of a good change control system is highly recommended - if nothing else it provides you a long term undo for all of your code. If you learn to use long lived branches, you can keep your next development independent of your working version. Examples include : git, mercurial, and subversion (also called svn). If you decide to contribute to Open source projects though, you will need to be familiar with the change control system that they use.
  10. Packaging : Even if you are just creating code for your own use, if you want to ensure a clean installation every time, you probably want to look at packaging your code - either using the Python Packaging tools, or Debian Packaging tools (depending on how your code needs to be installed).

Saturday, 17 October 2015

Python Weekly # 1 : Notes on developing an import hook

Notes on developing an import hook

Before we start I should give a warning, this article covers import hooks in python 2.7. The hook mechanism has changed from Python 3.5, although the mechanism in Python 2.7 is still supported.

I am a big fan of using JSON to allow configuration of my application, especially where the configuration is more towards the back end functionality. Many of of my applications have variations of either Class or object factories based on JSON data - using JSON dictionaries as templates for the formation of individual classes or instances, and I soon realised that a more "standard" way of approaching JSON would be ideal.

Using the Python import hook mechanism (See PEP 0302), I set out to develop a way of importing JSON file, and automatically generating classes etc from the data within the JSON - an implementation of data as code.

The result is the importjson library - (available as source on GitHub, and an installable package from PyPi - Python Package Index)

This article documents some of what learned during the development process.

Python exposes two types of import hooks - sys.path_hooks and sys.meta_path :
sys.path_hooks
These hooks are best used when there will be a specific entry on sys.path which only needs the special importer - for instance the zipimport module uses this mechanism.
sys.meta_path
These hooks are appropriate when a special sys.path entry will not exist - i.e. where the things to be imported could exist in anywhere.
It was clear that sys.meta_path is ideal for my purposes, since in my applications often co-locate JSON files with my python files - and I am sure this is the case with many other developers. Also using sys.meta_path mechanism would still work if the JSON files are kept separate from the python files.

The Importer Protocol is well documented in PEP 0302, so I will not cover the details here, but I will cover a few important notes for anyone trying something similar - things I tripped over.

Installation

Your importer not only needs to implement the Importer Protocol, but it will also need to install it self onto either sys.path_hooks, or sys.meta_path.

An entry in sys.meta_path should be an object instance which implements at least the find_module method from the Importer Protocol, while an entry in sys.path_hooks will need to be a callable which when passed a path (from sys.path), returns an object which implements at least the find_module method from the the Importer Protocol (the callable can clearly be a class which accepts the sys.path entry as an argument to the constructor).

Reuse and caching

Regardless of whether the Importer is installed on sys.path_hooks or sys.meta_path the instance will be reused, so care should be taken when deciding which data should be stored on a per instance basis. 

Due to the way that Python caches sys.path_hook entries (sys.path_importer_cache), once an Importer successfully finds one module within an sys.path entry the same Importer will be used to load all other modules in the same sys.path entry. This caching make a sys.path_hook type importer not suitable for situations where many types of importable file might exist within a given directory.

Translation of module names

The Importer Protocol consists of two methods which have a very simple signatures :
  • finder.find_module(fullname, path=None) 
  • finder.load_module(fullname )
The detail of what these methods need to do are documented in PEP 0302, so I wont cover it again here, but I will note :


Both methods are passed the fullname of the module - i.e. the fully dotted name of the module being loaded, so both methods need to be able to translate the fullname of the module into a file name which can be read and converted to a module, and it makes sense for that translation to be consistent (I cannot think of a situation where you want inconsistency here).

In my module I solved this by the find_module method storing the translation of fullname to file name into a class wide dictionary, and the load_module method uses the same dictionary to identify which file to open.

How you do the translation depends in part whether you have a sys.path_hook or sys.meta_path installed Importer :
  • With a sys.path_hook importer, the importer instance is initialized with an entry from sys.path (i.e. a file path), and the find_module method is then passed the full qualified name and the path of the most immediate parent module which applies (or None is the module being imported is a top-level module). Using either the path from the initializer, or the path given as the find_module argument, you can identify the path to the specific module.
  • With a sys.meta_path importer, the find_module method is passed a fully qualified module name, and the path the most immediate parent module (or None if the module being imported is a top-level module). The importer should use the the path to the parent if provided, but the importer will need to determine how and where to look for top level modules. My JSON importer does it's own traversal of sys.path to find where to search, but there are other options.

And finally :

During the development cycle beware when reloading your importer module, as this could well result in multiple entries in sys.path_hooks or sys.meta_paths, which could lead to inconsistent test results - take care to remove old entries before reloading.

If during development your importer does go completely wrong, and breaks your ability to import other modules, you have a few different options :
  • Remove the importer entry from sys.meta_path
  • Remove the importer entry from sys.path_hooks and from sys.path_importer_cache
  •  If all else fails, exit the interpreter and start again.
Writing your own import hook may seem scary at first, but in fact it is relatively simple - the key thing is coming up with the idea in the first place.