Thursday, 29 October 2015

Python gotchas # 2 : Names, dynamic typing, is and ==

Python gotchas - Names vs Variables

or how not to get caught out when you start using Python

At first glance Python seems very familiar, especially if you have used other procedural languages such as C or Java. But Python is different - in some cases very different - and those differences can trip you up as you progress along your Python journey. In this series of occasional posts, I am going to cover some of those gotchas.

Names and variables 

In most other languages a variable holds a specific value, and a change to the variable changes the value that variable holds. That is true to a certain extent in Python, but there are some critical differences. Python doesn't have variables in the strictest sense of the word - it has names and objects.
Everything in Python is an object - numbers, strings and lists are all objects, as are the items in a list - and names are simply references to those objects. (Classes and functions are also objects - I will cover that in more detail in a later blog post).

This process of making a name refer to an object is called binding, and a name is said to be bound to an object.

Take the following code :

################ Example 1
>>> var1 = 3
>>> var2 = var1
>>> id(var1) == id(var2)
True

This gives us two names (var1 and var2), both bound to the same object (the id function returns the identity of the object which the name is bound to). There is an int object with the value 3 to which both var1 and var2 are bound.

We can bind var1 to a different integer object - without affecting what var2 is bound to.

################ Example 2 
>>> var1 = 3
>>> var2 = var1
>>> id(var1) == id(var2)
True
>>> var1 = 4
>>> id(var1) == id(var2)
False
>>> print var1, var2
4 3

When you bind a name to an object the rules are simple - nothing is copied; the name is simply created and made to refer to that object. In C terms, the name behaves like a pointer to an object. The big difference is that in Python, since every name is a reference, you don't need any special syntax to get to the object being referred to.
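
The same rule applies when you pass an object to a function: the parameter is just one more name bound to the object, and nothing is copied. The following is a minimal sketch to illustrate that; the function name identity_inside and the names payload and other_name are made up for this example.

```python
def identity_inside(argument):
    # 'argument' is simply another name bound to the object the caller passed in
    return id(argument)

payload = ["some", "data"]
other_name = payload                      # a second name, no copy is made

# The id seen inside the function is the id seen outside it
same_inside = identity_inside(payload) == id(payload)
same_outside = other_name is payload
```

Both same_inside and same_outside end up True, because only one list object exists throughout.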

When we increment our number :

################ Example 3
>>> var1 = 3
>>> var2 = var1
>>> id(var1) == id(var2) # bound to the same object ?
True
>>> var1 = var1 + 1
>>> id(var1) == id(var2) # bound to the same object ?
False
>>> print var1, var2
4 3

Here we do not change the value of the object that var1 was bound to : evaluating var1 + 1 produces a different integer object (value 4), and var1 is rebound to that object. var2 is left bound to its previous object (value 3).

So what happens when our names are bound to a list object, and we change the list - do we get two list objects ?

################ Example 4
>>> l1 = [1,2,3,4]
>>> l2 = l1
>>> id(l2) == id(l1) # bound to the same object ?
True
>>> l1[0] = 4
>>> id(l2) == id(l1) # bound to the same object ?
True
>>> print l2
[4, 2, 3, 4]

We can see that even after the change, l2 is still bound to the same object as l1, and therefore the change made to the object is visible through both l1 and l2.
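
If you want a second list that does not follow changes made to the first, you have to ask for a new object explicitly. The sketch below is one hedged illustration using the built-in list() constructor, which builds a new list holding the same items (a shallow copy); the names original, alias and snapshot are invented for this example.

```python
original = [1, 2, 3, 4]
alias = original             # second name, same object
snapshot = list(original)    # new list object containing the same items

original[0] = 99             # change the list in place

alias_first = alias[0]       # 99 : alias is the same object as original
snapshot_first = snapshot[0] # 1  : snapshot is a separate object
```

Note that the copy is shallow: the new list is a different object, but its items are still the same objects as the items in the original.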

The difference between example 3 and example 4 is entirely down to whether the object is mutable. Lists (and other data structures such as dictionaries) are mutable - i.e. they can be changed without the creation of a new object, and therefore a change to the object is seen through all the names bound to it.
Integers and a number of other types (strings and tuples, for instance) are immutable - i.e. they cannot be changed - so an operation that appears to change the value (adding 1 to var1 in example 3) actually results in a different object - in this case an integer object with the value 4, to which var1 is then bound.
(In fact, in CPython - the standard interpreter - integer objects already exist at start-up for all values between -5 and 256 inclusive, as these values appear regularly in most types of program, and having the objects ready saves time as your program runs. This is an implementation detail, not something your code should rely on.)
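
The pre-built small integers can be observed directly. The following is a minimal sketch that assumes the standard CPython interpreter (other implementations may behave differently); the integers are built with int() at run time so that the compiler cannot quietly merge identical literals, and all the names are made up for this example.

```python
# Build the integers at run time rather than writing them as literals
small_a = int("256")
small_b = int("256")
big_a = int("257")
big_b = int("257")

small_shared = small_a is small_b   # True on CPython : 256 comes from the ready-made set
big_shared = big_a is big_b         # False on CPython : two separate objects
values_equal = big_a == big_b       # True on any implementation
```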

So far the end result is probably similar to what you might expect : after all you can't redefine 3 to be 4, but you can change the content of a list.

The main tripping point is that for mutable types there are many different ways to change the object - and some of them don't result in the change being shared in the way one might expect :

>>> ################ Example 5
>>> l1 = [1,2,3]
>>> l2 = l1
>>> id(l1) == id(l2)
True
>>> l1.append(4)   # append changes the object 'in place'
>>> id(l1) == id(l2)
True
>>> print l1, l2
[1, 2, 3, 4] [1, 2, 3, 4]
>>> l1 += [5]      # this form of list addition also operates 'in place'
>>> id(l1) == id(l2)
True
>>> print l1, l2
[1, 2, 3, 4, 5] [1, 2, 3, 4, 5]

>>> ################ Example 6
>>> l1 = [1,2,3]
>>> l2 = l1
>>> id(l1) == id(l2)
True
>>> l1 = l1 + [4] # This form of change creates a new object
>>> id(l1) == id(l2)
False
>>> print l1, l2
[1, 2, 3, 4] [1, 2, 3]

In example 6 a new object is created (when the expression l1 + [4] is evaluated) and l1 is rebound to it. l2 is still bound to the original list, so the two names no longer share an object and the change is not visible through l2.

The key phrase here is "in place". Many of the standard methods (append, extend and sort on lists, for instance) modify the object in place - i.e. without creating a new object - and this type of modification is always visible through every name bound to that object. There are equally many ways to apparently change the value of a mutable object which in fact create a new object. If in doubt, open your Python interpreter and try it out - remember you can check the id values (or use the is operator discussed below).
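
As a further illustration of in place versus new object, here is a short sketch contrasting the list method sort and slice assignment (both in place) with the built-in sorted function (which builds a new list). The names numbers and alias are made up for this example.

```python
numbers = [3, 1, 2]
alias = numbers                  # second name bound to the same list

numbers.sort()                   # in place : the one list is reordered
after_sort = list(alias)         # [1, 2, 3] as seen through alias

numbers[:] = [10, 20]            # slice assignment replaces the contents in place
still_shared = numbers is alias  # True : still one object

numbers = sorted(numbers, reverse=True)  # sorted() builds a new list, numbers is rebound
rebound = numbers is not alias           # True : the names now refer to different lists
```

After the last line numbers is [20, 10] while alias is still [10, 20].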

Names and dynamic typing 

Unlike in C, where you have to declare what type a name/variable is before you use it, a name in Python can be bound to any object you want, and be bound to a different type of object later on. This is called dynamic typing, and it is one of Python's greatest benefits - var1 can be an integer, then a floating point number, and then a string. With that flexibility comes great strength, and also potential pitfalls.

In a complex application without clear boundaries it is easy for the developer to lose track of what type a name should be bound to at a given point, and to try to do something which results either in an error or, even worse, in a subtle bug - because, for instance, the value that should be an integer is actually a string :

>>> ################ Example 7
>>> var1 = 3
>>> print "Final value %s" % (var1*3)
Final value 9
>>> # ...... some time later
>>> var1 = "xxx"
>>> print "Final value %s" % (var1*3)
Final value xxxxxxxxx

I strongly suggest that in any complex program you use sensible names for your values (not var1, var2 etc) - good naming will help prevent some of the worst of the problems that dynamic typing can bring.
There are also ways of testing what type of object a name is bound to; I will cover that in a future blog post.
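
As a small taste of such a test, here is a minimal sketch using the built-in isinstance function to turn the silent bug of example 7 into an immediate error. The function name triple and the wording of the error message are made up for this example; it is one possible approach, not the only one.

```python
def triple(value):
    """Return value * 3, refusing anything that is not an integer."""
    if not isinstance(value, int):
        # Fail loudly instead of quietly repeating a string three times
        raise TypeError("expected an int, got %s" % type(value).__name__)
    return value * 3

result = triple(3)

try:
    triple("xxx")
    message = None
except TypeError as error:
    message = str(error)
```

Here result is 9, and the second call raises a TypeError rather than returning "xxxxxxxxx".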

And finally - a warning about 'is' and '=='

Many Python beginners get confused about when to use the 'is' comparison and when to use '=='.

The '==' operator should be used when you are testing whether two names have the same value - i.e. the objects they are bound to have the same value. For lists, for instance, this means the two lists have the same content in the same order. It is highly likely that you will use '==' far more often than you use 'is'.

The 'is' operator should be used when testing whether two names refer to the same object.
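
The most common everyday use of 'is' is testing for None, since there is only ever one None object. The sketch below is a hedged illustration of why that matters; the function first_even and the other names are invented for this example.

```python
def first_even(numbers):
    """Return the first even number, or None if there isn't one."""
    for number in numbers:
        if number % 2 == 0:
            return number
    return None

found = first_even([1, 3, 0, 5])    # the first even number here is 0
missing = first_even([1, 3, 5])     # no even number at all

found_is_none = found is None       # False : 0 is a real result, not None
missing_is_none = missing is None   # True
```

Testing with 'is None' keeps the legitimate result 0 distinct from "nothing found", which a plain truth test such as "if not found" would not do.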

For trivial examples - integers between -5 and 256, and many short string literals - 'is' and '==' will return the same result (due to the optimisations already mentioned, which are details of the CPython interpreter rather than guarantees of the language). That can lead you into a false understanding of what the operators do. Using values outside those ranges will show the distinction :

>>> ################ Example 8
>>> a = 10*100
>>> b = 1000
>>> a is b        # bound to the same object ? equivalent to id(a) == id(b)
False             # NO
>>> a == b        # But definitely the same value
True
>>> l1 = [1,2,3,4]
>>> l2 = [1,2,3,4]
>>> l1 is l2      # Not bound to the same object 
False
>>> l1 == l2      # But again definitely equal in value
True

The 'is' operator is equivalent to comparing the id values.
If two names are bound to the same object (var1 is var2) then you would expect the value comparison (==) to be True as well, and this is the case for almost every object created by the standard library (a rare exception is the floating point value NaN, which never compares equal, even to itself). As demonstrated in Example 8, the reverse is not true - two names that are not bound to the same object can still have equal values.
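
The "standard library" qualification matters because a class you write yourself can define equality however it likes. The following is a minimal sketch with a deliberately perverse, made-up class called NeverEqual, followed by the NaN case built with float("nan").

```python
class NeverEqual(object):
    """Hypothetical class whose instances claim to equal nothing at all."""
    def __eq__(self, other):
        return False

thing = NeverEqual()
same_thing = thing                 # second name, same object

identical = thing is same_thing    # True  : one object, two names
equal = thing == same_thing        # False : our __eq__ says so

nan = float("nan")
nan_identical = nan is nan         # True  : trivially the same object
nan_equal = nan == nan             # False : NaN never compares equal
```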
