Python - Classes and Instances (__init__, __call__, etc.)
Unlike C++, classes in Python are objects in their own right, even without instances. They are just self-contained namespaces. Therefore, as long as we have a reference to a class, we can set or change its attributes anytime we want.
The following statement makes a class with no attributes attached, and in fact, it's an empty namespace object:
class Student: pass
The name of this class is Student, and it doesn't inherit from any other class. Class names are usually capitalized, but this is only a convention, not a requirement. Everything in a class is indented, just like the code within a function, if statement, for loop, or any other block of code. The first line not indented is outside the class.
In the code, the pass is the no-operation statement. This Student class doesn't define any methods or attributes, but syntactically, there needs to be something in the definition, thus the pass statement. This is a Python reserved word that just means move along, nothing to see here. It's a statement that does nothing, and it's a good placeholder when we're stubbing out functions or classes. The pass statement in Python is like an empty set of curly braces {} in Java or C.
>>> Student.name = "Jack"
>>> Student.id = 20001
Here we attached attributes to the class by assigning to names on it from outside the class body. In this case, the class is basically an object with field names attached to it.
>>> print(Student.name)
Jack
Note that this is working even though there are no instances of the class yet.
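As a minimal sketch of treating the class itself as a namespace, the snippet below adds, changes, and removes attributes on a class that has no instances. The use of vars() and hasattr() to inspect the namespace is our own addition for illustration.

```python
class Student:
    pass

# The class namespace starts without a 'name' entry.
print('name' in vars(Student))    # False

# Attach attributes from outside the class body.
Student.name = "Jack"
Student.id = 20001
print('name' in vars(Student))    # True
print(Student.name, Student.id)   # Jack 20001

# Attributes can be changed or removed at any time.
Student.id = 20002
del Student.name
print(Student.id)                 # 20002
print(hasattr(Student, 'name'))   # False
```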
Many classes are inherited from other classes, but the one in the example is not. Many classes define methods, but this one does not. There is nothing that a Python class absolutely must have, other than a name. In particular, C++ programmers may find it odd that Python classes don't have explicit constructors and destructors. Although it's not required, Python classes can have something similar to a constructor: the __init__() method.
In Python, objects are created in two steps:
- __new__() constructs the object
- __init__() initializes the object
However, it's very rare to actually need to implement __new__() because Python constructs our objects for us. So, in most cases, we only implement the special method __init__().
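To make the two steps visible, here is a small sketch, assuming Python 3, that records the order in which __new__() and __init__() run. The class name Traced and its calls list are hypothetical names used only for this illustration.

```python
class Traced(object):
    """Hypothetical class that records the order of the creation steps."""
    calls = []   # shared class-level list, used only for tracing

    def __new__(cls, *args, **kwargs):
        # Step 1: construct the (still uninitialized) object.
        cls.calls.append('__new__')
        return super().__new__(cls)

    def __init__(self, value):
        # Step 2: initialize the object that __new__ returned.
        self.calls.append('__init__')
        self.value = value


t = Traced(42)
print(Traced.calls)   # ['__new__', '__init__']
print(t.value)        # 42
```

In everyday code we would leave __new__() out entirely and let Python construct the object for us.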
Let's create a class that stores a string and a number:
class Student(object):
    '''Classes can (and should) have docstrings too, just like modules and functions'''
    def __init__(self, name, id=20001):
        self.name = name
        self.id = id
When a def appears inside a class, it is usually known as a method. It automatically receives a special first argument, self, that provides a handle back to the instance to be processed. Methods with two underscores at the start and end of names are special methods.
The __init__() method is called immediately after an instance of the class is created. It is tempting to call this the constructor of the class, because it looks like a C++ constructor and, by convention, __init__() is the first method defined for the class. It also appears to act like a constructor because it's the first piece of code executed in a newly created instance of the class. However, it's not a constructor in the C++ sense, because the object has already been constructed by the time the __init__() method is called, and we already have a valid reference to the new instance of the class.
The first parameter of the __init__() method, self, is equivalent to this in C++. We do not pass it ourselves because Python does that for us, but we must list self as the first parameter of nonstatic methods. In Python, self is always explicit, which makes attribute access more obvious.
self is always a reference to the current instance of the class. Although this argument fills the role of the reserved word this in C++ or Java, self is not a reserved word in Python, merely a naming convention. Nonetheless, please don't call it anything but self; this is a very strong convention.
When a method assigns to a self attribute, it creates an attribute in an instance because self refers to the instance being processed.
To instantiate a class, simply call the class as if it were a function, passing the arguments that the __init__() method requires. The return value will be the newly created object. Python has no explicit new operator like the one in C++ or Java:
s = Student(args)
We are creating an instance of the Student class and assigning the newly created instance to the variable s. We are passing one parameter, args, which will end up as the argument in Student's __init__() method.
s is now an instance of the Student class. Every class instance has a built-in attribute, __class__, which is the object's class. Java programmers may be familiar with the Class class, which contains methods like getName() and getSuperclass() to get metadata information about an object. In Python, this kind of metadata is available through attributes, but the idea is the same.
We can access the instance's docstring just as with a function or a module. All instances of a class share the same docstring.
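The following sketch shows this metadata on a concrete instance. It redefines Student with a shorter docstring of our own so that the block is self-contained, and the expected results assume Python 3.

```python
class Student(object):
    """Stores a student's name and id."""

    def __init__(self, name, id=20001):
        self.name = name
        self.id = id


s = Student("Jack")

print(s.__class__ is Student)        # True
print(s.__class__.__name__)          # Student
print(s.__class__.__bases__)         # (<class 'object'>,)

# The docstring lives on the class, so every instance sees the same one.
print(s.__doc__)                     # Stores a student's name and id.
print(s.__doc__ is Student.__doc__)  # True
```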
We can use the Student class defined above as follows:
studentA = Student("Jack")
studentB = Student("Judy", 10005)
Unlike private members in C++, the attributes of a Python object are public, so we can access them using the dot (.) operator:
>>> studentA.name
'Jack'
>>> studentB.id
10005
We can also assign a new value to the attribute:
>>> studentB.id = 80001
>>> studentB.id
80001
How about the object destruction?
Python has automatic garbage collection. When an object is about to be garbage-collected, its __del__() method is called, with self as its only argument. But we rarely use this method.
Let's look at another example:
class Student:
    def __init__(self, id):
        self.id = id
    def setData(self, value):
        self.data = value
    def display(self):
        print(self.data)
What is self.id?
It's an instance variable. It is completely separate from id, which was passed into the __init__() method as an argument. self.id is global to the instance. That means that we can access it from other methods. Instance variables are specific to one instance of a class. For example, if we create two Student instances with different id values, they will each remember their own values.
Then, let's make two instances, passing each one the id that __init__() requires (the values here are arbitrary):
>>> s1 = Student(1)
>>> s2 = Student(2)
Here, we generated instance objects. These objects are just namespaces that have access to their classes' attributes. The two instances have links back to the class from which they were created. If we use an instance with the name of an attribute of class object, Python retrieves the name from the class.
>>> s1.setData("Jake")         # method call, self is s1
>>> s2.setData(4294967296)     # runs Student.setData(s2,4294967296)
Note that neither s1 nor s2 has a setData() attribute of its own. So, Python follows the link from instance to class to find the attribute.
In the setData() method inside Student, the value passed in is assigned to self.data. Within a method, self automatically refers to the instance being processed (s1 or s2). Thus, the assignments store values in the instances' namespaces, not the class's.
When we call the class's display() method to print self.data, we see the self.data differs in each instance. But the name display() itself is the same in s1 and s2:
>>> s1.display()
Jake
>>> s2.display()
4294967296
Note that we stored different object types in the data member in each instance. In Python, there are no declarations for instance attributes (members). They come into existence when they are assigned values. The attribute named data does not even exist in memory until it is assigned within the setData() method.
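A quick way to watch an attribute come into existence is to inspect the instance's namespace before and after the assignment. The sketch below assumes Python 3.7 or later for the dictionary ordering shown, and uses vars() and hasattr(), which are our additions for illustration.

```python
class Student:
    def __init__(self, id):
        self.id = id

    def setData(self, value):
        self.data = value


s = Student(1)
print(vars(s))                 # {'id': 1}
print(hasattr(s, 'data'))      # False: data has not been assigned yet

s.setData("Jake")
print(vars(s))                 # {'id': 1, 'data': 'Jake'}

# No declaration fixes the type: the same attribute can hold an int later.
s.setData(4294967296)
print(type(s.data).__name__)   # int
```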
class MyClassA:
    def setData(self,d):
        self.data = d
    def display(self):
        print(self.data)
Then, we generate instance objects:
a = MyClassA()
a2 = MyClassA()
The instance objects are just namespaces that have access to their classes' attributes. Actually, at this point, we have three objects: a class and two instances. Note that neither a nor a2 has a setData attribute of its own. However, the value passed into the setData is assigned to self.data. Within a method, self automatically refers to the instance being processed (a or a2). So, the assignments store values in the instances' namespaces, not the class's. Methods must go through the self argument to get the instance to be processed. We can see it from the output:
>>> a.setData("Python")
>>> a2.setData("PyQt")
>>> a.display()
Python
>>> a2.display()
PyQt
As we expected, each instance object kept its own value even though we used the same method, display. The self argument made all the difference! It refers to the instance.
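To see what self is doing, we can pass the instance explicitly through the class. This sketch repeats MyClassA so that it runs on its own; the vars() checks are our addition and simply confirm that the method lives in the class namespace rather than in the instance.

```python
class MyClassA:
    def setData(self, d):
        self.data = d

    def display(self):
        print(self.data)


a = MyClassA()
a2 = MyClassA()

a.setData("Python")            # Python passes a as self for us
MyClassA.setData(a2, "PyQt")   # the same kind of call, with self passed explicitly

print(a.data, a2.data)               # Python PyQt
print('setData' in vars(a))          # False: the instance only holds data
print('setData' in vars(MyClassA))   # True: the method is a class attribute
```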
A superclass is listed in parentheses in the class header, as we see in the example below.
class MyClassB(MyClassA):
    def display(self):
        print('B: data = "%s"' %self.data)
MyClassB redefines the display method of its superclass. It replaces the display attribute while still inheriting the setData method from MyClassA, as we see below:
>>> b = MyClassB()
>>> b.setData("BigData")
>>> b.display()
B: data = "BigData"
But an instance of MyClassA still uses the display method previously defined in MyClassA.
>>> a2.display()
PyQt
Operator overloading allows objects to intercept and respond to operations. It makes object interfaces more consistent, and it also allows class objects to act like built-ins.
class MyClassC(MyClassB):
    def __init__(self, d):
        self.data = d
    def __add__(self,d2):
        return MyClassC(self.data + d2)
    def __str__(self):
        return '[MyClassC: %s]' % self.data
    def mul(self, d2):
        self.data *= d2
MyClassC is a MyClassB, and its instances inherit the display() method from MyClassB.
>>> c = MyClassC("123")
>>> c.display()
B: data = "123"
>>> print(c)
[MyClassC: 123]
>>> c = c + '789'
>>> c.display()
B: data = "123789"
>>> print(c)
[MyClassC: 123789]
>>> c = MyClassC('123')
>>> c.mul(3)
>>> print(c)
[MyClassC: 123123123]
When the MyClassC instance is created, the argument '123' is passed. It goes to the d parameter of the __init__ method and is assigned to self.data. In effect, MyClassC sets the data attribute automatically at construction time, so we do not need to call setData() on the instance later.
For +, Python passes the instance object on the left to the self argument in __add__ and the value on the right to d2. For print(), Python passes the object being printed to self in __str__. So, whatever string this method returns is taken to be the print string for the object. By implementing __str__, we can use print to display objects of this class, and we do not have to call the display() method.
The __add__ method makes and returns a new instance object by calling MyClassC. However, mul changes the current instance object in place by reassigning the self.data attribute.
Here is an example of a class Rectangle with a member function returning its area.
class Rectangle(object):
    def __init__(self, w, h):
        self.width = w
        self.height = h
    def area(self):
        return self.width * self.height

>>> rect = Rectangle(100,20)
>>> rect.area()
2000
>>> rect.height = 30
>>> rect.area()
3000
Note that this version is using direct attribute access for the width and height.
We could instead have implemented setter and getter methods:
class Rectangle(object):
    def __init__(self, w, h):
        self.width = w
        self.height = h
    def getWidth(self):
        return self.width
    def getHeight(self):
        return self.height
    def setWidth(self, w):
        self.width = w
    def setHeight(self, h):
        self.height = h
    def area(self):
        return self.getWidth() * self.getHeight()
Object attributes are where we store our information, and in most cases the following syntax is enough:
objName.attribute            # Retrieve attribute value
objName.attribute = value    # Change attribute value
However, there are cases when more flexibility is required. For example, to validate the setter and getter methods, we may need to change the whole code like this:
class Rectangle(object):
    ...
    def getWidth(self):
        if not valid():
            raise TypeError('cannot retrieve width')
        else:
            return self.width.convert()
    ...
    def setWidth(self, w):
        if not valid(w):
            raise TypeError('cannot set width')
        else:
            self.width = convert(w)
    ...
The solution for the issue of flexibility is to allow us to run code automatically on attribute access, if needed. The properties allow us to route a specific attribute access (attribute's get and set operations) to functions or methods we provide, enabling us to insert code to be run automatically. A property is created by assigning the result of a built-in function to a class attribute:
attribute = property(fget, fset, fdel, doc)
We pass:
- fget: a function for intercepting attribute fetches
- fset: a function for assignments
- fdel: a function for attribute deletions
- doc: receives a documentation string for the attribute
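As a hedged sketch of how the four arguments fit together, the version of Rectangle below routes width through a property that validates assignments. The validation rule (a non-negative number) and the private name _width are assumptions made for this illustration, not part of the original class.

```python
class Rectangle(object):
    def __init__(self, w, h):
        self.width = w       # this assignment already goes through setWidth
        self.height = h

    def getWidth(self):
        return self._width

    def setWidth(self, w):
        # Hypothetical rule: width must be a non-negative number.
        if not isinstance(w, (int, float)) or w < 0:
            raise TypeError('cannot set width')
        self._width = w

    def delWidth(self):
        del self._width

    # fget, fset, fdel, doc
    width = property(getWidth, setWidth, delWidth, 'The rectangle width.')

    def area(self):
        return self.width * self.height


rect = Rectangle(100, 20)
rect.width = 30              # runs setWidth(rect, 30)
print(rect.area())           # 600

try:
    rect.width = 'wide'      # rejected by the validation in setWidth
except TypeError as err:
    print(err)               # cannot set width

print(Rectangle.width.__doc__)   # The rectangle width.
```

Callers keep using plain attribute syntax, so validation can be added later without changing the code that uses the class.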
If we go back to the earlier code, and add property(), then the code looks like this:
class Rectangle(object):
    def __init__(self, w, h):
        self.width = w
        self.height = h
    def area(self):
        return self.width * self.height
    m_area = property(area)
We can use the class as below:
>>> rect = Rectangle(33, 9)
>>> rect.width, rect.height, rect.m_area
(33, 9, 297)
The example above simply exposes the computed area as the attribute m_area. More generally, properties compute the value of an attribute dynamically when it is fetched, as the following example illustrates:
class SquareIt:
    def __init__(self, init):
        self.value = init
    def getValue(self):
        return self.value**2
    def setValue(self, v):
        self.value = v
    V = property(getValue, setValue)
The class defines an attribute V that is accessed as though it were static data. However, it really runs code to compute its value when fetched. When the code runs, the value is stored in the instance as state information, but whenever we retrieve it via the managed attribute, its value is automatically squared.
>>> S1 = SquareIt(4)
>>> S1.value
4
>>> S1.V
16
>>> S1.V = 5
>>> S1.V
25
>>> S2 = SquareIt(64)
>>> S2.value
64
>>> S2.V
4096
Again, note that the fetch computes the square of the instance's data.
In Python 2, by implementing the __cmp__() method, all of the comparison operators (<, ==, !=, >, etc.) will work.
So, let's add the following to our Rectangle class:
class Rectangle(object):
    def __init__(self, w, h):
        self.width = w
        self.height = h
    def area(self):
        return self.width * self.height
    def __cmp__(self, other):
        # Call area() so that we compare the numbers, not the bound methods
        return cmp(self.area(), other.area())
Note that we used the built-in cmp() function to implement __cmp__. The cmp() function returns -1 if the first argument is less than the second, 0 if they are equal, and 1 if the first argument is greater than the second.
In Python 3, this no longer works: an ordering comparison such as < between two instances gives TypeError: unorderable types. The __cmp__() method and the built-in cmp() function were removed in Python 3.0, so we need to implement the specific rich comparison methods, such as __lt__() and __eq__(), instead.
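One possible Python 3 version is sketched below. It compares rectangles by area, as the __cmp__ version intended, and uses functools.total_ordering from the standard library to derive the remaining operators from __eq__ and __lt__. Using total_ordering is our choice for brevity; writing all six methods by hand works as well.

```python
from functools import total_ordering


@total_ordering
class Rectangle(object):
    def __init__(self, w, h):
        self.width = w
        self.height = h

    def area(self):
        return self.width * self.height

    def __eq__(self, other):
        if not isinstance(other, Rectangle):
            return NotImplemented
        return self.area() == other.area()

    def __lt__(self, other):
        if not isinstance(other, Rectangle):
            return NotImplemented
        return self.area() < other.area()


r1 = Rectangle(100, 20)          # area 2000
r2 = Rectangle(30, 40)           # area 1200

print(r1 > r2)                   # True
print(r1 <= r2)                  # False
print(r1 == Rectangle(50, 40))   # True: both areas are 2000
```

Note that a class defining __eq__ without __hash__ becomes unhashable in Python 3, so these rectangles cannot be used as dictionary keys unless __hash__ is added too.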
The __str__ method is the second most commonly used operator overloading method in Python after __init__. It is run automatically whenever an instance is converted to its print string.
Let's use the previous example:
class Rectangle(object):
    def __init__(self, w, h):
        self.width = w
        self.height = h
    def area(self):
        return self.width * self.height
If we print the instance, it displays the object as a whole as shown below.
>>> r1 = Rectangle(100,20)
>>> print(r1)
<__main__.Rectangle object at 0x02B40750>
It displays the object's class name and its address in memory, which is basically useless except as a unique identifier.
So, let's add the __str__ method:
class Rectangle(object):
    def __init__(self, w, h):
        self.width = w
        self.height = h
    def area(self):
        return self.width * self.height
    def __str__(self):
        return '(Rectangle: %s, %s)' %(self.width, self.height)

>>> r1 = Rectangle(100,20)
>>> print(r1)
(Rectangle: 100, 20)
The code above extends our class to give a custom display that lists attributes when our class's instances are displayed as a whole, instead of relying on the less useful display. Note that we're doing string % formatting to build the display string in __str__.
The difference between __str__ and __repr__ is not that obvious.
When we use print, Python will look for an __str__ method in our class. If it finds one, it will call it. If it does not, it will look for a __repr__ method and call it. If it cannot find one, it will create an internal representation of our object.
>>> class MyClass:
...     def __init__(self, d):
...         self.data = d
...
>>> x = MyClass('secret')
>>> print(x)
<__main__.MyClass instance at 0x027335D0>
>>> x
<__main__.MyClass instance at 0x027335D0>
Neither print(x) nor echoing the object x at the prompt gives us much information. That's why we customize the class by using __str__.
>>> class MyClass:
...     def __init__(self, d):
...         self.data = d
...     def __str__(self):
...         return '[data: %s]' %self.data
...
>>> x = MyClass('secret')
>>> print(x), x
[data: secret] [data: secret]
>>> myObjects = [MyClass('A'), MyClass('B')]
>>> for obj in myObjects:
...     print(obj)
...
[data: A]
[data: B]
But __str__ is not used when we call print(myObjects). Note that the instances are in the list:
>>> print(myObjects)
[<__main__.MyClass instance at 0x02733800>, <__main__.MyClass instance at 0x027338C8>]
>>> myObjects
[<__main__.MyClass instance at 0x027338C8>, <__main__.MyClass instance at 0x027336C0>]
So, we need to define __repr__:
>>> class MyClass:
...     def __init__(self, d):
...         self.data = d
...     def __str__(self):
...         return '[str- data: %s]' %self.data
...     def __repr__(self):
...         return '[repr- data: %s]' %self.data
...
>>> myObjects = [MyClass('A'), MyClass('B')]
>>> for obj in myObjects:
...     print(obj)
...
[str- data: A]
[str- data: B]
>>> print(myObjects)
[[repr- data: A], [repr- data: B]]
>>> myObjects
[[repr- data: A], [repr- data: B]]
In his book, Learning Python, Mark Lutz summarizes as follows:
...__repr__, provides an as-code low-level display of an object when present. Sometimes classes provide both a __str__ for user-friendly displays and a __repr__ with extra details for developers to view. Because printing runs __str__ and the interactive prompt echoes results with __repr__, this can provide both target audience with an appropriate display.
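The lookup order described in this section can be checked with a small sketch. The class names OnlyRepr and Both are hypothetical; the first defines only __repr__, and the second adds __str__ on top of it.

```python
class OnlyRepr:
    def __init__(self, d):
        self.data = d

    def __repr__(self):
        # An as-code style display, built from the actual class name.
        return '%s(%r)' % (self.__class__.__name__, self.data)


class Both(OnlyRepr):
    def __str__(self):
        return '[data: %s]' % self.data


x = OnlyRepr('A')
y = Both('B')

print(str(x))    # OnlyRepr('A')  (no __str__, so __repr__ is used)
print(str(y))    # [data: B]
print(repr(y))   # Both('B')
print([x, y])    # [OnlyRepr('A'), Both('B')]  (containers display items with repr)
```

This is why defining __repr__ alone is often enough to start with: it covers print, the interactive echo, and nested displays.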
sys.argv is the list of arguments passed to the Python program.
The first argument, sys.argv[0], is actually the name of the program. It exists so that we can change our program's behavior depending on how it was invoked. For example:
- Whether sys.argv[0] is a full pathname or not depends on the operating system.
- If the command was executed using the -c command line option to the interpreter, sys.argv[0] is set to the string '-c'.
- If no script name was passed to the Python interpreter, sys.argv[0] is the empty string.
# t.py
if __name__ == '__main__':
    import sys
    print('program: %s' %sys.argv[0])
    for arg in sys.argv[1:]:
        print('Arg: %s' %arg)
If we run it with or without any argument, we get the output below:
C:\TEST>python t.py
program: t.py

C:\TEST>python t.py A B C
program: t.py
Arg: A
Arg: B
Arg: C
The getopt.getopt() parses command line options and parameter list. args is the argument list to be parsed, without the leading reference to the running program. Typically, this means sys.argv[1:]. options is the string of option letters that the script wants to recognize, with options that require an argument followed by a colon (:).
long_options, if specified, must be a list of strings with the names of the long options which should be supported. The leading -- characters should not be included in the option name. Long options which require an argument should be followed by an equal sign (=). Optional arguments are not supported. To accept only long options, options should be an empty string. Long options on the command line can be recognized so long as they provide a prefix of the option name that matches exactly one of the accepted options:
For example, if long_options is ['foo', 'frob'], the option --fo will match as --foo, but --f will not match uniquely, so GetoptError will be raised.
The return value consists of two elements:
- the first is a list of (option, value) pairs
- the second is the list of program arguments left after the option list was stripped (this is a trailing slice of args).
-from http://docs.python.org/2/library/getopt.html.
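The prefix-matching rule can be tried directly by passing explicit argument lists to getopt.getopt() instead of sys.argv[1:]. In this sketch the long option names come from the example above, while file.txt is a made-up leftover argument.

```python
import getopt

long_options = ['foo', 'frob']

# '--fo' is a prefix of exactly one long option, so it is accepted as '--foo'.
options, remainder = getopt.getopt(['--fo', 'file.txt'], '', long_options)
print(options)     # [('--foo', '')]
print(remainder)   # ['file.txt']

# '--f' is a prefix of both 'foo' and 'frob', so it is ambiguous.
try:
    getopt.getopt(['--f'], '', long_options)
except getopt.GetoptError as err:
    print('error: %s' % err)
```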
# option.py
import getopt, sys

def main():
    in_filename = 'fin'
    out_filename = 'fout'

    print('argv : %s' %(sys.argv[1:]))

    try:
        options, remainder = getopt.getopt(sys.argv[1:], 'i:o:', ['input=', 'output='])
    except getopt.GetoptError:
        sys.exit()

    print('options : %s' %(options))

    for opt, arg in options:
        if opt in ('-i', '--input'):
            in_filename = arg
        elif opt in ('-o', '--output'):
            out_filename = arg

    print('remainder : %s' %(remainder))
    print('input file = %s' %(in_filename))
    print('output file = %s' %(out_filename))

    return 0

if __name__ == "__main__":
    sys.exit(main())
If we run it with the two long options followed by two extra arguments:
$ python option.py --input=my_in_file --output=my_out_file r1 r2
argv : ['--input=my_in_file', '--output=my_out_file', 'r1', 'r2']
options : [('--input', 'my_in_file'), ('--output', 'my_out_file')]
remainder : ['r1', 'r2']
input file = my_in_file
output file = my_out_file
With short options:
$ python option.py -i my_in_file -o my_out_file
argv : ['-i', 'my_in_file', '-o', 'my_out_file']
options : [('-i', 'my_in_file'), ('-o', 'my_out_file')]
remainder : []
input file = my_in_file
output file = my_out_file
Note that the long option does not have to be fully matched:
$ python option.py --i=my_in_file --o=my_out_file
argv : ['--i=my_in_file', '--o=my_out_file']
options : [('--input', 'my_in_file'), ('--output', 'my_out_file')]
remainder : []
input file = my_in_file
output file = my_out_file
Python runs a __call__ method when an instance is called like a function. This is especially useful when we interface with APIs that expect functions. On top of that, we can also retain state info, as we see in the later examples of this section.
It is believed to be the third most commonly used operator overloading method, behind the __init__ and the __str__ and __repr__ according to Mark Lutz (Learning Python).
Here is an example with a __call__ method in a class. The __call__ method simply prints out the positional and keyword arguments it receives.
>>> class Called:
...     def __call__(self, *args, **kwargs):
...         print("I've just called:", args, kwargs)
...
>>> A = Called()
>>> A('psy', 'Gangnam', 'Style')
I've just called: ('psy', 'Gangnam', 'Style') {}
>>> A(2013, 'Sept', 16, Singer='psy', Title='Gangnam Style')
I've just called: (2013, 'Sept', 16) {'Singer': 'psy', 'Title': 'Gangnam Style'}
Here is another example with a mix of positional arguments, a keyword-only argument with a default (Python 3 syntax), and extra keyword arguments:
>>> class classA:
...     def __call__(self, *args, d=44, **kwargs):
...         print(args, d, kwargs)
...
>>> instanceA = classA()
>>> instanceA(1)
(1,) 44 {}
>>> instanceA(1,2)
(1, 2) 44 {}
>>> instanceA(1,2,3)
(1, 2, 3) 44 {}
>>> instanceA(1,2,3,4)
(1, 2, 3, 4) 44 {}
>>> instanceA(a=1, b=2, d=4)
() 4 {'a': 1, 'b': 2}
>>> instanceA(a=1, b=2, c=3, d=4)
() 4 {'a': 1, 'c': 3, 'b': 2}
>>> instanceA(*[1,2], **dict(c=333, d=444))
(1, 2) 444 {'c': 333}
>>> instanceA(1, *(2,), c=3, **dict(d=4))
(1, 2) 4 {'c': 3}
While the __call__ method allows us to use class instances to emulate functions as we saw in the above examples, there is another use of __call__ method: we can retain state info:
>>> class State:
...     def __init__(self, state):
...         self.myState = state
...     def __call__(self, newInput):
...         return self.myState + newInput
...
>>> S = State(100)
>>> S(99)
199
>>> S(77)
177
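The State instance above remembers its starting value but never changes it. As a further sketch, the hypothetical RunningTotal class below updates its state on every call and is handed to map(), a built-in that only expects something callable.

```python
class RunningTotal:
    def __init__(self, start=0):
        self.total = start

    def __call__(self, amount):
        # Unlike a plain function, the instance remembers earlier calls.
        self.total += amount
        return self.total


acc = RunningTotal(100)
print(acc(99))                     # 199
print(acc(1))                      # 200

# map() only needs a callable, so the instance can be passed directly.
print(list(map(acc, [10, 20])))    # [210, 230]
print(callable(acc))               # True
```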