28 July 2022
What IS the real significance of OOP?
Many years ago, I was invited to give a speech at a conference about OOP in Norway, called ROOTS – the significance of that name being that OOP originated in Norway (in the 1960s) with the Simula language. After my talk, an older man came up to me and challenged me to define the advantage of ‘encapsulating the methods with the data’. He was obviously sceptical, and he went on to explain that for many years he had been the boss of Kristen Nygaard, who, with Ole-Johan Dahl, had been awarded the Turing Award for the invention of OOP, and he claimed that Nygaard himself was never able to provide him with a convincing argument as to the advantage. He went on to make the startling claim that towards the end of his life Nygaard ‘recanted the whole idea of encapsulation’, making it sound almost like a deathbed confession! (I have to say that I remain sceptical of that claim, since I have never found any corroborating evidence.)
Recently, I was challenged again to define the unique benefit of encapsulating methods with data – and this time I realised that I was unable to do so – because none of the arguments that I might have made 20 years ago really stands up today.
Fundamentally, objects are ‘user-defined data types’, so that instead of working only with generic data structures such as arrays, lists, dictionaries, queues, stacks, trees, and graphs, you can work with instantiable data structures that correspond to the nouns in your problem domain: customers, products, and orders; teachers, pupils, and classrooms; missiles, soldiers, and zombies; atoms, leptons, and quarks. Each holds a unique configuration of named properties, which may be of different types, including associations to other objects. The advantage of user-defined types is enormous – it makes programs easier to read, write, and modify. The problem is that this idea of user-defined types existed in the more advanced procedural languages such as Algol, before the advent of OOP, and exists today in pure ‘functional programming’ (FP) languages such as Haskell.
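To make the distinction concrete, here is a minimal sketch of user-defined types with no methods encapsulated on them. Python is used purely for illustration, and the names (`Product`, `OrderLine`, `Order`, `order_total_pence`) are hypothetical, not taken from any real system. The types only name and group the data; the behaviour lives in a freestanding function.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Product:
    name: str
    unit_price_pence: int


@dataclass(frozen=True)
class OrderLine:
    product: Product  # an association to another user-defined type
    quantity: int


@dataclass(frozen=True)
class Order:
    customer_name: str
    lines: Tuple[OrderLine, ...]


def order_total_pence(order: Order) -> int:
    """Freestanding function: it takes an Order, but is not a method of Order."""
    return sum(line.product.unit_price_pence * line.quantity for line in order.lines)


tea = Product("Tea", 250)
mug = Product("Mug", 600)
order = Order("Ada", (OrderLine(tea, 2), OrderLine(mug, 1)))
print(order_total_pence(order))  # 1100
```

The code still reads in terms of the nouns of the domain, which is the benefit described above, yet nothing here depends on binding functions to the type.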
OOP went beyond user-defined types by introducing the idea of encapsulating methods onto the type itself. In the text that follows I have examined five advantages that have been claimed for encapsulation – including by me – over the last fifty years. (The ordering of the advantages is not significant.)
- Encapsulating data with the most common methods (functions) that make use of that data facilitates re-use. When you instantiate a Stack class, the instance comes with the Push, Pop, and Peek methods bound into it – instead of having to copy the underlying data type (typically an array) and then separately copying those functions. That was once a huge advantage. Today, however, most software development involves sophisticated ‘package management’ systems – which may package data structures with freestanding functions that operate on them without OOP-style encapsulation.
- Encapsulation gives the programmer the convenience of ‘dot syntax’. If you have an instance of an object class and you type ‘.’ then (depending on your specific programming language and/or IDE) you can be given a pop-up list of all the methods encapsulated on that object, automatically filtering the list as you type the initial letters. Dot syntax also means that you can chain method calls, which is often a convenient way to define expressions. The problem is that in many modern languages you can get this without the methods needing to be encapsulated on the object. In C# and VB, for example, you can define ‘extension methods’ – that appear to the user to be encapsulated but are defined outside the class. (One big advantage of that capability is that it allows you to add useful methods to a class that you may not be able to modify – for example a System class.)
- Encapsulation allows methods to be ‘inherited’ by sub-classes. Most school textbooks, which, frankly, are not written by people who really understand OOP, push ‘inheritance’ as the big thing. Most experienced professional OOP developers are sceptical of inheritance, using it only sparingly, if at all. They have seen too many complex inheritance hierarchies that end up reducing the ‘agility’, or ‘malleability’, of the resulting model rather than increasing it. Most would agree that ‘inheritance is just a weak form of polymorphism’ and that the latter is far more important. Hence …
The problem is that polymorphism is no longer dependent upon encapsulation. Haskell, for example, deliberately does not permit encapsulation, but it does support polymorphism very well.
- Last, but by no means least, encapsulation supports ‘information hiding’. Instead of allowing direct access to data, information hiding forces the programmer to go through methods – whether those methods are ‘query’ methods (that just read data, potentially transforming it), or ‘mutating’ methods that change the data. There are many arguments for information hiding. I once made the case to a large government organisation that the Customer’s date of birth should be completely hidden. ‘But everything needs to access the DoB!’ they protested. ‘What, for example?’ ‘Well, we need to know their age to determine their eligibility for certain benefits.’ ‘Fine, Customer will have an AgeToday() method and/or a WillBeOverAgeOnDate(..) method.’ ‘But we also need to view it to validate the customer’s identity!’ ‘Then that’s a privacy leak,’ I said. ‘So, instead I’ll give you a ConfirmDoBIs(..) method or even a Confirm2DigitsFromDateOfBirth() method.’ As well as improving privacy and security, such methods reduce the coding effort and the risk of inconsistency, because you no longer have every system calculating current age from the date of birth (not nearly as trivial as it sounds). By contrast, I once consulted for a well-known American investment bank where it transpired that their systems had twenty-seven different ways of handling a leap year.
Information hiding is very important. But how do you square that with the fact that Python, one of the most popular programming languages in the world, and especially in education, does not differentiate between public and private fields – all data is visible outside an object? (Yes, I know that there are ‘conventions’, e.g. for fields that are not intended to be read or modified from outside the object, and that Python fans argue that ‘Python is a grown-up language where programmers are trusted to do the right thing’. All I can say is that such people can’t ever have worked on large scale systems where code is being used by many programmers who never see each other and are all under pressure to deliver the most expedient solution for their own customer – which frequently involves by-passing any ‘rules’ that they can!)
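The date-of-birth example in the last bullet might be sketched roughly as follows. This is an illustration only, not the design actually delivered: the method names are Python-style renderings of those in the anecdote, and two rules are assumptions made for the sketch. First, someone born on 29 February is treated as having their birthday on 1 March in non-leap years (jurisdictions differ on this). Second, ‘over age’ is read as ‘has reached that age’.

```python
from datetime import date


class Customer:
    def __init__(self, name: str, date_of_birth: date) -> None:
        self.name = name
        self._date_of_birth = date_of_birth  # callers are meant to use the methods below

    def _age_on(self, on_date: date) -> int:
        dob = self._date_of_birth
        # Comparing (month, day) tuples avoids constructing 29 Feb in a non-leap year.
        # Assumption: a 29 February birthday falls on 1 March in non-leap years.
        had_birthday = (on_date.month, on_date.day) >= (dob.month, dob.day)
        return on_date.year - dob.year - (0 if had_birthday else 1)

    def age_today(self) -> int:
        return self._age_on(date.today())

    def will_be_over_age_on_date(self, age: int, on_date: date) -> bool:
        # Assumption: "over age" means the customer has reached `age` by `on_date`.
        return self._age_on(on_date) >= age

    def confirm_dob_is(self, claimed: date) -> bool:
        # Answers yes or no without ever returning the stored date.
        return claimed == self._date_of_birth


sam = Customer("Sam", date(2000, 2, 29))
print(sam.will_be_over_age_on_date(18, date(2018, 2, 28)))  # False
print(sam.will_be_over_age_on_date(18, date(2018, 3, 1)))   # True
print(sam.confirm_dob_is(date(2000, 2, 29)))                # True
```

With the leap-year rule written once, in one place, every caller gets the same answer, which is the consistency argument made above.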
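A short sketch shows how little the Python conventions actually enforce. The `Account` class and its fields are hypothetical. A single leading underscore is purely a signal to the reader, and a double leading underscore only triggers name mangling, which any caller can undo.

```python
class Account:
    def __init__(self, balance: int) -> None:
        self._balance = balance  # single underscore: "please don't touch", nothing more
        self.__pin = "1234"      # double underscore: stored as _Account__pin

    def balance(self) -> int:
        return self._balance


acct = Account(100)

# Nothing stops outside code from writing to the "private" field.
acct._balance = 1_000_000
print(acct.balance())  # 1000000

# The name-mangled field is hidden under its original name only.
print(hasattr(acct, "__pin"))  # False
print(acct._Account__pin)      # 1234
```

A programmer under deadline pressure needs no special tooling to get past either convention.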
The first paradigm shift in programming that I experienced was from procedural to structured programming. (I think that one counts as a paradigm shift because, when I first saw the arguments for structured programming, my reaction was ‘But how can you possibly program without GOTO statements?’) But within a year or so I’d completely made the transition.
As you get older, paradigm shifts get harder. Embracing the object-oriented paradigm properly took me most of the 1990s. But I eventually adopted it with a purity that I am still proud of. My PhD thesis (in 2004) proposed a radical, ultra-pure, approach to object-oriented design and implementation which became known as the ‘Naked Objects pattern’, and most of my professional work for the last 20 years has been in designing and building large scale object-oriented enterprise systems that use this pattern.
I didn’t start learning about the functional programming paradigm until I was in my fifties. I am much encouraged by the fact that even Simon Peyton Jones, one of the world’s foremost authorities on functional programming, says that when he first encountered it, at university, he ‘thought that it could not possibly work.’ I can’t say I’ve completely embraced FP yet, but I’m beginning to get a good feel for it – and am convinced that it represents the future of programming.
What does this mean for the other paradigms? Structured programming and OOP both still depend on the core idea of procedural programming: executing sequential statements. FP doesn’t (except, arguably, in a very limited way for coding input/output).
FP doesn’t render most of the skills I learned in OOP completely redundant. The essential concept of user-defined types that represent the nouns of the domain is still there, and is critical in certain kinds of application such as simulation, science, data analysis, and modelling. Polymorphism is still hugely important: Haskell has one of the most advanced type systems of any programming language. But, like GOTO statements, encapsulating methods on objects will one day be just ‘the way we used to do things in the past’.
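As a rough illustration of polymorphism without methods encapsulated on the type, here is a sketch using `functools.singledispatch` from the Python standard library. It is only a loose analogy to what Haskell offers, not an equivalent: the dispatch happens at run time on the type of the first argument. The `Circle`, `Rectangle`, and `area` names are hypothetical.

```python
import math
from dataclasses import dataclass
from functools import singledispatch


@dataclass(frozen=True)
class Circle:
    radius: float


@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float


@singledispatch
def area(shape) -> float:
    # Fallback for any type with no registered implementation.
    raise NotImplementedError(f"area is not defined for {type(shape).__name__}")


@area.register
def _(shape: Circle) -> float:
    return math.pi * shape.radius ** 2


@area.register
def _(shape: Rectangle) -> float:
    return shape.width * shape.height


shapes = [Circle(1.0), Rectangle(2.0, 3.0)]
print([round(area(s), 2) for s in shapes])  # [3.14, 6.0]
```

Neither type defines an `area` method, and a new shape can be supported later by registering another implementation, without touching the existing classes.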