Smalltalk.org™

The Futility of Adding Types to a Dynamic Language
written by Peter William Lount
version 1, 20050105 4:28pm PST.
version 2, 4:48 pm PST, added side bar to address "opinion".
version 3, 5:18 pm PST, adjustments to the tone.

James Robertson, in Adding the exclamation point, points us to the ongoing attempt to add variable type definitions into the Python Language.

First let me state that the Python language community can design their language any way they wish. It's not the end of Python even if it's the end of dynamism and simplicity in Python. From the dynamic perspective I wonder if one could liken "static types" to a form of illness in the software, in that the addition of even optional types tends to pollute a language? Ok, cut the irrelevant chatter (intended for levity), steady on target red five... let's see if we can contribute to the Python improvement project in a meaningful and positive way.

Let's start with this comment by Christopher G. Petrilli regarding Python:

So what do I do if I want to say that I just want a number. Any number will do. Or maybe I only want rational numbers? What if my int is actually a Decimal type? Is that ok? - From Adding the exclamation point

As I've written in Validations Are Best Done At Runtime With Full Language Semantics, if you are going to add "type definitions" (or interface definitions) to your programming language it's best to do so using the full semantic power of your language; don't invent a new syntax.

Please note that it is my opinion and view that adding types to Python is a mistake, and that this should not be seen as an ad hominem personal attack upon Guido van Rossum, as that is not intended. I have the deepest respect for him and the work he is doing. The arguments in the article address the folly and futility (in my opinion) of adding types to a dynamic language, especially considering that a new alternative is available. Not to mention that Smalltalk does fine without static types.

I view the desire for typed variables as a deep and pervasive mistake (with unrelenting and powerful peer pressure to conform and adopt typed variable and interface limitations) in the computing industry as a whole, a technique that is overly relied upon to address programming errors.

This is even more evident considering the two recent papers (PDF) on "Composable Encapsulation Policies" and "Object-Oriented Encapsulation for Dynamically Typed Languages" which present a viable alternative to static types that can preserve the dynamic nature of a programming language. An important read for anyone considering adding types to a dynamic language. The article Extending Encapsulation For Smalltalk links to these papers and highlights their relevance to Smalltalk and languages like Python.

Unfortunately the designer of Python, Guido van Rossum, is, in my view (see side bar), making the mistake of adding types by adding new "special" syntax. PERL made this complexity mistake a long time ago and PERL programmers and their clients have been paying for it since.

Apart from the mistake of adding "compile time" type constraints to Python he is (in my view) compounding this by adding a new special syntax. Now, before anyone points out that he's "proposing" that these type additions are optional it should be noted that certain features of programming languages impact the language, the programmers, and the programs written so deeply that there is no turning back.

While I'm not an expert in Python (yet) I am an expert in dynamic typeless systems and in Smalltalk. I'm also a polyglot in that I speak many computer languages. Clearly from the perspective of dynamic systems and from the perspective of maximizing the expressive power of computer languages the proposed additions to Python leave a lot to be desired. It's hard when evolving a language to keep taking successful steps forward as it's so easy (maybe an order of magnitude easier) to devolve the language before you know it.

As Petrilli points out, the proposed "special type syntax" for Python lacks the expressive power to specify the various situations that Petrilli can envision. This is typical of all the "typed languages" that I'm aware of.

To gain the maximum expressiveness any "type system" needs to have a language that's fully general purpose. It would be folly to define a special language just for the specification of types as that would double the complexity that programmers would need to learn and then apply in everyday use. Thus it's better and simpler to use the same syntax for both general purpose programming and type interface definitions.
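As a rough illustration of the point, Petrilli's questions ("any number will do", "only rational numbers", "what if my int is actually a Decimal") can be answered with ordinary run time Python and no new syntax. This is only a sketch under stated assumptions: the function names `is_any_number`, `is_rational` and `validated` are hypothetical, and treating `bool` as "not a number" is an illustrative choice, not something the proposal or Petrilli specifies.

```python
from decimal import Decimal
from fractions import Fraction
import numbers


def is_any_number(value):
    """Any number will do: int, float, complex, Fraction or Decimal."""
    # Decimal is registered as a numbers.Number in the standard library
    # (though not as a numbers.Real), so it passes this check.
    # Excluding bool is an illustrative assumption.
    return isinstance(value, numbers.Number) and not isinstance(value, bool)


def is_rational(value):
    """Only rational numbers: int and Fraction qualify, float does not."""
    return isinstance(value, numbers.Rational) and not isinstance(value, bool)


def validated(value, check):
    """Return value unchanged, or raise if the check rejects it."""
    if not check(value):
        raise ValueError(f"{value!r} rejected by {check.__name__}")
    return value


price = validated(Decimal("19.99"), is_any_number)   # accepted
ratio = validated(Fraction(1, 3), is_rational)       # accepted
```

Because each check is just a function, a new question ("only even integers", "only numbers below a limit") needs a new function rather than a new piece of type syntax.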

As I've mentioned (in Validations Are Best Done At Runtime With Full Language Semantics) compile time "type and interface definitions" are a very limited form within the more powerful and general Validation Framework that includes compile time and the flexible run time validation opportunities. The point is why add new capabilities to a language while ignoring the state of the art of that capability domain? Once added to a language a feature is notoriously difficult to remove, especially syntactic features.

Validations, all forms of them, are really a form of "meta data" about the program. "Meta data" are program statements (using a special syntax or using the general purpose language syntax) about the program itself. As Smalltalk has demonstrated the most powerful and general method of having meta data in a language is to have that meta data written in the general purpose syntax of the language itself. In this way meta data programming is the same as regular programming and many more programmers will be able to access this powerful dynamic capability. In Smalltalk style languages the "meta data" are first class objects. The impact of this is that much of the added meta data functionality can be written in the language itself and added as a group of "classes" (or objects) otherwise known as a Framework.

Adding a Validations Framework to a language makes sense when it's added as "first class meta data" in the general purpose language itself. By virtue of being added as a "Framework" the validations can be refined and evolved by regular programmers. While one might need to take care when making changes to a meta data framework or class, in general anyone can do it, usually within a few minutes, without being a "virtual machine" (VM) programmer. There are some cases when new meta data does require support from the VM systems programmer, which is why language design is crucial if one wants to maximize power to the general programmer and minimize low level tinkering.
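To make the idea of validations as "first class meta data" a little more concrete, here is a minimal sketch in Python rather than Smalltalk. It is not the framework from the Validations article; the names `Validation`, `validate`, `positive`, `whole` and `repeat` are all hypothetical, chosen only for illustration. Validations are ordinary objects that can be combined, attached to a function, and inspected at run time.

```python
import functools
import inspect


class Validation:
    """A validation is an ordinary object wrapping an ordinary predicate."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate

    def __call__(self, value):
        return bool(self.predicate(value))

    def __and__(self, other):
        # Combining two validations yields another first class validation.
        return Validation(
            f"({self.name} and {other.name})",
            lambda value: self(value) and other(value),
        )


def validate(**rules):
    """Attach validations to a function as inspectable meta data."""
    def decorate(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            for arg_name, rule in rules.items():
                if arg_name in bound.arguments:
                    value = bound.arguments[arg_name]
                    if not rule(value):
                        raise ValueError(f"{arg_name}={value!r} fails {rule.name}")
            return func(*args, **kwargs)

        wrapper.validations = rules  # the meta data stays reachable at run time
        return wrapper
    return decorate


positive = Validation("positive", lambda n: n > 0)
whole = Validation("whole", lambda n: n == int(n))


@validate(count=positive & whole)
def repeat(text, count):
    return text * int(count)
```

Everything here is plain library code: a regular programmer can add a new `Validation`, or change how `validate` reports failures, without touching the language or its virtual machine.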

The danger of a general purpose Validations Framework is that programmers will over-constrain their programs in much the same way that programmers using "typed languages" do. This is especially a danger for programmers who have ingrained bad habits learned from using languages that enforce compile time "type validations and constraints" upon them. Think C, C++, Java, etc... A compile time type limitation moved to run time is still the same limitation. It seems that the notion that one can make a mess with any tools still applies even in the best designed system. Knowing when one is crossing thresholds that reduce flexibility of a dynamic program is an art form of experienced designers. It's also something that needs to be studied as it might present some opportunities for advancement in our understanding of the fundamental difference between dynamic runtime systems and static compile time systems.
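A small Python sketch may show what "a compile time type limitation moved to run time" looks like in practice. Both functions are hypothetical examples written for this comparison. The first insists on one class family, the second asks only for the capability it actually uses.

```python
import io


def count_lines_overconstrained(source):
    # A compile time habit moved to run time: demands one class family,
    # even though the body only ever iterates over the argument.
    if not isinstance(source, io.TextIOBase):
        raise TypeError("source must be a text file object")
    return sum(1 for _ in source)


def count_lines_flexible(source):
    # Asks only for the capability actually used: iteration.
    try:
        lines = iter(source)
    except TypeError:
        raise TypeError("source must be iterable") from None
    return sum(1 for _ in lines)


text_file = io.StringIO("first\nsecond\n")
plain_list = ["first", "second"]

from_file = count_lines_flexible(text_file)     # works on a file-like object
from_list = count_lines_flexible(plain_list)    # and on a plain list
```

Passing `plain_list` to `count_lines_overconstrained` raises `TypeError` even though the function body would have handled it without trouble, which is exactly the lost flexibility described above.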

In the design of Zoku, a next generation Smalltalk derived language, the above issues are prevalent in my design decisions. There is an infinite set of choices in the landscape of language design: choices in syntax, choices in capabilities, choices in data and object models, choices in aesthetics, plus many more. Keeping the pure spirit of Smalltalk is critical. The notions above are important principles at the heart of Smalltalk. They are at the heart of the language and their loss would diminish the language. The key when designing a new language is finding where its heart lives: what are the core principles that drive the language forward from its deepest level? The expressive power of systems built using a language will be limited - and influenced - to a high degree by the choices the language designer makes. Choose wisely. Consider the options before committing as your end users will pay the price of your choices.

Let's consider the price that end users will pay with a real example.

The "Javascript" and "FlashMX" languages (both based on the same standard) live in the "web browser" environment. In this environment errors in one program statement should not stop a web page from displaying, so in their brilliance the designers of Javascript made a fundamental choice: when a program statement has an error simply ignore it, let it go, don't raise an exception that could otherwise "stop the web page displaying". I guess this seemed a reasonable choice to them at the time. The capability as designed achieved their goal: web pages pretty much display even with serious programming errors. The implications of ignoring the run time "exception meta data" by throwing it out are extreme: lost productivity, lost time, extra costs and huge amounts of frustration.

Why? It's simple really, by tossing out the exception meta data they toss out the baby with the bath water! It's very difficult to debug Javascript and FlashMX. I know, I've tried. Others know, I've talked to people who've spent the last few years working full time in these two environments. It's their biggest bane. Even with the new debugger in Flash MX you can't see where the errors are! How are you supposed to debug when you don't have the computer pointing out the errors? Unfortunately the "print statement" becomes the primary and almost sole debugging tool. Talk about archaic.
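The difference between throwing the exception meta data away and keeping it can be sketched in a few lines of Python. This is an analogy only, not a description of how any Javascript or Flash MX runtime is implemented; `run_ignoring_errors` and `run_recording_errors` are hypothetical names. Both runners let the "page" keep going after a failure, but only the second can tell the programmer what went wrong and where.

```python
import traceback


def run_ignoring_errors(statements):
    """Run each callable; a failure is dropped without a trace."""
    for statement in statements:
        try:
            statement()
        except Exception:
            pass  # the exception meta data is thrown away here


def run_recording_errors(statements):
    """Run each callable; keep going, but keep what went wrong and where."""
    failures = []
    for index, statement in enumerate(statements):
        try:
            statement()
        except Exception as error:
            failures.append(
                (index, type(error).__name__, traceback.format_exc())
            )
    return failures


output = []
page_script = [
    lambda: output.append("header"),
    lambda: 1 / 0,                      # a programming error
    lambda: output.append("footer"),
]

run_ignoring_errors(page_script)             # page "displays", error is lost
failures = run_recording_errors(page_script)  # page "displays", error is kept
```

With the second runner the goal of not stopping the page is still met, yet `failures` holds the index of the failing statement, the exception name and the full traceback, so the print statement no longer has to be the only debugging tool.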

Now it might be possible for the Flash MX debugger to mark statements "red" when they fail, but for some reason that hasn't been added to the system. When I'm programming in Javascript or FlashMX I now have to take into account this "debugging nightmare potential". I do this by allotting extra time to the development schedule. This has a direct cost impact on clients.

Who's building these systems and how do they get away with putting shoddy products into the global marketplace? Just think about how much Javascript and Flash MX code is being written out there with substandard debugging tools and capabilities. Think of the economic cost. It's apparent that choices in language design can have a fundamental impact upon the productivity of programmers. Should language designers be held accountable for this or is it simply "buyer beware"?

In much the same way that "ignoring runtime errors" in Javascript and Flash MX inherently causes real problems, the "compile time type validations and constraints" applied to variables and "type interfaces" cause real problems. It's just that most professionals don't see the dangers yet.

The solution for Javascript and Flash MX is to add exception handling to the languages and to clean up the semantics of run time error handling. These are major overhauls complicated by billions of lines of legacy code in the wilds of the Internet.

The PHP language has a similar upgrading underway (PHP 5) that cleans up "object references" by making object references the default rather than "shallow copying objects" when simply doing a variable assignment. This archaic feature is another example of a language design choice that is best relegated to the dust heap of history. I'm glad the designers of PHP have come to their senses to fix this serious flaw. It will make PHP 5 a much better language.
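For readers who have not run into the distinction, the difference between "assignment shares a reference" and "assignment makes a shallow copy" can be shown with a Python analogy. This is not PHP code and makes no claim about PHP's internals; the `Account` class and variable names are hypothetical. Python assignment always shares a reference, so the copying behaviour is imitated here with an explicit `copy.copy`.

```python
import copy


class Account:
    def __init__(self, balance):
        self.balance = balance


original = Account(100)

# Reference semantics: both names refer to one and the same object.
alias = original
alias.balance += 50

# Copy semantics, imitated explicitly: a separate object is created,
# so changes to it never reach the original.
snapshot = copy.copy(original)
snapshot.balance -= 30

shared = alias is original          # True: one object, two names
separate = snapshot is not original  # True: two distinct objects
```

After this runs, `original.balance` is 150 (the change through `alias` is visible) while `snapshot.balance` is 120. When copying happens silently on every assignment, updates like the one to `snapshot` are quietly lost, which is the kind of surprise the language change is meant to remove.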

The PERL language is also undergoing a major revision with PERL 6. Not only are they improving the syntax over previous versions of PERL, they are building a powerful virtual machine technology from the ground up that has application to many languages. As I've mentioned before it would be cool to see Squeak Smalltalk running on the Parrot virtual machine. I hope that the choices that Larry Wall is making will make PERL a better and less complicated language.

There is a common theme running through language design that applies to many programming languages. Make the right choices or you'll have to come back and change the language.

There is also another theme running through some evolutions of languages: by exploring the "language space" it's possible to have new capabilities that others have not thought of yet and that actually open up new dimensions. The key of this evolution is adding new dimensions of extendability that are accessible to anyone using the language. While it's nice to see languages like PHP and PERL evolving and attempting to meet standards set long ago by Smalltalk, it's not enough for the new challenges and opportunities coming from the fast approaching future.

A glance at the Smalltalk versions page will show how Smalltalk is evolving to meet the needs of various technological ecosystems. Some of these variations are simple and some are evolutionary. A few are revolutionary. All of them are valid. I can't speak for the people designing those systems, but I can say that the future of language design is wide open. Make your choices wisely as people will speak your language and to a large extent be trapped within it.

The article Extending Encapsulation For Smalltalk links to two papers (PDF) on "Composable Encapsulation Policies" and "Object-Oriented Encapsulation for Dynamically Typed Languages". An important read for anyone considering adding types to a dynamic language.

Given the direction of their work the Python designers would benefit from a serious look at the latest state of the art work in "extending encapsulation for dynamic languages". I urge the Python community to seriously consider keeping their language dynamic by extending their encapsulation rather than adding types. Two quotes from the papers illustrate the relevance for Python remaining a fully dynamic language:

"In this paper we describe the problems that are caused by insufficient encapsulation mechanisms and then present object-oriented encapsulation, a simple and uniform approach that solves these problems by bringing state of the art encapsulation features to dynamically typed languages. We provide a detailed discussion of our design rationales and compare them and their consequences to the encapsulation approaches used for statically typed languages."

"Given the importance of encapsulation to object-oriented programming, it is surprising to note that mainstream object-oriented languages offer only limited and fixed ways of encapsulating methods. Typically one may only address two categories of clients, users and heirs, and one must bind visibility and access rights at an early stage. This can lead to inflexible and fragile code as well as clumsy workarounds. We propose a simple and general solution to this problem in which encapsulation policies can be specified separately from implementations. As such they become composable [and first class meta data] entities that can be reused by different classes. We present a detailed analysis of the problem with encapsulation and visibility mechanisms in mainstream OO languages, we introduce our approach in terms of a simple model, and we evaluate how our approach compares with existing approaches [including the approach of encapsulation via type systems]. We also assess the impact of incorporating encapsulation policies into Smalltalk" [and by extension other dynamic languages].

I'm willing to assist anyone designing a new language (or revising an existing language) to avoid the mistakes illustrated above (and others) by engaging in a professional and emotionally calm dialog with an eye towards making their language the best it can possibly be given the nature of the "language space". This includes the languages mentioned herein.