The elements of a compiler

Although "compiler" sounds like quite a technical term, it is just the name given to an application that converts source code to object code. The source code need not be a programming language, although that is certainly the most common thing to use - anything that converts input A to output B can be thought of as a compiler.

As far as we are concerned, compilers convert one language, such as PHP, into a form that can be executed. Compilers come in three parts: analysis, synthesis, and output. The first two of these can be subdivided further, but for now we'll consider them at this high level.

At the analysis stage, the source code is read in, and checked to make sure it is correct. At the synthesis stage, the source code is usually converted to "intermediate code" - a language form that is not human-readable like the source code, but not tied to a machine like the object code is. This intermediate code is then optimised for maximum performance and/or size. The final stage is output, where the intermediate code is converted to the target output, which may be an executable file, or may be an interpreted system of operation codes (opcodes).
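To make the three stages concrete, here is a minimal sketch in PHP for a toy language that only understands integer sums such as "1 + 2 + 3". The function names (analyse, synthesise, output) and the PUSH/ADD opcodes are invented for this illustration and are not how PHP itself works; the optimisation step is left out to keep things short.

```php
<?php
// Hypothetical three-stage pipeline for a toy language of integer sums.
// All function names and opcodes here are made up for illustration.

// Analysis: read the source in and check that it is correct.
function analyse(string $source): array
{
    $tokens = preg_split('/\s+/', trim($source), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($tokens as $i => $token) {
        // Even positions must be integers, odd positions must be "+".
        $expected = ($i % 2 === 0) ? '/^\d+$/' : '/^\+$/';
        if (!preg_match($expected, $token)) {
            throw new InvalidArgumentException("Unexpected token '$token'");
        }
    }
    if (count($tokens) % 2 === 0) {
        throw new InvalidArgumentException('Expression is incomplete');
    }
    return $tokens;
}

// Synthesis: convert the checked tokens to intermediate code,
// here a list of [opcode, argument] pairs for a stack machine.
function synthesise(array $tokens): array
{
    $code = [['PUSH', (int) $tokens[0]]];
    for ($i = 2; $i < count($tokens); $i += 2) {
        $code[] = ['PUSH', (int) $tokens[$i]];
        $code[] = ['ADD', null];
    }
    return $code;
}

// Output: convert the intermediate code to the target form,
// in this sketch one textual opcode per line.
function output(array $code): string
{
    $lines = [];
    foreach ($code as list($op, $arg)) {
        $lines[] = ($arg === null) ? $op : "$op $arg";
    }
    return implode("\n", $lines);
}

echo output(synthesise(analyse('1 + 2 + 3'))), "\n";
```

Each stage only sees what the previous stage returned, so the checking, the intermediate form, and the final output can each be changed without touching the other two.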

The middle step, synthesis, is optional, and not all systems employ it. PHP, for example, converts straight from PHP script to its interpreted opcodes, with no middle step. As a result, it is not easy to optimise PHP code. The advantage of having the synthesis stage, apart from the extra chance to do optimisation, is primarily that it keeps the source language and the destination executable well apart. If a compiler converted C++ directly to Windows .exe executable files, how would that compiler be ported to create Linux executables?
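As an illustration of the "extra chance to do optimisation", the sketch below applies one classic optimisation, constant folding, to a small piece of intermediate code. The instruction format (a list of [opcode, argument] pairs) and the opcode names PUSH, LOAD and ADD are assumptions made for this example only.

```php
<?php
// Hypothetical intermediate code: a list of [opcode, argument] pairs for a
// stack machine. The opcode names are invented for this sketch.

// Constant folding: replace "PUSH a, PUSH b, ADD" with a single "PUSH a+b",
// so the addition happens once at compile time rather than on every run.
function foldConstants(array $code): array
{
    $result = [];
    foreach ($code as $instruction) {
        $n = count($result);
        if ($instruction[0] === 'ADD'
            && $n >= 2
            && $result[$n - 1][0] === 'PUSH'
            && $result[$n - 2][0] === 'PUSH') {
            // Both operands are known constants, so add them now.
            $b = array_pop($result);
            $a = array_pop($result);
            $result[] = ['PUSH', $a[1] + $b[1]];
        } else {
            // Anything else is copied through unchanged.
            $result[] = $instruction;
        }
    }
    return $result;
}

// Intermediate code for "1 + 2 + x": the "1 + 2" part can be folded,
// but the addition involving the variable x must stay.
$code = [['PUSH', 1], ['PUSH', 2], ['ADD', null], ['LOAD', 'x'], ['ADD', null]];
print_r(foldConstants($code));
```

Because the pass works purely on the intermediate code, it needs to know nothing about the source language that produced it or the machine that will eventually run it.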

If an intermediate code system is in place, the same analysis stage ("front end") can be used for any target output, and, similarly, the same generation stage ("back end") can be used for any input - as long as source code can be converted to the intermediate code, it can be passed to the output stage for generation. This method is employed in the Microsoft Common Language Runtime - many .NET languages compile down to MSIL (Microsoft Intermediate Language), and a single runtime can then just-in-time compile (that is, on the fly) from MSIL to native machine code.
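The front end / back end split can be sketched with two tiny front ends and two tiny back ends that meet at one shared intermediate form. Everything here is hypothetical: the two source syntaxes, the function names, and the choice of a plain list of integers as the intermediate code. The front ends also skip proper error checking to keep the example short.

```php
<?php
// Hypothetical example: two front ends and two back ends that share one
// intermediate code, here simply a list of integers to be added together.

// Front end 1: source written as "1 + 2 + 3".
function frontEndInfix(string $source): array
{
    return array_map('intval', explode('+', $source));
}

// Front end 2: source written as "sum 1 2 3".
function frontEndPrefix(string $source): array
{
    $words = preg_split('/\s+/', trim($source));
    array_shift($words); // drop the leading "sum" keyword
    return array_map('intval', $words);
}

// Back end 1: emit opcodes for an imaginary stack machine.
function backEndOpcodes(array $numbers): string
{
    $lines = [];
    foreach ($numbers as $i => $n) {
        $lines[] = "PUSH $n";
        if ($i > 0) {
            $lines[] = 'ADD';
        }
    }
    return implode("\n", $lines);
}

// Back end 2: emit a line of PHP source.
function backEndPhp(array $numbers): string
{
    return 'echo ' . implode(' + ', $numbers) . ';';
}

// Any front end can be paired with any back end.
echo backEndPhp(frontEndInfix('1 + 2 + 3')), "\n";
echo backEndOpcodes(frontEndPrefix('sum 1 2 3')), "\n";
```

Adding a third source syntax means writing one new front end, and adding a third target means writing one new back end; neither change requires touching the existing functions.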

Despite the benefits of a synthesis stage, we'll be skipping it in this example, as it's exceptionally heavy on theory and doesn't have much to do with PHP. That leaves us with the analysis stage and the output stage.

 



Copyright ©2015 Paul Hudson. Follow me: @twostraws.