PHP attributes are so awesome I just had to add attribute based field mapping to my ORM

I wrote this post to talk about the architectural decisions I had to make in upgrading my ORM recently to PHP8 as well as the general justification for using an ORM in the first place as a query writer, runner, and object factory and why I consider attributes to be the holy grail of ORM field mapping in PHP.

Writing Queries

I've always been fascinated with moving data in and out of a database. I've been at this from PHP4 days, you know. As hard as it is to believe even back then we had a testable, repeatable, and accurate way of pushing data into and out of a database. I quickly found myself writing query after query.... manually. At the time there was no off the shelve framework you could just composer require. So I found myself thinking about this problem again and again and again and again.

Put simply when you write SQL queries you are mapping variable values in memory to a string. If you are one of those people that claims that an ORM is bloat you'll find yourself writing hundreds of queries all over your projects.

If you've ever imploded an array by a "','" but then ended up doing it in dozens or hundreds or thousands of places the next logical step is to stop rewriting your code and write a library.... and hence an ORM is born. An ORM is an absolute statement; that "I will not rewrite code for handling a datetime" and then actually following through.

To make this happen you must write some sort of translation layer from PHP types to SQL types. In most ORMs this is called mapping.

Before PHP8 the typical thing to do was to use a PHP array defined statically containing all that information. Like this. In fact this is from Divergence from before version 2.

    public static $fields = [
        'ID' => [
            'type' => 'integer',
            'autoincrement' => true,
            'unsigned' => true,
        ],
        'Class' => [
            'type' => 'enum',
            'notnull' => true,
            'values' => [],
        ],
        'Created' => [
            'type' => 'timestamp',
            'default' => 'CURRENT_TIMESTAMP',
        ],
        'CreatorID' => [
            'type' => 'integer',
            'notnull' => false,
        ],
    ];

But this kinda sucks. There's no way to do any auto complete and it's just generally a little bit slower.

A few frameworks decided to support mapping using annotations (an extended PHPDoc in PHP comments) and even yaml field maps but those are all just bandaids on the real problem. Which was that there was no real way to provide proper field definitions using the raw language's tokens, instead relying on the runtime to store and process that information.

Attributes

So that's my long winded explanation about why my absolute favorite feature of PHP 8 is attributes. Specifically for ORM field mapping. Yea really.

    #[Column(type: "integer", primary:true, autoincrement:true, unsigned:true)]
    protected $ID;

    #[Column(type: "enum", notnull:true, values:[])]
    protected $Class;

    #[Column(type: "timestamp", default:'CURRENT_TIMESTAMP')]
    protected $Created;

    #[Column(type: "integer", notnull:false)]
    protected $CreatorID;

This is sooooo much cleaner. Look how awesome it is. Suddenly I have auto complete for field column definitions right in my IDE!

Now the code is cleaner and easier to follow! You've even got the ability to type hint all your fields.

Once we take field mapping to it's logical conclusion it becomes practical to simply map as many database types as we can into the ORM. We can even try to support all of the various field types that could be used in SQL and represented somehow in PHP. Of course to make this happen it becomes necessary to tell the framework some details about the database field type you decide to use. For example you can easily see yourself using a string variable type for varchar, char, text, smalltext, blob, etc but most ORMs aren't typically smart enough to warn you when you inevitably make a mistake and try to save your larger than 256 character string to a varchar(255). If you were to build all of this yourself you would invariably find yourself creating your own types for your ORM and doing a translation from language primitive to database type and back just like I am here. This gets even more complex when an ORM decides to support multiple database engines. Once this becomes more fleshed out you can even have your ORM write the schema query and automatically create your tables from your code.

Here for example I'm gonna go ahead and create a Tag class and then save something.

class Tag extends Model {
    public static $tableName = 'tags';
    protected $Tag; // this will be a varchar(255) in MySQL, a string in PHP
}

$myTag = new Tag();
$myTag->Tag = 'my tag';
$myTag->save();

With these few simple lines of code we've created a tags table and saved a new Tag "my tag" into our new table. The framework automatically detected the table was missing during the save and created it and then followed through by running the original save. Excellent for bootstrapping new projects or installing software.

Protected? Magic Methods

Traditionally it's common to think that __get and __set are triggered only when a property is undefined. However it is also triggered if you try to access a protected property from outside of the model. When you access a protected attribute from outside of the object it will always trigger __get when retrieving and __set when setting. For this reason I decided to use protected attributes in a Model for mapping.

The way in and out allows us to do some type casting. For example Divergence supports reading a timestamp in Y-m-d H:i:s, unix timestamp, and if it's a string that isn't Y-m-d H:i:s it will try running it through strtotime() before giving up. But that will only ever happen when __set is called thereby starting the chain of events leading to the necessary type casting for that field type. Unfortunately the downside to using a protected in this context is that when you need to access a field inside the object you can't use $this->Tag = 'mytag'; because it won't trigger __set and so what's gonna happen is you will end up messing with the internal data of that field incorrectly. So for the specific context where you're working with the field directly inside of the object itself you should actually use setValue and getValue instead. Frankly you can use __set and __get directly but let's be civilized here. This caveat is why I would like to see the ability for PHP to have __set and __get triggerables configurable to do so on existing object members.

Relationships.

But wait. There's more! Now with the ability to use Attributes for field definitions we can use them for relationship definitions as well.

    #[Relation(
        type: 'one-one',
        class: User::class,
        local: 'CreatorID',
        foreign: 'ID'
    )]
    protected $Creator;

    #[Relation(
        type: 'one-many',
        class: PostTags::class,
        local: 'ID',
        foreign: 'BlogPostID'
    )]
    protected $Tags;

Relationships take Attributes to the next level.

$this->getValue('Tags');

This is all you need to pull all the Tags as an array. Most relationship types can be easily expressed in this format.

$Model->Tags // also works but only from "outside"

You can run it like this as well from outside of the Model.

PHP in 2023 is blindingly fast.

Today the entire suite of tests written for Divergence covering 82% of the code completes in under a second. Some of these tests are generating thumbnails and random byte strings for mock data.

For reference in 2019 the same test suite clocked in at 8.98 seconds for only 196 tests. By the way these are database tests including dropping the database and creating new tables with data from scratch. The data is randomized on every run.

Performance

What you are seeing is [time since start] [label] [memory usage] at print time.

The first part where it generates 1000 data points is all using math random functions to generate random data entirely in raw none-framework PHP code. Each Canary has 17 fields meant to simulate every field type available to you in the ORM. Generating all 17 1000 times takes a none trivial amount of time and later when creating objects from this data it must again process 17,000 data points.

Optimizing the memory usage here will be a point of concern for me in future versions of the ORM.

All the work related to what I wrote about in this post is currently powering this blog and is available on Github and Packagist.

From the Terminal