Object-Oriented Data

by Sergey Kucherov, Sep 2019

Software development is a mix of art and science. We are looking for ways to make software faster, cheaper, smarter, and, above all, more efficient. Here, efficiency means the value of software divided by the complexity of using it.

The record for the most efficient software was set in 1979, when Dan Bricklin and Bob Frankston introduced the world to the first electronic table, VisiCalc. We have to admit that in the forty years since then, nobody has been able to beat this record.

Electronic tables. Spreadsheets. The only software that anyone can use without technical expertise or training. Every aspect of the user interface is intuitive, and the functionality is virtually limitless. Electronic tables not only changed the way individuals handle data but also revolutionized the entire software industry. Before VisiCalc (and for many years after), software used computer resources to transform input into output. VisiCalc brought the concept of functional programming all the way up to the user. Even on slow machines, you could see the values of calculated cells almost immediately. That created the illusion that the spreadsheet “knows” the correct answer rather than calculating it.

Today, I present a simple idea that takes this concept to the next level.

Data Objects

A data structure is a collection of information organized as key/value pairs. Unlike other forms of data (free-form text, an array of numbers, or a stream of bits), data structures have a predefined format (a schema). Over the last forty years, the IT industry has invented many open standards for storing structured data: XML, JSON, YAML, Avro, Parquet, Pickle, HDF5, etc. The concept of object-oriented data can be applied to any of them. I use JSON in this article because it is simple and easy to read, understand, edit, and process.

Encapsulation

Following the path of object-oriented programming, we can turn data structures into data objects. The following example illustrates the idea:

{
  "firstName": "John",
  "lastName": "Doe",
  "dateOfBirth": "1979-10-17"
}

Listing 1: JSON record

Obviously, this record contains information about a person. Its structure and format can be described by another data object, called a JSON schema:

{
  "type": "object",
  "properties": {
    "firstName": { "type": "string" },
    "lastName": { "type": "string" },
    "dateOfBirth": { "type": "string" }
  },
  "required": ["firstName", "lastName", "dateOfBirth" ]
}

Listing 2: Person JSON record schema

Many libraries and online tools can validate an object against a schema. The validation gives you a single verdict: true or false. A function that validates a data object might look like this:

// Returns true if the data object matches the schema
boolean matchObjectToSchema(JsonNode data, JsonNode schema);

If the object does not match the schema, some validators also report which required fields are missing or which fields have the wrong value or type.
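A minimal sketch of such a validator, for illustration only. It assumes the JSON has already been parsed into plain Java values (`Map` for objects, `List` for arrays, `String`, `Number`, `Boolean`) instead of Jackson's `JsonNode`, so it runs without dependencies, and it understands only the `type`, `properties`, and `required` keywords; a real JSON Schema validator covers far more. The class name `MiniValidator` is hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, minimal validator: supports only "type", "properties" and "required".
// JSON values are modeled as Map (object), List (array), String, Number, Boolean.
final class MiniValidator {

    // Returns true if the value's runtime type matches the JSON Schema "type" keyword.
    static boolean matchesType(Object value, String type) {
        switch (type) {
            case "object":  return value instanceof Map;
            case "array":   return value instanceof List;
            case "string":  return value instanceof String;
            case "number":  return value instanceof Number;
            case "integer": return value instanceof Integer || value instanceof Long;
            case "boolean": return value instanceof Boolean;
            default:        return false; // unknown type keyword: reject
        }
    }

    @SuppressWarnings("unchecked")
    static boolean matchObjectToSchema(Object data, Map<String, Object> schema) {
        Object type = schema.get("type");
        if (type != null && !matchesType(data, (String) type)) {
            return false;
        }
        if (!(data instanceof Map)) {
            return true; // this sketch checks nothing further for non-objects
        }
        Map<String, Object> object = (Map<String, Object>) data;
        // Every field listed in "required" must be present.
        List<String> required = (List<String>) schema.getOrDefault("required", List.of());
        for (String field : required) {
            if (!object.containsKey(field)) {
                return false;
            }
        }
        // Every present field that has a sub-schema must match it (recursively).
        Map<String, Object> properties =
            (Map<String, Object>) schema.getOrDefault("properties", Map.of());
        for (Map.Entry<String, Object> p : properties.entrySet()) {
            if (object.containsKey(p.getKey())
                && !matchObjectToSchema(object.get(p.getKey()), (Map<String, Object>) p.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> person = Map.of(
            "firstName", "John", "lastName", "Doe", "dateOfBirth", "1979-10-17");
        Map<String, Object> str = Map.of("type", "string");
        Map<String, Object> schema = Map.of(
            "type", "object",
            "properties", Map.of("firstName", str, "lastName", str, "dateOfBirth", str),
            "required", List.of("firstName", "lastName", "dateOfBirth"));
        System.out.println(matchObjectToSchema(person, schema)); // true
    }
}
```

Returning only `true` or `false` mirrors the single verdict described above; collecting the names of failing fields instead of returning early would produce the kind of hints some validators provide.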

Schemas are used to ensure interface interoperability and data quality, and they do nothing else. We can extend their functionality by adding an optional property called expression. For example:

{
  "type": "object",
  "properties": {
    "firstName": { "type": "string" },
    "lastName": { "type": "string" },
    "dateOfBirth": { "id": "dob", "type": "string" },
    "age": {
      "id": "age",
      "type": "integer",
      "expression": "@years(@today - dob)"
    }
  }
}

Listing 3: Person data object schema

The expression property contains a formula that instructs the JSON engine to create a new data field and calculate its value. If you apply this schema to the original object, you will receive the following output:

{
  "firstName": "John",
  "lastName": "Doe",
  "dateOfBirth": "1979-10-17",
  "age": 40
}

Listing 4: Person data object

An extended JSON engine can use this schema to create a new JSON object by evaluating the expressions:

// Returns a new object: the source data plus all calculated fields
JsonNode evaluateJsonData(JsonNode data, JsonNode schema);

The formula language refers to variables in the original data object the same way Excel formulas refer to other cells in a spreadsheet. The source data does not have to include the new fields; the JSON engine adds them to the output automatically. It can do this every time you read the JSON object, so if you read this object next year, the age will be 41.
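The evaluate-on-read behavior can be sketched as follows. This is an illustration under assumptions, not an implementation of the formula language: `@years(@today - dob)` is translated by hand into a Java method, and `today` is passed in explicitly so results are reproducible. `AgeEvaluator` is a hypothetical name; `evaluateJsonData` borrows the name from the signature above but takes a `Map` instead of a `JsonNode`.

```java
import java.time.LocalDate;
import java.time.Period;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the "age" expression hand-translated into Java.
final class AgeEvaluator {

    // Equivalent of "@years(@today - dob)": whole years between dateOfBirth and today.
    static int years(LocalDate dob, LocalDate today) {
        return Period.between(dob, today).getYears();
    }

    // Returns a new object: a copy of the source data plus the calculated "age" field.
    // The source map is never modified, so every read recomputes the value.
    static Map<String, Object> evaluateJsonData(Map<String, Object> data, LocalDate today) {
        Map<String, Object> result = new LinkedHashMap<>(data);
        LocalDate dob = LocalDate.parse((String) data.get("dateOfBirth"));
        result.put("age", years(dob, today));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("firstName", "John");
        person.put("lastName", "Doe");
        person.put("dateOfBirth", "1979-10-17");
        System.out.println(evaluateJsonData(person, LocalDate.of(2019, 11, 1)));
        // {firstName=John, lastName=Doe, dateOfBirth=1979-10-17, age=40}
        System.out.println(evaluateJsonData(person, LocalDate.of(2020, 11, 1)).get("age"));
        // 41
    }
}
```

Because the calculated value is never written back to the source, reading the same object a year later yields a different age without anyone editing the data.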

Now, are you ready for the good news? Following the design of electronic tables, expressions may also refer to other calculated fields. And in JSON, a field can hold more than a single value.
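When formulas can read other calculated fields, the engine has to evaluate them in dependency order, much as a spreadsheet recalculates dependent cells. One common technique is lazy, memoized evaluation: asking for a field first computes the fields it depends on, and asking for a field that is already being computed reveals a circular reference. The sketch below assumes formulas are hand-written Java functions; `FieldEngine` and the fields `base`, `doubled`, and `plusOne` are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: calculated fields that may read other calculated fields.
final class FieldEngine {
    private final Map<String, Object> source;                         // raw input fields
    private final Map<String, Function<FieldEngine, Object>> formulas; // calculated fields
    private final Map<String, Object> cache = new HashMap<>();        // memoized results
    private final Set<String> inProgress = new HashSet<>();           // for cycle detection

    FieldEngine(Map<String, Object> source, Map<String, Function<FieldEngine, Object>> formulas) {
        this.source = source;
        this.formulas = formulas;
    }

    // Returns a field's value, computing (and caching) calculated fields on demand.
    Object get(String name) {
        if (source.containsKey(name)) {
            return source.get(name);
        }
        if (cache.containsKey(name)) {
            return cache.get(name);
        }
        Function<FieldEngine, Object> formula = formulas.get(name);
        if (formula == null) {
            throw new IllegalArgumentException("Unknown field: " + name);
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("Circular reference at field: " + name);
        }
        try {
            Object value = formula.apply(this); // may recursively call get()
            cache.put(name, value);
            return value;
        } finally {
            inProgress.remove(name);
        }
    }

    double getNumber(String name) {
        return ((Number) get(name)).doubleValue();
    }

    public static void main(String[] args) {
        Map<String, Function<FieldEngine, Object>> formulas = new HashMap<>();
        formulas.put("plusOne", e -> e.getNumber("doubled") + 1); // reads a calculated field
        formulas.put("doubled", e -> e.getNumber("base") * 2);    // reads a source field
        FieldEngine engine = new FieldEngine(Map.of("base", 10), formulas);
        System.out.println(engine.get("plusOne")); // 21.0
    }
}
```

The order in which formulas are declared does not matter; the dependency order is discovered at evaluation time.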

Lists and arrays

Any JSON element can contain a value, an object, or an array, so expressions can also perform map-reduce operations. For example, the following schema calculates the players’ average score:

{
  "type": "object",
  "properties": {
    "score": {
      "id": "listPlayers",
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "total": { "type": "number" }
        },
        "required": ["name", "total"]
      }
    },
    "average": {
      "type": "number",
      "expression": "@average(@all(listPlayers[].total))"
    }
  },
  "required": [ "score", "average" ]
}

Listing 5: Using array in the formula

Since listPlayers is an array of objects, the expression first maps it to an array of numbers (the total of each player). It then passes that array to the @average function to calculate the average score. Here is what the result may look like:

{
  "score": [
    { "name": "Pete", "total": 13.2 },
    { "name": "Alise", "total": 23.6 },
    { "name": "Simon", "total": 14.8 },
    { "name": "Helga", "total": 12.1 }
  ],
  "average": 15.925
}

Listing 6: Average score

Needless to say, adding players or changing any score is immediately reflected in the average.
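Read as map-reduce, `@all(listPlayers[].total)` is the map step and `@average` is the reduce step. A rough Java streams equivalent, assuming each player is a `Map` with a numeric `total` (the class `ScoreStats` is hypothetical):

```java
import java.util.List;
import java.util.Map;

// Hypothetical equivalent of "@average(@all(listPlayers[].total))".
final class ScoreStats {

    static double averageTotal(List<Map<String, Object>> players) {
        return players.stream()
            .mapToDouble(p -> ((Number) p.get("total")).doubleValue()) // map: player -> total
            .average()                                                 // reduce: mean
            .orElse(Double.NaN);                                       // empty list: no average
    }

    public static void main(String[] args) {
        List<Map<String, Object>> score = List.of(
            Map.of("name", "Pete", "total", 13.2),
            Map.of("name", "Alise", "total", 23.6),
            Map.of("name", "Simon", "total", 14.8),
            Map.of("name", "Helga", "total", 12.1));
        System.out.println(averageTotal(score)); // approximately 15.925
    }
}
```

Recomputing the stream on every read is what makes a new player or a changed score show up in the average immediately.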

Assigning an ID to a field allows formulas to refer to that field. However, a formula can also access data in its “home” branch by field name:

{
  "type": "object",
  "properties": {
    "employees": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "rate": { "type": "number" },
          "hours": { "type": "number" },
          "total": { "type": "number", "expression": "$rate * $hours" }
        },
        "required": ["name", "rate", "hours", "total"]
      }
    }
  },
  "required": [ "employees" ]
}

Listing 7: Using local field reference

In this example, each record in the employees array is processed individually. The formula refers to the record’s fields by name using the $ prefix. Here is what the result may look like:

{
  "employees": [
    { "name": "Pete", "rate": 13.2, "hours": 7.5, "total": 99.0 },
    { "name": "Alise", "rate": 23.6, "hours": 6, "total": 141.6 },
    { "name": "Simon", "rate": 14.8, "hours": 12.5, "total": 185.0 },
    { "name": "Helga", "rate": 12.1, "hours": 4, "total": 48.4 }
  ]
}

Listing 8: Calculate the value for every object in the array

This works recursively: every object inside a data object is also a data object.
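Record-local evaluation amounts to a loop in which each `$name` resolves against the current array element. A sketch assuming the `$rate * $hours` formula has been translated by hand into a Java lambda; `RecordEvaluator`, `field`, and `withCalculatedField` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: evaluate a record-local formula for every array element.
final class RecordEvaluator {

    // "$name" in a formula resolves against the record currently being processed.
    static double field(Map<String, Object> record, String name) {
        return ((Number) record.get(name)).doubleValue();
    }

    // Returns copies of the records, each with the calculated field added;
    // the input records are left unchanged.
    static List<Map<String, Object>> withCalculatedField(
            List<Map<String, Object>> records, String target,
            Function<Map<String, Object>, Object> formula) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> record : records) {
            Map<String, Object> copy = new LinkedHashMap<>(record);
            copy.put(target, formula.apply(record)); // the formula sees only its "home" record
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> employees = List.of(
            Map.of("name", "Pete", "rate", 13.2, "hours", 7.5),
            Map.of("name", "Helga", "rate", 12.1, "hours", 4));
        // Hand-translated "$rate * $hours"
        List<Map<String, Object>> result = withCalculatedField(
            employees, "total", r -> field(r, "rate") * field(r, "hours"));
        for (Map<String, Object> r : result) {
            System.out.println(r.get("name") + ": " + r.get("total"));
        }
    }
}
```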

Hidden info

Sometimes you need calculations that other formulas use, but you do not want their intermediate results to appear in the output document. In that case, you can make the property hidden:

{
  "type": "object",
  "properties": {
    "score": {
      "id": "listPlayers",
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "total": { "type": "number" }
        },
        "required": ["name", "total"]
      }
    },
    "sorted": {
      "id": "listSorted",
      "hide": true,
      "type": "array",
      "expression": "@sort(listPlayers, 'total', 'DESC')"
    },
    "winner": {
      "type": "string",
      "expression": "@top(@all(listSorted[].name),1)"
    }
  },
  "required": [ "score" ]
}

Listing 9: Schema with hidden values

The result will contain the winner but not the sorted list:

{
  "score": [
    { "name": "Pete", "total": 13.2 },
    { "name": "Alise", "total": 23.6 },
    { "name": "Simon", "total": 14.8 },
    { "name": "Helga", "total": 12.1 }
  ],
  "winner": "Alise"
}

Listing 10: Data object with the calculated winner
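One way to honor `hide` is to evaluate every field, hidden or not, into a working copy and strip the hidden ones only when producing the output. The sketch below translates the two expressions of Listing 9 by hand, sorting by total in descending order so that the first entry has the highest score; `HiddenFields` and `evaluate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: evaluate hidden and visible fields, then strip the hidden ones.
final class HiddenFields {

    static double total(Map<String, Object> player) {
        return ((Number) player.get("total")).doubleValue();
    }

    @SuppressWarnings("unchecked")
    static Map<String, Object> evaluate(Map<String, Object> data, Set<String> hidden) {
        Map<String, Object> work = new LinkedHashMap<>(data);
        List<Map<String, Object>> players = (List<Map<String, Object>>) data.get("score");

        // "sorted" (hidden): players ordered by total, highest first.
        List<Map<String, Object>> sorted = new ArrayList<>(players);
        sorted.sort((x, y) -> Double.compare(total(y), total(x)));
        work.put("sorted", sorted);

        // "winner": name of the first entry in the sorted list (null if there are no players).
        work.put("winner", sorted.isEmpty() ? null : sorted.get(0).get("name"));

        // Hidden fields were available to the formulas above but are removed from the output.
        work.keySet().removeAll(hidden);
        return work;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> score = List.of(
            Map.of("name", "Pete", "total", 13.2),
            Map.of("name", "Alise", "total", 23.6),
            Map.of("name", "Simon", "total", 14.8),
            Map.of("name", "Helga", "total", 12.1));
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("score", score);
        Map<String, Object> out = evaluate(data, Set.of("sorted"));
        System.out.println(out.keySet());      // [score, winner]
        System.out.println(out.get("winner")); // Alise
    }
}
```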

Inheritance

The schema defines a class of data objects, and we can use more than one schema. For example, let’s take the Person schema from Listing 3 and make it accessible on our network under the ID /people/person. Now we can extend this schema the same way we extend a Java class. To do this, we introduce a new keyword, $extends, that refers to the schema of the parent class:

{
  "$extends": "https://example.com/schemas/people/person",
  "type": "object",
  "properties": {
    "order": {
      "type": "object",
      "properties": {
        "subtotal": { "type": "number" },
        "tax": { "type": "number" },
        "shipping": { "type": "number" },
        "total": {
          "type": "number",
          "expression": "($subtotal+$shipping)*(1+$tax)"
        }
      },
      "required": [ "subtotal", "tax", "shipping" ]
    },
    "restricted": {
      "type": "boolean",
      "expression": "age < 18"
    }
  },
  "required": [ "order" ]
}

Listing 11: Inherited schema

The JSON engine uses the expressions provided in both schemas to calculate the order total and to verify that the person can make the purchase (age restriction). The final object will look like this:

{
  "firstName": "John",
  "lastName": "Doe",
  "dateOfBirth": "1979-10-17",
  "age": 40,
  "order": {
    "subtotal": 78.12,
    "tax": 0.035,
    "shipping": 20.50,
    "total": 102.07
  },
  "restricted": false
}

Listing 12: Data object with inheritance
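`$extends` is not part of the JSON Schema standard (the standard composition tools are `$ref` and `allOf`), so its exact semantics are for this design to define. One plausible reading, sketched here under that assumption, is that the engine resolves the parent schema and merges it into the child: properties are combined with the child's definitions taking precedence, and the two `required` lists are united. Fetching the parent by URL is out of scope, so the parent is passed in directly; `SchemaInheritance` and `merge` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of one possible "$extends" semantics: the child overrides the parent.
final class SchemaInheritance {

    @SuppressWarnings("unchecked")
    static Map<String, Object> merge(Map<String, Object> parent, Map<String, Object> child) {
        Map<String, Object> merged = new LinkedHashMap<>(parent);
        merged.putAll(child);      // child keywords win ("type", etc.)
        merged.remove("$extends"); // resolved, no longer needed

        // Properties: parent first, then child (child overrides same-named fields).
        Map<String, Object> props = new LinkedHashMap<>(
            (Map<String, Object>) parent.getOrDefault("properties", Map.of()));
        props.putAll((Map<String, Object>) child.getOrDefault("properties", Map.of()));
        merged.put("properties", props);

        // Required: union of both lists, without duplicates, parent entries first.
        Set<Object> required = new LinkedHashSet<>(
            (List<Object>) parent.getOrDefault("required", List.of()));
        required.addAll((List<Object>) child.getOrDefault("required", List.of()));
        merged.put("required", new ArrayList<>(required));
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> person = Map.of(
            "type", "object",
            "properties", Map.of("firstName", Map.of("type", "string"),
                                 "age", Map.of("type", "integer")));
        Map<String, Object> customer = Map.of(
            "$extends", "https://example.com/schemas/people/person",
            "type", "object",
            "properties", Map.of("order", Map.of("type", "object")),
            "required", List.of("order"));
        Map<String, Object> merged = merge(person, customer);
        System.out.println(((Map<?, ?>) merged.get("properties")).keySet());
        System.out.println(merged.get("required")); // [order]
    }
}
```

After the merge, the child's formulas (such as "age < 18") can see the parent's fields as if they had been declared locally.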

Polymorphism

We cannot call our design truly object-oriented unless it provides a way to manage class abstractions. The following examples demonstrate how. Let’s define a schema that describes an abstract geometric figure and make it accessible on our network under the ID /geometry/shape:

{
  "type": "object",
  "properties": {
    "type": { "type": "string" }, 
    "size": { "type": "object" },
    "area": { "type": "number" }
  },
  "required": [ "type",  "size" ]
}

Listing 13: Geometric shape schema

Now, let’s define a schema that describes a specific geometric figure, a rectangle. We add specific requirements for the shape size and an expression that calculates the shape area automatically:

{
  "$extends": "https://example.com/schemas/geometry/shape",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["rectangle"] },
    "size": {
      "type": "object",
      "properties": {
        "A": { "type": "number" },
        "B": { "type": "number" }
      },
      "required": [ "A", "B" ]
    },
    "area": {
      "type": "number",
      "expression": "$size.A * $size.B"
    }
  },
  "required": [ "type",  "size" ]
}

Listing 14: Rectangle shape schema

Unlike in programming languages, the class of a data object is not stored in the data; it is determined by matching the data object against the schema. For example, the following object is identified as a rectangle because it matches the schema above. The JSON engine uses the expression from the rectangle schema to calculate the shape’s area. The final output will look like this:

{
  "type": "rectangle",
  "size": { "A": 6, "B": 4.5 },
  "area": 27.0
}

Listing 15: Rectangle area calculated

To calculate the area of other shapes, we define a separate schema for each shape. Each schema has its own way of representing the shape size and its own area formula. A triangle data object may look like this:

{
  "type": "triangle",
  "size": { "A": 8, "B": 5, "C": 9 },
  "area": 19.9
}

Listing 16: Triangle area calculated
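The triangle schema’s formula is not shown. Assuming it uses Heron’s formula for a triangle given by its three sides, the value in Listing 16 checks out:

```latex
s = \frac{A + B + C}{2} = \frac{8 + 5 + 9}{2} = 11, \qquad
\text{area} = \sqrt{s(s - A)(s - B)(s - C)} = \sqrt{11 \cdot 3 \cdot 6 \cdot 2} = \sqrt{396} \approx 19.9
```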

Following the same pattern, a circle object will look like this:

{
  "type": "circle",
  "size": { "radius": 8 },
  "area": 201.062
}

Listing 17: Circle area calculated
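Because the class is inferred by matching rather than stored, a polymorphic engine could try each concrete schema in turn and apply the area formula of the first one that matches. The sketch below assumes three concrete shape schemas, each reduced to a match rule (the `type` value plus the required `size` fields) and an area formula: A·B for the rectangle, Heron’s formula for the triangle, and πr² for the circle. The triangle and circle formulas are standard geometry, not taken from schemas in this article; `ShapeClassifier` and `ShapeClass` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch: the class is determined by schema matching, then its formula is applied.
final class ShapeClassifier {

    // A concrete "schema": the type tag, the size fields it requires, and its area formula.
    static final class ShapeClass {
        final String type;
        final List<String> sizeFields;
        final ToDoubleFunction<Map<String, Double>> area;

        ShapeClass(String type, List<String> sizeFields, ToDoubleFunction<Map<String, Double>> area) {
            this.type = type;
            this.sizeFields = sizeFields;
            this.area = area;
        }

        boolean matches(String shapeType, Map<String, Double> size) {
            return type.equals(shapeType) && size.keySet().containsAll(sizeFields);
        }
    }

    static final List<ShapeClass> CLASSES = List.of(
        new ShapeClass("rectangle", List.of("A", "B"), s -> s.get("A") * s.get("B")),
        new ShapeClass("triangle", List.of("A", "B", "C"), s -> {
            double a = s.get("A"), b = s.get("B"), c = s.get("C");
            double p = (a + b + c) / 2;                        // semi-perimeter
            return Math.sqrt(p * (p - a) * (p - b) * (p - c)); // Heron's formula
        }),
        new ShapeClass("circle", List.of("radius"),
            s -> Math.PI * s.get("radius") * s.get("radius")));

    // Returns the area computed by the first matching class; throws if none matches.
    static double area(String shapeType, Map<String, Double> size) {
        for (ShapeClass c : CLASSES) {
            if (c.matches(shapeType, size)) {
                return c.area.applyAsDouble(size);
            }
        }
        throw new IllegalArgumentException("No schema matches shape: " + shapeType);
    }

    public static void main(String[] args) {
        System.out.println(area("rectangle", Map.of("A", 6.0, "B", 4.5)));          // 27.0
        System.out.println(area("triangle", Map.of("A", 8.0, "B", 5.0, "C", 9.0))); // approximately 19.9
        System.out.println(area("circle", Map.of("radius", 8.0)));                  // approximately 201.062
    }
}
```

Adding a new shape means adding one entry to the list of classes; the calling code never needs to know which concrete class an object belongs to.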

Conclusion

The success of VisiCalc inspired a wave of spreadsheet applications, including Lotus 1-2-3 and Microsoft Excel. Today, these programs offer many more features than their original prototype. However, the core feature remains intact: the ability to keep and change data and data processing logic together in a single file. In some sense, the electronic table was (and still is) a pocket ERP.

Using a domain-specific language to define custom functions is not a new idea. All I did was combine some design patterns from spreadsheets and SQL stored procedures. What makes the difference is timing: the full potential of this approach can be realized only in a robust computing environment. VisiCalc was built for the processors and memory available back then. Today we have about 50,000 times more memory and processors that are roughly a million times faster:

Year   PC processor        MIPS      PC RAM
1979   Intel 8088          0.75      640 KB
2019   AMD Ryzen 9 3950X   749,070   32 GB

This power allows us to “reinvent to simplify.” Instead of using an old, slow, overweight word processor, I am writing this text in Markdown. The text is rendered automatically in real time with nice fonts, and the JSON code is formatted automatically with highlighted keywords. Thanks to a fast network connection, I can check spelling and grammar online. We have grown used to this magic: today, we walk around with pocket supercomputers that use wireless networks to make video calls and play movies.

All this power allows us to manage data more intelligently. We can develop data plugins that evaluate JSON expressions automatically every time you save a file (or even in real time, as you type). Imagine NoSQL databases such as MongoDB or Azure Cosmos DB supporting object-oriented data. Our laptops can easily handle files containing hundreds of megabytes of data. With object-oriented data, every file can be treated as a pocket database server. I say we can afford such an upgrade.

Copyright © Sergey Kucherov - All rights reserved