Data types, or simply types, are categories of data defined by constraints on the values their members may take; these constraints make the data easier to manipulate.

At the machine code level the boundary between data and instructions is less evident, so low-level languages such as Assembly had little need for data types. But even the simplest operations are arduous to implement in such languages, and with the rise of applications concerned with the manipulation of data, and of the higher-level programming languages used to write them, the concept of type appeared. Because the values of a type carry predefined limitations, imposed a priori by certain characteristics, it became easier to implement sets of instructions (operations and functions) that manipulate these groups of data without knowing the actual values at the time the functions are written.

One aspect of the type deserves particular attention. The type itself, the category to which data belongs, can be and often is information. It is probably the only kind of information that is available to the computer software and not to the human operator. Because the type often determines which operations can or must be performed on a piece of data, it can serve as an input to the decision-making process of the software. Even if that decision-making process is nothing but a pre-programmed set of instructions, it is nevertheless a kind of knowledge mechanism (see more at: The semantics in the computer code).
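How the type alone can steer which operation is performed may be illustrated with a minimal Java sketch. The class and method names (`TypeDrivenOperations`, `describe`) are hypothetical and exist only for this illustration: the same `+` symbol and the same method name lead to different operations, chosen purely from the declared types.

```java
class TypeDrivenOperations {
    // Two overloads with the same name: the compiler selects one
    // based solely on the declared type of the argument.
    static String describe(int value) {
        return "integer " + value;
    }

    static String describe(String value) {
        return "text " + value;
    }

    // '+' applied to two ints is arithmetic addition.
    static int addNumbers(int a, int b) {
        return a + b;
    }

    // '+' applied to two Strings is concatenation.
    static String addTexts(String a, String b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(addNumbers(2, 3));   // 5
        System.out.println(addTexts("2", "3")); // 23
        System.out.println(describe(5));        // integer 5
        System.out.println(describe("5"));      // text 5
    }
}
```

In neither case does the program inspect the values to decide what to do; the decision follows from the type.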

◊ Primitives

Primitives are the most basic data types and the most widely accepted of all types. Almost every programming language makes use of them, and they are usually part of the core of the language.

Although they may go by different names (“int”, “integer”) in different programming languages, they usually refer to the same thing, and they exist because they reflect a necessity of computing reality. Contrary to what the name suggests, an “Integer” is not any mathematical integer but only one of the integers in the range [-2^31, +2^31-1], those representable in 32 bits. As such, the Integer type is a partial wrapper of the mathematical integers combined with a limitation of computing reality: information is ultimately encoded in bits (ones and zeros), and only a finite number of bits fit onto a physical device such as a block of memory or the stack. If a wider range is needed, a type called “long integer” can be used, which takes values in the range [-2^63, +2^63-1], a 64-bit representation. Nevertheless, it still represents the same combined concept as the simple int.

  • Example 5, Some common primitives
  • boolean: represents a logical value (true or false, one or zero), conceptually a single bit.
  • integers: represent an integer value in the range [-2^31, +2^31-1] or [-2^63, +2^63-1].
  • floating points: represent a real number in floating point representation. The range is much larger than that of integers occupying the same space. The precision varies with the magnitude of the number: the larger the number, the bigger the gap between two consecutive representable numbers (real numbers have infinite precision and are therefore impossible to represent exactly with finite resources).
  • character: represents an alphanumeric character, usually stored in one byte
  • string: represents a sequence of characters, arbitrary texts (usually human readable)
  • byte array: represents a raw sequence of data of any kind (usually not human readable)
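The limits listed above can be observed directly. The following is a small Java sketch (the class name `PrimitiveLimits` and its helper methods are hypothetical): it prints the 32-bit and 64-bit ranges, shows that going past the largest int silently wraps around, and uses the standard `Math.ulp` method to compare the gap between neighbouring float values at two magnitudes.

```java
class PrimitiveLimits {
    // Adding 1 to the largest int does not produce a bigger number:
    // the result wraps around to the smallest int.
    static int onePastMax() {
        int max = Integer.MAX_VALUE;
        return max + 1;
    }

    // Distance from the given float to the next larger representable float.
    static float gapAt(float value) {
        return Math.ulp(value);
    }

    public static void main(String[] args) {
        System.out.println(Integer.MIN_VALUE + " .. " + Integer.MAX_VALUE); // 32-bit range
        System.out.println(Long.MIN_VALUE + " .. " + Long.MAX_VALUE);       // 64-bit range
        System.out.println(onePastMax() == Integer.MIN_VALUE);              // true
        System.out.println(gapAt(1.0f) < gapAt(1_000_000.0f));              // true
    }
}
```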

These primitives, at their most basic level, are far more about representation than about the actual meaning of what they encode: concepts like integer or real are only loosely encoded into these types. In fact, if arbitrary precision is needed for special purposes such as scientific data or accounting, custom representations have to be created, like BigInteger or BigDecimal in the Java programming language, because the regular primitives are not suitable.
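A short Java sketch of the difference, using the BigInteger and BigDecimal classes mentioned above (the wrapper class `ArbitraryPrecision` and its method names are hypothetical):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class ArbitraryPrecision {
    // 2^64 does not fit in a 64-bit long, but a BigInteger grows as needed.
    static BigInteger twoToThe64() {
        return BigInteger.valueOf(2).pow(64);
    }

    // Decimal arithmetic on exact decimal strings avoids binary rounding.
    static BigDecimal exactSum() {
        return new BigDecimal("0.1").add(new BigDecimal("0.2"));
    }

    public static void main(String[] args) {
        System.out.println(twoToThe64());     // 18446744073709551616
        System.out.println(0.1 + 0.2 == 0.3); // false: doubles only approximate these values
        System.out.println(exactSum());       // 0.3
    }
}
```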

None of this may be new to the intended audience, but it is important to spell it out, because there is a lot of confusion between a concept and the representation of that concept. The confusion stems from the fact that the representations are named suggestively, to match the concepts they loosely encode and to make them easier for the developer to work with.

In fact, they could just as well be named “regetni”[1] or “elbuod” or “naeloob”, because it would make no difference from the perspective of the compiler, the interpreter, or the machine code. They would only be a lot harder to work with. This is why virtually all programming languages that support mathematical operations name the integer type suggestively as “int” or “integer”, and as such the terminology became universal.

Rarely does a programming language extend its set of primitive types beyond these representational primitives. It is the responsibility of the developer to correctly encode any custom concept onto these universal structures, so that the encoding captures all the subjective information regarding that concept within the context of the client and the provided specification.

◊ Variables

Before we get to the structured data types, it is important to mention the concept of the variable, another extensively used artifact in computer science. In programming (at least in the aspect discussed in this paper), variables are containers that can be used to store values and operate on them.

  • Example 6, General format of a variable.
  • Type name = initial value;

A variable is usually defined by an identifier (a name), which is used as a locator for the stored value (on the stack or in memory), and a type, which serves as a constraint on the values that can be assigned to the variable; it may also have an initial value. Identifiers are present in virtually all languages except very low-level ones like Assembly[2], but the other elements vary from language to language.
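In Java, for instance, the three elements look as follows. This is only a sketch; the class `VariableDemo` and the variable names are made up for the illustration.

```java
class VariableDemo {
    static int total() {
        int count = 3; // type: int, identifier: count, initial value: 3
        int price;     // the initial value is optional at declaration time...
        price = 4;     // ...but a value must be assigned before the first use
        // count = "three"; // would not compile: the type constrains the values
        return count * price;
    }

    public static void main(String[] args) {
        System.out.println(total()); // 12
    }
}
```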

This seemingly simple construct is so powerful that it is used everywhere in computer programming. Everything from a memory zone allocated for temporary storage to a cell in a spreadsheet, a database, or a dot on the screen can be thought of as a variable: a container in which a value can be stored. This concept of placing a value in a container is essential to the computing process; it is the only way operations can be carried out in a serial system. Beyond that, things like type and name become irrelevant once the software is turned into machine code. Nevertheless they are extensively used and very popular, because both type and name have the power to carry semantics into the process of creating the computer program. We do not give it much thought, but the fact that we can assign meaningful names to variables revolutionized computer programming. We’ll discuss this subject further later.

◊ Arrays & Matrices, Graphs, Trees & Maps

It is sometimes useful to work with collections of data that can be handled in bulk according to some shared characteristics. Arrays, matrices (multidimensional arrays), lists, sets, trees, and maps are all such collections.

It is important to note that although programming languages treat them under the same umbrella, these collections are not in fact data types but rather compound variables: variables that have multiple slots where data can be placed according to specific rules, some fixed and some dynamic in nature. In the case of arrays and matrices, which are similar to those in mathematics, the slots are accessible by their position, and the structure is one-dimensional or two-dimensional respectively, occasionally with even more dimensions. In the case of linked structures like linked lists, trees, or graphs, access is done according to the relations between elements. People familiar with older programming languages like Standard C or Pascal may recall that these linked data structures were not part of the standard library back then. They had to be defined as collections of dynamically allocated memory zones that were then linked with one another.
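The two kinds of access can be contrasted in a small Java sketch. The names `PositionVersusLink`, `Node`, and `chain` are hypothetical, and the hand-built node is only similar in spirit to what had to be written by hand in C or Pascal.

```java
class PositionVersusLink {
    // A hand-built singly linked node: a value plus a link to the next node.
    static final class Node {
        final int value;
        Node next;

        Node(int value) {
            this.value = value;
        }
    }

    // Builds a linked chain from the given values, back to front.
    static Node chain(int... values) {
        Node head = null;
        for (int i = values.length - 1; i >= 0; i--) {
            Node node = new Node(values[i]);
            node.next = head;
            head = node;
        }
        return head;
    }

    // Array: the third slot is reached directly by its position.
    static int thirdByPosition(int[] slots) {
        return slots[2];
    }

    // Linked list: the third element is reached by following relations.
    static int thirdByLinks(Node head) {
        return head.next.next.value;
    }

    public static void main(String[] args) {
        int[] slots = {10, 20, 30};
        int[][] matrix = {{1, 2}, {3, 4}};                     // two positions per slot
        System.out.println(thirdByPosition(slots));            // 30
        System.out.println(matrix[1][0]);                      // 3
        System.out.println(thirdByLinks(chain(10, 20, 30)));   // 30
    }
}
```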

Maps are interesting because they are very similar to the concept of a structure: the elements of a map are accessible by their names, so to speak, and as such additional information exists besides the value itself, in the form of a key or a name. Unlike in a structure, this information is carried into the running application and can take part in the program’s execution.
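A minimal Java sketch of this point follows. The class `MapKeysAtRuntime`, its methods, and the sample values are invented for the illustration. The keys of the map are ordinary data that the running program can enumerate and act upon.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MapKeysAtRuntime {
    // A book described as a map: each value is stored under a name (the key).
    static Map<String, String> bookAsMap() {
        Map<String, String> book = new LinkedHashMap<>(); // keeps insertion order
        book.put("title", "Placeholder Title");
        book.put("author", "Placeholder Author");
        return book;
    }

    // The names themselves are available while the program runs.
    static String keysOf(Map<String, String> map) {
        return String.join(",", map.keySet());
    }

    public static void main(String[] args) {
        Map<String, String> book = bookAsMap();
        System.out.println(book.get("author")); // Placeholder Author
        System.out.println(keysOf(book));       // title,author
    }
}
```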

◊ Structures

The concepts and values handled by modern information-manipulating applications go well beyond simple integers, reals, or truth values. To make complex data easy to manipulate, structured types were created, out of which developers can build types that represent complex concepts from reality.

      • Example 7, Structure
      • Book{
        	String title;
        	String author;
        	String publisher;
        	Date datePublished;
        	String ISBN;
        	etc …
        }
        

Suppose the concept of Book has to be captured in a computer program, and that in the eyes of the client a book is described by a series of properties such as title, author, etc … (see Example 7). These properties have to be kept together for each individual book; otherwise it would be really difficult to track them.

Structures contain ordered groups of data items. Unlike the elements of an array, the items within a structure can have different data types, and they are accessed with a syntax similar to that of variables. Classic programming languages treat these properties as the definition of the type, in the current case a Book, and model them conceptually together using various paradigms.

The Relational Model considers the type Book as a relation between the typed items that make up a book and packs them together in relations (better known as tables), where each row is a group of values that together represent one individual entry of the relation, a book in our case. The final structure is in fact a matrix of values, where the rows represent individual book entries and each column represents one particular aspect of all known (stored) entries. A language was built that handles manipulations of data stored in this form (storing, retrieving, filtering, etc …) very efficiently, thanks to the reduced complexity of the architecture (both in structure and in operations). To make a blunt analogy, databases are memory zones, with each table being a matrix like the one mentioned above, accessible via a variable, which is the table name.
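The blunt analogy can be taken literally in a Java sketch: a table held as a matrix of values in a variable, with a filter over its rows that does roughly what a query language would do. Everything here is hypothetical (the class `BookTable`, the column constants, and the sample rows), and a real database naturally involves far more than this.

```java
import java.util.ArrayList;
import java.util.List;

class BookTable {
    // Column positions: each column is one aspect of every stored book.
    static final int TITLE = 0;
    static final int AUTHOR = 1;
    static final int PUBLISHER = 2;

    // The "table": each row is one book entry (made-up sample data).
    static final String[][] BOOKS = {
        {"Title A", "Author X", "Publisher P"},
        {"Title B", "Author Y", "Publisher P"},
        {"Title C", "Author X", "Publisher Q"}
    };

    // Filtering: keep the titles of the rows whose author column matches.
    static List<String> titlesBy(String author) {
        List<String> titles = new ArrayList<>();
        for (String[] row : BOOKS) {
            if (row[AUTHOR].equals(author)) {
                titles.add(row[TITLE]);
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(titlesBy("Author X")); // [Title A, Title C]
    }
}
```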

To handle data like Book efficiently, programming languages do not treat them as arrays of coupled values but rather as structures: variables that have variables inside. Instead of book = {book[0], book[1], book[2], … }, it becomes book = {book.title, book.author, book.publisher, …}, which is a lot easier to work with. This approach allows developers to embed semantic elements into the construction of the program, the source code, an aid that makes the program many times easier to develop, maintain, upgrade, hand over to other developers, test, etc …
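The contrast between the two notations can be sketched in Java. The class names and all sample values below are placeholders invented for the illustration, and `LocalDate` is merely this sketch's choice for the publication date.

```java
import java.time.LocalDate;

class BookDemo {
    // The structure: a variable that has named, typed variables inside.
    static final class Book {
        String title;
        String author;
        String publisher;
        LocalDate datePublished; // assumption: LocalDate stands in for a date type
        String isbn;
    }

    // A book as a structure (placeholder values).
    static Book sample() {
        Book book = new Book();
        book.title = "Placeholder Title";
        book.author = "Placeholder Author";
        book.publisher = "Placeholder Publisher";
        book.datePublished = LocalDate.of(2000, 1, 1);
        book.isbn = "000-0-00-000000-0"; // made-up value, not a real ISBN
        return book;
    }

    // The same book as an array of coupled values.
    static String[] sampleAsArray() {
        return new String[] {"Placeholder Title", "Placeholder Author", "Placeholder Publisher"};
    }

    public static void main(String[] args) {
        String[] asArray = sampleAsArray();
        Book book = sample();
        System.out.println(asArray[1]);  // which property is index 1? The code does not say.
        System.out.println(book.author); // the name carries the meaning
    }
}
```

Both lines print the same value; only the second tells the reader of the source code what that value is.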

  • 1, Integer written backwards
  • 2, Assembly language is an instruction-oriented language that is very close to machine language. Variables are not used; operations are done by placing values directly into memory addresses or registers.