Tips about designing databases

When you design a database, you can make a series of errors or omissions that, later, can result on many headaches, or you can shelve the use of some useful resources simply because of ignorance of them. In this article I will give you a few tips that can help you to better design and management of your data and applications that manage them, especially if you are a beginner.

Use surrogate keys

All the tables in the database normally have a primary key to uniquely identify each record. Surrogate keys are a numerical, normally auto-increment type (MySQL, SQL Server) or generated by a sequence (Oracle), whose sole function is to serve as primary key of the table.

The advantage of using this type of key is that is not dependent on the restrictions and changes of the data model. A set of fields that initially must have unique values can become, due to some changes in specifications, in an invalid key for the table. In addition, indexes created on a numeric field are faster to consult and take up less space than those created on a set of fields, and the queries are easier to write.

Create a field with the record creation date

When a database process creates records in multiple tables, it is useful to have a field on them indicating the date and time of record creation. This can help us to detect bottlenecks when you have to optimize it.

It can also be useful to have a field with the date of last modification, for example, to automate the updating of data between development and production servers.

Mark the current record in a table with historical data

Sometimes a table keeps records on the status of an entity that may interest us keep to analyze their evolution over time, but only one of them represents the current state. In this type of tables you can use a field to mark this current record, so that you can avoid subqueries of the type:

SELECT * FROM TABLE JOIN (SELECT MAX(KEY), FK, DATA FROM STATUS GROUP BY FK, DATA) SQ ON TABLE.FK = SQ.FK

And substitute it with something like this:

SELECT * FROM TABLE JOIN STATUS ON TABLE.FK = STATUS.FK AND STATUS.CURRENT = 1

Do not duplicate information in multiple tables

Although sometimes it can be tempting to put information in a table that is already contained in another one to simplify some queries, this should never be done because it may result in data inconsistencies. It is preferable to create an index on that field or fields if they are being used as a filter in queries or create a view that present the complete data set in a unified way.

Isolate the data access from the rest of the application

When developing an application that must access to a database, it is convenient to separate the part that directly accesses the database of the rest of the application, using a class library or a specialized framework as EntityFramework. This made easier the task of adapting the application in case that we decide to change the database server or if you want to make an application that is independent of the server used, simply changing the class library.

Always define the foreign key constraints

Although it may be tedious to define a foreign key constraint whenever a table is related with others, they should be always defined, as they guarantee the data integrity. In addition, this facilitates the work with applications for management and development of the database, automating some tasks such as creating diagrams and creating relationships between objects that map the tables.

Always use parameterized queries

When you write a program that use a database, sometimes you can make the mistake of building a query by concatenating the values of the fields you want to use as a filter, for example with the data entered by the user in a TextBox control. This should never be done for two main reasons: firstly, if you launch a query constructed in this way, each time it will look different for the database engine, which must analyze it each time and cannot store it in the cache, so that the application performance will be affected.

The other reason is related with security. There is a type of attack, known as SQL injection, in which a malicious user can enter as a value something like this:

(DELETE FROM TABLE)

If you concatenate this into a SQL statement, it will become:

SELECT * FROM TABLE WHERE FIELD = (DELETE FROM TABLE)

With foreseeable consequences. To avoid this, you should use always parameterized queries, and pass the values entered by the user in these parameters. It is more efficient and safer.

Know and use the server catalog

The database servers have some views, tables and stored procedures that constitute the catalog, where all information about objects from different databases, names and fields of tables and views are stored, their data types, etc. This information is very useful when you develop maintenance tools or scripts. SQL Server and MySQL have a very similar catalog, while Oracle provides views totally different.

Use views or stored procedures for complex queries

Complex queries that run often are best implemented through a view or stored procedure. This will prevent to have to adjust your data access application whenever you have to modify the query. Think that the more complex is a query, the more likely it is that you have to change something at some point.

When you can, always implement asynchronous data access

There is nothing more frustrating to a user of a program that to have to wait for the results of a long query that has launched to continue working. When queries can be costly in time, it is always better to perform the query in the background, so that the user can continue with other tasks until the results are returned.

I hope that these tips can help you to develop better data access tools, which is essential in order to be considered a good programmer.