Bitcoin Forum
June 20, 2024, 04:12:23 PM *
News: Voting for pizza day contest
 
   Home   Help Search Login Register More  
Pages: [1]
  Print  
Author Topic: Random thought: Logging as a key design principle in relational databases  (Read 158 times)
theymos (OP)
Administrator
Legendary
*
Offline Offline

Activity: 5236
Merit: 13089


View Profile
February 05, 2019, 05:09:35 PM
Merited by Foxpup (8), dbshck (5), qwk (1)
 #1

I've dealt with relational databases for well over 10 years. One thing which I've realized in the past few years but which I've never really seen mentioned in any of the books or guides about this (though maybe I've just missed it) is the following principle of effective relational database design: You should always start designing a relational database schema from the perspective of logs. For example, instead of having an "email_address" column in a users table which gets updated from time to time, you should have a table like email_log (user_id, time, email_address), and the user's current email address will be their newest entry in that log.

On countless occasions I or someone else had not done this, and I'd regretted it. Structuring things in this log-based way:
 - Promotes application-level database consistency, since you can't as easily update data outside of the intended contract.
 - Gives you access to a more complete picture of each piece of data, which you often want later.
 - Makes normalization the default.
 - Starts you off on a good footing performance-wise, since insertions into a log table are usually near-free and non-locking. Because two processes never need to write the same data at the same time, contention is kept to a minimum. Locking/contention is in practice the biggest performance issue for many applications, such as websites.

Now, sometimes it's too annoying to always do this. If your idea of a "user" is the end result of a dozen or more logs, then it may be difficult or performance-poor to perform more complex queries on the data. In this case, you can create more traditional "caching" tables using triggers on the log tables (or at the application level, or by using DBMS support for materialized views). Since these tables are only derived from the real data, in some cases it may be acceptable to do this on a "best-effort" basis, for example by only updating the caching tables occasionally or by using low levels of transaction isolation when dealing with the caching tables.

It's fairly common I think to do the reverse of the above paragraph, where you have triggers on the main tables which fill up log tables, but I don't regard this as ideal because it removes most of the advantages I listed earlier.

If logs become too large to be performant or convenient, it often requires no changes to the application to just delete all non-latest log entries older than <some time>.

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
qwk
Donator
Legendary
*
Offline Offline

Activity: 3542
Merit: 3413


Shitcoin Minimalist


View Profile
February 06, 2019, 12:07:09 AM
 #2

A very old idea of mine that I've never come to implement has been for an online forum to keep a record of all prior versions of an edited post.
I.e. you'd have a table with post_ids and another table where you have one entry for each version of the post.
In a view, you'd simply always display that as one table, of course (which would possibly make it backwards compatible with existing forum software).
To save some space, old versions of posts could simply be diffs from the latest version (or the first version, whichever you prefer).
No more deleting posts, only superseding them with new versions, admins & maybe privileged users like moderators always being able to see what you edited in or out.

Of course, that's totally against GDPR laws' "right to be forgotten" Wink

Yeah, well, I'm gonna go build my own blockchain. With blackjack and hookers! In fact forget the blockchain.
theymos (OP)
Administrator
Legendary
*
Offline Offline

Activity: 5236
Merit: 13089


View Profile
February 06, 2019, 04:02:30 AM
 #3

A very old idea of mine that I've never come to implement has been for an online forum to keep a record of all prior versions of an edited post.

Epochtalk works like that. I think that currently the edit logs are all public, though users may end up being too uncomfortable with that.

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
Pages: [1]
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!