The following article was originaly posted on Vertabelo blog.
Security Approaches in Data Modeling
Applying Simple Access Control to a Small Data Model
“Information is the lifeblood of any organization…” We hear a lot of statements like this, or about an “information age,” or an “information economy.” When we agree with belief that amplifies the importance of information in the world today, we have to consider how to make that all-important information secure. Who can see my bank account? Was the facilities maintenance contract lost? Why can’t I get the latest lab report? The database professional has such concerns magnified by the thousands or millions, and so modeling for data security is critical. Here we look at some issues in securing and controlling access to information stored in a database.
Information Security: Start with Terms
As a topic, information security doesn’t lend itself to brief blog entries, so to help database novices get up to speed before we cover substantial issues regarding security and data modeling, let’s review some important terminology.
CIA – an Apt Acronym
The key security qualities for an information system are C-I-A: Confidentiality, Integrity, and Availability. The fact that this acronym is shared with America’s Central Intelligence Agency is felicitous.
- Confidentiality means that data can only be used by authorized people in authorized functions.
- Integrity means the data is correct, and that it is consistent with all the other data.
- Availability means that every authorized function works correctly for every authorized user within the required response time.
Important note: these three qualities conflict with each other. High confidentiality often causes low availability, for example. These qualities must be balanced.
Disclosure
Ordinarily, this means sharing a piece of information with someone who wouldn’t otherwise know it. In the context of information security, Disclosure indicates a security incident with a loss of Confidentiality. This is the first important information to look for in any news story on a data breach. An example is an article in the Wall Street Journal for October 9. Here’s where you look:
Figure 1: Data breach news in the Wall Street Journal
Here’s the disclosure – 3500 individuals’ payment card info.
Sensitivity
Sensitivity of some information is a way of describing how important it is to keep it secure. This may refer to some intrinsic value of the data, or the likely cost if breached. In terms of ‘cost if breached’ – for individuals, government-issued ID numbers are often sensitive – the U.S. Social Security Number is very sensitive! For businesses, investments in technology, partnerships, and details of mergers are very sensitive. ‘Value of data’ sensitivity could refer to intellectual property such as proprietary formulas for plastics or construction plans.
Authentication
Authentication is how you log in to a system. It’s the mechanism by which the computer system identifies some ‘actor’ in order to determine what information to share or what not to share. Note that an actor may be a (human…?) person, a group, a proxy, or a program.
Figure 2: Authentication in the Google login
You may not actually know all the times you’re authenticating. In Figure 2 above, Google acts like it doesn’t know anything about you, but you’ve already been identified by your Internet address, browser, and cookies in your browser. This is authentication as well, but we can call it ‘low grade.’
Authorization
Authorization is a rule by which a user is granted or denied the use of a specified function on a specified data object. This may also be called access control, privileges, permissions, grants, or rights. For the computer system, it is some way to get to a “yes” or a “no” on an access request. There are different ways to perform authorization based on how the actor or requestor is handled, and how the data object is handled.
Discretionary Access Controls
One main form of authorization permits specific data for specific users or groups. This relatively intuitive, “let Jemimah see my invoice records” authorization is sometimes called Discretionary Access Control. For this, consider another non-business example – Facebook:
Figure 3: Authorization in the Facebook ‘Privacy’ Settings
That privacy drop-down gives me the ability to authorize some group or another to see my tropical photos from February 2015.
Access Control List
An Access Control List is just a list of permissions like this. It enumerates authorization rules on a single piece of information. While you may say “let this group see the album,” you may want to say “people in this group can re-share the album” and “this person can add to the album” as well. You want each of those distinct permissions to apply, making it not a single authorization but an access control list.
Mandatory Access Controls: Classification, Clearance
Under this approach, people (actors) and data are only checked for sensitivity. “Mandatory” access controls set up sensitivity and privilege in a parallel hierarchy, with the same labels for both. In such systems, information is not authorized to individuals or groups, nor individuals to information, but both are identified by sensitivity level. Clearance identifies the level of sensitivity an actor can access; Classification assigns a sensitivity level to a piece of information. These distinctions are sometimes used in commercial systems but are generally a military approach. For the US military, the sensitivity levels in increasing order are “unclassified,” “confidential,” “secret,” and “top secret.” A spy movie may have some grave operative ominously voice the term “Top Secret,” provoking the young soldier’s gasp “I only have Secret clearance!” Now you know why the soldier is upset — just by looking at a piece of paper, he’s violated security policy!
How We Approach This Exercise
We can’t use a typical operational database for this security implementation exercise. I can’t remember the last time I took a data model into production with fewer than fifty tables and such complexity will obscure the key points I’m trying to make. Similarly, the number of security concerns and issues and practices on even the smallest business data model will take a lot of space which perhaps few readers will want to tackle in a single article.
So I’m going to take a purposely simple hypothetical database model, screen out much of it, and add a single table with sensitive data. I will then iterate the model to implement one security mechanism – authorization – with one form of the various authorization controls just described. Along the way I will make some simple modeling errors to intentionally illustrate some of the security problems encountered in data modeling. Hopefully there aren’t unintentional ones!
Database for a Club Website
Let’s say we already have a simple website for social or special-interest clubs. Naturally it has message boards where members can post news and who knows what, but for now we only care about clubs and members. Here is the basic structure of membership:
Figure 4: Basic Structure of Club Membership
The main entities are Person and Club, each carrying an id surrogate column as primary key. Member
records who belongs to which club, implementing a many-to-many relationship that allows a Club multiple Persons, and a Person multiple Clubs. Note the PK membership_seq
column gives this table an aspect of history where a person may join and leave the same club multiple times. Each Club has a variety of roles described in Office. The actual officers are listed in the Officer
table joining Member
(Club
, Person
) to Office
. As with Member
, Office
has a sequence field officer_seq
to handle the very frequent case of the same Member holding the same Office repeatedly.
Adding Photos to the Club Database
Let’s add a table for photos and a reference table for image formats:
Figure 5: Simple Addition of Photos for Club
We’ll store each photo in one row using a binary large object (“blob”) column, add some housekeeping, and whew, that was easy! Except wait – who is allowed to do what with this photo? Based on this data model, the only options for who can access this photo:
- the member who uploaded the photo
- members of all clubs to which the uploading member belongs
- all persons
Wait — no way. This can’t work. By attaching Photo
to Person
(via uploaded_by_member_id
), we don’t limit a photo to a specific club. This is not the way to find out your Vegetarian Club President takes pictures for his buddies in the Northeast Butchers’ Association. Well, that is easily fixed:
Figure 6: Photos are now attached to Clubs as well as Persons
Key Learning No. 1:
Give a table with sensitive data the appropriate relationships to other tables to properly control users’ access.
OK, well now we have forestalled some embarrassing possibilities! But we don’t have enough specificity to support explicit authorization of photo actions to selected members without authorizing them for all members. We also don’t have permissions for dealing with that photo one club member posted — he thought it hilarious — that offended every last other member. We need some additional control here!
Define Actions on Sensitive Data
With the existing data model, we can only discern one action on a photo: upload. We can infer that once a member uploads a photo to a club, the whole club can see it. We might also infer that any club member could delete it. This is simply not enough. Let’s discuss a few possible actions:
Action | Description | Security Concerns |
---|---|---|
View | View some photo on the club site | |
Upload | Upload a photo to the club site | Flooding site with pictures could obstruct availability, both in terms of exhausting data storage and in overloading the post/publish activity |
Post | Make photo visible to whole club | Inappropriate photos could harm the club in terms of membership or participation |
Un-Post | Make photo invisible | Dropping a posted photo could also harm the club |
Edit | Modify the photo in some way | Edits could substantially alter the way people would view a photo – for better or worse |
Delete | Delete the photo permanently | Frees up resources. May alienate someone who posted it. |
Publish | Share the photo on club’s public web pages | Publishing an inappropriate photo could damage club reputation or tax standing. |
Un-Publish | Remove a photo from public web pages | Removing a photo could protect the club or harm its outreach. |
Key Learning No. 2:
For each type of sensitive data, define the actions that can be performed in light of those actions’ effects on the organization.
Augment Roles to Handle Defined Actions
So now we need to decide how to limit access to these photos based on the actions available. I’ve purposely listed the actions in order of their potential security impact. Because of this, we can implement our simplest form of authorization – clearance.
Note also that the design of clubs allocates each role, or “office,” specifically to the club. Each club defines its own roles. This also means that attaching photo-action clearances to Office for one club will not compromise the flexibility of another club. The net effect is that we can add “Post” or “Publish” authorizations to regular club roles like ‘Treasurer’ or ‘Secretary’ or ‘Chieftain,’ and a club can add specialized roles like “Photo Editor” or “Public Outreach” with photo permissions.
Figure 7: Adding Photo Rights to Each Club Office (Role)
To keep this post brief, this includes simplifications I probably wouldn’t ship. Feel free to discuss in the comments section below along with any other thoughts.
Key Learning No. 3:
Enumerate the actions for your sensitive data in a table and refer to that table in your authorization data model.
The photo_action
table data:
Handle | Sensitivity Rank | Description |
---|---|---|
View | 1 | View some photo on the club site |
Upload | 2 | Upload a photo to the club site |
Post/Un-Post | 3 | Make photo visible to whole club, or remove it from visibility |
Edit | 4 | Modify the photo in some way |
Delete | 5 | Delete the photo permanently |
Publish /Un-Publish | 6 | Share or remove the photo on club’s public web pages |
Making the Authorization Decision
This is the crux of our discussion: Member X wants to perform action Y – do we allow it?
We start with an authenticated person P, and a club C. Can this person upload a photo?
First, let’s see what the sensitivity of ‘Upload’ is:
select sensitivity_rank from photo_action where handle = 'Upload'
This gives us a sensitivity we’ll call S.
select member_seq from member where person_id = P and club_id = C and sysdate between joined and nvl (exited, sysdate + 1)
This allows us to ensure we have a current member of the club (valid member_seq
M). Now we check if this member is an officer:
select office_seq from officer where member_id = P and membership_seq = M and club_id = C and sysdate between officer_from and nvl (officer_until, sysdate + 1)
Now we have an office_seq
value Q. Let’s check the authorization of that office:
select photo_action_clearance from club_office where club_id = C and office_seq = Q
Check the photo_action_clearance
and if its at least rank S, then it’s authorized.
Key Learning No. 4:
You must verify each step in the modeling process by using step-by-step sample queries to ensure you can store the right data to get the right answer.
We can check any clearance we want, in a single query, so long as the query does a join correctly on all the relevant tables. This is checking six members at once for clubs 1002 and 1003:
with query_parameters as ( select 9004 as mbr_id, 1003 as club_id, 'Upload' as req_action from dual union all select 2012 as mbr_id, 1002 as club_id, 'Upload' as req_action from dual union all select 2012 as mbr_id, 1002 as club_id, 'publish-unpublish' as req_action from dual ) select case when cofa.sensitivity_rank >= (select pac.sensitivity_rank from photo_action pac where pac.handle = upper (prm.req_action)) then 'Authorized' else 'PROHIBITED' end as AUTHORIZATION_DECISION , prm.mbr_id as member_id_requested, prm.club_id as club_id_requested , (select given_name from person where id = prm.mbr_id) as given_name , (select family_name from person where id = prm.mbr_id) as family_name , (select name from club where id = prm.club_id) as club_name , (select pac.handle from photo_action pac where pac.handle = upper (prm.req_action)) as action_requested, cof.office_label, cof.photo_action_clearance, cofa.sensitivity_rank as officer_clearance_value , (select pac.sensitivity_rank from photo_action pac where pac.handle = upper (prm.req_action)) as action_sensitivity from query_parameters prm left outer join member mbr on ( prm.mbr_id = mbr.person_id and mbr.club_id = prm.club_id ) left outer join officer ofr on ( mbr.person_id = ofr.member_id and mbr.membership_seq = ofr.membership_seq and mbr.club_id = ofr.club_id ) left outer join club_office cof on ( ofr.club_id = cof.club_id and ofr.office_seq = cof.office_seq ) left outer JOIN PHOTO_ACTION COFA ON ( cof.photo_action_clearance = cofa.handle ) where 1 = 1 and (mbr.person_id is null or sysdate between mbr.joined and nvl (mbr.exited, (sysdate + 1))) and (ofr.club_id is null or (ofr.officer_from <= sysdate and (ofr.officer_until is null or ofr.officer_until >= sysdate)))
A few style points on this query: I write queries in an incremental, debugging fashion. I have included most of the additional debug information in this query. I like putting query parameters in a WITH factoring at the top of my SQL, and use left outer joins to try to get some row – good or bad! – for each input row in the WITH “prm” set. I calculate the decision value in an output column to check my logic, along with otherwise-needless columns from the constituent tables. However, to put this into production, we only need the AUTHORIZATION_DECISION
column to be selected. And all of those outer joins can be straight joins.
Key Learning No. 5:
Always test your model all the way through. Generate the tables in a working database, create sample data with a variety of values. The more data with the more randomization, the better. Then test with valid and invalid data – users, members, clubs, actions. This simple example had several quirks needing adjustment to give me assurance (and you, dear reader, a valid example).
Simple Authorization for Club Members/Officers to Upload
We now have a basic authorization scheme by which to allow or prohibit any action a user might request. We’re not dealing with the actual mechanics of uploading or photo display, but this is a simple framework for answering a simple security problem.
Also, if you look closely you’ll see we are not allowing ‘regular’ members to view, upload or do anything with photos. I’d like to invite readers to give suggestions on how we would do this without seriously distorting the data model by short-circuiting ‘View’ or ‘Upload’ for all members of all clubs.
Conclusion
Don’t skimp on security as you work your data model. Use these first steps as a guide. What is your sensitive data? What are the operations to perform on it? Who can do which operations? Some aspect of security is in every database system, whether you recognize it or not. And if you are building a business database, you want it to grow! What will happen when fourteen users becomes 140 or 14,000? Keep these things in mind, and we will explore them further in later installments of this series.