Kevin Lyon

Getting Your Head in the Cloud: 5 Tips on Distributed Data and Discipline

Our institution recently (generously/mercifully) gave the entire university access to Box.com, an unlimited online cloud storage solution. While many in our office were already pro subscribers to Dropbox or Office 365/OneDrive, having an officially available solution for all faculty, staff, and students opened up many, many opportunities, but it brought a few challenges as well.

Some of the users we’ve worked with since the release were already familiar with cloud storage solutions, so they were well aware of the “data discipline” required when you have near-limitless (or in our case, actually limitless) storage that can span physical hardware setups and locations. However, those who were new to this, or who (let’s just say) “have trouble cleaning up” their files, needed a bit of a primer on data discipline and how to avoid creating a digital dumping ground.

This topic has also come into focus for us as we begin exploring a Learning Object Repository (LOR) that could one day be shared with various departments and faculty across the university.

Data Discipline here refers to keeping the files stored in these locations in a logical order, maintaining that order, and actually sticking to that disciplined approach. For both a cloud storage site and an LOR, finding what you need and making use of it is the paramount objective, and that becomes nearly impossible when the root folder (the “home” or “first” location you see upon opening the site) holds every individual file, or worse, multiple copies of those files.

The points below are a good place to start (though by no means an exhaustive list) when setting up your own cloud storage site or any shared repository.

Folders are your friends

Chalk this up to my firm belief in “a place for everything, and everything in its place,” but folders can rarely be overused.

While some might argue that any shared repository should be set up to require as few clicks as possible, that isn’t always practical; the right amount of nesting really depends on the kinds of information being shared within the repository.

I am not suggesting that each file should have its own folder, but each kind of file or project should be kept separate from unrelated items. Pictures go in the pictures folder, sorted by event or date; documents go in the documents folder, sorted by time, date, or another identifier; music goes in by artist and album; and so on.
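To make the “place for everything” rule concrete, here is a minimal Python sketch of a filing scheme like the one above: given what kind of file something is, it returns the folder the file belongs in. The `destination` function and the folder names are hypothetical illustrations, not a feature of Box or any LOR; in practice the same rule usually lives in a shared convention that everyone follows when saving.

```python
from pathlib import PurePosixPath


def destination(kind: str, **details: str) -> PurePosixPath:
    """Return the folder a file belongs in under a hypothetical filing scheme.

    Nothing is filed at the root: every kind of file gets its own top-level
    folder, subdivided by the identifier that matters for that kind.
    """
    if kind == "picture":
        # Pictures are grouped by event (or date).
        return PurePosixPath("Pictures") / details["event"]
    if kind == "document":
        # Documents are grouped by date or another identifier, such as a project name.
        return PurePosixPath("Documents") / details["identifier"]
    if kind == "music":
        # Music is grouped by artist, then album.
        return PurePosixPath("Music") / details["artist"] / details["album"]
    raise ValueError(f"no folder defined for files of kind {kind!r}")


print(destination("picture", event="2024 Faculty Retreat"))  # Pictures/2024 Faculty Retreat
print(destination("music", artist="Some Artist", album="First Album"))  # Music/Some Artist/First Album
```

Raising an error for an unknown kind, rather than defaulting to the root folder, is the point of the design: a file with no obvious home is a prompt to decide where it should go, not a reason to drop it at the top level.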

Coupled with a version control system, this helps users find the project or document they need without having to dig through multiple copies, older versions, or unrelated files. Nobody likes having to determine whether fileFINAL.doc is more correct than fileFINALFINAL.doc or fileFINALFINALFINAL.doc. Version control or an archive folder can remedy this.

Version control with an iron fist

I already touched on version control in the last point, but it is worth stressing on its own.

Version control means that as you update a file or set of files, those changes supersede older versions of the document, though the system often retains access to the earlier versions in case you need to revert. Automatic systems do this by assigning a new version number each time a file is edited, which means you can go back to any earlier version from before those edits.

Controlling versions manually might mean appending a date to the file name or assigning your own version numbers. Almost as important as the numbering scheme is an archive folder: moving older versions into it keeps them from living alongside the current version. Keeping that archive inside the project’s own folder means that users who need earlier versions know exactly where to look for them, and users who just need the most recent version see it before anything else (in the next item, we will look at controlling access to folders too, since some users won’t need to see the archive at all).
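For anyone managing versions by hand, the archiving step can be scripted. The Python sketch below assumes a hypothetical naming convention of `name_YYYY-MM-DD.ext` (for example, `syllabus_2024-03-01.docx`). In a given folder it keeps the newest dated copy of each document and moves every older copy into an `Archive` subfolder. The function and folder names are illustrative only, and a script like this is only needed where versions are kept as separate files rather than in a service’s built-in version history.

```python
import re
import shutil
from datetime import date
from pathlib import Path

# Hypothetical naming convention: <name>_YYYY-MM-DD.<ext>, e.g. syllabus_2024-03-01.docx
VERSIONED = re.compile(r"^(?P<base>.+)_(?P<date>\d{4}-\d{2}-\d{2})(?P<ext>\.[^.]+)$")


def archive_old_versions(folder: Path, archive_name: str = "Archive") -> list:
    """Keep the newest dated copy of each document; move older copies into an archive subfolder.

    Returns the new locations of the files that were moved.
    """
    newest = {}  # (base name, extension) -> (date, path) of the newest copy seen so far
    older = []   # copies superseded by a newer date

    for path in sorted(folder.iterdir()):
        match = VERSIONED.match(path.name)
        if match is None or not path.is_file():
            continue  # leave subfolders and files outside the convention alone
        key = (match["base"], match["ext"])
        stamp = date.fromisoformat(match["date"])
        if key not in newest:
            newest[key] = (stamp, path)
        elif stamp > newest[key][0]:
            older.append(newest[key][1])  # the previous newest copy is now an old version
            newest[key] = (stamp, path)
        else:
            older.append(path)

    moved = []
    if older:
        archive = folder / archive_name
        archive.mkdir(exist_ok=True)
        for path in older:
            target = archive / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Pointed at a synced folder, for example `archive_old_versions(Path("Course Materials"))` (a hypothetical folder name), it leaves only the latest copy of each document at the top level. Files that don’t follow the naming convention are left untouched rather than guessed at.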

Some systems, such as the LOR solution we are exploring, even let you restrict subgroups of users or shares to only the most recent version of a document, or let them make local changes to their own copy of the file without affecting the main copy. This kind of version control is critical for accuracy when a file is shared widely.

Having one of these version control systems in place from the outset ensures that there is no panic moment when the wrong file is edited or shared; controlling versions and changes provides reliability and accuracy. It also supports the concept of Distributed Data, where a single edit at one point updates every distributed version of a file, rather than someone having to change each and every copy to match.

User access controls must be well defined and segmented

One of the biggest challenges in maintaining a shared set of data in the cloud or an LOR is deciding who gets access to what, and how much access they should have. When sharing from my cloud storage sites, I grant only the level of access needed for the task I’ve asked the person to do (e.g., view only, comment, edit, or co-own a folder or item). This often means setting expiration dates as well, since once a project is finished, access to that file or folder no longer needs to remain. This serves both security and tidiness: nobody likes seeing files they no longer need sticking around. The same applies to an LOR. Once a user no longer needs access to a share, that access should be removed to prevent accidental (or intentional) changes from users who shouldn’t be there.
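The least-privilege, expiring-access approach described above can be modeled in a few lines. The sketch below is a hypothetical Python model, not the sharing API of Box, OneDrive, or any LOR (those services provide their own sharing settings). Each grant records a user, a folder, an access level ordered from view to co-own, and an optional expiration date; an access check succeeds only for an unexpired grant at or above the level the task needs.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum
from typing import List, Optional


class Access(IntEnum):
    """Hypothetical access levels, ordered from least to most privileged."""
    VIEW = 1
    COMMENT = 2
    EDIT = 3
    CO_OWN = 4


@dataclass(frozen=True)
class Grant:
    user: str
    folder: str
    level: Access
    expires: Optional[date] = None  # None means the grant has no end date


def is_active(grant: Grant, today: date) -> bool:
    """A grant is active through its expiration date, or indefinitely if it has none."""
    return grant.expires is None or today <= grant.expires


def allowed(grants: List[Grant], user: str, folder: str, needed: Access, today: date) -> bool:
    """True only if an active grant gives `user` at least `needed` access to `folder`."""
    return any(
        g.user == user and g.folder == folder and g.level >= needed and is_active(g, today)
        for g in grants
    )


def prune_expired(grants: List[Grant], today: date) -> List[Grant]:
    """Remove grants whose expiration date has passed, e.g. once a project is finished."""
    return [g for g in grants if is_active(g, today)]


# Example: an editor for the length of a project, plus an open-ended viewer.
grants = [
    Grant("colleague_a", "Course Modules", Access.EDIT, expires=date(2024, 6, 30)),
    Grant("colleague_b", "Course Modules", Access.VIEW),
]
print(allowed(grants, "colleague_a", "Course Modules", Access.COMMENT, date(2024, 5, 1)))  # True
print(allowed(grants, "colleague_a", "Course Modules", Access.EDIT, date(2024, 7, 1)))     # False: expired
print(allowed(grants, "colleague_b", "Course Modules", Access.EDIT, date(2024, 5, 1)))     # False: view only
```

Ordering the levels as integers means a higher grant implies the lower ones (an editor can also comment), mirroring how view, comment, edit, and co-own build on each other.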

Coupled with folders, user access control also helps ensure that there is no “cross-pollination” between files that shouldn’t be related. Having someone share project update documents into a picture folder isn’t just annoying; it might also mean that core files or updates are missed by others who need them, or are mishandled by being unintentionally shared with the wrong batch of files. Making sure that users only have access to the folders or files they need, when they need them, prevents this clutter or misuse.

Don’t rely on any one system as your only copy

Distributed Data can also refer to data spread across multiple systems or formats that, in this case, are not automatically interconnected. Sharing files with others carries inherent risk. Systems occasionally fail, and someone might change or delete a file in a way that makes recovering it through version control a hassle or even impossible. Having a clean copy that wasn’t connected to that system might be the only way to recover that item. Most commercial/professional cloud storage solutions run on servers with built-in redundancy, so if one or more hard drives or systems fail, the data still exists in a second or third copy in another location, but that redundancy protects against hardware failure, not against a file being wrongly changed or deleted within the service.

Make backups that live on a different system: another cloud solution, an offline external hard drive, a NAS (network-attached storage), or another storage device. For maximum redundancy, some users even keep critical data in another region or physical location, so that if one location becomes inaccessible or damaged, the data can be recovered from the other at lower risk and cost.
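As a rough illustration of keeping a copy on a separate system, the Python sketch below copies a folder tree to a second location, such as a mounted external drive or NAS share, and verifies each copy with a SHA-256 checksum. The function names and paths are hypothetical; a dedicated backup tool would add scheduling, retention, and incremental copies, all of which this sketch omits.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_root: Path) -> list:
    """Copy every file under `source` into `backup_root`, keeping the folder layout.

    Each copy is checked against the original's SHA-256 hash, and a mismatch
    raises an error instead of silently leaving a bad backup. `backup_root`
    should be on a different device or service and must not sit inside `source`.
    Returns the paths of the backup copies.
    """
    files = sorted(p for p in source.rglob("*") if p.is_file())  # list everything before copying
    copied = []
    for src in files:
        dest = backup_root / src.relative_to(source)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
        if sha256_of(src) != sha256_of(dest):
            raise OSError(f"backup copy of {src} does not match the original")
        copied.append(dest)
    return copied
```

For example, `back_up(Path("Department Files"), Path("/mnt/nas/department-backup"))` (both paths hypothetical) would mirror a synced folder, archive subfolders included, onto a NAS share.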

Backups may also make it easier to clean up the system over time, since they can hold every item, including past archives and versions, while the “live” system holds only the most recent or most useful items.

Usage as discipline

It may go without saying, but being disciplined means more than following the file structure and access controls; it means being serious about using the system at all. Time spent setting up your cloud accounts or LOR resources is wasted if you or others don’t make use of these systems.

If your intention is to provide access to the files from anywhere, or to share the resources with those most interested in using them, these locations should be the first place that comes to mind when looking for the resources. If I were to save a file only on my local machine, it would be no good to me unless I were in that office on that computer the next time I needed it. By having it in a well-organized cloud location, I can access it regardless of what office I am in, or even when I am traveling for a conference or vacation and someone needs access.

The benefit of having the cloud or LOR solution is that the files and folders we need most are accessible whenever and wherever we are. Being able to simply log on and find what I need is crucial, and the same goes for sharing syllabi, course policies, or department modules across courses and instructors. You can only use what you have access to: a bird in the hand is worth two in the bush, and all that.

Making sure the system is healthy and ubiquitous also cuts down on fragmentation: different versions of files, or separate storage locations, that don’t match, don’t distribute well, or are expensively redundant.

Fragmentation causes confusion and cuts down on efficiency and interoperability. The fewer systems or locations that exist for files to be saved to or drawn from, the easier it is to ensure the system’s longevity and health, because it is being maintained by many invested parties on a more regular basis. It also makes everyone’s work easier, since there’s no need to search multiple places for a file.


About Kevin Lyon

Kevin is a Double-Demon, receiving his Bachelor's degree in English with a minor in Professional Writing from DePaul in 2009, and staying on for his Master's in Writing, Rhetoric, and Discourse with dual concentrations in Technical and Professional Writing and Teaching Writing and Language. He is now an Instructional Technology Consultant and a Writing, Rhetoric and Discourse instructor. His research interests include technology in education, education and identity formation/negotiation, and online learning and interaction.
