Currently, I shoot with the Sony Alpha A700. I shoot exclusively in RAW files. These files weigh in at around 17MB apiece. But I rather like to think of how much space I need in terms of how many of those 16GB CF cards I fill up. Over the course of the last few years, I've amassed nearly 1TB of RAW files. These files currently live in two locations: in a Lightroom 3 catalog on an external 2TB drive and on my desktop, which I rarely use. This desktop computer is the subject of this post.
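For the curious, here is the CF-card math as a quick back-of-the-envelope sketch. It assumes decimal units (1 GB = 1000 MB, 1 TB = 1000 GB) and treats 17MB as an average file size; the function names are made up for illustration.

```python
# Back-of-the-envelope storage math, assuming decimal units
# (1 GB = 1000 MB, 1 TB = 1000 GB) and an average RAW size of 17 MB.

RAW_FILE_MB = 17   # approximate size of one RAW file
CARD_GB = 16       # capacity of one CF card
ARCHIVE_TB = 1     # approximate size of the RAW archive


def files_per_card(file_mb, card_gb):
    """Whole RAW files that fit on one card."""
    return int(card_gb * 1000 // file_mb)


def cards_for_archive(archive_tb, card_gb):
    """How many full cards the archive corresponds to."""
    return archive_tb * 1000 / card_gb


if __name__ == "__main__":
    print(files_per_card(RAW_FILE_MB, CARD_GB))    # RAW files per 16 GB card
    print(cards_for_archive(ARCHIVE_TB, CARD_GB))  # cards' worth of data in 1 TB
```

Under those assumptions, a card holds a bit over 900 files, and the archive works out to roughly sixty-odd cards' worth of data.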
How I Got Here
When I built it, I wanted to make sure I had a machine that would last for the next three years, so to that end, I ended up getting a server class motherboard:
- Tyan dual-socket server motherboard
- 2 x 6-core AMD Opteron 64-bit CPUs
- 16GB of memory
- 8 x 1TB SATA drives
- 1 kW power supply
After much back and forth, I eventually settled on OpenSolaris for access to the much prized ZFS filesystem. The storage on the box was meant to be an archive, and it has been. However, because it has also been running OpenSolaris, I've been unable to run any of my photo editing software or any of my video editing software, rendering the box effectively a very expensive NAS box. Ugh.
I only power it on periodically to sync up my laptops and to archive my working copies of photos, albeit in a very haphazard fashion. I am SO not adhering to DAM standards! No doughnut!
Where I Want To Go
Well, I still want the benefit of ZFS. However, I want to ditch OpenSolaris. I want to be able to run copies of the operating systems I need to run. I want to take advantage of those 12 cores and 16GB of RAM. Yeah, I want a lot of stuff.
I used to work for VMware, you know, that little-known virtualization company. They have a free enterprise-grade product called ESXi, currently at revision 4.1. This basically allows you to turn a computer into a virtualization server. The upside of this is you have VERY little overhead and most of the capacity of the machine is usable for the guest operating systems. As far as virtualization products go, VMware's is the technology to beat. They use it extensively in-house and I find myself leveraging the power of virtualization wherever I go. So... why not at home?
However, ESXi does not do any of the data protection that ZFS does. So, for that, I will leverage another software product, Nexenta. Nexenta is an offshoot of OpenSolaris and incorporates the latest advancements made available there into its product. For installations of up to 18TB of data, the use of Nexenta is free. Beyond 18TB, you'll have to pay. Considering my ballpark is in the 8-10TB range, I think I'm safe. There is even a project called Napp-It! which makes Nexenta even easier to use.
Combining these elements, the goal is to:
- Have ESXi running on the bare metal hardware as a lightweight hypervisor
- Manage storage in ZFS raidz storage pools via a Nexenta VM
- Export ZFS volumes as iSCSI disks back to the parent ESXi server
- Use those volumes as VMFS volumes to create virtual machines on
- Expand volumes as needed and snapshot them at the storage level rather than using VMware delta-file snapshots
This is summarized in the diagram below:
[Diagram not shown. Caption: I'm describing what I'd like... i.e., 20TB of storage... *sigh*]
Because I want to contain any unstable elements, i.e. Linux with various beta multimedia libraries, or Windows for some special photo application I need to run, I can run those in their own virtual machines hosted on the ZFS-backed iSCSI storage volumes managed by the Nexenta VM.
The requirement, of course, is that the Nexenta VM be started first, before any of the other VMs using the Nexenta storage volumes are started. There is also a slight performance hit from using storage managed from within a VM, but this is partially negated by giving the Nexenta VM direct PCI access to, and control of, the underlying storage. Additionally, some of the storage can be exported from Nexenta in the form of NFS, CIFS, and/or AFP shares, which can be used by laptops, etc.
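The start-order requirement is really just a dependency graph. Here is a minimal sketch of how one might work out a safe order, using Python's standard `graphlib` module (Python 3.9+). The VM names are hypothetical, and this only computes an order; it doesn't talk to ESXi at all.

```python
from graphlib import TopologicalSorter

# Hypothetical VM names. Each VM maps to the set of VMs that must
# already be running before it can start.
DEPENDS_ON = {
    "nexenta-storage": set(),
    "linux-media": {"nexenta-storage"},
    "windows-photo": {"nexenta-storage"},
}


def startup_order(depends_on):
    """Return a start order in which every VM follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def shutdown_order(depends_on):
    """Shut down in reverse, so the storage VM goes down last."""
    return list(reversed(startup_order(depends_on)))


if __name__ == "__main__":
    print(startup_order(DEPENDS_ON))
    print(shutdown_order(DEPENDS_ON))
```

The storage VM always comes out first on startup and last on shutdown; the order of the other two VMs relative to each other doesn't matter.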
The added benefit of using ZFS-backed storage to power VMFS volumes is the ability to snapshot at the storage layer. The upshot of this is that even with just the free ESXi, you will be able to make a near-instant clone of a virtual machine by doing the following in Nexenta:
- Create a ZFS volume snapshot
- Make a read/write clone of the ZFS volume snapshot
- Export the read/write clone of the ZFS volume via iSCSI to ESXi
- Add the new iSCSI volume to ESXi via vSphere client
- Modify the clone's copy of the VMX config file to reflect the changed path/location/name/etc.
- Import the copy of the VM into ESXi via the vSphere client
- Start up the copy of the VM!
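The first two steps in the list above map onto the standard `zfs snapshot` and `zfs clone` commands. Here is a rough sketch that only builds and prints those commands rather than running them. The pool, volume, and snapshot names are hypothetical, and the iSCSI export and vSphere steps are left out since they depend on the Nexenta and ESXi tooling.

```python
import shlex


def clone_commands(volume, snap_name, clone):
    """Build the zfs commands for a snapshot followed by a writable clone.

    volume    -- existing ZFS volume, e.g. "tank/vmfs01" (hypothetical name)
    snap_name -- label for the new snapshot
    clone     -- name of the read/write clone to create from that snapshot
    """
    snapshot = f"{volume}@{snap_name}"
    return [
        ["zfs", "snapshot", snapshot],      # point-in-time, read-only snapshot
        ["zfs", "clone", snapshot, clone],  # read/write clone of that snapshot
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so nothing is touched.
    for cmd in clone_commands("tank/vmfs01", "pre-clone", "tank/vmfs01-clone"):
        print(shlex.join(cmd))
```

Printing rather than executing keeps this safe to run on a machine without ZFS; on the real box the commands would be run on the Nexenta side.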
This isn't as painless as just using vSphere's licensed vMotion-based live cloning. However, it avoids the somewhat convoluted snapshot/copy vmdk/remove snapshot process on the ESXi CLI, which can impact the performance of the running VM while ESXi has snapshots applied to it.
What Does This Have To Do With Photography!?
In a sense, it is a more involved way of looking at Digital Asset Management (DAM). Things go wrong... either bad software loads or perhaps a virus/malware. Basically, shit happens. This allows one to isolate different tools and applications from one another... especially if they have little to nothing to do with one another. I.e., the software I use to batch-convert video clips from one format to another has nothing to do with my photo editing and management software... if one crashes, it really should have no impact on the other. Same thing with the DLNA media server for the TV/PS3/etc. It can be buggy or crash... that shouldn't cause bad things to happen elsewhere on the server.
The images and catalogs themselves will be stored directly in ZFS filesystem folders. They will be accessible via NFS/CIFS from any of the VMs or from a laptop, to check in or check out images and files. If I have doubts, I can always create a snapshot on the Nexenta/ZFS side periodically, say... every 10-15 minutes, with daily and weekly snapshots kicking in at predetermined times. Got a problem? Roll back, or load up a read-only copy of that week's or day's snapshot and copy out the uncorrupted/undamaged file. You can't do that with VMware snapshots, and Mac OS X Time Machine _can_ do that, but I'd rather reserve that for managing my laptop backups. Ideally, I would have something like Portfolio running in a VM with access to the files... but that's another project... one which costs a bit more given the Extensis Portfolio server license costs...
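Snapshots every 15 minutes pile up fast, so some thinning-out policy is needed. Here is a minimal sketch of one. The retention windows (keep everything for 24 hours, one per day for 7 days, one per week for 4 weeks) are my own placeholder assumptions, not anything Nexenta prescribes, and the function only decides which timestamps to keep; it doesn't touch ZFS.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(stamps, now, frequent=timedelta(hours=24),
                      daily_days=7, weekly_weeks=4):
    """Pick which snapshot timestamps to retain.

    Keeps every snapshot no older than `frequent`, the newest snapshot of
    each calendar day within `daily_days`, and the newest snapshot of each
    ISO week within `weekly_weeks`. All windows are assumed defaults.
    """
    keep = set()
    newest_per_day = {}
    newest_per_week = {}
    for ts in sorted(stamps):  # oldest first, so later entries overwrite
        age = now - ts
        if age <= frequent:
            keep.add(ts)
        if age <= timedelta(days=daily_days):
            newest_per_day[ts.date()] = ts
        if age <= timedelta(weeks=weekly_weeks):
            newest_per_week[tuple(ts.isocalendar())[:2]] = ts  # (year, week)
    keep.update(newest_per_day.values())
    keep.update(newest_per_week.values())
    return sorted(keep)


if __name__ == "__main__":
    now = datetime(2011, 1, 31, 12, 0)
    # Three days of snapshots taken every 15 minutes.
    stamps = [now - timedelta(minutes=15 * i) for i in range(4 * 24 * 3)]
    kept = snapshots_to_keep(stamps, now)
    print(len(stamps), "snapshots taken,", len(kept), "kept")
```

Anything not in the returned list would be a candidate for deletion on the storage side.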
Dream Setup & Other Wishful Things
Ideally, I would like to be able to have a virtual machine running Adobe Lightroom 3 and PS CS5, with access to say... 8 CPU cores to crunch through my files and manage a kickass large global LR3 catalog. However, I realize that would require one of two things to happen: licensing a copy of Windows and running it in a VM, then getting another license for LR3 (I have the Mac OS X license) and PS CS5 (same here)... or somehow getting a Mac OS X desktop running in a virtual machine on the box and loading it up with the aforementioned photo applications, then just accessing it with some VNC/remote desktop client. That would be nice.
Wait... Where's the Beef!?
Oh, so why don't I have blow by blow screenshots of how I have accomplished this? Simple. I haven't done it yet. Still in the planning stages. Right now, I've got something like 6-7TB of cruft data mixed in with valuable image/video/document data. Sifting through that is... well... painful. I also need more of those 2TB disks. Got two right now... and I'm using them. So, I still need the parts and then the time to start and complete this little restructuring project. But when it's all done, I'll be a much much happier camper.
Some Tech Notes
- For the PCI passthrough to work properly, I'm guessing I'll need to have the drives on a different storage controller from the onboard SATA ports. This might be a good time to pick up a PCIe x4 SATA controller board...
- For the ZFS setup, I'm intending to create it as a RAIDZ2 or perhaps even RAIDZ3 array to fend off two or *gasp* three simultaneous disk failures. RAIDZ2/3 will also make upgrading the array later much less harrowing. The upgrade is done by "failing" or removing a live disk and plugging in a bigger disk, waiting for the resilver to complete, then rinsing and repeating for the other disks in the array. I.e., you effectively lose your protection disk if you only have a RAIDZ... with a RAIDZ2 or RAIDZ3, you still have 1 or 2 disks' worth of protection after taking one out, respectively.
- For better ZFS performance:
  - One SSD for the ZIL, which is effectively the journaling or intent log for the ZFS filesystem. Normally, it is written to one of the disks in the array. However, this can impact head seek and slow down the filesystem. An SSD would greatly speed this up.
  - Another SSD would be used for the L2ARC, which is like a filesystem cache. Often used content is cached here. For normal operation, this is fine, though I'm not sure about how well it would work for virtual machines. Guess we'll see.
  - Finally, I'll want to tweak the available system memory allocated to ZFS cache. I believe Nexenta will tweak this dynamically, but doesn't hurt to check.
- I'll also set up a weekly or monthly job to perform a zpool scrub, which goes through all of the storage, verifies that what's written to the disk is good, and makes any corrections the checksums call for. Another reason to go with RAIDZ2 or RAIDZ3. Extra redundancy goes a long way towards ensuring the integrity of your data.
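To put rough numbers on the RAIDZ trade-off from the notes above, here is a small sketch for a single RAIDZ group built from the 8 x 1TB drives in the parts list. It is deliberately simplified: it ignores metadata overhead, padding, and the TB-vs-TiB difference, and the function name is made up for illustration.

```python
def raidz_summary(disks, disk_tb, parity):
    """Rough capacity and fault-tolerance figures for one RAIDZ group.

    parity is 1, 2, or 3 for RAIDZ, RAIDZ2, or RAIDZ3. Simplified model:
    usable space is just the non-parity disks' worth of capacity.
    """
    if parity not in (1, 2, 3):
        raise ValueError("parity must be 1, 2, or 3")
    if disks <= parity:
        raise ValueError("need more disks than parity devices")
    return {
        "usable_tb": (disks - parity) * disk_tb,
        "failures_tolerated": parity,
        # With one disk pulled for an upgrade, one level of protection is gone.
        "tolerated_during_swap": parity - 1,
    }


if __name__ == "__main__":
    for level, name in ((1, "RAIDZ"), (2, "RAIDZ2"), (3, "RAIDZ3")):
        print(name, raidz_summary(disks=8, disk_tb=1, parity=level))
```

Each extra parity level costs one disk's worth of capacity and buys one more failure the array can ride out, including during a disk swap.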
An old friend and ex-coworker has been kind enough to enlighten me regarding the improved benefits of NFS exports vs iSCSI exports for VMware environments. It would most certainly make the resizing of available storage as easy as falling off a log. Thanks Dave! Check out the benefits he lists for NFS here!
I work in the tech sector and do photography on the side. I use VMware's virtualization software, but am not compensated for anything positive or negative I might say about it. I have likewise used VirtualBox, Xen, Solaris Zones, and other virtualization technologies. I make use of OpenSolaris currently and plan to use Nexenta (also based on OpenSolaris), but I am not compensated by either group or their parent organizations for my commentary. I use Adobe software, but all copies of software I currently use are paid for out of my own pocket. I do not receive monies from Adobe for writing or talking about their software offerings. I just happen to be a happy user of their software. I am not currently affiliated with any of the above technologies' parent companies.