The term “cloud computing” has been getting under my skin for far too long. I’ve watched it turn from a loose phrase basically meaning service-oriented architecture (SOA) with redundancy and outsourcing into a market built on nothing but “cloud”. It’s not some magic bullet that Microsoft/Amazon/whoever-else has which keeps whatever you want them to run for you up and operational, worry-free, 24×7. It’s putting responsibility for yourself into the hands of another party whose vested interest is in themselves.
For those who are new to the idea of cloud computing, or have heard of it and think it’s the best thing since sliced bread, let’s spell out what it actually is so we’re all on the same page. It’s a concept, nothing more. Basically, if you have a service to run (MS SharePoint, an Oracle database, etc.) and you don’t want it in your building or even in your hands, someone else hosts it for you. The concept is simply outsourcing with a vague promise of reliability/scalability/plausibility. Since the idea has simply been “there” and has never been stress-tested (it isn’t even a “product”), “cloud computing” has in some twisted sense wedged itself into the computing world without any merit beyond MBA jargon with a vague spattering of tech ideas attached.
So, in a nutshell (help, I’m in a nutshell!), let’s take logic that has been around for decades and implement “cloud computing” with common-sense approaches that make it your OWN cloud computing for YOURSELF. The upside of doing it yourself is proven redundancy and easy failover, with little to no data loss and short outage windows.
The components of cloud computing
* Storage
* Networking
* Power
* Virtualization infrastructure OR physical hardware
The first step is to plan your storage. What are you doing? Do you have multiple data centers? Are you running a disaster recovery center somewhere else for that “just in case” situation brewing in your imagination, or are you simply looking to build a disaster recovery component into your current, singular data center? All of these questions help mold the approach.
For multiple data centers, two approaches can be taken. For nearly guaranteed data synchronization between two points of presence, the most reliable is a Network Appliance (NetApp) filer pair with the block-by-block synchronization technology built into the system. It’s expensive, but well worth it for this situation.
The second approach is more makeshift, but avoids paying a quarter of a million dollars for a NetApp array. It also carries its inherent risk of losing 2–5 seconds of data in rare cases. While I could go on a sixty-thousand-page diatribe on tying different software together to create your own NetApp, since it’s 2011 life is much easier. There’s an open-source package named “Openfiler” that wraps many different technologies together to create a better-than-homebrew version of NetApp without the price tag (and with the ability to expand to 24×7 support by purchasing a service pack from the company). Basically, it’s storage aggregation/distribution software which pools different disk types together into a single sliceable volume that can be shared via SMB, NFS, iSCSI, HTTP(S)/WebDAV, FTP, & rsync. An integral feature is remote block replication, which performs write-locks to ensure the disaster recovery filer has exactly the same data. With the right configuration (RAID, etc.) this is nearly foolproof. Oh… and be sure the system has four gigabit NICs, as there will be a tremendous amount of data going through it: two will be used for the local network and two for synchronization.
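To see why two dedicated gigabit NICs for synchronization is a sane split, a quick back-of-the-envelope check helps. The figures below (a single busy server’s write load, ~80% usable line rate) are illustrative assumptions, not measurements:

```python
# Sketch: can the two dedicated sync NICs absorb the filer's write load?
# nic_speed and efficiency are assumed figures, not vendor specs.

def sync_headroom(write_load_mbps: float, sync_nics: int = 2,
                  nic_speed_mbps: float = 1000.0,
                  efficiency: float = 0.8) -> float:
    """Spare sync capacity in Mb/s; negative means the remote filer lags."""
    usable = sync_nics * nic_speed_mbps * efficiency
    return usable - write_load_mbps

print(sync_headroom(900.0))   # one busy server's worth of writes
```

If the headroom goes negative for your sustained write rate, the replica falls behind and the 2–5 second loss window grows, which is the real argument for giving synchronization its own NICs.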
Now, on to the second part: networking. This part can get very hairy, since there are a zillion different solutions ranging from super-cheap to super-expensive. I’ll take the abstract route, since ultimately it all does the same thing.
The first step is to ensure each system has dual gigabit NICs. Pretty easy, huh? 🙂
Now, for math. Each system will be transmitting about 900 Mb/s, so a switch backplane speed will need to be chosen that can hold the load of the number of servers attached. The best way to handle this is with a rack mentality. Place 3–5 servers per rack (about 1/8 of a rack, depending on server size), which would push ~2,700–4,500 Mb/s over the backplane. A good starting point is the Netgear GS108T ProSafe 8-port gigabit smart switch, which has a throughput of 16 Gbps (counting incoming & outgoing traffic). For each rack, two switches should be used for data & two for storage. Each server in the rack should have one NIC connected to each data switch, which allows redundancy in case of switch failure.
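That sizing rule can be sketched as a one-liner. The 900 Mb/s per-server figure is the article’s assumption, and the backplane is counted full-duplex (in + out) the way switch spec sheets quote it:

```python
# Sanity-check per-rack switch sizing: N servers at ~900 Mb/s each
# against a backplane rated in Gbps, counting both directions.

def rack_fits(servers: int, per_server_mbps: float = 900.0,
              backplane_gbps: float = 16.0) -> bool:
    demand_gbps = servers * per_server_mbps * 2 / 1000.0
    return demand_gbps <= backplane_gbps

for n in (3, 5, 9):
    print(n, rack_fits(n))
```

At these assumed loads a 16 Gbps switch is comfortable at 3–5 servers and only runs out of backplane around nine, so the 3–5 per rack guideline leaves healthy headroom.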
A master switch setup will be necessary; an HP ProCurve 6108 would be a good solution, with its stackable capabilities, fiber uplinks, and 16 Gbps backplane speed. From there, take the switches in each rack and mentally split them in two. Run an RJ45 Ethernet cable from each rack switch to one of the master switches; the idea is to have the two rack switches (per rack) connected to separate master switches.
Now for the fun part! If you are running a remote data center, you’ll need a router with at least a DS1 (T1) connection to reach it (I’ll leave the circuit itself as an exercise for the reader). Once that’s done, attach a 4-port switch of your liking to each of the routers and run Ethernet from each of the master switches to each of the router switches. Great, now the physical hardware portion of the network is complete, and it must be replicated exactly at the remote data center.
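Be realistic about what a DS1 buys you, though: at ~1.544 Mb/s it can carry a trickle of replication deltas, not a full resync. The 10 GB daily-delta figure and 90% link efficiency below are assumptions for illustration:

```python
# Rough WAN math: hours to push a given amount of data over a DS1 (T1).
# data_gb and efficiency are illustrative assumptions.

def transfer_hours(data_gb: float, link_mbps: float = 1.544,
                   efficiency: float = 0.9) -> float:
    bits = data_gb * 8 * 1000**3              # decimal GB -> bits
    usable_bps = link_mbps * 1_000_000 * efficiency
    return bits / usable_bps / 3600

print(round(transfer_hours(10), 1), "hours for 10 GB of deltas")
```

Ten gigabytes of changed data takes roughly sixteen hours over a T1, so seed the remote filer locally first and size the circuit to your actual change rate.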
Price-wise, for a dual-rack setup this comes to approximately $5,800, plus the router, router switches, and Ethernet cable. Not terribly bad for mass redundancy.
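One plausible breakdown of that figure, with 2011-ish street prices that are my assumptions rather than quotes:

```python
# Hypothetical bill of materials for the ~$5,800 dual-rack figure.
# Unit prices are illustrative assumptions, not vendor quotes.

bom = {
    # item: (unit_price_usd, quantity)
    "Netgear GS108T (2 data + 2 storage per rack, 2 racks)": (200, 8),
    "HP ProCurve 6108 master switch":                        (2100, 2),
}

total = sum(price * qty for price, qty in bom.values())
print(f"${total:,}")  # excludes router, router switches, cabling
```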
The next part is power. Electricity. This part can be hard to make redundant, as nearly everywhere you go you’ll find only one power provider, unless you’re lucky enough to have your data center(s) on the edge of two utility corridors. So, the best that can be done is battery backup, or a generator. This should cover all components within the data center, as without any one of them the entire thing is useless. This is outside the scope of the article, so I’ll leave it as an exercise for the reader.
The final step is deciding what you’re going to run and how you’re going to run it. Are you happy with (para)virtualization (VMware, QEMU, LPAR, Xen, OpenVZ, etc.), or do your needs require raw CPU power? I personally highly recommend virtualization in a “cloud” environment, as it allows a “crowd sourcing” mentality which utilizes each machine more fully than tossing an application onto a bare machine. With VMware VirtualCenter, multiple machines can be added, and VMs can be built throughout and migrated between machines as needed with VMotion. Another benefit of VMware is its ease of integrating a SAN into the storage structure.
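The “crowd sourcing” payoff is just bin packing. The sketch below is a plain first-fit-decreasing packer, not VMware’s actual placement algorithm, and the load figures are made up for illustration:

```python
# Illustration: first-fit-decreasing packing of VM loads onto hosts,
# showing why pooling beats one-app-per-box. NOT VMware's algorithm.

def pack_vms(vm_loads, host_capacity=100):
    """Greedily place VM loads (e.g. % of one host's CPU) onto hosts;
    returns how many hosts are needed."""
    hosts = []  # remaining capacity of each powered-on host
    for load in sorted(vm_loads, reverse=True):
        for i, free in enumerate(hosts):
            if load <= free:
                hosts[i] -= load
                break
        else:
            hosts.append(host_capacity - load)  # bring a new host online
    return len(hosts)

vms = [60, 40, 35, 25, 20, 10]   # six apps, 190% of one host in total
print(pack_vms(vms), "hosts vs", len(vms), "bare-metal boxes")
```

Six applications that would each claim a bare-metal box fit on two virtualized hosts here, which is the utilization argument in miniature.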
If you require near-native processing power and still want workloads segmented from everything else, OpenVZ’s container-based approach is the answer.
If your data makes you money, the best thing you can do is take responsibility for it yourself. Shifting it to someone else and throwing money at them to “make it safe” only leads to paying someone to disrupt your life later, when you’re at your most vulnerable. Always plan for the worst-case scenario. Okay, maybe not the worst case… I don’t think a barrage of rabid attack chinchillas raining from the sky can ever be planned for.