Blog Entries in Disaster Recovery

Thursday, June 24th, 2010 - 11:37 am EDT

Tech Tip: Common Ways to Tell You Are Not Prepared to Recover from a Disaster

Posted by: Michelle Liro

Today's tip comes to us from author Eric Beehler via our friends at Realtime Publishers.

Disaster recovery is somewhat of a buzzword in the IT industry, and IT professionals have all been exposed to their share of great disaster recovery ideas from business managers. These ideas are often based on the industry buzz and seem to only make more work for you with little gain overall. This is usually because the idea is not backed up with a real plan. The actual implementation of disaster recovery is usually a big chore to undertake correctly, but in the end, it is well worth the trouble.

It's important to be ready to recover your data and systems when a disaster strikes, but it is rarely a top priority in the grand scheme of IT projects when crisis has yet to strike close to home. Unless your company has decided to make disaster recovery a high-level objective, it's usually the front-line administrator who will be saddled with the responsibility of implementing some sort of plan to save the day, but you will likely be shortchanged on training and resources to get the job done.

There are many ways to deal with a disaster, from having a set of cold standby machines to employing a fully redundant hot data center. In reality, as the administrator, your job doesn't change much based on the scenario for recovery; it has to be up and available to keep your business running. You likely have some kind of plan now, but if you haven't been through the real thing, you really don't know if your plan will hold water. For Windows administrators, there are several problems that seem to expose themselves when it's time to exercise a disaster recovery plan, or worse yet, go through the real thing. Here are some common ways to tell that you are not ready for a disaster.

Plan for an Alternative Site
You are not ready for a disaster if you don't have a place to go, which requires planning for a full on-site disaster in which your site is down or inaccessible. There are several methods to address this issue if you don't have a solution today, from having an alternative site with servers waiting to be loaded up for operation to a warm site that is always ready and waiting to take traffic. These decisions are not usually made by you but by the CIO. All you can often do is consider the solution given to you and how that will impact your ability to recover. A cold site, for example, will allow you to have hardware and connectivity available, but you will need to account for operating systems (OSs), drivers, configuration differences, and data center differences. In a warm site, you have to ensure that changes to configurations and data remain synched across the two sites.

Plan for Downtime
You also have to consider whether the site solution will support the Recovery Time Objective (RTO) required by the applications and business. Simply put, the RTO is the amount of time your users will be without the functions supported by your server, which could be a Web site, a mailbox, or the ability to log on to the domain. You should have this time defined per application or function supported by your server. In a larger disaster recovery effort, this may be defined for you, but don't be surprised if the business people you support have no idea that your server supports the functionality they require. You may need to interject with your personal knowledge of how your server functions in order to get this definition correct.

There are generally accepted categories for RTO that fall into tiers, as Figure 1 shows. Use these as a guideline but feel free to create standards within your own organization to meet your needs. If you have a need to recover applications within 2, 4, and 8 hours, redefine the tiers so that they make sense to your business through an analysis of the business impact of downtime. Just be sure that you can apply the standards as broadly as possible across the organization.
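As a rough illustration of redefining tiers, here is a minimal Python sketch. It assumes three hypothetical tiers with 2-, 4-, and 8-hour limits; the tier names and limits are made up for the example and are not taken from Figure 1.

```python
# Hypothetical tier limits in hours, listed from strictest to loosest.
# Replace these with the results of your own business impact analysis.
RTO_TIERS = [("Tier A", 2), ("Tier B", 4), ("Tier C", 8)]


def assign_tier(required_rto_hours):
    """Return the loosest tier whose limit still meets the required RTO.

    Returns None when the requirement is tighter than the strictest tier,
    which is a sign that the tiers need to be redefined.
    """
    chosen = None
    for name, limit in RTO_TIERS:
        if limit <= required_rto_hours:
            chosen = name  # later (looser) tiers overwrite earlier ones
    return chosen


print(assign_tier(3))  # Tier A
print(assign_tier(8))  # Tier C
print(assign_tier(1))  # None
```

An application that must be back within 3 hours lands in the 2-hour tier because the 4-hour tier would not meet its requirement.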
 

Plan Your Tolerance to Data Loss
You are not ready for a disaster if you don't know your tolerance for data loss. Let's start with the basic foundation of the backup. Whether you use simple tape backups or an advanced nearline solution, you have to consider that most solutions are put in place to account for day-to-day operational needs. First, the exercise you went through with RTO must be done for the Recovery Point Objective (RPO), which is the amount of data that can be lost. You have to understand what the business can afford to lose; this value is not necessarily tied to an RTO tier. Take, for example, a point of sale system. If the system is down for 5 hours, the business may be able to recover by entering the orders taken while the system was down, but data loss of 5 hours may mean millions of dollars in lost sales.



The gut reaction for your RPO on some of your systems may be that no data loss is acceptable. In other cases, 24 hours of data loss may be acceptable. The goal is to understand what can be tolerated, not what is desired. Everyone will desire no data loss, but put a realistic perspective to the real value of the data. If you define Tier A RPO as no data loss, then you have to put systems in place that allow for that reliability. This means copying transactions as they happen to a backup site, which is an expensive solution that should be used only on your critical business applications, depending on your budget. If you have Tier B systems as defined in Figure 2, you will need some sort of solution that will be separate from your nightly backups, as you cannot count on having your last nightly tape backup at your recovery site.

Considering the Loss of a Backup
You are not ready for disaster if you rely on your daily backup for a recovery scenario. You may have in your head that you can rely on the last tape backup in the event of a disaster. Whether such is the case depends on a key question: can you get your restore process to work offsite? Don't be so quick to answer this one. If you take advantage of offsite storage either through a vendor or your own in-house process, it is an excellent step, but offsite storage doesn't necessarily guarantee you can restore at your disaster recovery site within the specified RPO and RTO.

Tape drive compatibility, backup software, delivery time, drivers, and OSs are all considerations that you must address prior to saying your solution is ready. This is especially true for a third-party backup site that will provide you with "like" hardware. That equipment will not be your equipment, and even if it is, expect aspects of the infrastructure to be different, such as IP address schemes, firmware (which can be a nightmare when working with SANs), and simple access to the hardware.

You also have the issue of archive requirements and the fact that you likely rely on these tapes for your day-to-day restores. If you perform restores for file recovery and other issues, you likely want to keep those tapes close by. If you ship them away for maximum protection, it's going to cost a pretty penny in order to request tapes from your offsite storage vendor.

You also have to consider how those tapes make it to the recovery site. If you make full backups only once a week and you only do offsite storage once a week, you might only get a restore from 2 weeks prior. Why? Because if you are lucky enough to get your tapes offsite a day or two after the full backup and you get the shipment to your disaster recovery site 4 to 8 hours after they are requested, you can almost bet that Murphy's Law will strike and you will get a bad tape somewhere in the set. Then you have to move back in the chain, and with most full backups run weekly, you might be taking your system back 2 weeks or more if Murphy continues to strike. Now, the RPO of your plan that you expected to meet with your existing backup plan is not being met.
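The arithmetic behind that "2 weeks or more" can be sketched in a few lines of Python. This is a simplified model, not a description of any particular backup product: it assumes only full backups count, that each full set goes offsite a fixed number of days after it is written, and that each unreadable set forces you back exactly one backup cycle. The function and parameter names are invented for the example.

```python
def worst_case_restore_age_days(full_interval_days, offsite_delay_days, bad_sets=0):
    """Worst-case age of the newest usable offsite full backup, in days.

    Just before the next full backup reaches offsite storage, the newest
    offsite copy is one backup interval plus the shipping delay old. Each
    unreadable set pushes the restore point back one more interval.
    """
    return full_interval_days + offsite_delay_days + bad_sets * full_interval_days


# Weekly fulls, shipped offsite 2 days after they are written.
print(worst_case_restore_age_days(7, 2))              # 9
print(worst_case_restore_age_days(7, 2, bad_sets=1))  # 16
```

With a single bad tape set, the restore point is already more than two weeks old, which is worth comparing against the RPO you promised.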

Even if you do recover your servers with no issues, how long will it take to recover them all? Consider the queuing on the tape drives, with multiple servers waiting for those tapes to be loaded. It could take quite a long time before you even get a chance to try a restore to your server depending on the technology present at the recovery site. What can you do? Well, time to restore will be reduced if you can restore large chunks at one time. Consider putting systems with like RPO and RTO requirements in the same backup set.

Better yet, host them on a LUN or set of LUNs on your SAN or other logical storage method in your situation so that a restore can be done all at once. You might even consider booting from the SAN, which might save you from having to restore the local disk of many servers. If you have a blade server solution, this may even be baked into your infrastructure.
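To get a feel for the tape drive queuing described above, the following Python sketch simulates restores waiting for a limited number of drives. It is only an estimate under simple assumptions: restores run in the order given, each one occupies a single drive for a fixed number of hours, and there is at least one drive. The restore durations are hypothetical.

```python
import heapq


def restore_finish_times(restore_hours, drives):
    """Simulate restores queued in order across a fixed number of tape drives.

    Returns the finish time (hours from the start) of each restore.
    Assumes drives >= 1.
    """
    free_at = [0.0] * drives  # when each drive next becomes free
    heapq.heapify(free_at)
    finishes = []
    for hours in restore_hours:
        start = heapq.heappop(free_at)  # earliest available drive
        end = start + hours
        finishes.append(end)
        heapq.heappush(free_at, end)
    return finishes


# Six servers at 3 hours each, but only two drives at the recovery site.
print(restore_finish_times([3, 3, 3, 3, 3, 3], drives=2))
# [3.0, 3.0, 6.0, 6.0, 9.0, 9.0]
```

The last servers in line do not finish until hour 9, even though each restore takes only 3 hours on its own.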

Using Disk-Based Backup
Let's also consider disk-based backup. This solution has become increasingly popular because of the low cost of hard disks and the ease of backup and restore. In addition, disks often take minutes to back up and restore what used to take hours. The software supported by these systems even has versioning, much more frequent backups, and nifty utilities that make life much easier on the administrator. This is usually all handled by complex backup management software such as Microsoft System Center Data Protection Manager. When using this kind of solution, consider employing these often-integrated features to support data replication of some sort, although vendors name these types of features differently.

You can even copy your live data to your recovery site using a SAN/NAS vendor's Failure Resistant Disk Solution (FRDS). You should, however, consider the fact that this kind of solution will be much more expensive than tapes because it will require duplicate equipment with data replication happening across a wide area network (WAN).

You should refer to your RTO and RPO tiers to determine whether certain servers and data sets could stand to be away from your disk replication and rely more on a tape solution. You should also consider your disaster site and understand whether it can support this kind of solution. You should treat your server restores as a form of triage. You need to know, based upon RTO and RPO, what you are going to recover first and what can wait.
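One simple way to express that triage is to sort your servers by RTO and then by RPO. The Python sketch below assumes both objectives are recorded in hours; the server names and values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    rto_hours: float  # how long the function can be unavailable
    rpo_hours: float  # how much data loss can be tolerated


def triage_order(servers):
    """Sort so the tightest RTO comes first, with RPO as the tie-breaker."""
    return sorted(servers, key=lambda s: (s.rto_hours, s.rpo_hours))


servers = [
    Server("file-server", 8, 24),
    Server("pos-db", 2, 0),
    Server("intranet", 24, 24),
    Server("mail", 2, 4),
]
print([s.name for s in triage_order(servers)])
# ['pos-db', 'mail', 'file-server', 'intranet']
```

Your own ordering may need to account for dependencies as well, such as restoring a domain controller before the servers that rely on it.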

Considering Configuration
If you can't identify the full configuration of your servers, you are not ready for a disaster. Realistically, can you keep track of 300 shares on a terabyte SAN served by a load-balanced Windows cluster server? Do you know which shares go to which directories on which LUNs? You have to document configurations. This is true whether you have a basic bare metal restore plan or a full redundant data center. The luxuries of a production environment won't be at your disposal. A normal production environment allows you the opportunity to compare configurations when something goes wrong and work through a problem. A disaster affords you no such luxury.

No matter how familiar you are with your systems, you need to have everything documented that can be changed. For any applications, you should have a guide for their installation in your environment. You should have the servers documented with everything from IP addresses and patches to database connections and configuration files. If you run IIS for Web applications, you should have that configuration documented as well. Some sort of context diagram is often useful to determine how your server interacts with other systems.

Utilize configuration management systems, such as SMS, to do some of the heavy lifting for you. Create reports and keep them up to date in an alternative location, either a paper copy offsite or an electronic one. Configuration problems seem to be a killer when recovering because changes sometimes get applied without strict control. What seems like a small change can kill you in a disaster when it hasn't been documented.

Documenting the infrastructure goes beyond your own servers, but is just as important when it's time to troubleshoot. You can bring your file server back and you can bring your application servers back, but if you don't have proper DNS or connectivity, no one will be connecting to those systems you've recovered. If you have dependencies on other systems, you need to identify them. Know what names should be in DNS, what IP addresses and subnet you are on, and what systems you interact with, such as database servers or other back-end services such as the DMZ or Internet access. When you tell a database administrator that your application is getting SQL errors, you should know what database server, database, port, connection type, and authentication type you are using. You should also know the user name and password being used, if there is one. Does the server break down into pieces? Does it have multiple applications or functions? Document those functions separately.

You can't think of a server as a single system if your customers don't see it as a single function. Remember that restoring an infrastructure means bringing many pieces together into a whole, and you should not expect those pieces to work as reliably as they do in a production environment. In fact, when you face an issue in production, it usually has a single root cause, but a disaster recovery will usually experience several major issues at the same time. You need to know where you stand in the ecosystem of your environment to understand how to identify and help fix those issues.
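The database details listed above (server, database, port, connection type, authentication type) can be kept as a structured record so that gaps are easy to spot. The Python sketch below is one possible shape for such a record; the field names, host name, and values are assumptions made for the example.

```python
from dataclasses import dataclass, fields


@dataclass
class DatabaseDependency:
    server: str = ""
    database: str = ""
    port: int = 0
    connection_type: str = ""
    authentication_type: str = ""


def missing_fields(record):
    """Return the names of fields still left at their empty default."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# A partially documented dependency (hypothetical values).
dep = DatabaseDependency(server="sql01.example.internal", database="orders", port=1433)
print(missing_fields(dep))  # ['connection_type', 'authentication_type']
```

Running a check like this against every documented dependency is a quick way to find out what you would not be able to tell the database administrator during a recovery.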

Identifying Single Points of Failure
If you have a single point of failure, you are not ready for a disaster. A single point of failure can ruin your nicely laid out plans. Although it is not a requirement for disaster recovery, the ‘N + 1’ model often used when considering disaster recovery means many components backed up by a single spare component. You can still run into problems using N + 1, especially at a cold site where you have not been exercising your disaster recovery equipment to ensure its health. You might consider having additional servers of a similar capacity available above the minimum number required to recover just in case you experience a failure at your recovery site.

An optimal solution will have redundancy built in to your recovery site the way you have it outfitted at your production site. If you have a failover cluster in one location, you would do the same in the recovery site, even though you could technically get by with a single server, assuming that server functions as expected. You should also consider the interdependencies of your infrastructure, such as network, when you think of this issue. Single switches, routers, domain controllers, and sources of power can also be points of failure.

Single points of failure don't stop at the system level. You might have that one guy or gal who knows everything about your environment. When you're at this person's desk and something goes wrong with the system or a specific application, he or she always has the answer. This is a good person to have, but when it comes down to it, you can't rely on a single person. When a disaster strikes, the go-to person may not be available during the recovery phase, yourself included. When everyone looks around and throws up their hands because such and such is down, what do you do? You wish you could go back in time and document that ingrained knowledge. This is also true for day-to-day operations, but especially necessary when everything is going wrong because of a disaster. The person who knows it all is not what you need; you need full documentation of the knowledge that person possesses. Your go-to should really be your documentation.

Integrating Disaster Recovery into Daily Life
If you don't integrate disaster recovery into your daily operations, you are not ready for a disaster. Organizations that plan for disaster recovery as a single project with a start and an end will fail. Don't let the hard work go to waste. When you put these plans in motion, get all that documentation done, have recovery solutions in place, and continue to update your documentation and test your systems. If you don't test your disaster recovery process regularly, how do you know it will work? If you don't update your documentation day-to-day when changes are made, your documentation is outdated and may even be detrimental to your recovery efforts. Don't let apathy or a disconnected process of change management get you in the end. Not only does integration help your readiness, it reduces the dedicated time necessary to get ready for disaster recovery. Find a way to make what you use in disaster recovery a part of daily life.

Eric Beehler has been working in the IT industry since the mid-90s and has been playing with computer technology well before that. From Help desk technician to solutions provider, he has been involved at many layers of enterprise solutions from the desktop to the network to the server and the SAN. He currently has certifications from CompTIA (A+, N+, Server+), and Microsoft (MCITP: Enterprise Support Technician and Consumer Support Technician, MCTS: Windows Vista Configuration, MCDBA SQL Server 2000, MCSE+I Windows NT 4.0, MCSE Windows 2000, and MCSE Windows 2003). He also holds a Master’s degree in Business Administration from the University of Colorado at Colorado Springs. His experience includes more than nine years with Hewlett-Packard’s Managed Services division, working with Fortune 500 companies to deliver network and server solutions and, most recently, IT experience in the insurance industry working on highly available solutions and disaster recovery. He has co-authored books, including MCITP: Microsoft Windows Vista Desktop Support Enterprise Study Guide (Sybex/Wiley Publishing), authored several white papers, and co-hosts the "CS Techcast" podcast aimed at IT professionals. He provides consulting and training through Consortio Services, LLC.

For additional information about Disaster Recovery and High Availability topics, be sure to check out Marathon's Resource Center, which has an extensive library of white papers, webinars, and eBooks available for download.

 

Disaster Recovery  Availability  Business Continuity  Disaster Tolerance  High Availability 




Thursday, June 17th, 2010 - 1:37 pm EDT

How to Cut Risks and Costs with a Downtime Analysis & Action Plan

Posted by: Michelle Liro

Earlier this week, we hosted a webinar on the topic of “How to Cut Risks and Costs with a Downtime Analysis & Action Plan.” We know from our experience in application availability that many companies avoid these types of assessments – they either don't know where to start or decide that they don’t have the time or experience to conduct an assessment, so they just live with the unknowns and hope that nothing bad happens. (We’ve seen the consequences of downtime at many companies and don’t recommend this method!)

Our VP of Services & Support, Beth Shea, explored this topic in detail and provided a simple framework that companies can use today to uncover their risks and put measures in place to minimize the impact of downtime. To learn more, be sure to watch the 30-minute webinar. You can also check out the Q&A session from the webinar, summarized below.

Q: When looking at the impact of downtime, is it just unplanned downtime, or should you include planned downtime as well?
You absolutely need to plan for both planned and unplanned downtime, as there’s a real cost and business impact to both. They both need to be included in your impact assessment.

Q: What about branch offices – should they be included in a downtime assessment?
According to Forrester Research, about 20% of a company’s business is tied up in branch and remote offices, and IT needs to include these offices in any assessment that they are conducting. You shouldn’t overlook these offices when putting together your downtime and business impact assessments. They have to be factored in.

Q: How often should I conduct a business impact and risk assessment?
What we’ve found with our customers is that conducting an annual assessment is sufficient, or in some cases, twice a year, depending on the type of business. You can then use these as your benchmark going forward to determine the success of the initiative and ensure that you have the key metrics to report to your management team.

Q: How do you determine when to use local high availability vs. a disaster recovery solution?
Fault tolerance, high availability, disaster recovery - all of these different terms can be confusing and they can have different meanings to different people. The way we think of it, implementing high availability or fault tolerance ensures that you are protected locally against the everyday, nuisance failures that cause downtime. If you lose a fan or a drive, for example, you would automatically route to another server within the same building or local area. Disaster recovery solutions are really for recovery from catastrophes (fire, flood) or other events where you need to fail over to a much more distant location. You don’t want to use this type of solution for everyday failures, as it can be very time consuming to fail over and fail back, and you can potentially lose some data. For local protection, you want high availability/fault tolerant solutions.

Q: What about hosted applications like salesforce.com, how do I account for those in this type of assessment?
In today’s world, so many applications are offered as Software-as-a-Service (SaaS), sometimes called hosted applications, and are no longer hosted at your site. However, they are still important to your business and need to be included as part of your overall assessment. Our approach is to conduct the assessment for your SaaS applications as if they were all onsite. Then use your tiered analysis and make sure that your SaaS vendor is meeting your availability requirements for that application, and that they have the necessary protections in place to protect that application to the same level that you would if it were in-house.

Q: Does Marathon offer any services to conduct this type of assessment?
Yes – this is a service that we provide for our customers. Most customers are very satisfied with the service, because it usually has an immediate ROI for their business. If you are interested in this type of service, please feel free to call us at 978-489-1100.

Q: Does Marathon have any templates available to build a framework for this type of assessment?
Absolutely. From our 16+ years of working with customers on the assessment and prevention of downtime, we’ve put together an extensive list of questions to ask about the business risks and impact of downtime. Please feel free to contact us if you would like more information.

Q: How do you measure or put a price on the intangible impacts of downtime?
This can be tough to nail down, but what we recommend is developing some basic estimates. This isn’t meant to be an exact number. What we are really trying to achieve here is to prioritize applications, put them into the tiers that we discussed, and make sure that you are putting the right amount of resources against the right applications. From a productivity perspective, one metric you could use is the cost of employee salaries: how much it would cost in salary to have employees unable to work for a certain amount of time. This is just one example.
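As a back-of-the-envelope version of the salary metric, the Python sketch below multiplies headcount, an hourly rate, and outage length. The 2,080 working hours per year (52 weeks at 40 hours) and the sample figures are assumptions for illustration, and the result covers idle salary only, not lost sales or other intangibles.

```python
def idle_salary_cost(employees, avg_annual_salary, outage_hours, work_hours_per_year=2080):
    """Rough salary cost of employees being unable to work during an outage."""
    hourly_rate = avg_annual_salary / work_hours_per_year
    return employees * hourly_rate * outage_hours


# 50 employees averaging 62,400 per year, idle for a 4-hour outage.
print(round(idle_salary_cost(50, 62400, 4), 2))  # 6000.0
```

Running the same estimate for each application gives you comparable numbers for placing applications into tiers, which is the point of the exercise.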

Q: Does everRun handle quick switch over to back up site if the main site goes down?
Yes, within seconds.

Q: What are the requirements for the backup site?
The machines at the backup site are in the same pool as the primary site, so the backup machines must meet the requirements to be in the same pool as the primary site machines.

Q: How about regular data sync between main site and backup site?
Since the primary and backup site are running in lockstep mode, the application and the data are always in sync between the primary and backup sites.

Downtime  Availability  Disaster Recovery  Fault Tolerance  High Availability  Interview  Webinar 




Monday, May 24th, 2010 - 11:58 am EDT

The Changing Dynamics of Data Protection

Posted by: Michelle Liro

Frank Ohlhorst, former Executive Technical Editor for eWeek and award-winning IT expert, was our expert guest speaker this week for the webinar, “Cut Your DR Costs and Get Better Data Protection.” During his presentation, Frank reviewed why he believes that now is the time to rethink traditional approaches to disaster recovery. He explained why the total cost of ownership for disaster recovery solutions is on the rise, and why changing data protection dynamics are making it more economical to focus your time and budget on the prevention of downtime and data loss, rather than recovery.

Below is the summary of the audience questions from the Q&A portion of the webinar.

Q: You talked about how HA can give you a geographic advantage. What do you mean by that?
Frank Ohlhorst: High availability systems are designed to work with multiple servers and there’s no reason why you can’t have those servers located hundreds or thousands of miles apart. You get a geographic advantage because your data centers are in multiple places and regional areas, so if a weather-related or other event occurs, let’s say a blizzard up north with a power outage, your data center down south can pick up the slack without kicking users off the system. The same can be said about a data center located in an area with hurricanes or other natural disasters. The geographic separation gives you added protection.
When high availability is paired with load balancing, it helps to locate the data resources closer to where the users are requesting them. Let’s say you have users in Utah, it’s better performance-wise to have them talk to the data center in Nevada rather than Virginia. It helps on that level also. HA solutions also have the tools for monitoring what is going on with your users and network, to help you plan out how you should assign users to specific data centers for the most efficiency.

Q: I understand how high availability can handle unplanned downtime, but what about planned downtime? Can it help there as well?
Frank Ohlhorst: Yes. The idea is that because you have multiple active systems to meet the users’ needs, you can take one of those systems down for maintenance and have the users serviced by the active machines while you make the updates and improvements. Then when you are done, just resynchronize with the other systems, move the users over to those systems and update the rest of the servers.
Another great benefit of this is for testing upgrades and changes. So take one system offline and test your upgrades to see if they work properly before you return that system to production.

Q: If I have an HA solution in place, is back-up still necessary?
Frank Ohlhorst: 99% of the time the answer to that question is yes. It depends on what your corporate needs are. There are certain situations where HA might not deal with your catastrophe. Those are usually software-damaging events, like a virus infection, that wind up getting replicated across the system. Of course, that should really be part of your security planning to prevent events like that from even happening. With today’s security technologies, it’s pretty easy to prevent that. But if you did ever have one of those events, you do need something to roll back to, and that’s where the back-up comes into play. Ideally though, you should be preventing that type of event, because you also have the potential to lose active data if that happens. When it comes to compliance or auditing, you have to restore data relevant to that time period to meet the needs of e-discovery, compliance, accounting audits and other similar requirements. So you can’t just say, “I have HA in place, so I don’t need to back-up.”

Q: What about data de-duplication technologies, don’t they help solve this problem of managing large volumes of data?
Frank Ohlhorst: They reduce the data footprint for sure, but what we’re talking about here is availability of the data. They can certainly reduce the size of your data footprint, and you can use de-dup to speed up backups. At the end of the day though, if the system or application is not accessible to the user, then it’s not available and you haven’t met your objectives. It’s a simple matter of business logic that data de-duplication can improve performance and reduce the size of the footprint, but it doesn’t solve the problem of providing access to users during catastrophic events.

Q: Do you see continuous availability and high availability as the same, and if so, how do you differentiate between the two and the costs?
Frank Ohlhorst: There was a time when those technologies were very, very different. That was way back when we relied on expensive hardware-based solutions or appliances that provided continuous availability. High availability at that time was thought of as a method to switch from one server to another using a manual process in the case of an emergency.

High Availability technology has evolved significantly since then. Now, the two are really one and the same from a planning and software point of view. Today’s HA solutions eliminate that step of manual switchover. What you see with the vendors today is automatic HA technology that really delivers continuous availability. And the cost gap today is pretty much zero, since the technology for continuous availability and high availability has evolved to be almost one and the same.

Q: With an SRDF/S-type solution, how can we get around the fact that being geographically more separated to mitigate regional disruptions can mean slower primary system response times due to the need to remain synchronous?
Frank Ohlhorst: Let’s look at this first from the ideology of what we’re trying to do, which is business continuity. So, if you encounter a situation when you lose connectivity to a system and it’s still available at another location, then you’ve met the goal there of providing continuity. And you’re in much better shape than you would be at that point if you had a disaster recovery solution instead of a business continuity solution.

The question you have to ask yourself at that point in time is: Is reduced performance better than no performance at all? For most businesses, the answer is yes. For others, if the performance lag is significant enough it can impact business. In those cases, you’ll have to work out a way to develop geographically dispersed sites that can provide enough performance to the user sets that need access to the data. You also need to make sure that your connectivity has enough bandwidth to support your BC/HA solutions, which means the ability to replicate the data in real time across the wire. You might have to invest in larger pipes for better connectivity to support that. But again, that depends on your particular business and your needs. There is no one correct answer to this question, but the good news is that there are several solutions today that can help you solve this problem and meet the levels of availability that you need for your business.
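To put a rough number on the distance penalty raised in the question, the Python sketch below estimates the minimum latency that distance alone adds to a synchronous write. It assumes light travels through optical fiber at roughly 200 km per millisecond and that each write needs one round trip to the remote site before it is acknowledged. Real links add switching, routing, and protocol overhead on top, so treat the result as a lower bound and not a prediction for any specific product.

```python
FIBER_KM_PER_MS = 200.0  # approximate speed of light in optical fiber


def min_sync_write_penalty_ms(distance_km, round_trips=1):
    """Lower bound on latency added to each synchronous write by distance alone."""
    one_way_ms = distance_km / FIBER_KM_PER_MS
    return 2 * one_way_ms * round_trips


for km in (50, 500, 3000):
    print(km, "km ->", min_sync_write_penalty_ms(km), "ms")
```

At 500 km the floor is about 5 ms per write, and at 3,000 km about 30 ms, which is why the trade-off between separation and response time has to be weighed against what the business can tolerate.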

Disaster Recovery  Availability  Business Continuity  Continuous Availability  Data Replication  Disaster Tolerance  Fault Tolerance  High Availability  Interview  Webcast  Webinar 




Monday, April 26th, 2010 - 1:38 pm EDT

10 Common Mistakes Made by Disaster Recovery Teams

Posted by: Michelle Liro

The application availability experts here at Marathon were asked to put together "10 Common Mistakes Made by Disaster Recovery Teams" for a featured slideshow for ITBusinessEdge.com. These 10 common mistakes are summarized below:

1. Confusing HA and DR
A lot of companies confuse high availability (HA) and disaster recovery (DR), or implement a DR solution when they really need HA. Put simply, HA is about preventing the everyday failures that cause downtime (network card failure, storage corruption), while DR solutions are designed to help you recover from true disasters (floods, hurricanes), not minor problems.

2. No specific disaster recovery plan
Implementing disaster recovery software or speaking broadly about “what-ifs” is not enough. The IT team must be well versed in a set plan which has been tested and proven effective. IT staff, as well as upper-level management, should be trained in the DR protocols in the case of any business disruption. In the event of a disaster, team members should already be familiar with the plan and not rely on in-the-moment decision making.

3. Untested disaster recovery plan
While testing the plan may not mean that it will go off without a hitch, it is an important step in preparing the company for a disaster. After testing, improvements should be made and the plan should be scrutinized for any possible holes.

4. Only involving the IT team in the planning process
Disasters affect the entire business, not just your IT infrastructure. Representatives from all company departments should be involved in the planning process and should know their role in the event of a disaster. In addition, it is imperative to train company executives and decision makers in how to carry out the plan. They should be aware of all protocols and be involved in testing exercises.

5. Adding too much complexity
Many technologies actually introduce complexity into the IT environment. For example, clustering technologies may require administrators to painstakingly maintain each server in the cluster to support successful failover. IT organizations instead should find and embrace those technologies that reduce complexity for operational staff—thereby eliminating potential sources of human error.

6. Purchasing inexpensive, low-quality hardware
While it is tough to justify shelling out the extra dough for a top-of-the-line server, it is well worth it on the day that your processor fails. Many IT staffs are working with constrained budgets and therefore have to buy lower-priced equipment. This equipment is more likely to fail, increasing the likelihood of future problems.

7. Using common components in the physical network hardware
For example, dual-ported network cards share common hardware logic, and a single card failure can disable both ports. For full redundancy, you need either two separate adapters or a built-in network port combined with a separate network adapter.

8. Utilizing on-site data replication
Many factors can cause site-wide failures, including an air conditioning failure or leaking roof, a power failure, or a major hurricane. Site disruptions can last anywhere from a few hours to days or even weeks. There are two methods for replicating data across sites. One method is to tightly couple redundant servers across high speed/low latency links, to provide zero data-loss and zero downtime. The other method is to loosely couple redundant servers over medium speed/higher latency/greater distance lines. This provides a disaster recovery capability where a remote server can be restarted with a copy of the application database missing only the last few updates. In the latter case, asynchronous data replication maintains a backup copy of the database.
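As a toy illustration of the difference between the two methods, the sketch below models what the remote copy holds when the primary site fails. It is a simplification under stated assumptions: each update is one item in a list, and the asynchronous copy trails by a fixed number of updates. The function name and the lag value are hypothetical.

```python
# Toy model of the two cross-site replication methods described above.
# Assumption: each "update" is one item in a list, and the async replica
# is a fixed number of updates behind when the primary site fails.

def replica_after_failure(updates, mode, lag=0):
    """Return the updates present on the remote copy when the primary fails."""
    if mode == "sync":
        # Every update is written to both sites before it is acknowledged.
        return list(updates)
    if mode == "async":
        # The remote copy trails the primary by `lag` updates.
        kept = max(0, len(updates) - lag)
        return list(updates[:kept])
    raise ValueError(f"unknown mode: {mode}")


updates = [f"txn-{i}" for i in range(1, 11)]
sync_copy = replica_after_failure(updates, "sync")
async_copy = replica_after_failure(updates, "async", lag=3)
print("updates lost with sync: ", len(updates) - len(sync_copy))
print("updates lost with async:", len(updates) - len(async_copy))
```

The lag stands in for "the last few updates" that the loosely coupled method can lose; in practice it depends on link speed and write volume.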

9. Implementing a plan that worked for someone else
DR/HA is not one-size-fits-all. Every business has different objectives for different applications. It’s OK to look to others for guidance, but stay focused on your specific goals.

10. Not understanding business requirements
What exactly is it that you need to accomplish? Implementing wrong or incomplete solutions can waste time and money. Know what clients and users need and adjust the DR plan based on the service levels that need to be met.
 

Disaster Recovery  High Availability



Monday, March 8th, 2010 - 11:29 am EST

Best Practices for Creating Disaster Recovery Plans for Your SMB

Posted by: Michelle Liro

Marathon’s Sr. Director of Products, Michael Bilancieri, recently answered some questions for Paul Mah of ITBusinessEdge.com regarding disaster recovery planning for small & medium businesses. A few of Michael’s answers are highlighted below. For the complete Q&A with Paul Mah, see the article here.

Mah: Any tips to help SMBs with constrained budgets get management’s approval to implement a DR program?
Bilancieri: This may be the most important part of the process. Without support from the senior management team, any DR plan will be hard to get off the ground. The key takeaway here is to translate the technical language into business terms.

Since DR is not primarily about the technology (it is about the business value), it is important to clearly express what downtime means in terms of revenue loss. By creating a chart, organized by each application, it is easy to clearly articulate how much revenue is lost across each application for a certain amount of time.
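A minimal sketch of such a chart, assuming you already have a revenue-per-hour estimate for each application. The application names and dollar figures below are invented placeholders.

```python
# Hypothetical revenue-per-hour figures; replace with your own estimates.
revenue_per_hour = {"Order entry": 12000, "Email": 1500, "Payroll": 400}
outage_hours = [1, 4, 24]


def downtime_cost_table(revenue, hours):
    """Return {application: [lost revenue for each outage length]}."""
    return {app: [rate * h for h in hours] for app, rate in revenue.items()}


table = downtime_cost_table(revenue_per_hour, outage_hours)

# Print one row per application and one column per outage length.
header = "Application".ljust(14) + "".join(f"{h} hr".rjust(12) for h in outage_hours)
print(header)
for app, costs in table.items():
    print(app.ljust(14) + "".join(f"${c:,}".rjust(12) for c in costs))
```

A straight multiplication like this understates the real cost for some applications (lost customers, penalties), but it gives management a number to react to.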

Mah: What are the best criteria for determining an optimal disaster recovery plan?
Bilancieri: First, you have to identify what it is you need to accomplish. This includes defining your recovery time objective (RTO), which is the amount of time applications can be unavailable, and your recovery point objective (RPO), which is the amount of data that can be lost when a recovery is required.

Keep in mind that these values will likely vary for each of your different applications. Implementing incorrect or incomplete solutions will result in wasted time and resources. Check with your users and clients to determine their requirements and any service level agreements that must be met.
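One way to keep per-application objectives honest is to write them down and check candidate solutions against them. The sketch below does that under a few assumptions: RPO is expressed as minutes of lost data rather than a data volume, and every application name, objective and solution figure is a hypothetical placeholder.

```python
from dataclasses import dataclass


@dataclass
class Objective:
    rto_minutes: float  # longest the application may be unavailable
    rpo_minutes: float  # largest window of data that may be lost


def meets(objective, recovery_minutes, data_loss_minutes):
    """True if a solution satisfies both the RTO and the RPO."""
    return (recovery_minutes <= objective.rto_minutes
            and data_loss_minutes <= objective.rpo_minutes)


# Hypothetical applications and solution characteristics.
objectives = {
    "Order entry": Objective(rto_minutes=5, rpo_minutes=0),
    "File shares": Objective(rto_minutes=240, rpo_minutes=60),
}
nightly_backup = {"recovery_minutes": 480, "data_loss_minutes": 1440}
sync_mirror = {"recovery_minutes": 1, "data_loss_minutes": 0}

for app, obj in objectives.items():
    for name, solution in (("nightly backup", nightly_backup),
                           ("synchronous mirror", sync_mirror)):
        verdict = "meets" if meets(obj, **solution) else "misses"
        print(f"{app}: {name} {verdict} the objectives")
```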

Mah: Once you determine exactly what your needs are, how do you select a plan?
Bilancieri: DO YOUR HOMEWORK! Seriously, there are so many different products that claim to be “DR” solutions, all approaching the problem from different angles, it can be very confusing to determine what actually does the job you are looking for it to perform. As you research different products to implement as part of your DR plan, be sure to ask specifically what their product does (copies just the data, takes data snapshots, captures complete images of the full system, etc.) and don’t be afraid to ask probing questions.

Many vendors make the same claims using the same terms but actually deliver very different results. If you are going to test these solutions in-house, which is recommended, try to do the test under similar conditions as your production environment, with similar system and application loads. Oftentimes, something works well in a test environment [where there is] no real processing happening, [but] fails to function adequately once deployed in the live production environment.

Mah: What would a DR plan look like for a company that may face natural disasters such as hurricanes and flooding?
Bilancieri: Since hurricanes and floods can cause severe damage that can result in long-term outages, it would be wise to implement a solution that protects your systems between locations that could not be affected by the same disaster. Ensure that the backup, or DR, site is planned for a location that can be readily accessible by your users and clients should the primary location be destroyed or otherwise inaccessible.

Marathon has a customer based in Georgia, The Sullivan Group, which implemented a disaster recovery plan just for this reason. The team decided to virtualize its data center with Citrix XenServer and implement Marathon's everRun VM solution to provide redundant virtual machines and synchronized mirroring of the entire system including network, applications and data. The Sullivan Group has a small IT staff but needs to be continuously available for their clients, so they needed a solution that was fully automated and offered simple implementation.

Their first step was to identify what their customers’ needs were - and they decided that they needed continuous protection. Second, the team determined exactly what they could afford, and the ROI they would see from implementing DR software. They already knew that they would constantly face the threat of storms, and that they needed their data to be backed up in a remote location. Finally, they determined exactly what solution their IT staff could support and decided exactly which business applications needed to be fully available.
 

 

Disaster Recovery  High Availability  Interview



Wednesday, February 3rd, 2010 - 4:38 pm EST

Top 5 Tips for Branch Office Application Availability

Posted by: Michelle Liro

Keeping your applications “always-on” for users is no easy task, and can be particularly tricky for branch or remote locations where you probably have little or no IT staff to support your efforts. Forrester Research senior analyst Stephanie Balaouras has been studying this trend and has pulled together the top 5 best practices for supporting application availability at remote and branch locations. She presented these during a webinar last month and we've also summarized them below.


TIP #1 – Don't Overlook Remote Location Availability

While this may seem like an obvious point, it’s actually very common for IT departments to overlook their branch and remote locations when it comes to application availability. You can’t neglect these offices in either your high availability (HA) or disaster recovery (DR) plans. You need a holistic approach to protect all of your business applications, no matter where they are located. This also means that you need to factor in these systems when planning your IT budget.

According to recent Forrester Research data, IT systems at remote and branch office locations account for more than 20% of your total infrastructure. They are critical to your business process and operations. Today, a lot of these locations don’t have HA or DR, and in some cases, they don’t even have basic back-up. Make sure that these offices and locations aren’t forgotten as part of your HA and DR plans.

TIP #2 – Classify Systems by Criticality

When developing your strategy for operational HA and DR, best practices include performing a business impact analysis. This doesn’t have to be a lengthy process. You just need to map the dependent systems for each business process, and then create a rough estimate of the cost of downtime for each. Once you have that information, you can determine availability rates as well as recovery objectives. As part of that process you should also identify the most probable types of downtime. When you put that all together, you can classify systems by criticality, such as mission critical, business critical, business supporting, etc., and you can then determine the availability rates needed for each of those systems.

TIP #3 – Develop Tiers of Service for Availability

Once you understand your range of recovery objectives, it helps to have an IT availability and service continuity catalog. This catalog defines a range of service tiers. Forrester typically sees four levels: mission critical, business critical, business important and business supporting. Each of these tiers has associated recovery objectives, technology pre-requisites and the costs to deliver that service. This catalog helps to simplify your strategy, by allowing you to assign appropriate tier classifications to new systems quickly and easily.

Another benefit of using this method is that it also helps you to limit the number of point products you are using for HA and DR. The more point products you are using, the more you complicate the sequencing and complexity of preventing a failure or recovering from a failure. Keep it simple. Every time you deploy a new application or system, assign a tier from your catalog, put the appropriate protection in place, and then communicate that to the business.
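A service catalog like this can be as simple as a small lookup table. The sketch below uses the four tier names mentioned above, but every threshold and recovery target is a made-up placeholder rather than a Forrester recommendation, and the function name is hypothetical.

```python
# Tier names follow the four levels named above; every number is a
# made-up placeholder, not a Forrester recommendation.
CATALOG = [
    # (tier, minimum downtime cost per hour, target RTO in hours)
    ("mission critical", 50_000, 0.25),
    ("business critical", 10_000, 4),
    ("business important", 1_000, 24),
    ("business supporting", 0, 72),
]


def assign_tier(cost_per_hour):
    """Return (tier name, target RTO in hours) for a new system."""
    for tier, min_cost, rto_hours in CATALOG:
        if cost_per_hour >= min_cost:
            return tier, rto_hours
    raise ValueError("cost_per_hour must not be negative")


for system, cost in (("Point of sale", 60_000), ("Intranet wiki", 500)):
    tier, rto = assign_tier(cost)
    print(f"{system}: {tier}, recover within {rto} hours")
```

Keeping the catalog in one place means each new system gets a tier from the same short list instead of a one-off protection scheme.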


TIP #4 – Measure Availability from the End-User Perspective

Well-written objectives measure both planned and unplanned downtime and also take into account the timing of downtime. For example, you don’t take your systems down for planned maintenance during peak sales periods or at 1pm on a weekday when your traffic is at its highest level. You select times when users will be least affected. Availability isn’t about the individual IT system, infrastructure or component. Technology uptime is important to track but is not a true measure of availability. True availability has to be measured from the end user perspective. If the application or service is not available for use, even if the individual components are functioning, then that means the service is down. When making decisions about HA and DR strategies, you have to look at availability from a people perspective, not a technology perspective.
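As a worked example of counting downtime the way end users experience it, the sketch below adds planned and unplanned outages together before computing an availability percentage. The month length and outage figures are hypothetical.

```python
def availability_pct(period_hours, outages):
    """Availability as users see it.

    outages: list of (hours, planned) tuples; planned downtime counts
    against availability just like unplanned downtime.
    """
    down = sum(hours for hours, _planned in outages)
    return 100.0 * (period_hours - down) / period_hours


# Hypothetical 30-day month with one maintenance window and one failure.
month_hours = 30 * 24
outages = [(2.0, True), (1.6, False)]
print(f"End-user availability: {availability_pct(month_hours, outages):.2f}%")
```

Counting only the unplanned outage would report a better figure, but users were locked out during the maintenance window too.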


TIP #5 – Make Availability Part of Every IT Decision

Availability is no longer an optional practice. It’s essential. It’s something you owe to your employees, your customers, your partners and your investors. Application resiliency has to be part of the planning process right from the start; HA and DR should not be an afterthought. Even in remote and branch locations, these applications are critical to the success of the business, so availability of the systems should be included during the planning phases of the project, rather than added on after the project is completed.

 

Availability  Disaster Recovery  High Availability



Monday, September 21st, 2009 - 9:40 am EDT

Q&A: Windows Server High Availability

Posted by: Michael Bilancieri

Thanks again to those who joined us for last week’s webinar, "Windows Server 2008 High Availability: Technology Comparison." The on-demand recording of last week's webinar is now available to watch at your convenience (here).

We had a lot of good questions from our attendees during the Q&A portion of the webinar, which are summarized below.

Q: How do you determine when to use an HA solution vs. a DR solution?
When it comes to availability vs. recovery, the most important question to ask is: what are your recovery time objectives (RTO)? How long can your application afford to be down? If the applications have strict requirements, then you want an availability solution. Disaster recovery is data replication, oftentimes with a failover capability; it is not availability. For critical applications, this may not be sufficient.

Q: If I have an HA solution in place, do I still need a solution for backup?
Availability and backup are two different things. That question comes up a lot, along with the need for disaster recovery. Backup will likely never go away completely. You still need to back up your data to ensure recovery in the future should that be necessary.

Q: Is everRun available for Linux applications?
Yes. We can provide basic failover capabilities for Linux applications today.

Q: How does everRun differ from replication solutions?
everRun 2G is used for availability, both locally and across short-distance geographic separation. We also have a replication and recovery solution that can be used for disaster recovery over long distances. You should determine what your objectives are: do I have to keep my applications up and running, or do I just need to recover them if something fails? What’s the recovery time objective for each application? It depends on your individual applications and what level of protection you need for each. Oftentimes availability is a priority because downtime is not acceptable, with DR also a requirement on top of that to ensure recovery in the event of a major outage.

Q: Can everRun be used for planned downtime (i.e. to keep one host running for end-users while the application on the other host is being upgraded)?
Yes, everRun can be used to help facilitate certain system updates to reduce interruptions and mitigate risk.

Q: Can it work between two virtual machines and on x64 based systems?
Yes, we support XenServer and 64-bit hardware and Windows Server environments.

Q: What is the performance impact of using everRun 2G?
That’s variable depending on your application. It can be anywhere from 3-15%. We’ve done some performance testing specifically on XenApp and Exchange. You can download those white papers here:
Understanding and Characterizing Performance Implications for Running Exchange 2007 with everRun
XenApp 5.0 High Availability Performance

Q: Does Marathon offer backup solutions for everRun users?
We have methods to back up your systems and we’re working on improving our current offerings to make them quicker, easier and more granular.

Q: Can everRun work with dissimilar hardware? Can everRun work with more than two servers?
From a server standpoint, you just need similar processors; storage does not need to be similar. You can have SAN on one side and NAS on the other or any other combination. On the second question, yes, everRun will work with more than two servers. You can build a pool of servers and protect within that pool.

Q: Does everRun have backward compatibility with older OS?
Yes. It will work with Windows Server 2003, and also Windows Server 2008.

Q: Can everRun run on the Foundation Server Edition of Windows 2008?
It does not. everRun supports the full implementation of Windows Server 2008. everRun runs underneath Windows, it does not install into Windows.

Q: How does everRun handle data stored on NAS?
Storage is transparent to everRun. We look at storage as just a LUN.

Q: What is the difference between everRun HA and everRun 2G in Windows Server 2003?
The differences are the ability to create multiple workloads. HA can protect one workload. everRun 2G can protect multiple workloads. There is also a new and improved graphical interface with better reporting and management capabilities.

Q: Does everRun work with XenServer 5.5?
Yes, everRun works with XenServer 5.5.

Q: Are there any changes in WS 2008 & WS 2008 R2 in the way that HA improves?
Yes. You can find an overview of those changes directly from David Hanna of Microsoft in our recent webinar and white paper “The Top 10 Reasons to Upgrade to Windows Server 2008.” You can also read the Q&A with Microsoft from that webinar here.

Q: Is everRun 2G available for Microsoft Hyper-V?
We will provide support for Hyper-V in a future release.

Q: With applications using various DNS names, how does this solution integrate with DNS changes? (failover to remote office for true DR-different IP/network)
everRun availability solutions pair systems within the same subnet or VLAN, eliminating the need to make any DNS changes.

Q: What permissions are needed to do a recovery? For recovery in Active Directory, most changes need to replicate around the domain, and we do not want to hand out admin control over the domain (separation of access).
everRun is designed to not require any changes to Active Directory during or after a failure or recovery.

 

Availability  Continuous Availability  Data Replication  Disaster Recovery  EverRun  Fault Tolerance  High Availability  Marathon  Webcast  Webinar  Windows



Tuesday, March 3rd, 2009 - 3:07 pm EST

Q & A for the February Webinar: Practical, Affordable High Availability and Disaster Recovery for a Tough Economy - Featuring Forrester Research

Posted by: Michael Bilancieri

We had a lot of great questions during the Q & A session of our February webinar with Stephanie Balaouras of Forrester Research. We’ve posted the questions and responses here on our blog for everyone’s benefit.

Questions from the webinar:

Q: In the architecture two "mirrored" VMs are shown which are connected. Does that mean that you have to install 2 application VM servers or do you have to install just one and Marathon makes the second?
A: You only need to create one application VM. After this is created, you can use everRun to protect that application. As part of the protection process, everRun creates a “cloned” instance of the application on the second host. The instance is completely identical to the original, with the same identity, MAC address, resources, etc. It is this redundancy created by everRun that protects the applications.

Q: In the Marathon license there is HA and FT. In which are the levels 1-2-3 available?
A: Levels 1, 2, and 3 are available in a single solution called everRun VM and any level of protection can be enabled on a VM. everRun VM level 3 protection will be available in Q2.

Q: The licensing question you just answered seems different from what you used previously. You previously only had to license the VMs OS in a fully protected system. Please explain.
A: Microsoft licensing requires a valid Windows license for each side of the protected VM. Using Enterprise Edition can reduce the number of licenses required. Please refer to Microsoft licensing terms for specific details for your environment.

Q: How does the software communicate between disparate storage NAS to DAS, SATA to Fibre Channel?
A: everRun does not require matching storage across hosts. Communication between hosts is done through Availability Links (A-Links), which are private networks between each host. everRun handles the mirroring at the host level, passing I/O through XenServer to write to the disks. The type of disk or connection is not relevant.

Q: How does this compare to VMware's SRM & VDM products?
A: VMware SRM provides a mechanism to restart a VM on an alternate host, however it relies on other storage mirroring solutions (often within the storage system) to perform the mirroring. SRM does not move data or provide a comprehensive HA or FT solution.

Q: Is the product host based or a fabric based solution?
A: everRun VM is a host based solution, with a minimum of 2 hosts required.

Q: Do you need to keep a warm copy of the applications at the DR site?
A: During the protection process, everRun takes the chosen VM and clones it to the designated secondary host. This creates a complete and identical instance on the secondary host. everRun maintains these two synchronously so that they are always identical. everRun’s unique architecture exposes these two mirrored instances as a single entity; there is no need to install, manage, or update both sides, only the one single instance of the OS/application. Should the entire ‘primary’ host fail, the ‘secondary’ host will immediately start the cloned version. It comes up with the same IP address, hostname, and MAC address as the primary so that there are no client-side, DNS, Active Directory, or other infrastructure changes required.

Q: Is the DATA synchronous like SRDF or near synchronous?
A: everRun performs synchronous mirroring of the entire Windows environment, including the OS, application, and data.

Q: How does this compare to products like RecoverPoint/Replistore, InMage, Neverfail, Falconstor etc?
A: These products are disaster recovery products intended for long-distance asynchronous data replication and failover. everRun availability solutions provide true availability in a comprehensive and automated manner. Marathon also offers DR solutions for long-distance protection. Disaster recovery and availability address different needs in most cases and should generally be considered separately. They are complementary more than competing solutions.

Q: What is the software support plan? What are the recurring costs for your product year to year?
A: We offer a Premier support plan or a Basic support plan. The only recurring cost year to year is the cost of support.

Q: What are the operating system requirements, how many copies of the OS do you need?
A: Each Windows environment is mirrored to a secondary host, requiring a second Windows license. Using Enterprise Edition of Windows allows for fewer licensed copies. Please refer to your Windows licensing terms for specific requirements.

Q: Regarding the 10ms sync time, what happens if that time increases to say 20ms due to network traffic?
A: If the latency increases beyond our requirement the paired systems may assume that one system is down and redundancy may be lost. In a properly configured environment the application should remain running while the secondary system is no longer maintained in a redundant fashion. Once the latency returns to within spec, the systems will re-sync automatically and return to a fully redundant state. Typically the application is not impacted.
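The behavior described in this answer can be pictured as a simple timeline. The sketch below labels a series of latency samples with the state the pair would be expected to be in. The 10 ms limit comes from the question; the sample values, the state names, and the assumption that a re-sync occupies exactly one sample are invented for illustration and do not describe everRun internals.

```python
# Assumption: the 10 ms limit comes from the question above; the samples,
# state names and one-sample re-sync are invented for illustration.
def redundancy_timeline(samples_ms, limit_ms=10.0):
    """Label each latency sample with the expected state of the pair."""
    timeline = []
    degraded = False
    for latency in samples_ms:
        if latency > limit_ms:
            # Out of spec: redundancy may be lost, the application keeps running.
            degraded = True
            timeline.append("redundancy lost, application still running")
        elif degraded:
            # First in-spec sample after a degraded period: systems re-sync.
            degraded = False
            timeline.append("re-syncing")
        else:
            timeline.append("fully redundant")
    return timeline


for latency, state in zip([4, 8, 20, 22, 9, 5],
                          redundancy_timeline([4, 8, 20, 22, 9, 5])):
    print(f"{latency:>3} ms: {state}")
```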

Q: What are the bandwidth requirements?
A: Best practices call for a 155MB link between the two hosts. For local systems, a simple crossover cable between the two systems is sufficient. When separating the systems, the 155MB requirement becomes more relevant. This number can vary depending on the applications being protected and the amount of data being managed.

Q: Do you have instances of numerous geo-available solutions with specific applications?
A: Here are two examples:
MAN AG success story with everRun SplitSite
Chester County, PA success story with SplitSite

Q: Is Windows Server 2008 VM supported? If not, why?
A: Windows Server 2008 64-bit will be supported in Q2 of this year.

Availability  Data Replication  Disaster Recovery  EverRun  EverRun VM  Marathon  VMware  Webinar  XenServer



Friday, February 13th, 2009 - 1:49 pm EST

Q & A with Stephanie Balaouras of Forrester on High Availability

Posted by: Melanie Stec

On February 24th, we’re going to be doing a webinar featuring Stephanie Balaouras, Principal Analyst at Forrester Research and co-author of the report, X86 Server Virtualization for High Availability and Disaster Recovery. Stephanie was good enough to sit down with us to answer a couple of questions we had before the webinar.

Q: Stephanie, can you give us the 10,000 ft. explanation of why server virtualization is a good alternative for high availability and disaster recovery?

A: In a nutshell, server virtualization facilitates a rapid — or even automatic — restart of applications after an IT failure, and when used in conjunction with data replication between data centers, it can restart applications at a recovery site following a primary site failure. In particular, x86 server virtualization can improve the availability of business-critical systems that are important to the business but not critical enough to warrant the investment in expensive and complex resiliency technologies like fault-tolerant hardware or clustering.

Q: You had mentioned that Forrester is seeing increased customer interest in active-active strategies for high availability. Is that just in Fortune 500 companies or is the interest broader than that?

A: Active-active isn’t just for the largest of companies. Companies of all sizes are under increasing pressure to improve their recovery capabilities but at the same time, they are under pressure to reduce costs and achieve greater operational efficiencies. Companies need an alternate site so they can failover critical business operations in the event of a primary site failure. Given the necessary investment, an alternate data center simply can't remain idle waiting for some disaster to occur. Companies must determine ways to maximize this investment to improve business operations, accelerate growth, or elevate availability.

Q: What’s changed that is driving the greater interest in active-active for HA?

A: There are a couple of reasons why there is a growing interest in active-active strategies. First, as I mentioned, most companies are under increasing pressure to improve recovery objectives. In fact, most companies that I speak with have recovery time and recovery point objectives measured in hours, not days. To achieve this type of recovery, today you need to have dedicated infrastructure (servers, storage etc.) at the alternate site.

In the past, many companies might have turned to a DR services provider for their needs. For cost reasons, they subscribed to shared infrastructure services. Because the infrastructure is shared, recovery is limited to recovery of system configurations and data from tape, which means that the best-case scenario for recovery is 24 to 48 hours. As a result, many companies are bringing DR “back in-house” and making the business case with better recovery objectives and the ability to use the investment in the alternate site for multiple purposes.

Thanks Stephanie, we’ll see you on the 24th. Want to hear more from Stephanie? View her posts on the Forrester Blog for IT Infrastructure & Operations Professionals.

Register here for the webinar featuring Forrester Research.

Availability  Clustering  Data Replication  Disaster Recovery  High Availability  Virtualization  Webinar



Tuesday, December 30th, 2008 - 12:19 pm EST

Healthcare: An Industry Looking to Use Server Virtualization for High Availability and Disaster Recovery

Posted by: Gary Phillips

For healthcare organizations and their IT departments, almost everything is mission critical, from patient information to registration systems and records management. Information needs to be readily available and data has to be protected at all times to avoid compliance risk or calamitous consequences.

From what we’ve seen, the interest in virtualization for high availability and disaster recovery is driven by two key factors: cost savings and greater demand for 24x7 availability of health records. Like so many organizations in this tough economy, health care providers are under tremendous pressure to deliver the same quality services at lower cost. Using server virtualization for server consolidation can help. And the VMotion and XenMotion capability in VMware and XenServer respectively can provide these organizations with DR that is significantly easier to deploy and execute. On top of XenServer they can add everRun VM for fault tolerant, high availability protection that is much more affordable and practical than what they have had in the past.

Testament to the increased interest in virtualization from healthcare organizations comes from our own experiences here at Marathon. We’ve seen a positive uptake in healthcare customers who are deploying everRun VM to protect their virtual environments. Currently, about 30% of new customers in Marathon’s sales pipeline are in the healthcare space. We can only assume that the number of healthcare customers we service will continue to grow as we venture into 2009.

The changes these organizations are making are allowing them to stay ahead of the competition as they increase efficiency, ensure the availability of patient records and most importantly set the standard for inpatient and outpatient care.

Are you part of a healthcare organization that is starting to deploy server virtualization? Is more effective HA and DR a key goal?

Availability  Citrix  Disaster Recovery  EverRun  EverRun VM  Fault Tolerant  Healthcare  High Availability  Marathon  Virtualization  VMware  XenServer



Thursday, December 4th, 2008 - 10:49 am EST

Exchange 2007 and the Virtualization Opportunity

Posted by: Jerry Melnick

While most companies using Microsoft Exchange still use Exchange 2003, Exchange 2007 offers a new, more flexible architecture with real benefits worth looking at. This new architecture is based on server roles. All services and features are organized around five distinct server roles: Mailbox, Client Access, Hub Transport, Unified Messaging and Edge Transport. The big advantage of this approach is that you only have to deploy the roles that are needed, and multiple copies of a role can be deployed for enhanced availability, DR and performance.

When Exchange 2007 is run in a virtual server environment each role can be implemented as a separate virtual machine. Individual services can be easily matched to resource requirements by selecting the number and location of the virtual machines implementing each service to be started. The number, location and configuration of these virtual machines can be dynamically adjusted as usage requirements change over time. Infrastructure components that support the Exchange environment, including Active Directory, DNS and DHCP that have traditionally required separate servers and distinct availability solutions, can now be implemented as virtual machines in a common resource pool and leverage the common availability solution that is used to address the entire virtualization environment.

Virtualization also makes disaster recovery easier to implement, more effective and less costly. Virtual machines separate the software configuration from the underlying hardware, which provides total flexibility in the hardware required for the disaster site. One set of hardware can provide disaster backup for multiple applications, and cost-effective configurations can be chosen strictly based on their disaster recovery role. Software configurations change over time, and those changes must be duplicated at the disaster site to ensure proper operation. This can be extremely time-consuming and error-prone in a physical environment. In a virtual environment, the configuration is contained within the virtual machine definition file. Simply copying this file to the disaster site is all that is needed to maintain configuration compatibility.
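To make the "copy the definition file" idea concrete, here is a minimal Python sketch. It is an illustration only, not a Marathon or hypervisor tool: the function names, the file name and the idea that the disaster site is reachable as a directory (for example, a mounted share) are all assumptions. It copies the file only when the disaster-site copy is missing or different, and verifies the copy with a checksum.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sync_definition(source: Path, dr_dir: Path) -> bool:
    """Copy a VM definition file to the DR directory if it is missing or differs.

    Hypothetical helper: returns True when a copy was made and False when
    the DR copy was already identical to the source.
    """
    dr_dir.mkdir(parents=True, exist_ok=True)
    target = dr_dir / source.name
    if target.exists() and sha256_of(target) == sha256_of(source):
        return False
    shutil.copy2(source, target)
    # Verify the copy so a partial transfer is not mistaken for a good one.
    if sha256_of(target) != sha256_of(source):
        raise IOError(f"copy of {source} to {target} failed verification")
    return True
```

Running something like `sync_definition(Path("exchange-mailbox.cfg"), Path("/mnt/dr-site"))` on a schedule would keep the disaster-site copy current. Both paths are placeholders.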

So how many of you have taken the important step of moving to Exchange 2007? If you haven’t deployed 2007 yet, are you planning to? We would love to hear from you. If you have a minute, please take the poll to the left and tell us your plans. If you have deployed it, are you taking advantage of the virtualization benefits? Leave us a comment and share your thoughts.

Availability  Disaster Recovery  Exchange  Virtual Machine  Virtualization



Wednesday, November 12th, 2008 - 7:51 am EST

Virtualizing Exchange Webinar Q & A

Posted by: Brian Mullins

Yesterday, Matt Fairbanks, VP Product Marketing, Citrix, and Jerry Melnick, CTO, Marathon, presented the webinar “Virtualizing Exchange – The Cold, Hard Numbers on Why Citrix XenServer + everRun VM is the Best Platform.” Below are a few of the questions participants asked, along with Jerry’s response to each:

Q: What happens in a case of a split brain scenario?

Jerry: In our SplitSite products, we have what we call a quorum services capability – it’s actually an additional component that’s added on to manage split brain and arbitrate when you lose all connections between the two machines.
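For readers new to the idea, the arbitration Jerry describes can be sketched in a few lines of Python. This is a generic illustration of quorum-style tie-breaking, not Marathon's implementation: the `Arbiter` class, the `may_stay_active` function and the site names are all hypothetical. When the two machines lose contact with each other, only the one that wins a single lock from a third party keeps running.

```python
from typing import Optional


class Arbiter:
    """Hypothetical third-party quorum service that grants one lock."""

    def __init__(self) -> None:
        self._owner: Optional[str] = None

    def request_lock(self, node: str) -> bool:
        """Grant the lock to the first requester; refuse everyone else."""
        if self._owner is None:
            self._owner = node
        return self._owner == node


def may_stay_active(node: str, peer_reachable: bool,
                    arbiter: Optional[Arbiter]) -> bool:
    """Decide whether a node may keep serving after a connectivity change.

    Pass arbiter=None when the node cannot reach the quorum service.
    """
    if peer_reachable:
        # The links are intact, so the pair can coordinate directly.
        return True
    if arbiter is None:
        # Cut off from both the peer and the arbiter: stop rather than
        # risk two active copies.
        return False
    return arbiter.request_lock(node)
```

With both links down, `may_stay_active("site-a", False, arbiter)` succeeds for whichever site asks first, and the other site is told to stand down.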

Q: How long does it generally take to set up XenServer with everRun VM to create this kind of a solution?

Jerry: Citrix people have always mentioned “Ten minutes to Xen” which is a pretty good rule of thumb. We say it’s another ten minutes to add the Marathon software. It’s a simple script that gets run on each host, and then you’re off and ready to protect the machines. The actual protection process itself is really a matter of a minute. The simplicity and ease have never been seen before in this industry with this class of availability solution.

Q: In terms of limitations, and based on customers that have deployed this kind of technology, are there any things you would counsel people to consider when setting up XenServer and everRun in the most highly available and robust way?

Jerry: With our system, we provide best practice guidelines for configuring networks, availability, etc. One of the beauties of our technology, working in conjunction with XenServer, is that once everything is installed and running, we put everything into an active validation mode so that we know components are configured properly. If something is misconfigured or isn’t running redundantly, you’re going to see the status and receive a warning. A key benefit of this system is that you will know how to fix it before there are any problems.

There are many cases in availability systems where you have simple failover technologies: you take an error, you fail over, you get to that resource, and then you find out the network or disk isn’t working because it wasn’t configured properly. With this active validation capability and its status reporting, the status is monitored in a simple and reliable fashion, so you know when you’re redundant and how you’re going to manage failures.
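As a rough illustration of what "active validation" means in practice, the Python sketch below checks that every component has enough healthy redundant paths and reports a warning for any that do not. It is a simplified stand-in, not the everRun feature itself: the function name, the component names and the two-path minimum are assumptions made for the example.

```python
from typing import Dict, List


def validate_redundancy(components: Dict[str, List[bool]],
                        minimum: int = 2) -> List[str]:
    """Return one warning per component with too few healthy paths.

    `components` maps a component name to the health of each of its
    redundant paths (True means healthy). An empty result means every
    component is currently redundant.
    """
    warnings = []
    for name, paths in sorted(components.items()):
        healthy = sum(1 for ok in paths if ok)
        if healthy < minimum:
            warnings.append(
                f"{name}: {healthy} of {len(paths)} paths healthy, need {minimum}"
            )
    return warnings


status = {"network": [True, True], "disk": [True, False]}
print(validate_redundancy(status))
# prints ['disk: 1 of 2 paths healthy, need 2']
```

The point of running a check like this continuously, rather than at failover time, is that the broken disk path shows up as a warning while the primary is still healthy.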

Thanks to everyone who attended. For those who didn’t have the chance to attend or ask questions, please feel free to leave them in the comments section and we will do our best to answer them.

Citrix  Disaster Recovery  EverRun VM  Exchange  Marathon  Webinar  XenServer



Wednesday, October 15th, 2008 - 7:45 am EDT

How Midsize Companies Can Get Practical Business Continuity and Disaster Recovery Using Server Virtualization

Posted by: Brian Mullins

On October 21 at 10:00 a.m. EST, our CTO Jerry Melnick will be a featured presenter at the 2008 NorthEast Disaster Recovery Information X-Change (NEDRIX). Jerry’s presentation, Better Business Continuity and Disaster Recovery through Virtualization, will help attendees learn how and why server virtualization done right can:

• Make disaster recovery planning and execution much easier
• Simplify the notoriously difficult process of high availability maintenance
• Deliver high availability protection tailored for each application

Are any of you currently using virtualization for business continuity or disaster recovery? If so, what have your experiences been like thus far?

This year’s conference will take place from October 20-22 at the Hyatt Goat Island in Newport, RI. For more information about the event and how you can register, please visit NEDRIX’s website.

Business Continuity  Disaster Recovery  Events  High Availability  Virtualization



Wednesday, September 10th, 2008 - 1:41 pm EDT

Understanding Disaster Recovery & High Availability

Posted by: Michael Bilancieri

This afternoon I was fortunate enough to lead the “Breaking Through the Confusion About Disaster Recovery and High Availability” Webinar. I would like to thank everyone that attended and give a special thanks to Alex Jarret from the Technology Executives Club for hosting the event.

Unfortunately there was a minor error towards the end and participants did not have the opportunity to send me their questions, except for one individual who asked if I could provide them with the presentation. In response, I’ve made the presentation available in PDF format which can be downloaded here.

If you attended the Webinar and have questions you haven’t yet had a chance to ask, or if new questions arise while reviewing the presentation, please feel free to email them to me directly at MBilancieri[at]marathontechnologies.com and I will do my best to answer them.

Disaster Recovery  High Availability  Webcast  Webinar



Wednesday, August 27th, 2008 - 6:23 am EDT

Breaking Through the Confusion about Disaster Recovery and High Availability

Posted by: Michael Bilancieri

Virtually every company we talk to needs both disaster recovery solutions to recover their systems and data after a major disruption, and high availability to keep key applications always available. In my discussions with companies considering our everRun software, I’ve heard a lot of them say that they are confused by many vendors’ claims and counter-claims for DR and HA. One of the biggest sources of confusion is that some vendors with solid products for disaster recovery are trying to pass off their DR solutions as reliable HA solutions. If the feedback I’m getting is any indication, these DR solutions posing as HA solutions just don’t work.

It’s not hard to see why a DR solution doesn’t make a good HA solution. With a product that is good at DR, in most cases getting the data across to the other location is pretty straightforward. But when you try to use the same solution to get both the application and the data across to use it for HA, well, that’s where it breaks down. Let’s look at why.

A good DR product is usually fairly easy to set up for data replication to another site. But setting up the same product to restart the whole thing, application and data, when a failover occurs is complex and prone to errors. To set it up, you have to script all the pieces to make it happen: fault detection, client redirection to the DR site, application reset, and the list goes on. No wonder we so often hear that scripted-DR-for-HA doesn’t work consistently; there are too many moving parts that have to be managed and monitored. In addition, no matter how minor a failure is, failover to the remote site is required. Not every failure you face is a disaster; therefore each failure should not be treated as one. Based on these horror stories, we thought it was a good idea to put together this webinar, Breaking Through the Confusion about DR and HA. I hope to help you better understand when, how, and why DR is the best fit to meet your requirements, when to use an HA solution and how to combine the two for optimal protection.
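To see why scripted failover is fragile, consider this minimal Python sketch of a failover runner. It is purely illustrative: the `run_failover` function and the step names are hypothetical, and real steps would call out to monitoring, DNS and application tooling. Every step is a separate piece of hand-written logic, and the first one that fails leaves the failover incomplete.

```python
from typing import Callable, List, Tuple

# A step is a label plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_failover(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run scripted failover steps in order and stop at the first failure.

    Returns (succeeded, log). Steps after a failure are never attempted,
    which is how a single weak link stalls the whole recovery.
    """
    log: List[str] = []
    for name, action in steps:
        try:
            ok = action()
        except Exception as exc:
            log.append(f"{name}: error ({exc})")
            return False, log
        if not ok:
            log.append(f"{name}: failed")
            return False, log
        log.append(f"{name}: ok")
    return True, log


steps = [
    ("detect fault", lambda: True),
    ("redirect clients", lambda: False),   # simulated misconfiguration
    ("reset application", lambda: True),
]
print(run_failover(steps))
# prints (False, ['detect fault: ok', 'redirect clients: failed'])
```

In the simulated run, the application reset is never reached because client redirection failed first, which is the kind of inconsistency the horror stories describe.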

Interested? You can register here.

Disaster Recovery  EverRun VM  High Availability  Marathon  Virtualization  Webcast



Wednesday, July 30th, 2008 - 11:56 am EDT

Preventing Disaster Rather than Recovering from It

Posted by: Michael Bilancieri

We all like to think that we will be prepared in the event of an emergency, or a disaster. Hospitals exist if we fall sick; fire stations surround us if flames break loose; we are constantly preparing so if a catastrophe strikes, we are ready.

Preparing for a system disaster is no different. However, how to go about preparing for an event like this can be confusing. There are many options out there when it comes to protecting your system, each best suited to specific requirements. Unfortunately, many vendors use terms like disaster recovery and high availability interchangeably to describe their solutions when in fact they are usually designed for one or the other.

Disaster Recovery (DR) is the way to recover applications and data from a system failure. DR is a reactive solution: if a failure occurs, IT relocates the data, rebuilds the system, and brings everything back up to working order. This takes time, a precious commodity that businesses relying on critical applications typically don’t have. In addition, recovering applications could bring about a number of side effects which you really don’t want to endure every time some minor failure happens.

But what if I could tell you that instead of worrying about how to recover from a computer system failing, you could simply prevent it from occurring at all?

Disaster tolerance (DT) is a proactive way to prevent system failure from impacting application and data availability. A disaster tolerant solution isn’t going to recover the data if there’s a disaster. Instead it will tolerate the fault if a disaster occurs – keeping an organization’s critical applications up and running at all times. It is not recovery, but rather prevention. And with solutions like our everRun SplitSite, separate servers don’t even need to be in the same building – they can be up to 100 miles apart with fault-tolerant protection between the two locations.

DR solutions are good for applications that can afford some downtime while you recover them. But for essential applications like Microsoft Exchange, SQL, and SharePoint, which need to be available all the time, disaster tolerance is often the best way to go.

So what combination of DT and DR protection would work best for your company’s applications?

Availability  CIO  Disaster Recovery  Disaster Tolerance  Downtime  EverRun  Exchange  Fault Tolerance  High Availability  Marathon  Sharepoint



Monday, March 24th, 2008 - 3:14 pm EDT

You Heard it Here First!!!

Posted by: admin

After much speculation and blogosphere rumors, we decided it was time to let the cat out of the bag and officially launch everRun VM! Of course, for an announcement this big, we thought unveiling the news LIVE right here on the blog was the best way to inform the press, analysts and general public about the new product we’ve been working so hard on. So, tell your friends you heard it here first!

Wait a minute? The release crossed the wire this morning? Gary, Michael, Steve and Jerry have already been talking to the press?

Well, then….

You Heard it Here… Eighth (or Ninth)!!!

We’ve included some links to the everRun VM coverage below. We’ll keep you posted on the progress of everRun VM beta testing and the feedback we receive from testers. In the meantime, enjoy the articles and leave us a comment if you’re interested in learning more about the product. As you can tell from all the quotes in these articles, we’re always happy to talk!

Marathon Releases Virtual HA, Fault Tolerance
Byte and Switch
http://www.byteandswitch.com/document.asp?doc_id=149019&WT.svl=news2_1

Marathon's Virtualization Tool Simplifies Disaster Recovery
CIO
http://www.cio.com/article/print/202350

Get fault tolerant virtual servers
Computerworld
http://blogs.computerworld.com/get_fault_tolerant_virtual_servers

Marathon Launches Fault-Tolerant Software For Server Virtualization
CRN
http://www.crn.com/hardware/206905384

Marathon extends fault tolerance to VMs
IDG

everRun VM Hits the Ground Running
Virtual Strategy Magazine
http://www.virtual-strategy.com/vsm-podcasts/everrun-vm-hits-the-ground-running.html

Availability  CIO  Disaster Recovery  EverRun  EverRun VM  Fault Tolerance  Fault Tolerant  Marathon  Podcast  Virtualization  XenServer



Wednesday, August 29th, 2007 - 1:47 pm EDT

Are your customers protected in the event of a disaster?

Posted by: admin

When businesses think of disaster planning, they take the basic cautionary measures: boarding up their businesses, putting hard files in water-safe boxes in case of flooding and, most importantly, discussing evacuation procedures with their employees in case of a catastrophic event. However, organizations like the Credit Union National Association (CUNA) have started to evaluate disaster planning differently with regard to protecting data crucial to business operations.

After Hurricane Katrina struck Louisiana in 2005, it became evident that one of the primary concerns when dealing with disaster recovery is ensuring customers’ funds are safe. With Hurricane Dean marking the first major hurricane of the season, we hope that credit unions and other businesses in the hurricane’s path took precautionary measures similar to those the Texas Credit Union League and the Louisiana League used, as highlighted here.

Since one of our primary concerns is ensuring application availability, we encourage you to re-evaluate your disaster recovery plans so that natural disasters won’t mean the end of your business or major monetary losses. With today’s reliance on critical applications, taking the initiative to protect your data allows you to spend more time focused on your personal life, knowing your business is safe.

Availability  Disaster Recovery



Tuesday, August 14th, 2007 - 1:45 pm EDT

Disaster Recovery

Posted by: admin

Disaster recovery is a plan which enables the protection and restoration of critical information in the event of disruption. Disaster recovery management includes functions such as identifying the critical and vital information, determining recovery needs, developing backup solutions and implementing the backup/recovery solution.

Disaster Recovery  Glossary