Channel: You Had Me At EHLO…

Introducing Office Web App Integration with Outlook Web App in Exchange Online


On the Exchange team, it’s always important to us to hear from our customers about what we’re doing well and what we could do to make the experience better for you. In Exchange 2007, in response to one of the biggest Exchange 2003 OWA feature requests, we introduced WebReady Document Viewing. This let OWA users preview attached Microsoft Office documents and PDFs in a web browser without having to save them to disk or open them in a locally installed application. We continued to offer this functionality in Exchange 2010, but we received a lot of feedback that the text and formatting of previewed Office documents sometimes differed from what users saw when they opened the same documents in the desktop Office applications.

We’re excited to announce an update to OWA in Exchange Online that integrates the Office Web Apps into the attachment previewing experience for Word, Excel, and PowerPoint files! Along with continued PDF support, this means Exchange Online users get high-fidelity previews of Office documents on the web, exactly as they were created. WebReady Document Viewing is perfect for quick document previews, and if you need to edit a document, you can open it in your desktop Office application from the Office Web App with a single click.

Click on the “Open in browser” link you see next to Office document attachments to start using this feature in Exchange Online today!

David Alexander, Kartik Murthy


Capacity Planning – Yes Transaction Log Space is Critical to Keeping your Databases Healthy and Mounted


The other day I was chatting with one of our Supportability Program Managers, Nino Bilic, and he mentioned something rather alarming: the number one reason our Premier customers open Exchange 2010 critical situations is that Mailbox databases dismount after running out of disk space on the transaction log LUN.

I’ll let that sink in for a moment.  Naturally I’m shocked…to be completely honest, I thought with the Mailbox Requirements Calculator and our guidance on TechNet, we’d have wiped out this issue by now.  After sharing this information with me, Nino decided that I, not he, should write a blog article on the topic of transaction log capacity planning (gee, thanks Nino!).

Capacity Planning 101

In order to properly size a transaction log LUN, we need to understand a few things about the environment:

  1. How many mailboxes will reside in the database?
  2. What is the message profile of the mailboxes in the database?
  3. What is the average message size?
  4. What is the average mailbox size?
  5. How many mailboxes are moved per day?
  6. What is the backup and restore solution?
  7. Does the solution need to take into account any other failure scenarios, like network failures?

For the purposes of this discussion, let’s assume that each database will house 250 mailboxes. Each mailbox sends/receives 150 messages per day, with an average message size of 100KB. Based on the table in Understanding Mailbox Database and Log Capacity Factors, we know that a 150-message profile with a 75KB average message size generates 30 transaction logs per day (24-hour period). Since our message size is greater than 75KB, we need to account for that in our per-mailbox transaction log generation. The guidance stipulates:

If the average message size doubles to 150 KB, the logs generated per mailbox increases by a factor of 1.9. This number represents the percentage of the database that contains the attachments and message tables (message bodies and attachments).

Therefore, we can determine the impact our 100KB average message size has with this formula:

150 / 1.9 = [average message size of profile] / x

x = (100 * 1.9) / 150

x = 1.2667 ≈ 1.27

So by having a message size that is 25KB larger than the baseline, the number of transaction logs generated per day per mailbox increases by a factor of 1.27. Therefore, 30 transaction logs * 1.27 ≈ 38.1, which we round up to 39 transaction logs / day / mailbox. This means that a database of 250 mailboxes will generate 39 * 250 = 9,750 mailbox generated transaction logs / day / database.
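The per-mailbox arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch that simply encodes the proportion used in this article and its rounding; the function and constant names are hypothetical, and the 30-log baseline and 1.9 factor are the figures quoted from the TechNet guidance.

```python
import math

# From the TechNet table cited above: a 150-message/day profile at a 75KB
# average message size generates about 30 transaction logs per mailbox per day.
BASELINE_LOGS_PER_MAILBOX = 30
# Factor the guidance gives when the average message size doubles to 150KB.
FACTOR_AT_150KB = 1.9


def size_factor(avg_message_kb: float) -> float:
    """Solve the article's proportion 150 / 1.9 = avg_message_kb / x for x."""
    return avg_message_kb * FACTOR_AT_150KB / 150


def logs_per_mailbox_per_day(avg_message_kb: float) -> int:
    # Round the factor to two decimals, as the text does, then round the
    # resulting log count up so the estimate errs on the high side.
    factor = round(size_factor(avg_message_kb), 2)
    return math.ceil(BASELINE_LOGS_PER_MAILBOX * factor)


def logs_per_database_per_day(mailboxes: int, avg_message_kb: float) -> int:
    """Total mailbox-generated logs per day for one database."""
    return mailboxes * logs_per_mailbox_per_day(avg_message_kb)
```

With the scenario's inputs, `logs_per_mailbox_per_day(100)` gives 39 and `logs_per_database_per_day(250, 100)` gives 9,750.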

Mailbox moves also generate transaction logs. Each mailbox moved to the destination database generates roughly as many logs (at the destination, not the source) as the size of the mailbox, including the contents of the Recoverable Items folders. For example, moving 1% of the mailboxes per day means that 2.5 mailboxes are moved into a database each day. If each mailbox is 5.4GB in size on average (including 14-day deleted item retention with Single Item Recovery enabled), then, because each transaction log file is 1MB, 2.5 * 5.4GB * 1024 = 13,824 mailbox move transaction logs / day / database.

From a backup/restore perspective, we need to take into account the type of backup architecture we are leveraging.  With each backup scenario, there is a recommended number of additional days you should provision from a capacity perspective for your mailbox generated transaction logs.  By provisioning extra space, you can survive multiple failures without suffering an outage event.  For more information on transaction log truncation, see Understanding Backup, Restore and Disaster Recovery.

Backup Architecture | Transaction Log Truncation | Recommended Backup Failure Protection
Daily Full Backup | Daily | 3 days
Weekly Full Backup / Daily Incremental | Daily | 3 days
Weekly Full Backup / Daily Differential | Weekly | 7 days
Bi-Monthly Full Backup / Daily Incremental | Daily | 3 days
Exchange Native Data Protection | When logs are no longer required | 3 days

Of course, there are other scenarios that you may need to consider.  For example, if you are deploying a stretched Database Availability Group (DAG) across two datacenters, log truncation will only occur if the network link between the two datacenters is operational and the database copies are healthy.  If you know that an outage of the WAN link could take 5 days to repair, you should adjust your backup failure protection to take that into account.

For our scenario, let’s assume we only need to survive 3 days of truncation failure events. This means that we need 9,750 / 1024 * 3 ≈ 28.6GB of disk space for our mailbox generated transaction logs.

In addition, we need to account for the disk space required by mailbox move events for the entire week: 13,824 / 1024 * 7 days = 94.5GB of disk space for our mailbox move operations.

All told, this means that each database needs about 123GB of disk space for transaction logs. We should also include a data overhead factor to account for any unexplained phenomena that may occur: 123GB * 1.2 ≈ 148GB of disk space for transaction logs.

If we are deploying a dedicated LUN for the transaction logs, we would not provision a LUN of 150GB as that would mean that we could consume all of the disk space if we were having backup failures and excessive mailbox moves.  Typically you want to ensure that each LUN is provisioned such that only 80% of the disk capacity is utilized.  The formula is:

LUN Space = [projected disk space utilization] / (1 – [desired free space percentage])

LUN Space = 148GB / (1 – .2) = 148GB / .8 = 185GB LUN Space for Dedicated Transaction Log Volume
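To keep the whole worked example in one place, here is a minimal Python sketch of the sizing steps: daily mailbox logs and mailbox move logs, the days of truncation failure to survive, the 1.2 overhead factor, and the 20% free-space target. The function names are hypothetical; the sketch assumes the 1MB transaction log file size of Exchange 2010 and the move rule of thumb described above (roughly one MB of logs per MB of moved mailbox).

```python
LOG_FILE_MB = 1  # Exchange 2010 transaction log files are 1MB each
MB_PER_GB = 1024


def move_logs_per_day(mailboxes: int, move_rate: float, avg_mailbox_gb: float) -> float:
    """Logs written at the destination: about one 1MB log per MB of moved mailbox data."""
    return mailboxes * move_rate * avg_mailbox_gb * MB_PER_GB / LOG_FILE_MB


def log_lun_gb(daily_logs: float, truncation_days: int,
               daily_move_logs: float, move_days: int,
               overhead: float = 1.2, free_space: float = 0.2) -> float:
    """Return the dedicated transaction log LUN size in GB."""
    # Space for mailbox-generated logs while truncation is failing
    mailbox_gb = daily_logs * LOG_FILE_MB / MB_PER_GB * truncation_days
    # Space for mailbox move logs over the chosen period
    move_gb = daily_move_logs * LOG_FILE_MB / MB_PER_GB * move_days
    # Projected utilization, padded for unexplained growth
    projected_gb = (mailbox_gb + move_gb) * overhead
    # LUN Space = projected utilization / (1 - desired free space)
    return projected_gb / (1 - free_space)
```

For the scenario in this article, `log_lun_gb(9750, 3, move_logs_per_day(250, 0.01, 5.4), 7)` comes out at roughly 184.6GB; the small difference from 185GB is only because the text rounds at each step.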

If you are deploying the transaction logs on the same LUN as the database, you would simply combine the transaction log disk space requirements with the database disk space requirements for the [projected disk space utilization] value.

How can I prevent consuming all of my transaction log disk space?

First and foremost, you need to obtain a baseline of your environment to determine your typical log generation rate per day. In addition, you must set up monitoring and take action on any alerts it generates. Monitoring should cover the following scenarios:

  1. Transaction Log LUN disk space. Set up several thresholds and different alerting mechanisms. Your first alert should not be the one that indicates 90% of your disk has been consumed. If you know your typical log generation baseline, you can set up a threshold that reports when you are, for example, 20% over it.
  2. Monitor for successful completion of your backups (if you aren’t leveraging Exchange Native Data Protection).  Your first indication of backup failures should not be when you run out of disk space.
  3. Monitor for the truncation events in the Application Log.
  4. Monitor your database copy replication health. 
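The first monitoring item can be sketched as a simple check against your baseline. This is an illustration only, not part of any Exchange tool: the sample type, function name, and the 40%/20% free-space tiers are hypothetical values you would replace with your own, and the 20% growth margin is the example figure from the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogVolumeSample:
    # Hypothetical inputs collected by your own monitoring system
    logs_generated_today: int
    free_space_percent: float


def log_alerts(sample: LogVolumeSample, baseline_logs_per_day: int,
               growth_threshold: float = 0.20,
               free_space_warning: float = 40.0,
               free_space_critical: float = 20.0) -> List[str]:
    """Return alert messages for the log LUN based on baseline and free space."""
    alerts: List[str] = []
    # Alert early when log generation exceeds the baseline by the chosen margin
    if sample.logs_generated_today > baseline_logs_per_day * (1 + growth_threshold):
        alerts.append("log generation above baseline")
    # Tiered disk-space thresholds, so the first alert is not the 90%-full one
    if sample.free_space_percent <= free_space_critical:
        alerts.append("critical: log LUN free space low")
    elif sample.free_space_percent <= free_space_warning:
        alerts.append("warning: log LUN free space dropping")
    return alerts
```

Checking growth against the baseline separately from free space means a sudden burst of logs is flagged while the volume still has plenty of room, which is the point of not waiting for a 90% alert.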

What if I’m having unexplained growth in my Transaction Logs?

My friend Mike Lagase wrote a great article on how to troubleshoot this scenario: http://blogs.technet.com/b/mikelag/archive/2009/07/12/troubleshooting-store-log-database-growth-issues.aspx (please note that the article was written with Exchange 2007 in mind, so some of the tools and recommendations may no longer apply to Exchange 2010). In addition to the steps Mike mentions, you can use the following in Exchange 2010 to help determine the cause of unexplained transaction log growth (thanks to Todd Luttinen for putting this list together):

  1. You can use the store usage statistics cmdlet (Get-StoreUsageStatistics, filtering on DigestCategory = ‘LogBytes’) to identify mailboxes generating a high log byte count. Note that this doesn’t always work when log bytes aren’t generated by the mailbox owner or when the operation is performed on behalf of a client (like CopyOnWrite), and it doesn’t include log bytes generated by system services (reported in Event ID 9826). These stats summarize the last 10 minutes of activity for the top mailboxes generating log activity (up to 6 samples covering the last hour). The following shows how to use store usage stats to find the top mailbox generating log bytes over the last hour:

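    # Collect store usage statistics for one database
    $stats = Get-StoreUsageStatistics -Database "<Database Name>"
    # Keep the LogBytes samples, group them by mailbox, take the mailbox that appears in the most samples,
    # and list its samples in time order
    $stats | Where-Object {$_.DigestCategory -eq 'LogBytes'} | Group-Object MailboxGuid | Sort-Object Count -Descending | Select-Object -First 1 -ExpandProperty Group | Sort-Object SampleTime | Format-Table -AutoSize MailboxGuid,Sample*,Log*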

    MailboxGuid                          SampleID SampleTime           LogRecordCount LogRecordBytes
    c007c87a-e030-4414-b741-9cf61e88b9de        5 11/7/2011 4:25:05 PM            237         274163
    c007c87a-e030-4414-b741-9cf61e88b9de        4 11/7/2011 4:35:05 PM            451         387362
    c007c87a-e030-4414-b741-9cf61e88b9de        3 11/7/2011 4:45:06 PM            483         144999
    c007c87a-e030-4414-b741-9cf61e88b9de        2 11/7/2011 4:55:06 PM            734         293433
    c007c87a-e030-4414-b741-9cf61e88b9de        1 11/7/2011 5:05:06 PM            933         411485
    c007c87a-e030-4414-b741-9cf61e88b9de        0 11/7/2011 5:15:06 PM            247         209987

  2. There are also application events generated for administrative clients (Event ID 9826).  These stats represent 2 hours of activity:

    Starting from <date/time> service <name> has performed this activity on the server:
    RPC Operations: 24168.
    Database Pages Read: 1329 (of which 629 pages preread).
    Database Pages Updated: 12418 (of which 11555 pages reupdated).
    Database Log Records Generated: 13906.
    Database Log Records Bytes Generated: 660331.
    Time in Server: 19142 ms.
    Time in User Mode: 6100 ms.
    Time in Kernel Mode: 63 ms.

  3. The performance monitor counter “MSExchangeIS Client(*)\JET Log Record Bytes/sec” can be used to identify what client type is causing log growth.
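The PowerShell one-liner in step 1 picks the mailbox that appears in the most LogBytes samples. A variation is to rank mailboxes by the total LogRecordBytes across all samples. The Python sketch below shows that aggregation; it assumes you have already exported the samples (for example with Export-Csv) and loaded them as (MailboxGuid, SampleID, LogRecordBytes) tuples. The function name is hypothetical, and the sample rows are taken from the output above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (MailboxGuid, SampleID, LogRecordBytes), mirroring columns in the output above
Row = Tuple[str, int, int]

SAMPLE_ROWS: List[Row] = [
    ("c007c87a-e030-4414-b741-9cf61e88b9de", 5, 274163),
    ("c007c87a-e030-4414-b741-9cf61e88b9de", 4, 387362),
    ("c007c87a-e030-4414-b741-9cf61e88b9de", 3, 144999),
    ("c007c87a-e030-4414-b741-9cf61e88b9de", 2, 293433),
    ("c007c87a-e030-4414-b741-9cf61e88b9de", 1, 411485),
    ("c007c87a-e030-4414-b741-9cf61e88b9de", 0, 209987),
]


def top_log_generators(rows: Iterable[Row], n: int = 5) -> List[Tuple[str, int]]:
    """Sum LogRecordBytes per mailbox across all samples and return the top n."""
    totals: Dict[str, int] = defaultdict(int)
    for guid, _sample_id, log_bytes in rows:
        totals[guid] += log_bytes
    # Largest byte totals first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]
```

For the six samples shown, `top_log_generators(SAMPLE_ROWS)` returns a single entry for that mailbox with 1,721,429 bytes, a little under 2MB of log records over the hour.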

I think all of us understand how critical it is to provision enough capacity so that database availability is not affected. Hopefully this information helps you plan your transaction log capacity.

Ross Smith IV
Principal Program Manager
Exchange Customer Experience

Time to revisit recommendations around Windows networking enhancements usually called Microsoft Scalable Networking Pack


Over the years, there has been a lot of debate around features in Windows which are usually referred to as Microsoft Scalable Networking Pack (individual features are known as Receive Side Scaling (RSS) and Chimney/TCP Connection Offload/TOE), and the effect of having them enabled or disabled on our servers.

Taking a trip down memory lane: while it is true that when these features were released in Windows Server 2003 SP2 there were some issues to work out (in both Microsoft and third-party code such as network drivers), the situation has improved dramatically over the years, to the point where disabling them can significantly hurt the performance of your servers.

Here is an example:

The following screenshot shows one of the CPUs being heavily taxed while the others do not share the load. This is typical of a server with a busy network connection and the RSS feature turned off:

[Screenshot: per-CPU utilization on a busy server with RSS disabled]

The next screenshot shows more clearly what happens when RSS is enabled on the server; the point at which it was enabled is marked by the red circle. Note how a single processor was very busy with networking traffic while the rest were not nearly as busy, and what happens after RSS is enabled:

[Screenshot: per-CPU utilization before and after enabling RSS, with the change marked by a red circle]

Now that I have your attention, I want to point you to an article that one of my counterparts on the Windows team, Tod Edwards, wrote recently. It goes in depth on what these features are, why you should enable them, how to do so, and how to make sure that you are in a good place when you do. Please go here to read it:

Give Microsoft’s Scalable Networking Pack Another Look

http://www.windowsitpro.com/article/networking/give-microsofts-scalable-networking-pack-140350

(It should go without saying but: please make sure that your network card drivers are updated!)

Enjoy!

Nino Bilic

Windows Disk Timeouts and Exchange Server 2010


A few months ago, Bruce Langworthy wrote an excellent article about new recommendations for setting the Windows disk timeout value: http://blogs.msdn.com/b/san/archive/2011/08/15/the-windows-disk-timeout-value-understanding-why-this-should-be-set-to-a-small-value.aspx.

This post got me thinking about Exchange and how we deal with I/O problems. If you haven't read Bruce’s article, it explains that the default disk timeout of 60 seconds means that Windows will not report the hung I/O for 60 seconds and won’t retry the I/O for 8 minutes. 8 minutes is far too long to wait before retrying a hung IO, so Microsoft is releasing new guidance recommending changing the Windows Disk Timeout setting to a value that aligns with your storage architecture.

The question in my mind for Exchange was simple: how does this disk timeout behavior affect Exchange DAG deployments? More specifically, should I reduce the Windows disk timeout on my Exchange servers per the new recommendations, or leave things alone?

To answer this question I approached some of our ESE developers to get their thoughts… this is what came from that discussion…

  • The Windows Disk Timeout value is mainly intended for event logging and I/O retry.
  • Prior to Exchange Server 2010, Exchange did not take any action for slow I/O other than report it in the event log.
  • Exchange Server 2010 RTM introduced pre-emptive page patching (clean page overwrite) for pages affected by slow I/O.
  • Exchange Server 2010 SP1 is the first version of Exchange to include intelligence for dealing with hung I/O and will actively fail (bugcheck) the server if the hung I/O is affecting active databases on a DAG node.

I decided that before we could determine what to do with our disk timeout settings, we first had to understand what intelligence Exchange Server 2010 SP1 introduced and how it might interact with disk timeouts.

Exchange Server 2010 SP1 Extensible Storage Engine Recovery on Hung IO

Exchange Server 2010 SP1 brought with it some great improvements in how we deal with hung I/O. These improvements are discussed in detail in the following TechNet article http://technet.microsoft.com/en-us/library/ff625233.aspx:

“Exchange 2010 SP1 includes new recovery logic that leverages the built-in Windows bugcheck behavior when certain conditions occur, specifically, when hung IO occurs. In SP1, Extensible Storage Engine (ESE) has been updated to detect hung IO and to take corrective action to automatically recover the server. ESE maintains an IO watchdog thread that detects when an IO has been outstanding for a specific period of time. By default, if an IO for a database is outstanding for more than one minute, ESE will log an event. If a database has an IO outstanding for greater than 4 minutes, it will log a specific failure event, if it is possible to do so. ESE event 507, 508, 509 or 510 may or may not be logged, depending on the nature of the hung IO. If the nature of the problem is such that the OS volume is affected or the ability to write to the event log is affected, the events will not be logged. If the events are logged, the Microsoft Exchange Replication service (MSExchangeRepl.exe) will detect that condition and intentionally cause a bugcheck of Windows by terminating the wininit.exe process.”

So, what does this mean? Well after some discussion (and some searching of ESE code), the following table was created to make the behavior easier to understand (I have included previous versions of Exchange for reference).

Note: I really want to say huge thanks at this point to Alexandre Costa and Brett Shirley who are both ESE developers within the Exchange team and without whom this information would not have been possible – thanks guys!

Exchange Version | I/O Type | I/O Time | Behavior
Exchange Server 2003 | Completed | >60 seconds | Write to Event Log
Exchange Server 2007 | Completed | >60 seconds | Write to Event Log
Exchange Server 2010 RTM | Completed | >60 seconds | Write to Event Log; ESE performs clean-page overwrite on pages affected by slow I/O
Exchange Server 2010 SP1 | In Flight | >60 seconds | Write to Event Log
Exchange Server 2010 SP1 | In Flight | >4 minutes | Terminate wininit.exe process and bugcheck the server
Exchange Server 2010 SP1 | Completed | >30 seconds | Write to Event Log; ESE performs clean-page overwrite on pages affected by slow I/O

Note: In Flight I/O describes a slow I/O operation that has not yet completed. Completed I/O describes a slow I/O that has completed but took longer than the I/O Time threshold in the table (30 seconds in Exchange Server 2010 SP1). It is important to note that prior to Exchange Server 2010 SP1 there was no concept of detecting slow in-flight I/Os; we only reported them once the I/O had completed.
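The Exchange Server 2010 SP1 rows of the table can be read as a small decision function. The Python sketch below only restates the table to make the thresholds easy to compare; it is not how ESE is implemented, and the function name is hypothetical. It also leaves out conditions described earlier, such as the bugcheck applying only when hung I/O affects active databases on a DAG node and only if the failure event can be logged.

```python
from typing import List


def sp1_io_actions(in_flight: bool, seconds: float) -> List[str]:
    """Map an I/O's state and age to the Exchange 2010 SP1 behavior in the table.

    Illustrative only: mirrors the table, not ESE's actual code paths.
    """
    actions: List[str] = []
    if in_flight:
        # Outstanding I/O: logged after 60 seconds, recovery after 4 minutes
        if seconds > 60:
            actions.append("write to event log")
        if seconds > 4 * 60:
            actions.append("terminate wininit.exe and bugcheck the server")
    elif seconds > 30:
        # Completed but slow I/O
        actions.append("write to event log")
        actions.append("clean-page overwrite of affected pages")
    return actions
```

For example, an I/O still outstanding after 90 seconds only produces an event, while one outstanding for 5 minutes also triggers the bugcheck path.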

I don't like this new behaviour, what can I do about it?

As with most things, I would advise against changing the new behavior unless you have a very clearly defined and compelling reason to do so… However, if you do need to modify the new Extensible Storage Engine Recovery on Hung IO behavior then there are some registry keys/Active Directory attributes that allow you to do so which are documented here.

Conclusion

Going back to the reason I started writing this article: I wanted to assess whether we should reduce the Windows disk TimeOutValue on Exchange DAG nodes, as recommended here.

After speaking with Matt Gossage on the Exchange team (Matt knows everything about Exchange and I/O), I learned that one of the things the disk timeout does is protect the host from bus reset storms. One interesting side effect of an I/O reaching the Windows disk TimeOutValue is that the disk.sys driver issues a bus reset, and this reset affects all LUNs on the server, not just the LUN that is failing to respond.

The most common scenario where this behaviour has been observed is Exchange 2010 on JBOD storage. Where a RAID solution is deployed, the disk controller can deal with bad block reads either by reading the data from another disk or by recalculating it from parity; this delays the I/O, but not significantly. With JBOD there is only a single copy of the data block, so a bad block can cause a hung I/O while we wait for the disk to try to read the data. The bottom line is that with a JBOD deployment we do not want to reduce the disk TimeOutValue, and we may even want to increase it to reduce the effects of a bus reset storm if one of the JBOD spindles begins to fail.

The following table outlines the recommended guidance for setting the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\TimeOutValue for servers running the Exchange Server 2010 mailbox role.

Scenario | Windows Disk TimeOutValue | Notes
Direct-Attached Storage | Reduce to 20 seconds | Refer to the hardware manufacturer’s guidance; it takes priority in the event of a clash
SAN-Attached RAID Storage | Reduce to 20 seconds | Refer to the hardware manufacturer’s guidance; it takes priority in the event of a clash
JBOD Storage | Increase to 180 seconds | Refer to the hardware manufacturer’s guidance; it takes priority in the event of a clash
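As a sketch of how the table might be applied, the Python helper below renders a .reg file that sets the TimeOutValue registry value (a DWORD, in seconds) under the Disk key named above. The scenario keys and function name are hypothetical. Treat the output as a starting point to check against your hardware manufacturer’s guidance before importing it, and verify the file encoding your import process expects.

```python
# Recommended values (seconds) from the table; hardware vendor guidance wins on a clash
RECOMMENDED_TIMEOUT_SECONDS = {
    "das": 20,       # Direct-attached storage
    "san_raid": 20,  # SAN-attached RAID storage
    "jbod": 180,     # JBOD storage
}

DISK_KEY = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk"


def timeout_reg_file(scenario: str) -> str:
    """Render .reg file text setting Disk\\TimeOutValue (DWORD, seconds) for a scenario."""
    seconds = RECOMMENDED_TIMEOUT_SECONDS[scenario]
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{DISK_KEY}]",
        # .reg files express DWORD values as eight hexadecimal digits
        f'"TimeOutValue"=dword:{seconds:08x}',
        "",
    ]
    return "\n".join(lines)
```

Because the value is written in hexadecimal, 20 seconds appears as `dword:00000014` and 180 seconds as `dword:000000b4`.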

Neil Johnson
Senior Consultant, UK MCS

Recommended Windows Hotfix for Database Availability Groups running Windows Server 2008 R2


In early August of this year, the Windows SE team released the following Knowledge Base (KB) article and accompanying software hotfix regarding an issue in Windows Server 2008 R2 failover clusters:

KB2550886 - A transient communication failure causes a Windows Server 2008 R2 failover cluster to stop working

This hotfix is strongly recommended for all database availability groups that are stretched across multiple datacenters; for DAGs that are not stretched, it is good to have as well. The article describes a race condition and cluster database deadlock issue that can occur when a Windows failover cluster encounters a transient communication failure. There is a race condition within the reconnection logic of cluster nodes that manifests itself when the cluster has communication failures. When this occurs, the cluster database hangs, resulting in quorum loss in the failover cluster.

As described on TechNet, a database availability group (DAG) relies on specific cluster functionality, including the cluster database. In order for a DAG to be able to operate and provide high availability, the cluster and the cluster database must also be operating properly.

Microsoft has encountered scenarios in which a transient network failure (a failure of network communications lasting about 60 seconds) deadlocks the entire cluster, and all databases within the DAG are dismounted. Because it is not easy to determine which cluster node is actually deadlocked, the only available course of action when a failover cluster deadlocks due to the reconnect logic race is to restart all members of the cluster to resolve the deadlock condition.

The problem typically manifests itself as cluster quorum loss due to an asymmetric communication failure (when two nodes cannot communicate with each other but can still communicate with other nodes). If other nodes receive the cluster regroup messages from the cluster’s Global Update Manager (GUM) with delays, those messages can arrive in an unexpected order. When that happens, the cluster loses quorum instead of invoking the expected behavior, which is to remove from the cluster one of the nodes that experienced the initial communication failure.

Generally, this bug manifests when there is asymmetric latency (for example, where half of the DAG members have latency of 1 ms, while the other half of the DAG members have 30 ms latency) for two cluster nodes that discover a broken connection between the pair. If the first node detects a connection loss well before the second node, a race condition can occur:

  • The first node will initiate a reconnect of the stream between the two nodes. This will cause the second node to add the new stream to its data.
  • Adding the new stream tears down the old stream and sets its failure handler to ignore. In the failure case, the old stream is the failed stream that has not been detected yet.
  • When the connection break is detected on the second node, the second node will initiate a reconnect sequence of its own. If the connection break is detected in the proper race window, the failed stream's failure handler will be set to ignore, and the reconnect process will not initiate a reconnect. It will, however, issue a pause for the send queue, which stops messages from being sent between the nodes. When the messages are stopped, this prevents GUM from operating correctly and forces a cluster restart.

If this issue does occur, the consequences for DAGs are severe. We therefore recommend that you deploy this hotfix to all of your Mailbox servers that are members of a DAG, especially if the DAG is stretched across datacenters. The hotfix can also benefit Exchange 2007 Single Copy Cluster and Cluster Continuous Replication environments.

In addition to fixing the issue described above, KB2550886 also includes other important Windows Server 2008 R2 hotfixes that are also recommended for DAGs:

Exchange Advice Video – YOUR Advice


You hear from us pretty often. While we absolutely want you to read Ross’ blog and factor in transaction log space in your capacity planning or listen to Perry talk about how we’ve shifted to a services culture, we like hearing from you too. As part of our Exchange Ideas video series, we asked some folks in the Exchange community, “If you could give one piece of advice to other Exchange folks in 15 seconds or less, what would it be?”

In this Exchange Advice video, you are imparting your wisdom to the rest of the Exchange world and we think some of your recommendations are pretty fantastic.

Let us know what you’d like to talk about and share with the rest of the Exchange community. We’ll be around at upcoming events and would love to hear from you.

Cheers!

Ann Vu
Technical Product Manager

Released: Exchange Server 2010 SP2


I had previously mentioned that Exchange 2010 Service Pack 2 would be coming this year – and it’s here! I’m pleased to announce the availability of Exchange Server 2010 Service Pack 2 which is ready to download.

We’re delighted to continually add value to Exchange as part of our ongoing release rhythm, and the enhancements in this Service Pack are largely due to your feedback. SP2 includes much-anticipated features such as the Hybrid Configuration Wizard, Address Book Policies, Outlook Web App Mini and Cross-Site Silent Redirection for Outlook Web App, as well as customer-requested fixes and rollups released prior to Service Pack 2.

As we did with SP1, Service Pack 2 is a fully slipstreamed version of Exchange with 13 server languages and 66 client languages (including English) available in a single package. There is no separate download for client and server languages; you’ll only need to download and install separate language packs if you have Unified Messaging.

Please check out the features in more detail or download SP2 and try them out yourself.

I had also announced that we would support on-premises Exchange deployed in a multi-tenant configuration. To help you configure such an environment in a supported way, we’ll shortly publish a follow-up blog post that outlines some scenarios and points to our detailed guidance. Please stay tuned.

Thanks again to our TAP participants and to you, our customers, for all of the great feedback you provide!

Kevin Allison
General Manager
Exchange Customer Experience

Updates

Exchange 2010 Service Pack 2 and Hosting


With the changes in strategy announced by the Exchange team a few months back we wanted to take the opportunity to make clear what is supported in what are typically referred to as hosting scenarios.

We announced that hosters would be able to use Exchange 2010 SP2 to provide hosted Exchange services once we released it. Well, we just released it, and we have now also released guidance to help our customers configure their solutions in a supported manner. We have created a web site to recognize control panel vendors who have provided adequate details of their solutions for us to list them as having a compliant solution. The guidance is intended for both hosters and control panel ISVs, but will also be useful for anyone trying to build a multi-tenant type system (sometimes referred to as a private cloud) using Exchange 2010 SP2.

The most important thing to understand is that a hoster, a control panel vendor, or anyone who builds their solution by following the guidance we publish publicly is fundamentally no different from any other customer who deploys Exchange and chooses not to change any of the default settings. We intend to support you no differently than we would any other customer.

For example, if you are a typical Enterprise customer and you deploy Exchange, configure some Address Book Policies (ABPs), change some calendar permissions and add a few thousand accepted domains, you will get support just as you always have, because your configuration uses only supported tools and processes. As a hoster or private cloud builder it will be no different. You too create objects, set up some ABPs, and may end up with a configuration that looks unusual to an average Exchange customer, but that is all it is: unusual and customized to meet your requirements, not unsupported.

Here are a few examples to try and clarify what this means:

  • You call us with an Exchange transport agent problem and it is clear that whatever you built doesn’t follow any of our published development guidance. We will recommend you change it to follow our guidance, and that advice won’t change whether you are a hoster, building a private cloud or are an Enterprise organization.
  • You are a hoster and call us to say that you can’t stop internal OOFs from being delivered between tenants on your self-built hosting platform. We point you to our hosting guidance, which clearly states that this is a known issue with this type of configuration and suggests the right approach to take to try to solve it. If you then want to open a separate developer case for help as you build the solution, you can do that too.

So, as you can see, whether you are a hoster, an Enterprise customer, or someone who builds a solution to host multiple tenants in some way, if you have used supported tools and methods to configure your system, we’ll be able to support it effectively. That’s really no different from today: if you make some rather unusual changes to your system, we don’t ask to validate the end-to-end system before we help you recover that database. If, on the other hand, the database failed because of that unusual change, that’s when we discuss why you made the change and potentially point out that it is unsupported.

If a control panel vendor wishes to sell their solution AND have it listed on our web page, they need to provide written confirmation to us that their solution complies with the ENTIRE guidance document. If they comply with only 90% of it, they won’t be listed. This won’t stop a vendor from selling their solution, as they can do that without us reviewing any of it, but a customer shopping for a solution will not see it listed on our web page.

So in summary, for customers using Exchange 2010 SP2, we will treat our hosters and enterprise customers the same – if the root cause of your problem is an unsupported setting or change, we will point that out and recommend you change it. As a hoster you can really create a multi-tenancy system without making any unsupported changes. The guidance we have published will help you to do so, and we recommend you follow it.

I like to think about it like this: our end goal in providing guidance and allowing hosters to use Exchange Server 2010 SP2 is to make sure they end up with a solution based upon a supported configuration, which makes their system just the same as anyone else’s. We really do want you to get support for your system when you need it, you just need to make sure what you are doing will help us to help you.

Greg Taylor


Exchange Online December 2011 Service Update


Over the next several months Microsoft will deploy the Exchange Online December 2011 Service Update. As a part of this update, we will also make high availability architecture enhancements in all datacenters that host Office 365 tenant domains. These updates will be staggered globally, beginning in December 2011, and we expect full deployment to complete within six months. Approximately one week before the service update is deployed to your region we will post more information in the Planned Maintenance section in your Service Health Dashboard portal. Please check this resource to find out when your update is scheduled to begin.

The changes contained in this update are summarized below.

  • High Availability Architecture Enhancements: We are extending our high availability architecture across additional sites to provide greater resilience in the event of network failures. Administrators and end users may notice changes to server names in URLs and in protocol settings. The connection for client applications and devices, including those configured to connect directly to server addresses, will automatically redirect when the mailbox is migrated to the latest software. A very small percentage of mobile devices are not 100% compliant and may have to be reconfigured to connect to a changed pod address. Please refer users to the Mobile Phone Setup Wizard for connection procedures.
  • Sender Photos in Outlook Web App: You can now match faces to names in your organization with photos displayed next to sender information in emails. Display of photos is enabled by default, but you can modify the settings of your Outlook Web App mailbox policy to disable this feature.
  • Outlook Web App in Internet Explorer 9 App Mode: Outlook Web App can now be pinned to the taskbar using Internet Explorer 9 App Mode. This gives you the ability to launch Outlook Web App with one click and run it with fewer distractions, separated from other browsing sessions. It also keeps you informed of incoming email and IM when minimized or hidden and offers quick access to common Outlook Web App commands from the taskbar.
  • Group Naming Policy: You can now standardize and manage the names of distribution groups, also known as public groups, created by users in your organization. You can require that a specific prefix and suffix be added to the name of a distribution group when it's created, and you can block specific words from being used. This helps you minimize the use of inappropriate words in group names.
  • Retention Policy and Tag Management: We have made it easier than ever to manage retention settings for the user mailboxes in your organization. You can now use the mail control settings in Exchange Control Panel to create and manage retention tags and policies.
  • Multi-Mailbox Search Enhancements: You can now launch a separate window to preview message hits and statistics for each query. Search performance has also been improved with reduced impact of retried query failures, as well as enhancements to scalability and availability.
  • Migration Enhancements: Two new enhancements to migration features will bring greater efficiency to e-mail migrations.
    • Enhanced Management Capabilities: The new Exchange Online migration dashboard helps to improve administrative efficiency during a cutover Exchange migration, a staged Exchange migration, or an IMAP migration. Tenant administrators can schedule multiple migration batches, obtain migration status information for migration batches, view per user details, and see skipped items. Better reporting and diagnostics tools make troubleshooting easier.
    • Number of Concurrent Migrations: Administrators can now use the Exchange Management Shell to increase the number of concurrent migrations in a migration batch to as many as 50.
  • Exchange Hybrid Configuration Wizard: This wizard will help streamline the hybrid deployment process by simplifying configuration of features and services such as calendar and free/busy information sharing, mailbox moves, secure mail flow and Exchange Online Archiving. This feature is not included as part of the Exchange Online December 2011 Service Update, but will be available in December 2011 as part of the Exchange Server 2010 SP2 release.
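To make the Group Naming Policy bullet concrete, here is a minimal Python sketch of what such a policy does to a requested group name. It assumes blocked words are matched as whole, case-insensitive words and that the prefix and suffix are plain strings; `apply_group_naming_policy` is a hypothetical name, and this is not how Exchange Online actually evaluates the policy.

```python
def apply_group_naming_policy(requested_name, prefix, suffix, blocked_words):
    """Hypothetical model of a distribution group naming policy.

    Rejects names containing a blocked word (whole-word, case-insensitive
    matching is an assumption) and otherwise adds the required prefix and suffix.
    """
    words_in_name = {word.lower() for word in requested_name.split()}
    for blocked in blocked_words:
        if blocked.lower() in words_in_name:
            raise ValueError(f"group name contains blocked word {blocked!r}")
    return f"{prefix}{requested_name}{suffix}"


print(apply_group_naming_policy("Sales Team", "DL-", "-Contoso", ["junk"]))  # prints "DL-Sales Team-Contoso"
```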

We look forward to delivering these updates to you, and as always, let us know what you think!

Cheers,

Steve Chew
Technical Product Manager

Introducing the Hybrid Configuration Wizard


During the beta of Office 365 for Enterprises, we received great feedback from our customers about the process for configuring Exchange in a hybrid deployment with Office 365. In response, we are introducing the Hybrid Configuration Wizard in Exchange 2010 Service Pack 2 to vastly simplify that process.

What is the Hybrid Configuration Wizard?

The Hybrid Configuration Wizard consists of:

  1. A new Exchange Management Console (EMC) wizard that guides you through the end-to-end process for configuring a hybrid deployment.
  2. A set of Exchange Management Shell (EMS) cmdlets that orchestrate the configuration process (as always, the EMC executes these Shell cmdlets).
  3. Improvements to the manageability of some of the underlying hybrid features (no more exchangedelegation.contoso.com or service.contoso.com domains – Yay!)

What does it do?

The hybrid configuration cmdlets take inputs from the wizard, analyze the state of your existing on-premises and cloud organizations, and calculate the steps required to configure both organizations correctly. You can learn more about this process here.

This friendly wizard replaces approximately 50 manual steps with just a few inputs and several clicks of your mouse. Here are some of the top tasks that the Hybrid Configuration Wizard will automatically verify and configure for you:

  1. Verifies that your on-premises and Office 365 organizations meet the prerequisites for a hybrid deployment.
  2. Provisions your on-premises Exchange federation trust.
  3. Creates mutual organization relationships between your on-premises and Exchange Online organizations.
  4. Modifies e-mail address policies to ensure that mailboxes can be moved successfully to Exchange Online in Office 365.
  5. Enables and configures free/busy calendar sharing, message tracking and MailTips for both your on-premises and Exchange Online organizations.
  6. Configures secure mail flow between your on-premises and Exchange Online organizations. You can even choose to have the wizard automatically configure the Exchange Online organization to route mail through your on-premises Exchange organization to meet any additional business or compliance requirements.
  7. Enables support for Exchange Online Archiving for on-premises mailboxes for those customers that have chosen to include archiving in their Office 365 service plan.

Once the hybrid deployment configuration process is complete, the following features are available between your on-premises Exchange organization and Exchange Online:

  • Native mailbox move: Online mailbox moves with automatic Outlook reconfiguration.
  • Free/busy and calendar sharing: Free/busy and calendar sharing between on-premises and Exchange Online mailboxes.
  • Secure mail: TLS-encrypted and authenticated mail flow between your on-premises and Exchange Online organizations.
  • Exchange Online Archiving: Unlimited cloud-based archive storage for your on-premises mailboxes.
  • Message tracking: Integrated message tracking logs across your on-premises and Exchange Online organizations.
  • Multi-mailbox search: A single search request that automatically queries both on-premises and Exchange Online mailboxes.
  • Outlook Web App redirection: Redirects OWA logons for users who have been moved to Exchange Online.
  • MailTips: Ensures that MailTips are available for both your on-premises and Exchange Online organizations.

If you've used the Exchange Server Deployment Assistant to configure a previous hybrid deployment, please note that we’re busy updating the current scenarios to provide guidance based on the automatic configuration process using the Hybrid Configuration Wizard. Watch this blog for announcements when the Deployment Assistant is updated.

With that in mind, we'll be retiring the manual hybrid deployment configuration guidance provided with SP1, and we strongly encourage you to use the wizard wherever possible. Although we'll continue to support manually configured hybrid deployments, we believe that using the new wizard is the easiest, most reliable way of getting deployed and staying correctly configured.

Where can I learn more?

Interested in learning more about this great new tool? Check out Hybrid Deployments with the Hybrid Configuration Wizard on TechNet.

Download Exchange 2010 SP2 and check out the Hybrid Configuration Wizard for yourself.

Ben Appleby
Senior Program Manager

OWA Cross-Site Silent Redirection in Exchange 2010 SP2


By now, many of you have seen the articles that discussed Address Book Policies, hosting changes, and the Hybrid Configuration Wizard, but deep down I know you all have been hoping that we would discuss the most sought-after feature that we decided to include in Exchange 2010 SP2.

What’s that you say?  Yes, I know, Tony said that the “new features in SP2 are unlikely to cause much of a fuss” and that there are not that many new features in SP2 (I believe his exact words were “relative paucity”) in his SP2 announcement article over at Windows IT Pro.

Well, I'm here to set the record straight. There's one killer feature in Exchange 2010 SP2 that often goes unmentioned. That's right – it's time to discuss Cross-Site Silent Redirection for Outlook Web App!

Definitions

First, let’s go over some definitions to make sure we are all on the same page.

  • Internet-facing Active Directory Site: An Active Directory site that contains CAS that have an ExternalURL populated for the associated service (like OWA). Typically this is the primary datacenter/site where Exchange 2010 is deployed.
  • Regional Internet-facing Active Directory Site: An Active Directory site that contains CAS that have an ExternalURL populated for the associated service (like OWA).
  • Non-Internet-facing Active Directory Site: An Active Directory site that contains CAS that do not have an ExternalURL populated for the associated service.
  • Direct Connect: The process where CAS establishes an RPC session with the Mailbox server hosting the mailbox data.
  • Proxy: The process where a CAS in an Internet-facing Active Directory site proxies incoming requests to a CAS in a Non-Internet-facing Active Directory site (that's located in the same site as the Mailbox server being accessed).
  • Redirection: The process where an Internet-facing CAS in one Active Directory site redirects the end user to another Internet-facing CAS that resides in the same site as the Mailbox server being accessed.
  • Silent Redirection: The process by which CAS issues a silent redirect back to the user’s browser, telling the browser to establish a connection to a specified URL.
  • Single Sign-On (SSO) Redirection: The process by which CAS issues a silent redirect back to the user's browser, telling the browser to submit the request and authentication credentials to a target CAS so that the login experience is seamless.

OWA Connection Process

To understand the various proxy and redirection scenarios, it's important to understand what happens when a user authenticates against a CAS to access OWA in Exchange 2010 prior to SP2:

  1. User accesses OWA URL using web browser.
  2. User enters credentials.
  3. CAS authenticates user and retrieves the following information via service discovery request:
    1. User's mailbox version
    2. User's mailbox location (Active Directory site), if known
  4. CAS gathers additional information based on mailbox information so that it can perform the correct operation:
    1. If mailbox is Exchange 2010 and local, CAS performs direct connect.
    2. If mailbox is Exchange 2007 and local, CAS retrieves the ExternalURL of an Exchange 2007 CAS (if one isn't defined it'll use the InternalURL) and silently redirects.
    3. If mailbox is Exchange 2003, CAS retrieves Exchange2003URL and silently redirects.
    4. If mailbox is not local, CAS retrieves the target ExternalURL and redirects; if no OWA ExternalURL is defined in the target Active Directory site, CAS proxies the request instead.
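The step 4 decision can be summarized as a small function. This Python sketch is a simplification for illustration, not Exchange code: `MailboxInfo`, `pre_sp2_owa_action`, and the returned strings are hypothetical, and the mailbox's Active Directory site is assumed to be known.

```python
from dataclasses import dataclass


@dataclass
class MailboxInfo:
    """What the CAS learns from the service discovery request (step 3)."""
    version: int  # 2003, 2007 or 2010
    site: str     # Active Directory site hosting the mailbox


def pre_sp2_owa_action(mailbox, cas_site, target_site_has_owa_external_url):
    """Return the step 4 action a pre-SP2 Exchange 2010 CAS takes."""
    if mailbox.version == 2003:
        # 4.3: Exchange 2003 mailboxes always go to the Exchange2003URL.
        return "silent redirect to Exchange2003URL"
    if mailbox.site == cas_site:
        if mailbox.version == 2010:
            return "direct connect"  # 4.1
        # 4.2: ExternalURL of the Exchange 2007 CAS (InternalURL if none is set).
        return "silent redirect to Exchange 2007 CAS"
    # 4.4: the mailbox lives in another Active Directory site.
    if target_site_has_owa_external_url:
        return "redirect to target ExternalURL"
    return "proxy to target CAS"
```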

SP1 OWA Redirection Types

In Exchange 2010 SP1, we changed things slightly, resulting in three types of redirection experience for OWA in the on-premises product:

  • Manual Redirection
  • Temporary Manual Redirection
  • Legacy Silent Redirection

Manual Redirection

Manual redirection means customers don't have to funnel and proxy all traffic through a central location when there are CAS closer to the user’s mailbox.

Manual redirections are performed when CAS must redirect an OWA request to Exchange 2007 or Exchange 2010 CAS infrastructure that's located in a different Active Directory site. As mentioned previously, in order for a manual redirection to be performed, the target OWA virtual directory must have an ExternalURL. Your users see the following manual redirection message and the ExternalURL of the CAS in the other Active Directory site:

Figure 1: Manual redirection when mailbox is located in another Active Directory site

Temporary Manual Redirection

In SP1, we added another redirection type for OWA, known as Temporary Manual Redirection. There are two scenarios where Temporary Manual Redirection comes into play:

  1. During a datacenter activation switchback event, the user’s web browser may still have the old DNS entry cached and thus point to the CAS infrastructure in the Active Directory site that no longer hosts the mailbox. As a result, the CAS will issue a manual redirect to the correct Active Directory site, but the redirection is to the same URL that the user is currently using. To prevent a ping-pong effect where the user cannot access his mail, CAS detects whether the same session cookie is being returned and, if so, checks whether the target CAS has a FailbackURL value for the OWA virtual directory. If a FailbackURL is specified, CAS issues a temporary manual redirection page providing the FailbackURL link. If a FailbackURL is not specified, CAS issues an error page asking the user to close all browser sessions and try again.

    Figure 2: Temporary manual redirection upon datacenter activation switchback
  2. In the second scenario, CAS issues the temporary manual redirection page when it detects that the local CAS's site matches that of the Mailbox database's RpcClientAccessServer value, but the database is actually mounted in a different Active Directory site. In that case, CAS issues a temporary redirect with the ExternalURL of the CAS in the site hosting the mounted database.

    Figure 3: Temporary manual redirection when mailbox is mounted in another Active Directory site
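Both Temporary Manual Redirection scenarios reduce to simple checks. The following Python sketch models them under stated assumptions: the function names, parameters, and returned strings are hypothetical, and the CAS's real cookie and site detection is reduced to booleans and site names.

```python
from typing import Optional


def switchback_response(same_session_cookie_returned: bool,
                        failback_url: Optional[str]) -> str:
    """Scenario 1: avoid a redirect loop after a datacenter switchback."""
    if not same_session_cookie_returned:
        return "manual redirect"
    if failback_url:
        return f"temporary manual redirect to {failback_url}"
    return "error page: close all browser sessions and try again"


def mounted_elsewhere_response(local_site: str, rpc_client_access_site: str,
                               mounted_site: str,
                               external_url_of_mounted_site: str) -> Optional[str]:
    """Scenario 2: RpcClientAccessServer points at this site, but the
    database is mounted in another Active Directory site."""
    if rpc_client_access_site == local_site and mounted_site != local_site:
        return f"temporary manual redirect to {external_url_of_mounted_site}"
    return None  # scenario does not apply
```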

Legacy Silent Redirection

Exchange 2010 CAS does not support rendering Outlook Web Access mailbox data from legacy versions of Exchange. Instead, it handles one of four scenarios depending on the target mailbox's version and/or location:

  • If the Exchange 2007 mailbox is in the same Active Directory Site as CAS2010, CAS2010 will silently redirect the session to the Exchange 2007 CAS.
  • If the Exchange 2007 mailbox is in another Internet-facing Active Directory Site, CAS2010 will manually redirect the user to the Exchange 2007 CAS.
  • If the Exchange 2007 mailbox is in a non-Internet-facing Active Directory site, CAS2010 will proxy the connection to the Exchange 2007 CAS.
  • If the mailbox is on an Exchange 2003 server, CAS2010 will silently redirect the session to a pre-defined URL.

As indicated above, legacy silent redirection is only used for same-site redirection events between an Exchange 2010 CAS and the legacy infrastructure. When performing the legacy silent redirection, CAS2010 issues a silent redirect back to the user’s browser, telling the browser to establish a connection to legacy CAS2007/FE2003 infrastructure. In order to successfully redirect to the legacy infrastructure, the following must be configured:

  • To redirect Exchange 2003 mailboxes, the Exchange 2010 OWA virtual directory must have the Exchange2003URL populated.
  • To redirect to an Exchange 2007 CAS, the target Exchange 2007 OWA virtual directory must have the ExternalURL populated.
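The four legacy scenarios and the two configuration requirements above can be combined into one decision function. This is a minimal Python sketch, not the CAS implementation; `legacy_owa_action`, its parameters, and the returned tuples are hypothetical, and a missing URL raises an error simply to show that the redirect cannot happen without it.

```python
def legacy_owa_action(mailbox_version, mailbox_site, cas_site,
                      internet_facing_sites, exchange2003_url=None,
                      exchange2007_external_url=None):
    """Return (action, target) for an OWA request to a legacy mailbox."""
    if mailbox_version == 2003:
        if not exchange2003_url:
            raise ValueError("Exchange2003URL must be set on the Exchange 2010 OWA virtual directory")
        return ("silent redirect", exchange2003_url)
    if mailbox_version != 2007:
        raise ValueError("only Exchange 2003 and 2007 mailboxes are modeled here")
    if mailbox_site == cas_site:
        if not exchange2007_external_url:
            raise ValueError("the Exchange 2007 OWA virtual directory needs an ExternalURL")
        return ("silent redirect", exchange2007_external_url)
    if mailbox_site in internet_facing_sites:
        return ("manual redirect", "Exchange 2007 CAS in " + mailbox_site)
    return ("proxy", "Exchange 2007 CAS in " + mailbox_site)
```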

Legacy Silent Redirection can also provide a single sign-on experience when Forms-Based Authentication (FBA) is used on the source and destination OWA virtual directories. CAS2010 does this by returning to the web browser a hidden FBA form with the fields populated. This hidden form contains the same information that the user originally submitted to the CAS2010 FBA page (username, password, public/private selector), as well as a redirect to the target Exchange-specific path and query string. As soon as this form is loaded, it is immediately submitted to the target URL. The result is that the user is automatically authenticated and can access the mailbox data.
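The hidden-form technique behind this single sign-on can be illustrated with a short Python sketch that builds a self-submitting HTML page. It is only an illustration: `build_sso_form`, the field names (`username`, `password`, `security`), and the target URL are hypothetical, and the real CAS2010 form fields and paths are not shown here.

```python
import html


def build_sso_form(target_url, fields):
    """Return an HTML page whose hidden form posts `fields` to `target_url`
    and submits itself as soon as the page loads."""
    inputs = "\n".join(
        f'    <input type="hidden" name="{html.escape(name)}" value="{html.escape(value)}">'
        for name, value in fields.items()
    )
    return (
        "<html><body onload=\"document.forms[0].submit()\">\n"
        f'  <form method="post" action="{html.escape(target_url)}">\n'
        f"{inputs}\n"
        "  </form>\n"
        "</body></html>"
    )


page = build_sso_form("https://legacy.contoso.com/owa/",
                      {"username": "contoso\\kim", "password": "secret", "security": "private"})
```

Escaping every value matters here, because the form echoes back user-supplied input such as the password.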

What’s wrong with Manual Redirection?

At first glance, you might think, “hey, manual redirection is great, Microsoft” and to some extent you are correct. It is a great way for the IT organization to control where users access their data (and thus force users onto the correct network links). But in reality, the experience is not optimal for the end user. In the scenario where the user uses the wrong OWA URL, the user performs the following actions:

  1. User enters the wrong URL into the web browser.
  2. User enters credentials and authenticates against CAS (wrong site).
  3. CAS (wrong site) performs service discovery and determines that it can redirect user to the correct CAS.
  4. CAS (wrong site) provides the user with a page that contains a link to CAS (correct site).
  5. User clicks link to access OWA from the correct site.
  6. User enters credentials and authenticates against CAS (correct site).

Being told they used the wrong URL, and having to enter their credentials twice, are what make manual redirection a sub-optimal experience.

Cross-Site Silent Redirection in Exchange 2010 SP2

To remove this sub-optimal experience (Greg refers to this as a crappy experience, by the way), we've provided a fourth redirection experience for OWA in Exchange 2010 SP2, known as Cross-Site Silent Redirection. As its name implies, Cross-Site Silent Redirection only performs silent redirection for requests that are destined to CAS located in another Active Directory site (within the same Exchange organization) that have an OWA ExternalURL.

A new parameter, CrossSiteRedirectType, has been created to support Cross-Site Silent Redirection. This parameter is available on the Set-OWAVirtualDirectory cmdlet and supports two values, Manual and Silent. Cross-Site Silent Redirection is disabled by default (the default value is Manual), meaning that if you currently perform manual redirection between CAS in different Active Directory sites, it will continue after you deploy SP2.

If you want to enable Cross-Site Silent Redirection, set the CrossSiteRedirectType to Silent on the Internet-facing CAS OWA virtual directories:

Set-OWAVirtualDirectory -Identity "Contoso\owa (Default Web site)" -CrossSiteRedirectType Silent

We've updated the OWA connection process to support Cross-Site Silent Redirection. The CAS performs the following steps during service discovery:

  1. Evaluate the mailbox version (either Exchange 2007 or Exchange 2010).
  2. Check the mailbox's location.
  3. Obtain the ExternalURL of target CAS.
  4. Obtain the redirection type on the source CAS.
    1. If CrossSiteRedirectType=Manual, we issue a manual redirect.
    2. If CrossSiteRedirectType=Silent, we issue a silent redirect.
      1. If source and target CAS have FBA enabled, then the source CAS issues a hidden form back to the browser that contains the user’s credentials and FBA settings, along with the redirect URL.
      2. If FBA is not enabled on both the source and target CAS, the source CAS simply issues a 302 redirect.
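The SP2 service discovery steps above map onto a small decision function. This Python sketch is an illustration, not Exchange code: `cross_site_redirect` and its returned tuples are hypothetical, it assumes step 2 has already found the mailbox in another Active Directory site, and it falls back to proxying when the target has no OWA ExternalURL, as in the pre-SP2 behavior.

```python
def cross_site_redirect(mailbox_version, redirect_type, source_uses_fba,
                        target_uses_fba, target_external_url):
    """Return (response, url) for a mailbox located in another AD site."""
    if mailbox_version not in (2007, 2010):              # step 1
        raise ValueError("only Exchange 2007/2010 mailboxes are modeled here")
    if not target_external_url:                          # step 3
        return ("proxy", None)
    if redirect_type == "Manual":                        # step 4.1 (the default)
        return ("manual redirect page", target_external_url)
    if redirect_type == "Silent":                        # step 4.2
        if source_uses_fba and target_uses_fba:          # 4.2.1: SSO via hidden form
            return ("hidden FBA form", target_external_url)
        return ("HTTP 302", target_external_url)         # 4.2.2
    raise ValueError("CrossSiteRedirectType must be Manual or Silent")
```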

That’s right; Cross-Site Silent Redirection can provide an SSO experience when the source and target OWA virtual directories leverage Forms-Based Authentication. Customers that only deploy OWA internally can also achieve an SSO experience when the OWA virtual directory authentication mechanism is Windows Integrated Authentication and the OWA namespaces are added to the “Local Intranet” security zone.

When can I not obtain a SSO Experience?

These are the few scenarios where you can't obtain a SSO experience when redirecting between Active Directory sites:

  1. You use Basic Authentication on the source and target OWA virtual directories.
  2. You leverage different authentication settings on the source and target OWA virtual directories.
  3. You leverage a two-factor authentication solution on the source and target OWA virtual directories.
  4. You leverage a pre-authentication solution (like Microsoft Threat Management Gateway 2010) that uses different web listeners for the source and target OWA namespaces.

Keep in mind that while the SSO experience will be unavailable in these scenarios, a 302 redirection (what we refer to as a silent redirection) will still occur.
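The SSO rules from the last two sections can be expressed as a single predicate. This is a hedged Python sketch: `OwaAuthSettings`, `sso_available`, and the method labels `"FBA"`, `"Basic"` and `"Windows"` are hypothetical, and a `False` result only means no SSO. Per the text, a 302 silent redirection still happens.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OwaAuthSettings:
    method: str                              # "FBA", "Basic" or "Windows" (assumed labels)
    two_factor: bool = False
    preauth_listener: Optional[str] = None   # e.g. a TMG web listener name


def sso_available(source, target, namespaces_in_local_intranet_zone=False):
    """True if cross-site silent redirection can also sign the user in."""
    if source != target:        # scenarios 2 and 4: differing settings or listeners
        return False
    if source.method == "Basic":  # scenario 1
        return False
    if source.two_factor:       # scenario 3
        return False
    if source.method == "Windows":
        # Internal-only deployments need the OWA namespaces in "Local Intranet".
        return namespaces_in_local_intranet_zone
    return source.method == "FBA"
```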

Cross-Site Silent Redirection reduces the end user dissatisfaction around having to click a link to get to the correct OWA infrastructure, and may in fact remove the need to enter credentials a second time. For those of you that have been using OWA Manual Redirection up to now, I hope you will enable Cross-Site Silent Redirection when you deploy Exchange 2010 SP2!

Ross Smith IV
Principal Program Manager
Exchange Customer Experience

Database Maintenance in Exchange 2010


Over the last several months there has been significant chatter around what background database maintenance is and why it is important for Exchange 2010 databases. Hopefully this article will answer these questions.

What maintenance tasks need to be performed against the database?

The following tasks need to be routinely performed against Exchange databases:

Database Compaction

The primary purpose of database compaction is to free up unused space within the database file (however, it should be noted that this does not return that unused space to the file system). The intention is to free up pages in the database by compacting records onto the fewest number of pages possible, thus reducing the amount of I/O necessary. The ESE database engine does this by taking the database metadata, which is the information within the database that describes tables in the database, and for each table, visiting each page in the table, and attempting to move records onto logically ordered pages.
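The core idea of compaction, moving records onto the fewest pages while keeping them in logical order, can be shown with a toy model. This Python sketch is a simplification: `compact_table` is hypothetical, each page holds a fixed number of records rather than a byte budget, and it ignores the contiguity preference that Exchange 2010 adds.

```python
def compact_table(pages, records_per_page):
    """Move records onto the fewest pages possible, keeping logical order.

    Returns the compacted pages and how many pages were freed. The freed
    pages stay inside the database file; they are not returned to the
    file system.
    """
    records = [record for page in pages for record in page]
    compacted = [records[i:i + records_per_page]
                 for i in range(0, len(records), records_per_page)]
    return compacted, len(pages) - len(compacted)


print(compact_table([["a"], ["b", "c"], [], ["d"]], 2))  # ([['a', 'b'], ['c', 'd']], 2)
```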

Maintaining a lean database file footprint is important for several reasons, including the following:

  1. Reducing the time associated with backing up the database file
  2. Maintaining a predictable database file size, which is important for server/storage sizing purposes.

Prior to Exchange 2010, database compaction operations were performed during the online maintenance window. This process produced random IO as it walked the database and re-ordered records across pages. This process was literally too good in previous versions – by freeing up database pages and re-ordering the records, the pages were always in a random order. Coupled with the store schema architecture, this meant that any request to pull a set of data (like downloading items within a folder) always resulted in random IO.

In Exchange 2010, database compaction was redesigned such that contiguity is preferred over space compaction. In addition, database compaction was moved out of the online maintenance window and is now a background process that runs continuously.

Database Defragmentation

Database defragmentation is new to Exchange 2010 and is also referred to as OLD v2 and B+ tree defragmentation. Its function is to compact as well as defragment (make sequential) database tables that have been marked/hinted as sequential. Database defragmentation is important to maintain efficient utilization of disk resources over time (make the IO more sequential as opposed to random) as well as to maintain the compactness of tables marked as sequential.

You can think of the database defragmentation process as a monitor that watches other database page operations to determine if there is work to do. It monitors all tables for free pages, and if a table reaches a threshold where a significantly high percentage of the total B+ tree page count is free, it gives the free pages back to the root. It also works to maintain contiguity within a table set with sequential space hints (a table created with a known sequential usage pattern). If database defragmentation sees a scan/pre-read on a sequential table and the records are not stored on sequential pages within the table, the process will defrag that section of the table by moving all of the impacted pages to a new extent in the B+ tree. You can use the performance counters (mentioned in the monitoring section) to see how little work database defragmentation performs once a steady state is reached.

Database defragmentation is a background process that analyzes the database continuously as operations are performed, and then triggers asynchronous work when necessary. Database defragmentation is throttled under two scenarios:

  1. The max number of outstanding tasks: This keeps database defragmentation from doing too much work the first pass if massive change has occurred in the database.
  2. A latency throttle of 100ms: When the system is overloaded, database defragmentation will start punting defragmentation work. Punted work will get executed the next time the database goes through that same operational pattern. There's nothing that remembers what defragmentation work was punted and goes back and executes it once the system has more resources.
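The monitor-and-throttle behavior described above can be modeled in a few lines. This is a toy Python sketch, not ESE code: `DefragMonitor`, `should_release_free_pages`, and the 0.5 free-page threshold are hypothetical (the text gives no number), while the 100 ms latency throttle and the "punted work is not remembered" rule come from the text.

```python
LATENCY_THROTTLE_MS = 100  # from the text


def should_release_free_pages(free_pages, total_pages, threshold=0.5):
    """True when a large share of a B+ tree's pages is free (threshold assumed)."""
    return total_pages > 0 and free_pages / total_pages >= threshold


class DefragMonitor:
    def __init__(self, max_outstanding_tasks):
        self.max_outstanding_tasks = max_outstanding_tasks
        self.outstanding_tasks = 0

    def on_sequential_scan(self, page_numbers, io_latency_ms):
        """Called when a scan/pre-read touches a table with a sequential hint."""
        contiguous = all(b == a + 1 for a, b in zip(page_numbers, page_numbers[1:]))
        if contiguous:
            return "no work"
        if (self.outstanding_tasks >= self.max_outstanding_tasks
                or io_latency_ms > LATENCY_THROTTLE_MS):
            # Punted work is not queued anywhere; it only happens again if
            # the same operational pattern recurs.
            return "punted"
        self.outstanding_tasks += 1  # an asynchronous defrag task starts
        return "defrag scheduled"

    def task_completed(self):
        self.outstanding_tasks -= 1
```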

Database Checksumming

Database checksumming (also known as Online Database Scanning) is the process where the database is read in large chunks and each page is checksummed (checked for physical page corruption). Checksumming’s primary purpose is to detect physical corruption and lost flushes that may not be getting detected by transactional operations (stale pages).

With Exchange 2007 RTM and all previous versions, checksumming operations happened during the backup process. This posed a problem for replicated databases, as the only copy to be checksummed was the copy being backed up. For the scenario where the passive copy was being backed up, this meant that the active copy was not being checksummed. So in Exchange 2007 SP1, we introduced a new optional online maintenance task, Online Maintenance Checksum (for more information, see Exchange 2007 SP1 ESE Changes – Part 2).

In Exchange 2010, database scanning checksums the database and also performs cleanup operations after Exchange 2010 Store crashes: space can be leaked due to crashes, and online database scanning finds and recovers that lost space. Database checksumming reads approximately 5 MB per second for each actively scanning database (both active and passive copies) using 256 KB IOs. The I/O is 100 percent sequential. The system in Exchange 2010 is designed with the expectation that every database is fully scanned once every seven days.
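The figures in the previous paragraph allow a quick back-of-the-envelope estimate of how long one checksum pass takes. This Python sketch assumes a steady 5 MB/s and 1 GB = 1024 MB. Under those assumptions, a 1 TB database needs roughly 2.4 days per pass, and about 2,953 GB is the most that fits in the seven-day window. Real rates vary with disk latency and load, and the function names are hypothetical.

```python
SCAN_RATE_MB_PER_S = 5        # approximate, per actively scanning database
IO_SIZE_KB = 256
SECONDS_PER_DAY = 86_400
TARGET_DAYS = 7


def days_for_full_scan(database_size_gb):
    """Days one checksum pass takes at ~5 MB/s."""
    return database_size_gb * 1024 / SCAN_RATE_MB_PER_S / SECONDS_PER_DAY


def largest_database_gb_within_target(days=TARGET_DAYS):
    """Largest database (GB) that can be fully scanned within `days`."""
    return SCAN_RATE_MB_PER_S * days * SECONDS_PER_DAY / 1024


reads_per_second = SCAN_RATE_MB_PER_S * 1024 / IO_SIZE_KB   # 20.0 reads of 256 KB
print(round(days_for_full_scan(1024), 2))                   # 2.43 (1 TB)
print(largest_database_gb_within_target())                  # 2953.125
```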

If the scan takes longer than seven days, an event is recorded in the Application Log:

Event ID: 733
Event Type: Information
Event Source: ESE
Description: Information Store (15964) MDB01: Online Maintenance Database Checksumming background task is NOT finishing on time for database 'd:\mdb\mdb01.edb'. This pass started on 11/10/2011 and has been running for 604800 seconds (over 7 days) so far.

If it takes longer than seven days to complete the scan on the active database copy, the following entry will be recorded in the Application Log once the scan has completed:

Event ID: 735
Event Type: Information
Event Source: ESE
Description: Information Store (15964) MDB01 Database Maintenance has completed a full pass on database 'd:\mdb\mdb01.edb'. This pass started on 11/10/2011 and ran for a total of 777600 seconds. This database maintenance task exceeded the 7 day maintenance completion threshold. One or more of the following actions should be taken: increase the IO performance/throughput of the volume hosting the database, reduce the database size, and/or reduce non-database maintenance IO.
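If you collect these events, the durations in their descriptions are easy to turn into days. This Python sketch works under stated assumptions: the regular expression is based only on the two sample messages above, treating exactly 604,800 seconds as "over" the threshold is inferred from the event 733 sample, and `checksum_pass_days` and `expected_event` are hypothetical helpers.

```python
import re

SECONDS_PER_DAY = 86_400
THRESHOLD_SECONDS = 7 * SECONDS_PER_DAY  # 604800

_DURATION = re.compile(r"(?:running for|ran for a total of) (\d+) seconds")


def checksum_pass_days(description):
    """Extract the pass duration, in days, from an event 733/735 description."""
    match = _DURATION.search(description)
    if match is None:
        raise ValueError("no duration found in event description")
    return int(match.group(1)) / SECONDS_PER_DAY


def expected_event(elapsed_seconds, pass_completed):
    """733 while an overdue pass is still running, 735 once it completes."""
    if elapsed_seconds < THRESHOLD_SECONDS:
        return None
    return 735 if pass_completed else 733
```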

In addition, an in-flight warning will also be recorded in the Application Log when it takes longer than 7 days to complete.

In Exchange 2010, there are now two modes to run database checksumming on active database copies:

  1. Run in the background 24×7: This is the default behavior. It should be used for all databases, especially for databases that are larger than 1TB. Exchange scans the database no more than once per day. This read I/O is 100 percent sequential (which makes it easy on the disk) and equates to a scanning rate of about 5 megabytes (MB)/sec on most systems. The scanning process is single threaded and is throttled by IO latency. The higher the latency, the more database checksum slows down because it is waiting longer for the last batch to complete before issuing another batch scan of pages (8 pages are read at a time).
  2. Run in the scheduled mailbox database maintenance process: When you select this option, database checksumming is the last task. You can configure how long it runs by changing the mailbox database maintenance schedule. This option should only be used with databases smaller than 1 terabyte (TB) in size, which require less time to complete a full scan.
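The background scan described in option 1 is a single-threaded loop that reads 8 pages at a time and waits for each read before issuing the next. With Exchange 2010's 32 KB pages, each batch is one 256 KB read. This Python sketch is illustrative only: `checksum_scan` and `read_batch` are hypothetical, and `zlib.crc32` stands in for ESE's own page checksum, which is a different algorithm.

```python
import zlib

PAGE_SIZE_KB = 32        # Exchange 2010 database page size
PAGES_PER_BATCH = 8      # 8 x 32 KB = one 256 KB read


def checksum_scan(total_pages, read_batch, stored_checksum):
    """Sequentially scan all pages and return the numbers of corrupt pages.

    read_batch(page_numbers) blocks until the read completes, so a slow
    disk automatically slows the scan: the next batch is issued only after
    the previous one has returned.
    """
    corrupt = []
    for first in range(0, total_pages, PAGES_PER_BATCH):
        batch = list(range(first, min(first + PAGES_PER_BATCH, total_pages)))
        for page_number, data in zip(batch, read_batch(batch)):
            if zlib.crc32(data) != stored_checksum[page_number]:
                corrupt.append(page_number)
    return corrupt
```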

Regardless of the database size, our recommendation is to leverage the default behavior and not configure database checksum operations against the active database as a scheduled process (i.e., don’t configure it as a process within the online maintenance window).

For passive database copies, database checksums occur during runtime, continuously operating in the background.

Page Patching

Page patching is the process where corrupt pages are replaced by healthy copies. As mentioned previously, corrupt page detection is a function of database checksumming (in addition, corrupt pages are also detected at run time when the page is stored in the database cache). Page patching works against highly available (HA) database copies. How a corrupt page is repaired depends on whether the HA database copy is active or passive.

Page patching process

On active database copies:

  1. One or more corrupt pages are detected.
  2. A marker is written into the active log file. This marker indicates the corrupt page number and that the page requires replacement.
  3. An entry is added to the page patch request list.
  4. The active log file is closed.
  5. The Replication service ships the log file to the passive database copies.
  6. The Replication service on a target Mailbox server receives the shipped log file and inspects it.
  7. The Information Store on the target server replays the log file up to the marker, retrieves its healthy version of the page, invokes the Replay Service callback, and ships the page to the source Mailbox server.
  8. The source Mailbox server receives the healthy version of the page, confirms that there is an entry in the page patch request list, and then writes the page to the log buffer; correspondingly, the page is inserted into the database cache.
  9. The corresponding entry in the page patch request list is removed.
  10. At this point the database is considered patched (at some later point the checkpoint advances, the database cache is flushed, and the corrupt page on disk is overwritten).
  11. Any other copy of this page (received from another passive copy) is silently dropped, because there is no longer a corresponding entry in the page patch request list.

On passive database copies:

  1. On the Mailbox server where the corrupt page(s) are detected, log replay is paused for the affected database copy.
  2. The Replication service coordinates with the Mailbox server that is hosting the active database copy and retrieves the corrupted page(s) and the required log range from the active copy’s database header.
  3. The Mailbox server updates the database header for the affected database copy, inserting the new required log range.
  4. The Mailbox server notifies the Mailbox server hosting the active database copy which log files it requires.
  5. The Mailbox server receives the required log files and inspects them.
  6. The Mailbox server injects the healthy versions of the database pages it retrieved from the active database copy. The pages are written to the log buffer, and correspondingly, the pages are inserted into the database cache.
  7. The Mailbox server resumes log replay.
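On the active copy, the page patch request list is what lets the source server accept the first healthy page and silently drop later duplicates. The following Python sketch is a toy model of only that bookkeeping (steps 2 and 3, then 8 to 11). The class and method names are hypothetical, log shipping and replay on the passive copies (steps 4 to 7) are not modeled, and the real ESE structures are far more involved.

```python
class ActiveCopy:
    """Toy model of the page patch request list on an active copy."""

    def __init__(self, pages):
        self.cache = dict(pages)      # database cache: page number -> bytes
        self.log_buffer = []          # records written to the active log
        self.patch_requests = set()   # the page patch request list

    def detect_corruption(self, page_no):
        # Steps 2-3: log a marker for the page and add a patch request.
        self.log_buffer.append(("patch-marker", page_no))
        self.patch_requests.add(page_no)

    def receive_page(self, page_no, healthy_data):
        """Apply a healthy page shipped back by a passive copy.

        Returns True if the page was applied, False if it was dropped.
        """
        # Step 11: no matching request (e.g. a duplicate sent by a second
        # passive copy) means the page is silently dropped.
        if page_no not in self.patch_requests:
            return False
        # Step 8: write the page to the log buffer and into the cache.
        self.log_buffer.append(("page", page_no, healthy_data))
        self.cache[page_no] = healthy_data
        # Steps 9-10: remove the request; the database counts as patched.
        self.patch_requests.remove(page_no)
        return True


db = ActiveCopy({7: b"corrupt"})
db.detect_corruption(7)
print(db.receive_page(7, b"healthy"))   # True: a request existed
print(db.receive_page(7, b"healthy"))   # False: duplicate dropped
```

Keeping the request list separate from the cache is what makes duplicates harmless: once the entry is removed, any later copy of the same page has nothing to match.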

Page Zeroing

Database Page Zeroing is the process where deleted pages in the database are written over with a pattern (zeroed) as a security measure, which makes discovering the data much more difficult.

With Exchange 2007 RTM and all previous versions, page zeroing operations happened during the streaming backup process. In addition, since they occurred during the streaming backup process, they were not a logged operation (i.e., page zeroing did not result in the generation of log files). This posed a problem for replicated databases: the passive copies never had their pages zeroed, and the active copies would only have their pages zeroed if you performed a streaming backup. So in Exchange 2007 SP1, we introduced a new optional online maintenance task, Zero Database Pages during Checksum (for more information, see Exchange 2007 SP1 ESE Changes – Part 2). When enabled, this task zeroed out pages during the online maintenance window and logged the changes, which were then replicated to the passive copies.

With the Exchange 2007 SP1 implementation, there is a significant lag between when a page is deleted and when it is zeroed, because the zeroing process occurs during a scheduled maintenance window. So in Exchange 2010 SP1, the page zeroing task is now a runtime event that operates continuously, typically zeroing out pages at transaction time when a hard delete occurs.

In addition, database pages can also be scrubbed during the online checksum process. The pages targeted in this case are:

  • Deleted records which couldn’t be scrubbed during runtime due to dropped tasks (if the system is too overloaded) or because Store crashed before the tasks got to scrub the data;
  • Deleted tables and secondary indices. When these get deleted, we don’t actively scrub their contents, so online checksum detects that these pages don’t belong to any valid object anymore and scrubs them.
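To make the two zeroing paths concrete, here is a toy Python model, not Exchange code. A hard delete normally scrubs the page immediately, and the online checksum pass scrubs pages that no longer belong to any valid object, either because a runtime scrub task was dropped or because a table or index was deleted. The class name, the tiny page size, and the all-zero overwrite pattern are assumptions for illustration (the post only says deleted pages are written over with a pattern). Every scrub is appended to a log, reflecting the point above that logged zeroing is what allows the changes to reach passive copies.

```python
ZERO_BYTE = b"\x00"  # assumed overwrite pattern for this sketch


class ToyDatabase:
    """Toy model of runtime page zeroing plus checksum-time scrubbing."""

    def __init__(self, page_size=32):
        self.page_size = page_size
        self.pages = {}   # page number -> page contents
        self.owner = {}   # page number -> owning table or index
        self.log = []     # scrubs are logged so they can replicate

    def write(self, page_no, owner, data):
        # Pad or truncate the data to exactly one page.
        padded = data.ljust(self.page_size, ZERO_BYTE)
        self.pages[page_no] = padded[: self.page_size]
        self.owner[page_no] = owner

    def _scrub(self, page_no):
        self.pages[page_no] = ZERO_BYTE * self.page_size
        self.log.append(("zero", page_no))

    def hard_delete(self, page_no, scrub_now=True):
        # Runtime path: the page is normally zeroed at transaction time.
        # scrub_now=False models a scrub task dropped under load.
        self.owner.pop(page_no, None)
        if scrub_now:
            self._scrub(page_no)

    def drop_table(self, name):
        # Deleted tables and indices are not actively scrubbed;
        # their pages simply stop belonging to any valid object.
        for page_no in [p for p, o in self.owner.items() if o == name]:
            del self.owner[page_no]

    def online_checksum_pass(self):
        # Checksum path: scrub orphaned pages that still hold data.
        empty = ZERO_BYTE * self.page_size
        scrubbed = []
        for page_no, data in list(self.pages.items()):
            if page_no not in self.owner and data != empty:
                self._scrub(page_no)
                scrubbed.append(page_no)
        return scrubbed
```

With one page hard-deleted normally, one whose scrub task was dropped, and one belonging to a dropped index, `online_checksum_pass()` scrubs only the last two, because the first was already zeroed at transaction time.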

For more information on page zeroing in Exchange 2010, see Understanding Exchange 2010 Page Zeroing.

Why aren’t these tasks simply performed during a scheduled maintenance window?

Requiring a scheduled maintenance window for page zeroing, database defragmentation, database compaction, and online checksum operations poses significant problems, including the following:

  1. Having scheduled maintenance operations makes it very difficult to manage 24x7 datacenters which host mailboxes from various time zones and have little or no time for a scheduled maintenance window. Database compaction in prior versions of Exchange had no throttling mechanisms and, since the IO is predominantly random, it could lead to a poor user experience.
  2. Exchange 2010 Mailbox databases deployed on lower-tier storage (e.g., 7.2K RPM SATA/SAS) have reduced effective IO bandwidth available to ESE to perform maintenance window tasks. This is an issue because IO latencies increase during the maintenance window, which can prevent the maintenance activities from completing within the desired period of time.
  3. The use of JBOD provides an additional challenge to the database in terms of data verification. With RAID storage, it's common for an array controller to background scan a given disk group, locating and re-assigning bad blocks. A bad block (aka sector) is a block on a disk that cannot be used due to permanent damage (e.g., physical damage to the disk surface). It's also common for an array controller to read the alternate mirrored disk if a bad block was detected on the initial read request. The array controller will subsequently mark the bad block as “bad” and write the data to a new block. All of this occurs without the application knowing, perhaps with just a slight increase in the disk read latency. Without RAID or an array controller, neither of these bad block detection and remediation methods is available, so it's up to the application (ESE) to detect bad blocks and remediate them (i.e., database checksumming).
  4. Larger databases on larger disks require longer maintenance periods to maintain database sequentiality/compactness.

Due to the aforementioned issues, it was critical in Exchange 2010 that the database maintenance tasks be moved out of a scheduled process and be performed during runtime continuously in the background.

Won’t these background tasks impact my end users?

We’ve designed these background tasks so that they're automatically throttled based on activity occurring against the database. In addition, our sizing guidance around message profiles takes these maintenance tasks into account. However, you must take care when designing your storage architecture. If you plan to store multiple databases on the same LUN or volume, ensure that the aggregate size of all the databases does not exceed 2 TB. This is because database maintenance is throttled by serializing based on the number of databases per volume, and this throttling assumes that the aggregate size is not greater than 2 TB.
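A quick way to sanity-check a storage design against the 2 TB guidance is to total the database sizes per volume. The Python sketch below does that for a hand-entered inventory; the tuple layout, the volume and database names, and the choice of binary units (2 TB taken as 2,048 GB) are assumptions, so adjust them to however you collect database sizes.

```python
from collections import defaultdict

# 2 TB expressed in GB; binary units are an assumption of this sketch.
AGGREGATE_LIMIT_GB = 2 * 1024


def oversized_volumes(databases):
    """Return {volume: total_gb} for volumes whose databases exceed the limit.

    `databases` is an iterable of (database_name, volume, size_gb) tuples.
    """
    totals = defaultdict(float)
    for _name, volume, size_gb in databases:
        totals[volume] += size_gb
    return {vol: total for vol, total in totals.items()
            if total > AGGREGATE_LIMIT_GB}


inventory = [
    ("MDB1", "VolumeA", 1200),   # hypothetical sizes in GB
    ("MDB2", "VolumeA", 1000),
    ("MDB3", "VolumeB", 1500),
]
print(oversized_volumes(inventory))   # prints {'VolumeA': 2200.0}
```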

How can I monitor the effectiveness of these background maintenance tasks?

In previous versions of Exchange, events in the Application Log would be used to monitor things like online defragmentation. In Exchange 2010, there are no longer any events recorded for the defragmentation and compaction maintenance tasks. However, you can use performance counters to track the background maintenance tasks under the MSExchange Database ==> Instances object:

  • Database Maintenance Duration: The number of hours that have passed since the maintenance last completed for this database.
  • Database Maintenance Pages Bad Checksums: The number of non-correctable page checksums encountered during a database maintenance pass.
  • Defragmentation Tasks: The count of background database defragmentation tasks that are currently executing.
  • Defragmentation Tasks Completed/Sec: The rate of background database defragmentation tasks that are being completed.

You'll find the following page zeroing counters under the MSExchange Database object:

  • Database Maintenance Pages Zeroed: Indicates the number of pages zeroed by the database engine since the performance counter was invoked.
  • Database Maintenance Pages Zeroed/sec: Indicates the rate at which pages are zeroed by the database engine.

How can I check whitespace in a database?

You can use the Shell to check available whitespace in a database. For mailbox databases, use:

Get-MailboxDatabase MDB1 -Status | FL AvailableNewMailboxSpace

For Public Folder databases, use:

Get-PublicFolderDatabase PFDB1 -Status | FL AvailableNewMailboxSpace
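The Shell reports sizes in a human-readable form, so comparing AvailableNewMailboxSpace with the database's DatabaseSize value (also returned when you use -Status) takes a little parsing. This Python sketch assumes both values were copied out as strings shaped like `12.5 GB (13,421,772,800 bytes)` and uses only the exact byte count in parentheses; if your output looks different, the regular expression is the part to change. The function names and sample values are hypothetical, and the post gives no whitespace percentage at which you should act.

```python
import re

# Assumed output shape: "<number> <unit> (<digits with commas> bytes)".
_BYTE_COUNT = re.compile(r"\(([\d,]+) bytes\)")


def to_bytes(size_text: str) -> int:
    """Extract the exact byte count from a size string printed by the Shell."""
    match = _BYTE_COUNT.search(size_text)
    if match is None:
        raise ValueError(f"no byte count found in {size_text!r}")
    return int(match.group(1).replace(",", ""))


def whitespace_percent(database_size: str, available_space: str) -> float:
    """Available whitespace as a percentage of the database file size."""
    return 100.0 * to_bytes(available_space) / to_bytes(database_size)


# Hypothetical values for a database named MDB1.
print(whitespace_percent("100 GB (107,374,182,400 bytes)",
                         "25 GB (26,843,545,600 bytes)"))   # prints 25.0
```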

How can I reclaim the whitespace?

Naturally, after seeing the available whitespace in the database, the question that always ensues is – how can I reclaim the whitespace?

Many assume the answer is to perform an offline defragmentation of the database using ESEUTIL. However, that's not our recommendation. When you perform an offline defragmentation, you create an entirely new database, and the operations performed to create this new database are not logged in transaction logs. The new database also has a new database signature, which means that you invalidate the database copies associated with this database.

In the event that you do encounter a database that has significant whitespace and you don't expect that normal operations will reclaim it, our recommendation is:

  1. Create a new database and associated database copies.
  2. Move all mailboxes to the new database.
  3. Delete the original database and its associated database copies.

A terminology confusion

Much of the confusion lies in the term background database maintenance. Collectively, all of the aforementioned tasks make up background database maintenance. However, the Shell, EMC, and JetStress all refer to database checksumming as background database maintenance, and that's what you're configuring when you enable or disable it using these tools.


Figure 1: Enabling background database maintenance for a database using EMC

Enabling background database maintenance using the Shell:

Set-MailboxDatabase -Identity MDB1 -BackgroundDatabaseMaintenance $true


Figure 2: Running background database maintenance as part of a JetStress test

My storage vendor has recommended I disable Database Checksumming as a background maintenance task, what should I do?

Database checksumming can become an IO burden if the storage is not designed correctly (even though the IO is sequential), because it performs 256 KB read IOs and generates roughly 5 MB/s of reads per database.

As part of our storage guidance, we recommend you configure your storage array stripe size (the size of stripes written to each disk in an array; also referred to as block size) to be 256KB or larger.
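The figures above lend themselves to back-of-the-envelope arithmetic. This Python sketch converts the quoted 256 KB read size and roughly 5 MB/s per database into read IOs per second and aggregate throughput for a given number of database copies (passive copies included, since they also checksum in the background), and estimates how long one full pass would take if that rate held constant. Binary units and the constant rate are assumptions of the sketch; treat the results as rough planning numbers, not measured Exchange behavior.

```python
# Figures quoted in the text; both are approximations.
READ_SIZE_KB = 256
MB_PER_S_PER_DATABASE = 5
MIN_STRIPE_KB = 256
KB_PER_MB = 1024  # binary units assumed


def checksum_read_iops(database_copies: int) -> float:
    """Approximate sequential 256 KB reads per second across all copies."""
    per_copy = MB_PER_S_PER_DATABASE * KB_PER_MB / READ_SIZE_KB  # = 20.0
    return per_copy * database_copies


def checksum_mb_per_s(database_copies: int) -> int:
    """Approximate aggregate checksum read throughput in MB/s."""
    return MB_PER_S_PER_DATABASE * database_copies


def full_pass_days(database_gb: float) -> float:
    """Days for one checksum pass over a database at a constant 5 MB/s."""
    seconds = database_gb * KB_PER_MB / MB_PER_S_PER_DATABASE
    return seconds / 86_400


def stripe_size_ok(stripe_kb: int) -> bool:
    """True if the array stripe size meets the 256 KB-or-larger guidance."""
    return stripe_kb >= MIN_STRIPE_KB


# A hypothetical server hosting 20 database copies:
print(checksum_read_iops(20), checksum_mb_per_s(20))   # 400.0 100
print(round(full_pass_days(2048), 1))                  # 4.9
print(stripe_size_ok(64), stripe_size_ok(512))         # False True
```

The multi-day pass time for a 2 TB database at this rate is one way to see why smaller databases are recommended when checksumming is moved into a scheduled window.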

It's also important to test your storage with JetStress and ensure that the database checksum operation is included in the test pass.

In the end, if a JetStress execution fails due to database checksumming, you have a few options:

  1. Don’t use striping. Use RAID-1 pairs or JBOD (which may require architectural changes) and get the most benefit from the sequential IO patterns available in Exchange 2010.
  2. Schedule it. Configure database checksumming to run as a scheduled process rather than a background process. When we implemented database checksum as a background process, we understood that some storage arrays would be so optimized for random IO (or would have such bandwidth limitations) that they wouldn't handle the sequential read IO well. That's why we built it so it could be turned off (which moves the checksum operation to the maintenance window).

    If you do this, we recommend smaller database sizes. Also keep in mind that the passive copies will still perform database checksum as a background process, so you still need to account for this throughput in your storage architecture. For more information on this subject, see Jetstress 2010 and Background Database Maintenance.

  3. Use different storage or improve the capabilities of the storage. Choose storage that is capable of meeting Exchange best practices (256 KB+ stripe size).

Conclusion

The architectural changes to the database engine in Exchange Server 2010 dramatically improve its performance and robustness, but they change the behavior of database maintenance tasks from previous versions. Hopefully this article helps you understand what background database maintenance is in Exchange 2010.

Ross Smith IV
Principal Program Manager
Exchange Customer Experience

EHLO Blog goes International

Geek Out with Perry in 2012


Happy New Year! Kick off 2012 by geeking out with Perry! In Perry’s recent blog post and latest video installment, Perry answers a frequently asked question about our service: “How can you provide large mailboxes at such a low cost?” Perry has discussed large mailboxes and storage efficiencies in the past, but in this recent post he addresses other topics, including Time Averaging and Efficient Data Centers, which also help our efforts to be greener.

Haven’t had time to Geek Out enough last year but you resolve to do more this year? Catch up on your Geek Out with Perry videos by viewing a playlist here.

Please let us know if you have other questions and topics you’d like Perry to geek out on so we can get to more geeking out than ever via videos and blogs in 2012.

Cheers,

Ann Vu

Microsoft Security Bulletin MS11-100 and Exchange Server


On December 29th, Microsoft released Security Bulletin MS11-100 to address a publicly disclosed vulnerability and three privately reported vulnerabilities in Microsoft .NET Framework. For details about the vulnerabilities, affected software and update information, see MS11-100 Vulnerabilities in .NET Framework Could Allow Elevation of Privilege (2638420).

We have completed testing of the security updates on Exchange 2010, Exchange 2007 and Exchange 2003 servers running on the corresponding supported versions of Windows Server: Windows Server 2008 R2, Windows Server 2008 and Windows Server 2003.

We recommend that customers apply the corresponding security update for Windows Server (listed in the security bulletin) on their Exchange 2010, Exchange 2007 and Exchange 2003 servers.

Bharat Suneja


Released: Migrating Exchange from HMC 4.5 to Exchange Server 2010 SP2 whitepaper


The download has been taken offline to incorporate a couple of edits. We'll update this post when it's available again. Sorry for the inconvenience!

To follow on from the recent blog post where I covered changes to hosting scenarios in Exchange Server 2010 SP2, we have been working on some documents to help our hosting customers migrate to SP2. The first of those is for those customers coming from the Microsoft Solution for Hosted Messaging and Collaboration (HMC) 4.5. We have just published a paper and a set of scripts to help you with migration.

Check out Migrating Exchange from HMC 4.5 to Exchange Server 2010 SP2. It contains a white paper and PowerShell scripts. Together they provide the recommended and supported migration path from HMC 4.5 to Exchange 2010 SP2. The steps in the guide may also be helpful when migrating from non-HMC environments that have configured some form of multi-tenancy.

Coming soon will be a guide to help you migrate from Exchange /hosting mode to Exchange 2010 SP2 installed without the /hosting switch.

I hope this helps you with your plans to migrate to Exchange Server 2010 SP2.

Greg Taylor

Custom (aka. Extension) attributes in Exchange 2010 SP2 and their use


Some of the sharper readers of our documentation on the schema changes that Exchange makes (see Exchange Server Active Directory Schema Changes Reference, November 2011) have noticed that in Exchange 2010 SP2, we have added several things that sound closely related to what’s traditionally known as “custom attributes” in Exchange. Specifically:

For object class ms-Exch-Custom-Attributes we added:

  • ms-Exch-Extension-Attribute-16 to 45
  • ms-exch-extension-custom-attribute-1 to 5

There have been some questions regarding this; namely – are all of those for you to use? Does this mean that you now have all of those attributes to modify to your heart’s content? What’s the difference between all those things anyway?

Here’s the scoop:

For a while now, Exchange has provided 15 custom attributes. Those are still there, and you are free to use them as you used them before. They are known as CustomAttribute1 to 15 (and can also be referred to as ms-Exch-Extension-Attribute1 to 15). For more on those, please see this. So nothing has changed with those.

New! In Exchange 2010 SP2, we have added five new multi-valued custom attributes that you can use to store information for mail recipient objects. They are ExtensionCustomAttribute1 to 5 (and can also be referred to as ms-exch-extension-custom-attribute-1 to 5). For the list of cmdlets that support them, please see this.

New! Finally, we have also added ms-Exch-Extension-Attribute-16 to 45. These are not exposed in the cmdlets or the Exchange management UI, because they were added for future use. As such, we cannot recommend that you use non-Exchange tools to edit their values, because we might use those attributes in the future for various Exchange features. If and when we add management tools access to them, we will definitely let you know!
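The three groups of attributes described above can be summarized as a small lookup. This Python sketch classifies an attribute name using the names and number ranges exactly as this post gives them. The function name and verdict strings are hypothetical, matching is case-insensitive because the post itself mixes capitalization, and anything outside these groups is simply reported as not covered here.

```python
import re

ORIGINAL = "original custom attribute: free to use"
MULTI_VALUED = "multi-valued custom attribute (new in SP2): free to use"
RESERVED = "reserved for future Exchange use: do not edit"
OTHER = "not one of the attributes discussed here"

# (name pattern, lowest number, highest number, verdict), per the post.
RULES = [
    (r"(?:CustomAttribute|ms-Exch-Extension-Attribute-?)(\d+)",
     1, 15, ORIGINAL),
    (r"(?:ExtensionCustomAttribute|ms-Exch-Extension-Custom-Attribute-)(\d+)",
     1, 5, MULTI_VALUED),
    (r"ms-Exch-Extension-Attribute-(\d+)", 16, 45, RESERVED),
]


def classify_attribute(name: str) -> str:
    """Return which group from the post an attribute name belongs to."""
    for pattern, low, high, verdict in RULES:
        match = re.fullmatch(pattern, name, flags=re.IGNORECASE)
        if match and low <= int(match.group(1)) <= high:
            return verdict
    return OTHER


for attr in ("CustomAttribute7", "ExtensionCustomAttribute3",
             "ms-Exch-Extension-Attribute-16", "ExtensionCustomAttribute6"):
    print(attr, "->", classify_attribute(attr))
```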

Nino Bilic

Recurring Meeting Requests with Conflicting Instances 2: The Power of Delegates


The key takeaway from my last post on this topic was that the Resource Booking Assistant never allows double booking of a resource room calendar as a result of a recurring meeting request (please see Automatic Processing of Recurring Meeting Requests with Conflicting Instances).

Since there are times that an administrator may want to allow double-booking, we offered two workarounds that I’d like to address in a bit more depth. I'd also like to offer a third one that wasn’t mentioned before:

1) Send Follow-Up Nonrecurring Meeting Requests to Double Book

Recall that if a recurring meeting series is accepted, individual conflict notifications are emailed to the organizer in addition to the acceptance email for the series. The organizer can use those declined-instance emails as a reference for the first of our workarounds: sending additional non-recurring meeting requests to double-book the intended resource room for each declined instance.

This method, though laborious, allows fine control over when a resource is double booked and when not.

Suppose, on the other hand, an administrator follows the second workaround and hands control to a trusted delegate instead of the Resource Booking Assistant. A delegate has the human discretion to allow all recurring meeting conflicts to double book by accepting an entire recurring meeting series. It also turns out that a delegate can selectively decline any number of conflicting instances while accepting the series, something the assistant cannot do. The question came up about how exactly they can do this, so let’s take a look:

2) Allow a Delegate to Double Book Resources

The request policy on a resource mailbox can be configured to require delegate control over resolving recurring meeting request instance conflicts. But how exactly do delegates use that power? What might the process look like, and what tools can they use to carry it out? The best functionality for this is in Outlook 2010.

Let’s go to an example. Say we have a resource room, called Green Room, which is managed by a delegate named Howard. As meeting requests for the Green Room come in Howard accepts them for the room calendar. Presently there’s a meeting scheduled for 2PM on Wednesday, and another for 3PM on the following Thursday.

Now a new recurring meeting request with the Green Room as the room resource goes out to several recipients. The room’s request policy requires Howard to approve all meeting requests, so this new one gets forwarded to him. We see that the recurring meeting request is for four instances, Tuesday through Thursday, from 2:30PM - 3:30PM each day. Outlook helpfully points out (highlighted in yellow and also below the Calendar Preview) that two of the four instances conflict with the existing appointments:

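[Screenshot: the forwarded meeting request in Outlook 2010, with the two conflicting instances highlighted]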
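The conflict check shown here comes down to interval overlap: an instance conflicts with an existing appointment when each one starts before the other ends. This Python sketch reproduces the Green Room example under stated assumptions: the dates are hypothetical (the week of January 3, 2012), the existing meetings are assumed to last one hour, and the series is shortened to one instance per day from Tuesday to Thursday. It illustrates the overlap test only and is not how Outlook or Exchange implements it.

```python
from datetime import datetime, timedelta


def overlaps(start_a, end_a, start_b, end_b):
    """True if [start_a, end_a) and [start_b, end_b) share any time."""
    return start_a < end_b and start_b < end_a


def find_conflicts(instances, existing):
    """Return (instance_start, subject) for every overlapping pair."""
    conflicts = []
    for inst_start, inst_end in instances:
        for subject, ex_start, ex_end in existing:
            if overlaps(inst_start, inst_end, ex_start, ex_end):
                conflicts.append((inst_start, subject))
    return conflicts


hour = timedelta(hours=1)
# Existing Green Room bookings (hypothetical dates, assumed one hour long).
catalog = datetime(2012, 1, 4, 14, 0)   # Wednesday 2PM
sales = datetime(2012, 1, 5, 15, 0)     # Thursday 3PM
existing = [
    ("Catalog Review", catalog, catalog + hour),
    ("Sales Presentation", sales, sales + hour),
]
# New series: 2:30PM-3:30PM on Tuesday, Wednesday and Thursday.
instances = [(datetime(2012, 1, day, 14, 30),
              datetime(2012, 1, day, 14, 30) + hour) for day in (3, 4, 5)]

for start, subject in find_conflicts(instances, existing):
    print(start.strftime("%A %I:%M %p"), "conflicts with", subject)
```

Using half-open intervals means a meeting ending at 3PM does not conflict with one starting at 3PM.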

If Howard wishes to accept the entire series and allow double booking, he can just accept the whole thing. But what if he wants to decline one conflict but allow the other? Howard can click on the arrow next to “Conflicts: 2” and get a preview of each area of the calendar where conflicts overlap with an existing appointment.

He does so, and sees the first conflict is with the Wednesday, 2PM Catalog Review meeting:

[Screenshot: conflict preview showing the Wednesday 2PM Catalog Review meeting]

The second is with the Sales Presentation meeting on Thursday:

[Screenshot: conflict preview showing the Thursday Sales Presentation meeting]

Suppose Howard wishes to decline the double booking on Thursday, but let the Wednesday conflict get booked? To decline the Thursday instance he can simply double-click on the item in the Calendar Preview section of the forwarded meeting request.

That action will open up a view for that time from the Green Room’s calendar. Howard can then right-click the instance that he wishes to decline, go to the Decline menu item, and then select the option to decline just this occurrence:

[Screenshot: declining a single occurrence from the Green Room calendar]

Now that Howard has determined which instance to decline and which to allow, he can simply go back to the original forwarded meeting request and accept the series. This will accept all the remaining instances while preserving the manually declined instances:

[Screenshot: accepting the series while the declined occurrence is preserved]

To summarize a delegate's power in this area: delegates can use the conflict notifications provided in Outlook 2010 to quickly decline (or accept) individual occurrences of a recurring meeting request.

3) Send a Series Update Without Changing Details

There is a third known way of working around the Resource Booking Assistant’s refusal to double book a room due to a recurring meeting request. Thanks to feedback from a customer relayed to me by my colleague Patriciu Seliceanu, we now know that a meeting organizer can simply send an update for a recurring meeting without changing any details. The amazing result is that previously declined conflicting instances are then accepted. This workaround of course requires that conflicts be allowed for single-instance meeting requests, which is controlled by the AllowConflicts calendar processing setting (it must be set to "True").

The update to the recurring meeting works because each recipient receives not another recurring meeting request, but an update meeting request (with no actual changes) for each individual instance. Since Exchange treats these updates much like the single-instance meeting requests from workaround 1 above, it allows double-booking for each updated instance. This may save a bit of labor compared with the steps required in workaround 1, such as when the number of conflicts is high, assuming every conflict should be double booked.
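The difference between how the assistant handles the original recurring request and the per-instance updates can be modeled as a simple decision rule. This Python sketch encodes only the behavior described in this post and its predecessor: conflicting instances of a recurring request are declined, while a conflicting single-instance request or update is accepted when AllowConflicts is True. The function and argument names are hypothetical, and the real Resource Booking Assistant weighs many more settings.

```python
def booking_decisions(instances, conflicting, recurring, allow_conflicts):
    """Map each instance to "accept" or "decline" per the rules in this post.

    instances:       instance labels in the series
    conflicting:     the labels that overlap an existing booking
    recurring:       True for the original recurring request, False when each
                     instance arrives as its own update (workaround 3)
    allow_conflicts: the resource's AllowConflicts setting
    """
    decisions = {}
    for inst in instances:
        if inst not in conflicting:
            decisions[inst] = "accept"
        elif recurring:
            # A recurring request never double-books the room.
            decisions[inst] = "decline"
        else:
            decisions[inst] = "accept" if allow_conflicts else "decline"
    return decisions


days = ["Tue", "Wed", "Thu"]
clashes = {"Wed", "Thu"}
print(booking_decisions(days, clashes, recurring=True, allow_conflicts=True))
# {'Tue': 'accept', 'Wed': 'decline', 'Thu': 'decline'}
print(booking_decisions(days, clashes, recurring=False, allow_conflicts=True))
# {'Tue': 'accept', 'Wed': 'accept', 'Thu': 'accept'}
```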

In conclusion, there are a number of ways to work around the safety mechanism in the Resource Booking Assistant that prevents recurring meeting requests from double-booking a resource mailbox. The most robust and powerful of these is the intrepid delegate with Outlook 2010 at their fingertips and Exchange 2010 at the ready. Please note, though, that in the majority of cases you should not even have to worry about doing this, as the default behavior is the correct one.

Thanks to Tom Kern for his help and counsel, and to Patriciu Seliceanu for method 3.

Jesse Tedoff

Released: Update Rollup 6 for Exchange 2007 Service Pack 3


Earlier today the Exchange CXP team released Update Rollup 6 for Exchange Server 2007 SP3 to the Download Center.

Note: The post title erroneously referred to Update Rollup 3. It has been updated to reflect the correct rollup number.

This update contains fixes for a number of customer-reported and internally found issues since the release of RU5. See KB 2608656: Description of Update Rollup 6 for Exchange Server 2007 Service Pack 3 for more details.

We would like to specifically call out the following fixes which are included in this release:

  • DST Cadence Release for Dec 2011 - Exchange 2007
  • 2656040 An Exchange Server 2007 Client Access server may respond slowly or stop responding when users try to synchronize the Exchange ActiveSync devices with their mailboxes
  • 2498852 "0x80041606" error message when you perform a prefix search by using Outlook in online mode in an Exchange Server 2007 environment
  • 2653334 The reseed process is unsuccessful on the SCR passive node when the circular logging feature is enabled in an Exchange Server 2007 environment
  • 2617784 Journal reports are expired or lost when the Microsoft Exchange Transport service is restarted in an Exchange Server 2007 environment
  • 2289607 The week numbers displayed in OWA do not match the week numbers displayed in Outlook for English users and French users in an Exchange Server 2007 environment

General Notes:

For DST Changes: http://www.microsoft.com/time.

Note for Forefront Protection for Exchange users: For those of you running Forefront Protection for Exchange, be sure you perform these important steps from the command line in the Forefront directory before and after this rollup's installation process. Without these steps, the Exchange services for Information Store and Transport will not start after you apply this update. Before installing the update, disable Forefront by using this command: fscutility /disable. After installing the update, re-enable Forefront by running fscutility /enable.

Exchange Team

.PST, Time to Walk the Plank


Ask and ye shall receive, mateys!

As we announced in July, we are always looking for new ways to make your work easier - especially when your work involves ending PST proliferation. Today, we are happy to announce that PST Capture is now available as a free download.

PST Capture helps you search your network to discover and then import .pst files across your environment - all from a straightforward admin-driven tool. PST Capture will help reduce risk while increasing productivity for your users by importing .pst files into Exchange Online or Exchange Server 2010 - directly into users' primary mailboxes or archives.

In addition to all the positive feedback you have given us regarding the Archiving, Retention, Legal Hold and Discovery capabilities of Exchange, you made it clear that PST import is an important area for us to focus on moving forward. As we looked at the best ways to address this challenging need, we saw the great work that ISV partner Red Gate has done with their stellar solution. We determined that acquiring this product from Red Gate as a starting point was the best strategy for ensuring a quality product for you.

We put Red Gate’s tool through further feature development and a rigorous testing process that included beta testing with customers, passing through our internal product security gates, and overall quality assurance. It’s now ready for prime time and available as a free download here! For even more insight, watch the video below.

And thus, we offer you PST Captarrrrrrrrrgh - or PST Capture, for those more refined than I.

As always, keep the feedback coming!

Ankur Kothari

Red Gate creates ingeniously simple software tools used by more than 500,000 IT professionals worldwide. The company works to uplift the market it serves through free web community sites, technical publications and conference sponsorships that reach millions annually.
