Scale Apps in Azure App Service

Examine autoscale factors

Autoscaling is a cloud system or process that adjusts available resources based on current demand. It scales in and out, as opposed to up and down: it responds to changes in the environment by adding or removing web servers and balancing the load between them. Autoscaling has no effect on the CPU power, memory, or storage capacity of the web servers powering the app; it only changes the number of web servers. Autoscaling makes its decisions based on rules that you define. A rule specifies the threshold for a metric and triggers an autoscale event when that threshold is crossed. Autoscaling can also deallocate resources when the workload diminishes. It is not a good fit for resource-heavy tasks, because adding instances does not give any single instance more capacity, and it shouldn’t be the long-term solution to sustained growth.
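
As a rough illustration of how a rule turns a metric reading into a scaling decision, the sketch below models a rule as a metric name, a threshold, and a direction. The `Rule` class, the `evaluate` function, and the threshold values are hypothetical names and numbers invented for this example; none of them come from an Azure SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical model of an autoscale rule (not an Azure SDK type)."""
    metric: str        # e.g. "CpuPercentage"
    threshold: float   # value that triggers the rule when crossed
    direction: str     # "out" adds instances, "in" removes them
    change: int = 1    # how many instances to add or remove


def evaluate(rule: Rule, metric_value: float, instances: int) -> int:
    """Return the instance count after applying one rule to one reading."""
    if rule.direction == "out" and metric_value > rule.threshold:
        return instances + rule.change
    if rule.direction == "in" and metric_value < rule.threshold:
        # Never drop below one instance in this sketch.
        return max(1, instances - rule.change)
    return instances


scale_out = Rule("CpuPercentage", threshold=70.0, direction="out")
scale_in = Rule("CpuPercentage", threshold=30.0, direction="in")

print(evaluate(scale_out, 85.0, instances=2))  # 3: threshold crossed, add one
print(evaluate(scale_in, 20.0, instances=2))   # 1: load dropped, remove one
print(evaluate(scale_out, 50.0, instances=2))  # 2: no threshold crossed
```

Note that only the instance count changes; nothing in the rule alters the size of an individual server.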

Identify autoscale factors

Autoscaling and the App Service Plan

Autoscaling is a feature of the App Service Plan used by the web app. When the web app scales out, Azure starts new instances of the hardware defined by the App Service Plan and makes them available to the app. To prevent runaway autoscaling, an App Service Plan has an instance limit. Plans in more expensive pricing tiers have a higher limit. Autoscaling cannot create more instances than this limit.
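
The effect of the instance limit can be pictured as a simple clamp on whatever count the rules ask for. This is a minimal sketch; `clamp_instance_count` is a hypothetical helper, and the limit of 10 is an arbitrary illustrative number, not the limit of any real pricing tier.

```python
def clamp_instance_count(requested: int, minimum: int, plan_limit: int) -> int:
    """Keep a requested instance count between the minimum and the plan limit."""
    if minimum > plan_limit:
        raise ValueError("minimum cannot exceed the plan's instance limit")
    return max(minimum, min(requested, plan_limit))


# Illustrative limit only; check the pricing tier for the real value.
PLAN_LIMIT = 10

print(clamp_instance_count(14, minimum=1, plan_limit=PLAN_LIMIT))  # 10
print(clamp_instance_count(4, minimum=1, plan_limit=PLAN_LIMIT))   # 4
print(clamp_instance_count(0, minimum=1, plan_limit=PLAN_LIMIT))   # 1
```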

Autoscale conditions

You indicate how to autoscale by creating autoscale conditions. Azure provides two options for autoscaling:

  • Scale based on a metric, such as the length of the disk queue, or the number of HTTP requests awaiting processing.
  • Scale to a specific instance count according to a schedule. For example, you can arrange to scale out at a particular time of day, or on a specific date or day of the week. You also specify an end date, and the system will scale back in at this time.

Metrics for autoscale rules

Autoscale rules for an App Service Plan can use the following metrics: CPU Percentage, Memory Percentage, Disk Queue Length, Http Queue Length, Data In, and Data Out. You can also scale based on metrics for other Azure services. For example, if the web app processes requests received from a Service Bus Queue, you might want to spin up additional instances of the web app when the number of items held in the queue exceeds a critical length.

How an autoscale rule analyzes metrics

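An autoscale rule analyzes a metric in two aggregation steps:

  • Time grain aggregates the values retrieved for a metric for all instances across a period of time, usually one minute.
  • Duration aggregates the time-grain values again over a longer, user-specified period of at least 5 minutes. The two aggregations can differ: for example, the time grain can be a per-minute average while the duration takes the maximum of the last ten time grains.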
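
The two-step aggregation can be sketched as follows, assuming one reading per instance per minute. The function names and sample data are made up for illustration; the real autoscale engine does this internally.

```python
def time_grain_values(readings_per_minute):
    """Average the per-instance readings within each one-minute time grain."""
    return [sum(minute) / len(minute) for minute in readings_per_minute]


def duration_value(grain_values, window=10, statistic=max):
    """Aggregate the last `window` time grains with the chosen statistic."""
    if len(grain_values) < window:
        raise ValueError("not enough time grains to cover the duration")
    return statistic(grain_values[-window:])


# Ten minutes of CPU readings from two instances (invented numbers).
cpu = [[40, 60], [50, 70], [80, 90], [30, 50], [20, 40],
       [60, 60], [70, 90], [10, 30], [55, 65], [45, 55]]

grains = time_grain_values(cpu)
print(grains[:3])              # [50.0, 60.0, 85.0]
print(duration_value(grains))  # 85.0, the maximum of the ten averages
```

Passing a different `statistic`, such as an averaging function, changes how the duration step summarizes the same time grains.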

Autoscale actions

A scale-out action increases the number of instances, and a scale-in action reduces the instance count. An autoscale action can also set the instance count to a specific level, rather than incrementing or decrementing the number available. An autoscale action has a cooldown period, specified in minutes. During this interval the scale rule can’t be triggered again, which gives the system time to stabilize between autoscale events.
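
A minimal sketch of how a cooldown period gates scale actions is shown below. The `Autoscaler` class and its `apply` method are hypothetical and use plain minute counters in place of real timestamps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Autoscaler:
    """Hypothetical model: tracks instance count and the last action time."""
    instances: int
    cooldown_minutes: int
    last_action_minute: Optional[int] = None

    def apply(self, now_minute: int, change: int = 0,
              set_to: Optional[int] = None) -> bool:
        """Apply an action unless the cooldown is still running."""
        in_cooldown = (
            self.last_action_minute is not None
            and now_minute - self.last_action_minute < self.cooldown_minutes
        )
        if in_cooldown:
            return False  # action ignored, instance count unchanged
        if set_to is not None:
            self.instances = set_to  # jump to a specific level
        else:
            self.instances = max(1, self.instances + change)
        self.last_action_minute = now_minute
        return True


scaler = Autoscaler(instances=2, cooldown_minutes=5)
print(scaler.apply(0, change=1), scaler.instances)   # True 3
print(scaler.apply(3, change=1), scaler.instances)   # False 3 (cooling down)
print(scaler.apply(5, set_to=6), scaler.instances)   # True 6
```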

Combining autoscale rules

A single autoscale condition can contain several autoscale rules (for example, a scale-out rule and the corresponding scale-in rule). However, the autoscale rules in an autoscale condition don’t have to be directly related.

Enable autoscale in App Service

By default, an App Service Plan only implements manual scaling. Selecting Custom autoscale reveals condition groups you can use to manage your scale settings.
Once you enable autoscaling, you can edit the automatically created default scale condition, and you can add your own custom scale conditions. Remember that each scale condition can either scale based on a metric, or scale to a specific instance count.
The Default scale condition is executed when none of the other scale conditions are active.
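
The fallback to the default scale condition can be sketched like this. For simplicity the example assumes every condition scales to a fixed instance count on a schedule; `ScheduledCondition` and `target_instances` are hypothetical names, and the dates are arbitrary.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ScheduledCondition:
    """Hypothetical scale condition that is active between two times."""
    name: str
    start: datetime
    end: datetime
    instance_count: int

    def is_active(self, now: datetime) -> bool:
        return self.start <= now < self.end


def target_instances(conditions, default_count: int, now: datetime) -> int:
    """Use the first active condition, or the default when none is active."""
    for condition in conditions:
        if condition.is_active(now):
            return condition.instance_count
    return default_count


business_hours = ScheduledCondition(
    "business-hours",
    start=datetime(2024, 1, 15, 8, 0),
    end=datetime(2024, 1, 15, 18, 0),
    instance_count=5,
)

print(target_instances([business_hours], 2, datetime(2024, 1, 15, 12, 0)))  # 5
print(target_instances([business_hours], 2, datetime(2024, 1, 15, 20, 0)))  # 2
```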

Explore autoscale best practices

  • Ensure the maximum and minimum instance counts are different and have an adequate margin between them.
  • For diagnostics metrics, you can choose among Average, Minimum, Maximum, and Total as the statistic to scale by.
  • Before scaling in, autoscale estimates the resulting state. This estimation is intended to avoid “flapping” situations, where scale-in and scale-out actions continually go back and forth.
  • On scale-out, autoscale runs if any rule is met. On scale-in, autoscale requires all rules to be met.
  • Always select a safe default instance count, because autoscale scales your service to that count when metrics are not available.
  • You can also use an Activity Log alert to monitor the health of the autoscale engine and get notifications.
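
Two of the practices above lend themselves to a short sketch: the any/all asymmetry between scale-out and scale-in, and the estimation that guards against flapping. The projection formula below assumes load is spread evenly across instances, which is a simplification made for this example and not a description of the engine's exact calculation. Both function names are hypothetical.

```python
def decide(scale_out_hits, scale_in_hits) -> str:
    """Scale out if ANY scale-out rule is met; scale in only if ALL are."""
    if any(scale_out_hits):
        return "out"
    if scale_in_hits and all(scale_in_hits):
        return "in"
    return "none"


def scale_in_would_flap(avg_metric: float, instances: int,
                        scale_out_threshold: float, remove: int = 1) -> bool:
    """Estimate whether removing instances would re-trigger a scale-out.

    Assumes the same total load is shared evenly by the remaining instances.
    """
    remaining = instances - remove
    if remaining < 1:
        return True  # cannot scale in to zero instances
    projected = avg_metric * instances / remaining
    return projected > scale_out_threshold


print(decide([False, True], [False, False]))  # out
print(decide([False, False], [True, False]))  # none
print(decide([False, False], [True, True]))   # in

# 60% average CPU on 3 instances becomes 90% on 2, above an 80% threshold.
print(scale_in_would_flap(60.0, 3, scale_out_threshold=80.0))  # True
print(scale_in_would_flap(30.0, 3, scale_out_threshold=80.0))  # False
```

Leaving an adequate margin between the scale-out and scale-in thresholds makes the second check pass more often, which is one reason the margin matters.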