Scaling
Resize your service and tune autoscaling behavior.
Web services on Appentic autoscale between a minimum and maximum number of instances based on request concurrency. On top of that, you can resize the underlying machine for more CPU and memory. Between the two, you can dial in the right cost/performance trade-off for each service without changing any code.
Resizing a service
From Settings → Machine, pick a new memory/CPU tier. Appentic rolls the change out as a new deploy, and your service stays available throughout: the new revision boots, passes its health check, then takes traffic while the old one drains. There's no downtime window you have to coordinate.
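The rollout sequence can be pictured as a few steps in code. This is an illustrative sketch only: the `Revision` class and `roll_out` function are hypothetical, not part of any Appentic API.

```python
import time
from dataclasses import dataclass

@dataclass
class Revision:
    """Stand-in for a deployed revision of a service (hypothetical)."""
    name: str
    healthy: bool = False
    serving: bool = False

    def boot(self):
        # In reality this takes time; health checks then confirm readiness.
        self.healthy = True

    def drain(self):
        # Stop accepting new requests; in-flight requests finish first.
        self.serving = False

def roll_out(new: Revision, old: Revision) -> Revision:
    """Zero-downtime resize: boot, health-check, shift traffic, drain."""
    new.boot()
    while not new.healthy:      # traffic is gated on the health check
        time.sleep(0.1)
    new.serving = True          # new revision takes traffic...
    old.drain()                 # ...while the old one drains
    return new

live = roll_out(Revision("v2"), Revision("v1", healthy=True, serving=True))
```

The key property is the ordering: the old revision keeps serving until the new one has passed its health check, so there is never a moment with zero healthy instances.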
Resize up when your metrics show the service is CPU-bound or running close to its memory ceiling. Resize down when you're paying for headroom you never use.
Autoscaling
Each service has a minimum and maximum instance count. Appentic adds instances when request concurrency rises and removes them when it falls. For most apps the defaults (min 0, max 5) are fine.
The minimum deserves the most thought. Two settings are common:
- Min = 0. The service scales to zero when there's no traffic, which eliminates idle cost. The trade-off is a cold start on the first request after an idle period.
- Min = 1. Keeps a single instance warm so there are no cold starts. This is usually the right choice for anything user-facing where a 200ms delay on an otherwise quiet Sunday morning would be noticeable.
Set min to 1 for customer-facing services, webhook receivers, and anything on a hot path. Keep min at 0 for internal tools, dev environments, and low-traffic admin panels.
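The scaling behavior described above can be sketched as a small function: pick enough instances to keep each one at or below its concurrency target, then clamp to the min/max bounds. This is a back-of-the-envelope illustration, not Appentic's actual scheduler.

```python
import math

def desired_instances(in_flight: int, target_concurrency: int,
                      min_instances: int, max_instances: int) -> int:
    """Instances needed so no instance exceeds its concurrency target,
    clamped to the configured [min, max] range. (Illustrative only.)"""
    needed = math.ceil(in_flight / target_concurrency)
    return max(min_instances, min(needed, max_instances))

# With the defaults (min 0, max 5) and a concurrency target of 80:
desired_instances(0, 80, 0, 5)     # 0 — scales to zero when idle
desired_instances(200, 80, 0, 5)   # 3 — ceil(200 / 80)
desired_instances(1000, 80, 0, 5)  # 5 — capped at the maximum
desired_instances(0, 80, 1, 5)     # 1 — min 1 keeps one instance warm
```

Note how min 1 only changes the idle case: under load, the two configurations scale identically.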
Concurrency
The concurrency setting controls how many in-flight requests a single instance can handle before Appentic scales up. The default is 80, which suits most Node.js, Python, and Ruby web frameworks that use non-blocking I/O.
Lower the concurrency for CPU-bound workloads (image processing, PDF generation, crypto work): a single instance can only keep a few requests busy at once before the rest start queuing. Raise it for I/O-bound workloads if your runtime and connection pools can handle the extra in-flight requests.
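To estimate how a concurrency change affects instance count, Little's law gives a rough rule of thumb: average in-flight requests equal arrival rate times average latency. Dividing by per-instance concurrency estimates the instances needed. This is a back-of-the-envelope sketch, not an Appentic formula.

```python
import math

def instances_needed(req_per_sec: float, avg_latency_s: float,
                     concurrency: int) -> int:
    """Little's law estimate: in-flight = rate * latency; divide by the
    per-instance concurrency target. (Rough sizing guide only.)"""
    in_flight = req_per_sec * avg_latency_s
    return max(1, math.ceil(in_flight / concurrency))

# An I/O-bound API at 400 req/s with 200 ms latency averages 80 requests
# in flight, so one instance suffices at the default concurrency of 80:
instances_needed(400, 0.2, 80)   # 1
# The same traffic with concurrency lowered to 8 for CPU-bound work:
instances_needed(400, 0.2, 8)    # 10
```

The second case shows why lowering concurrency raises cost: the same traffic now spreads across ten instances, each doing a small amount of CPU-heavy work without queuing.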