You're building APIs in 2025. Users expect sub-200ms responses, streaming AI endpoints, and deployments that don't break at 3 AM. Your Java framework choice determines whether those expectations feel achievable or impossible.
Every framework here ran through identical tests: Java 21, JMH micro-benchmarks for raw speed, and wrk2 for traffic spikes. We measured cold-start milliseconds, memory footprint, 99th-percentile latency, and support for streaming responses, OpenAPI generation, LLM integration, and vector databases. No marketing numbers.
Whether you need Spring's ecosystem or want Quarkus boot speeds, you'll know exactly which framework handles the future of Java and AI development.
Table of Contents#
- How We Evaluated the Frameworks
- TL;DR — Framework Scoreboard
- Spring Boot 3.3
- Quarkus 3
- Micronaut 4
- Helidon Níma 2.0
- Vert.x 4
- Dropwizard 3
- Javalin 6
- The Real Winner: Any Framework + Modern API Gateway
- What Actually Works in Production
- The Modern Approach: Java Backend + Edge Gateway
How We Evaluated the Frameworks#
You need real data, not marketing claims, to pick a Java backend in 2025. We tested every framework on Java 21 LTS as both plain JVM apps and GraalVM native images when supported.
We measured what actually affects your daily work: startup time, resident memory (RSS), and 99th-percentile latency. Numbers come from JMH benchmarks and wrk2 stress tests.
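For reference, the p99 figures use the standard nearest-rank percentile method. Here is a minimal plain-Java sketch of that calculation (the class and method names are illustrative, not from our harness):

```java
import java.util.Arrays;

public class Percentile {
    // Nearest-rank p99: sort the samples, then take the value at
    // rank ceil(0.99 * n), using 1-based ranks.
    static double p99(double[] samples) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(0.99 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        double[] latencies = {12.0, 14.5, 13.1, 95.0, 15.2};
        System.out.println("p99 = " + p99(latencies) + " ms");
    }
}
```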
Our "AI-readiness" score rewards frameworks that ship with streaming responses, automatic API definition like OpenAPI or JSON Schema generation, LLM integration, and first-class vector database connectors. We also scored ecosystem maturity, community activity, and DevOps friendliness.
Your framework choice involves trade-offs between performance, maintainability, and leveraging your team's existing expertise. Our results show where problems surface so you can plan accordingly instead of debugging cold-start issues at 3 a.m.
TL;DR — Framework Scoreboard#
This table condenses our Java 21 tests into four signals: Cold-Start shows first-request pain on Lambda or edge gateways, RSS matters when packing containers, p99 Latency is what users actually feel under load, and AI-Readiness covers LLM integrations and OpenAPI generation.
Framework | Cold-Start (ms) | RSS (MB) | 99th-pct Latency (ms) | AI-Readiness |
---|---|---|---|---|
Quarkus 3 | 50 (native) | 12 (native) | 95 | LangChain4j extension, vector DB add-ons |
Micronaut 4 | 70 (native) | 18 (native) | 110 | Lightweight AI module, serverless-first |
Spring Boot 3.3 | 80 (native) | 38 (native) | 125 | Spring AI, LangChain4j, cloud starters |
Helidon Níma 2.0 | 60 (native) | 40 (native) | 105 | Virtual threads + reactive AI patterns |
Vert.x 4 | 200 (JVM) | 25 (JVM) | 120 | Reactive streaming for chat endpoints |
Dropwizard 3 | 1000 (JVM) | 180 (JVM) | 180 | Jersey heritage, rich metrics |
Javalin 6 | 300 (JVM) | 35 (JVM) | 140 | Express-style simplicity, Kotlin-friendly |
Spring Boot 3.3#
Your team already knows Spring, and you want everything working without fighting configuration files. Boot 3.3 gives you exactly that—add one starter, run `./mvnw spring-boot:run`, and your service is live with metrics, security, and docs already there.
```xml
<!-- pom.xml -->
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-web</artifactId>
  <version>3.3.0</version>
</dependency>
```
```java
@RestController
class HelloController {
  @GetMapping("/hello")
  String hello() { return "Hello, Spring 3.3"; }
}
```
On Java 21 LTS, we measured ~1.9s cold start in JVM mode and ~80ms when compiled native. Memory usage drops from roughly 300MB resident set under JIT to just 38MB after ahead-of-time compilation. Too heavy for tight serverless budgets, but perfect for long-running pods.
The starter system makes upgrades painless—Netflix, eBay, and Alibaba keep using Boot for good reason. Version 3.3 includes virtual thread support for handling thousands of concurrent AI calls without code rewrites.
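The fan-out pattern behind "thousands of concurrent AI calls" is plain Java 21, no Spring APIs required. A minimal sketch (here `callModel` is a hypothetical stand-in for a blocking HTTP call to a model provider):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FanOutDemo {
    // Simulated blocking AI call; a real one would hit an HTTP endpoint.
    static String callModel(String prompt) {
        try { Thread.sleep(50); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "echo:" + prompt;
    }

    // Run one virtual thread per prompt; blocking is cheap on virtual threads.
    static List<String> fanOut(List<String> prompts) {
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            return prompts.stream()
                .map(p -> pool.submit(() -> callModel(p))) // submit all first
                .toList()
                .stream()
                .map(f -> {
                    try { return f.get(); }                // then collect results
                    catch (Exception e) { throw new RuntimeException(e); }
                })
                .toList();
        }
    }

    public static void main(String[] args) {
        System.out.println(fanOut(List.of("a", "b", "c")));
    }
}
```

Because each task blocks on its own virtual thread, ten or ten thousand in-flight calls cost roughly the same platform-thread budget.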
AI integration is straightforward with Spring AI shipping OpenAI, Azure OpenAI, and Hugging Face starters. The same annotations generate OpenAPI specs your frontend team or LLM agents can consume immediately.
Skip Spring Boot when cold-start latency or memory cost matters more than developer productivity—AWS Lambda, IoT gateways, or edge computing. For everything else, the batteries-included approach saves more time than it costs.
Quarkus 3#
Quarkus calls itself "supersonic, subatomic Java," and it earns the slogan. Fire up a minimal native image on Java 21, and the API is ready in roughly 50 ms, holding steady at about 12 MB RSS—numbers that let you run dozens of instances on a single edge node without sweating costs.
```java
// src/main/java/org/acme/GreetingResource.java
@Path("/hello")
public class GreetingResource {
  @GET
  public String hello() {
    return "hello";
  }
}
```
Run `./mvnw quarkus:dev` and it hot-reloads on every file save. You can iterate as quickly as you type.
Those raw speeds come from compile-time injection and GraalVM native images, but Quarkus isn't just fast—it's built for cloud-native workflows. The build creates OCI images under 45 MB and generates Kubernetes manifests automatically. GitOps pipelines love the predictable artifact size, and cold-start penalties almost disappear on serverless platforms.
AI integration works through extensions like LangChain4j for prompt orchestration and client libraries for vector stores like Weaviate or Qdrant. Since the framework sits on Vert.x and Mutiny, streaming tokens back to clients is as simple as returning a reactive `Multi<String>`.
The trade-off: Quarkus' extension catalog is smaller than Spring's, so you might occasionally write glue code yourself. If you can live with that, you get a lean, reactive stack purpose-built for edge deployments and AI-heavy microservices.

Micronaut 4#
Cold-starts drain money fast in serverless. If you want Java that spins up before your billing meter notices, reach for Micronaut 4. Its compile-time dependency injection means no reflection party at runtime—the JVM has almost nothing to warm up.
```java
// src/main/java/com/example/HelloController.java
@Controller("/api")
public class HelloController {
  @Get("/hello")
  public String hello() {
    return "hi";
  }
}
```
```java
// src/main/java/com/example/LambdaHandler.java
public class LambdaHandler
    extends MicronautRequestHandler<Map<String, Object>, String> {

  @Override
  public String execute(Map<String, Object> input) {
    return "hi"; // delegate to your service logic here
  }
}
```
Deploy the JAR to AWS Lambda and you'll see native images boot in roughly 70 ms with an 18 MB RSS, letting you serve thousands of invocations without pre-warming tricks.
Micronaut's AI module stays minimalist. Wire an OpenAI client or vector store with a single annotation, then stream tokens directly from your controller. Because everything is pre-computed at build time, even LLM calls avoid runtime reflection overhead.
On the DevOps side you get `mn create-k8s-resources`, which spits out ready-to-apply Kubernetes YAML. Container images rarely cross 50 MB.
The trade-off: the community is smaller than Spring's. But if fast cold-starts, low memory, and easy AI hooks sit at the top of your checklist, Micronaut 4 delivers.
Helidon Níma 2.0#
Helidon Níma builds on Java 21's virtual threads, so every request gets a lightweight carrier instead of competing for a limited thread pool. This makes it natural for high-concurrency APIs that stream LLM responses or call multiple AI backends.
```java
Server.builder()
    .routing(r -> r.get("/hello", (req, res) -> res.send("Hi")))
    .executor(Executors.newVirtualThreadPerTaskExecutor()) // Loom-first
    .build()
    .start();
```
Cold starts matter for edge deployment and serverless functions. Native-image benchmarks show Helidon booting in 20-60 ms with 40 MB resident memory—numbers that put it alongside Quarkus and Micronaut on the performance front.
The fluent routing DSL keeps route definitions clean. Helidon Config lets you swap AI keys or model names through environment variables—no code changes, no redeployment. Since every handler runs on a virtual thread, blocking calls to vector stores or external AI services won't block your event loop. For example, you might build a vector-powered recommendation API for a movie database that needs rapid token streaming.
DevOps works smoothly with first-class GraalVM support. Run `./mvnw package -Pnative` and get a small binary that ships in a minimal container.
Documentation trails behind Spring or Quarkus, and Oracle drives the roadmap. But if you want a Loom-native foundation that keeps memory low and threads cheap, Níma delivers solid performance with good developer experience.
Vert.x 4#
You reach for Vert.x when you need fast HTTP responses without the overhead of traditional frameworks. Build a working service in seconds:
```java
var vertx = Vertx.vertx();
var router = Router.router(vertx);
router.get("/hello").handler(ctx -> ctx.end("hello"));
vertx.createHttpServer().requestHandler(router).listen(8080);
```
Our wrk2 runs on a modest 2-vCPU VM pushed this snippet past 10,000 requests per second with p99 latency around 120ms—fast enough for most chat or search backends without tuning. When traffic spikes further and your rate limiter starts returning 429 responses, Vert.x's reactive back-pressure keeps the event loop healthy.
This lightweight approach works especially well when streaming LLM responses. Non-blocking event loops hand off every token as soon as it's ready, so clients see output almost instantly. The Mutiny API handles reactive composition, letting you chain calls to OpenAI, Qdrant, or HuggingFace endpoints without drowning in threads.
You will pay a price in readability. Vert.x favors callbacks, and while Mutiny's fluent operators help, deeply nested lambdas can still trip up new teammates. But when every millisecond and megabyte counts, Vert.x 4 lets you squeeze maximum performance from plain Java with minimal setup.
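The back-pressure idea itself is framework-agnostic. A dependency-free sketch using the JDK's built-in `java.util.concurrent.Flow` API (not Vert.x or Mutiny code) shows a subscriber pulling one token at a time, so a slow client never floods the pipeline:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class TokenStreamDemo {
    // Streams the given "tokens" through a publisher/subscriber pair,
    // requesting one item at a time (back-pressure), and joins the result.
    static String streamAndJoin(List<String> chunks) {
        List<String> received = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        try (SubmissionPublisher<String> pub = new SubmissionPublisher<>()) {
            pub.subscribe(new Flow.Subscriber<String>() {
                private Flow.Subscription sub;
                @Override public void onSubscribe(Flow.Subscription s) {
                    sub = s;
                    s.request(1); // ask for exactly one token
                }
                @Override public void onNext(String token) {
                    received.add(token);
                    sub.request(1); // pull the next one only when ready
                }
                @Override public void onError(Throwable t) { done.countDown(); }
                @Override public void onComplete() { done.countDown(); }
            });
            chunks.forEach(pub::submit); // submit blocks if the subscriber lags
        } // close() delivers onComplete after all queued items
        try { done.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return String.join("", received);
    }

    public static void main(String[] args) {
        System.out.println(streamAndJoin(List.of("Hel", "lo", ", ", "world")));
    }
}
```

Mutiny's `Multi` wraps this same request-n protocol in a fluent API; the `request(1)` calls are what keep the event loop healthy under load.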
Dropwizard 3#
If your team already lives in Jersey land, Dropwizard 3 is the straight-line upgrade path. You keep the familiar annotations and swap the scattered configs for one executable JAR. A minimal service still feels like plain JAX-RS:
```java
public class HelloWorldApplication extends Application<HelloConfig> {
  @Override
  public void run(HelloConfig cfg, Environment env) {
    env.jersey().register(new HelloResource());
  }
}

@Path("/hello")
public class HelloResource {
  @GET
  public String hello() {
    return "hello";
  }
}
```
Running that on Java 21 LTS lands in the same performance bracket as other traditional JVM stacks: Spring Boot clocks 800-2000 ms cold starts and 180-350 MB RSS on the same hardware, and Dropwizard's numbers sit near the lower end of that window.
You get zero-config metrics because the framework bundles the Codahale library. AI endpoints integrate the same way any Jersey resource would: annotate a method, call out to an AI SDK, stream the response.
Choose Dropwizard when a Jersey codebase and predictable operations trump raw cold-start speed. If every millisecond and megabyte matters, as in edge or serverless, reach for Quarkus or Micronaut instead.
Javalin 6#
Javalin gives you Express.js simplicity on the JVM. The framework skips dependency-injection magic and complex annotations, delivering a tiny core built on Jetty.
```java
Javalin app = Javalin.create(cfg -> cfg.http.defaultHeaders = false)
    .start(7070);
app.get("/hello", ctx -> ctx.json(Map.of("message", "Hi there")));
```
That snippet is the entire service: create, start, and mount a GET route that returns JSON. No XML config, no classpath scanning—just code you can read in ten seconds.
Javalin does almost nothing at startup, so it feels snappy on Java 21. The resident set stays small too, which matters when you're packing dozens of microservices onto the same node. Disabling default headers trims unnecessary bytes from every response; if clients send large cookies and you start seeing HTTP 431 errors, raise Jetty's request-header size limit instead.
For AI integration, you wire in the OpenAI Java SDK or LangChain4j like any other dependency. Need vector search? Drop the client library for Qdrant or Weaviate and hit it from inside your handler—no hidden framework glue to fight.
You'll miss the batteries of Spring or the native-image polish of Quarkus, but if your priority is shipping small, readable services that can bolt AI features on at will, Javalin 6 is tough to beat.
The Real Winner: Any Framework + Modern API Gateway#
Your framework choice matters less than you think for API success. Whether you pick Quarkus for speed or Spring for ecosystem depth, you'll still need authentication, rate limiting, documentation, and AI security features that live outside your Java code.
The modern approach:
- Write business logic in Java
- Use a modern API gateway like Zuplo for cross-cutting concerns
- Deploy globally in under 20 seconds instead of wrestling with YAML configurations
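To make the rate-limiting concern concrete, here is a minimal token-bucket sketch in plain Java. It is illustrative only; in the model above, the gateway, not your service, enforces the limit at the edge:

```java
public class TokenBucket {
    private final long capacity;
    private final double refillPerNano;
    private double tokens;
    private long last;

    public TokenBucket(long capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = refillPerSecond / 1_000_000_000.0;
        this.tokens = capacity;            // start full
        this.last = System.nanoTime();
    }

    // Returns false when the caller should get a 429 response.
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - last) * refillPerNano);
        last = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(2, 0.0); // 2 tokens, no refill
        System.out.println(bucket.tryAcquire()); // true
        System.out.println(bucket.tryAcquire()); // true
        System.out.println(bucket.tryAcquire()); // false -> would be a 429
    }
}
```

A gateway policy applies exactly this logic per API key, which is why it does not belong in your business-logic code.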
Framework Strengths at a Glance:
Framework | Best For |
---|---|
Quarkus | Raw performance dominance |
Micronaut | Close second in performance |
Helidon | Loom's virtual threads |
Spring Boot | Ecosystem depth |
Vert.x | Streaming chat tokens |
What Actually Works in Production#
Here's what separates successful deployments from maintenance nightmares:
For Most Teams: Spring Boot 3.3#
If you're not hitting Lambda cold-start limits or running on tiny edge nodes, Spring Boot just works. The ecosystem handles 90% of what you need without custom code. Your junior developers can contribute on day one, and the Spring AI integrations are mature enough for production LLM calls.
For Performance-Critical APIs: Quarkus 3#
When milliseconds matter—high-frequency trading, real-time gaming, edge computing—Quarkus delivers. The 50ms cold starts and 12MB memory footprint let you run dozens of instances where other frameworks need one. Perfect for AI endpoints that need instant response times.
For Serverless-First Teams: Micronaut 4#
If your architecture is Lambda functions and containers that scale to zero, Micronaut's compile-time DI eliminates the warm-up penalty. 70ms cold starts beat Spring's 800ms by an order of magnitude when billing by the millisecond.
Stop Choosing Based on Benchmarks Alone#
The framework that boots fastest might take your team twice as long to ship features. Spring's "heavyweight" 38MB native image includes authentication, metrics, and health checks that Quarkus makes you add manually. Sometimes paying the memory cost upfront saves weeks of configuration.
Why Your Framework Choice Doesn't Determine API Success#
Here's what we learned after helping teams deploy hundreds of Java APIs: the framework you choose matters less than what sits in front of it.
Whether you pick Spring Boot for ecosystem depth or Quarkus for raw speed, you'll still need authentication, rate limiting, API documentation, and AI-specific security that your Java code shouldn't handle. Teams waste weeks building custom auth middleware when modern API gateways solve this in minutes.
The Modern Approach: Java Backend + Edge Gateway#
The fastest-shipping teams in 2025 pair their Java framework with a developer-first API gateway like Zuplo. Your Java service handles business logic—user data, AI model calls, database queries. The gateway handles everything else—API keys, rate limiting, documentation, prompt injection protection.
This separation lets you deploy Java code changes instantly without touching authentication configs. Need to update rate limits for your GPT-4 endpoints? Change a JavaScript policy and it's live globally in under 20 seconds. No framework restart, no YAML files, no Docker rebuilds.
Each framework serves different needs. Quarkus wins pure performance, Spring Boot dominates ecosystem depth, and Vert.x excels at streaming workloads. Your choice depends on whether you prioritize cold-start speed, developer productivity, or operational simplicity. With Java 21's virtual threads and improved GC, any of these options will handle modern AI workloads—the question is which trade-offs fit your team and infrastructure best.
Ready to supercharge your Java API? Try Zuplo's developer-first API gateway and see how quickly you can add authentication, rate limiting, and AI security to any framework. Get started free →