essay / payments
Migrating a payment service to cloud in 8 weeks by not touching the code
When two engineers left mid-project and the datacenter contract renewal was weeks away, the right move was a container wrapper, not a rewrite.
VRBO’s payment service had been running in a managed datacenter for years. COVID pushed every team to find cost savings, and migrating this service to cloud alongside our other services would save roughly $100k per month. The datacenter contract was up for renewal at the end of 2020. Missing the window meant another year locked in.
I was tasked with leading the migration. Two weeks in, the staff engineer and another engineer who understood the service left the team. The documentation was thin. The codebase was unfamiliar. The deadline was immovable.
The honest assessment
I documented the risks plainly. We did not understand the application well enough to make code changes confidently within the time window. Modifying the service to run in our PCI cloud environment would require changes to configuration, networking, and possibly the application code itself. Any of these changes carried risk we could not quantify because we did not fully understand the existing behavior.
The responsible thing was to say so.
The alternative
Instead of modifying the application, I proposed wrapping the original JAR in a container and deploying it to our cloud environment unchanged. No code modifications. The service would run exactly as it always had, just in a different location.
This was throwaway work in the sense that we would eventually need to properly integrate the service. But it was work we had done many times before for other services. The pattern was well-understood, the risks were known, and the timeline was achievable.
I built a proof of concept in our lab environment and presented it to leadership and the principal engineers. They identified issues with firewall exceptions and database latencies due to the new network topology. Each concern was researched, and either solved or documented as a known risk with mitigation.
The outcome
The migration completed on time. Latencies actually improved by over 25% because the service was now co-located with the microservices it depended on, eliminating network hops that had been invisible in the datacenter setup. We saved over $100k per month in hosting costs immediately.
Once the service was stable in its new environment, we planned the proper database migration to RDS, which further improved latencies.
The decision to not touch the code was the hardest part. Every engineer’s instinct is to fix things while you are in there. But the constraint was not “make it better.” The constraint was “move it before the contract renews.” Matching the solution to the actual constraint, rather than the ideal one, is what made the timeline achievable.