8/27/2023

Vray cloud rendering cost

At Wayfair, we have content creation pipelines that automate some portions of 3D model and 3D scene creation, and render images from those models/scenes. At a high level, suppliers provide us with product images and information about dimensions, materials, etc., and we use them to create photorealistic 3D models and generate proprietary imagery. But creating these 3D renders requires significant computation (rendering) capabilities. Last year, we performed a lift-and-shift migration to the cloud, but because we hadn’t optimized our workloads for the cloud, our costs rose substantially. We worked closely with Google Cloud to optimize our render platform, driving an estimated ~$9M of savings on an annualized basis.

We’ve been working with the Google Cloud team to complete the transition from a hybrid cloud to a Unified Public Cloud strategy. We have two different “farms” that we use, one primarily for automation tasks and the other for rendering tasks:

The Automation farm is managed using OpenCue to dispatch jobs to the nodes.
The Render farm uses Deadline to dispatch jobs to the nodes; we completed migrating it from on-premises to the cloud in Q2 2022.

During the migration, our goal was to provide as-is SLAs to our customers without compromising the quality of the pipelines. Here’s our lift-and-shift deployed architecture on Google Cloud:

[Figure: lift-and-shift deployed architecture on Google Cloud]

Post-migration, we recognized inefficiencies in the deployed architecture, which was not well suited for the economics of the pay-as-you-go cloud model:

Poor infrastructure optimization with fixed farm size and one-size-fits-all machines
Missed opportunities for automation and consolidation
Minimal visibility into cost and render-hour usage across the farm
Wasted usage on rendering due to non-optimized workflows and cost controls

We realized that we could do better than a one-size-fits-all model. With a variety of compute available from Google Cloud, we decided to take advantage of it for the farm. This would help us not only optimize costs but also gain better visibility into render-hour usage across the rendering farm, for greater savings.

We followed the three Cloud FinOps principles - inform, optimize and operate - to create a holistic strategy to optimize our spending and drive sustained governance going forward. To create an execution plan, the first step was to thoroughly understand what was driving our cloud spending. When on-prem, we didn’t have a lot of insight into our usage and infrastructure costs, as those were managed by a centralized Infrastructure team. During deep dives, we realized that, due to a lack of visibility into usage in our current state, we had many inefficiencies in our deployed infrastructure footprint and in how the farm was used by artists and modelers. We formed a focused team of engineers, business stakeholders, infrastructure experts, and Google Cloud to drive discussions.

To optimize rendering costs, we needed not only to drive down the cost of the rendering platform but also to optimize the workflows to reduce render-hour usage per asset. We developed a simplified formula of all-inclusive render cost per core hour and time needed for each asset, making it easier for each team to drive objectives with focus and transparency. On Google Cloud, we were shifting the focus from an owned-asset model to a pay-per-usage model. One of our goals was to optimize the all-inclusive cost per render hour. We categorized the overall spend on the farm into various funnels, and assigned weights to the impact each lever can drive.

At a high level, we looked into the following key areas:

Nodes - Are we using the right machine size and configurations on the farm? The current deployment had a single pool, which forced the machine size to be optimized for worst-case usage, leading to waste for 90% of our use cases. What about leveraging instance types like Spot? Can we use GPU acceleration to optimize render times?
Utilization - How is the per-node and overall farm utilization over 24x7x365? We looked at usage and submission patterns on the farm along with utilization to find ways to drive efficiency.
Licenses - What are the software license fees levied on the farm? What constraints do they enforce on scaling needs? Because of the change from an Enterprise License on-prem to a Data Center license on Google Cloud, we were seeing license costs around 45% of the overall spend on the farm.
Other - We looked at storage, network transfers, and other miscellaneous costs on the farm.
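
The article mentions a simplified formula for all-inclusive render cost per core hour and the time needed for each asset, but does not spell it out. A minimal sketch of what such a calculation might look like follows, assuming the all-inclusive rate is simply total farm spend divided by the core hours rendered. The class name, function names, and all dollar figures are hypothetical and chosen for illustration only; the license figure is picked so that licenses come to 45% of spend, matching the share quoted in the article.

```python
from dataclasses import dataclass


@dataclass
class FarmSpend:
    """Farm spend for one period, in dollars. All values are hypothetical."""
    compute: float   # render node cost
    licenses: float  # software license fees
    other: float     # storage, network transfers, miscellaneous

    @property
    def total(self) -> float:
        return self.compute + self.licenses + self.other


def cost_per_core_hour(spend: FarmSpend, core_hours: float) -> float:
    """Assumed form of the all-inclusive rate: total spend / core hours."""
    if core_hours <= 0:
        raise ValueError("core_hours must be positive")
    return spend.total / core_hours


def cost_per_asset(rate: float, core_hours_per_asset: float) -> float:
    """Cost to render one asset at a given all-inclusive rate."""
    return rate * core_hours_per_asset


def license_share(spend: FarmSpend) -> float:
    """Fraction of total spend that goes to licenses."""
    return spend.licenses / spend.total


# Hypothetical period: $100,000 total spend over 400,000 core hours.
spend = FarmSpend(compute=45_000.0, licenses=45_000.0, other=10_000.0)
rate = cost_per_core_hour(spend, core_hours=400_000.0)
asset_cost = cost_per_asset(rate, core_hours_per_asset=12.0)

print(f"all-inclusive rate: ${rate:.2f} per core hour")
print(f"cost for a 12 core-hour asset: ${asset_cost:.2f}")
print(f"license share of spend: {license_share(spend):.0%}")
```

Framing cost this way separates the two levers the article describes: platform changes lower the rate per core hour, while workflow changes lower the core hours each asset needs.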