Posted by Momiji, ML Engineer at Gaudiy Inc. (Translated from a Japanese post dated May 16, 2024.)
Hello, my name is Momiji, and I work as an ML engineer at Gaudiy. I am mainly responsible for the development of the recommendation system.
Since April of this year, we have added a collaborative filtering-based recommendation feature to “Gaudiy Fanlink,” a product developed and provided by Gaudiy. In this article, I would like to discuss the logic and system architecture behind this system.
Gaudiy Fanlink is a social media community platform where IP fans gather. The role of “recommendations” in this platform is to promote matching between users and content and increase overall activity within the community. To promote matching, it is important to present content that matches the user’s preferences and encourage them to access more content.
Before the recommendation system was implemented, users had to proactively search for posts they liked, and matches between users and content were not made efficiently. Even in communities where enthusiastic core creators of UGC (user-generated content) gather, a cycle of “posting UGC” → “being viewed by many people” → “generating excitement through comments and other reactions” was a key part of the user experience. We thought that if viewers could passively encounter posts they liked, views and reactions would increase, so we decided to implement a recommendation system.
There are two main approaches to recommendation logic: Collaborative Filtering and Content-Based Filtering. Collaborative Filtering scores user behavior related to preferences and designs recommendations. On the other hand, Content-Based Filtering generates representations of content and designs recommendations incorporating domain information.
Gaudiy Fanlink is not a single general-purpose platform but a platform serving multiple IPs, and the Content-Based Filtering approach requires careful design of how domain-specific information is handled according to the attributes of each community. The major concern with this approach was therefore how much development cost would have to be allocated to each IP community.
In contrast, Collaborative Filtering allows designing user behavior, regarded as preferences for items, within each community. Therefore, as a first step, we decided to build a recommendation system using Collaborative Filtering.
For the Collaborative Filtering algorithm, we adopted iALS (implicit Alternating Least Squares), a type of matrix factorization algorithm.
Setting aside the full history, the progression Funk-SVD → ALS → iALS is notable for enabling fast parallel computation even with large numbers of users and items, and for handling situations where users never explicitly state their preferences (only implicit feedback is available).
2–1. Training
We score and evaluate user preferences for items based on actions such as clicks, likes, and replies.
Next, we arrange these evaluations into a user-item evaluation matrix and consider factorizing it into a user matrix W and an item matrix H. Matrix factorization is performed by finding the W and H that minimize a loss function. Several loss definitions exist for iALS; here we use the following:
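A standard form of this objective, with $S$ denoting the set of observed user-item pairs and $r_{u,i}$ the evaluation score, is:

$$L(W, H) = \sum_{(u,i) \in S} \bigl( \langle w_u, h_i \rangle - r_{u,i} \bigr)^2 + \alpha \sum_{u \in U} \sum_{i \in I} \langle w_u, h_i \rangle^2 + \lambda \Bigl( \sum_{u \in U} \lVert w_u \rVert^2 + \sum_{i \in I} \lVert h_i \rVert^2 \Bigr)$$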
The first term evaluates whether the decomposed W and H approximate the original evaluation matrix well by minimizing the difference between the reconstructed evaluation matrix from W and H for observed user-item pairs. The second term penalizes elements with no evaluation to prevent large values, and the third term is a regularization term for generalization.
By alternately optimizing W and H, the update steps for W and H can each be parallelized across users and items. For example, when optimizing the user vector $w_u$ in the user matrix W with H fixed, we find the $w_u$ at which the partial derivative with respect to $w_u$ is zero.
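Setting that derivative to zero yields the standard closed-form update (with $S_u$ the set of items evaluated by user $u$):

$$w_u = \Bigl( \sum_{i \in S_u} h_i h_i^\top + \alpha \sum_{i \in I} h_i h_i^\top + \lambda I \Bigr)^{-1} \sum_{i \in S_u} r_{u,i}\, h_i$$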
The same can be written for items:
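In the standard derivation, with $S_i$ the set of users who evaluated item $i$:

$$h_i = \Bigl( \sum_{u \in S_i} w_u w_u^\top + \alpha \sum_{u \in U} w_u w_u^\top + \lambda I \Bigr)^{-1} \sum_{u \in S_i} r_{u,i}\, w_u$$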
The second term is common for all users or all items, so it only needs to be computed once before parallel computation. This is known as the “Gramian trick,” which reduces computational cost.
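As a sketch of how the Gramian trick fits into one ALS half-step, assuming the standard iALS loss (function names and hyperparameters `alpha`/`lam` here are illustrative, not from the actual Gaudiy codebase):

```python
import numpy as np

def update_users(R, H, alpha=0.1, lam=0.01):
    """One ALS half-step: recompute every user vector with H fixed.

    R: observed evaluations as {user_index: {item_index: score}},
       with users indexed 0..len(R)-1.
    H: (n_items, k) item matrix. alpha and lam are illustrative
       hyperparameters for the unobserved-pair weight and regularizer.
    """
    _, k = H.shape
    # Gramian trick: the unobserved-pair term alpha * H^T H is identical
    # for every user, so compute it once before the (parallelizable) loop.
    gramian = alpha * (H.T @ H)
    W = np.zeros((len(R), k))
    for u, ratings in R.items():
        idx = np.fromiter(ratings.keys(), dtype=int)
        r = np.fromiter(ratings.values(), dtype=float)
        H_u = H[idx]  # item vectors this user has evaluated
        A = H_u.T @ H_u + gramian + lam * np.eye(k)
        b = H_u.T @ r
        W[u] = np.linalg.solve(A, b)  # closed-form least-squares update
    return W
```

In a real training batch, the loop body runs independently per user (and symmetrically per item in the other half-step), which is what makes the parallelization straightforward.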
2–2. Batch Inference
Using the matrices W and H obtained from the training batch, we can calculate $\widehat{r}_{u,i}$ and get recommendation candidates for each user. Finally, we apply sorting logic and cache the results for serving.
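A minimal sketch of this batch step, assuming dense W and H (the function name, the `seen`-item exclusion, and the plain top-k in place of our actual sorting logic are all illustrative):

```python
import numpy as np

def top_k_candidates(W, H, k=10, seen=None):
    """Score every user-item pair as r_hat = W @ H.T and take per-user top-k.

    seen: optional {user_index: set(item_indices)} of already-consumed
          items to exclude from the candidates.
    """
    scores = W @ H.T  # r_hat[u, i] = <w_u, h_i>
    if seen:
        for u, items in seen.items():
            scores[u, list(items)] = -np.inf  # never re-recommend seen items
    # argpartition finds an unsorted top-k in O(n_items); we then sort
    # only those k columns instead of the full score row.
    part = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    order = np.take_along_axis(scores, part, axis=1).argsort(axis=1)[:, ::-1]
    return np.take_along_axis(part, order, axis=1)
```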
2–3. Real-Time Inference
When a user’s evaluation is updated, we can update the user vector $w_u$ in real-time. This also applies to new users with evaluations.
The formulation is the same as the training batch, but real-time inference can be computationally intensive, so if there are throughput or latency issues, we may use optimizations or approximations.
Common techniques include Cholesky decomposition and the conjugate gradient method. In simple simulations, Cholesky decomposition was faster than a plain matrix inversion, and the conjugate gradient method can improve tail latency by stopping after a fixed number of iterations; the iteration count should be chosen based on the inference accuracy required. Both methods have implementations in SciPy.
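A sketch of both options on the single-user normal equations, using SciPy (the dimensions, weights, and synthetic data are illustrative):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse.linalg import cg

# Hypothetical real-time update for one user: solve A w_u = b, where
# A = H_u^T H_u + alpha * H^T H + lam * I and b = H_u^T r_u — the same
# normal equations as in training.
rng = np.random.default_rng(0)
k = 32
H = rng.standard_normal((1000, k))               # item matrix
idx = rng.choice(1000, size=20, replace=False)   # items this user evaluated
r = np.ones(20)                                  # their evaluation scores
alpha, lam = 0.1, 0.01
A = H[idx].T @ H[idx] + alpha * (H.T @ H) + lam * np.eye(k)
b = H[idx].T @ r

# Exact solve: A is symmetric positive definite, so a Cholesky
# factorization is cheaper and more stable than forming A^{-1} directly.
w_chol = cho_solve(cho_factor(A), b)

# Approximate solve: conjugate gradients with a capped iteration budget;
# tune maxiter against the accuracy the product actually needs.
w_cg, info = cg(A, b, maxiter=10)
```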
The construction of this recommendation system was carried out as Gaudiy was transitioning from Cloud Run to GKE. For more details on the GKE migration, please refer to this article.
Accordingly, both the batch and serving components of the recommendation system are built on GKE. The batch process is orchestrated with Cloud Composer, which made it quite easy to set up (big thanks to the SRE team).
As a detailed point, real-time inference consumes significant computational resources even with approximate solutions, and sufficient memory is required to retain item matrices and Gramian trick matrices. Hence, we have prepared dedicated node pools for ML to prevent resource strain on other microservices due to user and item spikes.
Since the current release bundled multiple features, we have not yet isolated the effect of the recommendation feature itself. We are implementing A/B testing and will report the results as they become available.
We also anticipate that personalization in recommendation candidates will become challenging as the number of users increases. Therefore, we are considering clustering and graph-based candidate generation.
Additionally, this release implements only business-rule sorting, but we believe reranking the recommendation candidates is also important, and we are starting by modeling the link between user behavior and business KPIs. (The service-layer architecture was designed with reranking in mind, while keeping most inference in batch processes.)
Lastly, due to the nature of Gaudiy Fanlink as a fan community, the domains are diverse, and each one is very deep. We aim to address this by utilizing multimodal data.
We are actively conducting foundational research on embedding models and R&D on multimodal LLMs. If anyone is interested in these efforts, we would love to talk with you.