Sampling RTB transactions in an online machine learning setting
Abstract
We (the machine learning team at Jampp) strive to predict click-through rates (CTR) and conversion rates (CVR) for the real-time bidding (RTB) online advertising market by means of an in-house online machine learning platform based on a state-of-the-art stochastic gradient descent estimator. Our estimation framework has already been covered in a previous paper, so here we focus on some peripheral aspects of our platform that, although ancillary in nature, tend to dominate development effort and overall system complexity. Namely, in order to feed the learning system we first need to sample a very high-volume stream of out-of-order events scattered in time and consolidate them into a sequence of observations representing the underlying market transactions. Each observation is composed of a set of features and a response, from which the estimator is ultimately able to learn. This paper is written in a down-to-earth fashion: we describe a number of particular difficulties posed by the general problem of sampling in an online high-volume setting, and then present our concrete answers to those difficulties, based on real, hands-on experience.
License
Copyright (c) 2017 Carlos Pita

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Authors who publish in this journal agree to the following terms:
a. Authors retain their copyright and grant the journal the right of first publication of their work, which is simultaneously licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0), allowing third parties to share the work as long as the author and the first publication in this journal are indicated.
b. Authors may enter into other non-exclusive license agreements for the distribution of the published work (for example, depositing it in an institutional repository or publishing it in a monographic volume) as long as the first publication in this journal is indicated.
c. Authors are allowed and encouraged to disseminate their work on the internet (for example, in institutional repositories or on their website) before and during the submission process, which can produce interesting exchanges and increase citations of the published work (see The Effect of Open Access).















