----------------------- REVIEW 1 ---------------------
PAPER: 6
TITLE: Learning Driven Mobility Control of Airborne Base Stations in Emergency Networks
AUTHORS: Rui Li, Chaoyun Zhang, Razvan Stanica, Fabrice Valois and Paul Patras

Overall evaluation: 0 (borderline paper)

----------- Overall evaluation -----------
The paper proposes RL-based mobility control of airborne base stations. It is a straightforward application of A3C, one of the recent RL algorithms in which distributed workers perform local sampling. The choice of the reward function in (2) is reasonable. Although it is interesting to see the performance improvement of the proposed scheme over the simple measurement-based benchmark scheme, the contribution of the paper is minimal and lacks a novel idea. Training the policy with 100 to 2,000 episodes, as shown in Figure 4, is indeed a practical concern, and no serious effort is made to address it. Overall, the paper lacks technical contribution, but I rate it as borderline because the application has practical value.

----------------------- REVIEW 2 ---------------------
PAPER: 6
TITLE: Learning Driven Mobility Control of Airborne Base Stations in Emergency Networks
AUTHORS: Rui Li, Chaoyun Zhang, Razvan Stanica, Fabrice Valois and Paul Patras

Overall evaluation: 1 (weak accept)

----------- Overall evaluation -----------
This paper addresses an important networking problem: mobility control of airborne base stations. Based on a system model of the wireless channel and UE mobility/handover, the authors propose a reinforcement learning based scheme that directs the movement of base stations to achieve higher downlink SINR. The main novelty of the paper is applying reinforcement learning to mobility control by formulating the wireless channel and the base station/UE locations as the environment and the SINR as the reward. The complex interaction between mobility and SINR is modeled by MLPs (which are generally not considered deep networks, though) within the A3C algorithm. The simulation results are not very clear and could be improved. The description of the SINR-gradient-based solution in the "Benchmark" part of Section 4.1 is unclear: why is it called "gradient based", and what is the rationale behind moving in the direction of the weakest SINR value?

----------------------- REVIEW 3 ---------------------
PAPER: 6
TITLE: Learning Driven Mobility Control of Airborne Base Stations in Emergency Networks
AUTHORS: Rui Li, Chaoyun Zhang, Razvan Stanica, Fabrice Valois and Paul Patras

Overall evaluation: 2 (accept)

----------- Overall evaluation -----------
The paper considers base stations mounted on UAVs, intended for emergency networks, to provide coverage in challenging conditions where poor signal strength can be catastrophic. The main issue considered is mobility control. The proposed solution uses a deep reinforcement learning approach, with the goal of making accurate movement control decisions after training. The approach is compared against a benchmark gradient-based scheme. The paper is nicely written, except for the description of the benchmark used for comparison; without a better description, it is hard to fully appreciate the results, although they seem encouraging.
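
Editor's note: Reviews 1 and 2 characterise the paper's approach as an A3C agent whose MLP actor/critic observes base station and UE positions and is rewarded via downlink SINR. For readers unfamiliar with that formulation, the following is a minimal, hypothetical sketch of such a set-up; it is not the authors' code, and the toy environment, network sizes, "SINR" proxy and single-worker update below are all illustrative assumptions.

```python
# Hypothetical sketch (not the paper's implementation): BS/UE positions as the
# state, a discrete movement action for the airborne base station, and a crude
# SINR-like proxy as the reward, learned with a small actor-critic MLP.
import torch
import torch.nn as nn


class ToyCoverageEnv:
    """Toy environment: one mobile BS and a few static UEs on a 2-D area."""

    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # up, down, right, left, stay

    def __init__(self, n_ues=5, seed=0):
        rng = torch.Generator().manual_seed(seed)
        self.ues = torch.rand((n_ues, 2), generator=rng) * 10.0  # fixed UE positions
        self.bs = torch.tensor([5.0, 5.0])                       # BS starting position

    def state(self):
        # State = BS position concatenated with all UE positions (flattened).
        return torch.cat([self.bs, self.ues.flatten()])

    def step(self, action_idx):
        dx, dy = self.ACTIONS[action_idx]
        self.bs = (self.bs + torch.tensor([float(dx), float(dy)])).clamp(0.0, 10.0)
        # Stand-in for mean downlink SINR: shorter BS-UE distances -> higher reward.
        dists = torch.linalg.norm(self.ues - self.bs, dim=1)
        reward = float((1.0 / (1.0 + dists)).mean())
        return self.state(), reward


class ActorCriticMLP(nn.Module):
    """Small MLP with separate policy (actor) and value (critic) heads,
    roughly the network shape the reviews attribute to the A3C agent."""

    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, state):
        h = self.body(state)
        return self.policy_head(h), self.value_head(h)


def run_one_episode(env, model, optimizer, steps=20, gamma=0.99):
    """One-step advantage actor-critic update per transition (a single-worker
    simplification of A3C's asynchronous, multi-worker scheme)."""
    state = env.state()
    for _ in range(steps):
        logits, value = model(state)
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        next_state, reward = env.step(action.item())
        with torch.no_grad():
            _, next_value = model(next_state)
        advantage = reward + gamma * next_value.squeeze() - value.squeeze()
        loss = -dist.log_prob(action) * advantage.detach() + advantage.pow(2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        state = next_state


if __name__ == "__main__":
    env = ToyCoverageEnv()
    model = ActorCriticMLP(state_dim=env.state().numel(),
                           n_actions=len(ToyCoverageEnv.ACTIONS))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    run_one_episode(env, model, optimizer)
```

In the system the reviews describe, the reward would come from measured SINR rather than a distance proxy, and several asynchronous workers would update a shared model as in A3C; the sketch above only illustrates the state/action/reward framing the reviewers refer to.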