Reinforcement Learning, June 1992
Publisher:
  • Kluwer Academic Publishers
  • 101 Philip Drive, Assinippi Park, Norwell, MA
  • United States
ISBN: 978-0-7923-9234-7
Published: 01 June 1992
Pages: 172
Bibliometrics (reflects downloads up to 22 Oct 2024)
Abstract

No abstract available.

Cited By

  1. Wu C, Chang Y, Tsai C and Hou C (2023). A Q-learning-based distributed queuing MAC protocol for Internet-of-Things networks, EURASIP Journal on Wireless Communications and Networking, 2023:1, Online publication date: 20-Oct-2023.
  2. Jeong H, Schlotfeldt B, Hassani H, Morari M, Lee D and Pappas G Learning Q-network for Active Information Acquisition 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), (6822-6827)
  3. Kralik J Toward a Comprehensive List of Necessary Abilities for Human Intelligence, Part 2: Using Knowledge Artificial General Intelligence, (271-281)
  4. Berrio J, Shan M, Worrall S and Nebot E (2022). Camera-LIDAR Integration: Probabilistic Sensor Fusion for Semantic Mapping, IEEE Transactions on Intelligent Transportation Systems, 23:7, (7637-7652), Online publication date: 1-Jul-2022.
  5. Lei X, Xia Y, Deng L and Sun L (2022). A deep reinforcement learning framework for life-cycle maintenance planning of regional deteriorating bridges using inspection data, Structural and Multidisciplinary Optimization, 65:5, Online publication date: 1-May-2022.
  6. Videau M, Leite A, Teytaud O and Schoenauer M Multi-objective Genetic Programming for Explainable Reinforcement Learning Genetic Programming, (278-293)
  7. Fontbonne N, Maudet N and Bredeche N Cooperative Co-evolution and Adaptive Team Composition for a Multi-rover Resource Allocation Problem Genetic Programming, (179-193)
  8. Mandow L, Perez-de-la-Cruz J and Pozas N (2022). Multi-objective dynamic programming with limited precision, Journal of Global Optimization, 82:3, (595-614), Online publication date: 1-Mar-2022.
  9. Kurniawan B, Vamplew P, Papasimeon M, Dazeley R and Foale C (2022). Discrete-to-deep reinforcement learning methods, Neural Computing and Applications, 34:3, (1713-1733), Online publication date: 1-Feb-2022.
  10. Li R, Zhang Y, Zhao Y, Wui H, Xu Z and Zhao K Deep Reinforcement Learning with Noisy Exploration for Autonomous Driving 2022 The 6th International Conference on Machine Learning and Soft Computing, (8-14)
  11. Wang D, Ha M and Zhao M (2022). The intelligent critic framework for advanced optimal control, Artificial Intelligence Review, 55:1, (1-22), Online publication date: 1-Jan-2022.
  12. Hahn E, Perez M, Schewe S, Somenzi F, Trivedi A and Wojtczak D Model-Free Reinforcement Learning for Lexicographic Omega-Regular Objectives Formal Methods, (142-159)
  13. Haider A, Hawe G, Wang H and Scotney B Multi-Asset Market Making via Multi-Task Deep Reinforcement Learning Machine Learning, Optimization, and Data Science, (353-364)
  14. el Hassouni A, Hoogendoorn M, Ciharova M, Kleiboer A, Amarti K, Muhonen V, Riper H and Eiben A pH-RL: A Personalization Architecture to Bring Reinforcement Learning to Health Practice Machine Learning, Optimization, and Data Science, (265-280)
  15. Rezaee K, Yadmellat P and Chamorro S Motion Planning for Autonomous Vehicles in the Presence of Uncertainty Using Reinforcement Learning 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), (3506-3511)
  16. Tsvarkaleva M and Dennis L No Free Lunch: Overcoming Reward Gaming in AI Safety Gridworlds Computer Safety, Reliability, and Security. SAFECOMP 2021 Workshops, (226-238)
  17. Zombori Z, Urban J and Olšák M The Role of Entropy in Guiding a Connection Prover Automated Reasoning with Analytic Tableaux and Related Methods, (218-235)
  18. Lai S, Wu X, Wang S, Peng Y and Peng Z Learning an Index Advisor with Deep Reinforcement Learning Web and Big Data, (178-185)
  19. Yu H, Wang H and Wu J Mixup Without Hesitation Image and Graphics, (143-154)
  20. Huisman M, van Rijn J and Plaat A (2021). A survey of deep meta-learning, Artificial Intelligence Review, 54:6, (4483-4541), Online publication date: 1-Aug-2021.
  21. Muñoz J, Quintero L, Stephens C and Pope A Taxonomy of Physiologically Adaptive Systems and Design Framework Adaptive Instructional Systems. Design and Evaluation, (559-576)
  22. Hahn E, Perez M, Schewe S, Somenzi F, Trivedi A and Wojtczak D Model-Free Reinforcement Learning for Branching Markov Decision Processes Computer Aided Verification, (651-673)
  23. Kim M, Jaseemuddin M and Anpalagan A (2021). Deep Reinforcement Learning Based Active Queue Management for IoT Networks, Journal of Network and Systems Management, 29:3, Online publication date: 1-Jul-2021.
  24. Przybyszewski A Theory of Mind Helps to Predict Neurodegenerative Processes in Parkinson’s Disease Computational Science – ICCS 2021, (542-555)
  25. Fahid F, Rowe J, Spain R, Goldberg B, Pokorny R and Lester J Adaptively Scaffolding Cognitive Engagement with Batch Constrained Deep Q-Networks Artificial Intelligence in Education, (113-124)
  26. Taguchi Y, Hino H and Kameyama K (2021). Pre-Training Acquisition Functions by Deep Reinforcement Learning for Fixed Budget Active Learning, Neural Processing Letters, 53:3, (1945-1962), Online publication date: 1-Jun-2021.
  27. Pham U, Luu Q and Tran H (2021). Multi-agent reinforcement learning approach for hedging portfolio problem, Soft Computing - A Fusion of Foundations, Methodologies and Applications, 25:12, (7877-7885), Online publication date: 1-Jun-2021.
  28. Walmsley J (2021). Artificial intelligence and the value of transparency, AI & Society, 36:2, (585-595), Online publication date: 1-Jun-2021.
  29. Zelvelder A, Westberg M and Främling K Assessing Explainability in Reinforcement Learning Explainable and Transparent AI and Multi-Agent Systems, (223-240)
  30. Kopel M and Szczurek W Parallelization of Reinforcement Learning Algorithms for Video Games Intelligent Information and Database Systems, (195-207)
  31. Gonzalez Fabre R, Camacho Ibáñez J and Tejedor Escobar P (2021). Moral control and ownership in AI systems, AI & Society, 36:1, (289-303), Online publication date: 1-Mar-2021.
  32. Chatzilygeroudis K, Hatzilygeroudis I and Perikos I Machine Learning Basics Intelligent Computing for Interactive System Design, (143-193)
  33. Kirtay M, Vannucci L, Albanese U, Laschi C, Oztop E and Falotico E (2021). Emotion as an emergent phenomenon of the neurocomputational energy regulation mechanism of a cognitive agent in a decision-making task, Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems, 29:1, (55-71), Online publication date: 1-Feb-2021.
  34. Tan J, Liang Y, Zhang L and Feng G (2021). Deep Reinforcement Learning for Joint Channel Selection and Power Control in D2D Networks, IEEE Transactions on Wireless Communications, 20:2, (1363-1378), Online publication date: 1-Feb-2021.
  35. Zhang T, Zhu K and Wang J (2021). Energy-Efficient Mode Selection and Resource Allocation for D2D-Enabled Heterogeneous Networks: A Deep Reinforcement Learning Approach, IEEE Transactions on Wireless Communications, 20:2, (1175-1187), Online publication date: 1-Feb-2021.
  36. Bertsimas D and Stellato B (2020). The voice of optimization, Machine Learning, 110:2, (249-277), Online publication date: 1-Feb-2021.
  37. Della Santina C, Katzschmann R, Bicchi A, Rus D, Jiang H, Wang Z, Jin Y, Chen X, Li P, Gan Y, Lin S and Chen X (2021). Hierarchical control of soft manipulators towards unstructured interactions, International Journal of Robotics Research, 40:1, (411-434), Online publication date: 1-Jan-2021.
  38. Civitarese G, Sztyler T, Riboni D, Bettini C and Stuckenschmidt H (2020). POLARIS: Probabilistic and Ontological Activity Recognition in Smart-Homes, IEEE Transactions on Knowledge and Data Engineering, 33:1, (209-223), Online publication date: 1-Jan-2021.
  39. Metzger A, Quinton C, Mann Z, Baresi L and Pohl K Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services Service-Oriented Computing, (269-286)
  40. Carvalho D, Melo F and Santos P A new convergent variant of Q-learning with linear function approximation Proceedings of the 34th International Conference on Neural Information Processing Systems, (19412-19421)
  41. Patel N, Acerbi L and Pouget A Dynamic allocation of limited memory resources in reinforcement learning Proceedings of the 34th International Conference on Neural Information Processing Systems, (16948-16960)
  42. Grimm C, Barreto A, Singh S and Silver D The value equivalence principle for model-based reinforcement learning Proceedings of the 34th International Conference on Neural Information Processing Systems, (5541-5552)
  43. Ashwood Z, Roy N, Bak J, The International Brain Laboratory and Pillow J Inferring learning rules from animal decision-making Proceedings of the 34th International Conference on Neural Information Processing Systems, (3442-3453)
  44. Neu G and Pike-Burke C A unifying view of optimism in episodic reinforcement learning Proceedings of the 34th International Conference on Neural Information Processing Systems, (1392-1403)
  45. Larsson D, Maity D and Tsiotras P (2020). Q-Tree Search: An Information-Theoretic Approach Toward Hierarchical Abstractions for Agents With Computational Limitations, IEEE Transactions on Robotics, 36:6, (1669-1685), Online publication date: 1-Dec-2020.
  46. Shi H, Wu H, Xu C, Zhu J, Hwang M and Hwang K (2020). Adaptive Image-Based Visual Servoing Using Reinforcement Learning With Fuzzy State Coding, IEEE Transactions on Fuzzy Systems, 28:12, (3244-3255), Online publication date: 1-Dec-2020.
  47. Marsh A (2020). Special delivery by satellite, IEEE Spectrum, 57:12, (50-50), Online publication date: 1-Dec-2020.
  48. Sen S, Maity S and Das D (2020). The body is the network: To safeguard sensitive data, turn flesh and tissue into a secure wireless channel, IEEE Spectrum, 57:12, (44-49), Online publication date: 1-Dec-2020.
  49. (2020). Good grids make good neighbors, IEEE Spectrum, 57:12, (38-43), Online publication date: 1-Dec-2020.
  50. Ross P (2020). Flying beyond mach 5 is back, decades after the original need-for-speed arms race ended: Going Hypersonic, IEEE Spectrum, 57:12, (32-37), Online publication date: 1-Dec-2020.
  51. Ulrich L (2020). GM bets big on batteries: A new $2.3 billion plant cranks out Ultium cells to power a future line of electric vehicles, IEEE Spectrum, 57:12, (26-31), Online publication date: 1-Dec-2020.
  52. Higginbotham S and Pesce M (2020). Who's behind that robot? - [CrossTalk], IEEE Spectrum, 57:12, (24-25), Online publication date: 1-Dec-2020.
  53. Smil V (2020). Energiewende, 20 years later - [CrossTalk], IEEE Spectrum, 57:12, (22-23), Online publication date: 1-Dec-2020.
  54. (2020). Turning carbon dioxide into vodka: A Brooklyn startup is an XPrize finalist for its boozy carbon-capture technology - [Podcasts], IEEE Spectrum, 57:12, (21-21), Online publication date: 1-Dec-2020.
  55. Cass S (2020). Painless FPGA programming: The alchitry AU kit can simplify projects that need a lot of input/output - [Hands on], IEEE Spectrum, 57:12, (18-20), Online publication date: 1-Dec-2020.
  56. (2020). Light and lively - [News], IEEE Spectrum, 57:12, (16-17), Online publication date: 1-Dec-2020.
  57. Waltz E, Ackerman E, Choi C, Dhar P and Dumiak M (2020). Take a tour inside a cell: Advances in microscopy let researchers give immersive VR trips through brain cells - [News], IEEE Spectrum, 57:12, (8-14), Online publication date: 1-Dec-2020.
  58. Hassler S (2020). Ending the COVID-19 pandemic: Embracing no-tech solutions will make high-tech solutions possible - [Spectral Lines], IEEE Spectrum, 57:12, (6-6), Online publication date: 1-Dec-2020.
  59. (2020). Autonomous vehicles require batteries with lasting power, IEEE Spectrum, 57:12, (5-5), Online publication date: 1-Dec-2020.
  60. (2020). The body electric - [Back Story], IEEE Spectrum, 57:12, (2-2), Online publication date: 1-Dec-2020.
  61. Dunnhofer M, Martinel N and Micheloni C Tracking-by-Trackers with a Distilled and Reinforced Model Computer Vision – ACCV 2020, (631-650)
  62. Hmedoush I, Adjih C and Mühlethaler P A Regret Minimization Approach to Frameless Irregular Repetition Slotted Aloha: IRSA-RM Machine Learning for Networking, (73-92)
  63. Zhu J and Jiang C TAC-GAIL: A Multi-modal Imitation Learning Method Neural Information Processing, (688-699)
  64. Zhou Z, Liao H, Wang X, Mumtaz S and Rodriguez J (2020). When Vehicular Fog Computing Meets Autonomous Driving: Computational Resource Management and Task Offloading, IEEE Network: The Magazine of Global Internetworking, 34:6, (70-76), Online publication date: 1-Nov-2020.
  65. Chen Y, Peng Y, Bao Y, Wu C, Zhu Y and Guo C Elastic parameter server load distribution in deep learning clusters Proceedings of the 11th ACM Symposium on Cloud Computing, (507-521)
  66. Almulla H and Gay G Generating Diverse Test Suites for Gson Through Adaptive Fitness Function Selection Search-Based Software Engineering, (246-252)
  67. Yang I (2020). A Convex Optimization Approach to Dynamic Programming in Continuous State and Action Spaces, Journal of Optimization Theory and Applications, 187:1, (133-157), Online publication date: 1-Oct-2020.
  68. van Heeswijk W Smart Containers with Bidding Capacity: A Policy Gradient Algorithm for Semi-cooperative Learning Computational Logistics, (52-67)
  69. Zhang J, Dong A and Yu J Intelligent Dynamic Spectrum Access for Uplink Underlay Cognitive Radio Networks Based on Q-Learning Wireless Algorithms, Systems, and Applications, (691-703)
  70. Bishop J and Gallagher M Optimality-Based Analysis of XCSF Compaction in Discrete Reinforcement Learning Parallel Problem Solving from Nature – PPSN XVI, (471-484)
  71. Coppens Y, Steckelmacher D, Jonker C and Nowé A Synthesising Reinforcement Learning Policies Through Set-Valued Inductive Rule Learning Trustworthy AI - Integrating Learning, Optimization and Reasoning, (163-179)
  72. Kim Y, Allmendinger R and López-Ibáñez M Safe Learning and Optimization Techniques: Towards a Survey of the State of the Art Trustworthy AI - Integrating Learning, Optimization and Reasoning, (123-139)
  73. Ye J and Zhang Y (2020). DRAG: Deep Reinforcement Learning Based Base Station Activation in Heterogeneous Networks, IEEE Transactions on Mobile Computing, 19:9, (2076-2087), Online publication date: 1-Sep-2020.
  74. Wang N, Zeng J, Hong W and Zhu S (2020). Privacy-preserving spatial keyword location-to-trajectory matching, Distributed and Parallel Databases, 38:3, (667-686), Online publication date: 1-Sep-2020.
  75. Pedersen T and Johansen C (2019). Behavioural artificial intelligence: an agenda for systematic empirical studies of artificial inference, AI & Society, 35:3, (519-532), Online publication date: 1-Sep-2020.
  76. Gros T, Höller D, Hoffmann J and Wolf V Tracking the Race Between Deep Reinforcement Learning and Imitation Learning Quantitative Evaluation of Systems, (11-17)
  77. Abdelfattah S, Kasmarik K and Hu J (2020). A robust policy bootstrapping algorithm for multi-objective reinforcement learning in non-stationary environments, Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems, 28:4, (273-292), Online publication date: 1-Aug-2020.
  78. Serrano-Cuevas J, Morales E and Hernández-Leal P (2020). Safe reinforcement learning using risk mapping by similarity, Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems, 28:4, (213-224), Online publication date: 1-Aug-2020.
  79. Tian Y, Wang Z, Yin X, Shi X, Guo Y, Geng H and Yang J (2020). Traffic Engineering in Partially Deployed Segment Routing Over IPv6 Network With Deep Reinforcement Learning, IEEE/ACM Transactions on Networking, 28:4, (1573-1586), Online publication date: 1-Aug-2020.
  80. el Hassouni A, Hoogendoorn M, Eiben A and Muhonen V Structural and Functional Representativity of GANs for Data Generation in Sequential Decision Making Machine Learning, Optimization, and Data Science, (458-471)
  81. Dinu M, Hofmarcher M, Patil V, Dorfer M, Blies P, Brandstetter J, Arjona-Medina J and Hochreiter S XAI and Strategy Extraction via Reward Redistribution xxAI - Beyond Explainable AI, (177-205)
  82. Zhang S, Liu B, Yao H and Whiteson S Provably convergent two-timescale off-policy actor-critic with function approximation Proceedings of the 37th International Conference on Machine Learning, (11204-11213)
  83. Zhang S, Liu B and Whiteson S GradientDICE Proceedings of the 37th International Conference on Machine Learning, (11194-11203)
  84. Ghosh D and Bellemare M Representations for stable off-policy reinforcement learning Proceedings of the 37th International Conference on Machine Learning, (3556-3565)
  85. Xu Y, Yu J and Buehrer R (2020). The Application of Deep Reinforcement Learning to Distributed Spectrum Access in Dynamic Heterogeneous Environments With Partial Observations, IEEE Transactions on Wireless Communications, 19:7, (4494-4506), Online publication date: 1-Jul-2020.
  86. Gaujal B, Girault A and Plassart S (2020). Feasibility of on-line speed policies in real-time systems, Real-Time Systems, 56:3, (254-292), Online publication date: 1-Jul-2020.
  87. Manoonpong P, Xiong X, Larsen J, Porr B and Miller P (2020). Forward propagation closed loop learning, Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems, 28:3, (181-194), Online publication date: 1-Jun-2020.
  88. Wang F, Zhang C, Wang F, Liu J, Zhu Y, Pang H and Sun L (2020). DeepCast: Towards Personalized QoE for Edge-Assisted Crowdcast With Deep Reinforcement Learning, IEEE/ACM Transactions on Networking, 28:3, (1255-1268), Online publication date: 1-Jun-2020.
  89. Olsen M Online Stacking Using RL with Positional and Tactical Features Learning and Intelligent Optimization, (184-194)
  90. Silva R, Vasco M, Melo F, Paiva A and Veloso M Playing Games in the Dark Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, (1260-1268)
  91. Goecks V, Gremillion G, Lawhern V, Valasek J and Waytowich N Integrating Behavior Cloning and Reinforcement Learning for Improved Performance in Dense and Sparse Reward Environments Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, (465-473)
  92. Altamimi S and Shirmohammadi S (2020). QoE-Fair DASH Video Streaming Using Server-side Reinforcement Learning, ACM Transactions on Multimedia Computing, Communications, and Applications, 16:2s, (1-21), Online publication date: 30-Apr-2020.
  93. Ramicic M and Bonarini A (2020). Adaptation of learning agents through artificial perception, Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems, 28:2, (79-88), Online publication date: 1-Apr-2020.
  94. Li K and Burdick J (2020). Human motion analysis in medical robotics via high-dimensional inverse reinforcement learning, International Journal of Robotics Research, 39:5, (568-585), Online publication date: 1-Apr-2020.
  95. Li W, Wang S, Xu Y and Lu S (2020). Charging on the Route: An Online Pricing Gateway Congestion Control for ICNs, IEEE Transactions on Network and Service Management, 17:1, (239-250), Online publication date: 1-Mar-2020.
  96. Ottoni A, Nepomuceno E, de Oliveira M and de Oliveira D (2019). Tuning of reinforcement learning parameters applied to SOP using the Scott–Knott method, Soft Computing - A Fusion of Foundations, Methodologies and Applications, 24:6, (4441-4453), Online publication date: 1-Mar-2020.
  97. Hu J, Zhang H, Song L, Han Z and Poor H (2020). Reinforcement Learning for a Cellular Internet of UAVs: Protocol Design, Trajectory Control, and Resource Management, IEEE Wireless Communications, 27:1, (116-123), Online publication date: 1-Feb-2020.
  98. Miyazaki K and Ida M (2020). Construction of consistency judgment system of diploma policy and curriculum policy using character‐level CNN, Electronics and Communications in Japan, 102:12, (30-39), Online publication date: 17-Jan-2020.
  99. Hasegawa S, Kim S, Shoji Y and Hasegawa M Performance Evaluation of Machine Learning Based Channel Selection Algorithm Implemented on IoT Sensor Devices in Coexisting IoT Networks 2020 IEEE 17th Annual Consumer Communications & Networking Conference (CCNC), (1-5)
  100. Shinzaki M, Koda Y, Yamamoto K, Nishio T and Morikura M Reducing Transmission Delay in EDCA Using Policy Gradient Reinforcement Learning 2020 IEEE 17th Annual Consumer Communications & Networking Conference (CCNC), (1-6)
  101. Koda Y, Nakashima K, Yamamoto K, Nishio T and Morikura M Cooperative Sensing in Deep RL-Based Image-to-Decision Proactive Handover for mmWave Networks 2020 IEEE 17th Annual Consumer Communications & Networking Conference (CCNC), (1-6)
  102. Andrychowicz O, Baker B, Chociej M, Józefowicz R, McGrew B, Pachocki J, Petron A, Plappert M, Powell G, Ray A, Schneider J, Sidor S, Tobin J, Welinder P, Weng L and Zaremba W (2020). Learning dexterous in-hand manipulation, International Journal of Robotics Research, 39:1, (3-20), Online publication date: 1-Jan-2020.
  103. Tilahun F, Kang C and Chowdhury M (2020). Decentralized and Dynamic Band Selection in Uplink Enhanced Licensed-Assisted Access, Wireless Communications & Mobile Computing, 2020, Online publication date: 1-Jan-2020.
  104. Shen X, Chen Q, Nie Y, Gan K and Pinchera D (2020). Adaptive Secure MIMO Transmission Mechanism against Smart Attacker, Wireless Communications & Mobile Computing, 2020, Online publication date: 1-Jan-2020.
  105. Shah A, Ganesan R, Jajodia S, Samarati P and Cam H (2019). Adaptive Alert Management for Balancing Optimal Performance among Distributed CSOCs using Reinforcement Learning, IEEE Transactions on Parallel and Distributed Systems, 31:1, (16-33), Online publication date: 1-Jan-2020.
  106. Pagani S, Manoj P, Jantsch A and Henkel J (2019). Machine Learning for Power, Energy, and Thermal Management on Multicore Processors: A Survey, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 39:1, (101-116), Online publication date: 1-Jan-2020.
  107. Sharma S and Wang X (2020). Toward Massive Machine Type Communications in Ultra-Dense Cellular IoT Networks: Current Issues and Machine Learning-Assisted Solutions, IEEE Communications Surveys & Tutorials, 22:1, (426-471), Online publication date: 1-Jan-2020.
  108. Lei Y and Li W (2019). Interactive Recommendation with User-Specific Deep Reinforcement Learning, ACM Transactions on Knowledge Discovery from Data, 13:6, (1-15), Online publication date: 31-Dec-2020.
  109. Ishizuka M and Kurashige K Self-Generation of Reward by Inputs from Multi Sensors -Integration of Evaluations for Inputs to Avoid Danger- 2018 International Symposium on Micro-NanoMechatronics and Human Science (MHS), (1-7)
  110. Russel R and Petrik M Beyond confidence regions Proceedings of the 33rd International Conference on Neural Information Processing Systems, (7049-7058)
  111. Zeeshan M and Xu H Jerk-Bounded Trajectory Planning of Industrial Manipulators 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO), (1089-1096)
  112. Bui H and Chong N Autonomous Speech Volume Control for Social Robots in a Noisy Environment Using Deep Reinforcement Learning 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO), (1263-1268)
  113. Rizvi S and Lin Z (2019). Experience replay–based output feedback Q‐learning scheme for optimal output tracking control of discrete‐time linear systems, International Journal of Adaptive Control and Signal Processing, 33:12, (1825-1842), Online publication date: 2-Dec-2019.
  114. Van Huynh N, Nguyen D, Hoang D and Dutkiewicz E (2019). “Jam Me If You Can:” Defeating Jammer With Deep Dueling Neural Network Architecture and Ambient Backscattering Augmented Communications, IEEE Journal on Selected Areas in Communications, 37:11, (2603-2620), Online publication date: 1-Nov-2019.
  115. Hernandez-Leal P, Kartal B and Taylor M (2019). A survey and critique of multiagent deep reinforcement learning, Autonomous Agents and Multi-Agent Systems, 33:6, (750-797), Online publication date: 1-Nov-2019.
  116. Fluri C, Ruch C, Zilly J, Hakenberg J and Frazzoli E Learning to Operate a Fleet of Cars 2019 IEEE Intelligent Transportation Systems Conference (ITSC), (2292-2298)
  117. Hamzehi S, Bogenberger K, Franeck P and Kaltenhäuser B Combinatorial Reinforcement Learning of Linear Assignment Problems 2019 IEEE Intelligent Transportation Systems Conference (ITSC), (3314-3321)
  118. Satic U, Jacko P and Kirkbride C Performance Evaluation of Scheduling Policies for the DRCMPSP Analytical and Stochastic Modelling Techniques and Applications, (100-114)
  119. Nageshrao S, Tseng H and Filev D Autonomous Highway Driving using Deep Reinforcement Learning 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), (2326-2331)
  120. Schwung D, Modali M and Schwung A Self-Optimization in Smart Production Systems using Distributed Reinforcement Learning 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), (4063-4068)
  121. Sun Y, Peng M, Zhou Y, Huang Y and Mao S (2019). Application of Machine Learning in Wireless Networks: Key Techniques and Open Issues, IEEE Communications Surveys & Tutorials, 21:4, (3072-3108), Online publication date: 1-Oct-2019.
  122. Luong N, Hoang D, Gong S, Niyato D, Wang P, Liang Y and Kim D (2019). Applications of Deep Reinforcement Learning in Communications and Networking: A Survey, IEEE Communications Surveys & Tutorials, 21:4, (3133-3174), Online publication date: 1-Oct-2019.
  123. Carpi F, Häger C, Martalò M, Raheli R and Pfister H Reinforcement Learning for Channel Coding: Learned Bit-Flipping Decoding 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton), (922-929)
  124. Barriga A, Rutle A and Heldal R Personalized and automatic model repairing using reinforcement learning Proceedings of the 22nd International Conference on Model Driven Engineering Languages and Systems, (175-181)
  125. Cardellini V, Lo Presti F, Nardelli M and Rossi F Self-adaptive Container Deployment in the Fog: A Survey Algorithmic Aspects of Cloud Computing, (77-102)
  126. Watson D (2019). The Rhetoric and Reality of Anthropomorphism in Artificial Intelligence, Minds and Machines, 29:3, (417-440), Online publication date: 1-Sep-2019.
  127. Luo Z, Wu C, Li Z and Zhou W (2019). Scaling Geo-Distributed Network Function Chains: A Prediction and Learning Framework, IEEE Journal on Selected Areas in Communications, 37:8, (1838-1850), Online publication date: 1-Aug-2019.
  128. Sharma M, Komninos A, López-Ibáñez M and Kazakov D Deep reinforcement learning based parameter control in differential evolution Proceedings of the Genetic and Evolutionary Computation Conference, (709-717)
  129. Stapelberg B and Malan K Global structure of policy search spaces for reinforcement learning Proceedings of the Genetic and Evolutionary Computation Conference Companion, (1773-1781)
  130. Russo G, Cardellini V and Presti F Reinforcement Learning Based Policies for Elastic Stream Processing on Heterogeneous Resources Proceedings of the 13th ACM International Conference on Distributed and Event-based Systems, (31-42)
  131. Gong Y, Li B, Liang B and Zhan Z Chic Proceedings of the International Symposium on Quality of Service, (1-10)
  132. Heo K, Oh H and Yang H Resource-aware program analysis via online abstraction coarsening Proceedings of the 41st International Conference on Software Engineering, (94-104)
  133. Eisen M, Zhang C, Chamon L, Lee D and Ribeiro A (2019). Learning Optimal Resource Allocations in Wireless Systems, IEEE Transactions on Signal Processing, 67:10, (2775-2790), Online publication date: 15-May-2019.
  134. Innes C and Lascarides A Learning Factored Markov Decision Processes with Unawareness Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, (2030-2032)
  135. Elkholy A, Yang F and Gustafson S Interpretable Automated Machine Learning in Maana™ Knowledge Platform Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, (1937-1939)
  136. Anastassacos N and Musolesi M Towards Decentralized Reinforcement Learning Architectures for Social Dilemmas Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, (1776-1777)
  137. Yang F, Liu B and Dong W Optimal Control of Complex Systems through Variational Inference with a Discrete Event Decision Process Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, (296-304)
  138. Narvekar S and Stone P Learning Curriculum Policies for Reinforcement Learning Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, (25-33)
  139. Zeigler B and Kim D Multi-resolution modeling for adaptive UAV service systems Proceedings of the Annual Simulation Symposium, (1-12)
  140. Govindaiah S and Petty M Applying Reinforcement Learning to Plan Manufacturing Material Handling Part 1 Proceedings of the 2019 ACM Southeast Conference, (168-171)
  141. Xu S, Liu P, Wang R and Panwar S Realtime Scheduling and Power Allocation Using Deep Neural Networks 2019 IEEE Wireless Communications and Networking Conference (WCNC), (1-5)
  142. Krishnan S, Garg A, Liaw R, Thananjeyan B, Miller L, Pokorny F and Goldberg K (2020). SWIRL, International Journal of Robotics Research, 38:2-3, (126-145), Online publication date: 1-Mar-2019.
  143. Gillies M (2019). Understanding the Role of Interactive Machine Learning in Movement Interaction Design, ACM Transactions on Computer-Human Interaction, 26:1, (1-34), Online publication date: 28-Feb-2019.
  144. Mazouchi M, Naghibi‐Sistani M, Hosseini Sani S, Tatari F and Modares H (2018). Observer‐based adaptive optimal output containment control problem of linear heterogeneous Multiagent systems with relative output measurements, International Journal of Adaptive Control and Signal Processing, 33:2, (262-284), Online publication date: 3-Feb-2019.
  145. Guay M and Atta K (2018). A set‐based model‐free reinforcement learning design technique for nonlinear systems, International Journal of Adaptive Control and Signal Processing, 33:2, (315-334), Online publication date: 3-Feb-2019.
  146. Poveda J, Benosman M and Teel A (2018). Hybrid online learning control in networked multiagent systems, International Journal of Adaptive Control and Signal Processing, 33:2, (228-261), Online publication date: 3-Feb-2019.
  147. Huang M, Gao W and Jiang Z (2017). Connected cruise control with delayed feedback and disturbance, International Journal of Adaptive Control and Signal Processing, 33:2, (356-370), Online publication date: 3-Feb-2019.
  148. Vamvoudakis K, Lewis F and Dixon W (2017). Open‐loop Stackelberg learning solution for hierarchical control problems, International Journal of Adaptive Control and Signal Processing, 33:2, (285-299), Online publication date: 3-Feb-2019.
  149. van Heeswijk W, Mes M and Schutten J (2017). The Delivery Dispatching Problem with Time Windows for Urban Consolidation Centers, Transportation Science, 53:1, (203-221), Online publication date: 1-Feb-2019.
  150. Becker S, Cheridito P and Jentzen A (2021). Deep optimal stopping, The Journal of Machine Learning Research, 20:1, (2712-2736), Online publication date: 1-Jan-2019.
  151. Ryzhov I, Mes M, Powell W and van den Berg G (2019). Bayesian Exploration for Approximate Dynamic Programming, Operations Research, 67:1, (198-214), Online publication date: 1-Jan-2019.
  152. Shah H, Koo I, Kwak K and Dimitrijevic B (2019). Actor–Critic-Algorithm-Based Accurate Spectrum Sensing and Transmission Framework and Energy Conservation in Energy-Constrained Wireless Sensor Network-Based Cognitive Radios, Wireless Communications & Mobile Computing, 2019, Online publication date: 1-Jan-2019.
  153. Huang B, Li Y, Li Z, Pan L, Wang S, Xu Y, Hu H and El-Hajjar M (2019). Security and Cost-Aware Computation Offloading via Deep Reinforcement Learning in Mobile Edge Computing, Wireless Communications & Mobile Computing, 2019, Online publication date: 1-Jan-2019.
  154. Zhang T, Xie S and Rose O Real-time batching in job shops based on simulation and reinforcement learning Proceedings of the 2018 Winter Simulation Conference, (3331-3339)
  155. Rabe M, Ammouriova M and Schmitt D Utilizing domain-specific information for the optimization of logistics networks Proceedings of the 2018 Winter Simulation Conference, (2873-2884)
  156. Fu M Monte Carlo tree search Proceedings of the 2018 Winter Simulation Conference, (222-236)
  157. Ibarz B, Leike J, Pohlen T, Irving G, Legg S and Amodei D Reward learning from human preferences and demonstrations in Atari Proceedings of the 32nd International Conference on Neural Information Processing Systems, (8022-8034)
  158. Hong Z, Shann T, Su S, Chang Y, Fu T and Lee C Diversity-driven exploration strategy for deep reinforcement learning Proceedings of the 32nd International Conference on Neural Information Processing Systems, (10510-10521)
  159. Ohnishi M, Yukawa M, Johansson M and Sugiyama M Continuous-time value function approximation in reproducing kernel hilbert spaces Proceedings of the 32nd International Conference on Neural Information Processing Systems, (2818-2829)
  160. Marom O and Rosman B Zero-shot transfer with deictic object-oriented representation in reinforcement learning Proceedings of the 32nd International Conference on Neural Information Processing Systems, (2297-2305)
  161. Choudhury S, Bhardwaj M, Arora S, Kapoor A, Ranade G, Scherer S and Dey D (2020). Data-driven planning via imitation learning, International Journal of Robotics Research, 37:13-14, (1632-1672), Online publication date: 1-Dec-2018.
  162. R. T. R, Das G and Sen D (2018). Energy Efficient Scheduling for Concurrent Transmission in Millimeter Wave WPANs, IEEE Transactions on Mobile Computing, 17:12, (2789-2803), Online publication date: 1-Dec-2018.
  163. Hung S, Hsu H, Cheng S, Cui Q and Chen K (2018). Delay Guaranteed Network Association for Mobile Machines in Heterogeneous Cloud Radio Access Network, IEEE Transactions on Mobile Computing, 17:12, (2744-2760), Online publication date: 1-Dec-2018.
  164. Prause M and Weigand J (2018). Market Model Benchmark Suite for Machine Learning Techniques, IEEE Computational Intelligence Magazine, 13:4, (14-24), Online publication date: 1-Nov-2018.
  165. Hu S and Bettis R (2018). Multiple Organization Goals with Feedback from Shared Technological Task Environments, Organization Science, 29:5, (873-889), Online publication date: 1-Oct-2018.
  166. Schuller B Multimodal user state and trait recognition The Handbook of Multimodal-Multisensor Interfaces, (129-165)
  167. Primeau N, Falcon R, Abielmona R and Petriu E (2018). A Review of Computational Intelligence Techniques in Wireless Sensor and Actuator Networks, IEEE Communications Surveys & Tutorials, 20:4, (2822-2854), Online publication date: 1-Oct-2018.
  168. Mao Q, Hu F and Hao Q (2018). Deep Learning for Intelligent Wireless Networks: A Comprehensive Survey, IEEE Communications Surveys & Tutorials, 20:4, (2595-2621), Online publication date: 1-Oct-2018.
  169. Martinez-Gil F, Lozano M, García-Fernández I and Fernández F (2017). Modeling, Evaluation, and Scale on Artificial Pedestrians, ACM Computing Surveys, 50:5, (1-35), Online publication date: 30-Sep-2018.
  170. Hu Y, Da Q, Zeng A, Yu Y and Xu Y Reinforcement Learning to Rank in E-Commerce Search Engine Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, (368-377)
  171. Taylor M Improving reinforcement learning with human input Proceedings of the 27th International Joint Conference on Artificial Intelligence, (5724-5728)
  172. Grau-Moya J, Leibfried F and Bou-Ammar H Balancing two-player stochastic games with soft Q-learning Proceedings of the 27th International Joint Conference on Artificial Intelligence, (268-274)
  173. Brisk R, Bond R, Liu J, Finlay D, McLaughlin J and McEneaney D AI to enhance interactive simulation-based training in resuscitation medicine Proceedings of the 32nd International BCS Human Computer Interaction Conference, (1-4)
  174. Elfwing S, Uchibe E and Doya K Online meta-learning by parallel algorithm competition Proceedings of the Genetic and Evolutionary Computation Conference, (426-433)
  175. Aref M and Jayaweera S (2018). Jamming-Resilient Wideband Cognitive Radios with Multi-Agent Reinforcement Learning, International Journal of Software Science and Computational Intelligence, 10:3, (1-23), Online publication date: 1-Jul-2018.
  176. Mu T, Al-Fuqaha A, Shuaib K, Sallabi F and Qadir J (2018). SDN Flow Entry Management Using Reinforcement Learning, ACM Transactions on Autonomous and Adaptive Systems, 13:2, (1-23), Online publication date: 30-Jun-2018.
  177. Emam S and Miller J (2018). Inferring Extended Probabilistic Finite-State Automaton Models from Software Executions, ACM Transactions on Software Engineering and Methodology, 27:1, (1-39), Online publication date: 5-Jun-2018.
  178. Yau K, Qadir J, Khoo H, Ling M and Komisarczuk P (2017). A Survey on Reinforcement Learning Models and Algorithms for Traffic Signal Control, ACM Computing Surveys, 50:3, (1-38), Online publication date: 31-May-2018.
  179. Benosman M (2018). Model‐based vs data‐driven adaptive control, International Journal of Adaptive Control and Signal Processing, 32:5, (753-776), Online publication date: 9-May-2018.
  180. Luley R and Qiu Q Optimizing data transfers for improved performance on shared GPUs using reinforcement learning Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, (378-381)
  181. Crawford D, Levit A, Ghadermarzy N, Oberoi J and Ronagh P (2018). Reinforcement learning using quantum boltzmann machines, Quantum Information & Computation, 18:1-2, (51-74), Online publication date: 1-Feb-2018.
  182. Šošić A, Zoubir A, Rueckert E, Peters J and Koeppl H (2018). Inverse reinforcement learning via nonparametric spatio-temporal subgoal modeling, The Journal of Machine Learning Research, 19:1, (2777-2821), Online publication date: 1-Jan-2018.
  183. Yu H, Mahmood A and Sutton R (2018). On generalized Bellman equations and temporal-difference learning, The Journal of Machine Learning Research, 19:1, (1864-1912), Online publication date: 1-Jan-2018.
  184. Sun J, Li J and Monterola C (2018). A Stable Distributed Neural Controller for Physically Coupled Networked Discrete-Time System via Online Reinforcement Learning, Complexity, 2018, Online publication date: 1-Jan-2018.
  185. Lei X, Zhang Z, Dong P and Pennock G (2018). Dynamic Path Planning of Unknown Environment Based on Deep Reinforcement Learning, Journal of Robotics, 2018, Online publication date: 1-Jan-2018.
  186. Xi L, Liu L, Huang Y, Xu Y, Zhang Y and Zhou Y (2018). Research on Hierarchical and Distributed Control for Smart Generation Based on Virtual Wolf Pack Strategy, Complexity, 2018, Online publication date: 1-Jan-2018.
  187. Turner M, Smaldino P and Hilbert M (2018). Paths to Polarization, Complexity, 2018, Online publication date: 1-Jan-2018.
  188. Yan Y, Zhang S, Feng Z, Wang T, Sun W and Bao H A Multiagent Learning based Hierarchical Distribute Control Framework with Low Communication Costs in Large Scale AUG Networks Proceedings of the 2017 International Conference on Automation, Control and Robots, (1-8)
  189. Dai H, Khalil E, Zhang Y, Dilkina B and Song L Learning combinatorial optimization algorithms over graphs Proceedings of the 31st International Conference on Neural Information Processing Systems, (6351-6361)
  190. Cesa-Bianchi N, Gentile C, Lugosi G and Neu G Boltzmann exploration done right Proceedings of the 31st International Conference on Neural Information Processing Systems, (6287-6296)
  191. van Seijen H, Fatemi M, Romoff J, Laroche R, Barnes T and Tsang J Hybrid reward architecture for reinforcement learning Proceedings of the 31st International Conference on Neural Information Processing Systems, (5398-5408)
  192. Lanctot M, Zambaldi V, Gruslys A, Lazaridou A, Tuyls K, Pérolat J, Silver D and Graepel T A unified game-theoretic approach to multiagent reinforcement learning Proceedings of the 31st International Conference on Neural Information Processing Systems, (4193-4206)
  193. Zhang T, Xie S and Rose O Real-time job shop scheduling based on simulation and markov decision processes Proceedings of the 2017 Winter Simulation Conference, (1-9)
  194. Songhori M, Jalali M and Terano T The effects of teams' initial characterizations of interactions on product development performance Proceedings of the 2017 Winter Simulation Conference, (1-12)
  195. Oosterhuis H and de Rijke M Balancing Speed and Quality in Online Learning to Rank for Information Retrieval Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, (277-286)
  196. Collins A Strategically Forming Groups in the El Farol Bar Problem Proceedings of the 2017 International Conference of The Computational Social Science Society of the Americas, (1-6)
  197. Ososkov G and Goncharov P (2017). Shallow and deep learning for image classification, Optical Memory and Neural Networks, 26:4, (221-248), Online publication date: 1-Oct-2017.
  198. Nikolić D (2017). Why deep neural nets cannot ever match biological intelligence and what to do about it?, International Journal of Automation and Computing, 14:5, (532-541), Online publication date: 1-Oct-2017.
  199. Mao H, Netravali R and Alizadeh M Neural Adaptive Video Streaming with Pensieve Proceedings of the Conference of the ACM Special Interest Group on Data Communication, (197-210)
  200. Petrova I and Buzdalova A Reinforcement learning based dynamic selection of auxiliary objectives with preservation of the best found solution Proceedings of the Genetic and Evolutionary Computation Conference Companion, (1435-1438)
  201. Farhana E and Heber S Biogeography-based rule mining for classification Proceedings of the Genetic and Evolutionary Computation Conference, (417-424)
  202. Chakareski J Drone Networks for Virtual Human Teleportation Proceedings of the 3rd Workshop on Micro Aerial Vehicle Networks, Systems, and Applications, (21-26)
  203. Lamont S, Aslanides J, Leike J and Hutter M Generalised Discount Functions applied to a Monte-Carlo AIμ Implementation Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, (1589-1591)
  204. Ramos G, da Silva B and Bazzan A Learning to Minimise Regret in Route Choice Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, (846-855)
  205. Craven T and Krejci C An agent-based model of regional food supply chain disintermediation Proceedings of the Agent-Directed Simulation Symposium, (1-10)
  206. Esmaeil Zadeh Soudjani S and Majumdar R Controller Synthesis for Reward Collecting Markov Processes in Continuous Space Proceedings of the 20th International Conference on Hybrid Systems: Computation and Control, (45-54)
  207. Papaioannou I and Lemon O Combining Chat and Task-Based Multimodal Dialogue for More Engaging HRI Proceedings of the Companion of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, (365-366)
  208. Yang J, Zhu K, Ran Y, Cai W and Yang E (2017). Joint Admission Control and Routing Via Approximate Dynamic Programming for Streaming Video Over Software-Defined Networking, IEEE Transactions on Multimedia, 19:3, (619-631), Online publication date: 1-Mar-2017.
  209. Gavrilina E, Zakharov M, Karpenko A, Smirnova E and Sokolov A (2017). Software System META-3 for Quantitative Evaluation of Student's Meta-competencies on the Basis of Analysis of his or her Behavior in Social Networking Services, Procedia Computer Science, 103:C, (432-438), Online publication date: 1-Mar-2017.
  210. Hanna J and Stone P Grounded action transformation for robot learning in simulation Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, (3834-3840)
  211. Li Z, Narayan A and Leong T An efficient approach to model-based hierarchical reinforcement learning Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, (3583-3589)
  212. James S, Konidaris G and Rosman B An analysis of monte carlo tree search Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, (3576-3582)
  213. Hanna J, Stone P and Niekum S Bootstrapping with models Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, (4933-4934)
  214. Thomas P, Theocharous G, Ghavamzadeh M, Durugkar I and Brunskill E Predictive off-policy policy evaluation for nonstationary decision problems, with applications to digital marketing Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, (4740-4745)
  215. Lowe R and Billing E (2017). Affective-Associative Two-Process theory, Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems, 25:1, (5-23), Online publication date: 1-Feb-2017.
  216. Rovira A and Slater M (2017). Reinforcement Learning as a tool to make people move to a specific location in Immersive Virtual Reality, International Journal of Human-Computer Studies, 98:C, (89-94), Online publication date: 1-Feb-2017.
  217. Wirth C, Akrour R, Neumann G and Fürnkranz J (2017). A survey of preference-based reinforcement learning methods, The Journal of Machine Learning Research, 18:1, (4945-4990), Online publication date: 1-Jan-2017.
  218. Vinogradska J, Bischoff B, Nguyen-Tuong D and Peters J (2017). Stability of controllers for Gaussian process dynamics, The Journal of Machine Learning Research, 18:1, (3483-3519), Online publication date: 1-Jan-2017.
  219. Qiao M, Zhao H, Huang S, Zhou L, Wang S and Bai K (2017). Optimal Channel Selection Based on Online Decision and Offline Learning in Multichannel Wireless Sensor Networks, Wireless Communications & Mobile Computing, 2017, Online publication date: 1-Jan-2017.
  220. Merlone U, Manassero E and Zara G The lingering effects of past crimes over future criminal careers Proceedings of the 2016 Winter Simulation Conference, (3532-3543)
  221. Ho M, Littman M, MacGlashan J, Cushman F and Austerweil J Showing versus doing Proceedings of the 30th International Conference on Neural Information Processing Systems, (3035-3043)
  222. Norouzi M, Bengio S, Chen Z, Jaitly N, Schuster M, Wu Y and Schuurmans D Reward augmented maximum likelihood for neural structured prediction Proceedings of the 30th International Conference on Neural Information Processing Systems, (1731-1739)
  223. Mao H, Alizadeh M, Menache I and Kandula S Resource Management with Deep Reinforcement Learning Proceedings of the 15th ACM Workshop on Hot Topics in Networks, (50-56)
  224. Shafto P and Nasraoui O Human-Recommender Systems Proceedings of the 10th ACM Conference on Recommender Systems, (127-130)
  225. Le T, Liu S and Lau H A reinforcement learning framework for trajectory prediction under uncertainty and budget constraint Proceedings of the Twenty-second European Conference on Artificial Intelligence, (347-354)
  226. Fudenberg D and Peysakhovich A (2016). Recency, Records, and Recaps, ACM Transactions on Economics and Computation, 4:4, (1-18), Online publication date: 26-Aug-2016.
  227. Krakovsky M (2016). Reinforcement renaissance, Communications of the ACM, 59:8, (12-14), Online publication date: 22-Jul-2016.
  228. Rost A, Petrova I and Buzdalova A Adaptive Parameter Selection in Evolutionary Algorithms by Reinforcement Learning with Dynamic Discretization of Parameter Range Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion, (141-142)
  229. Smith R, Kelly S and Heywood M Discovering Rubik's Cube Subgroups using Coevolutionary GP Proceedings of the Genetic and Evolutionary Computation Conference 2016, (789-796)
  230. Li X and Yang G Transferable XCS Proceedings of the Genetic and Evolutionary Computation Conference 2016, (453-460)
  231. Shen S and Chi M Reinforcement Learning Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization, (37-44)
  232. Liu B, Liu J, Ghavamzadeh M, Mahadevan S and Petrik M Proximal gradient temporal difference learning algorithms Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, (4195-4199)
  233. Konidaris G Constructing abstraction hierarchies using a skill-symbol loop Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, (1648-1654)
  234. Doshi-Velez F and Konidaris G Hidden parameter markov decision processes Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, (1432-1440)
  235. Hein D, Hentschel A, Runkler T and Udluft S (2016). Reinforcement Learning with Particle Swarm Optimization Policy PSO-P in Continuous State and Action Spaces, International Journal of Swarm Intelligence Research, 7:3, (23-42), Online publication date: 1-Jul-2016.
  236. Memarzadeh M and Pozzi M (2016). Integrated Inspection Scheduling and Maintenance Planning for Infrastructure Systems, Computer-Aided Civil and Infrastructure Engineering, 31:6, (403-415), Online publication date: 1-Jun-2016.
  237. Büyüktahtakın İ and Liu N (2016). Dynamic programming approximation algorithms for the capacitated lot-sizing problem, Journal of Global Optimization, 65:2, (231-259), Online publication date: 1-Jun-2016.
  238. Kiourt C, Pavlidis G and Kalles D ReSkill Proceedings of the 9th Hellenic Conference on Artificial Intelligence, (1-4)
  239. Le T, Liu S and Lau H Reinforcement Learning Framework for Modeling Spatial Sequential Decisions under Uncertainty Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, (1449-1450)
  240. Narvekar S, Sinapov J, Leonetti M and Stone P Source Task Creation for Curriculum Learning Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, (566-574)
  241. Liang Y, Machado M, Talvitie E and Bowling M State of the Art Control of Atari Games Using Shallow Reinforcement Learning Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, (485-493)
  242. Yu Y, Hou P, Da Q and Qian Y Boosting Nonparametric Policies Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, (477-484)
  243. Losada D, Parapar J and Barreiro Á Feeling lucky? Proceedings of the 31st Annual ACM Symposium on Applied Computing, (1027-1034)
  244. Smucker M and Clarke C Modeling Optimal Switching Behavior Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval, (317-320)
  245. Puranam P and Swamy M (2016). How Initial Representations Shape Coupled Learning Processes, Organization Science, 27:2, (323-335), Online publication date: 1-Mar-2016.
  246. Wang Z, Wang X and Aggarwal V (2016). Transmission With Energy Harvesting Nodes in Frequency-Selective Fading Channels, IEEE Transactions on Wireless Communications, 15:3, (1642-1656), Online publication date: 1-Mar-2016.
  247. Schuth A, Oosterhuis H, Whiteson S and de Rijke M Multileave Gradient Descent for Fast Online Learning to Rank Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, (457-466)
  248. Alsarhan A (2016). Reinforcement Learning for Routing and Spectrum Management in Cognitive Wireless Mesh Network, International Journal of Wireless Networks and Broadband Technologies, 5:1, (59-72), Online publication date: 1-Jan-2016.
  249. Sarhan S, Abu ElSoud M and Rashed H (2016). Enhancing video games policy based on Least-Squares Continuous Action Policy Iteration, International Journal of Computer Games Technology, 2016, (2-2), Online publication date: 1-Jan-2016.
  250. Emam S and Miller J (2015). Test Case Prioritization Using Extended Digraphs, ACM Transactions on Software Engineering and Methodology, 25:1, (1-41), Online publication date: 2-Dec-2015.
  251. Nikolaidis S, Lasota P, Ramakrishnan R and Shah J (2015). Improved human–robot team performance through cross-training, an approach inspired by human team training practices, International Journal of Robotics Research, 34:14, (1711-1730), Online publication date: 1-Dec-2015.
  252. Sánchez E, Clempner J and Poznyak A (2015). A priori-knowledge/actor-critic reinforcement learning architecture for computing the mean-variance customer portfolio, Engineering Applications of Artificial Intelligence, 46:PA, (82-92), Online publication date: 1-Nov-2015.
  253. Kordali A and Cottis P Cognitive channel selection for opportunistic spectrum access Proceedings of the 19th Panhellenic Conference on Informatics, (261-266)
  254. Luo J, Dong X and Yang H Learning to Reinforce Search Effectiveness Proceedings of the 2015 International Conference on The Theory of Information Retrieval, (271-280)
  255. Luo J, Dong X and Yang H Session Search by Direct Policy Learning Proceedings of the 2015 International Conference on The Theory of Information Retrieval, (261-270)
  256. Wang J, Ding X, Lahijanian M, Paschalidis I and Belta C (2015). Temporal logic motion control using actor–critic methods, International Journal of Robotics Research, 34:10, (1329-1344), Online publication date: 1-Sep-2015.
  257. Kharitonov E, Macdonald C, Serdyukov P and Ounis I Optimised Scheduling of Online Experiments Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, (453-462)
  258. Lopez-Guede J, Fernandez-Gauna B, Graña M and Zulueta E (2015). Training Multiagent Systems by Q-Learning, Computational Intelligence, 31:3, (498-512), Online publication date: 1-Aug-2015.
  259. Hermoso R, Lopes Cardoso H and Fasli M (2015). From roles to standards, Information Systems Frontiers, 17:4, (763-778), Online publication date: 1-Aug-2015.
  260. Theocharous G, Thomas P and Ghavamzadeh M Personalized ad recommendation systems for life-time value optimization with guarantees Proceedings of the 24th International Conference on Artificial Intelligence, (1806-1812)
  261. Liu B, Liu J, Ghavamzadeh M, Mahadevan S and Petrik M Finite-sample analysis of proximal gradient TD algorithms Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, (504-513)
  262. Trunfio G An Effective Approach for Adapting the Size of Subcomponents in Large-Scale Optimization with Cooperative Coevolution Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation, (1495-1496)
  263. Petrova I and Buzdalova A Selection of Auxiliary Objectives in the Travelling Salesman Problem using Reinforcement Learning Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation, (1455-1456)
  264. Kelly S and Heywood M Knowledge Transfer from Keepaway Soccer to Half-field Offense through Program Symbiosis Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, (1143-1150)
  265. Schrum J and Miikkulainen R Solving Interleaved and Blended Sequential Decision-Making Problems through Modular Neuroevolution Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, (345-352)
  266. Kula U and Ocaktan B (2015). A reinforcement learning algorithm with fuzzy approximation for semi Markov decision problems, Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 28:4, (1733-1744), Online publication date: 1-Jul-2015.
  267. Censi A and Murray R (2015). Bootstrapping bilinear models of Simple Vehicles, International Journal of Robotics Research, 34:8, (1087-1113), Online publication date: 1-Jul-2015.
  268. Thomos N, Kurdoglu E, Frossard P and van der Schaar M (2015). Adaptive Prioritized Random Linear Coding and Scheduling for Layered Data Delivery From Multiple Servers, IEEE Transactions on Multimedia, 17:6, (893-906), Online publication date: 1-Jun-2015.
  269. Theocharous G, Thomas P and Ghavamzadeh M Ad Recommendation Systems for Life-Time Value Optimization Proceedings of the 24th International Conference on World Wide Web, (1305-1310)
  270. Jia Y Hyperheuristic search for SBST Proceedings of the Eighth International Workshop on Search-Based Software Testing, (15-16)
  271. Jia Y, Cohen M, Harman M and Petke J Learning combinatorial interaction test generation strategies using hyperheuristic search Proceedings of the 37th International Conference on Software Engineering - Volume 1, (540-550)
  272. Godoy J, Karamouzas I, Guy S and Gini M Adaptive Learning for Multi-Agent Navigation Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, (1577-1585)
  273. Prasad H, L.A. P and Bhatnagar S Two-Timescale Algorithms for Learning Nash Equilibria in General-Sum Stochastic Games Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, (1371-1379)
  274. Efthymiadis K and Kudenko D Knowledge Revision for Reinforcement Learning with Abstract MDPs Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, (763-770)
  275. Sen S, Ridgway A and Ripley M Adaptive Budgeted Bandit Algorithms for Trust Development in a Supply-Chain Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, (137-144)
  276. Wang M and Bertsekas D (2015). Incremental constraint projection methods for variational inequalities, Mathematical Programming: Series A and B, 150:2, (321-363), Online publication date: 1-May-2015.
  277. Chen X, Bailly G, Brumby D, Oulasvirta A and Howes A The Emergence of Interactive Behavior Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, (4217-4226)
  278. Helms T, Reinhardt O and Uhrmacher A Bayesian changepoint detection for generic adaptive simulation algorithms Proceedings of the 48th Annual Simulation Symposium, (62-69)
  279. Geramifard A, Dann C, Klein R, Dabney W and How J (2015). RLPy, The Journal of Machine Learning Research, 16:1, (1573-1578), Online publication date: 1-Jan-2015.
  280. Helms T, Maus C, Haack F and Uhrmacher A Multi-level modeling and simulation of cell biological systems with ML-rules Proceedings of the 2014 Winter Simulation Conference, (177-191)
  281. Namiki N, Oyo K and Takahashi T How do humans handle the dilemma of exploration and exploitation in sequential decision making? Proceedings of the 8th International Conference on Bioinspired Information and Communications Technologies, (113-117)
  282. Cuayáhuitl H, Kruijff-Korbayová I and Dethlefs N (2014). Nonstrict Hierarchical Reinforcement Learning for Interactive Systems and Robots, ACM Transactions on Interactive Intelligent Systems, 4:3, (1-30), Online publication date: 21-Nov-2014.
  283. Zhu M, Hu Z and Liu P Reinforcement Learning Algorithms for Adaptive Cyber Defense against Heartbleed Proceedings of the First ACM Workshop on Moving Target Defense, (51-58)
  284. Taherian N and Shiri M (2014). Q*-based state abstraction and knowledge discovery in reinforcement learning, Intelligent Data Analysis, 18:6, (1153-1175), Online publication date: 1-Nov-2014.
  285. Seijen H, Whiteson S and Kester L (2014). Efficient Abstraction Selection in Reinforcement Learning, Computational Intelligence, 30:4, (657-699), Online publication date: 1-Nov-2014.
  286. Tavakol M and Brefeld U Factored MDPs for detecting topics of user sessions Proceedings of the 8th ACM Conference on Recommender systems, (33-40)
  287. Pejovic V and Musolesi M Anticipatory mobile computing for behaviour change interventions Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, (1025-1034)
  288. Valerio L, Bruno R and Passarella A Adaptive data offloading in opportunistic networks through an actor-critic learning method Proceedings of the 9th ACM MobiCom workshop on Challenged networks, (31-36)
  289. Massera G, Ferrauto T, Gigliotta O and Nolfi S (2014). Designing adaptive humanoid robots through the FARSA open-source framework, Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems, 22:4, (255-265), Online publication date: 1-Aug-2014.
  290. Buzdalova A, Kononov V and Buzdalov M Selecting evolutionary operators using reinforcement learning Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation, (1033-1036)
  291. Acre J, Zoller N and Eskridge B Effects of personality decay on collective movements Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation, (1025-1028)
  292. Yliniemi L, Agogino A and Tumer K Evolutionary agent-based simulation of the introduction of new technologies in air traffic management Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, (1215-1222)
  293. Acre J, Eskridge B, Zoller N and Schlupp I Adapting to a changing environment using winner and loser effects Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, (137-144)
  294. Valenzano R, Sturtevant N, Schaeffer J and Xie F A comparison of knowledge-based GBFS enhancements and knowledge-free exploration Proceedings of the Twenty-Fourth International Conference on Automated Planning and Scheduling, (375-379)
  295. Feldman Z and Domshlak C On MABs and separation of concerns in Monte-Carlo planning for MDPs Proceedings of the Twenty-Fourth International Conference on Automated Planning and Scheduling, (120-127)
  296. Da Q, Yu Y and Zhou Z Napping for functional representation of policy Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems, (189-196)
  297. Devlin S, Yliniemi L, Kudenko D and Tumer K Potential-based difference rewards for multiagent reinforcement learning Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems, (165-172)
  298. Rieser V, Lemon O and Keizer S (2014). Natural language generation as incremental planning under uncertainty, IEEE/ACM Transactions on Audio, Speech and Language Processing, 22:5, (979-994), Online publication date: 1-May-2014.
  299. Raeisy B, Golbahar Haghighi S and Safavi A (2014). Active noise control system via multi-agent credit assignment, Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 26:2, (1051-1063), Online publication date: 1-Mar-2014.
  300. Wilson A, Fern A and Tadepalli P (2014). Using trajectory data to improve bayesian optimization for reinforcement learning, The Journal of Machine Learning Research, 15:1, (253-282), Online publication date: 1-Jan-2014.
  301. Sebag M (2014). A tour of machine learning, AI Communications, 27:1, (11-23), Online publication date: 1-Jan-2014.
  302. Baumann M and Büning H (2014). Adaptive function approximation in reinforcement learning with an interpolating growing neural gas, International Journal of Hybrid Intelligent Systems, 11:1, (55-69), Online publication date: 1-Jan-2014.
  303. Mousavi A, Araabi B and Ahmadabadi M (2015). Context transfer in reinforcement learning using action-value functions, Computational Intelligence and Neuroscience, 2014, (52-52), Online publication date: 1-Jan-2014.
  304. Vadakkan K (2015). An electronic circuit model of the interpostsynaptic functional LINK designed to study the formation of internal sensations in the nervous system, Advances in Artificial Neural Systems, 2014, (13-13), Online publication date: 1-Jan-2014.
  305. Dura-Bernal S, Chadderdon G, Neymotin S, Francis J and Lytton W (2014). Towards a real-time interface between a biomimetic model of sensorimotor cortex and a robotic arm, Pattern Recognition Letters, 36, (204-212), Online publication date: 1-Jan-2014.
  306. Geramifard A, Walsh T, Tellex S, Chowdhary G, Roy N and How J (2013). A Tutorial on Linear Function Approximators for Dynamic Programming and Reinforcement Learning, Foundations and Trends® in Machine Learning, 6:4, (375-451), Online publication date: 19-Dec-2013.
  307. Gosavi A Relative value iteration for average reward semi-Markov control via simulation Proceedings of the 2013 Winter Simulation Conference: Simulation: Making Decisions in a Complex World, (623-630)
  308. Xu D and Son Y An integrated simulation, Markov decision processes and game theoretic framework for analysis of supply chain competitions Proceedings of the 2013 Winter Simulation Conference: Simulation: Making Decisions in a Complex World, (3930-3931)
  309. Sibertin-Blanc C and Gemayel J Boundedly Rational Agents Playing the Social Actors Game Proceedings of the 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) - Volume 02, (375-382)
  310. Kharitonov E, Macdonald C, Serdyukov P and Ounis I Using historical click data to increase interleaving sensitivity Proceedings of the 22nd ACM international conference on Information & Knowledge Management, (679-688)
  311. Watanabe T and Saito Y Camera modeling technique of 3D sensing based on tile coding for computer vision Proceedings of the 8th International Conference on Body Area Networks, (347-350)
  312. Campos J, Lopez-Sanchez M, Salamó M, Avila P and Rodríguez-Aguilar J (2013). Robust Regulation Adaptation in Multi-Agent Systems, ACM Transactions on Autonomous and Adaptive Systems, 8:3, (1-27), Online publication date: 1-Sep-2013.
  313. Iturrate I, Omedes J and Montesano L Shared control of a robot using EEG-based feedback signals Proceedings of the 2nd Workshop on Machine Learning for Interactive Systems: Bridging the Gap Between Perception, Action and Communication, (45-50)
  314. Buzdalov M, Buzdalova A and Petrova I Generation of tests for programming challenge tasks using multi-objective optimization Proceedings of the 15th annual conference companion on Genetic and evolutionary computation, (1655-1658)
  315. Li X and Hirasawa K Extended rule-based genetic network programming Proceedings of the 15th annual conference companion on Genetic and evolutionary computation, (155-156)
  316. Lakhman K and Burtsev M Neuroevolution results in emergence of short-term memory in multi-goal environment Proceedings of the 15th annual conference on Genetic and evolutionary computation, (703-710)
  317. Helms T, Ewald R, Rybacki S and Uhrmacher A A generic adaptive simulation algorithm for component-based simulation systems Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, (11-22)
  318. Tsoumakos D, Konstantinou I, Boumpouka C, Sioutas S and Koziris N Automated, elastic resource provisioning for NoSQL clusters using TIRAMOLA Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, (34-41)
  319. Wu A, Wiegand R and Pradhan R Using response probability to build system redundancy in multiagent systems Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems, (1343-1344)
  320. Mahmud M and Ramamoorthy S Learning in non-stationary MDPs as transfer learning Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems, (1259-1260)
  321. Hester T, Lopes M and Stone P Learning exploration strategies in model-based reinforcement learning Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems, (1069-1076)
  322. Gehring C and Precup D Smart exploration in reinforcement learning using absolute temporal difference errors Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems, (1037-1044)
  323. Han S, Mok A, Meng J, Wei Y, Huang P, Leng Q, Zhu X, Sentis L, Kim K and Miikkulainen R Architecture of a cyberphysical avatar Proceedings of the ACM/IEEE 4th International Conference on Cyber-Physical Systems, (189-198)
  324. Badawy R, Yassine A, Heßler A, Hirsch B and Albayrak S (2013). A novel multi-agent system utilizing quantum-inspired evolution for demand side management in the future smart grid, Integrated Computer-Aided Engineering, 20:2, (127-141), Online publication date: 1-Apr-2013.
  325. Knox W and Stone P Learning non-myopically from human-generated reward Proceedings of the 2013 international conference on Intelligent user interfaces, (191-202)
  326. Flagg A and MacLean K Affective touch gesture recognition for a furry zoomorphic machine Proceedings of the 7th International Conference on Tangible, Embedded and Embodied Interaction, (25-32)
  327. Hofmann K, Schuth A, Whiteson S and de Rijke M Reusing historical interaction data for faster online learning to rank for IR Proceedings of the sixth ACM international conference on Web search and data mining, (183-192)
  328. Cheng S, Raja A and Lesser V (2013). Multiagent meta-level control for radar coordination, Web Intelligence and Agent Systems, 11:1, (81-105), Online publication date: 1-Jan-2013.
  329. Herd S, Krueger K, Kriete T, Huang T, Hazy T and O'Reilly R (2013). Strategic cognitive sequencing, Computational Intelligence and Neuroscience, 2013, (4-4), Online publication date: 1-Jan-2013.
  330. Bigus J, Chen-Ritzo C, Hermiz K, Tesauro G and Sorrentino R Applying a framework for healthcare incentives simulation Proceedings of the Winter Simulation Conference, (1-12)
  331. Cilden E and Polat F Abstraction in Model Based Partially Observable Reinforcement Learning Using Extended Sequence Trees Proceedings of the The 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 02, (348-355)
  332. Maggio M, Hoffmann H, Papadopoulos A, Panerati J, Santambrogio M, Agarwal A and Leva A (2012). Comparison of Decision-Making Strategies for Self-Optimization in Autonomic Computing Systems, ACM Transactions on Autonomous and Adaptive Systems, 7:4, (1-32), Online publication date: 1-Dec-2012.
  333. Fang C (2012). Organizational Learning as Credit Assignment, Organization Science, 23:6, (1717-1732), Online publication date: 1-Nov-2012.
  334. Zhang X, Lin M and Zhang D A learning strategy for software testing optimization based on dynamic programming Proceedings of the Fourth Asia-Pacific Symposium on Internetware, (1-6)
  335. Farley B, Juels A, Varadarajan V, Ristenpart T, Bowers K and Swift M More for your money Proceedings of the Third ACM Symposium on Cloud Computing, (1-14)
  336. Powell W, George A, Simão H, Scott W, Lamont A and Stewart J (2012). SMART, INFORMS Journal on Computing, 24:4, (665-682), Online publication date: 1-Oct-2012.
  337. Moling O, Baltrunas L and Ricci F Optimal radio channel recommendations with explicit and implicit feedback Proceedings of the sixth ACM conference on Recommender systems, (75-82)
  338. Chu Y, Chol Song Y, Levinson R and Kautz H (2012). Interactive activity recognition and prompting to assist people with cognitive disabilities, Journal of Ambient Intelligence and Smart Environments, 4:5, (443-459), Online publication date: 1-Sep-2012.
  339. Campos L, Oliveira R, Melo J and Neto A Overhead-Controlled routing in WSNs with reinforcement learning Proceedings of the 13th international conference on Intelligent Data Engineering and Automated Learning, (622-629)
  340. Dibangoye J, Amato C and Doniec A Scaling up decentralized MDPs through heuristic search Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, (217-226)
  341. Lo W, Knaus C and Zwicker M Learning motion controllers with adaptive depth perception Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, (145-154)
  342. Lo W, Knaus C and Zwicker M Learning motion controllers with adaptive depth perception Proceedings of the 11th ACM SIGGRAPH / Eurographics conference on Computer Animation, (145-154)
  343. Dethlefs N, Hastie H, Rieser V and Lemon O Optimising incremental dialogue decisions using information density for interactive systems Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, (82-93)
  344. Loscalzo S, Wright R, Acunto K and Yu L Sample aware embedded feature selection for reinforcement learning Proceedings of the 14th annual conference on Genetic and evolutionary computation, (887-894)
  345. Doucette J, Lichodzijewski P and Heywood M Hierarchical task decomposition through symbiosis in reinforcement learning Proceedings of the 14th annual conference on Genetic and evolutionary computation, (97-104)
  346. Piperagkas G, Georgoulas G, Parsopoulos K, Stylios C and Likas A Integrating particle swarm optimization with reinforcement learning in noisy problems Proceedings of the 14th annual conference on Genetic and evolutionary computation, (65-72)
  347. Zhang K, Collins E and Shi D (2012). Centralized and distributed task allocation in multi-robot teams via a stochastic clustering auction, ACM Transactions on Autonomous and Adaptive Systems, 7:2, (1-22), Online publication date: 1-Jul-2012.
  348. Subagdja B, Wang W, Tan A, Tan Y and Teow L Memory formation, consolidation, and forgetting in learning agents Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, (1007-1014)
  349. Knox W and Stone P Reinforcement learning from simultaneous human and MDP reward Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, (475-482)
  350. Devlin S and Kudenko D Dynamic potential-based reward shaping Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, (433-440)
  351. Teacy W, Chalkiadakis G, Farinelli A, Rogers A, Jennings N, McClean S and Parr G Decentralized Bayesian reinforcement learning for online agent collaboration Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, (417-424)
  352. Konstantinou I, Angelou E, Tsoumakos D, Boumpouka C, Koziris N and Sioutas S TIRAMOLA Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, (725-728)
  353. White S, Martinez T and Rudolph G (2012). Reinforcement Programming, Computational Intelligence, 28:2, (176-208), Online publication date: 1-May-2012.
  354. Papangelis A A comparative study of reinforcement learning techniques on dialogue management Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics, (22-31)
  355. Korah J, Santos E and Santos E Multi-agent framework for real-time processing of large and dynamic search spaces Proceedings of the 27th Annual ACM Symposium on Applied Computing, (755-762)
  356. Abundo M, Cardellini V and Lo Presti F (2012). Admission control policies for a multi-class QoS-aware service oriented architecture, ACM SIGMETRICS Performance Evaluation Review, 39:4, (89-98), Online publication date: 9-Mar-2012.
  357. Posen H and Levinthal D (2012). Chasing a Moving Target, Management Science, 58:3, (587-601), Online publication date: 1-Mar-2012.
  358. Kanani P and McCallum A Selecting actions for resource-bounded information extraction using reinforcement learning Proceedings of the fifth ACM international conference on Web search and data mining, (253-262)
  359. Bertsekas D and Yu H (2012). Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming, Mathematics of Operations Research, 37:1, (66-94), Online publication date: 1-Feb-2012.
  360. Matignon L, Laurent G and Le fort-piat N (2012). Review: independent reinforcement learners in cooperative Markov games, The Knowledge Engineering Review, 27:1, (1-31), Online publication date: 1-Feb-2012.
  361. Moradi P, Shiri M, Rad A, Khadivi A and Hasler M (2012). Automatic skill acquisition in reinforcement learning using graph centrality measures, Intelligent Data Analysis, 16:1, (113-135), Online publication date: 1-Jan-2012.
  362. Ryzhov I, Powell W and Frazier P (2012). The Knowledge Gradient Algorithm for a General Class of Online Learning Problems, Operations Research, 60:1, (180-195), Online publication date: 1-Jan-2012.
  363. Miller G, Weatherwax M, Gardinier T, Abe N, Melville P, Pendus C, Jensen D, Reddy C, Thomas V, Bennett J, Anderson G and Cooley B (2012). Tax Collections Optimization for New York State, Interfaces, 42:1, (74-84), Online publication date: 1-Jan-2012.
  364. Sisikoglu E, Epelman M and Smith R A sampled fictitious play based learning algorithm for infinite horizon Markov decision processes Proceedings of the Winter Simulation Conference, (4091-4102)
  365. Gosavi A and Purohit M Stochastic policy search for variance-penalized semi-Markov control Proceedings of the Winter Simulation Conference, (2865-2876)
  366. Mattila V and Virtanen K Scheduling fighter aircraft maintenance with reinforcement learning Proceedings of the Winter Simulation Conference, (2540-2551)
  367. Tilak O and Mukhopadhyay S (2011). Partially decentralized reinforcement learning in finite, multi-agent Markov decision processes, AI Communications, 24:4, (293-309), Online publication date: 1-Dec-2011.
  368. Groce A Coverage rewarded Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering, (380-383)
  369. Kamishima T and Akaho S Personalized pricing recommender system Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, (57-64)
  370. Berral J, Gavalda R and Torres J Adaptive Scheduling on Power-Aware Managed Data-Centers Using Machine Learning Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing, (66-73)
  371. Sevay H and Tsatsoulis C Multiagent reactive plan application learning in dynamic environments Proceedings of the 15th WSEAS international conference on Computers, (424-429)
  372. Furmston T and Barber D Efficient inference in Markov control problems Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, (221-229)
  373. Fard M, Pineau J and Szepesvári C PAC-Bayesian policy evaluation for reinforcement learning Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, (195-202)
  374. Czibula G, Bocicor M and Czibula I A distributed reinforcement learning approach for solving optimization problems Proceedings of the 5th WSEAS international conference on Communications and information technology, (25-30)
  375. Junges R and Klügl F Evolution for modeling Proceedings of the 13th annual conference companion on Genetic and evolutionary computation, (551-558)
  376. Niekum S, Spector L and Barto A Evolution of reward functions for reinforcement learning Proceedings of the 13th annual conference companion on Genetic and evolutionary computation, (177-178)
  377. Allmendinger R and Knowles J Policy learning in resource-constrained optimization Proceedings of the 13th annual conference on Genetic and evolutionary computation, (1971-1978)
  378. Li X, Mabu S and Hirasawa K Use of infeasible individuals in probabilistic model building genetic network programming Proceedings of the 13th annual conference on Genetic and evolutionary computation, (601-608)
  379. Hashmi A, Berry H, Temam O and Lipasti M (2011). Automatic abstraction and fault tolerance in cortical microachitectures, ACM SIGARCH Computer Architecture News, 39:3, (1-10), Online publication date: 22-Jun-2011.
  380. Srinivasan M and Metoyer R Semi-automatic end-user tools for construction of virtual avatar behaviors Proceedings of the 16th International Conference on 3D Web Technology, (121-128)
  381. Dethlefs N and Cuayáhuitl H Hierarchical reinforcement learning and hidden Markov models for task-oriented natural language generation Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2, (654-659)
  382. Jungmann A, Lutterbeck J, Werdehausen B, Kleinjohann B and Kleinjohann L Towards a real-world scenario for investigating organic computing principles in heterogeneous societies of robots Proceedings of the 2011 workshop on Organic computing, (41-50)
  383. Dethlefs N, Cuayáhuitl H and Viethen J Optimising natural language generation decision making for situated dialogue Proceedings of the SIGDIAL 2011 Conference, (78-87)
  384. Rao J, Bu X, Wang K and Xu C (2011). Self-adaptive provisioning of virtualized resources in cloud computing, ACM SIGMETRICS Performance Evaluation Review, 39:1, (321-322), Online publication date: 7-Jun-2011.
  385. Rao J, Bu X, Wang K and Xu C Self-adaptive provisioning of virtualized resources in cloud computing Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, (129-130)
  386. Ge Y and Qiu Q Dynamic thermal management for multimedia applications using machine learning Proceedings of the 48th Design Automation Conference, (95-100)
  387. Wang Y, Xie Q, Ammari A and Pedram M Deriving a near-optimal power management policy using model-free reinforcement learning and Bayesian classification Proceedings of the 48th Design Automation Conference, (41-46)
  388. Hashmi A, Berry H, Temam O and Lipasti M Automatic abstraction and fault tolerance in cortical microachitectures Proceedings of the 38th annual international symposium on Computer architecture, (1-10)
  389. Mariani L, Pezzè M, Riganelli O and Santoro M AutoBlackTest Proceedings of the 33rd International Conference on Software Engineering, (1013-1015)
  390. HolmesParker C and Agogino A Agent-based resource allocation in dynamically formed CubeSat constellations The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3, (1157-1158)
  391. De Hauwere Y, Vrancx P and Nowé A Solving delayed coordination problems in MAS The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3, (1115-1116)
  392. Grześ M and Hoey J Efficient planning in R-max The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3, (963-970)
  393. Parsons S, Tang Y, Sklar E, McBurney P and Cai K Argumentation-based reasoning in agents with varying degrees of trust The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, (879-886)
  394. Barrett S, Stone P and Kraus S Empirical evaluation of ad hoc teamwork in the pursuit domain The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, (567-574)
  395. Moriyama K, Kurihara S and Numao M Evolving subjective utilities The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, (233-240)
  396. Devlin S and Kudenko D Theoretical considerations of potential-based reward shaping for multi-agent systems The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, (225-232)
  397. Misu T, Sugiura K, Kawahara T, Ohtake K, Hori C, Kashioka H, Kawai H and Nakamura S (2011). Modeling spoken decision support dialogue and optimization of its dialogue strategy, ACM Transactions on Speech and Language Processing, 7:3, (1-18), Online publication date: 1-May-2011.
  398. Pietquin O, Geist M, Chandramohan S and Frezza-Buet H (2011). Sample-efficient batch reinforcement learning for dialogue management optimization, ACM Transactions on Speech and Language Processing, 7:3, (1-21), Online publication date: 1-May-2011.
  399. Lemon O and Pietquin O (2011). Introduction to special issue on machine learning for adaptivity in spoken dialogue systems, ACM Transactions on Speech and Language Processing, 7:3, (1-3), Online publication date: 1-May-2011.
  400. Dore A, Pinasco M, Ciardelli L and Regazzoni C (2011). A bio-inspired system model for interactive surveillance applications, Journal of Ambient Intelligence and Smart Environments, 3:2, (147-163), Online publication date: 1-Apr-2011.
  401. Gomez-Hicks G and Kauchak D Dynamic game difficulty balancing for backgammon Proceedings of the 49th Annual Southeast Regional Conference, (295-299)
  402. Beck J, Feng T and Watson J (2011). Combining Constraint Programming and Local Search for Job-Shop Scheduling, INFORMS Journal on Computing, 23:1, (1-14), Online publication date: 1-Jan-2011.
  403. Lee Y, Wampler K, Bernstein G, Popović J and Popović Z Motion fields for interactive character locomotion ACM SIGGRAPH Asia 2010 papers, (1-8)
  404. Konidaris G, Kuindersma S, Barto A and Grupen R Constructing skill trees for reinforcement learning agents from demonstration trajectories Proceedings of the 23rd International Conference on Neural Information Processing Systems - Volume 1, (1162-1170)
  405. Tang J and Abbeel P On a connection between importance sampling and the likelihood ratio policy Gradient Proceedings of the 23rd International Conference on Neural Information Processing Systems - Volume 1, (1000-1008)
  406. Beygelzimer A, Hsu D, Langford J and Zhang T Agnostic active learning without constraints Proceedings of the 23rd International Conference on Neural Information Processing Systems - Volume 1, (199-207)
  407. Xu H and Mannor S Distributionally robust Markov decision processes Proceedings of the 23rd International Conference on Neural Information Processing Systems - Volume 2, (2505-2513)
  408. Fard M and Pineau J PAC-Bayesian model selection for reinforcement learning Proceedings of the 23rd International Conference on Neural Information Processing Systems - Volume 2, (1624-1632)
  409. Khazab M, Tweedale J and Jain L (2010). Web-based multi-agent system architecture in a dynamic environment, International Journal of Knowledge-based and Intelligent Engineering Systems, 14:4, (217-227), Online publication date: 1-Dec-2010.
  410. De Hauwere Y, Vrancx P and Nowé A (2010). Generalized learning automata for multi-agent reinforcement learning, AI Communications, 23:4, (311-324), Online publication date: 1-Dec-2010.
  411. Lee Y, Wampler K, Bernstein G, Popović J and Popović Z (2010). Motion fields for interactive character locomotion, ACM Transactions on Graphics, 29:6, (1-8), Online publication date: 1-Dec-2010.
  412. Namin A and Sridharan M Bayesian reasoning for software testing Proceedings of the FSE/SDP workshop on Future of software engineering research, (349-354)
  413. Le T and Cai C A new feature for approximate dynamic programming traffic light controller Proceedings of the Third International Workshop on Computational Transportation Science, (29-34)
  414. Changuel N, Mastronarde N, Van der Schaar M, Sayadi B and Kieffer M End-to-end stochastic scheduling of scalable video overtime-varying channels Proceedings of the 18th ACM international conference on Multimedia, (731-734)
  415. Chandramohan S, Geist M and Pietquin O Sparse approximate dynamic programming for dialog management Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue, (107-115)
  416. Georgila K, Wolters M and Moore J Learning dialogue strategies from older and younger simulated users Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue, (103-106)
  417. Cervellera C, Macciò D and Muselli M (2010). Functional Optimization Through Semilocal Approximate Minimization, Operations Research, 58:5, (1491-1504), Online publication date: 1-Sep-2010.
  418. Oh J, Meneguzzi F and Sycara K ANTIPA Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence, (1055-1056)
  419. Dickens L, Broda K and Russo A The Dynamics of Multi-Agent Reinforcement Learning Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence, (367-372)
  420. Hans A and Udluft S Uncertainty Propagation for Efficient Exploration in Reinforcement Learning Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence, (361-366)
  421. Sykulski A, Chapman A, Munoz de Cote E and Jennings N EA2 Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence, (209-214)
  422. Russell I, Markov Z, Neller T and Coleman S (2010). MLeXAI, ACM Transactions on Computing Education, 10:3, (1-36), Online publication date: 1-Aug-2010.
  423. Wang J, Fleet D and Hertzmann A Optimizing walking controllers for uncertain inputs and environments ACM SIGGRAPH 2010 papers, (1-8)
  424. Wang J, Fleet D and Hertzmann A (2010). Optimizing walking controllers for uncertain inputs and environments, ACM Transactions on Graphics, 29:4, (1-8), Online publication date: 26-Jul-2010.
  425. Abe N, Melville P, Pendus C, Reddy C, Jensen D, Thomas V, Bennett J, Anderson G, Cooley B, Kowalczyk M, Domick M and Gardinier T Optimizing debt collections using constrained reinforcement learning Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, (75-84)
  426. da Costa L, Clua E, Giraldi G, Bernardini F, Bianchi R, Schulze B and Montenegro A A framework of intentional characters for simulation of social behavior Proceedings of the 2010 Summer Computer Simulation Conference, (244-249)
  427. Knittel A An activation reinforcement based classifier system for balancing generalisation and specialisation (ARCS) Proceedings of the 12th annual conference companion on Genetic and evolutionary computation, (1871-1878)
  428. Schrum J and Miikkulainen R Evolving agent behavior in multiobjective domains using fitness-based shaping Proceedings of the 12th annual conference on Genetic and evolutionary computation, (439-446)
  429. Pardoe D, Stone P, Saar-Tsechansky M, Keskin T and Tomak K (2010). Adaptive Auction Mechanism Design and the Incorporation of Prior Knowledge, INFORMS Journal on Computing, 22:3, (353-370), Online publication date: 1-Jul-2010.
  430. Yu H Convergence of least squares temporal difference methods under general conditions Proceedings of the 27th International Conference on Machine Learning, (1207-1214)
  431. Dinculescu M and Precup D Approximate predictive representations of partially observable systems Proceedings of the 27th International Conference on Machine Learning, (895-902)
  432. Morimura T, Sugiyama M, Kashima H, Hachiya H and Tanaka T Nonparametric return distribution approximation for reinforcement learning Proceedings of the 27th International Conference on Machine Learning, (799-806)
  433. Downey C and Sanner S Temporal difference Bayesian model averaging Proceedings of the 27th International Conference on Machine Learning, (311-318)
  434. Chakraborty D and Stone P Convergence, targeted optimality, and safety in multiagent learning Proceedings of the 27th International Conference on Machine Learning, (191-198)
  435. Knox W and Stone P Training a Tetris agent via interactive shaping Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1 - Volume 1, (1767-1768)
  436. Banerjee B and Kraemer L Action discovery for reinforcement learning Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1 - Volume 1, (1585-1586)
  437. Chakraborty D and Stone P Online model learning in adversarial Markov decision processes Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1 - Volume 1, (1583-1584)
  438. Meriçli Ç, Meriçli T and Akin H A reward function generation method using genetic algorithms Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1 - Volume 1, (1513-1514)
  439. Raboin E, Nau D, Kuter U, Gupta S and Svec P Strategy generation in multi-agent imperfect-information pursuit games Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1 - Volume 1, (947-954)
  440. De Hauwere Y, Vrancx P and Nowé A Learning multi-agent state space representations Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1 - Volume 1, (715-722)
  441. Comanici G and Precup D Optimal policy switching algorithms for reinforcement learning Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1 - Volume 1, (709-714)
  442. Grześ M and Kudenko D PAC-MDP learning with knowledge-based admissible models Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1 - Volume 1, (349-358)
  443. Stone P and Kraus S To teach or not to teach? Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1 - Volume 1, (117-124)
  444. Amato C and Shani G High-level reinforcement learning in strategy games Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1 - Volume 1, (75-82)
  445. Knox W and Stone P Combining manual feedback with subsequent MDP reward signals for reinforcement learning Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1 - Volume 1, (5-12)
  446. Yu H and Bertsekas D (2010). Error Bounds for Approximations from Projected Linear Equations, Mathematics of Operations Research, 35:2, (306-329), Online publication date: 1-May-2010.
  447. White S, Martinez T and Rudolph G Generating three binary addition algorithms using reinforcement programming Proceedings of the 48th Annual Southeast Regional Conference, (1-6)
  448. Cuayáhuitl H, Renals S, Lemon O and Shimodaira H (2010). Evaluation of a hierarchical reinforcement learning spoken dialogue system, Computer Speech and Language, 24:2, (395-429), Online publication date: 1-Apr-2010.
  449. Castro D and Meir R (2010). A Convergent Online Single Time Scale Actor Critic Algorithm, The Journal of Machine Learning Research, 11, (367-410), Online publication date: 1-Mar-2010.
  450. Boylu F, Aytug H and Koehler G (2010). Induction over Strategic Agents, Information Systems Research, 21:1, (170-189), Online publication date: 1-Mar-2010.
  451. Peng H and Lin Y (2010). An optimal warning-zone-length assignment algorithm for real-time and multiple-QoS on-chip bus arbitration, ACM Transactions on Embedded Computing Systems, 9:4, (1-39), Online publication date: 1-Mar-2010.
  452. Powell W (2010). Feature Article---Merging AI and OR to Solve High-Dimensional Stochastic Optimization Problems Using Approximate Dynamic Programming, INFORMS Journal on Computing, 22:1, (2-17), Online publication date: 1-Jan-2010.
  453. Shi D, Sauter M and Kralik J Distributed, heterogeneous, multi-agent social coordination via reinforcement learning Proceedings of the 2009 international conference on Robotics and biomimetics, (653-658)
  454. Coros S, Beaudoin P and van de Panne M Robust task-based control policies for physics-based characters ACM SIGGRAPH Asia 2009 papers, (1-9)
  455. Gosavi A Reinforcement learning for model building and variance-penalized control Winter Simulation Conference, (373-379)
  456. Ramanna S and Meghdadi A (2009). Measuring Resemblances Between Swarm Behaviours: A Perceptual Tolerance Near Set Approach, Fundamenta Informaticae, 95:4, (533-552), Online publication date: 1-Dec-2009.
  458. Bidoki A, Yazdani N and Ghodsnia P (2009). FICA: A novel intelligent crawling algorithm based on reinforcement learning, Web Intelligence and Agent Systems, 7:4, (363-373), Online publication date: 1-Dec-2009.
  459. Kanamori T, Hido S and Sugiyama M (2009). A Least-squares Approach to Direct Importance Estimation, The Journal of Machine Learning Research, 10, (1391-1445), Online publication date: 1-Dec-2009.
  460. Coros S, Beaudoin P and van de Panne M (2009). Robust task-based control policies for physics-based characters, ACM Transactions on Graphics, 28:5, (1-9), Online publication date: 1-Dec-2009.
  461. Frampton M and Lemon O (2009). Review:, The Knowledge Engineering Review, 24:4, (375-408), Online publication date: 1-Dec-2009.
  462. Yugay O, Kyung L and Ko F Reinforcement learning coordination with combined heuristics in multi-agent environment for university timetabling Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, (995-1000)
  463. Sung Y, Cho K and Um K A reward field model generation in Q-learning by dynamic programming Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, (674-679)
  464. Khan R, Lewis M and Singh V (2009). Dynamic Customer Management and the Value of One-to-One Marketing, Marketing Science, 28:6, (1063-1079), Online publication date: 1-Nov-2009.
  465. Liu A, Hile H, Borriello G, Brown P, Harniss M, Kautz H and Johnson K Customizing directions in an automated wayfinding system for individuals with cognitive impairment Proceedings of the 11th international ACM SIGACCESS conference on Computers and accessibility, (27-34)
  466. Watanabe T Implementation of fuzzy Q-learning based on modular fuzzy model and parallel structured learning Proceedings of the 2009 IEEE international conference on Systems, Man and Cybernetics, (1338-1344)
  467. Li H and Liu Z A probabilistic fuzzy logic system Proceedings of the 2009 IEEE international conference on Systems, Man and Cybernetics, (383-388)
  468. Wawerla J and Vaughan R Robot task switching under diminishing returns Proceedings of the 2009 IEEE/RSJ international conference on Intelligent robots and systems, (5033-5038)
  469. Pahliani A, Spaan M and Lima P Decision-theoretic robot guidance for active cooperative perception Proceedings of the 2009 IEEE/RSJ international conference on Intelligent robots and systems, (4837-4842)
  470. Banerjee B, Abukmail A and Kraemer L (2009). Layered Intelligence for Agent-based Crowd Simulation, Simulation, 85:10, (621-633), Online publication date: 1-Oct-2009.
  471. Devlin S, Grzes M and Kudenko D Reinforcement Learning in RoboCup KeepAway with Partial Observability Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 02, (201-208)
  472. Lefèvre F, Gašić M, Jurčíček F, Keizer S, Mairesse F, Thomson B, Yu K and Young S k-nearest neighbor Monte-Carlo control algorithm for POMDP-based dialogue systems Proceedings of the SIGDIAL 2009 Conference: The 10th Annual Meeting of the Special Interest Group on Discourse and Dialogue, (272-275)
  473. DeLooze L and Viner W Fuzzy Q-learning in a nondeterministic environment Proceedings of the 5th international conference on Computational Intelligence and Games, (162-169)
  474. Knox W and Stone P Interactively shaping agents via human reinforcement Proceedings of the fifth international conference on Knowledge capture, (9-16)
  475. Liu Z and Li H Probabilistic fuzzy logic system Proceedings of the 18th international conference on Fuzzy Systems, (848-853)
  476. Sui X and Leung H A q-learning based adaptive bidding strategy in combinatorial auctions Proceedings of the 11th International Conference on Electronic Commerce, (186-194)
  477. Moriyama K (2009). Utility based Q-learning to facilitate cooperation in Prisoner's Dilemma games, Web Intelligence and Agent Systems, 7:3, (233-242), Online publication date: 1-Aug-2009.
  478. da Silva M, Durand F and Popović J Linear Bellman combination for control of character animation ACM SIGGRAPH 2009 papers, (1-10)
  479. da Silva M, Durand F and Popović J (2009). Linear Bellman combination for control of character animation, ACM Transactions on Graphics, 28:3, (1-10), Online publication date: 27-Jul-2009.
  480. Diaz F and Arguello J Adaptation of offline vertical selection predictions in the presence of user feedback Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, (323-330)
  481. Konen W and Bartz-Beielstein T Reinforcement learning for games Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers, (2641-2648)
  482. Catteeuw D and Manderick B Learning in the time-dependent minority game Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers, (2011-2016)
  483. Barreto A, Augusto D and Barbosa H On the characteristics of sequential decision problems and their impact on evolutionary computation Proceedings of the 11th Annual conference on Genetic and evolutionary computation, (1767-1768)
  484. David-Tabibi O, van den Herik H, Koppel M and Netanyahu N Simulating human grandmasters Proceedings of the 11th Annual conference on Genetic and evolutionary computation, (1483-1490)
  485. Ramírez-Ruiz J, Valenzuela-Rendón M and Terashima-Marín H uQFCS Proceedings of the 11th Annual conference on Genetic and evolutionary computation, (1307-1314)
  486. Handa H EDA-RL Proceedings of the 11th Annual conference on Genetic and evolutionary computation, (405-412)
  487. Soltoggio A and Jones B Novelty of behaviour as a basis for the neuro-evolution of operant reward learning Proceedings of the 11th Annual conference on Genetic and evolutionary computation, (169-176)
  488. Koppejan R and Whiteson S Neuroevolutionary reinforcement learning for generalized helicopter control Proceedings of the 11th Annual conference on Genetic and evolutionary computation, (145-152)
  489. Vengerov D (2009). A reinforcement learning framework for utility-based scheduling in resource-constrained systems, Future Generation Computer Systems, 25:7, (728-736), Online publication date: 1-Jul-2009.
  490. Mahmood T and Ricci F Improving recommender systems with adaptive conversational strategies Proceedings of the 20th ACM conference on Hypertext and hypermedia, (73-82)
  491. Hu Z and Tham C SI-CCMAC Proceedings of the 7th international conference on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks, (131-140)
  492. Walsh T, Szita I, Diuk C and Littman M Exploring compact reinforcement-learning representations with linear regression Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, (591-598)
  493. Regan K and Boutilier C Regret-based reward elicitation for Markov decision processes Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, (444-451)
  494. Crowley M, Nelson J and Poole D Seeing the forest despite the trees Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, (126-134)
  495. Xu J, Zhang J and Liu Y An adaptive inventory control for a supply chain Proceedings of the 21st annual international conference on Chinese control and decision conference, (5750-5755)
  496. Perez J, Germain-Renaud C, Kégl B and Loomis C Responsive elastic computing Proceedings of the 6th international conference industry session on Grids meets autonomic computing, (55-64)
  497. Kiran Y, Venkatesh T and Murthy C A multi-agent reinforcement learning approach to path selection in optical burst switching networks Proceedings of the 2009 IEEE international conference on Communications, (2431-2435)
  498. Venkatesh T, Kiran Y and Murthy C Joint path and wavelength selection using Q-learning in optical burst switching networks Proceedings of the 2009 IEEE international conference on Communications, (2420-2424)
  499. Hosoya H A motor learning neural model based on Bayesian network and reinforcement learning Proceedings of the 2009 international joint conference on Neural Networks, (760-767)
  500. Bethke B and How J Approximate dynamic programming using Bellman residual elimination and Gaussian process regression Proceedings of the 2009 conference on American Control Conference, (745-750)
  501. Hartland C, Bredeche N and Sebag M Memory-enhanced evolutionary robotics Proceedings of the Eleventh conference on Congress on Evolutionary Computation, (2788-2795)
  502. Cardamone L, Loiacono D and Lanzi P On-line neuroevolution applied to the open racing car simulator Proceedings of the Eleventh conference on Congress on Evolutionary Computation, (2622-2629)
  503. Mouret J and Doncieux S Overcoming the bootstrap problem in evolutionary robotics using behavioral diversity Proceedings of the Eleventh conference on Congress on Evolutionary Computation, (1161-1168)
  504. Ramachandran D and Gupta R Smoothed Sarsa Proceedings of the 2009 IEEE international conference on Robotics and Automation, (3327-3334)
  505. Kober J and Peters J Learning motor primitives for robotics Proceedings of the 2009 IEEE international conference on Robotics and Automation, (2509-2515)
  506. Strasdat H, Stachniss C and Burgard W Which landmark is useful? Proceedings of the 2009 IEEE international conference on Robotics and Automation, (197-202)
  507. Wang F and Swegles K An Online Algorithm for Applying Reinforcement Learning to Handle Ambiguity in Spoken Dialogues Proceedings of the 6th Annual Conference on Theory and Applications of Models of Computation, (380-389)
  508. Gomes E and Kowalczyk R Modelling the dynamics of multiagent Q-learning with ε-greedy exploration Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, (1181-1182)
  509. Fasel I, Quinlan M and Stone P A task specification language for bootstrap learning Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, (1169-1170)
  510. Hennes D, Tuyls K and Rauterberg M State-coupled replicator dynamics Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, (789-796)
  511. Kalyanakrishnan S and Stone P An empirical analysis of value function-based and policy search reinforcement learning Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, (749-756)
  512. Li L, Littman M and Mansley C Online exploration in least-squares policy iteration Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, (733-739)
  513. Hester T and Stone P Generalized model learning for reinforcement learning in factored domains Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, (717-724)
  514. Proper S and Tadepalli P Solving multiagent assignment Markov decision processes Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, (681-688)
  515. James M and Singh S SarsaLandmark Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, (585-591)
  516. Klos T and Nooteboom B Adaptive learning in evolving task allocation networks Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, (465-472)
  517. Schvartzman L and Wellman M Stronger CDA strategies through empirical game-theoretic analysis and reinforcement learning Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, (249-256)
  518. Fang C and Levinthal D (2009). Near-Term Liability of Exploitation, Organization Science, 20:3, (538-551), Online publication date: 1-May-2009.
  519. Bone C and Dragićević S (2009). Defining Transition Rules with Reinforcement Learning for Modeling Land Cover Change, Simulation, 85:5, (291-305), Online publication date: 1-May-2009.
  520. Carvalho M A distributed reinforcement learning approach to mission survivability in tactical MANETs Proceedings of the 5th Annual Workshop on Cyber Security and Information Intelligence Research: Cyber Security and Information Intelligence Challenges and Strategies, (1-4)
  521. Gomes E and Kowalczyk R (2009). Learning the IPA market with individual and social rewards, Web Intelligence and Agent Systems, 7:2, (123-138), Online publication date: 1-Apr-2009.
  522. Leng J, Fyfe C and Jain L (2009). Experimental analysis on Sarsa(λ) and Q(λ) with different eligibility traces strategies, Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 20:1,2, (73-82), Online publication date: 1-Apr-2009.
  523. Gosavi A (2009). Reinforcement Learning, INFORMS Journal on Computing, 21:2, (178-192), Online publication date: 1-Apr-2009.
  524. Sallez Y, Berger T and Trentesaux D (2009). A stigmergic approach for dynamic routing of active products in FMS, Computers in Industry, 60:3, (204-216), Online publication date: 1-Apr-2009.
  525. Kim E, Leyzberg D, Tsui K and Scassellati B How people talk when teaching a robot Proceedings of the 4th ACM/IEEE international conference on Human robot interaction, (23-30)
  526. Chen H, Jiang G, Zhang H and Yoshihira K Boosting the performance of computing systems through adaptive configuration tuning Proceedings of the 2009 ACM symposium on Applied Computing, (1045-1049)
  527. Levina T, Levin Y, McGill J and Nediak M (2009). Dynamic Pricing with Online Learning and Strategic Consumers, Operations Research, 57:2, (327-341), Online publication date: 1-Mar-2009.
  528. Chen X, Zhao Z, Jiang T, Grace D and Zhang H (2009). Intercluster connection in cognitive wireless mesh networks based on intelligent network coding, EURASIP Journal on Advances in Signal Processing, 2009, (7-7), Online publication date: 1-Mar-2009.
  529. Diaz F Integration of news content into web results Proceedings of the Second ACM International Conference on Web Search and Data Mining, (182-191)
  530. Kulkarni S and Rao G Modeling reinforcement learning algorithms for performance analysis Proceedings of the International Conference on Advances in Computing, Communication and Control, (35-39)
  531. Urbanowicz R and Moore J (2009). Learning classifier systems, Journal of Artificial Evolution and Applications, 2009, (1-25), Online publication date: 1-Jan-2009.
  532. Bengio Y (2009). Learning Deep Architectures for AI, Foundations and Trends® in Machine Learning, 2:1, (1-127), Online publication date: 1-Jan-2009.
  533. Secomandi N and Margot F (2009). Reoptimization Approaches for the Vehicle-Routing Problem with Stochastic Demands, Operations Research, 57:1, (214-230), Online publication date: 1-Jan-2009.
  534. Foo B and Van Der Schaar M (2009). A rules-based approach for configuring chains of classifiers in real-time stream mining systems, EURASIP Journal on Advances in Signal Processing, 2009, (1-17), Online publication date: 1-Jan-2009.
  535. Miller S, Harris Z and Chong E (2009). A POMDP framework for coordinated guidance of autonomous UAVs for multitarget tracking, EURASIP Journal on Advances in Signal Processing, 2009, (1-17), Online publication date: 1-Jan-2009.
  536. Su X and Khoshgoftaar T (2009). A survey of collaborative filtering techniques, Advances in Artificial Intelligence, 2009, (2-2), Online publication date: 1-Jan-2009.
  537. Salkham A, Cunningham R, Garg A and Cahill V A Collaborative Reinforcement Learning Approach to Urban Traffic Control Optimization Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 02, (560-566)
  538. Lemouzy S, Camps V and Glize P Towards a Self-Organising Mechanism for Learning Adaptive Decision-Making Rules Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 03, (616-620)
  539. Hennes D, Tuyls K and Rauterberg M Formalizing Multi-state Learning Dynamics Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 02, (266-272)
  540. Iwata K An Information-Theoretic Class of Stochastic Decision Processes Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 02, (340-344)
  541. Moriyama K Learning-Rate Adjusting Q-Learning for Prisoner's Dilemma Games Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 02, (322-325)
  542. Min H, Zeng J, Chen J and Zhu J A Study of Reinforcement Learning in a New Multiagent Domain Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 02, (154-161)
  543. Perron J, Hogan J, Moulin B, Berger J and Bélanger M A hybrid approach based on multi-agent geosimulation and reinforcement learning to solve a UAV patrolling problem Proceedings of the 40th Conference on Winter Simulation, (1259-1267)
  544. Gosavi A On step sizes, stochastic shortest paths, and survival probabilities in reinforcement learning Proceedings of the 40th Conference on Winter Simulation, (525-531)
  545. Fujii N, Hashida M and Katayose H Strategy-acquisition system for video trading card game Proceedings of the 2008 International Conference on Advances in Computer Entertainment Technology, (175-182)
  546. Leng J, Fyfe C and Jain L (2008). Simulation and reinforcement learning with soccer agents, Multiagent and Grid Systems, 4:4, (415-436), Online publication date: 1-Dec-2008.
  547. Leng J, Jain L and Fyfe C (2009). Experimental analysis of eligibility traces strategies in temporal difference learning, International Journal of Knowledge Engineering and Soft Data Paradigms, 1:1, (26-39), Online publication date: 1-Dec-2008.
  548. Bhatnagar S and Abdulla M (2008). Simulation-Based Optimization Algorithms for Finite-Horizon Markov Decision Processes, Simulation, 84:12, (577-600), Online publication date: 1-Dec-2008.
  549. Sikora R (2008). Meta-learning optimal parameter values in non-stationary environments, Knowledge-Based Systems, 21:8, (800-806), Online publication date: 1-Dec-2008.
  550. Hu Z and Tham C CCMAC Proceedings of the 11th international symposium on Modeling, analysis and simulation of wireless and mobile systems, (60-69)
  551. Simpkins C, Bhat S, Isbell C and Mateas M (2008). Towards adaptive programming, ACM SIGPLAN Notices, 43:10, (603-614), Online publication date: 27-Oct-2008.
  552. Simpkins C, Bhat S, Isbell C and Mateas M Towards adaptive programming Proceedings of the 23rd ACM SIGPLAN conference on Object-oriented programming systems languages and applications, (603-614)
  553. Leng J, Li J and Jain L (2008). A role-oriented BDI framework for real-time multiagent teaming, Intelligent Decision Technologies, 2:4, (205-217), Online publication date: 1-Oct-2008.
  554. Howard M, Klanke S, Gienger M, Goerick C and Vijayakumar S (2008). Behaviour generation in humanoids by learning potential-based policies from constrained motion, Applied Bionics and Biomechanics, 5:4, (195-211), Online publication date: 1-Oct-2008.
  555. Rokach L, Naamani L and Shmilovici A (2008). Pessimistic cost-sensitive active learning of decision trees for profit maximizing targeting campaigns, Data Mining and Knowledge Discovery, 17:2, (283-316), Online publication date: 1-Oct-2008.
  556. Wu J, Kalyanam R and Givan R Stochastic enforced hill-climbing Proceedings of the Eighteenth International Conference on International Conference on Automated Planning and Scheduling, (396-403)
  557. Yoshida T (2008). A model of implicit term relationship for information retrieval, WSEAS Transactions on Computers, 7:9, (1457-1466), Online publication date: 1-Sep-2008.
  558. Mahmood T and Ricci F Adapting the interaction state model in conversational recommender systems Proceedings of the 10th international conference on Electronic commerce, (1-10)
  559. Tan C and Cheng H A combined tactical and strategic hierarchical learning framework in multi-agent games Proceedings of the 2008 ACM SIGGRAPH symposium on Video games, (115-122)
  560. Huebscher M and McCann J (2008). A survey of autonomic computing—degrees, models, and applications, ACM Computing Surveys, 40:3, (1-28), Online publication date: 1-Aug-2008.
  561. Xu Z and Akella R A bayesian logistic regression model for active relevance feedback Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, (227-234)
  562. Konidaris G Autonomous robot skill acquisition Proceedings of the 23rd national conference on Artificial intelligence - Volume 3, (1855-1856)
  563. Tumer K and Agogino A Adaptive management of air traffic flow Proceedings of the 23rd national conference on Artificial intelligence - Volume 3, (1581-1584)
  564. Hachiya H, Akiyama T, Sugiyama M and Peters J Adaptive importance sampling with automatic model selection in value function approximation Proceedings of the 23rd national conference on Artificial intelligence - Volume 3, (1351-1356)
  565. Gauci J and Stanley K A case study on the critical role of geometric regularity in machine learning Proceedings of the 23rd national conference on Artificial intelligence - Volume 2, (628-633)
  566. Guez A, Vincent R, Avoli M and Pineau J Adaptive treatment of epilepsy via batch-mode reinforcement learning Proceedings of the 20th national conference on Innovative applications of artificial intelligence - Volume 3, (1671-1678)
  567. Dejmal S, Fern A and Nguyen T Reinforcement learning for vulnerability assessment in peer-to-peer networks Proceedings of the 20th national conference on Innovative applications of artificial intelligence - Volume 3, (1655-1662)
  568. Rabinovich Z, Pochter N and Rosenschein J Coordination and multi-tasking using EMT Proceedings of the 23rd national conference on Artificial intelligence - Volume 1, (144-149)
  569. David-Tabibi O, Koppel M and Netanyahu N Genetic algorithms for mentor-assisted evaluation function optimization Proceedings of the 10th annual conference on Genetic and evolutionary computation, (1469-1476)
  570. Lo W and Zwicker M Real-time planning for parameterized human motion Proceedings of the 2008 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, (29-38)
  571. Silver D, Sutton R and Müller M Sample-based learning and search with permanent and transient memories Proceedings of the 25th international conference on Machine learning, (968-975)
  572. Sakuma J, Kobayashi S and Wright R Privacy-preserving reinforcement learning Proceedings of the 25th international conference on Machine learning, (864-871)
  573. Parr R, Li L, Taylor G, Painter-Wakefield C and Littman M An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning Proceedings of the 25th international conference on Machine learning, (752-759)
  574. Frank J, Mannor S and Precup D Reinforcement learning in the presence of rare events Proceedings of the 25th international conference on Machine learning, (336-343)
  575. Bianchi R, Ramisa A and López de Mántaras R Learning to Select Object Recognition Methods for Autonomous Mobile Robots Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence, (927-928)
  576. Servin A and Kudenko D Multi-Agent Reinforcement Learning for Intrusion Detection Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence, (873-874)
  577. Pavlidis N, Tasoulis D, Adams N and Hand D Dynamic Multi-Armed Bandit with Covariates Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence, (777-778)
  578. Leopold T, Kern-Isberner G and Peters G Belief revision with reinforcement learning for interactive object recognition Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence, (65-69)
  579. Ipek E, Mutlu O, Martínez J and Caruana R Self-Optimizing Memory Controllers Proceedings of the 35th Annual International Symposium on Computer Architecture, (39-50)
  580. Livingston S, Garvey J and Elhanany I On the Broad Implications of Reinforcement Learning based AGI Proceedings of the 2008 conference on Artificial General Intelligence 2008: Proceedings of the First AGI Conference, (478-482)
  581. Smith L Artificial general intelligence Proceedings of the 2008 conference on Artificial General Intelligence 2008: Proceedings of the First AGI Conference, (429-433)
  582. Banerjee B, Abukmail A and Kraemer L Advancing the Layered Approach to Agent-Based Crowd Simulation Proceedings of the 22nd Workshop on Principles of Advanced and Distributed Simulation, (185-192)
  583. Ipek E, Mutlu O, Martínez J and Caruana R (2008). Self-Optimizing Memory Controllers, ACM SIGARCH Computer Architecture News, 36:3, (39-50), Online publication date: 1-Jun-2008.
  584. Goualard F and Jermann C (2008). A Reinforcement Learning Approach to Interval Constraint Propagation, Constraints, 13:1-2, (206-226), Online publication date: 1-Jun-2008.
  585. Klos T and van Ahee G Evolutionary dynamics for designing multi-period auctions Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 3, (1589-1592)
  586. Cetina V Autonomous agent learning using an actor-critic algorithm and behavior models Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 3, (1353-1356)
  587. Lazaric A, Quaresimale M and Restelli M On the usefulness of opponent modeling Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 3, (1345-1348)
  588. Iscen A and Erogul U A new perspective to the keepaway soccer Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 3, (1341-1344)
  589. Ferrante E, Lazaric A and Restelli M Transfer of task representation in reinforcement learning using policy-based proto-value functions Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 3, (1329-1332)
  590. Alexander G, Raja A and Musliner D Controlling deliberation in a Markov decision process-based agent Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 1, (461-468)
  591. Bowling M, Geramifard A and Wingate D Sigma point policy iteration Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 1, (379-386)
  592. Chakraborty D and Sen S MB-AIM-FSI Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 1, (371-378)
  593. Jong N, Hester T and Stone P The utility of temporal abstraction in reinforcement learning Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 1, (299-306)
  594. Metzen J, Edgington M, Kassahun Y and Kirchner F Analysis of an evolutionary reinforcement learning method in a multiagent domain Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 1, (291-298)
  595. Tumer K, Welch Z and Agogino A Aligning social welfare and agent preferences to alleviate traffic congestion Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 2, (655-662)
  596. Agogino A and Tumer K Regulating air traffic flow with coupled agents Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 2, (535-542)
  597. Richert W and Kleinjohann B Adaptivity at every layer Proceedings of the 2008 international workshop on Software engineering for adaptive and self-managing systems, (113-120)
  598. Merrick K (2008). Modeling motivation for adaptive nonplayer characters in dynamic computer game worlds, Computers in Entertainment, 5:4, (1-32), Online publication date: 1-Mar-2008.
  599. Shum H, Komura T and Yamazaki S Simulating interactions of avatars in high dimensional state space Proceedings of the 2008 symposium on Interactive 3D graphics and games, (131-138)
  600. van Seijen H, Bakker B and Kester L Switching between different state representations in reinforcement learning Proceedings of the 26th IASTED International Conference on Artificial Intelligence and Applications, (226-231)
  601. Gyenes V, Bontovics Á and Lőrincz A (2008). Factored temporal difference learning in the new ties environment, Acta Cybernetica, 18:4, (651-668), Online publication date: 20-Jan-2008.
  602. Duc L, Sidhu A and Chaudhari N (2008). Hierarchical pathfinding and AI-based learning approach in strategy game design, International Journal of Computer Games Technology, 2008, (1-11), Online publication date: 10-Jan-2008.
  603. Powell W The optimizing-simulator Proceedings of the 39th conference on Winter simulation: 40 years! The best is yet to come, (43-53)
  604. De Toledo S, Barreiro J, Fuertes J, González Á and Lara J An approach to fully automatic aircraft collision avoidance and navigation Proceedings of the 7th Conference on 7th WSEAS International Conference on Applied Computer Science - Volume 7, (259-265)
  605. Bar-Hillel A, Di-Nur A, Ein-Dor L, Gilad-Bachrach R and Ittach Y Workstation capacity tuning using reinforcement learning Proceedings of the 2007 ACM/IEEE conference on Supercomputing, (1-11)
  606. Samejima K and Doya K Estimating Internal Variables of a Decision Maker’s Brain: A Model-Based Approach for Neuroscience Neural Information Processing, (596-603)
  607. Hiraoka K, Yoshida M and Mishima T Parallel Reinforcement Learning for Weighted Multi-criteria Model with Adaptive Margin Neural Information Processing, (487-496)
  608. Bahati R, Bauer M and Vieira E Policy-driven autonomic management of multi-component systems Proceedings of the 2007 conference of the center for advanced studies on Collaborative research, (137-151)
  609. Provost F, Melville P and Saar-Tsechansky M Data acquisition and cost-effective predictive modeling Proceedings of the ninth international conference on Electronic commerce, (389-398)
  610. Das S Learning to trade with insider information Proceedings of the ninth international conference on Electronic commerce, (169-176)
  611. Mahmood T and Ricci F Learning and adaptivity in interactive recommender systems Proceedings of the ninth international conference on Electronic commerce, (75-84)
  612. Sculley D Practical learning from one-sided feedback Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, (609-618)
  613. McCann J and Pollard N Responsive characters from motion fragments ACM SIGGRAPH 2007 papers, (6-es)
  614. Chong H, Tan A and Ng G (2007). Integrated cognitive architectures, Artificial Intelligence Review, 28:2, (103-130), Online publication date: 1-Aug-2007.
  615. McCann J and Pollard N (2007). Responsive characters from motion fragments, ACM Transactions on Graphics, 26:3, (6-es), Online publication date: 29-Jul-2007.
  616. Tirenni G, Labbi A, Berrospi C, Elisseeff A, Bhose T, Pauro K and Pöyhönen S (2007). The 2005 ISMS Practice Prize Winner: Customer Equity and Lifetime Management CELM Finnair Case Study, Marketing Science, 26:4, (553-565), Online publication date: 1-Jul-2007.
  617. Hoey J and Little J (2007). Value-Directed Human Behavior Analysis from Video Using Partially Observable Markov Decision Processes, IEEE Transactions on Pattern Analysis and Machine Intelligence, 29:7, (1118-1132), Online publication date: 1-Jul-2007.
  618. Timuri T, Spronck P and van den Herik J Automatic rule ordering for dynamic scripting Proceedings of the Third AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, (49-54)
  619. Tan C and Cheng H Personality-based adaptation for teamwork in game agents Proceedings of the Third AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, (37-42)
  620. Chen G, Low C and Yang Z Extremal search of decision policies for scalable distributed applications Proceedings of the 2nd international conference on Scalable information systems, (1-8)
  621. Vishwanathan S, Smola A and Vidal R (2007). Binet-Cauchy Kernels on Dynamical Systems and its Application to the Analysis of Dynamic Scenes, International Journal of Computer Vision, 73:1, (95-119), Online publication date: 1-Jun-2007.
  622. Tumer K and Agogino A Distributed agent-based air traffic flow management Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems, (1-8)
  623. Ahmadi M, Taylor M and Stone P IFSA Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems, (1-8)
  624. Vigorito C Distributed path planning for mobile robots using a swarm of interacting reinforcement learners Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems, (1-8)
  625. Gomes E and Kowalczyk R Reinforcement learning with utility-aware agents for market-based resource allocation Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems, (1-3)
  626. Jong N and Stone P Model-based function approximation in reinforcement learning Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems, (1-8)
  627. Kalyanakrishnan S and Stone P Batch reinforcement learning in a complex domain Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems, (1-8)
  628. Grounds M and Kudenko D Parallel reinforcement learning with linear function approximation Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems, (1-3)
  629. Xu X, Sun Y and Huang Z Defending DDoS attacks using hidden Markov models and cooperative reinforcement learning Proceedings of the 2007 Pacific Asia conference on Intelligence and security informatics, (196-207)
  630. Kyung-Joong Kim , Heejin Choi and Sung-Bae Cho Hybrid of Evolution and Reinforcement Learning for Othello Players Proceedings of the 2007 IEEE Symposium on Computational Intelligence and Games, (203-209)
  631. Biskupski B, Dowling J and Sacha J (2007). Properties and mechanisms of self-organizing MANET and P2P systems, ACM Transactions on Autonomous and Adaptive Systems, 2:1, (1-es), Online publication date: 1-Mar-2007.
  632. Mannor S, Simester D, Sun P and Tsitsiklis J (2007). Bias and Variance Approximation in Value Function Estimates, Management Science, 53:2, (308-322), Online publication date: 1-Feb-2007.
  633. Wiratanaya A, Lyons M, Butko N and Abe S iMime Proceedings of the 12th international conference on Intelligent user interfaces, (262-265)
  634. Dabney W and McGovern A Utile distinctions for relational reinforcement learning Proceedings of the 20th international joint conference on Artifical intelligence, (738-743)
  635. Prabha V and Monie E (2007). Hardware architecture of reinforcement learning scheme for dynamic power management in embedded systems, EURASIP Journal on Embedded Systems, 2007:1, (1-1), Online publication date: 1-Jan-2007.
  636. Zhang K and Pan W The Two Facets of the Exploration-Exploitation Dilemma Proceedings of the IEEE/WIC/ACM international conference on Intelligent Agent Technology, (371-380)
  637. Alexander G and Raja A The Role of Problem Classification in Online Meta-cognition Proceedings of the IEEE/WIC/ACM international conference on Intelligent Agent Technology, (218-225)
  638. Peddemors A, Eertink H and Niemegeers I Experience-based network resource usage on mobile hosts Proceedings of the 2006 ACM CoNEXT conference, (1-2)
  639. Qiao H, Rozenblit J, Szidarovszky F and Yang L Multi-agent learning model with bargaining Proceedings of the 38th conference on Winter simulation, (934-940)
  640. Porta J, Vlassis N, Spaan M and Poupart P (2006). Point-Based Value Iteration for Continuous POMDPs, The Journal of Machine Learning Research, 7, (2329-2367), Online publication date: 1-Dec-2006.
  641. Jonsson A and Barto A (2006). Causal Graph Based Decomposition of Factored MDPs, The Journal of Machine Learning Research, 7, (2259-2301), Online publication date: 1-Dec-2006.
  642. Bhatnagar S, Borkar V and Akarapu M (2006). A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events, The Journal of Machine Learning Research, 7, (1937-1962), Online publication date: 1-Dec-2006.
  643. Kok J and Vlassis N (2006). Collaborative Multiagent Reinforcement Learning by Payoff Propagation, The Journal of Machine Learning Research, 7, (1789-1828), Online publication date: 1-Dec-2006.
  644. Even-Dar E, Mannor S and Mansour Y (2006). Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, The Journal of Machine Learning Research, 7, (1079-1105), Online publication date: 1-Dec-2006.
  645. Whiteson S and Stone P (2006). Evolutionary Function Approximation for Reinforcement Learning, The Journal of Machine Learning Research, 7, (877-917), Online publication date: 1-Dec-2006.
  646. Munos R (2006). Policy Gradient in Continuous Time, The Journal of Machine Learning Research, 7, (771-791), Online publication date: 1-Dec-2006.
  647. Pucheta J, Patiño H, Fullana R, Schugurensky C and Kuchen B (2006). A Neuro-Dynamic Programming-Based Optimal Controller for Tomato Seedling Growth in Greenhouse Systems, Neural Processing Letters, 24:3, (241-260), Online publication date: 1-Dec-2006.
  648. Chang J, Wang H and Yin G A time-frame based trust model for p2p systems Proceedings of the 9th international conference on Information Security and Cryptology, (155-165)
  649. Nurmi P Modeling energy constrained routing in selfish ad hoc networks Proceeding from the 2006 workshop on Game theory for communications and networks, (6-es)
  650. Hutter M General discounting versus average reward Proceedings of the 17th international conference on Algorithmic Learning Theory, (244-258)
  651. Haruno M and Kawato M (2006). Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops, Neural Networks, 19:8, (1242-1254), Online publication date: 1-Oct-2006.
  652. Ohta H and Gunji Y (2006). Recurrent neural network architecture with pre-synaptic inhibition for incremental learning, Neural Networks, 19:8, (1106-1119), Online publication date: 1-Oct-2006.
  653. Tanaka S, Samejima K, Okada G, Ueda K, Okamoto Y, Yamawaki S and Doya K (2006). Brain mechanism of reward prediction under predictable and unpredictable environmental dynamics, Neural Networks, 19:8, (1233-1241), Online publication date: 1-Oct-2006.
  654. Simen P, Cohen J and Holmes P (2006). Rapid decision threshold modulation by reward rate in a neural network, Neural Networks, 19:8, (1013-1026), Online publication date: 1-Oct-2006.
  655. Sakai Y, Okamoto H and Fukai T (2006). Computational algorithms and neuronal network models underlying decision processes, Neural Networks, 19:8, (1091-1105), Online publication date: 1-Oct-2006.
  656. Dayan P, Niv Y, Seymour B and Daw N (2006). The misbehavior of value and the discipline of the will, Neural Networks, 19:8, (1153-1160), Online publication date: 1-Oct-2006.
  657. Torrey L, Shavlik J, Walker T and Maclin R Skill acquisition via transfer learning and advice taking Proceedings of the 17th European conference on Machine Learning, (425-436)
  658. Ishii S and Yoshida W (2006). Part 4: Reinforcement learning: Machine learning and natural learning, New Generation Computing, 24:3, (325-350), Online publication date: 1-Sep-2006.
  659. Pardoe D, Stone P, Saar-Tsechansky M and Tomak K Adaptive mechanism design Proceedings of the 8th international conference on Electronic commerce: The new e-commerce: innovations for conquering current barriers, obstacles and limitations to conducting successful business on the internet, (92-102)
  660. Frampton M and Lemon O Learning more effective dialogue strategies using limited dialogue move features Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, (185-192)
  661. Shahaf D and Amir E Learning partially observable action schemas Proceedings of the 21st national conference on Artificial intelligence - Volume 1, (913-919)
  662. Yeow W, Tham C and Wong W Hard constrained semi-Markov decision processes Proceedings of the 21st national conference on Artificial intelligence - Volume 1, (549-554)
  663. Whiteson S and Stone P Sample-efficient evolutionary function approximation for reinforcement learning Proceedings of the 21st national conference on Artificial intelligence - Volume 1, (518-523)
  664. Soni V and Singh S Using Homomorphisms to transfer options across continuous reinforcement learning domains Proceedings of the 21st national conference on Artificial intelligence - Volume 1, (494-499)
  665. Maclin R, Shavlik J, Walker T and Torrey L A simple and effective method for incorporating advice into kernel methods Proceedings of the 21st national conference on Artificial intelligence - Volume 1, (427-432)
  666. Liu Y and Stone P Value-function-based transfer for reinforcement learning using structure mapping Proceedings of the 21st national conference on Artificial intelligence - Volume 1, (415-420)
  667. Hundt C, Panagaden P, Pineau J and Precup D Representing systems with hidden state Proceedings of the 21st national conference on Artificial intelligence - Volume 1, (368-374)
  668. Geramifard A, Bowling M and Sutton R Incremental least-squares temporal difference learning Proceedings of the 21st national conference on Artificial intelligence - Volume 1, (356-361)
  669. Stanley K, Bryant B, Karpov I and Miikkulainen R Real-time evolution of neural networks in the NERO video game proceedings of the 21st national conference on Artificial intelligence - Volume 2, (1671-1674)
  670. Agogino A and Tumer K QUICR-learning for multi-agent coordination proceedings of the 21st national conference on Artificial intelligence - Volume 2, (1438-1443)
  671. Lee G and Bulitko V Genetic algorithms for action set selection across domains Proceedings of the 8th annual conference on Genetic and evolutionary computation, (1697-1704)
  672. Whiteson S and Stone P On-line evolutionary computation for reinforcement learning in stochastic domains Proceedings of the 8th annual conference on Genetic and evolutionary computation, (1577-1584)
  673. Bosman P and de Jong E Combining gradient techniques for numerical multi-objective evolutionary optimization Proceedings of the 8th annual conference on Genetic and evolutionary computation, (627-634)
  674. Galindo C, Cruz-Martin A, Blanco J, Fernández-Madrigal J and Gonzalez J (2006). A multi-agent control architecture for a robotic wheelchair, Applied Bionics and Biomechanics, 3:3, (179-189), Online publication date: 1-Jul-2006.
  675. Hurst J and Bull L (2006). A Neural Learning Classifier System with Self-Adaptive Constructivism for Mobile Robot Control, Artificial Life, 12:3, (353-380), Online publication date: 1-Jul-2006.
  676. Laporte C and Arbel T (2006). Efficient Discriminant Viewpoint Selection for Active Bayesian Recognition, International Journal of Computer Vision, 68:3, (267-287), Online publication date: 1-Jul-2006.
  677. Abbott R Automated expert modeling for automated student evaluation Proceedings of the 8th international conference on Intelligent Tutoring Systems, (1-10)
  678. Toussaint M and Storkey A Probabilistic inference for solving discrete and continuous state Markov Decision Processes Proceedings of the 23rd international conference on Machine learning, (945-952)
  679. Poupart P, Vlassis N, Hoey J and Regan K An analytic solution to discrete Bayesian reinforcement learning Proceedings of the 23rd international conference on Machine learning, (697-704)
  680. Konidaris G and Barto A Autonomous shaping Proceedings of the 23rd international conference on Machine learning, (489-496)
  681. Keller P, Mannor S and Precup D Automatic basis function construction for approximate dynamic programming and reinforcement learning Proceedings of the 23rd international conference on Machine learning, (449-456)
  682. Asgharbeygi N, Stracuzzi D and Langley P Relational temporal difference learning Proceedings of the 23rd international conference on Machine learning, (49-56)
  683. Abbeel P, Quigley M and Ng A Using inaccurate models in reinforcement learning Proceedings of the 23rd international conference on Machine learning, (1-8)
  684. Madeira C, Corruble V and Ramalho G Designing a reinforcement learning-based adaptive AI for large-scale strategy games Proceedings of the Second AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, (121-123)
  685. White C and Brogan D The self organization of context for learning in multiagent games Proceedings of the Second AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, (92-97)
  686. Micelli V Searching for grammar right Proceedings of the Third Workshop on Scalable Natural Language Understanding, (57-64)
  687. Chapados N and Bengio Y The K best-paths approach to approximate dynamic programming with application to portfolio optimization Proceedings of the 19th international conference on Advances in Artificial Intelligence: Canadian Society for Computational Studies of Intelligence, (491-502)
  688. Lu F, Boritz J and Covvey D Adaptive fraud detection using benford's law Proceedings of the 19th international conference on Advances in Artificial Intelligence: Canadian Society for Computational Studies of Intelligence, (347-358)
  689. Tetreault J and Litman D Comparing the utility of state features in spoken dialogue using reinforcement learning Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, (272-279)
  690. Palmer V Multi-Agent Least-Squares Policy Iteration Proceedings of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence August 29 -- September 1, 2006, Riva del Garda, Italy, (733-734)
  691. Jung T and Polani D Least Squares SVM for Least Squares TD Learning Proceedings of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence August 29 -- September 1, 2006, Riva del Garda, Italy, (499-503)
  692. Kim D, Park S, Jin Y, Chang H, Park Y, Ko I, Lee K, Lee J, Park Y and Lee S SHAGE Proceedings of the 2006 international workshop on Self-adaptation and self-managing systems, (79-85)
  693. Eiben A, Horvath M, Kowalczyk W and Schut M Reinforcement learning for online control of evolutionary algorithms Proceedings of the 4th international conference on Engineering self-organising systems, (151-160)
  694. Tumer K Coordinating simple and unreliable agents Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems, (1119-1121)
  695. Simari G and Parsons S On the relationship between MDPs and the BDI architecture Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems, (1041-1048)
  696. Conde T and Thalmann D Learnable behavioural model for autonomous virtual agents Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems, (89-96)
  697. Miao C, Weng J, Goh A, Shen Z and An B (2006). Fuzzy cognitive maps for dynamic grid service negotiation, Multiagent and Grid Systems, 2:2, (101-114), Online publication date: 1-Mar-2006.
  698. Vengerov D Adaptive utility-based scheduling in resource-constrained systems Proceedings of the 18th Australian Joint conference on Advances in Artificial Intelligence, (477-488)
  699. Powell W The optimizing-simulator Proceedings of the 37th conference on Winter simulation, (96-109)
  700. Strösslin T, Sheynikhovich D, Chavarriaga R and Gerstner W (2005). 2005 Special issue, Neural Networks, 18:9, (1125-1140), Online publication date: 1-Nov-2005.
  701. Panait L and Luke S (2005). Cooperative Multi-Agent Learning, Autonomous Agents and Multi-Agent Systems, 11:3, (387-434), Online publication date: 1-Nov-2005.
  702. Ziane S and Melouk A A swarm intelligent multi-path routing for multimedia traffic over mobile ad hoc networks Proceedings of the 1st ACM international workshop on Quality of service & security in wireless and mobile networks, (55-62)
  703. English M and Heeman P Learning mixed initiative dialog strategies by using reinforcement learning on both conversants Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, (1011-1018)
  704. Torrey L, Walker T, Shavlik J and Maclin R Using advice to transfer knowledge acquired in one reinforcement learning task to another Proceedings of the 16th European conference on Machine Learning, (412-424)
  705. Riedmiller M Neural fitted q iteration – first experiences with a data efficient neural reinforcement learning method Proceedings of the 16th European conference on Machine Learning, (317-328)
  706. Peters J, Vijayakumar S and Schaal S Natural actor-critic Proceedings of the 16th European conference on Machine Learning, (280-291)
  707. Yin P, Bhanu B, Chang K and Dong A (2005). Integrating Relevance Feedback Techniques for Image Retrieval Using Reinforcement Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27:10, (1536-1551), Online publication date: 1-Oct-2005.
  708. Preda M and Popescu D Personalized Web Recommendations Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence, (692-695)
  709. Zambetta F and Abbattista F The design and implementation of SAMIR Proceedings of the 9th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part II, (768-774)
  710. Nakamura Y, Mori T and Ishii S An off-policy natural policy gradient method for a partial observable Markov decision process Proceedings of the 15th international conference on Artificial neural networks: formal models and their applications - Volume Part II, (431-436)
  711. Kasderidis S and Taylor J Combining attention and value maps Proceedings of the 15th international conference on Artificial Neural Networks: biological Inspirations - Volume Part I, (79-84)
  712. Vienne P and Sourrouille J A middleware for autonomic QoS management based on learning Proceedings of the 5th international workshop on Software engineering and middleware, (1-8)
  713. Damoulas T, Cos-Aguilera I, Hayes G and Taylor T Valency for adaptive homeostatic agents Proceedings of the 8th European conference on Advances in Artificial Life, (936-945)
  714. Jodogne S and Piater J Interactive learning of mappings from visual percepts to actions Proceedings of the 22nd international conference on Machine learning, (393-400)
  715. Engel Y, Mannor S and Meir R Reinforcement learning with Gaussian processes Proceedings of the 22nd international conference on Machine learning, (201-208)
  716. Crandall J and Goodrich M Learning to compete, compromise, and cooperate in repeated general-sum games Proceedings of the 22nd international conference on Machine learning, (161-168)
  717. Abbeel P and Ng A Exploration and apprenticeship learning in reinforcement learning Proceedings of the 22nd international conference on Machine learning, (1-8)
  718. Wu J and Givan R Feature-Discovering approximate value iteration methods Proceedings of the 6th international conference on Abstraction, Reformulation and Approximation, (321-331)
  719. Whiteson S Improving reinforcement learning function approximators via neuroevolution Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems, (1386-1386)
  720. Agogino A and Tumer K Multi-agent reward analysis for learning in noisy domains Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems, (81-88)
  721. Nowé A, Verbeeck K and Peeters M Learning automata as a basis for multi agent reinforcement learning Proceedings of the First international conference on Learning and Adaption in Multi-Agent Systems, (71-85)
  722. Sherstov A and Stone P Improving action selection in MDP's via knowledge transfer Proceedings of the 20th national conference on Artificial intelligence - Volume 2, (1024-1029)
  723. Maclin R, Shavlik J, Torrey L, Walker T and Wild E Giving advice about preferred actions to reinforcement learners via knowledge-based kernel regression Proceedings of the 20th national conference on Artificial intelligence - Volume 2, (819-824)
  724. Banerjee B and Peng J Efficient no-regret multiagent learning Proceedings of the 20th national conference on Artificial intelligence - Volume 1, (41-46)
  725. Yoshida T, Shinkai D and Nishida S (2005). A document retrieval support system with term relationship, Web Intelligence and Agent Systems, 3:3, (171-182), Online publication date: 1-Jul-2005.
  726. Drugowitsch J and Barry A XCS with eligibility traces Proceedings of the 7th annual conference on Genetic and evolutionary computation, (1851-1858)
  727. Thierens D An adaptive pursuit strategy for allocating operator probabilities Proceedings of the 7th annual conference on Genetic and evolutionary computation, (1539-1546)
  728. Murata T and Yamaguchi M Neighboring crossover to improve GA-based Q-learning method for multi-legged robot control Proceedings of the 7th annual conference on Genetic and evolutionary computation, (145-146)
  729. Francik J and Szarowicz A Integrate and conquer Proceedings of the 2005 ACM SIGCHI International Conference on Advances in computer entertainment technology, (413-420)
  730. Kephart J Research challenges of autonomic computing Proceedings of the 27th international conference on Software engineering, (15-22)
  731. Mainland G, Parkes D and Welsh M Decentralized, adaptive resource allocation for sensor networks Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2, (315-328)
  732. Wüst C, Steffens L, Verhaegh W, Bril R and Hentschel C (2005). QoS Control Strategies for High-Quality Video Processing, Real-Time Systems, 30:1-2, (7-29), Online publication date: 1-May-2005.
  733. Chen G, Yang Z, He H and Goh K (2005). Coordinating Multiple Agents via Reinforcement Learning, Autonomous Agents and Multi-Agent Systems, 10:3, (273-328), Online publication date: 1-May-2005.
  734. Lilith N and Doğançay K Reduced-State SARSA featuring extended channel reassignment for dynamic channel allocation in mobile cellular networks Proceedings of the 4th international conference on Networking - Volume Part II, (531-542)
  735. Pardoe D and Stone P (2005). Developing adaptive auction mechanisms, ACM SIGecom Exchanges, 5:3, (1-10), Online publication date: 1-Apr-2005.
  736. Dinerstein J and Egbert P (2005). Fast multi-level adaptation for interactive autonomous characters, ACM Transactions on Graphics, 24:2, (262-288), Online publication date: 1-Apr-2005.
  737. Katayama K, Koshiishi T and Narihisa H Reinforcement learning agents with primary knowledge designed by analytic hierarchy process Proceedings of the 2005 ACM symposium on Applied computing, (14-21)
  738. Li M, Wu X, Yao R and Yan X Q-DPM Proceedings of the conference on Design, Automation and Test in Europe - Volume 1, (526-527)
  739. Rashidi F Design of multi agent adaptive neuro-fuzzy based intelligent controllers for multi-objective nonlinear system Proceedings of the 4th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering Data Bases, (1-6)
  740. Jasso H and Triesch J A virtual reality platform for modeling cognitive development Biomimetic Neural Learning for Intelligent Robots, (211-224)
  741. Könönen V (2005). Gradient descent for symmetric and asymmetric multiagent reinforcement learning, Web Intelligence and Agent Systems, 3:1, (17-30), Online publication date: 1-Jan-2005.
  742. Golkhou V, Parnianpour M and Lucas C (2004). The role of multisensor data fusion in neuromuscular control of a sagittal arm with a pair of muscles using actor-critic reinforcement learning method, Technology and Health Care, 12:6, (425-438), Online publication date: 1-Dec-2004.
  743. Mangasarian O, Shavlik J and Wild E (2004). Knowledge-Based Kernel Approximation, The Journal of Machine Learning Research, 5, (1127-1141), Online publication date: 1-Dec-2004.
  744. Mannor S and Shimkin N (2004). A Geometric Approach to Multi-Criterion Reinforcement Learning, The Journal of Machine Learning Research, 5, (325-360), Online publication date: 1-Dec-2004.
  745. Joshi K, Hiltunen M, Schlichting R, Sanders W and Agbaria A Online model-based adaptation for optimizing performance and dependability Proceedings of the 1st ACM SIGSOFT workshop on Self-managed systems, (85-89)
  746. Dowling J and Cahill V Self-managed decentralised systems using K-components and collaborative reinforcement learning Proceedings of the 1st ACM SIGSOFT workshop on Self-managed systems, (39-43)
  747. Hryshko A and Downs T (2004). System for foreign exchange trading using genetic algorithms and reinforcement learning, International Journal of Systems Science, 35:13-14, (763-774), Online publication date: 20-Oct-2004.
  748. Lee J and Lee K Precomputing avatar behavior from human motion data Proceedings of the 2004 ACM SIGGRAPH/Eurographics symposium on Computer animation, (79-87)
  749. Abe N, Verma N, Apte C and Schroko R Cross channel optimized marketing by reinforcement learning Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, (767-772)
  750. Agogino A and Tumer K Unifying Temporal and Structural Credit Assignment Problems Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 2, (980-987)
  751. Weinberg M and Rosenschein J Best-Response Multiagent Learning in Non-Stationary Environments Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 2, (506-513)
  752. Hauk T, Buro M and Schaeffer J *-MINIMAX performance in backgammon Proceedings of the 4th international conference on Computers and Games, (51-66)
  753. Kveton B and Hauskrecht M Heuristic refinements of approximate linear programming for factored continuous-state Markov decision processes Proceedings of the Fourteenth International Conference on International Conference on Automated Planning and Scheduling, (306-314)
  754. Koenig S, Likhachev M, Liu Y and Furcy D (2004). Incremental heuristic search in AI, AI Magazine, 25:2, (99-112), Online publication date: 1-Jun-2004.
  755. Könönen V (2004). Asymmetric multiagent reinforcement learning, Web Intelligence and Agent Systems, 2:2, (105-121), Online publication date: 1-Apr-2004.
  756. Gosavi A (2004). A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward, Machine Language, 55:1, (5-29), Online publication date: 1-Apr-2004.
  757. E Silva Jr. E, Idiart M, Trevisan M and Engel P (2004). Autonomous Learning Architecture for Environmental Mapping, Journal of Intelligent and Robotic Systems, 39:3, (243-263), Online publication date: 1-Mar-2004.
  758. Nareyek A Choosing search heuristics by non-stationary reinforcement learning Metaheuristics, (523-544)
  759. Martín M and Geffner H (2004). Learning Generalized Policies from Planning Examples Using Concept Languages, Applied Intelligence, 20:1, (9-19), Online publication date: 1-Jan-2004.
  760. Burke E, Kendall G and Soubeiga E (2003). A Tabu-Search Hyperheuristic for Timetabling and Rostering, Journal of Heuristics, 9:6, (451-470), Online publication date: 1-Dec-2003.
  761. Abe N, Biermann A and Long P (2003). Reinforcement Learning with Immediate Rewards and Linear Hypotheses, Algorithmica, 37:4, (263-293), Online publication date: 1-Dec-2003.
  762. Yang Q and Cheng H Mining Plans for Customer-Class Transformation Proceedings of the Third IEEE International Conference on Data Mining
  763. Macskassy S and Hirsh H Adding numbers to text classification Proceedings of the twelfth international conference on Information and knowledge management, (240-246)
  764. Barto A and Mahadevan S (2003). Recent Advances in Hierarchical Reinforcement Learning, Discrete Event Dynamic Systems, 13:4, (341-379), Online publication date: 1-Oct-2003.
  765. Secomandi N (2003). Analysis of a Rollout Approach to Sequencing Problems with Stochastic Routing Applications, Journal of Heuristics, 9:4, (321-352), Online publication date: 1-Sep-2003.
  766. Etzioni O, Tuchinda R, Knoblock C and Yates A To buy or not to buy Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, (119-128)
  767. Isukapalli R and Greiner R Use of off-line dynamic programming for efficient image interpretation Proceedings of the 18th international joint conference on Artificial intelligence, (1319-1325)
  768. Guestrin C, Koller D, Gearhart C and Kanodia N Generalizing plans to new environments in relational MDPs Proceedings of the 18th international joint conference on Artificial intelligence, (1003-1010)
  769. Margoliash D (2003). Offline learning and the role of autogenous speech, Speech Communication, 41:1, (165-178), Online publication date: 1-Aug-2003.
  770. Bradley J and Hayes G Introducing an agent of a certain persuasion Proceedings of the second international joint conference on Autonomous agents and multiagent systems, (944-945)
  771. Huang P and Sycara K Multi-agent learning in extensive games with complete information Proceedings of the second international joint conference on Autonomous agents and multiagent systems, (701-708)
  772. Banerjee B and Peng J Adaptive policy gradient in multiagent learning Proceedings of the second international joint conference on Autonomous agents and multiagent systems, (686-692)
  773. Cheng S, Leung E, Lochner K, O'Malley K, Reeves D, Schvartzman L and Wellman M Walverine Proceedings of the second international joint conference on Autonomous agents and multiagent systems, (465-472)
  774. Gérard P and Sigaud O Designing efficient exploration with MACS Proceedings of the 2003 international conference on Genetic and evolutionary computation: Part II, (1882-1893)
  775. Ohigashi Y, Omori T, Morikawa K and Oka N Acceleration of game learning with prediction-based reinforcement learning Proceedings of the 2003 joint international conference on Artificial neural networks and neural information processing, (786-793)
  776. Yang Q and Cheng H Planning for marketing campaigns Proceedings of the Thirteenth International Conference on International Conference on Automated Planning and Scheduling, (174-183)
  777. Bonet B and Geffner H Labeled RTDP Proceedings of the Thirteenth International Conference on International Conference on Automated Planning and Scheduling, (12-21)
  778. Godzik N, Schoenauer M and Sebag M Evolving symbolic controllers Proceedings of the 2003 international conference on Applications of evolutionary computing, (638-650)
  779. Akar N and Sahin C Reinforcement learning as a means of dynamic aggregate QoS provisioning Proceedings of the 2003 international conference on Architectures for quality of service in the internet, (100-114)
  780. Millán J (2003). Adaptive brain interfaces, Communications of the ACM, 46:3, (74-80), Online publication date: 1-Mar-2003.
  781. Lee I, Lau H and Wai L An experimental evaluation of reinforcement learning for gain scheduling Design and application of hybrid intelligent systems, (351-360)
  782. Cao X (2003). From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning, Discrete Event Dynamic Systems, 13:1-2, (9-39), Online publication date: 1-Jan-2003.
  783. Barto A and Mahadevan S (2003). Recent Advances in Hierarchical Reinforcement Learning, Discrete Event Dynamic Systems, 13:1-2, (41-77), Online publication date: 1-Jan-2003.
  784. Mesot B, Sanchez E, Peña C and Perez-Uribe A SOS++ Proceedings of the eighth international conference on Artificial life, (264-273)
  785. Kearns M, Mansour Y and Ng A (2002). A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, Machine Learning, 49:2-3, (193-208), Online publication date: 1-Nov-2002.
  786. Vullo A and Frasconi P A Bi-Recursive Neural Network Architecture for the Prediction of Protein Coarse Contact Maps Proceedings of the IEEE Computer Society Conference on Bioinformatics
  787. Bonet B and Pearl J Qualitative MDPs and POMDPs Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence, (61-68)
  788. Lawson J and Wolpert D The design of collectives of agents to control non-Markovian systems Eighteenth national conference on Artificial intelligence, (332-337)
  789. Perkins T Reinforcement learning for POMDPs based on action values and stochastic optimization Eighteenth national conference on Artificial intelligence, (199-204)
  790. Pednault E, Abe N and Zadrozny B Sequential cost-sensitive decision making with reinforcement learning Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, (259-268)
  791. Blumberg B, Downie M, Ivanov Y, Berlin M, Johnson M and Tomlinson B Integrated learning for interactive synthetic characters Proceedings of the 29th annual conference on Computer graphics and interactive techniques, (417-426)
  792. Tesauro G and Bredin J Strategic sequential bidding in auctions using dynamic programming Proceedings of the first international joint conference on Autonomous agents and multiagent systems: part 2, (591-598)
  793. Tumer K, Agogino A and Wolpert D Learning sequences of actions in collectives of autonomous agents Proceedings of the first international joint conference on Autonomous agents and multiagent systems: part 1, (378-385)
  794. Jokinen K, Kerminen A, Kaipainen M, Jauhiainen T, Wilcock G, Turunen M, Hakulinen J, Kuusisto J and Lagus K Adaptive dialogue systems - interaction with interact Proceedings of the 3rd SIGdial workshop on Discourse and dialogue - Volume 2, (64-73)
  795. Scheffler K and Young S Automatic learning of dialogue strategy using dialogue simulation and reinforcement learning Proceedings of the second international conference on Human Language Technology Research, (12-19)
  796. Franklin J and Manfredi V Nonlinear credit assignment for musical sequences Computational intelligence and applications, (245-250)
  797. Domingos P Machine learning Handbook of data mining and knowledge discovery, (660-670)
  798. Belker T, Beetz M and Cremers A (2002). Learning of plan execution policies for indoor navigation, AI Communications, 15:1, (3-16), Online publication date: 1-Jan-2002.
  799. Smith R, Dike B, Ravichandran B, El-Fallah A and Mehra R Discovering novel fighter combat maneuvers Creative evolutionary systems, (467-486)
  800. Likas A (2001). Reinforcement Learning Using the Stochastic Fuzzy Min–Max Neural Network, Neural Processing Letters, 13:3, (213-220), Online publication date: 9-Jul-2001.
  801. Sun R and Giles C (2001). Sequence Learning, IEEE Intelligent Systems, 16:4, (67-70), Online publication date: 1-Jul-2001.
  802. Minut S and Mahadevan S A reinforcement learning model of selective visual attention Proceedings of the fifth international conference on Autonomous agents, (457-464)
  803. Dooly D, Goldman S and Scott S (2001). On-line analysis of the TCP acknowledgment delay problem, Journal of the ACM, 48:2, (243-273), Online publication date: 1-Mar-2001.
  804. Kocsis L, Uiterwijk J and Herik H Learning Time Allocation Using Neural Networks Revised Papers from the Second International Conference on Computers and Games, (170-185)
  805. Ng A and Jordan M PEGASUS Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence, (406-415)
  806. Wolpert D, Kirshner S, Merz C and Tumer K Adaptivity in agent-based routing for data networks Proceedings of the fourth international conference on Autonomous agents, (396-403)
  807. Avnur R and Hellerstein J (2000). Eddies, ACM SIGMOD Record, 29:2, (261-272), Online publication date: 1-Jun-2000.
  808. Avnur R and Hellerstein J Eddies Proceedings of the 2000 ACM SIGMOD international conference on Management of data, (261-272)
  809. Litman D, Singh S, Kearns M and Walker M NJFun Proceedings of the ANLP-NAACL 2000 Workshop on Conversational Systems, (17-20)
  810. Litman D, Singh S, Kearns M and Walker M NJFun Proceedings of the 2000 ANLP/NAACL Workshop on Conversational systems - Volume 3, (17-20)
  811. Debenham J A multi-agent architecture for process management accommodates unexpected performance Proceedings of the 2000 ACM symposium on Applied computing - Volume 1, (15-19)
  812. Touzet C (2000). Robot Awareness in Cooperative Mobile Robot Learning, Autonomous Robots, 8:1, (87-97), Online publication date: 1-Jan-2000.
  813. Meuleau N, Peshkin L, Kim K and Kaelbling L Learning finite-state controllers for partially observable environments Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence, (427-436)
  814. Meuleau N, Kim K, Kaelbling L and Cassandra A Solving POMDPs by searching the space of finite policies Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence, (417-426)
  815. McAllester D and Singh S Approximate planning for factored POMDPs using belief state simplification Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence, (409-416)
  816. Mansour Y and Singh S On the complexity of policy iteration Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence, (401-408)
  817. Schuurmans D and Greenwald L Efficient exploration for optimizing immediate reward Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence, (385-392)
  818. Balkenius C (1999). Dynamics of a Classical Conditioning Model, Autonomous Robots, 7:1, (41-56), Online publication date: 1-Jul-1999.
  819. Wiering M, Sałustowicz R and Schmidhuber J (1999). Reinforcement Learning Soccer Teams with Incomplete World Models, Autonomous Robots, 7:1, (77-88), Online publication date: 1-Jul-1999.
  820. Bertsekas D and Castanon D (1999). Rollout Algorithms for Stochastic Scheduling Problems, Journal of Heuristics, 5:1, (89-108), Online publication date: 1-Apr-1999.
  821. IEEE Intelligent Systems staff (1999). Room Service, AI-Style, IEEE Intelligent Systems, 14:2, (8-19), Online publication date: 1-Mar-1999.
  822. Jensen R and Veloso M OBDD-based universal planning Artificial intelligence today, (213-248)
  823. Mahadevan S, Theocharous G and Khaleeli N (1998). Rapid Concept Learning for Mobile Robots, Autonomous Robots, 5:3-4, (239-251), Online publication date: 1-Jul-1998.
  824. Mahadevan S, Theocharous G and Khaleeli N (1998). Rapid Concept Learning for Mobile Robots, Machine Learning, 31:1-3, (7-27), Online publication date: 1-Apr-1998.
  825. Ho Y, Kuo P, Wang H and Li T Fuzzy Q-Learning Based Weight-Lifting Autobalancing Control Strategy for Adult-Sized Humanoid Robots 2015 IEEE International Conference on Systems, Man, and Cybernetics, (364-369)
  826. Nichols B Continuous Action-Space Reinforcement Learning Methods Applied to the Minimum-Time Swing-Up of the Acrobot 2015 IEEE International Conference on Systems, Man, and Cybernetics, (2084-2089)
  827. Toubman A, Roessingh J, Spronck P, Plaat A and Herik J Rewarding Air Combat Behavior in Training Simulations 2015 IEEE International Conference on Systems, Man, and Cybernetics, (1397-1402)
  828. Moorthy S and Guan Z FlyTera: Echo State Learning for Joint Access and Flight Control in THz-enabled Drone Networks 2020 17th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), (1-9)
  829. Najar A, Sigaud O and Chetouani M Training a robot with evaluative feedback and unlabeled guidance signals 2016 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), (261-266)
  830. Singh P, Singh V, Dutta S and Kumar S Model & Feature Agnostic Eye-in-Hand Visual Servoing using Deep Reinforcement Learning with Prioritized Experience Replay 2019 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), (1-8)
  831. Kronewitter F, Lee S and Oliphant K A Cognitive ML Agent for Airborne Networking MILCOM 2019 - 2019 IEEE Military Communications Conference (MILCOM), (115-120)
  832. Isele D, Luna J, Eaton E, de la Cruz G, Irwin J, Kallaher B and Taylor M Lifelong learning for disturbance rejection on mobile robots 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), (3993-3998)
  833. Gordillo C, Frank B, Ulbert I, Paul O, Ruther P and Burgard W Automatic channel selection in neural microprobes: A combinatorial multi-armed bandit approach 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), (1844-1850)
  834. Jeong H and Lee D Efficient learning of stand-up motion for humanoid robots with bilateral symmetry 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), (1544-1549)
  835. Wang H, Kaplan Z, Niu D and Li B Optimizing Federated Learning on Non-IID Data with Reinforcement Learning IEEE INFOCOM 2020 - IEEE Conference on Computer Communications, (1698-1707)
  836. Emara S, Li B and Chen Y Eagle: Refining Congestion Control by Learning from the Experts IEEE INFOCOM 2020 - IEEE Conference on Computer Communications, (676-685)
  837. Sunberg Z, Kochenderfer M and Pavone M Optimized and trusted collision avoidance for unmanned aerial vehicles using approximate dynamic programming 2016 IEEE International Conference on Robotics and Automation (ICRA), (1455-1461)
  838. Choi S, Lee K and Oh S Robust learning from demonstration using leveraged Gaussian processes and sparse-constrained optimization 2016 IEEE International Conference on Robotics and Automation (ICRA), (470-475)
  839. Kumar V, Todorov E and Levine S Optimal control with learned local models: Application to dexterous manipulation 2016 IEEE International Conference on Robotics and Automation (ICRA), (378-383)
  840. Blanton R, Li X, Mai K, Marculescu D, Marculescu R, Paramesh J, Schneider J and Thomas D Statistical learning in chip (SLIC) 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), (664-669)
  841. Macua S, Zazo S and Zazo J Learning in constrained stochastic dynamic potential games 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (4568-4572)
  842. Tsurumine Y, Cui Y, Yamazaki K and Matsubara T Generative Adversarial Imitation Learning with Deep P-Network for Robotic Cloth Manipulation 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids), (274-280)
  843. Cui Y, Matsubara T and Sugimoto K Local Update Dynamic Policy Programming in reinforcement learning of pneumatic artificial muscle-driven humanoid hand control 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids), (1083-1089)
  844. Stamatakis G, Pappas N and Traganitis A Controlling Status Updates in a Wireless System with Heterogeneous Traffic and Age of Information Constraints 2019 IEEE Global Communications Conference (GLOBECOM), (1-6)
  845. Dinh T, Alsheikh M, Gong S, Niyato D, Han Z and Liang Y Defend Jamming Attacks: How to Make Enemies Become Friends 2019 IEEE Global Communications Conference (GLOBECOM), (1-6)
  846. Tan J, Zhang L and Liang Y Deep Reinforcement Learning for Channel Selection and Power Control in D2D Networks 2019 IEEE Global Communications Conference (GLOBECOM), (1-6)
  847. Ding Y, Jiang D, Huang J, Xiao L, Liu S, Tang Y and Dai H QoE-Aware Power Control for UAV-Aided Media Transmission with Reinforcement Learning 2019 IEEE Global Communications Conference (GLOBECOM), (1-6)
  848. Liu D, Zhao J and Yang C Energy-Saving Predictive Video Streaming with Deep Reinforcement Learning 2019 IEEE Global Communications Conference (GLOBECOM), (1-6)
  849. Chen X, Zhao Z, Wu C, Chen T, Zhang H and Bennis M Secrecy Preserving in Stochastic Resource Orchestration for Multi-Tenancy Network Slicing 2019 IEEE Global Communications Conference (GLOBECOM), (1-6)
  850. Cui J, Ding Z, Deng Y and Nallanathan A Model-Free Based Automated Trajectory Optimization for UAVs toward Data Transmission 2019 IEEE Global Communications Conference (GLOBECOM), (1-6)
  851. Zhang T, Chiang Y, Borcea C and Ji Y Learning-Based Offloading of Tasks with Diverse Delay Sensitivities for Mobile Edge Computing 2019 IEEE Global Communications Conference (GLOBECOM), (1-6)
  852. Balevi E and Andrews J A Novel Deep Reinforcement Learning Algorithm for Online Antenna Tuning 2019 IEEE Global Communications Conference (GLOBECOM), (1-6)
  853. Hwang K, Chiang H and Jiang W Adaboost-like method for inverse reinforcement learning 2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), (1922-1925)
  854. Al-Talabi A and Schwartz H Kalman fuzzy actor-critic learning automaton algorithm for the pursuit-evasion differential game 2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), (1015-1022)
  855. Analikwu C and Schwartz H Reinforcement learning in the guarding a territory game 2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), (1007-1014)
  856. Camci E and Kayacan E Game of drones: UAV pursuit-evasion game with type-2 fuzzy logic controllers tuned by reinforcement learning 2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), (618-625)
  857. Jungmann A and Kleinjohann B A holistic and adaptive approach for automated prototyping of image processing functionality 2016 IEEE 21st International Conference on Emerging Technologies and Factory Automation (ETFA), (1-8)
  858. Laskey M, Lee J, Chuck C, Gealy D, Hsieh W, Pokorny F, Dragan A and Goldberg K Robot grasping in clutter: Using a hierarchy of supervisors for learning from demonstrations 2016 IEEE International Conference on Automation Science and Engineering (CASE), (827-834)
  859. Liu Q and Hui Q A hybrid ACO algorithm based on Bayesian factorizations and reinforcement learning for continuous optimization 2016 IEEE Congress on Evolutionary Computation (CEC), (4236-4243)
  860. Cheng X, Chen G and Zhang M An XCS-based algorithm for multi-objective reinforcement learning 2016 IEEE Congress on Evolutionary Computation (CEC), (4007-4014)
  861. Kizhakkemadam S, Porwal V, Mantripragada S, Udupi N and Chintapenta B Hybrid scheduling of component carriers for Small Cells in unlicensed spectrum 2016 13th IEEE Annual Consumer Communications & Networking Conference (CCNC), (164-170)
  862. Edwards A, Hebert J and Pilarski P Machine learning and unlearning to autonomously switch between the functions of a myoelectric arm 2016 6th IEEE International Conference on Biomedical Robotics and Biomechatronics (BioRob), (514-521)
  863. Ansari Y, Falotico E, Mollard Y, Busch B, Cianchetti M and Laschi C A Multiagent Reinforcement Learning approach for inverse kinematics of high dimensional manipulators with precision positioning 2016 6th IEEE International Conference on Biomedical Robotics and Biomechatronics (BioRob), (457-463)
Contributors
  • DeepMind Technologies Limited
