Publication Library
Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions
Description: https://medagentsim.netlify.app/ In this work, we introduce MedAgentSim, an open-source simulated clinical environment with doctor, patient, and measurement agents designed to evaluate and enhance LLM performance in dynamic diagnostic settings. Unlike prior approaches, our framework requires doctor agents to actively engage with patients through multi-turn conversations, requesting relevant medical examinations (e.g., temperature, blood pressure, ECG) and imaging results (e.g., MRI, X-ray) from a measurement agent to mimic the real-world diagnostic process. Additionally, we incorporate self-improvement mechanisms that allow models to iteratively refine their diagnostic strategies. We enhance LLM performance in our simulated setting by integrating multi-agent discussions, chain-of-thought reasoning, and experience-based knowledge retrieval, facilitating progressive learning as doctor agents interact with more patients. We also introduce an evaluation benchmark for assessing the LLM's ability to engage in dynamic, context-aware diagnostic interactions. While MedAgentSim is fully automated, it also supports a user-controlled mode, enabling human interaction with either the doctor or patient agent. Comprehensive evaluations in various simulated diagnostic scenarios demonstrate the effectiveness of our approach. Our code, simulation tool, and benchmark are available at https://medagentsim.netlify.app/
Created At: 07 April 2025
Updated At: 07 April 2025
LLM Post-Training - A Deep Dive into Reasoning Large Language Models
Description: https://github.com/mbzuai-oryx/Awesome-LLM-Post-training Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications. Pretraining on vast web-scale data has laid the foundation for these models, yet the research community is now increasingly shifting focus toward post-training techniques to achieve further breakthroughs. While pretraining provides a broad linguistic foundation, post-training methods enable LLMs to refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations. Fine-tuning, reinforcement learning, and test-time scaling have emerged as critical strategies for optimizing LLM performance, ensuring robustness, and improving adaptability across various real-world tasks. This survey provides a systematic exploration of post-training methodologies, analyzing their role in refining LLMs beyond pretraining and addressing key challenges such as catastrophic forgetting, reward hacking, and inference-time trade-offs. We highlight emerging directions in model alignment, scalable adaptation, and inference-time reasoning, and outline future research directions. We also provide a public repository to continually track developments in this fast-evolving field: https://github.com/mbzuai-oryx/Awesome-LLM-Post-training
Created At: 07 April 2025
Updated At: 07 April 2025
Improving Sense-Making with Artificial Intelligence
Description: The report identifies 20 challenges associated with scaling sense-making processes in five key areas: collection orchestration; data access and sharing; data fusion and analysis; model management; and skills and training. The report suggests that AI capabilities, such as natural language processing, computer vision, planning systems, prediction/classification, and expert systems, can be combined to address these challenges.
Created At: 05 April 2025
Updated At: 05 April 2025
State-of-play and future trends on the development of oversight frameworks for emerging technologies - Part 1
Description: As technologies become more pervasive and form a critical aspect of our societal infrastructure, governance and wider oversight mechanisms have a key role to play in ensuring that benefits from technology are maximised and risks are managed proactively. The goal of technology oversight is to ensure that technology is developed, deployed and used in a responsible and ethical manner, and that it does not pose undue risks or harm to individuals or society as a whole. Wellcome commissioned RAND Europe to undertake a study on the state-of-play and future trends on the development of oversight frameworks for emerging technologies. The specific objective of the study is to identify and analyse a suite of oversight frameworks and mechanisms (including associated emerging trends and novel approaches) that are in use, in development or under debate in different jurisdictions across the globe for a set of emerging technologies. The technologies of interest include genomics (specifically engineering biology), human embryology, organoids, neurotechnology, artificial intelligence (AI) (specifically its application and use as a research tool) and data platforms. The study findings are presented in two related documents: the global technology landscape review report and the technology oversight report (this report). The two reports should be read alongside each other. This report examines notable oversight mechanisms that are either established or under development across a selection of global jurisdictions, offering key learning and insights that could inform future technology oversight discussions.
Created At: 05 April 2025
Updated At: 05 April 2025
State-of-play and future trends on the development of oversight frameworks for emerging technologies - Part 2
Description: Part 2 highlights the challenges and opportunities in regulating these technologies, emphasising the need for updated frameworks that address ethical, privacy, and collaboration issues. In Part 2, we use a mixed-methods approach, including desk research, interviews, SWOT analysis and expert elicitation, to examine existing and developing oversight mechanisms. We provide insights into legislative and non-regulatory standards, ethical guidelines and self-regulatory frameworks relevant to key debates on the oversight of emerging technologies, including the lack of specific regulatory frameworks for organoids, ethical challenges in human embryology, fragmented oversight in engineering biology, and privacy concerns in neurotechnology. The study also discusses the potential for dual-use scenarios in neurotechnology and the need for international collaboration in managing biosecurity threats in engineering biology.
Created At: 05 April 2025
Updated At: 05 April 2025