Publication Library

Publication Library

Probabilistic weather forecasting with machine learning

Description: Weather forecasts are fundamentally uncertain, so predicting the range of probable weather scenarios is crucial for important decisions, from warning the public about hazardous weather to planning renewable energy use. Traditionally, weather forecasts have been based on numerical weather prediction (NWP)1, which relies on physics- based simulations of the atmosphere. Recent advances in machine learning (ML)- based weather prediction (MLWP) have produced ML-based models with less forecast error than single NWP simulations2,3. However, these advances have focused primarily on single, deterministic forecasts that fail to represent uncertainty and estimate risk. Overall, MLWP has remained less accurate and reliable than state-of-the-art NWP ensemble forecasts. Here we introduce GenCast, a probabilistic weather model with greater skill and speed than the top operational medium-range weather forecast in the world, ENS, the ensemble forecast of the European Centre for Medium-Range Weather Forecasts4. GenCast is an ML weather prediction method, trained on decades of reanalysis data. GenCast generates an ensemble of stochastic 15-day global forecasts, at 12-h steps and 0.25° latitude–longitude resolution, for more than 80 surface and atmospheric variables, in 8 min. It has greater skill than ENS on 97.2% of 1,320 targets we evaluated and better predicts extreme weather, tropical cyclone tracks and wind power production. This work helps open the next chapter in operational weather forecasting, in which crucial weather-dependent decisions are made more accurately and efficiently.

Created At: 30 January 2025

Updated At: 30 January 2025

Challenges in Human-Agent Communication

Description: Remarkable advancements in modern generative foundation models have enabled the development of sophisticated and highly capable autonomous agents that can observe their environment, invoke tools, and communicate with other agents to solve problems. Although such agents can communicate with users through natural language, their complexity and wide-ranging failure modes present novel challenges for human-AI interaction. Building on prior research and informed by a communication grounding perspective, we contribute to the study of human-agent communication by identifying and analyzing twelve key communication challenges that these systems pose. These include challenges in conveying information from the agent to the user, challenges in enabling the user to convey information to the agent, and overarching challenges that need to be considered across all human-agent communication. We illustrate each challenge through concrete examples and identify open directions of research. Our findings provide insights into critical gaps in human-agent communication research and serve as an urgent call for new design patterns, principles, and guidelines to support transparency and control in these systems.

Created At: 30 January 2025

Updated At: 30 January 2025

Agent Workflow Memory

Description: Despite the potential of language model-based agents to solve real-world tasks such as web navigation, current methods still struggle with long-horizon tasks with complex action trajectories. In contrast, humans can flexibly solve complex tasks by learning reusable task workflows from past experiences and using them to guide future actions. To build agents that can similarly benefit from this process, we introduce Agent Workflow Memory (AWM), a method for inducing commonly reused routines, i.e., workflows, and selectively providing workflows to the agent to guide subsequent generations. AWM flexibly applies to both offline and online scenarios, where agents induce workflows from training examples beforehand or from test queries on the fly. We experiment on two major web navigation benchmarks -- Mind2Web and WebArena -- that collectively cover 1000+ tasks from 200+ domains across travel, shopping, and social media, among others. AWM substantially improves the baseline results by 24.6% and 51.1% relative success rate on Mind2Web and WebArena while reducing the number of steps taken to solve WebArena tasks successfully. Furthermore, online AWM robustly generalizes in cross-task, website, and domain evaluations, surpassing baselines from 8.9 to 14.0 absolute points as train-test task distribution gaps widen.

Created At: 30 January 2025

Updated At: 30 January 2025

Exploring the Role of Social Support when Integrating Generative AI into Small Business Workflows

Description: Small business owners stand to benefit from generative AI technologies due to limited resources, yet they must navigate increasing legal and ethical risks. In this paper, we interview 11 entrepreneurs and support personnel to investigate existing practices of how entrepreneurs integrate generative AI technologies into their business workflows. Specifically, we build on scholarship in HCI which emphasizes the role of small, offline networks in supporting entrepreneurs' technology maintenance. We detail how entrepreneurs resourcefully leveraged their local networks to discover new use cases of generative AI (e.g., by sharing accounts), assuage heightened techno-anxieties (e.g., by recruiting trusted confidants), overcome barriers to sustained use (e.g., by receiving wrap-around support), and establish boundaries of use. Further, we suggest how generative AI platforms may be redesigned to better support entrepreneurs, such as by taking into account the benefits and tensions of use in a social context.

Created At: 30 January 2025

Updated At: 30 January 2025

CowPilot - A Framework for Autonomous and Human-Agent Collaborative Web Navigation

Description: While much work on web agents emphasizes the promise of autonomously performing tasks on behalf of users, in reality, agents often fall short on complex tasks in real-world contexts and modeling user preference. This presents an opportunity for humans to collaborate with the agent and leverage the agent's capabilities effectively. We propose CowPilot, a framework supporting autonomous as well as human-agent collaborative web navigation, and evaluation across task success and task efficiency. CowPilot reduces the number of steps humans need to perform by allowing agents to propose next steps, while users are able to pause, reject, or take alternative actions. During execution, users can interleave their actions with the agent by overriding suggestions or resuming agent control when needed. We conducted case studies on five common websites and found that the human-agent collaborative mode achieves the highest success rate of 95% while requiring humans to perform only 15.2% of the total steps. Even with human interventions during task execution, the agent successfully drives up to half of task success on its own. CowPilot can serve as a useful tool for data collection and agent evaluation across websites, which we believe will enable research in how users and agents can work together. Video demonstrations are available at this https URL: https://oaishi.github.io/cowpilot.html

Created At: 30 January 2025

Updated At: 30 January 2025

First 18 19 20 21 22 23 24 Last