Publication Library
LLMs-as-Judges - A Comprehensive Survey on LLM-based Evaluation Methods
Description: The rapid advancement of Large Language Models (LLMs) has driven their expanding application across various fields. One of the most promising applications is their role as evaluators based on natural language responses, referred to as "LLMs-as-judges". This framework has attracted growing attention from both academia and industry due to its effectiveness, its ability to generalize across tasks, and its interpretability in the form of natural language. This paper presents a comprehensive survey of the LLMs-as-judges paradigm from five key perspectives: Functionality, Methodology, Applications, Meta-evaluation, and Limitations. We begin by providing a systematic definition of LLMs-as-judges and introduce their functionality (Why use LLM judges?). We then address methodology for constructing an evaluation system with LLMs (How to use LLM judges?). Additionally, we investigate the potential domains for their application (Where to use LLM judges?) and discuss methods for evaluating them in various contexts (How to evaluate LLM judges?). Finally, we provide a detailed analysis of the limitations of LLM judges and discuss potential future directions. Through a structured and comprehensive analysis, we aim to provide insights on the development and application of LLMs-as-judges in both research and practice. We will continue to maintain the relevant resource list at https://github.com/CSHaitao/Awesome-LLMs-as-Judges
Created At: 30 January 2025
Updated At: 30 January 2025
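The pointwise judging pattern this survey catalogs can be sketched in a few lines: compose a rubric prompt for the judge model, then parse a score out of its natural-language reply. The prompt wording, scale, and the `build_judge_prompt`/`parse_score` helpers below are illustrative assumptions, not taken from the paper (which also covers pairwise and listwise designs):

```python
import re

def build_judge_prompt(question: str, answer: str) -> str:
    """Compose a pointwise evaluation prompt for an LLM judge.
    The rubric and output format here are illustrative only."""
    return (
        "You are an impartial judge. Rate the answer to the question "
        "on a 1-10 scale for correctness and helpfulness, then briefly "
        "explain your reasoning.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply in the form 'Score: <n>' followed by your explanation."
    )

def parse_score(judge_reply):
    """Extract the numeric score from the judge's reply, or None."""
    match = re.search(r"Score:\s*(\d+)", judge_reply)
    return int(match.group(1)) if match else None
```

The free-text explanation that follows the score is what gives this paradigm its interpretability; the parsed integer is what feeds into aggregate benchmarks.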
Trends and Reversion in Financial Markets on Time Scales from Minutes to Decades
Description: We empirically analyze the reversion of financial market trends with time horizons ranging from minutes to decades. The analysis covers equities, interest rates, currencies and commodities and combines 14 years of futures tick data, 30 years of daily futures prices, 330 years of monthly asset prices, and yearly financial data since medieval times. Across asset classes, we find that markets are in a trending regime on time scales that range from a few hours to a few years, while they are in a reversion regime on shorter and longer time scales. In the trending regime, weak trends tend to persist, which can be explained by herding behavior of investors. However, in this regime trends tend to revert before they become strong enough to be statistically significant, which can be interpreted as a return of asset prices to their intrinsic value. In the reversion regime, we find the opposite pattern: weak trends tend to revert, while those trends that become statistically significant tend to persist. Our results provide a set of empirical tests of theoretical models of financial markets. We interpret them in the light of a recently proposed lattice gas model, where the lattice represents the social network of traders, the gas molecules represent the shares of financial assets, and efficient markets correspond to the critical point. If this model is accurate, the lattice gas must be near this critical point on time scales from 1 hour to a few days, with a correlation time of a few years.
Created At: 29 January 2025
Updated At: 29 January 2025
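The trending-versus-reversion distinction above can be probed with a toy statistic: the fraction of times the next price move has the same sign as the trend over the preceding window. This is a crude illustration under assumed synthetic data, not the paper's estimator; on a pure random walk the fraction should sit near 0.5, while trending (persistence) or reverting regimes would push it above or below:

```python
import random

def trend_persistence(prices, window):
    """Fraction of steps where the next move continues the sign of the
    trailing `window`-step trend (a crude persistence probe)."""
    same = total = 0
    for t in range(window, len(prices) - 1):
        trend = prices[t] - prices[t - window]
        step = prices[t + 1] - prices[t]
        if trend != 0 and step != 0:
            total += 1
            same += (trend > 0) == (step > 0)
    return same / total if total else float("nan")

# Synthetic random walk as a neutral baseline (no trend, no reversion).
random.seed(0)
walk = [0.0]
for _ in range(5000):
    walk.append(walk[-1] + random.gauss(0, 1))
```

Sweeping `window` over many scales on real futures data is, in spirit, how one maps out where the trending regime ends and the reversion regime begins.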
DeepSeek-R1 - Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Description: We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally develops numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
Created At: 29 January 2025
Updated At: 29 January 2025
DeepSeek-V3 Technical Report
Description: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V3
Created At: 29 January 2025
Updated At: 29 January 2025
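The sparse activation described above (37B of 671B parameters per token) comes from routing each token to only a few experts. The snippet below is a generic top-k gating sketch, not DeepSeek-V3's actual router; in particular, the paper's auxiliary-loss-free balancing adds a learned per-expert bias to the routing scores, which is omitted here:

```python
import math
import random

def top_k_gate(logits, k=2):
    """Select the k highest-scoring experts and softmax-normalize their
    weights; returns a list of (expert_index, weight) pairs."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Toy router scores for one token over 8 experts.
random.seed(1)
scores = [random.gauss(0, 1) for _ in range(8)]
routes = top_k_gate(scores, k=2)
```

Only the experts named in `routes` run their feed-forward pass for this token, which is why total parameter count and per-token compute can differ by an order of magnitude.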
Introduction to IoT
Description: The Internet of Things has rapidly transformed the 21st century, enhancing decision-making processes and introducing innovative consumer services such as pay-as-you-use models. The integration of smart devices and automation technologies has revolutionized every aspect of our lives, from health services to the manufacturing industry, and from the agriculture sector to mining. Alongside the positive aspects, it is also essential to recognize the significant safety, security, and trust concerns in this technological landscape. This chapter serves as a comprehensive guide for newcomers interested in the IoT domain, providing a foundation for making future contributions. Specifically, it discusses the overview, historical evolution, key characteristics, advantages, architectures, taxonomy of technologies, and existing applications in major IoT domains. In addressing prevalent issues and challenges in designing and deploying IoT applications, the chapter examines security threats across architectural layers, ethical considerations, user privacy concerns, and trust-related issues. This discussion equips researchers with a solid understanding of diverse IoT aspects, providing a comprehensive understanding of IoT technology along with insights into the extensive potential and impact of this transformative field.
Created At: 29 January 2025
Updated At: 29 January 2025