[
    {
        "idx": 0,
        "base_model":"PPL Chunking",
        "language":"zh",
        "ppl_threshold":0,
        "chunk_length":150,
        "original_text": "2023-08-01 06:50，正文：“65元一小时，包妆造，探店费用和路费客户报销。”近日，陪拍在各大网络社交平台掀起热潮。平台化经营陪拍的公司应运而生，但入驻平台的审核环节形同虚设，暗藏风险。记者分别以陪拍者和用户两种身份注册，以用户预约时竟匹配到了自己注册的陪拍账号，不过标价竟比记者注册时的报价翻了一倍。有人有游玩拍照的需求，有人有陪同拍照的时间、设备、技术、经验，双方一“拍”即合，一次陪拍服务的交易就达成了。当前，多数陪拍发生于个人之间，这种服务与陪逛街、陪诊等性质一样，都是“陪伴经济”的一部分，满足了一些人的需求，也活跃了市场，增加了灵活就业机会。当然，陪拍服务不都是“美颜”和“笑脸”，有时也会产生纠纷。比如，有的服务提供方为了吸引客户，会对拍摄设备、技术进行虚假、夸大宣传，客户发现其与宣传不符，进而产生权益纠纷；有的双方约定比较粗糙，容易发生“放鸽子”“不守时”等问题。更令人担心的是，陪拍服务多发生在陌生人之间，其中的安全隐患值得警惕。当然，面对这些问题和风险，没有必要因噎废食。对陪拍业态还是应该秉持包容的态度，采取建设性措施予以规范引导。比如，社交平台应加强对陪拍服务者和约拍者的身份、职业等信息的审核，可要求双方对相关的诚信、安全等义务作出承诺，对相关信息进行备案。监管部门应对居间平台或商家在陪拍服务营销过程中的虚假、夸大宣传行为加大治理力度，敦促居间平台或商家向客户提供全面、真实、准确的陪拍服务信息。约拍者与陪拍服务者在沟通预约过程中也有必要全面详细地约定具体事项，形成更完善的陪拍服务契约，对双方形成更明确有力的约束，从源头预防、化解服务纠纷。随着其发展，监管部门、行业协会等还可考虑设计相关合同示范文本，供双方参考。时下，“陪伴经济”给不少人带来了便利和美好，对其中的一些风险和问题，有关方面应认真研究，做好指引，对其后续的发展趋势等也要持续关注。"
    },
    {
        "idx": 1,
        "base_model":"PPL Chunking",
        "language":"zh",
        "ppl_threshold":0,
        "chunk_length":150,
        "original_text": "2023-08-01 10:47，正文：通气会现场 来源：湖南高院7月31日，湖南高院联合省司法厅召开新闻通气会。湖南高院副院长杨翔，省委依法治省办成员、省司法厅党组成员、副厅长杨龙金通报2022年湖南省行政机关负责人出庭应诉有关情况，并发布5个典型案例。2022年，全省经人民法院通知出庭的行政机关负责人出庭应诉率提升至96.5%。杨翔介绍，从出庭应诉数量看，负责人出庭应诉意识普遍提升。2022年，全省法院共发出行政机关负责人出庭应诉通知书4228份，行政机关负责人到庭应诉4018件。行政机关负责人参加调查询问1117件，参与案件协调化解741件。与2021年相比，行政机关负责人到庭应诉和参加调查询问等案件增加2802件。从地区分布情况来看，全省各地经人民法院通知的行政机关负责人出庭应诉率均达到90%以上，较往年有明显提升。2022年，从行政管理领域看，全省法院制发负责人出庭应诉通知书的案件所涉行政管理领域较为集中，自然资源、社会保障、公安、市场监管等部门负责人出庭应诉的案件数量较多。从涉案行政行为看，被诉行为类型相对集中。排名前五的行政行为类型依次为行政征收或征用类案件、行政确认类案件、不履行法定职责类案件、行政处罚类案件及行政登记类案件。从出庭应诉负责人层级比例看，基层行政机关负责人出庭应诉占比较高。县市区及乡镇负责人出庭应诉数量占全部出庭应诉案件数的80.8%。杨龙金介绍，为进一步加强和完善负责人出庭应诉制度建设，省委依法治省办、省法院、省司法厅联合印发《关于进一步推进行政机关负责人出庭应诉的工作方案》（以下简称《工作方案》），推动省政府出台《湖南省行政应诉工作规定》并召开全省行政应诉工作会议，依托府院联动，推动行政机关负责人出庭应诉工作有序开展。湖南高院、省司法厅根据最高人民法院相关司法解释，在《工作方案》中统一了行政机关负责人出庭应诉的认定标准和计算方式，实现了全省负责人出庭应诉工作的标准化和规范化。同时，推动将行政机关负责人出庭应诉情况纳入省绩效考核、平安建设、市域社会治理等考核指标体系，进一步压实出庭应诉主体责任。《工作方案》还明确将行政机关负责人参与调解和解并实质化解争议的案件视为已履行出庭应诉义务，既提高了负责人出庭应诉的积极性，也有力维护了当事人合法权益，促进经济社会和谐稳定。"
    },
    {
        "idx": 2,
        "base_model":"PPL Chunking",
        "language":"en",
        "ppl_threshold":0,
        "chunk_length":150,
        "original_text":"Following the emergence of ChatGPT (OpenAI, 2022), enthusiasm for large language models (LLMs) has escalated globally. The release of the Llama series (Touvron et al., 2023) has further ignited interest within the open-source community, particularly regarding GPT-level local LLMs. Recently, Claude-3 Opus (Anthropic, 2024) and GPT-4o (omni) (OpenAI, 2024), the updated model for ChatGPT, have ascended to the pinnacle of the Chatbot Arena (Chiang et al., 2024) in quick succession. This platform is well-regarded for its human evaluations of LLMs. Moreover, Llama-3 (AI@Meta, 2024) has emerged as the state-of-the-art open-weight model series, narrowing the performance gap with leading proprietary models and widely acknowledged as GPT-4–level. An increasing number of competitive LLMs are now pursuing advancements similar to those made by the GPT series from OpenAI. Many of these models, including Qwen (Bai et al., 2023a), Mistral (Jiang et al., 2023a), Gemma (Mesnard et al., 2024), etc., have been released in an open-weight manner. Over recent months, we have successively introduced the Qwen series (Bai et al., 2023a) and progressed to Qwen1.5 (Qwen Team, 2024a). In the meantime, we have unveiled the vision-language model Qwen-VL (Bai et al., 2023b), and launched the audio-language model Qwen-Audio (Chu et al., 2023). In this work, we introduce the newest addition to the Qwen family of large language models and large multimodal models: Qwen2. Qwen2 is a series of LLMs, grounded in the Transformer architecture (Vaswani et al., 2017), trained using next-token prediction. The model series encompasses foundational, i.e., base language models, pre-trained but unaligned to human preferences, and instruction-tuned models, fine-tuned with single-turn and multi-turn instruction following datasets suitable for chat and agent purposes.\nOur release comprises four dense models with parameter counts of 0.5 billion, 1.5 billion, 7 billion, and 72 billion, plus a Mixture-of-Experts (MoE) model with 57 billion parameters, of which 14 billion are activated for each token. The smaller models, specifically Qwen2-0.5B and Qwen2-1.5B, are designed for easy deployment on portable devices such as smartphones, earphones, and smart glasses. Conversely, the larger models cater to deployment across GPUs of varying scales. All models were pre-trained on a high-quality, large-scale dataset comprising over 7 trillion tokens, covering a wide range of domains and languages. Compared to previous editions of Qwen, Qwen2 includes a broader spectrum of linguistic data, enhancing the quantity and quality of code and mathematics content. This enrichment is hypothesized to improve reasoning abilities of LLMs. Regarding post-training, all models underwent supervised fine-tuning and direct preference optimization (DPO, Rafailov et al., 2023), aligning them with human preferences through learning from human feedback. This process endows the models with the capability to follow instructions effectively."
    }
    
]