- en: <!--yml
  id: totrans-0
  prefs: []
  type: TYPE_NORMAL
  zh: <!--yml
- en: 'category: Uncategorized'
  id: totrans-1
  prefs: []
  type: TYPE_NORMAL
  zh: 分类：未分类
- en: 'date: 2025-12-20 20:24:55'
  id: totrans-2
  prefs: []
  type: TYPE_NORMAL
  zh: 日期：2025-12-20 20:24:55
- en: -->
  id: totrans-3
  prefs: []
  type: TYPE_NORMAL
  zh: -->
- en: The Linux Kernel Module Programming Guide
  id: totrans-4
  prefs:
  - PREF_H1
  type: TYPE_NORMAL
  zh: Linux 内核模块编程指南
- en: 'Source: [https://sysprog21.github.io/lkmpg/](https://sysprog21.github.io/lkmpg/)'
  id: totrans-5
  prefs:
  - PREF_BQ
  type: TYPE_NORMAL
  zh: 来源：[https://sysprog21.github.io/lkmpg/](https://sysprog21.github.io/lkmpg/)
- en: Peter Jay Salzman, Michael Burian, Ori Pomerantz, Bob Mottram, Jim Huang
  id: totrans-6
  prefs: []
  type: TYPE_NORMAL
  zh: Peter Jay Salzman, Michael Burian, Ori Pomerantz, Bob Mottram, Jim Huang
- en: September 28, 2025
  id: totrans-7
  prefs: []
  type: TYPE_NORMAL
  zh: 2025年9月28日
- en: '![PIC](img/78a1165dae09cd77d532c4c0e3be17a8.png)'
  id: totrans-8
  prefs: []
  type: TYPE_IMG
  zh: '![PIC](img/78a1165dae09cd77d532c4c0e3be17a8.png)'
- en: 1 [Introduction](#introduction)
  id: totrans-9
  prefs: []
  type: TYPE_NORMAL
  zh: 1 [简介](#introduction)
- en: 1.1 [Authorship](#authorship)
  id: totrans-10
  prefs: []
  type: TYPE_NORMAL
  zh: 1.1 [作者](#authorship)
- en: 1.2 [Acknowledgements](#acknowledgements)
  id: totrans-11
  prefs: []
  type: TYPE_NORMAL
  zh: 1.2 [致谢](#acknowledgements)
- en: 1.3 [What Is A Kernel Module?](#what-is-a-kernel-module)
  id: totrans-12
  prefs: []
  type: TYPE_NORMAL
  zh: 1.3 [什么是内核模块？](#what-is-a-kernel-module)
- en: 1.4 [Kernel module package](#kernel-module-package)
  id: totrans-13
  prefs: []
  type: TYPE_NORMAL
  zh: 1.4 [内核模块包](#kernel-module-package)
- en: 1.5 [What Modules are in my Kernel?](#what-modules-are-in-my-kernel)
  id: totrans-14
  prefs: []
  type: TYPE_NORMAL
  zh: 1.5 [我的内核中有什么模块？](#what-modules-are-in-my-kernel)
- en: 1.6 [Is there a need to download and compile the kernel?](#is-there-a-need-to-download-and-compile-the-kernel)
  id: totrans-15
  prefs: []
  type: TYPE_NORMAL
  zh: 1.6 [是否需要下载和编译内核？](#is-there-a-need-to-download-and-compile-the-kernel)
- en: 1.7 [Before We Begin](#before-we-begin)
  id: totrans-16
  prefs: []
  type: TYPE_NORMAL
  zh: 1.7 [开始之前](#before-we-begin)
- en: 2 [Headers](#headers)
  id: totrans-17
  prefs: []
  type: TYPE_NORMAL
  zh: 2 [头文件](#headers)
- en: 3 [Examples](#examples)
  id: totrans-18
  prefs: []
  type: TYPE_NORMAL
  zh: 3 [示例](#examples)
- en: 4 [Hello World](#hello-world)
  id: totrans-19
  prefs: []
  type: TYPE_NORMAL
  zh: 4 [Hello World](#hello-world)
- en: 4.1 [The Simplest Module](#the-simplest-module)
  id: totrans-20
  prefs: []
  type: TYPE_NORMAL
  zh: 4.1 [最简单的模块](#the-simplest-module)
- en: 4.2 [Hello and Goodbye](#hello-and-goodbye)
  id: totrans-21
  prefs: []
  type: TYPE_NORMAL
  zh: 4.2 [你好和再见](#hello-and-goodbye)
- en: 4.3 [The __init and __exit Macros](#the-init-and-exit-macros)
  id: totrans-22
  prefs: []
  type: TYPE_NORMAL
  zh: 4.3 [__init和__exit宏](#the-init-and-exit-macros)
- en: 4.4 [Licensing and Module Documentation](#licensing-and-module-documentation)
  id: totrans-23
  prefs: []
  type: TYPE_NORMAL
  zh: 4.4 [许可和模块文档](#licensing-and-module-documentation)
- en: 4.5 [Passing Command Line Arguments to a Module](#passing-command-line-arguments-to-a-module)
  id: totrans-24
  prefs: []
  type: TYPE_NORMAL
  zh: 4.5 [向模块传递命令行参数](#passing-command-line-arguments-to-a-module)
- en: 4.6 [Modules Spanning Multiple Files](#modules-spanning-multiple-files)
  id: totrans-25
  prefs: []
  type: TYPE_NORMAL
  zh: 4.6 [跨多个文件的模块](#modules-spanning-multiple-files)
- en: 4.7 [Building modules for a precompiled kernel](#building-modules-for-a-precompiled-kernel)
  id: totrans-26
  prefs: []
  type: TYPE_NORMAL
  zh: 4.7 [为预编译内核构建模块](#building-modules-for-a-precompiled-kernel)
- en: 5 [Preliminaries](#preliminaries)
  id: totrans-27
  prefs: []
  type: TYPE_NORMAL
  zh: 5 [预备知识](#preliminaries)
- en: 5.1 [How modules begin and end](#how-modules-begin-and-end)
  id: totrans-28
  prefs: []
  type: TYPE_NORMAL
  zh: 5.1 [模块的开始和结束](#how-modules-begin-and-end)
- en: 5.2 [Functions available to modules](#functions-available-to-modules)
  id: totrans-29
  prefs: []
  type: TYPE_NORMAL
  zh: 5.2 [模块可用的函数](#functions-available-to-modules)
- en: 5.3 [User Space vs Kernel Space](#user-space-vs-kernel-space)
  id: totrans-30
  prefs: []
  type: TYPE_NORMAL
  zh: 5.3 [用户空间与内核空间](#user-space-vs-kernel-space)
- en: 5.4 [Name Space](#name-space)
  id: totrans-31
  prefs: []
  type: TYPE_NORMAL
  zh: 5.4 [命名空间](#name-space)
- en: 5.5 [Code space](#code-space)
  id: totrans-32
  prefs: []
  type: TYPE_NORMAL
  zh: 5.5 [代码空间](#code-space)
- en: 5.6 [Device Drivers](#device-drivers)
  id: totrans-33
  prefs: []
  type: TYPE_NORMAL
  zh: 5.6 [设备驱动程序](#device-drivers)
- en: 6 [Character Device drivers](#character-device-drivers)
  id: totrans-34
  prefs: []
  type: TYPE_NORMAL
  zh: 6 [字符设备驱动程序](#character-device-drivers)
- en: 6.1 [The file_operations Structure](#the-fileoperations-structure)
  id: totrans-35
  prefs: []
  type: TYPE_NORMAL
  zh: 6.1 [file_operations结构](#the-fileoperations-structure)
- en: 6.2 [The file structure](#the-file-structure)
  id: totrans-36
  prefs: []
  type: TYPE_NORMAL
  zh: 6.2 [文件结构](#the-file-structure)
- en: 6.3 [Registering A Device](#registering-a-device)
  id: totrans-37
  prefs: []
  type: TYPE_NORMAL
  zh: 6.3 [注册设备](#registering-a-device)
- en: 6.4 [Unregistering A Device](#unregistering-a-device)
  id: totrans-38
  prefs: []
  type: TYPE_NORMAL
  zh: 6.4 [注销设备](#unregistering-a-device)
- en: 6.5 [chardev.c](#chardevc)
  id: totrans-39
  prefs: []
  type: TYPE_NORMAL
  zh: 6.5 [chardev.c](#chardevc)
- en: 6.6 [Writing Modules for Multiple Kernel Versions](#writing-modules-for-multiple-kernel-versions)
  id: totrans-40
  prefs: []
  type: TYPE_NORMAL
  zh: 6.6 [为多个内核版本编写模块](#writing-modules-for-multiple-kernel-versions)
- en: 7 [The /proc Filesystem](#the-proc-filesystem)
  id: totrans-41
  prefs: []
  type: TYPE_NORMAL
  zh: 7 [/proc 文件系统](#the-proc-filesystem)
- en: 7.1 [The proc_ops Structure](#the-procops-structure)
  id: totrans-42
  prefs: []
  type: TYPE_NORMAL
  zh: 7.1 [proc_ops 结构](#the-procops-structure)
- en: 7.2 [Read and Write a /proc File](#read-and-write-a-proc-file)
  id: totrans-43
  prefs: []
  type: TYPE_NORMAL
  zh: 7.2 [读取和写入/proc文件](#read-and-write-a-proc-file)
- en: 7.3 [Manage /proc file with standard filesystem](#manage-proc-file-with-standard-filesystem)
  id: totrans-44
  prefs: []
  type: TYPE_NORMAL
  zh: 7.3 [使用标准文件系统管理/proc文件](#manage-proc-file-with-standard-filesystem)
- en: 7.4 [Manage /proc file with seq_file](#manage-proc-file-with-seqfile)
  id: totrans-45
  prefs: []
  type: TYPE_NORMAL
  zh: 7.4 [使用seq_file管理/proc文件](#manage-proc-file-with-seqfile)
- en: '8 [sysfs: Interacting with your module](#sysfs-interacting-with-your-module)'
  id: totrans-46
  prefs: []
  type: TYPE_NORMAL
  zh: 8 [sysfs：与你的模块交互](#sysfs-interacting-with-your-module)
- en: 9 [Talking To Device Files](#talking-to-device-files)
  id: totrans-47
  prefs: []
  type: TYPE_NORMAL
  zh: 9 [与设备文件通信](#talking-to-device-files)
- en: 10 [System Calls](#system-calls)
  id: totrans-48
  prefs: []
  type: TYPE_NORMAL
  zh: 10 [系统调用](#system-calls)
- en: 11 [Blocking Processes and threads](#blocking-processes-and-threads)
  id: totrans-49
  prefs: []
  type: TYPE_NORMAL
  zh: 11 [阻塞进程和线程](#blocking-processes-and-threads)
- en: 11.1 [Sleep](#sleep)
  id: totrans-50
  prefs: []
  type: TYPE_NORMAL
  zh: 11.1 [睡眠](#sleep)
- en: 11.2 [Completions](#completions)
  id: totrans-51
  prefs: []
  type: TYPE_NORMAL
  zh: 11.2 [补全](#completions)
- en: 12 [Synchronization](#synchronization)
  id: totrans-52
  prefs: []
  type: TYPE_NORMAL
  zh: 12 [同步](#synchronization)
- en: 12.1 [Mutex](#mutex)
  id: totrans-53
  prefs: []
  type: TYPE_NORMAL
  zh: 12.1 [互斥锁](#mutex)
- en: 12.2 [Spinlocks](#spinlocks)
  id: totrans-54
  prefs: []
  type: TYPE_NORMAL
  zh: 12.2 [自旋锁](#spinlocks)
- en: 12.3 [Read and write locks](#read-and-write-locks)
  id: totrans-55
  prefs: []
  type: TYPE_NORMAL
  zh: 12.3 [读写锁](#read-and-write-locks)
- en: 12.4 [Atomic operations](#atomic-operations)
  id: totrans-56
  prefs: []
  type: TYPE_NORMAL
  zh: 12.4 [原子操作](#atomic-operations)
- en: 13 [Replacing Print Macros](#replacing-print-macros)
  id: totrans-57
  prefs: []
  type: TYPE_NORMAL
  zh: 13 [替换打印宏](#replacing-print-macros)
- en: 13.1 [Replacement](#replacement)
  id: totrans-58
  prefs: []
  type: TYPE_NORMAL
  zh: 13.1 [替换](#replacement)
- en: 13.2 [Flashing keyboard LEDs](#flashing-keyboard-leds)
  id: totrans-59
  prefs: []
  type: TYPE_NORMAL
  zh: 13.2 [闪烁键盘LED](#flashing-keyboard-leds)
- en: 14 [GPIO](#gpio)
  id: totrans-60
  prefs: []
  type: TYPE_NORMAL
  zh: 14 [GPIO](#gpio)
- en: 14.1 [GPIO](#gpio1)
  id: totrans-61
  prefs: []
  type: TYPE_NORMAL
  zh: 14.1 [GPIO](#gpio1)
- en: 14.2 [Control the LED’s on/off state](#control-the-leds-onoff-state)
  id: totrans-62
  prefs: []
  type: TYPE_NORMAL
  zh: 14.2 [控制LED的开关状态](#control-the-leds-onoff-state)
- en: 14.3 [DHT11 sensor](#dht-sensor)
  id: totrans-63
  prefs: []
  type: TYPE_NORMAL
  zh: 14.3 [DHT11传感器](#dht-sensor)
- en: 15 [Scheduling Tasks](#scheduling-tasks)
  id: totrans-64
  prefs: []
  type: TYPE_NORMAL
  zh: 15 [调度任务](#scheduling-tasks)
- en: 15.1 [Tasklets](#tasklets)
  id: totrans-65
  prefs: []
  type: TYPE_NORMAL
  zh: 15.1 [任务](#tasklets)
- en: 15.2 [Work queues](#work-queues)
  id: totrans-66
  prefs: []
  type: TYPE_NORMAL
  zh: 15.2 [工作队列](#work-queues)
- en: 16 [Interrupt Handlers](#interrupt-handlers)
  id: totrans-67
  prefs: []
  type: TYPE_NORMAL
  zh: 16 [中断处理程序](#interrupt-handlers)
- en: 16.1 [Interrupt Handlers](#interrupt-handlers1)
  id: totrans-68
  prefs: []
  type: TYPE_NORMAL
  zh: 16.1 [中断处理程序](#interrupt-handlers1)
- en: 16.2 [Detecting button presses](#detecting-button-presses)
  id: totrans-69
  prefs: []
  type: TYPE_NORMAL
  zh: 16.2 [检测按钮按下](#detecting-button-presses)
- en: 16.3 [Bottom Half](#bottom-half)
  id: totrans-70
  prefs: []
  type: TYPE_NORMAL
  zh: 16.3 [下半部](#bottom-half)
- en: 16.4 [Threaded IRQ](#threaded-irq)
  id: totrans-71
  prefs: []
  type: TYPE_NORMAL
  zh: 16.4 [线程化中断](#threaded-irq)
- en: 17 [Virtual Input Device Driver](#virtual-input-device-driver)
  id: totrans-72
  prefs: []
  type: TYPE_NORMAL
  zh: 17 [虚拟输入设备驱动程序](#virtual-input-device-driver)
- en: '18 [Standardizing the interfaces: The Device Model](#standardizing-the-interfaces-the-device-model)'
  id: totrans-73
  prefs: []
  type: TYPE_NORMAL
  zh: 18 [标准化接口：设备模型](#standardizing-the-interfaces-the-device-model)
- en: 19 [Device Tree](#device-tree)
  id: totrans-74
  prefs: []
  type: TYPE_NORMAL
  zh: 19 [设备树](#device-tree)
- en: 19.1 [Introduction to Device Tree](#introduction-to-device-tree)
  id: totrans-75
  prefs: []
  type: TYPE_NORMAL
  zh: 19.1 [设备树简介](#introduction-to-device-tree)
- en: 19.2 [Device Tree and Kernel Modules](#device-tree-and-kernel-modules)
  id: totrans-76
  prefs: []
  type: TYPE_NORMAL
  zh: 19.2 [设备树和内核模块](#device-tree-and-kernel-modules)
- en: '19.3 [Example: Device Tree Module](#example-device-tree-module)'
  id: totrans-77
  prefs: []
  type: TYPE_NORMAL
  zh: 19.3 [示例：设备树模块](#example-device-tree-module)
- en: 19.4 [Device Tree Source Example](#device-tree-source-example)
  id: totrans-78
  prefs: []
  type: TYPE_NORMAL
  zh: 19.4 [设备树源示例](#device-tree-source-example)
- en: 19.5 [Testing Device Tree Modules](#testing-device-tree-modules)
  id: totrans-79
  prefs: []
  type: TYPE_NORMAL
  zh: 19.5 [测试设备树模块](#testing-device-tree-modules)
- en: 19.6 [Common Device Tree Functions](#common-device-tree-functions)
  id: totrans-80
  prefs: []
  type: TYPE_NORMAL
  zh: 19.6 [常见的设备树函数](#common-device-tree-functions)
- en: 20 [Optimizations](#optimizations)
  id: totrans-81
  prefs: []
  type: TYPE_NORMAL
  zh: 20 [优化](#optimizations)
- en: 20.1 [Likely and Unlikely conditions](#likely-and-unlikely-conditions)
  id: totrans-82
  prefs: []
  type: TYPE_NORMAL
  zh: 20.1 [可能和不可能条件](#likely-and-unlikely-conditions)
- en: 20.2 [Static keys](#static-keys)
  id: totrans-83
  prefs: []
  type: TYPE_NORMAL
  zh: 20.2 [静态键](#static-keys)
- en: 21 [Common Pitfalls](#common-pitfalls)
  id: totrans-84
  prefs: []
  type: TYPE_NORMAL
  zh: 21 [常见陷阱](#common-pitfalls)
- en: 21.1 [Using standard libraries](#using-standard-libraries)
  id: totrans-85
  prefs: []
  type: TYPE_NORMAL
  zh: 21.1 [使用标准库](#using-standard-libraries)
- en: 21.2 [Disabling interrupts](#disabling-interrupts)
  id: totrans-86
  prefs: []
  type: TYPE_NORMAL
  zh: 21.2 [禁用中断](#disabling-interrupts)
- en: 22 [Where To Go From Here?](#where-to-go-from-here)
  id: totrans-87
  prefs: []
  type: TYPE_NORMAL
  zh: 22 [从这里开始？](#where-to-go-from-here)
- en: 1 Introduction
  id: totrans-88
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 1 简介
- en: The Linux Kernel Module Programming Guide is a free book; you may reproduce
    or modify it under the terms of the [Open Software License](https://opensource.org/licenses/OSL-3.0),
    version 3.0.
  id: totrans-89
  prefs: []
  type: TYPE_NORMAL
  zh: 《Linux内核模块编程指南》是一本免费书籍；您可以在[开放软件许可](https://opensource.org/licenses/OSL-3.0)的条款下复制或修改，版本3.0。
- en: This book is distributed in the hope that it will be useful, but without any
    warranty, without even the implied warranty of merchantability or fitness for
    a particular purpose.
  id: totrans-90
  prefs: []
  type: TYPE_NORMAL
  zh: 本书分发是为了希望它会有用，但没有任何保证，甚至没有商销性或特定用途适用性的暗示保证。
- en: The author encourages wide distribution of this book for personal or commercial
    use, provided the above copyright notice remains intact and the method adheres
    to the provisions of the [Open Software License](https://opensource.org/licenses/OSL-3.0).
    In summary, you may copy and distribute this book free of charge or for a profit.
    No explicit permission is required from the author for reproduction of this book
    in any medium, physical or electronic.
  id: totrans-91
  prefs: []
  type: TYPE_NORMAL
  zh: 作者鼓励广泛分发此书，无论是个人还是商业用途，只要上述版权声明保持完整，并且方法遵守[开放软件许可](https://opensource.org/licenses/OSL-3.0)的规定。总之，您可以免费或盈利地复制和分发此书。无需作者明确许可即可以任何介质复制此书，无论是物理的还是电子的。
- en: Derivative works and translations of this document must be placed under the
    Open Software License, and the original copyright notice must remain intact. If
    you have contributed new material to this book, you must make the material and
    source code available for your revisions. Please make revisions and updates available
    directly to the document maintainer, Jim Huang <jserv@ccns.ncku.edu.tw>. This
    will allow for the merging of updates and provide consistent revisions to the
    Linux community.
  id: totrans-92
  prefs: []
  type: TYPE_NORMAL
  zh: 本文档的衍生作品和翻译必须置于开放软件许可之下，并且必须保留原始版权声明。如果您为此书贡献了新材料，您必须提供材料和源代码以供您的修订。请直接向文档维护者Jim
    Huang <jserv@ccns.ncku.edu.tw>提供修订和更新。这将允许合并更新并提供给Linux社区一致的修订。
- en: If you publish or distribute this book commercially, donations, royalties, or
    printed copies are greatly appreciated by the author and the [Linux Documentation
    Project](https://tldp.org/) (LDP). Contributing in this way shows your support
    for free software and the LDP. If you have questions or comments, please contact
    the address above.
  id: totrans-93
  prefs: []
  type: TYPE_NORMAL
  zh: 如果您商业出版或分发此书，作者和 [Linux 文档项目](https://tldp.org/)（LDP）将非常感激捐赠、版税或印刷副本。以这种方式做出贡献表明您支持免费软件和
    LDP。如果您有任何问题或评论，请通过上述地址联系。
- en: 1.1 Authorship
  id: totrans-94
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 1.1 著作权
- en: The Linux Kernel Module Programming Guide was initially authored by Ori Pomerantz
    for Linux v2.2\. As the Linux kernel evolved, Ori’s availability to maintain the
    document diminished. Consequently, Peter Jay Salzman assumed the role of maintainer
    and updated the guide for Linux v2.4\. Similar constraints arose for Peter when
    tracking developments in Linux v2.6, leading to Michael Burian joining as a co-maintainer
    to bring the guide up to speed with Linux v2.6\. Bob Mottram contributed to the
    guide by updating examples for Linux v3.8 and later. Jim Huang then undertook
    the task of updating the guide for recent Linux versions (v5.0 and beyond), along
    with revising the LaTeX document. The guide continues to be maintained for compatibility
    with modern kernels (v6.x series) while ensuring examples work with older LTS
    kernels.
  id: totrans-95
  prefs: []
  type: TYPE_NORMAL
  zh: 《Linux 内核模块编程指南》最初由 Ori Pomerantz 为 Linux v2.2 版本编写。随着 Linux 内核的演变，Ori 维护文档的能力逐渐减弱。因此，Peter
    Jay Salzman 接替了维护者的角色，并为 Linux v2.4 版本更新了指南。当 Peter 跟踪 Linux v2.6 版本的进展时，也遇到了类似的限制，导致
    Michael Burian 加入作为共同维护者，使指南与 Linux v2.6 版本保持同步。Bob Mottram 通过更新 Linux v3.8 及以后的示例为指南做出了贡献。随后，Jim
    Huang 承担了更新指南以适应最新 Linux 版本（v5.0 及以上）的任务，同时修订了 LaTeX 文档。指南继续维护以兼容现代内核（v6.x 系列），同时确保示例与较旧的
    LTS 内核兼容。
- en: 1.2 Acknowledgements
  id: totrans-96
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 1.2 致谢
- en: 'The following people have contributed corrections or good suggestions:'
  id: totrans-97
  prefs: []
  type: TYPE_NORMAL
  zh: 以下人员对纠正或提出了良好的建议：
- en: Amit Dhingra, Andrew Kreimer, Andrew Lin, Andy Shevchenko, Arush Sharma, Aykhan
    Hagverdili, Benno Bielmeier, Bob Lee, Brad Baker, Che-Chia Chang, Cheng-Shian
    Yeh, Cheng-Yang Chou, Chih-En Lin, Chih-Hsuan Yang, Chih-Yu Chen, Ching-Hua (Vivian)
    Lin, Chin Yik Ming, Chung-Han Tsai, cvvletter, Cyril Brulebois, Daniele Paolo
    Scarpazza, David Porter, demonsome, Dimo Velev, Ekang Monyet, Ethan Chan, Francois
    Audeon, Gilad Reti, Hao.Dong, heartofrain, Horst Schirmeier, Hsin-Hsiang Peng,
    Hung-Jen Pao, Ignacio Martin, I-Hsin Cheng, Integral, Iûnn Kiàn-îng, Jian-Xing
    Wu, Jimmy Ma, Johan Calle, keytouch, Kohei Otsuka, Kuan-Wei Chiu, manbing, Marconi
    Jiang, mengxinayan, Meng-Zong Tsai, Peter Lin, Roman Lakeev, Sam Erickson, Shao-Tse
    Hung, Shih-Sheng Yang, Stacy Prowell, Steven Lung, Tristan Lelong, Tse-Wei Lin,
    Tucker Polomik, Tyler Fanelli, VxTeemo, Wei-Hsin Yeh, Wei-Lun Tsai, Xatierlike
    Lee, Yan-Jie Chan, Yen-Yu Chen, Yin-Chiuan Chen, Yi-Wei Lin, Yo-Jung Lin, Yu-Chun
    Lin, Yu-Hsiang Tseng, YYGO.
  id: totrans-98
  prefs: []
  type: TYPE_NORMAL
  zh: Amit Dhingra, Andrew Kreimer, Andrew Lin, Andy Shevchenko, Arush Sharma, Aykhan
    Hagverdili, Benno Bielmeier, Bob Lee, Brad Baker, Che-Chia Chang, Cheng-Shian
    Yeh, Cheng-Yang Chou, Chih-En Lin, Chih-Hsuan Yang, Chih-Yu Chen, Ching-Hua (Vivian)
    Lin, Chin Yik Ming, Chung-Han Tsai, cvvletter, Cyril Brulebois, Daniele Paolo
    Scarpazza, David Porter, demonsome, Dimo Velev, Ekang Monyet, Ethan Chan, Francois
    Audeon, Gilad Reti, Hao.Dong, heartofrain, Horst Schirmeier, Hsin-Hsiang Peng,
    Hung-Jen Pao, Ignacio Martin, I-Hsin Cheng, Integral, Iûnn Kiàn-îng, Jian-Xing
    Wu, Jimmy Ma, Johan Calle, keytouch, Kohei Otsuka, Kuan-Wei Chiu, manbing, Marconi
    Jiang, mengxinayan, Meng-Zong Tsai, Peter Lin, Roman Lakeev, Sam Erickson, Shao-Tse
    Hung, Shih-Sheng Yang, Stacy Prowell, Steven Lung, Tristan Lelong, Tse-Wei Lin,
    Tucker Polomik, Tyler Fanelli, VxTeemo, Wei-Hsin Yeh, Wei-Lun Tsai, Xatierlike
    Lee, Yan-Jie Chan, Yen-Yu Chen, Yin-Chiuan Chen, Yi-Wei Lin, Yo-Jung Lin, Yu-Chun
    Lin, Yu-Hsiang Tseng, YYGO。
- en: 1.3 What Is A Kernel Module?
  id: totrans-99
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 1.3 什么是内核模块？
- en: Developing Linux kernel modules requires a solid foundation in the C programming
    language and experience writing conventional programs that run as processes. Kernel
    code operates in a domain where a single wild pointer can wipe out an entire filesystem
    and where a crash means rebooting the whole system.
  id: totrans-100
  prefs: []
  type: TYPE_NORMAL
  zh: 参与开发 Linux 内核模块需要具备 C 编程语言的基础，并拥有创建旨在执行进程的传统程序的历史记录。这项追求深入到一个领域，如果忽视未受管理的指针，可能会触发整个文件系统的完全消除，导致需要完全系统重启的情景。
- en: A Linux kernel module is a piece of code that can be loaded into and unloaded
    from the kernel on demand. Modules extend the kernel’s capabilities without requiring
    a system reboot. A notable example is the device driver module, which lets the
    kernel interact with hardware attached to the system. Without modules, every new
    feature would have to be built directly into a monolithic kernel image, which makes
    the kernel larger and requires rebuilding the kernel and rebooting the system whenever
    new functionality is wanted.
  id: totrans-101
  prefs: []
  type: TYPE_NORMAL
  zh: Linux 内核模块精确地定义为一段可以在内核中按需动态加载和卸载的代码。这些模块增强了内核功能，而无需重新启动系统。一个显著的例子是设备驱动模块，它促进了内核与系统连接的硬件组件之间的交互。如果没有模块，当前的方法倾向于使用单核内核，需要将新功能直接集成到内核映像中。这种方法会导致内核变大，并在需要新功能时需要重建内核和随后的系统重启。
- en: 1.4 Kernel module package
  id: totrans-102
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 1.4 内核模块包
- en: Linux distributions provide the commands `modprobe`, `insmod` and `depmod`
    within a package.
  id: totrans-103
  prefs: []
  type: TYPE_NORMAL
  zh: Linux 发行版在包中提供了 `modprobe`、`insmod` 和 `depmod` 命令。
- en: 'On Ubuntu/Debian GNU/Linux:'
  id: totrans-104
  prefs: []
  type: TYPE_NORMAL
  zh: 在 Ubuntu/Debian GNU/Linux 上：
- en: '[PRE0]'
  id: totrans-105
  prefs: []
  type: TYPE_PRE
  zh: '[PRE0]'
- en: 'On Arch Linux:'
  id: totrans-106
  prefs: []
  type: TYPE_NORMAL
  zh: 在 Arch Linux 上：
- en: '[PRE1]'
  id: totrans-107
  prefs: []
  type: TYPE_PRE
  zh: '[PRE1]'
- en: 1.5 What Modules are in my Kernel?
  id: totrans-108
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 1.5 我的内核中有什么模块？
- en: To discover what modules are already loaded within your current kernel, use
    the command `lsmod`.
  id: totrans-109
  prefs: []
  type: TYPE_NORMAL
  zh: 要发现当前内核中已经加载的模块，请使用命令 `lsmod`。
- en: '[PRE2]'
  id: totrans-110
  prefs: []
  type: TYPE_PRE
  zh: '[PRE2]'
- en: 'The list of loaded modules is exposed through the file /proc/modules, so you
    can also see it with:'
  id: totrans-111
  prefs: []
  type: TYPE_NORMAL
  zh: 模块存储在文件 /proc/modules 中，因此您也可以使用以下命令查看它们：
- en: '[PRE3]'
  id: totrans-112
  prefs: []
  type: TYPE_PRE
  zh: '[PRE3]'
- en: 'This can be a long list, and you might prefer to search for something particular.
    To search for the fat module:'
  id: totrans-113
  prefs: []
  type: TYPE_NORMAL
  zh: 这可能是一个很长的列表，您可能更喜欢搜索特定内容。要搜索 fat 模块：
- en: '[PRE4]'
  id: totrans-114
  prefs: []
  type: TYPE_PRE
  zh: '[PRE4]'
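- en: |-
    The same lookup can be done programmatically. The sketch below is a minimal
    illustration, not part of the original guide. It assumes that each line of
    /proc/modules begins with the module name followed by a space, and the function
    names `module_is_listed` and `module_is_loaded` are hypothetical helpers
    introduced only for this example.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Return 1 if `name` is the first field of any line read from `fp`,
     * 0 otherwise. Assumption: every line starts with the module name. */
    int module_is_listed(FILE *fp, const char *name)
    {
        char chunk[512];
        int at_line_start = 1; /* only the start of a line holds a name */

        assert(fp != NULL && name != NULL);

        while (fgets(chunk, sizeof(chunk), fp)) {
            size_t len = strlen(chunk);
            int ends_line = (len > 0 && chunk[len - 1] == '\n');

            if (at_line_start) {
                /* Cut the chunk at the first space or newline. */
                chunk[strcspn(chunk, " \n")] = '\0';
                if (strcmp(chunk, name) == 0)
                    return 1;
            }
            /* A chunk without a newline means the line continues. */
            at_line_start = ends_line;
        }
        return 0;
    }

    /* Convenience wrapper: 1 if loaded, 0 if not, -1 if the file
     * cannot be opened. */
    int module_is_loaded(const char *name)
    {
        FILE *fp = fopen("/proc/modules", "r");
        int found;

        if (!fp)
            return -1;
        found = module_is_listed(fp, name);
        fclose(fp);
        return found;
    }
    ```

    Comparing the whole first field, rather than searching for a substring, keeps
    a query for `fat` from also matching a differently named module such as `vfat`.
  prefs: []
  type: TYPE_NORMAL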
- en: 1.6 Is there a need to download and compile the kernel?
  id: totrans-115
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 1.6 是否需要下载和编译内核？
- en: You do not need to download or compile a kernel to follow this guide. It is
    nevertheless prudent to run the examples in a test distribution inside a virtual
    machine, which reduces the risk of disrupting your system.
  id: totrans-116
  prefs: []
  type: TYPE_NORMAL
  zh: 为了有效地遵循本指南，没有执行此类操作的强制性要求。然而，一种谨慎的方法是在虚拟机上的测试发行版中执行示例，从而降低对系统造成潜在风险的任何可能性。
- en: 1.7 Before We Begin
  id: totrans-117
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 1.7 开始之前
- en: Before delving into code, a few matters need attention. Everyone’s system is
    different, and everyone has their own way of working. Getting your first “hello
    world” program to compile and load correctly can sometimes be tricky. Rest assured
    that once you get past this initial hurdle, subsequent attempts tend to go smoothly.
  id: totrans-118
  prefs: []
  type: TYPE_NORMAL
  zh: 在深入研究代码之前，有一些事项需要关注。不同系统的差异存在，并且明显的个人方法也很明显。首次尝试成功编译和加载第一个“hello world”程序有时可能会遇到挑战。值得注意的是，首次尝试克服初始障碍为后续的顺利进展铺平了道路。
- en: Modversioning. A module compiled for one kernel will not load if a different
    kernel is booted, unless `CONFIG_MODVERSIONS` is enabled in the kernel. Module
    versioning will be discussed later in this guide. Until module versioning is covered,
    the examples in this guide may not work correctly if running a kernel with modversioning
    turned on. However, most stock Linux distribution kernels come with modversioning
    enabled. If difficulties arise when loading the modules due to versioning errors,
    consider compiling a kernel with modversioning turned off.
  id: totrans-119
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
  zh: 模块版本化。为某个内核编译的模块如果启动了不同的内核则无法加载，除非在内核中启用了 `CONFIG_MODVERSIONS`。模块版本化将在本指南的后面讨论。在覆盖模块版本化之前，如果运行启用了模块版本化的内核，本指南中的示例可能无法正确工作。然而，大多数股票
    Linux 发行版内核都启用了模块版本化。如果由于版本错误而加载模块时出现困难，请考虑编译一个禁用了模块版本化的内核。
- en: Using the X Window System. It is highly recommended to extract, compile, and
    load all the examples discussed in this guide from a console. Working on these
    tasks within the X Window System is discouraged.
  id: totrans-120
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
  zh: 使用 X Window 系统。强烈建议从控制台提取、编译和加载本指南中讨论的所有示例。在 X Window 系统内执行这些任务是不被推荐的。
- en: Modules cannot directly print to the screen like `printf()` can, but they can
    log information and warnings to the kernel’s log ring buffer. This output is not
    automatically displayed on any console or terminal. To view kernel module messages,
    you must use `dmesg` to read the kernel log ring buffer, or check the systemd
    journal with `journalctl -k` for kernel messages. Refer to [Section 4](#hello-world)
    for more information. The terminal or environment from which you load the module
    does not affect where the output goes—it always goes to the kernel log.
  id: totrans-121
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
  zh: 模块不能像 `printf()` 一样直接打印到屏幕，但它们可以将信息和警告记录到内核的日志环形缓冲区。此输出不会自动在任何控制台或终端上显示。要查看内核模块消息，您必须使用
    `dmesg` 读取内核日志环形缓冲区，或使用 `journalctl -k` 检查 systemd 日志以获取内核消息。有关更多信息，请参阅[第 4 节](#hello-world)。加载模块的终端或环境不会影响输出位置——它始终输出到内核日志。
- en: SecureBoot. Many modern computers ship with UEFI SecureBoot enabled. It is
    a security standard that ensures the machine boots only software trusted by the
    original equipment manufacturer. Some Linux distributions also ship a default
    kernel configured to support SecureBoot. In these cases, a kernel module must
    be signed with a trusted key before it can be loaded.
  id: totrans-122
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
  zh: SecureBoot。许多现代计算机出厂时已预配置为启用 UEFI SecureBoot——这是一个确保仅通过原始设备制造商认可的受信任软件启动的必要安全标准。某些
    Linux 发行版甚至默认配置了支持 SecureBoot 的 Linux 内核。在这些情况下，内核模块需要签名安全密钥。
- en: 'Failing that, an attempt to insert your first “hello world” module would result
    in the message: “ERROR: could not insert module”. If the message “Lockdown: insmod:
    unsigned module loading is restricted; see man kernel lockdown.7” appears in the
    `dmesg` output, the simplest approach is to disable UEFI SecureBoot from the boot
    menu of your PC or laptop, which allows the “hello world” module to be inserted.
    The alternative is to generate keys, install them on the system, and sign the
    module, but that procedure is less appropriate for beginners. If interested, see
    the more detailed steps for [SecureBoot](https://wiki.debian.org/SecureBoot).'
  id: totrans-123
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
  zh: '如果失败，尝试插入您的第一个“Hello World”模块将导致出现消息：“ERROR: could not insert module”。如果此消息“Lockdown:
    insmod: unsigned module loading is restricted; see man kernel lockdown.7”出现在 `dmesg`
    输出中，最简单的方法是禁用 PC 或笔记本电脑的启动菜单中的 UEFI SecureBoot，以允许成功插入“Hello World”模块。当然，另一种方法是进行复杂的程序，如生成密钥、系统密钥安装和模块签名以实现功能。然而，这个过程对于初学者来说不太合适。如果您感兴趣，可以探索并遵循[SecureBoot](https://wiki.debian.org/SecureBoot)的更详细步骤。'
- en: 2 Headers
  id: totrans-124
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 2 头文件
- en: Before building anything, it is necessary to install the header files for the
    kernel.
  id: totrans-125
  prefs: []
  type: TYPE_NORMAL
  zh: 在构建任何东西之前，需要安装内核的头文件。
- en: 'On Ubuntu/Debian GNU/Linux:'
  id: totrans-126
  prefs: []
  type: TYPE_NORMAL
  zh: 在 Ubuntu/Debian GNU/Linux 上：
- en: '[PRE5]'
  id: totrans-127
  prefs: []
  type: TYPE_PRE
  zh: '[PRE5]'
- en: 'The command above reports which kernel header packages are available. Then,
    for example, to install the headers matching the running kernel:'
  id: totrans-128
  prefs: []
  type: TYPE_NORMAL
  zh: 以下命令提供了有关可用内核头文件的信息。例如：
- en: '[PRE6]'
  id: totrans-129
  prefs: []
  type: TYPE_PRE
  zh: '[PRE6]'
- en: 'On Arch Linux:'
  id: totrans-130
  prefs: []
  type: TYPE_NORMAL
  zh: 在 Arch Linux 上：
- en: '[PRE7]'
  id: totrans-131
  prefs: []
  type: TYPE_PRE
  zh: '[PRE7]'
- en: 'On Fedora:'
  id: totrans-132
  prefs: []
  type: TYPE_NORMAL
  zh: 在 Fedora 上：
- en: '[PRE8]'
  id: totrans-133
  prefs: []
  type: TYPE_PRE
  zh: '[PRE8]'
- en: 3 Examples
  id: totrans-134
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 3 示例
- en: All the examples from this document are available within the examples subdirectory.
  id: totrans-135
  prefs: []
  type: TYPE_NORMAL
  zh: 本文档中的所有示例都可在 examples 子目录中找到。
- en: Should compile errors occur, it may be due to a more recent kernel version being
    in use, or there might be a need to install the corresponding kernel header files.
  id: totrans-136
  prefs: []
  type: TYPE_NORMAL
  zh: 如果出现编译错误，可能是由于正在使用较新的内核版本，或者可能需要安装相应的内核头文件。
- en: 4 Hello World
  id: totrans-137
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 4 Hello World
- en: 4.1 The Simplest Module
  id: totrans-138
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 4.1 最简单的模块
- en: Most individuals beginning their programming journey typically start with some
    variant of a hello world example. It is unclear what the outcomes are for those
    who deviate from this tradition, but it seems prudent to adhere to it. The learning
    process will begin with a series of hello world programs that illustrate various
    fundamental aspects of writing a kernel module.
  id: totrans-139
  prefs: []
  type: TYPE_NORMAL
  zh: 大多数开始编程之旅的人通常从某种“Hello World”示例的变体开始。对于偏离这一传统的人的结果尚不清楚，但似乎遵循它更为谨慎。学习过程将从一系列展示编写内核模块各种基本方面的“Hello
    World”程序开始。
- en: Presented next is the simplest possible module.
  id: totrans-140
  prefs: []
  type: TYPE_NORMAL
  zh: 下面展示的是最简单的模块。
- en: 'Make a test directory:'
  id: totrans-141
  prefs: []
  type: TYPE_NORMAL
  zh: 创建一个测试目录：
- en: '[PRE9]'
  id: totrans-142
  prefs: []
  type: TYPE_PRE
  zh: '[PRE9]'
- en: 'Paste this into your favorite editor and save it as hello-1.c:'
  id: totrans-143
  prefs: []
  type: TYPE_NORMAL
  zh: 将以下内容粘贴到您喜欢的编辑器中，并保存为 hello-1.c：
- en: '[PRE10]'
  id: totrans-144
  prefs: []
  type: TYPE_PRE
  zh: '[PRE10]'
- en: Now you will need a Makefile. If you copy and paste this, change the indentation
    to use tabs, not spaces.
  id: totrans-145
  prefs: []
  type: TYPE_NORMAL
  zh: 现在您需要一个 Makefile。如果您复制并粘贴，请将缩进更改为使用制表符而不是空格。
- en: '[PRE11]'
  id: totrans-146
  prefs: []
  type: TYPE_PRE
  zh: '[PRE11]'
- en: In a Makefile, GNU make sets $(CURDIR) to the absolute pathname of the current
    working directory (after all -C options are processed, if any). See more about
    CURDIR in the [GNU make manual](https://www.gnu.org/software/make/manual/make.html).
  id: totrans-147
  prefs: []
  type: TYPE_NORMAL
  zh: 在 Makefile 中，$(CURDIR) 可以设置为当前工作目录的绝对路径名（在处理完所有 -C 选项之后，如果有的话）。有关 CURDIR 的更多信息，请参阅[GNU
    make 手册](https://www.gnu.org/software/make/manual/make.html)。
- en: And finally, just run make directly.
  id: totrans-148
  prefs: []
  type: TYPE_NORMAL
  zh: 最后，直接运行 make。
- en: '[PRE12]'
  id: totrans-149
  prefs: []
  type: TYPE_PRE
  zh: '[PRE12]'
- en: 'If there is no PWD := $(CURDIR) statement in the Makefile, then it may not
    compile correctly with sudo make. This is because some environment variables are
    specified by the security policy and cannot be inherited. The default security
    policy is sudoers. In the sudoers security policy, env_reset is enabled by default,
    which restricts environment variables. Specifically, path variables are not retained
    from the user environment; they are set to default values (for more information,
    see: [sudoers manual](https://www.sudo.ws/docs/man/sudoers.man/)). You can see
    the environment variable settings by:'
  id: totrans-150
  prefs: []
  type: TYPE_NORMAL
  zh: 如果 Makefile 中没有 PWD := $(CURDIR) 语句，那么使用 sudo make 可能无法正确编译。这是因为一些环境变量由安全策略指定，不能被继承。默认的安全策略是
    sudoers。在 sudoers 安全策略中，env_reset 默认启用，这限制了环境变量。具体来说，路径变量不会保留用户环境中的值；它们被设置为默认值（更多信息，请参阅：[sudoers
    手册](https://www.sudo.ws/docs/man/sudoers.man/)）。你可以通过以下方式查看环境变量设置：
- en: '[PRE13]'
  id: totrans-151
  prefs: []
  type: TYPE_PRE
  zh: '[PRE13]'
- en: Here is a simple Makefile as an example to demonstrate the problem mentioned
    above.
  id: totrans-152
  prefs: []
  type: TYPE_NORMAL
  zh: 这里有一个简单的 Makefile 示例，用于演示上述提到的问题。
- en: '[PRE14]'
  id: totrans-153
  prefs: []
  type: TYPE_PRE
  zh: '[PRE14]'
- en: Then, we can use the -p flag to print out the environment variable values from
    the Makefile.
  id: totrans-154
  prefs: []
  type: TYPE_NORMAL
  zh: 然后，我们可以使用 -p 标志打印出 Makefile 中的环境变量值。
- en: '[PRE15]'
  id: totrans-155
  prefs: []
  type: TYPE_PRE
  zh: '[PRE15]'
- en: The PWD variable will not be inherited with sudo.
  id: totrans-156
  prefs: []
  type: TYPE_NORMAL
  zh: PWD 变量在使用 sudo 时不被继承。
- en: '[PRE16]'
  id: totrans-157
  prefs: []
  type: TYPE_PRE
  zh: '[PRE16]'
- en: However, there are three ways to solve this problem.
  id: totrans-158
  prefs: []
  type: TYPE_NORMAL
  zh: 然而，有三种方法可以解决这个问题。
- en: You can use the sudo -E flag to preserve the environment variables for a single invocation.
  id: totrans-159
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
  zh: 你可以使用 -E 标志临时保留它们。
- en: '[PRE17]'
  id: totrans-160
  prefs:
  - PREF_IND
  type: TYPE_PRE
  zh: '[PRE17]'
- en: You can disable env_reset by editing /etc/sudoers as root using visudo.
  id: totrans-161
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
  zh: 作为 root 用户编辑 /etc/sudoers，可以禁用 env_reset。
- en: '[PRE18]'
  id: totrans-162
  prefs:
  - PREF_IND
  type: TYPE_PRE
  zh: '[PRE18]'
- en: Then execute env and sudo env individually.
  id: totrans-163
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
  zh: 然后分别执行 env 和 sudo env。
- en: '[PRE19]'
  id: totrans-164
  prefs:
  - PREF_IND
  type: TYPE_PRE
  zh: '[PRE19]'
- en: You can view and compare these logs to find differences between env_reset and
    !env_reset.
  id: totrans-165
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
  zh: 你可以查看并比较这些日志，以找到 env_reset 和 !env_reset 之间的差异。
- en: You can preserve environment variables by appending them to env_keep in /etc/sudoers.
  id: totrans-166
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
  zh: 你可以通过将它们附加到 /etc/sudoers 中的 env_keep 来保留环境变量。
- en: '[PRE20]'
  id: totrans-167
  prefs:
  - PREF_IND
  type: TYPE_PRE
  zh: '[PRE20]'
- en: 'After applying the above change, you can check the environment variable settings
    by:'
  id: totrans-168
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
  zh: 应用上述更改后，你可以通过以下方式检查环境变量设置：
- en: '[PRE21]'
  id: totrans-169
  prefs:
  - PREF_IND
  type: TYPE_PRE
  zh: '[PRE21]'
- en: 'If all goes smoothly you should then find that you have a compiled hello-1.ko
    module. You can find info on it with the command:'
  id: totrans-170
  prefs: []
  type: TYPE_NORMAL
  zh: 如果一切顺利，你应该会发现你有一个编译好的 hello-1.ko 模块。你可以使用以下命令获取相关信息：
- en: '[PRE22]'
  id: totrans-171
  prefs: []
  type: TYPE_PRE
  zh: '[PRE22]'
- en: 'At this point the command:'
  id: totrans-172
  prefs: []
  type: TYPE_NORMAL
  zh: 在这一点上，以下命令：
- en: '[PRE23]'
  id: totrans-173
  prefs: []
  type: TYPE_PRE
  zh: '[PRE23]'
- en: 'should return nothing. You can try loading your new module with:'
  id: totrans-174
  prefs: []
  type: TYPE_NORMAL
  zh: 应该不会返回任何内容。你可以尝试使用以下命令加载你的新模块：
- en: '[PRE24]'
  id: totrans-175
  prefs: []
  type: TYPE_PRE
  zh: '[PRE24]'
- en: 'The dash character will get converted to an underscore, so when you again try:'
  id: totrans-176
  prefs: []
  type: TYPE_NORMAL
  zh: 连字符将被转换为下划线，所以当你再次尝试时：
- en: '[PRE25]'
  id: totrans-177
  prefs: []
  type: TYPE_PRE
  zh: '[PRE25]'
- en: 'You should now see your loaded module. It can be removed again with:'
  id: totrans-178
  prefs: []
  type: TYPE_NORMAL
  zh: 现在，你应该能看到你加载的模块。它可以再次使用以下命令删除：
- en: '[PRE26]'
  id: totrans-179
  prefs: []
  type: TYPE_PRE
  zh: '[PRE26]'
- en: 'Notice that the dash was replaced by an underscore. To see the module’s output
    messages, use `dmesg` to view the kernel log ring buffer:'
  id: totrans-180
  prefs: []
  type: TYPE_NORMAL
  zh: 注意到连字符已被替换为下划线。要查看模块的输出消息，请使用 `dmesg` 查看内核日志环缓冲区：
- en: '[PRE27]'
  id: totrans-181
  prefs: []
  type: TYPE_PRE
  zh: '[PRE27]'
- en: 'You should see messages like “Hello world 1.” and “Goodbye world 1.” from your
    module. Alternatively, you can check the systemd journal for kernel messages:'
  id: totrans-182
  prefs: []
  type: TYPE_NORMAL
  zh: 你应该会看到来自你的模块的消息，例如“Hello world 1.”和“Goodbye world 1.”。或者，你可以检查 systemd 日志以获取内核消息：
- en: '[PRE28]'
  id: totrans-183
  prefs: []
  type: TYPE_PRE
  zh: '[PRE28]'
- en: You now know the basics of creating, compiling, installing and removing modules.
    Now for more of a description of how this module works.
  id: totrans-184
  prefs: []
  type: TYPE_NORMAL
  zh: 现在，你已经了解了创建、编译、安装和删除模块的基本知识。现在让我们更详细地描述这个模块的工作原理。
- en: 'Kernel modules must have at least two functions: a "start" (initialization)
    function called `init_module()`, which is called when the module is loaded into
    the kernel with `insmod`, and an "end" (cleanup) function called `cleanup_module()`,
    which is called just before it is removed from the kernel. Actually, things have changed
    starting with kernel 2.3.13. You can now use whatever name you like for the start
    and end functions of a module, and you will learn how to do this in [Section 4.2](#hello-and-goodbye).
    In fact, the new method is the preferred method. However, many people still use
    `init_module()` and `cleanup_module()` for their start and end functions.'
  id: totrans-185
  prefs: []
  type: TYPE_NORMAL
  zh: 内核模块必须至少有两个函数：一个名为“start”（初始化）的函数，称为`init_module()`，当模块被`insmod`到内核中时调用；以及一个名为“end”（清理）的函数，称为`cleanup_module()`，在它从内核中移除之前调用。实际上，从2.3.13内核开始，事情已经发生了变化。你现在可以为模块的起始和结束函数使用任何你喜欢的名称，你将在[第4.2节](#hello-and-goodbye)中了解到如何做到这一点。实际上，新方法是首选方法。然而，许多人仍然使用`init_module()`和`cleanup_module()`作为它们的起始和结束函数。
- en: Typically, `init_module()` either registers a handler for something with the
    kernel, or it replaces one of the kernel functions with its own code (usually
    code to do something and then call the original function). The `cleanup_module()`
    function is supposed to undo whatever `init_module()` did, so the module can be
    unloaded safely.
  id: totrans-186
  prefs: []
  type: TYPE_NORMAL
  zh: 通常，`init_module()`要么向内核注册一个处理程序，要么用它的代码替换内核中的一个函数（通常是执行某些操作然后调用原始函数的代码）。`cleanup_module()`函数应该撤销`init_module()`所做的操作，以便模块可以安全卸载。
- en: Lastly, every kernel module needs to include <linux/module.h>. We needed to
    include <linux/printk.h> only for the macro expansion for the `pr_alert()` log
    level, which you’ll learn about in [Item 2](#x1-121702).
  id: totrans-187
  prefs: []
  type: TYPE_NORMAL
  zh: 最后，每个内核模块都需要包含<linux/module.h>。我们只需要包含<linux/printk.h>来为`pr_alert()`日志级别的宏进行展开，你将在[项目2](#x1-121702)中了解到这一点。
- en: A point about coding style. Another thing that may not be immediately obvious
    to anyone getting started with kernel programming is that indentation within your
    code should use tabs and not spaces. It is one of the coding conventions of the
    kernel. You may not like it, but you will need to get used to it if you ever submit
    a patch upstream.
  id: totrans-188
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
  zh: 关于编码风格的一点。对于刚开始接触内核编程的人来说可能不太明显的是，你的代码缩进应该使用制表符而不是空格。这是内核的编码约定之一。你可能不喜欢它，但如果你要向上游提交补丁，你将需要习惯它。
- en: Introducing print macros. In the beginning there was `printk`, usually followed
    by a priority such as `KERN_INFO` or `KERN_DEBUG`. More recently, this can also
    be expressed in abbreviated form using a set of print macros, such as `pr_info`
    and `pr_debug`. This just saves some mindless keyboard bashing and looks a bit
    neater. They can be found within [include/linux/printk.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/printk.h).
    Take time to read through the available priority macros.
  id: totrans-189
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
  zh: 介绍打印宏。最初是`printk`，通常后面跟着一个优先级，例如`KERN_INFO`或`KERN_DEBUG`。最近，这也可以通过使用一组打印宏来以缩写形式表达，例如`pr_info`和`pr_debug`。这仅仅节省了一些无意义的键盘敲击，看起来也更整洁。它们可以在[include/linux/printk.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/printk.h)中找到。花点时间阅读可用的优先级宏。
- en: 'Important: These functions write to the kernel log ring buffer, not directly
    to any terminal or console. To view the output from your kernel modules, you must
    use `dmesg` or `journalctl -k`.'
  id: totrans-190
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
  zh: 重要：这些函数将写入内核日志环形缓冲区，而不是直接写入任何终端或控制台。要查看内核模块的输出，你必须使用`dmesg`或`journalctl -k`。
- en: About Compiling. Kernel modules need to be compiled a bit differently from regular
    userspace apps. Earlier kernel versions required us to care a great deal about these
    settings, which are usually stored in Makefiles. Although hierarchically organized, many
    redundant settings accumulated in sublevel Makefiles and made them large and rather
    difficult to maintain. Fortunately, there is a new way of doing these things,
    called kbuild, and the build process for external loadable modules is now fully
    integrated into the standard kernel build mechanism. To learn more about how to
    compile modules which are not part of the official kernel (such as all the examples
    you will find in this guide), see [Documentation/kbuild/modules.rst](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/kbuild/modules.rst).
  id: totrans-191
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
  zh: 关于编译。内核模块需要以与常规用户空间应用不同的方式编译。早期内核版本要求我们非常关注这些设置，这些设置通常存储在 Makefiles 中。尽管它们是按层次组织的，但许多冗余设置在子级
    Makefiles 中积累，使它们变得很大，而且相当难以维护。幸运的是，有一种新的方法来做这些事情，称为 kbuild，外部可加载模块的构建过程现在已完全集成到标准内核构建机制中。要了解更多关于如何编译不属于官方内核的模块（例如，您将在本指南中找到的所有示例），请参阅文件
    [Documentation/kbuild/modules.rst](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/kbuild/modules.rst)。
- en: Additional details about Makefiles for kernel modules are available in [Documentation/kbuild/makefiles.rst](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/kbuild/makefiles.rst).
    Be sure to read this and the related files before starting to hack Makefiles.
    It will probably save you lots of work.
  id: totrans-192
  prefs:
  - PREF_IND
  type: TYPE_NORMAL
  zh: 关于内核模块的 Makefiles 的更多详细信息，请参阅 [Documentation/kbuild/makefiles.rst](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/kbuild/makefiles.rst)。在开始修改
    Makefiles 之前，请务必阅读此文件和相关文件。这可能会为您节省大量工作。
- en: Here is another exercise for the reader. See that comment above the return statement
    in `init_module()`? Change the return value to something negative, recompile
    and load the module again. What happens?
  id: totrans-193
  prefs:
  - PREF_IND
  - PREF_BQ
  type: TYPE_NORMAL
  zh: 这里有一个给读者的练习。看到 `init_module()` 中 return 语句上方的那条注释了吗？将返回值改为一个负数，重新编译并再次加载模块。会发生什么？
- en: 4.2 Hello and Goodbye
  id: totrans-194
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 4.2 你好和再见
- en: 'In early kernel versions you had to use the `init_module` and `cleanup_module`
    functions, as in the first hello world example, but these days you can name them
    anything you want by using the `module_init` and `module_exit` macros. These macros
    are defined in [include/linux/module.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/module.h).
    The only requirement is that your init and cleanup functions must be defined before
    those macros are invoked, otherwise you will get compilation errors. Here is an example
    of this technique:'
  id: totrans-195
  prefs: []
  type: TYPE_NORMAL
  zh: 在早期内核版本中，您必须使用 `init_module` 和 `cleanup_module` 函数，就像第一个 hello world 示例中那样，但如今您可以通过使用
    `module_init` 和 `module_exit` 宏来命名这些函数。这些宏在 [include/linux/module.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/module.h)
    中定义。唯一的要求是您的初始化和清理函数必须在调用这些宏之前定义，否则您将得到编译错误。以下是一个此技术的示例：
- en: '[PRE29]'
  id: totrans-196
  prefs: []
  type: TYPE_PRE
  zh: '[PRE29]'
- en: 'So now we have two real kernel modules under our belt. Adding another module
    is as simple as this:'
  id: totrans-197
  prefs: []
  type: TYPE_NORMAL
  zh: 因此，我们现在已经有了两个真正的内核模块。添加另一个模块就像这样：
- en: '[PRE30]'
  id: totrans-198
  prefs: []
  type: TYPE_PRE
  zh: '[PRE30]'
- en: Now have a look at [drivers/char/Makefile](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/char/Makefile)
    for a real world example. As you can see, some things got hardwired into the kernel
    (obj-y) but where have all those obj-m gone? Those familiar with shell scripts
    will easily be able to spot them. For those who are not, the obj-$(CONFIG_FOO)
    entries you see everywhere expand into obj-y or obj-m, depending on whether the
    CONFIG_FOO variable has been set to y or m. While we are at it, those were exactly
    the kind of variables that you have set in the .config file in the top-level directory
    of the Linux kernel source tree, the last time you ran `make menuconfig` or something
    similar.
  id: totrans-199
  prefs: []
  type: TYPE_NORMAL
  zh: 现在看看 [drivers/char/Makefile](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/char/Makefile)
    以了解一个真实世界的例子。正如您所看到的，一些东西被硬编码到内核中（obj-y），但那些 obj-m 去哪里了？熟悉 shell 脚本的人会很容易地找到它们。对于不熟悉的人，您看到的
    obj-$(CONFIG_FOO) 条目会根据 CONFIG_FOO 变量是否设置为 y 或 m 而展开为 obj-y 或 obj-m。当我们谈论这个问题时，这些正是您在上一次运行
    `make menuconfig` 或类似命令时在 Linux 内核源树顶级目录的 .config 文件中设置的变量。
- en: 4.3 The __init and __exit Macros
  id: totrans-200
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 4.3 __init 和 __exit 宏
- en: For built-in drivers, the `__init` macro causes the init function to be discarded
    and its memory freed once the init function finishes; it has no such effect for
    loadable modules. If you think about when the init function is invoked, this makes
    perfect sense.
  id: totrans-201
  prefs: []
  type: TYPE_NORMAL
  zh: '对于内置驱动程序，`__init` 宏会使初始化函数在执行完成后被丢弃并释放其内存；对于可加载模块则不会这样做。如果你考虑初始化函数被调用的时机，这完全说得通。'
- en: There is also an `__initdata` which works similarly to `__init` but for init
    variables rather than functions.
  id: totrans-202
  prefs: []
  type: TYPE_NORMAL
  zh: 此外，还有一个 `__initdata` 宏，它的工作方式与 `__init` 类似，但用于初始化变量而不是函数。
- en: The `__exit` macro causes the omission of the function when the module is built
    into the kernel and, like `__init`, has no effect for loadable modules. Again,
    if you consider when the cleanup function runs, this makes complete sense; built-in
    drivers do not need a cleanup function, while loadable modules do.
  id: totrans-203
  prefs: []
  type: TYPE_NORMAL
  zh: '`__exit` 宏会导致在模块被构建到内核中时省略该函数，并且与 `__init` 一样，对于可加载模块没有影响。再次强调，如果你考虑清理函数运行的时机，这完全说得通；内置驱动程序不需要清理函数，而可加载模块则需要。'
- en: 'These macros are defined in [include/linux/init.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/init.h)
    and serve to free up kernel memory. When you boot your kernel and see something
    like Freeing unused kernel memory: 236k freed, this is precisely what the kernel
    is freeing.'
  id: totrans-204
  prefs: []
  type: TYPE_NORMAL
  zh: 这些宏在 [include/linux/init.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/init.h)
    中定义，用于释放内核内存。当你启动内核并看到类似“释放未使用的内核内存：236k 释放”的消息时，这正是内核正在释放的内容。
- en: '[PRE31]'
  id: totrans-205
  prefs: []
  type: TYPE_PRE
  zh: '[PRE31]'
- en: 4.4 Licensing and Module Documentation
  id: totrans-206
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 4.4 许可和模块文档
- en: 'Honestly, who loads or even cares about proprietary modules? If you do then
    you might have seen something like this:'
  id: totrans-207
  prefs: []
  type: TYPE_NORMAL
  zh: 老实说，谁会加载甚至关心专有模块？如果你这样做，你可能见过类似这样的：
- en: '[PRE32]'
  id: totrans-208
  prefs: []
  type: TYPE_PRE
  zh: '[PRE32]'
- en: You can use one of several license strings to indicate the license for your module.
    Some examples are "GPL", "GPL v2", "GPL and additional rights", "Dual BSD/GPL", "Dual MIT/GPL",
    "Dual MPL/GPL" and "Proprietary". They are defined within [include/linux/module.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/module.h).
  id: totrans-209
  prefs: []
  type: TYPE_NORMAL
  zh: 你可以使用若干许可字符串之一来指明你模块的许可。一些例子包括 "GPL"、"GPL v2"、"GPL and additional rights"、"Dual BSD/GPL"、"Dual MIT/GPL"、"Dual
    MPL/GPL" 和 "Proprietary"。它们在 [include/linux/module.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/module.h)
    中定义。
- en: To declare which license you are using, a macro called `MODULE_LICENSE` is
    available. This and a few other macros describing the module are illustrated in
    the example below.
  id: totrans-210
  prefs: []
  type: TYPE_NORMAL
  zh: 要引用你正在使用的许可，有一个名为 `MODULE_LICENSE` 的宏可用。以下示例中展示了该宏以及其他几个描述模块的宏。
- en: '[PRE33]'
  id: totrans-211
  prefs: []
  type: TYPE_PRE
  zh: '[PRE33]'
- en: 4.5 Passing Command Line Arguments to a Module
  id: totrans-212
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 4.5 将命令行参数传递给模块
- en: Modules can take command line arguments, but not with the argc/argv you might
    be used to.
  id: totrans-213
  prefs: []
  type: TYPE_NORMAL
  zh: 模块可以接受命令行参数，但不是使用你可能习惯的 argc/argv。
- en: To allow arguments to be passed to your module, declare the variables that will
    take the values of the command line arguments as global and then use the `module_param()`
    macro (defined in [include/linux/moduleparam.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/moduleparam.h))
    to set the mechanism up. At runtime, `insmod` will fill the variables with any
    command line arguments that are given, like `insmod mymodule.ko myvariable=5`.
    The variable declarations and macros should be placed at the beginning of the
    module for clarity. The example code should clear up my admittedly lousy explanation.
  id: totrans-214
  prefs: []
  type: TYPE_NORMAL
  zh: 要允许将参数传递给你的模块，声明将接受命令行参数值的变量为全局变量，然后使用 `module_param()` 宏（在 [include/linux/moduleparam.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/moduleparam.h)
    中定义）来设置机制。在运行时，`insmod` 将填充任何给定的命令行参数到变量中，例如 `insmod mymodule.ko myvariable=5`。变量声明和宏应放置在模块的开头以提高清晰度。示例代码应该可以澄清我承认的糟糕解释。
- en: 'The `module_param()` macro takes 3 arguments: the name of the variable, its
    type and permissions for the corresponding file in sysfs. Integer types can be
    signed as usual or unsigned. If you would like to use arrays of integers or strings,
    see `module_param_array()` and `module_param_string()`.'
  id: totrans-215
  prefs: []
  type: TYPE_NORMAL
  zh: '`module_param()` 宏接受 3 个参数：变量的名称、其类型以及对应于 sysfs 中文件的权限。整数类型可以是通常的带符号整数或无符号整数。如果你想要使用整数或字符串数组，请参阅
    `module_param_array()` 和 `module_param_string()` 。'
- en: '[PRE34]'
  id: totrans-216
  prefs: []
  type: TYPE_PRE
  zh: '[PRE34]'
- en: 'Arrays are supported too, but things are a bit different now than they were
    in the olden days. To keep track of the number of parameters, you need to pass
    a pointer to a count variable as the third parameter. At your option, you could
    also ignore the count and pass `NULL` instead. We show both possibilities here:'
  id: totrans-217
  prefs: []
  type: TYPE_NORMAL
  zh: 数组也受到支持，但现在的情况与过去有些不同。为了跟踪参数的数量，您需要将计数变量的指针作为第三个参数传递。根据您的选择，您也可以忽略计数并传递`NULL`。这里展示了两种可能性：
- en: '[PRE35]'
  id: totrans-218
  prefs: []
  type: TYPE_PRE
  zh: '[PRE35]'
- en: A good use for this is to give the module variables default values, such as
    a port or I/O address. If the variables still contain the default values, then perform
    autodetection (explained elsewhere). Otherwise, keep the current value. This will
    be made clear later on.
  id: totrans-219
  prefs: []
  type: TYPE_NORMAL
  zh: 这种用法的一个好例子是设置模块变量的默认值，比如端口或I/O地址。如果变量包含默认值，则执行自动检测（在其他地方解释）。否则，保持当前值。这将在稍后说明。
- en: 'Lastly, there is a macro function, `MODULE_PARM_DESC()`, that is used to document
    arguments that the module can take. It takes two parameters: a variable name and
    a free form string describing that variable.'
  id: totrans-220
  prefs: []
  type: TYPE_NORMAL
  zh: 最后，有一个宏函数`MODULE_PARM_DESC()`，用于记录模块可以接受的参数。它接受两个参数：一个变量名和一个描述该变量的自由格式字符串。
- en: '[PRE36]'
  id: totrans-221
  prefs: []
  type: TYPE_PRE
  zh: '[PRE36]'
- en: 'It is recommended to experiment with the following code:'
  id: totrans-222
  prefs: []
  type: TYPE_NORMAL
  zh: 建议您尝试以下代码：
- en: '[PRE37]'
  id: totrans-223
  prefs: []
  type: TYPE_PRE
  zh: '[PRE37]'
- en: 4.6 Modules Spanning Multiple Files
  id: totrans-224
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 4.6 多文件跨越的模块
- en: Sometimes it makes sense to divide a kernel module between several source files.
  id: totrans-225
  prefs: []
  type: TYPE_NORMAL
  zh: 有时候，将内核模块分割成几个源文件是有意义的。
- en: Here is an example of such a kernel module.
  id: totrans-226
  prefs: []
  type: TYPE_NORMAL
  zh: 这里是一个这样的内核模块的例子。
- en: '[PRE38]'
  id: totrans-227
  prefs: []
  type: TYPE_PRE
  zh: '[PRE38]'
- en: 'The next file:'
  id: totrans-228
  prefs: []
  type: TYPE_NORMAL
  zh: 下一个文件：
- en: '[PRE39]'
  id: totrans-229
  prefs: []
  type: TYPE_PRE
  zh: '[PRE39]'
- en: 'And finally, the makefile:'
  id: totrans-230
  prefs: []
  type: TYPE_NORMAL
  zh: 最后，是makefile：
- en: '[PRE40]'
  id: totrans-231
  prefs: []
  type: TYPE_PRE
  zh: '[PRE40]'
- en: This is the complete makefile for all the examples we have seen so far. The
    first five lines are nothing special, but for the last example we will need two
    lines. First we invent an object name for our combined module, second we tell
    `make` what object files are part of that module.
  id: totrans-232
  prefs: []
  type: TYPE_NORMAL
  zh: 这是到目前为止我们所看到的所有示例的完整makefile。前五行没有什么特别之处，但为了最后的例子，我们需要两行。首先，我们为我们的组合模块发明一个对象名，然后我们告诉`make`哪些目标文件是该模块的一部分。
- en: 4.7 Building modules for a precompiled kernel
  id: totrans-233
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 4.7 为预编译内核构建模块
- en: 'Obviously, we strongly suggest that you recompile your kernel, so that you can
    enable a number of useful debugging features, such as forced module unloading
    (`MODULE_FORCE_UNLOAD`): when this option is enabled, you can force the kernel
    to unload a module even when it believes it is unsafe, via a `sudo rmmod -f module`
    command. This option can save you a lot of time and a number of reboots during
    the development of a module. If you do not want to recompile your kernel then
    you should consider running the examples within a test distribution on a virtual
    machine. If you mess anything up then you can easily reboot or restore the virtual
    machine (VM).'
  id: totrans-234
  prefs: []
  type: TYPE_NORMAL
  zh: 显然，我们强烈建议您重新编译内核，以便您可以使用许多有用的调试功能，例如强制模块卸载（`MODULE_FORCE_UNLOAD`）：当此选项启用时，您可以通过`sudo
    rmmod -f module`命令强制内核卸载模块，即使内核认为这样做不安全。此选项可以在模块开发过程中节省您大量时间和多次重启。如果您不想重新编译内核，那么您应该考虑在虚拟机上的测试发行版中运行这些示例。如果您搞砸了，您可以轻松地重启或恢复虚拟机（VM）。
- en: There are a number of cases in which you may want to load your module into a
    precompiled running kernel, such as the ones shipped with common Linux distributions,
    or a kernel you have compiled in the past. In certain circumstances you may
    need to compile and insert a module into a running kernel which you are not
    allowed to recompile, or on a machine that you prefer not to reboot. If you can’t
    think of a case that will force you to use modules for a precompiled kernel you
    might want to skip this and treat the rest of this chapter as a big footnote.
  id: totrans-235
  prefs: []
  type: TYPE_NORMAL
  zh: 在某些情况下，您可能希望将您的模块加载到预编译的运行内核中，例如与常见Linux发行版一起提供的内核，或者您过去编译的内核。在某些情况下，您可能需要编译并将模块插入到您不允许重新编译的运行内核中，或者在一个您不想重启的机器上。如果您想不出任何必须使用预编译内核模块的情况，您可能想跳过这部分，并将本章的其余部分视为一个大的脚注。
- en: 'Now, if you just install a kernel source tree, use it to compile your kernel
    module and you try to insert your module into the kernel, in most cases you would
    obtain an error as follows:'
  id: totrans-236
  prefs: []
  type: TYPE_NORMAL
  zh: 现在，如果您只是安装了一个内核源树，使用它来编译您的内核模块，并尝试将您的模块插入内核，在大多数情况下，您会得到以下错误：
- en: '[PRE41]'
  id: totrans-237
  prefs: []
  type: TYPE_PRE
  zh: '[PRE41]'
- en: 'Less cryptic information is logged to the systemd journal:'
  id: totrans-238
  prefs: []
  type: TYPE_NORMAL
  zh: 更不神秘的日志信息记录到systemd日志中：
- en: '[PRE42]'
  id: totrans-239
  prefs: []
  type: TYPE_PRE
  zh: '[PRE42]'
- en: 'In other words, your kernel refuses to accept your module because version strings
    (more precisely, version magic, see [include/linux/vermagic.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/vermagic.h))
    do not match. Incidentally, version magic strings are stored in the module object
    in the form of a static string, starting with `vermagic:`. Version data are inserted
    in your module when it is linked against the kernel/module.o file. To inspect
    version magics and other strings stored in a given module, issue the command `modinfo module.ko`:'
  id: totrans-240
  prefs: []
  type: TYPE_NORMAL
  zh: 换句话说，您的内核拒绝接受您的模块，因为版本字符串（更准确地说，版本魔法，见[include/linux/vermagic.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/vermagic.h)）不匹配。顺便提一下，版本魔法字符串以`vermagic:`开头的形式存储在模块对象中。当模块与内核/module.o文件链接时，会插入版本数据。要检查给定模块中存储的版本魔法和其他字符串，请发出命令`modinfo module.ko`：
- en: '[PRE43]'
  id: totrans-241
  prefs: []
  type: TYPE_PRE
  zh: '[PRE43]'
- en: To overcome this problem we could resort to the --force-vermagic option, but
    this solution is potentially unsafe, and unquestionably unacceptable in production
    modules. Consequently, we want to compile our module in an environment that is
    identical to the one in which our precompiled kernel was built. How to do this
    is the subject of the remainder of this chapter.
  id: totrans-242
  prefs: []
  type: TYPE_NORMAL
  zh: 为了克服这个问题，我们可以求助于--force-vermagic选项，但这种解决方案可能不安全，并且在生产模块中无疑是不可接受的。因此，我们希望在构建我们的模块时，环境与我们的预编译内核构建时的环境完全相同。如何做到这一点，是本章剩余部分的主题。
- en: 'First of all, make sure that a kernel source tree is available, having exactly
    the same version as your current kernel. Then, find the configuration file which
    was used to compile your precompiled kernel. Usually, this is available in your
    current boot directory, under a name like config-5.14.x. You may just want to
    copy it to your kernel source tree: ``cp /boot/config-`uname -r` .config`` .'
  id: totrans-243
  prefs: []
  type: TYPE_NORMAL
  zh: 首先，确保有一个与您的当前内核版本完全相同的内核源树。然后，找到用于编译您的预编译内核的配置文件。通常，这个文件位于您的当前引导目录下，名称类似于config-5.14.x。您可能只想将其复制到您的内核源树中：``cp /boot/config-`uname -r` .config``。
- en: 'Let’s focus again on the previous error message: a closer look at the version
    magic strings suggests that, even with two configuration files which are exactly
    the same, a slight difference in the version magic could be possible, and it is
    sufficient to prevent insertion of the module into the kernel. That slight difference,
    namely the custom string which appears in the module’s version magic and not in
    the kernel’s one, is due to a modification, relative to the original, in the
    makefile that some distributions ship. So examine your Makefile, and make
    sure that the specified version information matches exactly the one used for your
    current kernel. For example, your makefile could start as follows:'
  id: totrans-244
  prefs: []
  type: TYPE_NORMAL
  zh: 让我们再次关注之前的错误信息：仔细查看版本魔法的字符串表明，即使有两个完全相同的配置文件，版本魔法的微小差异也是可能的，并且足以防止模块被插入内核。这种微小差异，即出现在模块版本魔法中而不出现在内核中的自定义字符串，是由于某些发行版包含的makefile相对于原始版本的修改。然后，检查您的Makefile，并确保指定的版本信息与您当前内核使用的版本信息完全匹配。例如，您的makefile可能以以下方式开始：
- en: '[PRE44]'
  id: totrans-245
  prefs: []
  type: TYPE_PRE
  zh: '[PRE44]'
- en: In this case, you need to restore the value of symbol EXTRAVERSION to -rc2.
    We suggest keeping a backup copy of the makefile used to compile your kernel, which
    is available in /lib/modules/5.14.0-rc2/build. A simple command as follows should suffice.
  id: totrans-246
  prefs: []
  type: TYPE_NORMAL
  zh: 在这种情况下，您需要将符号EXTRAVERSION的值恢复到-rc2。我们建议保留一个备份副本的makefile，该makefile用于编译您的内核，并存储在/lib/modules/5.14.0-rc2/build中。以下简单的命令应该足够：
- en: '[PRE45]'
  id: totrans-247
  prefs: []
  type: TYPE_PRE
  zh: '[PRE45]'
- en: Here `` linux-`uname -r` `` is the Linux kernel source you are attempting to
    build.
  id: totrans-248
  prefs: []
  type: TYPE_NORMAL
  zh: 这里 `` linux-`uname -r` `` 是您试图构建的Linux内核源代码。
- en: 'Now, please run `make` to update configuration and version headers and objects:'
  id: totrans-249
  prefs: []
  type: TYPE_NORMAL
  zh: 现在，请运行`make`以更新配置和版本头文件和对象：
- en: '[PRE46]'
  id: totrans-250
  prefs: []
  type: TYPE_PRE
  zh: '[PRE46]'
- en: 'If you do not want to actually compile the kernel, you can interrupt the
    build process (CTRL-C) just after the SPLIT line, because at that time, the files
    you need are ready. Now you can turn back to the directory of your module and
    compile it: it will be built exactly according to your current kernel settings,
    and it will load into it without any errors.'
  id: totrans-251
  prefs: []
  type: TYPE_NORMAL
  zh: 如果您不想实际编译内核，可以在SPLIT行之后中断构建过程（CTRL-C），因为那时您需要的文件已经准备好了。现在您可以回到您的模块目录并编译它：它将根据您当前的内核设置精确构建，并且可以无错误地加载到内核中。
- en: 5 Preliminaries
  id: totrans-252
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 5 初步
- en: 5.1 How modules begin and end
  id: totrans-253
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 5.1 模块的开始和结束
- en: A typical program starts with a `main()` function, executes a series of instructions,
    and terminates after completing these instructions. Kernel modules, however, follow
    a different pattern. A module always begins with either the `init_module` function
    or a function designated by the `module_init` macro. This function acts as the
    module’s entry point, informing the kernel of the module’s functionalities and
    preparing the kernel to utilize the module’s functions when necessary. After performing
    these tasks, the entry function returns, and the module remains inactive until
    the kernel requires its code.
  id: totrans-254
  prefs: []
  type: TYPE_NORMAL
  zh: 典型的程序从`main()`函数开始，执行一系列指令，并在完成这些指令后终止。然而，内核模块遵循不同的模式。模块总是以`init_module`函数或由`module_init`调用指定的函数开始。这个函数作为模块的入口点，向内核告知模块的功能，并准备内核在需要时利用模块的函数。完成这些任务后，入口函数返回，模块保持不活跃状态，直到内核需要其代码。
- en: All modules conclude by invoking either `cleanup_module` or a function specified
    through the `module_exit` macro. This serves as the module’s exit function, reversing
    the actions of the entry function by unregistering the previously registered functionalities.
  id: totrans-255
  prefs: []
  type: TYPE_NORMAL
  zh: 所有模块都以调用`cleanup_module`或通过`module_exit`调用指定的函数结束。这作为模块的出口函数，通过注销之前注册的功能来反转入口函数的操作。
- en: It is mandatory for every module to have both an entry and an exit function.
    While there are multiple methods to define these functions, the terms “entry function”
    and “exit function” are generally used. However, they may occasionally be referred
    to as `init_module` and `cleanup_module`, which are understood to mean the same.
  id: totrans-256
  prefs: []
  type: TYPE_NORMAL
  zh: 每个模块都必须有一个入口函数和一个出口函数。虽然定义这些函数有多种方法，但通常使用“入口函数”和“出口函数”这两个术语。然而，它们有时也可能被称为`init_module`和`cleanup_module`，这些术语都被理解为具有相同的意思。
- en: 5.2 Functions available to modules
  id: totrans-257
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 5.2 模块可用的函数
- en: Programmers use functions they do not define all the time. A prime example of
    this is `printf()`. Such library functions are provided by the
    standard C library, libc. The definitions for these functions do not actually
    enter your program until the linking stage, which ensures that the code (for `printf()`
    for example) is available, and fixes the call instruction to point to that code.
  id: totrans-258
  prefs: []
  type: TYPE_NORMAL
  zh: 程序员经常使用他们自己没有定义的函数。`printf()`就是这样一个典型的例子。你使用的是由标准C库libc提供的库函数。这些函数的定义实际上直到链接阶段才进入你的程序，这确保了代码（例如`printf()`的代码）可用，并固定了调用指令以指向该代码。
- en: Kernel modules are different here, too. In the hello world example, you might
    have noticed that we used a function, `pr_info()` but did not include a standard
    I/O library. That is because modules are object files whose symbols get resolved
    upon running `insmod` or `modprobe` . The definition for the symbols comes from
    the kernel itself; the only external functions you can use are the ones provided
    by the kernel. If you’re curious about what symbols have been exported by your
    kernel, take a look at /proc/kallsyms.
  id: totrans-259
  prefs: []
  type: TYPE_NORMAL
  zh: 内核模块在这方面也有所不同。在“hello world”示例中，你可能已经注意到我们使用了一个函数`pr_info()`，但没有包含标准I/O库。这是因为模块是对象文件，其符号在运行`insmod`或`modprobe`时得到解析。符号的定义来自内核本身；你可以使用的唯一外部函数是内核提供的函数。如果你对内核导出的符号感兴趣，可以查看/proc/kallsyms。
- en: One point to keep in mind is the difference between library functions and system
    calls. Library functions are higher level, run completely in user space and provide
    a more convenient interface for the programmer to the functions that do the real
    work — system calls. System calls run in kernel mode on the user’s behalf and
    are provided by the kernel itself. The library function `printf()` may look like
    a very general printing function, but all it really does is format the data into
    strings and write the string data using the low-level system call `write()` ,
    which then sends the data to standard output.
  id: totrans-260
  prefs: []
  type: TYPE_NORMAL
  zh: 需要注意的一个问题是库函数和系统调用的区别。库函数是高级的，完全在用户空间运行，并为程序员提供了对执行实际工作的函数（即系统调用）的更方便的接口。系统调用在用户代表下以内核模式运行，并由内核本身提供。库函数`printf()`可能看起来是一个非常通用的打印函数，但实际上它只是将数据格式化为字符串，并使用低级系统调用`write()`将字符串数据写入，然后发送到标准输出。
- en: 'Would you like to see what system calls are made by `printf()` ? It is easy!
    Compile the following program:'
  id: totrans-261
  prefs: []
  type: TYPE_NORMAL
  zh: 你想看看`printf()`做了哪些系统调用吗？这很简单！编译以下程序：
- en: '[PRE47]'
  id: totrans-262
  prefs: []
  type: TYPE_PRE
  zh: '[PRE47]'
- en: with `gcc -Wall -o hello hello.c` . Run the executable with `strace ./hello`
    . Are you impressed? Every line you see corresponds to a system call. [strace](https://strace.io/)
    is a handy program that gives you details about what system calls a program is
    making, including which call is made, what its arguments are and what it returns.
    It is an invaluable tool for figuring out things like what files a program is
    trying to access. Towards the end, you will see a line which looks like `write(1, "hello", 5hello)`
    (the trailing `hello` is the program’s own output, interleaved with the trace).
    There it is. The face behind the `printf()` mask. You may not be familiar with
    write, since most people use library functions for file I/O (like `fopen` , `fputs`
    , `fclose` ). If that is the case, try looking at man 2 write. The 2nd man section
    is devoted to system calls (like `kill()` and `read()` ). The 3rd man section
    is devoted to library calls, which you would probably be more familiar with (like
    `cosh()` and `random()` ).
  id: totrans-263
  prefs: []
  type: TYPE_NORMAL
  zh: 使用 `gcc -Wall -o hello hello.c` 编译。使用 `strace ./hello` 运行可执行文件。你感到惊讶了吗？你看到的每一行都对应一个系统调用。[strace](https://strace.io/)
    是一个方便的程序，它可以提供关于程序正在执行哪些系统调用的详细信息，包括哪个调用被执行、它的参数是什么以及它返回了什么。它是确定诸如程序试图访问哪些文件之类的信息的一个无价工具。在最后，你会看到一行看起来像
    `write(1, "hello", 5hello)` 的内容。就在那里。`printf()` 面具背后的面孔。你可能不熟悉 `write`，因为大多数人使用库函数进行文件
    I/O（如 `fopen`、`fputs`、`fclose`）。如果是这种情况，试着查看 man 2 write。第2个 man 部分（man section）是专门关于系统调用的（如
    `kill()` 和 `read()`）。第3个 man 部分是关于库调用的，你可能更熟悉（如 `cosh()` 和 `random()`）。
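- en: |-
    As a minimal sketch of what sits underneath `printf()`, the following user space program skips the C library formatting layer and calls `write()` directly. The helper name `say_hello` is made up for this illustration and is not part of the guide’s sources.

    ```c
    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical helper: hand five bytes straight to the write() system call. */
    static ssize_t say_hello(void)
    {
        static const char msg[] = "hello";

        /* File descriptor 1 is standard output (STDOUT_FILENO). */
        return write(STDOUT_FILENO, msg, sizeof(msg) - 1);
    }

    int main(void)
    {
        ssize_t written = say_hello();

        if (written < 0) {
            perror("write");
            return 1;
        }
        return 0;
    }
    ```

    Running this under `strace` should show a `write(1, "hello", 5)` call much like the one issued on behalf of `printf()`.
  id: totrans-263-a
  prefs: []
  type: TYPE_NORMAL
  zh: |-
    下面是一个最小示意：这个用户空间程序绕过C库的格式化层，直接调用`write()`。辅助函数名`say_hello`是为本示例虚构的，并非指南源码的一部分。在`strace`下运行它，应该能看到与`printf()`背后类似的`write(1, "hello", 5)`调用。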
- en: You can even write modules to replace the kernel’s system calls, which we will
    do shortly. Crackers often make use of this sort of thing for backdoors or trojans,
    but you can write your own modules to do more benign things, like have the kernel
    log a message whenever someone attempts to delete a file on your system.
  id: totrans-264
  prefs: []
  type: TYPE_NORMAL
  zh: 你甚至可以编写模块来替换内核的系统调用，我们很快就会这样做。黑客经常利用这类东西来创建后门或特洛伊木马，但你也可以编写自己的模块来做更无害的事情，比如当有人试图删除你系统上的文件时，让内核记录一条消息。
- en: 5.3 User Space vs Kernel Space
  id: totrans-265
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 5.3 用户空间与内核空间
- en: The kernel primarily manages access to resources, be it a video card, hard drive,
    or memory. Programs frequently vie for the same resources. For instance, as a
    document is saved, updatedb might commence updating the locate database. Sessions
    in editors like vim and processes like updatedb can simultaneously utilize the
    hard drive. The kernel’s role is to maintain order, ensuring that users do not
    access resources indiscriminately.
  id: totrans-266
  prefs: []
  type: TYPE_NORMAL
  zh: 内核主要管理对资源的访问，无论是显卡、硬盘还是内存。程序经常争夺相同的资源。例如，当文档被保存时，updatedb 可能开始更新 locate 数据库。在
    vim 等编辑器中的会话和 updatedb 等进程可以同时使用硬盘。内核的作用是维持秩序，确保用户不会无差别地访问资源。
- en: 'To manage this, CPUs operate in different modes, each offering varying levels
    of system control. The Intel 80386 architecture, for example, featured four such
    modes, known as rings. Unix, however, utilizes only two of these rings: the most
    privileged ring (ring 0, also known as “supervisor mode”, where all actions are
    permissible) and the least privileged ring (ring 3), referred to as “user mode”.'
  id: totrans-267
  prefs: []
  type: TYPE_NORMAL
  zh: 为了管理这一点，CPU 在不同的模式下运行，每个模式提供不同级别的系统控制。例如，Intel 80386 架构具有四种这样的模式，被称为环。然而，Unix
    只利用了这些环中的两个：最高环（ring 0，也称为“管理程序模式”，在这里所有操作都是允许的）和最低环，被称为“用户模式”。
- en: Recall the discussion about library functions vs system calls. Typically, you
    use a library function in user mode. The library function calls one or more system
    calls, and these system calls execute on the library function’s behalf, but do
    so in supervisor mode since they are part of the kernel itself. Once the system
    call completes its task, it returns and execution gets transferred back to user
    mode.
  id: totrans-268
  prefs: []
  type: TYPE_NORMAL
  zh: 回想一下关于库函数与系统调用的讨论。通常，你在用户模式下使用库函数。库函数调用一个或多个系统调用，这些系统调用代表库函数执行，但它们在内核本身的部分以管理程序（supervisor
    mode）执行。一旦系统调用完成其任务，它就会返回，执行控制权就会转回到用户模式。
- en: 5.4 Name Space
  id: totrans-269
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 5.4 命名空间
- en: When you write a small C program, you use variables which are convenient and
    make sense to the reader. If, on the other hand, you are writing routines which
    will be part of a bigger program, any global variables you have are part of a
    community of other people’s global variables; some of the variable names can clash.
    When a program has lots of global variables which aren’t meaningful enough to
    be distinguished, you get namespace pollution. In large projects, effort must
    be made to remember reserved names, and to find ways to develop a scheme for naming
    unique variable names and symbols.
  id: totrans-270
  prefs: []
  type: TYPE_NORMAL
  zh: 当你编写一个小型C程序时，你会使用方便且对读者有意义的变量。另一方面，如果你正在编写将成为更大问题一部分的例程，你拥有的任何全局变量都是其他人的全局变量社区的一部分；一些变量名可能会冲突。当一个程序有很多没有足够意义来区分的全局变量时，你会得到命名空间污染。在大型项目中，必须努力记住保留的名称，并找到开发命名唯一变量名和符号方案的方法。
- en: When writing kernel code, even the smallest module will be linked against the
    entire kernel, so this is definitely an issue. The best way to deal with this
    is to declare all your variables as static and to use a well-defined prefix for
    your symbols. By convention, all kernel prefixes are lowercase. If you do not
    want to declare everything as static, another option is to declare a symbol table
    and register it with the kernel. We will get to this later.
  id: totrans-271
  prefs: []
  type: TYPE_NORMAL
  zh: 当编写内核代码时，即使是体积最小的模块也会与整个内核链接，所以这确实是一个问题。处理这个问题的最好方法是声明所有变量为静态的，并为你的符号使用一个定义良好的前缀。按照惯例，所有内核前缀都是小写的。如果你不想将所有内容都声明为静态的，另一个选项是声明一个符号表并将其注册到内核中。我们稍后会讨论这个问题。
- en: The file /proc/kallsyms holds all the symbols that the kernel knows about and
    which are therefore accessible to your modules since they share the kernel’s codespace.
  id: totrans-272
  prefs: []
  type: TYPE_NORMAL
  zh: 文件/proc/kallsyms包含了内核所知道的所有符号，因此这些符号可以通过你的模块访问，因为它们共享内核的代码空间。
- en: 5.5 Code space
  id: totrans-273
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 5.5 代码空间
- en: Memory management is a very complicated subject and the majority of O’Reilly’s
    [Understanding The Linux Kernel](https://www.oreilly.com/library/view/understanding-the-linux/0596005652/)
    is devoted to memory management! We are not setting out to be experts on
    memory management, but we do need to know a couple of facts to even begin worrying
    about writing real modules.
  id: totrans-274
  prefs: []
  type: TYPE_NORMAL
  zh: 内存管理是一个非常复杂的话题，O'Reilly的[《理解Linux内核》](https://www.oreilly.com/library/view/understanding-the-linux/0596005652/)一书专门涵盖了内存管理！我们并不是要成为内存管理方面的专家，但我们确实需要了解一些事实，才能开始担心编写真正的模块。
- en: If you have not thought about what a segfault really means, you may be surprised
    to hear that pointers do not actually point to memory locations. Not real ones,
    anyway. When a process is created, the kernel sets aside a portion of real physical
    memory and hands it to the process to use for its executing code, variables, stack,
    heap and other things which a computer scientist would know about. This memory
    begins with 0x00000000 and extends up to whatever it needs to be. Since the memory
    space for any two processes does not overlap, every process that can access a
    memory address, say 0xbffff978, would be accessing a different location in real
    physical memory! The processes would be accessing an index named 0xbffff978 which
    points to some kind of offset into the region of memory set aside for that particular
    process. For the most part, a process like our Hello, World program cannot access
    the space of another process, although there are ways which we will talk about
    later.
  id: totrans-275
  prefs: []
  type: TYPE_NORMAL
  zh: 如果你没有想过段错误（segfault）真正意味着什么，你可能会惊讶地听到指针实际上并不指向内存位置。至少不是真正的内存位置。当创建一个进程时，内核会为其实际物理内存分配一部分，并将其交给进程用于执行代码、变量、堆栈、堆和其他计算机科学家会了解的东西。这段内存从0x00000000开始，扩展到所需的任何位置。由于任何两个进程的内存空间都不会重叠，因此任何可以访问内存地址（例如0xbffff978）的进程都会访问实际物理内存中的不同位置！进程会访问一个名为0xbffff978的索引，该索引指向为该特定进程保留的内存区域中的某种偏移量。在大多数情况下，像我们的Hello,
    World程序这样的进程无法访问另一个进程的空间，尽管我们稍后会讨论一些方法。
- en: The kernel has its own space of memory as well. Since a module is code which
    can be dynamically inserted and removed in the kernel (as opposed to a semi-autonomous
    object), it shares the kernel’s codespace rather than having its own. Therefore,
    if your module segfaults, the kernel segfaults. And if you start writing over
    data because of an off-by-one error, then you’re trampling on kernel data (or
    code). This is even worse than it sounds, so try your best to be careful.
  id: totrans-276
  prefs: []
  type: TYPE_NORMAL
  zh: 内核也有自己的内存空间。由于模块是可以在内核中动态插入和删除的代码（与半自主对象相反），它共享内核的代码空间，而不是拥有自己的。因此，如果你的模块发生段错误，内核也会发生段错误。如果你因为偏移量错误而开始覆盖数据，那么你就是在践踏内核数据（或代码）。这比听起来更糟糕，所以请务必小心。
- en: It should be noted that the aforementioned discussion applies to any operating
    system utilizing a monolithic kernel. This concept differs slightly from “building
    all your modules into the kernel”, although the underlying principle is similar.
    In contrast, there are microkernels, where modules are allocated their own code
    space. Two notable examples of microkernels include the [GNU Hurd](https://www.gnu.org/software/hurd/)
    and the [Zircon kernel](https://fuchsia.dev/fuchsia-src/concepts/kernel) of Google’s
    Fuchsia.
  id: totrans-277
  prefs: []
  type: TYPE_NORMAL
  zh: 应当注意，上述讨论适用于任何使用单一内核的操作系统。这个概念与“将所有模块构建到内核中”略有不同，尽管其基本原理相似。相比之下，还有微内核，其中模块分配了自己的代码空间。两个著名的微内核例子包括[GNU
    Hurd](https://www.gnu.org/software/hurd/)和谷歌Fuchsia的[Zircon内核](https://fuchsia.dev/fuchsia-src/concepts/kernel)。
- en: 5.6 Device Drivers
  id: totrans-278
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 5.6 设备驱动程序
- en: One class of module is the device driver, which provides functionality for hardware
    like a serial port. On Unix, each piece of hardware is represented by a file located
    in /dev named a device file which provides the means to communicate with the hardware.
    The device driver provides the communication on behalf of a user program. So the
    es1370.ko sound card device driver might connect the /dev/sound device file to
    the Ensoniq ES1370 sound card. A userspace program like mp3blaster can use /dev/sound
    without ever knowing what kind of sound card is installed.
  id: totrans-279
  prefs: []
  type: TYPE_NORMAL
  zh: 模块的一种类型是设备驱动程序，它为串行端口等硬件提供功能。在Unix系统中，每一块硬件都由位于/dev目录下的一个文件表示，该文件被称为设备文件，它提供了与硬件通信的手段。设备驱动程序代表用户程序进行通信。因此，es1370.ko声卡设备驱动程序可能会将/dev/sound设备文件连接到Ensoniq
    ES1370声卡。像mp3blaster这样的用户空间程序可以使用/dev/sound，而无需知道安装了什么类型的声卡。
- en: 'Let’s look at some device files. Here are device files which represent the
    first three partitions on the primary SCSI storage devices:'
  id: totrans-280
  prefs: []
  type: TYPE_NORMAL
  zh: 让我们来看看一些设备文件。以下是一些代表主SCSI存储设备上前三个分区的设备文件：
- en: '[PRE48]'
  id: totrans-281
  prefs: []
  type: TYPE_PRE
  zh: '[PRE48]'
- en: Notice the column of numbers separated by a comma. The first number is called
    the device’s major number. The second number is the minor number. The major number
    tells you which driver is used to access the hardware. Each driver is assigned
    a unique major number; all device files with the same major number are controlled
    by the same driver. All the above major numbers are 8, because they’re all controlled
    by the same driver.
  id: totrans-282
  prefs: []
  type: TYPE_NORMAL
  zh: 注意到由逗号分隔的数字列。第一个数字被称为设备的major号。第二个数字是minor号。major号告诉你使用哪个驱动程序来访问硬件。每个驱动程序都被分配了一个唯一的major号；所有具有相同major号的设备文件都由同一个驱动程序控制。所有上述major号都是8，因为它们都由同一个驱动程序控制。
- en: The minor number is used by the driver to distinguish between the various hardware
    it controls. Returning to the example above, although all three devices are handled
    by the same driver they have unique minor numbers because the driver sees them
    as being different pieces of hardware.
  id: totrans-283
  prefs: []
  type: TYPE_NORMAL
  zh: 次设备号由驱动程序用于区分它所控制的多种硬件。回到上面的例子，尽管这三个设备都由同一个驱动程序处理，但它们具有不同的次设备号，因为驱动程序将它们视为不同的硬件。
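- en: |-
    The major/minor pair is packed into a single device number. As a small user space sketch, glibc offers the `makedev`, `major` and `minor` macros in `<sys/sysmacros.h>` for this; kernel code uses the analogous `MKDEV`, `MAJOR` and `MINOR` macros. The values 8 and 3 below are chosen only to match the style of the listing above.

    ```c
    #include <stdio.h>
    #include <sys/sysmacros.h>
    #include <sys/types.h>

    int main(void)
    {
        /* Pack major 8 and minor 3 into one device number. */
        dev_t dev = makedev(8, 3);

        /* Split the device number back into its two halves. */
        printf("major=%u minor=%u\n", major(dev), minor(dev));
        return 0;
    }
    ```
  id: totrans-283-a
  prefs: []
  type: TYPE_NORMAL
  zh: |-
    主设备号和次设备号被打包成一个设备号。作为一个简单的用户空间示意，glibc在`<sys/sysmacros.h>`中提供了`makedev`、`major`和`minor`宏；内核代码使用类似的`MKDEV`、`MAJOR`和`MINOR`宏。下面的8和3只是为了与上面的列表风格一致而选取的值。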
- en: 'Devices are divided into two types: character devices and block devices. The
    difference is that block devices have a buffer for requests, so they can choose
    the best order in which to respond to the requests. This is important in the case
    of storage devices, where it is faster to read or write sectors which are close
    to each other, rather than those which are further apart. Another difference is
    that block devices can only accept input and return output in blocks (whose size
    can vary according to the device), whereas character devices are allowed to use
    as many or as few bytes as they like. Most devices in the world are character,
    because they don’t need this type of buffering, and they don’t operate with a
    fixed block size. You can tell whether a device file is for a block device or
    a character device by looking at the first character in the output of `ls -l`
    . If it is ‘b’ then it is a block device, and if it is ‘c’ then it is a character
    device. The devices you see above are block devices. Here are some character devices
    (the serial ports):'
  id: totrans-284
  prefs: []
  type: TYPE_NORMAL
  zh: 设备分为两种类型：字符设备和块设备。区别在于块设备有一个请求缓冲区，因此它们可以选择最佳顺序来响应请求。这在存储设备的情况下很重要，因为读取或写入相邻扇区比读取或写入较远扇区要快。另一个区别是，块设备只能以块（其大小可以按设备变化）的形式接受输入并返回输出，而字符设备则允许使用任意多或少的字节。世界上大多数设备都是字符设备，因为它们不需要这种类型的缓冲，并且它们不使用固定块大小操作。你可以通过查看`ls -l`输出的第一个字符来判断设备文件是块设备还是字符设备。如果是‘b’，则它是块设备；如果是‘c’，则它是字符设备。你上面看到的设备是块设备。以下是一些字符设备（串行端口）：
- en: '[PRE49]'
  id: totrans-285
  prefs: []
  type: TYPE_PRE
  zh: '[PRE49]'
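- en: |-
    The same information that `ls -l` prints can be read programmatically. The sketch below, assuming a Linux system with glibc, uses `stat()` together with the `S_ISBLK` and `S_ISCHR` macros to classify a device file; the helper `device_kind` is a hypothetical name introduced here.

    ```c
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/sysmacros.h>

    /* Hypothetical helper: returns 'b', 'c', '-' (not a device) or '?' on error. */
    static char device_kind(const char *path, unsigned int *mj, unsigned int *mn)
    {
        struct stat st;

        if (stat(path, &st) != 0)
            return '?';

        /* st_rdev holds the device number that a device file refers to. */
        *mj = major(st.st_rdev);
        *mn = minor(st.st_rdev);

        if (S_ISBLK(st.st_mode))
            return 'b';
        if (S_ISCHR(st.st_mode))
            return 'c';
        return '-';
    }

    int main(int argc, char *argv[])
    {
        const char *path = argc > 1 ? argv[1] : "/dev/null";
        unsigned int mj = 0, mn = 0;
        char kind = device_kind(path, &mj, &mn);

        printf("%s: kind=%c major=%u minor=%u\n", path, kind, mj, mn);
        return kind == '?';
    }
    ```

    Passing a path such as `/dev/sda1` or `/dev/ttyS0` on the command line would report the type and numbers for that file, provided it exists on the machine.
  id: totrans-285-a
  prefs: []
  type: TYPE_NORMAL
  zh: |-
    `ls -l`显示的信息也可以通过程序读取。下面的示意假设运行在带有glibc的Linux系统上，使用`stat()`以及`S_ISBLK`和`S_ISCHR`宏来判断设备文件的类型；辅助函数`device_kind`是这里引入的虚构名称。在命令行上传入`/dev/sda1`或`/dev/ttyS0`之类的路径，即可报告该文件的类型和设备号（前提是该文件在本机上存在）。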
- en: If you want to see which major numbers have been assigned, you can look at [Documentation/admin-guide/devices.txt](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/admin-guide/devices.txt).
  id: totrans-286
  prefs: []
  type: TYPE_NORMAL
  zh: 如果你想查看已分配的主编号，你可以查看[Documentation/admin-guide/devices.txt](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/admin-guide/devices.txt)。
- en: When the system was installed, all of those device files were created by the
    `mknod` command. To create a new char device named coffee with major/minor number
    12 and 2, simply do `mknod /dev/coffee c 12 2` . You do not have to put your device
    files into /dev, but it is done by convention. Linus put his device files in /dev,
    and so should you. However, when creating a device file for testing purposes,
    it is probably OK to place it in your working directory where you compile the
    kernel module. Just be sure to put it in the right place when you’re done writing
    the device driver.
  id: totrans-287
  prefs: []
  type: TYPE_NORMAL
  zh: 当系统安装时，所有这些设备文件都是由`mknod`命令创建的。要创建一个名为coffee的字符设备，其主/次编号为12和2，只需执行`mknod /dev/coffee c 12 2`。你不必将设备文件放入/dev，但按照惯例是这样做的。林纳斯把他的设备文件放在/dev，你也应该这样做。然而，当为测试目的创建设备文件时，将其放在编译内核模块的工作目录中可能没问题。只是确保在完成设备驱动程序编写后将其放在正确的位置。
- en: A few final points, although implicit in the previous discussion, are worth
    stating explicitly for clarity. When a device file is accessed, the kernel utilizes
    the file’s major number to identify the appropriate driver for handling the access.
    This indicates that the kernel does not necessarily rely on or need to be aware
    of the minor number. It is the driver that concerns itself with the minor number,
    using it to differentiate between various pieces of hardware.
  id: totrans-288
  prefs: []
  type: TYPE_NORMAL
  zh: 虽然在之前的讨论中是隐含的，但以下几点值得明确指出，以增强清晰度。当一个设备文件被访问时，内核利用文件的major编号来识别处理访问的适当驱动程序。这表明内核不一定依赖于或需要知道次编号。是驱动程序关心次编号，并使用它来区分不同的硬件部件。
- en: 'It is important to note that when referring to “hardware”, the term is used
    in a slightly more abstract sense than just a physical PCI card that can be held
    in hand. Consider the following two device files:'
  id: totrans-289
  prefs: []
  type: TYPE_NORMAL
  zh: 需要注意的是，当提到“硬件”时，这个术语的使用比仅仅指可以手持的物理PCI卡要抽象一些。考虑以下两个设备文件：
- en: '[PRE50]'
  id: totrans-290
  prefs: []
  type: TYPE_PRE
  zh: '[PRE50]'
- en: By now you can look at these two device files and know instantly that they are
    block devices and are handled by same driver (block major 8). Sometimes two device
    files with the same major but different minor number can actually represent the
    same piece of physical hardware. So just be aware that the word “hardware” in
    our discussion can mean something very abstract.
  id: totrans-291
  prefs: []
  type: TYPE_NORMAL
  zh: 到现在为止，你可以查看这两个设备文件并立即知道它们是块设备，并由相同的驱动程序处理（块主编号8）。有时，具有相同major编号但不同minor编号的两个设备文件实际上可以代表同一块物理硬件。所以请注意，我们讨论中的“硬件”一词可以指一个非常抽象的概念。
- en: 6 Character Device drivers
  id: totrans-292
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 6 字符设备驱动程序
- en: 6.1 The file_operations Structure
  id: totrans-293
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 6.1 文件操作结构体
- en: The `file_operations` structure is defined in [include/linux/fs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/fs.h),
    and holds pointers to functions defined by the driver that perform various operations
    on the device. Each field of the structure corresponds to the address of some
    function defined by the driver to handle a requested operation.
  id: totrans-294
  prefs: []
  type: TYPE_NORMAL
  zh: '`file_operations` 结构体定义在 [include/linux/fs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/fs.h)，并持有指向驱动程序定义的执行各种设备操作的函数的指针。结构体的每个字段对应于驱动程序定义的用于处理请求操作的函数的地址。'
- en: 'For example, every character driver needs to define a function that reads from
    the device. The `file_operations` structure holds the address of the module’s
    function that performs that operation. Here is what the definition looks like
    for kernel 5.4 and later versions:'
  id: totrans-295
  prefs: []
  type: TYPE_NORMAL
  zh: 例如，每个字符驱动程序都需要定义一个从设备读取数据的函数。`file_operations` 结构体持有执行该操作的模块函数的地址。以下是内核 5.4
    及以后版本的定义示例：
- en: '[PRE51]'
  id: totrans-296
  prefs: []
  type: TYPE_PRE
  zh: '[PRE51]'
- en: Some operations are not implemented by a driver. For example, a driver that
    handles a video card will not need to read from a directory structure. The corresponding
    entries in the `file_operations` structure should be set to `NULL` . [¹](#fn1x0)
  id: totrans-297
  prefs: []
  type: TYPE_NORMAL
  zh: 一些操作不是由驱动程序实现的。例如，处理显卡的驱动程序不需要从目录结构中读取。`file_operations` 结构中的相应条目应设置为 `NULL`。[¹](#fn1x0)
- en: 'There is a gcc extension that makes assigning to this structure more convenient.
    You will see it in modern drivers, and it may catch you by surprise. This is what
    the new way of assigning to the structure looks like:'
  id: totrans-298
  prefs: []
  type: TYPE_NORMAL
  zh: 存在一个 gcc 扩展，使得向该结构体赋值更加方便。你会在现代驱动程序中看到它，可能会让你感到惊讶。这是向结构体赋值的新方法的样子：
- en: '[PRE52]'
  id: totrans-299
  prefs: []
  type: TYPE_PRE
  zh: '[PRE52]'
- en: 'However, there is also a C99 way of assigning to elements of a structure, [designated
    initializers](https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html), and this
    is definitely preferred over using the GNU extension. You should use this syntax
    in case someone wants to port your driver. It will help with compatibility:'
  id: totrans-300
  prefs: []
  type: TYPE_NORMAL
  zh: 然而，C99 标准中也有一种给结构体元素赋值的方法，称为[指定初始化器](https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html)，这比使用
    GNU 扩展更受欢迎。如果你希望有人移植你的驱动程序，你应该使用这种语法。这将有助于兼容性：
- en: '[PRE53]'
  id: totrans-301
  prefs: []
  type: TYPE_PRE
  zh: '[PRE53]'
- en: The meaning is clear, and you should be aware that any member of the structure
    which you do not explicitly assign will be initialized to `NULL` by gcc.
  id: totrans-302
  prefs: []
  type: TYPE_NORMAL
  zh: 意义很明确，你应该知道，结构体中任何未明确赋值的成员将由 gcc 初始化为 `NULL`。
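- en: |-
    The zeroing of unnamed members can be checked in user space. The sketch below uses a made-up `struct demo_ops` as a stand-in for `struct file_operations` (the real structure has many more members and different signatures); it only illustrates that members left out of a designated initializer end up as null pointers, which the C standard guarantees for any conforming compiler.

    ```c
    #include <stddef.h>
    #include <stdio.h>

    /* Stand-in for illustration only, not the kernel's struct file_operations. */
    struct demo_ops {
        int (*open)(void);
        int (*read)(void);
        int (*write)(void);
        int (*release)(void);
    };

    static int demo_open(void) { return 0; }
    static int demo_read(void) { return 0; }

    /* Only two members are named; the remaining ones are implicitly zeroed. */
    static const struct demo_ops ops = {
        .read = demo_read,
        .open = demo_open,
    };

    int main(void)
    {
        printf("write handler set: %s\n", ops.write ? "yes" : "no");
        printf("release handler set: %s\n", ops.release ? "yes" : "no");
        return 0;
    }
    ```

    Note that the initializers may appear in any order, which is part of what makes this syntax robust when members are added to or reordered within the structure.
  id: totrans-302-a
  prefs: []
  type: TYPE_NORMAL
  zh: |-
    未命名成员被清零这一点可以在用户空间验证。下面的示意使用虚构的`struct demo_ops`代替`struct file_operations`（真正的结构体成员更多，函数签名也不同）；它只说明指定初始化器中未列出的成员会成为空指针，这是C标准对任何符合标准的编译器所保证的。初始化项可以按任意顺序出现，因此当结构体增加成员或调整成员顺序时，这种语法依然可靠。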
- en: An instance of `struct file_operations` containing pointers to functions that
    are used to implement `read` , `write` , `open` , … system calls is commonly named
    `fops` .
  id: totrans-303
  prefs: []
  type: TYPE_NORMAL
  zh: 包含指向用于实现 `read`、`write`、`open` 等系统调用函数的指针的 `struct file_operations` 实例通常命名为
    `fops`。
- en: Since Linux v3.14, the read, write and seek operations are guaranteed to be
    thread-safe through a lock specific to `f_pos`, which makes updates to the file
    position mutually exclusive. So, we can safely implement those operations without
    unnecessary locking.
  id: totrans-304
  prefs: []
  type: TYPE_NORMAL
  zh: 自 Linux v3.14 版本以来，通过使用 `f_pos` 特定锁来保证读取、写入和查找操作是线程安全的，这使得文件位置更新成为互斥操作。因此，我们可以安全地实现这些操作，而无需不必要的锁定。
- en: Additionally, since Linux v5.6, the `proc_ops` structure was introduced to replace
    the use of the `file_operations` structure when registering proc handlers. See
    more information in the [Section 7.1](#the-procops-structure).
  id: totrans-305
  prefs: []
  type: TYPE_NORMAL
  zh: 此外，自 Linux v5.6 版本以来，引入了 `proc_ops` 结构来替代注册 proc 处理器时使用 `file_operations` 结构。更多详细信息请参阅[第
    7.1 节](#the-procops-structure)。
- en: 6.2 The file structure
  id: totrans-306
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 6.2 文件结构
- en: Each device is represented in the kernel by a file structure, which is defined
    in [include/linux/fs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/fs.h).
    Be aware that a file is a kernel level structure and never appears in a user space
    program. It is not the same thing as a `FILE` , which is defined by glibc and
    would never appear in a kernel space function. Also, its name is a bit misleading;
    it represents an abstract open ‘file’, not a file on a disk, which is represented
    by a structure named `inode` .
  id: totrans-307
  prefs: []
  type: TYPE_NORMAL
  zh: 每个设备在内核中通过文件结构体表示，该结构体定义在 [include/linux/fs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/fs.h)。请注意，文件是一个内核级结构体，永远不会出现在用户空间程序中。它不同于由
    glibc 定义的 `FILE`，后者永远不会出现在内核空间函数中。此外，它的名称有点误导；它代表一个抽象的打开“文件”，而不是磁盘上的文件，磁盘上的文件由名为
    `inode` 的结构体表示。
- en: An instance of struct file is commonly named `filp` . You’ll also see it referred
    to as a struct file object. Resist the temptation.
  id: totrans-308
  prefs: []
  type: TYPE_NORMAL
  zh: struct file 的实例通常命名为 `filp`。你也会看到它被称作 struct file 对象。请抵制这种诱惑。
- en: Go ahead and look at the definition of file. Most of the entries you see, like
    struct dentry, are not used by device drivers, and you can ignore them. This is
    because drivers do not fill file directly; they only use structures contained
    in file which are created elsewhere.
  id: totrans-309
  prefs: []
  type: TYPE_NORMAL
  zh: 继续查看文件的定义。您看到的大部分条目，如struct dentry，都不被设备驱动程序使用，您可以忽略它们。这是因为驱动程序不会直接填充文件；它们只使用文件中包含的结构，这些结构是在其他地方创建的。
- en: 6.3 Registering A Device
  id: totrans-310
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 6.3 注册设备
- en: As discussed earlier, char devices are accessed through device files, usually
    located in /dev. This is by convention. When writing a driver, it is OK to put
    the device file in your current directory. Just make sure you place it in /dev
    for a production driver. The major number tells you which driver handles which
    device file. The minor number is used only by the driver itself to differentiate
    which device it is operating on, just in case the driver handles more than one
    device.
  id: totrans-311
  prefs: []
  type: TYPE_NORMAL
  zh: 如前所述，字符设备通过设备文件访问，通常位于/dev目录下。这是惯例。在编写驱动程序时，将设备文件放在当前目录中是可以的。只需确保在生产驱动程序中将它放在/dev目录下。主设备号告诉您哪个驱动程序处理哪个设备文件。次设备号仅由驱动程序本身使用，以区分它正在操作哪个设备，以防驱动程序处理多个设备。
- en: Adding a driver to your system means registering it with the kernel. This is
    synonymous with assigning it a major number during the module’s initialization.
    You do this by using the `register_chrdev` function, defined by [include/linux/fs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/fs.h).
  id: totrans-312
  prefs: []
  type: TYPE_NORMAL
  zh: 将驱动程序添加到您的系统意味着将其注册到内核中。这与在模块初始化期间为其分配一个主设备号同义。您可以通过使用由[include/linux/fs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/fs.h)定义的`register_chrdev`函数来完成此操作。
- en: '[PRE54]'
  id: totrans-313
  prefs: []
  type: TYPE_PRE
  zh: '[PRE54]'
- en: Where `unsigned int major` is the major number you want to request, `const char *name`
    is the name of the device as it will appear in /proc/devices and `struct file_operations *fops`
    is a pointer to the `file_operations` table for your driver. A negative return
    value means the registration failed. Note that we didn’t pass the minor number
    to `register_chrdev` . That is because the kernel doesn’t care about the minor
    number; only our driver uses it.
  id: totrans-314
  prefs: []
  type: TYPE_NORMAL
  zh: 在`unsigned int major`是您想要请求的主设备号，`const char *name`是设备在/proc/devices中显示的名称，`struct
    file_operations *fops`是您驱动程序的`file_operations`表的指针。负返回值表示注册失败。请注意，我们没有将次设备号传递给`register_chrdev`。这是因为内核不关心次设备号；只有我们的驱动程序使用它。
- en: Now the question is, how do you get a major number without hijacking one that’s
    already in use? The easiest way would be to look through [Documentation/admin-guide/devices.txt](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/admin-guide/devices.txt)
    and pick an unused one. That is a bad way of doing things because you will never
    be sure if the number you picked will be assigned later. The answer is that you
    can ask the kernel to assign you a dynamic major number.
  id: totrans-315
  prefs: []
  type: TYPE_NORMAL
  zh: 现在的问题是，您如何在不抢占已使用的设备号的情况下获得一个主设备号？最简单的方法是查看[Documentation/admin-guide/devices.txt](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/admin-guide/devices.txt)并选择一个未使用的设备号。这是一种不好的做法，因为您永远无法确定您选择的号码将来是否会被分配。答案是您可以请求内核为您分配一个动态的主设备号。
- en: If you pass a major number of 0 to `register_chrdev` , the return value will
    be the dynamically allocated major number. The downside is that you cannot make
    a device file in advance, since you do not know what the major number will be.
    There are three ways to work around this. First, the driver itself can print the
    newly assigned number and we can make the device file by hand. Second, the newly
    registered device will have an entry in /proc/devices, and we can either make
    the device file by hand or write a shell script to read the file in and make the
    device file. The third method is that we can have our driver make the device file
    using the `device_create` function after a successful registration and `device_destroy`
    during the call to `cleanup_module` .
  id: totrans-316
  prefs: []
  type: TYPE_NORMAL
  zh: 如果您将`register_chrdev`的设备号传递为0，则返回值将是动态分配的主设备号。缺点是您无法提前创建设备文件，因为您不知道主设备号是什么。有几种方法可以做到这一点。首先，驱动程序本身可以打印新分配的号码，我们可以手动创建设备文件。其次，新注册的设备将在/proc/devices中有一个条目，我们可以手动创建设备文件或编写shell脚本来读取该文件并创建设备文件。第三种方法是，我们可以在成功注册后使用`device_create`函数创建设备文件，在调用`cleanup_module`期间使用`device_destroy`。
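- en: |-
    The second approach, reading /proc/devices, can be sketched in user space C as well as in a shell script. The helper `find_char_major` and the device name `chardev` below are assumptions made for this example; the code only prints the `mknod` command it would suggest rather than creating anything.

    ```c
    #include <stdio.h>
    #include <string.h>

    /*
     * Hypothetical helper: scan text in /proc/devices format for a driver name
     * and return its major number, or -1 if the name is not listed.
     * Only the "Character devices:" section is considered.
     */
    static int find_char_major(FILE *fp, const char *name)
    {
        char line[256];
        int in_char_section = 0;

        while (fgets(line, sizeof(line), fp)) {
            int maj;
            char dev[64];

            if (strncmp(line, "Character devices:", 18) == 0) {
                in_char_section = 1;
                continue;
            }
            if (strncmp(line, "Block devices:", 14) == 0) {
                in_char_section = 0;
                continue;
            }
            /* Entry lines look like "<major> <name>". */
            if (in_char_section &&
                sscanf(line, "%d %63s", &maj, dev) == 2 &&
                strcmp(dev, name) == 0)
                return maj;
        }
        return -1;
    }

    int main(int argc, char *argv[])
    {
        const char *name = argc > 1 ? argv[1] : "chardev"; /* assumed name */
        FILE *fp = fopen("/proc/devices", "r");
        int maj;

        if (!fp) {
            perror("/proc/devices");
            return 1;
        }
        maj = find_char_major(fp, name);
        fclose(fp);

        if (maj < 0) {
            fprintf(stderr, "%s is not registered\n", name);
            return 1;
        }
        /* The device file itself could then be created with mknod. */
        printf("mknod /dev/%s c %d 0\n", name, maj);
        return 0;
    }
    ```
  id: totrans-316-a
  prefs: []
  type: TYPE_NORMAL
  zh: |-
    第二种方法（读取/proc/devices）既可以用shell脚本实现，也可以用用户空间的C程序来示意。下面的辅助函数`find_char_major`和设备名`chardev`都是为本示例所作的假设；代码只打印它建议使用的`mknod`命令，并不会真正创建任何文件。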
- en: However, `register_chrdev()` occupies the whole range of 256 minor numbers
    (0 through 255) associated with the given major. The recommended way to reduce
    this waste when registering a char device is to use the cdev interface.
  id: totrans-317
  prefs: []
  type: TYPE_NORMAL
  zh: 然而，`register_chrdev()`会占用与给定主设备号相关联的一组次设备号。为了减少字符设备注册的浪费，建议使用cdev接口。
- en: The newer interface completes the char device registration in two distinct steps.
    First, we should register a range of device numbers, which can be completed with
    `register_chrdev_region` or `alloc_chrdev_region` .
  id: totrans-318
  prefs: []
  type: TYPE_NORMAL
  zh: 新的界面通过两个不同的步骤完成字符设备注册。首先，我们应该注册一系列设备号，这可以通过`register_chrdev_region`或`alloc_chrdev_region`来完成。
- en: '[PRE55]'
  id: totrans-319
  prefs: []
  type: TYPE_PRE
  zh: '[PRE55]'
- en: The choice between the two functions depends on whether you know the major
    number for your device. Use `register_chrdev_region` if you know the device
    major number and `alloc_chrdev_region` if you would like the kernel to allocate
    a major number dynamically.
  id: totrans-320
  prefs: []
  type: TYPE_NORMAL
  zh: 两个不同函数之间的选择取决于你是否知道你的设备的主设备号。如果你知道设备的主设备号，则使用`register_chrdev_region`；如果你希望分配一个动态分配的主设备号，则使用`alloc_chrdev_region`。
- en: Second, we should initialize the data structure `struct cdev` for our char device
    and associate it with the device numbers. To initialize the `struct cdev`, we
    can use a sequence similar to the following code.
  id: totrans-321
  prefs: []
  type: TYPE_NORMAL
  zh: 其次，我们应该初始化我们的字符设备的`struct cdev`数据结构并将其与设备号关联起来。为了初始化`struct cdev`，我们可以通过以下代码的类似序列来实现。
- en: '[PRE56]'
  id: totrans-322
  prefs: []
  type: TYPE_PRE
  zh: '[PRE56]'
- en: However, the common usage pattern will embed the `struct cdev` within a device-specific
    structure of your own. In this case, we’ll need `cdev_init` for the initialization.
  id: totrans-323
  prefs: []
  type: TYPE_NORMAL
  zh: 然而，常见的用法模式是将`struct cdev`嵌入到你自己特定的设备结构中。在这种情况下，我们需要`cdev_init`来进行初始化。
- en: '[PRE57]'
  id: totrans-324
  prefs: []
  type: TYPE_PRE
  zh: '[PRE57]'
- en: Once we finish the initialization, we can add the char device to the system
    by using `cdev_add`.
  id: totrans-325
  prefs: []
  type: TYPE_NORMAL
  zh: 一旦完成初始化，我们可以通过使用`cdev_add`将字符设备添加到系统中。
- en: '[PRE58]'
  id: totrans-326
  prefs: []
  type: TYPE_PRE
  zh: '[PRE58]'
- en: To find an example using the interface, you can see ioctl.c described in [Section 9](#talking-to-device-files).
  id: totrans-327
  prefs: []
  type: TYPE_NORMAL
  zh: 要找到一个使用该接口的示例，你可以查看[第9节](#talking-to-device-files)中描述的ioctl.c。
- en: 6.4 Unregistering A Device
  id: totrans-328
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 6.4 注销设备
- en: We cannot allow the kernel module to be `rmmod`’ed whenever root feels like
    it. If the device file is opened by a process and then we remove the kernel module,
    using the file would cause a call to the memory location where the appropriate
    function (read/write) used to be. If we are lucky, no other code was loaded there,
    and we’ll get an ugly error message. If we are unlucky, another kernel module
    was loaded into the same location, which means a jump into the middle of another
    function within the kernel. The results of this would be impossible to predict,
    but they cannot be very positive.
  id: totrans-329
  prefs: []
  type: TYPE_NORMAL
  zh: 我们不能允许内核模块在root想什么时候就什么时候被`rmmod`。如果设备文件被某个进程打开，然后我们移除内核模块，使用该文件会导致调用曾经用于（读取/写入）适当功能（read/write）的内存位置。如果我们幸运，那里没有加载其他代码，我们可能会得到一个难看的错误信息。如果我们不幸，另一个内核模块被加载到相同的位置，这意味着在内核中的另一个函数中间进行跳转。这种结果是不可预测的，但它们可能不会非常积极。
- en: 'Normally, when you do not want to allow something, you return an error code
    (a negative number) from the function which is supposed to do it. With `cleanup_module`
    that’s impossible because it is a void function. However, there is a counter which
    keeps track of how many processes are using your module. You can see what its
    value is by looking at the 3rd field with the command `cat /proc/modules` or `lsmod`
    . If this number isn’t zero, `rmmod` will fail. Note that you do not have to check
    the counter within `cleanup_module` because the check will be performed for you
    by the system call `sys_delete_module`, declared in [include/linux/syscalls.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/syscalls.h).
    You should not use this counter directly, but there are functions defined in [include/linux/module.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/module.h)
    which let you display this counter:'
  id: totrans-330
  prefs: []
  type: TYPE_NORMAL
  zh: 通常，当你不希望允许某事发生时，你应该从应该执行该操作的功能中返回一个错误代码（一个负数）。对于`cleanup_module`来说这是不可能的，因为它是一个空函数。然而，有一个计数器会跟踪有多少进程正在使用你的模块。你可以通过查看`cat
    /proc/modules`或`lsmod`命令的第三个字段来查看它的值。如果这个数字不是零，`rmmod`将失败。请注意，你不需要在`cleanup_module`中检查这个计数器，因为系统调用`sys_delete_module`会为你执行这个检查，该系统调用定义在[include/linux/syscalls.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/syscalls.h)。你不应该直接使用这个计数器，但在[include/linux/module.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/module.h)中定义了一些函数，允许你显示这个计数器：
- en: '`module_refcount(THIS_MODULE)` : Return the value of reference count of current
    module.'
  id: totrans-331
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '`module_refcount(THIS_MODULE)`：返回当前模块的引用计数值。'
- en: 'Note: The use of `try_module_get(THIS_MODULE)` and `module_put(THIS_MODULE)`
    within a module’s own code is considered unsafe and should be avoided. The kernel
    automatically manages the reference count when file operations are in progress,
    so manual reference counting is unnecessary and can lead to race conditions. For
    a deeper understanding of when and how to properly use module reference counting,
    see [https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put](https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put).'
  id: totrans-332
  prefs: []
  type: TYPE_NORMAL
  zh: 注意：在模块自己的代码中使用`try_module_get(THIS_MODULE)`和`module_put(THIS_MODULE)`被认为是不安全的，应该避免。当文件操作正在进行时，内核会自动管理引用计数，因此手动引用计数是不必要的，并且可能导致竞争条件。为了更深入地了解何时以及如何正确使用模块引用计数，请参阅[https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put](https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put)。
- en: 6.5 chardev.c
  id: totrans-333
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 6.5 chardev.c
- en: 'The next code sample creates a char driver named chardev. You can verify it
    has been registered by checking:'
  id: totrans-334
  prefs: []
  type: TYPE_NORMAL
  zh: 下一个代码示例创建了一个名为chardev的字符驱动程序。你可以通过以下方式验证它是否已注册：
- en: '[PRE59]'
  id: totrans-335
  prefs: []
  type: TYPE_PRE
  zh: '[PRE59]'
- en: This will show the device’s major number. To actually use the device, you need
    to read from /dev/chardev (or open the file with a program) and the driver will
    put the number of times the device file has been read from into the file. We do
    not support writing to the file (like `echo "hi" > /dev/chardev` ), but catch
    these attempts and tell the user that the operation is not supported. Do not worry
    if you do not see what we do with the data we read into the buffer; we do not
    do much with it. We simply read in the data and print a message acknowledging
    that we received it.
  id: totrans-336
  prefs: []
  type: TYPE_NORMAL
  zh: 这将显示设备的major号。要实际使用该设备，你需要从/dev/chardev（或使用程序打开文件）读取，并且驱动程序会将设备文件被读取的次数放入文件中。我们不支持向文件写入（如`echo "hi" > /dev/chardev`），但会捕获这些尝试并告知用户该操作不受支持。如果你没有看到我们对读取到缓冲区中的数据做了什么，请不要担心；我们并没有对它做太多处理。我们只是读取数据并打印一条消息，确认我们已经收到了它。
- en: In a multi-threaded environment, unprotected concurrent access to the same
    memory may lead to race conditions and will not preserve performance. In a
    kernel module, this problem can arise when several processes access the shared
    resources at once. Therefore, a solution is to enforce exclusive access. We
    use an atomic Compare-And-Swap (CAS) to maintain the state, which is either
    `CDEV_NOT_USED` or `CDEV_EXCLUSIVE_OPEN`, and so determine whether the file
    is currently opened by someone or not. CAS compares the contents of a memory
    location with an expected value and, only if they are the same, modifies the
    contents of that memory location to the desired value. See [Section 12](#synchronization)
    for more details on concurrency.
  id: totrans-337
  prefs: []
  type: TYPE_NORMAL
  zh: 在多线程环境中，如果没有任何保护措施，对同一内存的并发访问可能会导致竞争条件，并且不会保持性能。在内核模块中，这个问题可能由于多个实例访问共享资源而出现。因此，一个解决方案是强制执行独占访问。我们使用原子比较和交换（CAS）来维护状态，`CDEV_NOT_USED`和`CDEV_EXCLUSIVE_OPEN`，以确定文件当前是否被某人打开。CAS比较内存位置的值与预期值，并且只有在它们相同的情况下，才会将该内存位置的值修改为所需的值。更多并发细节请参阅[第12节](#synchronization)。
- en: '[PRE60]'
  id: totrans-338
  prefs: []
  type: TYPE_PRE
  zh: '[PRE60]'
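- en: |-
    The exclusive-open idea described above can be modelled outside the kernel. The following is a minimal userspace sketch, assuming C11 `<stdatomic.h>` as a stand-in for the kernel's atomic helpers; the function names `try_exclusive_open` and `release_device` are hypothetical and chosen only for illustration. It is not the driver's code, only the same compare-and-swap pattern.

    ```c
    #include <stdatomic.h>
    #include <stdbool.h>

    /* The two states named in the text. */
    enum { CDEV_NOT_USED = 0, CDEV_EXCLUSIVE_OPEN = 1 };

    /* Userspace stand-in for the driver's open-state flag. */
    static atomic_int already_open = CDEV_NOT_USED;

    /* Try to claim the device; only one caller can win until it is released. */
    static bool try_exclusive_open(void)
    {
        int expected = CDEV_NOT_USED;

        /* Store CDEV_EXCLUSIVE_OPEN only if the flag still holds CDEV_NOT_USED. */
        return atomic_compare_exchange_strong(&already_open, &expected,
                                              CDEV_EXCLUSIVE_OPEN);
    }

    /* Give the device back so that a later open can succeed. */
    static void release_device(void)
    {
        atomic_store(&already_open, CDEV_NOT_USED);
    }
    ```

    A second call to `try_exclusive_open()` fails until `release_device()` has run, which is the behaviour a driver would typically report to the caller as a busy device.
  prefs: []
  type: TYPE_NORMAL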
- en: 6.6 Writing Modules for Multiple Kernel Versions
  id: totrans-339
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 6.6 为多个内核版本编写模块
- en: The system calls, which are the major interface the kernel shows to the processes,
    generally stay the same across versions. A new system call may be added, but usually
    the old ones will behave exactly like they used to. This is necessary for backward
    compatibility – a new kernel version is not supposed to break regular processes.
    In most cases, the device files will also remain the same. On the other hand,
    the internal interfaces within the kernel can and do change between versions.
  id: totrans-340
  prefs: []
  type: TYPE_NORMAL
  zh: 系统调用，这是内核向进程展示的主要接口，通常在版本之间保持不变。可能会添加新的系统调用，但通常旧的行为将与以前完全相同。这是为了向后兼容——新的内核版本不应该破坏常规进程。在大多数情况下，设备文件也将保持不变。另一方面，内核内部接口在版本之间可以并且确实会发生变化。
- en: There are differences between kernel versions, and if you want to support
    multiple kernel versions, you will find yourself having to write conditional
    compilation directives. The way to do this is to compare the macro `LINUX_VERSION_CODE`
    to the macro `KERNEL_VERSION`. In version a.b.c of the kernel, the value of
    `LINUX_VERSION_CODE` would be ![2^16 a + 2^8 b + c](img/7b83dc18db2a578cd2fb1a4ad4ae584e.png).
  id: totrans-341
  prefs: []
  type: TYPE_NORMAL
  zh: 不同内核版本之间存在差异，如果你想要支持多个内核版本，你将发现自己需要编写条件编译指令。这样做的方法是将宏`LINUX_VERSION_CODE`与宏`KERNEL_VERSION`进行比较。在内核版本a.b.c中，该宏的值将是![2^16 a + 2^8 b + c](img/7b83dc18db2a578cd2fb1a4ad4ae584e.png)。
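- en: |-
    To make the arithmetic concrete, here is a small userspace sketch of the packing described above. The names `MY_KERNEL_VERSION`, `MY_LINUX_VERSION_CODE`, and `have_proc_ops` are hypothetical, and the running version 5.6.0 is an assumption made for illustration. A real module would include `<linux/version.h>` and use `KERNEL_VERSION` and `LINUX_VERSION_CODE` from there; recent kernel headers additionally clamp the third component to 255.

    ```c
    /* Hypothetical re-creation of the packing: 2^16 * a + 2^8 * b + c. */
    #define MY_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

    /* Assumed kernel version for this sketch: 5.6.0. */
    #define MY_LINUX_VERSION_CODE MY_KERNEL_VERSION(5, 6, 0)

    /* Returns 1 when the assumed kernel is at least v5.6, else 0. */
    static int have_proc_ops(void)
    {
    #if MY_LINUX_VERSION_CODE >= MY_KERNEL_VERSION(5, 6, 0)
        return 1;
    #else
        return 0;
    #endif
    }
    ```

    The same `#if` shape, written with the real macros, is how a module would select between two code paths at compile time.
  prefs: []
  type: TYPE_NORMAL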
- en: 7 The /proc Filesystem
  id: totrans-342
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 7. /proc 文件系统
- en: In Linux, there is an additional mechanism for the kernel and kernel modules
    to send information to processes — the /proc filesystem. Originally designed to
    allow easy access to information about processes (hence the name), it is now used
    by every bit of the kernel which has something interesting to report, such as
    /proc/modules which provides the list of modules and /proc/meminfo which gathers
    memory usage statistics.
  id: totrans-343
  prefs: []
  type: TYPE_NORMAL
  zh: 在Linux中，内核和内核模块向进程发送信息有一个额外的机制——/proc 文件系统。最初设计是为了允许轻松访问有关进程的信息（因此得名），现在内核中任何有有趣信息要报告的部分都会使用它，例如/proc/modules提供了模块列表，/proc/meminfo收集内存使用统计信息。
- en: The method to use the proc filesystem is very similar to the one used with device
    drivers — a structure is created with all the information needed for the /proc
    file, including pointers to any handler functions (in our case there is only one,
    the one called when somebody attempts to read from the /proc file). Then, `init_module`
    registers the structure with the kernel and `cleanup_module` unregisters it.
  id: totrans-344
  prefs: []
  type: TYPE_NORMAL
  zh: 使用proc文件系统的方法与设备驱动程序使用的非常相似——创建一个包含/proc文件所需所有信息的结构，包括任何处理函数的指针（在我们的例子中只有一个，即当有人尝试从/proc文件读取时调用的函数）。然后，`init_module`将结构注册到内核中，`cleanup_module`注销它。
- en: Normal filesystems are located on a disk, rather than just in memory (which
    is where /proc is), and in that case the index-node (inode for short) number is
    a pointer to a disk location where the file’s inode is located. The inode contains
    information about the file, for example the file’s permissions, together with
    a pointer to the disk location or locations where the file’s data can be found.
  id: totrans-345
  prefs: []
  type: TYPE_NORMAL
  zh: 正常的文件系统位于磁盘上，而不是仅仅在内存中（/proc就在这里），在这种情况下，索引节点（简称inode）号是一个指向文件inode所在磁盘位置的指针。inode包含有关文件的信息，例如文件的权限，以及指向文件数据所在磁盘位置或位置的指针。
- en: Because we do not get called when the file is opened or closed, there is nowhere
    for us to put `try_module_get` and `module_put` in this module, and if the file
    is opened and then the module is removed, there is no way to avoid the consequences.
    The kernel’s automatic reference counting for file operations helps prevent module
    removal while files are in use, but /proc files require careful handling due to
    their different lifecycle.
  id: totrans-346
  prefs: []
  type: TYPE_NORMAL
  zh: 由于文件打开或关闭时我们没有被调用，在这个模块中我们无处放置`try_module_get`和`module_put`，如果文件被打开然后模块被移除，就无法避免后果。内核对文件操作的自动引用计数有助于防止在文件使用时移除模块，但由于它们不同的生命周期，/proc文件需要小心处理。
- en: 'Here is a simple example showing how to use a /proc file. This is the HelloWorld
    for the /proc filesystem. There are three parts: create the file /proc/helloworld
    in the function `init_module` , return a value (and a buffer) when the file /proc/helloworld
    is read in the callback function `procfile_read` , and delete the file /proc/helloworld
    in the function `cleanup_module` .'
  id: totrans-347
  prefs: []
  type: TYPE_NORMAL
  zh: 这里有一个简单的示例，展示了如何使用/proc文件。这是/proc文件系统的HelloWorld。它有三个部分：在`init_module`函数中创建/proc/helloworld文件，在回调函数`procfile_read`中读取/proc/helloworld文件时返回一个值（和一个缓冲区），以及在`cleanup_module`函数中删除/proc/helloworld文件。
- en: The /proc/helloworld is created when the module is loaded with the function
    `proc_create` . The return value is a pointer to `struct proc_dir_entry` , and
    it will be used to configure the file /proc/helloworld (for example, the owner
    of this file). A null return value means that the creation has failed.
  id: totrans-348
  prefs: []
  type: TYPE_NORMAL
  zh: 当模块通过`proc_create`函数加载时，会创建/proc/helloworld。返回值是一个指向`struct proc_dir_entry`的指针，它将被用来配置/proc/helloworld文件（例如，该文件的拥有者）。空返回值表示创建失败。
- en: 'Every time the file /proc/helloworld is read, the function `procfile_read`
    is called. Two parameters of this function are very important: the buffer (the
    second parameter) and the offset (the fourth one). The content of the buffer will
    be returned to the application which read it (for example the `cat` command).
    The offset is the current position in the file. If the return value of the function
    is not zero, then this function is called again. So be careful with this function:
    if it never returns zero, the read function is called endlessly.'
  id: totrans-349
  prefs: []
  type: TYPE_NORMAL
  zh: 每次读取/proc/helloworld文件时，都会调用`procfile_read`函数。这个函数的两个参数非常重要：缓冲区（第二个参数）和偏移量（第四个参数）。缓冲区的内容将被返回给读取它的应用程序（例如`cat`命令）。偏移量是文件中的当前位置。如果函数的返回值不为零，则此函数将被再次调用。所以要注意这个函数，如果它从不返回零，则读取函数会无限期地被调用。
- en: '[PRE61]'
  id: totrans-350
  prefs: []
  type: TYPE_PRE
  zh: '[PRE61]'
- en: '[PRE62]'
  id: totrans-351
  prefs: []
  type: TYPE_PRE
  zh: '[PRE62]'
- en: 7.1 The proc_ops Structure
  id: totrans-352
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 7.1 proc_ops 结构
- en: The `proc_ops` structure is defined in [include/linux/proc_fs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/proc_fs.h)
    in Linux v5.6+. Older kernels used `file_operations` for custom hooks in the
    /proc filesystem, but that structure contains some members that /proc does not
    need, and every time the VFS expands the `file_operations` set, the /proc code
    becomes bloated. Besides saving space, `proc_ops` also saves some operations,
    which improves performance. For example, a file which never disappears in /proc
    can set `proc_flags` to `PROC_ENTRY_PERMANENT` to save 2 atomic ops, 1 allocation,
    and 1 free per open/read/close sequence.
  id: totrans-353
  prefs: []
  type: TYPE_NORMAL
  zh: '`proc_ops` 结构定义在 Linux v5.6+ 的 `[include/linux/proc_fs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/proc_fs.h)`
    中。在较旧的内核中，它使用 `file_operations` 在 `/proc` 文件系统中进行自定义钩子，但它包含一些在 VFS 中不必要的成员，并且每次
    VFS 扩展 `file_operations` 集合时，`/proc` 代码就会变得臃肿。另一方面，通过这个结构不仅节省了空间，还节省了一些操作以提高其性能。例如，在
    `/proc` 中永远不会消失的文件可以将 `proc_flags` 设置为 `PROC_ENTRY_PERMANENT` 以节省 2 个原子操作、1 次分配和
    1 次释放，在每次打开/读取/关闭序列中。'
- en: 7.2 Read and Write a /proc File
  id: totrans-354
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 7.2 读取和写入 /proc 文件
- en: We have seen a very simple example for a /proc file where we only read the file
    /proc/helloworld. It is also possible to write to a /proc file. It works the same
    way as read, in that a function is called when the /proc file is written. The
    one difference is that the data comes from the user, so you have to import it
    from user space to kernel space (with `copy_from_user` or `get_user`).
  id: totrans-355
  prefs: []
  type: TYPE_NORMAL
  zh: 我们已经看到了一个用于 `/proc` 文件的非常简单的示例，其中我们只读取了 `/proc/helloworld` 文件。也可以写入 `/proc`
    文件。它的工作方式与读取相同，当 `/proc` 文件被写入时，会调用一个函数。但与读取有一点不同，数据来自用户，因此你必须从用户空间导入数据到内核空间（使用
    `copy_from_user` 或 `get_user`）。
- en: The reason for `copy_from_user` or `get_user` is that Linux memory (on Intel
    architecture, it may be different under some other processors) is segmented. This
    means that a pointer, by itself, does not reference a unique location in memory,
    only a location in a memory segment, and you need to know which memory segment
    it is to be able to use it. There is one memory segment for the kernel, and one
    for each of the processes.
  id: totrans-356
  prefs: []
  type: TYPE_NORMAL
  zh: 使用 `copy_from_user` 或 `get_user` 的原因是 Linux 内存（在英特尔架构上，在其他一些处理器下可能不同）是分段的。这意味着一个指针本身并不引用内存中的唯一位置，而只是引用内存段中的一个位置，你需要知道它是哪个内存段才能使用它。有一个内核内存段，以及每个进程的一个内存段。
- en: The only memory segment accessible to a process is its own, so when writing
    regular programs to run as processes, there is no need to worry about segments.
    When you write a kernel module, normally you want to access the kernel memory
    segment, which is handled automatically by the system. However, when the content
    of a memory buffer needs to be passed between the currently running process and
    the kernel, the kernel function receives a pointer to the memory buffer which
    is in the process segment. The `put_user` and `get_user` macros allow you to access
    that memory. These macros transfer a single simple value, such as one character;
    you can transfer several characters with `copy_to_user` and `copy_from_user`.
    Because the module's own buffer is in kernel space, the write function needs to
    import the data, which comes from user space, whereas the read function does not,
    because its data is already in kernel space.
  id: totrans-357
  prefs: []
  type: TYPE_NORMAL
  zh: 一个进程可访问的唯一内存段是其自身的，因此当编写作为进程运行的常规程序时，无需担心段。当你编写内核模块时，通常你想要访问内核内存段，这由系统自动处理。然而，当需要将内存缓冲区的内容在当前运行的进程和内核之间传递时，内核函数会接收到一个指向进程内存段的内存缓冲区指针。`put_user`
    和 `get_user` 宏允许你访问该内存。这些函数仅处理一个字符，你可以使用 `copy_to_user` 和 `copy_from_user` 来处理多个字符。由于缓冲区（在读取或写入函数中）位于内核空间，对于写入函数，你需要导入数据，因为数据来自用户空间，但对于读取函数则不需要，因为数据已经在内核空间。
- en: '[PRE63]'
  id: totrans-358
  prefs: []
  type: TYPE_PRE
  zh: '[PRE63]'
- en: 7.3 Manage /proc file with standard filesystem
  id: totrans-359
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 7.3 使用标准文件系统管理 /proc 文件
- en: We have seen how to read and write a /proc file with the /proc interface. But
    it is also possible to manage a /proc file with inodes. The main reason to do so
    is to use advanced features, such as permissions.
  id: totrans-360
  prefs: []
  type: TYPE_NORMAL
  zh: 我们已经看到了如何使用 `/proc` 接口读取和写入 `/proc` 文件。但也可以使用inode来管理 `/proc` 文件。主要关注的是使用高级功能，如权限。
- en: In Linux, there is a standard mechanism for filesystem registration. Since every
    filesystem has to have its own functions to handle inode and file operations,
    there is a special structure to hold pointers to all those functions, `struct inode_operations`
    , which includes a pointer to `struct proc_ops` .
  id: totrans-361
  prefs: []
  type: TYPE_NORMAL
  zh: 在 Linux 中，存在一个标准的文件系统注册机制。由于每个文件系统都必须有自己的函数来处理inode和文件操作，因此有一个特殊的结构来保存所有这些函数的指针，即
    `struct inode_operations`，它包括一个指向 `struct proc_ops` 的指针。
- en: The difference between file and inode operations is that file operations deal
    with the file itself whereas inode operations deal with ways of referencing the
    file, such as creating links to it.
  id: totrans-362
  prefs: []
  type: TYPE_NORMAL
  zh: 文件操作和inode操作之间的区别在于，文件操作处理文件本身，而inode操作处理引用文件的方式，例如创建指向它的链接。
- en: In /proc, whenever we register a new file, we’re allowed to specify which `struct inode_operations`
    will be used to access it. This is the mechanism we use, namely a `struct inode_operations`
    which includes a pointer to a `struct proc_ops` which includes pointers to our
    `procfs_read` and `procfs_write` functions.
  id: totrans-363
  prefs: []
  type: TYPE_NORMAL
  zh: 在/proc中，每当注册一个新的文件时，我们都可以指定将使用哪个`struct inode_operations`来访问它。这是我们使用的机制，一个包含指向`struct
    proc_ops`的指针的`struct inode_operations`，而`struct proc_ops`包含指向我们的`procfs_read`和`procfs_write`函数的指针。
- en: Another interesting point here is the `module_permission` function. This function
    is called whenever a process tries to do something with the /proc file, and it
    can decide whether to allow access or not. Right now it is only based on the operation
    and the uid of the current user (as available in current, a pointer to a structure
    which includes information on the currently running process), but it could be
    based on anything we like, such as what other processes are doing with the same
    file, the time of day, or the last input we received.
  id: totrans-364
  prefs: []
  type: TYPE_NORMAL
  zh: 另一个有趣的地方是`module_permission`函数。每当一个进程尝试对/proc文件进行操作时，都会调用此函数，并且它可以决定是否允许访问。目前，它仅基于操作和当前用户的uid（如当前，一个指向包含当前运行进程信息的结构的指针），但它可以基于我们喜欢的内容，例如其他进程如何使用相同的文件、一天中的时间或我们收到的最后输入。
- en: It is important to note that the standard roles of read and write are reversed
    in the kernel. Read functions are used for output, whereas write functions are
    used for input. The reason for that is that read and write refer to the user’s
    point of view — if a process reads something from the kernel, then the kernel
    needs to output it, and if a process writes something to the kernel, then the
    kernel receives it as input.
  id: totrans-365
  prefs: []
  type: TYPE_NORMAL
  zh: 重要的是要注意，在内核中，标准读取和写入的角色是相反的。读取函数用于输出，而写入函数用于输入。这样做的原因是读取和写入指的是用户的观点——如果一个进程从内核读取某些内容，那么内核需要输出它；如果一个进程向内核写入某些内容，那么内核将其作为输入接收。
- en: '[PRE64]'
  id: totrans-366
  prefs: []
  type: TYPE_PRE
  zh: '[PRE64]'
- en: Still hungry for procfs examples? Well, first of all, keep in mind that there
    are rumors claiming that procfs is on its way out, so consider using sysfs instead.
    Consider using this mechanism, in case you want to document something kernel-related
    yourself.
  id: totrans-367
  prefs: []
  type: TYPE_NORMAL
  zh: 还想看更多关于procfs的示例吗？首先，请记住，有传言称procfs正在退出，考虑使用sysfs。如果您想自己记录与内核相关的内容，可以考虑使用这种机制。
- en: 7.4 Manage /proc file with seq_file
  id: totrans-368
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 7.4 使用seq_file管理/proc文件
- en: 'As we have seen, writing a /proc file may be quite “complex”. To help people
    write /proc files, there is an API named `seq_file` that helps format a /proc
    file for output. It is based on a sequence, which is composed of 3 functions: `start()`,
    `next()`, and `stop()`. The `seq_file` API starts a sequence when a user reads
    the /proc file.'
  id: totrans-369
  prefs: []
  type: TYPE_NORMAL
  zh: 正如我们所见，编写/proc文件可能相当“复杂”。因此，为了帮助人们编写/proc文件，存在一个名为`seq_file`的API，它有助于格式化输出/proc文件。它基于序列，由3个函数组成：`start()`、`next()`和`stop()`。当用户读取/proc文件时，`seq_file`
    API会启动一个序列。
- en: A sequence begins with a call to the function `start()`. If it returns a
    non-`NULL` value, the function `next()` is called; otherwise, the `stop()` function
    is called directly. The `next()` function is an iterator whose goal is to go through
    all the data. Each time `next()` is called, the function `show()` is also called,
    and it writes data values into the buffer read by the user. The function `next()`
    is called repeatedly until it returns `NULL`, at which point the sequence ends
    and the function `stop()` is called.
  id: totrans-370
  prefs: []
  type: TYPE_NORMAL
  zh: 序列从调用`start()`函数开始。如果返回值是非`NULL`值，则调用`next()`函数；否则，直接调用`stop()`函数。这个函数是一个迭代器，目标是遍历所有数据。每次调用`next()`时，都会调用`show()`函数。它将用户读取的缓冲区中的数据值写入。`next()`函数会一直调用，直到它返回`NULL`。序列在`next()`返回`NULL`时结束，然后调用`stop()`函数。
- en: 'BE CAREFUL: when a sequence is finished, another one starts. That means that
    at the end of function `stop()` , the function `start()` is called again. This
    loop finishes when the function `start()` returns `NULL` . You can see a scheme
    of this in the [Figure 1](#ignorespaces-how-seqfile-works).'
  id: totrans-371
  prefs: []
  type: TYPE_NORMAL
  zh: 注意：当序列结束时，另一个序列开始。这意味着在`stop()`函数的末尾，会再次调用`start()`函数。这个循环在`start()`函数返回`NULL`时结束。您可以在[图1](#ignorespaces-how-seqfile-works)中看到这个方案的示意图。
- en: '![How seq_file works](img/8209b6ea27687e8832cc85a37f5784c5.png)'
  id: totrans-372
  prefs: []
  type: TYPE_IMG
  zh: '![seq_file的工作原理](img/8209b6ea27687e8832cc85a37f5784c5.png)'
- en: 'Figure 1: How seq_file works'
  id: totrans-373
  prefs: []
  type: TYPE_NORMAL
  zh: 图1：seq_file的工作原理
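- en: |-
    The call order described above can be traced with a small userspace model. This is only a sketch under assumptions: the `model_*` functions and `run_model` are hypothetical names, they do not use the real `seq_file` signatures, and the exact interleaving of `show()` and `next()` is a simplification. Each callback appends one letter to a log (`S` for start, `s` for show, `n` for next, `E` for stop).

    ```c
    #include <stddef.h>

    #define N_ITEMS 3

    static int items[N_ITEMS] = {10, 20, 30};
    static char call_log[64];   /* records the order of the callbacks */
    static size_t log_len;

    static void record(char c)
    {
        if (log_len < sizeof(call_log) - 1)
            call_log[log_len++] = c;
    }

    /* start(): return the element at *pos, or NULL when past the end. */
    static void *model_start(size_t *pos)
    {
        record('S');
        return *pos < N_ITEMS ? &items[*pos] : NULL;
    }

    /* next(): advance the position and return the next element, or NULL. */
    static void *model_next(size_t *pos)
    {
        record('n');
        ++*pos;
        return *pos < N_ITEMS ? &items[*pos] : NULL;
    }

    /* show(): a real implementation would format the element for the reader. */
    static void model_show(void *v)
    {
        (void)v;
        record('s');
    }

    static void model_stop(void)
    {
        record('E');
    }

    /* Run sequences until start() returns NULL, then return the call log. */
    static const char *run_model(void)
    {
        size_t pos = 0;
        void *v;

        log_len = 0;
        while ((v = model_start(&pos)) != NULL) {
            while (v != NULL) {
                model_show(v);
                v = model_next(&pos);
            }
            model_stop();   /* end of one sequence */
        }
        model_stop();       /* start() returned NULL: stop() is called directly */
        call_log[log_len] = '\0';
        return call_log;
    }
    ```

    Because the position is kept across sequences, the second `start()` sees that nothing is left and returns `NULL`, which is what ends the loop.
  prefs: []
  type: TYPE_NORMAL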
- en: The `seq_file` API provides basic functions for `proc_ops`, such as `seq_read`,
    `seq_lseek`, and some others, but nothing for writing to the /proc file. Of course,
    you can still handle writes in the same way as in the previous example.
  id: totrans-374
  prefs: []
  type: TYPE_NORMAL
  zh: '`seq_file`为`proc_ops`提供了基本函数，如`seq_read`，`seq_lseek`等，但不需要在/proc文件中写入任何内容。当然，您仍然可以使用与上一个示例相同的方式。'
- en: '[PRE65]'
  id: totrans-375
  prefs: []
  type: TYPE_PRE
  zh: '[PRE65]'
- en: 'If you want more information, you can read this web page:'
  id: totrans-376
  prefs: []
  type: TYPE_NORMAL
  zh: 如果需要更多信息，您可以阅读此网页：
- en: '[https://lwn.net/Articles/22355/](https://lwn.net/Articles/22355/)'
  id: totrans-377
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '[https://lwn.net/Articles/22355/](https://lwn.net/Articles/22355/)'
- en: '[https://kernelnewbies.org/Documents/SeqFileHowTo](https://kernelnewbies.org/Documents/SeqFileHowTo)'
  id: totrans-378
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '[https://kernelnewbies.org/Documents/SeqFileHowTo](https://kernelnewbies.org/Documents/SeqFileHowTo)'
- en: You can also read the code of [fs/seq_file.c](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/seq_file.c)
    in the Linux kernel.
  id: totrans-379
  prefs: []
  type: TYPE_NORMAL
  zh: 您还可以阅读Linux内核中[fs/seq_file.c](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/seq_file.c)的代码。
- en: '8 sysfs: Interacting with your module'
  id: totrans-380
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 8 sysfs：与您的模块交互
- en: sysfs allows you to interact with the running kernel from userspace by reading
    or setting variables inside of modules. This can be useful for debugging purposes,
    or just as an interface for applications or scripts. You can find sysfs directories
    and files under the /sys directory on your system.
  id: totrans-381
  prefs: []
  type: TYPE_NORMAL
  zh: sysfs允许您通过读取或设置模块内部的变量从用户空间与运行中的内核进行交互。这可以用于调试目的，或者作为应用程序或脚本的接口。您可以在系统中的/sys目录下找到sysfs目录和文件。
- en: '[PRE66]'
  id: totrans-382
  prefs: []
  type: TYPE_PRE
  zh: '[PRE66]'
- en: Attributes can be exported for kobjects in the form of regular files in the
    filesystem. Sysfs forwards file I/O operations to methods defined for the attributes,
    providing a means to read and write kernel attributes.
  id: totrans-383
  prefs: []
  type: TYPE_NORMAL
  zh: 可以将kobjects的属性以常规文件的形式导出至文件系统。Sysfs将文件I/O操作转发到为属性定义的方法，提供了一种读取和写入内核属性的手段。
- en: 'A simple attribute definition:'
  id: totrans-384
  prefs: []
  type: TYPE_NORMAL
  zh: 简单的属性定义：
- en: '[PRE67]'
  id: totrans-385
  prefs: []
  type: TYPE_PRE
  zh: '[PRE67]'
- en: 'For example, the driver model defines `struct device_attribute` like:'
  id: totrans-386
  prefs: []
  type: TYPE_NORMAL
  zh: 例如，驱动模型定义了`struct device_attribute`如下：
- en: '[PRE68]'
  id: totrans-387
  prefs: []
  type: TYPE_PRE
  zh: '[PRE68]'
- en: To read or write attributes, the `show()` or `store()` method must be specified
    when declaring the attribute. For the common cases [include/linux/sysfs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/sysfs.h)
    provides convenience macros ( `__ATTR` , `__ATTR_RO` , `__ATTR_WO` , etc.) to
    make defining attributes easier as well as making code more concise and readable.
  id: totrans-388
  prefs: []
  type: TYPE_NORMAL
  zh: 为了读取或写入属性，在声明属性时必须指定`show()`或`store()`方法。对于常见情况，[include/linux/sysfs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/sysfs.h)提供了便利宏（`__ATTR`，`__ATTR_RO`，`__ATTR_WO`等），使得定义属性更加容易，同时也使代码更加简洁和易于阅读。
- en: An example of a hello world module which includes the creation of a variable
    accessible via sysfs is given below.
  id: totrans-389
  prefs: []
  type: TYPE_NORMAL
  zh: 下面给出了一个包含通过sysfs创建可访问变量的hello world模块的示例。
- en: '[PRE69]'
  id: totrans-390
  prefs: []
  type: TYPE_PRE
  zh: '[PRE69]'
- en: 'Make and install the module:'
  id: totrans-391
  prefs: []
  type: TYPE_NORMAL
  zh: 编译并安装模块：
- en: '[PRE70]'
  id: totrans-392
  prefs: []
  type: TYPE_PRE
  zh: '[PRE70]'
- en: 'Check that it exists:'
  id: totrans-393
  prefs: []
  type: TYPE_NORMAL
  zh: 检查它是否存在：
- en: '[PRE71]'
  id: totrans-394
  prefs: []
  type: TYPE_PRE
  zh: '[PRE71]'
- en: What is the current value of `myvariable` ?
  id: totrans-395
  prefs: []
  type: TYPE_NORMAL
  zh: '`myvariable`的当前值是多少？'
- en: '[PRE72]'
  id: totrans-396
  prefs: []
  type: TYPE_PRE
  zh: '[PRE72]'
- en: Set the value of `myvariable` and check that it changed.
  id: totrans-397
  prefs: []
  type: TYPE_NORMAL
  zh: 设置`myvariable`的值并检查它是否已更改。
- en: '[PRE73]'
  id: totrans-398
  prefs: []
  type: TYPE_PRE
  zh: '[PRE73]'
- en: 'Finally, remove the test module:'
  id: totrans-399
  prefs: []
  type: TYPE_NORMAL
  zh: 最后，移除测试模块：
- en: '[PRE74]'
  id: totrans-400
  prefs: []
  type: TYPE_PRE
  zh: '[PRE74]'
- en: In the above case, we use a simple kobject to create a directory under sysfs,
    and communicate with its attributes. The `kobject` structure made its appearance
    in Linux v2.6.0. It was initially meant as a simple way of unifying kernel
    code which manages reference counted objects. After a bit of mission creep, it
    is now the glue that holds much of the device model and its sysfs interface together.
    For more information about kobject and sysfs, see [Documentation/driver-api/driver-model/driver.rst](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/driver-api/driver-model/driver.rst)
    and [https://lwn.net/Articles/51437/](https://lwn.net/Articles/51437/).
  id: totrans-401
  prefs: []
  type: TYPE_NORMAL
  zh: 在上述情况下，我们使用一个简单的kobject在sysfs下创建一个目录，并与它的属性进行通信。自Linux v2.6.0以来，`kobject`结构首次出现。它最初被用作统一管理引用计数对象的内核代码的简单方法。经过一些任务扩张后，它现在成为了连接设备模型及其sysfs接口的粘合剂。有关kobject和sysfs的更多信息，请参阅[Documentation/driver-api/driver-model/driver.rst](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/driver-api/driver-model/driver.rst)和[https://lwn.net/Articles/51437/](https://lwn.net/Articles/51437/)。
- en: 9 Talking To Device Files
  id: totrans-402
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 9 与设备文件通信
- en: Device files are supposed to represent physical devices. Most physical devices
    are used for output as well as input, so there has to be some mechanism for device
    drivers in the kernel to get the output to send to the device from processes.
    This is done by opening the device file for output and writing to it, just like
    writing to a file. In the following example, this is implemented by `device_write`
    .
  id: totrans-403
  prefs: []
  type: TYPE_NORMAL
  zh: 设备文件应该代表物理设备。大多数物理设备既用于输出也用于输入，因此内核中的设备驱动程序必须有一些机制来获取输出并发送到设备。这是通过打开设备文件进行输出并将内容写入其中来完成的，就像写入文件一样。在下面的示例中，这是通过`device_write`实现的。
- en: This is not always enough. Imagine you had a serial port connected to a modem
    (even if you have an internal modem, it is still implemented from the CPU’s perspective
    as a serial port connected to a modem, so you don’t have to tax your imagination
    too hard). The natural thing to do would be to use the device file to write things
    to the modem (either modem commands or data to be sent through the phone line)
    and read things from the modem (either responses for commands or the data received
    through the phone line). However, this leaves open the question of what to do
    when you need to talk to the serial port itself, for example to configure the
    rate at which data is sent and received.
  id: totrans-404
  prefs: []
  type: TYPE_NORMAL
  zh: 这并不总是足够的。想象一下，你有一个连接到调制解调器的串行端口（即使你有内置调制解调器，从CPU的角度来看，它仍然是一个连接到调制解调器的串行端口，所以你不需要过度发挥想象力）。自然的事情是使用设备文件将信息写入调制解调器（无论是调制解调器命令还是要通过电话线发送的数据），并从调制解调器读取信息（无论是命令的响应还是通过电话线接收的数据）。然而，这留下了当你需要与串行端口本身通信时该做什么的问题，例如配置数据发送和接收的速度。
- en: The answer in Unix is to use a special function called `ioctl` (short for Input
    Output ConTroL). Every device can have its own `ioctl` commands, which can be
    read ioctls (to return information from the kernel to a process), write ioctls
    (to send information from a process to the kernel), both or neither. As with
    `read` and `write`, the direction is named from the process’s point of view, so
    `_IOR` means that the process reads from the kernel and `_IOW` means that the
    process writes to the kernel.
  id: totrans-405
  prefs: []
  type: TYPE_NORMAL
  zh: 在Unix中，答案是使用一个名为`ioctl`（Input Output ConTroL的缩写）的特殊函数。每个设备都可以有自己的`ioctl`命令，这些命令可以是读取ioctl（将信息从内核返回给进程）、写入ioctl（从进程发送信息到内核），两者都有或两者都没有。与`read`和`write`一样，方向是从进程的角度命名的，因此`_IOR`表示进程从内核读取，`_IOW`表示进程向内核写入。
- en: 'The ioctl function is called with three parameters: the file descriptor of
    the appropriate device file, the ioctl number, and a parameter, which is of type
    long so you can use a cast to use it to pass anything. You will not be able to
    pass a structure this way, but you will be able to pass a pointer to the structure.
    Here is an example:'
  id: totrans-406
  prefs: []
  type: TYPE_NORMAL
  zh: '`ioctl`函数使用三个参数调用：适当设备文件的文件描述符、`ioctl`编号和一个参数，该参数为`long`类型，因此你可以使用类型转换来使用它传递任何内容。你无法以此方式传递结构体，但你将能够传递结构体的指针。以下是一个示例：'
- en: '[PRE75]'
  id: totrans-407
  prefs: []
  type: TYPE_PRE
  zh: '[PRE75]'
- en: You can see there is an argument called `cmd` in the `test_ioctl_ioctl()` function.
    It is the ioctl number. The ioctl number encodes the major device number, the
    type of the ioctl, the command, and the type of the parameter. This ioctl number
    is usually created by a macro call (`_IO`, `_IOR`, `_IOW` or `_IOWR`, depending
    on the type) in a header file. This header file should then be included both by
    the programs which will use ioctl (so they can generate the appropriate ioctl’s)
    and by the kernel module (so it can understand them). In the example below, the
    header file is chardev.h and the program which uses it is userspace_ioctl.c.
  id: totrans-408
  prefs: []
  type: TYPE_NORMAL
  zh: 你可以在`test_ioctl_ioctl()`函数中看到一个名为`cmd`的参数。它是`ioctl`编号。`ioctl`编号编码了主设备号、`ioctl`的类型、命令和参数的类型。这个`ioctl`编号通常由头文件中的宏调用（`_IO`、`_IOR`、`_IOW`或`_IOWR`——取决于类型）创建。然后，这个头文件应该被将使用`ioctl`的程序（以便它们可以生成适当的`ioctl`）和内核模块（以便它能够理解它）包含。在下面的示例中，头文件是`chardev.h`，使用它的程序是`userspace_ioctl.c`。
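- en: |-
    To make the encoding concrete, here is a minimal userspace sketch that builds two ioctl numbers and takes them apart again with the decoding macros from `<linux/ioctl.h>`. The magic value `'k'`, the command numbers, and the `DEMO_*` names are made up for this illustration and are not taken from chardev.h.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <linux/ioctl.h>

    /* Hypothetical values for illustration only; not taken from chardev.h. */
    #define DEMO_MAGIC 'k'
    #define DEMO_GET_VALUE _IOR(DEMO_MAGIC, 1, int)
    #define DEMO_SET_VALUE _IOW(DEMO_MAGIC, 2, int)

    /* Print the four fields packed into one ioctl number. */
    static void describe(const char *name, unsigned int cmd)
    {
        printf("%s: dir=%u type=%c nr=%u size=%u\n", name,
               (unsigned int)_IOC_DIR(cmd), (char)_IOC_TYPE(cmd),
               (unsigned int)_IOC_NR(cmd), (unsigned int)_IOC_SIZE(cmd));
    }

    int main(void)
    {
        describe("DEMO_GET_VALUE", DEMO_GET_VALUE);
        describe("DEMO_SET_VALUE", DEMO_SET_VALUE);
        return 0;
    }
    ```

    Because both sides include the same header, the kernel module and the user program arrive at identical numbers without either of them hard-coding a value.
  prefs: []
  type: TYPE_NORMAL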
- en: If you want to use ioctls in your own kernel modules, it is best to receive
    an official ioctl assignment, so if you accidentally get somebody else’s ioctls,
    or if they get yours, you’ll know something is wrong. For more information, consult
    the kernel source tree at [Documentation/userspace-api/ioctl/ioctl-number.rst](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/userspace-api/ioctl/ioctl-number.rst).
  id: totrans-409
  prefs: []
  type: TYPE_NORMAL
  zh: 如果你想在自己的内核模块中使用`ioctl`，最好是接收一个官方的`ioctl`分配，这样如果你不小心得到了别人的`ioctl`，或者他们得到了你的`ioctl`，你就会知道出了问题。有关更多信息，请参阅内核源树中的[Documentation/userspace-api/ioctl/ioctl-number.rst](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/userspace-api/ioctl/ioctl-number.rst)。
- en: Also, we need to be careful, because concurrent access to shared resources can
    lead to race conditions. The solution is to use atomic Compare-And-Swap (CAS),
    which we mentioned in [Section 6.5](#chardevc), to enforce exclusive access.
  id: totrans-410
  prefs: []
  type: TYPE_NORMAL
  zh: 此外，我们需要小心，对共享资源的并发访问会导致竞争条件。解决方案是使用原子比较和交换（CAS），我们在[第6.5节](#chardevc)中提到过，以强制执行独占访问。
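- en: |-
    The idea behind CAS-based exclusive access can be sketched in userspace. This is only an analogy, assuming C11 `<stdatomic.h>` as a stand-in for the kernel's own atomic helpers; the names `device_state`, `try_acquire` and `release` are hypothetical.

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdio.h>

    enum { DEVICE_FREE = 0, DEVICE_BUSY = 1 };

    /* Hypothetical shared flag guarding a single-user resource. */
    static atomic_int device_state = DEVICE_FREE;

    /* Returns 1 if we took ownership, 0 if somebody else already holds it. */
    static int try_acquire(void)
    {
        int expected = DEVICE_FREE;
        /* Swap FREE -> BUSY only if the flag is still FREE, in one atomic step. */
        return atomic_compare_exchange_strong(&device_state, &expected, DEVICE_BUSY);
    }

    static void release(void)
    {
        atomic_store(&device_state, DEVICE_FREE);
    }

    int main(void)
    {
        printf("first open:  %d\n", try_acquire());
        printf("second open: %d\n", try_acquire());
        release();
        printf("after close: %d\n", try_acquire());
        release();
        return 0;
    }
    ```

    The check and the update happen as one indivisible operation, so two callers can never both see the resource as free.
  prefs: []
  type: TYPE_NORMAL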
- en: '[PRE76]'
  id: totrans-411
  prefs: []
  type: TYPE_PRE
  zh: '[PRE76]'
- en: '[PRE77]'
  id: totrans-412
  prefs: []
  type: TYPE_PRE
  zh: '[PRE77]'
- en: '[PRE78]'
  id: totrans-413
  prefs: []
  type: TYPE_PRE
  zh: '[PRE78]'
- en: 10 System Calls
  id: totrans-414
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 10 系统调用
- en: So far, the only thing we’ve done is use well-defined kernel mechanisms
    to register /proc files and device handlers. This is fine if you want to do something
    the kernel programmers thought you’d want, such as writing a device driver. But
    what if you want to do something unusual, to change the behavior of the system
    in some way? Then, you are mostly on your own.
  id: totrans-415
  prefs: []
  type: TYPE_NORMAL
  zh: 到目前为止，我们唯一做的事情是使用定义良好的内核机制来注册/proc文件和设备处理程序。如果你只想做内核程序员认为你会想做的事情，比如编写设备驱动程序，这是可以的。但如果你想做些不同寻常的事情，以某种方式改变系统的行为呢？那么，你基本上是孤军奋战。
- en: Note that this example has not worked since Linux v6.9\. Specifically,
    after this [commit](https://github.com/torvalds/linux/commit/1e3ad78334a69b36e107232e337f9d693dcc9df2#diff-4a16bf89a09b4f49669a30d54540f0b936ea0224dc6ee9edfa7700deb16c3e11R52),
    system calls are no longer dispatched through an indirect function
    call table but through a switch statement. The change was made to address security issues such as the Branch History Injection
    (BHI) attack. See more information [here](https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2060909).
  id: totrans-416
  prefs: []
  type: TYPE_NORMAL
  zh: 注意，这个例子自Linux v6.9以来就不可用。具体来说，在这次[提交](https://github.com/torvalds/linux/commit/1e3ad78334a69b36e107232e337f9d693dcc9df2#diff-4a16bf89a09b4f49669a30d54540f0b936ea0224dc6ee9edfa7700deb16c3e11R52)之后，由于系统调用表从间接函数调用表更改为用于安全问题的开关语句（例如分支历史注入攻击），因此不可用。更多信息请参阅[这里](https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2060909)。
- en: Should one choose not to use a virtual machine, kernel programming can become
    risky. For example, while writing the code below, the `open()` system call was
    inadvertently disrupted. This resulted in an inability to open any files, run
    programs, or shut down the system, necessitating a restart of the virtual machine.
    Fortunately, no critical files were lost in this instance. However, if such modifications
    were made on a live, mission-critical system, the consequences could be severe.
    To mitigate the risk of file loss, even in a test environment, it is advised to
    execute `sync` right before using `insmod` and `rmmod` .
  id: totrans-417
  prefs: []
  type: TYPE_NORMAL
  zh: 如果选择不使用虚拟机，内核编程可能会变得危险。例如，在编写以下代码时，`open()` 系统调用意外中断。这导致无法打开任何文件、运行程序或关闭系统，需要重启虚拟机。幸运的是，这次没有丢失任何关键文件。然而，如果在实时、关键任务系统中进行此类修改，后果可能非常严重。为了降低文件丢失的风险，即使在测试环境中，建议在执行
    `insmod` 和 `rmmod` 之前立即执行 `sync`。
- en: Forget about /proc files, forget about device files. They are just minor details.
    Minutiae in the vast expanse of the universe. The real process-to-kernel communication
    mechanism, the one used by all processes, is system calls. When a process requests
    a service from the kernel (such as opening a file, forking to a new process, or
    requesting more memory), this is the mechanism used. If you want to change the
    behaviour of the kernel in interesting ways, this is the place to do it. By the
    way, if you want to see which system calls a program uses, run `strace <arguments>`.
  id: totrans-418
  prefs: []
  type: TYPE_NORMAL
  zh: 忘记/proc文件，忘记设备文件。它们只是细节。在广阔的宇宙中微不足道。真正的进程与内核通信机制，所有进程都使用的，是系统调用。当进程从内核请求服务（如打开文件、创建新进程或请求更多内存）时，这就是使用的机制。如果你想以有趣的方式改变内核的行为，这就是你要做的。顺便说一句，如果你想查看程序使用的系统调用，请运行
    `strace <arguments>`。
- en: In general, a process is not supposed to be able to access the kernel. It
    cannot access kernel memory and it cannot call kernel functions. The hardware of the
    CPU enforces this (that is the reason why it is called “protected mode” or “page
    protection”).
  id: totrans-419
  prefs: []
  type: TYPE_NORMAL
  zh: 通常，进程不应该能够访问内核。它不能访问内核内存，也不能调用内核函数。CPU的硬件强制执行这一点（这就是为什么它被称为“保护模式”或“页面保护”）。
- en: System calls are an exception to this general rule. What happens is that the
    process fills the registers with the appropriate values and then executes a special
    instruction which jumps to a previously defined location in the kernel (of course,
    that location is readable by user processes, but it is not writable by them). On
    Intel CPUs, this is done by means of interrupt 0x80\. The hardware knows that
    once you jump to this location, you are no longer running in restricted user mode,
    but as the operating system kernel, and therefore you’re allowed to do whatever
    you want.
  id: totrans-420
  prefs: []
  type: TYPE_NORMAL
  zh: 系统调用是这一通用规则的例外。发生的情况是，进程将寄存器填充为适当的值，然后调用一个特殊指令，该指令跳转到内核中预先定义的位置（当然，该位置对用户进程是可读的，但对它们是不可写的）。在Intel
    CPU上，这是通过中断0x80来完成的。硬件知道一旦你跳转到这个位置，你就不再在受限用户模式下运行，而是作为操作系统内核——因此你可以做任何你想做的事情。
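- en: |-
    From userspace, the "fill the registers, then trap into the kernel" step can be seen through the C library's generic `syscall()` wrapper. The sketch below assumes a Linux system with glibc; the wrapper, not this code, chooses the actual trap instruction for the platform.

    ```c
    #define _GNU_SOURCE
    #include <assert.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
        /* Pass the system call number explicitly instead of using getpid(). */
        long raw = syscall(SYS_getpid);
        pid_t wrapped = getpid();

        printf("raw syscall: %ld, libc wrapper: %d\n", raw, (int)wrapped);
        return raw == (long)wrapped ? 0 : 1;
    }
    ```

    Both calls reach the same kernel service. The only difference is who supplies the system call number.
  prefs: []
  type: TYPE_NORMAL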
- en: The location in the kernel a process can jump to is called `system_call`. The
    procedure at that location checks the system call number, which tells the kernel
    what service the process requested. Then, it looks at the table of system calls
    (`sys_call_table`) to find the address of the kernel function to call. Then it
    calls the function and, after it returns, does a few system checks and then returns
    to the process (or to a different process, if the process time ran out).
    If you want to read this code, it is in the source file arch/$(architecture)/kernel/entry.S,
    after the line `ENTRY(system_call)`.
  id: totrans-421
  prefs: []
  type: TYPE_NORMAL
  zh: 进程可以跳转到的内核中的位置称为`system_call`。该位置的过程检查系统调用号，这告诉内核进程请求了什么服务。然后，它查看系统调用表（`sys_call_table`），以查看要调用的内核函数的地址。然后它调用该函数，并在返回后执行一些系统检查，然后返回到进程（或者如果进程时间耗尽，返回到不同的进程）。如果你想阅读这段代码，它位于源文件`arch/$(architecture)/kernel/entry.S`中，在`ENTRY(system_call)`行之后。
- en: So, if we want to change the way a certain system call works, what we need to
    do is to write our own function to implement it (usually by adding a bit of our
    own code, and then calling the original function) and then change the pointer
    at `sys_call_table` to point to our function. Because we might be removed later
    and we don’t want to leave the system in an unstable state, it’s important for
    `cleanup_module` to restore the table to its original state.
  id: totrans-422
  prefs: []
  type: TYPE_NORMAL
  zh: 因此，如果我们想改变某个系统调用的行为方式，我们需要编写自己的函数来实现它（通常是通过添加一些自己的代码，然后调用原始函数）并随后将`sys_call_table`中的指针改为指向我们的函数。因为我们可能会被移除，而且我们不希望留下一个不稳定的系统状态，所以对于`cleanup_module`来说，将表恢复到原始状态是很重要的。
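- en: |-
    The save, replace and restore pattern can be sketched with an ordinary array of function pointers. This is a userspace analogy only: `call_table`, `dispatch` and the other names are hypothetical, and nothing here touches the real `sys_call_table`.

    ```c
    #include <assert.h>
    #include <stdio.h>

    typedef long (*call_fn)(long arg);

    /* Stand-in for an original kernel service. */
    static long original_double(long arg) { return arg * 2; }

    enum { NR_DOUBLE = 0, NR_CALLS = 1 };
    static call_fn call_table[NR_CALLS] = { original_double };

    static call_fn saved_entry; /* original pointer, kept for restoring */
    static int hook_hits;

    /* Our replacement: do a bit of our own work, then call the original. */
    static long hooked_double(long arg)
    {
        hook_hits++;
        return saved_entry(arg);
    }

    static void hook_install(void)
    {
        saved_entry = call_table[NR_DOUBLE];
        call_table[NR_DOUBLE] = hooked_double;
    }

    static void hook_remove(void)
    {
        call_table[NR_DOUBLE] = saved_entry;
    }

    /* Look up the handler by number, as the entry code does. */
    static long dispatch(int nr, long arg)
    {
        if (nr < 0 || nr >= NR_CALLS)
            return -1;
        return call_table[nr](arg);
    }

    int main(void)
    {
        long result;

        hook_install();
        result = dispatch(NR_DOUBLE, 21);
        printf("result=%ld hook_hits=%d\n", result, hook_hits);
        hook_remove();
        printf("restored=%d\n", call_table[NR_DOUBLE] == original_double);
        return 0;
    }
    ```

    Forgetting `hook_remove()` would leave the table pointing at code that is about to disappear, which is exactly why the cleanup function must restore the saved pointer.
  prefs: []
  type: TYPE_NORMAL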
- en: To modify the content of `sys_call_table`, we need to consider the control
    register. A control register is a processor register that changes or controls
    the general behavior of the CPU. For x86 architecture, the cr0 register has various
    control flags that modify the basic operation of the processor. The WP flag in
    cr0 stands for write protection. Once the WP flag is set, the processor disallows
    further write attempts to the read-only sections. Therefore, we must disable the
    WP flag before modifying `sys_call_table`. Since Linux v5.3, the `write_cr0`
    function cannot be used for this, because the kernel pins the sensitive cr0 bits
    for security reasons, so that an attacker cannot write to the CPU control registers to disable CPU protections
    like write protection. As a result, we have to provide a custom assembly routine
    to bypass it.
  id: totrans-423
  prefs: []
  type: TYPE_NORMAL
  zh: 要修改`sys_call_table`的内容，我们需要考虑控制寄存器。控制寄存器是处理器寄存器，它改变或控制CPU的一般行为。对于x86架构，cr0寄存器有各种控制标志，可以修改处理器的基本操作。cr0中的WP标志代表写保护。一旦WP标志被设置，处理器将不允许进一步的写入尝试到只读部分。因此，在修改`sys_call_table`之前，我们必须禁用WP标志。由于Linux
    v5.3以来，`write_cr0`函数不能使用，因为敏感的cr0位被安全问题固定，攻击者可能写入CPU控制寄存器来禁用CPU保护，如写保护。因此，我们必须提供定制的汇编例程来绕过它。
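- en: |-
    The bit arithmetic involved is small. On x86 the WP flag is bit 16 of cr0, so clearing and setting it are a mask operation each. The sketch below works on a plain integer in userspace; the sample value is chosen for illustration and is not read from hardware, and the helper names are hypothetical.

    ```c
    #include <assert.h>
    #include <stdio.h>

    #define CR0_WP (1UL << 16) /* Write Protect is bit 16 of cr0 on x86 */

    static unsigned long wp_cleared(unsigned long cr0) { return cr0 & ~CR0_WP; }
    static unsigned long wp_set(unsigned long cr0) { return cr0 | CR0_WP; }

    int main(void)
    {
        unsigned long cr0 = 0x80050033UL; /* sample value, for illustration only */
        unsigned long unprotected = wp_cleared(cr0);

        printf("original:   %#lx\n", cr0);
        printf("WP cleared: %#lx\n", unprotected);
        printf("WP set:     %#lx\n", wp_set(unprotected));
        return 0;
    }
    ```

    In a real module the value would be read from and written back to the register around the table update, and the flag would be set again immediately afterwards.
  prefs: []
  type: TYPE_NORMAL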
- en: However, the `sys_call_table` symbol is unexported to prevent misuse. There
    are still a few ways to get it, namely manual symbol lookup and `kallsyms_lookup_name`.
    Here we use one or the other, depending on the kernel version.
  id: totrans-424
  prefs: []
  type: TYPE_NORMAL
  zh: 然而，`sys_call_table`符号未导出，以防止误用。但仍有几种方法可以获取该符号，即手动符号查找和`kallsyms_lookup_name`。在这里，我们根据内核版本选用其中一种。
- en: 'Control-flow integrity is a technique that prevents an attacker from
    redirecting code execution, by making sure that indirect calls
    go to the expected addresses and that return addresses are not changed. Since Linux
    v5.7, the kernel has been patched with a series of control-flow enforcement (CET) changes for x86,
    and some configurations of GCC, like GCC versions 9 and 10 in Ubuntu Linux,
    enable CET (the -fcf-protection option) by default. Using such a
    GCC to compile the kernel with retpoline off may result in CET being enabled in
    the kernel. You can use the following command to check whether the -fcf-protection
    option is enabled:'
  id: totrans-425
  prefs: []
  type: TYPE_NORMAL
  zh: 由于控制流完整性（一种防止攻击者重定向执行代码的技术），以确保间接调用到达预期的地址，并且返回地址没有被更改。自Linux v5.7以来，内核修补了针对x86的控制流强制（CET）系列，并且GCC的一些配置，如Ubuntu
    Linux中的GCC版本9和10，默认会在内核中添加CET（-fcf-protection选项）。使用该GCC编译内核并关闭retpoline可能会导致内核中启用CET。您可以使用以下命令检查是否启用了-fcf-protection选项：
- en: '[PRE79]'
  id: totrans-426
  prefs: []
  type: TYPE_PRE
  zh: '[PRE79]'
- en: However, CET should not be enabled in the kernel, because it may break Kprobes and bpf.
    Consequently, CET has been disabled since v5.11\. To guarantee that the manual symbol lookup
    works, we only use it up to v5.4.
  id: totrans-427
  prefs: []
  type: TYPE_NORMAL
  zh: 但是在内核中不应启用CET（Control Flow Enforcement Technology），它可能会破坏Kprobes和bpf。因此，自v5.11版本以来，CET已被禁用。为了保证手动符号查找功能正常工作，我们只使用到v5.4版本。
- en: Unfortunately, since Linux v5.7 `kallsyms_lookup_name` is also unexported, so
    a trick is needed to get the address of `kallsyms_lookup_name`. If `CONFIG_KPROBES`
    is enabled, we can retrieve function addresses by means of
    Kprobes, which dynamically break into a specific kernel routine. Kprobes inserts
    a breakpoint at the entry of a function by replacing the first bytes of the probed
    instruction. When a CPU hits the breakpoint, registers are stored, and control
    passes to Kprobes. It passes the addresses of the saved registers and the Kprobe
    struct to the handler you defined, then executes it. Kprobes can be registered
    by symbol name or address. When a symbol name is given, the kernel resolves the
    address.
  id: totrans-428
  prefs: []
  type: TYPE_NORMAL
  zh: 不幸的是，由于Linux v5.7 `kallsyms_lookup_name` 也未导出，需要一定的技巧来获取`kallsyms_lookup_name`的地址。如果启用了`CONFIG_KPROBES`，我们可以通过Kprobes动态中断特定的内核例程来方便地检索函数地址。Kprobes通过替换被探测指令的第一字节在函数入口处插入一个断点。当CPU遇到断点时，寄存器被存储，控制权传递给Kprobes。它将保存的寄存器地址和Kprobe结构传递给您定义的处理程序，然后执行它。Kprobes可以通过符号名称或地址进行注册。在符号名称中，地址将由内核处理。
- en: 'Otherwise, pass the address of `sys_call_table`, taken from /proc/kallsyms or
    /boot/System.map, in the `sym` parameter. The following is sample usage for /proc/kallsyms:'
  id: totrans-429
  prefs: []
  type: TYPE_NORMAL
  zh: 否则，请从/proc/kallsyms和/boot/System.map中指定`sys_call_table`的地址到`sym`参数中。以下是从/proc/kallsyms的示例用法：
- en: '[PRE80]'
  id: totrans-430
  prefs: []
  type: TYPE_PRE
  zh: '[PRE80]'
- en: When using the address from /boot/System.map, be careful about KASLR (Kernel Address
    Space Layout Randomization). KASLR may randomize the address of kernel code and
    data at every boot, so the static addresses listed in /boot/System.map
    will be shifted by a random offset. The purpose of KASLR is to protect the kernel space
    from attackers. Without KASLR, an attacker can easily find the target at
    its fixed address. The attacker can then use return-oriented programming
    to execute malicious code or to read the target data through a tampered
    pointer. KASLR mitigates these kinds of attacks because the attacker cannot immediately
    know the target address, but a brute-force attack can still work. If the address
    of a symbol in /proc/kallsyms is different from the address in /boot/System.map,
    KASLR is enabled in the kernel your system is running.
  id: totrans-431
  prefs: []
  type: TYPE_NORMAL
  zh: 使用/boot/System.map中的地址时，请注意KASLR（内核地址空间布局随机化）。KASLR可能会在每次启动时随机化内核代码和数据地址，例如，/boot/System.map中列出的静态地址将偏移一定的熵。KASLR的目的是为了保护内核空间免受攻击者攻击。如果没有KASLR，攻击者可以轻易地找到固定地址中的目标地址。然后攻击者可以使用返回导向编程插入一些恶意代码来执行或通过篡改的指针接收目标数据。KASLR通过攻击者无法立即知道目标地址来减轻这类攻击。如果/proc/kallsyms中符号的地址与/boot/System.map中的地址不同，则表示内核启用了KASLR，您正在运行的系统就是这种情况。
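- en: |-
    The comparison between the two files can be sketched as follows. Both files use an "address type name" line layout, which the helper below parses. The two sample lines are made up for illustration; real addresses differ per kernel build and per boot, and /proc/kallsyms may show zeroed addresses to unprivileged users. The helper name `parse_symbol_line` is hypothetical.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Parse one "address type name" line. Returns 1 and stores the address
     * if the line describes `symbol`, otherwise returns 0. */
    static int parse_symbol_line(const char *line, const char *symbol,
                                 unsigned long long *addr)
    {
        unsigned long long value;
        char type;
        char name[128];

        if (sscanf(line, "%llx %c %127s", &value, &type, name) != 3)
            return 0;
        if (strcmp(name, symbol) != 0)
            return 0;
        *addr = value;
        return 1;
    }

    int main(void)
    {
        /* Made-up sample lines standing in for the two files. */
        const char *system_map_line = "ffffffff82000280 R sys_call_table";
        const char *kallsyms_line = "ffffffff9a000280 R sys_call_table";
        unsigned long long fixed_addr, runtime_addr;

        if (!parse_symbol_line(system_map_line, "sys_call_table", &fixed_addr) ||
            !parse_symbol_line(kallsyms_line, "sys_call_table", &runtime_addr))
            return 1;

        if (fixed_addr == runtime_addr)
            printf("addresses match: KASLR appears to be off\n");
        else
            printf("addresses differ by %#llx: KASLR appears to be on\n",
                   runtime_addr - fixed_addr);
        return 0;
    }
    ```

    A non-zero difference is the random offset applied at this boot, which is why an address copied from /boot/System.map cannot be used directly while KASLR is active.
  prefs: []
  type: TYPE_NORMAL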
- en: '[PRE81]'
  id: totrans-432
  prefs: []
  type: TYPE_PRE
  zh: '[PRE81]'
- en: 'If KASLR is enabled, we have to look up the address in /proc/kallsyms again
    each time we reboot the machine. In order to use the address from /boot/System.map,
    make sure that KASLR is disabled. You can add the nokaslr boot parameter to disable KASLR
    at the next boot:'
  id: totrans-433
  prefs: []
  type: TYPE_NORMAL
  zh: 如果启用了KASLR（Kernel Address Space Layout Randomization），每次重启机器时，我们都必须注意/proc/kallsyms中的地址。为了使用/boot/System.map中的地址，请确保KASLR已禁用。您可以在下一次启动时添加nokaslr来禁用KASLR：
- en: '[PRE82]'
  id: totrans-434
  prefs: []
  type: TYPE_PRE
  zh: '[PRE82]'
- en: 'For more information, check out the following:'
  id: totrans-435
  prefs: []
  type: TYPE_NORMAL
  zh: 更多信息，请参阅以下内容：
- en: '[Cook: Security things in Linux v5.3](https://lwn.net/Articles/804849/)'
  id: totrans-436
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '[Cook: Linux v5.3 中的安全事项](https://lwn.net/Articles/804849/)'
- en: '[Unexporting the system call table](https://lwn.net/Articles/12211/)'
  id: totrans-437
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '[取消导出系统调用表](https://lwn.net/Articles/12211/)'
- en: '[Control-flow integrity for the kernel](https://lwn.net/Articles/810077/)'
  id: totrans-438
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '[内核的控制流完整性](https://lwn.net/Articles/810077/)'
- en: '[Unexporting kallsyms_lookup_name()](https://lwn.net/Articles/813350/)'
  id: totrans-439
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '[取消导出 kallsyms_lookup_name()](https://lwn.net/Articles/813350/)'
- en: '[Kernel Probes (Kprobes)](https://www.kernel.org/doc/Documentation/kprobes.txt)'
  id: totrans-440
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '[内核探针 (Kprobes)](https://www.kernel.org/doc/Documentation/kprobes.txt)'
- en: '[Kernel address space layout randomization](https://lwn.net/Articles/569635/)'
  id: totrans-441
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '[内核地址空间布局随机化](https://lwn.net/Articles/569635/)'
- en: The source code here is an example of such a kernel module. We want to “spy”
    on a certain user, and to `pr_info()` a message whenever that user opens a file.
    Towards this end, we replace the system call to open a file with our own function,
    called `our_sys_openat` . This function checks the uid (user’s id) of the current
    process, and if it is equal to the uid we spy on, it calls `pr_info()` to display
    the name of the file to be opened. Then, either way, it calls the original `openat()`
    function with the same parameters, to actually open the file.
  id: totrans-442
  prefs: []
  type: TYPE_NORMAL
  zh: 这里提供的源代码是一个这样的内核模块示例。我们想要“监视”某个特定的用户，并且每当该用户打开文件时，就使用 `pr_info()` 显示一条消息。为此，我们用我们自己的函数替换打开文件的系统调用，该函数称为
    `our_sys_openat`。这个函数检查当前进程的 uid（用户 ID），如果它与我们要监视的 uid 相等，它就调用 `pr_info()` 显示要打开的文件名。然后，无论如何，它都使用相同的参数调用原始的
    `openat()` 函数，以实际打开文件。
- en: The `init_module` function replaces the appropriate location in `sys_call_table`
    and keeps the original pointer in a variable. The `cleanup_module` function uses
    that variable to restore everything back to normal. This approach is dangerous,
    because of the possibility of two kernel modules changing the same system call.
    Imagine we have two kernel modules, A and B. A’s openat system call will be `A_openat`
    and B’s will be `B_openat` . Now, when A is inserted into the kernel, the system
    call is replaced with `A_openat` , which will call the original `sys_openat` when
    it is done. Next, B is inserted into the kernel, which replaces the system call
    with `B_openat` , which will call what it thinks is the original system call,
    `A_openat` , when it’s done.
  id: totrans-443
  prefs: []
  type: TYPE_NORMAL
  zh: '`init_module` 函数替换了 `sys_call_table` 中的适当位置，并将原始指针保存在一个变量中。`cleanup_module`
    函数使用该变量将一切恢复到正常状态。这种方法很危险，因为可能有两个内核模块更改相同的系统调用。想象一下，我们有两个内核模块，A 和 B。A 的 openat
    系统调用将是 `A_openat`，而 B 的将是 `B_openat`。现在，当 A 被插入内核时，系统调用被替换为 `A_openat`，完成后将调用原始的
    `sys_openat`。接下来，B 被插入内核，它将系统调用替换为 `B_openat`，完成后将调用它认为的原始系统调用，即 `A_openat`。'
- en: Now, if B is removed first, everything will be well — it will simply restore
    the system call to `A_openat` , which calls the original. However, if A is removed
    and then B is removed, the system will crash. A’s removal will restore the system
    call to the original, `sys_openat` , cutting B out of the loop. Then, when B is
    removed, it will restore the system call to what it thinks is the original, `A_openat`
    , which is no longer in memory. At first glance, it appears we could solve this
    particular problem by checking if the system call is equal to our open function
    and if so not changing it at all (so that B won’t change the system call when
    it is removed), but that will cause an even worse problem. When A is removed,
    it sees that the system call was changed to `B_openat` so that it is no longer
    pointing to `A_openat` , so it will not restore it to `sys_openat` before it is
    removed from memory. Unfortunately, `B_openat` will still try to call `A_openat`
    which is no longer there, so that even without removing B the system would crash.
  id: totrans-444
  prefs: []
  type: TYPE_NORMAL
  zh: 现在，如果首先移除 B，一切都会好——它将简单地恢复系统调用到 `A_openat`，这将调用原始的。然而，如果先移除 A，然后移除 B，系统将崩溃。A
    的移除将恢复系统调用到原始的 `sys_openat`，将 B 排除在循环之外。然后，当 B 被移除时，它将恢复系统调用到它认为的原始，即 `A_openat`，但这个调用已经不在内存中了。乍一看，我们似乎可以通过检查系统调用是否等于我们的
    open 函数，如果是，则完全不更改它（这样 B 在移除时就不会更改系统调用），但这将导致更糟糕的问题。当 A 被移除时，它看到系统调用已被更改为 `B_openat`，因此它不再指向
    `A_openat`，所以在从内存中移除之前不会将其恢复到 `sys_openat`。不幸的是，`B_openat` 仍然会尝试调用不再存在的 `A_openat`，因此即使没有移除
    B，系统也会崩溃。
- en: On x86, since v6.9 the system call table can no longer be used to invoke a system
    call, following commit [1e3ad78](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1e3ad78334a69b36e107232e337f9d693dcc9df2).
    This commit has also been backported to long-term stable kernels, such as
    v5.15.154+, v6.1.85+, v6.6.26+ and v6.8.5+; see this [answer](https://stackoverflow.com/a/78607015)
    for more details. In this case, thanks to Kprobes, a hook on the system call entry
    can be used instead to intercept the system call.
  id: totrans-445
  prefs: []
  type: TYPE_NORMAL
  zh: 对于x86架构，自v6.9起，在提交[1e3ad78](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1e3ad78334a69b36e107232e337f9d693dcc9df2)之后，系统调用表不能再用于调用系统调用。这个提交已经向后移植到长期稳定的内核，如v5.15.154+、v6.1.85+、v6.6.26+和v6.8.5+，更多详情请参阅这个[回答](https://stackoverflow.com/a/78607015)。在这种情况下，多亏了Kprobes，可以在系统调用入口处使用钩子来拦截系统调用。
- en: Note that all the related problems make syscall stealing unfeasible for production
    use. In order to keep people from doing potentially harmful things, `sys_call_table`
    is no longer exported. This means that if you want to do something more than a mere
    dry run of this example, you will have to patch your current kernel in order to
    have `sys_call_table` exported.
  id: totrans-446
  prefs: []
  type: TYPE_NORMAL
  zh: 注意，所有相关的问题使得syscall stealing在生产环境中不可行。为了防止人们做可能有害的事情，`sys_call_table`不再导出。这意味着，如果你想做一些不仅仅是这个例子简单运行的事情，你将不得不修补你的当前内核以导出`sys_call_table`。
- en: '[PRE83]'
  id: totrans-447
  prefs: []
  type: TYPE_PRE
  zh: '[PRE83]'
- en: 11 Blocking Processes and threads
  id: totrans-448
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 11 阻塞进程和线程
- en: 11.1 Sleep
  id: totrans-449
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 11.1 睡眠
- en: 'What do you do when somebody asks you for something you cannot do right away?
    If you are a human being and you are bothered by a human being, the only thing
    you can say is: "Not right now, I’m busy. Go away!". But if you are a kernel module
    and you are bothered by a process, you have another possibility. You can put the
    process to sleep until you can service it. After all, processes are being put
    to sleep by the kernel and woken up all the time (that is the way multiple processes
    appear to run at the same time on a single CPU).'
  id: totrans-450
  prefs: []
  type: TYPE_NORMAL
  zh: 当有人向你请求你无法立即完成的事情时，你会怎么做？如果你是人，并且被另一个人打扰，你能说的唯一一件事就是：“现在不行，我正忙。请走开！”但如果你是一个内核模块，并且被一个进程打扰，你还有另一种可能性。你可以将进程置于睡眠状态，直到你可以服务它。毕竟，进程是由内核置入睡眠状态并随时唤醒的（这就是为什么多个进程似乎可以在单个CPU上同时运行的原因）。
- en: 'This kernel module is an example of this. The file (called /proc/sleep) can
    only be opened by a single process at a time. If the file is already open, the
    kernel module calls `wait_event_interruptible` . The easiest way to keep a file
    open is to open it with:'
  id: totrans-451
  prefs: []
  type: TYPE_NORMAL
  zh: 这个内核模块是这个例子。文件（称为/proc/sleep）一次只能由一个进程打开。如果文件已经打开，内核模块会调用`wait_event_interruptible`。保持文件打开的最简单方法是使用以下方式打开它：
- en: '[PRE84]'
  id: totrans-452
  prefs: []
  type: TYPE_PRE
  zh: '[PRE84]'
- en: This function changes the status of the task (a task is the kernel data structure
    which holds information about a process and the system call it is in, if any)
    to `TASK_INTERRUPTIBLE` , which means that the task will not run until it is woken
    up somehow, and adds it to WaitQ, the queue of tasks waiting to access the file.
    Then, the function calls the scheduler to context switch to a different process,
    one which has some use for the CPU.
  id: totrans-453
  prefs: []
  type: TYPE_NORMAL
  zh: 这个函数将任务的状态（任务是一个内核数据结构，它包含有关进程及其（如果有的话）正在进行的系统调用的信息）更改为`TASK_INTERRUPTIBLE`，这意味着任务将不会运行，直到以某种方式被唤醒，并将其添加到等待队列中，即等待访问文件的队列。然后，该函数调用调度器以进行上下文切换到另一个进程，该进程对CPU有一些用途。
- en: When a process is done with the file, it closes it, and `module_close` is called.
    That function wakes up all the processes in the queue (there’s no mechanism to
    only wake up one of them). It then returns and the process which just closed the
    file can continue to run. In time, the scheduler decides that that process has
    had enough and gives control of the CPU to another process. Eventually, one of
    the processes which was in the queue will be given control of the CPU by the scheduler.
    It starts at the point right after the call to `wait_event_interruptible` .
  id: totrans-454
  prefs: []
  type: TYPE_NORMAL
  zh: 当一个进程完成对文件的访问后，它会关闭它，并调用`module_close`函数。该函数唤醒队列中的所有进程（没有机制可以只唤醒其中一个）。然后它返回，刚刚关闭文件的进程可以继续运行。随着时间的推移，调度器决定该进程已经足够了，并将CPU的控制权交给另一个进程。最终，队列中的一个进程将由调度器获得CPU的控制权。它从`wait_event_interruptible`调用之后的点开始执行。
- en: This means that the process is still in kernel mode - as far as the process
    is concerned, it issued the open system call and the system call has not returned
    yet. The process does not know somebody else used the CPU for most of the time
    between the moment it issued the call and the moment it returned.
  id: totrans-455
  prefs: []
  type: TYPE_NORMAL
  zh: 这意味着进程仍然在内核模式下——就进程而言，它发出了打开系统调用，而系统调用尚未返回。进程不知道在它发出调用和返回之间的大部分时间里，有其他进程使用了CPU。
- en: It can then proceed to set a global variable to tell all the other processes
    that the file is still open and go on with its life. When the other processes
    get a piece of the CPU, they’ll see that global variable and go back to sleep.
  id: totrans-456
  prefs: []
  type: TYPE_NORMAL
  zh: 然后，它可以继续设置一个全局变量来告诉所有其他进程文件仍然打开，并继续其生命周期。当其他进程获得CPU的一部分时，他们会看到这个全局变量，然后再次进入睡眠状态。
- en: So we will use `tail -f` to keep the file open in the background, and attempt
    to access it with another background process. This way, we don’t need to switch
    to another terminal window or virtual terminal to run the second process. As soon
    as the first background process is killed with kill %1 , the second is woken up,
    is able to access the file and finally terminates.
  id: totrans-457
  prefs: []
  type: TYPE_NORMAL
  zh: 因此，我们将使用`tail -f`来在后台保持文件打开，并尝试用另一个后台进程访问它。这样，我们就不需要切换到另一个终端窗口或虚拟终端来运行第二个进程。一旦第一个后台进程被kill
    %1杀死，第二个进程就会被唤醒，能够访问文件，并最终终止。
- en: To make our life more interesting, `module_close` does not have a monopoly on
    waking up the processes which wait to access the file. A signal, such as
    Ctrl+c (SIGINT), can also wake up a process. This is because we used `wait_event_interruptible`.
    We could have used `wait_event` instead, but that would have resulted in extremely
    angry users whose Ctrl+c’s are ignored.
  id: totrans-458
  prefs: []
  type: TYPE_NORMAL
  zh: 为了让我们的生活更有趣，`module_close`并不独占唤醒等待访问文件的进程。一个信号，比如Ctrl + c（SIGINT），也可以唤醒一个进程。这是因为我们使用了`wait_event_interruptible`。我们本可以使用`wait_event`，但那样会导致用户非常愤怒，因为他们的Ctrl+c被忽略了。
- en: In that case, we want to return with `-EINTR` immediately. This is important
    so users can, for example, kill the process before it receives the file.
  id: totrans-459
  prefs: []
  type: TYPE_NORMAL
  zh: 在那种情况下，我们希望立即返回`-EINTR`。这很重要，这样用户可以在进程收到文件之前杀死它。
- en: There is one more point to remember. Sometimes processes don’t want to sleep;
    they want either to get what they want immediately, or to be told it cannot be
    done. Such processes use the `O_NONBLOCK` flag when opening the file. The kernel
    is supposed to respond by returning the error code `-EAGAIN` from operations
    which would otherwise block, such as opening the file in this example. The program
    `cat_nonblock`, available in the examples/other directory, can be used to open
    a file with `O_NONBLOCK`.
  id: totrans-460
  prefs: []
  type: TYPE_NORMAL
  zh: 还有一点需要记住。有时进程不想睡眠，它们要么想要立即得到它们想要的，要么被告知无法完成。这类进程在打开文件时使用`O_NONBLOCK`标志。内核应该通过返回错误代码`-EAGAIN`来响应，这些操作在其他情况下会阻塞，例如在这个例子中打开文件。可以在examples/other目录中找到的`cat_nonblock`程序可以用来以`O_NONBLOCK`打开一个文件。
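- en: |-
    What a non-blocking caller sees can be shown without any kernel module. In this sketch, which assumes a POSIX system, an empty pipe stands in for a resource that is not ready: a blocking read would sleep, while the non-blocking read fails at once. The helper name is hypothetical. Note that the kernel returns `-EAGAIN`, which the C library hands to the program as a return value of -1 with `errno` set to `EAGAIN`.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Returns the errno produced by reading an empty non-blocking pipe,
     * or 0 if the read unexpectedly did not fail. */
    static int read_empty_nonblocking_pipe(void)
    {
        int fds[2];
        char buf[16];
        int flags;
        int saved_errno = 0;

        if (pipe(fds) != 0)
            return errno;

        flags = fcntl(fds[0], F_GETFL);
        if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1)
            saved_errno = errno;
        else if (read(fds[0], buf, sizeof(buf)) == -1)
            saved_errno = errno; /* nothing to read and we refused to wait */

        close(fds[0]);
        close(fds[1]);
        return saved_errno;
    }

    int main(void)
    {
        int err = read_empty_nonblocking_pipe();

        printf("read failed with: %s\n", strerror(err));
        return err == EAGAIN ? 0 : 1;
    }
    ```

    A program that gets `EAGAIN` is expected to retry later or do something else in the meantime, rather than treat it as a hard failure.
  prefs: []
  type: TYPE_NORMAL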
- en: '[PRE85]'
  id: totrans-461
  prefs: []
  type: TYPE_PRE
  zh: '[PRE85]'
- en: '[PRE86]'
  id: totrans-462
  prefs: []
  type: TYPE_PRE
  zh: '[PRE86]'
- en: '[PRE87]'
  id: totrans-463
  prefs: []
  type: TYPE_PRE
  zh: '[PRE87]'
- en: 11.2 Completions
  id: totrans-464
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 11.2 完成操作
- en: Sometimes one thing should happen before another within a module having multiple
    threads. Rather than using `/bin/sleep` commands, the kernel has another way to
    do this which allows timeouts or interrupts to also happen.
  id: totrans-465
  prefs: []
  type: TYPE_NORMAL
  zh: 有时在具有多个线程的模块中，一件事情应该在另一件事情之前发生。与其使用`/bin/sleep`命令，内核还有另一种方法来做这件事，这允许超时或中断也发生。
- en: As a code synchronization mechanism, completions have three main parts, namely initialization
    of the struct completion synchronization object, the waiting or barrier part through
    `wait_for_completion()`, and the signalling side through a call to `complete()`.
  id: totrans-466
  prefs: []
  type: TYPE_NORMAL
  zh: 完成操作作为代码同步机制有三个主要部分：结构体完成同步对象的初始化，通过`wait_for_completion()`的等待或屏障部分，以及通过调用`complete()`的信号部分。
- en: 'In the subsequent example, two threads are initiated: crank and flywheel. It
    is imperative that the crank thread starts before the flywheel thread. A completion
    state is established for each of these threads, with a distinct completion defined
    for both the crank and flywheel threads. At the exit point of each thread the
    respective completion state is updated, and `wait_for_completion` is used by the
    flywheel thread to ensure that it does not begin prematurely. The crank thread
    uses the `complete_all()` function to update the completion, which lets the flywheel
    thread continue.'
  id: totrans-467
  prefs: []
  type: TYPE_NORMAL
  zh: 在后续的示例中，启动了两个线程：crank和flywheel。必须确保crank线程在flywheel线程之前启动。为这些线程中的每一个都建立了一个完成状态，为crank和flywheel线程分别定义了不同的完成状态。在每个线程的退出点更新相应的完成状态，flywheel线程使用`wait_for_completion`来确保它不会提前开始。crank线程使用`complete_all()`函数来更新完成状态，这允许flywheel线程继续。
- en: So even though `flywheel_thread` is started first, you should notice, when you
    load this module and run `dmesg`, that turning the crank always happens first,
    because the flywheel thread waits for the crank thread to complete.
  id: totrans-468
  prefs: []
  type: TYPE_NORMAL
  zh: 因此，即使`flywheel_thread`首先启动，你应该注意当你加载此模块并运行`dmesg`时，转动曲柄总是先发生，因为flywheel线程等待crank线程完成。
- en: There are other variations of the `wait_for_completion` function, which include
    timeouts or being interrupted, but this basic mechanism is enough for many common
    situations without adding a lot of complexity.
  id: totrans-469
  prefs: []
  type: TYPE_NORMAL
  zh: '`wait_for_completion` 函数有其他变体，包括超时或被中断，但这个基本机制对于许多常见情况来说已经足够，无需增加太多复杂性。'
- en: '[PRE88]'
  id: totrans-470
  prefs: []
  type: TYPE_PRE
  zh: '[PRE88]'
- en: 12 Synchronization
  id: totrans-471
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 12 同步
- en: If processes running on different CPUs or in different threads try to access
    the same memory, then it is possible that strange things can happen or your system
    can lock up. To avoid this, various types of mutual exclusion kernel functions
    are available. These indicate if a section of code is "locked" or "unlocked" so
    that simultaneous attempts to run it can not happen.
  id: totrans-472
  prefs: []
  type: TYPE_NORMAL
  zh: 如果在不同CPU上运行或在不同线程中运行的过程尝试访问相同的内存，那么可能会发生奇怪的事情，或者你的系统可能会锁定。为了避免这种情况，有各种类型的互斥锁内核函数可用。这些函数指示代码的某个部分是“锁定”还是“未锁定”，这样就不能同时尝试运行它。
- en: 12.1 Mutex
  id: totrans-473
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 12.1 互斥锁
- en: You can use kernel mutexes (mutual exclusions) in much the same manner that
    you might deploy them in userland. This may be all that is needed to avoid collisions
    in most cases.
  id: totrans-474
  prefs: []
  type: TYPE_NORMAL
  zh: 你可以使用内核互斥锁（互斥排他）的方式，就像你可能在用户空间部署它们一样。在大多数情况下，这可能就足够避免冲突了。
- en: 'Mutexes in the Linux kernel enforce strict ownership: only the task that successfully
    acquired the mutex can release (or unlock) it. Attempting to release a mutex held
    by another task, or releasing an unheld mutex multiple times from the same task, typically
    leads to errors or undefined behavior. If a task tries to lock a mutex it already
    holds, it may block or sleep indefinitely, because the task ends up waiting for itself
    to release the lock.'
  id: totrans-475
  prefs: []
  type: TYPE_NORMAL
  zh: Linux内核中的互斥锁强制执行严格的拥有权：只有成功获取互斥锁的任务才能释放（或解锁）它。尝试释放另一个任务持有的互斥锁或同一任务多次释放未持有的互斥锁通常会导致错误或未定义的行为。如果任务尝试锁定它已经持有的互斥锁，它可能会被阻塞或休眠，此时任务等待自己释放锁。
- en: Before use, a mutex must be initialized through the dedicated APIs, either
    `mutex_init` at run time or the `DEFINE_MUTEX` macro for compile-time initialization.
    Directly modifying the internal structure of a mutex with a memory manipulation
    function such as `memset` is prohibited.
  id: totrans-476
  prefs: []
  type: TYPE_NORMAL
  zh: 在使用之前，必须通过特定的API（如`mutex_init`）或使用`DEFINE_MUTEX`宏进行编译时初始化来初始化互斥锁。并且禁止使用如`memset`这样的内存操作函数直接修改互斥锁的内部结构。
- en: '[PRE89]'
  id: totrans-477
  prefs: []
  type: TYPE_PRE
  zh: '[PRE89]'
- en: The various suffixes appended to mutex functions in the Linux kernel primarily
    dictate how a task waiting to acquire a lock will behave, particularly concerning
    its interruptibility.
  id: totrans-478
  prefs: []
  type: TYPE_NORMAL
  zh: Linux内核中附加到互斥锁函数的各种后缀主要决定了等待获取锁的任务将如何行为，特别是在可中断性方面。
- en: When a task calls `mutex_lock()` , and if the mutex is currently unavailable,
    the task enters a sleep state until it can successfully obtain the lock. During
    this period, the task cannot be interrupted. In contrast, functions with the `_interruptible`
    suffix, such as `mutex_lock_interruptible()` , behave similarly to `mutex_lock()`
    but allow the waiting process to be interrupted by signals. If a task receives
    a signal (like a termination signal) while waiting for the lock, it will exit
    the waiting state and return an error code ( `-EINTR` ). This is useful for applications
    that need to handle external events even while waiting for a lock.
  id: totrans-479
  prefs: []
  type: TYPE_NORMAL
  zh: 当一个任务调用 `mutex_lock()` 时，如果互斥锁当前不可用，该任务将进入睡眠状态，直到它成功获得锁。在此期间，任务不能被中断。相比之下，带有
    `_interruptible` 后缀的函数，例如 `mutex_lock_interruptible()`，其行为类似于 `mutex_lock()`，但允许等待进程被信号中断。如果一个任务在等待锁的过程中收到信号（如终止信号），它将退出等待状态并返回一个错误代码（`-EINTR`）。这对于需要即使在等待锁的同时处理外部事件的应用程序来说很有用。
- en: Beyond these fundamental locking behaviors, other mutex functions offer specialized
    capabilities. Functions like `mutex_lock_nested()` and `mutex_lock_interruptible_nested()`
    support nested locking, where a task deliberately acquires more than one mutex
    of the same lock class. They take an extra subclass parameter that the kernel
    lock validator (lockdep) uses to tell the nesting levels apart, which makes its
    deadlock detection more precise. The latter variant combines nested locking
    with the ability for the waiting task to be interrupted by signals. Another function
    is `mutex_trylock()` , which attempts to acquire the mutex without blocking. It
    returns 1 if the lock is successfully acquired and 0 if the mutex is already held
    by another task.
  id: totrans-480
  prefs: []
  type: TYPE_NORMAL
  zh: 除了这些基本的锁定行为之外，其他互斥锁函数还提供了专门的功能。例如，`mutex_lock_nested` 和 `mutex_lock_interruptible_nested()`
    函数结合了 `__nested()` 功能，提供了嵌套锁定的支持。这种先前的锁定机制有助于管理锁获取并防止死锁，通常使用子类参数进行更精确的死锁检测。后一种变体将嵌套锁定与等待进程可以被信号中断的能力相结合。另一个函数是
    `mutex_trylock()`，它尝试获取互斥锁而不阻塞。如果成功获取锁，则返回1；如果互斥锁已被其他任务持有，则返回0。
- en: Despite the fact that `mutex_trylock` does not sleep, it must still not be
    used in interrupt context. A mutex is owned by the task that acquired it and
    must be released by that same task, but an interrupt handler does not run on
    behalf of any particular task, so the ownership recorded for the lock would
    be meaningless.
  id: totrans-481
  prefs: []
  type: TYPE_NORMAL
  zh: 尽管`mutex_trylock`不睡眠，但由于其实现不是原子的，因此在中断上下文中通常不安全使用。如果在检查锁的可用性和获取锁之间发生中断，这可能导致竞争条件和潜在的数据损坏。
- en: 12.2 Spinlocks
  id: totrans-482
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 12.2 自旋锁
- en: As the name suggests, a CPU that is waiting for a spinlock spins in a busy
    loop, taking 100% of its resources until the lock becomes available. Because of
    this you should only use the spinlock mechanism around code which is likely to
    take no more than a few milliseconds to run and so will not noticeably slow anything
    down from the user’s point of view.
  id: totrans-483
  prefs: []
  type: TYPE_NORMAL
  zh: 如其名所示，自旋锁锁定正在运行的代码的CPU，占用其100%的资源。因此，你应该只在代码可能运行不超过几毫秒且不会明显减慢用户视角中的任何事物的情况下使用自旋锁机制。
- en: The example here is "irq safe" in that local interrupts are disabled while
    the lock is held, and the `flags` variable retains the previous interrupt state
    so that it can be restored on unlock. Interrupts that arrive in the meantime are
    not forgotten and are handled once the unlock happens.
  id: totrans-484
  prefs: []
  type: TYPE_NORMAL
  zh: 此处的示例是“中断安全”的，即如果在锁定过程中发生中断，则它们不会被遗忘，并在解锁时通过使用 `flags` 变量保留其状态激活。
- en: '[PRE90]'
  id: totrans-485
  prefs: []
  type: TYPE_PRE
  zh: '[PRE90]'
- en: Taking 100% of a CPU’s resources comes with greater responsibility. Situations
    where the kernel code monopolizes a CPU are called atomic contexts. Holding a
    spinlock is one of those situations. Sleeping in atomic contexts may leave the
    system hanging, because other CPUs may keep spinning on the lock while its holder
    sleeps. In some worse cases the system may crash. Thus, sleeping in atomic
    contexts is considered a bug in the kernel. Such bugs are sometimes called “sleep-in-atomic-context”
    bugs.
  id: totrans-486
  prefs: []
  type: TYPE_NORMAL
  zh: 占用CPU的100%资源伴随着更大的责任。内核代码垄断CPU的情况被称为原子上下文。持有自旋锁就是这种情况之一。在原子上下文中睡眠可能会导致系统挂起，因为占用的CPU将100%的资源用于无休止的睡眠。在某些更糟糕的情况下，系统可能会崩溃。因此，在原子上下文中睡眠被视为内核中的错误。在某些材料中，它们有时被称为“原子上下文中的睡眠”。
- en: Note that sleeping here is not limited to calling the sleep functions explicitly.
    If subsequent function calls eventually invoke a function that sleeps, it is also
    considered sleeping. Thus, it is important to pay attention to functions being
    used in atomic context. There’s no documentation recording all such functions,
    but code comments may help. Sometimes you may find comments in kernel source code
    stating that a function “may sleep”, “might sleep”, or more explicitly “the caller
    should not hold a spinlock”. Those comments are hints that a function may implicitly
    sleep and must not be called in atomic contexts.
  id: totrans-487
  prefs: []
  type: TYPE_NORMAL
  zh: 注意，这里的睡眠不仅限于显式调用睡眠函数。如果后续的函数调用最终调用了会睡眠的函数，这也被认为是睡眠。因此，注意在原子上下文中使用的函数非常重要。没有文档记录所有这些函数，但代码注释可能会有所帮助。有时你可能会在内核源代码中找到注释，表明一个函数“可能会睡眠”、“可能睡眠”或更明确地说“调用者不应持有自旋锁”。这些注释是提示，表明一个函数可能会隐式睡眠，并且不应在原子上下文中调用。
- en: 'Now, let’s differentiate between a few types of spinlock functions in the Linux
    kernel: `spin_lock()` , `spin_lock_irq()` , `spin_lock_irqsave()` , and `spin_lock_bh()`
    .'
  id: totrans-488
  prefs: []
  type: TYPE_NORMAL
  zh: 现在，让我们区分 Linux 内核中几种自旋锁函数的类型：`spin_lock()`、`spin_lock_irq()`、`spin_lock_irqsave()`
    和 `spin_lock_bh()`。
- en: '`spin_lock()` busy-waits for the lock instead of sleeping, which makes it
    suitable for most use cases where the critical section is short. Real-time Linux
    (`PREEMPT_RT`) is an exception, because in that configuration ordinary spinlocks
    are implemented as sleeping locks so that the lock holder can be preempted. Code
    that truly must not sleep, such as low-level interrupt handling, therefore uses
    `raw_spin_lock()` , which behaves like a classic `spin_lock()` and always spins,
    even on real-time kernels.'
  id: totrans-489
  prefs: []
  type: TYPE_NORMAL
  zh: '`spin_lock()` 不允许 CPU 在等待锁时睡眠，这使得它在临界区短的情况下大多数用例中都很适用。然而，这对于实时 Linux 来说是问题，因为这种配置下的自旋锁表现得像睡眠锁。这可能会阻止其他任务运行，并导致系统无响应。为了在实时
    Linux 环境中解决这个问题，使用了一个 `raw_spin_lock()`，它表现得像 `spin_lock()`，但不会导致系统睡眠。'
- en: On the other hand, `spin_lock_irq()` disables interrupts while holding the lock,
    but it does not save the interrupt state. The matching `spin_unlock_irq()` unconditionally
    re-enables interrupts, so if interrupts were already disabled before the lock
    was taken, they end up enabled afterwards and the caller’s state is lost. In contrast,
    `spin_lock_irqsave()` disables interrupts and also saves the interrupt state, ensuring
    that interrupts are restored to their previous state when the lock is released.
    This makes `spin_lock_irqsave()` a safer option in scenarios where preserving the
    interrupt state is crucial.
  id: totrans-490
  prefs: []
  type: TYPE_NORMAL
  zh: 另一方面，`spin_lock_irq()` 在持有锁的同时禁用中断，但它不会保存中断状态。这意味着如果在持有锁的过程中发生中断，中断状态可能会丢失。相比之下，`spin_lock_irqsave()`
    禁用中断并保存中断状态，确保在释放锁时中断恢复到其之前的状态。这使得 `spin_lock_irqsave()` 在需要保留中断状态的关键场景中成为一个更安全的选项。
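- en: |-
    The difference between the two variants can be illustrated with a toy user-space model. This is only a sketch under loud assumptions: a single boolean stands in for the CPU's interrupt-enable bit, no real locking takes place, and every `model_*` name is made up for illustration rather than taken from the kernel.

    ```c
    #include <stdbool.h>

    /* Toy model: one flag stands in for the CPU's "interrupts enabled" bit. */
    static bool irqs_enabled = true;

    /* Models spin_lock_irq()/spin_unlock_irq(): previous state is not remembered. */
    static void model_lock_irq(void)   { irqs_enabled = false; }
    static void model_unlock_irq(void) { irqs_enabled = true; } /* unconditional */

    /* Models spin_lock_irqsave()/spin_unlock_irqrestore(). */
    static void model_lock_irqsave(bool *flags)
    {
        *flags = irqs_enabled;   /* remember the previous state */
        irqs_enabled = false;
    }

    static void model_unlock_irqrestore(bool flags)
    {
        irqs_enabled = flags;    /* put it back exactly as it was */
    }

    /* Caller already had interrupts off, then uses the _irq variant. */
    static bool state_after_irq_variant(void)
    {
        irqs_enabled = false;
        model_lock_irq();
        model_unlock_irq();
        return irqs_enabled;     /* true: the caller's "off" state was lost */
    }

    /* Caller already had interrupts off, then uses the _irqsave variant. */
    static bool state_after_irqsave_variant(void)
    {
        bool flags;

        irqs_enabled = false;
        model_lock_irqsave(&flags);
        model_unlock_irqrestore(flags);
        return irqs_enabled;     /* false: the caller's state is preserved */
    }
    ```

    In this model `state_after_irq_variant()` reports interrupts as enabled even though the caller had turned them off, while `state_after_irqsave_variant()` leaves them off, which is why the `_irqsave` form is the safe default when the calling context is not known.
  prefs: []
  type: TYPE_NORMAL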
- en: Next, `spin_lock_bh()` disables softirqs (software interrupts) but allows hardware
    interrupts to continue. Unlike `spin_lock_irq()` and `spin_lock_irqsave()` , which
    disable both hardware and software interrupts, `spin_lock_bh()` is useful when
    hardware interrupts need to remain active.
  id: totrans-491
  prefs: []
  type: TYPE_NORMAL
  zh: 接下来，`spin_lock_bh()` 禁用软中断（软件中断），但允许硬件中断继续。与 `spin_lock_irq()` 和 `spin_lock_irqsave()`
    不同，它们禁用硬件和软件中断，`spin_lock_bh()` 在需要保持硬件中断活跃时非常有用。
- en: 'For more information about spinlock usage and lock types, see the following
    resources:'
  id: totrans-492
  prefs: []
  type: TYPE_NORMAL
  zh: 关于自旋锁的使用和锁类型的信息，请参阅以下资源：
- en: '[Lesson 1: Spin locks](https://www.kernel.org/doc/Documentation/locking/spinlocks.txt)'
  id: totrans-493
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '[课程 1：自旋锁](https://www.kernel.org/doc/Documentation/locking/spinlocks.txt)'
- en: '[Lock types and their rules](https://docs.kernel.org/locking/locktypes.html)'
  id: totrans-494
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '[锁类型及其规则](https://docs.kernel.org/locking/locktypes.html)'
- en: 12.3 Read and write locks
  id: totrans-495
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 12.3 读写锁
- en: Read and write locks are specialised kinds of spinlocks that distinguish readers
    from writers, so that any number of readers can hold the lock at once while a
    writer gets exclusive access. Like the earlier spinlocks example,
    the one below shows an "irq safe" situation in which if other functions were triggered
    from irqs which might also read and write to whatever you are concerned with then
    they would not disrupt the logic. As before it is a good idea to keep anything
    done within the lock as short as possible so that it does not hang up the system
    and cause users to start revolting against the tyranny of your module.
  id: totrans-496
  prefs: []
  type: TYPE_NORMAL
  zh: 读写锁是特殊的自旋锁，这样你可以独占地读取或写入某个东西。像之前的自旋锁示例一样，下面的示例展示了“中断安全”的情况，如果其他函数从中断触发，而这些中断也可能读取和写入你关心的东西，那么它们不会破坏逻辑。和之前一样，最好将锁内完成的任何操作尽可能保持简短，以免系统挂起并导致用户开始反抗你模块的暴政。
- en: '[PRE91]'
  id: totrans-497
  prefs: []
  type: TYPE_PRE
  zh: '[PRE91]'
- en: Of course, if you know for sure that there are no functions triggered by irqs
    which could possibly interfere with your logic then you can use the simpler `read_lock(&myrwlock)`
    and `read_unlock(&myrwlock)` or the corresponding write functions.
  id: totrans-498
  prefs: []
  type: TYPE_NORMAL
  zh: 当然，如果你确定没有由中断触发的功能可能会干扰你的逻辑，那么你可以使用更简单的 `read_lock(&myrwlock)` 和 `read_unlock(&myrwlock)`
    或相应的写函数。
- en: 12.4 Atomic operations
  id: totrans-499
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 12.4 原子操作
- en: 'If you are doing simple arithmetic: adding, subtracting or bitwise operations,
    then there is another way in the multi-CPU and multi-hyperthreaded world to stop
    other parts of the system from messing with your mojo. By using atomic operations
    you can be confident that your addition, subtraction or bit flip did actually
    happen and was not overwritten by some other shenanigans. An example is shown
    below.'
  id: totrans-500
  prefs: []
  type: TYPE_NORMAL
  zh: 如果你正在进行简单的算术运算：加法、减法或位操作，那么在多CPU和多超线程的世界中，还有另一种方法可以阻止系统的其他部分干扰你的操作。通过使用原子操作，你可以确信你的加法、减法或位翻转确实发生了，并且没有被其他一些恶作剧覆盖。以下是一个示例。
- en: '[PRE92]'
  id: totrans-501
  prefs: []
  type: TYPE_PRE
  zh: '[PRE92]'
- en: 'Before the C11 standard adopted the built-in atomic types, the kernel already
    provided a small set of atomic types implemented with tricky architecture-specific
    code. Implementing the atomic types with C11 atomics might allow the kernel to throw
    away the architecture-specific code and make the kernel code friendlier
    to people who understand the standard. But there are some problems, such as
    the kernel’s memory model not matching the one defined by the C11 atomics.
    For further details, see:'
  id: totrans-502
  prefs: []
  type: TYPE_NORMAL
  zh: 在C11标准采用内置原子类型之前，内核已经通过使用一些复杂的架构特定代码提供了一小套原子类型。通过C11原子操作实现原子类型可能允许内核丢弃架构特定代码，并使内核代码对理解标准的人更加友好。但是，存在一些问题，例如内核的内存模型与C11原子操作形成的模型不匹配。有关更多详细信息，请参阅：
- en: '[kernel documentation of atomic types](https://www.kernel.org/doc/Documentation/atomic_t.txt)'
  id: totrans-503
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '[原子类型内核文档](https://www.kernel.org/doc/Documentation/atomic_t.txt)'
- en: '[Time to move to C11 atomics?](https://lwn.net/Articles/691128/)'
  id: totrans-504
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '[是时候迁移到C11原子操作了吗？](https://lwn.net/Articles/691128/)'
- en: '[Atomic usage patterns in the kernel](https://lwn.net/Articles/698315/)'
  id: totrans-505
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '[内核中的原子使用模式](https://lwn.net/Articles/698315/)'
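- en: |-
    To make the idea of an indivisible read-modify-write concrete outside the kernel, here is a minimal user-space sketch using C11 `<stdatomic.h>`. It is an analogy only: kernel code uses its own `atomic_t` API rather than C11 atomics, and the helper names `counter_add_sub` and `flags_update` are hypothetical.

    ```c
    #include <stdatomic.h>

    /* User-space analogy only: the kernel has its own atomic_t API. */

    /* Hypothetical helper: atomically add `add`, then subtract `sub`. */
    static int counter_add_sub(atomic_int *counter, int add, int sub)
    {
        atomic_fetch_add(counter, add);   /* indivisible read-modify-write */
        atomic_fetch_sub(counter, sub);
        return atomic_load(counter);
    }

    /* Hypothetical helper: atomically set the bits in `set`, clear those in `clear`. */
    static unsigned long flags_update(atomic_ulong *word, unsigned long set,
                                      unsigned long clear)
    {
        atomic_fetch_or(word, set);
        atomic_fetch_and(word, ~clear);
        return atomic_load(word);
    }
    ```

    Each individual call is atomic, but the sequence of calls inside a helper is not. If several steps must appear as one unit, a lock is still needed.
  prefs: []
  type: TYPE_NORMAL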
- en: 13 Replacing Print Macros
  id: totrans-506
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 13 替换打印宏
- en: 13.1 Replacement
  id: totrans-507
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 13.1 替换
- en: In [Section 1.7](#before-we-begin), it was noted that the X Window System and
    kernel module programming are not conducive to integration. This remains valid
    during the development of kernel modules. However, in practical scenarios, the
    necessity emerges to relay messages to the tty (teletype) originating the module
    load command.
  id: totrans-508
  prefs: []
  type: TYPE_NORMAL
  zh: 在[第1.7节](#before-we-begin)中，指出X窗口系统和内核模块编程不利于集成。这在内核模块开发期间仍然有效。然而，在实际场景中，有必要将消息传递到产生模块加载命令的tty（电传打字机）中。
- en: The term “tty” originates from teletype, which initially referred to a combined
    keyboard-printer for Unix system communication. Today, it signifies a text stream
    abstraction employed by Unix programs, encompassing physical terminals, xterms
    in X displays, and network connections like SSH.
  id: totrans-509
  prefs: []
  type: TYPE_NORMAL
  zh: “tty”这个术语起源于电传打字机，最初指的是Unix系统通信的键盘打印机组合。今天，它表示Unix程序使用的文本流抽象，包括物理终端、X显示中的xterms以及SSH等网络连接。
- en: To achieve this, the “current” pointer is leveraged to access the active task’s
    tty structure. Within this structure lies a pointer to a string write function,
    facilitating the string’s transmission to the tty.
  id: totrans-510
  prefs: []
  type: TYPE_NORMAL
  zh: 为了实现这一点，利用“当前”指针来访问活动任务的tty结构。在这个结构中，有一个指向字符串写函数的指针，它有助于将字符串传输到tty。
- en: '[PRE93]'
  id: totrans-511
  prefs: []
  type: TYPE_PRE
  zh: '[PRE93]'
- en: 13.2 Flashing keyboard LEDs
  id: totrans-512
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 13.2 闪烁键盘LED
- en: 'In certain conditions, you may desire a simpler and more direct way to communicate
    to the external world. Flashing keyboard LEDs can be such a solution: it is an
    immediate way to attract attention or to display a status condition. Keyboard
    LEDs are present on nearly all hardware, they are always visible, they do not need
    any setup, and their use is rather simple and non-intrusive compared to writing
    to a tty or a file.'
  id: totrans-513
  prefs: []
  type: TYPE_NORMAL
  zh: 在某些条件下，你可能希望有一种更简单、更直接的方式与外部世界通信。闪烁键盘LED可以是一个解决方案：这是一种立即吸引注意或显示状态条件的方法。键盘LED存在于每个硬件上，它们总是可见的，不需要任何设置，并且与写入tty或文件相比，它们的使用相当简单且不具侵入性。
- en: 'From v4.14 to v4.15, the timer API underwent a series of changes to improve memory
    safety. A buffer overflow in the area of a `timer_list` structure may be able
    to overwrite the `function` and `data` fields, providing the attacker with a way
    to use return-oriented programming (ROP) to call arbitrary functions within the
    kernel. Also, a callback prototype that takes an `unsigned long` argument prevents
    the compiler from performing type checking. Furthermore, such a prototype may
    be an obstacle to the forward-edge protection of control-flow integrity. Thus,
    it is better to use a unique prototype, distinct from the large group of functions
    that take an `unsigned long` argument. The timer callback is therefore passed
    a pointer to the `timer_list` structure rather than an `unsigned long` argument.
    The caller embeds the `timer_list` structure, together with all the information
    the callback needs, in a larger structure, and the callback recovers that structure
    with the `container_of` macro instead of casting the `unsigned long` value. For
    more information, see: [Improving the kernel timers API](https://lwn.net/Articles/735887/).'
  id: totrans-514
  prefs: []
  type: TYPE_NORMAL
  zh: 从 v4.14 到 v4.15，定时器 API 进行了一系列更改，以提高内存安全性。`timer_list` 结构区域中的缓冲区溢出可能会覆盖 `function`
    和 `data` 字段，为攻击者提供使用返回导向编程（ROP）在内核中调用任意函数的方法。此外，包含 `unsigned long` 参数的回调函数原型将阻止编译器执行类型检查。此外，具有
    `unsigned long` 参数的函数原型可能成为控制流完整性的前向保护障碍。因此，最好使用独特的原型来与接受 `unsigned long` 参数的簇分开。定时器回调应该传递
    `timer_list` 结构的指针而不是 `unsigned long` 参数。然后，它将回调所需的所有信息，包括 `timer_list` 结构，封装到一个更大的结构中，并且可以使用
    `container_of` 宏而不是 `unsigned long` 值。有关更多信息，请参阅：[改进内核定时器 API](https://lwn.net/Articles/735887/)。
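- en: |-
    The embedding pattern described above can be sketched in plain user-space C. Everything here is a simplified stand-in and an assumption for illustration: `timer_list_sketch`, `blinker`, and `blink_cb` are hypothetical names, and the `container_of` definition is a minimal version without the type checks found in the kernel headers.

    ```c
    #include <stddef.h>

    /* User-space stand-in; the real macro lives in the kernel headers. */
    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    struct timer_list_sketch {              /* hypothetical, much simplified */
        void (*function)(struct timer_list_sketch *);
    };

    struct blinker {                        /* hypothetical driver state */
        int led_state;
        struct timer_list_sketch timer;     /* embedded, not pointed to */
    };

    /* The callback receives only the timer pointer, in the timer_setup() style. */
    static void blink_cb(struct timer_list_sketch *t)
    {
        /* Step back from the embedded member to the enclosing structure. */
        struct blinker *b = container_of(t, struct blinker, timer);

        b->led_state = !b->led_state;
    }
    ```

    Because the callback derives its context from the address of the timer itself, there is no separate `data` word for an attacker to overwrite, and the compiler can check the callback's argument type.
  prefs: []
  type: TYPE_NORMAL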
- en: 'Before Linux v4.14, `setup_timer` was used to initialize the timer and the
    `timer_list` structure looked like:'
  id: totrans-515
  prefs: []
  type: TYPE_NORMAL
  zh: 在 Linux v4.14 之前，`setup_timer` 用于初始化定时器和 `timer_list` 结构看起来如下：
- en: '[PRE94]'
  id: totrans-516
  prefs: []
  type: TYPE_PRE
  zh: '[PRE94]'
- en: Since Linux v4.14, `timer_setup` has been adopted and the kernel was converted
    step by step from `setup_timer` to `timer_setup` . The conversion was gradual
    because the new interface needed to coexist with the old one for a while. In
    fact, `timer_setup` was at first implemented on top of `setup_timer` .
  id: totrans-517
  prefs: []
  type: TYPE_NORMAL
  zh: 自从 Linux v4.14 以来，`timer_setup` 被采用，内核逐步从 `setup_timer` 转换到 `timer_setup`。API
    变更的原因之一是它需要与旧版本的接口共存。此外，`timer_setup` 最初是由 `setup_timer` 实现的。
- en: '[PRE95]'
  id: totrans-518
  prefs: []
  type: TYPE_PRE
  zh: '[PRE95]'
- en: The `setup_timer` interface was then removed in v4.15. As a result, the `timer_list`
    structure changed to the following.
  id: totrans-519
  prefs: []
  type: TYPE_NORMAL
  zh: 从 v4.15 版本开始，`setup_timer` 被移除。因此，`timer_list` 结构发生了以下变化。
- en: '[PRE96]'
  id: totrans-520
  prefs: []
  type: TYPE_PRE
  zh: '[PRE96]'
- en: The following source code illustrates a minimal kernel module which, when loaded,
    starts blinking the keyboard LEDs until it is unloaded.
  id: totrans-521
  prefs: []
  type: TYPE_NORMAL
  zh: 以下源代码演示了一个最小的内核模块，当加载时，它会闪烁键盘 LED，直到卸载。
- en: '[PRE97]'
  id: totrans-522
  prefs: []
  type: TYPE_PRE
  zh: '[PRE97]'
- en: If none of the examples in this chapter fit your debugging needs, there might
    yet be some other tricks to try. Ever wondered what `CONFIG_DEBUG_LL` in `make menuconfig`
    is good for? If you activate that you get low level access to the serial port.
    While this might not sound very powerful by itself, you can patch [kernel/printk.c](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/kernel/printk.c)
    or any other essential syscall to print ASCII characters, thus making it possible
    to trace virtually everything that your code does over a serial line. If you find
    yourself porting the kernel to some new and formerly unsupported architecture, this
    is usually amongst the first things that should be implemented. Logging over a
    netconsole might also be worth a try.
  id: totrans-523
  prefs: []
  type: TYPE_NORMAL
  zh: 如果本章中的任何示例都不符合你的调试需求，可能还有一些其他技巧可以尝试。你是否想过 `make menuconfig` 中的 `CONFIG_LL_DEBUG`
    是什么作用？如果你激活它，你将获得对串行端口的低级访问。虽然这本身可能听起来并不强大，但你可以在 [kernel/printk.c](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/kernel/printk.c)
    或任何其他基本系统调用中打补丁，以打印 ASCII 字符，从而使你能够在串行线上追踪代码执行的几乎所有内容。如果你发现自己正在将内核移植到某些新的、以前不支持的平台，这通常是应该首先实现的事情之一。尝试通过
    netconsole 进行日志记录也可能值得尝试。
- en: While you have seen lots of stuff that can be used to aid debugging here, there
    are some things to be aware of. Debugging is almost always intrusive. Adding debug
    code can change the situation enough to make the bug seem to disappear. Thus,
    you should keep debug code to a minimum and make sure it does not show up in production
    code.
  id: totrans-524
  prefs: []
  type: TYPE_NORMAL
  zh: 虽然在这里你已经看到了很多可以用来辅助调试的内容，但还有一些事情需要注意。调试几乎总是具有侵入性。添加调试代码可能会改变足够多的环境，使得错误看起来似乎消失了。因此，你应该将调试代码保持在最小，并确保它不会出现在生产代码中。
- en: 14 GPIO
  id: totrans-525
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 14 GPIO
- en: 14.1 GPIO
  id: totrans-526
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 14.1 GPIO
- en: 'General Purpose Input/Output (GPIO) appears on the development board as pins.
    It acts as a bridge for communication between the development board and external
    devices. You can think of it like a switch: users can turn it on or off (Input),
    and the development board can also turn it on or off (Output).'
  id: totrans-527
  prefs: []
  type: TYPE_NORMAL
  zh: 通用输入/输出（GPIO）在开发板上表现为引脚。它作为开发板与外部设备之间通信的桥梁。你可以将其想象成一个开关：用户可以打开或关闭（输入），开发板也可以打开或关闭（输出）。
- en: To implement a GPIO device driver, you use the `gpio_request()` function to
    claim a specific GPIO pin. After successfully claiming it, you can check that
    the pin is in use by looking at /sys/kernel/debug/gpio.
  id: totrans-528
  prefs: []
  type: TYPE_NORMAL
  zh: 要实现GPIO设备驱动程序，你使用`gpio_request()`函数来启用一个特定的GPIO引脚。启用成功后，你可以通过查看/sys/kernel/debug/gpio来检查该引脚是否正在使用。
- en: '[PRE98]'
  id: totrans-529
  prefs: []
  type: TYPE_PRE
  zh: '[PRE98]'
- en: There are other ways to register GPIOs. For example, you can use `gpio_request_one()`
    to register a GPIO while setting its direction (input or output) and initial state
    at the same time. You can also use `gpio_request_array()` to register multiple
    GPIOs at once. However, note that `gpio_request_array()` was removed in
    Linux v6.10.
  id: totrans-530
  prefs: []
  type: TYPE_NORMAL
  zh: 有其他方法可以注册GPIO。例如，你可以在设置其方向（输入或输出）和初始状态的同时使用`gpio_request_one()`来注册一个GPIO。你也可以使用`gpio_request_array()`一次性注册多个GPIO。但是请注意，`gpio_request_array()`自Linux
    v6.10以来已被删除。
- en: When using GPIO, you must set it as either output with `gpio_direction_output()`
    or input with `gpio_direction_input()` .
  id: totrans-531
  prefs: []
  type: TYPE_NORMAL
  zh: 当使用GPIO时，你必须使用`gpio_direction_output()`将其设置为输出，或使用`gpio_direction_input()`将其设置为输入。
- en: When the GPIO is set as output, you can use `gpio_set_value()` to drive it
    to either high or low voltage.
  id: totrans-532
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 当GPIO设置为输出时，你可以使用`gpio_set_value()`来选择将其设置为高电压或低电压。
- en: When the GPIO is set as input, you can use `gpio_get_value()` to read whether
    the voltage is high or low.
  id: totrans-533
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 当GPIO设置为输入时，你可以使用`gpio_get_value()`来读取电压是高还是低。
- en: 14.2 Control the LED’s on/off state
  id: totrans-534
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 14.2 控制LED的开关状态
- en: In [Section 9](#talking-to-device-files), we learned how to communicate with
    device files. Therefore, we will further use device files to control the LED on
    and off.
  id: totrans-535
  prefs: []
  type: TYPE_NORMAL
  zh: 在[第9节](#talking-to-device-files)中，我们学习了如何与设备文件通信。因此，我们将进一步使用设备文件来控制LED的开关。
- en: In the implementation, a pull-down resistor is used. The anode of the LED is
    connected to GPIO4, and the cathode is connected to GND. For more details about
    the Raspberry Pi pin assignments, refer to [Raspberry Pi Pinout](https://pinout.xyz/).
    The materials used include a Raspberry Pi 5, an LED, jumper wires, and a 220Ω
    resistor.
  id: totrans-536
  prefs: []
  type: TYPE_NORMAL
  zh: 在实现中，使用了一个下拉电阻。LED的正极连接到GPIO4，负极连接到GND。有关Raspberry Pi引脚分配的更多详细信息，请参阅[Raspberry
    Pi引脚分配](https://pinout.xyz/)。所使用的材料包括Raspberry Pi 5、LED、跳线和220Ω电阻。
- en: '[PRE99]'
  id: totrans-537
  prefs: []
  type: TYPE_PRE
  zh: '[PRE99]'
- en: 'Make and install the module:'
  id: totrans-538
  prefs: []
  type: TYPE_NORMAL
  zh: 创建并安装模块：
- en: '[PRE100]'
  id: totrans-539
  prefs: []
  type: TYPE_PRE
  zh: '[PRE100]'
- en: 'Switch on the LED:'
  id: totrans-540
  prefs: []
  type: TYPE_NORMAL
  zh: 打开LED：
- en: '[PRE101]'
  id: totrans-541
  prefs: []
  type: TYPE_PRE
  zh: '[PRE101]'
- en: 'Switch off the LED:'
  id: totrans-542
  prefs: []
  type: TYPE_NORMAL
  zh: 关闭LED：
- en: '[PRE102]'
  id: totrans-543
  prefs: []
  type: TYPE_PRE
  zh: '[PRE102]'
- en: 'Finally, remove the module:'
  id: totrans-544
  prefs: []
  type: TYPE_NORMAL
  zh: 最后，移除模块：
- en: '[PRE103]'
  id: totrans-545
  prefs: []
  type: TYPE_PRE
  zh: '[PRE103]'
- en: 14.3 DHT11 sensor
  id: totrans-546
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 14.3 DHT11传感器
- en: The DHT11 sensor is a well-known entry-level sensor commonly used to measure
    humidity and temperature. In this subsection, we will use GPIO to communicate
    with it through a single data line. The DHT11 communication protocol is described
    in the [datasheet](https://www.mouser.com/datasheet/2/758/DHT11-Technical-Data-Sheet-Translated-Version-1143054.pdf?srsltid=AfmBOoppls-QTd864640bVtbK90sWBsFzJ_7SgjOD2EpwuLLGUSTyYnv).
  id: totrans-547
  prefs: []
  type: TYPE_NORMAL
  zh: DHT11传感器是一种常见的入门级传感器，常用于测量湿度和温度。在本小节中，我们将使用GPIO通过单条数据线进行通信。DHT11通信协议可参考[数据表](https://www.mouser.com/datasheet/2/758/DHT11-Technical-Data-Sheet-Translated-Version-1143054.pdf?srsltid=AfmBOoppls-QTd864640bVtbK90sWBsFzJ_7SgjOD2EpwuLLGUSTyYnv)。
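- en: |-
    As a rough illustration of the protocol's data format, a DHT11 reading arrives as a 40-bit frame: one byte each for the integral and decimal parts of the humidity, one byte each for the integral and decimal parts of the temperature, and a checksum byte equal to the low 8 bits of the sum of the first four. The following sketch validates and decodes such a frame in plain C. It is independent of this guide's module code, it assumes the five bytes have already been read from the data line, and the names `dht11_reading` and `dht11_decode` are hypothetical.

    ```c
    #include <stdint.h>

    struct dht11_reading {          /* hypothetical result type */
        int humidity;               /* integral part, percent RH */
        int temperature;            /* integral part, degrees Celsius */
    };

    /*
     * Validate and decode a 5-byte DHT11 frame.
     * Returns 0 on success, -1 if the checksum does not match.
     */
    static int dht11_decode(const uint8_t frame[5], struct dht11_reading *out)
    {
        /* Checksum is the low 8 bits of the sum of the four data bytes. */
        uint8_t sum = (uint8_t)(frame[0] + frame[1] + frame[2] + frame[3]);

        if (sum != frame[4])
            return -1;

        out->humidity = frame[0];
        out->temperature = frame[2];
        return 0;
    }
    ```

    Rejecting frames with a bad checksum matters in practice, because the bit timing on the single data line is tight and a delayed sample can silently flip a bit.
  prefs: []
  type: TYPE_NORMAL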
- en: In the implementation, the data pin of the DHT11 sensor is connected to GPIO4
    on the Raspberry Pi. The sensor’s VCC and GND pins are connected to 3.3V and GND,
    respectively. For more details about the Raspberry Pi pin assignments, refer to
    [Raspberry Pi Pinout](https://pinout.xyz/). The materials used include a Raspberry
    Pi 5, a DHT11 sensor, and jumper wires.
  id: totrans-548
  prefs: []
  type: TYPE_NORMAL
  zh: 在实现中，DHT11传感器的数据引脚连接到Raspberry Pi的GPIO4。传感器的VCC和GND引脚分别连接到3.3V和GND。有关Raspberry
    Pi引脚分配的更多详细信息，请参阅[Raspberry Pi引脚分配](https://pinout.xyz/)。所使用的材料包括Raspberry Pi
    5、DHT11传感器和跳线。
- en: '[PRE104]'
  id: totrans-549
  prefs: []
  type: TYPE_PRE
  zh: '[PRE104]'
- en: 'Make and install the module:'
  id: totrans-550
  prefs: []
  type: TYPE_NORMAL
  zh: 创建并安装模块：
- en: '[PRE105]'
  id: totrans-551
  prefs: []
  type: TYPE_PRE
  zh: '[PRE105]'
- en: 'Check the Output of the DHT11 Sensor:'
  id: totrans-552
  prefs: []
  type: TYPE_NORMAL
  zh: 检查DHT11传感器的输出：
- en: '[PRE106]'
  id: totrans-553
  prefs: []
  type: TYPE_PRE
  zh: '[PRE106]'
- en: 'Expected Output:'
  id: totrans-554
  prefs: []
  type: TYPE_NORMAL
  zh: 预期输出：
- en: '[PRE107]'
  id: totrans-555
  prefs: []
  type: TYPE_PRE
  zh: '[PRE107]'
- en: 'Finally, remove the module:'
  id: totrans-556
  prefs: []
  type: TYPE_NORMAL
  zh: 最后，移除模块：
- en: '[PRE108]'
  id: totrans-557
  prefs: []
  type: TYPE_PRE
  zh: '[PRE108]'
- en: 15 Scheduling Tasks
  id: totrans-558
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 15 调度任务
- en: 'There are two main ways of running tasks: tasklets and work queues. Tasklets
    are a quick and easy way of scheduling a single function to be run, for example
    when triggered from an interrupt, whereas work queues are more complicated but
    also better suited to running multiple things in a sequence.'
  id: totrans-559
  prefs: []
  type: TYPE_NORMAL
  zh: 运行任务主要有两种方式：任务和作业队列。任务是一种快速简便的方式来安排单个函数的执行。例如，当从中断触发时，而作业队列则更复杂，但更适合按顺序运行多个任务。
- en: It is possible that in the future tasklets may be replaced by threaded IRQs. However,
    discussion about that has been ongoing since 2007 ([Eliminating tasklets](https://lwn.net/Articles/239633)
    and [The end of tasklets](https://lwn.net/Articles/960041/)), so expecting immediate
    changes would be unwise. See [Section 16.1](#interrupt-handlers1) for alternatives
    that avoid the tasklet debate.
  id: totrans-560
  prefs: []
  type: TYPE_NORMAL
  zh: 未来任务可能会被线程化中断所取代。然而，关于这一问题的讨论自2007年以来一直在进行（[消除任务](https://lwn.net/Articles/239633)
    和 [任务结束](https://lwn.net/Articles/960041/)），因此期望立即发生变化是不明智的。有关避免任务辩论的替代方案，请参阅[第16.1节](#interrupt-handlers1)。
- en: 15.1 Tasklets
  id: totrans-561
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 15.1 任务
- en: Here is an example tasklet module. The `tasklet_fn` function runs for a few
    seconds. In the meantime, execution of the `example_tasklet_init` function may
    continue to the exit point, depending on whether it is interrupted by softirq.
  id: totrans-562
  prefs: []
  type: TYPE_NORMAL
  zh: 这里有一个示例任务模块。`tasklet_fn` 函数运行几秒钟。在此期间，`example_tasklet_init` 函数的执行可能会继续到退出点，具体取决于它是否被软中断中断。
- en: '[PRE109]'
  id: totrans-563
  prefs: []
  type: TYPE_PRE
  zh: '[PRE109]'
- en: 'So with this example loaded `dmesg` should show:'
  id: totrans-564
  prefs: []
  type: TYPE_NORMAL
  zh: 因此，加载此示例后，`dmesg` 应该会显示：
- en: '[PRE110]'
  id: totrans-565
  prefs: []
  type: TYPE_PRE
  zh: '[PRE110]'
- en: Although tasklets are easy to use, they come with several drawbacks, and developers
    have been discussing their removal from the Linux kernel. The tasklet callback
    runs in atomic context, inside a software interrupt, meaning that it cannot sleep
    or access user-space data, so not all work can be done in a tasklet handler. Also,
    the kernel only allows one instance of any given tasklet to be running at any
    given time, although different tasklet callbacks can run in parallel on different
    CPUs.
  id: totrans-566
  prefs: []
  type: TYPE_NORMAL
  zh: 虽然任务使用起来很简单，但它存在一些缺点，开发者一直在讨论将其从Linux内核中移除。任务回调在原子上下文中运行，在软件中断内部，这意味着它不能休眠或访问用户空间数据，因此并非所有工作都可以在任务处理程序中完成。此外，内核只允许在任何给定时间运行任何给定任务的实例；多个不同的任务回调可以并行运行。
- en: In recent kernels, tasklets can be replaced by workqueues, timers, or threaded
    interrupts. [²](#fn2x0) While the removal of tasklets remains a longer-term goal,
    the current kernel contains more than a hundred uses of tasklets. Now developers
    are proceeding with the API changes and the macro `DECLARE_TASKLET_OLD` exists
    for compatibility. For further information, see [https://lwn.net/Articles/830964/](https://lwn.net/Articles/830964/).
  id: totrans-567
  prefs: []
  type: TYPE_NORMAL
  zh: 在最近的内核中，任务可以被工作队列、定时器或线程化中断所取代。[²](#fn2x0) 虽然移除任务仍然是长期目标，但当前内核中包含超过一百个任务的使用。现在开发者正在推进API更改，并存在宏`DECLARE_TASKLET_OLD`以实现兼容性。有关更多信息，请参阅[https://lwn.net/Articles/830964/](https://lwn.net/Articles/830964/)。
- en: 15.2 Work queues
  id: totrans-568
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 15.2 作业队列
- en: To add a task to the scheduler we can use a workqueue. Queued work items are
    executed by kernel worker threads, which the kernel schedules like any other
    task using its default scheduler (historically the Completely Fair Scheduler,
    CFS).
  id: totrans-569
  prefs: []
  type: TYPE_NORMAL
  zh: 要将任务添加到调度器，我们可以使用工作队列。内核随后使用完全公平调度器（CFS）在队列中执行工作。
- en: '[PRE111]'
  id: totrans-570
  prefs: []
  type: TYPE_PRE
  zh: '[PRE111]'
- en: 16 Interrupt Handlers
  id: totrans-571
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 16 中断处理程序
- en: 16.1 Interrupt Handlers
  id: totrans-572
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 16.1 中断处理程序
- en: Except for the last chapter, everything we did in the kernel so far we have
    done as a response to a process asking for it, either by dealing with a special
    file, sending an `ioctl()` , or issuing a system call. But the job of the kernel
    is not just to respond to process requests. Another job, which is every bit as
    important, is to speak to the hardware connected to the machine.
  id: totrans-573
  prefs: []
  type: TYPE_NORMAL
  zh: 除了最后一章之外，到目前为止我们在内核中所做的一切都是作为对进程请求的响应而进行的，无论是通过处理特殊文件、发送`ioctl()`还是发出系统调用。但内核的工作不仅仅是响应进程请求。另一个同样重要的任务是与连接到机器的硬件进行通信。
- en: There are two types of interaction between the CPU and the rest of the computer’s
    hardware. The first type is when the CPU gives orders to the hardware; the other
    is when the hardware needs to tell the CPU something. The second, called interrupts,
    is much harder to implement because it has to be dealt with when convenient for
    the hardware, not the CPU. Hardware devices typically have a very small amount
    of RAM, and if you do not read their information when available, it is lost.
  id: totrans-574
  prefs: []
  type: TYPE_NORMAL
  zh: CPU与计算机其他硬件之间的交互有两种类型。第一种类型是CPU向硬件下达命令，另一种类型是硬件需要通知CPU某些信息。第二种，称为中断，由于它必须在硬件方便的时候处理，而不是CPU方便的时候，因此实现起来更加困难。硬件设备通常只有很少的RAM，如果你不在可用时读取它们的信息，这些信息就会丢失。
- en: Under Linux, hardware interrupts are called IRQs (Interrupt ReQuests). There
    are two types of IRQs, short and long. A short IRQ is one which is expected to
    take a very short period of time, during which the rest of the machine will be
    blocked and no other interrupts will be handled. A long IRQ is one which can take
    longer, and during which other interrupts may occur (but not interrupts from the
    same device). If at all possible, it is better to declare an interrupt handler
    to be long.
  id: totrans-575
  prefs: []
  type: TYPE_NORMAL
  zh: 在Linux中，硬件中断被称为IRQ（中断请求）。有两种类型的中断请求，即短中断和长中断。短中断是指预期在非常短的时间内完成的中断，在此期间，机器的其他部分将被阻塞，不会处理其他中断。长中断是指可能需要较长时间的中断，在此期间可能会发生其他中断（但不是来自同一设备的中断）。如果可能的话，最好声明一个长中断处理程序。
- en: When the CPU receives an interrupt, it stops whatever it is doing (unless it
    is processing a more important interrupt, in which case it will deal with this
    one only when the more important one is done), saves certain parameters on the
    stack and calls the interrupt handler. This means that certain things are not
    allowed in the interrupt handler itself, because the system is in an unknown state.
    The Linux kernel solves the problem by splitting interrupt handling into two parts.
    The first part executes right away and masks the interrupt line. Hardware interrupts
    must be handled quickly, and that is why we need the second part to handle the
    heavy work deferred from an interrupt handler. Historically, BH (the Linux name
    for Bottom Halves) statically book-kept the deferred functions. Softirq and
    its higher-level abstraction, Tasklet, have replaced BH since Linux 2.3.
  id: totrans-576
  prefs: []
  type: TYPE_NORMAL
  zh: 当CPU接收到中断时，它会停止正在执行的操作（除非它正在处理一个更重要的中断，在这种情况下，它只会在此更重要的中断完成后处理此中断），在堆栈上保存某些参数，并调用中断处理程序。这意味着在中断处理程序本身中不允许某些操作，因为系统处于未知状态。Linux内核通过将中断处理分为两部分来解决此问题。第一部分立即执行并屏蔽中断线。硬件中断必须快速处理，这就是为什么我们需要第二部分来处理从中断处理程序中延迟的重工作。从历史上看，BH（Linux对下半部分的命名）统计记录了延迟函数。自Linux
    2.3以来，Softirq及其高级抽象Tasklet取代了BH。
- en: To have your interrupt handler called when the relevant IRQ is received, register
    it by calling `request_irq()`.
  id: totrans-577
  prefs: []
  type: TYPE_NORMAL
  zh: 实现这一功能的方法是调用 `request_irq()` 以便在接收到相关中断请求（IRQ）时调用你的中断处理程序。
- en: In practice, IRQ handling can be a bit more complex. Hardware is often designed
    in a way that chains two interrupt controllers, so that all the IRQs from interrupt
    controller B are cascaded to a certain IRQ from interrupt controller A. Of course,
    that requires the kernel to find out afterwards which IRQ it really was, and
    that adds overhead. Other architectures offer special, very low overhead,
    so-called "fast IRQs" or FIQs. Taking advantage of them requires handlers to be
    written in assembly language, so they do not really fit into the kernel. They
    can be made to work similarly to the others, but after that procedure they are
    no longer any faster than "common" IRQs. SMP-enabled kernels running on systems
    with more than one processor need to solve another truckload of problems. It is
    not enough to know whether a certain IRQ has happened; it is also important to
    know which CPU(s) it was for. Readers interested in more details may want to
    look up "APIC".
  id: totrans-578
  prefs: []
  type: TYPE_NORMAL
  zh: 实际上，中断处理可能要复杂一些。硬件通常设计成将两个中断控制器串联起来，这样中断控制器B的所有中断请求都会级联到中断控制器A的某个中断请求。当然，这要求内核在之后找出实际是哪个中断，这会增加开销。其他架构提供了一些特殊、低开销的所谓“快速中断”或FIQs。要利用它们，需要用汇编语言编写处理程序，因此它们实际上并不适合内核。它们可以被配置得与其他处理程序类似，但在此过程之后，它们不再比“普通”中断更快。在具有多个处理器的系统上运行的启用SMP的内核需要解决另一堆问题。仅仅知道某个中断请求是否发生是不够的，还重要的是要知道它针对的是哪个CPU。对更多细节感兴趣的人，现在可能想参考“APIC”。
- en: The `request_irq()` function receives the IRQ number, a pointer to the handler
    function, flags, a name that appears in /proc/interrupts and a parameter to be
    passed to the interrupt handler. The number of IRQs available is hardware-dependent.
  id: totrans-579
  prefs: []
  type: TYPE_NORMAL
  zh: 此函数接收中断号、函数名称、标志、/proc/interrupts的名称以及传递给中断处理程序的参数。通常，有特定数量的中断请求可用。中断请求的数量取决于硬件。
- en: The flags specify the behavior of the IRQ. For example, use `IRQF_SHARED`
    to indicate you are willing to share the IRQ with other interrupt handlers (usually
    because a number of hardware devices sit on the same IRQ); use `IRQF_ONESHOT`
    to indicate that the IRQ is not re-enabled after the hard IRQ handler finishes,
    so the line stays masked until the threaded handler has run. In some materials
    you may encounter another set of IRQ flags named with the `SA` prefix, for example
    `SA_SHIRQ` and `SA_INTERRUPT`. Those are the IRQ flags of older kernels. They
    have been removed completely, and today only the `IRQF` flags are in use. The
    `request_irq()` call will only succeed if there is not already a handler on this
    IRQ, or if you are both willing to share.
  id: totrans-580
  prefs: []
  type: TYPE_NORMAL
  zh: 标志可以用来指定中断的行为。例如，使用`IRQF_SHARED`来表示你愿意与其他中断处理程序共享中断（通常是因为多个硬件设备位于同一中断上）；使用`IRQF_ONESHOT`来表示处理程序完成后不重新启用中断。需要注意的是，在某些材料中，你可能会遇到另一组带有`SA`前缀的中断标志。例如，`SA_SHIRQ`和`SA_INTERRUPT`。这些是旧内核中的中断标志。它们已经被完全删除。今天只使用`IRQF`标志。此函数只有在当前中断上没有处理程序，或者你愿意共享的情况下才会成功。
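- type: TYPE_NORMAL
  prefs: []
  en: |-
    The sharing rule above can be modelled in plain userspace C. This is only a sketch of the rule, not kernel code. The names `irq_line_model`, `model_request_irq()` and `MODEL_IRQF_SHARED` are hypothetical, and returning `-EBUSY` on a sharing conflict is an assumption made for this illustration.

    ```c
    #include <errno.h>
    #include <stdbool.h>

    /* Hypothetical flag bit used only by this model. */
    #define MODEL_IRQF_SHARED 0x80UL

    /* Hypothetical bookkeeping for a single IRQ line. */
    struct irq_line_model {
        bool has_handler; /* at least one handler is registered */
        bool shared;      /* the registered handlers agreed to share */
    };

    /*
     * Registration succeeds if the line is free, or if both the existing
     * handlers and the new one are willing to share the line.
     */
    static int model_request_irq(struct irq_line_model *line, unsigned long flags)
    {
        bool want_shared = (flags & MODEL_IRQF_SHARED) != 0;

        if (line->has_handler && !(line->shared && want_shared))
            return -EBUSY; /* assumed error code for a sharing conflict */

        if (!line->has_handler) {
            line->has_handler = true;
            line->shared = want_shared;
        }
        return 0;
    }
    ```

    In this model an exclusive handler blocks every later request, while a line first claimed with the shared flag accepts further shared requests and rejects exclusive ones.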
- en: 16.2 Detecting button presses
  id: totrans-581
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 16.2 检测按钮按下
- en: Many popular single-board computers, such as the Raspberry Pi or BeagleBoard,
    have a number of GPIO pins. Attaching buttons to those pins and having a button
    press do something is a classic case for interrupts. Instead of having the CPU
    waste time and battery power polling for a change in input state, the input itself
    triggers the CPU to run a particular handling function.
  id: totrans-582
  prefs: []
  type: TYPE_NORMAL
  zh: 许多流行的单板计算机，如树莓派（Raspberry Pi）或贝格尔板（Beagleboards），都有一系列GPIO引脚。将这些按钮连接到这些引脚上，然后通过按钮按下执行某些操作，这是一个你可能需要使用中断的经典案例，这样CPU就不必浪费时间和电池电量轮询输入状态的变化，而是让输入触发CPU运行特定的处理函数。
- en: Here is an example where buttons are connected to GPIO numbers 17 and 18 and
    an LED is connected to GPIO 4. You can change those numbers to whatever is appropriate
    for your board.
  id: totrans-583
  prefs: []
  type: TYPE_NORMAL
  zh: 这里有一个示例，其中按钮连接到GPIO编号17和18，LED连接到GPIO 4。你可以将这些数字更改为适合你板子的任何数字。
- en: '[PRE112]'
  id: totrans-584
  prefs: []
  type: TYPE_PRE
  zh: '[PRE112]'
- en: 16.3 Bottom Half
  id: totrans-585
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 16.3 下半部分
- en: Suppose you want to do a bunch of stuff inside of an interrupt routine. A common
    way to avoid blocking the interrupt for a significant duration is to defer the
    time-consuming part to a workqueue. This pushes the bulk of the work off into
    the scheduler. This approach helps speed up the interrupt handling process itself,
    allowing the system to respond to the next hardware interrupt more quickly.
  id: totrans-586
  prefs: []
  type: TYPE_NORMAL
  zh: 假设你希望在中断例程内部做很多事情。避免中断被阻塞一段较长时间的一种常见方法是将耗时的部分推迟到工作队列中。这会将大部分工作推到调度器中。这种方法有助于加快中断处理过程本身，使系统能够更快地响应下一个硬件中断。
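- type: TYPE_NORMAL
  prefs: []
  en: |-
    The idea of deferring work can be sketched as a userspace analogy. This is not kernel code, and every name in it (`event_queue`, `top_half()`, `bottom_half()`, `EVENT_QUEUE_SIZE`) is hypothetical. The "top half" only records that something happened and returns at once, while the "bottom half" drains the queue later and does the slow part.

    ```c
    #include <stddef.h>

    #define EVENT_QUEUE_SIZE 8 /* hypothetical capacity of the model queue */

    /* Hypothetical ring buffer shared by the two halves. */
    struct event_queue {
        int events[EVENT_QUEUE_SIZE];
        size_t head;  /* next slot to read */
        size_t tail;  /* next slot to write */
        size_t count; /* number of queued events */
    };

    /* "Top half": record the event and return immediately.
     * Returns 0 on success, or -1 if the queue is full and the event is lost. */
    static int top_half(struct event_queue *q, int event)
    {
        if (q->count == EVENT_QUEUE_SIZE)
            return -1;
        q->events[q->tail] = event;
        q->tail = (q->tail + 1) % EVENT_QUEUE_SIZE;
        q->count++;
        return 0;
    }

    /* "Bottom half": runs later and processes everything that was queued.
     * Summing the events stands in for the time-consuming work. */
    static long bottom_half(struct event_queue *q)
    {
        long total = 0;

        while (q->count > 0) {
            total += q->events[q->head];
            q->head = (q->head + 1) % EVENT_QUEUE_SIZE;
            q->count--;
        }
        return total;
    }
    ```

    The model also shows why the top half must stay short. If events arrive faster than the bottom half drains them, the queue fills up and new events are dropped.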
- en: Kernel developers generally discourage using tasklets due to their design limitations,
    such as memory management issues and unpredictable latencies. Instead, they recommend
    more robust mechanisms like workqueues or softirqs. To address tasklet shortcomings,
    Linux contributors introduced the BH workqueue, activated with the `WQ_BH` flag.
    This workqueue retains critical features, such as execution in atomic (softirq)
    context on the same CPU and the inability to sleep.
  id: totrans-587
  prefs: []
  type: TYPE_NORMAL
  zh: 内核开发者通常不鼓励使用tasklets，因为它们的设计限制，如内存管理问题和不可预测的延迟。相反，他们推荐更健壮的机制，如workqueues或softirqs。为了解决tasklets的不足，Linux贡献者引入了带有`WQ_BH`标志的BH工作队列。此工作队列保留了关键特性，如在同一CPU上的原子（软中断）上下文执行以及无法休眠。
- en: The example below extends the previous code to include an additional task executed
    in process context when an interrupt is triggered.
  id: totrans-588
  prefs: []
  type: TYPE_NORMAL
  zh: 以下示例扩展了之前的代码，以包括在触发中断时在进程上下文中执行的一个附加任务。
- en: '[PRE113]'
  id: totrans-589
  prefs: []
  type: TYPE_PRE
  zh: '[PRE113]'
- en: 16.4 Threaded IRQ
  id: totrans-590
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 16.4 线程化中断
- en: 'A threaded IRQ is a mechanism to organize both the top-half and the bottom-half
    of an IRQ at once. It splits the single handler of `request_irq()` into two:
    one for the top-half, the other for the bottom-half. Both handlers are registered
    in a single call to `request_threaded_irq()`.'
  id: totrans-591
  prefs: []
  type: TYPE_NORMAL
  zh: 线程化中断请求（Threaded IRQ）是一种同时组织中断的上半部分和下半部分的机制。线程化中断请求将`request_irq()`中的一个处理程序分成两个：一个用于上半部分，另一个用于下半部分。`request_threaded_irq()`是用于使用线程化中断请求的函数。在`request_threaded_irq()`中同时注册两个处理程序。
- en: Those two handlers run in different contexts. The top-half handler runs in
    interrupt context. It is the equivalent of the handler passed to `request_irq()`.
    The bottom-half handler, on the other hand, runs in its own thread. This thread
    is created when the threaded IRQ is registered, and its sole purpose is to run
    the bottom-half handler. This is what makes a threaded IRQ “threaded”. If the
    top-half handler returns `IRQ_WAKE_THREAD`, the bottom-half serving thread wakes
    up and runs the bottom-half handler.
  id: totrans-592
  prefs: []
  type: TYPE_NORMAL
  zh: 这两个处理器在不同的上下文中运行。上半部分处理器在中断上下文中运行。它等同于传递给`request_irq()`的处理器的处理。另一方面，下半部分处理器在其自己的线程中运行。这个线程是在注册线程化中断时创建的。它的唯一目的是运行这个下半部分处理器。这就是线程化中断“线程化”的地方。如果上半部分处理器返回`IRQ_WAKE_THREAD`，那么这个下半部分服务线程将被唤醒。然后线程将运行下半部分处理器。
- en: Here is an example of how to do the same thing as before, with top and bottom
    halves, but using threads.
  id: totrans-593
  prefs: []
  type: TYPE_NORMAL
  zh: 这里是一个如何使用线程实现之前相同功能的例子，即使用上半部分和下半部分。
- en: '[PRE114]'
  id: totrans-594
  prefs: []
  type: TYPE_PRE
  zh: '[PRE114]'
- en: A threaded IRQ is registered using `request_threaded_irq()`. This function
    takes just one more parameter than `request_irq()`, namely the bottom-half handling
    function that runs in its own thread. In this example it is `button_bottom_half()`.
    The other parameters are used in the same way as in `request_irq()`.
  id: totrans-595
  prefs: []
  type: TYPE_NORMAL
  zh: 使用`request_threaded_irq()`注册线程化中断。这个函数比`request_irq()`多一个额外的参数——在它自己的线程中运行的下半部分处理函数。在这个例子中是`button_bottom_half()`。其他参数的使用与`request_irq()`相同。
- en: The presence of both handlers is not mandatory. If either of them is not needed,
    pass `NULL` instead. A `NULL` top-half handler implies that no action is taken
    except to wake up the bottom-half serving thread, which runs the bottom-half handler.
    Similarly, a `NULL` bottom-half handler effectively acts as if `request_irq()`
    were used. In fact, this is how `request_irq()` is implemented.
  id: totrans-596
  prefs: []
  type: TYPE_NORMAL
  zh: 两个处理器的存在不是强制的。如果其中任何一个不需要，可以用`NULL`代替。一个`NULL`的上半部分处理器意味着除了唤醒运行下半部分处理器的下半部分服务线程外，不采取任何行动。同样，一个`NULL`的下半部分处理器实际上相当于使用了`request_irq()`。实际上，这就是`request_irq()`的实现方式。
- en: Note that passing `NULL` to both handlers is considered an error and will make
    registration fail.
  id: totrans-597
  prefs: []
  type: TYPE_NORMAL
  zh: 注意，将`NULL`传递给两个处理器被视为错误，并且会导致注册失败。
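- type: TYPE_NORMAL
  prefs: []
  en: |-
    The handler rules above can be sketched as a small userspace model. It is not kernel code, and all `model_`-prefixed names and `sample_thread_fn()` are hypothetical stand-ins. The model runs the bottom half directly, whereas the kernel runs it later in a dedicated thread.

    ```c
    #include <errno.h>
    #include <stddef.h>

    /* Hypothetical stand-ins for the kernel's handler return values. */
    enum model_irqreturn {
        MODEL_IRQ_NONE,
        MODEL_IRQ_HANDLED,
        MODEL_IRQ_WAKE_THREAD
    };

    typedef enum model_irqreturn (*model_handler_t)(int irq, void *dev_id);

    struct model_irqaction {
        model_handler_t handler;   /* top half */
        model_handler_t thread_fn; /* bottom half, may be NULL */
    };

    /* Used when no top half is supplied: do nothing except wake the thread. */
    static enum model_irqreturn model_default_primary(int irq, void *dev_id)
    {
        (void)irq;
        (void)dev_id;
        return MODEL_IRQ_WAKE_THREAD;
    }

    /* Example bottom half: count how often it ran through dev_id. */
    static enum model_irqreturn sample_thread_fn(int irq, void *dev_id)
    {
        (void)irq;
        (*(int *)dev_id)++;
        return MODEL_IRQ_HANDLED;
    }

    static int model_request_threaded_irq(struct model_irqaction *action,
                                          model_handler_t handler,
                                          model_handler_t thread_fn)
    {
        if (!handler) {
            if (!thread_fn)
                return -EINVAL; /* both handlers NULL: registration fails */
            handler = model_default_primary;
        }
        action->handler = handler;
        action->thread_fn = thread_fn;
        return 0;
    }

    /* Model of one interrupt: run the top half, then the bottom half if asked. */
    static enum model_irqreturn model_dispatch(const struct model_irqaction *action,
                                               int irq, void *dev_id)
    {
        enum model_irqreturn ret = action->handler(irq, dev_id);

        if (ret == MODEL_IRQ_WAKE_THREAD && action->thread_fn)
            ret = action->thread_fn(irq, dev_id);
        return ret;
    }
    ```

    Registering with a `NULL` top half and `sample_thread_fn` as bottom half makes every dispatch go through the default top half and then run the bottom half once.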
- en: 17 Virtual Input Device Driver
  id: totrans-598
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 17 虚拟输入设备驱动程序
- en: An input device driver is a module that lets the kernel communicate with an
    interaction device through events. For example, a keyboard sends press and release
    events to tell the kernel what we want to do. The driver allocates a new input
    structure with `input_allocate_device()`, sets up the input bitfields, device
    id, version, and so on, and then registers it by calling `input_register_device()`.
  id: totrans-599
  prefs: []
  type: TYPE_NORMAL
  zh: 输入设备驱动程序是一个模块，它提供了一种通过事件与交互设备通信的方式。例如，键盘可以发送按键或释放事件来告诉内核我们想要做什么。输入设备驱动程序将使用`input_allocate_device()`分配一个新的输入结构，并设置输入位字段、设备ID、版本等。之后，通过调用`input_register_device()`进行注册。
- en: 'Here is an example, vinput, an API that allows easy development of virtual
    input drivers. A driver needs to export a `vinput_device` structure that contains
    the virtual device name and a `vinput_ops` structure that describes:'
  id: totrans-600
  prefs: []
  type: TYPE_NORMAL
  zh: 这里是一个例子，vinput，它是一个API，允许轻松开发虚拟输入驱动程序。驱动程序需要导出一个包含虚拟设备名称和描述的`vinput_ops`结构的`vinput_device()`。该结构描述：
- en: 'the init function: `init()`'
  id: totrans-601
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 初始化函数：`init()`
- en: 'the input event injection function: `send()`'
  id: totrans-602
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 输入事件注入函数：`send()`
- en: 'the readback function: `read()`'
  id: totrans-603
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 读取函数：`read()`
- en: Then `vinput_register_device()` adds the new device to the list of supported
    virtual input devices, and `vinput_unregister_device()` removes it again.
  id: totrans-604
  prefs: []
  type: TYPE_NORMAL
  zh: 然后使用`vinput_register_device()`和`vinput_unregister_device()`将新设备添加到支持虚拟输入设备的列表中。
- en: '[PRE115]'
  id: totrans-605
  prefs: []
  type: TYPE_PRE
  zh: '[PRE115]'
- en: This function is passed a `struct vinput` already initialized with an allocated
    `struct input_dev`. The `init()` function is responsible for initializing the
    capabilities of the input device and registering it.
  id: totrans-606
  prefs: []
  type: TYPE_NORMAL
  zh: 这个函数传递一个已经使用分配的`struct input_dev`初始化的`struct vinput`。`init()`函数负责初始化输入设备的特性并将其注册。
- en: '[PRE116]'
  id: totrans-607
  prefs: []
  type: TYPE_PRE
  zh: '[PRE116]'
- en: This function receives a user string to interpret and injects the event using
    the `input_report_XXXX` or `input_event` call. The string has already been copied
    from user space.
  id: totrans-608
  prefs: []
  type: TYPE_NORMAL
  zh: 这个函数将接收一个用户字符串来解释并使用`input_report_XXXX`或`input_event`调用注入事件。字符串已经从用户空间复制过来。
- en: '[PRE117]'
  id: totrans-609
  prefs: []
  type: TYPE_PRE
  zh: '[PRE117]'
- en: This function is used for debugging and should fill the buffer parameter with
    the last event sent, in the virtual input device format. The buffer will then
    be copied to user space.
  id: totrans-610
  prefs: []
  type: TYPE_NORMAL
  zh: 这个函数用于调试，应该将缓冲区参数填充为虚拟输入设备格式中发送的最后一个事件。然后，缓冲区将被复制到用户空间。
- en: vinput devices are created and destroyed through sysfs, and event injection
    is done through a /dev node. Userland uses the device name to export a new virtual
    input device.
  id: totrans-611
  prefs: []
  type: TYPE_NORMAL
  zh: 使用sysfs创建和销毁vinput设备。并且，事件注入是通过/dev节点完成的。设备名称将由用户空间用于导出新的虚拟输入设备。
- en: 'The `class_attribute` structure is similar to other attribute types we talked
    about in [Section 8](#sysfs-interacting-with-your-module):'
  id: totrans-612
  prefs: []
  type: TYPE_NORMAL
  zh: '`class_attribute`结构与我们在[第8节](#sysfs-interacting-with-your-module)中讨论的其他属性类型类似：'
- en: '[PRE118]'
  id: totrans-613
  prefs: []
  type: TYPE_PRE
  zh: '[PRE118]'
- en: In vinput.c, the macro `CLASS_ATTR_WO(export/unexport)` defined in [include/linux/device.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/device.h)
    (in this case, device.h is included in [include/linux/input.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/input.h))
    will generate the `class_attribute` structures which are named class_attr_export/unexport.
    Then, put them into `vinput_class_attrs` array and the macro `ATTRIBUTE_GROUPS(vinput_class)`
    will generate the `struct attribute_group vinput_class_group` that should be assigned
    in `vinput_class` . Finally, call `class_register(&vinput_class)` to create attributes
    in sysfs.
  id: totrans-614
  prefs: []
  type: TYPE_NORMAL
  zh: 在vinput.c中，定义在[include/linux/device.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/device.h)（在这种情况下，device.h包含在[include/linux/input.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/input.h)中）的宏`CLASS_ATTR_WO(export/unexport)`将生成名为class_attr_export/unexport的`class_attribute`结构。然后，将它们放入`vinput_class_attrs`数组，宏`ATTRIBUTE_GROUPS(vinput_class)`将生成应分配到`vinput_class`的`struct
    attribute_group vinput_class_group`。最后，调用`class_register(&vinput_class)`在sysfs中创建属性。
- en: 'To create a vinputX sysfs entry and /dev node, write the device name to export:'
  id: totrans-615
  prefs: []
  type: TYPE_NORMAL
  zh: 要创建vinputX sysfs条目和/dev节点。
- en: '[PRE119]'
  id: totrans-616
  prefs: []
  type: TYPE_PRE
  zh: '[PRE119]'
- en: 'To unexport the device, just echo its id in unexport:'
  id: totrans-617
  prefs: []
  type: TYPE_NORMAL
  zh: 要取消导出设备，只需在unexport中回显其ID：
- en: '[PRE120]'
  id: totrans-618
  prefs: []
  type: TYPE_PRE
  zh: '[PRE120]'
- en: '[PRE121]'
  id: totrans-619
  prefs: []
  type: TYPE_PRE
  zh: '[PRE121]'
- en: '[PRE122]'
  id: totrans-620
  prefs: []
  type: TYPE_PRE
  zh: '[PRE122]'
- en: The virtual keyboard is one example of how to use vinput. It supports all `KEY_MAX`
    keycodes. The injection format is the `KEY_CODE` as defined in [include/linux/input.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/input.h).
    A positive value means `KEY_PRESS` while a negative value means `KEY_RELEASE`.
    The keyboard supports repetition when a key stays pressed for too long. The
    following demonstrates how the simulation works.
  id: totrans-621
  prefs: []
  type: TYPE_NORMAL
  zh: 这里，虚拟键盘是使用vinput的一个示例。它支持所有`KEY_MAX`键码。注入格式是`KEY_CODE`，如[include/linux/input.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/input.h)中定义的那样。正值表示`KEY_PRESS`，而负值是`KEY_RELEASE`。当按键按下时间过长时，键盘支持重复。以下演示了模拟的工作方式。
- en: 'Simulate a key press on "g" ( `KEY_G` = 34):'
  id: totrans-622
  prefs: []
  type: TYPE_NORMAL
  zh: 模拟在"g"键上按下按键（`KEY_G` = 34）：
- en: '[PRE123]'
  id: totrans-623
  prefs: []
  type: TYPE_PRE
  zh: '[PRE123]'
- en: 'Simulate a key release on "g" ( `KEY_G` = 34):'
  id: totrans-624
  prefs: []
  type: TYPE_NORMAL
  zh: 模拟在"g"键上释放按键（`KEY_G` = 34）：
- en: '[PRE124]'
  id: totrans-625
  prefs: []
  type: TYPE_PRE
  zh: '[PRE124]'
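- type: TYPE_NORMAL
  prefs: []
  en: |-
    The sign convention of the injection format can be sketched in userspace C. This is an illustration of the convention only, not the parser used by the vinput keyboard module. The names `key_event`, `parse_key_event()` and `MODEL_KEY_MAX` are hypothetical, and rejecting code 0 is a choice of this sketch, made because `+0` and `-0` cannot be told apart.

    ```c
    #include <errno.h>
    #include <stdlib.h>

    #define MODEL_KEY_MAX 0x2ff /* assumed upper bound on key codes for this sketch */

    /* Hypothetical result type: which key, and whether it was pressed. */
    struct key_event {
        int code;    /* key code, e.g. 34 for KEY_G */
        int pressed; /* 1 for a press, 0 for a release */
    };

    /*
     * Interpret a string such as "+34" or "-34".
     * A positive value is a key press, a negative value a key release.
     * Returns 0 on success or -EINVAL for malformed or out-of-range input.
     */
    static int parse_key_event(const char *text, struct key_event *out)
    {
        char *end;
        long value;

        errno = 0;
        value = strtol(text, &end, 10);
        if (end == text || errno == ERANGE)
            return -EINVAL;
        if (*end != '\0' && *end != '\n')
            return -EINVAL;
        if (value == 0 || value > MODEL_KEY_MAX || value < -MODEL_KEY_MAX)
            return -EINVAL;

        out->pressed = value > 0;
        out->code = (int)(value > 0 ? value : -value);
        return 0;
    }
    ```

    With this helper, `"+34"` describes a press of the "g" key and `"-34"` its release, matching the two commands shown above.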
- en: '[PRE125]'
  id: totrans-626
  prefs: []
  type: TYPE_PRE
  zh: '[PRE125]'
- en: '18 Standardizing the interfaces: The Device Model'
  id: totrans-627
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 18 标准化接口：设备模型
- en: Up to this point we have seen all kinds of modules doing all kinds of things,
    but there was no consistency in their interfaces with the rest of the kernel.
    To impose some consistency, so that there is at minimum a standardized way to
    start, suspend and resume a device, a device model was added. An example is shown
    below, and you can use it as a template to add your own suspend, resume or other
    interface functions.
  id: totrans-628
  prefs: []
  type: TYPE_NORMAL
  zh: 到目前为止，我们已经看到了各种模块做各种事情，但它们与内核其余部分的接口没有一致性。为了强制一致性，至少添加了一种标准化的方式来启动、挂起和恢复设备模型。下面是一个示例，你可以将其用作模板来添加你自己的挂起、恢复或其他接口函数。
- en: '[PRE126]'
  id: totrans-629
  prefs: []
  type: TYPE_PRE
  zh: '[PRE126]'
- en: 19 Device Tree
  id: totrans-630
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 19 设备树
- en: 19.1 Introduction to Device Tree
  id: totrans-631
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 19.1 设备树简介
- en: Device Tree is a data structure that describes hardware components in a system,
    particularly in embedded systems and ARM-based platforms. Instead of hard-coding
    hardware details in the kernel source, Device Tree provides a separate, human-readable
    description that the kernel can parse at boot time. This separation allows the
    same kernel binary to support multiple hardware platforms, making development
    and maintenance significantly easier.
  id: totrans-632
  prefs: []
  type: TYPE_NORMAL
  zh: 设备树是一种数据结构，用于描述系统中的硬件组件，尤其是在嵌入式系统和基于ARM的平台中。而不是在内核源代码中硬编码硬件细节，设备树提供了一个单独的、可读性好的描述，内核可以在启动时解析。这种分离使得相同的内核二进制文件可以支持多个硬件平台，使得开发和维护变得显著更容易。
- en: Device Tree files (with .dts extension for source files and .dtb for compiled
    binary files) use a hierarchical structure similar to a filesystem to represent
    the hardware topology. Each hardware component is represented as a node with properties
    that describe its characteristics, such as memory addresses, interrupt numbers,
    and device-specific parameters.
  id: totrans-633
  prefs: []
  type: TYPE_NORMAL
  zh: 设备树文件（源文件以.dts扩展名，编译后的二进制文件以.dtb扩展名）使用类似于文件系统的分层结构来表示硬件拓扑。每个硬件组件都表示为一个节点，其属性描述了其特征，如内存地址、中断号和设备特定参数。
- en: 19.2 Device Tree and Kernel Modules
  id: totrans-634
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 19.2 设备树与内核模块
- en: While Device Tree is primarily used during kernel initialization, kernel modules
    can also interact with Device Tree nodes through the platform device framework.
    When the kernel parses the Device Tree at boot, it creates platform devices for
    nodes that have compatible strings. Kernel modules can then register platform
    drivers that match these compatible strings, allowing them to be automatically
    probed when the corresponding hardware is detected.
  id: totrans-635
  prefs: []
  type: TYPE_NORMAL
  zh: 虽然设备树主要用于内核初始化期间，但内核模块也可以通过平台设备框架与设备树节点交互。当内核在启动时解析设备树时，它会为具有兼容字符串的节点创建平台设备。然后，内核模块可以注册与这些兼容字符串匹配的平台驱动程序，允许它们在检测到相应的硬件时自动探测。
- en: 'The key concepts for Device Tree interaction in kernel modules include:'
  id: totrans-636
  prefs: []
  type: TYPE_NORMAL
  zh: 内核模块中与设备树交互的关键概念包括：
- en: 'Compatible strings: Unique identifiers that match Device Tree nodes to their
    drivers'
  id: totrans-637
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 兼容字符串：匹配设备树节点与其驱动程序的唯一标识符
- en: 'Property reading: Functions to extract configuration data from Device Tree
    nodes'
  id: totrans-638
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 属性读取：从设备树节点中提取配置数据的函数
- en: 'Platform driver framework: Infrastructure for binding drivers to devices described
    in Device Tree'
  id: totrans-639
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 平台驱动程序框架：将驱动程序绑定到设备树中描述的设备的基础设施
- en: 'Device-specific data: Custom properties that can be defined for specific hardware'
  id: totrans-640
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: 设备特定数据：可以为特定硬件定义的自定义属性
- en: '19.3 Example: Device Tree Module'
  id: totrans-641
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 19.3 示例：设备树模块
- en: The following example demonstrates how a kernel module can interact with Device
    Tree nodes. This module registers a platform driver that matches specific compatible
    strings and extracts properties from the matched Device Tree nodes.
  id: totrans-642
  prefs: []
  type: TYPE_NORMAL
  zh: 以下示例演示了内核模块如何与设备树节点交互。此模块注册了一个与特定兼容字符串匹配的平台驱动程序，并从匹配的设备树节点中提取属性。
- en: '[PRE127]'
  id: totrans-643
  prefs: []
  type: TYPE_PRE
  zh: '[PRE127]'
- en: 19.4 Device Tree Source Example
  id: totrans-644
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 19.4 设备树源示例
- en: 'To use the above module, you would need a Device Tree entry like this:'
  id: totrans-645
  prefs: []
  type: TYPE_NORMAL
  zh: 要使用上述模块，您需要一个如下的设备树条目：
- en: '[PRE128]'
  id: totrans-646
  prefs: []
  type: TYPE_PRE
  zh: '[PRE128]'
- en: The properties in this Device Tree node would be read by the module’s probe
    function when the device is matched. The compatible property is used to match
    the device with the driver, while other properties provide device-specific configuration.
  id: totrans-647
  prefs: []
  type: TYPE_NORMAL
  zh: 在设备匹配时，模块的探测函数将读取此设备树节点中的属性。兼容属性用于将设备与驱动程序匹配，而其他属性提供设备特定的配置。
- en: 19.5 Testing Device Tree Modules
  id: totrans-648
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 19.5 测试设备树模块
- en: 'Testing Device Tree modules can be done in several ways:'
  id: totrans-649
  prefs: []
  type: TYPE_NORMAL
  zh: 测试设备树模块可以通过几种方式完成：
- en: 'Using Device Tree overlays: On systems that support it (like Raspberry Pi),
    you can load Device Tree overlays at runtime to add new devices without rebooting.'
  id: totrans-650
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
  zh: 使用设备树覆盖：在支持它的系统（如树莓派）上，您可以在运行时加载设备树覆盖，以添加新设备而无需重启。
- en: 'Modifying the main Device Tree: Add your device nodes to the system’s main
    Device Tree source file and recompile it.'
  id: totrans-651
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
  zh: 修改主设备树：将您的设备节点添加到系统的主设备树源文件中，并重新编译它。
- en: 'Using QEMU: For development and testing, QEMU can emulate systems with custom
    Device Trees, allowing you to test your modules without physical hardware.'
  id: totrans-652
  prefs:
  - PREF_OL
  type: TYPE_NORMAL
  zh: 使用QEMU：在开发和测试中，QEMU可以模拟具有自定义设备树的系统，允许您在没有物理硬件的情况下测试您的模块。
- en: 'To check if your device was properly detected, you can examine the sysfs filesystem:'
  id: totrans-653
  prefs: []
  type: TYPE_NORMAL
  zh: 要检查您的设备是否被正确检测，您可以检查sysfs文件系统：
- en: '[PRE129]'
  id: totrans-654
  prefs: []
  type: TYPE_PRE
  zh: '[PRE129]'
- en: 19.6 Common Device Tree Functions
  id: totrans-655
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 19.6 常用设备树函数
- en: 'Here are some commonly used Device Tree functions in kernel modules:'
  id: totrans-656
  prefs: []
  type: TYPE_NORMAL
  zh: 这里有一些在内核模块中常用的设备树函数：
- en: '`of_property_read_string()` - Read a string property'
  id: totrans-657
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '`of_property_read_string()` - 读取字符串属性'
- en: '`of_property_read_u32()` - Read a 32-bit integer property'
  id: totrans-658
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '`of_property_read_u32()` - 读取32位整数属性'
- en: '`of_property_read_bool()` - Check if a boolean property exists'
  id: totrans-659
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '`of_property_read_bool()` - 检查是否存在布尔属性'
- en: '`of_find_property()` - Find a property by name'
  id: totrans-660
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '`of_find_property()` - 通过名称查找属性'
- en: '`of_get_property()` - Get a property’s raw value'
  id: totrans-661
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '`of_get_property()` - 获取属性的原始值'
- en: '`of_match_device()` - Match a device against a match table'
  id: totrans-662
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '`of_match_device()` - 将设备与匹配表匹配'
- en: '`of_parse_phandle()` - Parse a phandle reference to another node'
  id: totrans-663
  prefs:
  - PREF_UL
  type: TYPE_NORMAL
  zh: '`of_parse_phandle()` - 解析指向另一个节点的phandle引用'
- en: These functions provide a robust interface for extracting configuration data
    from Device Tree nodes, allowing modules to be highly configurable without code
    changes.
  id: totrans-664
  prefs: []
  type: TYPE_NORMAL
  zh: 这些函数提供了一个健壮的接口，用于从设备树节点中提取配置数据，允许模块在无需代码更改的情况下进行高度配置。
- en: 20 Optimizations
  id: totrans-665
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 20 优化
- en: 20.1 Likely and Unlikely conditions
  id: totrans-666
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 20.1 可能和不可能条件
- en: Sometimes you might want your code to run as quickly as possible, especially
    if it is handling an interrupt or doing something which might cause noticeable
    latency. If your code contains boolean conditions and you know that a condition
    almost always evaluates to `true`, or almost always to `false`, then you can
    allow the compiler to optimize for this using the `likely` and `unlikely` macros.
    For example, when allocating memory you almost always expect the allocation to
    succeed.
  id: totrans-667
  prefs: []
  type: TYPE_NORMAL
  zh: 有时你可能希望你的代码尽可能快地运行，特别是如果你正在处理中断或可能引起明显延迟的操作。如果你的代码包含布尔条件，并且你知道条件几乎总是评估为 `true`
    或 `false`，那么你可以允许编译器使用 `likely` 和 `unlikely` 宏进行优化。例如，当分配内存时，你几乎总是期望这会成功。
- en: '[PRE130]'
  id: totrans-668
  prefs: []
  type: TYPE_PRE
  zh: '[PRE130]'
- en: When the `unlikely` macro is used, the compiler alters its machine instruction
    output, so that it continues along the false branch and only jumps if the condition
    is true. That avoids flushing the processor pipeline. The opposite happens if
    you use the `likely` macro.
  id: totrans-669
  prefs: []
  type: TYPE_NORMAL
  zh: 当使用 `unlikely` 宏时，编译器会改变其机器指令输出，以便它继续沿着错误分支执行，并且只有当条件为真时才会跳转。这避免了刷新处理器流水线。如果你使用
    `likely` 宏，则发生相反的情况。
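- type: TYPE_NORMAL
  prefs: []
  en: |-
    The same hint can be tried outside the kernel. The sketch below assumes GCC or Clang, where the macros are commonly built on `__builtin_expect`. The helper `alloc_counter()` is hypothetical and uses the userspace `malloc()` rather than a kernel allocator.

    ```c
    #include <stddef.h>
    #include <stdlib.h>

    /* Userspace equivalents of the kernel macros (GCC/Clang builtin). */
    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    /* Allocate an int and initialize it; allocation failure is the rare path. */
    static int *alloc_counter(int initial)
    {
        int *counter = malloc(sizeof(*counter));

        if (unlikely(counter == NULL))
            return NULL; /* cold path: the compiler may move it out of line */

        *counter = initial;
        return counter;
    }
    ```

    The macros only influence code layout. The program behaves the same whether or not the hint turns out to be right, so a wrong hint costs speed but never correctness.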
- en: 20.2 Static keys
  id: totrans-670
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 20.2 静态密钥
- en: 'Static keys allow us to enable or disable kernel code paths based on the runtime
    state of a key. Their APIs have been available since 2010 (most architectures
    are already supported) and use self-modifying code to eliminate the overhead of
    cache and branch prediction. The most typical use case of static keys is for performance-sensitive
    kernel code, such as tracepoints, context switching, networking, etc. These hot
    paths of the kernel often contain branches and can be optimized easily using this
    technique. Before we can use static keys in the kernel, we need to make sure that
    gcc supports `asm goto` inline assembly, and the following kernel configurations
    are set:'
  id: totrans-671
  prefs: []
  type: TYPE_NORMAL
  zh: 静态密钥允许我们根据密钥的运行时状态启用或禁用内核代码路径。它们的 API 自 2010 年以来一直可用（大多数架构已经支持）并且使用自修改代码来消除缓存和分支预测的开销。静态密钥最典型的用例是性能敏感的内核代码，如
    tracepoints、上下文切换、网络等。内核的这些热点路径通常包含分支，并且可以使用此技术轻松优化。在我们能够在内核中使用静态密钥之前，我们需要确保 gcc
    支持 `asm goto` 内联汇编，并且以下内核配置被设置：
- en: '[PRE131]'
  id: totrans-672
  prefs: []
  type: TYPE_PRE
  zh: '[PRE131]'
- en: 'To declare a static key, we need to define a global variable using the `DEFINE_STATIC_KEY_FALSE`
    or `DEFINE_STATIC_KEY_TRUE` macro defined in [include/linux/jump_label.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/jump_label.h).
    This macro initializes the key with the given initial value, which is either false
    or true, respectively. For example, to declare a static key with an initial value
    of false, we can use the following code:'
  id: totrans-673
  prefs: []
  type: TYPE_NORMAL
  zh: 要声明静态密钥，我们需要使用在 [include/linux/jump_label.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/jump_label.h)
    中定义的 `DEFINE_STATIC_KEY_FALSE` 或 `DEFINE_STATIC_KEY_TRUE` 宏来定义一个全局变量。此宏将密钥初始化为给定的初始值，即分别为假或真。例如，要声明一个初始值为假的静态密钥，我们可以使用以下代码：
- en: '[PRE132]'
  id: totrans-674
  prefs: []
  type: TYPE_PRE
  zh: '[PRE132]'
- en: Once the static key has been declared, we need to add branching code to the
    module that uses the static key. For example, the code includes a fastpath, where
    a no-op instruction will be generated at compile time as the key is initialized
    to false and the branch is unlikely to be taken.
  id: totrans-675
  prefs: []
  type: TYPE_NORMAL
  zh: 一旦声明了静态密钥，我们需要向使用静态密钥的模块中添加分支代码。例如，代码包括一个快速路径，在编译时将生成一个无操作指令，因为密钥被初始化为假，分支不太可能被采取。
- en: '[PRE133]'
  id: totrans-676
  prefs: []
  type: TYPE_PRE
  zh: '[PRE133]'
- en: If the key is enabled at runtime by calling `static_branch_enable(&fkey)` ,
    the fastpath will be patched with an unconditional jump instruction to the slowpath
    code `pr_alert` , so the branch will always be taken until the key is disabled
    again.
  id: totrans-677
  prefs: []
  type: TYPE_NORMAL
  zh: 如果在运行时通过调用 `static_branch_enable(&fkey)` 启用密钥，则快速路径将被修补为无条件跳转到慢路径代码 `pr_alert`，因此分支将始终被采取，直到再次禁用密钥。
- en: The following kernel module, derived from chardev.c, demonstrates how the static
    key works.
  id: totrans-678
  prefs: []
  type: TYPE_NORMAL
  zh: 以下从 chardev.c 衍生的内核模块演示了静态密钥的工作原理。
- en: '[PRE134]'
  id: totrans-679
  prefs: []
  type: TYPE_PRE
  zh: '[PRE134]'
- en: To check the state of the static key, we can use the /dev/key_state interface.
  id: totrans-680
  prefs: []
  type: TYPE_NORMAL
  zh: 检查静态密钥的状态，我们可以使用 /dev/key_state 接口。
- en: '[PRE135]'
  id: totrans-681
  prefs: []
  type: TYPE_PRE
  zh: '[PRE135]'
- en: This will display the current state of the key, which is disabled by default.
  id: totrans-682
  prefs: []
  type: TYPE_NORMAL
  zh: 这将显示密钥的当前状态，默认情况下密钥是禁用的。
- en: 'To change the state of the static key, we can perform a write operation on
    the file:'
  id: totrans-683
  prefs: []
  type: TYPE_NORMAL
  zh: 要更改静态键的状态，我们可以在文件上执行写操作：
- en: '[PRE136]'
  id: totrans-684
  prefs: []
  type: TYPE_PRE
  zh: '[PRE136]'
- en: This will enable the static key, causing the code path to switch from the fastpath
    to the slowpath.
  id: totrans-685
  prefs: []
  type: TYPE_NORMAL
  zh: 这将启用静态键，导致代码路径从快速路径切换到慢速路径。
- en: In some cases, the key is enabled or disabled at initialization and never changed
    afterwards. Such a key can be declared read-only, which means that it can only
    be toggled in the module init function. To declare a read-only static key, use
    the `DEFINE_STATIC_KEY_FALSE_RO` or `DEFINE_STATIC_KEY_TRUE_RO` macro instead.
    Attempts to change the key at runtime will result in a page fault. For more information,
    see [Static keys](https://www.kernel.org/doc/Documentation/static-keys.txt).
  id: totrans-686
  prefs: []
  type: TYPE_NORMAL
  zh: 在某些情况下，键在初始化时被启用或禁用，之后从未改变，我们可以将静态键声明为只读，这意味着它只能在模块初始化函数中切换。要声明只读静态键，我们可以使用`DEFINE_STATIC_KEY_FALSE_RO`或`DEFINE_STATIC_KEY_TRUE_RO`宏。在运行时尝试更改键将导致页面错误。有关更多信息，请参阅[静态键](https://www.kernel.org/doc/Documentation/static-keys.txt)。
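- en: |-
    A read-only key might be used as in the minimal sketch below. The key name `ro_fkey`, the module parameter `enable_slowpath`, and the log messages are hypothetical and chosen only for illustration. It assumes a kernel built with jump label support, and that the key is toggled only inside the init function.

    ```c
    #include <linux/init.h>
    #include <linux/jump_label.h>
    #include <linux/module.h>
    #include <linux/printk.h>

    /* Hypothetical key: starts false and becomes read-only once init has finished. */
    static DEFINE_STATIC_KEY_FALSE_RO(ro_fkey);

    /* Hypothetical load-time switch that decides the key's final state. */
    static bool enable_slowpath;
    module_param(enable_slowpath, bool, 0444);

    static int __init ro_key_init(void)
    {
        /* The init function is the only place where the key may be toggled. */
        if (enable_slowpath)
            static_branch_enable(&ro_fkey);

        if (static_branch_unlikely(&ro_fkey))
            pr_info("ro_key: slowpath selected\n");
        else
            pr_info("ro_key: fastpath selected\n");

        return 0;
    }

    static void __exit ro_key_exit(void)
    {
        /* The key is read-only by now, so it must not be touched here. */
        pr_info("ro_key: unloaded\n");
    }

    module_init(ro_key_init);
    module_exit(ro_key_exit);

    MODULE_LICENSE("GPL");
    ```

    Loading the module with `enable_slowpath=1` would select the slowpath once, at load time. The choice cannot be changed until the module is reloaded.
  prefs: []
  type: TYPE_NORMAL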
- en: 21 Common Pitfalls
  id: totrans-687
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 21 常见陷阱
- en: 21.1 Using standard libraries
  id: totrans-688
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 21.1 使用标准库
- en: You cannot do that. In a kernel module, you can only call functions provided
    by the kernel, which you can see in /proc/kallsyms.
  id: totrans-689
  prefs: []
  type: TYPE_NORMAL
  zh: 你不能这样做。在内核模块中，你只能使用内核函数，这些函数是你可以在/proc/kallsyms中看到的。
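- en: |-
    As a rough illustration, the sketch below uses in-kernel counterparts of a few familiar C library calls. The module name `nolibc` and the buffer size are hypothetical. The pairing of `kmalloc`/`kfree` with `malloc`/`free`, `strscpy` with `strcpy`, and `pr_info` with `printf` is an approximate analogy, not a drop-in replacement.

    ```c
    #include <linux/errno.h>
    #include <linux/init.h>
    #include <linux/module.h>
    #include <linux/printk.h>
    #include <linux/slab.h>
    #include <linux/string.h>

    #define NOLIBC_BUF_SIZE 32 /* hypothetical size, for illustration only */

    static int __init nolibc_init(void)
    {
        /* kmalloc() takes the place of malloc(); GFP_KERNEL allows sleeping. */
        char *buf = kmalloc(NOLIBC_BUF_SIZE, GFP_KERNEL);

        if (!buf)
            return -ENOMEM;

        /* strscpy() is a bounded copy that always NUL-terminates the destination. */
        strscpy(buf, "hello", NOLIBC_BUF_SIZE);

        /* pr_info() writes to the kernel log instead of standard output. */
        pr_info("nolibc: %s\n", buf);

        /* kfree() takes the place of free(). */
        kfree(buf);
        return 0;
    }

    static void __exit nolibc_exit(void)
    {
        pr_info("nolibc: unloaded\n");
    }

    module_init(nolibc_init);
    module_exit(nolibc_exit);

    MODULE_LICENSE("GPL");
    ```

    Note that a module can only link against symbols the kernel exports. Not every name listed in /proc/kallsyms is necessarily callable from a module.
  prefs: []
  type: TYPE_NORMAL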
- en: 21.2 Disabling interrupts
  id: totrans-690
  prefs:
  - PREF_H4
  type: TYPE_NORMAL
  zh: 21.2 禁用中断
- en: You might need to do this for a short time, and that is OK. But if you do not
    re-enable them afterwards, your system will hang and you will have to power it
    off.
  id: totrans-691
  prefs: []
  type: TYPE_NORMAL
  zh: 你可能需要短时间这样做，这是可以的，但如果你之后没有启用它们，你的系统将会卡住，你将不得不关闭电源。
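- en: |-
    One common way to keep the disable and enable steps paired is `local_irq_save` followed by `local_irq_restore`, which restores whatever interrupt state was in effect before. The sketch below is a minimal illustration. The module name `irq_pair` and the `counter` variable are hypothetical, and it only affects the CPU on which the code runs.

    ```c
    #include <linux/init.h>
    #include <linux/irqflags.h>
    #include <linux/module.h>
    #include <linux/printk.h>

    /* Hypothetical state that the critical section protects. */
    static unsigned int counter;

    static int __init irq_pair_init(void)
    {
        unsigned long flags;

        /* Disable interrupts on the local CPU and remember the previous state. */
        local_irq_save(flags);

        /* Keep the critical section as short as possible. */
        counter++;

        /* Restore the saved state on every path out of the critical section. */
        local_irq_restore(flags);

        pr_info("irq_pair: counter=%u\n", counter);
        return 0;
    }

    static void __exit irq_pair_exit(void)
    {
        pr_info("irq_pair: unloaded\n");
    }

    module_init(irq_pair_init);
    module_exit(irq_pair_exit);

    MODULE_LICENSE("GPL");
    ```

    Saving and restoring the flags, rather than unconditionally enabling interrupts, keeps the code correct even when it is entered with interrupts already disabled.
  prefs: []
  type: TYPE_NORMAL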
- en: 22 Where To Go From Here?
  id: totrans-692
  prefs:
  - PREF_H3
  type: TYPE_NORMAL
  zh: 22 从这里去哪里？
- en: For those deeply interested in kernel programming, [kernelnewbies.org](https://kernelnewbies.org)
    and the [Documentation](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation)
    subdirectory within the kernel source code are highly recommended. Although the
    latter may not always be straightforward, it serves as a valuable initial step
    for further exploration. Echoing Linus Torvalds’ perspective, the most effective
    method to understand the kernel is through personal examination of the source
    code.
  id: totrans-693
  prefs: []
  type: TYPE_NORMAL
  zh: 对于对内核编程有深厚兴趣的人来说，强烈推荐访问[kernelnewbies.org](https://kernelnewbies.org)以及内核源代码中的[Documentation](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation)子目录。尽管后者可能并不总是直截了当，但它为更深入的探索提供了一个宝贵的起点。正如林纳斯·托瓦兹的观点，理解内核的最有效方法是亲自检查源代码。
- en: Contributions to this guide are welcome, especially if you find significant
    inaccuracies. To contribute or report a problem, please open an issue at
    [https://github.com/sysprog21/lkmpg](https://github.com/sysprog21/lkmpg).
    Pull requests are greatly appreciated.
  id: totrans-694
  prefs: []
  type: TYPE_NORMAL
  zh: 欢迎对此指南做出贡献，特别是如果发现任何重大不准确之处。要贡献或报告问题，请在[https://github.com/sysprog21/lkmpg](https://github.com/sysprog21/lkmpg)发起一个问题。拉取请求非常受欢迎。
- en: Happy hacking!
  id: totrans-695
  prefs: []
  type: TYPE_NORMAL
  zh: 开心黑客！
- en: '[¹](#fn1x0-bk)As of Linux kernel 6.12, several member fields have been added,
    removed, or had their prototypes changed. For example, additions include fop_flags,
    splice_eof, and uring_cmd; removals include iterate and sendpage; and the prototype
    for iopoll was modified.'
  id: totrans-696
  prefs: []
  type: TYPE_NORMAL
  zh: '[¹](#fn1x0-bk)截至Linux内核6.12版本，一些成员字段已被添加、删除或更改了原型。例如，新增包括fop_flags、splice_eof和uring_cmd；删除包括iterate和sendpage；iopoll的原型也被修改。'
- en: '[²](#fn2x0-bk)The goal of threaded interrupts is to push more of the work to
    separate threads, so that the minimum needed for acknowledging an interrupt is
    reduced, and therefore the time spent handling the interrupt (where it can’t handle
    any other interrupts at the same time) is reduced. See [https://lwn.net/Articles/302043/](https://lwn.net/Articles/302043/).'
  id: totrans-697
  prefs: []
  type: TYPE_NORMAL
  zh: '[²](#fn2x0-bk)线程中断的目标是将更多的工作推送到单独的线程，这样确认中断所需的最小工作就减少了，因此处理中断（在此期间不能处理其他中断）的时间也就减少了。参见[https://lwn.net/Articles/302043/](https://lwn.net/Articles/302043/)。'