- en: <!--yml
id: totrans-0
prefs: []
type: TYPE_NORMAL
zh: <!--yml
- en: 'category: 未分类'
id: totrans-1
prefs: []
type: TYPE_NORMAL
zh: 分类:未分类
- en: 'date: 2025-12-20 20:24:55'
id: totrans-2
prefs: []
type: TYPE_NORMAL
zh: 日期2025-12-20 20:24:55
- en: -->
id: totrans-3
prefs: []
type: TYPE_NORMAL
zh: -->
- en: The Linux Kernel Module Programming Guide
id: totrans-4
prefs:
- PREF_H1
type: TYPE_NORMAL
zh: Linux 内核模块编程指南
- en: 'Source: [https://sysprog21.github.io/lkmpg/](https://sysprog21.github.io/lkmpg/)'
id: totrans-5
prefs:
- PREF_BQ
type: TYPE_NORMAL
zh: 来源:[https://sysprog21.github.io/lkmpg/](https://sysprog21.github.io/lkmpg/)
- en: Peter Jay Salzman, Michael Burian, Ori Pomerantz, Bob Mottram, Jim Huang
id: totrans-6
prefs: []
type: TYPE_NORMAL
zh: Peter Jay Salzman, Michael Burian, Ori Pomerantz, Bob Mottram, Jim Huang
- en: September 28, 2025
id: totrans-7
prefs: []
type: TYPE_NORMAL
zh: 2025年9月28日
- en: '![PIC](img/78a1165dae09cd77d532c4c0e3be17a8.png)'
id: totrans-8
prefs: []
type: TYPE_IMG
zh: '![PIC](img/78a1165dae09cd77d532c4c0e3be17a8.png)'
- en: 1 [Introduction](#introduction)
id: totrans-9
prefs: []
type: TYPE_NORMAL
zh: 1 [简介](#introduction)
- en: 1.1 [Authorship](#authorship)
id: totrans-10
prefs: []
type: TYPE_NORMAL
zh: 1.1 [作者](#authorship)
- en: 1.2 [Acknowledgements](#acknowledgements)
id: totrans-11
prefs: []
type: TYPE_NORMAL
zh: 1.2 [致谢](#acknowledgements)
- en: 1.3 [What Is A Kernel Module?](#what-is-a-kernel-module)
id: totrans-12
prefs: []
type: TYPE_NORMAL
zh: 1.3 [什么是内核模块?](#what-is-a-kernel-module)
- en: 1.4 [Kernel module package](#kernel-module-package)
id: totrans-13
prefs: []
type: TYPE_NORMAL
zh: 1.4 [内核模块包](#kernel-module-package)
- en: 1.5 [What Modules are in my Kernel?](#what-modules-are-in-my-kernel)
id: totrans-14
prefs: []
type: TYPE_NORMAL
zh: 1.5 [我的内核中有什么模块?](#what-modules-are-in-my-kernel)
- en: 1.6 [Is there a need to download and compile the kernel?](#is-there-a-need-to-download-and-compile-the-kernel)
id: totrans-15
prefs: []
type: TYPE_NORMAL
zh: 1.6 [是否需要下载和编译内核?](#is-there-a-need-to-download-and-compile-the-kernel)
- en: 1.7 [Before We Begin](#before-we-begin)
id: totrans-16
prefs: []
type: TYPE_NORMAL
zh: 1.7 [开始之前](#before-we-begin)
- en: 2 [Headers](#headers)
id: totrans-17
prefs: []
type: TYPE_NORMAL
zh: 2 [头文件](#headers)
- en: 3 [Examples](#examples)
id: totrans-18
prefs: []
type: TYPE_NORMAL
zh: 3 [示例](#examples)
- en: 4 [Hello World](#hello-world)
id: totrans-19
prefs: []
type: TYPE_NORMAL
zh: 4 [Hello World](#hello-world)
- en: 4.1 [The Simplest Module](#the-simplest-module)
id: totrans-20
prefs: []
type: TYPE_NORMAL
zh: 4.1 [最简单的模块](#the-simplest-module)
- en: 4.2 [Hello and Goodbye](#hello-and-goodbye)
id: totrans-21
prefs: []
type: TYPE_NORMAL
zh: 4.2 [你好和再见](#hello-and-goodbye)
- en: 4.3 [The __init and __exit Macros](#the-init-and-exit-macros)
id: totrans-22
prefs: []
type: TYPE_NORMAL
zh: 4.3 [__init和__exit宏](#the-init-and-exit-macros)
- en: 4.4 [Licensing and Module Documentation](#licensing-and-module-documentation)
id: totrans-23
prefs: []
type: TYPE_NORMAL
zh: 4.4 [许可和模块文档](#licensing-and-module-documentation)
- en: 4.5 [Passing Command Line Arguments to a Module](#passing-command-line-arguments-to-a-module)
id: totrans-24
prefs: []
type: TYPE_NORMAL
zh: 4.5 [向模块传递命令行参数](#passing-command-line-arguments-to-a-module)
- en: 4.6 [Modules Spanning Multiple Files](#modules-spanning-multiple-files)
id: totrans-25
prefs: []
type: TYPE_NORMAL
zh: 4.6 [跨多个文件的模块](#modules-spanning-multiple-files)
- en: 4.7 [Building modules for a precompiled kernel](#building-modules-for-a-precompiled-kernel)
id: totrans-26
prefs: []
type: TYPE_NORMAL
zh: 4.7 [为预编译内核构建模块](#building-modules-for-a-precompiled-kernel)
- en: 5 [Preliminaries](#preliminaries)
id: totrans-27
prefs: []
type: TYPE_NORMAL
zh: 5 [预备知识](#preliminaries)
- en: 5.1 [How modules begin and end](#how-modules-begin-and-end)
id: totrans-28
prefs: []
type: TYPE_NORMAL
zh: 5.1 [模块的开始和结束](#how-modules-begin-and-end)
- en: 5.2 [Functions available to modules](#functions-available-to-modules)
id: totrans-29
prefs: []
type: TYPE_NORMAL
zh: 5.2 [模块可用的函数](#functions-available-to-modules)
- en: 5.3 [User Space vs Kernel Space](#user-space-vs-kernel-space)
id: totrans-30
prefs: []
type: TYPE_NORMAL
zh: 5.3 [用户空间与内核空间](#user-space-vs-kernel-space)
- en: 5.4 [Name Space](#name-space)
id: totrans-31
prefs: []
type: TYPE_NORMAL
zh: 5.4 [命名空间](#name-space)
- en: 5.5 [Code space](#code-space)
id: totrans-32
prefs: []
type: TYPE_NORMAL
zh: 5.5 [代码空间](#code-space)
- en: 5.6 [Device Drivers](#device-drivers)
id: totrans-33
prefs: []
type: TYPE_NORMAL
zh: 5.6 [设备驱动程序](#device-drivers)
- en: 6 [Character Device drivers](#character-device-drivers)
id: totrans-34
prefs: []
type: TYPE_NORMAL
zh: 6 [字符设备驱动程序](#character-device-drivers)
- en: 6.1 [The file_operations Structure](#the-fileoperations-structure)
id: totrans-35
prefs: []
type: TYPE_NORMAL
zh: 6.1 [file_operations结构](#the-fileoperations-structure)
- en: 6.2 [The file structure](#the-file-structure)
id: totrans-36
prefs: []
type: TYPE_NORMAL
zh: 6.2 [文件结构](#the-file-structure)
- en: 6.3 [Registering A Device](#registering-a-device)
id: totrans-37
prefs: []
type: TYPE_NORMAL
zh: 6.3 [注册设备](#registering-a-device)
- en: 6.4 [Unregistering A Device](#unregistering-a-device)
id: totrans-38
prefs: []
type: TYPE_NORMAL
zh: 6.4 [注销设备](#unregistering-a-device)
- en: 6.5 [chardev.c](#chardevc)
id: totrans-39
prefs: []
type: TYPE_NORMAL
zh: 6.5 [chardev.c](#chardevc)
- en: 6.6 [Writing Modules for Multiple Kernel Versions](#writing-modules-for-multiple-kernel-versions)
id: totrans-40
prefs: []
type: TYPE_NORMAL
zh: 6.6 [为多个内核版本编写模块](#writing-modules-for-multiple-kernel-versions)
- en: 7 [The /proc Filesystem](#the-proc-filesystem)
id: totrans-41
prefs: []
type: TYPE_NORMAL
zh: 7 [/proc 文件系统](#the-proc-filesystem)
- en: 7.1 [The proc_ops Structure](#the-procops-structure)
id: totrans-42
prefs: []
type: TYPE_NORMAL
zh: 7.1 [proc_ops 结构](#the-procops-structure)
- en: 7.2 [Read and Write a /proc File](#read-and-write-a-proc-file)
id: totrans-43
prefs: []
type: TYPE_NORMAL
zh: 7.2 [读取和写入/proc文件](#read-and-write-a-proc-file)
- en: 7.3 [Manage /proc file with standard filesystem](#manage-proc-file-with-standard-filesystem)
id: totrans-44
prefs: []
type: TYPE_NORMAL
zh: 7.3 [使用标准文件系统管理/proc文件](#manage-proc-file-with-standard-filesystem)
- en: 7.4 [Manage /proc file with seq_file](#manage-proc-file-with-seqfile)
id: totrans-45
prefs: []
type: TYPE_NORMAL
zh: 7.4 [使用seq_file管理/proc文件](#manage-proc-file-with-seqfile)
- en: '8 [sysfs: Interacting with your module](#sysfs-interacting-with-your-module)'
id: totrans-46
prefs: []
type: TYPE_NORMAL
zh: 8 [sysfs与你的模块交互](#sysfs-interacting-with-your-module)
- en: 9 [Talking To Device Files](#talking-to-device-files)
id: totrans-47
prefs: []
type: TYPE_NORMAL
zh: 9 [与设备文件通信](#talking-to-device-files)
- en: 10 [System Calls](#system-calls)
id: totrans-48
prefs: []
type: TYPE_NORMAL
zh: 10 [系统调用](#system-calls)
- en: 11 [Blocking Processes and threads](#blocking-processes-and-threads)
id: totrans-49
prefs: []
type: TYPE_NORMAL
zh: 11 [阻塞进程和线程](#blocking-processes-and-threads)
- en: 11.1 [Sleep](#sleep)
id: totrans-50
prefs: []
type: TYPE_NORMAL
zh: 11.1 [睡眠](#sleep)
- en: 11.2 [Completions](#completions)
id: totrans-51
prefs: []
type: TYPE_NORMAL
zh: 11.2 [补全](#completions)
- en: 12 [Synchronization](#synchronization)
id: totrans-52
prefs: []
type: TYPE_NORMAL
zh: 12 [同步](#synchronization)
- en: 12.1 [Mutex](#mutex)
id: totrans-53
prefs: []
type: TYPE_NORMAL
zh: 12.1 [互斥锁](#mutex)
- en: 12.2 [Spinlocks](#spinlocks)
id: totrans-54
prefs: []
type: TYPE_NORMAL
zh: 12.2 [自旋锁](#spinlocks)
- en: 12.3 [Read and write locks](#read-and-write-locks)
id: totrans-55
prefs: []
type: TYPE_NORMAL
zh: 12.3 [读写锁](#read-and-write-locks)
- en: 12.4 [Atomic operations](#atomic-operations)
id: totrans-56
prefs: []
type: TYPE_NORMAL
zh: 12.4 [原子操作](#atomic-operations)
- en: 13 [Replacing Print Macros](#replacing-print-macros)
id: totrans-57
prefs: []
type: TYPE_NORMAL
zh: 13 [替换打印宏](#replacing-print-macros)
- en: 13.1 [Replacement](#replacement)
id: totrans-58
prefs: []
type: TYPE_NORMAL
zh: 13.1 [替换](#replacement)
- en: 13.2 [Flashing keyboard LEDs](#flashing-keyboard-leds)
id: totrans-59
prefs: []
type: TYPE_NORMAL
zh: 13.2 [闪烁键盘LED](#flashing-keyboard-leds)
- en: 14 [GPIO](#gpio)
id: totrans-60
prefs: []
type: TYPE_NORMAL
zh: 14 [GPIO](#gpio)
- en: 14.1 [GPIO](#gpio1)
id: totrans-61
prefs: []
type: TYPE_NORMAL
zh: 14.1 [GPIO](#gpio1)
- en: 14.2 [Control the LEDs on/off state](#control-the-leds-onoff-state)
id: totrans-62
prefs: []
type: TYPE_NORMAL
zh: 14.2 [控制LED的开关状态](#control-the-leds-onoff-state)
- en: 14.3 [DHT11 sensor](#dht-sensor)
id: totrans-63
prefs: []
type: TYPE_NORMAL
zh: 14.3 [DHT11传感器](#dht-sensor)
- en: 15 [Scheduling Tasks](#scheduling-tasks)
id: totrans-64
prefs: []
type: TYPE_NORMAL
zh: 15 [调度任务](#scheduling-tasks)
- en: 15.1 [Tasklets](#tasklets)
id: totrans-65
prefs: []
type: TYPE_NORMAL
zh: 15.1 [任务](#tasklets)
- en: 15.2 [Work queues](#work-queues)
id: totrans-66
prefs: []
type: TYPE_NORMAL
zh: 15.2 [工作队列](#work-queues)
- en: 16 [Interrupt Handlers](#interrupt-handlers)
id: totrans-67
prefs: []
type: TYPE_NORMAL
zh: 16 [中断处理程序](#interrupt-handlers)
- en: 16.1 [Interrupt Handlers](#interrupt-handlers1)
id: totrans-68
prefs: []
type: TYPE_NORMAL
zh: 16.1 [中断处理程序](#interrupt-handlers1)
- en: 16.2 [Detecting button presses](#detecting-button-presses)
id: totrans-69
prefs: []
type: TYPE_NORMAL
zh: 16.2 [检测按钮按下](#detecting-button-presses)
- en: 16.3 [Bottom Half](#bottom-half)
id: totrans-70
prefs: []
type: TYPE_NORMAL
zh: 16.3 [下半部](#bottom-half)
- en: 16.4 [Threaded IRQ](#threaded-irq)
id: totrans-71
prefs: []
type: TYPE_NORMAL
zh: 16.4 [线程化中断](#threaded-irq)
- en: 17 [Virtual Input Device Driver](#virtual-input-device-driver)
id: totrans-72
prefs: []
type: TYPE_NORMAL
zh: 17 [虚拟输入设备驱动程序](#virtual-input-device-driver)
- en: '18 [Standardizing the interfaces: The Device Model](#standardizing-the-interfaces-the-device-model)'
id: totrans-73
prefs: []
type: TYPE_NORMAL
zh: 18 [标准化接口:设备模型](#standardizing-the-interfaces-the-device-model)
- en: 19 [Device Tree](#device-tree)
id: totrans-74
prefs: []
type: TYPE_NORMAL
zh: 19 [设备树](#device-tree)
- en: 19.1 [Introduction to Device Tree](#introduction-to-device-tree)
id: totrans-75
prefs: []
type: TYPE_NORMAL
zh: 19.1 [设备树简介](#introduction-to-device-tree)
- en: 19.2 [Device Tree and Kernel Modules](#device-tree-and-kernel-modules)
id: totrans-76
prefs: []
type: TYPE_NORMAL
zh: 19.2 [设备树和内核模块](#device-tree-and-kernel-modules)
- en: '19.3 [Example: Device Tree Module](#example-device-tree-module)'
id: totrans-77
prefs: []
type: TYPE_NORMAL
zh: 19.3 [示例:设备树模块](#example-device-tree-module)
- en: 19.4 [Device Tree Source Example](#device-tree-source-example)
id: totrans-78
prefs: []
type: TYPE_NORMAL
zh: 19.4 [设备树源示例](#device-tree-source-example)
- en: 19.5 [Testing Device Tree Modules](#testing-device-tree-modules)
id: totrans-79
prefs: []
type: TYPE_NORMAL
zh: 19.5 [测试设备树模块](#testing-device-tree-modules)
- en: 19.6 [Common Device Tree Functions](#common-device-tree-functions)
id: totrans-80
prefs: []
type: TYPE_NORMAL
zh: 19.6 [常见的设备树函数](#common-device-tree-functions)
- en: 20 [Optimizations](#optimizations)
id: totrans-81
prefs: []
type: TYPE_NORMAL
zh: 20 [优化](#optimizations)
- en: 20.1 [Likely and Unlikely conditions](#likely-and-unlikely-conditions)
id: totrans-82
prefs: []
type: TYPE_NORMAL
zh: 20.1 [可能和不可能条件](#likely-and-unlikely-conditions)
- en: 20.2 [Static keys](#static-keys)
id: totrans-83
prefs: []
type: TYPE_NORMAL
zh: 20.2 [静态键](#static-keys)
- en: 21 [Common Pitfalls](#common-pitfalls)
id: totrans-84
prefs: []
type: TYPE_NORMAL
zh: 21 [常见陷阱](#common-pitfalls)
- en: 21.1 [Using standard libraries](#using-standard-libraries)
id: totrans-85
prefs: []
type: TYPE_NORMAL
zh: 21.1 [使用标准库](#using-standard-libraries)
- en: 21.2 [Disabling interrupts](#disabling-interrupts)
id: totrans-86
prefs: []
type: TYPE_NORMAL
zh: 21.2 [禁用中断](#disabling-interrupts)
- en: 22 [Where To Go From Here?](#where-to-go-from-here)
id: totrans-87
prefs: []
type: TYPE_NORMAL
zh: 22 [从这里开始?](#where-to-go-from-here)
- en: 1 Introduction
id: totrans-88
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 1 简介
- en: The Linux Kernel Module Programming Guide is a free book; you may reproduce
or modify it under the terms of the [Open Software License](https://opensource.org/licenses/OSL-3.0),
version 3.0.
id: totrans-89
prefs: []
type: TYPE_NORMAL
zh: 《Linux内核模块编程指南》是一本免费书籍您可以在[开放软件许可](https://opensource.org/licenses/OSL-3.0)的条款下复制或修改版本3.0。
- en: This book is distributed in the hope that it would be useful, but without any
warranty, without even the implied warranty of merchantability or fitness for
a particular purpose.
id: totrans-90
prefs: []
type: TYPE_NORMAL
zh: 本书分发是为了希望它会有用,但没有任何保证,甚至没有商销性或特定用途适用性的暗示保证。
- en: The author encourages wide distribution of this book for personal or commercial
use, provided the above copyright notice remains intact and the method adheres
to the provisions of the [Open Software License](https://opensource.org/licenses/OSL-3.0).
In summary, you may copy and distribute this book free of charge or for a profit.
No explicit permission is required from the author for reproduction of this book
in any medium, physical or electronic.
id: totrans-91
prefs: []
type: TYPE_NORMAL
zh: 作者鼓励广泛分发此书,无论是个人还是商业用途,只要上述版权声明保持完整,并且方法遵守[开放软件许可](https://opensource.org/licenses/OSL-3.0)的规定。总之,您可以免费或盈利地复制和分发此书。无需作者明确许可即可以任何介质复制此书,无论是物理的还是电子的。
- en: Derivative works and translations of this document must be placed under the
Open Software License, and the original copyright notice must remain intact. If
you have contributed new material to this book, you must make the material and
source code available for your revisions. Please make revisions and updates available
directly to the document maintainer, Jim Huang <jserv@ccns.ncku.edu.tw>. This
will allow for the merging of updates and provide consistent revisions to the
Linux community.
id: totrans-92
prefs: []
type: TYPE_NORMAL
zh: 本文档的衍生作品和翻译必须置于开放软件许可之下并且必须保留原始版权声明。如果您为此书贡献了新材料您必须提供材料和源代码以供您的修订。请直接向文档维护者Jim
Huang <jserv@ccns.ncku.edu.tw>提供修订和更新。这将允许合并更新并提供给Linux社区一致的修订。
- en: If you publish or distribute this book commercially, donations, royalties, or
printed copies are greatly appreciated by the author and the [Linux Documentation
Project](https://tldp.org/) (LDP). Contributing in this way shows your support
for free software and the LDP. If you have questions or comments, please contact
the address above.
id: totrans-93
prefs: []
type: TYPE_NORMAL
zh: 如果您商业出版或分发此书,作者和 [Linux 文档项目](https://tldp.org/)LDP将非常感激捐赠、版税或印刷副本。以这种方式做出贡献表明您支持免费软件和
LDP。如果您有任何问题或评论请通过上述地址联系。
- en: 1.1 Authorship
id: totrans-94
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 1.1 著作权
- en: The Linux Kernel Module Programming Guide was initially authored by Ori Pomerantz
for Linux v2.2\. As the Linux kernel evolved, Ori's availability to maintain the
document diminished. Consequently, Peter Jay Salzman assumed the role of maintainer
and updated the guide for Linux v2.4\. Similar constraints arose for Peter when
tracking developments in Linux v2.6, leading to Michael Burian joining as a co-maintainer
to bring the guide up to speed with Linux v2.6\. Bob Mottram contributed to the
guide by updating examples for Linux v3.8 and later. Jim Huang then undertook
the task of updating the guide for recent Linux versions (v5.0 and beyond), along
with revising the LaTeX document. The guide continues to be maintained for compatibility
with modern kernels (v6.x series) while ensuring examples work with older LTS
kernels.
id: totrans-95
prefs: []
type: TYPE_NORMAL
zh: 《Linux 内核模块编程指南》最初由 Ori Pomerantz 为 Linux v2.2 版本编写。随着 Linux 内核的演变Ori 维护文档的能力逐渐减弱。因此Peter
Jay Salzman 接替了维护者的角色,并为 Linux v2.4 版本更新了指南。当 Peter 跟踪 Linux v2.6 版本的进展时,也遇到了类似的限制,导致
Michael Burian 加入作为共同维护者,使指南与 Linux v2.6 版本保持同步。Bob Mottram 通过更新 Linux v3.8 及以后的示例为指南做出了贡献。随后Jim
Huang 承担了更新指南以适应最新 Linux 版本v5.0 及以上)的任务,同时修订了 LaTeX 文档。指南继续维护以兼容现代内核v6.x 系列),同时确保示例与较旧的
LTS 内核兼容。
- en: 1.2 Acknowledgements
id: totrans-96
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 1.2 致谢
- en: 'The following people have contributed corrections or good suggestions:'
id: totrans-97
prefs: []
type: TYPE_NORMAL
zh: 以下人员对纠正或提出了良好的建议:
- en: Amit Dhingra, Andrew Kreimer, Andrew Lin, Andy Shevchenko, Arush Sharma, Aykhan
Hagverdili, Benno Bielmeier, Bob Lee, Brad Baker, Che-Chia Chang, Cheng-Shian
Yeh, Cheng-Yang Chou, Chih-En Lin, Chih-Hsuan Yang, Chih-Yu Chen, Ching-Hua (Vivian)
Lin, Chin Yik Ming, Chung-Han Tsai, cvvletter, Cyril Brulebois, Daniele Paolo
Scarpazza, David Porter, demonsome, Dimo Velev, Ekang Monyet, Ethan Chan, Francois
Audeon, Gilad Reti, Hao.Dong, heartofrain, Horst Schirmeier, Hsin-Hsiang Peng,
Hung-Jen Pao, Ignacio Martin, I-Hsin Cheng, Integral, Iûnn Kiàn-îng, Jian-Xing
Wu, Jimmy Ma, Johan Calle, keytouch, Kohei Otsuka, Kuan-Wei Chiu, manbing, Marconi
Jiang, mengxinayan, Meng-Zong Tsai, Peter Lin, Roman Lakeev, Sam Erickson, Shao-Tse
Hung, Shih-Sheng Yang, Stacy Prowell, Steven Lung, Tristan Lelong, Tse-Wei Lin,
Tucker Polomik, Tyler Fanelli, VxTeemo, Wei-Hsin Yeh, Wei-Lun Tsai, Xatierlike
Lee, Yan-Jie Chan, Yen-Yu Chen, Yin-Chiuan Chen, Yi-Wei Lin, Yo-Jung Lin, Yu-Chun
Lin, Yu-Hsiang Tseng, YYGO.
id: totrans-98
prefs: []
type: TYPE_NORMAL
zh: Amit Dhingra, Andrew Kreimer, Andrew Lin, Andy Shevchenko, Arush Sharma, Aykhan
Hagverdili, Benno Bielmeier, Bob Lee, Brad Baker, Che-Chia Chang, Cheng-Shian
Yeh, Cheng-Yang Chou, Chih-En Lin, Chih-Hsuan Yang, Chih-Yu Chen, Ching-Hua (Vivian)
Lin, Chin Yik Ming, Chung-Han Tsai, cvvletter, Cyril Brulebois, Daniele Paolo
Scarpazza, David Porter, demonsome, Dimo Velev, Ekang Monyet, Ethan Chan, Francois
Audeon, Gilad Reti, Hao.Dong, heartofrain, Horst Schirmeier, Hsin-Hsiang Peng,
Hung-Jen Pao, Ignacio Martin, I-Hsin Cheng, Integral, Iûnn Kiàn-îng, Jian-Xing
Wu, Jimmy Ma, Johan Calle, keytouch, Kohei Otsuka, Kuan-Wei Chiu, manbing, Marconi
Jiang, mengxinayan, Meng-Zong Tsai, Peter Lin, Roman Lakeev, Sam Erickson, Shao-Tse
Hung, Shih-Sheng Yang, Stacy Prowell, Steven Lung, Tristan Lelong, Tse-Wei Lin,
Tucker Polomik, Tyler Fanelli, VxTeemo, Wei-Hsin Yeh, Wei-Lun Tsai, Xatierlike
Lee, Yan-Jie Chan, Yen-Yu Chen, Yin-Chiuan Chen, Yi-Wei Lin, Yo-Jung Lin, Yu-Chun
Lin, Yu-Hsiang Tseng, YYGO。
- en: 1.3 What Is A Kernel Module?
id: totrans-99
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 1.3 什么是内核模块?
- en: Developing Linux kernel modules requires a solid foundation in the C programming
language and experience writing conventional programs that run as processes. In
this domain, a single stray pointer that goes unnoticed can wipe out an entire
filesystem and force a complete system reboot.
id: totrans-100
prefs: []
type: TYPE_NORMAL
zh: 参与开发 Linux 内核模块需要具备 C 编程语言的基础,并拥有创建旨在执行进程的传统程序的历史记录。这项追求深入到一个领域,如果忽视未受管理的指针,可能会触发整个文件系统的完全消除,导致需要完全系统重启的情景。
- en: A Linux kernel module is precisely defined as a code segment capable of dynamic
loading and unloading within the kernel as needed. These modules enhance kernel
capabilities without necessitating a system reboot. A notable example is seen
in the device driver module, which facilitates kernel interaction with hardware
components linked to the system. In the absence of modules, the prevailing approach
leans toward monolithic kernels, requiring direct integration of new functionalities
into the kernel image. This approach leads to larger kernels and necessitates
kernel rebuilding and subsequent system rebooting when new functionalities are
desired.
id: totrans-101
prefs: []
type: TYPE_NORMAL
zh: Linux 内核模块精确地定义为一段可以在内核中按需动态加载和卸载的代码。这些模块增强了内核功能,而无需重新启动系统。一个显著的例子是设备驱动模块,它促进了内核与系统连接的硬件组件之间的交互。如果没有模块,当前的方法倾向于使用单核内核,需要将新功能直接集成到内核映像中。这种方法会导致内核变大,并在需要新功能时需要重建内核和随后的系统重启。
- en: 1.4 Kernel module package
id: totrans-102
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 1.4 内核模块包
- en: Linux distributions provide the commands `modprobe`, `insmod` and `depmod`
within a package.
id: totrans-103
prefs: []
type: TYPE_NORMAL
zh: Linux 发行版在包中提供了 `modprobe`、`insmod` 和 `depmod` 命令。
- en: 'On Ubuntu/Debian GNU/Linux:'
id: totrans-104
prefs: []
type: TYPE_NORMAL
zh: 在 Ubuntu/Debian GNU/Linux 上:
- en: '[PRE0]'
id: totrans-105
prefs: []
type: TYPE_PRE
zh: '[PRE0]'
- en: 'On Arch Linux:'
id: totrans-106
prefs: []
type: TYPE_NORMAL
zh: 在 Arch Linux 上:
- en: '[PRE1]'
id: totrans-107
prefs: []
type: TYPE_PRE
zh: '[PRE1]'
- en: 1.5 What Modules are in my Kernel?
id: totrans-108
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 1.5 我的内核中有什么模块?
- en: To discover what modules are already loaded within your current kernel, use
the command `lsmod`.
id: totrans-109
prefs: []
type: TYPE_NORMAL
zh: 要发现当前内核中已经加载的模块,请使用命令 `lsmod`。
- en: '[PRE2]'
id: totrans-110
prefs: []
type: TYPE_PRE
zh: '[PRE2]'
- en: 'The list of loaded modules is exposed through the file /proc/modules, so you
can also see them with:'
id: totrans-111
prefs: []
type: TYPE_NORMAL
zh: 模块存储在文件 /proc/modules 中,因此您也可以使用以下命令查看它们:
- en: '[PRE3]'
id: totrans-112
prefs: []
type: TYPE_PRE
zh: '[PRE3]'
- en: 'This can be a long list, and you might prefer to search for something particular.
To search for the fat module:'
id: totrans-113
prefs: []
type: TYPE_NORMAL
zh: 这可能是一个很长的列表,您可能更喜欢搜索特定内容。要搜索 fat 模块:
- en: '[PRE4]'
id: totrans-114
prefs: []
type: TYPE_PRE
zh: '[PRE4]'
- en: 1.6 Is there a need to download and compile the kernel?
id: totrans-115
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 1.6 是否需要下载和编译内核?
- en: You do not need to download or compile the kernel to follow this guide.
Nonetheless, it is prudent to run the examples in a test distribution on a
virtual machine, which reduces the risk of disrupting your main system.
id: totrans-116
prefs: []
type: TYPE_NORMAL
zh: 为了有效地遵循本指南,没有执行此类操作的强制性要求。然而,一种谨慎的方法是在虚拟机上的测试发行版中执行示例,从而降低对系统造成潜在风险的任何可能性。
- en: 1.7 Before We Begin
id: totrans-117
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 1.7 开始之前
- en: Before delving into code, a few matters require attention. Individuals' systems
differ, and everyone has their own way of working. Getting the first “hello world”
program to compile and load can, at times, be a challenge. Reassuringly, once
that initial obstacle is overcome, subsequent steps tend to proceed smoothly.
id: totrans-118
prefs: []
type: TYPE_NORMAL
zh: 在深入研究代码之前有一些事项需要关注。不同系统的差异存在并且明显的个人方法也很明显。首次尝试成功编译和加载第一个“hello world”程序有时可能会遇到挑战。值得注意的是首次尝试克服初始障碍为后续的顺利进展铺平了道路。
- en: Modversioning. A module compiled for one kernel will not load if a different
kernel is booted, unless `CONFIG_MODVERSIONS` is enabled in the kernel. Module
versioning will be discussed later in this guide. Until module versioning is covered,
the examples in this guide may not work correctly if running a kernel with modversioning
turned on. However, most stock Linux distribution kernels come with modversioning
enabled. If difficulties arise when loading the modules due to versioning errors,
consider compiling a kernel with modversioning turned off.
id: totrans-119
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 模块版本化。为某个内核编译的模块如果启动了不同的内核则无法加载,除非在内核中启用了 `CONFIG_MODVERSIONS`。模块版本化将在本指南的后面讨论。在覆盖模块版本化之前,如果运行启用了模块版本化的内核,本指南中的示例可能无法正确工作。然而,大多数股票
Linux 发行版内核都启用了模块版本化。如果由于版本错误而加载模块时出现困难,请考虑编译一个禁用了模块版本化的内核。
- en: Using the X Window System. It is highly recommended to extract, compile, and
load all the examples discussed in this guide from a console. Working on these
tasks within the X Window System is discouraged.
id: totrans-120
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 使用 X Window 系统。强烈建议从控制台提取、编译和加载本指南中讨论的所有示例。在 X Window 系统内执行这些任务是不被推荐的。
- en: Modules cannot directly print to the screen like `printf()` can, but they can
log information and warnings to the kernel's log ring buffer. This output is not
automatically displayed on any console or terminal. To view kernel module messages,
you must use `dmesg` to read the kernel log ring buffer, or check the systemd
journal with `journalctl -k` for kernel messages. Refer to [Section 4](#hello-world)
for more information. The terminal or environment from which you load the module
does not affect where the output goes—it always goes to the kernel log.
id: totrans-121
prefs:
- PREF_IND
type: TYPE_NORMAL
zh: 模块不能像 `printf()` 一样直接打印到屏幕,但它们可以将信息和警告记录到内核的日志环形缓冲区。此输出不会自动在任何控制台或终端上显示。要查看内核模块消息,您必须使用
`dmesg` 读取内核日志环形缓冲区,或使用 `journalctl -k` 检查 systemd 日志以获取内核消息。有关更多信息,请参阅[第 4 节](#hello-world)。加载模块的终端或环境不会影响输出位置——它始终输出到内核日志。
- en: SecureBoot. Numerous modern computers arrive pre-configured with UEFI SecureBoot
enabled—an essential security standard ensuring booting exclusively through trusted
software endorsed by the original equipment manufacturer. Certain Linux distributions
even ship with the default Linux kernel configured to support SecureBoot. In these
cases, the kernel module must be signed with a key that the system trusts.
id: totrans-122
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: SecureBoot。许多现代计算机出厂时已预配置为启用 UEFI SecureBoot——这是一个确保仅通过原始设备制造商认可的受信任软件启动的必要安全标准。某些
Linux 发行版甚至默认配置了支持 SecureBoot 的 Linux 内核。在这些情况下,内核模块需要签名安全密钥。
- en: 'Failing that, an attempt to insert your first “hello world” module would result
in the message: “ERROR: could not insert module”. If this message “Lockdown: insmod:
unsigned module loading is restricted; see man kernel_lockdown.7” appears in the
`dmesg` output, the simplest approach involves disabling UEFI SecureBoot from
the boot menu of your PC or laptop, allowing the successful insertion of the “hello
world” module. Naturally, an alternative involves undergoing intricate procedures
such as generating keys, system key installation, and module signing to achieve
functionality. However, this intricate process is less appropriate for beginners.
If interested, more detailed steps for [SecureBoot](https://wiki.debian.org/SecureBoot)
can be explored and followed.'
id: totrans-123
prefs:
- PREF_IND
type: TYPE_NORMAL
zh: '如果失败尝试插入您的第一个“Hello World”模块将导致出现消息“ERROR: could not insert module”。如果此消息“Lockdown:
insmod: unsigned module loading is restricted; see man kernel lockdown.7”出现在 `dmesg`
输出中,最简单的方法是禁用 PC 或笔记本电脑的启动菜单中的 UEFI SecureBoot以允许成功插入“Hello World”模块。当然另一种方法是进行复杂的程序如生成密钥、系统密钥安装和模块签名以实现功能。然而这个过程对于初学者来说不太合适。如果您感兴趣可以探索并遵循[SecureBoot](https://wiki.debian.org/SecureBoot)的更详细步骤。'
- en: 2 Headers
id: totrans-124
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 2 头文件
- en: Before building anything, it is necessary to install the header files for the
kernel.
id: totrans-125
prefs: []
type: TYPE_NORMAL
zh: 在构建任何东西之前,需要安装内核的头文件。
- en: 'On Ubuntu/Debian GNU/Linux:'
id: totrans-126
prefs: []
type: TYPE_NORMAL
zh: 在 Ubuntu/Debian GNU/Linux 上:
- en: '[PRE5]'
id: totrans-127
prefs: []
type: TYPE_PRE
zh: '[PRE5]'
- en: 'The following command provides information about the available kernel header
files. Then, for example:'
id: totrans-128
prefs: []
type: TYPE_NORMAL
zh: 以下命令提供了有关可用内核头文件的信息。例如:
- en: '[PRE6]'
id: totrans-129
prefs: []
type: TYPE_PRE
zh: '[PRE6]'
- en: 'On Arch Linux:'
id: totrans-130
prefs: []
type: TYPE_NORMAL
zh: 在 Arch Linux 上:
- en: '[PRE7]'
id: totrans-131
prefs: []
type: TYPE_PRE
zh: '[PRE7]'
- en: 'On Fedora:'
id: totrans-132
prefs: []
type: TYPE_NORMAL
zh: 在 Fedora 上:
- en: '[PRE8]'
id: totrans-133
prefs: []
type: TYPE_PRE
zh: '[PRE8]'
- en: 3 Examples
id: totrans-134
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 3 示例
- en: All the examples from this document are available within the examples subdirectory.
id: totrans-135
prefs: []
type: TYPE_NORMAL
zh: 本文档中的所有示例都可在 examples 子目录中找到。
- en: Should compile errors occur, it may be due to a more recent kernel version being
in use, or there might be a need to install the corresponding kernel header files.
id: totrans-136
prefs: []
type: TYPE_NORMAL
zh: 如果出现编译错误,可能是由于正在使用较新的内核版本,或者可能需要安装相应的内核头文件。
- en: 4 Hello World
id: totrans-137
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 4 Hello World
- en: 4.1 The Simplest Module
id: totrans-138
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 4.1 最简单的模块
- en: Most individuals beginning their programming journey typically start with some
variant of a hello world example. It is unclear what the outcomes are for those
who deviate from this tradition, but it seems prudent to adhere to it. The learning
process will begin with a series of hello world programs that illustrate various
fundamental aspects of writing a kernel module.
id: totrans-139
prefs: []
type: TYPE_NORMAL
zh: 大多数开始编程之旅的人通常从某种“Hello World”示例的变体开始。对于偏离这一传统的人的结果尚不清楚但似乎遵循它更为谨慎。学习过程将从一系列展示编写内核模块各种基本方面的“Hello
World”程序开始。
- en: Presented next is the simplest possible module.
id: totrans-140
prefs: []
type: TYPE_NORMAL
zh: 下面展示的是最简单的模块。
- en: 'Make a test directory:'
id: totrans-141
prefs: []
type: TYPE_NORMAL
zh: 创建一个测试目录:
- en: '[PRE9]'
id: totrans-142
prefs: []
type: TYPE_PRE
zh: '[PRE9]'
- en: 'Paste this into your favorite editor and save it as hello-1.c:'
id: totrans-143
prefs: []
type: TYPE_NORMAL
zh: 将以下内容粘贴到您喜欢的编辑器中,并保存为 hello-1.c
- en: '[PRE10]'
id: totrans-144
prefs: []
type: TYPE_PRE
zh: '[PRE10]'
- en: Now you will need a Makefile. If you copy and paste this, change the indentation
to use tabs, not spaces.
id: totrans-145
prefs: []
type: TYPE_NORMAL
zh: 现在您需要一个 Makefile。如果您复制并粘贴请将缩进更改为使用制表符而不是空格。
- en: '[PRE11]'
id: totrans-146
prefs: []
type: TYPE_PRE
zh: '[PRE11]'
- en: In a Makefile, $(CURDIR) is set by GNU make to the absolute pathname of the
current working directory (after all -C options are processed, if any). See more
about CURDIR in the [GNU make manual](https://www.gnu.org/software/make/manual/make.html).
id: totrans-147
prefs: []
type: TYPE_NORMAL
zh: 在 Makefile 中,$(CURDIR) 可以设置为当前工作目录的绝对路径名(在处理完所有 -C 选项之后,如果有的话)。有关 CURDIR 的更多信息,请参阅[GNU
make 手册](https://www.gnu.org/software/make/manual/make.html)。
- en: And finally, just run make directly.
id: totrans-148
prefs: []
type: TYPE_NORMAL
zh: 最后,直接运行 make。
- en: '[PRE12]'
id: totrans-149
prefs: []
type: TYPE_PRE
zh: '[PRE12]'
- en: 'If there is no PWD := $(CURDIR) statement in the Makefile, then it may not
compile correctly with sudo make. This is because some environment variables are
specified by the security policy and cannot be inherited. The default security
policy is sudoers. In the sudoers security policy, env_reset is enabled by default,
which restricts environment variables. Specifically, path variables are not retained
from the user environment; they are set to default values (for more information,
see: [sudoers manual](https://www.sudo.ws/docs/man/sudoers.man/)). You can see
the environment variable settings by:'
id: totrans-150
prefs: []
type: TYPE_NORMAL
zh: 如果 Makefile 中没有 PWD := $(CURDIR) 语句,那么使用 sudo make 可能无法正确编译。这是因为一些环境变量由安全策略指定,不能被继承。默认的安全策略是
sudoers。在 sudoers 安全策略中env_reset 默认启用,这限制了环境变量。具体来说,路径变量不会保留用户环境中的值;它们被设置为默认值(更多信息,请参阅:[sudoers
手册](https://www.sudo.ws/docs/man/sudoers.man/))。你可以通过以下方式查看环境变量设置:
- en: '[PRE13]'
id: totrans-151
prefs: []
type: TYPE_PRE
zh: '[PRE13]'
- en: Here is a simple Makefile as an example to demonstrate the problem mentioned
above.
id: totrans-152
prefs: []
type: TYPE_NORMAL
zh: 这里有一个简单的 Makefile 示例,用于演示上述提到的问题。
- en: '[PRE14]'
id: totrans-153
prefs: []
type: TYPE_PRE
zh: '[PRE14]'
- en: Then, we can use the -p flag to print out the environment variable values from
the Makefile.
id: totrans-154
prefs: []
type: TYPE_NORMAL
zh: 然后,我们可以使用 -p 标志打印出 Makefile 中的环境变量值。
- en: '[PRE15]'
id: totrans-155
prefs: []
type: TYPE_PRE
zh: '[PRE15]'
- en: The PWD variable will not be inherited with sudo.
id: totrans-156
prefs: []
type: TYPE_NORMAL
zh: PWD 变量在使用 sudo 时不被继承。
- en: '[PRE16]'
id: totrans-157
prefs: []
type: TYPE_PRE
zh: '[PRE16]'
- en: However, there are three ways to solve this problem.
id: totrans-158
prefs: []
type: TYPE_NORMAL
zh: 然而,有三种方法可以解决这个问题。
- en: You can use the -E flag to temporarily preserve them.
id: totrans-159
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 你可以使用 -E 标志临时保留它们。
- en: '[PRE17]'
id: totrans-160
prefs:
- PREF_IND
type: TYPE_PRE
zh: '[PRE17]'
- en: You can disable env_reset by editing /etc/sudoers as root using visudo.
id: totrans-161
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 作为 root 用户编辑 /etc/sudoers可以禁用 env_reset。
- en: '[PRE18]'
id: totrans-162
prefs:
- PREF_IND
type: TYPE_PRE
zh: '[PRE18]'
- en: Then execute env and sudo env individually.
id: totrans-163
prefs:
- PREF_IND
type: TYPE_NORMAL
zh: 然后分别执行 env 和 sudo env。
- en: '[PRE19]'
id: totrans-164
prefs:
- PREF_IND
type: TYPE_PRE
zh: '[PRE19]'
- en: You can view and compare these logs to find differences between env_reset and
!env_reset.
id: totrans-165
prefs:
- PREF_IND
type: TYPE_NORMAL
zh: 你可以查看并比较这些日志,以找到 env_reset 和 !env_reset 之间的差异。
- en: You can preserve environment variables by appending them to env_keep in /etc/sudoers.
id: totrans-166
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 你可以通过将它们附加到 /etc/sudoers 中的 env_keep 来保留环境变量。
- en: '[PRE20]'
id: totrans-167
prefs:
- PREF_IND
type: TYPE_PRE
zh: '[PRE20]'
- en: 'After applying the above change, you can check the environment variable settings
by:'
id: totrans-168
prefs:
- PREF_IND
type: TYPE_NORMAL
zh: 应用上述更改后,你可以通过以下方式检查环境变量设置:
- en: '[PRE21]'
id: totrans-169
prefs:
- PREF_IND
type: TYPE_PRE
zh: '[PRE21]'
- en: 'If all goes smoothly you should then find that you have a compiled hello-1.ko
module. You can find info on it with the command:'
id: totrans-170
prefs: []
type: TYPE_NORMAL
zh: 如果一切顺利,你应该会发现你有一个编译好的 hello-1.ko 模块。你可以使用以下命令获取相关信息:
- en: '[PRE22]'
id: totrans-171
prefs: []
type: TYPE_PRE
zh: '[PRE22]'
- en: 'At this point the command:'
id: totrans-172
prefs: []
type: TYPE_NORMAL
zh: 在这一点上,以下命令:
- en: '[PRE23]'
id: totrans-173
prefs: []
type: TYPE_PRE
zh: '[PRE23]'
- en: 'should return nothing. You can try loading your new module with:'
id: totrans-174
prefs: []
type: TYPE_NORMAL
zh: 应该不会返回任何内容。你可以尝试使用以下命令加载你的新模块:
- en: '[PRE24]'
id: totrans-175
prefs: []
type: TYPE_PRE
zh: '[PRE24]'
- en: 'The dash character will get converted to an underscore, so when you again try:'
id: totrans-176
prefs: []
type: TYPE_NORMAL
zh: 连字符将被转换为下划线,所以当你再次尝试时:
- en: '[PRE25]'
id: totrans-177
prefs: []
type: TYPE_PRE
zh: '[PRE25]'
- en: 'You should now see your loaded module. It can be removed again with:'
id: totrans-178
prefs: []
type: TYPE_NORMAL
zh: 现在,你应该能看到你加载的模块。它可以再次使用以下命令删除:
- en: '[PRE26]'
id: totrans-179
prefs: []
type: TYPE_PRE
zh: '[PRE26]'
- en: 'Notice that the dash was replaced by an underscore. To see the module’s output
messages, use `dmesg` to view the kernel log ring buffer:'
id: totrans-180
prefs: []
type: TYPE_NORMAL
zh: 注意到连字符已被替换为下划线。要查看模块的输出消息,请使用 `dmesg` 查看内核日志环缓冲区:
- en: '[PRE27]'
id: totrans-181
prefs: []
type: TYPE_PRE
zh: '[PRE27]'
- en: 'You should see messages like “Hello world 1.” and “Goodbye world 1.” from your
module. Alternatively, you can check the systemd journal for kernel messages:'
id: totrans-182
prefs: []
type: TYPE_NORMAL
zh: 你应该会看到来自你的模块的消息例如“Hello world 1.”和“Goodbye world 1.”。或者,你可以检查 systemd 日志以获取内核消息:
- en: '[PRE28]'
id: totrans-183
prefs: []
type: TYPE_PRE
zh: '[PRE28]'
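- en: |-
    As an aside on the dash-to-underscore conversion noted above, the mapping itself is simple and can be sketched in user-space C. This is only an illustration under stated assumptions. `normalize_module_name()` is a hypothetical helper, not a kernel or kmod function, and the real conversion is done by the kernel build system and the module tools.

    ```c
    #include <stddef.h>

    /*
     * Hypothetical helper (not a kernel or kmod API): copy a module file
     * stem such as "hello-1" into out, replacing each '-' with '_'.
     * Returns the number of characters written, excluding the terminator.
     * The result is truncated if out_size is too small.
     */
    size_t normalize_module_name(const char *stem, char *out, size_t out_size)
    {
        size_t i = 0;

        if (out_size == 0)
            return 0;
        while (stem[i] != '\0' && i + 1 < out_size) {
            out[i] = (stem[i] == '-') ? '_' : stem[i];
            i++;
        }
        out[i] = '\0';
        return i;
    }
    ```

    Called with the stem hello-1, it fills the buffer with hello_1, which is the name to look for in the `lsmod` output.
prefs: []
type: TYPE_NORMAL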
- en: You now know the basics of creating, compiling, installing and removing modules.
Now for more of a description of how this module works.
id: totrans-184
prefs: []
type: TYPE_NORMAL
zh: 现在,你已经了解了创建、编译、安装和删除模块的基本知识。现在让我们更详细地描述这个模块的工作原理。
- en: 'Kernel modules must have at least two functions: a "start" (initialization)
function called `init_module()` which is called when the module is `insmod` ed
into the kernel, and an "end" (cleanup) function called `cleanup_module()` which
is called just before it is removed from the kernel. Actually, things have changed
starting with kernel 2.3.13\. You can now use whatever name you like for the start
and end functions of a module, and you will learn how to do this in [Section 4.2](#hello-and-goodbye).
In fact, the new method is the preferred method. However, many people still use
`init_module()` and `cleanup_module()` for their start and end functions.'
id: totrans-185
prefs: []
type: TYPE_NORMAL
zh: 内核模块必须至少有两个函数一个名为“start”初始化的函数称为`init_module()`,当模块被`insmod`到内核中时调用以及一个名为“end”清理的函数称为`cleanup_module()`在它从内核中移除之前调用。实际上从2.3.13内核开始,事情已经发生了变化。你现在可以为模块的起始和结束函数使用任何你喜欢的名称,你将在[第4.2节](#hello-and-goodbye)中了解到如何做到这一点。实际上,新方法是首选方法。然而,许多人仍然使用`init_module()`和`cleanup_module()`作为它们的起始和结束函数。
- en: Typically, `init_module()` either registers a handler for something with the
kernel, or it replaces one of the kernel functions with its own code (usually
code to do something and then call the original function). The `cleanup_module()`
function is supposed to undo whatever `init_module()` did, so the module can be
unloaded safely.
id: totrans-186
prefs: []
type: TYPE_NORMAL
zh: 通常,`init_module()`要么向内核注册一个处理程序,要么用它的代码替换内核中的一个函数(通常是执行某些操作然后调用原始函数的代码)。`cleanup_module()`函数应该撤销`init_module()`所做的操作,以便模块可以安全卸载。
- en: Lastly, every kernel module needs to include <linux/module.h>. We needed to
include <linux/printk.h> only for the macro expansion for the `pr_alert()` log
level, which you’ll learn about in [Item 2](#x1-121702).
id: totrans-187
prefs: []
type: TYPE_NORMAL
zh: 最后,每个内核模块都需要包含<linux/module.h>。我们只需要包含<linux/printk.h>来为`pr_alert()`日志级别的宏进行展开,你将在[项目2](#x1-121702)中了解到这一点。
- en: A point about coding style. Another thing that may not be immediately obvious
to anyone getting started with kernel programming is that indentation within your
code should use tabs and not spaces. It is one of the coding conventions of the
kernel. You may not like it, but you will need to get used to it if you ever submit
a patch upstream.
id: totrans-188
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 关于编码风格的一点。对于刚开始接触内核编程的人来说可能不太明显的是,你的代码缩进应该使用制表符而不是空格。这是内核的编码约定之一。你可能不喜欢它,但如果你要向上游提交补丁,你将需要习惯它。
- en: Introducing print macros. In the beginning there was `printk` , usually followed
by a priority such as `KERN_INFO` or `KERN_DEBUG` . More recently, this can also
be expressed in abbreviated form using a set of print macros, such as `pr_info`
and `pr_debug` . This just saves some mindless keyboard bashing and looks a bit
neater. They can be found within [include/linux/printk.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/printk.h).
Take time to read through the available priority macros.
id: totrans-189
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 介绍打印宏。最初是`printk`,通常后面跟着一个优先级,例如`KERN_INFO`或`KERN_DEBUG`。最近,这也可以通过使用一组打印宏来以缩写形式表达,例如`pr_info`和`pr_debug`。这仅仅节省了一些无意义的键盘敲击,看起来也更整洁。它们可以在[include/linux/printk.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/printk.h)中找到。花点时间阅读可用的优先级宏。
- en: 'Important: These functions write to the kernel log ring buffer, not directly
to any terminal or console. To view the output from your kernel modules, you must
use `dmesg` or `journalctl -k` .'
id: totrans-190
prefs:
- PREF_IND
type: TYPE_NORMAL
zh: 重要:这些函数将写入内核日志环形缓冲区,而不是直接写入任何终端或控制台。要查看内核模块的输出,你必须使用`dmesg`或`journalctl -k`。
- en: About Compiling. Kernel modules need to be compiled a bit differently from regular
userspace apps. Former kernel versions required us to care much about these settings,
which are usually stored in Makefiles. Although hierarchically organized, many
redundant settings accumulated in sublevel Makefiles and made them large and rather
difficult to maintain. Fortunately, there is a new way of doing these things,
called kbuild, and the build process for external loadable modules is now fully
integrated into the standard kernel build mechanism. To learn more about how to
compile modules which are not part of the official kernel (such as all the examples
you will find in this guide), see file [Documentation/kbuild/modules.rst](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/kbuild/modules.rst).
id: totrans-191
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 关于编译。内核模块需要以与常规用户空间应用不同的方式编译。早期内核版本要求我们非常关注这些设置,这些设置通常存储在 Makefiles 中。尽管它们是按层次组织的,但许多冗余设置在子级
Makefiles 中积累,使它们变得很大,而且相当难以维护。幸运的是,有一种新的方法来做这些事情,称为 kbuild外部可加载模块的构建过程现在已完全集成到标准内核构建机制中。要了解更多关于如何编译不属于官方内核的模块例如您将在本指南中找到的所有示例请参阅文件
[Documentation/kbuild/modules.rst](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/kbuild/modules.rst)。
- en: Additional details about Makefiles for kernel modules are available in [Documentation/kbuild/makefiles.rst](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/kbuild/makefiles.rst).
Be sure to read this and the related files before starting to hack Makefiles.
It will probably save you lots of work.
id: totrans-192
prefs:
- PREF_IND
type: TYPE_NORMAL
zh: 关于内核模块的 Makefiles 的更多详细信息,请参阅 [Documentation/kbuild/makefiles.rst](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/kbuild/makefiles.rst)。在开始修改
Makefiles 之前,请务必阅读此文件和相关文件。这可能会为您节省大量工作。
- en: Here is another exercise for the reader. See that comment above the return statement
in `init_module()` ? Change the return value to something negative, recompile
and load the module again. What happens?
id: totrans-193
prefs:
- PREF_IND
- PREF_BQ
type: TYPE_NORMAL
zh: 这里有一个给读者的练习。看到 `init_module()` 中 return 语句上方的注释了吗?将返回值改为一个负数,重新编译并再次加载模块。会发生什么?
- en: 4.2 Hello and Goodbye
id: totrans-194
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 4.2 你好和再见
- en: 'In early kernel versions you had to use the `init_module` and `cleanup_module`
functions, as in the first hello world example, but these days you can name those
anything you want by using the `module_init` and `module_exit` macros. These macros
are defined in [include/linux/module.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/module.h).
The only requirement is that your init and cleanup functions must be defined before
calling those macros, otherwise you will get compilation errors. Here is an example
of this technique:'
id: totrans-195
prefs: []
type: TYPE_NORMAL
zh: 在早期内核版本中,您必须使用 `init_module` 和 `cleanup_module` 函数,就像第一个 hello world 示例中那样,但如今您可以通过使用
`module_init` 和 `module_exit` 宏来命名这些函数。这些宏在 [include/linux/module.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/module.h)
中定义。唯一的要求是您的初始化和清理函数必须在调用这些宏之前定义,否则您将得到编译错误。以下是一个此技术的示例:
- en: '[PRE29]'
id: totrans-196
prefs: []
type: TYPE_PRE
zh: '[PRE29]'
- en: 'So now we have two real kernel modules under our belt. Adding another module
is as simple as this:'
id: totrans-197
prefs: []
type: TYPE_NORMAL
zh: 因此,我们现在已经有了两个真正的内核模块。添加另一个模块就像这样:
- en: '[PRE30]'
id: totrans-198
prefs: []
type: TYPE_PRE
zh: '[PRE30]'
- en: Now have a look at [drivers/char/Makefile](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/char/Makefile)
for a real world example. As you can see, some things got hardwired into the kernel
(obj-y) but where have all those obj-m gone? Those familiar with shell scripts
will easily be able to spot them. For those who are not, the obj-$(CONFIG_FOO)
entries you see everywhere expand into obj-y or obj-m, depending on whether the
CONFIG_FOO variable has been set to y or m. While we are at it, those were exactly
the kind of variables that you have set in the .config file in the top-level directory
of the Linux kernel source tree, the last time you ran `make menuconfig` or something
similar.
id: totrans-199
prefs: []
type: TYPE_NORMAL
zh: 现在看看 [drivers/char/Makefile](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/char/Makefile)
以了解一个真实世界的例子。正如您所看到的一些东西被硬编码到内核中obj-y但那些 obj-m 去哪里了?熟悉 shell 脚本的人会很容易地找到它们。对于不熟悉的人,您看到的
obj-$(CONFIG_FOO) 条目会根据 CONFIG_FOO 变量是否设置为 y 或 m 而展开为 obj-y 或 obj-m。当我们谈论这个问题时这些正是您在上一次运行
`make menuconfig` 或类似命令时在 Linux 内核源树顶级目录的 .config 文件中设置的变量。
- en: 4.3 The __init and __exit Macros
id: totrans-200
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 4.3 __init 和 __exit 宏
- en: The `__init` macro causes the init function to be discarded and its memory freed
once the init function finishes for built-in drivers, but not loadable modules.
If you think about when the init function is invoked, this makes perfect sense.
id: totrans-201
prefs: []
type: TYPE_NORMAL
zh: '`__init` 宏会导致在初始化函数完成后丢弃初始化函数并释放其内存,但对于可加载模块则不会这样做。如果你考虑初始化函数被调用的时机,这完全说得通。'
- en: There is also an `__initdata` which works similarly to `__init` but for init
variables rather than functions.
id: totrans-202
prefs: []
type: TYPE_NORMAL
zh: 此外,还有一个 `__initdata` 宏,它的工作方式与 `__init` 类似,但用于初始化变量而不是函数。
- en: The `__exit` macro causes the omission of the function when the module is built
into the kernel, and like `__init` , has no effect for loadable modules. Again,
if you consider when the cleanup function runs, this makes complete sense; built-in
drivers do not need a cleanup function, while loadable modules do.
id: totrans-203
prefs: []
type: TYPE_NORMAL
zh: '`__exit` 宏会导致在模块被构建到内核中时省略该函数,并且与 `__init` 一样,对于可加载模块没有影响。再次强调,如果你考虑清理函数运行的时机,这完全说得通;内置驱动程序不需要清理函数,而可加载模块则需要。'
- en: 'These macros are defined in [include/linux/init.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/init.h)
and serve to free up kernel memory. When you boot your kernel and see something
like Freeing unused kernel memory: 236k freed, this is precisely what the kernel
is freeing.'
id: totrans-204
prefs: []
type: TYPE_NORMAL
zh: 这些宏在 [include/linux/init.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/init.h)
中定义用于释放内核内存。当你启动内核并看到类似“释放未使用的内核内存236k 释放”的消息时,这正是内核正在释放的内容。
- en: '[PRE31]'
id: totrans-205
prefs: []
type: TYPE_PRE
zh: '[PRE31]'
- en: 4.4 Licensing and Module Documentation
id: totrans-206
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 4.4 许可和模块文档
- en: 'Honestly, who loads or even cares about proprietary modules? If you do then
you might have seen something like this:'
id: totrans-207
prefs: []
type: TYPE_NORMAL
zh: 老实说,谁会加载甚至关心专有模块?如果你这样做,你可能见过类似这样的:
- en: '[PRE32]'
id: totrans-208
prefs: []
type: TYPE_PRE
zh: '[PRE32]'
- en: You can use a few macros to indicate the license for your module. Some examples
are "GPL", "GPL v2", "GPL and additional rights", "Dual BSD/GPL", "Dual MIT/GPL",
"Dual MPL/GPL" and "Proprietary". They are defined within [include/linux/module.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/module.h).
id: totrans-209
prefs: []
type: TYPE_NORMAL
zh: 你可以使用几个宏来指明你模块的许可。一些例子包括 "GPL"、"GPL v2"、"GPL 和额外权利"、"双 BSD/GPL"、"双 MIT/GPL"、"双
MPL/GPL" 和 "专有"。它们在 [include/linux/module.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/module.h)
中定义。
- en: To reference what license you are using, a macro is available called `MODULE_LICENSE`
. This and a few other macros describing the module are illustrated in the example
below.
id: totrans-210
prefs: []
type: TYPE_NORMAL
zh: 要引用你正在使用的许可,有一个名为 `MODULE_LICENSE` 的宏可用。以下示例中展示了该宏以及其他几个描述模块的宏。
- en: '[PRE33]'
id: totrans-211
prefs: []
type: TYPE_PRE
zh: '[PRE33]'
- en: 4.5 Passing Command Line Arguments to a Module
id: totrans-212
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 4.5 将命令行参数传递给模块
- en: Modules can take command line arguments, but not with the argc/argv you might
be used to.
id: totrans-213
prefs: []
type: TYPE_NORMAL
zh: 模块可以接受命令行参数,但不是使用你可能习惯的 argc/argv。
- en: To allow arguments to be passed to your module, declare the variables that will
take the values of the command line arguments as global and then use the `module_param()`
macro (defined in [include/linux/moduleparam.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/moduleparam.h))
to set the mechanism up. At runtime, `insmod` will fill the variables with any
command line arguments that are given, like `insmod mymodule.ko myvariable=5`
. The variable declarations and macros should be placed at the beginning of the
module for clarity. The example code should clear up my admittedly lousy explanation.
id: totrans-214
prefs: []
type: TYPE_NORMAL
zh: 要允许将参数传递给你的模块,声明将接受命令行参数值的变量为全局变量,然后使用 `module_param()` 宏(在 [include/linux/moduleparam.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/moduleparam.h)
中定义)来设置机制。在运行时,`insmod` 将填充任何给定的命令行参数到变量中,例如 `insmod mymodule.ko myvariable=5`。变量声明和宏应放置在模块的开头以提高清晰度。示例代码应该可以澄清我承认的糟糕解释。
- en: 'The `module_param()` macro takes 3 arguments: the name of the variable, its
type and permissions for the corresponding file in sysfs. Integer types can be
signed as usual or unsigned. If you would like to use arrays of integers or strings,
see `module_param_array()` and `module_param_string()` .'
id: totrans-215
prefs: []
type: TYPE_NORMAL
zh: '`module_param()` 宏接受 3 个参数:变量的名称、其类型以及对应于 sysfs 中文件的权限。整数类型可以是通常的带符号整数或无符号整数。如果你想要使用整数或字符串数组,请参阅
`module_param_array()` 和 `module_param_string()` 。'
- en: '[PRE34]'
id: totrans-216
prefs: []
type: TYPE_PRE
zh: '[PRE34]'
- en: 'Arrays are supported too, but things are a bit different now than they were
in the olden days. To keep track of the number of parameters, you need to pass
a pointer to a count variable as the third parameter. At your option, you could
also ignore the count and pass `NULL` instead. We show both possibilities here:'
id: totrans-217
prefs: []
type: TYPE_NORMAL
zh: 数组也受到支持,但现在的情况与过去有些不同。为了跟踪参数的数量,您需要将计数变量的指针作为第三个参数传递。根据您的选择,您也可以忽略计数并传递`NULL`。这里展示了两种可能性:
- en: '[PRE35]'
id: totrans-218
prefs: []
type: TYPE_PRE
zh: '[PRE35]'
- en: A good use for this is to have the module variable’s default values set, like
a port or IO address. If the variables contain the default values, then perform
autodetection (explained elsewhere). Otherwise, keep the current value. This will
be made clear later on.
id: totrans-219
prefs: []
type: TYPE_NORMAL
zh: 这种用法的一个好例子是设置模块变量的默认值比如端口或I/O地址。如果变量包含默认值则执行自动检测在其他地方解释。否则保持当前值。这将在稍后说明。
- en: 'Lastly, there is a macro function, `MODULE_PARM_DESC()` , that is used to document
arguments that the module can take. It takes two parameters: a variable name and
a free form string describing that variable.'
id: totrans-220
prefs: []
type: TYPE_NORMAL
zh: 最后,有一个宏函数`MODULE_PARM_DESC()`,用于记录模块可以接受的参数。它接受两个参数:一个变量名和一个描述该变量的自由格式字符串。
- en: '[PRE36]'
id: totrans-221
prefs: []
type: TYPE_PRE
zh: '[PRE36]'
- en: 'It is recommended to experiment with the following code:'
id: totrans-222
prefs: []
type: TYPE_NORMAL
zh: 建议您尝试以下代码:
- en: '[PRE37]'
id: totrans-223
prefs: []
type: TYPE_PRE
zh: '[PRE37]'
- en: 4.6 Modules Spanning Multiple Files
id: totrans-224
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 4.6 多文件跨越的模块
- en: Sometimes it makes sense to divide a kernel module between several source files.
id: totrans-225
prefs: []
type: TYPE_NORMAL
zh: 有时候,将内核模块分割成几个源文件是有意义的。
- en: Here is an example of such a kernel module.
id: totrans-226
prefs: []
type: TYPE_NORMAL
zh: 这里是一个这样的内核模块的例子。
- en: '[PRE38]'
id: totrans-227
prefs: []
type: TYPE_PRE
zh: '[PRE38]'
- en: 'The next file:'
id: totrans-228
prefs: []
type: TYPE_NORMAL
zh: 下一个文件:
- en: '[PRE39]'
id: totrans-229
prefs: []
type: TYPE_PRE
zh: '[PRE39]'
- en: 'And finally, the makefile:'
id: totrans-230
prefs: []
type: TYPE_NORMAL
zh: 最后是makefile
- en: '[PRE40]'
id: totrans-231
prefs: []
type: TYPE_PRE
zh: '[PRE40]'
- en: This is the complete makefile for all the examples we have seen so far. The
first five lines are nothing special, but for the last example we will need two
lines. First we invent an object name for our combined module, second we tell
`make` what object files are part of that module.
id: totrans-232
prefs: []
type: TYPE_NORMAL
zh: 这是到目前为止我们所看到的所有示例的完整makefile。前五行没有什么特别之处但为了最后的例子我们需要两行。首先我们为我们的组合模块发明一个对象名然后我们告诉`make`哪些目标文件是该模块的一部分。
- en: 4.7 Building modules for a precompiled kernel
id: totrans-233
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 4.7 为预编译内核构建模块
- en: 'Obviously, we strongly suggest that you recompile your kernel, so that you can
enable a number of useful debugging features, such as forced module unloading
( `MODULE_FORCE_UNLOAD` ): when this option is enabled, you can force the kernel
to unload a module even when it believes it is unsafe, via a `sudo rmmod -f module`
command. This option can save you a lot of time and a number of reboots during
the development of a module. If you do not want to recompile your kernel then
you should consider running the examples within a test distribution on a virtual
machine. If you mess anything up then you can easily reboot or restore the virtual
machine (VM).'
id: totrans-234
prefs: []
type: TYPE_NORMAL
zh: 显然,我们强烈建议您重新编译内核,以便您可以使用许多有用的调试功能,例如强制模块卸载(`MODULE_FORCE_UNLOAD`):当此选项启用时,您可以通过`sudo
rmmod -f module`命令强制内核卸载模块即使内核认为这样做不安全。此选项可以在模块开发过程中节省您大量时间和多次重启。如果您不想重新编译内核那么您应该考虑在虚拟机上运行测试分布中的示例。如果您搞砸了您可以轻松地重启或恢复虚拟机VM
- en: There are a number of cases in which you may want to load your module into a
precompiled running kernel, such as the ones shipped with common Linux distributions,
or a kernel you have compiled in the past. In certain circumstances you may
need to compile and insert a module into a running kernel which you are not
allowed to recompile, or on a machine that you prefer not to reboot. If you can’t
think of a case that will force you to use modules for a precompiled kernel you
might want to skip this and treat the rest of this chapter as a big footnote.
id: totrans-235
prefs: []
type: TYPE_NORMAL
zh: 在某些情况下您可能希望将您的模块加载到预编译的运行内核中例如与常见Linux发行版一起提供的内核或者您过去编译的内核。在某些情况下您可能需要编译并将模块插入到您不允许重新编译的运行内核中或者在一个您不想重启的机器上。如果您想不出任何必须使用预编译内核模块的情况您可能想跳过这部分并将本章的其余部分视为一个大的脚注。
- en: 'Now, if you just install a kernel source tree, use it to compile your kernel
module and you try to insert your module into the kernel, in most cases you would
obtain an error as follows:'
id: totrans-236
prefs: []
type: TYPE_NORMAL
zh: 现在,如果您只是安装了一个内核源树,使用它来编译您的内核模块,并尝试将您的模块插入内核,在大多数情况下,您会得到以下错误:
- en: '[PRE41]'
id: totrans-237
prefs: []
type: TYPE_PRE
zh: '[PRE41]'
- en: 'Less cryptic information is logged to the systemd journal:'
id: totrans-238
prefs: []
type: TYPE_NORMAL
zh: 更不神秘的日志信息记录到systemd日志中
- en: '[PRE42]'
id: totrans-239
prefs: []
type: TYPE_PRE
zh: '[PRE42]'
- en: 'In other words, your kernel refuses to accept your module because version strings
(more precisely, version magic, see [include/linux/vermagic.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/vermagic.h))
do not match. Incidentally, version magic strings are stored in the module object
in the form of a static string, starting with `vermagic:` . Version data are inserted
in your module when it is linked against the kernel/module.o file. To inspect
version magics and other strings stored in a given module, issue the command `modinfo module.ko`
:'
id: totrans-240
prefs: []
type: TYPE_NORMAL
zh: 换句话说,您的内核拒绝接受您的模块,因为版本字符串(更准确地说,版本魔法,见[include/linux/vermagic.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/vermagic.h))不匹配。顺便提一下,版本魔法字符串以`vermagic:`开头的形式存储在模块对象中。当模块与内核/module.o文件链接时会插入版本数据。要检查给定模块中存储的版本魔法和其他字符串请发出命令`modinfo module.ko`
- en: '[PRE43]'
id: totrans-241
prefs: []
type: TYPE_PRE
zh: '[PRE43]'
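- en: |-
    The version magic begins with the kernel release string followed by a space, so a quick sanity check is to compare that first field with the output of `uname -r`. The following user-space C sketch shows the comparison. `vermagic_release_matches()` is a hypothetical helper introduced here for illustration, and it is a simplification, since the kernel compares the version magic as a whole and not just this field.

    ```c
    #include <string.h>

    /*
     * Hypothetical helper: return 1 if the first space-separated field of
     * vermagic equals release (as printed by `uname -r`), otherwise 0.
     * Simplification: only the release field is checked here, whereas the
     * kernel checks the complete version magic string.
     */
    int vermagic_release_matches(const char *vermagic, const char *release)
    {
        size_t field_len = strcspn(vermagic, " ");

        return strlen(release) == field_len &&
               strncmp(vermagic, release, field_len) == 0;
    }
    ```

    A module whose first field carries an extra custom suffix fails this check, which mirrors the mismatch described above.
prefs: []
type: TYPE_NORMAL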
- en: To overcome this problem we could resort to the --force-vermagic option, but
this solution is potentially unsafe, and unquestionably unacceptable in production
modules. Consequently, we want to compile our module in an environment that is
identical to the one in which our precompiled kernel was built. How to do this
is the subject of the remainder of this chapter.
id: totrans-242
prefs: []
type: TYPE_NORMAL
zh: 为了克服这个问题,我们可以求助于--force-vermagic选项但这种解决方案可能不安全并且在生产模块中无疑是不可接受的。因此我们希望在构建我们的模块时环境与我们的预编译内核构建时的环境完全相同。如何做到这一点是本章剩余部分的主题。
- en: 'First of all, make sure that a kernel source tree is available, having exactly
the same version as your current kernel. Then, find the configuration file which
was used to compile your precompiled kernel. Usually, this is available in your
current boot directory, under a name like config-5.14.x. You may just want to
copy it to your kernel source tree: ``cp /boot/config-`uname -r` .config`` .'
id: totrans-243
prefs: []
type: TYPE_NORMAL
zh: 首先确保有一个与您的当前内核版本完全相同的内核源树。然后找到用于编译您的预编译内核的配置文件。通常这个文件位于您的当前引导目录下名称类似于config-5.14.x。您可能只想将其复制到您的内核源树中``cp /boot/config-`uname -r` .config``。
- en: 'Let’s focus again on the previous error message: a closer look at the version
magic strings suggests that, even with two configuration files which are exactly
the same, a slight difference in the version magic could be possible, and it is
sufficient to prevent insertion of the module into the kernel. That slight difference,
namely the custom string which appears in the module’s version magic and not in
the kernel’s one, is due to a modification with respect to the original, in the
makefile that some distributions include. Then, examine your Makefile, and make
sure that the specified version information matches exactly the one used for your
current kernel. For example, your makefile could start as follows:'
id: totrans-244
prefs: []
type: TYPE_NORMAL
zh: 让我们再次关注之前的错误信息仔细查看版本魔法的字符串表明即使有两个完全相同的配置文件版本魔法的微小差异也是可能的并且足以防止模块被插入内核。这种微小差异即出现在模块版本魔法中而不出现在内核中的自定义字符串是由于某些发行版包含的makefile相对于原始版本的修改。然后检查您的Makefile并确保指定的版本信息与您当前内核使用的版本信息完全匹配。例如您的makefile可能以以下方式开始
- en: '[PRE44]'
id: totrans-245
prefs: []
type: TYPE_PRE
zh: '[PRE44]'
- en: In this case, you need to restore the value of symbol EXTRAVERSION to -rc2.
We suggest keeping a backup copy of the makefile used to compile your kernel available
in /lib/modules/5.14.0-rc2/build. A simple command as follows should suffice.
id: totrans-246
prefs: []
type: TYPE_NORMAL
zh: 在这种情况下您需要将符号EXTRAVERSION的值恢复到-rc2。我们建议保留一个备份副本的makefile该makefile用于编译您的内核并存储在/lib/modules/5.14.0-rc2/build中。以下简单的命令应该足够
- en: '[PRE45]'
id: totrans-247
prefs: []
type: TYPE_PRE
zh: '[PRE45]'
- en: Here `` linux-`uname -r` `` is the Linux kernel source you are attempting to
build.
id: totrans-248
prefs: []
type: TYPE_NORMAL
zh: 这里 `` linux-`uname -r` `` 是您试图构建的Linux内核源代码。
- en: 'Now, please run `make` to update configuration and version headers and objects:'
id: totrans-249
prefs: []
type: TYPE_NORMAL
zh: 现在,请运行`make`以更新配置和版本头文件和对象:
- en: '[PRE46]'
id: totrans-250
prefs: []
type: TYPE_PRE
zh: '[PRE46]'
- en: 'If you do not desire to actually compile the kernel, you can interrupt the
build process (CTRL-C) just after the SPLIT line, because at that time, the files
you need are ready. Now you can turn back to the directory of your module and
compile it: It will be built exactly according to your current kernel settings,
and it will load into it without any errors.'
id: totrans-251
prefs: []
type: TYPE_NORMAL
zh: 如果您不想实际编译内核可以在SPLIT行之后中断构建过程CTRL-C因为那时您需要的文件已经准备好了。现在您可以回到您的模块目录并编译它它将根据您当前的内核设置精确构建并且可以无错误地加载到内核中。
- en: 5 Preliminaries
id: totrans-252
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 5 初步
- en: 5.1 How modules begin and end
id: totrans-253
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 5.1 模块的开始和结束
- en: A typical program starts with a `main()` function, executes a series of instructions,
and terminates after completing these instructions. Kernel modules, however, follow
a different pattern. A module always begins with either the `init_module` function
or a function designated by the `module_init` call. This function acts as the
module’s entry point, informing the kernel of the module’s functionalities and
preparing the kernel to utilize the module’s functions when necessary. After performing
these tasks, the entry function returns, and the module remains inactive until
the kernel requires its code.
id: totrans-254
prefs: []
type: TYPE_NORMAL
zh: 典型的程序从`main()`函数开始,执行一系列指令,并在完成这些指令后终止。然而,内核模块遵循不同的模式。模块总是以`init_module`函数或由`module_init`调用指定的函数开始。这个函数作为模块的入口点,向内核告知模块的功能,并准备内核在需要时利用模块的函数。完成这些任务后,入口函数返回,模块保持不活跃状态,直到内核需要其代码。
- en: All modules conclude by invoking either `cleanup_module` or a function specified
through the `module_exit` call. This serves as the module’s exit function, reversing
the actions of the entry function by unregistering the previously registered functionalities.
id: totrans-255
prefs: []
type: TYPE_NORMAL
zh: 所有模块都以调用`cleanup_module`或通过`module_exit`调用指定的函数结束。这作为模块的出口函数,通过注销之前注册的功能来反转入口函数的操作。
- en: It is mandatory for every module to have both an entry and an exit function.
While there are multiple methods to define these functions, the terms “entry function”
and “exit function” are generally used. However, they may occasionally be referred
to as `init_module` and `cleanup_module` , which are understood to mean the same.
id: totrans-256
prefs: []
type: TYPE_NORMAL
zh: 每个模块都必须有一个入口函数和一个出口函数。虽然定义这些函数有多种方法,但通常使用“入口函数”和“出口函数”这两个术语。然而,它们有时也可能被称为`init_module`和`cleanup_module`,这些术语都被理解为具有相同的意思。
- en: 5.2 Functions available to modules
id: totrans-257
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 5.2 模块可用的函数
- en: Programmers use functions they do not define all the time. A prime example of
this is `printf()` . You use these library functions which are provided by the
standard C library, libc. The definitions for these functions do not actually
enter your program until the linking stage, which ensures that the code (for `printf()`
for example) is available, and fixes the call instruction to point to that code.
id: totrans-258
prefs: []
type: TYPE_NORMAL
zh: 程序员经常使用他们自己没有定义的函数。`printf()`就是这样一个典型的例子。你使用的是由标准C库libc提供的库函数。这些函数的定义实际上直到链接阶段才进入你的程序这确保了代码例如`printf()`的代码)可用,并固定了调用指令以指向该代码。
- en: Kernel modules are different here, too. In the hello world example, you might
have noticed that we used a function, `pr_info()` but did not include a standard
I/O library. That is because modules are object files whose symbols get resolved
upon running `insmod` or `modprobe` . The definition for the symbols comes from
the kernel itself; the only external functions you can use are the ones provided
by the kernel. If you’re curious about what symbols have been exported by your
kernel, take a look at /proc/kallsyms.
id: totrans-259
prefs: []
type: TYPE_NORMAL
zh: 内核模块在这里也是如此。在“hello world”示例中你可能已经注意到我们使用了一个函数`pr_info()`但没有包含标准I/O库。这是因为模块是对象文件其符号在运行`insmod`或`modprobe`时得到解析。符号的定义来自内核本身;你可以使用的唯一外部函数是内核提供的函数。如果你对内核导出的符号感兴趣,可以查看/proc/kallsyms。
- en: One point to keep in mind is the difference between library functions and system
calls. Library functions are higher level, run completely in user space and provide
a more convenient interface for the programmer to the functions that do the real
work, namely system calls. System calls run in kernel mode on the user’s behalf and
are provided by the kernel itself. The library function `printf()` may look like
a very general printing function, but all it really does is format the data into
strings and write the string data using the low-level system call `write()` ,
which then sends the data to standard output.
id: totrans-260
prefs: []
type: TYPE_NORMAL
zh: 需要注意的一个问题是库函数和系统调用的区别。库函数是高级的,完全在用户空间运行,并为程序员提供了对执行实际工作的函数(即系统调用)的更方便的接口。系统调用在用户代表下以内核模式运行,并由内核本身提供。库函数`printf()`可能看起来是一个非常通用的打印函数,但实际上它只是将数据格式化为字符串,并使用低级系统调用`write()`将字符串数据写入,然后发送到标准输出。
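- en: |-
    To make this layering concrete, here is a minimal user-space sketch that skips `printf()` and hands a string straight to `write()`, the thin libc wrapper around the system call of the same name. `print_via_write()` is a hypothetical name chosen for this illustration and does not appear elsewhere in the guide.

    ```c
    #include <string.h>
    #include <unistd.h>

    /*
     * Hypothetical helper: send a string to standard output with write(2),
     * bypassing stdio formatting and buffering entirely.
     * Returns the number of bytes written, or -1 on error.
     */
    ssize_t print_via_write(const char *s)
    {
        return write(STDOUT_FILENO, s, strlen(s));
    }
    ```

    Unlike `printf()`, this does no formatting and no buffering, so the bytes go to the kernel in a single call.
prefs: []
type: TYPE_NORMAL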
- en: 'Would you like to see what system calls are made by `printf()` ? It is easy!
Compile the following program:'
id: totrans-261
prefs: []
type: TYPE_NORMAL
zh: 你想看看`printf()`做了哪些系统调用吗?这很简单!编译以下程序:
- en: '[PRE47]'
id: totrans-262
prefs: []
type: TYPE_PRE
zh: '[PRE47]'
- en: with `gcc -Wall -o hello hello.c` . Run the executable with `strace ./hello`
. Are you impressed? Every line you see corresponds to a system call. [strace](https://strace.io/)
is a handy program that gives you details about what system calls a program is
making, including which call is made, what its arguments are and what it returns.
It is an invaluable tool for figuring out things like what files a program is
trying to access. Towards the end, you will see a line which looks like `write(1, "hello", 5hello)`
. There it is. The face behind the `printf()` mask. You may not be familiar with
write, since most people use library functions for file I/O (like `fopen` , `fputs`
, `fclose` ). If that is the case, try looking at man 2 write. The 2nd man section
is devoted to system calls (like `kill()` and `read()` ). The 3rd man section
is devoted to library calls, which you would probably be more familiar with (like
`cosh()` and `random()` ).
id: totrans-263
prefs: []
type: TYPE_NORMAL
zh: 使用 `gcc -Wall -o hello hello.c` 编译。使用 `strace ./hello` 运行可执行文件。你感到惊讶了吗?你看到的每一行都对应一个系统调用。[strace](https://strace.io/)
是一个方便的程序,它可以提供关于程序正在执行哪些系统调用的详细信息,包括哪个调用被执行、它的参数是什么以及它返回了什么。它是确定诸如程序试图访问哪些文件之类的信息的一个无价工具。在最后,你会看到一行看起来像
`write(1, "hello", 5hello)` 的内容。就在那里。`printf()` 面具背后的面孔。你可能不熟悉 `write`,因为大多数人使用库函数进行文件
I/O如 `fopen`、`fputs`、`fclose`)。如果是这种情况,试着查看 man 2 write。第2个 man 部分man section是专门关于系统调用的
`kill()` 和 `read()`。第3个 man 部分是关于库调用的,你可能更熟悉(如 `cosh()` 和 `random()`)。
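- en: |-
    To make the relationship concrete, here is a minimal sketch that skips `printf()` and calls `write()` directly. The helper name `put_string` is made up for this illustration, and the `write()` used here is itself the thin C library wrapper around the system call. Running the program under `strace` should show the corresponding `write(1, ...)` line near the end of the trace.

    ```c
    #include <assert.h>
    #include <string.h>
    #include <unistd.h>

    /* Hypothetical helper: send a string to a file descriptor with write(),
     * bypassing the formatting and buffering that stdio would add. */
    static ssize_t put_string(int fd, const char *s)
    {
        return write(fd, s, strlen(s));
    }

    int main(void)
    {
        /* File descriptor 1 (STDOUT_FILENO) is standard output.
         * "hello\n" is six bytes long, so a complete write returns 6. */
        assert(put_string(STDOUT_FILENO, "hello\n") == 6);
        return 0;
    }
    ```

    The return value is the number of bytes actually written, which is why the `strace` line for the original program ends in `= 5`.
  prefs: []
  type: TYPE_NORMAL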
- en: You can even write modules to replace the kernel's system calls, which we will
do shortly. Crackers often make use of this sort of thing for backdoors or trojans,
but you can write your own modules to do more benign things, like have the kernel
log a message whenever someone attempts to delete a file on your system.
id: totrans-264
prefs: []
type: TYPE_NORMAL
zh: 你甚至可以编写模块来替换内核的系统调用,我们很快就会这样做。黑客经常利用这类东西来创建后门或特洛伊木马,但你也可以编写自己的模块来做更无害的事情,比如当有人试图删除你系统上的文件时,让内核记录一条消息。
- en: 5.3 User Space vs Kernel Space
id: totrans-265
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 5.3 用户空间与内核空间
- en: The kernel primarily manages access to resources, be it a video card, hard drive,
or memory. Programs frequently vie for the same resources. For instance, as a
document is saved, updatedb might commence updating the locate database. Sessions
in editors like vim and processes like updatedb can simultaneously utilize the
  hard drive. The kernel's role is to maintain order, ensuring that users do not
access resources indiscriminately.
id: totrans-266
prefs: []
type: TYPE_NORMAL
zh: 内核主要管理对资源的访问无论是显卡、硬盘还是内存。程序经常争夺相同的资源。例如当文档被保存时updatedb 可能开始更新 locate 数据库。在
vim 等编辑器中的会话和 updatedb 等进程可以同时使用硬盘。内核的作用是维持秩序,确保用户不会无差别地访问资源。
- en: 'To manage this, CPUs operate in different modes, each offering varying levels
  of system control. The Intel 80386 architecture, for example, featured four such
  modes, known as rings. Unix, however, uses only two of these rings: the most privileged
  ring (ring 0, also known as “supervisor mode”, where all actions are permitted)
  and the least privileged ring (ring 3), referred to as “user mode”.'
id: totrans-267
prefs: []
type: TYPE_NORMAL
zh: 为了管理这一点CPU 在不同的模式下运行每个模式提供不同级别的系统控制。例如Intel 80386 架构具有四种这样的模式被称为环。然而Unix
只利用了这些环中的两个最高环ring 0也称为“管理程序模式”在这里所有操作都是允许的和最低环被称为“用户模式”。
- en: Recall the discussion about library functions vs system calls. Typically, you
use a library function in user mode. The library function calls one or more system
  calls, and these system calls execute on the library function's behalf, but do
so in supervisor mode since they are part of the kernel itself. Once the system
call completes its task, it returns and execution gets transferred back to user
mode.
id: totrans-268
prefs: []
type: TYPE_NORMAL
zh: 回想一下关于库函数与系统调用的讨论。通常你在用户模式下使用库函数。库函数调用一个或多个系统调用这些系统调用代表库函数执行但它们在内核本身的部分以管理程序supervisor
mode执行。一旦系统调用完成其任务它就会返回执行控制权就会转回到用户模式。
- en: 5.4 Name Space
id: totrans-269
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 5.4 命名空间
- en: When you write a small C program, you use variables which are convenient and
  make sense to the reader. If, on the other hand, you are writing routines which
  will be part of a bigger program, any global variables you have are part of a
  community of other people's global variables; some of the variable names can clash.
  When a program has lots of global variables which aren't meaningful enough to
  be distinguished, you get namespace pollution. In large projects, effort must
  be made to remember reserved names, and to develop a scheme for giving variables
  and symbols unique names.
id: totrans-270
prefs: []
type: TYPE_NORMAL
zh: 当你编写一个小型C程序时你会使用方便且对读者有意义的变量。另一方面如果你正在编写将成为更大问题一部分的例程你拥有的任何全局变量都是其他人的全局变量社区的一部分一些变量名可能会冲突。当一个程序有很多没有足够意义来区分的全局变量时你会得到命名空间污染。在大型项目中必须努力记住保留的名称并找到开发命名唯一变量名和符号方案的方法。
- en: When writing kernel code, even the smallest module will be linked against the
entire kernel, so this is definitely an issue. The best way to deal with this
is to declare all your variables as static and to use a well-defined prefix for
your symbols. By convention, all kernel prefixes are lowercase. If you do not
want to declare everything as static, another option is to declare a symbol table
and register it with the kernel. We will get to this later.
id: totrans-271
prefs: []
type: TYPE_NORMAL
zh: 当编写内核代码时,即使是体积最小的模块也会与整个内核链接,所以这确实是一个问题。处理这个问题的最好方法是声明所有变量为静态的,并为你的符号使用一个定义良好的前缀。按照惯例,所有内核前缀都是小写的。如果你不想将所有内容都声明为静态的,另一个选项是声明一个符号表并将其注册到内核中。我们稍后会讨论这个问题。
- en: The file /proc/kallsyms holds all the symbols that the kernel knows about and
  which are therefore accessible to your modules since they share the kernel's codespace.
id: totrans-272
prefs: []
type: TYPE_NORMAL
zh: 文件/proc/kallsyms包含了内核所知道的所有符号因此这些符号可以通过你的模块访问因为它们共享内核的代码空间。
- en: 5.5 Code space
id: totrans-273
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 5.5 代码空间
- en: Memory management is a very complicated subject and the majority of O'Reilly's
[Understanding The Linux Kernel](https://www.oreilly.com/library/view/understanding-the-linux/0596005652/)
exclusively covers memory management! We are not setting out to be experts on
memory management, but we do need to know a couple of facts to even begin worrying
about writing real modules.
id: totrans-274
prefs: []
type: TYPE_NORMAL
zh: 内存管理是一个非常复杂的话题O'Reilly的[《理解Linux内核》](https://www.oreilly.com/library/view/understanding-the-linux/0596005652/)一书专门涵盖了内存管理!我们并不是要成为内存管理方面的专家,但我们确实需要了解一些事实,才能开始担心编写真正的模块。
- en: If you have not thought about what a segfault really means, you may be surprised
to hear that pointers do not actually point to memory locations. Not real ones,
anyway. When a process is created, the kernel sets aside a portion of real physical
memory and hands it to the process to use for its executing code, variables, stack,
heap and other things which a computer scientist would know about. This memory
begins with 0x00000000 and extends up to whatever it needs to be. Since the memory
space for any two processes does not overlap, every process that can access a
memory address, say 0xbffff978, would be accessing a different location in real
physical memory! The processes would be accessing an index named 0xbffff978 which
points to some kind of offset into the region of memory set aside for that particular
process. For the most part, a process like our Hello, World program cannot access
the space of another process, although there are ways which we will talk about
later.
id: totrans-275
prefs: []
type: TYPE_NORMAL
zh: 如果你没有想过段错误segfault真正意味着什么你可能会惊讶地听到指针实际上并不指向内存位置。至少不是真正的内存位置。当创建一个进程时内核会为其实际物理内存分配一部分并将其交给进程用于执行代码、变量、堆栈、堆和其他计算机科学家会了解的东西。这段内存从0x00000000开始扩展到所需的任何位置。由于任何两个进程的内存空间都不会重叠因此任何可以访问内存地址例如0xbffff978的进程都会访问实际物理内存中的不同位置进程会访问一个名为0xbffff978的索引该索引指向为该特定进程保留的内存区域中的某种偏移量。在大多数情况下像我们的Hello,
World程序这样的进程无法访问另一个进程的空间尽管我们稍后会讨论一些方法。
- en: The kernel has its own space of memory as well. Since a module is code which
can be dynamically inserted and removed in the kernel (as opposed to a semi-autonomous
  object), it shares the kernel's codespace rather than having its own. Therefore,
  if your module segfaults, the kernel segfaults. And if you start writing over
  data because of an off-by-one error, then you're trampling on kernel data (or
code). This is even worse than it sounds, so try your best to be careful.
id: totrans-276
prefs: []
type: TYPE_NORMAL
zh: 内核也有自己的内存空间。由于模块是可以在内核中动态插入和删除的代码(与半自主对象相反),它共享内核的代码空间,而不是拥有自己的。因此,如果你的模块发生段错误,内核也会发生段错误。如果你因为偏移量错误而开始覆盖数据,那么你就是在践踏内核数据(或代码)。这比听起来更糟糕,所以请务必小心。
- en: It should be noted that the aforementioned discussion applies to any operating
system utilizing a monolithic kernel. This concept differs slightly from “building
all your modules into the kernel”, although the underlying principle is similar.
In contrast, there are microkernels, where modules are allocated their own code
space. Two notable examples of microkernels include the [GNU Hurd](https://www.gnu.org/software/hurd/)
  and the [Zircon kernel](https://fuchsia.dev/fuchsia-src/concepts/kernel) of Google's
Fuchsia.
id: totrans-277
prefs: []
type: TYPE_NORMAL
zh: 应当注意,上述讨论适用于任何使用单一内核的操作系统。这个概念与“将所有模块构建到内核中”略有不同,尽管其基本原理相似。相比之下,还有微内核,其中模块分配了自己的代码空间。两个著名的微内核例子包括[GNU
Hurd](https://www.gnu.org/software/hurd/)和谷歌Fuchsia的[Zircon内核](https://fuchsia.dev/fuchsia-src/concepts/kernel)。
- en: 5.6 Device Drivers
id: totrans-278
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 5.6 设备驱动程序
- en: One class of module is the device driver, which provides functionality for hardware
like a serial port. On Unix, each piece of hardware is represented by a file located
in /dev named a device file which provides the means to communicate with the hardware.
The device driver provides the communication on behalf of a user program. So the
es1370.ko sound card device driver might connect the /dev/sound device file to
the Ensoniq ES1370 sound card. A userspace program like mp3blaster can use /dev/sound
without ever knowing what kind of sound card is installed.
id: totrans-279
prefs: []
type: TYPE_NORMAL
zh: 模块的一种类型是设备驱动程序它为串行端口等硬件提供功能。在Unix系统中每一块硬件都由位于/dev目录下的一个文件表示该文件被称为设备文件它提供了与硬件通信的手段。设备驱动程序代表用户程序进行通信。因此es1370.ko声卡设备驱动程序可能会将/dev/sound设备文件连接到Ensoniq
ES1370声卡。像mp3blaster这样的用户空间程序可以使用/dev/sound而无需知道安装了什么类型的声卡。
- en: 'Let''s look at some device files. Here are device files which represent the
first three partitions on the primary SCSI storage devices:'
id: totrans-280
prefs: []
type: TYPE_NORMAL
zh: 让我们来看看一些设备文件。以下是一些代表主SCSI存储设备上前三个分区的设备文件
- en: '[PRE48]'
id: totrans-281
prefs: []
type: TYPE_PRE
zh: '[PRE48]'
- en: Notice the column of numbers separated by a comma. The first number is called
  the device's major number. The second number is the minor number. The major number
  tells you which driver is used to access the hardware. Each driver is assigned
  a unique major number; all device files with the same major number are controlled
  by the same driver. All the above major numbers are 8, because they're all controlled
by the same driver.
id: totrans-282
prefs: []
type: TYPE_NORMAL
zh: 注意到由逗号分隔的数字列。第一个数字被称为设备的major号。第二个数字是minor号。major号告诉你使用哪个驱动程序来访问硬件。每个驱动程序都被分配了一个唯一的major号所有具有相同major号的设备文件都由同一个驱动程序控制。所有上述major号都是8因为它们都由同一个驱动程序控制。
- en: The minor number is used by the driver to distinguish between the various hardware
it controls. Returning to the example above, although all three devices are handled
by the same driver they have unique minor numbers because the driver sees them
as being different pieces of hardware.
id: totrans-283
prefs: []
type: TYPE_NORMAL
zh: 小号数由驱动程序用于区分它所控制的多种硬件。回到上面的例子,尽管这三个设备都由同一个驱动程序处理,但它们具有独特的小号数,因为驱动程序将它们视为不同的硬件。
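- en: |-
    From user space, the major and minor numbers are packed into a single `dev_t` value. The following is a minimal sketch, assuming a Linux system with glibc, where `<sys/sysmacros.h>` provides `makedev()`, `major()` and `minor()`. The major number 8 matches the discussion above; the minor number 1 is just an illustrative choice.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <sys/sysmacros.h> /* makedev(), major(), minor() on glibc */
    #include <sys/types.h>     /* dev_t */

    int main(void)
    {
        /* Combine a major and a minor number into one device number. */
        dev_t dev = makedev(8, 1);

        /* Splitting the device number gives the two parts back. */
        assert(major(dev) == 8);
        assert(minor(dev) == 1);

        printf("major=%u minor=%u\n", major(dev), minor(dev));
        return 0;
    }
    ```

    For a real device file, the same macros can be applied to the `st_rdev` field filled in by `stat()`.
  prefs: []
  type: TYPE_NORMAL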
- en: 'Devices are divided into two types: character devices and block devices. The
difference is that block devices have a buffer for requests, so they can choose
the best order in which to respond to the requests. This is important in the case
of storage devices, where it is faster to read or write sectors which are close
to each other, rather than those which are further apart. Another difference is
that block devices can only accept input and return output in blocks (whose size
can vary according to the device), whereas character devices are allowed to use
as many or as few bytes as they like. Most devices in the world are character,
  because they don''t need this type of buffering, and they don''t operate with a
fixed block size. You can tell whether a device file is for a block device or
a character device by looking at the first character in the output of `ls -l`
. If it is b then it is a block device, and if it is c then it is a character
device. The devices you see above are block devices. Here are some character devices
(the serial ports):'
id: totrans-284
prefs: []
type: TYPE_NORMAL
zh: 设备分为两种类型:字符设备和块设备。区别在于块设备有一个请求缓冲区,因此它们可以选择最佳顺序来响应请求。这在存储设备的情况下很重要,因为读取或写入相邻扇区比读取或写入较远扇区要快。另一个区别是,块设备只能以块(其大小可以按设备变化)的形式接受输入并返回输出,而字符设备则允许使用任意多或少的字节。世界上大多数设备都是字符设备,因为它们不需要这种类型的缓冲,并且它们不使用固定块大小操作。你可以通过查看`ls -l`输出的第一个字符来判断设备文件是块设备还是字符设备。如果是b则它是块设备如果是c则它是字符设备。你上面看到的设备是块设备。以下是一些字符设备串行端口
- en: '[PRE49]'
id: totrans-285
prefs: []
type: TYPE_PRE
zh: '[PRE49]'
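- en: |-
    The `b` or `c` that `ls -l` prints comes from the file mode, which a program can inspect with `stat()`. Here is a minimal user-space sketch; the helper name `device_kind` is hypothetical, and it assumes `/dev/null` exists and is a character device, as is the case on ordinary Linux systems.

    ```c
    #include <assert.h>
    #include <sys/stat.h>

    /* Hypothetical helper: return 'c' for a character device, 'b' for a
     * block device, '-' for any other file type, '?' if stat() fails. */
    static char device_kind(const char *path)
    {
        struct stat st;

        if (stat(path, &st) != 0)
            return '?';
        if (S_ISCHR(st.st_mode))
            return 'c';
        if (S_ISBLK(st.st_mode))
            return 'b';
        return '-';
    }

    int main(void)
    {
        assert(device_kind("/dev/null") == 'c'); /* character device */
        assert(device_kind("/") == '-');         /* a directory, not a device */
        return 0;
    }
    ```
  prefs: []
  type: TYPE_NORMAL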
- en: If you want to see which major numbers have been assigned, you can look at [Documentation/admin-guide/devices.txt](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/admin-guide/devices.txt).
id: totrans-286
prefs: []
type: TYPE_NORMAL
zh: 如果你想查看已分配的主编号,你可以查看[Documentation/admin-guide/devices.txt](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/admin-guide/devices.txt)。
- en: When the system was installed, all of those device files were created by the
`mknod` command. To create a new char device named coffee with major/minor number
12 and 2, simply do `mknod /dev/coffee c 12 2` . You do not have to put your device
files into /dev, but it is done by convention. Linus put his device files in /dev,
and so should you. However, when creating a device file for testing purposes,
it is probably OK to place it in your working directory where you compile the
  kernel module. Just be sure to put it in the right place when you're done writing
the device driver.
id: totrans-287
prefs: []
type: TYPE_NORMAL
zh: 当系统安装时,所有这些设备文件都是由`mknod`命令创建的。要创建一个名为coffee的字符设备其主/次编号为12和2只需执行`mknod /dev/coffee c 12 2`。你不必将设备文件放入/dev但按照惯例是这样做的。林纳斯把他的设备文件放在/dev你也应该这样做。然而当为测试目的创建设备文件时将其放在编译内核模块的工作目录中可能没问题。只是确保在完成设备驱动程序编写后将其放在正确的位置。
- en: A few final points, although implicit in the previous discussion, are worth
stating explicitly for clarity. When a device file is accessed, the kernel utilizes
  the file's major number to identify the appropriate driver for handling the access.
This indicates that the kernel does not necessarily rely on or need to be aware
of the minor number. It is the driver that concerns itself with the minor number,
using it to differentiate between various pieces of hardware.
id: totrans-288
prefs: []
type: TYPE_NORMAL
zh: 虽然在之前的讨论中是隐含的但以下几点值得明确指出以增强清晰度。当一个设备文件被访问时内核利用文件的major编号来识别处理访问的适当驱动程序。这表明内核不一定依赖于或需要知道次编号。是驱动程序关心次编号并使用它来区分不同的硬件部件。
- en: 'It is important to note that when referring to “hardware”, the term is used
in a slightly more abstract sense than just a physical PCI card that can be held
in hand. Consider the following two device files:'
id: totrans-289
prefs: []
type: TYPE_NORMAL
zh: 需要注意的是当提到“硬件”时这个术语的使用比仅仅指可以手持的物理PCI卡要抽象一些。考虑以下两个设备文件
- en: '[PRE50]'
id: totrans-290
prefs: []
type: TYPE_PRE
zh: '[PRE50]'
- en: By now you can look at these two device files and know instantly that they are
  block devices and are handled by the same driver (block major 8). Sometimes two device
files with the same major but different minor number can actually represent the
same piece of physical hardware. So just be aware that the word “hardware” in
our discussion can mean something very abstract.
id: totrans-291
prefs: []
type: TYPE_NORMAL
zh: 到现在为止你可以查看这两个设备文件并立即知道它们是块设备并由相同的驱动程序处理块主编号8。有时具有相同major编号但不同minor编号的两个设备文件实际上可以代表同一块物理硬件。所以请注意我们讨论中的“硬件”一词可以指一个非常抽象的概念。
- en: 6 Character Device drivers
id: totrans-292
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 6 字符设备驱动程序
- en: 6.1 The file_operations Structure
id: totrans-293
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 6.1 文件操作结构体
- en: The `file_operations` structure is defined in [include/linux/fs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/fs.h),
and holds pointers to functions defined by the driver that perform various operations
on the device. Each field of the structure corresponds to the address of some
function defined by the driver to handle a requested operation.
id: totrans-294
prefs: []
type: TYPE_NORMAL
zh: '`file_operations` 结构体定义在 [include/linux/fs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/fs.h),并持有指向驱动程序定义的执行各种设备操作的函数的指针。结构体的每个字段对应于驱动程序定义的用于处理请求操作的函数的地址。'
- en: 'For example, every character driver needs to define a function that reads from
  the device. The `file_operations` structure holds the address of the module''s
function that performs that operation. Here is what the definition looks like
for kernel 5.4 and later versions:'
id: totrans-295
prefs: []
type: TYPE_NORMAL
zh: 例如,每个字符驱动程序都需要定义一个从设备读取数据的函数。`file_operations` 结构体持有执行该操作的模块函数的地址。以下是内核 5.4
及以后版本的定义示例:
- en: '[PRE51]'
id: totrans-296
prefs: []
type: TYPE_PRE
zh: '[PRE51]'
- en: Some operations are not implemented by a driver. For example, a driver that
handles a video card will not need to read from a directory structure. The corresponding
entries in the `file_operations` structure should be set to `NULL` . [¹](#fn1x0)
id: totrans-297
prefs: []
type: TYPE_NORMAL
zh: 一些操作不是由驱动程序实现的。例如,处理显卡的驱动程序不需要从目录结构中读取。`file_operations` 结构中的相应条目应设置为 `NULL`。[¹](#fn1x0)
- en: 'There is a gcc extension that makes assigning to this structure more convenient.
  You will see it in modern drivers, and it may catch you by surprise. This is what
the new way of assigning to the structure looks like:'
id: totrans-298
prefs: []
type: TYPE_NORMAL
zh: 存在一个 gcc 扩展,使得向该结构体赋值更加方便。你会在现代驱动程序中看到它,可能会让你感到惊讶。这是向结构体赋值的新方法的样子:
- en: '[PRE52]'
id: totrans-299
prefs: []
type: TYPE_PRE
zh: '[PRE52]'
- en: 'However, there is also a C99 way of assigning to elements of a structure, [designated
initializers](https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html), and this
is definitely preferred over using the GNU extension. You should use this syntax
in case someone wants to port your driver. It will help with compatibility:'
id: totrans-300
prefs: []
type: TYPE_NORMAL
zh: 然而C99 标准中也有一种给结构体元素赋值的方法,称为[指定初始化器](https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html),这比使用
GNU 扩展更受欢迎。如果你希望有人移植你的驱动程序,你应该使用这种语法。这将有助于兼容性:
- en: '[PRE53]'
id: totrans-301
prefs: []
type: TYPE_PRE
zh: '[PRE53]'
- en: The meaning is clear, and you should be aware that any member of the structure
which you do not explicitly assign will be initialized to `NULL` by gcc.
id: totrans-302
prefs: []
type: TYPE_NORMAL
zh: 意义很明确,你应该知道,结构体中任何未明确赋值的成员将由 gcc 初始化为 `NULL`。
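- en: |-
    The zero-filling of unnamed members can be checked in plain user-space C. The sketch below uses a hypothetical three-member `struct demo_ops` as a stand-in for `struct file_operations`; none of these names come from the kernel. As far as the C standard is concerned, members left out of an initializer are initialized as if they had static storage, so a function pointer becomes a null pointer.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical stand-in for struct file_operations. */
    struct demo_ops {
        int (*open)(void);
        int (*read)(void);
        int (*write)(void);
    };

    static int demo_open(void) { return 0; }
    static int demo_read(void) { return 42; }

    /* Only .read and .open are named, in an order that differs from the
     * declaration; .write is left out and ends up as a null pointer. */
    static const struct demo_ops ops = {
        .read = demo_read,
        .open = demo_open,
    };

    int main(void)
    {
        assert(ops.open == demo_open);
        assert(ops.read() == 42);
        assert(ops.write == NULL);
        return 0;
    }
    ```

    Because members are matched by name, the initializer keeps working even if members are added to or reordered in the structure definition.
  prefs: []
  type: TYPE_NORMAL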
- en: An instance of `struct file_operations` containing pointers to functions that
are used to implement `read` , `write` , `open` , … system calls is commonly named
`fops` .
id: totrans-303
prefs: []
type: TYPE_NORMAL
zh: 包含指向用于实现 `read`、`write`、`open` 等系统调用函数的指针的 `struct file_operations` 实例通常命名为
`fops`。
- en: Since Linux v3.14, the read, write and seek operations are guaranteed to be
  thread-safe through an `f_pos`-specific lock, which makes updates to the file
  position mutually exclusive. So, we can safely implement those operations without
  unnecessary locking.
id: totrans-304
prefs: []
type: TYPE_NORMAL
zh: 自 Linux v3.14 版本以来,通过使用 `f_pos` 特定锁来保证读取、写入和查找操作是线程安全的,这使得文件位置更新成为互斥操作。因此,我们可以安全地实现这些操作,而无需不必要的锁定。
- en: Additionally, since Linux v5.6, the `proc_ops` structure was introduced to replace
  the use of the `file_operations` structure when registering proc handlers. See
  [Section 7.1](#the-procops-structure) for more information.
id: totrans-305
prefs: []
type: TYPE_NORMAL
zh: 此外,自 Linux v5.6 版本以来,引入了 `proc_ops` 结构来替代注册 proc 处理器时使用 `file_operations` 结构。更多详细信息请参阅[第
7.1 节](#the-procops-structure)。
- en: 6.2 The file structure
id: totrans-306
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 6.2 文件结构
- en: Each device is represented in the kernel by a file structure, which is defined
in [include/linux/fs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/fs.h).
Be aware that a file is a kernel level structure and never appears in a user space
program. It is not the same thing as a `FILE` , which is defined by glibc and
would never appear in a kernel space function. Also, its name is a bit misleading;
it represents an abstract open file, not a file on a disk, which is represented
by a structure named `inode` .
id: totrans-307
prefs: []
type: TYPE_NORMAL
zh: 每个设备在内核中通过文件结构体表示,该结构体定义在 [include/linux/fs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/fs.h)。请注意,文件是一个内核级结构体,永远不会出现在用户空间程序中。它不同于由
glibc 定义的 `FILE`,后者永远不会出现在内核空间函数中。此外,它的名称有点误导;它代表一个抽象的打开“文件”,而不是磁盘上的文件,磁盘上的文件由名为
`inode` 的结构体表示。
- en: An instance of struct file is commonly named `filp` . You'll also see it referred
to as a struct file object. Resist the temptation.
id: totrans-308
prefs: []
type: TYPE_NORMAL
zh: struct file 的实例通常命名为 `filp`。你也会看到它被称作 struct file 对象。请抵制这种诱惑。
- en: Go ahead and look at the definition of file. Most of the entries you see, like
struct dentry, are not used by device drivers, and you can ignore them. This is
because drivers do not fill file directly; they only use structures contained
in file which are created elsewhere.
id: totrans-309
prefs: []
type: TYPE_NORMAL
zh: 继续查看文件的定义。您看到的大部分条目如struct dentry都不被设备驱动程序使用您可以忽略它们。这是因为驱动程序不会直接填充文件它们只使用文件中包含的结构这些结构是在其他地方创建的。
- en: 6.3 Registering A Device
id: totrans-310
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 6.3 注册设备
- en: As discussed earlier, char devices are accessed through device files, usually
located in /dev. This is by convention. When writing a driver, it is OK to put
the device file in your current directory. Just make sure you place it in /dev
for a production driver. The major number tells you which driver handles which
device file. The minor number is used only by the driver itself to differentiate
which device it is operating on, just in case the driver handles more than one
device.
id: totrans-311
prefs: []
type: TYPE_NORMAL
zh: 如前所述,字符设备通过设备文件访问,通常位于/dev目录下。这是惯例。在编写驱动程序时将设备文件放在当前目录中是可以的。只需确保在生产驱动程序中将它放在/dev目录下。主设备号告诉您哪个驱动程序处理哪个设备文件。次设备号仅由驱动程序本身使用以区分它正在操作哪个设备以防驱动程序处理多个设备。
- en: Adding a driver to your system means registering it with the kernel. This is
  synonymous with assigning it a major number during the module's initialization.
You do this by using the `register_chrdev` function, defined by [include/linux/fs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/fs.h).
id: totrans-312
prefs: []
type: TYPE_NORMAL
zh: 将驱动程序添加到您的系统意味着将其注册到内核中。这与在模块初始化期间为其分配一个主设备号同义。您可以通过使用由[include/linux/fs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/fs.h)定义的`register_chrdev`函数来完成此操作。
- en: '[PRE54]'
id: totrans-313
prefs: []
type: TYPE_PRE
zh: '[PRE54]'
- en: Where `unsigned int major` is the major number you want to request, `const char *name`
is the name of the device as it will appear in /proc/devices and `struct file_operations *fops`
is a pointer to the `file_operations` table for your driver. A negative return
  value means the registration failed. Note that we didn't pass the minor number
  to `register_chrdev` . That is because the kernel doesn't care about the minor
number; only our driver uses it.
id: totrans-314
prefs: []
type: TYPE_NORMAL
zh: 在`unsigned int major`是您想要请求的主设备号,`const char *name`是设备在/proc/devices中显示的名称`struct
file_operations *fops`是您驱动程序的`file_operations`表的指针。负返回值表示注册失败。请注意,我们没有将次设备号传递给`register_chrdev`。这是因为内核不关心次设备号;只有我们的驱动程序使用它。
- en: Now the question is, how do you get a major number without hijacking one that's
already in use? The easiest way would be to look through [Documentation/admin-guide/devices.txt](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/admin-guide/devices.txt)
and pick an unused one. That is a bad way of doing things because you will never
be sure if the number you picked will be assigned later. The answer is that you
can ask the kernel to assign you a dynamic major number.
id: totrans-315
prefs: []
type: TYPE_NORMAL
zh: 现在的问题是,您如何在不抢占已使用的设备号的情况下获得一个主设备号?最简单的方法是查看[Documentation/admin-guide/devices.txt](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/admin-guide/devices.txt)并选择一个未使用的设备号。这是一种不好的做法,因为您永远无法确定您选择的号码将来是否会被分配。答案是您可以请求内核为您分配一个动态的主设备号。
- en: If you pass a major number of 0 to `register_chrdev` , the return value will
  be the dynamically allocated major number. The downside is that you cannot make
  a device file in advance, since you do not know what the major number will be.
  There are a few ways to deal with this. First, the driver itself can print the
newly assigned number and we can make the device file by hand. Second, the newly
registered device will have an entry in /proc/devices, and we can either make
the device file by hand or write a shell script to read the file in and make the
device file. The third method is that we can have our driver make the device file
using the `device_create` function after a successful registration and `device_destroy`
during the call to `cleanup_module` .
id: totrans-316
prefs: []
type: TYPE_NORMAL
zh: 如果您将`register_chrdev`的设备号传递为0则返回值将是动态分配的主设备号。缺点是您无法提前创建设备文件因为您不知道主设备号是什么。有几种方法可以做到这一点。首先驱动程序本身可以打印新分配的号码我们可以手动创建设备文件。其次新注册的设备将在/proc/devices中有一个条目我们可以手动创建设备文件或编写shell脚本来读取该文件并创建设备文件。第三种方法是我们可以在成功注册后使用`device_create`函数创建设备文件,在调用`cleanup_module`期间使用`device_destroy`。
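- en: |-
    The second approach, reading /proc/devices to recover the dynamically assigned major number, can be sketched in user-space C. The helper name `find_char_major` is hypothetical, and the sample text (including the major number 235 for `chardev`) is made up to mirror the layout of /proc/devices so that the example does not depend on any module being loaded.

    ```c
    #define _POSIX_C_SOURCE 200809L /* for fmemopen() */
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: scan the "Character devices:" section of a
     * /proc/devices-style stream and return the major number registered
     * under `name`, or -1 if it is not listed there. */
    static int find_char_major(FILE *fp, const char *name)
    {
        char line[128], dev[64];
        int major;

        while (fgets(line, sizeof(line), fp)) {
            if (strncmp(line, "Block devices:", 14) == 0)
                break; /* the character section has ended */
            if (sscanf(line, "%d %63s", &major, dev) == 2 &&
                strcmp(dev, name) == 0)
                return major;
        }
        return -1;
    }

    int main(void)
    {
        /* Made-up sample in the same layout as /proc/devices. */
        char sample[] = "Character devices:\n"
                        "  1 mem\n"
                        "235 chardev\n"
                        "\n"
                        "Block devices:\n"
                        "  8 sd\n";
        FILE *fp = fmemopen(sample, strlen(sample), "r");

        assert(fp != NULL);
        assert(find_char_major(fp, "chardev") == 235);
        fclose(fp);
        return 0;
    }
    ```

    On a live system the stream would come from `fopen("/proc/devices", "r")`, and the result could be handed to `mknod` together with a minor number.
  prefs: []
  type: TYPE_NORMAL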
- en: However, `register_chrdev()` occupies a whole range of minor numbers associated
  with the given major. The recommended way to reduce this waste when registering
  a char device is to use the cdev interface.
id: totrans-317
prefs: []
type: TYPE_NORMAL
zh: 然而,`register_chrdev()`会占用与给定主设备号相关联的一组次设备号。为了减少字符设备注册的浪费建议使用cdev接口。
- en: The newer interface completes the char device registration in two distinct steps.
First, we should register a range of device numbers, which can be completed with
`register_chrdev_region` or `alloc_chrdev_region` .
id: totrans-318
prefs: []
type: TYPE_NORMAL
zh: 新的界面通过两个不同的步骤完成字符设备注册。首先,我们应该注册一系列设备号,这可以通过`register_chrdev_region`或`alloc_chrdev_region`来完成。
- en: '[PRE55]'
id: totrans-319
prefs: []
type: TYPE_PRE
zh: '[PRE55]'
- en: The choice between the two functions depends on whether you know the major
  number for your device. Use `register_chrdev_region` if you know the device
  major number and `alloc_chrdev_region` if you would like the kernel to allocate
  a major number dynamically.
id: totrans-320
prefs: []
type: TYPE_NORMAL
zh: 两个不同函数之间的选择取决于你是否知道你的设备的主设备号。如果你知道设备的主设备号,则使用`register_chrdev_region`;如果你希望分配一个动态分配的主设备号,则使用`alloc_chrdev_region`。
- en: Second, we should initialize the data structure `struct cdev` for our char device
and associate it with the device numbers. To initialize the `struct cdev` , we
  can use a sequence similar to the following code.
id: totrans-321
prefs: []
type: TYPE_NORMAL
zh: 其次,我们应该初始化我们的字符设备的`struct cdev`数据结构并将其与设备号关联起来。为了初始化`struct cdev`,我们可以通过以下代码的类似序列来实现。
- en: '[PRE56]'
id: totrans-322
prefs: []
type: TYPE_PRE
zh: '[PRE56]'
- en: However, the common usage pattern will embed the `struct cdev` within a device-specific
  structure of your own. In this case, we'll need `cdev_init` for the initialization.
id: totrans-323
prefs: []
type: TYPE_NORMAL
zh: 然而,常见的用法模式是将`struct cdev`嵌入到你自己特定的设备结构中。在这种情况下,我们需要`cdev_init`来进行初始化。
- en: '[PRE57]'
id: totrans-324
prefs: []
type: TYPE_PRE
zh: '[PRE57]'
- en: Once we finish the initialization, we can add the char device to the system
  with `cdev_add` .
id: totrans-325
prefs: []
type: TYPE_NORMAL
zh: 一旦完成初始化,我们可以通过使用`cdev_add`将字符设备添加到系统中。
- en: '[PRE58]'
id: totrans-326
prefs: []
type: TYPE_PRE
zh: '[PRE58]'
- en: For an example that uses this interface, see ioctl.c, described in [Section 9](#talking-to-device-files).
id: totrans-327
prefs: []
type: TYPE_NORMAL
zh: 要找到一个使用该接口的示例,你可以查看[第9节](#talking-to-device-files)中描述的ioctl.c。
- en: 6.4 Unregistering A Device
id: totrans-328
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 6.4 注销设备
- en: We cannot allow the kernel module to be `rmmod`ed whenever root feels like
  it. If the device file is opened by a process and then we remove the kernel module,
  using the file would cause a call to the memory location where the appropriate
  function (read/write) used to be. If we are lucky, no other code was loaded there,
  and we'll get an ugly error message. If we are unlucky, another kernel module
  was loaded into the same location, which means a jump into the middle of another
  function within the kernel. The results of this would be impossible to predict,
  but they cannot be very positive.
id: totrans-329
prefs: []
type: TYPE_NORMAL
zh: 我们不能允许内核模块在root想什么时候就什么时候被`rmmod`。如果设备文件被某个进程打开,然后我们移除内核模块,使用该文件会导致调用曾经用于(读取/写入适当功能read/write的内存位置。如果我们幸运那里没有加载其他代码我们可能会得到一个难看的错误信息。如果我们不幸另一个内核模块被加载到相同的位置这意味着在内核中的另一个函数中间进行跳转。这种结果是不可预测的但它们可能不会非常积极。
- en: 'Normally, when you do not want to allow something, you return an error code
(a negative number) from the function which is supposed to do it. With `cleanup_module`
  that''s impossible because it is a void function. However, there is a counter which
  keeps track of how many processes are using your module. You can see what its
  value is by looking at the 3rd field with the command `cat /proc/modules` or `lsmod`
  . If this number isn''t zero, `rmmod` will fail. Note that you do not have to check
the counter within `cleanup_module` because the check will be performed for you
by the system call `sys_delete_module` , defined in [include/linux/syscalls.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/syscalls.h).
You should not use this counter directly, but there are functions defined in [include/linux/module.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/module.h)
which let you display this counter:'
id: totrans-330
prefs: []
type: TYPE_NORMAL
zh: 通常,当你不希望允许某事发生时,你应该从应该执行该操作的功能中返回一个错误代码(一个负数)。对于`cleanup_module`来说这是不可能的,因为它是一个空函数。然而,有一个计数器会跟踪有多少进程正在使用你的模块。你可以通过查看`cat
/proc/modules`或`lsmod`命令的第三个字段来查看它的值。如果这个数字不是零,`rmmod`将失败。请注意,你不需要在`cleanup_module`中检查这个计数器,因为系统调用`sys_delete_module`会为你执行这个检查,该系统调用定义在[include/linux/syscalls.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/syscalls.h)。你不应该直接使用这个计数器,但在[include/linux/module.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/module.h)中定义了一些函数,允许你显示这个计数器:
- en: '`module_refcount(THIS_MODULE)` : Return the value of reference count of current
module.'
id: totrans-331
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '`module_refcount(THIS_MODULE)`:返回当前模块的引用计数值。'
- en: 'Note: The use of `try_module_get(THIS_MODULE)` and `module_put(THIS_MODULE)`
  within a module''s own code is considered unsafe and should be avoided. The kernel
automatically manages the reference count when file operations are in progress,
so manual reference counting is unnecessary and can lead to race conditions. For
a deeper understanding of when and how to properly use module reference counting,
see [https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put](https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put).'
id: totrans-332
prefs: []
type: TYPE_NORMAL
zh: 注意:在模块自己的代码中使用`try_module_get(THIS_MODULE)`和`module_put(THIS_MODULE)`被认为是不安全的,应该避免。当文件操作正在进行时,内核会自动管理引用计数,因此手动引用计数是不必要的,并且可能导致竞争条件。为了更深入地了解何时以及如何正确使用模块引用计数,请参阅[https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put](https://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put)。
- en: 6.5 chardev.c
id: totrans-333
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 6.5 chardev.c
- en: 'The next code sample creates a char driver named chardev. You can verify it
has been registered by checking:'
id: totrans-334
prefs: []
type: TYPE_NORMAL
zh: 下一个代码示例创建了一个名为chardev的字符驱动程序。你可以通过以下方式验证它是否已注册
- en: '[PRE59]'
id: totrans-335
prefs: []
type: TYPE_PRE
zh: '[PRE59]'
- en: This will show the device's major number. To actually use the device, you need
to read from /dev/chardev (or open the file with a program) and the driver will
put the number of times the device file has been read from into the file. We do
not support writing to the file (like `echo "hi" > /dev/chardev` ), but catch
these attempts and tell the user that the operation is not supported. Do not worry
if you do not see what we do with the data we read into the buffer; we do not
do much with it. We simply read in the data and print a message acknowledging
that we received it.
id: totrans-336
prefs: []
type: TYPE_NORMAL
zh: 这将显示设备的major号。要实际使用该设备你需要从/dev/chardev或使用程序打开文件读取并且驱动程序会将设备文件被读取的次数放入文件中。我们不支持向文件写入如`echo "hi" > /dev/chardev`),但会捕获这些尝试并告知用户该操作不受支持。如果你没有看到我们对读取到缓冲区中的数据做了什么,请不要担心;我们并没有对它做太多处理。我们只是读取数据并打印一条消息,确认我们已经收到了它。
- en: In a multi-threaded environment, unprotected concurrent access to the same
memory may lead to race conditions and unpredictable results. In a kernel module,
this problem may arise when multiple instances access shared resources at the
same time. One solution is to enforce exclusive access. We use an atomic
Compare-And-Swap (CAS) to switch between the states `CDEV_NOT_USED` and
`CDEV_EXCLUSIVE_OPEN` , which tell us whether the file is currently opened by someone
or not. CAS compares the contents of a memory location with the expected value
and, only if they are the same, modifies the contents of that memory location
to the desired value. See [Section 12](#synchronization) for more concurrency details.
id: totrans-337
prefs: []
type: TYPE_NORMAL
zh: 在多线程环境中如果没有任何保护措施对同一内存的并发访问可能会导致竞争条件并且不会保持性能。在内核模块中这个问题可能由于多个实例访问共享资源而出现。因此一个解决方案是强制执行独占访问。我们使用原子比较和交换CAS来维护状态`CDEV_NOT_USED`和`CDEV_EXCLUSIVE_OPEN`以确定文件当前是否被某人打开。CAS比较内存位置的值与预期值并且只有在它们相同的情况下才会将该内存位置的值修改为所需的值。更多并发细节请参阅[第12节](#synchronization)。
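- en: |
The exclusive-open idea above can be sketched in userspace with C11 atomics.
This is only a model under stated assumptions: `model_open`, `model_release`,
`model_demo` and the variable `already_open` are hypothetical names, and a
real module would use a kernel helper such as `atomic_cmpxchg()` rather than
`<stdatomic.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-ins for the two driver states. */
enum { CDEV_NOT_USED = 0, CDEV_EXCLUSIVE_OPEN = 1 };

static atomic_int already_open = CDEV_NOT_USED;

/* Returns 0 when the caller wins exclusive access, -EBUSY otherwise. */
static int model_open(void)
{
    int expected = CDEV_NOT_USED;

    /* CAS: write CDEV_EXCLUSIVE_OPEN only if the state is still CDEV_NOT_USED. */
    if (!atomic_compare_exchange_strong(&already_open, &expected,
                                        CDEV_EXCLUSIVE_OPEN))
        return -EBUSY;
    return 0;
}

/* Hands the device back so that the next open can succeed. */
static void model_release(void)
{
    atomic_store(&already_open, CDEV_NOT_USED);
}

/* Walks through one open / busy / release / open cycle. */
static void model_demo(void)
{
    assert(model_open() == 0);      /* first opener succeeds */
    assert(model_open() == -EBUSY); /* second opener is refused */
    model_release();
    assert(model_open() == 0);      /* free again after release */
    model_release();
}
```

Because the comparison and the store happen as one atomic step, two callers
racing on `model_open` cannot both see `CDEV_NOT_USED` and both proceed.
prefs: []
type: TYPE_NORMAL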
- en: '[PRE60]'
id: totrans-338
prefs: []
type: TYPE_PRE
zh: '[PRE60]'
- en: 6.6 Writing Modules for Multiple Kernel Versions
id: totrans-339
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 6.6 为多个内核版本编写模块
- en: System calls, which are the main interface the kernel exposes to processes,
generally stay the same across versions. A new system call may be added, but usually
the old ones will behave exactly like they used to. This is necessary for backward
compatibility; a new kernel version is not supposed to break regular processes.
In most cases, the device files will also remain the same. On the other hand,
the internal interfaces within the kernel can and do change between versions.
id: totrans-340
prefs: []
type: TYPE_NORMAL
zh: 系统调用,这是内核向进程展示的主要接口,通常在版本之间保持不变。可能会添加新的系统调用,但通常旧的行为将与以前完全相同。这是为了向后兼容——新的内核版本不应该破坏常规进程。在大多数情况下,设备文件也将保持不变。另一方面,内核内部接口在版本之间可以并且确实会发生变化。
- en: There are differences between kernel versions, and if you want to
support multiple kernel versions, you will need conditional
compilation directives. The way to do this is to compare the macro `LINUX_VERSION_CODE`
to the macro `KERNEL_VERSION` . In version a.b.c of the kernel, the value of
`LINUX_VERSION_CODE` is ![2^16 a + 2^8 b + c](img/7b83dc18db2a578cd2fb1a4ad4ae584e.png).
id: totrans-341
prefs: []
type: TYPE_NORMAL
zh: 不同内核版本之间存在差异,如果你想要支持多个内核版本,你将发现自己需要编写条件编译指令。这样做的方法是将宏`LINUX_VERSION_CODE`与宏`KERNEL_VERSION`进行比较。在内核版本a.b.c中该宏的值将是![216a+
28b+ c ](img/7b83dc18db2a578cd2fb1a4ad4ae584e.png)。
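- en: |
As a worked example of the formula above, here is a small userspace sketch.
The helpers `version_code` and `kernel_at_least` are hypothetical names used
only for illustration; a real module would include `<linux/version.h>` and
write `#if LINUX_VERSION_CODE >= KERNEL_VERSION(a, b, c)` instead.

```c
#include <assert.h>

/* Hypothetical helper that mirrors the formula 2^16 a + 2^8 b + c. */
static unsigned int version_code(unsigned int a, unsigned int b, unsigned int c)
{
    return (a << 16) + (b << 8) + c;
}

/* Nonzero when the given version code is at least version a.b.c. */
static int kernel_at_least(unsigned int code,
                           unsigned int a, unsigned int b, unsigned int c)
{
    return code >= version_code(a, b, c);
}

/* Worked arithmetic: 5.6.0 gives 5 * 65536 + 6 * 256 + 0 = 329216. */
static void version_demo(void)
{
    assert(version_code(5, 6, 0) == 329216);
    assert(kernel_at_least(version_code(6, 1, 0), 5, 6, 0));
    assert(!kernel_at_least(version_code(5, 5, 19), 5, 6, 0));
}
```

Because a is shifted furthest, comparing two version codes as plain integers
orders them the same way as comparing a, then b, then c.
prefs: []
type: TYPE_NORMAL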
- en: 7 The /proc Filesystem
id: totrans-342
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 7. /proc 文件系统
- en: In Linux, there is an additional mechanism for the kernel and kernel modules
to send information to processes — the /proc filesystem. Originally designed to
allow easy access to information about processes (hence the name), it is now used
by every bit of the kernel which has something interesting to report, such as
/proc/modules which provides the list of modules and /proc/meminfo which gathers
memory usage statistics.
id: totrans-343
prefs: []
type: TYPE_NORMAL
zh: 在Linux中内核和内核模块向进程发送信息有一个额外的机制——/proc 文件系统。最初设计是为了允许轻松访问有关进程的信息(因此得名),现在内核中任何有有趣信息要报告的部分都会使用它,例如/proc/modules提供了模块列表/proc/meminfo收集内存使用统计信息。
- en: The method to use the proc filesystem is very similar to the one used with device
drivers — a structure is created with all the information needed for the /proc
file, including pointers to any handler functions (in our case there is only one,
the one called when somebody attempts to read from the /proc file). Then, `init_module`
registers the structure with the kernel and `cleanup_module` unregisters it.
id: totrans-344
prefs: []
type: TYPE_NORMAL
zh: 使用proc文件系统的方法与设备驱动程序使用的非常相似——创建一个包含/proc文件所需所有信息的结构包括任何处理函数的指针在我们的例子中只有一个即当有人尝试从/proc文件读取时调用的函数。然后`init_module`将结构注册到内核中,`cleanup_module`注销它。
- en: Normal filesystems are located on a disk, rather than just in memory (which
is where /proc is), and in that case the index-node (inode for short) number is
a pointer to a disk location where the file’s inode is located. The inode contains
information about the file, for example the file’s permissions, together with
a pointer to the disk location or locations where the file’s data can be found.
id: totrans-345
prefs: []
type: TYPE_NORMAL
zh: 正常的文件系统位于磁盘上,而不是仅仅在内存中(/proc就在这里在这种情况下索引节点简称inode号是一个指向文件inode所在磁盘位置的指针。inode包含有关文件的信息例如文件的权限以及指向文件数据所在磁盘位置或位置的指针。
- en: Because we do not get called when the file is opened or closed, there is nowhere
for us to put `try_module_get` and `module_put` in this module, and if the file
is opened and then the module is removed, there is no way to avoid the consequences.
The kernel’s automatic reference counting for file operations helps prevent module
removal while files are in use, but /proc files require careful handling due to
their different lifecycle.
id: totrans-346
prefs: []
type: TYPE_NORMAL
zh: 由于文件打开或关闭时我们没有被调用,在这个模块中我们无处放置`try_module_get`和`module_put`,如果文件被打开然后模块被移除,就无法避免后果。内核对文件操作的自动引用计数有助于防止在文件使用时移除模块,但由于它们不同的生命周期,/proc文件需要小心处理。
- en: 'Here is a simple example showing how to use a /proc file. This is the HelloWorld
for the /proc filesystem. There are three parts: create the file /proc/helloworld
in the function `init_module` , return a value (and a buffer) when the file /proc/helloworld
is read in the callback function `procfile_read` , and delete the file /proc/helloworld
in the function `cleanup_module` .'
id: totrans-347
prefs: []
type: TYPE_NORMAL
zh: 这里有一个简单的示例,展示了如何使用/proc文件。这是/proc文件系统的HelloWorld。它有三个部分在`init_module`函数中创建/proc/helloworld文件在回调函数`procfile_read`中读取/proc/helloworld文件时返回一个值和一个缓冲区以及在`cleanup_module`函数中删除/proc/helloworld文件。
- en: The file /proc/helloworld is created by the function `proc_create` when the
module is loaded. The return value is a pointer to `struct proc_dir_entry` , which
can be used to configure the file /proc/helloworld (for example, the owner
of this file). A null return value means that the creation has failed.
id: totrans-348
prefs: []
type: TYPE_NORMAL
zh: 当模块通过`proc_create`函数加载时,会创建/proc/helloworld。返回值是一个指向`struct proc_dir_entry`的指针,它将被用来配置/proc/helloworld文件例如该文件的拥有者。空返回值表示创建失败。
- en: 'Every time the file /proc/helloworld is read, the function `procfile_read`
is called. Two parameters of this function are very important: the buffer (the
second parameter) and the offset (the fourth one). The content of the buffer will
be returned to the application which read it (for example the `cat` command).
The offset is the current position in the file. If the return value of the function
is not zero, then this function is called again. So be careful with this function:
if it never returns zero, the read function is called endlessly.'
id: totrans-349
prefs: []
type: TYPE_NORMAL
zh: 每次读取/proc/helloworld文件时都会调用`procfile_read`函数。这个函数的两个参数非常重要:缓冲区(第二个参数)和偏移量(第四个参数)。缓冲区的内容将被返回给读取它的应用程序(例如`cat`命令)。偏移量是文件中的当前位置。如果函数的返回值不为空,则此函数将被再次调用。所以要注意这个函数,如果它从不返回零,则读取函数会无限期地被调用。
- en: '[PRE61]'
id: totrans-350
prefs: []
type: TYPE_PRE
zh: '[PRE61]'
- en: '[PRE62]'
id: totrans-351
prefs: []
type: TYPE_PRE
zh: '[PRE62]'
- en: 7.1 The proc_ops Structure
id: totrans-352
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 7.1 proc_ops 结构
- en: The `proc_ops` structure is defined in [include/linux/proc_fs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/proc_fs.h)
in Linux v5.6+. Older kernels used `file_operations` for custom hooks in the
/proc filesystem, but it contains some members that /proc does not need, and
every time VFS expands the `file_operations` set, the /proc code becomes bloated. Besides
saving space, this structure also saves some operations
to improve performance. For example, a file which never disappears in /proc
can set `proc_flags` to `PROC_ENTRY_PERMANENT` to save 2 atomic ops, 1 allocation, and
1 free per open/read/close sequence.
id: totrans-353
prefs: []
type: TYPE_NORMAL
zh: '`proc_ops` 结构定义在 Linux v5.6+ 的 `[include/linux/proc_fs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/proc_fs.h)`
中。在较旧的内核中,它使用 `file_operations` 在 `/proc` 文件系统中进行自定义钩子,但它包含一些在 VFS 中不必要的成员,并且每次
VFS 扩展 `file_operations` 集合时,`/proc` 代码就会变得臃肿。另一方面,通过这个结构不仅节省了空间,还节省了一些操作以提高其性能。例如,在
`/proc` 中永远不会消失的文件可以将 `proc_flag` 设置为 `PROC_ENTRY_PERMANENT` 以节省 2 个原子操作、1 次分配和
1 次释放,在每次打开/读取/关闭序列中。'
- en: 7.2 Read and Write a /proc File
id: totrans-354
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 7.2 读取和写入 /proc 文件
- en: We have seen a very simple example for a /proc file where we only read the file
/proc/helloworld. It is also possible to write to a /proc file. It works the same
way as read; a function is called when the /proc file is written. There is one
difference from read, though. The data comes from the user, so you have to import it
from user space to kernel space (with `copy_from_user` or `get_user` ).
id: totrans-355
prefs: []
type: TYPE_NORMAL
zh: 我们已经看到了一个用于 `/proc` 文件的非常简单的示例,其中我们只读取了 `/proc/helloworld` 文件。也可以写入 `/proc`
文件。它的工作方式与读取相同,当 `/proc` 文件被写入时,会调用一个函数。但与读取有一点不同,数据来自用户,因此你必须从用户空间导入数据到内核空间(使用
`copy_from_user` 或 `get_user`)。
- en: The reason for `copy_from_user` or `get_user` is that Linux memory (on Intel
architecture, it may be different under some other processors) is segmented. This
means that a pointer, by itself, does not reference a unique location in memory,
only a location in a memory segment, and you need to know which memory segment
it is to be able to use it. There is one memory segment for the kernel, and one
for each of the processes.
id: totrans-356
prefs: []
type: TYPE_NORMAL
zh: 使用 `copy_from_user` 或 `get_user` 的原因是 Linux 内存(在英特尔架构上,在其他一些处理器下可能不同)是分段的。这意味着一个指针本身并不引用内存中的唯一位置,而只是引用内存段中的一个位置,你需要知道它是哪个内存段才能使用它。有一个内核内存段,以及每个进程的一个内存段。
- en: The only memory segment accessible to a process is its own, so when writing
regular programs to run as processes, there is no need to worry about segments.
When you write a kernel module, normally you want to access the kernel memory
segment, which is handled automatically by the system. However, when the content
of a memory buffer needs to be passed between the currently running process and
the kernel, the kernel function receives a pointer to the memory buffer which
is in the process segment. The `put_user` and `get_user` macros allow you to access
that memory. These macros handle only one simple value, such as a single character;
you can handle several characters at once with `copy_to_user` and `copy_from_user` . The
write function must import its data this way because the data comes from user space.
The read function has nothing to import because its data is
already in kernel space.
id: totrans-357
prefs: []
type: TYPE_NORMAL
zh: 一个进程可访问的唯一内存段是其自身的,因此当编写作为进程运行的常规程序时,无需担心段。当你编写内核模块时,通常你想要访问内核内存段,这由系统自动处理。然而,当需要将内存缓冲区的内容在当前运行的进程和内核之间传递时,内核函数会接收到一个指向进程内存段的内存缓冲区指针。`put_user`
和 `get_user` 宏允许你访问该内存。这些函数仅处理一个字符,你可以使用 `copy_to_user` 和 `copy_from_user` 来处理多个字符。由于缓冲区(在读取或写入函数中)位于内核空间,对于写入函数,你需要导入数据,因为数据来自用户空间,但对于读取函数则不需要,因为数据已经在内核空间。
- en: '[PRE63]'
id: totrans-358
prefs: []
type: TYPE_PRE
zh: '[PRE63]'
- en: 7.3 Manage /proc file with standard filesystem
id: totrans-359
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 7.3 使用标准文件系统管理 /proc 文件
- en: We have seen how to read and write a /proc file with the /proc interface. But
it is also possible to manage a /proc file with inodes. The main reason to do so is to use
advanced features, like permissions.
id: totrans-360
prefs: []
type: TYPE_NORMAL
zh: 我们已经看到了如何使用 `/proc` 接口读取和写入 `/proc` 文件。但也可以使用inode来管理 `/proc` 文件。主要关注的是使用高级功能,如权限。
- en: In Linux, there is a standard mechanism for filesystem registration. Since every
filesystem has to have its own functions to handle inode and file operations,
there is a special structure to hold pointers to all those functions, `struct inode_operations`
, which includes a pointer to `struct proc_ops` .
id: totrans-361
prefs: []
type: TYPE_NORMAL
zh: 在 Linux 中存在一个标准的文件系统注册机制。由于每个文件系统都必须有自己的函数来处理inode和文件操作因此有一个特殊的结构来保存所有这些函数的指针
`struct inode_operations`,它包括一个指向 `struct proc_ops` 的指针。
- en: The difference between file and inode operations is that file operations deal
with the file itself whereas inode operations deal with ways of referencing the
file, such as creating links to it.
id: totrans-362
prefs: []
type: TYPE_NORMAL
zh: 文件操作和inode操作之间的区别在于文件操作处理文件本身而inode操作处理引用文件的方式例如创建指向它的链接。
- en: In /proc, whenever we register a new file, we’re allowed to specify which `struct inode_operations`
will be used to access it. This is the mechanism we use, a `struct inode_operations`
which includes a pointer to a `struct proc_ops` which includes pointers to our
`procfs_read` and `procfs_write` functions.
id: totrans-363
prefs: []
type: TYPE_NORMAL
zh: 在/proc中每当注册一个新的文件时我们都可以指定将使用哪个`struct inode_operations`来访问它。这是我们使用的机制,一个包含指向`struct
proc_ops`的指针的`struct inode_operations`,而`struct proc_ops`包含指向我们的`procfs_read`和`procfs_write`函数的指针。
- en: Another interesting point here is the `module_permission` function. This function
is called whenever a process tries to do something with the /proc file, and it
can decide whether to allow access or not. Right now it is only based on the operation
and the uid of the current user (as available in `current`, a pointer to a structure
which includes information on the currently running process), but it could be
based on anything we like, such as what other processes are doing with the same
file, the time of day, or the last input we received.
id: totrans-364
prefs: []
type: TYPE_NORMAL
zh: 另一个有趣的地方是`module_permission`函数。每当一个进程尝试对/proc文件进行操作时都会调用此函数并且它可以决定是否允许访问。目前它仅基于操作和当前用户的uid如当前一个指向包含当前运行进程信息的结构的指针但它可以基于我们喜欢的内容例如其他进程如何使用相同的文件、一天中的时间或我们收到的最后输入。
- en: It is important to note that the standard roles of read and write are reversed
in the kernel. Read functions are used for output, whereas write functions are
used for input. The reason is that read and write refer to the user’s
point of view. If a process reads something from the kernel, then the kernel
needs to output it, and if a process writes something to the kernel, then the
kernel receives it as input.
id: totrans-365
prefs: []
type: TYPE_NORMAL
zh: 重要的是要注意,在内核中,标准读取和写入的角色是相反的。读取函数用于输出,而写入函数用于输入。这样做的原因是读取和写入指的是用户的观点——如果一个进程从内核读取某些内容,那么内核需要输出它;如果一个进程向内核写入某些内容,那么内核将其作为输入接收。
- en: '[PRE64]'
id: totrans-366
prefs: []
type: TYPE_PRE
zh: '[PRE64]'
- en: Still hungry for procfs examples? Well, first of all keep in mind that there are
rumors that procfs is on its way out, so consider using sysfs instead.
Consider using this mechanism in case you want to document something kernel-related
yourself.
id: totrans-367
prefs: []
type: TYPE_NORMAL
zh: 还想看更多关于procfs的示例吗首先请记住有传言称procfs正在退出考虑使用sysfs。如果您想自己记录与内核相关的内容可以考虑使用这种机制。
- en: 7.4 Manage /proc file with seq_file
id: totrans-368
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 7.4 使用seq_file管理/proc文件
- en: 'As we have seen, writing a /proc file may be quite “complex”. So to help people
write /proc files, there is an API named `seq_file` that helps format a /proc
file for output. It is based on a sequence, which is composed of 3 functions: `start()`
, `next()` , and `stop()` . The `seq_file` API starts a sequence when a user reads
the /proc file.'
id: totrans-369
prefs: []
type: TYPE_NORMAL
zh: 正如我们所见,编写/proc文件可能相当“复杂”。因此为了帮助人们编写/proc文件存在一个名为`seq_file`的API它有助于格式化输出/proc文件。它基于序列由3个函数组成`start()`、`next()`和`stop()`。当用户读取/proc文件时`seq_file`
API会启动一个序列。
- en: A sequence begins with a call to the function `start()` . If it returns
a non-`NULL` value, the function `next()` is called; otherwise, the `stop()` function
is called directly. The function `next()` is an iterator whose goal is to go through all
the data. Each time `next()` is called, the function `show()` is also called.
It writes data values into the buffer read by the user. The function `next()` is
called repeatedly until it returns `NULL` , at which point the sequence ends and the
function `stop()` is called.
id: totrans-370
prefs: []
type: TYPE_NORMAL
zh: 序列从调用`start()`函数开始。如果返回值是非`NULL`值,则调用`next()`函数;否则,直接调用`stop()`函数。这个函数是一个迭代器,目标是遍历所有数据。每次调用`next()`时,都会调用`show()`函数。它将用户读取的缓冲区中的数据值写入。`next()`函数会一直调用,直到它返回`NULL`。序列在`next()`返回`NULL`时结束,然后调用`stop()`函数。
- en: 'BE CAREFUL: when a sequence is finished, another one starts. That means that
at the end of function `stop()` , the function `start()` is called again. This
loop finishes when the function `start()` returns `NULL` . This flow is illustrated
in [Figure 1](#ignorespaces-how-seqfile-works).'
id: totrans-371
prefs: []
type: TYPE_NORMAL
zh: 注意:当序列结束时,另一个序列开始。这意味着在`stop()`函数的末尾,会再次调用`start()`函数。这个循环在`start()`函数返回`NULL`时结束。您可以在[图1](#ignorespaces-how-seqfile-works)中看到这个方案的示意图。
- en: '![Flowchart of how seq_file cycles through start, next, and stop](img/8209b6ea27687e8832cc85a37f5784c5.png)'
id: totrans-372
prefs: []
type: TYPE_IMG
zh: '![srrsYNNYtaenetoeooertuetupsstrxr((ntn))( tis)istrr teeaNreNatUaUtmLtLmeLmLen?e?ntntt](img/8209b6ea27687e8832cc85a37f5784c5.png)'
- en: 'Figure 1: How seq_file works'
id: totrans-373
prefs: []
type: TYPE_NORMAL
zh: 图1seq_file的工作原理
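- en: |
The call order described above can be traced with a small userspace model.
This is a simplified sketch, not kernel code: the `model_*` functions, the
`trace` buffer and `MODEL_ITEMS` are all hypothetical, and the real
`seq_file` callbacks take a `struct seq_file *` and may split the output
across several reads.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_ITEMS 3   /* hypothetical number of records to report */

static char trace[64];  /* records the order in which callbacks run */

/* start(): returns the position if there is data left, NULL otherwise. */
static void *model_start(long *pos)
{
    strcat(trace, "S");
    return *pos < MODEL_ITEMS ? pos : NULL;
}

/* next(): advances the position; NULL once the data is exhausted. */
static void *model_next(long *pos)
{
    strcat(trace, "n");
    ++*pos;
    return *pos < MODEL_ITEMS ? pos : NULL;
}

static void model_show(void) { strcat(trace, "w"); }
static void model_stop(void) { strcat(trace, "P"); }

/* Drives sequences until start() returns NULL, as in Figure 1. */
static void model_read_all(void)
{
    long pos = 0;

    trace[0] = '\0';
    for (;;) {
        void *v = model_start(&pos);

        if (!v) {           /* nothing left: stop() is called directly */
            model_stop();
            break;
        }
        while (v) {         /* one show() per item, then advance */
            model_show();
            v = model_next(&pos);
        }
        model_stop();       /* sequence finished; start() runs again */
    }
}
```

For three items the model records `SwnwnwnPSP`, where S is `start()`, w is
`show()`, n is `next()` and P is `stop()`. The second S returns `NULL`, which
ends the loop.
prefs: []
type: TYPE_NORMAL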
- en: The `seq_file` API provides basic functions for `proc_ops` , such as `seq_read`
, `seq_lseek` , and some others, but nothing for writing to the /proc file. Of course,
you can still handle writes the same way as in the previous example.
id: totrans-374
prefs: []
type: TYPE_NORMAL
zh: '`seq_file`为`proc_ops`提供了基本函数,如`seq_read``seq_lseek`等,但不需要在/proc文件中写入任何内容。当然您仍然可以使用与上一个示例相同的方式。'
- en: '[PRE65]'
id: totrans-375
prefs: []
type: TYPE_PRE
zh: '[PRE65]'
- en: 'If you want more information, you can read these web pages:'
id: totrans-376
prefs: []
type: TYPE_NORMAL
zh: 如果需要更多信息,您可以阅读此网页:
- en: '[https://lwn.net/Articles/22355/](https://lwn.net/Articles/22355/)'
id: totrans-377
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[https://lwn.net/Articles/22355/](https://lwn.net/Articles/22355/)'
- en: '[https://kernelnewbies.org/Documents/SeqFileHowTo](https://kernelnewbies.org/Documents/SeqFileHowTo)'
id: totrans-378
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[https://kernelnewbies.org/Documents/SeqFileHowTo](https://kernelnewbies.org/Documents/SeqFileHowTo)'
- en: You can also read the code of [fs/seq_file.c](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/seq_file.c)
in the Linux kernel.
id: totrans-379
prefs: []
type: TYPE_NORMAL
zh: 您还可以阅读Linux内核中[fs/seq_file.c](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/fs/seq_file.c)的代码。
- en: '8 sysfs: Interacting with your module'
id: totrans-380
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 8 sysfs与您的模块交互
- en: sysfs allows you to interact with the running kernel from userspace by reading
or setting variables inside of modules. This can be useful for debugging purposes,
or just as an interface for applications or scripts. You can find sysfs directories
and files under the /sys directory on your system.
id: totrans-381
prefs: []
type: TYPE_NORMAL
zh: sysfs允许您通过读取或设置模块内部的变量从用户空间与运行中的内核进行交互。这可以用于调试目的或者作为应用程序或脚本的接口。您可以在系统中的/sys目录下找到sysfs目录和文件。
- en: '[PRE66]'
id: totrans-382
prefs: []
type: TYPE_PRE
zh: '[PRE66]'
- en: Attributes can be exported for kobjects in the form of regular files in the
filesystem. Sysfs forwards file I/O operations to methods defined for the attributes,
providing a means to read and write kernel attributes.
id: totrans-383
prefs: []
type: TYPE_NORMAL
zh: 可以将kobjects的属性以常规文件的形式导出至文件系统。Sysfs将文件I/O操作转发到为属性定义的方法提供了一种读取和写入内核属性的手段。
- en: 'A simple attribute definition:'
id: totrans-384
prefs: []
type: TYPE_NORMAL
zh: 简单的属性定义:
- en: '[PRE67]'
id: totrans-385
prefs: []
type: TYPE_PRE
zh: '[PRE67]'
- en: 'For example, the driver model defines `struct device_attribute` like:'
id: totrans-386
prefs: []
type: TYPE_NORMAL
zh: 例如,驱动模型定义了`struct device_attribute`如下:
- en: '[PRE68]'
id: totrans-387
prefs: []
type: TYPE_PRE
zh: '[PRE68]'
- en: To read or write attributes, the `show()` or `store()` method must be specified
when declaring the attribute. For the common cases [include/linux/sysfs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/sysfs.h)
provides convenience macros ( `__ATTR` , `__ATTR_RO` , `__ATTR_WO` , etc.) to
make defining attributes easier as well as making code more concise and readable.
id: totrans-388
prefs: []
type: TYPE_NORMAL
zh: 为了读取或写入属性,在声明属性时必须指定`show()`或`store()`方法。对于常见情况,[include/linux/sysfs.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/sysfs.h)提供了便利宏(`__ATTR``__ATTR_RO``__ATTR_WO`等),使得定义属性更加容易,同时也使代码更加简洁和易于阅读。
- en: An example of a hello world module which includes the creation of a variable
accessible via sysfs is given below.
id: totrans-389
prefs: []
type: TYPE_NORMAL
zh: 下面给出了一个包含通过sysfs创建可访问变量的hello world模块的示例。
- en: '[PRE69]'
id: totrans-390
prefs: []
type: TYPE_PRE
zh: '[PRE69]'
- en: 'Make and install the module:'
id: totrans-391
prefs: []
type: TYPE_NORMAL
zh: 编译并安装模块:
- en: '[PRE70]'
id: totrans-392
prefs: []
type: TYPE_PRE
zh: '[PRE70]'
- en: 'Check that it exists:'
id: totrans-393
prefs: []
type: TYPE_NORMAL
zh: 检查它是否存在:
- en: '[PRE71]'
id: totrans-394
prefs: []
type: TYPE_PRE
zh: '[PRE71]'
- en: What is the current value of `myvariable` ?
id: totrans-395
prefs: []
type: TYPE_NORMAL
zh: '`myvariable`的当前值是多少?'
- en: '[PRE72]'
id: totrans-396
prefs: []
type: TYPE_PRE
zh: '[PRE72]'
- en: Set the value of `myvariable` and check that it changed.
id: totrans-397
prefs: []
type: TYPE_NORMAL
zh: 设置`myvariable`的值并检查它是否已更改。
- en: '[PRE73]'
id: totrans-398
prefs: []
type: TYPE_PRE
zh: '[PRE73]'
- en: 'Finally, remove the test module:'
id: totrans-399
prefs: []
type: TYPE_NORMAL
zh: 最后,移除测试模块:
- en: '[PRE74]'
id: totrans-400
prefs: []
type: TYPE_PRE
zh: '[PRE74]'
- en: In the above case, we use a simple kobject to create a directory under sysfs,
and communicate with its attributes. The `kobject` structure made its
appearance in Linux v2.6.0. It was initially meant as a simple way of unifying kernel
code which manages reference counted objects. After a bit of mission creep, it
is now the glue that holds much of the device model and its sysfs interface together.
For more information about kobject and sysfs, see [Documentation/driver-api/driver-model/driver.rst](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/driver-api/driver-model/driver.rst)
and [https://lwn.net/Articles/51437/](https://lwn.net/Articles/51437/).
id: totrans-401
prefs: []
type: TYPE_NORMAL
zh: 在上述情况下我们使用一个简单的kobject在sysfs下创建一个目录并与它的属性进行通信。自Linux v2.6.0以来,`kobject`结构首次出现。它最初被用作统一管理引用计数对象的内核代码的简单方法。经过一些任务扩张后它现在成为了连接设备模型及其sysfs接口的粘合剂。有关kobject和sysfs的更多信息请参阅[Documentation/driver-api/driver-model/driver.rst](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/driver-api/driver-model/driver.rst)和[https://lwn.net/Articles/51437/](https://lwn.net/Articles/51437/)。
- en: 9 Talking To Device Files
id: totrans-402
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 9 与设备文件通信
- en: Device files are supposed to represent physical devices. Most physical devices
are used for output as well as input, so there has to be some mechanism for device
drivers in the kernel to get the output to send to the device from processes.
This is done by opening the device file for output and writing to it, just like
writing to a file. In the following example, this is implemented by `device_write`
.
id: totrans-403
prefs: []
type: TYPE_NORMAL
zh: 设备文件应该代表物理设备。大多数物理设备既用于输出也用于输入,因此内核中的设备驱动程序必须有一些机制来获取输出并发送到设备。这是通过打开设备文件进行输出并将内容写入其中来完成的,就像写入文件一样。在下面的示例中,这是通过`device_write`实现的。
- en: This is not always enough. Imagine you had a serial port connected to a modem
(even if you have an internal modem, it is still implemented from the CPU’s perspective
as a serial port connected to a modem, so you don’t have to tax your imagination
too hard). The natural thing to do would be to use the device file to write things
to the modem (either modem commands or data to be sent through the phone line)
and read things from the modem (either responses for commands or the data received
through the phone line). However, this leaves open the question of what to do
when you need to talk to the serial port itself, for example to configure the
rate at which data is sent and received.
id: totrans-404
prefs: []
type: TYPE_NORMAL
zh: 这并不总是足够的。想象一下你有一个连接到调制解调器的串行端口即使你有内置调制解调器从CPU的角度来看它仍然是一个连接到调制解调器的串行端口所以你不需要过度发挥想象力。自然的事情是使用设备文件将信息写入调制解调器无论是调制解调器命令还是要通过电话线发送的数据并从调制解调器读取信息无论是命令的响应还是通过电话线接收的数据。然而这留下了当你需要与串行端口本身通信时该做什么的问题例如配置数据发送和接收的速度。
- en: The answer in Unix is to use a special function called `ioctl` (short for Input
Output ConTroL). Every device can have its own `ioctl` commands, which can be
read ioctls (to return information from the kernel to a process), write ioctls
(to send information from a process to the kernel), both or neither. Notice that,
as with the read and write functions, the direction is named from the user’s point
of view, so in ioctls write is to send information to
the kernel and read is to receive information from the kernel.
id: totrans-405
prefs: []
type: TYPE_NORMAL
zh: 在Unix中答案是使用一个名为`ioctl`简称Input Output ConTroL的特殊函数。每个设备都可以有自己的`ioctl`命令这些命令可以是读取ioctl从进程发送信息到内核写入ioctl将信息返回给进程两者都有或两者都没有。注意这里读取和写入的角色再次颠倒所以在ioctl中读取是向内核发送信息写入是从内核接收信息。
- en: 'The ioctl function is called with three parameters: the file descriptor of
the appropriate device file, the ioctl number, and a parameter, which is of type
long so you can use a cast to use it to pass anything. You will not be able to
pass a structure this way, but you will be able to pass a pointer to the structure.
Here is an example:'
id: totrans-406
prefs: []
type: TYPE_NORMAL
zh: '`ioctl`函数使用三个参数调用:适当设备文件的文件描述符、`ioctl`编号和一个参数,该参数为`long`类型,因此你可以使用类型转换来使用它传递任何内容。你无法以此方式传递结构体,但你将能够传递结构体的指针。以下是一个示例:'
- en: '[PRE75]'
id: totrans-407
prefs: []
type: TYPE_PRE
zh: '[PRE75]'
- en: You can see there is an argument called `cmd` in the `test_ioctl_ioctl()` function.
It is the ioctl number. The ioctl number encodes the major device number, the
type of the ioctl, the command, and the type of the parameter. This ioctl number
is usually created by a macro call ( `_IO` , `_IOR` , `_IOW` or `_IOWR` — depending
on the type) in a header file. This header file should then be included both by
the programs which will use ioctl (so they can generate the appropriate ioctls)
and by the kernel module (so it can understand it). In the example below, the
header file is chardev.h and the program which uses it is userspace_ioctl.c.
id: totrans-408
prefs: []
type: TYPE_NORMAL
zh: 你可以在`test_ioctl_ioctl()`函数中看到一个名为`cmd`的参数。它是`ioctl`编号。`ioctl`编号编码了主设备号、`ioctl`的类型、命令和参数的类型。这个`ioctl`编号通常由头文件中的宏调用(`_IO`、`_IOR`、`_IOW`或`_IOWR`——取决于类型)创建。然后,这个头文件应该被将使用`ioctl`的程序(以便它们可以生成适当的`ioctl`)和内核模块(以便它能够理解它)包含。在下面的示例中,头文件是`chardev.h`,使用它的程序是`userspace_ioctl.c`。
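- en: |
To make the encoding of an ioctl number concrete, the following userspace
sketch builds two numbers with the standard macros and decodes them again.
The magic value `'k'`, the `MODEL_*` names and `model_decode` are assumptions
made for this illustration; they are not the definitions used in chardev.h.

```c
#include <assert.h>
#include <linux/ioctl.h>

#define MODEL_MAGIC 'k' /* hypothetical type (magic) byte */

/* _IOW: the process writes an int to the kernel. */
#define MODEL_IOCTL_SET _IOW(MODEL_MAGIC, 0, int)
/* _IOR: the process reads an int from the kernel. */
#define MODEL_IOCTL_GET _IOR(MODEL_MAGIC, 1, int)

/* The four fields packed into one ioctl number. */
struct model_cmd {
    unsigned int dir;   /* _IOC_NONE, _IOC_READ, _IOC_WRITE or both */
    unsigned int type;  /* the magic byte */
    unsigned int nr;    /* the command number */
    unsigned int size;  /* size of the parameter type */
};

/* Splits an ioctl number back into its fields. */
static struct model_cmd model_decode(unsigned int cmd)
{
    struct model_cmd d = {
        _IOC_DIR(cmd), _IOC_TYPE(cmd), _IOC_NR(cmd), _IOC_SIZE(cmd)
    };
    return d;
}

/* Checks that decoding recovers what the macros encoded. */
static void ioctl_demo(void)
{
    struct model_cmd set = model_decode(MODEL_IOCTL_SET);
    struct model_cmd get = model_decode(MODEL_IOCTL_GET);

    assert(set.dir == _IOC_WRITE && set.nr == 0);
    assert(get.dir == _IOC_READ && get.nr == 1);
    assert(set.type == MODEL_MAGIC && set.size == sizeof(int));
}
```

Keeping such definitions in one header shared by the module and the
userspace program means both sides compute the same numbers.
prefs: []
type: TYPE_NORMAL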
- en: If you want to use ioctls in your own kernel modules, it is best to receive
an official ioctl assignment, so if you accidentally get somebody else’s ioctls,
or if they get yours, you’ll know something is wrong. For more information, consult
the kernel source tree at [Documentation/userspace-api/ioctl/ioctl-number.rst](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/userspace-api/ioctl/ioctl-number.rst).
id: totrans-409
prefs: []
type: TYPE_NORMAL
zh: 如果你想在自己的内核模块中使用`ioctl`,最好是接收一个官方的`ioctl`分配,这样如果你不小心得到了别人的`ioctl`,或者他们得到了你的`ioctl`,你就会知道出了问题。有关更多信息,请参阅内核源树中的[Documentation/userspace-api/ioctl/ioctl-number.rst](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/userspace-api/ioctl/ioctl-number.rst)。
- en: Also, we need to be careful, because concurrent access to shared resources can
lead to race conditions. The solution is to use atomic Compare-And-Swap (CAS),
which we mentioned in [Section 6.5](#chardevc), to enforce exclusive access.
id: totrans-410
prefs: []
type: TYPE_NORMAL
zh: 此外我们需要小心对共享资源的并发访问会导致竞争条件。解决方案是使用原子比较和交换CAS我们在[第6.5节](#chardevc)中提到过,以强制执行独占访问。
- en: '[PRE76]'
id: totrans-411
prefs: []
type: TYPE_PRE
zh: '[PRE76]'
- en: '[PRE77]'
id: totrans-412
prefs: []
type: TYPE_PRE
zh: '[PRE77]'
- en: '[PRE78]'
id: totrans-413
prefs: []
type: TYPE_PRE
zh: '[PRE78]'
- en: 10 System Calls
id: totrans-414
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 10 系统调用
- en: So far, the only thing we’ve done is use well-defined kernel mechanisms
to register /proc files and device handlers. This is fine if you want to do something
the kernel programmers thought you’d want, such as write a device driver. But
what if you want to do something unusual, to change the behavior of the system
in some way? Then, you are mostly on your own.
id: totrans-415
prefs: []
type: TYPE_NORMAL
zh: 到目前为止,我们唯一做的事情是使用定义良好的内核机制来注册/proc文件和设备处理程序。如果你只想做内核程序员认为你会想做的事情比如编写设备驱动程序这是可以的。但如果你想做些不同寻常的事情以某种方式改变系统的行为呢那么你基本上是孤军奋战。
- en: Note that this example has been unavailable since Linux v6.9. Specifically,
after this [commit](https://github.com/torvalds/linux/commit/1e3ad78334a69b36e107232e337f9d693dcc9df2#diff-4a16bf89a09b4f49669a30d54540f0b936ea0224dc6ee9edfa7700deb16c3e11R52),
the system call table implementation changed from an indirect function
call table to a switch statement to mitigate security issues such as the Branch History Injection
(BHI) attack. See more information [here](https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2060909).
id: totrans-416
prefs: []
type: TYPE_NORMAL
zh: 注意这个例子自Linux v6.9以来就不可用。具体来说,在这次[提交](https://github.com/torvalds/linux/commit/1e3ad78334a69b36e107232e337f9d693dcc9df2#diff-4a16bf89a09b4f49669a30d54540f0b936ea0224dc6ee9edfa7700deb16c3e11R52)之后,由于系统调用表从间接函数调用表更改为用于安全问题的开关语句(例如分支历史注入攻击),因此不可用。更多信息请参阅[这里](https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2060909)。
- en: Should one choose not to use a virtual machine, kernel programming can become
risky. For example, while writing the code below, the `open()` system call was
inadvertently disrupted. This resulted in an inability to open any files, run
programs, or shut down the system, necessitating a restart of the virtual machine.
Fortunately, no critical files were lost in this instance. However, if such modifications
were made on a live, mission-critical system, the consequences could be severe.
To mitigate the risk of file loss, even in a test environment, it is advised to
execute `sync` right before using `insmod` and `rmmod` .
id: totrans-417
prefs: []
type: TYPE_NORMAL
zh: 如果选择不使用虚拟机,内核编程可能会变得危险。例如,在编写以下代码时,`open()` 系统调用意外中断。这导致无法打开任何文件、运行程序或关闭系统,需要重启虚拟机。幸运的是,这次没有丢失任何关键文件。然而,如果在实时、关键任务系统中进行此类修改,后果可能非常严重。为了降低文件丢失的风险,即使在测试环境中,建议在执行
`insmod` 和 `rmmod` 之前立即执行 `sync`。
- en: Forget about /proc files, forget about device files. They are just minor details.
Minutiae in the vast expanse of the universe. The real process-to-kernel communication
mechanism, the one used by all processes, is system calls. When a process requests
a service from the kernel (such as opening a file, forking to a new process, or
requesting more memory), this is the mechanism used. If you want to change the
behaviour of the kernel in interesting ways, this is the place to do it. By the
way, if you want to see which system calls a program uses, run `strace <arguments>`
.
id: totrans-418
prefs: []
type: TYPE_NORMAL
zh: 忘记/proc文件忘记设备文件。它们只是细节。在广阔的宇宙中微不足道。真正的进程与内核通信机制所有进程都使用的是系统调用。当进程从内核请求服务如打开文件、创建新进程或请求更多内存这就是使用的机制。如果你想以有趣的方式改变内核的行为这就是你要做的。顺便说一句如果你想查看程序使用的系统调用请运行
`strace <arguments>`。
- en: In general, a process is not supposed to be able to access the kernel. It
cannot access kernel memory and it can’t call kernel functions. The hardware of the
CPU enforces this (that is the reason why it is called “protected mode” or “page
protection”).
id: totrans-419
prefs: []
type: TYPE_NORMAL
zh: 通常进程不应该能够访问内核。它不能访问内核内存也不能调用内核函数。CPU的硬件强制执行这一点这就是为什么它被称为“保护模式”或“页面保护”
- en: System calls are an exception to this general rule. What happens is that the
process fills the registers with the appropriate values and then executes a special
instruction which jumps to a previously defined location in the kernel (of course,
that location is readable by user processes, but it is not writable by them). On
32-bit Intel CPUs, this was traditionally done by means of interrupt 0x80, while
x86-64 uses the `syscall` instruction. The hardware knows that
once you jump to this location, you are no longer running in restricted user mode,
but as the operating system kernel, and therefore you’re allowed to do whatever
you want.
id: totrans-420
prefs: []
type: TYPE_NORMAL
zh: 系统调用是这一通用规则的例外。发生的情况是，进程将寄存器填充为适当的值，然后执行一条特殊指令，该指令跳转到内核中预先定义的位置（当然，该位置对用户进程是可读的，但对它们是不可写的）。在32位Intel
CPU上，这传统上是通过中断0x80来完成的，而x86-64使用`syscall`指令。硬件知道一旦你跳转到这个位置，你就不再在受限用户模式下运行，而是作为操作系统内核运行，因此你可以做任何你想做的事情。
- en: The location in the kernel a process can jump to is called `system_call`. The
procedure at that location checks the system call number, which tells the kernel
what service the process requested. Then, it looks at the table of system calls
(`sys_call_table`) to see the address of the kernel function to call. Then it
calls the function, and after it returns, does a few system checks and then returns
to the process (or to a different process, if the process time ran out).
If you want to read this code, in older kernels it is in the source file arch/$(architecture)/kernel/entry.S,
after the line `ENTRY(system_call)`.
id: totrans-421
prefs: []
type: TYPE_NORMAL
zh: 进程可以跳转到的内核中的位置称为`system_call`。该位置的过程检查系统调用号，这告诉内核进程请求了什么服务。然后，它查看系统调用表（`sys_call_table`），以查看要调用的内核函数的地址。然后它调用该函数，并在返回后执行一些系统检查，然后返回到进程（或者如果进程时间耗尽，返回到不同的进程）。如果你想阅读这段代码，在较旧的内核中它位于源文件`arch/$(architecture)/kernel/entry.S`中，在`ENTRY(system_call)`行之后。
- en: So, if we want to change the way a certain system call works, what we need to
do is to write our own function to implement it (usually by adding a bit of our
own code, and then calling the original function) and then change the pointer
at `sys_call_table` to point to our function. Because we might be removed later
and we don’t want to leave the system in an unstable state, it’s important for
`cleanup_module` to restore the table to its original state.
id: totrans-422
prefs: []
type: TYPE_NORMAL
zh: 因此,如果我们想改变某个系统调用的行为方式,我们需要编写自己的函数来实现它(通常是通过添加一些自己的代码,然后调用原始函数)并随后将`sys_call_table`中的指针改为指向我们的函数。因为我们可能会被移除,而且我们不希望留下一个不稳定的系统状态,所以对于`cleanup_module`来说,将表恢复到原始状态是很重要的。
- en: To modify the content of `sys_call_table`, we need to consider the control
register. A control register is a processor register that changes or controls
the general behavior of the CPU. On the x86 architecture, the cr0 register has various
control flags that modify the basic operation of the processor. The WP flag in
cr0 stands for write protection. Once the WP flag is set, the processor disallows
further write attempts to the read-only sections. Therefore, we must disable the
WP flag before modifying `sys_call_table`. Since Linux v5.3, the `write_cr0`
function cannot be used for this, because the kernel pins the sensitive cr0 bits
so that an attacker cannot write into CPU control registers to disable CPU protections
such as write protection. As a result, we have to provide a custom assembly routine
to bypass it.
id: totrans-423
prefs: []
type: TYPE_NORMAL
zh: 要修改`sys_call_table`的内容我们需要考虑控制寄存器。控制寄存器是处理器寄存器它改变或控制CPU的一般行为。对于x86架构cr0寄存器有各种控制标志可以修改处理器的基本操作。cr0中的WP标志代表写保护。一旦WP标志被设置处理器将不允许进一步的写入尝试到只读部分。因此在修改`sys_call_table`之前我们必须禁用WP标志。由于Linux
v5.3以来,`write_cr0`函数不能使用因为敏感的cr0位被安全问题固定攻击者可能写入CPU控制寄存器来禁用CPU保护如写保护。因此我们必须提供定制的汇编例程来绕过它。
- en: However, the `sys_call_table` symbol is unexported to prevent misuse. There
are still a few ways to get the symbol, namely manual symbol lookup and `kallsyms_lookup_name`.
Here we use one or the other, depending on the kernel version.
id: totrans-424
prefs: []
type: TYPE_NORMAL
zh: 然而，`sys_call_table`符号未导出，以防止误用。不过仍有几种方法可以获取该符号，即手动符号查找和`kallsyms_lookup_name`。在这里，我们根据内核版本选用其中一种。
- en: 'Control-flow integrity is a technique that prevents an attacker from
redirecting execution, by making sure that indirect calls
go to the expected addresses and that return addresses are not changed. Since Linux
v5.7, the kernel has merged a series of control-flow enforcement (CET) patches for x86,
and some configurations of GCC, like GCC versions 9 and 10 in Ubuntu Linux,
enable CET (the -fcf-protection option) by default. Using such a
GCC to compile the kernel with retpoline off may result in CET being enabled in
the kernel. You can use the following command to check whether the -fcf-protection
option is enabled:'
id: totrans-425
prefs: []
type: TYPE_NORMAL
zh: 由于控制流完整性一种防止攻击者重定向执行代码的技术以确保间接调用到达预期的地址并且返回地址没有被更改。自Linux v5.7以来内核修补了针对x86的控制流强制CET系列并且GCC的一些配置如Ubuntu
Linux中的GCC版本9和10默认会在内核中添加CET-fcf-protection选项。使用该GCC编译内核并关闭retpoline可能会导致内核中启用CET。您可以使用以下命令检查是否启用了-fcf-protection选项
- en: '[PRE79]'
id: totrans-426
prefs: []
type: TYPE_PRE
zh: '[PRE79]'
- en: However, CET should not be enabled in the kernel, because it may break Kprobes and BPF.
Consequently, CET has been disabled since v5.11. To make sure that the manual symbol lookup
works, we use it only up to v5.4.
id: totrans-427
prefs: []
type: TYPE_NORMAL
zh: 但是在内核中不应启用CETControl Flow Enforcement Technology它可能会破坏Kprobes和bpf。因此自v5.11版本以来CET已被禁用。为了保证手动符号查找功能正常工作我们只使用到v5.4版本。
- en: Unfortunately, since Linux v5.7 `kallsyms_lookup_name` is also unexported, so it
takes a trick to get the address of `kallsyms_lookup_name`. If `CONFIG_KPROBES`
is enabled, we can retrieve function addresses by means of
Kprobes, which dynamically break into a specific kernel routine. Kprobes inserts
a breakpoint at the entry of a function by replacing the first bytes of the probed
instruction. When a CPU hits the breakpoint, registers are stored, and control
passes to Kprobes. It passes the addresses of the saved registers and the Kprobe
struct to the handler you defined, then executes it. Kprobes can be registered
by symbol name or by address. When a symbol name is given, the kernel resolves
the address itself.
id: totrans-428
prefs: []
type: TYPE_NORMAL
zh: 不幸的是由于Linux v5.7 `kallsyms_lookup_name` 也未导出,需要一定的技巧来获取`kallsyms_lookup_name`的地址。如果启用了`CONFIG_KPROBES`我们可以通过Kprobes动态中断特定的内核例程来方便地检索函数地址。Kprobes通过替换被探测指令的第一字节在函数入口处插入一个断点。当CPU遇到断点时寄存器被存储控制权传递给Kprobes。它将保存的寄存器地址和Kprobe结构传递给您定义的处理程序然后执行它。Kprobes可以通过符号名称或地址进行注册。在符号名称中地址将由内核处理。
- en: 'Otherwise, pass the address of `sys_call_table`, taken from /proc/kallsyms or
/boot/System.map, in the `sym` parameter. The following is sample usage for /proc/kallsyms:'
id: totrans-429
prefs: []
type: TYPE_NORMAL
zh: 否则,请从/proc/kallsyms和/boot/System.map中指定`sys_call_table`的地址到`sym`参数中。以下是从/proc/kallsyms的示例用法
- en: '[PRE80]'
id: totrans-430
prefs: []
type: TYPE_PRE
zh: '[PRE80]'
- en: When using the address from /boot/System.map, be careful about KASLR (Kernel Address
Space Layout Randomization). KASLR may randomize the address of kernel code and
data at every boot, so that the static addresses listed in /boot/System.map
are shifted by a random offset. The purpose of KASLR is to protect the kernel space
from attackers. Without KASLR, an attacker can easily find the target at
a fixed address. The attacker can then use return-oriented programming
to execute malicious code, or read the target data through a tampered
pointer. KASLR mitigates these kinds of attacks because the attacker cannot immediately
know the target address, although a brute-force attack can still work. If the address
of a symbol in /proc/kallsyms is different from the address in /boot/System.map,
KASLR is enabled in the kernel your system is running.
id: totrans-431
prefs: []
type: TYPE_NORMAL
zh: 使用/boot/System.map中的地址时请注意KASLR内核地址空间布局随机化。KASLR可能会在每次启动时随机化内核代码和数据地址例如/boot/System.map中列出的静态地址将偏移一定的熵。KASLR的目的是为了保护内核空间免受攻击者攻击。如果没有KASLR攻击者可以轻易地找到固定地址中的目标地址。然后攻击者可以使用返回导向编程插入一些恶意代码来执行或通过篡改的指针接收目标数据。KASLR通过攻击者无法立即知道目标地址来减轻这类攻击。如果/proc/kallsyms中符号的地址与/boot/System.map中的地址不同则表示内核启用了KASLR您正在运行的系统就是这种情况。
- en: '[PRE81]'
id: totrans-432
prefs: []
type: TYPE_PRE
zh: '[PRE81]'
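- prefs: []
type: TYPE_NORMAL
en: |-
The comparison between the two files can be sketched in userspace C. This is only an illustration, not part of the module. The helper names `parse_symbol_line` and `kaslr_slide` are hypothetical. The sketch assumes each line has the `address type name` layout shown above, and that /proc/kallsyms is read with enough privilege to show real addresses rather than zeros.

```c
#include <stdio.h>
#include <string.h>

/* Parse one "address type name" line, the layout shared by /proc/kallsyms
 * and System.map. Returns 1 and stores the address when the symbol name
 * matches `wanted`, otherwise returns 0. */
static int parse_symbol_line(const char *line, const char *wanted,
                             unsigned long long *addr)
{
    unsigned long long value;
    char type;
    char name[128];

    if (sscanf(line, "%llx %c %127s", &value, &type, name) != 3)
        return 0;
    if (strcmp(name, wanted) != 0)
        return 0;
    *addr = value;
    return 1;
}

/* Difference between the runtime address (/proc/kallsyms) and the
 * link-time address (System.map). A nonzero result suggests that
 * KASLR shifted the kernel at boot. */
static long long kaslr_slide(unsigned long long runtime,
                             unsigned long long linked)
{
    return (long long)(runtime - linked);
}
```

A caller would feed each line of both files to `parse_symbol_line()` with `"sys_call_table"` as the wanted name, then compare the two results with `kaslr_slide()`.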
- en: 'If KASLR is enabled, we have to look up the address in /proc/kallsyms again
each time we reboot the machine. In order to use the address from /boot/System.map,
make sure that KASLR is disabled. You can add the nokaslr boot parameter to disable KASLR
on the next boot:'
id: totrans-433
prefs: []
type: TYPE_NORMAL
zh: 如果启用了KASLRKernel Address Space Layout Randomization每次重启机器时我们都必须注意/proc/kallsyms中的地址。为了使用/boot/System.map中的地址请确保KASLR已禁用。您可以在下一次启动时添加nokaslr来禁用KASLR
- en: '[PRE82]'
id: totrans-434
prefs: []
type: TYPE_PRE
zh: '[PRE82]'
- en: 'For more information, check out the following:'
id: totrans-435
prefs: []
type: TYPE_NORMAL
zh: 更多信息,请参阅以下内容:
- en: '[Cook: Security things in Linux v5.3](https://lwn.net/Articles/804849/)'
id: totrans-436
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[Cook: Linux v5.3 中的安全事项](https://lwn.net/Articles/804849/)'
- en: '[Unexporting the system call table](https://lwn.net/Articles/12211/)'
id: totrans-437
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[取消导出系统调用表](https://lwn.net/Articles/12211/)'
- en: '[Control-flow integrity for the kernel](https://lwn.net/Articles/810077/)'
id: totrans-438
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[内核的控制流完整性](https://lwn.net/Articles/810077/)'
- en: '[Unexporting kallsyms_lookup_name()](https://lwn.net/Articles/813350/)'
id: totrans-439
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[取消导出 kallsyms_lookup_name()](https://lwn.net/Articles/813350/)'
- en: '[Kernel Probes (Kprobes)](https://www.kernel.org/doc/Documentation/kprobes.txt)'
id: totrans-440
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[内核探针 (Kprobes)](https://www.kernel.org/doc/Documentation/kprobes.txt)'
- en: '[Kernel address space layout randomization](https://lwn.net/Articles/569635/)'
id: totrans-441
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[内核地址空间布局随机化](https://lwn.net/Articles/569635/)'
- en: The source code here is an example of such a kernel module. We want to “spy”
on a certain user, and to `pr_info()` a message whenever that user opens a file.
Towards this end, we replace the system call to open a file with our own function,
called `our_sys_openat`. This function checks the uid (user’s id) of the current
process, and if it is equal to the uid we spy on, it calls `pr_info()` to display
the name of the file to be opened. Then, either way, it calls the original `openat()`
function with the same parameters, to actually open the file.
id: totrans-442
prefs: []
type: TYPE_NORMAL
zh: 这里提供的源代码是一个这样的内核模块示例。我们想要“监视”某个特定的用户,并且每当该用户打开文件时,就使用 `pr_info()` 显示一条消息。为此,我们用我们自己的函数替换打开文件的系统调用,该函数称为
`our_sys_openat`。这个函数检查当前进程的 uid用户 ID如果它与我们要监视的 uid 相等,它就调用 `pr_info()` 显示要打开的文件名。然后,无论如何,它都使用相同的参数调用原始的
`openat()` 函数,以实际打开文件。
- en: The `init_module` function replaces the appropriate location in `sys_call_table`
and keeps the original pointer in a variable. The `cleanup_module` function uses
that variable to restore everything back to normal. This approach is dangerous,
because of the possibility of two kernel modules changing the same system call.
Imagine we have two kernel modules, A and B. A’s openat system call will be `A_openat`
and B’s will be `B_openat`. Now, when A is inserted into the kernel, the system
call is replaced with `A_openat`, which will call the original `sys_openat` when
it is done. Next, B is inserted into the kernel, which replaces the system call
with `B_openat`, which will call what it thinks is the original system call,
`A_openat`, when it’s done.
id: totrans-443
prefs: []
type: TYPE_NORMAL
zh: '`init_module` 函数替换了 `sys_call_table` 中的适当位置,并将原始指针保存在一个变量中。`cleanup_module`
函数使用该变量将一切恢复到正常状态。这种方法很危险因为可能有两个内核模块更改相同的系统调用。想象一下我们有两个内核模块A 和 B。A 的 openat
系统调用将是 `A_openat`,而 B 的将是 `B_openat`。现在,当 A 被插入内核时,系统调用被替换为 `A_openat`,完成后将调用原始的
`sys_openat`。接下来B 被插入内核,它将系统调用替换为 `B_openat`,完成后将调用它认为的原始系统调用,即 `A_openat`。'
- en: Now, if B is removed first, everything will be well. It will simply restore
the system call to `A_openat`, which calls the original. However, if A is removed
and then B is removed, the system will crash. A’s removal will restore the system
call to the original, `sys_openat`, cutting B out of the loop. Then, when B is
removed, it will restore the system call to what it thinks is the original, `A_openat`,
which is no longer in memory. At first glance, it appears we could solve this
particular problem by checking if the system call is equal to our open function
and if so not changing it at all (so that B won’t change the system call when
it is removed), but that will cause an even worse problem. When A is removed,
it sees that the system call was changed to `B_openat` so that it is no longer
pointing to `A_openat`, so it will not restore it to `sys_openat` before it is
removed from memory. Unfortunately, `B_openat` will still try to call `A_openat`
which is no longer there, so that even without removing B the system would crash.
id: totrans-444
prefs: []
type: TYPE_NORMAL
zh: 现在,如果首先移除 B一切都会好——它将简单地恢复系统调用到 `A_openat`,这将调用原始的。然而,如果先移除 A然后移除 B系统将崩溃。A
的移除将恢复系统调用到原始的 `sys_openat`,将 B 排除在循环之外。然后,当 B 被移除时,它将恢复系统调用到它认为的原始,即 `A_openat`,但这个调用已经不在内存中了。乍一看,我们似乎可以通过检查系统调用是否等于我们的
open 函数,如果是,则完全不更改它(这样 B 在移除时就不会更改系统调用),但这将导致更糟糕的问题。当 A 被移除时,它看到系统调用已被更改为 `B_openat`,因此它不再指向
`A_openat`,所以在从内存中移除之前不会将其恢复到 `sys_openat`。不幸的是,`B_openat` 仍然会尝试调用不再存在的 `A_openat`,因此即使没有移除
B系统也会崩溃。
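- prefs: []
type: TYPE_NORMAL
en: |-
The removal-order problem described above can be modelled in plain userspace C with one function pointer standing in for the table slot. This is a hedged sketch, not kernel code. `struct module_model`, `model_insert`, `model_remove` and `slot_after_removal` are hypothetical names invented for the illustration. In the model every function stays in memory, so nothing crashes and we can inspect which handler is left installed.

```c
#include <stddef.h>

/* One slot of a pretend system call table. */
typedef const char *(*openat_fn)(void);

static const char *sys_openat(void) { return "sys_openat"; }
static const char *A_openat(void)   { return "A_openat"; }
static const char *B_openat(void)   { return "B_openat"; }

static openat_fn table_slot = sys_openat;

struct module_model {
    openat_fn hook;   /* handler the module installs */
    openat_fn saved;  /* what the module believes is the original */
};

/* Like init_module: remember the current pointer, then replace it. */
static void model_insert(struct module_model *m)
{
    m->saved = table_slot;
    table_slot = m->hook;
}

/* Like cleanup_module: blindly restore the remembered pointer. */
static void model_remove(struct module_model *m)
{
    table_slot = m->saved;
}

/* Insert A, then B, remove them in the requested order, and report
 * which handler the slot points to afterwards. */
static const char *slot_after_removal(int remove_a_first)
{
    struct module_model a = { A_openat, NULL };
    struct module_model b = { B_openat, NULL };

    table_slot = sys_openat;
    model_insert(&a);
    model_insert(&b);
    if (remove_a_first) {
        model_remove(&a);
        model_remove(&b);  /* restores A_openat, which a real kernel has freed */
    } else {
        model_remove(&b);
        model_remove(&a);
    }
    return table_slot();
}
```

Removing B first leaves the slot at `sys_openat`. Removing A first leaves it pointing at `A_openat`, which in a real kernel would be a dangling pointer into unloaded module memory.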
- en: On the x86 architecture, the kernel no longer uses the system call table to invoke
system calls since commit [1e3ad78](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1e3ad78334a69b36e107232e337f9d693dcc9df2),
merged in v6.9. This commit has been backported to long term stable kernels, like
v5.15.154+, v6.1.85+, v6.6.26+ and v6.8.5+, see this [answer](https://stackoverflow.com/a/78607015)
for more details. In this case, thanks to Kprobes, a hook can be used instead
on the system call entry to intercept the system call.
id: totrans-445
prefs: []
type: TYPE_NORMAL
zh: 对于x86架构，自v6.9中合入的提交[1e3ad78](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1e3ad78334a69b36e107232e337f9d693dcc9df2)起，内核不再使用系统调用表来调用系统调用。这个提交已经向后移植到长期稳定的内核，如v5.15.154+、v6.1.85+、v6.6.26+和v6.8.5+，更多详情请参阅这个[回答](https://stackoverflow.com/a/78607015)。在这种情况下，多亏了Kprobes，可以在系统调用入口处使用钩子来拦截系统调用。
- en: Note that all the related problems make syscall stealing unfeasible for production
use. In order to keep people from doing potentially harmful things `sys_call_table`
is no longer exported. This means, if you want to do something more than a mere
dry run of this example, you will have to patch your current kernel in order to
have `sys_call_table` exported.
id: totrans-446
prefs: []
type: TYPE_NORMAL
zh: 注意所有相关的问题使得syscall stealing在生产环境中不可行。为了防止人们做可能有害的事情`sys_call_table`不再导出。这意味着,如果你想做一些不仅仅是这个例子简单运行的事情,你将不得不修补你的当前内核以导出`sys_call_table`。
- en: '[PRE83]'
id: totrans-447
prefs: []
type: TYPE_PRE
zh: '[PRE83]'
- en: 11 Blocking Processes and threads
id: totrans-448
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 11 阻塞进程和线程
- en: 11.1 Sleep
id: totrans-449
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 11.1 睡眠
- en: 'What do you do when somebody asks you for something you cannot do right away?
If you are a human being and you are bothered by a human being, the only thing
you can say is: "Not right now, I’m busy. Go away!". But if you are a kernel module
and you are bothered by a process, you have another possibility. You can put the
process to sleep until you can service it. After all, processes are being put
to sleep by the kernel and woken up all the time (that is the way multiple processes
appear to run at the same time on a single CPU).'
id: totrans-450
prefs: []
type: TYPE_NORMAL
zh: 当有人向你请求你无法立即完成的事情时你会怎么做如果你是人并且被另一个人打扰你能说的唯一一件事就是“现在不行我正忙。请走开”但如果你是一个内核模块并且被一个进程打扰你还有另一种可能性。你可以将进程置于睡眠状态直到你可以服务它。毕竟进程是由内核置入睡眠状态并随时唤醒的这就是为什么多个进程似乎可以在单个CPU上同时运行的原因
- en: 'This kernel module is an example of this. The file (called /proc/sleep) can
only be opened by a single process at a time. If the file is already open, the
kernel module calls `wait_event_interruptible`. The easiest way to keep a file
open is to open it with:'
id: totrans-451
prefs: []
type: TYPE_NORMAL
zh: 这个内核模块是这个例子。文件(称为/proc/sleep一次只能由一个进程打开。如果文件已经打开内核模块会调用`wait_event_interruptible`。保持文件打开的最简单方法是使用以下方式打开它:
- en: '[PRE84]'
id: totrans-452
prefs: []
type: TYPE_PRE
zh: '[PRE84]'
- en: This function changes the status of the task (a task is the kernel data structure
which holds information about a process and the system call it is in, if any)
to `TASK_INTERRUPTIBLE`, which means that the task will not run until it is woken
up somehow, and adds it to WaitQ, the queue of tasks waiting to access the file.
Then, the function calls the scheduler to context switch to a different process,
one which has some use for the CPU.
id: totrans-453
prefs: []
type: TYPE_NORMAL
zh: 这个函数将任务的状态(任务是一个内核数据结构,它包含有关进程及其(如果有的话)正在进行的系统调用的信息)更改为`TASK_INTERRUPTIBLE`这意味着任务将不会运行直到以某种方式被唤醒并将其添加到等待队列中即等待访问文件的队列。然后该函数调用调度器以进行上下文切换到另一个进程该进程对CPU有一些用途。
- en: When a process is done with the file, it closes it, and `module_close` is called.
That function wakes up all the processes in the queue (there’s no mechanism to
only wake up one of them). It then returns and the process which just closed the
file can continue to run. In time, the scheduler decides that that process has
had enough and gives control of the CPU to another process. Eventually, one of
the processes which was in the queue will be given control of the CPU by the scheduler.
It starts at the point right after the call to `wait_event_interruptible`.
id: totrans-454
prefs: []
type: TYPE_NORMAL
zh: 当一个进程完成对文件的访问后,它会关闭它,并调用`module_close`函数。该函数唤醒队列中的所有进程没有机制可以只唤醒其中一个。然后它返回刚刚关闭文件的进程可以继续运行。随着时间的推移调度器决定该进程已经足够了并将CPU的控制权交给另一个进程。最终队列中的一个进程将由调度器获得CPU的控制权。它从`wait_event_interruptible`调用之后的点开始执行。
- en: This means that the process is still in kernel mode - as far as the process
is concerned, it issued the open system call and the system call has not returned
yet. The process does not know somebody else used the CPU for most of the time
between the moment it issued the call and the moment it returned.
id: totrans-455
prefs: []
type: TYPE_NORMAL
zh: 这意味着进程仍然在内核模式下——就进程而言，它发出了打开系统调用，而系统调用尚未返回。进程不知道在它发出调用和调用返回之间的大部分时间里，是其他进程在使用CPU。
- en: It can then proceed to set a global variable to tell all the other processes
that the file is still open and go on with its life. When the other processes
get a piece of the CPU, they’ll see that global variable and go back to sleep.
id: totrans-456
prefs: []
type: TYPE_NORMAL
zh: 然后它可以继续设置一个全局变量来告诉所有其他进程文件仍然打开并继续其生命周期。当其他进程获得CPU的一部分时他们会看到这个全局变量然后再次进入睡眠状态。
- en: So we will use `tail -f` to keep the file open in the background, and attempt
to access it with another background process. This way, we don’t need to switch
to another terminal window or virtual terminal to run the second process. As soon
as the first background process is killed with `kill %1`, the second is woken up,
is able to access the file and finally terminates.
id: totrans-457
prefs: []
type: TYPE_NORMAL
zh: 因此,我们将使用`tail -f`来在后台保持文件打开并尝试用另一个后台进程访问它。这样我们就不需要切换到另一个终端窗口或虚拟终端来运行第二个进程。一旦第一个后台进程被kill
%1杀死第二个进程就会被唤醒能够访问文件并最终终止。
- en: To make our life more interesting, `module_close` does not have a monopoly on
waking up the processes which wait to access the file. A signal, such as
Ctrl+c (SIGINT), can also wake up a process. This is because we used `wait_event_interruptible`.
We could have used `wait_event` instead, but that would have resulted in extremely
angry users whose Ctrl+c’s are ignored.
id: totrans-458
prefs: []
type: TYPE_NORMAL
zh: 为了让我们的生活更有趣,`module_close`并不独占唤醒等待访问文件的进程。一个信号比如Ctrl + cSIGINT也可以唤醒一个进程。这是因为我们使用了`wait_event_interruptible`。我们本可以使用`wait_event`但那样会导致用户非常愤怒因为他们的Ctrl+c被忽略了。
- en: In that case, we want to return with `-EINTR` immediately. This is important
so users can, for example, kill the process before it receives the file.
id: totrans-459
prefs: []
type: TYPE_NORMAL
zh: 在那种情况下,我们希望立即返回`-EINTR`。这很重要,这样用户可以在进程收到文件之前杀死它。
- en: There is one more point to remember. Sometimes processes don’t want to sleep,
they want either to get what they want immediately, or to be told it cannot be
done. Such processes use the `O_NONBLOCK` flag when opening the file. The kernel
is supposed to respond by returning with the error code `-EAGAIN` from operations
which would otherwise block, such as opening the file in this example. The program
`cat_nonblock`, available in the examples/other directory, can be used to open
a file with `O_NONBLOCK`.
id: totrans-460
prefs: []
type: TYPE_NORMAL
zh: 还有一点需要记住。有时进程不想睡眠,它们要么想要立即得到它们想要的,要么被告知无法完成。这类进程在打开文件时使用`O_NONBLOCK`标志。内核应该通过返回错误代码`-EAGAIN`来响应这些操作在其他情况下会阻塞例如在这个例子中打开文件。可以在examples/other目录中找到的`cat_nonblock`程序可以用来以`O_NONBLOCK`打开一个文件。
- en: '[PRE85]'
id: totrans-461
prefs: []
type: TYPE_PRE
zh: '[PRE85]'
- en: '[PRE86]'
id: totrans-462
prefs: []
type: TYPE_PRE
zh: '[PRE86]'
- en: '[PRE87]'
id: totrans-463
prefs: []
type: TYPE_PRE
zh: '[PRE87]'
- en: 11.2 Completions
id: totrans-464
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 11.2 完成操作
- en: Sometimes one thing should happen before another within a module having multiple
threads. Rather than using `/bin/sleep` commands, the kernel has another way to
do this which allows timeouts or interrupts to also happen.
id: totrans-465
prefs: []
type: TYPE_NORMAL
zh: 有时在具有多个线程的模块中,一件事情应该在另一件事情之前发生。与其使用`/bin/sleep`命令,内核还有另一种方法来做这件事,这允许超时或中断也发生。
- en: Completions, as a code synchronization mechanism, have three main parts. These are the
initialization of the struct completion synchronization object, the waiting or barrier part through
`wait_for_completion()`, and the signalling side through a call to `complete()`.
id: totrans-466
prefs: []
type: TYPE_NORMAL
zh: 完成操作作为代码同步机制有三个主要部分:结构体完成同步对象的初始化,通过`wait_for_completion()`的等待或屏障部分,以及通过调用`complete()`的信号部分。
- en: 'In the subsequent example, two threads are initiated: crank and flywheel. It
is imperative that the crank thread does its work before the flywheel thread. A completion
state is established for each of these threads, with a distinct completion defined
for both the crank and flywheel threads. At the exit point of each thread the
respective completion state is updated, and `wait_for_completion` is used by the
flywheel thread to ensure that it does not begin prematurely. The crank thread
uses the `complete_all()` function to update the completion, which lets the flywheel
thread continue.'
id: totrans-467
prefs: []
type: TYPE_NORMAL
zh: 在后续的示例中启动了两个线程crank和flywheel。必须确保crank线程在flywheel线程之前启动。为这些线程中的每一个都建立了一个完成状态为crank和flywheel线程分别定义了不同的完成状态。在每个线程的退出点更新相应的完成状态flywheel线程使用`wait_for_completion`来确保它不会提前开始。crank线程使用`complete_all()`函数来更新完成状态这允许flywheel线程继续。
- en: So even though `flywheel_thread` is started first, you should notice when you
load this module and run `dmesg` that turning the crank always happens first,
because the flywheel thread waits for the crank thread to complete.
id: totrans-468
prefs: []
type: TYPE_NORMAL
zh: 因此,即使`flywheel_thread`首先启动,你应该注意当你加载此模块并运行`dmesg`时转动曲柄总是先发生因为flywheel线程等待crank线程完成。
- en: There are other variations of the `wait_for_completion` function, which include
timeouts or being interrupted, but this basic mechanism is enough for many common
situations without adding a lot of complexity.
id: totrans-469
prefs: []
type: TYPE_NORMAL
zh: '`wait_for_completion` 函数有其他变体,包括超时或被中断,但这个基本机制对于许多常见情况来说已经足够,无需增加太多复杂性。'
- en: '[PRE88]'
id: totrans-470
prefs: []
type: TYPE_PRE
zh: '[PRE88]'
- en: 12 Synchronization
id: totrans-471
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 12 同步
- en: If processes running on different CPUs or in different threads try to access
the same memory, then it is possible that strange things can happen or your system
can lock up. To avoid this, various types of mutual exclusion kernel functions
are available. These indicate if a section of code is "locked" or "unlocked" so
that simultaneous attempts to run it can not happen.
id: totrans-472
prefs: []
type: TYPE_NORMAL
zh: 如果在不同CPU上运行或在不同线程中运行的过程尝试访问相同的内存那么可能会发生奇怪的事情或者你的系统可能会锁定。为了避免这种情况有各种类型的互斥锁内核函数可用。这些函数指示代码的某个部分是“锁定”还是“未锁定”这样就不能同时尝试运行它。
- en: 12.1 Mutex
id: totrans-473
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 12.1 互斥锁
- en: You can use kernel mutexes (mutual exclusions) in much the same manner that
you might deploy them in userland. This may be all that is needed to avoid collisions
in most cases.
id: totrans-474
prefs: []
type: TYPE_NORMAL
zh: 你可以使用内核互斥锁(互斥排他)的方式,就像你可能在用户空间部署它们一样。在大多数情况下,这可能就足够避免冲突了。
- en: 'Mutexes in the Linux kernel enforce strict ownership: only the task that successfully
acquired the mutex can release (or unlock) it. Attempting to release a mutex held
by another task, or releasing a mutex that is not held, typically
leads to errors or undefined behavior. If a task tries to lock a mutex it already
holds, it deadlocks, because the task sleeps waiting for itself to release
the lock.'
id: totrans-475
prefs: []
type: TYPE_NORMAL
zh: Linux内核中的互斥锁强制执行严格的拥有权只有成功获取互斥锁的任务才能释放或解锁它。尝试释放另一个任务持有的互斥锁或同一任务多次释放未持有的互斥锁通常会导致错误或未定义的行为。如果任务尝试锁定它已经持有的互斥锁它可能会被阻塞或休眠此时任务等待自己释放锁。
- en: Before use, a mutex must be initialized through specific APIs, such as `mutex_init()`
at run time or the `DEFINE_MUTEX` macro for compile-time initialization. It
is prohibited to directly modify the internal structure of a mutex using a memory
manipulation function like `memset`.
id: totrans-476
prefs: []
type: TYPE_NORMAL
zh: 在使用之前必须通过特定的API如`mutex_init`)或使用`DEFINE_MUTEX`宏进行编译时初始化来初始化互斥锁。并且禁止使用如`memset`这样的内存操作函数直接修改互斥锁的内部结构。
- en: '[PRE89]'
id: totrans-477
prefs: []
type: TYPE_PRE
zh: '[PRE89]'
- en: The various suffixes appended to mutex functions in the Linux kernel primarily
dictate how a task waiting to acquire a lock will behave, particularly concerning
its interruptibility.
id: totrans-478
prefs: []
type: TYPE_NORMAL
zh: Linux内核中附加到互斥锁函数的各种后缀主要决定了等待获取锁的任务将如何行为特别是在可中断性方面。
- en: When a task calls `mutex_lock()` and the mutex is currently unavailable,
the task enters a sleep state until it can successfully obtain the lock. During
this period, the task cannot be interrupted. In contrast, functions with the `_interruptible`
suffix, such as `mutex_lock_interruptible()`, behave similarly to `mutex_lock()`
but allow the waiting process to be interrupted by signals. If a task receives
a signal (like a termination signal) while waiting for the lock, it will exit
the waiting state and return an error code (`-EINTR`). This is useful for applications
that need to handle external events even while waiting for a lock.
id: totrans-479
prefs: []
type: TYPE_NORMAL
zh: 当一个任务调用 `mutex_lock()` 时,如果互斥锁当前不可用,该任务将进入睡眠状态,直到它成功获得锁。在此期间,任务不能被中断。相比之下,带有
`_interruptible` 后缀的函数,例如 `mutex_lock_interruptible()`,其行为类似于 `mutex_lock()`,但允许等待进程被信号中断。如果一个任务在等待锁的过程中收到信号(如终止信号),它将退出等待状态并返回一个错误代码(`-EINTR`)。这对于需要即使在等待锁的同时处理外部事件的应用程序来说很有用。
- en: Beyond these fundamental locking behaviors, other mutex functions offer specialized
capabilities. Functions like `mutex_lock_nested()` and `mutex_lock_interruptible_nested()`
provide support for nested locking. They take a subclass parameter that the
kernel’s lock validator uses for more precise deadlock detection when a task
has to acquire more than one lock of the same class.
The latter variant combines nested locking with the ability for the waiting process
to be interrupted by signals. Another function is `mutex_trylock()`, which attempts
to acquire the mutex without blocking. It returns 1 if the lock is successfully
acquired and 0 if the mutex is already held by another task.
id: totrans-480
prefs: []
type: TYPE_NORMAL
zh: 除了这些基本的锁定行为之外,其他互斥锁函数还提供了专门的功能。例如,`mutex_lock_nested` 和 `mutex_lock_interruptible_nested()`
函数结合了 `__nested()` 功能,提供了嵌套锁定的支持。这种先前的锁定机制有助于管理锁获取并防止死锁,通常使用子类参数进行更精确的死锁检测。后一种变体将嵌套锁定与等待进程可以被信号中断的能力相结合。另一个函数是
`mutex_trylock()`它尝试获取互斥锁而不阻塞。如果成功获取锁则返回1如果互斥锁已被其他任务持有则返回0。
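- prefs: []
type: TYPE_NORMAL
en: |-
The try-then-skip pattern can be tried out in userspace with POSIX threads. This is a sketch under stated assumptions, not kernel code. `model_mutex_trylock` and `try_increment` are hypothetical names. Note that `pthread_mutex_trylock()` returns 0 on success, which is the opposite convention to the kernel’s `mutex_trylock()`, so the wrapper flips it to the 1-on-success, 0-on-contention convention described above.

```c
#include <pthread.h>

static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter;

/* Kernel-style return convention: 1 if the lock was acquired, 0 if not. */
static int model_mutex_trylock(pthread_mutex_t *m)
{
    return pthread_mutex_trylock(m) == 0;
}

/* Try to bump the counter without ever blocking. Returns 1 if the
 * update happened and 0 if the lock was busy and the work was skipped. */
static int try_increment(void)
{
    if (!model_mutex_trylock(&model_lock))
        return 0;  /* lock busy: the caller can retry later */
    counter++;
    pthread_mutex_unlock(&model_lock);
    return 1;
}
```

The caller decides what to do on a 0 return, for example retrying later or falling back to a blocking lock where sleeping is allowed.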
- en: Despite the fact that `mutex_trylock` does not sleep, it is still generally
not safe for use in interrupt context because its implementation isn’t atomic.
If an interrupt occurs between checking the lock’s availability and its acquisition,
this can lead to race conditions and potential data corruption.
id: totrans-481
prefs: []
type: TYPE_NORMAL
zh: 尽管`mutex_trylock`不睡眠,但由于其实现不是原子的,因此在中断上下文中通常不安全使用。如果在检查锁的可用性和获取锁之间发生中断,这可能导致竞争条件和潜在的数据损坏。
- en: 12.2 Spinlocks
id: totrans-482
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 12.2 自旋锁
- en: As the name suggests, spinlocks lock up the CPU that the code is running on,
taking 100% of its resources. Because of this you should only use the spinlock
mechanism around code which is likely to take no more than a few milliseconds
to run and so will not noticeably slow anything down from the user’s point of
view.
id: totrans-483
prefs: []
type: TYPE_NORMAL
zh: 如其名所示自旋锁锁定正在运行的代码的CPU占用其100%的资源。因此,你应该只在代码可能运行不超过几毫秒且不会明显减慢用户视角中的任何事物的情况下使用自旋锁机制。
- en: The example here is "irq safe" in that if interrupts happen during the lock
then they will not be forgotten and will activate when the unlock happens, using
the `flags` variable to retain their state.
id: totrans-484
prefs: []
type: TYPE_NORMAL
zh: 此处的示例是“中断安全”的,即如果在锁定过程中发生中断,则它们不会被遗忘,并在解锁时通过使用 `flags` 变量保留其状态激活。
- en: '[PRE90]'
id: totrans-485
prefs: []
type: TYPE_PRE
zh: '[PRE90]'
- en: Taking 100% of a CPU’s resources comes with greater responsibility. Situations
where the kernel code monopolizes a CPU are called atomic contexts. Holding a
spinlock is one of those situations. Sleeping in atomic contexts may leave the
system hanging, as the occupied CPU devotes 100% of its resources doing nothing
but sleeping. In some worse cases the system may crash. Thus, sleeping in atomic
contexts is considered a bug in the kernel. Some materials call such bugs
“sleep-in-atomic-context”.
id: totrans-486
prefs: []
type: TYPE_NORMAL
zh: 占用CPU的100%资源伴随着更大的责任。内核代码垄断CPU的情况被称为原子上下文。持有自旋锁就是这种情况之一。在原子上下文中睡眠可能会导致系统挂起因为占用的CPU将100%的资源用于无休止的睡眠。在某些更糟糕的情况下,系统可能会崩溃。因此,在原子上下文中睡眠被视为内核中的错误。在某些材料中,它们有时被称为“原子上下文中的睡眠”。
- en: Note that sleeping here is not limited to calling the sleep functions explicitly.
If subsequent function calls eventually invoke a function that sleeps, it is also
considered sleeping. Thus, it is important to pay attention to functions being
used in atomic context. There’s no documentation recording all such functions,
but code comments may help. Sometimes you may find comments in kernel source code
stating that a function “may sleep”, “might sleep”, or more explicitly “the caller
should not hold a spinlock”. Those comments are hints that a function may implicitly
sleep and must not be called in atomic contexts.
id: totrans-487
prefs: []
type: TYPE_NORMAL
zh: 注意,这里的睡眠不仅限于显式调用睡眠函数。如果后续的函数调用最终调用了会睡眠的函数,这也被认为是睡眠。因此,注意在原子上下文中使用的函数非常重要。没有文档记录所有这些函数,但代码注释可能会有所帮助。有时你可能会在内核源代码中找到注释,表明一个函数“可能会睡眠”、“可能睡眠”或更明确地说“调用者不应持有自旋锁”。这些注释是提示,表明一个函数可能会隐式睡眠,并且不应在原子上下文中调用。
- en: 'Now, let’s differentiate between a few types of spinlock functions in the Linux
kernel: `spin_lock()`, `spin_lock_irq()`, `spin_lock_irqsave()`, and `spin_lock_bh()`.'
id: totrans-488
prefs: []
type: TYPE_NORMAL
zh: 现在,让我们区分 Linux 内核中几种自旋锁函数的类型:`spin_lock()`、`spin_lock_irq()`、`spin_lock_irqsave()`
和 `spin_lock_bh()`。
- en: '`spin_lock()` busy-waits rather than sleeping while waiting for the lock, which
makes it suitable for most use cases where the critical section is short. However,
real-time Linux (PREEMPT_RT) is different, because in that configuration `spin_lock()`
behaves as a sleeping lock and so cannot be used where sleeping is forbidden.
For those cases, real-time Linux environments use `raw_spin_lock()`, which
behaves like a classic `spin_lock()` and never sleeps.'
id: totrans-489
prefs: []
type: TYPE_NORMAL
zh: '`spin_lock()` 在等待锁时忙等待而不是睡眠，这使得它适用于临界区较短的大多数用例。然而，实时 Linux（PREEMPT_RT）有所不同，因为在这种配置下 `spin_lock()`
表现为睡眠锁，因此不能用在禁止睡眠的地方。在这些情况下，实时 Linux 环境使用 `raw_spin_lock()`，它的行为与经典的 `spin_lock()` 相同，并且永远不会睡眠。'
- en: On the other hand, `spin_lock_irq()` disables interrupts while holding the lock,
but it does not save the interrupt state. This means that `spin_unlock_irq()`
re-enables interrupts unconditionally, even if they were already disabled before
the lock was taken. In contrast, `spin_lock_irqsave()`
disables interrupts and also saves the interrupt state, ensuring that interrupts
are restored to their previous state when the lock is released. This makes `spin_lock_irqsave()`
a safer option in scenarios where preserving the interrupt state is crucial.
id: totrans-490
prefs: []
type: TYPE_NORMAL
zh: 另一方面，`spin_lock_irq()` 在持有锁的同时禁用中断，但它不会保存中断状态。这意味着 `spin_unlock_irq()` 会无条件地重新启用中断，即使在获取锁之前中断就已经被禁用。相比之下，`spin_lock_irqsave()`
禁用中断并保存中断状态，确保在释放锁时中断恢复到其之前的状态。这使得 `spin_lock_irqsave()` 在必须保留中断状态的场景中成为更安全的选项。
- en: Next, `spin_lock_bh()` disables softirqs (software interrupts) on the local CPU but leaves hardware interrupts enabled. Unlike `spin_lock_irq()` and `spin_lock_irqsave()`, which keep both hardware and software interrupts from running, `spin_lock_bh()` is useful when the data is shared only with softirq context and hardware interrupts need to remain active.
id: totrans-491
prefs: []
type: TYPE_NORMAL
zh: 接下来,`spin_lock_bh()` 禁用软中断(软件中断),但允许硬件中断继续。与 `spin_lock_irq()` 和 `spin_lock_irqsave()`
不同,它们禁用硬件和软件中断,`spin_lock_bh()` 在需要保持硬件中断活跃时非常有用。
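- en: The difference between the `_irq` and `_irqsave` variants described above can be sketched with a toy userspace model. This is not kernel code and takes no real lock. The `irqs_enabled` flag and the `model_*` and `nested_*` functions are hypothetical stand-ins that only track whether interrupts would be enabled on the local CPU.
prefs: []
type: TYPE_NORMAL
- en: |-
```c
#include <stdbool.h>

/* Toy model: one flag stands in for the local CPU's interrupt-enable state. */
static bool irqs_enabled = true;

/* Models spin_lock_irq(): disable interrupts, remember nothing. */
static void model_lock_irq(void)
{
    irqs_enabled = false;
}

/* Models spin_unlock_irq(): unconditionally enable interrupts. */
static void model_unlock_irq(void)
{
    irqs_enabled = true;
}

/* Models spin_lock_irqsave(): save the old state, then disable. */
static void model_lock_irqsave(bool *flags)
{
    *flags = irqs_enabled;
    irqs_enabled = false;
}

/* Models spin_unlock_irqrestore(): put back whatever was saved. */
static void model_unlock_irqrestore(bool flags)
{
    irqs_enabled = flags;
}

/* The caller already runs with interrupts off, then uses the _irq pair. */
static bool nested_with_irq(void)
{
    irqs_enabled = false;      /* outer code disabled interrupts */
    model_lock_irq();
    model_unlock_irq();
    return irqs_enabled;       /* outer state was not preserved */
}

/* Same situation, but with the _irqsave/_irqrestore pair. */
static bool nested_with_irqsave(void)
{
    bool flags;

    irqs_enabled = false;      /* outer code disabled interrupts */
    model_lock_irqsave(&flags);
    model_unlock_irqrestore(flags);
    return irqs_enabled;       /* outer state is preserved */
}
```
prefs: []
type: TYPE_PRE
- en: In this model, `nested_with_irq()` leaves the flag set even though the caller had cleared it, while `nested_with_irqsave()` leaves it cleared. The same reasoning is why `spin_lock_irq()` is only appropriate when interrupts are known to be enabled on entry.
prefs: []
type: TYPE_NORMAL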
- en: 'For more information about spinlock usage and lock types, see the following
resources:'
id: totrans-492
prefs: []
type: TYPE_NORMAL
zh: 关于自旋锁的使用和锁类型的信息,请参阅以下资源:
- en: '[Lesson 1: Spin locks](https://www.kernel.org/doc/Documentation/locking/spinlocks.txt)'
id: totrans-493
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[课程 1自旋锁](https://www.kernel.org/doc/Documentation/locking/spinlocks.txt)'
- en: '[Lock types and their rules](https://docs.kernel.org/locking/locktypes.html)'
id: totrans-494
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[锁类型及其规则](https://docs.kernel.org/locking/locktypes.html)'
- en: 12.3 Read and write locks
id: totrans-495
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 12.3 读写锁
- en: Read and write locks are a specialised kind of spinlock that distinguishes readers from writers. Any number of readers may hold the lock at the same time, while a writer holds it exclusively. Like the earlier spinlock example, the one below shows an "irq safe" situation, so that functions triggered from IRQs which also read or write the same data cannot disrupt the logic. As before, it is a good idea to keep anything done within the lock as short as possible so that it does not hang up the system and cause users to start revolting against the tyranny of your module.
id: totrans-496
prefs: []
type: TYPE_NORMAL
zh: 读写锁是特殊的自旋锁,这样你可以独占地读取或写入某个东西。像之前的自旋锁示例一样,下面的示例展示了“中断安全”的情况,如果其他函数从中断触发,而这些中断也可能读取和写入你关心的东西,那么它们不会破坏逻辑。和之前一样,最好将锁内完成的任何操作尽可能保持简短,以免系统挂起并导致用户开始反抗你模块的暴政。
- en: '[PRE91]'
id: totrans-497
prefs: []
type: TYPE_PRE
zh: '[PRE91]'
- en: Of course, if you know for sure that there are no functions triggered by irqs
which could possibly interfere with your logic then you can use the simpler `read_lock(&myrwlock)`
and `read_unlock(&myrwlock)` or the corresponding write functions.
id: totrans-498
prefs: []
type: TYPE_NORMAL
zh: 当然,如果你确定没有由中断触发的功能可能会干扰你的逻辑,那么你可以使用更简单的 `read_lock(&myrwlock)` 和 `read_unlock(&myrwlock)`
或相应的写函数。
- en: 12.4 Atomic operations
id: totrans-499
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 12.4 原子操作
- en: 'If you are doing simple arithmetic: adding, subtracting or bitwise operations,
then there is another way in the multi-CPU and multi-hyperthreaded world to stop
other parts of the system from messing with your mojo. By using atomic operations
you can be confident that your addition, subtraction or bit flip did actually
happen and was not overwritten by some other shenanigans. An example is shown
below.'
id: totrans-500
prefs: []
type: TYPE_NORMAL
zh: 如果你正在进行简单的算术运算加法、减法或位操作那么在多CPU和多超线程的世界中还有另一种方法可以阻止系统的其他部分干扰你的操作。通过使用原子操作你可以确信你的加法、减法或位翻转确实发生了并且没有被其他一些恶作剧覆盖。以下是一个示例。
- en: '[PRE92]'
id: totrans-501
prefs: []
type: TYPE_PRE
zh: '[PRE92]'
- en: 'Before the C11 standard adopted built-in atomic types, the kernel already provided a small set of atomic types implemented with tricky architecture-specific code. Implementing the atomic types with C11 atomics might allow the kernel to throw away that architecture-specific code and make the kernel code friendlier to people who understand the standard. But there are some problems, for example the memory model of the kernel does not match the model defined by C11 atomics. For further details, see:'
id: totrans-502
prefs: []
type: TYPE_NORMAL
zh: 在C11标准采用内置原子类型之前内核已经通过使用一些复杂的架构特定代码提供了一小套原子类型。通过C11原子操作实现原子类型可能允许内核丢弃架构特定代码并使内核代码对理解标准的人更加友好。但是存在一些问题例如内核的内存模型与C11原子操作形成的模型不匹配。有关更多详细信息请参阅
- en: '[kernel documentation of atomic types](https://www.kernel.org/doc/Documentation/atomic_t.txt)'
id: totrans-503
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[原子类型内核文档](https://www.kernel.org/doc/Documentation/atomic_t.txt)'
- en: '[Time to move to C11 atomics?](https://lwn.net/Articles/691128/)'
id: totrans-504
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[是时候迁移到C11原子操作了吗](https://lwn.net/Articles/691128/)'
- en: '[Atomic usage patterns in the kernel](https://lwn.net/Articles/698315/)'
id: totrans-505
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '[内核中的原子使用模式](https://lwn.net/Articles/698315/)'
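- en: For comparison, here is a minimal userspace sketch of the same kind of operations written with C11 `<stdatomic.h>`. It is only an analogue, since the kernel uses its own `atomic_t` API and memory model, and the function names `counter_demo()` and `bit_demo()` are made up for illustration. The comments name the kernel helper that each call roughly corresponds to.
prefs: []
type: TYPE_NORMAL
- en: |-
```c
#include <stdatomic.h>

/* Userspace C11 stand-ins for a kernel atomic_t counter and a flag word. */
static atomic_int counter;
static atomic_uint flags;

/* Add, subtract and increment; returns the final counter value. */
static int counter_demo(void)
{
    atomic_store(&counter, 50);
    atomic_fetch_add(&counter, 7);  /* roughly atomic_add(7, &v) */
    atomic_fetch_sub(&counter, 2);  /* roughly atomic_sub(2, &v) */
    atomic_fetch_add(&counter, 1);  /* roughly atomic_inc(&v)    */
    return atomic_load(&counter);   /* roughly atomic_read(&v)   */
}

/* Set, clear and toggle single bits; returns the final flag word. */
static unsigned int bit_demo(void)
{
    atomic_store(&flags, 0u);
    atomic_fetch_or(&flags, 1u << 3);      /* roughly set_bit(3, &word)    */
    atomic_fetch_or(&flags, 1u << 0);      /* roughly set_bit(0, &word)    */
    atomic_fetch_and(&flags, ~(1u << 0));  /* roughly clear_bit(0, &word)  */
    atomic_fetch_xor(&flags, 1u << 5);     /* roughly change_bit(5, &word) */
    return atomic_load(&flags);
}
```
prefs: []
type: TYPE_PRE
- en: Each call is a single indivisible read-modify-write, so concurrent callers cannot lose an update. The default C11 ordering used here is sequentially consistent, which is one of the places where the C11 model and the kernel model differ.
prefs: []
type: TYPE_NORMAL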
- en: 13 Replacing Print Macros
id: totrans-506
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 13 替换打印宏
- en: 13.1 Replacement
id: totrans-507
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 13.1 替换
- en: In [Section 1.7](#before-we-begin), it was noted that the X Window System and kernel module programming do not mix well. That remains true while developing kernel modules. In practice, however, you may need to send messages to the tty (teletype) from which the module load command was issued.
id: totrans-508
prefs: []
type: TYPE_NORMAL
zh: 在[第1.7节](#before-we-begin)中指出X窗口系统和内核模块编程不利于集成。这在内核模块开发期间仍然有效。然而在实际场景中有必要将消息传递到产生模块加载命令的tty电传打字机中。
- en: The term “tty” originates from teletype, which initially referred to a combined
keyboard-printer for Unix system communication. Today, it signifies a text stream
abstraction employed by Unix programs, encompassing physical terminals, xterms
in X displays, and network connections like SSH.
id: totrans-509
prefs: []
type: TYPE_NORMAL
zh: “tty”这个术语起源于电传打字机最初指的是Unix系统通信的键盘打印机组合。今天它表示Unix程序使用的文本流抽象包括物理终端、X显示中的xterms以及SSH等网络连接。
- en: To achieve this, the “current” pointer is used to access the tty structure of the currently running task. Within this structure lies a pointer to a string write function, which is used to send the string to the tty.
id: totrans-510
prefs: []
type: TYPE_NORMAL
zh: 为了实现这一点利用“当前”指针来访问活动任务的tty结构。在这个结构中有一个指向字符串写函数的指针它有助于将字符串传输到tty。
- en: '[PRE93]'
id: totrans-511
prefs: []
type: TYPE_PRE
zh: '[PRE93]'
- en: 13.2 Flashing keyboard LEDs
id: totrans-512
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 13.2 闪烁键盘LED
- en: 'In certain conditions, you may desire a simpler and more direct way to communicate to the external world. Flashing keyboard LEDs can be such a solution: it is an immediate way to attract attention or to display a status condition. Keyboard LEDs are present on almost all hardware, they are always visible, they do not need any setup, and their use is rather simple and non-intrusive compared to writing to a tty or a file.'
id: totrans-513
prefs: []
type: TYPE_NORMAL
zh: 在某些条件下你可能希望有一种更简单、更直接的方式与外部世界通信。闪烁键盘LED可以是一个解决方案这是一种立即吸引注意或显示状态条件的方法。键盘LED存在于每个硬件上它们总是可见的不需要任何设置并且与写入tty或文件相比它们的使用相当简单且不具侵入性。
- en: 'From v4.14 to v4.15, the timer API went through a series of changes to improve memory safety. A buffer overflow in the area of a `timer_list` structure may be able to overwrite the `function` and `data` fields, providing the attacker with a way to use return-oriented programming (ROP) to call arbitrary functions within the kernel. Also, because the callback prototype takes an `unsigned long` argument, the compiler cannot type-check the value passed to it. Furthermore, such a prototype may be an obstacle to the forward-edge protection of control-flow integrity, so it is better to give timer callbacks a unique prototype that sets them apart from the many other functions taking an `unsigned long` argument. The timer callback is therefore passed a pointer to the `timer_list` structure rather than an `unsigned long` value. The caller embeds the `timer_list` structure in a larger structure that holds everything the callback needs, and the callback uses the `container_of` macro to recover that larger structure. For more information, see: [Improving the kernel timers API](https://lwn.net/Articles/735887/).'
id: totrans-514
prefs: []
type: TYPE_NORMAL
zh: 从 v4.14 到 v4.15,定时器 API 进行了一系列更改,以提高内存安全性。`timer_list` 结构区域中的缓冲区溢出可能会覆盖 `function`
和 `data` 字段为攻击者提供使用返回导向编程ROP在内核中调用任意函数的方法。此外包含 `unsigned long` 参数的回调函数原型将阻止编译器执行类型检查。此外,具有
`unsigned long` 参数的函数原型可能成为控制流完整性的前向保护障碍。因此,最好使用独特的原型来与接受 `unsigned long` 参数的簇分开。定时器回调应该传递
`timer_list` 结构的指针而不是 `unsigned long` 参数。然后,它将回调所需的所有信息,包括 `timer_list` 结构,封装到一个更大的结构中,并且可以使用
`container_of` 宏而不是 `unsigned long` 值。有关更多信息,请参阅:[改进内核定时器 API](https://lwn.net/Articles/735887/)。
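- en: The `container_of` pattern described above can be tried out in userspace. The sketch below is a miniature built on assumptions, where `struct fake_timer`, `struct blink_state`, `blink_cb()` and `fire_once()` are hypothetical names and the macro is a simplified version of the kernel one without its type checks.
prefs: []
type: TYPE_NORMAL
- en: |-
```c
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the new-style timer with a typed callback. */
struct fake_timer {
    void (*function)(struct fake_timer *t);
};

/* Driver-private data with the timer embedded in it. */
struct blink_state {
    int led_on;
    struct fake_timer timer;
};

/* The callback gets the timer pointer and recovers the enclosing struct. */
static void blink_cb(struct fake_timer *t)
{
    struct blink_state *st = container_of(t, struct blink_state, timer);

    st->led_on = !st->led_on;
}

/* Fires the callback once, as a timer core would, and reports the LED state. */
static int fire_once(int initial)
{
    struct blink_state st = { .led_on = initial };

    st.timer.function = blink_cb;
    st.timer.function(&st.timer);
    return st.led_on;
}
```
prefs: []
type: TYPE_PRE
- en: Because the callback receives a typed pointer instead of an `unsigned long`, passing the wrong kind of object becomes a compile-time error rather than a silent cast.
prefs: []
type: TYPE_NORMAL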
- en: 'Before Linux v4.14, `setup_timer` was used to initialize the timer and the
`timer_list` structure looked like:'
id: totrans-515
prefs: []
type: TYPE_NORMAL
zh: 在 Linux v4.14 之前,`setup_timer` 用于初始化定时器和 `timer_list` 结构看起来如下:
- en: '[PRE94]'
id: totrans-516
prefs: []
type: TYPE_PRE
zh: '[PRE94]'
- en: Since Linux v4.14, `timer_setup` has been available, and the kernel was converted step by step from `setup_timer` to `timer_setup`. The new interface was given a different name so that it could coexist with the old one during the transition. Moreover, `timer_setup` was at first implemented on top of `setup_timer`.
id: totrans-517
prefs: []
type: TYPE_NORMAL
zh: 自从 Linux v4.14 以来,`timer_setup` 被采用,内核逐步从 `setup_timer` 转换到 `timer_setup`。API
变更的原因之一是它需要与旧版本的接口共存。此外,`timer_setup` 最初是由 `setup_timer` 实现的。
- en: '[PRE95]'
id: totrans-518
prefs: []
type: TYPE_PRE
zh: '[PRE95]'
- en: The `setup_timer` function was then removed in v4.15. As a result, the `timer_list` structure changed to the following.
id: totrans-519
prefs: []
type: TYPE_NORMAL
zh: 从 v4.15 版本开始,`setup_timer` 被移除。因此,`timer_list` 结构发生了以下变化。
- en: '[PRE96]'
id: totrans-520
prefs: []
type: TYPE_PRE
zh: '[PRE96]'
- en: The following source code illustrates a minimal kernel module which, when loaded,
starts blinking the keyboard LEDs until it is unloaded.
id: totrans-521
prefs: []
type: TYPE_NORMAL
zh: 以下源代码演示了一个最小的内核模块,当加载时,它会闪烁键盘 LED直到卸载。
- en: '[PRE97]'
id: totrans-522
prefs: []
type: TYPE_PRE
zh: '[PRE97]'
- en: If none of the examples in this chapter fit your debugging needs, there might yet be some other tricks to try. Ever wondered what `CONFIG_DEBUG_LL` in `make menuconfig` is good for? If you activate that you get low level access to the serial port. While this might not sound very powerful by itself, you can patch [kernel/printk.c](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/kernel/printk.c) or any other essential syscall to print ASCII characters, thus making it possible to trace virtually everything your code does over a serial line. If you find yourself porting the kernel to some new and formerly unsupported architecture, this is usually amongst the first things that should be implemented. Logging over a netconsole might also be worth a try.
id: totrans-523
prefs: []
type: TYPE_NORMAL
zh: 如果本章中的任何示例都不符合你的调试需求,可能还有一些其他技巧可以尝试。你是否想过 `make menuconfig` 中的 `CONFIG_LL_DEBUG`
是什么作用?如果你激活它,你将获得对串行端口的低级访问。虽然这本身可能听起来并不强大,但你可以在 [kernel/printk.c](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/kernel/printk.c)
或任何其他基本系统调用中打补丁,以打印 ASCII 字符,从而使你能够在串行线上追踪代码执行的几乎所有内容。如果你发现自己正在将内核移植到某些新的、以前不支持的平台,这通常是应该首先实现的事情之一。尝试通过
netconsole 进行日志记录也可能值得尝试。
- en: While you have seen lots of stuff that can be used to aid debugging here, there
are some things to be aware of. Debugging is almost always intrusive. Adding debug
code can change the situation enough to make the bug seem to disappear. Thus,
you should keep debug code to a minimum and make sure it does not show up in production
code.
id: totrans-524
prefs: []
type: TYPE_NORMAL
zh: 虽然在这里你已经看到了很多可以用来辅助调试的内容,但还有一些事情需要注意。调试几乎总是具有侵入性。添加调试代码可能会改变足够多的环境,使得错误看起来似乎消失了。因此,你应该将调试代码保持在最小,并确保它不会出现在生产代码中。
- en: 14 GPIO
id: totrans-525
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 14 GPIO
- en: 14.1 GPIO
id: totrans-526
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 14.1 GPIO
- en: 'General Purpose Input/Output (GPIO) appears on the development board as pins.
It acts as a bridge for communication between the development board and external
devices. You can think of it like a switch: users can turn it on or off (Input),
and the development board can also turn it on or off (Output).'
id: totrans-527
prefs: []
type: TYPE_NORMAL
zh: 通用输入/输出GPIO在开发板上表现为引脚。它作为开发板与外部设备之间通信的桥梁。你可以将其想象成一个开关用户可以打开或关闭输入开发板也可以打开或关闭输出
- en: To implement a GPIO device driver, you use the `gpio_request()` function to
enable a specific GPIO pin. After successfully enabling it, you can check that
the pin is being used by looking at /sys/kernel/debug/gpio.
id: totrans-528
prefs: []
type: TYPE_NORMAL
zh: 要实现GPIO设备驱动程序你使用`gpio_request()`函数来启用一个特定的GPIO引脚。启用成功后你可以通过查看/sys/kernel/debug/gpio来检查该引脚是否正在使用。
- en: '[PRE98]'
id: totrans-529
prefs: []
type: TYPE_PRE
zh: '[PRE98]'
- en: There are other ways to register GPIOs. For example, you can use `gpio_request_one()` to register a GPIO while setting its direction (input or output) and initial state at the same time. You can also use `gpio_request_array()` to register multiple GPIOs at once. However, note that `gpio_request_array()` was removed in Linux v6.10.
id: totrans-530
prefs: []
type: TYPE_NORMAL
zh: 有其他方法可以注册GPIO。例如你可以在设置其方向输入或输出和初始状态的同时使用`gpio_request_one()`来注册一个GPIO。你也可以使用`gpio_request_array()`一次性注册多个GPIO。但是请注意`gpio_request_array()`自Linux
v6.10以来已被删除。
- en: When using GPIO, you must set it as either output with `gpio_direction_output()` or input with `gpio_direction_input()`.
id: totrans-531
prefs: []
type: TYPE_NORMAL
zh: 当使用GPIO时你必须使用`gpio_direction_output()`将其设置为输出,或使用`gpio_direction_input()`将其设置为输入。
- en: When the GPIO is set as output, you can use `gpio_set_value()` to drive it to a high or low voltage.
id: totrans-532
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 当GPIO设置为输出时你可以使用`gpio_set_value()`来选择将其设置为高电压或低电压。
- en: When the GPIO is set as input, you can use `gpio_get_value()` to read whether the voltage is high or low.
id: totrans-533
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 当GPIO设置为输入时你可以使用`gpio_get_value()`来读取电压是高还是低。
- en: 14.2 Control the LEDs on/off state
id: totrans-534
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 14.2 控制LED的开关状态
- en: In [Section 9](#talking-to-device-files), we learned how to communicate with device files. Building on that, we will use a device file to switch an LED on and off.
id: totrans-535
prefs: []
type: TYPE_NORMAL
zh: 在[第9节](#talking-to-device-files)中我们学习了如何与设备文件通信。因此我们将进一步使用设备文件来控制LED的开关。
- en: In the implementation, a pull-down resistor is used. The anode of the LED is
connected to GPIO4, and the cathode is connected to GND. For more details about
the Raspberry Pi pin assignments, refer to [Raspberry Pi Pinout](https://pinout.xyz/).
The materials used include a Raspberry Pi 5, an LED, jumper wires, and a 220Ω
resistor.
id: totrans-536
prefs: []
type: TYPE_NORMAL
zh: 在实现中使用了一个下拉电阻。LED的正极连接到GPIO4负极连接到GND。有关Raspberry Pi引脚分配的更多详细信息请参阅[Raspberry
Pi引脚分配](https://pinout.xyz/)。所使用的材料包括Raspberry Pi 5、LED、跳线和220Ω电阻。
- en: '[PRE99]'
id: totrans-537
prefs: []
type: TYPE_PRE
zh: '[PRE99]'
- en: 'Make and install the module:'
id: totrans-538
prefs: []
type: TYPE_NORMAL
zh: 创建并安装模块:
- en: '[PRE100]'
id: totrans-539
prefs: []
type: TYPE_PRE
zh: '[PRE100]'
- en: 'Switch on the LED:'
id: totrans-540
prefs: []
type: TYPE_NORMAL
zh: 打开LED
- en: '[PRE101]'
id: totrans-541
prefs: []
type: TYPE_PRE
zh: '[PRE101]'
- en: 'Switch off the LED:'
id: totrans-542
prefs: []
type: TYPE_NORMAL
zh: 关闭LED
- en: '[PRE102]'
id: totrans-543
prefs: []
type: TYPE_PRE
zh: '[PRE102]'
- en: 'Finally, remove the module:'
id: totrans-544
prefs: []
type: TYPE_NORMAL
zh: 最后,移除模块:
- en: '[PRE103]'
id: totrans-545
prefs: []
type: TYPE_PRE
zh: '[PRE103]'
- en: 14.3 DHT11 sensor
id: totrans-546
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 14.3 DHT11传感器
- en: The DHT11 sensor is a well-known entry-level sensor commonly used to measure humidity and temperature. In this subsection, we will use GPIO to communicate with it through a single data line. The DHT11 communication protocol is described in the [datasheet](https://www.mouser.com/datasheet/2/758/DHT11-Technical-Data-Sheet-Translated-Version-1143054.pdf?srsltid=AfmBOoppls-QTd864640bVtbK90sWBsFzJ_7SgjOD2EpwuLLGUSTyYnv).
id: totrans-547
prefs: []
type: TYPE_NORMAL
zh: DHT11传感器是一种常见的入门级传感器常用于测量湿度和温度。在本小节中我们将使用GPIO通过单条数据线进行通信。DHT11通信协议可参考[数据表](https://www.mouser.com/datasheet/2/758/DHT11-Technical-Data-Sheet-Translated-Version-1143054.pdf?srsltid=AfmBOoppls-QTd864640bVtbK90sWBsFzJ_7SgjOD2EpwuLLGUSTyYnv)。
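- en: According to the datasheet, the sensor answers with a 40-bit frame made of humidity integer and decimal bytes, temperature integer and decimal bytes, and a checksum byte equal to the low 8 bits of the sum of the first four. A minimal sketch of the validation step follows. It is plain C that can be tried in userspace, and `struct dht11_reading` and `dht11_decode()` are illustrative names, not taken from the module in this guide.
prefs: []
type: TYPE_NORMAL
- en: |-
```c
#include <stdint.h>

/* Decoded reading; the field names are illustrative. */
struct dht11_reading {
    uint8_t humidity_int;
    uint8_t humidity_dec;
    uint8_t temp_int;
    uint8_t temp_dec;
};

/*
 * Validates a 5-byte DHT11 frame and copies its fields into *out.
 * Returns 0 on success and -1 when the checksum does not match.
 */
static int dht11_decode(const uint8_t frame[5], struct dht11_reading *out)
{
    /* The checksum is the low 8 bits of the sum of the four data bytes. */
    uint8_t sum = (uint8_t)(frame[0] + frame[1] + frame[2] + frame[3]);

    if (sum != frame[4])
        return -1;

    out->humidity_int = frame[0];
    out->humidity_dec = frame[1];
    out->temp_int = frame[2];
    out->temp_dec = frame[3];
    return 0;
}
```
prefs: []
type: TYPE_PRE
- en: Rejecting frames with a bad checksum matters in practice, because the bits are timed in software and a delayed sample can flip a bit.
prefs: []
type: TYPE_NORMAL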
- en: In the implementation, the data pin of the DHT11 sensor is connected to GPIO4 on the Raspberry Pi. The VCC and GND pins of the sensor are connected to 3.3V and GND, respectively. For more details about the Raspberry Pi pin assignments, refer to [Raspberry Pi Pinout](https://pinout.xyz/). The materials used include a Raspberry Pi 5, a DHT11 sensor, and jumper wires.
id: totrans-548
prefs: []
type: TYPE_NORMAL
zh: 在实现中DHT11传感器的数据引脚连接到Raspberry Pi的GPIO4。传感器的VCC和GND引脚分别连接到3.3V和GND。有关Raspberry
Pi引脚分配的更多详细信息请参阅[Raspberry Pi引脚分配](https://pinout.xyz/)。所使用的材料包括Raspberry Pi
5、DHT11传感器和跳线。
- en: '[PRE104]'
id: totrans-549
prefs: []
type: TYPE_PRE
zh: '[PRE104]'
- en: 'Make and install the module:'
id: totrans-550
prefs: []
type: TYPE_NORMAL
zh: 创建并安装模块:
- en: '[PRE105]'
id: totrans-551
prefs: []
type: TYPE_PRE
zh: '[PRE105]'
- en: 'Check the Output of the DHT11 Sensor:'
id: totrans-552
prefs: []
type: TYPE_NORMAL
zh: 检查DHT11传感器的输出
- en: '[PRE106]'
id: totrans-553
prefs: []
type: TYPE_PRE
zh: '[PRE106]'
- en: 'Expected Output:'
id: totrans-554
prefs: []
type: TYPE_NORMAL
zh: 预期输出:
- en: '[PRE107]'
id: totrans-555
prefs: []
type: TYPE_PRE
zh: '[PRE107]'
- en: 'Finally, remove the module:'
id: totrans-556
prefs: []
type: TYPE_NORMAL
zh: 最后,移除模块:
- en: '[PRE108]'
id: totrans-557
prefs: []
type: TYPE_PRE
zh: '[PRE108]'
- en: 15 Scheduling Tasks
id: totrans-558
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 15 调度任务
- en: 'There are two main ways of running tasks: tasklets and work queues. Tasklets are a quick and easy way of scheduling a single function to be run, for example when triggered from an interrupt, whereas work queues are more complicated but also better suited to running multiple things in a sequence.'
id: totrans-559
prefs: []
type: TYPE_NORMAL
zh: 运行任务主要有两种方式:任务和作业队列。任务是一种快速简便的方式来安排单个函数的执行。例如,当从中断触发时,而作业队列则更复杂,但更适合按顺序运行多个任务。
- en: It is possible that in future tasklets may be replaced by threaded IRQs. However, discussion about that has been ongoing since 2007 ([Eliminating tasklets](https://lwn.net/Articles/239633) and [The end of tasklets](https://lwn.net/Articles/960041/)), so expecting immediate changes would be unwise. See [Section 16.1](#interrupt-handlers1) for alternatives that avoid the tasklet debate.
id: totrans-560
prefs: []
type: TYPE_NORMAL
zh: 未来任务可能会被线程化中断所取代。然而关于这一问题的讨论自2007年以来一直在进行[消除任务](https://lwn.net/Articles/239633)
和 [任务结束](https://lwn.net/Articles/960041/)),因此期望立即发生变化是不明智的。有关避免任务辩论的替代方案,请参阅[第16.1节](#interrupt-handlers1)。
- en: 15.1 Tasklets
id: totrans-561
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 15.1 任务
- en: Here is an example tasklet module. The `tasklet_fn` function runs for a few
seconds. In the meantime, execution of the `example_tasklet_init` function may
continue to the exit point, depending on whether it is interrupted by softirq.
id: totrans-562
prefs: []
type: TYPE_NORMAL
zh: 这里有一个示例任务模块。`tasklet_fn` 函数运行几秒钟。在此期间,`example_tasklet_init` 函数的执行可能会继续到退出点,具体取决于它是否被软中断中断。
- en: '[PRE109]'
id: totrans-563
prefs: []
type: TYPE_PRE
zh: '[PRE109]'
- en: 'So with this example loaded `dmesg` should show:'
id: totrans-564
prefs: []
type: TYPE_NORMAL
zh: 因此,加载此示例后,`dmesg` 应该会显示:
- en: '[PRE110]'
id: totrans-565
prefs: []
type: TYPE_PRE
zh: '[PRE110]'
- en: Although tasklets are easy to use, they come with several drawbacks, and developers have been discussing their removal from the Linux kernel. The tasklet callback runs in atomic context, inside a software interrupt, meaning that it cannot sleep or access user-space data, so not all work can be done in a tasklet handler. Also, the kernel allows only one instance of any given tasklet to run at any given time, although different tasklets can run in parallel on different CPUs.
id: totrans-566
prefs: []
type: TYPE_NORMAL
zh: 虽然任务使用起来很简单但它存在一些缺点开发者一直在讨论将其从Linux内核中移除。任务回调在原子上下文中运行在软件中断内部这意味着它不能休眠或访问用户空间数据因此并非所有工作都可以在任务处理程序中完成。此外内核只允许在任何给定时间运行任何给定任务的实例多个不同的任务回调可以并行运行。
- en: In recent kernels, tasklets can be replaced by workqueues, timers, or threaded
interrupts. [²](#fn2x0) While the removal of tasklets remains a longer-term goal,
the current kernel contains more than a hundred uses of tasklets. Now developers
are proceeding with the API changes and the macro `DECLARE_TASKLET_OLD` exists
for compatibility. For further information, see [https://lwn.net/Articles/830964/](https://lwn.net/Articles/830964/).
id: totrans-567
prefs: []
type: TYPE_NORMAL
zh: 在最近的内核中,任务可以被工作队列、定时器或线程化中断所取代。[²](#fn2x0) 虽然移除任务仍然是长期目标但当前内核中包含超过一百个任务的使用。现在开发者正在推进API更改并存在宏`DECLARE_TASKLET_OLD`以实现兼容性。有关更多信息,请参阅[https://lwn.net/Articles/830964/](https://lwn.net/Articles/830964/)。
- en: 15.2 Work queues
id: totrans-568
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 15.2 作业队列
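- en: To add a task to the scheduler we can use a workqueue. Work items in the queue are executed by kernel worker threads, which are scheduled like ordinary tasks by the default scheduler (the Completely Fair Scheduler, CFS, or its successor EEVDF since Linux v6.6).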
id: totrans-569
prefs: []
type: TYPE_NORMAL
zh: 要将任务添加到调度器我们可以使用工作队列。内核随后使用完全公平调度器CFS在队列中执行工作。
- en: '[PRE111]'
id: totrans-570
prefs: []
type: TYPE_PRE
zh: '[PRE111]'
- en: 16 Interrupt Handlers
id: totrans-571
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 16 中断处理程序
- en: 16.1 Interrupt Handlers
id: totrans-572
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 16.1 中断处理程序
- en: Except for the previous chapter, everything we have done in the kernel so far has been a response to a process asking for it, either by dealing with a special file, sending an `ioctl()`, or issuing a system call. But the job of the kernel is not just to respond to process requests. Another job, which is every bit as important, is to speak to the hardware connected to the machine.
id: totrans-573
prefs: []
type: TYPE_NORMAL
zh: 除了最后一章之外,到目前为止我们在内核中所做的一切都是作为对进程请求的响应而进行的,无论是通过处理特殊文件、发送`ioctl()`还是发出系统调用。但内核的工作不仅仅是响应进程请求。另一个同样重要的任务是与连接到机器的硬件进行通信。
- en: There are two types of interaction between the CPU and the rest of the hardware in the computer. The first type is when the CPU gives orders to the hardware, the other is when the hardware needs to tell the CPU something. The second, called interrupts, is much harder to implement because it has to be dealt with when convenient for the hardware, not the CPU. Hardware devices typically have a very small amount of RAM, and if you do not read their information when available, it is lost.
id: totrans-574
prefs: []
type: TYPE_NORMAL
zh: CPU与计算机其他硬件之间的交互有两种类型。第一种类型是CPU向硬件下达命令另一种类型是硬件需要通知CPU某些信息。第二种称为中断由于它必须在硬件方便的时候处理而不是CPU方便的时候因此实现起来更加困难。硬件设备通常只有很少的RAM如果你不在可用时读取它们的信息这些信息就会丢失。
- en: Under Linux, hardware interrupts are called IRQs (Interrupt ReQuests). There
are two types of IRQs, short and long. A short IRQ is one which is expected to
take a very short period of time, during which the rest of the machine will be
blocked and no other interrupts will be handled. A long IRQ is one which can take
longer, and during which other interrupts may occur (but not interrupts from the
same device). If at all possible, it is better to declare an interrupt handler
to be long.
id: totrans-575
prefs: []
type: TYPE_NORMAL
zh: 在Linux中硬件中断被称为IRQ中断请求。有两种类型的中断请求即短中断和长中断。短中断是指预期在非常短的时间内完成的中断在此期间机器的其他部分将被阻塞不会处理其他中断。长中断是指可能需要较长时间的中断在此期间可能会发生其他中断但不是来自同一设备的中断。如果可能的话最好声明一个长中断处理程序。
- en: When the CPU receives an interrupt, it stops whatever it is doing (unless it is processing a more important interrupt, in which case it will deal with this one only when the more important one is done), saves certain parameters on the stack and calls the interrupt handler. This means that certain things are not allowed in the interrupt handler itself, because the system is in an unknown state. The Linux kernel solves the problem by splitting interrupt handling into two parts. The first part executes right away and masks the interrupt line. Hardware interrupts must be handled quickly, and that is why we need the second part to handle the heavy work deferred from an interrupt handler. Historically, BH (the Linux name for Bottom Halves) kept track of the deferred functions in a statically allocated list. Softirqs and their higher level abstraction, tasklets, have replaced BH since Linux 2.3.
id: totrans-576
prefs: []
type: TYPE_NORMAL
zh: 当CPU接收到中断时它会停止正在执行的操作除非它正在处理一个更重要的中断在这种情况下它只会在此更重要的中断完成后处理此中断在堆栈上保存某些参数并调用中断处理程序。这意味着在中断处理程序本身中不允许某些操作因为系统处于未知状态。Linux内核通过将中断处理分为两部分来解决此问题。第一部分立即执行并屏蔽中断线。硬件中断必须快速处理这就是为什么我们需要第二部分来处理从中断处理程序中延迟的重工作。从历史上看BHLinux对下半部分的命名统计记录了延迟函数。自Linux
2.3以来Softirq及其高级抽象Tasklet取代了BH。
- en: The way to implement this is to call `request_irq()` to get your interrupt handler
called when the relevant IRQ is received.
id: totrans-577
prefs: []
type: TYPE_NORMAL
zh: 实现这一功能的方法是调用 `request_irq()` 以便在接收到相关中断请求IRQ时调用你的中断处理程序。
- en: In practice IRQ handling can be a bit more complex. Hardware is often designed in a way that chains two interrupt controllers, so that all the IRQs from interrupt controller B are cascaded to a certain IRQ from interrupt controller A. Of course, that requires the kernel to find out afterwards which IRQ it really was, and that adds overhead. Other architectures offer special, very low overhead interrupts, so called "fast IRQs" or FIQs. Taking advantage of them requires handlers to be written in assembly language, so they do not really fit into the kernel. They can be made to work similarly to the others, but after that procedure, they are no longer any faster than "common" IRQs. SMP enabled kernels running on systems with more than one processor need to solve another truckload of problems. It is not enough to know that a certain IRQ has happened, it is also important to know which CPU(s) it was for. Readers still interested in more details might want to look up "APIC" now.
id: totrans-578
prefs: []
type: TYPE_NORMAL
zh: 实际上中断处理可能要复杂一些。硬件通常设计成将两个中断控制器串联起来这样中断控制器B的所有中断请求都会级联到中断控制器A的某个中断请求。当然这要求内核在之后找出实际是哪个中断这会增加开销。其他架构提供了一些特殊、低开销的所谓“快速中断”或FIQs。要利用它们需要用汇编语言编写处理程序因此它们实际上并不适合内核。它们可以被配置得与其他处理程序类似但在此过程之后它们不再比“普通”中断更快。在具有多个处理器的系统上运行的启用SMP的内核需要解决另一堆问题。仅仅知道某个中断请求是否发生是不够的还重要的是要知道它针对的是哪个CPU。对更多细节感兴趣的人现在可能想参考“APIC”。
- en: The `request_irq()` function receives the IRQ number, the handler function, flags, a name for /proc/interrupts and a parameter to be passed to the interrupt handler. Usually there is a certain number of IRQs available. How many IRQs there are is hardware-dependent.
id: totrans-579
prefs: []
type: TYPE_NORMAL
zh: 此函数接收中断号、函数名称、标志、/proc/interrupts的名称以及传递给中断处理程序的参数。通常有特定数量的中断请求可用。中断请求的数量取决于硬件。
- en: The flags can be used to specify behaviors of the IRQ. For example, use `IRQF_SHARED` to indicate you are willing to share the IRQ with other interrupt handlers (usually because a number of hardware devices sit on the same IRQ), or use `IRQF_ONESHOT` to indicate that the IRQ is not re-enabled after the hard interrupt handler finishes but stays masked until the threaded handler has run. It should be noted that in some materials, you may encounter another set of IRQ flags named with the `SA` prefix, for example `SA_SHIRQ` and `SA_INTERRUPT`. Those are the IRQ flags of older kernels. They have been removed completely, and today only the `IRQF` flags are in use. This function will only succeed if there is not already a handler on this IRQ, or if you are both willing to share.
id: totrans-580
prefs: []
type: TYPE_NORMAL
zh: 标志可以用来指定中断的行为。例如,使用`IRQF_SHARED`来表示你愿意与其他中断处理程序共享中断(通常是因为多个硬件设备位于同一中断上);使用`IRQF_ONESHOT`来表示处理程序完成后不重新启用中断。需要注意的是,在某些材料中,你可能会遇到另一组带有`SA`前缀的中断标志。例如,`SA_SHIRQ`和`SA_INTERRUPT`。这些是旧内核中的中断标志。它们已经被完全删除。今天只使用`IRQF`标志。此函数只有在当前中断上没有处理程序,或者你愿意共享的情况下才会成功。
- en: 16.2 Detecting button presses
id: totrans-581
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 16.2 检测按钮按下
- en: Many popular single board computers, such as Raspberry Pi or Beagleboards, have a bunch of GPIO pins. Attaching buttons to those and then having a button press do something is a classic case in which you might need to use interrupts. Instead of having the CPU waste time and battery power polling for a change in input state, it is better for the input to trigger the CPU to run a particular handling function.
id: totrans-582
prefs: []
type: TYPE_NORMAL
zh: 许多流行的单板计算机如树莓派Raspberry Pi或贝格尔板Beagleboards都有一系列GPIO引脚。将这些按钮连接到这些引脚上然后通过按钮按下执行某些操作这是一个你可能需要使用中断的经典案例这样CPU就不必浪费时间和电池电量轮询输入状态的变化而是让输入触发CPU运行特定的处理函数。
- en: Here is an example where buttons are connected to GPIO numbers 17 and 18 and an LED is connected to GPIO 4. You can change those numbers to whatever is appropriate for your board.
id: totrans-583
prefs: []
type: TYPE_NORMAL
zh: 这里有一个示例其中按钮连接到GPIO编号17和18LED连接到GPIO 4。你可以将这些数字更改为适合你板子的任何数字。
- en: '[PRE112]'
id: totrans-584
prefs: []
type: TYPE_PRE
zh: '[PRE112]'
- en: 16.3 Bottom Half
id: totrans-585
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 16.3 下半部分
- en: Suppose you want to do a bunch of stuff inside of an interrupt routine. A common
way to avoid blocking the interrupt for a significant duration is to defer the
time-consuming part to a workqueue. This pushes the bulk of the work off into
the scheduler. This approach helps speed up the interrupt handling process itself,
allowing the system to respond to the next hardware interrupt more quickly.
id: totrans-586
prefs: []
type: TYPE_NORMAL
zh: 假设你希望在中断例程内部做很多事情。避免中断被阻塞一段较长时间的一种常见方法是将耗时的部分推迟到工作队列中。这会将大部分工作推到调度器中。这种方法有助于加快中断处理过程本身,使系统能够更快地响应下一个硬件中断。
- en: Kernel developers generally discourage using tasklets due to their design limitations,
such as memory management issues and unpredictable latencies. Instead, they recommend
more robust mechanisms like workqueues or softirqs. To address tasklet shortcomings,
Linux contributors introduced the BH workqueue, activated with the `WQ_BH` flag.
This workqueue retains critical features, such as execution in atomic (softirq)
context on the same CPU and the inability to sleep.
id: totrans-587
prefs: []
type: TYPE_NORMAL
zh: 内核开发者通常不鼓励使用tasklets因为它们的设计限制如内存管理问题和不可预测的延迟。相反他们推荐更健壮的机制如workqueues或softirqs。为了解决tasklets的不足Linux贡献者引入了带有`WQ_BH`标志的BH工作队列。此工作队列保留了关键特性如在同一CPU上的原子软中断上下文执行以及无法休眠。
- en: The example below extends the previous code to include an additional task executed
in process context when an interrupt is triggered.
id: totrans-588
prefs: []
type: TYPE_NORMAL
zh: 以下示例扩展了之前的代码,以包括在触发中断时在进程上下文中执行的一个附加任务。
- en: '[PRE113]'
id: totrans-589
prefs: []
type: TYPE_PRE
zh: '[PRE113]'
- en: 16.4 Threaded IRQ
id: totrans-590
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 16.4 线程化中断
- en: 'A threaded IRQ is a mechanism that organizes both the top half and the bottom half of an IRQ at once. It splits the single handler of `request_irq()` into two, one for the top half and the other for the bottom half. Threaded IRQs are set up with `request_threaded_irq()`, which registers both handlers at once.'
id: totrans-591
prefs: []
type: TYPE_NORMAL
zh: 线程化中断请求Threaded IRQ是一种同时组织中断的上半部分和下半部分的机制。线程化中断请求将`request_irq()`中的一个处理程序分成两个:一个用于上半部分,另一个用于下半部分。`request_threaded_irq()`是用于使用线程化中断请求的函数。在`request_threaded_irq()`中同时注册两个处理程序。
- en: Those two handlers run in different contexts. The top-half handler runs in interrupt context. It is the equivalent of the handler passed to `request_irq()`. The bottom-half handler, on the other hand, runs in its own thread. This thread is created when the threaded IRQ is registered, and its sole purpose is to run the bottom-half handler. This is what makes a threaded IRQ “threaded”. If the top-half handler returns `IRQ_WAKE_THREAD`, the thread serving the bottom half wakes up and runs the bottom-half handler.
id: totrans-592
prefs: []
type: TYPE_NORMAL
zh: 这两个处理器在不同的上下文中运行。上半部分处理器在中断上下文中运行。它等同于传递给`request_irq()`的处理器的处理。另一方面,下半部分处理器在其自己的线程中运行。这个线程是在注册线程化中断时创建的。它的唯一目的是运行这个下半部分处理器。这就是线程化中断“线程化”的地方。如果上半部分处理器返回`IRQ_WAKE_THREAD`,那么这个下半部分服务线程将被唤醒。然后线程将运行下半部分处理器。
- en: Here is an example of how to do the same thing as before, with top and bottom
halves, but using threads.
id: totrans-593
prefs: []
type: TYPE_NORMAL
zh: 这里是一个如何使用线程实现之前相同功能的例子,即使用上半部分和下半部分。
- en: '[PRE114]'
id: totrans-594
prefs: []
type: TYPE_PRE
zh: '[PRE114]'
- en: A threaded IRQ is registered using `request_threaded_irq()`. This function takes just one parameter more than `request_irq()`, namely the bottom-half handler that runs in its own thread. In this example it is `button_bottom_half()`. The other parameters are used the same way as in `request_irq()`.
id: totrans-595
prefs: []
type: TYPE_NORMAL
zh: 使用`request_threaded_irq()`注册线程化中断。这个函数比`request_irq()`多一个额外的参数——在它自己的线程中运行的下半部分处理函数。在这个例子中是`button_bottom_half()`。其他参数的使用与`request_irq()`相同。
- en: Providing both handlers is not mandatory. If one of them is not needed, pass `NULL` instead. A `NULL` top-half handler means that nothing is done in interrupt context except waking up the thread that runs the bottom-half handler. Similarly, a `NULL` bottom-half handler behaves as if `request_irq()` had been used. In fact, this is how `request_irq()` is implemented.
id: totrans-596
prefs: []
type: TYPE_NORMAL
zh: 两个处理器的存在不是强制的。如果其中任何一个不需要,可以用`NULL`代替。一个`NULL`的上半部分处理器意味着除了唤醒运行下半部分处理器的下半部分服务线程外,不采取任何行动。同样,一个`NULL`的下半部分处理器实际上相当于使用了`request_irq()`。实际上,这就是`request_irq()`的实现方式。
- en: Note that passing `NULL` for both handlers is an error and makes the registration fail.
id: totrans-597
prefs: []
type: TYPE_NORMAL
zh: 注意,将`NULL`传递给两个处理器被视为错误,并且会导致注册失败。
- en: 17 Virtual Input Device Driver
id: totrans-598
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 17 虚拟输入设备驱动程序
- en: An input device driver is a module that communicates with an interactive device through events. For example, a keyboard sends press and release events to tell the kernel what the user wants to do. The driver allocates a new input structure with `input_allocate_device()`, sets up the input bitfields, device id, version, and so on, and then registers the device by calling `input_register_device()`.
id: totrans-599
prefs: []
type: TYPE_NORMAL
zh: 输入设备驱动程序是一个模块,它提供了一种通过事件与交互设备通信的方式。例如,键盘可以发送按键或释放事件来告诉内核我们想要做什么。输入设备驱动程序将使用`input_allocate_device()`分配一个新的输入结构并设置输入位字段、设备ID、版本等。之后通过调用`input_register_device()`进行注册。
- en: 'The vinput example below is an API that makes it easy to develop virtual input drivers. A driver needs to export a `vinput_device` structure that contains the virtual device name and a `vinput_ops` structure describing:'
id: totrans-600
prefs: []
type: TYPE_NORMAL
zh: 这里是一个例子vinput它是一个API允许轻松开发虚拟输入驱动程序。驱动程序需要导出一个包含虚拟设备名称和描述的`vinput_ops`结构的`vinput_device()`。该结构描述:
- en: 'the init function: `init()`'
id: totrans-601
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 初始化函数:`init()`
- en: 'the input event injection function: `send()`'
id: totrans-602
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 输入事件注入函数:`send()`
- en: 'the readback function: `read()`'
id: totrans-603
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 读取函数:`read()`
- en: Then `vinput_register_device()` adds the new device to the list of supported virtual input devices, and `vinput_unregister_device()` removes it.
id: totrans-604
prefs: []
type: TYPE_NORMAL
zh: 然后使用`vinput_register_device()`和`vinput_unregister_device()`将新设备添加到支持虚拟输入设备的列表中。
- en: '[PRE115]'
id: totrans-605
prefs: []
type: TYPE_PRE
zh: '[PRE115]'
- en: This function is passed a `struct vinput` that is already initialized with an allocated `struct input_dev`. The `init()` function is responsible for initializing the capabilities of the input device and registering it.
id: totrans-606
prefs: []
type: TYPE_NORMAL
zh: 这个函数传递一个已经使用分配的`struct input_dev`初始化的`struct vinput`。`init()`函数负责初始化输入设备的特性并将其注册。
- en: '[PRE116]'
id: totrans-607
prefs: []
type: TYPE_PRE
zh: '[PRE116]'
- en: This function receives a user string, interprets it, and injects the event using an `input_report_XXXX` or `input_event` call. The string has already been copied from user space.
id: totrans-608
prefs: []
type: TYPE_NORMAL
zh: 这个函数将接收一个用户字符串来解释并使用`input_report_XXXX`或`input_event`调用注入事件。字符串已经从用户空间复制过来。
- en: '[PRE117]'
id: totrans-609
prefs: []
type: TYPE_PRE
zh: '[PRE117]'
- en: This function is used for debugging. It should fill the buffer parameter with the last event sent, in the virtual input device format. The buffer is then copied to user space.
id: totrans-610
prefs: []
type: TYPE_NORMAL
zh: 这个函数用于调试,应该将缓冲区参数填充为虚拟输入设备格式中发送的最后一个事件。然后,缓冲区将被复制到用户空间。
- en: vinput devices are created and destroyed through sysfs, and events are injected through a /dev node. Userland uses the device name to export a new virtual input device.
id: totrans-611
prefs: []
type: TYPE_NORMAL
zh: 使用sysfs创建和销毁vinput设备。并且事件注入是通过/dev节点完成的。设备名称将由用户空间用于导出新的虚拟输入设备。
- en: 'The `class_attribute` structure is similar to other attribute types we talked
about in [Section 8](#sysfs-interacting-with-your-module):'
id: totrans-612
prefs: []
type: TYPE_NORMAL
zh: '`class_attribute`结构与我们在[第8节](#sysfs-interacting-with-your-module)中讨论的其他属性类型类似:'
- en: '[PRE118]'
id: totrans-613
prefs: []
type: TYPE_PRE
zh: '[PRE118]'
- en: In vinput.c, the macros `CLASS_ATTR_WO(export)` and `CLASS_ATTR_WO(unexport)`, defined in [include/linux/device.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/device.h) (in this case, device.h is included in [include/linux/input.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/input.h)), generate the `class_attribute` structures named `class_attr_export` and `class_attr_unexport`. These are then put into the `vinput_class_attrs` array, and the macro `ATTRIBUTE_GROUPS(vinput_class)` generates the `struct attribute_group vinput_class_group` that should be assigned in `vinput_class`. Finally, calling `class_register(&vinput_class)` creates the attributes in sysfs.
id: totrans-614
prefs: []
type: TYPE_NORMAL
zh: 在vinput.c中定义在[include/linux/device.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/device.h)在这种情况下device.h包含在[include/linux/input.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/input.h)中)的宏`CLASS_ATTR_WO(export/unexport)`将生成名为class_attr_export/unexport的`class_attribute`结构。然后,将它们放入`vinput_class_attrs`数组,宏`ATTRIBUTE_GROUPS(vinput_class)`将生成应分配到`vinput_class`的`struct
attribute_group vinput_class_group`。最后,调用`class_register(&vinput_class)`在sysfs中创建属性。
- en: 'To create a vinputX sysfs entry and /dev node:'
id: totrans-615
prefs: []
type: TYPE_NORMAL
zh: 要创建vinputX sysfs条目和/dev节点。
- en: '[PRE119]'
id: totrans-616
prefs: []
type: TYPE_PRE
zh: '[PRE119]'
- en: 'To unexport the device, just echo its id in unexport:'
id: totrans-617
prefs: []
type: TYPE_NORMAL
zh: 要取消导出设备只需在unexport中回显其ID
- en: '[PRE120]'
id: totrans-618
prefs: []
type: TYPE_PRE
zh: '[PRE120]'
- en: '[PRE121]'
id: totrans-619
prefs: []
type: TYPE_PRE
zh: '[PRE121]'
- en: '[PRE122]'
id: totrans-620
prefs: []
type: TYPE_PRE
zh: '[PRE122]'
- en: The virtual keyboard is one example of how to use vinput. It supports all `KEY_MAX` keycodes. The injection format is the `KEY_CODE` as defined in [include/linux/input.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/input.h). A positive value means `KEY_PRESS`, while a negative value means `KEY_RELEASE`. The keyboard supports key repetition when a key stays pressed long enough. The following demonstrates how the simulation works.
id: totrans-621
prefs: []
type: TYPE_NORMAL
zh: 这里虚拟键盘是使用vinput的一个示例。它支持所有`KEY_MAX`键码。注入格式是`KEY_CODE`,如[include/linux/input.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/input.h)中定义的那样。正值表示`KEY_PRESS`,而负值是`KEY_RELEASE`。当按键按下时间过长时,键盘支持重复。以下演示了模拟的工作方式。
- en: 'Simulate a key press on "g" ( `KEY_G` = 34):'
id: totrans-622
prefs: []
type: TYPE_NORMAL
zh: 模拟在"g"键上按下按键(`KEY_G` = 34
- en: '[PRE123]'
id: totrans-623
prefs: []
type: TYPE_PRE
zh: '[PRE123]'
- en: 'Simulate a key release on "g" ( `KEY_G` = 34):'
id: totrans-624
prefs: []
type: TYPE_NORMAL
zh: 模拟在"g"键上释放按键(`KEY_G` = 34
- en: '[PRE124]'
id: totrans-625
prefs: []
type: TYPE_PRE
zh: '[PRE124]'
- en: '[PRE125]'
id: totrans-626
prefs: []
type: TYPE_PRE
zh: '[PRE125]'
- en: '18 Standardizing the interfaces: The Device Model'
id: totrans-627
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 18 标准化接口:设备模型
- en: Up to this point we have seen all kinds of modules doing all kinds of things, but there was no consistency in their interfaces with the rest of the kernel. To impose some consistency, so that there is at least a standardized way to start, suspend and resume a device, a device model was added. An example is shown below, and you can use it as a template to add your own suspend, resume or other interface functions.
id: totrans-628
prefs: []
type: TYPE_NORMAL
zh: 到目前为止,我们已经看到了各种模块做各种事情,但它们与内核其余部分的接口没有一致性。为了强制一致性,至少添加了一种标准化的方式来启动、挂起和恢复设备模型。下面是一个示例,你可以将其用作模板来添加你自己的挂起、恢复或其他接口函数。
- en: '[PRE126]'
id: totrans-629
prefs: []
type: TYPE_PRE
zh: '[PRE126]'
- en: 19 Device Tree
id: totrans-630
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 19 设备树
- en: 19.1 Introduction to Device Tree
id: totrans-631
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 19.1 设备树简介
- en: Device Tree is a data structure that describes hardware components in a system,
particularly in embedded systems and ARM-based platforms. Instead of hard-coding
hardware details in the kernel source, Device Tree provides a separate, human-readable
description that the kernel can parse at boot time. This separation allows the
same kernel binary to support multiple hardware platforms, making development
and maintenance significantly easier.
id: totrans-632
prefs: []
type: TYPE_NORMAL
zh: 设备树是一种数据结构用于描述系统中的硬件组件尤其是在嵌入式系统和基于ARM的平台中。而不是在内核源代码中硬编码硬件细节设备树提供了一个单独的、可读性好的描述内核可以在启动时解析。这种分离使得相同的内核二进制文件可以支持多个硬件平台使得开发和维护变得显著更容易。
- en: Device Tree files (with .dts extension for source files and .dtb for compiled
binary files) use a hierarchical structure similar to a filesystem to represent
the hardware topology. Each hardware component is represented as a node with properties
that describe its characteristics, such as memory addresses, interrupt numbers,
and device-specific parameters.
id: totrans-633
prefs: []
type: TYPE_NORMAL
zh: 设备树文件(源文件以.dts扩展名编译后的二进制文件以.dtb扩展名使用类似于文件系统的分层结构来表示硬件拓扑。每个硬件组件都表示为一个节点其属性描述了其特征如内存地址、中断号和设备特定参数。
- en: 19.2 Device Tree and Kernel Modules
id: totrans-634
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 19.2 设备树与内核模块
- en: While Device Tree is primarily used during kernel initialization, kernel modules
can also interact with Device Tree nodes through the platform device framework.
When the kernel parses the Device Tree at boot, it creates platform devices for
nodes that have compatible strings. Kernel modules can then register platform
drivers that match these compatible strings, allowing them to be automatically
probed when the corresponding hardware is detected.
id: totrans-635
prefs: []
type: TYPE_NORMAL
zh: 虽然设备树主要用于内核初始化期间,但内核模块也可以通过平台设备框架与设备树节点交互。当内核在启动时解析设备树时,它会为具有兼容字符串的节点创建平台设备。然后,内核模块可以注册与这些兼容字符串匹配的平台驱动程序,允许它们在检测到相应的硬件时自动探测。
- en: 'The key concepts for Device Tree interaction in kernel modules include:'
id: totrans-636
prefs: []
type: TYPE_NORMAL
zh: 内核模块中与设备树交互的关键概念包括:
- en: 'Compatible strings: Unique identifiers that match Device Tree nodes to their
drivers'
id: totrans-637
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 兼容字符串:匹配设备树节点与其驱动程序的唯一标识符
- en: 'Property reading: Functions to extract configuration data from Device Tree
nodes'
id: totrans-638
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 属性读取:从设备树节点中提取配置数据的函数
- en: 'Platform driver framework: Infrastructure for binding drivers to devices described
in Device Tree'
id: totrans-639
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 平台驱动程序框架:将驱动程序绑定到设备树中描述的设备的基础设施
- en: 'Device-specific data: Custom properties that can be defined for specific hardware'
id: totrans-640
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: 设备特定数据:可以为特定硬件定义的自定义属性
- en: '19.3 Example: Device Tree Module'
id: totrans-641
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 19.3 示例:设备树模块
- en: The following example demonstrates how a kernel module can interact with Device
Tree nodes. This module registers a platform driver that matches specific compatible
strings and extracts properties from the matched Device Tree nodes.
id: totrans-642
prefs: []
type: TYPE_NORMAL
zh: 以下示例演示了内核模块如何与设备树节点交互。此模块注册了一个与特定兼容字符串匹配的平台驱动程序,并从匹配的设备树节点中提取属性。
- en: '[PRE127]'
id: totrans-643
prefs: []
type: TYPE_PRE
zh: '[PRE127]'
- en: 19.4 Device Tree Source Example
id: totrans-644
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 19.4 设备树源示例
- en: 'To use the above module, you would need a Device Tree entry like this:'
id: totrans-645
prefs: []
type: TYPE_NORMAL
zh: 要使用上述模块,您需要一个如下的设备树条目:
- en: '[PRE128]'
id: totrans-646
prefs: []
type: TYPE_PRE
zh: '[PRE128]'
- en: The properties in this Device Tree node would be read by the module's probe function when the device is matched. The compatible property is used to match the device with the driver, while other properties provide device-specific configuration.
id: totrans-647
prefs: []
type: TYPE_NORMAL
zh: 在设备匹配时,模块的探测函数将读取此设备树节点中的属性。兼容属性用于将设备与驱动程序匹配,而其他属性提供设备特定的配置。
- en: 19.5 Testing Device Tree Modules
id: totrans-648
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 19.5 测试设备树模块
- en: 'Testing Device Tree modules can be done in several ways:'
id: totrans-649
prefs: []
type: TYPE_NORMAL
zh: 测试设备树模块可以通过几种方式完成:
- en: 'Using Device Tree overlays: On systems that support it (like Raspberry Pi),
you can load Device Tree overlays at runtime to add new devices without rebooting.'
id: totrans-650
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 使用设备树覆盖:在支持它的系统(如树莓派)上,您可以在运行时加载设备树覆盖,以添加新设备而无需重启。
- en: 'Modifying the main Device Tree: Add your device nodes to the main Device Tree source file of the system and recompile it.'
id: totrans-651
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 修改主设备树:将您的设备节点添加到系统的主设备树源文件中,并重新编译它。
- en: 'Using QEMU: For development and testing, QEMU can emulate systems with custom
Device Trees, allowing you to test your modules without physical hardware.'
id: totrans-652
prefs:
- PREF_OL
type: TYPE_NORMAL
zh: 使用QEMU在开发和测试中QEMU可以模拟具有自定义设备树的系统允许您在没有物理硬件的情况下测试您的模块。
- en: 'To check if your device was properly detected, you can examine the sysfs filesystem:'
id: totrans-653
prefs: []
type: TYPE_NORMAL
zh: 要检查您的设备是否被正确检测您可以检查sysfs文件系统
- en: '[PRE129]'
id: totrans-654
prefs: []
type: TYPE_PRE
zh: '[PRE129]'
- en: 19.6 Common Device Tree Functions
id: totrans-655
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 19.6 常用设备树函数
- en: 'Here are some commonly used Device Tree functions in kernel modules:'
id: totrans-656
prefs: []
type: TYPE_NORMAL
zh: 这里有一些在内核模块中常用的设备树函数:
- en: '`of_property_read_string()` - Read a string property'
id: totrans-657
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '`of_property_read_string()` - 读取字符串属性'
- en: '`of_property_read_u32()` - Read a 32-bit integer property'
id: totrans-658
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '`of_property_read_u32()` - 读取32位整数属性'
- en: '`of_property_read_bool()` - Check if a boolean property exists'
id: totrans-659
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '`of_property_read_bool()` - 检查是否存在布尔属性'
- en: '`of_find_property()` - Find a property by name'
id: totrans-660
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '`of_find_property()` - 通过名称查找属性'
- en: '`of_get_property()` - Get the raw value of a property'
id: totrans-661
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '`of_get_property()` - 获取属性的原始值'
- en: '`of_match_device()` - Match a device against a match table'
id: totrans-662
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '`of_match_device()` - 将设备与匹配表匹配'
- en: '`of_parse_phandle()` - Parse a phandle reference to another node'
id: totrans-663
prefs:
- PREF_UL
type: TYPE_NORMAL
zh: '`of_parse_phandle()` - 解析指向另一个节点的phandle引用'
- en: These functions provide a robust interface for extracting configuration data
from Device Tree nodes, allowing modules to be highly configurable without code
changes.
id: totrans-664
prefs: []
type: TYPE_NORMAL
zh: 这些函数提供了一个健壮的接口,用于从设备树节点中提取配置数据,允许模块在无需代码更改的情况下进行高度配置。
- en: 20 Optimizations
id: totrans-665
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 20 优化
- en: 20.1 Likely and Unlikely conditions
id: totrans-666
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 20.1 可能和不可能条件
- en: Sometimes you might want your code to run as quickly as possible, especially if it is handling an interrupt or doing something which might cause noticeable latency. If your code contains boolean conditions and you know that a condition almost always evaluates to the same value, either `true` or `false`, then you can let the compiler optimize for this using the `likely` and `unlikely` macros. For example, when allocating memory you almost always expect the allocation to succeed.
id: totrans-667
prefs: []
type: TYPE_NORMAL
zh: 有时你可能希望你的代码尽可能快地运行,特别是如果你正在处理中断或可能引起明显延迟的操作。如果你的代码包含布尔条件,并且你知道条件几乎总是评估为 `true`
或 `false`,那么你可以允许编译器使用 `likely` 和 `unlikely` 宏进行优化。例如,当分配内存时,你几乎总是期望这会成功。
- en: '[PRE130]'
id: totrans-668
prefs: []
type: TYPE_PRE
zh: '[PRE130]'
- en: When the `unlikely` macro is used, the compiler alters its machine instruction
output, so that it continues along the false branch and only jumps if the condition
is true. That avoids flushing the processor pipeline. The opposite happens if
you use the `likely` macro.
id: totrans-669
prefs: []
type: TYPE_NORMAL
zh: 当使用 `unlikely` 宏时,编译器会改变其机器指令输出,以便它继续沿着错误分支执行,并且只有当条件为真时才会跳转。这避免了刷新处理器流水线。如果你使用
`likely` 宏,则发生相反的情况。
- en: 20.2 Static keys
id: totrans-670
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 20.2 静态密钥
- en: 'Static keys allow us to enable or disable kernel code paths based on the runtime
state of a key. Their APIs have been available since 2010 (most architectures
are already supported) and use self-modifying code to eliminate the overhead of
cache and branch prediction. The most typical use case of static keys is for performance-sensitive
kernel code, such as tracepoints, context switching, networking, etc. These hot
paths of the kernel often contain branches and can be optimized easily using this
technique. Before we can use static keys in the kernel, we need to make sure that
gcc supports `asm goto` inline assembly, and the following kernel configurations
are set:'
id: totrans-671
prefs: []
type: TYPE_NORMAL
zh: 静态密钥允许我们根据密钥的运行时状态启用或禁用内核代码路径。它们的 API 自 2010 年以来一直可用(大多数架构已经支持)并且使用自修改代码来消除缓存和分支预测的开销。静态密钥最典型的用例是性能敏感的内核代码,如
tracepoints、上下文切换、网络等。内核的这些热点路径通常包含分支并且可以使用此技术轻松优化。在我们能够在内核中使用静态密钥之前我们需要确保 gcc
支持 `asm goto` 内联汇编,并且以下内核配置被设置:
- en: '[PRE131]'
id: totrans-672
prefs: []
type: TYPE_PRE
zh: '[PRE131]'
- en: 'To declare a static key, we need to define a global variable using the `DEFINE_STATIC_KEY_FALSE`
or `DEFINE_STATIC_KEY_TRUE` macro defined in [include/linux/jump_label.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/jump_label.h).
This macro initializes the key with the given initial value, which is either false
or true, respectively. For example, to declare a static key with an initial value
of false, we can use the following code:'
id: totrans-673
prefs: []
type: TYPE_NORMAL
zh: 要声明静态密钥,我们需要使用在 [include/linux/jump_label.h](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/linux/jump_label.h)
中定义的 `DEFINE_STATIC_KEY_FALSE` 或 `DEFINE_STATIC_KEY_TRUE` 宏来定义一个全局变量。此宏将密钥初始化为给定的初始值,即分别为假或真。例如,要声明一个初始值为假的静态密钥,我们可以使用以下代码:
- en: '[PRE132]'
id: totrans-674
prefs: []
type: TYPE_PRE
zh: '[PRE132]'
- en: Once the static key has been declared, we need to add branching code to the
module that uses the static key. For example, the code includes a fastpath, where
a no-op instruction will be generated at compile time as the key is initialized
to false and the branch is unlikely to be taken.
id: totrans-675
prefs: []
type: TYPE_NORMAL
zh: 一旦声明了静态密钥,我们需要向使用静态密钥的模块中添加分支代码。例如,代码包括一个快速路径,在编译时将生成一个无操作指令,因为密钥被初始化为假,分支不太可能被采取。
- en: '[PRE133]'
id: totrans-676
prefs: []
type: TYPE_PRE
zh: '[PRE133]'
- en: If the key is enabled at runtime by calling `static_branch_enable(&fkey)` ,
the fastpath will be patched with an unconditional jump instruction to the slowpath
code `pr_alert` , so the branch will always be taken until the key is disabled
again.
id: totrans-677
prefs: []
type: TYPE_NORMAL
zh: 如果在运行时通过调用 `static_branch_enable(&fkey)` 启用密钥,则快速路径将被修补为无条件跳转到慢路径代码 `pr_alert`,因此分支将始终被采取,直到再次禁用密钥。
- en: The following kernel module, derived from chardev.c, demonstrates how the static key works.
id: totrans-678
prefs: []
type: TYPE_NORMAL
zh: 以下从 chardev.c 衍生的内核模块演示了静态密钥的工作原理。
- en: '[PRE134]'
id: totrans-679
prefs: []
type: TYPE_PRE
zh: '[PRE134]'
- en: To check the state of the static key, we can use the /dev/key_state interface.
id: totrans-680
prefs: []
type: TYPE_NORMAL
zh: 检查静态密钥的状态,我们可以使用 /dev/key_state 接口。
- en: '[PRE135]'
id: totrans-681
prefs: []
type: TYPE_PRE
zh: '[PRE135]'
- en: This will display the current state of the key, which is disabled by default.
id: totrans-682
prefs: []
type: TYPE_NORMAL
zh: 这将显示密钥的当前状态,默认情况下密钥是禁用的。
- en: 'To change the state of the static key, we can perform a write operation on
the file:'
id: totrans-683
prefs: []
type: TYPE_NORMAL
zh: 要更改静态键的状态,我们可以在文件上执行写操作:
- en: '[PRE136]'
id: totrans-684
prefs: []
type: TYPE_PRE
zh: '[PRE136]'
- en: This will enable the static key, causing the code path to switch from the fastpath
to the slowpath.
id: totrans-685
prefs: []
type: TYPE_NORMAL
zh: 这将启用静态键,导致代码路径从快速路径切换到慢速路径。
- en: If the key is enabled or disabled at initialization and never changed afterwards, we can declare the static key as read-only, which means that it can only be toggled in the module init function. To declare a read-only static key, we can use the `DEFINE_STATIC_KEY_FALSE_RO` or `DEFINE_STATIC_KEY_TRUE_RO` macro instead. Attempts to change the key at runtime will result in a page fault. For more information, see [Static keys](https://www.kernel.org/doc/Documentation/static-keys.txt).
id: totrans-686
prefs: []
type: TYPE_NORMAL
zh: 在某些情况下,键在初始化时被启用或禁用,之后从未改变,我们可以将静态键声明为只读,这意味着它只能在模块初始化函数中切换。要声明只读静态键,我们可以使用`DEFINE_STATIC_KEY_FALSE_RO`或`DEFINE_STATIC_KEY_TRUE_RO`宏。在运行时尝试更改键将导致页面错误。有关更多信息,请参阅[静态键](https://www.kernel.org/doc/Documentation/static-keys.txt)。
- en: 21 Common Pitfalls
id: totrans-687
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 21 常见陷阱
- en: 21.1 Using standard libraries
id: totrans-688
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 21.1 使用标准库
- en: You cannot do that. In a kernel module, you can only use kernel functions, which are the functions you can see in /proc/kallsyms.
id: totrans-689
prefs: []
type: TYPE_NORMAL
zh: 你不能这样做。在内核模块中,你只能使用内核函数,这些函数是你可以在/proc/kallsyms中看到的。
- en: 21.2 Disabling interrupts
id: totrans-690
prefs:
- PREF_H4
type: TYPE_NORMAL
zh: 21.2 禁用中断
- en: You might need to do this for a short time, and that is OK. But if you do not enable them again afterwards, your system will be stuck and you will have to power it off.
id: totrans-691
prefs: []
type: TYPE_NORMAL
zh: 你可能需要短时间这样做,这是可以的,但如果你之后没有启用它们,你的系统将会卡住,你将不得不关闭电源。
- en: 22 Where To Go From Here?
id: totrans-692
prefs:
- PREF_H3
type: TYPE_NORMAL
zh: 22 从这里去哪里?
- en: For those deeply interested in kernel programming, [kernelnewbies.org](https://kernelnewbies.org) and the [Documentation](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation) subdirectory within the kernel source code are highly recommended. Although the latter may not always be straightforward, it serves as a valuable initial step for further exploration. Echoing Linus Torvalds' perspective, the most effective method to understand the kernel is through personal examination of the source code.
id: totrans-693
prefs: []
type: TYPE_NORMAL
zh: 对于对内核编程有深厚兴趣的人来说,强烈推荐访问[kernelnewbies.org](https://kernelnewbies.org)以及内核源代码中的[Documentation](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation)子目录。尽管后者可能并不总是直截了当,但它为更深入的探索提供了一个宝贵的起点。正如林纳斯·托瓦兹的观点,理解内核的最有效方法是亲自检查源代码。
- en: Contributions to this guide are welcome, especially if any significant inaccuracies are identified. To contribute or report a problem, please open an issue at [https://github.com/sysprog21/lkmpg](https://github.com/sysprog21/lkmpg). Pull requests are greatly appreciated.
id: totrans-694
prefs: []
type: TYPE_NORMAL
zh: 欢迎对此指南做出贡献,特别是如果发现任何重大不准确之处。要贡献或报告问题,请在[https://github.com/sysprog21/lkmpg](https://github.com/sysprog21/lkmpg)发起一个问题。拉取请求非常受欢迎。
- en: Happy hacking!
id: totrans-695
prefs: []
type: TYPE_NORMAL
zh: 开心黑客!
- en: '[¹](#fn1x0-bk)As of Linux kernel 6.12, several member fields have been added,
removed, or had their prototypes changed. For example, additions include fop_flags,
splice_eof, and uring_cmd; removals include iterate and sendpage; and the prototype
for iopoll was modified.'
id: totrans-696
prefs: []
type: TYPE_NORMAL
zh: '[¹](#fn1x0-bk)截至Linux内核6.12版本一些成员字段已被添加、删除或更改了原型。例如新增包括fop_flags、splice_eof和uring_cmd删除包括iterate和sendpageiopoll的原型也被修改。'
- en: '[²](#fn2x0-bk)The goal of threaded interrupts is to push more of the work to separate threads, so that the minimum needed for acknowledging an interrupt is reduced, and therefore the time spent handling the interrupt (where it cannot handle any other interrupts at the same time) is reduced. See [https://lwn.net/Articles/302043/](https://lwn.net/Articles/302043/).'
id: totrans-697
prefs: []
type: TYPE_NORMAL
zh: '[²](#fn2x0-bk)线程中断的目标是将更多的工作推送到单独的线程,这样确认中断所需的最小工作就减少了,因此处理中断(在此期间不能处理其他中断)的时间也就减少了。参见[https://lwn.net/Articles/302043/](https://lwn.net/Articles/302043/)。'