Hardware platform: an ARM SoC
Software platform: Linux
1 Introduction to Runtime PM
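Before introducing Runtime PM, it helps to look at traditional power management first. The traditional mechanism is called System PM (System Suspend & Resume): when the whole system is about to go to sleep, the suspend function of each driver is called in turn. It is a coarse-grained form of power management, and its execution path is relatively simple.
Runtime PM means exactly what the name says: power management at run time. Each device (including blocks inside the chip) takes care of its own power management, entering a low-power state whenever it has no work to do and waking up again when it is needed. This way, even when the system as a whole has not gone to sleep, a device can decide from its actual workload whether to enter a low-power state, saving as much power as possible.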
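In code, when the device is needed, the driver calls pm_runtime_get_sync to runtime-resume it; when the work is done, the driver calls pm_runtime_put so that the device can runtime-suspend. In pseudocode:
senddata()
{
	pm_runtime_get_sync();	/* take a reference and resume the device */
	/* do something … */
	pm_runtime_put();	/* drop the reference; the device may suspend */
}
recvdata()
{
	pm_runtime_get_sync();	/* take a reference and resume the device */
	/* do something … */
	pm_runtime_put();	/* drop the reference; the device may suspend */
}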
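pm_runtime_get_sync and pm_runtime_put maintain a reference count: pm_runtime_get_sync increments it and pm_runtime_put decrements it. Only when the count drops to 0 is the device actually allowed to enter its low-power state.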
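The counting rule can be illustrated with a small user-space model. This is only a sketch: `toy_dev`, `toy_get_sync`, and `toy_put` are hypothetical names that mimic the behavior described above. They are not kernel APIs, and the model suspends immediately at zero, whereas the real pm_runtime_put queues the idle check asynchronously.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of a device under runtime PM. */
struct toy_dev {
	int usage_count;	/* number of outstanding "get" calls */
	bool active;		/* true while power and clock are on */
	int resume_calls;	/* how often the device was really resumed */
	int suspend_calls;	/* how often the device was really suspended */
};

/* Models pm_runtime_get_sync(): take a reference, resume if needed. */
static void toy_get_sync(struct toy_dev *dev)
{
	dev->usage_count++;
	if (!dev->active) {
		dev->active = true;
		dev->resume_calls++;
	}
}

/* Models pm_runtime_put(): drop a reference, suspend only at zero. */
static void toy_put(struct toy_dev *dev)
{
	assert(dev->usage_count > 0);	/* an unbalanced put is a driver bug */
	if (--dev->usage_count == 0) {
		dev->active = false;
		dev->suspend_calls++;
	}
}
```

With two overlapping users, the first toy_put leaves the device active; only the second one, which brings the count back to 0, suspends it.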
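The concept of Runtime PM is fairly intuitive. From a device's point of view: whoever needs me to work does a get on me, and otherwise does a put. The API does offer quite a few functions, though; since they are not the focus of this article, they are not covered one by one here.
Device drivers need to choose the points at which they call Runtime PM carefully, otherwise they may cause power consumption problems or system faults.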
2 Problem cases: kernel panic with "external abort on non-linefetch"
A common cause of "external abort on non-linefetch" is reading or writing the registers of an on-chip block whose power and clock have not been turned on yet.
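The failure mode can be mimicked in user space. The sketch below is an assumption-laden model, not driver code: `toy_ctrl`, `toy_raw_read`, and `toy_guarded_read` are hypothetical names, and the error return merely stands in for the external abort that real hardware raises.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an on-chip controller with one register. */
struct toy_ctrl {
	bool powered;		/* power and clock state */
	int usage_count;	/* runtime PM reference count */
	uint32_t ctl0;		/* the single modeled register */
};

enum toy_status { TOY_OK = 0, TOY_BUS_FAULT = -1 };

/* Raw access: fails while the block is powered down.
 * On real hardware this is where the external abort would occur. */
static enum toy_status toy_raw_read(const struct toy_ctrl *c, uint32_t *out)
{
	assert(c && out);
	if (!c->powered)
		return TOY_BUS_FAULT;
	*out = c->ctl0;
	return TOY_OK;
}

/* Guarded access: hold a reference for the duration of the read. */
static enum toy_status toy_guarded_read(struct toy_ctrl *c, uint32_t *out)
{
	enum toy_status st;

	assert(c && out);
	c->usage_count++;		/* models pm_runtime_get_sync() */
	c->powered = true;
	st = toy_raw_read(c, out);
	if (--c->usage_count == 0)	/* models pm_runtime_put() */
		c->powered = false;
	return st;
}
```

Calling toy_raw_read on a powered-down controller returns TOY_BUS_FAULT, while toy_guarded_read succeeds and leaves the controller powered down again afterwards.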
Case 1: an error is reported when testing SPI with the user-space spidev_test program.
[ 86.901554] c2 Unhandled fault: external abort on non-linefetch (0x1008) at 0xe999a008
[ 86.909373] c2 pgd = 6fa82014
[ 86.912315] c2 [e999a008] *pgd=a80e1811, *pte=70a00653, *ppte=70a00453
[ 86.918813] c2 Internal error: : 1008 [#1] PREEMPT SMP ARM
[ 86.942798] c2 CPU: 2 PID: 2653 Comm: spidev_test Tainted: G O 4.14.133+ #10
[ 86.950923] c2 Hardware name: Generic DT based system
[ 86.955945] c2 task: 434d36b9 task.stack: a6b02495
[ 86.960713] c2 PC is at foo_spi_chipselect+0x8c/0xdc
[ 86.965721] c2 LR is at 0xe999a000
[ 86.969095] c2 pc : [
[ 86.975590] c2 sp : c3a07db8 ip : 00000000 fp : c3a07dd4
[ 86.981039] c2 r10: 00000036 r9 : 00000003 r8 : 00000196
[ 86.986489] c2 r7 : 00000196 r6 : e999a008 r5 : c3a07db8 r4 : c067113c
[ 86.993241] c2 r3 : c11adb28 r2 : e9898030 r1 : 00000030 r0 : e9898000
[ 86.999993] c2 Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 87.007349] c2 Control: 10c5383d Table: 8531806a DAC: 00000051
[ 87.013319] c2 Process spidev_test (pid: 2653, stack limit = 0xd6c1587c)
…
[ 87.184082] c2 [
[ 87.192294] c2 [
[ 87.199814] c2 [
[ 87.207521] c2 [
[ 87.215048] c2 [
[ 87.222748] c2 [
[ 87.230276] c2 [
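The call stack shows that the error occurred in the SPI driver's foo_spi_chipselect function while spidev_test was talking to the SPI driver through an ioctl. The function is shown below. It reads and writes the SPI controller's registers, but does not call pm_runtime_get to resume the controller first.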
static void foo_spi_chipselect(struct spi_device *sdev, bool cs)
{
	struct spi_controller *sctlr = sdev->controller;
	struct foo_spi *ss = spi_controller_get_devdata(sctlr);
	u32 val;

	val = readl_relaxed(ss->base + FOO_SPI_CTL0);
	/* The SPI controller will pull down CS pin if cs is 0 */
	if (!cs) {
		val &= ~FOO_SPI_CS0_VALID;
		writel_relaxed(val, ss->base + FOO_SPI_CTL0);
	} else {
		val |= FOO_SPI_CSN_MASK;
		writel_relaxed(val, ss->base + FOO_SPI_CTL0);
	}
}
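The problem can be solved by changing the code as follows. The body shown here is a sketch of that fix: it assumes the controller's runtime PM device is sctlr->dev.parent, the same device the SPI core uses.
static void foo_spi_chipselect(struct spi_device *sdev, bool cs)
{
	struct spi_controller *sctlr = sdev->controller;
	struct foo_spi *ss = spi_controller_get_devdata(sctlr);
	u32 val;

	/* Resume the controller before touching its registers. */
	if (pm_runtime_get_sync(sctlr->dev.parent) < 0) {
		/* get_sync keeps its reference on failure; drop it. */
		pm_runtime_put_noidle(sctlr->dev.parent);
		return;
	}

	val = readl_relaxed(ss->base + FOO_SPI_CTL0);
	/* The SPI controller will pull down CS pin if cs is 0 */
	if (!cs)
		val &= ~FOO_SPI_CS0_VALID;
	else
		val |= FOO_SPI_CSN_MASK;
	writel_relaxed(val, ss->base + FOO_SPI_CTL0);

	/* Done with the registers; allow the controller to suspend. */
	pm_runtime_put(sctlr->dev.parent);
}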
However, newer kernels (after October 2019) fix this problem in the SPI core itself, so the chip vendor's controller driver does not need to be changed.
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index f9502db..19007e0 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -3091,7 +3091,20 @@ int spi_setup(struct spi_device *spi)
 	if (spi->controller->setup)
 		status = spi->controller->setup(spi);
 
-	spi_set_cs(spi, false);
+	if (spi->controller->auto_runtime_pm && spi->controller->set_cs) {
+		status = pm_runtime_get_sync(spi->controller->dev.parent);
+		if (status < 0) {
+			pm_runtime_put_noidle(spi->controller->dev.parent);
+			dev_err(&spi->controller->dev, "Failed to power device: %d\n",
+				status);
+			return status;
+		}
+		spi_set_cs(spi, false);
+		pm_runtime_mark_last_busy(spi->controller->dev.parent);
+		pm_runtime_put_autosuspend(spi->controller->dev.parent);
+	} else {
+		spi_set_cs(spi, false);
+	}
 
 	if (spi->rt && !spi->controller->rt) {
 		spi->controller->rt = true;
For details, see https://lore.kernel.org/linux-arm-kernel/1572426234-30019-1-git-send-email-luhua.xu@mediatek.com/
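One detail of the spi_setup() change is worth spelling out: pm_runtime_get_sync increments the usage count even when the resume fails, which is why the error path calls pm_runtime_put_noidle before returning. The user-space sketch below models that balance. All `toy_*` names and the `resume_fails` switch are hypothetical, and the autosuspend delay is left out, so the model suspends immediately.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_EIO 5	/* stand-in error code for a failed resume */

struct toy_dev {
	int usage_count;
	bool active;
	bool resume_fails;	/* set to inject a resume error */
};

/* Models pm_runtime_get_sync(): the count is bumped even on failure. */
static int toy_get_sync(struct toy_dev *dev)
{
	dev->usage_count++;
	if (dev->active)
		return 1;		/* already active */
	if (dev->resume_fails)
		return -TOY_EIO;
	dev->active = true;
	return 0;
}

/* Models pm_runtime_put_noidle(): drop the reference, nothing else. */
static void toy_put_noidle(struct toy_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* Models the put side without the autosuspend delay. */
static void toy_put(struct toy_dev *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count == 0)
		dev->active = false;
}

/* Same shape as the patched part of spi_setup(). */
static int toy_setup(struct toy_dev *dev, int *cs_writes)
{
	int status = toy_get_sync(dev);

	if (status < 0) {
		toy_put_noidle(dev);	/* undo the increment on failure */
		return status;
	}
	(*cs_writes)++;			/* stands in for spi_set_cs(spi, false) */
	toy_put(dev);
	return 0;
}
```

In both the success path and the failure path the usage count returns to 0, and the chip-select write only happens when the device was actually resumed.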
Case 2: a fault occurs during repeated power on/off testing with USB in host mode.
[ 11.616956] c0 Unhandled fault: external abort on non-linefetch (0x1008) at 0xd02d4001
[ 11.624774] c0 pgd = c0004000
[ 11.627708] [d02d4001] *pgd=8f69a811, *pte=20200653, *ppte=20200453
[ 11.633944] c0 Internal error: : 1008 [#1] PREEMPT SMP ARM
[ 11.639390] Modules linked in:
[ 11.642424] c0 CPU: 0 PID: 161 Comm: kworker/0:3 Not tainted 4.4.83 #1
[ 11.648909] c0 Hardware name: Generic DT based system
[ 11.653940] Workqueue: events musb_deassert_reset
[ 11.658601] c0 task: ce854780 task.stack: ce8c6000
[ 11.663364] c0 PC is at musb_default_readb+0x54/0x9c
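The cause is similar to case 1: musb_deassert_reset accessed the USB controller's registers while the controller was shut down. The version below takes a runtime PM reference around the register access: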
static void musb_deassert_reset(struct work_struct *work)
{
	struct musb *musb;
	unsigned long flags;

	musb = container_of(work, struct musb, deassert_reset_work.work);

	pm_runtime_get_sync(musb->controller);

	spin_lock_irqsave(&musb->lock, flags);
	if (musb->port1_status & USB_PORT_STAT_RESET)
		musb_port_reset(musb, false);
	spin_unlock_irqrestore(&musb->lock, flags);

	pm_runtime_put(musb->controller);
}
With this change, the error no longer reproduces on that path, but it shows up on a new path, which means the fix is incomplete.
[ 13.364606] c0 Unhandled fault: external abort on non-linefetch (0x1008) at 0xd02d4001
[ 13.372418] c0 pgd = c0004000
[ 13.375359] [d02d4001] *pgd=8f69a811, *pte=20200653, *ppte=20200453
[ 13.381595] c0 Internal error: : 1008 [#1] PREEMPT SMP ARM
[ 13.387042] Modules linked in:
[ 13.390075] c0 CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.4.83 #1
[ 13.396388] c0 Hardware name: Generic DT based system
[ 13.401417] Workqueue: usb_hub_wq hub_event
[ 13.405562] c0 task: cf471380 task.stack: cf490000
[ 13.410326] c0 PC is at musb_default_readb+0x54/0x9c
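Going through the musb code (drivers/usb/musb/*) shows that the musb_gadget code handles runtime PM, while the musb_host code does not. The final solution was to complete the runtime PM handling in the vendor's USB controller driver, without modifying the common musb code. The details depend on how that vendor's USB controller driver is implemented and are not generally applicable, so they are not shown here.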
-------------------------------------------------------
Author: bigfish99
Blog: https://www.cnblogs.com/bigfish0506/
WeChat official account: 大鱼嵌入式