This panic was found during CTS testing. The panic log is as follows:
[ 2212.531425] c3 3279 (logcat) Unable to handle kernel paging request at virtual address 2b2c2c2b2b292a2a
[ 2212.541032] c3 3279 (logcat) pgd = ffffffc00d5f5000
[ 2212.545910] [2b2c2c2b2b292a2a] *pgd=0000000000000000
[ 2212.550992] c3 3279 (logcat) Internal error: Oops: 96000044 [#1] PREEMPT SMP
[ 2212.557983] Modules linked in: sd8777 mlan8777 audiostub cidatattydev gs_modem ccinetdev cci_datastub citty iml_module seh cploaddev msocketk tzdd galcore(O) [last unloaded: mbt8777]
[ 2212.574228] c3 3279 (logcat) CPU: 3 PID: 3279 Comm: logcat Tainted: G O 3.10.33 #1
[ 2212.582601] c3 3279 (logcat) task: ffffffc0132d09c0 ti: ffffffc01ed20000 task.ti: ffffffc01ed20000
[ 2212.591495] c3 3279 (logcat) PC is at memcpy+0xc0/0x180
[ 2212.596680] c3 3279 (logcat) LR is at tty_insert_flip_string_fixed_flag+0x78/0xcc
[ 2212.604102] c3 3279 (logcat) pc : [] lr : [ ] pstate: 80000145
[ 2212.612903] c3 3279 (logcat) sp : ffffffc01ed23c90
[ 2212.617650] R29: ffffffc01ed23c90 R28: ffffffc01f8b8000
[ 2212.622930] R27: 0000000000000067 R26: 0000000000000000
[ 2212.628211] R25: 0000000000000700 R24: ffffffc0287cfe00
[ 2212.633493] R23: ffffffc02e9ee000 R22: 0000000000000067
[ 2212.638774] R21: 0000000000000067 R20: 0000000000000067
[ 2212.644055] R19: ffffffc00f0cbc00 R18: 0000000000000000
[ 2212.649346] R17: 0000000000000000 R16: ffffffc000192da4
[ 2212.654626] R15: 0000000000000000 R14: 00000000f6ff9eaf
[ 2212.659908] R13: 00000000fff32670 R12: 00000000fff32678
[ 2212.665189] R11: 00000000aac2ff00 R10: 0000000000000000
[ 2212.670470] R9 : 00000001ffffffff R8 : 362e30333a31313a
[ 2212.675752] R7 : 35302030322d3131 R6 : 2b2c2c2b2b292a2a
[ 2212.681032] R5 : ffffffc000361ac8 R4 : 0000000000000000
[ 2212.686322] R3 : 2b2c2c2b2b292a2a R2 : ffffffffffffffe7
[ 2212.691604] R1 : ffffffc02e9ee010 R0 : 2b2c2c2b2b292a2a
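The key information is in a few lines of the log above. The kernel faulted while trying to access a very strange address (2b2c2c2b2b292a2a), and R0 holds exactly this value. On ARM, R0 is normally used to pass the first argument of a function. Let's analyze PC and LR to get more information.
Use the addr2line tool to locate the code that was running at the time of the panic: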
aarch64-linux-gnu-addr2line -e vmlinux ffffffc000300180
??:?
aarch64-linux-gnu-addr2line -e vmlinux ffffffc000367058
/home/buildfarm/aabs/src.pxa1928-kk4.4.beta2/kernel/drivers/tty/tty_buffer.c:269
The PC could not be resolved, but the LR could:
256 int tty_insert_flip_string_fixed_flag(struct tty_port *port,
257                 const unsigned char *chars, char flag, size_t size)
258 {
259         int copied = 0;
260         do {
261                 int goal = min_t(size_t, size - copied, TTY_BUFFER_PAGE);
262                 int space = tty_buffer_request_room(port, goal);
263                 struct tty_buffer *tb = port->buf.tail;
264                 /* If there is no space then tb may be NULL */
265                 if (unlikely(space == 0)) {
266                         break;
267                 }
268                 memcpy(tb->char_buf_ptr + tb->used, chars, space);
269                 memset(tb->flag_buf_ptr + tb->used, flag, space);
270                 tb->used += space;
271                 copied += space;
272                 chars += space;
273                 /* There is a small chance that we need to split the data over
274                    several buffers. If this is the case we must loop */
275         } while (unlikely(size > copied));
276         return copied;
277 }
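It is easy to see that the kernel faulted while executing memcpy. The LR resolves to line 269 because it is the return address, that is, the instruction right after the memcpy call on line 268. The PC could not be resolved because memcpy is a library function (on arm64 it is implemented in assembly, so it presumably carries no source line information). R0 should therefore be tb->char_buf_ptr + tb->used. Let's disassemble this function to look for more clues.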
ffffffc000366fe0:
ffffffc000366fe0: a9ba7bfd stp x29, x30, [sp,#-96]!
ffffffc000366fe4: 910003fd mov x29, sp
ffffffc000366fe8: a90363f7 stp x23, x24, [sp,#48]
ffffffc000366fec: a9046bf9 stp x25, x26, [sp,#64]
ffffffc000366ff0: a9025bf5 stp x21, x22, [sp,#32]
ffffffc000366ff4: f9002bfb str x27, [sp,#80]
ffffffc000366ff8: a90153f3 stp x19, x20, [sp,#16]
ffffffc000366ffc: aa0003f8 mov x24, x0
ffffffc000367000: aa0103f7 mov x23, x1
ffffffc000367004: 53001c5a uxtb w26, w2
ffffffc000367008: aa0303fb mov x27, x3
ffffffc00036700c: 52800015 mov w21, #0x0 // #0
ffffffc000367010: d2800004 mov x4, #0x0 // #0
ffffffc000367014: d280e019 mov x25, #0x700 // #1792
ffffffc000367018: cb040361 sub x1, x27, x4
ffffffc00036701c: f11c003f cmp x1, #0x700
ffffffc000367020: 9a999021 csel x1, x1, x25, ls
ffffffc000367024: aa1803e0 mov x0, x24
ffffffc000367028: 97ffff44 bl ffffffc000366d38
ffffffc00036702c: 93407c16 sxtw x22, w0
ffffffc000367030: 2a0003f4 mov w20, w0
ffffffc000367034: aa1703e1 mov x1, x23
ffffffc000367038: aa1603e2 mov x2, x22
ffffffc00036703c: f9401b13 ldr x19, [x24,#48]
ffffffc000367040: 34000260 cbz w0, ffffffc00036708c
ffffffc000367044: f9400663 ldr x3, [x19,#8]
ffffffc000367048: b9801a60 ldrsw x0, [x19,#24]
ffffffc00036704c: 0b1402b5 add w21, w21, w20
ffffffc000367050: 8b000060 add x0, x3, x0
ffffffc000367054: 97fe641b bl ffffffc0003000c0 // this is where the kernel panicked
ffffffc000367058: f9400a62 ldr x2, [x19,#16] // this is LR, the return address
ffffffc00036705c: b9801a60 ldrsw x0, [x19,#24]
ffffffc000367060: 2a1a03e1 mov w1, w26
ffffffc000367064: 8b000040 add x0, x2, x0
ffffffc000367068: aa1603e2 mov x2, x22
ffffffc00036706c: 97fe64d5 bl ffffffc0003003c0
ffffffc000367070: b9401a60 ldr w0, [x19,#24]
ffffffc000367074: 93407ea4 sxtw x4, w21
ffffffc000367078: 0b140014 add w20, w0, w20
ffffffc00036707c: b9001a74 str w20, [x19,#24]
ffffffc000367080: eb04037f cmp x27, x4
ffffffc000367084: 8b1602f7 add x23, x23, x22
ffffffc000367088: 54fffc88 b.hi ffffffc000367018
ffffffc00036708c: 2a1503e0 mov w0, w21
ffffffc000367090: a94153f3 ldp x19, x20, [sp,#16]
ffffffc000367094: a9425bf5 ldp x21, x22, [sp,#32]
ffffffc000367098: a94363f7 ldp x23, x24, [sp,#48]
ffffffc00036709c: a9446bf9 ldp x25, x26, [sp,#64]
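From the assembly, X0 (that is, R0) is computed as X3 + X0, and both X3 and X0 are loaded from the address in X19 plus an offset (#8 and #24). Comparing with the source code, it is easy to conclude that X19 is the pointer to the tty_buffer structure, which is defined as follows: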
struct tty_buffer {
        struct tty_buffer *next;
        char *char_buf_ptr;
        unsigned char *flag_buf_ptr;
        int used;
        int size;
        int commit;
        int read;
        /* Data points here */
        unsigned long data[0];
};
From the panic log above, X19 is ffffffc00f0cbc00. Use the crash tool to look at the contents of this address:
crash> struct tty_buffer 0xffffffc00f0cbc00
struct tty_buffer {
  next = 0x0,
  char_buf_ptr = 0x2b2c2c2b2b292a2a,
  flag_buf_ptr = 0x2a2a2a2b2c2a2b2b,
  used = 0,
  size = 690695211,
  commit = 0,
  read = 0,
  data = 0xffffffc00f0cbc28
}
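Bingo, this is where the strange address comes from: it really does come from the tty_buffer. Why has what should be a normal address turned into such a strange value? Most likely the memory was overwritten. The value also seems to follow a pattern, so, on a hunch, let's look at the memory around this buffer.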
rd 0xffffffc00f0cac00 1000
...
ffffffc00f0cb420: 000000a8000000a8 35302030322d3131 ........11-20 05
ffffffc00f0cb430: 362e30333a31313a 3634353520203333 :11:30.633  5546
ffffffc00f0cb440: 4420323836352020 6d61436c76724d20   5682 D MrvlCam
ffffffc00f0cb450: 6e69676e45617265 69666e6f43203a65 eraEngine: Confi
ffffffc00f0cb460: 696c657069505f67 6e6f4365203a656e g_Pipeline: eCon
ffffffc00f0cb470: 2c5d305b74786574 6172656d61436520 text[0], eCamera
ffffffc00f0cb480: 5b646e616d6d6f43 654e62202c5d3131 Command[11], bNe
ffffffc00f0cb490: 305b326e69426465 6c69745362202c5d edBin2[0], bStil
ffffffc00f0cb4a0: 507463656666416c 6950776569766572 lAffectPreviewPi
ffffffc00f0cb4b0: 305b656e696c6570 62616e4562202c5d peline[0], bEnab
ffffffc00f0cb4c0: 756f53524444656c 0a0d5d305b656372 leDDRSource[0]..
ffffffc00f0cb4d0: 6e69676e45617265 6f74535f5b203a65 eraEngine: [_Sto
ffffffc00f0cb4e0: 206d616572745370 657250203e2d2d2d pStream ---> Pre
ffffffc00f0cb4f0: 726f502077656976 61430a0d0a0d5d74 view Port]....Ca
ffffffc00f0cb500: 647261486172656d 6573614265726177 meraHardwareBase
ffffffc00f0cb510: 6c62616e652d203a 6570795467734d65 : -enableMsgType
ffffffc00f0cb520: 300a0d5d0a0d7820 0000000000000000  x..]..0........
...
ffffffc00f0cc530: e7e7e6e7e7e7e7e7 dcdce1e5e6e6e6e6 ................
ffffffc00f0cc540: dae2e3d9cac2d7e3 dfe3dec7ccd1dde0 ................
ffffffc00f0cc550: dbcbc8cac9c5d2d7 e0dfcbc0c9d9dbd9 ................
ffffffc00f0cc560: d5cec1c2cfdcd8dc cbc7c7c0d4d8d7db ................
ffffffc00f0cc570: d0b5bbc0ced3dbd4 dedfdfdedbd6dbdc ................
ffffffc00f0cc580: dedededad4d7dede dbd9dcdedfdfdfde ................
ffffffc00f0cc590: dddddedddddedddd c3c1c3c9d0d9dddc ................
ffffffc00f0cc5a0: dbdbdcdadbdbdad3 d9dadbdad8cbcdd8 ................
ffffffc00f0cc5b0: dbdbdbdadbd9dada dbdadbdbdbdbdcdb ................
ffffffc00f0cc5c0: ccc3d3d9d9dadadb d1d1d1d1d2d1cfcd ................
ffffffc00f0cc5d0: d1d1d2d2d1d1d1d0 bfcbd3d5d5d4d3d2 ................
ffffffc00f0cc5e0: d1c9c3ccd2d0c4c0 b5b5b7b8b8c9d2d3 ................
ffffffc00f0cc5f0: 444f7ca6b1b4b5b4 3a3a363536373b3e .....|OD>;7656::
ffffffc00f0cc600: 2a2a2b2c2c2b2c2b 2b2c2a2b2a292b2a +,+,,+***+)*+*,+
ffffffc00f0cc610: 2b2b2a292b2b2a2a 282a2a2a2c2c2b2b **++)*++++,,***(
ffffffc00f0cc620: 1414161f25282828 1314121312131413 (((%............
ffffffc00f0cc630: 1414131313141411 1918171616151614 ................
ffffffc00f0cc640: 1b19191919181717 28262423211f1d1c ...........!#$&(
ffffffc00f0cc650: 34353433302f2d2a 3e3d3d3d3c3b3636 *-/0345466;<===>
ffffffc00f0cc660: 333434373a3c3c3d 3132313231303133 =<<:744331012121
ffffffc00f0cc670: 2c2c2d2e2f2e2e2e 2628292929292a2a .../.-,,**))))(&
ffffffc00f0cc680: 2b2b2b292a292728 2c2d2d2d2d2e2d2c (')*)+++,-.----,
ffffffc00f0cc690: 2a2c2b2d2d2d2d2c 282828292829292b ,----+,*+))()(((
ffffffc00f0cc6a0: 2626252725252728 2728292828262625 ('%%'%&&%&&(()('
ffffffc00f0cc6b0: 2c2c2c2c2c2b2928 2d2b2b2b2d2d2d2c ()+,,,,,,---+++-
ffffffc00f0cc6c0: 2c2b2b2b2c2b2c2c 2c2c2c2b2c2c2a2c ,,+,+++,,*,,+,,,
ffffffc00f0cc6d0: 2c2c2b2b2c2b2b2c 2e2e2e2e2e2c2c2d ,++,++,,-,,.....
ffffffc00f0cc6e0: 2f2f2f302e2f2f2e 3b38373432312f2f .//.0/////12478;
ffffffc00f0cc6f0: 454545444543423f 4340404041444445 ?BCEDEEEEDDA@@@C
ffffffc00f0cc700: 4647474847454243 4342424143444446 CBEGHGGFFDDCABBC
ffffffc00f0cc710: 4a48474546464345 8a837a736b635b51 ECFFEGHJQ[cksz..
ffffffc00f0cc720: 9c9c9b9a9695938e 9f9f9f9e9e9d9d9c ................
ffffffc00f0cc730: a0a1a09f9f9f9e9e 9f9f9ea0a0a1a0a1 ................
ffffffc00f0cc740: a09f9e9e9fa0a09e a0a0a0a0a0a0a0a0 ................
ffffffc00f0cc750: a1a0a0a2a1a1a0a0 a0a0a1a1a0a0a0a0 ................
ffffffc00f0cc760: a3a2a2a2a1a1a1a1 9fa0a0a0a1a2a1a2 ................
ffffffc00f0cc770: 9f9e9e9e9f9fa09f a1a0a1a09fa09f9e ................
ffffffc00f0cc780: 9fa0a0a1a1a1a0a0 a0a0a0a1a1a0a09f ................
ffffffc00f0cc790: a0a1a1a0a0a0a0a0 9fa0a0a1a0a1a19f ................
ffffffc00f0cc7a0: 9e9e9f9e9f9ea09f 9b9c9c9c9c9d9e9e ................
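This is a major clue. The nearby memory contains camera-related strings, and most of the contents look very similar to the earlier R0 value. The data appears to change gradually, and smoothly varying data is exactly what camera image data looks like. So I called the camera team over to confirm, and it is indeed image data. As for why it overwrote the tty_buffer memory: the camera side recently enabled the MMU as well, and there is obviously a bug in it. What exactly the bug is remains unclear for now; I will add an update once it has been tracked down.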