Bug 907039

Summary: i915 drm driver complains: *ERROR* pipe A/B underrun
Product: [openSUSE] openSUSE Distribution
Reporter: Jean Delvare <jdelvare>
Component: Kernel
Assignee: E-mail List <kernel-maintainers>
Status: RESOLVED UPSTREAM
QA Contact: E-mail List <qa-bugs>
Severity: Minor
Priority: P5 - None
CC: eich, jdelvare, joschibrauchle, jslaby, tiwai
Version: 13.2
Target Milestone: ---
Hardware: i686
OS: openSUSE 13.2
URL: https://bugzilla.kernel.org/show_bug.cgi?id=81711

Description Jean Delvare 2014-11-25 10:43:42 UTC
I see pipe underrun messages on my old Panasonic Toughbook CF-18 laptop after updating to openSUSE 13.2. Before that it was running openSUSE 11.4, and I never saw these error messages.

The graphics chip in my laptop is:
00:02.0 VGA compatible controller [0300]: Intel Corporation 82852/855GM Integrated Graphics Device [8086:3582] (rev 02)
00:02.1 Display controller [0380]: Intel Corporation 82852/855GM Integrated Graphics Device [8086:3582] (rev 02)

No errors or warnings are visible in Xorg.0.log.

Relevant kernel log messages:
[    5.462859] [drm] Initialized drm 1.1.0 20060810
[    5.516140] [drm] Memory usable by graphics device = 128M
[    5.516146] [drm] Replacing VGA console driver
[    5.516158] fb: switching to inteldrmfb from EFI VGA
[    5.521916] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    5.521924] [drm] Driver supports precise vblank timestamp query.
[    5.580087] [drm] GMBUS [i915 gmbus panel] timed out, falling back to bit banging on pin 3
[    5.599601] [drm:i9xx_set_fifo_underrun_reporting] *ERROR* pipe A underrun
[    5.619742] [drm] initialized overlay support
[    5.698544] [drm:i9xx_set_fifo_underrun_reporting] *ERROR* pipe A underrun
[    5.780209] fbcon: inteldrmfb (fb0) is primary device
[    6.592586] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[    6.592616] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[    6.642304] [drm:i9xx_check_fifo_underruns] *ERROR* pipe B underrun
[    6.725855] [drm:i9xx_set_fifo_underrun_reporting] *ERROR* pipe A underrun
[   47.458008] [drm:i9xx_set_fifo_underrun_reporting] *ERROR* pipe A underrun
[   47.736185] [drm:i9xx_set_fifo_underrun_reporting] *ERROR* pipe A underrun
[   48.686871] [drm:i9xx_set_fifo_underrun_reporting] *ERROR* pipe A underrun
[   75.032451] [drm:i9xx_set_fifo_underrun_reporting] *ERROR* pipe A underrun
[   75.428211] [drm:i9xx_set_fifo_underrun_reporting] *ERROR* pipe A underrun
[ 1402.930605] [drm:i8xx_irq_handler] *ERROR* pipe B underrun
[ 5332.865613] [drm:i8xx_irq_handler] *ERROR* pipe B underrun
[ 6240.731628] [drm:i8xx_irq_handler] *ERROR* pipe B underrun
Comment 1 Jean Delvare 2014-11-25 10:44:34 UTC
I think this corresponds to upstream kernel bug #81711.
Comment 2 Egbert Eich 2014-11-25 12:21:57 UTC
Looks like a bisect would be the thing to do. To expedite this, have you tried a SLE12 kernel on this?
Comment 3 Jean Delvare 2014-11-25 12:50:25 UTC
(In reply to Egbert Eich from comment #2)
> (...) have you tried a SLE12 kernel on this?

Not yet, but I can do that.
Comment 4 Egbert Eich 2014-11-25 13:07:28 UTC
Also, you may want to check whether the bug is gone in the latest upstream kernel snapshot.
If it is, there is no point discussing this upstream.
Comment 5 Takashi Iwai 2014-11-25 15:29:35 UTC
This is very likely the effect of this commit in the 3.14 kernel:
commit fc2c807b7a2b2ca8dbe2aed2f5ae730c19beeda5
    drm/i915: Make underruns DRM_ERROR

This changes the underrun reporting from DRM_DEBUG_DRIVER() to DRM_ERROR().  And I bet that the underrun itself has existed for a long time but was never shown as an error until this change.

That said, it's no real regression and can mostly be ignored by an end user (unless you really see something wrong :)
Comment 6 Jean Delvare 2014-11-25 17:03:05 UTC
Well, the problem is that this happens at boot time and causes the nice openSUSE picture to be replaced by white-on-black error messages. That's not exactly user-friendly.

If these are really only debug / info / warning messages and not actual errors then the log level should be adjusted accordingly.
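
To illustrate why the log level matters for the splash screen, here is a minimal Python sketch of the console filtering rule: the kernel prints a message on the console only if its severity number is lower than the console log level. The numeric levels are the standard printk ones. The assumption, which is not stated in this report, is that the machine boots with the `quiet` option and that this sets the console log level to 4. `reaches_console` is a hypothetical helper name, not a kernel API.

```python
# Kernel printk severity levels (lower number = more severe).
KERN_LEVELS = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}

# Assumption: the boot uses "quiet", which sets the console log level to 4.
QUIET_CONSOLE_LOGLEVEL = 4


def reaches_console(level_name, console_loglevel=QUIET_CONSOLE_LOGLEVEL):
    """Return True if a message at this level would be printed on the console.

    A message is shown only if its level is numerically lower than the
    console log level.
    """
    return KERN_LEVELS[level_name] < console_loglevel


if __name__ == "__main__":
    for name in ("err", "warning", "info", "debug"):
        print(name, reaches_console(name))
```

Under that assumption only the error-level message gets through, which would be consistent with the request to lower the level of these messages.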
Comment 7 Egbert Eich 2014-11-25 18:45:27 UTC
The Intel driver is full of WARN_ONs which indicate inconsistencies which don't seem to matter in real life. Still they produce backtraces which alarm users and make them report L3 bugs.
I've asked Intel in Bordeaux to make those messages configurable but they did not want to do that.
I did spend quite some time fixing some of the issues I had seen or had been made aware of. I'm afraid, however, that FIFO underruns are beyond what I can do - I don't have detailed documentation on watermarks and their tuning.
Comment 8 Jean Delvare 2014-11-26 09:11:22 UTC
I couldn't test the SLE12 kernel because this is a 32-bit machine, however I could test the upcoming Evergreen kernel which is close enough (based on 3.12.32.) The pipe underrun error messages aren't there.

Now I'll test Takashi's theory.
Comment 9 Jean Delvare 2014-11-27 11:01:48 UTC
Hmmm, no, with "drm/i915: Make underruns DRM_ERROR" as the top commit, I do not get the underrun error messages. So it seems that the problem was introduced later.

Looks like I'm up for a full bisection.
Comment 10 Jean Delvare 2014-11-28 08:06:28 UTC
Oh well, I spoke a bit too fast. There are actually two different occurrences of the pipe underrun. One is happening at run-time and was already present at the time of commit "drm/i915: Make underruns DRM_ERROR". The other one is happening at boot-time and was not present at that time. For now I am bisecting the boot-time one, because it is a regression so it is probably easier to fix.
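
For readers unfamiliar with bisection, the idea is a binary search over the commit history for the first commit that shows the problem. The following Python sketch only illustrates that search on a made-up linear history; it is not how the bisection in this report was actually run. The commit labels, `FIRST_BAD_INDEX` and `shows_boot_underrun` are all hypothetical stand-ins for "build this kernel, boot it, and check the log".

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the first one is good,
    the last one is bad, and every commit after the first bad one is bad.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]


# Hypothetical linear history; the labels are placeholders, not real commits.
HISTORY = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
FIRST_BAD_INDEX = 5  # made up for the example


def shows_boot_underrun(commit):
    # Stand-in for building and booting the kernel at this commit.
    return HISTORY.index(commit) >= FIRST_BAD_INDEX


if __name__ == "__main__":
    print(first_bad(HISTORY, shows_boot_underrun))
```

Each step halves the remaining range, so eight candidate commits need three test boots in this example.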
Comment 11 Jean Delvare 2014-12-02 10:02:55 UTC
I bisected this issue and here are my results.

First of all, there are two sets of error messages to consider: the one from i9xx_set_fifo_underrun_reporting, and the rest.

The error message from i9xx_set_fifo_underrun_reporting appeared with commit e69abff0d6794311d834de0fa2f188eb24a977b9 ("drm/i915: Check for FIFO underuns when disabling reporting on gmch platforms" in v3.16.) This commit is merely adding detection and reporting of the error condition, so this is not a regression per se.

The other messages appeared with commit fc2c807b7a2b2ca8dbe2aed2f5ae730c19beeda5 ("drm/i915: Make underruns DRM_ERROR" in v3.15.) Again this commit is only making the reporting of the error conditions unconditional, so this is not a regression per se.

In summary, the underruns have always been there; they just were not reported as errors before.
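
The split into two sets of messages can be reproduced from a saved kernel log. The following is a minimal Python sketch, not part of any tool mentioned here: it counts underrun messages per reporting function and pipe, assuming the line format shown in the description. `count_underruns` and `SAMPLE` are hypothetical names; the sample lines are copied from the log in the description.

```python
import re
from collections import Counter

# Matches lines such as:
# [    5.599601] [drm:i9xx_set_fifo_underrun_reporting] *ERROR* pipe A underrun
UNDERRUN_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm:(?P<func>\w+)\]\s+"
    r"\*ERROR\*\s+pipe (?P<pipe>[A-Z]) underrun"
)


def count_underruns(log_text):
    """Count underrun messages per (reporting function, pipe)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = UNDERRUN_RE.search(line)
        if match:
            counts[(match.group("func"), match.group("pipe"))] += 1
    return counts


SAMPLE = """\
[    5.599601] [drm:i9xx_set_fifo_underrun_reporting] *ERROR* pipe A underrun
[    5.619742] [drm] initialized overlay support
[    6.642304] [drm:i9xx_check_fifo_underruns] *ERROR* pipe B underrun
[ 1402.930605] [drm:i8xx_irq_handler] *ERROR* pipe B underrun
"""

if __name__ == "__main__":
    for (func, pipe), n in sorted(count_underruns(SAMPLE).items()):
        print(f"{func}: pipe {pipe}: {n}")
```

Grouping by function name makes it easy to compare logs from different kernel versions and see which reporting path produced each message.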

I admit I find it strange that messages that were reported at debug level before were suddenly considered so important that they needed to be reported at error level. I understand that the driver developers wanted to see these messages without having to turn debugging on, but I believe that info level would be more appropriate. The reason why I am suggesting that is that these early error messages kill the nice splash screen at boot time, while info messages would not.
Comment 12 Jean Delvare 2014-12-02 10:04:23 UTC
(In reply to Egbert Eich from comment #4)
> Also, you may want to try if the bug is gone in the latest upstream kernel
> snapshot.

The problem is still present upstream.
Comment 13 Egbert Eich 2014-12-03 18:22:05 UTC
(In reply to Jean Delvare from comment #11)

> I admit I find it strange that messages that were reported at debug level
> before were suddenly considered so important that they needed to be reported
> at error level. I understand that the driver developers wanted to see these
> messages without having to turn debugging on, but I believe that info level
> would be more appropriate. The reason why I am suggesting that is that these
> early error messages kill the nice splash screen at boot time, while info
> messages would not.

This is true, and I talked about that to the Intel folks at XDC. What annoys me even more are the many WARN_ONs which report mismatches between expected and actual settings. These WARN_ONs generate backtraces which look like oopses to the semi-informed, so many users report kernel oopses although they have not discovered a real problem.
I have fixed some of them myself in the past - mostly when I have actually seen a real issue.

Having said this, the platform you are looking at is actually fairly old: i855 came out over 10 years ago. The reason that this bug has not been looked at by Intel is that they don't have much interest (and thus don't want to spend time on this).
You may have to look into this yourself if you want to see this fixed in a timely manner.

There seem to be 2 reasons for these messages:

1. Looking at the commit logs, there are conditions where spurious underruns may occur. For Gen2, check commit 4a3436e85ccc2925f4ee7e363131107bb00aab77.
People have been working hard to avoid these spurious reports. There may still be cases on older platforms, though, as those are not much in focus.

2. Watermark values are wrong for the mode you are using. Are you using an external monitor with a higher resolution? Maybe you want to check whether these messages go away if you use only the internal display or reduce the resolution of the external one. If the messages disappear, then we have a pretty good indication that this is what is happening. You could then go and play with the watermarks (WMs). Commit e95a2f7509f5219177d6821a0a8754f93892ca56 may give you some ideas what to do.
Comment 14 Takashi Iwai 2015-01-16 14:39:17 UTC
There is no direct fix for this bug, but a relevant fix for gen5+ went upstream in commit b68362278af94e1171f5be9d4e44988601fb0439
  drm/i915: More cautious with pch fifo underruns

I have now backported it to the openSUSE-13.2 branch.
Comment 15 Joschi Brauchle 2015-02-11 09:28:07 UTC
Hi there,

we are seeing this:
-----------------
kernel: [drm:cpt_set_fifo_underrun_reporting] *ERROR* uncleared pch fifo underrun on pch transcoder A
kernel: [drm:cpt_serr_int_handler] *ERROR* PCH transcoder A FIFO underrun
-----------------
with a 
-----------------
09: PCI 02.0: 0300 VGA compatible controller (VGA)              
  [Created at pci.328]
  Unique ID: _Znp.B56NdSxIhXD
  SysFS ID: /devices/pci0000:00/0000:00:02.0
  SysFS BusID: 0000:00:02.0
  Hardware Class: graphics card
  Model: "Intel 3rd Gen Core processor Graphics Controller"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x0166 "3rd Gen Core processor Graphics Controller"
  SubVendor: pci 0x10cf "Fujitsu Limited."
  SubDevice: pci 0x16c1 
  Revision: 0x09
  Driver: "i915"
  Driver Modules: "drm"
  Memory Range: 0xf0000000-0xf03fffff (rw,non-prefetchable)
  Memory Range: 0xe0000000-0xefffffff (ro,non-prefetchable)
  I/O Ports: 0x3000-0x303f (rw)
  IRQ: 42 (61 events)
  I/O Ports: 0x3c0-0x3df (rw)
  Module Alias: "pci:v00008086d00000166sv000010CFsd000016C1bc03sc00i00"
  Driver Info #0:
    Driver Status: i915 is active
    Driver Activation Cmd: "modprobe i915"
  Config Status: cfg=new, avail=yes, need=no, active=unknown

Primary display adapter: #9
-----------------

It is not 100% identical to the reported message in c#1, but possibly it's the same 'warning-is-now-considered-error' problem?

There are further references here: https://bugzilla.kernel.org/show_bug.cgi?id=79261

I can open a separate BZ if needed.

Has the mentioned commit b68362278af94e1171f5be9d4e44988601fb0439 been released on 13.2 yet?
Comment 16 Takashi Iwai 2015-02-11 09:55:23 UTC
(In reply to Joschi Brauchle from comment #15)
> Hi there,
> 
> we are seeing this:
> -----------------
> kernel: [drm:cpt_set_fifo_underrun_reporting] *ERROR* uncleared pch fifo
> underrun on pch transcoder A
> kernel: [drm:cpt_serr_int_handler] *ERROR* PCH transcoder A FIFO underrun
> -----------------
> with a 
> -----------------
> 09: PCI 02.0: 0300 VGA compatible controller (VGA)              
>   [Created at pci.328]
>   Unique ID: _Znp.B56NdSxIhXD
>   SysFS ID: /devices/pci0000:00/0000:00:02.0
>   SysFS BusID: 0000:00:02.0
>   Hardware Class: graphics card
>   Model: "Intel 3rd Gen Core processor Graphics Controller"
>   Vendor: pci 0x8086 "Intel Corporation"
>   Device: pci 0x0166 "3rd Gen Core processor Graphics Controller"
>   SubVendor: pci 0x10cf "Fujitsu Limited."
>   SubDevice: pci 0x16c1 
>   Revision: 0x09
>   Driver: "i915"
>   Driver Modules: "drm"
>   Memory Range: 0xf0000000-0xf03fffff (rw,non-prefetchable)
>   Memory Range: 0xe0000000-0xefffffff (ro,non-prefetchable)
>   I/O Ports: 0x3000-0x303f (rw)
>   IRQ: 42 (61 events)
>   I/O Ports: 0x3c0-0x3df (rw)
>   Module Alias: "pci:v00008086d00000166sv000010CFsd000016C1bc03sc00i00"
>   Driver Info #0:
>     Driver Status: i915 is active
>     Driver Activation Cmd: "modprobe i915"
>   Config Status: cfg=new, avail=yes, need=no, active=unknown
> 
> Primary display adapter: #9
> -----------------
> 
> It is not 100% identical to the reported message in c#1, but possibly its
> the same 'warning-is-now-considered-error' problem?

The reason why it appears now is the same: the message was promoted to an error.  But the cause is definitely different from Jean's case, so there is no need to handle it in the same bug.
 
> There are further references here:
> https://bugzilla.kernel.org/show_bug.cgi?id=79261
> 
> I can open a separate BZ if needed.
> 
> Has the mentioned commit b68362278af94e1171f5be9d4e44988601fb0439 been
> released on 13.2 yet?

The patch was already backported, but it won't cover your messages, as these appear even on the 3.19 kernel.  That implies that it's no help to track this on this bugzilla.  Rather report it to the upstream bugzilla (either freedesktop.org or kernel.org) and track it there with the latest kernel.
Comment 17 Joschi Brauchle 2015-02-11 10:02:10 UTC
> The patch was already backported, but it won't cover your messages, as these
> appear even on 3.19 kernel.  That implies that it's no help to track this on
> this bugzilla.  Rather report to upstream bugzilla (either freedesktop.org
> or kernel.org) and track there with the latest kernel.

Thanks! Will do.
Comment 18 Swamp Workflow Management 2015-04-13 12:06:51 UTC
openSUSE-SU-2015:0713-1: An update that solves 13 vulnerabilities and has 52 fixes is now available.

Category: security (important)
Bug References: 867199,893428,895797,900811,901925,903589,903640,904899,905681,907039,907818,907988,908582,908588,908589,908592,908593,908594,908596,908598,908603,908604,908605,908606,908608,908610,908612,909077,909078,909477,909634,910150,910322,910440,911311,911325,911326,911356,911438,911578,911835,912061,912202,912429,912705,913059,913466,913695,914175,915425,915454,915456,915577,915858,916608,917830,917839,918954,918970,919463,920581,920604,921313,922542,922944
CVE References: CVE-2014-8134,CVE-2014-8160,CVE-2014-8559,CVE-2014-9419,CVE-2014-9420,CVE-2014-9428,CVE-2014-9529,CVE-2014-9584,CVE-2014-9585,CVE-2015-0777,CVE-2015-1421,CVE-2015-1593,CVE-2015-2150
Sources used:
openSUSE 13.2 (src):    bbswitch-0.8-3.6.6, cloop-2.639-14.6.6, crash-7.0.8-6.6, hdjmod-1.28-18.7.6, ipset-6.23-6.6, kernel-docs-3.16.7-13.2, kernel-obs-build-3.16.7-13.7, kernel-obs-qa-3.16.7-13.1, kernel-obs-qa-xen-3.16.7-13.1, kernel-source-3.16.7-13.1, kernel-syms-3.16.7-13.1, pcfclock-0.44-260.6.2, vhba-kmp-20140629-2.6.2, virtualbox-4.3.20-10.2, xen-4.4.1_08-12.2, xtables-addons-2.6-6.2
Comment 19 Jiri Slaby 2015-05-28 13:33:37 UTC
Thanks.