Some tips on getting to the reason for the recent suspend crashes

Technical issues/questions of an intermediate or advanced nature.
User avatar
Rava
Contributor
Contributor
Posts: 5401
Joined: 11 Jan 2011, 02:46
Distribution: XFCE 5.01 x86_64 + 4.0 i586
Location: Forests of Germany

Some tips on getting to the reason for the recent suspend crashes

Post#1 by Rava » 16 Nov 2022, 17:07

Edit
should I move this into Bug reports?


Hi all,

after some update that changed something in my system, my Port 5.0 x86-64 XFCE 4.16 system crashed when trying to suspend every one or two days, and usually not even REISUB is helping and I need to do a hard reset (pressing the power button for 10 seconds).
My system:

Code: Select all

root@porteus:~# cat /etc/porteus/*
001-core.xzm:20221006
002-xorg.xzm:20221006
002-xtra.xzm:20220630
003-xfce.xzm:20220925
initrd.xz:20220928
I already monitor and save the log via my script as described here: [HOWTO] save log files when using no changes / always fresh approach in Port 3,4,5. since I not use the changes= approach.

For now I checked if not going online makes a difference ( theory was: maybe some nasty stuff got behind the more on the easy side of things rc.Fireall ) but no, the crash still happened.

Thee one program I updated recently was palemoon, but palemoon is not running during suspend because it occupies too much RAM (I only have 4 GB) so PM is very unlikely the culprit.

I realized after the newest core updates that starting X takes seemingly longer (I boot in mode 3 to initialize the above "[HOWTO] save log files" and manually start X by a guest virtual console login and using

Code: Select all

startx
- and there is also a screen distortion (big black and white boxes on the screen for a half-second) during start of XFCE that did not happen in older system (either due to me using XFCE 4.12 or older 000- 001- core modules.)


What are you suggesting I try changing first?
002-xtra.xzm is from 20220630 and thus cannot be the issue.

Should I exchange only one of these modules for the older one:
001-core.xzm:20221006
002-xorg.xzm:20221006
003-xfce.xzm:20220925
and see if that gets me to the core of the issue?

And which order would you suggest changing a module?

I think only exchanging one module at a time makes more sense than exchanging 2 or even all three.

Not loading any additional modules makes no sense since then having the system turned on serves no purpose, I need my interlink for emails and palemoon for the net.

The saved log /var/log/messages never tells me anything helpful.

These are the last 2 crashes.
This is the most recent crash, the one where I did not go online:

Code: Select all

Nov 16 07:45:26 porteus NetworkManager[1603]: <info>  [1668581126.0748] manager: sleep: sleep requested (sleeping: no  enabled: yes)
Nov 16 07:45:26 porteus NetworkManager[1603]: <info>  [1668581126.0749] device (eth0): state change: unavailable -> unmanaged (reason 'sleeping', sys-iface-state: 'managed')
Nov 16 07:45:26 porteus ModemManager[1647]: <info>  [sleep-monitor] system is about to suspend 
●● sleeping 10 seconds for loginctl suspend to background-suspend. ●●
Nov 16 07:45:26 porteus NetworkManager[1603]: <info>  [1668581126.0907] manager: NetworkManager state is now ASLEEP
Nov 16 07:45:26 porteus elogind-daemon[1275]: Suspending system...
Nov 16 07:45:26 porteus kernel: PM: suspend entry (deep)
Nov 16 07:45:26 porteus kernel: Filesystems sync: 0.001 seconds
The line starting with ●● is added by my suspend script.

And the one crash before that:

Code: Select all

●● sleeping 10 seconds for loginctl suspend to background-suspend. ●●
Nov 15 14:02:57 porteus NetworkManager[1622]: <info>  [1668517377.0868] manager:
 sleep: sleep requested (sleeping: no  enabled: yes)
Nov 15 14:02:57 porteus NetworkManager[1622]: <info>  [1668517377.1026] device (
eth0): state change: unavailable -> unmanaged (reason 'sleeping', sys-iface-stat
e: 'managed')
Nov 15 14:02:57 porteus ModemManager[1669]: <info>  [sleep-monitor] system is ab
out to suspend 
Nov 15 14:02:57 porteus NetworkManager[1622]: <info>  [1668517377.1205] manager:
 NetworkManager state is now ASLEEP
Nov 15 14:02:57 porteus elogind-daemon[1112]: Suspending system...
Nov 15 14:02:57 porteus kernel: PM: suspend entry (deep)
Nov 15 14:02:57 porteus kernel: Filesystems sync: 0.056 seconds
Nov 15 14:03:18 porteus kernel: Freezing user space processes ... 
Nov 15 14:03:18 porteus kernel: Xorg            D    0  1796   1795 0x00400004
Nov 15 14:03:18 porteus kernel: OOM killer enabled.
Nov 15 14:03:18 porteus kernel: Restarting tasks ... done.
Nov 15 14:03:18 porteus kernel: video LNXVIDEO:01: Restoring backlight state
●●●●● loginctl suspend took 00:00:25 … ●●●●●●●●●●●●●●●
●●●●● after suspend-s3b2 ●●●●●●●●●●●●●●●
Nov 15 14:03:18 porteus kernel: PM: suspend exit
Nov 15 14:03:18 porteus kernel: PM: suspend entry (s2idle)
Nov 15 14:03:18 porteus acpid: client 1796[0:100] has disconnected
Nov 15 14:03:18 porteus last message buffered 1 times
Nov 15 14:03:18 porteus acpid: client connected from 1796[0:100]
Nov 15 14:03:18 porteus acpid: 1 client rule loaded
15.11.2022 14:03:18 ____________________________________________________________
               total        used        free      shared  buff/cache   available
Mem:            3873        1291         707          64        1875        2086
Swap:           4917         102        4815
________________________________________________________________________________
Filename                                Type            Size    Used    Priority
/dev/sda6                               partition       4607    0       1       
/dev/zram0                              partition       309     102     100     
Nov 15 14:03:18 porteus kernel: Filesystems sync: 0.392 seconds
Nov 15 14:03:19 porteus acpid: client connected from 1796[0:100]
Nov 15 14:03:19 porteus acpid: 1 client rule loaded
As you can see, my script says the suspend took 00:00:25 - but instead of coming back alive the system crashed for some unknown to me reason.

Again some extra info is added by my suspend script like the free -m and /proc/swaps stuff.

actually I use this function for the swap info:

Code: Select all

sx () 
{ 
        read firstLine;
        echo "$firstLine";
        while read f t s u p; do
            let "s2 = $s / 1024";
            let "u2 = $u / 1024";
            printf '%-40s%-16s%-8s%-8s%-8s\n' $f $t $s2 $u2 $p;
        done
} < /proc/swaps
Added in 23 hours 16 minutes 15 seconds:
And another crash, again it went into suspend, according to the /var/log/messages it took 22 seconds, but it crashed nonetheless during its return from loginctl suspend:

Code: Select all

Nov 17 09:44:09 porteus NetworkManager[1563]: <info>  [1668674649.2820] manager: sleep: sleep requested (sleeping: no  enabled: yes)
Nov 17 09:44:09 porteus NetworkManager[1563]: <info>  [1668674649.2821] manager: NetworkManager state is now ASLEEP
Nov 17 09:44:09 porteus NetworkManager[1563]: <info>  [1668674649.2824] device (eth0): state change: activated -> deactivating (reason 'sleeping', sys-iface-state: 'managed')
●● sleeping 10 seconds for loginctl suspend to background-suspend. ●●
Nov 17 09:44:09 porteus ModemManager[1610]: <info>  [sleep-monitor] system is about to suspend 
Nov 17 09:44:09 porteus dbus-daemon[772]: [system] Activating service name='org.freedesktop.nm_dispatcher' requested by ':1.3' (uid=0 pid=1563 comm="/usr/sbin/NetworkManager ") (using servicehelper)
Nov 17 09:44:09 porteus dbus-daemon[772]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Nov 17 09:44:09 porteus NetworkManager[1563]: <info>  [1668674649.3732] device (eth0): state change: deactivating -> disconnected (reason 'sleeping', sys-iface-state: 'managed')
Nov 17 09:44:09 porteus NetworkManager[1563]: <info>  [1668674649.38d34] policy: set-hostname: current hostname was changed outside NetworkManager: 'porteus.example.net'
Nov 17 09:44:09 porteus NetworkManager[1563]: <info>  [1668674649.3847] policy: set-hostname: current hostname was changed outside NetworkManager: 'porteus.example.net'
Nov 17 09:44:09 porteus NetworkManager[1563]: <info>  [1668674649.3857] device (eth0): state change: disconnected -> unmanaged (reason 'sleeping', sys-iface-state: 'managed')
Nov 17 09:44:09 porteus kernel: r8169 0000:02:00.0 eth0: Link is Down
Nov 17 09:44:09 porteus elogind-daemon[1169]: Suspending system...
Nov 17 09:44:09 porteus kernel: PM: suspend entry (deep)
Nov 17 09:44:09 porteus kernel: Filesystems sync: 0.061 seconds
Nov 17 09:44:30 porteus kernel: Freezing user space processes ... 
Nov 17 09:44:30 porteus kernel: tee             D    0  1718   1712 0x00004004
Nov 17 09:44:30 porteus kernel: OOM killer enabled.
Nov 17 09:44:30 porteus kernel: Restarting tasks ... done.
Nov 17 09:44:30 porteus kernel: video LNXVIDEO:01: Restoring backlight state
●●●●● loginctl suspend took 00:00:22 … ●●●●●●●●●●●●●●●
●●●●● after suspend-s0b2 ●●●●●●●●●●●●●●●
Nov 17 09:44:30 porteus kernel: PM: suspend exit
Nov 17 09:44:30 porteus kernel: PM: suspend entry (s2idle)
Nov 17 09:44:30 porteus acpid: client 1767[0:100] has disconnected
Nov 17 09:44:30 porteus last message buffered 1 times
Nov 17 09:44:30 porteus acpid: client connected from 1767[0:100]
Nov 17 09:44:30 porteus acpid: 1 client rule loaded
17.11.2022 09:44:30 ____________________________________________________________
               total        used        free      shared  buff/cache   available
Mem:            3873        1399        1237          85        1236        1966
Swap:           4917          57        4860
________________________________________________________________________________
Filename				Type		Size	Used	Priorityd
/dev/sda6                               partition       4607    0       1       
/dev/zram0                              partition       309     57      100     
Nov 17 09:44:30 porteus kernel: Filesystems sync: 0.499 seconds
Nov 17 09:44:30 porteus acpid: client connected from 1767[0:100]
Nov 17 09:44:30 porteus acpid: 1 client rule loaded
As my suspend-script wrote into the log, this time the system had uncharacteristically much free memory, but the crash still happened.

When any of these crashes happens, the machine is kept on but the screen turns dark - but not switches off.
And only when I press any key, only then the screen switches off due of not getting any black image from the OS any more, for whatever weird reasons.
Does this behaviour on crash rings any bells with any of you?
Cheers!
Yours Rava

User avatar
AcnapyxoB
Samurai
Samurai
Posts: 191
Joined: 24 Dec 2014, 10:15
Distribution: Porteus 5.01
Location: Planet Earth

Some tips on getting to the reason for the recent suspend crashes

Post#2 by AcnapyxoB » 17 Nov 2022, 16:41

Rava wrote:
17 Nov 2022, 16:24
Does this behaviour on crash rings any bells with any of you?
I'm not sure if XFCE doesn't need pm-utils:
https://pm-utils.freedesktop.org/wiki/
https://slackware.uk/slackware/slackwar ... 6_64-5.txz

Also why don't you try another DE to see if the problem exists!
Porteus v5.01 KDE x86_64

User avatar
Rava
Contributor
Contributor
Posts: 5401
Joined: 11 Jan 2011, 02:46
Distribution: XFCE 5.01 x86_64 + 4.0 i586
Location: Forests of Germany

Some tips on getting to the reason for the recent suspend crashes

Post#3 by Rava » 17 Nov 2022, 16:48

AcnapyxoB wrote:
17 Nov 2022, 16:41
I'm not sure if XFCE doesn't need pm-utils:
It never did in the past.

And port switched to loginctl, so now

Code: Select all

loginctl suspend
initialized the suspend when you do it via terminal (like i do for logging extra info reasons)

I could switch back to e.g. XFCE 4.12
Cheers!
Yours Rava

User avatar
Rava
Contributor
Contributor
Posts: 5401
Joined: 11 Jan 2011, 02:46
Distribution: XFCE 5.01 x86_64 + 4.0 i586
Location: Forests of Germany

Some tips on getting to the reason for the recent suspend crashes

Post#4 by Rava » 17 Nov 2022, 21:23

I changed from the previous

Code: Select all

root@porteus:~# cat /etc/porteus/*
001-core.xzm:20221006
002-xorg.xzm:20221006
002-xtra.xzm:20220630
003-xfce.xzm:20220925
initrd.xz:20220928
aka XFCE 4.16 back to this

Code: Select all

root@porteus:~# cat /etc/porteus/*
001-core.xzm:20221006
002-xorg.xzm:20221006
002-xtra.xzm:20220630
003-xfce4.xzm:20220428
initrd.xz:20220928
aka XFCE 4.12 - but it reports the version wrong, in actuality it is 20220725, not 20220428.

Now I am back to the quite annoying need of having to adjust the size of the name area of the files in thunar since the older version expands that area to ridiculous sizes but never shrinks that name area size back down. (Only valid when you use View as detailed list)

See for yourself, e.g. guest's ~/ folder :
Image

in XFCE 4.16 it always looks like so:
Image
(this is the manually adjusted XFCE 4.12 thunar, just to show you what I mean, a picture being more worth than a thousand words and all that.)


But at least one improvement is already there:
The above mentioned distortion at startup of X (the black and white boxes) is not present at startup from XFCE 4.12.

In one or two days after some suspend's I will see if using XFCE 4.12 also solved the crash issue…

Added in 9 minutes 2 seconds:
And yet another issue with XFCE 4.12

In 4.16 mousepad could be set to re-use the existing window, meaning when I use the terminal to edit a text file, I do not need to manually push mousepad into the background by either recalling to add " &" at the end of its call or freezing the program with "Ctrl+z" and putting it into the background with

Code: Select all

bg
- but I did not find such option for XFCE 4.12 mousepad for "re-use the existing window". :cry:
Cheers!
Yours Rava

User avatar
Rava
Contributor
Contributor
Posts: 5401
Joined: 11 Jan 2011, 02:46
Distribution: XFCE 5.01 x86_64 + 4.0 i586
Location: Forests of Germany

Some tips on getting to the reason for the recent suspend crashes

Post#5 by Rava » 18 Nov 2022, 22:51

Since changing back to the previously used-by-me version of XFCE 4.12 did not change the crashing issue, I switched back to XFCE 4.16 but now use the previous version of 002-xorg making this my setup:

Code: Select all

guest@porteus:~$ cat /etc/porteus/*
001-core.xzm:20221006
002-xorg.xzm:20220723
002-xtra.xzm:20220630
003-xfce.xzm:20220925
initrd.xz:20220928
Now I have to wait and suspend (= "loginctl suspend") a few times and see if using the older 002-xorg.xzm makes a difference…

By doing so I learn a bit more about the loginctl ( "Control the elogind login manager" ) - like that it only recognizes current sessions when they are started via the virtual consoles, and since most of you never started Porteus in init mode 3 I presume you will only see one session listed.
Since I run my Port in init mode 3 I can log into as much virtual consoles as there are defined in /etc/inittab - and that's only 6 by default unless you edit /etc/inittab and reboot:

Code: Select all

guest@porteus:~$ grep -E "tty[1-9]|getties" /etc/inittab
# These are the standard console login getties in multiuser mode:
c1:12345:respawn:/sbin/agetty --noclear 38400 tty1 linux
c2:12345:respawn:/sbin/agetty 38400 tty2 linux
c3:12345:respawn:/sbin/agetty 38400 tty3 linux
c4:12345:respawn:/sbin/agetty 38400 tty4 linux
c5:12345:respawn:/sbin/agetty 38400 tty5 linux
c6:12345:respawn:/sbin/agetty 38400 tty6 linux
Since I logged in as root to start my logging /var/log/messages to my hardrive via tty1 and as guest for startx on tty2 I have two sessions listed:

Code: Select all

root@porteus:~# loginctl list-sessions
SESSION  UID USER  SEAT  TTY 
     c1    0 root  seat0 tty1
     c2 1000 guest seat0 tty2

2 sessions listed.

Code: Select all

root@porteus:~# loginctl session-status c1
c1 - root (0)
           Since: Fri 2022-11-18 17:05:24 CET; 5h 32min ago
          Leader: 1618 (login)
            Seat: seat0; vc1
             TTY: tty1
         Service: login; type tty; class user
           State: online
root@porteus:~# loginctl session-status c2
c2 - guest (1000)
           Since: Fri 2022-11-18 17:05:58 CET; 5h 32min ago
          Leader: 1619 (login)
            Seat: seat0; vc2
             TTY: tty2
         Service: login; type tty; class user
           State: active
root@porteus:~# loginctl session-status c3
root@porteus:~# echo $?
1
I wonder what's the difference between State: online and State: active.
man loginctl not really tells us about that.

Not that it's really important, but I am still curious. :)

Added in 22 hours 32 minutes 48 seconds:
Quick update

For now with the only change being the older 002-xorg module from 20220723 the system survived much more suspend states (or in other words entered a successful suspend more often) than any bootup with the most recent 002-xorg module.
Cheers!
Yours Rava

User avatar
Rava
Contributor
Contributor
Posts: 5401
Joined: 11 Jan 2011, 02:46
Distribution: XFCE 5.01 x86_64 + 4.0 i586
Location: Forests of Germany

Some tips on getting to the reason for the recent suspend crashes

Post#6 by Rava » 20 Nov 2022, 16:08

Seems indeed the culprit behind the regular crash-on-suspend was indeed the 003-xorg V20221006 :P

Now that I exchanged it with 003-xorg V20220723 the machine is now on for almost exactly 48 hrs.

Booted it up at "Nov 18 17:04" and had 7 suspend in that time up to now.

My suspend script writes big bad bullets into /var/log/messages and searching for the "●●● loginctl suspend took " message that is part of what my logging + suspend script does, it is easy to ask how many suspend there were:

Code: Select all

root@porteus:/# grep "●●● loginctl suspend took " /var/log/messages  |wc
      7      49     679
Previously when I had the system with the updated to 003-xorg V20221006 it would crash at 2nd or latest at 3rd attempt at suspending.
Cheers!
Yours Rava

User avatar
Rava
Contributor
Contributor
Posts: 5401
Joined: 11 Jan 2011, 02:46
Distribution: XFCE 5.01 x86_64 + 4.0 i586
Location: Forests of Germany

Some tips on getting to the reason for the recent suspend crashes

Post#7 by Rava » 25 Dec 2022, 08:21

Sadly, even with the most recent code module the suspend crash still happens very regularly.

Since I not use changes= I boot into mode 3 and start my script doing the logging as root manually as described here [HOWTO] save log files when using no changes / always fresh approach in Port 3,4,5.

Main modules, initrd and kernel:

Code: Select all

$ cat /etc/porteus/*
001-core.xzm:20221211
002-xorg.xzm:20221211
002-xtra.xzm:20221211
003-xfce.xzm:20220925
initrd.xz:20220928
$ uname -r
5.4.30-porteus
Of course I loaded more than just the base modules like palemoon and interlink and 010-nvidia-340.108-k.5.4.30-porteus-v5.0-x86_64_rava.xzm

This is the

Code: Select all

top -bn 1 |grep -E 'palemoon|Xorg|pulseaudio|ffplay|ffmpeg|mpv|spacefm|viewnior|interlink|gimp|Thunar|mousepad|leafpad|geany|chrom[ei]|brave|florence|soffice|mocp|firefox|signal-desktop|midori' 
just prior suspend:

Code: Select all

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 2014 guest     20   0 2927776  14624   7744 S  18.8   0.4 155:12.08 pulseaudio
 8574 guest     20   0 1117596  80652  47900 S   6.2   2.0  13:33.45 ffplay
 1828 root      20   0  514968 327344  50528 S   0.0   8.3  76:31.82 Xorg
 1931 guest     20   0  480064  69268  46036 S   0.0   1.7   0:05.15 Thunar
 2272 guest     20   0 1034556  83376  47312 S   0.0   2.1   1:19.27 mousepad
 2547 guest     20   0 2296912 356228  96332 S   0.0   9.0   3:12.75 interlink
 2899 guest     20   0    6096   3816   3088 S   0.0   0.1   0:00.00 ffplay.hi+
 2903 guest     20   0 1941900 130740  50856 S   0.0   3.3  24:05.05 ffplay
 4834 guest     20   0    6096   3884   3148 S   0.0   0.1   0:00.00 ffplay.hi+
 4838 guest     20   0 1928100 111444  53248 S   0.0   2.8  18:02.77 ffplay
 8569 guest     20   0    6124   3912   3160 S   0.0   0.1   0:00.02 ffplay-hi+
10585 guest     20   0 1933392 122376  61892 S   0.0   3.1   1:29.26 ffplay
an this is the 4 most %CPU consuming processes just prior suspend:

Code: Select all

25.12.2022 08:03:43 ____________________________________________________________
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 2014 guest     20   0 2927776  14624   7744 S  11.8   0.4 155:12.10 pulseaudiofunction
10898 guest     20   0    5580   3408   2640 R  11.8   0.1   0:00.03 top
 4838 guest     20   0 1928100 111444  53248 S   5.9   2.8  18:02.78 ffplay
10585 guest     20   0 1933392 122376  61892 S   5.9   3.1   1:29.27 ffplay
and this is /bin/free -m and sx (function)

Code: Select all

sx () 
{ 
    echo $(date +%d.%m.%Y\ %H:%M:%S) ____________________________________________________________;
    { 
        read firstLine;
        echo "$firstLine";
        while read f t s u p; do
            let "s2 = $s / 1024";
            let "u2 = $u / 1024";
            printf '%-40s%-16s%-8s%-8s%-8s\n' $f $t $s2 $u2 $p;
        done
    } < /proc/swaps
}
just prior suspend:

Code: Select all

25.12.2022 08:04:03 ____________________________________________________________
               total        used        free      shared  buff/cache   available
Mem:            3873        1548        1388          71         936        1838
Swap:           4917          60        4857
________________________________________________________________________________[m
Filename				Type		Size	Used	Priority
/dev/sda6                               partition       4607    0       1       
/dev/zram0                              partition       309     60      100     
As you can see, there should have been enough free RAM.

and now finally what got logged into /var/log/messages after initializing loginctl suspend

Code: Select all

Dec 25 08:04:03 porteus NetworkManager[1652]: <info>  [1671951843.9436] manager: sleep: sleep requested (sleeping: no  enabled: yes)
Dec 25 08:04:03 porteus NetworkManager[1652]: <info>  [1671951843.9437] device (eth0): state change: unavailable -> unmanaged (reason 'sleeping', sys-iface-state: 'managed')
Dec 25 08:04:03 porteus NetworkManager[1652]: <info>  [1671951843.9547] manager: NetworkManager state is now ASLEEP
●● sleeping 10 seconds for loginctl suspend to background-suspend. ●●
Dec 25 08:04:03 porteus ModemManager[1715]: <info>  [sleep-monitor] system is about to suspend 
Dec 25 08:04:04 porteus elogind-daemon[1083]: Suspending system...
Dec 25 08:04:04 porteus kernel: PM: suspend entry (deep)
Dec 25 08:04:04 porteus kernel: Filesystems sync: 0.089 seconds
Dec 25 08:04:04 porteus root: ACPI group jack / action lineout is not defined
Dec 25 08:04:04 porteus root: ACPI group jack / action videoout is not defined
As you can see, still no clear info why the suspend crash happens.

Added in 1 minute 57 seconds:
This line
Rava wrote:
25 Dec 2022, 08:21
●● sleeping 10 seconds for loginctl suspend to background-suspend. ●●
is written by my suspend script.
Cheers!
Yours Rava

Post Reply