Total posts in this thread: 9
BSD
Senior Cruncher
Joined: Apr 27, 2011
Post Count: 224
Status: Offline
Linux kernel crash "general protection fault" - DSFL related?

The crash occurred while crunching WCG WUs, so I don't know whether it was caused by a BOINC-related process or by the device itself.

My monitor screen was turned off like I leave it usually while running boinc. At 07-Sep-2011 20:43:11, I pressed the monitor's ON button, then a key on the keyboard to wake up the screen. I then saw the following crash on the console screen:

Sep 7 15:26:15 sonata kernel: [362640.136448] general protection fault: 0000 [#1] SMP
Sep 7 15:26:15 sonata kernel: [362640.136527] last sysfs file: /sys/devices/virtual/sound/timer/uevent
Sep 7 15:26:15 sonata kernel: [362640.136610] CPU 3
Sep 7 15:26:15 sonata kernel: [362640.136637] Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ipv6 lp fuse snd_hda_codec_via snd_hda_intel radeon ttm drm_kms_helper snd_hda_codec snd_hwdep snd_pcm snd_timer snd drm atl1c agpgart i2c_algo_bit rtc_cmos ppdev soundcore rtc_core shpchp snd_page_alloc processor i2c_piix4 i2c_core k10temp wmi thermal_sys parport_pc parport button psmouse serio_raw evdev sg rtc_lib hwmon
Sep 7 15:26:15 sonata kernel: [362640.137001]
Sep 7 15:26:15 sonata kernel: [362640.137001] Pid: 2281, comm: boinc Not tainted 2.6.37.6 #3 MSI MS-7623/880GM-P51 (MS-7623)
Sep 7 15:26:15 sonata kernel: [362640.137001] RIP: 0010:[<ffffffff8106d601>] [<ffffffff8106d601>] __wake_up_bit+0x11/0x40
Sep 7 15:26:15 sonata kernel: [362640.137001] RSP: 0018:ffff8801ff0d5cf8 EFLAGS: 00010282
Sep 7 15:26:15 sonata kernel: [362640.137001] RAX: 83fc16a3a8573388 RBX: ffffea0004ee4cd0 RCX: 0000000000000040
Sep 7 15:26:15 sonata kernel: [362640.137001] RDX: 0000000000000000 RSI: ffffea0004ee4cd0 RDI: 83fc16a3a8573380
Sep 7 15:26:15 sonata kernel: [362640.137001] RBP: ffff8801ff0d5d08 R08: 6680000000000000 R09: a80013b933400000
Sep 7 15:26:15 sonata kernel: [362640.137001] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffffffffff
Sep 7 15:26:15 sonata kernel: [362640.137001] R13: ffff8801636324c8 R14: ffff8801ff0d5d68 R15: 0000000000000000
Sep 7 15:26:15 sonata kernel: [362640.137001] FS: 00007f1762286740(0000) GS:ffff8800cfcc0000(0000) knlGS:00000000f70c86d0
Sep 7 15:26:15 sonata kernel: [362640.137001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 7 15:26:15 sonata kernel: [362640.137001] CR2: 0000000000da8000 CR3: 0000000216f0d000 CR4: 00000000000006e0
Sep 7 15:26:15 sonata kernel: [362640.137001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 7 15:26:15 sonata kernel: [362640.137001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 7 15:26:15 sonata kernel: [362640.137001] Process boinc (pid: 2281, threadinfo ffff8801ff0d4000, task ffff8801ff1aafb0)
Sep 7 15:26:15 sonata kernel: [362640.137001] Stack:
Sep 7 15:26:15 sonata kernel: [362640.137001] ffffea0004ef4c60 ffffea0004ee4cd0 ffff8801ff0d5d28 ffffffff810db20a
Sep 7 15:26:15 sonata kernel: [362640.137001] 0000000000000003 0000000000000003 ffff8801ff0d5e18 ffffffff810e6c1f
Sep 7 15:26:15 sonata kernel: [362640.137001] ffff880212c26800 ffffea0004ee4cd0 ffff880100000002 ffff8801ff0d5d90
Sep 7 15:26:15 sonata kernel: [362640.137001] Call Trace:
Sep 7 15:26:15 sonata kernel: [362640.137001] [<ffffffff810db20a>] unlock_page+0x2a/0x40
Sep 7 15:26:15 sonata kernel: [362640.137001] [<ffffffff810e6c1f>] truncate_inode_pages_range+0x15f/0x450
Sep 7 15:26:15 sonata kernel: [362640.137001] [<ffffffff811560bd>] ? fsnotify_clear_marks_by_inode+0x2d/0xf0
Sep 7 15:26:15 sonata kernel: [362640.137001] [<ffffffff810e6f25>] truncate_inode_pages+0x15/0x20
Sep 7 15:26:15 sonata kernel: [362640.137001] [<ffffffff812040cf>] ext4_evict_inode+0xaf/0x370
Sep 7 15:26:15 sonata kernel: [362640.137001] [<ffffffff81136be7>] evict+0x27/0xc0
Sep 7 15:26:15 sonata kernel: [362640.137001] [<ffffffff81137019>] iput+0x1b9/0x290
Sep 7 15:26:15 sonata kernel: [362640.137001] [<ffffffff8112c845>] do_unlinkat+0x115/0x1c0
Sep 7 15:26:15 sonata kernel: [362640.137001] [<ffffffff81124543>] ? sys_newlstat+0x33/0x40
Sep 7 15:26:15 sonata kernel: [362640.137001] [<ffffffff8112e436>] sys_unlink+0x16/0x20
Sep 7 15:26:15 sonata kernel: [362640.137001] [<ffffffff81002a2b>] system_call_fastpath+0x16/0x1b
Sep 7 15:26:15 sonata kernel: [362640.137001] Code: 41 5c 48 c1 e0 03 48 03 02 c9 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 66 66 66 66 90 48 8d 47 08 <48> 39 47 08 48 89 75 f0 89 55 f8 74 13 48 8d 4d f0 ba 01 00 00
Sep 7 15:26:15 sonata kernel: [362640.137001] RIP [<ffffffff8106d601>] __wake_up_bit+0x11/0x40
Sep 7 15:26:15 sonata kernel: [362640.137001] RSP <ffff8801ff0d5cf8>
Sep 7 15:26:15 sonata kernel: [362640.224043] ---[ end trace 7af7d5d84938c847 ]---
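
One detail in the register dump may explain why this shows up as a "general protection fault" rather than an ordinary page fault. The instruction marked `<48> 39 47 08` in the Code line decodes to a compare against memory at 8(%rdi), and RDI holds 83fc16a3a8573380. On x86-64 with 48-bit virtual addresses (an assumption that fits a 2.6.37 kernel on this CPU), bits 63..47 of a pointer must all be equal; touching an address that breaks this rule raises a general protection fault. A minimal sketch of that check, using a helper name `is_canonical` invented here for illustration:

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 virtual address.

    Assumes va_bits-wide virtual addressing (48 is an assumption here):
    bits 63..(va_bits - 1) must be all zeros or all ones.
    """
    top = addr >> (va_bits - 1)                  # bits 63..47 when va_bits == 48
    all_ones = (1 << (64 - va_bits + 1)) - 1     # 17 one-bits when va_bits == 48
    return top == 0 or top == all_ones


# Values copied from the register dump above.
rdi = 0x83FC16A3A8573380   # pointer being dereferenced by the faulting compare
rsp = 0xFFFF8801FF0D5CF8   # kernel stack pointer, for comparison

print(is_canonical(rdi))   # False: not a usable pointer
print(is_canonical(rsp))   # True: looks like a normal kernel address
```

This only shows that the kernel followed a garbage pointer while waking waiters on a page. It does not say whether the garbage came from a software bug or from a hardware problem such as bad RAM.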


I presume "Sep 7 15:26:15 sonata kernel: [362640.136610] CPU 3" means the crash occurred on CPU number 3? And that the running program was the one in "Sep 7 15:26:15 sonata kernel: [362640.137001] Process boinc (pid: 2281, threadinfo ffff8801ff0d4000, task ffff8801ff1aafb0)"?
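
For what it's worth, that reading matches the log: "CPU 3" is the logical CPU the fault was raised on, and "comm: boinc" is the task that was running there at the time. The call trace (sys_unlink down to ext4_evict_inode) suggests the BOINC client was deleting a file when it happened, which by itself does not prove BOINC caused the fault. A rough sketch for pulling those fields out of the log lines, assuming the syslog prefix format shown above; `summarize_oops` is a name made up for this example:

```python
import re

# Three lines copied from the crash log above.
OOPS_LINES = [
    "Sep 7 15:26:15 sonata kernel: [362640.136610] CPU 3",
    "Sep 7 15:26:15 sonata kernel: [362640.137001] Pid: 2281, comm: boinc "
    "Not tainted 2.6.37.6 #3 MSI MS-7623/880GM-P51 (MS-7623)",
    "Sep 7 15:26:15 sonata kernel: [362640.137001] RIP: 0010:[<ffffffff8106d601>] "
    " [<ffffffff8106d601>] __wake_up_bit+0x11/0x40",
]

# Everything up to and including the "[seconds.micros]" kernel timestamp.
PREFIX = re.compile(r"^.*?kernel: \[\s*\d+\.\d+\]\s*")


def summarize_oops(lines):
    """Collect CPU number, pid, command name and faulting symbol from oops lines."""
    summary = {}
    for line in lines:
        msg = PREFIX.sub("", line).strip()
        m = re.fullmatch(r"CPU (\d+)", msg)
        if m:
            summary["cpu"] = int(m.group(1))
            continue
        m = re.match(r"Pid: (\d+), comm: (\S+)", msg)
        if m:
            summary["pid"] = int(m.group(1))
            summary["comm"] = m.group(2)
            continue
        m = re.match(r"RIP: .*\s(\S+\+0x[0-9a-f]+/0x[0-9a-f]+)$", msg)
        if m:
            summary["rip"] = m.group(1)
    return summary


print(summarize_oops(OOPS_LINES))
```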

I was nowhere near my device at 15:26. It might have been electrical: we had thunderstorms rolling through the area all afternoon, but the house never lost power.

Boinc slot 3 stderr.txt file for WU DSFL_00000007_0000045_0010_0_0 crash time logged entries:

[15:13:00] Starting job 27,CPU time is 17574.146311.
[15:13:00] ZINC20563655.pdbqt size = 33 5 ../../projects/www.worldcommunitygrid.org/dsfl.target_00000007.pdbqt size = 13798 0
15:26:44 (5541): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting


I opened boinc manager and the messages show the manager started again at the same time I pressed the keyboard to wake up the screen:

07-Sep-2011 8:43:11 PM EDT Starting BOINC client version 6.10.58 for x86_64-pc-linux-gnu
---snip---

Time passed and my monitor went to sleep, as it normally does when there's no keyboard activity. Sometime later I woke it back up to do some more research, and I noticed in the boinc manager messages that the manager had started again:

07-Sep-2011 10:11:32 PM EDT Starting BOINC client version 6.10.58 for x86_64-pc-linux-gnu
---snip---


So, I'll restart the device and see if things calm down. I'll also do some hardware checks.


Here's the first part of the boinc messages about my device (my local preferences have the CPU set to run at 50%, which keeps heat down):

Wed 07 Sep 2011 10:11:32 PM EDT Starting BOINC client version 6.10.58 for x86_64-pc-linux-gnu
Wed 07 Sep 2011 10:11:32 PM EDT log flags: file_xfer, sched_ops, task
Wed 07 Sep 2011 10:11:32 PM EDT Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.5 c-ares/1.5.1
Wed 07 Sep 2011 10:11:32 PM EDT Data directory: /home/chris/BOINC
Wed 07 Sep 2011 10:11:32 PM EDT Processor: 4 AuthenticAMD AMD Phenom(tm) II X4 840 Processor [Family 16 Model 5 Stepping 3]
Wed 07 Sep 2011 10:11:32 PM EDT Processor: 512.00 KB cache
Wed 07 Sep 2011 10:11:32 PM EDT Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt
Wed 07 Sep 2011 10:11:32 PM EDT OS: Linux: 2.6.37.6
Wed 07 Sep 2011 10:11:32 PM EDT Memory: 7.55 GB physical, 4.00 GB virtual
Wed 07 Sep 2011 10:11:32 PM EDT Disk: 134.61 GB total, 126.82 GB free
Wed 07 Sep 2011 10:11:32 PM EDT Local time is UTC -4 hours
Wed 07 Sep 2011 10:11:32 PM EDT No usable GPUs found
Wed 07 Sep 2011 10:11:32 PM EDT World Community Grid URL http://www.worldcommunitygrid.org/; Computer ID 1704364; resource share 100
Wed 07 Sep 2011 10:11:32 PM EDT World Community Grid General prefs: from World Community Grid (last modified 30-Aug-2011 09:26:31)
Wed 07 Sep 2011 10:11:32 PM EDT World Community Grid Host location: none
Wed 07 Sep 2011 10:11:32 PM EDT World Community Grid General prefs: using your defaults
Wed 07 Sep 2011 10:11:32 PM EDT Reading preferences override file
Wed 07 Sep 2011 10:11:32 PM EDT Preferences:
Wed 07 Sep 2011 10:11:32 PM EDT max memory usage when active: 5799.20MB
Wed 07 Sep 2011 10:11:32 PM EDT max memory usage when idle: 6959.04MB
Wed 07 Sep 2011 10:11:32 PM EDT max disk usage: 10.00GB
Wed 07 Sep 2011 10:11:32 PM EDT don't use GPU while active
Wed 07 Sep 2011 10:11:32 PM EDT (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
Wed 07 Sep 2011 10:11:32 PM EDT Not using a proxy
Wed 07 Sep 2011 10:11:32 PM EDT World Community Grid Restarting task DSFL_00000007_0000045_0010_0 using dsfl version 619
Wed 07 Sep 2011 10:11:32 PM EDT World Community Grid Restarting task ov347_00003_14 using hpf2 version 640
Wed 07 Sep 2011 10:11:32 PM EDT World Community Grid Restarting task c4cw_target04_115080703_0 using c4cw version 641
Wed 07 Sep 2011 10:11:32 PM EDT World Community Grid Restarting task DSFL_00000007_0000053_0285_0 using dsfl version 619
[Sep 8, 2011 2:45:51 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Linux kernel crash "general protection fault" - DSFL related?

Hi,

Don't know what a "sonata kernel" is. Google spits this out: http://www.google.it/search?q=sonata+kernel so maybe you can see which of those hits rings a bell, maybe something ATA/SATA related????

At any rate, that initial signal at 15:26:15 and the heartbeat loss after 30 seconds at 15:26:44 tie together. The science app exits on its own, as the stderr line shows, when it has not heard a heartbeat from the BOINC core client for that timespan.
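
The arithmetic behind that tie-up can be checked quickly. This is only a sketch: it assumes both timestamps come from the same system clock on the same day, and the function name `gap_seconds` is invented for the example.

```python
from datetime import datetime


def gap_seconds(start: str, end: str) -> float:
    """Seconds between two HH:MM:SS wall-clock times on the same day."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


fault_logged = "15:26:15"    # kernel "general protection fault" line
science_exit = "15:26:44"    # "No heartbeat from core client for 30 sec - exiting"

print(gap_seconds(fault_logged, science_exit))   # 29.0
```

A 29-second gap fits a 30-second heartbeat timeout if the last heartbeat arrived about a second before the fault was logged.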

--//--
[Sep 8, 2011 6:44:19 AM]
BSD
Senior Cruncher
Joined: Apr 27, 2011
Post Count: 224
Status: Offline
Re: Linux kernel crash "general protection fault" - DSFL related?

Sonata is the computer host name. Kernel is, well, you know. ;)
[Sep 8, 2011 10:53:24 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Linux kernel crash "general protection fault" - DSFL related?

That's one for the good laugh department. I'll have a look at what it says in my event log on Ubuntu (11.04) desktop, 2.6.38-11.47 kernel.

6.10.59 or 6.12.33 may be the upgrade you'd like to apply from the repository (better integration, I've read). No complaints on my part. DSFL has been steady as a rock ever since the 6.16 beta, no fails. Of course, I'm an Intel man. Just can't bring myself to ever buy an AMD, unless it were in something unfit for crunching in general. Again there are some problems over non-matching results in the quorum when Intel and AMD meet. I've not heard anything on CPU groupings in that respect.

--//--

edit: add "kernel version"
----------------------------------------
[Edit 2 times, last edit by Former Member at Sep 10, 2011 2:03:20 PM]
[Sep 8, 2011 11:23:22 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1664
Status: Offline
Re: Linux kernel crash "general protection fault" - DSFL related?

Hi SekeRob,
currently my own tally is 4 AMD vs. 2 Intel!
Except for the pairing problem experienced during the August betas, I have no problems to report because of AMD. Furthermore, the price/performance ratio is much more attractive with AMD CPUs than with Intel CPUs.
My feeling is that Windows-based systems are granted more credit than Linux-based systems, although the latter are much more efficient.
The Phenom II and Athlon II are pretty reliable for WCG projects.
Cheers,
Yves
----------------------------------------
[Sep 9, 2011 10:26:05 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Linux kernel crash "general protection fault" - DSFL related?

... I'll have a look what it says in me event log on Ubuntu (11.04) desktop, 2.6.38.47 kernel.
I have Ubuntu (11.04) desktop, but on 2.6.38.11 kernel. How do I get that 2.6.38.47 kernel and next have it installed on my AMD-1090T machine? Any advantage in crunching DSFL_v6.19 WUs using BOINC-supplied BOINC_v6.10.59 on that kernel version for the said AMD machine?

... Again some problems over none-matching results in the quorum when Intel and AMD meet. Not heard anything on CPU groupings in that respect.
I guess this is not the first time that the Intel-AMD mismatch concern has arisen, but in any case, what is the current state of the resolution of the matter? Is it true that, in essence and overall, we can say something like: Intel chips define the reference that spells out which WU goes as valid, invalid, or inconclusive?

I just had my first DSFL WU deemed as inconclusive: DSFL_00000008_0000046_0403_0. If all other things are held the same, would things have gone differently had the WU been crunched on an Intel chip?
----------------------------------------
[Edit 1 times, last edit by Former Member at Sep 10, 2011 6:25:12 AM]
[Sep 10, 2011 6:21:19 AM]
KWSN - A Shrubbery
Master Cruncher
Joined: Jan 8, 2006
Post Count: 1585
Status: Offline
Re: Linux kernel crash "general protection fault" - DSFL related?

Neither one is actually "right". The one that goes invalid is the odd man out. Since Intel has a larger proportion of the population, it's more likely to be the repair wingman, thus the two Intel results validate. If the repair goes to AMD, the Intel result will flag invalid, it's just luck that way.
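
The "odd man out" idea can be illustrated with a toy majority vote. This is purely a sketch and not WCG's actual validator: the function `validate`, the host labels, and the "A"/"B" result signatures are all made up for illustration, and a real validator would compare numeric results within some tolerance rather than exact strings.

```python
from collections import Counter


def validate(results, min_quorum=2):
    """Toy quorum check: results maps a host label to a result signature.

    Hypothetical illustration only. If no signature is reported by at least
    min_quorum hosts, every result stays inconclusive; otherwise the most
    common signature is marked valid and anything else invalid.
    """
    counts = Counter(results.values())
    signature, votes = counts.most_common(1)[0]
    if votes < min_quorum:
        return {host: "inconclusive" for host in results}
    return {
        host: ("valid" if sig == signature else "invalid")
        for host, sig in results.items()
    }


# Two hosts disagree: nobody is "right" yet.
print(validate({"intel_1": "A", "amd_1": "B"}))

# The repair copy lands on another Intel host: the AMD result is the odd one out.
print(validate({"intel_1": "A", "amd_1": "B", "intel_2": "A"}))

# The repair copy lands on another AMD host: now the Intel result is the odd one out.
print(validate({"intel_1": "A", "amd_1": "B", "amd_2": "B"}))
```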
----------------------------------------

Distributed computing volunteer since September 27, 2000
[Sep 10, 2011 7:07:58 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1664
Status: Offline
Re: Linux kernel crash "general protection fault" - DSFL related?

I do not really agree with the previous statements regarding pairing. Within one week, I contributed around 130 days of valid results to DSFL. The hosts involved are 2 Phenom II x6 and 1 Athlon II x2 (all three running Ubuntu 10.04 x64) and 1 Intel Q6600 (running Win XP Pro SP3). The number of invalid or errored results is fewer than 10 (of >500).
The pairing issue appeared in August with the 23rd beta runs. After improvements made by WCG, the following beta run was pretty OK.
You should not be afraid to use non-Intel-based hosts.
Until September 2010, I used only Intel CPUs, now I am moving more and more to AMD.
Cheers,
Yves
----------------------------------------
[Sep 10, 2011 8:03:36 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Linux kernel crash "general protection fault" - DSFL related?

... I'll have a look what it says in me event log on Ubuntu (11.04) desktop, 2.6.38.47 kernel.
I have Ubuntu (11.04) desktop, but on 2.6.38.11 kernel. How do I get that 2.6.38.47 kernel and next have it installed on my AMD-1090T machine? Any advantage in crunching DSFL_v6.19 WUs using BOINC-supplied BOINC_v6.10.59 on that kernel version for the said AMD machine?

Sorry, forgot the major in the build. Now 2.6.38-11.49

--//--
[Sep 10, 2011 2:10:17 PM]