04nylxb, Muchong (Regular Writer)
[Help] MS on a Linux cluster: parallel runs fail after installation
I have MS set up to use rsh, and passwordless rsh login is already configured and working on the cluster. Whenever I run a job I get the messages below, and the job fails almost immediately:

bash: /opt/hpmpi/bin/mpid: No such file or directory
bash: /opt/hpmpi/bin/mpid: No such file or directory

After installation, does HP-MPI need any further configuration? (rsh is configured for passwordless login; "rsh nodexx" switches to the node without asking for a password.) Any pointers from the experts would be greatly appreciated.

I changed cpucorestotal in both gw-info.sbd and gwparams.cfg to the total core count, 64, and modified the MPI run parameters to support IB.

When I run with 4 processes (single node), I get the error below. It looks like an MPI problem:

Current trace stack:
model_write_occ_eigenvalues
model_write_all
model_write
geom_BFGS
geometry_optimise
castep
Trapped SIGINT or SIGTERM. Exiting...
Trapped SIGINT or SIGTERM. Exiting...
Trapped SIGINT or SIGTERM. Exiting...
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 2 pid 23177 on host master to cpu 2
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 3 pid 23178 on host master to cpu 3
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 1 pid 23176 on host master to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 0 pid 23175 on host master to cpu 0
MPI Application rank 0 exited before MPI_Finalize() with status 1

----------------------------------------
(Below the divider is the old question.)

As the title says: while installing MS on the cluster, the installer asks

should hpmpi use ssh? [Y/n]

I have sixteen nodes configured for passwordless rsh login; ssh is not configured. If I answer no here, will parallel calculations then run over rsh?

Also, how do I uninstall MS on Linux? When I installed it earlier under the msi account I chose ssh, forgetting that the cluster is configured for rsh, and configuring ssh is rather troublesome.

I exported the master node's entire home directory over NFS to all compute nodes, so a key generated on master is shared with the other nodes ($ ssh-keygen -t rsa, which by default creates id_rsa and id_rsa.pub in ~/.ssh). When I repeat the same step on another node, the key is again written under home, and because home is shared from the master node, ssh-keygen reports that a key already exists and asks whether to overwrite it. I do not know how to get past this. Please advise.

[ Last edited by 04nylxb on 2011-6-22 at 08:31 ]
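One way to narrow down the "mpid: No such file or directory" message is to ask every node whether the file is actually there. The following is only a minimal Python sketch, assuming passwordless rsh works as described in the post. The helper names (remote_test_command, classify, check_nodes) are hypothetical, and the node names you pass in are whatever your cluster uses. It relies on echoed markers rather than exit codes because classic rsh does not return the remote exit status.

```python
import subprocess

# Path taken from the error message in the post above.
MPID_PATH = "/opt/hpmpi/bin/mpid"


def remote_test_command(node, path=MPID_PATH, remote_shell="rsh"):
    """Build the argv that asks `node` whether `path` is an executable file."""
    # `test -x` succeeds only if the file exists and is executable.
    # The result is echoed as a marker because classic rsh does not
    # pass the remote exit status back to the caller.
    probe = f"test -x {path} && echo FOUND || echo MISSING"
    return [remote_shell, node, probe]


def classify(output):
    """Turn the remote output into 'found', 'missing' or 'unknown'."""
    words = output.split()
    if "FOUND" in words:
        return "found"
    if "MISSING" in words:
        return "missing"
    return "unknown"


def check_nodes(nodes, path=MPID_PATH, remote_shell="rsh", timeout=15):
    """Run the probe on every node and return {node: status}."""
    results = {}
    for node in nodes:
        try:
            proc = subprocess.run(
                remote_test_command(node, path, remote_shell),
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            results[node] = classify(proc.stdout)
        except (OSError, subprocess.TimeoutExpired):
            # Remote shell not installed locally, or the node did not answer.
            results[node] = "unknown"
    return results
```

Calling check_nodes(["node1", "node2"]) from the master node would return something like a dictionary mapping each node to "found", "missing" or "unknown". Any node reporting "missing" has no HP-MPI at that path, which would be consistent with the bash error quoted above.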


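Muchong (Renowned Writer)

Do not share home; share only the MS installation directory.
HP-MPI must be installed on every node.
Do not install MS as the root user.
The tmp path on every node must be readable and writable.
Use SSH for calculations wherever possible. It is not hard to configure; just follow the manual once. Personally I find it simpler than RSH. Current Linux distributions do not install the RSH service by default, so it has to be installed separately, whereas SSH only needs to be configured before use.
Install MS with the --type cluster option.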
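On the ssh-keygen overwrite prompt from the original post: if the home directory stays NFS-shared, the key pair generated once on master is already visible on every node, so there should be no need to generate it again elsewhere. What remains is to authorize that one public key. Below is a minimal Python sketch of that step, not the procedure from the MS manual. The function name authorize_own_key is hypothetical, and the default key name id_rsa is taken from the ssh-keygen -t rsa command quoted in the post.

```python
import os
from pathlib import Path


def authorize_own_key(home, key_name="id_rsa"):
    """Append <home>/.ssh/<key_name>.pub to authorized_keys if it is missing.

    Returns True if the key was added, False if it was already present.
    """
    ssh_dir = Path(home) / ".ssh"
    pub_key = (ssh_dir / f"{key_name}.pub").read_text().strip()
    auth_file = ssh_dir / "authorized_keys"

    text = auth_file.read_text() if auth_file.exists() else ""
    already = pub_key in (line.strip() for line in text.splitlines())
    if not already:
        # Start on a fresh line in case the file lacks a trailing newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with auth_file.open("a") as fh:
            fh.write(prefix + pub_key + "\n")

    # sshd with its default StrictModes setting rejects keys when these
    # permissions are too open.
    os.chmod(ssh_dir, 0o700)
    os.chmod(auth_file, 0o600)
    return not already
```

Running authorize_own_key(Path.home()) once on master would be enough under the shared-home assumption. The first ssh connection to each node will still ask to confirm the host key unless the known_hosts file is populated beforehand.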

Muchong (Regular Writer)

Thanks a lot. The system administrator shared the whole home directory to make administration easier. HP-MPI is installed on every node.

The problem above is now solved and HP-MPI works. I tried DMol3 and the calculation runs completely normally. CASTEP, however, fails. The new problem is below; any pointers appreciated.

Job started on host master at Wed Jun 22 21:55:51 2011
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 1 pid 1156 on host master to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 2 pid 1157 on host master to cpu 2
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 0 pid 1155 on host master to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 3 pid 1158 on host master to cpu 3
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 13 pid 7947 on host node3 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 14 pid 7948 on host node3 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 25 pid 15375 on host node6 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 21 pid 6099 on host node5 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 23 pid 6101 on host node5 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 26 pid 15376 on host node6 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 4 pid 24707 on host node1 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 15 pid 7949 on host node3 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 12 pid 7946 on host node3 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 27 pid 15377 on host node6 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 24 pid 15374 on host node6 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 20 pid 6098 on host node5 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 22 pid 6100 on host node5 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 6 pid 24709 on host node1 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 7 pid 24710 on host node1 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 5 pid 24708 on host node1 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 8 pid 8980 on host node2 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 10 pid 8982 on host node2 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 30 pid 14483 on host node7 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 31 pid 14484 on host node7 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 9 pid 8981 on host node2 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 11 pid 8983 on host node2 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 29 pid 14482 on host node7 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 28 pid 14481 on host node7 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 17 pid 17856 on host node4 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 18 pid 17857 on host node4 to cpu 0
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 19 pid 17858 on host node4 to cpu 1
MPI_CPU_AFFINITY set to RANK, setting affinity of rank 16 pid 17855 on host node4 to cpu 0
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
arning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
arning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
warning:regcache incompatible with malloc
MX:node6:mx__connect_common(00:60:dd:48:d9:57):error 36(errno=3) estination NIC not found in network table
MPI Application rank 26 exited before MPI_Finalize() with status 1
MX:node3:mx__connect_common(00:60:dd:48:d9:28):error 36(errno=3) estination NIC not found in network table
MX:node3:mx__connect_common(00:60:dd:48:d9:28):error 36(errno=3) estination NIC not found in network table
MX:node2:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MMX:node7:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node4:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node5:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MMX:node2:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node5:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node1:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
X:node1:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node4:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node7:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MPI Application rank 15 exited before MPI_Finalize() with status 1
MX:node5:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node2:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MX:node7:Remote endpoint is closed, peer=00:60:dd:48:d8:f0 (node3:0)
MPI Application rank 29 exited before MPI_Finalize() with status 1
MPI Application rank 21 exited before MPI_Finalize() with status 1
forrtl: error (78): process killed (SIGTERM)
Image              PC        Routine  Line     Source
libpthread.so.0    0096D21A  Unknown  Unknown  Unknown
libmyriexpress.so  B6F7535D  Unknown  Unknown  Unknown
libmpi.so.1        B7A3401F  Unknown  Unknown  Unknown
libmpi.so.1        B7A10622  Unknown  Unknown  Unknown
libmpi.so.1        B7A0FFCB  Unknown  Unknown  Unknown
libmpi.so.1        B7A60BDF  Unknown  Unknown  Unknown
libmpi.so.1        B7A6AF17  Unknown  Unknown  Unknown
castepexe_mpi.exe  080A68E9  Unknown  Unknown  Unknown
castepexe_mpi.exe  08F5D992  Unknown  Unknown  Unknown
……………………
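With 32 ranks writing to the same output, it can be hard to see which hosts fail first. The sketch below is one possible way to summarize such a log in Python. It assumes the message formats exactly as they appear in the output above. The function name summarize and the layout of its return value are hypothetical and not part of any MS or HP-MPI tool.

```python
import re
from collections import Counter

# Patterns written against the message formats shown in the log above.
AFFINITY = re.compile(r"affinity of rank (\d+) pid \d+ on host (\S+) to cpu \d+")
RANK_EXIT = re.compile(
    r"MPI Application rank (\d+) exited before MPI_Finalize\(\) with status (\d+)"
)
MX_CONNECT = re.compile(r"MX:(\w+):mx__connect_common\(([0-9a-f:]+)\):error (\d+)")
MX_CLOSED = re.compile(
    r"MX:(\w+):Remote endpoint is closed, peer=([0-9a-f:]+) \((\S+?)\)"
)


def summarize(log_text):
    """Collect failed ranks and MX errors from a job log.

    Returns a dict with:
      failed_ranks   - list of (rank, host, exit status)
      connect_errors - {(reporting host, target MAC): count}
      closed_peers   - {peer endpoint name: count}
    """
    # Map each rank to the host named in its MPI_CPU_AFFINITY line.
    rank_host = {int(rank): host for rank, host in AFFINITY.findall(log_text)}

    failed = [
        (int(rank), rank_host.get(int(rank), "?"), int(status))
        for rank, status in RANK_EXIT.findall(log_text)
    ]
    connect_errors = Counter(
        (host, mac) for host, mac, _code in MX_CONNECT.findall(log_text)
    )
    closed_peers = Counter(peer for _host, _mac, peer in MX_CLOSED.findall(log_text))

    return {
        "failed_ranks": failed,
        "connect_errors": dict(connect_errors),
        "closed_peers": dict(closed_peers),
    }
```

Applied to the log above, a summary of this kind would group the mx__connect_common errors by reporting host and target MAC address, and would show how often each peer endpoint was reported closed. That is the sort of detail an administrator may want when checking the interconnect's network table. Lines whose "MX:" prefix was mangled by interleaved output are simply not counted.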

Muchong (Renowned Writer)

You are using InfiniBand? I have no experience with that technology. If DMol3 works but CASTEP does not, first check whether it is a license problem; if it is not, a hardware configuration problem becomes the more likely cause. Below is some information I found online; I hope it is useful. You could show this text to your administrator and see whether they can deal with it.

What is warning:regcache incompatible with malloc ?

Myrinet MX uses registration cache (see the "Acronyms in high performance interconnect world" table above) to achieve higher performance. When the registration cache feature is enabled, Myrinet MX will manage all memory allocations by itself, i.e. it has its own implementation of malloc, free, realloc, mremap, munmap, sbrk, etc. (see mx__regcache.c in the libmyriexpress package).

The warning message in question pops up when mx__regcache_works returns 0. For Linux, this means that when calling a pair of malloc/free, the variable mx__hook_triggered is not triggered.

Registration cache checks can be disabled by setting the environment variable MX_RCACHE to 2.

Registration cache can sometimes cause weird errors. It can be disabled by setting the environment variable MX_RCACHE to 0.
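To try the MX_RCACHE settings mentioned in the quoted text without changing the login environment, the variable can be set only for the process being launched. The following Python sketch illustrates that idea under stated assumptions: the helper names env_with_mx_rcache and run_with_mx_rcache are hypothetical, and only the two values named in the quoted text (0 and 2) are accepted.

```python
import os
import subprocess

# Values described in the quoted text above.
MX_RCACHE_MODES = {
    0: "registration cache disabled",
    2: "registration cache checks disabled",
}


def env_with_mx_rcache(mode, base=None):
    """Return a copy of the environment with MX_RCACHE set to `mode`."""
    if mode not in MX_RCACHE_MODES:
        raise ValueError(f"MX_RCACHE mode must be one of {sorted(MX_RCACHE_MODES)}")
    env = dict(os.environ if base is None else base)
    env["MX_RCACHE"] = str(mode)  # environment values must be strings
    return env


def run_with_mx_rcache(argv, mode):
    """Run `argv` with MX_RCACHE set in the child's environment only."""
    return subprocess.run(
        argv,
        env=env_with_mx_rcache(mode),
        capture_output=True,
        text=True,
    )
```

Note that this only affects the process started on the local host. Whether the variable reaches the MPI ranks on the other nodes depends on how the job launcher forwards environment variables, which the thread does not cover, so the effect should be confirmed in the job output.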
