04nylxb, Muchong (Regular Writer)
[Help request]
VASP fails when running across nodes: mpiexec_node-1 (handle_stdin_input 1089)
I recently compiled VASP 5.2 with CNEB on our cluster, and the parallel build succeeded. When I run $ mpirun -np 8 vasp on a single node (each node has eight cores), top does show eight vasp processes. Across nodes, however, the run fails with the following output:

running on 15 nodes
distr: one band on 1 nodes, 15 groups
vasp.5.2.12 11Nov11 complex
POSCAR found : 1 types and 2 ions
WARNING !!!
For optimal performance we recommend that you set
NPAR = approx SQRT( number of cores)
This will greatly improve the performance of VASP for DFT.
The default NPAR=number of cores might be grossly inefficient
on modern multi-core architectures or massively parallel machines.
Unfortunately you need to use the default for hybrid, GW and RPA
calculations.
LDA part: xc-table for Pade appr. of Perdew
found WAVECAR, reading the header
number of bands has changed, file: 12 present: 15
trying to continue reading WAVECAR, but it might fail
POSCAR, INCAR and KPOINTS ok, starting setup
WARNING: small aliasing (wrap around) errors must be expected
FFT: planning ...( 1 )
reading WAVECAR
random initialization beyond band 13
the WAVECAR file was read sucessfully
initial charge from wavefunction
entering main loop
N E dE d eps ncg rms rms(c)
mpiexec_node-1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
mpiexec_node-1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out < /dev/null &
rank 14 in job 14 node-1_49061 caused collective abort of all ranks
exit status of rank 14: killed by signal 11
rank 13 in job 14 node-1_49061 caused collective abort of all ranks
exit status of rank 13: killed by signal 9
rank 9 in job 14 node-1_49061 caused collective abort of all ranks
exit status of rank 9: killed by signal 11
rank 8 in job 14 node-1_49061 caused collective abort of all ranks
exit status of rank 8: killed by signal 11
rank 4 in job 14 node-1_49061 caused collective abort of all ranks
exit status of rank 4: killed by signal 11
rank 3 in job 14 node-1_49061 caused collective abort of all ranks
exit status of rank 3: killed by signal 9
rank 2 in job 14 node-1_49061 caused collective abort of all ranks
exit status of rank 2: killed by signal 9
rank 1 in job 14 node-1_49061 caused collective abort of all ranks
exit status of rank 1: killed by signal 11
rank 0 in job 14 node-1_49061 caused collective abort of all ranks

Here node-1 is my control (head) node. With 12 or fewer processes everything runs normally:
$ mpirun -machinefile ~/machinefile -np 12 vasp > 5out
The MPI library is MPICH2. I tested it with cpi: every node works, and it can run on more than a hundred cores.

Could someone explain why VASP fails like this when running across nodes, and how to fix it? Many thanks.

Also, what is the makeparam file generated by make makeparam during compilation used for?
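The abort messages at the end of the log record the signal that terminated each rank. As a minimal sketch (Python 3.9+; the helper name `ranks_by_signal` is hypothetical and not part of MPICH2 or VASP), the following groups ranks by signal name using Python's standard `signal` module. On Linux, signal 11 is SIGSEGV (segmentation fault) and signal 9 is SIGKILL, so decoding the log separates the ranks that crashed from the ranks that were killed.

```python
import re
import signal
from collections import defaultdict

# Matches MPICH2 lines such as "exit status of rank 14: killed by signal 11"
EXIT_RE = re.compile(r"exit status of rank (\d+): killed by signal (\d+)")


def ranks_by_signal(log_text: str) -> dict[str, list[int]]:
    """Group MPI ranks by the name of the signal that terminated them."""
    groups: dict[str, list[int]] = defaultdict(list)
    for rank, signum in EXIT_RE.findall(log_text):
        # signal.Signals maps a number to its platform name, e.g. 11 -> SIGSEGV on Linux
        name = signal.Signals(int(signum)).name
        groups[name].append(int(rank))
    return {name: sorted(ranks) for name, ranks in groups.items()}


# The "exit status" lines copied from the log in the question
LOG = """\
exit status of rank 14: killed by signal 11
exit status of rank 13: killed by signal 9
exit status of rank 9: killed by signal 11
exit status of rank 8: killed by signal 11
exit status of rank 4: killed by signal 11
exit status of rank 3: killed by signal 9
exit status of rank 2: killed by signal 9
exit status of rank 1: killed by signal 11
"""

if __name__ == "__main__":
    for name, ranks in ranks_by_signal(LOG).items():
        print(name, ranks)
```

On Linux this prints SIGSEGV [1, 4, 8, 9, 14] followed by SIGKILL [2, 3, 13]: most ranks died from a segmentation fault, which points at the crash itself rather than at the stdin message printed just before it.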

Muchong (Regular Writer)
Thanks a lot. I had set NPAR to the number of parallel cores. The number of nodes is hard to predict, though: sometimes the job scheduler assigns 4 nodes, sometimes 10. Does NPAR not need to match the node count exactly? Is it enough to follow the warning and use roughly the square root of the number of cores?

On the MPI side I am using MPICH2. I tested it with cpi from the examples directory that ships with MPI: the parallel runs all completed, and when I specify a number of nodes, the output reports each of those nodes running. Can I conclude that the MPI installation is fine?

I also noticed another problem during yesterday's tests. Sometimes I submit a job, say with -np 64, and it runs normally, with vasp processes on every node; then an hour or two later I run exactly the same job again and vasp fails with the error above. Very frustrating.
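The warning's recommendation, NPAR = approx SQRT(number of cores), is stated in terms of the total core count, so it can be recomputed from the -np value of each job regardless of how many nodes the scheduler assigns. A minimal sketch, assuming you want NPAR to divide the core count evenly so that every band group gets the same number of cores: choose the divisor of the core count closest to its square root. The helper `suggest_npar` is hypothetical, and the result is a starting point for benchmarking, not a rule from VASP itself.

```python
import math


def suggest_npar(n_cores: int) -> int:
    """Return the divisor of n_cores closest to sqrt(n_cores).

    Hypothetical helper: follows the 'NPAR = approx SQRT(number of cores)'
    hint from the VASP warning, restricted to values that divide n_cores.
    """
    if n_cores < 1:
        raise ValueError("n_cores must be a positive integer")
    target = math.sqrt(n_cores)
    divisors = [d for d in range(1, n_cores + 1) if n_cores % d == 0]
    # Closest to sqrt(n_cores); ties go to the smaller divisor
    return min(divisors, key=lambda d: (abs(d - target), d))


if __name__ == "__main__":
    # Allocations the scheduler might hand out, at 8 cores per node
    for nodes in (1, 4, 8, 10):
        cores = 8 * nodes
        print(f"{nodes} nodes, {cores} cores -> NPAR = {suggest_npar(cores)}")
```

With eight cores per node this gives NPAR = 2, 4, 8 and 8 for 1, 4, 8 and 10 nodes. Restricting the choice to divisors is a design assumption; it avoids values that would leave band groups of unequal size.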
