04nylxb, Woodworm (Regular Writer)
[Help]
VASP fails when run across nodes: mpiexec_node-1 (handle_stdin_input 1089)
I recently compiled VASP 5.2 with CNEB on our cluster. The parallel build succeeded, and on a single node (eight cores per node) running

    $ mpirun -np 8 vasp

does show eight vasp processes in top. Across nodes, however, the run fails with the following output:

    running on 15 nodes
    distr: one band on 1 nodes, 15 groups
    vasp.5.2.12 11Nov11 complex
    POSCAR found : 1 types and 2 ions
    -----------------------------------------------------------------------------
    |  WARNING!!!                                                               |
    |                                                                           |
    |  For optimal performance we recommend that you set                        |
    |    NPAR = approx SQRT( number of cores)                                   |
    |  This will greatly improve the performance of VASP for DFT.               |
    |  The default NPAR=number of cores might be grossly inefficient            |
    |  on modern multi-core architectures or massively parallel machines.       |
    |  Unfortunately you need to use the default for hybrid, GW and RPA         |
    |  calculations.                                                            |
    -----------------------------------------------------------------------------
    LDA part: xc-table for Pade appr. of Perdew
    found WAVECAR, reading the header
    number of bands has changed, file: 12 present: 15
    trying to continue reading WAVECAR, but it might fail
    POSCAR, INCAR and KPOINTS ok, starting setup
    WARNING: small aliasing (wrap around) errors must be expected
    FFT: planning ...( 1 )
    reading WAVECAR
    random initialization beyond band 13
    the WAVECAR file was read sucessfully
    initial charge from wavefunction
    entering main loop
    N E dE d eps ncg rms rms(c)
    mpiexec_node-1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
    mpiexec_node-1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out < /dev/null &
    rank 14 in job 14 node-1_49061 caused collective abort of all ranks
      exit status of rank 14: killed by signal 11
    rank 13 in job 14 node-1_49061 caused collective abort of all ranks
      exit status of rank 13: killed by signal 9
    rank 9 in job 14 node-1_49061 caused collective abort of all ranks
      exit status of rank 9: killed by signal 11
    rank 8 in job 14 node-1_49061 caused collective abort of all ranks
      exit status of rank 8: killed by signal 11
    rank 4 in job 14 node-1_49061 caused collective abort of all ranks
      exit status of rank 4: killed by signal 11
    rank 3 in job 14 node-1_49061 caused collective abort of all ranks
      exit status of rank 3: killed by signal 9
    rank 2 in job 14 node-1_49061 caused collective abort of all ranks
      exit status of rank 2: killed by signal 9
    rank 1 in job 14 node-1_49061 caused collective abort of all ranks
      exit status of rank 1: killed by signal 11
    rank 0 in job 14 node-1_49061 caused collective abort of all ranks

Here node-1 is my head (control) node. With 12 or fewer processes everything runs normally:

    $ mpirun -machinefile ~/machinefile -np 12 vasp > 5out

As for MPICH2, I tested it with cpi: every node is fine, and it can run on more than a hundred cores.

Can anyone explain why VASP fails like this when running across nodes, and how to fix it? Many thanks.

One more question: what is the makeparam program produced by make makeparam during compilation used for?
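The mpiexec message in the log above suggests redirecting stdin from /dev/null when the program runs in the background. A minimal sketch of that redirection follows. It only addresses the stdin complaint and may well leave the signal 11 crashes untouched. The wrapper name `run_detached` and the `LAUNCH` variable are hypothetical, introduced only so that the sketch can be run without MPI installed (by default it prints the command instead of executing it).

```shell
# Hypothetical dry-run switch: "echo" prints the command; set LAUNCH="" to really run it.
LAUNCH=${LAUNCH-echo}

# Hypothetical wrapper around the mpirun call from the post.
run_detached() {
    nprocs=$1
    # stdin is taken from /dev/null, as the mpiexec message recommends
    $LAUNCH mpirun -machinefile "$HOME/machinefile" -np "$nprocs" vasp < /dev/null
}

run_detached 15
```

For a real background run one would presumably call it as `LAUNCH="" run_detached 15 > 5out 2>&1 &`, keeping the output file name used in the post.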

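Thank you very much.

Yes, I have been setting NPAR to the number of parallel cores. The number of nodes seems impossible to predict: sometimes the job scheduler assigns 4 nodes, sometimes 10. Does NPAR not have to match the node count strictly? Is it enough to follow the warning and use roughly the square root of the number of cores?

On the MPI side, I am using MPICH2. I tested it with cpi from the examples shipped with MPI, and the parallel runs all complete without problems; whichever nodes I specify, the output contains a report from each of them. Can I conclude that the MPI installation is fine?

While testing yesterday I noticed another problem. Sometimes a job submitted with, say, -np 64 runs normally and every node gets its share of vasp processes, but an hour or two later the same job fails again with the error above. Very frustrating.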
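The rule quoted in the VASP warning, NPAR = approx SQRT(number of cores), depends on the core count rather than the node count, so it can be computed at submission time from whatever -np value is used. A minimal sketch, assuming that NPAR should also divide the number of cores evenly (an assumption of this example, not something stated in the thread); the function name `suggest_npar` is hypothetical.

```shell
# Hypothetical helper: print the largest divisor of the core count
# that does not exceed its square root.
suggest_npar() {
    cores=$1
    best=1
    i=1
    # i*i <= cores is the integer form of i <= sqrt(cores)
    while [ $((i * i)) -le "$cores" ]; do
        if [ $((cores % i)) -eq 0 ]; then
            best=$i
        fi
        i=$((i + 1))
    done
    echo "$best"
}

# Core counts mentioned in the thread
for n in 8 12 15 64; do
    echo "cores=$n NPAR=$(suggest_npar "$n")"
done
```

The printed value would then be written into INCAR before launching with the matching -np. For hybrid, GW and RPA calculations the warning itself says the default has to be kept.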
