?

基于第一性原理量子輸運模擬的并行計算

2015-12-31 21:46張若興侯士敏國家核電技術公司北京軟件技術中心國家能源核電軟件重點實驗室北京0009北京大學電子學系納米器件物理與化學教育部重點實驗室北京0087
計算物理 2015年6期
關鍵詞:電子密度格林進程

張若興, 侯士敏, 丑 強(.國家核電技術公司北京軟件技術中心國家能源核電軟件重點實驗室,北京 0009;.北京大學電子學系納米器件物理與化學教育部重點實驗室,北京 0087)

基于第一性原理量子輸運模擬的并行計算

張若興1, 侯士敏2, 丑 強1
(1.國家核電技術公司北京軟件技術中心國家能源核電軟件重點實驗室,北京 100029;
2.北京大學電子學系納米器件物理與化學教育部重點實驗室,北京 100871)

為了解決基于第一性原理分析計算大尺度量子輸運體系時遇到的耗時長久問題,挖掘密度泛函理論與非平衡格林函數相結合方法(DFT+NEGF方法)在自洽迭代過程中的計算熱點,就計算電子密度矩陣時的能量點積分和計算格林函數時的矩陣求逆/乘法運算提出MPI/OpenMP并行計算方案.能量點積分采用MPI多進程并行方案,在數據初始化時需要將稀疏矩陣和積分能量點依照輪詢調度算法分配給各進程.矩陣求逆/乘法的并行化既可調用ScaLAPACK子程序實現又可調用Intel MKL數學庫中的OpenMP多線程加速函數實現.由于不同能量點計算的獨立性,能量點積分采用的MPI并行計算獲得近乎線性的加速比曲線.由于OpenMP多線程并行采用的是基于共享內存的數據交換機制以及線程間切換通信開銷小,矩陣求逆/乘法運算的OpenMP并行實現在計算效率上要優于而在程序的可擴展性上要劣于MPI多進程并行實現.

第一性原理計算;密度泛函理論;格林函數;MPI;并行計算

0 引言

進入量子領域的電子器件不再遵從傳統的物理規律而是具有顯著的量子效應和電導漲落特性[1],因此探索下一代電子器件的發展方向和研究它們的導電機理成為人們關注的焦點[2-3].在此方面,基于量子力學的第一性原理計算要比實驗研究更有優勢,因為理論研究可以低成本地模擬各種構型的量子器件并且就它們表現出來的電子輸運特性給出物理解釋.由于量子輸運體系是無限大非周期的開放體系,在計算化學和凝聚態物理領域中被廣泛應用的處理孤立體系或者周期體系的基態密度泛函理論并不是直接適用的[4].Meir 和Wingreen于1992年在Keldysh的非平衡格林函數理論的基礎上給出了普適的電流計算公式[5-6],然而公式中代表能量和相位弛豫的非彈性輸運貢獻的電子自能項又往往是很難獲得的.所以后續發展了一種近似處理方法:用密度泛函理論[7-8](Density Functional Theory,DFT)中的交換關聯能代替非平衡格林函數[5,9-11](Non-Equilibrium Green’s Function,NEGF)中的電子相互作用自能.此方法就是被廣泛采用的DFT 與NEGF相結合的方法,記作DFT+NEGF方法.簡言之,在采用局域原子軌道基組和電極邊界滿足屏蔽近似的條件下,不可計算的開放體系分為可以進行傳統DFT計算的周期電極部分和帶有電極邊界修正的有限大小的擴展分子部分[12-13].

在實驗上,一個單分子結的體系一般具有幾十納米甚至幾百納米大小的尺度[14-16],組成多分子器件電路后勢必將有更大的規模.因此,需要解決的一個關鍵問題是如何有效地提高計算效率,以使之能夠處理較大體系的輸運問題.基于多核CPU的OpenMP多線程并行計算和基于多臺物理機的集群MPI多進程并行計算為加速DFT+NEGF自洽迭代計算指明了一個發展方向.事實上,基于第一性原理研究量子輸運的商業軟件Atomistix ToolKit[17]、開源軟件Smeagol[18]都已實現了OpenMP和MPI并行計算.然而,這些軟件更關注的是功能實現而非并行效率優化,例如Smeagol的MPI并行加速只是針對積分能量點實現的,一旦CPU總核數超過積分能量點個數會造成CPU資源的嚴重浪費.本文工作的意義在于不僅在前人研究的基礎上實現了量子輸運計算軟件,還重新審視并設計了軟件中的MPI/OpenMP混合并行算法,實現了比Smeagol軟件更細粒度、更優化的并行計算.

1 DFT+NEGF計算流程

基于DFT+NEGF框架的量子輸運計算流程如圖1所示.我們需要先選取恰當的基函數和贗勢,然后猜測一個初始的密度矩陣,一般取電子占據每個原子的基態作為初始猜測.對于電極,我們利用周期體系的DFT計算得到電極原胞的哈密頓量矩陣、密度矩陣和電極自能矩陣.對于擴展分子,先由給定的基函數和贗勢可以計算出動能和原子核的電勢,這兩部分在自洽計算中不會變化.在得到擴展分子和電極的哈密頓量之后,對于每一個給定的能量,我們都可以計算出自能和擴展分子的格林函數,然后對能量積分得到密度矩陣.如果這個密度矩陣與輸入的密度一樣,則自洽計算成功,否則將新的密度矩陣繼續代入上面的循環,計算哈密頓矩陣和下一步的密度矩陣,直到收斂為止.

圖1 基于DFT+NEGF框架的量子輸運計算的主流程Fig.1 Main flow of quantum transport calculations based on DFT+NEGF framework

在DFT+NEGF框架下,由于采用了局域原子軌道基組和屏蔽近似,兩端量子器件的開放體系可以分割成三個典型部分,如圖2所示.擴展分子的格林函數可表示為

其中ΣL和ΣR分別代表了左、右電極作為半周期邊界條件對擴展分子的哈密頓矩陣的微擾,我們稱之為電極自能矩陣.無論輸運體系是否處于平衡狀態,擴展分子的電子密度矩陣在數學上均表達為

其中擴展分子的小于格林函數在左、右電極費米能級相同的平衡狀態下可寫為

在左、右電極費米能級不相同的非平衡狀態下可寫為

GR和GA、ΓL和ΓR、AL和AR、μL和μR分別是擴展分子的推遲格林函數和超前格林函數、左右電極的展寬函數、擴展分子的譜函數中與左電極和右電極相耦合的部分、左右電極的費米能級.

圖2 兩端量子器件開放體系的典型劃分Fig.2 Typical division of a two-end quantum transport device

2 計算熱點分析

從圖2可知,鑒于周期電極的DFT計算和量子輸運結果分析都各執行一遍,量子輸運計算的熱點在于擴展分子哈密頓量矩陣H和電子密度矩陣ρ之間的自洽迭代計算.H的計算量主要在于對體系電子-電子庫侖勢的計算,利用快速傅立葉變換技術可以有效地將這一部分的計算量減小到O(N log N)的量級;ρ的計算量主要在于求格林函數時矩陣求逆操作(見式(1))和求電子密度矩陣時的能量積分操作(見式(2)),其計算量是O(N3),N是量子輸運體系的尺度大小[19].因此,接下來我們關注的是自洽迭代過程中擴展分子格林函數和電子密度矩陣的求解.

2.1 計算格林函數矩陣

根據式(1),獲知格林函數矩陣實際上是單純的矩陣求逆操作.通用線性代數軟件包 (Linear Algebra PACKage,LAPACK)提供的矩陣求逆算法是通過LU分解和回代過程兩個步驟來進行的,其計算量是O (N3),也就是說每當計算體系增大一倍,求逆計算以及由此導致的NEGF計算的計算量就增大8倍.這大大限定了我們計算的體系大小.在實際計算中,由于采用了局域軌道基組,當原子之間達到一定的距離以后,它們的基組之間將沒有任何重疊.這就導致求擴展分子格林函數時被求逆矩陣z S-H-ΣL-ΣR是非常稀疏的.利用這一稀疏的性質可以有效地提高求逆的速度.可以證明,對于一個帶狀對角的稀疏矩陣,LU分解得到的矩陣具有和原矩陣相同的稀疏程度,從而使得LU分解的計算量從原來的O(N3)降到O(N).然而在輸運計算的體系中,格林函數矩陣往往不具有稀疏性,這使得回代過程的計算量僅由O(N3)降到O (N2).因此,基于稀疏矩陣的LAPACK直接求逆算法最終是個O(N2)計算量的算法[20].

圖3 非平衡態下求電子密度矩陣的能量積分路徑(Ceq為平衡部分的環路積分,Lne為非平衡部分貼近實軸的積分路徑,Cne為非平衡部分的環路積分.)Fig.3 Energy integral path in calculating non-equilibrium electron densitymatrix(Ceqis for the equilibrium division,Lneand Cneare for the non-equilibrium divisions along the real axis and off the real axis.)

2.2 計算電子密度矩陣

實際上,把基組表象的電子密度矩陣表示成對應空間的譜函數按照費米分布填充后再對能量積分的形式,是非常形象和直觀的.只不過,因為小于格林函數在能量域內可能存在大量尖峰而且費米分布函數在費米能級對應的虛軸上存在奇點,所以利用式(2)積分求電子密度矩陣在數學實現上是一項非常有技巧性的工作.簡單地說,平衡態下的電子密度矩陣可以利用復平面內的環路積分和留數定理得到,而非平衡態下的電子密度矩陣積分可以通過公式變形分成兩部分:利用環路積分的平衡部分和處在左、右電極的費米能級之間貼近實軸積分的非平衡部分[20-23],非平衡態下的雙路徑積分見圖3所示.變形積分路徑的一個最大的好處在于我們可以用一個非常稀疏的積分網格點劃分來得到一個相對精確的積分結果,因為格林函數在遠離實軸的區域變得非常的平滑.數學上有很多成熟的算法來處理這一數值積分問題.目前,基于DFT+NEGF的量子輸運計算軟件多數采用Gaussian積分算法[24].不同的是,我們采用了Gauss-Kronrod積分算法[24],特征是對積分路徑上不同平滑程度的區域采用不同疏密的函數取值點使得格林函數的積分更加有效率.圖4給出了金-吡啶-金分子結量子輸運體系的推遲格林函數在上半復能量域上的平滑程度以及采用Gauss-Kronrod積分的自適應能量取點分布.可見,在靠近實軸和費米能級的區域,算法自動給出了更密的函數取點.

圖4 金-吡啶-金分子結量子輸運體系Fig.4 Au-pyridine-Au molecular junction

3 并行計算方案

正如2.1節所述,因為采用了局域的原子軌道基組,為了節省內存和加快矩陣運算我們對稀疏形式的大規模矩陣采用壓縮存儲.如圖5所示,在并行計算方案中,考慮到各進程間的負載均衡和MPI通信開銷,應選擇合適的塊大?。˙lockSize)將稀疏矩陣按照輪詢調度算法(Round-Robin Scheduling)分配給所有進程.若BlockSize過小則MPI進程間通信頻繁,開銷大;若BlockSize過大則有的進程可能分配不到計算任務,負載均衡差.另外,每個進程維護著壓縮存儲矩陣的本地索引表并且能夠還原成稀疏矩陣中對應非零元素的索引.

圖5 稀疏矩陣及對應壓縮存儲數組的進程分配示意Fig.5 Schematic representation of process assignments for sparsematrix and its compressed array

3.1 能量點積分的并行化

在DFT+NEGF框架下的自洽迭代過程中,計算電子密度矩陣時Gauss-Kronrod積分的每個能量點都需要執行從擴展分子哈密頓量矩陣到小于格林函數矩陣的計算,見式(1),(3),(4).幸運的是這樣的計算過程是相互獨立的.因此我們采用的粗粒度并行計算方案是按照輪詢調度算法MPI進程“分而治之”各積分能量點,只在最后的“積分”階段進行歸約加權求和,如圖6所示.此外,由于量子輸運體系在垂直于輸運方向的平面內往往是周期化處理的,即DFT+NEGF框架下的稀疏矩陣多是k相關的[9-11].類似地,不同k點對應的矩陣運算也是相互獨立的,所以k點并行化也可采用與能量點積分相似的并行方案.

圖6 能量點積分和矩陣運算的并行計算方案示意圖Fig.6 Schematic representation of energy point integration and matrix operation

3.2 矩陣求逆和乘法的并行化

在DFT+NEGF框架下的自洽迭代過程中,計算擴展分子的格林函數時需要求一個大規模稀疏矩陣的逆矩陣,屬于AX=B這樣的線性代數求解問題.由于為了節省內存和充分利用緩存技術每個進程中只保存了按照詢調度算法分配的多段稀疏矩陣的壓縮存儲數組(見圖5所示),調用ScaLAPACK并行庫中的PZGESV(…)程序實現矩陣求逆的并行化和調用PZGEMM(…)程序實現矩陣乘法的并行化便水到渠成.實際上,也可采用分塊矩陣的遞歸操作思想把大規模矩陣求逆轉化成小規模矩陣求逆以及小規模矩陣乘法問題[25].作為細粒度并行計算方案,矩陣求逆和矩陣乘法還可以采用共享內存式的OpenMP多線程并行,配合編譯選項調用較高版本的Intel MKL數學庫或采用MPI/OpenMP混合編程均可實現.實際上,由于分段存儲無論采用那種矩陣求逆并行計算方案都涉及到MPI進程間通信問題.在優化進程通信開銷方面,結合特殊的數據結構我們采用自定義數據類型和持久通信來完成MPI進程間的數據傳輸.

4 算例驗證

我們采用如圖7所示的Pt-PDI-Pt雙端電子輸運器件,擴展分子包含10層3×3鉑(Pt3×3)原子層和一個C8H4N2分子(PDI分子)共104個原子,Pt電極在垂直輸運方向是周期擴展的.在原子軌道基組選擇上,Pt原子采用SZP而其它原子均采用DZP.擴展分子共有960個原子軌道,因此若只考慮最近鄰相互作用則圖5中實空間下的稀疏矩陣大小為960×8 640而圖6中k空間下的方陣大小為960×960.如果考慮進自旋自由度,矩陣大小也會相應成倍增加.如此大規模的矩陣運算都是比較耗時的.

我們采用的計算平臺為由4個計算節點(CompNode)組成的小型集群(Cluster),每個CompNode的配置相同均為Intel Xeon E5-2650四路八核CPU/2.00 GHz主頻/20 MB三級緩存/64 GB內存/Linux RHEL 6.3操作系統,禁用CPU的超線程功能.首先驗證只考慮能量點積分并行化的情況,為了對比測試我們采用手動給定積分能量點個數的方式.由于在不考慮矩陣求逆和矩陣乘法的并行化時不同能量點的計算是彼此獨立的,所以我們獲得了如圖8(a)所示的加速比曲線.加速比曲線偏離線性特征主要是因為兩方面因素:一是曲線前半段時的MPI聚合通信開銷,例如實空間下的稀疏矩陣是在初始化時分段存儲在各個進程中的,需要在生成k空間的方陣時進行MPI聚合通信操作,而且在最后階段執行數值積分操作的加權求和也是個MPI_ REDUCE、MPI_BCAST聚合通信操作;二是曲線后半段時的CPU資源受限帶來的影響,當積分能量點個數超過Cluster中CPU總核數(128核)時,由于進程間通信開銷相比變化不大加速比曲線略有微升.

圖7 Pt-PDI-Pt雙端量子輸運器件的擴展分子結構Fig.7 Extended molecule structure of a two-end Pt-PDI-Pt quantum transport device

然后驗證矩陣求逆并行化的情況.我們測試了3.2節所述的兩種并行計算方案即調用ScaLAPACK并行庫中的PZGESV子程序和利用Intel Cluster MKL 10.2.4.032數學庫中的ZGESV OpenMP多線程加速函數.其加速比曲線如圖8(b)所示.考慮到Intel MKL的ZGESV是基于單節點共享內存式的OpenMP加速,我們選用某一臺含32個CPU核心的CompNode進行測試.可見,采用Intel MKL庫函數的OpenMP加速實現明顯優于采用ScaLAPACK庫函數的MPI加速實現,這主要是因為OpenMP線程間的通信和切換開銷大大小于MPI進程間的通信和切換開銷,OpenMP的共享內存機制保證了更加快速高效的數據交換.但是矩陣求逆的OpenMP并行加速可擴展性和可移植性較差,最大加速比嚴重依賴于計算節點的CPU配置,適合于胖計算節點的情況.相比而言,MPI并行加速的可移植性更好.矩陣乘法并行化的加速比測試結果與矩陣求逆的情況類似.

圖8 能量點積分并行(a)和矩陣求逆并行(b)的加速比曲線Fig.8 Speedup ratio curves of energy point integration(a)and matrix inversion(b)

5 結束語

基于第一性原理對量子器件的輸運特性進行分析計算備受人們關注,但是在DFT+NEGF框架下計算多原子體系的電子密度矩陣是非常耗時的,所以利用現代CPU的多核優勢發展通用的并行計算技術顯得尤為重要.本文在分析DFT+NEGF方法計算熱點的基礎上,提出了針對能量點積分和大規模稀疏矩陣求逆、大規模稀疏矩陣乘法的并行計算方案.在并行計算方案中,充分考慮了進程間的負載均衡和MPI通信開銷,將實空間下的稀疏矩陣和電子密度積分能量點按照輪詢調度算法分配給各個進程.由于不同能量點的計算是彼此獨立的,所以在CPU資源充足的情況下獲得亞線性的加速比曲線.對于矩陣求逆和矩陣乘法,我們實現了兩種并行計算加速,即調用ScaLAPACK并行庫子程序和調用Intel MKL數學庫中的OpenMP多線程加速子程序.由于不同的數據交換機制和進程(線程)通信切換開銷,OpenMP加速實現的計算效率要明顯優于MPI加速實現,但是MPI加速實現具有更好的擴展性和移植性.在計算中我們還發現Intel MKL數學庫已針對Intel CPU的指令集和向量化特征做了充分優化,配合Intel編譯器使用計算效率要好于其它LAPACK/ScaLAPACK數學庫.

除此之外,對于在垂直于輸運方向上周期擴展的量子輸運體系我們還可以考慮與能量點積分相似的k點并行化方案,只不過由于擴展分子橫向足夠大k點個數往往遠小于積分能量點個數.我們還計劃下一步實現基于GPU的矩陣求逆和矩陣乘法并行加速,以期獲得更高的加速比和加速效率.

[1] 薛增泉.分子電子學[M].北京:北京大學出版社,2003.

[2] Nitzan A,Ratner M.Electron transport in molecular wire junctions[J].Science,2003,300(5624):1384-1389.

[3] Tao N.Electron transport in molecular junctions[J].Nature Nanotechnology,2006,(1):173-181.

[4] Datta S.Electronic transport inmesoscopic systems[M].UK:Cambridge University Press,1995.

[5] Keldysh L.Diagram technique for non-equilibrium processes[J].Sov Phys JETP,1965,20(4):1018-1026.

[6] Meir Y,Wingreen N.Landauer formula for the current through an interacting electron region[J].Phys Rev Lett,1992,68 (16):2512-2515.

[7] R Parr,Yang W.Density-functional theory of atoms and molecules[M].UK:Oxford University Press,1989.

[8] Koch W,Holthausen M.A chemist’s guide to density functional theory[M].Weinheim:Wiley-VCH,2001.

[9] Haug H,Jauho A.Quantum kinetics in transport and optics of semiconductors[M].Berlin:Springer,1996.

[10] Caroli C,Combescot R,Nozieres P,Saint-James D.A direct calculation of the tunnelling current:IV.Electron-phonon interaction effects[J].JPhys C,1972,5(1):21-42.

[11] Ferrer J,Martin-Rodero A,Flores F.Contact resistance in the scanning tunneling microscope at very small distances[J].Phys Rev B,1988,38(14):10113-10115.

[12] Ferretti A,Calzolari A,Felice R,Manghi F.First-principles theoretical description of electronic transport including electronelectron correlation[J].Phys Rev B,2005,72(12):125114.1-13.

[13] Rocha A,García-Suárez V,Bailey S,Lambert C,Ferrer J,Sanvito S.Spin and molecular electronics in atomically generated orbital landscapes[J].Phys Rev B,2006,73(8):085414.1-22.

[14] Liu R,Sun Y.Influence of electric field on 1,4-phenylene diisocyanidemolecular electronic transport[J].Chinese JComput Phys,2013,30(6):932-942.

[15] Leng J,Zou B,Ma H.3Dmodified novel PML absorbing boundary condition for truncatingmagnetized plasma[J].Chinese J Comput Phys,2013,30(6):895-901.

[16] Liu R,Li Z.Electronic transport in molecular device of isomers[J].Chinese JComput Phys,2011,28(2):295-300.

[17] Atomistix ToolKit(version 13.8.1)[CP/OL].QuantumWise,http:∥www.quantumwise.com,2014.

[18] Smeagol(version 1.2.1)[CP/OL].Trinity College,Dublin,http:∥smeagol.tcd.ie,2012.

[19] Anderson E,Bai Z,Bischof C,Blackford S,Demmel J,Dongarra J,Du Croz J,Greenbaum A,McKenney A,Sorensen D.LAPACK users'guide(third edition)[EB/OL].Society for Industrial and Applied Mathematics(SIAM),http://www.netlib.org/lapack/lug,1999.

[20] 錢則侃.基于分子器件集成電路設計的理論研究[D].北京大學圖書館:北京大學信息科學技術學院,2008.

[21] Brandbyge M,Mozos J-L,Ordejón P,Taylor J,Stokbro K.Density-functionalmethod for nonequilibrium electron transport [J].Phys Rev B,2002,65(16),165401.1-17.

[22] 李銳.分子電子器件中的分子軌道和自旋軌道耦合效應研究[D].北京大學圖書館:北京大學信息科學技術學院, 2008.

[23] Taylor J,Guo H,Wang J.Ab initiomodeling of quantum transport properties ofmolecular electronic devices[J].Phys Rev B,2001,63(24):245407.1-13.

[24] Press W,Flannery B,Teukolsky S,VetterlingW.Numerical recipes in C:The art of scientific computing[M].2nd ed.UK:Cambridge University Press,1992.

[25] 鄭小宏,蘭杰,郝華,曾雉.GPU加速在第一性原理輸運研究中的應用[J].科研信息化技術與應用,2013,4(5):90-96.

0 Introduction

It is well known that the study of nonlinear wave equations and their solutions are of great importance in many areas of physics.Among these,the Korteweg-de Vries(KdV)equation and the Burgers equation have attracted a lotof interest due to their ubiquity as amodel of nonlinear physical phenomena.In this paper we consider the compound Burgers-Korteweg-de Vries(cBKdV)equation,

whereα,β,νandμare constants,which can be viewed as a composition of the KdV,modified KdV and Burgers equations,involving nonlinear,dispersion and dissipation effects[1-2].The second and third terms are two non-linear convective ones with different orders.The fourth is the so-called viscous dissipative term in whichνdenotes the viscosity and is positive.The last one is the dispersive term.Equation(1)contains at least the following important special cases:

1)Ifα=0,β≠0,ν≠0,μ≠0,then Eq.(1)becomes amodified Burgers-KdV(mBKdV)equation

2)Ifα≠0,β=0,ν≠0,μ≠0,then Eq.(1)becomes

which is the Burgers-KdV(BKdV)equation.Further,ifwe setμ=0 in(3),then

which is the Burgers equation.

3)Ifα≠0,β≠0,ν=0,μ≠0,then

Equation(5)is the compound KdV(cKdV)equation[3].

4)Further,ifν=0 in Eqs.(2)and(3),then we obtain the modified KdV(mKdV)equation[4]

and the KdV equation[5]

respectively.

Thus,Eq.(3)is one of the simplest classicalmodels containing nonlinear,dissipative,and dispersive effects.It can be viewed as a nonlinearmodel of the propagation of waves on an elastic tube filled with a viscous fluid[6],the flow of liquids containing gas bubbles[7]and turbulence[8].

The goal of this paper is to develop a lattice Boltzmann method(LBM)for the cBKdV equation.In recent years,the LBM has attracted more and more attention as an alternative and promising numerical scheme for simulating fluid flows[9-11]and solving various mathematicalphysical problems[12-21].This method can be either regarded as an extension of the lattice gas automaton[22]or as a special discrete form of the Boltzmann equation for kinetic theory[23].The fundamental idea of the LBM is to construct simplified kinetic models that incorporate the essential physics ofmicroscopic ormesoscopic processes so that themacroscopic averaged properties obey the desired macroscopic equations.Due to its advantages in geometrical flexibility,natural parallelity, simplicity of programming and numerical efficiency,the LBM hasmade great success inmany fields and has extensively been applied to solving numerous problems.

Many researchers obtained numerical solutions of various linear and nonlinear partial differential equations via LBMs.Yan et al[15-16]presented a higher-order moment method.By giving appropriate equilibrium distribution,Shi et al[18-19]proposed a LBM for nonlinear convectiondiffusion equations.The equilibrium distributionmust satisfy reasonable constrains ofmoments from physical characteristic ofmacroscopic equation.Several researchers[20-21]provided the technique by adding an amending function to the lattice Boltzmann equation(LBE)and constructing the equilibrium distribution function.The LBMs above are the Bhatnagar-Gross-Krook(BGK)[24]or single-relaxation-time approximation to the Boltzmann equation.However,there is a troublesome problem for solving nonlinear partial differential equations in many existing lattice Boltzmann models,i.e.,how to treat the higher derivative and nonlinear terms?In this paper,a lattice Boltzmann model for the cBKdV equation is developed.By treating the dispersive term uxxxproperly and applying the Chapman-Enskog expansion,the governing equation is recovered correctly from the lattice Boltzmann equation,and the local equilibrium distribution functions are obtained.Numerical solutions agree well with those exact solutions and with other numerical results.

The paper’s layout is as follows:In Section 1 a brief introduction is given.Section 2 highlights the lattice Boltzmann model.The application of LBM to the cBKdV equation is presented in the section.Section 3 is dedicated to numerical results and analysis of the mathematical models.Finally,conclusions are presented.

1 Lattice Boltzmann model

According to the theory of lattice Boltzmann method,it consists of two steps:① colliding, which occurs when particles arriving at a node interact and possibly change their velocity directions according to scattering rules,and②streaming,in which each particlemoves to the nearestnode in the direction of its velocity.In its standard form,LBM is an explicit finite difference approximation of a velocity-discrete Boltzmann equation with a collision operator of relaxation type.Usually,with the single-relaxation-time BGK approximation,these two steps can be combined into the LBE.Evolution equation of the distribution function in themodel reads

where fi(x,t)is the distribution function of particles;(x,t)is the local equilibrium distribution function of particles;Δx andΔt are the lattice space and time increment,respectively;c=Δx/Δt is the particle speed andτis the dimensionless relaxation time;{ei,i=0,1,…,b-1}is the set of discrete velocity directions,for the 3-bitmodel,{e0,e1,e2}={0,c,-c}.Consider a lattice with b-1 links that connect the center site to b-1 neighboring nodes.It is actually a b-bitmodel if the rest particles with velocity ei=0 are allowed at each node.The macroscopic parameter u is defined in terms of the distribution functions as

To derive the macroscopic equation from the lattice BGK model,the Chapman-Enskog(CE)expansion is applied under the assumption of small Kundsen number?defined as?=l/L,where l is themean free path,and L is the characteristic length.In the investigation on accuracy of the LBM, the CE expansion is usually used for the LBE and it has been found that the LBE approaches to the Navier-Stokes equations with error proportional to the Knudsen number squared[25].The Chapman-Enskog expansion is applied to fi(x,t),

Using time scale t1=?t,t2=?2t and space scale x1=?x,then the time derivation and the space derivation can be expanded formally

Performing Taylor expansion on Eq.(8)and using Eqs.(10)and(11)(up to O(?3)),we get partial differential equations in order of?and?2,

Taking(12)×?+(13)×?2,summing over i and using Eqs.(9),(11),(12)and

we have

In Eq.(15)

and

where

Then a cBKdV with second order accuracy of truncation error is obtained.

Summing Eq.(11)over i and using Eqs.(9),(14)and(16),we obtain

Using Eqs.(12),(17)and(19),we get the second-order moment equation of the local equilibrium distribution

Based on Eqs.(9),(16)and(20),the equilibrium distribution functions of 3-bitmodel is obtained

We use a central difference(second order accuracy)to approximate partial derivative ux,i.e.,

According to the scaling relation(11),the error is up to?3,which does not affect the previous derivation.For the boundary nodes,in order to avoid large errors at the boundary which can affect the global evolution,we use a three adjacent points difference(second order accuracy)to approximate partial derivatives ux(a)and ux(b),i.e.,

where a is the left boundary node,and b is the right boundary node.

2 Numerical results and analysis

In this section,the 3-bit LBM is used to solve Eqs.(1)-(7)and numerical accuracy is demonstrated via comparing numerical results with analytical solutions and current available predictions or results.

Exam p le 1 A specialmodel problem of KdV equation was investigated in Refs.[26,27].We consider KdV equation(7)withα=1 andμ=4.84×10-4.The initial condition is

and the boundary condition is

The exact solution of this problem is taken from Ref.[28]and is given by

The 3-bit LBM is applied to Example 1 withΔx=0.012 5,Δt=0.000 05,C1=0.3,D1=-6.0 andλ=c2/2.The results of percent error are compared with those given in Refs.[26-27].These techniques include heat balance integralmethod(HBIM)[26]and multiquadric quasiinterpolation(MQQI)[27]method.Percentage errors of the numerical solutions of the KdV equation by LBM,HBIM,and MQQImethod are given in Table 1,which shows that the new algorithm performs better than HBIM[26]and MQQI[27].In Fig.1,profiles of the exactand numerical solutions at time t=0.1,0.5 and 1.0 are illustrated.

Table 1 Percentage errors of Examp le 1 at t=0.005,0.01

Fig.1 LBM solutions and exact solutions at different time stages

Exam ple 2 The second test is solution of the modified KdV

The exact solution of this problem is

where the height of the soliton a=0.5 and the initial position of the soliton x0= 0.In the numerical simulation,the region of space is set in x∈[-50,50], and the parameters are taken asΔx=0.1,Δt=0.01, andλ=c2/5.L∞and L2errors for Eq.(25)at time t=0.1,0.5 and t=1 are shown in Table 2.Solitary wave solutions obtained by LBM,and the exact solutions are demonstrated in Fig.2 at time t=1.

Tab le 2 L∞and L2errors for Eq.(25)at time t=0.1,0.5 and t=1

Exam ple 3 Consider a 1D Burgers equation[17]

with the following initial and boundary conditions

It is a steady shock problem with exact solutions u(x)=-tanh[x/(2ν)].The LBM solutions are shown in Fig.3.When time t=4,a shock is formed and the numerical steady state solutions are obtained.Wemeasured the error using both L2-norm and L∞-norm,and they are3.641 4×10-3and 2.516 1×10-2,respectively.In the simulation,ν=0.005 and the parameters are taken asΔx=0.02,Δt=0.000 2,andλ=c2/5.

Fig.2 LBM solutions(circles)and exact solutions(solid)at t=1

Fig.3 LBM solutions and exact solutions of Burgers equation withν=0.005

Fig.4 LBM solutions and exact solutions at t=10,50, 100,200 forα=1,ν=1,and μ=1(solitary wave)

Example 4 In this example,we consider BKdV equation(3)with solitary wave solutions[29],

where A=ν2/(25αμ),B=ν/10μand C=6 ν2/25 μ.The solitary wave,i.e.,kink-type solution obtained by the LBM,and the exact solutions of Eq.(3)are demonstrated in Fig.4 forα=1,ν=1,andμ=1 at times t=10,50,100 and 200.L∞and L2errors for various values ofα, ν,andμat time t=0.5 and t=1 are shown in Table 3.In the numerical simulation,the region of space is set in x∈[-100,100],and the parameters are taken asΔx=1,Δt=0.1,andλ=c2/5.From the illustrations,we can see that the LBM results agree very wellwith the exact solutions.

Table 3 L∞and L2errors for various values ofα,ν,andμat time t=0.5 and t=1

Example 5 We consider amodified Burgers-KdV equation as follows

with traveling wave solutions[30]

In Table 4,we present errors between exact and numerical solutions at different values of x and t in the region x∈[-10,10].The parameters are taken asΔx=0.01,Δt=0.01,andλ=c2/3.In comparison with ChSC(Chebyshev spectral collocation)method[30],results of the LBM are almost as good as numerical results obtained from Ref.[30].The traveling wave solutions obtained by the LBM,and the exact solutions of Eq.(27)are demonstrated in Fig.5 at time t=0.5.

Table 4 Absolute errors of LBM solutions at different values of x and t

Table 5 L∞and L2errors for various values ofα,β,andνat time t=0.5 and t=1 w ithμ=-0.1

Exam ple 6 Finally we consider Eqs.(1)and(5)with the following explicit traveling solitary wave solutions[2]

and

whereθ=x-vt+x0,v=-ν2/6μ-2μk2-α2/4β,and k=1,x0=0.

The L∞and L2errors for various values ofα,β,andνwithμ=-0.1 at time t=0.5 and t=1 are shown in Table5.In numerical simulation,the region of space is set in x∈[-10,10],and the parameters are taken asΔx=0.1,Δt=0.01,andλ=c2/2.Traveling solitary wave solutions obtained by LBM,and exact solutions of the cBKdV equation are displayed in Fig.6 at time t=5 withα=1,β=100,ν=0.01 andμ=-0.01.From the illustrations,we can see that the LBM results agree very wellwith the exact solutions.

Fig.5 LBM solutions and exact solutions at time t=0.5(traveling wave)

Fig.6 LBM solutions and exact solutions of cBKdV equation at t=5 withα=1,β=100,ν=0.01,andμ=-0.01

3 Conclusions

In this paper,a 3-bit lattice Boltzmannmodel for the cBKdV equation is proposed.By treating the dispersive term uxxxproperly and applying the Chapman-Enskog expansion,the governing equations are recovered correctly from the lattice Boltzmann equations and the local equilibrium distribution functions are also obtained.In order to illustrate efficiency of the proposed method, comparisons aremade with exact solutions and with other numerical results.Numerical experiments show that LBM results are not only in nice agreement with exact solutions but also much more accurate than those of othermethods.We shall discuss further works about thismodel in future.

[1] Wang M L.Exact solutions for a compound KdV-Burgers equation[J].Phys Lett A,1996,213:279-287.

[2] Feng Z S,Chen G.Solitary wave solutions of the compound Burgers Korteweg-de Vries equation[J].Physica A,2005,352:419-435.

[3] Pan X D.Solitary wave and similarity solutions of the combined KdV equation[J].Appl Math Mech, 1988,9:281-285.

[4] McIntosh I.Single phase averaging and traveling wave solutions of themodified Burgers-Korteweg-de Vries equation[J].Phys Lett A,1990,143:57-61.

[5] Canosa J,Gazdag J.The Korteweg-de Vries-Burgers equation[J].JComput Phys,1977,23:393-403.

[6] Johnson R S.A non-linear equation incorporating damping and dispersion[J].JFluid Mech,1970,42:49-60.

[7] Wijngaarden L V.One-dimensional flow of liquids containing small gas bubbles[J].Ann Rev Fluid Mech,1972,4:369-396.

[8] Gao G.A theory of interaction between dissipation and dispersion of turbulence[J].Sci Sin Ser A,1985, 28:616-627.

[9] Benzi R,Succi S,Vergassola M.The lattice Boltzmann equation:theory and applications[J].Phys Report,1992,222(3):145-197.

[10] Qian Y H,Succi S,Orszag S.Recent advances in lattice Boltzmann computing[J].Annu Rev Comput Phys,1995,3:195-242.

[11] Chen SY,Doolen G D.Lattice Boltzmannmethod for fluid flows[J].Annu Rev Fluid Mech,1998,30:329-364.

[12] Dawson SP,Chen SY,Doolen G D.Lattice Boltzmann computations for reaction-diffusion equations[J].JChem Phys,1993,2:1514-1523.

[13] Shen Z J,Yuan GW,Shen L J.Lattice Boltzmann method for Burgers equation[J].Chinese JComput Phys,2000,17:166-172.

[14] Yan GW.A lattice Boltzmann equation for waves[J].JComput Phys,2000,161:61-69.

[15] Yan GW,Zhang JY.A higher-odermomentmethod of the lattice Boltzmann model for the Korteweg-de Vries equation[J].Math Comput Simu,2009,79:1554-1565.

[16] Zhang JY,Yan GW.A lattice Boltzmannmodel for the Korteweg-de Vries equation with two conservation laws[J].Comput Phys Commu,2009,180:1054-1062.

[17] Duan Y L,Liu R X,Jiang Y Q.Lattice Boltzmann model for themodified Burgers’equation[J].Appl Math Comput,2008,202:489-497.

[18] Shi B C,Guo Z L.Lattice Boltzmann model for nonlinear convection-diffusion equations[J].Phy Rev E, 2009,79:016701.

[19] Shi B C,Deng B,Du R,Chen X W.A new scheme for source term in LBGK model for convectiondiffusion equation[J].Comput Math Appl,2008,55(7):1568-1575.

[20] Ma C F.A new lattice Blotzmannmodel for KdV-Burgers equation[J].Chinese Phys Lett,2005,22(9):2313-2315.

[21] Lai H L,Ma C F.Lattice Boltzmann method for the generalized Kuramoto-Sivashinsky equation[J].Physica A,2009,388:1405-1412.

[22] McNamara G R,Zanetti G.Use of the Boltzmann equation to simulate lattice-gas automata[J].Phys Rev Lett,1988,61:2332-2335.

[23] He X Y,Lou L S.Theory of the lattice Boltzmann method:from the Boltzmann equation to the lattice Boltzmann equation[J].Phy Rev E,1997,56:6811-6817.

[24] Bhatnagar P,Gross E,Krook M.A model for collision process in gas.I:Small amplitude processed in charged and neutral one component system[J].Phys Rev,1954,94:511-525.

[25] Guo Z L,Gross E,Krook M.Lattice Boltzmann method for Fluid Dynamics[M].Wuhan:Hubei Science and Technology Press,2002.

[26] Kutluay S,Bahadir A,?zdes A R.A small time solutions for the Korteweg-de Vries equation[J].Appl Math Comput,2000,107:203-210.

[27] Xiao M L,Wang R H,Zhu C G.Applyingmultiquadric quasi-interpolation to solve KdV equation[J].Journal of Mathematical Research Exposition,2011,31(2):191-201.

[28] Alexander M E,Morris J L I.Galerkin methods for some model equations for nonlinear dispersive wave [J].Comput Phys,1979,30:428-451.

[29] Soliman A A.Themodified extended tanh-functionmethod for solving Burgers-type equations[J].Physica A,2006,361:394-404.

[30] Khatera A H,Temsaha R S,Hassanb M M A.Chebyshev spectral collocationmethod for solving Burgers’-type equations[J].JComput Appl Math,2008,222:333-350.

Parallel Com puting of First-principles Based Quantum Transport Simulations

ZHANG Ruoxing1, HOU Shimin2, CHOU Qiang1
(1.National Energy Key Laboratory ofNuclear Power Software,Software Development Center,State Nuclear Power Technology Corporation,Beijing 100029,China;2.Key Laboratory for the Physics and Chemistry of Nanodevices,Department of Electronics,Peking University,Beijing 100871,China)

To solve long time-consuming problem in analysis of large-scale quantum transport systems based on first-principles calculations,we analyze hot spots of self-consistent iterations within the framework that combines non-equilibrium Green’s function with density functional theory,namely DFT+NEGFmethod.Two parallel computing schemes based on MPI/OpenMP are proposed to deal with energy point integration and matrix inversion/multiplication.For energy point integral parallelism,sparsematrix as well as energy points should be assigned to each process over data initialization according to round-robin scheduling algorithm.Either MPI based ScaLAPACK subroutine or OpenMP based Intel MKL subroutine can be called to realize matrix inversion/multiplication parallelization.A sub-linear speedup ratio curve is obtained for energy point integral parallelism due to the fact that calculations related with different energy points aremutually independent.OpenMP parallelism adopts shared memory patterned data exchangemechanism and overhead of switching threads is rather small,and consequently it is better in computing efficiency but worse in code scalability than MPI implementation.

first-principles calculations;density functional theory;Green’s function;MPI;parallel computing

Lattice Boltzmann M odel for Com pound Burgers-Korteweg-de Vries Equation

DUAN Yali1, CHEN Xianjin1, KONG Linghua2
(1.School ofMathematical Sciences,University of Science and Technology ofChina,Hefei,Anhui 230026,China;
2.School ofMathematicsand Information Science,Jiangxi Normal University,Nanchang,Jiangxi 330022,China)

1001-246X(2015)06-0639-10

O241.8 Document code:A

1001-246X(2015)06-0631-08

O411.2

A

2014-11-03;

2015-02-09

國家核電技術公司員工自主創新項目(SNP-KJ-CX-2015-27),大型先進壓水堆核電站重大專項核電關鍵設計軟件自主化技術研究課題(2011ZX06004-024)資助

張若興(1980-),男,河北邢臺,博士,高級工程師,主要研究領域為高性能網格計算和云計算、大數據分析技術、概率安全分析軟件開發,E-mail:zhangruoxing@snptc.com.cn

Received date:2014-11-05;Revised date:2015-02-01

Foundation item s:Supported by the NSFC(11101399,10901074)and Fundamental Research Funds for Central Universities(WK0010000022)

Biography:Duan Yali(1973-),female,Shanxi,associate professor,Ph D,major in numerical solution of PDEs,E-mail:ylduan01@ustc.edu.cn

Abstract:We develop a lattice Boltzmannmodel for compound Burgers-Korteweg-de Vries(cBKdV)equation.By properly treating dispersive term uxxxand applying Chapman-Enskog expansion,the governing equation is recovered correctly from lattice Boltzmann equation and local equilibrium distribution functions are obtained.Numerical experiments show that our results agree well with exact solutions and have better numerical accuracy compared with previous numerical results.This hence indicates that the model is satisfactory and efficient.

Key words:Lattice Boltzmannmodel;Compound Burgers-KdV equation;Chapman-Enskog expansion

猜你喜歡
電子密度格林進程
麻辣老師
顧及地磁影響的GNSS電離層層析不等像素間距算法*
我喜歡小狼格林
不同GPS掩星電離層剖面產品相關性分析
債券市場對外開放的進程與展望
等離子體電子密度分布信息提取方法研究
綠毛怪格林奇
一種適用于電離層電子密度重構的AMART算法
格林的遺憾
社會進程中的新聞學探尋
91香蕉高清国产线观看免费-97夜夜澡人人爽人人喊a-99久久久无码国产精品9-国产亚洲日韩欧美综合