Cuda by Example - Book
CUDA by Example
JASON SANDERS
EDWARD KANDROT
Upper Saddle River, NJ • Boston • Indianapolis • San Francisco
New York • Toronto • Montreal • London • Munich • Paris • Madrid
Capetown • Sydney • Tokyo • Singapore • Mexico City
Foreword
Preface
Acknowledgments
About the Authors
Chapter Objectives
Central Processing Units
The Rise of GPU Computing
A Brief History of GPUs
Early GPU Computing
CUDA
What Is the CUDA Architecture?
Using the CUDA Architecture
Applications of CUDA
Medical Imaging
Computational Fluid Dynamics
Environmental Science
Chapter Review
2 GETTING STARTED
Chapter Objectives
Development Environment
CUDA-Enabled Graphics Processors
NVIDIA Device Driver
CUDA Development Toolkit
Standard C Compiler
Chapter Review
3 INTRODUCTION TO CUDA C
Chapter Objectives
A First Program
Hello, World!
A Kernel Call
Passing Parameters
Querying Devices
Using Device Properties
Chapter Review
Chapter Objectives
CUDA Parallel Programming
Summing Vectors
A Fun Example
5 THREAD COOPERATION
Shared Memory Bitmap
Chapter Review
Chapter Objectives
Constant Memory
Ray Tracing Introduction
Ray Tracing on the GPU
9 ATOMICS
10 STREAMS
Index
Recent activities of major chip manufacturers such as NVIDIA make it more
evident than ever that future designs of microprocessors and large HPC
systems will be hybrid/heterogeneous in nature. These heterogeneous systems
will rely on the integration of two major types of components in varying
proportions.
The relative balance between these component types in future designs is not
clear and will likely vary over time. There seems to be no doubt that future
generations of computer systems, ranging from laptops to supercomputers,
will consist of a composition of heterogeneous components. Indeed, the petaflop
(10^15 floating-point operations per second) performance barrier was breached by
such a system.
And yet the problems and the challenges for developers in the new computational
landscape of hybrid processors remain daunting. Critical parts of the software
infrastructure are already having a very difficult time keeping up with the pace
of change. In some cases, performance cannot scale with the number of cores
because an increasingly large portion of time is spent on data movement rather
than arithmetic. In other cases, software tuned for performance is delivered years
after the hardware arrives and so is obsolete on delivery. And in some cases, as
on some recent GPUs, software will not run at all because programming
environments have changed too much.
CUDA by Example addresses the heart of the software development challenge by
leveraging one of the most innovative and powerful solutions to the problem of
programming the massively parallel accelerators in recent years.
This book introduces you to programming in CUDA C by providing examples and
insight into the process of constructing and effectively using NVIDIA GPUs. It
presents introductory concepts of parallel computing from simple examples to
debugging (both logical and performance), as well as covers advanced topics and
issues related to using and building many applications. Throughout the book,
programming examples reinforce the concepts that have been presented.
The book is required reading for anyone working with accelerator-based
computing systems. It explores parallel computing in depth and provides an
approach to many problems that may be encountered. It is especially useful for
application developers, numerical library writers, and students and teachers of
parallel computing.
I have enjoyed and learned from this book, and I feel confident that you will
as well.
Jack Dongarra
University Distinguished Professor, University of Tennessee
Distinguished Research Staff Member, Oak Ridge National Laboratory
This book shows how, by harnessing the power of your computer's graphics
processing unit (GPU), you can write high-performance software for a wide range
of applications. Although originally designed to render computer graphics on
a monitor (and still used for this purpose), GPUs are increasingly being called
upon for equally demanding programs in science, engineering, and finance,
among other domains. We refer collectively to GPU programs that address
problems in nongraphics domains as general-purpose. Happily, although you
need to have some experience working in C or C++ to benefit from this book,
you need not have any knowledge of computer graphics. None whatsoever! GPU
programming simply offers you an opportunity to build, and to build mightily,
on your existing programming skills.
To program NVIDIA GPUs to perform general-purpose computing tasks, you
will want to know what CUDA is. NVIDIA GPUs are built on what's known as
the CUDA Architecture. You can think of the CUDA Architecture as the scheme
by which NVIDIA has built GPUs that can perform both traditional graphics-
rendering tasks and general-purpose tasks. To program CUDA GPUs, we will
be using a language known as CUDA C. As you will see very early in this book,
CUDA C is essentially C with a handful of extensions to allow programming of
massively parallel machines like NVIDIA GPUs.
We've geared CUDA by Example toward experienced C or C++ programmers
who have enough familiarity with C such that they are comfortable reading and
writing code in C. This book builds on your experience with C and intends to serve
as an example-driven, "quick-start" guide to using NVIDIA's CUDA C programming
language. By no means do you need to have done large-scale software
architecture, to have written a C compiler or an operating system kernel, or to
know all the ins and outs of the ANSI C standards. However, we do not spend
time reviewing C syntax or common C library routines such as malloc() or
memcpy(), so we will assume that you are already reasonably familiar with these
topics.
You will encounter some techniques that can be considered general parallel
programming paradigms, although this book does not aim to teach general
parallel programming techniques. Also, while we will look at nearly every part of
the CUDA API, this book does not serve as an extensive API reference nor will it
go into gory detail about every tool that you can use to help develop your CUDA C
software. Consequently, we highly recommend that this book be used in conjunction
with NVIDIA's freely available documentation, in particular the NVIDIA CUDA
Programming Guide and the NVIDIA CUDA Best Practices Guide. But don't stress
out about collecting all these documents because we'll walk you through everything
you need to do.
Without further ado, the world of programming NVIDIA GPUs with CUDA C awaits!
It's been said that it takes a village to write a technical book, and CUDA by Example
is no exception to this adage. The authors owe debts of gratitude to many people,
some of whom we would like to thank here.
Ian Buck, NVIDIA's senior director of GPU computing software, has been
immeasurably helpful in every stage of the development of this book, from championing
the idea to managing many of the details. We also owe Tim Murray, our always-
smiling reviewer, much of the credit for this book possessing even a modicum of
technical accuracy and readability. Many thanks also go to our designer, Darwin
Tat, who created fantastic cover art and figures on an extremely tight schedule.
Finally, we are much obliged to John Park, who helped guide this project through
the delicate legal process required of published work.
Without help from Addison-Wesley's staff, this book would still be nothing more
than a twinkle in the eyes of the authors. Peter Gordon, Kim Boedigheimer, and
Julie Nahil have all shown unbounded patience and professionalism and have
genuinely made the publication of this book a painless process. Additionally,
Molly Sharp's production work and Kim Wimpsett's copyediting have utterly
transformed this text from a pile of documents riddled with errors to the volume
you're reading today.
Some of the content of this book could not have been included without the
help of other contributors. Specifically, Nadeem Mohammad was instrumental
in researching the CUDA case studies we present in Chapter 1, and Nathan
Whitehead generously provided code that we incorporated into examples
throughout the book.
We would be remiss if we didn't thank the others who read early drafts of
this text and provided helpful feedback, including Genevieve Breed and Kurt
Wall. Many of the NVIDIA software engineers provided invaluable technical
assistance during the course of developing the content for CUDA by Example,
including Mark Hairgrove, who scoured the book, uncovering all manner of
inconsistencies: technical, typographical, and grammatical. Steve Hines,
Nicholas Wilt, and Stephen Jones consulted on specific sections of the CUDA
API, helping elucidate nuances that the authors would have otherwise
overlooked. Thanks also go out to Randima Fernando, who helped to get this project
off the ground, and to Michael Schidlowsky for acknowledging Jason in his book.
And what acknowledgments section would be complete without a heartfelt
expression of gratitude to parents and siblings? It is here that we would like to
thank our families, who have been with us through everything and have made
this all possible. With that said, we would like to extend special thanks to loving
parents, Edward and Kathleen Kandrot and Stephen and Helen Sanders. Thanks
also go to our brothers, Kenneth Kandrot and Corey Sanders. Thank you all for
your unwavering support.
There was a time in the not-so-distant past when parallel computing was looked
upon as an "exotic" pursuit and typically got compartmentalized as a specialty
within the field of computer science. This perception has changed in profound
ways in recent years. The computing world has shifted to the point where, far
from being an esoteric pursuit, nearly every aspiring programmer needs training
in parallel programming to be fully effective in computer science. Perhaps you've
picked this book up unconvinced about the importance of parallel programming
in the computing world today and the increasingly large role it will play in the
years to come. This introductory chapter will examine recent trends in the
hardware that does the heavy lifting for the software that we as programmers write.
In doing so, we hope to convince you that the parallel computing revolution has
already happened and that, by learning CUDA C, you'll be well positioned to write
high-performance applications for heterogeneous platforms that contain both
central and graphics processing units.
Chapter Objectives
Through the course of this chapter, you will accomplish the following:
• You will learn about the increasingly important role of parallel computing.
• You will learn a brief history of GPU computing and CUDA.
• You will learn about some successful applications that use CUDA C.
The Age of Parallel Processing
In recent years, much has been made of the computing industry's widespread
shift to parallel computing. Nearly all consumer computers in the year 2010
will ship with multicore central processors. From the introduction of dual-core,
low-end netbook machines to 8- and 16-core workstation computers, no longer
will parallel computing be relegated to exotic supercomputers or mainframes.
Moreover, electronic devices such as mobile phones and portable music players
have begun to incorporate parallel computing capabilities in an effort to provide
functionality well beyond those of their predecessors.
More and more, software developers will need to cope with a variety of parallel
computing platforms and technologies in order to provide novel and rich
experiences for an increasingly sophisticated base of users. Command prompts are out;
multithreaded graphical interfaces are in. Cellular phones that only make calls
are out; phones that can simultaneously play music, browse the Web, and provide
GPS services are in.
... original personal computer. Although increasing the CPU clock speed is certainly
not the only method by which computing performance has been improved, it has
always been a reliable source for improved performance.
In recent years, however, manufacturers have been forced to look for
alternatives to this traditional source of increased computational power. Because of
various fundamental limitations in the fabrication of integrated circuits, it is no
longer feasible to rely on upward-spiraling processor clock speeds as a means
for extracting additional power from existing architectures. Because of power and
heat restrictions as well as a rapidly approaching physical limit to transistor size,
researchers and manufacturers have begun to look elsewhere.
Outside the world of consumer computing, supercomputers have for decades
extracted massive performance gains in similar ways. The performance of a
processor used in a supercomputer has climbed astronomically, similar to the
improvements in the personal computer CPU. However, in addition to dramatic
improvements in the performance of a single processor, supercomputer
manufacturers have also extracted massive leaps in performance by steadily increasing
the number of processors. It is not uncommon for the fastest supercomputers to
have tens or hundreds of thousands of processor cores working in tandem.
In the search for additional processing power for personal computers, the
improvement in supercomputers raises a very good question: Rather than solely
looking to increase the performance of a single processing core, why not put
more than one in a personal computer? In this way, personal computers could
continue to improve in performance without the need for continuing increases in
processor clock speed.
In 2005, faced with an increasingly competitive marketplace and few alternatives,
leading CPU manufacturers began offering processors with two computing cores
instead of one. Over the following years, they followed this development with the
release of three-, four-, six-, and eight-core central processor units. Sometimes
referred to as the multicore revolution, this trend has marked a huge shift in the
evolution of the consumer computing market.
Today, it is relatively challenging to purchase a desktop computer with a CPU
containing but a single computing core. Even low-end, low-power central
processors ship with two or more cores per die. Leading CPU manufacturers have
already announced plans for 12- and 16-core CPUs, further confirming that
parallel computing has arrived for good.
The Rise of GPU Computing
In comparison to the central processor's traditional data processing pipeline,
performing general-purpose computations on a graphics processing unit (GPU) is
a new concept. In fact, the GPU itself is relatively new compared to the computing
field at large. However, the idea of computing on graphics processors is not as
new as you might believe.
A BRIEF HISTORY OF GPUS
We have already looked at how central processors evolved in both clock speeds
and core count. In the meantime, the state of graphics processing underwent a
dramatic revolution. In the late 1980s and early 1990s, the growth in popularity of
graphically driven operating systems such as Microsoft Windows helped create
a market for a new type of processor. In the early 1990s, users began purchasing
2D display accelerators for their personal computers. These display accelerators
offered hardware-assisted bitmap operations to assist in the display and usability
of graphical operating systems.
Around the same time, in the world of professional computing, a company by
the name of Silicon Graphics spent the 1980s popularizing the use of three-
dimensional graphics in a variety of markets, including government and defense
applications and scientific and technical visualization, as well as providing the
tools to create stunning cinematic effects. In 1992, Silicon Graphics opened the
programming interface to its hardware by releasing the OpenGL library. Silicon
Graphics intended OpenGL to be used as a standardized, platform-independent
method for writing 3D graphics applications. As with parallel processing and
CPUs, it would only be a matter of time before the technologies found their way
into consumer applications.
By the mid-1990s, the demand for consumer applications employing 3D graphics
had escalated rapidly, setting the stage for two fairly significant developments.
First, the release of immersive, first-person games such as Doom, Duke Nukem
3D, and Quake helped ignite a quest to create progressively more realistic 3D
environments for PC gaming. Although 3D graphics would eventually work their way
into nearly all computer games, the popularity of the nascent first-person shooter
genre would significantly accelerate the adoption of 3D graphics in consumer
computing. At the same time, companies such as NVIDIA, ATI Technologies,
and 3dfx Interactive began releasing graphics accelerators that were affordable
enough to attract widespread attention. These developments cemented 3D
graphics as a technology that would figure prominently for years to come.
The release of NVIDIA's GeForce 256 further pushed the capabilities of consumer
graphics hardware. For the first time, transform and lighting computations could
be performed directly on the graphics processor, thereby enhancing the potential
for even more visually interesting applications. Since transform and lighting were
already integral parts of the OpenGL graphics pipeline, the GeForce 256 marked
the beginning of a natural progression where increasingly more of the graphics
pipeline would be implemented directly on the graphics processor.
From a parallel-computing standpoint, NVIDIA's release of the GeForce 3 series
in 2001 represents arguably the most important breakthrough in GPU technology.
The GeForce 3 series was the computing industry's first chip to implement
Microsoft's then-new DirectX 8.0 standard. This standard required that compliant
hardware contain both programmable vertex and programmable pixel shading
stages. For the first time, developers had some control over the exact
computations that would be performed on their GPUs.
Essentially, the GPUs of the early 2000s were designed to produce a color for
every pixel on the screen using programmable arithmetic units known as pixel
shaders. In general, a pixel shader uses its (x,y) position on the screen as well
as some additional information to combine various inputs in computing a final
color. The additional information could be input colors, texture coordinates, or
other attributes that would be passed to the shader when it ran. But because
the arithmetic being performed on the input colors and textures was completely
controlled by the programmer, researchers observed that these input "colors"
could actually be any data.
So if the inputs were actually numerical data signifying something other than
color, programmers could then program the pixel shaders to perform arbitrary
computations on this data. The results would be handed back to the GPU as the
final pixel "color," although the colors would simply be the result of whatever
computations the programmer had instructed the GPU to perform on their inputs.
This data could be read back by the researchers, and the GPU would never be the
wiser. In essence, the GPU was being tricked into performing nonrendering tasks
by making those tasks appear as if they were a standard rendering. This trickery
was very clever but also very convoluted.
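To make the trick concrete, here is a minimal CPU-side sketch in plain C. It is not shader code and is not taken from the book; the names shade_pixel and render, and the choice of adding two arrays, are illustrative assumptions. The "shader" is invoked once per pixel, sees only its own (x,y) position and two read-only input "textures," and returns a single output "color" that is really just a number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-pixel "shader": it sees only its own (x, y) position
   and two read-only input "textures", and returns one "color".
   Here the colors are really plain numbers, and the shader adds them. */
float shade_pixel(size_t x, size_t y, size_t width,
                  const float *tex_a, const float *tex_b)
{
    size_t i = y * width + x;    /* the position selects which inputs to read */
    return tex_a[i] + tex_b[i];  /* arbitrary arithmetic on the "colors" */
}

/* Stand-in for a rendering pass: run the shader once for every pixel and
   store its result at that pixel's own location in the output image. */
void render(size_t width, size_t height,
            const float *tex_a, const float *tex_b, float *framebuffer)
{
    assert(tex_a != NULL && tex_b != NULL && framebuffer != NULL);
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            framebuffer[y * width + x] =
                shade_pixel(x, y, width, tex_a, tex_b);
}
```

Calling render on a 2x2 "image" whose two input textures hold ordinary numbers leaves their element-wise sums in framebuffer, which the host would then read back as though it were a rendered picture. On a real GPU of that era the loop was implicit in the rendering pass, and the per-pixel function was written in a shading language rather than C.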
Because of the high arithmetic throughput of GPUs, initial results from these
experiments promised a bright future for GPU computing. However, the
programming model was still far too restrictive for any critical mass of developers to
form. There were tight resource constraints, since programs could receive input
data only from a handful of input colors and a handful of texture units. There
were serious limitations on how and where the programmer could write results
to memory, so algorithms requiring the ability to write to arbitrary locations in
memory (scatter) could not run on a GPU. Moreover, it was nearly impossible to
predict how your particular GPU would deal with floating-point data, if it handled
floating-point data at all, so most scientific computations would be unable to
use a GPU. Finally, when the program inevitably computed the incorrect results,
failed to terminate, or simply hung the machine, there existed no reasonably good
method to debug any code that was being executed on the GPU.
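The scatter restriction is easier to see next to its counterpart, gather. The following plain C sketch is an illustration only: the function names and the assumption that input and output both hold n elements are ours, not the book's. In a gather, each output element is written at its own fixed location but may read from anywhere. In a scatter, each element chooses where it is written.

```c
#include <assert.h>
#include <stddef.h>

/* Gather: output element i is written at its own fixed location i,
   but may read from an arbitrary input location idx[i]. */
void gather(const float *in, const size_t *idx, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        assert(idx[i] < n);       /* assumed: every index is in range */
        out[i] = in[idx[i]];
    }
}

/* Scatter: input element i is written to an arbitrary output
   location idx[i], chosen at run time. */
void scatter(const float *in, const size_t *idx, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        assert(idx[i] < n);       /* assumed: every index is in range */
        out[idx[i]] = in[i];
    }
}
```

A pixel shader's result lands at the pixel it was invoked for, which matches the shape of gather: the write location is fixed, and only the reads are flexible. An algorithm shaped like scatter had no direct equivalent under that model.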
As if the limitations weren't severe enough, anyone who still wanted to use a GPU
to perform general-purpose computations would need to learn OpenGL or DirectX
since these remained the only means by which one could interact with a GPU. Not
only did this mean storing data in graphics textures and executing computations
by calling OpenGL or DirectX functions, but it meant writing the computations
themselves in special graphics-only programming languages known as shading
languages. Asking researchers to both cope with severe resource and
programming restrictions as well as to learn computer graphics and shading languages
before attempting to harness the computing power of their GPU proved too large
a hurdle for wide acceptance.
CUDA
It would not be until five years after the release of the GeForce 3 series that GPU
computing would be ready for prime time. In November 2006, NVIDIA unveiled the
industry's first DirectX 10 GPU, the GeForce 8800 GTX. The GeForce 8800 GTX was
also the first GPU to be built with NVIDIA's CUDA Architecture. This architecture
included several new components designed strictly for GPU computing and aimed
to alleviate many of the limitations that prevented previous graphics processors
from being legitimately useful for general-purpose computation.
WHAT IS THE CUDA ARCHITECTURE?
Unlike previous generations that partitioned computing resources into vertex
and pixel shaders, the CUDA Architecture included a unified shader pipeline,
allowing each and every arithmetic logic unit (ALU) on the chip to be marshaled
by a program intending to perform general-purpose computations. Because
NVIDIA intended this new family of graphics processors to be used for general-
purpose computing, these ALUs were built to comply with IEEE requirements for
single-precision floating-point arithmetic and were designed to use an instruction
set tailored for general computation rather than specifically for graphics.
Furthermore, the execution units on the GPU were allowed arbitrary read and
write access to memory as well as access to a software-managed cache known
as shared memory. All of these features of the CUDA Architecture were added in
order to create a GPU that would excel at computation in addition to performing
well at traditional graphics tasks.
USING THE CUDA ARCHITECTURE
The effort by NVIDIA to provide consumers with a product for both computation
and graphics could not stop at producing hardware incorporating the CUDA
Architecture, though. Regardless of how many features NVIDIA added to its chips
to facilitate computing, there continued to be no way to access these features
without using OpenGL or DirectX. Not only would this have required users to
continue to disguise their computations as graphics problems, but they would
have needed to continue writing their computations in a graphics-oriented
shading language such as OpenGL's GLSL or Microsoft's HLSL.
To reach the maximum number of developers possible, NVIDIA took industry-
standard C and added a relatively small number of keywords in order to harness
some of the special features of the CUDA Architecture. A few months after
the launch of the GeForce 8800 GTX, NVIDIA made public a compiler for this
language, CUDA C. And with that, CUDA C became the first language specifically
designed by a GPU company to facilitate general-purpose computing on GPUs.
In addition to creating a language to write code for the GPU, NVIDIA also provides
a specialized hardware driver to exploit the CUDA Architecture's massive
computational power. Users are no longer required to have any knowledge of the
OpenGL or DirectX graphics programming interfaces, nor are they required to
force their problem to look like a computer graphics task.
Applications of CUDA
Since its debut in early 2007, a variety of industries and applications have enjoyed
a great deal of success by choosing to build applications in CUDA C. These
benefits often include orders-of-magnitude performance improvement over the
previous state-of-the-art implementations. Furthermore, applications running on
NVIDIA graphics processors enjoy superior performance per dollar and
performance per watt than implementations built exclusively on traditional central
processing technologies. The following represent just a few of the ways in which
people have put CUDA C and the CUDA Architecture into successful use.
The mammogram, one of the current best techniques for the early detection of
breast cancer, has several significant limitations. Two or more images need to be
taken, and the film needs to be developed and read by a skilled doctor to identify
potential tumors. Additionally, this X-ray procedure carries with it all the risks of
repeatedly radiating a patient's chest. After careful study, doctors often require
further, more specific imaging (and even biopsy) in an attempt to eliminate the
possibility of cancer. These false positives incur expensive follow-up work and
cause undue stress to the patient until final conclusions can be drawn.
Ultrasound imaging is safer than X-ray imaging, so doctors often use it in
conjunction with mammography to assist in breast cancer care and diagnosis. But
conventional breast ultrasound has its limitations as well. As a result, TechniScan Medical
Systems was born. TechniScan has developed a promising, three-dimensional,
ultrasound imaging method, but its solution had not been put into practice for a
very simple reason: computation limitations. Simply put, converting the gathered
ultrasound data into the three-dimensional imagery required computation
considered prohibitively time-consuming and expensive for practical use.
The introduction of NVIDIA's first GPU based on the CUDA Architecture along with
its CUDA C programming language provided a platform on which TechniScan
could convert the dreams of its founders into reality. As the name indicates, its
Svara ultrasound imaging system uses ultrasonic waves to image the patient's
chest. The TechniScan Svara system relies on two NVIDIA Tesla C1060 processors
in order to process the 35GB of data generated by a 15-minute scan. Thanks to
the computational horsepower of the Tesla C1060, within 20 minutes, the doctor
can manipulate a highly detailed, three-dimensional image of the woman's breast.
TechniScan expects wide deployment of its Svara system starting in 2010.
COMPUTATIONAL FLUID DYNAMICS
For many years, the design of highly efficient rotors and blades remained a
black art of sorts. The astonishingly complex movement of air and fluids around
these devices cannot be effectively modeled by simple formulations, so
accurate simulations prove far too computationally expensive to be realistic. Only the
largest supercomputers in the world could hope to offer computational resources
on par with the sophisticated numerical models required to develop and validate
designs. Since few have access to such machines, innovation in the design of
such machines continued to stagnate.
The University of Cambridge, in a great tradition started by Charles Babbage, is
home to active research into advanced parallel computing. Dr. Graham Pullan
and Dr. Tobias Brandvik of the "many-core group" correctly identified the
potential in NVIDIA's CUDA Architecture to accelerate computational fluid dynamics
to unprecedented levels. Their initial investigations indicated that acceptable levels
of performance could be delivered by GPU-powered, personal workstations.
Later, the use of a small GPU cluster easily outperformed their much more costly
supercomputers and further confirmed their suspicions that the capabilities of
NVIDIA's GPU matched extremely well with the problems they wanted to solve.
For the researchers at Cambridge, the massive performance gains offered by
CUDA C represent more than a simple, incremental boost to their
supercomputing resources. The availability of copious amounts of low-cost GPU
computation empowered the Cambridge researchers to perform rapid experimentation.
Receiving experimental results within seconds streamlined the feedback process
on which researchers rely in order to arrive at breakthroughs. As a result, the
use of GPU clusters has fundamentally transformed the way they approach their
research. Nearly interactive simulation has unleashed new opportunities for
innovation and creativity in a previously stifled field of research.
The key components to cleaning agents are known as surfactants. Surfactant
molecules determine the cleaning capacity and texture of detergents and
shampoos, but they are often implicated as the most environmentally devastating
component of cleaning products. These molecules attach themselves to dirt and
then mix with water such that the surfactants can be rinsed away along with the
dirt. Traditionally, measuring the cleaning value of a new surfactant would require
extensive laboratory testing involving numerous combinations of materials and
impurities to be cleaned. This process, not surprisingly, can be very slow and
expensive.
Temple University has been working with industry leader Procter & Gamble to
use molecular simulation of surfactant interactions with dirt, water, and other
materials. The introduction of computer simulations serves not just to accelerate
a traditional lab approach, but it extends the breadth of testing to numerous
variants of environmental conditions, far more than could be practically tested in the
past. Temple researchers used the GPU-accelerated Highly Optimized Object-
oriented Many-particle Dynamics (HOOMD) simulation software written by the
Department of Energy's Ames Laboratory. By splitting their simulation across two
NVIDIA Tesla GPUs, they were able to achieve equivalent performance to the 128
CPU cores of the Cray XT3 and to the 1024 CPUs of an IBM BlueGene/L machine.
By increasing the number of Tesla GPUs in their solution, they are already
simulating surfactant interactions at 16 times the performance of previous platforms.
Since NVIDIA's CUDA has reduced the time to complete such comprehensive
simulations from several weeks to a few hours, the years to come should offer
a dramatic rise in products that have both increased effectiveness and reduced
environmental impact.
Chapter Review
The computing industry is at the precipice of a parallel computing revolution,
and NVIDIA's CUDA C has thus far been one of the most successful languages
ever designed for parallel computing. Throughout the course of this book, we will
help you learn how to write your own code in CUDA C. We will help you learn the
special extensions to C and the application programming interfaces that NVIDIA
has created in service of GPU computing. You are not expected to know OpenGL
or DirectX, nor are you expected to have any background in computer graphics.
We will not be covering the basics of programming in C, so we do not recommend
this book to people completely new to computer programming. Some
familiarity with parallel programming might help, although we do not expect you to
have done any parallel programming. Any terms or concepts related to parallel
programming that you will need to understand will be explained in the text. In
fact, there may be some occasions when you find that knowledge of traditional
parallel programming will cause you to make assumptions about GPU computing
that prove untrue. So in reality, a moderate amount of experience with C or C++
programming is the only prerequisite to making it through this book.
In the next chapter, we will help you set up your machine for GPU computing,
ensuring that you have both the hardware and the software components
necessary to get started. After that, you'll be ready to get your hands dirty with CUDA C. If
you already have some experience with CUDA C or you're sure that your system
has been properly set up to do development in CUDA C, you can skip to Chapter 3.
We hope that Chapter 1 has gotten you excited to get started learning CUDA C.
Since this book intends to teach you the language through a series of coding
examples, you'll need a functioning development environment. Sure, you could
stand on the sideline and watch, but we think you'll have more fun and stay
interested longer if you jump in and get some practical experience hacking
CUDA C code as soon as possible. In this vein, this chapter will walk you
through some of the hardware and software components you'll need in order to
get started. The good news is that you can obtain all of the software you'll need
for free, leaving you more money for whatever tickles your fancy.
Chapter Objectives
Through the course of this chapter, you will accomplish the following:
• You will download all the software components required through this book.
• You will set up an environment in which you can build code written in CUDA C.
Development Environment
Before embarking on this journey, you will need to set up an environment in which
you can develop using CUDA C. The prerequisites to developing code in CUDA C
are as follows:
• A CUDA-enabled graphics processor
• An NVIDIA device driver
• A CUDA development toolkit
• A standard C compiler
To make this chapter as painless as possible, we'll walk through each of these
prerequisites now.
CUDA-ENABLED GRAPHICS PROCESSORS
Fortunately, it should be easy to find yourself a graphics processor that has
been built on the CUDA Architecture because every NVIDIA GPU since the 2006
release of the GeForce 8800 GTX has been CUDA-enabled. Since NVIDIA regularly
releases new GPUs based on the CUDA Architecture, the following will
undoubtedly be only a partial list of CUDA-enabled GPUs. Nevertheless, the GPUs are all
CUDA-capable.
For a complete list, you should consult the NVIDIA website at
www.nvidia.com/cuda,
although it is safe to assume that all recent GPUs (GPUs from 2007 on) with more
than 256MB of graphics memory can be used to develop and run code written
with CUDA C.
You will learn these details in the next chapter, but since your CUDA C
applications are going to be computing on two different processors, you are consequently
going to need two compilers. One compiler will compile code for your GPU, and
one will compile code for your CPU. NVIDIA provides the compiler for your GPU
code. As with the NVIDIA device driver, you can download the CUDA Toolkit at
http://developer.nvidia.com/object/gpucomputing.html. Click the CUDA Toolkit
link to reach the download page shown in the accompanying figure.
You will again be asked to select your platform from among 32- and 64-bit
versions of Windows XP, Windows Vista, Windows 7, Linux, and Mac OS. From the
available downloads, you need to download the CUDA Toolkit in order to build the
code examples contained in this book. Additionally, you are encouraged, although
not required, to download the GPU Computing SDK code samples, which contain
dozens of helpful example programs. The GPU Computing SDK code samples will
not be covered in this book, but they nicely complement the material we intend
to cover, and as with learning any style of programming, the more examples, the
better. You should also take note that although nearly all the code in this book will
work on the Linux, Windows, and Mac OS platforms, we have targeted the
applications toward Linux and Windows. If you are using Mac OS X, you will be living
dangerously and using unsupported code examples.
WINDOWS
On Microsoft Windows platforms, including Windows XP, Windows Vista, Windows
Server 2008, and Windows 7, we recommend using the Microsoft Visual Studio C
compiler. NVIDIA currently supports both the Visual Studio 2005 and Visual Studio
2008 families of products. As Microsoft releases new versions, NVIDIA will likely
add support for newer editions of Visual Studio while dropping support for older
versions. Many C and C++ developers already have Visual Studio 2005 or Visual
Studio 2008 installed on their machine, so if this applies to you, you can safely
skip this subsection.
If you do not have access to a supported version of Visual Studio and aren't ready
to invest in a copy, Microsoft does provide free downloads of the Visual Studio
2008 Express edition on its website. Although typically unsuitable for commercial
software development, the Visual Studio Express editions are an excellent way to
get started developing CUDA C on Windows platforms without investing money in
software licenses. So, head on over to www.microsoft.com/visualstudio if you're
in need of Visual Studio 2008!
LINUX
Most Linux distributions typically ship with a version of the GNU C compiler
(gcc) installed. As of CUDA 3.0, the following Linux distributions shipped with
supported versions of gcc installed:
• Red Hat Enterprise Linux 4.8
• Red Hat Enterprise Linux 5.3
• OpenSUSE 11.1
• SUSE Linux Enterprise Desktop 11
• Ubuntu 9.04
• Fedora 10
If you're a die-hard Linux user, you're probably aware that many Linux software
packages work on far more than just the "supported" platforms. The CUDA
Toolkit is no exception, so even if your favorite distribution is not listed here, it
may be worth trying it anyway. The distribution's kernel, gcc, and glibc versions
will in a large part determine whether the distribution is compatible.
MACINTOSH OS X
,I\RXZDQWWRGHYHORSRQ0DF26;\RXZLOOQHHGWRHQVXUHWKDW\RXUPDFKLQH
KDVDWOHDVWYHUVLRQRI0DF26;7KLVLQFOXGHVYHUVLRQ0DF26;
Ǥ6QRZ/HRSDUGǥ)XUWKHUPRUH\RXZLOOQHHGWRLQVWDOOgccE\GRZQORDGLQJ
DQGLQVWDOOLQJ$SSOHǢV;FRGH7KLVVRIWZDUHLVSURYLGHGIUHHWR$SSOH'HYHORSHU
&RQQHFWLRQ $'& PHPEHUVDQGFDQEHGRZQORDGHGIURPKWWSGHYHORSHUDSSOH
FRPWRROV;FRGH7KHFRGHLQWKLVERRNZDVGHYHORSHGRQ/LQX[DQG:LQGRZV
SODWIRUPVEXWVKRXOGZRUNZLWKRXWPRGLȌFDWLRQRQ0DF26;V\VWHPV
Chapter Review
If you have followed the steps in this chapter, you are ready to start developing
code in CUDA C. Perhaps you have even played around with some of the NVIDIA
GPU Computing SDK code samples you downloaded from NVIDIA's website. If so,
we applaud your willingness to tinker! If not, don't worry. Everything you need is
right here in this book. Either way, you're probably ready to start writing your first
program in CUDA C, so let's get started.
If you read Chapter 1, we hope we have convinced you of both the immense
computational power of graphics processors and that you are just the
programmer to harness it. And if you continued through Chapter 2, you should
have a functioning environment set up in order to compile and run the code
you'll be writing in CUDA C. If you skipped the first chapters, perhaps you're just
skimming for code samples, perhaps you randomly opened to this page while
browsing at a bookstore, or maybe you're just dying to get started; that's OK, too
(we won't tell). Either way, you're ready to get started with the first code
examples, so let's go.
Chapter Objectives
Through the course of this chapter, you will accomplish the following:
• You will write your first lines of code in CUDA C.
• You will learn the difference between code written for the host and code written
  for a device.
• You will learn how to run device code from the host.
• You will learn about the ways device memory can be used on CUDA-capable
  devices.
• You will learn how to query your system for information on its CUDA-capable
  devices.
A First Program
Since we intend to learn CUDA C by example, let's take a look at our first example
of CUDA C. In accordance with the laws governing written works of computer
programming, we begin by examining a "Hello, World!" example.
HELLO, WORLD!
#include "../common/book.h"
At this point, no doubt you're wondering whether this book is a scam. Is this just
C? Does CUDA C even exist? The answers to these questions are both in the
affirmative; this book is not an elaborate ruse. This simple "Hello, World!" example is
meant to illustrate that, at its most basic, there is no difference between CUDA C
and the standard C to which you have grown accustomed.
The simplicity of this example stems from the fact that it runs entirely on the host.
This will be one of the important distinctions made in this book; we refer to the
CPU and the system's memory as the host and refer to the GPU and its memory
as the device. This example resembles almost all the code you have ever written
because it simply ignores any computing devices outside the host.
To remedy that sinking feeling that you've invested in nothing more than an
expensive collection of trivialities, we will gradually build upon this simple
example. Let's look at something that uses the GPU (a device) to execute code.
A function that executes on the device is typically called a kernel.
#include <iostream>
This program makes two notable additions to the original "Hello, World!"
example:
• An empty function named kernel() qualified with __global__
• A call to the empty function, embellished with <<<1,1>>>
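Only the first line of the listing described here survives in this copy. A minimal sketch of a program containing those two additions, assuming nothing beyond nvcc and the CUDA runtime, might look like the following. The call to cudaDeviceSynchronize() is an addition made for this sketch so that a failed launch becomes visible; it is not part of the description above.

```cuda
#include <cstdio>

// An empty function that nvcc compiles for the device.
__global__ void kernel( void ) {
}

int main( void ) {
    // Launch the empty kernel; the <<<1,1>>> values go to the runtime,
    // not to the kernel itself.
    kernel<<<1,1>>>();

    // Assumption for this sketch: wait for the device so launch errors surface.
    cudaError_t err = cudaDeviceSynchronize();

    printf( "Hello, World!\n" );
    return err == cudaSuccess ? 0 : 1;
}
```

Deleting the kernel() definition and the launch line leaves an ordinary host-only "Hello, World!" program.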
As we saw in the previous section, code is compiled by your system's standard
C compiler by default. For example, GNU gcc might compile your host code
on Linux operating systems, while Microsoft Visual C compiles it on Windows
systems. The NVIDIA tools simply feed this host compiler your code, and
everything behaves as it would in a world without CUDA.
Now we see that CUDA C adds the __global__ qualifier to standard C. This
mechanism alerts the compiler that a function should be compiled to run on
a device instead of the host. In this simple example, nvcc gives the function
kernel() to the compiler that handles device code, and it feeds main() to the
host compiler as it did in the previous example.
So, what is the mysterious call to kernel(), and why must we vandalize our
standard C with angle brackets and a numeric tuple? Brace yourself, because this
is where the magic happens.
We have seen that CUDA C needed a linguistic method for marking a function
as device code. There is nothing special about this; it is shorthand to send host
code to one compiler and device code to another compiler. The trick is actually in
calling the device code from the host code. One of the benefits of CUDA C is that
it provides this language integration so that device function calls look very much
like host function calls. Later we will discuss what actually happens behind the
scenes, but suffice to say that the CUDA compiler and runtime take care of the
messy business of invoking device code from the host.
So, the mysterious-looking call invokes device code, but why the angle brackets
and numbers? The angle brackets denote arguments we plan to pass to the
runtime system. These are not arguments to the device code but are parameters
that will influence how the runtime will launch our device code. We will learn
about these parameters to the runtime in the next chapter. Arguments to the
device code itself get passed within the parentheses, just like any other function
invocation.
#include <iostream>
#include "book.h"
    add<<<1,1>>>( 2, 7, dev_c );
    return 0;
}
You will notice a handful of new lines here, but these changes introduce only two
concepts:
• We can pass parameters to a kernel as we would with any C function.
• We need to allocate memory to do anything useful on a device, such as return
  values to the host.
There is nothing special about passing parameters to a kernel. The angle-bracket
syntax notwithstanding, a kernel call looks and acts exactly like any function call
in standard C. The runtime system takes care of any complexity introduced by the
fact that these parameters need to get from the host to the device.
The more interesting addition is the allocation of memory using cudaMalloc().
This call behaves very similarly to the standard C call malloc(), but it tells
the CUDA runtime to allocate the memory on the device. The first argument
is a pointer to the pointer you want to hold the address of the newly allocated
memory, and the second parameter is the size of the allocation you want to make.
Apart from handing back the allocated pointer through its first argument rather
than as the function's return value (cudaMalloc() itself returns an error code),
this is identical behavior to malloc(), right down to the untyped void* pointer. The
HANDLE_ERROR() that surrounds these calls is a utility macro that we have
provided as part of this book's support code. It simply detects that the call has
returned an error, prints the associated error message, and exits the application
with an EXIT_FAILURE code. Although you are free to use this code in your own
applications, it is highly likely that this error-handling code will be insufficient in
production code.
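The book's support code is not reproduced in this section, so the following is only a sketch of a macro with the behavior just described: detect an error, print its message, and exit with EXIT_FAILURE. The names checkCuda and CHECK_CUDA are invented for this sketch and are not the book's own identifiers; cudaGetErrorString() is the runtime call that turns an error code into a message.

```cuda
#include <cstdio>
#include <cstdlib>

// Hypothetical helper for this sketch; the book's HANDLE_ERROR() lives in
// its own support header.
static void checkCuda( cudaError_t err, const char *file, int line ) {
    if (err != cudaSuccess) {
        fprintf( stderr, "%s in %s at line %d\n",
                 cudaGetErrorString( err ), file, line );
        exit( EXIT_FAILURE );
    }
}
#define CHECK_CUDA( call ) (checkCuda( (call), __FILE__, __LINE__ ))

int main( void ) {
    int *dev_c = NULL;
    // Every runtime call that can fail is wrapped in the macro.
    CHECK_CUDA( cudaMalloc( (void**)&dev_c, sizeof(int) ) );
    CHECK_CUDA( cudaFree( dev_c ) );
    return 0;
}
```

Exiting on the first error is convenient for examples; production code would more likely release resources and report the failure to its caller.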
This raises a subtle but important point. Much of the simplicity and power of
CUDA C derives from the ability to blur the line between host and device code.
However, it is the responsibility of the programmer not to dereference the pointer
returned by cudaMalloc() from code that executes on the host. Host code may
pass this pointer around, perform arithmetic on it, or even cast it to a different
type. But you cannot use it to read or write from memory.
Unfortunately, the compiler cannot protect you from this mistake, either. It will
be perfectly happy to allow dereferences of device pointers in your host code
because it looks like any other pointer in the application. We can summarize the
restrictions on the usage of device pointers as follows:
  You can pass pointers allocated with cudaMalloc() to functions that
  execute on the device.
  You can use pointers allocated with cudaMalloc() to read or write
  memory from code that executes on the device.
  You can pass pointers allocated with cudaMalloc() to functions that
  execute on the host.
  You cannot use pointers allocated with cudaMalloc() to read or write
  memory from code that executes on the host.
If you've been reading carefully, you might have anticipated the next lesson: We
can't use standard C's free() function to release memory we've allocated with
cudaMalloc(). To free memory we've allocated with cudaMalloc(), we need
to use a call to cudaFree(), which behaves exactly like free() does.
We've seen how to use the host to allocate and free memory on the device, but
we've also made it painfully clear that you cannot modify this memory from the
host. The remaining two lines of the sample program illustrate two of the most
common methods for accessing device memory: by using device pointers from
within device code and by using calls to cudaMemcpy().
We use pointers from within device code exactly the same way we use them in
standard C that runs on the host code. The statement *c = a + b is as simple
as it looks. It adds the parameters a and b together and stores the result in the
memory pointed to by c. We hope this is almost too easy to even be interesting.
We listed the ways in which we can and cannot use device pointers from within
device and host code. These caveats translate exactly as one might imagine
when considering host pointers. Although we are free to pass host pointers
around in device code, we run into trouble when we attempt to use a host pointer
to access memory from within device code. To summarize, host pointers can
access memory from host code, and device pointers can access memory from
device code.
As promised, we can also access memory on a device through calls to
cudaMemcpy() from host code. These calls behave exactly like standard C
memcpy() with an additional parameter to specify which of the source and
destination pointers point to device memory. In the example, notice that the last
parameter to cudaMemcpy() is cudaMemcpyDeviceToHost, instructing the
runtime that the source pointer is a device pointer and the destination pointer is a
host pointer.
Unsurprisingly, cudaMemcpyHostToDevice would indicate the opposite
situation, where the source data is on the host and the destination is an address on
the device. Finally, we can even specify that both pointers are on the device by
passing cudaMemcpyDeviceToDevice. If the source and destination pointers
are both on the host, we would simply use standard C's memcpy() routine to copy
between them.
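Because only fragments of the parameter-passing listing survive above, here is a hedged sketch that puts the pieces from this discussion together: cudaMalloc(), a kernel that writes through a device pointer, cudaMemcpy() with cudaMemcpyDeviceToHost, and cudaFree(). The wrapper addOnDevice() and its convention of returning -1 on failure are inventions of this sketch, and it checks return codes directly instead of using HANDLE_ERROR().

```cuda
#include <cstdio>

__global__ void add( int a, int b, int *c ) {
    *c = a + b;   // device code may dereference a device pointer
}

// Hypothetical wrapper: returns a + b computed on the device, or -1 if any
// CUDA call fails (so a genuine sum of -1 is indistinguishable from failure).
int addOnDevice( int a, int b ) {
    int c = -1;
    int *dev_c = NULL;
    if (cudaMalloc( (void**)&dev_c, sizeof(int) ) != cudaSuccess)
        return -1;

    add<<<1,1>>>( a, b, dev_c );

    // Source is a device pointer, destination is a host pointer.
    cudaError_t err = cudaMemcpy( &c, dev_c, sizeof(int),
                                  cudaMemcpyDeviceToHost );
    cudaFree( dev_c );
    return err == cudaSuccess ? c : -1;
}

int main( void ) {
    printf( "2 + 7 = %d\n", addOnDevice( 2, 7 ) );
    return 0;
}
```

Note that the host never dereferences dev_c; it only passes the pointer to the kernel, to cudaMemcpy(), and to cudaFree().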
Querying Devices
Since we would like to be allocating memory and executing code on our device,
it would be useful if our program had a way of knowing how much memory and
what types of capabilities the device had. Furthermore, it is relatively common for
people to have more than one CUDA-capable device per computer. In situations
like this, we will definitely want a way to determine which processor is which.
For example, many motherboards ship with integrated NVIDIA graphics
processors. When a manufacturer or user adds a discrete graphics processor to this
computer, it then possesses two CUDA-capable processors. Some NVIDIA
products, like the GeForce GTX 295, ship with two GPUs on a single card. Computers
that contain products such as this will also show two CUDA-capable processors.
Before we get too deep into writing device code, we would love to have a
mechanism for determining which devices (if any) are present and what
capabilities each device supports. Fortunately, there is a very easy interface to
determine this information. First, we will want to know how many devices in the
system were built on the CUDA Architecture. These devices will be capable of
executing kernels written in CUDA C. To get the count of CUDA devices, we call
cudaGetDeviceCount(). Needless to say, we anticipate receiving an award
for Most Creative Function Name.
int count;
HANDLE_ERROR( cudaGetDeviceCount( &count ) );
After calling cudaGetDeviceCount(), we can then iterate through the devices
and query relevant information about each. The CUDA runtime returns us these
properties in a structure of type cudaDeviceProp. What kind of properties
can we retrieve? As of CUDA 3.0, the cudaDeviceProp structure contains the
following:
struct cudaDeviceProp {
    char name[256];
    size_t totalGlobalMem;
    size_t sharedMemPerBlock;
    int regsPerBlock;
    int warpSize;
    size_t memPitch;
    int maxThreadsPerBlock;
    int maxThreadsDim[3];
    int maxGridSize[3];
    size_t totalConstMem;
    int major;
    int minor;
    int clockRate;
    size_t textureAlignment;
    int deviceOverlap;
    int multiProcessorCount;
    int kernelExecTimeoutEnabled;
    int integrated;
    int canMapHostMemory;
    int computeMode;
    int maxTexture1D;
    int maxTexture2D[2];
    int maxTexture3D[3];
    int maxTexture2DArray[3];
    int concurrentKernels;
};
Some of these are self-explanatory; others bear some additional description (see
Table 3.1).
We'd like to avoid going too far, too fast down our rabbit hole, so we will not
go into extensive detail about these properties now. In fact, the previous list is
missing some important details about some of these properties, so you will want
to consult the NVIDIA CUDA Programming Guide for more information. When you
move on to write your own applications, these properties will prove extremely
useful. However, for now we will simply show how to query each device and report
the properties of each. So far, our device query looks something like this:
#include "../common/book.h"

int main( void ) {
    cudaDeviceProp  prop;

    int count;
    HANDLE_ERROR( cudaGetDeviceCount( &count ) );
    for (int i=0; i< count; i++) {
        HANDLE_ERROR( cudaGetDeviceProperties( &prop, i ) );
        // Do something with our device's properties
    }
}
Now that we know each of the fields available to us, we can expand on the
ambiguous "Do something..." section and implement something marginally less
trivial:
#include "../common/book.h"

int main( void ) {
    cudaDeviceProp  prop;

    int count;
    HANDLE_ERROR( cudaGetDeviceCount( &count ) );
    for (int i=0; i< count; i++) {
        HANDLE_ERROR( cudaGetDeviceProperties( &prop, i ) );
        printf( " --- General Information for device %d ---\n", i );
        printf( "Name: %s\n", prop.name );
        printf( "Compute capability: %d.%d\n", prop.major, prop.minor );
        printf( "Clock rate: %d\n", prop.clockRate );
        printf( "Device copy overlap: " );
        if (prop.deviceOverlap)
            printf( "Enabled\n" );
        else
            printf( "Disabled\n" );
        printf( "Kernel execution timeout : " );
        if (prop.kernelExecTimeoutEnabled)
            printf( "Enabled\n" );
        else
            printf( "Disabled\n" );
    }
}
Using Device Properties
Other than writing an application that handily prints every detail of every
CUDA-capable card, why might we be interested in the properties of each device in our
system? Since we as software developers want everyone to think our software is
fast, we might be interested in choosing the GPU with the most multiprocessors
on which to run our code. Or if the kernel needs close interaction with the CPU,
we might be interested in running our code on the integrated GPU that shares
system memory with the CPU. These are both properties we can query with
cudaGetDeviceProperties().
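As an illustration of the first of these choices, the sketch below walks every device and keeps the one reporting the largest multiProcessorCount. The function name mostMultiprocessors() is invented for this sketch, and the selection rule is only one plausible reading of "fastest"; a real application might also weigh clock rate or memory size.

```cuda
#include <cstdio>

// Hypothetical helper: returns the ID of the device with the most
// multiprocessors, or -1 if no device could be queried.
int mostMultiprocessors( void ) {
    int count = 0;
    if (cudaGetDeviceCount( &count ) != cudaSuccess)
        return -1;

    int best = -1;
    int bestCount = 0;
    for (int i = 0; i < count; i++) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties( &prop, i ) != cudaSuccess)
            continue;
        if (prop.multiProcessorCount > bestCount) {
            bestCount = prop.multiProcessorCount;
            best = i;
        }
    }
    return best;
}

int main( void ) {
    int dev = mostMultiprocessors();
    if (dev < 0) {
        printf( "No CUDA-capable device found\n" );
        return 1;
    }
    // Make the chosen device current for subsequent runtime calls.
    cudaSetDevice( dev );
    printf( "Using device %d\n", dev );
    return 0;
}
```

Testing prop.integrated in the same loop would select the integrated GPU instead.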
Suppose that we are writing an application that depends on having
double-precision floating-point support. After a quick consultation with Appendix A of the
NVIDIA CUDA Programming Guide, we know that cards that have compute
capability 1.3 or higher support double-precision floating-point math. So to
successfully run the double-precision application that we've written, we need to find at
least one device of compute capability 1.3 or higher.
Based on what we have seen with cudaGetDeviceCount() and
cudaGetDeviceProperties(), we could iterate through each device and look
for one that either has a major version greater than 1 or has a major version of
1 and minor version greater than or equal to 3. But since this relatively common
procedure is also relatively annoying to perform, the CUDA runtime offers us an
automated way to do this. We first fill a cudaDeviceProp structure with the
properties we need our device to have.
cudaDeviceProp prop;
memset( &prop, 0, sizeof( cudaDeviceProp ) );
prop.major = 1;
prop.minor = 3;
After filling a cudaDeviceProp structure, we pass it to
cudaChooseDevice() to have the CUDA runtime find a device that satisfies
this constraint. The call to cudaChooseDevice() returns a device ID that we
can then pass to cudaSetDevice(). From this point forward, all device
operations will take place on the device we found in cudaChooseDevice().
#include "../common/book.h"
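The rest of this listing is missing from this copy. The following sketch shows one way the sequence just described could be written with plain runtime calls; the wrapper chooseByCapability() is a name invented here. One caveat worth stating as an assumption: cudaChooseDevice() is documented as returning the device that best matches the requested properties, so the sketch re-checks the chosen device's capability rather than trusting the match.

```cuda
#include <cstdio>
#include <cstring>

// Hypothetical helper: asks the runtime for the device closest to the given
// compute capability and makes it current. Returns the device ID, or -1 if
// the runtime reports an error.
int chooseByCapability( int major, int minor ) {
    cudaDeviceProp prop;
    int dev = -1;

    memset( &prop, 0, sizeof( cudaDeviceProp ) );
    prop.major = major;
    prop.minor = minor;

    if (cudaChooseDevice( &dev, &prop ) != cudaSuccess)
        return -1;
    if (cudaSetDevice( dev ) != cudaSuccess)
        return -1;
    return dev;
}

int main( void ) {
    int dev = chooseByCapability( 1, 3 );
    if (dev < 0) {
        printf( "No usable CUDA device found\n" );
        return 1;
    }

    // Verify what we actually got.
    cudaDeviceProp got;
    if (cudaGetDeviceProperties( &got, dev ) != cudaSuccess)
        return 1;
    printf( "Device %d has compute capability %d.%d\n",
            dev, got.major, got.minor );
    return 0;
}
```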
Systems with multiple GPUs are becoming more and more common. For
example, many of NVIDIA's motherboard chipsets contain integrated,
CUDA-capable GPUs. When a discrete GPU is added to one of these systems, you
suddenly have a multi-GPU platform. Moreover, NVIDIA's SLI technology allows
multiple discrete GPUs to be installed side by side. In either of these cases, your
application may have a preference of one GPU over another. If your application
depends on certain features of the GPU or depends on having the fastest GPU
in the system, you should familiarize yourself with this API because there is no
guarantee that the CUDA runtime will choose the best or most appropriate GPU
for your application.
Chapter Review
We've finally gotten our hands dirty writing CUDA C, and ideally it has been less
painful than you might have suspected. Fundamentally, CUDA C is standard C
with some ornamentation to allow us to specify which code should run on the
device and which should run on the host. By adding the keyword __global__
before a function, we indicated to the compiler that we intend to run the function
on the GPU. To use the GPU's dedicated memory, we also learned a CUDA API
similar to C's malloc(), memcpy(), and free() APIs. The CUDA versions of
these functions, cudaMalloc(), cudaMemcpy(), and cudaFree(), allow us
to allocate device memory, copy data between the device and host, and free the
device memory when we've finished with it.
As we progress through this book, we will see more interesting examples of
how we can effectively use the device as a massively parallel coprocessor. For
now, you should know how easy it is to get started with CUDA C, and in the next
chapter we will see how easy it is to execute parallel code on the GPU.
In the previous chapter, we saw how simple it can be to write code that executes
on the GPU. We have even gone so far as to learn how to add two numbers
together, albeit just the numbers 2 and 7. Admittedly, that example was not
immensely impressive, nor was it incredibly interesting. But we hope you are
convinced that it is easy to get started with CUDA C and you're excited to learn
more. Much of the promise of GPU computing lies in exploiting the massively
parallel structure of many problems. In this vein, we intend to spend this chapter
examining how to execute parallel code on the GPU using CUDA C.
Chapter Objectives
Through the course of this chapter, you will accomplish the following:
• You will learn one of the fundamental ways CUDA exposes its parallelism.
• You will write your first parallel code with CUDA C.
CUDA Parallel Programming
Previously, we saw how easy it was to get a standard C function to start running
on a device. By adding the __global__ qualifier to the function and by calling
it using a special angle bracket syntax, we executed the function on our GPU.
Although this was extremely simple, it was also extremely inefficient because
NVIDIA's hardware engineering minions have optimized their graphics processors
to perform hundreds of computations in parallel. However, thus far we have only
ever launched a kernel that runs serially on the GPU. In this chapter, we see how
straightforward it is to launch a device kernel that performs its computations in
parallel.
#include "../common/book.h"
#define N 10
add( a, b, c );
return 0;
}
Most of this example bears almost no explanation, but we will briefly look at the
add() function to explain why we overly complicated it.
We compute the sum within a while loop where the index tid ranges from 0 to
N-1. We add corresponding elements of a[] and b[], placing the result in the
corresponding element of c[]. One would typically code this in a slightly simpler
manner, like so:
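Neither form of the loop survives intact in this copy, so both are sketched here as host-only code. The while-loop version follows the description above; the for-loop version, named add_for() for this sketch, is the conventional alternative. The input values filled in by main() are arbitrary choices made for illustration.

```cuda
#include <cstdio>

#define N   10

// The while-loop form: tid walks from 0 to N-1 in steps of one.
void add( int *a, int *b, int *c ) {
    int tid = 0;
    while (tid < N) {
        c[tid] = a[tid] + b[tid];
        tid += 1;
    }
}

// The more conventional for-loop form (name invented for this sketch).
void add_for( int *a, int *b, int *c ) {
    for (int i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }
}

int main( void ) {
    int a[N], b[N], c[N];

    // Fill the inputs on the CPU with arbitrary values.
    for (int i = 0; i < N; i++) {
        a[i] = -i;
        b[i] = i * i;
    }

    add( a, b, c );

    for (int i = 0; i < N; i++) {
        printf( "%d + %d = %d\n", a[i], b[i], c[i] );
    }
    return 0;
}
```

The two functions produce identical results; the while-loop form simply makes the starting index and the stride explicit, which is what the parallel versions vary.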
Our slightly more convoluted method was intended to suggest a potential way to
parallelize the code on a system with multiple CPUs or CPU cores. For example,
with a dual-core processor, one could change the increment to 2 and have one
core initialize the loop with tid = 0 and another with tid = 1. The first core
would add the even-indexed elements, and the second core would add the
odd-indexed elements. This amounts to executing the following code on each of the
two CPU cores:
void add( int *a, int *b, int *c )      void add( int *a, int *b, int *c )
{                                       {
    int tid = 0;                            int tid = 1;
    while (tid < N) {                       while (tid < N) {
        c[tid] = a[tid] + b[tid];               c[tid] = a[tid] + b[tid];
        tid += 2;                               tid += 2;
    }                                       }
}                                       }
Of course, doing this on a CPU would require considerably more code than we
have included in this example. You would need to provide a reasonable amount of
infrastructure to create the worker threads that execute the function add() as
well as make the assumption that each thread would execute in parallel, a
scheduling assumption that is unfortunately not always true.
#include "../common/book.h"
#define N 10
    // copy the array 'c' back from the GPU to the CPU
    HANDLE_ERROR( cudaMemcpy( c, dev_c, N * sizeof(int),
                              cudaMemcpyDeviceToHost ) );
    return 0;
}
You will notice some common patterns that we employ again:
• We allocate three arrays on the device using calls to cudaMalloc(): two
  arrays, dev_a and dev_b, to hold inputs, and one array, dev_c, to hold the
  result.
• Because we are environmentally conscientious coders, we clean up after
  ourselves with cudaFree().
• Using cudaMemcpy(), we copy the input data to the device with the parameter
  cudaMemcpyHostToDevice and copy the result data back to the host with
  cudaMemcpyDeviceToHost.
• We execute the device code in add() from the host code in main() using the
  triple angle bracket syntax.
As an aside, you may be wondering why we fill the input arrays on the CPU. There
is no reason in particular why we need to do this. In fact, the performance of this
step would be faster if we filled the arrays on the GPU. But we intend to show how
a particular operation, namely, the addition of two vectors, can be implemented
on a graphics processor. As a result, we ask you to imagine that this is but one
step of a larger application where the input arrays a[] and b[] have been
generated by some other algorithm or loaded from the hard drive by the user. In
summary, it will suffice to pretend that this data appeared out of nowhere and
now we need to do something with it.
Moving on, our add() routine looks similar to its corresponding CPU
implementation:
Again we see a common pattern with the function add():
• We have written a function called add() that executes on the device. We
  accomplished this by taking C code and adding a __global__ qualifier to
  the function name.
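Since the GPU listing is fragmentary here, the following sketch assembles the pattern from the bullets above into one program: three device allocations, two host-to-device copies, a launch of N blocks, one device-to-host copy, and cleanup. The wrapper vectorAdd() and its boolean return are inventions of this sketch, which checks return codes directly rather than through HANDLE_ERROR(). The kernel indexes by blockIdx.x, the built-in block index that each parallel copy of the kernel receives.

```cuda
#include <cstdio>

#define N   10

__global__ void add( int *a, int *b, int *c ) {
    int tid = blockIdx.x;        // each block handles the data at its own index
    if (tid < N)
        c[tid] = a[tid] + b[tid];
}

// Hypothetical wrapper: sums a[] and b[] into c[] on the device.
// Returns true only if every CUDA call succeeded.
bool vectorAdd( const int *a, const int *b, int *c ) {
    int *dev_a = NULL, *dev_b = NULL, *dev_c = NULL;
    size_t bytes = N * sizeof(int);

    bool ok = cudaMalloc( (void**)&dev_a, bytes ) == cudaSuccess
           && cudaMalloc( (void**)&dev_b, bytes ) == cudaSuccess
           && cudaMalloc( (void**)&dev_c, bytes ) == cudaSuccess
           && cudaMemcpy( dev_a, a, bytes,
                          cudaMemcpyHostToDevice ) == cudaSuccess
           && cudaMemcpy( dev_b, b, bytes,
                          cudaMemcpyHostToDevice ) == cudaSuccess;
    if (ok) {
        add<<<N,1>>>( dev_a, dev_b, dev_c );
        ok = cudaMemcpy( c, dev_c, bytes,
                         cudaMemcpyDeviceToHost ) == cudaSuccess;
    }

    // Freeing a NULL pointer is a no-op, so this is safe after a failure.
    cudaFree( dev_a );
    cudaFree( dev_b );
    cudaFree( dev_c );
    return ok;
}

int main( void ) {
    int a[N], b[N], c[N];
    for (int i = 0; i < N; i++) {   // arbitrary input values
        a[i] = -i;
        b[i] = i * i;
    }
    if (!vectorAdd( a, b, c )) {
        printf( "CUDA error\n" );
        return 1;
    }
    for (int i = 0; i < N; i++)
        printf( "%d + %d = %d\n", a[i], b[i], c[i] );
    return 0;
}
```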
So far, there is nothing new in this example except it can do more than add 2 and
7. However, there are two noteworthy components of this example: The
parameters within the triple angle brackets and the code contained in the kernel itself
both introduce new concepts.
Up to this point, we have always seen kernels launched in the following form:
kernel<<<1,1>>>( param1, param2, … );
But in this example we are launching with a number in the angle brackets that is
not 1:
add<<<N,1>>>( dev_a, dev_b, dev_c );
What gives?
Recall that we left those two numbers in the angle brackets unexplained; we
stated vaguely that they were parameters to the runtime that describe how to
launch the kernel. Well, the first number in those parameters represents the
number of parallel blocks in which we would like the device to execute our kernel.
In this case, we're passing the value N for this parameter.
For example, if we launch with kernel<<<2,1>>>(), you can think of the
runtime creating two copies of the kernel and running them in parallel. We call
each of these parallel invocations a block. With kernel<<<256,1>>>(), you
would get 256 blocks running on the GPU. Parallel programming has never been
easier.
But this raises an excellent question: The GPU runs N copies of our kernel code,
but how can we tell from within the code which block is currently running? This
question brings us to the second new feature of the example, the kernel code
itself. Specifically, it brings us to the variable blockIdx.x.
At first glance, it looks like this variable should cause a syntax error at compile
time since we use it to assign the value of tid, but we have never defined it.
However, there is no need to define the variable blockIdx; this is one of the
built-in variables that the CUDA runtime defines for us. Furthermore, we use this
variable for exactly what it sounds like it means. It contains the value of the block
index for whichever block is currently running the device code.
Why, you may then ask, is it not just blockIdx? Why blockIdx.x? As it turns
out, CUDA C allows you to define a group of blocks in two dimensions. For
problems with two-dimensional domains, such as matrix math or image processing,
it is often convenient to use two-dimensional indexing to avoid annoying
translations from linear to rectangular indices. Don't worry if you aren't familiar with
these problem types; just know that using two-dimensional indexing can
sometimes be more convenient than one-dimensional indexing. But you never have to
use it. We won't be offended.
When we launched the kernel, we specified N as the number of parallel blocks.
We call the collection of parallel blocks a grid. This specifies to the runtime
system that we want a one-dimensional grid of N blocks (scalar values are
interpreted as one-dimensional). These blocks will have varying values for
blockIdx.x, the first taking value 0 and the last taking value N-1. So, imagine
four blocks, all running through the same copy of the device code but having
different values for the variable blockIdx.x. This is what the actual code being
executed in each of the four parallel blocks looks like after the runtime
substitutes the appropriate block index for blockIdx.x:
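[Figure: the add() kernel as executed by BLOCK 1, BLOCK 2, BLOCK 3, and BLOCK 4, each with its own block index in place of blockIdx.x]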
If you recall the CPU-based example with which we began, you will recall that we
needed to walk through indices from 0 to N-1 in order to sum the two vectors.
Since the runtime system is already launching a kernel where each block will
have one of these indices, nearly all of this work has already been done for us.
Because we're something of a lazy lot, this is a good thing. It affords us more time
to blog, probably about how lazy we are.
The last remaining question to be answered is, why do we check whether tid
is less than N? It should always be less than N, since we've specifically launched
our kernel such that this assumption holds. But our desire to be lazy also makes
us paranoid about someone breaking an assumption we've made in our code.
Breaking code assumptions means broken code. This means bug reports, late
nights tracking down bad behavior, and generally lots of activities that stand
between us and our blog. If we didn't check that tid is less than N and
subsequently fetched memory that wasn't ours, this would be bad. In fact, it could
possibly kill the execution of your kernel, since GPUs have sophisticated memory
management units that kill processes that seem to be violating memory rules.
If you encounter problems like the ones just mentioned, one of the
HANDLE_ERROR() macros that we've sprinkled so liberally throughout the code will
detect and alert you to the situation. As with traditional C programming, the
lesson here is that functions return error codes for a reason. Although it is
always tempting to ignore these error codes, we would love to save you the hours
of pain through which we have suffered by urging that you check the results of
every operation that can fail. As is often the case, the presence of these errors
will not prevent you from continuing the execution of your application, but they
will most certainly cause all manner of unpredictable and unsavory side effects
downstream.
At this point, you're running code in parallel on the GPU. Perhaps you had heard
this was tricky or that you had to understand computer graphics to do
general-purpose programming on a graphics processor. We hope you are starting to see
how CUDA C makes it much easier to get started writing parallel code on a GPU.
We used the example only to sum vectors of length 10. If you would like to see
how easy it is to generate a massively parallel application, try changing the 10 in
the line #define N 10 to 10000 or 50000 to launch tens of thousands of parallel
blocks. Be warned, though: No dimension of your launch of blocks may exceed
65,535. This is simply a hardware-imposed limit, so you will start to see failures if
you attempt launches with more blocks than this. In the next chapter, we will see
how to work within this limitation.
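Rather than hard-coding this limit, a program can read it from the maxGridSize field of the cudaDeviceProp structure shown in the previous chapter. The sketch below does so for device 0; the helper name maxBlocksX() and the problem size of 50000 are assumptions made for illustration, and hardware newer than what the text describes may report a different value than the one quoted above.

```cuda
#include <cstdio>

// Hypothetical helper: returns the largest x-dimension of a grid that
// device 0 reports, or 0 if the device cannot be queried.
int maxBlocksX( void ) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties( &prop, 0 ) != cudaSuccess)
        return 0;
    return prop.maxGridSize[0];
}

int main( void ) {
    int limit = maxBlocksX();
    int n = 50000;               // hypothetical number of blocks to launch

    if (limit == 0)
        printf( "Could not query device 0\n" );
    else if (n > limit)
        printf( "%d blocks exceeds the limit of %d\n", n, limit );
    else
        printf( "%d blocks fits within the limit of %d\n", n, limit );
    return 0;
}
```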
A FUN EXAMPLE
We don't mean to imply that adding vectors is anything less than fun, but the
following example will satisfy those looking for some flashy examples of parallel
CUDA C.
The following example will demonstrate code to draw slices of the Julia Set. For
the uninitiated, the Julia Set is the boundary of a certain class of functions over
complex numbers. Undoubtedly, this sounds even less fun than vector
addition and matrix multiplication. However, for almost all values of the function's
parameters, this boundary forms a fractal, one of the most interesting and
beautiful curiosities of mathematics.
The calculations involved in generating such a set are quite simple. At its heart,
the Julia Set evaluates a simple iterative equation for points in the complex plane.
A point is not in the set if the process of iterating the equation diverges for that
point. That is, if the sequence of values produced by iterating the equation grows
toward infinity, a point is considered outside the set. Conversely, if the values
taken by the equation remain bounded, the point is in the set.
Computationally, the iterative equation in question is remarkably simple, as
shown in Equation 4.1.
Equation 4.1
$Z_{n+1} = Z_n^2 + C$
Computing an iteration of Equation 4.1 would therefore involve squaring the
current value and adding a constant to get the next value of the equation.
int main( void ) {
    CPUBitmap bitmap( DIM, DIM );
    unsigned char *ptr = bitmap.get_ptr();
    kernel( ptr );
    bitmap.display_and_exit();
}
Our main routine is remarkably simple. It creates the appropriate size bitmap
image using a utility library provided. Next, it passes a pointer to the bitmap data
to the kernel function.
The computation kernel does nothing more than iterate through all points we
care to render, calling julia() on each to determine membership in the Julia
Set. The function julia() will return 1 if the point is in the set and 0 if it is not
in the set. We set the point's color to be red if julia() returns 1 and black if it
returns 0. These colors are arbitrary, and you should feel free to choose a color
scheme that matches your personal aesthetics.
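int julia( int x, int y ) {
    const float scale = 1.5;
    float jx = scale * (float)(DIM/2 - x)/(DIM/2);
    float jy = scale * (float)(DIM/2 - y)/(DIM/2);

    cuComplex c(-0.8, 0.156);
    cuComplex a(jx, jy);

    int i = 0;
    for (i=0; i<200; i++) {
        a = a * a + c;
        if (a.magnitude2() > 1000)
            return 0;
    }
    return 1;
}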
This function is the meat of the example. We begin by translating our pixel
coordinate to a coordinate in complex space. To center the complex plane at the
image center, we shift by DIM/2. Then, to ensure that the image spans the range
of -1.0 to 1.0, we scale the image coordinate by DIM/2. Thus, given an image
point at (x,y), we get a point in complex space at ( (DIM/2 - x)/(DIM/2),
(DIM/2 - y)/(DIM/2) ).
Then, to potentially zoom in or out, we introduce a scale factor. Currently, the scale
is hard-coded to be 1.5, but you should tweak this parameter to zoom in or out. If you
are feeling really ambitious, you could make this a command-line parameter.
After obtaining the point in complex space, we then need to determine whether
the point is in or out of the Julia Set. If you recall the previous section, we do this
by computing the values of the iterative equation $Z_{n+1} = Z_n^2 + C$. Since C is some
arbitrary complex-valued constant, we have chosen -0.8 + 0.156i because it
happens to yield an interesting picture. You should play with this constant if you
want to see other versions of the Julia Set.
In the example, we compute 200 iterations of this function. After each iteration,
we check whether the magnitude of the result exceeds some threshold (1,000 for
our purposes). If so, the equation is diverging, and we can return 0 to indicate that
the point is not in the set. On the other hand, if we finish all 200 iterations and the
magnitude is still bounded under 1,000, we assume that the point is in the set,
and we return 1 to the caller, kernel().
Since all the computations are being performed on complex numbers, we define
a generic structure to store complex numbers.
struct cuComplex {
    float r;
    float i;
    cuComplex( float a, float b ) : r(a), i(b) {}
    float magnitude2( void ) { return r * r + i * i; }
    cuComplex operator*(const cuComplex& a) {
        return cuComplex(r*a.r - i*a.i, i*a.r + r*a.i);
    }
    cuComplex operator+(const cuComplex& a) {
        return cuComplex(r+a.r, i+a.i);
    }
};
The class represents complex numbers with two data elements: a
single-precision real component r and a single-precision imaginary component i.
The class defines addition and multiplication operators that combine complex
numbers as expected. (If you are completely unfamiliar with complex numbers,
you can get a quick primer online.) Finally, we define a method, magnitude2(),
that returns the squared magnitude of the complex number.
dim3 grid(DIM,DIM);
kernel<<<grid,1>>>( dev_bitmap );
cudaFree( dev_bitmap );
}
This version of main() looks much more complicated than the CPU version, but
the flow is actually identical. Like with the CPU version, we create a DIM x DIM
bitmap image using our utility library. But because we will be doing
computation on a GPU, we also declare a pointer called dev_bitmap to hold a copy
of the data on the device. And to hold data, we need to allocate memory using
cudaMalloc().
We then run our kernel() function exactly like in the CPU version, although
now it is a __global__ function, meaning it will run on the GPU. As with the
CPU example, we pass kernel() the pointer we allocated in the previous line to
store the results. The only difference is that the memory resides on the GPU now,
not on the host system.
The most significant difference is that we specify how many parallel blocks on
which to execute the function kernel(). Because each point can be computed
independently of every other point, we simply specify one copy of the function for
each point we want to compute. We mentioned that for some problem domains,
it helps to use two-dimensional indexing. Unsurprisingly, computing function
values over a two-dimensional domain such as the complex plane is one of these
problems. So, we specify a two-dimensional grid of blocks in this line:
dim3 grid(DIM,DIM);
The type dim3 is not a standard C type, lest you feared you had forgotten some
key pieces of information. Rather, the CUDA runtime header files define some
convenience types to encapsulate multidimensional tuples. The type dim3
represents a three-dimensional tuple that will be used to specify the size of our launch.
But why do we use a three-dimensional value when we oh-so-clearly stated that
our launch is a two-dimensional grid?
Frankly, we do this because a three-dimensional, dim3 value is what the CUDA
runtime expects. Although a three-dimensional launch grid is not currently
supported, the CUDA runtime still expects a dim3 variable where the last
component equals 1. When we initialize it with only two values, as we do in the
statement dim3 grid(DIM,DIM), the CUDA runtime automatically fills the third
dimension with the value 1, so everything here will work as expected. Although
it's possible that NVIDIA will support a three-dimensional grid in the future, for
now we'll just play nicely with the kernel launch API because when coders and
APIs fight, the API always wins.
We then pass our dim3 variable grid to the CUDA runtime in this line:
kernel<<<grid,1>>>( dev_bitmap );
Finally, a consequence of the results residing on the device is that after executing
kernel(), we have to copy the results back to the host. As we learned in
previous chapters, we accomplish this with a call to cudaMemcpy(), specifying
the direction cudaMemcpyDeviceToHost as the last argument.
One of the last wrinkles in the difference of implementation comes in the
implementation of kernel().
First, we need kernel() to be declared as a __global__ function so it runs
on the device but can be called from the host. Unlike the CPU version, we no
longer need nested for() loops to generate the pixel indices that get passed
to julia(). As with the vector addition example, the CUDA runtime generates
these indices for us in the variable blockIdx. This works because we declared
our grid of blocks to have the same dimensions as our image, so we get one block
for each pair of integers (x,y) between (0,0) and (DIM-1, DIM-1).
Next, the only additional information we need is a linear offset into our output
buffer, ptr. This gets computed using another built-in variable, gridDim. This
variable is a constant across all blocks and simply holds the dimensions of the
grid that was launched. In this example, it will always be the value (DIM, DIM).
So, multiplying the row index by the grid width and adding the column index will
give us a unique index into ptr that ranges from 0 to (DIM*DIM-1).
int offset = x + y * gridDim.x;
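To see that this offset really does visit every slot of the buffer exactly once, here is a small self-contained sketch that is independent of the Julia Set code. It launches a two-dimensional grid in which each block stores its own offset, then checks on the host that slot i holds the value i. The names writeOffsets() and offsetsCoverBuffer(), and the small DIM of 16, are choices made for this sketch only.

```cuda
#include <cstdio>

#define DIM 16    // small grid, chosen for this sketch

// Each block computes its linear offset and records it in the buffer.
__global__ void writeOffsets( int *ptr ) {
    int x = blockIdx.x;
    int y = blockIdx.y;
    int offset = x + y * gridDim.x;
    ptr[offset] = offset;
}

// Hypothetical check: returns true if every one of the DIM*DIM slots ends up
// holding its own index.
bool offsetsCoverBuffer( void ) {
    int host[DIM * DIM];
    int *dev = NULL;
    size_t bytes = sizeof( host );

    if (cudaMalloc( (void**)&dev, bytes ) != cudaSuccess)
        return false;

    // Fill every byte with 0xFF so an unwritten slot reads back as -1.
    cudaMemset( dev, 0xFF, bytes );

    dim3 grid( DIM, DIM );
    writeOffsets<<<grid,1>>>( dev );

    bool ok = cudaMemcpy( host, dev, bytes,
                          cudaMemcpyDeviceToHost ) == cudaSuccess;
    cudaFree( dev );

    for (int i = 0; ok && i < DIM * DIM; i++)
        ok = (host[i] == i);
    return ok;
}

int main( void ) {
    printf( "%s\n", offsetsCoverBuffer() ? "offsets cover the buffer"
                                         : "check failed" );
    return 0;
}
```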
Finally, we examine the actual code that determines whether a point is in or out
of the Julia Set. This code should look identical to the CPU version, continuing a
trend we have seen in many examples now.
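__device__ int julia( int x, int y ) {
    const float scale = 1.5;
    float jx = scale * (float)(DIM/2 - x)/(DIM/2);
    float jy = scale * (float)(DIM/2 - y)/(DIM/2);

    cuComplex c(-0.8, 0.156);
    cuComplex a(jx, jy);

    int i = 0;
    for (i=0; i<200; i++) {
        a = a * a + c;
        if (a.magnitude2() > 1000)
            return 0;
    }
    return 1;
}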
Again, we define a cuComplex structure that defines a method for storing a
complex number with single-precision floating-point components. The structure
also defines addition and multiplication operators as well as a function to return
the magnitude of the complex value.
struct cuComplex {
float r;
float i;
// the constructor is called from device code, so it needs __device__ as well
__device__ cuComplex( float a, float b ) : r(a), i(b) {}
__device__ float magnitude2( void ) {
return r * r + i * i;
}
__device__ cuComplex operator*(const cuComplex& a) {
return cuComplex(r*a.r - i*a.i, i*a.r + r*a.i);
}
__device__ cuComplex operator+(const cuComplex& a) {
return cuComplex(r+a.r, i+a.i);
}
};
Notice that we use the same language constructs in CUDA C that we use in our
CPU version. The one difference is the qualifier __device__, which indicates
that this code will run on a GPU and not on the host. Recall that because these
functions are declared as __device__ functions, they will be callable only from
other __device__ functions or from __global__ functions.
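For readers who want to experiment with the escape-time test away from a GPU, here is a hedged host-side sketch in plain C. It mirrors the arithmetic of cuComplex with free functions; the type name complex_f, the function names, and in particular the constant c = -0.8 + 0.156i are assumptions made for illustration and do not appear in the text above.

```c
/* Plain-C stand-in for the cuComplex structure (names are hypothetical). */
typedef struct { float r, i; } complex_f;

float magnitude2(complex_f a) { return a.r * a.r + a.i * a.i; }

complex_f cmul(complex_f a, complex_f b) {
    complex_f out = { a.r * b.r - a.i * b.i, a.i * b.r + a.r * b.i };
    return out;
}

complex_f cadd(complex_f a, complex_f b) {
    complex_f out = { a.r + b.r, a.i + b.i };
    return out;
}

/* Iterates a = a*a + c up to 200 times; returns 0 as soon as |a|^2
   exceeds 1000 (the point escapes) and 1 if it never does. */
int in_julia_set(float jx, float jy) {
    complex_f c = { -0.8f, 0.156f };   /* assumed constant, for illustration */
    complex_f a = { jx, jy };
    for (int i = 0; i < 200; i++) {
        a = cadd(cmul(a, a), c);
        if (magnitude2(a) > 1000.0f)
            return 0;
    }
    return 1;
}
```

A point far from the origin, such as (2, 2), escapes after two iterations, so in_julia_set(2.0f, 2.0f) returns 0.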
Since we've interrupted the code with commentary so frequently, here is the
entire source listing from start to finish:
#include "../common/book.h"
#include "../common/cpu_bitmap.h"
struct cuComplex {
float r;
float i;
// the constructor is called from device code, so it needs __device__ as well
__device__ cuComplex( float a, float b ) : r(a), i(b) {}
__device__ float magnitude2( void ) {
return r * r + i * i;
}
__device__ cuComplex operator*(const cuComplex& a) {
return cuComplex(r*a.r - i*a.i, i*a.r + r*a.i);
}
__device__ cuComplex operator+(const cuComplex& a) {
return cuComplex(r+a.r, i+a.i);
}
};
int i = 0;
for (i=0; i<200; i++) {
a = a * a + c;
if (a.magnitude2() > 1000)
return 0;
}
return 1;
}
dim3 grid(DIM,DIM);
kernel<<<grid,1>>>( dev_bitmap );
When you run the application, you should see an animating visualization of the
Julia Set. To convince you that it has earned the title "A Fun Example," the
accompanying figure shows a screenshot taken from this application.
Chapter Review
Congratulations, you can now write, compile, and run massively parallel code
on a graphics processor! You should go brag to your friends. And if they are still
under the misconception that GPU computing is exotic and difficult to master,
they will be most impressed. The ease with which you accomplished it will be
our secret. If they're people you trust with your secrets, suggest that they buy the
book, too.
We have so far looked at how to instruct the CUDA runtime to execute multiple
copies of our program in parallel on what we called blocks. We called the
collection of blocks we launch on the GPU a grid. As the name might imply, a grid can
be either a one- or two-dimensional collection of blocks. Each copy of the kernel
can determine which block it is executing with the built-in variable blockIdx.
Likewise, it can determine the size of the grid by using the built-in variable
gridDim. Both of these built-in variables proved useful within our kernel to
calculate the data index for which each block is responsible.
We have now written our first program using CUDA C as well as have seen how
to write code that executes in parallel on a GPU. This is an excellent start! But
arguably one of the most important components to parallel programming is
the means by which the parallel processing elements cooperate on solving a
problem. Rare are the problems where every processor can compute results
and terminate execution without a passing thought as to what the other
processors are doing. For even moderately sophisticated algorithms, we will need the
parallel copies of our code to communicate and cooperate. So far, we have not
seen any mechanisms for accomplishing this communication between sections
of CUDA C code executing in parallel. Fortunately, there is a solution, one that we
will begin to explore in this chapter.
Chapter Objectives
Through the course of this chapter, you will accomplish the following:
• You will learn about what CUDA C calls threads.
• You will learn a mechanism for different threads to communicate with each other.
• You will learn a mechanism to synchronize the parallel execution of different
threads.
Splitting Parallel Blocks
In the previous chapter, we looked at how to launch parallel code on the GPU. We
did this by instructing the CUDA runtime system on how many parallel copies of
our kernel to launch. We call these parallel copies blocks.
The CUDA runtime allows these blocks to be split into threads. Recall that when
we launched multiple parallel blocks, we changed the first argument in the angle
brackets from 1 to the number of blocks we wanted to launch. For example, when
we studied vector addition, we launched a block for each element in the vector of
size N by calling this:
add<<<N,1>>>( dev_a, dev_b, dev_c );
Inside the angle brackets, the second parameter actually represents the number
of threads per block we want the CUDA runtime to create on our behalf. To this
point, we have only ever launched one thread per block. In the previous example,
we launched the following:
N blocks x 1 thread/block = N parallel threads
So really, we could have launched N/2 blocks with two threads per block, N/4
blocks with four threads per block, and so on. Let's revisit our vector addition
example armed with this new information about the capabilities of CUDA C.
VECTOR SUMS: REDUX
We endeavor to accomplish the same task as we did in the previous chapter. That
is, we want to take two input vectors and store their sum in a third output vector.
However, this time we will use threads instead of blocks to accomplish this.
You may be wondering, what is the advantage of using threads rather than
blocks? Well, for now, there is no advantage worth discussing. But parallel
threads within a block will have the ability to do things that parallel blocks cannot
do. So for now, be patient and humor us while we walk through a parallel thread
version of the parallel block example from the previous chapter.
GPU VECTOR SUMS USING THREADS
We will start by addressing the two changes of note when moving from parallel
blocks to parallel threads. Our kernel invocation will change from one that
launches N blocks of one thread apiece:
add<<<N,1>>>( dev_a, dev_b, dev_c );
to a version that launches N threads, all within one block:
add<<<1,N>>>( dev_a, dev_b, dev_c );
The only other change arises in the method by which we index our data.
Previously, within our kernel we indexed the input and output data by block index.
int tid = blockIdx.x;
The punch line here should not be a surprise. Now that we have only a single
block, we have to index the data by thread index.
int tid = threadIdx.x;
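To see that the two launch shapes do the same work, the following host-side sketch emulates a launch with two nested loops. It is plain C written for illustration, not the book's kernel: emulate_add, block_idx, thread_idx, and the use_thread_index flag are hypothetical names, with the loop counters standing in for blockIdx.x and threadIdx.x.

```c
/* Emulates add<<<blocks, threads>>> on the host. The outer loop stands in
   for blockIdx.x and the inner loop for threadIdx.x. use_thread_index
   selects which of the two the emulated kernel uses as tid. */
void emulate_add(int blocks, int threads, int use_thread_index,
                 const int *a, const int *b, int *c) {
    for (int block_idx = 0; block_idx < blocks; block_idx++) {
        for (int thread_idx = 0; thread_idx < threads; thread_idx++) {
            int tid = use_thread_index ? thread_idx : block_idx;
            c[tid] = a[tid] + b[tid];
        }
    }
}
```

With N elements, emulate_add(N, 1, 0, a, b, c) models the block-indexed launch and emulate_add(1, N, 1, a, b, c) models the thread-indexed one; both fill c with the same sums.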
These are the only two changes required to move from a parallel block
implementation to a parallel thread implementation. For completeness, here is the
entire source listing; the changed lines are the tid assignment and the kernel
launch:
#include "../common/book.h"
#define N 10
// copy the array ‘c’ back from the GPU to the CPU
HANDLE_ERROR( cudaMemcpy( c,
dev_c,
N * sizeof(int),
cudaMemcpyDeviceToHost ) );
return 0;
}
Pretty simple stuff, right? In the next section, we'll see one of the limitations
of this thread-only approach. And of course, later we'll see why we would even
bother splitting blocks into other parallel components.
GPU SUMS OF A LONGER VECTOR
In the previous chapter, we noted that the hardware limits the number of blocks
in a single launch to 65,535. Similarly, the hardware limits the number of threads
per block with which we can launch a kernel. Specifically, this number cannot
exceed the value specified by the maxThreadsPerBlock field of the device
properties structure we looked at in Chapter 3. For many of the graphics
processors currently available, this limit is 512 threads per block, so how would we use
a thread-based approach to add two vectors of size greater than 512? We will
have to use a combination of threads and blocks to accomplish this.
As before, this will require two changes: We will have to change the index
computation within the kernel, and we will have to change the kernel launch itself.
Now that we have multiple blocks and threads, the indexing will start to look
similar to the standard method for converting from a two-dimensional index
space to a linear space.
int tid = threadIdx.x + blockIdx.x * blockDim.x;
This assignment uses a new built-in variable, blockDim. This variable is a
constant for all blocks and stores the number of threads along each
dimension of the block. Since we are using a one-dimensional block, we refer only to
blockDim.x. If you recall, gridDim stored a similar value, but it stored the
number of blocks along each dimension of the entire grid. Moreover, gridDim is
two-dimensional, whereas blockDim is actually three-dimensional. That is, the
CUDA runtime allows you to launch a two-dimensional grid of blocks where each
block is a three-dimensional array of threads. Yes, this is a lot of dimensions, and
it is unlikely you will regularly need the five degrees of indexing freedom afforded
you, but they are available if so desired.
Indexing the data in a linear array using the previous assignment actually is quite
intuitive. If you disagree, it may help to think about your collection of blocks of
threads spatially, similar to a two-dimensional array of pixels. We depict this
arrangement in Figure 5.1.
If the threads represent columns and the blocks represent rows, we can get a
unique index by taking the product of the block index with the number of threads
in each block and adding the thread index within the block. This is identical to the
method we used to linearize the two-dimensional image index in the Julia Set
example.
int offset = x + y * DIM;
The other change is to the kernel launch itself. We still need N parallel threads to
launch, but we want them to launch across multiple blocks so we do not hit the
512-thread limitation imposed upon us. One solution is to arbitrarily set the block
size to some fixed number of threads; for this example, let's use 128 threads per
block. Then we can just launch N/128 blocks to get our total of N threads running.
The wrinkle here is that N/128 is an integer division. This implies that if N were
127, N/128 would be zero, and we will not actually compute anything if we launch
zero threads. In fact, we will launch too few threads whenever N is not an exact
multiple of 128. This is bad. We actually want this division to round up.
There is a common trick to accomplish this in integer division without calling
ceil(). We actually compute (N+127)/128 instead of N/128. Either you can
take our word that this computes N/128 rounded up (that is, the smallest number
of 128-thread blocks whose combined thread count is greater than or
equal to N), or you can take a moment now to convince yourself of this fact.
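If you would rather convince yourself by running something, here is a small host-side sketch of the round-up division in plain C. The helper name blocks_needed is hypothetical; the expression inside it is the generalization of (N+127)/128 to an arbitrary block size, and it assumes n is nonnegative and threads_per_block is positive.

```c
/* Number of blocks of threads_per_block threads needed to cover n
   elements: integer division rounded up. With threads_per_block == 128
   this is exactly (n + 127) / 128. */
int blocks_needed(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}
```

For example, blocks_needed(127, 128) is 1 where plain 127/128 would be 0, blocks_needed(128, 128) is still 1, and blocks_needed(129, 128) is 2.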
We have chosen 128 threads per block and therefore use the following kernel
launch:
add<<<(N+127)/128,128>>>( dev_a, dev_b, dev_c );
Because of our change to the division that ensures we launch enough threads, we
will actually now launch too many threads when N is not an exact multiple of 128.
But there is a simple remedy to this problem, and our kernel already takes care of
it. We have to check whether a thread's offset is actually between 0 and N before
we use it to access our input and output arrays:
if (tid < N)
c[tid] = a[tid] + b[tid];
Thus, when our index overshoots the end of our array, as will always happen
when N is not a multiple of 128, we automatically refrain from performing
the calculation. More important, we refrain from reading and writing memory off
the end of our array.
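The interplay between the rounded-up launch and the bounds check can be emulated on the host. The sketch below is plain C under stated assumptions: launch_stats and emulate_guarded_add are hypothetical names, and the nested loops stand in for the blocks and threads the runtime would create.

```c
typedef struct {
    int launched;   /* threads created by the rounded-up launch */
    int active;     /* threads that passed the tid < n check    */
} launch_stats;

/* Emulates add<<<(n + tpb - 1) / tpb, tpb>>> with the if (tid < n) guard,
   where tpb is threads_per_block. */
launch_stats emulate_guarded_add(int n, int threads_per_block,
                                 const int *a, const int *b, int *c) {
    int blocks = (n + threads_per_block - 1) / threads_per_block;
    launch_stats stats = { 0, 0 };
    for (int block_idx = 0; block_idx < blocks; block_idx++) {
        for (int thread_idx = 0; thread_idx < threads_per_block; thread_idx++) {
            int tid = thread_idx + block_idx * threads_per_block;
            stats.launched++;
            if (tid < n) {              /* never touch memory past the end */
                c[tid] = a[tid] + b[tid];
                stats.active++;
            }
        }
    }
    return stats;
}
```

With n = 130 and 128 threads per block, two blocks (256 threads) are launched, 130 of them do work, and the remaining 126 fall through the guard without accessing the arrays.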
GPU SUMS OF ARBITRARILY LONG VECTORS
We were not completely forthcoming when we first discussed launching parallel
blocks on a GPU. In addition to the limitation on thread count, there is also a
hardware limitation on the number of blocks (albeit much greater than the thread
limitation). As we've mentioned previously, neither dimension of a grid of blocks
may exceed 65,535.
So, this raises a problem with our current vector addition implementation. If
we launch N/128 blocks to add our vectors, we will hit launch failures when
our vectors exceed 65,535 * 128 = 8,388,480 elements. This seems like a large
number, but with current memory capacities measured in gigabytes, the high-end
graphics processors can hold orders of magnitude more data than vectors with
8 million elements.
Fortunately, the solution to this issue is extremely simple. We first make a change
to our kernel.
This looks remarkably like our original version of vector addition! In fact, compare
it to the following CPU implementation from the previous chapter:
Here we also used a while() loop to iterate through the data. Recall that we
claimed that rather than incrementing the array index by 1, a multi-CPU or
multicore version could increment by the number of processors we wanted to use. We
will now use that same principle in the GPU version.
In the GPU implementation, we consider the number of parallel threads launched
to be the number of processors. Although the actual GPU may have fewer (or
more) processing units than this, we think of each thread as logically executing
in parallel and then allow the hardware to schedule the actual execution.
Decoupling the parallelization from the actual method of hardware execution is
one of the burdens that CUDA C lifts off a software developer's shoulders. This should
come as a relief, considering current NVIDIA hardware can ship with anywhere
from a handful to several hundred arithmetic units per chip!
Now that we understand the principle behind this implementation, we just need
to understand how we determine the initial index value for each parallel thread
and how we determine the increment. We want each parallel thread to start on
a different data index, so we just need to take our thread and block indexes and
linearize them as we saw in the "GPU Sums of a Longer Vector" section. Each
thread will start at an index given by the following:
int tid = threadIdx.x + blockIdx.x * blockDim.x;
After each thread finishes its work at the current index, we need to increment
each of them by the total number of threads running in the grid. This is simply the
number of threads per block multiplied by the number of blocks in the grid, or
blockDim.x * gridDim.x. Hence, the increment step is as follows:
tid += blockDim.x * gridDim.x;
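The start index and the stride together guarantee that every element is handled by exactly one thread, however long the vector is. A hedged host-side sketch in plain C makes this checkable: emulate_grid_stride and its parameters are hypothetical names, with grid_dim and block_dim standing in for gridDim.x and blockDim.x.

```c
/* Emulates the strided while() loop for every thread of a
   grid_dim x block_dim launch. visits[i] is incremented each time some
   emulated thread handles element i; the caller zero-initializes it. */
void emulate_grid_stride(int n, int grid_dim, int block_dim, int *visits) {
    for (int block_idx = 0; block_idx < grid_dim; block_idx++) {
        for (int thread_idx = 0; thread_idx < block_dim; thread_idx++) {
            int tid = thread_idx + block_idx * block_dim;   /* start index */
            while (tid < n) {
                visits[tid]++;
                tid += block_dim * grid_dim;                /* total threads */
            }
        }
    }
}
```

After the call, every entry of visits should be exactly 1, whether n is smaller or larger than the total number of threads.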
We are almost there. The only remaining piece is to fix the launch
itself. If you remember, we took this detour because the launch
add<<<(N+127)/128,128>>>( dev_a, dev_b, dev_c ) will fail when
(N+127)/128 is greater than 65,535. To ensure we never launch too many blocks,
we will just fix the number of blocks to some reasonably small value. Since we like
copying and pasting so much, we will use 128 blocks, each with 128 threads.
add<<<128,128>>>( dev_a, dev_b, dev_c );
You should feel free to adjust these values however you see fit, provided that
your values remain within the limits we've discussed. Later in the book, we will
discuss the potential performance implications of these choices, but for now it
suffices to choose 128 threads per block and 128 blocks. Now we can add vectors
of arbitrary length, limited only by the amount of RAM we have on our GPU. Here
is the entire source listing:
#include "../common/book.h"
// copy the array 'c' back from the GPU to the CPU
HANDLE_ERROR( cudaMemcpy( c,
dev_c,
N * sizeof(int),
cudaMemcpyDeviceToHost ) );
// verify that the GPU did the work we requested
bool success = true;
for (int i=0; i<N; i++) {
if ((a[i] + b[i]) != c[i]) {
printf( "Error: %d + %d != %d\n", a[i], b[i], c[i] );
success = false;
}
}
if (success) printf( "We did it!\n" );
return 0;
}
GPU RIPPLE USING THREADS
As with the previous chapter, we will reward your patience with vector addition by
presenting a more fun example that demonstrates some of the techniques we've
been using. We will again use our GPU computing power to generate pictures
procedurally. But to make things even more interesting, this time we will animate
them. But don't worry, we've packaged all the unrelated animation code into
helper functions so you won't have to master any graphics or animation.
struct DataBlock {
unsigned char *dev_bitmap;
CPUAnimBitmap *bitmap;
};
Most of the complexity of main() is hidden in the helper class
CPUAnimBitmap. You will notice that we again have a pattern of doing a
cudaMalloc(), executing device code that uses the allocated memory, and
then cleaning up with cudaFree(). This should be old hat to you by now.
In this example, we have slightly convoluted the means by which we accomplish
the middle step, "executing device code that uses the allocated memory." We
pass the anim_and_exit() method a function pointer to generate_frame().
This function will be called by the class every time it wants to generate a new
frame of the animation.
Although this function consists only of four lines, they all involve important
CUDA C concepts. First, we declare two two-dimensional variables, blocks
and threads. As our naming convention makes painfully obvious, the variable
blocks represents the number of parallel blocks we will launch in our grid. The
variable threads represents the number of threads we will launch per block.
Because we are generating an image, we use two-dimensional indexing so that
each thread will have a unique (x,y) index that we can easily put into
correspondence with a pixel in the output image. We have chosen to use blocks that consist
of a 16 x 16 array of threads. If the image has DIM x DIM pixels, we need to launch
DIM/16 x DIM/16 blocks to get one thread per pixel. Figure 5.2 shows how this
block and thread configuration would look in a (ridiculously) small image.
If you have done any multithreaded CPU programming, you may be wondering
why we would launch so many threads. For example, to render a full
high-definition animation at 1920 x 1080, this method would create more than 2 million
threads. Although we routinely create and schedule this many threads on a GPU,
one would not dream of creating this many threads on a CPU. Because CPU
thread management and scheduling must be done in software, it simply cannot
scale to the number of threads that a GPU can. Because we can simply create a
thread for each data element we want to process, parallel programming on a GPU
can be far simpler than on a CPU.
After declaring the variables that hold the dimensions of our launch, we simply
launch the kernel that will compute our pixel values.
kernel<<<blocks,threads>>>( d->dev_bitmap, ticks );
The kernel will need two pieces of information that we pass as parameters. First,
it needs a pointer to device memory that holds the output pixels. This is a global
variable that had its memory allocated in main(). But the variable is "global"
only for host code, so we need to pass it as a parameter to ensure that the CUDA
runtime will make it available for our device code.
Second, our kernel will need to know the current animation time so it can
generate the correct frame. The current time, ticks, is passed to the
generate_frame() function from the infrastructure code in CPUAnimBitmap,
so we can simply pass this on to our kernel.
And now, here's the kernel code itself:
The first three are the most important lines in the kernel.
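In these lines, each thread takes its index within its block as well as the index
of its block within the grid, and it translates this into a unique (x,y) index
within the image. So when the thread at index (3, 5) in block (12, 8) begins
executing, it knows that there are 12 entire blocks to the left of it and 8 entire
blocks above it. Within its block, the thread at (3, 5) has three threads to the
left and five above it. Because there are 16 threads per block along each
dimension, this means the thread in question has the following:
3 threads + 12 blocks * 16 threads/block = 195 threads to the left of it
5 threads + 8 blocks * 16 threads/block = 133 threads above it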
This computation is identical to the computation of x and y in the first two lines
and is how we map the thread and block indices to image coordinates. Then we
simply linearize these x and y values to get an offset into the output buffer. Again,
this is identical to what we did in the "GPU Sums of a Longer Vector" and "GPU
Sums of Arbitrarily Long Vectors" sections.
int offset = x + y * blockDim.x * gridDim.x;
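The mapping from thread and block indices to a pixel can be tried on the host. The sketch below is plain C for illustration only: pixel_index and pixel_for_thread are hypothetical names, and the single block_dim and grid_dim parameters assume square blocks and a square grid, so they stand in for both blockDim.x and blockDim.y, and for both gridDim.x and gridDim.y.

```c
typedef struct { int x, y, offset; } pixel_index;

/* Host-side stand-in for the first three lines of the kernel:
     x      = threadIdx.x + blockIdx.x * blockDim.x
     y      = threadIdx.y + blockIdx.y * blockDim.y
     offset = x + y * blockDim.x * gridDim.x                         */
pixel_index pixel_for_thread(int thread_x, int thread_y,
                             int block_x, int block_y,
                             int block_dim, int grid_dim) {
    pixel_index p;
    p.x = thread_x + block_x * block_dim;
    p.y = thread_y + block_y * block_dim;
    p.offset = p.x + p.y * block_dim * grid_dim;
    return p;
}
```

For the thread at (3, 5) in block (12, 8) with 16 x 16 blocks, this yields x = 195 and y = 133. If we additionally assume a 64 x 64 grid of blocks (a 1024-pixel-wide image, a value chosen here only for the example), the offset is 195 + 133 * 1024 = 136387.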
Since we know which (x,y) pixel in the image the thread should compute and
we know the time at which it needs to compute this value, we can compute any
function of (x,y,t) and store this value in the output buffer. In this case, the
function produces a time-varying sinusoidal "ripple."
float fx = x - DIM/2;
float fy = y - DIM/2;
float d = sqrtf( fx * fx + fy * fy );
unsigned char grey = (unsigned char)(128.0f + 127.0f *
cos(d/10.0f - ticks/7.0f) /
(d/10.0f + 1.0f));
We recommend that you not get too hung up on the computation of grey. It's
essentially just a 2D function of time that makes a nice rippling effect when it's
animated. A screenshot of one frame should look something like Figure 5.3.
Shared Memory and Synchronization
So far, the motivation for splitting blocks into threads was simply one of working
around hardware limitations to the number of blocks we can have in flight. This
is fairly weak motivation, because this could easily be done behind the scenes by
the CUDA runtime. Fortunately, there are other reasons one might want to split a
block into threads.
CUDA C makes available a region of memory that we call shared memory. This
region of memory brings along with it another extension to the C language akin
to __device__ and __global__. As a programmer, you can modify your
variable declarations with the CUDA C keyword __shared__ to make this variable
resident in shared memory. But what's the point?
We're glad you asked. The CUDA C compiler treats variables in shared memory
differently than typical variables. It creates a copy of the variable for each block
that you launch on the GPU. Every thread in that block shares the memory, but
threads cannot see or modify the copy of this variable that is seen within other
blocks. This provides an excellent means by which threads within a block can
communicate and collaborate on computations. Furthermore, shared memory
buffers reside physically on the GPU as opposed to residing in off-chip DRAM.
Because of this, the latency to access shared memory tends to be far lower
than typical buffers, making shared memory effective as a per-block,
software-managed cache or scratchpad.
The prospect of communication between threads should excite you. It excites us,
too. But nothing in life is free, and interthread communication is no exception.
If we expect to communicate between threads, we also need a mechanism for
synchronizing between threads. For example, if thread A writes a value to shared
memory and we want thread B to do something with this value, we can't have
thread B start its work until we know the write from thread A is complete. Without
synchronization, we have created a race condition where the correctness of the
execution results depends on the nondeterministic details of the hardware.
Let's take a look at an example that uses these features.
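For example, if we take the dot product of two four-element vectors, we would get
Equation 5.1.
Equation 5.1
\[
(x_1, x_2, x_3, x_4) \cdot (y_1, y_2, y_3, y_4) = x_1 y_1 + x_2 y_2 + x_3 y_3 + x_4 y_4
\]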
Perhaps the algorithm we tend to use is becoming obvious. We can do the first
step exactly how we did vector addition. Each thread multiplies a pair of
corresponding entries, and then every thread moves on to its next pair. Because the
result needs to be the sum of all these pairwise products, each thread keeps
a running sum of the pairs it has added. Just like in the addition example, the
threads increment their indices by the total number of threads to ensure we don't
miss any elements and don't multiply a pair twice. Here is the first step of the dot
product routine:
#include "../common/book.h"
float temp = 0;
while (tid < N) {
temp += a[tid] * b[tid];
tid += blockDim.x * gridDim.x;
}
As you can see, we have declared a buffer of shared memory named cache. This
buffer will be used to store each thread's running sum. Soon we will see why we
do this, but for now we will simply examine the mechanics by which we
accomplish it. It is trivial to declare a variable to reside in shared memory, and it is
identical to the means by which you declare a variable as static or volatile
in standard C:
__shared__ float cache[threadsPerBlock];
We declare the array of size threadsPerBlock so each thread in the block
has a place to store its temporary result. Recall that when we have allocated
memory globally, we allocated enough for every thread that runs the kernel, or
threadsPerBlock times the total number of blocks. But since the compiler
will create a copy of the shared variables for each block, we need to allocate only
enough memory such that each thread in the block has an entry.
After allocating the shared memory, we compute our data indices much like we
have in the past:
The computation for the variable tid should look familiar by now; we are just
combining the block and thread indices to get a global offset into our input arrays.
The offset into our shared memory cache is simply our thread index. Again, we
don't need to incorporate our block index into this offset because each block has
its own private copy of this shared memory.
Finally, we clear our shared memory buffer so that later we will be able to blindly
sum the entire array without worrying whether a particular entry has valid data
stored there:
It will be possible that not every entry will be used if the size of the input vectors
is not a multiple of the number of threads per block. In this case, the last block
will have some threads that do nothing and therefore do not write values.
Each thread computes a running sum of the product of corresponding entries in a
and b. After reaching the end of the array, each thread stores its temporary sum
into the shared buffer.
float temp = 0;
while (tid < N) {
temp += a[tid] * b[tid];
tid += blockDim.x * gridDim.x;
}
At this point in the algorithm, we need to sum all the temporary values we've
placed in the cache. To do this, we will need some of the threads to read the
values that have been stored there. However, as we mentioned, this is a
potentially dangerous operation. We need a method to guarantee that all of these
writes to the shared array cache[] complete before anyone tries to read from
this buffer. Fortunately, such a method exists:
This call guarantees that every thread in the block has completed instructions
prior to the __syncthreads() before the hardware will execute the next
instruction on any thread. This is exactly what we need! We now know that when
the first thread executes the first instruction after our __syncthreads(),
every other thread in the block has also finished executing up to the
__syncthreads().
Now that we have guaranteed that our temporary cache has been filled, we
can sum the values in it. We call the general process of taking an input array
and performing some computations that produce a smaller array of results a
reduction. Reductions arise often in parallel computing, which leads to the desire
to give them a name.
The naïve way to accomplish this reduction would be having one thread iterate
over the shared memory and calculate a running sum. This will take us time
proportional to the length of the array. However, since we have hundreds of
threads available to do our work, we can do this reduction in parallel and take
time that is proportional to the logarithm of the length of the array. At first, the
following code will look convoluted; we'll break it down in a moment.
The general idea is that each thread will add two of the values in cache[] and
store the result back to cache[]. Since each thread combines two entries into
one, we complete this step with half as many entries as we started with. In the
next step, we do the same thing on the remaining half. We continue in this fashion
for log2(threadsPerBlock) steps until we have the sum of every entry in
cache[]. For our example, we're using 256 threads per block, so it takes 8
iterations of this process to reduce the 256 entries in cache[] to a single sum.
The code for this follows:
For the first step, we start with i as half the number of threadsPerBlock.
We only want the threads with indices less than this value to do any work, so we
conditionally add two entries of cache[] if the thread's index is less than i. We
protect our addition within an if(cacheIndex < i) block. Each thread will
take the entry at its index in cache[], add it to the entry at its index offset by i,
and store this sum back to cache[].
Suppose there were eight entries in cache[] and, as a result, i had the value 4.
One step of the reduction would look like Figure 5.4.
After we have completed a step, we have the same restriction we did after
computing all the pairwise products. Before we can read the values we just stored
in cache[], we need to ensure that every thread that needs to write to cache[]
has already done so. The __syncthreads() after the assignment ensures this
condition is met.
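The halving pattern can be traced on the host. In the hedged plain-C sketch below, the outer loop is one reduction step and the inner loop plays the part of all the threads with cacheIndex < i; finishing the inner loop before moving on corresponds to the __syncthreads() barrier. The function names are hypothetical, and n is assumed to be a power of 2, as the halving requires.

```c
/* Sequential emulation of the shared-memory reduction. Within one step,
   entry cache_index only reads entry cache_index + i, which no emulated
   thread writes during that step, so running the "threads" one after
   another gives the same result as running them in parallel.
   n must be a power of 2. Returns the total, left in cache[0]. */
float emulate_block_reduction(float *cache, int n) {
    for (int i = n / 2; i != 0; i /= 2) {
        for (int cache_index = 0; cache_index < i; cache_index++)
            cache[cache_index] += cache[cache_index + i];
        /* end of inner loop == every thread has passed __syncthreads() */
    }
    return cache[0];
}

/* Number of halving steps needed for n entries (log2 of n). */
int reduction_steps(int n) {
    int steps = 0;
    for (int i = n / 2; i != 0; i /= 2)
        steps++;
    return steps;
}
```

Starting from the eight entries 1 through 8, the three steps leave 6, 8, 10, 12 in the first four slots, then 16, 20 in the first two, and finally 36 in cache[0].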
After termination of this while() loop, each block has but a single number
remaining. This number is sitting in the first entry of cache[] and is the sum
of every pairwise product the threads in that block computed. We then store this
single value to global memory and end our kernel:
if (cacheIndex == 0)
c[blockIdx.x] = cache[0];
}
Why do we do this global store only for the thread with cacheIndex == 0? Well,
since there is only one number that needs writing to global memory, only a single
thread needs to perform this operation. Conceivably, every thread could perform
this write and the program would still work, but doing so would create an
unnecessarily large amount of memory traffic to write a single value. For simplicity,
we chose the thread with index 0, though you could conceivably have chosen any
cacheIndex to write cache[0] to global memory. Finally, since each block
will write exactly one value to the global array c[], we can simply index it by
blockIdx.
We are left with an array, c[], each entry of which contains the sum produced by
one of the parallel blocks. The last step of the dot product is to sum the entries
of c[]. Even though the dot product is not fully computed, we exit the kernel and
return control to the host at this point. But why do we return to the host before
the computation is complete?
Previously, we referred to an operation like a dot product as a reduction. Roughly
speaking, this is because we produce fewer output data elements than we input.
In the case of a dot product, we always produce exactly one output, regardless
of the size of our input. It turns out that a massively parallel machine like a GPU
tends to waste its resources when performing the last steps of a reduction, since
the size of the data set is so small at that point; it is hard to keep hundreds of
arithmetic units busy adding a few dozen numbers!
For this reason, we return control to the host and let the CPU finish the final step
of the addition, summing the array c[]. In a larger application, the GPU would
now be free to start another dot product or work on another large computation.
However, in this example, we are done with the GPU.
In explaining this example, we broke with tradition and jumped right into the
actual kernel computation. We hope you will have no trouble understanding the
body of main() up to the kernel call, since it is overwhelmingly similar to what
we have shown before.
dot<<<blocksPerGrid,threadsPerBlock>>>( dev_a,
dev_b,
dev_partial_c );
To avoid you passing out from boredom, we will quickly summarize this code:
1. Allocate host and device memory for input and output arrays.
2. Fill input arrays a[] and b[], and then copy these to the device using
cudaMemcpy().
3. Call our dot product kernel using some predetermined number of threads
per block and blocks per grid.
Despite most of this being commonplace to you now, it is worth examining the
computation for the number of blocks we launch. We discussed how the dot
product is a reduction and how each block launched will compute a partial sum.
The length of this list of partial sums should be something manageably small
for the CPU yet large enough such that we have enough blocks in flight to keep
even the fastest GPUs busy. We have chosen 32 blocks, although this is a case
where you may notice better or worse performance for other choices, especially
depending on the relative speeds of your CPU and GPU.
But what if we are given a very short list and 32 blocks of 256 threads apiece
is too many? If we have N data elements, we need only N threads in order
to compute our dot product. So in this case, we need the smallest number of
blocks whose combined thread count is greater than or equal to N. We have seen
this once before when we were adding vectors. In this case, we get that block
count, N divided by threadsPerBlock and rounded up, by computing
(N+(threadsPerBlock-1)) / threadsPerBlock. As you may be able
to tell, this is actually a fairly common trick in integer math, so it is worth
digesting this even if you spend most of your time working outside the
CUDA C realm.
Therefore, the number of blocks we launch should be either 32 or
(N+(threadsPerBlock-1)) / threadsPerBlock, whichever value is
smaller.
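As a quick host-side illustration of this rule, here is a plain-C sketch. The helper names imin and blocks_per_grid are hypothetical, and the cap is passed in as the parameter max_blocks rather than hard-coded, so the same function can be tried with other caps.

```c
/* Smaller of two ints. */
int imin(int a, int b) { return a < b ? a : b; }

/* Blocks to launch: enough to give every element a thread (rounded-up
   division), but never more than max_blocks. */
int blocks_per_grid(int n, int threads_per_block, int max_blocks) {
    int needed = (n + threads_per_block - 1) / threads_per_block;
    return imin(max_blocks, needed);
}
```

With 256 threads per block and a cap of 32, a 100-element input needs only 1 block and a 1,000-element input needs 4, while any input longer than 32 * 256 = 8,192 elements is capped at 32 blocks and relies on the strided loop to cover the rest.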
Now it should be clear how we arrive at the code in main(). After the kernel
finishes, we still have to sum the result. But like the way we copy our input to
the GPU before we launch a kernel, we need to copy our output back to the CPU
before we continue working with it. So after the kernel finishes, we copy back the
list of partial sums and complete the sum on the CPU.
// copy the array 'c' back from the GPU to the CPU
HANDLE_ERROR( cudaMemcpy( partial_c, dev_partial_c,
blocksPerGrid*sizeof(float),
cudaMemcpyDeviceToHost ) );
Finally, we check our results and clean up the memory we've allocated on both
the CPU and GPU. Checking the results is made easier because we've filled the
inputs with predictable data. If you recall, a[] is filled with the integers from 0 to
N-1 and b[] is just 2*a[].
Our dot product should be two times the sum of the squares of the integers
from 0 to N-1. For the reader who loves discrete mathematics (and what's not to
love?), it will be an amusing diversion to derive the closed-form solution for this
summation. For those with less patience or interest, we present the closed form
here, as well as the rest of the body of main():
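For reference, the standard closed form for the sum of squares is 0^2 + 1^2 + ... + x^2 = x(x+1)(2x+1)/6. The following host-side sketch in plain C compares it against a brute-force loop; the function names are hypothetical, and double is used here only so that the products stay exact for moderately large N.

```c
/* Closed form for 0^2 + 1^2 + ... + x^2. */
double sum_squares(double x) {
    return x * (x + 1) * (2 * x + 1) / 6;
}

/* Expected dot product when a[i] = i and b[i] = 2*i for i in 0 .. n-1. */
double expected_dot(int n) {
    return 2 * sum_squares((double)(n - 1));
}

/* The same quantity computed the slow way, one pair at a time. */
double brute_force_dot(int n) {
    double sum = 0;
    for (int i = 0; i < n; i++)
        sum += (double)i * (2.0 * i);
    return sum;
}
```

For n = 4 both functions give 2 * (0 + 1 + 4 + 9) = 28. On the GPU the partial sums are accumulated in single precision, so for large N a comparison against the closed form should allow for rounding differences.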
If you found all our explanatory interruptions bothersome, here is the entire
source listing, sans commentary:
#include "../common/book.h"
float temp = 0;
while (tid < N) {
temp += a[tid] * b[tid];
tid += blockDim.x * gridDim.x;
}
if (cacheIndex == 0)
c[blockIdx.x] = cache[0];
}
// copy the array 'c' back from the GPU to the CPU
HANDLE_ERROR( cudaMemcpy( partial_c, dev_partial_c,
blocksPerGrid*sizeof(float),
cudaMemcpyDeviceToHost ) );
DOT PRODUCT OPTIMIZED (INCORRECTLY)
We quickly glossed over the second __syncthreads() in the dot product
example. Now we will take a closer look at it as well as examining an attempt
to improve it. If you recall, we needed the second __syncthreads() because
we update our shared memory variable cache[] and need these updates to be
visible to every thread on the next iteration through the loop.
int i = blockDim.x/2;
while (i != 0) {
if (cacheIndex < i)
cache[cacheIndex] += cache[cacheIndex + i];
__syncthreads();
i /= 2;
}
Observe that we update our shared memory buffer cache[] only if cacheIndex
is less than i. Since cacheIndex is really just threadIdx.x, this means that
only some of the threads are updating entries in the shared memory cache. Since
we are using __syncthreads() only to ensure that these updates have taken
place before proceeding, it stands to reason that we might see a speed
improvement if we wait only for the threads that are actually writing to shared memory.
We do this by moving the synchronization call inside the if() block:
int i = blockDim.x/2;
while (i != 0) {
if (cacheIndex < i) {
cache[cacheIndex] += cache[cacheIndex + i];
__syncthreads();
}
i /= 2;
}
Although this was a valiant effort at optimization, it will not actually work. In fact,
the situation is worse than that. This change to the kernel will actually cause the
GPU to stop responding, forcing you to kill your program. But what could have
gone so catastrophically wrong with such a seemingly innocuous change?
To answer this question, it helps to imagine every thread in the block marching
through the code one line at a time. At each instruction in the program, every
thread executes the same instruction, but each can operate on different data.
But what happens when the instruction that every thread is supposed to execute
is inside a conditional block like an if()? Obviously not every thread should
execute that instruction, right? For example, consider a kernel that contains the
following fragment of code that intends for odd-indexed threads to update the
value of some variable:
int myVar = 0;
if( threadIdx.x % 2 )
myVar = threadIdx.x;
In the previous example, when the threads arrive at the assignment to myVar,
only the threads with odd indices will execute it since the threads with even
indices do not satisfy the condition if( threadIdx.x % 2 ). The even-numbered
threads simply do nothing while the odd threads execute this instruction. When
some of the threads need to execute an instruction while others don't, this
situation is known as thread divergence. Under normal circumstances, divergent
branches simply result in some threads remaining idle, while the other threads
actually execute the instructions in the branch.
But in the case of __syncthreads(), the result is somewhat tragic. The
CUDA Architecture guarantees that no thread will advance to an instruction
beyond the __syncthreads() until every thread in the block has executed the
__syncthreads(). Unfortunately, if the __syncthreads() sits in a divergent
branch, some of the threads will never reach the __syncthreads(). Therefore,
because of the guarantee that no instruction after a __syncthreads() can be
executed before every thread has executed it, the hardware simply continues to
wait for these threads. And waits. And waits. Forever.
This is the situation in the dot product example when we move the
__syncthreads() call inside the if() block. Any thread with cacheIndex
greater than or equal to i will never execute the __syncthreads(). This
effectively hangs the processor because it results in the GPU waiting for something
that will never happen.
if (cacheIndex < i) {
cache[cacheIndex] += cache[cacheIndex + i];
__syncthreads();
}
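A simple count shows why the barrier in this fragment can never be satisfied under the model just described. The plain-C sketch below is a host-side illustration, not GPU code: both function names are hypothetical, and the "barrier" is modeled only as a head count of the threads that reach it.

```c
/* Counts how many of block_dim emulated threads would reach a barrier
   placed inside if (cacheIndex < i). */
int threads_reaching_barrier(int block_dim, int i) {
    int arrived = 0;
    for (int cache_index = 0; cache_index < block_dim; cache_index++)
        if (cache_index < i)
            arrived++;
    return arrived;
}

/* The barrier releases only when every thread in the block arrives. */
int barrier_can_release(int block_dim, int i) {
    return threads_reaching_barrier(block_dim, i) == block_dim;
}
```

On the first reduction step of a 256-thread block, i is 128, so only 128 of the 256 threads arrive and barrier_can_release(256, 128) is 0: the arrivals wait for the other half of the block, which never comes.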
89
The moral of this story is that __syncthreads() is a powerful mechanism
for ensuring that your massively parallel application still computes the correct
results. But because of this potential for unintended consequences, we still need
to take care when using it.
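To make the safe placement concrete, here is a minimal sketch of a shared memory reduction in which the barrier sits outside the conditional, so every thread in the block reaches it on every iteration. The kernel name block_sum, the block size of 256, and the host helper sum_on_gpu are assumptions made for this illustration; only the body of the loop mirrors the dot product example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 256   // assumed block size; must be a power of two here

// Hypothetical kernel: sums THREADS floats inside a single block.
__global__ void block_sum( const float *in, float *out ) {
    __shared__ float cache[THREADS];
    int cacheIndex = threadIdx.x;
    cache[cacheIndex] = in[cacheIndex];
    __syncthreads();                  // all loads finish before we reduce

    for (int i = blockDim.x / 2; i != 0; i /= 2) {
        if (cacheIndex < i)
            cache[cacheIndex] += cache[cacheIndex + i];
        // Outside the if(): idle threads also arrive at the barrier,
        // so nobody waits for a thread that will never show up.
        __syncthreads();
    }
    if (cacheIndex == 0)
        *out = cache[0];
}

// Hypothetical host helper: sums THREADS values on the GPU.
float sum_on_gpu( const float *host_in ) {
    float *dev_in, *dev_out, result = 0.0f;
    cudaMalloc( (void**)&dev_in, THREADS * sizeof(float) );
    cudaMalloc( (void**)&dev_out, sizeof(float) );
    cudaMemcpy( dev_in, host_in, THREADS * sizeof(float),
                cudaMemcpyHostToDevice );
    block_sum<<<1,THREADS>>>( dev_in, dev_out );
    cudaMemcpy( &result, dev_out, sizeof(float), cudaMemcpyDeviceToHost );
    cudaFree( dev_in );
    cudaFree( dev_out );
    return result;
}

int main( void ) {
    float data[THREADS];
    for (int i = 0; i < THREADS; i++)
        data[i] = 1.0f;
    printf( "sum = %g\n", sum_on_gpu( data ) );
    return 0;
}
```

Moving the __syncthreads() call back inside the if() in this sketch would recreate the divergent barrier described above, so treat that variation as something to reason about rather than something to run.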
SHARED MEMORY BITMAP
We have looked at examples that use shared memory and employed
__syncthreads() to ensure that data is ready before we continue.
In the name of speed, you may be tempted to live dangerously and omit
the __syncthreads(). We will now look at a graphical example that requires
__syncthreads() for correctness. We will show you screenshots of the
intended output and of the output when run without __syncthreads(). It
won't be pretty.
The body of main() is identical to the GPU Julia Set example, although this time
we launch multiple threads per block:
#include "cuda.h"
#include "../common/book.h"
#include "../common/cpu_bitmap.h"
dim3 grids(DIM/16,DIM/16);
dim3 threads(16,16);
kernel<<<grids,threads>>>( dev_bitmap );
cudaFree( dev_bitmap );
}
As with the Julia Set example, each thread will be computing a pixel value for a
single output location. The first thing that each thread does is compute its x and
y location in the output image. This computation is identical to the tid
computation in the vector addition example, although we compute it in two
dimensions this time:
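The two-dimensional index computation can be sketched as follows. This is an illustrative reconstruction under stated assumptions rather than the chapter's own listing: the kernel name index_kernel, the helper count_index_errors, and the image size DIM of 64 are invented for the example, while the 16 x 16 launch shape matches the main() fragment above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define DIM 64   // assumed image edge length; a multiple of 16

// Hypothetical kernel: each thread stores its own linear offset so the
// mapping from (block, thread) to pixel can be checked on the host.
__global__ void index_kernel( int *out ) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;   // column in the image
    int y = threadIdx.y + blockIdx.y * blockDim.y;   // row in the image
    int offset = x + y * blockDim.x * gridDim.x;     // row-major position
    out[offset] = offset;
}

// Hypothetical host helper: returns how many pixels hold a wrong offset.
int count_index_errors( void ) {
    static int host[DIM * DIM];
    int *dev;
    cudaMalloc( (void**)&dev, sizeof(host) );
    cudaMemset( dev, 0xFF, sizeof(host) );           // poison with -1
    dim3 grids( DIM/16, DIM/16 );
    dim3 threads( 16, 16 );
    index_kernel<<<grids,threads>>>( dev );
    cudaMemcpy( host, dev, sizeof(host), cudaMemcpyDeviceToHost );
    cudaFree( dev );
    int errors = 0;
    for (int i = 0; i < DIM * DIM; i++)
        if (host[i] != i)
            errors++;
    return errors;
}

int main( void ) {
    printf( "pixels with a wrong offset: %d\n", count_index_errors() );
    return 0;
}
```

Because blockDim.x * gridDim.x equals the image width for this launch shape, every pixel is covered by exactly one thread.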
Since we will be using a shared memory buffer to cache our computations, we
declare one such that each thread in our 16 x 16 block has an entry.
__shared__ float shared[16][16];
Then, each thread computes a value to be stored into this buffer.
shared[threadIdx.x][threadIdx.y] =
    255 * (sinf(x*2.0f*PI/ period) + 1.0f) *
          (sinf(y*2.0f*PI/ period) + 1.0f) / 4.0f;
And lastly, we store these values back out to the pixel, reversing the order of x
and y:
    ptr[offset*4 + 0] = 0;
    ptr[offset*4 + 1] = shared[15-threadIdx.x][15-threadIdx.y];
    ptr[offset*4 + 2] = 0;
    ptr[offset*4 + 3] = 255;
}
Granted, these computations are somewhat arbitrary. We've simply come up with
something that will draw a grid of green spherical blobs. So after compiling and
running this kernel, we output an image like the one in Figure 5.5.
What happened here? As you may have guessed from the way we set up this
example, we're missing an important synchronization point. When a thread
stores the computed value in shared[][] to the pixel, it is possible that the
thread responsible for writing that value to shared[][] has not finished
writing it yet. The only way to guarantee that this does not happen is by using
__syncthreads(). Thus, the result is a corrupted picture of green blobs.
Although this may not be the end of the world, your application might be
computing more important values.
Instead, we need to add a synchronization point between the write to shared
memory and the subsequent read from it.
    shared[threadIdx.x][threadIdx.y] =
        255 * (sinf(x*2.0f*PI/ period) + 1.0f) *
              (sinf(y*2.0f*PI/ period) + 1.0f) / 4.0f;
    __syncthreads();
    ptr[offset*4 + 0] = 0;
    ptr[offset*4 + 1] = shared[15-threadIdx.x][15-threadIdx.y];
    ptr[offset*4 + 2] = 0;
    ptr[offset*4 + 3] = 255;
}
With this __syncthreads() in place, we then get a far more predictable (and
aesthetically pleasing) result, as shown in Figure 5.6.
Chapter Review
We know how blocks can be subdivided into smaller parallel execution units
known as threads. We revisited the vector addition example of the previous
chapter to see how to perform addition of arbitrarily long vectors. We also showed
an example of reduction and how we use shared memory and synchronization to
accomplish this. In fact, this example showed how the GPU and CPU can
collaborate on computing results. Finally, we showed how perilous it can be to an
application when we neglect the need for synchronization.
You have learned most of the basics of CUDA C as well as some of the ways it
resembles standard C and a lot of the important ways it differs from standard
C. This would be an excellent time to consider some of the problems you have
encountered and which ones might lend themselves to parallel implementations
with CUDA C. As we progress, we will look at some of the other features we can
use to accomplish tasks on the GPU, as well as some of the more advanced API
features that CUDA provides to us.
We hope you have learned much about writing code that executes on the GPU.
You should know how to spawn parallel blocks to execute your kernels, and you
should know how to further split these blocks into parallel threads. You have also
seen ways to enable communication and synchronization between these threads.
But since the book is not over yet, you may have guessed that CUDA C has even
more features that might be useful to you.
This chapter will introduce you to a couple of these more advanced features.
Specifically, there exist ways in which you can exploit special regions of memory
on your GPU in order to accelerate your applications. In this chapter, we will
discuss one of these regions of memory: constant memory. In addition, because
we are looking at our first method for enhancing the performance of your CUDA C
applications, you will also learn how to measure the performance of your
applications using CUDA events. From these measurements, you will be able to
quantify the gain (or loss) from any enhancements you make.
Chapter Objectives
Through the course of this chapter, you will accomplish the following:
• You will learn about using constant memory with CUDA C.
• You will learn about the performance characteristics of constant memory.
• You will learn how to use CUDA events to measure application performance.
Constant Memory
Previously, we discussed how modern GPUs are equipped with enormous
amounts of arithmetic processing power. In fact, the computational advantage
graphics processors have over CPUs helped precipitate the initial interest in using
graphics processors for general-purpose computing. With hundreds of arithmetic
units on the GPU, often the bottleneck is not the arithmetic throughput of the
chip but rather the memory bandwidth of the chip. There are so many ALUs on
graphics processors that sometimes we just can't keep the input coming to them
fast enough to sustain such high rates of computation. So, it is worth investigating
means by which we can reduce the amount of memory traffic required for a given
problem.
We have seen CUDA C programs that have used both global and shared memory
so far. However, the language makes available another kind of memory known
as constant memory. As the name may indicate, we use constant memory for
data that will not change over the course of a kernel execution. NVIDIA hardware
provides 64KB of constant memory that it treats differently than it treats standard
global memory. In some situations, using constant memory rather than global
memory will reduce the required memory bandwidth.
RAY TRACING INTRODUCTION
Simply put, ray tracing is one way of producing a two-dimensional image of a
scene consisting of three-dimensional objects. But isn't this what GPUs were
originally designed for? How is this different from what OpenGL or DirectX
do when you play your favorite game? Well, GPUs do indeed solve this same
problem, but they use a technique known as rasterization. There are many
excellent books on rasterization, so we will not endeavor to explain the
differences here. It suffices to say that they are completely different methods
that solve the same problem.
So, how does ray tracing produce an image of a three-dimensional scene? The
idea is simple: We choose a spot in our scene to place an imaginary camera. This
simplified digital camera contains a light sensor, so to produce an image, we
need to determine what light would hit that sensor. Each pixel of the resulting
image should be the same color and intensity of the ray of light that hits that spot
sensor.
Since light incident at any point on the sensor can come from any place in our
scene, it turns out it's easier to work backward. That is, rather than trying to
figure out what light ray hits the pixel in question, what if we imagine shooting a
ray from the pixel and into the scene? In this way, each pixel behaves something
like an eye that is "looking" into the scene. Figure 6.1 illustrates these rays being
cast out of each pixel and into the scene.
We figure out what color is seen by each pixel by tracing a ray from the pixel in
question through the scene until it hits one of our objects. We then say that the
pixel would "see" this object and can assign its color based on the color of the
object it sees. Most of the computation required by ray tracing is in the
computation of these intersections of the ray with the objects in the scene.
Moreover, in more complex ray tracing models, shiny objects in the scene can
reflect rays, and translucent objects can refract the rays of light. This creates
secondary rays, tertiary rays, and so on. In fact, this is one of the attractive
features of ray tracing; it is very simple to get a basic ray tracer working, but we
can build models of more complex phenomena into the ray tracer in order to
produce more realistic images.
RAY TRACING ON THE GPU
Since APIs such as OpenGL and DirectX are not designed to allow ray-traced
rendering, we will have to use CUDA C to implement our basic ray tracer. Our
ray tracer will be extraordinarily simple so that we can concentrate on the use
of constant memory, so if you were expecting code that could form the basis of
a full-blown production renderer, you will be disappointed. Our basic ray tracer
will only support scenes of spheres, and the camera is restricted to the z-axis,
facing the origin. Moreover, we will not support any lighting of the scene to avoid
the complications of secondary rays. Instead of computing lighting effects, we will
simply assign each sphere a color and then shade them with some precomputed
function if they are visible.
So, what will the ray tracer do? It will fire a ray from each pixel and keep track of
which rays hit which spheres. It will also track the depth of each of these hits. In
the case where a ray passes through multiple spheres, only the sphere closest
to the camera can be seen. In essence, our "ray tracer" is not doing much more
than hiding surfaces that cannot be seen by the camera.
We will model our spheres with a data structure that stores the sphere's center
coordinate of (x, y, z), its radius, and its color of (r, b, g).
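#define INF 2e10f   // "no hit" depth used below; this value is an assumption

struct Sphere {
    float r,b,g;    // color
    float radius;
    float x,y,z;    // center
    // For a ray fired along the z-axis from pixel (ox, oy): returns the depth
    // of the hit, or -INF on a miss. On a hit, *n receives a shading factor
    // between 0 and 1.
    __device__ float hit( float ox, float oy, float *n ) {
        float dx = ox - x;
        float dy = oy - y;
        if (dx*dx + dy*dy < radius*radius) {
            float dz = sqrtf( radius*radius - dx*dx - dy*dy );
            *n = dz / sqrtf( radius * radius );
            return dz + z;
        }
        return -INF;
    }
};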
Our main() routine follows roughly the same sequence as our previous
image-generating examples.
#include "cuda.h"
#include "../common/book.h"
#include "../common/cpu_bitmap.h"
Sphere *s;
We allocate memory for our input data, which is an array of spheres that compose
our scene. Since we need this data on the GPU but are generating it with the CPU,
we have to do both a cudaMalloc() and a malloc() to allocate memory on
both the GPU and the CPU. We also allocate a bitmap image that we will fill with
output pixel data as we ray trace our spheres on the GPU.
After allocating memory for input and output, we randomly generate the center
coordinate, color, and radius for our spheres.
The program currently generates a random array of 20 spheres, but this quantity
is specified in a #define and can be adjusted accordingly.
We copy this array of spheres to the GPU using cudaMemcpy() and then free the
temporary buffer.
Now that our input is on the GPU and we have allocated space for the output, we
are ready to launch our kernel.
We will examine the kernel itself in a moment, but for now you should take it on
faith that it ray traces the scene and generates pixel data for the input scene of
spheres. Finally, we copy the output image back from the GPU and display it. It
should go without saying that we free all allocated memory that hasn't already
been freed.
All of this should be commonplace to you now. So, how do we do the actual ray
tracing? Because we have settled on a very simple ray tracing model, our kernel
will be very easy to understand. Each thread is generating one pixel for our output
image, so we start in the usual manner by computing the x- and y-coordinates
for the thread as well as the linearized offset into our output buffer. We will
also shift our (x,y) image coordinates by DIM/2 so that the z-axis runs through
the center of the image.
Since each ray needs to check each sphere for intersection, we will now iterate
through the array of spheres, checking each for a hit.
Clearly, the majority of the interesting computation lies in the for() loop. We
iterate through each of the input spheres and call its hit() method to determine
whether the ray from our pixel "sees" the sphere. If the ray hits the current
sphere, we determine whether the hit is closer to the camera than the last sphere
we hit. If it is closer, we store this depth as our new closest sphere. In addition, we
store the color associated with this sphere so that when the loop has terminated,
the thread knows the color of the sphere that is closest to the camera. Since this
is the color that the ray from our pixel "sees," we conclude that this is the color of
the pixel and store this value in our output image buffer.
After every sphere has been checked for intersection, we can store the current
color into the output image:
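The kernel described in the preceding paragraphs can be sketched as follows. This is a reconstruction from the prose under stated assumptions, not the chapter's own listing: the kernel name trace, the helper render, the sizes DIM and SPHERES, the value of INF, and the hand-picked two-sphere scene are all invented for illustration. Only the Sphere structure and its hit() method follow the listing shown earlier.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define DIM     64      // assumed image edge length (multiple of 16)
#define SPHERES 2       // assumed scene size
#define INF     2e10f   // assumed "no hit" depth

struct Sphere {
    float r, b, g;
    float radius;
    float x, y, z;
    __device__ float hit( float ox, float oy, float *n ) const {
        float dx = ox - x;
        float dy = oy - y;
        if (dx*dx + dy*dy < radius*radius) {
            float dz = sqrtf( radius*radius - dx*dx - dy*dy );
            *n = dz / radius;
            return dz + z;
        }
        return -INF;
    }
};

__global__ void trace( const Sphere *s, unsigned char *ptr ) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int offset = x + y * blockDim.x * gridDim.x;
    float ox = (float)(x - DIM/2);   // shift so the z-axis passes through
    float oy = (float)(y - DIM/2);   // the center of the image

    float r = 0, g = 0, b = 0;       // background color if nothing is hit
    float maxz = -INF;
    for (int i = 0; i < SPHERES; i++) {
        float n;
        float t = s[i].hit( ox, oy, &n );
        if (t > maxz) {              // closer to the camera than earlier hits
            r = s[i].r * n;
            g = s[i].g * n;
            b = s[i].b * n;
            maxz = t;
        }
    }
    ptr[offset*4 + 0] = (unsigned char)(r * 255);
    ptr[offset*4 + 1] = (unsigned char)(g * 255);
    ptr[offset*4 + 2] = (unsigned char)(b * 255);
    ptr[offset*4 + 3] = 255;
}

// Hypothetical host helper: renders `scene` into a DIM x DIM RGBA buffer.
void render( const Sphere *scene, unsigned char *pixels ) {
    Sphere *dev_s;
    unsigned char *dev_bitmap;
    cudaMalloc( (void**)&dev_s, sizeof(Sphere) * SPHERES );
    cudaMalloc( (void**)&dev_bitmap, DIM * DIM * 4 );
    cudaMemcpy( dev_s, scene, sizeof(Sphere) * SPHERES,
                cudaMemcpyHostToDevice );
    dim3 grids( DIM/16, DIM/16 );
    dim3 threads( 16, 16 );
    trace<<<grids,threads>>>( dev_s, dev_bitmap );
    cudaMemcpy( pixels, dev_bitmap, DIM * DIM * 4, cudaMemcpyDeviceToHost );
    cudaFree( dev_s );
    cudaFree( dev_bitmap );
}

int main( void ) {
    // Member order is r, b, g, radius, x, y, z: a small red sphere at the
    // origin in front of a larger green sphere at z = -50.
    Sphere scene[SPHERES] = {
        { 1.0f, 0.0f, 0.0f, 10.0f, 0.0f, 0.0f,   0.0f },
        { 0.0f, 0.0f, 1.0f, 20.0f, 0.0f, 0.0f, -50.0f },
    };
    static unsigned char pixels[DIM * DIM * 4];
    render( scene, pixels );
    int c = (DIM/2 + (DIM/2) * DIM) * 4;   // center pixel
    printf( "center pixel: r=%d g=%d b=%d\n",
            pixels[c], pixels[c+1], pixels[c+2] );
    return 0;
}
```

In this scene the ray through the center pixel passes through both spheres, and the red one wins because its hit has the larger z value, which is the hidden-surface behavior the text describes.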
Note that if no spheres have been hit, the color that we store will be whatever
color we initialized the variables r, b, and g to. In this case, we set r, b, and g
to zero so the background will be black. You can change these values to render
a different color background. Figure 6.2 shows an example of what the output
should look like when rendered with 20 spheres and a black background.
Since we randomly generated the sphere positions, colors, and sizes, we advise
you not to panic if your output doesn't match this image identically.
RAY TRACING WITH CONSTANT MEMORY
You may have noticed that we never mentioned constant memory in the ray
tracing example. Now it's time to improve this example using the benefits of
constant memory. Since we cannot modify constant memory, we clearly can't
use it for the output image data. And this example has only one input, the array
of spheres, so it should be pretty obvious what data we will store in constant
memory.
The mechanism for declaring memory constant is identical to the one we used for
declaring a buffer as shared memory. Instead of declaring our array like this:
Sphere *s;
we add the modifier __constant__ before it:
Notice that in the original example, we declared a pointer and then used
cudaMalloc() to allocate GPU memory for it. When we changed it to constant
memory, we also changed the declaration to statically allocate the space in
constant memory. We no longer need to worry about calling cudaMalloc() or
cudaFree() for our array of spheres, but we do need to commit to a size for this
array at compile-time. For many applications, this is an acceptable trade-off for
the performance benefits of constant memory. We will talk about these benefits
momentarily, but first we will look at how the use of constant memory changes
our main() routine:
Largely, this is identical to the previous implementation of main(). As we
mentioned previously, we no longer need the call to cudaMalloc() to allocate
space for our array of spheres. The other change has been highlighted in the
listing:
We use this special version of cudaMemcpy() when we copy from host
memory to constant memory on the GPU. The only difference between
cudaMemcpyToSymbol() and cudaMemcpy() using cudaMemcpyHostToDevice
is that cudaMemcpyToSymbol() copies to constant memory and
cudaMemcpy() copies to global memory.
Outside the __constant__ modifier and the two changes to main(), the
versions with and without constant memory are identical.
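As a compact illustration of the two changes, the following sketch declares a statically sized __constant__ array and fills it with cudaMemcpyToSymbol(). To keep it short it uses a trimmed-down Sphere and a kernel that merely counts how many spheres cover each pixel; the names count_cover and cover_counts and the sizes DIM and SPHERES are assumptions for this example, not the chapter's code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define DIM     64   // assumed image edge length (multiple of 16)
#define SPHERES 2    // assumed; the size must be fixed at compile time

struct Sphere {      // trimmed-down stand-in for the ray tracer's structure
    float radius;
    float x, y, z;
};

// Statically allocated in constant memory: no cudaMalloc() or cudaFree().
__constant__ Sphere s[SPHERES];

// Hypothetical kernel: counts the spheres whose silhouette covers the pixel.
// In each loop iteration, every thread reads the same element s[i].
__global__ void count_cover( int *out ) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int offset = x + y * blockDim.x * gridDim.x;
    float ox = (float)(x - DIM/2);
    float oy = (float)(y - DIM/2);
    int count = 0;
    for (int i = 0; i < SPHERES; i++) {
        float dx = ox - s[i].x;
        float dy = oy - s[i].y;
        if (dx*dx + dy*dy < s[i].radius * s[i].radius)
            count++;
    }
    out[offset] = count;
}

// Hypothetical host helper: uploads the scene and fetches per-pixel counts.
void cover_counts( const Sphere *scene, int *host_out ) {
    int *dev_out;
    cudaMalloc( (void**)&dev_out, DIM * DIM * sizeof(int) );
    // Host-to-constant-memory copy; the destination is the symbol itself.
    cudaMemcpyToSymbol( s, scene, sizeof(Sphere) * SPHERES );
    dim3 grids( DIM/16, DIM/16 );
    dim3 threads( 16, 16 );
    count_cover<<<grids,threads>>>( dev_out );
    cudaMemcpy( host_out, dev_out, DIM * DIM * sizeof(int),
                cudaMemcpyDeviceToHost );
    cudaFree( dev_out );
}

int main( void ) {
    Sphere scene[SPHERES] = {
        { 10.0f, 0.0f, 0.0f,   0.0f },
        { 20.0f, 0.0f, 0.0f, -50.0f },
    };
    static int counts[DIM * DIM];
    cover_counts( scene, counts );
    printf( "spheres covering the center pixel: %d\n",
            counts[DIM/2 + (DIM/2) * DIM] );
    return 0;
}
```

Note that the kernel no longer takes the sphere array as a parameter; it refers to the file-scope symbol s directly.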
PERFORMANCE WITH CONSTANT MEMORY
Declaring memory as __constant__ constrains our usage to be read-only. In
taking on this constraint, we expect to get something in return. As we previously
mentioned, reading from constant memory can conserve memory bandwidth
when compared to reading the same data from global memory. There are two
reasons why reading from the 64KB of constant memory can save bandwidth over
standard reads of global memory:
• A single read from constant memory can be broadcast to other "nearby"
threads, effectively saving up to 15 reads.
• Constant memory is cached, so consecutive reads of the same address will not
incur any additional memory traffic.
What do we mean by the word nearby? To answer this question, we will need to
explain the concept of a warp. For those readers who are more familiar with Star
Trek than with weaving, a warp in this context has nothing to do with the speed
of travel through space. In the world of weaving, a warp refers to the group
of threads being woven together into fabric. In the CUDA Architecture, a warp
refers to a collection of 32 threads that are "woven together" and get executed in
lockstep. At every line in your program, each thread in a warp executes the same
instruction on different data.
When it comes to handling constant memory, NVIDIA hardware can broadcast
a single memory read to each half-warp. A half-warp (not nearly as creatively
named as a warp) is a group of 16 threads: half of a 32-thread warp. If every
thread in a half-warp requests data from the same address in constant memory,
your GPU will generate only a single read request and subsequently broadcast
the data to every thread. If you are reading a lot of data from constant memory,
you will generate only 1/16 (roughly 6 percent) of the memory traffic that you
would when using global memory.
But the savings don't stop at a 94 percent reduction in bandwidth when
reading constant memory. Because we have committed to leaving the memory
unchanged, the hardware can aggressively cache the constant data on the GPU.
So after the first read from an address in constant memory, other half-warps
requesting the same address, and therefore hitting the constant cache, will
generate no additional memory traffic.
In the case of our ray tracer, every thread in the launch reads the data
corresponding to the first sphere so the thread can test its ray for intersection.
After we modify our application to store the spheres in constant memory, the
hardware needs to make only a single request for this data. After caching the data,
every other thread avoids generating memory traffic as a result of one of the two
constant memory benefits:
• It receives the data in a half-warp broadcast.
• It retrieves the data from the constant memory cache.
Unfortunately, there can potentially be a downside to performance when using
constant memory. The half-warp broadcast feature is in actuality a double-edged
sword. Although it can dramatically accelerate performance when all 16 threads
are reading the same address, it actually slows performance to a crawl when all
16 threads read different addresses.
The trade-off to allowing the broadcast of a single read to 16 threads is that the 16
threads are allowed to place only a single read request at a time. For example,
if all 16 threads in a half-warp need different data from constant memory, the
16 different reads get serialized, effectively taking 16 times the amount of time
to place the request. If they were reading from conventional global memory, the
request could be issued at the same time. In this case, reading from constant
memory would probably be slower than using global memory.
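The two access patterns can be contrasted with a small sketch. Both kernels below add up the same 16-entry constant table, so they produce identical results; they differ only in whether neighboring threads ask for the same address or for different addresses in a given iteration. The names table, uniform_read, scattered_read, and pattern_errors are assumptions for this illustration, and the sketch makes no claim about how large the timing difference is on any particular GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N       16    // table entries; chosen to match the half-warp size
#define THREADS 256   // assumed launch size (one block)

__constant__ int table[N];

// Every thread reads table[i] in iteration i: one address per iteration,
// the pattern that can be served by a broadcast.
__global__ void uniform_read( int *out ) {
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += table[i];
    out[threadIdx.x] = sum;
}

// Neighboring threads read different entries in the same iteration,
// the pattern whose requests get serialized.
__global__ void scattered_read( int *out ) {
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += table[(threadIdx.x + i) % N];
    out[threadIdx.x] = sum;
}

// Hypothetical host helper: runs one pattern and returns how many threads
// produced a sum different from the expected total.
int pattern_errors( int scattered ) {
    int host_table[N], expected = 0;
    for (int i = 0; i < N; i++) {
        host_table[i] = i + 1;
        expected += i + 1;
    }
    cudaMemcpyToSymbol( table, host_table, sizeof(host_table) );

    int host_out[THREADS], *dev_out;
    cudaMalloc( (void**)&dev_out, sizeof(host_out) );
    if (scattered)
        scattered_read<<<1,THREADS>>>( dev_out );
    else
        uniform_read<<<1,THREADS>>>( dev_out );
    cudaMemcpy( host_out, dev_out, sizeof(host_out), cudaMemcpyDeviceToHost );
    cudaFree( dev_out );

    int errors = 0;
    for (int i = 0; i < THREADS; i++)
        if (host_out[i] != expected)
            errors++;
    return errors;
}

int main( void ) {
    printf( "uniform errors: %d, scattered errors: %d\n",
            pattern_errors( 0 ), pattern_errors( 1 ) );
    return 0;
}
```

Timing the two kernels on your own hardware is the only reliable way to see how much the second pattern costs.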
Measuring Performance with Events
Fully aware that there may be either positive or negative implications, you have
changed your ray tracer to use constant memory. How do you determine how this
has impacted the performance of your program? One of the simplest metrics
involves answering this simple question: Which version takes less time to finish?
We could use one of the CPU or operating system timers, but this will include
latency and variation from any number of sources (operating system thread
scheduling, availability of high-precision CPU timers, and so on). Furthermore,
while the GPU kernel runs, we may be asynchronously performing computation
on the host. The only way to time these host computations is using the CPU or
operating system timing mechanism. So to measure the time a GPU spends on a
task, we will use the CUDA event API.
An event in CUDA is essentially a GPU time stamp that is recorded at a
user-specified point in time. Since the GPU itself is recording the time stamp, it
eliminates a lot of the problems we might encounter when trying to time GPU
execution with CPU timers. The API is relatively easy to use, since taking a time
stamp consists of just two steps: creating an event and subsequently recording
an event. For example, at the beginning of some sequence of code, we instruct
the CUDA runtime to make a record of the current time. We do so by creating and
then recording the event:
cudaEvent_t start;
cudaEventCreate(&start);
cudaEventRecord( start, 0 );
You will notice that when we instruct the runtime to record the event start, we
also pass it a second argument. In the previous example, this argument is 0. The
exact nature of this argument is unimportant for our purposes right now, so we
intend to leave it mysteriously unexplained rather than open a new can of worms.
If your curiosity is killing you, we intend to discuss this when we talk about
streams.
To time a block of code, we will want to create both a start event and a stop event.
We will have the CUDA runtime record when we start, tell it to do some other work
on the GPU, and then tell it to record when we've stopped:
cudaEventRecord( stop, 0 );
Unfortunately, there is still a problem with timing GPU code in this way. The fix will
require only one line of code but will require some explanation. The trickiest part of
using events arises as a consequence of the fact that some of the calls we make in
CUDA C are actually asynchronous. For example, when we launched the kernel in
our ray tracer, the GPU begins executing our code, but the CPU continues executing
the next line of our program before the GPU finishes. This is excellent from a
performance standpoint because it means we can be computing something on the
GPU and CPU at the same time, but conceptually it makes timing tricky.
You should imagine calls to cudaEventRecord() as an instruction to record
the current time being placed into the GPU's pending queue of work. As a result,
our event won't actually be recorded until the GPU finishes everything prior to the
call to cudaEventRecord(). In terms of having our stop event measure the
correct time, this is precisely what we want. But we cannot safely read the value
of the stop event until the GPU has completed its prior work and recorded the
stop event. Fortunately, we have a way to instruct the CPU to synchronize on an
event, the event API function cudaEventSynchronize():
cudaEventRecord( stop, 0 );
cudaEventSynchronize( stop );
Now, we have instructed the runtime to block further instructions until the GPU
has reached the stop event. When the call to cudaEventSynchronize()
returns, we know that all GPU work before the stop event has completed, so it
is safe to read the time stamp recorded in stop. It is worth noting that because
CUDA events get implemented directly on the GPU, they are unsuitable for timing
mixtures of device and host code. That is, you will get unreliable results if you
attempt to use CUDA events to time more than kernel executions and memory
copies involving the device.
MEASURING RAY TRACER PERFORMANCE
To time our ray tracer, we will need to create a start and stop event, just as we did
when learning about events. The following is a timing-enabled version of the ray
tracer that does not use constant memory:
    float elapsedTime;
    HANDLE_ERROR( cudaEventElapsedTime( &elapsedTime,
                                        start, stop ) );
    printf( "Time to generate: %3.1f ms\n", elapsedTime );
    // display
    bitmap.display_and_exit();
Notice that we have thrown two additional functions into the mix, the calls
to cudaEventElapsedTime() and cudaEventDestroy(). The function
cudaEventElapsedTime() is a utility that computes the elapsed time between
two previously recorded events. The time in milliseconds elapsed between the
two events is returned in the first argument, the address of a floating-point
variable.
The call to cudaEventDestroy() needs to be made when we're finished
using an event created with cudaEventCreate(). This is identical to calling
free() on memory previously allocated with malloc(), so we needn't
stress how important it is to match every cudaEventCreate() with a
cudaEventDestroy().
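Putting the five event calls together, a minimal, self-contained timing sketch might look like the following. The kernel busy_kernel and the helper time_kernel_ms are stand-ins invented for this example (any GPU work could take the kernel's place), and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical workload: some arithmetic per element so there is
// something to time.
__global__ void busy_kernel( float *data, int n ) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 1000; k++)
            v = v * 0.5f + 1.0f;
        data[i] = v;
    }
}

// Hypothetical host helper: returns the GPU time of one launch in ms.
float time_kernel_ms( int n ) {
    float *dev;
    cudaMalloc( (void**)&dev, n * sizeof(float) );
    cudaMemset( dev, 0, n * sizeof(float) );

    cudaEvent_t start, stop;
    cudaEventCreate( &start );
    cudaEventCreate( &stop );

    cudaEventRecord( start, 0 );                   // queued before the kernel
    busy_kernel<<<(n + 255) / 256, 256>>>( dev, n );
    cudaEventRecord( stop, 0 );                    // queued after the kernel
    cudaEventSynchronize( stop );                  // wait until stop is recorded

    float elapsedTime = 0.0f;
    cudaEventElapsedTime( &elapsedTime, start, stop );

    cudaEventDestroy( start );                     // one destroy per create
    cudaEventDestroy( stop );
    cudaFree( dev );
    return elapsedTime;
}

int main( void ) {
    printf( "Time to run kernel: %3.1f ms\n", time_kernel_ms( 1 << 20 ) );
    return 0;
}
```

The measured value will vary from run to run and from GPU to GPU, so comparisons are most meaningful when both versions of a program are timed on the same machine.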
We can instrument the ray tracer that does use constant memory in the same
fashion:
// display
bitmap.display_and_exit();
Now when we run our two versions of the ray tracer, we can compare the time it
takes to complete the GPU work. This will tell us at a high level whether
introducing constant memory has improved the performance of our application or
worsened it. Fortunately, in this case, performance is improved dramatically
by using constant memory. Our experiments on a GeForce GTX 280 show the
constant memory ray tracer performing up to 50 percent faster than the version
that uses global memory. On a different GPU, your mileage might vary, although
the ray tracer that uses constant memory should always be at least as fast as the
version without it.
Chapter Review
In addition to the global and shared memory we explored in previous chapters,
NVIDIA hardware makes other types of memory available for our use. Constant
memory comes with additional constraints over standard global memory, but
in some cases, subjecting ourselves to these constraints can yield additional
performance. Specifically, we can see additional performance when threads in a
warp need access to the same read-only data. Using constant memory for data
with this access pattern can conserve bandwidth both because of the capacity to
broadcast reads across a half-warp and because of the presence of a constant
memory cache on chip. Memory bandwidth bottlenecks a wide class of
algorithms, so having mechanisms to ameliorate this situation can prove
incredibly useful.
We also learned how to use CUDA events to request the runtime to record time
stamps at specific points during GPU execution. We saw how to synchronize the
CPU with the GPU on one of these events and then how to compute the time
elapsed between two events. In doing so, we built up a method to compare the
running time between two different methods for ray tracing spheres, concluding
that, for the application at hand, using constant memory gained us a significant
amount of performance.
When we looked at constant memory, we saw how exploiting special memory
spaces under the right circumstances can dramatically accelerate applications.
We also learned how to measure these performance gains in order to make
informed decisions about performance choices. In this chapter, we will learn
about how to allocate and use texture memory. Like constant memory, texture
memory is another variety of read-only memory that can improve performance
and reduce memory traffic when reads have certain access patterns. Although
texture memory was originally designed for traditional graphics applications, it
can also be used quite effectively in some GPU computing applications.
Chapter Objectives
Through the course of this chapter, you will accomplish the following:
• You will learn about the performance characteristics of texture memory.
• You will learn how to use one-dimensional texture memory with CUDA C.
• You will learn how to use two-dimensional texture memory with CUDA C.
Texture Memory Overview
If you read the introduction to this chapter, the secret is already out: There is
yet another type of read-only memory that is available for use in your programs
written in CUDA C. Readers familiar with the workings of graphics hardware will
not be surprised, but the GPU's sophisticated texture memory may also be used
for general-purpose computing. Although NVIDIA designed the texture units for
the classical OpenGL and DirectX rendering pipelines, texture memory has some
properties that make it extremely useful for computing.
Like constant memory, texture memory is cached on chip, so in some situations it
will provide higher effective bandwidth by reducing memory requests to off-chip
DRAM. Specifically, texture caches are designed for graphics applications where
memory access patterns exhibit a great deal of spatial locality. In a computing
application, this roughly implies that a thread is likely to read from an address
"near" the address that nearby threads read, as shown in Figure 7.1.
[Figure 7.1: Thread 0, Thread 1, Thread 2, and Thread 3 and the addresses they read]
Arithmetically, the four addresses shown are not consecutive, so they would
not be cached together in a typical CPU caching scheme. But since GPU texture
caches are designed to accelerate access patterns such as this one, you will see
an increase in performance in this case when using texture memory instead of
global memory. In fact, this sort of access pattern is not incredibly uncommon in
general-purpose computing, as we shall see.
Simulating Heat Transfer
Physical simulations can be among the most computationally challenging
problems to solve. Fundamentally, there is often a trade-off between accuracy and
computational complexity. As a result, computer simulations have become more
and more important in recent years, thanks in large part to the increased
accuracy possible as a consequence of the parallel computing revolution. Since
many physical simulations can be parallelized quite easily, we will look at a very
simple simulation model in this example.
SIMPLE HEATING MODEL
To demonstrate a situation where you can effectively employ texture memory,
we will construct a simple two-dimensional heat transfer simulation. We start
by assuming that we have some rectangular room that we divide into a grid.
Inside the grid, we will randomly scatter a handful of "heaters" with various fixed
temperatures. Figure 7.2 shows an example of what this room might look like.
Given a rectangular grid and configuration of heaters, we are looking to
simulate what happens to the temperature in every grid cell as time progresses.
For simplicity, cells with heaters in them always remain a constant temperature.
At every step in time, we will assume that heat "flows" between a cell and its
neighbors. If a cell's neighbor is warmer than it is, the warmer neighbor will tend
to warm it up. Conversely, if a cell has a neighbor cooler than it is, it will cool off.
Qualitatively, Figure 7.3 represents this flow of heat.
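In our heat transfer model, we will compute the new temperature in a grid cell
as a sum of the differences between its temperature and the temperatures of its
neighbors, or essentially, an update equation as shown in Equation 7.1.
Equation 7.1
$$T_{NEW} = T_{OLD} + \sum_{NEIGHBORS} k \cdot (T_{NEIGHBOR} - T_{OLD})$$
In the equation for updating a cell's temperature, the constant k simply
represents the rate at which heat flows through the simulation. A large value of k
will drive the system to a constant temperature quickly, while a small value will
allow the solution to retain large temperature gradients longer. Since we consider
only four neighbors (top, bottom, left, right) and k and T_OLD remain constant in
the equation, this update becomes like the one shown in Equation 7.2.
Equation 7.2
$$T_{NEW} = T_{OLD} + k \cdot (T_{TOP} + T_{BOTTOM} + T_{LEFT} + T_{RIGHT} - 4 \cdot T_{OLD})$$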
As with the ray tracing example in the previous chapter, this model is not
intended to be close to what might be used in industry (in fact, it is not really
even an approximation of something physically accurate). We have simplified
this model immensely in order to draw attention to the techniques at hand. With
this in mind, let's take a look at how the update given by Equation 7.2 can be
computed on the GPU.
COMPUTING TEMPERATURE UPDATES
1. Given some grid of input temperatures, copy the temperature of cells
   with heaters to this grid. This will overwrite any previously computed
   temperatures in these cells, thereby enforcing our restriction that "heating
   cells" remain at a constant temperature. This copy gets performed in
   copy_const_kernel().
2. Given the input temperature grid, compute the output temperatures based on
   the update in Equation 7.2. This update gets performed in blend_kernel().
3. Swap the input and output buffers in preparation of the next time step. The
   output temperature grid computed in step 2 will become the input temperature
   grid that we start with in step 1 when simulating the next time step.
Before beginning the simulation, we assume we have generated a grid of
constants. Most of the entries in this grid are zero, but some entries contain
nonzero temperatures that represent heaters at fixed temperatures. This buffer
of constants will not change over the course of the simulation and gets read at
each time step.
Because of the way we are modeling our heat transfer, we start with the output
grid from the previous time step. Then, according to step 1, we copy the
temperatures of the cells with heaters into this output grid, overwriting any
previously computed temperatures. We do this because we have assumed that the
temperature of these heater cells remains constant. We perform this copy of the
constant grid onto the input grid with the following kernel:
The first three lines should look familiar. The first two lines convert a thread's
threadIdx and blockIdx into an x- and a y-coordinate. The third line
computes a linear offset into our constant and input buffers. The highlighted
line performs the copy of the heater temperature in cptr[] to the input grid in
iptr[]. Notice that the copy is performed only if the cell in the constant grid is
nonzero. We do this to preserve any values that were computed in the previous
time step within cells that do not contain heaters. Cells with heaters will have
nonzero entries in cptr[] and will therefore have their temperatures preserved
from step to step thanks to this copy kernel.
Step 2 of the algorithm is the most computationally involved. To perform the
updates, we can have each thread take responsibility for a single cell in our
simulation. Each thread will read its cell's temperature and the temperatures of
its neighboring cells, perform the previous update computation, and then update
its temperature with the new value. Much of this kernel resembles techniques
you've used before.
Notice that we start exactly as we did for the examples that produced images as
their output. However, instead of computing the color of a pixel, the threads are
computing temperatures of simulation grid cells. Nevertheless, they start by
converting their threadIdx and blockIdx into an x, y, and offset. You might
be able to recite these lines in your sleep by now (although for your sake, we hope
you aren’t actually reciting them in your sleep).
Next, we determine the offsets of our left, right, top, and bottom neighbors so
that we can read the temperatures of those cells. We will need those values to
compute the updated temperature in the current cell. The only complication here
is that we need to adjust indices on the border so that cells around the edges
do not wrap around. Finally, in the highlighted line, we perform the update from
Equation 7.2, adding the old temperature and the scaled differences of that
temperature and the cell’s neighbors’ temperatures.
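The three-step update can be sketched end to end as follows. This is a reconstruction from the prose under stated assumptions, not the chapter's own listing: the grid size DIM of 32, the value 0.25 for SPEED, the parameter names outSrc and inSrc, and the host helper simulate are choices made for this example. The kernel names and the cptr[] and iptr[] buffers follow the text.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define DIM   32      // assumed grid edge length (multiple of 16)
#define SPEED 0.25f   // assumed value of the constant k

// Step 1: re-impose the heater temperatures on the input grid.
__global__ void copy_const_kernel( float *iptr, const float *cptr ) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int offset = x + y * blockDim.x * gridDim.x;
    if (cptr[offset] != 0)
        iptr[offset] = cptr[offset];
}

// Step 2: apply the update of Equation 7.2 to every cell.
__global__ void blend_kernel( float *outSrc, const float *inSrc ) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int offset = x + y * blockDim.x * gridDim.x;

    int left = offset - 1, right = offset + 1;
    if (x == 0)       left++;          // border cells reuse their own value
    if (x == DIM - 1) right--;         // instead of wrapping around
    int top = offset - DIM, bottom = offset + DIM;
    if (y == 0)       top += DIM;
    if (y == DIM - 1) bottom -= DIM;

    outSrc[offset] = inSrc[offset] + SPEED * ( inSrc[top] + inSrc[bottom] +
                     inSrc[left] + inSrc[right] - 4 * inSrc[offset] );
}

// Hypothetical host helper: runs `steps` time steps from an all-zero grid
// and copies the final temperatures into host_grid.
void simulate( const float *host_const, float *host_grid, int steps ) {
    size_t bytes = DIM * DIM * sizeof(float);
    float *dev_const, *dev_in, *dev_out;
    cudaMalloc( (void**)&dev_const, bytes );
    cudaMalloc( (void**)&dev_in, bytes );
    cudaMalloc( (void**)&dev_out, bytes );
    cudaMemcpy( dev_const, host_const, bytes, cudaMemcpyHostToDevice );
    cudaMemset( dev_in, 0, bytes );
    cudaMemset( dev_out, 0, bytes );

    dim3 blocks( DIM/16, DIM/16 );
    dim3 threads( 16, 16 );
    for (int i = 0; i < steps; i++) {
        copy_const_kernel<<<blocks,threads>>>( dev_in, dev_const );  // step 1
        blend_kernel<<<blocks,threads>>>( dev_out, dev_in );         // step 2
        float *tmp = dev_in; dev_in = dev_out; dev_out = tmp;        // step 3
    }
    // After the final swap, dev_in holds the most recent output.
    cudaMemcpy( host_grid, dev_in, bytes, cudaMemcpyDeviceToHost );
    cudaFree( dev_const );
    cudaFree( dev_in );
    cudaFree( dev_out );
}

int main( void ) {
    static float heaters[DIM * DIM], grid[DIM * DIM];
    int center = DIM/2 + (DIM/2) * DIM;
    heaters[center] = 1.0f;            // a single heater in the middle
    simulate( heaters, grid, 1 );
    printf( "after one step: center=%g right neighbor=%g\n",
            grid[center], grid[center + 1] );
    return 0;
}
```

After a single step from a cold grid, the heater cell has handed a quarter of its heat to each of its four neighbors, which is a quick way to sanity-check the update against Equation 7.2 by hand.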
#include "cuda.h"
#include "../common/book.h"
#include "../common/cpu_anim.h"
We have equipped the code with event-based timing as we did in the previous
chapter's ray tracing example. The timing code serves the same purpose as it did
previously. Since we will endeavor to accelerate the initial implementation, we
have put in place a mechanism by which we can measure performance and
convince ourselves that we have succeeded.
The function anim_gpu() gets called by the animation framework on every
frame. The arguments to this function are a pointer to a DataBlock and the
number of ticks of the animation that have elapsed. As with the animation
examples, we use blocks of 256 threads that we organize into a two-dimensional
grid of 16 x 16. Each iteration of the for() loop in anim_gpu() computes a
single time step of the simulation as described by the three-step algorithm
at the beginning of the section "Computing Temperature Updates." Since the
DataBlock contains the constant buffer of heaters as well as the output of the
last time step, it encapsulates the entire state of the animation, and consequently,
anim_gpu() does not actually need to use the value of ticks anywhere.
You will notice that we have chosen to do 90 time steps per frame. This number
is not magical but was determined somewhat experimentally as a reasonable
trade-off between having to download a bitmap image for every time step and
computing too many time steps per frame, resulting in a jerky animation. If you
were more concerned with getting the output of each simulation step than you
were with animating the results in real time, you could change this such that you
computed only a single step on each frame.
After computing the 90 time steps since the previous frame, anim_gpu()
is ready to copy a bitmap frame of the current animation back to the CPU.
Since the for() loop leaves the input and output swapped, we first swap
the input and output buffers so that the output actually contains the output
of the 90th time step. We convert the temperatures to colors using the
kernel float_to_color() and then copy the resultant image back to
the CPU with a cudaMemcpy() that specifies the direction of copy as
cudaMemcpyDeviceToHost. Finally, to prepare for the next sequence of time
steps, we swap the output buffer back to the input buffer since it will serve as
input to the next time step.
Figure 7.4 shows an example of what the output might look like. You will notice in
the image some of the “heaters” that appear to be pixel-sized islands that disrupt
the continuity of the temperature distribution.
designed to accelerate. Given that we want to use texture memory, we need to
learn the mechanics of doing so.
)LUVWZHZLOOQHHGWRGHFODUHRXULQSXWVDVWH[WXUHUHIHUHQFHV:HZLOOXVHUHIHU-
HQFHVWRȍRDWLQJSRLQWWH[WXUHVVLQFHRXUWHPSHUDWXUHGDWDLVȍRDWLQJSRLQW
7KHQH[WPDMRUGLIIHUHQFHLVWKDWDIWHUDOORFDWLQJ*38PHPRU\IRUWKHVH
WKUHHEXIIHUVZHQHHGWRbindWKHUHIHUHQFHVWRWKHPHPRU\EXIIHUXVLQJ
cudaBindTexture()7KLVEDVLFDOO\WHOOVWKH&8'$UXQWLPHWZRWKLQJV
ǩ :HLQWHQGWRXVHWKHVSHFLȌHGEXIIHUDVDWH[WXUH
ǩ :HLQWHQGWRXVHWKHVSHFLȌHGWH[WXUHUHIHUHQFHDVWKHWH[WXUHǢVǤQDPHǥ
After the three allocations in our heat transfer simulation, we bind the three
allocations to the texture references declared earlier (texConstSrc, texIn, and
texOut).

At this point, our textures are completely set up, and we’re ready to launch
our kernel. However, when we’re reading from textures in the kernel, we need to
use special functions to instruct the GPU to route our requests through the
texture unit and not through standard global memory. As a result, we can no
longer simply use square brackets to read from buffers; we need to modify
blend_kernel() to use tex1Dfetch() when reading from memory.

Additionally, there is another difference between using global and texture
memory that requires us to make another change. Although it looks like a
function, tex1Dfetch() is a compiler intrinsic. And since texture references
must be declared globally at file scope, we can no longer pass the input and
output buffers as parameters to blend_kernel(), because the compiler needs to
know at compile time which textures tex1Dfetch() should be sampling. Rather
than passing pointers to input and output buffers as we previously did, we will
pass to blend_kernel() a boolean flag dstOut that indicates which buffer to
use as input and which to use as output. The changes to blend_kernel() are
highlighted here:
float t, l, c, r, b;
if (dstOut) {
t = tex1Dfetch(texIn,top);
l = tex1Dfetch(texIn,left);
c = tex1Dfetch(texIn,offset);
r = tex1Dfetch(texIn,right);
b = tex1Dfetch(texIn,bottom);
} else {
t = tex1Dfetch(texOut,top);
l = tex1Dfetch(texOut,left);
c = tex1Dfetch(texOut,offset);
r = tex1Dfetch(texOut,right);
b = tex1Dfetch(texOut,bottom);
}
dst[offset] = c + SPEED * (t + b + r + l - 4 * c);
}
Since the copy_const_kernel() kernel reads from our buffer that holds the
heater positions and temperatures, we will need to make a similar modification
there in order to read through texture memory instead of global memory:
float c = tex1Dfetch(texConstSrc,offset);
if (c != 0)
iptr[offset] = c;
}
Since the signature of blend_kernel() changed to accept a flag that switches
the buffers between input and output, we need a corresponding change to
the anim_gpu() routine. Rather than swapping buffers, we set dstOut =
!dstOut to toggle the flag after each series of calls:
} else {
out = d->dev_inSrc;
in = d->dev_outSrc;
}
copy_const_kernel<<<blocks,threads>>>( in );
blend_kernel<<<blocks,threads>>>( out, dstOut );
dstOut = !dstOut;
}
float_to_color<<<blocks,threads>>>( d->output_bitmap,
d->dev_inSrc );
The final change to our heat transfer routine involves cleaning up at the end
of the application’s run. Rather than just freeing the global buffers, we also
need to unbind textures:
texture<float,2> texConstSrc;
texture<float,2> texIn;
texture<float,2> texOut;
calls to tex2D() calls, we no longer need to use the linearized offset variable
to compute the set of offsets top, left, right, and bottom. When we switch to
a two-dimensional texture, we can use x and y directly to address the texture.

Furthermore, we no longer have to worry about bounds overflow when we switch
to using tex2D(). If one of x or y is less than zero, tex2D() will return the
value at zero. Likewise, if one of these values is greater than the width,
tex2D() will return the value at width - 1. Note that in our application, this
behavior is ideal, but it’s possible that other applications would desire other
behavior.

As a result of these simplifications, our kernel cleans up nicely.
float t, l, c, r, b;
if (dstOut) {
t = tex2D(texIn,x,y-1);
l = tex2D(texIn,x-1,y);
c = tex2D(texIn,x,y);
r = tex2D(texIn,x+1,y);
b = tex2D(texIn,x,y+1);
} else {
t = tex2D(texOut,x,y-1);
l = tex2D(texOut,x-1,y);
c = tex2D(texOut,x,y);
r = tex2D(texOut,x+1,y);
b = tex2D(texOut,x,y+1);
}
dst[offset] = c + SPEED * (t + b + r + l - 4 * c);
}
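The clamped addressing that the text attributes to tex2D() can be emulated on the host to see what the neighbor reads return at the borders. This is a minimal sketch under that assumption, not the texture hardware's implementation; clamp_coord() and fetch_clamped() are hypothetical names, and the data is assumed to be a row-major float array.

```c
/* Clamp an integer coordinate into the valid range [0, size - 1]. */
static int clamp_coord(int v, int size) {
    if (v < 0) {
        return 0;
    }
    if (v > size - 1) {
        return size - 1;
    }
    return v;
}

/* Host-side emulation of a clamped 2D read from a row-major array. */
static float fetch_clamped(const float *data, int width, int height,
                           int x, int y) {
    int cx = clamp_coord(x, width);
    int cy = clamp_coord(y, height);
    return data[cy * width + cx];
}
```

With this behavior, a cell on the left edge that asks for its neighbor at x-1 simply reads its own value again, so the update formula needs no special cases at the borders.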
Since all of our previous calls to tex1Dfetch() need to be changed to tex2D()
calls, we make the corresponding change in copy_const_kernel(). Similarly
to the kernel blend_kernel(), we no longer need to use offset to address
the texture; we simply use x and y to address the constant source:
float c = tex2D(texConstSrc,x,y);
if (c != 0)
iptr[offset] = c;
}
The final change to the one-dimensional texture version of our heat transfer
simulation is along the same lines as our previous changes. Specifically, in
main(), we need to change our texture binding calls to instruct the runtime
that the buffer we plan to use will be treated as a two-dimensional texture,
not a one-dimensional one:

As with the nontexture and one-dimensional texture versions, we begin
by allocating storage for our input arrays. We deviate from the
one-dimensional example because the CUDA runtime requires that we provide a
cudaChannelFormatDesc when we bind two-dimensional textures. The
previous listing includes a declaration of a channel format descriptor. In our
case, we can accept the default parameters and simply need to specify that
we require a floating-point descriptor. We then bind the three input buffers as
two-dimensional textures using cudaBindTexture2D(), the dimensions of
the texture (DIM x DIM), and the channel format descriptor (desc). The rest of
main() remains the same.

Although we needed different functions to instruct the runtime to bind
one-dimensional or two-dimensional textures, we use the same routine to unbind
the texture, cudaUnbindTexture(). Because of this, our cleanup routine can
remain unchanged.
cudaUnbindTexture( texConstSrc );
cudaFree( d->dev_inSrc );
cudaFree( d->dev_outSrc );
cudaFree( d->dev_constSrc );
The version of our heat transfer simulation that uses two-dimensional textures
has essentially identical performance characteristics as the version that uses
one-dimensional textures. So from a performance standpoint, the decision
between one- and two-dimensional textures is likely to be inconsequential. For
our particular application, the code is a little simpler when using
two-dimensional textures because we happen to be simulating a two-dimensional
domain. But in general, since this is not always the case, we suggest you make
the decision between one- and two-dimensional textures on a case-by-case basis.

Chapter Review

As we saw in the previous chapter with constant memory, some of the benefit of
texture memory comes as the result of on-chip caching. This is especially
noticeable in applications such as our heat transfer simulation: applications
that have some spatial coherence to their data access patterns. We saw how
either one- or two-dimensional textures can be used, both having similar
performance characteristics. As with a block or grid shape, the choice of one-
or two-dimensional texture is largely one of convenience. Since the code became
somewhat cleaner when we switched to two-dimensional textures and the borders
are handled automatically, we would probably advocate the use of a 2D texture
in our heat transfer application. But as you saw, it will work fine either way.

Texture memory can provide additional speedups if we utilize some of the
conversions that texture samplers can perform automatically, such as unpacking
packed data into separate variables or converting 8- and 16-bit integers to
normalized floating-point numbers. We didn’t explore either of these
capabilities in the heat transfer application, but they might be useful to you.
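As an illustration of the integer-to-normalized-float conversion mentioned above, the following host-side sketch shows the arithmetic for unsigned 8- and 16-bit inputs, mapping the full integer range onto [0.0, 1.0]. The function names are hypothetical; on the GPU this conversion is performed by the texture hardware when a texture is configured for normalized reads, not by code like this.

```c
/* Map an unsigned 8-bit value onto [0.0, 1.0]: 0 -> 0.0, 255 -> 1.0. */
static float normalize_u8(unsigned char v) {
    return (float)v / 255.0f;
}

/* Map an unsigned 16-bit value onto [0.0, 1.0]: 0 -> 0.0, 65535 -> 1.0. */
static float normalize_u16(unsigned short v) {
    return (float)v / 65535.0f;
}
```

Storing data as small integers and letting the sampler expand it in this way reduces the amount of memory that has to be read for each fetch.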
Since this book has focused on general-purpose computation, for the most part
we’ve ignored that GPUs contain some special-purpose components as well. The
GPU owes its success to its ability to perform complex rendering tasks in real
time, freeing the rest of the system to concentrate on other work. This leads
us to the obvious question: Can we use the GPU for both rendering and
general-purpose computation in the same application? What if the images we want
to render rely on the results of our computations? Or what if we want to take
the frame we’ve rendered and perform some image-processing or statistics
computations on it?

Fortunately, not only is this interaction between general-purpose computation
and rendering modes possible, but it’s fairly easy to accomplish given what you
already know. CUDA C applications can seamlessly interoperate with either of
the two most popular real-time rendering APIs, OpenGL and DirectX. This chapter
will look at the mechanics by which you can enable this functionality.

The examples in this chapter deviate some from the precedents we’ve set in
previous chapters. In particular, this chapter assumes a significant amount
about your background with other technologies. Specifically, we have included a
considerable amount of OpenGL and GLUT code in these examples, almost none of
which we will explain in great depth. There are many superb resources to learn
graphics APIs, both online and in bookstores, but these topics are well beyond
the intended scope of this book. Rather, this chapter intends to focus on
CUDA C and the facilities it offers to incorporate it into your graphics
applications. If you are unfamiliar with OpenGL or DirectX, you are unlikely to
derive much benefit from this chapter and may want to skip to the next.
Chapter Objectives

Through the course of this chapter, you will accomplish the following:
• You will learn how to set up a CUDA device for graphics interoperability.
• You will learn how to share data between your CUDA C kernels and OpenGL
  rendering.

Graphics Interoperation

To demonstrate the mechanics of interoperation between graphics and CUDA C,
we’ll write an application that works in two steps. The first step uses a
CUDA C kernel to generate image data. In the second step, the application
passes this data to the OpenGL driver to render. To accomplish this, we will
use much of the CUDA C we have seen in previous chapters along with some OpenGL
and GLUT calls.

To start our application, we include the relevant GLUT and CUDA headers in
order to ensure the correct functions and enumerations are defined. We also
define the size of the window into which our application plans to render. At
512 x 512 pixels, we will do relatively small drawings.
#define GL_GLEXT_PROTOTYPES
#include "GL/glut.h"
#include "cuda.h"
#include "cuda_gl_interop.h"
#include "../common/book.h"
#include "../common/cpu_bitmap.h"
Additionally, we declare two global variables that will store handles to the
data we intend to share between OpenGL and CUDA. We will see momentarily how we
use these two variables, but they will store different handles to the same
buffer. We need two separate variables because OpenGL and CUDA will both have
different “names” for the buffer. The variable bufferObj will be OpenGL’s name
for the data, and the variable resource will be the CUDA C name for it.
GLuint bufferObj;
cudaGraphicsResource *resource;
Now let’s take a look at the actual application. The first thing we do is
select a CUDA device on which to run our application. On many systems, this is
not a complicated process, since they will often contain only a single
CUDA-enabled GPU. However, an increasing number of systems contain more than
one CUDA-enabled GPU, so we need a method to choose one. Fortunately, the CUDA
runtime provides such a facility to us.

You may recall that we saw cudaChooseDevice() in Chapter 3, but since it was
something of an ancillary point, we’ll review it again now. Essentially, this
code tells the runtime to select any GPU that has a compute capability of
version 1.0 or better. It accomplishes this by first creating and clearing a
cudaDeviceProp structure and then by setting its major version to 1 and minor
version to 0. It passes this information to cudaChooseDevice(), which instructs
the runtime to select a GPU in the system that satisfies the constraints
specified by the cudaDeviceProp structure. In the next chapter, we will look
more at what is meant by a GPU’s compute capability, but for now it suffices to
say that it roughly indicates the features a GPU supports. All CUDA-capable
GPUs have at least compute capability 1.0, so the net effect of this call is
that the runtime will select any CUDA-capable device and return an identifier
for this device in the variable dev. There is no guarantee that this device is
the best or fastest GPU, nor is there a guarantee that the device will be the
same GPU from version to version of the CUDA runtime.

If the result of device selection is so seemingly underwhelming, why do
we bother with all this effort to fill a cudaDeviceProp structure and call
cudaChooseDevice() to get a valid device ID? Furthermore, we never hassled
with this tomfoolery before, so why now? These are good questions. It turns out
that we need to know the CUDA device ID so that we can tell the CUDA runtime
that we intend to use the device for CUDA and OpenGL. We achieve this with a
call to cudaGLSetGLDevice(), passing the device ID dev we obtained from
cudaChooseDevice():

After the CUDA runtime initialization, we can proceed to initialize the OpenGL
driver by calling our GL Utility Toolkit (GLUT) setup functions. This sequence
of calls should look relatively familiar if you’ve used GLUT before:

At this point in main(), we’ve prepared our CUDA runtime to play nicely with
the OpenGL driver by calling cudaGLSetGLDevice(). Then we initialized GLUT and
created a window named “bitmap” in which to draw our results. Now we can get
on to the actual OpenGL interoperation.

Shared data buffers are the key component to interoperation between CUDA C
kernels and OpenGL rendering. To pass data between OpenGL and CUDA, we will
first need to create a buffer that can be used with both APIs. We start this
process by creating a pixel buffer object in OpenGL and storing the handle in
our global variable GLuint bufferObj:
glGenBuffers( 1, &bufferObj );
glBindBuffer( GL_PIXEL_UNPACK_BUFFER_ARB, bufferObj );
glBufferData( GL_PIXEL_UNPACK_BUFFER_ARB, DIM * DIM * 4,
NULL, GL_DYNAMIC_DRAW_ARB );
All that remains in our quest to set up graphics interoperability is notifying
the CUDA runtime that we intend to share the OpenGL buffer named bufferObj
with CUDA. We do this by registering bufferObj with the CUDA runtime as a
graphics resource.
HANDLE_ERROR(
cudaGraphicsGLRegisterBuffer( &resource,
bufferObj,
cudaGraphicsMapFlagsNone )
);
We specify to the CUDA runtime that we intend to use the
OpenGL PBO bufferObj with both OpenGL and CUDA by calling
cudaGraphicsGLRegisterBuffer(). The CUDA runtime returns a CUDA-friendly
handle to the buffer in the variable resource. This handle will be used to
refer to bufferObj in subsequent calls to the CUDA runtime.

The flag cudaGraphicsMapFlagsNone specifies that there is no particular
behavior of this buffer that we want to specify, although we have the option to
specify with cudaGraphicsMapFlagsReadOnly that the buffer will be read-only.
We could also use cudaGraphicsMapFlagsWriteDiscard to specify
that the previous contents will be discarded, making the buffer essentially
write-only. These flags allow the CUDA and OpenGL drivers to optimize the
hardware settings for buffers with restricted access patterns, although they
are not required to be set.

Effectively, the call to glBufferData() requests the OpenGL driver to allocate
a buffer large enough to hold DIM x DIM 32-bit values. In subsequent OpenGL
calls, we’ll refer to this buffer with the handle bufferObj, while in CUDA
runtime calls, we’ll refer to this buffer with the pointer resource. Since we
would like to read from and write to this buffer from our CUDA C kernels, we
will need more than just a handle to the object. We will need an actual address
in device memory that can be passed to our kernel. We achieve this by
instructing the CUDA runtime to map the shared resource and then by requesting
a pointer to the mapped resource.
uchar4* devPtr;
size_t size;
HANDLE_ERROR( cudaGraphicsMapResources( 1, &resource, NULL ) );
HANDLE_ERROR(
cudaGraphicsResourceGetMappedPointer( (void**)&devPtr,
&size,
resource )
);
We can then use devPtr as we would use any device pointer, except that the data
can also be used by OpenGL as a pixel source. After all these setup
shenanigans, the rest of main() proceeds as follows: First, we launch our
kernel, passing it the pointer to our shared buffer. This kernel, the code of
which we have not seen yet, generates image data to be rendered. Next, we unmap
our shared resource. This call is important to make prior to performing
rendering tasks because it provides synchronization between the CUDA and
graphics portions of the application. Specifically, it implies that all CUDA
operations performed prior to the call to cudaGraphicsUnmapResources() will
complete before ensuing graphics calls begin.

Lastly, we register our keyboard and display callback functions with GLUT
(key_func and draw_func), and we relinquish control to the GLUT rendering
loop with glutMainLoop().
dim3 grids(DIM/16,DIM/16);
dim3 threads(16,16);
kernel<<<grids,threads>>>( devPtr );
The remainder of the application consists of the three functions we just
highlighted: kernel(), key_func(), and draw_func(). So, let’s take a look at
those.

The kernel function takes a device pointer and generates image data. In the
following example, we’re using a kernel inspired by the ripple example in
Chapter 5:

Many familiar concepts are at work here. The method for turning thread and
block indices into x- and y-coordinates and a linear offset has been examined
several times. We then perform some reasonably arbitrary computations to
determine the color for the pixel at that (x,y) location, and we store those
values to memory.
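The index arithmetic referred to above can be checked on the host. In this sketch the CUDA built-in variables threadIdx, blockIdx, blockDim, and gridDim are passed in as plain integers; the parameter names and the PixelIndex struct are illustrative, not part of the book's code.

```c
/* The (x, y) pixel coordinates and linear offset handled by one thread. */
typedef struct {
    int x;
    int y;
    int offset;
} PixelIndex;

/* Convert thread and block indices into pixel coordinates and a linear
   offset into a row-major image that is blockDim_x * gridDim_x wide. */
static PixelIndex pixel_index(int threadIdx_x, int threadIdx_y,
                              int blockIdx_x, int blockIdx_y,
                              int blockDim_x, int blockDim_y,
                              int gridDim_x) {
    PixelIndex p;
    p.x = threadIdx_x + blockIdx_x * blockDim_x;
    p.y = threadIdx_y + blockIdx_y * blockDim_y;
    p.offset = p.x + p.y * blockDim_x * gridDim_x;
    return p;
}
```

For example, with 16 x 16 thread blocks and a grid 32 blocks wide, thread (3,2) of block (1,1) handles pixel (19,18), which sits at linear offset 19 + 18 * 512 = 9235.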
We’re again using CUDA C to procedurally generate an image on the GPU. The
important thing to realize is that this image will then be handed directly to
OpenGL for rendering without the CPU ever getting involved. On the other hand,
in the ripple example of Chapter 5, we generated image data on the GPU very
much like this, but our application then copied the buffer back to the CPU for
display.

So, how do we draw the CUDA-generated buffer using OpenGL? Well, if you recall
the setup we performed in main(), you’ll remember the following:
glBindBuffer( GL_PIXEL_UNPACK_BUFFER_ARB, bufferObj );
This call bound the shared buffer as a pixel source for the OpenGL driver to
use in all subsequent calls to glDrawPixels(). Essentially, this means that
a call to glDrawPixels() is all that we need in order to render the image
data our CUDA C kernel generated. Consequently, the following is all that our
draw_func() needs to do:

It’s possible you’ve seen glDrawPixels() with a buffer pointer as the last
argument. The OpenGL driver will copy from this buffer if no buffer is bound as
a GL_PIXEL_UNPACK_BUFFER_ARB source. However, since our data is already on the
GPU and we have bound our shared buffer as the GL_PIXEL_UNPACK_BUFFER_ARB
source, this last parameter instead becomes an offset into the bound buffer.
Because we want to render the entire buffer, this offset is zero for our
application.

The last component to this example seems somewhat anticlimactic, but we’ve
decided to give our users a method to exit the application. In this vein, our
key_func() callback responds only to the Esc key and uses this as a signal to
clean up and exit:

When run, this example draws a mesmerizing picture in “NVIDIA Green” and
black, shown in Figure 8.1. Try using it to hypnotize your friends (or
enemies).

GPU Ripple with Graphics Interoperability

In the section “Graphics Interoperation,” we referred to Chapter 5’s GPU ripple
example a few times. If you recall, that application created a CPUAnimBitmap
and passed it a function to be called whenever a frame needed to be generated.

With the techniques we’ve learned in the previous section, we intend to create
a GPUAnimBitmap structure. This structure will serve the same purpose as the
CPUAnimBitmap, but in this improved version, the CUDA and OpenGL components
will cooperate without CPU intervention. When we’re done, the application
will use a GPUAnimBitmap so that main() will become simply as follows:
bitmap.anim_and_exit(
(void (*)(uchar4*,void*,int))generate_frame, NULL );
}
The GPUAnimBitmap structure uses the same calls we just examined in the
section “Graphics Interoperation.” However, now these calls will be abstracted
away in a GPUAnimBitmap structure so that future examples (and potentially
your own applications) will be cleaner.

THE GPUANIMBITMAP STRUCTURE

Several of the data members for our GPUAnimBitmap will look familiar to you
from the section “Graphics Interoperation.”
struct GPUAnimBitmap {
GLuint bufferObj;
cudaGraphicsResource *resource;
int width, height;
void *dataBlock;
void (*fAnim)(uchar4*,void*,int);
void (*animExit)(void*);
void (*clickDrag)(void*,int,int,int,int);
int dragStartX, dragStartY;
We know that OpenGL and the CUDA runtime will have different names for our
GPU buffer, and we know that we will need to refer to both of these names,
depending on whether we are making OpenGL or CUDA C calls. Therefore, our
structure will store both OpenGL’s bufferObj name and the CUDA runtime’s
resource name. Since we are dealing with a bitmap image that we intend to
display, we know that the image will have a width and height to it.

To allow users of our GPUAnimBitmap to register for certain callback events,
we will also store a void* pointer to arbitrary user data in dataBlock. Our
class will never look at this data but will simply pass it back to any
registered callback functions. The callbacks that a user may register are
stored in fAnim, animExit, and clickDrag. The function fAnim() gets called on
every invocation of the idle callback registered with glutIdleFunc(), and this
function is responsible for producing the image data that will be rendered in
the animation. The function animExit() will be called once, when the animation
exits. This is where the user should implement cleanup code that needs to be
executed when the animation ends. Finally, clickDrag(), an optional function,
implements the user’s response to mouse click/drag events. If the user
registers this function, it gets called after every sequence of mouse button
press, drag, and release events. The location of the initial mouse click in
this sequence is stored in (dragStartX, dragStartY) so that the start and
endpoints of the click/drag event can be passed to the user when the mouse
button is released. This can be used to implement interactive animations that
will impress your friends.
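The callback-with-opaque-user-data pattern described above can be sketched in plain C. This is a minimal sketch, not the contents of gpu_anim.h: DragTracker, on_press(), on_release(), and record_drag() are hypothetical names, and only the clickDrag path is modeled.

```c
#include <stddef.h>

/* A cut-down stand-in for the callback-related members of the structure. */
typedef struct {
    void *dataBlock;                               /* opaque user data, never inspected */
    void (*clickDrag)(void *, int, int, int, int); /* optional; may be NULL */
    int dragStartX, dragStartY;
} DragTracker;

/* On button press, remember where the drag started. */
static void on_press(DragTracker *t, int x, int y) {
    t->dragStartX = x;
    t->dragStartY = y;
}

/* On button release, hand start and end points to the user's callback. */
static void on_release(DragTracker *t, int x, int y) {
    if (t->clickDrag != NULL) {
        t->clickDrag(t->dataBlock, t->dragStartX, t->dragStartY, x, y);
    }
}

/* Example user callback: store the drag vector in the user's own int[2]. */
static void record_drag(void *data, int sx, int sy, int ex, int ey) {
    int *delta = (int *)data;
    delta[0] = ex - sx;
    delta[1] = ey - sy;
}
```

Because the framework only forwards dataBlock and never dereferences it, the user is free to point it at any application state the callbacks need.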
Initializing a GPUAnimBitmap follows the same sequence of code that we saw
in our previous example. After stashing away arguments in the appropriate
structure members, we start by querying the CUDA runtime for a suitable CUDA
device:

After finding a compatible CUDA device, we make the important
cudaGLSetGLDevice() call to the CUDA runtime in order to notify it that we
intend to use dev as a device for interoperation with OpenGL:
cudaGLSetGLDevice( dev );
Since our framework uses GLUT to create a windowed rendering environment, we
need to initialize GLUT. This is unfortunately a bit awkward, since glutInit()
wants command-line arguments to pass to the windowing system. Since we have
none we want to pass, we would like to simply specify zero command-line
arguments. Unfortunately, some versions of GLUT have a bug that causes
applications to crash when zero arguments are given. So, we trick GLUT into
thinking that we’re passing an argument, and as a result, life is good.
int c=1;
char *foo = "name";
glutInit( &c, &foo );
We continue initializing GLUT exactly as we did in the previous example. We
create a window in which to render, specifying a title with the string
“bitmap.” If you’d like to name your window something more interesting, be our
guest.

Next, we request for the OpenGL driver to allocate a buffer handle that we
immediately bind to the GL_PIXEL_UNPACK_BUFFER_ARB target to ensure that future
calls to glDrawPixels() will draw from our interop buffer:
glGenBuffers( 1, &bufferObj );
glBindBuffer( GL_PIXEL_UNPACK_BUFFER_ARB, bufferObj );
Last, but most certainly not least, we request that the OpenGL driver allocate
a region of GPU memory for us. Once this is done, we inform the CUDA runtime of
this buffer and request a CUDA C name for this buffer by registering bufferObj
with cudaGraphicsGLRegisterBuffer().
HANDLE_ERROR(
cudaGraphicsGLRegisterBuffer( &resource,
bufferObj,
cudaGraphicsMapFlagsNone ) );
}
With the GPUAnimBitmap set up, the only remaining concern is exactly how
we perform the rendering. The meat of the rendering will be done in our
GLUT idle callback, idle_func(). This function will essentially do three
things. First, it maps our shared buffer and retrieves a GPU pointer for this
buffer.
HANDLE_ERROR(
cudaGraphicsMapResources( 1, &(bitmap->resource), NULL )
);
HANDLE_ERROR(
cudaGraphicsResourceGetMappedPointer( (void**)&devPtr,
&size,
bitmap->resource )
);
Second, it calls the user-specified function fAnim() that presumably will
launch a CUDA C kernel to fill the buffer at devPtr with image data.

And lastly, it unmaps the GPU pointer, which releases the buffer for use by
the OpenGL driver in rendering. This rendering will be triggered by a call to
glutPostRedisplay().
HANDLE_ERROR(
cudaGraphicsUnmapResources( 1,
&(bitmap->resource),
NULL ) );
glutPostRedisplay();
}
The remainder of the GPUAnimBitmap structure consists of important but
somewhat tangential infrastructure code. If you have an interest in it, you
should by all means examine it. But we feel that you’ll be able to proceed
successfully, even if you lack the time or interest to digest the rest of the
code in GPUAnimBitmap.

GPU RIPPLE REDUX

Now that we have a GPU version of CPUAnimBitmap, we can proceed to
retrofit our GPU ripple application to perform its animation entirely on the
GPU. To begin, we will include gpu_anim.h, the home of our implementation of
GPUAnimBitmap. We also include nearly the same kernel as we examined in
Chapter 5.
#include "../common/book.h"
#include "../common/gpu_anim.h"
The one and only change we’ve made is highlighted. The reason for this change
is that OpenGL interoperation requires that our shared surfaces be “graphics
friendly.” Because real-time rendering typically uses arrays of four-component
(red/green/blue/alpha) data elements, our target buffer is no longer simply an
array of unsigned char as it previously was. It’s now required to be an array
of type uchar4. In reality, we treated our buffer in Chapter 5 as a
four-component buffer, so we always indexed it with ptr[offset*4+k], where k
indicates the component from 0 to 3. But now, the four-component nature of the
data is made explicit with the switch to a uchar4 type.
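The two ways of addressing a four-component pixel can be compared side by side on the host. In this sketch Pixel4 is a hypothetical stand-in for CUDA's uchar4 (four unsigned char members named x, y, z, and w), and write_flat() and write_packed() are illustrative helpers rather than code from the book.

```c
/* Hypothetical host-side stand-in for CUDA's uchar4 vector type. */
typedef struct {
    unsigned char x, y, z, w;
} Pixel4;

/* Flat layout: four consecutive bytes per pixel, indexed as ptr[offset*4 + k]. */
static void write_flat(unsigned char *ptr, int offset,
                       unsigned char r, unsigned char g,
                       unsigned char b, unsigned char a) {
    ptr[offset * 4 + 0] = r;
    ptr[offset * 4 + 1] = g;
    ptr[offset * 4 + 2] = b;
    ptr[offset * 4 + 3] = a;
}

/* Packed layout: one four-component element per pixel, indexed as ptr[offset]. */
static void write_packed(Pixel4 *ptr, int offset,
                         unsigned char r, unsigned char g,
                         unsigned char b, unsigned char a) {
    ptr[offset].x = r;
    ptr[offset].y = g;
    ptr[offset].z = b;
    ptr[offset].w = a;
}
```

Both helpers store the same four components for pixel number offset; the packed form simply moves the factor of four out of the index and into the element type.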
Since kernel() is a CUDA C function that generates image data, all that
remains is writing a host function that will be used as a callback in the
idle_func() member of GPUAnimBitmap. For our current application,
all this function does is launch the CUDA C kernel:

That’s basically everything we need, since all of the heavy lifting was
done in the GPUAnimBitmap structure. To get this party started, we just
create a GPUAnimBitmap and register our animation callback function,
generate_frame().
bitmap.anim_and_exit(
(void (*)(uchar4*,void*,int))generate_frame, NULL );
}
Heat Transfer with Graphics Interop

So, what has been the point of doing all of this? If you look at the internals
of the CPUAnimBitmap, the structure we used for previous animation examples,
you would see that it works almost exactly like the rendering code in the
section “Graphics Interoperation.”

Almost.

The key difference between the CPUAnimBitmap and the previous example is
buried in the call to glDrawPixels().
glDrawPixels( bitmap->x,
bitmap->y,
GL_RGBA,
GL_UNSIGNED_BYTE,
bitmap->pixels );
We remarked in the first example of this chapter that you may have previously
seen calls to glDrawPixels() with a buffer pointer as the last argument.
Well, if you hadn’t before, you have now. This call in the Draw() routine of
CPUAnimBitmap triggers a copy of the CPU buffer in bitmap->pixels to the
GPU for rendering. To do this, the CPU needs to stop what it’s doing and
initiate a copy onto the GPU for every frame. This requires synchronization
between the CPU and GPU and additional latency to initiate and complete a
transfer over the PCI Express bus. Since the call to glDrawPixels() expects a
host pointer in the last argument, this also means that after generating a
frame of image data with a CUDA C kernel, our Chapter 5 ripple application
needed to copy the frame from the GPU to the CPU with a cudaMemcpy().

Taken together, these facts mean that our original GPU ripple application
was more than a little silly. We used CUDA C to compute image values for our
rendering in each frame, but after the computations were done, we copied the
buffer to the CPU, which then copied the buffer back to the GPU for display.
This means that we introduced unnecessary data transfers between the host and
the device that stood between us and maximum performance. Let’s revisit a
compute-intensive animation application that might see its performance improve
by migrating it to use graphics interoperation for its rendering.
If you recall the previous chapter’s heat simulation application, you will
remember that it also used CPUAnimBitmap in order to display the output of its
simulation computations. We will modify this application to use our newly
implemented GPUAnimBitmap structure and look at how the resulting performance
changes. As with the ripple example, our GPUAnimBitmap is almost a perfect
drop-in replacement for CPUAnimBitmap, with the exception of the unsigned
char to uchar4 change. So, the signature of our animation routine changes in
order to accommodate this shift in data types.

Since the float_to_color() kernel is the only function that actually uses the
outputBitmap, it’s the only other function that needs modification as a result
of our shift to uchar4. This function was simply considered utility code in the
previous chapter, and we will continue to consider it utility code. However, we
have overloaded this function and included both unsigned char and uchar4
versions in book.h. You will notice that the differences between these
functions are identical to the differences between kernel() in the CPU-animated
and GPU-animated versions of GPU ripple. Most of the code for the
float_to_color() kernels has been omitted for clarity, but we encourage you to
consult book.h if you’re dying to see the details.

Outside of these changes, the only major difference is in the change from
CPUAnimBitmap to GPUAnimBitmap to perform animation.
Although it might be instructive to take a glance at the rest of this enhanced
heat simulation application, it is not sufficiently different from the previous
chapter’s version to warrant more description. The important component is
answering the question, how does performance change now that we’ve completely
migrated the application to the GPU? Without having to copy every frame back to
the host for display, the situation should be much happier than it was
previously.
So, exactly how much better is it to use the graphics interoperability to
perform the rendering? Previously, the heat transfer example consumed about
25.3 ms per frame on our GeForce GTX 285-based test machine. After converting
the application to use graphics interoperability, this drops by 15 percent to
21.6 ms per frame. The net result is that our rendering loop is 15 percent
faster and no longer requires intervention from the host every time we want to
display a frame. That’s not bad for a day’s work.
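When comparing per-frame timings like these, it helps to keep the reduction in time per frame and the gain in frame rate apart, since the two percentages differ. The sketch below shows the arithmetic; the function names are illustrative, and the timings in the note that follows are hypothetical values chosen for easy arithmetic, not measurements from the book.

```c
/* Percentage by which the time per frame shrank. */
static double percent_time_saved(double before_ms, double after_ms) {
    return 100.0 * (before_ms - after_ms) / before_ms;
}

/* Frames rendered per second for a given time per frame. */
static double frames_per_second(double ms_per_frame) {
    return 1000.0 / ms_per_frame;
}
```

With hypothetical timings of 20.0 ms before and 17.0 ms after, the time per frame drops by 15 percent, while the frame rate rises from 50 to roughly 58.8 frames per second, an increase of about 17.6 percent.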
DirectX Interoperability

Although we’ve looked only at examples that use interoperation with the OpenGL
rendering system, DirectX interoperation is nearly identical. You will still
use a cudaGraphicsResource to refer to buffers that you share between DirectX
and CUDA, and you will still use calls to cudaGraphicsMapResources() and
cudaGraphicsResourceGetMappedPointer() to retrieve CUDA-friendly
pointers to these shared resources.

For the most part, the calls that differ between OpenGL and DirectX
interoperability have embarrassingly simple translations to DirectX. For
example, rather than calling cudaGLSetGLDevice(), we call
cudaD3D9SetDirect3DDevice() to specify that a CUDA device should be enabled for
Direct3D 9 interoperability. Likewise, cudaD3D10SetDirect3DDevice() enables a
device for Direct3D 10 interoperation and cudaD3D11SetDirect3DDevice() for
Direct3D 11.

The details of DirectX interoperability probably will not surprise you if
you’ve worked through this chapter’s OpenGL examples. But if you want to use
DirectX interoperation and want a small project to get started, we suggest that
you migrate this chapter’s examples to use DirectX. To get started, we
recommend consulting the NVIDIA CUDA Programming Guide for a reference on the
API and taking a look at the GPU Computing SDK code samples on DirectX
interoperability.
Chapter Review

Although much of this book has been devoted to using the GPU for parallel,
general-purpose computing, we can’t forget the GPU’s successful day job as a
rendering engine. Many applications require or would benefit from the use of
standard computer graphics rendering. Since the GPU is master of the rendering
domain, all that stood between us and the exploitation of these resources was
a lack of understanding of the mechanics in convincing the CUDA runtime and
graphics drivers to cooperate. Now that we have seen how this is done, we
no longer need the host to intervene in displaying the graphical results of our
computations. This simultaneously accelerates the application’s rendering loop
and frees the host to perform other computations in the meantime. Otherwise,
if there are no other computations to be performed, it leaves our system more
responsive to other events or applications.

There are many other ways to use graphics interoperability that we left
unexplored. We looked primarily at using a CUDA C kernel to write into a pixel
buffer object for display in a window. This image data can also be used as a
texture that can be applied to any surface in the scene. In addition to
modifying pixel buffer objects, you can also share vertex buffer objects
between CUDA and the graphics engine. Among other things, this allows you to
write CUDA C kernels that perform collision detection between objects or
compute vertex displacement maps to be used to render objects or surfaces that
interact with the user or their surroundings. If you’re interested in computer
graphics, CUDA C’s graphics interoperability API enables a slew of new
possibilities for your applications.
In the first half of the book, we saw many occasions where something
complicated to accomplish with a single-threaded application becomes quite easy
when implemented using CUDA C. For example, thanks to the behind-the-scenes
work of the CUDA runtime, we no longer needed for() loops in order to do
per-pixel updates in our animations or heat simulations. Likewise, thousands of
parallel blocks and threads get created and automatically enumerated with
thread and block indices simply by calling a __global__ function from host
code.

On the other hand, there are some situations where something incredibly simple
in single-threaded applications actually presents a serious problem when we try
to implement the same algorithm on a massively parallel architecture. In this
chapter, we’ll take a look at some of the situations where we need to use
special primitives in order to safely accomplish things that can be quite
trivial to do in a traditional, single-threaded application.
Chapter Objectives

Through the course of this chapter, you will accomplish the following:
• You will learn about the compute capability of various NVIDIA GPUs.
• You will learn about what atomic operations are and why you might need them.
• You will learn how to perform arithmetic with atomic operations in your CUDA
  C kernels.

Compute Capability

All of the topics we have covered to this point involve capabilities that every
CUDA-enabled GPU possesses. For example, every GPU built on the CUDA
Architecture can launch kernels, access global memory, and read from constant
and texture memories. But just like different models of CPUs have varying
capabilities and instruction sets (for example, MMX, SSE, or SSE2), so too do
CUDA-enabled graphics processors. NVIDIA refers to the supported features of a
GPU as its compute capability.

THE COMPUTE CAPABILITY OF NVIDIA GPUS

As of press time, NVIDIA GPUs could potentially support compute capabilities
1.0, 1.1, 1.2, 1.3, or 2.0. Higher-capability versions represent supersets of
the versions below them, implementing a “layered onion” or “Russian nesting
doll” hierarchy (depending on your metaphorical preference). For example, a GPU
with compute capability 1.2 supports all the features of compute capabilities
1.0 and 1.1. The NVIDIA CUDA Programming Guide contains an up-to-date list of
all CUDA-capable GPUs and their corresponding compute capability. Table 9.1
lists the NVIDIA GPUs available at press time. The compute capability supported
by each GPU is listed next to the device’s name.
[Table 9.1: NVIDIA GPUs and their compute capabilities. Columns: GPU, Compute Capability. Rows cover the GeForce, Tesla, Quadro Plex, Quadro FX, and Quadro NVS product lines.]
Of course, since NVIDIA releases new graphics processors all the time, this table will undoubtedly be out-of-date the moment this book is published. Fortunately, NVIDIA has a website, and on this website you will find the CUDA Zone. Among other things, the CUDA Zone is home to the most up-to-date list of supported CUDA devices. We recommend that you consult this list before doing anything drastic as a result of being unable to find your new GPU in Table 9.1. Or you can simply run the example from Chapter 3 that prints the compute capability of each CUDA device in the system.
Because this is the chapter on atomics, of particular relevance is the hardware capability to perform atomic operations on memory. Before we look at what atomic operations are and why you care, you should know that atomic operations on global memory are supported only on GPUs of compute capability 1.1 or higher. Furthermore, atomic operations on shared memory require a GPU of compute capability 1.2 or higher. Because of the superset nature of compute capability versions, GPUs of compute capability 1.2 therefore support both shared memory atomics and global memory atomics. Similarly, GPUs of compute capability 1.3 support both of these as well.
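The capability thresholds above can be checked at runtime. The following is a minimal sketch, not the book's Chapter 3 listing: it queries each device with cudaGetDeviceProperties() and compares the reported major and minor numbers against the two thresholds stated in the text. The helper name capability_at_least() is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true when capability major.minor is at least
// reqMajor.reqMinor.
static bool capability_at_least(int major, int minor, int reqMajor, int reqMinor) {
    return major > reqMajor || (major == reqMajor && minor >= reqMinor);
}

int main(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; i++) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, i);
        if (err != cudaSuccess) {
            fprintf(stderr, "device %d: %s\n", i, cudaGetErrorString(err));
            continue;
        }
        printf("Device %d (%s): compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
        // Thresholds taken from the text: 1.1 for global, 1.2 for shared atomics
        printf("  global memory atomics: %s\n",
               capability_at_least(prop.major, prop.minor, 1, 1) ? "yes" : "no");
        printf("  shared memory atomics: %s\n",
               capability_at_least(prop.major, prop.minor, 1, 2) ? "yes" : "no");
    }
    return 0;
}
```

Because capability versions are supersets of the versions below them, a single "at least" comparison is enough; there is no need to enumerate individual versions.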
If it turns out that your GPU is of compute capability 1.0 and it doesn't support atomic operations on global memory, well, maybe we've just given you the perfect excuse to upgrade! If you decide you're not ready to splurge on a new atomics-enabled graphics processor, you can continue to read about atomic operations and the situations in which you might want to use them. But if you find it too heartbreaking that you won't be able to run the examples, feel free to skip to the next chapter.
COMPILING FOR A MINIMUM COMPUTE CAPABILITY
Suppose that we have written code that requires a certain minimum compute capability. For example, imagine that you've finished this chapter and go off to write an application that relies heavily on global memory atomics. Having studied this text extensively, you know that global memory atomics require a compute capability of 1.1. To compile your code, you need to inform the compiler that the kernel cannot run on hardware with a capability less than 1.1. Moreover, in telling the compiler this, you're also giving it the freedom to make other optimizations that may be available only on GPUs of compute capability 1.1 or greater. Informing the compiler of this is as simple as adding a command-line option to your invocation of nvcc:
nvcc -arch=sm_11
Similarly, to build a kernel that relies on shared memory atomics, you need to inform the compiler that the code requires compute capability 1.2 or greater:
nvcc -arch=sm_12
Atomic Operations Overview
Programmers typically never need to use atomic operations when writing traditional single-threaded applications. If this is the situation with you, don't worry; we plan to explain what they are and why we might need them in a multithreaded application. To clarify atomic operations, we'll look at one of the first things you learned when learning C or C++, the increment operator:
x++;
This is a single expression in standard C, and after executing this expression, the value in x should be one greater than it was prior to executing the increment. But what sequence of operations does this imply? To add one to the value of x, we first need to know what value is currently in x. After reading the value of x, we can modify it. And finally, we need to write this value back to x.
So the three steps in this operation are as follows:
1. Read the value in x.
2. Add 1 to the value read in step 1.
3. Write the result back to x.
This process is generally called a read-modify-write operation, since step 2 can consist of any operation that changes the value that was read from x.
Now consider a situation where two threads need to perform this increment on the value in x. Let's call these threads A and B. For A and B to both increment the value in x, both threads need to perform the three operations we've described. Let's suppose x starts with the value 7. Ideally we would like thread A and thread B to do the steps shown in Table 9.2.
Table 9.2
STEP                                            EXAMPLE
1. Thread A reads the value in x.               A reads 7 from x.
2. Thread A adds 1 to the value it read.        A computes 8.
3. Thread A writes the result back to x.        x <- 8
4. Thread B reads the value in x.               B reads 8 from x.
5. Thread B adds 1 to the value it read.        B computes 9.
6. Thread B writes the result back to x.        x <- 9
Since x starts with the value 7 and gets incremented by two threads, we would expect it to hold the value 9 after they've completed. In the previous sequence of operations, this is indeed the result we obtain. Unfortunately, there are many other orderings of these steps that produce the wrong value. For example, consider the ordering shown in Table 9.3, where thread A's and thread B's operations become interleaved with each other.
Table 9.3
STEP                                            EXAMPLE
1. Thread A reads the value in x.               A reads 7 from x.
2. Thread B reads the value in x.               B reads 7 from x.
3. Thread A adds 1 to the value it read.        A computes 8.
4. Thread B adds 1 to the value it read.        B computes 8.
5. Thread A writes the result back to x.        x <- 8
6. Thread B writes the result back to x.        x <- 8
Therefore, if our threads get scheduled unfavorably, we end up computing the wrong result. There are many other orderings for these six operations, some of which produce correct results and some of which do not. When moving from a single-threaded to a multithreaded version of this application, we suddenly have potential for unpredictable results if multiple threads need to read or write shared values.
In the previous example, we need a way to perform the read-modify-write without being interrupted by another thread. Or more specifically, no other thread can read or write the value of x until we have completed our operation. Because the execution of these operations cannot be broken into smaller parts by other threads, we call operations that satisfy this constraint atomic. CUDA C supports several atomic operations that allow you to operate safely on memory, even when thousands of threads are potentially competing for access.
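The lost-update problem described above can be observed directly on a GPU. The sketch below is an illustration rather than a listing from the book: two hypothetical kernels, plain_increment() and atomic_increment(), each have every thread add 1 to the same counter. The plain version performs the read, add, and write as separate steps, so its final count is typically lower than the number of threads; the atomicAdd() version should match it exactly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void plain_increment(int *x) {
    (*x)++;             // separate read, add, and write: updates can be lost
}

__global__ void atomic_increment(int *x) {
    atomicAdd(x, 1);    // indivisible read-modify-write
}

// Launches blocks*threads increments on a counter that starts at zero.
// Returns the final counter value, or -1 if a CUDA call fails.
static int count_increments(bool useAtomic, int blocks, int threads) {
    int *dev_x = NULL;
    int result = -1;
    if (cudaMalloc((void**)&dev_x, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMemset(dev_x, 0, sizeof(int)) == cudaSuccess) {
        if (useAtomic) atomic_increment<<<blocks, threads>>>(dev_x);
        else           plain_increment<<<blocks, threads>>>(dev_x);
        // cudaMemcpy() waits for the kernel to finish before copying
        if (cudaMemcpy(&result, dev_x, sizeof(int),
                       cudaMemcpyDeviceToHost) != cudaSuccess) {
            result = -1;
        }
    }
    cudaFree(dev_x);
    return result;
}

int main(void) {
    const int blocks = 64, threads = 256;
    printf("expected:  %d\n", blocks * threads);
    printf("plain x++: %d\n", count_increments(false, blocks, threads));
    printf("atomicAdd: %d\n", count_increments(true, blocks, threads));
    return 0;
}
```

The exact value printed for the plain version depends on scheduling and will vary between runs and devices, which is precisely the unpredictability the text warns about.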
Now we'll take a look at an example that requires the use of atomic operations to compute correct results.
Computing Histograms
Oftentimes, algorithms require the computation of a histogram of some set of data. If you haven't had any experience with histograms in the past, that's not a big deal. Essentially, given a data set that consists of some set of elements, a histogram represents a count of the frequency of each element. For example, if we created a histogram of the letters in the phrase Programming with CUDA C, we would end up with the result shown in Figure 9.1.
Although simple to describe and understand, computing histograms of data arises surprisingly often in computer science. It's used in algorithms for image processing, data compression, computer vision, machine learning, audio encoding, and many others. We will use histogram computation as the algorithm for the following code examples.
CPU HISTOGRAM COMPUTATION
Because the computation of a histogram may not be familiar to all readers, we'll start with an example of how to compute a histogram on the CPU. This example will also serve to illustrate how computing a histogram is relatively simple in a single-threaded CPU application. The application will be given some large stream of data. In an actual application, the data might signify anything from pixel colors to audio samples, but in our sample application, it will be a stream of randomly generated bytes. We can create this random stream of bytes using a utility function we have provided called big_random_block(). In our application, we create 100MB of random data.
#include "../common/book.h"
Since each random 8-bit byte can be any of 256 different values (from 0x00 to 0xFF), our histogram needs to contain 256 bins in order to keep track of the number of times each value has been seen in the data. We create a 256-bin array and initialize all the bin counts to zero.
Once our histogram has been created and all the bins are initialized to zero, we need to tabulate the frequency with which each value appears in the data contained in buffer[]. The idea here is that whenever we see some value z in the array buffer[], we want to increment the value in bin z of our histogram. This way, we're counting the number of times we have seen an occurrence of the value z.
If buffer[i] is the current value we are looking at, we want to increment the count we have in the bin numbered buffer[i]. Since bin buffer[i] is located at histo[buffer[i]], we can increment the appropriate counter in a single line of code.
histo[buffer[i]]++;
We do this for each element in buffer[] with a simple for() loop.
At this point, we've completed our histogram of the input data. In a full application, this histogram might be the input to the next step of computation. In our simple example, however, this is all we care to compute, so we end the application by verifying that all the bins of our histogram sum to the expected value.
long histoCount = 0;
for (int i=0; i<256; i++) {
histoCount += histo[i];
}
printf( "Histogram Sum: %ld\n", histoCount );
If you've followed closely, you will realize that this sum will always be the same, regardless of the random input array. Each bin counts the number of times we have seen the corresponding data element, so the sum of all of these bins should be the total number of data elements we've examined. In our case, this will be the value SIZE.
free( buffer );
return 0;
}
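The CPU steps described above fit in a few lines. The following is a minimal host-only sketch under stated assumptions: random_bytes() is a hypothetical stand-in for the book's big_random_block() helper (which lives in ../common/book.h and is not reproduced here), and cpu_histogram() is a name introduced for illustration.

```cuda
#include <cstdio>
#include <cstdlib>

#define SIZE (100*1024*1024)   // 100MB of input, as in the text

// Hypothetical stand-in for big_random_block(): n random bytes from rand().
static unsigned char *random_bytes(size_t n) {
    unsigned char *data = (unsigned char*)malloc(n);
    if (data == NULL) return NULL;
    for (size_t i = 0; i < n; i++) data[i] = (unsigned char)(rand() & 0xFF);
    return data;
}

// Zero the 256 bins, then count one occurrence per input byte.
static void cpu_histogram(const unsigned char *buffer, size_t n,
                          unsigned int histo[256]) {
    for (int i = 0; i < 256; i++) histo[i] = 0;
    for (size_t i = 0; i < n; i++) histo[buffer[i]]++;
}

int main(void) {
    unsigned char *buffer = random_bytes(SIZE);
    if (buffer == NULL) {
        fprintf(stderr, "out of host memory\n");
        return 1;
    }
    unsigned int histo[256];
    cpu_histogram(buffer, SIZE, histo);

    // Every byte lands in exactly one bin, so the bins must sum to SIZE
    long histoCount = 0;
    for (int i = 0; i < 256; i++) histoCount += histo[i];
    printf("Histogram Sum: %ld (expected %ld)\n", histoCount, (long)SIZE);

    free(buffer);
    return 0;
}
```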
On our benchmark machine, a Core 2 Duo, the histogram of this 100MB array of data can be constructed in 0.416 seconds. This will provide a baseline performance for the GPU version we intend to write.
GPU HISTOGRAM COMPUTATION
We would like to adapt the histogram computation example to run on the GPU. If our input array is large enough, it might save a considerable amount of time to have different threads examining different parts of the buffer. Having different threads read different parts of the input should be easy enough. After all, it's very similar to things we have seen so far. The problem with computing a histogram from the input data arises from the fact that multiple threads may want to increment the same bin of the output histogram at the same time. In this situation, we will need to use atomic increments to avoid a situation like the one described in the section "Atomic Operations Overview."
Our main() routine looks very similar to the CPU version, although we will need to add some of the CUDA C plumbing in order to get input to the GPU and results from the GPU. However, we start exactly as we did on the CPU.
We will be interested in measuring how our code performs, so we initialize events for timing exactly like we always have.
After setting up our input data and events, we look to GPU memory. We will need to allocate space for our random input data and our output histogram. After allocating the input buffer, we copy the array we generated with big_random_block() to the GPU. Likewise, after allocating the histogram, we initialize it to zero just like we did in the CPU version.
You may notice that we slipped in a new CUDA runtime function, cudaMemset(). This function has a similar signature to the standard C function memset(), and the two functions behave nearly identically. The difference in signature between these functions is that cudaMemset() returns an error code while the C library function memset() does not. This error code will inform the caller whether anything bad happened while attempting to set GPU memory. Aside from the error code return, the only difference is that cudaMemset() operates on GPU memory while memset() operates on host memory.
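A short sketch of the allocation-and-zeroing step may help here. It is an illustration under assumptions, not the book's listing: the function name nonzero_bins_after_memset() is hypothetical, and the error handling is written out by hand instead of using the book's HANDLE_ERROR macro. It shows the one visible difference from memset(): the cudaError_t return value that the caller can inspect.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Allocates a 256-bin histogram on the GPU, zeroes it with cudaMemset(),
// copies it back, and returns the number of nonzero bins (-1 on error).
static int nonzero_bins_after_memset(void) {
    unsigned int histo[256];
    unsigned int *dev_histo = NULL;

    cudaError_t err = cudaMalloc((void**)&dev_histo, sizeof(histo));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return -1;
    }
    // Same argument order as memset(ptr, value, byteCount), but the
    // pointer refers to GPU memory and an error code comes back.
    err = cudaMemset(dev_histo, 0, sizeof(histo));
    if (err == cudaSuccess) {
        err = cudaMemcpy(histo, dev_histo, sizeof(histo),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_histo);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }

    int nonzero = 0;
    for (int i = 0; i < 256; i++) {
        if (histo[i] != 0) nonzero++;
    }
    return nonzero;
}

int main(void) {
    printf("nonzero bins after cudaMemset: %d\n", nonzero_bins_after_memset());
    return 0;
}
```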
After initializing the input and output buffers, we are ready to compute our histogram. You will see how we prepare and launch the histogram kernel momentarily. For the time being, assume that we have computed the histogram on the GPU. After finishing, we need to copy the histogram back to the CPU, so we allocate a 256-entry array and perform a copy from device to host.
At this point, we are done with the histogram computation so we can stop our timers and display the elapsed time. Just like the previous event code, this is identical to the timing code we've used for several chapters.
At this point, we could pass the histogram as input to another stage in the algorithm, but since we are not using the histogram for anything else, we will simply verify that the computed GPU histogram matches what we get on the CPU. First, we verify that the histogram sum matches what we expect. This is identical to the CPU code shown here:
long histoCount = 0;
for (int i=0; i<256; i++) {
histoCount += histo[i];
}
printf( "Histogram Sum: %ld\n", histoCount );
To fully verify the GPU histogram, though, we will use the CPU to compute the same histogram. The obvious way to do this would be to allocate a new histogram array, compute a histogram from the input using the code from the section "CPU Histogram Computation," and, finally, ensure that each bin in the GPU and CPU version match. But rather than allocate a new histogram array, we'll opt to start with the GPU histogram and compute the CPU histogram "in reverse."
By computing the histogram "in reverse," we mean that rather than starting at zero and incrementing bin values when we see data elements, we will start with the GPU histogram and decrement the bin's value when the CPU sees data elements. Therefore, the CPU has computed the same histogram as the GPU if and only if every bin has the value zero when we are finished. In some sense, we are computing the difference between these two histograms. The code will look remarkably like the CPU histogram computation but with a decrement operator instead of an increment operator.
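The "in reverse" check can be sketched as a small host-side function. This is an illustration with hypothetical names (verify_in_reverse() is not from the book), and in main() a CPU-built histogram stands in for the one that would be copied back from the GPU.

```cuda
#include <cstdio>
#include <cstdlib>

// Decrements one bin per input byte, then counts bins that did not return
// to zero. A result of 0 means the histogram matched the data exactly.
// Note: the histogram is consumed (modified) by the check.
static int verify_in_reverse(const unsigned char *buffer, size_t n,
                             unsigned int histo[256]) {
    for (size_t i = 0; i < n; i++) histo[buffer[i]]--;
    int mismatches = 0;
    for (int i = 0; i < 256; i++) {
        if (histo[i] != 0) mismatches++;
    }
    return mismatches;
}

int main(void) {
    const size_t n = 1 << 20;
    unsigned char *buffer = (unsigned char*)malloc(n);
    if (buffer == NULL) return 1;
    for (size_t i = 0; i < n; i++) buffer[i] = (unsigned char)(rand() & 0xFF);

    // Stand-in for the histogram copied back from the GPU
    unsigned int histo[256] = { 0 };
    for (size_t i = 0; i < n; i++) histo[buffer[i]]++;

    printf("mismatched bins (correct histogram): %d\n",
           verify_in_reverse(buffer, n, histo));

    // Rebuild the histogram, corrupt one bin, and check again
    for (int i = 0; i < 256; i++) histo[i] = 0;
    for (size_t i = 0; i < n; i++) histo[buffer[i]]++;
    histo[42] += 3;
    printf("mismatched bins (corrupted histogram): %d\n",
           verify_in_reverse(buffer, n, histo));

    free(buffer);
    return 0;
}
```

The design trade-off is the one the text names: no second 256-entry array is needed, at the cost of destroying the histogram being checked.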
As usual, the finale involves cleaning up our allocated CUDA events, GPU memory, and host memory.
Before, we assumed that we had launched a kernel that computed our histogram and then pressed on to discuss the aftermath. Our kernel launch is slightly more complicated than usual because of performance concerns. Because the histogram contains 256 bins, using 256 threads per block proves convenient and also results in high performance. But we have a lot of flexibility in terms of the number of blocks we launch. For example, with 100MB of data, we have 104,857,600 bytes of data. We could launch a single block and have each thread examine 409,600 data elements. Likewise, we could launch 409,600 blocks and have each thread examine a single data element.
As you might have guessed, the optimal solution is at a point between these two extremes. By running some performance experiments, we found that optimal performance is achieved when the number of blocks we launch is exactly twice the number of multiprocessors our GPU contains. For example, a GeForce GTX 280 has 30 multiprocessors, so our histogram kernel happens to run fastest on a GeForce GTX 280 when launched with 60 parallel blocks.
In Chapter 3, we discussed a method for querying various properties of the hardware on which our program is running. We will need to use one of these device properties if we intend to dynamically size our launch based on our current hardware platform. To accomplish this, we will use the following code segment. Although you haven't yet seen the kernel implementation, you should still be able to follow what is going on.
cudaDeviceProp prop;
HANDLE_ERROR( cudaGetDeviceProperties( &prop, 0 ) );
int blocks = prop.multiProcessorCount;
histo_kernel<<<blocks*2,256>>>( dev_buffer, SIZE, dev_histo );
Since our walk-through of main() has been somewhat fragmented, here is the entire routine from start to finish:
cudaDeviceProp prop;
HANDLE_ERROR( cudaGetDeviceProperties( &prop, 0 ) );
int blocks = prop.multiProcessorCount;
histo_kernel<<<blocks*2,256>>>( dev_buffer, SIZE, dev_histo );
long histoCount = 0;
for (int i=0; i<256; i++) {
histoCount += histo[i];
}
printf( "Histogram Sum: %ld\n", histoCount );
cudaFree( dev_histo );
cudaFree( dev_buffer );
free( buffer );
return 0;
}
HISTOGRAM KERNEL USING GLOBAL MEMORY ATOMICS
And now for the fun part: the GPU code that computes the histogram! The kernel that computes the histogram itself needs to be given a pointer to the input data array, the length of the input array, and a pointer to the output histogram. The first thing our kernel needs to compute is a linearized offset into the input data array. Each thread will start with an offset between 0 and the number of threads minus 1. It will then stride by the total number of threads that have been launched. We hope you remember this technique; we used the same logic to add vectors of arbitrary length when you first learned about threads.
#include "../common/book.h"
Once each thread knows its starting offset i and the stride it should use, the code walks through the input array incrementing the corresponding histogram bin.
The highlighted line represents the way we use atomic operations in CUDA C. The call atomicAdd( addr, y ); generates an atomic sequence of operations that reads the value at address addr, adds y to that value, and stores the result back to the memory address addr. The hardware guarantees us that no other thread can read or write the value at address addr while we perform these operations, thus ensuring predictable results. In our example, the address in question is the location of the histogram bin that corresponds to the current byte. If the current byte is buffer[i], just like we saw in the CPU version, the corresponding histogram bin is histo[buffer[i]]. The atomic operation needs the address of this bin, so the first argument is therefore &(histo[buffer[i]]). Since we simply want to increment the value in that bin by one, the second argument is 1.
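The kernel just described, together with the launch sizing from main(), can be sketched as follows. This is a reconstruction from the prose rather than the book's verbatim listing, so treat it as an assumption-laden sketch: CHECK is a stand-in for the book's HANDLE_ERROR macro, rand() replaces big_random_block(), gpu_histogram_sum() is a hypothetical wrapper, and SIZE is reduced to keep the run short.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define SIZE (16*1024*1024)   // assumption: smaller than the text's 100MB

// Stand-in for the book's HANDLE_ERROR macro
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, \
            cudaGetErrorString(e_)); exit(EXIT_FAILURE); } } while (0)

__global__ void histo_kernel(const unsigned char *buffer, long size,
                             unsigned int *histo) {
    // Start at this thread's linear index, then stride by the thread count
    long i = threadIdx.x + blockIdx.x * blockDim.x;
    long stride = blockDim.x * gridDim.x;
    while (i < size) {
        atomicAdd(&(histo[buffer[i]]), 1u);   // atomic bin increment
        i += stride;
    }
}

// Builds the histogram of `buffer` on the GPU; returns the sum of all bins.
static long gpu_histogram_sum(const unsigned char *buffer, long size,
                              unsigned int histo[256]) {
    unsigned char *dev_buffer = NULL;
    unsigned int *dev_histo = NULL;
    CHECK(cudaMalloc((void**)&dev_buffer, size));
    CHECK(cudaMemcpy(dev_buffer, buffer, size, cudaMemcpyHostToDevice));
    CHECK(cudaMalloc((void**)&dev_histo, 256 * sizeof(unsigned int)));
    CHECK(cudaMemset(dev_histo, 0, 256 * sizeof(unsigned int)));

    // Two blocks per multiprocessor, 256 threads per block, as in the text
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    int blocks = prop.multiProcessorCount;
    histo_kernel<<<blocks * 2, 256>>>(dev_buffer, size, dev_histo);
    CHECK(cudaGetLastError());

    CHECK(cudaMemcpy(histo, dev_histo, 256 * sizeof(unsigned int),
                     cudaMemcpyDeviceToHost));
    CHECK(cudaFree(dev_histo));
    CHECK(cudaFree(dev_buffer));

    long sum = 0;
    for (int i = 0; i < 256; i++) sum += histo[i];
    return sum;
}

int main(void) {
    unsigned char *buffer = (unsigned char*)malloc(SIZE);
    if (buffer == NULL) return 1;
    for (long i = 0; i < SIZE; i++) buffer[i] = (unsigned char)(rand() & 0xFF);
    unsigned int histo[256];
    printf("Histogram Sum: %ld (expected %ld)\n",
           gpu_histogram_sum(buffer, SIZE, histo), (long)SIZE);
    free(buffer);
    return 0;
}
```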
So after all that hullabaloo, our GPU histogram computation is fairly similar to the corresponding CPU version:
#include "../common/book.h"
However, we need to save the celebrations for later. After running this example, we discover that a GeForce GTX 285 can construct a histogram from 100MB of input data in 1.752 seconds. If you read the section on CPU-based histograms, you will realize that this performance is terrible. In fact, this is more than four times slower than the CPU version! But this is why we always measure our baseline performance. It would be a shame to settle for such a low-performance implementation simply because it runs on the GPU.
Since we do very little work in the kernel, it is quite likely that the atomic operation on global memory is causing the problem. Essentially, when thousands of threads are trying to access a handful of memory locations, a great deal of contention for our 256 histogram bins can occur. To ensure atomicity of the increment operations, the hardware needs to serialize operations to the same memory location. This can result in a long queue of pending operations, and any performance gain we might have had will vanish. We will need to improve the algorithm itself in order to recover this performance.
HISTOGRAM KERNEL USING SHARED AND GLOBAL MEMORY ATOMICS
Ironically, despite the fact that the atomic operations cause this performance degradation, alleviating the slowdown actually involves using more atomics, not fewer. The core problem was not the use of atomics so much as the fact that thousands of threads were competing for access to a relatively small number of memory addresses. To address this issue, we will split our histogram computation into two phases.
In phase one, each parallel block will compute a separate histogram of the data that its constituent threads examine. Since each block does this independently, we can compute these histograms in shared memory, saving us the time of sending each write off-chip to DRAM. Doing this does not free us from needing atomic operations, though, since multiple threads within the block can still examine data elements with the same value. However, the fact that only 256 threads will now be competing for 256 addresses will reduce contention from the global version where thousands of threads were competing.
The first phase then involves allocating and zeroing a shared memory buffer to hold each block's intermediate histogram. Recall from Chapter 5 that since the subsequent step will involve reading and modifying this buffer, we need a __syncthreads() call to ensure that every thread's write has completed before progressing.
After zeroing the histogram, the next step is remarkably similar to our original GPU histogram. The sole differences here are that we use the shared memory buffer temp[] instead of the global memory buffer histo[] and that we need a subsequent call to __syncthreads() to ensure the last of our writes have been committed.
The last step in our modified histogram example requires that we merge each block's temporary histogram into the global buffer histo[]. Suppose we split the input in half and two threads look at different halves and compute separate histograms. If thread A sees byte 0xFC 20 times in the input and thread B sees byte 0xFC 5 times, the byte 0xFC must have appeared 25 times in the input. Likewise, each bin of the final histogram is just the sum of the corresponding bin in thread A's histogram and thread B's histogram. This logic extends to any number of threads, so merging every block's histogram into a single final histogram involves adding each entry in the block's histogram to the corresponding entry in the final histogram. For all the reasons we've seen already, this needs to be done atomically.
Since we have decided to use 256 threads and have 256 histogram bins, each thread atomically adds a single bin to the final histogram's total. If these numbers didn't match, this phase would be more complicated. Note that we have no guarantees about what order the blocks add their values to the final histogram, but since integer addition is commutative, we will always get the same answer provided that the additions occur atomically.
And with this, our two-phase histogram computation kernel is complete. Here it is from start to finish:
__syncthreads();
atomicAdd( &(histo[threadIdx.x]), temp[threadIdx.x] );
}
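Putting the three steps together, the two-phase kernel can be sketched as below. The final two statements of the kernel match the fragment shown above; the rest is reconstructed from the prose and should be read as an assumption. The kernel name histo_kernel_shared(), the CHECK macro, the wrapper function, and the fixed launch of 32 blocks are all choices made for this sketch. The kernel relies on being launched with exactly 256 threads per block.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Stand-in for the book's HANDLE_ERROR macro
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, \
            cudaGetErrorString(e_)); exit(EXIT_FAILURE); } } while (0)

// Must be launched with 256 threads per block (one thread per bin).
__global__ void histo_kernel_shared(const unsigned char *buffer, long size,
                                    unsigned int *histo) {
    // Phase one: zero a per-block histogram in shared memory
    __shared__ unsigned int temp[256];
    temp[threadIdx.x] = 0;
    __syncthreads();

    // Accumulate into the per-block histogram with shared memory atomics
    long i = threadIdx.x + blockIdx.x * blockDim.x;
    long stride = blockDim.x * gridDim.x;
    while (i < size) {
        atomicAdd(&temp[buffer[i]], 1u);
        i += stride;
    }
    __syncthreads();

    // Phase two: each thread merges one bin into the global histogram
    atomicAdd(&(histo[threadIdx.x]), temp[threadIdx.x]);
}

// Runs the kernel on a host buffer; returns the sum of all bins.
static long gpu_histogram_shared_sum(const unsigned char *buffer, long size,
                                     unsigned int histo[256]) {
    unsigned char *dev_buffer = NULL;
    unsigned int *dev_histo = NULL;
    CHECK(cudaMalloc((void**)&dev_buffer, size));
    CHECK(cudaMemcpy(dev_buffer, buffer, size, cudaMemcpyHostToDevice));
    CHECK(cudaMalloc((void**)&dev_histo, 256 * sizeof(unsigned int)));
    CHECK(cudaMemset(dev_histo, 0, 256 * sizeof(unsigned int)));

    histo_kernel_shared<<<32, 256>>>(dev_buffer, size, dev_histo);
    CHECK(cudaGetLastError());

    CHECK(cudaMemcpy(histo, dev_histo, 256 * sizeof(unsigned int),
                     cudaMemcpyDeviceToHost));
    CHECK(cudaFree(dev_histo));
    CHECK(cudaFree(dev_buffer));

    long sum = 0;
    for (int i = 0; i < 256; i++) sum += histo[i];
    return sum;
}

int main(void) {
    const long size = 16L * 1024 * 1024;   // assumption: reduced input size
    unsigned char *buffer = (unsigned char*)malloc(size);
    if (buffer == NULL) return 1;
    for (long i = 0; i < size; i++) buffer[i] = (unsigned char)(rand() & 0xFF);
    unsigned int histo[256];
    printf("Histogram Sum: %ld (expected %ld)\n",
           gpu_histogram_shared_sum(buffer, size, histo), size);
    free(buffer);
    return 0;
}
```

Shared memory atomics require compute capability 1.2 or greater, so on hardware of that era this sketch would be built with the -arch=sm_12 option described earlier in the chapter.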
This version of our histogram example improves dramatically over the previous GPU version. Adding the shared memory component drops our running time on a GeForce GTX 285 to 0.057 seconds. Not only is this significantly better than the version that used global memory atomics only, but this beats our original CPU implementation by an order of magnitude (from 0.416 seconds to 0.057 seconds). This improvement represents greater than a sevenfold boost in speed over the CPU version. So despite the early setback in adapting the histogram to a GPU implementation, our version that uses both shared and global atomics should be considered a success.
Chapter Review
Although we have frequently spoken at length about how easy parallel programming can be with CUDA C, we have largely ignored some of the situations when massively parallel architectures such as the GPU can make our lives as programmers more difficult. Trying to cope with potentially tens of thousands of threads simultaneously modifying the same memory addresses is a common situation where a massively parallel machine can seem burdensome. Fortunately, we have hardware-supported atomic operations available to help ease this pain.
However, as you saw with the histogram computation, sometimes reliance on atomic operations introduces performance issues that can be resolved only by rethinking parts of the algorithm. In the histogram example, we moved to a two-stage algorithm that alleviated contention for global memory addresses. In general, this strategy of looking to lessen memory contention tends to work well, and you should keep it in mind when using atomics in your own applications.
Time and time again in this book we have seen how the massively data-parallel execution engine on a GPU can provide stunning performance gains over comparable CPU code. However, there is yet another class of parallelism to be exploited on NVIDIA graphics processors. This parallelism is similar to the task parallelism that is found in multithreaded CPU applications. Rather than simultaneously computing the same function on lots of data elements as one does with data parallelism, task parallelism involves doing two or more completely different tasks in parallel.
In the context of parallelism, a task could be any number of things. For example, an application could be executing two tasks: redrawing its GUI with one thread while downloading an update over the network with another thread. These tasks proceed in parallel, despite having nothing in common. Although the task parallelism on GPUs is not currently as flexible as a general-purpose processor's, it still provides opportunities for us as programmers to extract even more speed from our GPU-based implementations. In this chapter, we will look at CUDA streams and the ways in which their careful use will enable us to execute certain operations simultaneously on the GPU.
Chapter Objectives
Through the course of this chapter, you will accomplish the following:
• You will learn about allocating page-locked host memory.
• You will learn what CUDA streams are.
• You will learn how to use CUDA streams to accelerate your applications.
Page-Locked Host Memory
In every example over the course of nine chapters, you have seen us allocate memory on the GPU with cudaMalloc(). On the host, we have always allocated memory with the vanilla, C library routine malloc(). However, the CUDA runtime offers its own mechanism for allocating host memory: cudaHostAlloc(). Why would you bother using this function when malloc() has served you quite well since day one of your life as a C programmer?
In fact, there is a significant difference between the memory that malloc() will allocate and the memory that cudaHostAlloc() allocates. The C library function malloc() allocates standard, pageable host memory, while cudaHostAlloc() allocates a buffer of page-locked host memory. Sometimes called pinned memory, page-locked buffers have an important property: The operating system guarantees us that it will never page this memory out to disk, which ensures its residency in physical memory. The corollary to this is that it becomes safe for the OS to allow an application access to the physical address of the memory, since the buffer will not be evicted or relocated.
Knowing the physical address of a buffer, the GPU can then use direct memory access (DMA) to copy data to or from the host. Since DMA copies proceed without intervention from the CPU, it also means that the CPU could be simultaneously paging these buffers out to disk or relocating their physical address by updating the operating system's pagetables. The possibility of the CPU moving pageable data means that using pinned memory for a DMA copy is essential. In fact, even when you attempt to perform a memory copy with pageable memory, the CUDA driver still uses DMA to transfer the buffer to the GPU. Therefore, your copy happens twice, first from a pageable system buffer to a page-locked "staging" buffer and then from the page-locked system buffer to the GPU.
As a result, whenever you perform memory copies from pageable memory, you guarantee that the copy speed will be bounded by the lower of the PCIE transfer speed and the system front-side bus speeds. A large disparity in bandwidth between these buses in some systems ensures that page-locked host memory enjoys roughly a twofold performance advantage over standard pageable memory when used for copying data between the GPU and the host. But even in a world where PCI Express and front-side bus speeds were identical, pageable buffers would still incur the overhead of an additional CPU-managed copy.
However, you should resist the temptation to simply do a search-and-replace on malloc to convert every one of your calls to use cudaHostAlloc(). Using pinned memory is a double-edged sword. By doing so, you have effectively opted out of all the nice features of virtual memory. Specifically, the computer running the application needs to have available physical memory for every page-locked buffer, since these buffers can never be swapped out to disk. This means that your system will run out of memory much faster than it would if you stuck to standard malloc() calls. Not only does this mean that your application might start to fail on machines with smaller amounts of physical memory, but it means that your application can affect the performance of other applications running on the system.
These warnings are not meant to scare you out of using cudaHostAlloc(), but you should remain aware of the implications of page-locking buffers. We suggest trying to restrict their use to memory that will be used as a source or destination in calls to cudaMemcpy() and freeing them when they are no longer needed rather than waiting until application shutdown to release the memory. The use of cudaHostAlloc() should be no more difficult than anything else you've studied so far, but let's take a look at an example that will both illustrate how pinned memory is allocated and demonstrate its performance advantage over standard pageable memory.
Our application will be very simple and serves primarily to benchmark cudaMemcpy() performance with both pageable and page-locked memory. All we endeavor to do is allocate a GPU buffer and a host buffer of matching sizes and then execute some number of copies between these two buffers. We'll allow the user of this benchmark to specify the direction of the copy, either "up" (from host to device) or "down" (from device to host). You will also notice that, in order to obtain accurate timings, we set up CUDA events for the start and stop of the sequence of copies. You probably remember how to do this from previous performance-testing examples, but in case you've forgotten, the following will jog your memory:
Independent of the direction of the copies, we start by allocating a host and GPU buffer of size integers. After this, we do 100 copies in the direction specified by the argument up, stopping the timer after we've finished copying.
After the 100 copies, clean up by freeing the host and GPU buffers as well as destroying our timing events.
free( a );
HANDLE_ERROR( cudaFree( dev_a ) );
HANDLE_ERROR( cudaEventDestroy( start ) );
HANDLE_ERROR( cudaEventDestroy( stop ) );
return elapsedTime;
}
If you didn't notice, the function cuda_malloc_test() allocated pageable host memory with the standard C malloc() routine. The pinned memory version uses cudaHostAlloc() to allocate a page-locked buffer.
HANDLE_ERROR( cudaFreeHost( a ) );
HANDLE_ERROR( cudaFree( dev_a ) );
HANDLE_ERROR( cudaEventDestroy( start ) );
HANDLE_ERROR( cudaEventDestroy( stop ) );
return elapsedTime;
}
As you can see, the buffer allocated by cudaHostAlloc() is used in the same way as a buffer allocated by malloc(). The other change from using malloc() lies in the last argument, the value cudaHostAllocDefault. This last argument stores a collection of flags that we can use to modify the behavior of cudaHostAlloc() in order to allocate other varieties of pinned host memory. In the next chapter, we'll see how to use the other possible values of these flags, but for now we're content to use the default, page-locked memory so we pass cudaHostAllocDefault in order to get the default behavior. To free a buffer that was allocated with cudaHostAlloc(), we have to use cudaFreeHost(). That is, every malloc() needs a free(), and every cudaHostAlloc() needs a cudaFreeHost().
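The two benchmark functions differ only in how the host buffer is allocated and freed, so the comparison can be sketched with a single function that takes the allocation kind as a parameter. This is an illustrative consolidation, not the book's cuda_malloc_test() and cuda_host_alloc_test() listings: time_copies(), its parameters, and the CHECK macro are names introduced here, and the buffer size in main() is an arbitrary choice.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Stand-in for the book's HANDLE_ERROR macro
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, \
            cudaGetErrorString(e_)); exit(EXIT_FAILURE); } } while (0)

// Times `copies` transfers of `count` ints between host and device.
// pinned: use cudaHostAlloc() instead of malloc(); up: host to device.
// Returns the elapsed time in milliseconds.
static float time_copies(bool pinned, bool up, int count, int copies) {
    size_t bytes = (size_t)count * sizeof(int);
    int *a = NULL, *dev_a = NULL;
    cudaEvent_t start, stop;
    float elapsedTime = 0.0f;

    if (pinned) {
        CHECK(cudaHostAlloc((void**)&a, bytes, cudaHostAllocDefault));
    } else {
        a = (int*)malloc(bytes);
        if (a == NULL) { fprintf(stderr, "malloc failed\n"); exit(EXIT_FAILURE); }
    }
    memset(a, 0, bytes);
    CHECK(cudaMalloc((void**)&dev_a, bytes));
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < copies; i++) {
        if (up) { CHECK(cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice)); }
        else    { CHECK(cudaMemcpy(a, dev_a, bytes, cudaMemcpyDeviceToHost)); }
    }
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(&elapsedTime, start, stop));

    // Each allocator has its own matching release call
    if (pinned) { CHECK(cudaFreeHost(a)); } else { free(a); }
    CHECK(cudaFree(dev_a));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return elapsedTime;
}

int main(void) {
    const int count = 10 * 1024 * 1024, copies = 100;
    const double totalMB = (double)copies * count * sizeof(int) / (1024.0 * 1024.0);
    for (int pinned = 0; pinned <= 1; pinned++) {
        for (int up = 1; up >= 0; up--) {
            float ms = time_copies(pinned != 0, up != 0, count, copies);
            printf("%-8s %-4s: %8.1f ms, %8.1f MB/s\n",
                   pinned ? "pinned" : "pageable", up ? "up" : "down",
                   ms, totalMB / (ms / 1000.0));
        }
    }
    return 0;
}
```

The measured rates depend heavily on the GPU, the bus, and the driver, so the absolute numbers from a run of this sketch should not be expected to match any figures quoted for a particular card.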
The body of main() proceeds not unlike what you would expect.
#include "../common/book.h"
Because the up argument to cuda_malloc_test() is true, the previous call tests the performance of copies from host to device, or "up" to the device. To benchmark the calls in the opposite direction, we execute the same calls but with false as the second argument.
We perform the same set of steps to test the performance of cudaHostAlloc(). We call cuda_host_alloc_test() twice, once with up as true and once with it false.
On a GeForce GTX 285, we observed copies from host to device improving from 2.77GB/s to 5.11GB/s when we use pinned memory instead of pageable memory. Copies from the device down to the host improve similarly, from 2.43GB/s to 5.46GB/s. So, for most PCIE bandwidth-limited applications, you will notice a marked improvement when using pinned memory versus standard pageable memory. But page-locked memory is not solely for performance enhancements. As we'll see in the next sections, there are situations where we are required to use page-locked memory.
CUDA Streams
In Chapter 6, we introduced the concept of CUDA events. In doing so, we postponed an in-depth discussion of the second argument to cudaEventRecord(), instead mentioning only that it specified the stream into which we were inserting the event.
cudaEvent_t start;
cudaEventCreate(&start);
cudaEventRecord( start, 0 );
CUDA streams can play an important role in accelerating your applications. A CUDA stream represents a queue of GPU operations that get executed in a specific order. We can add operations such as kernel launches, memory copies, and event starts and stops into a stream. The order in which operations are added to the stream specifies the order in which they will be executed. You can think of each stream as a task on the GPU, and there are opportunities for these tasks to execute in parallel. We'll first see how streams are used, and then we'll look at how you can use streams to accelerate your applications.
Using a Single CUDA Stream
As we'll see later, the real power of streams becomes apparent only when we use more than one of them, but we'll begin to illustrate the mechanics of their use within an application that employs just a single stream. Imagine that we have a CUDA C kernel that will take two input buffers of data, a and b. The kernel will compute some result based on a combination of values in these buffers to produce an output buffer c. Our vector addition example did something along these lines, but in this example we'll compute an average of three values in a and three values in b:
#include "../common/book.h"
#define N (1024*1024)
#define FULL_DATA_SIZE (N*20)
This kernel is not incredibly important, so don't get too hung up on it if you aren't sure exactly what it's supposed to be computing. It's something of a placeholder since the important, stream-related component of this example resides in main().
return 0;
}
The first thing we do is choose a device and check to see whether it supports a feature known as device overlap. A GPU supporting device overlap possesses the capacity to simultaneously execute a CUDA C kernel while performing a copy between device and host memory. As we've promised before, we'll use multiple streams to achieve this overlap of computation and data transfer, but first we'll see how to create and use a single stream. As with all of our examples that aim to measure performance improvements (or regressions), we begin by creating and starting an event timer.
After starting our timer, we create the stream we want to use for this application.
Yeah, that's pretty much all it takes to create a stream. It's not really worth dwelling on, so let's press on to the data allocation.
We have allocated our input and output buffers on both the GPU and the host. Notice that we've decided to use pinned memory on the host by using cudaHostAlloc() to perform the allocations. There is a very good reason for using pinned memory, and it's not strictly because it makes copies faster. We'll see in detail momentarily, but we will be using a new kind of cudaMemcpy() function, and this new function requires that the host memory be page-locked. After allocating the input buffers, we fill the host allocations with random integers using the C library call rand().
With our stream and our timing events created and our device and host buffers allocated, we're ready to perform some computations! Typically we blast through this stage by copying the two input buffers to the GPU, launching our kernel, and copying the output buffer back to the host. We will follow this pattern again, but this time with some small changes.
First, we will opt not to copy the input buffers in their entirety to the GPU. Rather, we will split our inputs into smaller chunks and perform the three-step process on each chunk. That is, we will take some fraction of the input buffers, copy them to the GPU, execute our kernel on that fraction of the buffers, and copy the resulting fraction of the output buffer back to the host. Imagine that we need to do this because our GPU has much less memory than our host does, so the computation needs to be staged in chunks because the entire buffer can't fit on the GPU at once. The code to perform this "chunkified" sequence of computations will look like this:
But you will notice two other unexpected shifts from the norm in the preceding excerpt. First, instead of using the familiar cudaMemcpy(), we're copying the data to and from the GPU with a new routine, cudaMemcpyAsync(). The difference between these functions is subtle yet significant. The original cudaMemcpy() behaves like the C library function memcpy(). Specifically, this function executes synchronously, meaning that when the function returns, the copy has completed, and the output buffer now contains the contents that were supposed to be copied into it.
The opposite of a synchronous function is an asynchronous function, which inspired the name cudaMemcpyAsync(). The call to cudaMemcpyAsync() simply places a request to perform a memory copy into the stream specified by the argument stream. When the call returns, there is no guarantee that the copy has even started yet, much less that it has finished. The guarantee that we have is that the copy will definitely be performed before the next operation placed into the same stream. It is required that any host memory pointers passed to cudaMemcpyAsync() have been allocated by cudaHostAlloc(). That is, you are only allowed to schedule asynchronous copies to or from page-locked memory.
1RWLFHWKDWWKHDQJOHEUDFNHWHGNHUQHOODXQFKDOVRWDNHVDQRSWLRQDOVWUHDP
DUJXPHQW7KLVNHUQHOODXQFKLVDV\QFKURQRXVMXVWOLNHWKHSUHFHGLQJWZR
PHPRU\FRSLHVWRWKH*38DQGWKHWUDLOLQJPHPRU\FRS\EDFNIURPWKH*38
7HFKQLFDOO\ZHFDQHQGDQLWHUDWLRQRIWKLVORRSZLWKRXWKDYLQJDFWXDOO\VWDUWHG
DQ\RIWKHPHPRU\FRSLHVRUNHUQHOH[HFXWLRQ$VZHPHQWLRQHGDOOWKDWZHDUH
JXDUDQWHHGLVWKDWWKHȌUVWFRS\SODFHGLQWRWKHVWUHDPZLOOH[HFXWHEHIRUHWKH
VHFRQGFRS\0RUHRYHUWKHVHFRQGFRS\ZLOOFRPSOHWHEHIRUHWKHNHUQHOVWDUWV
DQGWKHNHUQHOZLOOFRPSOHWHEHIRUHWKHWKLUGFRS\VWDUWV6RDVZHǢYHPHQWLRQHG
HDUOLHULQWKLVFKDSWHUDVWUHDPDFWVMXVWOLNHDQRUGHUHGTXHXHRIZRUNIRUWKH
*38WRSHUIRUP
When the for() loop has terminated, there could still be quite a bit of work
queued up for the GPU to finish. If we would like to guarantee that the GPU is
done with its computations and memory copies, we need to synchronize it with
the host. That is, we basically want to tell the host to sit around and wait
for the GPU to finish before proceeding. We accomplish that by calling
cudaStreamSynchronize() and specifying the stream that we want to wait for.

Since the computations and copies have completed after synchronizing stream
with the host, we can stop our timer, collect our performance data, and free
our input and output buffers.

Finally, before exiting the application, we destroy the stream that we were
using to queue the GPU operations.
return 0;
}
To be honest, this example has done very little to demonstrate the power of
streams. Of course, even using a single stream can help speed up an application
if we have work we want to complete on the host while the GPU is busy churning
through the work we've stuffed into a stream. But assuming that we don't have
much to do on the host, we can still speed up applications by using streams,
and in the next section, we'll take a look at how this can be accomplished.
Using Multiple CUDA Streams
Let's adapt the single-stream example from the section "Using a Single CUDA
Stream" to perform its work in two different streams. At the beginning of the
previous example, we checked that the device indeed supported overlap and
broke the computation into chunks. The idea underlying the improved version of
this application is simple and relies on two things: the "chunked" computation
and the overlap of memory copies with kernel execution. We endeavor to get
stream 1 to copy its input buffers to the GPU while stream 0 is executing its
kernel. Then stream 1 will execute its kernel while stream 0 copies its results
to the host. Stream 1 will then copy its results to the host while stream 0
begins executing its kernel on the next chunk of data. Assuming that our memory
copies and kernel executions take roughly the same amount of time, our
application's execution timeline might look something like the accompanying
timeline figure. The figure assumes that the GPU can perform a memory copy and
a kernel execution at the same time, so empty boxes represent time when one
stream is waiting to execute an operation that it cannot overlap with the other
stream's operation. Note also that calls to cudaMemcpyAsync() are abbreviated
in the remaining figures in this chapter, represented simply as "memcpy."

In fact, the execution timeline can be even more favorable than this; some
newer NVIDIA GPUs support simultaneous kernel execution and two memory copies,
one to the device and one from the device. But on any device that supports the
overlap of memory copies and kernel execution, the overall application should
accelerate when we use multiple streams.

Despite these grand plans to accelerate our application, the computation kernel
will remain unchanged.
#include "../common/book.h"
#define N (1024*1024)
#define FULL_DATA_SIZE (N*20)
As with the single-stream version, we will check that the device supports
overlapping computation with memory copy. If the device does support overlap,
we proceed as we did before by creating CUDA events to time the application.
    if (!prop.deviceOverlap) {
        printf( "Device will not handle overlaps, so no "
                "speed up from streams\n" );
        return 0;
    }
Next, we create our two streams exactly as we created the single stream in the
previous section's version of the code.

We will assume that we still have two input buffers and a single output buffer
on the host. The input buffers are filled with random data exactly as they were
in the single-stream version of this application. However, now that we intend
to use two streams to process the data, we allocate two identical sets of GPU
buffers so that each stream can independently work on chunks of the input.

We then loop over the chunks of input exactly as we did in the first attempt at
this application. But now that we're using two streams, we process twice as
much data in each iteration of the for() loop. In stream0, we queue
asynchronous copies of a and b to the GPU, queue a kernel execution, and then
queue a copy back to c.

After queuing these operations in stream0, we queue identical operations on the
next chunk of data, but this time in stream1.

And so our for() loop proceeds, alternating the streams to which it queues each
chunk of data until it has queued every piece of input data for processing.
After terminating the for() loop, we synchronize the GPU with the CPU before we
stop our application timers. Since we are working in two streams, we need to
synchronize both.
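The two-stream loop described above can be sketched as follows. This is an illustrative reconstruction rather than the book's listing: the kernel average_pairs(), the helper check(), and the function name run_depth_first() are assumptions, and the two streams are kept in a small array so that index 0 plays the role of stream0 and index 1 the role of stream1.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N (1024 * 1024)
#define FULL_DATA_SIZE (N * 20)

// Hypothetical stand-in for the HANDLE_ERROR macro from book.h.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical kernel: one thread per element of an N-element chunk.
__global__ void average_pairs(const int *a, const int *b, int *c) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < N) c[idx] = (a[idx] + b[idx]) / 2;
}

// Queues ALL of one stream's work for a chunk before touching the other
// stream (depth-first). Returns the number of wrong outputs.
int run_depth_first(void) {
    const size_t chunk_bytes = N * sizeof(int);
    const size_t full_bytes = FULL_DATA_SIZE * sizeof(int);
    cudaStream_t stream[2];
    int *dev_a[2], *dev_b[2], *dev_c[2];
    int *host_a, *host_b, *host_c;

    for (int s = 0; s < 2; s++) {  // one set of GPU buffers per stream
        check(cudaStreamCreate(&stream[s]));
        check(cudaMalloc((void **)&dev_a[s], chunk_bytes));
        check(cudaMalloc((void **)&dev_b[s], chunk_bytes));
        check(cudaMalloc((void **)&dev_c[s], chunk_bytes));
    }
    check(cudaHostAlloc((void **)&host_a, full_bytes, cudaHostAllocDefault));
    check(cudaHostAlloc((void **)&host_b, full_bytes, cudaHostAllocDefault));
    check(cudaHostAlloc((void **)&host_c, full_bytes, cudaHostAllocDefault));
    for (int i = 0; i < FULL_DATA_SIZE; i++) {
        host_a[i] = rand() % 1000;
        host_b[i] = rand() % 1000;
    }

    // Each iteration consumes two chunks: one for stream 0, one for stream 1.
    for (int i = 0; i < FULL_DATA_SIZE; i += 2 * N) {
        for (int s = 0; s < 2; s++) {
            int off = i + s * N;
            check(cudaMemcpyAsync(dev_a[s], host_a + off, chunk_bytes,
                                  cudaMemcpyHostToDevice, stream[s]));
            check(cudaMemcpyAsync(dev_b[s], host_b + off, chunk_bytes,
                                  cudaMemcpyHostToDevice, stream[s]));
            average_pairs<<<N / 256, 256, 0, stream[s]>>>(dev_a[s], dev_b[s],
                                                          dev_c[s]);
            check(cudaMemcpyAsync(host_c + off, dev_c[s], chunk_bytes,
                                  cudaMemcpyDeviceToHost, stream[s]));
        }
    }
    for (int s = 0; s < 2; s++) check(cudaStreamSynchronize(stream[s]));

    int mismatches = 0;
    for (int i = 0; i < FULL_DATA_SIZE; i++)
        if (host_c[i] != (host_a[i] + host_b[i]) / 2) mismatches++;

    for (int s = 0; s < 2; s++) {
        check(cudaFree(dev_a[s]));
        check(cudaFree(dev_b[s]));
        check(cudaFree(dev_c[s]));
        check(cudaStreamDestroy(stream[s]));
    }
    check(cudaFreeHost(host_a));
    check(cudaFreeHost(host_b));
    check(cudaFreeHost(host_c));
    return mismatches;
}
```

The inner loop over s is what makes this ordering depth-first: all four operations for one stream are queued before the other stream receives any work.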
We wrap up main() the same way we concluded our single-stream implementation.
We stop our timers, display the elapsed time, and clean up after ourselves. Of
course, we remember that we now need to destroy two streams and free twice as
many GPU buffers, but aside from that, this code is identical to what we've
seen already.
return 0;
}
We benchmarked both the original, single-stream implementation from the section
"Using a Single CUDA Stream" and the improved double-stream version on a
GeForce GTX … GPU. The original version takes … ms to run to completion. After
modifying it to use two streams, it takes … ms.

Uh-oh.

Well, the good news is that this is the reason we bother to time our
applications. Sometimes, our most well-intended performance "enhancements" do
nothing more than introduce unnecessary complications to the code.

But why didn't this application get any faster? We even said that it would get
faster! Don't lose hope yet, though, because we actually can accelerate the
single-stream version with a second stream, but we need to understand a bit
more about how streams are handled by the CUDA driver in order to reap the
rewards of device overlap. To understand how streams work behind the scenes,
we'll need to look at both the CUDA driver and how the CUDA hardware
architecture works.
GPU Work Scheduling
Although streams are logically independent queues of operations to be executed
on the GPU, it turns out that this abstraction does not exactly match the GPU's
queuing mechanism. As programmers, we think about our streams as ordered
sequences of operations composed of a mixture of memory copies and kernel
invocations. However, the hardware has no notion of streams. Rather, it has one
or more engines to perform memory copies and an engine to execute kernels.
These engines queue commands independently from each other, resulting in a
task-scheduling scenario like the one shown in the accompanying figure. The
arrows in the figure illustrate how operations that have been queued into
streams get scheduled on the hardware engines that actually execute them.

So, the user and the hardware have somewhat orthogonal notions of how to queue
GPU work, and the burden of keeping both the user and hardware sides of this
equation happy falls on the CUDA driver. First and foremost, there are
important dependencies specified by the order in which operations are added to
streams. For example, in the figure, a stream's memory copy of A needs to be
completed before its memory copy of B, which in turn needs to be completed
before kernel A is launched. But once these operations are placed into the
hardware's copy engine and kernel engine queues, these dependencies are lost,
so the CUDA driver needs to keep everyone happy by ensuring that the
intrastream dependencies remain satisfied by the hardware's execution units.
What does this mean to us? Well, let's look at what's actually happening with
our example in the section "Using Multiple CUDA Streams." If we review the
code, we see that our application basically amounts to a cudaMemcpyAsync() of
a, cudaMemcpyAsync() of b, our kernel execution, and then a cudaMemcpyAsync()
of c back to the host. The application enqueues all the operations from stream
0 followed by all the operations from stream 1. The CUDA driver schedules these
operations on the hardware for us in the order they were specified, keeping the
interengine dependencies straight. These dependencies are illustrated in the
accompanying figure, where an arrow from a copy to a kernel indicates that the
copy depends on the kernel completing execution before it can begin.

Given our newfound understanding of how the GPU schedules work, we can look at
a timeline of how these get executed on the hardware in the accompanying
figure.

Because stream 0's copy of c back to the host depends on its kernel execution
completing, stream 1's completely independent copies of a and b to the GPU get
blocked because the GPU's engines execute work in the order it's provided. This
inefficiency explains why the two-stream version of our application showed
absolutely no speedup. The lack of improvement is a direct result of our
assumption that the hardware works in the same manner as the CUDA stream
programming model implies.
The moral of this story is that we as programmers need to help out when it
comes to ensuring that independent streams actually get executed in parallel.
Keeping in mind that the hardware has independent engines that handle memory
copies and kernel executions, we need to remain aware that the order in which
we enqueue these operations in our streams will affect the way in which the
CUDA driver schedules these for execution. In the next section, we'll see how
to help the hardware achieve overlap of memory copies and kernel execution.
Using Multiple CUDA Streams Effectively
As we saw in the previous section, if we schedule all of a particular stream's
operations at once, it's very easy to inadvertently block the copies or kernel
executions of another stream. To alleviate this problem, it suffices to enqueue
our operations breadth-first across streams rather than depth-first. That is,
rather than add the copy of a, copy of b, kernel execution, and copy of c to
stream 0 before starting to schedule on stream 1, we bounce back and forth
between the streams assigning work. We add the copy of a to stream 0, and then
we add the copy of a to stream 1. Then we add the copy of b to stream 0, and
then we add the copy of b to stream 1. We enqueue the kernel invocation in
stream 0, and then we enqueue one in stream 1. Finally, we enqueue the copy of
c back to the host in stream 0 followed by the copy of c in stream 1.

To make this more concrete, let's take a look at the code. All we've changed is
the order in which operations get assigned to each of our two streams, so this
will be strictly a copy-and-paste optimization. Everything else in the
application will remain unchanged, which means that our improvements are
localized to the for() loop. The new, breadth-first assignment to the two
streams looks like this:
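A minimal sketch of a breadth-first loop of this kind follows. As before, it is an illustration and not the book's listing: average_pairs(), check(), and run_breadth_first() are assumed names, and the streams are indexed 0 and 1 in a small array.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N (1024 * 1024)
#define FULL_DATA_SIZE (N * 20)

// Hypothetical stand-in for the HANDLE_ERROR macro from book.h.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical kernel: one thread per element of an N-element chunk.
__global__ void average_pairs(const int *a, const int *b, int *c) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < N) c[idx] = (a[idx] + b[idx]) / 2;
}

// Queues each kind of operation on stream 0 and then stream 1 before moving
// on to the next kind (breadth-first). Returns the number of wrong outputs.
int run_breadth_first(void) {
    const size_t chunk_bytes = N * sizeof(int);
    const size_t full_bytes = FULL_DATA_SIZE * sizeof(int);
    cudaStream_t stream[2];
    int *dev_a[2], *dev_b[2], *dev_c[2];
    int *host_a, *host_b, *host_c;

    for (int s = 0; s < 2; s++) {
        check(cudaStreamCreate(&stream[s]));
        check(cudaMalloc((void **)&dev_a[s], chunk_bytes));
        check(cudaMalloc((void **)&dev_b[s], chunk_bytes));
        check(cudaMalloc((void **)&dev_c[s], chunk_bytes));
    }
    check(cudaHostAlloc((void **)&host_a, full_bytes, cudaHostAllocDefault));
    check(cudaHostAlloc((void **)&host_b, full_bytes, cudaHostAllocDefault));
    check(cudaHostAlloc((void **)&host_c, full_bytes, cudaHostAllocDefault));
    for (int i = 0; i < FULL_DATA_SIZE; i++) {
        host_a[i] = rand() % 1000;
        host_b[i] = rand() % 1000;
    }

    for (int i = 0; i < FULL_DATA_SIZE; i += 2 * N) {
        for (int s = 0; s < 2; s++)  // copies of a: stream 0, then stream 1
            check(cudaMemcpyAsync(dev_a[s], host_a + i + s * N, chunk_bytes,
                                  cudaMemcpyHostToDevice, stream[s]));
        for (int s = 0; s < 2; s++)  // copies of b
            check(cudaMemcpyAsync(dev_b[s], host_b + i + s * N, chunk_bytes,
                                  cudaMemcpyHostToDevice, stream[s]));
        for (int s = 0; s < 2; s++)  // kernel launches
            average_pairs<<<N / 256, 256, 0, stream[s]>>>(dev_a[s], dev_b[s],
                                                          dev_c[s]);
        for (int s = 0; s < 2; s++)  // copies of c back to the host
            check(cudaMemcpyAsync(host_c + i + s * N, dev_c[s], chunk_bytes,
                                  cudaMemcpyDeviceToHost, stream[s]));
    }
    for (int s = 0; s < 2; s++) check(cudaStreamSynchronize(stream[s]));

    int mismatches = 0;
    for (int i = 0; i < FULL_DATA_SIZE; i++)
        if (host_c[i] != (host_a[i] + host_b[i]) / 2) mismatches++;

    for (int s = 0; s < 2; s++) {
        check(cudaFree(dev_a[s]));
        check(cudaFree(dev_b[s]));
        check(cudaFree(dev_c[s]));
        check(cudaStreamDestroy(stream[s]));
    }
    check(cudaFreeHost(host_a));
    check(cudaFreeHost(host_b));
    check(cudaFreeHost(host_c));
    return mismatches;
}
```

The results are the same as with a depth-first ordering; only the order in which work reaches the copy and kernel engines changes. Whether it runs faster depends on the device and driver, so it is worth timing both orderings on the target hardware.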
If we assume that our memory copies and kernel executions are roughly
comparable in execution time, our new execution timeline will look like the
accompanying figure. The interengine dependencies are highlighted with arrows
simply to illustrate that they are still satisfied with this new scheduling
order.

Because we have queued our operations breadth-first across streams, we no
longer have stream 0's copy of c blocking stream 1's initial memory copies of a
and b. This allows the GPU to execute copies and kernels in parallel, allowing
our application to run significantly faster. The new code runs in … ms, a …
percent improvement over our original, naïve double-stream implementation. For
applications that can overlap nearly all computation and memory copies, you can
approach a nearly twofold improvement in performance because the copy and
kernel engines will be cranking the entire time.
Chapter Review
In this chapter, we looked at a method for achieving a kind of task-level
parallelism in CUDA C applications. By using two (or more) CUDA streams, we can
allow the GPU to simultaneously execute a kernel while performing a copy
between the host and GPU. We need to be careful about two things when we
endeavor to do this, though. First, the host memory involved needs to be
allocated using cudaHostAlloc(), since we will queue our memory copies with
cudaMemcpyAsync(), and asynchronous copies need to be performed with pinned
buffers. Second, we need to be aware that the order in which we add operations
to our streams will affect our capacity to achieve overlapping of copies and
kernel executions. The general guideline involves a breadth-first, or
round-robin, assignment of work to the streams you intend to use. This can be
counterintuitive if you don't understand how the hardware queuing works, so
it's a good thing to remember when you go about writing your own applications.
There is an old saying that goes something like this: "The only thing better
than computing on a GPU is computing on two GPUs." Systems containing multiple
graphics processors have become more and more common in recent years. Of
course, in some ways multi-GPU systems are similar to multi-CPU systems in that
they are still far from the common system configuration, but it has gotten
quite easy to end up with more than one GPU in your system. Products such as
the GeForce GTX … contain two GPUs on a single card. NVIDIA's Tesla S… contains
a whopping four CUDA-capable graphics processors in it. Systems built around a
recent NVIDIA chipset will have an integrated, CUDA-capable GPU on the
motherboard. Adding a discrete NVIDIA GPU in one of the PCI Express slots will
make this system multi-GPU. Neither of these scenarios is very farfetched, so
we would be best served by learning to exploit the resources of a system with
multiple GPUs in it.
Chapter Objectives
Through the course of this chapter, you will accomplish the following:
• You will learn how to allocate and use zero-copy memory.
• You will learn how to use multiple GPUs within the same application.
• You will learn how to allocate and use portable pinned memory.
Zero-Copy Host Memory
In the previous chapter, we examined pinned or page-locked memory, a new type
of host memory that came with the guarantee that the buffer would never be
swapped out of physical memory. If you recall, we allocated this memory by
making a call to cudaHostAlloc() and passing cudaHostAllocDefault to get
default, pinned memory. We promised that in the next chapter, you would see
other more exciting means by which you can allocate pinned memory. Assuming
that this is the only reason you've continued reading, you will be glad to know
that the wait is over. The flag cudaHostAllocMapped can be passed instead of
cudaHostAllocDefault. The host memory allocated using cudaHostAllocMapped is
pinned in the same sense that memory allocated with cudaHostAllocDefault is
pinned, specifically that it cannot be paged out of or relocated within
physical memory. But in addition to using this memory from the host for memory
copies to and from the GPU, this new kind of host memory allows us to violate
one of the first rules we presented concerning host memory: We can access this
host memory directly from within CUDA C kernels. Because this memory does not
require copies to and from the GPU, we refer to it as zero-copy memory.
ZERO-COPY DOT PRODUCT
Typically, our GPU accesses only GPU memory, and our CPU accesses only host
memory. But in some circumstances, it's better to break these rules. To see an
instance where it's better to have the GPU manipulate host memory, we'll
revisit our favorite reduction: the vector dot product. If you've managed to
read this entire book, you may recall our first attempt at the dot product. We
copied the two input vectors to the GPU, performed the computation, copied the
intermediate results back to the host, and completed the computation on the
CPU.

In this version, we'll skip the explicit copies of our input up to the GPU and
instead use zero-copy memory to access the data directly from the GPU. This
version of dot product will be set up exactly like our pinned memory test.
Specifically, we'll write two functions; one will perform the test with
standard host memory, and the other will finish the reduction on the GPU using
zero-copy memory to hold the input and output buffers. First, let's take a look
at the standard host memory version of the dot product. We start in the usual
fashion by creating timing events, allocating input and output buffers, and
filling our input buffers with data.

After the allocations and data creation, we can begin the computations. We
start our timer, copy our inputs to the GPU, execute the dot product kernel,
and copy the partial results back to the host.
// copy the array 'c' back from the GPU to the CPU
HANDLE_ERROR( cudaMemcpy( partial_c, dev_partial_c,
blocksPerGrid*sizeof(float),
cudaMemcpyDeviceToHost ) );
Now we need to finish up our computations on the CPU as we did in our original
dot product example. Before doing this, we'll stop our event timer because it
only measures work that's being performed on the GPU.

Finally, we sum our partial results and free our input and output buffers.
// free events
HANDLE_ERROR( cudaEventDestroy( start ) );
HANDLE_ERROR( cudaEventDestroy( stop ) );
return elapsedTime;
}
The version that uses zero-copy memory will be remarkably similar, with the
exception of memory allocation. So, we start by allocating our input and
output, filling the input memory with data as before.

As in the previous chapter, we see cudaHostAlloc() in action again, although
we're now using the flags argument to specify more than just default behavior.
The flag cudaHostAllocMapped tells the runtime that we intend to access this
buffer from the GPU. In other words, this flag is what makes our buffer
zero-copy. For the two input buffers, we specify the flag
cudaHostAllocWriteCombined. This flag indicates that the runtime should
allocate the buffer as write-combined with respect to the CPU cache. This flag
will not change functionality in our application but represents an important
performance enhancement for buffers that will be read only by the GPU. However,
write-combined memory can be extremely inefficient in scenarios where the CPU
also needs to perform reads from the buffer, so you will have to consider your
application's likely access patterns when making this decision.

Since we've allocated our host memory with the flag cudaHostAllocMapped, the
buffers can be accessed from the GPU. However, the GPU has a different virtual
memory space than the CPU, so the buffers will have different addresses when
they're accessed on the GPU as compared to the CPU. The call to cudaHostAlloc()
returns the CPU pointer for the memory, so we need to call
cudaHostGetDevicePointer() in order to get a valid GPU pointer for the memory.
These pointers will be passed to the kernel and then used by the GPU to read
from and write to our host allocations.
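The allocation and pointer-translation steps can be sketched as follows. This is a simplified illustration and not the book's dot product listing: the kernel square_into(), the helper check(), and the function zero_copy_demo() are assumed names, and the sketch synchronizes with cudaDeviceSynchronize() before the host reads the output buffer. It also calls cudaSetDeviceFlags() up front, which the zero-copy path expects.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the HANDLE_ERROR macro from book.h.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical kernel: reads and writes host memory through mapped pointers.
__global__ void square_into(int n, const float *in, float *out) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    for (; tid < n; tid += blockDim.x * gridDim.x)
        out[tid] = in[tid] * in[tid];
}

// Returns the number of wrong outputs, or -1 if device 0 cannot map host
// memory.
int zero_copy_demo(int n) {
    cudaDeviceProp prop;
    check(cudaGetDeviceProperties(&prop, 0));
    if (!prop.canMapHostMemory) return -1;
    check(cudaSetDeviceFlags(cudaDeviceMapHost));

    float *in, *out;          // CPU addresses of the pinned buffers
    float *dev_in, *dev_out;  // GPU addresses of the same buffers
    // The input is only written by the CPU, so it is also write-combined.
    check(cudaHostAlloc((void **)&in, n * sizeof(float),
                        cudaHostAllocWriteCombined | cudaHostAllocMapped));
    check(cudaHostAlloc((void **)&out, n * sizeof(float),
                        cudaHostAllocMapped));
    for (int i = 0; i < n; i++) in[i] = (float)(i % 100);

    // Translate each CPU pointer into a pointer the kernel can dereference.
    check(cudaHostGetDevicePointer((void **)&dev_in, in, 0));
    check(cudaHostGetDevicePointer((void **)&dev_out, out, 0));

    square_into<<<64, 256>>>(n, dev_in, dev_out);
    check(cudaGetLastError());
    // The output buffer is only safe to read after the kernel has finished.
    check(cudaDeviceSynchronize());

    int mismatches = 0;
    for (int i = 0; i < n; i++) {
        float v = (float)(i % 100);  // recomputed, to avoid reading 'in'
        if (out[i] != v * v) mismatches++;
    }
    check(cudaFreeHost(in));
    check(cudaFreeHost(out));
    return mismatches;
}
```

Note that no cudaMalloc() or cudaMemcpy() appears anywhere: the kernel works on host memory directly. The verification loop recomputes the expected inputs instead of reading them back, since CPU reads from write-combined memory are slow.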
With valid device pointers in hand, we're ready to start our timer and launch
our kernel.

Even though the pointers dev_a, dev_b, and dev_partial_c all reside on the
host, they will look to our kernel as if they are GPU memory, thanks to our
calls to cudaHostGetDevicePointer(). Since our partial results are already on
the host, we don't need to bother with a cudaMemcpy() from the device. However,
you will notice that we're synchronizing the CPU with the GPU by calling
cudaThreadSynchronize() (deprecated in later CUDA releases in favor of
cudaDeviceSynchronize()). The contents of zero-copy memory are undefined during
the execution of a kernel that potentially makes changes to its contents. After
synchronizing, we're sure that the kernel has completed and that our zero-copy
buffer contains the results, so we can stop our timer and finish the
computation on the CPU as we did before.

The only thing remaining in the cudaHostAlloc() version of the dot product is
cleanup.
HANDLE_ERROR( cudaFreeHost( a ) );
HANDLE_ERROR( cudaFreeHost( b ) );
HANDLE_ERROR( cudaFreeHost( partial_c ) );
// free events
HANDLE_ERROR( cudaEventDestroy( start ) );
HANDLE_ERROR( cudaEventDestroy( stop ) );
return elapsedTime;
}
You will notice that no matter what flags we use with cudaHostAlloc(), the
memory always gets freed in the same way. Specifically, a call to
cudaFreeHost() does the trick.

And that's that! All that remains is to look at how main() ties all of this
together. The first thing we need to check is whether our device supports
mapping host memory. We do this the same way we checked for device overlap in
the previous chapter, with a call to cudaGetDeviceProperties().

Assuming that our device supports zero-copy memory, we place the runtime into a
state where it will be able to allocate zero-copy buffers for us. We accomplish
this by a call to cudaSetDeviceFlags() and by passing the flag
cudaDeviceMapHost to indicate that we want the device to be allowed to map host
memory.
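These two steps can be sketched as a small helper. The function name enable_zero_copy() is an assumption made for illustration; the property canMapHostMemory and the calls cudaSetDevice() and cudaSetDeviceFlags() are the ones named in this chapter.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns true if 'device' can map host memory and the
// runtime was successfully asked to allow mapped (zero-copy) allocations.
bool enable_zero_copy(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    if (prop.canMapHostMemory != 1) {
        printf("Device %d cannot map memory.\n", device);
        return false;
    }
    // Select the device, then request host-memory mapping on it.
    if (cudaSetDevice(device) != cudaSuccess) return false;
    return cudaSetDeviceFlags(cudaDeviceMapHost) == cudaSuccess;
}
```

A caller would typically invoke enable_zero_copy(0) once, early in main(), and fall back to the cudaMalloc() and cudaMemcpy() path if it returns false.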
That's really all there is to main(). We run our two tests, display the elapsed
time, and exit the application.
elapsedTime = cuda_host_alloc_test( N );
printf( "Time using cudaHostAlloc: %3.1f ms\n",
elapsedTime );
}
The kernel itself is unchanged from our earlier dot product example, but for
the sake of completeness, here it is in its entirety:
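__global__ void dot( int size, float *a, float *b, float *c ) {
    __shared__ float cache[threadsPerBlock];
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    int cacheIndex = threadIdx.x;

    float temp = 0;
    while (tid < size) {
        temp += a[tid] * b[tid];
        tid += blockDim.x * gridDim.x;
    }

    // store this thread's running sum, then wait for the whole block
    cache[cacheIndex] = temp;
    __syncthreads();

    // tree reduction in shared memory; threadsPerBlock must be a power of 2
    int i = blockDim.x / 2;
    while (i != 0) {
        if (cacheIndex < i)
            cache[cacheIndex] += cache[cacheIndex + i];
        __syncthreads();
        i /= 2;
    }

    if (cacheIndex == 0)
        c[blockIdx.x] = cache[0];
}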
ZERO-COPY PERFORMANCE
What should we expect to gain from using zero-copy memory? The answer to this
question is different for discrete GPUs and integrated GPUs. Discrete GPUs are
graphics processors that have their own dedicated DRAMs and typically sit on
separate circuit boards from the CPU. For example, if you have ever installed a
graphics card into your desktop, this GPU is a discrete GPU. Integrated GPUs
are graphics processors built into a system's chipset and usually share regular
system memory with the CPU. Many recent systems built with NVIDIA's nForce
media and communications processors (MCPs) contain CUDA-capable integrated
GPUs. In addition to nForce MCPs, all the netbook, notebook, and desktop
computers based on NVIDIA's new ION platform contain integrated, CUDA-capable
GPUs. For integrated GPUs, the use of zero-copy memory is always a performance
win because the memory is physically shared with the host anyway. Declaring a
buffer as zero-copy has the sole effect of preventing unnecessary copies of
data. But remember that nothing is free and that zero-copy buffers are still
constrained in the same way that all pinned memory allocations are constrained:
Each pinned allocation carves into the system's available physical memory,
which will eventually degrade system performance.

In cases where inputs and outputs are used exactly once, we will even see a
performance enhancement when using zero-copy memory with a discrete GPU. Since
GPUs are designed to excel at hiding the latencies associated with memory
access, performing reads and writes over the PCI Express bus can be mitigated
to some degree by this mechanism, yielding a noticeable performance advantage.
But since the zero-copy memory is not cached on the GPU, in situations where
the memory gets read multiple times, we will end up paying a large penalty that
could be avoided by simply copying the data to the GPU first.

How do you determine whether a GPU is integrated or discrete? Well, you can
open up your computer and look, but this solution is fairly unworkable for your
CUDA C application. Your code can check this property of a GPU by, not
surprisingly, looking at the structure returned by cudaGetDeviceProperties().
This structure has a field named integrated, which will be true if the device
is an integrated GPU and false if it's not.
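A short sketch of this check follows. The helper names gpu_is_integrated() and prefer_zero_copy() are assumptions made for illustration, and the second function simply encodes the rule of thumb from this section (integrated GPU, or buffers touched exactly once) rather than a measured policy.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns 1 for an integrated GPU, 0 for a discrete GPU, -1 if the query fails.
int gpu_is_integrated(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;
    return prop.integrated ? 1 : 0;
}

// Hypothetical policy helper: zero-copy is attractive on integrated GPUs, and
// on discrete GPUs when each buffer is read or written exactly once.
bool prefer_zero_copy(int device, bool buffers_touched_once) {
    int kind = gpu_is_integrated(device);
    if (kind < 0) return false;  // unknown device: stay with explicit copies
    return kind == 1 || buffers_touched_once;
}
```

Any real decision should still be confirmed by timing both variants on the target system, as the measurements that follow illustrate.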
Since our dot product application satisfies the "read and/or write exactly
once" constraint, it's possible that it will enjoy a performance boost when run
with zero-copy memory. And in fact, it does enjoy a slight boost in
performance. On a GeForce GTX …, the execution time improves by more than …
percent, dropping from … ms to … ms when migrated to zero-copy memory. A
GeForce GTX … enjoys a similar improvement, speeding up by … percent from … ms
to … ms. Of course, different GPUs will exhibit different performance
characteristics because of varying ratios of computation to bandwidth, as well
as because of variations in effective PCI Express bandwidth across chipsets.
Using Multiple GPUs
In the previous section, we mentioned how devices are either integrated or
discrete GPUs, where the former is built into the system's chipset and the
latter is typically an expansion card in a PCI Express slot. More and more
systems contain both integrated and discrete GPUs, meaning that they also have
multiple CUDA-capable processors. NVIDIA also sells products, such as the
GeForce GTX …, that contain more than one GPU. A GeForce GTX …, while
physically occupying a single expansion slot, will appear to your CUDA
applications as two separate GPUs. Furthermore, users can also add multiple
GPUs to separate PCI Express slots, connecting them with bridges using NVIDIA's
scalable link interface (SLI) technology. As a result of these trends, it has
become relatively common to have a CUDA application running on a system with
multiple graphics processors. Since our CUDA applications tend to be very
parallelizable to begin with, it would be excellent if we could use every CUDA
device in the system to achieve maximum throughput. So, let's figure out how we
can accomplish this.

To avoid learning a new example, let's convert our dot product to use multiple
GPUs. To make our lives easier, we will summarize all the data necessary to
compute a dot product in a single structure. You'll see momentarily exactly why
this will make our lives easier.
struct DataStruct {
int deviceID;
int size;
float *a;
float *b;
float returnValue;
};
This structure contains the identification for the device on which the dot
product will be computed; it contains the size of the input buffers as well as
pointers to the two inputs a and b. Finally, it has an entry to store the value
computed as the dot product of a and b.

To use N GPUs, we first would like to know exactly what value of N we're
dealing with. So, we start our application with a call to cudaGetDeviceCount()
in order to determine how many CUDA-capable processors have been installed in
our system.
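A minimal sketch of this query follows. The wrapper name count_cuda_devices() is an assumption made for illustration; it reports zero devices when the query itself fails, for example when no CUDA driver is present.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical wrapper: number of CUDA-capable devices, or 0 on any error.
int count_cuda_devices(void) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess) return 0;
    return deviceCount;
}
```

A multi-GPU program can then compare the result against the number of devices it needs, for instance refusing to continue when count_cuda_devices() is less than 2.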
This example is designed to show multi-GPU usage, so you'll notice that we
simply exit if the system has only one CUDA device (not that there's anything
wrong with that). This is not encouraged as a best practice for obvious
reasons. To keep things as simple as possible, we'll allocate standard host
memory for our inputs and fill them with data exactly how we've done in the
past.

We're now ready to dive into the multi-GPU code. The trick to using multiple
GPUs with the CUDA runtime API is realizing that each GPU needs to be
controlled by a different CPU thread. Since we have used only a single GPU
before, we haven't needed to worry about this. We have moved a lot of the
annoyance of multithreaded code to our file of auxiliary code, book.h. With
this code tucked away, all we need to do is fill a structure with data
necessary to perform the computations. Although the system could have any
number of GPUs greater than one, we will use only two of them for clarity:
DataStruct data[2];
data[0].deviceID = 0;
data[0].size = N/2;
data[0].a = a;
data[0].b = b;
data[1].deviceID = 1;
data[1].size = N/2;
data[1].a = a + N/2;
data[1].b = b + N/2;
To proceed, we pass one of the DataStruct variables to a utility function we've
named start_thread(). We also pass start_thread() a pointer to a function to be
called by the newly created thread; this example's thread function is called
routine(). The function start_thread() will create a new thread that then calls
the specified function, passing the DataStruct to this function. The other call
to routine() gets made from the default application thread (so we've created
only one additional thread).

Before we proceed, we have the main application thread wait for the other
thread to finish by calling end_thread().
end_thread( thread );
Since both threads have completed at this point in main(), it's safe to clean
up and display the result.
free( a );
free( b );
return 0;
}
Notice that we sum the results computed by each thread. This is the last step
in our dot product reduction. In another algorithm, this combination of
multiple results may involve other steps. In fact, in some applications, the
two GPUs may be executing completely different code on completely different
data sets. For simplicity's sake, this is not the case in our dot product
example.

Since the dot product routine is identical to the other versions you've seen,
we'll omit it from this section. However, the contents of routine() may be of
interest. We declare routine() as taking and returning a void* so that you can
reuse the start_thread() code with arbitrary implementations of a thread
function. Although we'd love to take credit for this idea, it's fairly standard
procedure for callback functions in C.
Each thread calls cudaSetDevice(), and each passes a different ID to this
function. As a result, we know each thread will be manipulating a different
GPU. These GPUs may have identical performance, as with the dual-GPU GeForce
GTX …, or they may be different GPUs as would be the case in a system that has
both an integrated GPU and a discrete GPU. These details are not important to
our application, though they might be of interest to you. Particularly, these
details prove useful if you depend on a certain minimum compute capability to
launch your kernels or if you have a serious desire to load balance your
application across the system's GPUs. If the GPUs are different, you will need
to do some work to partition the computations so that each GPU is occupied for
roughly the same amount of time. For our purposes in this example, however,
these are piddling details with which we won't worry.

Outside the call to cudaSetDevice() to specify which CUDA device we intend to
use, this implementation of routine() is remarkably similar to the vanilla
malloc_test() from the section "Zero-Copy Dot Product." We allocate buffers for
our GPU copies of the input and a buffer for our partial results, followed by a
cudaMemcpy() of each input array to the GPU.

We then launch our dot product kernel, copy the results back, and finish the
computation on the CPU.

As usual, we clean up our GPU buffers and return the dot product we've computed
in the returnValue field of our DataStruct.
data->returnValue = c;
return 0;
}
So when we get down to it, outside of the host thread management issue, using
multiple GPUs is not too much tougher than using a single GPU. Using our helper
code to create a thread and execute a function on that thread, this becomes
significantly more manageable. If you have your own thread libraries, you
should feel free to use them in your own applications. You just need to
remember that each GPU gets its own thread, and everything else is cream
cheese.
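The one-thread-per-GPU pattern can be sketched with a different thread library, here C++ std::thread instead of the start_thread() and end_thread() helpers from book.h. Several details are assumptions made to keep the sketch short and runnable: the kernel dot_atomic() replaces the shared-memory reduction with a single atomicAdd() per thread, the inputs are constant so the expected result is exact, and on a system with only one GPU both threads fall back to device 0.

```cuda
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

struct DataStruct {
    int deviceID;
    int size;
    float *a;
    float *b;
    float returnValue;
};

// Hypothetical stand-in for the HANDLE_ERROR macro from book.h.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Simplified dot product: each thread accumulates a private sum and adds it
// to one global result (not the book's shared-memory reduction).
__global__ void dot_atomic(int size, const float *a, const float *b,
                           float *result) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    float temp = 0;
    for (; tid < size; tid += blockDim.x * gridDim.x) temp += a[tid] * b[tid];
    atomicAdd(result, temp);
}

// Thread body: bind this CPU thread to its GPU, then compute its share.
void routine(DataStruct *data) {
    check(cudaSetDevice(data->deviceID));
    size_t bytes = data->size * sizeof(float);
    float *dev_a, *dev_b, *dev_sum;
    check(cudaMalloc((void **)&dev_a, bytes));
    check(cudaMalloc((void **)&dev_b, bytes));
    check(cudaMalloc((void **)&dev_sum, sizeof(float)));
    check(cudaMemcpy(dev_a, data->a, bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(dev_b, data->b, bytes, cudaMemcpyHostToDevice));
    check(cudaMemset(dev_sum, 0, sizeof(float)));
    dot_atomic<<<32, 256>>>(data->size, dev_a, dev_b, dev_sum);
    check(cudaMemcpy(&data->returnValue, dev_sum, sizeof(float),
                     cudaMemcpyDeviceToHost));
    check(cudaFree(dev_a));
    check(cudaFree(dev_b));
    check(cudaFree(dev_sum));
}

// Splits an n-element dot product of constant vectors across two threads.
float multi_gpu_dot(int n) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 1) return -1.0f;
    std::vector<float> a(n, 1.0f), b(n, 2.0f);
    DataStruct data[2] = {
        {0, n / 2, a.data(), b.data(), 0.0f},
        // Assumption: reuse device 0 when the system has only one GPU.
        {1 % count, n / 2, a.data() + n / 2, b.data() + n / 2, 0.0f}};
    std::thread worker(routine, &data[0]);  // one additional thread
    routine(&data[1]);                      // the other half on this thread
    worker.join();
    return data[0].returnValue + data[1].returnValue;
}
```

With the constant inputs used here, multi_gpu_dot(1024 * 1024) should come out to exactly 2 * 1024 * 1024. Depending on the toolchain, building this may require passing a threading flag through nvcc to the host compiler.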
Portable Pinned Memory
The last important piece to using multiple GPUs involves the use of pinned
memory. We learned in the previous chapter that pinned memory is actually host
memory that has its pages locked in physical memory to prevent it from being
paged out or relocated. However, it turns out that pages can appear pinned to a
single CPU thread only. That is, they will remain page-locked if any thread has
allocated them as pinned memory, but they will only appear page-locked to the
thread that allocated them. If the pointer to this memory is shared between
threads, the other threads will see the buffer as standard, pageable data.

As a side effect of this behavior, when a thread that did not allocate a pinned
buffer attempts to perform a cudaMemcpy() using it, the copy will be performed
at standard pageable memory speeds. As we saw in the previous chapter, this
speed can be roughly … percent of the maximum attainable transfer speed. What's
worse, if the thread attempts to enqueue a cudaMemcpyAsync() call into a CUDA
stream, this operation will fail because it requires a pinned buffer to
proceed. Since the buffer appears pageable from the thread that didn't allocate
it, the call dies a grisly death. Even in the future nothing works!

But there is a remedy to this problem. We can allocate pinned memory as
portable, meaning that we will be allowed to migrate it between host threads
and allow any thread to view it as a pinned buffer. To do so, we use our trusty
cudaHostAlloc() to allocate the memory, but we call it with a new flag:
cudaHostAllocPortable. This flag can be used in concert with the other flags
you've seen, such as cudaHostAllocWriteCombined and cudaHostAllocMapped. This
means that you can allocate your host buffers as any combination of portable,
zero-copy, and write-combined.
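Combining the flags is a matter of OR-ing them together, as in the following sketch. The helper name alloc_portable_zero_copy() is an assumption made for illustration; it selects device 0 and enables host mapping first, since the mapped flag relies on that.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: allocates 'count' floats of host memory that is
// portable, write-combined, and mapped (zero-copy). Returns NULL on failure.
float *alloc_portable_zero_copy(size_t count) {
    if (cudaSetDevice(0) != cudaSuccess) return NULL;
    if (cudaSetDeviceFlags(cudaDeviceMapHost) != cudaSuccess) return NULL;

    float *ptr = NULL;
    unsigned int flags = cudaHostAllocPortable |
                         cudaHostAllocWriteCombined |
                         cudaHostAllocMapped;
    if (cudaHostAlloc((void **)&ptr, count * sizeof(float), flags) !=
        cudaSuccess)
        return NULL;
    return ptr;
}
```

Whatever combination of flags is used, the buffer is released with cudaFreeHost(). Dropping cudaHostAllocWriteCombined from the mask is reasonable when the CPU also needs to read the buffer.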
To demonstrate portable pinned memory, we'll enhance our multi-GPU dot product
application. We'll adapt our original zero-copy version of the dot product, so
this version begins as something of a mash-up of the zero-copy and multi-GPU
versions. As we have throughout this chapter, we need to verify that there are
at least two CUDA-capable GPUs and that both can handle zero-copy buffers.
cudaDeviceProp prop;
for (int i=0; i<2; i++) {
HANDLE_ERROR( cudaGetDeviceProperties( &prop, i ) );
if (prop.canMapHostMemory != 1) {
printf( "Device %d cannot map memory.\n", i );
return 0;
}
}
In previous examples, we'd be ready to start allocating memory on the host to
hold our input vectors. To allocate portable pinned memory, however, it's
necessary to first set the CUDA device on which we intend to run. Since we
intend to use the device for zero-copy memory as well, we follow the
cudaSetDevice() call with a call to cudaSetDeviceFlags(), as we did in the
section "Zero-Copy Dot Product."

Earlier in this chapter, we called cudaSetDevice() but not until we had already
allocated our memory and created our threads. One of the requirements of
allocating page-locked memory with cudaHostAlloc(), though, is that we have
initialized the device first by calling cudaSetDevice(). You will also notice
that we pass our newly learned flag, cudaHostAllocPortable, to both
allocations. Since these were allocated after calling cudaSetDevice(0), only
CUDA device zero would see these buffers as pinned memory if we had not
specified that they were to be portable allocations.

We continue the application as we have in the past, generating data for our
input vectors and preparing our DataStruct structures as we did in the
multi-GPU example in the section "Zero-Copy Performance."
data[1].deviceID = 1;
data[1].offset = N/2;
data[1].size = N/2;
data[1].a = a;
data[1].b = b;
We can then create our secondary thread and call routine() to begin computing
on each device.

Because our host memory was allocated by the CUDA runtime, we use
cudaFreeHost() to free it. Other than no longer calling free(), we have seen
all there is to see in main().
return 0;
}
To support portable pinned memory and zero-copy memory in our multi-GPU
application, we need to make two notable changes in the code for routine(). The
first is a bit subtle, and in no way should this have been obvious.

You may recall in our multi-GPU version of this code, we need a call to
cudaSetDevice() in routine() in order to ensure that each participating thread
controls a different GPU. On the other hand, in this example we have already
made a call to cudaSetDevice() from the main thread. We did so in order to
allocate pinned memory in main(). As a result, we only want to call
cudaSetDevice() and cudaSetDeviceFlags() on devices where we have not made this
call. That is, we call these two functions if the deviceID is not zero.
Although it would yield cleaner code to simply repeat these calls on device
zero, it turns out that this is in fact an error. Once you have set the device
on a particular thread, you cannot call cudaSetDevice() again, even if you pass
the same device identifier. The highlighted if() statement helps us avoid this
little nasty-gram from the CUDA runtime, so we move on to the next important
change to routine().

In addition to using portable pinned memory for the host-side memory, we are
using zero-copy in order to access these buffers directly from the GPU.
Consequently, we no longer use cudaMemcpy() as we did in the original multi-GPU
application, but we use cudaHostGetDevicePointer() to get valid device pointers
for the host memory as we did in the zero-copy example. However, you will
notice that we use standard GPU memory for the partial results. As always, this
memory gets allocated using cudaMalloc().

At this point, we're pretty much ready to go, so we launch our kernel and copy
our results back from the GPU.

We conclude as we always have in our dot product example by summing our partial
results on the CPU, freeing our temporary storage, and returning to main().
data->returnValue = c;
return 0;
}
Chapter Review
We have seen some new types of host memory allocations, all of which get
allocated with a single call, cudaHostAlloc(). Using a combination of this one
entry point and a set of argument flags, we can allocate memory as any
combination of zero-copy, portable, and/or write-combined. We used zero-copy
buffers to avoid making explicit copies of data to and from the GPU, a maneuver
that potentially speeds up a wide class of applications. Using a support
library for threading, we manipulated multiple GPUs from the same application,
allowing our dot product computation to be performed across multiple devices.
Finally, we saw how multiple GPUs could share pinned memory allocations by
allocating them as portable pinned memory. Our last example used portable
pinned memory, multiple GPUs, and zero-copy buffers in order to demonstrate a
turbocharged version of the dot product we started toying with in an earlier
chapter. As multiple-device systems gain popularity, these techniques should
serve you well in harnessing the computational power of your target platform in
its entirety.
Congratulations! We hope you've enjoyed learning about CUDA C and
experimenting some with GPU computing. It's been a long trip, so let's take a moment
to review where we started and how much ground we've covered. Starting with
a background in C or C++ programming, we've learned how to use the CUDA
runtime's angle bracket syntax to easily launch multiple copies of kernels across
any number of multiprocessors. We expanded these concepts to use
collections of threads and blocks, operating on arbitrarily large inputs. These more
complex launches exploited interthread communication using the GPU's special,
on-chip shared memory, and they employed dedicated synchronization primitives
to ensure correct operation in an environment that supports (and encourages)
thousands upon thousands of parallel threads.

Armed with basic concepts about parallel programming using CUDA C on
NVIDIA's CUDA Architecture, we explored some of the more advanced concepts
and APIs that NVIDIA provides. The GPU's dedicated graphics hardware proves
useful for GPU computing, so we learned how to exploit texture memory to
accelerate some common patterns of memory access. Because many users add GPU
computing to their interactive graphics applications, we explored the
interoperation of CUDA C kernels with industry-standard graphics APIs such as OpenGL
and DirectX. Atomic operations on both global and shared memory allowed safe,
multithreaded access to common memory locations. Moving steadily into more
and more advanced topics, streams enabled us to keep our entire system as busy
as possible, allowing kernels to execute simultaneously with memory copies
between the host and GPU. Finally, we looked at the ways in which we could
allocate and use zero-copy memory to accelerate applications on integrated GPUs.
Moreover, we learned to initialize multiple devices and allocate portable pinned
memory in order to write CUDA C that fully utilizes increasingly common,
multi-GPU environments.
Chapter Objectives

Through the course of this chapter, you will accomplish the following:
• You will learn about some of the tools available to aid your CUDA C development.
• You will learn about additional written and code resources to take your CUDA C
  development to the next level.

CUDA Tools

Through the course of this book, we have relied upon several components of
the CUDA C software system. The applications we wrote made heavy use of the
CUDA C compiler in order to convert our CUDA C kernels into code that could be
executed on NVIDIA GPUs. We also used the CUDA runtime in order to perform
much of the setup and dirty work behind launching kernels and communicating
with the GPU. The CUDA runtime, in turn, uses the CUDA driver to talk directly
to the hardware in your system. In addition to these components that we have
already used at length, NVIDIA makes available a host of other software in order
to ease the development of CUDA C applications. This section does not serve well
as a user's manual to these products, but rather, it aims solely to inform you of
the existence and utility of these packages.

If you don't have the CUDA Toolkit on your machine, then it's a veritable certainty
that you haven't tried to write or compile any CUDA C code. We're on to you now,
sucker! (Actually, this is no big deal, but it does make us wonder why you've read
this entire book.) On the other hand, if you have been working through the
examples in this book, then you should possess the libraries we're about to discuss.
CUFFT

The CUDA Toolkit comes with two very important utility libraries if you plan to
pursue GPU computing in your own applications. First, NVIDIA provides a tuned
Fast Fourier Transform library known as CUFFT. As of release 3.0, the CUFFT
library supports a number of useful features, including the following:
• One-, two-, and three-dimensional transforms of both real-valued and
  complex-valued input data
• Batch execution for performing multiple one-dimensional transforms in
  parallel
• 2D and 3D transforms with sizes ranging from 2 to 16,384 in any dimension
• 1D transforms of inputs up to 8 million elements in size
• In-place and out-of-place transforms for both real-valued and complex-valued
  data

NVIDIA provides the CUFFT library free of charge with an accompanying license
that allows for use in any application, regardless of whether it's for personal,
academic, or professional development.
CUBLAS

In addition to a Fast Fourier Transform library, NVIDIA also provides a library of
linear algebra routines that implements the well-known package of Basic Linear
Algebra Subprograms (BLAS). This library, named CUBLAS, is also freely
available and supports a large subset of the full BLAS package. This includes versions
of each routine that accept both single- and double-precision inputs as well
as real- and complex-valued data. Because BLAS was originally a
FORTRAN-implemented library of linear algebra routines, NVIDIA attempts to maximize
compatibility with the requirements and expectations of these implementations.
Specifically, the CUBLAS library uses a column-major storage layout for arrays,
rather than the row-major layout natively used by C and C++. In practice, this is
not typically a concern, but it does allow for current users of BLAS to adapt their
applications to exploit the GPU-accelerated CUBLAS with minimal effort. NVIDIA
also distributes FORTRAN bindings to CUBLAS in order to demonstrate how to
link existing FORTRAN applications to CUDA libraries.
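
To make the layout difference concrete, here is a minimal host-side C++ sketch of how a (row, column) pair maps to a linear offset under each convention. The helper names col_major_index and row_major_index are hypothetical and are not part of CUBLAS; the sketch assumes a densely packed matrix whose leading dimension equals its row count (column-major) or its column count (row-major).

```cpp
#include <cassert>
#include <cstddef>

// Column-major (FORTRAN/BLAS style): consecutive elements of a column are
// adjacent in memory. `rows` is the leading dimension.
inline std::size_t col_major_index(std::size_t row, std::size_t col,
                                   std::size_t rows) {
    assert(row < rows);
    return col * rows + row;
}

// Row-major (C/C++ style): consecutive elements of a row are adjacent in
// memory. `cols` is the leading dimension.
inline std::size_t row_major_index(std::size_t row, std::size_t col,
                                   std::size_t cols) {
    assert(col < cols);
    return row * cols + col;
}
```

For a 2 x 3 matrix, element (0, 1) sits at offset 2 in column-major storage but at offset 1 in row-major storage, which is why a C array handed unchanged to a column-major routine is effectively seen as its transpose.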
• CUDA Basic Topics
• CUDA Advanced Topics
• CUDA Systems Integration
• Data-Parallel Algorithms
• Graphics Interoperability
• Texture
• Performance Strategies
• Linear Algebra
• Image/Video Processing
• Computational Finance
• Data Compression
• Physically-Based Simulation

The examples work on any platform that CUDA C works on and can serve as
excellent jumping-off points for your own applications. For readers who have
considerable experience in some of these areas, we warn you against expecting
to see state-of-the-art implementations of your favorite algorithms in the NVIDIA
GPU Computing SDK. These code samples should not be treated as
production-worthy library code but rather as educational illustrations of functioning CUDA C
programs, not unlike the examples in this book.
NVIDIA PERFORMANCE PRIMITIVES

In addition to the routines offered in the CUFFT and CUBLAS libraries, NVIDIA
also maintains a library of functions for performing CUDA-accelerated data
processing known as the NVIDIA Performance Primitives (NPP). Currently, NPP's
initial set of functionality focuses specifically on imaging and video processing
and is widely applicable for developers in these areas. NVIDIA intends for NPP to
evolve over time to address a greater number of computing tasks in a wider range
of domains. If you have an interest in high-performance imaging or video
applications, you should make it a priority to look into NPP, available as a free download
at www.nvidia.com/object/npp.html (or accessible from your favorite web search
engine).
DEBUGGING CUDA C

We have heard from a variety of sources that, in rare instances, computer
software does not work exactly as intended when first executed. Some code
computes incorrect values, some fails to terminate execution, and some
code even puts the computer into a state that only a flip of the power switch
can remedy. Although having clearly never written code like this personally,
the authors of this book recognize that some software engineers may desire
resources to debug their CUDA C kernels. Fortunately, NVIDIA provides tools to
make this painful process significantly less troublesome.

CUDA-GDB

A tool known as CUDA-GDB is one of the most useful CUDA downloads available
to CUDA C programmers who develop their code on Linux-based systems. NVIDIA
extended the open source GNU debugger (gdb) to transparently support
debugging device code in real time while maintaining the familiar interface of gdb. Prior
to CUDA-GDB, there existed no good way to debug device code outside of using
the CPU to simulate the way in which it was expected to run. This method yielded
extremely slow debugging, and in fact, it was frequently a very poor
approximation of the exact GPU execution of the kernel. NVIDIA's CUDA-GDB enables
programmers to debug their kernels directly on the GPU, affording them all of
the control that they've grown accustomed to with CPU debuggers. Some of the
highlights of CUDA-GDB include the following:
• Viewing CUDA state, such as information regarding installed GPUs and their
  capabilities
• Setting breakpoints in CUDA C source code
• Inspecting GPU memory, including all global and shared memory
• Inspecting the blocks and threads currently resident on the GPU
• Single-stepping a warp of threads
• Breaking into currently running applications, including hung or deadlocked
  applications

Along with the debugger, NVIDIA provides the CUDA Memory Checker whose
functionality can be accessed through CUDA-GDB or the stand-alone tool,
cuda-memcheck. Because the CUDA Architecture includes a sophisticated
memory management unit built directly into the hardware, all illegal memory
accesses will be detected and prevented by the hardware. As a result of a
memory violation, your program will cease functioning as expected, so you will
certainly want visibility into these types of errors. When enabled, the CUDA
Memory Checker will detect any global memory violations or misaligned global
memory accesses that your kernel attempts to make, reporting them to you in a
far more helpful and verbose manner than previously possible.
NVIDIA PARALLEL NSIGHT

Although CUDA-GDB is a mature and fantastic tool for debugging your CUDA
C kernels on hardware in real time, NVIDIA recognizes that not every
developer is over the moon about Linux. So, unless Windows users are hedging their
bets by saving up to open their own pet stores, they need a way to debug their
applications, too. Toward the end of 2009, NVIDIA introduced NVIDIA Parallel
Nsight (originally code-named Nexus), the first integrated GPU/CPU debugger
for Microsoft Visual Studio. Like CUDA-GDB, Parallel Nsight supports
debugging CUDA applications with thousands of threads. Users can place breakpoints
anywhere in their CUDA C source code, including breakpoints that trigger on
writes to arbitrary memory locations. They can inspect GPU memory directly
from the Visual Studio Memory window and check for out-of-bounds memory
accesses. This tool has been made publicly available in a beta program as of
press time, and the final version should be released shortly.
CUDA VISUAL PROFILER

We often tout the CUDA Architecture as a wonderful foundation for
high-performance computing applications. Unfortunately, the reality is that after
ferreting out all the bugs from your applications, even the most well-meaning
"high-performance computing" applications are more accurately referred to as
simply "computing" applications. We have often been in the position where we
wonder, "Why in the Sam Hill is my code performing so poorly?" In situations like
this, it helps to be able to execute the kernels in question under the watchful gaze
of a profiling tool. NVIDIA provides just such a tool, available as a separate
download on the CUDA Zone website. Figure 12.1 shows the Visual Profiler being used
to compare two implementations of a matrix transpose operation. Despite not
looking at a line of code, it becomes quite easy to determine that both memory
and instruction throughput of the transpose() kernel outstrip that of the
transpose_naive() kernel. (But then again, it would be unfair to expect much
more from a function with naive in the name.)

The CUDA Visual Profiler will execute your application, examining special
performance counters built into the GPU. After execution, the profiler can compile data
based on these counters and present you with reports based on what it observed.
It can verify how long your application spends executing each kernel as well
as determine the number of blocks launched, whether your kernel's memory
accesses are coalesced, the number of divergent branches the warps in your code
execute, and so on. We encourage you to look into the CUDA Visual Profiler if you
have some subtle performance problems in need of resolution.
Written Resources

If you haven't already grown queasy from all the prose in this book, then it's
possible you might actually be interested in reading more. We know that some of
you are more likely to want to play with code in order to continue your learning,
but for the rest of you, there are additional written resources to maintain your
growth as a CUDA C coder.

PROGRAMMING MASSIVELY PARALLEL PROCESSORS: A HANDS-ON APPROACH

If you read Chapter 1, we assured you that this book was most decidedly not a
textbook on parallel architectures. Sure, we bandied about terms such as
multiprocessor and warp, but this book strives to teach the softer side of programming
with CUDA C and its attendant APIs. We learned the CUDA C language within the
programming model set forth in the NVIDIA CUDA Programming Guide, largely
ignoring the way NVIDIA's hardware actually accomplishes the tasks we give it.

But to truly become an advanced, well-rounded CUDA C programmer, you will
need a more intimate familiarity with the CUDA Architecture and some of the
nuances of how NVIDIA GPUs work behind the scenes. To accomplish this,
we recommend working your way through Programming Massively Parallel
Processors: A Hands-on Approach. To write it, David Kirk, formerly NVIDIA's chief
scientist, collaborated with Wen-mei W. Hwu, the W.J. Sanders III chairman in
electrical and computer engineering at the University of Illinois. You'll encounter
a number of familiar terms and concepts, but you will learn about the gritty
details of NVIDIA's CUDA Architecture, including thread scheduling and latency
tolerance, memory bandwidth usage and efficiency, specifics on floating-point
handling, and much more. The book also addresses parallel programming in
a more general sense than this book, so you will gain a better overall
understanding of how to engineer parallel solutions to large, complex problems.
CUDA U
Some of us were unlucky enough to have attended university prior to the exciting
world of GPU computing. For those who are fortunate enough to be attending
college now or in the near future, about 300 universities across the world
currently teach courses involving CUDA. But before you start a crash diet to fit
back into your college gear, there's an alternative! On the CUDA Zone website,
you will find a link for CUDA U, which is essentially an online university for CUDA
education. Or you can navigate directly there with the URL
www.nvidia.com/object/cuda_education. Although you will be able to learn quite a bit about GPU
computing if you attend some of the online lectures at CUDA U, as of press time
there are still no online fraternities for partying after class.
DR. DOBB’S
For more than 30 years, Dr. Dobb's has covered nearly every major
development in computing technology, and NVIDIA's CUDA is no exception. As part of an
ongoing series, Dr. Dobb's has published an extensive series of articles cutting a
broad swath through the CUDA landscape. Entitled CUDA, Supercomputing for the
Masses, the series starts with an introduction to GPU computing and progresses
quickly from a first kernel to other pieces of the CUDA programming model. The
articles in Dr. Dobb's cover error handling, global memory performance, shared
memory, the CUDA Visual Profiler, texture memory, CUDA-GDB, and the CUDPP
library of data-parallel CUDA primitives, as well as many other topics. This series
of articles is an excellent place to get additional information about some of the
material we've attempted to convey in this book. Furthermore, you'll find
practical information concerning some of the tools that we've only had time to glance
over in this text, such as the profiling and debugging options available to you. The
series of articles is linked from the CUDA Zone web page but is readily accessible
through a web search for Dr Dobbs CUDA.
NVIDIA FORUMS

Even after digging around all of NVIDIA's documentation, you may find
yourself with an unanswered or particularly intriguing question. Perhaps you're
wondering whether anyone else has seen some funky behavior you're
experiencing. Or maybe you're throwing a CUDA celebration party and want to
assemble a group of like-minded individuals. For anything you're interested in
asking, we strongly recommend the forums on NVIDIA's website. Located at
http://forums.nvidia.com, the forums are a great place to ask questions of other
CUDA users. In fact, after reading this book, you're in a position to potentially
help others if you want! NVIDIA employees regularly prowl the forums, too, so
the trickiest questions will prompt authoritative advice right from the source. We
also love to get suggestions for new features and feedback on the good, bad, and
ugly things that we at NVIDIA do.

Code Resources

Although the NVIDIA GPU Computing SDK is a treasure trove of how-to samples,
it's not designed to be used for much more than pedagogy. If you're hunting for
production-caliber, CUDA-powered libraries or source code, you'll need to look a
bit further. Fortunately, there is a large community of CUDA developers who have
produced top-notch solutions. A couple of these tools and libraries are presented
here, but you are encouraged to search the Web for whatever solutions you need.
And hey, maybe you'll contribute some of your own to the CUDA C community
some day!
CUDA DATA PARALLEL PRIMITIVES LIBRARY

NVIDIA, with the help of researchers at the University of California Davis, has
released a library known as the CUDA Data Parallel Primitives Library (CUDPP).
CUDPP, as the name indicates, is a library of data-parallel algorithm primitives.
Some of these primitives include parallel prefix-sum (scan), parallel sort, and
parallel reduction. Primitives such as these form the foundation of a wide variety
of data-parallel algorithms, including sorting, stream compaction, building
data structures, and many others. If you're looking to write an even moderately
complex algorithm, chances are good that either CUDPP already has what you
need or it can get you significantly closer to where you want to be. Download it at
http://code.google.com/p/cudpp.
CULATOOLS
As we mentioned in the CUBLAS section, NVIDIA provides an implementation
of the BLAS packaged along with the CUDA Toolkit download. For readers who
need a broader solution for linear algebra, take a look at EM Photonics' CUDA
implementation of the industry-standard Linear Algebra Package (LAPACK).
Its LAPACK implementation is known as CULAtools and offers more complex
linear algebra routines that are built on NVIDIA's CUBLAS technology. The
freely available Basic package offers LU decomposition, QR factorization, a linear
system solver, and singular value decomposition, as well as least squares and
constrained least squares solvers. You can obtain the Basic download at
www.culatools.com/versions/basic. You will also notice that EM Photonics offers
Premium and Commercial licenses, which contain a far greater fraction of the
LAPACK routines, as well as licensing terms that will allow you to distribute your
own commercial applications based on CULAtools.
http://mathema.tician.de/software/pycuda. Finally, there are bindings for
the Microsoft .NET environment available from the CUDA.NET project at
www.hoopoe-cloud.com/Solutions/CUDA.NET.

Although these projects are not officially supported by NVIDIA, they have been
around for several versions of CUDA, are all freely available, and each has many
successful customers. The moral of this story is, if your language of choice (or
your boss's choice) is not C or C++, you should not rule out GPU computing until
you've first looked to see whether the necessary bindings are available.

Chapter Review

And there you have it. Even after 11 chapters of CUDA C, there are still loads of
resources to download, read, watch, and compile. This is a remarkably interesting
time to be learning GPU computing, as the era of heterogeneous computing
platforms matures. We hope that you have enjoyed learning about one of the
most pervasive parallel programming environments in existence. Moreover, we
hope that you leave this experience excited about the possibilities to develop new
and exciting means for interacting with computers and for processing the
ever-increasing amount of information available to your software. It's your ideas and the
amazing technologies you develop that will push GPU computing to the next level.
Chapter 9 covered some of the ways in which we can use atomic operations to
enable hundreds of threads to safely make concurrent modifications to shared
data. In this appendix, we'll look at an advanced method for using atomics to
implement locking data structures. On its surface, this topic does not seem much
more complicated than anything else we've examined. And in reality, this is
accurate. You've learned a lot of complex topics through this book, and locking data
structures are no more challenging than these. So, why is this material hiding in
the appendix? We don't want to reveal any spoilers, so if you're intrigued, read on,
and we'll discuss this through the course of the appendix.
A.1 Dot Product Revisited

In Chapter 5, we looked at the implementation of a vector dot product using CUDA
C. This algorithm was one of a large family of algorithms known as reductions. If
you recall, the algorithm computed the dot product of two input vectors by doing
the following:
1. Each thread in each block multiplies two corresponding elements of the input
   vectors and stores the products in shared memory.
2. While a block has more than one product, a thread adds two of the
   products and stores the result back to shared memory. Each step results
   in half as many values as it started with (this is where the term reduction
   comes from).
3. When every block has a final sum, each one writes its value to global memory
   and exits.
4. If the kernel ran with N parallel blocks, the CPU sums these remaining N
   values to generate the final dot product.
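
The steps above can be walked through sequentially on the CPU. The following is a minimal C++ sketch, not the kernel from Chapter 5: the function name dot_by_blocks is hypothetical, each simulated thread handles exactly one product, and the block size is assumed to be a power of two so that the halving loop works.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential walk-through of the block-wise dot-product reduction.
// blockSize must be a power of two (an assumption of this sketch).
inline float dot_by_blocks(const std::vector<float> &a,
                           const std::vector<float> &b,
                           std::size_t blockSize) {
    assert(a.size() == b.size());
    assert(blockSize > 0 && (blockSize & (blockSize - 1)) == 0);

    float total = 0.0f;
    for (std::size_t start = 0; start < a.size(); start += blockSize) {
        // Step 1: each "thread" stores one product in the block's cache.
        // Slots past the end of the input stay at zero.
        std::vector<float> cache(blockSize, 0.0f);
        for (std::size_t t = 0; t < blockSize && start + t < a.size(); ++t)
            cache[t] = a[start + t] * b[start + t];

        // Step 2: halve the number of live values until one remains.
        for (std::size_t stride = blockSize / 2; stride > 0; stride /= 2)
            for (std::size_t t = 0; t < stride; ++t)
                cache[t] += cache[t + stride];

        // Steps 3 and 4: the block's result, cache[0], joins the total.
        total += cache[0];
    }
    return total;
}
```

With eight elements and a block size of four, two simulated blocks each reduce four products to one value, and the outer loop plays the role of the final CPU summation in step 4.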
This high-level look at the dot product algorithm is intended to be review, so if
it's been a while or you've had a couple of glasses of Chardonnay, it may be worth
the time to review Chapter 5. If you feel comfortable enough with the dot product
code to continue, draw your attention to step 4 in the algorithm. Although it
doesn't involve copying much data to the host or performing many
calculations on the CPU, moving the computation back to the CPU to finish is indeed as
awkward as it sounds.

But it's more than an issue of an awkward step to the algorithm or the inelegance
of the solution. Consider a scenario where a dot product computation is just one
step in a long sequence of operations. If you want to perform every operation on
the GPU because your CPU is busy with other tasks or computations, you're out
of luck. As it stands, you'll be forced to stop computing on the GPU, copy
intermediate results back to the host, finish the computation with the CPU, and finally
upload that result back to the GPU and resume computing with your next kernel.

Since this is an appendix on atomics and we have gone to such lengths to explain
what a pain our original dot product algorithm is, you should see where we're
heading. We intend to fix our dot product using atomics so the entire
computation can stay on the GPU, leaving your CPU free to perform other tasks. Ideally,
instead of exiting the kernel in step 3 and returning to the CPU in step 4, we want
each block to add its final result to a total in global memory. If each value were
added atomically, we would not have to worry about potential collisions or
indeterminate results. Since we have already used an atomicAdd() operation in the
histogram operation, this seems like an obvious choice.

Unfortunately, prior to compute capability 2.0, atomicAdd() operated only
on integers. Although this might be fine if you plan to compute dot products of
vectors with integer components, it is significantly more common to use
floating-point components. However, the majority of NVIDIA hardware does not support
atomic arithmetic on floating-point numbers! But there's a reasonable
explanation for this, so don't throw your GPU in the garbage just yet.

Atomic operations on a value in memory guarantee only that each thread's
read-modify-write sequence will complete without other threads reading or writing the
target value while in process. There is no stipulation about the order in which the
threads will perform their operations, so in the case of three threads performing
addition, sometimes the hardware will perform (A+B)+C and sometimes it
will compute A+(B+C). This is acceptable for integers because integer math is
associative, so (A+B)+C = A+(B+C). Floating-point arithmetic is not
associative because of the rounding of intermediate results, so (A+B)+C often does
not equal A+(B+C). As a result, atomic arithmetic on floating-point values is of
dubious utility because it gives rise to nondeterministic results in a highly
multithreaded environment such as on the GPU. There are many applications where
it is simply unacceptable to get two different results from two runs of an
application, so the support of floating-point atomic arithmetic was not a priority for
earlier hardware.
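
A small host-side C++ sketch makes the associativity argument tangible. The function names are hypothetical, and the specific operands are chosen here only because 1.0f is far too small to survive being added to a value of magnitude 1e20 in single precision; ordinary IEEE 754 arithmetic without aggressive compiler flags such as fast-math is assumed.

```cpp
#include <cassert>

// Evaluate the same three-term sum with the two possible groupings.
template <typename T>
inline T sum_left(T a, T b, T c) { return (a + b) + c; }

template <typename T>
inline T sum_right(T a, T b, T c) { return a + (b + c); }

// Self-check that contrasts integer and floating-point behavior.
inline void check_associativity_demo() {
    // Integers: grouping does not matter.
    assert(sum_left(1, 2, 3) == sum_right(1, 2, 3));

    // Floats: (1e20 + -1e20) + 1 keeps the 1, but in 1e20 + (-1e20 + 1)
    // the 1 is rounded away before the large terms cancel.
    assert(sum_left(1e20f, -1e20f, 1.0f) == 1.0f);
    assert(sum_right(1e20f, -1e20f, 1.0f) == 0.0f);
}
```

Real dot products rarely differ this dramatically between orderings; the typical effect is a discrepancy in the last few bits, which is still enough to make two runs of the same program disagree.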
However, if we are willing to tolerate some nondeterminism in the results, we can
still accomplish the reduction entirely on the GPU. But we'll first need to develop
a way to work around the lack of atomic floating-point arithmetic. The solution
will still use atomic operations, but not for the arithmetic itself.

A.1.1 ATOMIC LOCKS

The atomicAdd() function we used to build GPU histograms performed a
read-modify-write operation without interruption from other threads. At a low
level, you can imagine the hardware locking the target memory location while
this operation is underway, and while locked, no other threads can read or write
the value at the location. If we had a way of emulating this lock in our CUDA C
kernels, we could perform arbitrary operations on an associated memory location
or data structure. The locking mechanism itself will operate exactly like a typical
CPU mutex. If you are unfamiliar with mutual exclusion (mutex), don't fret. It's not
any more complicated than the things you've already learned.

The basic idea is that we allocate a small piece of memory to be used as a mutex.
The mutex will act like something of a traffic signal that governs access to some
resource. The resource could be a data structure, a buffer, or simply a memory
location we want to modify atomically. When a thread reads a 0 from the mutex,
it interprets this value as a "green light" indicating that no other thread is using
the memory. Therefore, the thread is free to lock the memory and make whatever
changes it desires, free of interference from other threads. To lock the memory
location in question, the thread writes a 1 to the mutex. This 1 will act as a "red
light" for potentially competing threads. The competing threads must then wait
until the owner has written a 0 to the mutex before they can attempt to modify the
locked memory.

A simple code sequence to accomplish this locking process might look like this:

Unfortunately, there's a problem with this code. Fortunately, it's a familiar
problem: What happens if another thread writes a 1 to the mutex after our thread
has read the value to be zero? That is, both threads check the value at mutex
and see that it's zero. They then both write a 1 to this location to signify to other
threads that the structure is locked and unavailable for modification. After doing
so, both threads think they own the associated memory or data structure and
begin making unsafe modifications. Catastrophe ensues!
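
The failure mode can be replayed deterministically on the host. The C++ sketch below is an illustration only: the names naive_sees_free, naive_take, and replay_bad_interleaving are hypothetical, and the two "threads" are simulated by interleaving their steps by hand in a single thread rather than by real concurrency.

```cpp
#include <cassert>

// Step 1 of the naive lock: does the mutex look free (0)?
inline bool naive_sees_free(const int &mutex) { return mutex == 0; }

// Step 2 of the naive lock: claim it by writing a 1.
inline void naive_take(int &mutex) { mutex = 1; }

// Replays the problematic schedule: threads A and B both perform the
// check before either performs the write. Returns how many of them end
// up believing they own the lock.
inline int replay_bad_interleaving(int &mutex) {
    assert(mutex == 0 || mutex == 1);
    bool aSawFree = naive_sees_free(mutex);  // A reads the mutex
    bool bSawFree = naive_sees_free(mutex);  // B reads it before A writes
    int owners = 0;
    if (aSawFree) { naive_take(mutex); ++owners; }
    if (bSawFree) { naive_take(mutex); ++owners; }
    return owners;
}
```

Starting from an unlocked mutex, the replay reports two owners, which is exactly the situation a mutex exists to prevent. The check and the write have to happen as one indivisible step.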
The operation we want to complete is fairly simple: We need to compare the value
at mutex to 0 and store a 1 at that location if and only if the mutex was 0. To
accomplish this correctly, this entire operation needs to be performed atomically so
we know that no other thread can interfere while our thread examines and updates
the value at mutex. In CUDA C, this operation can be performed with the function
atomicCAS(), an atomic compare-and-swap. The function atomicCAS() takes
a pointer to memory, a value with which to compare the value at that location, and a
value to store in that location if the comparison is successful. Using this operation,
we can implement a GPU lock function as follows:

The call to atomicCAS() returns the value that it found at the address mutex.
As a result, the while() loop will continue to run until atomicCAS() sees a 0
at mutex. When it sees a 0, the comparison is successful, and the thread writes
a 1 to mutex. Essentially, the thread will spin in the while() loop until it has
successfully locked the data structure. We'll use this locking mechanism to
implement our GPU hash table. But first, we dress the code up in a structure so it
will be cleaner to use in the dot product application:
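struct Lock {
    int *mutex;

    Lock( void ) {
        int state = 0;
        HANDLE_ERROR( cudaMalloc( (void**)&mutex,
                                  sizeof(int) ) );
        HANDLE_ERROR( cudaMemcpy( mutex, &state, sizeof(int),
                                  cudaMemcpyHostToDevice ) );
    }

    ~Lock( void ) {
        cudaFree( mutex );
    }

    // Spin until atomicCAS() finds a 0 at mutex and swaps in a 1.
    __device__ void lock( void ) {
        while( atomicCAS( mutex, 0, 1 ) != 0 );
    }

    // Release with an atomic as well; atomicExch() is assumed here.
    __device__ void unlock( void ) {
        atomicExch( mutex, 0 );
    }
};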
If you're expecting some subtle, hidden reason why unlocking with an ordinary,
non-atomic write of 0 to the mutex fails, we hate to disappoint you, but this would
work as well. So, why not use this more obvious
method? Atomic transactions and generic global memory operations follow
different paths through the GPU. Using both atomics and standard global memory
operations could therefore lead to an unlock() seeming out of sync with a
subsequent attempt to lock() the mutex. The behavior would still be
functionally correct, but to ensure consistently intuitive behavior from the application's
perspective, it's best to use the same pathway for all accesses to the mutex.
Because we're required to use an atomic to lock the resource, we have chosen to
also use an atomic to unlock the resource.
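
For readers who want to experiment with the same idea away from the GPU, the following is a minimal host-side analogue in standard C++. It is a sketch under stated assumptions, not the device code: std::atomic<int> stands in for the mutex word in global memory, the hypothetical helper cas() mimics the return-the-old-value behavior described for atomicCAS(), and HostLock is an invented name.

```cpp
#include <atomic>
#include <cassert>

// Mimics the described atomicCAS() contract: store `val` only if the
// target currently equals `compare`, and return the value that was found.
inline int cas(std::atomic<int> &target, int compare, int val) {
    int found = compare;
    // On failure, compare_exchange_strong overwrites `found` with the
    // value actually present; on success it is left equal to `compare`.
    target.compare_exchange_strong(found, val);
    return found;
}

struct HostLock {
    std::atomic<int> mutex{0};

    // One attempt: succeeds only if the mutex was 0 ("green light").
    bool try_lock() { return cas(mutex, 0, 1) == 0; }

    // Spin until an attempt succeeds, as the while() loop does.
    void lock() { while (cas(mutex, 0, 1) != 0) { /* spin */ } }

    // Release through an atomic exchange, using the same pathway as lock().
    void unlock() {
        int previous = mutex.exchange(0);
        assert(previous == 1);  // only a held lock should be released
        (void)previous;
    }
};
```

A caller brackets its critical section with lock() and unlock(); try_lock() is included only so the behavior can be checked without any risk of spinning forever.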
A.1.2 DOT PRODUCT REDUX: ATOMIC LOCKS

The only piece of our earlier dot product example that we endeavor to change
is the final CPU-based portion of the reduction. In the previous section, we
described how we implement a mutex on the GPU. The Lock structure that
implements this mutex is located in lock.h and included at the beginning of our
improved dot product example:
#include "../common/book.h"
#include "lock.h"
With two exceptions, the beginning of our dot product kernel is identical to the
kernel we used in Chapter 5. Both exceptions involve the kernel's signature:
__global__ void dot( Lock lock, float *a, float *b, float *c )
In our updated dot product, we pass a Lock to the kernel in addition to input
vectors and the output buffer. The Lock will govern access to the output buffer
during the final accumulation step. The other change is not visible in the
signature itself but concerns the meaning of one of its arguments. Previously, the
float *c argument was a
buffer for N floats where each of the N blocks could store its partial result. This
buffer was copied back to the CPU to compute the final sum. Now, the argument
c no longer points to a temporary buffer but to a single floating-point value that
will store the dot product of the vectors in a and b. But even with these changes,
the kernel starts out exactly as it did in Chapter 5:
    float temp = 0;
    while (tid < N) {
        temp += a[tid] * b[tid];
        tid += blockDim.x * gridDim.x;
    }
At this point in execution, the threads in each block have summed their
pairwise products and computed a single value that's sitting in cache[0]. Each
thread block now needs to add its final value to the value at c. To do this safely,
we'll use the lock to govern access to this memory location, so each thread needs
to acquire the lock before updating the value *c. After adding the block's partial
sum to the value at c, it unlocks the mutex so other threads can accumulate their
values. After adding its value to the final result, the block has nothing remaining
to compute and can return from the kernel.
    if (cacheIndex == 0) {
        lock.lock();
        *c += cache[0];
        lock.unlock();
    }
}
The main() routine is very similar to our original implementation, though it does
have a couple of differences. First, we no longer need to allocate a buffer for partial
results as we did in Chapter 5. We now allocate space for only a single
floating-point result:

As we did in Chapter 5, we initialize our input arrays and copy them to the
GPU. But you'll notice an additional copy in this example: We're also copying
a zero to dev_c, the location that we intend to use to accumulate our final dot
product. Since each block wants to read this value, add its partial sum, and
store the result back, we need the initial value to be zero in order to get the
correct result.

All that remains is declaring our Lock, invoking the kernel, and copying the
result back to the CPU.
Lock lock;
dot<<<blocksPerGrid,threadsPerBlock>>>( lock, dev_a,
dev_b, dev_c );
In Chapter 5, this is when we would do a final for() loop to add the partial
sums. Since this is done on the GPU using atomic locks, we can skip right to the
answer-checking and cleanup code:

Because there is no way to precisely predict the order in which each block will
add its partial sum to the final total, it is very likely (almost certain) that the final
result will be summed in a different order than the CPU will sum it. Because of
the nonassociativity of floating-point addition, it's therefore quite probable that
the final result will be slightly different between the GPU and CPU. There is not
much that can be done about this without adding a nontrivial chunk of code to
ensure that the blocks acquire the lock in a deterministic order that matches the
summation order on the CPU. If you feel extraordinarily motivated, give this a try.
Otherwise, we'll move on to see how these atomic locks can be used to
implement a multithreaded data structure.
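
Because the accumulation order is not fixed, a result check should compare against a tolerance rather than demand bit-for-bit equality. The C++ sketch below illustrates one way to do that on the host. The helper names and the choice of a relative tolerance are assumptions made for illustration; they are not taken from the book's answer-checking code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Accumulate the partial sums front to back.
inline float sum_forward(const std::vector<float> &v) {
    float s = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}

// Accumulate the same values back to front, as a stand-in for "some
// other order in which blocks happened to acquire the lock".
inline float sum_backward(const std::vector<float> &v) {
    float s = 0.0f;
    for (std::size_t i = v.size(); i-- > 0;) s += v[i];
    return s;
}

// True when x and y agree to within relTol of the larger magnitude.
inline bool nearly_equal(float x, float y, float relTol) {
    assert(relTol >= 0.0f);
    float scale = std::fmax(std::fabs(x), std::fabs(y));
    return std::fabs(x - y) <= relTol * scale;
}
```

A tolerance that is appropriate for one problem size may be too loose or too tight for another, so it is worth choosing it with the magnitude and count of the summed values in mind.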
A.2 Implementing a Hash Table

The hash table is one of the most important and commonly used data structures
in computer science, playing an important role in a wide variety of applications.
For readers not already familiar with hash tables, we'll provide a quick primer
here. The study of data structures warrants more in-depth study than we intend
to provide, but in the interest of making forward progress, we will keep this brief.
If you already feel comfortable with the concepts behind hash tables, you should
skip to the hash table implementation in Section A.2.2, A CPU Hash Table.

A.2.1 HASH TABLE OVERVIEW

A hash table is essentially a structure that is designed to store pairs of keys and
values. For example, you could think of a dictionary as a hash table. Every word in
the dictionary is a key, and each word has a definition associated with it. The
definition is the value associated with the word, and thus every word and definition in
the dictionary form a key/value pair. For this data structure to be useful, though,
it is important that we minimize the time it takes to find a particular value if we're
given a key. In general, this should be a constant amount of time. That is, the time
to look up a value given a key should be the same, regardless of how many
key/value pairs are in the hash table.

At an abstract level, our hash table will place values in "buckets" based on the
value's corresponding key. The method by which we map keys to buckets is often
called the hash function. A good hash function will map the set of possible keys
uniformly across all the buckets because this will help satisfy our requirement
that it take constant time to find any value, regardless of the number of values
we've added to the hash table.

For example, consider our dictionary hash table. One obvious hash function would
involve using 26 buckets, one for each letter of the alphabet. This simple hash
function might simply look at the first letter of the key and put the value in one
of the 26 buckets based on this letter. Figure A.1 shows how this hash function
would assign a few sample words.

Given what we know about the distribution of words in the English language, this
hash function leaves much to be desired because it will not map words uniformly
across the 26 buckets. Some of the buckets will contain very few key/value pairs,
and some of the buckets will contain a large number of pairs. Accordingly, it
will take much longer to find the value associated with a word that begins with
a common letter such as S than it would take to find the value associated with a
word that begins with the letter X. Since we are looking for hash functions that
will give us constant-time retrieval of any value, this consequence is fairly
undesirable. An immense amount of research has gone into the study of hash
functions, but even a brief survey of these techniques is beyond the scope of this book.
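
As a concrete illustration of the first-letter scheme, here is a minimal C++ sketch. The function name first_letter_bucket is hypothetical, the return value of -1 for words that do not start with a letter is a choice made for this sketch, and the default "C" locale with an ASCII-compatible character set is assumed.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Maps a word to one of 26 buckets using only its first letter.
// Returns -1 when the word is empty or does not start with a letter.
inline int first_letter_bucket(const std::string &word) {
    if (word.empty()) return -1;
    unsigned char c = static_cast<unsigned char>(word[0]);
    if (!std::isalpha(c)) return -1;
    int bucket = std::tolower(c) - 'a';  // 'a' -> 0, ..., 'z' -> 25
    assert(bucket >= 0 && bucket < 26);
    return bucket;
}
```

Both avocado and aardvark land in bucket 0, while a word starting with X lands in bucket 23, which shows how quickly the popular buckets fill up while others stay nearly empty.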
7KHODVWFRPSRQHQWRIRXUKDVKWDEOHGDWDVWUXFWXUHLQYROYHVWKHEXFNHWV,IZH
KDGDSHUIHFWKDVKIXQFWLRQHYHU\NH\ZRXOGPDSWRDGLIIHUHQWEXFNHW,QWKLV
FDVHZHFDQVLPSO\VWRUHWKHNH\YDOXHSDLUVLQDQDUUD\ZKHUHHDFKHQWU\LQWKH
DUUD\LVZKDWZHǢYHEHHQFDOOLQJDbucket+RZHYHUHYHQZLWKDQH[FHOOHQWKDVK
IXQFWLRQLQPRVWVLWXDWLRQVZHZLOOKDYHWRGHDOZLWKcollisions$FROOLVLRQRFFXUV
ZKHQPRUHWKDQRQHNH\PDSVWRDEXFNHWVXFKDVZKHQZHDGGERWKWKHZRUGV
avocadoDQGaardvarkWRRXUGLFWLRQDU\KDVKWDEOH7KHVLPSOHVWZD\WRVWRUHDOORI
WKHYDOXHVWKDWPDSWRDJLYHQEXFNHWLVVLPSO\WRPDLQWDLQDOLVWRIYDOXHVLQWKH
EXFNHW:KHQZHHQFRXQWHUDFROOLVLRQVXFKDVDGGLQJaardvarkWRDGLFWLRQDU\
WKDWDOUHDG\FRQWDLQVavocadoZHSXWWKHYDOXHDVVRFLDWHGZLWKaardvarkDWWKH
HQGRIWKHOLVWZHǢUHPDLQWDLQLQJLQWKHǤ$ǥEXFNHWDVVKRZQLQ)LJXUH$
After adding the word avocado in Figure A.2, the first bucket has a single
key/value pair in its list. Later in this imaginary application we add the word
aardvark, a word that collides with avocado because they both start with the
letter A. You will notice in Figure A.3 that it simply gets placed at the end of
the list in the first bucket.
[Figure A.2: the first bucket's list holding avocado]
[Figure A.3: the first bucket's list holding avocado, then aardvark]
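The collision handling described above can be sketched with one list per bucket. This is only an illustration under stated assumptions: the Dictionary type, its bucket_of() and add() members, and the restriction to keys that start with an ASCII letter are all hypothetical names and choices, and the standard library list stands in for whatever list a real implementation would use.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Illustrative dictionary with one list of words per bucket (26 buckets).
// Assumption: keys are non-empty and start with an ASCII letter.
struct Dictionary {
    std::vector<std::list<std::string> > buckets;
    Dictionary() : buckets(26) {}

    static std::size_t bucket_of(const std::string &word) {
        assert(!word.empty());
        char c = word[0];
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
        assert(c >= 'a' && c <= 'z');
        return static_cast<std::size_t>(c - 'a');
    }

    // A collision needs no special case: append to the bucket's list.
    void add(const std::string &word) {
        buckets[bucket_of(word)].push_back(word);
    }
};
```

Adding avocado and then aardvark leaves both words in the first bucket, in that order.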
A.2.2 A CPU HASH TABLE
As described in the previous section, our hash table will consist of essentially
two parts: a hash function and a data structure of buckets. Our buckets will be
implemented exactly as before: We will allocate an array of length N, and each
entry in the array holds a list of key/value pairs. Before concerning ourselves
with a hash function, we will take a look at the data structures involved:
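#include "../common/book.h"

// One key/value pair; entries that hash to the same bucket form a list.
struct Entry {
    unsigned int    key;
    void*           value;
    Entry           *next;      // next entry in this bucket, or NULL
};

struct Table {
    size_t  count;              // number of buckets
    Entry   **entries;          // bucket array: the head of each list
    Entry   *pool;              // preallocated storage for all entries
    Entry   *firstFree;         // next unused entry in pool
};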
As described in the introductory section, the structure Entry holds both a key
and a value. In our application, we will use unsigned integer keys to store our
key/value pairs. The value associated with this key can be any data, so we have
declared value as a void* to indicate this. Our application will primarily be
concerned with creating the hash table data structure, so we won't actually
store anything in the value field. We have included it in the structure for
completeness, in case you want to use this code in your own applications. The
last piece of data in our hash table Entry is a pointer to the next Entry. After
collisions, we'll have multiple entries in the same bucket, and we have decided
to store these entries as a list. So, each entry will point to the next entry in
the bucket, thereby forming a list of entries that have hashed to the same
location in the table. The last entry will have a NULL next pointer.
At its heart, the Table structure itself is an array of "buckets." This bucket
array is just an array of length count, where each bucket in entries is just a
pointer to an Entry. To avoid incurring the complication and performance hit of
allocating memory every time we want to add an Entry to the table, the table
will maintain a large array of available entries in pool. The field firstFree
points to the next available Entry for use, so when we need to add an entry to
the table, we can simply use the Entry pointed to by firstFree and increment
that pointer. Note that this will also simplify our cleanup code because we can
free all of these entries with a single call to free(). If we had allocated
every entry as we went, we would have to walk through the table and free every
entry one by one.
After understanding the data structures involved, let's take a look at some of
the other support code.
Table initialization consists primarily of allocating memory and clearing memory
for the bucket array, entries. We also allocate storage for a pool of entries
and initialize the firstFree pointer to be the first entry in the pool array.
At the end of the application, we'll want to free the memory we've allocated, so
our cleanup routine frees the bucket array and the pool of free entries.
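The initialization and cleanup steps just described might look like the following sketch. The routine names initialize_table() and free_table() match the calls made from main(); the parameter types, the use of calloc() to clear the bucket array, and the resetting of the fields on cleanup are assumptions made here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct Entry {
    unsigned int key;
    void *value;
    Entry *next;
};

struct Table {
    std::size_t count;
    Entry **entries;
    Entry *pool;
    Entry *firstFree;
};

// Allocate `entries` empty buckets and a pool with room for `elements` nodes.
void initialize_table(Table &table, std::size_t entries, std::size_t elements) {
    table.count = entries;
    // calloc zero-fills the bucket array, so every bucket starts out NULL.
    table.entries = static_cast<Entry **>(std::calloc(entries, sizeof(Entry *)));
    table.pool = static_cast<Entry *>(std::malloc(elements * sizeof(Entry)));
    assert(table.entries != NULL && table.pool != NULL);
    table.firstFree = table.pool;  // first unused node in the pool
}

// Two calls release everything, because nodes are never allocated one by one.
void free_table(Table &table) {
    std::free(table.entries);
    std::free(table.pool);
    table.entries = NULL;
    table.pool = NULL;
    table.firstFree = NULL;
    table.count = 0;
}
```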
In our introduction, we spoke quite a bit about the hash function. Specifically,
we discussed how a good hash function can make the difference between an
excellent hash table implementation and a poor one. In this example, we're using
unsigned integers as our keys, and we need to map these to the indices of our
bucket array. The simplest way to do this would be to select the bucket with an
index equal to the key. That is, we could store the entry e in
table.entries[e.key]. However, we have no way of guaranteeing that every key
will be less than the length of the array of buckets. Fortunately, this problem
can be solved relatively painlessly.
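A minimal sketch of such a fix is to take the key modulo the number of buckets, so that every key lands on a valid index. The function is named hash_key here to keep the sketch self-contained (the listings in this section call their hash function hash()), and the non-zero bucket count is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Map an unsigned integer key to a bucket index in [0, count).
// Assumption: count is non-zero (the table has at least one bucket).
std::size_t hash_key(unsigned int key, std::size_t count) {
    assert(count > 0);
    return key % count;
}
```

For a table with 1,024 buckets, the key 3 maps to bucket 3 and the key 1024 wraps around to bucket 0.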
If the hash function is so important, how can we get away with such a simple
one? Ideally, we want the keys to map uniformly across all the buckets in our
table, and all we're doing here is taking the key modulo the array length. In
reality, hash functions may not normally be this simple, but because this is
just an example program, we will be randomly generating our keys. If we assume
that the random number generator generates values roughly uniformly, this hash
function should map these keys uniformly across all of the buckets of the hash
table. In your own hash table implementation, you may require a more complicated
hash function.
Having seen the hash table structures and the hash function, we're ready to look
at the process of adding a key/value pair to the table. The process involves
three basic steps:
1. Compute the hash function on the input key to determine the new entry's
   bucket.
2. Take a preallocated Entry from the pool and initialize its key and value
   fields.
3. Insert the entry at the front of the proper bucket's list.
We translate these steps to code in a fairly straightforward way:
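void add_to_table( Table &table, unsigned int key, void* value ) {
    //Step 1
    size_t hashValue = hash( key, table.count );
    //Step 2
    Entry *location = table.firstFree++;
    location->key = key;
    location->value = value;
    //Step 3
    location->next = table.entries[hashValue];
    table.entries[hashValue] = location;
}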
Since it's a good idea to know whether the code you've written works, we've
implemented a routine to perform a sanity check on a hash table. The check
involves first walking through the table and examining every node. We compute
the hash function on the node's key and confirm that the node is stored in the
correct bucket. After checking every node, we verify that the number of nodes
actually in the table is indeed equal to the number of elements we intended to
add to the table. If these numbers don't agree, then either we've added a node
accidentally to multiple buckets or we haven't inserted it correctly.
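The two checks described above can be sketched as a function that walks every bucket and reports what it finds. This is an illustration rather than the routine used by the example: the names CheckResult and check_buckets are hypothetical, the key-modulo-count hash is written inline, and the caller is left to compare the node count against the number of elements it meant to insert.

```cpp
#include <cassert>
#include <cstddef>

struct Entry {
    unsigned int key;
    void *value;
    Entry *next;
};

// Hypothetical result type: how many nodes were found, and how many of
// them sit in a bucket other than the one their key hashes to.
struct CheckResult {
    std::size_t nodes;
    std::size_t misplaced;
};

// Walk each bucket's list, counting nodes and checking their placement.
CheckResult check_buckets(Entry *const *entries, std::size_t count) {
    assert(count > 0);
    CheckResult result = {0, 0};
    for (std::size_t i = 0; i < count; i++) {
        for (const Entry *cur = entries[i]; cur != NULL; cur = cur->next) {
            ++result.nodes;
            if (cur->key % count != i) ++result.misplaced;
        }
    }
    return result;
}
```

A table passes the sanity check when misplaced is zero and nodes equals the number of elements that were added.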
With all the infrastructure code out of the way, we can look at main(). As with
many of this book's examples, a lot of the heavy lifting has been done in helper
functions, so we hope that main() will be relatively easy to follow:
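int main( void ) {
    unsigned int *buffer =
                     (unsigned int*)big_random_block( SIZE );
    clock_t start, stop;
    start = clock();

    Table table;
    initialize_table( table, HASH_ENTRIES, ELEMENTS );
    for (int i=0; i<ELEMENTS; i++) {
        add_to_table( table, buffer[i], (void*)NULL );
    }

    stop = clock();
    float elapsedTime = (float)(stop - start) /
                        (float)CLOCKS_PER_SEC * 1000.0f;
    printf( "Time to hash: %3.1f ms\n", elapsedTime );

    verify_table( table );
    free_table( table );
    free( buffer );
    return 0;
}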
As you can see, we start by allocating a big chunk of random numbers. These
randomly generated unsigned integers will be the keys we insert into our hash
table. After generating the numbers, we read the system time in order to measure
the performance of our implementation. We initialize the hash table and then
insert each random key into the table using a for() loop. After adding all the
keys, we read the system time again to compute the elapsed time to initialize
and add the keys. Finally, we verify the hash table with our sanity check
routine and free the buffers we've allocated.
You probably noticed that we are using NULL as the value for every key/value
pair. In a typical application, you would likely store some useful data with the
key, but because we are primarily concerned with the hash table implementation
itself, we're storing a meaningless value with each key.
A.2.3 MULTITHREADED HASH TABLE
There are some assumptions built into our CPU hash table that will no longer be
valid when we move to the GPU. First, we have assumed that only one node can be
added to the table at a time in order to make the addition of a node simpler. If
more than one thread were trying to add a node to the table at once, we could
end up with problems similar to the multithreaded addition problems in the
example from Chapter 9.
For example, let's revisit our "avocado and aardvark" example and imagine that
threads A and B are trying to add these entries to the table. Thread A computes
a hash function on avocado, and thread B computes the function on aardvark. They
both decide their keys belong in the same bucket. To add the new entry to the
list, threads A and B start by setting their new entry's next pointer to the
first node of the existing list, as in Figure A.4.
Then, both threads try to replace the entry in the bucket array with their new
entry. However, the thread that finishes second is the only thread that has its
update preserved because it overwrites the work of the previous thread. So
consider the scenario where thread A replaces the entry altitude with its entry
for avocado. Immediately after finishing, thread B replaces what it believes to
be the entry for altitude with its entry for aardvark. Unfortunately, it's
replacing avocado instead of altitude, resulting in the situation illustrated in
Figure A.5.
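[Figure A.4: the new entries avocado and aardvark both point at the existing list, altitude then audience]
[Figure A.5: the resulting table, with avocado outside the bucket's list]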
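The interleaving described above can be replayed deterministically on a single thread, which makes the lost update easy to see. This is a sketch under stated assumptions: the Node type and the functions list_length() and replay_unsafe_insert() are hypothetical names, and the two "threads" are simply two sequences of statements executed in the problematic order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Node {
    std::string word;
    Node *next;
};

// Count the nodes reachable from a bucket's head pointer.
std::size_t list_length(const Node *head) {
    std::size_t n = 0;
    for (const Node *cur = head; cur != NULL; cur = cur->next) ++n;
    return n;
}

// Front insertion takes two steps: (1) point the new node at the current
// head, (2) store the new node as the head. Here both "threads" finish
// step 1 before either performs step 2.
Node *replay_unsafe_insert(Node *head, Node &fromA, Node &fromB) {
    assert(&fromA != &fromB);
    fromA.next = head;  // thread A, step 1
    fromB.next = head;  // thread B, step 1 (sees the same old head)
    head = &fromA;      // thread A, step 2
    head = &fromB;      // thread B, step 2 overwrites thread A's update
    return head;
}
```

Starting from the list altitude, audience and inserting avocado and aardvark this way leaves only three reachable nodes: avocado points into the list, but nothing points to avocado.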
Thread A's entry is tragically "floating" outside of the hash table.
Fortunately, our sanity check routine would catch this and alert us to the
presence of a problem because it would count fewer nodes than we expected. But
we still need to answer this question: How do we build a hash table on the GPU?
The key observation here involves the fact that only one thread can safely make
modifications to a bucket at a time. This is similar to our dot product example
where only one thread at a time could safely add its value to the final result.
If each bucket had an atomic lock associated with it, we could ensure that only
a single thread was making changes to a given bucket at a time.
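As a host-side illustration of the one-lock-per-bucket idea, the sketch below guards each bucket with a small spin lock built on an atomic compare-and-swap. It is not the GPU Lock structure this appendix uses: SpinLock, LockedTable, and Node are hypothetical names, and std::atomic stands in for the GPU atomics.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side stand-in for a per-bucket lock: 0 means free, 1 means held.
struct SpinLock {
    std::atomic<int> state;
    SpinLock() : state(0) {}

    void lock() {
        int expected = 0;
        // Atomically change 0 -> 1. If the lock is held (or the weak
        // exchange fails spuriously), reset `expected` and try again.
        while (!state.compare_exchange_weak(expected, 1,
                                            std::memory_order_acquire)) {
            expected = 0;
        }
    }
    void unlock() { state.store(0, std::memory_order_release); }
};

struct Node {
    unsigned int key;
    Node *next;
};

struct LockedTable {
    std::vector<Node *> heads;    // one list head per bucket, initially NULL
    std::vector<SpinLock> locks;  // one lock per bucket
    explicit LockedTable(std::size_t buckets)
        : heads(buckets), locks(buckets) {}

    // Only the holder of bucket b's lock modifies bucket b's list.
    void add(Node *node) {
        assert(!heads.empty());
        std::size_t b = node->key % heads.size();
        locks[b].lock();
        node->next = heads[b];
        heads[b] = node;
        locks[b].unlock();
    }
};
```

Because each bucket has its own lock, insertions into different buckets never wait on one another; only insertions into the same bucket are serialized.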
A.2.4 A GPU HASH TABLE
Armed with a method to ensure safe multithreaded access to the hash table, we
can proceed with a GPU implementation of the hash table application we wrote in
Section A.2.2: A CPU Hash Table. We'll need to include lock.h, the
implementation of our GPU Lock structure from Section A.1.1: Atomic Locks, and
we'll need to declare the hash function as a __device__ function. Aside from
these changes, the fundamental data structures and hash function are identical
to the CPU implementation.
#include "../common/book.h"
#include "lock.h"

struct Entry {
    unsigned int    key;
    void*           value;
    Entry           *next;
};

struct Table {
    size_t  count;
    Entry   **entries;
    Entry   *pool;
};
Initializing and freeing the hash table consists of the same steps as we
performed on the CPU, but as with previous examples, we use CUDA runtime
functions to accomplish this. We use cudaMalloc() to allocate a bucket array and
a pool of entries, and we use cudaMemset() to set the bucket array entries to
zero. To free the memory upon application completion, we use cudaFree().
We used a routine to check our hash table for correctness in the CPU
implementation. We need a similar routine for the GPU version, so we have two
options. We could write a GPU-based version of verify_table(), or we could use
the same code we used in the CPU version and add a function that copies a hash
table from the GPU to the CPU. Although either option gets us what we need, the
second option seems superior for two reasons: First, it involves reusing our CPU
version of verify_table(). As with code reuse in general, this saves time and
ensures that future changes to the code would need to be made in only one place
for both versions of the hash table. Second, implementing a copy function will
uncover an interesting problem, the solution to which may be very useful to you
in the future.
As promised, verify_table() is identical to the CPU implementation and is
reprinted here for your convenience:
void verify_table( const Table &dev_table ) {
    // copy the table to the host so the CPU check can walk it
    Table table;
    copy_table_to_host( dev_table, table );

    int count = 0;
    for (size_t i=0; i<table.count; i++) {
        Entry *current = table.entries[i];
        while (current != NULL) {
            ++count;
            // hash the node's key, not its (unused) value field
            if (hash( current->key, table.count ) != i)
                printf( "%u hashed to %zu, but was located "
                        "at %zu\n", current->key,
                        hash(current->key, table.count), i );
            current = current->next;
        }
    }
    if (count != ELEMENTS)
        printf( "%d elements found in hash table. Should be %zu\n",
                count, (size_t)ELEMENTS );
    else
        printf( "All %d elements found in hash table.\n", count );
    free( table.pool );
    free( table.entries );
}
Since we chose to reuse our CPU implementation of verify_table(), we need a
function to copy the table from GPU memory to host memory. There are three steps
to this function, two relatively obvious steps and a third, trickier step. The
first two steps involve allocating host memory for the hash table data and
performing a copy of the GPU data structures into this memory with cudaMemcpy().
We have done this many times previously, so this should come as no surprise.
The tricky portion of this routine involves the fact that some of the data we
have copied are pointers. We cannot simply copy these pointers to the host
because they are addresses on the GPU; they will no longer be valid pointers on
the host. However, the relative offsets of the pointers will still be valid.
Every GPU pointer to an Entry points somewhere within the table.pool[] array,
but for the hash table to be usable on the host, we need them to point to the
same Entry in the hostTable.pool[] array.
Given a GPU pointer X, we therefore need to add the pointer's offset from
table.pool to hostTable.pool to get a valid host pointer. That is, the new
pointer should be computed as follows:
(X - table.pool) + hostTable.pool
We perform this update for every Entry pointer we've copied from the GPU: the
Entry pointers in hostTable.entries and the next pointer of every Entry in the
table's pool of entries.
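The pointer fix-up can be tried out entirely on the host by letting one array stand in for GPU memory and another for the host copy. This is a sketch under those assumptions: rebase() and copy_and_fix() are hypothetical names, std::memcpy() stands in for the device-to-host copy, and every node in the source pool is assumed to hold a valid (or NULL) next pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Entry {
    unsigned int key;
    void *value;
    Entry *next;
};

// Translate a pointer into the source pool to the same slot in the copy:
// (x - srcPool) + dstPool. NULL stays NULL, since it marks an empty bucket
// or the end of a list.
Entry *rebase(Entry *x, Entry *srcPool, Entry *dstPool) {
    if (x == NULL) return NULL;
    return (x - srcPool) + dstPool;
}

// Copy the raw bytes first, then fix every Entry pointer that was copied.
void copy_and_fix(Entry *const *srcEntries, Entry *srcPool,
                  std::size_t buckets, std::size_t nodes,
                  std::vector<Entry *> &dstEntries,
                  std::vector<Entry> &dstPool) {
    assert(srcEntries != NULL && srcPool != NULL);
    dstEntries.assign(srcEntries, srcEntries + buckets);
    dstPool.resize(nodes);
    if (nodes > 0)
        std::memcpy(dstPool.data(), srcPool, nodes * sizeof(Entry));
    for (std::size_t i = 0; i < buckets; i++)
        dstEntries[i] = rebase(dstEntries[i], srcPool, dstPool.data());
    for (std::size_t i = 0; i < nodes; i++)
        dstPool[i].next = rebase(dstPool[i].next, srcPool, dstPool.data());
}
```

After the fix-up, walking a bucket's list in the copy never touches the source pool.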
Having seen the data structures, hash function, initialization, cleanup, and
verification code, the most important piece remaining is the one that actually
involves CUDA C atomics. As arguments, the add_to_table() kernel will take an
array of keys and values to be added to the hash table. Its next argument is the
hash table itself, and the final argument is an array of locks that will be used
to lock each of the table's buckets. Since our input is two arrays that our
threads will need to index, we also need our all-too-common index linearization.
Our threads walk through the input arrays exactly like they did in the dot
product example. For each key in the keys[] array, the thread will compute the
hash function in order to determine which bucket the key/value pair belongs in.
After determining the target bucket, the thread locks the bucket, adds its
key/value pair, and unlocks the bucket.
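The index linearization and the walk through the input arrays can be simulated on the host. In this sketch, linear_id() mirrors the usual one-dimensional computation threadIdx.x + blockIdx.x * blockDim.x, and indices_for_thread() lists the elements one thread would visit when it advances by the total number of threads. Both function names are hypothetical, and the one-dimensional launch is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear thread index for a one-dimensional grid of one-dimensional blocks.
std::size_t linear_id(std::size_t threadIdxX, std::size_t blockIdxX,
                      std::size_t blockDimX) {
    return threadIdxX + blockIdxX * blockDimX;
}

// Indices visited by one simulated thread: start at its linear id and
// advance by the total thread count until the input is exhausted.
std::vector<std::size_t> indices_for_thread(std::size_t tid,
                                            std::size_t totalThreads,
                                            std::size_t n) {
    assert(totalThreads > 0);
    std::vector<std::size_t> visited;
    for (std::size_t i = tid; i < n; i += totalThreads)
        visited.push_back(i);
    return visited;
}
```

With 4 threads and 10 keys, thread 1 handles keys 1, 5, and 9, and across all four threads every key is handled exactly once.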
There is something remarkably peculiar about this bit of code, however. The
for() loop and subsequent if() statement seem decidedly unnecessary. In Chapter
6, we introduced the concept of a warp. If you've forgotten, a warp is a
collection of 32 threads that execute together in lockstep. Although the nuances
of how this gets implemented in the GPU are beyond the scope of this book, only
one thread in the warp can acquire the lock at a time, and we will suffer many a
headache if we let all 32 threads in the warp contend for the lock
simultaneously. In this situation, we've found that it's best to do some of the
work in software and simply walk through each thread in the warp, giving each a
chance to acquire the data structure's lock, do its work, and subsequently
release the lock.
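The effect of walking through the warp in software can be illustrated with a host-side simulation. The sketch assumes the pattern is a loop over the 32 lane indices with a test of the form tid % 32 == i, so that only one thread of the warp proceeds per iteration; serialize_warp() and WARP_SIZE are hypothetical names, and the inner loop merely plays the role of the 32 threads executing the same statement in lockstep.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int WARP_SIZE = 32;

// Simulate one warp: every lane runs the same outer loop, but only the
// lane whose index matches the loop counter does the locked work.
// Returns the thread ids in the order they would take their turn.
std::vector<int> serialize_warp(int firstTid) {
    assert(firstTid >= 0);
    std::vector<int> order;
    for (int i = 0; i < WARP_SIZE; i++) {
        for (int lane = 0; lane < WARP_SIZE; lane++) {
            int tid = firstTid + lane;
            if (tid % WARP_SIZE == i) {
                // lock the bucket, insert, unlock would happen here
                order.push_back(tid);
            }
        }
    }
    return order;
}
```

Each iteration of the outer loop admits exactly one thread, so the 32 threads of a warp never contend for a lock at the same moment.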
The flow of main() should appear identical to the CPU implementation. We start
by allocating a large chunk of random data for our hash table keys. Then we
create start and stop CUDA events and record the start event for our performance
measurements. We proceed to allocate GPU memory for our array of random keys,
copy the array up to the device, and initialize our hash table:
Table table;
initialize_table( table, HASH_ENTRIES, ELEMENTS );
The last step of preparation to build our hash table involves preparing locks
for the hash table's buckets. We allocate one lock for each bucket in the hash
table. Conceivably we could save a lot of memory by using only one lock for the
whole table. But doing so would utterly destroy performance because every thread
would have to compete for the table lock whenever a group of threads tries to
simultaneously add entries to the table. So we declare an array of locks, one
for every bucket in the array. We then allocate a GPU array for the locks and
copy them up to the device:
Lock lock[HASH_ENTRIES];
Lock *dev_lock;
HANDLE_ERROR( cudaMalloc( (void**)&dev_lock,
                          HASH_ENTRIES * sizeof( Lock ) ) );
HANDLE_ERROR( cudaMemcpy( dev_lock, lock,
                          HASH_ENTRIES * sizeof( Lock ),
                          cudaMemcpyHostToDevice ) );
The rest of main() is similar to the CPU version: We add all of our keys to the
hash table, stop the performance timer, verify the correctness of the hash
table, and clean up after ourselves:
verify_table( table );
A.2.5 HASH TABLE PERFORMANCE
Using an Intel Core 2 Duo, the CPU hash table example in Section A.2.2: A CPU
Hash Table takes […] ms to build a hash table from […] MB of data. The code was
built with the option -O3 to ensure maximally optimized CPU code. The
multithreaded GPU hash table in Section A.2.4: A GPU Hash Table takes […] ms to
complete the same task. Differing by less than […] percent, these are roughly
comparable execution times, which raises an excellent question: Why would such a
massively parallel machine such as a GPU get beaten by a single-threaded CPU
version of the same application? Frankly, this is because GPUs were not designed
to excel at multithreaded access to complex data structures such as a hash
table. For this reason, there are very few performance motivations to build a
data structure such as a hash table on the GPU. So if all your application needs
to do is build a hash table or similar data structure, you would likely be
better off doing this on your CPU.
On the other hand, you will sometimes find yourself in a situation where a long
computation pipeline involves one or two stages in which the GPU does not enjoy
a performance advantage over comparable CPU implementations. In these
situations, you have three (somewhat obvious) options:
• Perform every step of the pipeline on the GPU
• Perform every step of the pipeline on the CPU
• Perform some pipeline steps on the GPU and some on the CPU
The last option sounds like the best of both worlds; however, it implies that
you will need to synchronize your CPU and GPU at any point in your application
where you want to move computation from the GPU to CPU or back. This
synchronization and subsequent data transfer between host and GPU can often kill
any performance advantage you might have derived from employing a hybrid
approach in the first place.
In such a situation, it may be worth your time to perform every phase of
computation on the GPU, even if the GPU is not ideally suited for some steps of
the algorithm. In this vein, the GPU hash table can potentially prevent a
CPU/GPU synchronization point, minimize data transfer between the host and GPU,
and free the CPU to perform other computations. In such a scenario, it's
possible that the overall performance of a GPU implementation would exceed a
CPU/GPU hybrid approach, despite the GPU being no faster than the CPU on certain
steps (or potentially even getting trounced by the CPU in some cases).
A.3 Appendix Review
We saw how to use atomic compare-and-swap operations to implement a GPU mutex.
Using a lock built with this mutex, we saw how to improve our original dot
product application to run entirely on the GPU. We carried this idea further by
implementing a multithreaded hash table that used an array of locks to prevent
unsafe simultaneous modifications by multiple threads. In fact, the mutex we
developed could be used for any manner of parallel data structures, and we hope
that you'll find it useful in your own experimentation and application
development. Of course, the performance of applications that use the GPU to
implement mutex-based data structures needs careful study. Our GPU hash table
gets beaten by a single-threaded CPU version of the same code, so it will make
sense to use the GPU for this type of application only in certain situations.
There is no blanket rule that can be used to determine whether a GPU-only,
CPU-only, or hybrid approach will work best, but knowing how to use atomics will
allow you to make that decision on a case-by-case basis.
277
A ORFNVǞ
add()IXQFWLRQ&38YHFWRUVXPVǞ RSHUDWLRQVǞ
add_to_table()NHUQHO*38KDVKWDEOH RYHUYLHZRIǞ
$/8V DULWKPHWLFORJLFXQLWV VXPPDU\UHYLHZǞ
&8'$$UFKLWHFWXUH
XVLQJFRQVWDQWPHPRU\ B
anim_and_exit()PHWKRG*38ULSSOHV EDQGZLGWKFRQVWDQWPHPRU\VDYLQJǞ
anim_gpu()URXWLQHWH[WXUHPHPRU\ %DVLF/LQHDU$OJHEUD6XESURJUDPV %/$6 &8%/$6
DQLPDWLRQ OLEUDU\Ǟ
*38-XOLD6HWH[DPSOHǞ ELQFRXQWV&38KLVWRJUDPFRPSXWDWLRQǞ
*38ULSSOHXVLQJWKUHDGVǞ %/$6 %DVLF/LQHDU$OJHEUD6XESURJUDPV &8%/$6
KHDWWUDQVIHUVLPXODWLRQǞ OLEUDU\Ǟ
animExit(), 149 blend_kernel()
DV\QFKURQRXVFDOO 'WH[WXUHPHPRU\Ǟ
cudaMemcpyAsync()DV WH[WXUHPHPRU\Ǟ
XVLQJHYHQWVZLWK
blockDimYDULDEOH
DWRPLFORFNV
'WH[WXUHPHPRU\Ǟ
*38KDVKWDEOHǞ
GRWSURGXFWFRPSXWDWLRQǞ
RYHUYLHZRIǞ
GRWSURGXFWFRPSXWDWLRQLQFRUUHFW
atomicAdd()
RSWLPL]DWLRQ
DWRPLFORFNVǞ
GRWSURGXFWFRPSXWDWLRQZLWKDWRPLFORFNV
KLVWRJUDPNHUQHOXVLQJJOREDOPHPRU\
Ǟ
QRWVXSSRUWLQJȍRDWLQJSRLQWQXPEHUV
atomicCAS()*38ORFNǞ GRWSURGXFWFRPSXWDWLRQ]HURFRS\PHPRU\
atomicExch()*38ORFNǞ Ǟ
DWRPLFVǞ *38KDVKWDEOHLPSOHPHQWDWLRQ
DGYDQFHGǞ *38ULSSOHXVLQJWKUHDGVǞ
FRPSXWHFDSDELOLW\RI19,',$*38VǞ *38VXPVRIDORQJHUYHFWRUǞ
GRWSURGXFWDQGǞ *38VXPVRIDUELWUDULO\ORQJYHFWRUVǞ
KDVKWDEOHVseeKDVKWDEOHV JUDSKLFVLQWHURSHUDELOLW\
KLVWRJUDPFRPSXWDWLRQ&38Ǟ KLVWRJUDPNHUQHOXVLQJJOREDOPHPRU\DWRPLFV
KLVWRJUDPFRPSXWDWLRQ*38Ǟ Ǟ
KLVWRJUDPFRPSXWDWLRQRYHUYLHZ KLVWRJUDPNHUQHOXVLQJVKDUHGJOREDOPHPRU\
KLVWRJUDPNHUQHOXVLQJJOREDOPHPRU\DWRPLFV DWRPLFVǞ
Ǟ PXOWLSOH&8'$VWUHDPV
KLVWRJUDPNHUQHOXVLQJVKDUHGJOREDOPHPRU\ UD\WUDFLQJRQ*38
DWRPLFVǞ VKDUHGPHPRU\ELWPDS
IRUPLQLPXPFRPSXWHFDSDELOLW\Ǟ WHPSHUDWXUHXSGDWHFRPSXWDWLRQǞ
279
blockIdxYDULDEOH FDOOEDFNVGPUAnimBitmapXVHUUHJLVWUDWLRQ
'WH[WXUHPHPRU\Ǟ IRU
GHȌQHG &DPEULGJH8QLYHUVLW\&8'$DSSOLFDWLRQVǞ
GRWSURGXFWFRPSXWDWLRQǞ FDPHUD
GRWSURGXFWFRPSXWDWLRQZLWKDWRPLFORFNV UD\WUDFLQJFRQFHSWVǞ
Ǟ UD\WUDFLQJRQ*38Ǟ
GRWSURGXFWFRPSXWDWLRQ]HURFRS\PHPRU\ FHOOXODUSKRQHVSDUDOOHOSURFHVVLQJLQ
Ǟ FHQWUDOSURFHVVLQJXQLWVsee&38V FHQWUDO
*38KDVKWDEOHLPSOHPHQWDWLRQ SURFHVVLQJXQLWV
*38-XOLD6HW FOHDQLQJDJHQWV&8'$DSSOLFDWLRQVIRUǞ
*38ULSSOHXVLQJWKUHDGVǞ clickDrag(), 149
*38VXPVRIDORQJHUYHFWRUǞ FORFNVSHHGHYROXWLRQRIǞ
*38YHFWRUVXPVǞ FRGHEUHDNLQJDVVXPSWLRQVǞ
JUDSKLFVLQWHURSHUDELOLW\ FRGHUHVRXUFHV&8'DǞ
KLVWRJUDPNHUQHOXVLQJJOREDOPHPRU\DWRPLFV FROOLVLRQUHVROXWLRQKDVKWDEOHVǞ
Ǟ FRORU
KLVWRJUDPNHUQHOXVLQJVKDUHGJOREDOPHPRU\ &38-XOLD6HWǞ
DWRPLFVǞ HDUO\GD\VRI*38FRPSXWLQJǞ
PXOWLSOH&8'$VWUHDPV UD\WUDFLQJFRQFHSWV
UD\WUDFLQJRQ*38 FRPSLOHU
VKDUHGPHPRU\ELWPDS IRUPLQLPXPFRPSXWHFDSDELOLW\Ǟ
WHPSHUDWXUHXSGDWHFRPSXWDWLRQǞ VWDQGDUG&IRU*38FRGHǞ
EORFNV FRPSOH[QXPEHUV
GHȌQHG GHȌQLQJJHQHULFFODVVWRVWRUHǞ
*38-XOLD6HW VWRULQJZLWKVLQJOHSUHFLVLRQȍRDWLQJSRLQW
*38YHFWRUVXPVǞ FRPSRQHQWV
KDUGZDUHLPSRVHGOLPLWVRQ FRPSXWDWLRQDOȍXLGG\QDPLFV&8'$DSSOLFDWLRQV
VSOLWWLQJLQWRWKUHDGVseeSDUDOOHOEORFNVVSOLWWLQJ IRUǞ
LQWRWKUHDGV FRPSXWHFDSDELOLW\
EUHDVWFDQFHU&8'$DSSOLFDWLRQVIRUǞ FRPSLOLQJIRUPLQLPXPǞ
EULGJHVFRQQHFWLQJPXOWLSOH*38V cudaChooseDevice()and, 141
EXFNHWVKDVKWDEOH GHȌQHG
FRQFHSWRIǞ RI19,',$*38VǞ
*38KDVKWDEOHLPSOHPHQWDWLRQǞ RYHUYLHZRIǞ
PXOWLWKUHDGHGKDVKWDEOHVDQGǞ FRPSXWHUJDPHV'JUDSKLFGHYHORSPHQWIRUǞ
bufferObjYDULDEOH FRQVWDQWPHPRU\
FUHDWLQJGPUAnimBitmap, 149 DFFHOHUDWLQJDSSOLFDWLRQVZLWK
UHJLVWHULQJZLWK&8'$UXQWLPH PHDVXULQJSHUIRUPDQFHZLWKHYHQWVǞ
UHJLVWHULQJZLWKcudaGraphicsGL- PHDVXULQJUD\WUDFHUSHUIRUPDQFHǞ
RegisterBuffer() RYHUYLHZRI
VHWWLQJXSJUDSKLFVLQWHURSHUDELOLW\Ǟ SHUIRUPDQFHZLWKǞ
EXIIHUVGHFODULQJVKDUHGPHPRU\Ǟ UD\WUDFLQJLQWURGXFWLRQǞ
UD\WUDFLQJRQ*38Ǟ
C UD\WUDFLQJZLWKǞ
cache[]VKDUHGPHPRU\YDULDEOH VXPPDU\UHYLHZ
GHFODULQJEXIIHURIVKDUHGPHPRU\QDPHGǞ __constant__ IXQFWLRQ
GRWSURGXFWFRPSXWDWLRQǞǞ GHFODULQJPHPRU\DVǞ
GRWSURGXFWFRPSXWDWLRQZLWKDWRPLFORFNV SHUIRUPDQFHZLWKFRQVWDQWPHPRU\Ǟ
Ǟ copy_const_kernel()NHUQHO
cacheIndexLQFRUUHFWGRWSURGXFWRSWLPL]DWLRQ 'WH[WXUHPHPRU\
FDFKHVWH[WXUHǞ XVLQJWH[WXUHPHPRU\Ǟ
280
copy_constant_kernel()FRPSXWLQJ &8'$0HPRU\&KHFNHU
WHPSHUDWXUHXSGDWHVǞ &8'$VWUHDPV
CPUAnimBitmapFODVVFUHDWLQJ*38ULSSOHǞ *38ZRUNVFKHGXOLQJZLWKǞ
Ǟ PXOWLSOHǞǞ
&38V FHQWUDOSURFHVVLQJXQLWV RYHUYLHZRI
HYROXWLRQRIFORFNVSHHGǞ VLQJOHǞ
HYROXWLRQRIFRUHFRXQW VXPPDU\UHYLHZ
IUHHLQJPHPRU\see free(),&ODQJXDJH &8'$7RRONLWǞ
KDVKWDEOHVǞ LQGHYHORSPHQWHQYLURQPHQWǞ
KLVWRJUDPFRPSXWDWLRQRQǞ &8'$WRROV
DVKRVWLQWKLVERRN &8%/$6OLEUDU\Ǟ
WKUHDGPDQDJHPHQWDQGVFKHGXOLQJLQ &8'$7RRONLWǞ
YHFWRUVXPVǞ &8))7OLEUDU\
YHULI\LQJ*38KLVWRJUDPXVLQJUHYHUVH&38 GHEXJJLQJ&8'$&Ǟ
KLVWRJUDPǞ *38&RPSXWLQJ6'.GRZQORDGǞ
&8%/$6OLEUDU\Ǟ 19,',$3HUIRUPDQFH3ULPLWLYHV
cuComplexVWUXFWXUH&38-XOLD6HWǞ RYHUYLHZRI
cuComplexVWUXFWXUH*38-XOLD6HWǞ 9LVXDO3URȌOHUǞ
CUDA, Supercomputing for the Masses Ǟ &8'$=RQH
&8'$$UFKLWHFWXUH cuda_malloc_test()SDJHORFNHGPHPRU\
FRPSXWDWLRQDOȍXLGG\QDPLFDSSOLFDWLRQVǞ cudaBindTexture()WH[WXUHPHPRU\Ǟ
GHȌQHG
cudaBindTexture2D()WH[WXUHPHPRU\
HQYLURQPHQWDOVFLHQFHDSSOLFDWLRQVǞ
cudaChannelFormatDesc()ELQGLQJ'
ȌUVWDSSOLFDWLRQRI
WH[WXUHV
PHGLFDOLPDJLQJDSSOLFDWLRQVǞ
cudaChooseDevice()
UHVRXUFHIRUXQGHUVWDQGLQJǞ
GHȌQHG
XVLQJǞ
GPUAnimBitmap LQLWLDOL]DWLRQ
CUDA C
IRUYDOLG,'Ǟ
FRPSXWDWLRQDOȍXLGG\QDPLFDSSOLFDWLRQVǞ
cudaD39SetDirect3DDevice()'LUHFW;
&8'$GHYHORSPHQWWRRONLWǞ
LQWHURSHUDELOLW\Ǟ
&8'$HQDEOHGJUDSKLFVSURFHVVRUǞ
GHEXJJLQJǞ cudaDeviceMapHost()]HURFRS\PHPRU\GRW
GHYHORSPHQWHQYLURQPHQWVHWXSseeGHYHORSPHQW SURGXFW
HQYLURQPHQWVHWXS cudaDevicePropVWUXFWXUH
GHYHORSPHQWRI cudaChooseDevice()ZRUNLQJZLWK
HQYLURQPHQWDOVFLHQFHDSSOLFDWLRQVǞ PXOWLSOH&8'$VWUHDPV
JHWWLQJVWDUWHGǞ RYHUYLHZRIǞ
PHGLFDOLPDJLQJDSSOLFDWLRQVǞ VLQJOH&8'$VWUHDPVǞ
19,',$GHYLFHGULYHU XVLQJGHYLFHSURSHUWLHV
RQPXOWLSOH*38Vsee*38V JUDSKLFVSURFHVVLQJ &8'$HQDEOHGJUDSKLFVSURFHVVRUVǞ
XQLWV PXOWLV\VWHP cudaEventCreate()
RYHUYLHZRIǞ 'WH[WXUHPHPRU\
SDUDOOHOSURJUDPPLQJLQsee parallel &8'$VWUHDPV
SURJUDPPLQJ&8'$ *38KDVKWDEOHLPSOHPHQWDWLRQǞ
SDVVLQJSDUDPHWHUVǞ *38KLVWRJUDPFRPSXWDWLRQ
TXHU\LQJGHYLFHVǞ PHDVXULQJSHUIRUPDQFHZLWKHYHQWVǞ
VWDQGDUG&FRPSLOHUǞ SDJHORFNHGKRVWPHPRU\DSSOLFDWLRQǞ
VXPPDU\UHYLHZ SHUIRUPLQJDQLPDWLRQZLWKGPUAnimBitmap
XVLQJGHYLFHSURSHUWLHVǞ UD\WUDFLQJRQ*38
ZULWLQJȌUVWSURJUDPǞ VWDQGDUGKRVWPHPRU\GRWSURGXFW
&8'$'DWD3DUDOOHO3ULPLWLYHV/LEUDU\ &8'33 WH[WXUHPHPRU\
&8'$HYHQW$3,DQGSHUIRUPDQFHǞ ]HURFRS\KRVWPHPRU\
281
cudaEventDestroy() PXOWLSOH&38V
GHȌQHG SDJHORFNHGKRVWPHPRU\Ǟ
*38KDVKWDEOHLPSOHPHQWDWLRQ UD\WUDFLQJRQ*38
*38KLVWRJUDPFRPSXWDWLRQ UD\WUDFLQJZLWKFRQVWDQWPHPRU\
KHDWWUDQVIHUVLPXODWLRQ VKDUHGPHPRU\ELWPDS
PHDVXULQJSHUIRUPDQFHZLWKHYHQWVǞ VWDQGDUGKRVWPHPRU\GRWSURGXFW
SDJHORFNHGKRVWPHPRU\Ǟ cudaFreeHost()
WH[WXUHPHPRU\ DOORFDWLQJSRUWDEOHSLQQHGPHPRU\
]HURFRS\KRVWPHPRU\ &8'$VWUHDPV
cudaEventElapsedTime() GHȌQHG
'WH[WXUHPHPRU\ IUHHLQJEXIIHUDOORFDWHGZLWK
&8'$VWUHDPV cudaHostAlloc(), 190
GHȌQHG ]HURFRS\PHPRU\GRWSURGXFW
*38KDVKWDEOHLPSOHPHQWDWLRQ &8'$*'%GHEXJJLQJWRROǞ
*38KLVWRJUDPFRPSXWDWLRQ cudaGetDevice()
KHDWWUDQVIHUVLPXODWLRQDQLPDWLRQ &8'$VWUHDPV
KHDWWUDQVIHUXVLQJJUDSKLFVLQWHURSHUDELOLW\ GHYLFHSURSHUWLHV
SDJHORFNHGKRVWPHPRU\ ]HURFRS\PHPRU\GRWSURGXFW
VWDQGDUGKRVWPHPRU\GRWSURGXFW cudaGetDeviceCount()
]HURFRS\PHPRU\GRWSURGXFW GHYLFHSURSHUWLHV
cudaEventRecord() JHWWLQJFRXQWRI&8'$GHYLFHV
&8'$VWUHDPV PXOWLSOH&38VǞ
&8'$VWUHDPVDQG cudaGetDeviceProperties()
*38KDVKWDEOHLPSOHPHQWDWLRQǞ GHWHUPLQLQJLI*38LVLQWHJUDWHGRUGLVFUHWH
*38KLVWRJUDPFRPSXWDWLRQ PXOWLSOH&8'$VWUHDPV
KHDWWUDQVIHUVLPXODWLRQDQLPDWLRQ TXHU\LQJGHYLFHVǞ
KHDWWUDQVIHUXVLQJJUDSKLFVLQWHURSHUDELOLW\ ]HURFRS\PHPRU\GRWSURGXFW
Ǟ cudaGLSetGLDevice()
PHDVXULQJSHUIRUPDQFHZLWKHYHQWVǞ JUDSKLFVLQWHURSHUDWLRQZLWK2SHQ*/
PHDVXULQJUD\WUDFHUSHUIRUPDQFHǞ SUHSDULQJ&8'$WRXVH2SHQ*/GULYHU
SDJHORFNHGKRVWPHPRU\Ǟ cudaGraphicsGLRegisterBuffer()
UD\WUDFLQJRQ*38 cudaGraphicsMapFlagsNone(), 143
VWDQGDUGKRVWPHPRU\GRWSURGXFW cudaGraphicsMapFlagsReadOnly(), 143
XVLQJWH[WXUHPHPRU\Ǟ cudaGraphicsMapFlagsWriteDiscard(), 143
cudaEventSynchronize() cudaGraphicsUnapResources(), 144
'WH[WXUHPHPRU\ cudaHostAlloc()
*38KDVKWDEOHLPSOHPHQWDWLRQ &8'$VWUHDPV
*38KLVWRJUDPFRPSXWDWLRQ malloc()YHUVXVǞ
KHDWWUDQVIHUVLPXODWLRQDQLPDWLRQ SDJHORFNHGKRVWPHPRU\DSSOLFDWLRQǞ
KHDWWUDQVIHUXVLQJJUDSKLFVLQWHURSHUDELOLW\ ]HURFRS\PHPRU\GRWSURGXFWǞ
PHDVXULQJSHUIRUPDQFHZLWKHYHQWV cudaHostAllocDefault()
SDJHORFNHGKRVWPHPRU\ &8'$VWUHDPV
VWDQGDUGKRVWPHPRU\GRWSURGXFW GHIDXOWSLQQHGPHPRU\
cudaFree() SDJHORFNHGKRVWPHPRU\Ǟ
DOORFDWLQJSRUWDEOHSLQQHGPHPRU\ cudaHostAllocMapped()ȍDJ
&38YHFWRUVXPV GHIDXOWSLQQHGPHPRU\
&8'$VWUHDPV SRUWDEOHSLQQHGPHPRU\
GHȌQHGǞ ]HURFRS\PHPRU\GRWSURGXFWǞ
GRWSURGXFWFRPSXWDWLRQ cudaHostAllocPortable()SRUWDEOHSLQQHG
GRWSURGXFWFRPSXWDWLRQZLWKDWRPLFORFNV PHPRU\Ǟ
*38KDVKWDEOHLPSOHPHQWDWLRQǞ cudaHostAllocWriteCombined()ȍDJ
*38ULSSOHXVLQJWKUHDGV SRUWDEOHSLQQHGPHPRU\
*38VXPVRIDUELWUDULO\ORQJYHFWRUV ]HURFRS\PHPRU\GRWSURGXFWǞ
282
cudaHostGetDevicePointer() *38-XOLD6HW
SRUWDEOHSLQQHGPHPRU\ *38VXPVRIDUELWUDULO\ORQJYHFWRUV
]HURFRS\PHPRU\GRWSURGXFWǞ PXOWLSOH&8'$VWUHDPV
cudaMalloc(), 124 SDJHORFNHGKRVWPHPRU\
'WH[WXUHPHPRU\Ǟ UD\WUDFLQJRQ*38
DOORFDWLQJGHYLFHPHPRU\XVLQJ VKDUHGPHPRU\ELWPDS
&38YHFWRUVXPVDSSOLFDWLRQ VWDQGDUGKRVWPHPRU\GRWSURGXFW
&8'$VWUHDPVǞ XVLQJPXOWLSOH&38V
GRWSURGXFWFRPSXWDWLRQ cudaMemcpyHostToDevice()
GRWSURGXFWFRPSXWDWLRQVWDQGDUGKRVW &38YHFWRUVXPVDSSOLFDWLRQ
PHPRU\ GRWSURGXFWFRPSXWDWLRQ
GRWSURGXFWFRPSXWDWLRQZLWKDWRPLFORFNV *38VXPVRIDUELWUDULO\ORQJYHFWRUV
*38KDVKWDEOHLPSOHPHQWDWLRQǞ LPSOHPHQWLQJ*38ORFNIXQFWLRQ
*38-XOLD6HW PHDVXULQJUD\WUDFHUSHUIRUPDQFH
*38ORFNIXQFWLRQ PXOWLSOH&38V
*38ULSSOHXVLQJWKUHDGV PXOWLSOH&8'$VWUHDPV
*38VXPVRIDUELWUDULO\ORQJYHFWRUV SDJHORFNHGKRVWPHPRU\
PHDVXULQJUD\WUDFHUSHUIRUPDQFH VWDQGDUGKRVWPHPRU\GRWSURGXFW
SRUWDEOHSLQQHGPHPRU\ cudaMemcpyToSymbol()FRQVWDQWPHPRU\Ǟ
UD\WUDFLQJRQ*38 cudaMemset()
UD\WUDFLQJZLWKFRQVWDQWPHPRU\ *38KDVKWDEOHLPSOHPHQWDWLRQ
VKDUHGPHPRU\ELWPDS *38KLVWRJUDPFRPSXWDWLRQ
XVLQJPXOWLSOH&38V &8'$1(7SURMHFW
XVLQJWH[WXUHPHPRU\ cudaSetDevice()
cuda-memcheck, 242 DOORFDWLQJSRUWDEOHSLQQHGPHPRU\Ǟ
cudaMemcpy()
Ǟ
'WH[WXUHELQGLQJ
XVLQJGHYLFHSURSHUWLHV
FRS\LQJGDWDEHWZHHQKRVWDQGGHYLFH
XVLQJPXOWLSOH&38VǞ
&38YHFWRUVXPVDSSOLFDWLRQ
cudaSetDeviceFlags()
GRWSURGXFWFRPSXWDWLRQǞ
DOORFDWLQJSRUWDEOHSLQQHGPHPRU\
GRWSURGXFWFRPSXWDWLRQZLWKDWRPLFORFNV
]HURFRS\PHPRU\GRWSURGXFW
*38KDVKWDEOHLPSOHPHQWDWLRQǞ
cudaStreamCreate(), 194, 201
*38KLVWRJUDPFRPSXWDWLRQǞ
cudaStreamDestroy()
*38-XOLD6HW
cudaStreamSynchronize()Ǟ
*38ORFNIXQFWLRQLPSOHPHQWDWLRQ
cudaThreadSynchronize(), 219
*38ULSSOHXVLQJWKUHDGV
cudaUnbindTexture()'WH[WXUHPHPRU\
*38VXPVRIDUELWUDULO\ORQJYHFWRUV
Ǟ
KHDWWUDQVIHUVLPXODWLRQDQLPDWLRQǞ
PHDVXULQJUD\WUDFHUSHUIRUPDQFH &8'33 &8'$'DWD3DUDOOHO3ULPLWLYHV/LEUDU\
SDJHORFNHGKRVWPHPRU\DQG &8))7OLEUDU\
UD\WUDFLQJRQ*38 &8/$WRROV
VWDQGDUGKRVWPHPRU\GRWSURGXFW FXUUHQWDQLPDWLRQWLPH*38ULSSOHXVLQJWKUHDGV
XVLQJPXOWLSOH&38VǞ Ǟ
cudaMemcpyAsync()
*38ZRUNVFKHGXOLQJǞ D
PXOWLSOH&8'$VWUHDPVǞ GHEXJJLQJ&8'$&Ǟ
VLQJOH&8'$VWUHDPV GHWHUJHQWV&8'$DSSOLFDWLRQVǞ
WLPHOLQHRILQWHQGHGDSSOLFDWLRQH[HFXWLRQXVLQJ dev_bitmapSRLQWHU*38-XOLD6HW
PXOWLSOHVWUHDPV GHYHORSPHQWHQYLURQPHQWVHWXS
cudaMemcpyDeviceToHost() &8'$7RRONLWǞ
&38YHFWRUVXPVDSSOLFDWLRQ &8'$HQDEOHGJUDSKLFVSURFHVVRUǞ
GRWSURGXFWFRPSXWDWLRQǞ 19,',$GHYLFHGULYHU
*38KDVKWDEOHLPSOHPHQWDWLRQ VWDQGDUG&FRPSLOHUǞ
*38KLVWRJUDPFRPSXWDWLRQǞ VXPPDU\UHYLHZ
283
GHYLFHGULYHUV RYHUYLHZRIǞ
GHYLFHRYHUODS*38Ǟ UHFRUGLQJsee cudaEventRecord()
__device__ IXQFWLRQ VWRSSLQJDQGVWDUWLQJsee
*38KDVKWDEOHLPSOHPHQWDWLRQǞ cudaEventDestroy()
*38-XOLD6HW VXPPDU\UHYLHZ
GHYLFHV EXIT_FAILURE()SDVVLQJSDUDPHWHUV
JHWWLQJFRXQWRI&8'$
*38YHFWRUVXPVǞ
SDVVLQJSDUDPHWHUVǞ
F
fAnim()VWRULQJUHJLVWHUHGFDOOEDFNV
TXHU\LQJǞ
)DVW)RXULHU7UDQVIRUPOLEUDU\19,',$
XVHRIWHUPLQWKLVERRN
ȌUVWSURJUDPZULWLQJǞ
XVLQJSURSHUWLHVRIǞ
devPtrJUDSKLFVLQWHURSHUDELOLW\ ȍDJVLQJUDSKLFVLQWHURSHUDELOLW\
dim3 YDULDEOHJULG*38-XOLD6HWǞ float_to_color()NHUQHOVLQJUDSKLFV
DIMxDIM ELWPDSLPDJH*38-XOLD6HWǞ LQWHURSHUDELOLW\
GLUHFWPHPRU\DFFHVV '0$ IRUSDJHORFNHG ȍRDWLQJSRLQWQXPEHUV
PHPRU\ DWRPLFDULWKPHWLFQRWVXSSRUWHGIRU
'LUHFW; &8'$$UFKLWHFWXUHGHVLJQHGIRU
DGGLQJVWDQGDUG&WR HDUO\GD\VRI*38FRPSXWLQJQRWDEOHWRKDQGOH
EUHDNWKURXJKLQ*38WHFKQRORJ\Ǟ )2575$1DSSOLFDWLRQV
*H)RUFH*7; &8%/$6FRPSDWLELOLW\ZLWKǞ
JUDSKLFVLQWHURSHUDELOLW\Ǟ ODQJXDJHZUDSSHUIRU&8'$&
GLVFUHWH*38VǞ IRUXPV19,',$
GLVSOD\DFFHOHUDWRUV' IUDFWDOVsee-XOLD6HWH[DPSOH
'0$ GLUHFWPHPRU\DFFHVV IRUSDJHORFNHG free(),&ODQJXDJH
PHPRU\ cudaFree( )YHUVXVǞ
GRWSURGXFWFRPSXWDWLRQ GRWSURGXFWFRPSXWDWLRQZLWKDWRPLFORFNV
RSWLPL]HGLQFRUUHFWO\Ǟ *38KDVKWDEOHLPSOHPHQWDWLRQ
VKDUHGPHPRU\DQGǞ PXOWLSOH&38V
VWDQGDUGKRVWPHPRU\YHUVLRQRIǞ VWDQGDUGKRVWPHPRU\GRWSURGXFW
XVLQJDWRPLFVWRNHHSHQWLUHO\RQ*38Ǟ
Ǟ
GRWSURGXFWFRPSXWDWLRQPXOWLSOH*38V
G
*H)RUFH
DOORFDWLQJSRUWDEOHSLQQHGPHPRU\Ǟ
*H)RUFH*7;
XVLQJǞ
]HURFRS\Ǟ generate_frame()*38ULSSOHǞ
]HURFRS\SHUIRUPDQFH JHQHULFFODVVHVVWRULQJFRPSOH[QXPEHUVZLWK
'U'REE V&8'$Ǟ Ǟ
'5$0VGLVFUHWH*38VZLWKRZQGHGLFDWHGǞ GL_PIXEL_UNPACK_BUFFER_ARBWDUJHW2SHQ*/
draw_funcJUDSKLFVLQWHURSHUDELOLW\Ǟ LQWHURSHUDWLRQ
glBindBuffer()
E FUHDWLQJSL[HOEXIIHUREMHFW
end_thread()PXOWLSOH&38V JUDSKLFVLQWHURSHUDELOLW\
HQYLURQPHQWDOVFLHQFH&8'$DSSOLFDWLRQVIRUǞ glBufferData()SL[HOEXIIHUREMHFW
HYHQWWLPHUseeWLPHUHYHQW glDrawPixels()
HYHQWV JUDSKLFVLQWHURSHUDELOLW\
FRPSXWLQJHODSVHGWLPHEHWZHHQUHFRUGHGsee RYHUYLHZRIǞ
cudaEventElapsedTime() glGenBuffers()SL[HOEXIIHUREMHFW
FUHDWLQJsee cudaEventCreate() JOREDOPHPRU\DWRPLFV
*38KLVWRJUDPFRPSXWDWLRQ *38FRPSXWHFDSDELOLW\UHTXLUHPHQWV
PHDVXULQJSHUIRUPDQFHZLWK KLVWRJUDPNHUQHOXVLQJǞ
PHDVXULQJUD\WUDFHUSHUIRUPDQFHǞ KLVWRJUDPNHUQHOXVLQJVKDUHGDQGǞ
284
285
286
NH\V *38KLVWRJUDPFRPSXWDWLRQǞ
&38KDVKWDEOHLPSOHPHQWDWLRQǞ SDJHORFNHGKRVW SLQQHG Ǟ
*38KDVKWDEOHLPSOHPHQWDWLRQǞ TXHU\LQJGHYLFHVǞ
KDVKWDEOHFRQFHSWVǞ VKDUHGseeVKDUHGPHPRU\
WH[WXUHseeWH[WXUHPHPRU\
L XVHRIWHUPLQWKLVERRN
ODQJXDJHZUDSSHUVǞ 0HPRU\&KHFNHU&8'$
/$3$&. /LQHDU$OJHEUD3DFNDJH memset(),&ODQJXDJH
OLJKWHIIHFWVUD\WUDFLQJFRQFHSWV 0LFURVRIW:LQGRZV9LVXDO6WXGLR&FRPSLOHUǞ
/LQX[VWDQGDUG&FRPSLOHUIRU 0LFURVRIW1(7
LockVWUXFWXUHǞǞ PXOWLFRUHUHYROXWLRQHYROXWLRQRI&38V
ORFNVDWRPLFǞ PXOWLSOLFDWLRQLQYHFWRUGRWSURGXFWV
PXOWLWKUHDGHGKDVKWDEOHVǞ
M mutex*38ORFNIXQFWLRQǞ
0DFLQWRVK26;VWDQGDUG&FRPSLOHU
main()URXWLQH N
'WH[WXUHPHPRU\Ǟ Q)RUFHPHGLDDQGFRPPXQLFDWLRQVSURFHVVRUV
&38KDVKWDEOHLPSOHPHQWDWLRQǞ 0&3V Ǟ
&38KLVWRJUDPFRPSXWDWLRQ NVIDIA
GRWSURGXFWFRPSXWDWLRQǞ FRPSXWHFDSDELOLW\RIYDULRXV*38VǞ
GRWSURGXFWFRPSXWDWLRQZLWKDWRPLFORFNV FUHDWLQJ'JUDSKLFVIRUFRQVXPHUV
Ǟ FUHDWLQJ&8'$&IRU*38
*38KDVKWDEOHLPSOHPHQWDWLRQǞ FUHDWLQJȌUVW*38EXLOWZLWK&8'$$UFKLWHFWXUH
*38KLVWRJUDPFRPSXWDWLRQ &8%/$6OLEUDU\Ǟ
*38-XOLD6HWǞ &8'$HQDEOHGJUDSKLFVSURFHVVRUVǞ
*38ULSSOHXVLQJWKUHDGVǞ &8'$*'%GHEXJJLQJWRROǞ
*38YHFWRUVXPVǞ &8))7OLEUDU\
JUDSKLFVLQWHURSHUDELOLW\ GHYLFHGULYHU
SDJHORFNHGKRVWPHPRU\DSSOLFDWLRQǞ *38&RPSXWLQJ6'.GRZQORDGǞ
UD\WUDFLQJRQ*38Ǟ 3DUDOOHO16LJKWGHEXJJLQJWRRO
UD\WUDFLQJZLWKFRQVWDQWPHPRU\Ǟ 3HUIRUPDQFH3ULPLWLYHV
VKDUHGPHPRU\ELWPDS
SURGXFWVFRQWDLQLQJPXOWLSOH*38V
VLQJOH&8'$VWUHDPVǞ
9LVXDO3URȌOHUǞ
]HURFRS\PHPRU\GRWSURGXFWǞ
NVIDIA CUDA Programming Guide, 31
malloc()
cudaHostAlloc()YHUVXV
cudaHostAlloc()YHUVXV O
cudaMalloc( )YHUVXV offset'WH[WXUHPHPRU\
UD\WUDFLQJRQ*38 RQFKLSFDFKLQJseeFRQVWDQWPHPRU\WH[WXUH
PDPPRJUDPV&8'$DSSOLFDWLRQVIRUPHGLFDO PHPRU\
LPDJLQJ RQHGLPHQVLRQDOEORFNV
maxThreadsPerBlockȌHOGGHYLFHSURSHUWLHV *38VXPVRIDORQJHUYHFWRU
PHGLDDQGFRPPXQLFDWLRQVSURFHVVRUV 0&3V WZRGLPHQVLRQDOEORFNVYHUVXV
PHGLFDOLPDJLQJ&8'$DSSOLFDWLRQVIRUǞ RQOLQHUHVRXUFHVseeUHVRXUFHVRQOLQH
memcpy(),&ODQJXDJH 2SHQ*/
PHPRU\ FUHDWLQJGPUAnimBitmapǞ
DOORFDWLQJGHYLFHsee cudaMalloc() LQHDUO\GD\VRI*38FRPSXWLQJǞ
FRQVWDQWseeFRQVWDQWPHPRU\ JHQHUDWLQJLPDJHGDWDZLWKNHUQHOǞ
&8'$$UFKLWHFWXUHFUHDWLQJDFFHVVWR LQWHURSHUDWLRQǞ
HDUO\GD\VRI*38FRPSXWLQJ ZULWLQJ'JUDSKLFV
H[HFXWLQJGHYLFHFRGHWKDWXVHVDOORFDWHG RSHUDWLRQVDWRPLFǞ
IUHHLQJsee cudaFree()free(),&ODQJXDJH RSWLPL]DWLRQLQFRUUHFWGRWSURGXFWǞ
287

P
page-locked host memory
  allocating as portable pinned memory
  overview of
  restricted use of
  single CUDA streams with
parallel blocks
  GPU Julia Set
  GPU vector sums
parallel blocks, splitting into threads
  GPU sums of arbitrarily long vectors
  GPU sums of longer vector
  GPU vector sums using threads
  overview of
  vector sums
Parallel NSight debugging tool
parallel processing
  evolution of CPUs
  past perception of
parallel programming, CUDA
  CPU vector sums
  example, CPU Julia Set application
  example, GPU Julia Set application
  example, overview
  GPU vector sums
  overview of
  summary review
  summing vectors
parameter passing
PC gaming, 3D graphics for
PCI Express slots, adding multiple GPUs to
performance
  constant memory and
  evolution of CPUs
  hash table
  launching kernel for GPU histogram computation
  measuring with events
  page-locked host memory and
  zero-copy memory and
pinned memory
  allocating as portable
  cudaHostAllocDefault() getting default
  as page-locked memory. See page-locked host memory
pixel buffer objects (PBO), OpenGL
pixel shaders, early days of GPU computing
pixels, number of threads per block
portable computing devices
Programming Massively Parallel Processors: A Hands-on Approach (Kirk, Hwu)
properties
  cudaDeviceProp structure. See cudaDeviceProp structure
  maxThreadsPerBlock field for device
  reporting device
  using device
PyCUDA project
Python language wrappers for CUDA C

Q
querying devices

R
rasterization
ray tracing
  concepts behind
  with constant memory
  on GPU
  measuring performance
read-modify-write operations
  atomic operations as
  using atomic locks
read-only memory. See constant memory; texture memory
reductions
  dot products as
  overview of
  shared memory and synchronization for
references, texture memory
registration
  bufferObj with cudaGraphicsGLRegisterBuffer()
  callback
rendering, GPUs performing complex
resource variable
  creating GPUAnimBitmap
  graphics interoperation
resources, online
  CUDA code
  CUDA Toolkit
  CUDA University
  CUDPP
  CULAtools
  Dr. Dobb's CUDA
  GPU Computing SDK code samples
  language wrappers
  NVIDIA device driver
  NVIDIA forums
  standard C compiler for Mac OS X
  Visual Studio C compiler
resources, written
  CUDA U
  forums
  programming massive parallel processors
ripple, GPU
  with graphics interoperability
  producing
routine()
  allocating portable pinned memory
  using multiple CPUs
Russian nesting doll hierarchy

S
scalable link interface (SLI), adding multiple GPUs with
scale factor, CPU Julia Set
scientific computations, in early days
screenshots
  animated heat transfer simulation
  GPU Julia Set example
  GPU ripple example
  graphics interoperation example
  ray tracing example
  rendered with proper synchronization
  rendered without proper synchronization
shading languages
shared data buffers, kernel/OpenGL rendering interoperation
shared memory
  atomics
  bitmap
  CUDA Architecture creating access to
  dot product
  dot product optimized incorrectly
  and synchronization
Silicon Graphics, OpenGL library
simulation
  animation of
  challenges of physical
  computing temperature updates
  simple heating model
SLI (scalable link interface), adding multiple GPUs with
spatial locality
  designing texture caches for graphics with
  heat transfer simulation animation
split parallel blocks. See parallel blocks, splitting into threads
standard C compiler
  compiling for minimum compute capability
  development environment
  kernel call
start event
start_thread(), multiple CPUs
stop event
streams
  CUDA, overview of
  CUDA, using multiple
  CUDA, using single
  GPU work scheduling and
  overview of
  page-locked host memory and
  summary review
supercomputers, performance gains in
surfactants, environmental devastation of
synchronization
  of events. See cudaEventSynchronize()
  of streams
  of threads
synchronization, and shared memory
  dot product
  dot product optimized incorrectly
  overview of
  shared memory bitmap
__syncthreads()
  dot product computation
  shared memory bitmap using
  unintended consequences of

T
task parallelism, CPU versus GPU applications
TechniScan Medical Systems, CUDA applications
temperatures
  computing temperature updates
  heat transfer simulation
  heat transfer simulation animation
Temple University research, CUDA applications
tex1Dfetch() compiler intrinsic, texture memory
tex2D() compiler intrinsic, texture memory
texture, early days of GPU computing
texture memory
  animation of simulation
  defined
  overview of
  simulating heat transfer
  summary review
  two-dimensional
  using
threadIdx variable
  2D texture memory
  dot product computation
  dot product computation with atomic locks
  GPU hash table implementation
  GPU Julia Set
  GPU ripple using threads
  GPU sums of a longer vector
  GPU sums of arbitrarily long vectors
  GPU vector sums using threads
  histogram kernel using global memory atomics
  histogram kernel using shared/global memory atomics
  multiple CUDA streams
  ray tracing on GPU
  setting up graphics interoperability
  shared memory bitmap
  temperature update computation
  zero-copy memory dot product
threads
  coding with
  constant memory and
  GPU ripple using
  GPU sums of a longer vector
  GPU sums of arbitrarily long vectors
  GPU vector sums using
  hardware limit to number of
  histogram kernel using global memory atomics
  incorrect dot product optimization and divergence of
  multiple CPUs
  overview of
  ray tracing on GPU and
  read-modify-write operations
  shared memory and. See shared memory
  summary review
  synchronizing
threadsPerBlock
  allocating shared memory
  dot product computation
three-dimensional blocks, GPU sums of a longer vector
three-dimensional graphics, history of GPUs
three-dimensional scenes, ray tracing producing 2D image of
tid variable
  blockIdx.x variable assigning value of
  checking that it is less than N
  dot product computation
  parallelizing code on multiple CPUs
time, GPU ripple using threads
timer, event. See cudaEventElapsedTime()
Toolkit, CUDA
two-dimensional blocks
  arrangement of blocks and threads
  GPU Julia Set
  GPU ripple using threads
  gridDim variable as
  one-dimensional indexing versus
two-dimensional display accelerators, development of GPUs
two-dimensional texture memory
  defined
  heat transfer simulation
  overview of

U
ultrasound imaging, CUDA applications for
unified shader pipeline, CUDA Architecture
university, CUDA

V
values
  CPU hash table implementation
  GPU hash table implementation
  hash table concepts
vector dot products. See dot product computation
vector sums
  CPU
  GPU
  GPU sums of arbitrarily long vectors
  GPU sums of longer vector
  GPU sums using threads
  overview of
verify_table(), GPU hash table
Visual Profiler, NVIDIA
Visual Studio C compiler

W
warps, reading constant memory with
while() loop
  CPU vector sums
  GPU lock function
work scheduling, GPU

Z
zero-copy memory
  allocating/using
  defined
  performance