Copyright by Xuliang Han 2003

# The Dissertation Committee for Xuliang Han certifies that this is the approved version of the following dissertation:

## Fan-Out Equalized Shared Optical Backplane Bus

Committee:

- Ray T. Chen, Supervisor
- Joe C. Campbell
- G. Jack Lipovski
- Michael F. Becker
- Paul S. Ho

## **Fan-Out Equalized Shared Optical Backplane Bus**

by Xuliang Han, B.S., M.S.E.

#### **Dissertation**

Presented to the Faculty of the Graduate School of the University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of **Doctor of Philosophy**

The University of Texas at Austin

December 2003

## Acknowledgements

I would like to thank the faculty and staff of the Microelectronics Research Center and the Department of Electrical and Computer Engineering at The University of Texas at Austin for providing me with such an intellectually stimulating environment in which to conduct my research and pursue my graduate education. I greatly appreciate my advisor, Dr. Ray T. Chen, for his support and guidance in my research work. Many thanks also go to my committee members, Dr. Joe C. Campbell, Dr. G. Jack Lipovski, Dr. Michael F. Becker, and Dr. Paul S. Ho, for their insightful advice on my dissertation.

I would like to acknowledge the productive collaborations with Dr. Gicherl Kim, a former student in the Optical Interconnect Group and now a research engineer at Omega Optics. With his agreement, some figures and experimental results in this dissertation are adapted from his publications. I also want to thank other group members for their help over the years.

Finally, I would like to express my gratitude to my parents and brother for their lifetime support and encouragement.

### Fan-Out Equalized Shared Optical Backplane Bus

Xuliang Han, Ph.D.

The University of Texas at Austin, 2003

Supervisor: Ray T. Chen

Optics is distinguished for its interconnect capability. A variety of optical interconnect technologies have been successfully employed in real applications where conventional implementations based exclusively on electrical interconnects have become insufficient, and the boundary demarcating the electrical and optical domains is being pushed further down in the interconnect hierarchy. Many studies have projected an imminent bottleneck throttling board-to-board data transfers. Accordingly, an opportunity exists for the continuing exploitation of optics to complement or even replace conventional electrical backplanes. The most prominent benefit of utilizing optics is the tremendous gain in bandwidth capacity. From the architecture point of view, however, the three fundamental optical methodologies, namely optical waveguide interconnects, free-space optical interconnects, and substrate-guided optical interconnects, differ greatly in how effectively the obtained bandwidth gain would improve the overall system performance. The approaches based on optical waveguide or free-space interconnects provide only the point-to-point topology; in turn, the various proposed architectures are essentially optical point-to-point switched backplanes. In contrast, the approaches based on substrate-guided optical interconnects can effectively fulfill the shared bus topology, and thus an optical backplane bus can be implemented. In this dissertation, comparative examinations specifically point out that an optical backplane bus has many considerable advantages over an optical point-to-point switched backplane. An innovative optical backplane architecture, the optical centralized shared bus, is created based on substrate-guided optical interconnects; it utilizes the beneficial physical characteristics of optics while retaining the desirable architectural properties of the shared bus topology.
Therefore, it is projected that the bandwidth gain would be maximized. Unlike other optical shared bus architectures, this innovatively designed optical backplane bus can accomplish equalized fan-outs across the entire architecture in an elegant manner. This significant merit can substantially ease the overall system integration. In this dissertation, equalized bus fan-outs are successfully established on the fabricated optical interconnect layer. To further verify the feasibility of the optical centralized shared bus architecture in practical scenarios, two research prototypes, a microprocessor-to-memory interconnect demonstrator and a centralized shared-memory multiprocessing emulator, are constructed with the physically characterized optical centralized shared bus.

## **Table of Contents**

| Section | Title |
|---------|-------|
| | Acknowledgements |
| | Abstract |
| Chapter 1 | Introduction |
| 1.1 | Interconnect Hierarchy |
| 1.2 | Overview of Backplane Topologies |
| 1.3 | Interconnect Bottleneck |
| 1.4 | Advantages of Optical Interconnects |
| 1.5 | Overview of Optical Interconnect Methodologies |
| 1.6 | Advantages of Optical Backplane Bus vs. Optical Point-to-Point Switched Backplane |
| 1.7 | Research Contributions |
| 1.8 | Dissertation Outline |
| Chapter 2 | Optical Centralized Shared Bus Architecture |
| 2.1 | Overview of Optical Shared Bus Architectures |
| 2.2 | Architectural Description of Optical Centralized Shared Bus |
| 2.3 | Characteristics of Optical Centralized Shared Bus Architecture |
| 2.4 | Summary |
| Chapter 3 | Optical Interconnect Layer Implementation |
| 3.1 | Configuration of Optical Interconnect Layer |
| 3.2 | Overview of Volume Holographic Gratings |
| 3.3 | Theoretical Analysis of Grating Formation Procedure within Dry Photopolymer Films |
| 3.4 | Experimental Characterization of Grating Formation Procedure within Dry Photopolymer Films |
| 3.5 | Fabrication of Waveguide Holograms |
| 3.6 | Demonstration of Equalized Bus Fan-Outs |
| 3.7 | Power Budget Evaluation |
| 3.8 | Bandwidth Characterization of Optical Interconnect Layer |
| 3.9 | Summary |
| Chapter 4 | Electro-Optical Interface Implementation |
| 4.1 | Introduction |
| 4.2 | Overview of High-Speed PCB Layout Design Techniques |
| 4.3 | Transmitter Implementation Examples |
| 4.4 | Receiver Implementation Examples |
| 4.5 | Eye Diagram Measurement |
| 4.6 | Summary |
| Chapter 5 | System Demonstration Strategy |
| 5.1 | Introduction |
| 5.2 | Uniprocessing System |
| 5.3 | Multiprocessing System |
| Chapter 6 | Microprocessor-to-Memory Interconnect Demonstration |
| 6.1 | Introduction |
| 6.2 | Microprocessor-to-Memory Interface Design |
| 6.3 | Microprocessor-to-Memory Interconnect Demonstration |
| 6.4 | Discussions |
| 6.5 | Summary |
| Chapter 7 | Centralized Shared-Memory System Demonstration |
| 7.1 | Overview of Multiprocessor Programming Models |
| 7.2 | Centralized Shared-Memory System on Optical Centralized Shared Bus |
| 7.3 | PCI over Optical Centralized Shared Bus Architecture |
| 7.4 | Centralized Shared-Memory System Demonstration |
| 7.5 | Discussions |
| 7.6 | Summary |
| Chapter 8 | |
| 8.1 | Summary |
| 8.2 | Recommendations for Future Work |
| | Bibliography |
| | Vita |

## **Chapter 1 Introduction**

#### 1.1 Interconnect Hierarchy

Interconnects are becoming an ever more dominant factor in telecommunication backbones, broadband local data networks, high performance computing (HPC) systems, and state-of-the-art signal processing engines. Table 1.1 presents a hierarchical perspective on interconnects and the pertinent implementation technologies [1], [2]. Distinguished by widely differing distances, the interconnect hierarchy may be coarsely divided into four levels: telecommunications, data communications, board-to-board interconnects, and chip-to-chip interconnects.

| | Chip-to-Chip | Board-to-Board | Data Links | Telecom |
|---|---|---|---|---|
| Electrical Implementation | PCB | Backplane | TP/Coax | TP/Coax |
| Optical Implementation | Waveguide, Free-Space | Waveguide, Free-Space, Substrate-Guided | SM/MM Fiber, Free-Space | SM Fiber |
| Optical Source | 850nm VCSEL | 850nm VCSEL | 1.3/1.55μm LD | 1.3/1.55μm LD |
| Interconnect Distance | <1cm | <10cm | <1km | >1km |
| | | | | Optical Domain |

Table 1.1 Interconnect hierarchy and pertinent implementation technologies

Since the interconnect distance has a considerable influence on the bandwidth capacity of the physical layer, the continuous evolution of interconnect solutions is clearly reflected by the optical/electrical domain boundary, which is being pushed further down in the interconnect hierarchy. The most significant benefit of utilizing optical interconnects is the tremendous gain in bandwidth capacity. To meet the ever-increasing demand for bandwidth, a variety of optical interconnect technologies have been explored, and some of them have been successfully employed in real applications where conventional implementations based exclusively on electrical interconnects have become insufficient.

In telecommunications, there is no argument about the success of optics. The approaches based on optical fibers have become the dominant solutions. The extremely low attenuation of optical fibers has been a major factor in their replacement of copper as the interconnect medium. Optical rack-to-rack interconnects are now emerging on the horizon, e.g., in HPC clusters [3] and core network routers [4]. Many studies have projected an imminent bottleneck throttling board-to-board data transfers [1], [5], [6].
Accordingly, an opportunity exists for the continuing exploitation of optical interconnects to complement or even replace conventional electrical backplanes. Below the chip-to-chip level, where latency is at a premium, optical interconnects might not be competitive with their electronic counterparts [7]. But it is rather early at this stage to predict how far down in the interconnect hierarchy optics will penetrate.

In both the electrical and optical domains, as the distance decreases dramatically down the interconnect hierarchy, system requirements such as bandwidth, latency, fan-out quality, power consumption, heat dissipation, complexity, cost, and reliability differ widely. In turn, the matched interconnect solutions vary from one hierarchical level to another. For example, the distance involved in telecommunications and data communications is so long that attenuation and dispersion are major concerns. As a result, 1.3/1.55µm lasers are selected as the primary optical sources, since a normal optical fiber has the lowest attenuation at 1.55µm and the lowest dispersion at 1.3µm. On the other hand, the distance involved at and below the board-to-board level is so short that attenuation and dispersion are minor concerns, and the choice of wavelength depends mainly on the intrinsic properties of the laser sources themselves, including thermal sensitivity, quantum efficiency, and so on. Thus, 850nm lasers, especially VCSELs (vertical-cavity surface-emitting lasers), are singled out for use [8]. Moreover, many proven optical implementation technologies in telecommunication backbones, e.g., dense wavelength division multiplexing (DWDM), may not even be viable at the board-to-board level simply because these approaches might be prohibitively expensive. Thus, one of the most challenging research objectives of the day is to engineer and justify the appropriate optical implementation technologies at each hierarchical level.

#### 1.2 Overview of Backplane Topologies

Backplane topologies may be generically categorized into two basic types: shared bus and switched medium, as illustrated in Figures 1.1 and 1.2, respectively. By combining these two fundamental blocks in different manners, more sophisticated topologies can be obtained.

#### Shared Bus

Figure 1.1 Schematic of shared bus topology

Certainly the easiest way to connect multiple nodes is to have them share a single interconnect medium, as illustrated in Figure 1.1. By snooping on the shared medium, all subscribed daughter boards can simultaneously receive the same information. Thus, the routing functionality is simply fulfilled in a broadcast fashion without involving any explicit routing delay. The shared bus is a direct network, and thus possesses very high connectivity. Each data transfer phase incurs only a single-hop delay. This feature is highly desirable in minimizing the interconnect latency. In a centralized shared-memory multiprocessing system, the cache controllers embedded in each microprocessor simultaneously monitor every data transaction proceeding on the shared memory bus. In this manner, cache coherence can be effectively maintained [9]. The shared bus is inexpensive to implement since much of the hardware is shared.

In the electrical domain, the prominent bottleneck of the shared bus arises from its low bandwidth capacity. There are many frequency-dependent physical factors, including power loss, transmission line effects, electromagnetic interference (EMI), and so on, that considerably limit the maximum speed of an electrical backplane bus at a given line density. Furthermore, these restrictions worsen with extended bus length and an increased number of bus fan-outs. Another topological deficit originates from the medium access control (MAC) manner of the shared bus.
Only one daughter board can deliver data on the shared medium at a time, and thus the shared bus topology possesses little parallelism.

#### Switched Medium

Figure 1.2 Schematic of switched medium topology

The alternative to sharing a single interconnect medium is to have a dedicated line from the source daughter board to a switch that in turn provides a dedicated line to each destination daughter board, as illustrated in Figure 1.2. Thus, the routing functionality is handled in a point-to-point fashion. The switch maintains a routing table that contains the overall point-to-point connectivity information, and updates this table either on a regular basis or upon receiving explicit requests [10]. By looking up the routing table with the decoded address information from the received data packet, the switch can identify the correct output port(s) to the intended destination board(s). Unlike shared lines, dedicated lines do not experience any unnecessary fan-outs, which considerably eases the overall power budget. The point-to-point interconnect topology can somewhat relieve the physical restrictions in the electrical domain, and thus allows dedicated electrical lines to run at a faster speed at a given line density than shared electrical lines. In some circumstances, with intelligent traffic scheduling, a switch may coordinate multiple pairs of nodes to communicate simultaneously. This potential for high topological parallelism gives these interconnections a much higher aggregate bandwidth than the speed of one shared bus line to one node.

The switched medium is an indirect network since all data transfers must pass through an intermediate switching node. Each data transfer phase incurs a double-hop delay and a routing delay that mainly consists of the time spent on looking up the routing table.
By its nature, the routing overhead increases in proportion to the size of the table, which is directly determined by the total number of daughter boards subscribed to the switch. Replacing a single switching node with a multi-stage switch network becomes necessary to keep the routing table in each individual switch at a moderate size and, in turn, to limit the routing delay. With this approach, however, the hop overhead is expected to rise with the increased number of intermediate switching nodes. Thus, it is rather difficult to minimize the overall interconnect latency. Another significant drawback of the switched medium topology is related to the implementation cost. The additional involvement of many expensive devices, including switches, line transceivers, dedicated lines, and so on, considerably increases the overall system cost.

#### 1.3 Interconnect Bottleneck

At the board-to-board level, the required interconnects are usually provided by an electrical backplane bus, as illustrated in Figure 1.3. A daughter board can be simply plugged into the designated backplane connector to obtain access to the shared medium, i.e., the electrical bus lines on the backplane. As previously pointed out, there are many frequency-dependent physical factors that considerably limit the maximum speed of an electrical backplane bus at a given line density. Furthermore, these restrictions worsen with extended bus length and an increased number of bus fan-outs. In consequence, a typical electrical backplane bus operates at a frequency of less than 200MHz. Still in the electrical domain, many new techniques are being sought to either overcome this speed hurdle or circumvent it by making more efficient use of the available bandwidth. For example, Intel is targeting a physical signaling technique that quad-pumps data transfers over a system bus clocked at 200MHz, together with a buffering scheme that allows sustained data transfers at 800Mbps [11]. Even at the considerable expense of sophisticated digital signal processing facilities, the fundamental physical frequency is still not likely to exceed 500MHz in practical implementations. This bandwidth deficit severely diminishes the other merits of the shared bus topology, and will definitely throttle data transfers at the board-to-board level in the interconnect hierarchy.

Below this level, the performance of advanced microprocessors continues to improve at a rapid pace. For example, the on-chip frequency of the Intel Pentium 4 microprocessor surpassed 3GHz in 2003. In contrast, although the maximum front side bus (FSB) speed can reach the so-called 800Mbps at the considerable expense of the sophisticated quad-pumping and buffering scheme, the physical off-chip clock frequency is still not able to exceed 200MHz. This problem of computing speed outpacing interconnect capacity is becoming more and more severe in the electrical domain. Meanwhile, as multiprocessing comes into the mainstream, the demand for board-to-board interconnects becomes even more critical. In the design of a multiprocessing system, one significant challenge is to effectively provide communications among several processes that are simultaneously executing on multiple microprocessors.

Above the board-to-board level, the rapid development of local data communications, including 10 Gigabit Ethernet [12], Fibre Channel [13], RapidIO [14], InfiniBand [15], PCI Express [16], and so on, is pushing more and more data into a single linecard within a shorter and shorter time interval. Consequently, in the electrical domain, an imminent throttling bottleneck is projected at the board-to-board level in the interconnect hierarchy.

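The quad-pumping figures quoted above can be checked with a short calculation. This is only an illustrative sketch: the function name `effective_rate_mtps` is hypothetical, and it assumes that "quad-pumping" means four transfers per bus clock cycle on each line.

```python
def effective_rate_mtps(clock_mhz: float, transfers_per_clock: int) -> float:
    """Per-line transfer rate in mega-transfers per second.

    Assumption: every clock cycle carries `transfers_per_clock` transfers.
    """
    return clock_mhz * transfers_per_clock


bus_clock_mhz = 200.0   # physical off-chip clock quoted in the text
cpu_clock_mhz = 3000.0  # on-chip clock quoted in the text (3GHz)

quad_pumped = effective_rate_mtps(bus_clock_mhz, 4)
clock_gap = cpu_clock_mhz / bus_clock_mhz

print(quad_pumped)  # 800.0, matching the quoted 800Mbps per line
print(clock_gap)    # 15.0, ratio of on-chip to off-chip clock frequency
```

Under this assumption, the quoted 800Mbps comes entirely from signaling on more clock phases, while the physical clock stays at 200MHz, which is the gap the section describes.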
Figure 1.3 Typical electrical backplane bus

The electrical backplane topology has been changing from the simple shared bus to the more complicated switched medium, as illustrated in Figure 1.4 [14]. Employing a switched medium allows a backplane to operate at a higher frequency at a given line density than an electrical backplane bus, because the dedicated point-to-point lines are to some degree less prone to the frequency-dependent physical restrictions than the shared bus lines. Also, with intelligent traffic scheduling, a switch may in some circumstances coordinate multiple pairs of nodes to communicate simultaneously. This topological parallelism gives these interconnections a much higher aggregate bandwidth than the speed of one shared bus line to one node. These two benefits are the essential rationales behind the backplane topology trend in the electrical domain.

As previously discussed, however, the switched medium is an indirect network, so it cannot carry out multicast/broadcast as effectively as the shared bus. From the architecture point of view, this topological deficit critically restricts the overall gain brought by the increased aggregate bandwidth. Each data transfer phase inevitably incurs a routing overhead, which makes it rather difficult to minimize the overall interconnect latency. In [17], statistics on the memory read latency in a medium-size and a large-size switch-based multiprocessing system (Sun Fire 12K and 15K, respectively) show that the wire delay is only a moderate fraction of the total memory read latency, while the transactions through switches and the multicast/broadcast actions to maintain cache coherence account for a significant fraction. Moreover, the delay associated with switching and cache coherence increases with the system scale more rapidly than the wire delay.
Meanwhile, the additional involvement of many expensive devices, including switches, line transceivers, dedicated lines, and so on, considerably increases the overall system cost. Therefore, an innovative technology that can provide sufficient bandwidth capacity while at the same time retaining the essential merits of the shared bus topology is highly desirable.

Figure 1.4 Backplane topology trend in electrical domain [14]

#### 1.4 Advantages of Optical Interconnects

Optics has been well known for its interconnect capability. To meet the ever-increasing demand for bandwidth, a variety of optical interconnect solutions have been explored, and some of them have been successfully employed in real applications where electrical interconnect solutions have become insufficient. As shown in Table 1.1, the continuous evolution of the implementation technologies is clearly reflected by the optical/electrical domain boundary, which is being pushed further down in the interconnect hierarchy. Accordingly, optical interconnects are being actively investigated as a primary complement or even alternative to electrical interconnects in order to prevent the projected bottleneck from throttling data transfers at the board-to-board level. From one hierarchical level to another, system requirements, including bandwidth, latency, fan-out quality, power consumption, heat dissipation, complexity, cost, reliability, and so on, undergo considerable changes. In consequence, an implementation technology may have widely differing effects at different hierarchical levels. Pertinent to the interconnections at the board-to-board hierarchical level, the physical merits of optics mainly include [1], [18], [19], [20], [21]:

- (1) No current-loop associated problems. Whenever an electrical signal switches, an AC current is introduced at the signal switching edge. A current return path, which follows the path of least impedance, is required to complete the loop.
In the electrical domain, one of the most important concerns in high-speed board design is to minimize various current-loop associated problems. If not appropriately handled, the interaction among these dynamic current loops may aggravate ringing, crosstalk, and radiation. In turn, these current-loop associated problems can severely degrade the signal integrity. In contrast, there are no such current loops in the optical domain, and thus no current-loop associated problems are present in optical interconnection links either.
- (2) Intrinsic immunity to electromagnetic interference (EMI). EMI is detrimental to nearly any electronic device, and the impact becomes more critical with increased frequency. Meanwhile, high-speed electrical wires can generate severe EMI, which appears as harmful high-frequency noise in the surrounding environment. In contrast, optical interconnection links are naturally insensitive to EMI, and do not radiate any EMI.
- (3) Little susceptibility to transmission line effects. On a high-speed board, all signal wires are practically transmission lines. To provide the best medium for the transfer of electrical signals on the board, the impedance along the entire electrical transmission line must remain unchanged. However, the essential functionality of an electrical backplane is to provide a large number of fan-outs. In such a scenario, it is nearly impossible to obtain an electrical transmission line with constant impedance in practical implementations. Along a transmission line with physical discontinuities, the original signal is subject to reflections wherever the impedance changes. If not properly controlled, these electrical reflections may interfere with signaling, cause ringing, cause false clocking, or destroy system functionality. In order to suppress the disturbing reflections, a bus line must be properly terminated at every stub.
At each termination, however, considerable power is consumed to bring the voltage above the predefined logic threshold level. In contrast, because of the enormous carrier frequency of optical signals, transmission line effects have little negative impact on optical interconnection links.

These physical merits make optics a competitive candidate to provide the required interconnects at the board-to-board level. The most significant benefit of utilizing optical interconnects is the tremendous gain in bandwidth capacity. In particular, the bandwidth capacity of a single optical interconnect line was experimentally characterized to be approximately 2.5THz [22]. Meanwhile, less power consumption per line is expected [23]. Because optics does not involve any current-loop associated problem or EMI, the crosstalk among adjacent optical channels is largely reduced. Therefore, a much higher interconnection density can be achieved by utilizing optical interconnects.

Furthermore, the prominent progress in the fabrication of two-dimensional (2-D) vertical-cavity surface-emitting laser (VCSEL) array devices and 2-D photodiode array devices initiates the exploitation of real three-dimensional (3-D) interconnects. By adding another dimension for the data transfers, the persistent problem associated with pin-out may be relieved. By stacking multiple levels of active devices directly on top of one another, the interconnect distance can be minimized. This real 3-D approach opens many new areas of research. A low-threshold, high-efficiency, single-transverse-mode 2-D VCSEL array device with a high degree of uniformity was demonstrated [24]. The output of a VCSEL has a much smaller beam divergence than that of an edge-emitting laser and, more importantly, is circularly symmetric, which largely eases the integration of a VCSEL array device with a microlens array to further reduce the beam divergence [25].
The planar structure of a VCSEL array makes it easy to integrate with a CMOS laser driver array [26]. Meanwhile, the individual cost per VCSEL is considerably reduced through wafer-scale fabrication and on-wafer testing. Also, VCSELs have proven to be robust devices [27]. With these desirable features, the VCSEL technology has been credited as a key enabling solution for high-performance board-to-board interconnects.

#### 1.5 Overview of Optical Interconnect Methodologies

Much research work has been devoted to investigating how to effectively utilize optical interconnects to expedite the data transfers among all daughter boards within a box. As illustrated in Figures 1.5, 1.8, and 1.10, respectively, the optical implementation solutions at the board-to-board level that have been explored so far may be categorized into three basic types: optical waveguide interconnects, free-space optical interconnects, and substrate-guided optical interconnects, not counting the hybrid approaches that are based on various combinations of these fundamental methodologies.

#### Optical Waveguide Interconnects

Figure 1.5 Optical waveguide interconnects

In a similar fashion to electrical lines, optical waveguides can be laid out on a board to provide the required interconnects among all daughter boards, as illustrated in Figure 1.5. Data can be transferred in optical waveguides at a much higher speed at a given line density than in electrical lines. As a result, the overall system performance may be considerably improved. A major concern in the approaches based on optical waveguide interconnects is the signal loss during propagation. Optical polymers offer a versatile material for the fabrication of high-quality optical waveguides on a variety of substrates. A polymer waveguide device with very low attenuation of 0.03dB/cm at 840nm and good thermal stability was demonstrated [28].
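To put the quoted attenuation in perspective, the propagation loss over a backplane-scale run can be estimated as attenuation times length. The sketch below is illustrative only: the 10cm length is a hypothetical value chosen for the example, the function names are not from the source, and coupling and bending losses are deliberately left out.

```python
def propagation_loss_db(alpha_db_per_cm: float, length_cm: float) -> float:
    """Propagation loss in dB for a uniform waveguide."""
    return alpha_db_per_cm * length_cm


def transmitted_fraction(loss_db: float) -> float:
    """Convert a loss in dB to the fraction of optical power that remains."""
    return 10.0 ** (-loss_db / 10.0)


alpha = 0.03    # dB/cm at 840nm, the figure quoted from [28]
length = 10.0   # cm, hypothetical waveguide length (assumption)

loss = propagation_loss_db(alpha, length)
print(f"{loss:.2f} dB, {transmitted_fraction(loss):.3f} of input power")
# prints "0.30 dB, 0.933 of input power"
```

At this attenuation level, propagation loss alone removes only a few percent of the power over such a length, which is consistent with the text singling out coupling and bending as the remaining concerns.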
Another critical issue is to couple the light that carries the information into and out of optical waveguides in an efficient and reliable manner. A 45° planar micromirror can simply function as an optical waveguide coupler, as suggested for use in the open waveguide layers in Figure 1.6 (a) [29] and (b) [30], and the fully embedded waveguide layer in Figure 1.6 (c) [31]. The 45° micromirror structure is naturally insensitive to wavelength and relatively easy to fabricate with high coupling efficiency and low scattering loss at the end of an optical waveguide. In addition to optical polymer waveguides, regular optical fibers can be routed within a flexible film, as shown in Figure 1.7 (a) [32], and the fabricated optical circuitry can then function as a flexible backplane, as illustrated in Figure 1.7 (b).

Figure 1.6 (a) 45° micromirror as waveguide coupler in [29]

Figure 1.6 (b) 45° micromirror as waveguide coupler in [30]

Figure 1.6 (c) 45° micromirror as waveguide coupler in [31]

Figure 1.7 (a) STRATOS optical flex circuitry [32]

Figure 1.7 (b) Flexible optical backplane

The approaches based on optical waveguide interconnects provide only point-to-point interconnects, since it is exceptionally difficult for optical waveguides to deliver multiple high-quality bus fan-outs. For data transfers between a pair of daughter boards that are not directly connected by optical waveguides, signal switching is required. From the architecture point of view, this topological deficit critically restricts the overall gain obtained by utilizing optical interconnects. Meanwhile, unlike electrical signal lines, which can be routed freely, optical waveguides cannot simply be laid out on a board on demand because of the concern about bending loss. In consequence, the routing scheme of optical waveguides is considerably subject to geometrical constraints.

#### Free-Space Optical Interconnects

Figure 1.8 Free-space optical interconnects

The data transfers between two face-to-face daughter boards can be directly fulfilled through free space in a point-to-point fashion, as illustrated in Figure 1.8. Data can be transferred in free-space optical links at a much higher rate at a given line density than in electrical lines. In turn, the overall system performance may be considerably improved. With the remarkable progress in the fabrication of active optoelectronic array devices, especially the smart pixel array (SPA) devices shown in Figure 1.9 (a) [33] and (b) [34], high-density parallel free-space optical interconnects are becoming attractive. Also, the routing scheme is fairly flexible because all optical interconnects proceed in the third dimension rather than on the original 2-D planar board. Moreover, because there is little interaction among multiple crossing optical beams in free space, more radical 3-D interconnect architectures may be explored.

Figure 1.9 (a) Packaged smart pixel array (SPA) in [33]

Figure 1.9 (b) Packaged smart pixel array (SPA) in [34]

As with the optical waveguide methodology, the approaches based on free-space optical interconnects provide only point-to-point connectivity. For data transfers between two daughter boards that are not directly facing each other, signal relay and/or switching is required. From the architecture point of view, this topological deficit critically restricts the overall gain obtained by utilizing optical interconnects. Meanwhile, the optical signal links in free space are open to environmental noise, and the critical alignments can be easily disturbed. This packaging reliability shortfall makes high-integrity data transfers in harsh environments practically impossible.

#### Substrate-Guided Optical Interconnects

Figure 1.10 Substrate-guided optical interconnects

As illustrated in Figure 1.10, in the approaches that are based on substrate-guided optical interconnects [35], the light that carries the information is confined within an optical waveguiding substrate that has properly designed volume holographic gratings integrated on its top surface. The substrate provides a turbulence-free medium for optical interconnects, and the waveguide holograms function as optical fan-in/fan-out devices. For a data transfer, the light emitted from the source laser diode is diffracted into the substrate by the source waveguide hologram. Within the substrate, the incident angle at the substrate/air interface is engineered to be larger than the critical angle so as to satisfy the total internal reflection (TIR) condition. As a result, this light cannot escape from the confinement of the substrate. Then, the destination waveguide hologram couples this confined light out of the substrate and projects it onto the destination photodiode. In this manner, an optical interconnection link is established and adequately protected by the substrate from the noise present in the surrounding environment. The bandwidth capacity per substrate-guided optical interconnection line was experimentally characterized to be approximately 2.5 THz [22]. Thus, by utilizing substrate-guided optical interconnects the overall system performance may be considerably improved. Most significantly, with the appropriate design of the types of the waveguide holograms and their relative positions on the top surface of the substrate, a variety of topologies, including the shared bus, can be effectively implemented.

## 1.6 Advantages of Optical Backplane Bus vs. Optical Point-to-Point Switched Backplane

In the optical domain, many research efforts have been devoted to preventing the projected bottleneck from throttling the data transfers at the board-to-board level.
The most significant benefit of utilizing optical interconnects is the tremendous gain in the bandwidth capacity. From the architecture point of view, however, the three fundamental methodologies just compared have a huge discrepancy in how effectively the obtained bandwidth would improve the overall system performance. The approaches that are based on optical waveguide or free-space optical interconnects provide only the point-to-point topology. In turn, the various proposed solutions are essentially an optical point-to-point switched backplane. As previously discussed, this topological deficit critically restricts the gain in the bandwidth capacity. The point-to-point switched backplane is an indirect network, so it cannot carry out multicast/broadcast as effectively as the backplane bus. Besides the hop delay, each data transfer phase inevitably incurs a routing overhead, which makes it rather difficult to minimize the overall interconnect latency. The most crucial device in a switched medium is certainly the switch that coordinates the point-to-point connectivity among all daughter boards, and adjusts the available aggregate bandwidth according to the real-time traffic behavior. The most important functions of a switch are buffering, routing, and switching. So far, unfortunately, optical solutions have not been able to implement these critical functions as effectively as their electronic counterparts. This fact implies that extra optical-to-electrical and electrical-to-optical conversions must be performed at the interface of the switch if only the interconnection links are optically implemented. This optical-domain overhead introduces additional latency and complexity, and thus further diminishes the gain in the bandwidth capacity. Meanwhile, the additional involvement of many expensive devices, including switches, electrical-optical interface modules, and so on, considerably increases the overall system cost. 
In contrast, the approaches that are based on substrate-guided optical interconnects can effectively provide a variety of topologies, including the shared bus, and thus an optical backplane bus can be implemented. The backplane bus is a direct network, and thus possesses very high connectivity. Each data transfer phase incurs only a single-hop delay, which is highly desirable in minimizing the interconnect latency. Also, the shared bus is inexpensive to implement since much of the hardware is shared. In the electrical domain, the major bottleneck of the shared bus topology arises from its low bandwidth capacity. Therefore, one objective of this dissertation is to initiate an innovative technology in the optical domain that can provide sufficient bandwidth capacity while at the same time retaining the essential merits of the shared bus topology.

#### 1.7 Research Contributions

Several optical backplane bus architectures have been proposed in the past [36], [37]. As might be expected, in these previous attempts, substrate-guided optical interconnects were substantially involved in the design for the purpose of engineering the shared bus topology. Unfortunately, none of them could fulfill equalized bus fan-outs across the entire optical backplane layer. This significant drawback can severely affect the overall system integration because of the constraint it places on the dynamic range of the receivers. As a result, the feasibility of these previously proposed optical backplane architectures is in doubt. So far, none of them has been applied in any practical scenario.

The objectives of this dissertation are:

- To initiate an innovative technology in the optical domain that can provide sufficient bandwidth capacity while retaining the essential merits of the shared bus topology.
- To fulfill equalized bus fan-outs across the entire optical interconnect layer.
- To demonstrate the feasibility of the innovatively designed optical backplane bus in practical scenarios, such as uniprocessing and multiprocessing systems.

The major contributions of this dissertation are summarized below:

- Through comparative examinations from the architecture point of view, the substrate-guided optical interconnect methodology is singled out because of its potential for creating an optical backplane bus.
- A new optical interconnect architecture, optical centralized shared bus, is introduced. This innovative optical backplane bus utilizes the beneficial physical characteristics of optics while retaining the desirable architectural properties of the shared bus topology.
- Optical centralized shared bus architecture can fulfill equalized bus fan-outs across the entire backplane layer in an elegant manner. This significant merit can substantially ease the overall system integration in practical implementations.
- A systematic recording scheme is developed to assure the quality of the fabricated waveguide holograms and the accuracy of their diffraction efficiency.
- The optical interconnect layer specified in optical centralized shared bus architecture is completely implemented, and the equalized bus fan-outs are successfully established across the entire fabricated optical interconnect layer.
- The feasibility of optical centralized shared bus architecture in uniprocessing systems is experimentally verified by applying it to fulfill the critical microprocessor-to-memory interconnects in a research prototype.
- As a preliminary effort, optical centralized shared bus architecture is applied in a multiprocessing research prototype to partially emulate the centralized shared-memory multiprocessing scheme.

#### 1.8 Dissertation Outline

In Chapter 2, optical centralized shared bus architecture is introduced, and its major characteristics, especially the merit of achieving equalized bus fan-outs, are delineated.
In Chapter 3, the detailed procedure of implementing the optical interconnect layer specified in optical centralized shared bus architecture is presented, the equalized bus fan-outs are demonstrated across the entire implemented optical backplane bus, and the bandwidth capacity of the fabricated optical interconnect layer is characterized. In Chapter 4, the implementation and the high-speed performance characterization of the electro-optical interface modules are described. In Chapter 5, the strategy of verifying the conceptual feasibility of optical centralized shared bus architecture is addressed. In Chapter 6, optical centralized shared bus architecture is instantiated in a uniprocessing prototype to provide the critical microprocessor-to-memory interconnects. In Chapter 7, the preliminary attempt at exploiting optical centralized shared bus architecture in a centralized shared-memory multiprocessing prototype is shown. Finally, in Chapter 8, a summary of this dissertation is given, and future directions of the research that aims at eliminating the potential bottleneck at the board-to-board hierarchical level are suggested.

## **Chapter 2 Optical Centralized Shared Bus Architecture**

#### 2.1 Overview of Optical Shared Bus Architectures

In any optical shared bus architecture, the interface between the optical and electrical backplane layers imposes the most significant constraints on the system integration. The uniformity of the bus fan-outs on the shared media is of the most critical concern. The larger the variation, the more difficult the integration of the electro-optical transceivers becomes. Unfortunately, none of the optical shared bus architectures previously proposed by others could successfully manage this crucial issue. In [36], the shared bus topology was configured in a straightforward way by using double-grating waveguide holograms.
As shown in Figure 2.1, the optical interconnect layer consists of an optical waveguiding substrate that provides a turbulence-free medium for optical interconnects, and the double-grating waveguide holograms integrated on its top surface function as optical fan-in/fan-out devices. As exhibited in Figure 2.2, however, this architecture intrinsically cannot fulfill equalized bus fan-outs on the shared media, and the variation among the fan-outs rises dramatically as the fan-out count increases. Moreover, the spatial overlap of the fan-in and fan-out optical beams further complicates the practical implementation of the electro-optical interface modules.

Figure 2.2 Non-uniform bus fan-outs in [36]

In [37], a hybrid approach involving both substrate-guided optical interconnects and free-space optical interconnects was employed to construct the shared bus topology, as illustrated in Figure 2.3. This design compromises the merits of substrate-guided optical interconnects by incorporating unreliable free-space optical signal links. Although the variation among the bus fan-outs can be mitigated to some degree, equalization is still intrinsically impossible. Also, this approach lacks scalability because of the complicated routing scheme.

Figure 2.3 Optical shared bus in [37]

#### 2.2 Architectural Description of Optical Centralized Shared Bus

Figure 2.4 Optical centralized shared bus architecture

Figure 2.4 illustrates the architectural concept of the innovation, optical centralized shared bus [38]. For simplicity, only five slots (#A1, #A2, #B1, #B2, and #C) are drawn in this schematic. Nonetheless, this innovation does not directly impose any restrictions on the total number of daughter boards. The board that is to be inserted into the central slot (#C) plays a pivotal role in this architecture, and is referred to as the distributor in this dissertation. The electrical backplane provides interconnects for the non-critical signals.
The electro-optical interface modules, including photodiodes and VCSELs, are integrated on the backside of the electrical backplane, and aligned with the underlying optical interconnect layer. Therefore, the insertion/removal of daughter boards during normal operation does not disturb the critical alignment. In contrast to the other modules, the positions of the central VCSEL and photodiode are swapped. The optical interconnect layer consists of an optical waveguiding substrate integrated with properly designed volume holographic gratings on its top surface. The substrate provides a turbulence-free medium for optical interconnects, and the waveguide holograms function as optical fan-in/fan-out devices. Underlying the distributor is a double-grating waveguide hologram, and the others are single-grating waveguide holograms. Figure 2.5 (a) and (b) illustrate the overall optical connectivity. By employing such an innovative interconnect configuration, this unique optical backplane architecture effectively fulfills both broadcastability and bi-directionality of signal flows on the shared bus [39].

Figure 2.5 (a) Data delivery to distributor

Figure 2.5 (b) Data broadcast from distributor

As indicated in Figures 2.4 and 2.5, there are two optical signal channels in this innovative architecture. One is for the source daughter board to deliver data to the distributor, as illustrated in Figure 2.5 (a), and the other is for the distributor to broadcast data to every regular daughter board on the shared bus, as illustrated in Figure 2.5 (b). For a complete data transfer, as the name of this architecture implies, the source daughter board first delivers the data to the distributor. The VCSEL of the source daughter board emits the light that carries the information and projects it surface-normally onto its underlying waveguide hologram.
This light is coupled into the optical waveguiding substrate by the grating and propagates within the confinement of the substrate under the total internal reflection (TIR) condition. Then, it is surface-normally coupled out of the substrate by the central double-grating waveguide hologram and detected by the central photodiode. Subsequently, the central VCSEL generates the optical signal that carries the original information and projects it surface-normally onto its underlying double-grating waveguide hologram. This light is coupled into the substrate and diffracted into two beams propagating along the two opposite directions within the confinement of the substrate under the TIR condition. During the propagation, a portion of the light is surface-normally coupled out of the substrate by a regular daughter board's underlying waveguide hologram and detected by its photodiode. This daughter board accepts the retrieved data if the destination header falls within its address range. If the distributor is actually the originating source of the data, the first data delivery process does not happen. If the distributor recognizes that it is the only data recipient, the second data broadcast process is not necessary. #### 2.3 Characteristics of Optical Centralized Shared Bus Architecture The most attractive feature of this innovative architecture is its ability to fulfill equalized bus fan-outs on the shared media. This merit can considerably save the overall power budget and substantially ease the practical system integration, because the critical interface between the optical and electrical backplane layer is uniform across the entire architecture. 
Assuming that the VCSELs in all electro-optical interface modules emit the same optical power $P_{in}$, the criteria of equalized bus fan-outs are (1) the power of the optical signal broadcast from the distributor to every regular daughter board is the same; (2) the power of the optical signal delivered from every regular daughter board to the distributor is the same; and (3) the power of the optical signal broadcast from the distributor to every regular daughter board is the same as that delivered from every regular daughter board to the distributor. By balancing the diffraction efficiency of the waveguide holograms in use, equalized bus fan-outs can be achieved. As illustrated in Figure 2.5 (a) and (b), in such a symmetric configuration, the central double-grating hologram functions as an equal-strength beam splitter with diffraction efficiency $\eta_{equal}$ [40], [41], and the diffraction efficiency of the single-grating holograms satisfies

$$\eta_{Ai} = \eta_{Bi} \qquad (2.1)$$

where the subscript i represents the slot number counted in reference to the central slot (#C). Considering the two consecutive fan-outs on the broadcast bus illustrated in Figure 2.6, a fraction $\eta_i$ of the light that carries the information is coupled out of the substrate by hologram i, and the remaining fraction $(1-\eta_i)$ continues to propagate within the confinement of the substrate under the TIR condition. Then, by hologram i+1, a fraction $\eta_{i+1}$ of the remaining light is coupled out of the substrate. These two bus fan-outs are equalized when

$$\eta_i = (1 - \eta_i) \eta_{i+1} \qquad (2.2)$$

Equation (2.2) can be rewritten in a more revealing form of

$$\eta_{i+1} = \frac{\eta_i}{1 - \eta_i} \qquad (2.3)$$

By simply reversing the propagation directions of the optical beams drawn in Figure 2.6, it can be verified that the second criterion of equalized bus fan-outs is also guaranteed by equation (2.3).
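The recursion in equation (2.3) can be checked numerically with a short sketch. The function names, the starting efficiency of 20%, and the count of five holograms per arm are illustrative assumptions, not values taken from this dissertation; the sketch only propagates a unit of optical power down one arm of the bus and records what each hologram couples out.

```python
def chain_efficiencies(eta_first, count):
    """Build eta_1..eta_count with eta_{i+1} = eta_i / (1 - eta_i), equation (2.3)."""
    etas = [eta_first]
    for _ in range(count - 1):
        etas.append(etas[-1] / (1.0 - etas[-1]))
    return etas


def fanout_powers(p_start, etas):
    """Return the power coupled out at each hologram and the power left at the end."""
    powers = []
    remaining = p_start
    for eta in etas:
        powers.append(remaining * eta)   # portion coupled out by this hologram
        remaining *= (1.0 - eta)         # portion that keeps propagating under TIR
    return powers, remaining


# Hypothetical example: eta_1 = 20 % and five holograms on one arm.
etas = chain_efficiencies(0.2, 5)
powers, leftover = fanout_powers(1.0, etas)
for index, (eta, power) in enumerate(zip(etas, powers), start=1):
    print(f"hologram {index}: eta = {eta:.4f}, fan-out power = {power:.4f}")
print(f"power left after the last hologram: {leftover:.2e}")
```

With these assumed numbers the last hologram ends up with an efficiency of essentially 100%, so every fan-out receives the same power and almost nothing is left to reflect at the end of the substrate.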
It follows that the power of the optical signal broadcast from the distributor to every regular daughter board can be expressed as

$$P_{broadcast} = P_{in} \cdot \eta_{equal} \cdot \eta_1 \qquad (2.4)$$

and the power of the optical signal delivered from every regular daughter board to the distributor is

$$P_{delivery} = P_{in} \cdot \eta_1 \cdot \eta_{equal} \qquad (2.5)$$

By comparing equation (2.4) with equation (2.5), the third criterion is also verified. Therefore, the iterative relationship specified by equation (2.3) along with equation (2.1) indicates the required diffraction efficiency balance to equalize all bus fan-outs.

Figure 2.6 Schematic for the derivation of the condition for equalized bus fan-outs

The single-grating holograms at the two ends of the substrate should be able to completely couple the remaining light out of the substrate so as not to introduce reflected waves on the shared bus. Otherwise, the reflected waves may interfere with the successive optical signals and degrade the signal integrity. Thus, their diffraction efficiency $\eta_{\text{max}}$ should be as close to 100% as possible. Meanwhile, a high $\eta_{\text{max}}$ also saves the overall power budget. By iterating equation (2.3), the diffraction efficiency of all waveguide holograms can be expressed in an explicit form. It can be derived that

$$\eta_1 = \frac{\eta_{\text{max}}}{1 + (N/2 - 1)\eta_{\text{max}}} \qquad (2.6)$$

where N is the total number of the slots for the regular daughter boards on the electrical backplane (it is always an even integer because of the symmetric configuration).
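One way to carry out the iteration that leads to equation (2.6) is to take reciprocals of equation (2.3). The steps below are a sketch of that route. They assume that the holograms on one side of the distributor are indexed i = 1, ..., N/2 and that the outermost one, i = N/2, has the efficiency $\eta_{\text{max}}$. Taking the reciprocal of equation (2.3) turns the recursion into an arithmetic progression:

$$\frac{1}{\eta_{i+1}} = \frac{1 - \eta_i}{\eta_i} = \frac{1}{\eta_i} - 1 \quad \Longrightarrow \quad \frac{1}{\eta_i} = \frac{1}{\eta_1} - (i - 1)$$

Setting i = N/2 and $\eta_{N/2} = \eta_{\text{max}}$ gives

$$\frac{1}{\eta_1} = \frac{1}{\eta_{\text{max}}} + \left(\frac{N}{2} - 1\right)$$

which is equation (2.6) after inversion. Under the same assumptions, the efficiency of an arbitrary hologram on either side follows as

$$\eta_i = \frac{\eta_{\text{max}}}{1 + (N/2 - i)\,\eta_{\text{max}}}, \qquad i = 1, \ldots, N/2$$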
By substituting equation (2.6) into either equation (2.4) or (2.5), the bus fan-out coefficient $\eta_{fan-out}$, which is a critical factor in the power budget evaluation, can be expressed as

$$\eta_{fan-out} = P_{broadcast} / P_{in} = P_{delivery} / P_{in} = \eta_{equal} \cdot \eta_1 = \frac{\eta_{equal} \cdot \eta_{max}}{1 + (N/2 - 1)\eta_{max}} \qquad (2.7)$$

Figure 2.7 shows the bus fan-out coefficient as a function of the total number of the slots for the regular daughter boards with the assumptions of $\eta_{equal} = 50\%$ and $\eta_{max} = 100\%$.

Figure 2.7 Bus fan-out coefficient vs. total number of the slots for the regular daughter boards with the assumptions of $\eta_{equal} = 50\%$ and $\eta_{max} = 100\%$

It is possible to replace the active distributor with a passive optical component, e.g., a right-angle prism. Although this passive device can fulfill the same function of routing optical signals in optical centralized shared bus architecture, the versatility of an active distributor can bring more desirable features. Its functionality is similar to that of a repeater employed in an optical fiber telecommunication system for optical signal reamplifying, reshaping, and retiming (3R) [42]. A centralized arbiter that coordinates the media access control (MAC) can be readily embedded into an active distributor. Furthermore, an active distributor can function as an edge router in a hierarchical interconnection network [43] where the communications within a local cluster are individually carried out on an optical centralized shared bus. By appropriately provisioning daughter boards into local clusters in accordance with the locality characteristic of the traffic behavior, such a hierarchical approach can generate extremely high parallelism, because the non-conflicting intracluster communications can proceed simultaneously.
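Returning to equation (2.7), the trend plotted in Figure 2.7 can be tabulated with a few lines of code. This is a minimal sketch; the function names and the choice of slot counts are illustrative assumptions, and the decibel conversion is added only as a convenience for power budget estimates.

```python
import math


def fanout_coefficient(n_slots, eta_equal=0.5, eta_max=1.0):
    """Bus fan-out coefficient of equation (2.7) for an even number of regular slots."""
    if n_slots < 2 or n_slots % 2 != 0:
        raise ValueError("n_slots must be an even integer >= 2 (symmetric configuration)")
    return eta_equal * eta_max / (1.0 + (n_slots / 2 - 1) * eta_max)


def fanout_loss_db(n_slots, eta_equal=0.5, eta_max=1.0):
    """The same coefficient expressed as an optical loss in decibels."""
    return -10.0 * math.log10(fanout_coefficient(n_slots, eta_equal, eta_max))


# Assumed slot counts, evaluated with eta_equal = 50 % and eta_max = 100 % as in Figure 2.7.
for n in (2, 4, 8, 16, 32):
    print(f"N = {n:2d}: fan-out coefficient = {fanout_coefficient(n):.4f}, "
          f"loss = {fanout_loss_db(n):.2f} dB")
```

Under the two assumptions of Figure 2.7 the expression reduces to 1/N, i.e., the emitted power is divided evenly among the N regular daughter boards with no excess loss in this idealized model.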
#### 2.4 Summary

In this chapter, after reviewing the optical shared bus architectures previously proposed by others and identifying the major deficit caused by the non-uniform bus fan-outs, a novel backplane architecture, optical centralized shared bus, is introduced. By employing an innovative configuration, this unique optical backplane architecture effectively fulfills both broadcastability and bi-directionality of signal flows on the shared bus. The most attractive feature of this innovation is its ability to achieve equalized bus fan-outs on the shared media. This merit can considerably save the overall power budget, and thus is highly desirable from the system integration point of view. With the appropriate MAC protocols, the innovation described herein is transparent to the higher architectural layers. It is intended to provide an open solution to high-performance interconnects in existing and future high-end systems.

## **Chapter 3 Optical Interconnect Layer Implementation**

### 3.1 Configuration of Optical Interconnect Layer

As shown in Figures 2.4 and 2.5, the optical interconnect layer in optical centralized shared bus architecture consists of an optical waveguiding substrate with properly designed volume holographic gratings integrated on its top surface [38]. The substrate provides a turbulence-free medium for optical interconnects, and the holograms function as optical fan-in/fan-out devices. Underlying the central distributor is a double-grating waveguide hologram, and the others are single-grating waveguide holograms. The single-grating hologram provides 45° diffraction within the substrate, as illustrated in Figure 3.1 (a), and the double-grating hologram delivers 45° diffraction within the substrate in two directions, as illustrated in Figure 3.1 (b). The substrate in use has a refractive index of approximately 1.5, and thus the critical angle at the substrate/air interface is nearly 42°, i.e., less than 45°.
Therefore, the total internal reflection (TIR) condition is guaranteed. The most attractive feature of this innovatively designed optical interconnect layer is its ability to achieve equalized bus fan-outs on the shared media by balancing the diffraction efficiency of the waveguide holograms in use. This merit can considerably save the overall power budget, and thus is highly desirable from the system integration point of view. Figure 3.1 (a) Wave vector diagram of single-grating waveguide hologram Figure 3.1 (b) Wave vector diagram of double-grating waveguide hologram #### 3.2 Overview of Volume Holographic Gratings The analysis of the diffraction characteristics of the general dielectric planar gratings has a long and interesting history. Reference [44] provides a detailed review along with a fairly thorough bibliography. The common methods of analyzing diffraction by grating are the modal approach, which is sometimes referred to as Floquet, Floquet-Bloch, or coupled-mode approach, and the coupled-wave approach, which is occasionally called coupled-mode approach too with confusion. Both of these approaches can produce the exact formulations without approximations. In fact, these formulations are completely equivalent in their full rigorous forms. They are merely alternative mathematical representations of the total electromagnetic field inside the grating region, and associated with each representation is a different physical perspective. The most obvious effect of diffraction by grating is the occurrence of multiple propagating backward- and forward-diffracted orders that typically exist outside the grating region, as illustrated by the wave vector diagram in Figure 3.2. 
In the coupled-wave approach, the total electromagnetic field inside the grating region is expanded in terms of its space-harmonic components, and their wave vectors have the following relationship of $$\vec{\beta}_i = \vec{\beta}_0 - i\vec{K} \qquad (3.1)$$ where $\vec{\beta}_i$ represents the wave vector of the i-th space-harmonic field, and $\vec{K}$ is the grating vector. This equation is often referred to as the Floquet condition. Figure 3.2 intuitively visualizes that the incident homogeneous plane wave may be divided into many diffracted inhomogeneous plane waves that have directions given by equation (3.1), and the i=0 inhomogeneous plane wave corresponds to the refracted incident wave. These inhomogeneous plane waves are not independent. In the grating-modulated medium, energy is coupled back and forth between the adjacent orders. As exhibited by the horizontal dashed lines in Figure 3.2, the i-th space-harmonic component inside the grating region produces a phase matched i-th field, respectively, in region 1 and 3. Outside the grating region, the field whose wave vector can be literally drawn on the semicircle in region 1 or 3 is a propagating wave, otherwise an evanescent wave. Figure 3.2 Wave vector diagram illustrating phase matching of the space-harmonic components of the total electromagnetic field inside the grating (region 2) with the propagating backward-diffracted orders (region 1), the propagating forward-diffracted orders (region 3), and the evanescent waves outside the grating [44] Because solving the rigorous coupled-wave equations is enormously time-consuming, the vast majority of the papers on grating diffraction analysis deal with approximate theories. There are a large number of possible approximations and assumptions that can be made. In some scenarios, the simplifications even allow the solutions in analytical forms to be obtained. 
From the rigorous theory along with a series of fundamental assumptions and simplifications, the exact formulations can be reduced to a variety of approximate theories. Among them, the two-wave first-order coupled-wave theory, which is commonly called Kogelnik theory [45], is of the most interest in holography, and now widely referenced in the analysis of the diffraction characteristics of volume holographic gratings. It assumes that:

- (1) The spatial modulation of the refractive index and the absorption constant is of a sinusoidal form.
- (2) Light incidence is at or near the Bragg angle and only the diffraction orders that obey the Bragg condition at least approximately are retained in the analysis. Other diffraction orders are neglected. This is the two-wave approximation that only takes the refracted incident wave and the i = 1 inhomogeneous plane wave into account.
- (3) There is only a slow energy interchange per wavelength between the retained two coupled waves. This is the first-order approximation that eliminates all second derivatives of the field amplitudes from the coupled-wave equations, and thus some boundary effects are neglected. Consequently, the terminologies of transmission hologram and reflection hologram can be unambiguously used.

These assumptions and approximations limit the validity of this theory to the Bragg regime diffraction by volume holographic gratings. The criteria for Bragg regime and volume hologram are articulated in [46].
By applying the Kogelnik theory, the diffraction efficiency of the single-grating hologram in Figure 3.1 (a) at its Bragg angle ($0^{\circ}$) can be analytically expressed as

$$\eta = \sin^2 \left( \frac{\pi \Delta n d}{\lambda \sqrt{\cos \theta}} \right) \qquad (3.2)$$

where d is the thickness of the grating-modulated medium, $\lambda$ is the wavelength (850nm) of the incident light emitted from a VCSEL, $\theta$ is the diffraction angle (45°) within the substrate, and $\Delta n$ is the refractive index modulation. The double-grating hologram in Figure 3.1 (b) functions as an equal-strength beam splitter. It contains two incoherently superimposed phase gratings with a common Bragg angle (0°) and the same refractive index modulation. With the extended analyses based on the Kogelnik theory [40], [41], its diffraction efficiency can be derived as

$$\eta_{equal} = \frac{1}{2} \sin^2 \left( \sqrt{2} \frac{\pi \Delta n d}{\lambda \sqrt{\cos \theta}} \right) \qquad (3.3)$$

when the Bragg condition is satisfied inside the grating region. The appropriately defined two-beam interference patterns can be recorded within dry photopolymer films to form the desired grating structures, as shown in Figure 3.3. Dry photopolymer films are well suited for fabricating high-efficiency holographic gratings. The advantages of photopolymers over other types of emulsion, such as dichromated gelatin and silver halides, include dry-processing capability, long shelf life, and good photo-speed [47]. In the setup shown in Figure 3.3, the 532nm line from the Verdi laser provides the exposing illumination, and the shutter controls the exposure time. Two objective lenses, each paired with a pinhole at its focal point, are used for spatial filtering of the Verdi light. Two focusing lenses are employed to collimate the laser beams into plane waves, and expand them large enough to guarantee the uniformity of the formed grating structures.
The combination of a half-wave plate with a polarizing beam splitter adjusts the intensity ratio of the two collimated laser beams and, in turn, the fringe visibility of their interference pattern. Two rotation stages host, respectively, a right-angle prism and a mirror. Rotating them sets the appropriate recording angles [48]. A device is prepared by laminating dry photopolymer films onto an optical waveguiding substrate. To reduce the Fresnel reflection, the surface of the substrate is completely concealed by black tape except for the areas that are covered by the films. This substrate is put against the right-angle prism, and the appropriate index-matching oil is applied at their interface, again to reduce the Fresnel reflection.

Figure 3.3 Setup for hologram recording

A dry photopolymer film consists of monomers, polymeric binders, and photo-initiators. The monomers have strong absorption of the light around 530nm, and then become polymers. Thus, the refractive index of the film is changed accordingly. When the film is exposed to a two-beam interference pattern, as shown in Figure 3.3, more monomers are polymerized in the bright regions than in the dark regions. The monomer concentration gradients that are introduced by this non-uniform exposure drive the monomers to diffuse from the dark regions to their adjacent bright regions. This procedure leads to a spatially periodic distribution of polymers, and the resulting refractive index modulation within the film conforms to the original illumination pattern. Therefore, a grating structure is formed within the film. A final uniform illumination polymerizes the remaining monomers, and thus stabilizes this grating structure. To obtain double-grating holograms, two sequential exposure steps are to be performed to record two incoherently superimposed phase gratings within the same film.
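Returning to equations (3.2) and (3.3), the refractive index modulation that a recording must reach for a target diffraction efficiency can be estimated numerically. The following is a minimal sketch rather than the design procedure used in this work. The function names are hypothetical, the 20µm thickness is the film thickness quoted in Section 3.4, and the inversion returns only the smallest index modulation on the first rising branch of the sine.

```python
import math

WAVELENGTH_UM = 0.85   # VCSEL wavelength (850nm)
THETA_DEG = 45.0       # diffraction angle within the substrate


def grating_phase(delta_n, thickness_um):
    """Argument of the sine in equation (3.2)."""
    obliquity = math.sqrt(math.cos(math.radians(THETA_DEG)))
    return math.pi * delta_n * thickness_um / (WAVELENGTH_UM * obliquity)


def eta_single(delta_n, thickness_um):
    """Single-grating diffraction efficiency, equation (3.2)."""
    return math.sin(grating_phase(delta_n, thickness_um)) ** 2


def eta_double_per_beam(delta_n, thickness_um):
    """Per-beam efficiency of the double-grating hologram, equation (3.3)."""
    return 0.5 * math.sin(math.sqrt(2.0) * grating_phase(delta_n, thickness_um)) ** 2


def delta_n_for_single(eta_target, thickness_um):
    """Smallest index modulation that yields eta_target in equation (3.2)."""
    if not 0.0 <= eta_target <= 1.0:
        raise ValueError("eta_target must lie between 0 and 1")
    phase = math.asin(math.sqrt(eta_target))
    obliquity = math.sqrt(math.cos(math.radians(THETA_DEG)))
    return phase * WAVELENGTH_UM * obliquity / (math.pi * thickness_um)


# Illustrative targets for a 20 um film; the efficiencies are the ones obtained from
# equation (2.3) when the outermost hologram is ideal (100 %, 50 %, 33.3 %, 25 %).
for target in (1.0, 0.5, 1.0 / 3.0, 0.25):
    dn = delta_n_for_single(target, 20.0)
    print(f"target eta = {target:.3f} -> delta_n = {dn:.5f}")
```

Because equation (3.3) carries an extra factor of $\sqrt{2}$ inside the sine, the double-grating hologram reaches its 50% per-beam maximum at an index modulation smaller by that factor than the one that drives a single grating to 100%.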
#### 3.3 Theoretical Analysis of Grating Formation Procedure within Dry Photopolymer Films

The dynamic change of the monomer concentration u(x,t) during the exposure can be described by the following one-dimensional (1-D) polymerization/diffusion equation of

$$\frac{\partial u(x,t)}{\partial t} = \frac{\partial}{\partial x} \left[ D(x,t) \frac{\partial u(x,t)}{\partial x} \right] - F_o[1 + V\cos(Kx)]u(x,t) \qquad (3.4)$$

where D(x,t) is the diffusion parameter, $F_o$ is the polymerization factor ($F_o = \kappa I_o$, where $\kappa$ is a constant and $I_o$ is the average irradiance), V is the fringe visibility, and K is the magnitude of the grating vector. This equation indicates the dependence of the monomer polymerization and diffusion rates on the monomer concentration gradients formed by the periodic two-beam interference pattern during the exposure, and its solution can be written as a Fourier series as

$$u(x,t) = \sum_{i=0}^{\infty} u_i(t) \cos(iKx) \qquad (3.5)$$

where $u_i(t)$ represents the amplitude of the i-th order harmonic of the monomer concentration. The initial conditions are

$$u_0(t=0) = U_0 \qquad (3.6)$$

$$u_{i\neq 0}(t=0) = 0 \qquad (3.7)$$

where $U_0$ is the initial monomer concentration within the film. Since the mobility of the monomers is affected by the monomer concentration, the diffusion parameter can also be written as

$$D(x,t) = \sum_{i=0}^{\infty} D_i(t) \cos(iKx) \qquad (3.8)$$

where $D_i(t)$ represents the amplitude of the i-th order harmonic of the diffusion parameter. Usually, retaining a few low grating orders is sufficient to obtain a satisfactory estimation of the solutions to equation (3.4) [49]. The distribution of the polymer concentration affects the refractive index of the dry photopolymer film and, in turn, the diffraction properties of the formed grating structure.
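To make the behavior of equation (3.4) concrete, the sketch below integrates it over one grating period with an explicit finite-difference scheme instead of the harmonic expansion of equation (3.5). It rests on several assumptions that are not taken from this dissertation: the diffusion parameter is held constant, all quantities are dimensionless, and the parameter values and function names are purely illustrative.

```python
import math


def simulate_monomer(n_points=64, period=1.0, diffusion=0.01, f0=1.0,
                     visibility=1.0, u0=1.0, t_end=1.0, dt=0.005):
    """Integrate equation (3.4) on one period with periodic boundaries.

    Assumption: D(x,t) is replaced by the constant `diffusion`, so the diffusion
    term reduces to D * d2u/dx2.
    """
    dx = period / n_points
    if diffusion * dt / dx ** 2 > 0.5:
        raise ValueError("time step too large for the explicit scheme")
    big_k = 2.0 * math.pi / period
    # Local polymerization rate F_o [1 + V cos(Kx)]; x = 0 is a bright fringe.
    rate = [f0 * (1.0 + visibility * math.cos(big_k * i * dx)) for i in range(n_points)]
    u = [u0] * n_points                      # uniform start, equations (3.6) and (3.7)
    for _ in range(int(round(t_end / dt))):
        new_u = []
        for i in range(n_points):
            left = u[i - 1]                  # index -1 wraps around (periodic)
            right = u[(i + 1) % n_points]
            laplacian = (left - 2.0 * u[i] + right) / dx ** 2
            new_u.append(u[i] + dt * (diffusion * laplacian - rate[i] * u[i]))
        u = new_u
    return u


def harmonic_amplitude(u, order):
    """Cosine-series amplitude u_i of equation (3.5), estimated from the samples."""
    n = len(u)
    scale = 1.0 / n if order == 0 else 2.0 / n
    return scale * sum(u[i] * math.cos(2.0 * math.pi * order * i / n) for i in range(n))


profile = simulate_monomer()
print("monomer at bright fringe:", round(profile[0], 4))
print("monomer at dark fringe:  ", round(profile[len(profile) // 2], 4))
print("u_0, u_1, u_2:", [round(harmonic_amplitude(profile, k), 4) for k in range(3)])
```

In this toy setting the monomer is depleted fastest at the bright fringe, so the first harmonic of the monomer concentration is negative while diffusion from the dark regions partly refills the bright ones. A concentration-dependent D(x,t), as in equation (3.8), would require updating the diffusion coefficient inside the time loop.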
From the solutions to equation (3.4), the dynamic change of the polymer concentration N(x,t) can be obtained as

$$N(x,t) = \int_{0}^{t} F_{o}[1 + V\cos(Kx)]u(x,t')dt' = \sum_{i=0}^{\infty} N_{i}(t)\cos(iKx)$$ (3.9)

where $N_i(t)$ represents the amplitude of the i-th order harmonic of the polymer concentration. If an approximately linear relationship between the polymer concentration and the refractive index of the film can be assumed, equation (3.9) gives the phase grating profile in terms of its space-harmonic components.

To obtain double-grating holograms with non-parallel grating orientations, two sequential exposure steps are performed. During the first exposure, the dynamic change of the monomer concentration can be analyzed by solving equation (3.4). During the second exposure, the further monomer polymerization/diffusion can be described by the following two-dimensional (2-D) equation:

$$\frac{\partial u(\hat{r},t)}{\partial t} = \nabla_t [D(\hat{r},t)\nabla_t u(\hat{r},t)] - F_o[1 + V\cos(\vec{K}_2 \cdot \hat{r})]u(\hat{r},t)$$ (3.10)

where $\vec{K}_1$ and $\vec{K}_2$ represent, respectively, the grating vectors formed by the first and second exposures. Consequently, its solutions can be written as the sum of two Fourier series, one for each grating:

$$u(\hat{r},t) = \sum_{i=0}^{\infty} u_i(t) \cos(i\vec{K}_1 \cdot \hat{r}) + \sum_{k=0}^{\infty} u_k(t) \cos(k\vec{K}_2 \cdot \hat{r})$$ (3.11)

If the two multiplexed gratings are orthogonal to each other $(\vec{K}_1 \perp \vec{K}_2)$ , the 2-D equation can be decomposed into two uncoupled 1-D polymerization/diffusion equations. Thus, the solutions to equation (3.10) can be obtained simply by sequentially solving two single-grating formations described by equation (3.4) with the appropriate connection conditions. In contrast, if the two multiplexed gratings are non-orthogonal, the two 1-D polymerization/diffusion equations obtained by decomposing equation (3.10) are coupled with each other.
This non-orthogonality implies that the formation of the second grating is affected by the first grating in a fairly complicated way.

# 3.4 Experimental Characterization of Grating Formation Procedure within Dry Photopolymer Films

Equations (3.4) and (3.10) involve several material-specific parameters that are not available at this stage. Thus, experimental characterizations of the grating formation procedures within a specific type of dry photopolymer film are necessary to assure the quality of the fabricated waveguide holograms. Figure 3.4 (a) shows the real-time monitoring setup that was employed to evaluate the dynamic characteristics of the grating formation procedures in the 20µm-thick DuPont dry photopolymer films (HRF-600X014-20) [50]. In this setup, the wavelength of the probe beam is 850nm, the same as that of the light emitted from a VCSEL. Since the monomers have little response to light at this wavelength, the probe beam does not affect the grating formation procedures. A right-angle prism, which has the same refractive index as the waveguiding substrate, separates the undiffracted incident probe beam from the beam diffracted by the formed grating structure, as illustrated in Figure 3.4 (b), and the intensity of each beam is monitored by an optical power meter. From the monitored dynamic diffraction efficiency, information on the formed grating structure can be inferred in real time [51].

Figure 3.4 (a) Setup for hologram recording with real-time monitoring

Figure 3.4 (b) Separation of diffracted and undiffracted beam by right-angle prism

As concluded in [46], the criteria for Bragg regime and volume hologram require

$$Q \gg 1 \quad \text{and} \quad F \gg 1$$ (3.12)

The Q factor is

$$Q = \frac{2\pi\lambda d}{n\Lambda^2}$$ (3.13)

where n is the average refractive index of the film and $\Lambda$ is the spatial period of the formed grating structure.
The F factor is $$F = 16 \frac{n^2}{\Delta n^2} \sin^2(\theta/2)$$ (3.14) By taking the relevant values of the waveguide holograms in Figure 3.1 into equation (3.13) and (3.14), the criteria for the Kogelnik theory are justified, and thus equation (3.2) and (3.3) are validated for the diffraction efficiency calculation. Rearranging the terms in equation (3.2), it can be derived that $$\Delta n = \lambda \sqrt{\cos \theta} \sin^{-1}(\sqrt{\eta}) / \pi d \qquad (3.15)$$ This equation reveals that the information on the refractive index modulation $\Delta n$ can be obtained from the measured diffraction efficiency $\eta$ . Since the diffraction properties in Bragg regime of a volume hologram are essentially determined by its first-order grating, it may be assumed that the $\Delta n$ calculated from equation (3.15) gives a fairly close estimation of the formed first-order grating within the film. Figure 3.5 exhibits such an example with the data from a typical experiment in which a single-grating hologram as illustrated in Figure 3.1 (a) was fabricated by using the DuPont dry photopolymer film. Figure 3.5 (a) Monitored dynamic diffraction efficiency during grating formation Figure 3.5 (b) Calculated change of refractive index modulation during grating formation The second term on the right hand side of equation (3.4), $F_o[1+V\cos(Kx)]u(x,t)$ , represents the amount of the monomers being polymerized during the grating formation procedure. By substituting u(x,t) with the Fourier series in equation (3.5), this term can be explicitly written as $F_o[1+V\cos(Kx)][\sum_{i=0}^{\infty}u_i(t)\cos(iKx)]$ , which contains the products of the first-order grating with higher grating orders. These products indicate that the formation of a grating structure within a dry photopolymer film is a non-linear process. 
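Equation (3.15) can be evaluated directly from a monitored diffraction efficiency, as in the minimal sketch below. The probe wavelength (850nm) and film thickness (20µm) are taken from the text, whereas the diffraction angle of 45° is only an assumed placeholder, since its value is not stated in this section. The function names are hypothetical, and the second function is merely the algebraic inverse of equation (3.15), used here as a round-trip check.

```python
import math

def index_modulation(eta, wavelength, thickness, theta):
    """Eq. (3.15): refractive index modulation from diffraction efficiency.

    eta is a fraction in [0, 1]; wavelength and thickness share one length unit;
    theta is in radians.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return (wavelength * math.sqrt(math.cos(theta)) * math.asin(math.sqrt(eta))
            / (math.pi * thickness))

def efficiency_from_modulation(delta_n, wavelength, thickness, theta):
    """Algebraic inverse of eq. (3.15); single-valued only up to the first maximum."""
    phase = math.pi * thickness * delta_n / (wavelength * math.sqrt(math.cos(theta)))
    return math.sin(phase) ** 2

WAVELENGTH = 850e-9          # probe wavelength from the text, in metres
THICKNESS = 20e-6            # film thickness from the text, in metres
THETA = math.radians(45.0)   # ASSUMED diffraction angle, not given in this section

for eta in (0.26, 0.48, 0.87):
    dn = index_modulation(eta, WAVELENGTH, THICKNESS, THETA)
    print(f"eta = {eta:.2f} -> delta_n = {dn:.5f}")
```

Because the arcsine is single-valued, this inversion cannot distinguish an under-modulated grating from an over-modulated one with the same efficiency, so it is meaningful only on the rising part of a curve such as Figure 3.5 (a).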
As exhibited by the considerable downslides in Figure 3.5 (a) and (b), the consumption of the available monomers by higher grating orders weakens the first-order grating within the film, and thus subdues the diffraction efficiency of the hologram. Therefore, identifying the experimental conditions that can substantially suppress the formation of higher grating orders is crucial to the quality of the fabricated waveguide holograms. To examine the dynamic characteristics of the single-grating formation procedure in the DuPont dry photopolymer film, a series of single-grating holograms, as shown in Figure 3.1 (a), were fabricated by employing the setup in Figure 3.4 (a) [50]. In these experiments, the exposing illumination was stopped before reaching the maximum diffraction efficiency. Figure 3.6 shows the dynamic diffraction efficiency measured in real time during and after the exposure. As shown in this figure, after terminating the exposure, the dynamic diffraction efficiency continued to increase before reaching a saturation value. This post-exposure increment depended on the diffraction efficiency at which the exposure was stopped ( $\eta_{stop}$ ). The data in Figure 3.6 show that the dynamic diffraction efficiency increased, respectively, 5%, 10%, 13%, 13%, 10%, and 8% after the exposure was stopped at 26%, 37%, 48%, 62%, 73%, and 87%. At relatively low diffraction efficiency, such as 26%, the monomer concentration gradients are too weak to further drive the monomers to diffuse from the dark regions to their adjacent bright regions. When reaching relatively high diffraction efficiency, such as 87%, only a small amount of the available monomers are left for the continuation of the grating formation procedure. Thus, the post-exposure increments in the dynamic diffraction efficiency are relatively small in these two cases. Figure 3.6 Dynamic diffraction efficiency measured in real time during and after the exposure. 
The exposure was stopped, respectively, at 26% ( $\blacksquare$ ), 37% ( $\blacktriangle$ ), 48% ( $\times$ ), 62% ( $\bullet$ ), 73% (+), and 87% ( $\bullet$ ).

The second term on the right-hand side of equation (3.10), $F_o[1+V\cos(\vec{K}_2\cdot\hat{r})]u(\hat{r},t)$ , can be explicitly written as $F_o[1+V\cos(\vec{K}_2\cdot\hat{r})][\sum_{i=0}^{\infty}u_i(t)\cos(i\vec{K}_1\cdot\hat{r})+\sum_{k=0}^{\infty}u_k(t)\cos(k\vec{K}_2\cdot\hat{r})]$ , in which the products between the first and second gratings represent the interactions between these two grating formation procedures within the same film.

To examine the dynamic characteristics of the double-grating formation procedure in the DuPont dry photopolymer film, a series of double-grating holograms, as shown in Figure 3.1 (b), were fabricated by employing the setup in Figure 3.4 (a) [50]. These experiments were conducted in the following order:

- (1) A DuPont dry photopolymer film was exposed to an appropriate two-beam interference pattern to form the first grating.
- (2) The first exposure was stopped when the monitored dynamic diffraction efficiency reached a specific value $\eta_{stop}$ .
- (3) The optical waveguiding substrate was rotated 180° with respect to its original position.
- (4) The same DuPont dry photopolymer film was exposed to the original recording beams again to form the second grating.

To ensure the stability of the first grating and the repeatability of the experiments, some extra waiting time was allowed at the end of the third step so that the monitored dynamic diffraction efficiency of the first grating reached the prescribed value $\eta_1^0$ before the second exposure was started. One more optical power meter was used in the setup to monitor the intensity of the beam diffracted by the second grating structure.
Figure 3.7 (a), (b), (c), and (d) show the dynamic diffraction efficiency of the two incoherently superimposed gratings measured in real time during and after the second exposure with different $\eta_1^0$ . The crossing points at which these two multiplexed gratings reached the same dynamic diffraction efficiency $\eta_{equal}$ were characterized through these experiments. As shown in these figures, during the second exposure, the dynamic diffraction efficiency of the first grating continued to increase until it reached a saturation value, and then declined with further exposure. The dynamic diffraction efficiency of the second grating exhibited a similar behavior, but lagged somewhat behind the first grating. The data in Figure 3.7 (a), (b), and (c) show that $\eta_{equal}$ was, respectively, 26%, 43%, and 47% when $\eta_1^0$ was 16%, 30%, and 37%. In these cases, the amount of available monomers left after the first exposure was still sufficient for the formation of the second grating to catch up with the first grating, and a higher $\eta_{equal}$ could be reached with increased $\eta_1^0$ . In contrast, the data in Figure 3.7 (d) show that $\eta_{equal}$ did not exist when $\eta_1^0$ was 52%, since the amount of available monomers left after the first exposure had become too small to form the second grating within the same film with a sufficiently large refractive index modulation.

Figure 3.7 Dynamic diffraction efficiency of the two incoherently superimposed gratings measured in real time during and after the second exposure. The second exposure was started when the dynamic diffraction efficiency of the first grating $\eta_1^0$ reached, respectively, (a) 16%, (b) 30%, (c) 37%, and (d) 52%.

#### 3.5 Fabrication of Waveguide Holograms

In optical centralized shared bus architecture, two types of waveguide holograms, as respectively illustrated in Figure 3.1 (a) and (b), are specified.
Their quality directly affects the bus fan-out uniformity and the power budget, and thus is pivotal to the practical implementation of the designed optical interconnect layer. In order to save the power budget, the waveguide holograms in use should have high diffraction efficiency and low insertion loss. As stated in Chapter 2, the diffraction efficiency balance of the single-grating holograms should satisfy equation (2.1) and (2.3), and the double-grating hologram should function as an equal beam splitter. Based on the understanding of the dynamic characteristics of the grating formation procedures within the DuPont dry photopolymer film, in conjunction with the setup's capability of real-time exposure dosage control in response to the monitored dynamic diffraction efficiency as shown in Figure 3.4 (a), the optimal recording schedules were established [50]. By conducting a series of trial experiments similar to the one as shown in Figure 3.5, the optimal experimental conditions, including irradiance, intensity ratio of the two recording beams, and other manageable factors, were identified and maintained during the following waveguide hologram fabrications. Because the formation of higher grating orders was substantially suppressed, a large refractive index modulation could be obtained. For the fabrications of the single-grating holograms, their diffraction efficiency was directly controlled in real time. From the data in Figure 3.6, the correspondence between the stop diffraction efficiency $\eta_{stop}$ and the post-exposure increment could be identified. Thus, for a prescribed diffraction efficiency value that satisfies equation (2.3), the corresponding $\eta_{stop}$ could be determined. For the fabrications of the equal-strength double-grating holograms, the appropriate recording schedules were established with a systematic approach. 
From the data in Figure 3.7 (a), (b), (c), and (d), the optimal $\eta_1^0$ and the corresponding second exposure time could be identified, and then the matched $\eta_{stop}$ could be determined from Figure 3.6 with this known $\eta_1^0$ . This fabrication approach was able to assure the quality of the fabricated waveguide holograms and the accuracy of their diffraction efficiency with satisfactory repeatability. In comparison with the fabrication scheme involving only indirect diffraction efficiency control by setting the illumination time prior to the exposure [52], this systematic approach also eliminates the uncertainties caused by environmental noise or other unknown factors.

#### 3.6 Demonstration of Equalized Bus Fan-Outs

The waveguide holograms, as illustrated in Figure 3.1 (a) and (b), were fabricated by employing the systematic approach described above, and the optical interconnect layer was implemented as specified by optical centralized shared bus architecture. Figure 3.8 exhibits the equalized bus fan-outs that were successfully established across the entire optical interconnect layer [50]. In this implementation, the diffraction efficiency balance as specified by equations (2.1) and (2.3) was completely satisfied. In the middle, an equal-strength double-grating hologram with $\eta_{equal} = 47\%$ was integrated on the top surface of the substrate. On its left and right sides were single-grating holograms, each with $\eta = 50\%$ . Because the aluminum-coated 22.5° bevel at each end of the substrate could provide nearly 100% reflection efficiency, it was equivalent to a single-grating hologram with $\eta = 100\%$ . An 850nm VCSEL was used to obtain the CCD photo in Figure 3.8 (a). The input VCSEL power was 2mW, and the measured intensity of the bus fan-outs was, respectively, 0.404mW, 0.406mW, 0.400mW, and 0.396mW, from left to right.
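The degree of equalization can be quantified from the four measured fan-out powers quoted above. The short sketch below does so, reading the second value as 0.406mW. The choice of metrics (relative spread, maximum-to-minimum ratio in dB, and overall throughput) is an illustrative assumption and is not taken from the dissertation.

```python
import math

P_IN_MW = 2.0                                 # input VCSEL power from the text
FAN_OUTS_MW = [0.404, 0.406, 0.400, 0.396]    # measured fan-outs, left to right
                                              # (second value read as 0.406 mW)

def fan_out_statistics(p_in, outputs):
    """Summarize how evenly the input power is shared among the fan-outs."""
    mean = sum(outputs) / len(outputs)
    return {
        "mean_mW": mean,
        "relative_spread": (max(outputs) - min(outputs)) / mean,
        "max_min_dB": 10.0 * math.log10(max(outputs) / min(outputs)),
        "throughput": sum(outputs) / p_in,    # fraction of input power delivered
    }

stats = fan_out_statistics(P_IN_MW, FAN_OUTS_MW)
for key, value in stats.items():
    print(f"{key}: {value:.4f}")
```

Under these definitions the four outputs differ by only a few percent of their mean, which is consistent with the equalization claimed for Figure 3.8 (a).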
Unlike the common optical waveguides or fibers, in a waveguiding substrate multiple optical interconnects can simultaneously proceed in a 2-D fashion [53]. This scheme can effectively increase the overall aggregation bandwidth of an optical backplane. The basic optical centralized shared bus architecture, as illustrated in Figure 2.4, can be simply extended to a configuration with 2-D multiple bus lines, and equalized bus fan-outs can still be individually established along each bus line. Figure 3.8 (b) exhibits such a demonstration. A 2-D 8X8 VCSEL array was packaged with a 2-D 8X8 microlens array [54], and a 2-D 2X2 sub-array on it with a pitch size of 1mm was turned on to obtain the CCD photo in Figure 3.8 (b). Figure 3.8 (a) Equalized bus fan-outs on an optical centralized shared bus Figure 3.8 (b) Equalized bus fan-outs on a 4-bit optical centralized shared bus #### 3.7 Power Budget Evaluation Power budget is a major concern in the system implementation. Particularly in optical centralized shared bus architecture, it directly affects the maximum number of the daughter boards that can be accommodated in the shared media. Furthermore, with a thorough power budget evaluation the problems associated with heat dissipations can be identified and then managed accordingly. In this innovative architecture, for a complete data transfer, the source daughter board first delivers the data to the central distributor, as illustrated in Figure 2.5 (a), and then the central distributor transfers these data to all subscribed regular daughter boards in a broadcast fashion, as illustrated in Figure 2.5 (b). As discussed in Chapter 2 and demonstrated in Figure 3.8, the most attractive feature of this innovatively designed optical interconnect layer is its ability to fulfill equalized bus fan-outs on the shared media. Because of this feature, the power budget evaluation on the delivery bus is same as on the broadcast bus. 
On the broadcast bus, a comprehensive power budget assessment consists of the evaluations from two aspects. One is from the application point of view to determine the specifications of the individual bus line, mainly including data rate (DR), bit error rate (BER), and marginal power penalty $(P_{margin})$ . The other one is from the physical link point of view to evaluate the characteristics and identify the capacity of each involved component. Following the trace of an optical beam in the optical interconnect layer, as illustrated in Figure 2.5 (b), these factors primarily are VCSEL emission power $(P_{in})$ , VCSEL emission pattern, beam propagation diversion, insertion loss of each passive optical component $(\alpha_k)$ , numbers of bus fan-outs (n), bus fan-out coefficient $(\eta_{fan-out})$ , and photodiode sensitivity $(P_{pd})$ . The photodiode sensitivity is a function of the specified data rate and bit error rate, and thus can be explicitly written as $P_{pd}(DR,BER)$ . The emission power of a VCSEL can be determined by $$P_{in} = \eta_{qe} P_o \qquad (3.16)$$ where $P_o$ is the supply power and $\eta_{qe}$ is the quantum efficiency of the VCSEL. The fundamental mode of a VCSEL has a circular symmetric emission pattern, and its field profile can be described by a Gaussian function. As a Gaussian beam propagates, its beam width $\omega(z)$ expands as $$\omega(z) = \omega_o \sqrt{1 + (\frac{\lambda z}{\pi \omega_o^2})^2} \qquad (3.17)$$ where $\lambda$ is the wavelength of the light emitted from the VCSEL, $\omega_o$ is the width of the beam waist, which is approximately equal to the diameter of the active aperture of the VCSEL. In consequence, its power intensity falls accordingly during the propagation. 
As a first-order approximation, this propagation diversion loss $\gamma(z)$ can be simply estimated by

$$\gamma(z) \approx \frac{\omega_o}{\omega(z)}$$ (3.18)

With all this information, the total power budget can be expressed as

$$P_{in}\left(1 - \sum_{k} \alpha_k - n\eta_{fan-out}\right)\gamma(z) \ge P_{pd}(DR, BER) + P_{margin}$$ (3.19)

This inequality imposes a restriction across the entire implemented optical interconnect layer. With reference to it, several factors may be optimized to save the overall power budget. For example, it is highly desirable to utilize high-power single-mode VCSELs [55]. The insertion loss of each passive optical component should be individually minimized. As diffraction dictates, the divergence angle of the emitted VCSEL light increases as the active area of the VCSEL decreases. Since VCSEL design usually favors a small active area for the sake of speed performance, the divergence angle of the light directly emitted from a VCSEL is not likely to be reduced. In order to collimate the divergent light, the packaging of a VCSEL should contain a focusing subassembly, e.g., a dome lens. Similarly, a focusing subassembly is necessary in the packaging of a photodiode in order to gather the incoming light onto its active area. In order to reduce heat dissipation, high quantum efficiency VCSELs and photodiodes with low thermal resistances are highly desirable. Substrate-removed thin-film VCSELs can be used directly without any cooling techniques. A 50% quantum efficiency increase was experimentally confirmed for the $10\mu m$-thick VCSELs, and they had the lowest thermal resistances compared to the $250\mu m$-, $200\mu m$-, $150\mu m$-, and $100\mu m$-thick VCSELs [56]. These special features of the substrate-removed thin-film VCSELs can substantially ease the heat management.
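Inequality (3.19), together with equations (3.17) and (3.18), can be checked numerically for a candidate design. The sketch below is only an illustration of how the terms combine: the function names are hypothetical, all powers are in linear units (mW), and every numerical value in the example (losses, fan-out coefficient, collimated beam waist, path length, photodiode sensitivity, and margin) is an assumed placeholder rather than a measured figure from this work.

```python
import math

def beam_width(z, w0, wavelength):
    """Eq. (3.17): Gaussian beam width after propagating a distance z."""
    return w0 * math.sqrt(1.0 + (wavelength * z / (math.pi * w0 ** 2)) ** 2)

def diversion_factor(z, w0, wavelength):
    """Eq. (3.18): first-order estimate gamma(z) ~ w0 / w(z)."""
    return w0 / beam_width(z, w0, wavelength)

def budget_margin(p_in, losses, n_fan_outs, eta_fan_out, gamma, p_pd, p_margin):
    """Left side minus right side of inequality (3.19); >= 0 means the budget is met."""
    delivered = p_in * (1.0 - sum(losses) - n_fan_outs * eta_fan_out) * gamma
    return delivered - (p_pd + p_margin)

# Hypothetical example values (assumptions, not data from the dissertation).
WAVELENGTH = 850e-9      # m
W0 = 0.5e-3              # m, assumed waist of the collimated beam
PATH = 0.10              # m, assumed propagation distance
gamma = diversion_factor(PATH, W0, WAVELENGTH)

margin = budget_margin(p_in=2.0, losses=[0.05, 0.05], n_fan_outs=4,
                       eta_fan_out=0.2, gamma=gamma, p_pd=0.01, p_margin=0.02)
print(f"gamma = {gamma:.4f}, margin = {margin:.4f} mW, met = {margin >= 0.0}")
```

Expressing the result as a signed margin rather than a boolean makes it easy to see how much room remains when the number of fan-outs or the component losses are varied.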
Because all bus fan-outs are equalized in this innovative architecture, the photodiodes in use can be optimized in favor of sensitivity, with an extremely small noise figure, by taking advantage of the reduced dynamic range requirement. This merit can effectively optimize the $P_{pd}(DR, BER)$ function in inequality (3.19), and thus considerably save the total power budget.

This particular power budget evaluation on the broadcast bus assumes that there is only a single optical bus line in the basic optical centralized shared bus architecture, as illustrated in Figures 2.4 and 2.5. If multiple optical interconnects simultaneously proceed in a two-dimensional fashion, as demonstrated in Figure 3.8 (b), the crosstalk among the bus lines and the related issues should be taken into account for a more complete power budget evaluation [54].

#### 3.8 Bandwidth Characterization of Optical Interconnect Layer

The most significant merit of optical interconnects comes from the tremendously large bandwidth capacity that optics possesses. In optical centralized shared bus architecture, substrate-guided optical interconnects are utilized to boost the throughput on the optical backplane. As illustrated in Figure 2.4, the optical interconnect layer consists of an optical waveguiding substrate with properly designed volume holographic gratings integrated on its top surface. The substrate provides a turbulence-free medium for optical interconnects, and the holograms function as optical fan-in/fan-out devices. The optical bandwidth of the substrate-guided optical bus line was experimentally characterized [22]. First, the frequency spectrum of a femtosecond laser pulse in air was measured as a reference. Then, this short pulse was repetitively propagated across the entire implemented optical interconnect layer, as illustrated in Figure 3.9 (a).
The acquired time-domain data of the output laser pulse were processed in real time by fast Fourier transform (FFT). In the frequency domain, by comparing this obtained FFT spectrum with the reference, the bandwidth capacity of the media through which this pulse traveled could be directly viewed. As shown in Figure 3.9 (b) [22], the 3dB bandwidth of the implemented optical interconnect layer was identified to be approximately 2.5THz per bus line. Such an enormous figure is far beyond the capacity of existing electro-optical transceivers. Figure 3.9 (a) Femtosecond laser pulse for bandwidth characterization Figure 3.9 (b) Bandwidth characterization of a single substrate-guided optical interconnect line in the frequency domain [22] ### 3.9 Summary In this chapter, the theories analyzing diffraction by grating are briefly reviewed. Based on the understanding of the dynamic characteristics of the grating formation procedures within the DuPont dry photopolymer film, in conjunction with the setup's capability of real-time exposure dosage control in response to the monitored dynamic diffraction efficiency, the optimal recording schedules were established with a systematic approach. With these optimized recording schedules, high-quality waveguide holograms were fabricated as specified by optical centralized shared bus architecture, and equalized bus fan-outs were successfully demonstrated across the entire implemented optical interconnect layer. The systematic recording scheme described in this chapter can also be employed to conduct the fabrications of other types of single or multiplexed holographic gratings within dry photopolymer films. A generic power budget evaluation is presented, and the feasible optimizations pertinent to optical centralized shared bus architecture are suggested accordingly. Finally, the enormous bandwidth capacity of the implemented optical interconnect layer was experimentally characterized to be approximately 2.5THz per bus line. 
## **Chapter 4 Electro-Optical Interface Implementation**

#### 4.1 Introduction

The burden of electrical-to-optical and optical-to-electrical conversions has critically prevented optics from growing to be a practical solution at the board-to-board level. This chapter presents an electrical-domain perspective on the interface between the electrical and optical interconnect layers. To ensure the high-speed performance of the implemented electro-optical interface modules, the individual properties of each active optoelectronic component and the physical layouts on the printed circuit board (PCB) are the two major concerns. The scope of this dissertation mainly covers the latter, and the implementation examples presented in this chapter were based entirely on off-the-shelf devices.

#### 4.2 Overview of High-Speed PCB Layout Design Techniques

One of the most important considerations in high-speed PCB design is the power distribution network. For a simple illustration, it is assumed that a PCB requires a power supply (VCC) and a ground (GND). The objective is to deliver the exact voltage of VCC to the power pins of every device on the PCB, regardless of its position relative to the power source. Furthermore, the voltage at the power pins should be free of line noise. These ideal properties could be ensured only if the power source had zero impedance. In reality, the design goal is to reduce the impedance of the power distribution network as much as possible. For this purpose, the multi-layer stack scheme is commonly employed, in which each power level occupies a separate layer, as illustrated by the four-layer PCB in Figure 4.1. The entire power plane is covered with metal, and the only gaps in the metal are those needed for placing pins or signal vias. As a result, the impedance of the power distribution network is largely reduced.
Since the multi-layer stack scheme alone does not eliminate line noise, extra noise filtering with bypass capacitors is mandatory. Generally, a $1\mu F$ to $10\mu F$ capacitor is placed across the power inputs on the PCB, and $0.01\mu F$ to $0.1\mu F$ capacitors are positioned across the power and ground pins of every active device. It is best to place the bypass capacitors on the opposite side of the PCB directly under the device. In such a scenario, surface-mount capacitors work well. The construction type of the bypass capacitors is also an important factor and depends on the specific frequency ranges and applications [57].

Figure 4.1 Four-layer PCB with dedicated power and ground planes

Another important function of the power distribution network is the provision of a return path for all signals in the system, whether generated on or off the PCB. Whenever a signal switches, an AC current is introduced at the signal switching edge. A current return path, which is determined by the least impedance, is required to complete the loop. If not appropriately handled, the interaction among these dynamic current loops may aggravate ringing, crosstalk, and radiation. In turn, these current-loop associated problems can degrade the signal integrity. A current loop can be thought of as a single-turn coil, and the return path of least impedance is the one that results in the smallest loop. In other words, the return signal follows the original signal line as closely as possible. In the multi-layer stack scheme, this usually happens in the VCC or GND plane above or below the original signal line. A power plane imposes no natural restrictions on the current flow. Therefore, the return current can follow the optimal path of least impedance, resulting in the smallest possible current loop. Meanwhile, the noise currents are distributed because their return paths are not restricted either.
By creating separate analog and digital power planes, the sensitive analog devices can be well protected from the digital noise. If a signal line must cross the analog/digital power plane boundary, as illustrated in Figure 4.2, the return path of the signal in this line can still be optimized by placing a narrow jumper to bridge the power plane break.

Figure 4.2 Jumper to bridge power plane break for current return

On high-speed PCBs, all signal interconnections are practically transmission lines, rather than simple wire connections. To provide the best medium for the on-board signal transmissions, the impedance along an entire signal line must remain unchanged. Such an interconnection is called an impedance-controlled signal line. Along a transmission line with physical discontinuities, the original signal is subject to reflections wherever the impedance changes. Without proper management, these reflections may interfere with signaling, appear as ringing, cause false clocking, or destroy system functionality. On a PCB with dedicated power planes, there are two transmission line categories: microstrip and stripline, as respectively illustrated in Figures 4.3 and 4.4.

Figure 4.3 Microstrip transmission line

Figure 4.3 defines the dimensions of a microstrip transmission line. In this case, the signal trace is open to the environment on its top, and has a dielectric material layer and a conducting plane underneath it. This technique allows easy access to the signal line. The characteristic impedance $(Z_o)$ of a microstrip can be adequately estimated by the first-order approximate equation [57]

$$Z_o = \frac{87}{\sqrt{\varepsilon_r + 1.41}} \ln\left[\frac{5.98H}{0.8W + T}\right] (\Omega)$$ (4.1)

The common PCB material is FR4, with a typical relative dielectric constant $\varepsilon_r$ of 4.4 to 5.

Figure 4.4 Stripline transmission line

Figure 4.4 defines the dimensions of a stripline transmission line.
In this case, the signal trace is sandwiched between two dielectric material layers and two conducting planes. Because the signal line is shielded on both sides, this technique theoretically delivers the cleanest signals. However, a stripline is not easy to access since it is hidden. The characteristic impedance ( $Z_o$ ) of a stripline can be adequately estimated by the first-order approximate equation [57]

$$Z_o = \frac{60}{\sqrt{\varepsilon_r}} \ln\left[\frac{4H}{0.67\pi(0.8W + T)}\right] (\Omega)$$ (4.2)

More accurate calculations of the characteristic impedance of microstrip and stripline transmission lines are provided by several RF application software packages, e.g., AppCAD from Agilent [58].

The design of the PCB layouts should avoid any physical discontinuity. On the other hand, however, the essential functionality of a signal transmission line is to deliver a large number of fan-outs. In consequence, it is nearly impossible to obtain an impedance-controlled signal line in practical implementations. Thus, a signal transmission line must be properly terminated at each fan-in/fan-out stub so as to suppress the disturbing reflections. PECL (positive emitter coupled logic) devices have gained tremendous popularity in high-speed digital applications [59]. Various termination schemes are available to interface to PECL electro-optical transceivers [60]. As one +5V PECL termination example, the two-resistor divider scheme in Figure 4.5 is commonly used in practical implementations. In addition to providing the signal line termination, the $68\Omega/191\Omega$ combination in Figure 4.5 biases the input level at 3.7V, which is the common level of the +5V PECL. Thus, the maximum input sensitivity can be obtained. The $330\Omega$ resistor in Figure 4.5 provides an output path for the emitter current. A high-speed PECL signal is usually differentially transferred in two signal lines.
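Equations (4.1) and (4.2) and the two-resistor divider are straightforward to evaluate, as the minimal sketch below shows. The trace dimensions are hypothetical values in mils, with H, W, and T understood as defined in Figures 4.3 and 4.4. For the divider, it is assumed that the $68\Omega$ resistor connects to VCC and the $191\Omega$ resistor to ground, since that arrangement reproduces the 3.7V bias stated in the text. The function names are illustrative only.

```python
import math

def microstrip_z0(eps_r, h, w, t):
    """Eq. (4.1): first-order microstrip impedance in ohms (h, w, t in one unit)."""
    return 87.0 / math.sqrt(eps_r + 1.41) * math.log(5.98 * h / (0.8 * w + t))

def stripline_z0(eps_r, h, w, t):
    """Eq. (4.2): first-order stripline impedance in ohms (h as in Figure 4.4)."""
    return 60.0 / math.sqrt(eps_r) * math.log(4.0 * h / (0.67 * math.pi * (0.8 * w + t)))

def divider_thevenin(vcc, r_top, r_bottom):
    """Thevenin voltage and resistance of a divider from VCC (r_top) to GND (r_bottom)."""
    voltage = vcc * r_bottom / (r_top + r_bottom)
    resistance = r_top * r_bottom / (r_top + r_bottom)
    return voltage, resistance

# Hypothetical FR4 geometries in mils (assumptions for illustration only).
print(f"microstrip: {microstrip_z0(4.5, 10.0, 18.0, 1.4):.1f} ohm")
print(f"stripline:  {stripline_z0(4.5, 20.0, 8.0, 1.4):.1f} ohm")

# ASSUMED orientation: 68 ohm to VCC (+5 V), 191 ohm to ground.
v_bias, r_term = divider_thevenin(5.0, 68.0, 191.0)
print(f"divider: {v_bias:.2f} V bias, {r_term:.1f} ohm termination")
```

Under the assumed orientation, the divider presents roughly 3.7V and close to $50\Omega$ to the line, which matches the dual role of termination and biasing described above. Both impedance formulas are first-order estimates, so a field solver or a dedicated calculator is preferable for final layout.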
Differential signaling rejects common-mode noise and thus further protects the signal integrity. From the physical layout point of view, the two differential lines should be routed as close to each other as possible with constant spacing and equal electrical length.

Figure 4.5 +5V PECL termination by two-resistor divider

#### 4.3 Transmitter Implementation Examples

Inside an electro-optical interface module, a transmitter implements the electrical-to-optical conversion. It consists of a laser diode and a laser driver. The laser driver provides a stable bias current for the laser diode, and converts the input modulation voltage into the current that modulates the laser diode. A GBIC (Gigabit Interface Converter) [61] module contains an integrated transmitter. Figure 4.6 shows a multimode 850nm 1.0625GBd Fiber Channel 1.3 Gigabit Ethernet 1X9 transceiver from Infineon [62]. This type of electro-optical interface module can be readily used in practical applications that are based on specific communication standards such as Gigabit Ethernet and Fiber Channel.

Figure 4.6 Infineon GBIC V23826-K305-C63

Discrete electro-optical components can be combined with auxiliary external circuits to address a much wider range of practical applications. Figure 4.7 shows a VCSEL diode (VCT-B85B20) from Lasermate [63]. Its threshold current is approximately 2.5mA. For fast switching, it is common practice to keep the bias current above the threshold so as to reduce the laser turn-on delay and relaxation oscillation. A common TO46-packaged VCSEL diode has a beam divergence angle of approximately 15°. In contrast, the output beam divergence of the VCT-B85B20 VCSEL diode is reduced to less than 2° by the dome lens cap integrated with the package, as shown in Figure 4.7. This feature can substantially increase the propagation distance of the VCSEL beam within the optical waveguiding substrate, as in the optical centralized shared bus architecture.
Figure 4.8 shows the measured optical output powers and DC drive currents of a typical VCT-B85B20 VCSEL diode at different voltages.

Figure 4.7 Lasermate VCSEL VCT-B85B20

Figure 4.8 Optical output power/voltage vs. drive current of a typical VCT-B85B20 VCSEL diode

The other constituent of a complete transmitter is the laser driver. Figure 4.9 shows the schematic of the transmitter that was implemented by using the VCT-B85B20 VCSEL diode and the MAX3261 VCSEL driver from Maxim Integrated Products. The essential function of the MAX3261 VCSEL driver is to provide a stable bias current at the IBIASOUT pin, and to convert the differential PECL input voltages at the In+ and In- pins into the output modulation currents at the Out+ and Out- pins. Resistor RBIASSET adjusts the output bias current at the IBIASSET pin. Resistor RMODSET controls the amplitude of the output modulation currents at the Out+ and Out- pins. Resistor ROSADJ regulates the overshoot/undershoot for the best signal quality. The VCT-B85B20 VCSEL diode is connected to the Out- pin. Resistor R12 and inductor L1 compensate for the parasitic inductance of the VCT-B85B20 VCSEL diode package [64]. Monitor J2 is provided to measure the actual bias current. Connecting monitor J1 to a $50\Omega$ oscilloscope input places that input in parallel with resistor R10, which is also $50\Omega$, so as to produce a total load of $25\Omega$ at the Out+ pin. Meanwhile, the modulation current of the VCT-B85B20 VCSEL diode ($I_{mod}$) can be inferred in real time from the waveform displayed by the oscilloscope ($V_{osc}$) as

$$I_{mod} = V_{osc} / 25 \quad (4.3)$$

where $V_{osc}$ is in volts and $I_{mod}$ is in amperes. The resistor pairs R3/R4 and R5/R6 provide the necessary PECL signal terminations at the In+ and In- pins.

Figure 4.9 VCSEL driver using MAX3261

### 4.4 Receiver Implementation Examples

Inside an electro-optical interface module, a receiver implements the optical-to-electrical conversion. It consists of a photodiode, a pre-amplifier, and a post-amplifier.
Usually, a photodiode and a pre-amplifier are integrated together inside the same package so as to reduce noise. The post-amplifier, which is also called a limiting amplifier, provides a constant-level output voltage for interfacing with other logic devices on the PCB. There is also an integrated receiver inside a GBIC module. Meanwhile, a customized receiver was implemented by using discrete electro-optical components, including a VSC7810 photodiode/transimpedance pre-amplifier from Vitesse and a MAX3268 post-amplifier from Maxim Integrated Products. Together with a transmitter and an optical transmission medium, a complete optical interconnection link can be constructed.

These implementations, however, are only valid for the specific communication standards that encode all data in appropriate formats before they are sent out through the transmission links [65]. If the transferred data are not properly encoded, the post-amplifier will operate in an unstable state and oscillate, as exhibited by the PSPICE simulation in Figure 4.10. This phenomenon originates from the AC-coupling between the photodiode/pre-amplifier and the post-amplifier inside the receiver. If a DC-coupling scheme is used, the sensitivity of the constructed receiver will be largely reduced, since it allows the low-frequency flicker noise from the first transistor in the photodiode/pre-amplifier to be completely applied to the high-gain post-amplifier [66]. Another serious problem associated with the DC-coupling scheme is the amplification of the DC offset. Small voltage drifts occasionally occur in the photodiode/pre-amplifier due to temperature changes. With DC-coupling, the high-gain post-amplifier can considerably magnify these small drifts. This amplified DC offset may saturate the post-amplifier and result in pulse-width distortion. Therefore, the photodiode/pre-amplifier should be AC-coupled to the post-amplifier inside the receiver.
From the data pattern point of view, the AC-coupling scheme directly restricts the maximum allowable duration of the logic '0' or '1' state so as to always keep the post-amplifier in a normal comparing state. Otherwise, an overlong run of the logic '0' or '1' state allows the difference between the two differential inputs to the post-amplifier to become too small to resolve, as simulated in Figure 4.10. In such a scenario, the post-amplifier cannot make any decisive judgment and simply oscillates. Therefore, all data must be appropriately encoded with restricted duty factor variation before they are sent out through the transmission links. Besides wasting the available link bandwidth, another apparent drawback of encoding data is the additional complexity of the encoding function that must be added to the transmitter. Consequently, a corresponding decoding function must also be added to the receiver at the other end of the communication link.

Figure 4.10 PSPICE simulation of post-amplifier oscillating

By its very nature, board-to-board data exchange happens in bursts and involves only unencoded data. The transferred data in unencoded or burst-mode communications are often called arbitrary duty factor data, because they may remain in the logic '1' or '0' state for an indefinite time span, and thus have a duty factor that arbitrarily varies from 0% to 100%. For a straightforward interconnect without an intermediate encoding/decoding stage, the receiver in use must be capable of detecting whether the data on the interconnection link is changing from a high to a low or from a low to a high logic state. In other words, it needs to be an edge detector [65]. This special switching behavior can be obtained and managed by adding hysteresis to the comparator inside the post-amplifier.
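The interplay between AC-coupling and hysteresis described above can be sketched behaviorally. The following is a simplified discrete-time model, not the PSPICE circuit used in this work: the function names, the first-order high-pass stand-in for the coupling capacitors, the decay factor, and the hysteresis band are all assumptions chosen for illustration.

```python
def ac_couple(samples, alpha=0.9):
    """First-order discrete high-pass filter standing in for AC-coupling.

    After each edge the output decays toward zero by a factor alpha
    per sample, as a long run of identical bits would through a capacitor.
    """
    out = []
    prev_x, prev_y = samples[0], 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def hysteresis_comparator(samples, band, state=0):
    """Switch only when the input leaves the +/- band; otherwise hold."""
    out = []
    for v in samples:
        if v > band:
            state = 1
        elif v < -band:
            state = 0
        out.append(state)
    return out


# Arbitrary duty factor data: long runs of '1' and '0' (assumed pattern).
bits = [0] * 5 + [1] * 50 + [0] * 50 + [1] * 3
coupled = ac_couple(bits)
recovered = hysteresis_comparator(coupled, band=0.2)
print(recovered == bits)
```

In this model the coupled signal decays into the hysteresis band during each long run, and the comparator simply holds its last state until the next edge drives the input out of the band again. A zero-threshold comparator fed the same decayed input plus any small noise would have no such memory, which is the oscillation mechanism the text attributes to Figure 4.10.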
When hysteresis is properly added, as illustrated in Figure 4.11, the comparator switches its output logic state only upon detecting evident transitions in the input differential signals, and remains stable without any undesired oscillations after the original transitions decay due to the AC-coupling and the differential inputs fall into the hysteresis band. The trade-off of adding hysteresis to a comparator is the reduction of its resolution. Therefore, the actual amount of hysteresis added to a comparator should be chosen with reference to the power budget so as not to jeopardize correct data transfers.

Figure 4.11 Switching characteristic with and without hysteresis

At relatively low data rates between 0 and 10Mbps, it is possible to fabricate a monolithic edge detector, e.g., the HFD3023 receiver from Honeywell. With discrete electro-optical components, a customized edge detector, as shown in Figure 4.12, was implemented for constructing a complete optical interconnection link that is capable of handling arbitrary duty factor data at a much higher speed. Inside this implemented receiver, the RSC-M85P406 GaAs PIN photodiode/AGC pre-amplifier from Lasermate was AC-coupled through capacitors C1 and C2 (0.1μF) to the MAX9013 precision post-amplifier from Maxim Integrated Products. The edge detection behavior of this receiver is exhibited by the PSPICE simulation result in Figure 4.13. In this simulation, the same data pattern at the same bit rate as in Figure 4.10 was used as the differential inputs, and this edge detector correctly switches its output logic state in response to the arriving data without any undesired oscillations.

Figure 4.12 Edge detector using MAX9013

Figure 4.13 PSPICE simulation of edge detection

### 4.5 Eye Diagram Measurement

Eye diagram [67] measurement is a time-domain technique for the practical high-speed performance and bandwidth characterization.
Figure 4.14 illustrates a typical eye diagram measurement setup. The device under test (DUT) may be a particular component or a specific system consisting of several modules. To characterize the performance of the DUT, a pseudo-random bit sequence (PRBS) at a prescribed bit rate from the pulse generator (HP8133A) is delivered as the input to the DUT. Along with the trigger signal from the pulse generator, the output response signal from the DUT is fed into the digital communication analyzer (HP83480A). By overlapping all generated signal sweeps, the digital communication analyzer displays a pattern that resembles the human eye, as illustrated in Figure 4.15. This characterization method is valid for both electrical and optical device/system tests. In particular, the procedure of the optical eye diagram measurement has been standardized as TIA/EIA-526-4-A [68].

Figure 4.14 Eye diagram measurement setup

Figure 4.15 Eye diagram

Based on the obtained eye diagrams, a variety of diagnoses and parametric measurements, including overshoot/undershoot, rise/fall time, jitter, zero crossing, Q factor, eye height, extinction ratio, and so on, can be performed [69]. For example, the Q factor is a measure of the signal-to-noise ratio (SNR) at the decision circuit. By assuming that the probability density function of the noise is a Gaussian distribution, the Q factor can be simply expressed as [70]

$$Q = \left| \frac{\mu_1 - \mu_o}{\sigma_1 + \sigma_o} \right| \quad (4.4)$$

where $\mu_1$ and $\mu_o$ are respectively the average values of the measured logic '1' and '0' states, and $\sigma_1$ and $\sigma_o$ are respectively the standard deviations of the measured logic '1' and '0' states, all in voltage, current, or optical power units, as illustrated in Figure 4.15. With the displayed eye pattern, the Q factor value can be adequately estimated from equation (4.4).
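Equation (4.4) can be applied directly to samples taken at the decision instant of the eye. The short sketch below assumes the samples have already been sorted into logic '1' and logic '0' groups. The function name and the sample values are hypothetical, and the population standard deviation is used as one reasonable choice of estimator.

```python
from statistics import mean, pstdev


def q_factor(ones, zeros):
    """Equation (4.4): |mu_1 - mu_o| / (sigma_1 + sigma_o)."""
    return abs(mean(ones) - mean(zeros)) / (pstdev(ones) + pstdev(zeros))


# Hypothetical sampled levels in volts (illustrative values only).
ones = [0.9, 1.0, 1.1]
zeros = [-0.1, 0.0, 0.1]
print(q_factor(ones, zeros))
```

Any consistent unit can be used for the two groups, since the unit cancels in the ratio.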
In comparison to the quality test in which the Q factor is measured with only a sampling oscilloscope, a more accurate Q factor value can be obtained in an eye diagram measurement since the non-Gaussian distributed noises are essentially excluded by the statistical nature of overlapping all generated signal sweeps. The optimal decision level, as marked in Figure 4.15, is the midpoint of $\mu_1$ and $\mu_o$ so as to maximize the available noise tolerance. When the decision circuit is set at the optimal level, the bit error rate ($BER$) can be estimated from the $Q$ factor as [71]

$$\begin{aligned} BER &= \frac{1}{2}\,\mathrm{erfc}\left(\frac{Q}{\sqrt{2}}\right) \\ &\approx \frac{1}{Q\sqrt{2\pi}} \exp\left(-\frac{Q^2}{2}\right) \end{aligned} \quad (4.5)$$

In public data communication networks, the acceptable BER is less than $10^{-12}$, which requires a minimum tolerable Q factor value of approximately 7.0 as calculated from equation (4.5). Thus, the measured eye diagram can be used as a pass/fail criterion during the on-board quality evaluation. By taking in more factors, a comprehensive eye mask can be specifically defined for the routine device/system qualification test. With these empirical features, eye diagram measurement is widely employed as a versatile tool for both characterizing the high-speed performance and identifying the possible causes of signal integrity degradation.

The high-speed performance of the implemented electro-optical interface modules and the completely constructed optical data bus lines was characterized by performing eye diagram measurements at several data rates. By using the PRBS signals at the prescribed bit rates as the inputs to the implemented transmitter and directly connecting monitor J1, as indicated in Figure 4.9, to the digital communication analyzer, the electrical eye diagrams of the MAX3261 VCSEL driver were obtained up to a data rate of 1.25Gbps, as shown in Figure 4.16 (a) and (b).
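Returning briefly to equation (4.5), the relation between the Q factor and the BER is easy to check numerically. The sketch below evaluates both the exact and the approximate forms and inverts the exact form by bisection. The function names and the bisection bounds are assumptions made for this illustration.

```python
import math


def ber_from_q(q):
    """Exact form of equation (4.5): BER = erfc(Q / sqrt(2)) / 2."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def ber_approx(q):
    """Approximate form of equation (4.5), intended for large Q."""
    return math.exp(-q * q / 2.0) / (q * math.sqrt(2.0 * math.pi))


def q_for_ber(target, lo=1.0, hi=20.0):
    """Find Q with ber_from_q(Q) == target by bisection.

    Relies on the BER decreasing monotonically as Q increases.
    """
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ber_from_q(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(ber_from_q(7.0), ber_approx(7.0))
print(q_for_ber(1e-12))
```

At $Q = 7.0$ both forms give a BER slightly above $10^{-12}$, and the Q value that yields exactly $10^{-12}$ is only marginally larger than 7.0, which is consistent with the threshold quoted in the text.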
By coupling the PRBS-modulated light emitted from the VCT-B85B20 VCSEL diode to the internal optical detection module of the digital communication analyzer through an optical fiber cable, the optical eye diagrams of the implemented transmitter were obtained up to a data rate of 1.25Gbps, as shown in Figure 4.17 (a) and (b). By using the PRBS signals at the prescribed bit rates as the inputs to the MAX3268 post-amplifier and directly feeding its outputs to the digital communication analyzer, the electrical eye diagrams of the MAX3268 post-amplifier were obtained up to a data rate of 1.25Gbps, as shown in Figure 4.18 (a) and (b).

Figure 4.16 (a) Electrical eye diagram of MAX3261 at 622Mbps

Figure 4.16 (b) Electrical eye diagram of MAX3261 at 1.25Gbps

Figure 4.17 (a) Optical eye diagram of MAX3261 with VCT-B85B20 at 622Mbps

Figure 4.17 (b) Optical eye diagram of MAX3261 with VCT-B85B20 at 1.25Gbps

Figure 4.18 (a) Electrical eye diagram of MAX3268 at 622Mbps

Figure 4.18 (b) Electrical eye diagram of MAX3268 at 1.25Gbps

By integrating the implemented electro-optical interface modules with the verified optical interconnect layer, the complete optical data bus line was constructed. From the transmitting end to the receiving end, it consists of a MAX3261 VCSEL driver, a VCT-B85B20 VCSEL diode, a fan-in waveguide hologram, an optical waveguiding substrate, a fan-out waveguide hologram, a VSC7810 photodiode/transimpedance pre-amplifier, and a MAX3268 post-amplifier. As shown in Figure 4.19, the high-speed performance of the entire optical interconnection link was characterized by measuring eye diagrams at several data rates. The PRBS signals at the prescribed bit rates from the pulse generator (HP8133A) were used as the inputs to the MAX3261 VCSEL driver in the transmitter.
The PRBS-modulated light emitted from the VCT-B85B20 VCSEL diode propagated through the entire optical interconnect layer, and was then projected into the VSC7810 photodiode/transimpedance pre-amplifier in the receiver. Along with the trigger signals from the pulse generator, the PECL outputs from the MAX3268 post-amplifier were fed into the digital communication analyzer (HP83480A). By overlapping all generated signal sweeps, the digital communication analyzer displayed the eye patterns, as shown in Figure 4.20 (a) and (b), which reflected the aggregate performance of the entire optical bus line.

Figure 4.19 Optical interconnection link performance characterization by eye diagram

Figure 4.20 (a) Eye diagram of optical interconnection link at 622Mbps

Figure 4.20 (b) Eye diagram of optical interconnection link at 1.25Gbps

## 4.6 Summary

In this chapter, the indispensable electro-optical interface modules were implemented, and their high-speed performance was characterized by using eye diagrams up to a data rate of 1.25Gbps.

# **Chapter 5 System Demonstration Strategy**

#### 5.1 Introduction

One objective of this dissertation is to demonstrate the feasibility of the innovatively designed optical backplane bus in practical scenarios. The most significant benefit of utilizing optical interconnects is the tremendous gain in the bandwidth capacity, which has been proved both theoretically and experimentally in the literature. As identified in Chapter 3, the 3dB bandwidth of the implemented substrate-guided optical bus line is approximately 2.5THz, which is far beyond the capacity of any existing electro-optical transceiver.
From the architecture point of view, however, the three fundamental optical methodologies defined in Chapter 1, namely optical waveguide interconnects, free-space optical interconnects, and substrate-guided optical interconnects, differ greatly in how effectively the obtained bandwidth gain would improve the overall data throughput at the board-to-board hierarchical level. The approaches that are based on optical waveguide or free-space optical interconnects provide only point-to-point connectivity. As a result, the various proposed architectures are essentially optical point-to-point switched backplanes. On the other hand, the approaches that are based on substrate-guided optical interconnects can effectively fulfill the shared bus topology, and in turn an optical backplane bus can be implemented. As compared in Chapter 1, an optical shared backplane bus has many significant advantages over an optical point-to-point switched backplane.

As described in Chapter 2, the optical centralized shared bus architecture is created based on substrate-guided optical interconnects. This innovatively designed optical backplane bus utilizes the beneficial physical characteristics of optics while at the same time retaining the desirable architectural properties of the shared bus topology. Therefore, it is projected that the bandwidth gain would be maximized. As demonstrated in Chapter 3, in contrast to the previously proposed optical shared bus architectures, this innovatively designed optical backplane bus can fulfill equalized bus fan-outs across the entire optical interconnect layer in an elegant manner. This significant merit can substantially ease the overall system integration in practical implementations. Following the successful physical characterization of the shared bus layer, the next logical research topic is the demonstration at the system level by instantiating the optical centralized shared bus architecture in practical scenarios.
The research prototypes presented in this dissertation were completely based on existing systems, and the implemented optical centralized shared bus was used as the physical data channel compatible with the original systems. As a result, the actual data transfers were still at the same bit rates as without using optical interconnects. The compromise is that the improvement of the overall system performance cannot be exhibited. This drawback certainly does not cast any doubt on the interconnect capability of optics, because the bandwidth capacity of the implemented optical centralized shared bus has been individually verified to be approximately 2.5THz. The scope of the system demonstration in this dissertation is to construct and verify the optical connectivity specified in the optical centralized shared bus architecture in practical scenarios. Meanwhile, during the actual building procedures of the prototype demonstrators, various practical implementation issues can be identified.

#### 5.2 Uniprocessing System

The performance of advanced microprocessors continues to improve at a rapid pace. For example, the on-chip frequency of the Intel Pentium 4 microprocessor surpassed 3GHz in 2003, whereas the physical off-chip clock frequency is still not able to exceed 200MHz. In the electrical domain, this problem of computing speed outpacing interconnect capacity is becoming more and more severe. The conventional memory bus in use is an electrical shared bus, which has very limited bandwidth capacity due to many frequency-dependent physical effects as discussed in Chapter 1. In this special case, the architectural advantages of an optical backplane bus over an optical point-to-point switched backplane become more evident. In a uniprocessing system, the optical centralized shared bus architecture can be directly applied to fulfill the critical microprocessor-to-memory interconnects.
From the architecture point of view, the single microprocessor is a natural distributor, and thus the incorporation of an optical centralized shared bus may be carried out in a seamless manner.

### 5.3 Multiprocessing System

As multiprocessing attracts more interest, the demand for high-performance board-to-board interconnects becomes even more critical. One significant challenge in the design of a multiprocessing system is to effectively provide the communications among several parallel processes. In the multiprocessing domain, various topologies distinguished by their different architectural properties have been configured for connecting multiple microprocessors, memories, and/or switches [9]. As a result, applying the optical centralized shared bus architecture to different multiprocessing schemes will lead to large differences in how effectively the obtained bandwidth gain improves the overall multiprocessing performance. In the centralized shared-memory multiprocessing system, multiple microprocessors share a single physical memory through a shared bus, and all cache controllers simultaneously monitor every data transaction proceeding on the shared memory bus. In this simple way, the critical cache coherence can be effectively maintained. As expected, the major bottleneck of this multiprocessing scheme originates from the restricted bandwidth capacity of the shared bus in the electrical domain. From the topology point of view, the optical centralized shared bus architecture fits well in the centralized shared-memory multiprocessing model, and the shared memory is a natural distributor on the shared bus.
## **Chapter 6 Microprocessor-to-Memory Interconnect Demonstration**

#### 6.1 Introduction

To determine the feasibility of the optical centralized shared bus architecture, particularly in uniprocessing environments as discussed in Chapter 5, a research prototype [38], as shown in Figure 6.1, was constructed by using off-the-shelf electronic components, the physically verified optical centralized shared bus demonstrated in Chapter 3, and the implemented electro-optical interface modules presented in Chapter 4. This demonstrator is referred to as prototype U1 in this dissertation. Although it is certainly more appealing to directly apply the optical centralized shared bus architecture to an actual cache/DRAM interface, such a project would be too costly and excessively time-consuming. Instead, in prototype U1 one Motorola 68HC812A4 microprocessor [72], as shown in Figure 6.2, was externally interfaced to multiple static memories, and the critical microprocessor-to-memory interconnects were fulfilled by an implemented optical centralized shared bus.

Figure 6.1 Microprocessor-to-memory interconnect demonstration prototype

Figure 6.2 Pin-out configuration of Motorola 68HC812A4 microprocessor

The Motorola 68HC812A4 microprocessor has been widely used as the microcontroller unit (MCU) in a broad range of embedded applications. The core feature of this type of microprocessor is its ability to run in both single chip mode and expanded mode. The expanded mode allows the microprocessor to be connected with external memories and other peripherals. Background debug mode (BDM) is a special interface of the microprocessor that is intended for code loading and onboard debugging. This particular feature is highly desirable for practical prototype development. The development board (ADAPT812) for the Motorola 68HC812A4 microprocessor provided by Technological Arts, as shown in Figure 6.3, was utilized to facilitate the construction of prototype U1.
This board is a compact modular implementation of the Motorola 68HC812A4 microprocessor. The ADAPT812 board holds the oscillator circuit and chips that allow the microprocessor to run in both single chip mode and expanded mode, and fully supports the BDM interface of the microprocessor. The BDM12 pod offered by Kevin Ross [73] was used between the BDM interface of the microprocessor and a host computer. With this powerful debugging aid, a developer can access registers, program EEPROM, and execute instructions step by step on the ADAPT812 board through the host computer. The test code for the implemented demonstration prototype was developed on the host computer with the ICC12 IDE (Integrated Development Environment) from IMAGEcraft Creations, which is an ANSI C embedded system development tool for the Motorola 68HC812A4 microprocessor.

Figure 6.3 ADAPT812 development board

#### 6.2 Microprocessor-to-Memory Interface Design

Figure 6.4 shows the connectivity diagram of the implemented demonstration prototype, consisting of one Motorola 68HC812A4 microprocessor and four external memory boards [38]. For simplicity, only one memory board is drawn in this schematic. The data link between the microprocessor and the external memory boards was fulfilled by an optical centralized shared bus, whereas other interconnects, such as address and logic control lines, were implemented by conventional electrical means.

Figure 6.4 Connectivity diagram of prototype U1

The CY7C185 chips from Cypress Semiconductor were used to build the external memory boards in prototype U1. Figure 6.5 (a) shows the pin-out configuration of this chip. The CY7C185 memory is a high-performance 15ns CMOS SRAM organized as 8192 (8K) words by 8 bits, as illustrated by the logic block diagram in Figure 6.5 (b).
Memory expansion is provided by the active-LOW chip enable ($\overline{CE1}$), the active-HIGH chip enable ($CE2$), the active-LOW output enable ($\overline{OE}$), and the internal three-state drivers. The input/output pins (I/O 0 - I/O 7) remain in a high-impedance state unless the chip is selected and the output is enabled. The active-LOW write enable ($\overline{WE}$) controls the type of access to the memory, either write or read. When the $\overline{CE1}$ and $\overline{WE}$ inputs are both logic-LOW and the $CE2$ input is logic-HIGH, the data delivered on the eight input/output pins (I/O 0 - I/O 7) are written into the memory location specified by the address presented on the address pins (A0 - A12). A memory read proceeds in a similar manner while the $\overline{WE}$ input remains logic-HIGH and the $\overline{OE}$ input is logic-LOW.

Figure 6.5 (a) Pin-out configuration of CY7C185 SRAM

Figure 6.5 (b) Logic block diagram of CY7C185 SRAM

There is an ongoing tendency to replace wide but low-speed bit-parallel data lines with fewer but faster bit-serial data lines. Because an optical link possesses much higher capacity than an electrical wire, the advantage of the bit-serial approach in effectively utilizing the available bandwidth becomes even more evident. Thus, in prototype U1, an optical bit-serial link was implemented on the optical centralized shared bus to function as the data channel between the microprocessor and the external memory boards, as shown by the connectivity diagram in Figure 6.4. To simplify the hardware, a clock-forwarding scheme was employed in prototype U1 without involving complicated clock-recovery circuits. In this approach, the same clock signal was used for both the serialization at the transmitter end and the deserialization at the receiver end.
The major concern of such a clock-forwarding scheme is to correctly align the data boundary with the clock cycle in spite of the fact that the logic devices in the real circuit may accumulate a non-negligible and chip-dependent delay. For this purpose, a unique link-layer protocol was devised and embedded in the hardware of the constructed demonstration prototype. The principle of this protocol is similar to the UART (Universal Asynchronous Receiver/Transmitter) protocol [74] that is commonly used to control a computer's interface to its attached serial devices. In the idle state, the optical bit-serial link presents logic-LOW. To transmit a data byte, as illustrated in Figure 6.6, the serializer shifts the eight parallel input bits out one by one and attaches a logic-HIGH bit as the header of the output serial data packet. This header bit signifies the beginning boundary of the actual data bits. Because all data packets have the same length of nine bits, the deserializer can determine the ending boundary of the received packet upon detecting its leading logic-HIGH bit. As illustrated in Figure 6.7, the deserializer stops shifting after exactly eight clock cycles, when the decoded header bit blocks further clocking, and thus the eight parallel bits deserialized from the received data packet are stabilized on the output pins. In this manner, the clocking at the receiver end can be self-consistently synchronized with the incoming data packets so that the deserialization can be properly performed. To ensure that the implemented logic modules function as devised, the header-decoding latency in the real circuit, which is mainly caused by the gate delays of the logic chips in use, should be well below one clock cycle. Thus, special attention must be paid to the selection of the logic chips to be used in practical hardware implementations.
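The framing rule of this link-layer protocol can be sketched in software. The model below is a behavioral illustration only, not the hardware of Figures 6.6 and 6.7. The function names are hypothetical, one list element stands for one clock cycle on the serial link, and the data bits are assumed to be shifted out least-significant bit first, an ordering chosen because it reproduces the serial pattern '101101010' reported for the byte 0x56 in Section 6.3.

```python
def serialize(byte):
    """Return the nine-bit packet in time order: header '1', then data.

    Assumption: data bits are sent least-significant bit first.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return [1] + [(byte >> i) & 1 for i in range(8)]


def deserialize(line_bits):
    """Recover bytes from a sampled serial line that idles at logic-LOW.

    A logic-HIGH sample seen while idle is treated as a header; the next
    eight samples are taken as data, after which the receiver is idle again.
    """
    received = []
    i = 0
    while i < len(line_bits):
        if line_bits[i] == 0:  # idle state, keep waiting for a header
            i += 1
            continue
        payload = line_bits[i + 1:i + 9]
        if len(payload) < 8:  # truncated packet at the end of the capture
            break
        received.append(sum(bit << k for k, bit in enumerate(payload)))
        i += 9  # fixed packet length: one header bit plus eight data bits
    return received


# Two packets separated by idle time on the link (illustrative capture).
line = [0, 0] + serialize(0x56) + [0, 0, 0] + serialize(0xB2) + [0]
print("".join(str(b) for b in serialize(0x56)))
print([hex(b) for b in deserialize(line)])
```

Because the packet length is fixed, the receiver in this sketch needs no stop bit or length field. The header alone marks where the eight data bits begin, which mirrors the role of the header-decoding logic described above.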

Figure 6.6 Schematic of serializer (8:1) with header encoding

Figure 6.7 Schematic of deserializer (1:8) with header decoding

### 6.3 Microprocessor-to-Memory Interconnect Demonstration

The microprocessor-to-memory interconnects were successfully demonstrated by implementing the optical centralized shared bus architecture in the research demonstrator shown in Figure 6.1. This demonstration prototype consists of one ADAPT812 board as shown in Figure 6.3, four external memory boards, one customized five-slot PCB with the electro-optical interface modules assembled on its backside as shown in Figure 6.8, and one physically verified optical centralized shared bus as exhibited in Figure 3.8. The microprocessor board was inserted into the central slot (#C) as the distributor, and the memory boards were respectively plugged into slots #A1, #A2, #B1, and #B2. The data transfers between the microprocessor and the external memory boards were carried out in the implemented optical interconnect layer as shown by the connectivity diagram in Figure 6.4.

Figure 6.8 Five-slot PCB with electro-optical interface modules

To operate prototype U1, special test code was developed and downloaded into the microprocessor through its BDM interface from the host computer. In one test operation, the microprocessor was instructed to write a data byte to the selected memory location, and then read it back from that memory location. At the end of this operation, the retrieved data byte was checked bit by bit against the predefined data byte and found to be identical. This proved that the data byte was successfully transferred on the optical centralized shared bus during the execution. In another test operation, prototype U1 was programmed to run in an infinite loop in which the microprocessor was instructed to write a data byte to the selected memory location, and then read it back from that memory location [38].
With prototype U1 executing, the serial data patterns on the optical centralized shared bus were measured at the monitor ports shown in Figure 6.1, which indicated the modulation currents of their corresponding VCSELs. By simultaneously displaying the measured serial data patterns on the screen of an oscilloscope, this special test visualized the implemented optical connectivity in prototype U1. Figure 6.9 (a) shows the experimental result when the hexadecimal data '0x56' was transferred between the microprocessor and memory board A1. Along with its leading synchronization bit, the serial data pattern of this packet presented on the optical centralized shared bus should be '101101010' in time-increasing order. The correct serial data pattern was observed as shown in Figure 6.9 (a), where channels 1 and 2 displayed, respectively, the VCSEL modulation currents measured at monitor ports C and A1. Figure 6.9 (b) shows the experimental result when the hexadecimal data '0xB2' was transferred between the microprocessor and memory board B2. The correct serial data pattern of this data, '101001101' in time-increasing order, was also observed as shown in Figure 6.9 (b), where channels 1 and 2 displayed, respectively, the VCSEL modulation currents measured at monitor ports C and B2. With the same approach, the correct serial data patterns of the packets transferred between the microprocessor and memory boards A2 and B1 were also observed on the optical centralized shared bus. Therefore, the required optical connectivity for the microprocessor-to-memory interconnects in the implemented demonstration prototype was completely verified by these experimental tests.
Figure 6.9 (a) Optical connectivity verification between microprocessor and memory A1 with a hexadecimal value of '0x56'

Figure 6.9 (b) Optical connectivity verification between microprocessor and memory B2 with a hexadecimal value of '0xB2'

#### 6.4 Discussions

The Motorola 68HC812A4 microprocessor was developed for use as the MCU in various embedded applications, not as the engine in high-performance computing (HPC) systems or advanced signal-processing machines. Consequently, the speed capability of the microprocessor in use limits the overall performance of the implemented research demonstrator. Prototype U1 ran at 2 MHz, and the optical serial data link operated at 16 MHz. At first glance, these figures seem so modest that the use of optical interconnects might appear superfluous. As discussed in Chapter 5, the advantages of optical interconnects and the merits of the optical centralized shared bus architecture are prominent in applications where conventional electrical backplanes cannot provide the required enormous bandwidth capacity. From the proof-of-concept standpoint, however, the innovatively designed optical interconnect architecture was practically implemented and successfully applied to fulfill the critical microprocessor-to-memory interconnects in real operations. As pointed out in Chapter 2 and verified in Chapter 3, this innovative optical interconnect architecture possesses a substantial merit, equalized bus fan-outs across the entire optical interconnect layer, which is intrinsically impossible in any previously proposed optical shared bus architecture. This feature is highly desirable from the system integration point of view. The extrinsic deficit in the implemented speed should not overshadow the significance of this demonstration, which essentially proves the practical feasibility of the optical centralized shared bus architecture.
#### 6.5 Summary

In this chapter, as a preliminary proof-of-concept demonstration, the optical centralized shared bus architecture was applied to fulfill the critical microprocessor-to-memory interconnects in prototype U1. The required optical connectivity for the microprocessor-to-memory interconnects was completely verified by executing several specially designed test operations on the constructed research demonstrator.

## **Chapter 7 Centralized Shared-Memory System Demonstration**

#### 7.1 Overview of Multiprocessor Programming Models

As multiprocessing attracts more interest, the demand for high-performance board-to-board interconnects becomes even more critical. One significant challenge in the design of a multiprocessing system is to effectively provide the communications among several parallel processes distributed among multiple microprocessors. As discussed in Chapter 5, various topologies distinguished by their different architectural properties have been designed for connecting multiple microprocessors, memories, and/or switches [9]. In turn, applying the optical centralized shared bus architecture to different multiprocessing schemes will lead to a large discrepancy in how effectively the obtained bandwidth gain improves the overall multiprocessing performance.

Figure 7.1 Centralized shared-memory multiprocessing model

In the centralized shared-memory multiprocessing model, as illustrated in Figure 7.1, multiple microprocessors share a single physical memory through a shared bus. Because the centralized main memory has a uniform access time from each microprocessor, such a multiprocessing scheme is sometimes called UMA, for uniform memory access [9]. The term shared-memory refers to the fact that the address space is shared, i.e., the same physical address on different microprocessors refers to the same location in the main memory. This shared address space can be used to communicate data implicitly via data load and store operations.
The advantages of shared-memory communications include:

- Ease of programming when the communication patterns among the microprocessors are complex or vary dynamically during execution. Also, this advantage simplifies the compiler design.
- Lower communication overhead and better use of the available bandwidth. This arises from the implicit nature of sharing the information and the use of memory mapping to implement protection in hardware rather than through the operating system.
- Capability of automatic caching of all data, both shared and private. Caching provides both decreased latency and reduced contention for accessing the shared data, and thus the frequency of remote communications can be minimized.

Caching is a widely applied technique to improve system performance by taking advantage of locality. The centralized shared-memory multiprocessing model supports the caching of both shared and private data. The private data are used only by a single microprocessor, while the shared data are accessible to multiple microprocessors. The communications among the microprocessors are essentially carried out through reads and writes of the shared data. When a private item is cached, its location is migrated to the cache, reducing the average access time as well as the memory bandwidth required. Since no other microprocessors use the private data, the program behavior is identical to that in a uniprocessor. When a shared item is cached, the shared value may be replicated in multiple caches. In addition to the reduction in the access latency and required memory bandwidth, this replication also decreases the contention that may exist for the shared items that are being accessed by multiple microprocessors at the same time. To ensure the consistency of shared-memory communications, all caches must be kept coherent.
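One common way to keep caches coherent on a shared bus is for every cache to watch the bus and invalidate its own copy when another agent writes the same address. The following is a minimal sketch of that idea, assuming a write-through, write-invalidate policy; the class names `SnoopingCache` and `SharedBus` are hypothetical and do not describe any hardware built in this work.

```python
class SnoopingCache:
    """A tiny cache that invalidates its copy when it snoops a foreign write."""

    def __init__(self, name):
        self.name = name
        self.lines = {}  # address -> cached value

    def snoop_write(self, address):
        # Another bus agent wrote this address: drop the stale copy, if any.
        self.lines.pop(address, None)


class SharedBus:
    """Shared bus that makes every write visible to all attached caches."""

    def __init__(self, memory):
        self.memory = memory  # address -> value (centralized main memory)
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)
        return cache

    def read(self, cache, address):
        # On a miss, fetch from main memory and keep a local copy.
        if address not in cache.lines:
            cache.lines[address] = self.memory[address]
        return cache.lines[address]

    def write(self, writer, address, value):
        # Write-through: update memory and the writer's own copy.
        self.memory[address] = value
        writer.lines[address] = value
        # Every other cache snoops the write and invalidates its copy.
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(address)


bus = SharedBus(memory={0x10: 7})
cpu_a = bus.attach(SnoopingCache("A"))
cpu_b = bus.attach(SnoopingCache("B"))

bus.read(cpu_a, 0x10)      # both processors cache the shared item
bus.read(cpu_b, 0x10)
bus.write(cpu_a, 0x10, 9)  # A writes; B's copy is invalidated
stale_copy_dropped = 0x10 not in cpu_b.lines
value_seen_by_b = bus.read(cpu_b, 0x10)  # B misses and refetches
print(stale_copy_dropped, value_seen_by_b)  # True 9
```

The sketch relies on the broadcast property of the shared bus: a write is seen by all caches in a single bus operation, without any per-cache messaging.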
By snooping on the shared bus, which carries all actual data exchanges, all cache controllers can simultaneously monitor every memory access in real time and quickly determine whether or not they have a cached copy of the item being transferred. Accordingly, the cached copy may be invalidated or updated with the detected new value. In this manner, cache coherence can be effectively maintained [9]. As expected, in the electrical domain, the bottleneck of the centralized shared-memory multiprocessing model mainly originates from the restricted bandwidth capacity of the shared bus.

### 7.2 Centralized Shared-Memory System on Optical Centralized Shared Bus

As discussed in Chapter 5, from the topology point of view, the optical centralized shared bus architecture fits well with the centralized shared-memory multiprocessing model. The shared memory is a natural distributor on the shared bus, and thus the incorporation of an optical centralized shared bus can be carried out in a seamless manner. Meanwhile, the bandwidth capacity of the shared bus, which carries all actual data transfers, can be substantially enhanced. Therefore, it is projected that the obtained bandwidth gain would be maximized for the improvement of the overall multiprocessing performance.

#### 7.3 PCI over Optical Centralized Shared Bus Architecture

To fully demonstrate the conceptual feasibility of the innovatively designed optical centralized shared bus architecture in multiprocessing environments, it would be appealing to instantiate it directly in a real centralized shared-memory multiprocessing system. However, such a project would be peculiar to a specific microprocessor and would be too involved to fit in a doctoral dissertation.
Therefore, as only a preliminary effort, the optical centralized shared bus implemented in Chapter 3 was interfaced to a generic PCI subsystem to provide optical data transfers in the PCI format, and the centralized shared-memory multiprocessing scheme was partially emulated on this optically equipped PCI subsystem.

PCI stands for Peripheral Component Interconnect. It was introduced in 1992 with the objective of ensuring fast communications between the peripheral devices and the microprocessor. A comprehensive explanation of PCI system architecture can be found in Reference [75]. PCI defines a local bus architecture that is not specific to any particular processor. As illustrated in Figure 7.2, the host PCI Bridge, which is also frequently referred to as the North Bridge, connects one or multiple microprocessors to the root PCI bus. In addition, a chipset may support more than one North Bridge. The use of the North Bridge isolates the general-purpose PCI local bus from the specific processor bus. The PCI bus can be populated with the adapters and add-in cards that require fast accesses to each other and/or main memory at a rate approaching the full processor bus speed.

Figure 7.2 PCI subsystem diagram

Similar to the Direct Memory Access (DMA) mode on an ISA bus, the PCI data transfers can proceed without invoking any CPU action. There are two participants in every PCI data transfer phase: the master, also called the initiator, and the target. The master is the device that initiates the data transfer, and the target is the device that is addressed by the master for the purpose of performing the data transfer. It is very important to note that the PCI data transfers can be accomplished using burst transfers. A burst transfer consists of a single address phase followed by two or more data phases, and the master has to arbitrate for the bus ownership only once for the whole block of data to be transferred.
Thus, the overhead is largely reduced, and the available local bus bandwidth can be fully utilized. During the address phase, the start address and transaction type are issued in a broadcast manner. The target device latches the start address into an address counter, claims the transaction, and is responsible for incrementing the address from one data phase to the next. As the master becomes ready to transfer each data item, it informs the target whether or not it is the last one, and the entire PCI burst transaction completes when the final data item has been transferred.

Figure 7.3 PCI data transfer timing diagram

Figure 7.3 is the timing diagram of a typical PCI burst transfer captured in real time by a PCI bus analyzer card [76]. From this timing diagram, the basic PCI protocol can be easily explained. When the FRAME\_ signal is first asserted, the master broadcasts its intention of initiating a transaction on the shared bus. During the first PCI clock cycle, which is defined as the address phase, the master announces on the shared bus the start address of the data and the command that indicates the transaction type. All PCI devices latch the address and command on the rising edge of the second PCI clock and begin the decoding process. The currently addressed target claims the transaction by asserting the DEVSEL\_ signal. In the burst transfer mode, the address phase is followed by multiple data phases. By asserting the IRDY\_ signal, the master indicates its readiness for the current data phase. By asserting the TRDY\_ signal, the target indicates its readiness for the current data phase. The current data phase completes when both IRDY\_ and TRDY\_ are sampled asserted. Otherwise, wait states are inserted into the current data phase until both IRDY\_ and TRDY\_ are sampled asserted. The master indicates the final data phase by deasserting the FRAME\_ signal.
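The IRDY\_/TRDY\_ handshake described above can be sketched as a per-clock check. This is a simplified illustration only: the booleans mean "sampled asserted" (the real signals are active-low), the address phase and FRAME\_ are not modeled, and the function name `run_data_phases` is hypothetical.

```python
def run_data_phases(irdy_asserted, trdy_asserted):
    """Return the clocks on which a data item transfers, plus the wait-state count.

    Each list holds one boolean per PCI clock after the address phase:
    True means the signal was sampled asserted on that clock.
    """
    if len(irdy_asserted) != len(trdy_asserted):
        raise ValueError("both signals need one sample per clock")
    transfer_clocks = []
    wait_states = 0
    for clock, (irdy, trdy) in enumerate(zip(irdy_asserted, trdy_asserted)):
        if irdy and trdy:
            # Master and target are both ready: this data phase completes.
            transfer_clocks.append(clock)
        else:
            # At least one side is not ready: a wait state is inserted.
            wait_states += 1
    return transfer_clocks, wait_states


irdy = [True, True, True, True, True]
trdy = [False, True, True, False, True]
transfers, waits = run_data_phases(irdy, trdy)
print(transfers, waits)  # [1, 2, 4] 2
```

In this example the target is slow on two clocks, so three data items take five clocks to transfer.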
When the overall burst transfer has been completed, the target deasserts the TRDY\_ and DEVSEL\_ signals, and the master deasserts the IRDY\_ signal, returning the bus to the idle state. The REQ64\_ signal, which has the same timing and duration as the FRAME\_ signal, and the ACK64\_ signal, which has the same timing and duration as the DEVSEL\_ signal, are the 64-bit PCI extensions.

The electrical PCI bus line handles bi-directional signal transmissions. Meanwhile, as stated above, the PCI protocol indicates the actual data transfer direction in an implicit manner through several PCI control signals. In contrast, a transmitter and receiver pair establishes only a one-way signal link in the optical domain. In order to bridge this gap, a customized electro-optical transceiver, as illustrated in Figure 7.4 (a), was equipped with a PCI-interpreting logic circuit, as shown in Figure 7.4 (b). By integrating such an electro-optical module with each bi-directional PCI I/O, the optical centralized shared bus architecture can be fully interfaced to a generic PCI subsystem. The essential function of the logic circuit in Figure 7.4 (b) is to interpret the pertinent PCI control signals presented on the shared bus, determine the data transfer direction relative to the PCI I/O it is integrated with, and generate the right control signals, TACTIVE and RACTIVE, which coordinate the operations of the transmitter, receiver, and tri-state buffers within the electro-optical module as shown in Figure 7.4 (a). Besides the FRAME, IRDY, and TRDY signals, the logic circuit also acquires the logic states of the GNT and CBE0 signals. The GNT signal, which is a device-specific signal rather than a bus signal, indicates the ownership of the shared bus at the beginning of the transaction and may change before the overall transaction is completed.
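A rough behavioral sketch of this direction decision follows. It is not the circuit of Figure 7.4 (b). It assumes that GNT\_ and CBE0 are latched once during the address phase, that GNT\_ '0' marks the master end and CBE0 '1' marks a write (as in the captions of Figure 7.5), and that TACTIVE and RACTIVE are complementary during data phases. The class name `DirectionLogic` and the helper `controls` are hypothetical.

```python
from typing import Tuple


class DirectionLogic:
    """Latches the bus role and transaction type, then derives the link direction."""

    def __init__(self) -> None:
        self.is_master = False
        self.is_write = False

    def latch_address_phase(self, gnt_n: int, cbe0: int) -> None:
        # Assumed encoding: GNT_ is active-low, so 0 means this device owns the bus.
        self.is_master = (gnt_n == 0)
        # Assumed encoding: CBE0 is 1 for a write and 0 for a read.
        self.is_write = (cbe0 == 1)

    def data_phase_controls(self) -> Tuple[bool, bool]:
        """Return (TACTIVE, RACTIVE) for the data phases of the transaction."""
        # Data flows from the master on a write and from the target on a read.
        drives_data = (self.is_master and self.is_write) or (
            not self.is_master and not self.is_write
        )
        return drives_data, not drives_data


def controls(gnt_n: int, cbe0: int) -> Tuple[bool, bool]:
    logic = DirectionLogic()
    logic.latch_address_phase(gnt_n, cbe0)
    return logic.data_phase_controls()


for label, gnt_n, cbe0 in [
    ("write at master", 0, 1),
    ("read at master", 0, 0),
    ("write at target", 1, 1),
    ("read at target", 1, 0),
]:
    print(label, controls(gnt_n, cbe0))
```

Latching the two inputs once, rather than reading them continuously, reflects the point made in the text that both signals are only meaningful for this purpose at the start of the transaction.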
During the address phase, the CBE0 signal indicates the transaction type, either '0' for read or '1' for write, whereas it has a completely different meaning during the following data phases. Thus, the logic states of the GNT and CBE0 signals must be acquired only during the address phase, which explains the use of the FRAME signal as the triggering clock to the flip-flop that latches the logic states of the GNT and CBE0 signals, as shown in Figure 7.4 (b). Figures 7.5 (a), (b), (c), and (d) show the simulated control signals, TACTIVE and RACTIVE, generated by the logic circuit in the cases of, respectively, PCI data write at the master end, PCI data read at the master end, PCI data write at the target end, and PCI data read at the target end.

Figure 7.4 (a) PCI electro-optical interface module

Figure 7.4 (b) PCI electro-optical interface logic circuit

Figure 7.5 (a) PCI data write at master (GNT\_ '0' and CBE0 '1')

Figure 7.5 (b) PCI data read at master (GNT\_ '0' and CBE0 '0')

Figure 7.5 (c) PCI data write at target (GNT\_ '1' and CBE0 '1')

Figure 7.5 (d) PCI data read at target (GNT\_ '1' and CBE0 '0')

### 7.4 Centralized Shared-Memory System Demonstration

Figure 7.6 Configuration of prototype E1

The centralized shared-memory multiprocessing scheme was emulated upon the constructed PCI subsystem partially equipped with a fabricated optical centralized shared bus. This research emulator is referred to as prototype E1 in this dissertation. As shown in Figure 7.6, the electrical part of prototype E1 consists of a passive PCI backplane [77], a 1.2 GHz Intel Tualatin CPU card [78], a 128 MB PCI memory card [79], and a Gigabit Ethernet network interface card (NIC). The CPU card, PCI memory card, and NIC reside on the passive PCI backplane along with a PCI bus analyzer card [76], which captures all bus activities and displays them on a logic analyzer in real time.
The SDRAM chips on the CPU card function as the private memory in the real centralized shared-memory multiprocessing scheme. The PCI memory card emulates the centralized shared memory. The NIC is directly connected to another workstation's NIC through an RJ-45 crossover cable. In this way, this card can function as an asynchronous data agent, emulating one more processor besides the real one on the CPU card. The CPU card, PCI memory card, and NIC are capable of requesting the ownership of the PCI subsystem and then initiating data transactions as the master, which makes it possible to emulate the centralized shared-memory multiprocessing scheme on this PCI subsystem. The schematic in Figure 7.7 illustrates the configuration of the PCI subsystem that emulates the centralized shared-memory multiprocessing scheme.

Figure 7.7 Emulation of centralized shared-memory multiprocessing on PCI

The operating system (OS) of prototype E1 is RedHat Linux 7.3 with kernel 2.4.19 [80]. The Linux kernel deals with all devices in a uniform way and accesses them through the same interface, and a device driver can be dynamically loaded into or removed from the running kernel without rebooting the system. The modules in the running kernel during the demonstration on prototype E1 are listed in Figure 7.8. The e1000 module is the device driver for the Gigabit Ethernet NIC, and the umem module is the device driver for the PCI memory card. After a file system is mounted on the PCI memory card through the umem module, the PCI memory card is treated as a RAM disk from the OS point of view, as shown in Figure 7.9. Both the CPU card and NIC can access this RAM disk via the same file system interface. On the PCI bus, the data transactions emulate the communications in the centralized shared-memory multiprocessing scheme.
Figure 7.8 Running kernel modules on prototype E1

Figure 7.9 PCI memory card as RAM disk

As a preliminary effort, only PCI bus line AD02 was replaced by the optical interconnection link, while the other metal wires on the passive PCI backplane were retained. Figure 7.10 shows the real-time signal waveforms that were captured on PCI bus line AD02 during a series of PCI burst transfers. Channel 1 displays the modulation current of the VCSEL at the transmitter side. Channel 2 displays the output AD02 signal at the receiver side. The result verifies the correct optical connectivity replacing the original PCI bus line.

Figure 7.10 Optical interconnection link of PCI bus line AD02

#### 7.5 Discussions

Prototype E1 is only at a very early development stage, and more work needs to be done before it is fully functional. The PCI subsystem was only partially equipped with the fabricated optical centralized shared bus. Meanwhile, the PCI subsystem ran at 33 MHz. At such a low speed, the use of optical interconnects seems superfluous. Furthermore, the constructed demonstrator is at most an emulator rather than a real centralized shared-memory multiprocessing system. Putting these drawbacks aside, the significance of this practical demonstration is that it takes the first step towards verifying the conceptual feasibility of the optical centralized shared bus architecture in multiprocessing systems. Also, since PCI system architecture is not peculiar to any specific microprocessor, the construction of prototype E1 can focus only on the architectural aspects of interest without involving the complicated processor-dependent implementation issues, and the generic demonstrations on prototype E1 will not simply become obsolete as advanced microprocessors are upgraded at a rapid rate from one generation to the next.
## 7.6 Summary

In this chapter, as a preliminary effort, the centralized shared-memory multiprocessing scheme was emulated upon a constructed PCI subsystem that was partially equipped with a fabricated optical centralized shared bus.

# **Chapter 8 Summary and Recommendations for Future Work**

#### 8.1 Summary

Optics is distinguished for its interconnect capability. The most significant benefit of utilizing optical interconnects is the tremendous gain in the bandwidth capacity. To meet the ever-increasing demand for bandwidth, a variety of optical interconnect technologies have been successfully employed in real applications where the approaches that are exclusively based on electrical interconnects have become insufficient, and the boundary demarcating the electrical and optical domains is being pushed further down in the interconnect hierarchy. An imminent bottleneck at the board-to-board hierarchical level is projected, mainly due to the physical limitations of electrical interconnects. Accordingly, an opportunity exists for the continuing exploitation of optics to complement or even replace the conventional electrical backplanes. This dissertation is dedicated to the investigation of how to effectively utilize optical interconnects to expedite the data transfers among all daughter boards within a box.

In Chapter 1, two basic backplane topologies, shared bus and switched medium, and three fundamental optical methodologies at the board-to-board hierarchical level, optical waveguide interconnects, free-space optical interconnects, and substrate-guided optical interconnects, were overviewed. The approaches that are based on optical waveguide or free-space interconnects provide only the point-to-point topology; in turn, the various proposed architectures are essentially optical point-to-point switched backplanes.
In contrast, the approaches that are based on substrate-guided optical interconnects can effectively fulfill the shared bus topology, and thus an optical backplane bus can be implemented. The comparative examinations specifically pointed out that the optical backplane bus has many considerable advantages over the optical point-to-point switched backplane. Since the major bottleneck of the electrical backplane bus originates from the limited bandwidth capacity, an innovative technology in the optical domain that can provide sufficient bandwidth capacity while at the same time retaining the essential merits of the shared bus topology is highly desirable. As a result, the substrate-guided optical interconnect methodology was singled out because of its potential for creating an optical backplane bus.

In Chapter 2, a new optical interconnect architecture, optical centralized shared bus, was introduced based on substrate-guided optical interconnects. This unique optical backplane architecture effectively fulfills both broadcastability and bi-directionality of signal flows on the shared bus. The most attractive feature of this innovation is its ability to achieve equalized bus fan-outs on the shared media. This merit can considerably save the overall power budget, and thus is highly desirable from the system integration point of view.

In Chapter 3, a systematic approach was employed to develop the optimal recording schedules, and high-quality waveguide holograms were fabricated as specified by the optical centralized shared bus architecture. The equalized bus fan-outs were successfully demonstrated across the entire implemented optical interconnect layer.

In Chapter 4, the indispensable electro-optical interface modules were implemented, and their high-speed performance was characterized by using eye diagrams up to a data rate of 1.25 Gbps.
In Chapter 5, the strategy for verifying the conceptual feasibility of the optical centralized shared bus architecture was addressed for uniprocessing and multiprocessing systems, respectively.

In Chapter 6, the optical centralized shared bus architecture was applied to fulfill the critical microprocessor-to-memory interconnects, and the required optical connectivity for the microprocessor-to-memory interconnects was completely verified by executing several specially developed test operations on the constructed research demonstrator.

In Chapter 7, the centralized shared-memory multiprocessing scheme was emulated upon a constructed PCI subsystem that was partially equipped with a fabricated optical centralized shared bus.

#### 8.2 Recommendations for Future Work

There are certainly many open issues that need to be addressed before optical interconnects see success at the board-to-board hierarchical level, but only one will be briefly addressed here. Another topological deficit of the shared bus topology originates from the manner of medium access control (MAC). Only one daughter board can deliver data on the shared medium at a time, and thus the shared bus topology possesses little parallelism. The basic architectural design illustrated in Figure 2.4 may be extended in certain ways to address this issue. As one possible approach, a bit-interleaved scheme may be utilized, as illustrated in Figure 8.1.

Figure 8.1 Full-access bit-interleaved optical backplane

Different from Figure 2.4, in the central module herein one photodiode is exclusively dedicated to each regular daughter board on the backplane in a point-to-point fashion. Thus, there are N photodiodes in the central module if the total number of the regular daughter boards is N, which is the minimum complexity required to enable full accessibility to the central distributor.
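In a bit-interleaved scheme of this kind, the distributor can merge the N incoming bit streams by taking one bit from each board in round-robin order and broadcasting the merged stream. The following minimal sketch illustrates that idea. The function names `interleave` and `sample_board`, the string representation of bit streams, and the fixed board ordering are assumptions made for illustration only.

```python
def interleave(streams):
    """Merge equal-length bit strings by taking one bit from each in round-robin order."""
    if not streams:
        return ""
    length = len(streams[0])
    if any(len(stream) != length for stream in streams):
        raise ValueError("all boards must deliver the same number of bits")
    # For each bit position k, emit bit k of board 0, board 1, ..., board N-1.
    return "".join(stream[k] for k in range(length) for stream in streams)


def sample_board(interleaved, board, n_boards):
    """Recover one board's stream by sampling every n_boards-th bit."""
    return interleaved[board::n_boards]


streams = ["1010", "1100", "0011"]  # bits delivered by three daughter boards
merged = interleave(streams)
print(merged)                      # 110010101001
print(sample_board(merged, 1, 3))  # 1100
```

A receiver that samples the merged stream at the full interleaving rate sees the bits from every board, while a receiver that samples one slot per group, as `sample_board` does, sees a single board's stream.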
Because there is a central photodiode exclusively dedicated to each regular daughter board on the backplane in a point-to-point fashion, this backplane, unlike the conventional shared bus, is fully accessible, i.e., any board can deliver data onto the backplane at any time. Therefore, no daughter board wastes time waiting for the delivery bus to return to the idle state. This merit can substantially increase the aggregate bandwidth to a degree comparable with the switched backplane. Meanwhile, these dedicated optical access links do not experience any unnecessary bus fan-outs, considerably saving the system power budget.

Also different from Figure 2.4, the distributor has a bit interleaver embedded on it. At the proper interleaving rate, the bit interleaver assembles a "super bit" by taking a bit in a round-robin manner from the bit sequence delivered by each daughter board, and then this bit-interleaved data stream is transferred to all regular daughter boards on the backplane in a broadcast fashion. Distinguished by the bit sampler right after the receiver module, there are two types of daughter boards. A daughter board that is intended to be aware of the multicast/broadcast actions on the backplane must be equipped with a bit sampler that samples the incoming optical signals at the interleaving rate. On the other hand, a daughter board that does not concern itself with any multicast/broadcast actions on the backplane may simply sample the incoming optical signals at its own bit rate. With such a bit-interleaving scheme along with the shared bus topology, the enormous bandwidth capacity of optical interconnects is effectively utilized to accommodate multiple data transactions simultaneously without competing for the shared media. Meanwhile, the effectiveness in broadcast, which is the essential merit of the shared bus topology, is retained in this architecture.

### **Bibliography**

- [1] Ronald A. Nordin, A. F. J. Levi, Richard N. Nottenburg, J. O'Gorman, T.
Tanbun-Ek, and Ralph A. Logan, "A system perspective on digital interconnection technology," *IEEE Journal of Lightwave Technology*, vol. 10, no. 6, pp. 811-827, June 1992. - [2] Ronald A. Nordin, William R. Holland, and Muhammed A. Shahid, "Advanced optical interconnection technology in switching equipment," *IEEE Journal of Lightwave Technology*, vol. 13, no. 6, pp. 987-994, June 1995. - [3] Cray-Dell PowerEdge Xeon Cluster (LoneStar), Texas Advanced Computing Center (TACC), http://www.tacc.utexas.edu - [4] NX64000 Switch/Router, Lucent Technologies. - [5] A. F. J. Levi, "Optical interconnects in systems," *Proceedings of the IEEE*, vol. 88, no. 6, pp. 750-757, June 2000. - [6] International Technology Roadmap for Semiconductors (ITRS), 2001 Edition, http://public.itrs.net/Files/2001ITRS/Home.htm - [7] Joseph W. Goodman, "Computer optical interconnects," *Conference Digest of LEOS Summer Topical on Optical Multiple Access Networks*, pp. 68, July 1990. - [8] Holger Karstensen, Christian Hanke, Martin Honsberg, Jörg-Reinhardt Kropp, Jörg Wieland, Markus Blaser, Peter Weger, and Joseph Popp, "Parallel optical interconnection for uncoded data transmission with 1 Gb/s-per-channel capacity, high dynamic range, and low power consumption," *IEEE Journal of Lightwave Technology*, vol. 13, no. 6, pp. 1017-1030, June 1995. - [9] John L. Hennessy, David A. Patterson, "Computer architecture: a quantitative approach," Second Edition, Chapter 8, Morgan Kaufmann Publishers, August 1995. - [10] Larry L. Peterson, Bruce S. Davie, "Computer networks: a systems approach," Second Edition, Chapter 4, Morgan Kaufmann Publishers, October 1999. - [11] "The Intel Pentium 4 processor product overview," Intel. 
- [12] 10 Gigabit Ethernet Alliance, http://www.10gea.org - [13] Fiber Channel Industry Association (FCIA), http://www.fibrechannel.org - [14] RapidIO Trade Association, http://www.rapidio.org - [15] InfiniBand Trade Association, http://www.infinibandta.org - [16] PCI Special Interest Group (PCI-SIG), http://www.pcisig.com - [17] Dawei Huang, Theresa Sze, Anders Landin, Rick Lytel, and Howard L. Davidson, "Optical interconnects: out of the box forever?" *IEEE Journal on Selected Topics in Quantum Electronics*, vol. 9, no. 2, pp. 614-623, March/April 2003. - [18] Joseph W. Goodman, Frederick J. Leonberger, Sun-Yuan Kung, and Ravindra A. Athale, "Optical interconnections for VLSI systems," *Proceedings of the IEEE*, vol. 72, no. 7, pp. 850-866, July 1984. - [19] Michael R. Feldman, Sadik C. Esener, Clark C. Guest, and Sing H. Lee, "Comparison between optical and electrical interconnects based on power and speed characteristics," *Applied Optics*, vol. 27, no. 9, pp. 1742-1751, May 1988. - [20] Thomas J. Cloonan, "Comparative study of optical and electronic interconnection technologies for large asynchronous transfer mode packet switching applications," *Optical Engineering*, vol. 33, no. 5, pp. 1512-1523, May 1994. - [21] Jaemin Shin, Chung-Seok Seo, Ananthasayanam Chellappa, Martin Brooke, Abhijit Chatterjee, and Nan M. Jokerst, "Comparison of electrical and optical interconnect," *Proceedings of the 53rd Electronic Components and Technology Conference*, pp. 1067-1072, May 2003. - [22] Gicherl Kim, Ray T. Chen, "Three-dimensionally interconnected bidirectional optical backplane," *IEEE Photonics Technology Letters*, vol. 11, no. 7, pp. 880-882, July 1999. - [23] Efstathios D. Kyriakis-Bitzaros, Nikos Haralabidis, M. Lagadas, Alexandros Georgakilas, Y. 
Moisiadis, and George Halkias, "Realistic end-to-end simulation of the optoelectronic links and comparison with the electrical interconnections for system-on-chip applications," *IEEE Journal of Lightwave Technology*, vol. 19, no. 10, pp. 1532-1542, October 2001. - [24] Hideo Kosaka, Kaori Kurihara, Atsuko Uemura, Takashi Yoshikawa, Ichiro Ogura, Takahiro Numai, Mitsunori Sugimoto, and Kenichi Kasahara, "Uniform characteristics with low threshold and high efficiency for a single-transverse-mode vertical-cavity surface-emitting laser-type device array," *IEEE Photonics Technology Letters*, vol. 6, no. 3, pp. 323-325, March 1994. - [25] S. Eitel, S. J. Fancey, H. P. Gauggel, K. H. Gulden, W. Bächtold, and M. R. Taghizadeh, "Highly uniform vertical-cavity surface-emitting lasers integrated with microlens arrays," *IEEE Photonics Technology Letters*, vol. 12, no. 5, pp. 459-461, May 2000. - [26] Thomas C. Banwell, Ann C. Von Lehmen, and Robert R. Cordell, "VCSE laser transmitters for parallel data links," *IEEE Journal of Quantum Electronics*, vol. 29, no. 2, pp. 635-644, February 1993. - [27] Michael Lebby, Craig A. Gaw, Wenbin Jiang, P. A. Kiely, Chan Long Shieh, P. R. Claisse, Jamal Ramdani, Davis H. Hartman, Daniel B. Schwartz, and Jerry Grula, "Characteristics of VCSEL arrays for parallel optical interconnects," *Proceedings of the 46th Electronic Components and Technology Conference*, pp. 279-291, May 1996. - [28] Louay Eldada, Chengzeng Xu, Kelly M. T. Stengel, Lawrence W. Shacklette, and James T. Yardley, "Laser-fabricated low-loss single-mode raised-rib waveguiding devices in polymers," *IEEE Journal of Lightwave Technology*, vol. 14, no. 7, pp. 1704-1713, July 1996. - [29] Takashi Sakamoto, Hiroyuki Tsuda, Makoto Hikita, Toshiaki Kagawa, Kouta Tateno, and Chikara Amano, "Optical interconnection using VCSELs and polymeric waveguide circuits," *IEEE Journal of Lightwave Technology*, vol. 18, no. 11, pp. 1487-1492, November 2000. 
- [30] Yuzo Ishii, Shinji Koike, Yoshimitsu Arai, and Yasuhiro Ando, "SMT-compatible optical-I/O chip packaging for chip-level optical interconnects," *Proceedings of the 51st Electronic Components and Technology Conference*, pp. 870-875, June 2001. - [31] Yujie Liu, Lei Lin, Chulchae Choi, Bipin Bihari, and Ray T. Chen, "Optoelectronic integration of polymer waveguide array and metal-semiconductor-metal photodetector through micromirror couplers," *IEEE Photonics Technology Letters*, vol. 13, no. 4, pp. 355-357, April 2001. - [32] "Optical flex circuitry," STRATOS Lightwave. - [33] John A. Neff, Christine Chen, Tim McLaren, Chong-Chang Mao, Adam Fedor, Wes Berseth, Y. C. Lee, and Valentin Morozov, "VCSEL/CMOS smart pixel arrays for free-space optical interconnects," *Proceedings of the 3rd International Conference on Massively Parallel Processing using Optical Interconnections*, pp. 282-289, October 1996. - [34] Yue Liu, E. M. Strzelecka, J. Nohava, M. K. Hibbs-Brenner, and Elias Towe, "Smart-pixel array technology for free-space optical interconnects," *Proceedings of the IEEE*, vol. 88, no. 6, pp. 764-768, June 2000. - [35] Karl-Heinz Brenner, Frank Sauer, "Diffractive-reflective optical interconnects," *Applied Optics*, vol. 27, no. 20, pp. 4251-4254, October 1988. - [36] Srikanth Natarajan, Chunhe Zhao, and Ray T. Chen, "Bi-directional optical backplane bus for general purpose multi-processor board-to-board optoelectronic interconnects," *IEEE Journal of Lightwave Technology*, vol. 13, no. 6, pp. 1031-1040, June 1995. - [37] Jang-Hun Yeh, Raymond K. Kostuk, and Kun-Yii Tu, "Hybrid free-space optical bus system for board-to-board interconnections," *Applied Optics*, vol. 35, no. 32, pp. 6354-6364, November 1996. - [38] Xuliang Han, Gicherl Kim, G. Jack Lipovski, and Ray T. Chen, "An optical centralized shared-bus architecture demonstrator for microprocessor-to-memory interconnects," *IEEE Journal on Selected Topics in Quantum Electronics*, vol. 9, no. 2, pp.
512-517, March/April 2003.
- [39] Gicherl Kim, Xuliang Han, and Ray T. Chen, "A method for rebroadcasting signals in an optical backplane bus system," *IEEE Journal of Lightwave Technology*, vol. 19, no. 7, pp. 959-965, July 2001.
- [40] Steven K. Case, "Coupled-wave theory for multiply exposed thick holographic gratings," *Journal of the Optical Society of America*, vol. 65, no. 6, pp. 724-729, June 1975.
- [41] Jang-Hun Yeh and Raymond K. Kostuk, "Substrate-mode holograms used in optical interconnects: design issues," *Applied Optics*, vol. 34, no. 17, pp. 3152-3164, June 1995.
- [42] Harry Dutton, "Understanding optical communications," First Edition, pp. 261, http://www.redbooks.ibm.com, September 1998.
- [43] Timothy M. Pinkston, "Design considerations for optical interconnects in parallel computers," *Proceedings of the 1st International Workshop on Massively Parallel Processing Using Optical Interconnections*, pp. 306-322, Cancun, Mexico, April 1994.
- [44] Thomas K. Gaylord and M. G. Moharam, "Analysis and applications of optical diffraction by gratings," *Proceedings of the IEEE*, vol. 73, no. 5, pp. 894-937, May 1985.
- [45] Herwig Kogelnik, "Coupled wave theory for thick hologram gratings," *The Bell System Technical Journal*, vol. 48, no. 9, pp. 2909-2947, November 1969.
- [46] B. Benlarbi, D. J. Cooke, and L. Solymar, "Higher order modes in thick phase gratings," *Optica Acta*, vol. 27, no. 7, pp. 885-895, 1980.
- [47] William J. Gambogi, Andrew M. Weber, and T. John Trout, "Advances and applications of DuPont holographic photopolymers," *Proceedings of SPIE*, vol. 2043, pp. 2-13, 1993.
- [48] Ray T. Chen, Suning Tang, Maggie M. Li, David Gerald, and Srikanth Natarajan, "1-to-12 surface normal three dimensional optical interconnections," *Applied Physics Letters*, vol. 63, no. 14, pp. 1883-1885, October 1993.
- [49] Guoheng Zhao and Pantazis Mouroulis, "Diffusion model of hologram formation in dry photopolymer materials," *Journal of Modern Optics*, vol. 41, no.
10, pp. 1929-1939, 1994.
- [50] Xuliang Han, Gicherl Kim, and Ray T. Chen, "Accurate diffraction efficiency control for multiplexed volume holographic gratings," *Optical Engineering*, vol. 41, no. 11, pp. 2799-2802, November 2002.
- [51] Raymond K. Kostuk, "Dynamic hologram recording characteristics in DuPont photopolymers," *Applied Optics*, vol. 38, no. 8, pp. 1357-1363, March 1999.
- [52] Jian Liu, Chunhe Zhao, Richard Lee, and Ray T. Chen, "Cross-link optimized cascaded volume hologram array with energy-equalized one-to-many surface-normal fan-outs," *Optics Letters*, vol. 22, no. 13, pp. 1024-1026, July 1997.
- [53] Gicherl Kim, Xuliang Han, and Ray T. Chen, "An 8-Gb/s optical backplane bus based on microchannel interconnects: design, fabrication, and performance measurements," *IEEE Journal of Lightwave Technology*, vol. 18, no. 11, pp. 1477-1486, November 2000.
- [54] Gicherl Kim, Xuliang Han, and Ray T. Chen, "Crosstalk and interconnection distance considerations for board-to-board optical interconnects using 2-D VCSEL and microlens array," *IEEE Photonics Technology Letters*, vol. 12, no. 6, pp. 743-745, June 2000.
- [55] T. Milster, W. Jiang, E. Walker, D. Burak, P. Claisse, P. Kelly, and R. Binder, "A single-mode high-power vertical cavity surface emitting laser," *Applied Physics Letters*, vol. 72, no. 26, pp. 3425-3427, June 1998.
- [56] Chulchae Choi, Lei Lin, Yujie Liu, and Ray T. Chen, "Performance analysis of 10µm-thick VCSEL array in fully embedded board level guided-wave optoelectronic interconnects," *IEEE Journal of Lightwave Technology*, vol. 21, pp. 1531-1535, June 2003.
- [57] "High-speed board design techniques," Application Note, Vantis, August 1997.
- [58] AppCAD-RF Applications Software, Agilent.
- [59] "Designing with PECL (ECL at +5.0V)," Application Note AN1406, Motorola.
- [60] "Interfacing to PECL optical transceivers," Application Note 1173, Agilent.
- [61] "SFF committee proposed specification for GBIC (Gigabit Interface Converter)," Revision 5.5, http://playground.sun.com/pub/OEmod, September 2000.
- [62] V23826-K305-Cxx/Cxx, Multimode 850nm 1.0625GBd Fiber Channel 1.3 Gigabit Ethernet 1x9 Transceiver, Infineon.
- [63] VCT-B85B20 VCSEL Diode, Lasermate, http://www.lasermate.com/vctbb.htm
- [64] "Interfacing Maxim laser drivers with laser diodes," Application Note HFAN-02.0, Maxim Integrated Products.
- [65] "Inexpensive dc to 32MBd fiber-optic solutions for industrial, medical, telecom, and proprietary data communication applications," Application Note 1121, Agilent.
- [66] "Inexpensive 2 to 70MBd fiber-optic solutions for industrial, medical, telecom, and proprietary data communication applications," Application Note 1122, Agilent.
- [67] John G. Proakis, "Digital communications," Third Edition, pp. 541, McGraw-Hill, March 1995.
- [68] "OFSTP-4 optical eye pattern measurement procedure," TIA/EIA-526-4-A, TIA (Telecommunications Industry Association), http://www.tiaonline.org
- [69] Dennis Derickson, "Fiber optic test and measurement," Chapter 8, Prentice Hall, January 1998.
- [70] "Q factor measurement/eye diagram measurement, SDH/SONET pattern editing," Application Note, Anritsu.
- [71] Neal S. Bergano, F. W. Kerfoot, and C. R. Davidson, "Margin measurements in optical amplifier systems," *IEEE Photonics Technology Letters*, vol. 5, no. 3, pp. 304-306, March 1993.
- [72] http://www.seattlerobotics.org/encoder/jan97/The68HC12.html
- [73] http://www.kevinro.com
- [74] Jonathan W. Valvano, "Embedded microcomputer systems: real time interfacing," pp. 351, Brooks-Cole Publishers, January 2000.
- [75] Tom Shanley and Don Anderson, "PCI system architecture," Fourth Edition, Addison-Wesley Longman, August 1999.
- [76] PI-PCI64 PCI Bus Analyzer/Logic Analyzer Preprocessor, Corelis.
- [77] PCI64-14S 64-bit/66MHz PICMG Backplane, Armorlink.
- [78] ROCKY-3706 EV Single Board Computer, Armorlink.
- [79] MM-5415CN 128MB Non-Volatile SDRAM PCI Memory Card, Micro Memory.
- [80] The Linux Kernel Archives, http://www.kernel.org

## Vita

Xuliang Han was born in Tianjin, China, on June 22, 1976, the son of Guanyun Han and Guilan Zhang. After completing his education at Nankai Middle School, Tianjin, China, in 1994, he entered the Department of Electronic Engineering at Tsinghua University, Beijing, China. He received the degree of Bachelor of Science from Tsinghua University in July 1999. In August 1999, he joined the Department of Electrical and Computer Engineering at The University of Texas at Austin. He received the degree of Master of Science in Engineering from The University of Texas at Austin in August 2001. His current research topics cover massively parallel processing using optical interconnections, optical bus, optical switched backplane, optical networking, high-speed modular optoelectronic transceivers, and holography.

Permanent Address: Nankai University Long Xing Li 8-1-201 Tianjin 3000192 China

This dissertation was typed by the author.