RadSat – A Radiation Tolerant SmallSat Computer System
Radiation effects on space computers are becoming more of a concern as feature sizes of modern transistors continue to shrink, which in turn increases their susceptibility to SEEs (Single Event Effects) caused by ionizing particles. This is especially of concern when flying commercial off-the-shelf (COTS) parts that are attractive from a cost perspective but do not have any intentional radiation "hardening". FPGAs (Field Programmable Gate Arrays) are emerging as a potential platform to implement novel computer architectures for space due to their inherent flexibility; however, they are uniquely susceptible to SEEs due to storing their configuration in on-board SRAM. MSU (Montana State University) of Bozeman, MT, has been developing a radiation tolerant computer technology for the past decade that exploits the re-programmability of an FPGA to mitigate SEEs. In this approach, a redundant architecture is able to continue foreground operation in the presence of faults. When the fault is detected, the impacted region can be re-programmed in the background to flush out any errors and restore the region to its original operational state. This technology has been advanced to TRL-7 through a series of demonstrations and is now being prepared for an orbital mission to achieve TRL-9. 1) 2)
Project Relevance: The NASA Earth Science Decadal Survey states the need for on-board processing and power efficiency that far exceeds existing computer systems in order to meet NASA's future science goals. Additionally, the problem statement for the flight computing needs within the NASA TA11: Technology & Processing Roadmap is "ultra-reliable, radiation hardened platforms which, until recently, have been costly and limited in performance". 3) The TA11 roadmap also calls for innovative computing architectures to meet the needs of both science and engineering and emphasizes the need for scalable processing platforms that include intelligent fault-tolerant technologies to increase the robustness of computing platforms for long-duration missions. Simultaneously, the NASA Strategic Plan calls for "transforming NASA's missions and the Nation's capabilities by maturing cross-cutting and innovative space technologies" (Objective 1.7), particularly those that decrease cost and thus expand opportunities for future space activities. With the prevalence of computer systems in all future NASA missions, improving the capability of space computers has significant relevance and broad-scale impact across all NASA programs.
Radiation Effects on Computer Electronics: Space computers must operate in a harsh radiation environment that leads to multiple types of failures. Radiation effects are separated into two broad categories: TID (Total Ionizing Dose) and SEEs (Single Event Effects). Each of these failure mechanisms is caused by ionizing radiation striking the integrated circuit substrate and depositing unwanted energy. TID failure is caused by lower energy protons and electrons (<30 MeV/amu) striking the substrate and creating electron/hole pairs that are trapped in the insulating materials of the electronic devices. When this trapped charge accumulates in the gate oxide of a transistor, it alters the threshold voltage, which puts the device into a state where it is either always on or always off. When this trapped charge accumulates in the isolation regions between devices, it can cause leakage current that consumes excessive power and can ultimately destroy the device. TID exposure causes a gradual degradation of the part as opposed to instantaneous failure.
SEE faults refer to electron/hole pairs caused by high energy particles and heavy ions striking the diffusion regions of a device. SEEs do not cause permanent damage to the device like TID does, but they do cause unwanted logic level transitions. These unwanted transitions lead to system failures such as erratic computer behavior or full system crashes. When a high-energy particle passes through an integrated circuit and generates enough free charge carriers to change the state of a digital logic line, it is called a SET (Single Event Transient). If this voltage transient is captured and stored by a flip-flop or other memory device, the event is referred to as an SEU (Single Event Upset). It is generally possible to recover from an SEU by simply resetting the affected circuit. However, if the SEU alters the device in such a way that a reset alone is not sufficient to restore it to a healthy state, it is called a SEFI (Single Event Functional Interrupt). SEFIs typically require more drastic recovery measures such as power cycling or full system re-initialization. Figure 1 shows the cross-section of a typical MOSFET device and how various radiation strikes cause different types of failures.
Figure 1: Cross-Section of MOSFET device showing radiation-induced fault mechanisms (image credit: MSU)
RadSat is the name of a satellite mission to demonstrate a novel computer architecture designed to mitigate radiation induced faults using COTS FPGAs. The computer technology is implemented as an experiment within RadSat to demonstrate it in an operational space environment. The fault mitigation approach in this computer involves breaking a commercial FPGA fabric into redundant tiles, each with the characteristics that they can fully contain the circuit of interest and also be individually reprogrammed using partial reconfiguration.
MSU's Fault Mitigation Approach:
MSU's approach to space computing involves implementing a novel SEE mitigation strategy on a modern COTS FPGA. By using a modern FPGA (28 nm process node), an acceptable level of TID immunity can be realized without needing the expensive hardening techniques commonly used in space computers. Our system uses a Xilinx Artix-7 FPGA, which provides a predicted ~600 krad of TID immunity. The use of COTS parts also reduces the cost of computing by an order of magnitude compared to existing rad-hard systems. For example, one of the most widely adopted space computers is the BAE RAD750. While this computer provides increased levels of radiation tolerance, it is also cost prohibitive for use in small satellites. The MSU computer uses all COTS parts, thus making it practical for small satellite applications.
The next technique employed by our computer is an extension of the widely adopted TMR (Triple Modular Redundancy). TMR is an architectural approach in which a circuit is triplicated. Each copy of the circuit produces an output that is fed into a voter. The voter produces the system output based on the majority coming from the three circuit copies. In this manner, if one of the circuits is faulted, the system can still produce the correct output. One of the drawbacks of TMR is that once the output is produced, the system must be halted while the faulted circuit is repaired. The main repair mechanism used in FPGAs is called configuration memory scrubbing. A scrubber continually overwrites the contents of the FPGA configuration memory to restore any fields that may have been corrupted by radiation. The scrubber uses a separate, non-volatile memory device that contains the original contents of the configuration memory, implemented using a less susceptible storage technology such as an EEPROM. The drawback of TMR+Scrubbing is that the system must wait for the contents to be repaired using a full FPGA reconfiguration (i.e., scrubbing the entire FPGA) before resuming foreground operation. This reduces the time available for computation and leaves the system susceptible to a subsequent radiation strike during the repair procedure, which can put the system into a perpetual state of failure.
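The majority vote at the heart of TMR can be illustrated in a few lines. This is a minimal software sketch, not the MSU voter (which is implemented in FPGA logic); the function name `tmr_vote` and its return convention are assumptions made for this example.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def tmr_vote(outputs: Sequence[int]) -> Tuple[Optional[int], Optional[int]]:
    """Majority-vote the outputs of three redundant circuit copies.

    Returns (voted_value, faulty_index):
      - all three agree:  (value, None)
      - two agree:        (value, index of the dissenting copy)
      - no majority:      (None, None)
    """
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three module outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count == 3:
        return value, None
    if count == 2:
        # Exactly one copy disagrees with the majority; report which one.
        faulty = next(i for i, out in enumerate(outputs) if out != value)
        return value, faulty
    return None, None
```

A single faulted copy is outvoted and identified, so the system output stays correct; only when all three copies disagree does the vote fail.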
The SEE fault mitigation approach in our computer extends TMR+Scrubbing by including spare circuitry to enhance the operation of TMR and a spatially aware approach to improve traditional scrubbing. Our approach to providing reliability involves breaking the COTS FPGA fabric into redundant tiles, each of which can fully contain the circuit of interest and can be individually reprogrammed using partial reconfiguration (PR). For our system, each tile contains a Xilinx MicroBlaze soft processor (32-bit RISC architecture provided by Xilinx). At any given time, three of the tiles run in TMR with the rest of the tiles reserved as spares. The TMR voter is able to detect faults in the active triad by voting on the tile outputs. A configuration memory scrubber continually runs in the background and is able to detect faults in the configuration memory of both the active and inactive tiles. In the event of a fault in the active triad (detected by either the TMR voter or the scrubber), the damaged tile is replaced with a known good spare and foreground TMR operation continues. The damaged tile is repaired in the background by reinitializing its configuration memory through partial reconfiguration. This approach mitigates single event upsets (SEUs) in the FPGA circuit fabric in addition to single event functional interrupts (SEFIs) in the configuration memory. The advantage of this approach is that foreground operation can continue while the faulted tile is repaired and reintroduced as an available spare in the background. Since bringing on a spare tile takes significantly less time than performing background repair via partial or full reconfiguration of the FPGA, the system availability is increased.
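The spare-swap and background-repair bookkeeping described above can be sketched as a small state model. This is an illustrative sketch only: the class `TileManager`, its method names, and the tile numbering are hypothetical and do not come from the MSU design, where this logic runs in hardware.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TileManager:
    """Tracks which tiles are active, spare, or under repair (hypothetical model)."""
    active: List[int]
    spares: List[int]
    repairing: List[int] = field(default_factory=list)

    def report_fault(self, tile: int) -> bool:
        """Handle a fault flagged by the voter or scrubber.

        Returns True if foreground TMR still runs on three tiles afterwards.
        """
        if tile in self.active:
            # Pull the damaged tile out of the triad and queue it for repair.
            self.active.remove(tile)
            self.repairing.append(tile)
            if self.spares:
                # Bring a known good spare online so TMR continues.
                self.active.append(self.spares.pop(0))
        elif tile in self.spares:
            # The scrubber found a fault in an inactive tile: repair before use.
            self.spares.remove(tile)
            self.repairing.append(tile)
        return len(self.active) == 3

    def repair_done(self, tile: int) -> None:
        """Partial reconfiguration finished: the tile rejoins the spare pool."""
        self.repairing.remove(tile)
        self.spares.append(tile)


# Nine tiles: three in the active triad, six spares.
mgr = TileManager(active=[0, 1, 2], spares=[3, 4, 5, 6, 7, 8])
still_tmr = mgr.report_fault(1)   # tile 1 faulted; spare 3 takes its place
mgr.repair_done(1)                # tile 1 is repaired and becomes a spare
```

The point of the design is visible in `report_fault`: the swap is a cheap bookkeeping step that keeps the triad full, while the slower repair happens off the critical path.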
Our implementation using a Xilinx Artix-7 FPGA has achieved a performance of 234 MIPS at 2 W of full system power consumption. This represents a 2x improvement in power efficiency compared to the current state-of-the-art radiation hardened computers such as the BAE RAD750 and a 7x increase in performance compared to the more commonly adopted radiation hardened processors (e.g., HyperX, Maestro, RAD6000). The Artix-7 uses a 28 nm process node that has been shown to provide up to 600 krad of TID immunity, which meets the TID requirements for the majority of space missions.
The novel SEE mitigation strategy of our computer extends the SEE immunity of a COTS Artix-7 FPGA beyond existing mitigation strategies by a factor of 90x. The MTBF (Mean Time Between Failures) due to SEEs of our system implemented on an Artix-7 residing in the worst case location of an ISS orbit (e.g., the SAA) under worst week conditions is 5.4 hours, compared to only 3.6 minutes using existing mitigation strategies (e.g., TMR+Scrubbing). This computer system promises to meet the performance, power efficiency, and reliability requirements of future science missions at a cost that is 130x lower than existing radiation hardened computers. This technology is ready for mission operation testing in order to increase its TRL to 9. Figure 2 shows the Artix-7 FPGA board that has been developed at MSU and the FPGA floor plan of the 9-tile MicroBlaze system. Each square within the floor plan represents a tile that contains a full 32-bit MicroBlaze soft processor and can be partially reconfigured. Also shown is the entire computer PCB stack containing local power regulation and a data logging system that is used for flight testing.
Figure 2: Computer System (left), Artix-7 FPGA Board (center), and Tiled FPGA Floor Plan (right), image credit: MSU
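The headline figures quoted above can be cross-checked with simple arithmetic. The short sketch below uses only numbers stated in the text (5.4 hours versus 3.6 minutes MTBF, 234 MIPS at 2 W); the variable names are chosen for this example.

```python
# MTBF values quoted in the text (worst-case ISS orbit location, worst week).
mtbf_msu_hours = 5.4           # MSU tiled approach
mtbf_baseline_minutes = 3.6    # existing mitigation (TMR+Scrubbing)

# Convert to a common unit (minutes) before taking the ratio.
mtbf_improvement = (mtbf_msu_hours * 60.0) / mtbf_baseline_minutes

# Power efficiency from the quoted performance and full-system power.
mips = 234.0
power_w = 2.0
mips_per_watt = mips / power_w

print(f"MTBF improvement: {mtbf_improvement:.0f}x")
print(f"Power efficiency: {mips_per_watt:.0f} MIPS/W")
```

The ratio of the two MTBF values reproduces the stated factor of 90, and the quoted performance corresponds to 117 MIPS per watt.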
History of Technology Maturation:
MSU has been maturing the radiation tolerant computer technology described in this proposal for nearly a decade, from TRL-1 to its current level of TRL-7, through a series of incremental demonstrations. In 2008, the Montana Space Grant Consortium funded a seed project to begin collaborating with the NASA/MSFC (Marshall Space Flight Center) Advanced Avionics and Processor Systems project on technologies to address future space computing requirements. The initial version of the computer system was implemented using commercial FPGA evaluation boards and breadboards. This allowed the technology to be matured to TRL-3 through a demonstration to MSFC engineers at MSU in 2009.
In 2010, a NASA EPSCoR (Established Program to Stimulate Competitive Research) award titled "Development and Testing of a Radiation Tolerant Flight Computer with Real-Time Fault Detection, Recovery, and Repair" (NNX10AN32A) allowed the computer to be matured to TRL-4 through ground-based testing in a cyclotron. This involved implementing the computer and a sensor system in a custom form factor in order to facilitate more rigorous laboratory testing and implementing the novel SEE fault mitigation strategy on a Virtex-6 75LX FPGA. This system was tested three times under bombardment by krypton (Kr) ions at an energy level of 25 MeV/amu at the Texas A&M (TAMU) cyclotron. This testing demonstrated the computer system at TRL-4 and the sensor at TRL-5. The reason for the difference in TRLs was that the Kr ion could not penetrate the FPGA package, so only the functionality of the idea and an individual component was demonstrated. The Kr ion easily penetrated the sensor, so it was able to achieve TRL-5 in a representative environment. This demonstration did verify that the FPGA SEE mitigation approach could be triggered by external radiation.
Later in 2010, a NASA educational grant titled "Engaging Women in Engineering Through an Interdisciplinary Payload Design" (NNX10AN91A) allowed undergraduate engineering students to build high altitude balloon payloads to run experiments on the computer system in a representative environment. This project conducted six balloon flights between 2011 and 2013 of various versions of the computer system to altitudes of 90,000 feet in southwest Montana. This parallel project enabled the computer system to be developed into a form-factor suitable for even higher altitude balloon testing. Based on the results from these local balloon tests, the computer was accepted in 2011 into the Louisiana State University HASP (High Altitude Student Payload) balloon program. This platform allowed the system to gain prolonged exposure to a higher radiation environment at a higher altitude. In September of 2012, the computer system was flown to an altitude of 120,000 feet for a duration of 10 hours. The computer system operated successfully for the entire flight and detected two high energy particle strikes using an on-board sensor. This flight demonstrated TRL-5 of the entire system on a sounding balloon.
In 2012, a project funded through the NASA OCT Game Changing Opportunities in Technology Development program titled "Suborbital Flight Demonstration of An FPGA-based, Radiation Tolerant, Reconfigurable Computer System" (NNX12AM50G) allowed the computer to be flown on two sounding rocket missions. The first was on the UP Aerospace SpaceLoft-9 vehicle out of SpacePort America in New Mexico in 2014. This took the computer to an altitude of 116 km in order to demonstrate the functionality of subsystems in a relevant end-to-end space environment (TRL-6). A second flight was conducted out of the Wallops Flight Facility in 2016 on the Improved Terrier-Orion vehicle; however, a power system failure prevented the computer from operating properly.
In 2013, a project funded through the NASA SmallSat Technology Partnership titled "Radiation Tolerant, FPGA-based SmallSat Computer System" (NNX13AR03A) allowed a satellite concept to be architected and the computer technology to be matured for orbital testing. In this project, the MSU team partnered with the Goddard Space Flight Center in order to mature the computer to a point where it was ready for long term space testing. This project allowed the FPGA subsystem to be redesigned to support both internal ISS testing and ultimately to ride as a payload on a small satellite.
In 2014, a project funded by the NASA EPSCoR ISS Flight Opportunity program titled "Space Flight Demonstration of a Radiation Tolerant, FPGA-based Computer System on the International Space Station" (NNX14AL03A) allowed the computer to be demonstrated as an internal experiment on the ISS using the NanoRacks CubeLab locker. The MSU computer was installed in December of 2016 and has been operating nominally for 7 months at the time of this writing. The results of this internal ISS experiment have allowed the computer technology to reach TRL-7 through a demonstration in an operational environment.
In 2016, funding from the NASA Undergraduate Student Instrument Project (USIP) titled "Student-Built CubeSat to Demonstrate a Radiation Tolerant Computer Technology" (NNX16AI75A) is allowing an undergraduate team to fully plan an orbital mission to demonstrate the computer technology and design the 3U satellite that will carry the computer as an experiment. This ongoing USIP project will result in a mission concept and an engineering model that has passed a rigorous series of NASA design reviews, adheres to all integration requirements set by the launch service provider NanoRacks, and meets all of the ISS safety requirements.
The RadSat satellite has been selected by the NASA 2015 CubeSat Launch Initiative (CSLI) for a launch in 2018. This mission will put RadSat into orbit from the ISS using the NanoRacks CubeSat Deployer. This ongoing project is scheduled to complete qualification and integration in December of 2017, thus achieving TRL-8. After deployment from the ISS, the computer will be operated for ~12 months in LEO. The operation of the satellite and analysis of the computer's performance will allow the technology to reach its final level of TRL-9. Figure 3 shows a montage of the technology maturation of the MSU computer.
Figure 3: History of Technology Maturation of the Radiation Tolerant Computer System (image credit: MSU)
The Need for Orbital Testing: The next step in evaluating the radiation tolerance of this computer is a stand-alone satellite mission. A satellite mission provides three key criteria for the technology to reach TRL-9. First, it provides testing in space. Space testing is required because the energy levels necessary to empirically test the system's SEE immunity are difficult to produce terrestrially. Cyclotrons and particle accelerators cannot reproduce the space environment accurately and are typically used to bombard sub-systems in modified form factors, such as integrated circuits that have regions intentionally exposed to increase their SEE susceptibility. These types of tests do not meet the requirements of an "actual system mission proven" in order to achieve TRL-9. Second, a satellite mission provides a sufficient amount of time to verify operation in the presence of infrequent radiation strikes (2-3 per day). While high altitude balloons and sounding rockets are able to put the computer outside of the atmosphere where the radiation is not attenuated by collisions with atmospheric particles, they are only able to maintain their altitude for minutes (sounding rocket) or hours (balloon). A satellite deployed in LEO using the NRCSD (NanoRacks CubeSat Deployer) will provide a nominal 8 months of exposure to radiation strikes. Finally, a satellite mission allows the computer to operate outside of a controlled environment such as on the ground or inside the ISS. This forces the computer to run without human intervention, thus proving its mission reliability.
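The exposure-time argument can be made concrete with a rough estimate. The sketch below takes the strike rate of 2-3 per day and the nominal 8 months from the text; the 30-day month, the helper name `expected_strikes`, and the simplification that the same rate applies to a 10-hour exposure are assumptions for illustration only.

```python
def expected_strikes(rate_per_day: float, duration_hours: float) -> float:
    """Expected number of strikes for a constant average rate."""
    return rate_per_day * duration_hours / 24.0


DAYS_PER_MONTH = 30                          # assumption for a rough estimate
mission_hours = 8 * DAYS_PER_MONTH * 24      # nominal 8-month LEO exposure

# Range over the quoted 2-3 strikes per day.
orbital_low = expected_strikes(2, mission_hours)
orbital_high = expected_strikes(3, mission_hours)

# For comparison: a 10-hour exposure at the same (assumed) rate.
short_high = expected_strikes(3, 10)

print(f"8-month mission: {orbital_low:.0f} to {orbital_high:.0f} strikes")
print(f"10-hour exposure: about {short_high:.2f} strikes at most")
```

Under these assumptions an orbital mission sees several hundred strikes, whereas an exposure lasting hours sees roughly one, which is too few to characterize the fault mitigation statistically.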
A highly efficient approach to putting a 3U CubeSat into LEO is using the ISS-based NanoRacks CubeSat Deployer. This approach delivers the CubeSat to the ISS in a pressurized cargo vehicle with a much less rigorous vibration profile. This reduces the amount of structural robustness that must be incorporated into the satellite in addition to reducing the amount of vibration testing required for qualification. Additionally, when using the NRCSD, NanoRacks handles all interfacing with the ISS safety team. This reduces the administrative work for the university team and allows them to focus on the qualification of the satellite.
RadSat Mission Concept:
The 3U RadSat nanosatellite will be put into orbit using the NanoRacks CubeSat Deployer (NRCSD). The NRCSD is a satellite ejection system that is loaded with 6U worth of satellites on the ground and then carried to the ISS (International Space Station) by a commercial cargo provider. The NRCSD is received on the ISS through an airlock and then stored for deployment at a scheduled time. For a deployment, the platform is moved outside of the ISS via the Kibo Module's Airlock using the JEMRMS (Japanese Experiment Module Remote Manipulator System). The JEMRMS robotic arm moves the NRCSD into the correct orientation and then releases the satellites into Low Earth Orbit.
There are two domestic launch providers that transport the NRCSD to the ISS: SpaceX and Orbital ATK. SpaceX uses the Falcon 9 vehicle carrying the Dragon spacecraft and launches its cargo resupply missions out of the Cape Canaveral Air Force Station in Florida. Orbital ATK uses its Antares vehicle carrying its Cygnus spacecraft and launches its cargo missions out of the Wallops Flight Facility in Virginia. Each of these transport systems carries the NRCSD within a pressurized capsule, packaged in a CTB (Cargo Transfer Bag). This type of soft stow launch minimizes the environmental conditions that the satellite must withstand during transport and reduces the necessary environmental testing. Once on the ISS, the payloads are typically deployed 1 month after berthing. This deployment approach eliminates the need for any specific interface with the launch vehicle. This type of launch provides a 51.6° inclination at an initial altitude of 385-400 km. A satellite in this orbit has a lifetime of 8-12 months before it deorbits. The deployment from the ISS using the NRCSD guarantees a deorbit within 18 months, which meets the 25-year LEO lifetime limitation specified in the NASA Procedural Requirements for Limiting Space Debris (NPR 8715.6). Figure 4 shows the mission concept for RadSat.
The NanoRacks deployer requires that the MSU satellite contains two inhibit switches that will prevent power from reaching the satellite while stowed in the CTB. This inhibit switch is a common feature of small satellites and will be implemented by using a standard, 3U Pumpkin chassis, which has this switch system built into its satellite enclosures. Note that MSU has used this power inhibit switch system on all of its prior deployed satellites. Once the satellite is deployed from the NRCSD, the switches will be de-asserted and the satellite will power on. ISS requires that the satellite remain radio quiet for 30 minutes after deployment. This will be accomplished with a simple on-board timer. Once the MSU satellite is free-flying and has met the 30 minute quiet period, it will extend its single monopole antenna, which is constructed from a steel tape measure. The communication transceiver on the satellite is implemented with an AstroDev Li-2 radio with UHF uplink/downlink (~440 MHz, 1W). This communication link operates within the amateur radio (HAM) bands. MSU has already received approval for frequency coordination from the FCC (Federal Communications Commission) for the radio link in this satellite as part of the existing CSLI contract. The frequency coordination is administered through the IARU (International Amateur Radio Union). MSU has also received a waiver from the NOAA (National Oceanic and Atmospheric Administration) Commercial Remote Sensing Regulatory Affairs Office (CRSRA) stating that no imaging license is needed as this satellite does not contain any remote sensing devices.
Figure 4: RadSat Mission Concept (image credit: MSU)
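The post-deployment sequence described above (inhibit switches released, 30-minute radio-quiet period, then antenna deployment and normal operation) amounts to a simple timer-driven state machine. The following sketch is illustrative only: the `Phase` names and the `phase_at` helper are hypothetical and are not taken from the RadSat flight software.

```python
from enum import Enum, auto

QUIET_PERIOD_S = 30 * 60  # ISS-required radio silence after deployment


class Phase(Enum):
    STOWED = auto()        # inhibit switches asserted, satellite unpowered
    RADIO_QUIET = auto()   # powered on, transmitter must stay silent
    OPERATIONAL = auto()   # antenna may deploy and the beacon may start


def phase_at(seconds_since_deploy: float, inhibits_released: bool) -> Phase:
    """Return the mission phase for a given time after ejection from the NRCSD."""
    if not inhibits_released:
        return Phase.STOWED
    if seconds_since_deploy < QUIET_PERIOD_S:
        return Phase.RADIO_QUIET
    return Phase.OPERATIONAL


def may_transmit(seconds_since_deploy: float, inhibits_released: bool) -> bool:
    """The radio is only allowed to key up once the quiet period has elapsed."""
    return phase_at(seconds_since_deploy, inhibits_released) is Phase.OPERATIONAL
```

Keeping the transmit permission as a pure function of elapsed time and inhibit state makes the quiet-period rule easy to test on the ground.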
Once operational, the FPGA-based computer system will generate a state-of-health telemetry packet that is ~480 bytes every 5 minutes. This telemetry packet contains information about the computer system foreground operation (simple binary counters running on the MicroBlaze processors), any faults that have occurred, sensor information, the voltage and current consumption, and time stamps. This information is passed to the satellite avionics communication system and downlinked. The telemetry beacon will be downlinked at 19.2 kbit/s. With the small size of information that is generated by the FPGA, the entire state-of-health telemetry packet can be downloaded in the beacon.
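A quick link-budget check shows why the whole packet fits comfortably in the beacon. The sketch below uses the figures from the text (~480 bytes every 5 minutes, 19.2 kbit/s) and ignores protocol framing overhead, which the text does not specify; the constant names are chosen for this example.

```python
PACKET_BYTES = 480          # state-of-health packet size (approximate)
BEACON_PERIOD_S = 5 * 60    # one packet every 5 minutes
LINK_RATE_BPS = 19_200      # beacon downlink rate

# Time on air for one packet, ignoring framing overhead (assumption).
airtime_s = PACKET_BYTES * 8 / LINK_RATE_BPS

# Daily totals.
packets_per_day = 24 * 3600 // BEACON_PERIOD_S
bytes_per_day = packets_per_day * PACKET_BYTES

# Fraction of time the transmitter is busy with telemetry.
duty_cycle = airtime_s / BEACON_PERIOD_S

print(f"Airtime per packet: {airtime_s:.2f} s")
print(f"Packets per day:    {packets_per_day}")
print(f"Telemetry per day:  {bytes_per_day} bytes")
print(f"Transmit duty:      {duty_cycle:.4%}")
```

Each packet occupies the link for a fraction of a second out of every 5-minute beacon interval, so the telemetry volume is far below the link capacity.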
The satellite adheres to a standard 3U CubeSat form factor. The structure for the satellite is a Pumpkin 3U chassis that is modified with the necessary cutouts and mounting holes for our design. The satellite is powered by an on-board Li-ion battery that provides +8.4 V to various local regulators that in turn create the power supplies for the electronics. The satellite contains four 3U solar panels from ClydeSpace. Each panel contains 7 Spectrolab UTJ (Ultra Triple Junction) solar cells configured in series. A custom electrical power system (Power Board) uses the solar energy from the panels to continually charge the Li-ion battery. The nominal power consumption of the satellite is 2.4 W. Simulations using the AGI Systems Tool Kit (STK) predict that under nominal conditions, our panels will generate 4.8 W per orbit, thus sustaining continuous power for the duration of the mission.
The avionics system is based on a Pumpkin motherboard containing a PIC processor. This serves as the C&DH (Command & Data Handling) board. The communication for the satellite is handled with an Astronautical Development Lithium 2 UHF radio (Radio Board) attached to a single monopole antenna. Communication is handled using a half-duplex scheme with the antenna. To ensure the radiation pattern is in a direction that the MSU ground station can see, a passive magnetic stabilization system is used, consisting of a permanent magnet and hysteresis rods. The avionics system interfaces to the computer experiment through a custom Payload Interface Board.
The computer experiment consists of 3 stacked printed circuit boards. The FPGA Board contains a Xilinx Artix-7 FPGA, which contains the radiation tolerant architecture. A Xilinx Spartan-6 FPGA handles all housekeeping associated with the experiment. A Payload Power Board receives an unregulated +8.4 V from the power system and creates all of the necessary voltages for the FPGAs. A Data Logging Board is used for redundant storage of the telemetry data for the experiment. This storage is used for ground-testing. An internal solid state radiation sensor is included in the satellite to correlate radiation strikes to computer faults. The avionics system queries the computer experiment for telemetry data and stores the data in its own FLASH memory system. This telemetry data is then sent over the radio in the beacon every 5 minutes. Figure 5 shows a 3D model of the RadSat design.
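The power figures above imply a simple margin calculation. The sketch below assumes that the 4.8 W generation figure is an orbit-average value comparable to the 2.4 W nominal consumption; that interpretation, and the variable names, are assumptions rather than statements from the source.

```python
generated_w = 4.8   # predicted solar generation (assumed orbit-average)
consumed_w = 2.4    # nominal satellite power consumption

margin_ratio = generated_w / consumed_w   # generation relative to load
surplus_w = generated_w - consumed_w      # power left to charge the battery

power_positive = surplus_w > 0

print(f"Generation is {margin_ratio:.1f}x the nominal load")
print(f"Surplus available for battery charging: {surplus_w:.1f} W")
```

Under that assumption the panels generate twice the nominal load, which is consistent with the claim that the satellite can sustain continuous power for the mission.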
Figure 5: Illustration of the RadSat spacecraft design (image credit: MSU)
Launch: The RadSat nanosatellite is scheduled to be flown to the ISS on a CSLI (CubeSat Launch Initiative) mission in 2018.
Orbit: Near circular orbit, altitude of ~400 km, inclination = 51.6°.
MSU's SSEL (Space Science and Engineering Laboratory) operates its own ground station for its small satellites, which will be used for this project. Much of this station is built from amateur radio hardware, including the Yagi antennas, antenna rotators, uplink TNC, and uplink transmitter. Figure 6 shows a block diagram of the MSU ground station. This ground station requires three computers, labeled SSEL-Ground1, SSEL-Ground2, and SSEL-Ground3. SSEL-Ground3 controls the Yaesu G-550 antenna rotators, based on Two-Line Elements (TLEs) obtained from SpaceTrack.org using the NOVA software tool. SSEL-Ground2 runs the Linux Mint operating system in order to run GNURadio, an open-source software defined radio toolkit. GNURadio controls the USRP N200 software defined radio and handles demodulating the packets from the satellite. SSEL-Ground2 also runs the InControl "Linkage" program, a custom program that routes all uplink and downlink packets to/from a commercial grade version of L3's InControl satellite command and control software on SSEL-Ground. InControl allows the packet information to be quickly recovered and analyzed.
Figure 6: Ground station administered by the MSU SSEL (image credit: MSU)
1) Brock J. LaMeres, Colin Delaney, Matt Johnson, Connor Julien, Kevin Zack, Ben Cunningham, Todd Kaiser, Larry Springer, David Klumpar, "Next on the Pad: RadSat – A Radiation Tolerant Computer System," Proceedings of the 31st Annual AIAA/USU Conference on Small Satellites, Logan UT, USA, Aug. 5-10, 2017, paper: SSC17-III-11, URL: http://digitalcommons.usu.edu/cgi/viewcontent.cgi?article=3618&context=smallsat
2) Brock J. LaMeres, Samuel Harkness, Mathew Handley, Patrick Moholt, Connor Julien, Todd Kaiser, David Klumpar, Keith Mashburn, Larry Springer, Gary A. Crum, "RadSat - Radiation Tolerant SmallSat Computer System," Proceedings of the 29th Annual AIAA/USU Conference on Small Satellites, Logan, Utah, USA, August 8-13, 2015, paper: SSC15-X-8, URL: http://www.montana.edu/blameres/vitae/publications/d_conference_full/conf_full-028_radsat_smallsat_computer.pdf
3) "NASA Space Technology Roadmaps – Modeling, Simulation, Information Technology & Processing Roadmap, Technology Area 11", 2010, URL: https://www.nasa.gov/offices/oct/home/roadmaps/index.html
The information compiled and edited in this article was provided by Herbert J. Kramer from his documentation of: "Observation of the Earth and Its Environment: Survey of Missions and Sensors" (Springer Verlag) as well as many other sources after the publication of the 4th edition in 2002.