A reconfigurable heterogeneous in-memory computing architecture for variable precision computation: a software-hardware co-design approach

doi:10.1007/s44275-025-00028-1

Moore and More, 2026, Vol. 2, Issue 1: 1-14. DOI: 10.1007/s44275-025-00028-1

ORIGINAL ARTICLE

Yizhe Chen¹,²,³, Hanjie Liu¹,³, Saiya Wang¹, Jinyao Mi¹, Xiaodi Xing³, Yuexi Lv³, Aifei Zhang³, Lichuan Luo⁴, Yong Pei², Minghua Tang⁵, Wang Kang¹,*

¹ School of Integrated Circuit Science and Engineering, Beihang University, Beijing 100191, China
² The College of Chemistry, Xiangtan University, Xiangtan 411105, Hunan, China
³ Zhicun Research Lab, Hangzhou 311101, China
⁴ State Key Laboratory of Wireless Mobile Communications (CICT), Beijing 100191, China
⁵ School of Materials Science and Engineering, Xiangtan University, Xiangtan 411105, Hunan, China

Received: 2024-10-14 Revised: 2025-01-03 Accepted: 2025-01-13 Published: 2025-07-14 Online: 2025-07-14
Contact: *Wang Kang (wang.kang@buaa.edu.cn)
About the authors: Yizhe Chen received a B.S. degree from the College of Electrical and Information Engineering at Zhengzhou University of Light Industry in 2021. He is currently working toward an M.S. degree in the College of Chemistry at Xiangtan University. His research interests include noise modeling and optimization algorithms in analog in-memory computing, quantization, and training of neural networks.
Hanjie Liu received a B.S. degree in measurement and control technology and instrumentation from Wuhan Textile University, in 2016. He is currently working toward an M.S. degree in information and electronics at Beihang University. His research interests include AI compiler design, simulator design, and system co-design of computing in memory.
Saiya Wang received a B.S. degree in communication engineering from Hunan University in 2024. Currently, she is working toward an M.S. degree in science and engineering of integrated circuits at Beihang University. Her research mainly focuses on in-memory-computing circuit design and efficient deep learning.
Jinyao Mi received a B.S. degree in integrated circuit design and integrated systems from Beihang University, Beijing, in 2024. Currently, he is working toward an Eng.D. degree in integrated circuit engineering at Beihang University. His research mainly focuses on analog in-memory-computing circuits.
Xiaodi Xing received a Ph.D. degree in communications and information systems in 2014 and a bachelor’s degree in electronic engineering in 2006, both from Beihang University, Beijing. He joined Zhicun Research Lab in 2018 after 5 years’ experience at IBM China system center, and has been working on the architecture design of neural network processor units built on computing-in-memory (MPU) ever since. His current interests include chip design and architectural optimization for visual processing networks and LLMs.
Yuexi Lv received a B.S. degree from the School of Electronic Science and Engineering in Nanjing University in 2015, and an M.S. degree from the State Key Laboratory of Superlattices and Microstructures in the Institute of Semiconductors, in 2018. His research interests include noise modeling and optimization algorithms in analog in-memory computing, quantization, and training of neural networks.
Aifei Zhang received an M.S. degree from the School of Information and Electronics in the Beijing Institute of Technology, Beijing, in 2017. He is currently working in the Zhicun Research Lab, Beijing. His research interests mainly include AI compilers, frameworks, high-efficiency computing and heterogeneous cooperative computing platforms.
Lichuan Luo received a B.S. degree in integrated circuit design and integrated systems from Xidian University, Xi’an, Shaanxi, in 2013, and an M.S. degree in microelectronics and solid-state electronics from the Institute of Semiconductors, Chinese Academy of Sciences, Beijing, in 2016. He received his Ph.D. degree from Beihang University, Beijing, in 2024. He is an ASIC Design Engineer with China Information and Communication Technology Group Co., Ltd. His research interests include computing-in-memory, RISC-V, embedded deep learning, and reconfigurable computing.
Yong Pei received a B.S. degree from the College of Chemistry in Xiangtan University, in 2001, and a Ph.D. degree from the Institute of Theoretical and Computational Chemistry in Nanjing University, in 2006. His research interests include theoretical computational simulation of the structural evolution of clusters, electronic structure, photophysical and chemical properties, and metal exchange mechanisms.
Minghua Tang received a B.S. degree in physics and a Ph.D. degree in materials physics and chemistry from Xiangtan University, Hunan, China, in 1988 and 2007, respectively. He is currently with the School of Materials Science and Engineering, Xiangtan University, Xiangtan, Hunan, and was a Visiting Professor at the Institute of Microelectronics of Tsinghua University, China (2003-2004), Tokyo Institute of Technology, Japan (2008-2009), and Nanyang Technological University, Singapore (2011), with research focused on the fabrication and characteristics of ferroelectric thin-film memory in a 65 nm process. His research interests include ferroelectric thin-film memory, resistive random access memory, neuromorphological devices for computing-in-memory applications, and power devices.
Wang Kang (Senior Member, IEEE) received a double Ph.D. degree in physics from the University of Paris-Sud, France, and in microelectronics from Beihang University, Beijing. He is an Associate Professor with the School of Integrated Circuit Science and Engineering, Beihang University. His research interests include spintronics and its related devices, circuits, and architectures. He has coauthored three book chapters, over 40 Chinese patents, and over 100 scientific papers.

Abstract

In-memory computing (IMC) has emerged as a promising approach for accelerating deep neural network (DNN) inference by relocating computations to memory arrays. However, the efficacy of analog IMC diminishes when higher computational precision is required due to inherent device non-idealities. In this paper, we present a reconfigurable heterogeneous architecture that integrates a digital computing unit (DCU) with an analog IMC unit (AIMCU). The computational data is partitioned into most significant bits (MSBs) and least significant bits (LSBs); the sparse MSBs are processed by the DCU with lossless precision, and the dense LSBs are computed by the AIMCU for high energy efficiency, thereby enhancing inference accuracy and optimizing area efficiency. The architecture also features multiple modes that support variable-precision input splitting and weight splitting computation. Additionally, by leveraging hardware characteristics, we have developed several optimization strategies for neural network deployment, including parameter splitting, shifting algorithms, and sparse weight mapping. The experimental results show that the perceptual evaluation of speech quality (PESQ) of the deep complex convolution recurrent network (DCCRN) improved by 28.98%, while the peak signal-to-noise ratio (PSNR) of the super-resolution network (SRN) increased by 17.27%. Compared to previous state-of-the-art (SOTA) work, the reconfigurable heterogeneous-IMC-based system on a chip (SoC) demonstrates a significant improvement in energy efficiency while achieving accuracy close to that of pure digital computing.
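
The MSB/LSB partitioning described in the abstract can be illustrated with a small software sketch. This is not the authors' implementation: the sign-magnitude split, the 4-bit LSB width, and the names `split_weight` and `hetero_dot` are assumptions made only for illustration. The sketch shows that a dot product computed separately on the MSB and LSB parts recombines exactly when the LSB path is error-free, and that any analog error is confined to the low-order part.

```python
def split_weight(w, lsb_bits=4):
    """Split a signed integer so that w == msb * 2**lsb_bits + lsb.

    Assumption: a sign-magnitude split, so small-magnitude weights
    yield msb == 0 (one way the MSB part can end up sparse).
    """
    sign = -1 if w < 0 else 1
    mag = abs(w)
    msb = sign * (mag >> lsb_bits)
    lsb = sign * (mag & ((1 << lsb_bits) - 1))
    return msb, lsb


def hetero_dot(inputs, weights, lsb_bits=4, analog_error=None):
    """Dot product with the MSB part on an exact path and the LSB part
    on a path that may be perturbed by a caller-supplied error model."""
    parts = [split_weight(w, lsb_bits) for w in weights]
    # Exact "digital" accumulation over the MSB part.
    digital = sum(x * m for x, (m, _) in zip(inputs, parts))
    # "Analog" accumulation over the LSB part; optionally perturbed.
    analog = sum(x * l for x, (_, l) in zip(inputs, parts))
    if analog_error is not None:
        analog = analog_error(analog)
    # Recombine: the MSB partial sum carries a weight of 2**lsb_bits.
    return digital * (1 << lsb_bits) + analog


if __name__ == "__main__":
    inputs = [1, 2, -3, 4, 5, -6]
    weights = [3, -5, 100, -77, 0, 12]
    exact = sum(x * w for x, w in zip(inputs, weights))
    print(exact, hetero_dot(inputs, weights))
    # A hypothetical additive error on the LSB path shifts the result
    # by exactly that amount; the MSB contribution is unaffected.
    print(hetero_dot(inputs, weights, analog_error=lambda s: s + 2))
```

In this toy example four of the six weights have a zero MSB part, which is the kind of sparsity the digital path could exploit; real weight distributions and bit widths will differ.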

Key words: In-memory computing, DNN, Heterogeneous, Accuracy improvement, Parameter split

Yizhe Chen, Hanjie Liu, Saiya Wang, Jinyao Mi, Xiaodi Xing, Yuexi Lv, Aifei Zhang, Lichuan Luo, Yong Pei, Minghua Tang, Wang Kang. A reconfigurable heterogeneous in-memory computing architecture for variable precision computation: a software-hardware co-design approach[J]. Moore and More, 2026, 2(1): 1-14.
