Research

Research Interests 🔬

My research interests are in general AI applications, including:

Natural language processing
- Named entity recognition
- Document analysis
Speech signal processing
- Speech synthesis
Natural image processing
- Super-resolution, denoising
- Classification, segmentation, detection
- Image-to-image translation, image generation
Medical image processing
- Ultrasound
- Mammography, X-ray, Computed Tomography
- Magnetic Resonance Imaging
Industrial image processing
- STEM-EDX tomography
- Baggage scanner

In addition, I use mathematical tools to understand the behaviour of artificial intelligence (AI) models, because this understanding is very important when developing and improving them. Furthermore, to simplify complicated problems caused by coupled physics functions, I prefer to include the physics functions as a specific layer in the AI architecture. The research topics below show what I have done in various fields.

Besides AI technology, I am also interested in developing geocoding and blockchain.

Research Topics 📝

Speech signal processing 🎙️

Emotional Speech Synthesis (Paper; ICASSP 2023 - under review, Demo)

With the rapid development of speech synthesis systems, recent text-to-speech models can generate natural speech similar to human speech, but they still have limitations in terms of expressiveness. In particular, existing emotional speech synthesis models achieve controllability by interpolating features with scaling parameters in an emotional latent space. However, it is difficult to control continuous emotional intensity in this latent space because the features are entangled. In this paper, we propose a novel method to control the continuous intensity of emotions using semi-supervised learning.
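The interpolation with a scaling parameter used by existing models can be sketched as follows. This is only an illustration: the embeddings are random vectors, and the names `interpolate_emotion`, `neutral`, and `happy` are hypothetical, not taken from the paper.

```python
import numpy as np

def interpolate_emotion(neutral, emotion, alpha):
    """Linear interpolation between two embeddings.

    alpha = 0 returns the neutral embedding, alpha = 1 the full emotion
    embedding; values in between act as the intensity scaling parameter.
    """
    return (1.0 - alpha) * neutral + alpha * emotion

rng = np.random.default_rng(0)
neutral = rng.normal(size=8)   # stand-in for a neutral-style embedding (assumption)
happy = rng.normal(size=8)     # stand-in for a "happy" embedding (assumption)

# Embeddings for five intensity levels from neutral to fully happy.
intensities = np.linspace(0.0, 1.0, 5)
embeddings = np.stack([interpolate_emotion(neutral, happy, a) for a in intensities])
```

When the latent features are entangled, moving along this straight line may change attributes other than the emotion, which is the limitation noted above.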

Natural image processing 📷

Super Resolution (Paper; CVPRW 2017 SISR challenge, Github)

The latest deep learning approaches perform better than the state-of-the-art signal processing approaches in various image restoration tasks. However, if an image contains many patterns and structures, the performance of these convolutional neural networks (CNNs) is still inferior. To address this issue, here we propose a novel feature space deep residual learning algorithm that outperforms the existing residual learning. The main idea originates from the observation that the performance of a learning algorithm can be improved if the input and/or label manifolds can be made topologically simpler by an analytic mapping to a feature space. Our extensive numerical studies using denoising experiments and the NTIRE SISR competition demonstrate that the proposed feature space residual learning outperforms the existing state-of-the-art approaches.

Medical image processing 🏥

Ultrasound (Paper; ISBI 2016)

To reduce the data rate for power-limited portable ultrasound imaging systems, various compressed sensing approaches have been investigated. However, most of the existing approaches require either hardware changes or computationally expensive forward modeling of wave propagation. To overcome these limitations, here we propose a novel low rank interpolation method that exploits the annihilation property of ultrasound measurements. We exploit the temporal correlation between the measurements from consecutive frames by augmenting their Hankel matrices side by side for low rank matrix completion. Reconstruction results confirmed that the proposed method can effectively reduce the data rate for ultrasound acquisition without sacrificing the image quality.
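The low-rank property behind this approach can be illustrated with a small sketch. It assumes toy signals (sums of two cosines sharing the same frequencies) in place of real ultrasound measurements, and the helper `hankel_matrix` is a hypothetical name introduced here. It only checks the rank; it does not perform the matrix completion itself.

```python
import numpy as np

def hankel_matrix(signal, window):
    """Stack sliding windows of `signal` as rows, so H[i, j] = signal[i + j]."""
    rows = len(signal) - window + 1
    return np.array([signal[i:i + window] for i in range(rows)])

n = np.arange(32)
# Two toy "frames" with the same two frequencies but different amplitudes and
# phases, standing in for temporally correlated measurements (assumption).
frame_a = np.cos(0.3 * n) + 0.5 * np.cos(1.1 * n)
frame_b = 0.8 * np.cos(0.3 * n + 0.2) + 0.7 * np.cos(1.1 * n - 0.4)

h_a = hankel_matrix(frame_a, window=12)
h_b = hankel_matrix(frame_b, window=12)
h_aug = np.hstack([h_a, h_b])  # side-by-side augmentation of the two frames

# Each cosine is a sum of two complex exponentials, so the expected rank is 4.
rank_a = np.linalg.matrix_rank(h_a, tol=1e-8)
rank_aug = np.linalg.matrix_rank(h_aug, tol=1e-8)
print(h_a.shape, rank_a, h_aug.shape, rank_aug)
```

In this toy setting the augmented matrix has twice as many columns as a single-frame matrix but the same rank, which is what makes a low rank completion of missing entries plausible.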

Mammography (Harvard Medical School)

Breast Position Monitoring is designed to evaluate whether the breast is well positioned in a 2D mammography image. The aim is to help detect poor-quality images caused by positioning issues and thereby reduce the number of recalls after breast exams. Most of the proposed AI models achieve an AUC above 0.8, which is better than the conventional algorithm developed by the GE Healthcare team.

Sparse-view CT (Paper; IEEE TMI)

X-ray computed tomography (CT) using sparse projection views is a recent approach to reduce the radiation dose. However, due to the insufficient projection views, an analytic reconstruction approach using the filtered back projection (FBP) produces severe streaking artifacts. Inspired by the recent theory of deep convolutional framelets, the main goal of this paper is, therefore, to reveal the limitation of U-Net and propose new multi-resolution deep learning schemes.

Region-of-Interest CT (Paper; MED PHYS)

Computed tomography for the reconstruction of a region of interest (ROI) has advantages in reducing the x-ray dose and allowing the use of a small detector. However, standard analytic reconstruction methods such as filtered back projection (FBP) suffer from severe cupping artifacts, and existing model-based iterative reconstruction methods require extensive computations. To address this, two types of neural networks are designed. The first type learns ROI size-specific cupping artifacts from FBP images, whereas the second type performs the inversion of the truncated Hilbert transform from the truncated differentiated backprojection (DBP) data. Their generalizability to different ROI sizes, pixel sizes, detector pitches and starting angles for a short scan is then investigated.
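Why the truncated Hilbert transform is hard to invert can be seen in a one-dimensional toy example. The sketch below assumes a periodic discrete Hilbert transform computed with the FFT and a box-shaped test signal; it is not the reconstruction pipeline from the paper, and `hilbert_transform` is a helper defined here for illustration.

```python
import numpy as np

def hilbert_transform(signal):
    """Periodic discrete Hilbert transform: spectrum times -i * sign(frequency)."""
    freqs = np.fft.fftfreq(len(signal))
    return np.fft.ifft(-1j * np.sign(freqs) * np.fft.fft(signal)).real

N = 257                      # odd length, so there is no Nyquist bin to special-case
x = np.arange(N)
f = ((x >= 100) & (x < 157)).astype(float)
f -= f.mean()                # the transform discards the mean, so remove it first

g = hilbert_transform(f)     # toy stand-in for one line of DBP data (assumption)
roi = (x >= 64) & (x < 192)  # samples assumed to be measured

recon_full = -hilbert_transform(g)         # with full data, the inverse is simply -H
recon_trunc = -hilbert_transform(g * roi)  # same formula applied to truncated data

err_full = np.max(np.abs(recon_full - f)[roi])
err_trunc = np.max(np.abs(recon_trunc - f)[roi])
print(err_full, err_trunc)
```

With full data the error stays at the level of floating-point rounding, while the truncated data leave a clearly visible error inside the ROI. The second network type described above is trained to remove this kind of error.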

Conebeam CT (Paper; IEEE TMI)

Conebeam CT using a circular trajectory is quite often used for various applications due to its relatively simple geometry. For conebeam geometry, the Feldkamp, Davis and Kress (FDK) algorithm is regarded as the standard reconstruction method, but this algorithm suffers from so-called conebeam artifacts as the cone angle increases. In this paper, we develop a novel deep learning approach for accurate conebeam artifact removal. In particular, our deep network, designed on the differentiated backprojection domain, performs a data-driven inversion of an ill-posed deconvolution problem associated with the Hilbert transform. The reconstruction results along the coronal and sagittal directions are then combined using a spectral blending technique to minimize the spectral leakage.

Interior tomography with low-dose X-ray CT (Paper; Phys Med Biol)

Many researchers have utilized image-domain deep learning (DL) approaches to remove individual artifacts and demonstrated impressive performance, and the theory of deep convolutional framelets explains the reason for the performance improvement. However, it is difficult to solve coupled artifacts using an image-domain convolutional neural network (CNN). To address the coupled problem, we decouple it into two sub-problems: (i) image-domain noise reduction inside the truncated projection to solve the low-dose CT problem and (ii) extrapolation of the projection outside the truncated projection to solve the ROI CT problem. The decoupled sub-problems are solved directly with the proposed end-to-end learning method using dual-domain CNNs.

MRI with Projection-reconstruction (Paper; Magn Reson Med)

The radial k-space trajectory is a well-established sampling trajectory used in conjunction with magnetic resonance imaging. However, the radial k-space trajectory requires a large number of radial lines for high-resolution reconstruction. Increasing the number of radial lines causes longer acquisition time, making it more difficult for routine clinical use. On the other hand, if we reduce the number of radial lines, streaking artifact patterns are unavoidable. To solve this problem, we propose a novel deep learning approach with domain adaptation to restore high-resolution MR images from under-sampled k-space data.
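The trade-off between the number of radial lines and acquisition time can be made concrete with a rough calculation. It assumes the commonly quoted Nyquist criterion of about π/2 times the matrix size for full-diameter radial lines; the matrix size of 256 and the 64 acquired lines are illustrative values, not figures from the paper.

```python
import math

def nyquist_spokes(matrix_size):
    """Approximate number of full-diameter radial lines needed to satisfy the
    Nyquist criterion at the edge of k-space: ceil(pi / 2 * matrix_size)."""
    return math.ceil(math.pi / 2 * matrix_size)

matrix_size = 256      # illustrative image size (assumption)
acquired_spokes = 64   # illustrative under-sampled acquisition (assumption)

required = nyquist_spokes(matrix_size)
undersampling_factor = required / acquired_spokes
print(required, round(undersampling_factor, 1))
```

Under these assumptions, acquiring 64 lines instead of the roughly 400 required shortens the scan by about a factor of six, at the cost of the streaking artifacts that the network is trained to remove.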

MRI with k-space interpolation (Paper; IEEE TMI)

The annihilating filter-based low-rank Hankel matrix approach (ALOHA) is one of the state-of-the-art compressed sensing approaches that directly interpolates the missing k-space data using low-rank Hankel matrix completion. The success of ALOHA is due to the concise signal representation in the k-space domain, thanks to the duality between structured low-rankness in the k-space domain and the image domain sparsity. Inspired by the recent mathematical discovery that links convolutional neural networks to Hankel matrix decomposition using data-driven framelet basis, here we propose a fully data-driven deep learning algorithm for k-space interpolation. Our network can also be easily applied to non-Cartesian k-space trajectories by simply adding a regridding layer.
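The duality between image domain sparsity and low-rankness in the k-space domain can be checked numerically in one dimension. The sketch below is a toy illustration under stated assumptions (a 64-sample signal, a hypothetical `hankel_matrix` helper, and a window size of 10), not the ALOHA algorithm or the proposed network.

```python
import numpy as np

def hankel_matrix(samples, window):
    """Stack sliding windows of `samples` as rows, so H[i, j] = samples[i + j]."""
    rows = len(samples) - window + 1
    return np.array([samples[i:i + window] for i in range(rows)])

N = 64
sparse_image = np.zeros(N)
sparse_image[[5, 20, 41]] = [1.0, -2.0, 0.5]  # three spikes (illustrative values)

rng = np.random.default_rng(0)
dense_image = rng.normal(size=N)              # non-sparse signal for comparison

# 1D "k-space" data: the discrete Fourier transform of each signal.
kspace_sparse = np.fft.fft(sparse_image)
kspace_dense = np.fft.fft(dense_image)

rank_sparse = np.linalg.matrix_rank(hankel_matrix(kspace_sparse, 10), tol=1e-8)
rank_dense = np.linalg.matrix_rank(hankel_matrix(kspace_dense, 10), tol=1e-8)
print(rank_sparse, rank_dense)
```

The Hankel matrix built from the k-space samples of the three-spike signal has rank 3, while the one built from the dense signal has full column rank. This redundancy is what low-rank Hankel matrix completion uses to interpolate missing k-space samples.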

Industrial image processing 🏭

CT Baggage Scanner (Paper; CT MEETING 2018)

For homeland and transportation security applications, 2D X-ray explosive detection systems (EDS) have been widely used, but they have limitations in recognizing the 3D shape of hidden objects. Among various types of 3D computed tomography (CT) systems to address this issue, this paper focuses on a stationary CT using fixed X-ray sources and detectors. However, due to the limited number of projection views, analytic reconstruction algorithms produce severe streaking artifacts. Inspired by the recent success of deep learning approaches for sparse view CT reconstruction, here we propose a novel image and sinogram domain deep learning architecture for 3D reconstruction from very sparse view measurements.

STEM-EDX Tomography (Paper; Nature Machine Intelligence)

Energy-dispersive X-ray spectroscopy (EDX) is often performed simultaneously with high-angle annular dark-field scanning transmission electron microscopy (STEM) for nanoscale physico-chemical analysis. However, high-quality STEM-EDX tomographic imaging is still challenging due to fundamental limitations such as sample degradation with prolonged scan time and the low probability of X-ray generation. To address this, we propose an unsupervised deep learning method for high-quality 3D EDX tomography of core–shell nanocrystals, which are usually permanently damaged by prolonged exposure to the electron beam. The proposed deep learning STEM-EDX tomography method was used to accurately reconstruct Au nanoparticles and InP/ZnSe/ZnS core–shell quantum dots, used in commercial display devices.