Applied Science and Convergence Technology 2023; 32(5): 122-126
Published online September 30, 2023
https://doi.org/10.5757/ASCT.2023.32.5.122
Copyright © The Korean Vacuum Society.
Sang-Bin Lee, Ji-Hoon Kim, Gwan Kim, Jun-Woo Park, Byung-Kwan Chae, and Hee-Hwan Choe*
School of Electronics and Information Engineering, Korea Aerospace University, Goyang 10540, Republic of Korea
Correspondence to: choehh@kau.ac.kr
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc-nd/4.0/) which permits non-commercial use, distribution and reproduction in any medium without alteration, provided that the original work is properly cited.
This study proposes a model that combines deep learning (DL) techniques with plasma simulations to efficiently investigate optimal process conditions. The DL model was trained using data obtained from an Ar/O2 inductively coupled plasma discharge simulation. Plasma discharge parameters such as the O2 ratio, pressure, and power were used as input data to predict the electron density, electron temperature, potential, and the densities of Ar+, O2+, O−, and O+. The performance of the DL model was verified by comparing its interpolation results, which predict a pattern within the range of the trained data, and its extrapolation results, which predict a pattern beyond that range, with the ground truth; both showed a low error rate. The proposed deep neural network model can significantly reduce the necessity for trial and error when adjusting the process conditions. This model is expected to be an effective tool for narrowing the process window during the early stages of equipment and process development.
Keywords: Deep learning, Plasma simulation, Interpolation, Extrapolation, Plasma process
Introducing plasma processes into semiconductor manufacturing has enabled the fabrication of high-aspect-ratio micropatterns [1,2]. These advancements have paved the way for patterns with complex three-dimensional stacked structures, improving the performance of semiconductor equipment [3,4]. The complexity of equipment structures increases as equipment performance improves; moreover, improvements in process capabilities increase the number of process steps required to improve the device performance [5].
Some plasma processes have been developed to achieve high density and uniformity [6,7]. Achieving the requisite plasma density and uniformity for semiconductor processes involves extensive trial and error because of the complex interactions between process gases and electrons [8], coupled with the effects of process variables, such as power and pressure [9].
Data-driven machine learning (ML) methods have been widely adopted across various scientific fields [10,11]. ML serves as an effective method for tackling complex and computationally expensive real-world systems [12]. ML-based optimization approaches have proven to be valuable tools for addressing intricate engineering problems, with potential applications in optimizing plasma processes [13].
Data from the simulations of Ar/O2 inductively coupled plasma (ICP) discharge were used for model training before designing the deep learning (DL) model. The DL model was optimized through six selection criteria, being trained with power, pressure, and O2 ratio as input data to predict the electron density, electron temperature, potential, and densities of Ar+, O2+, O−, and O+. The performance of the deep-learning model was validated by comparing its interpolation and extrapolation results against the ground truth. Interpolation predicts a constant pattern within the range of the trained data, while extrapolation extends predictions beyond that range [14].
This paper is structured to provide a comprehensive overview of the experimental details, encompassing the presentation of the DL model and optimization methodology. The results and discussion section delves into the detailed optimization results and validation of the DL model against the ground truth. The conclusion section summarizes the study.
Figure 1 depicts an ICP chamber with a radius and height of 30 cm each. Simulation results for the plasma discharge were obtained using COMSOL Multiphysics. The ICP chamber was subjected to the conditions listed in Table 1 to generate a dataset for prediction using the DL model. Subsequently, the electron density, electron temperature, potential, and densities of Ar+, O2+, O−, and O+ were calculated. The frequency and flow rate were fixed at 13.56 MHz and 250 sccm, respectively. The O2 ratio, pressure, and power were varied over 0.3–0.7 (ITI: 0.1), 10–50 mTorr (ITI: 10 mTorr), and 100–180 W (ITI: 20 W), respectively, to assess how the plasma variables change with these parameters when constructing the DL model.
Table 1. Chamber parameter conditions.
Parameter | Value |
---|---|
Frequency (MHz) | 13.56 |
Quality factor (Qf) | 250 |
Power (W) | 100 ~ 180 |
Pressure (mTorr) | 10 ~ 50 |
O2 ratio | 0.3 ~ 0.7 |
The prediction process follows a step-by-step procedure. This method can mitigate inaccurate predictions by sequentially predicting O2 ratios, pressures, and power levels; furthermore, this approach demonstrates superior performance compared to existing DNN models [15,16]. A total of 25 data files were generated by maintaining the pressure and power at fixed values and organizing the files depending on the O2 ratio. The optimal DNN model for the data was determined through a cross-validation process. The first prediction was based on the O2 ratio, followed by additional predictions based on pressure. Finally, we interpolated and extrapolated the three variables by predicting them based on power. Accuracy was confirmed by comparing the predicted data with the original data.
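The parameter sweep and the 25 data files described above can be enumerated with a short sketch. The grid values follow the ranges and intervals stated in the text; the function names and the in-memory grouping are assumptions made for illustration and do not reflect the actual file layout used in the study.

```python
from itertools import product

# Sweep values from the text: 0.3-0.7 (step 0.1), 10-50 mTorr (step 10), 100-180 W (step 20).
O2_RATIOS = [0.3, 0.4, 0.5, 0.6, 0.7]
PRESSURES_MTORR = [10, 20, 30, 40, 50]
POWERS_W = [100, 120, 140, 160, 180]


def build_cases():
    """Return every (O2 ratio, pressure, power) combination in the sweep."""
    return list(product(O2_RATIOS, PRESSURES_MTORR, POWERS_W))


def group_by_pressure_and_power(cases):
    """Group cases into one 'file' per fixed (pressure, power) pair.

    Hypothetical layout: each group lists the O2 ratios it contains.
    """
    files = {}
    for o2, pressure, power in cases:
        files.setdefault((pressure, power), []).append(o2)
    return files


cases = build_cases()
files = group_by_pressure_and_power(cases)
print(len(cases), "simulation cases in", len(files), "files")
```

Each group holds the five O2 ratios for one fixed pressure and power, which is the shape needed for the first prediction step along the O2 ratio.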
The DNN models were cross-validated using six criteria: input data normalization, output data normalization, train type, train network, activation function, and hidden layer. The cases within each criterion were compared to select the optimized DNN model. We compared the mean absolute error (MAE) of each case and selected the case with the lowest MAE (Table 2). The train network and hidden layer cases were not selected based on MAE, because excessively complex and large-scale neural networks overfit the data and lowered the accuracy [17,18]. Instead, we selected the most reasonable case while also considering the time required for training. By optimizing the DNN model in this way, we aimed to minimize both underfitting and overfitting.
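The MAE-based selection can be illustrated with a minimal sketch. The text reports MAE as a percentage but does not state how it was normalized; the version below assumes normalization by the mean absolute ground-truth value. The function names and the toy numbers are illustrative only.

```python
def mae_percent(truth, predicted):
    """Mean absolute error as a percentage of the mean absolute ground truth.

    Assumption: this normalization is one plausible reading of the
    percentages reported in the text, not a confirmed definition.
    """
    if len(truth) != len(predicted) or not truth:
        raise ValueError("inputs must be non-empty and equal in length")
    scale = sum(abs(t) for t in truth) / len(truth)
    if scale == 0:
        raise ValueError("ground truth is all zeros; relative error is undefined")
    abs_err = sum(abs(t - p) for t, p in zip(truth, predicted)) / len(truth)
    return 100.0 * abs_err / scale


def select_lowest(case_errors):
    """Return the label of the case with the smallest error."""
    return min(case_errors, key=case_errors.get)


# Toy data, not taken from the paper.
truth = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "case 1": [1.1, 2.1, 3.1, 4.1],
    "case 2": [1.0, 2.0, 3.0, 4.2],
}
errors = {name: mae_percent(truth, pred) for name, pred in candidates.items()}
best = select_lowest(errors)
print(errors, best)
```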
Table 2. Details of each case.
Category | Case | Details |
---|---|---|
Input normalization | 1 | None |
Input normalization | 2 | Z-score normalize |
Output normalization | 1 | None |
Output normalization | 2 | Log scale normalize |
Train type | 1 | Trainbr |
Train type | 2 | Trainbfg |
Train type | 3 | Trainlm |
Train network | 1 | Cascadeforwardnet |
Train network | 2 | Fitnet |
Train network | 3 | Feedforwardnet |
Activation function | 1 | None |
Activation function | 2 | ReLU |
Activation function | 3 | Tanh |
Activation function | 4 | Sigmoid |
Hidden layer | 1 | (n[1] = 20, n[2] = 20, n[3] = 20) |
Hidden layer | 2 | (n[1] = 10, n[2] = 40, n[3] = 10) |
Hidden layer | 3 | (n[1] = 20, n[2] = 20, n[3] = 20, n[4] = 20) |
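To give a sense of how the hidden-layer cases differ in size, the sketch below counts the trainable weights and biases for each case, assuming three inputs, seven outputs, and a plain fully connected feedforward layout. This is an illustrative calculation only; a cascade-forward network has additional connections that are not counted here.

```python
def count_parameters(n_inputs, hidden_layers, n_outputs):
    """Count weights and biases in a plain fully connected network."""
    sizes = [n_inputs, *hidden_layers, n_outputs]
    # Each layer contributes fan_in * fan_out weights plus fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))


# Hidden-layer cases from Table 2.
HIDDEN_LAYER_CASES = {
    1: (20, 20, 20),
    2: (10, 40, 10),
    3: (20, 20, 20, 20),
}

# Three inputs (O2 ratio, pressure, power) and seven predicted outputs.
counts = {case: count_parameters(3, layers, 7)
          for case, layers in HIDDEN_LAYER_CASES.items()}
print(counts)
```

Under these assumptions the three cases have 1067, 967, and 1487 parameters, respectively, so case 3 is the largest network of the three.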
Input and output data were normalized through Z-score normalization and log-scale normalization, respectively. Bayesian regularization (Trainbr) and the feedforward net were selected as the train type and train network, respectively. The activation function was the rectified linear unit (ReLU), and the hidden layer was set as (n[1] = 20, n[2] = 20, n[3] = 20), a selection deemed optimal for the DNN model. Applying Z-score and log-scale normalization serves the dual purpose of expediting the training and enhancing the generalization of the DNN [19]. Bayesian regularization is more robust than standard backpropagation and can reduce the need for lengthy cross-validation [20]. The feedforward network is notable for its efficiency in terms of the number of configured units and its capacity to generalize to patterns not included in the training dataset [21]. Notably, a ReLU DNN model with sufficient layers can reproduce any linear finite element function [22]. Predictions were made for seven outputs, namely, electron density, electron temperature, potential, and densities of Ar+, O2+, O−, and O+, based on three input variables: O2 ratio, pressure, and power. We trained the network with 20 neurons in each of the three hidden layers.
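The two normalizations can be sketched as follows. This is a minimal illustration that assumes the sample standard deviation for the Z-score and a base-10 logarithm for the output scaling; the text does not specify either choice, nor how non-positive outputs such as the potential would be handled. The density values are illustrative and not taken from the simulations.

```python
import math
from statistics import mean, stdev


def zscore(values):
    """Scale values to zero mean and unit sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values], mu, sigma


def log_scale(values):
    """Map strictly positive outputs (e.g. densities) to log10 space."""
    return [math.log10(v) for v in values]


def inverse_log_scale(values):
    """Undo log_scale so predictions return to physical units."""
    return [10.0 ** v for v in values]


# Input example: the power sweep from the text.
powers_w = [100, 120, 140, 160, 180]
scaled_power, mu, sigma = zscore(powers_w)

# Output example: illustrative densities spanning two orders of magnitude.
densities = [1.0e15, 5.0e15, 2.0e16, 8.0e16]
log_density = log_scale(densities)
recovered = inverse_log_scale(log_density)
print(scaled_power, log_density)
```

Compressing the outputs in log space keeps quantities that span several orders of magnitude on a comparable scale during training, and the inverse transform restores physical units for comparison with the ground truth.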
We selected the following variable values to verify the interpolation performance: O2 = 0.4, p0 = 35 mTorr, and Pw = 150 W, with the ground truth obtained from COMSOL (Multiphysics v. 6.1, COMSOL AB, Stockholm, Sweden). Interpolation predicts data within the bounds of the existing data. Because O2 = 0.4 is already included in the prepared data, the optimized DNN model predicted this condition in two steps, sequentially predicting the pressure and then the power from the prepared data for O2 = 0.4, p0 = 10–50 mTorr, and Pw = 100–180 W. In Step 1, we could predict p0 = 35 mTorr because our data included values for p0 = 10–50 mTorr. This yielded data for O2 = 0.4, p0 = 35 mTorr, and Pw = 100–180 W, from which Step 2 predicted the value for Pw = 150 W, culminating in the prediction for O2 = 0.4, p0 = 35 mTorr, and Pw = 150 W.
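The two-step interpolation flow can be sketched with a stand-in predictor. In the sketch below, piecewise-linear interpolation replaces the trained DNN, and `toy_quantity` is a hypothetical smooth function used in place of the COMSOL data; only the order of the steps (pressure first, then power) follows the text.

```python
def interp_1d(xs, ys, x):
    """Piecewise-linear interpolation on sorted xs; a stand-in for the DNN."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the data range")
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            return ys[i] + (ys[i + 1] - ys[i]) * (x - x0) / (x1 - x0)


PRESSURES = [10, 20, 30, 40, 50]     # mTorr
POWERS = [100, 120, 140, 160, 180]   # W


def toy_quantity(pressure, power):
    """Hypothetical plasma quantity at O2 = 0.4 (not simulation data)."""
    return 2.0 * pressure + 0.5 * power


data = {(p, pw): toy_quantity(p, pw) for p in PRESSURES for pw in POWERS}

# Step 1: for every power, predict along the pressure axis at 35 mTorr.
step1 = {pw: interp_1d(PRESSURES, [data[(p, pw)] for p in PRESSURES], 35)
         for pw in POWERS}

# Step 2: predict along the power axis at 150 W using the Step 1 results.
result = interp_1d(POWERS, [step1[pw] for pw in POWERS], 150)
print(result, toy_quantity(35, 150))
```

Because Step 2 consumes the output of Step 1, any error made in the first step carries into the second, which is why the reported error is discussed in terms of accumulation across steps.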
Figure 2(a) compares the data predicted by the DNN model in Step 2 with the ground truth from COMSOL. Seven variables were compared: electron density, electron temperature, electron potential, Ar+ ion density, O2+ ion density, O− ion density, and O+ ion density. The average MAE was exceedingly low (approximately 1 %), even considering error accumulation from Step 1 and Step 2.
Figure 2(b) compares the original and DNN-predicted density data at the center of the chamber. The solid lines and diamond symbols represent the original data and predicted values, respectively. The two were approximately identical, indicating that the selected optimized DNN model performed the interpolation effectively.
We selected the following variable values to verify the extrapolation performance: O2 = 0.2, p0 = 60 mTorr, and Pw = 200 W. Extrapolation predicts data outside the existing data range; here, the prepared data with O2 = 0.3–0.7, p0 = 10–50 mTorr, and Pw = 100–180 W were used to predict data for O2 = 0.2, p0 = 60 mTorr, and Pw = 200 W. All three variables lay beyond the range of the prepared dataset. We made a step-by-step prediction with the O2 ratio, followed by pressure and power. Step 1 involved predicting 25 datasets corresponding to O2 = 0.2, p0 = 10–50 mTorr, and Pw = 100–180 W. The predicted data for O2 = 0.2, p0 = 20 mTorr, and Pw = 120 W were compared with the original data to determine the performance of the trained model. Figure 3(a) compares the data predicted by the DNN model in Step 1 with the ground truth from COMSOL. The average MAE was determined to be 1.65 %.
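Whether a target condition requires interpolation or extrapolation in each variable can be checked against the trained ranges. The sketch below uses the ranges stated in the text; the dictionary keys and the `classify` helper are hypothetical names introduced for illustration.

```python
# Trained ranges from the text.
TRAINED_RANGES = {
    "o2_ratio": (0.3, 0.7),
    "pressure_mtorr": (10, 50),
    "power_w": (100, 180),
}

# Order of the step-by-step prediction: O2 ratio, then pressure, then power.
STEP_ORDER = ("o2_ratio", "pressure_mtorr", "power_w")


def classify(target):
    """Label each variable of a target condition, in step order."""
    labels = {}
    for name in STEP_ORDER:
        low, high = TRAINED_RANGES[name]
        inside = low <= target[name] <= high
        labels[name] = "interpolation" if inside else "extrapolation"
    return labels


interp_case = classify({"o2_ratio": 0.4, "pressure_mtorr": 35, "power_w": 150})
extrap_case = classify({"o2_ratio": 0.2, "pressure_mtorr": 60, "power_w": 200})
print(interp_case)
print(extrap_case)
```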
Figure 3(b) compares the predicted values for O2 = 0.2, p0 = 10–50 mTorr, and Pw = 120 W with the chamber-wide average values obtained from the original data, to assess the overall accuracy of the data predicted in Step 1. The following four variables were compared: electron density, O2+ density, potential, and electron temperature. The solid lines and diamond symbols represent the original and DNN-predicted data, respectively.
Step 2 used the data predicted in Step 1 for O2 = 0.2, p0 = 10–50 mTorr, and Pw = 100–180 W to predict data for O2 = 0.2, p0 = 60 mTorr, and Pw = 100–180 W. Figure 4(a) compares the chamber plot and MAE of the original and predicted data. Among the data for O2 = 0.2, p0 = 60 mTorr, and Pw = 100–180 W, the specific case of O2 = 0.2, p0 = 60 mTorr, and Pw = 120 W was validated. The image comparison revealed that the original and predicted data were approximately identical. The average MAE was approximately 4 %.
Figure 4(b) compares the average values of the original and predicted data for O2 = 0.2, p0 = 60 mTorr, and Pw = 100–180 W. Four variables were compared: electron density, O2+ density, potential, and electron temperature. The solid lines and symbols denote the original and predicted data, respectively. The two datasets corresponded closely overall despite some errors. The error in Step 3 was small because the accumulated error in Step 2 was adjusted.
Step 3 involved predicting O2 = 0.2, p0 = 60 mTorr, and Pw = 200 W based on the predictions made in Step 2 for O2 = 0.2, p0 = 60 mTorr, and Pw = 100–180 W. The O2 ratio and pressure were kept constant while the power data ranged from 100 to 180 W, so the value at 200 W could be predicted along the power axis. Figure 5(a) compares the ground truth and predicted data. The average MAE in Step 3 was 5.26 %.
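The three-step extrapolation chain (O2 ratio, then pressure, then power) can be sketched to show how the number of datasets shrinks at each step. A least-squares line stands in for the trained DNN, and `toy_quantity` is a hypothetical function used in place of the COMSOL data; neither reflects the model actually used in the study.

```python
def predict_at(xs, ys, x_new):
    """Fit a least-squares line to (xs, ys) and evaluate it at x_new.

    Stand-in for the DNN: unlike interpolation, it also returns values
    outside the range of xs.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y + slope * (x_new - mean_x)


O2 = [0.3, 0.4, 0.5, 0.6, 0.7]
P = [10, 20, 30, 40, 50]          # mTorr
PW = [100, 120, 140, 160, 180]    # W


def toy_quantity(o2, p, pw):
    """Hypothetical plasma quantity (not simulation data)."""
    return 10.0 * o2 + 2.0 * p + 0.5 * pw


data = {(o2, p, pw): toy_quantity(o2, p, pw) for o2 in O2 for p in P for pw in PW}

# Step 1: extrapolate along the O2 ratio to 0.2 (125 values -> 25 datasets).
step1 = {(p, pw): predict_at(O2, [data[(o2, p, pw)] for o2 in O2], 0.2)
         for p in P for pw in PW}

# Step 2: extrapolate along pressure to 60 mTorr (25 -> 5 datasets).
step2 = {pw: predict_at(P, [step1[(p, pw)] for p in P], 60) for pw in PW}

# Step 3: extrapolate along power to 200 W (5 -> 1 value).
step3 = predict_at(PW, [step2[pw] for pw in PW], 200)
print(len(step1), len(step2), step3)
```

The toy quantity is linear, so the stand-in recovers it without error; with real simulation data each step would add its own error to the inputs of the next step.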
Figure 5(b) compares the densities of electrons, Ar+, and O2+ at the center of the chamber. The error between the original (solid line) and predicted data (symbol) was small. The MAE was larger than in Step 2 owing to the accumulation of errors. However, an average MAE of 5.26 % is still a meaningful result, and the MAE was lower than 10 % for all variables.
We also predicted O2 = 0.8, p0 = 60 mTorr, and Pw = 200 W to further verify the extrapolation performance of the optimized DNN model. Our objective was to demonstrate the feasibility of bidirectional extrapolation by predicting both O2 = 0.2 and O2 = 0.8 data from the prepared data spanning O2 = 0.3–0.7. We predicted the seven variables in three steps, following the order of O2 ratio, pressure, and power. Because Step 1 and Step 2 follow the same procedure as in the O2 = 0.2 case, we omit them here and present only Step 3.
Figure 6(a) compares the ground truth and predicted data images, together with the MAE values. The average MAE observed in Step 3, with O2 = 0.8, p0 = 60 mTorr, and Pw = 200 W, was 5.61 %. Notably, the MAE was 8.44 % or less for all variables. Figure 6(b) compares the average values of the data at the center of the chamber. The original and predicted data were approximately identical. The selected optimized DNN model thus demonstrated that two-way extrapolation is possible.
This study proposes an efficient model for determining the optimal process conditions by combining a DL model with plasma simulation data. We eliminated complex equations and multiple models using a DL model based on data pattern learning to understand the relationship between the given process conditions and plasma states.
The DL model demonstrates fast inference capability with respect to the ground truth and provides a reliable correlation between the process conditions and plasma, verified by the experimentally predicted plasma uniformity. A step-by-step algorithm and optimization iterations can be applied to determine the optimal processing conditions for plasma uniformity owing to the fast inference capability of the DL model. This approach promises substantial reductions in the iterative trial and error experimentation typically associated with exploring optimal process conditions. Furthermore, the applicability of the model can be broadened by controlling the process variables and dataset of the plasma simulation, indicating that the model can be applied to different process conditions in ICP.
The combined model can be a powerful tool for narrowing process windows and efficiently controlling complex plasma processing by considering different process and hardware conditions.
This work was supported by the National Research Council of Science & Technology (NST) grant from the Korean Government (MNIST) (No. CRC-20-01-NFRI). We would like to thank Editage (www.editage.co.kr) for English language editing.
The authors declare no conflicts of interest.