Incorporating Mobile-based Artificial Intelligence to English Pronunciation Learning in Tertiary-level Students: Developing Autonomous Learning

EFL pronunciation learning is known to be teacher-centered, with students relying on the feedback given by their teachers. Fortunately, technology-based learning tools are believed to be able to reduce students' dependency on teachers. ELSA Speak, a mobile-based application with built-in Automatic Speech Recognition (ASR) technology, could help students with pronunciation learning by providing prompt feedback and a detailed evaluation of its users' pronunciation accuracy. The objective of this study was to examine the use of the mobile-based application ELSA in pronunciation learning and whether it could stimulate learners to become more autonomous. A mixed-methods approach was employed. The quantitative data, in the form of pre-test and post-test scores, were gathered from students' pronunciation recordings available in ELSA and analyzed using a paired sample t-test. In addition, a Google Form questionnaire on a five-point Likert scale was used to support the quantitative findings. Meanwhile, semi-structured interviews with selected participants provided the qualitative data, which were analyzed thematically based on the participants' responses. The results show that incorporating mobile-based technology into English pronunciation learning could provide an opportunity for students to improve their pronunciation. Moreover, students could broaden their repertoire of available practice strategies, which might be useful for improving their autonomous learning ability.


Introduction
For decades prior to the pandemic, learners relied heavily on deliberate, isolated practice of each component of pronunciation skills [1] and were accustomed to learning pronunciation through role-modeling activities in face-to-face settings led by the teacher [2]. Since English is a foreign language for them, most learners are unable or unwilling to make an effort without being persistently overseen and corrected by their teachers [2], [3]. Hence, in the new normal, educators have shifted their teaching approach from the typical classroom environment to fully online learning or to hybrid and/or blended learning. In this post-COVID-19 period, technology-integrated teaching has become an important area for teachers to explore in carrying out distance learning, owing to its potential to help language teachers and learners accomplish their academic goals [4]-[6].
Online learning is flexible: students can personalize their learning and excel in self-managed learning, since they can learn remotely from home or other convenient places while collaborating and communicating with their peers and instructors [5]-[7]; this also holds for English Language Teaching. Unfortunately, pronunciation is challenging for learners and difficult to acquire autonomously, since it requires considerable practice and feedback [2] to achieve successful communication and communicative competence [8], [9]. Learners need to develop their pronunciation, as it is fundamental to learning to speak and one of the most essential components of speaking skills, alongside fluency, vocabulary, and grammar [3]. Learning pronunciation demands proactive effort because of the complexity of English, and it is hard to master because the way words are pronounced often differs from the way they are written.
In pronunciation learning, learners should focus on sub-skills such as word and sentence stress, intonation, rhythm, and sound production, which relate to phonetics and phonology [1]. Many previous empirical studies [1]-[3], [8], [10] suggest that the essential aspect of learning pronunciation is whether learners can use the target language intelligibly and comprehensibly in communication. In other words, learners should pronounce the language well enough to be understood by their interlocutors rather than aiming for native-like accuracy, while also increasing their self-confidence and their ability to monitor and adjust their speech strategies [11]. In this new normal, implementing effective self-regulated learning strategies together with technology-integrated language teaching and learning is essential [12]-[14].
Therefore, despite the limitations and challenges teachers and learners face in using technology as a learning tool, teachers are strongly encouraged to explore and apply meaningful, productive strategies for using everyday digital technology such as mobile phones to enhance language learning [15]-[17]. From the range of available tools, they can identify the most effective ones: tools that are practical to learn and use, integrated with the curriculum topics from the start, and free of charge [18]-[20]. They can then experiment with a tool before implementing it with their students [6]. In this way, mobile phones and other technologies can become more than mere tools: they can prepare learners for lifelong learning and serve as a powerful toolkit for improving their self-confidence and competence [21]-[23].
However, teachers have not yet maximized the use of technology in pronunciation classes to increase learners' autonomy, because some of them are uncertain about how to integrate mobile or other technologies into the learning process and make this learning reassuring for students. Therefore, it is equally important to examine pronunciation learning designed around technology-based perceptual training. Such training may help teachers reinforce accurate pronunciation and the proper use of words that are difficult to master without instruction [10], [24], [25], and encourage more independent practice that enhances students' pronunciation and draws their attention to novel segmental forms [18], [19].
Hence, this study aims to examine whether the AI pronunciation application can encourage learners to practice and act as autonomous agents in their own learning. Through self-regulated learning and the assessments and feedback they receive, the application is expected to strengthen learners' ability to control their learning environment. To some extent, the instructional methods used by the teachers in this study enable students to demonstrate the required level of pronunciation achievement and shift the teachers' role toward that of a facilitator and resource for autonomous learning. By and large, the methodology employed and the level of development reached in this study may in future be acknowledged as state-of-the-art in teaching and learning pronunciation.

A. Language Learning and Autonomy
Nowadays, teachers employ more innovative approaches to meet language learners' need to be more communicative and to construct their own knowledge. Learning has shifted from teacher-centered to student-centered, with learners expected to become competent by developing their learning and relating it to their prior knowledge, while at the same time becoming aware of the purpose, method, observation, and evaluation of their learning [27], [28], and learning to discuss, engage, and interact with others [26], [29], [30].
In language learning, learners are encouraged to be autonomous, since independence allows them to acquire knowledge of the target language, be attentive and critical in their learning, and design learning environments suited to their needs and learning styles [26], [31]. Being autonomous also means building interpersonal relationships with their teachers and classmates, which is a prerequisite of the student-centered learning approach. To some extent, collaborative exploration of the language can reduce the dominant role of the teacher [2], who then acts only as a facilitator, encouraging students to be creative and innovative in constructing knowledge effectively [29].
However, some previous studies [2], [32]-[35] argued that autonomous learning can occur when both teachers and students share the same perspective and readiness. Learner autonomy can be fully established when teachers are able to promote and practice autonomy with their students effectively, which means that teachers themselves should become autonomous first [2], [30], [35].
Particularly in the Asian context, where teachers' beliefs have a powerful influence on their students' learning, teachers should understand the concept of autonomy before putting it into practice with their students [33], [34]. Given this social and cultural context, teachers are expected to improve their relationships with their students by providing more constructive guidance and creating conducive learning environments that meet students' developmental, emotional, and academic needs [30], [36].

B. Mobile-Based Technology in Pronunciation Learning
Several studies have sought to improve English learners' pronunciation using mobile-based technologies [37], [38]. Pedagogically, two kinds of mobile-based technology can be used in pronunciation classes: first, combining in-class lessons with cutting-edge, widely used ASR technology; second, using non-ASR software. The major differences lie in the corrective feedback provided and the teaching approach. With respect to learners' needs, ASR technology is more suitable than non-ASR software, as ASR feedback is customized to each learner's ability, whereas non-ASR software focuses on learners' responses to isolated sounds and short sentences, which are compared with a pronunciation model even when the two utterances do not coincide [2], [39].
Therefore, many experts agree that ASR technology can be a great supplement and a valuable tool for teaching pronunciation, as it supports learners in practicing frequently and receiving real-time assessment [37]. Owing to its fit with learning objectives, its quality and accuracy, its practicality, and its cost, many teachers prefer ASR technology in their pronunciation classes [18, p. 196]. This matters particularly in recent times, when the need to be intelligible and comprehensible is significant while the scheduled time for face-to-face pronunciation teaching is limited because teachers carry heavy workloads. Thus, technology is needed in pronunciation learning so that learners can improve their pronunciation autonomously in a stress-free environment [2], [24].
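The contrast drawn in this section between ASR feedback and comparison against a fixed model can be made more concrete. The sketch below is not ELSA Speak's algorithm, which the study does not describe; it is a hypothetical illustration that assumes the recognizer has already converted the learner's utterance into a sequence of phoneme symbols. It aligns that sequence with the target pronunciation using a standard edit-distance (Levenshtein) alignment, so that each error is reported separately, for example a full vowel produced in place of schwa or a dropped final consonant. The function name `align_phonemes` and the phoneme symbols are illustrative assumptions.

```python
def align_phonemes(target, spoken):
    """Return (op, target_phoneme, spoken_phoneme) tuples for each mismatch.

    Hypothetical sketch: ops are "substituted", "omitted" (target phoneme
    missing from the learner's speech) and "inserted" (extra phoneme).
    """
    n, m = len(target), len(spoken)
    # dp[i][j] = minimum number of edits turning target[:i] into spoken[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if target[i - 1] == spoken[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # phoneme omitted
                           dp[i][j - 1] + 1,         # extra phoneme inserted
                           dp[i - 1][j - 1] + cost)  # match or substitution
    # Walk back from the bottom-right cell to recover which edits were made.
    edits, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (target[i - 1] != spoken[j - 1]):
            if target[i - 1] != spoken[j - 1]:
                edits.append(("substituted", target[i - 1], spoken[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            edits.append(("omitted", target[i - 1], None))
            i -= 1
        else:
            edits.append(("inserted", None, spoken[j - 1]))
            j -= 1
    return list(reversed(edits))


# "about": schwa realized as a full vowel and the final /t/ dropped.
print(align_phonemes(["ə", "b", "aʊ", "t"], ["æ", "b", "aʊ"]))
# -> [('substituted', 'ə', 'æ'), ('omitted', 't', None)]
```

Per-phoneme output of this kind is what allows feedback to target a learner's specific errors rather than simply marking a whole utterance as matching or not matching a model.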

C. Technology and Self-directed Learning
Many foreign language learners are substantially concerned about their anxiety in learning and using the language, which has become a major obstacle to their fluency and language acquisition. Previous empirical studies [38]-[42] highlighted that, of the four English skills, speaking is the most affected, as learners are cautious about whether their pronunciation is intelligible. As technology has become more established in educational settings in recent years because of the pandemic [16], it enables learners to practice more in their own time and space, where they can develop their competencies in an encouraging setting [16], [39], [42], [43], and it opens new opportunities for improving teaching and learning [17].
Learners and teachers can use technology in educational contexts in one of two key ways: by "learning from" it or "learning with" it. Technology-assisted learning is concerned with the strategic application of technology to enhance traditional learning methods in which students are largely passive participants [23], [44], [45]. Groff [17] pointed out that learning with technology opens up new options that improve teaching and learning. One role technology can play in this process is tailoring learning to individual learners' requirements, an approach strongly supported by the learning sciences. However, the elements of the system, such as teachers' and students' attitudes toward and capability with technology, and institutional support, should be interconnected to form a whole [46]. Learning from technology, by contrast, integrates multimedia such as computer-based learning to improve classroom learning. By shifting laborious or repetitive tasks to a computer, instructional technologies serve as tools that can benefit both educators and students while they actively participate in educational activities [45], [46]. An example is the use of online dictionaries while students complete an assigned task in class [47].
Integrating digital devices to support learners in customizing their learning environments can noticeably develop learner autonomy, given the affordances these devices offer for individual learner needs [17], [43], [48], which may also benefit learners' language achievement [38], [48]. Despite teachers' conflicts over using technology in the teaching-learning process and over the shift toward learner-centeredness, giving learners opportunities to manage their learning throughout their lifetime through technology enriches their learning experiences and leads to language improvement [21], [43], [49], [50].

Methodology
This study focused on ELSA Speak, a mobile-based application for independent pronunciation practice; for each meeting, the students were assigned a particular task to carry out with the application. This Automatic Speech Recognition (ASR) application is a high-tech version of the audio-lingual style, in which users learn through a great deal of mechanical repetition and receive automated feedback as they work toward maximum accuracy. Before the study, the students were asked to download the ASR app to their cell phones and were given basic instructions on how to use it and incorporate it into their tasks.

A. Research Design
A mixed-methods approach was applied to examine changes in the students' learning behavior. The quantitative data consisted of the participants' pre-test and post-test scores, which were compared using a paired sample t-test. In addition, an online questionnaire with Likert scale questions was administered to measure the participants' beliefs, attitudes, and opinions. The questionnaire items addressed the participants' learning habits after taking part in the study, namely how they self-directed their learning, their learning situation [35], and their pronunciation proficiency. Subsequently, qualitative data were collected through semi-structured interviews as a follow-up to the quantitative analysis, to broaden and cross-validate its findings [51]. The interviews explored in depth the participants' beliefs and opinions about their level of autonomy and empowerment in pronunciation skills.

B. Participants
A total of 26 students of the English Department, Politeknik Negeri Padang, were recruited for this study. These students were taking Speaking 2, the continuation of the Speaking 1 course they had taken in the previous semester. Therefore, the participants were considered to have adequate knowledge of speaking skills. The class usually met once a week for about 100 minutes. Class time was mainly devoted to role-play activities in which the students practiced speaking with their peers, with the teacher acting as facilitator. At the end of each class, the teacher assigned pronunciation tasks and explained how the students could practice them at home.
The participants were aged 19 to 20, and 18 of the 25 students were female. From the beginning of Speaking 2, they were introduced to the ELSA Speak app as an additional tool for practicing their pronunciation independently. Furthermore, at the end of each class meeting, an evaluation was carried out to monitor the participants' use of the app and how frequently they used it.

C. Data Collection Procedures
Before taking part, the participants were informed of the study procedure, anonymity, and confidentiality, and signed an informed consent form. Initially, the participants were asked to install the app on their mobile phones and were shown how to access it and which stages to follow. The first set of data was obtained from the pre- and post-tests provided in the app. During the three months between the pre- and post-tests, the participants were trained in using the artificial intelligence app through practice-oriented sessions in the weekly class meetings.
The second set of data was gathered from an online questionnaire distributed to the participants via Google Form (GForm). The questionnaire contained 10 items covering three dimensions of autonomy, each rated on a 1-5 Likert scale: (1) Strongly disagree; (2) Disagree; (3) Neither agree nor disagree; (4) Agree; (5) Strongly agree. The questionnaire was sent through a WhatsApp group, and the students submitted their answers within three days. The dimensions of the questionnaire were adopted from Alzieni [36], who highlighted three requisite dimensions that can be used to measure the development of language learner autonomy [36, p. 1022]: 1) inquiry-based learning skills, 2) metacognitive skills, and 3) emotional intelligence skills. In his experiment with foundation program students, integrating mobile learning into English language learning was found to foster learner autonomy. Of the ten items in the questionnaire, four addressed the first dimension (inquiry-based learning skills), four addressed the second (metacognitive skills), and the last two addressed the third (emotional intelligence skills).
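The grouping of the ten items into three dimensions can be expressed as a simple scoring step. The sketch below averages one respondent's ratings within each dimension. It assumes, as a hypothetical choice, that the items are ordered in consecutive blocks (items 1-4, 5-8, 9-10); the study gives only the item counts per dimension, not their order. The names `DIMENSIONS`, `LIKERT`, and `dimension_scores` are illustrative.

```python
from statistics import mean

# Hypothetical item-to-dimension map: the study reports 4 + 4 + 2 items but
# not their order, so consecutive blocks are assumed (0-based item indices).
DIMENSIONS = {
    "inquiry-based learning": range(0, 4),   # items 1-4
    "metacognitive": range(4, 8),            # items 5-8
    "emotional intelligence": range(8, 10),  # items 9-10
}

LIKERT = {1: "Strongly disagree", 2: "Disagree", 3: "Neither agree nor disagree",
          4: "Agree", 5: "Strongly agree"}


def dimension_scores(ratings):
    """Average one respondent's ten 1-5 ratings within each autonomy dimension."""
    if len(ratings) != 10:
        raise ValueError("expected ratings for exactly 10 items")
    if any(r not in LIKERT for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return {name: mean(ratings[i] for i in items) for name, items in DIMENSIONS.items()}


print(dimension_scores([4, 4, 5, 3, 3, 4, 4, 5, 5, 4]))
```

Averaging within a dimension, rather than summing, keeps the two-item third dimension on the same 1-5 scale as the four-item dimensions, so the three scores remain directly comparable.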
The third set of data was derived from semi-structured interviews, conducted as a follow-up to the questionnaire results to elicit realistic responses. Ten students volunteered for the interviews, which were conducted via Zoom Meetings, followed an interview protocol, and were recorded. The interview questions were tailored to the study to find out to what extent autonomous pronunciation learning with the mobile-based app had made a difference to the students' self-regulated learning and to their pronunciation ability. Bahasa Indonesia, the participants' L1, was used in the interviews so that the participants could give more specific and detailed perspectives on the research topic [52], [53].

D. Data Analysis Technique
For the data analysis, the pre- and post-test results were analyzed with a paired sample t-test to compare the means of the two tests. The questionnaire data were analyzed descriptively to determine the degree of agreement on the three dimensions of learner autonomy. The interview data were transcribed and sent to the participants for approval of the written transcripts and of the use of certain excerpts in the analysis. The transcripts were then analyzed narratively to identify broader patterns addressing the research questions [54].
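The paired sample t-test works on each student's difference between the two tests rather than on the two groups separately: t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom, where d is the per-student difference and n the number of students. The study ran this test in SPSS 27; the sketch below is only an illustration of the same statistic with Python's standard library, not the authors' procedure. The function name `paired_t` is an assumption, and differences are taken as pre-test minus post-test, so an improvement yields a negative t.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t, df) for a paired-samples t-test on matched score lists.

    Differences are computed as pre minus post, so higher post-test scores
    give a negative t.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must contain one score per student")
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two students")
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    if se == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / se, n - 1


# Four hypothetical students (not study data).
t, df = paired_t([60, 55, 70, 65], [62, 58, 69, 70])
print(t, df)
```

Because each student serves as their own control, between-student differences in baseline ability drop out of the comparison, which is why the paired test rather than an independent-samples test suits a pre-test/post-test design.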

A. Correspondences between Pre- and Post-Tests on Schwa <ə> and Ending Sounds
Before conducting the paired sample t-test, two assumptions had to be fulfilled: there must be no outliers in any group, and the data should be normally distributed. Outliers were identified using boxplots: of the 26 students who took the pre- and post-tests, two had outlying scores. One student's pre- and post-test scores for the ending sound were outliers, and the other student's pre-test score for the schwa <ə> sound was an outlier.
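The study does not state the exact rule its boxplots used. Boxplots conventionally flag points lying more than 1.5 interquartile ranges (IQR) beyond the first or third quartile (Tukey's fences), so the sketch below assumes that rule. Because the test is paired, a student flagged on either test is removed from both lists to keep the scores matched. Quartiles here come from Python's `statistics.quantiles` with the inclusive method, which may differ slightly from the hinges a given statistics package computes, so borderline cases could be classified differently. The function names are illustrative.

```python
from statistics import quantiles


def iqr_outliers(scores, k=1.5):
    """Indices of scores outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, s in enumerate(scores) if s < low or s > high]


def drop_outlier_students(pre, post, k=1.5):
    """Remove every student flagged on either test, keeping pre/post pairs matched."""
    flagged = set(iqr_outliers(pre, k)) | set(iqr_outliers(post, k))
    kept = [i for i in range(len(pre)) if i not in flagged]
    return [pre[i] for i in kept], [post[i] for i in kept]


# Hypothetical scores: the last student's pre-test score is far below the rest.
pre = [60, 62, 65, 63, 61, 64, 20]
post = [62, 63, 66, 64, 62, 65, 61]
print(iqr_outliers(pre))                 # -> [6]
print(drop_outlier_students(pre, post))  # the seventh student is dropped from both lists
```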
The data from these two students were excluded from the paired t-test calculation. Table 1 below shows the boxplot after the outliers were removed. With no outliers remaining, a Shapiro-Wilk test was conducted to check normality. The Sig. value for all groups was above 0.05, indicating that all groups were approximately normally distributed. Once it was confirmed that there were no outliers and that the data were approximately normally distributed, a paired samples t-test was performed in SPSS 27 to determine the effect of the tasks on the pronunciation scores (schwa <ə> and ending sound), as shown in Tables 2 and 3 below. The results present (1) the mean difference between pre- and post-test scores and (2) the test of significance. The mean score for the schwa <ə> sound increased from the pre-test (M = 60.33, SD = 10.635) to the post-test (M = 62.71, SD = 10.921). Likewise, the mean score for the ending sound increased from M = 65.28, SD = 13.281 in the pre-test to M = 68.60, SD = 9.399 in the post-test.
However, the paired sample t-test also showed that the differences between the pre- and post-test means of the two variables were not significant. For the schwa <ə> pronunciation score before and after using the speech recognition application, the result was t(36) = 1.086, p = .284. For the ending sound score before and after using the app, the results were t(23) = -1.135, p = .268 and t(23) = -1.248, p = .224 (see Table 2). Although significance was not reached, the results in Table 3 indicate that the students' pronunciation scores for both the schwa <ə> and ending sounds increased slightly.
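A reported t value and its degrees of freedom together determine the two-sided p-value, so each reported pair of values can be cross-checked independently of SPSS. The sketch below does this with the standard library only, by numerically integrating the density of Student's t distribution with Simpson's rule. In practice `scipy.stats.t.sf` would be the usual choice; this hand-rolled version, and the names `t_pdf` and `two_sided_p`, are illustrative assumptions.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| >= |t|), using Simpson's rule on [0, |t|]."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * central_area)


# Cross-check a reported result, e.g. the ending-sound value t(23) = -1.248.
print(round(two_sided_p(-1.248, 23), 3))
```

The result can be compared with the p-value reported alongside each t. Since the t distribution is symmetric, the sign of t, which only reflects the order in which the two tests were subtracted, does not affect the two-sided p-value.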

B. Evidence from the Questionnaires
All 26 participants were asked to reflect on and assess their development in self-directed learning with the app they had used. The items in the first dimension addressed inquiry-oriented activity, in which the participants constructed knowledge creatively and critically, either on their own or in cooperation with peers or groups, to complete the tasks given. The results revealed that most of the students (N = 14, 73.1%) found the exercises in the app challenging and interesting for self-directed pronunciation learning, as the exercises prompted them to keep discovering how to pronounce words intelligibly and comprehensibly. They also strongly agreed (N = 17, 65.4%) that the app's learning system made them aware of their strengths and weaknesses, so that they could plan their learning systematically toward the targeted outcome for their performance. Additionally, slightly more than half of the students (N = 14, 53.8%) agreed that frequently doing the exercises in the app improved their critical and creative thinking and encouraged them to work with their classmates in doing and evaluating the exercises without their teachers present [36], [55]. Empirical studies [40], [56]-[58] have shown that students who enjoy the learning atmosphere tend to develop self-efficacy and identity-related orientations that increase their willingness to study and improve their academic achievement.
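The descriptive figures reported in this section pair a count N with that count's share of the respondents. The sketch below shows this tabulation for a single questionnaire item. It assumes that responses are stored as integers from 1 to 5 and that percentages are rounded to one decimal place; the storage format and the names `LABELS` and `likert_distribution` are illustrative assumptions, not taken from the study.

```python
from collections import Counter

LABELS = {1: "Strongly disagree", 2: "Disagree", 3: "Neither agree nor disagree",
          4: "Agree", 5: "Strongly agree"}


def likert_distribution(ratings):
    """Map each Likert label chosen at least once to (count, % of respondents)."""
    if not ratings:
        raise ValueError("no responses")
    if any(r not in LABELS for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    n = len(ratings)
    counts = Counter(ratings)
    return {LABELS[level]: (counts[level], round(100 * counts[level] / n, 1))
            for level in sorted(LABELS) if counts[level]}


# Hypothetical item answered by 26 respondents (not study data).
print(likert_distribution([4] * 13 + [5] * 13))
# -> {'Agree': (13, 50.0), 'Strongly agree': (13, 50.0)}
```

Each percentage is computed against all respondents, including those who chose other levels, so the shares for a single item sum to roughly 100%, allowing for rounding.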
The results for the second dimension, metacognitive skills, were more varied, with less agreement in the students' reflections, particularly on the difficulties they encountered in employing specific strategies while completing the exercises in the app (Neither agree nor disagree N = 6, 23.1%; Agree N = 12, 46.2%; Strongly agree N = 7, 26.9%). Although Akyol [8] found no correlation between students' pronunciation ability and the learning strategies they employed, this result is in line with the view of McCrocklin [2] and Kim [48] that the more comprehensively students apply strategies in their learning, the more experience they gain across learning contexts, particularly in understanding and supporting their classmates. The students also indicated that the materials they found in the app differed from what they received in the classroom (Agree N = 7, 26.9%; Strongly agree N = 8, 30.8%). As Liu et al. [37] pointed out, in recent years most teachers have considered spending extra time on teaching pronunciation time-consuming and impractical, especially in large classes. Despite these problems, the students felt that they had improved their pronunciation skills after learning the materials provided in the app and doing the exercises (N = 15, 57.7%). The highest agreement in this dimension concerned the new essential knowledge they had gained for being intelligible and comprehensible in spoken language (N = 17, 65.4%). To some extent, this conforms with the studies of Gilakjani [59], Tuan [29], and Dornyei [55], which found that providing frequent practice and encouragement in the learning process can increase students' awareness of the importance of learning and improving their pronunciation beyond the classroom.
In the third dimension, emotional intelligence skills, the students reported that the self-access learning tool had made them accountable for, and motivated by, their successes and failures (N = 20, 76.9%). Most of them (N = 23, 84.9%) affirmed that the stress-free environment and the training and guidance provided by the teachers made them more reflective and critical about their learning, and improved their relationship management and social awareness when collaborating with their classmates on the exercises in the app. Having the freedom to explore and experiment with the materials and exercises outside class, being able to manage and control their learning, and receiving immediate feedback on their work developed their self-awareness and self-management skills for future success [2], [4], [36], [43].

C. Students' In-depth Interviews
The semi-structured interviews were conducted for data triangulation, to provide further insight into the students' views on the level of autonomy they achieved in pronunciation learning through the ASR app. The qualitative interview data reinforced the findings from the paired sample t-test and the questionnaire. The majority of the students pointed out that using the ASR app in out-of-classroom activities motivated them to learn how to pronounce words correctly so that they could be understood by their interlocutors. Setting their own learning time according to their preferences, without pressure or fear of being laughed at when they made mistakes, facilitated the language acquisition process. The direct feedback they received from the application on their pronunciation errors, together with guidance on improving their pronunciation, increased their self-esteem and reduced their anxiety in speaking, which in turn promoted self-directed learning.
One student highlighted, "I always enjoy using this application. I can start using it from the level that I want, and feel comfortable with. I also like it because I can set the time that I want to study, and usually, I set the alarm in my cell phone to remind me". Another student commented, "The app was so helpful because it gave me the freedom to complete the lessons and the exercise level-by-level. It challenged me to complete the exercises and to keep practicing and say the words with correct pronunciation without receiving any judgment on the errors I made". These findings suggest that the more flexible and comfortable the students' learning atmosphere, the better their performance in pronouncing words intelligibly and comprehensibly, and the more confident they were to engage actively in class [60], [61]. Additionally, most of the students reported that learning from the ASR app and drilling their pronunciation with it before practicing with their peers positively affected their psychology and their behavior in communicating with their teachers and performing during class time [57], [60], [62].
The results indicated that the activities in the app attracted the students' curiosity and interest and stimulated them to excel and reach their learning targets. As Dornyei (1994) stated, intrinsic motivation can be the central motivator of students' learning, since this kind of learning is not compulsory as it is at school. In other words, learning is likely to flourish academically when students are sufficiently self-determined, especially with respect to their need for achievement and self-confidence [55], [58], [63].

Conclusions
Teachers can help their students use metacognitive strategies to improve their learning through self-directed learning patterns, while also reducing the workload and time spent in the pronunciation class. Using ASR technology has a positive impact on students' learning habits, as they can develop their pronunciation skills outside the classroom and strengthen their social and relationship awareness among their peers without their teachers present. In student-centered learning, teachers act as facilitators who promote suitable tools for students to learn independently, provide opportunities for students to practice and control their learning, and encourage them to work on being intelligible and comprehensible in their pronunciation. Even though students sometimes find the ASR technology challenging and demanding of considerable effort, they acknowledge that the materials and exercises provided by the app stimulate their critical thinking and problem-solving abilities. They realize that they can gradually develop their pronunciation through frequent, scheduled practice in the stress-free learning environment they have set up.

Table 1. Test of Normality

Table 2. Paired Samples Statistics

Table 3. Paired Samples Test