Students' Perception of Instructional Feedback Scale: Validity and Reliability Study

ABSTRACT This study aimed to develop the "Students' Perception of Instructional Feedback Scale" (SPIFS) by determining a framework for students' perception of instructional feedback. The sequential exploratory mixed method was used in the study. The study was conducted during the instructional design course offered to sophomores in the Department of Computer Education and Instructional Technology at two different universities. Accordingly, a scale consisting of 31 items with Likert-type responses was first prepared based on the literature review. Validity and reliability analyses of the scale were completed with a total of 231 participants. After the necessary steps were applied in exploratory factor analysis (EFA, n=100), a structure with three factors and 19 items was established. The internal consistency analysis (Cronbach's alpha), which was applied to the factors obtained and to the whole scale, showed the scale to be reliable (whole scale α=.85, 1st factor (mastery, 8 items) α=.92, 2nd factor (positive affect, 6 items) α=.90, and 3rd factor (negative affect, 5 items) α=.96). Confirmatory factor analysis (CFA) was then performed (n=131) to test the structure established through EFA. The results indicated that the developed structure had acceptable fit (RMSEA=.08, CFI=.91, and RMR=.03).


INTRODUCTION
Teaching quality affects the success of any curriculum in reaching its goals (Hattie & Clarke, 2018). Among the various contributing factors, feedback is regarded as an important determinant of the quality of instruction (Dick et al., 2005; Senemoglu, 2013). Feedback can be provided face-to-face in a classroom or through online platforms (verbally or in writing) at every stage of a course (Hattie & Timperley, 2007; Parr & Timperley, 2010). Feedback is given to students to check their learning process, guide them to the right information, and correct their mistakes.

Types of Feedback
Various types of feedback can be seen in the literature. For example, Kulhavy and Stock (1989) categorized feedback as verification and elaboration. While verification is related to whether the student's answer is correct or not, elaboration is related to giving cues to guide them. Mason and Bruning (2001) emphasized that effective feedback should cover both of these types. Black and Wiliam (1998) indicated two main functions of feedback: directive and facilitative. Directive feedback explains what needs to be fixed or revised to students.
Facilitative feedback includes comments and suggestions that guide students in their own revision and conceptualization. Cho et al. (2006) identified six categories of feedback comments: directive (suggests a specific change), nondirective (suggests a nonspecific change), praise (encouraging remarks), criticism (a critical or negative evaluation), summary (recapitulates the main points), and off-task. Shute (2008) examined experimental and meta-analysis studies on instructional feedback in a comprehensive review (103 articles, 24 books and book chapters, 10 proceedings, and 4 other reports) covering the following feedback types: verification, elaboration, correct response, try again, error flagging, attribute isolation, topic contingent, hints, bugs, and informative tutoring.

Developed Scales
In addition to instructional feedback types, existing scales were examined in the literature. The "Perceived Teacher Feedback Scale" was developed by Koka and Hein in 2003 and revised in 2005 (Koka & Hein, 2005). An examination of its items shows that it covers feedback provided for the development of psychomotor skills. Beydogan (2016) developed a scale on "perceptions of teacher candidates about feedback correction" with 670 students who received undergraduate and formation education. As a result of reliability and validity analyses, the "Feedback-Correction Perception Scale for Prospective Teachers" was created with 29 items and 6 factors (principality, quality, style, competence, scope, and transparency). An examination of its items shows that the scale addresses the features that feedback should include. Zumbrunn et al. (2016) used the 7-item "student writing feedback perceptions scale" in their study and expanded on its relationship with the variables "writing self-efficacy" and "writing self-regulation aptitude". They also revealed a new construct through qualitative interviews to portray students' writing feedback perceptions in detail.

Effects of Feedback on Learning and Performance
In the literature, the effect of feedback on learning and performance has been another focus. Studies indicate that positive instructional feedback supports students' academic performance (Bandura, 1991; Bandura & Cervone, 1983) and their engagement in the learning process (Charles et al., 2011). Additionally, feedback interactions may increase higher-level learning outcomes (Tan et al., 2019). Meta-analyses examining the effects of feedback on learning and performance revealed an impact ranging from about .40 standard deviation (SD) (Guzzo et al., 1985) to .80 SD and higher (Kluger & DeNisi, 1996) compared to control conditions. Also, Kellogg and Whiteford (2009) emphasized that instructional feedback improves students' motivation and skills.
However, motivation may be adversely affected if feedback is misdirected or misunderstood (Zumbrunn et al., 2016). Also, if students perceive the feedback as too long or complex, they may become distracted (Shute, 2008). The length and complexity of feedback thus play an important role in students' implementation of specific tasks. Among students with lower self-efficacy, concerns about weaknesses in learning performance may lower their desire to receive feedback (Duijnhouwer et al., 2010).
In addition, research results show that the complexity and length of feedback, scaffolding, motivation, timing, and delayed versus immediate feedback support are important elements for students to fulfill their tasks (Shute, 2008). However, even if teachers provide all these elements to improve student projects, students still may not implement the feedback (Wilson & Czik, 2016). Nelson and Schunn (2009) emphasized that feedback implementation is related to the cognitive and affective features of the feedback, as well as the student's understanding of and agreement with the feedback message. For this reason, it is important to reveal the cognitive and affective structures of students in relation to receiving feedback. These dimensions have frequently been examined on the side of providing feedback; however, no generalized framework of cognitive and affective dimensions has been established for receiving feedback. Zumbrunn et al. (2016) pointed out the student's perception of "mastery" within the cognitive dimension in the structure created through interviews: students gain more motivation for mastery thanks to feedback. An improved perception of mastery may be related to developed self-efficacy and the academic performance of students (Meece et al., 2006). As stated in the literature, both positive (Bandura, 1991; Charles et al., 2011) and negative (Duijnhouwer et al., 2010; Shute, 2008; Zumbrunn et al., 2016) effects are included within the affective features of students receiving feedback. Fedor et al. (2001) stated that feedback may occasionally hinder learners' performance improvement. Corno and Snow (1986) emphasized that feedback given while a student is actively focused on something interrupts the learning process.
Within the scope of the literature presented above, it is necessary to establish a construct that determines the cognitive/mastery and affective (negative or positive) perception levels of students after receiving feedback. Moreover, no scale that could serve as an affective measure of students' perceptions after receiving feedback (Isacescu et al., 2017; Watson et al., 1988) was found within the scale development research in the literature. Despite the known and significant impact of feedback on learning, determining students' perceptions of receiving feedback may help to update the types of feedback provided and to correct mistakes made during feedback. Teachers can also manage learning and performance situations in line with students' perceptions. For all these reasons, this study aimed to develop a "Students' Perception of Instructional Feedback Scale" (SPIFS) to fill this void.

Research Design
The sequential exploratory mixed method was used in the study. In this method, qualitative data are collected and analyzed first; the results obtained from the qualitative data are then converted to quantitative data. Therefore, this design is also referred to as an instrument development design (Creswell et al., 2004).
This study aims to develop a scale of students' perceptions of instructional feedback. In the qualitative part of the study, students (n=70) were observed for 14 weeks during the instructional design course. The observation was unstructured and based on the experiences of the teachers. Studies about instructional feedback were also investigated to constitute an item pool (Ayar, 2009; Zumbrunn et al., 2016). In other words, items were generated from student observations and the literature. The item pool consisted of 31 items (26 positive and 5 negative) with 5-point Likert-type responses (1 = Strongly Disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, and 5 = Strongly Agree). The scale was checked for content validity by three field experts and one language expert. The quantitative part of the study consisted of exploratory factor analysis (EFA) and confirmatory factor analysis (CFA).

The Procedure of the Study
To develop the scale, construct validity was examined through exploratory factor analysis and then confirmatory factor analysis. Exploratory factor analysis (EFA) is a widely used statistical technique that transforms many related variables into a few meaningful and independent factors. EFA was conducted to determine the construct validity of the scale and to name the factors. Confirmatory factor analysis (CFA) is a statistical technique employed to determine whether the variable groups contained in the factors identified through EFA are sufficiently represented by these factors, and it provides evidence that the scale can produce the same structure in similar groups (Buyukozturk, 2010; Tabachnick & Fidell, 2007). SPSS 18 and IBM SPSS AMOS 21 software were used to analyze the data. After these analyses were completed, the scale was finalized through the interpretation of the data. Figure 1 summarizes the procedure of the study.

Instructional Feedback Within the Course Design
The study was conducted during the instructional design course (2 hours theoretical and 2 hours practical) offered to sophomores in the Department of Computer Education and Instructional Technology (CEIT) at two different universities. Attention was paid to ensure that the one-term course was presented with similar steps at both universities.
The students could contact the faculty members when needed via e-mail and Edmodo. Feedback was planned to be given by a faculty member at least three times a week. However, several students were reluctant to receive instructional feedback because of their previous experiences. To encourage these participants, the instructor awarded a 'plus (+)' to the student for each feedback session received, and the importance of receiving feedback was explained repeatedly. Nevertheless, the planned number of feedback sessions could not be reached. The feedback types were correct response, try again, error flagging, attribute isolation, topic contingent, hints, bugs, and informative tutoring.

Participants
Following the item pool development, the ratio between sample size and the number of items was checked before conducting EFA. According to the literature, the number of participants should be at least three times the number of items in the pool (Cattell, 1978). In this study, the focus on a special group was an obstacle to reaching a large sample; for this reason, the sample was kept at the minimum acceptable level. We therefore used the data obtained from 100 CEIT students (58 males and 42 females) from University A who had already taken the instructional design course with Teacher A, which ensured an adequate sample size. Teacher A and Teacher B had taught the instructional design course together between 2012 and 2014 at University A; therefore, the teachers' instructional feedback practices were similar to each other. The EFA was conducted with these students. The CFA was conducted with 131 sophomore participants (74 males and 58 females) who attended the same courses presented in the same style at both University A and University B. Three extreme cases were removed from the data set. The convenience sampling method was used since the researchers conducted the study within the scope of their own courses. During the course, students designed projects and the instructors gave feedback.
All participants received instructional feedback about their projects (yes 81%, partially 19% in the EFA group; yes 88.5%, partially 11.5% in the CFA group). Accordingly, the participants had experience of receiving instructional feedback from instructors. Feedback frequency varied from twice a week to once in three weeks; twice a week (36%) and once a week (36%) together comprised 72% of the instructional feedback frequency. Detailed information about the sample is given in Table 1.

Pre-Analysis
Before EFA, the data were screened for reverse items, missing values, normality, extreme scores, relationships between items, sampling adequacy (KMO and Bartlett's sphericity test), and sample size (Field, 2009; Tabachnick & Fidell, 2007). The KMO coefficient was .854 and Bartlett's sphericity test yielded a χ² value of 1,746.589 (p<.05). There were no missing data because the data were collected using an online system. Five reverse items were recoded. Histogram graphics, the Kolmogorov-Smirnov test (p>.05), kurtosis and skewness coefficients (between -2 and +2), mean values, medians, and modes were evaluated to test the normality of the dataset (Field, 2009). The item distributions were found to be close to normal.
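As an illustration of the pre-analysis steps above, the following sketch (Python with numpy; the function names are illustrative, while the ±2 rule of thumb and the 5-point recoding follow the procedure described here) recodes a reverse-worded Likert item and checks skewness and kurtosis:

```python
import numpy as np

def reverse_code(item, scale_max=5):
    """Recode a reverse-worded Likert item scored 1..scale_max."""
    return scale_max + 1 - np.asarray(item)

def skewness(x):
    """Sample skewness: mean of the cubed standardized scores."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

def excess_kurtosis(x):
    """Excess kurtosis: mean of the fourth-power standardized scores minus 3."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4) - 3

def near_normal(x, bound=2.0):
    """Apply the +/-2 skewness/kurtosis rule of thumb used in the pre-analysis."""
    return abs(skewness(x)) <= bound and abs(excess_kurtosis(x)) <= bound
```

For example, on a 5-point scale a response of 1 on a reverse-worded item is recoded to 5.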

Results of EFA and Reliability Study
The Promax rotation method was used in EFA. In the analysis, items were examined one by one in terms of meaning and analysis results. Since the literature recommends removing items that load on more than one factor with a difference of less than .1 (Field, 2009), the process was repeated until this criterion was met, removing cross-loading items starting from the ones least useful for the scale. Based on the literature and expert opinions, the first factor was named "mastery", the second "positive affect", and the third "negative affect". The first factor contains eight statements about the student's feeling of gaining mastery through instructional feedback. The second factor includes six statements about the student developing positive emotions towards receiving instructional feedback. The third factor involves five statements about the student developing negative emotions towards receiving instructional feedback. According to Pallant (2007, p. 196), items with communalities under .3 do not fit with the other items in their factor; for this reason, the communalities table was checked when removing items, but no values below .3 were found at any step. As a result, 19 items loaded on three factors as follows: 'mastery' (MA, 41.80%, α=.92), 'positive affect' (PA, 7.96%, α=.90), and 'negative affect' (NA, 22.56%, α=.96). The corrected item-total correlations ranged from .65 to .85 for "mastery", .58 to .82 for "positive affect", and .80 to .95 for "negative affect". The total variance explained by the factors was 72.34% and Cronbach's alpha for the whole scale was α=.85. The composite reliability (CR) and average variance extracted (AVE) values for each factor were also checked to assess the construct validity of the scale.
The results indicated that AVE values were .63 (mastery), .64 (positive affect), and .85 (negative affect), and CR values were .93 (mastery), .91 (positive affect), and .97 (negative affect). Since the values are higher than .5 for AVE and .7 for CR, the convergent and discriminant validity of the scale are adequate (Fornell & Larcker, 1981). The break point of the scree plot indicates three factors in the scale; the graph is shown in Figure 2. Table 2 shows the rotated factor loadings, the variance explained by the factors, and the reliability coefficients. In this course, students were trained to become instructional designers. If the scale is used in different fields, the phrase "instructional design" in items 1 and 11 could be replaced by another suitable phrase.
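The reliability and validity indices reported above follow standard formulas; the sketch below (numpy; the loading values in the note afterwards are illustrative, not the scale's actual loadings) shows how each is computed:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def ave(loadings):
    """Average variance extracted from standardized factor loadings."""
    lam = np.asarray(loadings, dtype=float)
    return np.mean(lam ** 2)

def composite_reliability(loadings):
    """Composite reliability (CR) from standardized factor loadings."""
    lam = np.asarray(loadings, dtype=float)
    return lam.sum() ** 2 / (lam.sum() ** 2 + np.sum(1 - lam ** 2))
```

For instance, a factor with six items all loading at .80 would give AVE = .64 and CR ≈ .91, both above the .5 and .7 thresholds cited above.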

Results of CFA
To verify the scale, CFA was performed on the data obtained from 131 sophomore students from Universities A and B who had taken the instructional design course. Two extreme cases were excluded from the analysis. At the end of the analysis, we found χ²=290.964 (df=145, p<.05). However, this value is quite sensitive to sample size. As an alternative, dividing the chi-square by the degrees of freedom has been suggested (Kline, 2011). A ratio of 2 or lower indicates a good fit of the model, while values of 2 to 5 represent an acceptable fit (Hoe, 2008).
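The χ²/df rule of thumb above can be expressed as a small helper (a Python sketch; the threshold bands follow Hoe (2008) as reported here, and the "poor" label for ratios above 5 is an assumption implied by those bands):

```python
def chi2_df_fit(chi2, df):
    """Classify model fit by the chi-square / degrees-of-freedom ratio.

    <= 2 : good fit; 2-5 : acceptable fit (Hoe, 2008); > 5 : poor fit.
    Returns the ratio together with its label.
    """
    ratio = chi2 / df
    if ratio <= 2:
        return ratio, "good"
    if ratio <= 5:
        return ratio, "acceptable"
    return ratio, "poor"
```

A ratio of 1.81, as obtained in this study, falls in the "good" band.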
In the present study, the fit of the model was found to be good, with χ²/df calculated as 1.81. Additionally, four modification indices were used to improve the goodness-of-fit values. In this context, the root mean square error of approximation (RMSEA), incremental fit index (IFI), comparative fit index (CFI), and root mean square residual (RMR) were determined. The acceptable ranges for the indices were based on studies by Schreiber et al. (2006), Simsek (2007), and Hooper et al. (2008). Table 3 shows the goodness-of-fit values together with the statistical ranges according to CFA.
The t-values of the model are presented in Table 4; they showed that all items are significant (p<.01) in the model. Figure 3 presents a visualization of the results.

DISCUSSION AND CONCLUSIONS
This study aimed to develop the "Students' Perception of Instructional Feedback Scale" (SPIFS) to determine learners' 'affective' responses to receiving feedback. In this context, a scale consisting of 31 items with Likert-type responses was first prepared through qualitative methods. The validity and reliability analyses for the scale were then performed with 231 participants who had received instructional feedback at various frequencies at two different universities. Following the application of the necessary steps in EFA, a structure with three factors and 19 items was established. The internal consistency analysis (Cronbach's alpha), performed for the factors obtained and for the whole scale, demonstrated the scale to be reliable (whole scale α=.85, 1st factor α=.92, 2nd factor α=.90, 3rd factor α=.96). To confirm this structure, CFA was completed; the structure established by EFA was tested through CFA. The results showed the developed structure to be acceptable in terms of fit values.
Feedback is one of the most powerful instructional techniques in the classroom (Hattie & Clarke, 2018; Oinas et al., 2020). Classroom feedback relates to many theories in communication, psychology, and education (King et al., 2009). In education, Fujiya et al. (2020) stated that positive feedback, neutral feedback, and descriptive feedback are types already used in the learning process. This study focused on how students perceive such feedback. One dimension of students' perceptions is the perception of mastery: the mastery factor represents students' appreciation of instructional feedback as a means to improve themselves (Zumbrunn et al., 2016).
The goals of mastery and performance may be confused with one another: a performance-approach goal concerns attaining competence relative to others, whereas a mastery goal concerns the development of competence itself and task mastery (Elliot & McGregor, 2001). Mastery experience is often associated with academic outcomes, achievement, and self-efficacy (Meece et al., 2006). Furthermore, two-way feedback interactions that improve students' metacognition can increase higher-level learning outcomes (Tan et al., 2019). The literature suggests that students with strong mastery goals focus more on improving their skills (Ames, 1992). However, some individuals may hold negative affect (NA) towards receiving feedback for various reasons. Studies mostly attribute this to students' fear of being criticized, judged, or given a low mark; students feeling this way may avoid receiving instructional feedback because they feel uneasy (Zumbrunn et al., 2016). Moreover, the judgmental and controlling features of the instructional feedback received may lead to NA (Fedor et al., 2001). In this study, correlations higher than .3 were found between the "mastery" items and both the "positive affect" and "negative affect" items. Similarly, Baydas Onlu et al. (2020) reported relationships between "mastery" and "positive affect" and between "mastery" and "negative affect".

Limitations and Suggestions of the Study
The scale developed in line with the results of the present study is a valid and reliable measurement tool for use at the university level. The scale was developed to measure perceptions of instructional feedback among university students, and the data used in this study were obtained from students of the Department of Computer Education and Instructional Technology. In this respect, the study is limited, but the scale can be applied to different student groups. Although the scale was developed through implementations in the instructional design course, it can also be used in other project-based courses. Although the applied content of these courses may be seen as a limitation, the scale can also be applied to students enrolled in theoretical courses, since feedback is addressed in general terms. In addition, four modification indices were used to improve the fit values in this study; the use of modification indices limits the study by reducing the generalizability of the results. Therefore, researchers can verify the reliability of the scale by applying it to different and larger sample groups. Additionally, this study did not examine the relationship between the feedback given and its impact on student learning or performance; this issue needs to be addressed in future research. Though feedback typically has a positive effect, not all types of feedback have an equal effect (Hattie & Timperley, 2007). Future studies may focus on the effects of different types of feedback (oral, written, game-based, etc.) and on differences between the frequencies of feedback provided.