Linear Regression - Concepts and How to Interpret the Results (Very Easy) - 2

Data Analyst 2023. 9. 7. 09:43

The output of a regression analysis is interpreted as follows.

[Regression Analysis Output] -1. Summary Output

=> How well the regression equation fits the data

> Regression analysis is used to estimate the relationships between two or more variables.

	-1. Summary Output
1	Multiple R (= Correlation coefficient) ( -1 ~ 1 )	> Measures the strength of a linear relationship between 2 variables. > The closer its absolute value is to 1, the stronger the linear relationship.
2	R square (= Coefficient of Determination) (as a rule of thumb, 95% or more is considered a good fit)	> Used as an indicator of the goodness of fit. > A value of n% means that n% of the variance in the dependent variable (y-values) is explained by the independent variables (x-values).
3	Adjusted R square	> R square adjusted for the number of independent variables; used in place of R^2 in multiple linear regression.
4	Standard Error	> Another goodness-of-fit measure that shows the precision of your regression analysis: the smaller the number, the more certain you can be about your regression equation. > While R^2 represents the percentage of the dependent variable's variance that is explained by the model, Standard Error is an absolute measure that shows the average distance that the data points fall from the regression line.
5	Observations	> The number of observations in your model.
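
To make the five Summary Output figures concrete, here is a minimal Python sketch that computes them by hand for a regression with one independent variable. The rainfall and umbrella numbers are made-up illustration data (not values from this post), and the function name `summary_output` is hypothetical.

```python
import math

def summary_output(x, y):
    """Compute the Summary Output figures for a one-variable regression."""
    n = len(x)   # Observations
    k = 1        # number of independent variables
    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Least-squares slope and intercept.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Residual and total sums of squares.
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - y_mean) ** 2 for yi in y)

    r_square = 1 - ss_residual / ss_total
    return {
        "multiple_r": math.sqrt(r_square),
        "r_square": r_square,
        "adjusted_r_square": 1 - (1 - r_square) * (n - 1) / (n - k - 1),
        "standard_error": math.sqrt(ss_residual / (n - k - 1)),
        "observations": n,
    }

# Made-up illustration data: monthly rainfall (mm) and umbrellas sold.
rainfall_mm = [60, 80, 100, 120, 140, 160]
umbrellas_sold = [12, 17, 19, 26, 28, 35]

for name, value in summary_output(rainfall_mm, umbrellas_sold).items():
    print(f"{name}: {value:.4f}")
```

With a single independent variable, Adjusted R square is only slightly lower than R square; the gap grows as more independent variables are added without improving the fit.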

[Regression Analysis Output] -2. ANOVA

=> Analysis of variance

> Basically, it splits the sum of squares into individual components that give information about the levels of variability within your regression model.

	-2. ANOVA
1	[df]	[df] is the number of degrees of freedom associated with the sources of variance.
2	[SS]	[SS] is the sum of squares. The smaller the Residual SS compared with the Total SS, the better your model fits the data.
3	[MS]	[MS] is the mean square: each SS divided by its df.
4	[F]	[F] is the F statistic, or F-test for the null hypothesis. It is used to test the overall significance of the model.
5	[Significance F]	[Significance F] is the P-value of F. => It indicates how far the results can be trusted. => If [Significance F] < 0.05 (5%), the model is statistically significant and the results can be trusted. If [Significance F] > 0.05 (5%), it is probably better to choose different independent variables.
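
The ANOVA columns can be reproduced with a few lines of arithmetic. The sketch below assumes one independent variable and uses made-up illustration data; `anova_table` is a hypothetical helper name. It stops at the F statistic, because [Significance F] needs the F distribution, which the Python standard library does not provide. If SciPy is installed, `scipy.stats.f.sf(f_value, df_regression, df_residual)` gives that upper-tail probability.

```python
def anova_table(x, y):
    """Return df, SS, MS and F for a one-variable least-squares regression."""
    n = len(x)
    k = 1  # number of independent variables
    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Least-squares fit.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    predicted = [intercept + slope * xi for xi in x]

    # Total SS splits into a regression part and a residual part.
    ss_regression = sum((p - y_mean) ** 2 for p in predicted)
    ss_residual = sum((yi - p) ** 2 for yi, p in zip(y, predicted))
    ss_total = sum((yi - y_mean) ** 2 for yi in y)

    df_regression, df_residual, df_total = k, n - k - 1, n - 1
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual

    return {
        "df": (df_regression, df_residual, df_total),
        "ss": (ss_regression, ss_residual, ss_total),
        "ms": (ms_regression, ms_residual),
        "f": ms_regression / ms_residual,
    }

# Made-up illustration data: monthly rainfall (mm) and umbrellas sold.
table = anova_table([60, 80, 100, 120, 140, 160], [12, 17, 19, 26, 28, 35])
print("df (regression, residual, total):", table["df"])
print("SS (regression, residual, total):", table["ss"])
print("MS (regression, residual):", table["ms"])
print("F:", table["f"])
```

A small Residual SS relative to the Total SS produces a large F, which is what drives [Significance F] toward zero.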

[Regression Analysis Output] -3. Coefficients

=> Coefficients

> This section provides specific information about the components of your analysis.

The most useful component in this section is COEFFICIENTS. It enables you to build a linear regression equation.
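
As a rough illustration of how the coefficients become an equation, the sketch below fits the intercept and the slope by least squares and then evaluates y = intercept + slope * x. The data are made up for illustration, and `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

def predict(intercept, slope, x_value):
    """Evaluate the regression equation for one x value."""
    return intercept + slope * x_value

# Made-up illustration data: monthly rainfall (mm) and umbrellas sold.
rainfall_mm = [60, 80, 100, 120, 140, 160]
umbrellas_sold = [12, 17, 19, 26, 28, 35]

intercept, slope = fit_line(rainfall_mm, umbrellas_sold)
print(f"y = {intercept:.4f} + {slope:.4f} * x")
print("Estimated umbrellas at 100 mm:", predict(intercept, slope, 100))
```

The slope reads as the expected change in y for a one-unit increase in x, and the intercept is the estimated y when x is 0.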

[Regression Analysis Output] -4. Residuals

=> Residuals

> If you compare the estimated and actual number of umbrellas sold for a monthly rainfall of 82 mm, you will see that the two numbers are slightly different. Why the difference? Because independent variables are never perfect predictors of the dependent variable. The residuals help you understand how far the actual values are from the predicted values.
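
A residual is the actual value minus the predicted value. The following sketch computes one residual per observation for a one-variable least-squares fit; the data are made up for illustration and `residuals` is a hypothetical helper name.

```python
def residuals(x, y):
    """Return actual minus predicted for each observation of a least-squares line."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Made-up illustration data: monthly rainfall (mm) and umbrellas sold.
rainfall_mm = [60, 80, 100, 120, 140, 160]
umbrellas_sold = [12, 17, 19, 26, 28, 35]

for rain, sold, res in zip(rainfall_mm, umbrellas_sold, residuals(rainfall_mm, umbrellas_sold)):
    # Positive residual: more umbrellas sold than the line predicts.
    print(f"{rain} mm: actual {sold}, residual {res:+.3f}")
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding), so it is their individual sizes, not their total, that shows where the model misses.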

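* Want to quickly check the relationship between two variables?

-> Draw a linear regression chart

> Select the two columns (X, Y)

> Choose the least-squares regression line (Trendline - Linear)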
