Cars Data Processing

Last updated: 2022-12-28

The data give the speed of cars and the distances taken to stop. Note that the data were recorded in the 1920s. It is a data frame with 50 observations on 2 variables.

Data frame: .
ID Name Type Label missings Values Value Labels
1 speed numeric 0 (0.00%) range: 4-25
2 dist numeric 0 (0.00%) range: 2-120

The table below gives a snapshot of the first 10 data points in the Car Data set that would be used in the entire section to provide some insight of how data can be structured to provide a data-driven decision making.

Data Table

Cars Summary Table

The Cars Summary table provides Summary Statistics of the two numeric Variables Speed and Distance.

Data Visualization

One visual is more more than a thousand - word/text explaining an outcome of a data analysis. Therefore, in this subsection we would provide some data visuals from the Cars data set. The visualization would provide a clear picture of what the data would mean in the real World set up.

Histograms of Individual Variables

From the above Histogram we Can deduce that 30 ft is the Stopping Distance of many Car. Most Cars are also stopping within 0 to 90 ft while only one Car has the Capability of stopping at 120 ft. This provides an insight whether to consider it an outlier. As a result we might be interested to plot a box plot to investigate further.

As was detected by the histogram in the previous Chat, Seemingly the Car with the Stopping distance might be an outlier in this Cars Dataset.

A scatter plot of Speed Against The Stopping Distance

Scatter Plot with a Fitted Linear Regression line

The Correlation between Stopping Distance and Speed

Linear Regression Model

We fitted a linear model (estimated using OLS) to predict dist with speed (formula: dist ~ speed). The model explains a statistically significant and substantial proportion of variance (R2 = 0.65, F(1, 48) = 89.57, p < .001, adj. R2 = 0.64). The model’s intercept, corresponding to speed = 0, is at -17.58 (95% CI [-31.17, -3.99], t(48) = -2.60, p = 0.012). Within this model:

Standardized parameters were obtained by fitting the model on a standardized version of the dataset. 95% Confidence Intervals (CIs) and p-values were computed using a Wald t-distribution approximation.

Visual Representation of the Regression Model Effect

Conclusion

In conclusion, linear model (estimated using OLS) to predict dist with speed (formula: dist ~ speed).The model explains a statistically significant and substantial proportion of variance (R2 = 0.65, F(1, 48) = 89.57, p < .001, adj. R2 = 0.64). The intercept is statistically significant and negative (beta = -17.58, 95% CI [-31.17, -3.99], t(48) = -2.60, p = 0.012; Std. beta = 8.71e-17, 95% CI [-0.17, 0.17]), The effect of speed is statistically significant and positive (beta = 3.93, 95% CI [3.10, 4.77], t(48) = 9.46, p < .001; Std. beta = 0.81, 95% CI [0.64, 0.98]). Standardized parameters were obtained by fitting the model on a standardized version of the dataset. 95% Confidence Intervals (CIs) and p-values were computed using a Wald t-distribution approximation.