By Robert P. Trueblood and John N. Lovett Jr.

This book is not just another theoretical text about statistics or data mining. Instead, it is aimed at database administrators who want to use SQL or bolster their understanding of statistics to support data mining and customer relationship management analytics.

Each chapter is self-contained, with examples tailored to real business applications. Each analysis technique is expressed in a mathematical format suitable for coding as either a database query or a Visual Basic procedure using SQL. Chapter contents include formulas, graphs, charts, tables, data mining techniques, and more!

**Extra info for Data Mining and Statistical Analysis Using SQL**

**Example text**

[...] Table 2-7. [...] We can calculate the weighted mean by executing Query 2_14. Observe how the Sum function was used twice: once in the numerator and again in the denominator. To obtain the percentage, we multiply by 100. [...] 55%, and it shows the average increase in the total costs of running the business.

**John's Jewels: If Only More Than Half Were Above the Median**

Once I was talking with a high school counselor. [...] "Yes, that would make sense," I thought to myself. "If we could only get more students above the median, I think we would be in a better placement position," he concluded.
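The weighted-mean query described in the excerpt (Sum in the numerator, Sum in the denominator, times 100) can be sketched as follows. This is a minimal illustration, not the book's Query 2_14: the table `CostIncrease`, its columns, and the data are hypothetical, and SQLite (through Python's standard `sqlite3` module) stands in for whatever database the book uses.

```python
import sqlite3

# Hypothetical table and data, for illustration only (not the book's Table 2-7).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CostIncrease (Category TEXT, Weight REAL, Increase REAL)")
conn.executemany(
    "INSERT INTO CostIncrease VALUES (?, ?, ?)",
    [("Labor", 50.0, 0.04), ("Materials", 30.0, 0.10), ("Utilities", 20.0, 0.05)],
)

# SUM appears twice: the weighted total in the numerator and the
# total weight in the denominator. Multiplying by 100 gives a percentage.
query = """
    SELECT 100.0 * SUM(Weight * Increase) / SUM(Weight) AS WeightedMeanPct
    FROM CostIncrease
"""
weighted_mean_pct = conn.execute(query).fetchone()[0]
print(round(weighted_mean_pct, 2))
```

Because the weights here sum to 100, the result can be checked by hand: (50 × 0.04 + 30 × 0.10 + 20 × 0.05) / 100 × 100 = 6.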

[...] The reason is that it resets the counter to one; otherwise, the counter continues to increment from its last value.

**John's Jewels: Mean or Median?**

When my students asked me when a median was a better measure of "average" than a mean, I offered the exam score example. It goes something like this: Now suppose I graded your exams, and the scores generally ranged from the 70s through the 90s, but one poor devil made a 15%. Now if you asked me the average score, I could quote the mean, but it would be somewhat biased, or artificially low, due to the "outlier" score of 15.
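The exam-score example can be made concrete with a small sketch. The scores below are invented to match the description (70s through 90s, plus one 15), and the table name `ExamScores` is hypothetical. SQLite has no built-in median aggregate, so the query uses a common `LIMIT`/`OFFSET` idiom; this is an assumption about dialect, not the book's method.

```python
import sqlite3

# Hypothetical exam scores: mostly 70s through 90s, plus one outlier of 15.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ExamScores (Score INTEGER)")
conn.executemany(
    "INSERT INTO ExamScores VALUES (?)",
    [(s,) for s in (15, 72, 75, 78, 81, 84, 88, 91, 95)],
)

# The mean is pulled down by the outlier.
mean = conn.execute("SELECT AVG(Score) FROM ExamScores").fetchone()[0]

# Median: sort the scores, skip to the middle, and average the one
# (odd count) or two (even count) middle values.
median_query = """
    SELECT AVG(Score) FROM (
        SELECT Score FROM ExamScores
        ORDER BY Score
        LIMIT 2 - (SELECT COUNT(*) FROM ExamScores) % 2
        OFFSET (SELECT (COUNT(*) - 1) / 2 FROM ExamScores)
    )
"""
median = conn.execute(median_query).fetchone()[0]

print(f"mean = {mean:.2f}, median = {median:.2f}")
```

With these nine scores the mean is 679 / 9, roughly 75.4, while the median is the fifth sorted value, 81. The single score of 15 lowers the mean but leaves the median untouched.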

A visual inspection of the histogram indicates that the mean most likely falls between 19 and 22, in the highest bar, since the shape is relatively symmetric (or "mirrored") around this bar. [...] 85, or about 20. The data are dispersed between 4 and 29. We can determine the extent of either the absolute dispersion or the average dispersion by employing two new statistical concepts presented below. Before we do, however, let's use SQL to generate the histogram.

[Figure 2-2: histogram with Stock Price ($) on the horizontal axis, in bins [4-7), [7-10), [10-13), [13-16), [16-19), [19-22), [22-25), [25-28), [28-31).]
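One way to produce histogram counts in SQL is to map each price to the lower edge of its bin and group on that value. The sketch below assumes bins of width 3 starting at 4, as in the figure's axis labels; the table `StockPrice` and the prices are hypothetical, and the query is written for SQLite rather than taken from the book.

```python
import sqlite3

# Hypothetical stock prices between 4 and 29, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StockPrice (Price REAL)")
conn.executemany(
    "INSERT INTO StockPrice VALUES (?)",
    [(p,) for p in (4.5, 11.0, 14.2, 17.8, 19.0, 20.5, 21.9, 23.1, 26.4, 29.0)],
)

# Each price falls in the half-open bin [BinLow, BinLow + 3).
# CAST(... AS INTEGER) truncates, which acts as a floor for prices >= 4.
query = """
    SELECT 4 + 3 * CAST((Price - 4) / 3 AS INTEGER) AS BinLow,
           COUNT(*) AS Frequency
    FROM StockPrice
    GROUP BY BinLow
    ORDER BY BinLow
"""
histogram = dict(conn.execute(query).fetchall())

for bin_low, frequency in histogram.items():
    print(f"[{bin_low}-{bin_low + 3}): {'*' * frequency}")
```

Note that `GROUP BY` returns only bins that contain at least one row, so an empty bin such as [7-10) in this sample does not appear in the result. Showing it with a zero count would require joining against a table that lists every bin.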