
Data SGP – How to Load and Analyze SGP Data Using the SGP Package

Data sprawl is the accumulation of data by an organization to the point where it no longer knows what data it has or where that data resides. Its drawbacks include increased management overhead, technical talent tied up in low-impact administrative tasks, hidden security risks, and missed opportunities to use data for business value. In addition, stringent regulations such as GDPR make it more important than ever to be able to locate customer data and respond to requests for that information in a timely manner.

The term “big data” has become a buzzword for datasets too large for standard analysis tools. While the SGP does work with a significant amount of data, that amount is small compared with, for example, an analysis of the interactions between Facebook users. This is why we think of the SGP as working with “medium data”.

sgpData

The sgpData vignette demonstrates how to load and analyze SGP data using the SGP package for R. It provides an exemplar data set that models the data format used by the lower-level student growth percentiles and student growth projections/trajectories functions in the SGP package. These functions apply quantile regression to student assessment data to provide information about a student’s achievement history (i.e., their percentile rank) and their potential progression to future achievement levels (i.e., their percentile growth projection).
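
As a rough illustration of the kind of data format described above, the sketch below parses a small wide-format table (one row per student, one grade and scale-score column per year) and reshapes it to one record per student per year. It uses only the Python standard library rather than the SGP package itself. The column names (ID, GRADE_2022, SS_2022 and so on), the helper names load_wide and to_long, and every value are illustrative assumptions, not the actual contents of sgpData.

```python
import csv
import io

# Hypothetical wide-format records: one row per student, with a grade and a
# scale-score column per year. Names and values are illustrative assumptions.
RAW = """ID,GRADE_2022,GRADE_2023,SS_2022,SS_2023
1001,3,4,410,455
1002,3,4,380,402
1003,4,5,520,
"""


def load_wide(text):
    """Parse wide-format rows, converting blanks to None and numbers to int."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({k: (int(v) if v else None) for k, v in rec.items()})
    return rows


def to_long(rows, years):
    """Reshape to one record per student per year, skipping missing scores."""
    out = []
    for r in rows:
        for y in years:
            score = r.get(f"SS_{y}")
            if score is not None:
                out.append({"ID": r["ID"], "YEAR": y,
                            "GRADE": r[f"GRADE_{y}"], "SCALE_SCORE": score})
    return out


students = load_wide(RAW)
long_records = to_long(students, [2022, 2023])
```

Student 1003 has no 2023 score, so the long form contains one record for that student instead of two. Handling missing years explicitly matters because growth can only be computed for students with at least one prior score.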

The abbreviation SGP is also used for sparse Gaussian process models, which are based on Gaussian process regression (GPR) and variational inference algorithms. Exact GPR requires O(N^3) time and O(N^2) memory for N training points, which limits its usage to small and medium-sized datasets. As a result, sparse GP approximation methods, which reduce the cost to O(NM^2) time and O(NM) memory for a chosen number M of inducing points (with M much smaller than N), have gained popularity in the literature as efficient alternatives that scale to large datasets.

These approximation methods model the posterior distribution of the variables in a dataset using a low-rank representation and a sparse approximation of the covariance matrix. Moreover, they are capable of computing conditional distributions without requiring any assumptions about the distribution of noise on training data.
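
The low-rank idea can be sketched with a Nyström-style approximation, one common way of building such a representation (not necessarily the method any particular package uses). The N×N kernel matrix is approximated from its interactions with M inducing points. The sketch assumes NumPy is available, and the kernel choice, lengthscale, inducing locations, and function names are all illustrative assumptions.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def nystrom_factors(x, inducing, jitter=1e-8):
    """Return K_nm and K_mm so that K_nn ~= K_nm K_mm^{-1} K_nm^T."""
    k_nm = rbf_kernel(x, inducing)
    # A small diagonal jitter keeps the M x M solve numerically stable.
    k_mm = rbf_kernel(inducing, inducing) + jitter * np.eye(len(inducing))
    return k_nm, k_mm


rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, size=200))   # N = 200 training inputs
inducing = np.linspace(0.0, 10.0, 15)           # M = 15 inducing points

k_nm, k_mm = nystrom_factors(x, inducing)
k_approx = k_nm @ np.linalg.solve(k_mm, k_nm.T)  # rank-M approximation

# The exact matrix is formed here only to measure the approximation error.
k_exact = rbf_kernel(x, x)
max_error = np.max(np.abs(k_exact - k_approx))
```

In practice the full matrix is never formed. Only the N×M and M×M factors are stored, which is where the O(NM^2) time and O(NM) memory figures come from.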

The sgpData_INSTRUCTOR_NUMBER table is an anonymized lookup table that provides instructor information associated with each test record. This table is used by the student growth percentiles and projections functions to calculate current SGP scores, which show a snapshot of a student’s growth relative to their academic peers within a given window of time. This information is useful to teachers and administrators because it shows whether a student grew more than, less than, or as much as their peers. This is different from Window Specific SGP, which compares a student’s performance to that of their peers across two or more consecutive testing windows. The SGP scores in these two formats can be viewed on the student’s report card and are available for teachers to review with their students. SGP results are reported on a 1-99 scale, where higher numbers indicate greater relative growth. Each number is calculated by comparing a student’s most recent test score with a known prior assessment score.
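
To make the 1-99 scale concrete, here is a deliberately simplified sketch. Instead of the quantile regression described earlier, it defines a student’s academic peers as cohort members whose prior score lies within a fixed band, then ranks the student’s current score among those peers. The function name peer_growth_percentile, the band width, and the cohort values are all assumptions made for illustration.

```python
from bisect import bisect_left


def peer_growth_percentile(prior, current, cohort, band=10):
    """Rank `current` among cohort students whose prior score is within `band`.

    `cohort` is a list of (prior_score, current_score) pairs. The result is
    clamped to the 1-99 reporting scale. Returns None if there are no peers.
    """
    peers = sorted(c for p, c in cohort if abs(p - prior) <= band)
    if not peers:
        return None
    below = bisect_left(peers, current)  # number of peers scoring strictly lower
    pct = round(100 * below / len(peers))
    return max(1, min(99, pct))


# Hypothetical cohort of (prior score, current score) pairs.
cohort = [(400, 420), (405, 440), (398, 455), (410, 470), (402, 430),
          (500, 510), (395, 415), (408, 460)]

sgp = peer_growth_percentile(prior=403, current=450, cohort=cohort)
```

Here seven cohort members have a prior score within 10 points of 403, and four of them scored below 450, so the student lands at roughly the 57th percentile of growth among peers. A fixed band is a crude stand-in: it ignores how prior scores are distributed within the band and uses only one prior assessment.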
