Statistical Modelling

Logistic regression, often referred to as the logit model, is a technique for predicting a binary outcome from a linear combination of predictor variables.

The predictor variables here would be the amount of money spent on a particular candidate's election campaign, the amount of time spent campaigning, and so on.


Linear regression is a statistical technique where the score of a variable Y is predicted from the score of a second variable X. X is referred to as the predictor variable and Y as the criterion variable.
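A minimal sketch of predicting Y from X with a least-squares line, using only the Python standard library. The function name and the data are made up for illustration; the data were chosen to lie exactly on y = 1 + 2x so the result is easy to check by hand.

```python
from statistics import mean

def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: X is the predictor variable, Y is the criterion variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_simple_linear_regression(x, y)
predicted = intercept + slope * 6.0  # predicted Y for a new X of 6
```

With real data the points will not fall exactly on a line, and the same formulas give the line that minimises the sum of squared residuals.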



Describe a problem in which you’ve applied a statistical model


· Provides context
· Describes an actual statistical approach (examples could involve regression modeling, predictive analytics, or econometric analysis)
· A statistical model should draw from a set of empirically observed data to infer non-observed data (for example, we use a model to predict future Worker quality based on observed Worker performance)
· Relates to a business impact
· Describes the tool used (SAS, R, SPSS, others)


Logistic regression, or logit regression, or logit model is a regression model where the dependent variable (DV) is categorical.
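As a rough illustration of the binary case, the sketch below fits a logistic regression by plain batch gradient descent on the log-loss. The function names, learning rate, number of epochs, and data are all assumptions made for this example; in practice you would normally use a statistics package rather than hand-rolled code.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(rows, labels, learning_rate=0.1, epochs=2000):
    """Return weights [bias, w1, ..., wk] fitted by batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    n = len(rows)
    for _ in range(epochs):
        gradient = [0.0] * (n_features + 1)
        for row, label in zip(rows, labels):
            features = [1.0] + list(row)  # leading 1.0 is the bias term
            score = sum(w * f for w, f in zip(weights, features))
            error = sigmoid(score) - label
            for j, f in enumerate(features):
                gradient[j] += error * f
        # Step against the average gradient of the log-loss.
        weights = [w - learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights

def predict_probability(weights, row):
    features = [1.0] + list(row)
    return sigmoid(sum(w * f for w, f in zip(weights, features)))

# Hypothetical data: one predictor, binary dependent variable (0 or 1).
rows = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

weights = fit_logistic_regression(rows, labels)
p_low = predict_probability(weights, [0.5])
p_high = predict_probability(weights, [4.0])
```

The model outputs a probability, so a cut-off (commonly 0.5) is still needed to turn it into a predicted category.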


Data preparation gets the data ready for modelling. You will identify and treat missing values, detect outliers, transform variables, create binary variables if required, and so on. This stage is heavily influenced by the modelling technique you will use at the next stage. For example, regression involves a fair amount of data preparation, decision trees may need less, and clustering requires a different kind of preparation from the other techniques.
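The preparation steps named above can be sketched in a few lines of plain Python. The records, field names, and the choice of median imputation and a log transform are assumptions for illustration only; the right treatment depends on the data and on the model that follows.

```python
import math
from statistics import median

# Hypothetical records; None marks a missing value.
records = [
    {"income": 42000.0, "segment": "retail"},
    {"income": None, "segment": "corporate"},
    {"income": 58000.0, "segment": "retail"},
    {"income": 1000000.0, "segment": "corporate"},
]

# Treat missing values: impute with the median of the observed incomes.
observed = [r["income"] for r in records if r["income"] is not None]
income_median = median(observed)

prepared = []
for r in records:
    income = r["income"] if r["income"] is not None else income_median
    prepared.append({
        "income": income,
        # Keep a flag so the model can still see that the value was imputed.
        "income_was_missing": int(r["income"] is None),
        # Transform a skewed variable to reduce the pull of very large values.
        "log_income": math.log(income),
        # Create a binary (dummy) variable from a two-level category.
        "is_corporate": int(r["segment"] == "corporate"),
    })
```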


How will you treat outlier values?
You can identify outliers using graphical analysis and univariate analysis. If there are only a few outliers, you can assess them individually. If there are many, you may want to substitute the outlier values with the 1st percentile or the 99th percentile values.
If there is a lot of data, you may decide to ignore records with outliers.
Not all extreme values are outliers. Not all outliers are extreme values.
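Substituting values beyond the 1st and 99th percentiles with those percentile values is often called winsorizing. A minimal sketch follows, assuming a simple linear-interpolation percentile; the helper names and data are made up, and statistical packages may compute percentiles slightly differently.

```python
def percentile(values, q):
    """Return the q-th percentile (0 to 100) by linear interpolation between ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

def winsorize(values, lower_q=1.0, upper_q=99.0):
    """Cap every value at the lower_q-th and upper_q-th percentiles."""
    low = percentile(values, lower_q)
    high = percentile(values, upper_q)
    return [min(max(v, low), high) for v in values]

# Hypothetical data: the values 1 to 99 plus one extreme value.
values = list(range(1, 100)) + [10000]
capped = winsorize(values)
```

No records are dropped here: the extreme value is pulled in to the 99th percentile, so the sample size stays the same.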


Why econometrics? The difference between econometrics and statistics is that statistical modeling is more concerned with fit, while econometric modeling is more concerned with properly estimating the coefficients in a regression. Getting the “right” (consistent and unbiased) estimates means that the analyst can measure more reliably how strongly a change in one variable predicts (or causes) a change in the dependent variable. These techniques can help solve problems in social/web data that previously were only solvable by collecting new data from randomized multivariate experiments.

Supervised Learning
- The right answers (labels) are given for the training examples, and the task is to predict the answer for new examples
 
Regression Problem
- Fit a straight line or a polynomial curve
- Predict a continuous-valued output (say, price)

Classification Problem
- Discrete valued output (0 or 1)

Clustering
- Example: Google News groups the news together
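To make the idea of grouping similar items concrete, here is a small k-means sketch (Lloyd's algorithm) on made-up 2-D points. It is only a stand-in: it is not how Google News works, and the function name, the starting centroids, and the data are assumptions for illustration.

```python
def k_means(points, k, iterations=10):
    """Group 2-D points into k clusters.

    Centroids start at the first k points, which keeps the sketch deterministic.
    """
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, (x, y) in enumerate(points):
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            assignments[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return assignments, centroids

# Hypothetical points forming two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1), (9.2, 8.9)]
assignments, centroids = k_means(points, k=2)
```

No labels are supplied: the groups come only from how close the points are to one another.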
  
