Keith Markus' Urban Sprawl:

PSYC U80103.JJ3
Categorical Data Analysis
(CRN 18047)

Spring 2012

Time:  Wednesday 6:30-8:30 PM
Room:  10.72.00, John Jay College of Criminal Justice, 899 Tenth Avenue, New York, NY 10019
Office Hours:  Tuesdays 5:45 PM to 6:15 PM in GC room 3204.02, Wednesdays 4:30 PM to 5:00 PM in JJ room 10.63.11, and Thursdays 2:00 PM to 3:00 PM in JJ room 10.63.11.
(It usually works best to email me.)

Contact Information:
Dr. Keith A. Markus
Room 10.63.11
Psychology Department, John Jay College

Course Description: 

This course presents the theory and application of methods for analyzing nominal and ordinal outcome variables, including the use of computer programs for performing these analyses. Methods covered include classical statistical tests, log-linear models, logistic regression, cumulative link models, and latent class analysis. Both categorical and continuous predictor variables will be covered.

Course Objectives:
1. Introduce theoretical underpinnings of categorical data analysis.
2. Provide a survey of some common categorical data analysis methods.
3. Provide experience with conducting categorical data analyses.
4. Provide a conceptual basis for formulating testable hypotheses with categorical outcome variables.
5. Build skills for critical evaluation of research involving categorical outcome variables.

Text Books:
    Azen, R. & Walker, C. M. (2011). Categorical data analysis for the behavioral and social sciences. New York: Routledge.
    McCutcheon, A. L. (1987). Latent class analysis. Quantitative Applications in the Social Sciences (64). Thousand Oaks, CA: Sage Publications.

Additional Reading:
    Christensen, R. H. B. (2011). Analysis of ordinal data with cumulative link models: Estimation with the R-package ordinal.
    Gameroff, M. J. (2005). Using the Proportional Odds Model for Health-Related Outcomes: Why, When, and How with Various SAS® Procedures, Paper 205-30. In SAS Institute Inc. 2005. Proceedings of the Thirtieth Annual SAS Users Group International Conference. Cary, NC: SAS Institute Inc.
    Linzer, D. A. & Lewis, J. (2011a). poLCA: An R Package for Polytomous Variable Latent Class Analysis. Journal of Statistical Software, 42, 1-29.
    Linzer, D. A. & Lewis, J. (2011b). poLCA: Polytomous Variable Latent Class Analysis. R package version 1.3.
    Visser, I. (2007). depmix: An R-package for fitting mixture models on mixed multivariate data with Markov dependencies.

Required Software:
R with the ordinal, poLCA, and depmix packages: R is a powerful, free, open-source statistics package that runs very efficiently (even on a PDA) but requires a little adjustment for those accustomed to point-and-click statistical environments. These add-on packages do not come with the base installation and must be added after you install R. It will not be necessary to master R in order to use the packages for the class. If you are completely new to R, you may want to follow along with the sample session provided at the end of An Introduction to R just to get used to the R environment. That document offers a useful introduction to R, although it covers a great deal of material not needed for this course. Additional guides to R can be found on the R homepage by clicking "Other" under the Documentation menu. Note that R is available for Windows, Linux, and Mac. However, I only have access to the Windows version for answering questions, and I have been told that the details of how R works vary somewhat across the three operating systems. So, if Windows is an option for you, it may be the safest choice.
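
For anyone completely new to R, a first session might look something like the sketch below. It uses only functions that ship with base R (matrix, prop.table, chisq.test), so no add-on packages are needed. The counts are made-up numbers chosen purely for illustration and do not come from the course data sets.

```r
# Hypothetical 2 x 2 table of counts (made-up numbers, for illustration only).
counts <- matrix(c(20, 10,
                   15, 25),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group = c("A", "B"),
                                 outcome = c("yes", "no")))

# Print the table, then the proportion of each outcome within each group.
counts
prop.table(counts, margin = 1)

# Pearson chi-square test of association, without the continuity correction.
result <- chisq.test(counts, correct = FALSE)
result$statistic   # the chi-square statistic
result$parameter   # degrees of freedom
result$p.value     # p value
```

Typing each line at the '>' prompt and pressing Enter shows its result right away, which is a quick way to get used to how R responds to commands.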

Note on software: The classical and multivariate statistics covered in the first three-quarters of the course are available in SPSS and SAS, and the book provides detailed explanations of how to obtain them from these two commercial packages. The modern statistics covered in the last quarter of the course are not available in SPSS. SAS procedures for these methods are available from the Methodology Center at Pennsylvania State University. However, this course will not make use of commercial software. LEM offers an alternative as a powerful and user-friendly free software package for categorical data analysis. The current course will not make use of LEM, but students interested in conducting such analyses are encouraged to consider it as an additional analysis tool.

R Installation:
1. Point your web browser to the Comprehensive R Archive Network (CRAN).
2. From the sidebar menu on the left, near the top, click Mirrors and select something geographically close (e.g., Pennsylvania). The same page will reload from a closer server.
3. Select Windows (if that is your operating system). If you use an Apple computer, your version of R differs somewhat, and I am not familiar with it.
4. Click base. Then download and run the newest version installation file (currently R-2.9.1-win32.exe). Further installation instructions are provided on the CRAN web page.
5. Once installation is complete, start R. You will see a window with a '>' prompt. At the prompt, you may type the following command to test the installation.

> demo(graphics)

You will be prompted to hit Enter several times as you move through the demo. A series of graphs should appear in a separate window inside the R window if R has been installed correctly.

6. On the Packages menu in R, select Install Packages. You will be prompted with a list of mirror sites that opens in a separate window. Again, pick something close (e.g., USA PA or USA PA2).
7. Momentarily, you will be prompted with a list of packages in a window similar to the mirror site window that you just used. Click ordinal.
8. After some brief chugging, you should have a message in your main R window indicating that the ordinal package installed correctly.
9. You can test the installation by typing the following commands at the R prompt.

> library(ordinal)
> ?ordinal

This should open a new window outside of the main R window with a help file on the ordinal package. At the top, it should say "Regression Models for Ordinal Data via Cumulative Link (Mixed) Models".

Repeat steps 6-9 for the poLCA package and depmix package (which is not the same as depmixS4). Use library(poLCA), ?poLCA, library(depmix), and ?depmix to test the installation.
10. Return to the R console window where you type commands. At the prompt, enter the following command. When prompted, choose not to save the workspace image. This will close R.

> q()
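
The menu-driven package steps above can also be carried out from the R prompt. The sketch below is one possible way to do that, not the required procedure for the course. requireNamespace and install.packages are standard R functions; missing_pkgs is a hypothetical helper name introduced here, and it is an assumption that all three packages are available on your chosen mirror for your version of R.

```r
# Packages named in this syllabus. Their availability on your CRAN mirror
# is an assumption; check the mirror if installation fails.
course_pkgs <- c("ordinal", "poLCA", "depmix")

# Hypothetical helper: return the names of packages that are not yet installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_pkgs(course_pkgs)

if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # Uncomment the next line to install from CRAN
  # (R asks for a mirror if none has been chosen yet):
  # install.packages(to_install)
} else {
  message("All course packages are installed.")
}
```

Once a package is installed, library(poLCA) followed by ?poLCA (and likewise for the other packages) tests it in the same way as the steps above.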


Blackboard Access: Access to Blackboard is an essential part of this course. Course materials will be distributed through Blackboard and I will use Blackboard to send you email. If you have any difficulty accessing the Graduate Center Blackboard system, please resolve those difficulties as soon as possible. Please check that you have a valid email address listed in Blackboard.

Examinations:  The examinations will not be cumulative, but later material will always presuppose familiarity with prior material. The content of the examinations will reflect the reading, and the examinations will emphasize your ability to reason using the statistical principles studied in the course. The examinations comprise four online modules, each covering roughly one quarter of the material. For each module, each student will receive a random sample of items to be completed on Blackboard before the due date.

Homework:  You will need to run examples using R and turn in printed output to demonstrate that you have done this.  As such, you need to have a PC capable of running R, access to the Internet, and a printer.  Homework will generally involve small tasks.  However, as with any other new skill, give yourself plenty of extra time to get confused, muck around by trial and error, and eventually figure out what you did wrong.

Turn in homework assignments at the beginning of class on the days noted on the schedule.  The assignments may not make sense to you until you cover the material to which they refer. The specific assignments will appear on Blackboard.

Grading:  Each of the four examination modules is worth 20% of your total grade.  That leaves 20% for the homework assignments.  Letter grades will be assigned as indicated below.

Letter Grade
Percent Grade
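
As a worked example of the weighting described under Grading, the sketch below computes a total percent grade from four module scores and one homework score. The function name course_percent and all of the scores are hypothetical, and the mapping from percent grade to letter grade is not reproduced here.

```r
# Weights stated in the syllabus: 20% for each of the four examination
# modules and 20% for the homework assignments.
course_percent <- function(modules, homework) {
  stopifnot(length(modules) == 4)   # exactly four examination modules
  0.20 * sum(modules) + 0.20 * homework
}

# Made-up scores (in percent), for illustration only.
total <- course_percent(modules = c(90, 80, 85, 95), homework = 100)
total
```

With these made-up scores, the four modules contribute 70 points and the homework contributes 20, for a total of 90 percent.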

Special Needs:
To request accommodations, please contact the Office of the Vice President for Student Affairs (Room 7301, Graduate Center; (212) 817-7400). Information about accommodations can be found in the Graduate Center Student Handbook 05-06, pp. 51-52.

Academic Honesty:    
The Graduate Center of The City University of New York is committed to the highest standards of academic honesty. Acts of academic dishonesty include, but are not limited to, plagiarism (in drafts, outlines, and examinations, as well as final papers), cheating, bribery, academic fraud, sabotage of research materials, the sale of academic papers, and the falsification of records. An individual who engages in these or related activities or who knowingly aids another who engages in them is acting in an academically dishonest manner and will be subject to disciplinary action in accordance with the bylaws and procedures of The Graduate Center and the Board of Trustees of The City University of New York.

Each member of the academic community is expected to give full, fair, and formal credit to any and all sources that have contributed to the formulation of ideas, methods, interpretations, and findings. The absence of such formal credit is an affirmation representing that the work is fully the writer’s. The term “sources” includes, but is not limited to, published or unpublished materials, lectures and lecture notes, computer programs, mathematical and other symbolic formulations, course papers, examinations, theses, dissertations, and comments offered in class or informal discussions, and includes electronic media. The representation that such work of another person is the writer’s own is plagiarism.

Care must be taken to document the source of any ideas or arguments. If the actual words of a source are used, they must appear within quotation marks. In cases that are unclear, the writer must take due care to avoid plagiarism.

The source should be cited whenever:
(a) a text is quoted verbatim
(b) data gathered by another are presented in diagrams or tables
(c) the results of a study done by another are used
(d) the work or intellectual effort of another is paraphrased by the writer

    Because the intent to deceive is not a necessary element in plagiarism, careful note taking and record keeping are essential in order to avoid unintentional plagiarism.

    For additional information, please consult “Avoiding and Detecting Plagiarism,” available in the Office of the Vice President for Student Affairs or the Provost’s Office.

(From The Graduate Center Student Handbook 05-06, pp. 36-37)

Schedule (for each week: Reading Assignments Due, then Examination and Homework Assignments Due):

Week 1: W 2/1
    Reading: Azen & Walker (A&W) Chapter 1: Introduction and overview. Installing and using R.

Week 2: W 2/8
    Reading: A&W Chapter 2: Probability distributions.

Week 3: W 2/15
    Reading: A&W Chapter 3: Proportions, estimation, and goodness-of-fit.
    Due: Homework Assignment 1 (HA1 probability)

Week 4: W 2/22
    Reading: A&W Chapter 4: Association between two categorical variables.

Week 5: W 2/29
    Reading: A&W Chapter 5: Association between three categorical variables.
    Due: Test Module 1 (weeks 1-4)

Week 6: W 3/7
    Reading: A&W Chapter 6: Modeling and the generalized linear model.
    Due: HA2 (1-way and 2-way GOF)

Week 7: W 3/14
    Reading: A&W Chapter 7: Log-linear models.
    Due: HA3 (CMH test)

Week 8: W 3/21
    Reading: A&W Chapter 8: Logistic regression with continuous predictors.
    Due: Test Module 2 (weeks 5-7)

Week 9: W 3/28
    Reading: A&W Chapter 9: Logistic regression with categorical predictors.
    Due: HA4 (log-linear)

Week 10: W 4/4
    Reading: A&W Chapter 10: Logistic regression with multicategory outcomes.

Week 11: W 4/18 (No classes 4/11)
    Reading: Gameroff (2005), Christensen (2011): Proportional odds regression and other cumulative link function models for ordinal data.
    Due: HA5 (logistic)

Week 12: W 4/25
    Reading: McCutcheon (M) Chapters 1-2, Linzer & Lewis (2011a): The logic of latent variables, Latent class analysis.
    Due: Test Module 3 (weeks 8-11)

Week 13: W 5/2
    Reading: M Chapters 3-4, Linzer & Lewis (2011b): Estimating latent categorical variables & Analyzing scale response patterns.

Week 14: W 5/9
    Reading: M Chapters 4-6, Visser (2007): Comparing latent structures among groups & Conclusions.
    Due: HA6 (LCA)

Finals Week: W 5/23 (W 5/16 is a reading day)
    Due: Test Module 4 (weeks 12-14)

Created 30 December 2011
Updated 26 January 2012