Keith Markus' Urban Sprawl: http://web.jjay.cuny.edu/~kmarkus


  EPSY U88000.01 GC
Seminar in Research Methods:
Introduction to Data Analysis and Programming
with R and Python
CRN 58077


Syllabus
 
Spring 2022

Time: Wednesday 6:30-8:30 PM
Room: 6418, CUNY Graduate Center, 365 Fifth Avenue

Contact Information:
Professor Keith A. Markus
kmarkus@aol.com  (This is the best way to contact me.)
212-237-8784 (For some reason I no longer receive voice messages as email, so I do not recommend voice messages.)
Office: 10.65.04  New Building, John Jay College
Address:  Psychology Department, 10th Floor
John Jay College of Criminal Justice, CUNY
524 W59th Street, New York, NY, 10019

Office Hours:
  5 PM to 6 PM Wednesdays when classes are in session.  Priority will be given to students who make an appointment beforehand.

Course Description:
  R and Python offer widely used programming environments.  The course offers a basic introduction to R and Python programming for data analysis and data management.  The focus is on providing a firm foundation for further self-guided learning in both environments.  The course is aimed at behavioral science researchers and methodologists and assumes a basic familiarity with behavioral science data analysis, commonly used statistical distributions, and statistical tests.  The course provides a basic introduction to flow charts and program design.  The course explores the basic environments (R packages and Python modules), including key elements of syntax, data types, and programming basics.  The course emphasizes functional programming in R and object-oriented programming in Python.
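As a rough sketch of the contrast between the two styles named above, the following example (written in Python only, for brevity) computes standardized scores first with a plain function and then with a class that bundles the data with a method.  The names standardize and Scale are hypothetical illustrations and do not come from the course materials.

```python
from statistics import mean, stdev


# Functional style: a function maps inputs to outputs and stores no state.
def standardize(scores):
    """Return z-scores for a list of numbers (sample standard deviation)."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]


# Object-oriented style: a class bundles data with the methods that use it.
class Scale:
    """Hypothetical container for one set of scale scores."""

    def __init__(self, name, scores):
        self.name = name            # attribute: label for the scale
        self.scores = list(scores)  # attribute: the raw scores

    def standardized(self):
        """Return z-scores by delegating to the function above."""
        return standardize(self.scores)


anxiety = Scale("anxiety", [2, 4, 6])
print(standardize([2, 4, 6]))   # [-1.0, 0.0, 1.0]
print(anxiety.standardized())   # [-1.0, 0.0, 1.0]
```

Both calls give the same result; the difference lies in where the data live and how the code is organized for reuse.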

Course Objectives:
1. Students will gain a basic understanding of the process of writing clear, readable, and reusable code.
2. Students will gain a basic level of comfort and familiarity with both the R and Python programming environments.
3. Students will gain hands-on experience with functional programming in R.
4. Students will gain hands-on experience with object-oriented programming in Python.
5. Students will gain sufficient familiarity with both environments to explore further topics on their own.

Reading:

ITR: Venables, W. N., Smith, D. M. & the R Core Team (2021). An Introduction to R:  Notes on R: A Programming Environment for Data Analysis and Graphics.  Version 4.1.2 (2021-11-01). 
https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf

URDAG: Maindonald, J. H. (2008).  Using R for Data Analysis and Graphics: Introduction, Code and Commentary.
https://cran.r-project.org/doc/contrib/usingR.pdf

TPT: The Python Tutorial https://docs.python.org/3/tutorial/index.html

ITNSM: No Author (No Date).  Introduction to numpy, scipy and matplotlib*
https://www.patnauniversity.ac.in/e-content/science/physics/MScPhy21.pdf 

P1: No Author (No Date).  Pandas 1: Introduction*
https://www.acme.byu.edu/wp-content/uploads/2021/09/pandas1_2021.pdf
(Your browser may give a security warning for this URL.  I will post all the pdf files on Blackboard so that you can download them from there.)

RSG:  Google's R Style Guide  https://google.github.io/styleguide/Rguide.html

PEP8:  Style Guide for Python Code  https://www.python.org/dev/peps/pep-0008/

ZEN:  The Zen of Python (Type 'import this' in the Python console.)

*If you are the author of one of these documents and wish to be identified as such, please let me know (kmarkus@aol.com).

Course Flow:  Familiarize yourself with the reading before class meets.  We will illustrate concepts from the reading in class and reinforce them with in-class practice problems.

Viewing:
The following three YouTube playlists are very strongly recommended.  I would suggest watching both DataDaft playlists all the way through.  The Socratica list includes some topics that we will not directly address in this course but also covers some topics not covered by the DataDaft list.  In particular, the episode on Python classes is highly recommended because we will use those extensively.

DataDaft: Introduction to R https://www.youtube.com/playlist?list=PLiC1doDIe9rDjk9tSOIUZJU4s5NpEyYtE

DataDaft: Python for Data Analysis https://www.youtube.com/playlist?list=PLiC1doDIe9rCYWmH9wIEYEXXaJ4KAi3jc

Socratica:  Learn Python  https://www.youtube.com/playlist?list=PLi01XoE8jYohWFPpC17Z-wWhPOSuh8Er-


Software:

Software installation details will depend on your operating system.

You can install base R here:  https://cran.r-project.org/

Recommended: You can install RStudio here:  https://www.rstudio.com/
This is a popular graphical user interface (GUI) for R that runs R in the background and makes many routine tasks easier.  I will use RStudio in class.

You can install base Python here:  https://www.python.org/
This is not recommended for this course.  See recommended installation below.

Recommended: You can install Anaconda Python here:  https://www.anaconda.com/
The Anaconda Python distribution includes all the special modules used for data analysis applications.  These can be tricky to install yourself unless you have a high comfort level using the command line terminal on your operating system.  I will be using the Spyder integrated development environment (IDE, which is like a GUI for our purposes) to work with Anaconda Python in class.  Spyder comes packaged with Anaconda Python.  (If installation does not provide an icon or other shortcut, try typing 'spyder' at the command line with no quotes to open the application for the first time.)
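One way to confirm that an installation can see the data analysis modules is a short check like the one below, run in the Spyder console or any Python prompt.  This is only a sketch: the list of module names is my assumption based on the libraries named elsewhere in this syllabus, and check_modules is a hypothetical helper, not part of Anaconda.

```python
from importlib.util import find_spec

# Assumed list, based on the libraries named in this syllabus.
COURSE_MODULES = ["numpy", "scipy", "pandas", "matplotlib", "statsmodels"]


def check_modules(names):
    """Return a dict mapping each module name to True if Python can find it."""
    return {name: find_spec(name) is not None for name in names}


# Report each module without importing it.
for name, found in check_modules(COURSE_MODULES).items():
    print(f"{name}: {'installed' if found else 'NOT FOUND'}")
```

If any module is reported as not found, the Python you are running is probably not the Anaconda installation.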

Recommended:  You can use the Dia program to draw flow charts.  You can install it from the Dia homepage here:  http://dia-installer.de/


Coding Shares:

You are responsible for posting at least 3 contributions to the "Coding Knowledge Exchange" bulletin board on Blackboard.  As you explore R and Python, you will discover various things that are helpful to you.  This might be a particular function, R package or Python module.  It might be a particular coding idiom used to complete a particular kind of task.  It might be a blog post, help forum post (e.g., Stack Overflow or Cross Validated), or other web resource.  It might be a helpful book.  It might be something that you found confusing at first and now have a clear idea how to explain to others.  In any event, each share should contain at least a short paragraph of original text that explains (a) what the share comprises, (b) why you found it helpful, and (c) one or more examples of potential use cases (applications).  A share should never be just a URL link or direct quotation.

Further use of the forum is not graded but I strongly encourage robust discussion.  I encourage you to continue sharing after you have made three posts and to reply to one another's posts with questions, comments, or additions.

Course Projects:

There will be separate projects in R and Python.  Details about the project assignments will be posted to Blackboard.  Both will involve writing a reusable suite of functions (in R) or classes and methods (in Python) to complete a concrete data management/analysis task using concepts from the course.

Grading:  The final grade comprises coding shares and the two course projects. Coding shares count for 30% of your grade and each project counts for 35% of the grade.  Letter grades will be assigned as indicated below.
 
 

Letter Grade    Percent Grade
A               92-100
A-              84-91
B+              76-83
B               68-75
B-              60-67
C+              52-59
C               44-51
C-              36-43
F               0-35
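
The weighting and the letter-grade bands can be expressed as a short calculation.  The sketch below is illustrative only: the function names are hypothetical, and it assumes that a fractional score falling between two bands (for example 91.5) belongs to the lower band, which the grading policy does not specify.

```python
# Lower bound of each letter-grade band, highest first (from the table above).
GRADE_BANDS = [
    (92, "A"), (84, "A-"), (76, "B+"), (68, "B"), (60, "B-"),
    (52, "C+"), (44, "C"), (36, "C-"), (0, "F"),
]


def final_percent(shares, r_project, python_project):
    """Weighted final grade: 30% coding shares, 35% for each project."""
    # Integer weights divided by 100 avoid small floating-point errors.
    return (30 * shares + 35 * r_project + 35 * python_project) / 100


def letter_grade(percent):
    """Return the letter for the first band whose lower bound is met."""
    # Assumption: scores between bands fall into the lower band.
    for lower_bound, letter in GRADE_BANDS:
        if percent >= lower_bound:
            return letter
    raise ValueError("percent must be between 0 and 100")


score = final_percent(shares=100, r_project=90, python_project=80)
print(score, letter_grade(score))  # 89.5 A-
```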

Diversity:
Everyone should feel welcomed as a member of the R and Python user communities.  The broader software development community has recognized that it has struggled with diversity and has responded with efforts to address this issue (https://en.wikipedia.org/wiki/Silicon_Valley#Demographics).  Jessica McKellar served as a figurehead for early efforts to pursue greater diversity and inclusion in the Python community.  Her message has been that there are no shortcuts or easy fixes; instead, it takes long hours of networking and sending individual email invitations to diversify conference participation.  The Python Software Foundation has published a diversity statement (https://www.python.org/community/diversity/) and offers grants that can support, among other things, local diversity and inclusion efforts (https://www.python.org/psf-landing/).  The R Consortium has a high-level project called R Community IDEA that pursues inclusion, diversity, equity and accessibility.  A presentation by Heather Turner describes various aspects of these efforts.  The R Foundation has endorsed R conferences in a variety of languages (https://www.r-project.org/conferences/) to include people outside Europe and North America and to target under-supported regions globally.  Diversity and inclusion statements have become a key instrument in making the field more welcoming to anyone interested in getting involved.  Reactionary events (e.g., Gamergate and the infamous Google employee memo) often garner more press attention than the positive work that is being done.  Overall, it is my understanding that such efforts have gained more ground in promoting gender inclusion (https://en.wikipedia.org/wiki/Sexism_in_the_technology_industry) than other forms of inclusion.  Moreover, one sometimes still sees prominent personalities express counter-productive views that demonstrate their struggle to adjust to the cultural changes taking place.
So, plenty of work remains to be done but do not let that discourage you.  Those interested in contributing to these efforts will find a receptive audience and ample opportunities through user organizations and conferences.  I encourage anyone interested to attend an R user conference or Python user conference to meet other users and learn about things you might not otherwise explore on your own.  UseR! and PyData are two prominent options.  The R Foundation requires conferences that it endorses, including UseR!, to have a code of conduct statement.  PyData also has a code of conduct statement.  Such statements have become key tools in establishing norms of professional behavior.  Education is another area targeted by efforts to diversify coding and include more under-represented groups.  Organizations like Code.org work to bring coding instruction to diverse students to help pave the way for more inclusiveness in the field.  To sum up, software development has a recognized diversity problem and addressing that problem remains a work in progress, but you will find many people and initiatives in the field dedicated to progress in this area.

I hope that this course can serve as an entry point to both user communities and help promote diversity and inclusion in that way.  Likewise, I hope that you will share what you learn with others.  At a philosophical level, my choice to combine two languages in this course reflects my deeper commitment to dialogism: the view that the world is best understood through multiple representations, that discussing one language system from the perspective of another plays an important role in minding the gap between what we communicate about and how we communicate about it, and that no single system of representation is ever sufficient in itself to the exclusion of others.  In turn, these philosophical commitments shape my understanding of and approach to advancing diversity and inclusion.

Special Needs:
To request accommodations, please contact the Office of the Vice President for Student Affairs (Room 7301 Graduate Center; (212) 817-7400).  Information about accommodations can be found in the Graduate Center Student Handbook 05-06, pp. 51-52.

Academic Honesty:    
The Graduate Center of The City University of New York is committed to the highest standards of academic honesty. Acts of academic dishonesty include, but are not limited to, plagiarism (in drafts, outlines, and examinations, as well as final papers), cheating, bribery, academic fraud, sabotage of research materials, the sale of academic papers, and the falsification of records. An individual who engages in these or related activities or who knowingly aids another who engages in them is acting in an academically dishonest manner and will be subject to disciplinary action in accordance with the bylaws and procedures of The Graduate Center and the Board of Trustees of The City University of New York.

Each member of the academic community is expected to give full, fair, and formal credit to any and all sources that have contributed to the formulation of ideas, methods, interpretations, and findings. The absence of such formal credit is an affirmation representing that the work is fully the writer’s. The term “sources” includes, but is not limited to, published or unpublished materials, lectures and lecture notes, computer programs, mathematical and other symbolic formulations, course papers, examinations, theses, dissertations, and comments offered in class or informal discussions, and includes electronic media. The representation that such work of another person is the writer’s own is plagiarism.

Care must be taken to document the source of any ideas or arguments. If the actual words of a source are used, they must appear within quotation marks. In cases that are unclear, the writer must take due care to avoid plagiarism.

The source should be cited whenever:
(a) a text is quoted verbatim
(b) data gathered by another are presented in diagrams or tables
(c) the results of a study done by another are used
(d) the work or intellectual effort of another is paraphrased by the writer

    Because the intent to deceive is not a necessary element in plagiarism, careful note taking and record keeping are essential in order to avoid unintentional plagiarism.

    For additional information, please consult “Avoiding and Detecting Plagiarism,” available in the Office of the Vice President for Student Affairs, the Provost’s Office, or at http://web.gc.cuny.edu/provost/pdf/AvoidingPlagiarism.pdf.

(From The Graduate Center Student Handbook 05-06, pp. 36-37)



Schedule

Week 1 W 2/2
Topics: Introduction to R and Python environments, course overview, flow charts and programming basics

Week 2 W 2/9
Topics: R basics & procedural programming: Data types, indexing, missing data, loops, reading and writing data, etc.
Reading Due: ITR, RSG

Week 3 W 2/16
Topics: Defining your own R functions & functional programming
Reading Due: URDAG

Week 4 W 2/23
Topics: Survey of some statistical analysis functions in R
Reading Due: ITR

Week 5 W 3/2
Topics: R graphics

Week 6 W 3/9
Topics: Test-driven programming in R

Week 7 W 3/16
Topics: R statistical distribution functions and a general framework for simulation studies in R
Reading Due: URDAG

Week 8 W 3/23
Topics: General considerations for refactoring and writing reusable code in R

Week 9 W 3/30
Topics: Python basics: Importing modules, data types, loops, variable scoping, defining functions, file handling, etc.
Reading Due: TPT, PEP8, ZEN
Assignments Due: R project

Week 10 W 4/6
Topics: Object-oriented programming:  Classes, attributes, methods & composition
Reading Due: TPT

Week 11 W 4/13
Topics: Test-driven programming in Python, unit tests and assertion checks

Week 12 W 4/27 (Class does not meet 4/20)
Topics: Data types, data management and statistical distributions: NumPy, SciPy and Pandas
Reading Due: ITNSM

Week 13 W 5/4
Topics: Data analysis: Statsmodels and Matplotlib
Reading Due: ITNSM, P1

Week 14 W 5/11
Topics: General considerations for refactoring and writing reusable code in Python

Week 15 W 5/18
Topics: Student project presentations
Assignments Due: Python project


Created 5 September 2021, updated 26 November 2021, 3 December 2021, 20 January 2022
This page was created using SeaMonkey v.2.53.10.