Keith Markus' Urban Sprawl:
http://web.jjay.cuny.edu/~kmarkus
EPSY U88000.01 GC
Seminar in Research Methods:
Introduction to Data Analysis and Programming
with R and Python
CRN 58077
Syllabus
Spring 2022
Time: Wednesday 6:30-8:30 PM
Room: Room 6418,
CUNY Graduate Center, 365 Fifth Avenue
Contact Information:
Professor Keith A. Markus
kmarkus@aol.com (This is
the best way to contact me.)
212-237-8784 (For some
reason I no longer receive voice messages as email, so I do not
recommend voice messages.)
Office: 10.65.04
New Building, John Jay College
Address: Psychology Department, 10th Floor
John Jay College of Criminal Justice, CUNY
524 W59th Street, New York, NY, 10019
Office Hours: 5 PM to 6 PM Wednesdays when classes are in
session. Priority will be given to students who make an
appointment beforehand.
Course Description: R and Python offer widely used
programming environments. The course offers a basic
introduction to R and Python programming for data analysis and data
management. The focus is on providing a firm foundation for
further self-guided learning in both environments. The course
is aimed at behavioral science researchers and methodologists and
assumes a basic familiarity with behavioral science data analysis,
commonly used statistical distributions and statistical tests.
The course provides a basic introduction to flow charts and program
design. The course explores the basic environments (R packages
and Python modules), including key elements of syntax, data types,
and programming basics. The course emphasizes functional
programming in R and object-oriented programming in Python.
Course Objectives:
1. Students will gain a basic understanding of the process of
writing clear, readable, and reusable code.
2. Students will gain a basic level of comfort and familiarity
with both the R and Python programming environments.
3. Students will gain hands-on experience with functional
programming in R.
4. Students will gain hands-on experience with object-oriented
programming in Python.
5. Students will gain sufficient familiarity with both
environments to explore further topics on their own.
Reading:
ITR: Venables, W. N., Smith, D. M. & the R Core Team
(2021). An Introduction to R: Notes on R: A Programming
Environment for Data Analysis and Graphics. Version 4.1.2
(2021-11-01).
https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
URDAG: Maindonald, J. H. (2008). Using R for Data Analysis
and Graphics: Introduction, Code and Commentary.
https://cran.r-project.org/doc/contrib/usingR.pdf
TPT: The Python Tutorial
https://docs.python.org/3/tutorial/index.html
ITNSM: No Author (No Date). Introduction to numpy, scipy
and matplotlib*
https://www.patnauniversity.ac.in/e-content/science/physics/MScPhy21.pdf
P1: No Author (No Date). Pandas 1: Introduction*
https://www.acme.byu.edu/wp-content/uploads/2021/09/pandas1_2021.pdf
(Your browser may give a security warning for this URL. I will
post all the pdf files on Blackboard so that you can download them
from there.)
RSG: Google's R Style Guide https://google.github.io/styleguide/Rguide.html
PEP8: Style Guide for Python Code
https://www.python.org/dev/peps/pep-0008/
ZEN: The Zen of Python (Type 'import this' in the Python
console.)
*If you are the author of one of these documents and wish to be
identified as such, please let me know (kmarkus@aol.com).
Course Flow: Familiarize yourself with the reading
before class meets. We will illustrate concepts from the
reading in class and reinforce them with in-class practice problems.
Viewing:
The following three YouTube playlists are very strongly
recommended. I would suggest watching both DataDaft
playlists all the way through. The Socratica list includes
some topics that we will not directly address in this course but
also covers some topics not covered by the DataDaft list. In
particular, the episode on Python classes is highly recommended
because we will use those extensively.
DataDaft: Introduction to R https://www.youtube.com/playlist?list=PLiC1doDIe9rDjk9tSOIUZJU4s5NpEyYtE
DataDaft: Python for Data Analysis https://www.youtube.com/playlist?list=PLiC1doDIe9rCYWmH9wIEYEXXaJ4KAi3jc
Socratica: Learn Python https://www.youtube.com/playlist?list=PLi01XoE8jYohWFPpC17Z-wWhPOSuh8Er-
Software:
Software installation details will depend on your operating
system.

You can install base R here: https://cran.r-project.org/
Recommended: You can install RStudio here: https://www.rstudio.com/
This is a popular graphical user interface (GUI) for R that runs R
in the background and makes many routine tasks easier. I
will use RStudio in class.

You can install base Python here: https://www.python.org/
This is not recommended for this course. See
recommended installation below.
Recommended: You can install Anaconda Python here: https://www.anaconda.com/
The Anaconda Python distribution includes all the special modules
used for data analysis applications. These can be tricky to
install yourself unless you have a high comfort level using the
command line terminal on your operating system. I will be
using the Spyder integrated development environment (IDE, which is
like a GUI for our purposes) to work with Anaconda Python in
class. Spyder comes packaged with Anaconda Python. (If
installation does not provide an icon or other shortcut, try
typing 'spyder' at the command line with no quotes to open the
application for the first time.)

Recommended: You can use the Dia program to draw flow
charts. You can install it from the Dia homepage here:
http://dia-installer.de/
Coding Shares:
You are responsible for posting at least 3 contributions to the
"Coding Knowledge Exchange" bulletin board on Blackboard. As
you explore R and Python, you will discover various things that
are helpful to you. This might be a particular function, R
package or Python module. It might be a particular coding
idiom used to complete a particular kind of task. It might
be a blog post, help forum post (e.g., Stack Overflow or Cross
Validated), or other web resource. It might be a helpful
book. It might be something that you found confusing at
first and now have a clear idea how to explain to others. In
any event, each share should contain at least a short paragraph of
original text that explains (a) what the share comprises, (b) why
you found it helpful, and (c) one or more examples of potential
use cases (applications). A share should never be just a URL
link or direct quotation.
Further use of the forum is not graded but I strongly encourage
robust discussion. I encourage you to continue sharing after
you have made three posts and to reply to one another's posts with
questions, comments, or additions.
Course Projects:
There will be separate projects in R and Python. Details about
the project assignments will be posted to Blackboard. Both
will involve writing a reusable suite of functions (in R) or classes
and methods (in Python) to complete a concrete data
management/analysis task using concepts from the course.
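To give a rough sense of what "classes and methods" for a data management/analysis task can look like in Python, here is a minimal sketch. It is not the project assignment: the class names (ScoreCleaner, ScoreSummary), the missing-data code of -99, and the example scores are all hypothetical and chosen only for illustration.

```python
from statistics import mean


class ScoreCleaner:
    """Hypothetical data management class: stores raw scores and filters
    out missing values."""

    def __init__(self, raw_scores, missing_code=-99):
        # missing_code is an assumed convention for flagging missing data
        self.raw_scores = list(raw_scores)
        self.missing_code = missing_code

    def valid_scores(self):
        """Return scores that are neither None nor the missing-data code."""
        return [s for s in self.raw_scores
                if s is not None and s != self.missing_code]


class ScoreSummary:
    """Hypothetical analysis class that reuses ScoreCleaner by composition."""

    def __init__(self, cleaner):
        self.cleaner = cleaner

    def describe(self):
        """Return the count and mean of the valid scores.

        statistics.mean raises StatisticsError if no valid scores remain.
        """
        scores = self.cleaner.valid_scores()
        return {"n": len(scores), "mean": mean(scores)}


# Made-up example data: two of the five entries are missing
summary = ScoreSummary(ScoreCleaner([4, 5, -99, None, 3]))
result = summary.describe()
print(result)
```

Keeping the cleaning step and the summary step in separate classes is one way to make each piece reusable on its own; other designs are equally reasonable.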
Grading: The final grade comprises coding shares and
the two course projects. Coding shares count for 30% of your grade
and each project counts for 35% of the grade. Letter grades
will be assigned as indicated below.
Letter Grade | Percent Grade
A | 92-100
A- | 84-91
B+ | 76-83
B | 68-75
B- | 60-67
C+ | 52-59
C | 44-51
C- | 36-43
F | 0-35
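As a worked illustration of the grading arithmetic, the weights and cutoffs above can be sketched in Python as follows. The function names, the component scores in the example, and the decision to round to a whole percent before looking up the letter are assumptions made for illustration; the syllabus does not specify how fractional percentages are handled.

```python
# Weights and cutoffs come from the Grading section; all names are illustrative.
WEIGHTS = {"coding_shares": 0.30, "r_project": 0.35, "python_project": 0.35}

# (minimum percent, letter) pairs, ordered from highest to lowest
CUTOFFS = [(92, "A"), (84, "A-"), (76, "B+"), (68, "B"), (60, "B-"),
           (52, "C+"), (44, "C"), (36, "C-"), (0, "F")]


def final_percent(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percent):
    """Map a percent grade to a letter grade.

    Assumption: the percent is rounded to a whole number first. Note that
    Python's round() sends exact halves to the nearest even integer.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    rounded = round(percent)
    for minimum, letter in CUTOFFS:
        if rounded >= minimum:
            return letter


# Made-up component scores for one hypothetical student
example = {"coding_shares": 100, "r_project": 85, "python_project": 90}
percent = final_percent(example)
print(round(percent, 2), letter_grade(percent))
```

With these made-up scores the weighted total is 30 + 29.75 + 31.5 = 91.25 percent, which falls in the A- range.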
Diversity:
Everyone should feel welcomed as a member of the R and Python user
communities. The broader software development community has
recognized that it has struggled with diversity and has responded
with efforts to address this issue (https://en.wikipedia.org/wiki/Silicon_Valley#Demographics).
Jessica McKellar served as a figurehead for early efforts to pursue
greater diversity and inclusion in the Python community. Her
message has been that there are no shortcuts or easy fixes; instead,
it takes long hours of networking and sending individual email
invitations to diversify conference participation. The Python
Software Foundation has published a diversity statement (https://www.python.org/community/diversity/)
and offers grants that can support, among other things, local
diversity and inclusion efforts (https://www.python.org/psf-landing/).
The R Consortium has a high-level project called R
Community IDEA that pursues inclusion, diversity, equity and
accessibility. A presentation
by Heather Turner describes various aspects of these
efforts. The R Foundation has endorsed R conferences in a
variety of languages (https://www.r-project.org/conferences/)
to include people outside Europe and North America and to target
under-supported regions globally. Diversity
and inclusion statements have become a key instrument in
making the field more welcoming to anyone interested in getting
involved. Reactionary events (e.g., Gamergate and the
infamous Google employee memo) often garner more press attention
than the positive work that is being done. Overall, it is my
understanding that such efforts have made up more ground promoting
gender inclusion (https://en.wikipedia.org/wiki/Sexism_in_the_technology_industry)
than other forms of inclusion. Moreover, one sometimes still
sees prominent personalities express counter-productive views that
demonstrate their struggle to adjust to the cultural changes taking
place. So, plenty of work remains to be done but do not let
that discourage you. Those interested in contributing to these
efforts will find a receptive audience and ample opportunities
through user organizations and conferences. I encourage anyone
interested to attend an R user conference or Python user
conference to meet other users and learn about things you
might not otherwise explore on your own. UseR! and PyData are
two prominent options. The R Foundation requires conferences
that it endorses, including UseR!, to have a code
of conduct statement. PyData also has a code
of conduct statement. Such statements have become key
tools in establishing norms of professional behavior.
Education is another area targeted by efforts to diversify coding
and include more under-represented groups. Organizations like Code.org
work to bring coding instruction to diverse students to help pave
the way for more inclusiveness in the field. To sum up,
software development has a recognized diversity problem and
addressing that problem remains a work in progress but you will find
many people and initiatives in the field dedicated to progress in
this area.
I hope that this course can serve as an entry point to both user
communities and help promote diversity and inclusion in that
way. Likewise, I hope that you will share what you learn with
others. At a philosophical level, my choice to combine two
languages in this course reflects my deeper commitment to dialogism:
the view that the world is best understood through multiple
representations, that discussing one language system from the
perspective of another plays an important role in minding the gap
between what we communicate about and how we communicate about it,
and that no single system of representation is ever sufficient in
itself to the exclusion of others. In turn, these
philosophical commitments shape my understanding of and approach to
advancing diversity and inclusion.
Special Needs:
To request accommodations please contact the Office of the Vice
President for Student Affairs (Room 7301 Graduate Center; (212)
817-7400). Information about accommodations can be found in the
Graduate Center Student Handbook 05-06, pp. 51-52.
Academic Honesty:
The Graduate Center of The City University of New York is committed
to the highest standards of academic honesty. Acts of academic
dishonesty include—but are not limited to—plagiarism (in drafts,
outlines, and examinations, as well as final papers), cheating,
bribery, academic fraud, sabotage of research materials, the sale of
academic papers, and the falsification of records. An individual who
engages in these or related activities or who knowingly aids another
who engages in them is acting in an academically dishonest manner
and will be subject to disciplinary action in accordance with the
bylaws and procedures of The Graduate Center and the Board of
Trustees of The City University of New York.
Each member of the academic community is expected to give full,
fair, and formal credit to any and all sources that have contributed
to the formulation of ideas, methods, interpretations, and findings.
The absence of such formal credit is an affirmation representing
that the work is fully the writer’s. The term “sources” includes,
but is not limited to, published or unpublished materials, lectures
and lecture notes, computer programs, mathematical and other
symbolic formulations, course papers, examinations, theses,
dissertations, and comments offered in class or informal
discussions, and includes electronic media. The representation that
such work of another person is the writer’s own is plagiarism.
Care must be taken to document the source of any ideas or arguments.
If the actual words of a source are used, they must appear within
quotation marks. In cases that are unclear, the writer must take due
care to avoid plagiarism.
The source should be cited whenever:
(a) a text is quoted verbatim
(b) data gathered by another are presented in diagrams or tables
(c) the results of a study done by another are used
(d) the work or intellectual effort of another is paraphrased by the
writer
Because the intent to deceive is not a necessary
element in plagiarism, careful note taking and record keeping are
essential in order to avoid unintentional plagiarism.
For additional information, please consult
“Avoiding and Detecting Plagiarism,” available in the Office of the
Vice President for Student Affairs, the Provost’s Office, or at
http://web.gc.cuny.edu/provost/pdf/AvoidingPlagiarism.pdf.
(From The Graduate Center Student Handbook 05-06, pp. 36-37)
Schedule
Date | Topics | Reading Due | Assignments Due
Week 1 W 2/2 | Introduction to R and Python environments, course overview, flow charts and programming basics | |
Week 2 W 2/9 | R basics & procedural programming: Data types, indexing, missing data, loops, reading and writing data, etc. | ITR, RSG |
Week 3 W 2/16 | Defining your own R functions & Functional programming | URDAG |
Week 4 W 2/23 | Survey of some statistical analysis functions in R | ITR |
Week 5 W 3/2 | R graphics | |
Week 6 W 3/9 | Test-driven programming in R | |
Week 7 W 3/16 | R statistical distribution functions and a general framework for simulation studies in R | URDAG |
Week 8 W 3/23 | General considerations for refactoring and writing re-usable code in R | |
Week 9 W 3/30 | Python basics: Importing modules, data types, loops, variable scoping, defining functions, file handling, etc. | TPT, PEP8, ZEN | R project
Week 10 W 4/6 | Object-oriented programming: Classes, attributes, methods & composition | TPT |
Week 11 W 4/13 | Test-driven programming in Python, unit tests and assertion checks | |
Week 12 W 4/27 (Class does not meet 4/20) | Data types, data management and statistical distributions: NumPy, SciPy and Pandas | ITNSM |
Week 13 W 5/4 | Data Analysis: Statsmodels and Matplotlib | ITNSM, P1 |
Week 14 W 5/11 | General considerations for refactoring and writing re-usable code in Python | |
Week 15 W 5/18 | Student project presentations | | Python project
Created 5 September 2021, updated 26
November 2021, 3 December 2021, 20 January 2022
This page was created using
SeaMonkey
v.2.53.10.