1.4 Saving Input and Output
When analyzing data or programming in R, a user will never get into serious trouble as long as he or she follows two basic rules:
- Always leave the original data files intact. Any revised version of the data should be written to a new file. If you are working with a particularly large and unwieldy data set, write a short program that pares it down to what you need, save the cleaned file separately, and then write code that works with the new file.
- Write all input code in a script that is saved. Users should generally avoid typing code directly into the console. This includes code for cleaning, recoding, and reshaping data, as well as for conducting analyses or developing new programs.
If these two rules are followed, the user can always recover his or her work up to the point of some error or omission. So even if some essential information is dropped or lost during data management, or a journal reviewer names a predictor that a model should add, the user can always retrace his or her steps. By calling the original data set with the saved program, the user can make minor adjustments to the code to incorporate a new feature of the analysis or recover lost information. By contrast, if the original data are overwritten or the input code is not saved, the user will likely have to restart the whole project from the beginning, which is a waste of time.
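As a rough illustration of the first rule, the sketch below reads an original file, cleans a copy in memory, and writes the result under a new name. The file names survey_raw.csv and survey_clean.csv, the toy data, and the use of a temporary folder are all assumptions made only so the example is self-contained.

```r
# Hypothetical file names, placed in a temporary folder for illustration
raw_path   <- file.path(tempdir(), "survey_raw.csv")
clean_path <- file.path(tempdir(), "survey_clean.csv")

# Stand-in for an original data file (in practice this file already exists)
write.csv(data.frame(id = 1:4, age = c(34, NA, 51, 28)),
          raw_path, row.names = FALSE)

# Read the original and revise a copy in memory; never write back to raw_path
raw   <- read.csv(raw_path)
clean <- raw[!is.na(raw$age), ]   # drop rows with a missing age

# Save the revised data under a new name; the original file stays intact
write.csv(clean, clean_path, row.names = FALSE)
```

Because the cleaning steps live in a saved script and the raw file is never modified, rerunning the script from the top reproduces the cleaned file at any time.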
A script file in R is simply plain text, usually saved with the suffix .R. To create a new script file in R, choose File \(\rightarrow\) New Document in the drop-down menu. Alternatively, the console window shown in Fig. 1.2 has an icon that looks like a blank page at the top of the screen (second icon from the right). Clicking on it will also create a new R script file. Once the file is open, the normal Save and Save As commands from the File menu apply. To open an existing script, choose File \(\rightarrow\) Open Document in the drop-down menu, or click the icon at the top of the screen that looks like a page with writing on it (third icon from the right in Fig. 1.2). When working with a script file, any code within the file can be executed in the console by highlighting the code of interest and typing the keyboard shortcut Ctrl + R in Windows or Cmd + Return on a Mac. Besides the default script file editor, more sophisticated text editors such as Emacs and RWinEdt are also available.
The product of any R session is saved in the working directory. The working directory is the default file path for all files the user wants to read in or write out. The command getwd (short for "get working directory") prints R's current working directory, while setwd ("set working directory") lets you change the working directory as desired. On a Windows machine, the syntax for checking and then setting the working directory would look like this:
getwd()
setwd("C:/temp/")
This now writes any output files, be they data sets, figures, or printed output, to the folder temp on the C: drive. Observe that R expects forward slashes to designate subdirectories, which contrasts with Windows's typical use of backslashes. Hence, specifying C:/temp/ as the working directory points to C:\temp\ in normal Windows syntax. Meanwhile, for Mac or Unix, setting a working directory would be similar, and the path directory is printed exactly as these operating systems designate them, with forward slashes:
setwd("/Volumes/flashdisk/temp")
Note that setwd can be called multiple times in a session, as needed. Also, specifying the full path for any file overrides the working directory.
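The difference between a bare file name and a full path can be sketched as follows. This is only an illustration: the temporary folder stands in for a project directory, and the subfolder elsewhere and both file names are hypothetical.

```r
old_wd   <- getwd()      # remember where the session started
demo_dir <- tempdir()    # stand-in for a project folder
setwd(demo_dir)

# A bare file name is resolved against the working directory
writeLines("written via a relative name", "relative.txt")

# A full path is used as given, regardless of the working directory
other_dir <- file.path(demo_dir, "elsewhere")   # hypothetical subfolder
dir.create(other_dir, showWarnings = FALSE)
writeLines("written via a full path", file.path(other_dir, "absolute.txt"))

setwd(old_wd)            # restore the original working directory
```

Building paths with file.path keeps the forward-slash convention on every operating system, so the same script can run on Windows, Mac, or Unix.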
To save output from your session in R, try the sink command. As a general computing term, a sink is an output point for a program where data or results are written out. In R, this term accordingly refers to a file that records all of our printed output. To save your session's output to the file Rintro.txt within the working directory, type:
sink("Rintro.txt")
Alternatively, if we wanted to override the working directory, in Windows for instance, we could have instead typed:
sink("C:/myproject/code/Rintro.txt")
Now that we have created an output file, any output that normally would print to the console will instead print to the file Rintro.txt. (For this reason, in a first run of new code, it is usually advisable to allow output to print to the screen and then rerun the code later to print to a file.) The print command is useful for creating output that can be easily followed. For instance, the command:
print("The mean of variable x is...")
will print the following in the file Rintro.txt:
[1] "The mean of variable x is..."
Another useful printing command is the cat command (short for catenate, to connect things together), which lets you mix objects in R with text. As a preview of simulation tools described in Chap. 11, let us create a variable named x by means of simulation:
x <- rnorm(1000)
By way of explanation: this syntax draws randomly 1000 times from a standard normal distribution and assigns the values to the vector x. Observe the arrow (<-), formed with a less than sign and a hyphen, which is R's assignment operator. Any time we assign something with the arrow (<-), the name on the left (x in this case) allows us to recall the result of the operation on the right (rnorm(1000) in this case).³ Now we can print the mean of these 1000 draws (which should be close to 0 in this case) to our output file as follows:
cat("The mean of variable x is...", mean(x), "\n")
With this syntax, objects from R can be embedded into the statement you print. The character \n inserts a line break. You can also print any statistical output using either the print or the cat command. Remember that when a script is run with source(), results are not printed automatically, so they reach the log file only if you use one of the print commands. Another option is to simply copy and paste results from the R console window into Word or a text editor. To turn off the sink command, simply type:
sink()
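Putting these pieces together, a complete logging session might look like the sketch below. Writing the log to a temporary folder, the object name log_file, and the call to set.seed are assumptions added here so the example runs anywhere and gives repeatable draws.

```r
# Write the log to a temporary folder (an assumption of this sketch)
log_file <- file.path(tempdir(), "Rintro.txt")

set.seed(123)          # hypothetical seed, only to make the draws repeatable
x <- rnorm(1000)

sink(log_file)         # start diverting printed output to the file
print("The mean of variable x is...")
cat("The mean of variable x is...", mean(x), "\n")
sink()                 # stop diverting; output returns to the console

readLines(log_file)    # inspect what was written to the log
```

Reading the file back with readLines is a quick way to confirm that the log holds what you expected before closing the session.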
1.5 Work Session Management
A key feature of R is that it is an object-oriented programming language. Variables, data frames, models, and outputs are all stored in memory as objects, or identified (and named) locations in memory with defined features. R stores in working memory any object you create using the name you define whenever you load data into memory or estimate a model. To list the objects you have created in a session use either of the following commands:
objects()
ls()
To remove all the objects in R type:
rm(list=ls(all=TRUE))
As a rule, it is a good idea to use the rm command at the start of any new program. If the previous user saved his or her workspace, then they may have used objects sharing the same name as yours, which can create confusion.
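When only some objects need to go, rm also accepts individual names. The following sketch uses hypothetical objects a, b, and fit, along with the cars data set that ships with R, purely for illustration.

```r
# Hypothetical objects created only for illustration
a   <- 1:10
b   <- "some text"
fit <- lm(dist ~ speed, data = cars)   # cars is a data set included with R

ls()                        # lists a, b, and fit, plus anything already in memory

rm(b)                       # remove a single object by name
rm(list = c("a", "fit"))    # remove several objects named in a character vector

ls()                        # a, b, and fit no longer appear
```

Selective removal is handy in the middle of a session, while rm(list=ls(all=TRUE)) remains the simplest way to start a new program from a clean slate.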
To quit R either close the console window or type:
q()
At this point, R will ask if you wish to save the workspace image. Generally, it is advisable not to do this, as starting with a clean slate in each session is more likely to prevent programming errors or confusion about which versions of objects are loaded in memory.
Finally, in many R sessions, we will need to load packages, or batches of code and data offering additional functionality not written in R's base code. Throughout this book we will load several packages, particularly in Chap. 8, where our focus will be on example packages written by prominent political scientists to implement cutting-edge methods. The necessary commands to load packages are install.packages, a command that automatically downloads and installs a package on a user's copy of R, and library, a command that loads the package in a given session. Suppose we wanted to install the package MCMCpack. This package provides tools for Bayesian modeling that we will use in Chap. 8. The form of the syntax for these commands is:
install.packages("MCMCpack")
library(MCMCpack)
Package installation is case and spelling sensitive. R will likely prompt you at this point to choose one of the CRAN mirrors from which to download this package: For faster downloading, users typically choose the mirror that is most geographically proximate. The install.packages command only needs to be run once per R installation for a particular package to be available on a machine. The library command needs to be run for every session that a user wishes to use the package. Hence, in the next session that we want to use MCMCpack, we need only type: library(MCMCpack).
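Since installation happens once but loading happens every session, a saved script can check whether a package is present before trying to load it. The sketch below is one possible pattern, not a required one; the variable name pkg and the wording of the message are assumptions.

```r
pkg <- "MCMCpack"

# requireNamespace() returns FALSE, instead of stopping with an error,
# when the package is not installed
if (!requireNamespace(pkg, quietly = TRUE)) {
  message(pkg, " is not installed; run install.packages(\"", pkg,
          "\") once, then rerun this script.")
} else {
  # character.only = TRUE lets library() accept the name stored in a string
  library(pkg, character.only = TRUE)
}
```

Keeping the install.packages call out of the script itself avoids downloading the package again every time the script is run.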
1.5.1 Resources
Given the wide array of base functions that are available in R, much less the even wider array of functionality created by R packages, a book such as this cannot possibly address everything R is capable of doing. This book should serve as a resource introducing how a researcher can use R as a basic statistics program and offer some general pointers about the usage of packages and programming features. As questions emerge about topics not covered in this space, there are several other resources that may be of use:
- Within R, the Help pull down menu (also available by typing help.start() in the console) offers several manuals of use, including an “Introduction to R” and “Writing R Extensions.” This also opens an HTML-based search engine of the help files.
- UCLA’s Institute for Digital Research and Education offers several nice tutorials (http://www.ats.ucla.edu/stat/r/). The CRAN website also includes a variety of online manuals (http://www.cran.r-project.org/other-docs.html).
- Some nice interactive tutorials include swirl, which is a package you install in your own copy of R (more information: http://www.swirlstats.com/), and Try R, which is completed online (http://tryr.codeschool.com/).
- Within the R console, the commands ?, help(), and help.search() all serve to find documentation. For instance, ?lm would find the documentation for the linear model command. Alternatively, help.search("linear model") would search the documentation for a phrase.
- To search the internet for information, Rseek (http://www.rseek.org/, powered by Google) is a worthwhile search engine that searches only over websites focused on R.
- Finally, Twitter users reference R through the hashtag #rstats.
At this point, users should now have R installed on their machine, hold a basic sense of how commands are entered and output is generated, and recognize where to find the vast resources available for R users. In the next six chapters, we will see how R can be used to fill the role of a statistical analysis or econometrics software program.
1.6 Practice Problems
Each chapter will end with a few practice problems. If you have tested all of the code from the in-chapter examples, you should be able to complete these on your own. If you have not done so already, go ahead and install R on your machine for free and try the in-chapter code. Then try the following questions.
- Compute the following in R:
  - \(-7 \times 2^3\)
  - \(\frac{8}{8^2 + 1}\)
  - \(\cos \pi\)
  - \(\sqrt{81}\)
  - \(\ln e^4\)
- What does the command cor do? Find documentation about it and describe what the function does.
- What does the command runif do? Find documentation about it and describe what the function does.
- Create a vector named x that consists of 1000 draws from a standard normal distribution, using code just like you see in Sect.1.3. Create a second vector named y in the same way. Compute the correlation coefficient between the two vectors. What result do you get, and why do you get this result?
- Get a feel for how to decide when add-on packages might be useful for you. Log in to http://www.rseek.org and look up what the stringr package does. What kinds of functionality does this package give you? When might you want to use it?
³ The arrow (<-) is the traditional assignment operator, though a single equals sign (=) can also serve for assignments.