Resources
This page collects useful resources to support your learning in Data and Code Management: From Collection to Application.
Most of these resources are freely available online.
Getting started with R and RStudio
- The CRAN website — official source for R and its documentation.
- An Introduction to R — official tutorial covering basics of R programming and statistical analysis.
- RStudio IDE — the most widely used IDE for R.
- RStudio cheat sheets — concise guides for R packages and workflows.
- The tidyverse — collection of R packages for data manipulation, visualization, and analysis.
Getting started with Python
- Python official documentation — language reference and tutorials.
- Python Data Science Handbook by Jake VanderPlas — free online guide to NumPy, pandas, Matplotlib, and scikit-learn.
- Real Python — tutorials and best practices for Python programming.
- Pandas documentation — data wrangling in Python.
- Matplotlib gallery — quick examples for Python plotting.
SQL and Databases
- Mode SQL Tutorial — interactive beginner-to-advanced SQL lessons.
- SQLBolt — hands-on SQL exercises.
- PostgreSQL documentation — reference for one of the most popular open-source databases.
- SQLite documentation — lightweight relational database, easy to set up.
Reproducibility and Collaboration
- Happy Git and GitHub for the useR by Jenny Bryan — beginner-friendly guide to version control.
- Pro Git Book — complete reference on Git.
- R Markdown and Jupyter notebooks — tools for literate programming.
- Quarto — next-generation publishing system for R, Python, and Julia.
Business Intelligence & Visualization
- Power BI Documentation — tutorials and user guides from Microsoft.
- Mastering Shiny by Hadley Wickham — developing interactive web apps in R.
- Streamlit documentation — build web apps in Python for data science.
- ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham — data visualization in R.
- Seaborn documentation — Python data visualization.
Large Language Models (LLMs) and AI Tools
- Prompt Engineering Guide — strategies for effective LLM use.
- LangChain documentation — framework for building applications powered by LLMs in Python.
- OpenAI API documentation — reference for integrating LLMs into data workflows.
Recommended Books
- Advanced R by Hadley Wickham.
- R for Data Science by Wickham & Grolemund.
- Fluent Python (2nd Edition) by Luciano Ramalho.
- Effective Python by Brett Slatkin.
- SQL for Data Scientists by Renee M. P. Teate.
Miscellaneous
- Shiny — open-source R package for interactive apps.
- Streamlit — Python alternative for quick dashboards.
- Rcpp for Seamless R and C++ Integration — extend R with C++.
- Engineering Production-Grade Shiny Apps — best practices for robust Shiny development.