Posts

Final Project

I chose to do my project on shipping. I worked as head of a shipping department for several years, so I figured I could do something with a dataset that involved that. My package has several features. The first cleans a dataset and converts the values to numeric so they are properly formatted. I then have a validation function that checks that the columns the next two functions need are present. If the columns are missing, it returns a FALSE flag. My next function takes a dataset and groups it by shipping zone. It then lists the zones from most to least frequently shipped to, along with the average cost of shipping to each one. My final feature is a heatmap. You can load a CSV into it and it will show you at a glance which zones are getting more orders. In theory it should work with a dataset of any size, as long as it is in the proper format. Other features could be added as needed with quick ease and a dashboard could be made if on...
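To give a rough idea of how the validation and zone summary steps described above could fit together, here is a minimal base R sketch. The column names (zone, cost), the function names, and the sample orders are all made up for illustration and are not taken from the actual package.

```r
# Hypothetical column names; the real package may use different ones.
required_cols <- c("zone", "cost")

# Returns TRUE only if every required column is present.
validate_shipping <- function(df, required = required_cols) {
  all(required %in% names(df))
}

# Counts shipments per zone and averages the cost, most frequent zone first.
summarize_zones <- function(df) {
  stopifnot(validate_shipping(df))
  counts <- table(df$zone)
  avg_cost <- tapply(df$cost, df$zone, mean)
  out <- data.frame(
    zone = names(counts),
    shipments = as.integer(counts),
    avg_cost = as.numeric(avg_cost[names(counts)]),
    stringsAsFactors = FALSE
  )
  out[order(-out$shipments), ]
}

# Made-up orders, only to exercise the functions.
orders <- data.frame(
  zone = c("2", "5", "5", "8", "5", "2"),
  cost = c(4.5, 7, 9, 12, 8, 5.5),
  stringsAsFactors = FALSE
)
zone_summary <- summarize_zones(orders)
print(zone_summary)
```

Keeping the validation in its own function means the summary (and a heatmap step) can both refuse bad input in the same way.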

Assignment #12: Introduction to R Markdown

R Markdown is a fairly interesting idea in concept. The whole LaTeX system took a bit of research to figure out, and I only used a very small portion of what it's capable of. You can easily set up the narrative and code sections of an R Markdown document to prepare an easy-to-follow presentation. As shown, you can add code and run it, and the result looks much like the HTML output. The code and its immediate output are woven into the document. The code repository can be found at: https://github.com/Wrightkov/r-programming-assignments

Assignment 11

Today's lesson is in debugging. We were given the code:

    tukey.outlier <- function(x, k = 1.5) {
      q1 <- quantile(x, 0.25, na.rm = TRUE)
      q3 <- quantile(x, 0.75, na.rm = TRUE)
      iqr <- q3 - q1
      x < (q1 - k * iqr) | x > (q3 + k * iqr)
    }

    tukey_multiple <- function(x) {
      outliers <- array(TRUE, dim = dim(x))
      for (j in 1:ncol(x)) {
        outliers[, j] <- outliers[, j] && tukey.outlier(x[, j])
      }
      outlier.vec <- vector("logical", length = nrow(x))
      for (i in 1:nrow(x)) {
        outlier.vec[i] <- all(outliers[i, ])
      }
      return(outlier.vec)
    }

When run with a seed it gives this error:

    Error in outliers[, j] && tukey.outlier(x[, j]) : 'length = 10' in coercion to 'logical(1)'

This happens because && only works on single logical values. Older versions of R silently used just the first element of each vector, and R 4.3.0 and later raise this error instead. A fully corrected and debugged version of the code can be found here. This replaces the broken && code and t...
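For reference, here is one possible fix, written as a sketch rather than a copy of the linked version. It swaps && for the element-wise & and uses apply() for the row check. The name tukey_multiple_fixed and the small test matrix are made up for this example.

```r
tukey.outlier <- function(x, k = 1.5) {
  q1 <- quantile(x, 0.25, na.rm = TRUE)
  q3 <- quantile(x, 0.75, na.rm = TRUE)
  iqr <- q3 - q1
  x < (q1 - k * iqr) | x > (q3 + k * iqr)
}

# Hypothetical name for the repaired function.
tukey_multiple_fixed <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    # & compares element by element, so the whole column is updated
    outliers[, j] <- outliers[, j] & tukey.outlier(x[, j])
  }
  # A row is flagged only if it is an outlier in every column
  apply(outliers, 1, all)
}

# Made-up data: row 1 is extreme in both columns, the rest are ordinary.
test_mat <- cbind(c(100, 1:9), c(-50, 1:9))
flags <- tukey_multiple_fixed(test_mat)
print(flags)
```

Using seq_len(ncol(x)) instead of 1:ncol(x) also avoids a bad loop if the matrix ever has zero columns.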

Assignment #10: Building Your Own R Package

The purpose of my blog this week is to draft a proposal describing:

The purpose and scope of your package (who will use it and why).
The idea is for small businesses to quickly sort and predict where the largest volume of orders is being shipped.

Key functions you plan to implement (name + brief description).
clean addresses
sort labels by region
determine where most of your key shipping locations are

How you chose the fields in DESCRIPTION (dependencies, license, authors).

    Package: Friedman
    Title: What the Package Does (One Line, Title Case)
    Version: 0.0.0.9000
    Authors@R:
        person("Zachary", "Wright", , "zakkwright@yahoo.com", role = c("aut", "cre"))
    Description: "Tools for organizing and analyzing shipping and order data for small collectible businesses.  With the intention of "
    License: `use_mit_license()`, `use_gpl3_license()` or friends to pick a
        license
    Encoding: UTF-8
    Roxygen: list(markdown = TRUE...
This week's assignment was to make visualizations in three different ways in R. I chose a humorous Beanie Baby dataset from the list provided. It is based around the age of each Beanie Baby and its current price. The dataset can be located here: https://vincentarelbundock.github.io/Rdatasets/datasets.html

My first 2 plots were simple plots using built-in R commands:

    plot(beanie$age, beanie$value,
         main = "Beanie Value vs Age",
         xlab = "Age",
         ylab = "Value")
    hist(beanie$value,
         main = "Distribution of Beanie Values",
         xlab = "Value")

Next I was told to make a plot using "lattice":

    library(lattice)
    xyplot(value ~ age, data = beanie,
           main = "Value vs Age",
           xlab = "Age",
           ylab = "Value",
           col = "blue")
    bwplot(age ~ value, data = beanie,
           main = "Value b...

Module # 8 Input/Output, string manipulation and plyr package

This week was all about manipulating a file and outputting a new one. To start, I loaded plyr, which will be used to manipulate the data.

    library(plyr)

I then imported the file that was given to us and used plyr for the first time with ddply.

    Student_assignment_6 <- read.csv("Assignment 6 Dataset.txt", header = TRUE)
    StudentAverage <- ddply(Student_assignment_6, "Sex", transform,
                            Grade.Average = mean(Grade))
    print(StudentAverage)

I know that plyr should have a way to do this next step, but I struggled with it a bit and just split it up for my own sake.

    mean_male <- mean(Student_assignment_6$Grade[Student_assignment_6$Sex == "Male"])
    mean_female <- mean(Student_assignment_6$Grade[Student_assignment_6$Sex == "Female"])

I then used grep to make a subset of people with the letter i in their name. To match both upper and lower case I used [iI].

    i_students <- subset(Stu...
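Since the assignment file is not reproduced here, the same steps can be tried on a tiny made-up data frame. This sketch uses only base R: ave() stands in for the ddply/transform step, and grepl() does the name filter. The Name column and all of the values are assumptions for illustration.

```r
# Made-up stand-in for the assignment dataset (Name column is assumed).
students <- data.frame(
  Name = c("Iris", "Bob", "Lina", "Omar"),
  Sex = c("Female", "Male", "Female", "Male"),
  Grade = c(90, 80, 70, 60),
  stringsAsFactors = FALSE
)

# Attach each group's mean grade to every row, like ddply(..., transform, ...)
students$Grade.Average <- ave(students$Grade, students$Sex)

# One mean per group, instead of separate mean_male / mean_female lines
group_means <- tapply(students$Grade, students$Sex, mean)

# [iI] matches either an upper or lower case i anywhere in the name
i_students <- subset(students, grepl("[iI]", Name))

print(students)
print(group_means)
print(i_students)
```

The tapply() call is one way to get both group means in a single step; grepl(..., ignore.case = TRUE) would be an alternative to the [iI] character class.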

Module # 7 R Object: S3 vs. S4 assignment

How do you tell what OO system (S3 vs. S4) an object is associated with?
You can check an object's class using class(object_name), and isS4(object_name) tells you directly whether it is S4. If an object has a formal class definition then it is S4. Most base R objects are S3.

How do you determine the base type (like integer or list) of an object?
You can use typeof() to get the base type of an object. For instance, class(mtcars) gives "data.frame", but typeof(mtcars) gives the base type "list".

What is a generic function?
A generic function is a function that behaves differently depending on the class of its argument. R chooses which method to run for you in a process called method dispatch. methods() lists the methods available for a generic. Some examples of generics: print(), summary(), plot(), mean(), head(), tail(), str().

What are the main differences between S3 and S4?
S3 is a simple and informa...
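The checks described above can be tried out directly. This is a small sketch: mtcars ships with R, while the Parcel class and the describe() generic are hypothetical names invented for the example.

```r
library(methods)  # normally attached already; listed so the script is self-contained

# An S3 object: class and base type differ
s3_class <- class(mtcars)    # "data.frame"
s3_type <- typeof(mtcars)    # "list"
s3_is_s4 <- isS4(mtcars)     # FALSE

# An S4 object needs a formal class definition (hypothetical "Parcel" class)
setClass("Parcel", representation(zone = "character", weight = "numeric"))
p <- new("Parcel", zone = "5", weight = 2.5)
s4_is_s4 <- isS4(p)          # TRUE

# A hypothetical S3 generic: UseMethod() picks the method by class
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "some object"
describe.data.frame <- function(x, ...) paste("data frame with", nrow(x), "rows")

described_df <- describe(mtcars)   # dispatches to describe.data.frame
described_int <- describe(1:3)     # falls back to describe.default
print(c(described_df, described_int))
```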

Module # 6 Doing math in R part 2

This week was more on matrices in R. For step one we had to make 2 matrices and then add and subtract them.

    A = matrix(c(2,0,1,3), ncol=2)
    B = matrix(c(5,2,4,-1), ncol=2)
    A + B
    A - B

Next I was instructed to make a "matrix of size 4 with the following values in the diagonal 4,1,2,3."

    matrix1 <- diag(c(4, 1, 2, 3))

Finally I was asked to make a very specific matrix using diag():

    M <- diag(3, 5)
    M[1, ] <- c(3, 1, 1, 1, 1)
    M[2:5, 1] <- 2

Module # 5 Doing Math

This week we are looking at matrices and doing some math with them. To start with we have A, which is a 10x10 matrix, and B, which is a 100x10 matrix (not shown). We were instructed to find the inverse of each. Using solve(A) fails because the determinant is 0: "Error: system is computationally singular". B fails because it is not square. There were other instructions, like performing multiplication with the matrices. For example, multiplying the two matrices together requires transposing B so it is 10x100, because the inner dimensions need to match. The first few rows of that math are here; the resulting matrix is also 10x100.
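Because A and B are not shown, the sketch below builds stand-in matrices with the same shapes. The values are an assumption, chosen only so that A is singular. tryCatch() is used so the script keeps running after each failed solve().

```r
# Stand-in matrices with the shapes described above (values are assumed).
A <- matrix(1:100, nrow = 10)     # 10 x 10, columns are linearly dependent
B <- matrix(1:1000, nrow = 100)   # 100 x 10, not square

# Capture the errors instead of stopping the script
inv_A <- tryCatch(solve(A), error = function(e) e)
inv_B <- tryCatch(solve(B), error = function(e) e)

print(conditionMessage(inv_A))    # singular system
print(conditionMessage(inv_B))    # not square

# A is 10 x 10 and t(B) is 10 x 100, so the inner dimensions match
product <- A %*% t(B)
print(dim(product))
```

The exact wording of the error from solve(A) depends on the matrix and the R build, so it is printed rather than hard-coded here.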

Module # 4 Programming structure assignment

This week was about taking raw data and putting it into arrays, converting high and low to numerical values so they can be represented in graphs, and then making a boxplot and histogram to look at the information. I did my side-by-side boxplot on blood pressure and the "head of the emergency unit's decision regarding immediate care for the patient". This allows us to see how blood pressure relates to the decision to admit a patient. As this shows, patients with a blood pressure under 100 were admitted infrequently, and those with blood pressure over 100 were all admitted. I also made a histogram that shows the blood pressure frequency. This allows us to take a look at how many patients were at each blood pressure. This histogram is right skewed with most patients having blood pressure under 150. As there are extremes outside the norm it may skew the data. Each of those with blood pressure over 150 may be the extreme cases and removing these migh...

Week 3

This week is about comparing election poll results. To start with, arranging the 3 variables into a data frame provides us with this.

         Name ABC_poll_results CBS_poll_results
    1     Jeb                4               12
    2  Donald               62               75
    3     Ted               51               43
    4   Marco               21               19
    5   Carly                2                1
    6 Hillary               14               21
    7  Berine               15               19

We can look at the differences between the two polls.

         Name ABC_poll_results CBS_poll_results Difference
    1     Jeb                4               12         -8
    2  Donald               62               75        -13
    3     Ted               51               43          8
    4   Marco               21               19          2
    5   Carly                2                1          1
    6 Hillary               14               21         -7
    7  Berine               15               19         -4

We can also clearly ...
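The two tables above can be rebuilt from the values shown. This is a sketch of one way to do it: the column names come from the printed output, while the data frame name polls is an assumption.

```r
# Values copied from the tables above
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Berine")
ABC_poll_results <- c(4, 62, 51, 21, 2, 14, 15)
CBS_poll_results <- c(12, 75, 43, 19, 1, 21, 19)

# "polls" is a hypothetical name for the combined data frame
polls <- data.frame(Name, ABC_poll_results, CBS_poll_results,
                    stringsAsFactors = FALSE)

# Positive means ABC reported the higher number, negative means CBS did
polls$Difference <- polls$ABC_poll_results - polls$CBS_poll_results
print(polls)
```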

Week 2

    assignment2 <- c(16, 18, 14, 22, 27, 17, 19, 17, 17, 22, 20, 22)
    myMean <- function(assignment2) {
      return(sum(assignment) / length(someData))
    }
    myMean(assignment2)

This returns an error because there are 2 erroneous variable names, assignment and someData. A corrected program replaces both with the proper name:

    myMean <- function(assignment2) {
      return(sum(assignment2) / length(assignment2))
    }
    myMean(assignment2)

This refers to the correct variable and removes the error. The call now returns 19.25.

Week 1

I had no issues installing R because I have had it installed for quite a while. Vectors in R are collections of elements, all of which must be the same type. They let you perform calculations on an entire set of values at once and apply statistical functions to them directly. They are also easier to read at a glance.
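A few lines are enough to see both points about vectors in practice. The values below are made up for the example.

```r
# A numeric vector (made-up values)
scores <- c(16, 18, 14, 22)

# Arithmetic applies to every element at once, no loop needed
doubled <- scores * 2

# Statistical functions take the whole vector directly
avg <- mean(scores)

# Mixing types forces everything to one common type, here character
mixed <- c(1, "a", TRUE)

print(doubled)
print(avg)
print(class(mixed))
```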