Lowest common denominator format. I don't have to assume any technical knowledge on your part to know you can use a CSV; anybody who has ever touched a default spreadsheet tool can open one. I prefer HDF5 myself for larger dataframes, but I'm used to interfacing with all kinds of people, especially open data folks, and I've learned something pretty quickly: don't assume people know anything about technology. Always lean toward the format that is accessible. I expect I'll have to move on from HDF5 if I want to learn how to deal with more distributed-computing-type problems. Thanks bruv, let me know if you still want that CSV.
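For what it's worth, the trade-off reads like this in pandas terms. A minimal sketch with a made-up toy frame; the binary-format calls are commented out because they pull in extra libraries (PyTables, pyarrow):

```python
import io
import pandas as pd

df = pd.DataFrame({"immigr": [0, 1], "rate": [0.28, 0.52]})

# CSV: universally readable, but everything round-trips as text,
# so dtypes and precision get re-inferred on the way back in.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
back = pd.read_csv(buf)
print(back.equals(df))

# Binary formats keep dtypes and load faster, but need extra libraries:
#   df.to_hdf("data.h5", key="df")     # requires PyTables
#   df.to_feather("data.feather")      # requires pyarrow
```

For this tiny frame the CSV round trip happens to come back identical; with messier real data (dates, categoricals, NA codes) that stops being true, which is the whole argument for a binary format once everyone involved can read it.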
Once again, yes I would like to see that data. I'd also like to see that link to your lawyer's website you talked about in the other thread. For someone teaching the basics of "data" science, you seem to have an aversion to data itself.
Email me through the board; let's set up a 10-minute phone call if you're so interested in knowing about me and data science. My aversion to redoing work other people have already done is kinda rule #1 of programming. But here: http://www.kauffman.org/microsites/...-index-of-entrepreneurial-activity-data-files

KISA Data year X to 2014 is what you want. I'm sure you know enough to get that into different file formats if needed. You will need the Codebook (in the link I attached) to understand the column labels. You are probably most interested in immigr (a simple binary categorical variable) and natvty for more granular data throughout the different years, though natvty is a numbered column that corresponds to country codes you may want to store in a dictionary.

You will probably want to stitch together the CSVs, convert them to a different file format for performance reasons (hey, Feather could work), and then use Pandas to slice and dice through the years to get the growth figures for native vs. non-native entrepreneurship. I suggest reading them into Pandas and wrangling them there before optimizing for performance; it should be fairly simple, as all of the datasets have the same number of features, and you can easily categorize which timeframe a value comes from by appending a dummy combined year-month column to each CSV, or by referring to the year columns if you don't want to go into month-by-month detail. You will want to graph the change in ent015u against demographic factors to do your own in-depth analysis of the contested figure. You will not, I repeat, NOT want to use any spreadsheet tool, as each individual CSV is quite large. You may also want to check the CPS for cross-reference, and to see what person_ids correspond to; it may contain qualitative detail that the categories here don't capture.

My immigration attorney: http://gordonlawgrouppc.com/team/gali-schaham-gordon/
Don't contact her unsolicited unless you're willing to sink in 1-1.5k. Thanks.
Whatever you use, be aware of performance and memory issues. They're not that large but stitching everything together won't be totally trivial unless you're doing it programmatically.
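The stitching-plus-year-month-column step can be sketched in pandas. A minimal sketch assuming every yearly CSV has the same columns; the tiny in-memory frames and the country-code subset below are stand-ins for the real files and the Codebook's Appendix 4 table:

```python
import pandas as pd

# Hypothetical stand-ins for the yearly Kauffman CSVs; in practice you would
# use pd.read_csv("k.2013.csv") etc. -- same columns in every file.
frames = {
    2013: pd.DataFrame({"month": [1, 7], "immigr": [1, 0], "ent015u": [1, 0]}),
    2014: pd.DataFrame({"month": [2, 9], "immigr": [0, 1], "ent015u": [0, 1]}),
}

# Stitch the per-year frames together, tagging each row with its source year,
# then assemble the combined year-month column from the year/month parts.
combined = pd.concat(
    [df.assign(year=year) for year, df in frames.items()],
    ignore_index=True,
)
combined["yearmonth"] = pd.to_datetime(combined[["year", "month"]].assign(day=1))

# natvty-style numeric codes map to labels via a plain dict lookup
# (made-up subset of codes for illustration):
country_codes = {57: "United States", 303: "Mexico"}
# combined["natvty_label"] = combined["natvty"].map(country_codes)

# Share of ent015u == 1 by immigrant status:
rates = combined.groupby("immigr")["ent015u"].mean()
print(rates)
```

From there, swapping `read_csv` for `read_feather` (or writing the stitched frame out once with `to_feather`) is the performance upgrade mentioned above.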
Cohete! Apparently I have to repeat myself. Straight from the horse's mouth, the Codebook itself:

immigr: immigrant
natvty: country of birth (see Appendix 4 for codes)

You even have, for s**ts and giggles:

spneth: Spanish ethnicity

Here is our original discussion: "I've provided you data to show that immigrants found businesses at twice the rate of natives. This is a discussion independent of immigration status. You like to facet your data apparently; well, don't let changing the goalposts do anything for you."

Re: wages
http://www.cato.org/blog/immigrations-real-impact-wages-employment
David Card (2012), COMMENT: THE ELUSIVE SEARCH FOR NEGATIVE WAGE IMPACTS OF IMMIGRATION
http://davidcard.berkeley.edu/papers/jeea2012.pdf
Look to my edited post. You've moved the goalposts, and you know it. I've provided you data to show that immigrants found businesses at twice the rate of natives. This is a discussion independent of immigration status. You like to facet your data apparently; well, don't let changing the goalposts do anything for you. It's okay just to say "I'm lazy, and changing the goalposts is what I'm currently doing." I'd like to see your "incomplete code". Please share it in a GitHub repo, or format it here if you have privacy issues.
As for changing the goalposts, you don't even like thinking about your data too deeply, as it turns out. I guess the faceting perspective only comes out of things spoonfed to you.

http://fivethirtyeight.com/datalab/how-do-we-know-how-many-undocumented-immigrants-there-are/
http://www.pewhispanic.org/2008/10/02/appendix-a-methodology-2/

For 1820-2012, you have data on legal permanent residents from the DHS. You can extrapolate a reasonable confidence interval for illegal immigrants if that's your thing.
https://www.dhs.gov/publication/yearbook-immigration-statistics-2012-legal-permanent-residents

Have fun!
"Among Hispanics, immigrants were about twice as likely as those born in the U.S. to be self-employed, 11% to 6%. Almost one-in-five (17%) white immigrants were self-employed in 2014, compared with 11% of whites who were U.S. born."
http://www.pewsocialtrends.org/2015/10/22/immigrants-contributions-to-job-creation/
Have fun.
I haven't moved any goalposts. Here is the primer on the code and such:

Getting started:
Spoiler
Code:
# To download R, go to the following link
# and choose a mirror server closest to your location:
# https://cran.r-project.org/mirrors.html
# Then download the appropriate version for your operating system
#
# if you want to learn more about the packages available at CRAN,
# start with task views:
# https://cran.r-project.org/web/views/
#
# if you want to read about a package's functionality, try for example:
# vignette(package = "rvest")
#
# browse the list of topics and select one
# vignette(package = "rvest", topic = "selectorgadget")
#
# to run R, download RStudio for your operating system at the following link:
# https://www.rstudio.com/products/rstudio/download/

About memory:
Spoiler
Code:
# R loads everything into virtual memory.
# Use a system program to monitor your system memory usage.
# Use the following functions to monitor your R memory (i.e. lsos()).
# Found at: http://stackoverflow.com/questions/1358003/tricks-to-manage-the-available-memory-in-an-r-session
#
# improved list of objects
.ls.objects <- function(pos = 1, pattern, order.by, decreasing = FALSE,
                        head = FALSE, n = 5) {
  napply <- function(names, fn) sapply(names, function(x) fn(get(x, pos = pos)))
  names <- ls(pos = pos, pattern = pattern)
  obj.class <- napply(names, function(x) as.character(class(x))[1])
  obj.mode <- napply(names, mode)
  obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
  obj.size <- napply(names, object.size)
  obj.dim <- t(napply(names, function(x) as.numeric(dim(x))[1:2]))
  vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
  obj.dim[vec, 1] <- napply(names, length)[vec]
  out <- data.frame(obj.type, obj.size, obj.dim)
  names(out) <- c("Type", "Size", "Rows", "Columns")
  if (!missing(order.by))
    out <- out[order(out[[order.by]], decreasing = decreasing), ]
  if (head)
    out <- head(out, n)
  out
}
# shorthand
lsos <- function(..., n = 10) {
  .ls.objects(..., order.by = "Size", decreasing = TRUE, head = TRUE, n = n)
}

The code I will share for now will download and read the CSVs into the program. It will keep only a dozen or so of the columns. The total amount of memory used when I ran this script was about 1.9 gigabytes.

Spoiler
Code:
# Load Dependencies -------------------------------------------------------
install.packages(c("rvest", "plyr", "dplyr")) # install first
library(rvest)
library(plyr)  ## for optional ldply()
library(dplyr)

# Create, read, and parse url for the unique hrefs ------------------------
url <- "http://www.kauffman.org/microsites/kauffman-index/about/archive/kauffman-index-of-entrepreneurial-activity-data-files"
hrefs <- url %>%
  read_html() %>%
  html_nodes("a") %>%
  html_attr("href")
hrefs.csv <- hrefs[grep("csv", hrefs)] %>% unique()

# Create file names to save downloads -------------------------------------
d.file <- paste0("k.", 2014:1996, ".csv")

# Download the csv files in hrefs.csv -------------------------------------
# these files will be given the names in d.file
# files will download to your working directory
# getwd() is your working directory; setwd() sets your working directory
d.status <- mapply(FUN = download.file, url = hrefs.csv, destfile = d.file)
# sum(d.status) should equal 0

# Read all of the downloaded files ----------------------------------------
# if you want to UNION all the data frames in the list,
# delete the "#" in "# %>% ldply()" and ignore names(k.data)
k.data <- lapply(d.file, function(x) {
  # the quick and dirty approach to finding and declaring column data types
  classes <- readLines(x, n = 2L) %>%
    textConnection() %>%
    read.csv(stringsAsFactors = FALSE) %>%
    sapply(class)
  classes[classes == "logical"] <- "numeric"
  classes[classes == "integer"] <- "numeric"
  x.file <- read.csv(x, colClasses = classes) %>%
    tbl_df() %>%
    select(month, year, immigr, age, ent015u, ent015ua, faminc, grdatn,
           hours, state, class, mlr, indmaj2, class_t1, mlr_t1,
           indmaj2_t1, pid)
}) # %>% ldply()
names(k.data) <- 2014:1996 # add names to the list, or IGNORE if using ldply()

Toodles.
What the f**k are you talking about? Yeah, you referred to H1B holders in a totally different post than the one that prompted your whole demand for data.
You went to this much effort to show you could read data and didn't even start wrangling it beyond selectively choosing a few columns and setting up the dataframe? You left so many random comments behind that I have to conclude this is from some tutorial. It would have taken you a few more steps just to filter through with dplyr. A few problems here:

1) Why did you choose these particular columns? I didn't see you choose natvty (and are you going to be doing your analysis year-by-year or month-by-month?). Curious as to your thinking behind why family income matters in this debate.

2) You remind me of why I hate R syntax and generally try to avoid it, but even a cursory reading of this (and maybe why the file size is so large) shows me that you're ingesting a lot of CSVs for no reason! If you parse all of the CSV links, you'll also end up with:
National Components Data 2015
National Demographic Components Data 2015
State Components Data 2015
Metro Area Components Data 2015
All Geographies Components Data 2015
And a blowjob of a mess when it comes to wrangling. Does rvest not have filtering/parsing options like requests and beautifulsoup? o_0

3) And I guess the major problem, for somebody who likes aggregating and faceting data: why did you stop at importing them into memory, copy+pasting two blocks of core R documentation (including, of all things, installation instructions?!), and then calling it a day? You'd honestly be closer to actionable insights with a few more lines of code than the time it took you to copy + paste random documentation. Unless your tbl_df is all f**ked up, which, judging by how it's been imported, I would guess is the case. I mean, I don't even mess with R, but damn dude, this is pretty lazy.
I don't know how R deals with time series analysis, I imagine there must be some really good libraries out there somewhere, but Pandas is ace for that s**t. In case you wanted to facet by time periods (you do.) http://pandas.pydata.org/pandas-docs/stable/timeseries.html
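The time-period faceting I mean looks roughly like this in pandas. A minimal sketch with a made-up monthly series standing in for ent015u, not the Kauffman files themselves:

```python
import pandas as pd

# Made-up monthly entrepreneurship indicator, standing in for ent015u.
idx = pd.date_range("2013-01-01", periods=24, freq="MS")  # month starts
ts = pd.Series(range(24), index=idx, name="ent015u")

# Facet by year: group the monthly values by calendar year.
annual = ts.groupby(ts.index.year).mean()  # 2013 -> 5.5, 2014 -> 17.5
print(annual)

# Or facet by arbitrary periods via a PeriodIndex:
by_quarter = ts.groupby(ts.index.to_period("Q")).sum()
```

With a DatetimeIndex in place, `resample` gives you the same annual/quarterly rollups plus offsets and interpolation, which is the part of that timeseries doc page worth reading first.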
I've brought up H1B visas in this thread (see above quote), and even corrected you on Sergey Brin's immigration status. This is the code I am sharing for now - as I said above. I provided information to get you started in R. If you, or any other CF member, should need further assistance, please let me know.
As far as parsing options for rvest: it uses xml2 for parsing and sits on httr for requests. You could use httr to send individual requests yourself, but I think much of that is beyond the scope of this thread. rvest and download.file should be enough to download the data. Feel free to show whatever alternate code you may have.
I'm aware of that. I'm telling you your columns and your rows are probably screwed up as a function of how you designed your tbl_df and how you combined different CSVs. I am not questioning your use of tbl_df given you're dealing with a larger though not unpleasantly large data set, I am questioning your particular tbl_df and the logic of how you decided to wrangle data.
Or you can not use R, which, until Hadley Wickham came along, utterly sucked balls at taking information from the web. No, you don't have to tell me which libraries in R are using one another; that's not the problem I'm bringing up. Or you could even have manually stored things instead of scraping them together, if you couldn't do it properly in R. Jesus. One of these weekends, if I have the time, I'll do the whole ******* thing with requests + beautifulsoup and it'll take a hell of a lot less time and documentation to get through it all. Of course, this is your data that you asked for to "facet and aggregate". Is this what is taught in the States? Maybe that's why H1B visas are so in demand.
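The requests + beautifulsoup version I have in mind is just "fetch, find the anchor tags, keep only the hrefs you actually want". A portable sketch of the filtering step, using only the stdlib parser and an inline HTML snippet in place of the live archive page (the file names and the "kiea" pattern are hypothetical):

```python
from html.parser import HTMLParser

# With requests + BeautifulSoup this would be roughly:
#   soup = BeautifulSoup(requests.get(url).text, "html.parser")
#   links = [a["href"] for a in soup.find_all("a", href=True)
#            if a["href"].endswith(".csv") and "kiea" in a["href"]]
# The point is to filter while collecting, instead of grabbing every href.

class CsvLinkFilter(HTMLParser):
    """Collect hrefs that look like the index data files and nothing else."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href", "")
        if href.endswith(".csv") and "kiea" in href:
            self.links.append(href)

# Inline stand-in for the archive page (hypothetical file names):
page = """
<a href="/files/kiea_2014.csv">KIEA 2014</a>
<a href="/files/components_2015.csv">State Components Data 2015</a>
<a href="/about">About</a>
"""

parser = CsvLinkFilter()
parser.feed(page)
print(parser.links)  # only the KIEA file survives the filter
```

Filtering at collection time is exactly what avoids dragging in the Components files complained about above; the same predicate works whether the hrefs come from BeautifulSoup, rvest's html_attr output, or this stdlib parser.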