you could've just copy + pasted https://cran.r-project.org/ in case any of us wanted to get a horribly malformed dataframe. The data I provided was to address what you were trying to refute, not absolutely everything you have ever said in your life. Jesus Christ. Does it say legal immigrants? Does it say immigrants of a certain status? No, it says immigrants. You're adding conditions now because apparently you can't proceed with immigration status, but you're going to randomly include randomass variables like family income in some grabass horrorshow of irrelevancy. I even gave you a method to derive immigration status that you refused to take. And can you point to when I said anything about Sergey Brin's immigration status? Specifically that he is an H-1B holder? I see you talking about the co-founder of Zenefits, and Yann LeCun, which is neither Google nor Sergey Brin last time I checked. Sergey would probably have been on a student visa class given his thesis work at Stanford. It would be out of character for me to claim a specific visa class for him, but surprise me if you can. All I see is you bringing up Sergey for no ******* reason.
I don't see anything wrong with it, other than that I could have stopped at tbl_df and written over x.file with the select() function. You can take it out, use as.tbl or some other wrapper, or you could opt to not use a wrapper and deal with the data frame itself. I like the way dplyr's wrappers don't print all columns when head() is called. The code I shared will download all the relevant CSV files within the link you gave; it will also "store" the links in the hrefs.csv vector. httr::GET() is another package/function you can use to download a file, if you so choose. Personally I like R: indentation does not change the nature of the code.
I, again, don't have any problem with the file format itself, and I understand why you'd want to treat your dataframe the way you do. Yes, when you have more data than needed, it's nice to call head() and see the first few values alone. I am trying to question the contents of the dataframe and how it is currently organized. For example: since you didn't do a UNION, how are you dealing with the unique IDs per year? Why no groupby() or whatever the f**k the R equivalent is? Why 12 columns (what extra value do some of these columns bring)? I also had a strong suspicion your parsing was off across the different CSVs you've read into memory, but that has lessened now that I look more carefully. Just so we're clear, the bolded is what I was disputing. I think now, though, that you're reading a compendium of all links into a CSV, then using that CSV as your filtered input vector for your tbl_df, which is just a strange thing for somebody used to storing values as dicts, parsing them on the spot with a dictionary comprehension, then grabbing files directly through requests and the csv module. All this to say: BOO R. It's a strange thing to me that you hate Python for indentation reasons -- then you love R for having the strangest syntax known to man??? I almost got a much more efficient structure, but you made me think in regex, and now I gotta run. Still, this below = all of the CSV vector bulls**t you have to do with R.
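For concreteness, the dict-comprehension pattern I keep talking about looks roughly like this. The HTML snippet and filenames here are made up for the sketch (not the real Kauffman page), and I'm using a bare regex instead of BeautifulSoup just to keep it self-contained:

```python
import re

# Stand-in for the page HTML; the real thing would come from requests.get(...).text
html = """
<a href="/files/kiea_2015.csv">2015</a>
<a href="/files/kiea_2016.csv">2016</a>
<a href="/about">about</a>
"""

# Parse on the spot with a dict comprehension: {label: href}, CSV links only.
link_pattern = re.compile(r'<a href="([^"]+\.csv)">([^<]+)</a>')
csv_links = {label: href for href, label in link_pattern.findall(html)}

print(csv_links)

# From here you'd grab each file directly, no intermediate CSV of links:
# for label, href in csv_links.items():
#     resp = requests.get("http://www.kauffman.org" + href)
#     ...write resp.content out, or hand it to the csv module...
```

No writing links out to disk and reading them back in; the dict *is* the filtered link store.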
```python
# libraries to import
import requests
from bs4 import BeautifulSoup
import re

# get all of the s**t
r = requests.get("http://www.kauffman.org/microsites/kauffman-index/about/archive/kauffman-index-of-entrepreneurial-activity-data-files")
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.prettify())

# parse all of the s**t
links = soup.find_all(href=re.compile(r"\.csv$"))
for link in links:
    print(link.get('href'))
linklist = list(links)
print(linklist)
```

(I don't like sets; better list slicing notation.) Just ran that quickly in between another sprint item. If you run it in a Python notebook, you should be fine; I just need to fiddle with the regex to get it more precise (f**king regex.) and I'm halfway to what you've done.
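The other half of the loop, the part I haven't written yet, is resolving the scraped hrefs against the page URL and fetching them. A minimal sketch of that, assuming made-up relative hrefs for illustration; absolute_csv_links is my own helper name, not anything from requests or bs4:

```python
from urllib.parse import urljoin

BASE = "http://www.kauffman.org/microsites/kauffman-index/about/archive/kauffman-index-of-entrepreneurial-activity-data-files"

def absolute_csv_links(base, hrefs):
    """Resolve relative hrefs against the page URL and keep only .csv targets."""
    return [urljoin(base, h) for h in hrefs if h.lower().endswith(".csv")]

# Made-up hrefs standing in for what the link-scraping loop returns:
hrefs = ["/files/kiea_2015.csv", "../archive/kiea_2016.CSV", "/about"]
urls = absolute_csv_links(BASE, hrefs)
print(urls)

# Downloading is then just a loop (not run here):
# for url in urls:
#     resp = requests.get(url)
#     with open(url.rsplit("/", 1)[-1], "wb") as f:
#         f.write(resp.content)
```

urljoin handles both root-relative and ../-style hrefs, which is exactly the part a bare string concat gets wrong.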
Meanwhile, there is real crime connected to immigration... ICE lawyer in Seattle charged with stealing immigrant IDs https://www.seattletimes.com/seattl...seattle-charged-with-stealing-immigrants-ids/