Tuesday, January 31, 2017

Week 2 - coding: fix one error, create two more

My goals for this past week were:
  • Create a function that reads in the new dataset but creates the same dictionary as last semester
  • Analyze the variables within two of the files 

There have been some difficulties with the function I am trying to write because, instead of pulling data from just one file, I now have to pull it from multiple files. I have to match up the Project Entry ID numbers across these files to extract the dates and the other correct information and store them in a dictionary. This is a work in progress, and my goal is to finish the function by next week.
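
Here is a rough sketch of the matching step, not the finished function. It assumes the files are comma-separated with a header row, and the column names (ProjectEntryID, PersonalID, EntryDate, ExitDate) are placeholders I made up for illustration. The two small strings stand in for two of the real files.

```python
import csv
import io


def index_by_entry_id(rows, id_field="ProjectEntryID"):
    """Map each Project Entry ID to its row (the last row wins on duplicates)."""
    return {row[id_field]: row for row in rows}


def merge_by_entry_id(entry_rows, exit_rows, id_field="ProjectEntryID"):
    """Combine rows from two files that share the same Project Entry ID."""
    exits = index_by_entry_id(exit_rows, id_field)
    merged = {}
    for row in entry_rows:
        entry_id = row[id_field]
        # An entry with no matching exit row gets an empty dict here.
        exit_row = exits.get(entry_id, {})
        merged[entry_id] = {
            "personal_id": row.get("PersonalID"),
            "start_date": row.get("EntryDate"),
            "end_date": exit_row.get("ExitDate"),  # None when no exit is found
        }
    return merged


# Hypothetical in-memory stand-ins for two of the files.
ENTRY_CSV = "ProjectEntryID,PersonalID,EntryDate\n1,A,2016-01-05\n2,B,2016-02-10\n"
EXIT_CSV = "ProjectEntryID,ExitDate\n1,2016-03-01\n"

entries = list(csv.DictReader(io.StringIO(ENTRY_CSV)))
exits = list(csv.DictReader(io.StringIO(EXIT_CSV)))
records = merge_by_entry_id(entries, exits)
```

Building the lookup dictionary once means each entry row finds its match without rescanning the second file, which matters when the largest file has over 310,000 lines.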


Tuesday, January 24, 2017

Week 1 - Cleaning the data and creating a file of dictionaries

This week was spent looking through the large dataset that we were given by CARES, Inc. in Albany, NY. There are 12 different files, with the largest containing over 310,000 lines (each line represents an individual's entry) and the smallest containing only 2 lines. These files contain information that individuals staying in homelessness programs have entered into HMIS's system. We created a Google spreadsheet to explain each of the fields in each file. This will take some time, since every field has to be looked up in the two different manuals that HMIS has provided. My goal is to document the fields from two files per week.

Last semester we were given a sample dataset, from which we put the following variables into a dictionary: start date of program, end date of program, length of stay, personal ID, and project type (Emergency Shelter, Rapid Re-housing, Street Outreach, etc.). With the new dataset, we need to put these same variables into a new dictionary. The plan is to write a general function that can read in any of the 12 files and return a list of dictionaries.
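
To make the idea concrete, here is a minimal sketch of such a general reader, assuming each file is delimited text with a header row and that dates are stored as YYYY-MM-DD. The function names, the column names, and the sample rows are all invented for illustration, not taken from the real files.

```python
import csv
import io
from datetime import date


def read_records(source, delimiter=","):
    """Return one dictionary per data line, keyed by the header row."""
    return [dict(row) for row in csv.DictReader(source, delimiter=delimiter)]


def length_of_stay(start, end):
    """Days between two YYYY-MM-DD dates, or None if either is missing."""
    if not start or not end:
        return None
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


# Hypothetical sample standing in for one file; the second row has no exit date.
SAMPLE = (
    "PersonalID,ProjectType,EntryDate,ExitDate\n"
    "A,Emergency Shelter,2016-01-05,2016-01-15\n"
    "B,Street Outreach,2016-02-10,\n"
)

records = read_records(io.StringIO(SAMPLE))
for record in records:
    record["LengthOfStay"] = length_of_stay(record["EntryDate"], record["ExitDate"])
```

Because read_records takes any open file object, the same call should work for a real file opened with open(path, newline=""), where path is whichever of the 12 files is being read.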

My goals for this past week:

  • To analyze the dataset and familiarize myself with the new data
  • Write the existing dictionary to a file and read it in using Python
  • Determine which process is faster: (1) reading in the text file and creating a dictionary, or (2) reading in a file with the existing dictionaries

Results: I started to analyze the dataset and have finished the headings for the first file. Reading in a file with the existing dictionaries is much quicker (~35x faster!)
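
For anyone curious how a comparison like this can be set up, below is a small sketch. It is not the script I used and it will not reproduce the ~35x figure. It assumes the saved dictionaries are stored with Python's pickle module (one possible choice), uses made-up data, and keeps everything in memory instead of on disk so it runs anywhere.

```python
import csv
import io
import pickle
import time


def build_from_text(text):
    """Option 1: parse the raw text and build the dictionaries from scratch."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def time_call(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Hypothetical data: 1,000 identical-looking entries with made-up column names.
text = "PersonalID,EntryDate\n" + "".join(f"{i},2016-01-05\n" for i in range(1000))

records, parse_seconds = time_call(build_from_text, text)

# Option 2: load dictionaries that were already built and saved.
# With real files this would be pickle.dump(...) and pickle.load(...) on a file.
saved = pickle.dumps(records)
reloaded, load_seconds = time_call(pickle.loads, saved)

print(f"parse: {parse_seconds:.6f}s, reload: {load_seconds:.6f}s")
```

The actual ratio depends on the file size, the storage format, and how much work goes into building each dictionary, so the numbers printed here are only a way to check the method.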

My goals for the following week:
  • Create a function that reads in the new dataset but creates the same dictionary as last semester 
  • Analyze the variables within two of the files 

Wednesday, January 18, 2017

Summary of the Project

An organization called CARES has been collecting personal, housing, and service information from homeless or potentially homeless individuals in New York, and we are going to analyze this data using Python. By analyzing and visualizing the data, we hope to observe trends that reveal specific characteristics of individuals that could lead to chronic homelessness and instability. The tool we are striving to create will identify these characteristics as red flags and ensure that the individuals who need extra help actually get the attention and resources they need. This could lead to a more efficient and successful way to match individuals with the program that will best suit them, and ultimately reduce homelessness in New York State.