Wednesday, March 8, 2017

Make a file with all dictionaries or go into datafile to make a dictionary?

We are timing the difference between two approaches: 1. making a file with the dictionaries, or 2. going into the master datafile(s) and building the appropriate dictionaries each time. This will help us determine which way we should approach the analysis.

I wrote a function that takes a certain age and returns all the individuals who are younger or older than that value, depending on which the user wants. The two different ways to get these individuals are going to be analyzed. So far the time it takes to build the master dictionary file is >30 minutes, but we would only have to do this when there is a data update we want to analyze or when we want a different subset of the data. Stay tuned for the actual timing!
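Here is a minimal sketch of what such an age filter could look like. The function name, the "age" and "personal_id" keys, and the sample individuals are all made up for illustration and are not the project's actual code or data.

```python
def filter_by_age(people, age, direction="less"):
    """Return the individuals whose age is less than or greater than `age`.

    `people` is assumed to be a list of dictionaries with an "age" key
    (a hypothetical layout chosen for this sketch).
    """
    if direction == "less":
        return [person for person in people if person["age"] < age]
    if direction == "greater":
        return [person for person in people if person["age"] > age]
    raise ValueError("direction must be 'less' or 'greater'")


# Made-up individuals, for illustration only.
people = [
    {"personal_id": "A1", "age": 17},
    {"personal_id": "B2", "age": 34},
    {"personal_id": "C3", "age": 62},
]

younger = filter_by_age(people, 18, "less")
older = filter_by_age(people, 18, "greater")
print([person["personal_id"] for person in younger])
print([person["personal_id"] for person in older])
```

An individual whose age equals the cutoff is excluded from both groups here; whether that is the right behavior depends on the question being asked.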

Wednesday, February 15, 2017

Week 4- Got the plotly plots working and met with CARES!

I met all of my goals from last week! As a reminder my goals were to:

  • Create a function that will determine what type of Project the ProjectID matches up with in the Project.csv file.
  • Get the plotly working with the plots so that the mouse-over feature works on the new plots.
  • Create a function called "pretty print" that can print out an individual's dictionary in a legible manner.

Here is an example plot from plotly!!

This plotly plot shows the number of times each individual was entered into the system. The y-axis shows each individual and the x-axis is the time frame they were in the program. The colors represent different programs. There is a mouse-over feature that tells you additional information, but it cannot be seen on this static plot. The following plot shows the matplotlib plot of the same data:



These are from two different Python plotting packages.

Additionally, Dr. Bellis and I presented some of this preliminary work to Maureen and Terry from CARES and Ruth from ACE. It was a good meeting to show them our plots, get some feedback on how we should be moving forward, and ensure that we weren't replicating anything they had already seen. Something that was very exciting is that they haven't seen their data visualized like this before and were very interested in these example plots. It was even referred to as "magic" :D

Some goals for when we get back from "spring" break:
  • Add more information to individuals' dictionaries (like age, disabilities?)
  • Go through the spreadsheets to reference the HMIS data fields and update more of the variables



Wednesday, February 8, 2017

Week 3: Time series plot works!

The goal for the past week was: to create a function that reads in the new dataset but that creates the same dictionary as last semester. I finally got this working! Wellllll, even though it is working, it is not the same as the function last semester. I still need to write another function that will match up the Projects and determine which type of project it is, whether it is an "Emergency Shelter" or "Rapid Re-Housing", etc. This way when we plot the time series it will be easy to see the differences between each type of project an individual is in.

So, goals for this week:

  • Create a function that will determine what type of Project the ProjectID matches up with in the Project.csv file.
  • Get the plotly working with the plots so that the mouse-over feature works on the new plots.
  • Create a function called "pretty print" that can print out an individual's dictionary in a legible manner.
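
The first and third goals could be sketched roughly like this. The column names "ProjectID" and "ProjectType", the sample rows, and the helper names are assumptions for illustration; the real Project.csv may be laid out differently (for example, it may store project types as codes rather than names).

```python
import csv
import io

# Stand-in for Project.csv; the columns and rows are assumed for illustration.
SAMPLE_PROJECT_CSV = """ProjectID,ProjectType
101,Emergency Shelter
102,Rapid Re-Housing
"""


def load_project_types(csv_file):
    """Build a lookup from ProjectID to project type from an open CSV file."""
    return {row["ProjectID"]: row["ProjectType"] for row in csv.DictReader(csv_file)}


def project_type_for(project_id, project_types):
    """Return the project type for a ProjectID, or "Unknown" if it is not listed."""
    return project_types.get(str(project_id), "Unknown")


def pretty_print(person):
    """Print one individual's dictionary with one aligned "key: value" pair per line."""
    text = "\n".join(f"{key:<15}: {person[key]}" for key in sorted(person))
    print(text)
    return text


project_types = load_project_types(io.StringIO(SAMPLE_PROJECT_CSV))

# A made-up individual, for illustration only.
person = {
    "personal_id": "A1",
    "project_id": "102",
    "project_type": project_type_for("102", project_types),
}
printed = pretty_print(person)
```

Reading Project.csv into a lookup dictionary once means every later ProjectID check is a single dictionary access instead of another pass through the file.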

Dr. Bellis and I will present our preliminary work to some of the CARES staff and ACE faculty on Wednesday 2/15/17 at 11am. So this presentation also needs to get done by Wednesday of next week.

Tuesday, January 31, 2017

Week 2- coding: fix one error- create two more

My goal for this past week was:
  • Create a function that reads in the new dataset but that creates the same dictionary as last semester 
  • Analyze the variables within two of the files 

There have been some difficulties with the function I am trying to write, because instead of pulling data from just one file, I am now pulling from multiple files. I have to match up all the Project Entry ID numbers across these files to extract the dates and the correct information and store them in a dictionary. This is a work-in-progress, and it is my goal to finish this function by next week.
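
To give an idea of the matching step, here is a rough sketch that joins two small in-memory files on the Project Entry ID. The column names, the date format, and the sample rows are assumptions made for this example, not the actual layout of the CARES files.

```python
import csv
import io
from datetime import datetime

# Stand-ins for two of the data files; columns and rows are assumed for illustration.
ENTRIES = """ProjectEntryID,PersonalID,EntryDate
E1,P1,2016-01-05
E2,P2,2016-02-10
"""
EXITS = """ProjectEntryID,ExitDate
E1,2016-01-20
"""


def parse_date(text):
    """Turn a YYYY-MM-DD string into a date object."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def build_stays(entry_file, exit_file):
    """Match entries to exits by ProjectEntryID and return a list of dictionaries."""
    # Index the exit file once so each entry can be matched with one lookup.
    exit_dates = {row["ProjectEntryID"]: row["ExitDate"] for row in csv.DictReader(exit_file)}
    stays = []
    for row in csv.DictReader(entry_file):
        start = parse_date(row["EntryDate"])
        exit_text = exit_dates.get(row["ProjectEntryID"])
        end = parse_date(exit_text) if exit_text else None  # None: no exit recorded
        stays.append({
            "personal_id": row["PersonalID"],
            "start_date": start,
            "end_date": end,
            "length_of_stay": (end - start).days if end else None,
        })
    return stays


stays = build_stays(io.StringIO(ENTRIES), io.StringIO(EXITS))
for stay in stays:
    print(stay)
```

Entries without a matching exit keep an end date of None in this sketch, so they are not dropped from the results.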


Tuesday, January 24, 2017

Week 1- Cleaning the data and creating a file of dictionaries

This week was filled with looking at the large dataset that we were given from CARES, Inc. in Albany, NY. There are 12 different files, with the largest file containing over 310,000 lines (each line represents an individual's entry) and the smallest containing only 2 lines. These files contain information that individuals staying in homelessness programs have entered into HMIS's system. We created a Google spreadsheet to explain each of the fields that are in each file. This will require some time since each field must be looked up in two different manuals that HMIS has provided. My goal is to document the fields from two files per week.

Last semester we were given a sample dataset, from which we put the following variables into a dictionary: start date of program, end date of program, length of stay, personal ID, and project type (Emergency Shelter, Rapid Re-housing, Street Outreach, etc.). With the new dataset, we need to add these same variables into a new dictionary. This will be a general function that can read in any of the 12 files and return a list of dictionaries.

My goal for this past week: 

  • To analyze the dataset and familiarize myself with the new data
  • Write the existing dictionary to a file and read it in using Python
  • Determine which process is more efficient/faster: 1. Reading in the text file and creating a dictionary or 2. Reading in a file with the existing dictionaries
Results: I started to analyze the dataset and have finished the headings for the first file. Reading in a file with the dictionaries is much quicker (~35x faster!!!!)
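
A comparison like this can be timed along the following lines. This is only a sketch: it assumes the dictionaries are saved with Python's pickle module and uses made-up in-memory data, which may not match how the dictionary file was actually written, and the speedup it reports will differ from the one above.

```python
import csv
import io
import pickle
import time


def build_from_text(text):
    """Option 1: parse the raw text and build a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def time_call(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Made-up data standing in for one of the text files.
raw_text = "PersonalID,ProjectType\n" + "".join(
    f"P{i},Emergency Shelter\n" for i in range(10000)
)

records, build_seconds = time_call(build_from_text, raw_text)

# Option 2: load dictionaries that were already built and saved.
saved = pickle.dumps(records)
loaded, load_seconds = time_call(pickle.loads, saved)

print(f"build from text:       {build_seconds:.4f} s")
print(f"load saved dictionary: {load_seconds:.4f} s")
```

To keep the example self-contained, the saved dictionaries live in memory here; with real files, the time spent reading from disk would count toward both options.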

My goal for the following week:
  • Create a function that reads in the new dataset but creates the same dictionary as last semester 
  • Analyze the variables within two of the files 

Wednesday, January 18, 2017

Summary of the Project

An organization called CARES has been collecting personal, housing and service information from homeless or potentially homeless individuals in New York, and we are going to analyze this data using Python. By analyzing and visualizing this data, we are hoping to observe trends and identify specific characteristics of individuals that could lead to chronic homelessness and instability. The tool we are striving to create will be able to identify these characteristics as red flags and ensure that the individuals who need extra help actually get the attention and resources that they need. This could lead to a more efficient and successful way to match individuals with the program that will best suit them and ultimately reduce homelessness in New York State.