[TRU Research] Web App Data Schema

Mon Aug 19 18:26:37 PDT 2019

I have time to go through the survey data and find the reported transit
benefits per employer, though I'll need the data set that contains that
data. :)

Otherwise, I am going to be trying to match CTR neighborhoods to the
employers already in our spreadsheet, as well as adding any employers
mentioned in other sources/sheets on our wiki.

On Mon, Aug 19, 2019 at 6:02 PM Tom Chartrand <tmchartrand at gmail.com> wrote:

> This is looking great, Stephen!
> I had put myself down to organize the survey data with respect to
> employers for this, but I just realized that info was removed as PII (of
> course)! So either Mike will need to take that on (I think Mike did the
> original PII removal) or we'll need to figure out an appropriate way of
> sharing that.
> I'm feeling pretty swamped myself lately, so if you (Stephen) were down to
> help him with the task that could be great. I can certainly still take on
> some of it if needed though, once we get this sorted out.
> Katie, maybe you could help coordinate this and make sure Mike sees this
> sooner rather than later?
>
> Also, do let me know if you have any more specific spots in the report
> where some backup from the PSRC dataset could be useful!
>
> On Sun, Aug 18, 2019 at 3:59 PM Stephen DeSanto <rachidian at gmail.com>
> wrote:
>
>> I've added the list of industry categories to the Google Sheet, so that
>> should help validate the data we add there, though it's going to likely be
>> a manual task to fill in industries for all the employers.
>>
>> I've also added a "citation" column, which can be the public
>> representation of where we got the data to make our claim. We can fuss with
>> the wording later.
>>
>> I should have time this week to go through our survey data and other wiki
>> tables to add or modify employers in the Google Sheet. Agree that it'll be
>> good to have solid information on our primary targets and champions.
>>
>> On Tue, Aug 13, 2019 at 10:48 PM Harry Maher <harryb.maher at gmail.com>
>> wrote:
>>
>>> Just a quick update with regard to qualitative data analysis: I made a
>>> "Commute Survey Qualitative Data Analysis" folder on pbworks and put a doc
>>> with some quotes in it for the report. I tried to pull out the main
>>> relevant themes that I noticed discussed in the two qualitative questions
>>> currently in the file with a couple of quote options for each
>>> theme/category of response to the question.
>>>
>>> -Harry
>>>
>>> On Sun, Aug 11, 2019 at 12:54 PM Tom Chartrand <tmchartrand at gmail.com>
>>> wrote:
>>>
>>>> Regarding where to have this discussion - I'm just gonna continue the
>>>> email chain cause I haven't followed where to put the discussion on the
>>>> wiki, but someone feel free to steer it over there if we want to!
>>>>
>>>> A brief update regarding establishing a larger list of employers to
>>>> include in the dataset: basic contact information for all seattle
>>>> businesses, sorted by the North American Industry Classification System, is
>>>> available at
>>>> https://web6.seattle.gov/fas/slimbizsearch/ResultsPage.aspx?NAICList=Top100,
>>>> but it's a huge list of course, with no info on number of employees or
>>>> revenue to filter out the smaller ones. Still, I did send off an email
>>>> about getting a copy of the database just for purposes of cross-referencing
>>>> names and such.
>>>> On 8/10/19 6:42 PM, Katie Wilson wrote:
>>>>
>>>> For “neighborhood” I think it makes sense to use the “CTR Network
>>>> Areas” as defined here
>>>> <https://www.seattle.gov/transportation/projects-and-programs/programs/transportation-options-program/commute-trip-reduction-program/draft-2019-2023-networks-and-targets>
>>>> .
>>>>
>>>> For “industry” I think it makes sense to use the “Employment Sector”
>>>> categories listed on Page 12 of this CTR strategic plan.
>>>> <https://www.seattle.gov/Documents/Departments/SDOT/TransportationOptionsProgram/CTR_Draft_Strategic_Plan_Jan2019.pdf>
>>>>
>>>> On the ratings, I think it does make sense to lump "piggy bank" and
>>>> "brown tortoise" in the same rating (0), and then add a tortoise badge for
>>>> employers that aren’t even doing the pre-tax thing.
>>>>
>>>> Another simplification option to consider would be to lump together 3
>>>> and 4 leaves. But let’s leave them separate for now and depending on how
>>>> things shake out we can easily combine them later.
>>>>
>>>> We don’t have any major sources of data on what benefits employers
>>>> provide other than:
>>>> — Metro public disclosure request spreadsheet
>>>> <https://seattletransitpasses-research.pbworks.com/w/page/133438080/First%20Public%20Records%20Request>
>>>> — Our commute survey
>>>> — Info gleaned online from company websites, asking around, glassdoor
>>>> etc (what I’ve found I’ve added to the relevant tables in the wiki
>>>> <https://seattletransitpasses-research.pbworks.com/w/page/132177123/Employers>,
>>>> on CTR employers and “potential poster children” and “likely target
>>>> assessment” and “hotels”)
>>>>
>>>> Maybe it makes sense to have another string indicating sufficient
>>>> certainty — when we have two sources, or one very reliable source, we enter
>>>> an X or whatever, and that gives us the green light to display that data.
>>>> Also it may not make sense to put a lot of work into categorizing employers
>>>> into Network Area and Employment Sector until we have reliable data on what
>>>> benefits they’re offering.
>>>>
>>>> Speaking of Seattle Coffee Works, I spoke with their HR person a few
>>>> months ago and actually employees have to pay $20/month (pre-tax $) if they
>>>> want an ORCA card. Still a great deal but not 100% subsidy as reported in
>>>> the Metro data— which, I then learned, is self-reported by the company.
>>>> Metro only knows that all those companies are signed up for the Passport
>>>> program. I noted the real situation on this page
>>>> <https://seattletransitpasses-research.pbworks.com/w/page/133439169/Potential%20Poster%20Children>.
>>>> Anyway, the point is we should probably crosscheck the Metro data as much
>>>> as we can with our survey or other sources of information.
>>>>
>>>> (Also speaking of Seattle Coffee Works they have locations in Capitol
>>>> Hill & Cascade too <https://www.seattlecoffeeworks.com/our-cafes.aspx>.
>>>> From talking with the HR person I’m pretty sure all are include in their
>>>> passport program, and the employees swap around a lot from location to
>>>> location. They probably use the Ballard location as home base for transit
>>>> pass purposes since that’s the least expensive zone.)
>>>>
>>>> One project would be to come up with a list of employers that have name
>>>> recognition (or that we are interested in for some other reason) and put a
>>>> little work into attaining sufficient certainty. If we posted the list to a
>>>> page and put a call out on social media and email I bet we’d get some
>>>> answers.
>>>>
>>>> On Aug 8, 2019, at 5:26 PM, Stephen DeSanto <rachidian at gmail.com>
>>>> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I've taken a first pass at the data schema for showing employer transit
>>>> benefits in our upcoming web app. In this draft, each employer record is
>>>> represented as follows:
>>>>
>>>> {
>>>>     "employer": string,
>>>>     "industry": [string],
>>>>     "neighborhood": [string],
>>>>     "alias": [string],
>>>>     "rating": int,
>>>>     "description": string
>>>>     "badges": [string]
>>>> }
>>>>
>>>> *Employer* is a plain text string.
>>>> *Industry* is a list of strings (or a single string, if we want to
>>>> limit one employer = one industry).
>>>> *Neighborhood* is treated similarly to industry
>>>> *Alias* is a list of other names for the same company. For example,
>>>> *Rating* is a numerical scale that represents the "worker's monthly
>>>> cost of an unlimited transit pass". The scale provided during the meeting
>>>> went from "4 leaves" to "brown tortoise"; aligning to the leaves, that
>>>> gives us a scale of [-1, 0, 1, 2, 3, 4]. We could adjust this up to 0-5, or
>>>> lump "piggy bank" and "brown tortoise" in the same rating.
>>>> *Description* is a string that describes the employer's transit
>>>> benefits, i.e. why they got the rating they did.
>>>> *Badges* is a list of strings that represent any additional categories
>>>> we want to assign to a company (e.g. "industry leader", "polluter").
>>>>
>>>> We can make changes to this schema if it makes it easier to work with
>>>> our underlying data visualization platform (e.g. Tableau? DataTables?), but
>>>> hopefully this is a suitable starting place.
>>>>
>>>> As an example, take a hypothetical record for Seattle Coffee Works.
>>>>
>>>> {
>>>>     "employer": "Seattle Coffee Works",
>>>>     "industry": ["restaurant"],
>>>>     "neighborhood": ["cbd", "ballard"],
>>>>     "alias": ["Ballard Coffee Works"],
>>>>     "rating": 4,
>>>>     "description": "Provides 100% ORCA Passport subsidy."
>>>>     "badges": ["leader"]
>>>> }
>>>>
>>>> *Where Our Data Lives (For Now)*
>>>>
>>>> I've also taken a rough chop at getting started with the data. Here,
>>>> I've just taken the raw list of ORCA Business Passport employers and
>>>> assigned a score based on their subsidy percentage, as an example:
>>>>
>>>>
>>>> https://docs.google.com/spreadsheets/d/1HmOcG7hJLD1G0unCMPcsDnXr4RIA_PMKEE5ne-hhQR8/edit?usp=sharing
>>>>
>>>> The spreadsheet contains columns for each item of the employer record,
>>>> as well as some additional columns to record the raw data we have on file
>>>> for that employer, so we can use that data to automatically or manually
>>>> determine an employer's rating.
>>>>
>>>> If we have data from other sources not listed (e.g. survey data, City
>>>> of Seattle data), the "source_" columns can be renamed or added to
>>>> represent that source's data. For example, if I want to add data from the
>>>> TRU survey, I might rename "__source_b" to "__TRU Survey", then include
>>>> results from that survey in that column for each company. (The columns
>>>> beginning with two underscores are ones I don't expect to be publicly
>>>> available.)
>>>>
>>>> PBworks feels really inadequate for editing large data sets, and I
>>>> don't know where else to put it, so it's living in Google Sheets for now.
>>>> Set to read-only with the link, for now, but please request editing
>>>> permissions so you can add stuff to the sheet.
>>>>
>>>> Currently, my expectation is that the spreadsheet will be hand-edited
>>>> in Google Sheets, and then when we're ready to put live data in the web
>>>> app, we can export the sheet to a flat file, which we can then import into
>>>> a format appropriate for the website (big ol' JSON file, database,
>>>> whatever). Manual process, but probably fine for a project of this scale;
>>>> I'm open to alternatives.
>>>>
>>>> *Things To Do Next*
>>>>
>>>> Aside from the ORCA Passport data and the data we collected through TRU
>>>> survey / legwork (on PBworks), do we have any other data sources that would
>>>> provide context for a score?
>>>>
>>>> For the data sources we have, we'll have to start filling out the rest
>>>> of the spreadsheet, I guess?
>>>>
>>>> Also, we will need to determine:
>>>> a) master list of "industries" we want to support, and
>>>> b) "industry" field(s) for each employer
>>>> c) "neighborhood" field(s) for each employer we don't have one for (or
>>>> being more precise than what I have now)
>>>> d) which companies get tagged with which badges
>>>>
>>>> Hope that helps.
>>>>
>>>> In solidarity,
>>>>
>>>> Stephen
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.transitriders.org/pipermail/research/attachments/20190819/36b03cce/attachment.html>