[TRU Research] Web App Data Schema
Katie Wilson
katie at transitriders.org
Sat Aug 10 18:42:51 PDT 2019
For “neighborhood” I think it makes sense to use the “CTR Network Areas” as defined here <https://www.seattle.gov/transportation/projects-and-programs/programs/transportation-options-program/commute-trip-reduction-program/draft-2019-2023-networks-and-targets>.
For “industry” I think it makes sense to use the “Employment Sector” categories listed on Page 12 of this CTR strategic plan. <https://www.seattle.gov/Documents/Departments/SDOT/TransportationOptionsProgram/CTR_Draft_Strategic_Plan_Jan2019.pdf>
On the ratings, I think it does make sense to lump "piggy bank" and "brown tortoise" in the same rating (0), and then add a tortoise badge for employers that aren’t even doing the pre-tax thing.
Another simplification option to consider would be to lump together 3 and 4 leaves. But let’s leave them separate for now and depending on how things shake out we can easily combine them later.
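To make the lumping concrete, here is a rough sketch of how it might work. It assumes -1 on Stephen's draft scale means "brown tortoise" and 0 means "piggy bank". The function name, the "tortoise" badge string and the merge_top switch are placeholders I made up, not anything we've agreed on.

```python
def normalize_rating(raw_rating, badges, merge_top=False):
    """Collapse the draft -1..4 scale to 0..4.

    Assumption: -1 is "brown tortoise" (no pre-tax benefit at all) and
    0 is "piggy bank". Both end up with rating 0, but the tortoise case
    also gets a "tortoise" badge (hypothetical badge name).
    """
    if not -1 <= raw_rating <= 4:
        raise ValueError(f"rating out of range: {raw_rating}")

    badges = list(badges)  # copy so the caller's list is not modified
    rating = raw_rating

    if rating == -1:
        rating = 0
        if "tortoise" not in badges:
            badges.append("tortoise")

    # Optional later simplification: treat 4 leaves the same as 3 leaves.
    if merge_top and rating == 4:
        rating = 3

    return rating, badges
```

Leaving merge_top off keeps 3 and 4 leaves separate for now, and turning it on later would not touch the underlying data.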
We don’t have any major sources of data on what benefits employers provide other than:
— Metro public disclosure request spreadsheet <https://seattletransitpasses-research.pbworks.com/w/page/133438080/First%20Public%20Records%20Request>
— Our commute survey
— Info gleaned online from company websites, asking around, Glassdoor, etc. (what I’ve found I’ve added to the relevant tables in the wiki <https://seattletransitpasses-research.pbworks.com/w/page/132177123/Employers>, on CTR employers and “potential poster children” and “likely target assessment” and “hotels”)
Maybe it makes sense to have another string indicating sufficient certainty — when we have two sources, or one very reliable source, we enter an X or whatever, and that gives us the green light to display that data. Also it may not make sense to put a lot of work into categorizing employers into Network Area and Employment Sector until we have reliable data on what benefits they’re offering.
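The certainty rule could be checked mechanically rather than by entering an X by hand. This is only a sketch: the source labels and the choice of which sources count as "very reliable" are hypothetical, and we would need to decide those ourselves.

```python
# Hypothetical labels for where a data point came from.
# Which sources count as "very reliable" is an assumption for illustration.
VERY_RELIABLE_SOURCES = {"hr_contact"}


def is_sufficiently_certain(sources):
    """True if we have two distinct sources, or one very reliable source."""
    distinct = set(sources)
    return len(distinct) >= 2 or bool(distinct & VERY_RELIABLE_SOURCES)


def displayable(records):
    """Keep only the employer records that meet the certainty rule.

    Each record is a dict with a "sources" list (hypothetical field name).
    """
    return [r for r in records if is_sufficiently_certain(r.get("sources", []))]
```

Repeating the same source twice does not count as two sources, so a single Metro entry on its own would not be displayed.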
Speaking of Seattle Coffee Works, I spoke with their HR person a few months ago and actually employees have to pay $20/month (pre-tax $) if they want an ORCA card. Still a great deal, but not the 100% subsidy reported in the Metro data, which, I then learned, is self-reported by the company. Metro only knows that all those companies are signed up for the Passport program. I noted the real situation on this page <https://seattletransitpasses-research.pbworks.com/w/page/133439169/Potential%20Poster%20Children>. Anyway, the point is we should probably crosscheck the Metro data as much as we can with our survey or other sources of information.
(Also speaking of Seattle Coffee Works, they have locations in Capitol Hill & Cascade too <https://www.seattlecoffeeworks.com/our-cafes.aspx>. From talking with the HR person I’m pretty sure all are included in their Passport program, and the employees swap around a lot from location to location. They probably use the Ballard location as home base for transit pass purposes since that’s the least expensive zone.)
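On crosschecking the Metro data against the survey or other sources, something like the following could flag the employers worth a second look. It is a sketch that assumes each source boils down to a subsidy percentage per employer. The function name, the dict layout and the numbers in the usage lines are made up for illustration.

```python
def crosscheck(metro, other, tolerance=0.0):
    """Return employers whose Metro-reported subsidy disagrees with another source.

    Both arguments map employer name -> subsidy percentage (assumed layout).
    Employers missing from `other` are skipped, since there is nothing
    to compare against.
    """
    conflicts = {}
    for employer, metro_pct in metro.items():
        if employer not in other:
            continue
        other_pct = other[employer]
        if abs(metro_pct - other_pct) > tolerance:
            conflicts[employer] = (metro_pct, other_pct)
    return conflicts


# Illustrative, made-up numbers only.
metro_data = {"Employer A": 100, "Employer B": 50, "Employer C": 100}
survey_data = {"Employer A": 80, "Employer B": 50}
flagged = crosscheck(metro_data, survey_data)
```

Anything that comes back in flagged would be a candidate for a phone call or a note on the wiki, rather than something to correct automatically.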
One project would be to come up with a list of employers that have name recognition (or that we are interested in for some other reason) and put a little work into attaining sufficient certainty. If we posted the list to a page and put a call out on social media and email I bet we’d get some answers.
> On Aug 8, 2019, at 5:26 PM, Stephen DeSanto <rachidian at gmail.com> wrote:
>
> Hi everyone,
>
> I've taken a first pass at the data schema for showing employer transit benefits in our upcoming web app. In this draft, each employer record is represented as follows:
>
> {
> "employer": string,
> "industry": [string],
> "neighborhood": [string],
> "alias": [string],
> "rating": int,
> "description": string,
> "badges": [string]
> }
>
> Employer is a plain text string.
> Industry is a list of strings (or a single string, if we want to limit one employer = one industry).
> Neighborhood is treated similarly to industry
> Alias is a list of other names for the same company. For example,
> Rating is a numerical scale that represents the "worker's monthly cost of an unlimited transit pass". The scale provided during the meeting went from "4 leaves" to "brown tortoise"; aligning to the leaves, that gives us a scale of [-1, 0, 1, 2, 3, 4]. We could adjust this up to 0-5, or lump "piggy bank" and "brown tortoise" in the same rating.
> Description is a string that describes the employer's transit benefits, i.e. why they got the rating they did.
> Badges is a list of strings that represent any additional categories we want to assign to a company (e.g. "industry leader", "polluter").
>
> We can make changes to this schema if it makes it easier to work with our underlying data visualization platform (e.g. Tableau? DataTables?), but hopefully this is a suitable starting place.
>
> As an example, take a hypothetical record for Seattle Coffee Works.
>
> {
> "employer": "Seattle Coffee Works",
> "industry": ["restaurant"],
> "neighborhood": ["cbd", "ballard"],
> "alias": ["Ballard Coffee Works"],
> "rating": 4,
> "description": "Provides 100% ORCA Passport subsidy.",
> "badges": ["leader"]
> }
>
> Where Our Data Lives (For Now)
>
> I've also taken a rough chop at getting started with the data. Here, I've just taken the raw list of ORCA Business Passport employers and assigned a score based on their subsidy percentage, as an example:
>
> https://docs.google.com/spreadsheets/d/1HmOcG7hJLD1G0unCMPcsDnXr4RIA_PMKEE5ne-hhQR8/edit?usp=sharing
>
> The spreadsheet contains columns for each item of the employer record, as well as some additional columns to record the raw data we have on file for that employer, so we can use that data to automatically or manually determine an employer's rating.
>
> If we have data from other sources not listed (e.g. survey data, City of Seattle data), the "source_" columns can be renamed or added to represent that source's data. For example, if I want to add data from the TRU survey, I might rename "__source_b" to "__TRU Survey", then include results from that survey in that column for each company. (The columns beginning with two underscores are ones I don't expect to be publicly available.)
>
> PBworks feels really inadequate for editing large data sets, and I don't know where else to put it, so it's living in Google Sheets for now. Set to read-only with the link, for now, but please request editing permissions so you can add stuff to the sheet.
>
> Currently, my expectation is that the spreadsheet will be hand-edited in Google Sheets, and then when we're ready to put live data in the web app, we can export the sheet to a flat file, which we can then import into a format appropriate for the website (big ol' JSON file, database, whatever). Manual process, but probably fine for a project of this scale; I'm open to alternatives.
>
> Things To Do Next
>
> Aside from the ORCA Passport data and the data we collected through TRU survey / legwork (on PBworks), do we have any other data sources that would provide context for a score?
>
> For the data sources we have, we'll have to start filling out the rest of the spreadsheet, I guess?
>
> Also, we will need to determine:
> a) master list of "industries" we want to support, and
> b) "industry" field(s) for each employer
> c) "neighborhood" field(s) for each employer we don't have one for (or being more precise than what I have now)
> d) which companies get tagged with which badges
>
> Hope that helps.
>
> In solidarity,
>
> Stephen
>
>