[TRU Research] Web App Data Schema

Harry Maher harryb.maher at gmail.com
Tue Aug 13 22:48:10 PDT 2019


Just a quick update with regard to qualitative data analysis: I made a
"Commute Survey Qualitative Data Analysis" folder on pbworks and put a doc
with some quotes in it for the report. I tried to pull out the main
relevant themes that I noticed discussed in the two qualitative questions
currently in the file with a couple of quote options for each
theme/category of response to the question.

-Harry

On Sun, Aug 11, 2019 at 12:54 PM Tom Chartrand <tmchartrand at gmail.com>
wrote:

> Regarding where to have this discussion - I'm just gonna continue the
> email chain cause I haven't followed where to put the discussion on the
> wiki, but someone feel free to steer it over there if we want to!
>
> A brief update regarding establishing a larger list of employers to
> include in the dataset: basic contact information for all seattle
> businesses, sorted by the North American Industry Classification System, is
> available at
> https://web6.seattle.gov/fas/slimbizsearch/ResultsPage.aspx?NAICList=Top100,
> but it's a huge list of course, with no info on number of employees or
> revenue to filter out the smaller ones. Still, I did send off an email
> about getting a copy of the database just for purposes of cross-referencing
> names and such.
> On 8/10/19 6:42 PM, Katie Wilson wrote:
>
> For “neighborhood” I think it makes sense to use the “CTR Network Areas”
> as defined here
> <https://www.seattle.gov/transportation/projects-and-programs/programs/transportation-options-program/commute-trip-reduction-program/draft-2019-2023-networks-and-targets>
> .
>
> For “industry” I think it makes sense to use the “Employment Sector”
> categories listed on Page 12 of this CTR strategic plan.
> <https://www.seattle.gov/Documents/Departments/SDOT/TransportationOptionsProgram/CTR_Draft_Strategic_Plan_Jan2019.pdf>
>
> On the ratings, I think it does make sense to lump "piggy bank" and "brown
> tortoise" in the same rating (0), and then add a tortoise badge for
> employers that aren’t even doing the pre-tax thing.
>
> Another simplification option to consider would be to lump together 3 and
> 4 leaves. But let’s leave them separate for now and depending on how things
> shake out we can easily combine them later.
>
> We don’t have any major sources of data on what benefits employers provide
> other than:
> — Metro public disclosure request spreadsheet
> <https://seattletransitpasses-research.pbworks.com/w/page/133438080/First%20Public%20Records%20Request>
> — Our commute survey
> — Info gleaned online from company websites, asking around, glassdoor etc
> (what I’ve found I’ve added to the relevant tables in the wiki
> <https://seattletransitpasses-research.pbworks.com/w/page/132177123/Employers>,
> on CTR employers and “potential poster children” and “likely target
> assessment” and “hotels”)
>
> Maybe it makes sense to have another string indicating sufficient
> certainty — when we have two sources, or one very reliable source, we enter
> an X or whatever, and that gives us the green light to display that data.
> Also it may not make sense to put a lot of work into categorizing employers
> into Network Area and Employment Sector until we have reliable data on what
> benefits they’re offering.
>
> Speaking of Seattle Coffee Works, I spoke with their HR person a few
> months ago and actually employees have to pay $20/month (pre-tax $) if they
> want an ORCA card. Still a great deal but not 100% subsidy as reported in
> the Metro data— which, I then learned, is self-reported by the company.
> Metro only knows that all those companies are signed up for the Passport
> program. I noted the real situation on this page
> <https://seattletransitpasses-research.pbworks.com/w/page/133439169/Potential%20Poster%20Children>.
> Anyway, the point is we should probably crosscheck the Metro data as much
> as we can with our survey or other sources of information.
>
> (Also speaking of Seattle Coffee Works they have locations in Capitol
> Hill & Cascade too <https://www.seattlecoffeeworks.com/our-cafes.aspx>.
> From talking with the HR person I’m pretty sure all are include in their
> passport program, and the employees swap around a lot from location to
> location. They probably use the Ballard location as home base for transit
> pass purposes since that’s the least expensive zone.)
>
> One project would be to come up with a list of employers that have name
> recognition (or that we are interested in for some other reason) and put a
> little work into attaining sufficient certainty. If we posted the list to a
> page and put a call out on social media and email I bet we’d get some
> answers.
>
> On Aug 8, 2019, at 5:26 PM, Stephen DeSanto <rachidian at gmail.com> wrote:
>
> Hi everyone,
>
> I've taken a first pass at the data schema for showing employer transit
> benefits in our upcoming web app. In this draft, each employer record is
> represented as follows:
>
> {
>     "employer": string,
>     "industry": [string],
>     "neighborhood": [string],
>     "alias": [string],
>     "rating": int,
>     "description": string
>     "badges": [string]
> }
>
> *Employer* is a plain text string.
> *Industry* is a list of strings (or a single string, if we want to limit
> one employer = one industry).
> *Neighborhood* is treated similarly to industry
> *Alias* is a list of other names for the same company. For example,
> *Rating* is a numerical scale that represents the "worker's monthly cost
> of an unlimited transit pass". The scale provided during the meeting went
> from "4 leaves" to "brown tortoise"; aligning to the leaves, that gives us
> a scale of [-1, 0, 1, 2, 3, 4]. We could adjust this up to 0-5, or lump
> "piggy bank" and "brown tortoise" in the same rating.
> *Description* is a string that describes the employer's transit benefits,
> i.e. why they got the rating they did.
> *Badges* is a list of strings that represent any additional categories we
> want to assign to a company (e.g. "industry leader", "polluter").
>
> We can make changes to this schema if it makes it easier to work with our
> underlying data visualization platform (e.g. Tableau? DataTables?), but
> hopefully this is a suitable starting place.
>
> As an example, take a hypothetical record for Seattle Coffee Works.
>
> {
>     "employer": "Seattle Coffee Works",
>     "industry": ["restaurant"],
>     "neighborhood": ["cbd", "ballard"],
>     "alias": ["Ballard Coffee Works"],
>     "rating": 4,
>     "description": "Provides 100% ORCA Passport subsidy."
>     "badges": ["leader"]
> }
>
> *Where Our Data Lives (For Now)*
>
> I've also taken a rough chop at getting started with the data. Here, I've
> just taken the raw list of ORCA Business Passport employers and assigned a
> score based on their subsidy percentage, as an example:
>
>
> https://docs.google.com/spreadsheets/d/1HmOcG7hJLD1G0unCMPcsDnXr4RIA_PMKEE5ne-hhQR8/edit?usp=sharing
>
> The spreadsheet contains columns for each item of the employer record, as
> well as some additional columns to record the raw data we have on file for
> that employer, so we can use that data to automatically or manually
> determine an employer's rating.
>
> If we have data from other sources not listed (e.g. survey data, City of
> Seattle data), the "source_" columns can be renamed or added to represent
> that source's data. For example, if I want to add data from the TRU survey,
> I might rename "__source_b" to "__TRU Survey", then include results from
> that survey in that column for each company. (The columns beginning with
> two underscores are ones I don't expect to be publicly available.)
>
> PBworks feels really inadequate for editing large data sets, and I don't
> know where else to put it, so it's living in Google Sheets for now. Set to
> read-only with the link, for now, but please request editing permissions so
> you can add stuff to the sheet.
>
> Currently, my expectation is that the spreadsheet will be hand-edited in
> Google Sheets, and then when we're ready to put live data in the web app,
> we can export the sheet to a flat file, which we can then import into a
> format appropriate for the website (big ol' JSON file, database, whatever).
> Manual process, but probably fine for a project of this scale; I'm open to
> alternatives.
>
> *Things To Do Next*
>
> Aside from the ORCA Passport data and the data we collected through TRU
> survey / legwork (on PBworks), do we have any other data sources that would
> provide context for a score?
>
> For the data sources we have, we'll have to start filling out the rest of
> the spreadsheet, I guess?
>
> Also, we will need to determine:
> a) master list of "industries" we want to support, and
> b) "industry" field(s) for each employer
> c) "neighborhood" field(s) for each employer we don't have one for (or
> being more precise than what I have now)
> d) which companies get tagged with which badges
>
> Hope that helps.
>
> In solidarity,
>
> Stephen
>
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.transitriders.org/pipermail/research/attachments/20190813/71bffddc/attachment.html>


More information about the Research mailing list