[TRU Research] Web App Data Schema

Stephen DeSanto rachidian at gmail.com
Thu Aug 8 17:26:27 PDT 2019


Hi everyone,

I've taken a first pass at the data schema for showing employer transit
benefits in our upcoming web app. In this draft, each employer record is
represented as follows:

{
    "employer": string,
    "industry": [string],
    "neighborhood": [string],
    "alias": [string],
    "rating": int,
    "description": string,
    "badges": [string]
}

*Employer* is a plain text string.
*Industry* is a list of strings (or a single string, if we want to limit
each employer to one industry).
*Neighborhood* is treated the same way as industry.
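*Alias* is a list of other names for the same company. For example, in the
hypothetical record further down, "Ballard Coffee Works" is an alias for
"Seattle Coffee Works".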
*Rating* is a numerical scale that represents the "worker's monthly cost of
an unlimited transit pass". The scale provided during the meeting went from
"4 leaves" to "brown tortoise"; aligning to the leaves, that gives us a
scale of [-1, 0, 1, 2, 3, 4]. We could adjust this up to 0-5, or lump
"piggy bank" and "brown tortoise" in the same rating.
*Description* is a string that describes the employer's transit benefits,
i.e. why they got the rating they did.
*Badges* is a list of strings that represent any additional categories we
want to assign to a company (e.g. "industry leader", "polluter").
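
To make the field descriptions above concrete, here is a minimal sketch of
a record check in Python. The field names, types, and the -1 to 4 rating
range come from this draft; the function name validate_employer and the
wording of the messages are my own assumptions, not existing code.

```python
from typing import Any

# Field names and types follow the draft schema; this validator is a
# hypothetical helper, not part of any existing code.
STRING_FIELDS = ("employer", "description")
LIST_FIELDS = ("industry", "neighborhood", "alias", "badges")
RATING_RANGE = range(-1, 5)  # -1 through 4, per the draft scale


def validate_employer(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one employer record (empty if OK)."""
    problems = []
    for field in STRING_FIELDS:
        if not isinstance(record.get(field), str):
            problems.append(f"{field} must be a string")
    for field in LIST_FIELDS:
        value = record.get(field)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{field} must be a list of strings")
    rating = record.get("rating")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(rating, bool) or not isinstance(rating, int) or rating not in RATING_RANGE:
        problems.append("rating must be an integer from -1 to 4")
    return problems
```

If we move the scale to 0-5 or switch industry to a single string, only
the constants and the list check would need to change.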

We can make changes to this schema if it makes it easier to work with our
underlying data visualization platform (e.g. Tableau? DataTables?), but
hopefully this is a suitable starting place.

As an example, take a hypothetical record for Seattle Coffee Works.

{
    "employer": "Seattle Coffee Works",
    "industry": ["restaurant"],
    "neighborhood": ["cbd", "ballard"],
    "alias": ["Ballard Coffee Works"],
    "rating": 4,
    "description": "Provides 100% ORCA Passport subsidy.",
    "badges": ["leader"]
}
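
One thing the alias field buys us is lookup by any name. A rough sketch of
how the web app might do that, assuming records are loaded as Python dicts
shaped like the one above (build_name_index is a hypothetical helper, and
lowercasing names for matching is my assumption):

```python
def build_name_index(records):
    """Map every lowercased employer name and alias to its record."""
    index = {}
    for record in records:
        for name in [record["employer"], *record.get("alias", [])]:
            index[name.strip().lower()] = record
    return index


records = [
    {
        "employer": "Seattle Coffee Works",
        "alias": ["Ballard Coffee Works"],
        "rating": 4,
    }
]
index = build_name_index(records)
print(index["ballard coffee works"]["employer"])  # prints "Seattle Coffee Works"
```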

*Where Our Data Lives (For Now)*

I've also taken a rough chop at getting started with the data. Here, I've
just taken the raw list of ORCA Business Passport employers and assigned a
score based on their subsidy percentage, as an example:

https://docs.google.com/spreadsheets/d/1HmOcG7hJLD1G0unCMPcsDnXr4RIA_PMKEE5ne-hhQR8/edit?usp=sharing
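
For illustration, scoring by subsidy percentage could look something like
the sketch below. The cutoffs are placeholders I made up for this example:
only "100% subsidy gets a 4" matches the sample record, and the rest would
need to be agreed on by the group.

```python
# HYPOTHETICAL cutoffs: only "100% subsidy -> 4" is suggested by the sample
# record; every other threshold is a placeholder to be replaced.
THRESHOLDS = [(100, 4), (75, 3), (50, 2), (25, 1), (1, 0)]


def rating_from_subsidy(percent: float) -> int:
    """Map an employer's subsidy percentage (0-100) to a rating from -1 to 4."""
    if not 0 <= percent <= 100:
        raise ValueError(f"subsidy percentage out of range: {percent}")
    for minimum, rating in THRESHOLDS:
        if percent >= minimum:
            return rating
    return -1  # no subsidy at all


print(rating_from_subsidy(100))  # prints 4
```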

The spreadsheet contains columns for each item of the employer record, as
well as some additional columns to record the raw data we have on file for
that employer, so we can use that data to automatically or manually
determine an employer's rating.

If we have data from other sources not listed (e.g. survey data, City of
Seattle data), the "source_" columns can be renamed or added to represent
that source's data. For example, if I want to add data from the TRU survey,
I might rename "__source_b" to "__TRU Survey", then include results from
that survey in that column for each company. (The columns beginning with
two underscores are ones I don't expect to be publicly available.)
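
The double-underscore convention makes it easy to strip private columns
before anything is published. A minimal sketch, assuming each spreadsheet
row has been read into a dict keyed by column name (public_fields is a
hypothetical helper name):

```python
def public_fields(row: dict) -> dict:
    """Drop columns whose names start with two underscores."""
    return {key: value for key, value in row.items() if not key.startswith("__")}


row = {"employer": "Seattle Coffee Works", "rating": "4", "__TRU Survey": "raw notes"}
print(public_fields(row))  # {'employer': 'Seattle Coffee Works', 'rating': '4'}
```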

PBworks feels really inadequate for editing large data sets, and I don't
know where else to put the data, so it's living in Google Sheets for now.
The sheet is read-only for anyone with the link, for now, so please request
editing permissions if you want to add stuff to it.

Currently, my expectation is that the spreadsheet will be hand-edited in
Google Sheets, and then when we're ready to put live data in the web app,
we can export the sheet to a flat file, which we can then import into a
format appropriate for the website (big ol' JSON file, database, whatever).
Manual process, but probably fine for a project of this scale; I'm open to
alternatives.
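
As a sketch of what the export step might look like, the snippet below
turns a CSV export of the sheet into JSON records. It assumes the column
names match the schema fields and that multi-value cells (e.g.
neighborhood) are separated by semicolons; both are assumptions about how
we fill in the sheet, not how it is set up today.

```python
import csv
import io
import json

LIST_FIELDS = ("industry", "neighborhood", "alias", "badges")


def rows_to_records(csv_file, list_separator=";"):
    """Convert exported CSV rows into employer records matching the schema."""
    records = []
    for row in csv.DictReader(csv_file):
        # Skip private columns (names starting with two underscores).
        record = {k: v for k, v in row.items() if k and not k.startswith("__")}
        for field in LIST_FIELDS:
            cell = record.get(field) or ""
            # ASSUMPTION: list cells use ";" between items.
            record[field] = [item.strip() for item in cell.split(list_separator) if item.strip()]
        record["rating"] = int(record["rating"])
        records.append(record)
    return records


# Stand-in for a real export file, using the hypothetical sample record.
sample = io.StringIO(
    "employer,industry,neighborhood,alias,rating,description,badges,__source_b\n"
    "Seattle Coffee Works,restaurant,cbd;ballard,Ballard Coffee Works,4,"
    "Provides 100% ORCA Passport subsidy.,leader,100\n"
)
print(json.dumps(rows_to_records(sample), indent=4))
```

For a real export, the StringIO stand-in would be replaced with
open("export.csv", newline=""), where the file name is whatever we save
the download as.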

*Things To Do Next*

Aside from the ORCA Passport data and the data we collected through TRU
survey / legwork (on PBworks), do we have any other data sources that would
provide context for a score?

For the data sources we have, we'll have to start filling out the rest of
the spreadsheet, I guess?

Also, we will need to determine:
a) a master list of "industries" we want to support,
b) the "industry" field(s) for each employer,
c) the "neighborhood" field(s) for each employer we don't have one for (or
a more precise value than what I have now), and
d) which companies get tagged with which badges.

Hope that helps.

In solidarity,

Stephen