[TRU Research] Web App Data Schema
Tom Chartrand
tmchartrand at gmail.com
Sun Aug 11 12:54:11 PDT 2019
Regarding where to have this discussion - I'm just gonna continue the
email chain cause I haven't followed where to put the discussion on the
wiki, but someone feel free to steer it over there if we want to!
A brief update regarding establishing a larger list of employers to
include in the dataset: basic contact information for all seattle
businesses, sorted by the North American Industry Classification System,
is available at
https://web6.seattle.gov/fas/slimbizsearch/ResultsPage.aspx?NAICList=Top100,
but it's a huge list of course, with no info on number of employees or
revenue to filter out the smaller ones. Still, I did send off an email
about getting a copy of the database just for purposes of
cross-referencing names and such.
On 8/10/19 6:42 PM, Katie Wilson wrote:
> For “neighborhood” I think it makes sense to use the “CTR Network
> Areas” as defined here
> <https://www.seattle.gov/transportation/projects-and-programs/programs/transportation-options-program/commute-trip-reduction-program/draft-2019-2023-networks-and-targets>.
>
> For “industry” I think it makes sense to use the “Employment Sector”
> categories listed on Page 12 of this CTR strategic plan.
> <https://www.seattle.gov/Documents/Departments/SDOT/TransportationOptionsProgram/CTR_Draft_Strategic_Plan_Jan2019.pdf>
>
> On the ratings, I think it does make sense to lump "piggy bank" and
> "brown tortoise" in the same rating (0), and then add a tortoise badge
> for employers that aren’t even doing the pre-tax thing.
>
> Another simplification option to consider would be to lump together 3
> and 4 leaves. But let’s leave them separate for now and depending on
> how things shake out we can easily combine them later.
>
> We don’t have any major sources of data on what benefits employers
> provide other than:
> — Metro public disclosure request spreadsheet
> <https://seattletransitpasses-research.pbworks.com/w/page/133438080/First%20Public%20Records%20Request>
> — Our commute survey
> — Info gleaned online from company websites, asking around, glassdoor
> etc (what I’ve found I’ve added to the relevant tables in the wiki
> <https://seattletransitpasses-research.pbworks.com/w/page/132177123/Employers>,
> on CTR employers and “potential poster children” and “likely target
> assessment” and “hotels”)
>
> Maybe it makes sense to have another string indicating sufficient
> certainty — when we have two sources, or one very reliable source, we
> enter an X or whatever, and that gives us the green light to display
> that data. Also it may not make sense to put a lot of work into
> categorizing employers into Network Area and Employment Sector until
> we have reliable data on what benefits they’re offering.
>
> Speaking of Seattle Coffee Works, I spoke with their HR person a few
> months ago and actually employees have to pay $20/month (pre-tax $) if
> they want an ORCA card. Still a great deal but not 100% subsidy as
> reported in the Metro data— which, I then learned, is self-reported by
> the company. Metro only knows that all those companies are signed up
> for the Passport program. I noted the real situation on this page
> <https://seattletransitpasses-research.pbworks.com/w/page/133439169/Potential%20Poster%20Children>.
> Anyway, the point is we should probably crosscheck the Metro data as
> much as we can with our survey or other sources of information.
>
> (Also speaking of Seattle Coffee Works they have locations in Capitol
> Hill & Cascade too
> <https://www.seattlecoffeeworks.com/our-cafes.aspx>. From talking with
> the HR person I’m pretty sure all are include in their passport
> program, and the employees swap around a lot from location to
> location. They probably use the Ballard location as home base for
> transit pass purposes since that’s the least expensive zone.)
>
> One project would be to come up with a list of employers that have
> name recognition (or that we are interested in for some other reason)
> and put a little work into attaining sufficient certainty. If we
> posted the list to a page and put a call out on social media and email
> I bet we’d get some answers.
>
>> On Aug 8, 2019, at 5:26 PM, Stephen DeSanto <rachidian at gmail.com
>> <mailto:rachidian at gmail.com>> wrote:
>>
>> Hi everyone,
>>
>> I've taken a first pass at the data schema for showing employer
>> transit benefits in our upcoming web app. In this draft, each
>> employer record is represented as follows:
>>
>> {
>> "employer": string,
>> "industry": [string],
>> "neighborhood": [string],
>> "alias": [string],
>> "rating": int,
>> "description": string
>> "badges": [string]
>> }
>>
>> *Employer* is a plain text string.
>> *Industry* is a list of strings (or a single string, if we want to
>> limit one employer = one industry).
>> *Neighborhood* is treated similarly to industry
>> *Alias* is a list of other names for the same company. For example,
>> *Rating* is a numerical scale that represents the "worker's monthly
>> cost of an unlimited transit pass". The scale provided during the
>> meeting went from "4 leaves" to "brown tortoise"; aligning to the
>> leaves, that gives us a scale of [-1, 0, 1, 2, 3, 4]. We could adjust
>> this up to 0-5, or lump "piggy bank" and "brown tortoise" in the same
>> rating.
>> *Description* is a string that describes the employer's transit
>> benefits, i.e. why they got the rating they did.
>> *Badges* is a list of strings that represent any additional
>> categories we want to assign to a company (e.g. "industry leader",
>> "polluter").
>>
>> We can make changes to this schema if it makes it easier to work with
>> our underlying data visualization platform (e.g. Tableau?
>> DataTables?), but hopefully this is a suitable starting place.
>>
>> As an example, take a hypothetical record for Seattle Coffee Works.
>>
>> {
>> "employer": "Seattle Coffee Works",
>> "industry": ["restaurant"],
>> "neighborhood": ["cbd", "ballard"],
>> "alias": ["Ballard Coffee Works"],
>> "rating": 4,
>> "description": "Provides 100% ORCA Passport subsidy."
>> "badges": ["leader"]
>> }
>>
>> *Where Our Data Lives (For Now)*
>>
>> I've also taken a rough chop at getting started with the data. Here,
>> I've just taken the raw list of ORCA Business Passport employers and
>> assigned a score based on their subsidy percentage, as an example:
>>
>> https://docs.google.com/spreadsheets/d/1HmOcG7hJLD1G0unCMPcsDnXr4RIA_PMKEE5ne-hhQR8/edit?usp=sharing
>>
>> The spreadsheet contains columns for each item of the employer
>> record, as well as some additional columns to record the raw data we
>> have on file for that employer, so we can use that data to
>> automatically or manually determine an employer's rating.
>>
>> If we have data from other sources not listed (e.g. survey data, City
>> of Seattle data), the "source_" columns can be renamed or added to
>> represent that source's data. For example, if I want to add data from
>> the TRU survey, I might rename "__source_b" to "__TRU Survey", then
>> include results from that survey in that column for each company.
>> (The columns beginning with two underscores are ones I don't expect
>> to be publicly available.)
>>
>> PBworks feels really inadequate for editing large data sets, and I
>> don't know where else to put it, so it's living in Google Sheets for
>> now. Set to read-only with the link, for now, but please request
>> editing permissions so you can add stuff to the sheet.
>>
>> Currently, my expectation is that the spreadsheet will be hand-edited
>> in Google Sheets, and then when we're ready to put live data in the
>> web app, we can export the sheet to a flat file, which we can then
>> import into a format appropriate for the website (big ol' JSON file,
>> database, whatever). Manual process, but probably fine for a project
>> of this scale; I'm open to alternatives.
>>
>> *Things To Do Next*
>>
>> Aside from the ORCA Passport data and the data we collected through
>> TRU survey / legwork (on PBworks), do we have any other data sources
>> that would provide context for a score?
>>
>> For the data sources we have, we'll have to start filling out the
>> rest of the spreadsheet, I guess?
>>
>> Also, we will need to determine:
>> a) master list of "industries" we want to support, and
>> b) "industry" field(s) for each employer
>> c) "neighborhood" field(s) for each employer we don't have one for
>> (or being more precise than what I have now)
>> d) which companies get tagged with which badges
>>
>> Hope that helps.
>>
>> In solidarity,
>>
>> Stephen
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.transitriders.org/pipermail/research/attachments/20190811/6af0d319/attachment.html>
More information about the Research
mailing list