[TRU Research] Web App Data Schema

Thu Sep 5 12:11:37 PDT 2019

I'm also past a busy time at work now and could put in some time on this
over the weekend. Katie, let me know if there's some chunk it would be easy
to delegate, like getting some additional set into the spreadsheet.
-Tom

On Thu, Sep 5, 2019 at 12:07 PM Katie Wilson <katie at transitriders.org>
wrote:

> I think we’re in pretty good shape— Jim has been mocking up some visuals
> and we’ll want to add a numerical “monthly cost” column (instead of or in
> addition to the rating) but I think I can do that.
>
> Thanks for all your work on this, and hope you’re doing ok Stephen!
>
> On Sep 5, 2019, at 11:42 AM, Stephen DeSanto <rachidian at gmail.com> wrote:
>
> Hi friends, happy Thursday. I am dealing with some personal issues that's
> preventing me from dedicating a lot of time and energy to this project
> right now. I hope you all can get this over the finish line.
>
> In solidarity,
>
> Stephen
>
>
> On Tue, Aug 27, 2019 at 4:23 PM Katie Wilson <katie at transitriders.org>
> wrote:
>
>> Great! I will head over there at 5:30. I’ll grab the conference room if
>> it’s free, otherwise I’ll just find us a table.
>>
>> On Aug 27, 2019, at 12:18 PM, Jim Walseth <jim.walseth at gmail.com> wrote:
>>
>> I should be able to attend as well. Cheers, -Jim
>>
>> On Mon, Aug 26, 2019 at 3:32 PM Stephen DeSanto <rachidian at gmail.com>
>> wrote:
>>
>>> Victrola is fine! 5:30 works for me. Gives me time to get there after
>>> work.
>>>
>>> On Mon, Aug 26, 2019 at 3:09 PM Katie Wilson <katie at transitriders.org>
>>> wrote:
>>>
>>>> Cool, does Victrola on 15th work again, or is somewhere else better?
>>>> Having an actual meeting, even if a small one, will help me to sit down and
>>>> focus on this instead of getting distracted by other things. 5pm good, or
>>>> 5:30 or 6?
>>>>
>>>> On Aug 26, 2019, at 9:38 AM, Stephen DeSanto <rachidian at gmail.com>
>>>> wrote:
>>>>
>>>> I'm also available Tues evening. Otherwise I'll keep making updates
>>>> from home. :)
>>>>
>>>> On Sun, Aug 25, 2019 at 4:43 PM Katie Wilson <katie at transitriders.org>
>>>> wrote:
>>>>
>>>>> This is awesome, thank you Stephen. I put some thoughts in-line in
>>>>> red below, and attached a hard-to-interpret spreadsheet with info
>>>>> about Business Choice participants.
>>>>>
>>>>> I will try to schedule some time this week to start completing rows
>>>>> for the businesses I’m sure about based on the info we have. If anyone
>>>>> wants to get together for a spreadsheet workparty this week let me know, I
>>>>> have time Tuesday and Friday evenings.
>>>>>
>>>>> Katie
>>>>>
>>>>> On Aug 25, 2019, at 3:37 PM, Stephen DeSanto <rachidian at gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi everyone. Made a few updates to the master list of employers
>>>>> <https://docs.google.com/spreadsheets/d/1HmOcG7hJLD1G0unCMPcsDnXr4RIA_PMKEE5ne-hhQR8/edit?usp=sharing>
>>>>> for the upcoming website:
>>>>>
>>>>>    - Added companies and transit benefits raw descriptions from TRU
>>>>>    survey data
>>>>>    - Added "Likely CTR Targets":
>>>>>    https://seattletransitpasses-research.pbworks.com/w/page/133438365/Likely%20Target%20Assessment
>>>>>    - Added "Potential CTR Targets":
>>>>>    https://seattletransitpasses-research.pbworks.com/w/page/133437828/Potential%20CTR%20Targets
>>>>>    - "Likely" / high-profile targets (hotels, banks) are highlighted
>>>>>    in ORANGE
>>>>>    - Added "Potential Poster Children":
>>>>>    https://seattletransitpasses-research.pbworks.com/w/page/133439169/Potential%20Poster%20Children
>>>>>    - Poster children are highlighted in GREEN
>>>>>
>>>>> Highlight colors are just to make it easier to find rows that a)
>>>>> someone said should be included in the list, and b) probably needs benefits
>>>>> data incorporated
>>>>>
>>>>> Things to be done:
>>>>>
>>>>>    - Normalize locations data For this and the “leaf scores” and the
>>>>>    “polluter” columns, I’d be inclined to do this after we’ve got
>>>>>    enough info to check the “publish” box. I could be wrong, but I feel like
>>>>>    it will be less work that way.
>>>>>    - Assign "leaf scores" to all companies that don't have one
>>>>>    - Assign "polluter" etc badges to companies we want to name&shame
>>>>>    - Add all of the hotels?
>>>>>    https://seattletransitpasses-research.pbworks.com/w/page/133666440/Hotels
>>>>>     Yeah let’s go ahead and add them...
>>>>>    - Add Choice participants? Good question. I did get info back from
>>>>>    Metro on what products the choice participants are buying, I can’t remember
>>>>>    whether I shared that with you all. Anyway, it’s attached. It’s actually a
>>>>>    little hard to interpret (I got a tutorial from a Metro staffer) so I can
>>>>>    try to explain by phone or in person if someone wants to dig throughthat.
>>>>>    Maybe it makes sense to look through that info and add businesses
>>>>>    selectively as we feel like we have a grasp on their programs.
>>>>>    - Add column for Commute Seattle participants?
>>>>>    https://seattletransitpasses-research.pbworks.com/w/page/133438167/Commute%20Seattle%20List%20of%20Passport%20Participants I
>>>>>    don’t think we need to do this, because this info is most likely
>>>>>    duplicative with what we learned from Metro about passport participants.
>>>>>    The Commute Seattle list doesn’t tell us how much of a subsidy they
>>>>>    provide, so it’s not going to add much.
>>>>>
>>>>>
>>>>>    - Citations, descriptions of benefits, etc. for companies that
>>>>>    need it
>>>>>
>>>>> There's still a lot we're not 100% sure about for employer benefits,
>>>>> but we can do the best we have with what we've got, and make changes as we
>>>>> get new information.
>>>>>
>>>>> My thinking was, we can use the "master employer list" to get as much
>>>>> information about the companies we're interested in. When we're satisfied
>>>>> that a row is finished and ready for publication, check the checkbox in the
>>>>> "__publish" column. Then, when we export this data to the website, we can
>>>>> only get the rows where "__publish" is checked. This hopefully will ensure
>>>>> that someone manually reviewed and verified all the data for an employer
>>>>> before it gets published, and that unfinished rows won't be accidentally
>>>>> exported.
>>>>>
>>>>> Is this helpful? Am I just spinning my wheels in the mud?
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Aug 22, 2019 at 7:06 PM Stephen DeSanto <rachidian at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> FYI, added most of you as editors on the spreadsheet I'm working on,
>>>>>> in case anyone has time for tedious data tasks (or a quick way to do
>>>>>> tedious data tasks). I'm currently adding in data from the TRU survey, from
>>>>>> respondents whose employers offer transit benefits. Eventually, we'll need
>>>>>> these tagged with industry and fix the neighborhoods data? And add in any
>>>>>> other company data we have from the other research spreadsheets on the
>>>>>> wiki? And eventually some subset of this data ends up on the website?
>>>>>>
>>>>>> On Tue, Aug 20, 2019 at 3:03 PM Tom Chartrand <tmchartrand at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Oh you're right, sorry for the confusion everyone! was just fairly
>>>>>>> hidden in the view i looked at. Column S!
>>>>>>>
>>>>>>> On Tue, Aug 20, 2019 at 3:00 PM Katie Wilson <
>>>>>>> katie at transitriders.org> wrote:
>>>>>>>
>>>>>>>> I think the spreadsheet with PII removed still does include the
>>>>>>>> Employer column, no?
>>>>>>>>
>>>>>>>> Sorry I’m being slow to respond to all this good stuff, I am still
>>>>>>>> digging myself out from being away last week and I’m at an all-day thing
>>>>>>>> today… but I should have time to pay more attention before the end of the
>>>>>>>> week!
>>>>>>>>
>>>>>>>> On Aug 19, 2019, at 6:26 PM, Stephen DeSanto <rachidian at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I have time to go through the survey data and find the reported
>>>>>>>> transit benefits per employer, though I'll need the data set that contains
>>>>>>>> that data. :)
>>>>>>>>
>>>>>>>> Otherwise, I am going to be trying to match CTR neighborhoods to
>>>>>>>> the employers already in our spreadsheet, as well as adding any employers
>>>>>>>> mentioned in other sources/sheets on our wiki.
>>>>>>>>
>>>>>>>> On Mon, Aug 19, 2019 at 6:02 PM Tom Chartrand <
>>>>>>>> tmchartrand at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> This is looking great, Stephen!
>>>>>>>>> I had put myself down to organize the survey data with respect to
>>>>>>>>> employers for this, but I just realized that info was removed as PII (of
>>>>>>>>> course)! So either Mike will need to take that on (I think Mike did the
>>>>>>>>> original PII removal) or we'll need to figure out an appropriate way of
>>>>>>>>> sharing that.
>>>>>>>>> I'm feeling pretty swamped myself lately, so if you (Stephen) were
>>>>>>>>> down to help him with the task that could be great. I can certainly still
>>>>>>>>> take on some of it if needed though, once we get this sorted out.
>>>>>>>>> Katie, maybe you could help coordinate this and make sure Mike
>>>>>>>>> sees this sooner rather than later?
>>>>>>>>>
>>>>>>>>> Also, do let me know if you have any more specific spots in the
>>>>>>>>> report where some backup from the PSRC dataset could be useful!
>>>>>>>>>
>>>>>>>>> On Sun, Aug 18, 2019 at 3:59 PM Stephen DeSanto <
>>>>>>>>> rachidian at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I've added the list of industry categories to the Google Sheet,
>>>>>>>>>> so that should help validate the data we add there, though it's going to
>>>>>>>>>> likely be a manual task to fill in industries for all the employers.
>>>>>>>>>>
>>>>>>>>>> I've also added a "citation" column, which can be the public
>>>>>>>>>> representation of where we got the data to make our claim. We can fuss with
>>>>>>>>>> the wording later.
>>>>>>>>>>
>>>>>>>>>> I should have time this week to go through our survey data and
>>>>>>>>>> other wiki tables to add or modify employers in the Google Sheet. Agree
>>>>>>>>>> that it'll be good to have solid information on our primary targets and
>>>>>>>>>> champions.
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 13, 2019 at 10:48 PM Harry Maher <
>>>>>>>>>> harryb.maher at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Just a quick update with regard to qualitative data analysis: I
>>>>>>>>>>> made a "Commute Survey Qualitative Data Analysis" folder on pbworks and put
>>>>>>>>>>> a doc with some quotes in it for the report. I tried to pull out the main
>>>>>>>>>>> relevant themes that I noticed discussed in the two qualitative questions
>>>>>>>>>>> currently in the file with a couple of quote options for each
>>>>>>>>>>> theme/category of response to the question.
>>>>>>>>>>>
>>>>>>>>>>> -Harry
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Aug 11, 2019 at 12:54 PM Tom Chartrand <
>>>>>>>>>>> tmchartrand at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Regarding where to have this discussion - I'm just gonna
>>>>>>>>>>>> continue the email chain cause I haven't followed where to put the
>>>>>>>>>>>> discussion on the wiki, but someone feel free to steer it over there if we
>>>>>>>>>>>> want to!
>>>>>>>>>>>>
>>>>>>>>>>>> A brief update regarding establishing a larger list of
>>>>>>>>>>>> employers to include in the dataset: basic contact information for all
>>>>>>>>>>>> seattle businesses, sorted by the North American Industry Classification
>>>>>>>>>>>> System, is available at
>>>>>>>>>>>> https://web6.seattle.gov/fas/slimbizsearch/ResultsPage.aspx?NAICList=Top100,
>>>>>>>>>>>> but it's a huge list of course, with no info on number of employees or
>>>>>>>>>>>> revenue to filter out the smaller ones. Still, I did send off an email
>>>>>>>>>>>> about getting a copy of the database just for purposes of cross-referencing
>>>>>>>>>>>> names and such.
>>>>>>>>>>>> On 8/10/19 6:42 PM, Katie Wilson wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> For “neighborhood” I think it makes sense to use the “CTR
>>>>>>>>>>>> Network Areas” as defined here
>>>>>>>>>>>> <https://www.seattle.gov/transportation/projects-and-programs/programs/transportation-options-program/commute-trip-reduction-program/draft-2019-2023-networks-and-targets>
>>>>>>>>>>>> .
>>>>>>>>>>>>
>>>>>>>>>>>> For “industry” I think it makes sense to use the “Employment
>>>>>>>>>>>> Sector” categories listed on Page 12 of this CTR strategic plan.
>>>>>>>>>>>> <https://www.seattle.gov/Documents/Departments/SDOT/TransportationOptionsProgram/CTR_Draft_Strategic_Plan_Jan2019.pdf>
>>>>>>>>>>>>
>>>>>>>>>>>> On the ratings, I think it does make sense to lump "piggy bank"
>>>>>>>>>>>> and "brown tortoise" in the same rating (0), and then add a tortoise badge
>>>>>>>>>>>> for employers that aren’t even doing the pre-tax thing.
>>>>>>>>>>>>
>>>>>>>>>>>> Another simplification option to consider would be to lump
>>>>>>>>>>>> together 3 and 4 leaves. But let’s leave them separate for now and
>>>>>>>>>>>> depending on how things shake out we can easily combine them later.
>>>>>>>>>>>>
>>>>>>>>>>>> We don’t have any major sources of data on what benefits
>>>>>>>>>>>> employers provide other than:
>>>>>>>>>>>> — Metro public disclosure request spreadsheet
>>>>>>>>>>>> <https://seattletransitpasses-research.pbworks.com/w/page/133438080/First%20Public%20Records%20Request>
>>>>>>>>>>>> — Our commute survey
>>>>>>>>>>>> — Info gleaned online from company websites, asking around,
>>>>>>>>>>>> glassdoor etc (what I’ve found I’ve added to the relevant
>>>>>>>>>>>> tables in the wiki
>>>>>>>>>>>> <https://seattletransitpasses-research.pbworks.com/w/page/132177123/Employers>,
>>>>>>>>>>>> on CTR employers and “potential poster children” and “likely target
>>>>>>>>>>>> assessment” and “hotels”)
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe it makes sense to have another string indicating
>>>>>>>>>>>> sufficient certainty — when we have two sources, or one very reliable
>>>>>>>>>>>> source, we enter an X or whatever, and that gives us the green light to
>>>>>>>>>>>> display that data. Also it may not make sense to put a lot of work into
>>>>>>>>>>>> categorizing employers into Network Area and Employment Sector until we
>>>>>>>>>>>> have reliable data on what benefits they’re offering.
>>>>>>>>>>>>
>>>>>>>>>>>> Speaking of Seattle Coffee Works, I spoke with their HR person
>>>>>>>>>>>> a few months ago and actually employees have to pay $20/month (pre-tax $)
>>>>>>>>>>>> if they want an ORCA card. Still a great deal but not 100% subsidy as
>>>>>>>>>>>> reported in the Metro data— which, I then learned, is self-reported by the
>>>>>>>>>>>> company. Metro only knows that all those companies are signed up for the
>>>>>>>>>>>> Passport program. I noted the real situation on this page
>>>>>>>>>>>> <https://seattletransitpasses-research.pbworks.com/w/page/133439169/Potential%20Poster%20Children>.
>>>>>>>>>>>> Anyway, the point is we should probably crosscheck the Metro data as much
>>>>>>>>>>>> as we can with our survey or other sources of information.
>>>>>>>>>>>>
>>>>>>>>>>>> (Also speaking of Seattle Coffee Works they have locations in Capitol
>>>>>>>>>>>> Hill & Cascade too
>>>>>>>>>>>> <https://www.seattlecoffeeworks.com/our-cafes.aspx>. From
>>>>>>>>>>>> talking with the HR person I’m pretty sure all are include in their
>>>>>>>>>>>> passport program, and the employees swap around a lot from location to
>>>>>>>>>>>> location. They probably use the Ballard location as home base for transit
>>>>>>>>>>>> pass purposes since that’s the least expensive zone.)
>>>>>>>>>>>>
>>>>>>>>>>>> One project would be to come up with a list of employers that
>>>>>>>>>>>> have name recognition (or that we are interested in for some other reason)
>>>>>>>>>>>> and put a little work into attaining sufficient certainty. If we posted the
>>>>>>>>>>>> list to a page and put a call out on social media and email I bet we’d get
>>>>>>>>>>>> some answers.
>>>>>>>>>>>>
>>>>>>>>>>>> On Aug 8, 2019, at 5:26 PM, Stephen DeSanto <
>>>>>>>>>>>> rachidian at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>
>>>>>>>>>>>> I've taken a first pass at the data schema for showing employer
>>>>>>>>>>>> transit benefits in our upcoming web app. In this draft, each employer
>>>>>>>>>>>> record is represented as follows:
>>>>>>>>>>>>
>>>>>>>>>>>> {
>>>>>>>>>>>>     "employer": string,
>>>>>>>>>>>>     "industry": [string],
>>>>>>>>>>>>     "neighborhood": [string],
>>>>>>>>>>>>     "alias": [string],
>>>>>>>>>>>>     "rating": int,
>>>>>>>>>>>>     "description": string
>>>>>>>>>>>>     "badges": [string]
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> *Employer* is a plain text string.
>>>>>>>>>>>> *Industry* is a list of strings (or a single string, if we
>>>>>>>>>>>> want to limit one employer = one industry).
>>>>>>>>>>>> *Neighborhood* is treated similarly to industry
>>>>>>>>>>>> *Alias* is a list of other names for the same company. For
>>>>>>>>>>>> example,
>>>>>>>>>>>> *Rating* is a numerical scale that represents the "worker's
>>>>>>>>>>>> monthly cost of an unlimited transit pass". The scale provided during the
>>>>>>>>>>>> meeting went from "4 leaves" to "brown tortoise"; aligning to the leaves,
>>>>>>>>>>>> that gives us a scale of [-1, 0, 1, 2, 3, 4]. We could adjust this up to
>>>>>>>>>>>> 0-5, or lump "piggy bank" and "brown tortoise" in the same rating.
>>>>>>>>>>>> *Description* is a string that describes the employer's
>>>>>>>>>>>> transit benefits, i.e. why they got the rating they did.
>>>>>>>>>>>> *Badges* is a list of strings that represent any additional
>>>>>>>>>>>> categories we want to assign to a company (e.g. "industry leader",
>>>>>>>>>>>> "polluter").
>>>>>>>>>>>>
>>>>>>>>>>>> We can make changes to this schema if it makes it easier to
>>>>>>>>>>>> work with our underlying data visualization platform (e.g. Tableau?
>>>>>>>>>>>> DataTables?), but hopefully this is a suitable starting place.
>>>>>>>>>>>>
>>>>>>>>>>>> As an example, take a hypothetical record for Seattle Coffee
>>>>>>>>>>>> Works.
>>>>>>>>>>>>
>>>>>>>>>>>> {
>>>>>>>>>>>>     "employer": "Seattle Coffee Works",
>>>>>>>>>>>>     "industry": ["restaurant"],
>>>>>>>>>>>>     "neighborhood": ["cbd", "ballard"],
>>>>>>>>>>>>     "alias": ["Ballard Coffee Works"],
>>>>>>>>>>>>     "rating": 4,
>>>>>>>>>>>>     "description": "Provides 100% ORCA Passport subsidy."
>>>>>>>>>>>>     "badges": ["leader"]
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> *Where Our Data Lives (For Now)*
>>>>>>>>>>>>
>>>>>>>>>>>> I've also taken a rough chop at getting started with the data.
>>>>>>>>>>>> Here, I've just taken the raw list of ORCA Business Passport employers and
>>>>>>>>>>>> assigned a score based on their subsidy percentage, as an example:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> https://docs.google.com/spreadsheets/d/1HmOcG7hJLD1G0unCMPcsDnXr4RIA_PMKEE5ne-hhQR8/edit?usp=sharing
>>>>>>>>>>>>
>>>>>>>>>>>> The spreadsheet contains columns for each item of the employer
>>>>>>>>>>>> record, as well as some additional columns to record the raw data we have
>>>>>>>>>>>> on file for that employer, so we can use that data to automatically or
>>>>>>>>>>>> manually determine an employer's rating.
>>>>>>>>>>>>
>>>>>>>>>>>> If we have data from other sources not listed (e.g. survey
>>>>>>>>>>>> data, City of Seattle data), the "source_" columns can be renamed or added
>>>>>>>>>>>> to represent that source's data. For example, if I want to add data from
>>>>>>>>>>>> the TRU survey, I might rename "__source_b" to "__TRU Survey", then include
>>>>>>>>>>>> results from that survey in that column for each company. (The columns
>>>>>>>>>>>> beginning with two underscores are ones I don't expect to be publicly
>>>>>>>>>>>> available.)
>>>>>>>>>>>>
>>>>>>>>>>>> PBworks feels really inadequate for editing large data sets,
>>>>>>>>>>>> and I don't know where else to put it, so it's living in Google Sheets for
>>>>>>>>>>>> now. Set to read-only with the link, for now, but please request editing
>>>>>>>>>>>> permissions so you can add stuff to the sheet.
>>>>>>>>>>>>
>>>>>>>>>>>> Currently, my expectation is that the spreadsheet will be
>>>>>>>>>>>> hand-edited in Google Sheets, and then when we're ready to put live data in
>>>>>>>>>>>> the web app, we can export the sheet to a flat file, which we can then
>>>>>>>>>>>> import into a format appropriate for the website (big ol' JSON file,
>>>>>>>>>>>> database, whatever). Manual process, but probably fine for a project of
>>>>>>>>>>>> this scale; I'm open to alternatives.
>>>>>>>>>>>>
>>>>>>>>>>>> *Things To Do Next*
>>>>>>>>>>>>
>>>>>>>>>>>> Aside from the ORCA Passport data and the data we collected
>>>>>>>>>>>> through TRU survey / legwork (on PBworks), do we have any other data
>>>>>>>>>>>> sources that would provide context for a score?
>>>>>>>>>>>>
>>>>>>>>>>>> For the data sources we have, we'll have to start filling out
>>>>>>>>>>>> the rest of the spreadsheet, I guess?
>>>>>>>>>>>>
>>>>>>>>>>>> Also, we will need to determine:
>>>>>>>>>>>> a) master list of "industries" we want to support, and
>>>>>>>>>>>> b) "industry" field(s) for each employer
>>>>>>>>>>>> c) "neighborhood" field(s) for each employer we don't have one
>>>>>>>>>>>> for (or being more precise than what I have now)
>>>>>>>>>>>> d) which companies get tagged with which badges
>>>>>>>>>>>>
>>>>>>>>>>>> Hope that helps.
>>>>>>>>>>>>
>>>>>>>>>>>> In solidarity,
>>>>>>>>>>>>
>>>>>>>>>>>> Stephen
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.transitriders.org/pipermail/research/attachments/20190905/36a86f5b/attachment.html>