[TRU Research] Web App Data Schema
Stephen DeSanto
rachidian at gmail.com
Thu Sep 5 11:42:00 PDT 2019
Hi friends, happy Thursday. I am dealing with some personal issues that's
preventing me from dedicating a lot of time and energy to this project
right now. I hope you all can get this over the finish line.
In solidarity,
Stephen
On Tue, Aug 27, 2019 at 4:23 PM Katie Wilson <katie at transitriders.org>
wrote:
> Great! I will head over there at 5:30. I’ll grab the conference room if
> it’s free, otherwise I’ll just find us a table.
>
> On Aug 27, 2019, at 12:18 PM, Jim Walseth <jim.walseth at gmail.com> wrote:
>
> I should be able to attend as well. Cheers, -Jim
>
> On Mon, Aug 26, 2019 at 3:32 PM Stephen DeSanto <rachidian at gmail.com>
> wrote:
>
>> Victrola is fine! 5:30 works for me. Gives me time to get there after
>> work.
>>
>> On Mon, Aug 26, 2019 at 3:09 PM Katie Wilson <katie at transitriders.org>
>> wrote:
>>
>>> Cool, does Victrola on 15th work again, or is somewhere else better?
>>> Having an actual meeting, even if a small one, will help me to sit down and
>>> focus on this instead of getting distracted by other things. 5pm good, or
>>> 5:30 or 6?
>>>
>>> On Aug 26, 2019, at 9:38 AM, Stephen DeSanto <rachidian at gmail.com>
>>> wrote:
>>>
>>> I'm also available Tues evening. Otherwise I'll keep making updates from
>>> home. :)
>>>
>>> On Sun, Aug 25, 2019 at 4:43 PM Katie Wilson <katie at transitriders.org>
>>> wrote:
>>>
>>>> This is awesome, thank you Stephen. I put some thoughts in-line in red below,
>>>> and attached a hard-to-interpret spreadsheet with info about Business
>>>> Choice participants.
>>>>
>>>> I will try to schedule some time this week to start completing rows for
>>>> the businesses I’m sure about based on the info we have. If anyone wants to
>>>> get together for a spreadsheet workparty this week let me know, I have time
>>>> Tuesday and Friday evenings.
>>>>
>>>> Katie
>>>>
>>>> On Aug 25, 2019, at 3:37 PM, Stephen DeSanto <rachidian at gmail.com>
>>>> wrote:
>>>>
>>>> Hi everyone. Made a few updates to the master list of employers
>>>> <https://docs.google.com/spreadsheets/d/1HmOcG7hJLD1G0unCMPcsDnXr4RIA_PMKEE5ne-hhQR8/edit?usp=sharing>
>>>> for the upcoming website:
>>>>
>>>> - Added companies and transit benefits raw descriptions from TRU
>>>> survey data
>>>> - Added "Likely CTR Targets":
>>>> https://seattletransitpasses-research.pbworks.com/w/page/133438365/Likely%20Target%20Assessment
>>>> - Added "Potential CTR Targets":
>>>> https://seattletransitpasses-research.pbworks.com/w/page/133437828/Potential%20CTR%20Targets
>>>> - "Likely" / high-profile targets (hotels, banks) are highlighted
>>>> in ORANGE
>>>> - Added "Potential Poster Children":
>>>> https://seattletransitpasses-research.pbworks.com/w/page/133439169/Potential%20Poster%20Children
>>>> - Poster children are highlighted in GREEN
>>>>
>>>> Highlight colors are just to make it easier to find rows that a)
>>>> someone said should be included in the list, and b) probably needs benefits
>>>> data incorporated
>>>>
>>>> Things to be done:
>>>>
>>>> - Normalize locations data For this and the “leaf scores” and the
>>>> “polluter” columns, I’d be inclined to do this after we’ve got
>>>> enough info to check the “publish” box. I could be wrong, but I feel like
>>>> it will be less work that way.
>>>> - Assign "leaf scores" to all companies that don't have one
>>>> - Assign "polluter" etc badges to companies we want to name&shame
>>>> - Add all of the hotels?
>>>> https://seattletransitpasses-research.pbworks.com/w/page/133666440/Hotels
>>>> Yeah let’s go ahead and add them...
>>>> - Add Choice participants? Good question. I did get info back from
>>>> Metro on what products the choice participants are buying, I can’t remember
>>>> whether I shared that with you all. Anyway, it’s attached. It’s actually a
>>>> little hard to interpret (I got a tutorial from a Metro staffer) so I can
>>>> try to explain by phone or in person if someone wants to dig throughthat.
>>>> Maybe it makes sense to look through that info and add businesses
>>>> selectively as we feel like we have a grasp on their programs.
>>>> - Add column for Commute Seattle participants?
>>>> https://seattletransitpasses-research.pbworks.com/w/page/133438167/Commute%20Seattle%20List%20of%20Passport%20Participants I
>>>> don’t think we need to do this, because this info is most likely
>>>> duplicative with what we learned from Metro about passport participants.
>>>> The Commute Seattle list doesn’t tell us how much of a subsidy they
>>>> provide, so it’s not going to add much.
>>>>
>>>>
>>>> - Citations, descriptions of benefits, etc. for companies that need
>>>> it
>>>>
>>>> There's still a lot we're not 100% sure about for employer benefits,
>>>> but we can do the best we have with what we've got, and make changes as we
>>>> get new information.
>>>>
>>>> My thinking was, we can use the "master employer list" to get as much
>>>> information about the companies we're interested in. When we're satisfied
>>>> that a row is finished and ready for publication, check the checkbox in the
>>>> "__publish" column. Then, when we export this data to the website, we can
>>>> only get the rows where "__publish" is checked. This hopefully will ensure
>>>> that someone manually reviewed and verified all the data for an employer
>>>> before it gets published, and that unfinished rows won't be accidentally
>>>> exported.
>>>>
>>>> Is this helpful? Am I just spinning my wheels in the mud?
>>>>
>>>>
>>>>
>>>> On Thu, Aug 22, 2019 at 7:06 PM Stephen DeSanto <rachidian at gmail.com>
>>>> wrote:
>>>>
>>>>> FYI, added most of you as editors on the spreadsheet I'm working on,
>>>>> in case anyone has time for tedious data tasks (or a quick way to do
>>>>> tedious data tasks). I'm currently adding in data from the TRU survey, from
>>>>> respondents whose employers offer transit benefits. Eventually, we'll need
>>>>> these tagged with industry and fix the neighborhoods data? And add in any
>>>>> other company data we have from the other research spreadsheets on the
>>>>> wiki? And eventually some subset of this data ends up on the website?
>>>>>
>>>>> On Tue, Aug 20, 2019 at 3:03 PM Tom Chartrand <tmchartrand at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Oh you're right, sorry for the confusion everyone! was just fairly
>>>>>> hidden in the view i looked at. Column S!
>>>>>>
>>>>>> On Tue, Aug 20, 2019 at 3:00 PM Katie Wilson <katie at transitriders.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I think the spreadsheet with PII removed still does include the
>>>>>>> Employer column, no?
>>>>>>>
>>>>>>> Sorry I’m being slow to respond to all this good stuff, I am still
>>>>>>> digging myself out from being away last week and I’m at an all-day thing
>>>>>>> today… but I should have time to pay more attention before the end of the
>>>>>>> week!
>>>>>>>
>>>>>>> On Aug 19, 2019, at 6:26 PM, Stephen DeSanto <rachidian at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> I have time to go through the survey data and find the reported
>>>>>>> transit benefits per employer, though I'll need the data set that contains
>>>>>>> that data. :)
>>>>>>>
>>>>>>> Otherwise, I am going to be trying to match CTR neighborhoods to the
>>>>>>> employers already in our spreadsheet, as well as adding any employers
>>>>>>> mentioned in other sources/sheets on our wiki.
>>>>>>>
>>>>>>> On Mon, Aug 19, 2019 at 6:02 PM Tom Chartrand <tmchartrand at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> This is looking great, Stephen!
>>>>>>>> I had put myself down to organize the survey data with respect to
>>>>>>>> employers for this, but I just realized that info was removed as PII (of
>>>>>>>> course)! So either Mike will need to take that on (I think Mike did the
>>>>>>>> original PII removal) or we'll need to figure out an appropriate way of
>>>>>>>> sharing that.
>>>>>>>> I'm feeling pretty swamped myself lately, so if you (Stephen) were
>>>>>>>> down to help him with the task that could be great. I can certainly still
>>>>>>>> take on some of it if needed though, once we get this sorted out.
>>>>>>>> Katie, maybe you could help coordinate this and make sure Mike sees
>>>>>>>> this sooner rather than later?
>>>>>>>>
>>>>>>>> Also, do let me know if you have any more specific spots in the
>>>>>>>> report where some backup from the PSRC dataset could be useful!
>>>>>>>>
>>>>>>>> On Sun, Aug 18, 2019 at 3:59 PM Stephen DeSanto <
>>>>>>>> rachidian at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I've added the list of industry categories to the Google Sheet, so
>>>>>>>>> that should help validate the data we add there, though it's going to
>>>>>>>>> likely be a manual task to fill in industries for all the employers.
>>>>>>>>>
>>>>>>>>> I've also added a "citation" column, which can be the public
>>>>>>>>> representation of where we got the data to make our claim. We can fuss with
>>>>>>>>> the wording later.
>>>>>>>>>
>>>>>>>>> I should have time this week to go through our survey data and
>>>>>>>>> other wiki tables to add or modify employers in the Google Sheet. Agree
>>>>>>>>> that it'll be good to have solid information on our primary targets and
>>>>>>>>> champions.
>>>>>>>>>
>>>>>>>>> On Tue, Aug 13, 2019 at 10:48 PM Harry Maher <
>>>>>>>>> harryb.maher at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Just a quick update with regard to qualitative data analysis: I
>>>>>>>>>> made a "Commute Survey Qualitative Data Analysis" folder on pbworks and put
>>>>>>>>>> a doc with some quotes in it for the report. I tried to pull out the main
>>>>>>>>>> relevant themes that I noticed discussed in the two qualitative questions
>>>>>>>>>> currently in the file with a couple of quote options for each
>>>>>>>>>> theme/category of response to the question.
>>>>>>>>>>
>>>>>>>>>> -Harry
>>>>>>>>>>
>>>>>>>>>> On Sun, Aug 11, 2019 at 12:54 PM Tom Chartrand <
>>>>>>>>>> tmchartrand at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Regarding where to have this discussion - I'm just gonna
>>>>>>>>>>> continue the email chain cause I haven't followed where to put the
>>>>>>>>>>> discussion on the wiki, but someone feel free to steer it over there if we
>>>>>>>>>>> want to!
>>>>>>>>>>>
>>>>>>>>>>> A brief update regarding establishing a larger list of employers
>>>>>>>>>>> to include in the dataset: basic contact information for all seattle
>>>>>>>>>>> businesses, sorted by the North American Industry Classification System, is
>>>>>>>>>>> available at
>>>>>>>>>>> https://web6.seattle.gov/fas/slimbizsearch/ResultsPage.aspx?NAICList=Top100,
>>>>>>>>>>> but it's a huge list of course, with no info on number of employees or
>>>>>>>>>>> revenue to filter out the smaller ones. Still, I did send off an email
>>>>>>>>>>> about getting a copy of the database just for purposes of cross-referencing
>>>>>>>>>>> names and such.
>>>>>>>>>>> On 8/10/19 6:42 PM, Katie Wilson wrote:
>>>>>>>>>>>
>>>>>>>>>>> For “neighborhood” I think it makes sense to use the “CTR
>>>>>>>>>>> Network Areas” as defined here
>>>>>>>>>>> <https://www.seattle.gov/transportation/projects-and-programs/programs/transportation-options-program/commute-trip-reduction-program/draft-2019-2023-networks-and-targets>
>>>>>>>>>>> .
>>>>>>>>>>>
>>>>>>>>>>> For “industry” I think it makes sense to use the “Employment
>>>>>>>>>>> Sector” categories listed on Page 12 of this CTR strategic plan.
>>>>>>>>>>> <https://www.seattle.gov/Documents/Departments/SDOT/TransportationOptionsProgram/CTR_Draft_Strategic_Plan_Jan2019.pdf>
>>>>>>>>>>>
>>>>>>>>>>> On the ratings, I think it does make sense to lump "piggy bank"
>>>>>>>>>>> and "brown tortoise" in the same rating (0), and then add a tortoise badge
>>>>>>>>>>> for employers that aren’t even doing the pre-tax thing.
>>>>>>>>>>>
>>>>>>>>>>> Another simplification option to consider would be to lump
>>>>>>>>>>> together 3 and 4 leaves. But let’s leave them separate for now and
>>>>>>>>>>> depending on how things shake out we can easily combine them later.
>>>>>>>>>>>
>>>>>>>>>>> We don’t have any major sources of data on what benefits
>>>>>>>>>>> employers provide other than:
>>>>>>>>>>> — Metro public disclosure request spreadsheet
>>>>>>>>>>> <https://seattletransitpasses-research.pbworks.com/w/page/133438080/First%20Public%20Records%20Request>
>>>>>>>>>>> — Our commute survey
>>>>>>>>>>> — Info gleaned online from company websites, asking around,
>>>>>>>>>>> glassdoor etc (what I’ve found I’ve added to the relevant
>>>>>>>>>>> tables in the wiki
>>>>>>>>>>> <https://seattletransitpasses-research.pbworks.com/w/page/132177123/Employers>,
>>>>>>>>>>> on CTR employers and “potential poster children” and “likely target
>>>>>>>>>>> assessment” and “hotels”)
>>>>>>>>>>>
>>>>>>>>>>> Maybe it makes sense to have another string indicating
>>>>>>>>>>> sufficient certainty — when we have two sources, or one very reliable
>>>>>>>>>>> source, we enter an X or whatever, and that gives us the green light to
>>>>>>>>>>> display that data. Also it may not make sense to put a lot of work into
>>>>>>>>>>> categorizing employers into Network Area and Employment Sector until we
>>>>>>>>>>> have reliable data on what benefits they’re offering.
>>>>>>>>>>>
>>>>>>>>>>> Speaking of Seattle Coffee Works, I spoke with their HR person a
>>>>>>>>>>> few months ago and actually employees have to pay $20/month (pre-tax $) if
>>>>>>>>>>> they want an ORCA card. Still a great deal but not 100% subsidy as reported
>>>>>>>>>>> in the Metro data— which, I then learned, is self-reported by the company.
>>>>>>>>>>> Metro only knows that all those companies are signed up for the Passport
>>>>>>>>>>> program. I noted the real situation on this page
>>>>>>>>>>> <https://seattletransitpasses-research.pbworks.com/w/page/133439169/Potential%20Poster%20Children>.
>>>>>>>>>>> Anyway, the point is we should probably crosscheck the Metro data as much
>>>>>>>>>>> as we can with our survey or other sources of information.
>>>>>>>>>>>
>>>>>>>>>>> (Also speaking of Seattle Coffee Works they have locations in Capitol
>>>>>>>>>>> Hill & Cascade too
>>>>>>>>>>> <https://www.seattlecoffeeworks.com/our-cafes.aspx>. From
>>>>>>>>>>> talking with the HR person I’m pretty sure all are include in their
>>>>>>>>>>> passport program, and the employees swap around a lot from location to
>>>>>>>>>>> location. They probably use the Ballard location as home base for transit
>>>>>>>>>>> pass purposes since that’s the least expensive zone.)
>>>>>>>>>>>
>>>>>>>>>>> One project would be to come up with a list of employers that
>>>>>>>>>>> have name recognition (or that we are interested in for some other reason)
>>>>>>>>>>> and put a little work into attaining sufficient certainty. If we posted the
>>>>>>>>>>> list to a page and put a call out on social media and email I bet we’d get
>>>>>>>>>>> some answers.
>>>>>>>>>>>
>>>>>>>>>>> On Aug 8, 2019, at 5:26 PM, Stephen DeSanto <rachidian at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>
>>>>>>>>>>> I've taken a first pass at the data schema for showing employer
>>>>>>>>>>> transit benefits in our upcoming web app. In this draft, each employer
>>>>>>>>>>> record is represented as follows:
>>>>>>>>>>>
>>>>>>>>>>> {
>>>>>>>>>>> "employer": string,
>>>>>>>>>>> "industry": [string],
>>>>>>>>>>> "neighborhood": [string],
>>>>>>>>>>> "alias": [string],
>>>>>>>>>>> "rating": int,
>>>>>>>>>>> "description": string
>>>>>>>>>>> "badges": [string]
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> *Employer* is a plain text string.
>>>>>>>>>>> *Industry* is a list of strings (or a single string, if we want
>>>>>>>>>>> to limit one employer = one industry).
>>>>>>>>>>> *Neighborhood* is treated similarly to industry
>>>>>>>>>>> *Alias* is a list of other names for the same company. For
>>>>>>>>>>> example,
>>>>>>>>>>> *Rating* is a numerical scale that represents the "worker's
>>>>>>>>>>> monthly cost of an unlimited transit pass". The scale provided during the
>>>>>>>>>>> meeting went from "4 leaves" to "brown tortoise"; aligning to the leaves,
>>>>>>>>>>> that gives us a scale of [-1, 0, 1, 2, 3, 4]. We could adjust this up to
>>>>>>>>>>> 0-5, or lump "piggy bank" and "brown tortoise" in the same rating.
>>>>>>>>>>> *Description* is a string that describes the employer's transit
>>>>>>>>>>> benefits, i.e. why they got the rating they did.
>>>>>>>>>>> *Badges* is a list of strings that represent any additional
>>>>>>>>>>> categories we want to assign to a company (e.g. "industry leader",
>>>>>>>>>>> "polluter").
>>>>>>>>>>>
>>>>>>>>>>> We can make changes to this schema if it makes it easier to work
>>>>>>>>>>> with our underlying data visualization platform (e.g. Tableau?
>>>>>>>>>>> DataTables?), but hopefully this is a suitable starting place.
>>>>>>>>>>>
>>>>>>>>>>> As an example, take a hypothetical record for Seattle Coffee
>>>>>>>>>>> Works.
>>>>>>>>>>>
>>>>>>>>>>> {
>>>>>>>>>>> "employer": "Seattle Coffee Works",
>>>>>>>>>>> "industry": ["restaurant"],
>>>>>>>>>>> "neighborhood": ["cbd", "ballard"],
>>>>>>>>>>> "alias": ["Ballard Coffee Works"],
>>>>>>>>>>> "rating": 4,
>>>>>>>>>>> "description": "Provides 100% ORCA Passport subsidy."
>>>>>>>>>>> "badges": ["leader"]
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> *Where Our Data Lives (For Now)*
>>>>>>>>>>>
>>>>>>>>>>> I've also taken a rough chop at getting started with the data.
>>>>>>>>>>> Here, I've just taken the raw list of ORCA Business Passport employers and
>>>>>>>>>>> assigned a score based on their subsidy percentage, as an example:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://docs.google.com/spreadsheets/d/1HmOcG7hJLD1G0unCMPcsDnXr4RIA_PMKEE5ne-hhQR8/edit?usp=sharing
>>>>>>>>>>>
>>>>>>>>>>> The spreadsheet contains columns for each item of the employer
>>>>>>>>>>> record, as well as some additional columns to record the raw data we have
>>>>>>>>>>> on file for that employer, so we can use that data to automatically or
>>>>>>>>>>> manually determine an employer's rating.
>>>>>>>>>>>
>>>>>>>>>>> If we have data from other sources not listed (e.g. survey data,
>>>>>>>>>>> City of Seattle data), the "source_" columns can be renamed or added to
>>>>>>>>>>> represent that source's data. For example, if I want to add data from the
>>>>>>>>>>> TRU survey, I might rename "__source_b" to "__TRU Survey", then include
>>>>>>>>>>> results from that survey in that column for each company. (The columns
>>>>>>>>>>> beginning with two underscores are ones I don't expect to be publicly
>>>>>>>>>>> available.)
>>>>>>>>>>>
>>>>>>>>>>> PBworks feels really inadequate for editing large data sets, and
>>>>>>>>>>> I don't know where else to put it, so it's living in Google Sheets for now.
>>>>>>>>>>> Set to read-only with the link, for now, but please request editing
>>>>>>>>>>> permissions so you can add stuff to the sheet.
>>>>>>>>>>>
>>>>>>>>>>> Currently, my expectation is that the spreadsheet will be
>>>>>>>>>>> hand-edited in Google Sheets, and then when we're ready to put live data in
>>>>>>>>>>> the web app, we can export the sheet to a flat file, which we can then
>>>>>>>>>>> import into a format appropriate for the website (big ol' JSON file,
>>>>>>>>>>> database, whatever). Manual process, but probably fine for a project of
>>>>>>>>>>> this scale; I'm open to alternatives.
>>>>>>>>>>>
>>>>>>>>>>> *Things To Do Next*
>>>>>>>>>>>
>>>>>>>>>>> Aside from the ORCA Passport data and the data we collected
>>>>>>>>>>> through TRU survey / legwork (on PBworks), do we have any other data
>>>>>>>>>>> sources that would provide context for a score?
>>>>>>>>>>>
>>>>>>>>>>> For the data sources we have, we'll have to start filling out
>>>>>>>>>>> the rest of the spreadsheet, I guess?
>>>>>>>>>>>
>>>>>>>>>>> Also, we will need to determine:
>>>>>>>>>>> a) master list of "industries" we want to support, and
>>>>>>>>>>> b) "industry" field(s) for each employer
>>>>>>>>>>> c) "neighborhood" field(s) for each employer we don't have one
>>>>>>>>>>> for (or being more precise than what I have now)
>>>>>>>>>>> d) which companies get tagged with which badges
>>>>>>>>>>>
>>>>>>>>>>> Hope that helps.
>>>>>>>>>>>
>>>>>>>>>>> In solidarity,
>>>>>>>>>>>
>>>>>>>>>>> Stephen
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.transitriders.org/pipermail/research/attachments/20190905/ff44b820/attachment.html>
More information about the Research
mailing list