[TRU Research] Web App Data Schema
Tom Chartrand
tmchartrand at gmail.com
Tue Aug 20 15:03:43 PDT 2019
Oh you're right, sorry for the confusion everyone! was just fairly hidden
in the view i looked at. Column S!
On Tue, Aug 20, 2019 at 3:00 PM Katie Wilson <katie at transitriders.org>
wrote:
> I think the spreadsheet with PII removed still does include the Employer
> column, no?
>
> Sorry I’m being slow to respond to all this good stuff, I am still digging
> myself out from being away last week and I’m at an all-day thing today… but
> I should have time to pay more attention before the end of the week!
>
> On Aug 19, 2019, at 6:26 PM, Stephen DeSanto <rachidian at gmail.com> wrote:
>
> I have time to go through the survey data and find the reported transit
> benefits per employer, though I'll need the data set that contains that
> data. :)
>
> Otherwise, I am going to be trying to match CTR neighborhoods to the
> employers already in our spreadsheet, as well as adding any employers
> mentioned in other sources/sheets on our wiki.
>
> On Mon, Aug 19, 2019 at 6:02 PM Tom Chartrand <tmchartrand at gmail.com>
> wrote:
>
>> This is looking great, Stephen!
>> I had put myself down to organize the survey data with respect to
>> employers for this, but I just realized that info was removed as PII (of
>> course)! So either Mike will need to take that on (I think Mike did the
>> original PII removal) or we'll need to figure out an appropriate way of
>> sharing that.
>> I'm feeling pretty swamped myself lately, so if you (Stephen) were down
>> to help him with the task that could be great. I can certainly still take
>> on some of it if needed though, once we get this sorted out.
>> Katie, maybe you could help coordinate this and make sure Mike sees this
>> sooner rather than later?
>>
>> Also, do let me know if you have any more specific spots in the report
>> where some backup from the PSRC dataset could be useful!
>>
>> On Sun, Aug 18, 2019 at 3:59 PM Stephen DeSanto <rachidian at gmail.com>
>> wrote:
>>
>>> I've added the list of industry categories to the Google Sheet, so that
>>> should help validate the data we add there, though it's going to likely be
>>> a manual task to fill in industries for all the employers.
>>>
>>> I've also added a "citation" column, which can be the public
>>> representation of where we got the data to make our claim. We can fuss with
>>> the wording later.
>>>
>>> I should have time this week to go through our survey data and other
>>> wiki tables to add or modify employers in the Google Sheet. Agree that
>>> it'll be good to have solid information on our primary targets and
>>> champions.
>>>
>>> On Tue, Aug 13, 2019 at 10:48 PM Harry Maher <harryb.maher at gmail.com>
>>> wrote:
>>>
>>>> Just a quick update with regard to qualitative data analysis: I made a
>>>> "Commute Survey Qualitative Data Analysis" folder on pbworks and put a doc
>>>> with some quotes in it for the report. I tried to pull out the main
>>>> relevant themes that I noticed discussed in the two qualitative questions
>>>> currently in the file with a couple of quote options for each
>>>> theme/category of response to the question.
>>>>
>>>> -Harry
>>>>
>>>> On Sun, Aug 11, 2019 at 12:54 PM Tom Chartrand <tmchartrand at gmail.com>
>>>> wrote:
>>>>
>>>>> Regarding where to have this discussion - I'm just gonna continue the
>>>>> email chain cause I haven't followed where to put the discussion on the
>>>>> wiki, but someone feel free to steer it over there if we want to!
>>>>>
>>>>> A brief update regarding establishing a larger list of employers to
>>>>> include in the dataset: basic contact information for all seattle
>>>>> businesses, sorted by the North American Industry Classification System, is
>>>>> available at
>>>>> https://web6.seattle.gov/fas/slimbizsearch/ResultsPage.aspx?NAICList=Top100,
>>>>> but it's a huge list of course, with no info on number of employees or
>>>>> revenue to filter out the smaller ones. Still, I did send off an email
>>>>> about getting a copy of the database just for purposes of cross-referencing
>>>>> names and such.
>>>>> On 8/10/19 6:42 PM, Katie Wilson wrote:
>>>>>
>>>>> For “neighborhood” I think it makes sense to use the “CTR Network
>>>>> Areas” as defined here
>>>>> <https://www.seattle.gov/transportation/projects-and-programs/programs/transportation-options-program/commute-trip-reduction-program/draft-2019-2023-networks-and-targets>
>>>>> .
>>>>>
>>>>> For “industry” I think it makes sense to use the “Employment Sector”
>>>>> categories listed on Page 12 of this CTR strategic plan.
>>>>> <https://www.seattle.gov/Documents/Departments/SDOT/TransportationOptionsProgram/CTR_Draft_Strategic_Plan_Jan2019.pdf>
>>>>>
>>>>> On the ratings, I think it does make sense to lump "piggy bank" and
>>>>> "brown tortoise" in the same rating (0), and then add a tortoise badge for
>>>>> employers that aren’t even doing the pre-tax thing.
>>>>>
>>>>> Another simplification option to consider would be to lump together 3
>>>>> and 4 leaves. But let’s leave them separate for now and depending on how
>>>>> things shake out we can easily combine them later.
>>>>>
>>>>> We don’t have any major sources of data on what benefits employers
>>>>> provide other than:
>>>>> — Metro public disclosure request spreadsheet
>>>>> <https://seattletransitpasses-research.pbworks.com/w/page/133438080/First%20Public%20Records%20Request>
>>>>> — Our commute survey
>>>>> — Info gleaned online from company websites, asking around, glassdoor
>>>>> etc (what I’ve found I’ve added to the relevant tables in the wiki
>>>>> <https://seattletransitpasses-research.pbworks.com/w/page/132177123/Employers>,
>>>>> on CTR employers and “potential poster children” and “likely target
>>>>> assessment” and “hotels”)
>>>>>
>>>>> Maybe it makes sense to have another string indicating sufficient
>>>>> certainty — when we have two sources, or one very reliable source, we enter
>>>>> an X or whatever, and that gives us the green light to display that data.
>>>>> Also it may not make sense to put a lot of work into categorizing employers
>>>>> into Network Area and Employment Sector until we have reliable data on what
>>>>> benefits they’re offering.
>>>>>
>>>>> Speaking of Seattle Coffee Works, I spoke with their HR person a few
>>>>> months ago and actually employees have to pay $20/month (pre-tax $) if they
>>>>> want an ORCA card. Still a great deal but not 100% subsidy as reported in
>>>>> the Metro data— which, I then learned, is self-reported by the company.
>>>>> Metro only knows that all those companies are signed up for the Passport
>>>>> program. I noted the real situation on this page
>>>>> <https://seattletransitpasses-research.pbworks.com/w/page/133439169/Potential%20Poster%20Children>.
>>>>> Anyway, the point is we should probably crosscheck the Metro data as much
>>>>> as we can with our survey or other sources of information.
>>>>>
>>>>> (Also speaking of Seattle Coffee Works they have locations in Capitol
>>>>> Hill & Cascade too <https://www.seattlecoffeeworks.com/our-cafes.aspx>.
>>>>> From talking with the HR person I’m pretty sure all are include in their
>>>>> passport program, and the employees swap around a lot from location to
>>>>> location. They probably use the Ballard location as home base for transit
>>>>> pass purposes since that’s the least expensive zone.)
>>>>>
>>>>> One project would be to come up with a list of employers that have
>>>>> name recognition (or that we are interested in for some other reason) and
>>>>> put a little work into attaining sufficient certainty. If we posted the
>>>>> list to a page and put a call out on social media and email I bet we’d get
>>>>> some answers.
>>>>>
>>>>> On Aug 8, 2019, at 5:26 PM, Stephen DeSanto <rachidian at gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I've taken a first pass at the data schema for showing employer
>>>>> transit benefits in our upcoming web app. In this draft, each employer
>>>>> record is represented as follows:
>>>>>
>>>>> {
>>>>> "employer": string,
>>>>> "industry": [string],
>>>>> "neighborhood": [string],
>>>>> "alias": [string],
>>>>> "rating": int,
>>>>> "description": string
>>>>> "badges": [string]
>>>>> }
>>>>>
>>>>> *Employer* is a plain text string.
>>>>> *Industry* is a list of strings (or a single string, if we want to
>>>>> limit one employer = one industry).
>>>>> *Neighborhood* is treated similarly to industry
>>>>> *Alias* is a list of other names for the same company. For example,
>>>>> *Rating* is a numerical scale that represents the "worker's monthly
>>>>> cost of an unlimited transit pass". The scale provided during the meeting
>>>>> went from "4 leaves" to "brown tortoise"; aligning to the leaves, that
>>>>> gives us a scale of [-1, 0, 1, 2, 3, 4]. We could adjust this up to 0-5, or
>>>>> lump "piggy bank" and "brown tortoise" in the same rating.
>>>>> *Description* is a string that describes the employer's transit
>>>>> benefits, i.e. why they got the rating they did.
>>>>> *Badges* is a list of strings that represent any additional
>>>>> categories we want to assign to a company (e.g. "industry leader",
>>>>> "polluter").
>>>>>
>>>>> We can make changes to this schema if it makes it easier to work with
>>>>> our underlying data visualization platform (e.g. Tableau? DataTables?), but
>>>>> hopefully this is a suitable starting place.
>>>>>
>>>>> As an example, take a hypothetical record for Seattle Coffee Works.
>>>>>
>>>>> {
>>>>> "employer": "Seattle Coffee Works",
>>>>> "industry": ["restaurant"],
>>>>> "neighborhood": ["cbd", "ballard"],
>>>>> "alias": ["Ballard Coffee Works"],
>>>>> "rating": 4,
>>>>> "description": "Provides 100% ORCA Passport subsidy."
>>>>> "badges": ["leader"]
>>>>> }
>>>>>
>>>>> *Where Our Data Lives (For Now)*
>>>>>
>>>>> I've also taken a rough chop at getting started with the data. Here,
>>>>> I've just taken the raw list of ORCA Business Passport employers and
>>>>> assigned a score based on their subsidy percentage, as an example:
>>>>>
>>>>>
>>>>> https://docs.google.com/spreadsheets/d/1HmOcG7hJLD1G0unCMPcsDnXr4RIA_PMKEE5ne-hhQR8/edit?usp=sharing
>>>>>
>>>>> The spreadsheet contains columns for each item of the employer record,
>>>>> as well as some additional columns to record the raw data we have on file
>>>>> for that employer, so we can use that data to automatically or manually
>>>>> determine an employer's rating.
>>>>>
>>>>> If we have data from other sources not listed (e.g. survey data, City
>>>>> of Seattle data), the "source_" columns can be renamed or added to
>>>>> represent that source's data. For example, if I want to add data from the
>>>>> TRU survey, I might rename "__source_b" to "__TRU Survey", then include
>>>>> results from that survey in that column for each company. (The columns
>>>>> beginning with two underscores are ones I don't expect to be publicly
>>>>> available.)
>>>>>
>>>>> PBworks feels really inadequate for editing large data sets, and I
>>>>> don't know where else to put it, so it's living in Google Sheets for now.
>>>>> Set to read-only with the link, for now, but please request editing
>>>>> permissions so you can add stuff to the sheet.
>>>>>
>>>>> Currently, my expectation is that the spreadsheet will be hand-edited
>>>>> in Google Sheets, and then when we're ready to put live data in the web
>>>>> app, we can export the sheet to a flat file, which we can then import into
>>>>> a format appropriate for the website (big ol' JSON file, database,
>>>>> whatever). Manual process, but probably fine for a project of this scale;
>>>>> I'm open to alternatives.
>>>>>
>>>>> *Things To Do Next*
>>>>>
>>>>> Aside from the ORCA Passport data and the data we collected through
>>>>> TRU survey / legwork (on PBworks), do we have any other data sources that
>>>>> would provide context for a score?
>>>>>
>>>>> For the data sources we have, we'll have to start filling out the rest
>>>>> of the spreadsheet, I guess?
>>>>>
>>>>> Also, we will need to determine:
>>>>> a) master list of "industries" we want to support, and
>>>>> b) "industry" field(s) for each employer
>>>>> c) "neighborhood" field(s) for each employer we don't have one for (or
>>>>> being more precise than what I have now)
>>>>> d) which companies get tagged with which badges
>>>>>
>>>>> Hope that helps.
>>>>>
>>>>> In solidarity,
>>>>>
>>>>> Stephen
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.transitriders.org/pipermail/research/attachments/20190820/234b7627/attachment.html>
More information about the Research
mailing list