Usage data collection (was: Re: ImageJ 2.0.0-rc-11 released)

Posted by Saalfeld, Stephan on
URL: http://imagej.273.s1.nabble.com/ImageJ-2-0-0-rc-11-released-tp5009074p5009111.html

I fully support Gabriel and Herbie in their strong rejection of this
sort of data collection and, in particular, of the way it aims to dupe
the clueless by introducing it switched on by default.

Nobody questions that this kind of data is extremely interesting and
useful.  I strongly believe that every plugin developer drools over
knowing how much their plugins got used, and when, and where.  This
lust, however, by no means justifies actually doing it.

I am also pretty sure that your intentions about this procedure are
perfectly honorable and that you're not planning any evil.  However,
that does not mean the same will be true of everybody else at any time
in the future, and the data will be there, exposed to anyone hacking
into your servers.

Please ask yourself whether you would feel comfortable about your
operating system reporting back which applications you've used when,
where and how often.  I would feel spied on.

It would be great if you could consider switching that functionality off
by default and offer interested users the choice to contribute
consciously and voluntarily.  That way, you would consciously make the
decision to gather significantly less data but in the most honorable
way, resisting the (understandable) desire to get more, faster.  In
addition, I would ask you, in the spirit of full Open Source and Open
Policy, to make the collected data (the data, not the derived
statistics) read-accessible to everybody in full and to license it
under an Open Data license, e.g.

http://opendatacommons.org/licenses/odbl/

or one of the CCs

http://creativecommons.org/licenses/

Everybody can then test whether the data is truly harmless, but I
actually believe that we may find interesting ways to identify
individuals by their ImageJ usage patterns.

If you would change the usage data collection policy in this spirit, I
would consider switching it on, for a while.

I am very sorry to be so negative about this particular aspect.  I
greatly appreciate the immense amount of work you are putting into
making ImageJ2 a better analysis tool, freely available to the
community.

Best regards,
Stephan

On Mon, 2014-08-11 at 22:53 +0100, Gabriel Landini wrote:

> On Monday 11 Aug 2014 14:18:11 Mark Hiner wrote:
> > > SCIFIO was opt in, but usage tracking is opt out? It does not make sense.
> >
> > To be clear, SCIFIO is enabled by default.. you have to uncheck a box to
> > disable SCIFIO, so it is opt out.
>
> Right, but it was impossible to miss as I had to answer the SCIFIO dialog when
> the update came.
> What is the problem with showing a similar dialog and letting people know
> what will be going on?
>
> > I think there is a difference in the questions "what do you do with the
> > software" and "what do users do with the software". I don't believe we will
> > ever ask the former question.
>
> Mark, what you or I personally *believe* somebody will ask in the future does
> not matter. What matters is the process of getting informed consent for the
> data collection; IJ2 assumes consent and makes this less obvious than it
> could be.
>
> > we can ask:
> >  "How many times was Bio-Formats used with Java 7"?
> > we *can not* ask:
> >  "how many times did Gabriel Landini run Bio-Formats?"
>
> Even with my poor knowledge of network traffic I can imagine that it might be
> trivial to script something using time stamps and IP addresses of the
> uploading machine, as well as the many emails that also contain users' IP
> addresses.
> Not that I remotely think that the devel team would have the time or
> inclination to do this, but if we are talking about what is impossible, I
> suspect it is not. So whether that is potentially identifiable information is
> probably debatable. If it is, then you would effectively be logging their
> location in a database every hour (!) IJ2 runs. Doesn't that sound a bit
> creepy?
>
> My issue was (and remains) that users need to be fully informed about data
> collection before it takes place, rather than having it On by default.
>
> > > If this happens to be something people want to adhere to, then there is
> > > nothing to worry about as there will be lots of users opting in when given
> > > the chance.
>  
> > I believe this is actually hard to predict.
>
> Ask the users in a similar way SCIFIO was done and you will have the answer.
> Then we would not be having this conversation.
>  
> > If usage statistics were presented similarly - with a pop up on launch and
> > an options menu - my expectations for opt-in numbers would be very low. Not
> > because people don't want to contribute but because we created a barrier to
> > the process.
>
> The issue that does not seem to stick after all this typing is that IJ2 should
> not make that decision for the users. IJ2 is not the owner of the processes
> happening in a user's computer. You need to ask, not assume, that people will
> be happy for their computers to contact a database every hour and let it
> know they are there and doing this or that.
>
> > A more successful alternative might be, when statistics are actually being
> > uploaded, to display a dialog asking to proceed or not - with yes/no/don't
> > ask me again options. That sounds promising, but also potentially annoying
> > or confusing to get that pop up, and we can still expect statistics
> > reporting to drop.
>
> But if the reporting statistics drop, that would have to do. Make estimates
> instead of collecting all possible data.
>
> > So since we are not sending or storing use-specific data, and provided and
> > publicized the opt-out mechanism, we decided to go with the option that was
> > un-disruptive at the workflow level and maximized data collection.
>
> Yes, you said that before, and I am sure I am not alone in thinking it is not
> the desirable way of doing it.
>
> > Especially given, as you mentioned, that users ultimately need to agree to
> > communicate with an external server to download these applications and
> > updates.
>
> But there is an obvious difference between the two situations. One is
> requesting an update. The other is broadcasting to a database.
>
> > I hope it's clear that I am not saying we are unwilling to change how
> > permissions are exposed.. but if we can circumvent that need via discussion
> > it would certainly be my preference. And if we do end up making any
> > changes, I would like them to be as minimally damaging to the quality of
> > the data gathering as possible.
>
> It sounds like it is preferable not to ask people about the data collection.
> That is in my view an error of judgement that can be resolved easily.
>
> > To me, there has to be actual user data being exposed to be a matter of
> > privacy. Can you clarify what you believe to be the concern here?
>
> Sure: that the process of collecting usage data is not made clear from the
> beginning and it should have informed consent before the collection starts.
>
> If there are no privacy issues, why is the function to switch it Off called
> "Privacy"?
>
> Regards
>
> Gabriel
>
> --
> ImageJ mailing list: http://imagej.nih.gov/ij/list.html
