bbgm - the discussion - Latest Comments

Re: business|bytes|genes|molecules

harijay — Sun, 22 Mar 2015 21:14:53 -0000

Great to see you blog again. I too was planning to get back into it when I saw the announcement of Friendfeed's death.

Re: business|bytes|genes|molecules

mndoci — Mon, 28 Jan 2013 18:41:29 -0000

I'll ask a question: How do you define the commercial research community? Is it a pharma company? A small biotech? A company made up of two people in a hackerspace? There are many models for commercializing academic software. This is the one that ends up being frustrating and leaves the door open for people to move to something else that has much more clear cut boundaries.

In the end your goal is to do right by your customers and decide who those customers are and what their experience should be. Playing armchair quarterback, this does not seem like a very customer friendly choice. Time will tell.

Re: business|bytes|genes|molecules

Geraldine Van der Auwera — Mon, 28 Jan 2013 17:18:42 -0000

Hi Deepak,

Mick Watson made some interesting points in his blog, which you seem to agree with, so you may find my response over there to be of some interest.

I won't try to change anybody's mind on the open source vs. free for academic use only debate -- these are ultimately matters of personal preference and worldview. I just want to say for the record that we did our best to minimize any obstacles for academic research scientists as well as developers who work with the GATK framework. This is not to imply that academic research is in any way superior to or more worthy than commercial research. But it recognizes that these are two different (if somewhat overlapping) worlds with different dynamics, needs and resources, and the new licensing model is our way of dealing with this duality and its consequences for our development and support activities. We honestly believe that the commercial research community will be better served by Appistry than by us, while the move allows us to focus better on the academic community.

Regarding your point on GATK as a community resource, I would nuance this to say that the GATK *framework* (including the engine, infrastructure libraries and many utility tools) is indeed a community resource that has seen valuable contributions from a rich developer community. Accordingly, the framework remains as previously open and free to all under the original MIT license. What is "protected" by the new license is a subset of the suite of tools that is produced internally by our team, to which there are typically no external contributions. We do as a result feel that we have some reasonable claim to ownership, which gives us the privilege of choosing the terms under which we share them with the community -- which we try to do as broadly as possible within some constraints.

Finally, I'll just add that the question of the no-added-value companies is not much more than a distraction from the issues that really concern us. It has been entertaining to disparage them a little, and they do raise a few legitimate questions. In that context you make some worthwhile points about issues in the sci software system generally. But those companies are really not the driving motivation for the changes we are making, so there is not much point in trying to deconstruct our actions through that particular lens.

Re: business|bytes|genes|molecules

Peter Cock — Mon, 28 Jan 2013 10:43:29 -0000

I like the wording "complete distaste"; I just went with "sucks" for my blog post title:
http://blastedbio.blogspot....

Re: My chem coach carnival - business|bytes|genes|molecules

Susan Baxter — Sat, 27 Oct 2012 00:29:56 -0000

I'm not sure it's as good as voting - but glad you stood up as a chemist after so many years! Cheers, Deepak (and thanks!)

Re: Titus makes my life easy - business|bytes|genes|molecules

Titus Brown — Fri, 19 Oct 2012 09:50:10 -0000

Supply/demand, not to mention inflation.

Re: Titus makes my life easy - business|bytes|genes|molecules

mndoci — Fri, 19 Oct 2012 09:48:45 -0000

And here I thought lattes and pho would be sufficient.

Re: Titus makes my life easy - business|bytes|genes|molecules

Titus Brown — Thu, 18 Oct 2012 22:49:48 -0000

Remember, time is money -- I take Visa, AmEx, Mastercard, and PayPal. :)

Re: Scientific software and being customer centric - business|bytes|genes|molecules

Titus Brown — Mon, 01 Oct 2012 15:50:54 -0000

I would imagine most people would download software by cloning and compiling it; at least, that's what my tutorials say to do :). Only developers would fork on github.

Re: Scientific software and being customer centric - business|bytes|genes|molecules

mndoci — Mon, 01 Oct 2012 02:40:22 -0000

I definitely agree that needs to be figured out. Forks don't count? In a git world you fork and then submit pull requests, so checkouts are somewhat redundant, aren't they?

Re: Scientific software and being customer centric - business|bytes|genes|molecules

Titus Brown — Mon, 01 Oct 2012 00:53:03 -0000

More generally, I always worry about how to present software to grant reviewers. Should I say it's being used by lots of people? What is "lots"? Or should I emphasize the problems it could solve? Or what? I have no idea.

Re: Scientific software and being customer centric - business|bytes|genes|molecules

Titus Brown — Mon, 01 Oct 2012 00:52:09 -0000

Well, and the incentives for the PIs need to change, too. Funding is part of that, but (under your model) if I don't publish new algorithms, and therefore don't publish papers, then I don't get citations. So we need metrics for software utility. This is quite hard in an age where the most practical & least-effort thing to do is to just publish it on github, which doesn't track checkouts!

Re: business|bytes|genes|molecules

Manoj Samanta — Wed, 05 Sep 2012 13:00:21 -0000

http://www.homolog.us/blogs...

Re: business|bytes|genes|molecules

Peter — Mon, 27 Aug 2012 13:44:13 -0000

Thing is, the majority of code in our lab is written by grad students. Little scripts to get their research done. They publish their thesis and move on, and the "code residue" is not worth much. In addition, we're a bioinformatics lab; about eighty percent of our grad students come from a life sciences background with no prior exposure to coding. The physicists on the other side at least have undergrad Python classes, but it's not part of the biology curriculum yet.

So I agree with what you have written but the incentives line up all wrong. Most scientists get little career credit for their coding efforts. And they move often, leaving a debt trail behind. Of course this is not true at core facilities, and they have no excuse for poor practice.

Cameron Neylon writes persuasively on this stuff, and likens grad school practice to the workings of a mediaeval guild. There are powerful examples of how to do things better, but those tend to be in instances where an external goal (e.g. developing new TB drugs) changes the incentive structure and this is supported by new practices.

Re: business|bytes|genes|molecules

Peter Cock — Tue, 31 Jul 2012 04:49:58 -0000

I did a Storify of the Twitter discussions about this earlier this month, http://storify.com/pjacock/... - quite a few people are grumbling about this move.

Re: business|bytes|genes|molecules

Simone Brunozzi — Mon, 30 Jul 2012 13:35:49 -0000

Lovely! Can't wait to hear more from you :)

Re: business|bytes|genes|molecules

Brad Chapman — Mon, 30 Jul 2012 13:02:14 -0000

Mark;
I can give an open source contributor point of view. I've written a large open-source system heavily dependent on the GATK, contributed patches back to the source tree, and also helped with writing documentation and generally answering GATK questions in the community.

I understand your dilemma with regards to funding. The science software funding model is fundamentally broken right now. We need better paths to fund development, maintenance and documentation of software.

On the other side, we also need to work more closely together as a community instead of continuously reinventing approaches. This is my worry with the dual licensed model: it fragments the community, encourages duplication to circumvent the licensing/fees, and creates an additional licensing explanation burden for anyone building on top of the GATK. Right now it's especially tough to build off new GATK licensed features as we can't estimate the future GATK costs for commercial collaborators.

To follow on your last sentence about a small number of patches, there are a few things that could help with gaining contributions:

- The development work on GATK is relatively opaque to the non-Broad community. Lots of features magically appear when moved from private to public repositories, so it's hard to undertake a larger project without potentially duplicating effort or building off features that are changing.

- The support and upgrade model is not especially developer friendly. There are non-compatible API and command line changes on many releases without a deprecation or documentation process in place. This is more problematic because asking questions about the API is actively discouraged. I fully understand your support burden and think you do a great job with helping people through issues, but there needs to be at least some help for developers to cultivate open source contributions; otherwise people get stuck and give up. This is a relatively recent example of a reasonable question about open file handles:

http://gatk.vanillaforums.c...

- Recognizing contributions: with my small patches, there wasn't any mention of the contribution in the release notes or otherwise in the code base. The scientific culture now is based on recognition of effort, so this could help encourage folks.

Hope this is helpful feedback. Thanks again for all the great work you do with GATK. It is much appreciated,
Brad

Re: business|bytes|genes|molecules

mndoci — Mon, 30 Jul 2012 09:50:16 -0000

Mark

My position comes from having observed a lot of mixed source models in chemistry and observing the success of open licensing outside of the sciences in fostering community and rapid evolution.

My concerns are independent of the beta and fall into two categories:

1. The assumption that all commercial entities are cash rich. This is part of the reason there is limited startup innovation in the sciences. You're essentially shutting out the 2 smart guys in a garage who want to do interesting things. Their option will be to write their own or use something else. You could argue that this is not the spirit of the licensing and they will be accommodated, but that's a non-scalable solution.

2. Lost opportunity to get contributions from the broader community.

My arguments are not GATK specific. Regarding the lack of contribution, the better question is why? I do think there are way too many codes in genome analysis and not enough people, and the community is better served collaborating on a few and making them better rather than competing, but that's a flaw in the incentive model/funding system.

Re: business|bytes|genes|molecules

Mark DePristo — Mon, 30 Jul 2012 08:27:24 -0000

Hi Deepak,

Thank you for sharing your comments. I have a few follow-up questions. One, are you primarily concerned with the mixed source model, or with the current restriction to non-commercial entities (in place while the beta is active), which will be lifted when the commercial license is available? Certainly those who want to use the GATK within their research program will be able to use the full version -- academic or commercial -- in the near future, but the commercial entities will need to purchase a commercial license which comes with superior support.

What would you see as workable alternatives to the mixed source model? What do you see as the largest concerns with a mixed source model? Above you mostly describe the lost opportunity to extend and then redistribute code back to the community. In the past three years we've only received a handful of patches to any of the GATK codebase, and of course all of this occurred in source code that remains open.

Re: Data, software, and money

mndoci — Sun, 12 Jun 2011 02:53:02 -0000

Steve,

Thanks for the clarification. Historically I've never been a fan of the pure data play, but there are enough counterexamples out there. What worries me about data in itself is that in most cases it eventually becomes enough of a commodity that its value diminishes, and the interesting bits lie in empowering users with the software that extracts information. Still, I suspect that if we make a matrix we'll find enough examples to conclude that we are both right.

I do agree that the pure tangible software company is long in the tooth.

Re: Data, software, and money

stephen o'grady — Tue, 31 May 2011 23:52:10 -0000

I'm not sure we're as far apart as you believe. True, I do believe that data itself has value. Acxiom, for example, exists to aggregate and sell data. Closer to home, Spiceworks gives away a software product and leverages the data as a saleable asset.

But ultimately, yes, the purpose of software is, as you say, "to bring value to the end user from the data." For some software producers, the value of data will be direct. For others, it will be less so. Either way, it will have value for those that leverage it.

Re: The data is the question

mndoci — Tue, 05 Apr 2011 13:26:54 -0000

The examples may not have been the best to put in the same post, but your second point is one that I subscribe to: one person's output is someone else's input.

Re: The data is the question

Todd Smith — Tue, 05 Apr 2011 10:50:46 -0000

Deepak,

IMHO, your example mixes apples and oranges. The apples being "raw data," and the oranges being "processed data." I'm, of course, making an assumption based on the characters in your story. Richard Durbin from the Sanger Center is involved with collecting sequence data and Joe Dudley is involved with analyzing GWAS, GEO, medical record, and other data.

As I think about it, what's interesting about this juxtaposition is how we define data, information, and knowledge. One person's information (the output of processing data) is another's data (the input to data processing).

Re: Abundance

Paweł Szczęsny — Mon, 14 Feb 2011 04:45:06 -0000

This post reminded me of the Long Term Science posts on the Long Now Foundation blog: http://blog.longnow.org/cat... (not as scientific as the name suggests, but still). Too bad "always think in terms of your next paper" is a dominant mode of thinking in academia.

Re: Abundance

neilfws — Sun, 13 Feb 2011 21:27:13 -0000

As someone who is (to some degree) in the firing line, I absolutely agree with all points.

I'm increasingly concerned that scientific research is on the wrong path and has been for some time. It seems to be less and less about doing good science and more and more about preserving individual careers, at any cost. In fact it's reached the point where I would advise "smart young people" to avoid academia altogether.

I'm also concerned that we simply cannot do what we want to do with the available data, never mind that which is yet to come. The lack of concern/interest by scientists about storage, archiving, standards and APIs makes it impossible for people like me to use public data effectively; we simply cannot access it in a form suitable for large-scale, integrated computations.

To be quite honest, the business of doing bioinformatics has become depressing. Bioinformaticians are becoming people who know how to use and apply existing software tools to rather small, local data sets and are unable to build anything radically new or exciting.