Why Your Analytics Has NOTHING On The Hadron Collider

Switzerland, Geneva, CERN, Large Hadron Collider.
Humans are dwarfed by a giant like CMS. And yet without humans these incredible machines and cameras would not exist, and neither would the theories and ideas that make people sit down and invent new technologies for even more successful particle detection work. The fact that 10,000 people work together in harmony, driven by curiosity, challenged by seeming impossibilities, is hard to photograph. But this is precisely why it deserves as much admiration as the admirable beasts of science.

So you think you know all about analytics? Patting yourself on the back about your latest cross-channel attribution modelling and the terabytes of data you’ve successfully corralled into a database? Time for a little perspective — because there’s big data and then there’s BIG data.

While brand managers the world over complain about the deluge of data they need to make sense of these days, data scientists at CERN are trying to solve the mysteries of the universe using facilities like the Large Hadron Collider (LHC), the world’s largest particle accelerator. Sifting through billions of data points from a fire hose measurable in terabytes per second, the data challenges faced by CERN’s physicists dwarf those of most commercial entities.

Bob Jones is a Project Leader at CERN and a driving force behind the organisation’s information management expertise. He was Head of CERN openlab between January 2012 and December 2014.

Among the many challenges Jones and his colleagues face is trying to gather insights from more than 20 petabytes of data from CERN’s Large Hadron Collider every year and, in particular, isolating the small number of particle collisions involving the elusive Higgs particle from the vast stream of event data.

To quantify the magnitude of this task, Jones explains: “During Run 1, the LHC produced six million billion proton-proton collisions… Of these, only around 400 produced results compatible with the Higgs particle, whose discovery was announced in July 2012. So you can see that identifying the right 400 events out of six million billion proton-proton collisions is really like looking for a needle in a haystack.”
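To put that haystack in proportion, a quick back-of-the-envelope calculation helps. The sketch below uses only the two figures Jones quotes (six million billion collisions and around 400 Higgs-compatible events); the variable names are illustrative.

```python
# Figures quoted by Jones for LHC Run 1.
total_collisions = 6_000_000 * 1_000_000_000  # "six million billion" = 6e15
higgs_candidates = 400                         # events compatible with the Higgs

# Fraction of all collisions that were Higgs-compatible.
fraction = higgs_candidates / total_collisions

# The same ratio expressed as "one candidate per N collisions".
collisions_per_candidate = total_collisions // higgs_candidates

print(f"Fraction of collisions: {fraction:.1e}")
print(f"One candidate per {collisions_per_candidate:,} collisions")
```

On these numbers, roughly one collision in 15 trillion was a Higgs candidate.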

To manage this scale of data effectively, CERN has been a long-time champion of distributed processing and innovative data storage approaches.

According to Jones, “CERN and the physics community has been a driving force in the development of grid computing since the year 2001. This has led to the deployment of a production infrastructure with a global footprint known as the Worldwide LHC Computing Grid, or WLCG for short, which provides the resources to store, distribute and analyse the data from the LHC.”

Jones works in the IT department of CERN, which serves the whole organisation and has a critical role to play in supporting the scientific programme. Understandably, “It is a very demanding environment with continuous renewal and upgrades to services.”

Despite — or perhaps because of — this pressure, Jones says he is constantly impressed by the quality of the people he works with, including the numerous world-class experts on site, and their approach to tackling vast and complex tasks. “CERN has a university campus feel about it and the people are very open and willing to help and collaborate.”

We asked Jones to describe what for him and his colleagues are the biggest challenges of big data at CERN. The challenges are many and varied: “It is the combination of storage capacity, access patterns and sometimes unpredictable analysis workloads that are the biggest challenge,” he said.

As well as dealing with the voluminous data produced by CERN’s many experiments, the speed with which physicists develop and change focus in their experimental work adds to the data-management challenge. According to Jones, research moves quickly and those involved can’t always predict which particular dataset will prove to be the most popular and require the most resources.

“So our system needs to be very dynamic. We have a 3.5 MW computer centre on-site in Geneva and have leased space in a second computer centre at Wigner in Budapest, Hungary.

“We are ready for Run 2 of the LHC, which started in June 2015 with the experiments taking data at the unprecedented energy of 13 TeV, following the two-year long shutdown. We have multiple 100Gbps lines linking the two centres, which enables us to operate them as a single OpenStack cloud.”

And the volume of data is set to continue growing. The LHC experiments have already recorded 100 times more data for the summer conferences this year than they had around the same time after the LHC started up at 7 TeV in 2010, he says.

On the grid

While CERN originally hosted all IT services on-site in a traditional service provision model, as the needs of the scientific programme expanded it focused more heavily on developing off-site data processing and management capability through grid computing.

This approach has allowed CERN to federate IT resources from partner organisations around the world, as well as scale storage and processing power more efficiently. The cloud services market — originated by Amazon Web Services back in 2002 and much-loved by data scientists in publishing and other high-data-volume sectors — has proved a powerful tool for CERN as well.

“This market now offers us new opportunities to increase the scale and range of the IT services we can build on. We are working towards a hybrid cloud model, where we can opportunistically use any resources, taking into account availability, price and policy.”
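As a rough illustration of what choosing resources by availability, price and policy might look like, here is a minimal sketch. It is not CERN's scheduler: the `Resource` fields, the provider names and the prices are all hypothetical, invented for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Resource:
    """Hypothetical description of one pool of computing capacity."""
    name: str                   # invented provider label
    free_cores: int             # availability
    price_per_core_hour: float  # price (made-up units)
    policy_approved: bool       # policy, e.g. data-protection rules allow its use


def pick_resource(resources: Sequence[Resource],
                  cores_needed: int) -> Optional[Resource]:
    """Return the cheapest approved resource with enough free cores, if any."""
    candidates = [r for r in resources
                  if r.policy_approved and r.free_cores >= cores_needed]
    return min(candidates, key=lambda r: r.price_per_core_hour, default=None)


# Invented sample pool: the cheapest option fails the policy check.
pool = [
    Resource("on-site", 200, 0.05, True),
    Resource("partner-grid", 5000, 0.03, True),
    Resource("public-cloud-a", 10000, 0.02, False),
    Resource("public-cloud-b", 8000, 0.04, True),
]
choice = pick_resource(pool, cores_needed=1000)
print(choice.name if choice else "no resource available")
```

The point of the toy example is that the cheapest provider is not automatically chosen: policy and availability filter the list before price is considered.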

He said CERN is actively investigating this approach with cloud service companies and other research organisations in the context of the Helix Nebula initiative. “So seeking new opportunities and keeping flexible enough to profit from them is a key aspect of our strategy at CERN. But we have learnt that operating production services at this scale is not something that can be improvised.”

According to Jones, “Every time we have had to increase scale it has required development which takes time and advanced planning. Similarly, the importance of preserving data is paramount. CERN puts significant resources into bit-level preservation of data, including the use of tape systems where the technology continues to evolve.”

This archived data must be actively managed to ensure it remains available for future use, he said. However, there is a balancing act involved: CERN must hold on to meaningful scientific data without unintentionally housing excessive volumes of meaningless information.

“At this scale it is not possible to keep all the data (the LHC produces up to a Petabyte of data per second) and it is essential to have efficient data-filtering mechanisms so that we can separate the wheat from the chaff. A key risk is throwing away the data you need and cannot reproduce.”
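A rough calculation shows why keeping everything is not an option. The sketch below compares the peak rate Jones mentions with the "more than 20 petabytes" a year cited earlier in this article. Treating that yearly figure as the data that survives filtering, and assuming peak output all year round, are simplifications made for illustration only.

```python
# Both figures come from the article; decimal (SI) units are assumed.
PB = 10**15                  # one petabyte in bytes
peak_raw_rate = 1 * PB       # "up to a Petabyte of data per second"
kept_per_year = 20 * PB      # "more than 20 petabytes" every year

# Time the detectors would need, at peak rate, to produce a year's kept data.
seconds_at_peak = kept_per_year / peak_raw_rate

# Share of a hypothetical full year of peak-rate output that is kept.
seconds_per_year = 365 * 24 * 60 * 60
kept_share = kept_per_year / (peak_raw_rate * seconds_per_year)

print(f"A year's kept data equals {seconds_at_peak:.0f} seconds of peak output")
print(f"Kept share under the all-year-peak assumption: {kept_share:.1e}")
```

Under these assumptions, a whole year of retained data corresponds to about 20 seconds of peak detector output, which is why the filtering has to happen before anything is stored.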


We also asked Jones if the approach to data analysis or exploration is very different when dealing with the vast quantities of data from the Large Hadron Collider or if the thought processes are similar to smaller-scale experiments, just executed using tools capable of handling greater scale.

“The volume of data produced at the LHC is a challenge, but the process is similar for smaller-scale experiments,” he said.

“Particles collide at high energies inside the detectors, creating new particles that decay in complex ways as they move through layers of subdetectors. The subdetectors register each particle’s passage and microprocessors convert the particles’ paths and energies into electrical signals, combining the information to create a digital summary of the collision event.”

The raw data per event is around one million bytes (1 MB), produced at a rate of about 600 million events per second. The Worldwide LHC Computing Grid tackles this mountain of data in a two-stage process.

First, it runs dedicated algorithms written by physicists to reduce the number of events and select those considered interesting — a sophisticated winnowing out of noise from the data sets. “Analysis can then focus on the most important data — that which could bring new physics measurements or discoveries.”
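The selection step can be pictured with a toy filter. The sketch below is purely illustrative: the `Event` fields, the cut values and the sample events are all invented, and the algorithms physicists actually write are far more sophisticated than a pair of thresholds.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """Hypothetical digital summary of one collision event."""
    event_id: int
    total_energy_gev: float  # invented field, for illustration only
    n_tracks: int            # invented field, for illustration only


def select_interesting(events: Iterable[Event],
                       min_energy_gev: float = 100.0,
                       min_tracks: int = 2) -> Iterator[Event]:
    """Yield only the events that pass both (made-up) selection cuts."""
    for event in events:
        if (event.total_energy_gev >= min_energy_gev
                and event.n_tracks >= min_tracks):
            yield event


# Invented sample: events 1 and 3 each fail one of the two cuts.
sample = [
    Event(1, 12.5, 4),
    Event(2, 130.0, 3),
    Event(3, 250.0, 1),
    Event(4, 101.0, 2),
]
kept = list(select_interesting(sample))
print([e.event_id for e in kept])
```

Writing the filter as a generator means events are examined one at a time as they stream past, rather than being held in memory first, which mirrors the idea of discarding uninteresting data before it is stored.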

When it comes to analytical tools, it’s unsurprising to hear that CERN’s data scientists have built and tweaked their own analytical toolkit. “The physics community has progressively developed, over a number of years, a set of software tools dedicated to this task. These tools are constantly being improved to ensure they continue to work at the growing scale of the LHC data challenge. ROOT is a popular data-analysis framework; it is a bit like R on steroids.”

Jones says that grid computing has also been immensely helpful in enabling physicists to run analysis at scale. “Grid computing helps by providing an underlying global infrastructure with the capacity to be able to match the analysis needs of the LHC. But the grid itself is evolving to make more use of cloud computing techniques and profit from the improvements in hardware (processors, storage etc.) as well as the cost-effectiveness of high performance networks.”

Lessons for brands

The CERN Data Centre has the ability to process incredibly high throughput in order to manage the data coming out of the Large Hadron Collider. That prompts the question of whether there will be many situations in which the commercial sector would need that extreme throughput capability.

According to Jones, “CERN is a leader but not alone in having to deal with such high data throughputs. We expect to see similar scales in other sciences (such as next generation genome sequencing as well as the Square Kilometre Array which will primarily be deployed in Australia and South Africa) and various business sectors linked to the growing Internet of Things in the near future.”

He described CERN as being ahead of the curve, and said the technologies and processes developed — as well as the lessons learned — at CERN can be applied in other fields. However, he emphasised that CERN’s advanced capabilities are not acquired by happenstance — the organisation spends a great deal of effort in growing the skills needed to develop cutting-edge data solutions.

“Education is a key element of CERN’s mission. For those working at CERN, we have technical and management training programmes and series of computing seminars as well as the CERN School of Computing. We are constantly recruiting young scientists, engineers and technicians who also bring new skills and ideas into CERN’s environment. Engagement with leading IT companies through CERN openlab has been a source of many new developments and helps train successive generations of personnel in the latest techniques,” he said.

CERN is also a poster child for the power of not only open source but also a culture of organisational openness. Jones said, “CERN’s open culture coupled with developments such as commercial cloud services where an organisation’s data may be stored off-site, and a Bring Your Own Device (BYOD) policy for the site, means we have to be proactive to ensure everyone respects intellectual property rights and the relevant data protection legislation.

“We are also active in the deployment of federated identity-management systems for access to IT services, and such a model has been in place for the Worldwide LHC Computing Grid since its creation.”

This article originally appeared at www.which-50.com
