Why Your Analytics Has NOTHING On The Hadron Collider

Switzerland, Geneva, CERN, Large Hadron Collider.
Humans are dwarfed by a giant like CMS. And yet without humans these incredible machines and cameras would not exist, and neither would the theories and ideas that make people sit down and invent new technologies for even more successful particle detection work. The fact that 10,000 people work together in harmony, driven by curiosity, challenged by seeming impossibilities, is hard to photograph. But this is precisely why it deserves as much admiration as the admirable beasts of science.

So you think you know all about analytics? Patting yourself on the back about your latest cross-channel attribution modelling and the terabytes of data you’ve successfully corralled into a database? Time for a little perspective — because there’s big data and then there’s BIG data.

While brand managers the world over complain about the deluge of data they need to make sense of these days, data scientists at CERN are trying to solve the mysteries of the universe using facilities like the Large Hadron Collider (LHC), the world’s largest particle accelerator. Sifting through billions of data points from a fire hose measurable in terabytes per second, the data challenges faced by CERN’s physicists dwarf those of most commercial entities.

Bob Jones is a Project Leader at CERN and a driving force behind the organisation’s information management expertise. He was Head of CERN openlab between January 2012 and December 2014.

Among the many challenges Jones and his colleagues face is trying to gather insights from more than 20 petabytes of data from CERN’s Large Hadron Collider every year and, in particular, isolating the small number of particle collisions involving the elusive Higgs particle from the vast stream of event data.

To quantify the magnitude of this task, Jones explains: “During Run 1, the LHC produced six million billion proton-proton collisions… Of these, only around 400 produced results compatible with the Higgs particle, whose discovery was announced in July 2012. So you can see that identifying the right 400 events out of six million billion proton-proton collisions is really like looking for a needle in a haystack.”
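To put the needle-in-a-haystack comparison into numbers, here is a minimal Python sketch that uses only the two figures Jones quotes. It assumes “six million billion” means 6 × 10^15; the variable names are illustrative.

```python
# Figures quoted by Jones for LHC Run 1.
# Assumption: "six million billion" is read as 6 x 10^15.
total_collisions = 6 * 10**6 * 10**9
higgs_candidates = 400  # events compatible with the Higgs particle

# Share of all collisions that were Higgs-compatible.
fraction = higgs_candidates / total_collisions

# Equivalent "one in N" figure.
one_in = total_collisions // higgs_candidates

print(f"Higgs-compatible fraction: {fraction:.2e}")
print(f"Roughly one event in every {one_in:,} collisions")
```

On these figures, about one collision in every 15 trillion was a Higgs candidate.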

To manage this scale of data effectively, CERN has been a long-time champion of distributed processing and innovative data storage approaches.

According to Jones, “CERN and the physics community has been a driving force in the development of grid computing since the year 2001. This has led to the deployment of a production infrastructure with a global footprint known as the Worldwide LHC Computing Grid, or WLCG for short, which provides the resources to store, distribute and analyse the data from the LHC.”

Jones works in the IT department of CERN, which serves the whole organisation and has a critical role to play in supporting the scientific programme. Understandably, “It is a very demanding environment with continuous renewal and upgrades to services.”

Despite — or perhaps because of — this pressure, Jones says he is constantly impressed by the quality of the people he works with, including the numerous world-class experts on site, and their approach to tackling vast and complex tasks. “CERN has a university campus feel about it and the people are very open and willing to help and collaborate.”

We asked Jones what he and his colleagues see as the biggest challenges of big data at CERN. The challenges are many and varied: “It is the combination of storage capacity, access patterns and sometimes unpredictable analysis workloads that are the biggest challenge,” he said.

As well as dealing with the voluminous data produced by CERN’s many experiments, the speed with which physicists develop and change focus in their experimental work adds to the data-management challenge. According to Jones, research moves quickly and those involved can’t always predict which particular dataset will prove to be the most popular and require the most resources.

“So our system needs to be very dynamic. We have a 3.5 MW computer centre on-site in Geneva and have leased space in a second computer centre at Wigner in Budapest, Hungary.

“We are ready for Run 2 of the LHC, which started in June 2015 with the experiments taking data at the unprecedented energy of 13 TeV, following the two-year long shutdown. We have multiple 100Gbps lines linking the two centres, which enables us to operate them as a single OpenStack cloud.”

And the volume of data is set to continue growing. The LHC experiments have already recorded 100 times more data for the summer conferences this year than they had around the same time after the LHC started up at 7 TeV in 2010, he says.

On the grid

While CERN originally hosted all IT services on-site in a traditional service provision model, as the needs of the scientific programme expanded it focused more heavily on developing off-site data processing and management capability through grid computing.

This approach has allowed CERN to federate IT resources from partner organisations around the world, as well as scale storage and processing power more efficiently. The cloud services market — originated by Amazon Web Services back in 2002 and much-loved by data scientists in publishing and other high-data-volume sectors — has proved a powerful tool for CERN as well.

“This market now offers us new opportunities to increase the scale and range of the IT services we can build on. We are working towards a hybrid cloud model, where we can opportunistically use any resources taking into account availability, price and policy.”

He said CERN is actively investigating this approach with cloud service companies and other research organisations in the context of the Helix Nebula initiative. “So seeking new opportunities and keeping flexible enough to profit from them is a key aspect of our strategy at CERN. But we have learnt that operating production services at this scale is not something that can be improvised.”

According to Jones, “Every time we have had to increase scale it has required development which takes time and advanced planning. Similarly, the importance of preserving data is paramount. CERN puts significant resources into bit-level preservation of data, including the use of tape systems where the technology continues to evolve.”

This archived data must be actively managed to ensure it remains available for future use, he said. However, there is a balancing act involved: CERN must hold on to meaningful scientific data without unintentionally housing excessive volumes of meaningless information.

“At this scale it is not possible to keep all the data (the LHC produces up to a Petabyte of data per second) and it is essential to have efficient data-filtering mechanisms so that we can separate the wheat from the chaff. A key risk is throwing away the data you need and cannot reproduce.”

We also asked Jones if the approach to data analysis or exploration is very different when dealing with the vast quantities of data from the Large Hadron Collider or if the thought processes are similar to smaller-scale experiments, just executed using tools capable of handling greater scale.

“The volume of data produced at the LHC is a challenge, but the process is similar for smaller-scale experiments,” he said.

“Particles collide at high energies inside the detectors, creating new particles that decay in complex ways as they move through layers of subdetectors. The subdetectors register each particle’s passage and microprocessors convert the particles’ paths and energies into electrical signals, combining the information to create a digital summary of the collision event.”

The raw data per event is around one million bytes (1 MB), produced at a rate of about 600 million events per second. The Worldwide LHC Computing Grid tackles this mountain of data in a two-stage process.

First, it runs dedicated algorithms written by physicists to reduce the number of events and select those considered interesting — a sophisticated winnowing out of noise from the data sets. “Analysis can then focus on the most important data — that which could bring new physics measurements or discoveries.”
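The figures quoted above show why this first selection step is unavoidable. The Python sketch below works through the arithmetic and then applies a toy filter to a handful of made-up events. It assumes decimal units (1 PB = 10^15 bytes). The `Event` class, its fields and the thresholds are hypothetical and exist only for illustration; they are not CERN’s actual selection criteria or software.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Part 1: arithmetic on figures quoted in the article.
EVENT_SIZE_BYTES = 1_000_000        # about 1 MB of raw data per event
EVENTS_PER_SECOND = 600_000_000     # about 600 million events per second
YEARLY_DATA_BYTES = 20 * 10**15     # "more than 20 petabytes" per year

raw_rate = EVENT_SIZE_BYTES * EVENTS_PER_SECOND      # bytes per second
raw_rate_pb = raw_rate / 10**15                      # petabytes per second
seconds_to_match_year = YEARLY_DATA_BYTES / raw_rate

print(f"Unfiltered rate: {raw_rate_pb:.1f} PB per second")
print(f"Seconds of raw output equal to 20 PB: {seconds_to_match_year:.1f}")


# Part 2: a toy selection step (hypothetical fields and thresholds).
@dataclass(frozen=True)
class Event:
    event_id: int
    total_energy_gev: float  # illustrative summary quantity
    n_tracks: int            # illustrative summary quantity


def select_interesting(
    events: Iterable[Event],
    min_energy_gev: float = 100.0,
    min_tracks: int = 2,
) -> Iterator[Event]:
    """Yield only the events that pass both illustrative thresholds."""
    for event in events:
        if event.total_energy_gev >= min_energy_gev and event.n_tracks >= min_tracks:
            yield event


sample = [
    Event(1, 12.5, 4),    # too little energy
    Event(2, 250.0, 6),   # kept
    Event(3, 180.0, 1),   # too few tracks
    Event(4, 101.0, 2),   # kept
]
kept_ids = [e.event_id for e in select_interesting(sample)]
print(f"Kept events: {kept_ids}")
```

On the quoted figures, the unfiltered stream comes to roughly 0.6 petabytes per second, so about half a minute of raw output would equal the 20 petabytes quoted earlier as a yearly figure.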

When it comes to analytical tools, it’s unsurprising to hear that CERN’s data scientists have built and tweaked their own analytical toolkit. “The physics community has progressively developed, over a number of years, a set of software tools dedicated to this task. These tools are constantly being improved to ensure they continue to work at the growing scale of the LHC data challenge. ROOT is a popular data-analysis framework; it is a bit like R on steroids.”

Jones says that grid computing has also been immensely helpful in enabling physicists to run analysis at scale. “Grid computing helps by providing an underlying global infrastructure with the capacity to be able to match the analysis needs of the LHC. But the grid itself is evolving to make more use of cloud computing techniques and profit from the improvements in hardware (processors, storage etc.) as well as the cost-effectiveness of high performance networks.”

Lessons for brands

The CERN Data Centre has the ability to process incredibly high throughput in order to manage the data coming out of the Large Hadron Collider. That prompts the question of whether there will be many situations in which the commercial sector would need that extreme throughput capability.

According to Jones, “CERN is a leader but not alone in having to deal with such high data throughputs. We expect to see similar scales in other sciences (such as next generation genome sequencing as well as the Square Kilometre Array which will primarily be deployed in Australia and South Africa) and various business sectors linked to the growing Internet of Things in the near future.”

He described CERN as being ahead of the curve, and said the technologies and processes developed — as well as the lessons learned — at CERN can be applied in other fields. However, he emphasised that CERN’s advanced capabilities are not acquired by happenstance — the organisation spends a great deal of effort in growing the skills needed to develop cutting-edge data solutions.

“Education is a key element of CERN’s mission. For those working at CERN, we have technical and management training programmes and series of computing seminars as well as the CERN School of Computing. We are constantly recruiting young scientists, engineers and technicians who also bring new skills and ideas into CERN’s environment. Engagement with leading IT companies through CERN openlab has been a source of many new developments and helps train successive generations of personnel in the latest techniques,” he said.

CERN is also a poster child for the power of not only open source but also a culture of organisational openness. Jones said, “CERN’s open culture coupled with developments such as commercial cloud services where an organisation’s data may be stored off-site, and a Bring Your Own Device (BYOD) policy for the site, means we have to be proactive to ensure everyone respects intellectual property rights and the relevant data protection legislation.

“We are also active in the deployment of federated identity-management systems for access to IT services, and such a model has been in place for the Worldwide LHC Computing Grid since its creation.”

This article originally appeared at www.which-50.com
