  • "pmax" started this thread

Posts: 139

Date of registration
: Dec 8th 2011

Platform: PC

Location: Durham, NC

Battlelog:

Reputation modifier: 10

1

Sunday, April 6th 2014, 5:15am

Nerdy Statistics

Greetings, everyone. I am a long-time reader, and almost-first-time poster. The official Battlelog forums make me simultaneously sad and angry; it's always nice to come here and be whelmed by how everyone is so reasonable.

Some members of this community put forth quite a lot of effort digging into the a priori aspects of the game: the hard numbers straight from the game files on things like weapon performance. I'm afraid I'm not much help on that front, but I can, perhaps, make some "hard" a posteriori observations. Through the Battlelog stats API, I have gathered data on about 25,000 PC players (see below for that methodology); I think this sample is large enough that findings about it are significant. Here are a few examples of stuff I have gleaned:

The total k/d ratio of the entire game is around 1.34. That is total kills divided by total deaths.

The average k/d ratio of individual soldiers is 1.299 ± 0.0237 (99% confidence interval). That is the sum of all the individual soldiers' k/d ratios divided by the total number of soldiers.

However, when dealing with ratios, it's generally better to transform them logarithmically; the mean log(k/d) is 0.1391 ± 0.0077, which translates back to a k/d of 1.1492, with a 99% confidence interval of (1.1403, 1.1581).
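
A minimal sketch of how the three figures above could be computed, assuming a SQLite table `main` with per-soldier `kills` and `deaths` columns (as in the schema posted later in this thread). The helper names, the normal-approximation interval (z = 2.576), and the choice to skip soldiers with zero kills or zero deaths are assumptions for illustration, not necessarily how the quoted numbers were produced. Natural logs are used, which is consistent with exp(0.1391) ≈ 1.1492.

```python
import math
import sqlite3

Z_99 = 2.576  # two-sided 99% quantile of the standard normal


def mean_and_half_width(values):
    """Sample mean and 99% normal-approximation half-width."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, Z_99 * math.sqrt(var / n)


def kd_summaries(conn):
    """Return pooled k/d, (mean k/d, half-width), (geometric k/d, low, high)."""
    # Assumption: soldiers with zero kills or zero deaths are skipped,
    # because their ratio or its logarithm is undefined.
    rows = conn.execute(
        "SELECT kills, deaths FROM main WHERE kills > 0 AND deaths > 0"
    ).fetchall()

    # 1. Pooled ratio: total kills divided by total deaths.
    pooled = sum(k for k, _ in rows) / sum(d for _, d in rows)

    # 2. Arithmetic mean of the per-soldier ratios.
    ratios = [k / d for k, d in rows]
    mean, half = mean_and_half_width(ratios)

    # 3. Mean of log(k/d), transformed back with exp().
    log_mean, log_half = mean_and_half_width([math.log(r) for r in ratios])
    geometric = (
        math.exp(log_mean),
        math.exp(log_mean - log_half),
        math.exp(log_mean + log_half),
    )
    return pooled, (mean, half), geometric


def make_demo_db():
    """Tiny in-memory stand-in for the real database (made-up numbers)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE main (id TEXT, kills INTEGER, deaths INTEGER)")
    conn.executemany(
        "INSERT INTO main VALUES (?, ?, ?)",
        [("a", 10, 5), ("b", 4, 8), ("c", 9, 9)],
    )
    return conn


if __name__ == "__main__":
    print(kd_summaries(make_demo_db()))
```

The pooled ratio weights each soldier by deaths, the arithmetic mean is pulled upward by a few very high ratios, and the log-based figure treats a 2.0 and a 0.5 as equally far from 1.0, which is why the three numbers differ.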

Here are a couple of graphics I made with some of the available data, too:





This one is from a simple random sample of 2,500 players (about 10% of the total data--all of the data on one scatter plot like this would be ridiculous):



The correlation value here is -0.03, so basically zero. Maybe all those "sissy snipers" aren't actually doing their k/d ratios any favors. The more relevant plot would be log(k/d) vs (fraction of) time using a sniper rifle, but that involves a slightly more, well, involved SQL query; I'll save that one for another time.
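
For anyone wanting to reproduce this kind of number, here is a small sketch of drawing a simple random sample and computing a Pearson correlation on it. The function names and the `(x, y)` pair layout are hypothetical; which two columns go into the pairs is up to whoever runs it.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def srs_correlation(pairs, sample_size, seed=None):
    """Correlation on a simple random sample (without replacement) of pairs."""
    rng = random.Random(seed)
    sample = rng.sample(pairs, sample_size)
    xs, ys = zip(*sample)
    return pearson_r(xs, ys)


if __name__ == "__main__":
    # Made-up data: y is pure noise, so r should land near zero.
    rng = random.Random(0)
    data = [(rng.random(), rng.random()) for _ in range(25000)]
    print(srs_correlation(data, 2500, seed=1))
```

Sampling only affects what gets plotted; the correlation itself could just as well be computed on all of the data.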

So, if there are any statistical questions you guys have that I might be able to answer with this gigantic chunk of data, this is the place to ask. I'd love to attack them!

-----

ADDENDUM: Data Collection Methodology

In order to query Battlelog's stats API, an individual soldier's "soldier ID" is required. To collect a bunch of IDs, I wrote a script to scrape the official Battlelog forums (I think just the BF4 General Discussions and Battlefield 4 - PC subforums, as I was seeking specifically data about PC players) for account names, then another to scrape those accounts' profile pages looking for BF4 PC soldier IDs, then another to do a gazillion queries of the stats API. I ended up with data on just over 25,000 PC players in a 370ish MB SQLite database. The collection happened before the release of Naval Strike, although it happened close enough before then that it does feature some data on the NS weapons, vehicles, and equipment. I will probably run the collection script again in the near future; I have it running on a Raspberry Pi, and it'll probably take between a week and two weeks to gather all the queries and update its database (which gets slower as the SQLite file gets larger).

So, admittedly, this isn't a perfect random sample of players; it only includes players who have posted on the forums. I'm not sure what kind of bias this might induce in my data. Is it bias towards good players, because the ones most interested in the game are the ones who are going to post on the forums? Is it bias towards lousy players, because they are the ones spending all their time whining on the forums instead of getting better? In any case, the bias is probably much deeper and more complicated than just good/bad (what do "good" and "bad" even mean in this context?). I honestly don't know.

Posts: 77

Date of registration
: Nov 17th 2013

Platform: PC

Battlelog:

Reputation modifier: 4

2

Sunday, April 6th 2014, 5:55am

That's pretty cool. Although I do wonder about the sampling as well.

Would it not be more valid to just generate sequential IDs, and reject the ones that are invalid?
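
A rough sketch of that idea as rejection sampling, drawing candidate IDs uniformly at random instead of walking them sequentially (a swapped-in variation, so that the sample is not tied to when an account was created). Everything here is hypothetical: `is_valid` stands in for a stats API lookup that reports whether an ID belongs to a real soldier on the platform of interest, and the ID range would have to be guessed or probed.

```python
import random


def sample_valid_ids(is_valid, id_min, id_max, n, seed=None, max_tries=1_000_000):
    """Collect up to n distinct IDs in [id_min, id_max] that pass is_valid."""
    rng = random.Random(seed)
    found = set()
    tries = 0
    while len(found) < n and tries < max_tries:
        tries += 1
        candidate = rng.randint(id_min, id_max)
        if candidate in found:
            continue  # already accepted, draw again
        if is_valid(candidate):  # reject IDs with no soldier behind them
            found.add(candidate)
    return sorted(found)


if __name__ == "__main__":
    # Stand-in validity check for demonstration only.
    def fake_lookup(soldier_id):
        return soldier_id % 7 == 0

    print(sample_valid_ids(fake_lookup, 1, 1000, 5, seed=42))
```

If only a small share of IDs is valid, most lookups are wasted, so the cost of this approach depends heavily on how densely the ID space is populated.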

Posts: 1,614

Date of registration
: Apr 12th 2013

Platform: PC

Location: Guilin Peaks, Finland

Battlelog:

Reputation modifier: 14

3

Sunday, April 6th 2014, 8:51am

Very good, this kind of an approach can be made very informative. You can control the sampling issues by conjunction searches: for example "what are the top-10 used ARs, their kpm, accuracy, etc. in players that have played more than 100 hours as assault and have an above average kdr?"
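
As an illustration, a conjunction search like that could be sketched against the `main` and `weapon` tables from the schema posted later in this thread. Several things are assumptions: that `time` columns are in seconds, that `main` holds one row per soldier, and that assault rifles carry a `category` value of `'assault_rifle'` (a hypothetical label). The demo tables contain only the columns the query touches, with made-up numbers.

```python
import sqlite3

TOP_WEAPONS_SQL = """
WITH eligible AS (
    SELECT id FROM main
    WHERE time_assault > :min_seconds
      AND deaths > 0
      AND 1.0 * kills / deaths >
          (SELECT AVG(1.0 * kills / deaths) FROM main WHERE deaths > 0)
)
SELECT w.slug,
       SUM(w.time)                       AS total_time,
       60.0 * SUM(w.kills) / SUM(w.time) AS kpm,
       1.0 * SUM(w.hits) / SUM(w.shots)  AS accuracy
FROM weapon AS w
JOIN eligible AS e ON e.id = w.id
WHERE w.category = :category
GROUP BY w.slug
HAVING SUM(w.time) > 0 AND SUM(w.shots) > 0
ORDER BY total_time DESC
LIMIT :limit
"""


def top_weapons(conn, category, min_class_hours=100, limit=10):
    """Most-used weapons of a category among experienced, above-average players."""
    params = {
        "min_seconds": min_class_hours * 3600,  # assumes times are in seconds
        "category": category,
        "limit": limit,
    }
    return conn.execute(TOP_WEAPONS_SQL, params).fetchall()


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE main (id TEXT, time_assault INTEGER, kills INTEGER, deaths INTEGER)"
    )
    conn.execute(
        "CREATE TABLE weapon (id TEXT, slug TEXT, category TEXT, "
        "time INTEGER, shots INTEGER, hits INTEGER, kills INTEGER)"
    )
    conn.executemany(
        "INSERT INTO main VALUES (?, ?, ?, ?)",
        [("a", 400000, 200, 100), ("b", 400000, 50, 100), ("c", 1000, 300, 100)],
    )
    conn.executemany(
        "INSERT INTO weapon VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            ("a", "ak12", "assault_rifle", 600, 1000, 250, 30),
            ("a", "m416", "assault_rifle", 1200, 2000, 400, 40),
            ("a", "mp7", "pdw", 5000, 9000, 900, 90),
            ("b", "ak12", "assault_rifle", 9999, 5000, 500, 10),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in top_weapons(make_demo_db(), "assault_rifle"):
        print(row)
```

Filtering like this narrows the question to a defined subgroup, but it does not remove the forum-poster bias in how the soldiers were collected in the first place.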

Are there any stats available related to maps? Or to loadouts?

We should think what kind of questions would be useful to ask. For instance, can we corroborate a priori accuracy estimates with these data? Could the gun-usability factor be estimated ... as in "what is the difference between a priori T100 gun ordering and in-game KPM? What are the gun stats (v recoil, fsm, ...) that best predict this difference?"

And hey, welcome!
"Less is more? How can that be? How could less be more, that's impossible. More is more." Yngwie Malmsteen
"Many bullets help." WoopsyYaya
"most rhetorically legitimate ad hominem 2015" ToTheSun!

Posts: 286

Date of registration
: Mar 18th 2014

Platform: Xbox One

Location: The Moon

Battlelog:

Reputation modifier: 2

4

Sunday, April 6th 2014, 9:16am

*Sees thread title. Instantly clicks link and reads*

Wow, that was certainly informative. Two things though: shouldn't the total of all players' K/D ratios be slightly less than 1? For every death, either someone has killed you or you were KIA. As for these 25,000 players, people who look at the forums tend to be more serious/curious about how to excel; would this lead to a bias in the selection of statistics?

I'd like to go on record by saying I like you already :)

*Activates Jedi Mind Powers*

VincentNZ

Holy War? No Thanks.

(2,416)

Posts: 2,810

Date of registration
: Jul 25th 2013

Platform: PC

Battlelog:

Reputation modifier: 16

5

Sunday, April 6th 2014, 11:27am

It just goes to show that the starting PDW is the only PDW on these charts...

Posts: 630

Date of registration
: Jan 27th 2013

Platform: PC

Battlelog:

Reputation modifier: 5

6

Sunday, April 6th 2014, 12:44pm

At first I was thinking how on earth can average K/D be above 1.0, when it should average to less than 1.0 if 1 kill = 1 death excluding suicides, then you add suicides. But then I remembered: ASSIST COUNTS AS KILL

ToTheSun!

Be Creative.

(5,050)

Posts: 7,811

Date of registration
: Mar 9th 2012

Platform: PC

Location: Portugal

Reputation modifier: 19

7

Sunday, April 6th 2014, 12:51pm

The one problem i see with your script is that BLog posting is appealing only to, mostly, average/bad-ish players and trolls. I, honestly, never see many good players there.

I'm not sure what kind of bias this might induce in my data. Is it bias towards good players, because the ones most interested in the game are the ones who are going to post on the forums? Is it bias towards lousy players, because they are the ones spending all their time whining on the forums instead of getting better?

So, the latter would be my guess.

Otherwise, looks very neat.

Miffyli

Symthic Developer

(6,883)

Posts: 3,738

Date of registration
: Mar 21st 2013

Platform: PC

Location: __main__, Finland

Reputation modifier: 17

8

Sunday, April 6th 2014, 1:30pm

Yesssssss, Yeeeeeeeeeeeeeeeeeeeeeeeeessss, this is what I like to see here on these forums as it is, after all, a "statistics" forum :D. At first I was going to ask if you could share the samples you collected, but then I read the method you used to get the IDs, as I am "struggling" with that too. Maybe the only way to get a "pure" random sample is to generate random IDs and check whether each one corresponds to a real soldier.
If you happen to write a script for a more random sampler, would you mind sharing the samples you collect? It would save me the trouble of doing that if I happen to get in the mood for statistics at some point.
Links to users' thread list who have made analytical/statistical/mathematical/cool posts on Symthic:
  • 3VerstsNorth - Analysis of game mechanics in BF4 (tickrates, effects of tickrate, etc)
  • InterimAegis - Weapon comparisons/scoring.
  • leptis - Analysis of shotguns, recoil, recoil control and air drag.
  • Veritable - Scoring of BF4/BF1 firearms in terms of usability, firing and other mechanics.
  • pmax - Statistical analysis of BF4 players/games.
  • Miffyli - Random statistical analysis of BF4 battlereports/players and kill-distances. (list is cluttered with other threads).
Sorry if your name wasn't on the list, I honestly can't recall all names : ( . Nudge me if you want to be included

Posts: 7,809

Date of registration
: Feb 25th 2012

Platform: PC

Location: italy

Battlelog:

Reputation modifier: 19

9

Sunday, April 6th 2014, 1:59pm

The correlation value here is -0.03, so basically zero. Maybe all those "sissy snipers" aren't actually doing their k/d ratios any favors. The more relevant plot would be log(k/d) vs (fraction of) time using a sniper rifle, but that involves a slightly more, well, involved SQL query; I'll save that one for another time.

I didn't really understand this part. I mean, what should the picture show?
And also yeah, sniping doesn't help your K/D unless you're basically a badass (and in fact, am I the only one who plays long-range sniper for anything except K/D ratio?)


p.s.: miff,it's said "brillante",without the "".
"I'm just a loot whore."


stuff mostly unrelated to BF4 that interests nobody



bf4
on 13/05/2016
23rd M320FB user on pc(13/05/16)
rush mode score RANK:2794 TOP:2% OUT OF:215398
obliteration mode scoreRANK:994 TOP:1% OUT OF:159466
handgun medals RANK:2236 TOP:2% OUT OF:143874
longest headshot RANK:9512 TOP:4% OUT OF:257589
recon score RANK:10871 TOP:4% OUT OF:274899
general score per minute RANK:10016 TOP:4% OUT OF:294774

bf3
31/3/2012 4:58:

Headshot distance RANK:493* TOP:0%
Revives per assault minute RANK: 6019 TOP: 3%
Headshots / kill percentage RANK:25947 TOP:13%
MVP ribbons RANK:18824 TOP:11%

*= 6 if we not count the EOD BOT headshots

@kataklism

ARGUMENT DESTROYED 100

ENEMY KILLED [REASON] JSLICE20 100


WRITING SPREE STOPPED 500

link to full-size old avatar:
http://i.imgur.com/4X0321O.gif




  • "pmax" started this thread

Posts: 139

Date of registration
: Dec 8th 2011

Platform: PC

Location: Durham, NC

Battlelog:

Reputation modifier: 10

10

Sunday, April 6th 2014, 6:55pm

And hey, welcome!

Thanks to everybody for the warm sort-of welcome!

Here's the chart I said I would make in my first post (again from an SRS of 2,500):



Again, the correlation is -0.006, which is more or less none. Snipers, it seems, don't tend to have any better k/d ratios than anyone else.
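
A sketch of the kind of query that could sit behind a chart like this, using the `main` and `weapon` tables from the schema further down in this post. The `'sniper'` category label is hypothetical, and taking total time across all weapon rows as the denominator is an assumption (total play time from `main.time` would be another option). The logarithm is taken in Python because SQLite's math functions are not always compiled in.

```python
import math
import sqlite3

SNIPER_FRACTION_SQL = """
SELECT m.kills, m.deaths,
       1.0 * COALESCE(SUM(CASE WHEN w.category = :category THEN w.time END), 0)
           / SUM(w.time) AS sniper_fraction
FROM main AS m
JOIN weapon AS w ON w.id = m.id
WHERE m.kills > 0 AND m.deaths > 0
GROUP BY m.id, m.kills, m.deaths
HAVING SUM(w.time) > 0
ORDER BY m.id
"""


def sniper_fraction_vs_log_kd(conn, category="sniper"):
    """Return (sniper time fraction, log(k/d)) pairs, one per soldier."""
    rows = conn.execute(SNIPER_FRACTION_SQL, {"category": category}).fetchall()
    return [(fraction, math.log(kills / deaths)) for kills, deaths, fraction in rows]


def make_demo_db():
    # Only the columns the query touches, filled with made-up numbers.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE main (id TEXT, kills INTEGER, deaths INTEGER)")
    conn.execute("CREATE TABLE weapon (id TEXT, slug TEXT, category TEXT, time INTEGER)")
    conn.executemany(
        "INSERT INTO main VALUES (?, ?, ?)", [("a", 100, 50), ("b", 10, 20)]
    )
    conn.executemany(
        "INSERT INTO weapon VALUES (?, ?, ?, ?)",
        [
            ("a", "rifle_1", "sniper", 300),
            ("a", "carbine_1", "carbine", 700),
            ("b", "carbine_1", "carbine", 500),
        ],
    )
    return conn


if __name__ == "__main__":
    print(sniper_fraction_vs_log_kd(make_demo_db()))
```

The resulting pairs can go straight into a scatter plot or a correlation calculation.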

I have a bunch of follow-up planned; I'd like to try to answer everybody; this may result in my posting multiple responses in a row; I hope you guys will forgive this temporary breach of etiquette.

Are there any stats available related to maps? Or to loadouts?

Not that I know of. You might be able to glean map data by looking at battle reports, but those are mostly hidden to people not on a player's friends list. I think the only data one could gather about loadouts would have to come from crawling the individual soldiers' loadout pages. I'll leave that up to someone else. You can see for yourself what data is available through the API; go ahead and query my soldier:

http://battlelog.battlefield.com/bf4/war…te/362854283/1/
http://battlelog.battlefield.com/bf4/war…ts/362854283/1/
http://battlelog.battlefield.com/bf4/war…ts/362854283/1/

Most of this data isn't particularly interesting and has to do with how stuff is displayed on Battlelog. There were also several queries from which I didn't use any data:

http://battlelog.battlefield.com/bf4/war…te/362854283/1/
http://battlelog.battlefield.com/bf4/war…te/362854283/1/
http://battlelog.battlefield.com/bf4/war…te/362854283/1/
http://battlelog.battlefield.com/bf4/war…ry/362854283/1/

There is also a panoply of queries of the form "http://.../warsawWeaponAccessoriesPopulateStats/[personaID]/1/[WEAPON ID]/" which might give you some more loadout information (I haven't looked closely), but I didn't collect it; my database is already huge.

In any case, to get an idea of what data I actually stored, here's the .SCHEMA of the SQLite database I used:

Source code

CREATE TABLE main (
                id TEXT,
                date TEXT,
                time INTEGER,
                score INTEGER,
                score_award INTEGER,
                score_bonus INTEGER,
                score_combat INTEGER,
                score_general INTEGER,
                score_squad INTEGER,
                score_team INTEGER,
                score_unlock INTEGER,
                
                rank INTEGER,
                rounds INTEGER,
                wins INTEGER,
                skill INTEGER,
                
                kills INTEGER,
                deaths INTEGER,
                shots INTEGER,
                hits INTEGER,
                headshots INTEGER,
                kills_avenger INTEGER,
                kills_savior INTEGER,
                assists INTEGER,
                suppressions INTEGER,
                
                flags_captured INTEGER,
                kills_mcom_defend INTEGER,
                kills_flag_defend INTEGER,
                
                longest_headshot REAL,
                
                vehicle_damages INTEGER,
                vehicles_destroyed INTEGER,
                
                dogtags INTEGER,
                
                score_conquest INTEGER,
                score_rush INTEGER,
                score_deathmatch INTEGER,
                score_domination INTEGER,
                score_obliteration INTEGER,
                score_defuse INTEGER,
                
                time_assault INTEGER,
                score_assault INTEGER,
                heals INTEGER,
                revives INTEGER,
                time_engineer INTEGER,
                score_engineer INTEGER,
                repairs INTEGER,
                time_recon INTEGER,
                score_recon INTEGER,
                time_support INTEGER,
                score_support INTEGER,
                resupplies INTEGER,
                
                score_vehicle INTEGER
            );
CREATE TABLE vehicle (
                id TEXT,
                slug CHAR(32),
                category TEXT,
                time INTEGER,
                kills INTEGER,
                destroys INTEGER,
                date TEXT
            );
CREATE TABLE weapon (
                id TEXT,
                slug CHAR(32),
                category CHAR(8),
                time INTEGER,
                shots INTEGER,
                hits INTEGER,
                headshots INTEGER,
                kills INTEGER,
                date TEXT
            );


We should think what kind of questions would be useful to ask. For instance, can we corroborate a priori accuracy estimates with these data? Could the gun-usability factor be estimated ... as in "what is the difference between a priori T100 gun ordering and in-game KPM? What are the gun stats (v recoil, fsm, ...) that best predict this difference?"

This is actually pretty interesting. I'll need to get the actual game data arranged into some useable Python data structure; I'll stick this on the list of things to tackle.
